
When Combating Health Care Fraud, Machine Learning Tools Can Identify Providers That Overbill Insurers

The United States spends more than $4 trillion a year on health care, largely conducted by private providers and reimbursed by insurers. Major concerns in this system are overbilling, waste, and fraud by providers, who face incentives to misreport their claims to receive higher payments. In a new study, researchers developed machine learning tools to identify medical providers that overbill insurers.

The study, by researchers at Carnegie Mellon University (CMU) and Boston University (BU), appears as a National Bureau of Economic Research working paper.

“We used claims data from Medicare to identify patterns consistent with fraud or overbilling from in-patient hospitalizations,” explains Shubhranshu Shekhar, a PhD candidate in machine learning and public policy at CMU’s Heinz College, who led the study. “The tools we developed can be used to guide investigations as well as audits of suspicious providers for public and private health insurance systems.”

Fraud in health care can be difficult to detect. In this study, researchers used data from Medicare, the largest federal health care program for people ages 65 and older and the disabled, to examine patients’ medical histories and demographic characteristics, as well as providers’ coding patterns and spending. The study considered patients hospitalized in 2017—11.2 million claims from 6.6 million beneficiaries representing more than 7,600 providers—and included data from 2012 through 2016 to construct patients’ medical history.

Researchers developed machine learning tools to detect atypical behavior consistent with fraud and abuse by identifying excess expenditures. By quantifying the additional spending incurred merely because a patient visited a certain hospital, the methods can pinpoint specific coding behavior that stands out and associate it with higher reimbursement. As such, the techniques reflect economic value and can pinpoint individual codes and claims, which can help initiate the auditing process.
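The idea of excess expenditure can be illustrated with a deliberately simplified sketch. It is not the study's method: it assumes the expected payment for a claim is simply the average payment across all providers for the same diagnosis group, and the claims, provider IDs, and the `excess_spending` function are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy claims: (provider_id, diagnosis_group, payment_usd).
claims = [
    ("A", "pneumonia", 10000.0),
    ("B", "pneumonia", 10000.0),
    ("C", "pneumonia", 16000.0),
    ("A", "hip_fracture", 20000.0),
    ("B", "hip_fracture", 22000.0),
    ("C", "hip_fracture", 30000.0),
]

def excess_spending(claims):
    """Sum, per provider, of (actual payment - expected payment).

    Assumption: the expected payment is the mean payment over all
    providers for the same diagnosis group. A real model would also
    adjust for patient history and demographics.
    """
    payments_by_group = defaultdict(list)
    for _, group, payment in claims:
        payments_by_group[group].append(payment)
    expected = {g: mean(p) for g, p in payments_by_group.items()}

    excess = defaultdict(float)
    for provider, group, payment in claims:
        excess[provider] += payment - expected[group]
    return dict(excess)

# Rank providers from highest to lowest excess spending.
ranking = sorted(excess_spending(claims).items(),
                 key=lambda item: item[1], reverse=True)
for provider, dollars in ranking:
    print(f"{provider}: {dollars:+.0f} USD")
```

In this toy data, provider C is paid more than the group average for both diagnoses, so it lands at the top of the ranking. An auditor could then look at which of C's claims contribute most to the total.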

Researchers validated their methods using information from the U.S. Department of Justice on hospitals facing anti-fraud lawsuits and case studies of suspicious providers.

The machine learning framework provides scalable and explainable tools to detect fraud and abuse in health care systems, according to the authors. Their methods require neither knowledge of past fraudulent activity nor labeling efforts, which are often unavailable or costly. Since the methods do not require human supervision, they can process large amounts of health care claims from hospitals and, in so doing, identify underlying patterns and quantify discrepancies.
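To show what working without labels can look like, the sketch below scores providers by how far they sit from the typical provider, using a robust z-score built from the median and the median absolute deviation. This is a generic outlier technique swapped in for illustration, not the authors' framework. The per-provider figures, the `robust_scores` function, and the 3.5 cutoff are all assumptions.

```python
from statistics import median

# Hypothetical average payment per claim (USD) for five providers.
avg_payment = {"A": 9800.0, "B": 10100.0, "C": 10000.0,
               "D": 10300.0, "E": 15000.0}

def robust_scores(values):
    """Return (value - median) / MAD for each provider.

    No fraud labels are used: a provider is scored only by how far
    it deviates from the other providers in the same data.
    """
    center = median(list(values.values()))
    mad = median([abs(v - center) for v in values.values()])
    if mad == 0:
        # All providers look alike; nothing stands out.
        return {name: 0.0 for name in values}
    return {name: (v - center) / mad for name, v in values.items()}

THRESHOLD = 3.5  # assumed cutoff, chosen only for this illustration

scores = robust_scores(avg_payment)
flagged = sorted(name for name, s in scores.items() if s > THRESHOLD)
print(flagged)
```

The median and median absolute deviation are used instead of the mean and standard deviation so that a single extreme provider does not distort the baseline it is compared against.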

The methods also allow auditors to receive multiple pieces of evidence with explanations of the suspiciousness of flagged providers to support cases and guide ongoing investigations.

“Efforts to detect and expose fraud are paramount for limiting the growth of wasteful spending,” says Leman Akoglu, associate professor of information systems at CMU’s Heinz College, who coauthored the study. “Our methods not only outperformed baseline algorithms and random targeting, but also allowed us to characterize the types of providers most likely to be ranked as suspicious, which may be useful for guiding anti-fraud policy.”

“Our methods can extend beyond Medicare and hospitalizations,” notes Jetson Leder-Luis, assistant professor of markets, public policy, and law at BU’s Questrom School of Business, who coauthored the study. “They can be used to detect fraud against private insurers, who face many of the same issues; to facilitate audits; and to examine fraud in Medicaid, the federal-state subsidy for individuals with low incomes.”

The research was funded by the National Institute on Aging of the National Institutes of Health.


Summarized from an NBER Working Paper, Unsupervised Machine Learning for Explainable Health Care Fraud Detection by Shekhar, S (Carnegie Mellon University), Leder-Luis, J (Boston University), and Akoglu, L (Carnegie Mellon University). Copyright 2023 The Authors. All rights reserved.

About Heinz College of Information Systems and Public Policy
The Heinz College of Information Systems and Public Policy is home to two internationally recognized graduate-level institutions at Carnegie Mellon University: the School of Information Systems and Management and the School of Public Policy and Management. This unique colocation, combined with its expertise in analytics, sets Heinz College apart in the areas of cybersecurity, health care, the future of work, smart cities, and arts & entertainment. In 2016, INFORMS named Heinz College the #1 academic program for Analytics Education. For more information, please visit