
New Method Clusters Sparse Data To Help Consumers Analyze Health Care Providers

Understanding health data is important to advancing clinical care, but clinical data are naturally sparse because most medical concepts do not apply to most individuals: any individual physician performs only a small subset of procedures, and any individual patient fills prescriptions for only a small subset of medications. Researchers have now developed a method that groups such sparse information to help analyze large-scale health data from claims and health-records databases. The method, which clusters providers or patients based on the number of times each billing code, procedure code, or drug prescription appears in Medicare claims, can help consumers analyze health care providers.
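The article does not spell out the algorithm. As a rough illustration of the general idea of clustering sparse count profiles on a hypersphere, the sketch below works in three steps. First, it represents each provider as a sparse vector of code counts. Second, it scales each vector to unit length. Third, it groups providers with spherical k-means, which uses cosine similarity. Spherical k-means is a commonly used stand-in chosen here for illustration and is not necessarily the authors' method. The provider names, codes, and counts are hypothetical, as are the helper names (`count_profile`, `spherical_kmeans`, and so on).

```python
import math
from collections import Counter

# Hypothetical toy claims: each provider maps to the codes it billed or prescribed.
# Spherical k-means stands in for the paper's method; it is an illustration only.
claims = {
    "provider_1": ["drug_a"] * 6 + ["drug_b"] * 2 + ["proc_1"],
    "provider_2": ["drug_a"] * 3 + ["drug_b"] * 4,
    "provider_3": ["drug_b"] * 5 + ["proc_1"] * 2,
    "provider_4": ["drug_a"] * 2 + ["proc_1"] * 3,
    "provider_5": ["drug_c"] * 4 + ["proc_2"] * 2,
    "provider_6": ["proc_2"] * 5 + ["proc_3"],
    "provider_7": ["drug_c"] * 2 + ["proc_3"] * 4,
    "provider_8": ["drug_c"] * 3 + ["proc_2"] + ["proc_3"] * 2,
}

def count_profile(codes):
    """Sparse count vector: only codes that actually appear are stored."""
    return dict(Counter(codes))

def normalize(vec):
    """Project a sparse count vector onto the unit hypersphere (L2 norm = 1)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else {}

def dot(a, b):
    """Sparse dot product; iterates over the shorter vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def init_centroids(points, k):
    """Deterministic farthest-point seeding: start with the first point, then
    repeatedly add the point least similar to every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(min(points, key=lambda p: max(dot(p, c) for c in centroids)))
    return centroids

def spherical_kmeans(profiles, k, iters=20):
    """Cluster sparse count profiles by cosine similarity on the unit sphere."""
    points = [normalize(p) for p in profiles]
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the centroid with the highest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(p, centroids[j])) for p in points]
        # Recompute each centroid as the normalized mean direction of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j in range(k):
            total = Counter()
            for p, lab in zip(points, labels):
                if lab == j:
                    total.update(p)
            new_centroids.append(normalize(dict(total)) or centroids[j])
        centroids = new_centroids
    return labels

profiles = [count_profile(codes) for codes in claims.values()]
labels = spherical_kmeans(profiles, k=2)
for name, label in zip(claims, labels):
    print(name, "-> cluster", label)
```

Normalizing to unit length means that clusters reflect each provider's mix of codes rather than total claim volume. Under this design, a high-volume and a low-volume provider with the same prescribing pattern land close together. In this toy data, providers 1 to 4 share one label and providers 5 to 8 share the other.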

The method was developed by researchers at Carnegie Mellon University, the U.S. Department of Veterans Affairs, and Harvard Medical School. It was presented at the annual symposium of the American Medical Informatics Association in Washington, D.C., in November.

“We developed a new framework to cluster providers based on their claims profile of medications filled and procedures billed, separating groups into clusters that can be characterized by subspecialties,” explains Jeremy C. Weiss, assistant professor of health informatics at Carnegie Mellon University’s Heinz College, who coauthored the study.

After developing the method, the researchers showed that it worked well on simulated data, then tested it on actual data from Medicare claims. In one case study, they analyzed summary information from Medicare Part D for 2015, which included all prescriptions and procedures given under the program that year, and concluded that their method performed better than previously designed models. Several additional case studies demonstrated the method's flexibility and its ability to work well on real-world datasets of health information.

By clustering the data in this way, the authors uncovered several findings that could interest medical practitioners and the public. For example, their work shows that intensive opioid prescribing occurs across many specialties and that prescribing preferences follow each specialty's case mix. The study also shows that care by nurse practitioners and physician assistants differs according to the specialty of the physician, offering a way to characterize the fields on which this workforce focuses.

“We hope our work helps patients visualize the type of care their provider offers based on their prescription and procedure signature,” explains Nathanael Fillmore, associate director of machine learning and predictive analytics at the U.S. Department of Veterans Affairs and instructor in medicine at Harvard Medical School, the first author of the study.


Summarized from a paper presented at the annual symposium of the American Medical Informatics Association, Hypersphere Clustering to Characterize Healthcare Providers Using Prescriptions and Procedures from Medicare Claims Data by Fillmore, N (U.S. Department of Veterans Affairs and Harvard Medical School), Goryachev, SD (U.S. Department of Veterans Affairs), and Weiss, JC (Carnegie Mellon University). Copyright 2019. All rights reserved.

About Carnegie Mellon University's Heinz College of Information Systems and Public Policy
The Heinz College of Information Systems and Public Policy is home to two internationally recognized graduate-level institutions at Carnegie Mellon University: the School of Information Systems and Management and the School of Public Policy and Management. This unique colocation, combined with its expertise in analytics, sets Heinz College apart in the areas of cybersecurity, health care, the future of work, smart cities, and arts & entertainment. In 2016, INFORMS named Heinz College the #1 academic program for Analytics Education. For more information, please visit