New Machine-Learning Algorithm Addresses Problems Related to Classification with Few Samples

Machine learning has proven successful in data-intensive applications but is often hampered when the data set is small. For example, in screening via breast mammography, when the number of samples collected is small, diagnosing the type of breast cancer requires time-consuming and costly analysis by pathologists, often leading to results that lack consensus. In a new study, researchers developed an algorithm to address this problem and demonstrated the competitive performance of their model.

The study, by researchers at Carnegie Mellon University (CMU), the Chinese University of Hong Kong (CUHK), the Georgia Institute of Technology, and the University of Texas at Austin, appears in an article published at the Conference on Neural Information Processing Systems.

“Learning a robust classifier from a few samples is a major challenge in machine learning,” notes Shixiang Zhu, assistant professor of data analytics at CMU’s Heinz College, who led the study. “Prior research has focused on developing so-called k-nearest neighbor-based algorithms combined with metric learning that captures similarities between samples.”

The k-nearest neighbor algorithm is a data-classification method that estimates the probability that a data point belongs to one group or another based on the groups to which the data points nearest it belong.
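As a rough illustration of the idea described above, the following minimal Python sketch estimates group-membership probabilities from the labels of the k closest training points. The function name, the Euclidean distance, the toy points, and the labels "A" and "B" are all assumptions made up for this example; none of them come from the study.

```python
import math
from collections import Counter

def knn_class_probabilities(train_points, train_labels, query, k=3):
    """Estimate class probabilities for `query` from its k nearest neighbors."""
    # Rank training points by Euclidean distance to the query (an assumed metric).
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Count how often each label appears among the k closest points.
    counts = Counter(train_labels[i] for i in order[:k])
    # The fraction of neighbors in each group serves as the probability estimate.
    return {label: count / k for label, count in counts.items()}

# Hypothetical toy data: three points in group "A", two in group "B".
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9)]
labels = ["A", "A", "A", "B", "B"]

probs = knn_class_probabilities(points, labels, query=(0.9, 1.0), k=3)
print(probs)
```

Here two of the three nearest neighbors of the query belong to group "B", so the estimate for "B" is 2/3 and the estimate for "A" is 1/3.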

In this study, researchers explored a method of finding the best weighted k-nearest neighbors classifiers that can handle uncertainty in the features by using a minimax distributionally robust approach. They developed an algorithm, which they called Dr.k-NN, that makes a decision by weighting each neighboring sample according to least favorable distributions resulting from a distributionally robust problem.
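To make the notion of a weighted k-nearest neighbors decision concrete, here is a minimal Python sketch in which each neighbor's vote is scaled by a weight. It is not Dr.k-NN: in the study the weights come from least favorable distributions of a distributionally robust problem, which is not reproduced here. The sketch swaps in simple inverse-distance weights as a stand-in, and the function name, the `weight_fn` parameter, and the toy data are assumptions for illustration only.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_points, train_labels, query, k=3, weight_fn=None):
    """Classify `query` by a weighted vote among its k nearest neighbors."""
    if weight_fn is None:
        # Stand-in weighting (assumption): closer neighbors count more.
        weight_fn = lambda d: 1.0 / (d + 1e-9)
    # Pair each training label with its Euclidean distance to the query.
    neighbors = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    # Accumulate the weight of each of the k nearest neighbors per label.
    scores = defaultdict(float)
    for dist, label in neighbors[:k]:
        scores[label] += weight_fn(dist)
    # Predict the label with the largest total weight.
    return max(scores, key=scores.get), dict(scores)

# Hypothetical toy data: two distant "A" points and one nearby "B" point.
points = [(0.0, 0.0), (0.1, 0.1), (0.5, 0.5)]
labels = ["A", "A", "B"]
query = (0.45, 0.45)

weighted_label, weighted_scores = weighted_knn_predict(points, labels, query, k=3)
uniform_label, uniform_scores = weighted_knn_predict(
    points, labels, query, k=3, weight_fn=lambda d: 1.0
)
print(weighted_label, uniform_label)
```

With uniform weights the two "A" points outvote the single "B" point, while the inverse-distance weights favor the much closer "B" point. This shows how the choice of neighbor weights can change the decision, which is the part of the classifier the study's method is described as optimizing.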

Both theoretical results and experiments showed that this approach achieved outstanding classification accuracy compared with other baselines while using minimal resources.

“By efficiently solving the functional optimization problem, our study demonstrated that our algorithm holds promise for other machine learning tasks,” says Liyan Xie, assistant professor of data science at CUHK, who coauthored the study.

###

Summarized from an article published at the Conference on Neural Information Processing Systems, Distributionally Robust Weighted k-Nearest Neighbors by Zhu, S (Carnegie Mellon University), Xie, L (Chinese University of Hong Kong), Zhang, M (Georgia Institute of Technology), Gao, R (University of Texas at Austin), and Xie, Y (Georgia Institute of Technology). Copyright 2022. All rights reserved.

About Heinz College of Information Systems and Public Policy
The Heinz College of Information Systems and Public Policy is home to two internationally recognized graduate-level institutions at Carnegie Mellon University: the School of Information Systems and Management and the School of Public Policy and Management. This unique colocation, combined with the college's expertise in analytics, sets Heinz College apart in the areas of cybersecurity, health care, the future of work, smart cities, and arts & entertainment. In 2016, INFORMS named Heinz College the #1 academic program for Analytics Education. For more information, please visit www.heinz.cmu.edu.
