Faculty Details

Daniel B. Neill

Associate Professor of Information Systems

Full-time Faculty

Office: HBH 2105B
Voice: 412-268-3885
Email: neill@cs.cmu.edu


Daniel B. Neill is Associate Professor of Information Systems at the Heinz College, where he directs the Event and Pattern Detection Laboratory and the Joint Ph.D. Program in Machine Learning and Policy. He also holds courtesy appointments in the Machine Learning Department and Robotics Institute, School of Computer Science, and the University of Pittsburgh's Department of Biomedical Informatics. He earned his M.Phil. in Computer Speech at Cambridge University (2002), his M.S. in Computer Science at Carnegie Mellon (2004), and his Ph.D. in Computer Science at Carnegie Mellon (2006).

Prof. Neill was a recipient of the Winston Churchill Scholarship and an NSF Graduate Research Fellowship, and received the prestigious NSF CAREER Award for his work on "Machine Learning and Event Detection for the Public Good". This research project focuses on novel methods for detecting emerging events in massive, complex real-world datasets. The approach consists of new algorithms to efficiently and exactly find the most anomalous subsets of a large, high-dimensional dataset, together with three methodological advances: incorporating incremental model learning from user feedback into event detection; incorporating society-scale data from emerging, transformative technologies such as cellular phones and user-generated web content; and augmenting event detection with methods and tools for event characterization, explanation, visualization, investigation, and response. The research is integrated with a multi-pronged educational initiative to incorporate machine learning into the public policy curriculum through the development of courses and seminars, workshops in machine learning and policy research and education, and the establishment of a new Joint Ph.D. Program in Machine Learning and Policy. Prof. Neill's work is also funded by two other awards from the National Science Foundation, "Fast Subset Scan for Anomalous Pattern Detection" and "Discovering Complex Anomalous Patterns", and he has received additional funding from the Centers for Disease Control, CRTI, and the University of Pittsburgh Medical Center (UPMC).

Prof. Neill's research interests include machine learning, data mining, artificial intelligence, and health care information systems. He is particularly interested in developing methods for automatic detection and investigation of emerging events and other anomalous or interesting patterns in massive real-world datasets. Applications of this work include the very early detection of emerging outbreaks of disease, prediction of emerging patterns of violent crime, detecting anomalous patterns of patient care in a clinical setting, and addressing various homeland security challenges (e.g. customs monitoring, network intrusion detection). His work has appeared in a variety of journals and collections, including the Journal of the Royal Statistical Society, Machine Learning, Statistics in Medicine, International Journal of Forecasting, International Journal of Health Geographics, Advances in Neural Information Processing Systems, Advances in Disease Surveillance, Journal of Theoretical Biology, and Rationality and Society. He received the best paper award at the National Syndromic Surveillance Conference in 2005 for his work on Bayesian spatial scan statistics. Recently, Prof. Neill was named one of the "top ten artificial intelligence researchers to watch" by IEEE Intelligent Systems Magazine.

Prof. Neill has been actively involved in curriculum development and teaching at the intersection of machine learning and public policy. He is the developer and coordinator of CMU's Joint Ph.D. Program in Machine Learning and Policy, jointly administered by the Machine Learning Department (School of Computer Science) and Heinz College. He has developed an introductory course in "Large Scale Data Analysis for Policy" (90-866) for the MSPPM program, a Ph.D. Research Seminar in Machine Learning and Policy (90-904/10-830), and a series of courses, "Special Topics in Machine Learning and Policy" (90-921/10-831), with topics including "Event and Pattern Detection", "Machine Learning for the Developing World", and "Harnessing the Wisdom of Crowds". He also teaches the core statistics course for the MISM program (95-796, "Statistics for IT Managers").

Additional information about Prof. Neill's research, teaching, and curriculum development initiatives, as well as links to his publications, is available on his home page, http://www.cs.cmu.edu/~neill.

Selected Recent Publications

Daniel B. Neill. Fast subset scan for spatial pattern detection. Journal of the Royal Statistical Society (Series B: Statistical Methodology) 74(2): 337-360, 2012.

Daniel B. Neill. Fast Bayesian scan statistics for multivariate event detection and visualization. Statistics in Medicine 30(5): 455-469, 2011.

Sharique Hasan, George T. Duncan, Daniel B. Neill, and Rema Padman. Automatic detection of omissions in medication lists. Journal of the American Medical Informatics Association 18(4): 449-458, 2011.

Daniel Oliveira, Daniel B. Neill, James H. Garrett Jr., and Lucio Soibelman. Detection of patterns in water distribution pipe breakage using spatial scan statistics for point events in a physical network. Journal of Computing in Civil Engineering 25(1): 21-30, 2011.

Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan statistic for early event detection and characterization. Machine Learning 79: 261-282, 2010.

Daniel B. Neill. An empirical comparison of spatial scan statistics for outbreak detection. International Journal of Health Geographics 8: 20, 2009.

Daniel B. Neill. Expectation-based scan statistics for monitoring spatial time series data. International Journal of Forecasting 25: 498-517, 2009.

Daniel B. Neill, Gregory F. Cooper, Kaustav Das, Xia Jiang, and Jeff Schneider. Bayesian network scan statistics for multivariate pattern detection. In J. Glaz, V. Pozdnyakov, and S. Wallenstein, eds., Scan Statistics: Methods and Applications, 221-250, 2009.

Kaustav Das, Jeff Schneider, and Daniel B. Neill. Anomaly pattern detection in categorical datasets. Proceedings of the 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 169-176, 2008.

Daniel B. Neill and Andrew W. Moore. Methods for detecting spatial and spatio-temporal clusters. In M. Wagner, A. Moore, and R. Aryel, eds., Handbook of Biosurveillance, 2006.

Daniel B. Neill, Andrew W. Moore, and Gregory F. Cooper. A Bayesian spatial scan statistic. In Advances in Neural Information Processing Systems 18, 1003-1010, 2006.

Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, and Kenny Daniel. Detection of emerging space-time clusters. Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 218-227, 2005.

Daniel B. Neill, Andrew W. Moore, Francisco Pereira, and Tom Mitchell. Detecting significant multidimensional spatial clusters. In Advances in Neural Information Processing Systems 17, 969-976, 2005.

Daniel B. Neill and Andrew W. Moore. Rapid detection of significant spatial clusters. Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 256-265, 2004.

Research Interest(s)

Machine learning, data mining, event detection, pattern detection, disease surveillance, crime prediction


Education

Ph.D., Computer Science, Carnegie Mellon University, 2006

Working Papers

Correlates of Homicide: New Space/Time Interaction Tests for Spatiotemporal Point Processes

Statistical inference on spatiotemporal data often proceeds by focusing on the temporal aspect of the data, ignoring space, or the spatial aspect, ignoring time. In this paper, we explicitly focus on the interaction between space and time. Using a geocoded, time-stamped dataset from Chicago of almost 9 million calls to 911 between 2007 and 2010, we ask whether any of these call types are associated with shootings or homicides. Standard correlation techniques do not produce meaningful results in the spatiotemporal setting because of two confounds: purely spatial effects (e.g. "bad" neighborhoods) and purely temporal effects (e.g. more crimes in the summer) could introduce spurious correlations. To address this issue, a handful of statistical tests for space-time interaction have been proposed, which explicitly control for separable spatial and temporal dependencies. Yet these classical tests each have limitations. We propose a new test for space-time interaction, using a Mercer kernel-based statistic for measuring the distance between probability distributions. We compare our new test to existing tests on simulated and real data, where it performs comparably to or better than the classical tests. For the application we consider, we find a number of interesting and significant space-time interactions between 911 call types and shootings/homicides.
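
The abstract does not spell out the test statistic, so the following is only a rough sketch of how a kernel-based space-time interaction test could look, not the paper's actual method. It assumes Gaussian kernels, an HSIC-style dependence statistic between event locations and event times, and a permutation null that shuffles times across locations (which leaves the purely spatial and purely temporal distributions unchanged). All function names, parameters, and bandwidths here are hypothetical.

```python
import math
import random


def gaussian_gram(points, bandwidth):
    """Gram matrix of a Gaussian (RBF) kernel; each point is a tuple of floats."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            gram[i][j] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return gram


def center(gram):
    """Double-center a symmetric Gram matrix (remove row, column, and grand means)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def dependence_statistic(space_centered, time_gram, order):
    """Biased HSIC-style statistic, with event times relabeled by `order`."""
    n = len(order)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += space_centered[i][j] * time_gram[order[i]][order[j]]
    return total / (n * n)


def space_time_interaction_test(locations, times, space_bw, time_bw,
                                n_perm=199, seed=0):
    """Permutation test for space-time interaction (illustrative assumption).

    Shuffling times across locations preserves both marginals, so only the
    dependence between where and when events occur is being tested.
    """
    rng = random.Random(seed)
    space_centered = center(gaussian_gram(locations, space_bw))
    time_gram = gaussian_gram([(t,) for t in times], time_bw)
    identity = list(range(len(times)))
    observed = dependence_statistic(space_centered, time_gram, identity)
    exceed = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        if dependence_statistic(space_centered, time_gram, order) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic events that drift eastward over time: strong interaction.
    times = [i / 40 for i in range(40)]
    locations = [(t, 0.0) for t in times]
    stat, p = space_time_interaction_test(locations, times, 0.2, 0.2)
    print(f"statistic={stat:.4f}, p-value={p:.3f}")
```

The cost is quadratic in the number of events per permutation, so a sketch like this suits only small samples; a dataset on the scale of millions of calls would need subsampling or an approximation.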