Course Details

Course Number: 95-791

Data Mining

Units: 6

Data mining, the intelligent analysis of information stored in data sets, has gained substantial interest among practitioners in a variety of fields and industries. Nowadays, almost every organization collects data that can be analyzed to support better decisions, improve policies, discover computer network intrusion patterns, design new drugs, detect credit fraud, make accurate medical diagnoses, predict imminent occurrences of important events, and monitor and evaluate the reliability of complex systems in order to preempt failures.

Learning Objectives:

This course will provide participants with an understanding of fundamental data mining methodologies and with the ability to formulate and solve problems using them. Particular attention will be paid to practical, efficient, and statistically sound techniques that provide not only the requested discoveries but also estimates of their utility. The lectures will be complemented by hands-on experience with data mining software, primarily R, so that participants develop basic practical skills.

The course covers the following groups of topics.

Foundations. How to make data mining practical? (approximately 40% of class time)

* Learning from data: why, what and how?
* Fundamental tasks, issues and paradigms of learning models from data.
* Real world data is noisy and uncertain. How much can we trust the results of our analyses?
* Model selection.
* Reduction of dimensionality and data engineering.
* Measures of association between data attributes: information theoretic, correlational.

Pragmatic methodologies for mining data (approximately 60% of class time)

* Predictive analytics: classification and regression.
* Cost-sensitive model selection using the ROC approach.
* Compression of data and models for improved reliability, understandability, and tractability of large sets of high-dimensional data.
* Association rule learning, decision list learning, and decision trees.
* Introduction to density estimation, anomaly detection, and clustering.
* Overview of mining complex types of data.
* Illustrative examples of real-world applications.


Prerequisites:
95-796 Statistics for IT Managers (6 credits)

Faculty:
Karen (Lujie) Chen
Alexandra Chouldechova
Artur Dubrawski
Murlikrishna Viswanathan