Professor of Statistics and Associate Dean of Dietrich College
I am interested in highly multivariate data for which we can discover an interesting and interpretable dependence structure.
Data in education, psychology and the social sciences is often of this form: a student's right and wrong answers on a multiple-choice test, categories of solutions produced by a subject performing tasks in a cognitive psychology experiment, and responses to an interest inventory employed by a job counselor are all examples. I am especially intrigued when ideas in seemingly unrelated fields come together, and so I have also been interested in capture-recapture models for estimating the size of wildlife and human populations, which share many features with models for multiple-choice tests.
A common tool for modeling the dependence structure in all of these examples is the latent variable model: each discrete response, such as the answer to an exam question, is treated as an indirect measure of some underlying, or latent, quantity, such as "ability" or "amount learned", that we cannot measure directly. I have studied latent variable models employed in the design and analysis of standardized tests such as the Scholastic Aptitude Test and the Graduate Record Examinations, in the analysis of small-scale experiments in psychology and psychiatry, and in the analysis of large-scale educational surveys such as the National Assessment of Educational Progress. Some of my recent work aims to characterize the dependence structure implied by these models, so that one can quickly decide whether they are the right tool for a particular problem.
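As a generic illustration only (a standard textbook form, not a formula specific to any one of the models mentioned above), a latent variable model with conditionally independent responses can be sketched as follows. The symbols are assumptions introduced here for illustration: X_j is the discrete response to item j, \theta is the latent quantity, and F is its distribution in the population.

```latex
P(X_1 = x_1, \ldots, X_J = x_J)
  = \int \prod_{j=1}^{J} P(X_j = x_j \mid \theta) \, dF(\theta)
```

In this form, all dependence among the observed responses is induced by the shared latent quantity \theta; given \theta, the responses are independent.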
The quantities of interest (depression, academic achievement, job satisfaction, etc.) are necessarily fuzzily defined; hence, the models do not usually try for strong fidelity with the underlying processes. Instead, they aim for predictive value and efficient data summarization. This suggests that another important problem is knowing whether and how these relatively simple descriptive models succeed in a variety of practical situations where the modeling assumptions may not hold exactly. A novel feature of my research here has been to view each subject's responses as forming the initial segment of a stochastic process with an incompletely specified dependence structure.
Large sample theory, probability inequalities, robust estimation, and hierarchical Bayes modeling techniques are all useful tools in these efforts.
My research interests include Markov Chain Monte Carlo and other computing and estimation methods, errors in variables, factor analysis and structural equation models in econometrics and psychiatric statistics, statistical analysis of large randomized field trials in education, rating protocols for teacher quality, and educational data mining.