S.C. Samuel Kou

Chair of Department of Statistics, Harvard University
Professor of Biostatistics, Harvard T.H. Chan School of Public Health

Samuel Kou is Professor of Statistics at Harvard University. He received a bachelor’s degree in computational mathematics from Peking University in 1997, followed by a Ph.D. in statistics from Stanford University in 2001. After completing his Ph.D., he joined Harvard University as an Assistant Professor of Statistics and was promoted to full professor in 2008.
 
His research interests include big data analytics; digital disease tracking; stochastic inference in biophysics, chemistry and biology; protein folding; Bayesian inference for stochastic models; nonparametric statistical methods; model selection and empirical Bayes methods; and Monte Carlo methods.
 
He is the recipient of the COPSS (Committee of Presidents of Statistical Societies) Presidents’ Award, the highest honor for a statistician under the age of 40; the Guggenheim Fellowship; a US National Science Foundation CAREER Award; the Institute of Mathematical Statistics Richard Tweedie Award; the Raymond J. Carroll Young Investigator Award; and the American Statistical Association Outstanding Statistical Application Award. He is an elected Fellow of the American Statistical Association, an elected member of the International Statistical Institute, and an elected Fellow and a Medallion Lecturer of the Institute of Mathematical Statistics.

Catalytic Prior Distributions for Bayesian Inference

The prior distribution is an essential part of Bayesian statistics, yet in practice it is often challenging to translate existing knowledge into pragmatic prior distributions. In this talk we will discuss a general method for constructing prior distributions that stabilize the estimation of complex target models, especially when the sample size is too small for standard statistical analysis, a situation practitioners commonly encounter with real data. The key idea of our method is to supplement the observed data with a relatively small amount of “synthetic” data generated, for example, from the predictive distribution of a simpler, stably estimated model. This general class of prior distributions, called “catalytic prior distributions,” is easy to use and allows direct statistical interpretation. In numerical evaluations, the resulting posterior estimation under a catalytic prior distribution outperforms the maximum likelihood estimate from the target model and is generally superior or comparable in performance to competing existing methods. We will illustrate the usefulness of the catalytic prior approach through real examples and explore the connection between the catalytic prior approach and a few popular regularization methods.
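The sketch below is a minimal illustration of the data-augmentation idea described in the abstract, under several assumptions that the abstract does not specify: the target model is taken to be a logistic regression, the “simpler, stably estimated model” is an intercept-only logistic regression, synthetic covariates are obtained by resampling the observed covariates, and the total weight given to the synthetic data (denoted tau here) is set to the number of parameters. All variable names and these modeling choices are illustrative, not the speaker’s specification.

```python
# Hedged sketch of a catalytic-prior-style analysis (assumptions noted above).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# --- a small observed data set: n barely exceeds the number of parameters ---
n, p = 25, 8
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = rng.normal(size=p)
y = rng.binomial(1, expit(X @ beta_true))

def weighted_neg_loglik(beta, X, y, w):
    """Weighted negative log-likelihood of a logistic regression."""
    eta = X @ beta
    return -np.sum(w * (y * eta - np.logaddexp(0.0, eta)))

# Step 1: fit the simpler model (here, intercept only) to the observed data.
p_hat = np.clip(y.mean(), 1e-3, 1 - 1e-3)

# Step 2: generate M synthetic observations from the simple model's
# predictive distribution, with covariates resampled from the observed ones.
M = 400
X_syn = X[rng.integers(0, n, size=M)]
y_syn = rng.binomial(1, p_hat, size=M)

# Step 3: estimate the target model on the augmented data, down-weighting
# each synthetic point by tau / M so the synthetic data act as a prior of
# total weight tau rather than overwhelming the real observations.
tau = float(p)
X_aug = np.vstack([X, X_syn])
y_aug = np.concatenate([y, y_syn])
w_aug = np.concatenate([np.ones(n), np.full(M, tau / M)])

fit = minimize(weighted_neg_loglik, x0=np.zeros(p), args=(X_aug, y_aug, w_aug))
beta_catalytic = fit.x  # stabilized estimate of the target model's coefficients
```

In this sketch the weighted estimate on the augmented data plays the role of the posterior mode under a prior built from the down-weighted synthetic likelihood; the weight tau controls how strongly the simple model’s predictions regularize the complex target model, which is one way to read the connection to regularization methods mentioned at the end of the abstract.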