STATS 631

Instructor: Edward L. Ionides

STATS 631 is new this year, so suggestions on how to make it work well are particularly appreciated. In addition to all the activities of 531, we will arrange a one-hour meeting at a time we can all make, at which we will discuss a paper. Everyone is expected to have something to say. For example, a fairly brief reading of the paper should be sufficient to form an opinion on one or more of the "generic questions" listed below. You are expected to spend at least an hour reading the paper. The primary intention is that we learn about research in time series analysis by discussing some strong papers (old and new) which have helped to define the current state of the art.
Meetings are Thursday 10:30-11:30 in 438 West Hall.
- Week starting Jan 13: akaike74
Hirotugu Akaike's information criterion (AIC) is used in many recent time series papers, including those later on our reading list. Akaike developed his ideas in a sequence of papers, with time series analysis being one of his main motivations. This paper is the first one which focuses on the current standard definition of AIC. Akaike's foundational papers are 50 years old, and there has been much work on model selection since then. Why have Akaike's ideas been so persistent?
- Week starting Jan 20: box76
George Box's work popularized the autoregressive integrated moving average (ARIMA) framework for time series. Among many other notable contributions, he is responsible for what is perhaps the most widely quoted advice for applied statistics, "All models are wrong, but some models are useful." This influential discussion of the relationship between science and statistics, and the role of models in this relationship, is informed by Box's extensive work in time series analysis. Look for places where dependence, including temporal dependence, plays a role.
- Week starting Jan 27: hyndman08
Rob Hyndman's many contributions to time series analysis include the development of `auto.arima`, a widely used approach for choosing ARIMA models. This paper explains the construction of this procedure and its motivation, as well as mentioning alternative approaches.
- Week starting Feb 3: taylor18
This paper introduces a widely used modern forecasting tool, Facebook Prophet, implemented by the R package [__prophet__](https://cran.r-project.org/web/packages/prophet/index.html). Facebook Prophet is not based on ARIMA modeling, and the difference between these approaches is worth consideration.
- Week starting Feb 10: lim21
Deep learning has been influential throughout statistics, and time series analysis is no exception. This review discusses deep learning for time series as it stood before the widespread popularization of transformers.
- Week starting Feb 17: gruver23
When is GenAI useful for time series analysis?
- Week starting Feb 24: bjornstad01
There are surprisingly many important ideas about time series analysis for nonlinear stochastic dynamic systems discussed in this compact paper. The issues it raises have prompted much work over the past two decades, and some issues remain unresolved.
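
As a reference point for the akaike74 and hyndman08 readings above, the standard definition of AIC can be written as below. The notation is an assumption made here, not taken from either paper: $\ell(\hat\theta)$ denotes the maximized log-likelihood and $p$ the number of estimated parameters.

$$
\mathrm{AIC} = -2\,\ell(\hat\theta) + 2p
$$

Among candidate models fitted to the same data, a lower AIC is preferred. The $2p$ term penalizes additional parameters.
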
Spring Break
- Week starting Mar 10: doucet09
Particle filtering facilitates time series analysis for many nonlinear systems. We study a review of this technique by two leading experts.
- Week starting Mar 17: kristensen16
Perhaps the main alternative to likelihood-based inference for partially observed Markov process (POMP) models is the locally Gaussian approximation used in the integrated nested Laplace approximation method. A popular implementation is Template Model Builder, described in this paper.
- Week starting Mar 24: stock21
Fishery management is a major economic and environmental task that is based on POMP models for time series data. Template Model Builder is widely used in this context. Would there be an advantage to using particle filter methods to avoid the approximations inherent in Template Model Builder, or does this class of problems expose the limitations of the particle filter?
- Week starting Mar 31: wheeler24
This paper deals with various practical issues in data analysis via mechanistic models, including residual analysis and benchmarking to help identify model misspecification.
- Week starting Apr 7: subramanian21
Many papers were written fitting mechanistic models to learn about COVID-19 transmission and to forecast its trajectory. This paper shows how it is critical to understand both the reporting process and the disease dynamics.
- Week starting Apr 14: lau21
Machine learning and mechanistic modeling are sometimes seen as alternative approaches, whereas they should work together and complement each other. This paper investigates the possibilities in a case study.
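
For the particle filter readings in the schedule above (doucet09, and the question raised under stock21), it may help to see the basic algorithm in a few lines. The following is a minimal sketch of a bootstrap particle filter, which is only one simple variant of the methods reviewed in doucet09. The toy linear Gaussian model, the parameter values, and the function names `normal_logpdf` and `bootstrap_particle_filter` are assumptions made for illustration, not taken from any paper on the list.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd^2) distribution evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def bootstrap_particle_filter(ys, n_particles=500, a=0.8,
                              sd_proc=1.0, sd_obs=1.0, seed=1):
    """Bootstrap particle filter for a hypothetical toy model (requires |a| < 1):
        X_t = a * X_{t-1} + Normal(0, sd_proc^2)
        Y_t = X_t + Normal(0, sd_obs^2)
    Returns (log-likelihood estimate, list of filtered means of X_t)."""
    rng = random.Random(seed)
    # Initialize particles from the stationary distribution of the latent AR(1).
    sd0 = sd_proc / math.sqrt(1.0 - a * a)
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    loglik = 0.0
    filtered_means = []
    for y in ys:
        # Prediction step: propagate each particle through the process model.
        particles = [a * x + rng.gauss(0.0, sd_proc) for x in particles]
        # Weighting step: measurement density of y given each particle.
        logw = [normal_logpdf(y, x, sd_obs) for x in particles]
        max_logw = max(logw)  # subtract the maximum to avoid underflow
        w = [math.exp(lw - max_logw) for lw in logw]
        total = sum(w)
        # The average weight estimates the conditional likelihood of y.
        loglik += max_logw + math.log(total / n_particles)
        filtered_means.append(
            sum(wi * xi for wi, xi in zip(w, particles)) / total
        )
        # Resampling step: multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik, filtered_means


# Simulate 50 observations from the same toy model, then filter them.
sim_rng = random.Random(0)
x, ys = 0.0, []
for _ in range(50):
    x = 0.8 * x + sim_rng.gauss(0.0, 1.0)
    ys.append(x + sim_rng.gauss(0.0, 1.0))

loglik, means = bootstrap_particle_filter(ys)
print("estimated log-likelihood:", round(loglik, 2))
```

Multinomial resampling at every time step is used here only because it is short to write. The log-likelihood estimate is subject to Monte Carlo error that depends on `n_particles`, which is relevant when comparing particle filter methods with deterministic approximations such as the Laplace approximation.
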
akaike74. Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
bjornstad01. Bjørnstad, O. N., & Grenfell, B. T. (2001). Noisy clockwork: time series analysis of population fluctuations in animals. Science, 293(5530), 638-643.
box76. Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791-799.
doucet09. Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12, 656-704.
gruver23. Gruver, N., Finzi, M., Qiu, S., & Wilson, A. G. (2023). Large language models are zero-shot time series forecasters. Advances in Neural Information Processing Systems, 36.
hyndman08. Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 26(3).
kristensen16. Kristensen, K., Nielsen, A., Berg, C. W., Skaug, H., & Bell, B. M. (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 1–21.
lau21. Lau, M. S., Becker, A., Madden, W., Waller, L. A., Metcalf, C. J. E., & Grenfell, B. T. (2022). Comparing and linking machine learning and semi-mechanistic models for the predictability of endemic measles dynamics. PLOS Computational Biology, 18(9), e1010251.
lim21. Lim, B., & Zohren, S. (2021). Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A, 379(2194), 20200209.
stock21. Stock, B. C., & Miller, T. J. (2021). The Woods Hole Assessment Model (WHAM): a general state-space assessment framework that incorporates time- and age-varying processes via random effects and links to environmental covariates. Fisheries Research, 240, 105967.
subramanian21. Subramanian, R., He, Q., & Pascual, M. (2021). Quantifying asymptomatic infection and transmission of COVID-19 in New York City using observed cases, serology, and testing capacity. Proceedings of the National Academy of Sciences, 118(9), e2019716118.
taylor18. Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
wheeler24. Wheeler, J., Rosengart, A., Jiang, Z., Tan, K., Treutle, N., & Ionides, E. L. (2024). Informing policy via dynamic models: Cholera in Haiti. PLOS Computational Biology, 20(4), e1012032.
- Which parts of the paper might be worthwhile for me to read in more detail, and why? If you see an immediate benefit to obtaining a better understanding of part of the paper, then you may spend extra time on it to the extent that fits into your schedule.
- What is the strongest part of the paper? That is, what does the paper demonstrate that deserves to be widely known?
- What is the weakest part of the paper? Is there a limitation that may make the paper less useful in practice, or even misleading? (This may be rare, or hard to find, in high-impact papers.)
- Has the paper had an impact on statistical theory and/or methodology and/or applications? Why or why not?
- Technical questions include: (a) Why was the notation set up this way? (b) What steps need additional explanation to be clear to this reader?
- Study the numerical results, figures and tables. To what extent do they support the conclusions of the paper?
- Grading is on attendance and participation in a weekly 1-hour discussion.
- Minimal preparation for participation means spending one hour reading the paper and thinking about its contribution. This is a useful academic skill: we read many more papers superficially than we can read in detail.
- 631 students will also complete the same assignments as 531 students. The 631 meeting will count for 20% of the grade, and other components of 531 will be scaled accordingly.