Data-Driven Tuning Parameter Selection for High-Dimensional Vector Autoregressions
Abstract: Lasso-type estimators are routinely used to estimate high-dimensional time series models. The theoretical guarantees established for these estimators typically require the penalty level to be chosen in a suitable fashion, often depending on unknown population quantities. Furthermore, the resulting estimates and the number of variables retained in the model depend crucially on the chosen penalty level. However, there is currently no theoretically founded guidance for this choice in the context of high-dimensional time series. Instead, one resorts to selecting the penalty level in an ad hoc manner using, e.g., information criteria or cross-validation. We resolve this problem by considering estimation of the perhaps most commonly employed multivariate time series model, the linear vector autoregressive (VAR) model, and propose versions of the Lasso, post-Lasso, and square-root Lasso estimators with penalization chosen in a fully data-driven way. The theoretical guarantees that we establish for the resulting estimation and prediction errors match those currently available for methods based on infeasible choices of penalization. We thus provide a first solution to the problem of choosing the penalization in high-dimensional time series models.
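The flavor of data-driven penalization can be conveyed with a minimal sketch (not the paper's actual procedure): a Lasso estimate of one equation of a simulated VAR(1), where the penalty level is set from the data via the familiar rate λ = c·σ̂·√(log K / T). The constant c = 1.1, the crude pilot noise estimate σ̂, and the ISTA solver are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: Lasso estimation of one VAR(1) equation via
# iterative soft-thresholding (ISTA), with the penalty level set from the
# data through lambda = c * sigma_hat * sqrt(log(K) / T). The constant
# c = 1.1 and the pilot noise estimate are assumptions, not the paper's
# fully data-driven procedure.

rng = np.random.default_rng(0)
K, T = 10, 500                        # dimension and sample length

# Simulate a sparse, stable VAR(1): y_t = A y_{t-1} + e_t.
A = np.zeros((K, K))
A[0, 0], A[0, 1] = 0.5, 0.3           # only the first equation carries signal
y = np.zeros((T + 1, K))
for t in range(T):
    y[t + 1] = A @ y[t] + rng.standard_normal(K)

X, z = y[:-1], y[1:, 0]               # lagged regressors, first-equation response

def lasso_ista(X, z, lam, n_iter=2000):
    """Minimize (1/2T)||z - Xb||^2 + lam * ||b||_1 by proximal gradient."""
    Tn, Kn = X.shape
    beta = np.zeros(Kn)
    step = Tn / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant of gradient
    for _ in range(n_iter):
        beta = beta - step * (X.T @ (X @ beta - z)) / Tn
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)
    return beta

sigma_hat = np.std(z)                         # crude pilot noise estimate (assumption)
lam = 1.1 * sigma_hat * np.sqrt(np.log(K) / T)
beta_hat = lasso_ista(X, z, lam)
```

The resulting `beta_hat` is sparse, with its largest entries on the two true coefficients; the point of the paper is to replace the ad hoc constant and pilot estimate above with a theoretically justified, fully data-driven choice.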
- Adamek, R., S. Smeekes, and I. Wilms (2023): “Lasso inference for high-dimensional time series,” Journal of Econometrics, 235, 1114–1143.
- Babii, A., E. Ghysels, and J. Striaukas (2022): “Machine learning time series regressions with an application to nowcasting,” Journal of Business & Economic Statistics, 40, 1094–1106.
- Basu, S. and G. Michailidis (2015): “Regularized estimation in sparse high-dimensional time series models,” The Annals of Statistics, 43, 1535–1567.
- Belloni, A., D. Chen, V. Chernozhukov, and C. Hansen (2012): “Sparse models and methods for optimal instruments with an application to eminent domain,” Econometrica, 80, 2369–2429.
- Bickel, P. J., Y. Ritov, and A. B. Tsybakov (2009): “Simultaneous analysis of Lasso and Dantzig selector,” The Annals of Statistics, 37, 1705–1732.
- Chen, X., Q.-M. Shao, W. B. Wu, and L. Xu (2016): “Self-normalized Cramér-type moderate deviations under dependence,” The Annals of Statistics, 44, 1593–1617.
- Chernozhukov, V., W. K. Härdle, C. Huang, and W. Wang (2021): “LASSO-driven inference in time and space,” The Annals of Statistics, 49, 1702–1735.
- Davis, R. A., P. Zang, and T. Zheng (2016): “Sparse vector autoregressive modeling,” Journal of Computational and Graphical Statistics, 25, 1077–1096.
- Gao, L., Q.-M. Shao, and J. Shi (2022): “Refined Cramér-type moderate deviation theorems for general self-normalized sums with applications to dependent random variables and winsorized mean,” The Annals of Statistics, 50, 673–697.
- Guo, S., Y. Wang, and Q. Yao (2016): “High-dimensional and banded vector autoregressions,” Biometrika, 103, 889–903.
- Han, F. and H. Liu (2013): “Transition matrix estimation in high dimensional time series,” in International Conference on Machine Learning, PMLR, 172–180.
- Han, F., H. Lu, and H. Liu (2015): “A direct estimation of high dimensional stationary vector autoregressions,” Journal of Machine Learning Research, 16, 3115–3150.
- Javanmard, A. and A. Montanari (2014): “Confidence intervals and hypothesis testing for high-dimensional regression,” The Journal of Machine Learning Research, 15, 2869–2909.
- Kock, A. B. and L. Callot (2015): “Oracle inequalities for high dimensional vector autoregressions,” Journal of Econometrics, 186, 325–344.
- Kuchibhotla, A. K. and A. Chakrabortty (2022): “Moving beyond sub-Gaussianity in high-dimensional statistics: Applications in covariance estimation and linear regression,” Information and Inference: A Journal of the IMA, 11, 1389–1456.
- Loh, P.-L. and M. J. Wainwright (2012): “High-dimensional regression with noisy and missing data: Provable guarantees and nonconvexity,” The Annals of Statistics, 40, 1637–1664.
- Masini, R. P., M. C. Medeiros, and E. F. Mendes (2022): “Regularized estimation of high-dimensional vector autoregressions with weakly dependent innovations,” Journal of Time Series Analysis, 43, 532–557.
- McCracken, M. W. and S. Ng (2016): “FRED-MD: A Monthly Database for Macroeconomic Research,” Journal of Business & Economic Statistics, 34, 574–589.
- Medeiros, M. C. and E. F. Mendes (2016): “ℓ1-regularization of high-dimensional time-series models with non-Gaussian and heteroskedastic errors,” Journal of Econometrics, 191, 255–271.
- Miao, L., P. C. B. Phillips, and L. Su (2023): “High-dimensional VARs with common factors,” Journal of Econometrics, 233, 155–183.
- Negahban, S. N., P. Ravikumar, M. J. Wainwright, and B. Yu (2012): “A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers,” Statistical Science, 27, 538–557.
- Song, S. and P. J. Bickel (2011): “Large vector auto regressions,” arXiv preprint arXiv:1106.3915.
- Tibshirani, R. (1996): “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), 58, 267–288.
- van de Geer, S., P. Bühlmann, Y. Ritov, and R. Dezeure (2014): “On asymptotically optimal confidence regions and tests for high-dimensional models,” The Annals of Statistics, 42, 1166–1202.
- Vladimirova, M., S. Girard, H. Nguyen, and J. Arbel (2020): “Sub-Weibull distributions: Generalizing sub-Gaussian and sub-Exponential properties to heavier tailed distributions,” Stat, 9, e318.
- Wong, K. C., Z. Li, and A. Tewari (2020): “Lasso guarantees for β-mixing heavy-tailed time series,” The Annals of Statistics, 48, 1124–1142.
- Wu, W. B. (2005): “Nonlinear system theory: Another look at dependence,” Proceedings of the National Academy of Sciences, 102, 14150–14154.
- Wu, W. B. and X. Shao (2004): “Limit theorems for iterated random functions,” Journal of Applied Probability, 41, 425–436.
- Wu, W. B. and Y. N. Wu (2016): “Performance bounds for parameter estimates of high-dimensional linear models with correlated errors,” Electronic Journal of Statistics, 10, 352–379.
- Zhang, C.-H. and S. S. Zhang (2014): “Confidence intervals for low dimensional parameters in high dimensional linear models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 217–242.
- Zhang, X. and G. Cheng (2018): “Gaussian approximation for high dimensional vector under physical dependence,” Bernoulli, 24, 2640–2675.