Largevars: High-Dimensional Cointegration Testing in R
- The Largevars R package is a toolkit for cointegration testing in high-dimensional VAR models, built on random matrix theory and simulated Airy₁ quantiles.
- It employs a modified Johansen framework with techniques like detrending, cyclic indexing, and eigenvalue decomposition to accurately detect cointegration.
- Empirical applications, including analyses of S&P100 stocks and simulated VAR(2) systems, demonstrate its utility in robust model specification and risk management.
The Largevars R package is a statistical software toolkit for conducting cointegration tests in high-dimensional vector autoregressions (VARs). Developed to accommodate the challenges posed by large-scale multivariate time series analysis, Largevars modifies the classical Johansen cointegration testing framework by adopting the large N, large T asymptotic regime of Bykhovskaya and Gorin (2022, 2025). The methodology leverages connections to random matrix theory, specifically the partial sums of the Airy₁ point process, to yield asymptotically valid inference in settings with potentially hundreds of series and lengthy time horizons. The package includes simulated quantiles of the first ten partial sums of the Airy₁ point process, precise to three digits, facilitating practical hypothesis testing. Empirical and simulated examples—such as an S&P100 stocks analysis and a high-dimensional VAR(2) system—demonstrate its capacity for reliable cointegration detection.
1. Motivations and Background
Cointegration, the existence of stationary linear combinations among multivariate non-stationary series, is fundamental in econometric analysis of financial, macroeconomic, and other time series data. Classical cointegration tests, notably the Johansen likelihood ratio test, depend on asymptotic theory with fixed dimensionality ($N$) and growing sample size ($T$). As practitioners increasingly work with systems where both $N$ and $T$ are large, standard procedures suffer from elevated type I error rates ("over-rejection"). Largevars is specifically designed to address this deficiency by reformulating the cointegration test with new asymptotic theory that remains valid as both sample size and model dimension increase.
2. Statistical Methodology
Largevars implements a test based on the squared sample canonical correlations between “detrended” lagged levels and differences of the series. The procedure is as follows:
- Detrending: Time series are shifted and de-trended: $\tilde{X}_t = X_{t-1} - \frac{t-1}{T}(X_T - X_0)$ for $t = 1, \dots, T$.
- Cyclic Indexing & Regressor Construction: Cyclic (modulo $T$) matrix construction avoids edge effects and matches the VAR($k$) architecture.
- Regression Residuals: Residuals are computed after regressing the transformed data on the regressor matrices.
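- Cross-Product Matrices: Four matrices, $S_{00}$, $S_{0k}$, $S_{k0}$, and $S_{kk}$, are constructed from the residuals to measure covariance structure.
- Eigenvalue Decomposition: The matrix
$$S_{k0} S_{00}^{-1} S_{0k} S_{kk}^{-1}$$
yields ordered eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$, representing squared canonical correlations.
- Likelihood Ratio Test Statistic: For a hypothesized cointegration rank $r$, the statistic is
$$LR_{N,T}(r) = \sum_{i=1}^{r} \ln(1 - \lambda_i).$$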
Traditional critical values are replaced by distributions derived from the Airy₁ point process.
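The steps above can be sketched numerically. The following is a minimal NumPy illustration, not the package's R code, and it assumes the simplest case of one lag ($k = 1$), where the only regressor is a constant and the regression residuals reduce to demeaned series. The function names `squared_canonical_correlations` and `lr_statistic` are hypothetical, and the de-trending step $\tilde{X}_t = X_{t-1} - \frac{t-1}{T}(X_T - X_0)$ is the form used in Bykhovskaya and Gorin's setup as we understand it.

```python
import numpy as np

def squared_canonical_correlations(X):
    """Eigenvalues lambda_1 >= ... >= lambda_N for the k = 1 case (assumption).

    X has shape (T + 1, N); row t holds the observation X_t, t = 0..T.
    """
    X = np.asarray(X, dtype=float)
    T = X.shape[0] - 1
    dX = np.diff(X, axis=0)                        # Delta X_t for t = 1..T
    t_minus_1 = np.arange(T)[:, None]              # t - 1 for t = 1..T
    X_tilde = X[:-1] - (t_minus_1 / T) * (X[-1] - X[0])  # shift and de-trend
    # With k = 1 the only regressor is the constant, so residuals are demeaned series.
    R0 = dX - dX.mean(axis=0)
    Rk = X_tilde - X_tilde.mean(axis=0)
    S00, S0k, Skk = R0.T @ R0, R0.T @ Rk, Rk.T @ Rk
    # Skk^{-1} Sk0 S00^{-1} S0k has the same eigenvalues as Sk0 S00^{-1} S0k Skk^{-1}.
    M = np.linalg.solve(Skk, S0k.T @ np.linalg.solve(S00, S0k))
    lam = np.linalg.eigvals(M).real
    return np.sort(lam)[::-1]

def lr_statistic(lam, r):
    """Sum of log(1 - lambda_i) over the r largest eigenvalues."""
    return float(np.sum(np.log1p(-lam[:r])))

# Toy input: 5 independent random walks observed at times 0..200 (no cointegration).
rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((201, 5)), axis=0)
lam = squared_canonical_correlations(X)
print(lam, lr_statistic(lam, r=1))
```

The sketch needs $T$ comfortably larger than $N$ so that both cross-product matrices are invertible. Handling $k > 1$ would additionally require the cyclically indexed lagged differences as regressors, which is omitted here.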
3. Asymptotic Distribution and Random Matrix Theory
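The central innovation is recognizing that, under large $N$ and $T$ and the null hypothesis of no cointegration, the centered and normalized test statistic converges in distribution to a sum of the $r$ leading points ($\mathfrak{a}_1 \ge \mathfrak{a}_2 \ge \dots$) of the Airy₁ process:
$$\frac{LR_{N,T}(r) - r\, c_1(N,T)}{N^{-2/3}\, c_2(N,T)} \xrightarrow{d} \sum_{i=1}^{r} \mathfrak{a}_i,$$
where $LR_{N,T}(r)$ is the likelihood ratio statistic of Section 2, $c_1(N,T)$ and $c_2(N,T)$ are explicitly derived constants, and empirical quantiles of $\sum_{i=1}^{r} \mathfrak{a}_i$ are supplied up to three digits for $r = 1, \dots, 10$. This approach supplies more accurate critical regions than classical methods, which misrepresent the null distribution in large systems.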
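To make the centering and scaling concrete, here is a small Python sketch. The closed forms for the two constants are not given in this article. The ones below are our recollection of the $k = 1$ case in Bykhovskaya and Gorin without finite-sample correction ($p = 2$, $q = T/N - 1$, centering at $\ln(1 - \lambda_+)$), so they should be treated as assumptions and checked against the paper before any serious use. All function names are hypothetical.

```python
import math

def wachter_edges(N, T):
    """Edges lambda_-, lambda_+ of the limiting eigenvalue support (assumed k = 1 form)."""
    p, q = 2.0, T / N - 1.0
    s = p + q
    root_a, root_b = math.sqrt(p * (s - 1.0)), math.sqrt(q)
    lam_minus = (root_a - root_b) ** 2 / s ** 2
    lam_plus = (root_a + root_b) ** 2 / s ** 2
    return lam_minus, lam_plus, s

def centering_and_scale(N, T):
    """Constants c1 (centering) and c2 (scale, negative); formulas are assumptions."""
    lam_minus, lam_plus, s = wachter_edges(N, T)
    c1 = math.log(1.0 - lam_plus)
    c2 = -(2.0 ** (2 / 3) * lam_plus ** (2 / 3)
           / ((1.0 - lam_plus) ** (1 / 3) * (lam_plus - lam_minus) ** (1 / 3))
           * s ** (-2 / 3))
    return c1, c2

def normalized_statistic(lr, r, N, T):
    """(LR - r * c1) / (N^(-2/3) * c2), to be compared with Airy_1 partial-sum quantiles."""
    c1, c2 = centering_and_scale(N, T)
    return (lr - r * c1) / (N ** (-2 / 3) * c2)

c1, c2 = centering_and_scale(N=92, T=521)
print(c1, c2)
```

Because the scale constant is negative, a raw statistic that is more negative than its centering maps to a positive normalized value, so evidence of cointegration shows up in the right tail. The formulas require $T/N > 2$ so that the upper edge stays below 1.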
4. Empirical and Simulation Studies
S&P100 Stocks
Largevars is applied to weekly logarithms of adjusted closing prices for 92 S&P100 stocks across 521 observations (post-differencing). After the detrending and canonical correlation steps, the likelihood ratio-type statistic is compared with Airy₁-based quantiles. The resulting p-value was $0.23$, so the test did not reject the null of no cointegration. The package outputs diagnostic histograms for the eigenvalue distributions, overlaid with Wachter density fits.
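For readers who want to reproduce the kind of overlay described here, the following Python sketch evaluates a Wachter density. It assumes the parametrization $p = 2$, $q = T/N - 1$ (our understanding of the one-lag case, not something stated in this article), and the function name `wachter_density` is hypothetical. The density used is the standard Wachter form $\frac{p+q}{2\pi} \frac{\sqrt{(x - \lambda_-)(\lambda_+ - x)}}{x(1 - x)}$ on $[\lambda_-, \lambda_+]$.

```python
import math

def wachter_density(x, N, T):
    """Wachter density at x, with p = 2 and q = T/N - 1 (assumed parametrization)."""
    p, q = 2.0, T / N - 1.0
    s = p + q
    lam_minus = (math.sqrt(p * (s - 1.0)) - math.sqrt(q)) ** 2 / s ** 2
    lam_plus = (math.sqrt(p * (s - 1.0)) + math.sqrt(q)) ** 2 / s ** 2
    if not lam_minus < x < lam_plus:
        return 0.0                      # outside the support
    return (s / (2.0 * math.pi)
            * math.sqrt((x - lam_minus) * (lam_plus - x)) / (x * (1.0 - x)))

# Midpoint-rule check that the density integrates to roughly one on (0, 1).
n = 20000
h = 1.0 / n
total = sum(wachter_density((i + 0.5) * h, 92, 521) for i in range(n)) * h
print(total)
```

Evaluating the function on a grid and drawing it over a histogram of the eigenvalues gives the kind of diagnostic plot mentioned above. Eigenvalues sitting well to the right of the upper edge would be the visual signal of cointegration.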
Simulated VAR(2) System
A second example simulates a VAR(2) system constructed so that two cointegrating relationships exist (via a matrix $\Pi$ of rank 2). The calculated statistic was about $48.43$ and the p-value approximately $0.01$, resulting in rejection of the null of no cointegration, consistent with the design. The separation between leading and bulk eigenvalues is exhibited graphically.
5. Applications and Practical Utility
Largevars addresses a central need in high-dimensional econometrics, finance, and risk analytics, where practitioners must diagnose cointegration among many series, such as asset prices, economic indicators, or portfolios. Reliable cointegration testing enables accurate model specification, risk management, and strategies such as pairs trading. Traditional restrictions that limit cointegration analysis to small $N$ are relaxed, permitting robust inference for datasets with dozens or hundreds of series.
Key diagnostic outputs include empirical histograms of the canonical correlation spectrum, critical value tables based on the Airy₁ point process, and direct reporting of p-values using the simulated quantiles.
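As a rough illustration of how a p-value follows from simulated null values, the sketch below computes a right-tail empirical p-value. This is a simplification: the package works from its stored quantile tables, whereas this hypothetical helper `right_tail_p_value` takes raw simulated draws. The numbers in the example are toy values, not Airy₁ quantiles.

```python
def right_tail_p_value(statistic, null_draws):
    """Share of simulated null draws at or above the observed statistic."""
    draws = list(null_draws)
    if not draws:
        raise ValueError("need at least one simulated draw")
    exceed = sum(1 for d in draws if d >= statistic)
    return exceed / len(draws)

# Toy draws for illustration only (not real Airy_1 values).
toy_draws = [-4.0, -3.0, -2.0, -1.0, 0.5]
p_mid = right_tail_p_value(-1.5, toy_draws)    # two of five draws are >= -1.5
p_far = right_tail_p_value(10.0, toy_draws)    # no draw reaches 10
print(p_mid, p_far)
```

The right tail is used because large positive values of the normalized statistic are the ones that lead to rejection in the examples above.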
6. Computational and Numerical Aspects
The package employs simulations of very large matrices and leverages tridiagonalization to efficiently compute quantiles of the partial sums of Airy₁ points. All supplied quantiles are precise up to the first three digits. Computational intensity grows with $N$ and $T$, and further work is suggested to improve on-the-fly calculation and precision, as well as refinement for finite sample corrections.
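One standard way to approximate Airy₁ points through a tridiagonal matrix is the Dumitriu-Edelman model for the Gaussian $\beta$-ensemble with $\beta = 1$, whose rescaled top eigenvalues converge to the leading Airy₁ points. The sketch below uses that model as an assumption. The article does not specify the authors' exact simulation scheme, and the sizes here are far too small to reach three-digit precision. The function name `airy1_partial_sums` is hypothetical.

```python
import numpy as np

def airy1_partial_sums(n, r_max, reps, seed=0):
    """Approximate draws of a_1, a_1 + a_2, ..., via an n x n tridiagonal beta = 1 model."""
    rng = np.random.default_rng(seed)
    out = np.empty((reps, r_max))
    dfs = np.arange(n - 1, 0, -1)                  # chi degrees of freedom n-1, ..., 1
    for j in range(reps):
        diag = rng.normal(0.0, np.sqrt(2.0), size=n)
        off = np.sqrt(rng.chisquare(dfs))          # chi-distributed off-diagonal entries
        H = (np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)) / np.sqrt(2.0)
        top = np.linalg.eigvalsh(H)[::-1][:r_max]  # largest eigenvalues, descending
        # Soft-edge rescaling: the spectrum edge sits near sqrt(2 n).
        scaled = np.sqrt(2.0) * n ** (1.0 / 6.0) * (top - np.sqrt(2.0 * n))
        out[j] = np.cumsum(scaled)                 # partial sums of the leading points
    return out

sums = airy1_partial_sums(n=200, r_max=3, reps=300)
print(np.quantile(sums[:, 0], 0.95))               # rough 95% quantile for r = 1
```

The dense eigenvalue solver is used only to keep the sketch short. A dedicated tridiagonal solver and much larger `n` and `reps` would be needed for accurate quantiles.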
7. Limitations, Open Problems, and Future Development
Remaining open challenges include achieving higher precision for critical value quantiles and enhancing the finite sample correction methods, as referenced in the discussion following Theorem 2. Possible future extensions may address computational efficiency for extremely large datasets and enlarged quantile libraries for the Airy₁ process. Ongoing theoretical development in random matrix theory may inform future updates.
In summary, Largevars provides a theoretically grounded, practically oriented solution to cointegration testing in high-dimensional vector autoregressions. Its use of alternative asymptotics and random matrix theory quantiles marks a substantial advance for empirical research in fields where traditional methods are no longer applicable due to scale.