
Largevars: High-Dimensional Cointegration Testing in R

Updated 12 September 2025
  • The Largevars R package is a toolkit for cointegration testing in high-dimensional VAR models, built on random matrix theory and simulated quantiles of partial sums of the Airy₁ point process.
  • It employs a modified Johansen framework with techniques like detrending, cyclic indexing, and eigenvalue decomposition to accurately detect cointegration.
  • Empirical applications, including analyses of S&P100 stocks and simulated VAR(2) systems, demonstrate its utility in robust model specification and risk management.

The Largevars R package is a statistical software toolkit for conducting cointegration tests in high-dimensional vector autoregressions (VARs). Developed to accommodate the challenges posed by large-scale multivariate time series analysis, Largevars modifies the classical Johansen cointegration testing framework by adopting the large N, large T asymptotic regime of Bykhovskaya and Gorin (2022, 2025). The methodology leverages connections to random matrix theory, specifically the partial sums of the Airy₁ point process, to yield asymptotically valid inference in settings with potentially hundreds of series and lengthy time horizons. The package includes simulated quantiles of the first ten partial sums of the Airy₁ point process, precise to three digits, facilitating practical hypothesis testing. Empirical and simulated examples—such as an S&P100 stocks analysis and a high-dimensional VAR(2) system—demonstrate its capacity for reliable cointegration detection.

1. Motivations and Background

Cointegration, the existence of stationary linear combinations among multivariate non-stationary series, is fundamental in econometric analysis of financial, macroeconomic, and other time series data. Classical cointegration tests, notably the Johansen likelihood ratio test, depend on asymptotic theory with fixed dimensionality ($N$) and growing sample size ($T$). As practitioners increasingly work with systems where both $N$ and $T$ are large, standard procedures suffer from elevated type I error rates (“over-rejection”). Largevars is specifically designed to address this deficiency by reformulating the cointegration test with new asymptotic theory that remains valid as both sample size and model dimension increase.

2. Statistical Methodology

Largevars implements a test based on the squared sample canonical correlations between “detrended” lagged levels and differences of the series. The procedure is as follows:

  • Detrending: The lagged levels are shifted and detrended:

$$\widetilde{X}_t = X_{t-1} - \frac{t-1}{T}(X_T - X_0)$$

  • Cyclic Indexing & Regressor Construction: Regressor matrices are built with cyclic (modulo $T$) indexing, which avoids edge effects and matches the lag structure of the VAR($k$) model.
  • Regression Residuals: Residuals are computed after regressing the transformed data on the regressor matrices.
  • Cross-Product Matrices: Four residual cross-product matrices, $S_{00}$, $S_{k0}$, $S_{0k}$, and $S_{kk}$, are constructed to measure the covariance structure.
  • Eigenvalue Decomposition: The matrix

$$\widetilde{C} = S_{k0} S_{00}^{-1} S_{0k} S_{kk}^{-1}$$

yields ordered eigenvalues $\widetilde{\lambda}_1 \geq \widetilde{\lambda}_2 \geq \cdots \geq \widetilde{\lambda}_N$, which are the squared sample canonical correlations.

  • Likelihood Ratio Test Statistic: For a hypothesized cointegration rank $r$, the statistic is

$$LR_{N,T}(r) = \sum_{i=1}^{r} \ln(1 - \widetilde{\lambda}_i)$$

Traditional critical values are replaced by distributions derived from the Airy₁ point process.
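The steps above can be sketched in Python with NumPy. This is a minimal illustration, not the package's R implementation: it assumes the levels $X_0, \ldots, X_T$ are stored row-wise, that the regressors are a constant plus $k-1$ lagged differences indexed cyclically (modulo $T$), and that cross-products are scaled by $1/T$. The function names `largevars_eigenvalues` and `lr_statistic` are hypothetical.

```python
import numpy as np


def largevars_eigenvalues(X, k=1):
    """Sketch: squared sample canonical correlations for the detrended test.

    X : array of shape (T + 1, N) holding levels X_0, ..., X_T (assumed layout).
    k : VAR order; k - 1 lagged differences enter the regressors.
    Returns eigenvalues of C = S_k0 S_00^{-1} S_0k S_kk^{-1} in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    T = X.shape[0] - 1
    t = np.arange(1, T + 1)                       # t = 1, ..., T

    dX = X[1:] - X[:-1]                           # row t-1 holds Delta X_t
    # Shifted, detrended levels: X_{t-1} - (t-1)/T * (X_T - X_0)
    X_tilde = X[:-1] - ((t - 1) / T)[:, None] * (X[-1] - X[0])

    # Assumed regressors: a constant plus k-1 lagged differences. np.roll wraps
    # indices modulo T, so row t-1 holds Delta X_{t-j} with cyclic indexing.
    cols = [np.ones((T, 1))]
    for j in range(1, k):
        cols.append(np.roll(dX, j, axis=0))
    Z = np.hstack(cols)

    def residuals(Y):
        coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        return Y - Z @ coef

    R0 = residuals(dX)         # residualized differences
    Rk = residuals(X_tilde)    # residualized detrended levels

    S00 = R0.T @ R0 / T
    S0k = R0.T @ Rk / T
    Sk0 = S0k.T
    Skk = Rk.T @ Rk / T

    # solve() computes S_00^{-1} S_0k without forming the inverse explicitly.
    C = Sk0 @ np.linalg.solve(S00, S0k) @ np.linalg.inv(Skk)
    # C is similar to a symmetric PSD matrix, so its eigenvalues are real;
    # .real discards round-off imaginary parts.
    return np.sort(np.linalg.eigvals(C).real)[::-1]


def lr_statistic(lam, r):
    """LR_{N,T}(r) = sum_{i=1}^r ln(1 - lambda_i), using the r largest eigenvalues."""
    return float(np.sum(np.log1p(-np.asarray(lam[:r]))))
```

For example, `lr_statistic(largevars_eigenvalues(X, k=2), r=1)` returns $\ln(1 - \widetilde{\lambda}_1)$ for a levels array `X`. The sketch needs $T$ comfortably larger than the number of regressors plus $N$ so that $S_{00}$ and $S_{kk}$ are invertible.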

3. Asymptotic Distribution and Random Matrix Theory

The central innovation is recognizing that, as $N$ and $T$ grow jointly and under the null hypothesis of no cointegration, the centered and normalized test statistic converges in distribution to a sum of leading points ($\mathcal{A}_i$) from the Airy₁ process:

$$\frac{LR_{N,T}(r) - r \cdot c_1(N,T)}{N^{-2/3} c_2(N,T)} \xrightarrow{d} \sum_{i=1}^{r} \mathcal{A}_i$$

where $c_1(N,T)$ and $c_2(N,T)$ are explicitly derived constants depending on $N$ and $T$, and simulated quantiles of $\sum_{i=1}^r \mathcal{A}_i$ are supplied to three digits for $r = 1, \ldots, 10$. These critical regions are more accurate than those of classical methods, which misrepresent the null distribution in large systems.
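The normalization and p-value step can be sketched as follows. The text does not give formulas for $c_1(N,T)$ and $c_2(N,T)$, so the caller passes them in. The table layout (increasing CDF levels paired with quantiles), the linear interpolation, and the function names are illustrative assumptions, not how Largevars stores or reads its table. The sketch also assumes a right-tailed test, treating large normalized values as evidence against the null.

```python
import numpy as np


def normalized_statistic(lr, r, N, c1, c2):
    """(LR_{N,T}(r) - r * c1) / (N^{-2/3} * c2); c1 and c2 are supplied by the caller."""
    return (lr - r * c1) / (N ** (-2.0 / 3.0) * c2)


def approx_p_value(stat, probs, quantiles):
    """Right-tail p-value by linear interpolation in a quantile table (assumed layout).

    probs[j] is a CDF level and quantiles[j] its quantile; both must be increasing.
    np.interp clips values outside the table to its end points.
    """
    probs = np.asarray(probs, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    cdf = np.interp(stat, quantiles, probs)
    return 1.0 - float(cdf)
```

With this interpolation scheme, a table that stops at the 0.99 level cannot report p-values smaller than 0.01.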

4. Empirical and Simulation Studies

S&P100 Stocks

Largevars is applied to weekly logarithms of adjusted closing prices for 92 S&P100 stocks across 521 observations (post-differencing). After the detrending and canonical correlation steps, the likelihood ratio-type statistic is compared with Airy₁-based quantiles. The observed statistic was approximately $-0.28$, with a p-value of $0.23$, so the test did not reject the null of no cointegration. The package outputs diagnostic histograms for the eigenvalue distributions, overlaid with Wachter density fits.

Simulated VAR(2) System

A second example simulates a VAR(2) system with $N=100$ and $T=1500$, constructed such that two cointegrating relationships exist (via a matrix $\Pi$ of rank 2). For $r=2$, the calculated statistic was about $48.43$ and the p-value approximately $0.01$, resulting in rejection of the null, consistent with the design. The separation between leading and bulk eigenvalues is exhibited graphically.
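A data-generating process of this kind can be sketched as follows. The text does not specify the loadings, cointegrating vectors, or short-run dynamics, so everything here is a hypothetical choice made only so that $\Pi$ has rank 2 and the system is stable: the error-correction form, $\Pi = \alpha\beta^\top$ acting on the first two coordinates, a scalar short-run coefficient `gamma`, standard normal innovations, and the function name `simulate_vecm_var2`.

```python
import numpy as np


def simulate_vecm_var2(N=100, T=1500, rank=2, alpha_scale=-0.5, gamma=0.2, seed=0):
    """Hypothetical VAR(2) in error-correction form with Pi = alpha @ beta.T of given rank.

    Delta X_t = Pi X_{t-1} + gamma * Delta X_{t-1} + eps_t,  eps_t ~ N(0, I_N).
    Returns levels X_0, ..., X_T as an array of shape (T + 1, N), plus Pi.
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros((N, rank))
    beta[:rank, :rank] = np.eye(rank)   # cointegrating vectors pick the first `rank` coordinates
    alpha = alpha_scale * beta          # loadings pull those coordinates back toward zero
    Pi = alpha @ beta.T                 # N x N matrix of rank `rank`

    X = np.zeros((T + 1, N))
    dX_prev = np.zeros(N)
    for t in range(1, T + 1):
        dX = Pi @ X[t - 1] + gamma * dX_prev + rng.standard_normal(N)
        X[t] = X[t - 1] + dX
        dX_prev = dX
    return X, Pi
```

With the default values, each of the first two coordinates follows the stationary AR(2) recursion $X_t = 0.7X_{t-1} - 0.2X_{t-2} + \varepsilon_t$, while the remaining $N-2$ coordinates are integrated of order one.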

5. Applications and Practical Utility

Largevars addresses a central need in high-dimensional econometrics, finance, and risk analytics, where practitioners must diagnose cointegration among many series, such as asset prices, economic indicators, or portfolios. Reliable cointegration testing enables accurate model specification, risk management, and strategies such as pairs trading. Traditional restrictions that limit cointegration analysis to small $N$ are relaxed, permitting robust inference for datasets with dozens or hundreds of series.

Key diagnostic outputs include empirical histograms of the canonical correlation spectrum, critical value tables based on the Airy₁ point process, and direct reporting of p-values using the simulated quantiles.

6. Computational and Numerical Aspects

The package employs simulations of very large matrices and leverages tridiagonalization to efficiently compute quantiles of the partial sums of Airy₁ points. All supplied quantiles are precise up to the first three digits. Computational intensity grows with NN and TT, and further work is suggested to improve on-the-fly calculation and precision, as well as refinement for finite sample corrections.
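The text does not describe the simulation design, so the following is only a sketch of the general idea under stated assumptions. It uses the Dumitriu-Edelman tridiagonal model of the Gaussian orthogonal ensemble ($\beta = 1$), whose largest eigenvalues, after edge rescaling, approximate the leading points of the Airy₁ point process. The matrix size, number of replications, and the function name `airy1_partial_sum_samples` are illustrative; a small Monte Carlo like this will not reach three-digit accuracy.

```python
import numpy as np


def airy1_partial_sum_samples(r=1, n=200, reps=500, seed=0):
    """Monte Carlo draws approximating sum_{i=1}^r A_i via a tridiagonal GOE model.

    Tridiagonal form (beta = 1): diagonal N(0, 1), off-diagonal
    chi_{n-1}/sqrt(2), ..., chi_1/sqrt(2). The largest eigenvalues, rescaled as
    sqrt(2) * n^(1/6) * (lambda - sqrt(2n)), approximate the leading Airy_1 points.
    """
    rng = np.random.default_rng(seed)
    dof = np.arange(n - 1, 0, -1)                   # degrees of freedom n-1, ..., 1
    out = np.empty(reps)
    for b in range(reps):
        diag = rng.standard_normal(n)
        off = np.sqrt(rng.chisquare(dof)) / np.sqrt(2.0)
        H = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
        top = np.linalg.eigvalsh(H)[-r:]            # eigvalsh returns ascending order
        scaled = np.sqrt(2.0) * n ** (1.0 / 6.0) * (top - np.sqrt(2.0 * n))
        out[b] = scaled.sum()                       # partial sum of the r largest points
    return out
```

Quantiles then follow from `np.quantile(samples, [0.90, 0.95, 0.99])`. For large $n$, a dedicated tridiagonal eigensolver such as `scipy.linalg.eigvalsh_tridiagonal` would avoid forming the dense matrix. The dense `numpy.linalg.eigvalsh` call keeps this sketch dependent on NumPy alone.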

7. Limitations, Open Problems, and Future Development

Remaining open challenges include achieving higher precision for critical value quantiles and enhancing the finite sample correction methods, as referenced in the discussion following Theorem 2. Possible future extensions may address computational efficiency for extremely large datasets and enlarged quantile libraries for the Airy₁ process. Ongoing theoretical development in random matrix theory may inform future updates.

In summary, Largevars provides a theoretically grounded, practically oriented solution to cointegration testing in high-dimensional vector autoregressions. Its use of alternative asymptotics and random matrix theory quantiles marks a substantial advance for empirical research in fields where traditional methods are no longer applicable due to scale.
