
VARMA Models in Multivariate Time Series

Updated 23 February 2026
  • VARMA is a multivariate time series model that integrates autoregressive (AR) and moving average (MA) dynamics to capture joint interdependencies.
  • Estimation techniques include maximum likelihood, convex regularization, and tensor decompositions, addressing high-dimensional challenges effectively.
  • Identification requires additional constraints such as echelon forms and sparsity to overcome non-uniqueness and ensure model interpretability.

A Vector Autoregressive Moving Average (VARMA) model is a class of multivariate time series models that generalizes the vector autoregressive (VAR) family by explicitly incorporating both autoregressive (AR) and moving average (MA) dynamics among multiple interdependent time series. Acknowledged for its theoretical expressiveness, generality, and parsimony in capturing joint dynamics, the VARMA class nevertheless poses significant challenges in identification, estimation, and model selection. Recent advances in high-dimensional statistics, tensor decompositions, and hybrid deep-learning architectures have substantially broadened the practical scope of VARMA modeling.

1. Definition and Structural Properties

A VARMA$(p,q)$ model for a $d$-dimensional stochastic process $\{X_t\}$ is defined by the lag-operator equation $\Phi(L)X_t = \Theta(L)\varepsilon_t$, where

$$\Phi(L) = I_d - \sum_{i=1}^p \Phi_i L^i, \quad \Theta(L) = I_d + \sum_{j=1}^q \Theta_j L^j,$$

with $\Phi_i, \Theta_j \in \mathbb{R}^{d \times d}$ representing the AR and MA coefficient matrices, $\varepsilon_t$ a $d$-dimensional white noise process ($E[\varepsilon_t]=0$, $E[\varepsilon_t\varepsilon_t'] = \Sigma_\varepsilon > 0$), and $L$ the backshift operator.

The process is stationary (causal) if all roots of $\det\Phi(z)=0$ satisfy $|z|>1$, and invertible if all roots of $\det\Theta(z)=0$ satisfy $|z|>1$ (Düker et al., 2024). These polynomial conditions extend the scalar ARMA criteria to the multivariate domain.
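
Both conditions can be checked numerically through block companion matrices: $\det\Phi(z)$ has all roots outside the unit disk exactly when the companion matrix built from $\Phi_1,\dots,\Phi_p$ has spectral radius below one, and similarly for $\Theta$ after flipping the signs of its coefficients. A minimal sketch (the coefficient matrices are illustrative, not drawn from any cited paper):

```python
import numpy as np

def companion(coefs):
    """Block companion matrix of the matrix polynomial I - sum_i A_i z^i."""
    d, p = coefs[0].shape[0], len(coefs)
    C = np.zeros((d * p, d * p))
    C[:d, :] = np.hstack(coefs)
    C[d:, :-d] = np.eye(d * (p - 1))
    return C

def spectral_radius(M):
    return np.abs(np.linalg.eigvals(M)).max()

# Illustrative coefficients for a bivariate VARMA(1,1).
Phi = [np.array([[0.5, 0.1], [0.0, 0.4]])]    # Phi_1, ..., Phi_p
Theta = [np.array([[0.3, 0.0], [0.2, 0.3]])]  # Theta_1, ..., Theta_q

# Theta(z) = I + sum Theta_j z^j is rewritten as I - sum(-Theta_j) z^j
# so the same companion-matrix test applies.
print("causal:    ", spectral_radius(companion(Phi)) < 1)
print("invertible:", spectral_radius(companion([-Th for Th in Theta])) < 1)
```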

In componentwise form, $X_t = \sum_{i=1}^p \Phi_i X_{t-i} + \sum_{j=1}^q \Theta_j \varepsilon_{t-j} + \varepsilon_t$. VARMA processes admit infinite-order vector autoregressive (VAR($\infty$)) representations under causality and invertibility (Düker et al., 2024, Huang et al., 2024, Zheng, 2022).
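
The componentwise form translates directly into a simulation recursion; a minimal NumPy sketch of a bivariate VARMA(1,1) path (coefficients chosen to satisfy the causality and invertibility conditions above):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 2, 500
Phi1 = np.array([[0.5, 0.1], [0.0, 0.4]])    # eigenvalues 0.5, 0.4: causal
Theta1 = np.array([[0.3, 0.0], [0.2, 0.3]])  # eigenvalues 0.3, 0.3: invertible
eps = rng.multivariate_normal(np.zeros(d), np.eye(d), size=T + 1)

# X_t = Phi_1 X_{t-1} + eps_t + Theta_1 eps_{t-1}, started at zero.
X = np.zeros((T + 1, d))
for t in range(1, T + 1):
    X[t] = Phi1 @ X[t - 1] + eps[t] + Theta1 @ eps[t - 1]
X = X[1:]  # drop the initial condition
```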

2. Identification and Uniqueness

The parameter space for a general VARMA$(p,q)$ model is nontrivially non-identifiable: for a given process, there exists an equivalence class of $(\Phi,\Theta)$ parameterizations generating the same transfer function $\Pi(L) = \Theta(L)^{-1}\Phi(L)$ (a numerical illustration follows the list below). To achieve identification, additional constraints or normalizations are required:

  • Echelon Form (Kronecker indices): Specifies a minimal parametrization using block-Hankel rank conditions and imposes zero patterns in AR and MA matrices (Düker et al., 2024).
  • Scalar-Component Methodology: Decomposes the process into scalar ARMA relations and “back-solves” for the multivariate structure (Düker et al., 2024).
  • Block-Toeplitz and Schur-Stable Parameterization: The Roy–McElroy–Linton reparametrization converts Schur-stability constraints on the AR/MA polynomials into unconstrained real vectors via block-Toeplitz and orthogonal parameters, allowing optimization over $\mathbb{R}^n$ (Roy et al., 2014). This approach ensures the estimated model is strictly causal and invertible.
  • Covariance-Matching Topological Existence: For any finite positive-definite sequence of empirical covariances, there exists a VARMA$(p,q)$ model matching it, via degree theory from differential topology, though non-uniqueness persists (Zhu, 2017).
  • Sparse and Parsimonious Identification: Modern approaches employ strong convexity (elastic net), sparsity, or low-rank penalties to select a unique, interpretable representative in high dimensions (Wilms et al., 2017, Huang et al., 2024, Zheng, 2022).
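
The non-uniqueness is easy to exhibit: taking $\Theta_1 = -\Phi_1$ in a VARMA(1,1) makes $\Phi(L) = \Theta(L)$, so the common left factor cancels and the model is observationally equivalent to white noise. A minimal sketch comparing truncated MA($\infty$) coefficients via the standard recursion $\Psi_0 = I_d$, $\Psi_j = \Theta_j + \sum_{i=1}^{\min(j,p)} \Phi_i \Psi_{j-i}$ (the matrix $A$ is illustrative):

```python
import numpy as np

def ma_infinity(Phi, Theta, d, n_terms=10):
    """Truncated Psi_j of Psi(L) = Phi(L)^{-1} Theta(L):
    Psi_0 = I, Psi_j = Theta_j + sum_{i=1}^{min(j, p)} Phi_i Psi_{j-i}."""
    Psi = [np.eye(d)]
    for j in range(1, n_terms):
        term = Theta[j - 1] if j <= len(Theta) else np.zeros((d, d))
        term = term + sum(Phi[i - 1] @ Psi[j - i]
                          for i in range(1, min(j, len(Phi)) + 1))
        Psi.append(term)
    return Psi

A = np.array([[0.5, 0.2], [0.1, 0.4]])
# Redundant VARMA(1,1): Phi(L) = I - A L equals Theta(L) = I + (-A) L,
# so the common left factor cancels and Psi(L) collapses to the identity.
Psi_redundant = ma_infinity([A], [-A], d=2)
Psi_white_noise = ma_infinity([], [], d=2)  # VARMA(0,0): pure white noise
print(all(np.allclose(P, Q) for P, Q in zip(Psi_redundant, Psi_white_noise)))  # True
```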

3. Estimation and Computational Methodologies

Classical Frameworks

  • (Quasi-)Maximum Likelihood (ML, QML): The full or conditional log-likelihood is maximized, often via iterative numerical optimization, Kalman filtering when cast into state-space form, or closed-form regression under scalar MA parameterizations (Düker et al., 2024, Nguyen, 2019, Nguyen, 2016); a minimal state-space example follows this list.
  • Generalized Least Squares (GLS) for Scalar MA: If the MA polynomial is scalar, estimation of AR parameters reduces to GLS at each fixed value of MA coefficients, resulting in explicit regression solutions and fast likelihood evaluations (Nguyen, 2016, Nguyen, 2019).
  • Innovations Algorithm: Used to fit VMA components by matching autocovariances recursively; particularly effective in reduced-rank or factor-augmented VARMA constructions (Bhamidi et al., 2023).
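
As a concrete instance of the state-space route, the VARMAX class in statsmodels evaluates the Gaussian likelihood by Kalman filtering and maximizes it numerically; a minimal sketch on simulated data (the data-generating values are illustrative, not taken from the cited references):

```python
import numpy as np
from statsmodels.tsa.statespace.varmax import VARMAX

# Simulate a bivariate VARMA(1,1) to have something to fit.
rng = np.random.default_rng(0)
Phi1 = np.array([[0.5, 0.1], [0.0, 0.4]])
Theta1 = np.array([[0.3, 0.0], [0.2, 0.3]])
T, d = 400, 2
eps = rng.standard_normal((T + 1, d))
X = np.zeros((T + 1, d))
for t in range(1, T + 1):
    X[t] = Phi1 @ X[t - 1] + eps[t] + Theta1 @ eps[t - 1]

# Gaussian QML: the likelihood is evaluated by the Kalman filter on the
# state-space form and maximized numerically.
res = VARMAX(X[1:], order=(1, 1), trend="n").fit(disp=False)
print(res.summary())
print(res.forecast(steps=5))  # multi-step point forecasts
```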

High-Dimensional and Structured VARMA

  • Sparse/Convex Regularization: Penalties such as the $\ell_1$ norm or elastic net induce parameter sparsity, enabling consistent estimation and forecast-optimal fitting even in large dimension $d$ (Wilms et al., 2017, Zheng, 2022, Huang et al., 2024). Estimation typically decomposes into a high-order VAR fit for innovation proxies, followed by penalized regression for $(\Phi,\Theta)$, and can be implemented efficiently through block coordinate descent or proximal gradient algorithms; a sketch of this two-stage scheme follows this list.
  • Tensor Decomposition and Low-Rank Factorization: Tucker and CP (CANDECOMP/PARAFAC) low-rank decompositions provide tractable parameterizations of infinite-lag AR/MA coefficient tensors, supporting interpretation via latent dynamic factors and scalability to multidimensional settings (Huang et al., 2024).
  • Time-Varying Coefficients and Nonparametric Techniques: Nonstationary VARMA models with time-dependent parameters can be estimated via kernel smoothing, yielding estimators with optimal nonparametric rates and uniform consistency under local stationarity (Yan et al., 2020, Alj et al., 2015).
  • Hybrid Deep Learning Architectures: Neural feature extractors informed by explicit AR and MA recursions (e.g., VARMAformer) are integrated into transformer-based forecasting models, improving long-term multivariate prediction by fusing local statistical dependency capture with global temporal attention (Song et al., 5 Sep 2025).
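
A minimal sketch of the two-stage sparse strategy referenced above (long-VAR residuals as innovation proxies, then $\ell_1$-penalized regression); the lag orders, penalty level, and the use of scikit-learn's Lasso are illustrative choices rather than the exact algorithms of the cited papers:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_varma(X, p=1, q=1, h=10, alpha=0.05):
    """Two-stage sketch: (1) long VAR(h) residuals proxy the innovations,
    (2) l1-penalized regression of X_t on lags of X and of the proxies."""
    T, d = X.shape
    # Stage 1: VAR(h) by least squares; residuals approximate eps_t.
    Z = np.hstack([X[h - i : T - i] for i in range(1, h + 1)])
    Y = X[h:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    eps_hat = Y - Z @ B
    # Stage 2: lasso with p AR lags and q lagged innovation proxies.
    m = max(p, q)
    W = np.hstack(
        [Y[m - i : len(Y) - i] for i in range(1, p + 1)]
        + [eps_hat[m - j : len(eps_hat) - j] for j in range(1, q + 1)]
    )
    fit = Lasso(alpha=alpha, fit_intercept=False).fit(W, Y[m:])
    Phi_hat = fit.coef_[:, : d * p]     # row k: sparse AR estimates for series k
    Theta_hat = fit.coef_[:, d * p :]   # row k: sparse MA estimates for series k
    return Phi_hat, Theta_hat
```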

4. Model Selection, Diagnostics, and Structural Analysis

  • Order Selection: Criteria such as AIC/BIC are routinely applied to select $(p,q)$, with extensions to penalized and high-dimensional scenarios (e.g., BIC adjusted for weak/row-sparsity regimes) (Düker et al., 2024, Zheng, 2022); a grid-search sketch follows this list.
  • Spectral and Covariance Structure Checks: Residual portmanteau tests and root location checks ensure model adequacy, invertibility, and lack of remaining serial dependence (Düker et al., 2024).
  • Granger Causality and Impulse Response: VARMA models facilitate testing for Granger-noncausality by restricting polynomial coefficients, and support structural analysis via identified MA representations and IRF computations (Düker et al., 2024).
  • Structural VARMA: With non-Gaussianity or higher-order cumulant dependence, identification and estimation can extend to frequency-domain criteria and ICA-based rotation recovery, even without causality/invertibility (Velasco, 2020).
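
For the order-selection step in the first bullet, the standard recipe is a small grid search over $(p,q)$ scored by an information criterion; a minimal statsmodels sketch (the grid bounds are arbitrary):

```python
from statsmodels.tsa.statespace.varmax import VARMAX

def select_order(X, max_p=2, max_q=2):
    """Grid search over (p, q), keeping the fit with the smallest BIC."""
    best = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            if p == 0 and q == 0:
                continue  # VARMAX requires at least one AR or MA lag
            res = VARMAX(X, order=(p, q), trend="n").fit(disp=False)
            if best is None or res.bic < best[0]:
                best = (res.bic, (p, q), res)
    return best  # (bic, (p, q), fitted results)
```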

5. Extensions, High-Dimensional Innovations, and Practical Impact

  • Dynamic Factor / Reduced-Rank VARMA: VARMA provides the exact observed-level representation for factor models with VAR-driven latent factors, leading to reduced-rank structures in both AR and MA components (Bhamidi et al., 2023).
  • Scalable High-Dimensional Estimation: Modern approaches achieve computational and statistical tractability for $d \gg 1$ via (i) explicit sparsity/low-rank/parsimonious parameterizations, (ii) efficient alternating minimization and ADMM algorithms, and (iii) nonasymptotic error bounds for both parameter recovery and forecasting (Huang et al., 2024, Zheng, 2022, Wilms et al., 2017).
  • Extensions: The literature details extensions to regime-switching/threshold VARMA, error-correction (cointegrated) VARMA, factor-augmented models, VARMA-GARCH (conditional heteroskedasticity), as well as robust methods for mixed-frequency or missing-data contexts (Düker et al., 2024).
  • Applications: VARMA modeling underpins state-of-the-art practices in high-dimensional macroeconomic forecasting, volatility modeling, large-scale demand and traffic prediction, as well as interpretable machine-learning pipelines for time series (Song et al., 5 Sep 2025, Huang et al., 2024, Wilms et al., 2017).

6. Computational Complexity and Theoretical Guarantees

  • Closed-Form and FFT-Accelerated Algorithms: VARMA with scalar or block-structured MA admits regression techniques with $O((dp+1)^2 T + c\,dT\log T)$ time complexity via FFTs on Toeplitz structure, an order-of-magnitude savings over Kalman-based methods (Nguyen, 2016).
  • High-Dimensional Error Bounds: For penalized and low-rank VARMA estimation, minimax rates in operator-norm, Frobenius-norm, and prediction-loss metrics are attainable under sub-Gaussian errors, explicit signal-strength conditions, and weak sparsity (Huang et al., 2024, Zheng, 2022).
  • Order Selection Consistency: BIC-based schemes in the high-dimensional setting achieve consistent recovery of the true $(p,q)$ orders under irreducibility and minimality constraints, even as $d, T \to \infty$ (Zheng, 2022).
  • Exact Causal-Invertible Estimation: Via reparametrization, maximum likelihood estimation can be performed over an unconstrained set of real parameters guaranteeing strict stability and invertibility at every parameter update (Roy et al., 2014).

7. Current Developments and Open Research Directions

Ongoing and emerging themes in VARMA research include:

  • Automated and data-driven identification (especially scalable rank/lag selection for large dd) (Huang et al., 2024).
  • Integration with neural and nonparametric architectures for long-memory and local nonstationarity (Song et al., 5 Sep 2025, Yan et al., 2020).
  • Efficient Bayesian inference in unconstrained stable/invertible C-space (Roy et al., 2014).
  • Hybrid time-frequency identification and structural recovery in non-Gaussian/misspecified regimes (Velasco, 2020).
  • Empirical benchmarking demonstrating the consistent advantage of modern penalized VARMA over VAR, in both accuracy and parsimony (Wilms et al., 2017, Huang et al., 2024).
  • Theory and methodology for factor-augmented and network-structured VARMA in economics, finance, and engineering (Bhamidi et al., 2023, Zheng, 2022).

The VARMA paradigm thus occupies a central position in both the theory and practice of multivariate time series analysis, as renewed by computational advances and methodological innovations in high-dimensional, nonstationary, and hybrid statistical-learning contexts.
