AR Model for Source-Space MEG
- The paper introduces an autoregressive framework that models the evolution of cortical sources by leveraging past MEG signals.
- It integrates classical MVAR techniques, sparse Bayesian learning, and state-space models to enhance source localization and causal inference.
- Scalable generative methods and deep learning extensions provide robust SNR improvements and detailed connectivity analysis at millisecond resolution.
An autoregressive model for source-space MEG refers to any framework—linear or nonlinear, parametric or nonparametric—that models the evolution of magnetoencephalographic (MEG) brain source activity in time as a function of its own past. This paradigm underpins a broad set of dynamic inverse solutions, generative modeling approaches, and causal inference techniques. Such models serve both as neurophysiologically grounded priors for source estimation and as powerful tools for capturing the statistical structure, predictability, and directed interactions within cortical neural assemblies at millisecond temporal resolution.
1. Foundations of Autoregressive Source-Space Modeling in MEG
Autoregressive (AR) source-space models for MEG assert that the latent cortical source vector $\mathbf{s}_t \in \mathbb{R}^N$ at time $t$ (with $N$ sources) can be expressed in functional dependence on prior states:
$$\mathbf{s}_t = \sum_{k=1}^{p} A_k \mathbf{s}_{t-k} + \mathbf{w}_t,$$
where $A_k \in \mathbb{R}^{N \times N}$ are autoregressive coefficient matrices of order $p$, and $\mathbf{w}_t$ is a stochastic innovation (often Gaussian, $\mathbf{w}_t \sim \mathcal{N}(0, Q)$). The key distinction from sensor-level AR approaches lies in modeling $\mathbf{s}_t$ after the ill-posed MEG inverse problem has been addressed, typically via a source localization algorithm such as beamforming, the Minimum-Norm Estimate (MNE), or advanced Bayesian/dynamical filters (Lamus et al., 2015, Dinh et al., 2019).
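As a minimal sketch of this recursion (a toy simulation, not any specific paper's implementation; the coefficient values are illustrative), a stable VAR(2) process over a few simulated sources can be generated directly from the defining equation:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var(A, n_steps, noise_std=1.0):
    """Simulate s_t = sum_k A[k] @ s_{t-1-k} + w_t for a VAR(p) model."""
    p, n, _ = A.shape
    s = np.zeros((n_steps + p, n))
    for t in range(p, n_steps + p):
        s[t] = sum(A[k] @ s[t - 1 - k] for k in range(p))
        s[t] += noise_std * rng.standard_normal(n)
    return s[p:]

# Toy coefficients: 3 sources, order 2, with one directed link 0 -> 1.
A = np.zeros((2, 3, 3))
A[0] = 0.5 * np.eye(3)      # lag-1 self-dependence
A[0, 1, 0] = 0.4            # source 0 drives source 1 at lag 1
A[1] = -0.2 * np.eye(3)     # lag-2 self-dependence (keeps the process stable)

s = simulate_var(A, n_steps=2000)
```

In a real pipeline the rows of `s` would be replaced by source time-courses extracted from MEG by an inverse solution; the point here is only that the AR matrices encode both self-dynamics (diagonals) and directed cross-source influences (off-diagonals).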
Source-space AR modeling supports both explicit state-space estimation (dynamical inverse solutions) and causal modeling (connectivity analysis, Granger causality, Partial Directed Coherence), and is increasingly being integrated with deep learning and vector quantization frameworks to enable high-capacity, scalable generative models (Csaky, 28 Jan 2026).
2. Classical Linear Approaches: MVAR and Sparse Bayesian Methods
The canonical linear multivariate autoregressive (MVAR) model operates on a small set of source time-courses, typically extracted via spatial beamforming or Bayesian inverse solutions:
$$\mathbf{s}_t = \sum_{k=1}^{p} A_k \mathbf{s}_{t-k} + \mathbf{w}_t.$$
The MVAR formalism enables direct estimation of directed connectivity through the AR coefficients $A_k$, and is the basis for frequency-domain connectivity metrics such as Partial Directed Coherence (PDC).
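Given estimated AR coefficients, PDC follows from the frequency-domain transfer of the model, $\bar{A}(f) = I - \sum_k A_k e^{-2\pi i f k}$, with $\mathrm{PDC}_{i \leftarrow j}(f) = |\bar{A}_{ij}(f)| / \sqrt{\sum_m |\bar{A}_{mj}(f)|^2}$. A minimal NumPy sketch (toy coefficients, illustrative only):

```python
import numpy as np

def partial_directed_coherence(A, freqs, fs=1.0):
    """PDC_{i<-j}(f) = |Abar_ij(f)| / sqrt(sum_m |Abar_mj(f)|^2),
    with Abar(f) = I - sum_k A[k] * exp(-2j*pi*f*(k+1)/fs)."""
    p, n, _ = A.shape
    pdc = np.zeros((len(freqs), n, n))
    for fi, f in enumerate(freqs):
        Abar = np.eye(n, dtype=complex)
        for k in range(p):
            Abar -= A[k] * np.exp(-2j * np.pi * f * (k + 1) / fs)
        denom = np.sqrt((np.abs(Abar) ** 2).sum(axis=0))  # per-column norm
        pdc[fi] = np.abs(Abar) / denom
    return pdc

# Toy VAR(1) with a single directed link 0 -> 1.
A = np.array([[[0.5, 0.0],
               [0.4, 0.3]]])
pdc = partial_directed_coherence(A, freqs=np.linspace(0.0, 0.5, 8))
```

By construction the PDC outflow from each column is normalized ($\sum_i \mathrm{PDC}_{i \leftarrow j}^2 = 1$), so with only the 0→1 coupling present, `pdc[:, 1, 0]` is strictly larger than `pdc[:, 0, 1]` at every frequency.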
Classical estimation by least-squares (LS) regression is severely degraded by noisy or interfering sources at low signal-to-interference ratio (SIR). To address this, sparse Bayesian learning with Automatic Relevance Determination (ARD) priors shrinks spurious coefficients toward zero, yielding robust detection of true causal links even at low SIR (Sekihara et al., 2012). Model selection (order $p$) is typically conducted via AIC, BIC, or cross-validation. Estimation is cast as an EM-type iterative maximization of the marginal likelihood, with efficient updates for the ARD and noise parameters.
Statistical significance of causal links is commonly assessed via frequency-dependent permutation testing or surrogate-data bootstrapping. Permutation tests provide more conservative thresholds, crucial for sparse AR models, as naive bootstrapping underestimates chance-level PDC due to the over-shrinking effect of ARD (Sekihara et al., 2012).
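The permutation-testing logic can be sketched with a simple lagged-coupling statistic standing in for frequency-resolved PDC (all series, coupling strengths, and thresholds here are illustrative, not from the cited study):

```python
import numpy as np

rng = np.random.default_rng(1)

def lagged_coupling(x, y, lag=1):
    """|corr(x_{t-lag}, y_t)| as a simple directed-coupling statistic."""
    return abs(np.corrcoef(x[:-lag], y[lag:])[0, 1])

# Coupled pair: x drives y with a one-sample lag.
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.4 * x[t - 1] + 0.5 * rng.standard_normal()

obs = lagged_coupling(x, y)

# Permutation null: circularly shift x by random offsets, breaking the lag
# relationship while preserving each series' own autocorrelation.
null = np.array([
    lagged_coupling(np.roll(x, rng.integers(50, n - 50)), y)
    for _ in range(500)
])
threshold = np.quantile(null, 0.99)  # 1% significance level
```

A link is declared significant when the observed statistic exceeds the null quantile; in a real analysis the same scheme is applied per frequency bin to the PDC values.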
Key comparative findings:
| SIR | Least-Squares (FPR) | Sparse-Bayes (FPR / TPR) |
|---|---|---|
| high | low | low / 100% |
| 0.5 | many spurious links | low / 80–90% |
| 0.25 | fails | low / 70% |
This demonstrates the importance of regularization and statistical rigor for robust AR-based source-space causality analysis (Sekihara et al., 2012).
3. Spatiotemporal Dynamic Priors and State-Space Models
To further constrain AR estimates and allow biophysically plausible spatiotemporal dynamics, recent work embeds spatial smoothness priors directly into the AR matrix $A$ by enforcing local interconnections along the cortical mesh (nearest-neighbors, distance-weighted). The state-space model, combining an autoregressive source progression with a linear observation equation, enables principled Bayesian filtering and smoothing (Lamus et al., 2015):
$$\mathbf{s}_t = A \mathbf{s}_{t-1} + \mathbf{w}_t, \qquad \mathbf{y}_t = G \mathbf{s}_t + \mathbf{v}_t,$$
where $G$ is the lead-field matrix and $\mathbf{w}_t \sim \mathcal{N}(0, Q)$, $\mathbf{v}_t \sim \mathcal{N}(0, R)$ are noise processes with covariances $Q$ and $R$.
Parameter and state estimation proceed via a dynamic EM algorithm, with Kalman recursions for the E-step (filtering and smoothing) and closed-form M-step updates for the hyperparameters (principally, the diagonal entries of the state-noise covariance). This allows past and future sensor data to inform the posterior estimate at each time point, yielding substantial improvements in source detection, localization, and uncertainty quantification compared to static or pointwise methods. For example, dynamic MAP-EM source estimates achieve ROC AUC exceeding that of static MNE, with detection rates of 90% and above at low false-positive rates and reduced RMSE inside active regions (Lamus et al., 2015).
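The E-step's forward (filtering) pass is a standard Kalman recursion over the state-space model above. A self-contained sketch with a random stand-in lead field (toy dimensions and covariances, not the paper's implementation):

```python
import numpy as np

def kalman_filter(y, A, G, Q, R, s0, P0):
    """Forward Kalman recursion for s_t = A s_{t-1} + w_t, y_t = G s_t + v_t."""
    T, n = y.shape[0], A.shape[0]
    s_hat = np.zeros((T, n))
    s, P = s0, P0
    for t in range(T):
        # Predict one step ahead through the AR dynamics.
        s_pred = A @ s
        P_pred = A @ P @ A.T + Q
        # Update with the sensor measurement via the lead field G.
        S = G @ P_pred @ G.T + R            # innovation covariance
        K = P_pred @ G.T @ np.linalg.inv(S)  # Kalman gain
        s = s_pred + K @ (y[t] - G @ s_pred)
        P = (np.eye(n) - K @ G) @ P_pred
        s_hat[t] = s
    return s_hat

# Toy simulation: 2 sources, 3 sensors, random stand-in lead field.
rng = np.random.default_rng(0)
n, m, T = 2, 3, 400
A = 0.9 * np.eye(n)
G = rng.standard_normal((m, n))
Q, R = 0.2 * np.eye(n), 0.1 * np.eye(m)

s_true, y = np.zeros((T, n)), np.zeros((T, m))
for t in range(1, T):
    s_true[t] = A @ s_true[t - 1] + rng.multivariate_normal(np.zeros(n), Q)
    y[t] = G @ s_true[t] + rng.multivariate_normal(np.zeros(m), R)

s_hat = kalman_filter(y, A, G, Q, R, np.zeros(n), np.eye(n))
```

The full dynamic EM additionally runs a backward (RTS) smoother so future samples refine each estimate, then re-estimates the noise hyperparameters in the M-step.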
Computational scalability is achieved by exploiting the sparsity of the AR transition matrix and the structure of the lead-field and noise covariance matrices, enabling large source grids and hundreds of sensors.
4. Nonlinear and Deep Learning Extensions: Contextual and Token-Based AR Modeling
Nonlinear dynamical priors can be learned from data using recurrent neural architectures. The Contextual Minimum-Norm Estimate (CMNE) approach integrates a recurrent neural network, a Long Short-Term Memory (LSTM), to predict time-varying source covariance priors (beliefs) from several past time steps, which are then fused with standard MNE/dSPM measurement updates using Dempster-Shafer theory (Dinh et al., 2019). This can be interpreted as an implicit nonlinear AR($p$) process for source trajectories:
$$\mathbf{s}_t = f_\theta(\mathbf{s}_{t-1}, \ldots, \mathbf{s}_{t-p}) + \mathbf{w}_t,$$
where $f_\theta$ is learned by minimizing the mean-squared prediction error on ground-truth dSPM maps.
Empirically, CMNE yields order-of-magnitude improvements in SNR and more accurate localization of propagating activity relative to classical (non-contextual) MNE. In simulated spike-wave propagation, SNR rises markedly over dSPM, while localization error remains at the millimeter scale throughout propagation (Dinh et al., 2019). A plausible implication is that nonlinear dynamical priors can capture complex cortical propagation patterns that are invisible to linear AR or static models.
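The core idea of a learned nonlinear AR predictor can be illustrated without an LSTM; below, a random-feature regressor (fixed random hidden layer, least-squares readout) stands in for $f_\theta$. The toy dynamics, feature count, and baselines are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear AR(2) ground-truth process.
T, p = 3000, 2
s = np.zeros(T)
for t in range(p, T):
    s[t] = 0.7 * np.tanh(s[t - 1]) - 0.3 * s[t - 2] + 0.1 * rng.standard_normal()

# Stand-in for f_theta: random tanh features + linear least-squares readout.
X = np.stack([s[t - p:t] for t in range(p, T)])   # windows of the past p samples
y = s[p:]
W = rng.standard_normal((p, 64))                  # fixed random hidden weights
H = np.tanh(X @ W)                                # nonlinear feature expansion
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

pred = H @ beta
mse_model = np.mean((pred - y) ** 2)
mse_naive = np.mean((s[p - 1:-1] - y) ** 2)       # persistence baseline
```

The learned predictor should approach the innovation-noise floor, beating the persistence baseline; an LSTM plays the same role but additionally carries hidden state across arbitrary context lengths.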
5. Scalable Generative AR Models with Discrete Latent Spaces
Recent advances exploit vector quantization and deep transformer backbones to scale AR generation in source-space MEG to previously unattainable data volumes and temporal horizons (Csaky, 28 Jan 2026). In this paradigm, multichannel MEG segments are encoded as flattened vectors of discrete tokens by a causal SEANet encoder and multi-level Residual Vector Quantization (RVQ), producing up to 24k tokens per minute (e.g., 4 RVQ levels × 4 neuro streams × 1024-token windows). These tokens are then autoregressively predicted using a decoder-only transformer (FlatGPT, based on Qwen2.5-VL) trained with the standard next-token AR loss
$$\mathcal{L} = -\sum_{t} \log p_\theta(x_t \mid x_{<t}).$$
Long-horizon generation is enabled by minute-scale context (24,576 tokens) and efficient sliding key-value cache strategies, with open-loop rollouts exceeding 4 minutes. Model performance is evaluated using on-manifold stability (summary-feature envelopes for drift), conditional specificity (prompt-swap divergence in spectral/covariance/coherence features), and cross-dataset generalization (training on CamCAN+OMEGA, testing on MOUS) (Csaky, 28 Jan 2026). BrainTokMix achieves high-fidelity reconstruction (low mean absolute error, high Pearson correlation) with robust stability and prompt dependence. Limitations include the lack of explicit AR/VAR or diffusion baselines at these timescales; potential extensions include multi-scale tokenizers and explicit inclusion of multimodal or stimulus-conditional context.
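Whatever the backbone, the generation step reduces to a next-token sampling loop over the discrete codebook. In this sketch a deterministic toy function stands in for the transformer, and the codebook size, logits, and temperature are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 16  # toy codebook size (real RVQ codebooks are far larger)

def next_token_logits(context):
    """Stand-in for the transformer: toy logits favoring 'last token + 1'."""
    logits = np.zeros(V)
    logits[(context[-1] + 1) % V] = 5.0
    return logits

def sample_rollout(prompt, n_new, temperature=1.0):
    """Open-loop AR rollout: sample, append, repeat."""
    tokens = list(prompt)
    for _ in range(n_new):
        logits = next_token_logits(tokens) / temperature
        probs = np.exp(logits - logits.max())   # stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(V, p=probs)))
    return tokens

out = sample_rollout([3], n_new=10, temperature=0.1)
```

In the real system the context window holds tens of thousands of tokens and a sliding key-value cache keeps each step's cost bounded; the loop structure is unchanged.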
6. Applications and Limitations
Autoregressive models for source-space MEG serve crucial roles in:
- Directed functional connectivity (e.g., Granger causality, PDC), with robust inference under low SIR using sparse AR and permutation-based thresholds (Sekihara et al., 2012).
- Distributed source estimation with dynamic prior covariance structure, yielding superior SNR and localization (spatiotemporal smoothness or learned neural-prior) (Lamus et al., 2015, Dinh et al., 2019).
- Large-scale, generative modeling of MEG time series for simulation, data augmentation, and theoretical studies of predictive coding (Csaky, 28 Jan 2026).
Limitations are context-dependent: classical linear AR/MVAR is highly sensitive to noise/interference unless regularized; Bayesian/sparse AR can suppress both spurious and weak true interactions. Deep learning models like CMNE require large evoked datasets, and transformer-based generative modeling at scale is computationally intensive and lacks direct interpretability or parametric connectivity mapping. A plausible implication is that no single AR paradigm is optimal for all scientific or translational tasks, and future progress may hinge on hybrid architectures and systematic benchmark comparisons.
7. Future Directions
Key directions include:
- Extension to multi-resolution and multimodal (MEG+fMRI, stimulus-conditioned) AR frameworks.
- Incorporation of transformer-style attention and state-space models for further scaling of context and sequence length (Csaky, 28 Jan 2026).
- Joint generative and causal source modeling to bridge predictive accuracy with interpretability.
- Systematic benchmarking of linear, sparse, Bayesian, deep, and discrete AR models across SNR regimes, tasks, and datasets.
- Use of AR generative models for simulation-based inference, privileged knowledge distillation, and robust data augmentation.
The field continues to evolve toward data-efficient, neurophysiologically interpretable, and computationally scalable AR methodologies, reflecting increasing integration of statistical, dynamical systems, and machine learning perspectives.