Seemingly Unrelated Regression (SUR)

Updated 28 January 2026
  • Seemingly Unrelated Regression (SUR) is a system of linear models that allows for contemporaneously correlated errors across equations, enhancing estimation efficiency.
  • It employs methods like generalized least squares (GLS) and feasible GLS, ensuring robust and efficient parameter estimation even with heterogeneous regressor sets.
  • Recent extensions include Bayesian variable selection, high-dimensional precision estimation, and robust techniques that address measurement error and outlier contamination.

A seemingly unrelated regression (SUR) model is a multivariate system of linear regression equations in which each equation potentially has a different set of regressors, but the disturbance terms across equations are allowed to be contemporaneously correlated for each observational unit. Although the equations share no economic or structural interdependence through their regressors, this cross-equation error correlation creates potential efficiency gains from joint estimation of the system rather than separate estimation of each equation. SUR models, formally introduced by Zellner (1962), form the backbone of multivariate regression analysis in econometrics, systems biology, finance, and related fields, and provide a general framework that can be enriched with Bayesian variable selection, robust estimation, high-dimensional covariance learning, measurement-error correction, and nonlinearity.

1. Mathematical Structure and Likelihood

The prototypical SUR system, with $m$ equations and $n$ observations, writes for $i = 1, \ldots, m$ and $k = 1, \ldots, n$: $y_{ik} = x_{ik}^\top \beta_i + \varepsilon_{ik}$, or in matrix notation for each equation: $y_i = X_i \beta_i + \varepsilon_i$, with $y_i$ the $n \times 1$ response, $X_i$ the $n \times p_i$ regressor matrix, $\beta_i$ the $p_i \times 1$ coefficient vector, and $\varepsilon_i$ the $n \times 1$ error.

Stacking all equations,

y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}, \qquad X = \operatorname{blockdiag}(X_1, \ldots, X_m), \qquad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix},

we have

y = X\beta + \varepsilon

with the stacked error $\varepsilon$ possessing block covariance $\operatorname{Cov}(\varepsilon) = \Sigma \otimes I_n$, where $\Sigma$ is the $m \times m$ contemporaneous covariance matrix of the errors across equations.
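The block structure of this covariance can be checked directly in a small numpy sketch (the dimensions and the value of $\Sigma$ are purely illustrative):

```python
import numpy as np

# Toy SUR dimensions (hypothetical): m = 2 equations, n = 3 observations.
m, n = 2, 3
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])       # m x m contemporaneous covariance

# Full (mn) x (mn) covariance of the stacked error: Sigma kron I_n.
Omega = np.kron(Sigma, np.eye(n))

# Errors of equations 1 and 2 at the SAME observation k are correlated
# (Omega[k, n + k] = Sigma[0, 1]); errors at different observations are not.
```

Equation-major stacking is assumed here, matching $\operatorname{Cov}(\varepsilon) = \Sigma \otimes I_n$; observation-major stacking would instead give $I_n \otimes \Sigma$.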

Under Gaussianity ($\varepsilon_k \sim N_m(0, \Sigma)$), the log-likelihood is

\ell(\beta, \Sigma) = -\tfrac{n}{2} \ln|\Sigma| - \tfrac{1}{2}\,(y - X\beta)^\top (\Sigma^{-1} \otimes I_n)\,(y - X\beta) + \text{const}

The efficient generalized least squares (GLS) estimator is

\hat{\beta}_{\mathrm{GLS}} = \left[ X^\top (\Sigma^{-1} \otimes I_n)\, X \right]^{-1} X^\top (\Sigma^{-1} \otimes I_n)\, y

with feasible GLS (FGLS) obtained by plugging in a consistent estimator $\hat{\Sigma}$, usually based on OLS residuals (Saraceno et al., 2021; Peremans et al., 2018; Wali et al., 2019).
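The two-step FGLS recipe can be sketched in numpy on simulated data (all dimensions, coefficient values, and the error covariance below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 2-equation SUR system with contemporaneously correlated errors.
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])       # eq. 1 regressors
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # eq. 2: different set
Sigma_true = np.array([[1.0, 0.7], [0.7, 1.5]])
eps = rng.multivariate_normal(np.zeros(2), Sigma_true, size=n)
y1 = X1 @ np.array([1.0, 2.0]) + eps[:, 0]
y2 = X2 @ np.array([0.5, -1.0, 3.0]) + eps[:, 1]

# Step 1: equation-by-equation OLS to obtain residuals.
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
R = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])

# Step 2: consistent estimate of the contemporaneous covariance.
Sigma_hat = R.T @ R / n

# Step 3: GLS on the stacked system with weight (Sigma_hat^{-1} kron I_n).
X = np.block([[X1, np.zeros((n, X2.shape[1]))],
              [np.zeros((n, X1.shape[1])), X2]])
y = np.concatenate([y1, y2])
W = np.kron(np.linalg.inv(Sigma_hat), np.eye(n))
beta_fgls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

With identical regressors across equations, FGLS collapses to equation-by-equation OLS; the efficiency gain comes precisely from differing regressor sets combined with correlated errors.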

2. Identification, Covariance Structure, and Efficiency

Identification of $\beta$ requires full column rank of the joint design matrix, together with non-singularity of the error covariance structure in the sense that the information matrix $X^\top \Omega^{+} X$ has rank equal to the parameter dimension, where $\Omega$ is the full block covariance (possibly singular) and $\Omega^{+}$ denotes its Moore–Penrose inverse (Haupt, 2020). Restrictions may also arise explicitly from parameter constraints or implicitly from the covariance structure (e.g., zero-variance directions corresponding to adding-up or symmetry conditions).

For panel or multi-equation models with time and group effects, the error $u_{mit}$ often receives an additive decomposition $u_{mit} = \mu_{mi} + \lambda_{mt} + \varepsilon_{mit}$, with group effects $\mu_{mi}$, period effects $\lambda_{mt}$, and idiosyncratic terms $\varepsilon_{mit}$. Variances of each component can be heterogeneous across strata (Platoni et al., 2016). Estimation proceeds by moment-based or ANOVA-style decomposition and system-wide GLS.
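A quick simulation, with hypothetical dimensions and variances, illustrates how the component variances can be read off ANOVA-style from row and column means of the panel errors:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-way error components for one equation: u_it = mu_i + lambda_t + eps_it.
I, T = 50, 20
mu = rng.normal(0.0, 1.0, size=(I, 1))    # group effects, variance 1.0
lam = rng.normal(0.0, 0.5, size=(1, T))   # period effects, variance 0.25
eps = rng.normal(0.0, 0.3, size=(I, T))   # idiosyncratic noise
u = mu + lam + eps                        # broadcast to the I x T panel

# ANOVA-style moment recovery: group (row) means isolate mu up to O(1/T)
# noise, period (column) means isolate lambda up to O(1/I) noise.
var_mu_hat = u.mean(axis=1).var()
var_lam_hat = u.mean(axis=0).var()
```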

For generalizations such as singular covariance, explicit formulae for the constrained GLS estimator utilize the information from both the design and the covariance's low-rank representation; in fixed effects panel models, the within-estimator is a special case of singular SUR GLS (Haupt, 2020).

The semiparametric efficiency bound for $\beta$ is attained by feasible GLS, as shown via influence-function calculations, and coincides with the parametric GLS bound under mild conditions (Hristache et al., 2011).

3. Extensions: Bayesian Selection, High-dimensionality, Robustness

Bayesian SUR and Variable Selection: Modern SUR implementations, exemplified by the BayesSUR framework, introduce joint variable and covariance selection using priors such as spike-and-slab, "hotspot," and Markov random field (MRF) structures on support matrices $\gamma$, together with decomposable hyper-inverse-Wishart priors on $\Sigma$ (Zhao et al., 2021). Covariance factorization yields computationally scalable MCMC. Empirical evaluations demonstrate near-perfect recovery in high-dimensional genomics (eQTL/mQTL) settings and improved predictive density in applied drug-response tasks when structured priors are placed on the inclusion indicators. The modular approach allows full decoupling and recombination of variable- and covariance-selection architectures.

High-dimensional SUR: When the number of equations $N$ is comparable to or exceeds the number of observations $T$, sample estimators of $\Sigma$ are ill-posed. The FGLasso estimator replaces $\Sigma^{-1}$ with a sparse precision estimator $\hat{\Omega}_{\text{glasso}}$, exploiting graphical-lasso regularization. When $\Omega$ is sparse (e.g., banded or lattice-structured), FGLasso achieves asymptotic efficiency equivalent to infeasible GLS; performance gains are demonstrated for $N/T \gtrsim 0.5$ (Tan et al., 2018).
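The idea can be sketched with scikit-learn's GraphicalLasso standing in for the sparse-precision step; the residual matrix, dimensions, and penalty below are illustrative, not the cited estimator's tuning:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)

# Stand-in for the T x N matrix of equation-by-equation OLS residuals.
N, T = 20, 60
R = rng.normal(size=(T, N))

# Sparse precision estimate via the graphical lasso; Omega_hat then replaces
# the (ill-posed) inverse sample covariance in the GLS weight.
gl = GraphicalLasso(alpha=0.2).fit(R)
Omega_hat = gl.precision_
```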

Robust Estimation: Classical SUR estimators are highly sensitive to outliers at both the row and cell level. MM-estimators deliver high-breakdown, high-efficiency inference (50% breakdown is achievable in the S-step; asymptotic coverage and power are maintained in simulations). Fast robust bootstrap (FRB) procedures enable valid inference and hypothesis testing with substantial speedups over classical resampling (Peremans et al., 2018). Similarly, the two-step "surerob" estimator combines cellwise and rowwise robustification via an initial univariate MM step and a multivariate 2SGS estimator for the residual covariance. Under independent cellwise contamination (ICM), surerob maintains a nonzero breakdown point and outperforms fastSUR and classical methods even as the number of equations grows; a plausible implication is that only methodologies explicitly targeting both contamination regimes should be applied in high-dimensional, multivariate settings (Saraceno et al., 2021).

The robustness–efficiency tradeoff and performance under distinct contamination schemes are summarized in the table below:

| Method | Breakdown (THCM) | Breakdown (ICM) | Computational cost |
| --- | --- | --- | --- |
| Classical SUR | 0 | 0 | $O((\sum_i p_i)^3)$ |
| MM-estimator | up to 50% | $\to 0$ for large $p$ | high |
| FastSUR [Hubert+] | 50% | $\to 0$ for large $m$ | $O(m^2 n)$ |
| surerob | 50% | $> 0$ for moderate $m$ | moderate (dominated by MM and 2SGS steps) |

Non-Gaussian Errors and Mixture Models: SUR systems with non-normal error structure can be modeled via finite Gaussian mixtures. Identifiability is preserved under full-rank regressors. Likelihood, score, and Hessian are available in closed form, and ML estimation proceeds via EM. Empirical applications confirm improved fit and flexibility over classical SUR (Galimberti et al., 2014).
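As a simple illustration (not the cited authors' implementation), scikit-learn's EM-based GaussianMixture can be fitted to bimodal two-dimensional residuals standing in for the errors of a two-equation system:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Simulated bimodal residuals for m = 2 equations: a 50/50 mixture of
# Gaussians centered near (-2, -2) and (+2, +2).
z = rng.integers(0, 2, size=500)
R = np.where(z[:, None] == 0,
             rng.normal(-2.0, 0.5, size=(500, 2)),
             rng.normal(+2.0, 0.5, size=(500, 2)))

# EM fit of a 2-component Gaussian mixture to the residuals.
gm = GaussianMixture(n_components=2, random_state=0).fit(R)
```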

4. Model Selection, Measurement Error, and Variable Uncertainty

Model selection and predictor uncertainty: Modern approaches decouple statistical inference from variable selection using post-inference summarization and utility minimization. A penalized loss (combining expected predictive accuracy with $\ell_0$ or $\ell_1$ penalties) is minimized, with summary coefficients $\hat{\Gamma}_\lambda$ computed via lasso or related convex optimization. This framework supports uncertainty in the predictors $X$ (e.g., in forecasting contexts), yielding sparser models than the fixed-$X$ paradigm. Empirical recovery of meaningful factor structures in asset pricing demonstrates the gain from selection criteria tied to predictive information, not just marginal likelihood (Puelz et al., 2016).
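A toy sketch of the summarization step, assuming a dense full-model fit is already in hand (the coefficients and the $\ell_1$ penalty value are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)

# Dense "full model" coefficients: two large effects plus near-zero clutter.
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_dense = np.r_[2.0, -1.5, np.zeros(p - 2)] + 0.01 * rng.normal(size=p)
y_fit = X @ beta_dense          # fitted values from the full model

# Summarize: sparse coefficients that best reproduce the full model's fit,
# trading fidelity against an l1 penalty (the lasso summarization step).
gamma_hat = Lasso(alpha=0.1).fit(X, y_fit).coef_
```

The lasso is applied to the full model's fitted values rather than the raw responses, so the sparse summary inherits whatever the full model has learned while discarding predictors that barely move the fit.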

Measurement error: Extensions to SUR with classical measurement error employ fully Bayesian algorithms—Gibbs sampling or mean-field variational Bayes—to infer both noise and regression parameters. Identification is secured by priors on measurement error variance. MFVB achieves comparable accuracy at a small fraction of the computational cost of MCMC. Ignoring measurement error biases both coefficients and covariance estimates, while the Bayesian approaches correct these biases and improve predictive fit (Bresson et al., 2020).

5. Nonlinearities, Structured Graphs, and Application Domains

Nonlinear multivariate regression can be embedded in a SUR context via spline-basis expansions and spike-and-slab priors on inclusion indicators, with the error precision $\Omega = \Sigma^{-1}$ endowed with a Gaussian graphical model prior (e.g., hyper-inverse-Wishart for decomposable graphs). Collapsing over regression coefficients and covariance enables efficient marginal MCMC for joint predictor–graph selection. Posterior model probabilities are consistent under increasing sample size, with the hyperparameters and design regularity controlling finite-sample behavior (Niu et al., 2020).
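A minimal single-equation sketch of the basis-expansion idea, using an unpenalized truncated-power cubic basis rather than the penalized splines and spike-and-slab priors of the cited work:

```python
import numpy as np

rng = np.random.default_rng(5)

# Nonlinear signal observed with noise.
n = 300
x = rng.uniform(-2.0, 2.0, size=n)
y = np.sin(2.0 * x) + 0.1 * rng.normal(size=n)

# Truncated-power cubic spline basis: global cubic plus one term per knot.
knots = np.linspace(-1.5, 1.5, 7)
B = np.column_stack([np.ones(n), x, x**2, x**3]
                    + [np.clip(x - k, 0.0, None) ** 3 for k in knots])

# Least squares in the expanded basis linearizes the nonlinear mean, so the
# whole equation slots into the standard SUR machinery unchanged.
coef = np.linalg.lstsq(B, y, rcond=None)[0]
resid = y - B @ coef
```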

Applications of SUR models are broad, including local projection systems (where SUR likelihood arises naturally under VARMA data-generating processes) (Tanaka, 2020), multivariate forecasting, gene–drug association studies (Zhao et al., 2021), personalized fuel economy analysis using bivariate random-coefficient SUR (Wali et al., 2019), and stratified two-way panel models with SUR structure and heteroscedasticity (Platoni et al., 2016).

6. Summary Table of Modern SUR Generalizations

| Extension | Key features | Reference |
| --- | --- | --- |
| Bayesian variable/covariance selection | Hotspot/MRF priors on inclusion, HIW prior on $\Sigma$, fast MCMC | (Zhao et al., 2021) |
| High-dimensional precision | FGLasso estimator, glasso on $\Sigma^{-1}$, large $N$, sparse graph | (Tan et al., 2018) |
| Robust estimation | MM and S estimators; surerob for cell/row contamination | (Peremans et al., 2018; Saraceno et al., 2021) |
| Non-Gaussian error mixtures | Finite Gaussian mixtures, EM estimation, identifiability conditions | (Galimberti et al., 2014) |
| Measurement error | Bayesian MCMC/MFVB, prior on variance component | (Bresson et al., 2020) |
| Model selection under predictor uncertainty | Utility-based selection, post-inference lasso summarization | (Puelz et al., 2016) |
| Stratified heteroscedastic panels | Population/time stratification, component ANOVA estimation | (Platoni et al., 2016) |

7. Computational and Practical Considerations

Effective computation with SUR models depends on:

  • Covariance factorization (a factor $C$ in block or graphical form) to avoid large matrix inversion, especially in MCMC (Zhao et al., 2021).
  • Parallelization of likelihood calculation across CPUs for Bayesian methods.
  • Algorithmic advances, such as Thompson-sampling-inspired "bandit" proposals for indicator matrices, and block-coordinate optimization in the graphical lasso (Tan et al., 2018).
  • The trade-off between breakdown point and efficiency in robust approaches, and the choice of weight functions $\psi$ in MM or MM+S estimation (Peremans et al., 2018; Saraceno et al., 2021).
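The factorization point can be made concrete: multiplying a stacked residual by $(\Sigma^{-1} \otimes I_n)$ never requires materializing the $mn \times mn$ Kronecker product (numpy sketch with hypothetical dimensions):

```python
import numpy as np

rng = np.random.default_rng(6)

m, n = 4, 500
Sigma = np.cov(rng.normal(size=(m, 3 * n)))   # m x m positive-definite matrix
r = rng.normal(size=m * n)                    # stacked residual, equation-major

# Naive: build the mn x mn matrix explicitly (O(m^2 n^2) memory and time).
naive = np.kron(np.linalg.inv(Sigma), np.eye(n)) @ r

# Fast: reshape to m x n (one equation per row) and solve against Sigma only.
fast = np.linalg.solve(Sigma, r.reshape(m, n)).ravel()
```

The same reshaping trick applies inside MCMC or iterated FGLS, where this product must be evaluated at every sweep.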

Advances in this domain have enabled robust, scalable, and interpretable multivariate analysis spanning genomics, economics, engineering, and environmental domains, with continuing innovation in covariance modeling, regularization, and integration of contextual structural priors.
