Papers
Topics
Authors
Recent
Search
2000 character limit reached

Rao's Mixed-Effects Model for Complex Surveys

Updated 28 January 2026
  • Rao's Mixed-Effects Model is a two-level linear mixed model estimated via pairwise composite likelihood, yielding design-consistent results for survey data with clusters.
  • The method mitigates high-dimensional integration challenges by summarizing bivariate likelihood contributions, trading some efficiency for robustness in estimating fixed and random effects.
  • The approach is efficiently implemented in the R package svylme, making it practical for applications in educational, health, and social science surveys with complex sampling designs.

Rao's Mixed-Effects Model refers to the class of two-level linear mixed models estimated using the pairwise composite likelihood (PCL) approach developed by Rao and co-workers for complex survey data. This methodology yields a general, design-consistent estimator for hierarchical linear models when the model clusters coincide with survey design clusters, without requiring cluster sizes to grow large. Rao's approach offers robust inference for both fixed and random effects under informative and complex sampling, at the cost of some loss in statistical efficiency, particularly for variance components (Lumley et al., 2023).

1. Formulation of the Two-Level Linear Mixed Model

The classical two-level linear mixed model is defined as

Yij=Xijβ+Zijbi+εij,Y_{ij} = X_{ij}\,\beta + Z_{ij}\,b_{i} + \varepsilon_{ij},

where i=1,,n1i=1,\ldots,n_1 indexes clusters (e.g., schools) and j=1,,mij=1,\ldots,m_i indexes units within cluster ii. The random effects bib_i follow biN(0,G)b_i\sim N(0, G), the residuals εijN(0,σ2)\varepsilon_{ij}\sim N(0, \sigma^2), and Cov(bi,εij)=0\operatorname{Cov}(b_i, \varepsilon_{ij})=0. Here, XijX_{ij} is a 1×p1\times p row of fixed-effects covariates and ZijZ_{ij} is a 1×q1\times q row of random-effects covariates, with parameter vector θ=(β,σ2,θ)\theta^* = (\beta, \sigma^2, \theta), where G=σ2V(θ)G = \sigma^2 V(\theta) parametrizes variance-covariance structure.

Marginalizing over random effects, the within-cluster response vector Yi=(Yi1,,Yi,mi)TY_i = (Y_{i1},\ldots,Y_{i,m_i})^\mathrm{T} has distribution

YiN(Xiβ,σ2Ξi(θ)),Ξi(θ)=Imi+ZiV(θ)ZiT,Y_i \sim N(X_i\,\beta,\,\sigma^2\,\Xi_i(\theta)),\quad \Xi_i(\theta) = I_{m_i} + Z_i V(\theta)Z_i^\mathrm{T},

where XiX_i, ZiZ_i stack the individual-level covariates.

2. Pairwise Composite Likelihood Construction

Full maximum likelihood for mixed models involves high-dimensional integration over random effects. Rao's PCL approach instead employs a sum of bivariate log-densities for all pairs within clusters:

P(β,θ,σ2)=i=1N11j<kmii,jk(β,θ,σ2),\ell_P(\beta, \theta, \sigma^2) = \sum_{i=1}^{N_1} \sum_{1 \leq j < k \leq m_i} \ell_{i,jk}(\beta, \theta, \sigma^2),

where

fi,jk(yij,yik;β,θ,σ2)=(2πσ2)1Ξi,jk(θ)1/2exp{12σ2ri,jkTΞi,jk(θ)1ri,jk}f_{i,jk}(y_{ij}, y_{ik}; \beta, \theta, \sigma^2) = (2\pi\sigma^2)^{-1}\,|\Xi_{i,jk}(\theta)|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} r_{i,jk}^\mathrm{T} \Xi_{i,jk}(\theta)^{-1} r_{i,jk} \right\}

and ri,jk=(yijμij,yikμik)Tr_{i,jk} = (y_{ij} - \mu_{ij},\, y_{ik} - \mu_{ik})^\mathrm{T}, μij=Xijβ\mu_{ij} = X_{ij}\beta. The 2×22\times2 covariance submatrix Ξi,jk(θ)\Xi_{i,jk}(\theta) is extracted from Ξi(θ)\Xi_i(\theta).

Survey sampling induces complex patterns of missing data: only a subset of units (and thus pairs) are observed, each with inclusion probability πi,jk\pi_{i,jk}. The design-weighted composite loglikelihood is

^P(β,θ,σ2)=i=1n1j<kRijRikπi,jki,jk(β,θ,σ2),\hat\ell_P(\beta, \theta, \sigma^2) = \sum_{i=1}^{n_1} \sum_{j < k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}}\,\ell_{i,jk}(\beta, \theta, \sigma^2),

with Rij=1R_{ij} = 1 for sampled units.

3. Statistical Estimation via Composite Scores

Unbiased estimating equations are derived by differentiating ^P\hat\ell_P with respect to model parameters:

  • Fixed effects (β\beta):

Uβ(β,θ,σ2)=i,j<kRijRikπijkXi,jkTΞi,jk(θ)1ri,jk/σ2=0U_\beta(\beta, \theta, \sigma^2) = \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{ijk}} X_{i,jk}^\mathrm{T} \Xi_{i,jk}(\theta)^{-1} r_{i,jk} / \sigma^2 = 0

Solving Uβ=0U_\beta = 0 for fixed (θ,σ2)(\theta, \sigma^2) yields the generalized least-squares estimator.

  • Residual variance (σ2\sigma^2):

Uσ2=i,j<kRijRikπijk[12σ2+ri,jkTΞi,jk1ri,jk2σ4]=0U_{\sigma^2} = \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{ijk}} \left[ -\frac{1}{2\sigma^2} + \frac{r_{i,jk}^\mathrm{T} \Xi_{i,jk}^{-1} r_{i,jk}}{2\sigma^4} \right] = 0

  • Variance parameters (θ\theta):

Uθ=i,j<kRijRikπijk[12tr(Ξi,jk1Ξi,jk/θ)+12σ2ri,jkTΞi,jk1(Ξi,jkθ)Ξi,jk1ri,jk]=0U_{\theta_\ell} = \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{ijk}} \left[ -\frac{1}{2} \operatorname{tr}(\Xi_{i,jk}^{-1} \partial \Xi_{i,jk}/\partial\theta_\ell) + \frac{1}{2\sigma^2} r_{i,jk}^\mathrm{T} \Xi_{i,jk}^{-1}\left( \frac{\partial \Xi_{i,jk}}{\partial\theta_\ell} \right) \Xi_{i,jk}^{-1} r_{i,jk} \right] = 0

Joint closed-form solutions do not exist; thus, a profiling strategy is employed. For each candidate θ\theta, corresponding profile estimators β~(θ)\tilde\beta(\theta) and σ~2(θ)\tilde\sigma^2(\theta) are obtained by maximizing ^P\hat\ell_P conditionally. The profile deviance is defined as

d^(θ)=2^P(β~(θ),θ,σ~2(θ))=2N^Plog(2πσ~2(θ))+i,j<kRijRikπijklogΞi,jk(θ).\hat d(\theta) = -2 \hat\ell_P(\tilde\beta(\theta), \theta, \tilde\sigma^2(\theta)) = 2\hat N_P \log(2\pi\tilde\sigma^2(\theta)) + \sum_{i, j<k} \frac{R_{ij} R_{ik}}{\pi_{ijk}} \log|\Xi_{i,jk}(\theta)|.

4. Computational Implementation

The R package svylme provides an efficient implementation via the function svy2lme(design, formula). Key algorithmic steps include:

  • Pairwise enumeration: Within each sampled cluster, all observed within-cluster pairs (j<k)(j<k) are enumerated. Inclusion weights πijk\pi_{ijk} are calculated using exact or approximate formulas derived from survey sampling probabilities.
  • Profiling: For each optimizer trial value of θ\theta, the corresponding profiles β~(θ)\tilde\beta(\theta) and σ~2(θ)\tilde\sigma^2(\theta) are computed by applying generalized least squares to 2NP2N_P pseudo-observations (two rows per pair).
  • Optimization: The profile deviance d^(θ)\hat d(\theta) is minimized over θ\theta using Powell’s BOBYQA derivative-free optimizer (R package minqa). Start values are obtained from an unweighted fit using lmer from lme4.
  • Efficient blockwise calculation: Closed-form formulas for determinant and inverse of 2×22\times2 covariance blocks Ξi,jk\Xi_{i,jk} are used for computational efficiency.

5. Variance Estimation and Large-Sample Properties

Under suitable conditions, including a law of large numbers and central limit theorem for weighted sums of pairwise scores, the PCL estimator θ^=(β^,θ^,σ^2)\hat\theta=(\hat\beta, \hat\theta, \hat\sigma^2) is consistent and asymptotically normal:

n1(θ^θ0)dN(0,H1JH1),\sqrt{n_1}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0, H^{-1} J H^{-1}),

where H=E[U/θT]H = -E[\partial U/\partial\theta^\mathrm{T}] (sensitivity) and J=Var(U)J = \operatorname{Var}(U), with U=Ui,jkU = \sum U_{i,jk}. For complex survey designs, sandwich estimators replace HH and JJ with their weighted analogues:

  • H^=i,j<k(RijRik/πijk)(2i,jk/θθT)\hat H = \sum_{i,j<k} (R_{ij} R_{ik}/\pi_{ijk})(-\partial^2 \ell_{i,jk}/\partial\theta\partial\theta^\mathrm{T})
  • J^=\hat J = "with-replacement" PSU-level estimate of Var(i,j<k(R/π)Ui,jk)\operatorname{Var}\left( \sum_{i,j<k} (R/\pi) U_{i,jk} \right).

6. Efficiency, Robustness, and Comparative Properties

Simulations demonstrate the following empirical properties:

  • Under noninformative sampling, the PCL estimator is nearly unbiased for β\beta, σ2\sigma^2, and θ\theta, but is less efficient than full maximum likelihood and stagewise pseudolikelihood estimators with weight scaling. The efficiency loss is most pronounced for variance components (e.g., approximately 2–3 times larger standard errors for random intercept variance in moderate cluster sizes); as cluster sizes decrease, this inefficiency diminishes, with PCL equaling maximum likelihood efficiency when cluster size mi=2m_i=2.
  • Under strongly informative sampling (where unit inclusion depends on cluster-level effects), PCL estimators remain unbiased, while stagewise pseudolikelihood—regardless of weight scaling—can display substantial bias in both fixed effects (β\beta) and variance components, unless clusters are extremely large.

This approach trades statistical efficiency (especially for variance parameters) for robustness to informative sampling and is suitable for general multistage survey designs (Lumley et al., 2023).

7. Applications and Extensions

The PCL methodology is implemented in the svylme R package, supporting the analysis of two-level linear mixed models applied to complex survey data such as educational assessments (e.g., PISA). The method's robustness under complex, informative sampling extends its applicability to a broad class of health and social science surveys featuring multistage cluster designs. The approach does not require large clusters, facilitating estimation in practical settings where cluster sizes are moderate or where only a limited number of within-cluster units are observed.

A plausible implication is that future extensions may generalize PCL methods to multilevel structures beyond two levels or to other classes of generalized linear mixed models, building on the computational strategies and robustness guarantees of Rao's approach.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Rao's Mixed-Effects Model.