Rao's Mixed-Effects Model for Complex Surveys

Updated 28 January 2026

Rao's Mixed-Effects Model is a two-level linear mixed model estimated via pairwise composite likelihood, yielding design-consistent results for survey data with clusters.
The method avoids high-dimensional integration and whole-cluster likelihood terms by summing bivariate likelihood contributions over within-cluster pairs, trading some efficiency for robustness in estimating fixed and random effects.
The approach is implemented in the R package svylme, making it practical for educational, health, and social science surveys with complex sampling designs.

Rao's Mixed-Effects Model refers to the class of two-level linear mixed models estimated using the pairwise composite likelihood (PCL) approach developed by Rao and co-workers for complex survey data. This methodology yields a general, design-consistent estimator for hierarchical linear models when the model clusters coincide with survey design clusters, without requiring cluster sizes to grow large. Rao's approach offers robust inference for both fixed and random effects under informative and complex sampling, at the cost of some loss in statistical efficiency, particularly for variance components (Lumley et al., 2023).

1. Formulation of the Two-Level Linear Mixed Model

The classical two-level linear mixed model is defined as

$Y_{ij} = X_{ij}\,\beta + Z_{ij}\,b_{i} + \varepsilon_{ij},$

where $i=1,\ldots,N_1$ indexes the $N_1$ population clusters (e.g., schools), of which $n_1$ are sampled, and $j=1,\ldots,m_i$ indexes units within cluster $i$. The random effects follow $b_i\sim N(0, G)$, the residuals are independent with $\varepsilon_{ij}\sim N(0, \sigma^2)$, and $\operatorname{Cov}(b_i, \varepsilon_{ij})=0$. Here $X_{ij}$ is a $1\times p$ row of fixed-effects covariates and $Z_{ij}$ is a $1\times q$ row of random-effects covariates. The full parameter vector is $\theta^* = (\beta, \sigma^2, \theta)$, where $G = \sigma^2 V(\theta)$ parametrizes the random-effects covariance matrix through the relative covariance $V(\theta)$.

Marginalizing over random effects, the within-cluster response vector $Y_i = (Y_{i1},\ldots,Y_{i,m_i})^\mathrm{T}$ has distribution

$Y_i \sim N(X_i\,\beta,\,\sigma^2\,\Xi_i(\theta)),\quad \Xi_i(\theta) = I_{m_i} + Z_i V(\theta)Z_i^\mathrm{T},$

where $X_i$ ($m_i\times p$) and $Z_i$ ($m_i\times q$) stack the rows $X_{ij}$ and $Z_{ij}$ for the units of cluster $i$.
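As a concrete check of the marginal covariance, the minimal Python sketch below builds $\Xi_i(\theta) = I_{m_i} + Z_i V(\theta) Z_i^\mathrm{T}$ from nested lists and extracts the $2\times2$ block for one pair of units. The helper names `xi_matrix` and `pair_block` are hypothetical and not part of svylme; $V(\theta)$ is supplied directly as a $q\times q$ matrix rather than built from any particular parametrization.

```python
# Hypothetical helpers for illustration; not part of svylme.


def xi_matrix(Z, V):
    """Return Xi_i = I + Z V Z^T for an m x q matrix Z and a q x q matrix V (nested lists)."""
    m, q = len(Z), len(V)
    # ZV[j][b] = sum_a Z[j][a] * V[a][b]
    ZV = [[sum(Z[j][a] * V[a][b] for a in range(q)) for b in range(q)] for j in range(m)]
    # Xi[j][k] = 1{j == k} + sum_b ZV[j][b] * Z[k][b]
    return [[(1.0 if j == k else 0.0) + sum(ZV[j][b] * Z[k][b] for b in range(q))
             for k in range(m)]
            for j in range(m)]


def pair_block(Xi, j, k):
    """Extract the 2 x 2 submatrix Xi_{i,jk} for units j and k."""
    return [[Xi[j][j], Xi[j][k]], [Xi[k][j], Xi[k][k]]]


# Random intercept (q = 1, Z_ij = 1) with V(theta) = tau = 0.5, cluster of size 3
Xi = xi_matrix([[1.0], [1.0], [1.0]], [[0.5]])
print(pair_block(Xi, 0, 2))  # [[1.5, 0.5], [0.5, 1.5]]
```

In the random-intercept case ($q=1$, $Z_{ij}=1$, $V(\theta)=\tau$) every diagonal entry of $\Xi_i$ is $1+\tau$ and every off-diagonal entry is $\tau$, so all pair blocks are identical.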

2. Pairwise Composite Likelihood Construction

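For general mixed models, full maximum likelihood involves high-dimensional integration over the random effects. Even for the linear model above, where the marginal distribution is available in closed form, the log-likelihood of a cluster depends on all of its units jointly through $\Xi_i(\theta)$, so it cannot be estimated by simply weighting sampled units. Rao's PCL approach instead uses a sum of bivariate log-densities over all within-cluster pairs: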

$\ell_P(\beta, \theta, \sigma^2) = \sum_{i=1}^{N_1} \sum_{1 \leq j < k \leq m_i} \ell_{i,jk}(\beta, \theta, \sigma^2),$

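where $\ell_{i,jk} = \log f_{i,jk}$ and $f_{i,jk}$ is the bivariate normal density of $(y_{ij}, y_{ik})$: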

$f_{i,jk}(y_{ij}, y_{ik}; \beta, \theta, \sigma^2) = (2\pi\sigma^2)^{-1}\,|\Xi_{i,jk}(\theta)|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} r_{i,jk}^\mathrm{T} \Xi_{i,jk}(\theta)^{-1} r_{i,jk} \right\}$

and $r_{i,jk} = (y_{ij} - \mu_{ij},\, y_{ik} - \mu_{ik})^\mathrm{T}$ with $\mu_{ij} = X_{ij}\beta$. The $2\times2$ matrix $\Xi_{i,jk}(\theta)$ is the submatrix of $\Xi_i(\theta)$ for units $j$ and $k$, so the covariance matrix of the pair is $\sigma^2\,\Xi_{i,jk}(\theta)$.
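Because each term involves only a $2\times2$ block, $\ell_{i,jk}$ can be evaluated with the closed-form determinant and inverse, without general matrix routines. The following is a minimal Python sketch that assumes the block is symmetric; `pair_loglik` is a hypothetical helper name.

```python
# Hypothetical helper for illustration; not part of svylme.
import math


def pair_loglik(y2, mu2, Xi2, sigma2):
    """log f_{i,jk}: bivariate normal log-density with covariance sigma2 * Xi2.

    Xi2 is assumed symmetric, so only Xi2[0][1] is used for the off-diagonal.
    """
    a, b, c = Xi2[0][0], Xi2[0][1], Xi2[1][1]
    det = a * c - b * b                      # |Xi_{i,jk}| in closed form
    r0, r1 = y2[0] - mu2[0], y2[1] - mu2[1]  # entries of r_{i,jk}
    # r^T Xi^{-1} r using Xi^{-1} = (1/det) [[c, -b], [-b, a]]
    quad = (c * r0 * r0 - 2.0 * b * r0 * r1 + a * r1 * r1) / det
    return -math.log(2.0 * math.pi * sigma2) - 0.5 * math.log(det) - quad / (2.0 * sigma2)


# Random-intercept block with tau = 0.5: Xi_{i,jk} = [[1.5, 0.5], [0.5, 1.5]]
print(pair_loglik([1.0, 0.4], [0.8, 0.8], [[1.5, 0.5], [0.5, 1.5]], sigma2=2.0))
```

With $\Xi_{i,jk} = I_2$ the function reduces to the sum of two univariate normal log-densities, which is a convenient sanity check.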

Survey sampling induces complex patterns of missing data: only a subset of units, and hence of pairs, is observed. Let $\pi_{i,jk}$ denote the probability that units $j$ and $k$ of cluster $i$ are both sampled. The design-weighted composite log-likelihood is

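$\hat\ell_P(\beta, \theta, \sigma^2) = \sum_{i=1}^{N_1} \sum_{1 \leq j < k \leq m_i} \frac{R_{ij} R_{ik}}{\pi_{i,jk}}\,\ell_{i,jk}(\beta, \theta, \sigma^2),$

where $R_{ij} = 1$ if unit $j$ of cluster $i$ is sampled and $R_{ij} = 0$ otherwise, so only pairs with both units in the sample (and hence in one of the $n_1$ sampled clusters) contribute. Each weighted sum is design-unbiased for the corresponding census sum in $\ell_P$.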
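To illustrate the weighting, the sketch below evaluates $\hat\ell_P$ for a random-intercept model ($Z_{ij}=1$, $V(\theta)=\tau$). It assumes a hypothetical two-stage design in which cluster $i$ is selected with probability $p_i$ and then $n_i$ of its $M_i$ units are drawn by simple random sampling without replacement, so that $\pi_{i,jk} = p_i\, n_i(n_i-1)/\{M_i(M_i-1)\}$. Real designs need their own pairwise probabilities, and all function names here are illustrative rather than svylme's API.

```python
# Hypothetical helpers for illustration; not part of svylme.
import math
from itertools import combinations


def pair_loglik(y2, mu2, Xi2, sigma2):
    """Bivariate normal log-density with covariance sigma2 * Xi2 (closed-form 2 x 2)."""
    a, b, c = Xi2[0][0], Xi2[0][1], Xi2[1][1]
    det = a * c - b * b
    r0, r1 = y2[0] - mu2[0], y2[1] - mu2[1]
    quad = (c * r0 * r0 - 2.0 * b * r0 * r1 + a * r1 * r1) / det
    return -math.log(2.0 * math.pi * sigma2) - 0.5 * math.log(det) - quad / (2.0 * sigma2)


def pair_inclusion_prob(p_cluster, n_i, M_i):
    """pi_{i,jk} under an ASSUMED two-stage design: cluster drawn with probability
    p_cluster, then n_i of its M_i units drawn by SRS without replacement."""
    return p_cluster * n_i * (n_i - 1) / (M_i * (M_i - 1))


def weighted_pairwise_loglik(clusters, beta, tau, sigma2):
    """hat ell_P for a random-intercept model (Z_ij = 1, V(theta) = tau).

    clusters: sampled clusters only (R_ij = 0 elsewhere), each a dict with keys
    'y' (sampled responses), 'X' (covariate rows), 'p' and 'M' (design quantities).
    """
    Xi2 = [[1.0 + tau, tau], [tau, 1.0 + tau]]  # same 2 x 2 block for every pair
    total = 0.0
    for cl in clusters:
        n_i = len(cl["y"])
        w = 1.0 / pair_inclusion_prob(cl["p"], n_i, cl["M"])  # R_ij R_ik / pi_{i,jk}
        for j, k in combinations(range(n_i), 2):               # every sampled pair j < k
            mu2 = [sum(bb * x for bb, x in zip(beta, cl["X"][t])) for t in (j, k)]
            total += w * pair_loglik([cl["y"][j], cl["y"][k]], mu2, Xi2, sigma2)
    return total


clusters = [
    {"y": [3.1, 2.4, 3.8], "X": [[1.0, 0.2], [1.0, -0.1], [1.0, 0.5]], "p": 0.1, "M": 20},
    {"y": [1.9, 2.2], "X": [[1.0, 0.0], [1.0, 0.3]], "p": 0.1, "M": 12},
]
print(weighted_pairwise_loglik(clusters, beta=[2.5, 1.0], tau=0.4, sigma2=0.5))
```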

3. Statistical Estimation via Composite Scores

Unbiased estimating equations are derived by differentiating $\hat\ell_P$ with respect to the model parameters, where $X_{i,jk}$ denotes the $2\times p$ matrix with rows $X_{ij}$ and $X_{ik}$:

Fixed effects ($\beta$):

$U_\beta(\beta, \theta, \sigma^2) = \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} X_{i,jk}^\mathrm{T} \Xi_{i,jk}(\theta)^{-1} r_{i,jk} / \sigma^2 = 0$

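Because $\sigma^2$ enters $U_\beta$ only as a common factor, solving $U_\beta = 0$ for fixed $\theta$ yields a design-weighted generalized least-squares estimator that does not depend on $\sigma^2$:

$\tilde\beta(\theta) = \Big( \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} X_{i,jk}^\mathrm{T} \Xi_{i,jk}(\theta)^{-1} X_{i,jk} \Big)^{-1} \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} X_{i,jk}^\mathrm{T} \Xi_{i,jk}(\theta)^{-1} y_{i,jk},$

where $y_{i,jk} = (y_{ij}, y_{ik})^\mathrm{T}$.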
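The weighted GLS step can be sketched directly from the estimating equation: accumulate $\sum (R_{ij}R_{ik}/\pi_{i,jk}) X_{i,jk}^\mathrm{T}\Xi_{i,jk}^{-1}X_{i,jk}$ and the matching right-hand side, solve the $p\times p$ system, and confirm that $U_\beta$ vanishes at the solution. This is a minimal pure-Python sketch with hypothetical helper names; each pair is passed as a tuple `(w, X2, y2, Xi2)` with `w` the pairwise weight, and the small Gauss-Jordan solver is only meant for small $p$.

```python
# Hypothetical helpers for illustration; not part of svylme.


def inv2(A):
    """Closed-form inverse of a 2 x 2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def solve(A, c):
    """Solve A x = c by Gauss-Jordan elimination with partial pivoting (small p only)."""
    n = len(c)
    M = [list(row) + [c[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * z for x, z in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def weighted_gls(pairs):
    """tilde beta(theta) from pairs (w, X2, y2, Xi2) with w = R_ij R_ik / pi_{i,jk}."""
    p = len(pairs[0][1][0])
    A = [[0.0] * p for _ in range(p)]  # sum w X^T Xi^{-1} X
    c = [0.0] * p                      # sum w X^T Xi^{-1} y
    for w, X2, y2, Xi2 in pairs:
        W = inv2(Xi2)
        WX = [[sum(W[s][t] * X2[t][a] for t in range(2)) for a in range(p)] for s in range(2)]
        Wy = [sum(W[s][t] * y2[t] for t in range(2)) for s in range(2)]
        for a in range(p):
            c[a] += w * sum(X2[s][a] * Wy[s] for s in range(2))
            for b in range(p):
                A[a][b] += w * sum(X2[s][a] * WX[s][b] for s in range(2))
    return solve(A, c)


def score_beta(pairs, beta, sigma2=1.0):
    """U_beta = sum w X^T Xi^{-1} r / sigma2, which is zero at the weighted GLS solution."""
    p = len(beta)
    U = [0.0] * p
    for w, X2, y2, Xi2 in pairs:
        W = inv2(Xi2)
        r = [y2[s] - sum(X2[s][a] * beta[a] for a in range(p)) for s in range(2)]
        Wr = [sum(W[s][t] * r[t] for t in range(2)) for s in range(2)]
        for a in range(p):
            U[a] += w * sum(X2[s][a] * Wr[s] for s in range(2)) / sigma2
    return U


Xi = [[1.4, 0.4], [0.4, 1.4]]  # random-intercept block with tau = 0.4
pairs = [
    (5.0, [[1.0, 0.0], [1.0, 1.0]], [1.2, 2.7], Xi),
    (5.0, [[1.0, 0.0], [1.0, 2.0]], [1.2, 5.3], Xi),
    (2.0, [[1.0, 1.0], [1.0, 2.0]], [3.1, 4.8], Xi),
    (8.0, [[1.0, 3.0], [1.0, 1.5]], [7.4, 3.9], Xi),
]
beta = weighted_gls(pairs)
print(beta, score_beta(pairs, beta))  # the score is zero up to rounding error
```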

Residual variance ($\sigma^2$):

$U_{\sigma^2} = \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} \left[ -\frac{1}{\sigma^2} + \frac{r_{i,jk}^\mathrm{T} \Xi_{i,jk}^{-1} r_{i,jk}}{2\sigma^4} \right] = 0$

Variance parameters ($\theta$):

$U_{\theta_\ell} = \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} \left[ -\frac{1}{2} \operatorname{tr}(\Xi_{i,jk}^{-1} \partial \Xi_{i,jk}/\partial\theta_\ell) + \frac{1}{2\sigma^2} r_{i,jk}^\mathrm{T} \Xi_{i,jk}^{-1}\left( \frac{\partial \Xi_{i,jk}}{\partial\theta_\ell} \right) \Xi_{i,jk}^{-1} r_{i,jk} \right] = 0$
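The $\theta$ score can be checked against a numerical derivative of a single pair's contribution. Below is a minimal sketch for the random-intercept case, where $\Xi_{i,jk} = I_2 + \tau\mathbf{1}\mathbf{1}^\mathrm{T}$ and $\partial\Xi_{i,jk}/\partial\tau = \mathbf{1}\mathbf{1}^\mathrm{T}$. The helper names are hypothetical, and the design weight $R_{ij}R_{ik}/\pi_{i,jk}$ is omitted because it only multiplies each pair's term.

```python
# Hypothetical helpers for illustration; not part of svylme.
import math


def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def mul2(A, B):
    return [[sum(A[s][u] * B[u][t] for u in range(2)) for t in range(2)] for s in range(2)]


def pair_loglik(r, Xi2, sigma2):
    """ell_{i,jk} as a function of the residual vector r_{i,jk}."""
    a, b, c = Xi2[0][0], Xi2[0][1], Xi2[1][1]
    det = a * c - b * b
    quad = (c * r[0] ** 2 - 2.0 * b * r[0] * r[1] + a * r[1] ** 2) / det
    return -math.log(2.0 * math.pi * sigma2) - 0.5 * math.log(det) - quad / (2.0 * sigma2)


def score_theta(r, Xi2, dXi2, sigma2):
    """One pair's contribution to U_theta, before multiplying by R_ij R_ik / pi_{i,jk}."""
    W = inv2(Xi2)
    WD = mul2(W, dXi2)                # Xi^{-1} dXi/dtheta
    WDW = mul2(WD, W)                 # Xi^{-1} dXi/dtheta Xi^{-1}
    quad = sum(r[s] * WDW[s][t] * r[t] for s in range(2) for t in range(2))
    return -0.5 * (WD[0][0] + WD[1][1]) + quad / (2.0 * sigma2)


def ri_block(tau):
    """Random-intercept pair block I + tau 1 1^T; its derivative in tau is 1 1^T."""
    return [[1.0 + tau, tau], [tau, 1.0 + tau]]


r, tau, sigma2 = [0.7, 1.1], 0.4, 1.3
analytic = score_theta(r, ri_block(tau), [[1.0, 1.0], [1.0, 1.0]], sigma2)
h = 1e-6
numeric = (pair_loglik(r, ri_block(tau + h), sigma2)
           - pair_loglik(r, ri_block(tau - h), sigma2)) / (2.0 * h)
print(analytic, numeric)  # central difference should agree closely with the analytic score
```

Writing $r_{i,jk} = (r_1, r_2)^\mathrm{T}$, the score for this block simplifies to $-1/(1+2\tau) + (r_1+r_2)^2/\{2\sigma^2(1+2\tau)^2\}$, which gives a second check on the matrix computation.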

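No joint closed-form solution exists, so a profiling strategy is used. For each candidate $\theta$, the profile estimators $\tilde\beta(\theta)$ and $\tilde\sigma^2(\theta)$ maximize $\hat\ell_P$ with $\theta$ held fixed. Here $\tilde\beta(\theta)$ is the weighted GLS solution of $U_\beta = 0$, and maximizing over $\sigma^2$ gives

$\tilde\sigma^2(\theta) = \frac{1}{2\hat N_P} \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}}\, \tilde r_{i,jk}^\mathrm{T} \Xi_{i,jk}(\theta)^{-1} \tilde r_{i,jk}, \qquad \hat N_P = \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}},$

where $\tilde r_{i,jk}$ is the residual vector evaluated at $\tilde\beta(\theta)$ and $\hat N_P$ estimates the number of within-cluster pairs in the population. Substituting both into $\hat\ell_P$ gives the profile deviance

$\hat d(\theta) = -2 \hat\ell_P(\tilde\beta(\theta), \theta, \tilde\sigma^2(\theta)) = 2\hat N_P \log(2\pi\tilde\sigma^2(\theta)) + \sum_{i, j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} \log|\Xi_{i,jk}(\theta)| + 2\hat N_P,$

whose final term does not depend on $\theta$.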
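The profiling identities can be verified numerically for an intercept-only random-intercept model, where $\tilde\beta(\theta)$ reduces to a weighted mean. The sketch below computes $\tilde\beta$, $\tilde\sigma^2$, and $\hat d$ at a fixed $\tau$ and checks that $\hat d$ equals $-2\hat\ell_P$ at the profiled values, including the additive constant $2\hat N_P$. The helper names and example weights are hypothetical.

```python
# Hypothetical helpers for illustration; not part of svylme.
import math


def ri_block(tau):
    return [[1.0 + tau, tau], [tau, 1.0 + tau]]


def quad_form(r, Xi2):
    """r^T Xi^{-1} r with the closed-form 2 x 2 inverse."""
    a, b, c = Xi2[0][0], Xi2[0][1], Xi2[1][1]
    return (c * r[0] ** 2 - 2.0 * b * r[0] * r[1] + a * r[1] ** 2) / (a * c - b * b)


def log_det(Xi2):
    return math.log(Xi2[0][0] * Xi2[1][1] - Xi2[0][1] * Xi2[1][0])


def weighted_loglik(pairs, beta0, tau, sigma2):
    """hat ell_P for an intercept-only random-intercept model; pairs = [(w, (y_j, y_k)), ...]."""
    Xi2 = ri_block(tau)
    return sum(w * (-math.log(2.0 * math.pi * sigma2) - 0.5 * log_det(Xi2)
                    - quad_form([y[0] - beta0, y[1] - beta0], Xi2) / (2.0 * sigma2))
               for w, y in pairs)


def profile(pairs, tau):
    """Return (tilde beta0, tilde sigma2, hat d) at a fixed tau."""
    Xi2 = ri_block(tau)
    u = [1.0 / (1.0 + 2.0 * tau)] * 2        # 1^T Xi^{-1} for this block
    beta0 = (sum(w * (u[0] * y[0] + u[1] * y[1]) for w, y in pairs)
             / sum(w * (u[0] + u[1]) for w, _ in pairs))
    n_hat = sum(w for w, _ in pairs)          # hat N_P
    sigma2 = sum(w * quad_form([y[0] - beta0, y[1] - beta0], Xi2)
                 for w, y in pairs) / (2.0 * n_hat)
    # Every pair shares the same block here, so sum w log|Xi| = hat N_P log|Xi|
    dev = 2.0 * n_hat * math.log(2.0 * math.pi * sigma2) + n_hat * log_det(Xi2) + 2.0 * n_hat
    return beta0, sigma2, dev


pairs = [(4.0, (1.2, 0.8)), (4.0, (2.5, 1.9)), (10.0, (0.3, -0.4)), (2.5, (1.1, 1.6))]
beta0, sigma2, dev = profile(pairs, tau=0.6)
print(dev, -2.0 * weighted_loglik(pairs, beta0, 0.6, sigma2))  # equal up to rounding
```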

4. Computational Implementation

The R package svylme implements the method in the function svy2lme, which takes a model formula and a survey design object. Key algorithmic steps include:

Pairwise enumeration: Within each sampled cluster, all pairs of sampled units $(j<k)$ are enumerated. The pairwise inclusion probabilities $\pi_{i,jk}$ are computed exactly, or approximated, from the design's sampling probabilities.
Profiling: For each trial value of $\theta$ proposed by the optimizer, the profile estimates $\tilde\beta(\theta)$ and $\tilde\sigma^2(\theta)$ are computed by weighted generalized least squares on pseudo-observations, with two rows per observed pair.
Optimization: The profile deviance $\hat d(\theta)$ is minimized over $\theta$ using Powell’s BOBYQA derivative-free optimizer (R package minqa). Start values are obtained from an unweighted fit using lmer from lme4.
Efficient blockwise calculation: Closed-form formulas for determinant and inverse of $2\times2$ covariance blocks $\Xi_{i,jk}$ are used for computational efficiency.
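The four steps can be strung together in a small end-to-end sketch for an intercept-only random-intercept model. It is a hedged illustration, not svylme's implementation: a coarse grid plus golden-section search stands in for BOBYQA, the grid replaces lmer start values, the data are simulated without an actual sampling step, and the pairwise inclusion probabilities are arbitrary illustrative constants. Function names are hypothetical.

```python
# Hypothetical end-to-end sketch for illustration; not svylme's implementation.
import math
import random
from itertools import combinations


def profile_deviance(pairs, tau):
    """hat d(tau) for an intercept-only random-intercept model; pairs = [(w, (y_j, y_k)), ...]."""
    a, b = 1.0 + tau, tau                  # pair block Xi_{i,jk} = [[a, b], [b, a]]
    det = a * a - b * b                    # = 1 + 2 tau, closed-form 2 x 2 determinant
    n_hat = sum(w for w, _ in pairs)       # hat N_P
    # For this block 1^T Xi^{-1} has equal entries, so tilde beta0 is a weighted mean.
    beta0 = sum(w * (y0 + y1) for w, (y0, y1) in pairs) / (2.0 * n_hat)
    rss = sum(w * (a * (y0 - beta0) ** 2 - 2.0 * b * (y0 - beta0) * (y1 - beta0)
                   + a * (y1 - beta0) ** 2) / det
              for w, (y0, y1) in pairs)
    sigma2 = rss / (2.0 * n_hat)           # tilde sigma^2(tau)
    return 2.0 * n_hat * math.log(2.0 * math.pi * sigma2) + n_hat * math.log(det) + 2.0 * n_hat


def enumerate_pairs(clusters):
    """Step 1: every sampled pair j < k in each cluster, weighted by 1 / pi_{i,jk}."""
    return [(1.0 / pi_pair, (y[j], y[k]))
            for y, pi_pair in clusters
            for j, k in combinations(range(len(y)), 2)]


def minimize_deviance(pairs, upper=5.0, grid_size=26, tol=1e-8):
    """Steps 2-3: profile at each trial tau; grid search, then golden-section refinement.

    A stand-in for BOBYQA that only handles a single variance parameter.
    """
    def dev(t):
        return profile_deviance(pairs, t)

    grid = [upper * g / (grid_size - 1) for g in range(grid_size)]
    g_best = min(range(grid_size), key=lambda g: dev(grid[g]))
    lo, hi = grid[max(g_best - 1, 0)], grid[min(g_best + 1, grid_size - 1)]
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c, d = hi - phi * (hi - lo), lo + phi * (hi - lo)
        if dev(c) <= dev(d):
            hi = d
        else:
            lo = c
    # Keep the refined point only if it improves on the best grid point
    return min([0.5 * (lo + hi), grid[g_best]], key=dev)


random.seed(2026)
clusters = []
for i in range(30):
    b_i = random.gauss(0.0, 1.0)                    # cluster effect: true tau = 1 when sigma2 = 1
    y = [2.0 + b_i + random.gauss(0.0, 1.0) for _ in range(4)]
    clusters.append((y, 0.02 if i % 2 else 0.05))   # ASSUMED pairwise inclusion probabilities
pairs = enumerate_pairs(clusters)
tau_hat = minimize_deviance(pairs)
print(len(pairs), tau_hat)
```

A one-dimensional search suffices here only because the random-intercept model has a single variance ratio $\tau$; models with several variance parameters need a multivariate optimizer such as BOBYQA.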

5. Variance Estimation and Large-Sample Properties

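Under suitable conditions, including a law of large numbers and a central limit theorem for weighted sums of pairwise scores, the PCL estimator $\hat\theta^* = (\hat\beta, \hat\sigma^2, \hat\theta)$ is consistent and asymptotically normal:

$\sqrt{n_1}\,(\hat\theta^* - \theta^*_0) \xrightarrow{d} N(0, H^{-1} J H^{-1}),$

where $U = \sum_{i,j<k} (R_{ij}R_{ik}/\pi_{i,jk})\,U_{i,jk}$ is the weighted sum of pairwise scores $U_{i,jk} = \partial\ell_{i,jk}/\partial\theta^*$, $H = -\lim_{n_1\to\infty} n_1^{-1} E[\partial U/\partial\theta^{*\mathrm{T}}]$ is the sensitivity matrix, and $J = \lim_{n_1\to\infty} n_1^{-1}\operatorname{Var}(U)$ is the variability matrix. For complex survey designs, the sandwich estimator $\hat H^{-1}\hat J\hat H^{-1}$ of $\operatorname{Var}(\hat\theta^*)$ uses the weighted sample analogues: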

$\hat H = \sum_{i,j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} \left( -\frac{\partial^2 \ell_{i,jk}}{\partial\theta^* \partial\theta^{*\mathrm{T}}} \right)$;
$\hat J$ = the "with-replacement" estimate of $\operatorname{Var}\big( \sum_{i,j<k} (R_{ij}R_{ik}/\pi_{i,jk})\, U_{i,jk} \big)$, computed from PSU-level totals of the weighted pairwise scores.
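For a single fixed-effect parameter the sandwich standard error reduces to $\sqrt{\hat J}/\hat H$. The sketch below computes it for the intercept of an intercept-only random-intercept model with $\tau$ and $\sigma^2$ held fixed, treating clusters as PSUs in a single stratum and using the with-replacement estimator $\hat J = \frac{n}{n-1}\sum_h (u_h - \bar u)^2$ over the totals $u_h$ of the $n$ sampled PSUs. These simplifications and the helper name are assumptions for illustration; the full method uses the joint $\hat H$ and $\hat J$ over $(\beta, \sigma^2, \theta)$.

```python
# Hypothetical helper for illustration; not part of svylme.
import math


def sandwich_se_intercept(psus, beta0, tau, sigma2):
    """Sandwich SE of the intercept in an intercept-only random-intercept model.

    psus: one list per PSU of (w, (y_j, y_k)) pairs, with w = R_ij R_ik / pi_{i,jk}.
    tau and sigma2 are held fixed, so only the beta0 entries of H and J are used.
    """
    u = 1.0 / (1.0 + 2.0 * tau)           # each entry of 1^T Xi^{-1} for the pair block
    H = 0.0                               # hat H: weighted minus second derivative in beta0
    totals = []                           # PSU totals of weighted pairwise scores
    for pairs in psus:
        t = 0.0
        for w, (y0, y1) in pairs:
            t += w * u * ((y0 - beta0) + (y1 - beta0)) / sigma2
            H += w * 2.0 * u / sigma2
        totals.append(t)
    n = len(totals)
    mean = sum(totals) / n
    J = n / (n - 1) * sum((t - mean) ** 2 for t in totals)  # with-replacement estimate
    return math.sqrt(J) / H               # sqrt(H^{-1} J H^{-1}) in the scalar case


psus = [[(1.0, (1.0, 1.0))], [(1.0, (-1.0, -1.0))]]
print(sandwich_se_intercept(psus, beta0=0.0, tau=0.0, sigma2=1.0))  # 1.0
```

Scaling every weight by a common factor multiplies $\hat H$ by that factor and $\sqrt{\hat J}$ by the same factor, so the standard error is unchanged, as expected for design weights.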

6. Efficiency, Robustness, and Comparative Properties

Simulations demonstrate the following empirical properties:

Under noninformative sampling, the PCL estimator is nearly unbiased for $\beta$, $\sigma^2$, and $\theta$, but less efficient than full maximum likelihood and than stagewise pseudolikelihood estimators with weight scaling. The efficiency loss is largest for variance components (e.g., standard errors for the random-intercept variance roughly 2–3 times larger at moderate cluster sizes). It shrinks as cluster sizes decrease, and PCL matches maximum likelihood efficiency when $m_i=2$, because each cluster then contributes a single pair and the pairwise likelihood coincides with the full likelihood.
Under strongly informative sampling (where unit inclusion depends on cluster-level effects), PCL estimators remain unbiased, whereas stagewise pseudolikelihood, with or without weight scaling, can show substantial bias in both the fixed effects ($\beta$) and the variance components unless clusters are extremely large.
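That PCL coincides with maximum likelihood when $m_i=2$ is an exact algebraic fact, which the sketch below checks for a random-intercept model. The full $m_i$-variate log-density uses $|\Xi_i| = 1 + m_i\tau$ and the Sherman-Morrison inverse, while the pairwise version sums generic closed-form $2\times2$ terms. With two units the two agree; with three they do not. Helper names are hypothetical.

```python
# Hypothetical helpers for illustration; not part of svylme.
import math
from itertools import combinations


def full_loglik_ri(r, tau, sigma2):
    """Exact m-variate normal log-density for covariance sigma2 * (I + tau 1 1^T),
    via |Xi| = 1 + m tau and Xi^{-1} = I - tau / (1 + m tau) 1 1^T (Sherman-Morrison)."""
    m = len(r)
    quad = sum(x * x for x in r) - tau * sum(r) ** 2 / (1.0 + m * tau)
    return (-0.5 * m * math.log(2.0 * math.pi * sigma2)
            - 0.5 * math.log(1.0 + m * tau) - quad / (2.0 * sigma2))


def pair_loglik(r0, r1, tau, sigma2):
    """Bivariate term via the generic closed-form 2 x 2 determinant and inverse."""
    a, b = 1.0 + tau, tau
    det = a * a - b * b
    quad = (a * r0 * r0 - 2.0 * b * r0 * r1 + a * r1 * r1) / det
    return -math.log(2.0 * math.pi * sigma2) - 0.5 * math.log(det) - quad / (2.0 * sigma2)


def pairwise_loglik_ri(r, tau, sigma2):
    """Unweighted pairwise log-likelihood of one cluster (sum over j < k)."""
    return sum(pair_loglik(r[j], r[k], tau, sigma2) for j, k in combinations(range(len(r)), 2))


tau, sigma2 = 0.7, 1.2
for r in ([0.5, -1.0], [0.5, -1.0, 2.0]):
    print(len(r), full_loglik_ri(r, tau, sigma2), pairwise_loglik_ri(r, tau, sigma2))
```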

This approach trades statistical efficiency (especially for variance parameters) for robustness to informative sampling and is suitable for general multistage survey designs (Lumley et al., 2023).

7. Applications and Extensions

The PCL methodology is implemented in the svylme R package, supporting the analysis of two-level linear mixed models applied to complex survey data such as educational assessments (e.g., PISA). The method's robustness under complex, informative sampling extends its applicability to a broad class of health and social science surveys featuring multistage cluster designs. The approach does not require large clusters, facilitating estimation in practical settings where cluster sizes are moderate or where only a limited number of within-cluster units are observed.

Possible future extensions include generalizing PCL methods to multilevel structures with more than two levels and to generalized linear mixed models, building on the computational strategies and robustness properties of Rao's approach.


References

Lumley et al. (2023). Linear mixed models for complex survey data: implementing and evaluating pairwise likelihood.
