Collapsed Gibbs Samplers
- Collapsed Gibbs samplers are MCMC methods that integrate out selected parameters, reducing the state space to improve mixing and computational efficiency in Bayesian models.
- They leverage analytical marginalization in models with conjugacy, as exemplified in applications like LDA and hierarchical mixture modeling.
- Practical variants, including blocked, partially collapsed, and hybrid approaches, balance computational cost with statistical accuracy in complex hierarchical setups.
A collapsed Gibbs sampler is a Markov chain Monte Carlo (MCMC) method in which one or more variables (“nuisance” or “fast-mixing” parameters) are analytically integrated out from the model, and the Markov chain is run on the lower-dimensional “collapsed” state space. Removing variables that are highly correlated with the remaining ones can substantially improve mixing and reduce autocorrelation. Collapsed Gibbs samplers are foundational for modern Bayesian inference in directed graphical models, with prominent applications in Latent Dirichlet Allocation (LDA) and mixture modeling. This article surveys the mathematical foundations, transition kernels, implementation strategies, computational and convergence properties, and principal variants of collapsed Gibbs samplers, with a particular emphasis on models relevant for high-dimensional and hierarchical Bayesian analysis.
1. Mathematical Formulation and Derivation
Let $x = (x_1, \dots, x_d)$ be a $d$-dimensional random vector on state space $\mathcal{X}$ with a target joint distribution $\pi(x)$. The standard Gibbs sampler iteratively samples each coordinate $x_i$ from its full conditional $\pi(x_i \mid x_{-i})$, cycling through all components. In a collapsed Gibbs sampler, a subset of variables, say $x_C$, is integrated out, yielding a marginal (collapsed) distribution $\pi(x_{-C}) = \int \pi(x)\, dx_C$ for $x_{-C}$. The transition kernel for the collapsed chain updates each block $x_i$ (for $i \notin C$) given the rest, but using the marginal conditional $\pi(x_i \mid x_{-(C \cup \{i\})})$.
In hierarchical models, such as LDA, the collapsed Gibbs sampler exploits conjugacy: e.g., integrating out Dirichlet-multinomial parameters enables analytic computation of these conditionals in terms of sufficient statistics.
A prototypical example: in LDA, integrating out the per-document topic proportions $\theta_d$ and topic-word distributions $\phi_k$ yields a conditional for the topic assignment $z_i$,
$$
P(z_i = k \mid z_{-i}, w) \;\propto\; \frac{n^{-i}_{k,w_i} + \beta}{n^{-i}_{k} + V\beta}\,\bigl(n^{-i}_{d,k} + \alpha\bigr),
$$
where $n^{-i}_{k,w_i}$ is the count of word $w_i$ assigned to topic $k$ (excluding position $i$), $n^{-i}_{d,k}$ is the document-topic count, $n^{-i}_{k}$ is the total token count for topic $k$, $V$ is the vocabulary size, and $\alpha$, $\beta$ are the Dirichlet hyperparameters (Welling et al., 2012). This analytical marginalization is only tractable when conditional conjugacy or tractable integration applies.
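As an illustration, the collapsed conditional above can be computed directly from count arrays. The sketch below is minimal and for exposition only; the bookkeeping names `n_kw`, `n_dk`, `n_k` are assumptions, not taken from the cited papers:

```python
import numpy as np

def topic_conditional(w, d, n_kw, n_dk, n_k, alpha, beta):
    """Collapsed conditional P(z_i = k | z_{-i}, w) for one token with word id w
    in document d; the count arrays must already exclude the token being resampled."""
    V = n_kw.shape[1]  # vocabulary size
    # Smoothed word probability under each topic, times the document's topic affinity.
    p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
    return p / p.sum()  # normalize over the K topics
```

Each factor mirrors one term of the formula: the ratio is the smoothed probability of the word under each topic, and the final factor is the document's (smoothed) affinity for that topic.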
2. Transition Operators, Convergence, and Spectral Properties
Collapsed Gibbs samplers are Markov chains whose conditional-update operators act as orthogonal projections on the Hilbert space $L^2(\pi)$. Chains with a spectral gap, quantified as $1 - \rho(P - \Pi)$ where $\rho$ denotes the spectral radius, $P$ the transition operator, and $\Pi$ the orthogonal projection onto constant functions, enjoy geometric ergodicity. The inheritance of a spectral gap in collapsed and blocked variants is governed by generalized solidarity principles: every cycle or mixture of blockwise or collapsed Gibbs projections inherits a spectral gap if and only if the full uncollapsed Gibbs operator does (Mak et al., 11 Jan 2026). Explicitly, for any collection of block projections, all update orders and mixtures possess a gap if one does.
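To make the spectral picture concrete, the following sketch builds the systematic-scan Gibbs transition matrix for a toy two-variable discrete target and computes its spectral gap numerically. The joint table is an illustrative assumption, not from the cited work:

```python
import numpy as np

# Toy joint distribution pi(x, y) over {0,1} x {0,1} with strong positive correlation.
pi = np.array([[0.4, 0.1],
               [0.1, 0.4]])

states = [(x, y) for x in range(2) for y in range(2)]
n = len(states)
Px = np.zeros((n, n))  # kernel that resamples x given y
Py = np.zeros((n, n))  # kernel that resamples y given x
for i, (x, y) in enumerate(states):
    for j, (x2, y2) in enumerate(states):
        if y2 == y:
            Px[i, j] = pi[x2, y] / pi[:, y].sum()
        if x2 == x:
            Py[i, j] = pi[x, y2] / pi[x, :].sum()

P = Px @ Py  # one systematic-scan sweep: update x, then y
eigs = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
gap = 1.0 - eigs[1]  # spectral gap of the sweep
```

Collapsing $y$ out of this two-variable chain would leave i.i.d. draws from the marginal of $x$ (a gap of 1), whereas the uncollapsed sweep above mixes more slowly whenever $x$ and $y$ are strongly correlated.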
However, geometric ergodicity of collapsed variants is not strictly monotone between different blockings. There exist models where one blocking is geometrically ergodic while another is not, even though both are collapses of the same full Gibbs sampler. For instance, a two-block collapsed sampler in (U,W) may have a spectral gap while one in (V,W) lacks it, depending on the posterior structure (Mak et al., 11 Jan 2026).
3. Implementation Algorithms and Practical Variants
Standard Collapsed Gibbs Algorithm
A basic collapsed Gibbs sampler proceeds as:
- Initialize assignments.
- For each coordinate $i$: a) Remove the current value $x_i$ from the sufficient statistics. b) Compute the collapsed conditional $\pi(x_i \mid x_{-i})$, with the collapsed parameters integrated out. c) Sample a new value for $x_i$. d) Update the sufficient statistics.
- Repeat for multiple passes.
In LDA, the iteration over tokens costs $O(NK)$ per pass, where $N$ is the number of tokens and $K$ the number of topics (Welling et al., 2012).
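The loop above, specialized to LDA, can be sketched as follows. This is a minimal reference implementation for exposition, not an optimized sampler; the hyperparameter defaults are illustrative:

```python
import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampler for LDA with theta and phi integrated out.
    docs: list of lists of word ids in [0, V). Returns assignments and counts."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))  # document-topic counts
    n_kw = np.zeros((K, V))          # topic-word counts
    n_k = np.zeros(K)                # total tokens per topic
    z = []
    for d, doc in enumerate(docs):   # random initialization of assignments
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):          # each pass costs O(N * K)
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # a) remove the token from the sufficient statistics
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # b) collapsed conditional over topics
                p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
                # c) sample a new topic
                k = rng.choice(K, p=p / p.sum())
                # d) add the token back under the new assignment
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kw, n_k
```

Only the count arrays are stored; $\theta$ and $\phi$ never appear explicitly, which is exactly what collapsing buys.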
Blocked and Partially Collapsed Extensions
To improve mixing, nontrivial blocking strategies group dependent variables and jointly update them. For LDA, this can mean jointly reassigning all tokens of a given word type within a document, leading to higher-dimensional but more efficient transitions (Zhang et al., 2016). Two exact algorithms, backward simulation and nested simulation, sample such blocks efficiently.
Partial collapsing integrates out only a subset of parameters (e.g., the document-topic proportions $\theta$ but not the topic-word distributions $\phi$), enabling parallel and scalable variants with minimal loss in statistical efficiency (Magnusson et al., 2015). The empirical increase in autocorrelation (inefficiency) from partial collapsing is modest (IF ratios of 1.04–1.48), while parallelization yields substantial speedup.
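A hypothetical sketch of one partially collapsed sweep for LDA, in which the topic-word distributions are drawn explicitly so that the per-document updates decouple and can run in parallel. The structure and names are illustrative assumptions, not the exact algorithm of the cited paper:

```python
import numpy as np

def partially_collapsed_step(docs, z, n_dk, n_kw, alpha, beta, rng):
    """One sweep of a partially collapsed LDA sampler: theta is integrated out,
    but phi (topic-word) is sampled explicitly, decoupling the documents."""
    K, V = n_kw.shape
    # Draw phi | z from its conjugate Dirichlet posterior.
    phi = np.array([rng.dirichlet(beta + n_kw[k]) for k in range(K)])
    n_kw[:] = 0  # rebuild topic-word counts as tokens are reassigned
    for d, doc in enumerate(docs):  # each document depends only on phi
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] -= 1
            # theta-collapsed conditional given the sampled phi
            p = phi[:, w] * (n_dk[d] + alpha)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            n_dk[d, k] += 1
            n_kw[k, w] += 1
    return phi
```

Because each document's update reads only `phi` and its own counts, the inner loop over documents is embarrassingly parallel.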
For models without conjugacy, approximate collapsed Gibbs sampling via expectation propagation (EP) is feasible: intractable collapsed integrals are replaced by tractable EP approximations, yielding efficient samplers with mixing nearly identical to the exact collapsed variant (Aicher et al., 2018).
Partially Collapsed Gibbs with MH Steps
When full collapsed sampling is infeasible (e.g., non-analytic conditionals), partially collapsed Gibbs (PCG) methods combine exact marginalizations (for tractable components) with standard or Metropolis–Hastings (MH) updates for others. The reduce–permute–trim framework ensures correct stationary distributions when MH steps are used, enforcing that no MH step follows a collapsed removal of variables critical for the MH acceptance probability. Improper use of MH within PCG invalidates stationarity, making careful ordering essential (Dyk et al., 2013).
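When a collapsed conditional has no closed form, one such MH update (here with a plain random-walk proposal, an illustrative choice) might look like:

```python
import numpy as np

def mh_within_gibbs_step(x, log_cond, rng, scale=0.5):
    """One random-walk Metropolis-Hastings update for a coordinate whose
    (possibly partially collapsed) conditional density log_cond is known
    only up to a normalizing constant."""
    prop = x + scale * rng.normal()           # symmetric proposal
    log_alpha = log_cond(prop) - log_cond(x)  # MH log-acceptance ratio
    if np.log(rng.uniform()) < log_alpha:
        return prop                           # accept the move
    return x                                  # reject: keep the current value
```

Per the reduce-permute-trim caveat above, such a step must not be placed immediately after a collapse that removes variables its acceptance ratio depends on, or stationarity is lost.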
4. Theoretical Properties: Convergence and Invariant Measures
The convergence of collapsed and partially-collapsed Gibbs samplers can be analyzed via Markov operator theory or by leveraging “iterative conditional replacement” (ICR), which interprets each step as a closed-form I-projection (minimization of KL divergence) onto sets with fixed conditionals (Kuo et al., 2024). For compatible conditionals (arising from a valid joint), ICR exhibits strict contraction in KL and guarantees convergence in total variation. For pseudo-Gibbs scenarios with incompatible conditionals, ICR still converges but can yield multiple mutually stationary distributions.
Inheritance of a spectral gap across cycles and mixtures of blocked/collapsed Gibbs steps is universal whenever the full chain has a positive gap (Mak et al., 11 Jan 2026). However, the choice of blocking or collapse can critically impact mixing: while any such design is safeguarded from destroying a spectral gap present in the full chain, distinct blockings can differ in geometric ergodicity, and thus, empirical and theoretical checks are necessary for advanced models.
5. Computational Efficiency, Sparsity, and Parallelism
The primary computational burden in collapsed Gibbs sampling is the per-iteration cost, which, for single-site samplers in LDA, is $O(NK)$. Exploitation of model sparsity (few active topics per document, rare word-document pairs) enables substantial acceleration using sparse-count updates, alias sampling, and parallel schemes (Magnusson et al., 2015). Blocked collapsed Gibbs samplers with efficient block sampling (especially low-cost per-path nested simulation) achieve both higher mixing and lower wall-clock time in large topic models (Zhang et al., 2016). Partially collapsed samplers further facilitate parallelization by exposing conditional independencies among blocks.
Pure variational Bayes provides lower-variance but biased estimates, especially for rare features, while collapsed Gibbs ensures unbiasedness at the cost of higher computational variance. Hybrid variational/collapsed algorithms, which sample low-count tokens and update high-count tokens variationally, achieve nearly optimal perplexity without computational overhead, as hybridization corrects for the bias introduced by variational methods in sparse regimes (Welling et al., 2012).
6. Design Considerations, Practical Guidelines, and Limitations
Practical application of collapsed Gibbs sampling depends on several considerations:
- When to collapse: Collapsing is beneficial when conjugacy or tractable integration is present and posterior correlations between variables are strong.
- Block and update order: Permissible cycles or update orders must be ensured for theoretical guarantees, particularly in partially collapsed schemes; ordering can affect convergence if conditionals are incompatible.
- Computational trade-off: Analytical marginalization should be balanced against the cost of integration; when this is costly or intractable, approximate strategies (EP, partial collapsing, MH-within-Gibbs) should be deployed (Aicher et al., 2018, Dyk et al., 2013).
- Scalability: For large models (many topics, large corpora), leverage sparsity and parallelism in partially or blocked collapsed samplers to balance statistical efficiency against speed (Magnusson et al., 2015, Zhang et al., 2016).
- Hybridization: For highly sparse data (rare features), hybrid variational/collapsed methods yield accuracy gains at little or no computational cost (Welling et al., 2012).
- Spectral and convergence properties: While collapsing cannot destroy a gap present in the full sampler, distinct blocking schemes differ in ergodicity and mixing; thus, sampler design must consider both model structure and algorithmic properties (Mak et al., 11 Jan 2026).
7. Applications and Empirical Performance
Collapsed Gibbs samplers are the de facto standard for Bayesian inference in latent variable models where conjugacy is present, such as LDA, hierarchical Bayesian mixtures, and finite mixture models (Welling et al., 2012, Magnusson et al., 2015). Empirically, fully collapsed and blocked variants reach lower perplexity and higher effective sample size (ESS) compared to pure variational or non-collapsed Gibbs. Partially collapsed and hybrid Gibbs-variational schemes combine parallel scalability with excellent statistical efficiency and predictive accuracy (Welling et al., 2012, Magnusson et al., 2015).
Approximate collapsed strategies, such as expectation propagation-collapsed Gibbs for nonconjugate mixtures and time series, achieve the mixing benefits of full collapse with greatly reduced computational cost when exact integration is not possible (Aicher et al., 2018). These developments ensure that collapsed and partially collapsed Gibbs samplers remain a critical tool in high-dimensional Bayesian inference and large-scale probabilistic modeling.