Regularized Horseshoe Prior

Updated 23 January 2026
  • The regularized horseshoe prior is a global-local continuous shrinkage prior for sparse Bayesian inference that caps coefficient variance with an additional slab scale.
  • It improves regularization by controlling the tail behavior of large signals, stabilizing inference under weak identifiability or heavy-tailed likelihoods.
  • Widely applied in high-dimensional regression and robust models, it bridges spike-and-slab and continuous shrinkage techniques.

The regularized horseshoe prior is a global-local continuous shrinkage prior designed for sparse Bayesian inference. It extends the classical horseshoe by introducing an additional "slab" scale, thereby capping the amount of prior variance any coefficient may accumulate. This modification provides improved control over the regularization applied to large coefficients: it bounds their variance and prevents the arbitrarily large coefficient excursions that can destabilize inference, especially in settings with weak identifiability or heavy-tailed likelihoods (Piironen et al., 2017, Fan et al., 15 Jul 2025). The regularized horseshoe thus moves the original horseshoe's spike-and-infinite-slab architecture to a spike-and-finite-slab regime, offering a continuous relaxation of two-group (spike-and-slab) priors and enhanced practical stability.

1. Definition and Hierarchical Structure

For $p$-dimensional regression or classification—typically the high-dimensional regime $p \gg n$—the regularized horseshoe prior (RHS) is placed on a parameter vector $\theta = (\theta_1, \ldots, \theta_p)$ assumed to be (approximately) sparse (Piironen et al., 2017, Fan et al., 15 Jul 2025):

  • Standard horseshoe:
    • Local shrinkage: $\lambda_j \sim \operatorname{C}^+(0, 1)$
    • Global shrinkage: $\tau \sim \pi(\tau)$ (usually $\operatorname{C}^+(0, \tau_0)$)
    • Coefficient prior: $\theta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \tau^2 \lambda_j^2)$
  • Regularized horseshoe:

    • The prior variance is bounded by a slab scale $c > 0$ via

    $$\tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2},$$

    so

    $$\theta_j \mid \lambda_j, \tau, c \sim \mathcal{N}(0, \tau^2 \tilde{\lambda}_j^2).$$

    • For $c \to \infty$, $\tilde{\lambda}_j \to \lambda_j$ (original horseshoe). For finite $c$, each $\theta_j$ is marginally sub-Gaussian with prior variance at most $c^2$.

  • The slab scale $c$ is typically given a weakly informative inverse-gamma prior $c^2 \sim \operatorname{Inv\text{-}Gamma}(\alpha, \beta)$, for instance $\alpha = 2$, $\beta = 8$ for a Student-$t_4$ slab, or fixed to a large value if domain knowledge allows (Piironen et al., 2017, Fan et al., 15 Jul 2025).

This architecture is algebraically equivalent to multiplying the standard horseshoe prior by a zero-centered Gaussian slab of variance $c^2$ and renormalizing (Piironen et al., 2017).
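
A minimal numpy sketch of this capping behavior (the function name is illustrative, not from the cited papers):

```python
import numpy as np

def regularized_local_scale(lam, tau, c):
    """Slab-capped local scale: lambda_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2)."""
    return np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))

lam = np.array([0.01, 1.0, 100.0, 1e6])
tau, c = 0.1, 2.0
lt = regularized_local_scale(lam, tau, c)

# Small lambda_j: lambda_tilde ~ lambda_j (horseshoe-like shrinkage near zero).
# Large lambda_j: tau * lambda_tilde -> c, so the prior sd of theta_j is capped at c.
print(lt)
print(tau * lt)  # approaches c = 2.0 for the largest entries
```

For any finite $c$, the effective prior standard deviation $\tau\tilde{\lambda}_j$ never exceeds $c$, which is exactly the sub-Gaussian cap described above.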

| Prior | Marginal prior for $\theta_j$ | Regularizes tail? |
|---|---|---|
| Horseshoe | Cauchy-like ($\sim \|\theta\|^{-2}$) | No ($c = \infty$) |
| Reg. horseshoe | Cauchy for small $\theta$, Gaussian for large $\theta$ | Yes (finite $c$) |
| Spike-and-slab | Mixture with finite-variance slab | Yes (finite $c$) |

2. Motivations and Theoretical Rationale

The original horseshoe prior enforces strong shrinkage at zero (via the half-Cauchy local scales), promoting sparsity, but possesses heavy Cauchy-like tails, which means large signals are essentially unregularized. This can be problematic in several circumstances:

  • Weakly identified models: e.g., separation in logistic regression, where the likelihood cannot regularize large coefficients, leading to instability or divergence in posterior sampling (Piironen et al., 2017, Fan et al., 15 Jul 2025).
  • Inference under heavy-tailed errors: Without tail regularization, the horseshoe may deliver over-diffuse or infinite credible intervals for large signals (Fan et al., 15 Jul 2025).
  • Posterior computation and MCMC pathologies: The "infinite slab" induces strongly funnel-shaped posteriors, leading to divergent transitions in Hamiltonian Monte Carlo.

The regularized horseshoe prior caps the tail behavior at scale $c$, providing controlled regularization for large coefficients. It can be viewed as a continuous relaxation of two-group spike-and-slab priors, replacing the infinite-variance slab with a finite one, thus blending the "strong spike, mild slab" features of discrete mixture priors with the computational and modeling advantages of continuous shrinkage (Piironen et al., 2017).

3. Hyperparameter Specification and Defaults

  • Global scale $\tau$: The effective degree of sparsity is controlled via $\tau$, which is ideally set according to a prior guess $p_0$ of the number of relevant coefficients. For standardized predictors and known $\sigma$, a practical default is

$$\tau_0 = \frac{p_0}{p - p_0} \cdot \frac{\sigma}{\sqrt{n}},$$

with $\tau \sim \operatorname{C}^+(0, \tau_0)$ or a half-Student-$t$ prior (Piironen et al., 2017, Bhadra et al., 2019).

  • Slab scale $c$: When tail regularization is required, a weakly informative prior such as $c^2 \sim \operatorname{Inv\text{-}Gamma}(2, 8)$ (equivalently, a slab with 4 degrees of freedom and scale 2) is used, placing prior mass on plausible large effects. Alternatively, $c$ may be fixed from domain knowledge (e.g., "no effect exceeds 5 in magnitude").
  • Local scales $\lambda_j$: Default to half-Cauchy$(0, 1)$; heavier or lighter tails (half-$t_\nu$ with $\nu > 1$) are possible.

If $c \to \infty$, one recovers the original horseshoe; if $c$ is small, the model approaches ordinary ridge-type shrinkage for large coefficients.
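
For concreteness, the $\tau_0$ default above can be computed directly (the numbers here are illustrative, not from the cited studies):

```python
import numpy as np

def tau0_default(p0, p, sigma, n):
    """Default global scale: tau0 = p0 / (p - p0) * sigma / sqrt(n)."""
    return p0 / (p - p0) * sigma / np.sqrt(n)

# e.g. p = 1000 candidate predictors, ~10 expected to be relevant, n = 100, sigma = 1
t0 = tau0_default(p0=10, p=1000, sigma=1.0, n=100)
print(t0)  # 10/990 * 1/10 ≈ 0.00101
```

The small value reflects the prior belief that only about 1% of the coefficients are nonzero, so the global scale must shrink aggressively.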

4. Posterior Computation

Posterior inference for the regularized horseshoe is facilitated by the Gaussian scale-mixture representation:

  • Each $\theta_j$ is conditionally normal given $\lambda_j, \tau, c$.
  • $\lambda_j$ can be represented as a half-Cauchy via a scale mixture of inverse-gammas: $\lambda_j^2 \mid \nu_j \sim \mathrm{IG}(\tfrac{1}{2}, 1/\nu_j)$, $\nu_j \sim \mathrm{IG}(\tfrac{1}{2}, 1)$.
  • The joint posterior is sampled via block Gibbs or hybrid Gibbs–Metropolis updates for the $(\theta, \lambda, \tau, c, \nu)$ parameters (Piironen et al., 2017, Bhattacharya et al., 2021, Fan et al., 15 Jul 2025).

For generalized linear models, the standard data-augmentation techniques (e.g., Polya-Gamma for logistic regression) are utilized in conjunction with the regularized horseshoe hierarchy (Bhadra et al., 2019, Nalenz et al., 2017).

Geometric ergodicity of the block-Gibbs sampler has been established under the mild condition that the prior on $\tau$ has a finite negative moment of order $(p+\delta)/2$ for some $\delta > 0$, without requiring explicit truncation of the slab or the local scale parameters (Bhattacharya et al., 2021).
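
The inverse-gamma mixture representation of the half-Cauchy can be verified by simulation; the following numpy sketch draws $\lambda$ through the mixture and compares its empirical CDF with the half-Cauchy$(0,1)$ CDF, $(2/\pi)\arctan(x)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Scale mixture: nu ~ IG(1/2, 1), lam^2 | nu ~ IG(1/2, 1/nu),
# where IG(a, b) has density proportional to x^{-a-1} exp(-b/x).
# If G ~ Gamma(a, scale=1/b) then 1/G ~ IG(a, b).
nu = 1.0 / rng.gamma(shape=0.5, scale=1.0, size=n)           # nu ~ IG(1/2, 1)
lam2 = 1.0 / (nu * rng.gamma(shape=0.5, scale=1.0, size=n))  # lam^2 ~ IG(1/2, 1/nu)
lam = np.sqrt(lam2)

# Marginally, lam should be half-Cauchy(0, 1): CDF (2/pi) * arctan(x).
for x in (0.5, 1.0, 3.0):
    print(x, np.mean(lam <= x), 2 / np.pi * np.arctan(x))
```

This conditionally conjugate representation is what makes every full conditional in the block-Gibbs sampler a standard distribution.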

5. Shrinkage Properties and Interpretation

The regularized horseshoe prior modulates shrinkage through a coefficient-wise factor

$$\tilde{\kappa}_j = \frac{1}{1 + \tau^2 \tilde{\lambda}_j^2 a_j},$$

where $a_j$ is adapted to the design and likelihood (e.g., $x_j^\top x_j / \sigma^2$ in Gaussian regression). For $\tau^2 \lambda_j^2$ small relative to $c^2$, the shrinkage profile matches the horseshoe. For large $\tau^2 \lambda_j^2$, the regularized variance $\tau^2 \tilde{\lambda}_j^2$ flattens at $c^2$, so the conditional prior approaches $\theta_j \mid \text{rest} \sim \mathcal{N}(0, c^2)$. The effect is that:

  • Small and moderate signals are strongly shrunk, promoting sparsity.
  • Large signals are regularized towards the slab, yielding bounded, stable inference, and avoiding the tail pathologies of the original horseshoe in weakly identified regimes (Piironen et al., 2017, Fan et al., 15 Jul 2025).

Empirically, credible intervals for large coefficients under the regularized horseshoe remain finite and reflect improved frequentist coverage compared to the original horseshoe when the likelihood is weak (Fan et al., 15 Jul 2025).
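
The flattening of the shrinkage factor can be checked numerically; this sketch (function and parameter names illustrative) uses the slab-capped local scale with a fixed design term $a$:

```python
import numpy as np

def kappa_tilde(lam, tau, c, a):
    """Shrinkage factor 1 / (1 + tau^2 * lam_tilde^2 * a) with the slab-capped scale."""
    lt2 = c**2 * lam**2 / (c**2 + tau**2 * lam**2)
    return 1.0 / (1.0 + tau**2 * lt2 * a)

tau, c, a = 0.01, 2.0, 100.0   # a = x_j^T x_j / sigma^2 for a standardized predictor
lam = np.logspace(-2, 6, 9)
k = kappa_tilde(lam, tau, c, a)

# kappa ~ 1 for small lam (the coefficient is shrunk toward zero); as lam grows,
# kappa levels off at 1 / (1 + c^2 a) instead of going to 0, reflecting the
# finite slab: large signals keep a bounded, Gaussian-like amount of shrinkage.
print(k)
print(1.0 / (1.0 + c**2 * a))  # the floor: ≈ 0.0025
```

Under the original horseshoe ($c = \infty$) the same factor would decay to 0, leaving the largest signals entirely unregularized.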

6. Applications and Practical Recommendations

The regularized horseshoe prior has been successfully applied in:

  • High-dimensional variable selection under both Gaussian and robust (Laplace, $t$-distributed) error models, with proven valid Bayesian credible-interval coverage in both light- and heavy-tailed regimes (Fan et al., 15 Jul 2025).
  • Regression models with heavy-tailed or contaminated noise, where the posterior is more robust than ordinary horseshoe or non-regularized global-local shrinkage (Fan et al., 15 Jul 2025).
  • Fused lasso and structured regularization problems, where the horseshoe prior is imposed not on coefficients per se, but on a set of contrasts or differences, and may be regularized analogously (Kakikawa et al., 2022).
  • Complex and deep models: neural networks, generalized linear models, and tree ensembles, with the slab parameter $c$ controlling overfitting risk in low signal-to-noise regimes (Bhadra et al., 2019, Nalenz et al., 2017).

Default settings of $c$ in the range $[1, 10]$ have been found empirically robust in large-scale simulation studies (Bhattacharya et al., 2021). The regularized horseshoe is recommended in any setting where control over the posterior spread of large coefficients is desired, or where the extreme tails of the original horseshoe may cause computational problems or induce poor frequentist coverage.

7. Software and Implementation Details

The regularized horseshoe prior is now implemented in multiple Bayesian inference platforms:

  • Stan: Full, reproducible code blocks exist for both Gaussian and logistic models. The transformation

$$\tilde{\lambda}_j = \frac{c \lambda_j}{\sqrt{c^2 + \tau^2 \lambda_j^2}}$$

is applied in the transformed parameters block to cap the effective local scaling (Piironen et al., 2017).

  • TensorFlow Probability: tfp.distributions.Horseshoe supports slab regularization for both local and global parameters (Bhadra et al., 2019).
  • R, Matlab, and Python implementations of various horseshoe-family extensions—including the graphical horseshoe, deepGLM, factor models, and tree-based models—are available (see the survey tables in Bhadra et al., 2019).

Modern inference with the regularized horseshoe is typically conducted with hybrid block Gibbs (using inversion-free updates for $\theta$) or with Hamiltonian Monte Carlo, which shows improved sampling behavior relative to the original horseshoe because of the tamed tails. For robust regression variants (Laplace likelihood), standard augmentation and block updating are used (Fan et al., 15 Jul 2025).
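
As a library-free illustration of the full hierarchy, the following seeded numpy sketch draws once from the regularized horseshoe prior (hyperparameter values illustrative; a real analysis would use Stan or a Gibbs sampler):

```python
import numpy as np

rng = np.random.default_rng(42)
p, p0, n, sigma = 1000, 10, 100, 1.0

tau0 = p0 / (p - p0) * sigma / np.sqrt(n)          # default global scale
tau = abs(tau0 * rng.standard_cauchy())            # tau ~ C+(0, tau0)
c2 = 1.0 / rng.gamma(shape=2.0, scale=1.0 / 8.0)   # c^2 ~ Inv-Gamma(2, 8)
lam = np.abs(rng.standard_cauchy(size=p))          # lambda_j ~ C+(0, 1)

lam_tilde2 = c2 * lam**2 / (c2 + tau**2 * lam**2)  # slab-capped local variances
theta = rng.normal(0.0, np.sqrt(tau**2 * lam_tilde2))

# Every conditional prior variance is capped by the slab: tau^2 * lam_tilde^2 < c^2,
# so no coefficient draw comes from an unbounded-variance component.
assert np.all(tau**2 * lam_tilde2 < c2)
print(theta[:5])
```

Most draws cluster tightly around zero while a handful escape the spike, which is the qualitative signature of sparsity-inducing global-local priors.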


References:

(Piironen et al., 2017; Fan et al., 15 Jul 2025; Bhattacharya et al., 2021; Bhadra et al., 2019; Kakikawa et al., 2022; Nalenz et al., 2017)
