
Regularized Horseshoe Prior

Updated 23 January 2026
  • Regularized Horseshoe Prior is a global-local continuous shrinkage prior designed for sparse Bayesian inference, capping coefficient variance with an additional slab scale.
  • It improves regularization by controlling the tail behavior of large signals, thereby stabilizing inference under weak identifiability or heavy-tailed likelihoods.
  • Widely applied in high-dimensional regression and robust models, it effectively bridges spike-and-slab and continuous shrinkage techniques for better performance.

The regularized horseshoe prior is a global-local continuous shrinkage prior designed for sparse Bayesian inference. It extends the classical horseshoe by introducing an additional "slab" scale that caps the prior variance any coefficient may accumulate. This modification gives explicit control over the regularization applied to large coefficients: their prior variance is bounded, which prevents arbitrarily large coefficient excursions that may destabilize inference, especially in settings with weak identifiability or heavy-tailed likelihoods (Piironen et al., 2017, Fan et al., 15 Jul 2025). The regularized horseshoe thus moves from the original horseshoe's spike-and-infinite-slab architecture to a spike-and-finite-slab regime, offering a continuous relaxation of two-group (spike-and-slab) priors and enhanced practical stability.

1. Definition and Hierarchical Structure

For $p$-dimensional regression or classification, typically in the high-dimensional regime $p \gg n$, the regularized horseshoe prior (RHS, Editor's term) is placed on a parameter vector $\theta = (\theta_1, \ldots, \theta_p)$ assumed to be (approximately) sparse (Piironen et al., 2017, Fan et al., 15 Jul 2025):

  • Standard horseshoe:
    • Local shrinkage: $\lambda_j \sim \operatorname{C}^+(0, 1)$
    • Global shrinkage: $\tau \sim \pi(\tau)$ (usually $\operatorname{C}^+(0, \tau_0)$)
    • Coefficient prior: $\theta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \tau^2 \lambda_j^2)$
  • Regularized horseshoe:

    • The prior variance is bounded by a slab scale $c > 0$ via

    $$\tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2},$$

    so that

    $$\theta_j \mid \lambda_j, \tau, c \sim \mathcal{N}(0, \tau^2 \tilde{\lambda}_j^2).$$

    • For $c \to \infty$, $\tilde{\lambda}_j \to \lambda_j$ (original horseshoe). For finite $c$, all $\theta_j$ are marginally sub-Gaussian with variance at most $c^2$.

  • The slab scale $c$ is typically given a weakly informative inverse-gamma prior $c^2 \sim \operatorname{Inv-Gamma}(a, b)$, for instance $a = \nu/2$, $b = \nu s^2/2$ for a Student-$t_\nu(0, s^2)$ slab, or fixed to a large value if domain knowledge allows (Piironen et al., 2017, Fan et al., 15 Jul 2025).

This architecture is algebraically equivalent to multiplying the standard horseshoe prior by a zero-centered Gaussian slab of variance $c^2$ and renormalizing (Piironen et al., 2017).
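The variance cap can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the values `tau = 0.1`, `c = 2.0` are illustrative assumptions, not taken from the cited papers. It draws coefficients from the prior for fixed $\tau$ and $c$ and compares the empirical prior variance with the bound $c^2$.

```python
import math
import random


def regularized_local_scale(lam, tau, c):
    """Return lambda-tilde, the slab-capped local scale."""
    return math.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))


def sample_theta(rng, tau, c):
    """Draw one coefficient from the regularized horseshoe prior given tau and c."""
    # Half-Cauchy(0, 1) draw via the inverse CDF x = tan(pi * u / 2).
    lam = math.tan(math.pi * rng.random() / 2)
    lam_tilde = regularized_local_scale(lam, tau, c)
    return rng.gauss(0.0, tau * lam_tilde)


rng = random.Random(0)
tau, c = 0.1, 2.0  # illustrative values only
draws = [sample_theta(rng, tau, c) for _ in range(50_000)]
prior_var = sum(d * d for d in draws) / len(draws)
print(f"empirical prior variance {prior_var:.3f} (cap c^2 = {c**2:.1f})")
```

For a very large local scale the prior standard deviation $\tau \tilde{\lambda}_j$ approaches $c$, while for a very large $c$ the function returns approximately the unregularized $\lambda_j$, matching the two limits stated above.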

| Prior | Marginal prior for $\theta_j$ | Regularizes tail? |
| --- | --- | --- |
| Horseshoe | Cauchy-like (tails $\propto \theta_j^{-2}$) | No ($c \to \infty$) |
| Reg. Horseshoe | Cauchy-like for small $\theta_j$, Gaussian for large $\theta_j$ | Yes (finite $c$) |
| Spike-and-slab | Mixture with finite-variance slab | Yes (finite slab variance) |

2. Motivations and Theoretical Rationale

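The original horseshoe prior enforces strong shrinkage toward zero (via the half-Cauchy local scales), promoting sparsity, but it has heavy Cauchy-like tails, so large signals are essentially unregularized. This can be problematic when the likelihood alone does not pin down the coefficients, for example under weak identifiability or heavy-tailed likelihoods, where arbitrarily large coefficient values may destabilize inference.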

The regularized horseshoe prior caps the tail behavior at scale $c$, providing controlled regularization for large coefficients. It can be viewed as a continuous relaxation of two-group spike-and-slab priors, replacing the infinite-variance slab with a finite one, thus blending the "strong spike, mild slab" features of discrete mixture priors with the computational and modeling advantages of continuous shrinkage (Piironen et al., 2017).

3. Hyperparameter Specification and Defaults

  • Global scale $\tau$: The effective degree of sparsity is controlled via $\tau_0$, which is ideally set according to a prior guess $p_0$ of the number of relevant coefficients. For standardized predictors and known $\sigma$, a practical default is

$$\tau_0 = \frac{p_0}{p - p_0} \, \frac{\sigma}{\sqrt{n}},$$

combined with $\tau \sim \operatorname{C}^+(0, \tau_0)$ or a half-Student-$t$ prior (Piironen et al., 2017, Bhadra et al., 2019).

  • Slab scale $c$: When tail regularization is required, a weakly informative prior such as an inverse-gamma prior on $c^2$ (e.g., slab df 4, scale 2) is used, placing prior mass on plausibly large effects. Alternatively, $c$ may be fixed using domain knowledge (e.g., "no effect exceeds 5 in magnitude").
  • Local scales $\lambda_j$: The default is half-Cauchy(0, 1); heavier or lighter tails (half-$t_\nu$ with $\nu \neq 1$) are possible.

If $c \to \infty$, one recovers the original horseshoe; if $c$ is small, the model approaches ordinary ridge-type shrinkage for large coefficients.
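As a rough illustration of how these defaults translate into numbers, the sketch below assumes the global-scale guess $\tau_0 = \frac{p_0}{p - p_0}\frac{\sigma}{\sqrt{n}}$ from Piironen et al. (2017) and the parameterization $c^2 \sim \operatorname{Inv-Gamma}(\nu/2, \nu s^2/2)$ for a Student-$t_\nu(0, s^2)$ slab. The function names and the example problem size are hypothetical.

```python
import math


def global_scale_guess(p0, p, n, sigma=1.0):
    """tau_0 = p0 / (p - p0) * sigma / sqrt(n), for standardized predictors."""
    if not 0 < p0 < p:
        raise ValueError("need 0 < p0 < p")
    return p0 / (p - p0) * sigma / math.sqrt(n)


def slab_inverse_gamma_params(slab_df, slab_scale):
    """Shape and scale of the inverse-gamma prior on c^2 (assumed parameterization)."""
    return 0.5 * slab_df, 0.5 * slab_df * slab_scale**2


# Hypothetical problem: 100 predictors, 100 observations, about 5 relevant.
tau0 = global_scale_guess(p0=5, p=100, n=100, sigma=1.0)
shape, scale = slab_inverse_gamma_params(slab_df=4, slab_scale=2)
print(f"tau_0 = {tau0:.5f}")
print(f"c^2 ~ Inv-Gamma({shape}, {scale})")
```

The guess $p_0$ only needs to be of the right order of magnitude, since $\tau$ still receives a heavy-tailed prior centered on this scale.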

4. Posterior Computation

Posterior inference for the regularized horseshoe is facilitated by the Gaussian scale-mixture representation:

  • Each $\theta_j$ is conditionally normal given $(\lambda_j, \tau, c)$.
  • The half-Cauchy prior on $\lambda_j$ can be represented as a scale mixture of inverse-gammas: $\lambda_j^2 \mid \nu_j \sim \operatorname{Inv-Gamma}(1/2, 1/\nu_j)$, $\nu_j \sim \operatorname{Inv-Gamma}(1/2, 1)$.
  • The joint posterior is sampled via block Gibbs or hybrid Gibbs–Metropolis updates for the scale parameters $(\lambda_j, \tau, c)$ (Piironen et al., 2017, Bhattacharya et al., 2021, Fan et al., 15 Jul 2025).
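The inverse-gamma mixture in the second item can be sanity-checked by simulation. This is a minimal sketch, assuming the standard representation $\lambda^2 \mid \nu \sim \operatorname{Inv-Gamma}(1/2, 1/\nu)$, $\nu \sim \operatorname{Inv-Gamma}(1/2, 1)$; the helper name is hypothetical. It compares empirical quantiles of the mixture draws with the known half-Cauchy(0, 1) quantiles, whose median is $\tan(\pi/4) = 1$.

```python
import math
import random
import statistics


def half_cauchy_via_mixture(rng):
    """Draw lambda ~ C+(0, 1) through two nested inverse-gamma draws."""
    # nu ~ Inv-Gamma(1/2, 1): reciprocal of a Gamma(shape=1/2, scale=1) draw.
    nu = 1.0 / rng.gammavariate(0.5, 1.0)
    # lambda^2 | nu ~ Inv-Gamma(1/2, 1/nu): reciprocal of Gamma(shape=1/2, scale=nu).
    lam2 = 1.0 / rng.gammavariate(0.5, nu)
    return math.sqrt(lam2)


rng = random.Random(1)
draws = [half_cauchy_via_mixture(rng) for _ in range(100_000)]
median = statistics.median(draws)
# The 0.75 quantile of a half-Cauchy(0, 1) is tan(0.75 * pi / 2).
q75 = math.tan(0.75 * math.pi / 2)
frac_below_q75 = sum(d <= q75 for d in draws) / len(draws)
print(f"median of draws: {median:.3f} (half-Cauchy median is 1)")
print(f"fraction below theoretical 0.75 quantile: {frac_below_q75:.3f}")
```

In a Gibbs sampler the same representation is what makes the conditional updates for $\lambda_j^2$ and $\nu_j$ available in closed form.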

For generalized linear models, the standard data-augmentation techniques (e.g., Polya-Gamma for logistic regression) are utilized in conjunction with the regularized horseshoe hierarchy (Bhadra et al., 2019, Nalenz et al., 2017).

Geometric ergodicity of the block-Gibbs sampler has been established under the mild condition that the prior on the global scale has a finite negative moment of suitable order, without requiring explicit truncation of the slab or the local scale parameters (Bhattacharya et al., 2021).

5. Shrinkage Properties and Interpretation

The regularized horseshoe prior modulates shrinkage through a coefficient-wise factor:

$$\kappa_j = \frac{1}{1 + a_j \, \tau^2 \tilde{\lambda}_j^2}, \qquad \tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2},$$

where $a_j$ is adapted to the design and likelihood (e.g., $a_j = n \sigma^{-2} s_j^2$ in Gaussian regression, with $s_j^2$ the sample variance of predictor $j$). For small $\tau^2 \lambda_j^2$ relative to $c^2$, the shrinkage effect is as in the horseshoe. For large $\tau^2 \lambda_j^2$, the regularized variance flattens at $c^2$, so $\theta_j$ is approximately $\mathcal{N}(0, c^2)$ a priori. The effect is that:

  • Small and moderate signals are strongly shrunk, promoting sparsity.
  • Large signals are regularized towards the slab, yielding bounded, stable inference, and avoiding the tail pathologies of the original horseshoe in weakly identified regimes (Piironen et al., 2017, Fan et al., 15 Jul 2025).
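The two regimes can be made concrete with a small numerical comparison. The sketch below assumes the Gaussian-regression form of the shrinkage factor, $\kappa_j = 1 / (1 + n \sigma^{-2} s_j^2 \tau^2 \tilde{\lambda}_j^2)$, following Piironen et al. (2017); the function name and the values of `n`, `tau`, and `c` are illustrative assumptions.

```python
import math


def shrinkage_factor(lam, tau, c, n, sigma=1.0, s=1.0):
    """kappa = 1 / (1 + n * sigma^-2 * s^2 * tau^2 * lambda_tilde^2).

    Passing c = math.inf gives the original (unregularized) horseshoe.
    """
    if math.isinf(c):
        lam_tilde2 = lam**2
    else:
        lam_tilde2 = c**2 * lam**2 / (c**2 + tau**2 * lam**2)
    return 1.0 / (1.0 + n * s**2 * tau**2 * lam_tilde2 / sigma**2)


n, tau, c = 100, 0.05, 2.0  # illustrative values only
# Limiting kappa as lambda -> infinity when sigma = s = 1.
floor = 1.0 / (1.0 + n * c**2)
for lam in (0.1, 1.0, 10.0, 1e3, 1e6):
    hs = shrinkage_factor(lam, tau, math.inf, n)
    rhs = shrinkage_factor(lam, tau, c, n)
    print(f"lambda={lam:>9g}  horseshoe kappa={hs:.6f}  regularized kappa={rhs:.6f}")
print(f"regularized floor 1/(1 + n c^2) = {floor:.6f}")
```

For small local scales the two factors agree closely. As $\lambda_j$ grows, the horseshoe factor tends to zero (no shrinkage at all), while the regularized factor levels off at a strictly positive floor, which is the residual ridge-type shrinkage toward the slab.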

Empirically, credible intervals for large coefficients under the regularized horseshoe remain finite and reflect improved frequentist coverage compared to the original horseshoe when the likelihood is weak (Fan et al., 15 Jul 2025).

6. Applications and Practical Recommendations

The regularized horseshoe prior has been successfully applied in:

  • High-dimensional variable selection under both Gaussian and robust (Laplace, $t$-distributed) error models, with valid Bayesian credible interval coverage established in both light- and heavy-tailed regimes (Fan et al., 15 Jul 2025).
  • Regression models with heavy-tailed or contaminated noise, where the posterior is more robust than ordinary horseshoe or non-regularized global-local shrinkage (Fan et al., 15 Jul 2025).
  • Fused lasso and structured regularization problems, where the horseshoe prior is imposed not on coefficients per se, but on a set of contrasts or differences, and may be regularized analogously (Kakikawa et al., 2022).
  • Complex and deep models: Neural networks, generalized linear models, and tree ensembles, with the slab parameter c controlling overfitting risk in low signal-to-noise regimes (Bhadra et al., 2019, Nalenz et al., 2017).

Default settings of the slab scale $c$ in the range examined by Bhattacharya et al. (2021) have been found empirically to be robust in large-scale simulation studies. The use of the regularized horseshoe is recommended in any setting where control over the posterior spread of large coefficients is desired, or where the extremely heavy tails of the original horseshoe may cause computational problems or induce poor frequentist coverage.

7. Software and Implementation Details

The regularized horseshoe prior is now implemented in multiple Bayesian inference platforms:

  • Stan: Full, reproducible code blocks for both Gaussian and logistic models. The transformation

$$\tilde{\lambda}_j = \sqrt{\frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2}}$$

is applied in the transformed parameters block to "cap" the effective local scaling (Piironen et al., 2017).

  • TensorFlow Probability: tfp.distributions.Horseshoe provides the standard horseshoe with a single scale parameter (Bhadra et al., 2019); it has no slab argument, so the cap has to be applied by the user, for example through the same transformation of the local scales used in Stan.
  • R packages and Matlab/Python code: Available for various horseshoe-family extensions, including graphical horseshoe, deepGLM, factor, and tree-based models (see the survey tables in Bhadra et al., 2019).

Modern inference with the regularized horseshoe is typically conducted with hybrid block-Gibbs (using inversion-free updates for $\theta$) or with Hamiltonian Monte Carlo, which shows improved sampling behavior relative to the original horseshoe due to the tamed tails. For robust regression variants (Laplace likelihood), standard augmentation and block-updating are used (Fan et al., 15 Jul 2025).
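For gradient-based samplers, the prior is usually written in a non-centered form, $\theta_j = z_j \, \tau \, \tilde{\lambda}_j$ with $z_j \sim \mathcal{N}(0, 1)$. The following standard-library sketch evaluates the joint log prior of $(z, \lambda, \tau, c^2)$ under that assumption. It is not the Stan code from the cited paper: the function names, the defaults `tau0=1.0`, `slab_df=4.0`, `slab_scale=2.0`, and the inverse-gamma parameterization of the slab are assumptions made for illustration.

```python
import math


def log_half_cauchy(x, scale):
    """Log density of C+(0, scale) at x >= 0."""
    return math.log(2.0 / (math.pi * scale)) - math.log1p((x / scale) ** 2)


def log_inv_gamma(x, a, b):
    """Log density of Inv-Gamma(shape a, scale b) at x > 0."""
    return a * math.log(b) - math.lgamma(a) - (a + 1.0) * math.log(x) - b / x


def coefficients(z, lam, tau, c2):
    """Non-centered map: theta_j = z_j * tau * lambda_tilde_j."""
    return [
        zj * tau * math.sqrt(c2 * lj**2 / (c2 + tau**2 * lj**2))
        for zj, lj in zip(z, lam)
    ]


def log_prior(z, lam, tau, c2, tau0=1.0, slab_df=4.0, slab_scale=2.0):
    """Joint log prior of (z, lambda, tau, c^2) on the constrained scale."""
    lp = sum(-0.5 * zj**2 - 0.5 * math.log(2.0 * math.pi) for zj in z)
    lp += sum(log_half_cauchy(lj, 1.0) for lj in lam)
    lp += log_half_cauchy(tau, tau0)
    lp += log_inv_gamma(c2, 0.5 * slab_df, 0.5 * slab_df * slab_scale**2)
    return lp


z, lam = [0.3, -1.2, 2.0], [0.5, 1.0, 50.0]
theta = coefficients(z, lam, tau=0.1, c2=4.0)
print("theta:", [round(t, 4) for t in theta])
print("log prior:", round(log_prior(z, lam, tau=0.1, c2=4.0), 4))
```

A sampler working on unconstrained variables would additionally need the log-Jacobian of whatever transform maps $\lambda_j$, $\tau$, and $c^2$ to the real line; that term is omitted here. Adding a log-likelihood evaluated at `theta` gives the unnormalized log posterior.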


References:

(Piironen et al., 2017, Fan et al., 15 Jul 2025, Bhattacharya et al., 2021, Bhadra et al., 2019, Kakikawa et al., 2022, Nalenz et al., 2017)
