Regularized Horseshoe Prior
- Regularized Horseshoe Prior is a global-local continuous shrinkage prior designed for sparse Bayesian inference, capping coefficient variance with an additional slab scale.
- It improves regularization by controlling the tail behavior of large signals, thereby stabilizing inference under weak identifiability or heavy-tailed likelihoods.
- Widely applied in high-dimensional regression and robust models, it bridges spike-and-slab and continuous shrinkage techniques, combining the sparsity of the former with the computational tractability of the latter.
The regularized horseshoe prior is a global-local continuous shrinkage prior designed for sparse Bayesian inference. It extends the classical horseshoe by introducing an additional "slab" scale, thereby capping the amount of prior variance any coefficient may accumulate. This modification provides improved control over the regularization applied to large coefficients: it bounds their prior variance and prevents the arbitrarily large coefficient excursions that can destabilize inference, especially in settings with weak identifiability or heavy-tailed likelihoods (Piironen et al., 2017, Fan et al., 15 Jul 2025). The regularized horseshoe thus moves the original horseshoe's spike-and-infinite-slab architecture to a spike-and-finite-slab regime, offering a continuous relaxation of two-group (spike-and-slab) priors and enhanced practical stability.
1. Definition and Hierarchical Structure
For $D$-dimensional regression or classification—typically the high-dimensional regime $D \gg n$—the regularized horseshoe prior (RHS, Editor's term) is placed on a parameter vector $\beta \in \mathbb{R}^D$ assumed to be (approximately) sparse (Piironen et al., 2017, Fan et al., 15 Jul 2025):
- Standard horseshoe:
  - Local shrinkage: $\lambda_j \sim C^+(0, 1)$
  - Global shrinkage: $\tau \sim C^+(0, \tau_0)$ (usually $\tau_0 \ll 1$)
  - Coefficient prior: $\beta_j \mid \lambda_j, \tau \sim N(0, \tau^2 \lambda_j^2)$
- Regularized horseshoe:
  - The prior variance is bounded by a slab scale $c$ via:
    $$\tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2},$$
    so $\beta_j \mid \lambda_j, \tau, c \sim N(0, \tau^2 \tilde{\lambda}_j^2)$.
  - For $c \to \infty$, $\tilde{\lambda}_j^2 \to \lambda_j^2$ (original horseshoe). For finite $c$, all $\beta_j$ are marginally sub-Gaussian with variance at most $c^2$.
- The slab scale is typically given a weakly informative inverse-gamma prior, $c^2 \sim \text{Inv-Gamma}(\nu/2, \nu s^2/2)$, corresponding to a Student-$t_\nu(0, s^2)$ slab, or fixed to a large value if domain knowledge allows (Piironen et al., 2017, Fan et al., 15 Jul 2025).
This architecture is algebraically equivalent to multiplying the standard horseshoe prior by a zero-centered Gaussian slab of variance $c^2$ and renormalizing (Piironen et al., 2017).
| Prior | Marginal prior for $\beta_j$ | Regularizes tail? |
|---|---|---|
| Horseshoe | Cauchy-like (heavy-tailed) | No ($c = \infty$) |
| Reg. Horseshoe | Cauchy-like for small $\beta_j$, Gaussian for large | Yes (finite $c$) |
| Spike-and-slab | Mixture with finite-variance slab | Yes (finite slab variance) |
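The hierarchy above can be sketched numerically. The following minimal NumPy sketch (all function names are illustrative, not from any cited implementation) draws coefficients from the regularized horseshoe prior and checks the defining property that the conditional prior standard deviation $\tau\tilde{\lambda}_j$ is capped at $c$:

```python
import numpy as np

rng = np.random.default_rng(0)

def reg_horseshoe_scale(lam, tau, c):
    """Regularized local scale: lam_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2)."""
    return np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))

def sample_prior(D, tau0=0.1, c=2.0, rng=rng):
    """Draw one D-dimensional coefficient vector from the regularized horseshoe prior."""
    lam = np.abs(rng.standard_cauchy(D))        # lambda_j ~ C+(0, 1)
    tau = np.abs(tau0 * rng.standard_cauchy())  # tau ~ C+(0, tau0)
    lam_tilde = reg_horseshoe_scale(lam, tau, c)
    return rng.normal(0.0, tau * lam_tilde)     # beta_j ~ N(0, tau^2 lam_tilde^2)

# Even as lambda_j -> infinity, the conditional sd tau * lam_tilde -> c,
# which is exactly the slab cap described above.
capped = reg_horseshoe_scale(np.array([1e12]), tau=0.1, c=2.0)
print(0.1 * capped[0])  # ~ 2.0 (= c)
```

For $c \to \infty$ the cap disappears and `sample_prior` reduces to the original horseshoe.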
2. Motivations and Theoretical Rationale
The original horseshoe prior enforces strong shrinkage at zero (via the half-Cauchy local scales), promoting sparsity, but possesses heavy Cauchy-like tails, which means large signals are essentially unregularized. This can be problematic in several circumstances:
- Weakly identified models: e.g., separation in logistic regression, where the likelihood cannot regularize large coefficients, leading to instability or divergence in posterior sampling (Piironen et al., 2017, Fan et al., 15 Jul 2025).
- Inference under heavy-tailed errors: Without tail regularization, the horseshoe may deliver over-diffuse or infinite credible intervals for large signals (Fan et al., 15 Jul 2025).
- Posterior computation and MCMC pathologies: The "infinite slab" induces strongly funnel-shaped posteriors, leading to divergent transitions in Hamiltonian Monte Carlo.
The regularized horseshoe prior caps the tail behavior at scale , providing controlled regularization for large coefficients. It can be viewed as a continuous relaxation of two-group spike-and-slab priors, replacing the infinite-variance slab with a finite one, thus blending the "strong spike, mild slab" features of discrete mixture priors with the computational and modeling advantages of continuous shrinkage (Piironen et al., 2017).
3. Hyperparameter Specification and Defaults
- Global scale $\tau_0$: The effective degree of sparsity is controlled via $\tau$, which is ideally set according to a prior guess $p_0$ of the number of relevant coefficients. For standardized predictors and known noise level $\sigma$, a practical default is:
  $$\tau_0 = \frac{p_0}{D - p_0} \cdot \frac{\sigma}{\sqrt{n}},$$
  and use $\tau \sim C^+(0, \tau_0)$ or a half-Student-$t$ (Piironen et al., 2017, Bhadra et al., 2019).
- Slab scale $c$: When tail regularization is required, a weakly-informative prior such as $c^2 \sim \text{Inv-Gamma}(2, 8)$ (equivalently, a Student-$t$ slab with $\nu = 4$ degrees of freedom and scale $s = 2$) is used, placing most prior mass on plausible large effects. Alternatively, $c$ may be fixed from domain knowledge (e.g., "no effect exceeds 5 in magnitude").
- Local scales $\lambda_j$: Default to half-Cauchy$(0, 1)$; heavier or lighter tails (half-$t_\nu$ for other choices of $\nu$) are possible.
If $c \to \infty$, one recovers the original horseshoe; if $c$ is small, the model approaches ordinary ridge-type shrinkage for large coefficients.
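The global-scale default above is a one-line computation; a small sketch (function name illustrative):

```python
import numpy as np

def tau0_default(p0, D, n, sigma=1.0):
    """Piironen-Vehtari style default global scale:
    tau0 = p0 / (D - p0) * sigma / sqrt(n),
    where p0 is the prior guess of the number of relevant coefficients."""
    return p0 / (D - p0) * sigma / np.sqrt(n)

# Expecting ~5 relevant predictors out of D = 1000, with n = 100 observations:
print(tau0_default(p0=5, D=1000, n=100, sigma=1.0))  # ~ 5.03e-04
```

A value this far below 1 reflects the strong global shrinkage appropriate when only a handful of the $D$ coefficients are expected to be nonzero.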
4. Posterior Computation
Posterior inference for the regularized horseshoe is facilitated by the Gaussian scale-mixture representation:
- Each $\beta_j$ is conditionally normal given $(\lambda_j, \tau, c)$.
- $\lambda_j \sim C^+(0, 1)$ can be represented via a scale mixture of inverse-gammas: $\lambda_j^2 \mid \nu_j \sim \text{Inv-Gamma}(1/2, 1/\nu_j)$ with $\nu_j \sim \text{Inv-Gamma}(1/2, 1)$.
- The joint posterior is sampled via block Gibbs or hybrid Gibbs–Metropolis updates for the parameters (Piironen et al., 2017, Bhattacharya et al., 2021, Fan et al., 15 Jul 2025).
For generalized linear models, the standard data-augmentation techniques (e.g., Polya-Gamma for logistic regression) are utilized in conjunction with the regularized horseshoe hierarchy (Bhadra et al., 2019, Nalenz et al., 2017).
Geometric ergodicity of the block-Gibbs sampler has been established under the mild condition that the prior on the slab scale has a finite negative moment of order $\delta$ for some $\delta > 0$, without requiring explicit truncation of the slab or the local scale parameters (Bhattacharya et al., 2021).
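As a sanity check on the inverse-gamma scale-mixture representation used in these samplers, the following Monte Carlo sketch verifies that composing the two inverse-gamma draws reproduces a half-Cauchy$(0, 1)$ marginal for $\lambda_j$, whose median is exactly 1:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Auxiliary-variable representation of the half-Cauchy:
#   nu        ~ Inv-Gamma(1/2, 1)
#   lam^2|nu  ~ Inv-Gamma(1/2, 1/nu)
# implies lam ~ C+(0, 1), enabling conjugate Gibbs updates.
# (If G ~ Gamma(a, rate=b), then 1/G ~ Inv-Gamma(a, b);
#  NumPy's gamma is parameterized by shape and scale = 1/rate.)
nu = 1.0 / rng.gamma(0.5, scale=1.0, size=N)   # Inv-Gamma(1/2, 1)
lam2 = 1.0 / rng.gamma(0.5, scale=nu)          # Inv-Gamma(1/2, 1/nu)
lam = np.sqrt(lam2)

# Half-Cauchy(0, 1) has median tan(pi/4) = 1.
print(np.median(lam))  # ~ 1.0
```

The same composition underlies the conditionally conjugate updates for both the local scales $\lambda_j$ and the global scale $\tau$ in block-Gibbs implementations.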
5. Shrinkage Properties and Interpretation
The regularized horseshoe prior modulates shrinkage through a coefficient-wise factor:
$$\kappa_j = \frac{1}{1 + n\,\sigma^{-2}\,\tau^2\,\tilde{\lambda}_j^2},$$
where $\sigma^2$ is adapted to the design and likelihood (e.g., the noise variance in Gaussian regression with standardized predictors). For $\tau^2\lambda_j^2$ small relative to $c^2$, the shrinkage effect is as in the horseshoe. For large $\lambda_j$, the regularized variance flattens at $\tau^2\tilde{\lambda}_j^2 \to c^2$, so $\kappa_j \to (1 + n\,\sigma^{-2}c^2)^{-1} > 0$. The effect is that:
- Small and moderate signals are strongly shrunk, promoting sparsity.
- Large signals are regularized towards the slab, yielding bounded, stable inference, and avoiding the tail pathologies of the original horseshoe in weakly identified regimes (Piironen et al., 2017, Fan et al., 15 Jul 2025).
Empirically, credible intervals for large coefficients under the regularized horseshoe remain finite and reflect improved frequentist coverage compared to the original horseshoe when the likelihood is weak (Fan et al., 15 Jul 2025).
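Assuming the Gaussian-regression form of the shrinkage factor with standardized predictors, a short sketch (names illustrative) exhibits both regimes:

```python
import numpy as np

def kappa(lam, tau, c, n, sigma=1.0):
    """Shrinkage factor kappa = 1 / (1 + n sigma^-2 tau^2 lam_tilde^2),
    with lam_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2)."""
    lam_t2 = c**2 * lam**2 / (c**2 + tau**2 * lam**2)
    return 1.0 / (1.0 + n * tau**2 * lam_t2 / sigma**2)

n, tau, c = 100, 0.01, 2.0
lam = np.array([1e-3, 1.0, 1e6])   # tiny, moderate, huge local scale
print(kappa(lam, tau, c, n))
# Tiny lam:  kappa ~ 1, the coefficient is shrunk essentially to zero.
# Huge lam:  kappa floors at 1 / (1 + n c^2 / sigma^2) > 0, not at 0 as
#            under the original horseshoe, so large signals stay regularized.
```

The nonzero floor for large $\lambda_j$ is the quantitative expression of the "spike-and-finite-slab" behavior described above.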
6. Applications and Practical Recommendations
The regularized horseshoe prior has been successfully applied in:
- High-dimensional variable selection under both Gaussian and robust (Laplace, $t$-distributed) error models, with proven valid Bayesian credible interval coverage in both light- and heavy-tailed regimes (Fan et al., 15 Jul 2025).
- Regression models with heavy-tailed or contaminated noise, where the posterior is more robust than ordinary horseshoe or non-regularized global-local shrinkage (Fan et al., 15 Jul 2025).
- Fused lasso and structured regularization problems, where the horseshoe prior is imposed not on coefficients per se, but on a set of contrasts or differences, and may be regularized analogously (Kakikawa et al., 2022).
- Complex and deep models: Neural networks, generalized linear models, and tree ensembles, with the slab parameter c controlling overfitting risk in low signal-to-noise regimes (Bhadra et al., 2019, Nalenz et al., 2017).
Default settings of the slab scale within a moderate range have been found empirically robust in large-scale simulation studies (Bhattacharya et al., 2021). The regularized horseshoe is recommended in any setting where control over the posterior spread of large coefficients is desired, or where the extreme tails of the original horseshoe may cause computational problems or induce poor frequentist coverage.
7. Software and Implementation Details
The regularized horseshoe prior is now implemented in multiple Bayesian inference platforms:
- Stan: Full, reproducible code blocks are available for both Gaussian and logistic models. The transformation
  $$\tilde{\lambda}_j = \frac{c\,\lambda_j}{\sqrt{c^2 + \tau^2\lambda_j^2}}$$
  is applied in the transformed parameters block to "cap" the effective local scaling (Piironen et al., 2017).
- TensorFlow Probability: `tfp.distributions.Horseshoe` supports slab regularization for both local and global parameters (Bhadra et al., 2019).
- R packages and Matlab/Python code for various horseshoe-family extensions—including graphical horseshoe, deepGLM, factor, and tree-based models—are available (see survey tables in Bhadra et al., 2019).
Modern inference with the regularized horseshoe is typically conducted with hybrid block-Gibbs (using inversion-free updates for $\beta$) or with Hamiltonian Monte Carlo, which shows improved sampling behavior relative to the original horseshoe thanks to the tamed tails. For robust regression variants (Laplace likelihood), standard data augmentation and block updating are used (Fan et al., 15 Jul 2025).
References:
(Piironen et al., 2017, Fan et al., 15 Jul 2025, Bhattacharya et al., 2021, Bhadra et al., 2019, Kakikawa et al., 2022, Nalenz et al., 2017)