
Power Posterior in Bayesian Inference

Updated 15 January 2026
  • Power Posterior is a Bayesian method that raises the likelihood to a fractional power to reduce sensitivity to model misspecification.
  • It supports robust evidence estimation and adaptive sampling through techniques like thermodynamic integration and ATAIS.
  • Choosing the proper power parameter is crucial to balance bias and variance while ensuring favorable asymptotic properties.

A power posterior, also referred to as a fractional posterior or α-posterior, is a modification of the standard Bayesian posterior in which the likelihood is raised to a fractional power, effectively tempering its influence on the posterior distribution. This formalism is used to achieve robustness to model misspecification and improve the stability of inference under nonideal data-generating regimes. The power parameter (temperature), denoted α or τ, controls the degree to which the likelihood is weighted relative to the prior. Over the last decade, power posteriors have been foundational in robust Bayesian analysis, evidence estimation, and adaptive importance sampling. Their large-sample properties and practical calibration have been systematically characterized, including their asymptotic behavior, moment consistency, and the breakdown of the Bernstein–von Mises theorem under specific regimes.

1. Formal Definition and Motivation

Given observed data X^n = (X_1, …, X_n), an assumed likelihood f_n(X^n | θ) where θ ∈ Θ ⊂ ℝ^p, and prior π(θ), the power posterior is defined by:

π_{n,α}(θ | X^n) ∝ f_n(X^n | θ)^α π(θ)

where α > 0 is the tempering (power) parameter, often α ∈ (0, 1] for robustness or α ∈ [0, 1] for evidence estimation (Ray et al., 2023; Friel et al., 2012). When α < 1, the likelihood is downweighted, increasing posterior variance and reducing sensitivity to model misspecification or outliers. The formalism can be interpreted as a solution to the generalized Bayes or Gibbs posterior problem:

min_q { α E_q[−log f_n(X^n | θ)] + D_KL(q ‖ π) }

This tempering approach is also central to methods such as automatic tempered posterior distributions, power posterior evidence estimation, and adaptive importance sampling (Martino et al., 2021, Friel et al., 2012).
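As a concrete illustration of the definition above, the power posterior can be evaluated on a grid for a one-dimensional Gaussian-mean model, where conjugacy gives an exact answer to check against. The data, prior scale s0, and grid below are illustrative choices for this sketch, not taken from the cited papers.

```python
import numpy as np

# Grid evaluation of the power posterior pi_{n,alpha}(theta | x) for the
# mean of a N(theta, 1) model with a N(0, s0^2) prior.
rng = np.random.default_rng(0)
n, s0 = 200, 10.0
x = rng.normal(1.5, 1.0, size=n)

theta = np.linspace(-2.0, 5.0, 4001)
log_lik = -0.5 * ((x[:, None] - theta[None, :]) ** 2).sum(axis=0)  # up to a constant
log_prior = -0.5 * (theta / s0) ** 2

def power_posterior(alpha):
    """Normalized weights of f(x|theta)^alpha * pi(theta) on the grid."""
    log_post = alpha * log_lik + log_prior
    w = np.exp(log_post - log_post.max())
    return w / w.sum()

for alpha in (1.0, 0.5, 0.1):
    w = power_posterior(alpha)
    mean = float((w * theta).sum())
    var = float((w * (theta - mean) ** 2).sum())
    # Conjugacy gives the exact variance 1 / (alpha * n + 1 / s0^2):
    print(alpha, mean, var, 1.0 / (alpha * n + 1.0 / s0**2))
```

Shrinking α leaves the posterior mean essentially unchanged here while inflating the variance by roughly 1/α, which is the tempering effect described above.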

2. Asymptotic Properties under Local Asymptotic Normality

If the underlying model is locally asymptotically normal (LAN)—meaning that the log-likelihood admits a quadratic approximation near the (pseudo-)true parameter θ*—then the following summary holds (Ray et al., 2023; Ray et al., 14 Jan 2026):

  • Posterior Distribution: Under regularity (Assumptions A0–A2), for fixed α or τ,

π_{n,α}(θ | X^n) ≈ N(θ̂, (α n V_{θ*})^{−1})

where θ̂ is the (pseudo-)MLE and V_{θ*} is the Fisher information at θ*.

  • Moment Consistency: Under mild integrability conditions (Assumption A3), all centered and scaled moments of the power posterior converge to those of the limiting normal distribution:

E_{π_{n,α}}[(θ − θ*)^{⊗k}] = E_{N(0, V_{θ*}^{−1}/(αn))}[h^{⊗k}] + o_p(n^{−k/2})

  • Posterior Mean Equivalence: The power posterior mean,

θ̂_α = E_{π_{n,α}}[θ]

satisfies √n (θ̂_α − θ̂) = o_p(1), and both converge to the same asymptotic normal law.

These properties hold provided α does not vanish too rapidly; if α_n ≫ 1/n and α_n √n → ∞, consistency and normality for the mean are guaranteed (Ray et al., 14 Jan 2026; Ray et al., 2023).
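The (αnV)^{−1} variance scaling can be checked numerically on a grid; the Bernoulli model with a flat prior below is an illustrative assumption for this sketch, with the sanity check that quartering α should double the posterior spread.

```python
import numpy as np

# Check that the power posterior sd scales like 1/sqrt(alpha * n):
# Bernoulli(theta) model, flat prior on (0, 1), grid evaluation.
rng = np.random.default_rng(4)
x = rng.binomial(1, 0.3, size=500)
theta = np.linspace(1e-4, 1 - 1e-4, 5000)
log_lik = x.sum() * np.log(theta) + (len(x) - x.sum()) * np.log1p(-theta)

def posterior_sd(alpha):
    """Standard deviation of the power posterior on the grid (flat prior)."""
    lp = alpha * log_lik
    w = np.exp(lp - lp.max())
    w /= w.sum()
    m = (w * theta).sum()
    return float(np.sqrt((w * (theta - m) ** 2).sum()))

ratio = posterior_sd(0.25) / posterior_sd(1.0)  # should be close to sqrt(1/0.25) = 2
```

With n = 500 the LAN approximation is accurate, so the empirical ratio lands very close to 2.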

3. Selection and Calibration of the Power Parameter

The choice of α is critical:

  • Robustness vs. Efficiency: Smaller α provides enhanced robustness at the expense of wider credible sets and a slight increase in higher-order bias, which diminishes at rate n^{−1/2} (Ray et al., 2023; Ray et al., 14 Jan 2026).
  • Empirical Selection: Data-driven approaches, such as Bayesian cross-validation (BCV), train-test splits, SafeBayes, or automatic tempering in Bayesian inverse problems, can yield α̂_n sequences that mix between vanishing values and α = ∞ (Ray et al., 14 Jan 2026; Martino et al., 2021). When α_n ∼ 1/n, moment consistency and Bernstein–von Mises fail; in mixture regimes, the posterior can concentrate partly on a point mass at θ̂ and partly on a Gaussian.
  • Auto-Tempering Algorithms: Procedures like ATAIS automatically generate a sequence of tempered posteriors p_{β_t}(θ | y) by recursively updating the temperature parameter via likelihood maximization over a noise scale, eliminating the need for user-tuned schedules (Martino et al., 2021).
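A minimal sketch of the train-test idea (not the SafeBayes or BCV procedures themselves): fit the power posterior on a training set, score its predictive density on held-out data, and keep the best α. The conjugate N(θ, 1) model, the prior scale s0, and the constructed contamination are all illustrative assumptions.

```python
import numpy as np

# Train-test calibration of alpha for a N(theta, 1) model with theta ~ N(0, s0^2).
# The held-out set is contaminated with outliers the model does not expect.
rng = np.random.default_rng(2)
train = rng.normal(1.0, 1.0, size=100)
test = rng.normal(1.0, 1.0, size=100)
test[:10] += 8.0                     # constructed contamination

s0 = 5.0

def predictive_logscore(alpha):
    """Mean held-out log density of the power-posterior predictive."""
    prec = alpha * len(train) + 1.0 / s0**2
    mu = alpha * train.sum() / prec
    pv = 1.0 + 1.0 / prec            # predictive variance: model noise + posterior var
    return float((-0.5 * np.log(2 * np.pi * pv)
                  - (test - mu) ** 2 / (2 * pv)).mean())

alphas = np.linspace(0.05, 1.0, 20)
best = max(alphas, key=predictive_logscore)
```

Because the predictive variance 1 + 1/(αn + 1/s0²) grows as α shrinks, tempering buys a heavier predictive spread, which the contaminated held-out set rewards; in this constructed example small α values score better than α = 1.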

4. Power Posterior Methods for Evidence Estimation

The power posterior forms the backbone of thermodynamic integration, a widely used approach for marginal likelihood (evidence) computation. The method introduces a ladder of inverse temperatures t ∈ [0, 1] and estimates

log p(y) = ∫₀¹ E_{p_t}[log p(y | θ)] dt

with p_t(θ | y) ∝ p(θ) [p(y | θ)]^t. Numerical quadrature (often trapezoidal) approximates the integral based on MCMC samples at each t. Bias correction is achieved by using a modified trapezoid rule exploiting derivatives of the integrand (the variance of the log-likelihood); adaptive grid refinement places temperature rungs more efficiently (Friel et al., 2012). Stepping-stone samplers leverage the same ladder and importance weights for unbiased evidence estimation.
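The identity above can be sketched end to end on a conjugate model, chosen so that both E_{p_t}[log p(y|θ)] and the true evidence have closed forms; for a generic model each rung would instead be estimated by MCMC. The model, prior, and temperature schedule below are illustrative assumptions.

```python
import numpy as np

# Thermodynamic integration on a conjugate model: y_i ~ N(theta, sigma^2),
# theta ~ N(0, s0^2). The power posterior p_t(theta|y) is Gaussian, so the
# integrand E_{p_t}[log p(y|theta)] is available in closed form.
rng = np.random.default_rng(1)
n, sigma, s0 = 50, 1.0, 2.0
y = rng.normal(0.7, sigma, size=n)

def expected_loglik(t):
    """E[log p(y|theta)] under the power posterior at inverse temperature t."""
    prec = t * n / sigma**2 + 1.0 / s0**2
    mu, var = (t * y.sum() / sigma**2) / prec, 1.0 / prec
    ss = ((y - mu) ** 2).sum() + n * var        # E[sum (y_i - theta)^2]
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * ss / sigma**2

# Temperature ladder crowded near t = 0, where the integrand changes fastest.
ts = np.linspace(0.0, 1.0, 101) ** 5
vals = np.array([expected_loglik(t) for t in ts])
log_evidence = float(((vals[1:] + vals[:-1]) / 2 * np.diff(ts)).sum())  # trapezoid rule
```

The power-of-five schedule is one common way to place rungs densely near t = 0; the bias-corrected and adaptive variants mentioned above refine exactly this quadrature step.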

Method | Power Posterior Role | Key Features
Thermodynamic Integration | p_t(θ | y) sequence for evidence estimation | Bias correction, adaptive ladder
Stepping Stone | Power posterior sequence for proposal distributions | Importance sampling, adaptive rung selection
ATAIS | Sequentially tempered posteriors via noise parameter | Iterative, data-driven inverse temperature

5. Robustness and Bias–Variance Trade-off

Fractional posteriors are empirically and theoretically more robust under model misspecification and data contamination than standard posteriors. Their tempered nature avoids the likelihood overwhelming the prior in extreme regions (Ray et al., 2023; Ray et al., 14 Jan 2026). Variance inflates from the classical (nI)^{−1} to (nαI)^{−1}, mitigating over-confidence. Bias trades off against variance, but higher-order bias terms vanish quickly. In misspecified regression settings, fixed α ensures credible sets with better coverage, while vanishing-α regimes present more varied behaviors, including loss of Gaussianity of the posterior mean (Ray et al., 14 Jan 2026).

6. Power Posteriors in Practice: Data-Driven Tempering and High-Dimensional Problems

Contemporary applications involve adaptive or automatic tuning of the tempering parameter. Algorithms such as ATAIS split inference over model parameters and noise scale, using maximum-likelihood estimates of the noise power to set the current inverse temperature β (Martino et al., 2021). Here, the tempered posterior p_β(θ | y) is sampled via adaptive importance procedures, with the sequence β_t = 1/(σ_ML^{(t−1)})², where σ_ML^{(t−1)} is the current maximum-likelihood noise estimate, increasing monotonically as that estimate approaches the true noise level. Numerical experiments demonstrate improved exploration and stability in multimodal and high-dimensional settings without hand-tuned schedules.
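A loose sketch in the spirit of this idea (not the published ATAIS algorithm): alternate adaptive importance sampling from a tempered posterior with an update of the noise power, whose running maximum-likelihood estimate sets the next inverse temperature. The linear model, prior, proposal-adaptation rules, and all constants are illustrative assumptions.

```python
import numpy as np

# Tempered target: p_beta(theta|y) ∝ exp(-beta/2 * ||y - theta*x||^2) * prior.
# beta = 1 / sigma2_ml, refined as better parameter samples are found.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 60)
y = 2.0 * x + rng.normal(0.0, 0.3, size=60)     # truth: slope 2, noise sd 0.3

def neg_sse(theta):
    """Negative sum of squared residuals for each candidate slope."""
    return -((y[None, :] - theta[:, None] * x[None, :]) ** 2).sum(axis=1)

mu, tau = 0.0, 3.0          # Gaussian proposal, adapted every iteration
sigma2_ml = np.inf          # running ML estimate of the noise power
for _ in range(40):
    beta = 0.0 if np.isinf(sigma2_ml) else 1.0 / sigma2_ml
    thetas = rng.normal(mu, tau, size=500)
    # log importance weight: tempered likelihood + N(0, 2^2) prior - proposal
    logw = (0.5 * beta * neg_sse(thetas)
            - 0.5 * (thetas / 2.0) ** 2
            + 0.5 * ((thetas - mu) / tau) ** 2)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    mu = float((w * thetas).sum())               # adapt proposal toward target
    tau = max(0.1, float(np.sqrt((w * (thetas - mu) ** 2).sum())))
    # best sample so far refines the noise-power estimate, raising beta
    sigma2_ml = min(sigma2_ml, float(-neg_sse(thetas).max()) / len(y))
```

The first pass runs at β = 0 (prior-like target); as σ²_ML ratchets down toward the residual variance, β rises and the sampler concentrates on the slope, mimicking the data-driven temperature schedule described above.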

In cosmological inference, the functional form of the posterior (e.g., Rayleigh vs. Gaussian) significantly affects parameter estimates, underlining the importance of appropriate posterior modeling; while not power tempering in the direct sense, such alternative forms are closely related in the context of robust likelihood modification (Bahr-Kalus et al., 2015).

7. Limitations, Critical Thresholds, and Theoretical Guarantees

A critical aspect is the breakdown of asymptotic normality and moment consistency when α vanishes too rapidly. Specifically, if α_n ∼ 1/n, Bernstein–von Mises does not hold, and the posterior may degenerate to a point mass. There exists a threshold α ≍ 1/√n where the Laplace approximation shifts regimes; below this rate, the posterior mean ceases to be asymptotically normal (Ray et al., 14 Jan 2026). Mixture regimes arising from empirical selectors (e.g., BCV, train–test) can lead to posteriors that are mixtures of a Gaussian and a point mass, further complicating inference and credible-set construction. Practitioners must be attentive to these phenomena when deploying data-driven tempering in risk-averse or model-diagnostic settings.
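One way to see the 1/n threshold concretely: in the conjugate N(θ, 1) model with a N(0, s0²) prior, the power posterior variance is 1/(αn + 1/s0²) exactly, so setting α_n = 1/n halts posterior contraction entirely. The prior scale below is an illustrative choice.

```python
# Power posterior variance 1 / (alpha * n + 1 / s0^2) in the conjugate
# Normal-mean model: fixed alpha contracts at rate 1/n, alpha = 1/n does not.
s0 = 2.0
for n in (10**2, 10**4, 10**6):
    var_fixed = 1.0 / (0.5 * n + 1.0 / s0**2)            # alpha = 0.5: shrinks as 1/n
    var_vanishing = 1.0 / ((1.0 / n) * n + 1.0 / s0**2)  # alpha = 1/n: stuck at 0.8
    print(n, var_fixed, var_vanishing)
```

With α_n = 1/n the tempered likelihood contributes a fixed amount of precision no matter how much data arrives, which is exactly the regime where Bernstein–von Mises fails.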


Power posteriors constitute a fundamental technique in contemporary Bayesian analysis, targeting robustness, efficient evidence computation, and adaptive algorithmic design. Their use demands close attention to asymptotic regimes, practical calibration of the power/temperature parameter, and recognition of the trade-offs between bias, variance, and credible interval reliability. Recent research provides detailed statistical guarantees and practical recommendations for their deployment in both parametric and high-dimensional settings (Ray et al., 2023, Ray et al., 14 Jan 2026, Friel et al., 2012, Martino et al., 2021, Bahr-Kalus et al., 2015).
