Bayes Factor Surprise in Hierarchical Models

Updated 22 January 2026
  • Bayes Factor Surprise is defined as the ratio of the predictive probability of a new observation under a naive prior to that under the current belief, detecting potential changes in volatile environments.
  • It modulates adaptation rates in hierarchical change-point models by integrating new data while selectively forgetting outdated information.
  • It underlies efficient online algorithms such as variational, particle filtering, and message-passing methods, achieving near-optimal performance in non-stationary settings.

Bayes Factor Surprise (SBF) is a theoretically grounded measure of surprise that emerges naturally from exact Bayesian inference in hierarchical change-point models. In environments characterized by non-stationarity and abrupt changes, SBF provides a computationally efficient mechanism for modulating adaptation rates, balancing the integration of new data with the selective forgetting of past information. SBF is defined as a ratio of predictive probabilities under the naïve prior and the current belief, and is mathematically and functionally distinct from classic Shannon surprise, which depends on the log of a weighted mixture predictive. SBF underlies a family of online learning algorithms—variational, particle, and message-passing—that achieve near-optimal performance in volatile environments with linear scaling in sequence length. Empirical benchmarks and theoretical predictions dissociate SBF from other surprise measures and suggest experimental tests to distinguish their behavioral and physiological effects (Liakoni et al., 2019).

1. Change-Point Model and Bayesian Filtering

The foundational framework is a hierarchical change-point model. At each time $t$, a latent parameter $\Theta_t$ governs observations $Y_t$ through a known likelihood $P_Y(y \mid \Theta_t)$. With probability $p_c$, a “change” event resets $\Theta_t$, sampling it anew from the prior $\pi^{(0)}$; otherwise, $\Theta_t = \Theta_{t-1}$. Formally, letting $C_t$ be a Bernoulli indicator of change:

$$P(\Theta_t \mid C_t, \Theta_{t-1}) = \begin{cases} \pi^{(0)}(\Theta_t) & \text{if } C_t = 1 \\ \delta(\Theta_t - \Theta_{t-1}) & \text{if } C_t = 0 \end{cases}$$

with $C_t \sim \mathrm{Bernoulli}(p_c)$.

Bayesian inference yields the filtering recursion for the posterior $\pi^{(t)}(\theta) = P(\Theta_t = \theta \mid y_{1:t})$:

$$\pi^{(t+1)}(\theta) = \frac{P_Y(y_{t+1} \mid \Theta_{t+1} = \theta)\, P(\Theta_{t+1} = \theta \mid y_{1:t})}{P(y_{t+1} \mid y_{1:t})}$$

Here, $P(\Theta_{t+1} = \theta \mid y_{1:t}) = (1 - p_c)\,\pi^{(t)}(\theta) + p_c\,\pi^{(0)}(\theta)$, reflecting a convex combination of retention and reset.
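
On a discretized parameter grid the recursion is directly implementable. The sketch below (assuming a unit-variance Gaussian likelihood and a standard-normal prior over the mean; all constants are illustrative, not from the paper) simulates a change of the latent mean and runs the exact filter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized parameter space: theta is the mean of a unit-variance Gaussian.
thetas = np.linspace(-4, 4, 81)
pi0 = np.exp(-0.5 * thetas**2)           # naive prior pi^(0) ~ N(0, 1)
pi0 /= pi0.sum()
p_c = 0.1                                # change probability

def likelihood(y):
    """P_Y(y | theta) on the grid, with unit observation noise."""
    return np.exp(-0.5 * (y - thetas)**2) / np.sqrt(2 * np.pi)

def filter_step(pi_t, y):
    """One step of the exact Bayesian filtering recursion."""
    mix = (1 - p_c) * pi_t + p_c * pi0   # convex combination: retention vs. reset
    post = likelihood(y) * mix
    return post / post.sum()

# A sequence whose latent mean jumps abruptly from 0 to 2.5 at t = 10
ys = np.concatenate([rng.normal(0.0, 1.0, 10), rng.normal(2.5, 1.0, 10)])
pi = pi0.copy()
for y in ys:
    pi = filter_step(pi, y)

posterior_mean = (thetas * pi).sum()     # concentrates near the post-change mean
```

The reset component $p_c\,\pi^{(0)}$ in the mixture prior is what allows the filter to re-adapt quickly after the change.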

2. Formal Definition of Bayes Factor Surprise

Letting $P(y; \pi) \equiv \int P_Y(y \mid \theta)\,\pi(\theta)\,d\theta$ denote the predictive probability of $y$ under belief $\pi$, the Bayes Factor Surprise of a new datum $y_{t+1}$ given the current posterior $\pi^{(t)}$ is

$$S_{\mathrm{BF}}(y_{t+1}; \pi^{(t)}) \equiv \frac{P(y_{t+1}; \pi^{(0)})}{P(y_{t+1}; \pi^{(t)})}$$

SBF compares the immediate predictive probability of the new observation under the “naïve” prior to that under the agent’s current belief. When $S_{\mathrm{BF}}$ is large, the observation is more likely under the prior than under current predictions, indicating a potential change. This measure directly modulates the learning rate and memory updating.
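
For intuition, in the conjugate Gaussian case $S_{\mathrm{BF}}$ reduces to a ratio of two Gaussian predictive densities. A minimal sketch (the function name and numeric values are illustrative, not from the paper):

```python
import math

def s_bf(y, mu0, var0, mu_t, var_t, obs_var=1.0):
    """Bayes Factor Surprise for Gaussian beliefs over the mean:
    the predictive of y under belief N(mu, var) is N(mu, var + obs_var)."""
    def predictive(mu, var):
        v = var + obs_var
        return math.exp(-0.5 * (y - mu)**2 / v) / math.sqrt(2 * math.pi * v)
    return predictive(mu0, var0) / predictive(mu_t, var_t)

# Datum far from a confident current belief but typical under the broad prior:
# more likely under the prior, so SBF >> 1, suggesting a change.
surprising = s_bf(2.0, mu0=0.0, var0=4.0, mu_t=-2.0, var_t=0.1)

# Datum consistent with the current belief: SBF < 1, no change signalled.
expected = s_bf(-2.0, mu0=0.0, var0=4.0, mu_t=-2.0, var_t=0.1)
```

A value above 1 favors the “just-changed” hypothesis; a value below 1 favors retaining the current belief.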

3. Recursive Updates and Surprise-Modulated Adaptation

Defining the one-step Bayesian update under the “no-change” assumption as

$$\pi_B^{(t+1)}(\theta) = \frac{P_Y(y_{t+1} \mid \theta)\,\pi^{(t)}(\theta)}{P(y_{t+1}; \pi^{(t)})}$$

and the “reset” posterior, conditioned on a change having just occurred, as

$$P(\theta \mid y_{t+1}) = \frac{P_Y(y_{t+1} \mid \theta)\,\pi^{(0)}(\theta)}{P(y_{t+1}; \pi^{(0)})}$$

the exact Bayesian recursion can be compactly expressed as

$$\pi^{(t+1)}(\theta) = (1 - \gamma)\,\pi_B^{(t+1)}(\theta) + \gamma\, P(\theta \mid y_{t+1})$$

where the adaptation rate $\gamma$ is a function of $S_{\mathrm{BF}}$ and the volatility odds,

$$\gamma(S, m) = \frac{mS}{1 + mS}, \qquad m = \frac{p_c}{1 - p_c}$$

The derivation follows by factoring the total predictive $P(y_{t+1} \mid y_{1:t}) = (1 - p_c)\,P(y_{t+1}; \pi^{(t)}) + p_c\,P(y_{t+1}; \pi^{(0)})$ out of the exact recursion: each branch of the mixture prior normalizes to $\pi_B^{(t+1)}$ or $P(\theta \mid y_{t+1})$ respectively, and the relative weight of the “change” branch is exactly $\gamma$.
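
This identity can be checked numerically on a discretized parameter grid. The sketch below (a Gaussian mean-estimation setup with illustrative constants, not code from the paper) computes one filtering step both ways and confirms they coincide:

```python
import numpy as np

thetas = np.linspace(-4, 4, 81)
pi0 = np.exp(-0.5 * thetas**2); pi0 /= pi0.sum()    # naive prior over the mean
p_c = 0.1
m = p_c / (1 - p_c)

def lik(y):
    # P_Y(y | theta) on the grid, unit observation noise
    return np.exp(-0.5 * (y - thetas)**2) / np.sqrt(2 * np.pi)

def exact_step(pi_t, y):
    # Direct filtering: likelihood times the retention/reset mixture prior
    mix = (1 - p_c) * pi_t + p_c * pi0
    post = lik(y) * mix
    return post / post.sum()

def surprise_step(pi_t, y):
    # Same update written as a gamma-weighted blend of two posteriors
    L = lik(y)
    pred_t, pred_0 = (L * pi_t).sum(), (L * pi0).sum()  # P(y; pi^(t)), P(y; pi^(0))
    s_bf = pred_0 / pred_t                              # Bayes Factor Surprise
    gamma = m * s_bf / (1 + m * s_bf)                   # adaptation rate
    pi_B = L * pi_t / pred_t                            # "no-change" update
    pi_reset = L * pi0 / pred_0                         # "just-changed" posterior
    return (1 - gamma) * pi_B + gamma * pi_reset

pi_t = np.exp(-0.5 * (thetas - 1.0)**2 / 0.25); pi_t /= pi_t.sum()  # some belief
assert np.allclose(exact_step(pi_t, 2.0), surprise_step(pi_t, 2.0))
```

The two routes agree to numerical precision, since the convex-blend form is an exact rewriting of the filtering recursion, not an approximation.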

4. Comparison to Shannon Surprise

In this framework, the Shannon surprise at time $t+1$ is

$$S_{\mathrm{Sh}}(y_{t+1}; \pi^{(t)}) = -\log P(y_{t+1} \mid y_{1:t}) = -\log\left[(1 - p_c)\,P(y_{t+1}; \pi^{(t)}) + p_c\,P(y_{t+1}; \pi^{(0)})\right]$$

By contrast, SBF is the ratio of these two predictive probabilities rather than the negative logarithm of their mixture. Writing $S_{\mathrm{Sh}}(y;\pi) \equiv -\log P(y;\pi)$ for the Shannon surprise under a single belief $\pi$, the adaptation rate $\gamma$ is governed by the difference of Shannon surprises under the two beliefs:

$$\Delta S_{\mathrm{Sh}} \equiv S_{\mathrm{Sh}}(y;\pi^{(t)}) - S_{\mathrm{Sh}}(y;\pi^{(0)}) = \log S_{\mathrm{BF}}(y;\pi^{(t)}) \quad\Rightarrow\quad \gamma = \frac{m\, e^{\Delta S_{\mathrm{Sh}}}}{1 + m\, e^{\Delta S_{\mathrm{Sh}}}}$$

Thus, SBF and Shannon surprise are mathematically and operationally distinct; SBF captures a “belief-versus-prior” contrast, while Shannon surprise registers overall informativeness.
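
Since the single-belief Shannon surprises satisfy $\exp\!\big(S_{\mathrm{Sh}}(y;\pi^{(t)}) - S_{\mathrm{Sh}}(y;\pi^{(0)})\big) = S_{\mathrm{BF}}$ when $S_{\mathrm{Sh}}(y;\pi) = -\log P(y;\pi)$, the adaptation rate $\gamma(S, m)$ can be computed from either quantity. A quick numerical check (the probability values are illustrative, not from the paper):

```python
import math

p_c = 0.2
m = p_c / (1 - p_c)

# Illustrative predictive probabilities of one datum under the two beliefs
p_curr = 0.05    # P(y; pi^(t)), under the current belief
p_prior = 0.30   # P(y; pi^(0)), under the naive prior

s_bf = p_prior / p_curr                          # Bayes Factor Surprise
gamma = m * s_bf / (1 + m * s_bf)                # adaptation rate from SBF

# Shannon surprise under each single belief: S_Sh(y; pi) = -log P(y; pi)
delta_s = (-math.log(p_curr)) - (-math.log(p_prior))   # equals log(s_bf)
gamma_from_shannon = m * math.exp(delta_s) / (1 + m * math.exp(delta_s))

assert abs(gamma - gamma_from_shannon) < 1e-12
```

The equality is an algebraic identity, holding for any predictive probabilities and any $p_c \in (0, 1)$.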

5. Surprise-Modulated Online Algorithms

Three novel, computationally tractable algorithms embody SBF-based updates:

| Algorithm | Belief representation | Key update |
| --- | --- | --- |
| VarSMiLe | Tractable exponential-family distribution | Log-space mixing; sufficient statistics updated linearly |
| MP_N | $N$ weighted messages/particles | Message prune/grow; harmonic-mean SBF aggregation |
| pf_N | $N$ weighted particles, full trajectories | Importance-weight update; Bernoulli resampling |
  • Variational SMiLe (VarSMiLe): Implements log-space mixing within exponential-family conjugate priors, allowing succinct sufficient-statistic updates in $O(1)$ time per observation.
  • Message Passing (MP_N): Maintains $N$ weighted messages with truncation to fixed memory and complexity, using harmonic-mean SBF aggregation to compute a global adaptation rate. Update steps include message-weight updates via Bayes factors and posterior resets.
  • Particle Filtering (pf_N): Each of $N$ particles samples a change-point history; importance weights are SBF-modulated. Posterior trajectories are efficiently approximated by Bernoulli resampling based on $\gamma$.

All three are $O(NT)$ in computational complexity (with $N \ll T$) and support simple update rules for exponential-family observation models (Liakoni et al., 2019).
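
As a rough illustration of the particle-filtering idea, the sketch below gates a per-particle Bernoulli reset with an SBF-derived $\gamma$. This is a simplified sketch, not the authors' exact pf_N: the conjugate-Gaussian particle representation, the ensemble-level SBF, and all constants are assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p_c, obs_var = 20, 0.1, 1.0
m = p_c / (1 - p_c)

# Each particle holds a conjugate Gaussian belief (mu, v) over the latent mean.
mus, vs, w = np.zeros(N), np.ones(N), np.full(N, 1.0 / N)

def pf_step(y, mus, vs, w):
    pv = vs + obs_var                        # predictive variance per particle
    pred = np.exp(-0.5 * (y - mus)**2 / pv) / np.sqrt(2 * np.pi * pv)
    pred0 = np.exp(-0.5 * y**2 / (1 + obs_var)) / np.sqrt(2 * np.pi * (1 + obs_var))
    s_bf = pred0 / (w * pred).sum()          # SBF of the weighted ensemble
    gamma = m * s_bf / (1 + m * s_bf)
    w = w * pred
    w = w / w.sum()                          # importance-weight update
    reset = rng.random(N) < gamma            # Bernoulli "reset" with prob. gamma
    mus = np.where(reset, 0.0, mus)          # reset particles return to the prior
    vs = np.where(reset, 1.0, vs)
    k = vs / (vs + obs_var)                  # conjugate Gaussian update with y
    return mus + k * (y - mus), (1 - k) * vs, w

for y in [0.1, -0.2, 0.3, 3.0, 3.2, 2.9]:    # latent mean jumps to ~3 midway
    mus, vs, w = pf_step(y, mus, vs, w)

estimate = (w * mus).sum()                   # ensemble estimate of the mean
```

In the paper's pf_N, each particle samples its own change-point history; the shared ensemble-level $\gamma$ above is a simplification for brevity.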

6. Empirical Evaluation and Behavioral Predictions

Empirical validation was performed on two canonical tasks:

  • Gaussian mean estimation: $y_t \sim N(\mu_t, \sigma^2)$, with $\mu_t$ reset to a fresh draw from $N(0, 1)$ with probability $p_c$.
  • Categorical probability estimation: $y_t \in \{1, \ldots, K\} \sim \mathrm{Cat}(p_t)$, with $p_t$ reset to a fresh draw from a Dirichlet prior.

Comparative baseline algorithms include untruncated exact Bayesian inference, SOR_N (stratified optimal resampling), generalized variants of Nassar et al. (2010, 2012), SMiLe, and fixed-$\alpha$ leaky integrators.

Key empirical findings:

  • pf_20, MP_20, and SOR_20 closely approximate the MSE of exact Bayesian inference across broad conditions (noise levels, change rates).
  • VarSMiLe and Nas12* achieve best single-unit (constant memory) performance.
  • MP_N and pf_N generalize robustly across SNR and change rates, outperforming SOR_N at low change probabilities.
  • Leaky integrators and SMiLe are effective only in narrow regimes.

Physiological and behavioral predictions are provided to dissociate SBF from $S_{\mathrm{Sh}}$:

  • Prediction 1 (Sign-bias): In a Gaussian prediction task, holding $(|\hat{y}|, |\delta|, C)$ fixed, SBF and Shannon surprise yield opposite expected effects of sign-bias in the prediction error, enabling empirical separation via measures such as pupil size or P300 amplitude.
  • Prediction 2 (Equal-probability test): On trials where $P(y_{t+1};\pi^{(t)}) \approx P(y_{t+1};\pi^{(0)}) = p$, SBF stays approximately constant (near 1) regardless of $p$, while Shannon surprise decreases as $p$ increases; how a measured physiological or behavioral response $M$ covaries with $p$ on such trials identifies which surprise metric is operative.

7. Scope and Distinctive Properties

Bayes Factor Surprise is an intrinsic outcome of exact Bayesian reasoning in the change-point model, distinct from Shannon surprise by its reference structure (ratio vs log-mixture) and by its direct role in modulating adaptation rates through a precisely defined update rate γ\gamma. SBF enables modular, surprise-modulated learning in a range of practical algorithms with linear scaling and simple update rules, and supports experimentally testable predictions about human and animal adaptation in non-stationary environments (Liakoni et al., 2019).
