Bayes Factor Surprise in Hierarchical Models

Updated 22 January 2026
  • Bayes Factor Surprise is defined as the ratio of the predictive probability of a new observation under a naive prior to that under the current belief, detecting potential changes in volatile environments.
  • It modulates adaptation rates in hierarchical change-point models by integrating new data while selectively forgetting outdated information.
  • It underlies efficient online algorithms such as variational, particle filtering, and message-passing methods, achieving near-optimal performance in non-stationary settings.

Bayes Factor Surprise (SBF) is a theoretically grounded measure of surprise that emerges naturally from exact Bayesian inference in hierarchical change-point models. In environments characterized by non-stationarity and abrupt changes, SBF provides a computationally efficient mechanism for modulating adaptation rates, balancing the integration of new data with the selective forgetting of past information. SBF is defined as a ratio of predictive probabilities under the naïve prior and the current belief, and is mathematically and functionally distinct from classic Shannon surprise, which depends on the log of a weighted mixture predictive. SBF underlies a family of online learning algorithms—variational, particle, and message-passing—that achieve near-optimal performance in volatile environments with linear scaling in sequence length. Empirical benchmarks and theoretical predictions dissociate SBF from other surprise measures and suggest experimental tests to distinguish their behavioral and physiological effects (Liakoni et al., 2019).

1. Change-Point Model and Bayesian Filtering

The foundational framework is a hierarchical change-point model. At each time $t$, a latent parameter $\Theta_t$ governs observations $Y_t$ through a known likelihood $P_Y(y \mid \Theta_t)$. With probability $p_c$, a “change” event resets $\Theta_t$, sampling it anew from the prior $\pi^{(0)}$; otherwise, $\Theta_t = \Theta_{t-1}$. Formally, letting $C_t$ be a Bernoulli indicator of change:

$$P(\Theta_t \mid C_t, \Theta_{t-1}) = \begin{cases} \pi^{(0)}(\Theta_t) & \text{if } C_t = 1 \\ \delta(\Theta_t - \Theta_{t-1}) & \text{if } C_t = 0 \end{cases}$$

with $C_t \sim \mathrm{Bernoulli}(p_c)$.

Bayesian inference yields the filtering recursion for the posterior $\pi^{(t)}(\theta) = P(\Theta_t = \theta \mid y_{1:t})$:

$$\pi^{(t+1)}(\theta) = \frac{P_Y(y_{t+1} \mid \Theta_{t+1} = \theta)\, P(\Theta_{t+1} = \theta \mid y_{1:t})}{P(y_{t+1} \mid y_{1:t})}$$

Here, $P(\Theta_{t+1} = \theta \mid y_{1:t}) = (1 - p_c)\,\pi^{(t)}(\theta) + p_c\,\pi^{(0)}(\theta)$, reflecting a convex combination of retention and reset.
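
On a discretized parameter grid the recursion is directly implementable. The sketch below (assuming a unit-variance Gaussian likelihood and a standard-normal prior over the mean; all constants are illustrative, not from the paper) simulates a change of the latent mean and runs the exact filter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized parameter space: theta is the mean of a unit-variance Gaussian.
thetas = np.linspace(-4, 4, 81)
pi0 = np.exp(-0.5 * thetas**2)           # naive prior pi^(0) ~ N(0, 1)
pi0 /= pi0.sum()
p_c = 0.1                                # change probability

def likelihood(y):
    """P_Y(y | theta) on the grid, with unit observation noise."""
    return np.exp(-0.5 * (y - thetas)**2) / np.sqrt(2 * np.pi)

def filter_step(pi_t, y):
    """One step of the exact Bayesian filtering recursion."""
    mix = (1 - p_c) * pi_t + p_c * pi0   # convex combination: retention vs. reset
    post = likelihood(y) * mix
    return post / post.sum()

# A sequence whose latent mean jumps abruptly from 0 to 2.5 at t = 10
ys = np.concatenate([rng.normal(0.0, 1.0, 10), rng.normal(2.5, 1.0, 10)])
pi = pi0.copy()
for y in ys:
    pi = filter_step(pi, y)

posterior_mean = (thetas * pi).sum()     # concentrates near the post-change mean
```

The reset component $p_c\,\pi^{(0)}$ in the mixture prior is what allows the filter to re-adapt quickly after the change.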

2. Formal Definition of Bayes Factor Surprise

Letting $P(y; \pi) \equiv \int P_Y(y \mid \theta)\,\pi(\theta)\,d\theta$ denote the predictive probability of $y$ under belief $\pi$, the Bayes Factor Surprise of a new datum $y_{t+1}$ given the current posterior $\pi^{(t)}$ is

$$S_{\mathrm{BF}}(y_{t+1}; \pi^{(t)}) \equiv \frac{P(y_{t+1}; \pi^{(0)})}{P(y_{t+1}; \pi^{(t)})}$$

SBF compares the immediate predictive probability of the new observation under the “naïve” prior to that under the agent’s current belief. When $S_{\mathrm{BF}}$ is large, the observation is more likely under the prior than under current predictions, indicating a potential change. This measure directly modulates the learning rate and memory updating.
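
For intuition, in the conjugate Gaussian case $S_{\mathrm{BF}}$ reduces to a ratio of two Gaussian predictive densities. A minimal sketch (the function name and numeric values are illustrative, not from the paper):

```python
import math

def s_bf(y, mu0, var0, mu_t, var_t, obs_var=1.0):
    """Bayes Factor Surprise for Gaussian beliefs over the mean:
    the predictive of y under belief N(mu, var) is N(mu, var + obs_var)."""
    def predictive(mu, var):
        v = var + obs_var
        return math.exp(-0.5 * (y - mu)**2 / v) / math.sqrt(2 * math.pi * v)
    return predictive(mu0, var0) / predictive(mu_t, var_t)

# Datum far from a confident current belief but typical under the broad prior:
# more likely under the prior, so SBF >> 1, suggesting a change.
surprising = s_bf(2.0, mu0=0.0, var0=4.0, mu_t=-2.0, var_t=0.1)

# Datum consistent with the current belief: SBF < 1, no change signalled.
expected = s_bf(-2.0, mu0=0.0, var0=4.0, mu_t=-2.0, var_t=0.1)
```

A value above 1 favors the “just-changed” hypothesis; a value below 1 favors retaining the current belief.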

3. Recursive Updates and Surprise-Modulated Adaptation

Defining the one-step Bayesian update under the “no-change” assumption as

$$\pi_B^{(t+1)}(\theta) = \frac{P_Y(y_{t+1} \mid \theta)\,\pi^{(t)}(\theta)}{P(y_{t+1}; \pi^{(t)})}$$

and the “reset” posterior, conditioned on a change having just occurred, as

$$P(\theta \mid y_{t+1}) = \frac{P_Y(y_{t+1} \mid \theta)\,\pi^{(0)}(\theta)}{P(y_{t+1}; \pi^{(0)})}$$

the exact Bayesian recursion can be compactly expressed as

$$\pi^{(t+1)}(\theta) = (1 - \gamma)\,\pi_B^{(t+1)}(\theta) + \gamma\, P(\theta \mid y_{t+1})$$

where the adaptation rate $\gamma$ is a function of $S_{\mathrm{BF}}$ and the volatility odds,

$$\gamma(S, m) = \frac{mS}{1 + mS}, \qquad m = \frac{p_c}{1 - p_c}$$

The derivation follows by factoring the total predictive $P(y_{t+1} \mid y_{1:t}) = (1 - p_c)\,P(y_{t+1}; \pi^{(t)}) + p_c\,P(y_{t+1}; \pi^{(0)})$ out of the exact recursion: each branch of the mixture prior normalizes to $\pi_B^{(t+1)}$ or $P(\theta \mid y_{t+1})$ respectively, and the relative weight of the “change” branch is exactly $\gamma$.
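
This identity can be checked numerically on a discretized parameter grid. The sketch below (a Gaussian mean-estimation setup with illustrative constants, not code from the paper) computes one filtering step both ways and confirms they coincide:

```python
import numpy as np

thetas = np.linspace(-4, 4, 81)
pi0 = np.exp(-0.5 * thetas**2); pi0 /= pi0.sum()    # naive prior over the mean
p_c = 0.1
m = p_c / (1 - p_c)

def lik(y):
    # P_Y(y | theta) on the grid, unit observation noise
    return np.exp(-0.5 * (y - thetas)**2) / np.sqrt(2 * np.pi)

def exact_step(pi_t, y):
    # Direct filtering: likelihood times the retention/reset mixture prior
    mix = (1 - p_c) * pi_t + p_c * pi0
    post = lik(y) * mix
    return post / post.sum()

def surprise_step(pi_t, y):
    # Same update written as a gamma-weighted blend of two posteriors
    L = lik(y)
    pred_t, pred_0 = (L * pi_t).sum(), (L * pi0).sum()  # P(y; pi^(t)), P(y; pi^(0))
    s_bf = pred_0 / pred_t                              # Bayes Factor Surprise
    gamma = m * s_bf / (1 + m * s_bf)                   # adaptation rate
    pi_B = L * pi_t / pred_t                            # "no-change" update
    pi_reset = L * pi0 / pred_0                         # "just-changed" posterior
    return (1 - gamma) * pi_B + gamma * pi_reset

pi_t = np.exp(-0.5 * (thetas - 1.0)**2 / 0.25); pi_t /= pi_t.sum()  # some belief
assert np.allclose(exact_step(pi_t, 2.0), surprise_step(pi_t, 2.0))
```

The two routes agree to numerical precision, since the convex-blend form is an exact rewriting of the filtering recursion, not an approximation.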

4. Comparison to Shannon Surprise

In this framework, the Shannon surprise at time $t+1$ is

$$S_{\mathrm{Sh}}(y_{t+1}; \pi^{(t)}) = -\log P(y_{t+1} \mid y_{1:t}) = -\log\left[(1 - p_c)\,P(y_{t+1}; \pi^{(t)}) + p_c\,P(y_{t+1}; \pi^{(0)})\right]$$

By contrast, SBF is the ratio of these two predictive probabilities rather than the negative logarithm of their mixture. Writing $S_{\mathrm{Sh}}(y;\pi) \equiv -\log P(y;\pi)$ for the Shannon surprise under a single belief $\pi$, the adaptation rate $\gamma$ is governed by the difference of Shannon surprises under the two beliefs:

$$\Delta S_{\mathrm{Sh}} \equiv S_{\mathrm{Sh}}(y;\pi^{(t)}) - S_{\mathrm{Sh}}(y;\pi^{(0)}) = \log S_{\mathrm{BF}}(y;\pi^{(t)}) \quad\Rightarrow\quad \gamma = \frac{m\, e^{\Delta S_{\mathrm{Sh}}}}{1 + m\, e^{\Delta S_{\mathrm{Sh}}}}$$

Thus, SBF and Shannon surprise are mathematically and operationally distinct; SBF captures a “belief-versus-prior” contrast, while Shannon surprise registers overall informativeness.
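
Since the single-belief Shannon surprises satisfy $\exp\!\big(S_{\mathrm{Sh}}(y;\pi^{(t)}) - S_{\mathrm{Sh}}(y;\pi^{(0)})\big) = S_{\mathrm{BF}}$ when $S_{\mathrm{Sh}}(y;\pi) = -\log P(y;\pi)$, the adaptation rate $\gamma(S, m)$ can be computed from either quantity. A quick numerical check (the probability values are illustrative, not from the paper):

```python
import math

p_c = 0.2
m = p_c / (1 - p_c)

# Illustrative predictive probabilities of one datum under the two beliefs
p_curr = 0.05    # P(y; pi^(t)), under the current belief
p_prior = 0.30   # P(y; pi^(0)), under the naive prior

s_bf = p_prior / p_curr                          # Bayes Factor Surprise
gamma = m * s_bf / (1 + m * s_bf)                # adaptation rate from SBF

# Shannon surprise under each single belief: S_Sh(y; pi) = -log P(y; pi)
delta_s = (-math.log(p_curr)) - (-math.log(p_prior))   # equals log(s_bf)
gamma_from_shannon = m * math.exp(delta_s) / (1 + m * math.exp(delta_s))

assert abs(gamma - gamma_from_shannon) < 1e-12
```

The equality is an algebraic identity, holding for any predictive probabilities and any $p_c \in (0, 1)$.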

5. Surprise-Modulated Online Algorithms

Three novel, computationally tractable algorithms embody SBF-based updates:

| Algorithm | Belief representation | Key update |
| --- | --- | --- |
| VarSMiLe | Tractable exponential-family distribution | Log-space mixing; sufficient statistics updated linearly |
| MP_N | $N$ weighted messages/particles | Message prune/grow; harmonic-mean SBF aggregation |
| pf_N | $N$ weighted particles, full trajectories | Importance-weight update; Bernoulli resampling |
  • Variational SMiLe (VarSMiLe): Implements log-space mixing within exponential-family conjugate priors, allowing succinct sufficient-statistic updates in $O(1)$ time per observation.
  • Message Passing (MP_N): Maintains $N$ weighted messages with truncation to fixed memory and complexity, using harmonic-mean SBF aggregation to compute a global adaptation rate. Update steps include message-weight updates via Bayes factors and posterior resets.
  • Particle Filtering (pf_N): Each of $N$ particles samples a change-point history; importance weights are SBF-modulated. Posterior trajectories are efficiently approximated by Bernoulli resampling based on $\gamma$.

All three are $O(NT)$ in computational complexity (with $N \ll T$) and support simple update rules for exponential-family observation models (Liakoni et al., 2019).
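
As a rough illustration of the particle-filtering idea, the sketch below gates a per-particle Bernoulli reset with an SBF-derived $\gamma$. This is a simplified sketch, not the authors' exact pf_N: the conjugate-Gaussian particle representation, the ensemble-level SBF, and all constants are assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p_c, obs_var = 20, 0.1, 1.0
m = p_c / (1 - p_c)

# Each particle holds a conjugate Gaussian belief (mu, v) over the latent mean.
mus, vs, w = np.zeros(N), np.ones(N), np.full(N, 1.0 / N)

def pf_step(y, mus, vs, w):
    pv = vs + obs_var                        # predictive variance per particle
    pred = np.exp(-0.5 * (y - mus)**2 / pv) / np.sqrt(2 * np.pi * pv)
    pred0 = np.exp(-0.5 * y**2 / (1 + obs_var)) / np.sqrt(2 * np.pi * (1 + obs_var))
    s_bf = pred0 / (w * pred).sum()          # SBF of the weighted ensemble
    gamma = m * s_bf / (1 + m * s_bf)
    w = w * pred
    w = w / w.sum()                          # importance-weight update
    reset = rng.random(N) < gamma            # Bernoulli "reset" with prob. gamma
    mus = np.where(reset, 0.0, mus)          # reset particles return to the prior
    vs = np.where(reset, 1.0, vs)
    k = vs / (vs + obs_var)                  # conjugate Gaussian update with y
    return mus + k * (y - mus), (1 - k) * vs, w

for y in [0.1, -0.2, 0.3, 3.0, 3.2, 2.9]:    # latent mean jumps to ~3 midway
    mus, vs, w = pf_step(y, mus, vs, w)

estimate = (w * mus).sum()                   # ensemble estimate of the mean
```

In the paper's pf_N, each particle samples its own change-point history; the shared ensemble-level $\gamma$ above is a simplification for brevity.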

6. Empirical Evaluation and Behavioral Predictions

Empirical validation was performed on two canonical tasks:

  • Gaussian mean estimation: $y_t \sim N(\mu_t, \sigma^2)$, with $\mu_t$ reset to a fresh draw from $N(0, 1)$ with probability $p_c$.
  • Categorical probability estimation: $y_t \in \{1, \ldots, K\} \sim \mathrm{Cat}(p_t)$, with $p_t$ reset to a fresh draw from a Dirichlet prior.

Comparative baseline algorithms include untruncated exact Bayesian inference, SOR_N (stratified optimal resampling), generalized variants of Nassar et al. (2010, 2012), SMiLe, and fixed-$\alpha$ leaky integrators.

Key empirical findings:

  • pf_20, MP_20, and SOR_20 closely approximate the MSE of exact Bayesian inference across broad conditions (noise levels, change rates).
  • VarSMiLe and Nas12* achieve best single-unit (constant memory) performance.
  • MP_N and pf_N generalize robustly across SNR and change rates, outperforming SOR_N at low change probabilities.
  • Leaky integrators and SMiLe are effective only in narrow regimes.

Physiological and behavioral predictions are provided to dissociate SBF from $S_{\mathrm{Sh}}$:

  • Prediction 1 (Sign-bias): In a Gaussian prediction task, holding $(|\hat{y}|, |\delta|, C)$ fixed, SBF and Shannon surprise yield opposite expected effects of sign-bias in the prediction error, enabling empirical separation via measures such as pupil size or P300 amplitude.
  • Prediction 2 (Equal-probability test): On trials where $P(y_{t+1};\pi^{(t)}) \approx P(y_{t+1};\pi^{(0)}) = p$, SBF stays approximately constant (near 1) regardless of $p$, while Shannon surprise decreases as $p$ increases; how a measured physiological or behavioral response $M$ covaries with $p$ on such trials identifies which surprise metric is operative.

7. Scope and Distinctive Properties

Bayes Factor Surprise is an intrinsic outcome of exact Bayesian reasoning in the change-point model, distinct from Shannon surprise by its reference structure (ratio vs log-mixture) and by its direct role in modulating adaptation rates through a precisely defined update rate γ\gamma. SBF enables modular, surprise-modulated learning in a range of practical algorithms with linear scaling and simple update rules, and supports experimentally testable predictions about human and animal adaptation in non-stationary environments (Liakoni et al., 2019).
