Bayes Factor Surprise in Hierarchical Models
- Bayes Factor Surprise is defined as the ratio of predictive probabilities from a naive prior to current belief, detecting potential changes in volatile environments.
- It modulates adaptation rates in hierarchical change-point models by integrating new data while selectively forgetting outdated information.
- It underlies efficient online algorithms such as variational, particle filtering, and message-passing methods, achieving near-optimal performance in non-stationary settings.
Bayes Factor Surprise (SBF) is a theoretically grounded measure of surprise that emerges naturally from exact Bayesian inference in hierarchical change-point models. In environments characterized by non-stationarity and abrupt changes, SBF provides a computationally efficient mechanism for modulating adaptation rates, balancing the integration of new data with the selective forgetting of past information. SBF is defined as a ratio of predictive probabilities under the naïve prior and the current belief, and is mathematically and functionally distinct from classic Shannon surprise, which depends on the log of a weighted mixture predictive. SBF underlies a family of online learning algorithms—variational, particle, and message-passing—that achieve near-optimal performance in volatile environments with linear scaling in sequence length. Empirical benchmarks and theoretical predictions dissociate SBF from other surprise measures and suggest experimental tests to distinguish their behavioral and physiological effects (Liakoni et al., 2019).
1. Change-Point Model and Bayesian Filtering
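The foundational framework is a hierarchical change-point model. At each time $t$, a latent parameter $\theta_t$ governs observations $y_t$ through a known likelihood $P_Y(y_t \mid \theta_t)$. With probability $p_c$, a “change” event resets $\theta_t$, sampling it anew from the prior $\pi^{(0)}$; otherwise, $\theta_t = \theta_{t-1}$. Formally, letting $c_t$ be a Bernoulli indicator of change:
$$P(\theta_{t+1} \mid \theta_t, c_{t+1}) = \begin{cases} \delta(\theta_{t+1} - \theta_t) & \text{if } c_{t+1} = 0 \\ \pi^{(0)}(\theta_{t+1}) & \text{if } c_{t+1} = 1 \end{cases}$$
with $P(c_{t+1} = 1) = p_c$.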
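The generative process above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a Gaussian prior and Gaussian observation noise, and the function name `simulate_change_point_task` and its parameters are hypothetical.

```python
import numpy as np

def simulate_change_point_task(T, p_c, sigma, prior_mean=0.0, prior_std=1.0, seed=0):
    """Sample latent parameters, observations, and change indicators.

    Assumptions for illustration: Gaussian prior N(prior_mean, prior_std^2)
    and Gaussian likelihood N(theta_t, sigma^2).
    """
    rng = np.random.default_rng(seed)
    theta = np.empty(T)
    c = np.zeros(T, dtype=bool)
    # The first parameter is drawn from the prior, so it is marked as a change.
    theta[0] = rng.normal(prior_mean, prior_std)
    c[0] = True
    for t in range(1, T):
        # With probability p_c the parameter is resampled from the prior,
        # otherwise it is carried over unchanged.
        c[t] = rng.random() < p_c
        theta[t] = rng.normal(prior_mean, prior_std) if c[t] else theta[t - 1]
    # Observations are noisy readouts of the current latent parameter.
    y = rng.normal(theta, sigma)
    return theta, y, c

theta, y, c = simulate_change_point_task(T=500, p_c=0.05, sigma=0.5)
```

With `p_c=0.05` the latent parameter is piecewise constant with occasional jumps, which is the regime in which surprise-modulated adaptation is meant to help.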
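Bayesian inference yields the filtering recursion for the posterior $\pi^{(t)}(\theta) = P(\theta_t = \theta \mid y_{1:t})$:
$$\pi^{(t+1)}(\theta) = \frac{P_Y(y_{t+1} \mid \theta)\, P(\theta_{t+1} = \theta \mid y_{1:t})}{\int P_Y(y_{t+1} \mid \theta')\, P(\theta_{t+1} = \theta' \mid y_{1:t})\, d\theta'}$$
Here, $P(\theta_{t+1} = \theta \mid y_{1:t}) = (1 - p_c)\, \pi^{(t)}(\theta) + p_c\, \pi^{(0)}(\theta)$, reflecting a convex combination of retention and reset.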
2. Formal Definition of Bayes Factor Surprise
Letting $P(y; \pi) = \int P_Y(y \mid \theta)\, \pi(\theta)\, d\theta$ denote the predictive under belief $\pi$, the Bayes Factor Surprise of a new datum $y_{t+1}$ given current posterior $\pi^{(t)}$ is
$$S_{\mathrm{BF}}(y_{t+1}; \pi^{(t)}) = \frac{P(y_{t+1}; \pi^{(0)})}{P(y_{t+1}; \pi^{(t)})}$$
SBF compares the immediate predictive probability under the “naïve” prior to that under the agent’s current belief. When SBF is large, the observation is more likely under the prior than under current predictions, indicating a potential change. This measure directly modulates learning rate and memory updating.
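For a concrete feel of the ratio, the sketch below evaluates it in closed form for a Gaussian belief and Gaussian noise, where the predictive under a belief $N(\mu, v)$ is $N(\mu, v + \sigma^2)$. The Gaussian setting and the helper names are assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def bayes_factor_surprise(y, belief_mean, belief_var, prior_mean, prior_var, noise_var):
    """Ratio of the predictive under the prior to the predictive under the belief."""
    # Marginalizing a Gaussian belief over Gaussian noise adds the variances.
    p_prior = gaussian_pdf(y, prior_mean, prior_var + noise_var)
    p_belief = gaussian_pdf(y, belief_mean, belief_var + noise_var)
    return p_prior / p_belief

# A confident belief around 2.0; the prior is N(0, 1) and the noise variance is 0.25.
expected = bayes_factor_surprise(2.0, 2.0, 0.01, 0.0, 1.0, 0.25)
outlier = bayes_factor_surprise(-1.0, 2.0, 0.01, 0.0, 1.0, 0.25)
```

An observation near the current estimate gives a ratio below 1, while an observation far from it but plausible under the prior gives a ratio far above 1.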
3. Recursive Updates and Surprise-Modulated Adaptation
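Defining the one-step Bayesian update with “no-change” assumption as
$$\pi_B^{(t+1)}(\theta) = \frac{P_Y(y_{t+1} \mid \theta)\, \pi^{(t)}(\theta)}{P(y_{t+1}; \pi^{(t)})}$$
and the “reset” posterior under “just-changed” as
$$P(\theta \mid y_{t+1}) = \frac{P_Y(y_{t+1} \mid \theta)\, \pi^{(0)}(\theta)}{P(y_{t+1}; \pi^{(0)})}$$
the exact Bayesian recursion can be compactly expressed as
$$\pi^{(t+1)}(\theta) = (1 - \gamma_{t+1})\, \pi_B^{(t+1)}(\theta) + \gamma_{t+1}\, P(\theta \mid y_{t+1})$$
where the adaptation rate $\gamma_{t+1}$ is a function of $S_{\mathrm{BF}}$ and model volatility,
$$\gamma_{t+1} = \frac{m\, S_{\mathrm{BF}}(y_{t+1}; \pi^{(t)})}{1 + m\, S_{\mathrm{BF}}(y_{t+1}; \pi^{(t)})}, \qquad m = \frac{p_c}{1 - p_c}$$
Derivation follows by factorizing the total predictive $(1 - p_c)\, P(y_{t+1}; \pi^{(t)}) + p_c\, P(y_{t+1}; \pi^{(0)})$ in numerator and denominator, expressing $\pi^{(t+1)}$ as a convex blend set by $\gamma_{t+1}$.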
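The equivalence between the direct filtering step and the surprise-modulated blend can be checked numerically. The sketch below discretizes the parameter on a grid and assumes the adaptation rate takes the form $m S / (1 + m S)$ with $m = p_c / (1 - p_c)$ and $S$ the ratio of prior to belief predictive. The grid, the Gaussian densities, and all function names are illustrative assumptions.

```python
import numpy as np

def normalize(density, dtheta):
    """Rescale a gridded density so that its Riemann sum equals 1."""
    return density / (density.sum() * dtheta)

def exact_update(belief, prior, lik, p_c, dtheta):
    # Direct filtering step: likelihood times the retain/reset mixture, renormalized.
    return normalize(lik * ((1.0 - p_c) * belief + p_c * prior), dtheta)

def surprise_modulated_update(belief, prior, lik, p_c, dtheta):
    p_belief = (lik * belief).sum() * dtheta   # predictive under the current belief
    p_prior = (lik * prior).sum() * dtheta     # predictive under the prior
    s_bf = p_prior / p_belief                  # Bayes Factor Surprise
    m = p_c / (1.0 - p_c)
    gamma = m * s_bf / (1.0 + m * s_bf)        # adaptation rate in (0, 1)
    pi_no_change = lik * belief / p_belief     # update assuming no change
    pi_reset = lik * prior / p_prior           # posterior assuming a change just occurred
    return (1.0 - gamma) * pi_no_change + gamma * pi_reset, gamma, s_bf

# Illustrative setup: prior N(0, 1), a confident belief near 2, one surprising datum.
theta = np.linspace(-6.0, 6.0, 2001)
dtheta = theta[1] - theta[0]
prior = normalize(np.exp(-0.5 * theta ** 2), dtheta)
belief = normalize(np.exp(-0.5 * (theta - 2.0) ** 2 / 0.05), dtheta)
y_new, sigma, p_c = -1.0, 0.5, 0.1
lik = np.exp(-0.5 * (y_new - theta) ** 2 / sigma ** 2)

exact = exact_update(belief, prior, lik, p_c, dtheta)
blended, gamma, s_bf = surprise_modulated_update(belief, prior, lik, p_c, dtheta)
print(np.allclose(exact, blended), gamma, s_bf)
```

The two posteriors agree up to floating-point error, because the blend only regroups the terms of the normalizing constant.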
4. Comparison to Shannon Surprise
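In this framework, with $P(y; \pi)$ the predictive under belief $\pi$, the Shannon surprise at time $t+1$ is
$$S_{\mathrm{Sh}}(y_{t+1}; \pi^{(t)}) = -\log\left[(1 - p_c)\, P(y_{t+1}; \pi^{(t)}) + p_c\, P(y_{t+1}; \pi^{(0)})\right]$$
By contrast, SBF is the ratio of these two predictive probabilities rather than the logarithm of their mixture. The adaptation rate relates to the difference in Shannon surprise under the two beliefs. Writing $\Delta S_{\mathrm{Sh}} = S_{\mathrm{Sh}}(y_{t+1}; \pi^{(0)}) - S_{\mathrm{Sh}}(y_{t+1}; \pi^{(t)})$, where $S_{\mathrm{Sh}}(y_{t+1}; \pi^{(0)}) = -\log P(y_{t+1}; \pi^{(0)})$, the adaptation rate $\gamma_{t+1}$ satisfies
$$\gamma_{t+1} = p_c\, e^{-\Delta S_{\mathrm{Sh}}}$$
Thus, SBF and Shannon surprise are mathematically and operationally distinct; SBF captures a “belief-versus-prior” contrast, while Shannon surprise registers overall informativeness.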
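The link between the two measures can be written out step by step. The derivation below is a sketch under stated assumptions: $P_0$ and $P_t$ abbreviate the predictive probabilities of the new datum under the prior and under the current belief, Shannon surprise under the prior is taken to be $-\log P_0$, the adaptation rate is taken to be $\gamma = m S_{\mathrm{BF}} / (1 + m S_{\mathrm{BF}})$ with $m = p_c / (1 - p_c)$, and the shorthand $\Delta S_{\mathrm{Sh}}$ is introduced here for convenience.

```latex
\begin{align*}
S_{\mathrm{BF}} &= \frac{P_0}{P_t}, \qquad
S_{\mathrm{Sh}}(\pi^{(t)}) = -\log\left[(1 - p_c)\, P_t + p_c\, P_0\right], \qquad
S_{\mathrm{Sh}}(\pi^{(0)}) = -\log P_0 \\
\Delta S_{\mathrm{Sh}} &= S_{\mathrm{Sh}}(\pi^{(0)}) - S_{\mathrm{Sh}}(\pi^{(t)})
  = \log \frac{(1 - p_c)\, P_t + p_c\, P_0}{P_0}
  = \log\left[\frac{1 - p_c}{S_{\mathrm{BF}}} + p_c\right] \\
S_{\mathrm{BF}} &= \frac{1 - p_c}{e^{\Delta S_{\mathrm{Sh}}} - p_c} \\
\gamma &= \frac{m\, S_{\mathrm{BF}}}{1 + m\, S_{\mathrm{BF}}}
  = \frac{p_c\, P_0}{(1 - p_c)\, P_t + p_c\, P_0}
  = p_c\, e^{-\Delta S_{\mathrm{Sh}}}
\end{align*}
```

The mapping between SBF and Shannon surprise therefore depends on $p_c$ and on the surprise under the prior, so neither measure is a fixed monotonic function of the other across trials.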
5. Surprise-Modulated Online Algorithms
Three novel, computationally tractable algorithms embody SBF-based updates:
| Algorithm | Belief Representation | Key Update |
|---|---|---|
| VarSMiLe | Tractable exponential-family dist. | Log-space mixing; updates suff. stats linearly. |
| MP_N | $N$ weighted messages/particles | Message prune/grow; harmonic-mean SBF aggregation. |
| pf_N | $N$ weighted particles, full traj. | Importance-weight update, Bernoulli resampling. |
- Variational SMiLe (VarSMiLe): Implements log-space mixing within exponential-family conjugate priors, which reduces each step to a compact sufficient-statistic update with constant time per update.
- Message Passing (MP_N): Maintains $N$ weighted message particles, truncated to a fixed memory and complexity budget, and uses harmonic-mean SBF aggregation to compute a global adaptation rate. Update steps include message weight updates via Bayes factors and posterior resets.
- Particle Filtering (pf_N): Each of $N$ particles samples a change-point history; importance weights are SBF-modulated. Posterior trajectories are efficiently approximated by Bernoulli resampling based on the adaptation rate $\gamma$.
All scale linearly with the sequence length in computational complexity (for a fixed number $N$ of particles or messages) and support simple update rules for exponential-family observation models (Liakoni et al., 2019).
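To make the log-space mixing idea concrete, the sketch below applies it to a Gaussian mean with known noise variance. It is written in the spirit of VarSMiLe rather than as the authors' implementation: mixing two Gaussians in log space amounts to a linear blend of their natural parameters (precision and precision times mean), weighted by the adaptation rate. All function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (y - mean) ** 2 / var - 0.5 * np.log(2.0 * np.pi * var)

def log_space_mixing_step(y, mean, var, prior_mean, prior_var, noise_var, p_c):
    """One constant-time update of a Gaussian belief N(mean, var)."""
    # Log Bayes Factor Surprise from the two Gaussian predictive densities.
    log_s_bf = (gaussian_logpdf(y, prior_mean, prior_var + noise_var)
                - gaussian_logpdf(y, mean, var + noise_var))
    # gamma = m*S / (1 + m*S), computed as a numerically stable sigmoid of log(m*S).
    x = np.log(p_c / (1.0 - p_c)) + log_s_bf
    gamma = np.exp(-np.logaddexp(0.0, -x))
    # Blend the natural parameters of belief and prior, then add the new datum.
    precision = (1.0 - gamma) / var + gamma / prior_var + 1.0 / noise_var
    precision_mean = ((1.0 - gamma) * mean / var
                      + gamma * prior_mean / prior_var
                      + y / noise_var)
    return precision_mean / precision, 1.0 / precision, gamma

def track(ys, prior_mean, prior_var, noise_var, p_c):
    """Run the update over a sequence and return the posterior means."""
    mean, var = prior_mean, prior_var
    estimates = []
    for y in ys:
        mean, var, _ = log_space_mixing_step(y, mean, var, prior_mean,
                                             prior_var, noise_var, p_c)
        estimates.append(mean)
    return np.array(estimates)

estimates = track(np.full(50, 1.5), prior_mean=0.0, prior_var=1.0,
                  noise_var=0.25, p_c=0.1)
```

When the datum is unsurprising, `gamma` is small and the step is close to an ordinary conjugate update; when it is highly surprising, `gamma` approaches 1 and the belief is effectively rebuilt from the prior and the new datum alone.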
6. Empirical Evaluation and Behavioral Predictions
Empirical validation was performed on two canonical tasks:
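- Gaussian mean estimation ($y_t \sim \mathcal{N}(\mu_t, \sigma^2)$, with $\mu_t$ reset to a draw from the prior with probability $p_c$).
- Categorical probability estimation ($y_t \sim \mathrm{Cat}(\theta_t)$, with $\theta_t$ reset to a draw from the Dirichlet prior).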
Baseline algorithms for comparison include untruncated exact Bayes, SOR_N (stratified optimal resampling), generalized Nassar2010/12 variants, SMiLe, and fixed-rate leaky integrators.
Key empirical findings:
- pf20, MP20, and SOR20 closely approximate exact Bayes MSE across broad conditions (noise, change rates).
- VarSMiLe and Nas12* achieve best single-unit (constant memory) performance.
- MP_N and pf_N generalize robustly across SNR and change rates, outperforming SOR_N at low change probabilities.
- Leaky integrators and SMiLe are effective only in narrow regimes.
Physiological/behavioral predictions are provided to dissociate SBF from Shannon surprise:
- Prediction 1 (Sign-bias): In a Gaussian prediction task, with the remaining trial variables held fixed, SBF and Shannon surprise yield opposite expected effects for sign-bias in prediction error, enabling empirical separation via measures such as pupil size or P300 amplitude.
- Prediction 2 (Equal-probability test): On trials where $P(y_{t+1}; \pi^{(t)}) = P(y_{t+1}; \pi^{(0)}) = p$, SBF remains constant (equal to 1) while Shannon surprise, which reduces to $-\log p$, decreases in $p$; whether the measured physiological or behavioral response covaries with $p$ identifies the operational surprise metric.
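The logic of the equal-probability test can be illustrated numerically. The sketch below assumes Shannon surprise is the negative log of the mixture predictive $(1 - p_c) P_t + p_c P_0$ and that SBF is the ratio $P_0 / P_t$; the probability values and `p_c` are arbitrary illustrative choices.

```python
import numpy as np

def bayes_factor_surprise(p_prior, p_belief):
    """Ratio of the predictive under the prior to that under the belief."""
    return p_prior / p_belief

def shannon_surprise(p_prior, p_belief, p_c):
    """Negative log of the mixture predictive."""
    return -np.log((1.0 - p_c) * p_belief + p_c * p_prior)

# Trials on which the datum is equally probable under the prior and the belief.
p = np.array([0.05, 0.1, 0.2, 0.4])
s_bf = bayes_factor_surprise(p, p)
s_sh = shannon_surprise(p, p, p_c=0.1)

for prob, bf, sh in zip(p, s_bf, s_sh):
    print(f"p={prob:.2f}  S_BF={bf:.2f}  S_Sh={sh:.3f}")
```

Across these trials the Bayes Factor Surprise stays flat while Shannon surprise falls as the shared probability rises, so a response that tracks the shared probability points to Shannon surprise and a flat response points to SBF.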
7. Scope and Distinctive Properties
Bayes Factor Surprise is an intrinsic outcome of exact Bayesian reasoning in the change-point model. It differs from Shannon surprise in its reference structure (a ratio of predictive probabilities rather than the log of their mixture) and in its direct role in modulating adaptation through a precisely defined update rate $\gamma$. SBF enables modular, surprise-modulated learning in a range of practical algorithms with linear scaling and simple update rules, and supports experimentally testable predictions about human and animal adaptation in non-stationary environments (Liakoni et al., 2019).