
Bayesian Inference and Online Sequential Updating

Updated 6 February 2026
  • Bayesian inference and online sequential updating are statistical frameworks that combine prior beliefs with new data to yield updated posterior distributions.
  • The topic covers methodologies including Kalman filtering, particle filtering, and variational Bayesian methods for efficient real-time data assimilation.
  • Applications span time-series modeling, adaptive systems, privacy-preserving queries, and robust forecasting using scalable, recursive updating algorithms.

Bayesian inference is the core statistical paradigm in which prior beliefs about unknown parameters are combined with observed data to yield updated, data-dependent beliefs via the posterior distribution. Online sequential updating refers to the recursive assimilation of new observations or queries, continually refining the Bayesian posterior as information accrues, making it an essential methodology in time-series modeling, adaptive systems, streaming analytics, and privacy-sensitive database query systems.

1. Fundamentals of Bayesian Sequential Updating

In standard Bayesian analysis, the goal is to obtain the posterior distribution $p(\theta \mid y_{1:T}) \propto p(\theta) \prod_{t=1}^T p(y_t \mid \theta)$ after all data are observed. However, streaming or online contexts require updating $p(\theta \mid y_{1:t-1})$ to $p(\theta \mid y_{1:t})$ with minimal recomputation. The general update is:

$$p(\theta \mid y_{1:t}) \propto p(y_t \mid \theta)\, p(\theta \mid y_{1:t-1})$$
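A minimal sketch of this recursion in a conjugate Beta-Bernoulli model (a toy example, not drawn from any cited work): the one-observation-at-a-time update recovers exactly the batch posterior.

```python
# Sequential vs. batch posterior in a conjugate Beta-Bernoulli model.
# The recursion p(theta | y_{1:t}) ∝ p(y_t | theta) p(theta | y_{1:t-1})
# reduces to incrementing the Beta(a, b) hyperparameters per observation.

def beta_update(a, b, y):
    """One recursive Bayes step for a Bernoulli observation y in {0, 1}."""
    return a + y, b + (1 - y)

data = [1, 0, 1, 1, 0, 1]

# Online: assimilate one observation at a time.
a, b = 1.0, 1.0          # uniform Beta(1, 1) prior
for y in data:
    a, b = beta_update(a, b, y)

# Batch: condition on all observations at once.
a_batch = 1.0 + sum(data)
b_batch = 1.0 + len(data) - sum(data)

assert (a, b) == (a_batch, b_batch)   # identical posteriors
print(a, b)  # Beta(5.0, 3.0)
```

For conjugate families this equivalence is exact; the filtering and variational methods below exist precisely because most models lack such closed-form recursions.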

This recursion is applied in parametric models (Gaussian, exponential family), nonparametric settings, and latent variable frameworks. In the context of privacy-preserving data analysis, e.g., for count-vector summaries in data cubes, online updating can be performed even when only differentially private (Laplace-perturbed) linear query answers are observable. The noisy observation model has the following structure:

  • Observed: $y = Hx + N$, where $H$ is an $m \times n$ query matrix, $x$ is the private $n$-dimensional data vector, and $N$ is Laplace noise scaled per-query to achieve $(\alpha_i, S_i)$-differential privacy (with $S_i$ the $L_1$ sensitivity per query).
  • Bayesian updating is performed by propagating posterior beliefs about any linear functional $\theta = Qx$, given past noisy queries, using the likelihood induced by the composed Laplace noise and a (typically uninformative) prior (Xiao et al., 2012).
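A hypothetical sketch of this noisy-query setting, using a grid approximation for the posterior over a single private count under a flat prior (all names and values are illustrative; real systems compose privacy budgets more carefully):

```python
import numpy as np

# Illustrative sketch: grid-based posterior over a private count x after
# observing Laplace-perturbed answers y_i = x + Lap(S / alpha_i),
# assuming a flat prior over an integer grid.

rng = np.random.default_rng(0)
x_true = 42                      # private count (unknown to the analyst)
alphas = [0.1, 0.1, 0.2]         # per-query privacy parameters
S = 1.0                          # L1 sensitivity of a count query

grid = np.arange(0, 101)         # support of the flat prior
log_post = np.zeros_like(grid, dtype=float)

for alpha in alphas:
    scale = S / alpha
    y = x_true + rng.laplace(scale=scale)       # noisy query answer
    # Laplace log-likelihood of y under each candidate x, added recursively.
    log_post += -np.abs(y - grid) / scale - np.log(2 * scale)

post = np.exp(log_post - log_post.max())
post /= post.sum()
print(grid[np.argmax(post)], post @ grid)  # MAP estimate and posterior mean
```

Each noisy answer sharpens the posterior over $x$ while the cumulative privacy cost grows with $\sum_i \alpha_i$, which is the tradeoff quantified in Section 3.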

Online sequential Bayesian updating is thus formalized as a recursive application of Bayes' theorem, leveraging independence and Markov properties or adapting more elaborate filtering algorithms in dynamic latent-state models.

2. Methodological Approaches and Algorithms

A broad class of algorithms implements online Bayesian sequential inference, spanning finite-dimensional (parametric) models, state-space models, and high-dimensional or nonparametric settings.

  • Kalman Filtering and Extensions: In linear-Gaussian state-space models, the posterior admits closed-form Gaussian recursion (Kalman filter). Extended Kalman filter (EKF) and Unscented Kalman filter (UKF) generalize this to nonlinear/non-Gaussian cases, often used as local Gaussian approximations for online neural network weight inference (Wagner et al., 2021, Duran-Martin et al., 2021).
  • Particle Filtering / Sequential Monte Carlo (SMC): For intractable posteriors, SMC methods propagate an ensemble of weighted samples (particles) via importance sampling and resampling, with resampling strategies (e.g., effective sample size) and MCMC rejuvenation to mitigate particle degeneracy (Dinh et al., 2016, Shaghaghian et al., 2016).
  • Online Variational Bayes (VB): Variational Bayes sequentially updates tractable approximations $q_t(\theta)$ to the posterior by applying a mean-field or low-rank parametrization and optimizing an ELBO objective, either via full recomputation per batch (Lee et al., 8 Apr 2025), or by exponential-family projections from prior approximate posteriors when new data arrives (Tomasetti et al., 2019).
    • Fast formulations include BONG (Bayesian Online Natural Gradient) which takes a unit step of natural gradient ascent on the expected log-likelihood, initialized at the prior predictive, and omits explicit KL regularization to the prior (Jones et al., 2024).
    • Low-rank and block-diagonal approximations have been developed for scalable online Bayesian inference in neural network parameter spaces (Duran-Martin et al., 13 Jun 2025).
  • SMC for Network Structure and Change Point Detection: For discrete model selection (e.g., Bayesian networks), candidate-structure frontiers are maintained, and statistics updated per observation; structure and parameter learning are interleaved with mechanisms for incremental forgetting (Friedman et al., 2013). For change point models, Bayesian online changepoint detection recursively updates the run-length distribution and posterior sufficient statistics, maintaining a mixture over possible changepoint locations (0710.3742).
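The linear-Gaussian case above admits a fully closed-form recursion; a minimal sketch of one Kalman predict/update cycle for a scalar random-walk state (the model constants here are illustrative, not taken from the cited papers):

```python
import numpy as np

# Minimal Kalman filter for a random-walk state observed in Gaussian noise:
#   x_t = x_{t-1} + w_t,  w_t ~ N(0, Q);   y_t = x_t + v_t,  v_t ~ N(0, R).

def kalman_step(m, P, y, Q=0.1, R=0.5):
    """One predict/update cycle; returns posterior mean and variance."""
    m_pred, P_pred = m, P + Q            # predict through the random walk
    K = P_pred / (P_pred + R)            # Kalman gain
    m_new = m_pred + K * (y - m_pred)    # correct with the innovation
    P_new = (1 - K) * P_pred
    return m_new, P_new

rng = np.random.default_rng(1)
x, m, P = 0.0, 0.0, 1.0
for _ in range(200):
    x += rng.normal(scale=0.1 ** 0.5)        # latent state evolves
    y = x + rng.normal(scale=0.5 ** 0.5)     # noisy observation
    m, P = kalman_step(m, P, y)

print(abs(m - x), P)  # tracking error and steady-state posterior variance
```

The EKF and UKF variants replace the linear predict step with a linearization or sigma-point approximation of a nonlinear transition, but retain this same recursive structure.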

The available algorithmic approaches thus include:

  • Fully analytic recursion for conjugate models.
  • Efficient SMC with proposal and resample steps for latent or non-conjugate models.
  • Recursive variational or natural-gradient-based updates for high-dimensional and/or non-conjugate parameterizations.
  • Specialized mechanisms for filtering under distribution shift or explicit structure learning.
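The SMC recipe above (propagate, reweight, resample when the effective sample size drops) can be sketched for a scalar random-walk model; all constants are illustrative, and a real implementation would tune the proposal and add MCMC rejuvenation as noted earlier:

```python
import numpy as np

# Bootstrap particle filter with effective-sample-size (ESS) resampling,
# for a scalar random walk observed in Gaussian noise.

rng = np.random.default_rng(2)
N, Q, R = 500, 0.1, 0.5
particles = rng.normal(size=N)          # samples from the prior
weights = np.full(N, 1.0 / N)

x = 0.0
for _ in range(100):
    x += rng.normal(scale=Q ** 0.5)                     # latent state
    y = x + rng.normal(scale=R ** 0.5)                  # observation
    particles += rng.normal(scale=Q ** 0.5, size=N)     # propagate
    weights *= np.exp(-0.5 * (y - particles) ** 2 / R)  # reweight
    weights /= weights.sum()
    ess = 1.0 / np.sum(weights ** 2)                    # effective sample size
    if ess < N / 2:                                     # degeneracy guard
        idx = rng.choice(N, size=N, p=weights)          # multinomial resample
        particles, weights = particles[idx], np.full(N, 1.0 / N)

print(float(weights @ particles) - x)  # posterior-mean tracking error
```

The ESS threshold of $N/2$ is a common heuristic; without resampling, the weights concentrate on a few particles and the approximation degenerates.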

3. Theoretical Properties: Consistency, Efficiency, and Guarantees

The central theoretical aspects of online Bayesian sequential updating include:

  • Posterior Consistency: SMC-based algorithms with appropriate proposal and correction mechanisms are shown to converge weakly to the true (batch) posterior as the number of particles increases, even as model dimension grows with new data (e.g., online Bayesian phylogenetics) (Dinh et al., 2016). Variational online procedures can be shown to preserve Bernstein-von Mises guarantees under mild regularity conditions, provided the mini-batch size exceeds a regime-dependent threshold, so that the total variation distance between the online and true batch posteriors vanishes as the number of observations grows (Lee et al., 8 Apr 2025).
  • Generalization/Regret Bounds: PAC-Bayes and online variational inference frameworks provide explicit generalization and regret bounds that extend classical batch PAC-Bayes results to arbitrarily dependent data streams, with bounds scaling as $O(\sqrt{d \log T / T})$ for $d$-dimensional models, and holding even in non-convex and adversarial settings (Haddouche et al., 2022, Chérief-Abdellatif et al., 2019). Online regimes admit tuning of learning-rate/tempering parameters to interpolate between adaptivity and memory.
  • Privacy-Utility Tradeoff: In privacy-aware query settings, the sequential update maintains a running posterior that respects the cumulative privacy cost. The privacy parameter $\alpha_i$ per query induces a tradeoff: larger $\alpha_i$ delivers lower posterior variance but expends more privacy budget; the total cost accumulates as $\bar\alpha = \sum_i \alpha_i$ (Xiao et al., 2012).
  • Robustness and Adaptivity: By incorporating generalised Bayes updates (loss replacement and tempering) and adaptive prior schemes (e.g., tempering the previous posterior upon changepoint detection), modern Bayesian filters offer strong robustness to model misspecification and distributional shifts (Li et al., 2020, Duran-Martin, 12 May 2025).
  • Error Decomposition: For variational sequential updating, the cumulative gap between online variational and ideal posteriors can be bounded, and efficient update schedules, such as batch sizes $n \gg d^2$, ensure that the sequentially updated posterior is asymptotically indistinguishable from the batch posterior (Lee et al., 8 Apr 2025, Tomasetti et al., 2019).
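The tempering idea noted above, raising the previous posterior to a power $\beta < 1$ so that its variance inflates, can be sketched for a conjugate Gaussian model; the shift heuristic and the value of $\beta$ below are illustrative choices, not taken from the cited papers:

```python
# Posterior tempering as an adaptive prior: raising a Gaussian posterior
# N(m, P) to a power beta < 1 keeps the mean but inflates the variance
# to P / beta, discounting stale evidence after a suspected shift.

def temper(m, P, beta):
    return m, P / beta

def gaussian_update(m, P, y, R=1.0):
    """Conjugate update of N(m, P) with observation y ~ N(theta, R)."""
    K = P / (P + R)
    return m + K * (y - m), (1 - K) * P

m, P = 0.0, 1.0
for y in [0.1, -0.2, 0.0, 5.1, 4.9, 5.0]:   # mean jumps mid-stream
    if abs(y - m) > 3 * (P + 1.0) ** 0.5:   # crude shift heuristic
        m, P = temper(m, P, beta=0.1)       # widen the prior tenfold
    m, P = gaussian_update(m, P, y)

print(m, P)  # posterior mean has moved toward the new regime
```

Without the tempering step, the shrinking posterior variance would make the filter increasingly reluctant to move toward the post-shift mean; the widened prior restores adaptivity at the cost of temporarily higher uncertainty.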

4. Scaling, Approximation, and Computational Strategies

Efficient online Bayesian inference in high-dimensional settings and nonconjugate models demands algorithmic innovation:

| Method | Assumed Model | Per-Step Complexity |
| --- | --- | --- |
| Full Kalman (analytic) | Linear-Gaussian | $O(D^3)$ |
| Extended Kalman / UKF | Differentiable | $O(D^2)$ / $O(D^3)$ |
| Subspace EKF | Low-rank, $d \ll D$ | $O(d^3 + Dd)$ |
| Diagonal + Low-Rank (LoFi) | Large $D$ | $O(Dd^2 + d^3)$ |
| SMC / Particle Filtering | Arbitrary | $O(N\,C_{\mathrm{model}})$ |
| Stochastic VB (UVB-IS) | Black-box | $O(S)$ per iteration |
  • Subspace EKF and Low-Rank Approximations: For neural networks with $\sim 10^6$ weights, parameter estimation is confined to a learned affine subspace ($A \in \mathbb{R}^{D \times d}$), or a diagonal+low-rank parameterization of the posterior covariance structure, dramatically reducing memory and computational cost (Duran-Martin et al., 2021, Duran-Martin, 12 May 2025, Duran-Martin et al., 13 Jun 2025).
  • Block-Diagonal Covariance Updates: In layered models, block-diagonal structure can be imposed such that output-layer (last) parameters are updated via full-covariance Kalman steps while feature extractor weights are updated with low-rank or diagonal approximations (Duran-Martin et al., 13 Jun 2025).
  • Handling Improper Posteriors: Improper, low-rank posterior approximations can still yield valid predictive distributions provided the posterior predictive is well-defined (Duran-Martin et al., 13 Jun 2025).
  • Fast Variational Updates: UVB and UVB-IS enable variational approximations to be updated using only the new data, with UVB-IS using cached samples and importance weighting to further reduce per-iteration costs (Tomasetti et al., 2019).
  • Joint and Marginal Predictive Handling: Modern work emphasizes the difference between marginal predictives (traditional Bayesian point-wise updating) and joint predictives (rolling in new observations for sequential active learning or adaptive sampling), with online importance reweighting schemes facilitating sequential updates (Kirsch et al., 2022).
  • Forgetting and Adaptivity: Decaying sufficient statistics or tempering the posterior in response to detected changepoints enables online procedures to gracefully adapt to regime shifts and non-stationarities (Friedman et al., 2013, Li et al., 2020, Duran-Martin, 12 May 2025).
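The forgetting mechanism in the last bullet can be sketched for a conjugate Gaussian-mean model by geometrically decaying the sufficient statistics before each update; the decay rate below is an illustrative choice:

```python
# Exponential forgetting for online conjugate inference on a Gaussian mean:
# geometrically decay the accumulated sufficient statistics (sum and count)
# before each update, so old observations lose influence under drift.

def forgetful_update(s, n, y, lam=0.9):
    """Decay sufficient statistics, then assimilate observation y."""
    s, n = lam * s, lam * n      # discount stale evidence
    return s + y, n + 1.0

s, n = 0.0, 0.0
stream = [0.0] * 30 + [10.0] * 30   # abrupt mean shift halfway through
for y in stream:
    s, n = forgetful_update(s, n, y)

print(s / n)  # running mean estimate settles near the new level
```

With $\lambda = 1$ this reduces to exact conjugate updating; with $\lambda < 1$ the effective sample size saturates at $1/(1-\lambda)$, bounding how much history the posterior retains.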

5. Applications and Empirical Performance

Sequential Bayesian updating has broad reach:

  • Differentially Private Query Systems: Online Bayesian sequential updating enables credible $(1-\delta)$-intervals and minimum privacy expenditure in database query-answering while providing tight posterior uncertainty quantification over linear queries, with up to 50–70% privacy savings over static mechanisms (Xiao et al., 2012).
  • Time-Series and State-Space Models: EKF and SMC algorithms yield high-frequency, adaptive inference over latent dynamic states (as in speech, motor control, and birdsong recognition (Frölich et al., 2020)) and in sequential changepoint contexts (0710.3742).
  • Streaming and Nonstationary Bandits: Scalable, online Bayesian neural bandit methods (subspace EKF and low-rank updates) have demonstrated state-of-the-art regret and computational efficiency on contextual bandits and non-stationary recommendation challenges (Duran-Martin et al., 2021, Duran-Martin, 12 May 2025, Duran-Martin et al., 13 Jun 2025).
  • Goal and Structure Inference: Online SMC schemes accommodate latent-goal inference in bounded-rational agents (Zhi-Xuan et al., 2020) and dynamic learning of Bayesian network structure in the presence of domain changes and missing data (Friedman et al., 2013).
  • Robust Forecasting Under Drift: Adaptive generalised Bayesian filtering techniques (WoLF, BONE) enable robust, calibrated prediction in the face of concept drift, outliers, and unmodeled transitions in time-dependent phenomena (Duran-Martin, 12 May 2025, Li et al., 2020).
  • Theoretical Validation: Empirical studies in real-data settings (including high-dimensional neural networks and complex datasets such as MovieLens, UCI, and Kaggle competitions) demonstrate that properly tuned sequential Bayesian variants attain performance close to or better than batch retraining or static variants with substantially reduced memory and computation (Duran-Martin et al., 13 Jun 2025, Wagner et al., 2021).

6. Extension to Modern Inference Paradigms: Robustness, Adaptivity, and Joint Predictives

Recent work extends Bayesian online updating to address the challenges posed by increasingly complex data streams and modeling requirements:

  • Joint Predictive Adaptation: Marginal predictive performance is inadequate for tasks like active learning, online sampling, or experiment design that require predictions to adapt to evidence seen during testing. Ongoing research addresses the evaluation and construction of accurate joint predictive distributions in high-dimensional BNNs, but current inference schemes face technical bottlenecks due to high variance and collapse of importance weights in high dimension (Kirsch et al., 2022).
  • Changepoint Detection and Distribution Shift: Integration of explicit changepoint indicators and beam-search tracking of breakpoints allows dynamic adaptation to both abrupt and gradual distribution shifts, outperforming greedy and static baselines in online regression and classification under drift (Li et al., 2020).
  • Robust and Generalised Bayesian Methodologies: Downweighting likelihood contributions from outliers (WoLF) and substitutive loss or tempering mechanisms for model misspecification provide finite influence and distributional robustness while retaining online computational tractability (Duran-Martin, 12 May 2025).
  • Design/Decision Theoretic Contexts: The flexible posterior structure of online Bayesian updating enables integration with bandit, Bayesian optimization, and active sampling pipelines demanding fast, calibrated uncertainty assessment and predictive exploration (Duran-Martin et al., 13 Jun 2025).

Collectively, advances in sequential Bayesian updating now support highly adaptive, scalable, robust, and privacy-aware learning in dynamic, high-dimensional, and distributed data environments, making them a foundational component for contemporary machine learning systems.
