
Bayesian Forest Thompson Sampling

Updated 10 February 2026
  • Bayesian Forest Thompson Sampling is a family of algorithms that combines Bayesian modeling and tree ensembles to enable posterior sampling-based exploration in contextual bandits.
  • It leverages methods like BART and random forests to capture complex, non-linear reward structures while providing calibrated uncertainty estimates for effective action selection.
  • The approach offers both surrogate Gaussian approximations and full Bayesian inference, delivering strong empirical performance and rigorous theoretical regret bounds.

Bayesian Forest Thompson Sampling (BFTS) refers to a family of algorithms in contextual bandit learning that integrate principled Bayesian modeling with tree ensemble predictors to facilitate posterior sampling-based exploration. BFTS leverages the capacity of tree-based models, such as Bayesian Additive Regression Trees (BART) and random forests, to capture complex, non-linear reward structures, while providing calibrated uncertainty estimates essential for Thompson Sampling (TS). The approach has evolved from surrogate uncertainty heuristics to fully Bayesian posterior inference, establishing both practical efficiency and rigorous theoretical guarantees (Deng et al., 8 Feb 2026, Nilsson et al., 2024, Osband et al., 2015).

1. Formal Problem Setting and Algorithmic Foundations

BFTS operates in the contextual multi-armed bandit (MAB) framework. At each round $t = 1, \ldots, T$, the agent observes a context $X_t \in [0,1]^p$ and selects an arm $A_t \in \{1, \ldots, K\}$, receiving a stochastic reward

$$R_t = f_0(X_t, A_t) + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2).$$

The objective is to minimize cumulative regret

$$\text{Regret}_T = \sum_{t=1}^T \left( f_0(X_t, A_t^*) - f_0(X_t, A_t) \right),$$

where $A_t^* \in \arg\max_a f_0(X_t, a)$. BFTS algorithms maintain a growing dataset of observed $(X, a, r)$ tuples and fit a Bayesian tree ensemble model for each arm to the available data. Thompson sampling is implemented by drawing a posterior sample (or surrogate) of the reward function and maximizing over arms with these sampled estimates to select actions (Deng et al., 8 Feb 2026, Nilsson et al., 2024).
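As a concrete illustration, the interaction protocol above can be sketched in a few lines of Python. The reward function `f0` and all constants below are hypothetical placeholders, and the uniform arm choice stands in for the posterior-sampling step described in Section 3:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, p, sigma = 1000, 3, 5, 0.5

def f0(x, a):
    # Hypothetical ground-truth reward: a nonlinear slice per arm.
    return np.sin(2 * np.pi * x[a % p]) + 0.1 * a

history = []        # growing dataset of (context, arm, reward) tuples
cum_regret = 0.0

for t in range(T):
    x_t = rng.uniform(0, 1, size=p)            # context X_t in [0,1]^p
    means = np.array([f0(x_t, a) for a in range(K)])
    a_t = int(rng.integers(K))                 # placeholder policy; BFTS would
                                               # maximize a posterior sample here
    r_t = means[a_t] + rng.normal(0, sigma)    # stochastic reward R_t
    cum_regret += means.max() - means[a_t]     # instantaneous regret
    history.append((x_t, a_t, r_t))
```

Any BFTS variant plugs into this loop by replacing the random arm choice with the posterior-sampling rule and refitting its per-arm models on `history`.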

2. Bayesian Modeling via Tree Ensembles

BFTS encompasses two principal modeling paradigms:

  • Surrogate Posteriors over Forests (Heuristic-Bayesian): Tree-ensemble-based BFTS, as in Tree Ensemble Thompson Sampling (TETS), treats each tree output as a noisy sample. The surrogate posterior at $x$ is modeled as Gaussian:

$$p(x) = \sum_{n=1}^N o_n, \qquad o_n \sim \mathcal{N}(\mu_n, \sigma_n^2 / c_n),$$

producing an ensemble posterior

$$p(x) \sim \mathcal{N}\left( \sum_{n=1}^N \mu_n,\; \sum_{n=1}^N \sigma_n^2 / c_n \right),$$

where $\mu_n$ and $\sigma_n^2$ are the empirical mean and variance in the leaf, and $c_n$ is the leaf count (Nilsson et al., 2024).
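A minimal sketch of this surrogate computation, assuming the per-tree leaf statistics $(\mu_n, \sigma_n^2, c_n)$ have already been extracted from a fitted forest (the numeric values below are illustrative):

```python
import numpy as np

def surrogate_posterior(leaf_stats):
    """Combine per-tree leaf statistics (mu_n, sigma2_n, c_n) into the
    Gaussian surrogate posterior N(sum mu_n, sum sigma2_n / c_n)."""
    mu = sum(m for m, _, _ in leaf_stats)
    var = sum(s2 / c for _, s2, c in leaf_stats)
    return mu, var

# Three trees whose leaves at x hold these empirical statistics
stats = [(0.4, 0.09, 25), (0.5, 0.04, 40), (0.3, 0.16, 10)]
mu, var = surrogate_posterior(stats)       # mu = 1.2, var = 0.0206

# Thompson-sampling draw from the surrogate posterior at x
sample = np.random.default_rng(1).normal(mu, np.sqrt(var))
```

Note how large leaf counts $c_n$ shrink the variance contribution of a tree, so well-populated leaves yield more confident (less exploratory) draws.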

  • Fully Bayesian Additive Trees (BART-based): Modern BFTS specifies a full generative model using BART, a sum-of-trees regression prior:

$$g_a(x) = \sum_{j=1}^m h(x; T_j^{(a)}, M_j^{(a)}),$$

with a depth-geometric prior over tree structure, Dirichlet priors for split selection, quantile-based splitting thresholds, and Gaussian priors for leaf values. For each arm $a$, a BART posterior is constructed and sampled for TS (Deng et al., 8 Feb 2026).

These approaches differ in the strength of their Bayesian calibration and in computational cost. The BART-based approach supports full posterior inference and uncertainty quantification, while the surrogate-Gaussian view offers computational simplicity.
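The sum-of-trees predictor $g_a(x)$ can be illustrated with a minimal sketch; the tuple-based tree encoding and the toy trees below are hypothetical, not the representation used by any particular BART implementation:

```python
import numpy as np

def tree_predict(x, node):
    """Evaluate one regression tree h(x; T_j, M_j): an internal node is a
    tuple (feature, threshold, left, right); a leaf is a scalar value."""
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if x[feat] <= thr else right
    return node

def sum_of_trees(x, trees):
    # g_a(x) = sum_j h(x; T_j, M_j)
    return sum(tree_predict(x, t) for t in trees)

# Two toy trees over a 2-dimensional context (illustrative structures).
trees = [
    (0, 0.5, -0.2, 0.3),                  # split on x[0] at 0.5
    (1, 0.7, (0, 0.2, 0.1, 0.0), 0.4),    # nested split
]
g = sum_of_trees(np.array([0.6, 0.1]), trees)   # 0.3 + 0.0 = 0.3
```

In BFTS, a posterior draw replaces both the tree structures $T_j^{(a)}$ and the leaf values $M_j^{(a)}$ before this sum is evaluated.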

3. Posterior Sampling and Thompson Sampling Algorithms

The core BFTS action-selection mechanism is as follows: for each round $t$,

  1. For each arm $a$, fit the Bayesian tree model (TETS: XGBoost/random forest; BART: full MCMC posterior).
  2. For each arm, extract the posterior (or surrogate) mean $\tilde\mu_{t,a}$ and uncertainty $\tilde\sigma^2_{t,a}$.
  3. Draw a sample

$$\tilde r_{t,a} \sim \mathcal{N}\left( \tilde\mu_{t,a},\, \nu^2 \tilde\sigma^2_{t,a} \right),$$

where $\nu$ is an exploration-scale hyperparameter.

  4. Choose $a_t = \arg\max_a \tilde r_{t,a}$.
  5. Update the history with $(X_t, a_t, R_t)$.

The BART-based BFTS performs Markov chain Monte Carlo (MCMC) sampling using a backfitting Gibbs procedure, updating trees and model parameters in batches according to a logarithmic refresh schedule, which ensures sublinear computation in $T$ (Deng et al., 8 Feb 2026). Surrogate-Gaussian-based BFTS (TETS) re-fits its forests at each round, exploiting XGBoost's staged predictions for online leaf statistics (Nilsson et al., 2024).
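The per-round selection (steps 2–4 above) reduces to one Gaussian draw per arm followed by an argmax; a minimal sketch, with hypothetical posterior summaries:

```python
import numpy as np

def thompson_select(post_means, post_vars, nu=1.0, rng=None):
    """One BFTS decision: draw r~_{t,a} ~ N(mu_a, nu^2 * sigma2_a) per arm
    and pick the argmax. `nu` scales the exploration strength."""
    rng = rng or np.random.default_rng()
    samples = rng.normal(post_means, nu * np.sqrt(post_vars))
    return int(np.argmax(samples)), samples

# Posterior summaries for K = 3 arms (hypothetical values)
means = np.array([0.2, 0.5, 0.4])
vars_ = np.array([0.05, 0.01, 0.20])
arm, draws = thompson_select(means, vars_, nu=1.0,
                             rng=np.random.default_rng(7))
```

Arms with high posterior variance (here arm 2) are occasionally drawn above the empirical best arm, which is exactly the exploration mechanism TS relies on.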

4. Theoretical Guarantees and Statistical Properties

BFTS built on BART admits sharp theoretical guarantees:

  • Bayesian Regret Bound: Under the Bayesian design, with $f_0$ sampled from the model prior, BFTS achieves

$$\mathbb{E}[\text{Regret}_T] \leq K\sigma\sqrt{2Tm\Psi_T} = \tilde{O}(\sqrt{T}),$$

with $\Psi_T = C_\mathrm{str} \log p + C_\mathrm{leaf} \log\bigl(1 + T/(4K\kappa^2\sigma^2)\bigr)$ and $m$ the number of trees per arm. The information-theoretic analysis follows Russo & Van Roy (2016), bounding the mutual information between $f^*$ and the history by the structure and parameter entropy of the BART model (Deng et al., 8 Feb 2026).
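To see the $\tilde{O}(\sqrt{T})$ scaling numerically, the bound can be evaluated directly; the constants $C_\mathrm{str}$ and $C_\mathrm{leaf}$ are unspecified in the statement above and are set to 1 here purely for illustration:

```python
import numpy as np

def bart_regret_bound(T, K, m, sigma, p, kappa, C_str=1.0, C_leaf=1.0):
    """Evaluate K * sigma * sqrt(2 * T * m * Psi_T) with
    Psi_T = C_str * log(p) + C_leaf * log(1 + T / (4 * K * kappa^2 * sigma^2)).
    C_str and C_leaf are placeholder values, not the paper's constants."""
    psi = C_str * np.log(p) + C_leaf * np.log(1 + T / (4 * K * kappa**2 * sigma**2))
    return K * sigma * np.sqrt(2 * T * m * psi)

# Quadrupling T roughly doubles the bound, i.e. O~(sqrt(T)) growth:
b1 = bart_regret_bound(T=10_000, K=5, m=50, sigma=0.5, p=20, kappa=1.0)
b4 = bart_regret_bound(T=40_000, K=5, m=50, sigma=0.5, p=20, kappa=1.0)
```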

  • Frequentist Minimax Optimality: The “feel-good” BFTS variant augments the loss with an optimistic bonus and, under a Hölder-smooth reward $f_0$ and a Dirichlet-sparse prior, achieves

$$\sum_{t=1}^T \mathbb{E}[\text{Regret}_t] = \tilde{O}\left( K\, T^{(\alpha+d)/(2\alpha+d)} \right),$$

matching known minimax lower bounds for nonparametric contextual bandits, where $\alpha$ is the Hölder smoothness exponent and $d$ the effective context dimension (Deng et al., 8 Feb 2026).

In contrast, surrogate-Gaussian BFTS/TETS borrows regret logic from Gaussian TS but lacks a proven $O(\sqrt{T})$ or $O(\log T)$ finite-time regret bound (Nilsson et al., 2024).

5. Computational Complexity and Practical Implementation

  • BART-Based BFTS: MCMC cost per refresh is $O(mT_a)$ per arm, refreshed $O(\log T)$ times. Wall-clock time for $T = 10{,}000$ is 30–45 minutes on 4 CPU cores for high-dimensional tabular datasets. The online decision per round costs $O(Kp)$ (Deng et al., 8 Feb 2026).
  • TETS (XGBoost/Random Forest): Refits $N$ trees at each round, for a total cost of $O(NDT^2)$ with $N$ trees of depth $D$. The default settings used with XGBoost are $N = 100$, depth $D = 10$, and learning rate $\eta = 0.05$. Warm-starting and subsampling can lower the overhead. In benchmarks, runtime per $10{,}000$-round experiment is $\sim$7 hours on CPU, faster than neural baseline methods (Nilsson et al., 2024).
  • Bootstrapped BFTS: Ensemble-based TS via the Bayesian bootstrap re-samples or re-weights the empirical data together with artificial prior data, and can be parallelized trivially. Incremental updating approximates the full bootstrap efficiently (Osband et al., 2015).
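The Bayesian-bootstrap weighting used by bootstrapped BFTS can be sketched as follows; the split into observed and artificial prior points and the `prior_strength` knob mirror the description above, with all numbers illustrative:

```python
import numpy as np

def bayesian_bootstrap_weights(n_obs, n_prior, prior_strength=1.0, rng=None):
    """Bayesian bootstrap for one ensemble member: draw exponential weights
    over the n_obs observed points plus n_prior artificial prior points
    (scaled by prior_strength), normalized to sum to one. Fitting each tree
    under its own weight draw yields a distribution over policies."""
    rng = rng or np.random.default_rng()
    w_obs = rng.exponential(1.0, size=n_obs)
    w_prior = prior_strength * rng.exponential(1.0, size=n_prior)
    w = np.concatenate([w_obs, w_prior])
    return w / w.sum()

# One weight draw: 100 observed tuples, 10 prior pseudo-observations.
w = bayesian_bootstrap_weights(100, 10, prior_strength=0.5,
                               rng=np.random.default_rng(3))
```

Raising `prior_strength` tilts each replicate toward the artificial prior data, which is the tuning handle for exploration mentioned in Section 7.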

6. Empirical Performance and Benchmark Results

BFTS exhibits state-of-the-art empirical performance on both synthetic and real-world tasks:

  • Synthetic Benchmarks: On Friedman-type, linear, and “SynBART” reward functions, BFTS achieves substantially lower cumulative regret than LinTS, LinUCB, NeuralTS, random-forest TS, and XGBoostTS. For example, in the Friedman sparse/disjoint setting, final cumulative regret is $660.1 \pm 51.8$ for BFTS versus $8221.9 \pm 3365.1$ for LinTS (Deng et al., 8 Feb 2026).
  • OpenML/UCI Tabular Data: On Adult, Magic Telescope, Mushroom, Shuttle, and other datasets, BFTS outperforms all baselines on 8/9 tasks. For Mushroom, final regret is $53.9 \pm 7.6$ for BFTS, compared with $77.4 \pm 10.0$ (XGBoostTS), $69.1 \pm 19.6$ (RFTS), and $500.6 \pm 135.1$ (LinTS) (Deng et al., 8 Feb 2026, Nilsson et al., 2024).
  • Combinatorial Real-World Bandits: In the Luxembourg shortest-path problem, BFTS/TETS attains significantly lower and faster-converging regret compared to all baselines, while neural methods incur higher variance and much slower runtime if not GPU-accelerated (Nilsson et al., 2024).
  • mHealth Applications: Offline policy evaluation on the Drink Less trial with 349 participants indicates a $27.0\%$ to $38.2\%$ relative increase in engagement rate over the deployed policy, with BFTS outperforming all other baselines (Deng et al., 8 Feb 2026).

7. Methodological Variants and Extensions

  • Bootstrapped Thompson Sampling: Separate from BART-based BFTS, bootstrapped TS implements posterior-like randomization for ensembles via a combination of empirical and artificially generated data, weighted through (classical or Bayesian) bootstrap mechanisms. For forests, each tree is trained on a bootstrap replicate of the full history augmented with prior samples, yielding a distribution over policies for action selection. The artificial data generator and prior strength provide a means to approximate Dirichlet or Beta posterior draws for discrete rewards and can be tuned to modulate exploration behavior. This methodology supports deep exploration without explicit Bayesian posteriors, extends naturally to RL, and is computationally suited for parallelization (Osband et al., 2015).
  • Uncertainty Heuristics vs. Full Posteriors: Surrogate-based BFTS approximates uncertainty based on the sampling distribution of leaf statistics in ensemble trees, treating the sum of tree outputs as Gaussian with analytically derived mean and variance. Fully Bayesian BFTS (with BART) instead carries out MCMC posterior inference over all tree parameters. A plausible implication is that the latter delivers more calibrated uncertainty estimates and better supports statistical guarantees for TS (Nilsson et al., 2024, Deng et al., 8 Feb 2026).

References

  • BFTS with BART and theoretical guarantees: "BFTS: Thompson Sampling with Bayesian Additive Regression Trees" (Deng et al., 8 Feb 2026)
  • Surrogate Gaussian-ensemble BFTS (TETS): "Tree Ensembles for Contextual Bandits" (Nilsson et al., 2024)
  • Bootstrapped TS for deep exploration and Bayesian forest implementation: "Bootstrapped Thompson Sampling and Deep Exploration" (Osband et al., 2015)