
Bayesian Causal Forest (BCF)

Updated 30 January 2026
  • Bayesian Causal Forest is a Bayesian nonparametric method that uses two independent tree ensembles to separately model baseline outcomes and treatment effects.
  • It incorporates propensity scores into the baseline model to adjust for confounding, thereby reducing bias and improving credible interval coverage.
  • The model employs blockwise MCMC for estimation, demonstrating superior performance in simulations and empirical studies through robust uncertainty quantification.

A Bayesian Causal Forest (BCF) is a Bayesian nonparametric regression model for the estimation of heterogeneous treatment effects from observational data, particularly addressing small effect sizes, effect heterogeneity, and strong confounding. Originating with the work of Hahn, Murray, and Carvalho (2017), BCF innovates on the foundational Bayesian Additive Regression Trees (BART) framework by inducing covariate-dependent priors via explicit use of propensity scores, enforcing separate regularization of prognostic and effect surfaces, and enabling robust causal inference under targeted selection and regularization-induced confounding (Hahn et al., 2017).

1. Model Structure and Parameterization

The BCF framework models $n$ i.i.d. units, indexed by $i$, with observed covariates $x_i \in \mathbb{R}^d$, binary treatment indicator $Z_i \in \{0,1\}$, and outcome $Y_i \in \mathbb{R}$. The conditional mean of the outcome is decomposed as

$$Y_i = \mu(x_i) + \tau(x_i) Z_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

where:

  • $\mu(x)$ denotes the prognostic (baseline) function, approximating $\mathbb{E}[Y \mid Z=0, x]$;
  • $\tau(x)$ is the conditional average treatment effect (CATE), $\mathbb{E}[Y \mid Z=1, x] - \mathbb{E}[Y \mid Z=0, x]$;
  • $\tau(x_i) Z_i$ selects the treatment effect only for treated units;
  • $\varepsilon_i$ is i.i.d. Gaussian noise.

Both $\mu(x)$ and $\tau(x)$ are modeled nonparametrically as independent sums of shallow regression trees:

$$\mu(x) = \sum_{l=1}^{L_\mu} g^{(\mu)}_l(x), \qquad \tau(x) = \sum_{l=1}^{L_\tau} g^{(\tau)}_l(x)$$

for piecewise-constant (BART-style) trees $g^{(\cdot)}_l$.
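As an illustrative sketch, the decomposition can be simulated directly. The surfaces `mu` and `tau` below are hypothetical closed-form stand-ins chosen for this example; in BCF each would be a learned sum of shallow regression trees.

```python
import math
import random

random.seed(0)

# Hypothetical prognostic and effect surfaces (illustrative only; in BCF
# each is a sum of shallow regression trees fit to data).
def mu(x):
    """Baseline outcome E[Y | Z=0, x]."""
    return 2.0 * x[0] + math.sin(x[1])

def tau(x):
    """Conditional average treatment effect (CATE) at x."""
    return 0.5 + 0.25 * x[0]

def draw_outcome(x, z, sigma=1.0):
    """Y = mu(x) + tau(x) * z + Gaussian noise, the BCF decomposition."""
    return mu(x) + tau(x) * z + random.gauss(0.0, sigma)

# Two units with identical covariates differ, in expectation, by tau(x).
x = (1.0, 0.3)
print(draw_outcome(x, 1))  # a treated draw
print(draw_outcome(x, 0))  # a control draw
print(tau(x))              # the estimand at x
```

The point of the exercise is that $\mu$ and $\tau$ enter the likelihood additively, which is what lets BCF place separate priors on each surface.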

2. Bayesian Priors and Regularization Schemes

The BCF model employs tree-ensemble priors independently on $\mu$ and $\tau$, with the following key features (Hahn et al., 2017):

  • Tree-structure prior: The probability that a node at depth $h$ splits is $\eta(1+h)^{-\beta}$, with default $(\eta, \beta) = (0.95, 2)$ for $\mu$ (favoring small trees) and $(\eta, \beta) = (0.25, 3)$ for $\tau$ (heavy shrinkage to homogeneity).
  • Leaf-parameter prior: If tree $l$ has $B_l$ leaves with values $m_{lb}$, then $m_{lb} \sim N(0, \sigma_m^2)$, with $\sigma_m = \sigma_0/\sqrt{L}$, centering the ensemble at zero.
  • Leaf prior scales:
    • $\mu$-tree leaves: half-Cauchy prior, scale $\approx 2 \times \widehat{\mathrm{sd}}(Y)$;
    • $\tau$-tree leaves: half-Normal prior, scale $\widehat{\mathrm{sd}}(Y)$, enforcing stronger shrinkage ("shrink to homogeneity").

Ensemble sizes are typically $L_\mu \approx 200$ and $L_\tau \approx 50$, reflecting smoother treatment effect surfaces relative to potential outcome surfaces.
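The asymmetry between the two forests is easiest to see by tabulating the split probability $\eta(1+h)^{-\beta}$ under the default hyperparameters; a minimal sketch:

```python
# Prior probability that a node at depth h splits: eta * (1 + h) ** (-beta).
# Defaults from Hahn et al. (2017): (0.95, 2) for the mu-forest and
# (0.25, 3) for the tau-forest, which shrinks tau(x) hard toward a
# homogeneous (constant) treatment effect.
def split_prob(depth, eta, beta):
    return eta * (1 + depth) ** (-beta)

for h in range(4):
    p_mu = split_prob(h, 0.95, 2)
    p_tau = split_prob(h, 0.25, 3)
    print(f"depth {h}: P(split | mu-tree) = {p_mu:.4f}, "
          f"P(split | tau-tree) = {p_tau:.5f}")
```

Already at depth 1 a $\tau$-tree splits with probability about 0.03 versus roughly 0.24 for a $\mu$-tree, so the $\tau$-forest is pushed much harder toward shallow, near-constant trees.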

3. Incorporation of the Propensity Score

Addressing regularization-induced confounding (RIC), BCF incorporates a plug-in estimate of the propensity score $\hat{\pi}_i \approx \Pr(Z_i = 1 \mid x_i)$ as an additional covariate in the $\mu$-trees:

$$\mu(x) \sim \text{BART}(x, \hat{\pi}(x))$$

This covariate-dependent prior enables adjustment for targeted selection, reducing RIC bias while retaining tree-ensemble flexibility (Hahn et al., 2017). Empirically, including $\hat{\pi}(x)$ yields bias reductions and improved credible interval coverage in simulated and real-world settings.
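A sketch of the plug-in step, under simplifying assumptions: treatment depends on a single covariate, and $\hat{\pi}$ comes from a one-covariate logistic regression fit by gradient ascent (a stand-in for whatever classifier produces the propensity estimate in practice). The only BCF-specific move is the last line, where the $\mu$-forest's design is augmented with $\hat{\pi}(x)$.

```python
import math
import random

random.seed(1)

# Toy data with targeted selection: treatment probability increases in x.
n = 500
X = [random.gauss(0, 1) for _ in range(n)]
Z = [1 if random.random() < 1 / (1 + math.exp(-1.5 * x)) else 0 for x in X]

# Plug-in propensity model: logistic regression via gradient ascent on the
# log-likelihood (illustrative; any probabilistic classifier would do).
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    gw = gb = 0.0
    for x, z in zip(X, Z):
        p = 1 / (1 + math.exp(-(w * x + b)))
        gw += (z - p) * x
        gb += (z - p)
    w += lr * gw / n
    b += lr * gb / n

pihat = [1 / (1 + math.exp(-(w * x + b))) for x in X]

# BCF's covariate-dependent prior: the mu-forest sees (x, pihat(x)).
design_mu = [(x, p) for x, p in zip(X, pihat)]
print(f"fitted slope {w:.2f} (data generated with slope 1.5)")
```

Because $\hat{\pi}(x)$ is just one more split variable, the $\mu$-forest can absorb targeted selection directly instead of forcing the $\tau$-forest to compensate.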

4. Posterior Computation and Inference

Inference proceeds via a blockwise MCMC ("Bayesian backfitting") approach:

  • Updating the $\mu$-forest: At each iteration, compute residuals $R^{(\mu)}_i = Y_i - Z_i \tau(x_i)$; for each tree, propose local changes (grow/prune/change/swap) and sample new leaf values from the Gaussian posterior given the residuals.
  • Updating the $\tau$-forest: The same procedure with residuals $R^{(\tau)}_i = Y_i - \mu(x_i)$, updating only on treated units, since only they are informative about $\tau$.
  • Noise variance: $\sigma^2$ is drawn from its inverse-gamma full conditional.
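The three blocks above can be sketched in miniature. To keep the example self-contained, each forest is collapsed to a single scalar "leaf" with a conjugate Gaussian update; a real BCF sweep applies the same residual-and-conjugate logic tree by tree.

```python
import random

random.seed(2)

# Toy data: outcomes and treatment indicators.
Y = [1.2, 0.7, 2.9, 3.1, 0.4]
Zt = [0, 0, 1, 1, 0]
mu_hat, tau_hat, sigma2 = 0.0, 0.0, 1.0

def normal_mean_update(resid, sigma2, prior_var):
    """Draw from the conjugate posterior of a Gaussian mean, N(0, prior_var) prior."""
    n = len(resid)
    post_var = 1 / (n / sigma2 + 1 / prior_var)
    post_mean = post_var * sum(resid) / sigma2
    return random.gauss(post_mean, post_var ** 0.5)

# (1) mu-block: residuals strip out the current treatment-effect fit.
r_mu = [y - z * tau_hat for y, z in zip(Y, Zt)]
mu_hat = normal_mean_update(r_mu, sigma2, prior_var=10.0)

# (2) tau-block: residuals strip out the baseline; only treated units inform tau.
r_tau = [y - mu_hat for y, z in zip(Y, Zt) if z == 1]
tau_hat = normal_mean_update(r_tau, sigma2, prior_var=1.0)

# (3) sigma^2 from its inverse-gamma full conditional, drawn as b / Gamma(a, 1).
sse = sum((y - mu_hat - z * tau_hat) ** 2 for y, z in zip(Y, Zt))
a, b = 1.0 + len(Y) / 2, 1.0 + sse / 2
sigma2 = b / random.gammavariate(a, 1.0)

print(mu_hat, tau_hat, sigma2)
```

Iterating this sweep yields a Markov chain whose draws approximate the joint posterior over baseline, effect, and noise scale; the prior variances and inverse-gamma hyperparameters above are arbitrary illustrative choices.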

The two-forest structure enables alternating Gibbs-like updates and separation of the baseline and treatment-effect signals, a distinction crucial for bias control under confounding.

5. Addressing Confounding and Heterogeneity

Traditional nonlinear regression models (including single-forest BART) can suffer substantial $\tau$ bias under targeted selection, as regularization may shrink heterogeneous effects towards zero unless the prognostic landscape is sufficiently rich. The BCF model's dual-ensemble architecture with separate regularization permits accurate recovery of the CATE by isolating effect heterogeneity and aligning baseline modeling with the treatment assignment structure (Hahn et al., 2017).

Simulation studies with strong confounding (e.g., “diagonal-shelf” design) demonstrate that BCF outperforms both vanilla BART and ps-BART (BART with single-forest plus propensity score covariate) in RMSE and credible interval calibration for ATE and CATE. In the 2016/2017 ACIC competitions, BCF was among the top methods globally on bias, RMSE, and PEHE (precision in estimating heterogeneous effects) across complex synthetic scenarios.

6. Empirical Applications and Extensions

Key applications include the reanalysis of a large observational study of smoking on medical expenditures ($n = 6{,}798$ adults, 10 demographic/behavioral covariates). BCF revealed age-moderated treatment effects, with sharper negative impacts among younger smokers, and produced more conservative ATE estimates than vanilla BART, indicating mitigation of RIC bias. Post-hoc fit-the-fit trees on posterior draws of $\hat{\tau}(x)$ identified subgroups (e.g., by age and sex) with significant ATE contrasts (Hahn et al., 2017).
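The fit-the-fit idea is to summarize the fitted $\hat{\tau}(x)$ surface with a small, interpretable tree. A minimal sketch with entirely hypothetical numbers: an exhaustive depth-1 regression split on age, applied to illustrative posterior-mean CATE values.

```python
# Hypothetical posterior-mean CATE values by age (illustrative numbers only;
# in practice tau_hat would come from BCF's posterior draws of tau(x)).
ages = [22, 25, 31, 38, 44, 52, 60, 67]
tau_hat = [-2.1, -1.9, -1.6, -0.9, -0.5, -0.3, -0.1, 0.0]

def best_split(xs, ys):
    """Exhaustive depth-1 regression-tree split minimizing squared error."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    best_cut, best_err = None, float("inf")
    for cut in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        err = sse(left) + sse(right)
        if err < best_err:
            best_cut, best_err = cut, err
    return best_cut

cut = best_split(ages, tau_hat)
print(f"subgroup boundary: age <= {cut}")
```

Applied to each posterior draw of $\tau(\cdot)$ rather than one summary, the same procedure yields posterior uncertainty about the subgroup structure itself.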

The BCF architecture serves as a modular foundation for contemporary extensions, such as the accelerated stochastic tree-ensemble variant XBCF (Krantsevich et al., 2022). These developments preserve the BCF core: separate flexible forests for baseline and effect, estimation of the CATE, and robust Bayesian uncertainty quantification.

7. Theoretical Impact and Practical Considerations

BCF’s principal methodological advances are:

  • Nonparametric modeling of CATE under strong confounding, with robust bias control.
  • Separate regularization of baseline and effect surfaces, facilitating informative “shrink to homogeneity.”
  • Explicit use of the propensity score in the baseline model, ameliorating RIC.
  • Backfitting MCMC or stochastic tree ensemble (XBCF) algorithms for computational efficiency (Krantsevich et al., 2022).

BCF provides point and interval estimates for individual or average treatment effects, with uncertainty quantification calibrated by the model hierarchy, tree-ensemble regularization, and propagation of uncertainty from propensity estimation. Further, the two-ensemble representation enables modular extension to multistage, hierarchical, and longitudinal causal settings.

In summary, BCF provides a rigorously constructed Bayesian framework for the estimation of heterogeneous treatment effects in observational data with strong confounding, nonlinearity, and high-dimensional covariates, with empirical superiority demonstrated across extensive synthetic and applied evaluations (Hahn et al., 2017).
