
Bayesian Causal Forest (BCF)

Updated 30 January 2026
  • Bayesian Causal Forest is a Bayesian nonparametric method that uses two independent tree ensembles to separately model baseline outcomes and treatment effects.
  • It incorporates propensity scores into the baseline model to adjust for confounding, thereby reducing bias and improving credible interval coverage.
  • The model employs blockwise MCMC for estimation, demonstrating superior performance in simulations and empirical studies through robust uncertainty quantification.

A Bayesian Causal Forest (BCF) is a Bayesian nonparametric regression model for the estimation of heterogeneous treatment effects from observational data, particularly addressing small effect sizes, effect heterogeneity, and strong confounding. Originating with the work of Hahn, Murray, and Carvalho (2017), BCF innovates on the foundational Bayesian Additive Regression Trees (BART) framework by inducing covariate-dependent priors via explicit use of propensity scores, enforcing separate regularization of prognostic and effect surfaces, and enabling robust causal inference under targeted selection and regularization-induced confounding (Hahn et al., 2017).

1. Model Structure and Parameterization

The BCF framework models $n$ i.i.d. units, indexed by $i$, with observed covariates $x_i \in \mathbb{R}^d$, binary treatment indicator $Z_i \in \{0,1\}$, and outcome $Y_i \in \mathbb{R}$. The conditional mean of the outcome is decomposed as

$$Y_i = \mu(x_i) + \tau(x_i) Z_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

where:

  • $\mu(x)$ denotes the prognostic (baseline) function, approximating $\mathbb{E}[Y \mid Z=0, x]$;
  • $\tau(x)$ is the conditional average treatment effect (CATE), $\mathbb{E}[Y \mid Z=1, x] - \mathbb{E}[Y \mid Z=0, x]$;
  • $\tau(x_i) Z_i$ selects the treatment effect only for treated units;
  • $\varepsilon_i$ is i.i.d. Gaussian noise.

Both $\mu(x)$ and $\tau(x)$ are modeled nonparametrically as independent sums of shallow regression trees:

$$\mu(x) = \sum_{l=1}^{L_\mu} g^{(\mu)}_l(x), \qquad \tau(x) = \sum_{l=1}^{L_\tau} g^{(\tau)}_l(x)$$

for piecewise-constant (BART-style) trees $g^{(\cdot)}_l$.
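As an illustrative sketch, the decomposition can be simulated directly. The surfaces `mu` and `tau` below are hypothetical closed-form stand-ins chosen for this example; in BCF each would be a learned sum of shallow regression trees.

```python
import math
import random

random.seed(0)

# Hypothetical prognostic and effect surfaces (illustrative only; in BCF
# each is a sum of shallow regression trees fit to data).
def mu(x):
    """Baseline outcome E[Y | Z=0, x]."""
    return 2.0 * x[0] + math.sin(x[1])

def tau(x):
    """Conditional average treatment effect (CATE) at x."""
    return 0.5 + 0.25 * x[0]

def draw_outcome(x, z, sigma=1.0):
    """Y = mu(x) + tau(x) * z + Gaussian noise, the BCF decomposition."""
    return mu(x) + tau(x) * z + random.gauss(0.0, sigma)

# Two units with identical covariates differ, in expectation, by tau(x).
x = (1.0, 0.3)
print(draw_outcome(x, 1))  # a treated draw
print(draw_outcome(x, 0))  # a control draw
print(tau(x))              # the estimand at x
```

The point of the exercise is that $\mu$ and $\tau$ enter the likelihood additively, which is what lets BCF place separate priors on each surface.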

2. Bayesian Priors and Regularization Schemes

The BCF model employs tree-ensemble priors independently on $\mu$ and $\tau$, with the following key features (Hahn et al., 2017):

  • Tree-structure prior: The probability that a node at depth $h$ splits is $\eta(1+h)^{-\beta}$, with default $(\eta, \beta) = (0.95, 2)$ for $\mu$ (favoring small trees) and $(\eta, \beta) = (0.25, 3)$ for $\tau$ (heavy shrinkage to homogeneity).
  • Leaf-parameter prior: If tree $l$ has $B_l$ leaves with values $m_{lb}$, then $m_{lb} \sim N(0, \sigma_m^2)$, with $\sigma_m = \sigma_0/\sqrt{L}$, centering the ensemble at zero.
  • Leaf prior scales:
    • $\mu$-tree leaves: half-Cauchy prior, scale $\approx 2 \times \widehat{\mathrm{sd}}(Y)$;
    • $\tau$-tree leaves: half-Normal prior, scale $\widehat{\mathrm{sd}}(Y)$, enforcing stronger shrinkage ("shrink to homogeneity").

Ensemble sizes are typically $L_\mu \approx 200$ and $L_\tau \approx 50$, reflecting smoother treatment effect surfaces relative to potential outcome surfaces.
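The asymmetry between the two forests is easiest to see by tabulating the split probability $\eta(1+h)^{-\beta}$ under the default hyperparameters; a minimal sketch:

```python
# Prior probability that a node at depth h splits: eta * (1 + h) ** (-beta).
# Defaults from Hahn et al. (2017): (0.95, 2) for the mu-forest and
# (0.25, 3) for the tau-forest, which shrinks tau(x) hard toward a
# homogeneous (constant) treatment effect.
def split_prob(depth, eta, beta):
    return eta * (1 + depth) ** (-beta)

for h in range(4):
    p_mu = split_prob(h, 0.95, 2)
    p_tau = split_prob(h, 0.25, 3)
    print(f"depth {h}: P(split | mu-tree) = {p_mu:.4f}, "
          f"P(split | tau-tree) = {p_tau:.5f}")
```

Already at depth 1 a $\tau$-tree splits with probability about 0.03 versus roughly 0.24 for a $\mu$-tree, so the $\tau$-forest is pushed much harder toward shallow, near-constant trees.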

3. Incorporation of the Propensity Score

Addressing regularization-induced confounding (RIC), BCF incorporates a plug-in estimate of the propensity score $\hat{\pi}_i \approx \Pr(Z_i = 1 \mid x_i)$ as an additional covariate in the $\mu$-trees:

$$\mu(x) \sim \text{BART}(x, \hat{\pi}(x))$$

This covariate-dependent prior enables adjustment for targeted selection, reducing RIC bias while retaining tree-ensemble flexibility (Hahn et al., 2017). Empirically, including $\hat{\pi}(x)$ yields bias reductions and improved credible interval coverage in simulated and real-world settings.
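A sketch of the plug-in step, under simplifying assumptions: treatment depends on a single covariate, and $\hat{\pi}$ comes from a one-covariate logistic regression fit by gradient ascent (a stand-in for whatever classifier produces the propensity estimate in practice). The only BCF-specific move is the last line, where the $\mu$-forest's design is augmented with $\hat{\pi}(x)$.

```python
import math
import random

random.seed(1)

# Toy data with targeted selection: treatment probability increases in x.
n = 500
X = [random.gauss(0, 1) for _ in range(n)]
Z = [1 if random.random() < 1 / (1 + math.exp(-1.5 * x)) else 0 for x in X]

# Plug-in propensity model: logistic regression via gradient ascent on the
# log-likelihood (illustrative; any probabilistic classifier would do).
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    gw = gb = 0.0
    for x, z in zip(X, Z):
        p = 1 / (1 + math.exp(-(w * x + b)))
        gw += (z - p) * x
        gb += (z - p)
    w += lr * gw / n
    b += lr * gb / n

pihat = [1 / (1 + math.exp(-(w * x + b))) for x in X]

# BCF's covariate-dependent prior: the mu-forest sees (x, pihat(x)).
design_mu = [(x, p) for x, p in zip(X, pihat)]
print(f"fitted slope {w:.2f} (data generated with slope 1.5)")
```

Because $\hat{\pi}(x)$ is just one more split variable, the $\mu$-forest can absorb targeted selection directly instead of forcing the $\tau$-forest to compensate.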

4. Posterior Computation and Inference

Inference proceeds via a blockwise MCMC ("Bayesian backfitting") approach:

  • Updating the $\mu$-forest: At each iteration, compute residuals $R^{(\mu)}_i = Y_i - Z_i \tau(x_i)$; for each tree, propose local changes (grow/prune/change/swap) and sample new leaf values from the Gaussian posterior given the residuals.
  • Updating the $\tau$-forest: The same procedure with residuals $R^{(\tau)}_i = Y_i - \mu(x_i)$, updating only on treated units, since only they are informative about $\tau$.
  • Noise variance: $\sigma^2$ is drawn from its inverse-gamma full conditional.
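The three blocks above can be sketched in miniature. To keep the example self-contained, each forest is collapsed to a single scalar "leaf" with a conjugate Gaussian update; a real BCF sweep applies the same residual-and-conjugate logic tree by tree.

```python
import random

random.seed(2)

# Toy data: outcomes and treatment indicators.
Y = [1.2, 0.7, 2.9, 3.1, 0.4]
Zt = [0, 0, 1, 1, 0]
mu_hat, tau_hat, sigma2 = 0.0, 0.0, 1.0

def normal_mean_update(resid, sigma2, prior_var):
    """Draw from the conjugate posterior of a Gaussian mean, N(0, prior_var) prior."""
    n = len(resid)
    post_var = 1 / (n / sigma2 + 1 / prior_var)
    post_mean = post_var * sum(resid) / sigma2
    return random.gauss(post_mean, post_var ** 0.5)

# (1) mu-block: residuals strip out the current treatment-effect fit.
r_mu = [y - z * tau_hat for y, z in zip(Y, Zt)]
mu_hat = normal_mean_update(r_mu, sigma2, prior_var=10.0)

# (2) tau-block: residuals strip out the baseline; only treated units inform tau.
r_tau = [y - mu_hat for y, z in zip(Y, Zt) if z == 1]
tau_hat = normal_mean_update(r_tau, sigma2, prior_var=1.0)

# (3) sigma^2 from its inverse-gamma full conditional, drawn as b / Gamma(a, 1).
sse = sum((y - mu_hat - z * tau_hat) ** 2 for y, z in zip(Y, Zt))
a, b = 1.0 + len(Y) / 2, 1.0 + sse / 2
sigma2 = b / random.gammavariate(a, 1.0)

print(mu_hat, tau_hat, sigma2)
```

Iterating this sweep yields a Markov chain whose draws approximate the joint posterior over baseline, effect, and noise scale; the prior variances and inverse-gamma hyperparameters above are arbitrary illustrative choices.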

The two-forest structure enables alternating Gibbs-like updates and separation of the baseline and treatment-effect signals, a distinction crucial for bias control under confounding.

5. Addressing Confounding and Heterogeneity

Traditional nonlinear regression models (including single-forest BART) can suffer substantial $\tau$ bias under targeted selection, as regularization may shrink heterogeneous effects towards zero unless the prognostic landscape is sufficiently rich. The BCF model's dual-ensemble architecture with separate regularization permits accurate recovery of the CATE by isolating effect heterogeneity and aligning baseline modeling with the treatment assignment structure (Hahn et al., 2017).

Simulation studies with strong confounding (e.g., “diagonal-shelf” design) demonstrate that BCF outperforms both vanilla BART and ps-BART (BART with single-forest plus propensity score covariate) in RMSE and credible interval calibration for ATE and CATE. In the 2016/2017 ACIC competitions, BCF was among the top methods globally on bias, RMSE, and PEHE (precision in estimating heterogeneous effects) across complex synthetic scenarios.

6. Empirical Applications and Extensions

Key applications include the reanalysis of a large observational study of smoking on medical expenditures ($n = 6{,}798$ adults, 10 demographic/behavioral covariates). BCF revealed age-moderated treatment effects, with sharper negative impacts among younger smokers, and produced more conservative ATE estimates than vanilla BART, indicating mitigation of RIC bias. Post-hoc fit-the-fit trees on posterior draws of $\hat{\tau}(x)$ identified subgroups (e.g., by age and sex) with significant ATE contrasts (Hahn et al., 2017).
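The fit-the-fit idea is to summarize the fitted $\hat{\tau}(x)$ surface with a small, interpretable tree. A minimal sketch with entirely hypothetical numbers: an exhaustive depth-1 regression split on age, applied to illustrative posterior-mean CATE values.

```python
# Hypothetical posterior-mean CATE values by age (illustrative numbers only;
# in practice tau_hat would come from BCF's posterior draws of tau(x)).
ages = [22, 25, 31, 38, 44, 52, 60, 67]
tau_hat = [-2.1, -1.9, -1.6, -0.9, -0.5, -0.3, -0.1, 0.0]

def best_split(xs, ys):
    """Exhaustive depth-1 regression-tree split minimizing squared error."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    best_cut, best_err = None, float("inf")
    for cut in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        err = sse(left) + sse(right)
        if err < best_err:
            best_cut, best_err = cut, err
    return best_cut

cut = best_split(ages, tau_hat)
print(f"subgroup boundary: age <= {cut}")
```

Applied to each posterior draw of $\tau(\cdot)$ rather than one summary, the same procedure yields posterior uncertainty about the subgroup structure itself.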

The BCF architecture serves as a modular foundation for contemporary extensions, such as the accelerated stochastic tree-ensemble variant XBCF (Krantsevich et al., 2022). These developments preserve the BCF core: separate flexible forests for baseline and effect, estimation of the CATE, and robust Bayesian uncertainty quantification.

7. Theoretical Impact and Practical Considerations

BCF’s principal methodological advances are:

  • Nonparametric modeling of CATE under strong confounding, with robust bias control.
  • Separate regularization of baseline and effect surfaces, facilitating informative “shrink to homogeneity.”
  • Explicit use of the propensity score in the baseline model, ameliorating RIC.
  • Backfitting MCMC or stochastic tree ensemble (XBCF) algorithms for computational efficiency (Krantsevich et al., 2022).

BCF provides point and interval estimates for individual or average treatment effects, with uncertainty quantification calibrated by the model hierarchy, tree-ensemble regularization, and propagation of uncertainty from propensity estimation. Further, the two-ensemble representation enables modular extension to multistage, hierarchical, and longitudinal causal settings.

In summary, BCF provides a rigorously constructed Bayesian framework for the estimation of heterogeneous treatment effects in observational data with strong confounding, nonlinearity, and high-dimensional covariates, with empirical superiority demonstrated across extensive synthetic and applied evaluations (Hahn et al., 2017).
