Causal Forests with Fixed Effects (CFFE)

Updated 16 January 2026
  • Causal Forests with Fixed Effects (CFFE) are advanced nonparametric estimators that isolate true treatment heterogeneity by locally residualizing unit and time fixed effects in panel data.
  • They overcome biases common in standard causal forests by addressing spurious heterogeneity and inadequacies in global demeaning, leading to improved CATE, ATE, and GATE estimates.
  • Both frequentist and Bayesian implementations have performed well in simulations and empirical studies, and both provide uncertainty quantification (confidence intervals or posterior draws) in complex panel environments.

Causal Forests with Fixed Effects (CFFE) are nonparametric estimators for heterogeneous treatment effects in panel data with unit and time fixed effects. Conventional causal forest algorithms assume independent and identically distributed observations and do not account for systematic unit and time effects, which can bias their estimates. CFFE orthogonalizes the fixed effects locally during tree building, so that splits respond to genuine treatment heterogeneity rather than to differences in unit or time effects. Key implementations include node-level residualization in frequentist forests (Aytug, 15 Jan 2026) and Bayesian Causal Forests built on the Parallel Trends Assumption (PTA) in the DiD-BCF model (Souto et al., 14 May 2025). Simulation and empirical results indicate that CFFE yields accurate Conditional Average Treatment Effect (CATE), Group Average Treatment Effect (GATE), and Average Treatment Effect (ATE) estimates in rich panel settings.

1. Panel Data Model with Fixed Effects

CFFE algorithms operate on balanced panel data comprising $N$ units across $T$ time periods. Observed outcomes are $Y_{it}$, treatment indicators are $D_{it}\in\{0,1\}$, and covariates are $X_{it}\in\mathbb{R}^p$. The canonical two-way fixed effects model is

$$Y_{it} = \alpha_i + \gamma_t + \tau(X_{it}) D_{it} + \varepsilon_{it}$$

where $\alpha_i$ denotes unit-level fixed effects, $\gamma_t$ represents time-level shocks, $\tau(X_{it})$ is the CATE function, and $\varepsilon_{it}$ is a mean-zero error. Identification requires conditional parallel trends, overlap ($\Pr(D_{it}=1 \mid X_{it}=x) \in (\kappa, 1-\kappa)$ for some $\kappa > 0$), and SUTVA (no spillovers).
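To make the model concrete, the sketch below draws a balanced panel from the two-way fixed effects equation. The data-generating process is hypothetical and chosen only for illustration: $\alpha_i$ is correlated with the unit's average $x_1$ (the setting in which fixed effects confound covariate splits), $\gamma_t$ is a linear common trend, $\tau(x) = x_1$, and the treatment probability rises with $x_1$. The function name `simulate_panel` is an assumption, not part of any package.

```python
import numpy as np


def simulate_panel(n_units=200, n_periods=6, p=2, seed=0):
    """Balanced panel from Y_it = alpha_i + gamma_t + tau(X_it) D_it + eps_it.

    Hypothetical DGP for illustration only.
    """
    rng = np.random.default_rng(seed)
    unit_ids = np.repeat(np.arange(n_units), n_periods)  # unit index i of each row
    time_ids = np.tile(np.arange(n_periods), n_units)    # period index t of each row
    n = n_units * n_periods

    X = rng.normal(size=(n, p))
    x1_bar = np.bincount(unit_ids, weights=X[:, 0]) / n_periods
    alpha = 2.0 * x1_bar + rng.normal(scale=0.5, size=n_units)  # unit FE, correlated with x1
    gamma = np.linspace(0.0, 1.0, n_periods)                   # time FE: common linear trend

    p_treat = 1.0 / (1.0 + np.exp(-X[:, 0]))  # logistic propensity, strictly inside (0, 1)
    D = rng.binomial(1, p_treat)
    tau = X[:, 0].copy()                      # true CATE: tau(x) = x1
    eps = rng.normal(size=n)                  # mean-zero error

    Y = alpha[unit_ids] + gamma[time_ids] + tau * D + eps
    return X, Y, D, unit_ids, time_ids, tau
```

The arrays come back in the order of the `fit(X, Y, D, unit_ids, time_ids)` signature listed in Section 6, plus the true `tau` for evaluating estimates.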

In the PTA-based Bayesian Causal Forest framework, the model is reparameterized as $Y_{it} = \mu(X_{it}) + \tau(X_{it}) D_{it} + \varepsilon_{it}$, with $\mu(X_{it})$ absorbing baseline variation (including the fixed effects) and $\tau(X_{it})$ active only in post-treatment periods.

2. Limitations of Standard Causal Forests in Panel Settings

Traditional causal forests maximize local heterogeneity via greedy covariate splits under the assumption of i.i.d. observations. In panel contexts, fixed effects induce two principal distortions:

  1. Spurious heterogeneity: Covariate splits that inadvertently segment units or periods with differing $\alpha_i$ or $\gamma_t$ produce inflated estimates of treatment effect variation, confounding true CATE heterogeneity.
  2. Residualization inadequacy: Applying global sample-wide demeaning for fixed effect correction fails to account for the compositional changes within tree nodes, leaving local bias unaddressed.

These issues lead to inconsistent treatment effect estimates, particularly when fixed effects are correlated with covariates. Standard causal forest estimators such as EconML’s CausalForestDML exhibit substantial bias and inflated RMSE in such panel environments (Aytug, 15 Jan 2026).
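The residualization problem can be seen numerically. In this sketch (hypothetical data; helper names are illustrative), global two-way demeaning of a balanced panel leaves every unit mean at zero on the full sample. Once the rows are restricted to a covariate-defined node ($x > 0.5$), however, the globally demeaned outcomes no longer average to zero within units, so residual unit-level variation re-enters any node-level contrast.

```python
import numpy as np


def global_twoway_demean(y, unit_ids, time_ids):
    """Exact two-way within transform for a balanced panel:
    y_it - ybar_i - ybar_t + ybar."""
    unit_mean = np.bincount(unit_ids, weights=y) / np.bincount(unit_ids)
    time_mean = np.bincount(time_ids, weights=y) / np.bincount(time_ids)
    return y - unit_mean[unit_ids] - time_mean[time_ids] + y.mean()


def max_abs_unit_mean(r, unit_ids):
    """Largest |within-unit mean of r| over the units present in r."""
    counts = np.bincount(unit_ids)
    sums = np.bincount(unit_ids, weights=r)
    present = counts > 0
    return np.abs(sums[present] / counts[present]).max()


rng = np.random.default_rng(0)
n_units, n_periods = 100, 6
unit_ids = np.repeat(np.arange(n_units), n_periods)
time_ids = np.tile(np.arange(n_periods), n_units)
x = rng.normal(size=n_units * n_periods)
alpha = rng.normal(size=n_units)                     # unit fixed effects
y = alpha[unit_ids] + 2.0 * x + rng.normal(size=x.size)

r = global_twoway_demean(y, unit_ids, time_ids)
node = x > 0.5                                       # rows falling in one tree node
full_gap = max_abs_unit_mean(r, unit_ids)              # ~0 by construction
node_gap = max_abs_unit_mean(r[node], unit_ids[node])  # clearly non-zero
print(f"full sample: {full_gap:.2e}, inside node: {node_gap:.2f}")
```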

3. Node-Level Fixed Effect Residualization

CFFE employs localized fixed effect residualization during recursive partitioning. For each tree node $\mathcal{N}$, a two-way FE model is fit to the node's observations only, and outcome and treatment are residualized:

$$\tilde{Y}_{it}^{(\mathcal{N})} = Y_{it} - \hat{\alpha}_i^{(\mathcal{N})} - \hat{\gamma}_t^{(\mathcal{N})}$$

$$\tilde{D}_{it}^{(\mathcal{N})} = D_{it} - \hat{\delta}_i^{(\mathcal{N})} - \hat{\eta}_t^{(\mathcal{N})}$$

where $\{\hat{\alpha}_i^{(\mathcal{N})}, \hat{\gamma}_t^{(\mathcal{N})}\}$ and $\{\hat{\delta}_i^{(\mathcal{N})}, \hat{\eta}_t^{(\mathcal{N})}\}$ are the node-level unit and time effects for $Y$ and $D$, respectively, obtained iteratively by alternating unit and time demeaning until convergence. This orthogonalization ensures that subsequent splitting and estimation target genuine between-node heterogeneity in $\tau(X)$.
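The alternating demeaning can be sketched with NumPy as follows. This is a minimal illustration, assuming integer-coded `unit_ids` and `time_ids`; the function names are hypothetical and not taken from the causalfe package. Because it operates only on the rows passed in, it also handles the unbalanced sub-panels that covariate splits produce, and on convergence the result equals the residuals from a least-squares regression on unit and time dummies.

```python
import numpy as np


def twoway_residualize(v, unit_ids, time_ids, tol=1e-10, max_iter=10_000):
    """Sweep unit and time means out of v by alternating demeaning.

    Works on whatever rows are passed in (e.g. one tree node), so the
    sub-panel may be unbalanced. unit_ids/time_ids are 0-based int codes.
    """
    r = np.asarray(v, dtype=float).copy()
    n_u, n_t = unit_ids.max() + 1, time_ids.max() + 1
    # Guard against empty groups (units/periods absent from this node).
    cnt_u = np.maximum(np.bincount(unit_ids, minlength=n_u), 1)
    cnt_t = np.maximum(np.bincount(time_ids, minlength=n_t), 1)
    for _ in range(max_iter):
        unit_mean = np.bincount(unit_ids, weights=r, minlength=n_u) / cnt_u
        r -= unit_mean[unit_ids]   # sweep out remaining unit means (alpha_i / delta_i)
        time_mean = np.bincount(time_ids, weights=r, minlength=n_t) / cnt_t
        r -= time_mean[time_ids]   # sweep out remaining time means (gamma_t / eta_t)
        if max(np.abs(unit_mean).max(), np.abs(time_mean).max()) < tol:
            break                  # both sweeps were (numerically) no-ops
    return r


def residualize_node(Y, D, unit_ids, time_ids, node_rows):
    """Node-level Y~ and D~ for the observations selected by node_rows."""
    u, t = unit_ids[node_rows], time_ids[node_rows]
    return (twoway_residualize(Y[node_rows], u, t),
            twoway_residualize(D[node_rows], u, t))
```

Residualizing inside every node is what distinguishes this from a single global demeaning step; the cost is one such fit per candidate node.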

In Bayesian causal forest constructions, the reparameterization via PTA additionally prevents contamination from pre-treatment periods by restricting treatment-effect tree growth exclusively to post-treatment observations (Souto et al., 14 May 2025).
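Restricting the treatment-effect forest to post-treatment observations requires a post-treatment indicator for each row. A minimal sketch for staggered adoption, assuming each row carries its unit's first treated period (`np.inf` for never-treated units); the helper name is hypothetical. The returned event time is also a natural grouping variable for event-time GATEs.

```python
import numpy as np


def post_treatment_indicator(time_ids, first_treat):
    """Z_it = 1 if unit i has adopted treatment by period t.

    first_treat: per-row adoption period of the row's unit (np.inf if never
    treated). Returns (z, event_time), with event_time = t - first_treat
    and NaN for never-treated units.
    """
    time_ids = np.asarray(time_ids, dtype=float)
    first_treat = np.asarray(first_treat, dtype=float)
    z = time_ids >= first_treat                      # rows the tau forest may use
    event_time = np.where(np.isfinite(first_treat), time_ids - first_treat, np.nan)
    return z, event_time
```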

4. Splitting and Estimation Procedures

Frequentist CFFE

Splitting in CFFE trees is governed by local residuals. For a proposed split $S$ partitioning $\mathcal{N}$ into $\mathcal{L}$ and $\mathcal{R}$:

  • Estimate the local treatment effect in each child $\mathcal{C} \in \{\mathcal{L}, \mathcal{R}\}$: $\hat{\tau}_{\mathcal{C}} = \frac{\sum_{(i,t)\in\mathcal{C}} \tilde{D}_{it} \tilde{Y}_{it}}{\sum_{(i,t)\in\mathcal{C}} \tilde{D}_{it}^2}$
  • Define split quality: $\Delta(S) = \frac{n_{\mathcal{L}} n_{\mathcal{R}}}{n^2} \left( \hat{\tau}_{\mathcal{L}} - \hat{\tau}_{\mathcal{R}} \right)^2$, where $n = n_{\mathcal{L}} + n_{\mathcal{R}}$. The optimal split maximizes $\Delta(S)$; the factor $n_{\mathcal{L}} n_{\mathcal{R}}/n^2$ is largest for equal-sized children, so the criterion trades effect heterogeneity off against very unbalanced splits.
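The split search can be sketched for a single covariate. This assumes the residuals $\tilde{Y}$ and $\tilde{D}$ have already been computed at the parent node and are reused for both children (whether each candidate child is re-residualized is not specified here); the function names are illustrative. Cumulative sums make each threshold evaluation constant-time after sorting.

```python
import numpy as np


def local_tau(y_res, d_res):
    """Residual-on-residual estimate: sum(D~ * Y~) / sum(D~^2)."""
    return np.dot(d_res, y_res) / np.dot(d_res, d_res)


def split_quality(tau_l, tau_r, n_l, n_r):
    """Delta(S) = n_L * n_R / n^2 * (tau_L - tau_R)^2 with n = n_L + n_R."""
    n = n_l + n_r
    return n_l * n_r / n ** 2 * (tau_l - tau_r) ** 2


def best_threshold(x, y_res, d_res, min_leaf=5):
    """Best split 'x <= threshold' on one covariate; returns (Delta, threshold)."""
    min_leaf = max(min_leaf, 1)
    order = np.argsort(x, kind="stable")
    xs, ys, ds = x[order], y_res[order], d_res[order]
    cum_dy = np.cumsum(ds * ys)          # running sum of D~ Y~ for the left child
    cum_dd = np.cumsum(ds * ds)          # running sum of D~^2 for the left child
    n = xs.size
    best_q, best_thr = -np.inf, None
    for k in range(min_leaf, n - min_leaf + 1):   # k = size of the left child
        if xs[k - 1] == xs[k]:           # no threshold separates tied values
            continue
        tau_l = cum_dy[k - 1] / cum_dd[k - 1]
        tau_r = (cum_dy[-1] - cum_dy[k - 1]) / (cum_dd[-1] - cum_dd[k - 1])
        q = split_quality(tau_l, tau_r, k, n - k)
        if q > best_q:
            best_q, best_thr = q, xs[k - 1]
    return best_q, best_thr
```

In a full tree this scan would run over every covariate (or a random subset) and the covariate with the largest $\Delta(S)$ would be chosen.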

Tree growth incorporates cluster-aware subsampling (sampling units and all their observations), honest splitting (random structure/estimation splits), recursive node construction, and leaf-wise estimation via re-residualized data.
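Cluster-aware subsampling and honest splitting can be combined by drawing whole units and then dividing them between a structure half (used to choose splits) and an estimation half (used for leaf estimates). A minimal sketch, assuming the honest split is also made at the unit level so that no unit contributes to both halves; the sampling fraction and function name are hypothetical.

```python
import numpy as np


def cluster_honest_sample(unit_ids, sample_frac=0.5, rng=None):
    """Draw units (clusters) with all their rows, then split the drawn units
    into a structure half and an estimation half. Returns two row-index arrays."""
    rng = np.random.default_rng(rng)
    units = np.unique(unit_ids)
    n_draw = max(2, int(round(sample_frac * units.size)))
    drawn = rng.choice(units, size=n_draw, replace=False)  # random order
    structure_units = drawn[: n_draw // 2]
    estimation_units = drawn[n_draw // 2:]
    structure_rows = np.flatnonzero(np.isin(unit_ids, structure_units))
    estimation_rows = np.flatnonzero(np.isin(unit_ids, estimation_units))
    return structure_rows, estimation_rows
```

Sampling by unit rather than by observation keeps each unit's serially correlated rows together, which is the purpose of cluster-aware subsampling.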

Bayesian DiD-BCF

The Bayesian approach deploys two forests: a prognostic (baseline) forest for $\mu(x)$ and a treatment effect forest for $\tau(x)$. Each uses tree-structure priors (with depth-dependent split probabilities), leaf-parameter priors (Half-Cauchy or Gaussian), and an inverse-$\chi^2$ prior on the error variance. Fitting uses Gibbs sampling with optional warm-start initialization, alternating updates of the $\mu$ and $\tau$ forests based on partial residuals, and, if desired, explicit sampling of the fixed effects (Souto et al., 14 May 2025). Posterior draws enable direct inference for CATE, ATE, and GATE.
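Two of these prior components have standard closed forms in BART-type samplers. The sketch assumes the Chipman, George and McCulloch parameterization of the depth-dependent split probability, $\alpha(1+d)^{-\beta}$ (here $\alpha$ and $\beta$ are tree-prior hyperparameters, not fixed effects), and the conjugate scaled inverse-$\chi^2$ update for the error variance performed in each Gibbs sweep. The default values are the usual BART defaults, not values confirmed for DiD-BCF, and the function names are hypothetical.

```python
import numpy as np


def split_prob(depth, alpha=0.95, beta=2.0):
    """BART-style tree prior: Pr(node at depth d splits) = alpha * (1 + d)^(-beta)."""
    return alpha * (1.0 + depth) ** (-beta)


def draw_sigma2(residuals, nu=3.0, lam=1.0, rng=None):
    """Conjugate Gibbs step for the error variance.

    Prior:     sigma^2 ~ nu * lam / chi2_nu          (scaled inverse-chi^2)
    Posterior: sigma^2 | r ~ (nu * lam + sum r^2) / chi2_{nu + n},
    where r = Y - mu(X) - tau(X) D are the current full residuals.
    """
    rng = np.random.default_rng(rng)
    r = np.asarray(residuals, dtype=float)
    return (nu * lam + r @ r) / rng.chisquare(nu + r.size)
```

Deeper nodes split with rapidly decreasing probability, which keeps individual trees shallow and lets the sum of many trees carry the fit.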

5. Statistical Properties and Inferential Procedures

CFFE achieves consistency and asymptotic normality under regularity conditions as $N, T \rightarrow \infty$. Specifically,

$$\sqrt{N}\left(\hat{\tau}(x) - \tau(x)\right) \xrightarrow{d} N\left(0, \sigma^2(x)\right)$$

Variance is estimated with a half-forest jackknife: the forest's trees are split into two halves, yielding two independent estimates $(\hat{\tau}^{(1)}, \hat{\tau}^{(2)})$, with

$$\widehat{\mathrm{Var}}\left(\hat{\tau}(x)\right) \approx \frac{1}{2}\left(\hat{\tau}^{(1)}(x) - \hat{\tau}^{(2)}(x)\right)^2$$

Confidence intervals take the form $\hat{\tau}(x) \pm z_{1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat{\tau}(x))$, where $z_{1-\alpha/2}$ is the standard normal quantile; coverage degrades in finite samples (Aytug, 15 Jan 2026).
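The half-forest jackknife and the resulting interval can be sketched from a matrix of per-tree predictions. This is an illustration, not the causalfe implementation; the input layout (`tree_preds` with one row per tree) and the function name are assumptions. If the two halves were independent with equal variance, $\frac{1}{2}(\hat\tau^{(1)} - \hat\tau^{(2)})^2$ would estimate the variance of one half-forest estimate, roughly twice that of the full-forest average, so intervals built this way lean conservative; a single pair of halves also makes the variance estimate noisy.

```python
import numpy as np
from statistics import NormalDist


def half_forest_interval(tree_preds, alpha=0.05, rng=None):
    """Point estimate and normal CI from per-tree predictions.

    tree_preds: array (n_trees, n_points) of tree-level tau estimates.
    Trees are randomly split into two halves; the variance proxy is
    0.5 * (tau1 - tau2)^2, as in the formula above.
    """
    rng = np.random.default_rng(rng)
    preds = np.asarray(tree_preds, dtype=float)
    perm = rng.permutation(preds.shape[0])
    half = preds.shape[0] // 2
    tau1 = preds[perm[:half]].mean(axis=0)   # first half-forest estimate
    tau2 = preds[perm[half:]].mean(axis=0)   # second half-forest estimate
    tau_hat = preds.mean(axis=0)             # full-forest estimate
    se = np.sqrt(0.5 * (tau1 - tau2) ** 2)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for alpha = 0.05
    return tau_hat, tau_hat - z * se, tau_hat + z * se
```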

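CATE, ATE, and GATE are all derived from the $\hat{\tau}(x)$ estimator: the CATE is $\hat{\tau}(x)$ itself, while the ATE and GATEs average it over the full sample or over a group. For DiD-BCF, GATEs can be computed for pre-specified groups (e.g., event-time or cohort membership).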
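Given CATE predictions, the ATE and GATEs are averages over the relevant rows. The sketch below works for posterior draws (one row per draw, as in DiD-BCF) or for a single row of frequentist predictions, in which case the interval collapses to the point estimate. Group labels such as cohort or event time, and the function name, are assumptions; for an ATT-type estimand, pass only treated post-treatment rows.

```python
import numpy as np


def gate_summary(tau_draws, groups, alpha=0.05):
    """ATE and GATEs from CATE draws.

    tau_draws: (n_draws, n_obs) draws of tau(X_it), or a single row of predictions.
    groups:    length-n_obs labels, e.g. cohort or event time.
    Returns {label: (mean, lower, upper)}; key "ATE" covers all rows.
    """
    tau_draws = np.atleast_2d(np.asarray(tau_draws, dtype=float))
    groups = np.asarray(groups)
    targets = [("ATE", np.ones(groups.shape, dtype=bool))]
    targets += [(g, groups == g) for g in np.unique(groups)]
    out = {}
    for name, mask in targets:
        per_draw = tau_draws[:, mask].mean(axis=1)   # one group average per draw
        lo, hi = np.quantile(per_draw, [alpha / 2, 1 - alpha / 2])
        out[name] = (per_draw.mean(), lo, hi)
    return out
```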

6. Implementation and Practical Considerations

Python (causalfe):

  • Core class: CFFEForest, supporting cluster subsampling, honest splitting, node-specific FE residualization, split optimization, and tree aggregation.
  • Fitting: fit(X, Y, D, unit_ids, time_ids).
  • Prediction: predict(X_new) and predict_interval(X_new, alpha) for confidence intervals.
  • Follows scikit-learn API conventions (Aytug, 15 Jan 2026).
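Before calling `fit(X, Y, D, unit_ids, time_ids)`, unit and time labels typically need to be integer-coded. The helper below is hypothetical (not part of causalfe) and also checks whether the panel is balanced. The commented lines show the call pattern listed above; the import path and the constructor arguments of `CFFEForest` are assumptions, so those calls are left commented out.

```python
import numpy as np


def encode_panel_ids(unit_labels, time_labels):
    """Map arbitrary unit/time labels to 0-based integer codes and report
    whether the panel is balanced (each unit observed once per period)."""
    units, unit_ids = np.unique(np.asarray(unit_labels), return_inverse=True)
    periods, time_ids = np.unique(np.asarray(time_labels), return_inverse=True)
    cells = np.zeros((units.size, periods.size), dtype=int)
    np.add.at(cells, (unit_ids, time_ids), 1)   # count rows per (unit, period)
    return unit_ids, time_ids, bool((cells == 1).all())


# Hypothetical usage; import path and constructor arguments are assumptions.
# from causalfe import CFFEForest
# unit_ids, time_ids, balanced = encode_panel_ids(unit_labels, year_labels)
# model = CFFEForest()
# model.fit(X, Y, D, unit_ids, time_ids)
# intervals = model.predict_interval(X_new, alpha=0.05)
```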

Bayesian approaches:

  • R packages: bcf and XBCF (for warm-start); custom Stan or PyMC implementations are also used.
  • Python: adaptation of bartpy or custom BART code (Souto et al., 14 May 2025).
  • Warm-start via XBART reduces MCMC burn-in substantially.

Default hyperparameter settings (number of trees, split probabilities $\alpha, \delta$) generally work well, but stronger shrinkage or stricter limits on tree depth may be necessary to avoid overfitting. Computational demands are nontrivial for large panels ($NT > 10^5$).

7. Simulation Results and Empirical Applications

Simulation studies (e.g., $N=200$, $T=6$, 100 replications) evaluate CFFE performance on placebo ($\tau(x)=0$), homogeneous ($\tau(x)=2$), heterogeneous ($\tau(x)=x_1$), and fixed-effect-correlated DGPs:

  • Placebo: mean $\hat{\tau}\approx 0$, RMSE $\approx 0.25$, coverage $\approx 56\%$.
  • Homogeneous: mean $\hat{\tau}\approx 1.79$ (true value 2), RMSE $\approx 0.34$.
  • Heterogeneous: RMSE $\approx 0.54$, $\mathrm{corr}(\hat{\tau}, \tau) \approx 0.90$.
  • CFFE exhibits lower RMSE than non-FE forests when $\alpha_i$ is correlated with $x$ (Aytug, 15 Jan 2026).
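Summaries of this kind can be computed per replication as follows. This sketch assumes metrics are taken over evaluation points within a replication and then averaged across replications, which the results above do not specify; the function name is hypothetical. The correlation is undefined when $\tau$ is constant, as in the placebo and homogeneous designs, so it is returned as NaN there.

```python
import numpy as np


def simulation_metrics(tau_hat, tau_true, lower, upper):
    """Mean estimate, RMSE, interval coverage, and corr(tau_hat, tau) for one replication."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    err = tau_hat - tau_true
    covered = (np.asarray(lower) <= tau_true) & (tau_true <= np.asarray(upper))
    corr = np.corrcoef(tau_hat, tau_true)[0, 1] if np.std(tau_true) > 0 else np.nan
    return {
        "mean_tau_hat": tau_hat.mean(),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "coverage": covered.mean(),
        "corr": corr,
    }
```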

DiD-BCF simulations (linear, nonlinear, staggered adoption, selection, and CHTE scenarios) show the lowest RMSE, MAE, and MAPE for ATE, GATE, and CATE among the compared methods (TWFE, DiD-DR, two-stage DiD, Synthetic DiD, DoubleML) (Souto et al., 14 May 2025).

An applied study on US minimum wage policy finds that DiD-BCF reveals stronger negative employment effects in small counties (ATE $\approx -0.164$) than are detectable with parametric TWFE or DiD-DR, demonstrating CFFE’s capacity to uncover nuanced heterogeneity (Souto et al., 14 May 2025).

8. Assumptions, Advantages, and Limitations

CFFE and DiD-BCF rely on parallel trends (unconditional or conditional), SUTVA, and no anticipation. They offer direct estimation of CATE, GATE, and group/event-time effects, flexible nonlinear modeling, and, in the Bayesian case, full posterior inference. The PTA-based reparameterization in DiD-BCF improves identifiability and MCMC mixing, isolates treatment periods, and guards against bias from pre-treatment data.

Computational complexity is a limitation for large $N \times T$ panels, and the performance of DiD-BCF depends on the PTA holding and on hyperparameter choices (Souto et al., 14 May 2025). This points to scalable Bayesian panel forests as a direction for further methodological work.

Causal Forests with Fixed Effects constitute a unified, robust framework for granular, heterogeneous causal inference in complex panel environments, addressing key vulnerabilities of parametric and nonparametric alternatives.
