ForestRiesz: Robust Semiparametric Inference
- ForestRiesz is a nonparametric method that uses the Riesz representation theorem and random forest machinery to construct debiased estimators of linear functionals in causal inference.
- It constructs locally linear estimators by computing node-wise moments and employs cross-fitting to achieve double robustness and asymptotic normality.
- The method avoids explicit inverse-propensity weighting, ensuring stability and efficiency in high-dimensional and biased sample selection settings.
ForestRiesz refers to a nonparametric method for automatic, debiased machine learning of linear functionals—particularly in causal inference settings involving high-dimensional or nonparametric regression functions, and in the presence of non-random treatment assignment and/or outcome selection. The ForestRiesz framework leverages the Riesz representation theorem and random forest machinery to construct a locally linear estimator of the Riesz representer, enabling efficient, robust, and stable semiparametric inference with automatic debiasing and double robustness properties (Chernozhukov et al., 2021, Bjelac et al., 13 Jan 2026).
1. Riesz Representation and the Debiasing Problem
Let $W = (Y, X)$ denote the data, with $g_0(X) = \mathbb{E}[Y \mid X]$ the regression function of interest. For a continuous linear functional $g \mapsto \mathbb{E}[m(W, g)]$, there exists a unique Riesz representer $\alpha_0$ such that

$$\mathbb{E}[m(W, g)] = \mathbb{E}[\alpha_0(X)\, g(X)]$$

for all square-integrable $g$. The target estimand is $\theta_0 = \mathbb{E}[m(W, g_0)]$. In high-dimensional and nonparametric regimes, the naive plug-in estimator $\tfrac{1}{n}\sum_{i=1}^n m(W_i, \hat g)$ is subject to regularization-induced bias that can be of order $n^{-1/2}$ or larger. The correction term

$$\frac{1}{n}\sum_{i=1}^n \hat\alpha(X_i)\,\big(Y_i - \hat g(X_i)\big)$$

(the "one-step" or "double-robust" correction) cancels the leading bias and yields asymptotically linear estimation if $\alpha_0$ is accurately estimated.
The Riesz representer solves the variational problem

$$\alpha_0 = \arg\min_{\alpha \in L^2} \mathbb{E}\big[\alpha(X)^2 - 2\, m(W, \alpha)\big],$$

where the minimization runs over square-integrable functions $\alpha$. This variational characterization is central for the automatic machine learning of $\alpha_0$ (Chernozhukov et al., 2021).
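For a concrete instance, take the average treatment effect, where $m(W, g) = g(1, Z) - g(0, Z)$. Over a linear-in-features class $\alpha(x) = \phi(x)^\top \beta$, minimizing the empirical Riesz loss has the closed-form solution $\hat\beta = \hat M^{-1} \hat m$ with $\hat M = \tfrac{1}{n}\sum_i \phi(X_i)\phi(X_i)^\top$ and $\hat m = \tfrac{1}{n}\sum_i [\phi(1, Z_i) - \phi(0, Z_i)]$. The following NumPy sketch (an illustration under a toy design with constant propensity $0.5$, not the paper's code) recovers the known representer $\alpha_0(t) = t/0.5 - (1-t)/0.5 = 4t - 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy design: binary treatment, constant propensity 0.5, independent covariate.
# True Riesz representer for the ATE here: alpha0(t) = 4t - 2.
T = rng.binomial(1, 0.5, size=n)
Z = rng.normal(size=n)

def phi(t, z):
    """Feature map for the linear Riesz class alpha(x) = phi(x)' beta."""
    return np.column_stack([np.ones_like(z), t, z, t * z])

X = phi(T, Z)
# m(W, phi) for the ATE functional: phi at t=1 minus phi at t=0, coordinate-wise.
m_phi = phi(np.ones(n), Z) - phi(np.zeros(n), Z)

# First-order condition of the empirical Riesz loss E_n[(phi'b)^2 - 2 m(W, phi'b)]:
# M b = m, with M = E_n[phi phi'] and m = E_n[m(W, phi)].
M = X.T @ X / n
m_vec = m_phi.mean(axis=0)
beta = np.linalg.solve(M, m_vec)

alpha_hat = lambda t, z: phi(t, z) @ beta
print(beta)  # approx [-2, 4, 0, 0], i.e. alpha_hat(t) ~ 4t - 2
```

The solve reproduces the inverse-propensity representer without ever estimating a propensity score, which is the point of the variational formulation.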
2. ForestRiesz Estimator: Construction and Algorithmic Principles
ForestRiesz models $\alpha(x) = \phi(x)^\top \beta(x)$, where $\phi$ is a chosen feature map and $\beta(\cdot)$ is a locally linear coefficient function estimated nonparametrically.
Random Forest Implementation:
- Node-wise local moments: For each node $N$ in the covariate space, compute

$$\hat M_N = \frac{1}{|N|} \sum_{i \in N} \phi(X_i)\,\phi(X_i)^\top, \qquad \hat m_N = \frac{1}{|N|} \sum_{i \in N} m(W_i, \phi),$$

where $m(W_i, \phi)$ is applied coordinate-wise to the feature map. The local estimator is $\hat\beta_N = \hat M_N^{-1} \hat m_N$.
- Splitting criterion: Candidate splits are evaluated via local Riesz loss reduction (or equivalently, maximization of a negative Riesz loss criterion), ensuring balance and stability.
- Forest weights and prediction: Forest similarity weights $K(x, X_i)$ are computed by averaging indicator functions over the leaves $L_b(x)$ that contain $x$ in each of the $B$ trees,

$$K(x, X_i) = \frac{1}{B} \sum_{b=1}^{B} \frac{\mathbf{1}\{X_i \in L_b(x)\}}{|L_b(x)|}.$$

The estimator $\hat\alpha(x) = \phi(x)^\top \hat\beta(x)$ is given by the locally weighted moment solve $\hat\beta(x) = \big(\sum_i K(x, X_i)\,\phi(X_i)\phi(X_i)^\top\big)^{-1} \sum_i K(x, X_i)\, m(W_i, \phi)$.
- Debiased functional estimation: The final estimator is

$$\hat\theta = \frac{1}{n} \sum_{i=1}^{n} \Big[ m(W_i, \hat g) + \hat\alpha(X_i)\,\big(Y_i - \hat g(X_i)\big) \Big].$$
Cross-fitting is standard: the sample is partitioned, with ForestRiesz fitted on folds excluding the target data points to avoid overfitting and induce orthogonality (Chernozhukov et al., 2021, Bjelac et al., 13 Jan 2026).
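The full pipeline — learn $\hat g$ and $\hat\alpha$ on one fold, evaluate the debiased moment on the held-out fold, and average — can be sketched end to end. In this minimal illustration, plain least squares stands in for the forest learners of both nuisances, and the data-generating process is a hypothetical confounded design, not one from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta_true = 20_000, 2.0

# Confounded toy design: treatment probability depends on Z.
Z = rng.normal(size=n)
p = 1 / (1 + np.exp(-Z))                       # propensity e(Z)
T = rng.binomial(1, p)
Y = theta_true * T + Z + rng.normal(size=n)    # g0(t, z) = theta*t + z

def phi(t, z):
    return np.column_stack([np.ones_like(z), t, z, t * z])

theta_parts = []
folds = np.array_split(rng.permutation(n), 2)
for k in range(2):                             # 2-fold cross-fitting
    test, train = folds[k], folds[1 - k]
    Xtr, Xte = phi(T[train], Z[train]), phi(T[test], Z[test])

    # Outcome regression g-hat, fitted only on the training fold.
    g_coef, *_ = np.linalg.lstsq(Xtr, Y[train], rcond=None)

    # Riesz representer alpha-hat via empirical Riesz-loss minimization (M b = m).
    m_tr = phi(np.ones(len(train)), Z[train]) - phi(np.zeros(len(train)), Z[train])
    beta = np.linalg.solve(Xtr.T @ Xtr / len(train), m_tr.mean(axis=0))

    # Debiased moment evaluated on the held-out fold.
    g1 = phi(np.ones(len(test)), Z[test]) @ g_coef
    g0 = phi(np.zeros(len(test)), Z[test]) @ g_coef
    alpha = Xte @ beta
    theta_parts.append(g1 - g0 + alpha * (Y[test] - Xte @ g_coef))

theta_hat = np.concatenate(theta_parts).mean()
print(theta_hat)  # close to theta_true = 2.0
```

Swapping the two `lstsq`/`solve` fits for forest-based local solves yields the ForestRiesz recipe proper; the cross-fitting and debiased-moment structure is unchanged.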
3. Theoretical Guarantees and Statistical Properties
ForestRiesz achieves several desirable asymptotic properties under standard conditions:
- $\sqrt{n}$-consistency and asymptotic normality: Provided $\hat g$ and $\hat\alpha$ converge to their population targets $g_0$ and $\alpha_0$ at rates $o_p(n^{-1/4})$, the ForestRiesz estimator satisfies

$$\sqrt{n}\,\big(\hat\theta - \theta_0\big) \xrightarrow{d} \mathcal{N}\big(0, \mathbb{E}[\psi(W)^2]\big),$$

where the influence function is $\psi(W) = m(W, g_0) + \alpha_0(X)\,\big(Y - g_0(X)\big) - \theta_0$.
- Double robustness and Neyman orthogonality: The estimation bias satisfies

$$\big|\mathbb{E}[\hat\theta] - \theta_0\big| \lesssim \|\hat g - g_0\|_{L^2} \cdot \|\hat\alpha - \alpha_0\|_{L^2},$$

so that consistency obtains if either $g_0$ or $\alpha_0$ is consistently estimated; the influence function is orthogonal to first-order estimation errors in $\hat g$ and $\hat\alpha$ (Chernozhukov et al., 2021).
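This product-form bias bound can be checked numerically: with a deliberately misspecified outcome model but a correct representer, the corrected estimator still recovers the target while the plug-in does not. The sketch below is a toy illustration with a known propensity, so the exact inverse-propensity representer is available in closed form (ForestRiesz itself would estimate $\alpha_0$ without this form):

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta_true = 200_000, 1.5

Z = rng.normal(size=n)
p = 0.2 + 0.6 * (Z > 0)                 # known propensity e(Z)
T = rng.binomial(1, p)
Y = theta_true * T + 2.0 * Z + rng.normal(size=n)

# Deliberately misspecified outcome model: regress Y on T only, ignoring Z.
slope, intercept = np.polyfit(T.astype(float), Y, 1)
g_hat = lambda t: slope * t + intercept

# Exact Riesz representer for the ATE (inverse-propensity form).
alpha = T / p - (1 - T) / (1 - p)

plug_in = g_hat(1) - g_hat(0)                       # biased: confounding ignored
debiased = plug_in + np.mean(alpha * (Y - g_hat(T)))

print(plug_in)   # well above 1.5 (confounding bias)
print(debiased)  # close to 1.5 despite the wrong outcome model
```

Since $\|\hat\alpha - \alpha_0\| = 0$ here, the bias bound is zero regardless of how badly $\hat g$ is misspecified, which is exactly what the simulation shows.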
- No reliance on explicit inverse-propensity weights: ForestRiesz circumvents instability from small probability weights by direct local moment matching within the forest, enhancing robustness compared to standard double machine learning approaches that rely on inverse propensity estimation (Bjelac et al., 13 Jan 2026).
The following table summarizes key asymptotic results:
| Property | Description | Condition |
|---|---|---|
| $\sqrt{n}$-consistency | $\hat\theta$ attains semiparametric efficiency | $\|\hat g - g_0\|\,\|\hat\alpha - \alpha_0\| = o_p(n^{-1/2})$ |
| Local CLT for $\hat\beta(x)$ | $\hat\beta(x)$ asymptotically normal | Local identification, $M(x)$ positive definite |
| Riesz consistency | $\hat\alpha \to \alpha_0$ in $L^2$ | ForestRiesz regularity, moment bounds |
4. Application in Sample Selection Models and Bias Decomposition
ForestRiesz extends naturally to causal inference under sample selection, where both treatment assignment and outcome observability can be non-random (Bjelac et al., 13 Jan 2026). For sample selection average treatment effect estimation, the Riesz representer admits an explicit expression involving the treatment and selection propensities, and the bias from omitting latent confounders admits an upper bound of product form,

$$|\mathrm{bias}| \le \sigma \cdot \rho_Y \cdot \rho_S,$$

where:
- $\sigma$ is a data-identified variance factor;
- $\rho_Y$ measures outcome confounding strength (partial $R^2$ with respect to the latent confounder $U$);
- $\rho_S$ is the selection confounding strength (partial $R^2$ in the selection index).
ForestRiesz facilitates stable estimation in these settings, where direct propensity-score-based methods can be numerically unstable. A quasi-Gaussian latent-index model provides a calibration method for sensitivity analysis, mapping the strength of unobserved confounding to the potential for overturning the treatment-effect estimate.
5. Simulation and Empirical Evidence
In simulation studies, ForestRiesz is benchmarked against conventional double machine learning (SSM) and naive approaches (IRM) (Bjelac et al., 13 Jan 2026). In a standard MAR selection design for ATE:
- Both SSM and ForestRiesz recover the truth as the sample size $n$ increases.
- ForestRiesz demonstrates superior stability and faster bias decay with default tunings; SSM can require careful hyperparameter adjustment.
Empirically, in U.S. gender wage gap analysis using American Community Survey data (2016), ForestRiesz yields larger estimated wage gaps (in absolute value) compared to unadjusted and propensity-score-based approaches. For example:
- For college graduates, the ForestRiesz point estimate (SE 0.002) is larger in magnitude than the corresponding IRM estimate.
- The approach detects underestimation of the wage gap by models that ignore sample selection.
Sensitivity analysis delivers explicit robustness values: overturning the estimated wage gap would require substantial unobserved confounding (as measured by partial $R^2$), implying robustness to considerable levels of hidden selection bias.
6. Methodological Implications and Extensions
ForestRiesz provides a unified and robust estimator for general linear functionals—including but not limited to average treatment effects and average marginal effects—in the presence of complex sampling, high-dimensional covariates, and selective outcome observability. The method exploits the structure of the Riesz representer to automate debiasing and sidestep tuning-sensitive propensity or density estimation. This suggests ForestRiesz is particularly well-suited for finite samples, ill-posed inverse problems, and any context where orthogonality and stability are essential (Chernozhukov et al., 2021, Bjelac et al., 13 Jan 2026).
The integration of locally linear random forests for Riesz learning, coupled with doubly-robust cross-fitting and explicit influence-function-based sensitivity analysis, makes ForestRiesz a comprehensive tool for practitioners handling bias, regularization, and selection in modern causal inference frameworks.