
ForestRiesz: Robust Semiparametric Inference

Updated 9 February 2026
  • ForestRiesz is a nonparametric method that uses the Riesz representation theorem and random forest machinery to construct debiased estimators of linear functionals in causal inference.
  • It constructs locally linear estimators by computing node-wise moments and employs cross-fitting to achieve double robustness and asymptotic normality.
  • The method avoids explicit inverse-propensity weighting, ensuring stability and efficiency in high-dimensional and biased sample selection settings.

ForestRiesz refers to a nonparametric method for automatic, debiased machine learning of linear functionals—particularly in causal inference settings involving high-dimensional or nonparametric regression functions, and in the presence of non-random treatment assignment and/or outcome selection. The ForestRiesz framework leverages the Riesz representation theorem and random forest machinery to construct a locally linear estimator of the Riesz representer, enabling efficient, robust, and stable semiparametric inference with automatic debiasing and double robustness properties (Chernozhukov et al., 2021, Bjelac et al., 13 Jan 2026).

1. Riesz Representation and the Debiasing Problem

Let $W = (Y, Z)$ denote the data, with $g_0(Z) = E[Y \mid Z]$ the regression function of interest. For a continuous linear functional $g \mapsto \psi(g) = E[m(W; g)]$, there exists a unique Riesz representer $\alpha_0(Z)$ such that

$$\psi(g) = E[m(W; g)] = E[\alpha_0(Z) g(Z)]$$

for all square-integrable $g$. The target estimand is $\theta_0 = \psi(g_0) = E[\alpha_0(Z) g_0(Z)]$. In high-dimensional and nonparametric regimes, the naive plug-in estimator is subject to regularization-induced bias that can be of order $n^{-1/2}$ or larger. The correction term

$$\psi(\widehat{g}) + E_n[\alpha_0(Z) \{Y - \widehat{g}(Z)\}]$$

(the "one-step" or "double-robust" correction) cancels the leading bias and achieves asymptotically linear estimation if $\alpha_0$ is accurately estimated.

The Riesz representer $\alpha_0$ solves the variational problem

$$\alpha_0 = \arg\min_\alpha E[\alpha(Z)^2 - 2 m(W; \alpha)],$$

where $E[m(W; \alpha)] = E[\alpha_0(Z) \alpha(Z)]$. This variational characterization is central to the automatic machine learning of $\alpha_0$ (Chernozhukov et al., 2021).
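As a concrete illustration, the first-order condition of this variational problem, $E[\phi\phi^\top]\beta = E[m(W;\phi)]$, can be solved directly on simulated data. The sketch below uses a deliberately tiny feature map $\phi(t) = (t,\, 1-t)$ chosen for illustration (not taken from the papers): for the ATE functional $m(W; g) = g(1, X) - g(0, X)$, minimizing the empirical Riesz loss recovers the familiar inverse-propensity weights automatically, without ever estimating a propensity score and inverting it by hand.

```python
import random

random.seed(0)
n = 10_000
T = [1 if random.random() < 0.3 else 0 for _ in range(n)]

# Feature map phi(t) = (t, 1 - t); restrict alpha to alpha(t) = phi(t) @ beta.
# Minimizing the empirical Riesz loss  En[alpha^2 - 2 m(W; alpha)]  gives the
# first-order condition  En[phi phi^T] beta = En[m(W; phi)], and for the ATE
# functional m(W; g) = g(1, X) - g(0, X) we have m(W; phi) = phi(1) - phi(0) = (1, -1).
p_hat = sum(T) / n
# En[phi phi^T] = diag(p_hat, 1 - p_hat) for this phi, so the solve is trivial.
beta = (1.0 / p_hat, -1.0 / (1.0 - p_hat))

# The fitted representer is exactly the inverse-propensity weight
# alpha(t) = t / p_hat - (1 - t) / (1 - p_hat), recovered automatically.
alpha = [beta[0] * t + beta[1] * (1 - t) for t in T]
```

The fitted weights also satisfy $E_n[\hat\alpha] = 0$ by construction, as ATE weights must.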

2. ForestRiesz Estimator: Construction and Algorithmic Principles

ForestRiesz models $\alpha(Z) \approx \phi(T, X)^{\top} \beta(X)$, where $\phi(T, X)$ is a chosen feature map and $\beta(X)$ is a locally linear coefficient function estimated nonparametrically.

Random Forest Implementation:

  • Node-wise local moments: For each node $N$ in the covariate space, compute:

$$J(N) = \frac{1}{|N|} \sum_{i \in N} \phi(Z_i) \phi(Z_i)^\top, \qquad M(N) = \frac{1}{|N|} \sum_{i \in N} m(W_i; \phi).$$

The local estimator is $\hat{\beta}(N) = J(N)^{-1} M(N)$.

  • Splitting criterion: Candidate splits are evaluated via local Riesz loss reduction (or equivalently, maximization of a negative Riesz loss criterion), ensuring balance and stability.
  • Forest weights and prediction: Forest similarity weights $\omega_i(x)$ are computed by averaging indicator functions over leaves that contain $x$ in each tree,

$$\omega_i(x) = \frac{1}{T} \sum_{t=1}^T \frac{\mathbf{1}\{i \in \ell_t(x)\}}{|\ell_t(x)|}.$$

The estimator $\hat{\alpha}(Z)$ is given by locally weighted predictions of $\phi(Z)^{\top} \hat{\beta}(X)$.

  • Debiased functional estimation: The final estimator is

$$\widehat{\theta} = E_n[m(W; \widehat{g}) + \hat{\alpha}(Z) \{Y - \widehat{g}(Z)\}].$$

Cross-fitting is standard: the sample is partitioned, with ForestRiesz fitted on folds excluding the target data points to avoid overfitting and induce orthogonality (Chernozhukov et al., 2021, Bjelac et al., 13 Jan 2026).
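The forest similarity weights admit a very short implementation. The sketch below assumes a toy encoding of leaf memberships (`leaf_ids` and `x_leaf` are illustrative names, not a library API): each training point receives weight $1/|\ell_t(x)|$ in every tree whose leaf at $x$ contains it, averaged over trees.

```python
def forest_weights(leaf_ids, x_leaf):
    """Forest similarity weights omega_i(x): the average, over trees, of
    1{i in leaf_t(x)} / |leaf_t(x)|.

    leaf_ids[t][i] -- leaf of tree t that training point i falls into
    x_leaf[t]      -- leaf of tree t that the query point x falls into
    (a toy encoding chosen for illustration, not a library API)
    """
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    w = [0.0] * n
    for t in range(n_trees):
        size = sum(1 for leaf in leaf_ids[t] if leaf == x_leaf[t])
        for i in range(n):
            if leaf_ids[t][i] == x_leaf[t]:
                w[i] += 1.0 / (n_trees * size)
    return w

# Two tiny trees over five training points; the query point x lands in
# leaf 0 of tree 0 and leaf 1 of tree 1.
leaf_ids = [[0, 0, 1, 1, 1],
            [1, 0, 1, 0, 1]]
w = forest_weights(leaf_ids, x_leaf=[0, 1])
```

Because each tree's weights average the uniform distribution over one leaf, the $\omega_i(x)$ always sum to one, so the forest prediction is a proper local average.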

3. Theoretical Guarantees and Statistical Properties

ForestRiesz achieves several desirable asymptotic properties under standard conditions:

  • $\sqrt{n}$-consistency and asymptotic normality: Provided $\hat{g}$ and $\hat{\alpha}$ converge to their population targets at rates $o_p(n^{-1/4})$, the ForestRiesz estimator satisfies

$$\sqrt{n} (\widehat{\theta} - \theta_0) \to_d N(0, \mathrm{Var}[\psi_0(W)]),$$

where the influence function is $\psi_0(W) := m(W; g_0) + \alpha_0(Z)[Y - g_0(Z)] - \theta_0$.

  • Double robustness and Neyman orthogonality: The estimation bias satisfies

$$E[S(W; g, \alpha)] = -E[(\alpha(Z) - \alpha_0(Z)) (g(Z) - g_0(Z))],$$

so that consistency obtains if either $\alpha$ or $g$ is consistently estimated; the influence function is orthogonal to estimation errors in $g$ and $\alpha$ (Chernozhukov et al., 2021).

  • No reliance on explicit inverse-propensity weights: ForestRiesz circumvents instability from small probability weights by direct local moment matching within the forest, enhancing robustness compared to standard double machine learning approaches that rely on inverse propensity estimation (Bjelac et al., 13 Jan 2026).
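Double robustness can be checked numerically. In the sketch below, simple closed-form plug-ins stand in for the ForestRiesz and regression fits: the outcome model `ghat` is deliberately misspecified (identically zero), yet with a correct representer the debiased estimate still recovers the true effect, and a standard error falls out of the empirical influence-function scores. All names and numbers are illustrative.

```python
import math
import random

random.seed(1)
n, tau, p = 20_000, 2.0, 0.5
data = []
for _ in range(n):
    x = random.random()
    t = 1 if random.random() < p else 0
    y = tau * t + x + random.gauss(0.0, 1.0)   # true ATE is tau = 2
    data.append((y, t, x))

ghat = lambda t, x: 0.0                      # deliberately wrong outcome model
ahat = lambda t: t / p - (1 - t) / (1 - p)   # correct Riesz representer (known p)

# Debiased estimate: En[ m(W; ghat) + ahat(T) * (Y - ghat(T, X)) ],
# with m(W; g) = g(1, X) - g(0, X) for the ATE.
scores = [ghat(1, x) - ghat(0, x) + ahat(t) * (y - ghat(t, x)) for y, t, x in data]
theta = sum(scores) / n
# Influence-function standard error: sd of the scores over sqrt(n).
se = math.sqrt(sum((s - theta) ** 2 for s in scores) / n) / math.sqrt(n)
```

The plug-in term alone is zero here (badly biased); the correction term restores consistency, mirroring the bias-product formula above.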

The following table summarizes key asymptotic results:

| Property | Description | Condition |
|---|---|---|
| $\sqrt{n}$-consistency | $\widehat{\theta}$ attains semiparametric efficiency | $\|\hat{g} - g_0\| \cdot \|\hat{\alpha} - \alpha_0\| = o_p(n^{-1/2})$ |
| Local CLT for $\hat{\beta}(x)$ | $\sqrt{n}(\hat{\beta}(x) - \beta_0(x))$ asymptotically normal | Local identification, positive definite $J(x)$ |
| Riesz consistency | $\|\hat{\alpha} - \alpha_0\|_2^2 = O_p(d\, n^{-1})$ | ForestRiesz regularity, moment bounds |

4. Application in Sample Selection Models and Bias Decomposition

ForestRiesz extends naturally to causal inference under sample selection, where both treatment assignment and outcome observability can be non-random (Bjelac et al., 13 Jan 2026). For sample selection average treatment effect estimation, the Riesz representer admits an explicit expression involving treatment and selection propensities, and the bias from omitting latent confounders can be decomposed as

$$\theta_0 - \theta_s = E[(g_0 - g_s)(\alpha_0 - \alpha_s)],$$

with an upper bound $|\theta_0 - \theta_s|^2 \leq \widetilde{S}^2 C_Y^2 C_S^2$, where:

  • $\widetilde{S}^2$ is a data-identified variance factor;
  • $C_Y^2$ measures outcome confounding strength (partial $R^2$ with respect to the latent confounder $A$);
  • $C_S^2$ measures selection confounding strength (partial $R^2$ in the selection index).

ForestRiesz facilitates stable estimation in these settings, where direct propensity-score based methods can be numerically unstable. A quasi-Gaussian latent-index model provides a calibration method for sensitivity analysis, mapping the strength of unobserved confounding to the potential for treatment effect estimate overturning.
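The bias bound reduces to simple arithmetic once its three factors are supplied. In this sketch the variance factor `S2` is a made-up number for illustration (the papers' data-identified value is not reproduced here); the code computes the bound and a robustness value, i.e., the equal confounding strength $C_Y^2 = C_S^2 = r^2$ at which the bound first matches the estimate's magnitude.

```python
import math

# Hypothetical inputs for illustration: S2 (the data-identified variance
# factor S~^2) is made up; theta_s echoes the wage-gap estimate quoted below.
S2, theta_s = 0.4, -0.128

def bias_bound(cy2, cs2):
    """Upper bound |theta_0 - theta_s| <= sqrt(S~^2 * C_Y^2 * C_S^2)."""
    return math.sqrt(S2 * cy2 * cs2)

# Robustness value under equal confounding strength C_Y^2 = C_S^2 = r2:
# overturning the estimate needs sqrt(S2) * r2 >= |theta_s|.
r2_star = abs(theta_s) / math.sqrt(S2)
```

At `r2_star` the bound exactly equals $|\theta_s|$; any weaker confounding cannot flip the sign of the estimate under this model.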

5. Simulation and Empirical Evidence

In simulation studies, ForestRiesz is benchmarked against conventional double machine learning (SSM) and naive approaches (IRM) (Bjelac et al., 13 Jan 2026). In a standard MAR selection design for ATE:

  • Both SSM and ForestRiesz recover the truth as nn increases.
  • ForestRiesz demonstrates superior stability and faster bias decay with default tunings; SSM can require careful hyperparameter adjustment.

Empirically, in U.S. gender wage gap analysis using American Community Survey data (2016), ForestRiesz yields larger estimated wage gaps (in absolute value) compared to unadjusted and propensity-score-based approaches. For example:

  • For college graduates, ForestRiesz estimates the gap at $-0.128$ (SE 0.002), versus $-0.0989$ for IRM ($n = 297{,}178$).
  • The approach detects underestimation of the wage gap by models that ignore sample selection.

Sensitivity analysis delivers explicit robustness values: overturning the wage gap would require unobserved confounding with partial $R^2 > 6.3\%$, implying robustness to substantial levels of hidden selection bias.

6. Methodological Implications and Extensions

ForestRiesz provides a unified and robust estimator for general linear functionals—including but not limited to average treatment effects and average marginal effects—in the presence of complex sampling, high-dimensional covariates, and selective outcome observability. The method exploits the structure of the Riesz representer to automate debiasing and sidestep tuning-sensitive propensity or density estimation. This suggests ForestRiesz is particularly well-suited for finite samples, ill-posed inverse problems, and any context where orthogonality and stability are essential (Chernozhukov et al., 2021, Bjelac et al., 13 Jan 2026).

The integration of locally linear random forests for Riesz learning, coupled with doubly-robust cross-fitting and explicit influence-function-based sensitivity analysis, makes ForestRiesz a comprehensive tool for practitioners handling bias, regularization, and selection in modern causal inference frameworks.
