On the Equivalence between Neyman Orthogonality and Pathwise Differentiability
Abstract: It has been frequently observed that Neyman orthogonality, the central device underlying double/debiased machine learning (Chernozhukov et al., 2018), and pathwise differentiability, a cornerstone concept from semiparametric theory, often lead to the same debiased estimators in practice. Despite the widespread adoption of both ideas, the precise nature of this equivalence has remained elusive, with the two concepts having been developed in largely separate traditions. In this work, we revisit the semiparametric framework of van der Laan and Robins (2003) and identify an implicit regularity assumption on the relationship between target and nuisance parameters -- a local product structure -- that allows us to establish a formal equivalence between Neyman orthogonality and pathwise differentiability. We demonstrate that the two directions of this equivalence impose fundamentally different structural requirements, and illustrate the theory through a concrete example of estimating the average treatment effect. This helps clarify the relationship between these two foundational frameworks and provides a useful reference for practitioners working at their intersection.
Explain it Like I'm 14
What this paper is about
This paper connects two big ideas used to make good estimates from messy, real-world data:
- Neyman orthogonality (a trick used in “double/debiased machine learning” to reduce bias from machine‑learning steps), and
- Pathwise differentiability (a core idea in semiparametric statistics that leads to “influence functions,” the blueprints for best‑possible estimators).
People noticed that both ideas often lead to the same “debiased” formulas in practice (for example, for estimating the average effect of a treatment). The authors explain exactly why and when these two ideas are actually saying the same thing.
The big questions the authors ask
- When do Neyman orthogonality and pathwise differentiability agree?
- What extra conditions are needed in each direction of the “if and only if” statement?
- How can we make this clear using a common example: estimating the average treatment effect?
Key ideas in everyday language
Think of estimation as adjusting two knobs:
- The target knob: the number you really want (like the average treatment effect).
- The nuisance knob: extra “helper” quantities you must estimate (like how likely someone is to get a treatment, or their expected outcome), which can be complicated and often learned by machine learning.
Two viewpoints:
- Neyman orthogonality says: “If I wiggle the nuisance knob a tiny bit, my measuring tool barely moves at first.” That first‑order insensitivity makes your final estimate robust to small errors in the nuisance estimates.
- Pathwise differentiability says: “As I move through nearby ‘possible worlds’ (slightly changing the data‑generating process), the target changes smoothly, and there’s a special function (the influence function) that tells me the exact first‑order change.” That function is the recipe for building efficient, debiased estimators.
A crucial extra condition the authors identify is like making sure each knob can be adjusted on its own, at least a tiny bit, along a smooth path. They call this a local product structure: you can move the target slightly while holding the nuisance still, and you can move the nuisance slightly while holding the target still, in a smooth way.
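In symbols (using the notation collected in the Glossary below: target β, nuisance η, estimating function m, influence function ψ, score s), the two viewpoints read roughly as follows; this is a schematic restatement, not the paper's exact display:

```latex
% Neyman orthogonality: first-order insensitivity to nuisance directions h
\frac{\partial}{\partial t}\,\mathbb{E}_{P_0}\!\big[m(Z;\,\beta_0,\,\eta_0 + t h)\big]\Big|_{t=0} = 0
\quad \text{for all admissible directions } h \in H;

% Pathwise differentiability: along every regular submodel \{P_t\} with score s
\frac{d}{dt}\,\beta(P_t)\Big|_{t=0} = \mathbb{E}_{P_0}\!\big[\psi(Z)\,s(Z)\big],
\quad \psi \in L^2(P_0).
```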
How the authors approached the problem
The paper revisits classic semiparametric theory and adds a missing, but simple, regularity condition:
- Local product structure: a “tiny, smooth” way to vary the target and nuisance independently.
Then they prove two complementary results:
- From Neyman orthogonality to pathwise differentiability:
- If your estimating equation is correctly set up, is smooth, and is Neyman orthogonal (insensitive to nuisance wiggles), then it automatically gives you an influence function. In other words, your method aligns with the semiparametric “best practice” view. This direction does not need the product-structure assumption.
- From pathwise differentiability to Neyman orthogonality:
- If your target has an influence function and your estimating equation equals that influence function at the truth, then your estimating equation must be Neyman orthogonal—provided you can adjust the two knobs independently in that smooth way (the local product structure). This direction does need the product-structure assumption.
They walk through the math carefully, using standard tools that track how expectations change along smooth paths of nearby “possible worlds.” They also illustrate the ideas with a concrete example: estimating the average treatment effect, where both approaches produce the well-known augmented inverse probability weighted (AIPW) estimator.
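For concreteness, here is a minimal cross-fitted AIPW sketch in Python. The two-fold split, the scikit-learn learners, and the propensity clipping are illustrative choices, not the paper's implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def aipw_ate(X, A, Y, n_splits=2, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect (ATE).

    The AIPW moment coincides with the efficient influence function:
        psi(Z) = mu1(X) - mu0(X)
                 + A * (Y - mu1(X)) / e(X)
                 - (1 - A) * (Y - mu0(X)) / (1 - e(X)) - beta,
    and solving the empirical moment equation gives beta_hat below.
    """
    phi = np.zeros(len(Y))  # uncentred influence values phi(Z) = psi(Z) + beta
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Nuisance 1: propensity score e(X) = P(A = 1 | X), clipped for overlap
        ps = LogisticRegression(max_iter=1000).fit(X[train], A[train])
        e_hat = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
        # Nuisance 2: outcome regressions mu_a(X) = E[Y | A = a, X]
        mu1 = RandomForestRegressor(random_state=seed).fit(
            X[train][A[train] == 1], Y[train][A[train] == 1])
        mu0 = RandomForestRegressor(random_state=seed).fit(
            X[train][A[train] == 0], Y[train][A[train] == 0])
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        # Evaluate the debiased signal on the held-out fold only
        phi[test] = (m1 - m0
                     + A[test] * (Y[test] - m1) / e_hat
                     - (1 - A[test]) * (Y[test] - m0) / (1 - e_hat))
    beta_hat = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(len(Y))  # IF-based standard error
    return beta_hat, se
```

Because the AIPW moment equals the efficient influence function, its empirical standard deviation directly yields Wald-type uncertainty, as discussed in the applications below.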
What they found and why it matters
Main findings:
- Formal equivalence: Under mild, practical conditions, Neyman orthogonality and pathwise differentiability line up—they lead to the same debiased estimators.
- Asymmetry in assumptions:
- Going from Neyman orthogonality to pathwise differentiability is relatively easy and doesn’t require the “independent knobs” assumption.
- Going from pathwise differentiability to Neyman orthogonality requires the local product structure (the ability to vary target and nuisance independently to first order).
- A built‑in “−1” sensitivity: When an estimating equation matches the influence function, its average responds to changes in the target at a precise rate of −1. This helps ensure the estimating equation identifies the right target value cleanly (a short derivation follows this list).
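To see where the −1 comes from, consider the common case where the influence function is centred in the target, m(Z; β, η) = φ(Z; η) − β (as for the ATE); this special form is an illustrative assumption, not the paper's general statement:

```latex
G = \frac{\partial}{\partial \beta}\,\mathbb{E}_{P_0}\!\big[m(Z;\beta,\eta_0)\big]\Big|_{\beta=\beta_0}
  = \frac{\partial}{\partial \beta}\Big(\mathbb{E}_{P_0}\big[\varphi(Z;\eta_0)\big] - \beta\Big)
  = -1 .
```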
Why this is important:
- Clarity for practitioners: If you design a Neyman‑orthogonal moment condition, you’re essentially using the influence function, meaning you’re on track for efficient, debiased estimation—even when you estimate nuisances with machine learning.
- Stronger foundations: It ties together two powerful traditions—modern debiased machine learning and classical semiparametric theory—so methods from both camps are seen as two sides of the same coin.
What this means going forward
- Better guidance: Researchers and analysts can confidently use either perspective (Neyman orthogonality or influence functions) knowing when they agree and what extra conditions are needed.
- Practical checks: In new problems, it’s helpful to verify the local product structure (can you nudge target and nuisance separately, at least infinitesimally?) and basic smoothness. This can be challenging in some complex models, but it’s a clear checklist.
- Broad applicability: The results cover many common settings in causal inference and beyond, like estimating average treatment effects, where these tools are widely used.
In short, the paper shows that the “don’t care about nuisance wiggles” trick (Neyman orthogonality) and the “smooth change with an influence function” view (pathwise differentiability) are essentially the same—so long as you can turn the two knobs independently in a smooth way. This unifies two major approaches to building reliable, debiased estimators in modern statistics.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a concise list of unresolved issues and concrete directions for future work that arise from the paper’s assumptions, scope, and proofs.
- Constructive verification of local product structure: Develop general, checkable sufficient conditions and practical recipes for building QMD coordinate submodels that perturb β and η independently, especially in constrained models (e.g., shape constraints, positivity constraints, bounded support, monotonicity).
- Beyond nonparametric models: Extend the equivalence to semiparametric models with restricted tangent spaces where influence functions are non-unique; specify how orthogonality should be defined relative to the efficient influence function (projection onto the tangent space) and what normalization replaces G = −1.
- Handling functionals defined implicitly: Provide tools to verify local product structure and differentiability when η or β are defined via solutions to integral/estimating equations or PDEs (e.g., nuisance components estimated as roots of moment conditions).
- Relaxing bounded score assumptions: Remove or weaken the boundedness of scores assumed in the paper’s lemmas and in Assumption 13; replace it with minimal moment or tail conditions that still enable differentiation of expectations and chain-rule arguments.
- Non-smooth functionals: Replace Fréchet differentiability of m and Hellinger-Lipschitz of β with weaker (e.g., directional/Gâteaux, Hadamard) differentiability frameworks to cover kinks/discontinuities (e.g., quantiles, thresholded risks, maxima) while preserving the equivalence.
- Alternatives to Hellinger Lipschitz: Identify weaker continuity/regularity conditions (e.g., in total variation, χ², or Wasserstein metrics) that still allow extending the derivative identity from a dense score class to all scores, replacing Assumption 10.
- Cases where β factors through η: Provide a systematic characterization of when the reverse implication fails (Remark 5) and explore reparameterizations or generalized orthogonality notions that recover a usable equivalence (or prove impossibility results).
- Vector- and Hilbert-valued targets: Formalize the multi-dimensional/Jacobian version of both directions (including normalization G = −I), and address operator-valued or infinite-dimensional β using the Hilbert-space framework (beyond citing Luedtke & Chung).
- Dependence and non-i.i.d. data: Extend the framework beyond i.i.d. models to time series, clustered, network, or adaptive designs, where QMD paths and tangent spaces require different constructions.
- Non-dominated or support-changing models: Address settings where a common dominating measure may not exist or where support changes along paths (e.g., mixture/threshold models), and redesign submodel constructions or equivalence statements accordingly.
- Approximate orthogonality and misspecification: Analyze how the equivalence degrades under moment misspecification or when Neyman orthogonality holds only approximately; quantify bias terms and give robustness guarantees for practical DML implementations.
- Minimal regularity on m: Identify the weakest conditions on the map (β, η) ↦ m(·; β, η) to justify the L2 chain rule (Assumption 3), including alternatives based on dominated convergence or local bracketing that cover commonly used, non-smooth estimating functions.
- Achievability of nuisance directions: Characterize when every admissible h ∈ H can be realized as d/dt η(Pt)|t=0 via a regular submodel (Assumption 1), and give sufficient conditions on H (e.g., density of representable directions) in common function spaces.
- Invariance to reparameterization of η: Study how Neyman orthogonality depends on the chosen nuisance parameterization and norm on V; provide reparameterization-invariant formulations or guidance for choosing η to satisfy product structure.
- Finite-sample and second-order behavior: Connect the first-order equivalence to finite-sample performance and second-order remainder terms in DML (e.g., when cross-fitting/regularization is used), and characterize how violations of assumptions impact bias and variance.
- Expanded catalog of examples: Beyond ATE, provide worked constructions verifying all assumptions in more complex semiparametric problems (e.g., censored survival, instrumental variables, dynamic treatment regimes, measurement error), including explicit coordinate submodels.
- Unbounded influence functions/heavy tails: Extend the equivalence to cases where the efficient influence function is not square-integrable or exhibits heavy tails, possibly requiring robust norms (e.g., L1, Orlicz) and modified orthogonality conditions.
- Alternative extension route avoiding Hellinger Lipschitz: Develop direct approximation arguments to pass from dense sets of bounded scores to general scores without Assumption 10 (e.g., via perturbation bounds on pathwise derivatives).
Practical Applications
Immediate Applications
The following use cases can be deployed now by leveraging the paper’s formal equivalence between Neyman orthogonality and pathwise differentiability, together with its practical diagnostics (e.g., the “−1” normalization and nuisance insensitivity) for building and validating debiased estimators.
- Unified estimator design and validation for causal inference pipelines
- Sectors: software, tech/product analytics, healthcare, economics/finance, public policy
- What to do:
- When building DML/AIPW/TMLE-style estimators, treat influence-function (IF) constructions and Neyman-orthogonal moments as interchangeable design choices.
- Add unit tests that check:
- Orthogonality: numerically approximate the gradient of the empirical mean of the moment function with respect to each nuisance prediction and verify that it is near zero at the truth (or at high-quality estimates).
- “−1” normalization: verify that the empirical derivative of the expected moment with respect to the target parameter is approximately −1 at the solution (a sketch of both checks appears after this use case).
- Reuse existing EIFs to define orthogonal moments (or vice versa) to shorten estimator development time.
- Tools/workflows: integrate checks into Python/R pipelines (e.g., DoubleML, econml, grf, tmle3/tlverse, causalml, DoWhy); use cross-fitting and sample-splitting.
- Assumptions/dependencies: correct local specification of the moment condition at the truth; nondegenerate Jacobian in the target dimension; mild smoothness (Fréchet differentiability) of the moment map; adequate cross-fitting to reduce overfitting; data approximately i.i.d.
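A minimal sketch of both unit tests for the AIPW moment from the cross-fitted sketch above; the finite-difference step, the constant perturbation direction, and the function names are illustrative assumptions:

```python
import numpy as np

def aipw_moment(Y, A, m1, m0, e, beta):
    """AIPW estimating function m(Z; beta, eta); mean zero at the truth."""
    return (m1 - m0
            + A * (Y - m1) / e
            - (1 - A) * (Y - m0) / (1 - e)
            - beta)

def orthogonality_check(Y, A, m1, m0, e, beta, eps=1e-4):
    """Finite-difference Gâteaux derivatives of the empirical moment with
    respect to each nuisance, in the constant direction h = 1.
    All three values should be near zero for high-quality nuisances."""
    base = aipw_moment(Y, A, m1, m0, e, beta).mean()
    d_mu1 = (aipw_moment(Y, A, m1 + eps, m0, e, beta).mean() - base) / eps
    d_mu0 = (aipw_moment(Y, A, m1, m0 + eps, e, beta).mean() - base) / eps
    d_e = (aipw_moment(Y, A, m1, m0, e + eps, beta).mean() - base) / eps
    return d_mu1, d_mu0, d_e

def minus_one_check(Y, A, m1, m0, e, beta, eps=1e-4):
    """Central-difference derivative of the empirical moment in the target;
    for the AIPW moment this is exactly -1."""
    up = aipw_moment(Y, A, m1, m0, e, beta + eps).mean()
    down = aipw_moment(Y, A, m1, m0, e, beta - eps).mean()
    return (up - down) / (2 * eps)
```

In practice one would repeat the orthogonality check over several perturbation directions (not just h = 1) and flag derivatives exceeding a tolerance scaled by the Monte Carlo error.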
- Robust ATE and policy-effect estimation with ML nuisances
- Sectors: healthcare (treatment effects), public policy (program evaluation), tech (A/B tests), marketing (uplift), finance (event studies)
- What to do:
- Implement AIPW/DML/TMLE estimators with flexible ML for propensity and outcome models.
- Use the equivalence to justify estimator choice and to compute standard errors via the EIF even if the estimator was derived from an orthogonal moment, or to define an orthogonal moment from a known EIF.
- Report diagnostics: orthogonality checks and “−1” normalization to document robustness to nuisance error.
- Tools/workflows: standard causal libraries (DoubleML, econml, grf, tmle3); reproducible cross-fitting templates; pre-analysis plans including orthogonality diagnostics.
- Assumptions/dependencies: overlap/positivity and SUTVA/identifiability conditions for the causal estimand; moment correct at the truth; sufficient sample size for cross-fitting; mild smoothness (Hellinger Lipschitz near truth) for pathwise arguments.
- Standard error and confidence interval construction across frameworks
- Sectors: all applied domains using low-dimensional causal parameters with high-dimensional nuisances
- What to do:
- If you start from a Neyman-orthogonal moment, set the influence function to −G⁻¹m(Z; β₀, η₀) (where G is the derivative of the expected moment in the target), enabling IF-based variance estimation and Wald-type confidence intervals (a sketch appears after this use case).
- If you start from the EIF, set the moment to m(Z; β, η) = EIF(Z; β, η) and solve E[m] = 0; the equivalence then ensures orthogonality and the −1 curvature.
- Tools/workflows: sandwich/IF-based variance routines; bootstrap with cross-fitting as a robustness check.
- Assumptions/dependencies: nondegenerate Jacobian G; correct local specification; stable nuisance estimation.
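A minimal sketch of the first recipe, assuming a scalar target and an already-computed Ĝ (e.g., by finite differences); all names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def if_based_interval(m_values, G_hat, beta_hat, level=0.95):
    """Wald interval from an orthogonal moment evaluated at (beta_hat, eta_hat).

    psi_i = -m_i / G_hat is the estimated influence function; under the
    -1 normalization (G_hat = -1) this reduces to psi_i = m_i.
    """
    psi = -np.asarray(m_values) / G_hat
    n = len(psi)
    se = psi.std(ddof=1) / np.sqrt(n)
    z = norm.ppf(0.5 + level / 2)
    return beta_hat - z * se, beta_hat + z * se, se
```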
- Practical diagnostics for model structure and identifiability
- Sectors: academia, industry analytics
- What to do:
- Use the paper’s “local product structure” insight as a checklist item: if the target is a known function of the nuisance (β = g(η)), the reverse direction fails—flag that no regular submodel can vary β while holding η fixed, so the product structure required for orthogonality is unavailable (a schematic derivation appears after this use case).
- Incorporate a “structure test” that evaluates whether small perturbations of nuisance predictions change the target-only component of the moment, indicating potential violation of product structure in the chosen parameterization.
- Tools/workflows: numerical finite-difference checks around fitted nuisances; simulation-based sensitivity analysis.
- Assumptions/dependencies: approximations rely on high-quality nuisance fits and sufficient sample size.
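As a schematic of the failure mode in the first bullet (the paper's Remark 5): if the target factors through the nuisance, β(P) = g(η(P)) for some differentiable g (differentiability is assumed here for illustration), then along any regular submodel that freezes η to first order,

```latex
\frac{d}{dt}\,\beta(P_t)\Big|_{t=0}
  = g'(\eta_0)\!\left[\frac{d}{dt}\,\eta(P_t)\Big|_{t=0}\right] = 0,
```

so no submodel can move β while holding η fixed, and the local product structure fails.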
- Method selection and pedagogy
- Sectors: academia, training programs in healthcare/economics/data science
- What to do:
- Teach and document that AIPW/DML/TMLE estimators coincide under the equivalence; choose derivations that are most transparent for the audience and reuse the same computational core.
- Curate “estimator recipe cards” that list: (i) the EIF, (ii) the Neyman-orthogonal moment, (iii) orthogonality and “−1” checks, and (iv) nuisance-learning guidance.
- Tools/workflows: shared course materials; lab templates for cross-fitting/diagnostics.
- Assumptions/dependencies: none beyond standard semiparametric regularity and identifiability for the chosen estimands.
- Off-policy evaluation and recommendation systems
- Sectors: online platforms, ads, recommender systems, ops research
- What to do:
- Use doubly-robust/off-policy evaluation estimators and validate them via the orthogonality/EIF equivalence, improving bias control with complex behavior policies and value models (see the sketch after this use case).
- Tools/workflows: adapt causal inference tooling to logged bandit/RL settings with cross-fitting; maintain orthogonality tests with respect to estimated propensities and value functions.
- Assumptions/dependencies: overlap in action logging; stable logging policy modelling; correct local specification.
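A minimal sketch of the standard doubly robust value estimator for logged bandit data, in the spirit of the AIPW construction above; the array layout and names are illustrative, and the construction comes from the off-policy evaluation literature rather than from this paper:

```python
import numpy as np

def dr_policy_value(r, a, rho, q_all, pi_probs):
    """Doubly robust off-policy value estimate from logged bandit data.

    r        : observed rewards, shape (n,)
    a        : logged action indices, shape (n,), integer
    rho      : importance ratios pi(a_i | x_i) / mu(a_i | x_i), shape (n,)
    q_all    : estimated Q-values for every action, shape (n, n_actions)
    pi_probs : target-policy action probabilities, shape (n, n_actions)
    """
    n = len(r)
    # Direct-method term: expected estimated reward under the target policy
    dm = (pi_probs * q_all).sum(axis=1)
    # Orthogonal correction: weighted residual on the logged action
    q_logged = q_all[np.arange(n), a]
    psi = dm + rho * (r - q_logged)
    se = psi.std(ddof=1) / np.sqrt(n)  # IF-based standard error
    return psi.mean(), se
```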
Long-Term Applications
The following rely on further research, scaling, or tooling to operationalize the paper’s structural conditions (e.g., constructing coordinate submodels, handling non-smooth functionals, constrained nuisances).
- Automatic orthogonal-moment and influence-function generators (“AutoIF/AutoDML”)
- Sectors: software tooling for statistics/ML, academia
- What it enables:
- Given a user-specified pathwise differentiable functional (or an EIF), automatically produce:
- A Neyman-orthogonal moment with “−1” normalization,
- A plug-in estimator with cross-fitting scaffolding,
- IF-based variance estimators and diagnostics.
- Dependencies: symbolic/differentiable programming over functional inputs; libraries of known EIFs; verification of nondegenerate Jacobian and smoothness; numerical Gâteaux derivative approximations.
- Submodel constructors and “product-structure provers”
- Sectors: academia, advanced analytics groups
- What it enables:
- Tooling that constructs or certifies coordinate submodels witnessing local product structure (or flags violations) for complex/constrained models (e.g., density-ratio constraints, monotonicity, IV, partial identification).
- Dependencies: advances in semiparametric geometry; repositories of regular (QMD) submodels; problem-specific constraints.
- Extending to non-smooth targets and constrained nuisances
- Sectors: econometrics, biostatistics, safety-critical analytics
- What it enables:
- Robust orthogonalization and IF constructions for non-smooth functionals (e.g., quantiles, maxima) or for nuisance spaces with shape/monotonicity/fairness constraints where standard Fréchet smoothness fails.
- Dependencies: generalized (Hadamard/epi) differentiability frameworks; new estimation theory bridging orthogonality with non-smooth analysis.
- Regulatory standards and audits for ML-based causal inference
- Sectors: healthcare, finance, public policy
- What it enables:
- Audit checklists and certification that submitted analyses satisfy orthogonality/pathwise differentiability conditions; standardized reporting of “−1” normalization and nuisance insensitivity diagnostics to reduce bias from ML nuisance estimation.
- Dependencies: consensus guidelines; reference implementations; simulation testbeds.
- Domain-specific toolkits built on the equivalence
- Sectors:
- Healthcare: EMR-driven ATE/ATT pipelines with validated orthogonality, for comparative effectiveness and safety monitoring.
- Finance: event-study and treatment-effect estimators with robust ML nuisances and IF-based uncertainty.
- Education/policy: program evaluation dashboards with embedded diagnostics.
- Dependencies: high-quality nuisance learners; data governance/identifiability; integration with existing analytics platforms.
- Integration with differentiable programming and AutoML
- Sectors: ML platforms, MLOps
- What it enables:
- Jointly train nuisance models within a differentiable pipeline that enforces (or penalizes deviations from) Neyman orthogonality; automatic hyperparameter selection emphasizing orthogonality and stability.
- Dependencies: differentiable estimation stacks; scalable cross-fitting; gradient-based proxy losses for orthogonality.
- Robust off-policy evaluation in RL/robotics at scale
- Sectors: robotics, operations research, recommender systems
- What it enables:
- Use the equivalence to design and validate doubly-robust, orthogonal estimators for value/policy improvement with high-dimensional function approximation and complex logging policies.
- Dependencies: logged data quality/overlap; stable value estimators; extensions of the theory to sequential/Markovian settings.
Notes on feasibility across applications:
- Immediate applications primarily require adopting cross-fitting, implementing simple numerical diagnostics for orthogonality and “−1” normalization, and reusing known EIFs or orthogonal moments. These rely on standard semiparametric regularity (correct local specification, nondegenerate Jacobian, Fréchet smoothness, mild Lipschitz behavior near the truth).
- Long-term applications require new tooling for constructing/validating coordinate submodels (local product structure), extending the theory to non-smooth targets and constrained nuisances, and embedding these diagnostics in scalable software and regulatory processes.
Glossary
- Augmented inverse probability weighted estimator: A doubly robust estimator that combines inverse probability weighting with outcome regression to estimate causal effects. "the augmented inverse probability weighted estimator for the average treatment effect"
- Average treatment effect: The expected difference in outcomes between treated and untreated groups in a population. "the average treatment effect"
- Correct specification: The property that an estimating equation is unbiased at the true parameter values in a neighborhood of the truth. "Assumption 4 (Correct specification)."
- Double/debiased machine learning (DML): A framework that uses orthogonal moments and flexible machine learning for nuisance functions to estimate low-dimensional targets with reduced bias. "the double/debiased machine learning (DML) framework of Chernozhukov et al. [2018]"
- Efficient influence function (EIF): The unique influence function in the tangent space that achieves the lowest possible asymptotic variance for regular estimators. "This projection is called the efficient influence function (EIF) and is the unique influence function lying in T."
- Estimating function: A function of data and parameters used to define moment conditions whose zero sets identify target parameters. "estimating functions of the form m(Z; β, η)"
- Fréchet differentiability: A strong notion of differentiability of a map between normed spaces, requiring a uniform linear approximation. "The map (β, η) ↦ m(·; β, η) ∈ L²(P₀) is Fréchet differentiable at (β₀, η₀)."
- Gâteaux derivative: A directional derivative in infinite-dimensional spaces that assesses sensitivity along admissible directions. "the Gâteaux derivative of the expected estimating function with respect to the nuisance parameter η"
- Hellinger distance: A metric between probability distributions based on the L² distance between square-root densities. "we know Pₜ → P₀ in Hellinger distance,"
- Hellinger Lipschitz: A condition that a parameter changes at most linearly (Lipschitz) with Hellinger distance between distributions. "Assumption 10 (Hellinger Lipschitz)."
- Influence function: The functional derivative that captures the first-order effect of small distributional perturbations on a parameter. "Any such ψ is called an influence function of β at P₀ or a gradient of the pathwise derivative."
- Jacobian (nondegenerate): The derivative (here, scalar) of the expected estimating function with respect to the target parameter that is nonzero, ensuring identifiability. "Assumption 8 (Nondegenerate Jacobian)."
- Linear tilt submodel: A simple QMD submodel that perturbs the density multiplicatively by 1 + t g for mean-zero g. "One simple and standard construction is the linear tilt"
- Local product structure: A geometric condition ensuring the existence of regular submodels that perturb target and nuisance coordinates independently to first order. "Assumption 1 (Local product structure)."
- Local variation independence: A set-theoretic condition that the attainable parameter set contains a product neighborhood, allowing independent variation of target and nuisance values. "Definition 7 (Local Variation Independence)."
- L2 chain rule: A differentiation rule that computes the derivative of m(Z; β, η) along regular submodels in the L²(P₀) sense. "Lemma 5 (L2 chain rule)."
- Moment condition: An identification equation where the expectation of an estimating function equals zero at the true parameter. "encoding a moment condition whose solution at the true nuisance value identifies β₀"
- Neyman orthogonality: A robustness condition requiring the Gâteaux derivative of the expected moment with respect to the nuisance to vanish at the truth. "Neyman orthogonality, the central device underlying double/debiased machine learning"
- Nonparametric model: A statistical model that places minimal restrictions, typically containing all densities with respect to a dominating measure. "Suppose P is the full nonparametric model (all densities p w.r.t. v)."
- Nuisance parameter: An auxiliary, typically high-dimensional function or parameter not of primary interest but necessary for identification. "with respect to the nuisance parameter η"
- Nuisance score: A score direction along which the target parameter does not change to first order. "A score s is nuisance if there exists a regular submodel with score s along which β is locally constant to first order."
- Nuisance tangent space: The closed linear span of nuisance scores, representing directions that do not affect the target to first order. "Definition 3 (Nuisance scores and nuisance tangent space)."
- One-step correction: An estimation update that adjusts an initial estimator using the empirical average of an influence function. "as a one-step correction built from the efficient influence function"
- Parametric submodel (regular): A smooth one-dimensional path through the model that satisfies quadratic-mean differentiability at the truth. "A regular parametric submodel through P₀ is an indexed family {Pₜ : t ∈ (−ε, ε)}"
- Pathwise differentiability: The property that the derivative of a parameter along any regular submodel exists and equals the inner product of a score with an influence function. "We say β is pathwise differentiable at P₀ if there exists ψ ∈ L²(P₀) such that for every regular submodel with score s,"
- Quadratic-mean differentiability (QMD): A smoothness condition where square root densities are differentiable in L2, enabling score-based local expansions. "The appropriate regularity condition on such paths is quadratic-mean differentiability [van der Vaart, 1998]."
- Regular (QMD) submodel: A parametric submodel through the truth that satisfies the QMD condition, possessing a well-defined score. "Definition 1 (Regular (QMD) submodel and score)."
- Score (of a submodel): The L2(P0) function that characterizes the first-order change in the model along a regular submodel. "The function s is the score of the submodel at 0."
- Tangent space: The closed linear span of all scores of regular submodels, capturing all first-order perturbation directions. "Definition 2 (Tangent space). The (full) tangent space is T := span(S)"