
RieszNet: Dual Neural Architectures

Updated 9 February 2026
  • RieszNet is a dual approach that includes a multitask network for automatic debiasing and Riesz representer estimation, yielding doubly robust causal effect estimators.
  • The image-focused RieszNet employs first- and second-order Riesz transforms to achieve provable scale equivariance, excelling in tasks like crack segmentation.
  • Both variants leverage the analytical properties of the Riesz transform to enhance statistical efficiency, orthogonality, and generalization in complex, high-dimensional settings.

RieszNet denotes two distinct but influential architectures in neural network research: (1) a multitask neural network used for automatic debiasing and Riesz representer estimation in statistical machine learning and causal inference (Chernozhukov et al., 2021, Hines et al., 2024, Frees et al., 25 Aug 2025), and (2) a scale-invariant convolutional neural network designed for image analysis, leveraging the Riesz transform’s mathematical properties to achieve scale equivariance in a single network pass (Barisin et al., 2023, Barisin et al., 30 Jan 2025). Both approaches exploit the analytic structure of the Riesz transform or representation to encode theoretically desirable properties—orthogonality and statistical efficiency in the first line, provable scale equivariance in the second.

1. Mathematical Foundations

1.1 Riesz Representer in Statistical Learning

The Riesz representer arises from the Riesz representation theorem, central to the estimation of linear functionals on Hilbert spaces. For data $W=(Y,Z)$, given an outcome regression function $g_0(z) = \mathbb{E}[Y \mid Z=z]$ and a linear target functional

$$\psi_0 = \mathbb{E}[m(W;g_0)],$$

there exists a unique function $\alpha_0(Z)$ such that

$$\psi_0 = \mathbb{E}[\alpha_0(Z)\,g_0(Z)].$$

Learning $\alpha_0$ transforms complex functionals into tractable inner products, which is crucial for constructing debiased (doubly robust) estimators (Chernozhukov et al., 2021, Hines et al., 2024, Frees et al., 25 Aug 2025).
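As a concrete illustration (under an assumed ATE setup, not taken from the cited papers' experiments), the average treatment effect functional $m(W;g) = g(1,X) - g(0,X)$ has Riesz representer $\alpha_0(T,X) = T/e(X) - (1-T)/(1-e(X))$, where $e(X)$ is the propensity score; the inner-product identity can be checked by Monte Carlo:

```python
import numpy as np

# Illustration (assumed ATE setup, not from the cited papers): for the
# average treatment effect functional m(W; g) = g(1, X) - g(0, X), the Riesz
# representer is alpha_0(T, X) = T/e(X) - (1 - T)/(1 - e(X)), with e(X) the
# propensity score. We verify E[m(W; g_0)] = E[alpha_0(Z) g_0(Z)] by Monte Carlo.
rng = np.random.default_rng(0)
n = 200_000

X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))              # propensity score e(X) = P(T=1 | X)
T = rng.binomial(1, e)

def g0(t, x):                         # true outcome regression E[Y | T=t, X=x]
    return 2.0 * t + np.sin(x)

alpha0 = T / e - (1 - T) / (1 - e)    # Riesz representer evaluated at the data

psi_m = np.mean(g0(1, X) - g0(0, X))       # E[m(W; g_0)]  (exactly 2 here)
psi_alpha = np.mean(alpha0 * g0(T, X))     # E[alpha_0(Z) g_0(Z)]
print(psi_m, psi_alpha)                    # both ≈ 2.0
```

Both averages estimate the same functional, even though the second never evaluates $g_0$ at counterfactual treatment values.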

1.2 Riesz Transform in Signal Processing

For a signal $f$ in $L_2(\mathbb{R}^d)$, the first-order Riesz transform $\mathcal{R}_j$ (a convolution operator in the spatial domain) is defined via its Fourier multiplier by

$$\widehat{\mathcal{R}_j[f]}(\omega) = -i\,\frac{\omega_j}{\lVert\omega\rVert}\,\hat f(\omega),$$

where $\hat f$ denotes the Fourier transform of $f$. Key properties critical to RieszNet for images include:

  • Exact scale equivariance: for the dilation operator $L_a[f](x) = f(x/a)$,

$$\mathcal{R}_j[L_a[f]](x) = L_a[\mathcal{R}_j[f]](x).$$
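A minimal FFT-based sketch of the first-order Riesz transform (an assumed discrete implementation, not the papers' code) lets one verify this equivariance numerically:

```python
import numpy as np

def riesz_transform(f, j):
    """First-order Riesz transform along axis j via its Fourier multiplier
    -i * w_j / |w| (set to 0 at the zero frequency)."""
    F = np.fft.fftn(f)
    w = np.meshgrid(*[np.fft.fftfreq(n) for n in f.shape], indexing="ij")
    norm = np.sqrt(sum(v**2 for v in w))
    norm[(0,) * f.ndim] = np.inf      # multiplier vanishes at DC
    return np.fft.ifftn(-1j * w[j] / norm * F).real

# Check R_j[L_a f] = L_a[R_j f] for a = 2: sample a Gaussian and its dilation
# on the same grid, transform both, and compare on the matching sub-grids.
N, c, sigma = 256, 128, 8.0
yy, xx = np.mgrid[:N, :N] - c
f1 = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))         # f(x)
f2 = np.exp(-(xx**2 + yy**2) / (2 * (2 * sigma)**2))   # f(x/2)

r1, r2 = riesz_transform(f1, 1), riesz_transform(f2, 1)
k = np.arange(-32, 33)
# R[f(./2)] at grid points 2k should match R[f] at grid points k
err = np.abs(r2[np.ix_(c + 2*k, c + 2*k)] - r1[np.ix_(c + k, c + k)]).max()
print(err / np.abs(r1).max())         # small relative deviation
```

The residual deviation comes only from discretization and periodic boundary effects; the continuous operator is exactly scale-equivariant.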

2. RieszNet for Automatic Debiasing and Causal Effect Inference

RieszNet’s multitask neural network implementation addresses bias in plug-in estimation of linear functionals in high-dimensional settings. Specifically, it learns a joint representation for both the outcome regression $g(Z)$ and the Riesz representer $\alpha(Z)$.

2.1 Architecture

  • Shared Backbone: Typically multilayer perceptrons or pre-trained language-model encoders (e.g., DistilBERT for text).
  • Two Heads: One head outputs the regression $g(Z)\approx\mathbb{E}[Y\mid Z]$, the second outputs $\alpha(Z)\approx\alpha_0(Z)$.
  • Optional: Additional heads or backbone variants (frozen/unfrozen for stability) (Frees et al., 25 Aug 2025).

2.2 Losses and Estimation

The core loss jointly penalizes:

  • Prediction error of $g(Z)$ (cross-entropy or squared loss).
  • Riesz MSE loss for $\alpha(Z)$:

$$L_\text{Riesz} = \mathbb{E}\big[-2\,m(W;\alpha) + \alpha(Z)^2\big],$$

which equals $\mathbb{E}[(\alpha(Z)-\alpha_0(Z))^2]$ up to an additive constant independent of $\alpha$, so minimizing it recovers $\alpha_0$ without ever observing $\alpha_0$ directly.

  • Regularization and (optionally) causal-effect matching penalties.
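Assuming the ATE functional $m(W;\alpha) = \alpha(1,X) - \alpha(0,X)$ and a toy data-generating process (both hypothetical, not from the cited papers' experiments), a minimal sketch shows the empirical Riesz loss is minimized at the true representer:

```python
import numpy as np

# Sketch (hypothetical data-generating process, not the papers' experiments):
# the empirical Riesz loss for the ATE functional m(W; alpha) = alpha(1, X)
# - alpha(0, X), evaluated over the one-parameter family alpha_c = c * alpha_0,
# is minimized near c = 1, i.e. at the true Riesz representer.
rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))              # propensity score
T = rng.binomial(1, e)

def alpha0(t, x):                     # true Riesz representer for the ATE
    p = 1 / (1 + np.exp(-x))
    return t / p - (1 - t) / (1 - p)

def riesz_loss(alpha):                # E_n[-2 m(W; alpha) + alpha(Z)^2]
    m = alpha(np.ones(n), X) - alpha(np.zeros(n), X)
    return np.mean(-2 * m + alpha(T, X) ** 2)

cs = np.linspace(0.5, 1.5, 101)
losses = [riesz_loss(lambda t, x, c=c: c * alpha0(t, x)) for c in cs]
print(cs[np.argmin(losses)])          # ≈ 1.0
```

In RieszNet the same objective is minimized over the network's Riesz head rather than over a one-parameter family.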

The final estimator for the linear functional (e.g., the ATE) is the doubly robust (DR) form

$$\widehat\psi_\mathrm{DR} = \frac{1}{n} \sum_{i=1}^n \big[m(W_i;g) + \alpha(Z_i)\,(Y_i-g(Z_i))\big],$$

which is Neyman orthogonal and doubly robust: it remains consistent if either $g$ or $\alpha$ is consistently estimated (Chernozhukov et al., 2021, Hines et al., 2024, Frees et al., 25 Aug 2025).
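The DR estimator can be sketched end-to-end on synthetic data (a hypothetical data-generating process, not from the cited papers): even with a deliberately misspecified linear outcome model, combining it with the true Riesz representer recovers the true effect.

```python
import numpy as np

# End-to-end sketch of the DR estimator on synthetic data (hypothetical DGP,
# not from the cited papers). The outcome model g is deliberately misspecified
# (linear in X while the truth involves exp(X)); combined with the true Riesz
# representer, the DR correction still recovers the true ATE of 2.
rng = np.random.default_rng(2)
n = 200_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))
T = rng.binomial(1, e)
Y = 2 * T + np.exp(X) + rng.normal(size=n)     # true ATE = 2

# Misspecified outcome model: OLS of Y on (1, T, X)
design = np.column_stack([np.ones(n), T, X])
beta = np.linalg.lstsq(design, Y, rcond=None)[0]
def g(t, x):
    return beta[0] + beta[1] * t + beta[2] * x

alpha = T / e - (1 - T) / (1 - e)              # true Riesz representer

psi_plugin = np.mean(g(1, X) - g(0, X))        # plug-in under misspecified g
psi_dr = np.mean(g(1, X) - g(0, X) + alpha * (Y - g(T, X)))
print(psi_plugin, psi_dr)                      # psi_dr ≈ 2.0
```

The correction term $\alpha(Z_i)(Y_i - g(Z_i))$ cancels the first-order bias introduced by the misspecified regression.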

Empirical Performance

CausalSent (Frees et al., 25 Aug 2025), a RieszNet instantiation, achieved 2–3× reduction in mean absolute error of effect estimates relative to propensity-based methods on semi-synthetic IMDB sentiment data. Ensembling over seeds/hyperparameters further stabilizes and improves estimation.

3. RieszNet for Scale-Invariant Image Segmentation

A separate line of work introduces RieszNet as a convolutional neural network with inherent scale equivariance for detection and segmentation in images and volumes, notably for cracks in concrete (Barisin et al., 2023, Barisin et al., 30 Jan 2025).

3.1 Architecture

  • Riesz Layer: Replaces spatial convolutions with linear combinations of first- and second-order Riesz transforms of each feature map.
  • No Pooling: Receptive field expands via repeated Riesz transforms, obviating spatial downsampling.
  • BatchNorm + ReLU: Used after each Riesz layer; both operations preserve scale equivariance.
  • Minimal Parameter Count: Example: 3D RieszNet with ~7,000 parameters vs. 2M for 3D U-Net (Barisin et al., 30 Jan 2025).
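A minimal sketch of such a Riesz layer (first-order transforms only; names and shapes are illustrative, not the authors' implementation):

```python
import numpy as np

def riesz(f, j):
    # First-order Riesz transform along axis j via its Fourier multiplier
    F = np.fft.fftn(f)
    w = np.meshgrid(*[np.fft.fftfreq(n) for n in f.shape], indexing="ij")
    norm = np.sqrt(sum(v**2 for v in w))
    norm[(0,) * f.ndim] = np.inf
    return np.fft.ifftn(-1j * w[j] / norm * F).real

def riesz_layer(x, weights):
    """x: (C_in, H, W); weights: (C_out, C_in, 3) mixing [identity, R_1, R_2].
    Every operation commutes with dilation, so the layer is scale-equivariant.
    Second-order transforms (compositions R_i R_j) are omitted for brevity."""
    feats = np.stack(
        [x] + [np.stack([riesz(ch, j) for ch in x]) for j in (0, 1)],
        axis=-1)                               # (C_in, H, W, 3)
    return np.einsum("oik,ihwk->ohw", weights, feats)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 32, 32))
w = rng.normal(size=(4, 2, 3))
out = riesz_layer(x, w)
print(out.shape)                               # (4, 32, 32)
```

Because the learned parameters only mix channels (no spatial kernel), parameter count per layer is $C_\text{out} \times C_\text{in} \times$ (number of transforms), which helps explain the very small model sizes cited above.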

3.2 Theoretical Guarantee

If the input is scaled by any factor $a>0$, the output is simply the scaled response of the original network:

$$\text{RieszNet}\{f(\cdot/a)\} = \big(\text{RieszNet}\{f\}\big)(\cdot/a).$$

This property holds for arbitrary compositions of RieszNet layers with scale-commuting nonlinearities and normalization (Barisin et al., 2023, Barisin et al., 30 Jan 2025).

3.3 Applications and Results

  • Crack Segmentation: Achieves Dice coefficients of 0.90–0.96 across a wide range of crack widths, substantially outperforming a standard U-Net when generalizing outside the training scale.
  • MNIST Large Scale: RieszNet trained at a single scale generalizes well to unseen scales (accuracy ≈98.5% for scales [0.5,2.0], >80% even at scale 8 with simple padding) (Barisin et al., 2023).

4. Training Protocols and Implementation

4.1 Causal Inference (Automatic Debiasing)

  • Optimization with the Adam optimizer.
  • Losses jointly minimized over the regression and Riesz representer branches, with empirical risk computed over minibatches.
  • Early stopping and regularization crucial to prevent Riesz-head instability, especially due to the unboundedness of the Riesz MSE loss (Hines et al., 2024, Frees et al., 25 Aug 2025).
  • Ensemble multiple models for MAE and bias reduction (Frees et al., 25 Aug 2025).

4.2 Scale-Invariant Perception

  • Semi-synthetic training data generated using fractional Brownian surfaces or Voronoi minimum-weight surfaces, with precise crack masks for supervised training (Barisin et al., 30 Jan 2025).
  • Augmentation with random rotations, zooms, blurring, and gray-value perturbations.
  • Weighted binary cross-entropy to address class imbalance.
  • Training feasible on limited annotated data due to low parameter count and inbuilt scale generalization (Barisin et al., 2023, Barisin et al., 30 Jan 2025).
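A minimal sketch of the weighted binary cross-entropy mentioned above (the weight value is illustrative, not taken from the papers):

```python
import numpy as np

def weighted_bce(y_true, y_pred, w_pos=10.0, eps=1e-7):
    # Weighted binary cross-entropy: the rare positive (crack) class gets
    # weight w_pos; the weight value here is illustrative, not the papers'.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -(w_pos * y_true * np.log(y_pred)
             + (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()

y = np.array([0.0, 0.0, 0.0, 1.0])    # one crack pixel among background
print(weighted_bce(y, np.array([0.1, 0.1, 0.1, 0.9])))
```

Upweighting the positive class keeps the gradient from being dominated by the abundant background pixels.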

5. Extensions and Variants

5.1 Moment-Constrained RieszNet

Moment-constrained learning further stabilizes the Riesz representer by introducing explicit constraints on the empirical moments of the learned Riesz functional, greatly improving robustness to hyperparameter settings (Hines et al., 2024).

5.2 Domain-Specific Adaptations

  • Causal NLP: RieszNet-style architectures (CausalSent) enable interpretable, doubly robust treatment effect estimation at scale with text models, isolating the causal impact of tokens through the learned $\alpha(Z)$ (Frees et al., 25 Aug 2025).
  • Fiber-Reinforced Concrete: RieszNet’s architecture is theoretically advantageous for structures with complex backgrounds; results suggest adaptation to fiber types requires minimal calibration (Barisin et al., 30 Jan 2025).

6. Comparative Assessment and Outlook

RieszNet-type architectures unify practical multitask neural design with rigorous mathematical properties: Neyman orthogonality, double robustness, and scale equivariance. In causal effect estimation, they outperform propensity-based and functional regression competitors in finite-sample and high-dimensional settings, both in simulation and case studies (e.g., the impact of "love" on movie review sentiment, establishing a +2.9% causal effect, substantially below the naive association) (Frees et al., 25 Aug 2025).

In perception tasks, RieszNet achieves parameter efficiency and provable generalization beyond training distribution scales—segmenting real and synthetic cracks, or classifying digits at unknown magnifications—without multi-scale pyramids or scale augmentation (Barisin et al., 2023, Barisin et al., 30 Jan 2025).

Embeddings of RieszNet into group equivariant or moment-constrained frameworks offer promising future improvements. Potential limitations include lack of rotational equivariance (in image RieszNets) and, for Riesz-heads, the risk of instability if unconstrained or improperly regularized. Generalization to entirely novel domains (backgrounds, aggregate mixes, fiber types) and adaptation to 3D/4D imaging remain active directions (Barisin et al., 2023, Barisin et al., 30 Jan 2025, Hines et al., 2024).
