Noise2Noise: Self-Supervised Denoising

Updated 21 January 2026
  • Noise2Noise is a self-supervised denoising methodology that recovers clean signals from independent noisy pairs without needing ground-truth data.
  • It leverages versatile loss functions such as L2, L1, and annealed L0 to adapt effectively to various noise types in domains like imaging, speech, and point clouds.
  • Empirically, Noise2Noise matches or closely approaches supervised training in standard settings, and often surpasses it under complex or highly nonstationary noise.

The Noise2Noise (N2N) methodology is a self-supervised learning paradigm for denoising and image restoration wherein neural network models are trained exclusively on pairs of independently corrupted observations, entirely bypassing the need for clean, ground-truth reference data. Grounded in basic statistical reasoning, N2N enables high-performance restoration in domains as varied as natural image denoising, speech enhancement, medical imaging, point cloud processing, and scientific measurement, and frequently matches or surpasses traditional supervised training in accuracy and robustness. Since its introduction by Lehtinen et al. in 2018, N2N has served both as a foundational tool for efficient noisy-data-only denoiser training and as a building block for extensions accommodating domain adaptation, non-additive noise, non-linear processing, and correlated noise settings (Lehtinen et al., 2018, Zhussip et al., 2019, Tinits et al., 31 Dec 2025).

1. Theoretical Foundation and Core Principle

The essential premise of Noise2Noise is that, for a wide variety of noise processes, the clean signal can be statistically recovered by optimizing a prediction objective in which both the input and the supervision ("target") are corrupted versions of the unknown ground truth. For a clean signal y and two independent noisy observations x_1, x_2, the standard setting is:

x_1 = y + n_1, \quad x_2 = y + n_2

where n_1, n_2 are independent, zero-mean noise processes. Given a denoiser f_\theta, classical supervised training minimizes the loss

\mathcal{L}_{\text{N2C}}(\theta) = \mathbb{E}_{x_1, y} \left[ \|f_\theta(x_1) - y\|^2 \right]

N2N replaces clean targets with another noisy realization:

\mathcal{L}_{\text{N2N}}(\theta) = \mathbb{E}_{x_1, x_2} \left[ \|f_\theta(x_1) - x_2\|^2 \right]

A key result is that for symmetric loss functions (such as L2 for Gaussian-like noise), and under the condition \mathbb{E}[n_i] = 0, minimizing \mathcal{L}_{\text{N2N}} produces in expectation the same estimator as minimizing \mathcal{L}_{\text{N2C}} (Lehtinen et al., 2018, Kashyap et al., 2021). This equivalence holds for mean-seeking (L2), median-seeking (L1), and mode-seeking (annealed L0) losses, enabling unbiased estimation in a variety of regime-appropriate settings.
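This equivalence can be checked numerically for the simplest possible denoiser, a single constant. The sketch below is illustrative (not drawn from any of the cited papers): the L2-optimal constant fit to noisy targets converges to the clean value as the number of pairs grows.

```python
import numpy as np

rng = np.random.default_rng(0)
y = 3.0                               # unknown clean signal (a scalar, for illustration)
n = 200_000
x2 = y + rng.normal(0.0, 1.0, n)      # noisy targets with zero-mean noise

# For the L2 loss, the optimal constant predictor is the mean of the targets.
# With clean targets that minimizer is exactly y; with noisy targets it
# converges to y because the zero-mean noise averages out.
c_n2c = y                             # argmin_c E[(c - y)^2]
c_n2n = x2.mean()                     # argmin_c (1/n) * sum_i (c - x2_i)^2

gap = abs(c_n2n - c_n2c)              # shrinks toward 0 as n grows
```

The same argument carries over from a constant to a parametric denoiser f_\theta: the noisy target only adds a term to the loss that does not depend on \theta in expectation.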

2. Objective Formulations and Loss Functions

A variety of loss functions are employed to address different noise characteristics and restoration desiderata:

  • L2 loss (mean-seeking):

\mathcal{L}_2(\theta) = \mathbb{E}_{x_1, x_2} \|f_\theta(x_1) - x_2\|^2

Suitable for Gaussian and Poisson-like noise.

  • L1 loss (median-seeking):

\mathcal{L}_1(\theta) = \mathbb{E}_{x_1, x_2} |f_\theta(x_1) - x_2|

Appropriate when the noise or corruption is heavy-tailed or outlier-dominated, as in random text removal or severe impulsive noise.

  • Annealed L0 mode-seeking loss:

\mathcal{L}_0(\theta) = \mathbb{E}_{x_1, x_2} \left( |f_\theta(x_1) - x_2| + \epsilon \right)^\gamma

Effective for extreme impulse corruptions.

  • Composite or robust/HDR losses: For high dynamic range (HDR) data, normalized losses such as

\mathcal{L}_{\text{HDR}}(\theta) = \mathbb{E}_{x_1, x_2} \left[ \frac{(f_\theta(x_1) - x_2)^2}{(f_\theta(x_1) + \delta)^2} \right]

guard against outlier amplification (Tinits et al., 31 Dec 2025).

The noise process must be zero-mean, and the noise in the two observations of a training pair must be independent, or at least uncorrelated. Under non-linear postprocessing, nontrivial bias can arise from Jensen's inequality, but theoretically and empirically this bias is controlled for specific tone-mapping functions and loss choices, as detailed in (Tinits et al., 31 Dec 2025).
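The losses above fit in a few lines each. The sketch below uses NumPy; the defaults for eps, gamma, and delta are illustrative choices, not values prescribed by the cited papers.

```python
import numpy as np

# Sketches of the N2N losses above; `pred` stands for f_theta(x1) and
# `target` for the second noisy realization x2.

def loss_l2(pred, target):
    # mean-seeking: suitable for Gaussian- and Poisson-like noise
    return np.mean((pred - target) ** 2)

def loss_l1(pred, target):
    # median-seeking: robust to heavy-tailed and outlier corruption
    return np.mean(np.abs(pred - target))

def loss_l0_annealed(pred, target, eps=1e-8, gamma=0.5):
    # mode-seeking: gamma is annealed toward 0 during training for extreme impulses
    return np.mean((np.abs(pred - target) + eps) ** gamma)

def loss_hdr(pred, target, delta=1e-2):
    # normalized loss that damps the influence of very bright HDR outliers
    return np.mean((pred - target) ** 2 / (pred + delta) ** 2)
```

In practice these drop in for the criterion of any standard training loop; only the pairing of two noisy realizations distinguishes the setup from supervised training.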

3. Training Pipelines and Architectures

Noise2Noise has been instantiated with a broad range of architectures and data regimes, with domain-specific adaptation:

Field | Architectural Paradigm | Notable Implementation Details
--- | --- | ---
Natural Images | U-Net | Encoder-decoder, skip connections (Lehtinen et al., 2018)
Speech Denoising | FCNN, Deep Complex U-Net, TCN/UConv | Raw waveform or STFT, complex convolution (Kashyap et al., 2021, Alamdari et al., 2019, Li et al., 2023)
Medical Imaging | U-Net or Modular U-Net | Neighboring slices, regional weighting (Zhu et al., 2023, Zhou et al., 2024)
Multi-Channel Tomography | ResNet-50 U-Net | 2-channel input from adjacent energy/time bins (Zharov et al., 2023)
Point Clouds | Dynamic EdgeConv (DGCNN), multi-step networks | EMD loss, cross-view consistency (Zhou et al., 29 Oct 2025)
Monte Carlo Denoising | U-Net | HDR-robust and nonlinear loss (Tinits et al., 31 Dec 2025)

Common training elements include the Adam optimizer, data augmentation tailored to modality (e.g., crops, rotations, elastic deformation for imaging), and duplication or synthetic generation of independent noise realizations when only single noisy samples are available (e.g., via patchwise pairing, odd–even subsampling, channel combination, or spatial/temporal subsampling) (Yang et al., 2023, Zharov et al., 2023). Modular architectures, such as repeated U-Net stacking with shared weights, have been used to increase effective receptive field without parameter bloat (Zhu et al., 2023).
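One of the pairing strategies mentioned above, odd–even subsampling, can be sketched as follows. This is a toy 1-D version under the assumption that the clean signal varies slowly relative to the sampling rate; real pipelines apply the same idea to frames, slices, or channels.

```python
import numpy as np

def odd_even_pairs(x):
    """Split a single noisy signal into two quasi-independent noisy views.

    Assumes the clean signal is slowly varying, so the two subsequences
    observe (approximately) the same signal with independent noise.
    """
    x = np.asarray(x)
    n = len(x) - (len(x) % 2)          # drop a trailing sample if length is odd
    return x[0:n:2], x[1:n:2]

rng = np.random.default_rng(1)
noisy = np.sin(np.linspace(0, np.pi, 1001)) + rng.normal(0, 0.1, 1001)
x1, x2 = odd_even_pairs(noisy)         # a training pair for an N2N objective
```

Either view can serve as input with the other as target; many pipelines use both orderings to double the effective training data.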

4. Extensions, Generalizations, and Limitations

Noise2Noise has prompted numerous extensions addressing both theoretical and practical limitations:

  • Correlated Noise and Imperfect Ground Truth: Standard N2N requires independent noise; when only correlated pairs or mildly noisy pseudo-ground-truth are available, extended SURE (eSURE) generalizes the N2N objective by introducing a Stein unbiased risk correction term (Zhussip et al., 2019). eSURE converges to N2N in the independent case but remains unbiased under correlation.
  • Nonlinear Processing: The application of nonlinear transformations (tone-mapping, gamma correction) is generally biased due to mismatch of expectations, but for mappings with low curvature in signal range and under low-noise, practical bias is negligible (Tinits et al., 31 Dec 2025).
  • Domain Adaptation and Bootstrapping: In domain adaptation for speech enhancement, N2N is embedded within teacher-student remixing pipelines (e.g., Re2Re), where in-domain noisy data is synthesized via pseudo-source separation, and two independently remixed mixtures provide noisy-noisy supervision for the student (Li et al., 2023).
  • Single-volume or Single-image Self-supervision: Where only a single noisy volume is available, local structure is exploited either by using spatially matched regions across adjacent slices (NS-N2N) or by subdividing signals temporally or spatially to generate quasi-independent pairs (Zhou et al., 2024, Yang et al., 2023).

Key limitations arise when the zero-mean or independence assumption is violated, such as with persistent annotation overlays or correlated readout noise. Biases emerge if the conditional mean/median/mode of the corrupted signal does not coincide with the clean signal, unless remedied via architectural or data adaptation (Zhang et al., 2023). N2N is mostly agnostic to explicit noise modeling, but complex or unknown correlations in the real noise may warrant hybrid or augmented training strategies.
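The nonlinear-processing bias noted above is easy to observe numerically. The toy check below uses an illustrative gamma tone map (not one from the cited work) to show the expectation of a tone-mapped noisy target drifting away from the tone-mapped clean value.

```python
import numpy as np

rng = np.random.default_rng(2)
y = 0.5                                # clean value, comfortably inside [0, 1]
x = np.clip(y + rng.normal(0.0, 0.2, 1_000_000), 0.0, None)

def tone_map(v):
    return v ** (1 / 2.2)              # gamma correction (concave on v >= 0)

# Jensen's inequality: for a concave map, E[g(x)] < g(E[x]) = g(y) when the
# noise is zero-mean, so a tone-mapped noisy target is a biased stand-in for
# the tone-mapped clean signal.
bias = tone_map(x).mean() - tone_map(y)
```

The bias shrinks with the noise variance and with the curvature of the map over the signal range, which is the regime in which (Tinits et al., 31 Dec 2025) report it to be practically negligible.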

5. Empirical Performance and Quantitative Results

Noise2Noise matches or only slightly underperforms conventional supervised denoising in standard settings, with negligible gaps under strong noise, and frequently surpasses direct Noise2Clean methods when the noise is complex or highly nonstationary. Key quantitative findings include:

  • Image Denoising (Gaussian, Poisson, Bernoulli): N2N matches supervised models (e.g., RED30 and U-Net) within ±0.02 dB PSNR (Lehtinen et al., 2018).
  • Speech Enhancement: N2N outperforms supervised counterparts under real-world, nonstationary acoustic backgrounds (PESQ, STOI, SNR improvements up to 0.5 dB or 0.05 absolute) (Kashyap et al., 2021, Alamdari et al., 2019).
  • Medical Imaging: NS-N2N achieves PSNR up to 40.06 dB and SSIM 0.9374 in CT, nearly matching supervised N2C, outpacing other self-supervised baselines by large margins (Zhou et al., 2024, Zhu et al., 2023).
  • Monte Carlo Rendering: Nonlinear N2N-trained models reach rMSE and DSSIM values within a small factor of fully supervised equivalents, despite using 512× less reference data (Tinits et al., 31 Dec 2025).
  • Point Clouds: Unsupervised N2N-matching with EMD loss delivers denoising/upsampling performance competitive with supervised methods (Zhou et al., 29 Oct 2025).

These outcomes are robust across noise types—including Poisson, Gaussian, impulsive, outlier-heavy, and compound noise—provided the mean-independence assumption is sufficiently approximated in practice.

6. Domain-specific Adaptations

N2N is operationalized in diverse modalities via tailored pairwise data constructions and loss adaptations:

  • Astronomical Imaging: CNN-based N2N denoising accurately recovers flux under Poisson (98.1%) and Gaussian (96.5%) noise in smooth signal regions (Zhang et al., 2022).
  • Tomographic Multi-channel Imaging: Adjacent energy/time channels approximating the same underlying structure serve as effective noisy-noisy pairs, enabling greater dose reductions (Zharov et al., 2023).
  • Ultrasound Annotation Removal: Artificial annotation overlays are randomized in position and style to induce the necessary noise properties, supporting unsupervised restoration of clinical frames (Zhang et al., 2023).
  • Sensor and Accelerometer Data: Periodic or odd–even samplers create pairs appropriate for instrument signal denoising, outperforming conventional smoothing and filtering (Yang et al., 2023).

Generalization to new domains typically requires only the ability to (a) generate or identify approximately independent, zero-mean corruptions, and (b) encode the appropriate loss for the target signal statistic (mean, median, or mode), as deduced from domain structure and noise phenomenology.
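Those two ingredients, a pair-construction rule and a loss, are all a minimal N2N training loop needs. The sketch below fits a 5-tap linear filter (a deliberately tiny stand-in for a neural denoiser) to noisy pairs by gradient descent on the L2 N2N objective; the filter length, step size, and test signal are illustrative choices, and the clean signal is used only for evaluation.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 4 * np.pi, 2000)
clean = np.sin(t)                          # used only to evaluate, never to train
x1 = clean + rng.normal(0, 0.3, t.size)    # two independent noisy realizations
x2 = clean + rng.normal(0, 0.3, t.size)

k = 5
w = np.zeros(k)
w[k // 2] = 1.0                            # start from the identity filter

for _ in range(500):
    pred = np.convolve(x1, w, mode="same")
    resid = pred - x2                      # N2N residual: only noisy data used
    # gradient of the mean squared error with respect to each filter tap
    grad = np.array([np.mean(resid * np.roll(x1, i - k // 2)) for i in range(k)])
    w -= 0.2 * grad

denoised = np.convolve(x1, w, mode="same")
mse_noisy = np.mean((x1 - clean) ** 2)     # ~0.09 by construction
mse_denoised = np.mean((denoised - clean) ** 2)
```

Despite never seeing the clean signal, the learned filter drifts from the identity toward a smoothing kernel, because smoothing reduces the expected distance to the second noisy realization.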

7. Impact and Influence on Denoising Research

Noise2Noise has redefined the standard paradigm of supervised denoiser training by removing the need for ground-truth labels in a broad class of noise models, leading to significant practical advantages:

  • Data Collection Efficiency: The requirement for only noisy pairs dramatically lowers dataset acquisition cost in domains where ground-truth is rare, expensive, or impossible (e.g., microscopy, astronomy, medical imaging).
  • Algorithmic Simplicity: Most N2N pipelines function independently of explicit noise model estimation, obviating the need for complex prior or likelihood modeling.
  • Theoretical Generality: N2N and its extensions (e.g., eSURE, non-linear N2N, Re2Re) have unified several strands of noise-robust learning, enabling principled denoiser training even with correlated, nonlinearly transformed, or compounded noise.
  • Catalyst for Self-Supervised Learning: The methodology has motivated hybrid approaches that combine N2N with alternative self-supervised, data-bootstrapped, or single-volume strategies, particularly in resource-constrained or high-throughput regimes.

A plausible implication is that, as noise processes in practical sensing systems become more complex and less amenable to analytic modeling, self-supervised paradigms like N2N—and their integrated generalizations—will increasingly dominate denoising, enhancement, and restoration pipelines across scientific, industrial, and consumer applications.


References:

  • "Noise2Noise: Learning Image Restoration without Clean Data" (Lehtinen et al., 2018)
  • "Extending Stein's unbiased risk estimator to train deep denoisers with correlated pairs of noisy images" (Zhussip et al., 2019)
  • "Nonlinear Noise2Noise for Efficient Monte Carlo Denoiser Training" (Tinits et al., 31 Dec 2025)
  • "Improving Deep Speech Denoising by Noisy2Noisy Signal Mapping" (Alamdari et al., 2019)
  • "Speech Denoising Without Clean Training Data: A Noise2Noise Approach" (Kashyap et al., 2021)
  • "Self-supervised Noise2noise Method Utilizing Corrupted Images with a Modular Network for LDCT Denoising" (Zhu et al., 2023)
  • "Neighboring Slice Noise2Noise: Self-Supervised Medical Image Denoising from Single Noisy Image Volume" (Zhou et al., 2024)
  • "U-CAN: Unsupervised Point Cloud Denoising with Consistency-Aware Noise2Noise Matching" (Zhou et al., 29 Oct 2025)
  • "Ultrasonic Image's Annotation Removal: A Self-supervised Noise2Noise Approach" (Zhang et al., 2023)
  • "Remixed2Remixed: Domain adaptation for speech enhancement by Noise2Noise learning with Remixing" (Li et al., 2023)
  • "Shot Noise Reduction in Radiographic and Tomographic Multi-Channel Imaging with Self-Supervised Deep Learning" (Zharov et al., 2023)
  • "Noise2NoiseFlow: Realistic Camera Noise Modeling without Clean Images" (Maleky et al., 2022)
  • "Unsupervised noise reductions for gravitational reference sensors or accelerometers based on Noise2Noise method" (Yang et al., 2023)
  • "Noise2Astro: Astronomical Image Denoising With Self-Supervised NeuralNetworks" (Zhang et al., 2022)
