Compression-Based Denoising
- Compression-based denoising is a set of techniques that use lossy compression frameworks to remove noise while preserving essential signal structures.
- The approach integrates classical transforms and neural autoencoders, applying strategies like feature guidance and latent-space optimization for effective noise reduction.
- Practical implementations demonstrate superior rate-distortion performance and robustness, enabling advancements in image, text, and sensor data processing.
Compression-based denoising refers to a family of techniques that leverage lossy compression frameworks—classical or learned—as implicit or explicit denoisers. Whether applied to signals, images, text, or other structured data, these approaches exploit the fact that lossy compressors are inherently constructed to preserve signal structure while discarding unpredictable (and hence often noisy) details. This class includes both methods that integrate denoising within compression models and methods that use compression transforms "as is" to remove noise, sometimes in the zero-shot setting without supervision. Compression-based denoising has been made theoretically rigorous, realized algorithmically in both shallow and deep architectures, and demonstrated to outperform classical pipelines in rate–distortion performance, robustness, and generalization.
1. Theoretical Foundations and Information-Theoretic Guarantees
The core theoretical insight is that optimal lossy compression, with the distortion threshold set to the noise variance, necessarily discards the noise component of a signal. A classic result is that for i.i.d. Gaussian noise added to a clean signal X, compressing the noisy observation Y = X + N at a distortion level equal to the noise variance yields a reconstruction statistically equivalent to the conditional mean estimator E[X | Y] (Zafari et al., 27 Mar 2025). Recent generalizations extend this to arbitrary stationary ergodic sources and general discrete memoryless channels (DMCs), showing that if the lossy compressor's distortion measure is chosen to match the negative log of the channel conditional p(y|x), the asymptotic loss matches that of sampling two independent draws from the conditional posterior p(x|y), leading to rigorous upper bounds for mean-square and Hamming losses (Song et al., 16 Dec 2025).
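For the Gaussian case, the equivalence to the conditional-mean estimator can be seen directly from the rate–distortion test channel. The following is a sketch of the standard argument in generic notation, not quoted from the cited works:

```latex
% X ~ N(0, \sigma_x^2), noise N ~ N(0, \sigma_n^2), Y = X + N,
% so \sigma_y^2 = \sigma_x^2 + \sigma_n^2. Compress Y at distortion D = \sigma_n^2.
% The optimal Gaussian test channel has conditional mean
\mathbb{E}[\hat{Y} \mid Y]
  = \Bigl(1 - \tfrac{D}{\sigma_y^{2}}\Bigr) Y
  = \frac{\sigma_x^{2}}{\sigma_x^{2} + \sigma_n^{2}}\, Y
  = \mathbb{E}[X \mid Y],
\qquad
R(D) = \tfrac{1}{2}\log\Bigl(1 + \tfrac{\sigma_x^{2}}{\sigma_n^{2}}\Bigr).
```

That is, the rate–distortion-optimal reconstruction of the noisy signal at distortion equal to the noise variance coincides (in conditional mean) with the Wiener estimator of the clean signal.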
For neural compression with entropy constraints, finite-sample generalization error bounds directly relate rate, worst-case distortion, and noise level, establishing that patch-wise neural compression-based maximum-likelihood denoisers approach Bayes-optimal performance with high probability under suitable settings (Zafari et al., 15 Jun 2025). This underpins zero-shot denoising with no explicit external data.
2. Algorithmic Approaches and Architectures
Compression-based denoising algorithms fall into several architectural paradigms:
- Classical transforms: Techniques such as PCA or wavelet-based transforms perform denoising by projecting data onto principal subspaces or by thresholding coefficients, exploiting the fact that noise is largely “incompressible” in these bases (Mukherjee et al., 2022).
- Vector quantization and multi-layer quantization: Regularized residual quantization (RRQ) applies successive quantization steps, each layer projecting the residual onto codebooks designed from the high-variance “energy” subspace of clean training data, naturally discarding out-of-subspace noise in the test set (Ferdowsi et al., 2017).
- Learned autoencoders and neural compressors: End-to-end differentiable codecs, equipped with an explicit or implicit rate–distortion objective and entropy model, are trained on noisy/clean image pairs, on clean data alone, or even on a single noisy image (the zero-shot regime). Denoising arises when the reconstruction target is the noise-free signal rather than the noisy input (Zafari et al., 27 Mar 2025, Brummer et al., 2023, Cheng et al., 2022, Zafari et al., 15 Jun 2025).
- Transformers and joint decoding adaptations: Add-on modules such as latent refinement and prompt generators adapt pre-trained Transformer-based image decoders to denoising tasks without needing separate decoders or changes to compressed bitstreams, operating efficiently in the latent space (Chen et al., 2024).
- Contrastive learning and multi-scale denoising: Recent advances use contrastive losses to align noisy and clean representations in the encoder feature space and integrate multi-scale, non-linear denoisers (e.g., Self-ONNs) for robustness to diverse noise types (Xie et al., 2024).
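The classical-transform bullet above can be illustrated end to end: the sketch below applies an orthonormal Haar transform to a noisy piecewise-constant signal and hard-thresholds the detail coefficients, a minimal stand-in for the "noise is incompressible in this basis" principle. The test signal, noise level, and universal-threshold rule are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def haar_fwd(x):
    """Full orthonormal Haar decomposition (length must be a power of two)."""
    x = x.astype(float).copy()
    coeffs, n = [], len(x)
    while n > 1:
        a = (x[:n:2] + x[1:n:2]) / np.sqrt(2.0)   # scaling (average) part
        d = (x[:n:2] - x[1:n:2]) / np.sqrt(2.0)   # detail (difference) part
        coeffs.append(d)
        x[: n // 2] = a
        n //= 2
    return x[0], coeffs                            # coarsest average + details

def haar_inv(a0, coeffs):
    """Invert haar_fwd: rebuild from coarsest level outward."""
    x = np.array([a0])
    for d in reversed(coeffs):
        up = np.empty(2 * len(x))
        up[0::2] = (x + d) / np.sqrt(2.0)
        up[1::2] = (x - d) / np.sqrt(2.0)
        x = up
    return x

rng = np.random.default_rng(0)
n, sigma = 256, 0.3
clean = np.where(np.arange(n) < n // 2, 0.0, 1.0)      # piecewise-constant signal
noisy = clean + sigma * rng.normal(size=n)

a0, det = haar_fwd(noisy)
thr = sigma * np.sqrt(2.0 * np.log(n))                 # universal threshold
det = [np.where(np.abs(d) > thr, d, 0.0) for d in det] # hard thresholding
denoised = haar_inv(a0, det)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_den = np.mean((denoised - clean) ** 2)
```

Because the clean signal concentrates its energy in a handful of Haar coefficients while white noise spreads uniformly across all of them, thresholding removes most of the noise energy while leaving the signal structure intact.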
3. Joint Compression–Denoising Models
Joint models optimize both rate (bitrate of the compressed representation) and denoising (fidelity to clean signal) objectives in a single end-to-end learning framework. Approaches include:
- Feature Guidance Branches: Weight-sharing encoders process both noisy and clean images in parallel during training, using feature-level guidance losses to ensure latents align with clean representations, ensuring that the encoded information is “noise aware” (Cheng et al., 2022).
- Latent-space scalability: Base layers are optimized to decode denoised signals while enhancement layers enable reconstructing the original, potentially noisy, input. This allows flexible trade-off between denoising and fidelity to original data (Alvar et al., 2022).
- Contrastive regularization: Contrastive learning draws denoised features of noisy inputs closer to the clean features, explicitly encouraging the encoder to ignore noise and latch onto genuine signal content (Xie et al., 2024).
- Orthogonal features and fractional autoencoders: Input features are compacted into a small energy-concentrating basis (Tchebichef moments, PCA), with subsequent neural encoding, and network weights themselves are aggressively compressed by randomized low-rank factorization (RSVD), with fractional-order gradient updates further enhancing compression/denoising robustness (Nagar et al., 2021).
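The joint objectives above share a common shape: a rate term on the quantized latent, a distortion term measured against the clean target, and optionally a feature-guidance penalty aligning noisy-input features with clean-input features. A numpy sketch of such a combined loss follows; the function names, weights, and histogram-based rate proxy are illustrative assumptions, not any cited model's actual objective.

```python
import numpy as np

def rate_proxy(latent_q):
    """Empirical-entropy estimate (bits/symbol) of an integer-quantized latent."""
    _, counts = np.unique(latent_q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def joint_loss(latent, decoded, clean, noisy_feat, clean_feat,
               lam=0.01, gamma=0.1):
    """Rate + denoising distortion + feature-guidance objective (sketch)."""
    latent_q = np.round(latent)                         # hard quantization
    rate = rate_proxy(latent_q)                         # bitrate surrogate
    distortion = np.mean((decoded - clean) ** 2)        # fidelity to CLEAN target
    guidance = np.mean((noisy_feat - clean_feat) ** 2)  # align encoder features
    return lam * rate + distortion + gamma * guidance

rng = np.random.default_rng(1)
clean = rng.normal(size=(16, 16))
decoded = clean + 0.05 * rng.normal(size=clean.shape)   # stand-in decoder output
latent = rng.normal(scale=3.0, size=64)
loss = joint_loss(latent, decoded, clean,
                  noisy_feat=rng.normal(size=32),
                  clean_feat=rng.normal(size=32))
```

In a real codec each term would be differentiable (soft quantization, learned entropy model), but the decomposition of the objective is the same.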
4. Practical Implementations and Evaluation
Empirical results consistently demonstrate that compression-based denoising delivers competitive or superior performance across domains:
- Image denoising: Joint neural compression–denoising codecs surpass cascaded “denoise then compress” or “compress then denoise” pipelines in rate-distortion, yielding up to 80% BD-rate savings compared to sequential approaches, especially under high noise (Alvar et al., 2022, Cheng et al., 2022, Brummer et al., 2023, Xie et al., 2024).
- Zero-shot and single-image denoising: Neural compression-based methods trained solely on patches from a single noisy image match or exceed classical baselines (BM3D), with natural entropy constraints preventing overfitting (Zafari et al., 27 Mar 2025, Zafari et al., 15 Jun 2025).
- Adversarial robustness: Automated pipelines that intertwine denoising autoencoders and compression—by projecting inputs into low-dimensional codes—demonstrably improve robustness against adversarial perturbations, cut training time, and enable modular “defense infrastructure” design (Mahfuz et al., 2021).
- Outlier detection and clustering: PCA interpreted through the “compression ratio” metric not only quantifies denoising effect but also enables unsupervised detection of outliers in high-dimensional data, with downstream gains in clustering accuracy (Mukherjee et al., 2022).
- Domain specialization: Compression-based models generalize across noise sources, transfer to microscopy and EEG denoising, and support flexible deployments: instance-level denoising, error-bounded releases of user-generated multimedia, and joint compressed sensing (Zheng et al., 2017, Xue et al., 2017, Ferdowsi et al., 2017).
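The compression-ratio view of PCA in the outlier-detection bullet can be sketched directly: samples that compress poorly under a low-rank projection (large reconstruction error) are flagged as outliers. The data generation and 3-sigma flagging rule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n_in, n_out = 20, 3, 500, 10

# Inliers live near a random k-dimensional subspace; outliers do not.
basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
inliers = rng.normal(size=(n_in, k)) @ basis.T + 0.05 * rng.normal(size=(n_in, d))
outliers = rng.normal(size=(n_out, d))            # isotropic, off-subspace
X = np.vstack([inliers, outliers])

# Fit PCA on the pooled data and measure per-sample reconstruction error.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:k].T @ Vt[:k]                     # rank-k "compression"
err = np.sum((Xc - proj) ** 2, axis=1)            # residual = compression cost

flagged = err > err.mean() + 3 * err.std()        # simple outlier rule
```

Inliers compress almost losslessly into k coordinates, so their residuals stay near the noise floor, while off-subspace outliers retain most of their energy in the residual.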
5. Domain-Specific Variants and Extensions
Compression-based denoising adapts to diverse settings:
- Text: Sequence-to-sequence denoising autoencoders, with stochastic “noising” (insertions, permutations), and decoder length constraints induce extractive–abstractive compression for tasks like sentence summarization in a fully unsupervised manner (Févry et al., 2018).
- Compressed Sensing: Turbo-CS and its extension D-Turbo-CS exploit arbitrary denoisers as compression-informed projections in iterative linear recovery, robustly handling non-i.i.d. structures where typical AMP algorithms fail (Xue et al., 2017).
- Contaminated source coding: In UGC compression, noise-corrupted sources are optimally handled by using denoised references as the surrogate for distortion computation, guiding codecs to avoid allocating excessive rate to irreducible artifacts (Pavez et al., 2022).
- Contour and graph denoising: Joint MAP formulations that simultaneously encode rate and denoising likelihood with explicit error models yield significant bit-rate and fidelity improvements over stage-wise pipelines in shape representation (Zheng et al., 2017).
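The compressed-sensing bullet can be illustrated with the simplest denoiser-in-the-loop recovery: an ISTA-style iteration whose per-step "denoiser" is soft thresholding. This is a generic sketch of the plug-in idea, not the Turbo-CS or D-Turbo-CS algorithm itself; the dimensions, sparsity level, and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 100, 50, 5

A = rng.normal(size=(m, n)) / np.sqrt(m)          # random sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)
y = A @ x_true                                     # noiseless measurements

def soft(v, t):
    """Soft-threshold 'denoiser': shrink toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2                      # Lipschitz constant of gradient
x = np.zeros(n)
lam = 0.05
for _ in range(1000):                              # gradient step, then denoise
    x = soft(x + (A.T @ (y - A @ x)) / L, lam / L)

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

Swapping `soft` for a stronger denoiser tuned to the signal class is precisely the generalization the plug-in literature exploits when i.i.d. sparsity assumptions fail.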
6. Advantages, Limitations, and Future Directions
Advantages:
- Natural, theory-grounded regularization enforces consistent denoising without overfitting—even in the absence of ground-truth supervision or large datasets (Zafari et al., 15 Jun 2025, Zafari et al., 27 Mar 2025, Nagar et al., 2021).
- Joint optimization of rate and fidelity ensures that bits are allocated away from noise and toward valuable signal structure.
- The joint models provide scalable representations, facilitate outlier detection, and transfer efficiently to related tasks (super-resolution, deblocking, compressed sensing).
- Compression-based denoising models are parameter- and compute-efficient, particularly when implemented in the latent or frequency domain, outperforming large sequential pipelines (Chen et al., 2024, Cheng et al., 2022, Brummer et al., 2023).
Limitations and Open Problems:
- Performance may degrade when noise characteristics are strongly outside training support or particularly non-stationary; the guidance term may over-smooth hard-to-distinguish textures (Cheng et al., 2022, Brummer et al., 2023).
- Outlier detection via compression-ratio is sensitive to unbalanced or very small communities; PCA-based methods may miss non-linear structure (Mukherjee et al., 2022).
- Designing loss functions that match arbitrary channel statistics or perceptual quality criteria remains challenging; current approaches typically focus on MSE or SSIM.
- The theory indicates a fundamental limit on MSE: for general DMCs, compression-based schemes at best match the two-sample posterior loss and do not always attain the indirect rate–distortion minimum (Song et al., 16 Dec 2025).
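For squared error, the two-sample posterior benchmark relates to the MMSE by a standard identity (restated here in generic notation, not quoted from the cited paper):

```latex
% X_1, X_2 drawn i.i.d. from the posterior p(x \mid y):
\mathbb{E}\,\|X_1 - X_2\|^2
  = 2\,\mathbb{E}\bigl[\operatorname{tr}\operatorname{Cov}(X \mid Y)\bigr]
  = 2\,\mathrm{MMSE},
```

so a scheme that matches the two-sample posterior loss operates at a factor of two above the Bayes-optimal mean-square error.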
Future directions include expansion to variable-rate and conditional codecs, adaptation to real sensor/RAW domain noise, integration with multi-task pipelines (denoising, super-resolution, deblurring), improved unsupervised loss selection, and theory-driven architectural advances leveraging the deeper understanding of information-theoretic denoising in high-dimensional, heterogeneous, or structured data regimes.