SCNet: Speckle-Free Ultrathin MMF Imaging
- SCNet is a physics-guided foundation model that integrates a Mixture-of-Experts architecture, wavelet-based frequency decomposition, and curriculum learning to effectively remove speckle noise.
- It employs a material-aware gating network and a Haar DWT to isolate and suppress speckle interference while preserving essential image details in photon-limited MMF imaging.
- SCNet achieves real-time performance with significant compute reduction, outperforming conventional denoising methods in SSIM, PSNR, and RMSE across diverse imaging protocols.
Speckle Clean Network (SCNet) is a physics-guided foundation model designed for universal speckle removal in ultrathin multimode fiber (MMF) imaging. MMF-based endoscopy enables imaging with probe diameters on the scale of human hair, but is fundamentally limited by speckle noise when using a small collection aperture. SCNet integrates a Mixture of Experts (MoE) architecture, material-aware feature routing, and wavelet-based frequency decomposition, combined with curriculum-style optimization that first enforces spectral consistency then spatial fidelity. The model achieves real-time, high-fidelity image recovery from photon-limited, single-fiber measurements without the need for material- or domain-specific retraining, thus decoupling image quality from probe size and enabling practical speckle-free ultrathin endoscopy applications (Zeng et al., 10 Jan 2026).
1. Architecture and Model Components
SCNet adopts a Mixture-of-Experts backbone coupled to a wavelet-U-Net encoder–decoder and a lightweight material-aware gating network (ECCNet). The processing pipeline is as follows:
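- The input speckle image is processed by a single-level Haar discrete wavelet transform (DWT), yielding four frequency sub-bands (one low-frequency band and three detail bands).
- The sub-band features pass through a shared shallow encoder that extracts general-purpose representations before any material-specific processing.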
- The material-aware gating network (ECCNet) computes a softmax over five canonical material classes. Each input is routed to a single expert branch, minimizing compute overhead.
- Each expert is a U-Net variant with wavelet-domain attention and learnable skip connections, tailored to a specific material class.
- The network output is reconstructed via inverse DWT, delivering a speckle-cleaned image.
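The gating mechanism for expert selection computes probabilities as
$$p_k = \frac{\exp(g_k(z))}{\sum_{j=1}^{5} \exp(g_j(z))}, \qquad k^{*} = \arg\max_k p_k,$$
where $z$ is a low-complexity feature embedding, $g_k(z)$ is the gating logit for material class $k$, and only the highest-probability expert $k^{*}$ is activated.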
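The top-1 routing step can be illustrated with a small standard-library sketch. It is not the ECCNet implementation: the linear gating layer, the toy weights, and the five class labels (guessed from the dataset list) are all assumptions made for illustration.

```python
import math

# Hypothetical class labels, guessed from the dataset description.
MATERIALS = ["plastic", "paper", "metal", "vegetation", "tissue"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(embedding, weights, biases):
    """Return (expert_index, probabilities) for one feature embedding.

    A single linear layer stands in for the gating network (assumption).
    """
    logits = [sum(w * z for w, z in zip(row, embedding)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs

# Toy embedding and gating weights (illustrative values only).
embedding = [0.2, -1.0, 0.5]
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 1.0, 0.0],
           [0.0, -1.0, 1.0]]
biases = [0.0] * 5

expert, probs = route_top1(embedding, weights, biases)
print(MATERIALS[expert], [round(p, 3) for p in probs])
```

Because only the selected expert runs, the per-image cost is that of one expert branch plus the gate, regardless of how many experts exist.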
2. Wavelet-Based Frequency Decomposition
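To separate multiplicative speckle from image structure, each input $I$ undergoes a single-level Haar DWT,
$$\mathrm{DWT}(I) = \{I_{LL},\ I_{LH},\ I_{HL},\ I_{HH}\},$$
where $I_{LL}$ encodes low-frequency structure, while $I_{LH}$, $I_{HL}$, and $I_{HH}$ capture orthogonal detail bands. The encoder treats these sub-bands as distinct channels, and a channel-attention module applies spectral gating:
$$\tilde{F} = \sigma\big(W_2\,\delta(W_1\,\mathrm{GAP}(F))\big) \odot F,$$
where $F$ is the stacked sub-band feature map, $\mathrm{GAP}$ is global average pooling, $\delta$ is ReLU, $\sigma$ is sigmoid activation, $W_1$ and $W_2$ are learnable projections, and $\odot$ denotes channel-wise scaling. This arrangement selectively suppresses frequency channels dominated by speckle and preserves channels containing structural information, so the denoiser exploits explicit frequency semantics instead of learning them implicitly.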
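The decomposition and gating described above can be sketched in pure Python on a toy image. This is an illustration under stated assumptions, not the SCNet code: the gate weights `w1` and `w2` are hypothetical, the gate has the common squeeze-and-excitation shape (pool, project, ReLU, project, sigmoid), and the LH/HL naming follows one of several conventions in use.

```python
import math

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT; img is a list of rows, even height and width."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # LL: scaled local average
            rows[1].append((a + b - c - d) / 2)  # LH: top row minus bottom row
            rows[2].append((a - b + c - d) / 2)  # HL: left column minus right column
            rows[3].append((a - b - c + d) / 2)  # HH: diagonal difference
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    img = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, v, h, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + v + h + d) / 2, (s + v - h - d) / 2]
            bottom += [(s - v + h - d) / 2, (s - v - h + d) / 2]
        img += [top, bottom]
    return img

def channel_gates(bands, w1, w2):
    """Per-band gates: sigmoid(W2 . relu(W1 . GAP(bands)))."""
    pooled = [sum(map(sum, band)) / (len(band) * len(band[0])) for band in bands]
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]

def apply_gates(bands, gates):
    """Scale every coefficient of each band by that band's gate."""
    return [[[g * v for v in row] for row in band] for band, g in zip(bands, gates)]

img = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
bands = haar_dwt2(img)
recon = haar_idwt2(*bands)                  # round trip without gating

w1 = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.1, 0.1]]  # hypothetical 4 -> 2 projection
w2 = [[1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]  # hypothetical 2 -> 4 projection
gates = channel_gates(bands, w1, w2)
cleaned = haar_idwt2(*apply_gates(bands, gates))
```

With these toy weights the low-frequency band keeps most of its amplitude while the three detail bands are attenuated, which is the qualitative behaviour the text attributes to spectral gating. In a trained network the gates are learned and act on encoder feature channels, not on raw sub-bands.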
3. Physics-Guided Curriculum Optimization
Training employs a two-stage “coarse-to-fine” curriculum to promote physically meaningful learning:
- Stage 1: Frequency-Domain Consistency. Predictions $\hat{Y}$ and ground truth $Y$ are DWT-decomposed, and a weighted Charbonnier loss is applied on each frequency band:
$$\mathcal{L}_{\mathrm{freq}} = \sum_{b \in \{LL, LH, HL, HH\}} \lambda_b \sqrt{\lVert \hat{Y}_b - Y_b \rVert^2 + \epsilon^2},$$
with higher weights $\lambda_b$ for the detail bands ($LH$, $HL$, $HH$) to ensure early learning in noise-sensitive regions.
- Stage 2: Spatial-Domain Fidelity. Fine-tuning is performed using a hybrid pixel-domain objective, supplemented by PSNR or SSIM penalties. The training schedule gradually shifts emphasis from spectral to spatial metrics.
This staged optimization ensures initial spectral alignment before enforcing pixel-domain accuracy, addressing the complexity of speckle in MMF imaging.
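A minimal sketch of the two ingredients above, the band-weighted Charbonnier loss and a schedule that moves weight from the spectral term to the spatial term, is given below. The band weights, the `eps` value, and the linear blending rule are assumptions chosen for illustration; the source does not specify them.

```python
import math

def charbonnier(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(diff^2 + eps^2) over paired values."""
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)

# Hypothetical weights: detail bands count more than the LL band.
BAND_WEIGHTS = {"LL": 1.0, "LH": 2.0, "HL": 2.0, "HH": 2.0}

def frequency_loss(pred_bands, target_bands, weights=BAND_WEIGHTS):
    """Weighted sum of per-band Charbonnier losses (bands given as flat lists)."""
    return sum(w * charbonnier(pred_bands[b], target_bands[b])
               for b, w in weights.items())

def curriculum_loss(freq_loss, spatial_loss, step, total_steps):
    """Blend the two losses; a linear ramp is assumed here."""
    alpha = min(1.0, max(0.0, step / total_steps))
    return (1.0 - alpha) * freq_loss + alpha * spatial_loss

# Toy check: identical bands leave only the eps floor of each term.
bands = {"LL": [1.0, 2.0], "LH": [0.1, -0.1], "HL": [0.0, 0.2], "HH": [0.0, 0.0]}
floor = frequency_loss(bands, bands)
midpoint = curriculum_loss(2.0, 4.0, step=5, total_steps=10)
```

The Charbonnier penalty behaves like an L1 loss for large errors and like an L2 loss near zero, which makes it less sensitive to outlier pixels than plain MSE.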
4. Datasets, Calibration, and Data Protocol
SCNet is trained and validated on diverse datasets spanning plastics (Lego minifigs), paper (USAF targets, text), metal (engraved steel), vegetation (multiple leaf types), and biological tissues (rabbit heart, kidney), totaling over 160,000 training images across domains. Images are cropped to 220×220 px and normalized; intensity and color augmentations are deliberately omitted to preserve subtle contrasts, especially in biological tissues.
The physical imaging setup leverages a dual-MMF holographic probe, with 100 µm core fibers and DMD-based focus scanning. The fiber transmission matrix is established via off-axis holography and Lee hologram patterning. Calibration compensates for static aberrations, and stability is maintained through mechanical and thermal controls.
5. Quantitative and Qualitative Performance
Across six test domains, SCNet outperforms reference denoising models (BM3D, CGNet, NAFNet, Restormer, SwinIR) under controlled, single-fiber acquisition. Performance metrics include SSIM, PSNR, and RMSE, each computed between reconstructions and ground truth. The model resolves 5.66 lp/mm on USAF paper targets (ground-truth matched) and recovers low-contrast structures in biological tissue under challenging photon-limited conditions.
A summary table illustrates comparative performance:
| Model | SSIM ↑ | PSNR ↑ (dB) | RMSE ↓ |
|---|---|---|---|
| SCNet (MSE-Loss) | 0.721 | 26.72 | 13.05 |
| SCNet (CDPO-Loss) | 0.739 | 28.21 | 11.23 |
| SCNet (+CLAHE) | 0.739 | 28.35 | 10.89 |
| SCNet-Distill | 0.700 | 24.18 | 17.75 |
| CGNet | 0.682 | 22.73 | 20.87 |
| NAFNet | 0.675 | 22.70 | 21.02 |
| Restormer | 0.690 | 23.95 | 21.03 |
| SwinIR | 0.688 | 23.36 | 19.44 |
| BM3D | 0.408 | 13.88 | 55.40 |
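With the CDPO loss, SCNet reaches an SSIM of 0.739 and a PSNR of 28.21 dB, against 0.690 and 23.95 dB for the strongest baseline on those metrics (Restormer), and lowers RMSE to 11.23 from the best baseline value of 19.44 (SwinIR).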
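For reference, RMSE and PSNR can be computed as in the sketch below. The 8-bit peak value of 255 is an assumption (the source does not state the intensity range), and SSIM is left out because it requires windowed local statistics that do not fit a short example.

```python
import math

def rmse(pred, target):
    """Root-mean-square error between two equally long pixel sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; max_val=255 assumes 8-bit images."""
    err = rmse(pred, target)
    return float("inf") if err == 0 else 20.0 * math.log10(max_val / err)

# Toy example: a constant error of 2 grey levels on every pixel.
truth = [10.0, 20.0, 30.0, 40.0]
recon = [12.0, 22.0, 32.0, 42.0]
print(rmse(recon, truth), round(psnr(recon, truth), 2))
```

Dataset-level figures are usually averages of per-image values, so the mean PSNR reported for a model generally differs from the PSNR computed from its mean RMSE.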
6. Model Compression and Real-Time Inference
To ensure practical deployment, SCNet incorporates multi-teacher distillation, compressing the five expert teachers into a single lightweight student (SCNet-Distill). Each training sample is routed to its domain-specific teacher for soft-target supervision. The student minimizes a total loss comprising a "hard" Charbonnier + SSIM loss against the ground truth, plus a "soft" loss between student and teacher predictions, scaled by a temperature and an annealing parameter.
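The structure of the distillation objective can be sketched as follows. Several details are assumptions: the soft term uses a mean absolute difference, the SSIM part of the hard loss is omitted for brevity, and the temperature and annealing parameter are folded into one hypothetical scalar `soft_weight` because the text does not say how they enter. The teacher functions are toy stand-ins for the trained experts.

```python
import math

def charbonnier(pred, target, eps=1e-3):
    """Mean Charbonnier penalty over paired values."""
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)

def mean_abs_diff(pred, target):
    """Mean absolute difference, used here as the assumed soft distance."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def distill_loss(student_out, ground_truth, teacher_out, soft_weight):
    """Hard loss against ground truth plus weighted soft loss against the teacher."""
    hard = charbonnier(student_out, ground_truth)   # SSIM term omitted in this sketch
    soft = mean_abs_diff(student_out, teacher_out)  # assumed distance
    return hard + soft_weight * soft

# Toy per-domain teachers (hypothetical stand-ins for the expert branches).
teachers = {
    "paper": lambda x: [0.9 * v for v in x],
    "tissue": lambda x: [0.8 * v for v in x],
}

speckle = [1.0, 2.0, 3.0]
truth = [0.9, 1.8, 2.7]
student_out = [1.0, 1.9, 2.6]

teacher_out = teachers["paper"](speckle)  # sample routed to its domain teacher
loss = distill_loss(student_out, truth, teacher_out, soft_weight=0.5)
```

Routing each sample to a single teacher keeps the training cost per sample at one teacher forward pass, while the student still sees supervision from all five experts over the course of an epoch.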
Compression reduces compute from 63.43 to 34.67 GMACs (−45%) and raises inference speed from 35.6 to 60 FPS, at a PSNR cost of about 4 dB (24.18 dB for SCNet-Distill versus 28.21 dB for SCNet with the CDPO loss in the table above). This balance supports real-time clinical imaging scenarios.
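The quoted figures can be checked with a short calculation, using only the numbers given above:

$$\frac{63.43 - 34.67}{63.43} \approx 0.453, \qquad \frac{60}{35.6} \approx 1.69, \qquad \frac{1000\ \text{ms}}{35.6} \approx 28.1\ \text{ms} \;\rightarrow\; \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms}.$$

That is roughly 45% fewer multiply-accumulate operations, about 1.7 times the throughput, and a per-frame latency that drops from about 28 ms to about 17 ms.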
7. Limitations and Prospects
Current limitations include reliance on a dual-fiber probe (280 µm aggregate diameter), exceeding the theoretical minimum for single-fiber endoscopy. Material-routing is pre-trained separately from the main network, breaking global end-to-end differentiability. Dynamic in vivo imaging involving motion or fluid flow requires adaptive calibration not yet fully integrated. Future directions involve:
- Miniaturization via advanced micro-optics or anti-reflection coatings for purely single-fiber operation
- Fully end-to-end trainable joint gating and routing
- Fast, in situ transmission matrix updates for non-static scenes
- Extension of the MoE + wavelet + curriculum optimization paradigm to other coherent imaging modalities, such as optical coherence tomography and photoacoustics
In summary, SCNet leverages physical priors, expert specialization, and curriculum learning to deliver universal, compute-efficient speckle removal for ultrathin endoscopy, opening paths toward speckle-free, high-quality imaging in size-limited and scattering-rich biomedical environments (Zeng et al., 10 Jan 2026).