3D Super-Resolution (3DSR) Overview

Updated 25 February 2026
  • 3D Super-Resolution (3DSR) is a set of techniques that upscale coarse 3D data using computational models and deep learning to recover fine volumetric details.
  • Key methodologies include model-based inverse methods, 3D convolutional networks, physics-aware regularization, diffusion models, and multi-modal data fusion ensuring 3D consistency.
  • Applications span biomedical imaging, computer graphics, robotics, and scientific simulation, driving research in high-fidelity, geometrically consistent volumetric data recovery.

Three-dimensional super-resolution (3DSR) refers to computational and learning-based techniques designed to increase the spatial resolution of three-dimensional data, enabling recovery or synthesis of finer volumetric details from coarse, aliased, or incomplete observations. 3DSR is central in biomedical imaging, computer vision, scientific simulation, robotics, and computer graphics, and spans a wide methodological range including model-based inverse problems, deep learning with 3D neural architectures, physics-aware regularization, diffusion models, multi-modal data fusion, and 3D-consistent rendering.

1. Mathematical Formulations and Principles

The foundational problem in 3DSR is to estimate a high-resolution (HR) volume or structure $X_{\mathrm{HR}}$ from observations modeled as

$$X_{\mathrm{LR}} = D(X_{\mathrm{HR}}) + \varepsilon$$

where $D$ is a composition of 3D decimation (downsampling), blur, and projection, and $\varepsilon$ is measurement noise. In 3D volumetric medical and scientific imaging, $D$ commonly consists of point spread function (PSF) convolution and axis-aligned subsampling (Tuador et al., 2020). In texture, shape, or appearance SR of objects, $D$ may instead model multi-view projection and surface mapping (Li et al., 2019).
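To make the forward model concrete, the following NumPy/SciPy sketch builds a toy degradation operator $D$ from an isotropic Gaussian PSF, axis-aligned subsampling, and additive noise. The function name and parameter values are illustrative assumptions, not any cited paper's pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(x_hr: np.ndarray, factor: int = 2, psf_sigma: float = 1.0,
            noise_std: float = 0.01) -> np.ndarray:
    """Toy D: PSF blur, 3D decimation, and additive Gaussian noise."""
    blurred = gaussian_filter(x_hr, sigma=psf_sigma)          # PSF convolution
    decimated = blurred[::factor, ::factor, ::factor]         # axis-aligned subsampling
    return decimated + noise_std * np.random.randn(*decimated.shape)

x_hr = np.random.rand(64, 64, 64).astype(np.float32)
x_lr = degrade(x_hr)   # (32, 32, 32) low-resolution observation
```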

The inverse problem is intrinsically ill-posed: fine-scale or out-of-plane information is lost, and solution recovery fundamentally depends on prior information—explicitly via regularization (e.g., 3D total variation (TV) (Pérez-Bueno et al., 2024), sparsity penalties (Stergiopoulou et al., 2021)), or implicitly via neural network parameterization and data-driven learning (Wang et al., 2018, Bi et al., 11 Aug 2025).
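As one concrete example of an explicit prior, the sketch below evaluates an isotropic 3D total-variation penalty using forward differences with replicated boundaries. This is a generic illustration of the TV term, not the exact regularizer used by any cited method.

```python
import numpy as np

def tv3d(x: np.ndarray, eps: float = 1e-8) -> float:
    """Isotropic 3D TV: sum of voxel-wise gradient magnitudes along the three axes."""
    dx = np.diff(x, axis=0, append=x[-1:, :, :])
    dy = np.diff(x, axis=1, append=x[:, -1:, :])
    dz = np.diff(x, axis=2, append=x[:, :, -1:])
    return float(np.sqrt(dx**2 + dy**2 + dz**2 + eps).sum())
```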

In multi-view and 3D-aware image synthesis regimes, the HR prediction must be 3D-consistent across viewpoints, requiring joint reasoning over geometry and multi-view appearance (Zheng et al., 12 Jan 2025, Chen et al., 6 Aug 2025, Ko et al., 2024). For time-varying data, temporal structures and dynamics act as additional axes of correlation and aliasing (Kim et al., 2018, Bi et al., 11 Aug 2025).

2. Core 3DSR Methodologies

2.1. Model-Based and Regularized Inverse Methods

Classical 3DSR leverages analytical formulations and fast optimization:

  • Frequency-domain closed-form solvers: For Tikhonov-regularized estimation (MAP), 3D decimation and convolution are diagonalized via the Fourier transform, enabling O(N log N) computation for very large volumes (Tuador et al., 2020); a simplified sketch follows this list.
  • Regularization extensions: 3D TV and anisotropic TV regularization are introduced to recover piecewise-smooth isotropic features and suppress noise artifacts, handled via ADMM and vector shrinkage (Tuador et al., 2020, Pérez-Bueno et al., 2024).
  • Analytical self-supervision: Where no HR ground truth exists, self-consistency constraints (e.g., requiring that applying the measurement model $D$ to the network output matches the LR input) combined with TV regularization yield stable, self-supervised volumetric enhancement (Pérez-Bueno et al., 2024).
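The sketch below illustrates the closed-form frequency-domain idea behind the first bullet for a Tikhonov-regularized 3D deconvolution. For brevity it omits the decimation operator handled by the full method, and it assumes the PSF is supplied at the same shape as the observation and centered in the array.

```python
import numpy as np

def tikhonov_deconv3d(y: np.ndarray, psf: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Closed-form argmin_x ||h * x - y||^2 + lam * ||x||^2, solved per frequency."""
    H = np.fft.fftn(np.fft.ifftshift(psf))        # transfer function of the centered PSF
    Y = np.fft.fftn(y)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + lam)   # Wiener-like per-frequency filter
    return np.real(np.fft.ifftn(X))
```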

2.2. Deep 3D Neural Architectures

Neural approaches use end-to-end trainable 3D convolutional networks to model the LR-to-HR mapping:

  • Residual 3D CNNs: Deep stacks of 3D convolutions (often 10–12 layers), with zero-padding and residual (skip) connections, achieve stable, high-PSNR volumetric SR (Wang et al., 2018). Multi-scale training over mixed upsampling factors allows single-network deployment across diverse degradation scales.
  • Sub-voxel shuffling (PixelShuffle3D): Spatial upsampling is handled by rearrangement of output channels, improving efficiency and receptive field (Pérez-Bueno et al., 2024); a minimal sketch combining this with a residual 3D CNN follows this list.
  • Domain-specific designs: For spatio-temporal or video SR, 3D CNNs extract spatio-temporal features while preserving temporal coherence via temporal padding and output collapse only at the final layer (Kim et al., 2018).
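The PyTorch sketch below combines the two ideas above: a small residual 3D convolutional body and a sub-voxel shuffle (PixelShuffle3D) tail. The layer count, channel width, and the trilinear global skip are illustrative choices rather than a reproduction of any cited architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelShuffle3D(nn.Module):
    """Rearrange (B, C*r^3, D, H, W) -> (B, C, D*r, H*r, W*r)."""
    def __init__(self, r: int):
        super().__init__()
        self.r = r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x.shape
        r = self.r
        c_out = c // r ** 3
        x = x.view(b, c_out, r, r, r, d, h, w)
        x = x.permute(0, 1, 5, 2, 6, 3, 7, 4)   # interleave shuffle factors with D, H, W
        return x.reshape(b, c_out, d * r, h * r, w * r)

class ResidualSR3D(nn.Module):
    """Tiny residual 3D CNN with a sub-voxel shuffle upsampler (illustrative only)."""
    def __init__(self, channels: int = 32, upscale: int = 2):
        super().__init__()
        self.upscale = upscale
        self.body = nn.Sequential(
            nn.Conv3d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
        )
        self.tail = nn.Sequential(
            nn.Conv3d(channels, upscale ** 3, 3, padding=1),  # one channel after shuffle
            PixelShuffle3D(upscale),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global residual: the network only predicts the detail missing
        # from a trilinear upsample of the input.
        base = F.interpolate(x, scale_factor=self.upscale,
                             mode="trilinear", align_corners=False)
        return base + self.tail(self.body(x))

x_lr = torch.randn(1, 1, 16, 16, 16)
x_sr = ResidualSR3D()(x_lr)   # -> torch.Size([1, 1, 32, 32, 32])
```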

2.3. Hybrid and Multimodal Pipelines

Advanced SR systems integrate multiple cues or modalities:

  • Multi-feature fusion: In point cloud and depth imaging (e.g., SPAD histograms), networks exploit intensity images, multi-scale depth cues, and histogram features in a residual U-Net backbone to achieve robust denoising and upsampling (Ruget et al., 2020).
  • Normal or geometric guidance: SR of textured surfaces is improved by fusing normal maps directly into the deep SR feature pipeline, significantly boosting geometric consistency and PSNR (Li et al., 2019).
  • Physics-aware 3DSR: Methods such as 3D-COL0RME for MA-TIRF use covariance-domain sparse recovery for lateral SR and physical forward operators for axial inversion, exploiting unique angular mixing in the imaging model (Stergiopoulou et al., 2021).

2.4. Diffusion Models and Large-Scale Learning

Diffusion models and contrastive encoders have become central in high-fidelity and low-data SR regimes:

  • Diffusion-based volumetric SR: 3D denoising diffusion models with local attention encode fine-scale volumetric features and allow adaptation with minimal HR data by pre-training degradation-aware contrastive encoders and fine-tuning diffusion U-Nets (Bi et al., 11 Aug 2025); a generic conditional denoising objective is sketched after this list.
  • Hierarchical approaches: Slices are first restored via 2D diffusion models to recover lateral continuity, then high-frequency-aware 3D ResNets synthesize intermediate slices, improving isotropy of electron microscopy volumes (Chen et al., 2024).
  • Off-the-shelf 2D SR via explicit 3D representation: Diffusion-based 2D SR models are coupled with explicit 3D scene representations—especially anisotropic Gaussian splats—to alternately impose view consistency and guidance via high-fidelity rendered views (Chen et al., 6 Aug 2025).
  • Contrastive learning for degradation modeling: Latent encoders explicitly align the HR/SR distribution, enabling robust 3DSR from only one or few available HR samples (Bi et al., 11 Aug 2025).
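For orientation, here is a minimal PyTorch sketch of the standard DDPM-style training objective, conditioned on the upsampled LR volume by channel concatenation. The `denoiser` network and the `alphas_cumprod` schedule are assumed inputs; this is a generic illustration of conditional diffusion training, not any cited paper's implementation.

```python
import torch
import torch.nn.functional as F

def ddpm_sr_loss(denoiser, x_hr, x_lr_up, alphas_cumprod):
    """One step of the standard denoising objective for conditional volumetric SR.

    denoiser(x_t_cond, t) is assumed to predict the noise added to x_hr;
    x_lr_up is the LR volume upsampled to the HR grid (e.g., trilinearly).
    """
    b = x_hr.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x_hr.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1, 1)
    noise = torch.randn_like(x_hr)
    x_t = torch.sqrt(a_bar) * x_hr + torch.sqrt(1.0 - a_bar) * noise  # forward diffusion
    x_t_cond = torch.cat([x_t, x_lr_up], dim=1)                       # condition on the LR volume
    return F.mse_loss(denoiser(x_t_cond, t), noise)
```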

2.5. Video and Sequence Models for Multi-View or Temporal Consistency

Temporal correlation or inter-view relationships are harnessed:

  • 3D ConvNets for video SR: 3DSRnet maintains the temporal depth of feature maps by extrapolation in the time axis, capturing nonlinear spatio-temporal dependencies (Kim et al., 2018).
  • Leveraging video SR for multi-view consistency: Treating multi-view images as a "video-like" sequence and applying video SR networks (with adaptive sequence construction) achieves state-of-the-art spatial and temporal consistency in 3D reconstructions (Ko et al., 2024, Shen et al., 2024); a toy sequence-construction heuristic is sketched after this list.
  • 3D-consistent GANs and NeRF-based rendering: SuperNeRF-GAN and related pipelines combine NeRF-based volume rendering, network-based latent upsampling of NeRF parameters, and geometry-aware rendering routines to ensure quality and angular consistency at scale (Zheng et al., 12 Jan 2025).
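The sketch below shows one plausible way to order multi-view images into a "video-like" sequence, using a greedy nearest-neighbour tour over camera centres. This is an assumption about how adaptive sequence construction might look, not the ordering rule of any specific paper.

```python
import numpy as np

def order_views(cam_positions: np.ndarray, start: int = 0) -> list:
    """Greedy nearest-neighbour tour over camera centres (N x 3 array)."""
    remaining = set(range(len(cam_positions)))
    order = [start]
    remaining.remove(start)
    while remaining:
        last = cam_positions[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(cam_positions[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Usage: feed images[i] for i in order_views(camera_centres) to a video SR model.
```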

3. 3DSR in Application Domains

3.1. Biomedical and Scientific Imaging

  • Volumetric MRI and CT: 3D SRCNN, EDSR, and SOUP-GAN architectures produce high-fidelity volumetric data at reduced acquisition time and/or dose, typically using patch-based or block-based training to address memory constraints (Wang et al., 2018, Zhang et al., 2021).
  • Self-supervised medical SR: TV-regularized, analytical forward models permit SR without any real HR ground truth, closing 70–100% of the quality gap to supervised deep models (Pérez-Bueno et al., 2024).
  • 3D orientation and functional microscopy: 3D orientation SR in fluorescence microscopy is accomplished via polarized virtual spatial-frequency-shift patterns, mathematical reconstruction in Fourier space, and efficient nonlinear inversion per pixel, achieving both spatial and angular super-resolution (Liu et al., 2024).

3.2. Depth, Shape, and Texture Super-Resolution

  • Single-view 3D shape upsampling: Implicit function regressors map from a low-res observed shape to an occupancy field of a high-resolution surface, with information-loss-aware loss functions training dual-MLP pipelines (Pesavento et al., 2022).
  • Point cloud and depth fusion: 2D representations (e.g., PNCC) enable adaptation of 2D SR networks (Swin Transformer, Vision Mamba) for real-time, guidance-free upsampling of depth-sensor or LiDAR data (Mas et al., 11 Nov 2025).
  • Texture maps for 3D models: Multi-view and normal-guided SR networks reconstruct appearance maps on 3D objects, enforcing both texture fidelity and surface-aligned high-frequency content (Li et al., 2019).

3.3. 3D-Aware Image Synthesis and Rendering

  • NeRF-based high-resolution synthesis: Learning to upsample the latent or explicit NeRF representation (e.g., tri-planes), combined with depth-corrected, geometry-aware rendering routines, achieves memory efficiency (3× lower than dense NeRF) and state-of-the-art FID, PSNR, and SSIM (Zheng et al., 12 Jan 2025).
  • Gaussian splatting: Splat-based 3D scene representations, fit in a photometric and perceptual loss loop to high-resolution upsampled images, enable category-agnostic, 3D-consistent SR workflows (Shen et al., 2024, Chen et al., 6 Aug 2025).
  • Multi-view and video SR for 3D: Ordering LR images into plausible "video" sequences and applying modern VSR models improve view consistency in downstream 3D reconstructions with no model re-training (Ko et al., 2024).

4. Performance Benchmarks and Trade-offs

Performance is systematically evaluated by metrics such as mean PSNR, SSIM, LPIPS (for perceptual fidelity), and geometry-aware measurements (Chamfer Distance, segmentation IoU, Dice) (Ko et al., 2024, Bi et al., 11 Aug 2025):

| Method/Domain | PSNR (dB) | SSIM | LPIPS | Datasets |
| --- | --- | --- | --- | --- |
| Video 3DSRnet (Kim et al., 2018) | 27.70 / 25.71 | 0.85 / 0.76 | – | Vidset4 (×3 / ×4 upscaling) |
| TV-3DSR (fMRI) (Pérez-Bueno et al., 2024) | 26.1 | 0.83 | – | Gorgolewski RS (fMRI) |
| CD-TVD (fluid sim.) (Bi et al., 11 Aug 2025) | +2–5 over best | – | min | Hurricane, vessel datasets |
| Depth fusion (Ruget et al., 2020) | RMSE 0.012 | – | – | Middlebury, Lindell |
| 3D texture (Li et al., 2019) | 26.46 | – | – | 3DASR dataset |
| 3DGS + VSR (Ko et al., 2024) | 31.41 | 0.9520 | 0.0540 | NeRF-Synthetic, Blender |
| GaussianSR (Shen et al., 2024) | 28.44 | 0.923 | – | Blender-Synthetic |
| 3DSR-GS (Chen et al., 6 Aug 2025) | 26.10 | 0.746 | 0.222 | MipNeRF360, LLFF |
| SuperNeRF-GAN (Zheng et al., 12 Jan 2025) | 36.44 | 0.935 | – | FFHQ1024, DeepFashion |
| SOUP-GAN (MRI) (Zhang et al., 2021) | 30.2 @ 4× | 0.88 | – | T1w/T2w MRI, CT |

For LR-to-HR information recovery, deep SR methods typically yield 1–2 dB PSNR or 0.03–0.04 SSIM gains over bicubic or trilinear interpolation, with geometry/normal-aware fusions or diffusion techniques realizing additional improvements (Ko et al., 2024, Bi et al., 11 Aug 2025). Domain-specific ablations confirm the critical roles of: (1) explicit attention or contrastive modules for scarce data, (2) continuity and regularization for structural faithfulness, and (3) sequence construction for view or temporal consistency.
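As a worked example of this kind of comparison, the sketch below scores a trilinear-style interpolation baseline against a stand-in HR volume with PSNR. The data, helper function, and the placeholder `model` call are all illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

def psnr(x: np.ndarray, ref: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((x - ref) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

x_hr = np.random.rand(64, 64, 64)      # stand-in HR ground truth
x_lr = x_hr[::2, ::2, ::2]             # simulated 2x-decimated observation
baseline = zoom(x_lr, 2, order=1)      # trilinear-style interpolation baseline
print("interpolation baseline PSNR:", psnr(baseline, x_hr))
# A learned SR model would be scored identically: psnr(model(x_lr), x_hr)
```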

5. Key Limitations, Open Challenges, and Future Directions

Several fundamental and practical limitations persist:

  • Memory and scalability: Deep 3D architectures, especially those with wide receptive fields, incur prohibitive memory and computational demands, mandating block-based processing and restricting full-volume inference (Wang et al., 2018); a tiled-inference sketch follows this list.
  • Dependence on ground truth and supervision: Many methods require paired HR-LR data, yet unsupervised or self-supervised strategies using analytical degradations and physics-based constraints are emerging (Pérez-Bueno et al., 2024, Stergiopoulou et al., 2021).
  • View and temporal consistency: Single-image SR (SISR) models fail to ensure cross-view consistency; advanced pipelines now leverage VSR backbones, explicit 3D refinement, or learn joint priors for global agreement (Ko et al., 2024, Chen et al., 6 Aug 2025).
  • Data shift and generalization: Model performance on new, unmodeled distributions may degrade rapidly, calling for robust domain adaptation, semi-supervised, or physics-informed learning (Bi et al., 11 Aug 2025).
  • Perceptual, semantic, and geometric fidelity: MSE and pixel-wise losses do not always capture perceptual or downstream utility; 3D-aware perceptual, adversarial, or geometric losses are now routinely incorporated for optimal fidelity (Zhang et al., 2021, Zheng et al., 12 Jan 2025).
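The sketch below shows the usual workaround for the memory limits noted above: overlapping block-based (tiled) inference with averaging in the overlap regions. The block size, overlap, and the assumption that `model` maps a patch to a same-shape refined patch (e.g., after trilinear pre-upsampling) are illustrative choices.

```python
import numpy as np

def tiled_inference(model, volume: np.ndarray, block: int = 64, overlap: int = 8) -> np.ndarray:
    """Run `model` (assumed patch-in, same-shape patch-out) over overlapping blocks."""
    out = np.zeros_like(volume)
    weight = np.zeros_like(volume)
    step = block - overlap
    for z in range(0, volume.shape[0], step):
        for y in range(0, volume.shape[1], step):
            for x in range(0, volume.shape[2], step):
                sl = (slice(z, z + block), slice(y, y + block), slice(x, x + block))
                out[sl] += model(volume[sl])   # accumulate the refined patch
                weight[sl] += 1.0
    return out / np.maximum(weight, 1.0)       # average the overlapping contributions
```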

Active research avenues include spatio-temporal 3DSR, end-to-end joint SR plus 3D representation learning, dynamic-scene or multi-modal 3D SR, hardware-aware acceleration, and unsupervised/self-supervised expansion for scenarios lacking HR ground truth.

6. Domain-Specific Examples

| Domain/Problem | Methods | Salient Features |
| --- | --- | --- |
| Video SR (spatiotemporal) | 3DSRnet (Kim et al., 2018) | 3D convolutions, residual learning, scene-change detection |
| Volumetric MRI/CT | 3DSRCNN (Wang et al., 2018), SOUP-GAN (Zhang et al., 2021) | Patch-based SR, MSE/perceptual loss, GANs |
| fMRI without HR data | TV-3DSR (Pérez-Bueno et al., 2024) | Self-supervision via forward model, TV regularization |
| Fast single-volume SR | FFT-3D-FSR (Tuador et al., 2020) | Frequency-domain closed-form, ADMM TV solver |
| Depth + intensity image fusion | HistNet (Ruget et al., 2020) | Multi-feature (histogram, intensity) fusion, U-Net backbone |
| Single-view 3D shape recovery | SuRS (Pesavento et al., 2022) | Dual-MLP, difference loss for residual super-resolution |
| Multi-view texture (geometry-aware) | NHR/NLR (Li et al., 2019) | Normal maps in deep SR pipeline, UV mapping |
| 3D electron microscopy | D2R-DGEAN (Chen et al., 2024) | 2D diffusion, 3D frequency-aware network, EM segmentation |
| 3D-aware high-res synthesis | SuperNeRF-GAN (Zheng et al., 12 Jan 2025), 3DSR-GS (Chen et al., 6 Aug 2025) | Tri-plane SR, NeRF diffusion/GAN, depth-guided rendering |
| Real-time un-guided single-view | 2Dto3D-SR (Mas et al., 11 Nov 2025) | PNCC encoding, Swin Transformer/VM upsampler |
| Scientific simulation SR (scarce HR) | CD-TVD (Bi et al., 11 Aug 2025) | Contrastive diffusion, local attention, one HR timestep |
| Biological 3D orientation SR | PVSFSM (Liu et al., 2024) | Polarization, spatial-frequency shift, analytic inversion |

7. Significance and Outlook

3D super-resolution has attained a central role in both scientific and applied computational imaging, bridging model-based, physical, and data-driven paradigms for the recovery, enhancement, and synthesis of high-fidelity volumetric data. The convergence of 3D representation learning (e.g., NeRF, Gaussian splatting), physically-informed regularization, and large-scale generative modeling architectures is enabling nearly artifact-free, geometrically consistent, and application-agnostic 3D SR at unprecedented speed and detail. Ongoing technical challenges remain in generalization, scalability, unsupervised learning, and fidelity-aware optimization. Progress in these areas is expected to further elevate the impact of 3DSR in scientific, medical, industrial, and creative domains.
