- The paper introduces GSM-GS, a novel framework that integrates regionally adaptive single-view constraints with geometry-guided multi-view consistency for high-fidelity 3D reconstruction.
- The paper demonstrates state-of-the-art performance, achieving the lowest mean chamfer distance (0.51 mm on DTU) and improved novel view synthesis metrics on benchmark datasets.
- The paper provides practical insights for efficient surface reconstruction in challenging, texture-deficient environments, making it applicable in photogrammetry, VR/AR, and robotics.
Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction
Introduction and Motivation
This work introduces GSM-GS, a geometry-constrained framework for high-accuracy 3D surface reconstruction and novel view synthesis based on 3D Gaussian Splatting (3DGS) (2602.12796). While vanilla 3DGS achieves real-time, high-fidelity rendering, it struggles with geometric fidelity in regions with high-frequency texture or texture-less surfaces and suffers from multi-view inconsistency artifacts. GSM-GS resolves these limitations through a hybrid regime coupling single-view adaptive sub-region constraints and cross-view geometric regularization. The methodology leverages both photometric and geometric evidence, dynamically modulating optimization according to regional image statistics and global scene structure.
Advancements in surface-constrained neural scene reconstruction span neural radiance fields (NeRFs), mesh/surfel-based methods, and explicit Gaussian representations. While recent NeRFs such as Mip-NeRF 360 and BakedSDF improve either rendering quality or surface regularization, they do not achieve the speed or explicitness of 3DGS. PGSR, RaDe-GS, 2DGS, GOF, and related post-3DGS approaches incorporate unbiased depth, local plane priors, or uncertainty-driven selection, but often underperform in texture-deficient or structurally ambiguous regions. GSM-GS positions itself as an explicit geometry-aware method, surpassing competitive Gaussian, volumetric, and neural rendering baselines across several metrics and datasets.
Methodology
Single-view Sub-region Adaptive Weighting
GSM-GS decomposes each input view by computing pixelwise Sobel image gradients, partitioning it into texture-rich (R) and texture-less (B) regions.
- In high-texture regions, consistency between depth gradients and normal vector orientation is enforced using orthogonality constraints, modulated by a trust weight derived from the rendered-vs-unbiased depth discrepancy.
- Texture-less regions undergo total variation (TV) smoothing weighted by local color affinity, mitigating over-smoothing while preserving discontinuities.
- Trust region selection is performed via adaptive thresholding on depth discrepancy-based weights, ensuring only reliable pixels contribute to gradient-based geometric loss.
This dual-branch design introduces spatial adaptivity, robustly regularizing both edges and homogeneous areas.
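The dual-branch weighting described above can be sketched as follows. The Sobel operator, the 75% segmentation quantile, and the θ = 0.8 confidence threshold match the settings reported later in the summary, but the function names and the exponential form of the trust-weight decay are assumptions for illustration, not the paper's exact formulas.

```python
import numpy as np
from scipy import ndimage

def segment_texture_regions(image, quantile=75.0):
    """Partition an image into texture-rich (R) and texture-less (B)
    masks from Sobel gradient magnitude, thresholded at the given
    quantile (the summary reports p = 75% as a stable choice)."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    gx = ndimage.sobel(gray, axis=1)  # horizontal gradient
    gy = ndimage.sobel(gray, axis=0)  # vertical gradient
    grad_mag = np.hypot(gx, gy)
    threshold = np.percentile(grad_mag, quantile)
    texture_rich = grad_mag > threshold   # R: edges / fine detail
    texture_less = ~texture_rich          # B: homogeneous surfaces
    return texture_rich, texture_less

def trust_weights(rendered_depth, unbiased_depth, theta=0.8):
    """Per-pixel trust weight from the rendered-vs-unbiased depth
    discrepancy; only pixels above the confidence threshold theta
    contribute to the gradient-based geometric loss. The exponential
    decay is an assumed form, not the paper's stated formula."""
    disc = np.abs(rendered_depth - unbiased_depth)
    w = np.exp(-disc / (disc.mean() + 1e-8))
    return w * (w > theta)
```

The complementary masks mean every pixel is regularized by exactly one branch, which is what gives the scheme its spatial adaptivity.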
Figure 1: Spatial distributions of Gaussian ellipsoids show GSM-GS's regularization yields better surface conformity and reduced artifacts vs. 3DGS and PGSR.
Figure 2: GSM-GS architecture, emphasizing dual-branch (texture-rich/poor) constraints and joint single/multi-view geometry optimization.
Figure 3: Depth-based trust weighting; high-weight regions (filtered by the discrepancy map) drive reliable geometric updates.
Figure 4: Gradient-based segmentation into texture-rich (red) and texture-less (blue) sub-regions.
Figure 5: Reconstructed normal maps show GSM-GS preserves fine geometric detail and normal fidelity over PGSR.
Geometry-guided Multi-view Consistency
GSM-GS defines inter-view constraints at the point cloud level, enforcing geometric consistency across neighboring views.
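One plausible instantiation of a point-level cross-view check is a depth reprojection residual: back-project one view's depth to 3D, transform the points into a neighboring view, and compare against that view's depth. This is a generic sketch, not the paper's exact constraint; the function name, shared intrinsics `K`, and pose convention `(R_ij, t_ij)` are assumptions.

```python
import numpy as np

def cross_view_depth_residual(depth_i, K, R_ij, t_ij, depth_j):
    """Back-project view i's depth map to 3D, transform the points into
    view j's frame, and compare reprojected depth with view j's depth.
    K: 3x3 intrinsics (assumed shared); (R_ij, t_ij): view-i-to-j pose."""
    h, w = depth_i.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # 3D points in view i's camera frame
    pts_i = (np.linalg.inv(K) @ pix.T) * depth_i.reshape(1, -1)
    pts_j = R_ij @ pts_i + t_ij.reshape(3, 1)        # into view j's frame
    proj = K @ pts_j
    z = proj[2]
    uv = np.round(proj[:2] / np.maximum(z, 1e-8)).astype(int)
    valid = (z > 0) & (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    residual = np.abs(z[valid] - depth_j[uv[1][valid], uv[0][valid]])
    return residual.mean() if valid.any() else 0.0
```

For identical views (identity pose, equal depth maps) the residual is zero, so the term only penalizes genuine cross-view disagreement.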
Loss and Training Protocol
The training objective is
$L = L_{rgb} + L_{svgeo} + \lambda_3 L_{mvgeo}$
where $L_{rgb}$ is the photometric loss, $L_{svgeo}$ the single-view (regionally adaptive) geometric loss, and $L_{mvgeo}$ the cross-view geometric consistency loss. Regularization weights ($\lambda_1$ for the orthogonality terms, $\lambda_2$ for TV, $\lambda_3$ for the cross-view term) are empirically tuned, with additional thresholds controlling mask selection and candidate sampling for computational efficiency.
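The objective's structure can be made concrete with a minimal sketch; the lambda values below are illustrative placeholders, since the paper's tuned settings are not reproduced in this summary, and folding λ1/λ2 inside the single-view term is an assumption consistent with the loss equation above.

```python
def single_view_geo_loss(l_orth, l_tv, lam1=0.1, lam2=0.01):
    """Regionally adaptive single-view term: lam1 weights the
    depth-normal orthogonality loss on texture-rich pixels, lam2 the
    color-affinity TV loss on texture-less pixels. Lambda values are
    placeholders, not the paper's tuned settings."""
    return lam1 * l_orth + lam2 * l_tv

def total_loss(l_rgb, l_sv_geo, l_mv_geo, lam3=0.05):
    """Overall objective: L = L_rgb + L_svgeo + lam3 * L_mvgeo."""
    return l_rgb + l_sv_geo + lam3 * l_mv_geo
```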
Experimental Results
Geometry Reconstruction
Quantitative evaluation on the DTU and Tanks and Temples datasets shows that GSM-GS achieves the lowest mean chamfer distance (0.51 mm on DTU) and the highest average F1-Score (0.36 on Tanks and Temples), outperforming 2DGS, RaDe-GS, PGSR, GOF, and others while incurring only moderate, competitive compute overhead.
Figure 7: GSM-GS better captures fine surface geometry on real DTU scenes than other baselines.
Novel View Synthesis
Rendering evaluation on the Mip-NeRF360 and LLFF datasets demonstrates consistent improvements in LPIPS ($0.175$ mean) while matching the strongest baselines on SSIM and PSNR.
Figure 9: Comparison on Mip-NeRF360: GSM-GS yields higher PSNR/SSIM and the lowest LPIPS, indicating improved perceptual fidelity.
Figure 10: LLFF PSNR trend: GSM-GS shows a stronger and steadier improvement trend than prior splatting techniques.
Robustness and Sensitivity
Parameter sweeps (Appendix) validate that the system is robust to variations in trust thresholds and texture segmentation quantiles, provided they fall in empirically derived optimal ranges (e.g., θ=0.8, segmentation at 75% quantile), with accuracy peaking at practical per-batch sample rates (S=16). Computation time remains reasonable (e.g., $0.45$ hours avg. on DTU).
Figure 12: Confidence threshold sensitivity; blue masks best match low-error area at θ=0.8.
Figure 13: Texture quantile p=75% yields stable texture-less segmentation—higher values cause edge loss.
Figure 14: PSNR saturates for S>16 while compute cost rises, supporting the default parameter choices.
Theoretical and Practical Implications
The GSM-GS design demonstrates that surface-fitting fidelity in explicit Gaussian Splatting frameworks can be systematically improved via regionally adaptive, geometry-guided constraints, without resorting to mesh-based representations or losing the advantages of neural scene parameterization. Architecturally, it signals a shift toward multi-branch, context-sensitive optimization for hybrid explicit/implicit representations. In practice, GSM-GS is immediately applicable to high-speed, high-fidelity photogrammetry, SLAM, VR/AR, and robotics perception pipelines, with particular efficacy in challenging, weakly textured real-world scenes.
Given its modularity, the presented approach is extensible to future joint optimization of reflectance, SDF-guided priors, or uncertainty-aware selection; further, its sample-efficient regularization is compatible with few-shot/sparse-view and generalization-centered splatting extensions.
Conclusion
GSM-GS sets a new benchmark for explicit Gaussian Splatting-based 3D reconstruction, robustly regularizing scene geometry via dual-branch single-view and global multi-view adaptive constraints. Empirically, this leads to quantifiable and qualitative enhancements over baseline and state-of-the-art methods for both geometric reconstruction and novel view synthesis. While persistent challenges remain in transparent or highly specular regimes, GSM-GS provides a blueprint for region-/context-sensitive, geometry-driven regularization in scalable 3D learning systems (2602.12796).