GSM-GS: Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction

Published 13 Feb 2026 in cs.CV and cs.GR | (2602.12796v1)

Abstract: Recently, 3D Gaussian Splatting has emerged as a prominent research direction owing to its ultrarapid training speed and high-fidelity rendering capabilities. However, the unstructured and irregular nature of Gaussian point clouds poses challenges to reconstruction accuracy. This limitation frequently causes high-frequency detail loss in complex surface microstructures when relying solely on routine strategies. To address this limitation, we propose GSM-GS: a synergistic optimization framework integrating single-view adaptive sub-region weighting constraints and multi-view spatial structure refinement. For single-view optimization, we leverage image gradient features to partition scenes into texture-rich and texture-less sub-regions. The reconstruction quality is enhanced through adaptive filtering mechanisms guided by depth discrepancy features. This preserves high-weight regions while implementing a dual-branch constraint strategy tailored to regional texture variations, thereby improving geometric detail characterization. For multi-view optimization, we introduce a geometry-guided cross-view point cloud association method combined with a dynamic weight sampling strategy. This constructs 3D structural normal constraints across adjacent point cloud frames, effectively reinforcing multi-view consistency and reconstruction fidelity. Extensive experiments on public datasets demonstrate that our method achieves both competitive rendering quality and geometric reconstruction. See our interactive project page

Summary

  • The paper introduces GSM-GS, a novel framework that integrates regionally adaptive single-view constraints with geometry-guided multi-view consistency for high-fidelity 3D reconstruction.
  • The paper demonstrates state-of-the-art performance, achieving the lowest mean chamfer distance (0.51 mm on DTU) and improved novel view synthesis metrics on benchmark datasets.
  • The paper provides practical insights for efficient surface reconstruction in challenging, texture-deficient environments, making it applicable in photogrammetry, VR/AR, and robotics.

Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction

Introduction and Motivation

This work introduces GSM-GS, a geometry-constrained framework for high-accuracy 3D surface reconstruction and novel view synthesis based on 3D Gaussian Splatting (3DGS) (2602.12796). While vanilla 3DGS achieves real-time, high-fidelity rendering, it struggles with geometric fidelity in regions with high-frequency texture or texture-less surfaces and suffers from multi-view inconsistency artifacts. GSM-GS resolves these limitations through a hybrid regime coupling single-view adaptive sub-region constraints and cross-view geometric regularization. The methodology leverages both photometric and geometric evidence, dynamically modulating optimization across regional image statistics and global scene structure.

Advancements in surface-constrained neural scene reconstruction span neural radiance fields (NeRFs), mesh/surfel-based methods, and explicit Gaussian representations. While recent NeRFs such as Mip-NeRF 360 and BakedSDF improve either rendering quality or surface regularization, they do not achieve the speed or explicitness of 3DGS. PGSR, RaDe-GS, 2DGS, GOF, and related post-3DGS approaches incorporate unbiased depth, local plane priors, or uncertainty-driven selection, but often underperform in texture-deficient or structurally ambiguous regions. GSM-GS positions itself as an explicit geometry-aware method, surpassing competitive Gaussian, volumetric, and neural rendering baselines across several metrics and datasets.

Methodology

Single-view Sub-region Adaptive Weighting

GSM-GS decomposes input views by computing pixelwise image gradients (Sobel-based) and partitioning each view into texture-rich ($\mathcal{R}$) and texture-less ($\mathcal{B}$) regions.
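As a rough illustration of the gradient-based partition, the sketch below computes a Sobel gradient magnitude and thresholds it at a quantile. The function names, the edge padding, and the rule "above the quantile means texture-rich" are assumptions made for this example; the 75% default mirrors the quantile mentioned in the sensitivity study, not a confirmed implementation detail.

```python
import numpy as np

def sobel_gradient_magnitude(gray):
    """Per-pixel Sobel gradient magnitude of a 2D grayscale image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    # Replicate border pixels so the output keeps the input shape.
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

def partition_by_texture(gray, quantile=0.75):
    """Split pixels into texture-rich / texture-less masks (hypothetical rule)."""
    mag = sobel_gradient_magnitude(gray)
    threshold = np.quantile(mag, quantile)
    texture_rich = mag > threshold      # stands in for region R
    texture_less = ~texture_rich        # stands in for region B
    return texture_rich, texture_less
```

On a perfectly uniform image every gradient is zero, so this rule assigns all pixels to the texture-less mask.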

  • In high-texture regions, consistency between depth gradients and normal vector orientation is enforced using orthogonality constraints, modulated by a trust weight derived from the rendered-vs-unbiased depth discrepancy.
  • Texture-less regions undergo total variation (TV) smoothing weighted by local color affinity, mitigating over-smoothing while preserving discontinuities.
  • Trust region selection is performed via adaptive thresholding on depth discrepancy-based weights, ensuring only reliable pixels contribute to gradient-based geometric loss.

This dual-branch design introduces spatial adaptivity, robustly regularizing both edges and homogeneous areas.
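The texture-less branch can be pictured as an edge-aware total variation term on depth. The sketch below uses a common formulation in which depth differences between neighboring pixels are down-weighted where the colors differ; the exponential affinity, the `sigma` parameter, and the function name are assumptions for illustration, since the summary does not give the exact weighting.

```python
import numpy as np

def edge_aware_tv(depth, image, mask=None, sigma=0.1):
    """Color-affinity-weighted TV of a depth map (illustrative form only).

    depth: (H, W) array; image: (H, W, 3) array in [0, 1];
    mask: optional (H, W) boolean array selecting texture-less pixels.
    """
    # Absolute depth differences between horizontal / vertical neighbors.
    dd_x = np.abs(depth[:, 1:] - depth[:, :-1])
    dd_y = np.abs(depth[1:, :] - depth[:-1, :])
    # Mean absolute color differences between the same neighbor pairs.
    dc_x = np.abs(image[:, 1:] - image[:, :-1]).mean(axis=-1)
    dc_y = np.abs(image[1:, :] - image[:-1, :]).mean(axis=-1)
    # Assumed affinity: similar colors -> weight near 1, color edges -> near 0.
    w_x = np.exp(-dc_x / sigma)
    w_y = np.exp(-dc_y / sigma)
    if mask is not None:
        # Only penalize pairs whose pixels both lie in the masked region.
        w_x = w_x * (mask[:, 1:] & mask[:, :-1])
        w_y = w_y * (mask[1:, :] & mask[:-1, :])
    n = w_x.size + w_y.size
    return (np.sum(w_x * dd_x) + np.sum(w_y * dd_y)) / n
```

With this weighting, a depth step that coincides with a color edge is penalized far less than the same step inside a uniformly colored area, which is the behavior the "preserving discontinuities" remark refers to.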

Figure 1: Spatial distributions of Gaussian ellipsoids show GSM-GS's regularization yields better surface conformity and reduced artifacts vs. 3DGS and PGSR.

Figure 2: GSM-GS architecture, emphasizing dual-branch (texture-rich/poor) constraints and joint single/multi-view geometry optimization.

Figure 3: Depth-based trust weighting; high-weight regions (filtered by the discrepancy map) drive reliable geometric updates.

Figure 4: Gradient-based segmentation into texture-rich (red) and texture-less (blue) sub-regions.

Figure 5: Reconstructed normal maps show GSM-GS preserves fine geometric detail and normal fidelity over PGSR.

Geometry-guided Multi-view Consistency

GSM-GS defines inter-view constraints at the point cloud level:

  • Neighboring-view alignment: Correspondence between points rendered from unbiased depth maps is established after rigid pose alignment.
  • Global weights are synthesized by convexly combining per-view trust weights, selectively sampling high-confidence regions for geometric regularization.
  • For each corresponding region, local surface normals are estimated (via PCA on a $3\times 3$ patch), and a curvature-attuned cosine similarity penalizes angular discrepancy between matched normals.
  • Adaptive candidate filtering (using a geometric validity mask and dynamic thresholds) ensures only regions correctable under photometric evidence contribute to the multi-view loss.

    Figure 6: Multi-view consistency: geometric normal constraints on sampled high-confidence point clouds aligned by pose.
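The normal-based constraint in the list above can be sketched as follows: estimate a normal for each matched patch by PCA, then penalize the angle between the two normals. The function names, the use of the surface-variation ratio as a curvature proxy, and the `1 - 3c` attenuation are assumptions chosen for this example; the summary does not specify how the "curvature-attuned" weighting is defined.

```python
import numpy as np

def pca_normal(points):
    """Normal and curvature proxy for a patch of 3D points with shape (N, 3)."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # direction of least variance
    smallest = max(float(eigvals[0]), 0.0)   # guard against tiny negatives
    # Surface variation: 0 for a perfect plane, at most 1/3 for isotropic noise.
    curvature = smallest / max(float(eigvals.sum()), 1e-12)
    return normal, curvature

def normal_consistency_loss(patch_a, patch_b, weight=1.0):
    """Cosine penalty between PCA normals of two matched patches (sketch)."""
    n_a, c_a = pca_normal(patch_a)
    n_b, c_b = pca_normal(patch_b)
    # PCA normals have an arbitrary sign, so compare them up to orientation.
    cos = abs(float(n_a @ n_b))
    # Hypothetical attenuation: trust flat patches more than curved ones.
    flatness = 1.0 - 3.0 * max(c_a, c_b)
    return weight * flatness * (1.0 - cos)
```

For a $3\times 3$ patch back-projected from a depth map, `points` would hold the nine 3D points after pose alignment; `weight` is where a per-region trust weight could enter.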

Loss and Training Protocol

The training objective is

$$\mathcal{L} = \mathcal{L}_{rgb} + \mathcal{L}_{svgeo} + \lambda_3 \mathcal{L}_{mvgeo}$$

where $\mathcal{L}_{rgb}$ is the photometric loss, $\mathcal{L}_{svgeo}$ is the single-view (regionally adaptive) geometric loss, and $\mathcal{L}_{mvgeo}$ is the cross-view geometric consistency loss. Regularization weights ($\lambda_1$ for orthogonality terms, $\lambda_2$ for TV, $\lambda_3$ for cross-view) are empirically tuned, with additional thresholds controlling mask selection and candidate sampling for computational efficiency.
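A minimal sketch of how the terms might be combined is shown below. Reading the single-view loss as $\lambda_1$ times an orthogonality term plus $\lambda_2$ times a TV term is an inference from the description of the weights, not a formula given in this summary, and the argument names are hypothetical.

```python
def total_loss(l_rgb, l_orth, l_tv, l_mvgeo, lambda1, lambda2, lambda3):
    """Combine scalar loss terms following the objective above (sketch)."""
    # Assumed decomposition of the single-view geometric loss.
    l_svgeo = lambda1 * l_orth + lambda2 * l_tv
    return l_rgb + l_svgeo + lambda3 * l_mvgeo
```

The weight values passed in are placeholders; the summary only states that they are tuned empirically.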

Experimental Results

Geometry Reconstruction

Quantitative evaluation on the DTU and Tanks and Temples datasets shows that GSM-GS achieves the lowest mean chamfer distance (0.51 mm on DTU) and the highest average F1-Score (0.36 on Tanks and Temples), outperforming 2DGS, RaDe-GS, PGSR, GOF, and others at a moderate but competitive compute overhead.

Figure 7: GSM-GS better captures fine surface geometry on real DTU scenes than other baselines.

  • GSM-GS attains best-in-benchmark results in most scenes (13/15 on DTU), reducing error by up to 9.4% over PGSR.
  • Ablations confirm that both single-view and multi-view constraints independently improve reconstruction, and their integration is complementary.

    Figure 8: DTU ablation confirms finer surface recovery with geometry-guided constraints.
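For readers unfamiliar with the two geometry metrics, the sketch below computes a symmetric chamfer distance and a thresholded F1-score between two point sets with brute-force nearest neighbors. It illustrates the general definitions only: the official DTU and Tanks and Temples protocols add steps such as masking, resampling, and alignment that are not reproduced here, and the function names and `tau` threshold are assumptions.

```python
import numpy as np

def nearest_distances(src, dst):
    """Distance from each point in src (N, 3) to its nearest point in dst (M, 3)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def chamfer_distance(pred, gt):
    """Mean of accuracy (pred -> gt) and completeness (gt -> pred)."""
    accuracy = nearest_distances(pred, gt).mean()
    completeness = nearest_distances(gt, pred).mean()
    return 0.5 * (accuracy + completeness)

def f1_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = (nearest_distances(pred, gt) < tau).mean()
    recall = (nearest_distances(gt, pred) < tau).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The brute-force pairwise distance matrix is only practical for small point sets; real evaluations typically rely on a spatial index such as a k-d tree.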

Novel View Synthesis

Rendering evaluation on the Mip-NeRF360 and LLFF datasets shows consistent improvements in LPIPS ($0.175$ mean) and SSIM/PSNR parity with the strongest baselines.

Figure 9: Comparison on Mip-NeRF360: GSM-GS yields higher PSNR/SSIM and the lowest LPIPS, indicating improved perceptual fidelity.

Figure 10: LLFF PSNR trend: GSM-GS tracks stronger/steadier improvement than prior Splatting techniques.

  • GSM-GS is particularly effective at recovering fine lines and textures and at suppressing floor artifacts, owing to targeted constraint modulation across both texture-rich and texture-poor (ambiguous) regions.
  • The inter-frame point cloud constraints are critical for eliminating spatial misalignment in novel view predictions.

    Figure 11: GSM-GS spatial ellipsoid distribution provides tighter fit to ground truth object surface than PGSR.

Robustness and Sensitivity

Parameter sweeps (Appendix) validate that the system is robust to variations in trust thresholds and texture segmentation quantiles, provided they fall in empirically derived optimal ranges (e.g., $\theta=0.8$, segmentation at the $75\%$ quantile), with accuracy peaking at practical per-batch sample rates ($S=16$). Computation time remains reasonable (e.g., $0.45$ hours avg. on DTU).

Figure 12: Confidence threshold sensitivity; blue masks best match low-error area at $\theta=0.8$.

Figure 13: Texture quantile $p=75\%$ yields stable texture-less segmentation; higher values cause edge loss.

Figure 14: PSNR saturates for $S>16$ while compute cost rises, supporting the default parameter choices.

Theoretical and Practical Implications

The GSM-GS design demonstrates that surface-fitting fidelity in explicit Gaussian Splatting frameworks can be systematically improved via regionally adaptive, geometry-guided constraints, without converting to mesh-based representations or losing the advantages of neural scene parameterization. Architecturally, it signals a shift toward multi-branch, context-sensitive optimization for hybrid explicit/implicit representations. In practice, GSM-GS is applicable to high-speed, high-fidelity photogrammetry, SLAM, VR/AR, and robotics perception pipelines, with particular efficacy in challenging, weakly textured real-world scenes.

Given its modularity, the presented approach is extensible to future joint optimization of reflectance, SDF-guided priors, or uncertainty-aware selection; further, its sample-efficient regularization is compatible with few-shot/sparse-view and generalization-centered splatting extensions.

Conclusion

GSM-GS sets a new benchmark for explicit Gaussian Splatting-based 3D reconstruction, robustly regularizing scene geometry via dual-branch single-view and global multi-view adaptive constraints. Empirically, this leads to quantifiable and qualitative enhancements over baseline and state-of-the-art methods for both geometric reconstruction and novel view synthesis. While persistent challenges remain in transparent or highly specular regimes, GSM-GS provides a blueprint for region-/context-sensitive, geometry-driven regularization in scalable 3D learning systems (2602.12796).

Authors (6)