SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors

Published 2 Feb 2026 in cs.CV and cs.AI | (2602.02000v2)

Abstract: Reconstructing 3D scenes from sparse images remains a challenging task due to the difficulty of recovering accurate geometry and texture without optimization. Recent approaches leverage generalizable models to generate 3D scenes using 3D Gaussian Splatting (3DGS) primitive. However, they often fail to produce continuous surfaces and instead yield discrete, color-biased point clouds that appear plausible at normal resolution but reveal severe artifacts under close-up views. To address this issue, we present SurfSplat, a feedforward framework based on 2D Gaussian Splatting (2DGS) primitive, which provides stronger anisotropy and higher geometric precision. By incorporating a surface continuity prior and a forced alpha blending strategy, SurfSplat reconstructs coherent geometry together with faithful textures. Furthermore, we introduce High-Resolution Rendering Consistency (HRRC), a new evaluation metric designed to evaluate high-resolution reconstruction quality. Extensive experiments on RealEstate10K, DL3DV, and ScanNet demonstrate that SurfSplat consistently outperforms prior methods on both standard metrics and HRRC, establishing a robust solution for high-fidelity 3D reconstruction from sparse inputs. Project page: https://hebing-sjtu.github.io/SurfSplat-website/

Summary

  • The paper presents a novel feedforward 2D Gaussian splatting approach that leverages surface continuity priors to produce coherent 3D reconstructions from sparse multi-view images.
  • It integrates a forced alpha blending strategy to prevent opacity collapse, ensuring multi-layer expressiveness and unbiased color blending for improved scene fidelity.
  • Empirical evaluations on datasets like RealEstate10K and ACID show superior PSNR, SSIM, and HRRC scores compared to prior feedforward 3D reconstruction methods.

SurfSplat: Feedforward 2D Gaussian Splatting with Surface Continuity Priors

Introduction and Motivation

High-fidelity 3D scene reconstruction from sparse multi-view images is a central problem in computer vision, impacting virtual and augmented reality, digital content creation, and robotics. Vanilla 3D Gaussian Splatting (3DGS) pipelines provide compelling novel view synthesis and scene reconstruction results by representing scenes with collections of semi-transparent ellipsoidal primitives, but they typically demand dense image sampling and computationally expensive per-scene optimization. Recent feedforward approaches accelerate the pipeline and enable real-time applications from sparse views, but the push for generalization introduces new failure modes: reconstructions often display surface discontinuities, spatial voids, and color artifacts, particularly under close-up and off-axis views. This reveals fundamental deficiencies in how geometry and appearance are disentangled, and exposes the inability of conventional metrics to capture these flaws.

The paper "SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors" (2602.02000) addresses these issues by introducing a 2D Gaussian Splatting (2DGS) feedforward approach integrating explicit surface continuity constraints and forced alpha blending, enabling effective 3D surface reasoning, coherent geometry, and reliable generalization under sparse input conditions.

Methodological Contributions

2D Gaussian Splatting as a Scene Primitive

SurfSplat uses 2DGS as its core scene primitive, leveraging the higher degree of anisotropy and geometric precision compared to 3DGS. Each pixel in an input view is mapped to a 2D Gaussian splat parameterized by 3D position, scale, rotation, color, and opacity. Unlike typical 3DGS pipelines that optimize these parameters or regress them independently, SurfSplat formulates a direct geometric coupling via a surface continuity prior.
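The per-pixel parameterization described above can be captured in a small container. The field names and shapes below are illustrative assumptions, not the paper's exact layout; in particular, the actual model likely predicts spherical-harmonic color coefficients rather than a plain RGB triple:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel2D:
    """One 2D Gaussian splat (surfel) predicted for a single pixel.

    Shapes are illustrative; the paper's exact parameterization may differ.
    """
    position: np.ndarray  # (3,) 3D center of the splat
    scale: np.ndarray     # (2,) extents within the tangent plane
    rotation: np.ndarray  # (4,) unit quaternion orienting that plane
    color: np.ndarray     # (3,) RGB (or SH coefficients in practice)
    opacity: float        # blending weight in [0, 1]

# One splat per pixel: an H x W input view yields H * W surfels.
splat = Surfel2D(
    position=np.zeros(3),
    scale=np.array([0.01, 0.01]),
    rotation=np.array([1.0, 0.0, 0.0, 0.0]),  # identity quaternion
    color=np.array([0.5, 0.5, 0.5]),
    opacity=0.8,
)
print(splat.opacity)  # 0.8
```

Because the rotation orients a plane rather than an ellipsoid, a 2D splat needs only two scale parameters, which is part of what gives 2DGS its sharper geometric precision relative to 3DGS.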

Surface Continuity Prior

The model enforces that neighboring Gaussians (surfels) are spatially and geometrically correlated, reflecting the inherent continuity of most real-world surfaces. The prior works by using local image neighborhoods to estimate orientation (via Sobel filters and cross products of tangent vectors) and by aligning Gaussian rotations and scales to generate locally coherent surface tangent planes. This geometric guidance yields surface-oriented splats rather than scattered points and avoids the discrete, incoherent reconstructions observed with previous methods.
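The orientation estimate above can be sketched on a grid of back-projected 3D points: differentiate along the image axes to get two tangent vectors, then take their cross product as the surface normal that orients each splat's tangent plane. Central differences stand in here for the Sobel filtering the paper describes:

```python
import numpy as np

def estimate_normals(points):
    """Estimate per-pixel surface normals from an (H, W, 3) grid of 3D points.

    Tangent vectors along image x and y are approximated with central
    differences (a simplified stand-in for Sobel filtering); their cross
    product gives a normal perpendicular to the local surface patch.
    """
    tx = np.gradient(points, axis=1)  # tangent along image columns
    ty = np.gradient(points, axis=0)  # tangent along image rows
    n = np.cross(tx, ty)
    # Normalize, guarding against degenerate (zero-length) normals.
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)

# Toy check: a flat patch at z = 0 should yield normals along +z.
H = W = 8
xs, ys = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
pts = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)
normals = estimate_normals(pts)
print(normals[4, 4])  # -> [0. 0. 1.]
```

Aligning each splat's rotation to these normals (and its scales to the tangent directions) is what turns a cloud of independent primitives into locally coherent surface patches.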

Forced Alpha Blending

Training with only the continuity prior induces opacity collapse, where splats saturate the alpha channel and occlude scene layers behind them, destroying the multi-layer expressiveness required for accurate 3D reasoning. SurfSplat mitigates this by explicitly capping the opacity of each splat and forcing splats farther along each ray to contribute to the rendered output, along with normalization strategies over the spherical harmonic color representation to ensure unbiased color blending.
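The effect of the cap can be seen in a minimal front-to-back compositing sketch. The cap value below is purely illustrative (the paper's exact threshold and normalization scheme are not reproduced here); the point is that a hard cap keeps transmittance strictly positive, so occluded splats still receive gradient and contribute color:

```python
import numpy as np

ALPHA_CAP = 0.7  # illustrative cap; the paper's exact value is an assumption

def composite(colors, alphas, cap=ALPHA_CAP):
    """Front-to-back alpha compositing with a hard opacity cap.

    Without the cap, a saturated front splat (alpha = 1) drives the
    transmittance to zero and fully occludes everything behind it.
    """
    alphas = np.minimum(alphas, cap)
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
    return out, transmittance

# Front splat red, back splat green, both with saturated opacity.
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
rgb, T = composite(colors, np.array([1.0, 1.0]))
print(rgb)  # red dominates, but the occluded green splat still contributes
```

Here the back splat's green channel stays nonzero after blending, preserving the multi-layer expressiveness the paper argues is lost under opacity collapse.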

Architecture and Losses

SurfSplat adopts a dual-path encoder. A ViT-based monocular depth backbone extracts single-view cues, fused with a multi-view encoder incorporating Swin Transformers and plane-sweep stereo cost volumes for inter-view consistency. A U-Net fuses these features to predict Gaussian attributes, processed with the continuity prior and alpha blending for final scene instantiation. Training relies on a mix of MSE and perceptual (LPIPS) losses between rendered and ground-truth images, without any direct supervision of depth or normals.
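The training objective reduces to a weighted sum of a pixel-wise MSE term and a perceptual term. The sketch below uses a hypothetical `perceptual_fn` as a stand-in for LPIPS, and the weight is an illustrative assumption, not the paper's value:

```python
import numpy as np

LAMBDA_PERC = 0.05  # illustrative weight; the paper's value is an assumption

def total_loss(rendered, target, perceptual_fn, lam=LAMBDA_PERC):
    """Photometric objective: MSE plus a weighted perceptual distance.

    No depth or normal supervision is used, matching the summary above;
    geometry is shaped indirectly through the rendering loss.
    """
    mse = np.mean((rendered - target) ** 2)
    return mse + lam * perceptual_fn(rendered, target)

# Dummy perceptual distance for illustration (real training uses LPIPS).
dummy_lpips = lambda a, b: float(np.mean(np.abs(a - b)))
x = np.zeros((4, 4, 3))
y = np.ones((4, 4, 3)) * 0.5
print(total_loss(x, y, dummy_lpips))  # 0.25 + 0.05 * 0.5 = 0.275
```

That the entire pipeline is supervised only through rendered images is notable: the continuity prior and forced blending, not explicit geometric labels, are what keep the predicted surfaces coherent.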

High-Resolution Evaluation Metric: HRRC

Conventional metrics such as PSNR, SSIM, and LPIPS, computed at nominal input resolutions, fail to expose geometry-induced artifacts that manifest at high resolutions (e.g., holes, inconsistent depth, or splatting sparsity). SurfSplat introduces High-Resolution Rendering Consistency (HRRC), which renders reconstructions at 2× or 4× resolution and computes standard metrics with bicubic-upsampled ground truth. HRRC correlates with the presence of spatial artifacts and more robustly evaluates scene fidelity and generalization.
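The metric's recipe is straightforward to sketch: render the reconstruction at a higher resolution than the input, upsample the ground truth to match, and score with a standard metric. `render_fn` below is a hypothetical hook, and nearest-neighbor repetition stands in for the bicubic upsampling the metric actually uses:

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def upsample(img, s):
    # Nearest-neighbor upsampling; the actual metric uses bicubic.
    return img.repeat(s, axis=0).repeat(s, axis=1)

def hrrc_psnr(render_fn, gt, scale=2):
    """HRRC-style check: render at `scale`x and compare to upsampled GT.

    `render_fn(h, w)` is a hypothetical hook that rasterizes the
    reconstruction at an arbitrary target resolution.
    """
    h, w = gt.shape[:2]
    return psnr(render_fn(h * scale, w * scale), upsample(gt, scale))

# Toy check: a renderer that exactly matches the upsampled GT scores inf.
gt = np.random.default_rng(0).random((8, 8, 3))
perfect = lambda h, w: upsample(gt, h // gt.shape[0])
print(hrrc_psnr(perfect, gt))  # inf
```

Because splatting-based reconstructions can be rasterized at any resolution, HRRC probes exactly the close-up regime where hole and sparsity artifacts appear without requiring any new ground-truth captures.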

Empirical Evaluation

SurfSplat demonstrates consistent superiority over prior feedforward generalizable 3D reconstruction methods—PixelSplat, MVSplat, TranSplat, HiSplat, and DepthSplat—across multiple evaluation regimes:

  • On RealEstate10K and ACID, SurfSplat achieves the highest PSNR and SSIM and lowest LPIPS under both standard and HRRC metrics. The advantage widens at higher resolutions, where previous methods exhibit severe holes and surface inconsistencies.
  • On cross-domain datasets (DTU, DL3DV, ScanNet), SurfSplat shows robust generalization, with only minor degradation compared to performance on in-distribution data.
  • Ablation studies confirm that both the surface continuity prior and forced alpha blending are essential: without these, reconstructions revert to noisy, discontinuous, or spatially inconsistent artifacts, and HRRC scores drop significantly despite modest impact on low-res NVS metrics.

Strong quantitative results (e.g., SurfSplat-L on RealEstate10K: PSNR 27.537/26.331/24.897 at 256²/512²/1024², respectively) and improved geometric and normal coherency visualizations further confirm the effectiveness of this approach.

Implications and Future Directions

SurfSplat establishes that reliable geometric priors, coupled with careful rendering constraints, are essential for robust 3D reconstruction from sparse images in a feedforward paradigm. The introduction of HRRC is a significant contribution for benchmarking geometric fidelity, countering the tendency of low-resolution novel view synthesis metrics to overstate quality.

The practical implications include acceleration and scaling of 3D scene modeling pipelines, where rapid reconstruction from minimal observations is required, such as real-time AR/VR, robotics, and navigation. SurfSplat's mechanism of one Gaussian per pixel, while effective for dense modeling, can lead to redundancy and unneeded computational overhead, suggesting future research directions in adaptive or sparse representations.

Further, the reliance on known/accurate camera poses is a limitation; future work on joint pose estimation or pose-free modeling could leverage the geometric insight of SurfSplat to facilitate end-to-end scene reconstruction and registration in unconstrained settings.

Conclusion

SurfSplat proposes a principled feedforward 3D scene reconstruction pipeline utilizing 2D Gaussian Splatting, augmented with a physically-grounded surface continuity prior and a forced alpha blending strategy. The approach produces geometrically coherent, high-fidelity surfaces from sparse input images, outperforming previous methods under both standard and high-resolution (HRRC) evaluation. The framework points toward more powerful and generalizable 3D representations, with continued work needed on pose estimation and model efficiency for broader adoption and scalability (2602.02000).
