- The paper demonstrates that robust initialization, rather than regularization, primarily drives reconstruction quality in sparse-view 3DGS.
- It introduces a three-stage pipeline—low-frequency-aware SfM, 3DGS self-initialization, and point-cloud regularization—to improve seed point density and quality.
- Empirical evaluations on the Mip-NeRF360 and LLFF datasets show state-of-the-art results in PSNR, SSIM, and LPIPS at a computational cost comparable to existing baselines.
Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS
Introduction and Motivation
Sparse-view 3D Gaussian Splatting (3DGS) is a prominent approach for novel view synthesis, offering real-time rendering and competitive quality compared to neural radiance fields. However, when the number of input views is limited, 3DGS is susceptible to overfitting, manifesting as blurring and floaters in unseen views. The paper "Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS" (arXiv:2510.17479) presents a systematic investigation into the relative impact of initialization and training-time regularization on sparse-view 3DGS performance. The authors demonstrate through controlled ablations that initialization quality is the dominant factor, setting the upper bound for attainable reconstruction fidelity, while regularization yields only modest improvements within the band that bound defines.
Empirical Analysis: Initialization vs. Regularization
The paper provides a rigorous empirical study on the Mip-NeRF360 dataset, controlling initialization strength by varying the number of input views for Structure-from-Motion (SfM) and comparing several state-of-the-art regularization methods (FSGS, CoR-GS, DropGaussian). The results show that:
- Initialization strength stratifies performance into distinct bands; regularization methods only provide incremental gains within each band.
- Training-time regularization cannot compensate for poor initialization; the final reconstruction quality is fundamentally limited by the seed point cloud.
This finding motivates a shift in focus from regularization to the design of a robust initialization pipeline for sparse-view 3DGS.
Methodology: Three-Stage Initialization Pipeline
The proposed pipeline consists of three stages, each addressing specific limitations of conventional SfM-based initialization:
1. Low-Frequency-Aware SfM
Standard SfM is biased towards high-frequency regions due to its reliance on feature matching, resulting in poor coverage of low-texture areas. The authors introduce two key modifications:
- Relaxed track length: Lowering the minimum track length from three views to two increases seed point density.
- High-frequency masking and view augmentation: Pre-masking high-frequency regions and doubling the image set for SfM yields a more balanced point distribution, improving coverage in smooth regions.
Formally, the augmented view set I_aug is constructed by applying gradient-based high-frequency masking (e.g., with a Sobel operator) to each input image and running SfM on the union of the original and masked images.
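A minimal sketch of the masking-and-augmentation idea, using a gradient-magnitude threshold as a stand-in for the paper's high-frequency detector; the threshold value and the fill strategy for masked pixels are illustrative assumptions, not values from the paper:

```python
import numpy as np

def high_frequency_mask(img, thresh=0.2):
    """Flag pixels whose gradient magnitude exceeds a fraction of the max.

    A simple proxy for the paper's high-frequency detector; a Sobel
    operator would serve the same role.
    """
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    return mag > thresh * mag.max()

def augment_views(images, thresh=0.2):
    """Build I_aug: the originals plus copies with textured regions
    flattened, doubling the image set handed to SfM."""
    augmented = list(images)
    for img in images:
        mask = high_frequency_mask(img, thresh)
        out = img.astype(np.float64).copy()
        # Replace high-frequency pixels with the mean of the smooth
        # region, nudging SfM features toward low-texture areas.
        fill = out[~mask].mean() if (~mask).any() else out.mean()
        out[mask] = fill
        augmented.append(out)
    return augmented
```

Running SfM on `augment_views(images)` then yields matches in both the original textured regions and the masked, low-frequency ones, which is the balanced coverage the stage aims for.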
2. 3DGS Self-Initialization
SfM fails in weakly textured or repetitive regions. To address this, the pipeline trains a lightweight first-pass 3DGS on downsampled images, seeded by the initial point cloud. The centroids of the resulting Gaussian primitives are repurposed as additional seed points, effectively lifting pixel-level photometric supervision into 3D space. This step is parameterized with SH degree 0 (DC color) and a short optimization schedule, focusing on coverage rather than rendering fidelity.
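The centroid-lifting step can be caricatured in a few lines. Here `means` and `opacities` stand for the optimized parameters of the first-pass Gaussians, and the opacity cutoff is an illustrative assumption rather than a value reported in the paper:

```python
import numpy as np

def lift_centroids(sfm_points, means, opacities, min_opacity=0.05):
    """Append centroids of sufficiently opaque first-pass Gaussians
    to the SfM seed cloud.

    Nearly transparent Gaussians carried little photometric evidence
    during the short first-pass optimization, so their centroids are
    dropped before taking the union.
    """
    keep = opacities >= min_opacity              # (N,) boolean
    return np.vstack([sfm_points, means[keep]])  # (M + K, 3)
```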
3. Point-Cloud Regularization
The union of SfM and 3DGS-generated points introduces noise and redundancy. The pipeline applies three regularization procedures:
- Single-view point filtering: Retain only the 20% of single-view-supported points with the highest reliability, where reliability is measured by proximity to multi-view-supported points.
- Clustering-based denoising: Apply K-means clustering (K=1000) and retain the 30% nearest points to each cluster centroid, discarding outliers and duplicates.
- Normal-based consistency filtering: Estimate surface normals via PCA on local neighborhoods and retain points with mean cosine similarity above a threshold (0.2), ensuring geometric consistency.
Pseudocode for these steps is provided in the appendix, facilitating reproducible implementation.
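As a rough, self-contained sketch of the three filters (the cluster count, retention fractions, and cosine threshold follow the values quoted above, while the exact distance and reliability definitions are assumptions):

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial import cKDTree

def filter_single_view(single_pts, multi_pts, keep_frac=0.2):
    """Keep the keep_frac most reliable single-view points, scoring
    reliability by distance to the nearest multi-view point."""
    dist, _ = cKDTree(multi_pts).query(single_pts)
    n_keep = max(1, int(keep_frac * len(single_pts)))
    return single_pts[np.argsort(dist)[:n_keep]]

def cluster_denoise(pts, k=1000, keep_frac=0.3, seed=0):
    """K-means the cloud and, per cluster, keep the keep_frac of points
    nearest the centroid, discarding outliers and duplicates."""
    centroids, labels = kmeans2(pts, k, minit='points', seed=seed)
    keep_idx = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        d = np.linalg.norm(pts[idx] - centroids[c], axis=1)
        n_keep = max(1, int(np.ceil(keep_frac * idx.size)))
        keep_idx.append(idx[np.argsort(d)[:n_keep]])
    return pts[np.concatenate(keep_idx)]

def normal_consistency_filter(pts, k_nn=16, cos_thresh=0.2):
    """Estimate a normal per point via PCA over its k_nn neighbors, then
    keep points whose mean unsigned cosine similarity with neighbor
    normals exceeds cos_thresh."""
    tree = cKDTree(pts)
    _, nbrs = tree.query(pts, k=k_nn)
    normals = np.empty((len(pts), 3))
    for i, idx in enumerate(nbrs):
        local = pts[idx] - pts[idx].mean(axis=0)
        # Normal = eigenvector of the smallest covariance eigenvalue.
        _, vecs = np.linalg.eigh(local.T @ local)
        normals[i] = vecs[:, 0]
    sims = np.abs(np.einsum('ij,ikj->ik', normals, normals[nbrs])).mean(axis=1)
    return pts[sims > cos_thresh]
```

The chaining order mirrors the paper's description: single-view filtering, then clustering-based denoising, then normal-based consistency filtering over the surviving points.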
Experimental Results
Quantitative Evaluation
On the Mip-NeRF360 and LLFF datasets, the proposed initialization pipeline achieves state-of-the-art performance across PSNR, SSIM, and LPIPS metrics. Notably:
- Initialization alone outperforms prior regularization-based methods.
- Combining the pipeline with DropGaussian regularization further improves results, indicating that strong initialization synergizes with regularization.
The method incurs a comparable time cost to existing baselines, with detailed breakdowns provided for initialization and training phases.
Ablation Study
Incremental addition of each pipeline component yields consistent improvements in all metrics. The largest gains are attributed to low-frequency-aware SfM and 3DGS self-initialization, which significantly increase the number of reliable seed points, especially in feature-sparse regions. Point-cloud regularization further refines the initialization, suppressing noise and redundancy.
Qualitative Analysis
Visualizations demonstrate that the pipeline reconstructs more balanced textures, robustly recovers low-feature areas, and produces sharper object boundaries. The initialization quality directly correlates with final rendering fidelity, particularly at scene edges.
Implementation Considerations
- Computational Requirements: The pipeline is efficient, with initialization and training times comparable to or better than existing methods.
- Scalability: The approach is applicable to both small-scale (LLFF) and large-scale (Mip-NeRF360) datasets.
- Integration: The pipeline can be seamlessly integrated into existing 3DGS frameworks, and its modular design allows for further extension or combination with advanced regularization techniques.
Implications and Future Directions
The paper establishes that sparse-view 3DGS is fundamentally limited by initialization quality. This insight has several implications:
- Algorithmic Focus: Future research should prioritize initialization strategies, potentially leveraging generative models or multi-modal priors to further enhance coverage.
- Hybrid Approaches: Combining strong initialization with lightweight regularization may yield optimal trade-offs between fidelity and computational cost.
- Generalization: The pipeline's principles may extend to other neural scene representations, such as NeRF variants or mesh-based methods, under sparse-view constraints.
Potential future developments include adaptive initialization schemes that exploit scene semantics, integration with diffusion-based view synthesis, and exploration of self-supervised or unsupervised initialization in the absence of reliable camera poses.
Conclusion
The paper provides compelling evidence that initialization is the decisive factor in sparse-view 3DGS performance. The proposed three-stage pipeline—low-frequency-aware SfM, 3DGS self-initialization, and point-cloud regularization—yields cleaner, denser, and more reliable seed points, setting a new standard for initialization in novel view synthesis. The approach is efficient, reproducible, and synergistic with existing regularization methods, offering a robust foundation for future research in sparse-view 3D scene reconstruction.