Hybrid Latents -- Geometry-Appearance-Aware Surfel Splatting

Published 16 Apr 2026 in cs.CV and cs.GR | (2604.14928v1)

Abstract: We introduce a hybrid Gaussian-hash-grid radiance representation for reconstructing 2D Gaussian scene models from multi-view images. Similar to NeST splatting, our approach reduces the entanglement between geometry and appearance common in NeRF-based models, but adds per-Gaussian latent features alongside hash-grid features to bias the optimizer toward a separation of low- and high-frequency scene components. This explicit frequency-based decomposition reduces the tendency of high-frequency texture to compensate for geometric errors. Encouraging Gaussians with hard opacity falloffs further strengthens the separation between geometry and appearance, improving both geometry reconstruction and rendering efficiency. Finally, probabilistic pruning combined with a sparsity-inducing BCE opacity loss allows redundant Gaussians to be turned off, yielding a minimal set of Gaussians sufficient to represent the scene. Using both synthetic and real-world datasets, we compare against the state of the art in Gaussian-based novel-view synthesis and demonstrate superior reconstruction fidelity with an order of magnitude fewer primitives.

Authors (4)

Summary

The paper presents a hybrid radiance field representation that disentangles low-frequency geometry from high-frequency texture, reducing surfel count and enhancing reconstruction fidelity.
It introduces a novel hybrid feature decomposition by combining per-surfel latent features with a spatial hash-grid, allowing sharper silhouettes and efficient rendering.
The method uses Beta-distribution kernels, MCMC-based optimization, and BCE opacity regularization to improve geometric reconstruction and to keep the primitive count low enough for real-time rendering.

Hybrid Latents: Geometry-Appearance-Aware Surfel Splatting

Introduction and Motivation

The paper "Hybrid Latents -- Geometry-Appearance-Aware Surfel Splatting" (2604.14928) introduces a hybrid radiance field representation that explicitly disentangles geometry and appearance in surfel-based scene models. Building on recent advances in Gaussian Splatting (GS) and Neural Radiance Fields (NeRF), the authors identify the entanglement of geometry and appearance as a bottleneck for sparse, high-fidelity scene representations. Existing methods struggle with memory constraints, optimization instability, or excessive primitive counts, especially when modeling high-frequency textures. The proposed method aims to mitigate these issues by combining learnable per-surfel latents for low-frequency, geometry-consistent features with a spatial hash-grid for high-frequency texture details, thereby separating structural and appearance information.

Figure 1: Hybrid Latents disentangle low-frequency scene components (via per-surfel latent features) from high-frequency texture details (via a hash-grid). They achieve superior visual quality with fewer surfels and improve geometric fidelity, shown by an accurate silhouette (vs. ground truth in white) and depth reconstruction (right).

Methodology

Hybrid Feature Decomposition

Each surfel primitive is augmented with a low-dimensional learnable latent $\mathbf{f}_{g}$ encoding low-frequency properties such as geometry and base color. High-frequency information is encoded via a single-resolution spatial hash-grid $\mathcal{H}(\mathbf{x})$, evaluated at the intersection of the view ray with the surfel. For rendering, the concatenated hybrid features of each fragment are alpha-blended along the view ray. An MLP, receiving the blended feature alongside harmonic-encoded view directions, decodes the final RGB value, supporting both view-independent and view-dependent effects.

Figure 2: Method Overview: Features from per-surfel representations and a single-resolution hash-grid are blended via volumetric rendering along each view ray. The blended feature, augmented with viewing direction, is processed by an MLP to output color.

This hybrid decomposition provides a strong inductive bias: per-surfel latents efficiently encode geometry-consistent, smooth structures that are difficult to represent in spatially coarse hash-grids, while the hash-grid flexibly captures residual high-frequency textures.
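The pipeline described above (concatenate, alpha-blend, decode) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the nearest-cell hash lookup, the hash primes, the feature sizes, the sin/cos direction encoding, and the two-layer MLP with random weights are all hypothetical choices made for the example.

```python
import numpy as np

def hash_grid_lookup(points, table, resolution):
    """Nearest-cell lookup in a single-resolution hash grid (assumed, no interpolation)."""
    primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
    cells = np.floor(points * resolution).astype(np.int64).astype(np.uint64)  # (K, 3)
    hashed = cells * primes[None, :]           # uint64 arithmetic wraps on overflow
    h = hashed[:, 0] ^ hashed[:, 1] ^ hashed[:, 2]
    idx = (h % np.uint64(table.shape[0])).astype(np.int64)
    return table[idx]                          # (K, Dh)

def blend_hybrid_features(f_g, f_h, alpha):
    """Front-to-back alpha blending of concatenated per-fragment features."""
    feats = np.concatenate([f_g, f_h], axis=-1)                 # (K, Dg + Dh)
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alpha[:-1])])
    weights = alpha * transmittance                             # (K,)
    return weights @ feats, weights

def encode_direction(d, n_freqs=2):
    """Simple sin/cos encoding of a unit view direction (assumed encoding)."""
    d = d / np.linalg.norm(d)
    bands = [d] + [f(2.0**k * d) for k in range(n_freqs) for f in (np.sin, np.cos)]
    return np.concatenate(bands)                                # 3 + 6 * n_freqs values

def decode_rgb(feature, view_dir, W1, b1, W2, b2):
    """Tiny two-layer MLP: ReLU hidden layer, sigmoid output in [0, 1]."""
    x = np.concatenate([feature, encode_direction(view_dir)])
    hidden = np.maximum(0.0, W1 @ x + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))

# Usage: three fragments along one ray, sorted front to back.
rng = np.random.default_rng(0)
K, Dg, Dh = 3, 4, 4
f_g = rng.normal(size=(K, Dg))                 # per-surfel latents (low frequency)
table = rng.normal(size=(64, Dh))              # hash-grid feature table (high frequency)
hits = rng.uniform(size=(K, 3))                # ray-surfel intersection points in [0, 1)^3
f_h = hash_grid_lookup(hits, table, resolution=16)
alpha = np.array([0.5, 0.5, 1.0])
blended, weights = blend_hybrid_features(f_g, f_h, alpha)
W1, b1 = rng.normal(size=(16, Dg + Dh + 15)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)
rgb = decode_rgb(blended, np.array([0.0, 0.0, 1.0]), W1, b1, W2, b2)
```

Blending features before decoding means the MLP runs once per pixel rather than once per fragment, which matches the per-pixel decoder described in the text.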

Differentiable Surfel Splatting with Beta Kernels

The method models surfels as oriented planar disks parameterized by position, orientation, scale, and opacity. For spatial support, the approach replaces standard Gaussian kernels with learnable Beta-distribution kernels, which interpolate between soft Gaussian-like blobs and hard disks. This supports sharp silhouettes and, because the kernels have bounded support, reduces overdraw.

Figure 3: Beta kernels. Negative $b$ values yield flatter peaks with sharper cutoffs for learning sharp geometry, whereas positive $b$ values produce smooth distributions resembling the Gaussian kernel (dashed line).
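The behavior in Figure 3 can be illustrated with a small sketch. The summary does not give the kernel formula, so the form used here, $(1 - r^2)^{4 e^{b}}$ on the normalized radius $r \in [0, 1]$, is an assumption chosen to match the caption (flatter with a sharper cutoff for negative $b$, smoother and more peaked for positive $b$); it may differ from the paper's exact parameterization.

```python
import numpy as np

def beta_kernel(r, b):
    """Bounded-support radial kernel; assumed form (1 - r^2)^(4 * exp(b)).

    r: normalized distance from the surfel center, clipped to [0, 1].
    b: learnable shape parameter (hypothetical parameterization).
    """
    r = np.clip(np.asarray(r, dtype=float), 0.0, 1.0)
    return (1.0 - r**2) ** (4.0 * np.exp(b))

# Compare kernel profiles for a hard-ish, a neutral, and a soft shape parameter.
radii = np.linspace(0.0, 1.0, 5)
profiles = {b: beta_kernel(radii, b) for b in (-2.0, 0.0, 2.0)}
```

Under this assumed form every profile is exactly 1 at the center and exactly 0 at $r = 1$, so a fragment outside the unit disk contributes nothing and can be skipped.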

Optimization Framework

Pure gradient-based optimization often results in geometric dilation, where primitives expand so that high-capacity appearance features can compensate for geometry errors. To avoid this, the method employs Stochastic Gradient Langevin Dynamics (SGLD), a Markov chain Monte Carlo (MCMC) scheme, injecting noise in the tangent plane of each surfel. This enables better exploration of the geometric parameter space and reduces the risk of the optimizer settling in poor local minima.
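A minimal sketch of a Langevin-style position update with tangent-plane noise is shown below. The function name, the `noise_scale` parameter, and the plain gradient step are assumptions for illustration; the paper's actual noise schedule and any opacity-dependent scaling are not described in this summary.

```python
import numpy as np

def sgld_position_step(position, normal, grad, lr, noise_scale, rng):
    """One noisy gradient step; the noise is restricted to the surfel's tangent plane."""
    n = normal / np.linalg.norm(normal)
    noise = rng.standard_normal(3)
    tangent_noise = noise - np.dot(noise, n) * n   # project out the normal component
    return position - lr * grad + noise_scale * tangent_noise

# Usage: a surfel facing +z only jitters within the xy-plane when the gradient is zero.
rng = np.random.default_rng(0)
new_position = sgld_position_step(
    position=np.zeros(3),
    normal=np.array([0.0, 0.0, 2.0]),
    grad=np.zeros(3),
    lr=1e-2,
    noise_scale=1e-3,
    rng=rng,
)
```

Restricting the noise to the tangent plane lets a surfel slide along the surface it represents without being pushed off it.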

To further enforce sparsity and hard surface formation, a Binary Cross-Entropy (BCE) regularization is introduced on surfel opacities, explicitly penalizing intermediate (semi-transparent) alpha values. This pushes surfels toward being either fully opaque or fully transparent, which, combined with Beta kernels’ bounded support, enables effective axis-aligned culling and aggressive pruning of redundant primitives.
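One way to penalize intermediate opacities is the binary entropy of each opacity, which equals the BCE of the opacity against itself. The sketch below uses that form as an assumption; the summary does not state the exact target or weighting used in the paper, and a sparsity-inducing variant might instead bias opacities toward zero.

```python
import numpy as np

def bce_opacity_loss(alpha, eps=1e-6):
    """Mean binary entropy of opacities: largest at 0.5, near zero at 0 or 1."""
    a = np.clip(np.asarray(alpha, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-(a * np.log(a) + (1.0 - a) * np.log(1.0 - a))))

# Semi-transparent surfels are penalized more than nearly binary ones.
soft_loss = bce_opacity_loss([0.5, 0.4, 0.6])
hard_loss = bce_opacity_loss([0.01, 0.99, 0.02])
```

Once opacities cluster near 0 and 1, surfels near 0 contribute almost nothing to the image and become natural candidates for pruning.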

Experimental Evaluation

Quality and Sparsity

Experiments demonstrate superior scene reconstruction and novel view synthesis on the NeRF Synthetic, Mip-NeRF 360, and DTU datasets. The method achieves comparable or better PSNR (higher is better) and LPIPS (lower is better) than state-of-the-art splatting methods (3DGS [kerbl_3d_2023], 2DGS [huang20242d], NeST [zhang2025neural], Beta Splatting [liu2025deformable]), while requiring an order of magnitude fewer surfels, e.g., $19$k–$50$k vs. hundreds of thousands or millions in the baselines.

Figure 4: Qualitative comparison of Beta Splatting and Hybrid Latents on representative scenes.

Ablation studies on the Mip-NeRF 360 Bicycle scene show progressive gains in sparsity and frame rate (up to 80 FPS) as each method component is added (hybrid features, MCMC, BCE, Beta kernels), with an expected trade-off in PSNR and LPIPS in the extremely sparse regime. The explicit frequency decomposition allows the hybrid method to maintain high-fidelity texturing even under aggressive primitive reduction.

Figure 5: Per-primitive information with increasing sparsity. Top: hybrid features active; Bottom: hash-grid features disabled, revealing the per-surfel latent’s role.

Disentanglement and Frequency Decomposition

The hybrid design induces a clear semantic and spectral separation between low-frequency (base geometry, coarse illumination) and high-frequency (texture, fine appearance) scene components.

Figure 6: Per-primitive features vs. texturing methods. Hybrid Latents and NeST Splatting correctly represent textures at low primitive counts, whereas GaussianSpa and Mini-Splat fail because of the limitations of spherical harmonics.

Figure 7: Using a single hash-grid layer vs. pure per-primitive features; hybridization allows effective texture encoding even with a minimal hash-grid.

Figure 8: The separation of features into low-frequency structural components on the primitives (left) and high-frequency texture components in the hash-grid (right).

Geometric Reconstruction

On DTU, the method attains lower Chamfer distances than NeST Splatting, indicating improved geometric faithfulness. This is attributed to the stable, large per-surfel latents anchoring the primitives to structural surfaces; the explicit surfel support and BCE regularization also contribute to the geometric accuracy.
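For reference, a symmetric Chamfer distance between two point sets can be sketched as follows. This brute-force version is an illustrative assumption: it averages the two nearest-neighbor directions and omits the masking and filtering steps of the official DTU evaluation protocol.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Mean of the two one-sided average nearest-neighbor distances between P and Q."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # (|P|, |Q|) pairwise distances
    accuracy = d.min(axis=1).mean()       # reconstructed points -> reference
    completeness = d.min(axis=0).mean()   # reference points -> reconstruction
    return 0.5 * (accuracy + completeness)

# Usage: a reconstruction shifted by 0.1 along x from the reference points.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
reconstruction = reference + np.array([0.1, 0.0, 0.0])
cd = chamfer_distance(reconstruction, reference)
```

The pairwise matrix costs O(|P||Q|) memory, so a spatial index such as a k-d tree would be needed for full-size scans.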

Practical and Theoretical Implications

The approach advances neural scene representations by combining explicit, differentiable surfel parameterizations with neural appearance fields in a manner that decouples, rather than entangles, geometric and textural signals. This enables efficient, compact reconstructions with minimal primitives and reduced overdraw, directly addressing scalability and speed bottlenecks of prior NeRF-based and splatting methods.

From a theoretical perspective, the paper provides evidence that explicit frequency decomposition—motivated by both surface modeling and optimization stability—offers strong inductive biases for joint geometry-appearance fitting. The integration of MCMC optimization and per-surfel latent anchoring shapes the optimization landscape to avoid degenerate solutions, such as texture overcompensation for geometric inaccuracies.

Practically, the high sparsity and real-time rendering rates reported by the authors make deployment on resource-constrained hardware more feasible, and the method's hybridization framework appears broadly compatible with a range of explicit and implicit primitives beyond surfels. The BCE-based hard alpha enforcement and Beta kernels' compact support are particularly relevant for real-time applications such as AR and VR.

Limitations and Future Directions

While achieving remarkable sparsity and geometric fidelity, the method's reliance on a large per-pixel MLP decoder can be a computational bottleneck compared to Spherical Harmonics queries in pure GS approaches. The use of implicit hash-grid fields restricts explicit scene editing and composability.

The authors speculate that baking the MLP-decoded colors into static surfel textures—or extending the reconstruction to watertight textured meshes using the hybrid representation—could unlock further speed and utility. Integrations with mesh-based pipelines such as Mesh-in-the-Loop and Triangle Splatting appear promising for the adaptation of these ideas to mesh-based or hybrid mesh–point scene graphs.

Conclusion

Hybrid Latents provide a geometry-appearance-aware scene model that robustly separates low-frequency geometric structure from high-frequency texture, yielding highly sparse and efficient reconstructions with state-of-the-art visual quality. The explicit hybridization of per-surfel and hash-grid latent features, combined with stochastic optimization and aggressive sparsity induction, represents a significant advancement for real-time neural rendering regimes and scalable scene modeling. Future research should investigate MLP baking strategies, mesh integration, and broader applications of explicit frequency-aware decomposition in neural graphics.
