Hybrid Gaussian Splatting (HGS) Method

Updated 23 December 2025

Hybrid Gaussian Splatting (HGS) is a method that integrates explicit 3D Gaussian splatting with auxiliary components like neural MLPs, mesh surfaces, and dynamic temporal functions to enhance scene representation.
It improves view-dependent modulation, semantic and geometric layer decoupling, and dynamic-static decomposition, resulting in higher rendering fidelity and real-time performance.
HGS employs joint optimization strategies using photometric, covariance, and transmittance losses to achieve state-of-the-art benchmarks in PSNR, SSIM, and runtime efficiency.

Hybrid Gaussian Splatting (HGS) encompasses a spectrum of methods that fuse explicit 3D Gaussian Splatting with auxiliary models, including neural MLPs, mesh surfaces, temporal decomposition, and view-dependent modulation, to address the limitations of pure GS and radiance field methods. HGS approaches improve fidelity, accelerate rendering and training, boost compression, enable dynamic and semantic decomposition, and support physically plausible shading. Current definitions span VDGS (NeRF-based modulation), mesh–GS mixtures, dynamic hybridization, hierarchical sharing, multi-modal rasterization, and background–foreground separation. The unifying principle is the explicit allocation of differentiated parameterizations and auxiliary components within the Gaussian population to optimize for scene-specific demands.

1. Core Principles and Mathematical Formulation

HGS extends the standard 3D Gaussian splatting representation in several dimensions:

Explicit Representation: Each scene is constructed as a set of $n$ anisotropic Gaussians $G = \{G_i\}_{i=1}^n$ , parametrized by mean $\mu_i \in \mathbb{R}^3$ , covariance $\Sigma_i \in \mathbb{R}^{3\times 3}$ , base color $c_i \in \mathbb{R}^3$ (often spherical harmonics), and opacity $\sigma_i \in \mathbb{R}^{+}$ (Malarz et al., 2023).
Neural Modulation (VDGS): Color and opacity can be modulated as functions of viewing direction $d\in S^2$ via a small per-Gaussian MLP $F_{VDGS}$ , which produces increments $\Delta c_i(d), \Delta \sigma_i(d)$ , yielding

$c_i(d) = c_i + \Delta c_i(d),\quad \sigma_i(d) = \sigma_i \cdot \Delta \sigma_i(d)$

(Malarz et al., 2023).

Hybrid Layering and Decoupling: Cohorts of Gaussians may be stratified by semantic or geometric criteria (e.g., road/environment layers, static/dynamic, mesh/Gaussian, 2D/3D modes) (Shi et al., 2024, Huang et al., 8 Jun 2025, Zhang et al., 16 Dec 2025, Zhang et al., 2 Dec 2025).

Rendering in HGS leverages forward alpha compositing, either in a unified pass over all primitives or in decoupled tile-based routines for separate layers or types. The canonical per-ray integration for a sorted Gaussian set is:

$\hat{C}(r) = \sum_{j=1}^N T_j\, (1 - e^{-\sigma_j \delta_j})\, c_j,\qquad T_j = \exp\left(-\sum_{k<j} \sigma_k \delta_k\right)$

where $\delta_j$ is the local sample thickness (Malarz et al., 2023).

For dynamic/HGS methods, static and dynamic components are delineated. Static Gaussians share time-invariant parameters, while dynamic Gaussians are expressed with time-dependent radial basis functions:

$\mu_i(t) = \sum_{k=0}^3 b_{i,k}(t-\mu_i^\tau)^k,\quad r_i(t) = \sum_{k=0}^1 c_{i,k}(t-\mu_i^\tau)^k,\quad \sigma_i(t) = \sigma_i^\tau \exp[-s_i^\tau|t-\mu_i^\tau|^2]$

(Zhang et al., 16 Dec 2025).

Table: Canonical Parameterizations—Selected HGS Variants

Model	Primitive Types	Auxiliary Component
VDGS (Malarz et al., 2023)	3D GS	Per-Gaussian MLP
Hybrid Mesh-GS (Huang et al., 8 Jun 2025)	Mesh + 3D GS	Joint optimization
Dynamic HGS (Zhang et al., 16 Dec 2025)	Static/Dynamic GS	SDD + RBF
EGGS (Zhang et al., 2 Dec 2025)	2D/3D GS	Adaptive exchange
HO-Gaussian (Li et al., 2024)	GS + grid vol	Neural warping

2. Hybridization Strategies

HGS employs multiple schemes to allocate representation and compute, with context-specific tradeoffs:

Viewing Direction Modulation: VDGS incorporates an MLP per Gaussian, trained to output $\Delta c_i(d), \Delta \sigma_i(d)$ for fine-grained, view-dependent control of color and opacity (Malarz et al., 2023). This enables the capture of phenomena such as specular highlights and transparent effects otherwise inaccessible to static GS.
Semantic/Geometric Layer Decoupling: DHGS (Shi et al., 2024) partitions the Gaussian set into road vs. environment layers, rendering them separately and blending at a pixelwise level using depth-ordered or smooth sigmoid masks to ensure continuity and visibility correctness.
Mesh–GS Integration: Hybrid mesh-GS (Huang et al., 8 Jun 2025) allocates large, planar regions to textured mesh surfaces and reserves Gaussians for complex geometry. Optimization alternates between mesh refinement, texture completion, and GS densification, using transmittance-aware loss terms.
Dynamic–Static Decomposition: HGS for dynamic scenes reduces parameter redundancy by sharing temporally invariant copies among static regions and instantiating full time-dependent RBFs only for dynamic Gaussians (Zhang et al., 16 Dec 2025, Oh et al., 19 May 2025). Hybrid conversion algorithms migrate 4D Gaussians to 3D when their learned temporal scale exceeds a threshold, preserving computational and memory efficiency.

3. Training and Optimization Protocols

HGS frameworks typically involve joint optimization of multiple parameter sets, with loss functions targeting photometric accuracy, regularization, semantic constraints, and physically plausible compositing.

Photometric Loss: Standard formulation is $L_{phot} = \sum_{r \in R} ||\hat{C}(r) - C_{gt}(r)||_2^2$ , with possible D-SSIM or L1 mixing for appearance regularization (Malarz et al., 2023, Huang et al., 8 Jun 2025, Li et al., 2024).
Covariance Regularization: Penalizing overlarge or degenerate splats via $L_{reg} = \lambda_{reg} \sum_i ||\Sigma_i||_F^2$ (Malarz et al., 2023).
Transmittance and Consistency: In layer-decoupled HGS, losses enforce appropriate segmentation between layers (e.g., $L_{tran}$ , $L_{cons}$ ) and smooth transitions at boundaries (Shi et al., 2024).
Frequency-Decoupled Optimization: EGGS (Zhang et al., 2 Dec 2025) decomposes training losses into low- and high-frequency bands via DWT, modulating updates so that geometric (2D GS) and appearance (3D GS) contributions do not interfere destructively.

Parameter splitting, adaptive pruning (by gradient magnitude or sensitivity), and hybrid densification schedules are integral to most HGS training protocols (Li et al., 2024, Zhang et al., 2024).

4. Algorithmic and Computational Features

HGS implementations often outpace pure GS or radiance field methods in both runtime and model compactness, through explicit division of labor across representation components and optimized rasterization routines.

Tile-Based Splitting and Blending: HybridSplat (Liu et al., 9 Dec 2025) renders base and reflection-baked Gaussians in two parallel passes before blending, using accelerated AABB-tile intersection and pipelined fetch–compute–accumulate routines for >7× speed gains on reflective scenes.
Hierarchical Parameter Sharing: GaussianForest (Zhang et al., 2024) organizes Gaussians into tree structures, sharing latent features among siblings at different depths. Explicit per-Gaussian parameters are minimized, and most shape/color attributes are synthesized by small MLP decoders acting upon shared roots and internal nodes, yielding order-of-magnitude compression.
Adaptive Rasterization: EGGS (Zhang et al., 2 Dec 2025) runs hybrid CUDA kernels, switching dynamically between affine projection for 3D GS and tangent-plane ray–splat intersection for 2D GS, supporting real-time FPS across large scenes.
Perspective-Correct Transparency: Hybrid transparency (Hahlbohm et al., 2024) achieves order-independent blending by exactly compositing the largest $K$ opacities and approximating the tail, enabling up to $2\times$ higher frame rates without perspective artifacts.

5. Experimental Results and Benchmarks

HGS variants consistently outperform or match state-of-the-art baselines in PSNR, SSIM, LPIPS, geometric metrics (Chamfer Distance), and runtime:

Viewing Direction GS (Malarz et al., 2023): NeRF Synthetic PSNR 33.47/SSIM 0.969 (VDGS), vs. GS (33.30/0.969), NeRF (31.01/0.947). Tanks & Temples: VDGS 24.08/0.854; GS-30K 23.14/0.841.
Mesh–GS Hybrid (Huang et al., 8 Jun 2025): Comparable or superior PSNR/SSIM to 3DGS with 18–64% fewer Gaussians and higher FPS (e.g., 446 vs. 122 on Replica scenes).
Dynamic HGS (Zhang et al., 16 Dec 2025, Oh et al., 19 May 2025): Up to 98% reduction in model size, real-time 4K rendering at 125 FPS, PSNR 32.36 dB/SSIM 0.952 on Neural 3D Video; $>2\times$ speedups vs. prior dynamic GS methods.
EGGS (Zhang et al., 2 Dec 2025): Mip-NeRF360 PSNR 27.96/SSIM 0.851/LPIPS 0.192, outperforming both 2DGS and 3DGS; geometry benchmark CD 0.91 vs. 3DGS 1.96.
Hybrid Mapping (Wu et al., 2024, Shi et al., 2024): SOTA accuracy and completeness on KITTI, Waymo, VKITTI2 with ~20% faster reconstruction.
HybridSplat (Liu et al., 9 Dec 2025): Reflection-rich scenes at 107 FPS, matching ray-traced GS quality but with 4× fewer primitives and 7× render speed.

6. Limitations and Open Problems

Inference Speed: Certain schemes, especially those adding neural or dynamic components, incur moderate inference slowdowns (e.g., VDGS 30 FPS vs. GS 90 FPS) (Malarz et al., 2023).
Boundary and Segmentation Dependence: Static–dynamic and semantic-decoupled methods rely on pretraining and external segmentation masks; misclassification leads to minor fidelity drops or boundary artifacts (Zhang et al., 16 Dec 2025, Shi et al., 2024).
Training Overhead: Multi-stage or two-step hybrid training schedules can increase optimization time or complexity (Zhang et al., 16 Dec 2025, Huang et al., 8 Jun 2025).
Parameter Schedule Heuristics: Densification rates, anchor-loss weights, exchange thresholds remain empirically selected, with robustness sensitive to heuristic choices. Automated or adaptive versions remain future work.
Hybrid Splat Format Design: Interoperability among types (2D/3D GS, mesh, volume, MLP) requires careful compositing and rendering logic.

7. Contextual Significance and Future Directions

HGS methods have substantially extended the capacity of explicit Gaussian Splatting pipelines, enabling real-time photorealistic scene synthesis, dynamic view rendering, geometric–semantic disentanglement, controllable editing, and memory-efficient large-model deployment. These advances underpin state-of-the-art results in urban mapping (Li et al., 2024, Wu et al., 2024, Omran et al., 14 Oct 2025), dynamic video streaming for VR (Zhang et al., 16 Dec 2025), high-fidelity editing under generative diffusion guidance (Chen et al., 2023), and compressed scene modeling (Zhang et al., 2024).

Active research directions include learnable dynamic decomposition, mesh–Gaussian–MLP fusion, physically accurate reflection and transparency modeling, hierarchical and semantic-aware Gaussian allocation, generalization to multimodal sensor data, and automation of parameter schedule selection.

Hybrid Gaussian Splatting thus functions as a general meta-architecture for efficient, high-fidelity scene representation, leveraging explicit splatting, neural modulation, semantic layering, and adaptive multi-modal fusion for advanced computer vision and graphics.