CompGS: Compressed Gaussian Splatting
- Compressed Gaussian Splatting (CompGS) denotes a family of methods that compress 3D scene representations using hybrid anchor–residual coding, redundancy elimination, and hierarchical sampling.
- It employs prediction networks and context-based entropy modeling to optimize storage and maintain rendering quality and speed.
- Empirical benchmarks demonstrate order-of-magnitude size reductions with minimal PSNR and SSIM loss, enabling efficient real-time applications.
Compressed Gaussian Splatting (CompGS) refers to a family of methods that reduce the storage and transmission cost of 3D Gaussian Splatting (3DGS) scene representations by exploiting statistical redundancy and structured prediction among Gaussian primitives. CompGS aims to enable high-fidelity, real-time rendering with order-of-magnitude reductions in model size. This is accomplished through hybrid primitive structures, hierarchical sampling, entropy modeling, and rate–distortion optimization, for both static and dynamic scenes. The following sections provide a comprehensive technical account of the key concepts, methodologies, compression architectures, evaluation benchmarks, and implementation guidelines for CompGS, reflecting the current state of research.
1. Background and Motivation
3D Gaussian Splatting (3DGS) represents complex scenes for novel view synthesis by encoding surfaces with millions of spatially distributed, anisotropic Gaussian kernels, each defined by geometric and appearance attributes: a mean μ, a covariance Σ, a color (often higher-order spherical-harmonic coefficients), and an opacity α. Analytic rasterization (splatting) enables real-time image synthesis with superior rendering fidelity compared to NeRF-like implicit fields.
However, the sheer quantity and granularity of explicit Gaussian primitives burden storage, bandwidth, and memory—real-world scenes may reach several gigabytes at 32-bit precision, impeding practical deployment for streaming, embedded, and bandwidth-critical scenarios (Liu et al., 2024, Navaneet et al., 2023, Wang et al., 7 Aug 2025). Simple pruning and quantization offer limited gains, as they ignore the strong local correlations between primitives. CompGS methods therefore combine advanced redundancy reduction (inspired by predictive video/image codecs), hierarchical organization, semantically aware pruning, and learned entropy coding to achieve substantial reductions in data size and improved system efficiency.
2. Hybrid Primitive Structures and Prediction Frameworks
The archetypal CompGS technique divides the set of Gaussians into two roles: anchors (fully-coded, spatially sparse) and coupled/residual primitives (compactly encoded as functions of anchors). This hybrid representation exploits the strong spatial, photometric, and geometric correlation among neighboring Gaussians to achieve redundancy elimination (Liu et al., 2024, Liu et al., 17 Apr 2025).
Anchor and Coupled Representation
- Anchors: a sparse set, each described by full geometry (mean, covariance), appearance features, and a high-dimensional learned reference embedding.
- Coupled/Residual Primitives: each coupled Gaussian stores only a low-dimensional residual embedding; all other attributes are predicted from its anchor and this residual.
- Prediction Networks: at inference, small MLPs reconstruct the full attributes of each coupled Gaussian from the concatenation of its anchor's features and its residual embedding, typically as network-predicted affine transforms of the anchor attributes.
- Hierarchical or Pyramid Structures: Multiple levels of anchors and residuals further reduce redundancy at different spatial scales (Wang et al., 7 Aug 2025, Huang et al., 2024).
Prediction-driven coding is optimized end-to-end under a rate–distortion loss: only the sparse anchors carry high-entropy codes, while the bulk of the primitives are encoded as compact residuals, maximizing compression without visible image degradation.
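As an illustration, the anchor–coupled decoding step can be sketched in a few lines. The embedding dimensions, attribute count, and two-layer MLP below are illustrative assumptions, not the published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: anchor embedding, residual embedding, and the
# reconstructed attribute vector (e.g., mean, covariance, color, opacity).
D_ANCHOR, D_RESID, D_ATTR = 32, 8, 11

def mlp(x, w1, b1, w2, b2):
    # Small two-layer perceptron standing in for the prediction networks.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

w1 = rng.normal(0.0, 0.1, (D_ANCHOR + D_RESID, 64)); b1 = np.zeros(64)
w2 = rng.normal(0.0, 0.1, (64, D_ATTR)); b2 = np.zeros(D_ATTR)

anchor_feat = rng.normal(size=D_ANCHOR)  # fully coded, shared by its coupled set
resid_feat = rng.normal(size=D_RESID)    # compact per-primitive code

# Coupled-primitive attributes are reconstructed at decode time, not stored:
# only the short residual code and the shared anchor enter the bitstream.
attrs = mlp(np.concatenate([anchor_feat, resid_feat]), w1, b1, w2, b2)
print(attrs.shape)
```

The storage asymmetry is the point: each coupled primitive costs `D_RESID` symbols instead of a full attribute vector, with the MLP weights amortized over the whole scene.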
3. Pyramid, Hierarchical, and Context-Based Compression Strategies
Modern CompGS extends beyond static spatial anchors by introducing multi-level and context-aware prediction pipelines that exploit scene structure and visual importance.
Hierarchy and Sampling
- Laplacian-Style Pyramids: Scene point clouds are voxelized at multiple resolutions, with residual details encoded only where critical for rendering quality (Wang et al., 7 Aug 2025).
- Importance-Driven Sampling: Scene perception modules dynamically reweight Gaussians based on camera view and depth-of-field analysis. Primitives with higher visual importance are retained at finer levels-of-detail (LOD), while coarser LODs suffice for near-empty or less salient regions (Wang et al., 7 Aug 2025). Visibility, coverage scoring, and adaptive thresholds govern sampling density and allocation.
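A toy sketch of importance-driven LOD assignment, assuming a simple opacity-times-projected-size score; the actual scene-perception modules are learned, so the score and quantile thresholds here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
positions = rng.uniform(-10, 10, (N, 3))
opacities = rng.uniform(0, 1, N)
scales = rng.uniform(0.01, 1.0, N)   # isotropic stand-in for covariance extent
camera = np.array([0.0, 0.0, -20.0])

# Hypothetical importance score: opacity times projected screen-space size,
# which falls off with distance to the camera.
dist = np.linalg.norm(positions - camera, axis=1)
importance = opacities * scales / np.maximum(dist, 1e-6)

# Assign each primitive a level of detail by importance quantile: the most
# important quarter goes to the finest LOD, the bottom half stays coarse.
q = np.quantile(importance, [0.5, 0.75])
lod = np.digitize(importance, q)  # 0 = coarse, 1 = mid, 2 = fine

print({int(l): int((lod == l).sum()) for l in np.unique(lod)})
```

Adaptive thresholds would replace the fixed quantiles, letting sampling density track visibility and coverage per view.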
Attribute Compression with Context
- Generalized Gaussian Models: Each quantized attribute is modeled using a Generalized Gaussian Distribution (GGD), with MLPs adaptively predicting the location, scale, and shape parameters for context-dependent entropy estimation (Wang et al., 7 Aug 2025).
- Context-Driven Coding: Hierarchical coding in CompGS++ and related frameworks incorporates spatial priors (e.g., feature grids) and hyperpriors for modeling the entropy of both anchor and residual parameter streams. This enables efficient arithmetic coding tailored to the local scene structure (Liu et al., 17 Apr 2025, Huang et al., 13 May 2025).
- Attribute Prediction from Nearest Anchors: Non-anchor primitives are predicted from spatially nearest anchors using transform coding methods (e.g., RAHT), with only the residuals requiring explicit storage (Huang et al., 2024, Wang et al., 30 May 2025).
- Temporal Compression: For dynamic scenes or 4DGS, temporal priors and motion prediction are incorporated, with motion residuals and compensation coded in hash-grid or context-driven structures (Liu et al., 17 Apr 2025, Chen et al., 26 Apr 2025, Huang et al., 13 May 2025).
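A minimal sketch of GGD-based rate estimation: the fixed (μ, α, β) parameters below stand in for the MLP-predicted, context-dependent values, and the bin probability is approximated by density times bin width:

```python
import math

def ggd_pdf(x, mu, alpha, beta):
    # Generalized Gaussian density; beta = 2 recovers the normal family,
    # beta = 1 the Laplacian.
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))

def bits_for_symbol(x, mu, alpha, beta, step=1.0):
    # Probability mass of the quantization bin, approximated by
    # density times bin width (adequate for a sketch).
    p = max(ggd_pdf(x, mu, alpha, beta) * step, 1e-12)
    return -math.log2(p)

# A context MLP would predict (mu, alpha, beta) per attribute; here we
# just evaluate the estimated rate for a few residual magnitudes.
for r in (0.0, 1.0, 3.0):
    print(round(bits_for_symbol(r, mu=0.0, alpha=1.0, beta=1.5), 2))
```

Symbols far from the predicted mode cost more bits, which is exactly the pressure that pushes training toward well-predicted, low-magnitude residuals.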
4. Rate–Distortion Optimization and Entropy Modeling
CompGS pipelines are trained or optimized end-to-end under a Lagrangian rate–distortion objective L = D + λR, where the distortion term D encompasses traditional photometric image losses (evaluated via PSNR, SSIM, and LPIPS), the rate term R captures the average bits-per-symbol across quantized anchor and residual attributes, and λ sets the trade-off (Liu et al., 2024, Liu et al., 17 Apr 2025, Huang et al., 13 May 2025).
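The Lagrangian trade-off can be illustrated by picking, among candidate quantization steps, the one minimizing D + λR on synthetic data; the step candidates, λ value, and Gaussian signal are arbitrary assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
signal = rng.normal(0, 1, 5000)

def rd_point(step):
    # Distortion: MSE of hard scalar quantization at this step size.
    # Rate: empirical entropy of the quantized symbols, in bits/symbol.
    q = np.round(signal / step)
    distortion = np.mean((q * step - signal) ** 2)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    rate = -(p * np.log2(p)).sum()
    return distortion, rate

lam = 0.05  # Lagrange multiplier; tuned per scene in practice
best = min((d + lam * r, s) for s in (0.1, 0.25, 0.5, 1.0)
           for d, r in [rd_point(s)])
print(best[1])
```

Small steps waste bits on imperceptible precision, large steps pay in distortion; the minimizer sits in between, and sweeping λ traces out the rate–distortion curve.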
Entropy models are realized using learned Gaussian, Laplacian, or mixed GGD priors, conditioned on context features or hyperpriors. Scalar, vector, or noise-substituted quantization is applied during training, with simulated quantization noise standing in for hard rounding to maintain differentiability (Wang et al., 3 Apr 2025, Javed et al., 2024).
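Noise-substituted quantization can be sketched as follows; the uniform-noise surrogate matches hard rounding in expected absolute perturbation (both are close to step/4), which is why it is a reasonable training-time stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)

def quantize_train(x, step=1.0):
    # Training-time surrogate: add uniform noise in [-step/2, step/2]
    # so the operation stays (sub)differentiable.
    return x + rng.uniform(-step / 2, step / 2, size=x.shape)

def quantize_eval(x, step=1.0):
    # Inference-time hard rounding to the quantization grid.
    return np.round(x / step) * step

x = rng.normal(0, 2, 10000)
err_train = np.abs(quantize_train(x) - x).mean()
err_eval = np.abs(quantize_eval(x) - x).mean()
print(round(err_train, 2), round(err_eval, 2))
```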
Bitstreams are composed of the following at minimum:
- Quantized anchor feature codes and geometry (optionally entropy coded)
- Compact residual codes
- Metadata: codebooks, MLP weights, layout/LOD structure
- Optionally, point cloud geometry is compressed using specialized point-cloud codecs for additional gain (Wang et al., 21 May 2025, Wang et al., 30 May 2025).
5. Empirical Performance and Ablations
Quantitative evaluations consistently demonstrate that CompGS yields order-of-magnitude decreases in memory/storage, negligible or imperceptible drops in image quality, and often increased rendering speed due to smaller memory footprints and less data movement (Wang et al., 7 Aug 2025, Liu et al., 2024, Liu et al., 17 Apr 2025, Navaneet et al., 2023).
| Scene/Data | Uncompressed Size | CompGS Size | Compression Ratio | ΔPSNR | ΔSSIM | FPS Gain | Reference |
|---|---|---|---|---|---|---|---|
| Waymo, MatrixCity | 80–100 MB | 9.5 MB | ~8–10x | +0.3 dB | +0.01 | 130–160 (1x) | (Wang et al., 7 Aug 2025) |
| Mip-NeRF360 | 789 MB | 16.5 MB | ~48x | –0.2 dB | –0.01 | > 2x (vs NeRF) | (Liu et al., 2024) |
| Deep Blending | 666 MB | 8.8 MB | ~76x | –0.1 dB | – | ↑ | (Liu et al., 2024) |
| Tanks & Temples | 434 MB | 9.6 MB | ~45x | –0.02 dB | – | ↑ | (Liu et al., 2024) |
Ablation studies show:
- Removing scene perception compensation or adaptive LOD dramatically increases storage and/or decreases quality (Wang et al., 7 Aug 2025).
- Without hierarchical anchor residualization, model sizes balloon by up to 6x (Wang et al., 26 Mar 2025).
- Rate–distortion curves for CompGS methods consistently lie on the Pareto front relative to competing approaches.
- In dynamic 4DGS, anchor grouping, coarse-to-fine motion decomposition, and spatio-temporal entropy models yield 30–204x compression and 300–800% rendering speedup (Huang et al., 13 May 2025, Chen et al., 26 Apr 2025, Liu et al., 17 Apr 2025).
6. Practical Implementation Guidance and Limitations
Implementation of CompGS pipelines involves careful selection of the feature embedding size, the anchor-to-residual ratio (i.e., the number of coupled primitives per anchor), quantization step prediction, and entropy model capacity (Liu et al., 2024, Liu et al., 17 Apr 2025). Hyperparameters for LOD granularity, pruning thresholds, and the rate–distortion weight λ should be tuned per scene distribution to obtain smooth RD trade-offs. Multi-level voxel grids or Laplacian pyramids are set so that the coarsest cell matches the minimal visible detail.
For real-time streaming or on-demand fidelity, progressive compression strategies (e.g., PCGS) implement monotonic anchor decoding and progressive quantization, guaranteeing bitstream scalability (Chen et al., 11 Mar 2025).
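A hypothetical bitplane scheme illustrates the progressive-quantization idea; this is a generic sketch, not the PCGS bitstream format:

```python
import numpy as np

N_BITS = 8

def encode_bitplanes(x):
    # Progressive scalar quantization of values in [0, 1]: most significant
    # bitplanes are emitted first, so a truncated stream still yields a
    # coarse but valid reconstruction.
    xi = np.clip((x * (2 ** N_BITS - 1)).astype(int), 0, 2 ** N_BITS - 1)
    return [(xi >> b) & 1 for b in range(N_BITS - 1, -1, -1)]

def decode_bitplanes(planes):
    # Accepts any prefix of the plane list: fewer planes, coarser result.
    xi = np.zeros_like(planes[0])
    for i, p in enumerate(planes):
        xi = xi | (p << (N_BITS - 1 - i))
    return xi / (2 ** N_BITS - 1)

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 1000)
planes = encode_bitplanes(x)
# Reconstruction error shrinks monotonically as more bitplanes arrive.
errs = [np.abs(decode_bitplanes(planes[:k]) - x).mean() for k in (2, 4, 8)]
print([round(e, 3) for e in errs])
```

Monotonic refinement of this kind is what makes a bitstream scalable: a client can stop reading at any plane boundary and still decode.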
Key limitations:
- Initial training time is higher than for pure explicit 3DGS (e.g., ~1.6h vs. 50min for large scenes) due to added compensation, residualization, and entropy coding (Wang et al., 7 Aug 2025).
- Excessive anchor reduction may degrade quality in highly sparse scenes, and aggressive LOD scaling can produce degenerate reconstructions (Wang et al., 7 Aug 2025).
- For dynamic scenes, motion residualization and temporal context coding add complexity, and compression artifacts may arise for extremely rapid or unstructured motion (Huang et al., 13 May 2025, Chen et al., 26 Apr 2025).
- Adaptive or hierarchical anchor assignment and further model acceleration remain open research directions (Liu et al., 2024, Wang et al., 26 Mar 2025, Wang et al., 7 Aug 2025).
7. Extensions to Dynamic Scenes and Future Research
CompGS methodologies generalize to dynamic (4D) scenes by extending anchor-based spatial prediction to temporal prediction of Gaussian motion and appearance. This is achieved via:
- Encoding static canonical anchors, then dynamically deforming or compensating positions, shape, and color through per-frame codes or neural deformation fields (Huang et al., 13 May 2025, Chen et al., 26 Apr 2025, Javed et al., 2024).
- Temporal prediction modules that exploit inter-frame correlations, segmenting Gaussians into static or dynamic sets, and learning motion-/appearance-residuals from context (Liu et al., 17 Apr 2025).
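The benefit of temporal prediction can be illustrated on a synthetic trajectory: residuals against the previous frame have a far smaller dynamic range than direct coding of positions. The linear drift model below is a toy assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
T, N = 5, 100

# Hypothetical trajectory: canonical positions plus smooth per-frame drift.
canonical = rng.uniform(-1, 1, (N, 3))
frames = np.stack([canonical + 0.01 * t * np.array([1.0, 0.0, 0.0])
                   for t in range(T)])

# Temporal prediction: code each frame as a residual against its predecessor.
residuals = frames[1:] - frames[:-1]
raw_range = np.ptp(frames)      # dynamic range if positions are coded directly
res_range = np.ptp(residuals)   # dynamic range after inter-frame prediction
print(round(res_range, 3))
```

Smaller dynamic range means fewer bits per symbol at the same quantization step, which is the same redundancy argument that motivates spatial anchors, applied along the time axis.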
Promising lines for future research include:
- Multi-scale, variable anchor coupling and hierarchical dynamic anchors for extremely large or unbounded scenes.
- Further integration of point cloud geometry codecs for Gaussians, closing the gap with state-of-the-art AI-based point-cloud compressors, notably GausPcgc (Wang et al., 21 May 2025).
- Neural entropy models that incorporate attribute, spatial, and temporal dependencies for optimal bits/splat allocation (Liu et al., 17 Apr 2025, Wang et al., 3 Apr 2025).
- Fully progressive, streaming-friendly CompGS bitstreams to support real-time, bandwidth-adaptive immersive applications (Chen et al., 11 Mar 2025).
- Efficient, end-to-end training for latent-field diffusion over CompGS representations, triplane field representations, and generative modeling (Ju et al., 10 Mar 2025, Wang et al., 26 Mar 2025).
CompGS and its derivatives constitute the current state-of-the-art in highly compressed, real-time splatting-based scene representations, yielding substantial efficiency gains with minimal, rigorously bounded loss in rendering quality across both static and dynamic scene modalities (Wang et al., 7 Aug 2025, Liu et al., 2024, Liu et al., 17 Apr 2025).