WebGPU Gaussian Splatting for Real-Time 3D Scenes
- The paper demonstrates a unified ONNX-based Gaussian Generator that enables plug-and-play integration of various 3DGS models for dynamic, real-time rendering.
- It details the mathematical formulation of anisotropic 3D Gaussians and GPU-resident scheduling that reduces frame times by up to 85× compared to traditional pipelines.
- The work showcases versatile applications including dynamic scene synthesis, neural avatars, and generative 3D reconstruction, promoting flexible neural graphics pipelines.
WebGPU-powered Gaussian splatting represents a unified, web-native platform for real-time neural rendering where 3D Gaussian primitives are dynamically generated, manipulated, and rendered directly in the browser. Visionary, a system built on this approach, leverages a standardized ONNX-based Gaussian Generator contract and fully GPU-resident pipelines to enable high-efficiency, interactive rendering for diverse classes of 3DGS (3D Gaussian Splatting) models, including static, dynamic, and generative variants (Gong et al., 9 Dec 2025).
1. Gaussian Generator Contract
The Gaussian Generator contract is a rigorously defined ONNX I/O schema integrated with model-specific metadata that standardizes data exchange between generative neural models and the splatting renderer. Each participating 3DGS variant must implement this contract, facilitating plug-and-play algorithmic swapping without code changes in downstream rendering or UI stacks.
Per-frame ONNX model inputs include:
- Camera/view matrices (e.g., the viewing transformation $W$ and projection matrix $P$)
- Optional control signals (e.g., time for 4DGS, pose parameters for avatars)
- Frame index or random seed (generative models)
The ONNX graph outputs:
- A packed tensor “Gaussians” of shape $(N, 13)$: $N$ = number of splats per frame; $13$ = 3 (center $\mu$) + 6 (covariance $\Sigma$, the six unique entries of the symmetric matrix) + 3 (color $c$ or SH coefficients) + 1 (opacity $\alpha$)
- Model-level JSON metadata: splat count $N$, dtype (FP16/FP32)
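The packed layout above can be made concrete with a small unpacking routine. This is a sketch, not Visionary's code: the 13-float stride (3 center + 6 covariance + 3 color + 1 opacity) follows the schema just described, but the function name and exact field ordering within each stride are assumptions for illustration.

```typescript
// Sketch: unpacking the (N, 13) "Gaussians" tensor into per-attribute arrays.
// Field order within each 13-float stride is an assumption, not the paper's spec.

const STRIDE = 13; // floats per splat: mu(3) + symmetric Sigma(6) + color(3) + alpha(1)

interface UnpackedGaussians {
  centers: Float32Array;     // N * 3
  covariances: Float32Array; // N * 6 (unique entries of symmetric Sigma)
  colors: Float32Array;      // N * 3
  opacities: Float32Array;   // N
}

function unpackGaussians(packed: Float32Array, n: number): UnpackedGaussians {
  if (packed.length !== n * STRIDE) {
    throw new Error(`expected ${n * STRIDE} floats, got ${packed.length}`);
  }
  const centers = new Float32Array(n * 3);
  const covariances = new Float32Array(n * 6);
  const colors = new Float32Array(n * 3);
  const opacities = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const base = i * STRIDE;
    centers.set(packed.subarray(base, base + 3), i * 3);
    covariances.set(packed.subarray(base + 3, base + 9), i * 6);
    colors.set(packed.subarray(base + 9, base + 12), i * 3);
    opacities[i] = packed[base + 12];
  }
  return { centers, covariances, colors, opacities };
}
```

In the actual pipeline this unpacking happens on-GPU; a CPU version like this is mainly useful for debugging a generator's output buffer.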
At runtime, the JavaScript front-end binds inputs to the ONNX session, invokes a per-frame GPU-executed graph, obtains a handle to the GPU-resident output buffer, and passes it directly to the WebGPU renderer. A uniform schema allows interchanging generative models, deformers, and post-processors within a stable rendering backend (Gong et al., 9 Dec 2025).
2. Mathematical Formulation of 3D Gaussian Splatting
The foundational representation is a set of anisotropic 3D Gaussians:

$$\mathcal{G} = \{(\mu_i, \Sigma_i, \alpha_i, c_i)\}_{i=1}^{N}$$

where:
- $\mu_i \in \mathbb{R}^3$: Gaussian center
- $\Sigma_i \in \mathbb{R}^{3\times 3}$: spatial covariance (parameterized via rotation $R_i$ and scale $S_i$ as $\Sigma_i = R_i S_i S_i^\top R_i^\top$)
- $\alpha_i \in [0, 1]$: opacity
- $c_i$ or higher-order SH: color/features
Each Gaussian furnishes a spatial density

$$G_i(x) = \exp\!\left(-\tfrac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)\right)$$
During rendering, centers $\mu_i$ are projected onto screen space via the camera projection $P$. The screen-space covariance $\Sigma_i' = J W \Sigma_i W^\top J^\top$, with $J$ the Jacobian of the projective transformation at $\mu_i$ and $W$ the viewing transformation, allows computation of the splat contribution:

$$\alpha_i'(p) = \alpha_i \exp\!\left(-\tfrac{1}{2}(p - \mu_i')^\top \Sigma_i'^{-1} (p - \mu_i')\right)$$
Pixel color is composited over depth-sorted splats (front-to-back, accumulating transmittance):

$$C(p) = \sum_{i} c_i\, \alpha_i'(p) \prod_{j<i} \left(1 - \alpha_j'(p)\right)$$
This formulation supports multi-layer, semi-transparent rendering essential for high-fidelity, real-time neural scene depiction (Gong et al., 9 Dec 2025).
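The per-pixel math above can be sketched in plain TypeScript. This is a scalar CPU reference for clarity only (the real pipeline evaluates it per-pixel in WebGPU shaders), and the function names are mine, not the paper's.

```typescript
// Sketch: 2D splat evaluation and front-to-back alpha compositing, matching
// the formulas above. CPU reference only; function names are illustrative.

type Vec2 = [number, number];
type Mat2 = [number, number, number, number]; // row-major 2x2 screen-space covariance

// alpha'_i(p) = alpha_i * exp(-0.5 (p - mu')^T Sigma'^{-1} (p - mu'))
function splatAlpha(p: Vec2, mu: Vec2, cov: Mat2, alpha: number): number {
  const [a, b, c, d] = cov;
  const det = a * d - b * c;
  const dx = p[0] - mu[0], dy = p[1] - mu[1];
  // Quadratic form with the inverse 2x2 covariance, expanded directly.
  const q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det;
  return alpha * Math.exp(-0.5 * q);
}

// C(p) = sum_i c_i alpha'_i(p) prod_{j<i} (1 - alpha'_j(p)),
// with splats already sorted near-to-far (single color channel for brevity).
function composite(alphas: number[], colors: number[]): number {
  let color = 0;
  let transmittance = 1;
  for (let i = 0; i < alphas.length; i++) {
    color += colors[i] * alphas[i] * transmittance;
    transmittance *= 1 - alphas[i];
  }
  return color;
}
```

Two half-opaque white splats compose to 0.75, illustrating why correct depth ordering (the sorting step in later sections) matters: the product term weights nearer splats more heavily.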
3. ONNX Inference and WebGPU Scheduling
Model architectures supported within the Visionary platform span:
- MLP-based 3DGS (Scaffold-GS): Receives anchor features and view direction; a multi-layer perceptron outputs per-splat parameters.
- 4DGS: Receives time $t$; computes per-splat deformations by sampling multi-dimensional feature planes and passing through a lightweight MLP.
- Animatable Avatars: Receives SMPL-X pose and shape; uses forward kinematics and linear blend skinning (LBS) in ONNX to deform canonical Gaussian sets.
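The avatar path's core operation, linear blend skinning of canonical Gaussian centers, reduces to a weighted sum of per-bone transforms. The toy below uses translation-only bone transforms to keep the idea visible; real SMPL-X LBS uses full 4x4 joint transforms from forward kinematics, and all names here are illustrative.

```typescript
// Sketch: linear blend skinning (LBS) applied to a canonical Gaussian center.
// Toy rig with translation-only bone transforms (R_j = I); real SMPL-X LBS
// uses full rigid joint transforms. Names are illustrative, not the paper's.

type Vec3 = [number, number, number];

// x' = sum_j w_j * (R_j x + t_j); with R_j = I this is sum_j w_j * (x + t_j).
function lbsTranslateOnly(x: Vec3, weights: number[], translations: Vec3[]): Vec3 {
  const out: Vec3 = [0, 0, 0];
  for (let j = 0; j < weights.length; j++) {
    for (let k = 0; k < 3; k++) {
      out[k] += weights[j] * (x[k] + translations[j][k]);
    }
  }
  return out;
}
```

In the ONNX graph this runs as a batched matrix operation over all Gaussians, which is what makes the deformation cheap enough for per-frame inference.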
Each ONNX model is loaded as an InferenceSession with the WebGPU backend. Crucially, all I/O buffers reside on device, with mapped buffer scheduling enabling zero CPU-GPU transfer per frame. Warm-up routines and command buffer capture further eliminate JavaScript and CPU scheduling bottlenecks.
Per-frame workflow:
- Update GPU input buffers (camera, control variables)
- Execute asynchronous session.run() on WebGPU
- Output mapped to pre-allocated GPUBuffer for "Gaussians," directly consumed by the renderer
This approach achieves full device-resident dynamic neural inference with minimized host overhead (Gong et al., 9 Dec 2025).
4. API, Integration, and Pipeline Interchangeability
Visionary’s implementation surfaces as a three.js extension and TypeScript library, exposing high-level methods for modular, dynamic model management. Core API endpoints include:
| Method | Purpose |
|---|---|
| registerGaussianGenerator | Registers an ONNX model and schema metadata |
| loadAllGenerators | Loads binaries and creates inference sessions |
| onFrame | Exposes callback for per-frame input updates |
| runGenerators | Performs inference for all loaded generators |
| addGaussianModelToScene | Binds a generator output buffer to the WebGPU renderer |
This standardized contract enables runtime model swapping (e.g., exchanging a static 3DGS for a generative diffusion model) without modifications to rendering or browser integration logic. The integration model facilitates hybrid pipelines and extensibility for future architectures (Gong et al., 9 Dec 2025).
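The method names in the table suggest a registration-then-execution flow. The sketch below shows how they might compose; the method names come from the table, but every signature, parameter, and the class itself are assumptions, stubbed minimally so the flow is concrete outside the browser.

```typescript
// Sketch: a hypothetical composition of the API surface from the table above.
// Method names match the table; all signatures and internals are assumptions.

interface GeneratorSpec {
  name: string;
  modelUrl: string; // ONNX binary location
}

class GeneratorManagerSketch {
  private specs: GeneratorSpec[] = [];
  private loaded = new Set<string>();
  private frameCallbacks: Array<(t: number) => void> = [];

  registerGaussianGenerator(spec: GeneratorSpec): void {
    this.specs.push(spec); // register model + schema metadata
  }

  async loadAllGenerators(): Promise<void> {
    for (const s of this.specs) this.loaded.add(s.name); // stands in for session creation
  }

  onFrame(cb: (t: number) => void): void {
    this.frameCallbacks.push(cb); // per-frame input updates (camera, time, pose)
  }

  runGenerators(t: number): string[] {
    this.frameCallbacks.forEach((cb) => cb(t));
    return [...this.loaded]; // stands in for per-generator inference
  }
}
```

The point of the contract is that swapping `modelUrl` for a different 3DGS variant changes nothing else in this flow.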
5. Performance Metrics and Real-Time Efficiency
Direct comparison of Visionary’s WebGPU pipeline against traditional WebGL+CPU viewer pipelines highlights substantial gains attributable to GPU-resident computation and optimized primitive sorting.
End-to-end rendering (6M Gaussians, RTX 4090):
- SparkJS (WebGL+CPU sort): 176 ms/frame (172 ms sorting)
- Visionary (WebGPU compute + GPU radix sort): 2.1 ms/frame (0.58 ms sorting, 1.52 ms preprocess+draw)
- Approximate reduction: ~85× in total frame time (176 ms → 2.1 ms)
Auxiliary metrics:
| Model | Gaussians | Inference Latency (ms) |
|---|---|---|
| Scaffold-GS | 2.49M | 9.3 |
| Scaffold-GS | 4.56M | 16.1 |
| 4DGS | 0.03M | 4.8 |
| 4DGS | 0.06M | 7.9 |
| Animatable Avatar (1 instance) | ~0.04M | 7.5–8.0 |
| Animatable Avatar (10 instances) | ~0.4M | 50–55 |
- GPU radix-sort: $O(n)$ work, ~0.58 ms for 6M points; eliminates the CPU sorting bottleneck present in WebGL-based approaches.
- Renderer throughput: 400 fps (1/8 scale), 200 fps (1/4), 100 fps (1/2), 50–60 fps (full resolution) for a single static model.
- Frame time remains under ~3 ms even during rapid camera motion or multi-model compositions (Gong et al., 9 Dec 2025).
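The $O(n)$ radix sort cited above is the key to depth-ordering millions of splats per frame. A CPU reference of the same algorithm makes the linear-work structure clear; the GPU version replaces the sequential histogram and scatter passes with parallel prefix sums over workgroups. This sketch is mine, not Visionary's shader code.

```typescript
// Sketch: CPU reference of an LSD radix sort over 32-bit depth keys, the same
// O(n) digit-pass structure the GPU sort uses (there, with parallel prefix sums).

function radixSortIndices(keys: Uint32Array): Uint32Array {
  const n = keys.length;
  let idx = Uint32Array.from({ length: n }, (_, i) => i);
  let tmp = new Uint32Array(n);
  for (let shift = 0; shift < 32; shift += 8) {
    // Histogram of the current 8-bit digit.
    const counts = new Uint32Array(256);
    for (let i = 0; i < n; i++) counts[(keys[idx[i]] >>> shift) & 0xff]++;
    // Exclusive prefix sum -> bucket start offsets.
    let sum = 0;
    for (let d = 0; d < 256; d++) {
      const c = counts[d];
      counts[d] = sum;
      sum += c;
    }
    // Stable scatter into the buckets.
    for (let i = 0; i < n; i++) {
      const digit = (keys[idx[i]] >>> shift) & 0xff;
      tmp[counts[digit]++] = idx[i];
    }
    [idx, tmp] = [tmp, idx];
  }
  return idx; // indices of splats in ascending depth-key order
}
```

Sorting indices rather than the 13-float splat records keeps each pass's data movement small, which matters at 6M elements.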
A plausible implication is that the platform’s unified ONNX contract and device-resident compute enable real-time generative or deformable 3DGS, supporting both reconstructive and feedforward generative scene manipulation in-browser.
6. Applications and Impact on Neural Rendering Ecosystem
WebGPU-powered Gaussian splatting, as realized in Visionary, provides a standardized, flexible substrate for neural scene synthesis, reconstruction, and generative manipulation. The platform supports MLP-based models, multi-temporal 4DGS representations, animatable neural avatars, and style or enhancement networks. Unified inference and rendering—entirely in the browser—significantly lowers the infrastructure barrier for research reproduction, benchmarking, and deployment of new 3DGS-family algorithms (Gong et al., 9 Dec 2025).
This architectural unification accelerates integration of dynamic, generative world models and facilitates modular experimentation with scene deformers, diffusion post-processors, and hybrid pipelines for academic and industrial research in real-time neural graphics.