
WebGPU Gaussian Splatting for Real-Time 3D Scenes

Updated 11 December 2025
  • The paper demonstrates a unified ONNX-based Gaussian Generator that enables plug-and-play integration of various 3DGS models for dynamic, real-time rendering.
  • It details the mathematical formulation of anisotropic 3D Gaussians and GPU-resident scheduling that reduces frame times by up to 85× compared to traditional pipelines.
  • The work showcases versatile applications including dynamic scene synthesis, neural avatars, and generative 3D reconstruction, promoting flexible neural graphics pipelines.

WebGPU-powered Gaussian splatting represents a unified, web-native platform for real-time neural rendering in which 3D Gaussian primitives are dynamically generated, manipulated, and rendered directly in the browser. Visionary, a system built on this approach, leverages a standardized ONNX-based Gaussian Generator contract and fully GPU-resident pipelines to enable high-efficiency, interactive rendering for diverse classes of 3DGS (3D Gaussian Splatting) models, including static, dynamic, and generative variants (Gong et al., 9 Dec 2025).

1. Gaussian Generator Contract

The Gaussian Generator contract is a rigorously defined ONNX I/O schema integrated with model-specific metadata that standardizes data exchange between generative neural models and the splatting renderer. Each participating 3DGS variant must implement this contract, facilitating plug-and-play algorithmic swapping without code changes in downstream rendering or UI stacks.

Per-frame ONNX model inputs include:

  • Camera/view matrices (e.g., $V \in \mathbb{R}^{4 \times 4}$, $P \in \mathbb{R}^{4 \times 4}$)
  • Optional control signals (e.g., time $t$ for 4DGS, pose parameters $(\theta, \beta)$ for avatars)
  • Frame index or random seed (generative models)

The ONNX graph outputs:

  • A packed tensor “Gaussians” of shape $[N, D]$: $N$ = number of splats per frame; $D$ = 3 (center $\boldsymbol\mu$) + 6 (covariance $\boldsymbol\Sigma$) + 3 (color $\mathbf{c}$ or SH coefficients) + 1 (opacity $\alpha$)
  • Model-level JSON metadata: $N_\text{max}$, dtype (FP16/FP32)

At runtime, the JavaScript front-end binds inputs to the ONNX session, invokes a per-frame GPU-executed graph, reads back the GPU-resident output buffer, and passes this directly to the WebGPU renderer. A uniform schema allows interchanging generative models, deformers, and post-processors within a stable rendering backend (Gong et al., 9 Dec 2025).
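As a concrete illustration of the packed layout above, the following TypeScript sketch unpacks one splat from a "Gaussians" tensor. The field ordering and names are assumptions for illustration; the actual ONNX schema fixes its own layout.

```typescript
// Sketch of the Gaussian Generator output layout: D = 3 + 6 + 3 + 1 = 13
// floats per splat. Offsets below are one plausible ordering (assumed).
const CENTER_OFFSET = 0;   // mu: 3 floats
const COV_OFFSET = 3;      // Sigma: 6 floats (upper triangle of symmetric 3x3)
const COLOR_OFFSET = 9;    // c: 3 floats (or SH coefficients)
const OPACITY_OFFSET = 12; // alpha: 1 float
const D = 13;

interface Gaussian {
  center: [number, number, number];
  cov: [number, number, number, number, number, number];
  color: [number, number, number];
  opacity: number;
}

// Unpack splat i from a flat [N, D] Float32Array, as a renderer might.
function unpackGaussian(buf: Float32Array, i: number): Gaussian {
  const o = i * D;
  return {
    center: [buf[o], buf[o + 1], buf[o + 2]],
    cov: [buf[o + 3], buf[o + 4], buf[o + 5], buf[o + 6], buf[o + 7], buf[o + 8]],
    color: [buf[o + 9], buf[o + 10], buf[o + 11]],
    opacity: buf[o + 12],
  };
}
```

Because every generator emits this same packed shape, the renderer can consume any model's output without per-model decoding logic.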

2. Mathematical Formulation of 3D Gaussian Splatting

The foundational representation is a set of anisotropic 3D Gaussians:

$$\mathcal{G} = \{G_i\}_{i=1}^N, \quad G_i = (\boldsymbol\mu_i,\, \boldsymbol\Sigma_i,\, \alpha_i,\, \mathbf{c}_i)$$

where:

  • $\boldsymbol\mu_i \in \mathbb{R}^3$: Gaussian center
  • $\boldsymbol\Sigma_i \in \mathbb{R}^{3 \times 3}$: spatial covariance (parameterized via rotation and scale)
  • $\alpha_i \in (0, 1)$: opacity
  • $\mathbf{c}_i \in [0, 1]^3$ or higher-order SH coefficients: color/features

Each Gaussian furnishes a spatial density

$$g_i(\mathbf{x}) = \frac{1}{(2\pi)^{3/2} |\boldsymbol\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol\mu_i)^\top \boldsymbol\Sigma_i^{-1} (\mathbf{x} - \boldsymbol\mu_i)\right)$$
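A CPU reference for this density (illustrative only; the paper's evaluation happens on the GPU) can be written directly from the formula, inverting the 3×3 covariance in closed form:

```typescript
// Evaluate the anisotropic 3D Gaussian density g_i(x) for a full 3x3
// covariance (row-major). Illustrative CPU reference, not the GPU path.
type Vec3 = [number, number, number];
type Mat3 = [number, number, number, number, number, number, number, number, number];

function det3(m: Mat3): number {
  return m[0] * (m[4] * m[8] - m[5] * m[7])
       - m[1] * (m[3] * m[8] - m[5] * m[6])
       + m[2] * (m[3] * m[7] - m[4] * m[6]);
}

function inv3(m: Mat3): Mat3 {
  const d = det3(m); // adjugate / determinant
  return [
    (m[4] * m[8] - m[5] * m[7]) / d, (m[2] * m[7] - m[1] * m[8]) / d, (m[1] * m[5] - m[2] * m[4]) / d,
    (m[5] * m[6] - m[3] * m[8]) / d, (m[0] * m[8] - m[2] * m[6]) / d, (m[2] * m[3] - m[0] * m[5]) / d,
    (m[3] * m[7] - m[4] * m[6]) / d, (m[1] * m[6] - m[0] * m[7]) / d, (m[0] * m[4] - m[1] * m[3]) / d,
  ];
}

function gaussianDensity(x: Vec3, mu: Vec3, sigma: Mat3): number {
  const dx: Vec3 = [x[0] - mu[0], x[1] - mu[1], x[2] - mu[2]];
  const inv = inv3(sigma);
  // quadratic form dx^T Sigma^{-1} dx
  let q = 0;
  for (let r = 0; r < 3; r++)
    for (let c = 0; c < 3; c++)
      q += dx[r] * inv[3 * r + c] * dx[c];
  const norm = 1 / (Math.pow(2 * Math.PI, 1.5) * Math.sqrt(det3(sigma)));
  return norm * Math.exp(-0.5 * q);
}
```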

During rendering, centers are projected onto screen space via the camera projection $\Pi(\cdot)$. The screen-space covariance $\mathbf{S}_i = J_i \boldsymbol\Sigma_i J_i^\top$, with Jacobian $J_i = \partial\Pi/\partial\mathbf{x}$ evaluated at $\boldsymbol\mu_i$, allows computation of the splat contribution:

$$w_i(\mathbf{x}) = \alpha_i \exp\left(-\frac{1}{2} (\mathbf{x} - \mathbf{x}_i)^\top \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{x}_i)\right)$$
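The covariance projection $\mathbf{S}_i = J_i \boldsymbol\Sigma_i J_i^\top$ is a pair of small matrix products; a minimal sketch (CPU reference, with a 2×3 Jacobian as is conventional for this projection):

```typescript
// Project a 3x3 world-space covariance Sigma to the 2x2 screen-space
// covariance S = J * Sigma * J^T, where J is the 2x3 Jacobian of the
// projection evaluated at the Gaussian center. All matrices row-major.
function screenCovariance(J: number[] /* 2x3 */, sigma: number[] /* 3x3 */): number[] /* 2x2 */ {
  // T = J * Sigma  (2x3)
  const T = new Array(6).fill(0);
  for (let r = 0; r < 2; r++)
    for (let c = 0; c < 3; c++)
      for (let k = 0; k < 3; k++)
        T[3 * r + c] += J[3 * r + k] * sigma[3 * k + c];
  // S = T * J^T  (2x2)
  const S = new Array(4).fill(0);
  for (let r = 0; r < 2; r++)
    for (let c = 0; c < 2; c++)
      for (let k = 0; k < 3; k++)
        S[2 * r + c] += T[3 * r + k] * J[3 * c + k];
  return S;
}
```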

Pixel color is composited in depth-sorted order, front to back, with accumulated transmittance:

$$C(\mathbf{x}) = \sum_{i=1}^N \left( w_i(\mathbf{x}) \prod_{j<i} \left(1 - w_j(\mathbf{x})\right) \right) \mathbf{c}_i$$

This formulation supports multi-layer, semi-transparent rendering essential for high-fidelity, real-time neural scene depiction (Gong et al., 9 Dec 2025).
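The compositing sum above is a single accumulation pass per pixel over depth-sorted splats. A minimal per-pixel sketch (the function name is illustrative, not from the paper):

```typescript
// Composite C(x) = sum_i w_i * prod_{j<i}(1 - w_j) * c_i for one pixel,
// with splats sorted front to back and transmittance accumulated as we go.
function compositePixel(weights: number[], colors: [number, number, number][]): [number, number, number] {
  const out: [number, number, number] = [0, 0, 0];
  let transmittance = 1; // prod_{j<i} (1 - w_j)
  for (let i = 0; i < weights.length; i++) {
    const a = weights[i] * transmittance;
    out[0] += a * colors[i][0];
    out[1] += a * colors[i][1];
    out[2] += a * colors[i][2];
    transmittance *= 1 - weights[i];
  }
  return out;
}
```

Note that the loop runs front to back: each splat is attenuated only by the splats in front of it, which is what makes multi-layer, semi-transparent scenes composite correctly.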

3. ONNX Inference and WebGPU Scheduling

Model architectures supported within the Visionary platform span:

  • MLP-based 3DGS (Scaffold-GS): Receives anchor features and view direction; a multi-layer perceptron outputs per-splat parameters.
  • 4DGS: Receives time tt; computes per-splat deformations by sampling multi-dimensional feature planes and passing through a lightweight MLP.
  • Animatable Avatars: Receives SMPL-X pose and shape; uses forward kinematics and linear blend skinning (LBS) in ONNX to deform canonical Gaussian sets.
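The avatar path relies on linear blend skinning. A minimal LBS sketch follows, using the generic formulation $\mathbf{x}' = \sum_k w_k T_k \mathbf{x}$; this is an illustrative CPU reference, not the paper's exact ONNX graph:

```typescript
// Linear blend skinning: deform a canonical point x by a weighted sum of
// per-bone rigid transforms T_k (4x4 row-major, only the top 3 rows used).
function lbs(x: [number, number, number], transforms: number[][], weights: number[]): [number, number, number] {
  const out: [number, number, number] = [0, 0, 0];
  for (let k = 0; k < weights.length; k++) {
    const T = transforms[k];
    const w = weights[k];
    // apply the affine transform to the homogeneous point (x, 1)
    out[0] += w * (T[0] * x[0] + T[1] * x[1] + T[2] * x[2] + T[3]);
    out[1] += w * (T[4] * x[0] + T[5] * x[1] + T[6] * x[2] + T[7]);
    out[2] += w * (T[8] * x[0] + T[9] * x[1] + T[10] * x[2] + T[11]);
  }
  return out;
}
```

Inside the ONNX graph the same operation runs batched over all canonical Gaussian centers, with the bone transforms produced by forward kinematics from the SMPL-X pose.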

Each ONNX model is loaded as an InferenceSession with the WebGPU backend. Crucially, all I/O buffers reside on device, with mapped buffer scheduling enabling zero CPU-GPU transfer per frame. Warm-up routines and command buffer capture further eliminate JavaScript and CPU scheduling bottlenecks.

Per-frame workflow:

  1. Update GPU input buffers (camera, control variables)
  2. Execute asynchronous session.run() on WebGPU
  3. Output mapped to pre-allocated GPUBuffer for "Gaussians," directly consumed by the renderer

This approach achieves full device-resident dynamic neural inference with minimized host overhead (Gong et al., 9 Dec 2025).
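The per-frame workflow can be sketched with stand-ins for the GPU-resident pieces. The mock session and Float32Array buffers below are illustrative substitutes for the real onnxruntime-web InferenceSession and WebGPU GPUBuffers; the point is the buffer-reuse discipline, not the API surface:

```typescript
// Mock per-frame scheduling: input and output buffers are allocated once
// and reused every frame, mirroring the zero-copy device-resident design.
interface MockSession {
  run(inputs: { view: Float32Array; time: Float32Array }, output: Float32Array): Promise<void>;
}

class FrameLoop {
  // Pre-allocated, reused across frames (stand-ins for GPUBuffers).
  private viewBuf = new Float32Array(16);
  private timeBuf = new Float32Array(1);
  constructor(private session: MockSession, private gaussiansBuf: Float32Array) {}

  async frame(view: Float32Array, t: number): Promise<Float32Array> {
    // 1. Update input buffers in place (camera, control variables)
    this.viewBuf.set(view);
    this.timeBuf[0] = t;
    // 2. Execute the asynchronous inference graph
    await this.session.run({ view: this.viewBuf, time: this.timeBuf }, this.gaussiansBuf);
    // 3. The pre-allocated "Gaussians" buffer is consumed directly by the renderer
    return this.gaussiansBuf;
  }
}
```

Because the same output buffer object is returned every frame, no per-frame allocation or host-side copy occurs; in the real pipeline the analogous GPUBuffer never leaves the device.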

4. API, Integration, and Pipeline Interchangeability

Visionary’s implementation surfaces as a three.js extension and TypeScript library, exposing high-level methods for modular, dynamic model management. Core API endpoints include:

Method                     Purpose
-------------------------  -------------------------------------------------------
registerGaussianGenerator  Registers an ONNX model and schema metadata
loadAllGenerators          Loads binaries and creates inference sessions
onFrame                    Exposes callback for per-frame input updates
runGenerators              Performs inference for all loaded generators
addGaussianModelToScene    Binds a generator output buffer to the WebGPU renderer
This standardized contract enables runtime model swapping (e.g., exchanging a static 3DGS for a generative diffusion model) without modifications to rendering or browser integration logic. The integration model facilitates hybrid pipelines and extensibility for future architectures (Gong et al., 9 Dec 2025).
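The swap-in-place behavior can be illustrated with a small registry sketch. The method name registerGaussianGenerator comes from the API table above, but the spec fields and class shape here are assumptions, not the library's actual signatures:

```typescript
// Illustrative registry mirroring the documented API surface.
interface GeneratorSpec {
  name: string;
  onnxUrl: string;       // hypothetical field names
  maxGaussians: number;  // N_max from the model metadata
  dtype: 'fp16' | 'fp32';
}

class GeneratorRegistry {
  private specs = new Map<string, GeneratorSpec>();

  registerGaussianGenerator(spec: GeneratorSpec): void {
    this.specs.set(spec.name, spec);
  }

  // Swapping models is a re-register under the same name: downstream
  // rendering code never changes, which is the point of the contract.
  get(name: string): GeneratorSpec | undefined {
    return this.specs.get(name);
  }
}
```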

5. Performance Metrics and Real-Time Efficiency

Direct comparison of Visionary’s WebGPU pipeline against traditional WebGL+CPU viewer pipelines highlights substantial gains attributable to GPU-resident computation and optimized primitive sorting.

End-to-end rendering (6M Gaussians, RTX 4090):

  • SparkJS (WebGL+CPU sort): 176 ms/frame (172 ms sorting)
  • Visionary (WebGPU compute + GPU radix sort): 2.1 ms/frame (0.58 ms sorting, 1.52 ms preprocess+draw)
  • Approximate reduction: ≈85× in total frame time

Auxiliary metrics:

Model                             Gaussians  Inference Latency (ms)
--------------------------------  ---------  ----------------------
Scaffold-GS                       2.49M      9.3
Scaffold-GS                       4.56M      16.1
4DGS                              0.03M      4.8
4DGS                              0.06M      7.9
Animatable Avatar (1 instance)    ~0.04M     7.5–8.0
Animatable Avatar (10 instances)  ~0.4M      50–55
  • GPU radix sort: $O(N)$ work, ≈0.6 ms for 6M points, eliminating the CPU sorting bottleneck of WebGL-based approaches.
  • Renderer throughput: ≳400 fps (1/8 scale), ≳200 fps (1/4), ≳100 fps (1/2), ~50–60 fps (full resolution) for a single static model.
  • Frame time remains below 3 ms under rapid camera motion or multi-model compositions (Gong et al., 9 Dec 2025).
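The linear-work sort is a least-significant-digit radix sort over quantized depth keys. A CPU sketch of the same algorithm (the paper's version runs as WebGPU compute passes; this serial form only shows the counting/scatter structure):

```typescript
// LSD radix sort of splat indices by 32-bit quantized depth, 8 bits per
// pass: histogram, exclusive prefix sum, stable scatter. O(N) total work.
function radixSortByDepth(depths: Uint32Array): Uint32Array {
  const n = depths.length;
  let indices = Uint32Array.from({ length: n }, (_, i) => i);
  let scratch = new Uint32Array(n);
  for (let shift = 0; shift < 32; shift += 8) {
    const counts = new Uint32Array(257);
    // histogram of the current byte
    for (let i = 0; i < n; i++) counts[((depths[indices[i]] >>> shift) & 0xff) + 1]++;
    // prefix sum -> starting offset of each bucket
    for (let b = 0; b < 256; b++) counts[b + 1] += counts[b];
    // stable scatter into bucket positions
    for (let i = 0; i < n; i++) {
      const b = (depths[indices[i]] >>> shift) & 0xff;
      scratch[counts[b]++] = indices[i];
    }
    [indices, scratch] = [scratch, indices];
  }
  return indices; // splat indices ordered near-to-far
}
```

On the GPU, each of these three phases maps to a compute dispatch, which is why sorting 6M splats fits in well under a millisecond.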

A plausible implication is that the platform’s unified ONNX contract and device-resident compute enable real-time generative or deformable 3DGS, supporting both reconstructive and feedforward generative scene manipulation in-browser.

6. Applications and Impact on Neural Rendering Ecosystem

WebGPU-powered Gaussian splatting, as realized in Visionary, provides a standardized, flexible substrate for neural scene synthesis, reconstruction, and generative manipulation. The platform supports MLP-based models, multi-temporal 4DGS representations, animatable neural avatars, and style or enhancement networks. Unified inference and rendering—entirely in the browser—significantly lowers the infrastructure barrier for research reproduction, benchmarking, and deployment of new 3DGS-family algorithms (Gong et al., 9 Dec 2025).

This architectural unification accelerates integration of dynamic, generative world models and facilitates modular experimentation with scene deformers, diffusion post-processors, and hybrid pipelines for academic and industrial research in real-time neural graphics.
