
ONNX-Gaussian Generator for Real-Time 3DGS

Updated 11 December 2025
  • ONNX-based Gaussian Generator is a neural module in ONNX format that produces 3D Gaussian primitives for rendering applications.
  • It integrates dynamic inference with GPU-accelerated pipelines, enabling real-time 3D Gaussian Splatting and eliminating CPU bottlenecks.
  • The system leverages strict I/O contracts and efficient pre-processing to achieve sub-millisecond frame times in advanced WebGPU rendering.

An ONNX-based Gaussian Generator is a standardized neural network module, exported in Open Neural Network Exchange (ONNX) format, for producing frame-specific Gaussian primitives within a neural rendering pipeline. Deployed primarily in the context of real-time 3D Gaussian Splatting (3DGS), these generators integrate dynamic inference with GPU-accelerated rendering, eliminating legacy pipeline constraints and CPU bottlenecks. The ONNX-based Gaussian Generator is a core element in platforms such as Visionary, which unifies the inference and rendering of generative and reconstructive world models directly in the browser, leveraging WebGPU for high-throughput interactive synthesis and visualization (Gong et al., 9 Dec 2025).

1. Interface Schema and Data Contract

The ONNX-based Gaussian Generator adheres to a strict I/O schema per animation frame, ensuring compatibility and efficient interoperation with WebGPU-based rendering engines.

Inputs:

  • camera_extrinsic: float32[4,4], representing the world-to-camera transformation matrix.
  • camera_intrinsic: float32[4,4], encoding the camera’s projection parameters.
  • Optional control variables: sequence index, timestamp, or latent vector, each as float32 scalar or 1D tensor.

Outputs:

  • N (int32 scalar): specifies the number of Gaussian primitives produced per frame.
  • gaussians: float16[N,13] packed tensor, each row parameterizing a 3D Gaussian with:
    • Mean: \mu_x, \mu_y, \mu_z (3)
    • Covariance (upper-triangular): \Sigma_{xx}, \Sigma_{xy}, \Sigma_{xz}, \Sigma_{yy}, \Sigma_{yz}, \Sigma_{zz} (6)
    • RGB color: color_r, color_g, color_b (3)
    • Opacity: \alpha (1)
  • meta (optional): Dictionary with fields such as "packed_dtype" ("FP16" or "FP32") and upper bound for N.

The contract enforces that exactly N Gaussian tuples are output per frame. The use of a packed float16 layout minimizes upload bandwidth requirements; unpacking occurs on the GPU at runtime. Covariance is expressed in a compressed 6-value format corresponding to the independent elements of the symmetric 3\times 3 matrix.
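As an illustrative sketch of this packed layout, the following TypeScript decodes one float16 row on the CPU. The helper names are hypothetical; in the actual pipeline, unpacking happens in a GPU shader.

```typescript
// Decode one IEEE 754 half-precision (float16) value to a JS number.
function fp16ToNumber(bits: number): number {
  const sign = (bits & 0x8000) ? -1 : 1;
  const exp = (bits >> 10) & 0x1f;
  const frac = bits & 0x3ff;
  if (exp === 0) return sign * frac * 2 ** -24;           // subnormal
  if (exp === 0x1f) return frac ? NaN : sign * Infinity;  // inf / NaN
  return sign * (1 + frac / 1024) * 2 ** (exp - 15);
}

// One row of the float16[N,13] "gaussians" tensor, per the contract above.
interface GaussianRow {
  mean: [number, number, number];                         // μx, μy, μz
  cov: [number, number, number, number, number, number];  // Σxx..Σzz (upper tri)
  color: [number, number, number];                        // r, g, b
  opacity: number;                                        // α
}

// Unpack row i from the packed Uint16Array view of the tensor.
function unpackGaussian(packed: Uint16Array, i: number): GaussianRow {
  const o = i * 13;
  const f = (k: number) => fp16ToNumber(packed[o + k]);
  return {
    mean: [f(0), f(1), f(2)],
    cov: [f(3), f(4), f(5), f(6), f(7), f(8)],
    color: [f(9), f(10), f(11)],
    opacity: f(12),
  };
}
```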

2. Mathematical Foundations of Gaussian Splatting

Each rendered primitive G_i is modeled as an anisotropic Gaussian in \mathbb{R}^3, parameterized by weight w_i, mean vector \mu_i \in \mathbb{R}^3, and covariance matrix \Sigma_i \in \mathbb{R}^{3\times 3}. Under camera projection \Pi(\cdot), the 3D Gaussian maps to a 2D ellipse in image space:

  • Projected mean: x_i = \Pi(\mu_i)
  • Projected covariance: S_i = J_i \Sigma_i J_i^T, where J_i = \frac{\partial \Pi}{\partial x}\big|_{x = \mu_i}

The influence on pixel x \in \mathbb{R}^2 is given by:

g_i(x) = \alpha_i \cdot \exp\left( -\frac{1}{2} (x - x_i)^T S_i^{-1} (x - x_i) \right)

with view-independent color c_i \in [0,1]^3. Final color compositing proceeds front-to-back:

C(x) = \sum_i \left[ g_i(x) \prod_{j < i} (1 - g_j(x)) \right] c_i

The generator network maps its outputs to these parameters: \alpha_i \equiv w_i \in (0,1) (output via sigmoid); \mu_i (3-vector head); \Sigma_i (6-vector head, upper-triangular entries, positivity enforced via softplus); c_i (3-vector head, post-processed if using spherical harmonics coefficients).
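The influence and compositing formulas above can be sketched as a toy CPU evaluation. The types and names here are hypothetical; the real evaluation runs in the WebGPU fragment shader.

```typescript
type Vec2 = [number, number];

interface Splat2D {
  x: Vec2;                                 // projected mean x_i
  Sinv: [number, number, number, number];  // inverse 2x2 covariance S_i^{-1}, row-major
  alpha: number;                           // opacity α_i
  color: [number, number, number];         // view-independent color c_i
}

// g_i(x) = α_i · exp(-½ (x - x_i)^T S_i^{-1} (x - x_i))
function influence(s: Splat2D, p: Vec2): number {
  const dx = p[0] - s.x[0], dy = p[1] - s.x[1];
  const [a, b, c, d] = s.Sinv;
  const q = dx * (a * dx + b * dy) + dy * (c * dx + d * dy);
  return s.alpha * Math.exp(-0.5 * q);
}

// C(x) = Σ_i [ g_i(x) Π_{j<i} (1 - g_j(x)) ] c_i, splats sorted front-to-back.
function composite(splats: Splat2D[], p: Vec2): [number, number, number] {
  const C: [number, number, number] = [0, 0, 0];
  let T = 1; // accumulated transmittance Π (1 - g_j)
  for (const s of splats) {
    const g = influence(s, p);
    for (let k = 0; k < 3; k++) C[k] += T * g * s.color[k];
    T *= 1 - g;
  }
  return C;
}
```

A fully opaque splat (g = 1) drives the transmittance T to zero, so every splat behind it contributes nothing, matching the front-to-back product in the formula.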

3. ONNX Inference Pipeline and Runtime Integration

Model deployment and inference occur directly within WebGPU-enabled browser contexts. The pipeline comprises several phases:

  • Model Loading: Performed once at application initialization using onnxruntime-web. The ONNX model is loaded, often with parameters quantized to FP16 and constants baked in at export (opset ≥ 14, constant folding enabled).
  • Warm-Up: A dummy inference primes the WebGPU execution graph and cache, optimizing subsequent per-frame executions.
  • Per-Frame Schedule:
    • Camera and optional control data are packed into WebGPU buffers.
    • Synchronous/asynchronous forward pass via ONNX runtime, with outputs streamed directly as GPU buffers (avoiding CPU roundtrips).
    • Gaussian buffer is bound as storage for GPU compute pre-processing: transform, cull, and radix-sort the Gaussian splats.
    • Instanced draw call issues one triangle-strip per splat; fragment shader evaluates gi(x)g_i(x) and performs compositing.

Reusing persistent WebGPU bind groups and ONNX session bindings enables predictable low-latency execution and amortized dispatch costs within the frame budget.

The following table summarizes the operational pipeline:

Step                  Action                                            Technology
Model load            ONNX import, session instantiation                onnxruntime-web
Warm-up               Dummy inference to cache WebGPU execution graph   WebGPU
Per-frame inference   Input assembly, neural decoding, GPU buffer bind  WebGPU, ONNX
Preprocessing         Splat preprocess, cull, radix sort (O(N))         WebGPU compute
Rendering             Instanced draw, fragment compositing              WebGPU fragment
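The O(N) depth ordering in the preprocessing step can be illustrated with a least-significant-digit radix sort over quantized depth keys. This is a CPU sketch under the assumption of 16-bit keys; the production version is a multi-pass WebGPU compute kernel.

```typescript
// Sort splat indices by a 16-bit quantized depth key using two 8-bit
// counting passes — O(N) in the number of splats, as on the GPU.
function radixSortByDepth(keys: Uint16Array): Uint32Array {
  const n = keys.length;
  let src = new Uint32Array(n);
  for (let i = 0; i < n; i++) src[i] = i;        // identity index buffer
  let dst = new Uint32Array(n);
  for (let shift = 0; shift < 16; shift += 8) {  // two digit passes
    const count = new Uint32Array(256);
    for (let i = 0; i < n; i++) count[(keys[src[i]] >> shift) & 0xff]++;
    let sum = 0;                                 // exclusive prefix sum
    for (let d = 0; d < 256; d++) { const c = count[d]; count[d] = sum; sum += c; }
    for (let i = 0; i < n; i++) {
      const digit = (keys[src[i]] >> shift) & 0xff;
      dst[count[digit]++] = src[i];              // stable scatter
    }
    [src, dst] = [dst, src];
  }
  return src; // indices ordered front-to-back by depth key
}
```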

4. Network Architectures and Training Paradigms

Several network designs are supported:

  • MLP-based 3DGS: Follows the Scaffold-GS style. Input is a per-anchor feature (f_i, dimension 256–512) concatenated with the view direction (d_{view}, dimension 3). 4–6 fully connected layers with 256–512 hidden units (GELU or ReLU activations). Four output heads: H_\mu (3 floats), H_\Sigma (6 floats via softplus), H_c (3 floats, either direct RGB or SH coefficients), and H_\alpha (1 float via sigmoid). Trained via L2 or L1 pixel loss plus regularization on the \Sigma scale. ONNX export uses dynamic axes for batch/view dimensions.
  • 4D Gaussian Splatting: Temporal input (timestamp t) with canonical parameters and multi-scale feature planes. Feature lookups (bilinear sampling) are concatenated and fed to a small MLP (2–3 layers, 128 units). Outputs are a delta mean (3), a delta quaternion for rotation (4), and a delta scale (3). The covariance updates as R \Sigma_{can} R^T + \operatorname{diag}(\Delta s).
  • Animatable Avatars (e.g., LHM, R3-Avatar): Input is the SMPL-X pose \theta (72D), shape \beta (10D), and an optional frame index. Internally, canonical \mu, \Sigma, and per-Gaussian skinning weights W_i are used. Forward kinematics and LBS are included within the ONNX graph. Output is the deformed \mu_i and \Sigma_i in the current observation frame.

Models are commonly exported with mixed-precision (FP16) weights, opset ≥ 14, and constant folding enabled; large Concat/Split operations are refactored at export time to conform to WebGPU operator constraints.
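The 4DGS covariance update R \Sigma_{can} R^T + diag(\Delta s) can be sketched directly. This is a minimal CPU version of the in-graph math with hypothetical helper names.

```typescript
type Mat3 = number[]; // row-major 3x3

// Rotation matrix from a unit quaternion (w, x, y, z).
function quatToMat3(w: number, x: number, y: number, z: number): Mat3 {
  return [
    1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
    2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
    2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y),
  ];
}

function matMul(a: Mat3, b: Mat3): Mat3 {
  const c = new Array(9).fill(0);
  for (let i = 0; i < 3; i++)
    for (let j = 0; j < 3; j++)
      for (let k = 0; k < 3; k++) c[3*i + j] += a[3*i + k] * b[3*k + j];
  return c;
}

function transpose(a: Mat3): Mat3 {
  return [a[0], a[3], a[6], a[1], a[4], a[7], a[2], a[5], a[8]];
}

// Σ = R Σ_can R^T + diag(Δs): rotate the canonical covariance by the
// predicted delta rotation, then add the per-axis scale deltas.
function updateCovariance(sigmaCan: Mat3, dq: [number, number, number, number],
                          ds: [number, number, number]): Mat3 {
  const R = quatToMat3(...dq);
  const out = matMul(matMul(R, sigmaCan), transpose(R));
  out[0] += ds[0]; out[4] += ds[1]; out[8] += ds[2];
  return out;
}
```

With the identity quaternion the rotation drops out and the update reduces to Σ_can + diag(Δs), which is a quick sanity check for an exported graph.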

5. Browser-Based Integration and Application Example

Integration is facilitated by a concise TypeScript API compatible with three.js and the “visionary-webgpu” library. The typical workflow involves initializing a WebGPU renderer and Gaussian renderer, loading the ONNX model, warm-up, and a main loop that, per frame, updates camera matrices, performs ONNX inference, streams results to GPU, and dispatches render calls.

import * as THREE from 'three';
import { InferenceSession, Tensor } from 'onnxruntime-web';
import { WebGPURenderer, GaussianRenderer } from 'visionary-webgpu';

async function initVisionary(canvas: HTMLCanvasElement, camera: THREE.PerspectiveCamera) {
  const renderer = new WebGPURenderer(canvas, { useCompute: true });
  const gaussRenderer = new GaussianRenderer(renderer);

  const session = await InferenceSession.create('/models/gauss_gen.onnx', {
    executionProviders: ['webgpu']
  });

  const dummyExtrinsic = new Tensor('float32', new Float32Array(16), [4,4]);
  const dummyIntrinsic = new Tensor('float32', new Float32Array(16), [4,4]);
  await session.run({ camera_extrinsic: dummyExtrinsic, camera_intrinsic: dummyIntrinsic });

  function frame(time: number) {
    const cam = camera.matrixWorldInverse;
    const proj = camera.projectionMatrix;
    const extrinsicTensor = new Tensor('float32', cam.toArray(), [4,4]);
    const intrinsicTensor = new Tensor('float32', proj.toArray(), [4,4]);
    session.run({
      camera_extrinsic: extrinsicTensor,
      camera_intrinsic: intrinsicTensor,
      // optional control...
    }).then((outputs) => {
      const gaussBuffer = outputs.gaussians.data as Uint16Array;
      const count = outputs.N.data[0] as number;
      gaussRenderer.updatePrimitiveBuffer(gaussBuffer, count);
      renderer.beginFrame();
      gaussRenderer.draw();
      renderer.endFrame();
    });
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}

This workflow maintains all data and computation on the GPU, minimizing latency and eliminating CPU-GPU roundtrips.

6. Performance Characteristics and Throughput

The ONNX+WebGPU approach yields high throughput and low latency:

  • Per-frame preprocessing (frustum culling, transforming \mu to clip space, converting \Sigma to screen-space ellipse axes) runs in a single WebGPU compute pass.
  • Depth keys and indices are updated via GPU atomics, and a GPU radix sort organizes splats in O(N) runtime, achieving sub-millisecond times for millions of points.
  • Instanced triangle-strip draws and fragment-based Gaussian summation enable efficient compositing and blending.
  • Captured and replayed ONNX execution graphs result in stable dispatch overhead, with reported per-frame decoding times:
    • Scaffold-GS: 2.5M splats → 9 ms
    • 4DGS: 0.06M splats → 8 ms
    • Avatars: 0.2M splats → 30 ms
  • Aggregate end-to-end frame times for static 6M Gaussians are ~2 ms, an approximately 100× speedup over CPU-based WebGL sorting.
  • The entire pipeline remains within a single-frame time budget, leveraging unified memory architectures for device-local execution and rapid updates (Gong et al., 9 Dec 2025).
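The frustum-culling step in the preprocessing pass can be sketched on the CPU. The names are hypothetical, and the clip test follows the WebGPU convention of depth in [0, w].

```typescript
type Vec4 = [number, number, number, number];
type Mat4 = number[]; // row-major 4x4

// Multiply a row-major 4x4 matrix with a homogeneous point.
function mulMat4Vec4(m: Mat4, v: Vec4): Vec4 {
  const r: Vec4 = [0, 0, 0, 0];
  for (let i = 0; i < 4; i++)
    for (let k = 0; k < 4; k++) r[i] += m[4*i + k] * v[k];
  return r;
}

// A splat survives if its clip-space mean lies inside the canonical view
// volume: -w <= x,y <= w and 0 <= z <= w (WebGPU depth range).
function frustumCull(viewProj: Mat4, means: Float32Array): Uint32Array {
  const kept: number[] = [];
  for (let i = 0; i < means.length / 3; i++) {
    const [x, y, z, w] = mulMat4Vec4(viewProj,
      [means[3*i], means[3*i + 1], means[3*i + 2], 1]);
    if (w > 0 && Math.abs(x) <= w && Math.abs(y) <= w && z >= 0 && z <= w)
      kept.push(i);
  }
  return new Uint32Array(kept);
}
```

In the actual pipeline this test runs per splat inside the single compute pass, writing surviving indices into the buffer consumed by the depth sort.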

A plausible implication is that the ONNX-based Gaussian Generator contract enables real-time, browser-native world model rendering and generative processing suitable for both reconstruction and synthetic content, with substantial practical advantages for reproducibility and deployment.
