
Air Light VR (ALVR) Streaming Bridge

Updated 30 January 2026
  • Air Light VR (ALVR) is an open-source, MIT-licensed platform for untethered VR streaming that decouples high-fidelity rendering from head-mounted displays.
  • It uses a client–server model to stream HEVC-encoded video, audio, and motion tracking data over UDP, ensuring low motion-to-photon latency.
  • The system supports adaptive bitrate control and extensive network metric analysis, making it a robust platform for VR performance research.

Air Light VR (ALVR) is an open-source, MIT-licensed VR streaming bridge designed for untethered, cloud-based VR content delivery over Wi-Fi. Its primary function is to decouple high-fidelity graphical rendering, performed on a SteamVR server, from display operations on a head-mounted display (HMD). ALVR transmits HEVC-encoded video, audio, and control streams over UDP in real time, achieving motion-to-photon latencies suitable for interactive VR workloads. The system also serves as a research and development platform for studying networked VR performance metrics, adaptive bitrate control, codec pipeline optimizations, and multi-user wireless contention scenarios.

1. Architectural Design and Data Flow

ALVR implements a client–server paradigm, separating VR scene rendering from display and motion tracking. The server component, running on a VR-ready PC, performs pose prediction based on HMD sensor feedback and renders stereo frames with optional foveated rendering and reprojection. Video frames are encoded using FFmpeg's HEVC pipeline ("fast" preset) and chunked into discrete intervals (e.g., $T_{\mathrm{CHUNK}} = 1.5$ s). Each encoded NAL unit is extracted, tagged with ALVR-specific headers (including stream type, sequence number, and total packet count), then fragmented for UDP transport.

Audio streams are sent in fixed-size 2,000-byte pairs at 10 ms intervals. On the HMD client, UDP fragments are reassembled into complete video frames for hardware decoding and VR runtime display. Head-pose telemetry is transmitted uplink at triple the frame rate. Loss notifications are triggered if frame reassembly exceeds a 0.1 s deadline, enforcing stringent latency and reliability requirements (Maura et al., 23 Jan 2026).

2. Emulated 802.11 Network Integration

ALVR’s operation has been extensively studied within controlled and simulated IEEE 802.11 environments. Research teams have recreated ALVR’s traffic injection mechanisms and application-layer timing on both physical and emulated Wi-Fi infrastructures, yielding realistic analyses of VR-specific network behavior (Maura et al., 23 Jan 2026, Casasnovas et al., 20 Feb 2025).

A discrete-event Rust framework (NexoSim) modularly instantiates server/client “Model” endpoints tethered to access point and station models. Native ALVR packetization logic is retained, ensuring that traffic profiles, including HEVC frame sizes, burstiness, and fragmentation, match operational deployments. The wireless channel is modeled with IEEE 802.11be PHY/MAC parameters (5 GHz, 80 MHz, DCF, RTS/CTS, single-user MIMO, AMPDU aggregation, and typical exponential backoff), with path loss modeled per TMB indoor assumptions. A 10% MPDU packet error rate triggers MAC-level retransmissions, accurately emulating congestion and reliability events.
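
The 10% MPDU error model implies near-lossless delivery at the MAC layer once retransmissions are counted, since residual loss decays as $\mathrm{PER}^{r}$ with $r$ retry attempts. A quick Monte-Carlo sketch (the retry limit of 7 is an assumed 802.11 default, not a value stated in the text):

```python
import random

PER = 0.10       # per-attempt MPDU error rate from the emulation setup
RETRY_LIMIT = 7  # assumed 802.11 retry limit (illustrative)

def mpdu_delivered(rng: random.Random) -> bool:
    """One MPDU: retransmit on error, up to RETRY_LIMIT attempts."""
    return any(rng.random() >= PER for _ in range(RETRY_LIMIT))

rng = random.Random(0)
trials = 100_000
losses = sum(not mpdu_delivered(rng) for _ in range(trials))
# Residual loss after retries is roughly PER**RETRY_LIMIT = 1e-7, so
# virtually every MPDU is eventually delivered despite the 10% PER;
# the cost shows up as extra airtime, not as packet loss.
print(f"residual MPDU loss rate: {losses / trials:.6f}")
```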

Table: ALVR Traffic Injection and 802.11 Simulation

| Component | Implementation | Key Parameters |
|---|---|---|
| Traffic source | ALVR StreamSocket | Application headers, UDP |
| Codec pipeline | FFmpeg HEVC ("fast" preset) | GOP/IR, 4K, 60/90 FPS |
| Wireless model | NexoSim, 802.11be | 80 MHz, DCF, PER ≈ 10% |

3. Video Codec Modes and Traffic Parameters

ALVR supports both IPPP-type Group of Pictures (GOP) and intra-refresh (IR) HEVC coding. Standard GOP structures exhibit periodic I-frame spikes with interspersed P-frames, using GOP sizes of 30 or 90. In contrast, IR coding disperses intra-coded macroblocks across every frame, flattening the frame-size distribution and yielding lower latency variability at the cost of reduced perceptual quality (typically a 2–3 point VMAF loss at equivalent bitrates).

Experimental traffic scenarios use CBR profiles at 10–100 Mbps and frame rates of 60 or 90 FPS. Frame size variance, burstiness, and channel utilization increase with bitrate and frame rate, impacting overall queuing and airtime. Larger GOPs increase compression efficiency but magnify instantaneous airtime bursts on I-frame transmission (Maura et al., 23 Jan 2026).
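
The contrast between GOP bursts and IR's flattened profile can be illustrated with a toy per-frame size model at a CBR target. The I-to-P size ratio of 5 below is an assumption for illustration, not a measured value:

```python
def frame_sizes_bits(bitrate_mbps: float, fps: int, n_frames: int,
                     mode: str = "GOP", gop: int = 30,
                     i_to_p_ratio: float = 5.0) -> list[float]:
    """Toy per-frame size model at a CBR target.
    GOP mode: one large I-frame every `gop` frames; IR mode: uniform sizes."""
    budget = bitrate_mbps * 1e6 / fps  # average bits per frame
    if mode == "IR":
        return [budget] * n_frames
    # Choose I/P sizes so the GOP average still hits the CBR budget:
    # (ratio * p + (gop - 1) * p) / gop = budget
    p = budget * gop / (i_to_p_ratio + gop - 1)
    i = i_to_p_ratio * p
    return [i if k % gop == 0 else p for k in range(n_frames)]

gop_sizes = frame_sizes_bits(50, 90, 90, mode="GOP", gop=30)
ir_sizes = frame_sizes_bits(50, 90, 90, mode="IR")
# Both modes deliver the same average bitrate, but GOP concentrates
# bits into I-frame bursts that dominate instantaneous airtime.
print(max(gop_sizes) / max(ir_sizes))
```

Even in this simplified model, the GOP peak frame is several times the IR frame size at the same average bitrate, which is exactly the airtime-burst effect noted above for large GOPs.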

4. Network Metrics and Analytical Expressions

ALVR collects a comprehensive set of application-layer network metrics for both performance evaluation and adaptive control, as extended and validated in several research publications (Maura et al., 2024, Casasnovas et al., 20 Feb 2025):

  • End-to-End Latency (Video-Frame RTT): $L = \tau_{\mathrm{tx}} + \tau_{\mathrm{net}} + \tau_{\mathrm{dec}}$, where $\tau_{\mathrm{tx}}$ is packetization/transmission time, $\tau_{\mathrm{net}}$ is wireless channel time (including backoff and collisions), and $\tau_{\mathrm{dec}}$ is decode and buffer time.
  • Latency Jitter: $\sigma_L = \sqrt{\frac{1}{N} \sum_{i=1}^{N}(L_i - \bar{L})^2}$.
  • Channel Utilization (CU): Fraction of airtime consumed (including collisions and retransmissions), computed from MAC logs.
  • Throughput Capacity: $C = \frac{\text{total bits successfully transmitted}}{\text{simulation time}}$.
  • Frame Loss Rate (FLR): Fraction of video frames dropped because reassembly exceeded the deadline (typically 0.1 s).

Additional metrics include client-side frame span, frame inter-arrival time, packet loss counts, instantaneous and peak throughput, video-frame jitter, and filtered one-way delay gradients (FOWD), with Kalman-filter postprocessing for stability (Maura et al., 2024).

QoS/QoE thresholds are set per ITU-T J.1631 guidance (median $L \leq 33$ ms, FLR $\leq 1\%$).
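
The metric definitions above can be computed directly from per-frame latency samples. A minimal sketch, using the deadline and QoS thresholds stated in this section (the fixed bits-per-frame input is a simplification):

```python
import statistics

DEADLINE_S = 0.1   # video-frame reassembly deadline from Section 1
QOS_L_MS = 33.0    # ITU-T J.1631 median-latency threshold
QOS_FLR = 0.01     # ITU-T J.1631 frame-loss threshold

def vr_metrics(latencies_s: list[float], bits_per_frame: float, sim_time_s: float):
    """Latency jitter, throughput capacity, FLR, and QoS verdict as defined above."""
    n = len(latencies_s)
    mean_l = sum(latencies_s) / n
    jitter = (sum((l - mean_l) ** 2 for l in latencies_s) / n) ** 0.5
    delivered = [l for l in latencies_s if l <= DEADLINE_S]
    flr = 1 - len(delivered) / n
    throughput = len(delivered) * bits_per_frame / sim_time_s  # bits per second
    median_l_ms = statistics.median(latencies_s) * 1e3
    qos_ok = median_l_ms <= QOS_L_MS and flr <= QOS_FLR
    return jitter, throughput, flr, qos_ok
```

For example, a trace of 200 frames with one deadline miss yields an FLR of 0.5%, within the QoS budget as long as the median latency stays below 33 ms.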

5. Adaptive Bitrate Control and the NeSt-VR Algorithm

To address Wi-Fi channel fluctuations, multi-user contention, and mobility, ALVR integrates Adaptive Bitrate (ABR) algorithms. The principal ABR implementations include ALVR’s native profile and the Network-aware Step-wise ABR (“NeSt-VR”) algorithm, both relying on real-time network metrics to inform video encoder target bitrate setting (Casasnovas et al., 20 Feb 2025).

NeSt-VR Algorithm Highlights:

  • Periodic execution ($\tau = 1$ s) with discrete bitrate steps ($\Delta B$), ranging between $B_{\min} = 10$ Mbps and $B_{\max} = 100$ Mbps.
  • Inputs: smoothed Network Frame Ratio ($\overline{\mathrm{NFR}}$), Video-Frame RTT ($\overline{\text{VF-RTT}}$), and estimated filtered channel capacity ($C$).
  • Decision logic:
    • If $\overline{\mathrm{NFR}} < \rho$, aggressively decrement the bitrate;
    • If $\overline{\text{VF-RTT}} > \sigma$, probabilistically reduce the bitrate;
    • Otherwise, probabilistically increment the bitrate, capped so that $B_v \leq m \cdot C$ with $m = 0.90$.
  • Parameterizations for “Balanced”, “Speedy”, and “Anxious” profiles determine adaptation granularity.
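
The decision logic above can be sketched as one adaptation tick. The thresholds $\rho$ and $\sigma$, the step size, and the step probabilities below are illustrative assumptions; the papers define these per profile ("Balanced", "Speedy", "Anxious"):

```python
import random

B_MIN, B_MAX = 10.0, 100.0  # Mbps, from the algorithm description
STEP = 10.0                 # Mbps, assumed value of Delta-B
M = 0.90                    # headroom factor m on estimated capacity C
RHO = 0.95                  # NFR threshold rho (assumed value)
SIGMA_MS = 22.0             # VF-RTT threshold sigma (assumed value)
P_UP, P_DOWN = 0.5, 0.5     # probabilistic step chances (assumed values)

def nest_vr_step(bitrate: float, nfr: float, vf_rtt_ms: float,
                 capacity: float, rng: random.Random) -> float:
    """One NeSt-VR adaptation tick (executed every tau = 1 s)."""
    if nfr < RHO:                         # frames not arriving: back off hard
        bitrate -= STEP
    elif vf_rtt_ms > SIGMA_MS:            # latency creeping up: maybe back off
        if rng.random() < P_DOWN:
            bitrate -= STEP
    elif rng.random() < P_UP:             # healthy channel: maybe probe upward
        bitrate += STEP
    bitrate = min(bitrate, M * capacity)  # never exceed m * C
    return max(B_MIN, min(B_MAX, bitrate))
```

For example, with $\overline{\mathrm{NFR}} = 0.8$ the controller steps down deterministically (50 to 40 Mbps); if the estimated capacity collapses to 50 Mbps, the $m \cdot C$ cap immediately clamps the target to 45 Mbps regardless of the probabilistic branch.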

This algorithm achieves higher average delivered bitrates and superior QoE-proxy metrics (frame delivery rate, VF-RTT, packet loss) relative to CBR and ALVR’s native ABR, especially under capacity drops, mobility, and co-channel interference scenarios (Casasnovas et al., 20 Feb 2025, Maura et al., 2024).

6. Experimental Validation and Capacity Limits

Research testbeds at UPF (Barcelona) and CREW (Brussels) employed ALVR for extensive single-user and multi-user evaluations (Casasnovas et al., 20 Feb 2025, Maura et al., 23 Jan 2026). Under controlled 802.11be settings (5 GHz, 80 MHz):

  • Single-user, 100 Mbps CBR, 90 FPS: median latency $L \approx$ 10–11 ms, FLR ≈ 0.1%.
  • Multi-user (up to 6 users): at 4 users, CU reaches 96–100%, FLR surpasses 1%, and median $L$ exceeds 33 ms, breaching the QoS threshold.
  • GOP vs. IR coding: IR reduces latency jitter $\sigma_L$ by ~30–40% at saturation compared to GOP, with only a minor VMAF penalty.
  • Adaptive ABR (NeSt-VR): maintains frame delivery ≈ 90 FPS and VF-RTT ≈ 12 ms across dynamic bandwidth and mobility cases, scaling the average bitrate to the available capacity and mitigating packet loss during transitions and interference.
  • Mobility and OBSS interference: NeSt-VR dynamically reduces the target bitrate in response to increased RTT and capacity loss, preserving frame rate and avoiding stalls.

Table: Performance Metrics for ALVR at 90 FPS, 100 Mbps CBR

| Users | Codec | CU (%) | FLR (%) | Median $L$ (ms) | $\sigma_L$ (ms) | QoS OK? |
|---|---|---|---|---|---|---|
| 1 | GOP90 | 24 | 0.1 | 10 | 4 | Yes |
| 4 | IR | 96 | 1.2 | 18 | 8 | No |
| 6 | GOP90 | 92 | 3.8 | 80 | 40 | No |

These results indicate that a vanilla IEEE 802.11 channel sustains fewer than four concurrent 100 Mbps VR streams before latency and loss breach QoS thresholds, with IR coding extending stability only marginally (Maura et al., 23 Jan 2026).

7. Limitations and Prospective Advancements

Current ALVR ABR implementations use static metric thresholds (e.g., $\rho$, $\sigma$) tuned for general VR streaming, but dynamic thresholding may be required for different applications or environments (Casasnovas et al., 20 Feb 2025). Fairness between users is implicit; future developments could integrate explicit coordination or weighted ABR algorithms. Objective perceptual metrics (PSNR, SSIM, VMAF) are not currently factored into ABR logic but are suggested for future QoE-driven refinements.

Research directions involve leveraging upcoming Wi-Fi 7/8 feature sets (Multi-Link Operation, Multi-AP Coordination) to refine capacity estimation, integrating offline reinforcement learning for predictive adaptation, and exploring the impact of realistic network dynamics and interference.

All ALVR code, metrics, and NeSt-VR implementations remain publicly available for further development and reproducibility (Casasnovas et al., 20 Feb 2025).
