LuxRemix: Real-Time Light Editing Pipeline
- LuxRemix is a generative computational pipeline that decomposes complex indoor scene lighting into ambient and one-light-at-a-time components for precise, interactive editing.
- It integrates diffusion models, multi-view harmonization, and 3D Gaussian splatting to achieve photorealistic rendering and maintain consistency across various viewpoints.
- The system enables real-time control over individual light sources by allowing on/off toggling and adjustment of chromaticity and intensity, outperforming global relighting methods.
LuxRemix is a generative computational pipeline for decomposing and interactively editing the illumination of complex indoor scenes from single or multi-view captures. It enables precise control of individual light sources—including on/off state, chromaticity, and intensity—for post-capture editing while maintaining photorealistic and multi-view-consistent rendering behavior. The approach integrates generative diffusion models, multi-view harmonization, and relightable 3D Gaussian splatting, resulting in real-time, per-light editing capabilities for both synthetic and real-world datasets (Liang et al., 21 Jan 2026).
1. Problem Setting and Objectives
Indoor scenes typically feature spatially varying, near-field illumination from multiple sources such as ceiling fixtures, table lamps, and wall sconces. Captured images of such scenes encode the aggregate effect of all lights, impeding downstream editing and relighting tasks. LuxRemix targets the post-capture, per-light decomposition and remixing of such lighting: given a set of captured images, the goal is to factorize the scene illumination into an ambient component and distinct "one-light-at-a-time" (OLAT) contributions, and then to enable interactive, independent control of each source—including toggling individual lights and modulating their radiometric and chromatic properties. This editing must be feasible from arbitrary viewpoints and at interactive frame rates.
The method comprises three principal components:
- Lighting decomposition: Image-based factorization of the full illumination image into an ambient term plus per-light OLAT components.
- Multi-view harmonization: Propagation and enforcement of lighting decomposition across all captured views, ensuring both geometric and photometric consistency.
- Relightable 3D Gaussian splatting: Real-time rendering of scenes as Gaussian clouds with per-light appearance parameterization.
2. Generative Image-Based Light Decomposition
2.1 Mathematical Factorization
Given a post-tone-mapped input image $I$, LuxRemix expresses it as

$$I = I_{\mathrm{amb}} + \sum_{i=1}^{N} c_i \odot I_i$$

where $I_{\mathrm{amb}}$ comprises indirect and background lighting, each $I_i$ is the OLAT image showing the scene illuminated solely by light $i$, and $c_i$ is its RGB scaling factor ($\odot$ denoting channelwise multiplication). This decomposition enables isolation and manipulation of each light's contribution.
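Because the factorization is linear in the per-light terms, recombining an edited scene is a weighted sum. The sketch below (hypothetical helper `compose`, small constant-valued arrays standing in for real images) illustrates how zeroing or re-tinting a scale factor $c_i$ edits one light without touching the others:

```python
import numpy as np

def compose(ambient, olats, colors):
    """Recombine an ambient image and per-light OLAT images.

    ambient: (H, W, 3) ambient/indirect term
    olats:   list of N (H, W, 3) one-light-at-a-time images
    colors:  list of N length-3 RGB scaling factors c_i
    """
    out = ambient.copy()
    for olat, c in zip(olats, colors):
        out += olat * np.asarray(c)  # channelwise (Hadamard) scaling
    return out

# A light is "switched off" by zeroing its scale, re-tinted by changing c_i.
H, W = 4, 4
ambient = np.full((H, W, 3), 0.1)
olats = [np.full((H, W, 3), 0.5), np.full((H, W, 3), 0.2)]
full = compose(ambient, olats, [(1, 1, 1), (1, 1, 1)])
warm = compose(ambient, olats, [(1.0, 0.7, 0.4), (0, 0, 0)])  # light 2 off
```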
2.2 Architectural Components
- Base model: A pretrained diffusion-transformer (DiT), suited for conditional image editing.
- LoRA adapters: Low-rank adaptation modules inserted into every Transformer attention block, allowing efficient fine-tuning for lighting tasks.
- Spatial prompt embedding: User-specified light masks, encoded by a single-layer MLP and broadcast as channelwise latent additions, localize edits to selected lights.
- Dual instruct tuning: Fine-tuning of LoRA for (1) OLAT decomposition tasks ("switch off all lights except the selected one at specified brightness") and (2) "light-off" editing tasks ("turn off only the selected light, keeping others unchanged").
- Multi-exposure prompting: Three LDR exposure levels (EV-4, EV-2, EV0) of each OLAT pass are synthesized and then merged through exposure-bracket HDR fusion (following Debevec-Malik).
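The exposure-bracket merge above can be sketched as follows. This is a simplified stand-in, not the paper's exact procedure: it assumes a known gamma-2.2 camera response in place of the Debevec-Malik calibrated response curve, and uses a hat weighting that trusts mid-tone pixels:

```python
import numpy as np

def merge_exposures(ldr_images, evs, gamma=2.2):
    """Debevec-Malik-style HDR merge of LDR exposure brackets.

    ldr_images: list of (H, W, 3) arrays in [0, 1] (a fixed gamma curve
                stands in for the calibrated camera response here).
    evs:        exposure values relative to EV0, e.g. [-4, -2, 0].
    """
    num = 0.0
    den = 0.0
    for ldr, ev in zip(ldr_images, evs):
        lin = np.clip(ldr, 0.0, 1.0) ** gamma   # invert assumed response
        t = 2.0 ** ev                           # relative exposure time
        w = 1.0 - np.abs(2.0 * ldr - 1.0)       # hat weight: trust mid-tones
        num += w * lin / t
        den += w
    return num / np.maximum(den, 1e-8)

# Synthetic bracket of a constant radiance 0.5, re-fused to HDR.
E = 0.5
evs = [-4, -2, 0]
ldrs = [np.full((2, 2, 3), (E * 2.0 ** ev) ** (1 / 2.2)) for ev in evs]
hdr = merge_exposures(ldrs, evs)
```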
3. Multi-View Lighting Harmonization
3.1 Diffusion-Based Harmonization
To generalize decompositions across all viewpoints, a multi-view diffusion U-Net (LuxRemix-MV) is fine-tuned. Inputs comprise:
- Multi-view RGB images
- OLAT and ambient decompositions for a subset of views
- Plücker ray embeddings
- Binary reference-view masks
The network produces photometrically coherent decompositions for unobserved viewpoints. Conditioning is achieved by concatenating all reference and target views, their masks, and their ray-based spatial encodings, enforcing geometric alignment and cross-view consistency.
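The Plücker ray embedding used for spatial conditioning can be computed directly from camera rays. A minimal sketch (hypothetical function name; any origin point on the same ray yields the same embedding, which is what makes it a view-geometry encoding rather than a camera-pose encoding):

```python
import numpy as np

def plucker_embedding(origins, dirs):
    """Per-ray Plucker coordinates (d, o x d): a 6-channel spatial
    encoding of camera rays for conditioning the multi-view network.

    origins: (..., 3) ray origins (camera centre, broadcast per pixel)
    dirs:    (..., 3) ray directions (normalised inside)
    """
    d = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    m = np.cross(origins, d)  # moment; invariant to sliding o along the ray
    return np.concatenate([d, m], axis=-1)

o = np.array([[1.0, 0.0, 0.0]])
d = np.array([[0.0, 0.0, 2.0]])  # deliberately non-unit
e1 = plucker_embedding(o, d)
e2 = plucker_embedding(o + 3.0 * np.array([[0.0, 0.0, 1.0]]), d)
```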
3.2 Optimization Regime
Fine-tuning is performed for 30,000 iterations in three curriculum phases (increasing batch view count from 4 to 8 to 15), using the AdamW optimizer. Each predicted output is generated at three exposure levels and merged for HDR fidelity. The loss combines diffusion objectives and cross-view composition terms.
4. Integration with Relightable 3D Gaussian Splatting
4.1 3D Representation
Scenes are modeled as collections of 3D Gaussians with associated appearance features, allowing real-time, differentiable rendering by projecting and splatting into image space.
4.2 Lighting Parameter Optimization
Each Gaussian $g$ is extended to include HDR RGB coefficient vectors $c_g^i$, one per OLAT component plus one for the ambient component. The contribution of light $i$ at pixel $p$ is rendered as

$$\hat{I}_i(p) = \sum_{g} w_g(p)\, c_g^i$$

with $w_g(p)$ as the splat weight of Gaussian $g$ at pixel $p$.
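With splat weights precomputed, the per-light render is a weighted sum over Gaussians, which a single `einsum` expresses. A minimal sketch (hypothetical function name; dense per-pixel weight tensors stand in for the tile-based rasterizer):

```python
import numpy as np

def render_light(weights, coeffs, light_idx):
    """Radiance of one light at each pixel from splatted Gaussians.

    weights: (H, W, G) alpha-blended splat weights w_g(p)
    coeffs:  (G, L, 3) per-Gaussian HDR RGB coefficients c_g^i
             (L = number of OLAT lights plus one ambient slot)
    """
    return np.einsum('hwg,gc->hwc', weights, coeffs[:, light_idx])

weights = np.array([[[0.5, 0.25]]])                      # 1x1 image, 2 Gaussians
coeffs = np.array([[[1.0, 1.0, 1.0], [0.2, 0.2, 0.2]],   # Gaussian 0, lights 0/1
                   [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])  # Gaussian 1, lights 0/1
out = render_light(weights, coeffs, 0)
```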
4.3 Objective Functions
- OLAT fidelity: $\mathcal{L}_{\mathrm{OLAT}} = \sum_i \lVert \hat{I}_i - I_i \rVert_1$, matching each rendered per-light image to its decomposed target.
- Composition consistency: $\mathcal{L}_{\mathrm{comp}} = \lVert T(\hat{I}_{\mathrm{amb}} + \sum_i \hat{I}_i) - I \rVert_1$, where $T$ is a differentiable tonemapping operator.
- Spatial smoothness: a regularizer penalizing abrupt variation of the lighting coefficients across neighboring Gaussians.

The total loss is the weighted sum of the above.
5. Interactive Editing System
At runtime, users can:
- Toggle any light's on/off state (equivalently, zeroing its scale $c_i$),
- Apply per-light chromaticity shifts (re-tinting the RGB factor $c_i$),
- Modify intensity via scalar multipliers on $c_i$.
Rendering combines the per-light splats under the edited scales:

$$I_{\mathrm{edit}} = T\!\left(\hat{I}_{\mathrm{amb}} + \sum_i c_i' \odot \hat{I}_i\right)$$

where $c_i'$ is the user-edited scaling factor of light $i$. The GPU-optimized 3DGS backbone sustains real-time refresh rates (30 FPS), enabling an interactive workflow.
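The runtime edit loop reduces to re-weighting cached per-light HDR renders and tonemapping the sum. A minimal sketch (hypothetical `remix` helper; a Reinhard curve stands in for the system's tonemapper):

```python
import numpy as np

def remix(ambient, olat_renders, on, tints, gains):
    """Apply runtime edits to cached per-light HDR renders, then tonemap.

    on:    list of bools, per-light on/off switches
    tints: list of RGB chromaticity multipliers
    gains: list of scalar intensity multipliers
    """
    hdr = ambient.copy()
    for img, s, t, k in zip(olat_renders, on, tints, gains):
        if s:
            hdr += img * np.asarray(t) * k
    return hdr / (1.0 + hdr)  # Reinhard-style tonemap stand-in

amb = np.full((2, 2, 3), 0.1)
olat = [np.full((2, 2, 3), 0.4)]
lit = remix(amb, olat, on=[True], tints=[(1, 1, 1)], gains=[2.0])
off = remix(amb, olat, on=[False], tints=[(1, 1, 1)], gains=[1.0])
```

No diffusion model runs at edit time; only this lightweight recombination does, which is what makes per-light editing interactive.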
6. Datasets and Training Protocols
6.1 Synthetic Data Synthesis
Using 12,400 procedurally generated indoor scenes enriched with up to six controllable lights (Avetisyan et al. 2024, via Infinigen), four equirectangular HDR views are rendered for each scene with ambient, full-light, and OLAT passes, as well as masks and depth.
6.2 Fine-tuning Regimens
- Single-image decomposition: DiT's LoRA adapters fine-tuned for 3,000 iterations (batch size 192, Prodigy optimizer, 48×A100 GPUs, 12 h).
- Multi-view harmonization: Diffusion U-Net trained for 30,000 iterations (28 h).
- 3DGS fitting: Two-stage approach—geometry is fit on standard RGBs, then joint Gaussians+lighting are optimized, and finally, lighting parameters alone are further refined.
6.3 Data Augmentation
- Random OLAT combination synthesis and multi-exposure prompting for robustness and HDR alignment.
- All equirectangular views enable arbitrary perspective cropping at training time.
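Extracting a perspective training crop from an equirectangular view amounts to casting pinhole rays and looking up their longitude/latitude in the panorama. A simplified sketch (hypothetical function name; horizon-aligned camera with only yaw, nearest-neighbour sampling):

```python
import numpy as np

def equirect_crop(pano, fov_deg, yaw_deg, out_size):
    """Nearest-neighbour perspective crop from an equirectangular image.

    pano: (Hp, Wp, C); longitude spans [-pi, pi), latitude [-pi/2, pi/2].
    """
    Hp, Wp = pano.shape[:2]
    H = W = out_size
    f = 0.5 * W / np.tan(np.radians(fov_deg) / 2)  # pinhole focal length
    xs, ys = np.meshgrid(np.arange(W) - W / 2 + 0.5,
                         np.arange(H) - H / 2 + 0.5)
    d = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    yaw = np.radians(yaw_deg)                      # rotate about vertical axis
    dx = d[..., 0] * np.cos(yaw) + d[..., 2] * np.sin(yaw)
    dz = -d[..., 0] * np.sin(yaw) + d[..., 2] * np.cos(yaw)
    lon = np.arctan2(dx, dz)
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    u = ((lon / np.pi + 1.0) / 2.0 * Wp).astype(int) % Wp
    v = ((lat / np.pi + 0.5) * Hp).astype(int).clip(0, Hp - 1)
    return pano[v, u]

# Panorama whose R/G channels encode source column/row, for checking lookups.
v_idx, u_idx = np.meshgrid(np.arange(64), np.arange(128), indexing='ij')
pano = np.stack([u_idx, v_idx, np.zeros_like(u_idx)], axis=-1).astype(float)
crop = equirect_crop(pano, 90.0, 0.0, 8)
```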
7. Results, Limitations, and Future Work
7.1 Quantitative Evaluation
Performance on 30 held-out synthetic scenes demonstrates:
- Single-view decomposition (PSNR/SSIM/LPIPS):
- ScribbleLight: 14.39 / 0.395 / 0.688
- Qwen-Image: 18.23 / 0.714 / 0.237
- LuxRemix-SV: 27.68 / 0.898 / 0.082
- Multi-view harmonization (PSNR/SSIM/LPIPS):
- LuxRemix-SV (independent): 25.14 / 0.807 / 0.149
- LuxRemix-MV-Edit: 26.37 / 0.794 / 0.136
- LuxRemix-MV: 30.76 / 0.867 / 0.091
Qualitative results indicate LuxRemix accurately preserves and isolates shadows, highlights, and color casts per light, outperforming baseline diffusion editors in both plausibility and photometric fidelity. On real-world captures, LuxRemix achieves 30+ FPS with fine-grained per-light editing, which is not supported by NeRF-W or Instruct-NeRF2NeRF, where only global relighting is feasible.
7.2 Limitations and Future Directions
- The current method is verified only for static, indoor scenes with point or planar light sources. Performance on outdoor scenes or dynamic content is untested.
- Training data emphasizes conical falloff; highly diffuse fixtures may not be faithfully modeled ("light-spread bias").
- Future extensions are anticipated to address arbitrary HDR environment maps, dynamic scene geometry, and real-time simulation of multi-bounce global illumination.