LuxRemix: Real-Time Light Editing Pipeline

Updated 25 January 2026
  • LuxRemix is a generative computational pipeline that decomposes complex indoor scene lighting into ambient and one-light-at-a-time components for precise, interactive editing.
  • It integrates diffusion models, multi-view harmonization, and 3D Gaussian splatting to achieve photorealistic rendering and maintain consistency across various viewpoints.
  • The system enables real-time control over individual light sources by allowing on/off toggling and adjustment of chromaticity and intensity, outperforming global relighting methods.

LuxRemix is a generative computational pipeline for decomposing and interactively editing the illumination of complex indoor scenes from single or multi-view captures. It enables precise control of individual light sources—including on/off state, chromaticity, and intensity—for post-capture editing while maintaining photorealistic and multi-view-consistent rendering behavior. The approach integrates generative diffusion models, multi-view harmonization, and relightable 3D Gaussian splatting, resulting in real-time, per-light editing capabilities for both synthetic and real-world datasets (Liang et al., 21 Jan 2026).

1. Problem Setting and Objectives

Indoor scenes typically feature spatially varying, near-field illumination from multiple sources such as ceiling fixtures, table lamps, and wall sconces. Captured images of such scenes encode the aggregate effect of all lights, impeding downstream editing and relighting tasks. LuxRemix targets the post-capture, per-light decomposition and remixing of such lighting: given a set of captured images, the goal is to factorize the scene illumination into an ambient component and distinct "one-light-at-a-time" (OLAT) contributions, and then to enable interactive, independent control of each source—including toggling individual lights and modulating their radiometric and chromatic properties. This editing must be feasible from arbitrary viewpoints and at interactive frame rates.

The method comprises three principal components:

  • Lighting decomposition: Image-based factorization into an ambient term plus $N$ OLAT components, separated from the full illumination image.
  • Multi-view harmonization: Propagation and enforcement of lighting decomposition across all captured views, ensuring both geometric and photometric consistency.
  • Relightable 3D Gaussian splatting: Real-time rendering of scenes as Gaussian clouds with per-light appearance parameterization.

2. Generative Image-Based Light Decomposition

2.1 Mathematical Factorization

Given a post-tone-mapped input image $I_\text{input}(x)$, LuxRemix expresses it as

$$I_\text{input}(x) = \mathrm{tonemap}\Bigl( I_{\rm ambient}(x) + \sum_{i=1}^{N} \mathbf{c}_i \cdot I_i(x) \Bigr)$$

where $I_{\rm ambient}(x)$ comprises indirect and background lighting, and each $I_i(x)$ is the OLAT image showing the scene illuminated solely by light $i$, with $\mathbf{c}_i$ as its RGB scaling factor. This decomposition enables isolation and manipulation of each light's contribution.
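As a concrete illustration, the linear-space composite can be sketched with NumPy arrays. The gamma tonemap below is a stand-in assumption, since the paper's exact tonemapping operator is not specified here, and all function names are hypothetical:

```python
import numpy as np

def tonemap(hdr, gamma=2.2):
    """Simple gamma tonemap (a stand-in for the paper's operator)."""
    return np.clip(hdr, 0.0, 1.0) ** (1.0 / gamma)

def compose(ambient, olats, colors):
    """Linear-space composite: ambient + sum_i c_i * I_i, then tonemap.

    ambient: (H, W, 3) HDR ambient image
    olats:   (N, H, W, 3) per-light OLAT images
    colors:  (N, 3) RGB scaling factors c_i
    """
    hdr = ambient + np.einsum('nhwc,nc->hwc', olats, colors)
    return tonemap(hdr)

# Toy example: two lights over a 4x4 image, the second switched off.
H, W = 4, 4
ambient = np.full((H, W, 3), 0.05)
olats = np.stack([np.full((H, W, 3), 0.2), np.full((H, W, 3), 0.4)])
colors = np.array([[1.0, 0.9, 0.8],   # warm light, on
                   [0.0, 0.0, 0.0]])  # light switched off
img = compose(ambient, olats, colors)
```

Setting a light's color vector to zero removes its contribution entirely, which is exactly the per-light isolation the factorization is designed to support.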

2.2 Architectural Components

  • Base model: A pretrained diffusion-transformer (DiT), suited for conditional image editing.
  • LoRA adapters: Low-rank adaptation modules inserted into every Transformer attention block, allowing efficient fine-tuning for lighting tasks.
  • Spatial prompt embedding: User-specified light masks, encoded by a single-layer MLP and broadcast as channel-wise latent additions, localize edits to selected lights.
  • Dual instruct tuning: Fine-tuning of LoRA for (1) OLAT decomposition tasks ("switch off all lights except the selected one at specified brightness") and (2) "light-off" editing tasks ("turn off only the selected light, keeping others unchanged").
  • Multi-exposure prompting: Three LDR exposure levels (EV–4, EV–2, EV0) of each OLAT pass are synthesized and then merged through exposure-bracket HDR fusion (following Debevec–Malik).
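The exposure-bracket fusion step can be sketched in the Debevec–Malik spirit as follows. This is a minimal sketch assuming a simple gamma camera response and triangle weighting; the paper's exact calibration and weighting are not specified here, and the function name is hypothetical:

```python
import numpy as np

def merge_exposures(ldr_images, evs, gamma=2.2):
    """Weighted exposure-bracket merge into a single HDR radiance map.

    ldr_images: list of (H, W, 3) LDR images in [0, 1]
    evs: exposure values relative to EV0, e.g. [-4, -2, 0]
    Assumes a gamma camera response (an assumption, not the paper's
    calibrated response curve).
    """
    num = np.zeros_like(ldr_images[0], dtype=np.float64)
    den = np.zeros_like(num)
    for ldr, ev in zip(ldr_images, evs):
        # Triangle weight: trust mid-range pixels, downweight clipped ones.
        w = 1.0 - np.abs(2.0 * ldr - 1.0)
        lin = ldr ** gamma            # invert the assumed gamma response
        radiance = lin / (2.0 ** ev)  # undo the exposure scaling
        num += w * radiance
        den += w
    return num / np.maximum(den, 1e-8)
```

Each bracket votes on the scene radiance with a confidence weight, so pixels that are clipped in one exposure are recovered from another, which is what aligns the EV–4/EV–2/EV0 OLAT outputs into a single HDR pass.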

3. Multi-View Lighting Harmonization

3.1 Diffusion-Based Harmonization

To generalize decompositions across all viewpoints, a multi-view diffusion U-Net (LuxRemix-MV) is fine-tuned. Inputs comprise:

  • Multi-view RGB images $\{I_j\}$
  • OLAT and ambient decompositions for a subset of views
  • Plücker ray embeddings
  • Binary reference-view masks

The network produces photometrically coherent decompositions for unobserved viewpoints. Conditioning is achieved by concatenating all reference and target views, their masks, and their ray-based spatial encodings, enforcing geometric alignment and cross-view consistency.
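The Plücker ray embeddings used for conditioning can be computed per pixel from camera parameters. The sketch below uses the common $(\mathbf{d}, \mathbf{o}\times\mathbf{d})$ formulation (normalized direction plus moment vector); the exact convention used by LuxRemix-MV is an assumption, and the function name is hypothetical:

```python
import numpy as np

def plucker_embedding(K, R, t, H, W):
    """Per-pixel 6-D Plucker ray embedding (d, o x d).

    K: (3, 3) intrinsics; R, t: world-to-camera rotation/translation.
    Returns an (H, W, 6) array. The (d, o x d) convention is a common
    choice, assumed here rather than taken from the paper.
    """
    # Camera center in world coordinates.
    o = -R.T @ t
    # Pixel-center grid in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)       # (H, W, 3)
    # Back-project to world-space ray directions and normalize.
    d = pix @ np.linalg.inv(K).T @ R                        # (H, W, 3)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # Moment m = o x d is invariant to the chosen point on the ray.
    m = np.cross(np.broadcast_to(o, d.shape), d)
    return np.concatenate([d, m], axis=-1)                  # (H, W, 6)
```

Because the moment vector encodes the ray's position as well as its direction, this embedding gives the U-Net a viewpoint-aware spatial code for every pixel, which is what enforces geometric alignment across reference and target views.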

3.2 Optimization Regime

Fine-tuning is performed for 30,000 iterations in three curriculum phases (increasing batch view count from 4 to 8 to 15), using AdamW at a learning rate of $5\times10^{-5}$. Each predicted output is generated at three exposure levels and merged for HDR fidelity. The loss combines diffusion objectives and cross-view composition terms.

4. Integration with Relightable 3D Gaussian Splatting

4.1 3D Representation

Scenes are modeled as collections of $N$ 3D Gaussians $\{\mathbf{p}_i, \Sigma_i\}$ with associated appearance features, allowing real-time, differentiable rendering by projecting and splatting into image space.

4.2 Lighting Parameter Optimization

Each Gaussian is extended to include $(M+1)\times 3$-dimensional HDR RGB coefficients $\mathbf{L}_i$, corresponding to each OLAT and ambient component. The contribution of light $m$ at pixel $(u,v)$ is rendered as

$$\hat I_m(u,v) = \sum_{i=1}^{N} w_i(u,v)\,\mathbf{L}_i[m]$$

with $w_i(u,v)$ as the splat weight.
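The per-light accumulation can be sketched as below. This is a simplification that assumes the splat weights $w_i(u,v)$ have already been produced by the rasterizer (projection and alpha compositing are omitted), and the function name is hypothetical:

```python
import numpy as np

def render_light(weights, L, m):
    """Render the image contributed by light m from splat weights.

    weights: (N, H, W) splat weights w_i(u, v), assumed precomputed
             by the 3DGS rasterizer (projection is omitted here)
    L: (N, M_plus_1, 3) per-Gaussian HDR RGB coefficients L_i
    m: light index (one slot by convention holds the ambient term)
    """
    # I_m(u, v) = sum_i w_i(u, v) * L_i[m]
    return np.einsum('nhw,nc->hwc', weights, L[:, m, :])
```

Because the weights depend only on geometry, one pass of the rasterizer yields every light's image at once by swapping the coefficient slot $m$, which is what makes per-light editing cheap at runtime.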

4.3 Objective Functions

  • OLAT fidelity: $\mathcal L_{\rm olat} = \frac{1}{M}\sum_{m=1}^{M}\bigl\| \hat I_m - I_m \bigr\|_1 + \lambda_{\rm dssim}\,\mathcal L_{\rm D\text{-}SSIM}(\hat I_m, I_m)$
  • Composition consistency: $\hat I_{\rm comp} = \sum_m w_m\,\hat I_m$ and $\mathcal L_{\rm comp} = \bigl\| \mathcal T(\hat I_{\rm comp}) - I_{\rm ori} \bigr\|_1$, where $\mathcal T(\cdot)$ is a differentiable tonemapping operator.
  • Spatial smoothness: $\mathcal L_{\rm smooth} = \frac{1}{NK} \sum_{i=1}^{N}\sum_{j\in\mathcal N(i)} \bigl\| \mathbf L_i - \mathbf L_j \bigr\|_2^2$

The total loss is the weighted sum of the above.
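The loss terms above can be sketched numerically as follows. This is a minimal NumPy sketch: the D-SSIM term is omitted for brevity, the function names are hypothetical, and a real implementation would use a differentiable framework:

```python
import numpy as np

def olat_l1(pred, gt):
    """L1 part of the OLAT fidelity term (D-SSIM omitted for brevity).

    pred, gt: (M, H, W, 3) predicted / ground-truth OLAT images.
    """
    M = pred.shape[0]
    return np.abs(pred - gt).sum() / M

def comp_loss(pred, weights, I_ori, tonemap):
    """Composition consistency: the recombined lights, tonemapped,
    should match the original capture I_ori."""
    comp = np.einsum('m,mhwc->hwc', weights, pred)
    return np.abs(tonemap(comp) - I_ori).sum()

def smooth_loss(L, neighbors):
    """Spatial smoothness over each Gaussian's K nearest neighbors.

    L: (N, D) flattened per-Gaussian lighting coefficients
    neighbors: (N, K) integer index array of neighbor Gaussians
    """
    N, K = neighbors.shape
    diffs = L[:, None, :] - L[neighbors]        # (N, K, D)
    return (diffs ** 2).sum() / (N * K)
```

The smoothness term regularizes the per-Gaussian lighting coefficients toward their neighbors', which suppresses splat-level flicker without blurring the rendered images themselves.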

5. Interactive Editing System

At runtime, users can:

  • Toggle any light's on/off state ($\alpha_m$),
  • Apply per-light chromaticity shifts ($R_m$),
  • Modify intensity via scalar multipliers.

Rendering combines the per-light splats:

$$I_{\rm final}(u,v) = \mathrm{tonemap}\Bigl( \sum_{m=1}^{M} w_m\,(\alpha_m R_m)\,\hat I_m(u,v) \Bigr)$$

The GPU-optimized 3DGS backbone sustains real-time refresh rates ($\geq 30$ FPS at $1024\times768$), enabling an interactive workflow.
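The runtime remix step reduces to a small per-light multiply-accumulate over the precomputed HDR renders. The sketch below folds the composition weights $w_m$ into the scalar gains and uses a gamma tonemap as a stand-in; the function name is hypothetical:

```python
import numpy as np

def remix(olat_images, alphas, tints, gains, gamma=2.2):
    """Runtime per-light remix: toggle, tint, and scale each light.

    olat_images: (M, H, W, 3) precomputed per-light HDR renders I_m
    alphas: (M,) on/off flags alpha_m (0 or 1)
    tints:  (M, 3) per-light RGB chromaticity shifts R_m
    gains:  (M,) scalar intensity multipliers (the composition
            weights w_m can be folded in here)
    """
    scale = alphas[:, None] * gains[:, None] * tints    # (M, 3)
    hdr = np.einsum('mhwc,mc->hwc', olat_images, scale)
    # Gamma tonemap as a stand-in for the paper's operator.
    return np.clip(hdr, 0.0, 1.0) ** (1.0 / gamma)
```

Since the per-light images are cached, every slider change costs only this weighted sum plus the tonemap, which is why per-light edits stay interactive.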

6. Datasets and Training Protocols

6.1 Synthetic Data Synthesis

Using 12,400 procedurally generated indoor scenes enriched with up to six controllable lights (Avetisyan et al. 2024, via Infinigen), four equirectangular HDR views ($2048\times1024$) are rendered for each scene with ambient, full-light, and OLAT passes, as well as masks and depth.

6.2 Fine-tuning Regimens

  • Single-image decomposition: DiT's LoRA adapters fine-tuned for 3,000 iterations (batch size 192, Prodigy optimizer, 48×A100 GPUs, 12 h).
  • Multi-view harmonization: Diffusion U-Net trained for 30,000 iterations (28 h).
  • 3DGS fitting: Two-stage approach—geometry is fit on standard RGBs, then joint Gaussians+lighting are optimized, and finally, lighting parameters alone are further refined.

6.3 Data Augmentation

  • Random OLAT combination synthesis and multi-exposure prompting for robustness and HDR alignment.
  • All equirectangular views enable arbitrary perspective cropping at training time.

7. Results, Limitations, and Future Work

7.1 Quantitative Evaluation

Performance on 30 held-out synthetic scenes demonstrates:

Single-view decomposition:

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
| --- | --- | --- | --- |
| ScribbleLight | 14.39 | 0.395 | 0.688 |
| Qwen-Image | 18.23 | 0.714 | 0.237 |
| LuxRemix-SV | 27.68 | 0.898 | 0.082 |

Multi-view harmonization:

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
| --- | --- | --- | --- |
| LuxRemix-SV (independent) | 25.14 | 0.807 | 0.149 |
| LuxRemix-MV-Edit | 26.37 | 0.794 | 0.136 |
| LuxRemix-MV | 30.76 | 0.867 | 0.091 |

Qualitative results indicate LuxRemix accurately preserves and isolates shadows, highlights, and color casts per light, outperforming baseline diffusion editors in both plausibility and photometric fidelity. On real-world captures, LuxRemix achieves 30+ FPS at $1024\times768$ with fine-grained per-light editing, which is not supported by NeRF-W or Instruct-NeRF2NeRF, where only global relighting is feasible.

7.2 Limitations and Future Directions

  • The current method is verified only for static, indoor scenes with point or planar light sources. Performance on outdoor scenes or dynamic content is untested.
  • Training data emphasizes conical falloff; highly diffuse fixtures may not be faithfully modeled ("light-spread bias").
  • Future extensions are anticipated to address arbitrary HDR environment maps, dynamic scene geometry, and real-time simulation of multi-bounce global illumination.

(Liang et al., 21 Jan 2026)
