Tune-Your-Style: Tunable 3D Stylization
- Tune-Your-Style is a paradigm for intensity-tunable 3D scene stylization that augments base Gaussian Splatting with controllable style offsets.
- The method employs per-splat MLP 'Gaussian neurons' and learnable intensity embeddings to interpolate between unstyled and fully stylized scenes.
- It integrates a multi-view diffusion model and tunable hybrid losses to ensure real-time, coherent, and customizable style transfer in 3D rendering.
Tune-Your-Style is a paradigm and family of methods for intensity-tunable, user-controllable style transfer in 3D scene stylization, with an explicit focus on controlling the content–style trade-off at inference time within a 3D Gaussian Splatting (3DGS) rendering framework. This approach responds to a fundamental limitation of prior 3DGS-based stylization techniques: the inability to flexibly modulate the amount of style imposed on the scene after training, thus restricting customization for diverse user requirements and creative workflows (Zhao et al., 31 Jan 2026).
1. Intensity-Tunable 3D Style Transfer: Conceptual Framework
The Tune-Your-Style system models a stylized scene by augmenting a base 3DGS reconstruction with an explicit, continuous “style intensity” parameter, denoted β ∈ [0,1], controlling the amount of style injected into the geometry and appearance of each Gaussian splat. A learnable style tuner, parameterized by a set of embeddings tied to different intensity levels, linearly interpolates between the unstyled base and the fully stylized variant provided by a reference style image.
At the core, the method attaches a small per-splat MLP (“Gaussian neuron”) that outputs attribute offsets for each splat under a given style. These offsets are then modulated in their amplitude by the selected style intensity before being applied, yielding a decomposable and tunable stylization mechanism.
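This offset-prediction step can be sketched as a tiny per-splat network. The two-layer architecture and all dimensions below are assumptions for illustration; the paper's exact design is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianNeuron:
    """Toy per-splat MLP mapping a style code to attribute offsets.
    The two-layer shape and all dimensions are assumptions for illustration."""
    def __init__(self, style_dim=8, attr_dim=14, hidden=16):
        self.w1 = rng.normal(scale=0.1, size=(style_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, attr_dim))

    def __call__(self, style_code):
        h = np.maximum(0.0, style_code @ self.w1)  # ReLU hidden layer
        return h @ self.w2                          # Δ-attributes for this splat

neuron = GaussianNeuron()            # the full model keeps one neuron per splat
style_code = rng.normal(size=8)      # embedding of the reference style image
delta = neuron(style_code)           # offsets, later scaled by the intensity
```

In the full system these predicted offsets are not applied directly; their amplitude is first modulated by the selected style intensity.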
2. 3D Gaussian Splatting Representation with Neuronal Style Offsets
The base scene is represented as a set of N Gaussian splats:

G = { gᵢ = (μᵢ, sᵢ, qᵢ, αᵢ, cᵢ) }, i = 1, …, N,

where μᵢ is the center, sᵢ the scaling, qᵢ the rotation, αᵢ the opacity, and cᵢ the color/feature vector. Each splat is augmented with a “Gaussian neuron” (an MLP) predicting style-induced attribute offsets:

Δgᵢ = MLPᵢ(S)

for a reference style image S. The stylized scene at maximal intensity becomes:

g̃ᵢ = gᵢ + Δgᵢ.

To introduce intensity-tunability, the continuous parameter β ∈ [0,1] is quantized into K bins, and for each bin k an embedding eₖ is learned, which scales the attribute offsets elementwise:

gᵢ(β) = gᵢ + eₖ ⊙ Δgᵢ,

where ⊙ denotes the elementwise product. The base scene (β = 0) and fully stylized scene (β = 1) thus correspond to eₖ being all zeros and all ones, respectively.
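The interpolation above can be sketched as follows, with toy dimensions and hand-chosen (rather than learned) embedding values:

```python
import numpy as np

K = 10  # number of intensity bins (the paper reports 10)

def bin_index(beta: float, num_bins: int = K) -> int:
    """Quantize a continuous intensity beta in [0, 1] into a bin index."""
    return min(int(beta * num_bins), num_bins - 1)

def apply_style(base_attrs: np.ndarray,
                offsets: np.ndarray,
                embedding: np.ndarray) -> np.ndarray:
    """g(beta) = g + e_k ⊙ Δg, broadcast over all splats."""
    return base_attrs + embedding * offsets

# Toy scene: 3 splats with 4 packed attributes each.
base = np.zeros((3, 4))
delta = np.ones((3, 4))

e_zero = np.zeros(4)   # beta = 0: unstyled base scene
e_full = np.ones(4)    # beta = 1: fully stylized scene
```

An all-zeros embedding reproduces the base scene exactly, and an all-ones embedding applies the full offsets, matching the two endpoints of the interpolation.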
3. Tunable Stylization Guidance via Multi-View Consistent Diffusion Models
A major challenge in 3D stylization is cross-view consistency: stylized appearance must remain coherent across rendered views. Tune-Your-Style employs stylized 2D views generated for multiple camera poses using a pretrained, image-conditioned diffusion model (e.g., IP-Adapter + Stable Diffusion XL). To enforce style consistency, a cross-view style alignment mechanism warps features from an anchor view into target views and performs mutual self-attention during the diffusion process, compelling the stylized outputs to share patterns and color distributions across views.
Stylization guidance is provided by three losses:
- Full-style guidance: ℓ₁ and LPIPS distances between the rendered, stylized views and the diffusion model's stylized output.
- Zero-style guidance: ℓ₁ and LPIPS distances between the rendered view and the original, unstyled rendering.
- Tunable hybrid loss: a convex combination of the full- and zero-style losses, weighted by the user-chosen intensity β.
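The hybrid loss can be sketched as below, with a plain mean absolute error standing in for both terms; LPIPS requires a pretrained network, so it is abstracted behind the `perceptual` argument (an assumption of this sketch):

```python
import numpy as np

def mae(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error, a stand-in for the l1 image distance."""
    return float(np.abs(a - b).mean())

def tunable_loss(render, styled_target, base_target, beta, perceptual=mae):
    """Convex combination of full-style and zero-style guidance, weighted by beta.
    `perceptual` stands in for LPIPS; here it defaults to MAE for simplicity."""
    full = mae(render, styled_target) + perceptual(render, styled_target)
    zero = mae(render, base_target) + perceptual(render, base_target)
    return beta * full + (1.0 - beta) * zero
```

At β = 0 the loss reduces to the zero-style term, anchoring the render to the unstyled scene; at β = 1 only the full-style term remains.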
4. Two-Stage Optimization and Style Tuner Learning
Training proceeds in two sequential stages:
- Full-style supervision: learn the Gaussian neurons and the embedding for maximal style intensity (β = 1; embedding typically all ones) to fit the style imposed by the diffusion-based stylized views.
- Intensity embedding optimization: with the Gaussian neurons and the maximal-intensity embedding frozen, learn the embeddings eₖ for intermediate style intensities. Each eₖ governs the amplitude of style-offset injection for its corresponding intensity bin. Training minimizes the tunable guidance loss:

L(β) = β · L_full(I(β), I_style) + (1 − β) · L_zero(I(β), I_base),

where I(β) is the image rendered from the current stylized splats at intensity β, I_base the original rendering, and I_style the stylized diffusion output.
To enhance stability, redundant (low-importance) splats are filtered, limiting overfitting. Training is efficient and performed on a single high-memory GPU, supporting interactive workflows.
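The stage-2 embedding fitting can be illustrated in a toy setting: frozen offsets, an L2 stand-in for the ℓ₁ + LPIPS guidance losses (an assumption of this sketch), and plain gradient descent per bin. When the stylized target equals base + Δg, the hybrid loss has the closed-form minimizer e = β for every coordinate, which the descent should recover.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scene: base attributes and frozen full-style offsets (stage-1 output).
base = rng.normal(size=(64, 4))
delta = rng.normal(size=(64, 4))
styled = base + delta                     # stylized target at full intensity

def loss_and_grad(e, beta):
    """Tunable L2 guidance loss and its gradient w.r.t. the embedding e."""
    render = base + e * delta             # g(beta) = g + e ⊙ Δg
    r_full = render - styled              # full-style residual
    r_zero = render - base                # zero-style residual
    loss = beta * (r_full ** 2).mean() + (1 - beta) * (r_zero ** 2).mean()
    grad = 2 * (beta * r_full + (1 - beta) * r_zero) * delta
    return loss, grad.mean(axis=0)

# Stage 2: fit one embedding per intensity bin, offsets held frozen.
betas = np.linspace(0.0, 1.0, 10)
embeddings = {}
for k, beta in enumerate(betas):
    e = np.zeros(4)
    for _ in range(500):
        _, g = loss_and_grad(e, beta)
        e -= 0.2 * g
    embeddings[k] = e
```

Each recovered embedding sits near β on every coordinate, mirroring how β = 0 and β = 1 reduce to the all-zeros and all-ones embeddings.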
5. Inference and User-Controlled Style Adjustment
At inference, style tuning is realized as follows:
- The user specifies a desired style intensity β ∈ [0,1].
- The embedding eₖ for the bin containing β is retrieved.
- The per-splat style offsets are modulated via eₖ ⊙ Δgᵢ.
- The updated splat set is rendered from arbitrary camera poses.
Adjusting β at inference incurs negligible computational overhead, as only a small embedding lookup and an elementwise vector scaling are performed before compositing. The system supports fine-grained control, allowing real-time preview and tuning of the content–style balance in rendered 3D scenes.
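A sketch of this inference path under the same assumptions as above (precomputed offsets, a hand-built stand-in embedding table): the per-query cost is one table lookup plus one scale-and-add over the splat attributes.

```python
import numpy as np

K = 10
rng = np.random.default_rng(1)
N, D = 1000, 14                      # toy sizes: 1000 splats, 14 packed attributes
base = rng.normal(size=(N, D))       # unstyled 3DGS attributes
delta = rng.normal(size=(N, D))      # offsets, precomputed once per style
# Stand-in for the learned embedding table e_k (here: linear in the bin index).
table = np.linspace(0.0, 1.0, K)[:, None].repeat(D, axis=1)

def stylize(beta: float) -> np.ndarray:
    """All the work done when the user moves the intensity slider."""
    k = min(int(beta * K), K - 1)    # embedding lookup for the chosen bin
    return base + table[k] * delta   # modulated splats, ready to rasterize
```

Because the Gaussian neurons only need to run once per style to produce `delta`, sweeping β never re-evaluates any network, which is what makes interactive preview cheap.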
6. Empirical Results and Practical Implications
Extensive evaluation demonstrates that Tune-Your-Style achieves high visual quality and flexible customizability in 3D scene stylization. The cross-view alignment mechanism enforces multi-angle coherence of style elements, while the intensity-tunable injection yields a continuous morph between content-preserving (low β) and fully stylized (high β) outputs.
The implementation filters the Gaussian splats based on importance, uses 10 quantization bins for intensity, and relies on a modern image-conditioned diffusion backbone for view-specific reference stylization. The entire stylization process, including intensity tuning, is compatible with real-time rendering (≥200 fps reported) when using a highly optimized 3DGS renderer (Zhao et al., 31 Jan 2026).
Future extensions may include spatially varying intensity modulation across objects, finer quantization strategies, and more advanced cross-view perceptual constraints.
7. Summary Table: Core Components of Tune-Your-Style 3D Stylization
| Component | Role | Key Mechanism |
|---|---|---|
| Gaussian neurons | Per-splat style offset prediction | MLP, one per splat, outputs Δ-attributes |
| Intensity embedding (eₖ) | Controls style strength injection | Embedding table modulates offsets by intensity |
| Style guidance (diffusion) | Enforces style similarity and cross-view consistency | Multi-view stylized diffusion, cross-attention |
| Training stages | Learn full/max style & per-intensity embeddings | Two-phase optimization with tunable loss |
The Tune-Your-Style framework establishes a foundational methodology for user-adjustable, intensity-tunable artistic style transfer in 3D scenes, leveraging the compositional flexibility of Gaussian Splatting and the power of diffusion-based stylization guidance (Zhao et al., 31 Jan 2026).