Tune-Your-Style: Tunable 3D Stylization
- Tune-Your-Style is a paradigm for intensity-tunable 3D scene stylization that augments base Gaussian Splatting with controllable style offsets.
- The method employs per-splat MLP 'Gaussian neurons' and learnable intensity embeddings to interpolate between unstyled and fully stylized scenes.
- It integrates a multi-view diffusion model and tunable hybrid losses to ensure real-time, coherent, and customizable style transfer in 3D rendering.
Tune-Your-Style is a paradigm and family of methods for intensity-tunable, user-controllable style transfer in 3D scene stylization, with an explicit focus on controlling the content–style trade-off at inference time within a 3D Gaussian Splatting (3DGS) rendering framework. This approach responds to a fundamental limitation of prior 3DGS-based stylization techniques: the inability to flexibly modulate the amount of style imposed on the scene after training, thus restricting customization for diverse user requirements and creative workflows (Zhao et al., 31 Jan 2026).
1. Intensity-Tunable 3D Style Transfer: Conceptual Framework
The Tune-Your-Style system models a stylized scene by augmenting a base 3DGS reconstruction with an explicit, continuous “style intensity” parameter, denoted β ∈ [0,1], controlling the amount of style injected into the geometry and appearance of each Gaussian splat. A learnable style tuner, parameterized by a set of embeddings tied to different intensity levels, linearly interpolates between the unstyled base and the fully stylized variant provided by a reference style image.
At the core, the method attaches a small per-splat MLP (“Gaussian neuron”) that outputs attribute offsets for each splat under a given style. These offsets are then modulated in their amplitude by the selected style intensity before being applied, yielding a decomposable and tunable stylization mechanism.
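This offset-prediction step can be sketched as a tiny per-splat network. The two-layer architecture and all dimensions below are assumptions for illustration; the paper's exact design is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianNeuron:
    """Toy per-splat MLP mapping a style code to attribute offsets.
    The two-layer shape and all dimensions are assumptions for illustration."""
    def __init__(self, style_dim=8, attr_dim=14, hidden=16):
        self.w1 = rng.normal(scale=0.1, size=(style_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, attr_dim))

    def __call__(self, style_code):
        h = np.maximum(0.0, style_code @ self.w1)  # ReLU hidden layer
        return h @ self.w2                          # Δ-attributes for this splat

neuron = GaussianNeuron()            # the full model keeps one neuron per splat
style_code = rng.normal(size=8)      # embedding of the reference style image
delta = neuron(style_code)           # offsets, later scaled by the intensity
```

In the full system these predicted offsets are not applied directly; their amplitude is first modulated by the selected style intensity.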
2. 3D Gaussian Splatting Representation with Neuronal Style Offsets
The base scene is represented as a set of N Gaussian splats:

G = { gᵢ = (μᵢ, sᵢ, qᵢ, αᵢ, cᵢ) }, i = 1, …, N,

where μᵢ is the center, sᵢ the scaling, qᵢ the rotation, αᵢ the opacity, and cᵢ the color/feature vector. Each splat is augmented with a “Gaussian neuron” (an MLP) predicting style-induced attribute offsets:

Δgᵢ = MLPᵢ(S)

for a reference style image S. The stylized scene at maximal intensity becomes:

g̃ᵢ = gᵢ + Δgᵢ.

To introduce intensity-tunability, the continuous parameter β ∈ [0,1] is quantized into K bins, and for each bin k an embedding eₖ is learned, which scales the attribute offsets elementwise:

gᵢ(β) = gᵢ + eₖ ⊙ Δgᵢ,

where ⊙ denotes the elementwise product. The base scene (β = 0) and fully stylized scene (β = 1) thus correspond to eₖ being all zeros and all ones, respectively.
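The interpolation above can be sketched as follows, with toy dimensions and hand-chosen (rather than learned) embedding values:

```python
import numpy as np

K = 10  # number of intensity bins (the paper reports 10)

def bin_index(beta: float, num_bins: int = K) -> int:
    """Quantize a continuous intensity beta in [0, 1] into a bin index."""
    return min(int(beta * num_bins), num_bins - 1)

def apply_style(base_attrs: np.ndarray,
                offsets: np.ndarray,
                embedding: np.ndarray) -> np.ndarray:
    """g(beta) = g + e_k ⊙ Δg, broadcast over all splats."""
    return base_attrs + embedding * offsets

# Toy scene: 3 splats with 4 packed attributes each.
base = np.zeros((3, 4))
delta = np.ones((3, 4))

e_zero = np.zeros(4)   # beta = 0: unstyled base scene
e_full = np.ones(4)    # beta = 1: fully stylized scene
```

An all-zeros embedding reproduces the base scene exactly, and an all-ones embedding applies the full offsets, matching the two endpoints of the interpolation.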
3. Tunable Stylization Guidance via Multi-View Consistent Diffusion Models
A major challenge in 3D stylization is cross-view consistency: stylized appearance must remain coherent across rendered views. Tune-Your-Style employs stylized 2D views generated for multiple camera poses using a pretrained, image-conditioned diffusion model (e.g., IP-Adapter + Stable Diffusion XL). To enforce style consistency, a cross-view style alignment mechanism warps features from an anchor view into target views and performs mutual self-attention during the diffusion process, compelling the stylized outputs to share patterns and color distributions across views.
Stylization guidance is provided by three losses:
- Full-style guidance: ℓ₁ and LPIPS distances between the rendered, stylized views and the diffusion model's stylized output.
- Zero-style guidance: ℓ₁ and LPIPS distances between the rendered view and the original, unstyled rendering.
- Tunable hybrid loss: a convex combination of the full- and zero-style losses, weighted by the user-chosen intensity β.
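The hybrid loss can be sketched as below, with a plain mean absolute error standing in for both terms; LPIPS requires a pretrained network, so it is abstracted behind the `perceptual` argument (an assumption of this sketch):

```python
import numpy as np

def mae(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error, a stand-in for the l1 image distance."""
    return float(np.abs(a - b).mean())

def tunable_loss(render, styled_target, base_target, beta, perceptual=mae):
    """Convex combination of full-style and zero-style guidance, weighted by beta.
    `perceptual` stands in for LPIPS; here it defaults to MAE for simplicity."""
    full = mae(render, styled_target) + perceptual(render, styled_target)
    zero = mae(render, base_target) + perceptual(render, base_target)
    return beta * full + (1.0 - beta) * zero
```

At β = 0 the loss reduces to the zero-style term, anchoring the render to the unstyled scene; at β = 1 only the full-style term remains.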
4. Two-Stage Optimization and Style Tuner Learning
Training proceeds in two sequential stages:
- Full-style supervision: learn the Gaussian neurons and the embedding for maximal style intensity (β = 1; embedding typically all ones) to fit the style imposed by the diffusion-based stylized views.
- Intensity embedding optimization: with the Gaussian neurons and the maximal-intensity embedding frozen, learn the embeddings eₖ for intermediate style intensities. Each eₖ governs the amplitude of style-offset injection for its corresponding intensity bin. Training minimizes the tunable guidance loss:

L(β) = β · L_full(I(β), I_style) + (1 − β) · L_zero(I(β), I_base),

where I(β) is the image rendered from the current stylized splats at intensity β, I_base the original rendering, and I_style the stylized diffusion output.
To enhance stability, redundant (low-importance) splats are filtered, limiting overfitting. Training is efficient and performed on a single high-memory GPU, supporting interactive workflows.
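The stage-2 embedding fitting can be illustrated in a toy setting: frozen offsets, an L2 stand-in for the ℓ₁ + LPIPS guidance losses (an assumption of this sketch), and plain gradient descent per bin. When the stylized target equals base + Δg, the hybrid loss has the closed-form minimizer e = β for every coordinate, which the descent should recover.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scene: base attributes and frozen full-style offsets (stage-1 output).
base = rng.normal(size=(64, 4))
delta = rng.normal(size=(64, 4))
styled = base + delta                     # stylized target at full intensity

def loss_and_grad(e, beta):
    """Tunable L2 guidance loss and its gradient w.r.t. the embedding e."""
    render = base + e * delta             # g(beta) = g + e ⊙ Δg
    r_full = render - styled              # full-style residual
    r_zero = render - base                # zero-style residual
    loss = beta * (r_full ** 2).mean() + (1 - beta) * (r_zero ** 2).mean()
    grad = 2 * (beta * r_full + (1 - beta) * r_zero) * delta
    return loss, grad.mean(axis=0)

# Stage 2: fit one embedding per intensity bin, offsets held frozen.
betas = np.linspace(0.0, 1.0, 10)
embeddings = {}
for k, beta in enumerate(betas):
    e = np.zeros(4)
    for _ in range(500):
        _, g = loss_and_grad(e, beta)
        e -= 0.2 * g
    embeddings[k] = e
```

Each recovered embedding sits near β on every coordinate, mirroring how β = 0 and β = 1 reduce to the all-zeros and all-ones embeddings.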
5. Inference and User-Controlled Style Adjustment
At inference, style tuning is realized as follows:
- The user specifies a desired style intensity β ∈ [0,1].
- The embedding eₖ for the bin containing β is retrieved.
- The per-splat style offsets are modulated via eₖ ⊙ Δgᵢ.
- The updated splat set is rendered from arbitrary camera poses.
Adjusting β at inference incurs negligible computational overhead, as only a small embedding lookup and an elementwise vector scaling are performed before compositing. The system supports fine-grained control, allowing real-time preview and tuning of the content–style balance in rendered 3D scenes.
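A sketch of this inference path under the same assumptions as above (precomputed offsets, a hand-built stand-in embedding table): the per-query cost is one table lookup plus one scale-and-add over the splat attributes.

```python
import numpy as np

K = 10
rng = np.random.default_rng(1)
N, D = 1000, 14                      # toy sizes: 1000 splats, 14 packed attributes
base = rng.normal(size=(N, D))       # unstyled 3DGS attributes
delta = rng.normal(size=(N, D))      # offsets, precomputed once per style
# Stand-in for the learned embedding table e_k (here: linear in the bin index).
table = np.linspace(0.0, 1.0, K)[:, None].repeat(D, axis=1)

def stylize(beta: float) -> np.ndarray:
    """All the work done when the user moves the intensity slider."""
    k = min(int(beta * K), K - 1)    # embedding lookup for the chosen bin
    return base + table[k] * delta   # modulated splats, ready to rasterize
```

Because the Gaussian neurons only need to run once per style to produce `delta`, sweeping β never re-evaluates any network, which is what makes interactive preview cheap.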
6. Empirical Results and Practical Implications
Extensive evaluation demonstrates that Tune-Your-Style achieves high visual quality and flexible customizability in 3D scene stylization. The cross-view alignment mechanism enforces multi-angle coherence of style elements, while the intensity-tunable injection yields a continuous morph between content-preserving (low β) and fully stylized (high β) outputs.
The implementation filters the Gaussian splats based on importance, uses 10 quantization bins for intensity, and relies on a modern image-conditioned diffusion backbone for view-specific reference stylization. The entire stylization process, including intensity tuning, is compatible with real-time rendering (≥200 fps reported) when using a highly optimized 3DGS renderer (Zhao et al., 31 Jan 2026).
Future extensions may include spatially varying intensity modulation across objects, finer quantization strategies, and more advanced cross-view perceptual constraints.
7. Summary Table: Core Components of Tune-Your-Style 3D Stylization
| Component | Role | Key Mechanism |
|---|---|---|
| Gaussian neurons | Per-splat style offset prediction | MLP, one per splat, outputs Δ-attributes |
| Intensity embedding (eₖ) | Controls style strength injection | Embedding table modulates offsets by intensity |
| Style guidance (diffusion) | Enforces style similarity and cross-view consistency | Multi-view stylized diffusion, cross-attention |
| Training stages | Learn full/max style & per-intensity embeddings | Two-phase optimization with tunable loss |
The Tune-Your-Style framework establishes a foundational methodology for user-adjustable, intensity-tunable artistic style transfer in 3D scenes, leveraging the compositional flexibility of Gaussian Splatting and the power of diffusion-based stylization guidance (Zhao et al., 31 Jan 2026).