- The paper introduces Gaussian Splatting in Style (GSS), a novel approach that leverages 3D Gaussians for real-time scene stylization, rendering at approximately 150 FPS.
- It employs a dual-module system: adaptive instance normalization (AdaIN) for 2D stylization guidance and a multi-resolution hash grid with a small MLP for 3D-consistent color prediction.
- Experimental results demonstrate superior visual fidelity and consistency, outperforming NeRF-based methods in AR/VR applications.
Gaussian Splatting in Style: Real-time Scene Stylization Using 3D Gaussians
Introduction to Scene Stylization
Scene stylization addresses the artistic challenge of modifying a scene's appearance to reflect a particular style, such as those found in various artworks. Traditional neural style transfer techniques, which adapt the aesthetic of an image using another image's style, face significant limitations when applied to three-dimensional scenes. Current methodologies often rely on optimizing a scene with a specific style image or employing NeRF-based solutions, which, despite their novelty, struggle with maintaining uniform appearance across multiple views and are hampered by slow training and rendering times. In light of these challenges, the need for a more efficient and geometrically consistent technique is evident, particularly for applications in augmented and virtual reality where real-time performance is crucial.
Gaussian Splatting in Style (GSS) Approach
Our approach, dubbed Gaussian Splatting in Style (GSS), introduces a novel architecture that leverages a collection of style images to produce high-quality stylized novel views of a scene in real time. Building upon the framework of 3D Gaussian Splatting (3DGS), we employ pretrained Gaussians, whose positions are processed through a multi-resolution hash grid and a small multilayer perceptron (MLP), to stylize views conditioned on the input style, all while maintaining 3D spatial consistency.
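To make the conditioning step concrete, here is a minimal sketch of a multi-resolution hash-grid lookup in the style of Instant-NGP, which is the kind of encoding described above. All hyperparameters (level count, table size, base resolution, hash primes) and the nearest-corner lookup are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

# Illustrative hyperparameters (assumptions, not the paper's exact values).
N_LEVELS = 4          # number of resolution levels
TABLE_SIZE = 2 ** 14  # hash-table entries per level
F_DIM = 2             # feature channels per entry
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

rng = np.random.default_rng(0)
tables = rng.normal(0.0, 1e-2, size=(N_LEVELS, TABLE_SIZE, F_DIM))

def hash_coords(ijk):
    """Spatial hash of integer grid coordinates (XOR of prime-multiplied axes)."""
    h = ijk.astype(np.uint64) * PRIMES          # wraps modulo 2**64
    return (h[:, 0] ^ h[:, 1] ^ h[:, 2]) % TABLE_SIZE

def encode(xyz):
    """Concatenate hashed features across resolutions. Uses a nearest-corner
    lookup instead of full trilinear interpolation to keep the sketch short."""
    feats = []
    for level in range(N_LEVELS):
        res = 16 * (2 ** level)                 # grid resolution at this level
        ijk = np.floor(xyz * res).astype(np.int64)
        feats.append(tables[level][hash_coords(ijk)])
    return np.concatenate(feats, axis=-1)       # shape (N, N_LEVELS * F_DIM)

pts = rng.uniform(0.0, 1.0, size=(5, 3))        # Gaussian centers in [0, 1]^3
print(encode(pts).shape)                        # (5, 8)
```

The resulting per-Gaussian feature vector, concatenated with a style latent code, would then be fed to the small MLP to predict a stylized color.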
Key contributions of GSS include:
- Development of a state-of-the-art method for neural scene stylization that uniquely utilizes 3D Gaussians, rendering novel views in real time (approximately 150 FPS).
- Demonstrations of our method's effectiveness via both quantitative and qualitative comparisons across various real-world datasets and different settings against currently available baselines.
The Technical Workings of GSS
GSS consists of two primary modules: a 2D Stylization Module based on Adaptive Instance Normalization (AdaIN) and a 3D Color Module. The former extracts style features from the style image and applies them to views rendered by 3DGS, producing a 2D guide for the stylization process. The 3D Color Module takes as input the positions of the Gaussians in the scene, together with a latent code representing the style; these are fed through a multi-resolution hash grid and a tiny MLP to output a new RGB color for each Gaussian, effectively stylizing the scene.
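The 2D guide rests on the standard AdaIN operation, which re-normalizes the content features' per-channel statistics to match the style's. A minimal sketch, assuming a (C, H, W) feature layout for illustration:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization: align the per-channel mean/std of the
    content feature map with those of the style feature map.
    Arrays are (C, H, W); the layout is an illustrative assumption."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean

rng = np.random.default_rng(1)
content = rng.normal(0.0, 1.0, size=(8, 16, 16))
style = rng.normal(3.0, 2.0, size=(8, 16, 16))
out = adain(content, style)
# Per-channel means of the output now match the style's.
print(np.allclose(out.mean(axis=(1, 2)), style.mean(axis=(1, 2)), atol=1e-3))  # True
```

In practice AdaIN is applied to deep VGG features rather than raw pixels; this sketch only illustrates the statistic-matching operation itself.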
During training, we optimize a tailored loss function comprising a guide loss, computed between the stylized 2D image and the generated view, and a content loss, which preserves geometric consistency by comparing the generated view against the original, unstyled view.
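The two-term objective described above can be sketched as follows. The plain pixel-space L2 distances and the weights are assumptions made for illustration; the actual method may use perceptual (feature-space) distances and different weightings.

```python
import numpy as np

def l2(a, b):
    """Mean squared error between two image arrays."""
    return np.mean((a - b) ** 2)

def gss_loss(rendered, guide_image, original_view, w_guide=1.0, w_content=0.1):
    """Total loss = weighted guide term + content term.
    `guide_image` is the AdaIN-stylized 2D target; `original_view` is the
    unstyled 3DGS render. Weights and L2 form are illustrative assumptions."""
    guide_loss = l2(rendered, guide_image)      # match the 2D stylization guide
    content_loss = l2(rendered, original_view)  # preserve scene content/geometry
    return w_guide * guide_loss + w_content * content_loss

rng = np.random.default_rng(2)
rendered = rng.uniform(size=(3, 32, 32))
guide = rng.uniform(size=(3, 32, 32))
original = rng.uniform(size=(3, 32, 32))
total = gss_loss(rendered, guide, original)
print(total >= 0.0)  # True
```

The content term acts as a regularizer: raising `w_content` pulls the stylized render back toward the original view, trading stylization strength for fidelity.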
Experimental Validation and Implications
Our experiments confirm GSS's strong performance, showing significant improvements in short-term and long-term consistency metrics while producing visually richer and more detailed stylized views than state-of-the-art methods. The real-time rendering of GSS, alongside its efficiency and lower computational requirements, positions it as a promising solution for numerous practical applications, notably in augmented reality (AR) and virtual reality (VR) settings.
Looking Forward
The introduction of GSS presents a significant step forward in the domain of scene stylization, merging the fields of style transfer and 3D scene understanding. Looking forward, the scalability of this approach to handle larger, more complex scenes efficiently, the incorporation of dynamic elements within scenes, and further reduction in the computational overhead present exciting avenues for future research. The adaptability of the GSS framework to integrate with emerging technologies in AR and VR suggests a promising horizon for real-time, interactive scene stylization that both captivates and immerses users in vividly reimagined environments.