Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask
Abstract: We introduce a novel approach that takes a single semantic mask as input to synthesize multi-view consistent color images of natural scenes, trained with a collection of single images from the Internet. Prior works on 3D-aware image synthesis either require multi-view supervision or learning category-level prior for specific classes of objects, which can hardly work for natural scenes. Our key idea to solve this challenging problem is to use a semantic field as the intermediate representation, which is easier to reconstruct from an input semantic mask and then translate to a radiance field with the assistance of off-the-shelf semantic image synthesis models. Experiments show that our method outperforms baseline methods and produces photorealistic, multi-view consistent videos of a variety of natural scenes.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.