NeuMan: Neural Human Radiance Field from a Single Video

Published 23 Mar 2022 in cs.CV | (2203.12575v2)

Abstract: Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences. We propose a novel framework to reconstruct the human and the scene that can be rendered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model and a scene NeRF model. To train these models, we rely on existing methods to estimate the rough geometry of the human and the scene. Those rough geometry estimates allow us to create a warping field from the observation space to the canonical pose-independent space, where we train the human model in. Our method is able to learn subject specific details, including cloth wrinkles and accessories, from just a 10 seconds video clip, and to provide high quality renderings of the human under novel poses, from novel views, together with the background.

Summary

- The paper introduces NeuMan, a framework that uses neural radiance fields to reconstruct a dynamic human model and a scene model from a single video.
- It decouples human and scene modeling, using an SMPL-based warp into a pose-independent canonical space and joint optimization of the SMPL estimates and NeRF parameters.
- The reported experiments show NeuMan outperforming models such as NeuralBody and HumanNeRF in novel-pose handling and image fidelity.

Overview of NeuMan: Neural Human Radiance Field from a Single Video

The paper "NeuMan: Neural Human Radiance Field from a Single Video" introduces a framework that reconstructs a human model and a scene model from a single video, so that the human can be rendered in novel poses and from novel views together with the background. Its core contribution is photorealistic, reposable human rendering for augmented reality applications without the multi-camera setups or intensive manual annotation that such reconstruction typically requires.

Methodology

The approach builds on Neural Radiance Fields (NeRF), a prominent tool for synthesizing novel views of static and dynamic scenes. Existing methods often require complex capture setups or extensive datasets, which makes them hard to apply to a single video. NeuMan addresses this by splitting the problem into two models, a human NeRF and a scene NeRF, both trained from one video captured by a moving camera.
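One way to picture how two separately trained radiance fields can produce a single image is to merge their samples along each camera ray and apply the standard NeRF volume-rendering quadrature. The sketch below is a minimal NumPy illustration under that assumption; the function name `composite_ray` and its argument layout are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def composite_ray(t_human, sigma_human, rgb_human, t_scene, sigma_scene, rgb_scene):
    """Merge samples from two radiance fields along one ray and volume-render them.

    t_*     : (N,) sample depths along the ray
    sigma_* : (N,) non-negative densities at those samples
    rgb_*   : (N, 3) colors at those samples
    Returns the composited color and the accumulated opacity.
    """
    # Pool the samples from both fields and sort them front to back.
    t = np.concatenate([t_human, t_scene])
    sigma = np.concatenate([sigma_human, sigma_scene])
    rgb = np.concatenate([rgb_human, rgb_scene], axis=0)
    order = np.argsort(t)
    t, sigma, rgb = t[order], sigma[order], rgb[order]

    # Interval lengths between consecutive samples; the last interval
    # reuses the previous length (or 1.0 if there is a single sample).
    delta = np.diff(t)
    delta = np.append(delta, delta[-1] if delta.size else 1.0)

    # Standard quadrature: per-sample opacity and transmittance so far.
    alpha = 1.0 - np.exp(-sigma * delta)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha

    color = (weights[:, None] * rgb).sum(axis=0)
    return color, float(weights.sum())
```

With this formulation, an opaque human sample in front of the background occludes it, and rays that miss the human are colored by the scene samples alone.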

Human and Scene Models:
- The human model starts from a rough estimate of the human geometry obtained with existing methods. That estimate defines a warping field, derived from SMPL, that maps points from the observation space into a canonical, pose-independent space. The human NeRF is trained in that canonical space so it can later be rendered in novel poses, and the SMPL estimates are refined during subsequent optimization.
- The scene model represents the background and is trained on the regions outside the human, which are identified via segmentation and depth predictions.
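As a rough illustration of training the scene model only on regions outside the human, the sketch below computes a photometric loss restricted to background pixels. It assumes a precomputed binary segmentation mask and ignores the depth predictions mentioned above; the name `masked_scene_loss` and the mean-squared-error choice are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def masked_scene_loss(pred_rgb, target_rgb, human_mask):
    """Mean squared color error over pixels NOT covered by the human mask.

    pred_rgb, target_rgb : (H, W, 3) float images
    human_mask           : (H, W) array, truthy where the human is visible
    """
    background = ~np.asarray(human_mask).astype(bool)
    if not background.any():
        # No background pixels in this image: nothing to supervise.
        return 0.0
    diff = pred_rgb[background] - target_rgb[background]
    return float(np.mean(diff ** 2))
```

Pixels inside the mask contribute nothing, so errors caused by the moving person do not influence the background model.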

Optimization Techniques:
- Both the SMPL estimates and the NeRF model parameters are optimized. An error-correction network compensates for discrepancies in the initial geometric estimates and helps keep renderings sharp.
- End-to-end alignment and pose optimization further improve the recovered human geometry, which supports realistic animation from diverse viewing angles.
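The observation-to-canonical warp described above can be sketched in a simplified form: each query point borrows the rigid transform of its nearest posed mesh vertex and applies the inverse. This is an assumption made for illustration. The function `warp_to_canonical`, the per-vertex 4x4 transform layout, and the nearest-vertex lookup are hypothetical, and the learned error-correction term is omitted.

```python
import numpy as np

def warp_to_canonical(points, posed_vertices, vertex_transforms):
    """Map observation-space points into canonical space.

    points            : (N, 3) query points in observation space
    posed_vertices    : (V, 3) mesh vertices in the observed pose
    vertex_transforms : (V, 4, 4) canonical -> posed transform per vertex
    """
    # Squared distance from every point to every posed vertex.
    d2 = ((points[:, None, :] - posed_vertices[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)

    # Invert the transform of the nearest vertex (batched inverse).
    inverse = np.linalg.inv(vertex_transforms[nearest])

    # Apply the inverse in homogeneous coordinates.
    homogeneous = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    canonical = np.einsum("nij,nj->ni", inverse, homogeneous)
    return canonical[:, :3]
```

Because the warp depends only on the body pose, the same canonical-space model can be queried under a new pose by swapping in the transforms for that pose.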

Experimental Evaluation

The paper reports qualitative and quantitative evaluations on a purpose-built dataset of videos capturing human motion in various scenarios. The results favor NeuMan over existing models such as NeuralBody and HumanNeRF, particularly when extrapolating to novel poses and reposing the human in previously unseen contexts.

Quantitative comparisons using PSNR, SSIM, and LPIPS indicate that NeuMan achieves higher image fidelity across multiple scenarios, which is attributed to its geometry corrections and novel pose handling.
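Of the three metrics, PSNR is the simplest to state: it is the log-scaled ratio between the squared peak signal value and the mean squared error. The snippet below is a minimal sketch that assumes images are float arrays scaled to [0, 1]; the helper name `psnr` is illustrative. SSIM and LPIPS are more involved and are not reproduced here.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

For example, a uniform per-pixel error of 0.1 on a [0, 1] image gives a mean squared error of 0.01 and therefore a PSNR of 20 dB.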

Implications and Future Prospects

NeuMan opens pathways for various application domains in augmented reality and entertainment, allowing for enhanced virtual interactions and realistic human representations from minimal input data. The approach exhibits potential benefits for telegathering applications, where composite human models can be integrated into shared virtual spaces seamlessly.

Future research could explore more expressive body models or garment dynamics to resolve ambiguities in dynamic scenes, as well as improved geometry warping functions for extreme poses. Handling scale and alignment also remains important for applying the method across varied environments.

The NeuMan framework exemplifies progress in simplifying the technical demands of neural rendering for human-centric scenes while preserving high-quality visual outcomes to foster deeper immersive experiences.
