DLow: Diversifying Latent Flows for Diverse Human Motion Prediction

Published 18 Mar 2020 in cs.CV, cs.LG, and eess.IV | (2003.08386v2)

Abstract: Deep generative models are often used for human motion prediction as they are able to model multi-modal data distributions and characterize diverse human behavior. While much care has been taken into designing and learning deep generative models, how to efficiently produce diverse samples from a deep generative model after it has been trained is still an under-explored problem. To obtain samples from a pretrained generative model, most existing generative human motion prediction methods draw a set of independent Gaussian latent codes and convert them to motion samples. Clearly, this random sampling strategy is not guaranteed to produce diverse samples for two reasons: (1) The independent sampling cannot force the samples to be diverse; (2) The sampling is based solely on likelihood which may only produce samples that correspond to the major modes of the data distribution. To address these problems, we propose a novel sampling method, Diversifying Latent Flows (DLow), to produce a diverse set of samples from a pretrained deep generative model. Unlike random (independent) sampling, the proposed DLow sampling method samples a single random variable and then maps it with a set of learnable mapping functions to a set of correlated latent codes. The correlated latent codes are then decoded into a set of correlated samples. During training, DLow uses a diversity-promoting prior over samples as an objective to optimize the latent mappings to improve sample diversity. The design of the prior is highly flexible and can be customized to generate diverse motions with common features (e.g., similar leg motion but diverse upper-body motion). Our experiments demonstrate that DLow outperforms state-of-the-art baseline methods in terms of sample diversity and accuracy. Our code is released on the project page: https://www.ye-yuan.com/dlow.

Citations (214)

Summary

The paper introduces DLow, a method using learnable latent mapping to yield more diverse human motion predictions.
It employs invertible affine transformations and a diversity-promoting prior within a constrained optimization framework.
Experimental results on Human3.6M and HumanEva-I demonstrate significant improvements in motion diversity and prediction accuracy.

An Expert Analysis of "DLow: Diversifying Latent Flows for Diverse Human Motion Prediction"

The paper "DLow: Diversifying Latent Flows for Diverse Human Motion Prediction" by Ye Yuan and Kris Kitani proposes a sampling method that increases the diversity of samples drawn from a pretrained deep generative model, with human motion prediction as the target problem. It addresses a question that has received far less attention than model design and training: how to draw a diverse set of predictions from a generative model once it has been trained.

Key Contributions and Methodology

The authors introduce "Diversifying Latent Flows" (DLow), a method designed to improve sample diversity through a structured sampling approach from a pretrained conditional variational autoencoder (CVAE). The core idea is to replace the traditional independent Gaussian latent code sampling with a strategy that involves a single random variable mapped through learnable functions to create a correlated set of latent codes. These codes are then decoded into human motion samples.
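
The sampling scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sample_correlated_latents`, the dimensions `K` and `d`, and the randomly initialized `A` and `b` (placeholders standing in for learned mapping parameters) are all hypothetical, and the CVAE decoder that would turn each latent code into a motion is omitted.

```python
import numpy as np

def sample_correlated_latents(A, b, rng):
    """Map one shared noise vector through K affine maps z_k = A_k @ eps + b_k.

    A: (K, d, d) matrices, b: (K, d) offsets (hypothetical stand-ins for
    learned parameters). Returns the shared noise and the K latent codes.
    """
    K, d, _ = A.shape
    eps = rng.standard_normal(d)               # a single draw shared by all K maps
    z = np.einsum("kij,j->ki", A, eps) + b     # latent codes, shape (K, d)
    return eps, z

rng = np.random.default_rng(0)
K, d = 5, 4                                    # assumed sizes, for illustration only
A = rng.standard_normal((K, d, d))             # placeholder for learned matrices
b = rng.standard_normal((K, d))                # placeholder for learned offsets
eps, z = sample_correlated_latents(A, b, rng)
```

Because every code is a deterministic function of the same `eps`, the K codes are correlated by construction, in contrast to drawing K independent Gaussian codes.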

Key components of the DLow methodology include:

Latent Mapping Functions: DLow employs a set of invertible affine transformations to map the random variable to a set of correlated latent codes. This setup allows for a flexible and efficient exploration of the latent space, leading to more diverse sample outputs.
Diversity-Promoting Prior: A novel diversity-promoting prior is introduced as an optimization objective, enabling the model to favor sample sets that exhibit greater diversity by utilizing an energy-based formulation focused on pairwise sample distances.
Constrained Optimization: The authors formulate a constrained optimization problem that balances diversity against keeping the produced samples probable under the original generative model. The objective minimizes the cross-entropy between the distribution of the generated sample set and the diversity-promoting prior, while constraining the KL divergence between the distribution of each mapped latent code and the Gaussian latent prior.
Flexible Prior Design: The paper highlights the flexibility of the DLow framework in accommodating different prior designs to achieve specific objectives, such as controllable motion prediction where certain features of the predicted motions are constrained to be similar.
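
To make the components above concrete, the following sketch combines a pairwise-distance energy with a KL penalty. It rests on several assumptions that are not taken from the paper: the energy is an RBF kernel averaged over sample pairs (one plausible energy-based choice), the constraint is relaxed into a weighted sum with a hypothetical weight `lam`, and all function names are illustrative. The KL term uses the standard closed form for a Gaussian N(b, A Aᵀ), which is the distribution of A·eps + b when eps is standard normal, measured against N(0, I).

```python
import numpy as np

def diversity_energy(samples, sigma=1.0):
    """Mean RBF kernel over all ordered pairs i != j of flattened samples.

    Lower energy means the samples are farther apart. This kernel form is
    an assumed example of a pairwise-distance energy.
    """
    K = samples.shape[0]
    flat = samples.reshape(K, -1)
    diff = flat[:, None, :] - flat[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)           # (K, K) squared distances
    kernel = np.exp(-sq_dist / sigma)
    off_diag = kernel.sum() - np.trace(kernel)     # drop the i == j terms
    return off_diag / (K * (K - 1))

def kl_affine_to_standard_normal(A, b):
    """KL( N(b, A A^T) || N(0, I) ) for an invertible matrix A."""
    d = b.shape[0]
    cov = A @ A.T
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (np.trace(cov) + b @ b - d - logdet)

def dlow_style_loss(samples, A_all, b_all, lam=1.0, sigma=1.0):
    """Diversity energy plus a weighted sum of per-code KL terms (a relaxation)."""
    kl = sum(kl_affine_to_standard_normal(A, b) for A, b in zip(A_all, b_all))
    return diversity_energy(samples, sigma) + lam * kl
```

Minimizing the first term pushes decoded samples apart, while the KL term penalizes mappings that move latent codes far from the Gaussian prior. Larger values of `lam` favor likelihood over diversity.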

Experimental Results

The evaluation, conducted on the Human3.6M and HumanEva-I datasets, shows that DLow outperforms state-of-the-art baseline methods in both sample diversity and accuracy. In particular, Average Pairwise Distance (APD), a diversity metric, and Average Displacement Error (ADE), an accuracy metric, both improve over the baselines. Qualitative visualizations further show that DLow predicts a wide range of plausible future motions covering multiple modes of the human motion space.
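
The two metrics named above can be sketched as follows. This is an illustrative version under assumptions rather than the paper's evaluation code: samples are taken to be arrays of shape (K, T, D) for K predicted motions, T time steps, and D pose dimensions, and the exact normalization used in the paper's benchmarks may differ.

```python
import numpy as np

def apd(samples):
    """Average Pairwise Distance: mean L2 distance over all distinct sample pairs.

    samples: (K, T, D). Each motion is flattened before measuring distance.
    """
    K = samples.shape[0]
    flat = samples.reshape(K, -1)
    dist = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    upper = np.triu_indices(K, k=1)                # each unordered pair once
    return dist[upper].mean()

def ade(samples, gt):
    """Average Displacement Error of the sample closest to the ground truth.

    gt: (T, D). The per-frame L2 error is averaged over time for each
    sample, and the minimum over the K samples is returned.
    """
    per_frame = np.linalg.norm(samples - gt[None], axis=-1)   # (K, T)
    return per_frame.mean(axis=1).min()
```

A higher APD indicates a more spread-out sample set, while a lower ADE indicates that at least one sample lies close to the observed future motion. Reporting both guards against gaining diversity at the expense of accuracy.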

Theoretical and Practical Implications

From a theoretical perspective, DLow offers a new direction for sampling strategies in generative modeling. By combining learnable latent mappings with a diversity-promoting prior, it adds an optimization stage after the generative model has been trained, which makes such models more useful in real-world scenarios that involve complex, multimodal data distributions.

Practically, the improvements in sample diversity support applications in autonomous systems, robotics, and interactive media, where understanding and predicting diverse human motions can significantly impact system performance and safety.

Future Directions

The adaptive nature of the DLow method opens several avenues for future research. Enhancements could focus on exploring more sophisticated mapping functions or priors that potentially allow for even finer control over the diversity-likelihood trade-offs. Additionally, applying the DLow methodology to other domains with multimodal prediction needs, such as natural language processing or climate modeling, could further validate its robustness and versatility.

In summary, Yuan and Kitani's work on DLow is a substantial step forward for generative human motion prediction, providing a framework for diverse prediction that is both scientifically rigorous and practically relevant.
