Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic
This paper addresses the complexities of policy learning in the context of autonomous driving, specifically in dense traffic environments. The primary focus is on improving the efficacy of learning driving policies from observational data without interaction with the real environment, leveraging model predictive control (MPC) techniques augmented with uncertainty regularization.
Problem Context and Contribution
Traditional model-free reinforcement learning (RL) approaches rely on extensive environment interaction, which is not feasible in high-risk domains like autonomous driving. Learning from a fixed observational dataset instead raises the problem of distributional shift: the states visited when executing the learned policy differ from those observed during training, so prediction errors compound. This paper proposes a model that learns purely from observational data and counters this mismatch by integrating uncertainty estimates into model-based policy learning.
Approach
The researchers introduce a framework called Model-Predictive Policy learning with Uncertainty Regularization (MPUR). The methodology consists of two components: an action-conditional stochastic forward model and a policy network trained by unrolling that model. The forward model is built in a variational autoencoder (VAE) style, predicting future states from past observations and actions; its latent variables capture the aleatoric uncertainty inherent in dense traffic, such as other drivers' unobserved intentions.
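The structure of such an action-conditional stochastic forward model can be sketched as follows. This is a minimal illustration, not the paper's architecture: the linear maps stand in for the learned networks, and all names and dimensions are invented for the example. The key point is that each prediction step consumes the current state, an action, and a latent sample drawn from the prior, so repeated unrolls of the same action sequence yield different plausible futures.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, LATENT_DIM = 4, 2, 3

# Illustrative stand-ins for the learned network weights.
W_s = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
W_a = rng.normal(scale=0.1, size=(STATE_DIM, ACTION_DIM))
W_z = rng.normal(scale=0.1, size=(STATE_DIM, LATENT_DIM))

def forward_model(state, action, z):
    """One-step prediction f(s_t, a_t, z_t) -> s_{t+1}; the latent z
    absorbs aleatoric uncertainty (e.g. other drivers' intentions)."""
    return state + W_s @ state + W_a @ action + W_z @ z

def unroll(state, actions, rng):
    """Sample a trajectory by drawing z from the prior at every step."""
    states = [state]
    for action in actions:
        z = rng.normal(size=LATENT_DIM)  # z ~ p(z), as in a VAE prior
        states.append(forward_model(states[-1], action, z))
    return np.stack(states)

traj = unroll(np.zeros(STATE_DIM), [np.ones(ACTION_DIM)] * 5, rng)
print(traj.shape)  # (6, 4): the initial state plus 5 predicted steps
```

At training time the latent would come from an inference network conditioned on the observed future (the VAE encoder); at policy-training time it is sampled from the prior, as above.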
Uncertainty regularization is applied during policy training: the model penalizes trajectories with high epistemic uncertainty, quantified by dropout-based variance estimation, and thereby encourages actions that keep the induced state distribution within the manifold defined by the training data. This mitigates the compounding of errors in poorly understood regions of the state space.
Experimental Evaluation
The framework is validated on the NGSIM I-80 dataset, a real-world driving dataset with high variability and interaction complexity. Evaluations are carried out in a tailored simulation environment where trained policies are tested for their ability to navigate dense traffic without collisions. The MPUR approach is compared against baselines such as single-step imitation learning and standard value-gradient methods, and shows substantial improvements.
The numerical results highlight two essential achievements:
- Success rate: MPUR reaches the end of road segments without collision in roughly 74.8% of evaluation episodes.
- Policy robustness: compared to unregularized variants, the regularized policies yield lower predicted costs and more stable driving trajectories.
Implications and Future Directions
This work emphasizes the significance of integrating uncertainty regularization into policy learning, paving the way for safer deployment of autonomous systems in complex environments where direct interaction is either impractical or unsafe. The combination of MPC with uncertainty estimates provides a framework for leveraging large-scale observational datasets, which are increasingly available.
This approach could be extended to other domains that require cautious decision-making from partial information. Further work could target scaling these methods to richer environments with interactive agents, and refining predictive accuracy and policy robustness with advances in deep learning architectures. The released dataset and environment infrastructure invite the research community to build on these results.