VLM-MPC: Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC) for Autonomous Driving

Published 9 Aug 2024 in cs.RO | (2408.04821v2)

Abstract: Motivated by the emergent reasoning capabilities of Vision Language Models (VLMs) and their potential to improve the comprehensibility of autonomous driving systems, this paper introduces a closed-loop autonomous driving controller called VLM-MPC, which combines the Model Predictive Controller (MPC) with VLM to evaluate how model-based control could enhance VLM decision-making. The proposed VLM-MPC is structured into two asynchronous components: the upper-level VLM generates driving parameters (e.g., desired speed, desired headway) for lower-level control based on front camera images, ego vehicle state, traffic environment conditions, and reference memory; the lower-level MPC controls the vehicle in real time using these parameters, considering engine lag and providing state feedback to the entire system. Experiments based on the nuScenes dataset validated the effectiveness of the proposed VLM-MPC across various environments (e.g., night, rain, and intersections). The results demonstrate that the VLM-MPC consistently maintains Post Encroachment Time (PET) above safe thresholds, in contrast to some scenarios where the VLM-based control posed collision risks. Additionally, the VLM-MPC enhances smoothness compared to the real-world trajectories and VLM-based control. By comparing behaviors under different environmental settings, we highlight the VLM-MPC's capability to understand the environment and make reasoned inferences. Moreover, we validate the contributions of two key components, the reference memory and the environment encoder, to the stability of responses through ablation tests.


Summary

The paper introduces a dual-layer VLM-MPC system combining high-level vision language modeling with real-time MPC for enhanced autonomous driving.
The paper details a chain-of-thought prompt generation and closed-loop feedback process, which improves safety and control under diverse conditions.
The paper reports improved driving safety and comfort on the nuScenes dataset relative to real-world trajectories and VLM-based control.

VLM-MPC: Vision Language Model-Guided Model Predictive Controller for Autonomous Driving

Introduction to VLM-MPC

Integrating Vision Language Models (VLMs) with Model Predictive Controllers (MPCs) is a novel approach to enhancing autonomous driving systems. This paper describes a system named VLM-MPC, which leverages the emergent reasoning capabilities of VLMs to improve both the comprehensibility of autonomous vehicles and their adaptability to dynamic environments. Unlike traditional systems that are primarily rule-based or rely heavily on extensive reward-function design, VLM-MPC uses a dual-layer asynchronous mechanism: the high-level VLM component interprets complex environmental inputs and sets driving parameters for the lower-level MPC component, which executes real-time vehicle control based on those parameters.

Methodology

VLM-MPC Architecture

The proposed VLM-MPC system comprises two distinct levels:

Upper-level VLM: This component acts as the high-level decision-making engine. It processes sensory inputs, including camera images, textual scenario descriptions, ego vehicle states, and environment conditions, to derive essential driving parameters. It consists of subcomponents such as Reference Memory, Environment Description Model, Scenario Encoder, and Prompt Generator. The outputs are key driving parameters like desired speed and headway.
Lower-level MPC: The MPC handles real-time trajectory planning and control, operating at a higher frequency than the VLM. It adjusts the vehicle actions based on both the key parameters provided by the VLM and feedback from the vehicle's current state to achieve optimized driving performance.
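The two-rate structure described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the paper's implementation: `slow_planner` is a hypothetical rule that stands in for the VLM (no model is called), `fast_controller` is a clipped proportional law that stands in for the MPC (no optimization is performed), and all gains, thresholds, and initial conditions are made up. The two rates are interleaved in a single loop rather than run in separate threads.

```python
from dataclasses import dataclass


@dataclass
class DrivingParams:
    desired_speed: float    # m/s
    desired_headway: float  # s


def slow_planner(gap: float) -> DrivingParams:
    """Hypothetical stand-in for the upper-level VLM (no model is called)."""
    # Placeholder rule: choose more conservative parameters when the gap is short.
    if gap < 20.0:
        return DrivingParams(desired_speed=8.0, desired_headway=2.5)
    return DrivingParams(desired_speed=12.0, desired_headway=1.5)


def fast_controller(speed: float, gap: float, lead_speed: float,
                    params: DrivingParams, a_max: float = 2.0) -> float:
    """Stand-in for the lower-level MPC: a clipped proportional law, not an optimizer."""
    a_speed = 0.5 * (params.desired_speed - speed)
    a_gap = 0.2 * (gap - params.desired_headway * speed) + 0.5 * (lead_speed - speed)
    accel = min(a_speed, a_gap)  # obey whichever target is more restrictive
    return max(-a_max, min(a_max, accel))


def simulate(steps: int = 200, dt: float = 0.1, planner_period: int = 20):
    """Run the two-rate loop; return the gap history and the number of planner calls."""
    speed, gap, lead_speed = 10.0, 30.0, 10.0  # illustrative initial state
    params, planner_calls, gaps = None, 0, []
    for k in range(steps):
        if k % planner_period == 0:  # slow loop: refresh the driving parameters
            params = slow_planner(gap)
            planner_calls += 1
        # Fast loop: runs every step, using the latest parameters and measured state.
        accel = fast_controller(speed, gap, lead_speed, params)
        speed = max(0.0, speed + accel * dt)
        gap += (lead_speed - speed) * dt
        gaps.append(gap)
    return gaps, planner_calls


gaps, planner_calls = simulate()
```

With the defaults, the controller runs at every 0.1 s step while the planner is consulted only once every 20 steps, which mirrors the idea that the lower level keeps acting on the most recent parameters between upper-level updates.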

Prompt Generation and Reasoning

Prompt generation in the VLM uses chain-of-thought (CoT) prompting to elicit logical step-by-step reasoning, which is essential for navigating complex driving scenarios. This CoT strategy helps the VLM handle scenarios with varied environmental conditions and produce well-reasoned decision outputs.
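As a rough illustration of what step-by-step prompt assembly might look like, the sketch below builds a text prompt from the ego state, an environment description, and reference memory, then parses a structured reply. The prompt wording, the field names, and the JSON reply format are all assumptions made for this example; the paper's actual prompt template is not reproduced in this summary.

```python
import json
import re


def build_prompt(ego_speed: float, lead_gap: float, environment: str, memory: dict) -> str:
    """Assemble a step-by-step prompt; all field names and wording are illustrative."""
    return "\n".join([
        "You are assisting a driving controller.",
        f"Ego speed: {ego_speed:.1f} m/s. Gap to lead vehicle: {lead_gap:.1f} m.",
        f"Environment: {environment}.",
        f"Reference values from memory: {json.dumps(memory)}.",
        "Reason step by step:",
        "1. Describe the scene and any hazards.",
        "2. Explain how the conditions should change speed and headway.",
        "3. Output one JSON object with keys desired_speed (m/s) and desired_headway (s).",
    ])


def parse_reply(reply: str) -> dict:
    """Extract the first flat JSON object from the reply and check the expected keys."""
    # Non-greedy match: sufficient for a flat object without nested braces.
    match = re.search(r"\{.*?\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    params = json.loads(match.group(0))
    return {
        "desired_speed": float(params["desired_speed"]),
        "desired_headway": float(params["desired_headway"]),
    }


prompt = build_prompt(9.5, 22.0, "night, rain", {"desired_speed": 10.0})
params = parse_reply('Step 1: wet road.\n{"desired_speed": 8, "desired_headway": 2.5}')
```

Asking for a fixed output format at the end of the reasoning steps keeps the free-text reasoning readable while leaving the final parameters machine-parseable for the lower-level controller.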

Closed-Loop Evaluation Strategy

Closed-loop control is a central feature: feedback from vehicle states dynamically influences the VLM's decision-making. This real-time update mechanism enhances operational safety and responsiveness to unforeseen changes in the environment, addressing the typical shortcomings of open-loop designs.

Experimental Evaluation

Safety and Driving Comfort

The VLM-MPC system was validated on the nuScenes dataset across varied environmental conditions such as night, rain, and intersections. Safety was measured by Post Encroachment Time (PET) and driving comfort by root-mean-square (RMS) acceleration. Compared with the baselines, VLM-MPC kept PET consistently above the critical safety threshold, indicating larger safety margins, and it produced smoother motion than both the real-world trajectories and VLM-based control.
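The two metrics can be sketched as follows. PET is taken here in its usual sense, the time between the first road user leaving a conflict area and the second one entering it, and RMS acceleration is computed from finite differences of uniformly sampled speeds. The paper's exact computation is not given in this summary, so the one-dimensional path positions, the conflict-zone boundaries, and the helper `crossing_time` are assumptions for illustration.

```python
import math


def rms_acceleration(speeds, dt):
    """RMS of finite-difference acceleration for uniformly sampled speeds (m/s)."""
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    return math.sqrt(sum(a * a for a in accels) / len(accels))


def crossing_time(times, positions, threshold):
    """First time a path position reaches `threshold`, by linear interpolation."""
    samples = list(zip(times, positions))
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        if s0 < threshold <= s1:
            return t0 + (threshold - s0) * (t1 - t0) / (s1 - s0)
    return None  # the threshold is never crossed


def post_encroachment_time(first_exit_time, second_entry_time):
    """Time gap between the first vehicle leaving and the second entering the zone."""
    return second_entry_time - first_exit_time


# Illustrative data: vehicle A leaves the zone at s = 15 m, vehicle B enters at s = 5 m.
t_exit = crossing_time([0, 1, 2], [0, 10, 20], 15)          # A exits at t = 1.5 s
t_entry = crossing_time([0, 1, 2, 3], [-20, -10, 0, 10], 5)  # B enters at t = 2.5 s
pet = post_encroachment_time(t_exit, t_entry)                # 1.0 s
comfort = rms_acceleration([10.0, 10.5, 11.0, 11.0], dt=0.5)
```

A larger PET means the two road users were further apart in time at the conflict area, and a lower RMS acceleration indicates smoother driving.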

Model Compatibility and Response Times

Comparative analyses demonstrated the practical feasibility of using models from the Llama and GPT series as the foundation model, with particular attention to response time. Llama 3.1 proved advantageous because it runs locally, avoiding the latency associated with cloud-based GPT models.

Implications and Future Research

The results from VLM-MPC highlight significant implications for autonomous driving control systems, advocating for closed-loop mechanisms that accommodate high-level reasoning coupled with efficient real-time execution. Practical implementation can mitigate current challenges in dynamic environment adaptability and decision interpretability in AVs.

Future work includes more extensive ablation studies to fine-tune system components, with emphasis on how image understanding capabilities and memory mechanisms affect overall system robustness. Real-world vehicle experiments are also essential to address the limitations of dataset-specific bias across diverse scenarios.

Conclusion

The VLM-MPC approach demonstrates a viable pathway for employing VLMs in autonomous vehicle control. The reported results show gains in safety, comfort, and response efficiency. Future research aims to overcome distributional biases and to move toward real-world application through dedicated empirical evaluation.