Cooperative Decision-Making for CAVs at Unsignalized Intersections: A MARL Approach with Attention and Hierarchical Game Priors

Published 9 Sep 2024 in cs.RO | (2409.05712v1)

Abstract: The development of autonomous vehicles has shown great potential to enhance the efficiency and safety of transportation systems. However, the decision-making issue in complex human-machine mixed traffic scenarios, such as unsignalized intersections, remains a challenge for autonomous vehicles. While reinforcement learning (RL) has been used to solve complex decision-making problems, existing RL methods still have limitations in dealing with cooperative decision-making of multiple connected autonomous vehicles (CAVs), ensuring safety during exploration, and simulating realistic human driver behaviors. In this paper, a novel and efficient algorithm, Multi-Agent Game-prior Attention Deep Deterministic Policy Gradient (MA-GA-DDPG), is proposed to address these limitations. Our proposed algorithm formulates the decision-making problem of CAVs at unsignalized intersections as a decentralized multi-agent reinforcement learning problem and incorporates an attention mechanism to capture interaction dependencies between ego CAV and other agents. The attention weights between the ego vehicle and other agents are then used to screen interaction objects and obtain prior hierarchical game relations, based on which a safety inspector module is designed to improve the traffic safety. Furthermore, both simulation and hardware-in-the-loop experiments were conducted, demonstrating that our method outperforms other baseline approaches in terms of driving safety, efficiency, and comfort.

Summary

The paper introduces the MA-GA-DDPG algorithm integrating attention and hierarchical game priors for decision-making at unsignalized intersections.
It leverages a multi-agent deep deterministic policy gradient framework enhanced by attention mechanisms to predict interactions and manage conflict risks.
Performance evaluations demonstrate improved learning efficiency, driving safety, and traffic flow in mixed-autonomy scenarios via real-time policy corrections.

Cooperative Decision-Making for CAVs at Unsignalized Intersections

Introduction

The paper "Cooperative Decision-Making for CAVs at Unsignalized Intersections: A MARL Approach with Attention and Hierarchical Game Priors" introduces MA-GA-DDPG (Multi-Agent Game-prior Attention Deep Deterministic Policy Gradient), an algorithm designed to address the challenges faced by Connected Autonomous Vehicles (CAVs) at complex unsignalized intersections. These intersections involve mixed human-machine traffic, where conventional decision-making methods have shown limitations. Reinforcement learning (RL) is a promising way to develop effective decision strategies, yet existing RL methods struggle with cooperative decision-making among multiple CAVs, with safety during exploration, and with realistic simulation of human driver behavior. MA-GA-DDPG formulates the decision-making task as a decentralized multi-agent reinforcement learning (MARL) problem and incorporates attention mechanisms and hierarchical game priors to improve interaction prediction, risk assessment, and policy correction.

Framework Overview

MA-GA-DDPG leverages the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) as its foundational algorithm. To enhance decision-making performance, the approach integrates an attention-based policy network that highlights relevant agents for interaction based on calculated attention weights.

Figure 1: The framework of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG).
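As background for the MADDPG base algorithm named above, the following is a minimal sketch of two standard DDPG-style ingredients that MADDPG inherits: the one-step TD target used to train each agent's critic, and the soft update that lets target networks track the online networks. The function names, default values, and the `done` masking convention are assumptions made for illustration and are not taken from the paper.

```python
def td_target(reward, next_q, gamma=0.95, done=False):
    """One-step TD target y = r + gamma * Q'(x', a') for one agent's critic.

    next_q stands for the target critic's value at the next joint state and
    the next joint action proposed by the target actors. Dropping the
    bootstrap term on terminal steps is a common convention (assumed here).
    """
    if done:
        return reward
    return reward + gamma * next_q


def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_params, online_params)]


# Toy numbers only; real parameters would be network weight tensors.
y = td_target(reward=1.0, next_q=2.0, gamma=0.9)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

In MADDPG the critic is conditioned on all agents' observations and actions during training, while each actor acts on its own observation at execution time; the sketch leaves the networks themselves out.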

The algorithm further develops a hierarchical game framework where vehicles at intersections are processed based on the attention-derived interaction priorities. This system proactively predicts potential conflicts during exploration and corrects actions to improve traffic safety without compromising efficiency.

Technical Contributions

Policy Network Design: The attention mechanism in MA-GA-DDPG enables the policy network to weigh interaction dependencies effectively. Using a multi-head attention model, the network identifies significant agents in the traffic environment, producing an attention matrix that informs the importance of each interaction.

Figure 2: The attention-based policy network for every single agent.
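The attention step described above can be illustrated with a minimal sketch of scaled dot-product attention, in which the ego vehicle's query is scored against a key for each surrounding agent. This shows a single head in plain Python; a multi-head model would run several such heads with separate learned projections and combine their outputs. The embeddings and function names below are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(ego_query, agent_keys, agent_values):
    """Single-head scaled dot-product attention from the ego to other agents.

    Returns (weights, context): one weight per agent, and the weighted sum
    of the agents' value vectors.
    """
    scale = math.sqrt(len(ego_query))
    scores = [dot(ego_query, key) / scale for key in agent_keys]
    weights = softmax(scores)
    dim = len(agent_values[0])
    context = [sum(w * v[d] for w, v in zip(weights, agent_values))
               for d in range(dim)]
    return weights, context


# Hypothetical embeddings; in a trained network these come from learned
# projections of each vehicle's observation.
ego_query = [1.0, 0.0]
agent_keys = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
agent_values = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(ego_query, agent_keys, agent_values)
```

The weights sum to one, so each can be read as the share of the ego vehicle's attention given to that agent; the first agent, whose key aligns with the ego query, receives the largest share.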

Hierarchical Game Priorities: Attention weights translate into hierarchical game relations, forming a level-k priority schema. Vehicles are assessed based on their likelihood and importance for interaction within the traffic scenario, enabling strategic decision-making that incorporates potential vehicle conflicts.

Figure 3: Attention-based interactive object selection for each CAV.
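One way to picture the screening step is the sketch below, which keeps only the agents whose attention weight exceeds a cutoff and orders them by weight. The threshold value, the sort-by-weight ordering, and the mapping from rank to priority are assumptions for illustration; the summary does not state the paper's exact selection rule or how game levels are assigned.

```python
def select_interaction_objects(weights, threshold=0.2):
    """Keep agents whose attention weight is at least `threshold`.

    weights: dict mapping agent id -> attention weight from the ego CAV.
    Returns agent ids ordered from highest to lowest weight.
    The threshold is a hypothetical tuning parameter.
    """
    kept = [(agent_id, w) for agent_id, w in weights.items() if w >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [agent_id for agent_id, _ in kept]


def priority_ranks(ordered_ids):
    """Assign rank 0 to the most-attended agent, rank 1 to the next, and so on."""
    return {agent_id: rank for rank, agent_id in enumerate(ordered_ids)}


# Hypothetical attention weights for one ego CAV.
ego_weights = {"hv_1": 0.55, "cav_2": 0.30, "hv_3": 0.10, "cav_4": 0.05}
selected = select_interaction_objects(ego_weights)
ranks = priority_ranks(selected)
```

Screening this way keeps the downstream game small: only the agents that pass the cutoff need to be considered when conflicts are checked.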

Safety Inspector Module: Using the game priors, the safety inspector module anticipates, evaluates, and corrects high-risk actions. It supervises CAV movements, predicts possible collisions, and adjusts actions in real time to avoid conflicts. Because unsafe actions are corrected during exploration rather than ending in collisions, the module also improves the algorithm's overall learning efficiency.

Figure 4: Trajectory prediction of surrounding agents and conflict checking for CAV i at the intersection.
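The predict, check, and correct loop of a safety inspector can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's module: vehicles move along straight lines, surrounding agents are assumed to hold their speed, a conflict is any predicted gap below a fixed distance, and correction means falling back to the first safe acceleration in a hypothetical candidate list. All names and numeric values are invented for the example.

```python
import math


def predict_positions(pos, speed, accel, heading, horizon=3.0, dt=0.1):
    """Roll out straight-line motion with constant acceleration.

    Speed is clamped at zero so a braking vehicle does not reverse.
    """
    x, y = pos
    trajectory = []
    for _ in range(int(round(horizon / dt))):
        speed = max(0.0, speed + accel * dt)
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        trajectory.append((x, y))
    return trajectory


def has_conflict(traj_a, traj_b, safe_dist=2.5):
    """True if the two vehicles come closer than safe_dist at the same time step."""
    return any(math.hypot(ax - bx, ay - by) < safe_dist
               for (ax, ay), (bx, by) in zip(traj_a, traj_b))


def inspect_action(ego, others, proposed_accel, fallbacks=(0.0, -1.0, -2.0, -3.0)):
    """Return the proposed acceleration if it is conflict-free, else the first
    safe fallback, else the strongest braking value as a last resort."""
    other_trajs = [predict_positions(o["pos"], o["speed"], 0.0, o["heading"])
                   for o in others]
    for accel in (proposed_accel,) + tuple(fallbacks):
        ego_traj = predict_positions(ego["pos"], ego["speed"], accel, ego["heading"])
        if not any(has_conflict(ego_traj, t) for t in other_trajs):
            return accel
    return min(fallbacks)


# Ego heads east and another vehicle heads north; both would reach the
# intersection centre at about the same time if neither yielded.
ego = {"pos": (-20.0, 0.0), "speed": 8.0, "heading": 0.0}
crossing = {"pos": (0.0, -20.0), "speed": 8.0, "heading": math.pi / 2}
safe_accel = inspect_action(ego, [crossing], proposed_accel=1.0)
```

In a training loop, a check of this kind would sit between the policy's output and the environment step, so that only the inspected action is executed.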

Performance Evaluation

The MA-GA-DDPG algorithm was evaluated in three simulated settings: CAVs only, CAVs with homogeneous human-driven vehicles (HVs), and CAVs with heterogeneous HVs. Across these simulations and additional hardware-in-the-loop experiments, the algorithm consistently improved on the baselines in learning efficiency, driving safety, and overall traffic efficiency.

Figure 5: The mean reward and cumulative reward of our model and other baselines in different environments: (a) and (d): just CAVs; (b) and (e): CAVs and homogeneous HVs; (c) and (f): CAVs and heterogeneous HVs.

The evaluation also showcased the algorithm's ability to balance aggressive and cautious driving styles based on real-time interaction assessments, yielding smoother transitions and higher passage success rates at intersections.

Conclusions and Future Work

The research provides a framework for CAV decision-making at unsignalized intersections that uses attention mechanisms and hierarchical game priors to handle the complexity of mixed-traffic environments. Future work may extend the approach to more intricate traffic scenarios and refine the conflict resolution process to further improve the safety and efficiency of MARL-driven traffic systems. Continued research into the cooperative dynamics between CAVs and human-driven vehicles could help these systems reproduce realistic driving behaviors while maintaining stringent safety standards.