- The paper introduces the MA-GA-DDPG algorithm integrating attention and hierarchical game priors for decision-making at unsignalized intersections.
- It leverages a multi-agent deep deterministic policy gradient framework enhanced by attention mechanisms to predict interactions and manage conflict risks.
- Performance evaluations demonstrate improved learning efficiency, driving safety, and traffic flow in mixed-autonomy scenarios via real-time policy corrections.
Cooperative Decision-Making for CAVs at Unsignalized Intersections
Introduction
The paper "Cooperative Decision-Making for CAVs at Unsignalized Intersections: A MARL Approach with Attention and Hierarchical Game Priors" introduces MA-GA-DDPG, an algorithm for cooperative decision-making by connected autonomous vehicles (CAVs) at complex unsignalized intersections. Such intersections carry mixed human-machine traffic, a setting in which conventional decision-making methods have shown limitations. Reinforcement learning (RL) is a promising way to learn effective decision strategies, but it still faces challenges in safety, cooperation, and realistic modeling. MA-GA-DDPG formulates the decision-making task as a decentralized multi-agent reinforcement learning (MARL) problem and incorporates attention mechanisms and hierarchical game priors to improve interaction prediction, risk assessment, and policy correction.
Framework Overview
MA-GA-DDPG builds on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. To improve decision-making performance, it adds an attention-based policy network that computes attention weights over surrounding agents and uses them to identify the agents most relevant to each vehicle's interactions.
Figure 1: The framework of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG).
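For reference, the standard MADDPG updates can be written as follows. The notation is taken from the original MADDPG formulation and is an assumption here, since this summary does not reproduce the paper's own symbols: \mu_i is the deterministic policy of agent i with parameters \theta_i, Q_i^{\mu} is its centralized critic with parameters \phi_i, o_i is its local observation, \mathbf{x} is the joint state information, \mathcal{D} is the replay buffer, and primes denote target networks and next-step quantities.

```latex
\begin{align}
\nabla_{\theta_i} J(\mu_i) &= \mathbb{E}_{\mathbf{x}, a \sim \mathcal{D}}
  \left[ \nabla_{\theta_i} \mu_i(o_i)\,
  \nabla_{a_i} Q_i^{\mu}(\mathbf{x}, a_1, \dots, a_N)
  \big|_{a_i = \mu_i(o_i)} \right] \\
\mathcal{L}(\phi_i) &= \mathbb{E}_{\mathbf{x}, a, r, \mathbf{x}' \sim \mathcal{D}}
  \left[ \left( Q_i^{\mu}(\mathbf{x}, a_1, \dots, a_N) - y_i \right)^2 \right] \\
y_i &= r_i + \gamma\, Q_i^{\mu'}(\mathbf{x}', a_1', \dots, a_N')
  \big|_{a_j' = \mu_j'(o_j')}
\end{align}
```

Each critic sees the actions of all agents during training, while each actor acts only on its own observation at execution time.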
The algorithm further develops a hierarchical game framework where vehicles at intersections are processed based on the attention-derived interaction priorities. This system proactively predicts potential conflicts during exploration and corrects actions to improve traffic safety without compromising efficiency.
Technical Contributions
Policy Network Design: The attention mechanism in MA-GA-DDPG lets the policy network weigh interaction dependencies among agents. Using a multi-head attention model, the network identifies the most significant agents in the traffic environment and produces an attention matrix whose entries indicate how important each interaction is.
Figure 2: The attention-based policy network for every single agent.
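A minimal sketch of how multi-head attention weights over surrounding agents could be computed is shown below. It uses generic scaled dot-product attention in NumPy and is not the paper's network: the function names, feature dimensions, and random projection matrices are illustrative assumptions, and in a trained model the projections would be learned.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_attention_weights(ego, others, w_q, w_k):
    """Scaled dot-product attention of one ego agent over n other agents.

    ego:    (d_in,) feature vector of the ego agent (hypothetical features)
    others: (n, d_in) feature vectors of the surrounding agents
    w_q:    (heads, d_in, d_k) query projections (assumed, normally learned)
    w_k:    (heads, d_in, d_k) key projections (assumed, normally learned)
    Returns an array of shape (heads, n); each row sums to 1.
    """
    d_k = w_q.shape[-1]
    queries = np.einsum("d,hdk->hk", ego, w_q)        # (heads, d_k)
    keys = np.einsum("nd,hdk->hnk", others, w_k)      # (heads, n, d_k)
    scores = np.einsum("hk,hnk->hn", queries, keys) / np.sqrt(d_k)
    return softmax(scores, axis=-1)


# Example with placeholder data: 3 surrounding agents, 2 heads.
rng = np.random.default_rng(0)
ego_features = rng.normal(size=4)
other_features = rng.normal(size=(3, 4))
w_q = rng.normal(size=(2, 4, 8))
w_k = rng.normal(size=(2, 4, 8))

weights = multi_head_attention_weights(ego_features, other_features, w_q, w_k)
mean_weights = weights.mean(axis=0)  # one importance score per surrounding agent
```

Averaging over heads is only one way to reduce the per-head weights to a single importance score per agent; the paper may aggregate them differently.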
Hierarchical Game Priorities: Attention weights translate into hierarchical game relations, forming a level-k priority schema. Vehicles are assessed based on their likelihood and importance for interaction within the traffic scenario, enabling strategic decision-making that incorporates potential vehicle conflicts.
Figure 3: Attention-based interactive object selection for each CAV.
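One simple way to turn attention weights into an interaction priority order is sketched below. The summary does not give the paper's exact mapping from weights to game levels, so the threshold, the `max_partners` cap, and the function name are hypothetical choices made for illustration.

```python
def rank_interaction_partners(weights, threshold=0.1, max_partners=None):
    """Order surrounding agents by descending attention weight.

    weights:      dict mapping agent id -> attention weight (assumed input)
    threshold:    agents below this weight are ignored (hypothetical value)
    max_partners: optional cap on how many agents are kept
    Returns a list of agent ids, highest priority first.
    """
    selected = [(agent, w) for agent, w in weights.items() if w >= threshold]
    # Sort by weight (descending); break ties by agent id for determinism.
    selected.sort(key=lambda item: (-item[1], item[0]))
    if max_partners is not None:
        selected = selected[:max_partners]
    return [agent for agent, _ in selected]


# Example with made-up weights for four surrounding vehicles.
priority = rank_interaction_partners(
    {"hv1": 0.5, "cav2": 0.3, "hv3": 0.05, "cav4": 0.15}
)
```

In this example `hv3` falls below the threshold and is dropped, and the remaining vehicles are ordered from most to least important.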
Safety Inspector Module: Utilizing game priors, the safety inspector module anticipates, evaluates, and corrects high-risk actions. The module supervises CAV movements, predicting possible collisions and adjusting strategies in real-time to avoid conflicts, thereby enhancing the algorithm's overall learning efficiency.
Figure 4: Trajectory prediction of surrounding agents and conflict checking for CAV i at the intersection.
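The predict, check, and correct cycle of such a safety inspector can be sketched as follows. This is a simplified illustration, not the paper's module: it assumes constant-velocity trajectory prediction, a fixed safety distance, an ego action expressed as a target speed along a fixed direction, and a hypothetical list of fallback speeds.

```python
import numpy as np


def predict_positions(pos, vel, horizon=3.0, dt=0.5):
    """Constant-velocity rollout; returns positions of shape (steps, 2)."""
    steps = int(round(horizon / dt))
    times = dt * np.arange(1, steps + 1)[:, None]      # (steps, 1)
    return np.asarray(pos, dtype=float) + times * np.asarray(vel, dtype=float)


def has_conflict(ego_traj, other_traj, safe_dist=2.0):
    """True if the two predicted trajectories come closer than safe_dist."""
    gaps = np.linalg.norm(ego_traj - other_traj, axis=1)
    return bool(np.any(gaps < safe_dist))


def correct_action(ego_pos, ego_dir, proposed_speed, others,
                   fallback_speeds=(2.0, 0.0)):
    """Return the first conflict-free speed, trying the proposed one first.

    others: list of (position, velocity) pairs for the prioritized agents.
    fallback_speeds: hypothetical safer alternatives, tried in order.
    Falls back to stopping (0.0) if every candidate leads to a conflict.
    """
    direction = np.asarray(ego_dir, dtype=float)
    for speed in (proposed_speed, *fallback_speeds):
        ego_traj = predict_positions(ego_pos, speed * direction)
        if not any(has_conflict(ego_traj, predict_positions(p, v))
                   for p, v in others):
            return speed
    return 0.0


# Example: the ego heads north, another vehicle crosses from the west.
crossing = [((-10.0, 0.0), (5.0, 0.0))]
safe_speed = correct_action((0.0, -10.0), (0.0, 1.0), 5.0, crossing)
```

Here the proposed speed would bring both vehicles to the same point at the same time, so the sketch falls back to the slower candidate. A real inspector would use a richer motion model and would correct the continuous action rather than pick from a fixed list.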
Performance Evaluation
The MA-GA-DDPG algorithm was evaluated in several simulated scenarios: CAV-only environments, and mixed environments containing either homogeneous or heterogeneous human-driven vehicles (HVs). Across the simulations and hardware-in-the-loop evaluations, the algorithm consistently improved learning efficiency, driving safety, and overall traffic efficiency.
Figure 5: The mean reward and cumulative reward of our model and other baselines in different environments. (a) and (d): CAVs only; (b) and (e): CAVs and homogeneous HVs; (c) and (f): CAVs and heterogeneous HVs.
The evaluations also showed that the algorithm balances aggressive and cautious driving styles according to real-time interaction assessments, which leads to smoother transitions and higher passage success rates at intersections.
Conclusions and Future Work
The research provides a framework for CAV decision-making at unsignalized intersections that uses attention mechanisms and hierarchical game priors to handle the complexity of mixed-traffic environments. Future work may extend the approach to more intricate traffic scenarios and refine the conflict resolution process to further improve the safety and efficiency of MARL-driven traffic systems. Continued research into the cooperative dynamics between CAVs and human-driven vehicles could help these systems reproduce realistic driving behaviors while maintaining stringent safety standards.