Maneuver Decision-Making Through Automatic Curriculum Reinforcement Learning Without Handcrafted Reward Functions

Published 12 Jul 2023 in cs.AI, cs.LG, and cs.RO | (arXiv:2307.06152v1)

Abstract: Maneuver decision-making is the core of autonomous air combat for unmanned combat aerial vehicles. To address this problem, we propose an automatic curriculum reinforcement learning method that enables agents to learn effective air-combat decisions from scratch. Ranges of initial states are used to distinguish curricula of different difficulty levels, dividing maneuver decision-making into a series of sub-tasks from easy to difficult, and test results are used to switch between sub-tasks. As the sub-tasks change, agents gradually learn to complete them in order of increasing difficulty, enabling them to make effective maneuvering decisions across a variety of states without any effort spent on designing reward functions. Ablation studies show that the proposed automatic curriculum learning is an essential component of training through reinforcement learning; that is, agents cannot learn effective decisions without curriculum learning. Simulation experiments show that, after training, agents can make effective decisions in different states, including tracking, attacking, and escaping, and that these decisions are both rational and interpretable.
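
The abstract describes the curriculum mechanism only at a high level: ranges of initial states define sub-task difficulty, and periodic test results decide when to switch to the next sub-task. The Python sketch below illustrates that loop under stated assumptions; the state variables (distance, heading error), the stage ranges, the promotion threshold, and the train/evaluate hooks are all illustrative placeholders, not details taken from the paper.

```python
import random

# Minimal sketch of the curriculum loop described in the abstract:
# initial-state ranges define sub-task difficulty, and test results
# decide when to move to the next sub-task. The concrete variables,
# thresholds, and stage values are assumptions for illustration.

CURRICULUM = [
    {"max_distance_m": 1000.0, "max_heading_err_rad": 0.2},   # easy: close, nearly aligned
    {"max_distance_m": 3000.0, "max_heading_err_rad": 1.0},
    {"max_distance_m": 6000.0, "max_heading_err_rad": 3.14},  # hard: any geometry
]
PROMOTE_THRESHOLD = 0.8  # assumed test success rate required to advance
EVAL_EPISODES = 50       # assumed number of test episodes per evaluation


def sample_initial_state(stage):
    """Draw an initial engagement geometry from the current stage's range."""
    return {
        "distance_m": random.uniform(100.0, stage["max_distance_m"]),
        "heading_err_rad": random.uniform(-stage["max_heading_err_rad"],
                                          stage["max_heading_err_rad"]),
    }


def run_curriculum(train_one_round, evaluate):
    """Train on each sub-task until test results justify a harder one.

    train_one_round(sampler) runs some RL updates (e.g. PPO) on episodes
    started from sampler(); evaluate(sampler, n) returns the success rate
    over n test episodes. Both are placeholders for the actual RL code.
    """
    stage_idx = 0
    while stage_idx < len(CURRICULUM):
        stage = CURRICULUM[stage_idx]
        sampler = lambda: sample_initial_state(stage)
        train_one_round(sampler)
        if evaluate(sampler, EVAL_EPISODES) >= PROMOTE_THRESHOLD:
            stage_idx += 1  # test results, not reward shaping, drive the switch
```

Note that promotion depends on an evaluated success rate rather than on the training reward, which matches the abstract's claim that test results, not handcrafted reward functions, drive the change of sub-tasks.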
