Autonomous manoeuvre decision-making algorithm for air combat based on hybrid temporal difference error-reward prioritised experience replay with twin delayed deep deterministic policy gradient

Published online by Cambridge University Press:  16 December 2025

J. He, Z. Meng*, J. Zhang, Z. Wang and Y. Xie

Affiliation: School of Aeronautic Science and Engineering, Beihang University, Beijing, China

*Corresponding author: Z. Meng; Email: mengzhijun@buaa.edu.cn

Abstract

Autonomous manoeuvre decision-making is essential for enhancing the survivability and operational effectiveness of unmanned aerial vehicles in high-risk and dynamic air combat scenarios. To address the limitations of traditional air combat decision-making methods in complex and rapidly changing environments, this paper proposes an autonomous air combat decision-making algorithm based on hybrid temporal difference error-reward prioritised experience replay with twin delayed deep deterministic policy gradient. The algorithm constructs a closed-loop learning system from environmental interaction to policy optimisation, addressing the key challenges of slow convergence and insufficient identification of critical tactical decisions in autonomous air combat. A hybrid priority metric that combines reward backpropagation with temporal difference error filtering is introduced to prioritise the learning of high-value experiences while balancing sample diversity against the reuse of critical experiences. To reduce excessive trial and error in the initial phase, an integrated reward function combining task rewards and auxiliary guidance rewards is designed using reward reshaping to guide the agent's choice of manoeuvre strategy. Based on the established three-dimensional close-range air combat game model, simulation validation was conducted for engagements against both basic-manoeuvre and expert-system opponents. The results demonstrate that the proposed autonomous air combat manoeuvre decision-making algorithm achieves higher learning efficiency and convergence stability. It can rapidly identify high-value manoeuvres and formulate rational and superior tactical strategies in complex battlefield scenarios, showing clear benefits in combat effectiveness and tactical adaptability.
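The sketch below illustrates the kind of hybrid-priority replay buffer the abstract describes: each transition's sampling priority blends its absolute temporal difference error with a reward signal propagated backwards along the episode. The blending weight, the backpropagation scheme and all parameter names here are assumptions for illustration, not the authors' published formulation.

```python
# Hypothetical sketch of a hybrid TD-error / backpropagated-reward priority
# for prioritised experience replay; exact weighting is an assumption.
import numpy as np


class HybridPriorityReplayBuffer:
    """Prioritised replay whose priority mixes |TD error| with a reward
    signal propagated backwards along each episode."""

    def __init__(self, capacity, alpha=0.6, mix=0.5, gamma=0.99):
        self.capacity = capacity
        self.alpha = alpha    # priority exponent, as in standard PER
        self.mix = mix        # assumed blend weight between the two terms
        self.gamma = gamma    # discount used for reward backpropagation
        self.buffer, self.priorities = [], []

    def add_episode(self, transitions, td_errors):
        """transitions: list of (s, a, r, s_next, done); td_errors: |delta| per step."""
        # Backpropagate the discounted return so early, decisive manoeuvres
        # inherit credit from later rewards (assumed scheme).
        back, back_rewards = 0.0, []
        for (_, _, r, _, _) in reversed(transitions):
            back = r + self.gamma * back
            back_rewards.append(back)
        back_rewards.reverse()

        for tr, delta, br in zip(transitions, td_errors, back_rewards):
            p = (self.mix * abs(delta) + (1.0 - self.mix) * abs(br)) ** self.alpha
            if len(self.buffer) >= self.capacity:
                self.buffer.pop(0)
                self.priorities.pop(0)
            self.buffer.append(tr)
            self.priorities.append(p + 1e-6)  # keep every sample reachable

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        weights = (len(self.buffer) * probs[idx]) ** (-beta)  # importance-sampling correction
        return [self.buffer[i] for i in idx], idx, weights / weights.max()
```

In such a scheme the TD-error term keeps rarely seen, surprising transitions in circulation, while the backpropagated-reward term highlights manoeuvres that eventually led to tactically valuable outcomes; the small constant added to each priority preserves sample diversity by keeping every transition sampleable.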

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Royal Aeronautical Society

