Published online by Cambridge University Press: 16 December 2025
Autonomous manoeuvre decision-making is essential for enhancing the survivability and operational effectiveness of unmanned aerial vehicles in high-risk, dynamic air combat. To overcome the limitations of traditional air combat decision-making methods in complex and rapidly changing environments, this paper proposes an autonomous air combat decision-making algorithm based on hybrid temporal difference error-reward prioritised experience replay with twin delayed deep deterministic policy gradient. The algorithm constructs a closed-loop learning system from environmental interaction to policy optimisation, addressing the key challenges of slow convergence and insufficient identification of critical tactical decisions in autonomous air combat. A hybrid priority metric combining reward backpropagation with temporal difference error filtering is introduced to prioritise learning from high-value experiences while balancing sample diversity against the reuse of critical experiences. To reduce excessive trial and error in the early training phase, an integrated reward function combining task rewards with auxiliary guidance rewards is designed via reward shaping to guide the agent's manoeuvre selection. Based on the established three-dimensional close-range air combat game model, simulations are conducted against both basic-manoeuvre and expert-system opponents. The results demonstrate that the proposed autonomous air combat manoeuvre decision-making algorithm achieves higher learning efficiency and more stable convergence: it rapidly identifies high-value manoeuvres and formulates sound, superior tactical strategies in complex battlefield scenarios, offering clear benefits for combat effectiveness and tactical adaptability.
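
To make the hybrid priority metric concrete, the following is a minimal Python sketch of a prioritised replay buffer whose priority blends the absolute temporal difference error with a reward term obtained by backpropagating the discounted episode return to earlier steps. The abstract does not give the exact formulas, so the mixing weight `td_weight`, the discounted-return credit, and all class and parameter names here are illustrative assumptions, not the paper's actual design.

```python
import numpy as np


class HybridReplayBuffer:
    """Sketch of prioritised replay with a hybrid TD-error/reward priority."""

    def __init__(self, capacity, alpha=0.6, td_weight=0.7, gamma=0.99):
        self.capacity = capacity
        self.alpha = alpha            # how strongly priorities skew sampling
        self.td_weight = td_weight    # assumed mix of TD-error vs reward term
        self.gamma = gamma            # decay used when backpropagating reward
        self.buffer = []              # transitions (s, a, r, s_next, done)
        self.td_terms = []            # |TD error| per transition
        self.reward_terms = []        # backpropagated reward credit per step

    def add_episode(self, transitions, td_errors):
        """Store an episode; rewards are backpropagated so that steps leading
        up to a high reward also receive elevated priority."""
        ret, credit = 0.0, []
        for (_, _, r, _, _) in reversed(transitions):
            ret = r + self.gamma * ret       # discounted return at this step
            credit.append(abs(ret))
        credit.reverse()
        for tr, td, rw in zip(transitions, td_errors, credit):
            if len(self.buffer) >= self.capacity:        # FIFO eviction
                self.buffer.pop(0)
                self.td_terms.pop(0)
                self.reward_terms.pop(0)
            self.buffer.append(tr)
            self.td_terms.append(abs(td))
            self.reward_terms.append(rw)

    def _priorities(self):
        td = np.asarray(self.td_terms)
        rw = np.asarray(self.reward_terms)
        # Hybrid priority: weighted mix of the two signals, kept nonzero.
        return self.td_weight * td + (1.0 - self.td_weight) * rw + 1e-6

    def sample(self, batch_size, beta=0.4):
        """Sample proportionally to priority**alpha; return importance
        weights so the TD3 update can correct the sampling bias."""
        pri = self._priorities() ** self.alpha
        probs = pri / pri.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_td(self, idx, new_td_errors):
        """Refresh only the TD-error component after a learning step; the
        reward-backpropagation component is fixed at storage time."""
        for i, td in zip(idx, new_td_errors):
            self.td_terms[i] = abs(td)
```

Keeping the reward credit fixed while re-estimating the TD term lets frequently replayed transitions keep their tactical-value signal even as their TD error shrinks, which is one plausible way to balance sample diversity against reuse of critical experiences.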
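
The integrated reward could take a form like the hedged illustration below: a sparse task reward for engagement outcomes plus dense auxiliary guidance terms that shape early exploration. The specific guidance terms (angle and range advantage), thresholds, and weights are placeholders chosen for the sketch; the paper's actual reward design is not specified in the abstract.

```python
import numpy as np


def shaped_reward(own_pos, own_vel, enemy_pos, outcome=None,
                  w_angle=0.4, w_range=0.2, r_opt=1000.0):
    """Task reward plus auxiliary guidance rewards (illustrative weights)."""
    if outcome == "win":
        return 10.0       # sparse task reward: opponent destroyed
    if outcome == "loss":
        return -10.0      # sparse task reward: own aircraft destroyed

    los = enemy_pos - own_pos                        # line-of-sight vector
    dist = np.linalg.norm(los)
    heading = own_vel / (np.linalg.norm(own_vel) + 1e-8)

    # Angle advantage: 1 when the nose points at the enemy, -1 when tail-on.
    angle_adv = float(np.dot(heading, los / (dist + 1e-8)))

    # Range guidance: peaks at an assumed optimal engagement distance r_opt.
    range_adv = float(np.exp(-((dist - r_opt) / r_opt) ** 2))

    # Dense auxiliary guidance, active on non-terminal steps.
    return w_angle * angle_adv + w_range * range_adv
```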