
Multi-objective reward shaping for global and local trajectory planning of wing-in-ground crafts based on deep reinforcement learning

Published online by Cambridge University Press: 14 June 2023

H. Hu
Affiliation:
State Key Laboratory of Structural Analysis for Industrial Equipment, School of Naval Architecture Engineering, Dalian University of Technology, Dalian, China
D. Li
Affiliation:
School of Aeronautic Science and Engineering, Beihang University, Beijing, China
G. Zhang*
Affiliation:
State Key Laboratory of Structural Analysis for Industrial Equipment, School of Naval Architecture Engineering, Dalian University of Technology, Dalian, China; Collaborative Innovation Center for Advanced Ship and Deep-Sea Exploration, Shanghai, China
Z. Zhang
Affiliation:
State Key Laboratory of Structural Analysis for Industrial Equipment, School of Naval Architecture Engineering, Dalian University of Technology, Dalian, China
*Corresponding author: G. Zhang; Email: dutgyzhang@163.com

Abstract

The control of a wing-in-ground craft (WIG) usually has to accommodate several requirements, such as cruising, speed, survival and stealth. Different degrees of emphasis on these requirements result in different trajectories, but no method has yet been available for integrating and quantifying them. Moreover, most previous studies on multi-objective trajectory planning for other vehicles plan the trajectory globally and lack local planning. For the multi-objective trajectory planning of WIGs, this paper proposes a multi-objective function in polynomial form, in which each term represents an independent requirement and is adjusted by a linear or exponential weight. The magnitude of each weight indicates how much relative attention is paid to the corresponding requirement. Trajectories of a virtual WIG model above wave-trough terrain are planned using deep reinforcement learning (DRL) with reward shaping based on the introduced multi-objective function. Two conditions are considered, global and local: in the first, a single scheme of weights is assigned to the whole environment; in the second, two different schemes of weights are assigned to two parts of the environment. The effectiveness of the multi-objective reward function is analysed from both the global and local perspectives. The reward function provides WIGs with a universal framework in which the magnitudes of the weights can be adjusted to meet different degrees of requirements on cruising, speed, stealth and survival, and helps guide WIGs along an expected trajectory in engineering applications.
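As a rough illustration of the weighted multi-objective reward described in the abstract, the following Python sketch sums one term per requirement and switches between two weight schemes by position. It is not the authors' formulation: the names Term, shaped_reward and select_scheme, the normalised scores, the boundary position, and the reading of a "linear weight" as a multiplier and an "exponential weight" as a power on the term are all assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Term:
    """One requirement in the reward (hypothetical structure)."""
    weight: float
    exponential: bool = False  # assumed: False -> weight * score, True -> score ** weight


def shaped_reward(scores: Dict[str, float], scheme: Dict[str, Term]) -> float:
    """Sum one weighted term per requirement (scores assumed normalised to [0, 1])."""
    total = 0.0
    for name, term in scheme.items():
        score = scores[name]
        if term.exponential:
            total += score ** term.weight
        else:
            total += term.weight * score
    return total


def select_scheme(x_position: float, boundary: float,
                  first: Dict[str, Term], second: Dict[str, Term]) -> Dict[str, Term]:
    """Local case: use one weight scheme before the boundary, another after it."""
    return first if x_position < boundary else second


# Hypothetical weight schemes; the values are placeholders, not results from the paper.
survival_first = {
    "cruising": Term(2.0),
    "speed": Term(1.0),
    "stealth": Term(2.0, exponential=True),
    "survival": Term(4.0),
}
speed_first = {
    "cruising": Term(1.0),
    "speed": Term(4.0),
    "stealth": Term(1.0),
    "survival": Term(1.0),
}

scores = {"cruising": 0.5, "speed": 0.8, "stealth": 0.25, "survival": 1.0}

# Global case: a single scheme for the whole environment.
global_reward = shaped_reward(scores, survival_first)

# Local case: the scheme depends on which part of the environment the craft is in.
local_scheme = select_scheme(x_position=120.0, boundary=100.0,
                             first=survival_first, second=speed_first)
local_reward = shaped_reward(scores, local_scheme)
```

In a DRL training loop, a function of this kind would be evaluated at every step to produce the shaped reward. Changing the weight magnitudes shifts the relative emphasis among the requirements without changing the structure of the function.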

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of Royal Aeronautical Society

