Multi-objective reward shaping for global and local trajectory planning of wing-in-ground crafts based on deep reinforcement learning

H. Hu; D. Li; G. Zhang; Z. Zhang

doi:10.1017/aer.2023.43

Published online by Cambridge University Press: 14 June 2023

H. Hu: Affiliation:
State Key Laboratory of Structural Analysis for Industrial Equipment, School of Naval Architecture Engineering, Dalian University of Technology, Dalian, China
D. Li: Affiliation:
School of Aeronautic Science and Engineering, Beihang University, Beijing, China
G. Zhang*: Affiliation:
State Key Laboratory of Structural Analysis for Industrial Equipment, School of Naval Architecture Engineering, Dalian University of Technology, Dalian, China; Collaborative Innovation Center for Advanced Ship and Deep-Sea Exploration, Shanghai, China
Z. Zhang: Affiliation:
State Key Laboratory of Structural Analysis for Industrial Equipment, School of Naval Architecture Engineering, Dalian University of Technology, Dalian, China
*: Corresponding author: G. Zhang; Email: dutgyzhang@163.com

Abstract

The control of a wing-in-ground craft (WIG) usually has to accommodate several requirements, such as cruising, speed, survival and stealth. Different degrees of emphasis on these requirements result in different trajectories, but no method yet exists for integrating and quantifying them. Moreover, most previous studies of multi-objective trajectory planning for other vehicles plan globally and lack local planning. For the multi-objective trajectory planning of WIGs, this paper proposes a multi-objective function in polynomial form, in which each term represents an independent requirement and is adjusted by a linear or exponential weight. The magnitude of each weight indicates how much relative attention is paid to the corresponding requirement. Trajectories of a virtual WIG model above wave-trough terrain are planned using deep reinforcement learning (DRL) with reward shaping based on the proposed multi-objective function. Two conditions are considered, global and local: in the first, a single weight scheme is assigned to the whole environment; in the second, two different weight schemes are assigned to two parts of the environment. The effectiveness of the multi-objective reward function is analysed from both the global and local perspectives. The reward function gives WIGs a universal framework in which the magnitudes of the weights can be adjusted to meet different degrees of requirement on cruising, speed, stealth and survival, and helps guide WIGs along an expected trajectory in engineering practice.
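
As a rough illustration of the idea described in the abstract, the following Python sketch combines per-requirement scores into one shaped reward. It is not the authors' formulation, which the abstract does not give. The names `Term`, `shaped_reward` and `scheme_for_position`, the assumption that each score lies in [0, 1], and the reading of an "exponential weight" as an exponent applied to the score are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Literal


@dataclass(frozen=True)
class Term:
    """One requirement in the reward (hypothetical structure)."""
    weight: float
    kind: Literal["linear", "exponential"] = "linear"


def shaped_reward(scores: Dict[str, float], scheme: Dict[str, Term]) -> float:
    """Sum the weighted terms; scores are assumed to lie in [0, 1]."""
    total = 0.0
    for name, term in scheme.items():
        score = scores[name]
        if term.kind == "linear":
            # Linear weight: scales the score directly.
            total += term.weight * score
        elif term.kind == "exponential":
            # Assumed reading of "exponential weight": the weight is an exponent.
            total += score ** term.weight
        else:
            raise ValueError(f"unknown term kind: {term.kind}")
    return total


def scheme_for_position(
    x: float,
    boundary: float,
    first: Dict[str, Term],
    second: Dict[str, Term],
) -> Dict[str, Term]:
    """Local case: choose a weight scheme by which part of the environment x is in."""
    return first if x < boundary else second
```

In the global case a single scheme would be passed to `shaped_reward` at every step; in the local case `scheme_for_position` would select between two schemes before the reward is computed. The split by a single boundary coordinate is itself an assumption about how the environment is divided.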

Keywords

Wing-in-ground craft; Trajectory planning; Multi-objective function; Deep reinforcement learning; Reward shaping

Information

Type: Research Article
Information: The Aeronautical Journal, Volume 128, Issue 1320, February 2024, pp. 371–397

DOI: https://doi.org/10.1017/aer.2023.43
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Royal Aeronautical Society
