
Sim-to-real pipeline for training autonomous obstacle avoidance of underwater robots based on high-fidelity model

Published online by Cambridge University Press: 30 July 2025

Suohang Zhang, Luning Zhang, and Yanhu Chen*
Affiliation: State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou, China
*Corresponding author: Yanhu Chen; Email: 12425012@zju.edu.cn

Abstract

Underwater robots conducting inspections require autonomous obstacle avoidance to operate safely. Reinforcement learning (RL) can effectively develop autonomous obstacle avoidance strategies for underwater robots; however, training in real environments carries significant risk and can easily damage the robot. This paper proposes a sim-to-real pipeline for RL-based training of autonomous obstacle avoidance in underwater robots, addressing the challenges of training and deploying RL obstacle avoidance methods in this setting. Based on the robot's mathematical model, we establish a simulation model and training environment that reduce the gap between simulation and reality in system inputs, modeling, and outputs. Experimental results demonstrate that the high-fidelity simulation system effectively supports the training of autonomous obstacle avoidance policies, achieving a 94% obstacle avoidance success rate and collision-free operation exceeding 5000 steps in the virtual environment. The trained strategy, transferred directly to a real robot, successfully performed obstacle avoidance experiments in a pool, validating the effectiveness of our method for autonomous strategy training and sim-to-real transfer in underwater robots.
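The abstract refers to the robot's mathematical model without reproducing it. For orientation only: high-fidelity underwater simulators are commonly built on the standard marine-craft equations of motion in Fossen's formulation, sketched below as generic background rather than as the authors' exact model.

```latex
% Standard marine-craft dynamics in the body-fixed frame:
%   M    : inertia matrix, including added mass
%   C    : Coriolis and centripetal matrix
%   D    : hydrodynamic damping matrix
%   g    : restoring (gravity/buoyancy) forces
%   \tau : control inputs (thruster forces and moments)
M\dot{\nu} + C(\nu)\,\nu + D(\nu)\,\nu + g(\eta) = \tau,
\qquad \dot{\eta} = J(\eta)\,\nu
```

To make the RL training loop concrete, here is a minimal, self-contained sketch of a Gym-style obstacle avoidance environment wrapping a simplified planar version of that model. Every name, coefficient, reward term, and sensor choice here is an illustrative assumption, not the paper's implementation:

```python
# Minimal sketch (not the authors' code): a Gym-style simulation loop for
# RL obstacle-avoidance training, assuming simplified planar dynamics
# M * nu_dot + D * nu = tau (Coriolis and restoring terms omitted).
import numpy as np

class UnderwaterAvoidanceEnv:
    """Planar AUV among circular obstacles; state = [x, y, psi, u, v, r]."""

    def __init__(self, n_obstacles=5, dt=0.1):
        self.dt = dt
        self.M = np.diag([30.0, 40.0, 8.0])   # assumed inertia incl. added mass
        self.D = np.diag([15.0, 25.0, 5.0])   # assumed linear damping
        self.n_obstacles = n_obstacles
        self.reset()

    def reset(self):
        self.eta = np.zeros(3)                # pose [x, y, psi] in world frame
        self.nu = np.zeros(3)                 # body velocities [u, v, r]
        self.goal = np.array([20.0, 0.0])
        self.obstacles = np.random.uniform([5.0, -5.0], [18.0, 5.0],
                                           size=(self.n_obstacles, 2))
        return self._observe()

    def step(self, tau):
        # Integrate the simplified dynamics one time step.
        nu_dot = np.linalg.solve(self.M, np.asarray(tau, float) - self.D @ self.nu)
        self.nu = self.nu + nu_dot * self.dt
        c, s = np.cos(self.eta[2]), np.sin(self.eta[2])
        J = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # body-to-world
        self.eta = self.eta + J @ self.nu * self.dt

        dist_goal = np.linalg.norm(self.goal - self.eta[:2])
        dist_obs = min(np.linalg.norm(o - self.eta[:2]) for o in self.obstacles)
        collided, reached = dist_obs < 1.0, dist_goal < 1.0
        # Shaped reward: progress toward the goal, large penalty on collision.
        reward = -0.05 * dist_goal - (10.0 if collided else 0.0) \
                 + (10.0 if reached else 0.0)
        return self._observe(), reward, collided or reached

    def _observe(self):
        # Goal offset, body velocities, and relative obstacle positions.
        rel = self.goal - self.eta[:2]
        return np.concatenate([rel, self.nu,
                               (self.obstacles - self.eta[:2]).ravel()])

# Example rollout with a random policy as a stand-in for a trained agent
# (in the paper's setting this would be the learned avoidance strategy):
env = UnderwaterAvoidanceEnv()
obs = env.reset()
for _ in range(200):
    tau = np.random.uniform(-5.0, 5.0, size=3)
    obs, reward, done = env.step(tau)
    if done:
        break
```

In practice, the random placeholder policy would be replaced by a continuous-control RL agent (for example SAC or TD3, standard choices for tasks of this kind, though the abstract does not name the algorithm used), with the observation vector fed directly to the actor network.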

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

