
Sim-to-real pipeline for training autonomous obstacle avoidance of underwater robots based on high-fidelity model

Published online by Cambridge University Press: 30 July 2025

Suohang Zhang, Luning Zhang, and Yanhu Chen*
Affiliation: State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou, China
*Corresponding author: Yanhu Chen; Email: 12425012@zju.edu.cn

Abstract

Underwater robots conducting inspections require autonomous obstacle avoidance to operate safely. Reinforcement learning (RL) can effectively develop autonomous obstacle avoidance strategies for underwater robots; however, training in real environments carries significant risk and can easily damage the robot. This paper proposes a sim-to-real pipeline for RL-based training of autonomous obstacle avoidance in underwater robots, addressing the challenges of training and deploying RL obstacle avoidance methods in this setting. Based on the robot's mathematical model, we establish a simulation model and training environment that reduce the gap between simulation and reality in system inputs, modeling, and outputs. Experimental results demonstrate that the high-fidelity simulation system effectively supports the training of autonomous obstacle avoidance policies, achieving a 94% obstacle avoidance success rate and collision-free operation exceeding 5000 steps in the virtual environment. The trained strategy, transferred directly to a real robot, successfully performed obstacle avoidance experiments in a pool, validating the effectiveness of our method for autonomous strategy training and sim-to-real transfer in underwater robots.
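The abstract refers to the robot's mathematical model without reproducing it. For orientation only: high-fidelity underwater simulators are commonly built on the standard marine-craft equations of motion in Fossen's formulation, sketched below as generic background rather than as the authors' exact model.

```latex
% Standard marine-craft dynamics in the body-fixed frame:
%   M    : inertia matrix, including added mass
%   C    : Coriolis and centripetal matrix
%   D    : hydrodynamic damping matrix
%   g    : restoring (gravity/buoyancy) forces
%   \tau : control inputs (thruster forces and moments)
M\dot{\nu} + C(\nu)\,\nu + D(\nu)\,\nu + g(\eta) = \tau,
\qquad \dot{\eta} = J(\eta)\,\nu
```

To make the RL training loop concrete, here is a minimal, self-contained sketch of a Gym-style obstacle avoidance environment wrapping a simplified planar version of that model. Every name, coefficient, reward term, and sensor choice here is an illustrative assumption, not the paper's implementation:

```python
# Minimal sketch (not the authors' code): a Gym-style simulation loop for
# RL obstacle-avoidance training, assuming simplified planar dynamics
# M * nu_dot + D * nu = tau (Coriolis and restoring terms omitted).
import numpy as np

class UnderwaterAvoidanceEnv:
    """Planar AUV among circular obstacles; state = [x, y, psi, u, v, r]."""

    def __init__(self, n_obstacles=5, dt=0.1):
        self.dt = dt
        self.M = np.diag([30.0, 40.0, 8.0])   # assumed inertia incl. added mass
        self.D = np.diag([15.0, 25.0, 5.0])   # assumed linear damping
        self.n_obstacles = n_obstacles
        self.reset()

    def reset(self):
        self.eta = np.zeros(3)                # pose [x, y, psi] in world frame
        self.nu = np.zeros(3)                 # body velocities [u, v, r]
        self.goal = np.array([20.0, 0.0])
        self.obstacles = np.random.uniform([5.0, -5.0], [18.0, 5.0],
                                           size=(self.n_obstacles, 2))
        return self._observe()

    def step(self, tau):
        # Integrate the simplified dynamics one time step.
        nu_dot = np.linalg.solve(self.M, np.asarray(tau, float) - self.D @ self.nu)
        self.nu = self.nu + nu_dot * self.dt
        c, s = np.cos(self.eta[2]), np.sin(self.eta[2])
        J = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # body-to-world
        self.eta = self.eta + J @ self.nu * self.dt

        dist_goal = np.linalg.norm(self.goal - self.eta[:2])
        dist_obs = min(np.linalg.norm(o - self.eta[:2]) for o in self.obstacles)
        collided, reached = dist_obs < 1.0, dist_goal < 1.0
        # Shaped reward: progress toward the goal, large penalty on collision.
        reward = -0.05 * dist_goal - (10.0 if collided else 0.0) \
                 + (10.0 if reached else 0.0)
        return self._observe(), reward, collided or reached

    def _observe(self):
        # Goal offset, body velocities, and relative obstacle positions.
        rel = self.goal - self.eta[:2]
        return np.concatenate([rel, self.nu,
                               (self.obstacles - self.eta[:2]).ravel()])

# Example rollout with a random policy as a stand-in for a trained agent
# (in the paper's setting this would be the learned avoidance strategy):
env = UnderwaterAvoidanceEnv()
obs = env.reset()
for _ in range(200):
    tau = np.random.uniform(-5.0, 5.0, size=3)
    obs, reward, done = env.step(tau)
    if done:
        break
```

In practice, the random placeholder policy would be replaced by a continuous-control RL agent (for example SAC or TD3, standard choices for tasks of this kind, though the abstract does not name the algorithm used), with the observation vector fed directly to the actor network.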

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

