
Image segmentation-driven sim-to-real deep reinforcement learning framework for accurate peg-in-hole assembly

Published online by Cambridge University Press:  18 July 2025

Ning Zhang
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Yongjia Zhao*
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China; Jiangxi Research Institute, Beihang University, Nanchang, Jiangxi, China
Minghao Yang
Affiliation:
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Shuling Dai
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Corresponding author: Yongjia Zhao; Email: zhaoyongjia@buaa.edu.cn

Abstract

The automation of assembly operations with industrial robots is pivotal in modern manufacturing, particularly for multispecies, low-volume, and customized production. Traditional programming methods are time-consuming and lack adaptability to complex, variable environments. Reinforcement learning-based assembly has succeeded in simulation environments but faces challenges such as the simulation-to-reality gap and safety concerns when transferred to real-world applications. This article addresses these challenges by proposing a low-cost, image-segmentation-driven deep reinforcement learning strategy tailored for insertion tasks, such as the assembly of peg-in-hole components in satellite manufacturing, which involve extensive contact interactions. Our approach integrates visual and force feedback into a prior dueling deep Q-network for insertion skill learning, enabling precise alignment of components. To bridge the simulation-to-reality gap, we transform the raw image input space into a canonical space based on image segmentation. Specifically, we employ a U-net-based segmentation model, pretrained in simulation and fine-tuned with real-world data, significantly reducing the need for labor-intensive real-image segmentation labels. To handle the frequent contact inherent in peg-in-hole tasks, we integrate safety protections and impedance control into the training process, providing active compliance and reducing the risk of assembly failures. Our approach was evaluated in both simulated and real robotic environments, demonstrating robust performance under camera position errors, varying ambient light intensities, and different lighting colors. Finally, the algorithm was validated in a real satellite assembly scenario, achieving a success rate of 15 out of 20 trials.
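The dueling deep Q-network mentioned in the abstract splits value estimation into a state-value stream and an action-advantage stream before recombining them into Q-values. The sketch below shows only that standard aggregation step (Wang et al., 2016), with illustrative numbers; it is not the authors' implementation, which additionally conditions on segmented images and force feedback.

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine the two dueling-network streams into Q-values.

    Uses the identifiable aggregation
        Q(s, a) = V(s) + A(s, a) - mean_a A(s, a),
    so that the value and advantage streams have unique roles.
    """
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Hypothetical example: one state-value estimate and advantages
# for three discrete insertion actions.
q_values = dueling_q(2.0, [1.0, -1.0, 0.0])  # -> [3.0, 1.0, 2.0]
best_action = int(np.argmax(q_values))       # -> 0
```

Subtracting the mean advantage removes the ambiguity of adding a constant to V(s) while subtracting it from every A(s, a), which stabilizes learning in practice.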

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press

