
Image segmentation-driven sim-to-real deep reinforcement learning framework for accurate peg-in-hole assembly

Published online by Cambridge University Press:  18 July 2025

Ning Zhang
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Yongjia Zhao*
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China; Jiangxi Research Institute, Beihang University, Nanchang, Jiangxi, China
Minghao Yang
Affiliation:
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Shuling Dai
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Corresponding author: Yongjia Zhao; Email: zhaoyongjia@buaa.edu.cn

Abstract

The automation of assembly operations with industrial robots is pivotal in modern manufacturing, particularly for multispecies, low-volume, and customized production. Traditional programming methods are time-consuming and lack adaptability to complex, variable environments. Reinforcement learning-based assembly has succeeded in simulation environments but faces challenges such as the simulation-to-reality gap and safety concerns when transferred to real-world applications. This article addresses these challenges by proposing a low-cost, image-segmentation-driven deep reinforcement learning strategy tailored for insertion tasks, such as the assembly of peg-in-hole components in satellite manufacturing, which involve extensive contact interactions. Our approach integrates visual and force feedback into a prior dueling deep Q-network for insertion skill learning, enabling precise alignment of components. To bridge the simulation-to-reality gap, we transform the raw image input space into a canonical space based on image segmentation. Specifically, we employ a U-net-based segmentation model, pretrained in simulation and fine-tuned with real-world data, significantly reducing the need for labor-intensive real-image segmentation labels. To handle the frequent contact inherent in peg-in-hole tasks, we integrate safety protections and impedance control into the training process, providing active compliance and reducing the risk of assembly failures. Our approach was evaluated in both simulated and real robotic environments, demonstrating robust performance under camera position errors, varying ambient light intensities, and different lighting colors. Finally, the algorithm was validated in a real satellite assembly scenario, achieving a success rate of 15 out of 20 trials.
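The dueling deep Q-network mentioned in the abstract splits value estimation into a state-value stream and an action-advantage stream before recombining them into Q-values. The sketch below shows only that standard aggregation step (Wang et al., 2016), with illustrative numbers; it is not the authors' implementation, which additionally conditions on segmented images and force feedback.

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine the two dueling-network streams into Q-values.

    Uses the identifiable aggregation
        Q(s, a) = V(s) + A(s, a) - mean_a A(s, a),
    so that the value and advantage streams have unique roles.
    """
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Hypothetical example: one state-value estimate and advantages
# for three discrete insertion actions.
q_values = dueling_q(2.0, [1.0, -1.0, 0.0])  # -> [3.0, 1.0, 2.0]
best_action = int(np.argmax(q_values))       # -> 0
```

Subtracting the mean advantage removes the ambiguity of adding a constant to V(s) while subtracting it from every A(s, a), which stabilizes learning in practice.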

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press

