
One-shot sim-to-real transfer policy for robotic assembly via reinforcement learning with visual demonstration

Published online by Cambridge University Press:  24 January 2024

Ruihong Xiao
Affiliation:
School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Chenguang Yang*
Affiliation:
School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Yiming Jiang
Affiliation:
The National Engineering Research Center for Robot Visual Perception and Control, Hunan University, Changsha, China
Hui Zhang
Affiliation:
The National Engineering Research Center for Robot Visual Perception and Control, Hunan University, Changsha, China
*Corresponding author: Chenguang Yang; Email: cyang@ieee.org

Abstract

Reinforcement learning (RL) has been successfully applied to a wealth of robot manipulation tasks and continuous control problems. However, its industrial application is still limited, as it suffers from three major challenges: sample inefficiency, the cost of collecting real-world data, and the gap between simulation and reality. In this paper, we focus on the practical application of RL to robot assembly in the real world. We apply enlightenment learning to improve proximal policy optimization (PPO), an on-policy, model-free, actor-critic reinforcement learning algorithm, and train an agent in Cartesian space using proprioceptive information. Enlightenment learning is incorporated via pretraining, which reduces the cost of policy training and improves the effectiveness of the resulting policy. A human-like assembly trajectory for pretraining is generated by a two-step method that segments objects by location and aligns them with the iterative closest point (ICP) algorithm. We also design a sim-to-real controller that corrects errors when the policy is transferred to reality. We set up the environment in the MuJoCo simulator and demonstrate the proposed method on the recently established National Institute of Standards and Technology (NIST) gear assembly benchmark. The paper introduces a unique framework that enables a robot to learn assembly tasks efficiently from limited real-world samples by leveraging simulation and visual demonstrations. Comparative experimental results indicate that our approach surpasses baseline methods in terms of training speed, success rate, and efficiency.
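
As a rough illustration of the pose-alignment step mentioned above (segmenting objects by location and registering them with ICP), the following Python sketch aligns a reference part model to a segmented scene point cloud using the Open3D library. The function name, file path, and distance threshold are hypothetical placeholders under assumed inputs, not the authors' implementation.

# Illustrative sketch only: ICP-based pose estimation for a segmented object.
# Assumes the object has already been cropped out of the scene (e.g., by an
# instance-segmentation network); paths and thresholds are placeholders.
import numpy as np
import open3d as o3d

def estimate_object_pose(scene_points, model_path="gear_model.ply"):
    # Wrap the segmented (N, 3) points in an Open3D point cloud.
    scene = o3d.geometry.PointCloud()
    scene.points = o3d.utility.Vector3dVector(np.asarray(scene_points))

    # Load the reference model of the part to be assembled.
    model = o3d.io.read_point_cloud(model_path)

    # Point-to-point ICP refinement from an identity initial guess;
    # the 1 cm correspondence threshold is an arbitrary example value.
    result = o3d.pipelines.registration.registration_icp(
        model, scene, 0.01, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    # 4 x 4 homogeneous transform mapping the model into the scene frame,
    # which could serve as the goal pose for a demonstration trajectory.
    return result.transformation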

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press


Supplementary material

Xiao et al. supplementary material (video, 49.1 MB)