
One-shot sim-to-real transfer policy for robotic assembly via reinforcement learning with visual demonstration

Published online by Cambridge University Press:  24 January 2024

Ruihong Xiao
Affiliation:
School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Chenguang Yang*
Affiliation:
School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Yiming Jiang
Affiliation:
The National Engineering Research Center for Robot Visual Perception and Control, Hunan University, Changsha, China
Hui Zhang
Affiliation:
The National Engineering Research Center for Robot Visual Perception and Control, Hunan University, Changsha, China
*Corresponding author: Chenguang Yang; Email: cyang@ieee.org

Abstract

Reinforcement learning (RL) has been successfully applied to a wealth of robot manipulation tasks and continuous control problems. However, its industrial application is still limited, as it suffers from three major challenges: sample inefficiency, the cost of collecting real-world data, and the gap between simulation and reality. In this paper, we focus on the practical application of RL to robot assembly in the real world. We apply enlightenment learning to improve proximal policy optimization (PPO), an on-policy, model-free, actor-critic reinforcement learning algorithm, and train an agent in Cartesian space using proprioceptive information. Enlightenment learning is incorporated via pretraining, which reduces the cost of policy training and improves the effectiveness of the resulting policy. A human-like assembly trajectory for pretraining is generated by a two-step method that segments objects by location and aligns them with the iterative closest point (ICP) algorithm. We also design a sim-to-real controller that corrects errors when the policy is transferred to reality. We set up the environment in the MuJoCo simulator and demonstrate the proposed method on the recently established National Institute of Standards and Technology (NIST) gear assembly benchmark. The paper introduces a unique framework that enables a robot to learn assembly tasks efficiently from limited real-world samples by leveraging simulation and visual demonstrations. Comparative experimental results indicate that our approach surpasses baseline methods in terms of training speed, success rate, and efficiency.
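
As a rough illustration of the pose-alignment step mentioned above (segmenting objects by location and registering them with ICP), the following Python sketch aligns a reference part model to a segmented scene point cloud using the Open3D library. The function name, file path, and distance threshold are hypothetical placeholders under assumed inputs, not the authors' implementation.

# Illustrative sketch only: ICP-based pose estimation for a segmented object.
# Assumes the object has already been cropped out of the scene (e.g., by an
# instance-segmentation network); paths and thresholds are placeholders.
import numpy as np
import open3d as o3d

def estimate_object_pose(scene_points, model_path="gear_model.ply"):
    # Wrap the segmented (N, 3) points in an Open3D point cloud.
    scene = o3d.geometry.PointCloud()
    scene.points = o3d.utility.Vector3dVector(np.asarray(scene_points))

    # Load the reference model of the part to be assembled.
    model = o3d.io.read_point_cloud(model_path)

    # Point-to-point ICP refinement from an identity initial guess;
    # the 1 cm correspondence threshold is an arbitrary example value.
    result = o3d.pipelines.registration.registration_icp(
        model, scene, 0.01, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    # 4 x 4 homogeneous transform mapping the model into the scene frame,
    # which could serve as the goal pose for a demonstration trajectory.
    return result.transformation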

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press


Supplementary material

Xiao et al. supplementary material (video, 49.1 MB)