
Learning vision-based robotic manipulation tasks sequentially in offline reinforcement learning settings

Published online by Cambridge University Press: 02 May 2024

Sudhir Pratap Yadav*
Affiliation:
iHub Drishti Foundation, Jodhpur, India
Rajendra Nagar
Affiliation:
Department of Electrical Engineering, IIT Jodhpur, Jodhpur, India
Suril V. Shah
Affiliation:
Department of Mechanical Engineering, IIT Jodhpur, Jodhpur, India
*Corresponding author: Sudhir Pratap Yadav; Email: yadav.1@iitj.ac.in

Abstract

With the rise of deep reinforcement learning (RL) methods, many complex robotic manipulation tasks are being solved. However, harnessing the full power of deep learning requires large datasets. Online RL does not fit readily into this paradigm because agent-environment interaction is costly and time-consuming. Therefore, many offline RL algorithms have recently been proposed to learn robotic tasks. However, most such methods focus on single-task or multitask learning, which requires retraining whenever a new task must be learned. Continually learning tasks without forgetting previous knowledge, combined with the power of offline deep RL, would allow us to scale the number of tasks by adding them one after another. This paper investigates the effectiveness of regularisation-based methods such as synaptic intelligence for sequentially learning image-based robotic manipulation tasks in an offline RL setup. We evaluate the performance of this combined framework against two common challenges of sequential learning: catastrophic forgetting and forward knowledge transfer. We performed experiments with different task combinations to analyse the effect of task ordering. We also investigated the effect of the number of object configurations and the density of robot trajectories. We found that learning tasks sequentially helps retain knowledge from previous tasks, thereby reducing the time required to learn a new task. Regularisation-based approaches for continual learning, such as synaptic intelligence, help mitigate catastrophic forgetting but showed only limited transfer of knowledge from previous tasks.
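
For readers unfamiliar with synaptic intelligence, the following is a minimal NumPy sketch of the regulariser in its general published form (a per-parameter importance accumulated along the training path, then a quadratic penalty anchoring important parameters). It is not the authors' implementation: the class name, the hyperparameter values `c` and `xi`, the plain gradient-descent loop, and the toy one-parameter loss are all assumptions made for illustration, and the offline RL loss used in the paper is replaced here by a simple quadratic.

```python
import numpy as np


class SynapticIntelligence:
    """Illustrative synaptic-intelligence regulariser (hypothetical helper)."""

    def __init__(self, n_params, c=0.1, xi=1e-3):
        self.c = c                                # penalty strength (assumed value)
        self.xi = xi                              # damping term (assumed value)
        self.path_integral = np.zeros(n_params)   # per-task running contribution
        self.importance = np.zeros(n_params)      # consolidated importance
        self.anchor = None                        # parameters after the last task
        self.task_start = None

    def begin_task(self, theta):
        self.task_start = theta.copy()
        self.path_integral[:] = 0.0

    def accumulate(self, task_grad, delta_theta):
        # Credit each parameter with its share of the drop in the task loss.
        self.path_integral += -task_grad * delta_theta

    def end_task(self, theta):
        total_change = theta - self.task_start
        self.importance += self.path_integral / (total_change ** 2 + self.xi)
        self.anchor = theta.copy()

    def penalty(self, theta):
        if self.anchor is None:
            return 0.0
        return float(self.c * np.sum(self.importance * (theta - self.anchor) ** 2))

    def penalty_grad(self, theta):
        if self.anchor is None:
            return np.zeros_like(theta)
        return 2.0 * self.c * self.importance * (theta - self.anchor)


# Toy first task: only parameter 0 affects the loss 0.5 * (theta[0] - 1)^2.
si = SynapticIntelligence(n_params=2)
theta = np.zeros(2)
si.begin_task(theta)
learning_rate = 0.1
for _ in range(100):
    grad = np.array([theta[0] - 1.0, 0.0])
    new_theta = theta - learning_rate * grad
    si.accumulate(grad, new_theta - theta)
    theta = new_theta
si.end_task(theta)

# Moving the parameter that mattered is penalised; moving the unused one is not.
cost_used = si.penalty(theta + np.array([0.5, 0.0]))
cost_unused = si.penalty(theta + np.array([0.0, 0.5]))
```

When a second task is trained, `penalty_grad` would be added to that task's gradient, so parameters that were important for the first task are held near their anchored values while unused parameters remain free to change.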

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press

