Learning vision-based robotic manipulation tasks sequentially in offline reinforcement learning settings

Sudhir Pratap Yadav; Rajendra Nagar; Suril V. Shah

doi:10.1017/S0263574724000389

Published online by Cambridge University Press: 02 May 2024

Sudhir Pratap Yadav*: iHub Drishti Foundation, Jodhpur, India
Rajendra Nagar: Department of Electrical Engineering, IIT Jodhpur, Jodhpur, India
Suril V. Shah: Department of Mechanical Engineering, IIT Jodhpur, Jodhpur, India
*Corresponding author: Sudhir Pratap Yadav; Email: yadav.1@iitj.ac.in

Abstract

With the rise of deep reinforcement learning (RL) methods, many complex robotic manipulation tasks are being solved. However, harnessing the full power of deep learning requires large datasets. Online RL does not fit readily into this paradigm because agent-environment interaction is costly and time-consuming. Many offline RL algorithms have therefore been proposed recently to learn robotic tasks. However, most of these methods focus on single-task or multitask learning, which requires retraining whenever a new task must be learned. Continually learning tasks without forgetting previous knowledge, combined with the power of offline deep RL, would allow us to scale the number of tasks by adding them one after another. This paper investigates the effectiveness of regularisation-based methods such as synaptic intelligence for sequentially learning image-based robotic manipulation tasks in an offline RL setup. We evaluate the performance of this combined framework against two common challenges of sequential learning: catastrophic forgetting and forward knowledge transfer. We performed experiments with different task combinations to analyse the effect of task ordering. We also investigated the effect of the number of object configurations and the density of robot trajectories. We found that learning tasks sequentially helps retain knowledge from previous tasks, thereby reducing the time required to learn a new task. Regularisation-based approaches to continual learning, such as synaptic intelligence, help mitigate catastrophic forgetting but showed only limited transfer of knowledge from previous tasks.
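
As a rough illustration of the kind of regularisation the abstract refers to, the sketch below shows the general shape of a synaptic intelligence penalty on a toy problem. It is not the authors' implementation: the class name, the helper `train_task`, the quadratic toy tasks, and the `strength` and `damping` values are all assumptions made for readability. The importance update follows the general form described by Zenke et al.: each parameter accumulates the product of the negative task gradient and the parameter change, which is then normalised by the squared total change over the task.

```python
class SynapticIntelligence:
    """Toy per-parameter importance tracker (illustrative names, not from the paper)."""

    def __init__(self, n_params, strength=1.0, damping=1e-3):
        self.strength = strength              # assumed weight of the penalty term
        self.damping = damping                # assumed constant avoiding division by zero
        self.importance = [0.0] * n_params    # consolidated importance per parameter
        self.anchor = [0.0] * n_params        # parameters at the end of the last task
        self.task_start = [0.0] * n_params    # parameters at the start of the current task
        self.path_integral = [0.0] * n_params # running sum of -grad * parameter change

    def begin_task(self, params):
        self.task_start = list(params)
        self.path_integral = [0.0] * len(params)

    def accumulate(self, grads, old_params, new_params):
        # Credit each parameter with its contribution to the drop in task loss.
        for k, g in enumerate(grads):
            self.path_integral[k] -= g * (new_params[k] - old_params[k])

    def end_task(self, params):
        # Normalise by the squared total movement over the task, then consolidate.
        for k, p in enumerate(params):
            delta = p - self.task_start[k]
            self.importance[k] += self.path_integral[k] / (delta ** 2 + self.damping)
        self.anchor = list(params)

    def penalty(self, params):
        # Quadratic cost for moving important parameters away from the anchor.
        return self.strength * sum(
            w * (p - a) ** 2
            for w, p, a in zip(self.importance, params, self.anchor)
        )

    def penalty_grad(self, params):
        return [
            2.0 * self.strength * w * (p - a)
            for w, p, a in zip(self.importance, params, self.anchor)
        ]


def train_task(si, params, target, lr=0.1, steps=50):
    """Gradient descent on the toy loss 0.5 * sum((p - t)^2), plus the penalty."""
    params = list(params)
    si.begin_task(params)
    for _ in range(steps):
        task_grads = [p - t for p, t in zip(params, target)]
        reg_grads = si.penalty_grad(params)
        new_params = [
            p - lr * (g + r) for p, g, r in zip(params, task_grads, reg_grads)
        ]
        # Only the task-loss gradient feeds the importance estimate.
        si.accumulate(task_grads, params, new_params)
        params = new_params
    si.end_task(params)
    return params


if __name__ == "__main__":
    si = SynapticIntelligence(n_params=2)
    after_first = train_task(si, [0.0, 0.0], target=[1.0, 0.0])
    after_second = train_task(si, after_first, target=[0.0, 1.0])
    print("after task 1:", after_first)
    print("after task 2:", after_second)
```

In this toy run the first parameter matters for the first task, so the penalty holds it partway towards its old value while the second task is learned, whereas the second parameter, which the first task never used, is free to move. That is the trade-off the abstract describes: forgetting is reduced, but nothing in the penalty itself actively transfers knowledge forward.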

Keywords

sequential task learning; offline deep reinforcement learning; grasping; computer vision; vision-based manipulation

Type: Research Article
Information: Robotica, First View, pp. 1-16

DOI: https://doi.org/10.1017/S0263574724000389
Copyright: © The Author(s), 2024. Published by Cambridge University Press
