
A multi-modal learning method for pick-and-place task based on human demonstration

Published online by Cambridge University Press:  12 December 2024

Diqing Yu (Zhejiang University of Technology, Hangzhou, China)
Xinggang Fan (Zhejiang University of Technology, Hangzhou, China; Shenzhen Academy of Robotics, Shenzhen, China)
Yaonan Li* (Shenzhen Academy of Robotics, Shenzhen, China)
Heping Chen (Shenzhen Academy of Robotics, Shenzhen, China)
Han Li (Zhejiang University of Technology, Hangzhou, China)
Yuao Jin (Shenzhen Academy of Robotics, Shenzhen, China)
Corresponding author: Yaonan Li; Email: ynli@szarobots.com

Abstract

Robot pick-and-place of unknown objects remains a challenging research topic. This paper proposes a multi-modal learning method for one-shot robot imitation of pick-and-place tasks, aiming to enhance the generality of industrial robots while reducing the data volume and training cost that one-shot imitation methods typically require. The method first classifies human demonstration videos into tasks, grouped into six types chosen to cover as broad a range of pick-and-place operations as possible. It then generates multi-modal prompts and finally predicts the robot's actions to complete the symbolized pick-and-place task in industrial production. A carefully curated dataset complements the method, consisting of human demonstration videos and instance images drawn from real-world scenes and industrial tasks, which supports adaptable and efficient learning. Experimental results show favorable success rates and loss values in both simulation and real-world experiments, confirming the method's effectiveness and practicality.
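The three-stage pipeline the abstract describes (classify the demonstration video, build a multi-modal prompt, predict the robot action) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the six task labels, the stand-in classifier, and the symbolic action format are all assumptions, since the abstract does not specify the models or interfaces used.

```python
# Hypothetical sketch of the pipeline outlined in the abstract:
# (1) classify a human demonstration video into one of six task types,
# (2) assemble a multi-modal prompt from the video and instance images,
# (3) predict a symbolic pick-and-place action from the prompt.
# All names and data structures are illustrative assumptions.
from dataclasses import dataclass

# Assumed set of six task types; the paper's actual categories may differ.
TASK_TYPES = ["pick_place", "stack", "insert", "sort", "align", "rotate"]

@dataclass
class Prompt:
    """A multi-modal prompt pairing a task label with visual inputs."""
    task_type: str
    video_id: str
    instance_images: list

def classify_demo(video_id: str) -> str:
    """Stand-in classifier: a real system would run a learned video model."""
    return TASK_TYPES[len(video_id) % len(TASK_TYPES)]

def build_prompt(video_id: str, images: list) -> Prompt:
    """Combine the inferred task type with the instance images."""
    return Prompt(classify_demo(video_id), video_id, images)

def predict_action(prompt: Prompt) -> dict:
    """Stand-in policy: emit a symbolic pick-and-place command."""
    return {
        "task": prompt.task_type,
        "pick": prompt.instance_images[0],
        "place": "target_zone",
    }

action = predict_action(build_prompt("demo_001", ["cup.png"]))
```

The split into classification, prompt construction, and action prediction mirrors the ordering stated in the abstract; in practice each stage would be a learned model rather than the stub functions shown here.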

Information

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press
