
Image segmentation-driven sim-to-real deep reinforcement learning framework for accurate peg-in-hole assembly

Published online by Cambridge University Press:  18 July 2025

Ning Zhang
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Yongjia Zhao*
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China Jiangxi Research Institute, Beihang University, Nanchang, Jiangxi, China
Minghao Yang
Affiliation:
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Shuling Dai
Affiliation:
China’s State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Corresponding author: Yongjia Zhao; Email: zhaoyongjia@buaa.edu.cn

Abstract

The automation of assembly operations with industrial robots is pivotal in modern manufacturing, particularly for multispecies, low-volume, and customized production. Traditional programming methods are time-consuming and lack adaptability to complex, variable environments. Reinforcement learning-based assembly tasks have shown success in simulation environments but face challenges such as the simulation-to-reality gap and safety concerns when transferred to real-world applications. This article addresses these challenges by proposing a low-cost, image-segmentation-driven deep reinforcement learning strategy tailored for insertion tasks, such as the assembly of peg-in-hole components in satellite manufacturing, which involve extensive contact interactions. Our approach integrates visual and force feedback into a prior dueling deep Q-network for insertion skill learning, enabling precise alignment of components. To bridge the simulation-to-reality gap, we transform the raw image input space into a canonical space based on image segmentation. Specifically, we employ a segmentation model based on U-net, pretrained in simulation and fine-tuned with real-world data, significantly reducing the need for labor-intensive real-image segmentation labels. To handle the frequent contact inherent in peg-in-hole tasks, we integrate safety protections and impedance control into the training process, providing active compliance and reducing the risk of assembly failures. Our approach was evaluated in both simulated and real robotic environments, demonstrating robust performance under camera position errors, varying ambient light intensities, and different lighting colors. Finally, the algorithm was validated in a real satellite assembly scenario, achieving a success rate of 15 out of 20 tests.
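The abstract's dueling deep Q-network refers to the standard dueling architecture, in which the Q-value is decomposed into a state-value stream V(s) and an advantage stream A(s, a), recombined as Q(s, a) = V(s) + A(s, a) − mean_a A(s, a). The article itself does not provide code; the following is a minimal numpy sketch of that aggregation step only (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine the value and advantage streams of a dueling Q-network.

    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    Subtracting the mean advantage makes the decomposition identifiable:
    shifting all advantages by a constant leaves Q unchanged.
    """
    advantages = np.asarray(advantages, dtype=float)
    return float(value) + advantages - advantages.mean()

# Example: one state value and three discrete insertion actions.
q_values = dueling_q(1.0, [0.5, -0.5, 0.0])  # -> [1.5, 0.5, 1.0]
```

In the full method described by the abstract, the two streams would be produced by network heads operating on segmentation-canonicalized images and force readings; the aggregation itself is the same arithmetic shown here.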

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press
