
Robot autonomous learning system based on multi-source sensor information fusion

Published online by Cambridge University Press:  18 February 2026

Ze Cui*
Affiliation:
College of Mechanical and Electrical Engineering and Automation, Shanghai University, China
Chenzhao Sun
Affiliation:
College of Mechanical and Electrical Engineering and Automation, Shanghai University, China
Zenghao Chen
Affiliation:
Shanghai Aerospace Control Technology Institute, Shanghai Engineering Research C, China
Jing’ao Huang
Affiliation:
College of Mechanical and Electrical Engineering and Automation, Shanghai University, China
Yang Zhang
Affiliation:
College of Mechanical and Electrical Engineering and Automation, Shanghai University, China
*
Corresponding author: Ze Cui; Email: cuize0421@126.com

Abstract

With the rapid development of artificial intelligence technology, robotics, as one of its core branches, has attracted extensive attention from researchers. This paper presents a robotic arm learning system based on multi-source sensor information fusion, in which deep reinforcement learning (DRL) serves as the core framework for skill acquisition and autonomous learning. By incorporating imitation learning as a source of expert priors and leveraging DRL's intrinsic ability to optimize policies through environmental exploration, the proposed system achieves both rapid learning and robust generalization. Specifically, we introduce the gradient penalty mechanism from Wasserstein generative adversarial networks (WGANs), a technique that stabilizes adversarial training by penalizing critic gradients whose norm deviates from a specified value. This mechanism is incorporated into the soft actor-critic (SAC) algorithm, a widely used off-policy DRL method known for its sample efficiency and robust performance in continuous control tasks. The resulting SAC-GP (SAC with gradient penalty) algorithm combines SAC's stable policy learning with WGAN's training regularization, yielding faster convergence and greater system stability. Furthermore, this paper proposes a hybrid learning framework that combines generative adversarial imitation learning (GAIL) with SAC-GP, enabling the agent to benefit from both demonstration-based policy initialization and continuous self-improvement via reinforcement learning. Finally, a door-opening experiment verifies the learning and execution capabilities of the system in both virtual and real environments. Experimental results demonstrate that the proposed system possesses excellent learning and motion execution abilities in practical applications. This achievement not only provides new insights for robot learning research but also lays a solid foundation for the future development of robotic technology.
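The abstract's key ingredient is the WGAN gradient penalty: a term added to the critic's loss that penalizes the norm of the critic's input gradient, evaluated at points interpolated between real (expert) and generated (policy) samples, for deviating from 1. The paper's SAC-GP implementation is not reproduced here; the following is a minimal NumPy sketch of the penalty term alone, using a hypothetical linear critic D(x) = w·x so that the input gradient is analytic (it is simply w) and no autodiff framework is needed.

```python
import numpy as np

def gradient_penalty(w, real, fake, lam=10.0, seed=0):
    """lam * E[(||grad_x D(x_hat)|| - 1)^2] for a linear critic D(x) = w.x.

    x_hat are samples interpolated between real and fake batches, as in
    WGAN-GP. For this toy linear critic, grad_x D(x) = w at every point,
    so the penalty depends only on ||w||; a real implementation would
    obtain the gradient via automatic differentiation.
    """
    rng = np.random.default_rng(seed)
    eps = rng.uniform(size=(real.shape[0], 1))   # per-sample mixing weights
    x_hat = eps * real + (1.0 - eps) * fake      # interpolated samples
    grads = np.tile(w, (x_hat.shape[0], 1))      # analytic gradient of w.x
    norms = np.linalg.norm(grads, axis=1)        # per-sample gradient norms
    return lam * np.mean((norms - 1.0) ** 2)

# Toy batches of "expert" and "policy" samples (illustrative values only).
real = np.array([[1.0, 0.0], [0.0, 1.0]])
fake = np.array([[0.5, 0.5], [0.2, 0.8]])

w_unit = np.array([0.6, 0.8])   # ||w|| = 1 -> zero penalty
w_big = np.array([3.0, 4.0])    # ||w|| = 5 -> penalty 10 * (5 - 1)^2 = 160
print(gradient_penalty(w_unit, real, fake))  # 0.0
print(gradient_penalty(w_big, real, fake))   # 160.0
```

In SAC-GP as described, this penalty would regularize the adversarial (GAIL-style) discriminator during training rather than replace SAC's own critic losses; the function name, batch values, and λ = 10 default above are illustrative assumptions, though λ = 10 is the value commonly used in the WGAN-GP literature.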

Information

Type
Research Article
Copyright
© The Author(s), 2026. Published by Cambridge University Press
