
Robot autonomous learning system based on multi-source sensor information fusion

Published online by Cambridge University Press:  18 February 2026

Ze Cui*
Affiliation:
College of Mechanical and Electrical Engineering and Automation, Shanghai University, China
Chenzhao Sun
Affiliation:
College of Mechanical and Electrical Engineering and Automation, Shanghai University, China
Zenghao Chen
Affiliation:
Shanghai Aerospace Control Technology Institute, Shanghai Engineering Research C, China
Jing’ao Huang
Affiliation:
College of Mechanical and Electrical Engineering and Automation, Shanghai University, China
Yang Zhang
Affiliation:
College of Mechanical and Electrical Engineering and Automation, Shanghai University, China
*
Corresponding author: Ze Cui; Email: cuize0421@126.com

Abstract

With the rapid development of artificial intelligence technology, robotics, as one of its core branches, has attracted extensive attention from researchers. This paper presents a robotic arm learning system based on multi-source sensor information fusion, in which deep reinforcement learning (DRL) serves as the core framework for skill acquisition and autonomous learning. By incorporating imitation learning as a source of expert priors and leveraging DRL's intrinsic ability to optimize policies through environmental exploration, the proposed system achieves both rapid learning and robust generalization. Specifically, we introduce the gradient penalty mechanism from Wasserstein generative adversarial networks (WGANs), a technique that stabilizes adversarial training by penalizing critic gradients whose norm deviates from a specified value. This mechanism is incorporated into the soft actor-critic (SAC) algorithm, a widely used off-policy DRL method known for its sample efficiency and robust performance in continuous control tasks. The resulting SAC-GP (SAC with gradient penalty) algorithm combines SAC's stable policy learning with WGAN's training regularization, yielding faster convergence and greater system stability. Furthermore, this paper proposes a hybrid learning framework that combines generative adversarial imitation learning (GAIL) with SAC-GP, enabling the agent to benefit from both demonstration-based policy initialization and continuous self-improvement via reinforcement learning. Finally, a door-opening experiment verifies the learning and execution capabilities of the system in both virtual and real environments. Experimental results demonstrate that the proposed system possesses excellent learning and motion execution abilities in practical applications. This achievement not only provides new insights for robot learning research but also lays a solid foundation for the future development of robotic technology.
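The abstract's key ingredient is the WGAN gradient penalty: a term added to the critic's loss that penalizes the norm of the critic's input gradient, evaluated at points interpolated between real (expert) and generated (policy) samples, for deviating from 1. The paper's SAC-GP implementation is not reproduced here; the following is a minimal NumPy sketch of the penalty term alone, using a hypothetical linear critic D(x) = w·x so that the input gradient is analytic (it is simply w) and no autodiff framework is needed.

```python
import numpy as np

def gradient_penalty(w, real, fake, lam=10.0, seed=0):
    """lam * E[(||grad_x D(x_hat)|| - 1)^2] for a linear critic D(x) = w.x.

    x_hat are samples interpolated between real and fake batches, as in
    WGAN-GP. For this toy linear critic, grad_x D(x) = w at every point,
    so the penalty depends only on ||w||; a real implementation would
    obtain the gradient via automatic differentiation.
    """
    rng = np.random.default_rng(seed)
    eps = rng.uniform(size=(real.shape[0], 1))   # per-sample mixing weights
    x_hat = eps * real + (1.0 - eps) * fake      # interpolated samples
    grads = np.tile(w, (x_hat.shape[0], 1))      # analytic gradient of w.x
    norms = np.linalg.norm(grads, axis=1)        # per-sample gradient norms
    return lam * np.mean((norms - 1.0) ** 2)

# Toy batches of "expert" and "policy" samples (illustrative values only).
real = np.array([[1.0, 0.0], [0.0, 1.0]])
fake = np.array([[0.5, 0.5], [0.2, 0.8]])

w_unit = np.array([0.6, 0.8])   # ||w|| = 1 -> zero penalty
w_big = np.array([3.0, 4.0])    # ||w|| = 5 -> penalty 10 * (5 - 1)^2 = 160
print(gradient_penalty(w_unit, real, fake))  # 0.0
print(gradient_penalty(w_big, real, fake))   # 160.0
```

In SAC-GP as described, this penalty would regularize the adversarial (GAIL-style) discriminator during training rather than replace SAC's own critic losses; the function name, batch values, and λ = 10 default above are illustrative assumptions, though λ = 10 is the value commonly used in the WGAN-GP literature.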

Information

Type
Research Article
Copyright
© The Author(s), 2026. Published by Cambridge University Press
