
A hierarchical deep reinforcement learning algorithm for typing with a dual-arm humanoid robot

Published online by Cambridge University Press:  20 November 2024

Jacky Baltes
Affiliation:
Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan
Hanjaya Mandala
Affiliation:
Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan
Saeed Saeedvand*
Affiliation:
Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan
*
Corresponding author: Saeed Saeedvand; Email: saeedvand@ntnu.edu.tw

Abstract

The field of robotics development and control has been advancing rapidly. Yet even though humans effortlessly manipulate everyday objects, enabling robots to interact with human-made objects in real-world environments remains a challenge despite years of dedicated research. Typing on a keyboard, for example, requires adapting to varying external conditions, such as the size and position of the keyboard, and demands high accuracy before a robot can use it properly. This paper introduces a novel hierarchical reinforcement learning algorithm based on the Deep Deterministic Policy Gradient (DDPG) algorithm to address the dual-arm robot typing problem. The proposed algorithm employs a Convolutional Auto-Encoder (CAE) in the first stage to handle the complexities of continuous state and action spaces, after which a DDPG algorithm serves as the strategy controller for the typing task. Using a dual-arm humanoid robot, we extensively evaluated the proposed algorithm in simulation and in real-world experiments. The results demonstrate the high efficiency of our approach, with an average success rate of 96.14% in simulation and 92.2% in real-world settings. Furthermore, we show that the proposed algorithm outperforms DDPG and Deep Q-Learning, two algorithms frequently employed in robotic applications.
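The abstract describes a two-stage pipeline: a Convolutional Auto-Encoder compresses the raw observation into a compact latent state, and a DDPG actor maps that latent state to a continuous arm action. A minimal sketch of that data flow is below; it is illustrative only, and every name and dimension (`Encoder`, `Actor`, the 32x32 frame, the 16-dimensional latent, the 7-dimensional action) is a hypothetical stand-in, not the paper's actual network. For simplicity the CAE's convolutional layers are replaced by a fixed random linear projection, which preserves the role of the encoder without a training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

class Encoder:
    """Stand-in for the CAE encoder: raw image -> low-dimensional latent state.
    A real implementation would use trained convolutional layers."""
    def __init__(self, image_dim, latent_dim):
        # Fixed random projection, scaled to keep activations in tanh's range.
        self.W = rng.standard_normal((latent_dim, image_dim)) / np.sqrt(image_dim)

    def encode(self, image):
        return np.tanh(self.W @ image.ravel())

class Actor:
    """Stand-in for the DDPG actor: latent state -> continuous action,
    e.g. a target end-effector command for pressing a key."""
    def __init__(self, latent_dim, action_dim, action_scale=1.0):
        self.W = rng.standard_normal((action_dim, latent_dim)) / np.sqrt(latent_dim)
        self.scale = action_scale  # tanh bounds the action to [-scale, scale]

    def act(self, latent):
        return self.scale * np.tanh(self.W @ latent)

# Hypothetical dimensions: a 32x32 camera frame, 16-D latent, 7-DOF arm command.
image = rng.random((32, 32))
encoder = Encoder(image_dim=32 * 32, latent_dim=16)
actor = Actor(latent_dim=16, action_dim=7)

latent = encoder.encode(image)
action = actor.act(latent)
print(latent.shape, action.shape)  # (16,) (7,)
```

In the full algorithm the critic network and replay buffer of DDPG would train these components end to end against a typing reward; this sketch only shows how the hierarchical pieces connect.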

Information

Type
Research Article
Creative Commons
CC BY-NC-ND 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Modified THORMANG3 dual-arm humanoid robot typing on the keyboard


Figure 2. Proposed architecture for typing with the keyboard, including convolutional auto-encoder and deep reinforcement learning


Figure 3. THORMANG3 right arm kinematic chain (real and modeled robot)


Figure 4. The structure of the proposed CAE algorithm


Figure 5. Block diagram of the DDPG algorithm and its interface with the environment


Table 1. Pseudocode of the HDDPG algorithm for object placement in the Gym toolkit


Table 2. The actor network's structure and layer details


Table 3. Dimensions and weights of the objects used in the experiments


Figure 6. Examples of simulations in the Gym toolkit in which the robot types on a keyboard


Figure 7. (a) Accumulated reward during training episodes, comparing the proposed HDDPG, DDPG, and DQL algorithms; (b) average key-pressing error of the HDDPG algorithm in simulation


Figure 8. Success rate comparison of the HDDPG, DDPG, and DQL algorithms for different keyboards in simulation


Table 4. Success rate comparison for different objects in the real environment


Figure 9. Snapshots of the typing procedure in the real environment