
Generalization in transfer learning: robust control of robot locomotion

Published online by Cambridge University Press: 11 May 2022

Suzan Ece Ada*
Affiliation:
Department of Computer Engineering, Bogazici University, Istanbul, Turkey
Emre Ugur
Affiliation:
Department of Computer Engineering, Bogazici University, Istanbul, Turkey
H. Levent Akin
Affiliation:
Department of Computer Engineering, Bogazici University, Istanbul, Turkey
*Corresponding author. E-mail: ece.ada@boun.edu.tr

Abstract

In this paper, we propose a set of robust training methods for deep reinforcement learning to transfer learning acquired in one control task to a set of previously unseen control tasks. We improve generalization in commonly used transfer learning benchmarks with a novel sample elimination technique, early stopping, and maximum entropy adversarial reinforcement learning. To generate robust policies, we eliminate samples during training via a method we call strict clipping. We apply early stopping, a method previously used in supervised learning, to deep reinforcement learning. Subsequently, we introduce maximum entropy adversarial reinforcement learning, which increases domain randomization during training for better target-task performance. Finally, we evaluate the robustness of these methods against previous work on simulated robots in target environments where gravity, the morphology of the robot, and the tangential friction coefficient are altered.
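The two loss-level ideas named above can be made concrete with a short sketch. The following PyTorch snippet is a minimal illustration, assuming that strict clipping amounts to an unusually small clip range epsilon in PPO's clipped surrogate objective (samples whose probability ratio drifts past the clip boundary in the direction favored by the advantage contribute zero gradient and are thereby eliminated from the update), and that the maximum entropy adversary adds an entropy bonus to a standard RARL-style zero-sum objective. The function names, the REINFORCE-style adversary term, and the coefficient beta are illustrative assumptions, not details taken from the paper.

```python
import torch

def sc_ppo_loss(new_logp, old_logp, adv, epsilon=0.02):
    """PPO clipped surrogate with a strict (small) clip range (sketch).

    When the probability ratio exceeds 1 + epsilon under a positive
    advantage (or falls below 1 - epsilon under a negative one), the
    min() selects the clipped, constant branch, so that sample's
    gradient is zero -- it is effectively eliminated from the update.
    """
    ratio = torch.exp(new_logp - old_logp)
    surrogate = ratio * adv
    surrogate_clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * adv
    return -torch.min(surrogate, surrogate_clipped).mean()

def max_entropy_adversary_loss(adv_logp, adv_entropy, protagonist_return, beta=0.01):
    """Zero-sum adversary update with an entropy bonus (illustrative).

    The adversary's reward is the negative of the protagonist's return,
    so descending this loss favors perturbations that lower that return;
    the entropy bonus keeps the perturbation policy stochastic, widening
    the range of disturbances seen in training (more domain randomization).
    """
    pg_term = (adv_logp * protagonist_return).mean()  # REINFORCE-style term
    return pg_term - beta * adv_entropy.mean()
```

In practice both players in RARL are typically trained with the same policy-gradient machinery; the entropy coefficient trades off how aggressively the adversary exploits the protagonist against how broadly it randomizes the dynamics.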

Information

Type
Research Article
Creative Commons
CC BY-NC-SA 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Table I. Target task extrapolation success range.

Figure 1. Proposed architecture and procedures for transfer RL.

Figure 2. (a) Humanoid running in the source environment. (b) Learning curves of SC-PPO and PPO on standard humanoid source task.

Figure 3. (a) Hopping action, and (b) learning curves of ACC-RARL, SC-RARL, RARL, and SC-PPO on standard hopper source task.

Table II. Delivery humanoid environment.

Figure 4. (a) Standard humanoid source task with 3 torso components, (b) short humanoid target task with 2 torso components, (c) tall humanoid target task with 4 torso components, and (d) delivery humanoid target task.

Figure 5. Performance of SC-PPO and PPO on (a) shorter and lighter humanoid target task, (b) taller and heavier humanoid target task, and (c) delivery humanoid target task.

Table III. Comparison of SC-PPO and PPO in target friction environment.

Figure 6. Performance of SC-PPO and PPO on target environment with tangential friction 3.5 times that of the source environment.

Figure 7. Performance of SC-PPO and PPO on target environment with gravity = −4.905 ($0.5G_{Earth}$).

Figure 8. Performance of SC-PPO and PPO on target environment with gravity = −14.715 ($1.5G_{Earth}$).

Table IV. Hopper source environment.

Figure 9. Performance of SC-PPO and PPO on target environment with gravity = −17.1675 ($1.75G_{Earth}$).

Figure 10. Performance of SC-PPO and adversarial methods on Hopper target tasks with torso mass (a) 1, (b) 2, (c) 3, (d) 4, (e) 5, (f) 6, (g) 7.

Table V. Performance of ACC-RARL.

Figure 11. Performance of EACC-RARL, ESC-RARL, and ERARL on target tasks with torso mass (a) 7 and (b) 8.

Table VI. Performance of EACC-RARL.

Figure 12. (a) Hopper with a torso mass of 9 units, and (b) performance of EACC-RARL, ESC-RARL, and ERARL on target task with torso mass 9.

Figure 13. Performance of (a) SC-PPO, ACC-RARL, SC-RARL, and RARL, and (b) ESC-PPO, ACC-RARL, ESC-RARL, and RARL with curriculum on target environment with gravity = −4.905 ($0.5G_{Earth}$).

Figure 14. Performance of (a) SC-PPO, ACC-RARL, SC-RARL, and RARL, and (b) ACC-RARL, ESC-RARL, and ERARL on target environment with gravity = −14.715 ($1.5G_{Earth}$).

Figure 15. Performance of (a) SC-PPO, ACC-RARL, SC-RARL, and RARL, and (b) EACC-RARL, ESC-RARL, and ERARL on target environment with gravity = −17.1675 ($1.75G_{Earth}$).

Figure 16. Torque commands (Nm) for the source task.

Figure 17. Torque commands (Nm) for the target task with torso mass = 1.

Table VII. Humanoid actions.

Table VIII. Hopper actions.

Table IX. Source task training hyperparameters.

Ada et al. supplementary material

Download Ada et al. supplementary material (Video, 15.9 MB)