
Reinforcement learning-based collision avoidance: impact of reward function and knowledge transfer

Published online by Cambridge University Press:  16 March 2020

Xiongqing Liu
Affiliation:
Department of Aerospace and Mechanical Engineering, University of Southern California, 3650 McClintock Avenue, OHE-430, Los Angeles, CA 90089-1453, USA
Yan Jin*
Affiliation:
Department of Aerospace and Mechanical Engineering, University of Southern California, 3650 McClintock Avenue, OHE-430, Los Angeles, CA 90089-1453, USA
Author for correspondence: Yan Jin, E-mail: yjin@usc.edu

Abstract

Collision avoidance for robots and vehicles in unpredictable environments is a challenging task. Various control strategies have been developed for the agent (i.e., a robot or vehicle) to sense the environment, assess the situation, and select the optimal actions to avoid collisions and accomplish its mission. In our research on autonomous ships, we take a machine learning approach to collision avoidance. Because ship steering data from human ship masters is scarce, collision avoidance knowledge must instead be acquired through reinforcement learning (RL). Given that the learned neural network tends to be a black box, a method is needed for designing an agent's behavior so that the desired knowledge can be captured. Furthermore, RL on complex tasks can be time consuming or even infeasible. A multi-stage learning method is needed in which agents first learn from simple tasks and then transfer the learned knowledge to closely related but more complex tasks. In this paper, we explore ways of designing agent behavior by tuning reward functions and devise a transfer RL method for multi-stage knowledge acquisition. Computer simulation-based agent training results show that it is important to understand the role of each component in a reward function and of the various design parameters in transfer RL. The settings of these parameters all depend on the complexity of the tasks and the similarities between them.
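The reward-function tuning discussed in the abstract can be illustrated with a minimal sketch. The additive structure below (a goal-progress term, a collision-penalty term inside a safety radius, and a small per-step time penalty) and all weight values are illustrative assumptions for a generic collision-avoidance agent, not the paper's actual formulation:

```python
# Hypothetical composite reward for a collision-avoidance agent.
# All weights and the safety radius are illustrative assumptions.
def reward(dist_to_goal, prev_dist_to_goal, dist_to_obstacle,
           safe_radius=50.0, w_goal=1.0, w_collision=100.0, w_time=0.1):
    r = 0.0
    # Progress term: reward moving closer to the goal.
    r += w_goal * (prev_dist_to_goal - dist_to_goal)
    # Collision term: penalty grows as the agent enters the safety radius.
    if dist_to_obstacle < safe_radius:
        r -= w_collision * (1.0 - dist_to_obstacle / safe_radius)
    # Time term: small constant penalty to discourage stalling.
    r -= w_time
    return r
```

Tuning the relative weights trades off mission progress against caution: a large `w_collision` yields conservative, wide-berth maneuvers, while a large `w_goal` favors direct routes.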

Information

Type
Research Article
Copyright
Copyright © Cambridge University Press 2020
