
Plan-based reward shaping for multi-agent reinforcement learning

Published online by Cambridge University Press:  11 February 2016

Sam Devlin
Affiliation:
Department of Computer Science, University of York, York, YO10 5GH, England e-mail: sam.devlin@york.ac.uk
Daniel Kudenko
Affiliation:
Department of Computer Science, University of York, York, YO10 5GH, England e-mail: daniel.kudenko@york.ac.uk

Abstract

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function.

Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL.

Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
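The core mechanism the abstract describes can be illustrated with a minimal sketch. Potential-based shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, which is known to preserve the optimal policy; here Φ is assumed to count how many consecutive steps of an agent's STRIPS plan are satisfied in the current state. All names, the discount factor, and the state representation below are hypothetical, not taken from the article:

```python
# Illustrative sketch (not the authors' implementation): potential-based
# reward shaping where a state's potential is the number of consecutive
# STRIPS plan steps already satisfied. All identifiers are hypothetical.

GAMMA = 0.99  # assumed discount factor


def plan_based_potential(state, plan):
    """Return a potential proportional to plan progress.

    `plan` is an ordered list of predicates (plan subgoals); the potential
    counts how many consecutive steps are satisfied in `state`.
    """
    steps_done = 0
    for predicate in plan:
        if predicate in state:
            steps_done += 1
        else:
            break
    return float(steps_done)


def shaped_reward(env_reward, state, next_state, plan):
    """Add the shaping term F(s, s') = gamma * Phi(s') - Phi(s) to the
    environment reward; this form leaves the optimal policy unchanged."""
    shaping = (GAMMA * plan_based_potential(next_state, plan)
               - plan_based_potential(state, plan))
    return env_reward + shaping


# Toy usage: a two-step plan; the agent completes the first step.
plan = ["holding(key)", "door_open"]
s = set()                   # nothing achieved yet
s_next = {"holding(key)"}   # first plan step achieved
r = shaped_reward(0.0, s, s_next, plan)  # positive shaping for progress
```

In the multi-agent setting studied in the article, each agent could derive its Φ either from a joint plan or from its own individual plan; the sketch above corresponds to the individual-plan case, where conflicts between agents' plans can arise.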

Information

Type
Articles
Copyright
© Cambridge University Press, 2016 
