
Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning

Published online by Cambridge University Press: 04 December 2018

Patrick Mannion
Affiliation:
Department of Computer Science & Applied Physics, Galway-Mayo Institute of Technology, Dublin Road, Galway H91 T8NW, Ireland; e-mail: patrick.mannion@gmit.ie
Sam Devlin
Affiliation:
Microsoft Research, 21 Station Road, Cambridge CB1 2FB, United Kingdom; e-mail: sam.devlin@microsoft.com
Jim Duggan
Affiliation:
Discipline of Information Technology, National University of Ireland Galway, Galway H91 TK33, Ireland; e-mail: jim.duggan@nuigalway.ie, enda.howley@nuigalway.ie
Enda Howley
Affiliation:
Discipline of Information Technology, National University of Ireland Galway, Galway H91 TK33, Ireland; e-mail: jim.duggan@nuigalway.ie, enda.howley@nuigalway.ie

Abstract

The majority of multi-agent reinforcement learning (MARL) implementations aim to optimize systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown that, if misused, shaping can alter the intended goals of a domain, leading to unintended behaviour. Two popular shaping methods are potential-based reward shaping and difference rewards, and both have repeatedly been shown to improve learning speed and the quality of the joint policies learned by agents in single-objective MARL domains. This work discusses the theoretical implications of applying these shaping approaches to cooperative multi-objective MARL problems, and evaluates their efficacy using two benchmark domains. Our results constitute the first empirical evidence that agents using these shaping methodologies can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.
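As a minimal sketch (not the authors' implementation), the two shaping methods named above can be written as follows, assuming each agent receives a vector reward with one component per objective and that shaping is applied to each component independently. The potential-based term uses the standard form F(s, s') = gamma * Phi(s') - Phi(s), and the difference reward uses D_i(z) = G(z) - G(z_-i), where agent i's action is replaced by a counterfactual default action. The function names, the toy two-objective global reward `toy_global`, and the default-action counterfactual are illustrative assumptions. A Pareto-dominance check is included to show what "Pareto optimal" means when returns are vectors.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def pbrs_vector(reward: Vector, phi_s: Vector, phi_next: Vector, gamma: float) -> Vector:
    """Add a potential-based shaping term F = gamma*Phi(s') - Phi(s) to each objective.

    Assumption: one potential value per objective, supplied by the caller.
    """
    return tuple(r + gamma * pn - p for r, p, pn in zip(reward, phi_s, phi_next))


def difference_reward(
    global_reward: Callable[[Sequence[int]], Vector],
    joint_action: Sequence[int],
    agent: int,
    default_action: int,
) -> Vector:
    """D_i = G(z) - G(z_-i), computed per objective.

    The counterfactual z_-i replaces agent i's action with a default action
    (an illustrative choice; other counterfactuals are possible).
    """
    g = global_reward(joint_action)
    counterfactual = list(joint_action)
    counterfactual[agent] = default_action
    g_cf = global_reward(counterfactual)
    return tuple(a - b for a, b in zip(g, g_cf))


def dominates(u: Vector, v: Vector) -> bool:
    """True if u Pareto-dominates v (maximisation): >= everywhere, > somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(points: List[Vector]) -> List[Vector]:
    """Keep the points not dominated by any other point (order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def toy_global(joint: Sequence[int]) -> Vector:
    """Hypothetical two-objective global reward: (#agents choosing 1, #agents choosing 0)."""
    ones = sum(joint)
    return (float(ones), float(len(joint) - ones))


print(difference_reward(toy_global, [1, 0, 1], agent=0, default_action=0))  # (1.0, -1.0)
print(pbrs_vector((1.0, 0.0), (0.0, 2.0), (1.0, 1.0), gamma=0.5))         # (1.5, -1.5)
print(pareto_front([(1, 2), (2, 1), (1, 1), (0, 3)]))                     # [(1, 2), (2, 1), (0, 3)]
```

In the toy example, agent 0's difference reward isolates its own contribution to each objective: choosing action 1 instead of the default adds one unit to the first objective and removes one from the second. Because the shaping term is potential-based, its discounted sum along a trajectory telescopes to a quantity that depends only on the start and end states, which is why this form is generally considered safe in the single-objective setting.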

Type: Special Issue Contribution
Copyright: © Cambridge University Press, 2018
