
Context-sensitive reward shaping for sparse interaction multi-agent systems

Published online by Cambridge University Press: 11 February 2016

Yann-Michaël de Hauwere
Affiliation:
Computational Modeling Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; e-mail: ydehauwe@vub.ac.be, anowe@vub.ac.be
Sam Devlin
Affiliation:
Department of Computer Science, University of York, Heslington, York, YO10 5DD, UK; e-mail: sam.devlin@york.ac.uk, daniel.kudenko@york.ac.uk
Daniel Kudenko
Affiliation:
Department of Computer Science, University of York, Heslington, York, YO10 5DD, UK; e-mail: sam.devlin@york.ac.uk, daniel.kudenko@york.ac.uk
Ann Nowé
Affiliation:
Computational Modeling Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; e-mail: ydehauwe@vub.ac.be, anowe@vub.ac.be

Abstract

Potential-based reward shaping is a commonly used approach in reinforcement learning for directing exploration based on prior knowledge. In both single-agent and multi-agent settings, this technique speeds up learning without losing any theoretical convergence guarantees. However, if speed-ups through reward shaping are to be achieved in multi-agent environments, a different shaping signal should be used for each context in which agents have a different subgoal or are involved in a different interaction situation.

This paper describes the use of context-aware potential functions in a multi-agent system in which the interactions between agents are sparse. That is, the interactions occur only sporadically, in certain regions of the state space that are unknown to the agents a priori. During these interactions, agents need to coordinate in order to reach the globally optimal solution.

We demonstrate how different reward shaping functions can be used on top of Future Coordinating Q-learning (FCQ-learning), an algorithm capable of automatically detecting when agents should take each other into consideration. Using FCQ-learning, coordination problems can even be anticipated before they actually occur, allowing them to be solved in a timely manner. We evaluate our approach on a range of gridworld problems, as well as a simulation of air traffic control.
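As a rough illustration of the idea in the abstract, the following minimal Python sketch applies the standard potential-based shaping term F(s, s') = γΦ(s') − Φ(s) with a separate potential function per context. It is not the authors' implementation: the context names, the gridworld coordinates, the Manhattan-distance potential, and all function names are assumptions made only for this example, and FCQ-learning itself is not shown.

```python
from typing import Callable, Dict, Tuple

State = Tuple[int, int]  # (row, col) cell in a gridworld (assumed state encoding)
Potential = Callable[[State], float]


def manhattan_potential(target: State) -> Potential:
    """Build a potential that increases as the agent approaches `target`.

    Negative Manhattan distance is an illustrative choice, not taken from the paper.
    """
    def phi(state: State) -> float:
        return -float(abs(state[0] - target[0]) + abs(state[1] - target[1]))
    return phi


def shaped_reward(reward: float, state: State, next_state: State,
                  phi: Potential, gamma: float) -> float:
    """Return the environment reward plus F(s, s') = gamma * phi(s') - phi(s)."""
    return reward + gamma * phi(next_state) - phi(state)


# Hypothetical contexts: one potential per subgoal or interaction situation.
POTENTIALS: Dict[str, Potential] = {
    "reach_goal": manhattan_potential((4, 4)),
    "avoid_conflict": manhattan_potential((0, 4)),
}


def context_shaped_reward(context: str, reward: float, state: State,
                          next_state: State, gamma: float = 0.9) -> float:
    """Shape the reward using the potential registered for the current context."""
    return shaped_reward(reward, state, next_state, POTENTIALS[context], gamma)
```

In this sketch, a learner would call `context_shaped_reward` in place of the raw reward inside its Q-learning update, choosing the context from whatever subgoal or interaction situation the agent currently believes it is in. How that context is detected is exactly what the paper delegates to FCQ-learning and is outside the scope of the example.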

Information

Type
Articles
Copyright
© Cambridge University Press, 2016 
