
Overcoming incorrect knowledge in plan-based reward shaping

Published online by Cambridge University Press:  11 February 2016

Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko
Affiliation: Department of Computer Science, Deramore Lane, University of York, Heslington, York, YO10 5GH, UK
e-mail: kirk@cs.york.ac.uk, sam.devlin@york.ac.uk, daniel.kudenko@york.ac.uk

Abstract

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used to guide the agent towards the optimal behaviour. However, if the provided knowledge is wrong, it has been shown that the agent takes longer to learn the optimal policy. In some such cases it was previously better to ignore all prior knowledge, even though that knowledge was only partially incorrect.

This paper introduces a novel use of knowledge revision to overcome incorrect domain knowledge provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform the previously proposed agent, which receives plan-based reward shaping without knowledge revision.
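For readers unfamiliar with the mechanism the abstract refers to, potential-based reward shaping augments the environment reward with an extra term F(s, s') = γΦ(s') − Φ(s), and in plan-based reward shaping the potential Φ(s) is derived from how far through the STRIPS plan the agent has progressed. The sketch below is illustrative only, not the authors' implementation: it assumes a toy representation in which a plan is a list of fact sets, facts_of(s) is a hypothetical mapping from a state to the facts that hold in it, and plan_progress and shaped_q_update are names introduced here for illustration.

```python
from collections import defaultdict

GAMMA, ALPHA, OMEGA = 0.99, 0.1, 1.0   # discount, learning rate, shaping scale


def plan_progress(state_facts, plan):
    """Number of leading plan steps whose facts already hold in the state."""
    progress = 0
    for step_facts in plan:
        if step_facts <= state_facts:   # set containment: this plan step is achieved
            progress += 1
        else:
            break
    return progress


def potential(state_facts, plan):
    # The potential of a state grows with progress through the provided plan.
    return OMEGA * plan_progress(state_facts, plan)


def shaped_q_update(Q, s, a, r, s_next, actions, plan, facts_of):
    # Potential-based shaping reward F(s, s') = gamma * Phi(s') - Phi(s),
    # added to the environment reward before the usual Q-learning update.
    F = GAMMA * potential(facts_of(s_next), plan) - potential(facts_of(s), plan)
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (r + F + GAMMA * best_next - Q[(s, a)])
```

Here Q can be any mapping from (state, action) pairs to values, for example a defaultdict(float). When the provided plan is wrong, this potential rewards progress towards steps the agent cannot or should not achieve, which is the failure mode the knowledge-revision approach in this paper addresses.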

Type: Articles
Copyright: © Cambridge University Press, 2016
