
Overcoming incorrect knowledge in plan-based reward shaping

Published online by Cambridge University Press:  11 February 2016

Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko
Affiliation:
Department of Computer Science, Deramore Lane, University of York, Heslington, York, YO10 5GH, UK
e-mail: kirk@cs.york.ac.uk, sam.devlin@york.ac.uk, daniel.kudenko@york.ac.uk

Abstract

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used to guide the agent towards the optimal behaviour. However, if the provided knowledge is wrong, it has been shown that the agent takes longer to learn the optimal policy. Previous work found that, in some cases, it was better to ignore all prior knowledge, even when that knowledge was only partially incorrect.
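To make the idea concrete, the following is a minimal sketch of plan-based, potential-based reward shaping, not the authors' implementation: the potential of a state is the number of consecutive plan steps already achieved, and the shaped reward F(s, s') = γ·Φ(s') − Φ(s) nudges the agent along the plan. The plan contents and subgoal names are hypothetical.

```python
GAMMA = 0.99  # assumed discount factor

# Hypothetical STRIPS-style plan: an ordered list of subgoal predicates.
PLAN = ["has_key", "door_open", "at_goal"]

def potential(state, plan):
    """Number of consecutive plan steps already satisfied in `state`
    (a set of predicates). Progress stops at the first unmet subgoal."""
    steps = 0
    for subgoal in plan:
        if subgoal in state:
            steps += 1
        else:
            break
    return steps

def shaping_reward(state, next_state, plan, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s),
    added to the environment reward at each transition."""
    return gamma * potential(next_state, plan) - potential(state, plan)

# Example: picking up the key advances the agent one step along the plan,
# so the transition earns a positive shaping bonus.
s = set()             # no subgoals achieved yet
s_next = {"has_key"}  # first plan step achieved
bonus = shaping_reward(s, s_next, PLAN)  # gamma * 1 - 0
```

Because the shaping term is a difference of potentials, it preserves the optimal policy of the underlying task while biasing exploration towards states further along the plan.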

This paper introduces a novel use of knowledge revision to overcome incorrect domain knowledge provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform an agent receiving plan-based reward shaping without knowledge revision.
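The revision idea can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm: if the agent repeatedly fails to achieve a plan step, that step is assumed to reflect incorrect domain knowledge and is dropped, so the shaping potentials no longer reward pursuing it. The failure threshold and subgoal names are assumptions.

```python
FAILURE_THRESHOLD = 50  # assumed hyperparameter: tolerated failed attempts

def revise_plan(plan, failure_counts, threshold=FAILURE_THRESHOLD):
    """Return a revised plan with any step removed whose subgoal has
    failed more than `threshold` times, leaving the rest in order."""
    return [step for step in plan if failure_counts.get(step, 0) <= threshold]

# Example: "open_trapdoor" encodes wrong knowledge (the trapdoor cannot
# be opened), so after many failures it is revised out of the plan.
plan = ["has_key", "open_trapdoor", "at_goal"]
failures = {"open_trapdoor": 120}
revised = revise_plan(plan, failures)  # ["has_key", "at_goal"]
```

After revision, recomputing the potentials over the shortened plan removes the misleading shaping signal while keeping the reward for the plan steps that remain correct.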

Information

Type
Articles
Copyright
© Cambridge University Press, 2016 
