Leveraging human knowledge in tabular reinforcement learning: a study of human subjects

Published online by Cambridge University Press:  17 September 2018

Ariel Rosenfeld
Affiliation:
Department of Management, Bar-Ilan University, Max and Anna Webb Street, 5290002 Ramat Gan, Israel; e-mail: arielros1@gmail.com
Moshe Cohen
Affiliation:
Department of Computer Science, Bar-Ilan University, Max and Anna Webb Street, 5290002 Ramat Gan, Israel; e-mail: moshec40@gmail.com
Matthew E. Taylor
Affiliation:
Department of Computer Science, Washington State University, Pullman, WA 99164, USA; e-mail: taylorm@eecs.wsu.edu
Sarit Kraus
Affiliation:
Department of Computer Science, Bar-Ilan University, Max and Anna Webb Street, 5290002 Ramat Gan, Israel; e-mail: sarit@cs.biu.ac.il

Abstract

Reinforcement learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named State Action Similarity Solutions (SASS), which is based on the notion of similarities in the agent’s state–action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method, SASS, can be effectively and efficiently combined with reward shaping, providing a beneficial alternative to using a single speedup method alone, with minimal additional designer effort.
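The two speedup ideas named in the abstract can be illustrated in code. The sketch below is not the authors’ implementation; it is a minimal, hedged illustration assuming tabular Q-learning: `shaped_reward` implements classical potential-based reward shaping (where the potential function `potential` encodes the designer’s knowledge), and `sass_update` shows a SASS-style update in which a TD target is also propagated to similar state–action pairs via a designer-supplied similarity function `similar`. The function names, the `similar` interface, and the weighting scheme are all illustrative assumptions.

```python
from collections import defaultdict

def shaped_reward(reward, potential, state, next_state, gamma=0.99):
    """Potential-based reward shaping: add F(s, s') = gamma*phi(s') - phi(s)
    to the environment reward. This form is known to preserve the optimal
    policy while letting a designer inject domain knowledge via phi."""
    return reward + gamma * potential(next_state) - potential(state)

def sass_update(Q, state, action, td_target, similar, alpha=0.1):
    """SASS-style update sketch: apply the usual tabular update to (s, a),
    then propagate the same TD target to similar state-action pairs,
    weighted by the similarity score sigma in [0, 1].
    `similar(state, action)` yields ((s2, a2), sigma) pairs (assumed API)."""
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    for (s2, a2), sigma in similar(state, action):
        Q[(s2, a2)] += alpha * sigma * (td_target - Q[(s2, a2)])
```

A designer would supply only `potential` and `similar`; e.g. with `potential = lambda s: -distance_to_goal(s)` the agent is nudged toward the goal, and a `similar` function based on spatial symmetry spreads each learned value across symmetric grid cells.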

Type
Adaptive and Learning Agents
Copyright
© Cambridge University Press, 2018 
