Q-Table compression for reinforcement learning

  • Leonardo Amado and Felipe Meneguzzi

Reinforcement learning (RL) algorithms are often used to compute agents capable of acting in environments without prior knowledge of the environment dynamics. However, these algorithms struggle to converge in environments with large branching factors and the resulting large state spaces. In this work, we develop an approach to compress the number of entries in a Q-value table using a deep auto-encoder, together with a set of techniques to mitigate the large branching factor problem. We present the application of these techniques in the scenario of a real-time strategy (RTS) game, where both the state space and the branching factor are problematic. We empirically evaluate an implementation of the technique to control agents in an RTS game scenario where classical RL fails, and we outline a number of possible avenues of further work on this problem.
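The core idea of the abstract — shrinking a Q-table by encoding high-dimensional states into short latent codes — can be sketched as follows. This is a minimal illustration under assumed dimensions, not the authors' implementation: it trains a tiny linear auto-encoder with plain gradient descent and then keys a Q-table by the discretised latent code instead of the raw state vector.

```python
import numpy as np

# Hypothetical sketch: dimensions, learning rate, and the `compress`
# helper are illustrative assumptions, not taken from the paper.
rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM = 64, 8          # compress 64-feature states to 8 values
states = rng.random((500, STATE_DIM))  # stand-in for observed RTS game states

# Encoder/decoder weights, trained by gradient descent on the
# reconstruction error (a linear auto-encoder for simplicity).
W_enc = rng.normal(0, 0.1, (STATE_DIM, LATENT_DIM))
W_dec = rng.normal(0, 0.1, (LATENT_DIM, STATE_DIM))

def mse(x, y):
    return float(np.mean((x - y) ** 2))

lr = 0.01
initial_error = mse(states, states @ W_enc @ W_dec)
for _ in range(200):
    z = states @ W_enc                 # encode: latent codes
    recon = z @ W_dec                  # decode: reconstructed states
    err = recon - states
    # gradients of the mean squared reconstruction error (up to a constant)
    g_dec = z.T @ err / len(states)
    g_enc = states.T @ (err @ W_dec.T) / len(states)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
final_error = mse(states, states @ W_enc @ W_dec)

# Discretise the latent code so it can index a compact Q-table:
# states that encode to the same code share one table entry.
def compress(state):
    return tuple(np.round(state @ W_enc, 1))

q_table = {}                           # keyed by 8-value codes, not raw states
q_table[compress(states[0])] = 0.0
```

The key design point is that the table is now indexed by an 8-value code rather than a 64-feature state, so states the encoder maps to the same code share a single entry; the paper uses a deep (multi-layer) auto-encoder rather than the linear one assumed here.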

The Knowledge Engineering Review
  • ISSN: 0269-8889
  • EISSN: 1469-8005
  • URL: /core/journals/knowledge-engineering-review