Q-Table compression for reinforcement learning

  • Leonardo Amado and Felipe Meneguzzi

Reinforcement learning (RL) algorithms are often used to compute agents capable of acting in environments without prior knowledge of the environment dynamics. However, these algorithms struggle to converge in environments with large branching factors and the resulting large state spaces. In this work, we develop an approach to compress the number of entries in a Q-value table using a deep auto-encoder, together with a set of techniques to mitigate the large branching factor problem. We present the application of these techniques in the scenario of a real-time strategy (RTS) game, where both the state space and the branching factor are problematic. We empirically evaluate an implementation of the technique to control agents in an RTS game scenario where classical RL fails, and we outline a number of possible avenues of further work on this problem.
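The core idea of the abstract — shrinking a Q-table by encoding high-dimensional states into short latent codes — can be sketched as follows. This is a minimal illustration under assumed dimensions, not the authors' implementation: it trains a tiny linear auto-encoder with plain gradient descent and then keys a Q-table by the discretised latent code instead of the raw state vector.

```python
import numpy as np

# Hypothetical sketch: dimensions, learning rate, and the `compress`
# helper are illustrative assumptions, not taken from the paper.
rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM = 64, 8          # compress 64-feature states to 8 values
states = rng.random((500, STATE_DIM))  # stand-in for observed RTS game states

# Encoder/decoder weights, trained by gradient descent on the
# reconstruction error (a linear auto-encoder for simplicity).
W_enc = rng.normal(0, 0.1, (STATE_DIM, LATENT_DIM))
W_dec = rng.normal(0, 0.1, (LATENT_DIM, STATE_DIM))

def mse(x, y):
    return float(np.mean((x - y) ** 2))

lr = 0.01
initial_error = mse(states, states @ W_enc @ W_dec)
for _ in range(200):
    z = states @ W_enc                 # encode: latent codes
    recon = z @ W_dec                  # decode: reconstructed states
    err = recon - states
    # gradients of the mean squared reconstruction error (up to a constant)
    g_dec = z.T @ err / len(states)
    g_enc = states.T @ (err @ W_dec.T) / len(states)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
final_error = mse(states, states @ W_enc @ W_dec)

# Discretise the latent code so it can index a compact Q-table:
# states that encode to the same code share one table entry.
def compress(state):
    return tuple(np.round(state @ W_enc, 1))

q_table = {}                           # keyed by 8-value codes, not raw states
q_table[compress(states[0])] = 0.0
```

The key design point is that the table is now indexed by an 8-value code rather than a 64-feature state, so states the encoder maps to the same code share a single entry; the paper uses a deep (multi-layer) auto-encoder rather than the linear one assumed here.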

The Knowledge Engineering Review
  • ISSN: 0269-8889
  • EISSN: 1469-8005
  • URL: /core/journals/knowledge-engineering-review