Learning self-play agents for combinatorial optimization problems

Published online by Cambridge University Press: 23 March 2020

Ruiyang Xu
Affiliation:
Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA, e-mails: ruiyang@ccs.neu.edu, lieber@ccs.neu.edu
Karl Lieberherr
Affiliation:
Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA, e-mails: ruiyang@ccs.neu.edu, lieber@ccs.neu.edu

Abstract

Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance with several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible to hypothesize that RL, starting from zero knowledge, might be able to gradually approach a winning strategy after a certain amount of training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka’s Game-Theoretical Semantics, we propose the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.
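
The search procedure referred to in the abstract is an AlphaZero-style neural MCTS. As a rough, hypothetical sketch (not the authors' implementation), the Python snippet below shows the PUCT child-selection rule that such agents commonly use; the uniform prior, the random value estimates, and the exploration constant C_PUCT are placeholder assumptions standing in for a trained policy/value network and a real Zermelo-game environment.

```python
import math
import random

# Hypothetical minimal sketch, not the paper's code: the PUCT selection step
# used in AlphaZero-style neural MCTS. A uniform prior and random rollout
# values stand in for the policy/value network outputs.

C_PUCT = 1.5  # exploration constant (assumed value)


class Node:
    def __init__(self, prior):
        self.prior = prior       # P(s, a): prior probability from the policy head
        self.visits = 0          # N(s, a): visit count
        self.value_sum = 0.0     # W(s, a): accumulated value
        self.children = {}       # action -> Node

    def q(self):
        # Q(s, a): mean value of this action so far
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node):
    """Return (action, child) maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    total_visits = sum(c.visits for c in node.children.values())

    def puct(child):
        return child.q() + C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visits)

    return max(node.children.items(), key=lambda kv: puct(kv[1]))


# Toy usage: a root with three actions under a uniform prior; back up a few
# random value estimates, then ask which action the search currently prefers.
root = Node(prior=1.0)
root.children = {a: Node(prior=1.0 / 3) for a in range(3)}
for _ in range(50):
    action, child = select_child(root)
    value = random.uniform(-1, 1)  # stand-in for the value-network estimate
    child.visits += 1
    child.value_sum += value
print("preferred action:", select_child(root)[0])
```

In an AlphaZero-style agent, the prior and value would come from the network's policy and value heads, and the visit counts accumulated at the root would serve as the training target for the next iteration of self-play.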

Type: Research Article
Copyright: © Cambridge University Press, 2020

References

Anthony, T., Tian, Z. & Barber, D. 2017. Thinking fast and slow with deep learning and tree search. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, 5366–5376.
Auer, P., Cesa-Bianchi, N. & Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2), 235–256.
Auger, D., Couetoux, A. & Teytaud, O. 2013. Continuous upper confidence trees with polynomial exploration – consistency. In ECML/PKDD (1), Lecture Notes in Computer Science 8188, 194–209. Springer.
Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A., Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y. & Pascanu, R. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint.
Bello, I., Pham, H., Le, Q. V., Norouzi, M. & Bengio, S. 2016. Neural combinatorial optimization with reinforcement learning. arXiv preprint.
Bjornsson, Y. & Finnsson, H. 2009. CadiaPlayer: a simulation-based general game player. IEEE Transactions on Computational Intelligence and AI in Games 1(1), 4–15.
Browne, C., Powley, E. J., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Liebana, D. P., Samothrakis, S. & Colton, S. 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1), 1–43.
Fujikawa, Y. & Min, M. 2013. A new environment for algorithm research using gamification. In IEEE International Conference on Electro-Information Technology, EIT 2013, Rapid City, SD, 1–6.
Genesereth, M., Love, N. & Pell, B. 2005. General game playing: overview of the AAAI competition. AI Magazine 26(2), 62.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning 70, 1263–1272. JMLR.org.
Hintikka, J. 1982. Game-theoretical semantics: insights and prospects. Notre Dame Journal of Formal Logic 23(2), 219–241.
Khalil, E., Dai, H., Zhang, Y., Dilkina, B. & Song, L. 2017. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, 6348–6358.
Kocsis, L. & Szepesvári, C. 2006. Bandit based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning, ECML ’06, 282–293. Springer-Verlag.
Laterre, A., Fu, Y., Jabri, M. K., Cohen, A.-S., Kas, D., Hajjar, K., Dahl, T. S., Kerkeni, A. & Beguir, K. 2018. Ranked reward: enabling self-play reinforcement learning for combinatorial optimization. arXiv preprint.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540), 529–533.
Novikov, F. & Katsman, V. 2018. Gamification of problem solving process based on logical rules. In Informatics in Schools. Fundamentals of Computer Science and Software Engineering, Pozdniakov, S. N. & Dagienė, V. (eds). Springer International Publishing, 369–380.
Racanière, S., Weber, T., Reichert, D. P., Buesing, L., Guez, A., Rezende, D., Badia, A. P., Vinyals, O., Heess, N., Li, Y., Pascanu, R., Battaglia, P., Hassabis, D., Silver, D. & Wierstra, D. 2017. Imagination-augmented agents for deep reinforcement learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, 5694–5705. Curran Associates Inc.
Rezende, M. & Chaimowicz, L. 2017. A methodology for creating generic game playing agents for board games. In 2017 16th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), 19–28. IEEE.
Selsam, D., Lamm, M., Bünz, B., Liang, P., de Moura, L. & Dill, D. L. 2018. Learning a SAT solver from single-bit supervision. arXiv preprint.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T. P., Simonyan, K. & Hassabis, D. 2017a. Mastering Chess and Shogi by self-play with a general reinforcement learning algorithm. CoRR.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. & Hassabis, D. 2018. A general reinforcement learning algorithm that masters Chess, Shogi, and Go through self-play. Science 362(6419), 1140–1144.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T. & Hassabis, D. 2017b. Mastering the game of Go without human knowledge. Nature 550, 354.
Sniedovich, M. 2003. OR/MS games: 4. The joy of egg-dropping in Braunschweig and Hong Kong. INFORMS Transactions on Education 4(1), 48–64.
Vinyals, O., Fortunato, M. & Jaitly, N. 2015. Pointer networks. In Advances in Neural Information Processing Systems 28, Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R. (eds). Curran Associates, Inc., 2692–2700.
Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256.