
Learning self-play agents for combinatorial optimization problems

Published online by Cambridge University Press:  23 March 2020

Ruiyang Xu and Karl Lieberherr
Affiliation:
Northeastern University Khoury College of Computer Sciences, Boston, MA, USA, e-mails: ruiyang@ccs.neu.edu, lieber@ccs.neu.edu

Abstract

Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance on several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible that RL, starting from zero knowledge, can gradually approach a winning strategy after sufficient training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka’s Game-Theoretical Semantics, we propose the Zermelo Gamification, which transforms specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.
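The correspondence between winning strategies and solutions rests on Zermelo's theorem: in a finite two-player game with perfect information, one player has a winning strategy, computable by backward induction. A minimal sketch of this idea, using a toy subtraction game as a stand-in (an assumption for illustration; it is not the paper's prototype problem and omits the neural MCTS training loop):

```python
# Backward-induction solver for a finite two-player Zermelo game.
# Toy game (illustrative assumption): from a pile of n stones, players
# alternately remove 1 or 2 stones; the player unable to move (n == 0)
# loses. Zermelo's theorem guarantees one side has a winning strategy.

from functools import lru_cache

@lru_cache(maxsize=None)
def wins(n: int) -> bool:
    """True iff the player to move from a pile of n stones can force a win."""
    # A position is winning iff some move leads to a position that is
    # losing for the opponent (the backward-induction step).
    return any(not wins(n - take) for take in (1, 2) if take <= n)

def best_move(n: int):
    """Return a winning move (1 or 2) from n, or None if all moves lose."""
    for take in (1, 2):
        if take <= n and not wins(n - take):
            return take
    return None
```

For this toy game the winning positions are exactly those with `n % 3 != 0`; a trained self-play agent would be expected to converge to the same policy that `best_move` computes exactly.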

Information

Type
Research Article
Copyright
© Cambridge University Press, 2020
