
Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

  • Laetitia Matignon (a1), Guillaume J. Laurent (a1) and Nadine Le Fort-Piat (a1)

Abstract

In fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies five challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn fast policy hill climbing. The paper concludes with an overview of the learning algorithms’ strengths and weaknesses against each challenge, which can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
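
To make the contrast between three of the listed Q-learning variants concrete, here is a minimal sketch for a single-state (matrix) game. It is not taken from the paper: the function names, the learning rates `alpha` and `beta`, the `run` helper and the reward sequence are all assumptions chosen for illustration, and each rule is reduced to its stateless form.

```python
# Illustrative sketch only (assumed names and parameter values, not the paper's code).
# Each function updates one agent's value estimate q for the action it just played,
# given the shared team reward it received in a single-state cooperative game.

def decentralized_q_update(q, reward, alpha=0.1):
    """Plain Q-learning: move q toward the observed reward at rate alpha."""
    return q + alpha * (reward - q)


def distributed_q_update(q, reward):
    """Optimistic update: q never decreases, so low rewards are ignored."""
    return max(q, reward)


def hysteretic_q_update(q, reward, alpha=0.1, beta=0.01):
    """Two learning rates: alpha for positive errors, a smaller beta for negative ones."""
    delta = reward - q
    rate = alpha if delta >= 0 else beta
    return q + rate * delta


def run(update, rewards, q0=0.0):
    """Apply an update rule to a sequence of rewards, starting from q0."""
    q = q0
    for r in rewards:
        q = update(q, r)
    return q


# Hypothetical reward stream: the action pays 10 when teammates coordinate
# and -10 when a teammate happens to explore another action.
rewards = [10, -10, 10, -10]

q_decentralized = run(decentralized_q_update, rewards)
q_distributed = run(distributed_q_update, rewards)
q_hysteretic = run(hysteretic_q_update, rewards)
```

On this assumed reward stream the plain learner's estimate is pulled down by every penalty, the optimistic learner keeps the best reward seen, and the hysteretic learner sits in between. This is one way to picture why penalties caused by teammates' exploration are harder on some variants than on others.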
