Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

Laetitia Matignon; Guillaume J. Laurent; Nadine Le Fort-Piat

doi:10.1017/S0269888912000057

Published online by Cambridge University Press: 22 February 2012

Affiliation (all authors): FEMTO-ST Institute, UMR CNRS 6174, UFC/ENSMM/UTBM, 24 rue Alain Savary, 25000 Besançon, France
E-mail: laetitia.matignon@gmail.com, guillaume.laurent@ens2m.fr, nadine.piat@ens2m.fr

Abstract

In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies five challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn fast policy hill climbing. An overview of the learning algorithms’ strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
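The abstract names the Q-learning variants without stating their update rules. As a rough illustration only, the sketch below contrasts three of them using the formulations commonly given in the literature; the function names, parameter names and default values (`alpha`, `beta`, `gamma`) are assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch: per-agent value updates for three of the Q-learning
# variants named in the abstract. Names and default rates are assumptions.

def decentralized_update(q, reward, next_max, alpha=0.1, gamma=0.9):
    """Standard Q-learning step, applied by each agent to its own table."""
    return q + alpha * (reward + gamma * next_max - q)


def distributed_update(q, reward, next_max, gamma=0.9):
    """Optimistic step: the estimate is only ever raised, never lowered."""
    return max(q, reward + gamma * next_max)


def hysteretic_update(q, reward, next_max, alpha=0.1, beta=0.01, gamma=0.9):
    """Two learning rates: alpha when the TD error is non-negative,
    a smaller beta when it is negative."""
    delta = reward + gamma * next_max - q
    rate = alpha if delta >= 0 else beta
    return q + rate * delta


if __name__ == "__main__":
    # An agent holding estimate 5.0 receives a penalty of -10, for instance
    # because a teammate explored (the alter-exploration challenge).
    q, reward, next_max = 5.0, -10.0, 0.0
    print("decentralized:", decentralized_update(q, reward, next_max))
    print("distributed:  ", distributed_update(q, reward, next_max))
    print("hysteretic:   ", hysteretic_update(q, reward, next_max))
```

In this toy case the standard update lowers the estimate the most, the optimistic update ignores the penalty entirely, and the hysteretic update lowers it only slightly, which gives a feel for how the variants differ in their sensitivity to penalties caused by other agents.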

Information

Type: Articles
Information: The Knowledge Engineering Review, Volume 27, Issue 1, 22 February 2012, pp. 1–31

DOI: https://doi.org/10.1017/S0269888912000057
Copyright: Copyright © Cambridge University Press 2012
