
Adaptive learning with artificial barriers yielding Nash equilibria in general games

Published online by Cambridge University Press:  29 November 2023

Ismail Hassan*
Affiliation:
OsloMet – Oslo Metropolitan University, Oslo, Norway
B. John Oommen
Affiliation:
Carleton University, Ottawa, Canada
Anis Yazidi
Affiliation:
OsloMet – Oslo Metropolitan University, Oslo, Norway
Corresponding author: Ismail Hassan; Email: ismail@oslomet.no

Abstract

Artificial barriers in Learning Automata (LA) are a powerful yet under-explored concept, although they were first proposed in the 1980s. Introducing artificial non-absorbing barriers makes LA schemes resilient to being trapped in absorbing barriers, a phenomenon often referred to as the lock-in probability, which leads to an exclusive choice of one action after convergence. Within the field of LA, and reinforcement learning in general, there is a scarcity of theoretical works on, and applications of, schemes with artificial barriers. In this paper, we devise an LA with artificial barriers for solving a general form of stochastic bimatrix game. Classical LA systems possess absorbing-barrier properties; they are a powerful tool in game theory and were shown to converge to the game's Nash equilibrium under limited information. However, the stream of works on LA for solving game-theoretic problems can only handle the case where the Saddle Point of the game exists in pure strategies, and fails to reach a mixed Nash equilibrium when no Saddle Point exists in pure strategies.

Furthermore, we provide experimental results that are in line with our theoretical findings.
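As a rough illustration of the idea described above (a sketch, not the authors' exact update rule), the snippet below pairs two two-action reward-inaction automata whose action probabilities are nudged toward artificial barriers at $p_{max}$ and $1-p_{max}$ instead of the absorbing states 0 and 1. The payoff matrices form a matching-pennies variant, which has no Saddle Point in pure strategies and a mixed Nash equilibrium at (0.5, 0.5); the game, parameter values, and update rule are all illustrative assumptions.

```python
import random

def play(R, C, theta=0.01, p_max=0.99, steps=200_000, seed=0):
    """Two 2-action learning automata with artificial barriers playing a
    bimatrix game.  A reward-inaction style update moves the probability of
    the rewarded action toward p_max (not 1), so probabilities stay confined
    to [1 - p_max, p_max] and neither automaton can lock in to a pure action."""
    rng = random.Random(seed)
    p, q = 0.5, 0.5  # prob. that players A and B pick action 0
    for _ in range(steps):
        i = 0 if rng.random() < p else 1
        j = 0 if rng.random() < q else 1
        # Bernoulli rewards drawn from the payoff matrices (entries in [0, 1])
        if rng.random() < R[i][j]:  # player A rewarded: move toward barrier
            target = p_max if i == 0 else 1.0 - p_max
            p += theta * (target - p)
        if rng.random() < C[i][j]:  # player B rewarded: move toward barrier
            target = p_max if j == 0 else 1.0 - p_max
            q += theta * (target - q)
    return p, q

# Matching-pennies-like game scaled to [0, 1]: no pure-strategy Saddle Point,
# mixed Nash equilibrium at p = q = 0.5.
R = [[1.0, 0.0], [0.0, 1.0]]
C = [[0.0, 1.0], [1.0, 0.0]]
print(play(R, C))
```

Because every update is a convex combination of the current probability and a barrier point, the strategy pair never leaves $[1-p_{max}, p_{max}]^2$; with a small learning rate $\theta$, one would expect the trajectory to hover near the mixed equilibrium rather than being absorbed in a pure strategy, in the spirit of the analysis in this paper.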

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. LA interacting with the environment.


Table 1. Error for different values of $\theta$ and $p_{max}$, when $p_{opt}=0.6667$ and $q_{opt}=0.3333$ for the game specified by the R matrix given by Equation (6.1) and the C matrix given by Equation (6.2)


Figure 2. Trajectory of $[p_1(t), q_1(t)]^{\intercal}$ for the case of the R matrix given by Equation (6.1) and the C matrix given by Equation (6.2) with $p_{opt}=0.6667$ and $q_{opt}=0.3333$, and using $p_{max}=0.99$ and $\theta=0.01$.


Figure 3. Time evolution of X(t) for the case of the R matrix given by Equation (6.1) and the C matrix given by Equation (6.2) with $p_{opt}=0.6667$ and $q_{opt}=0.3333$, and using $p_{max}=0.99$ and $\theta=0.00001$.


Figure 4. Trajectory of X(t) where $p_{opt}=0.6667$ and $q_{opt}=0.3333$, using $p_{max}=0.99$ and $\theta=0.00001$.


Table 2. Error for different values of $\theta$ and $p_{max}$ for the game specified by the R matrix and the C matrix given by Equations (6.5) and (6.6)


Figure 5. Trajectory of the ODE using $p_{max}=0.99$ for case 1.


Figure 6. Trajectory of the deterministic ODE using $p_{max}=0.99$ for case 2.


Figure 7. Time evolution of X(t) for $\theta=0.01$ and $p_{max}=0.99$ for the case of the R matrix given by Equation (6.5) and the C matrix given by Equation (6.6).


Figure 8. Trajectory of the ODE using $p_{max}=0.999$ for case 2.


Figure 9. The figure shows (a) the evolution over time of X(t) for $\theta=0.01$ and $p_{max}=0.999$ when applied to the game with payoffs specified by the R matrix and the C matrix given by Equations (6.5) and (6.6); (b) a zoomed version around player A's strategy; and (c) a zoomed version around player B's strategy.


Figure 10. Nine trajectories of the LA with barriers starting from random initial points with $p_{max}=0.99$ and $\theta=0.0001$.


Figure 11. Trajectory of the ODE using $p_{max}=0.99$ for case 3.


Table 3. Error for different values of $\theta$ and $p_{max}$ for the game specified by the R matrix and the C matrix given by Equations (6.7) and (6.8)


Figure 12. Trajectory of S-LA using $p_{max}=0.99$ and $\theta=0.0001$ for case 1.


Figure 13. Trajectory of S-LA using $p_{max}=0.99$ and $\theta=0.0001$ for case 2.


Figure 14. Trajectory of S-LA using $p_{max}=0.99$ and $\theta=0.0001$ for case 3.