Knowledge-based Reasoning and Learning under Partial Observability in Ad Hoc Teamwork

Published online by Cambridge University Press:  26 June 2023

HASRA DODAMPEGAMA and MOHAN SRIDHARAN
Affiliation:
Intelligent Robotics Lab, School of Computer Science, University of Birmingham, UK (e-mails: hhd968@student.bham.ac.uk, m.sridharan@bham.ac.uk)

Abstract

Ad hoc teamwork (AHT) refers to the problem of enabling an agent to collaborate with teammates without prior coordination. State-of-the-art methods in AHT are data-driven, using a large labeled dataset of prior observations to model the behavior of other agent types and to determine the ad hoc agent’s behavior. These methods are computationally expensive, lack transparency, and make it difficult to adapt to previously unseen changes. Our recent work introduced an architecture that determined an ad hoc agent’s behavior based on non-monotonic logical reasoning with prior commonsense domain knowledge and models learned from limited examples to predict the behavior of other agents. This paper describes KAT, a knowledge-driven architecture for AHT that substantially expands our prior architecture’s capabilities to support: (a) online selection, adaptation, and learning of the behavior prediction models; and (b) collaboration with teammates in the presence of partial observability and limited communication. We illustrate and experimentally evaluate KAT’s capabilities in two simulated benchmark domains for multiagent collaboration: Fort Attack and Half Field Offense. We show that KAT’s performance is better than a purely knowledge-driven baseline, and comparable with or better than a state-of-the-art data-driven baseline, particularly in the presence of limited training data, partial observability, and changes in team composition.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Fig. 1. Screenshots: (a, b) Fort Attack environment; (c, d) Half Field Offense environment.

Fig. 2. Our KAT architecture combines complementary strengths of knowledge-based and data-driven heuristic reasoning and learning.

Table 1. Attributes considered for models of other agents’ behavior in the FA domain. The number of attributes is the number of variables in each attribute times the number of agents

Table 2. Attributes for models of teammates’ and defense agents’ behavior in the HFO domain. The number of attributes is the number of variables in each attribute times the number of agents

Fig. 3. Examples of FF trees for a guard and an attacker in the FA domain.

Algorithm 1: Model Selection

Table 3. Wins (%) for guards with hand-crafted policies in FA domain (Exp1). Model adaptation improves performance

Table 4. Wins (%) for guards with built-in policies in FA domain (Exp2). Model adaptation improves performance

Table 5. Wins (%) for guards with hand-crafted policies in FA domain (Exp1). Communication addresses partial observability

Table 6. Wins (%) for guards with built-in policies in FA domain (Exp2). Communication addresses partial observability

Table 7. Fraction of goals scored (i.e., games won) by the offense team in HFO domain with and without the learned behavior prediction models (Exp3). Reasoning with prior domain knowledge but without the behavior prediction models has a negative impact on performance

Table 8. Prediction accuracy of the learned agent behavior models in limited (2v2) version of the HFO domain (Exp4)

Table 9. Prediction accuracy of the learned agent behavior models in full (4v5) version of the HFO domain (Exp5)

Table 10. Fraction of goals scored (i.e., games won) by the offense team in the HFO domain in the limited version (2v2, Exp4) and full version (4v5, Exp5). KAT’s performance is comparable with the baselines in the limited version and much better than the baselines in the full version

Table 11. Goals scored (i.e., games won) by the offense team in the HFO domain under partial observability (Exp6, Exp7). KAT’s performance is comparable with that of a baseline that had no ad hoc agents in the team but used training datasets that were orders of magnitude larger