Using active learning and an agent-based system to perform interactive knowledge extraction based on the COVID-19 corpus

Published online by Cambridge University Press: 08 November 2023

Yao Yao*
Affiliation:
Lero–Science Foundation Ireland Research Centre for Software, Department of Computer Science and Information Systems, University of Limerick, Limerick, Ireland
Junying Liu
Affiliation:
NatPro Center, School of Pharmacy and Pharmaceutical Sciences, Trinity College Dublin, Dublin 2, Ireland
Conor Ryan
Affiliation:
Lero–Science Foundation Ireland Research Centre for Software, Department of Computer Science and Information Systems, University of Limerick, Limerick, Ireland
* Corresponding author: Yao Yao; Email: Yao.Yao@ncirl.ie

Abstract

Efficient knowledge extraction from Big Data is a challenging problem. Recognizing relevant concepts in unannotated data while taking both context and domain knowledge into account is critical to successful knowledge extraction. In this research, we present a novel platform, Active Learning Integrated with Knowledge Extraction (ALIKE), that overcomes the challenges of context awareness and concept extraction that have impeded knowledge extraction from Big Data. We propose a method that extracts related concepts from unstructured data with different contexts by using multiple agents working in synergy, reinforcement learning, and active learning.

We test ALIKE on datasets from the COVID-19 Open Research Dataset Challenge. The experimental results suggest that the ALIKE platform distinguishes inherent concepts in different papers more efficiently than a non-agent-based method without active learning, and that our approach is better able to address the challenges of knowledge extraction from heterogeneous datasets. Moreover, the techniques used in ALIKE are transferable to any domain with multidisciplinary activity.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press
Figure 1. The generic functional role of a CI agent in knowledge extraction.
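
The captions below compare CI agents that use reinforcement learning with other approaches, but they do not state the learning rule. Purely as an illustration, and not as ALIKE's implementation, the following sketch shows an epsilon-greedy agent that learns which of several hypothetical extraction strategies most often produces concepts that a simulated expert accepts. The strategy set, the 0/1 reward, the acceptance rates, and the names `EpsilonGreedyAgent` and `run` are all assumptions.

```python
import random
from typing import Callable, List


class EpsilonGreedyAgent:
    """Hypothetical agent: each 'arm' is a candidate extraction strategy,
    and the reward is 1.0 when a (simulated) expert accepts the concept."""

    def __init__(self, n_strategies: int, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.counts: List[int] = [0] * n_strategies      # times each strategy was used
        self.values: List[float] = [0.0] * n_strategies  # running mean reward per strategy
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Explore a random strategy with probability epsilon ...
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # ... otherwise exploit the best current estimate (ties go to the lowest index)
        return self.values.index(max(self.values))

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean update: Q <- Q + (r - Q) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run(agent: EpsilonGreedyAgent, reward_fn: Callable[[int], float], steps: int) -> EpsilonGreedyAgent:
    """Let the agent pick a strategy, observe the reward, and learn, `steps` times."""
    for _ in range(steps):
        arm = agent.select()
        agent.update(arm, reward_fn(arm))
    return agent


if __name__ == "__main__":
    # Hypothetical expert acceptance rates for three strategies
    accept_prob = [0.2, 0.5, 0.8]
    env = random.Random(42)
    agent = run(EpsilonGreedyAgent(3), lambda a: 1.0 if env.random() < accept_prob[a] else 0.0, 5000)
    print(agent.counts, [round(v, 2) for v in agent.values])
```

Epsilon-greedy is used here only because it is the simplest reward-driven learner; the paper's agents may use a different reinforcement learning formulation.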

Figure 2. Illustration of the active learning dialogues for domain-specific expert users.

Figure 3. The interaction between AL agents and experts.
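
To illustrate how an active-learning agent can keep expert effort low, the sketch below uses a deliberately simple, hypothetical model: each candidate concept has a one-dimensional relevance score, and the expert accepts exactly those at or above an unknown cut-off. The agent asks only about candidates whose label cannot yet be inferred from earlier answers, and it picks the median of them so that each answer settles about half. The score, the threshold model, and the name `learn_cutoff` are assumptions, not details of ALIKE.

```python
from typing import Callable, List, Tuple


def learn_cutoff(
    scores: List[float],
    expert_accepts: Callable[[float], bool],
    budget: int,
) -> Tuple[float, int]:
    """Pool-based active learning for a hypothetical threshold model.

    Assumes the expert accepts a candidate iff its score >= some unknown
    cut-off. Returns (smallest accepted score, number of expert queries);
    the first value is inf if the expert accepted nothing.
    """
    pool = sorted(set(scores))
    max_rejected = float("-inf")  # highest score the expert rejected so far
    min_accepted = float("inf")   # lowest score the expert accepted so far
    queries = 0
    while queries < budget:
        # Candidates whose label cannot yet be inferred from earlier answers
        ambiguous = [s for s in pool if max_rejected < s < min_accepted]
        if not ambiguous:
            break
        # Query the median ambiguous candidate: either answer resolves
        # roughly half of the remaining ambiguity
        query = ambiguous[len(ambiguous) // 2]
        queries += 1
        if expert_accepts(query):
            min_accepted = query
        else:
            max_rejected = query
    return min_accepted, queries


if __name__ == "__main__":
    candidates = [i / 100 for i in range(100)]  # hypothetical relevance scores
    cutoff, asked = learn_cutoff(candidates, lambda s: s >= 0.37, budget=20)
    print(cutoff, asked)  # → 0.37 7
```

Here 7 expert answers are enough to label all 100 candidates, compared with 100 answers if every candidate were shown to the expert. Real concept models are not one-dimensional, so this only demonstrates the principle.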

Figure 4. The interactive collaboration of agents in knowledge extraction. During this process, the agents also interact with experts to better validate the extracted knowledge.
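
Figure 4 describes agents that collaborate and consult experts to validate extracted knowledge, but the caption does not give a rule for combining their outputs. The following is a minimal sketch of one hypothetical rule: concepts proposed by a large enough share of agents are accepted directly, rarely proposed ones are dropped, and those in between are escalated to an expert. The thresholds, the voting scheme, the example concepts, and the function `combine_agent_proposals` are assumptions, not ALIKE's documented procedure.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple


def combine_agent_proposals(
    proposals: Dict[str, Set[str]],
    accept_share: float,
    drop_share: float,
    expert_confirms: Callable[[str], bool],
) -> Tuple[Set[str], List[str]]:
    """Merge concepts proposed by several agents (hypothetical voting rule).

    proposals maps agent name -> set of concepts that agent extracted, so
    each agent votes at most once per concept. A concept proposed by at
    least accept_share of the agents is accepted; one proposed by at most
    drop_share is discarded; anything in between is sent to the expert.
    Returns (accepted concepts, concepts escalated to the expert).
    """
    n_agents = len(proposals)
    votes = Counter(c for concepts in proposals.values() for c in concepts)
    accepted: Set[str] = set()
    escalated: List[str] = []
    for concept, count in sorted(votes.items()):
        share = count / n_agents
        if share >= accept_share:
            accepted.add(concept)
        elif share > drop_share:
            escalated.append(concept)
            if expert_confirms(concept):
                accepted.add(concept)
    return accepted, escalated


if __name__ == "__main__":
    # Hypothetical outputs of three agents on the same paper
    proposals = {
        "agent_a": {"spike protein", "ACE2", "fever"},
        "agent_b": {"spike protein", "ACE2"},
        "agent_c": {"spike protein", "remdesivir"},
    }
    accepted, escalated = combine_agent_proposals(
        proposals, accept_share=0.6, drop_share=0.2,
        expert_confirms=lambda c: c == "remdesivir",
    )
    print(sorted(accepted), escalated)
```

In this toy run, only the two concepts with a single vote reach the expert, which is the point of such a rule: agreement among agents absorbs the easy cases and expert time is spent on the contested ones.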

Figure 5. Comparison of three different approaches to concept identification in a single experiment. (a) Comparison of the three approaches on the same dataset in one run; (b) the corresponding result on another dataset in another run. The efficiency of the three approaches varies slightly with the dataset, but the overall trend is clear for all three.

Figure 6. Multiple comparisons of the three approaches over multiple runs. (a) Comparison using different datasets in different runs; (b) comparison under the same experimental conditions but with different initial knowledge. Both the dataset and the initial knowledge affect the efficiency of these approaches, although CI agents with reinforcement learning perform best in all of these cases. Separate Mann–Whitney U tests show that the average efficiency of CI agents with reinforcement learning is significantly greater (p < 0.05) than the average efficiency of the other methods.
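
The Figure 6 caption reports one-sided Mann–Whitney U tests. For readers who want to run this kind of comparison, the sketch below computes the U statistic for the first sample and a one-sided p-value from the standard normal approximation with tie and continuity corrections. The function names and the efficiency values in the example are hypothetical and are not the paper's data. For small samples an exact test is preferable; SciPy's `scipy.stats.mannwhitneyu` (with `alternative="greater"`) offers both exact and asymptotic variants.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-sided Mann-Whitney U test of H1: values in x tend to exceed those in y.

    Returns (U for x, p-value) using the normal approximation with
    tie and continuity corrections.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U = R1 - n1(n1+1)/2
    mu = n1 * n2 / 2                          # mean of U under H0
    # Tie correction: sum of (t^3 - t) over each group of t tied values
    tie_sizes: Dict[float, int] = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # all values identical: no evidence either way
    z = (u1 - mu - 0.5) / math.sqrt(var)  # continuity-corrected z score
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return u1, p


if __name__ == "__main__":
    # Hypothetical per-run efficiencies, NOT results from the paper
    rl_agents = [0.81, 0.78, 0.85, 0.80, 0.83]
    baseline = [0.70, 0.74, 0.69, 0.76, 0.72]
    u, p = mann_whitney_greater(rl_agents, baseline)
    print(u, round(p, 4))  # U = 25.0: every first-sample run beats every baseline run
```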

Figure 7. Comparison of knowledge extraction with and without active learning agents. (a) Results on dataset A; (b) results on dataset B. Active learning yields clear benefits across datasets, although the size of the benefit varies slightly.

Table 1. Datasets used in the experiments

Figure 8. Similar trends obtained with different training datasets.

Figure 9. Distribution patterns in the detection of repetitive knowledge.