
Research topic evolution: a comparative analysis of human and machine approaches

Published online by Cambridge University Press: 27 August 2025

Siyi Xiao*
Affiliation:
Texas A&M University, USA
Daniel A. McAdams
Affiliation:
Texas A&M University, USA

Abstract:

Exploring patterns in large text corpora is essential for effective knowledge discovery in research domains. However, machine-driven methods often introduce noise and rely heavily on parameter thresholds, so human expertise remains essential for ensuring reliable outcomes. This study conducts a comparative analysis of a classification task performed by both humans and computer algorithms. In the task, human experts are asked to categorize a list of abstracts based on their semantic content, while computer algorithms group the same abstracts using computations that include network analysis and document embeddings. The findings show a significant level of disagreement between the human and computer-generated clusters, indicating the need to investigate further the factors that influence community categorization and to incorporate more advanced techniques to improve the results.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s) 2025

Table 1. Summary of computational methods for topic analysis


Table 2. Categorization strategies used by each researcher


Figure 1. The 2018 network uses Doc2Vec with a threshold of 0.85
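The network in Figure 1 is built from Doc2Vec document vectors with a threshold of 0.85. A minimal sketch of that idea, under stated assumptions: each abstract already has an embedding vector (the toy 3-D vectors below are hypothetical stand-ins for Doc2Vec output), the threshold is applied to pairwise cosine similarity, and communities are taken as connected components. Connected components is a deliberately simple stand-in for community detection; the authors' actual algorithm is not given in this section.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def threshold_graph(vectors, threshold):
    """Adjacency sets: link two documents if their cosine similarity >= threshold."""
    adj = {doc: set() for doc in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Group nodes into communities with an iterative depth-first search."""
    seen, communities = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(sorted(component))
    return communities


# Hypothetical toy embeddings standing in for Doc2Vec vectors of five abstracts.
docs = {
    "d1": [1.0, 0.1, 0.0],
    "d2": [0.9, 0.2, 0.0],
    "d3": [0.0, 1.0, 0.1],
    "d4": [0.1, 0.9, 0.0],
    "d5": [0.0, 0.0, 1.0],
}

print(connected_components(threshold_graph(docs, 0.85)))
# -> [['d1', 'd2'], ['d3', 'd4'], ['d5']]
print(connected_components(threshold_graph(docs, 0.15)))
# -> [['d1', 'd2', 'd3', 'd4'], ['d5']]
```

Lowering the threshold from 0.85 to 0.15 merges two communities into one, a small illustration of how strongly machine-generated groupings can depend on the chosen parameter threshold.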


Table 3. Community labels and three selected document titles for each community


Figure 2. Comparison of document categorization results from five researchers and the computer algorithms
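Figure 2 compares human categorizations with algorithmic ones, and the abstract reports significant disagreement between them. One standard way to put a number on agreement between two partitions whose group names differ is the adjusted Rand index (ARI), which is 1 for identical groupings and near 0 for chance-level agreement. The source does not say which agreement measure the authors used, so this is an illustrative choice, and the example labels below are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same documents")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    pair_counts = Counter(zip(labels_a, labels_b))  # contingency table n_ij
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case: both labelings are all singletons or all one group.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical labels: one researcher's categories vs. algorithm communities.
human = ["A", "A", "B", "B", "C", "C"]
machine = [0, 0, 1, 1, 1, 2]
print(adjusted_rand_index(human, machine))  # approximately 0.444
```

Because ARI only looks at which documents share a group, renaming the groups (for example "A" to 0) does not change the score, which suits comparisons between researchers who chose their own category names.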


Figure 3. Sankey diagram for the Doc2Vec model with a threshold of 0.825
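A Sankey diagram like Figure 3 draws each link with a width proportional to how many documents flow from a group in one partition to a group in another. The caption does not state which two partitions are connected; this minimal sketch assumes, hypothetically, that one side is a set of human categories and the other is the communities found by the Doc2Vec model, and it only computes the link weights, leaving rendering to whatever plotting tool is used.

```python
from collections import Counter


def sankey_links(source_labels, target_labels):
    """Count documents flowing from each source group to each target group.

    Returns (source, target, count) triples sorted by descending count,
    a common input shape for the links of a Sankey diagram.
    """
    if len(source_labels) != len(target_labels):
        raise ValueError("both labelings must cover the same documents")
    flows = Counter(zip(source_labels, target_labels))
    return sorted(
        ((src, tgt, n) for (src, tgt), n in flows.items()),
        key=lambda link: (-link[2], str(link[0]), str(link[1])),
    )


# Hypothetical example: human categories (left) vs. model communities (right).
human = ["A", "A", "B", "B", "C", "C"]
communities = ["c0", "c0", "c1", "c1", "c1", "c2"]
print(sankey_links(human, communities))
# -> [('A', 'c0', 2), ('B', 'c1', 2), ('C', 'c1', 1), ('C', 'c2', 1)]
```

A human category that splits into several thin links (here "C") or a community fed by several categories (here "c1") is exactly the kind of disagreement such a diagram makes visible.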