
Inferring social networks from unstructured text data: A proof of concept detection of hidden communities of interest

Published online by Cambridge University Press: 26 January 2024

Christophe Malaterre*
Affiliation:
Département de philosophie, Université du Québec à Montréal (UQAM), Montréal, Québec, Canada; Centre interuniversitaire de recherche sur la science et la technologie, Université du Québec à Montréal (UQAM), Montréal, Québec, Canada
Francis Lareau
Affiliation:
Département d’informatique, Université du Québec à Montréal (UQAM), Montréal, Québec, Canada
* Corresponding author: Christophe Malaterre; Email: malaterre.christophe@uqam.ca

Abstract

Social network analysis is known to provide a wealth of insights relevant to many aspects of policymaking. Yet, the social data needed to construct social networks are not always available. Furthermore, even when they are, interpreting such networks often relies on extraneous knowledge. Here, we propose an approach to infer social networks directly from the texts produced by actors and the terminological similarities that these texts exhibit. This approach relies on fitting a topic model to the texts produced by these actors and measuring topic profile correlations between actors. This reveals what can be called “hidden communities of interest,” that is, groups of actors sharing similar semantic contents but whose social relationships with one another may be unknown or underlying. Network interpretation follows from the topic model. Diachronic perspectives can also be built by modeling the networks over different time periods and mapping genealogical relationships between communities. As a case study, the approach is deployed over a working corpus of academic articles (domain of philosophy of science; N=16,917).
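The pipeline summarized in the abstract (topic model, per-actor topic profiles, profile correlations, communities) can be illustrated with a minimal sketch. It assumes the topic model has already been fitted and that each document comes with a topic probability vector and a list of authors. The function names, the toy data, the correlation threshold of 0.7, and the use of connected components as a stand-in for community detection are all illustrative assumptions, not the authors' actual implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def actor_profiles(doc_topics, doc_authors):
    """Average the topic distributions of the documents each actor authored."""
    sums = {}
    counts = defaultdict(int)
    for topics, authors in zip(doc_topics, doc_authors):
        for author in authors:
            previous = sums.get(author, [0.0] * len(topics))
            sums[author] = [s + t for s, t in zip(previous, topics)]
            counts[author] += 1
    return {a: [s / counts[a] for s in total] for a, total in sums.items()}


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def similarity_edges(profiles, threshold=0.7):
    """Link two actors when their topic profiles correlate above the threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


def connected_components(nodes, edges):
    """Stand-in for community detection: group actors linked by any path."""
    neighbours = {n: set() for n in nodes}
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(component))
    return components


# Hypothetical toy input: five documents over three topics, with author lists.
doc_topics = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],
    [0.10, 0.20, 0.70],
    [0.75, 0.15, 0.10],
]
doc_authors = [["A"], ["B"], ["C"], ["D"], ["A", "B"]]

profiles = actor_profiles(doc_topics, doc_authors)
edges = similarity_edges(profiles)
communities = connected_components(profiles, edges)
print(communities)
```

In this toy example, actors A and B write mostly about the first topic and C and D about the third, so the sketch groups them into two communities without using any explicit social tie such as co-authorship or citation. Each community can then be interpreted by averaging the topic profiles of its members.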

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Figure 1. Approach to identify hidden communities of interest (HCoIs) from a corpus of text data.

Table 1. Document distributions depending on sources (journals)

Figure 2. Authors and articles.

Figure 3. Cv coherence of topic models as a function of the number k of topics.

Table 2. Topics and keywords

Figure 4. Inter-topic similarity measures (1 − Hellinger distance).

Table 3. Network statistics per time period

Figure 5. (a) Actor communities and (b) their topic profiles 1930–1951.

Figure 6. (a) Actor communities and (b) their topic profiles 1952–1973.

Figure 7. (a) Actor communities and (b) their topic profiles 1974–1995.

Figure 8. (a) Actor communities and (b) their topic profiles 1996–2017.

Figure 9. Community distances (Hellinger distances between community topic probability distributions): (a) between the communities of the first period (1930–1951) and those of the second (1952–1973); (b) between the communities of the second period and those of the third (1974–1995); (c) between the communities of the third period and those of the fourth (1996–2017).
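The Hellinger distance named in the captions compares two probability distributions and ranges from 0 (identical) to 1 (no overlap). The sketch below uses the standard definition and shows one plausible way to link each community of a later period to its closest community in the preceding period. The community labels and topic distributions are hypothetical, and the nearest-predecessor rule is an assumption made for illustration, not necessarily the mapping used in the article.

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))) / sqrt(2)


def closest_predecessors(earlier, later):
    """For each later community, return the earlier community at minimal distance."""
    links = {}
    for name, dist in later.items():
        best = min(earlier, key=lambda prev: hellinger(earlier[prev], dist))
        links[name] = (best, hellinger(earlier[best], dist))
    return links


# Hypothetical community topic distributions for two consecutive periods.
period_1 = {"P1-a": [0.7, 0.2, 0.1], "P1-b": [0.1, 0.2, 0.7]}
period_2 = {"P2-a": [0.6, 0.3, 0.1], "P2-b": [0.2, 0.1, 0.7]}

links = closest_predecessors(period_1, period_2)
for community, (ancestor, distance) in links.items():
    print(f"{community} <- {ancestor} (Hellinger distance {distance:.3f})")
```

A similarity score such as the one in Figure 4 follows directly as `1 - hellinger(p, q)`.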
