Modeling Latent Topics in Social Media using Dynamic Exploratory Graph Analysis: The Case of the Right-wing and Left-wing Trolls in the 2016 US Elections

Published online by Cambridge University Press:  01 January 2025

Hudson Golino*, University of Virginia
Alexander P. Christensen, University of Pennsylvania
Robert Moulder, University of Virginia
Seohyun Kim, University of Virginia
Steven M. Boker, University of Virginia

*Correspondence should be made to Hudson Golino, University of Virginia, Charlottesville, USA. Email: hfg9s@virginia.edu

Abstract

The past few years have been marked by an increase in offensive online strategies used by state and non-state actors to promote their political agendas, sow discord, and question the legitimacy of democratic institutions in the US and Western Europe. In 2016, the US Congress identified a list of Russian state-sponsored Twitter accounts that were used to try to divide voters on a wide range of issues. Previous research used latent Dirichlet allocation (LDA) to estimate latent topics in data extracted from these accounts. However, LDA has characteristics that may limit its effectiveness on social media data: the number of latent topics must be specified by the user, the topics can be difficult to interpret, and the model does not capture short-term temporal dynamics. In the current paper, we propose a new method for estimating latent topics in social media texts, termed Dynamic Exploratory Graph Analysis (DynEGA). In a Monte Carlo simulation, we compared the ability of DynEGA and LDA to estimate the number of simulated latent topics. The results show that DynEGA is substantially more accurate than several different LDA algorithms at estimating the number of simulated topics. In an applied example, we used DynEGA to analyze a large dataset of Twitter posts from state-sponsored right- and left-wing trolls during the 2016 US presidential election. DynEGA revealed topics pertinent to several consequential events in the election cycle, demonstrating the coordinated effort of trolls to capitalize on current events in the USA. This example demonstrates the potential of our approach for revealing temporally relevant information in qualitative text data.

Information

Type: Application Reviews and Case Studies
Creative Commons: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Copyright © 2021 The Author(s)
Table 1. Matrix D for individual ID1

Figure 1. Mean accuracy per network method used in the DynEGA technique, magnitude of the loadings (x-axis), number of time points (vertical facets), and magnitude of measurement error (horizontal facets)
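The abstract judges DynEGA by how accurately it recovers the number of simulated topics. A minimal sketch of that accuracy summary, assuming accuracy is the proportion of replications in a condition whose estimated number of topics equals the simulated number; the record layout and the function name `accuracy_by_condition` are hypothetical.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Proportion of replications per condition that recover the simulated number of topics.

    Assumption: each record is (condition, true_k, estimated_k), where condition is any
    hashable key, e.g. a hypothetical (method, loading, time_points, error) tuple.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, true_k, est_k in records:
        totals[condition] += 1
        hits[condition] += int(true_k == est_k)  # 1 when the count is recovered exactly
    return {c: hits[c] / totals[c] for c in totals}
```

Averaging these proportions over replications within each cell of the design gives one point per method, loading, time-point, and error combination.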

Figure 2. Mean normalized mutual information per network method used in the DynEGA technique, magnitude of the loadings (x-axis), number of time points (vertical facets), and magnitude of measurement error (horizontal facets)
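Normalized mutual information (NMI) is assumed here to compare the simulated and estimated assignments of words to topics. NMI has several normalizations in the literature; this sketch uses the arithmetic-mean form 2 I(A;B) / (H(A) + H(B)), which may differ from the variant used in the paper. Function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a cluster assignment."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A;B) = sum p(x,y) log[p(x,y) / (p(x) p(y))], computed from raw counts."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    return sum(c / n * log(c * n / (ca[x] * cb[y])) for (x, y), c in joint.items())


def nmi(true_labels, est_labels):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    ha, hb = entropy(true_labels), entropy(est_labels)
    if ha + hb == 0:
        return 1.0  # convention chosen here: two single-cluster partitions agree fully
    return 2 * mutual_information(true_labels, est_labels) / (ha + hb)
```

Because NMI depends only on which items share a cluster, relabeling the estimated topics (for example calling topic 1 "5") leaves the score unchanged.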

Figure 3. Mean correlation between simulated and estimated topic scores per network method used in the DynEGA technique, magnitude of the loadings (x-axis), number of time points (vertical facets), and magnitude of measurement error (horizontal facets)

Figure 4. Mean accuracy per method in the ordered categorical data condition, by magnitude of the loadings (x-axis), number of time points (vertical facets), and magnitude of measurement error (horizontal facets)

Figure 5. Mean normalized mutual information for the ordered categorical data condition per network method used in the DynEGA technique, magnitude of the loadings (x-axis), number of time points (vertical facets), and magnitude of measurement error (horizontal facets)

Figure 6. Mean correlation between simulated and estimated topic scores for the ordered categorical data condition per network method used in the DynEGA technique, magnitude of the loadings (x-axis), number of time points (vertical facets), and magnitude of measurement error (horizontal facets)

Figure 7. Mean accuracy per method, magnitude of the loadings (x-axis), number of embedded dimensions (vertical facets), and data type (horizontal facets)

Figure 8. Mean normalized mutual information per method, magnitude of the loadings (x-axis), number of embedded dimensions (vertical facets), and data type (horizontal facets)
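The number of embedded dimensions sets how many consecutive observations enter each local window before derivatives are estimated. A minimal sketch of time-delay embedding followed by derivative estimation, assuming the derivatives come from generalized local linear approximation (GLLA), which fits a local polynomial to each embedded window by least squares. The function names and the defaults for `n_embed`, `tau`, `delta`, and `order` are illustrative, not the settings used in the paper.

```python
from math import factorial


def embed(series, n_embed, tau=1):
    """Time-delay embedding: each row holds n_embed observations spaced tau steps apart."""
    span = (n_embed - 1) * tau
    return [[series[start + j * tau] for j in range(n_embed)]
            for start in range(len(series) - span)]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def glla(series, n_embed=5, tau=1, delta=1.0, order=2):
    """Hypothetical helper (assumption: GLLA as the derivative estimator).

    Returns, for every embedded window, estimates of derivatives 0..order at the
    window centre. Loadings are L[i][k] = (offset_i * tau * delta)**k / k!, with
    offsets centred on the window; each row solves (L'L) beta = L'x.
    """
    if n_embed <= order:
        raise ValueError("n_embed must exceed order for L'L to be invertible")
    centre = (n_embed - 1) / 2
    loadings = [[((i - centre) * tau * delta) ** k / factorial(k)
                 for k in range(order + 1)] for i in range(n_embed)]
    ltl = [[sum(loadings[i][p] * loadings[i][q] for i in range(n_embed))
            for q in range(order + 1)] for p in range(order + 1)]
    estimates = []
    for row in embed(series, n_embed, tau):
        ltx = [sum(loadings[i][p] * row[i] for i in range(n_embed))
               for p in range(order + 1)]
        estimates.append(solve(ltl, ltx))
    return estimates
```

For a series with a constant slope, such as `[3 + 2 * t for t in range(8)]`, every window yields a first derivative of 2 and a second derivative of 0, up to floating-point error. A larger `n_embed` smooths more noise but leaves fewer windows, which is one reason results can vary with the number of embedded dimensions.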

Figure 9. Mean difference in accuracy and normalized mutual information between the structures with the lowest and the highest TEFI (total entropy fit index) values, per magnitude of the loadings (x-axis), network construction method (colors), and data type (horizontal facets)

Figure 10. Network structure estimated using DynEGA on the right-wing trolls' document-term matrix, showing eight topics (clusters). Nodes represent words, and edges represent Pearson correlations between the words' first-order derivatives
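The Figure 10 caption defines each edge as the Pearson correlation between two words' first-order derivatives. A minimal sketch of that edge-weighting step, assuming each word's first-derivative series has already been estimated and aligned in time; the function names, word keys, and toy series are made up for illustration. It stops at the raw correlations and does not reproduce the network estimation methods compared in the simulation or the clustering that yields the eight topics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def derivative_network(derivs):
    """Edge weight for every pair of words: correlation of their first-derivative series.

    derivs: dict mapping a word to its (hypothetical) first-derivative series.
    """
    words = sorted(derivs)
    return {(w1, w2): pearson(derivs[w1], derivs[w2])
            for i, w1 in enumerate(words) for w2 in words[i + 1:]}
```

The same function would apply to Figure 13 if each key were a topic and each series the first-order derivative of that topic's network scores.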

Figure 11. Latent trends of the topics per date

Figure 12. Latent trends of topic E (supporting Trump for President) and topic I (Terrorism, Refugees, Conservatives)

Figure 13. Network structure estimated using DynEGA on the first-order topics. Nodes represent the network scores for each topic, and edges represent Pearson correlations between the scores' first-order derivatives