Enhancing joint ideal point estimation strategies with Twitter social networks and text data

Published online by Cambridge University Press: 28 May 2026

Yicheng Shen
Affiliation:
Biostatistics, Johns Hopkins University, USA
Christopher Bail
Affiliation:
Sociology, Duke University, USA
Alexander Volfovsky*
Affiliation:
Statistical Science, Duke University, USA
*Corresponding author: Alexander Volfovsky; Email: alexander.volfovsky@duke.edu

Abstract

Analyses of online social interactions on major platforms have become central to evaluating the ideological leanings of users. Social scientists often represent individuals’ policy preferences as ideal points in a low-dimensional space. We first demonstrate that standard ideal-point estimation approaches can be temporally inconsistent by examining the evolution of ideological positions for political elites and non-elite Twitter users from 2017 to 2022. To address this instability and to improve overall measurement quality, we augment the social-network data conventionally used for ideal-point estimation with unstructured text data widely available on social media. Our results indicate that this data-augmented approach improves identification at the extreme ends of the ideological spectrum and yields robust estimates consistent with those from more complex word-embedding techniques, while offering greater computational efficiency and interpretability. This framework can enhance future studies that measure political ideology using behavioral data and inform broader discussions about integrating textual and network information.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use.
Copyright
© The Author(s), 2026. Published by Cambridge University Press

Figure 1. The ideological spectrum often serves as a tool for understanding and categorizing political positions. Ideal points are usually harder to estimate accurately for individuals who either have sparse data and are ambivalent about their political beliefs or, conversely, have popular connections on both sides and hold extreme political beliefs.

Algorithm 1. Data-Augmented (DA) Ideal-Point Estimation

Figure 2. Simulation evidence supporting the data-augmented estimator. The top row shows all users, illustrating how adding more sources, from $\textbf{A}_1\textbf{A}_1^{\top }$ to $\sum _{s=1}^{10}\textbf{A}_s\textbf{A}_s^{\top }$, improves alignment with the true latent ideal points $\textbf{U}$. The bottom row focuses on the upper tail (the 25 most extreme users), showing that the aggregated estimator better preserves the ordering of ideological extremes (highlighted in red).
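The aggregation described in this caption can be illustrated with a small simulation. The sketch below is not the authors' Algorithm 1. It assumes a hypothetical one-dimensional latent trait `u`, ten noisy source matrices of the form $\textbf{A}_s = \textbf{u}\textbf{w}_s^{\top} + \text{noise}$, and uses the leading eigenvector of the summed Gram matrix as the estimate. All names, sizes, and noise levels are illustrative assumptions.

```python
import numpy as np

def leading_eigvec(G):
    """Unit eigenvector of the symmetric matrix G with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(G)  # eigh returns eigenvalues in ascending order
    return vecs[:, -1]

def simulate(n_users=200, n_features=30, n_sources=10, noise=5.0, seed=0):
    """Hypothetical generative model: each source is a noisy rank-one view of u."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n_users)  # assumed true latent ideal points
    sources = []
    for _ in range(n_sources):
        w = rng.normal(size=n_features)  # source-specific loadings
        A = np.outer(u, w) + noise * rng.normal(size=(n_users, n_features))
        sources.append(A)
    return u, sources

def abs_corr(a, b):
    """Absolute Pearson correlation (an eigenvector's sign is arbitrary)."""
    return abs(np.corrcoef(a, b)[0, 1])

def top_k_overlap(u, est, k=25):
    """Share of the k largest true values that are also among the k largest estimates."""
    est = est * np.sign(np.dot(u, est))  # align sign with the truth (simulation only)
    return len(set(np.argsort(u)[-k:]) & set(np.argsort(est)[-k:])) / k

u, sources = simulate()
G_one = sources[0] @ sources[0].T          # A_1 A_1^T
G_all = sum(A @ A.T for A in sources)      # sum_s A_s A_s^T
est_one, est_all = leading_eigvec(G_one), leading_eigvec(G_all)

corr_one, corr_all = abs_corr(u, est_one), abs_corr(u, est_all)
print(f"correlation with u: one source {corr_one:.3f}, ten sources {corr_all:.3f}")
print(f"top-25 overlap: one source {top_k_overlap(u, est_one):.2f}, "
      f"ten sources {top_k_overlap(u, est_all):.2f}")
```

Summing the Gram matrices accumulates the shared rank-one signal across sources while the independent noise averages out, which is one way to read why the aggregated estimator tracks the latent positions more closely in this toy setting.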

Figure 3. Point estimates of 704 participants’ ideal points, obtained using the CA method with different reference data. Points are colored according to users’ self-reported party affiliations (blue = Democrat, red = Republican).

Table 1. Selected elites’ Twitter accounts, with their ideal-point estimates and numbers of followers based on network data collected in 2015 and 2022

Figure 4. Density plots of estimated ideal points for 933 elites’ Twitter accounts, grouped by their party affiliation. The CA estimator (blue) uses only network information; the word-embedding estimator (yellow) uses word2vec-derived textual embeddings only; and the DA estimator (pink) integrates network and text data. The DA method produces the clearest separation between Democratic and Republican elites, indicating improved identification of ideological extremes.

Figure 5. Estimated ideal points of political elites before and after adjusting for text data, with their true political alignment indicated by color. Some well-known elites whose estimated ideal points change substantially under the new method are named and highlighted. Elites with extreme ideologies are more distinct under the new estimation method, whereas CA estimation does not clearly identify true extremes.

Figure 6. Participants’ ideal points estimated by the CA method. Ideal points are generally consistent with the party alignments reported by users during the experiment.

Figure 7. Participants’ ideal points estimated by the DA method at various time points. The distributions remained stable during 2017 but changed notably in 2022: the circled boxplots show that the ideal points of weakly aligned Republicans spread across both positive and negative values.

Figure 8. Estimated changes in ideal points for Democrats and Republicans over time, contrasted with the theoretical changes expected under no treatment effect if the parallel-trends assumption holds.

Figure 9. Age distribution across non-estimable, missing, and analyzed participants.

Figure 10. Illustration of non-estimable, missing, and analyzed participants. (a) Party identification, (b) Party strength, (c) Twitter activity, (d) Twitter usage.

Figure 11. Full simulation results examining the use of different eigenvectors when extracting multiple sources of characteristics.

Figure 12. Density plots of all 704 participants’ ideal points, estimated using the correspondence analysis method with reference data on elites collected in 2017 and 2022.

Figure 13. Two-dimensional t-SNE projection of user-level TF-IDF–SVD embeddings. The projection reveals clear clustering by political alignment, indicating that textual features learned from users’ tweet content encode meaningful ideological signals.
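For readers unfamiliar with the TF-IDF–SVD embeddings named in this caption, the following is a minimal sketch of that kind of pipeline. It is not the authors' preprocessing: the four toy users, the whitespace tokenizer, the smoothed IDF formula, and the choice of two components are all assumptions made for illustration, and the final t-SNE projection is omitted to keep the example dependent on NumPy only.

```python
import numpy as np

# Hypothetical toy corpus: one concatenated "tweet history" per user.
docs = {
    "user_a": "tax cuts border security tax",
    "user_b": "border security freedom tax cuts",
    "user_c": "climate healthcare union climate",
    "user_d": "healthcare union climate justice",
}

def tfidf_matrix(texts):
    """Row-normalized TF-IDF matrix (users x vocabulary) from whitespace tokens."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.split():
            counts[i, index[w]] += 1
    tf = counts / counts.sum(axis=1, keepdims=True)    # term frequency per user
    df = (counts > 0).sum(axis=0)                      # document frequency per word
    idf = np.log((1 + len(texts)) / (1 + df)) + 1      # smoothed IDF (an assumption)
    X = tf * idf
    return X / np.linalg.norm(X, axis=1, keepdims=True), vocab

def svd_embed(X, k=2):
    """Project rows of X onto the top-k left singular directions (scaled)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

X, vocab = tfidf_matrix(list(docs.values()))
emb = svd_embed(X, k=2)

names = list(docs)
dist = lambda i, j: float(np.linalg.norm(emb[i] - emb[j]))
print("a-b distance:", round(dist(0, 1), 3), "| a-c distance:", round(dist(0, 2), 3))
```

In this toy corpus, users who share vocabulary land close together in the reduced space, which is the property a downstream two-dimensional projection would then visualize.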

Figure 14. Lexical polarization by extremity. Bars show the top words (by magnitude of the lexical polarization index) that distinguish extremists (top 10%) from moderates (bottom 50%). Extremist discourse concentrates on partisan media and identity-laden topics (e.g., FBI, Donald, FOX, GOP), while moderate discourse emphasizes institutional and policy language (e.g., communities, veterans, bipartisan, legislation, infrastructure).

Figure 15. Text–network coupling by ideological extremity. Each box shows the distribution of per-elite correlations between text-based and network-based similarity matrices. Higher values indicate stronger alignment between linguistic and structural neighborhoods. Moderates display the highest coupling, suggesting that their language reflects broader network connections, whereas extremists show weaker coupling, indicating discourse that diverges from network ties and contributes to greater ideological differentiation.
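The per-elite coupling in this caption can be sketched as a row-wise correlation between two similarity matrices. The caption does not state which similarity measure or correlation coefficient was used, so the cosine similarity and Pearson correlation below are assumptions, as are the simulated features, the function names, and the median split on a hypothetical extremity score.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    Xn = X / norms
    return Xn @ Xn.T

def per_elite_coupling(S_text, S_net):
    """For each row i, correlate text and network similarities to all others j != i."""
    n = S_text.shape[0]
    out = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # drop the self-similarity on the diagonal
        out[i] = np.corrcoef(S_text[i, mask], S_net[i, mask])[0, 1]
    return out

# Simulated stand-ins for text and network features that share a latent structure Z.
rng = np.random.default_rng(1)
Z = rng.normal(size=(40, 3))
text_feats = Z + 0.1 * rng.normal(size=Z.shape)
net_feats = Z + 0.1 * rng.normal(size=Z.shape)

S_text = cosine_similarity_matrix(text_feats)
S_net = cosine_similarity_matrix(net_feats)
coupling = per_elite_coupling(S_text, S_net)

# Hypothetical extremity score: distance from zero on the first latent dimension.
extremity = np.abs(Z[:, 0])
is_extreme = extremity > np.median(extremity)
print("mean coupling, less extreme half:", round(float(coupling[~is_extreme].mean()), 3))
print("mean coupling, more extreme half:", round(float(coupling[is_extreme].mean()), 3))
```

Excluding the diagonal matters because every unit has similarity one with itself in both matrices, which would otherwise inflate each row's correlation.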