Estimating the Ideology of Political YouTube Videos

Published online by Cambridge University Press:  13 February 2024

Angela Lai*
Affiliation:
Center for Social Media and Politics, New York University, New York, NY, USA; Center for Data Science, New York University, New York, NY, USA
Megan A. Brown
Affiliation:
Center for Social Media and Politics, New York University, New York, NY, USA; School of Information, University of Michigan, Ann Arbor, MI, USA
James Bisbee
Affiliation:
Center for Social Media and Politics, New York University, New York, NY, USA
Joshua A. Tucker
Affiliation:
Center for Social Media and Politics, New York University, New York, NY, USA; Politics Department, New York University, New York, NY, USA
Jonathan Nagler
Affiliation:
Center for Social Media and Politics, New York University, New York, NY, USA; Politics Department, New York University, New York, NY, USA
Richard Bonneau
Affiliation:
Center for Social Media and Politics, New York University, New York, NY, USA; Center for Data Science, New York University, New York, NY, USA; Computer Science Department, New York University, New York, NY, USA; Department of Biology, New York University, New York, NY, USA
*Corresponding author: Angela Lai; Email: csmap@nyu.edu

Abstract

We present a method for estimating the ideology of political YouTube videos. The subfield of estimating ideology as a latent variable has often focused on traditional actors such as legislators, while more recent work has used social media data to estimate the ideology of ordinary users, political elites, and media sources. We build on this work to estimate the ideology of a political YouTube video. First, we start with a matrix of political Reddit posts linking to YouTube videos and apply correspondence analysis to place those videos in an ideological space. Second, we train a language model with those estimated ideologies as training labels, enabling us to estimate the ideologies of videos not posted on Reddit. These predicted ideologies are then validated against human labels. We demonstrate the utility of this method by applying it to the watch histories of survey respondents to evaluate the prevalence of echo chambers on YouTube in addition to the association between video ideology and viewer engagement. Our approach gives video-level scores based only on supplied text metadata, is scalable, and can be easily adjusted to account for changes in the ideological landscape.
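The first step described above, placing videos in an ideological space from a subreddit-by-video link matrix, can be illustrated with a minimal sketch. It uses reciprocal averaging, a classical iterative procedure that converges to the first non-trivial axis of correspondence analysis; it is not the authors' implementation. The function name `first_ca_axis`, the toy count matrix, and the fixed iteration count are all assumptions made for illustration, and the sign of the resulting axis is arbitrary.

```python
def first_ca_axis(counts, n_iter=200):
    """Approximate the first correspondence analysis axis by reciprocal averaging.

    counts[i][j] is the number of posts in subreddit i linking to video j
    (hypothetical toy input; every row and column must have a nonzero total).
    Returns (row_scores, col_scores); the sign of the axis is arbitrary.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    row_mass = [sum(row) for row in counts]
    col_mass = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_mass)

    # Any non-constant starting vector works; this choice is arbitrary.
    col_scores = [float(j) for j in range(n_cols)]
    row_scores = [0.0] * n_rows
    for _ in range(n_iter):
        # Each subreddit score is the weighted average of the videos it links to.
        row_scores = [
            sum(counts[i][j] * col_scores[j] for j in range(n_cols)) / row_mass[i]
            for i in range(n_rows)
        ]
        # Each video score is the weighted average of the subreddits linking to it.
        col_scores = [
            sum(counts[i][j] * row_scores[i] for i in range(n_rows)) / col_mass[j]
            for j in range(n_cols)
        ]
        # Center and rescale (mass-weighted) to remove the trivial constant solution.
        mean = sum(c * s for c, s in zip(col_mass, col_scores)) / total
        col_scores = [s - mean for s in col_scores]
        sd = (sum(c * s * s for c, s in zip(col_mass, col_scores)) / total) ** 0.5
        col_scores = [s / sd for s in col_scores]
    return row_scores, col_scores


# Toy data: two subreddits mostly share videos 0-1, two others mostly share videos 3-4.
toy_counts = [
    [3, 2, 1, 0, 0],
    [2, 3, 0, 0, 0],
    [0, 0, 1, 2, 3],
    [0, 0, 0, 3, 2],
]
row_scores, col_scores = first_ca_axis(toy_counts)
print([round(s, 2) for s in col_scores])
```

In this toy example, videos linked from the same group of subreddits end up on the same side of the axis, which is the property that lets the scores be read as positions on a shared dimension.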

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of The Society for Political Methodology

Figure 1 A schematic of our overall method for ideology estimation from cross-platform links.

Figure 2 Subreddits arranged from most liberal (left) to most conservative (right) according to their ideology scores calculated via correspondence analysis. Subreddits are sized by the number of YouTube videos posted in that subreddit in our data set.

Figure 3 (a) For each channel in our data set labeled by Hosseinmardi et al. (2020), we calculate the mean of its videos’ ideology scores and plot it under the corresponding channel label. (b) For each of the five channel label categories, we calculate the mean number of views for channels. Then, for each label, we take the three channels with the highest number of mean views and at least 50 videos in our data set and plot the corresponding box plots for a total of 15 plots.

Figure 4 The text model scores perform similarly to the correspondence analysis scores (see the Supplementary Material) when compared to human-labeled data. In (a), the score distance is the absolute value of the difference between the ideology scores of two videos. Percent agreement is the percentage of labeled video pairs where the ideology scores aligned with the label and is calculated for videos falling within each score distance bin. In (b), each point is a labeled video pair, where the x-coordinate represents the score distance and the binary y-coordinate is whether the ideology scores of the videos agreed with the human label. We fit a probit regression to these points and find that the predicted probability of agreement increases with score distance.
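The percent-agreement calculation in panel (a) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it assumes each human label records which video of a pair was judged more conservative and that higher scores mean more conservative. The function name `agreement_by_distance`, the bin width, and the toy pairs are hypothetical.

```python
from collections import defaultdict


def agreement_by_distance(pairs, bin_width=0.5):
    """Percent agreement between score ordering and human labels, per distance bin.

    pairs: iterable of (score_a, score_b, human_says_a_more_conservative).
    Assumption: higher score = more conservative. Returns {bin_start: percent}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for score_a, score_b, human_says_a in pairs:
        distance = abs(score_a - score_b)        # score distance, as in the caption
        bin_index = int(distance // bin_width)   # which distance bin the pair falls in
        model_says_a = score_a > score_b         # ordering implied by the scores
        totals[bin_index] += 1
        hits[bin_index] += int(model_says_a == human_says_a)
    return {b * bin_width: 100.0 * hits[b] / totals[b] for b in sorted(totals)}


# Toy labeled pairs (made-up numbers, for illustration only).
toy_pairs = [
    (-1.0, -0.8, True),   # close scores, ordering disagrees with the label
    (0.1, 0.4, False),    # close scores, ordering agrees
    (-1.2, 0.9, False),   # distant scores, ordering agrees
    (1.5, -0.7, True),    # distant scores, ordering agrees
]
agreement = agreement_by_distance(toy_pairs)
print(agreement)
```

Binning by distance makes it easy to check whether pairs that the scores place far apart are also the pairs on which the scores and the human labels agree most often.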

Figure 5 The average of the ideological bin placements by human coders versus text-based ideology scores for videos where coders do not have significant cross-aisle disagreement. A local polynomial regression fit is overlaid to show the overall trend.

Figure 6 (a) We show the ideology distribution of videos viewed by Republicans, Independents, and Democrats, with overlaid box plots indicating the IQR and median. (b) For respondents who viewed at least five political videos, we plot the median of the ideologies of the videos in their watch history. Lines denote the IQR of those ideologies. Note that for both of these figures, we removed duplicates at the respondent-video level since some respondents watched the same video more than once.
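The per-respondent summary in panel (b) can be sketched as below, under stated assumptions: watch histories arrive as (respondent, video) records, video scores are available in a lookup table, and quartiles are computed with the default method of Python's `statistics.quantiles`, which may differ from the quartile convention used for the published figure. The function name `summarize_watch_histories` and the toy data are hypothetical.

```python
from statistics import median, quantiles


def summarize_watch_histories(views, video_scores, min_videos=5):
    """Median and quartiles of video ideology per respondent.

    views: iterable of (respondent_id, video_id), possibly with repeats.
    video_scores: dict mapping video_id to an ideology score.
    Respondents with fewer than min_videos distinct scored videos are dropped.
    """
    # Remove duplicates at the respondent-video level.
    unique_videos = {}
    for respondent, video in views:
        unique_videos.setdefault(respondent, set()).add(video)

    summary = {}
    for respondent, videos in unique_videos.items():
        scores = [video_scores[v] for v in videos if v in video_scores]
        if len(scores) < min_videos:
            continue
        q1, _, q3 = quantiles(scores, n=4)  # quartile cut points (default method)
        summary[respondent] = {"median": median(scores), "q1": q1, "q3": q3}
    return summary


# Toy data (made-up identifiers and scores).
toy_scores = {"v1": -1.0, "v2": -0.5, "v3": 0.0, "v4": 0.5, "v5": 1.0}
toy_views = [
    ("r1", "v1"), ("r1", "v2"), ("r1", "v3"), ("r1", "v4"), ("r1", "v5"),
    ("r1", "v1"),                                # repeat view, counted once
    ("r2", "v1"), ("r2", "v2"), ("r2", "v3"),    # fewer than five videos, dropped
]
watch_summary = summarize_watch_histories(toy_views, toy_scores)
print(watch_summary)
```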

Figure 7 We plot the ideology score of political videos from users’ watch histories against metrics of popularity and engagement. Generalized additive models are fit to the data and plotted on top of the data points.

Supplementary material: File

Lai et al. supplementary material (File, 5.3 MB)