
A Framework for the Unsupervised and Semi-Supervised Analysis of Visual Frames

Published online by Cambridge University Press:  23 October 2023

Michelle Torres*
Affiliation:
Assistant Professor, Department of Political Science, University of California, Los Angeles, Los Angeles, CA, USA.

Abstract

This article introduces to political science a framework for analyzing the content of visual material through unsupervised and semi-supervised methods. It details the implementation of a tool from the computer vision field, the Bag of Visual Words (BoVW), for defining and extracting “tokens” that allow researchers to build an Image-Visual Word Matrix, which emulates the Document-Term Matrix in text analysis. This reduction technique is the basis for several tools familiar to social scientists, such as topic models, that permit exploratory and semi-supervised analysis of images. The framework offers gains in transparency, interpretability, and inclusion of domain knowledge relative to other deep learning techniques. I illustrate the scope of the BoVW by estimating a novel visual structural topic model that focuses substantively on the identification of visual frames in pictures of the migrant caravan from Central America.
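
To make the Image-Visual Word Matrix concrete, here is a minimal Python sketch of the general idea, not the article's implementation. It assumes each image has already been reduced to a set of local descriptor vectors; the toy descriptors below are random numbers. It uses a plain k-means for the clustering step, and k-means is an assumption, since the abstract does not name a clustering algorithm. The function names (`kmeans`, `assign`, `image_visual_word_matrix`) and all parameter values are hypothetical.

```python
import numpy as np

def assign(points, centroids):
    """Return the index of the nearest centroid (visual word) per descriptor."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct descriptors chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def image_visual_word_matrix(descriptors_per_image, k, seed=0):
    """Count how often each visual word occurs in each image."""
    # The vocabulary is learned from the descriptors of all images pooled.
    vocab = kmeans(np.vstack(descriptors_per_image), k, seed=seed)
    ivwm = np.zeros((len(descriptors_per_image), k), dtype=int)
    for i, desc in enumerate(descriptors_per_image):
        ivwm[i] = np.bincount(assign(desc, vocab), minlength=k)
    return ivwm, vocab

# Toy input (hypothetical): three "images" with 30, 45, and 25 descriptors
# of dimension 16, drawn at random purely for illustration.
rng = np.random.default_rng(42)
toy_descriptors = [rng.normal(size=(n, 16)) for n in (30, 45, 25)]
ivwm, vocab = image_visual_word_matrix(toy_descriptors, k=5)
```

Each row of `ivwm` plays the role of a document row in a Document-Term Matrix: one row per image, one column per visual word, with counts as entries. A matrix of this shape is what a topic model would take as input.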

Information

Type: Article
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology

Figure 1 Workflow for building an Image-Visual Word Matrix.

Figure 2 Location of key points. AP Photo/Ramon Espinosa.

Figure 3 Computing pixel intensity changes in the neighborhood of a key point.

Figure 4 Representation of the neighborhood of the key point with histograms. Note: The x-axis in each histogram plot in (b) represents the angles of the gradients in each cell of (a). The angles generally fall in the range [0, 180]. This range is binned into eight groups, one per bar in each histogram.
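
The binning described in the note to Figure 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's code. The 4×4 cell, the finite-difference gradients from `np.gradient`, and unweighted counts are all assumptions, and the function name `orientation_histogram` is hypothetical. The angles are taken in degrees and folded into [0, 180).

```python
import numpy as np

def orientation_histogram(cell, n_bins=8):
    """Histogram of gradient angles in one cell, using n_bins bins over [0, 180]."""
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(cell.astype(float))
    # Fold directions into [0, 180) so opposite gradients share a bin.
    angles = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angles, bins=n_bins, range=(0.0, 180.0))
    return hist

# Toy cell (hypothetical): intensity increases from left to right only,
# so every pixel has a purely horizontal gradient (angle 0).
cell = np.tile(np.arange(4.0), (4, 1))
hist = orientation_histogram(cell)
```

In this toy cell all 16 pixels fall into the first bin. A cell containing edges at several orientations would spread its counts across the eight bins instead.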

Figure 5 Creating the visual vocabulary: clustering and centroids.

Figure 6 Examples of visual words.

Figure 7 Comparison of different proportions of a crowd in an image. Note: (a) By Sandra Cuffe/Al Jazeera; (b) By Jesús Alvarado.

Figure 8 FREX visual words per topic. Note: The numbers of the topics in the replication file are 1, 4, 6, 9, 11, and 13.

Figure 9 Most representative images per topic. Note: Photo credits in the Supplementary Material.

Figure 10 Identification of crowds and distribution of “crowd” proportions. Note: The “No crowd” and “Crowd” labels are hand-coded. The density curves show the distribution of the topic “all crowd” (Dense crowd + Outdoor crowds + People walking + Medium-sized crowd) in each group.

Figure 11 Crowd topic by media outlet. Note: Each point represents the mean “crowd” topic proportion among the images of each of the outlets in the sample. The points are ordered from lowest to highest proportion of topic “crowd.” Colors indicate the ideological slant of the outlet.

Figure 12 Ideological leanings and portrayal of crowds. Note: Each point represents the mean “crowd” topic proportion among the images published by media outlets in each of the ideological bias categories. Brackets indicate the differences between a few groups, and the $*$ indicates that the 95% confidence interval of the difference does not cover 0.

Table 1 Decisions and hyperparameter tuning when building BoVW.

Figure 13 Visualizing mistakes.

Supplementary material: PDF

Torres supplementary material

PDF 12 MB
Supplementary material: Link

Torres Dataset