Face Detection, Tracking, and Classification from Large-Scale News Archives for Analysis of Key Political Figures

Published online by Cambridge University Press: 06 November 2023

Andreu Girbau*
Affiliation:
Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo, Japan
Tetsuro Kobayashi*
Affiliation:
School of Political Science and Economics, Waseda University, Tokyo, Japan
Benjamin Renoust
Affiliation:
Institute for Datability Science, Osaka University, Osaka, Japan
Yusuke Matsui
Affiliation:
Department of Information and Communication Engineering, The University of Tokyo, Tokyo, Japan
Shin’ichi Satoh
Affiliation:
Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
*Corresponding authors: Andreu Girbau and Tetsuro Kobayashi; Email: agirbau@nii.ac.jp, tkobayas@waseda.jp

Abstract

Analyzing how political figures appear in the news is becoming increasingly important as large-scale news archives grow more available and computer vision advances. We present a deep learning-based method that combines face detection, tracking, and classification and that, notably, requires no re-training when targeting new individuals. Users supply only a few images of each target individual to reliably detect, track, and classify them. Extensive validation on prominent political figures in two news archives spanning 10 to 20 years, one containing three U.S. cable news channels and the other two major Japanese news programs, consistently shows the high performance and flexibility of the proposed method. The code is publicly available.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology

Table 1 Target individuals for the U.S. TV channels CNN, FOX, and MSNBC.

Table 2 Target individuals for the Japanese TV news programs NHK News 7 and Hodo Station.

Figure 1 Examples of publicly available images downloaded for the analysis. Left: The 53 target individuals in the U.S. TV dataset. Right: The 41 politicians analyzed in the Japanese TV dataset.

Figure 2 Pipeline for the first stage.

Figure 3 Representation of template embedding extraction, tracklet clustering, and tracklet ID assignment by voting.
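
The voting step named in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes face embeddings are already available as plain vectors, skips the tracklet clustering step, and uses cosine similarity with a hypothetical threshold `min_sim`. The function names `l2_normalize` and `assign_tracklet_id` are introduced here for illustration only.

```python
from collections import Counter

import numpy as np


def l2_normalize(x):
    """Scale each row vector to unit length so dot products are cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def assign_tracklet_id(tracklet_embs, template_embs, template_ids, min_sim=0.5):
    """Assign one identity to a whole tracklet by majority vote.

    tracklet_embs: (n_faces, d) embeddings of the faces in one tracklet.
    template_embs: (n_templates, d) embeddings of the few reference images.
    template_ids:  identity label for each template row.
    min_sim:       hypothetical similarity threshold; faces below it vote None.
    Returns (winning label or None, share of votes for the winner).
    """
    sims = l2_normalize(tracklet_embs) @ l2_normalize(template_embs).T
    best = sims.argmax(axis=1)        # closest template for each face
    best_sim = sims.max(axis=1)       # similarity to that template
    votes = [template_ids[j] if s >= min_sim else None
             for j, s in zip(best, best_sim)]
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / len(votes)


# Toy 2-D embeddings, made up for illustration (real embeddings have many dimensions).
templates = [[1.0, 0.0], [0.0, 1.0]]
labels = ["A", "B"]
tracklet = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
winner, share = assign_tracklet_id(tracklet, templates, labels)
```

Because the templates are only compared against, not trained on, adding a new target individual in this sketch amounts to appending rows to `template_embs` and `template_ids`.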

Figure 4 Annotation examples for the U.S. TV and Japanese TV datasets. The U.S. TV dataset consists of a single face annotation per frame, whereas in the Japanese TV dataset, all individuals of interest are annotated in every frame. Green bounding boxes indicate the annotated ground truth data.

Table 3 Percentage of missed detections per TV channel for the three face detectors DSFD, MTCNN, and YOLO-face.

Table 4 Evaluation of our method over the three U.S. TV cable news channels.

Table 5 Comparison of our method between two Japanese TV programs, NHK News 7 and Hodo Station, for different detectors and classifiers.

Figure 5 F-score and mAP evaluation for different face sizes on the three channels of the U.S. dataset. Because the ground truth annotations in the U.S. dataset are not tight around the faces, their IoU with the predictions is systematically lowered. To compute the mAP, we therefore change the IoU threshold sweep to [$0.4$, …, $0.6$] in steps of $0.05$.

Figure 6 F-score and mAP evaluation for different face sizes on the two channels of the Japanese dataset. Here, we follow the standard COCO procedure for computing mAP, sweeping the IoU threshold over [$0.5$, …, $0.95$] in steps of $0.05$.
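
The two IoU sweeps mentioned in the captions of Figures 5 and 6 can be made concrete with a short Python sketch. It only computes the IoU of two boxes and the threshold lists; it does not compute average precision, and the box coordinates are made up for illustration. The helper names `iou` and `iou_thresholds` are hypothetical.

```python
import numpy as np


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds(start, stop, step=0.05):
    """Evenly spaced IoU thresholds from start to stop, both inclusive."""
    n = int(round((stop - start) / step)) + 1
    return np.round(np.linspace(start, stop, n), 2)


coco_sweep = iou_thresholds(0.5, 0.95)    # 0.50, 0.55, ..., 0.95
relaxed_sweep = iou_thresholds(0.4, 0.6)  # 0.40, 0.45, ..., 0.60

# A loose ground-truth box around a tight, well-centred prediction (toy numbers).
loose_gt = (0, 0, 100, 100)
tight_pred = (15, 15, 85, 85)
overlap = iou(loose_gt, tight_pred)

# Number of thresholds in each sweep at which this prediction counts as a match.
hits_relaxed = int((overlap >= relaxed_sweep).sum())
hits_coco = int((overlap >= coco_sweep).sum())
```

In this toy case a prediction that sits well inside a loose annotation fails every threshold of the COCO sweep but still matches at the lower end of the relaxed one, which is the kind of misalignment the Figure 5 caption describes.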

Table 6 Performance with or without face tracking for different detectors.

Table 7 Screen time of major U.S. candidates over the 847 randomly sampled videos for the 2016 general elections.

Figure 7 Share of screen time of incumbent party leaders among all leaders and its NHK/HODO ratio. The three vertical solid lines represent the timing of the House of Representatives elections, whereas the dashed lines indicate the timing of the change of prime ministers.
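
Read literally, the caption involves two quantities. One way to write them down is given here, under the assumption that screen time is summed per program over a given period. The symbols $T_p(\ell)$, $S_p$, and $R$ are introduced for illustration and are not necessarily the article's notation.

$$S_p = \frac{\sum_{\ell \in \mathcal{I}} T_p(\ell)}{\sum_{\ell \in \mathcal{L}} T_p(\ell)}, \qquad R = \frac{S_{\mathrm{NHK}}}{S_{\mathrm{HODO}}}$$

Here $\mathcal{L}$ is the set of all party leaders, $\mathcal{I} \subseteq \mathcal{L}$ the incumbent party leaders, and $T_p(\ell)$ the screen time of leader $\ell$ on program $p$. Under this reading, $R > 1$ would mean that NHK News 7 devotes a larger share of leader screen time to incumbents than Hodo Station does.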

Supplementary material: Girbau et al. Dataset (link)

Supplementary material: Girbau et al. supplementary material (PDF, 9.2 MB)