Automatic Image Tagging for Corpus Linguistics

A Multimodal Study of News Representations of Islam

Published online by Cambridge University Press: 02 July 2025

Paul Baker, Lancaster University and Zhejiang Gongshang University
Hanna Schmück, Lancaster University
Yufang Qian, Zhejiang Gongshang University

Summary

This Element reports on the creation and analysis of a 1.5-million-word corpus consisting of a year's worth of UK national press news articles about Islam and Muslims, published between December 2022 and November 2023. The corpus also contains 8,546 image files, which have been automatically tagged using Google's Vertex AI. Analysis was carried out on three levels: (a) written text only, (b) images only and (c) interactions between written text and images. Using examples from the analyses, the authors demonstrate the affordances of these three approaches and provide a critical evaluation of Vertex AI's capabilities and of the ability of popular corpus software to work with visually tagged corpora. The Element acts as a practical guide for researchers who want to carry out this form of analysis. This title is also available as Open Access on Cambridge Core.
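The three levels of analysis can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that tags have already been obtained for each image, and the `Article` class, the function names and the article texts are all hypothetical. Only the two tag lists echo the Figure 8 and Figure 9 captions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    """Hypothetical record: an article's written text plus one tag list per image."""
    text: str
    image_tags: List[List[str]] = field(default_factory=list)


def word_counts(articles):
    """Level (a): written text only. Lower-cased, whitespace-split word frequencies."""
    return Counter(w for a in articles for w in a.text.lower().split())


def tag_counts(articles):
    """Level (b): images only. Frequency of each image tag across the corpus."""
    return Counter(t for a in articles for tags in a.image_tags for t in tags)


def word_tag_cooccurrence(articles, word):
    """Level (c): tag frequencies for images in articles whose text contains `word`."""
    word = word.lower()
    hits = [a for a in articles if word in a.text.lower().split()]
    return tag_counts(hits)


# Toy data: the texts are invented; the tags are taken from two figure captions.
corpus = [
    Article("Prime ministers attend a remembrance event",
            [["Crowd", "Black", "Tie", "Cap"]]),
    Article("A march passes through London",
            [["Crowd", "Face", "Cap", "Human", "Jacket", "PublicEvent"]]),
]

print(tag_counts(corpus).most_common(2))
print(word_tag_cooccurrence(corpus, "march"))
```

Keeping the tags attached to the article they came from is the design choice that makes level (c) possible: the same records serve text-only counts, tag-only counts and queries that cross the two.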

Figure 1 Overview of timelines showing the number of words in the corpus per day and per newspaper.

Figure 2 Number of occurrences of image tags.

Figure 3 Overview of timelines showing the number of images contained in the corpus per day and per newspaper.

Figure 4 Boxplot showing the number of tags per image for each of the nine newspapers.

Figure 5 Image Tag Explorer.

Figure 6 Muslims at prayer. Image from the Daily Mail, 10 April 2024.

Figure 7 Image of Ivana Knoll from the Daily Star (8 December 2022), tagged Smile, Brassiere, Waist, Undergarment, Trunk, Navel, Chest, Underpants, HumanLeg, Lingerie, Cheerleading, CheerleadingUniform, Bikini, Sports, CompetitionEvent, Uniform and PublicEvent.

Figure 8 Image in the Express (3 November 2023) showing past, present and future British prime ministers at a Remembrance Day event in London, tagged with Crowd, Black, Tie and Cap.

Figure 9 Image in the Express (7 November 2023) of a pro-Palestine march in London in an article about Remembrance Day, tagged with Crowd, Face, Cap, Human, Jacket and PublicEvent.

Figure 10 Image in the Independent (26 June 2023) of Saima Razzaq, the first Muslim woman to lead an LGBTQ+ pride march in Britain.

Figure 11 A Muslim family smiling during Ramadan in the Mirror (21 February 2023). Image tags: Food, Smile, Tableware, Table, Sharing, Plate, Cuisine, Window, Cooking, Dish, Hat and Room.

Online ISBN: 9781009581233