AI and Image

Critical Perspectives on the Application of Technology on Art and Cultural Heritage

Published online by Cambridge University Press: 10 September 2025

Anna Foka
Affiliation:
Uppsala University
Jan von Bonsdorff
Affiliation:
Uppsala University

Summary

AI and Image illustrates the importance of critical perspectives in the study of AI and its application to image collections in the art and heritage sector. The authors' approach is that such entanglements of image and AI are neither dystopian nor utopian but may amplify, reduce or condense existing societal inequalities, depending on how they are implemented in relation to human expertise and sensibility in terms of diversity and inclusion. The Element further discusses regulations around the use of AI for such cultural datasets as they touch upon legalities and ethics. In the conclusion, the authors emphasise the importance of the professional expert factor in the entanglements of AI and images and advocate for a continuous, renegotiated professional symbiosis between humans and machines. This title is also available as Open Access on Cambridge Core.

Information

Figure 1 The distinctions and overlaps between computer vision and machine vision. Computer vision (left) focuses on algorithmic processing of images for applications like facial recognition and object detection, primarily in software-driven environments. Machine vision (right) is tailored for industrial and manufacturing purposes. The overlapping area highlights shared techniques such as image processing and the use of AI tools, which are central to both fields.

Illustration: J. v. Bonsdorff.

Figure 2 A generic diagram from Ferdinand de Saussure’s Cours de linguistique générale illustrating the relationship between signified (French Signifié) and signifier (French Signifiant). Ferdinand de Saussure, Cours de linguistique générale, Paris 1922 (2nd ed.), p. 158.

Figure 3 An example of image interpretation, from materiality to denotation, cultural implications, and social and cultural context. Picasso's objet trouvé (a natural or discarded object found by chance and held to have aesthetic value), made of handlebars and a saddle, resembles a bull, immediately alluding to contexts like male virility or Modernism.

Illustration: J. v. Bonsdorff.

Table 1a The chart exemplifies a selection of image frameworks used by early twentieth-century art historians.

Table 1b The chart exemplifies a selection of image frameworks used by mid-twentieth-century scholars.

Table 1c The chart exemplifies a selection of image frameworks used by late twentieth-century to early twenty-first-century scholars.

Table 2 Charting the Uses of AI and Heritage. The authors' own charting of AI implementation for the heritage sector, based on the Commission study Opportunities and challenges of artificial intelligence technologies for the cultural and creative sectors (2022). Accessible at https://www.europarl.europa.eu/thinktank/en/document/EPRS_BRI(2023)747120

Figure 4 Possibilities of using AI and computer vision as of now (2024).

Illustration: J. v. Bonsdorff.

Figure 5 Estimate of scientific effort put into AI and machine learning since the 1950s. This figure presupposes a general field of AI outside machine learning and deep learning. This general category includes early AI disciplines (1950s to 1980s) such as expert systems, logic programming, robotics, search algorithms, NLP (pre-deep learning techniques), knowledge representation, and more. Machine learning expands from the mid-1980s to the 2000s, and deep learning emerges around 2010 and is now prominent in research.

Illustration: J. v. Bonsdorff.

Figure 6 A timeline of landmarks for the technical development of computer vision 2006–2024.

Illustration: J. v. Bonsdorff.

Figure 7 Types of AI algorithms and models categorised by primary functions and underlying technologies.

Illustration: J. v. Bonsdorff.

Figure 8 Key components for text-to-image generation. These steps outline the journey from prompt submission to image output, illustrating an AI’s capability to interpret textual descriptions and translate them into visual representations.

Illustration: J. v. Bonsdorff.

Figure 9 A screenshot of Google Arts and Culture depicting the Peplos Kore.

Figure 10 Generative art created with DALL·E 2 (2023) using the prompt: ‘a photorealistic image of an archaic Kore’. This image does not correspond to reality.

Figure 11 Multimodal reasoning: In the world’s first film on art from 1919, art historian Johnny Roosval points out features on the large mediaeval equestrian wooden sculpture of St. George in Storkyrkan, Stockholm. Still from the film S:t Göran och draken (1919). KB SF2415.

Figure 12 The difference between unimodality (or monomodality) and multimodality. Unimodality implies one clear mode of communication (e.g. ‘text’). Multimodality integrates different modes (e.g. ‘text’, ‘image’, etc.), implying a richness of communication (but possibly a loss of clarity). Even if many modes may signify the same thing, they never fully overlap.

Illustration: J. v. Bonsdorff.

Figure 13(a) Depicts the traditional lexicon, where the root consists of textual head words.

Figure 13(b) Depicts a truly multimodal (or cross-modal) lexicon, where any modality can serve as an entry. Search terms can be formulated in other modalities than text. Cross-connections between all modalities are possible.

Illustration: J. v. Bonsdorff.

Figure 14 St. George and the Dragon, Storkyrkan in Stockholm, inaugurated 1489, by Bernt Notke from Lübeck.

Photo: Wikimedia Commons.

Figure 15 By proactively preparing ethical parameters and addressing legal and ethical concerns, GLAM institutions can leverage the benefits of AI while upholding public trust, professional ethics, and responsible data stewardship.

Illustration: J. v. Bonsdorff.

Figure 16 For the AI to function effectively, well-structured and well-annotated data is paramount. These are some guidelines for enhancing image interpretation and engagement for machine learning.

Illustration: J. v. Bonsdorff.
