Decolonizing Archival Narratives: Exploring Digital Bias in the Catalogs of Portuguese-Colonized African Territories

Published online by Cambridge University Press:  14 October 2025

Agata Błoch*
Affiliation:
Tadeusz Manteuffel Institute of History, Polish Academy of Sciences, Poland
Guillem Martos Oms
Affiliation:
University of Barcelona, Spain
Clodomir Santana
Affiliation:
Tadeusz Manteuffel Institute of History, Polish Academy of Sciences and University of California Davis, USA
* Corresponding author: Agata Błoch; Email: agata.natalia.bloch@gmail.com

Abstract

This study discusses the intersection between Black/African Digital Humanities and computational methods, including natural language processing (NLP) and generative artificial intelligence (AI). We have structured the narrative around four critical themes: biases in colonial archives; postcolonial digitization; linguistic and representational inequalities in Lusophone digital content; and the technical limitations of AI models when applied to archival records from Portuguese-colonized African territories (1640–1822). Through three case studies relating to the Africana Collection at the Arquivo Histórico Ultramarino, the Dembos Collection, and Sebestyén’s Caculo Cangola Collection, we demonstrate the infrastructural biases inherent in contemporary computational tools. These biases begin with the systematic underrepresentation of African archives in global digitization efforts and end with biased AI models that have not been trained on African historical corpora.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press.
Figure 1. Examples of entities recognized in texts with vocabulary specific to the Dembos Collection, using a pretrained general-purpose model. The green, red, and gray boxes indicate, respectively, correct, incorrect, and missed entities. The original text can be translated as “Letters addressed to soba Ngolla Tumba, to Dembo Bulo Atumba, to Mane Quilé Quissamba, and to D. Paulo Domingos.”

Figure 2. Examples of entities recognized in texts with vocabulary specific to the Dembos Collection, using a model trained on data from the Portuguese Overseas Archive. Note: (a) shows the original text; (b) shows the same text with the African vocabulary replaced by its equivalents in European Portuguese. The green, red, and gray boxes indicate, respectively, correct, incorrect, and missed entities. The original text can be translated as “Letters addressed to soba Ngolla Tumba, to Dembo Bulo Atumba, to Mane Quilé Quissamba, and to D. Paulo Domingos.”

Figure 3. Example of a network of entities and the other significant words connected to them. Note: The highlighted blue nodes are entities linked to “soba,” which would not be properly identified because of the issues with the NER model.

Figure 4. Extraction of named entities from the historical Dembos Collection using LLMs. Note: (a), (b), and (c) present, respectively, the results of ChatGPT, Gemini, and Meta AI, three representatives of state-of-the-art LLM architectures. All three models identified the entities but assigned them the incorrect label “Person.”

Figure 5. Assessment of LLMs’ knowledge of terms. Note: Level 1 (Language Context) is the most basic context that can be provided to an LLM with the expectation of a meaningful answer about the meaning of a term. In Level 2, besides the language, we also offer an example of text in which the term appears. Lastly, in Level 3, we also provide the historical and geographic context of the text in which the term appears.
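
The three context levels in Figure 5 are cumulative, so they can be sketched as a single prompt builder. The following is a minimal illustration only: the function name `build_prompt`, its parameters, and the prompt wording are all hypothetical and are not the prompts used in the study. The sample term and example text are borrowed from the figure captions, and the context string paraphrases the period named in the abstract.

```python
from typing import Optional


def build_prompt(
    term: str,
    language: str,
    level: int,
    example: Optional[str] = None,
    context: Optional[str] = None,
) -> str:
    """Assemble a term-meaning prompt at context Level 1, 2, or 3.

    The wording below is an assumption made for illustration; it is
    not taken from the article.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")

    # Level 1: only the language of the term is given.
    parts = [f"What does the term '{term}' mean in {language}?"]

    # Level 2: add an example of text in which the term appears.
    if level >= 2:
        if example is None:
            raise ValueError("Level 2 and above require an example text")
        parts.append(f"Example text in which the term appears: {example}")

    # Level 3: add the historical and geographic context of that text.
    if level >= 3:
        if context is None:
            raise ValueError("Level 3 requires historical/geographic context")
        parts.append(f"Historical and geographic context: {context}")

    return "\n".join(parts)


if __name__ == "__main__":
    # Illustrative inputs only (hypothetical choice of term and wording).
    for lvl in (1, 2, 3):
        print(f"--- Level {lvl} ---")
        print(
            build_prompt(
                term="soba",
                language="Kimbundu",
                level=lvl,
                example="Letters addressed to soba Ngolla Tumba",
                context="Portuguese-colonized African territories, 1640-1822",
            )
        )
```

Because each level only appends to the previous one, any difference in a model's answers across levels can be attributed to the added context rather than to a change in the base question.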

Table 1. Experiments assessing the LLMs’ knowledge of the meaning of selected terms originating from Kimbundu in Sebestyén’s Caculo Cangola Collection