Latest volume | Computational Humanities Research

Cognitive stylometry: A computational study of defamiliarization in modern Chinese
Part of:
- CHR Expanding the Toolkit: Large Language Models in Humanities Research
Maciej Kurzynski
Published online by Cambridge University Press:

05 December 2025, e1
- Article
- Open access
Autoregressive language models generate text by predicting the next word from the preceding context. The regularities internalized from specific training data make this mechanism a useful proxy for historically situated readerly expectations, reflecting what earlier linguistic communities would find probable or meaningful. In this article, I pre-train a GPT model (223M parameters) on a broad corpus of Chinese texts (FineWeb Edu Chinese V2.1) and fine-tune it on the collected writings of Mao Zedong (1893–1976) to simulate the evolving linguistic landscape of post-1949 China. Identifying token sequences with the sharpest drops in perplexity – a measure of the model’s surprise – reveals the core phraseology of “Maospeak,” the militant language style that developed from Mao’s writings and pronouncements. A comparative analysis of modern Chinese fiction demonstrates how literature becomes unfamiliar to the fine-tuned model, generating perplexity spikes of increasing magnitude. The findings suggest a mechanism of attentional control: whereas propaganda backgrounds meaning through repetition (cognitive overfitting), literature foregrounds it through deviation (non-anomalous surprise). By visualizing token sequences as perplexity landscapes with peaks and valleys, the article reconceives style as a probabilistic phenomenon and showcases the potential of “cognitive stylometry” for literary theory and close reading.
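The perplexity comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes that per-token probabilities are already available from a base model and a fine-tuned model; the function names, the window size and the probability values are hypothetical and are not taken from the article.

```python
import math


def perplexity(token_probs):
    """Perplexity: exp of the mean negative log-probability of the tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


def perplexity_drops(base_probs, tuned_probs, window=3):
    """Slide a window over two aligned probability sequences.

    Returns (start_index, base_perplexity - tuned_perplexity) per window,
    so a large positive value marks a span the tuned model finds much
    less surprising than the base model does.
    """
    drops = []
    for start in range(len(base_probs) - window + 1):
        base = perplexity(base_probs[start:start + window])
        tuned = perplexity(tuned_probs[start:start + window])
        drops.append((start, base - tuned))
    return drops


# Hypothetical probabilities each model assigns to the same five tokens.
base = [0.5, 0.1, 0.1, 0.1, 0.5]
tuned = [0.5, 0.8, 0.8, 0.8, 0.5]

# The window with the sharpest drop in perplexity after fine-tuning.
sharpest = max(perplexity_drops(base, tuned), key=lambda d: d[1])
print(sharpest)
```

In this toy input the middle three tokens are the ones the fine-tuned model has learned to expect, so the window starting at index 1 shows the largest drop.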

Relational arcs as narrative structure: Dynamics, distribution and diachronic change in fiction
Part of:
- CHR Computational Narratology
Despina Christou, Grigorios Tsoumakas
Published online by Cambridge University Press:

19 December 2025, e2
- Article
- Open access
Relationships between characters are not just themes in a story but key elements that shape how plots unfold. This article presents a large-scale study of relational arcs: the trajectories of ties such as kinship, romance, alliance and enmity, as they rise and fall across the course of a novel. We build on the Artificial Relationships in Fiction dataset, which contains over 120,000 automatically annotated relationships from 96 novels published between 1850 and 1950. Our study makes four contributions. First, we show that relationship dynamics can be modeled as arcs that highlight recurring narrative patterns, such as conflicts peaking near the climax or romances resolving toward the end. Second, we use temporal normalization to compare books of very different lengths, allowing us to identify consistent trends across the corpus. Third, we demonstrate that genres and historical periods leave clear relational “fingerprints.” For instance, domestic fiction emphasizes family ties, while adventure stories highlight shifting alliances and adversaries. Finally, we cluster arcs into four common shapes (Rise, U-shape, Decline and Oscillating) that echo well-known narrative prototypes. By bringing narratology together with modern natural language processing, we argue that relationships provide a measurable grammar of plot. This approach offers new resources for literary analysis, new methods for computational modeling of narrative, and fresh evidence about how cultural storytelling patterns change over time.
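One common way to implement temporal normalization of the kind this abstract mentions is to rescale each event's position to a fraction of the book and count events per bin. The sketch below assumes that approach; the function name, the bin count and the positions are hypothetical, and the article's own procedure may differ.

```python
def normalized_arc(positions, total_length, n_bins=10):
    """Share of relationship mentions per narrative-time bin.

    positions: offsets (e.g. token indices) where a relation type occurs.
    total_length: length of the book in the same units.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    counts = [0] * n_bins
    for pos in positions:
        frac = pos / total_length                  # rescale to [0, 1]
        idx = min(int(frac * n_bins), n_bins - 1)  # clamp the final offset
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Two hypothetical books of very different lengths with the same shape:
# one early mention and two mentions near the end.
short_book = normalized_arc([5, 95, 99], total_length=100, n_bins=4)
long_book = normalized_arc([50, 950, 990], total_length=1000, n_bins=4)
print(short_book, long_book)
```

Because both arcs live on the same four-bin axis, they can be compared or clustered directly even though one book is ten times longer than the other.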

Fine-tuning large-language models for early modern Dutch translation
Gavin Lip, Victor de Boer, Arno Bosse, David Grantsaan
Published online by Cambridge University Press:

30 January 2026, e3
- Article
- Open access
Large-language models (LLMs) have transformed natural language processing and opened new possibilities for the computational social sciences and digital humanities. Yet translating historical sources remains difficult because early modern varieties are scarcely represented in contemporary training corpora and because standard tokenizers fragment their non-standard orthography. This article tackles these gaps by adapting open LLMs to early modern Dutch-to-English translation and advances two concrete contributions: (i) a memory-efficient fine-tuning workflow that runs on a single consumer GPU, comparing order-reward policy optimization with the Unsloth supervised fine-tuning approach and (ii) a verifiable evaluation protocol that combines embedding-based metrics with systematic expert review. Experiments on testimonial texts (1680–1792) show that fine-tuning choice decisively shapes quality: the Unsloth-tuned Mistral model attains the highest BERTScore and METEOR values and most faithfully preserves historical nuance. The framework supports a collaborative workflow where machine-generated drafts accelerate expert translation, making archival texts more accessible while maintaining scholarly oversight through domain-expert validation.

The narrative function of ending speech and hermeneutic complexity in Aesopic fables: A computational analysis of 600-fable corpus
Sukhwan Jung, Hochang Kwon
Published online by Cambridge University Press:

11 February 2026, e4
- Article
- Open access
The ending speech in Aesopic fables, where stories conclude with direct utterances from characters, is not merely a didactic tool but a crucial narrative device constructing hermeneutic complexity. This study systematically examines the narrative function of ending speech through computational analysis of 600 Aesopic fables from Laura Gibbs’ edition. We quantitatively analyzed the complex relationships between ending speech, story content, explicit morals and speaker identity using natural language processing techniques. The analysis reveals three key findings. First, the average similarity of ending speeches (0.1820) is significantly lower than that of stories (0.3578), confirming that ending speech forms a unique semantic domain rather than serving as a simple summary of the narrative. Latent Dirichlet allocation analysis also shows that ending speeches are differentiated into 13 topics, displaying a more complex structure than stories (seven topics). Second, we found that ending speech constitutes a distinct narrative domain from epimythium, with an overwhelming ratio of their relationships being either independent (76.8%) or tensional (21.4%). This indicates that the ending speech is a narrative device that amplifies interpretive complexity, often clashing with the epimythium rather than reinforcing it. Third, 249 different ending speech speakers each represent unique voices and perspectives, with the frequency of utterances – fox (34 times), lion (19 times) and wolf (18 times) – demonstrating a value system in Aesopic fables where wisdom is prioritized over physical strength. These findings indicate that the ending speech establishes complex and sometimes tensional relationships with both story and epimythium, thereby transforming fables into “open work” that can be newly interpreted. 
This study provides empirical evidence for understanding Aesopic fables not as simple didactic tales but as complex narratives with structural features supporting polyphonic interpretation, demonstrating the potential of computational narratology.

From shots to narratives: Expanding multimodal approaches to filmic storytelling in the digital humanities
Chiao-I Tseng, Leandra Thiele, Bernhard Liebl, Eric Müller-Budack, John Bateman, Manuel Burghardt, Gullal Cheema, Ralph Ewerth
Published online by Cambridge University Press:

16 February 2026, e5
- Article
- Open access
In recent years, digital humanities (DH) research has evolved from its textual origins to encompass film and video studies as critical areas of inquiry as well. Nevertheless, much of this research has remained tied to the formal levels of description most readily revealed by automatic processing. This maintains a gap between treatments in terms of formal technical features and the concerns of many researchers involved in film analysis of a more qualitative, interpretative nature, thereby reiterating the classic tension within DH as such: that is, how to relate levels of description that are “computable” and those more responsive to broader humanities-oriented interests. In this article, we set out an approach to this challenge that incorporates a multi-layered analytic framework capable of specifying increasingly abstract descriptions in terms of patterns at lower levels. This enables us to start bringing concerns of narrative organization and interpretation into analysis at scale. We set out the overall approach and show several examples of its use.

Undate: humanistic dates for computation – ERRATUM
Rebecca Sutton Koeser, Julia Damerow, Robert Casties, Cole Crawford
Published online by Cambridge University Press:

26 March 2026, e7
- Article
- Open access

Toward an ontological representation of fictional characters
Antoine Bourgois, Jean Barré, Olga Seminck, Thierry Poibeau
Published online by Cambridge University Press:

20 February 2026, e6
- Article
- Open access
Characters are central to narrative theory but remain under-specified in computational work, where they are often reduced to clusters of words or vectors. We propose an operationalizable ontology of characterization that bridges narratological theory and NLP. From BERT-based clustering of character descriptions, we derive 17 classes of attributes (actions, emotions, traits, relations, possessions, etc.), validated through manual annotation ($\kappa = 0.77$) and automatic classification (64% accuracy vs. 12% baseline). Applied to character similarity tasks for French fiction, our framework outperforms existing models. By aligning narratological insights with computational methods, we move toward a representation of fictional characters as structured, comparable entities for large-scale literary analysis.

McK: Ontology for mapping semantic conflict in plot summary
Nicolas Chiappucci
Published online by Cambridge University Press:

24 March 2026, e8
- Article
- Open access
This article investigates the use of a neurosymbolic approach in the analysis of conflict within narrative discourse. Because they resist precise conceptual definition, abstract elements such as conflict are hard to analyze with computational systems. A neurosymbolic approach that combines knowledge graphs and LLMs can open new perspectives for analyzing such abstract elements of narrative discourse at the level of the plot summary. Starting from the schematization of conflict elaborated by Robert McKee, an ontology was constructed and subsequently populated with the entities present in the screenplay of the film A Clockwork Orange. The approach was then validated experimentally by injecting the ontology into the prompt. Using a comparative-qualitative design, the experiment analyzed the narrative discourse first by combining the synopsis with the knowledge graph and then by relying on the synopsis alone. (The plot summary is used instead of the screenplay because of specific provisions of copyright law.) In conclusion, the potential of the neurosymbolic approach for screenplay analysis is presented, opening up the approach to larger portions of text, such as the full screenplay.

Operationalizing narrativized change in the political talk for computational recognition
Kirsi Sandberg, Mykola Andrushchenko, Mari Hatavara
Published online by Cambridge University Press:

24 March 2026, e9
- Article
- Open access
This article furthers the methodology for computational recognition of narratives in argumentative language use. Narratives are understood as a cognitive and rhetorical tool for making sense of change and the unexpected, as well as arguing for a point. Building on narrative theory and linguistic knowledge, this study operationalizes narrative as the linguistic portrayal of experienced change. Our data consist of Finnish parliamentary records (1980–2022). Agentive experientiality plays a vital role in political speech, where deliberation over different choices and outcomes takes place. Our methodology relies on identifying verbs that encode cognitive and emotional shifts – key signals of narrative experientiality – based on a tailored semantic resource. Using Deptreepy, a search tool based on dependency trees, these verb classes were systematically extracted from a pre-existing sample of 60 manually annotated plenary session transcripts, where the annotation marked narrative and non-narrative segments. This approach offers a method for identifying narratives in complex, rhetorically layered genres that is compatible with low-resource languages. Results show that particular semantic verb classes – especially those indicating mental and emotional change – serve as effective indicators of narrativity. The study contributes to both narrative theory and computational linguistics by demonstrating how semantic classification of verbs, rooted in linguistic and narratological theory, can yield a viable tool for extracting narratives in argumentative language use. It also highlights how experientiality is not only conveyed in the stories told but also embedded in the situation of the telling, often amplified through cognitive stance verbs that address the audience’s shared knowledge or memories. These findings suggest a dual layer of experiential engagement in parliamentary narratives, reinforcing their argumentative power.

Decoding the conqueror’s gaze: A computational approach to Ennio Flaiano’s (post)colonialism
Silvia Lilli, Daniel Raffini
Published online by Cambridge University Press:

24 March 2026, e10
- Article
- Open access
Ennio Flaiano’s Tempo di uccidere (1947) has long divided critics over whether it challenges or merely aestheticizes Italian colonialism. This study applies computational narratology methods to investigate this ambivalence through systematic analysis of narrative focalization. Through manual annotation of 5,793 text segments across four narrative categories (Narrator, Conqueror, Indigenous and Description) and their subcategories, we examine how quantitative analysis of focalization patterns illuminates Flaiano’s complex stance toward the Ethiopian colonial campaign. We combine multiple analytical methods with an exploratory bottom-up approach: qualitative analysis of lexical distribution; part-of-speech distribution analysis to identify the grammatical signatures of each narrative category and their semantic implications; and syntactic role analysis to examine agency patterns in character representation. Statistical testing confirms that narrative categories exhibit robust grammatical distinctions, validating the annotation schema. The analysis reveals a deliberate ambiguity: syntactic role analysis shows comparable levels of agentivity between Indigenous and Conqueror characters, contrasting with traditional colonial discourse where colonized subjects are typically represented as passive. Lexical analysis exposes asymmetries in how characters are individuated and how different types of knowledge are attributed to each group. Rather than confirming a straightforward colonial or anti-colonial position, the analysis reveals how Flaiano, through narrative techniques that exploit and subvert genre conventions, consciously deconstructs the reassuring ideology of propaganda.
This study contributes both methodologically – by developing an annotation schema for narrative focalization in Italian prose that can be applied to other texts – and interpretively, by demonstrating how computational narratology can document ideological complexity, revealing patterns invisible to traditional close reading alone.

Introducing reproducible navigation of a web archive: SolrWayback navigation tracker
Victor Harbo Johnston
Published online by Cambridge University Press:

13 April 2026, e11
- Article
- Open access
Web archives are an exhaustive source for humanities research. They are, however, hard to navigate, and research with material from web archives is often opaque because no existing software for exploring web archives lets researchers track their pathways through the archive. This article presents an extension of the open-source software SolrWayback that provides researchers with a navigation tracking feature, supporting a more reproducible and transparent methodology for documenting how a web archive collection has been explored as part of research. The functionality was developed through a user- and test-driven approach in which the needs of contemporary historians determined how the feature was implemented. This user-centered approach provides new functionality for a piece of software that has primarily been developed by archiving institutions.

A language modeling approach to identifying Russian information confrontation in Colombia
Part of:
- CHR Expanding the Toolkit: Large Language Models in Humanities Research
Joe Parson, Adriana Jaramillo
Published online by Cambridge University Press:

04 June 2026, e12
- Article
- Open access
This study integrates multilingual text analytics with time-series modeling to examine how geopolitical narratives emerging around the 2022 Russia–Ukraine conflict align across Russian and Colombian media. The corpus comprises more than 38,000 Spanish- and Russian-language news articles spanning 2013–2023, a range that also encompasses the 2014 conflict period. We employ a six-stage pipeline combining multilingual NER, document-level sentiment scoring, e5-large-instruct embeddings, BERTopic clustering, and a semi-automated human-LLM labeling workflow. Narrative salience and tone were aggregated to weekly, log-normalized series and analyzed using linear regression and Granger causality tests (lags 1–4 weeks), providing a descriptive view of temporal coupling rather than evidence of direct causal influence.
The period surrounding the 2022 invasion produced the clearest and most coherent patterns. Across four macro-narratives (Security & Conflict, Diplomacy, Economy, and Politics & Society), Russian and Colombian coverage exhibited sequential alignment in which shifts in narrative volume were often followed by shifts in evaluative tone. Thirty-two of forty-four target–topic pairs displayed significant lag structures prior to false-discovery correction ($p<0.05$), with 25 surviving Benjamini–Hochberg adjustment. Categories with lower geopolitical salience (Technology, Health, and Off-topic) displayed less consistent temporal coupling, though this contrast is offered as descriptive context rather than a formal control. While these results do not imply coordinated messaging, they indicate recurring forms of narrative mimicry, understood here as patterned convergence in framing and sentiment across media ecosystems.
By demonstrating how transformer-based semantic representations can be integrated with human-in-the-loop interpretation and classical time-series analysis, this study contributes a reproducible workflow for tracing narrative trajectories at scale. The approach provides a methodological foundation for examining how geopolitical frames circulate, mutate, and align across linguistic and regional boundaries, offering new possibilities for computational humanities research on transnational discourse.
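The false-discovery correction named in this abstract, the Benjamini–Hochberg adjustment, can be sketched in a few lines. The version below is a plain step-up implementation written for illustration; the function name and the p-values are hypothetical and are not the article's results.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per input p-value: True where the hypothesis
    is rejected while controlling the false discovery rate at alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected


# Hypothetical p-values from five lag tests.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.20]
uncorrected = sum(p < 0.05 for p in p_vals)
surviving = sum(benjamini_hochberg(p_vals))
print(uncorrected, surviving)
```

With these invented values, four tests pass the uncorrected threshold but only two survive the adjustment, which is the same kind of shrinkage the abstract reports between its uncorrected and corrected counts.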

Messy data, low-resource languages, and LLMs: Narrative analysis of pre-modern Slavic Lives of Saints
Achim Rabus, Alexander Ermakov, Iris Ferrazzo
Published online by Cambridge University Press:

12 May 2026, e13
- Article
- Open access
This study addresses the challenges of performing narratological analysis on low-resource languages, with a focus on Old Church Slavonic. Understanding the roles, interactions, and networks of persons is central to narrative analysis, yet such investigation is hindered by the scarcity of experts and the limited availability of annotated resources. We explore both established natural language processing (NLP) methods and large language models (LLMs) for analyzing pre-modern Slavic Lives of Saints, including several Slavic versions, the Greek original, and an English translation. Pre-modern Slavic texts pose particular difficulties due to rich morphology, orthographic variation, and limited standardization, which complicate the application of both traditional NLP tools and off-the-shelf LLMs. Through experiments using annotated and non-annotated ground truth data, we demonstrate that while conventional NLP methods often reach their limits on such low-resource, highly variable texts, LLMs provide complementary capabilities that can support narratological insights, especially in tracking persons and their interactions, albeit with important caveats regarding accuracy and coverage.

Misaligning stories: Narrative unreliability, double noise, and feedback ethics in clinical AI
Rosa E. Martín-Peña
Published online by Cambridge University Press:

01 June 2026, e14
- Article
- Open access
A worker with cancer is dismissed after algorithmic dashboards misread her treatment fatigue as cognitive decline. A stroke prediction model achieves 95 percent accuracy while missing every single stroke case. These are not edge cases – they are structural failures, produced when algorithmic distortions intersect with the cognitive variability of human judgment in the absence of feedback. This article calls that convergence double noise. Drawing on Shannon–Weaver’s communication model, Kahneman and colleagues’ concept of noise, and narratological theory – unreliable narration, focalization, paralepsis, emplotment, and situatedness – the article argues that predictive failures are not only statistical but narrative: computational systems operating under epistemic constraint produce false stories that resist correction. In both cases, the absence of meaningful feedback channels turns local distortions into entrenched misjudgments: the first through algorithmic dashboards that freeze discontinuous behavioral signals into an irrevocable story of cognitive decline; the second through routine design decisions – class balancing, metric selection, and threshold setting – that normalize the erasure of clinically decisive false negatives. By integrating narratological theory with computational methods and epistemic critique, the article positions double noise as a central challenge for clinical AI and advances feedback ethics as a normative orientation, calling for systems that preserve ambiguity, enable contestation, and institutionalize shared judgment in high-risk environments.
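The arithmetic behind a model that reaches 95 percent accuracy while missing every stroke case can be reproduced with a small worked example. The cohort size and prevalence below are hypothetical, chosen only so that the numbers match the pattern the abstract describes; they are not taken from the article.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Share of true positive cases that the model actually flags."""
    positives = sum(1 for t in y_true if t == 1)
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1
    )
    return true_positives / positives if positives else 0.0


# Hypothetical cohort: 1,000 patients, 50 of whom had a stroke (label 1).
y_true = [1] * 50 + [0] * 950
# A degenerate model that predicts "no stroke" for every patient.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred), recall(y_true, y_pred))
```

Accuracy rewards the model for the 950 easy negatives, while recall exposes that all 50 positive cases are false negatives, which is why metric selection matters in imbalanced clinical data.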

Volume 2 - 2026

Research Article

Cognitive stylometry: A computational study of defamiliarization in modern Chinese

Relational arcs as narrative structure: Dynamics, distribution and diachronic change in fiction

Fine-tuning large-language models for early modern Dutch translation

The narrative function of ending speech and hermeneutic complexity in Aesopic fables: A computational analysis of 600-fable corpus

From shots to narratives: Expanding multimodal approaches to filmic storytelling in the digital humanities

Erratum

Undate: humanistic dates for computation – ERRATUM

Research Article

Toward an ontological representation of fictional characters

McK: Ontology for mapping semantic conflict in plot summary

Operationalizing narrativized change in the political talk for computational recognition

Decoding the conqueror’s gaze: A computational approach to Ennio Flaiano’s (post)colonialism

Software Paper

Introducing reproducible navigation of a web archive: SolrWayback navigation tracker

Research Article

A language modeling approach to identifying Russian information confrontation in Colombia

Messy data, low-resource languages, and LLMs: Narrative analysis of pre-modern Slavic Lives of Saints

Misaligning stories: Narrative unreliability, double noise, and feedback ethics in clinical AI
