Fine-tuning large-language models for early modern Dutch translation

Published online by Cambridge University Press: 30 January 2026

Gavin Lip* (Faculty of Science, Vrije Universiteit Amsterdam, Netherlands)
Victor de Boer (Vrije Universiteit Amsterdam, Netherlands)
Arno Bosse (HUC – Digital Infrastructure, KNAW, Netherlands)
David Grantsaan (Vrije Universiteit Amsterdam, Netherlands)

*Corresponding author: Gavin Lip; Email: lip.gavin@gmail.com
Abstract

Large-language models (LLMs) have transformed natural language processing and opened new possibilities for the computational social sciences and digital humanities. Yet translating historical sources remains difficult, because early modern language varieties are scarcely represented in contemporary training corpora and standard tokenizers fragment their non-standard orthography. This article addresses these gaps by adapting open LLMs to early modern Dutch-to-English translation and makes two concrete contributions: (i) a memory-efficient fine-tuning workflow that runs on a single consumer GPU, comparing odds ratio preference optimization (ORPO) with supervised fine-tuning using Unsloth; and (ii) a verifiable evaluation protocol that combines embedding-based and lexical metrics with systematic expert review. Experiments on testimonial texts (1680–1792) show that the choice of fine-tuning method decisively shapes translation quality: the Unsloth-tuned Mistral model attains the highest BERTScore and METEOR values and most faithfully preserves historical nuance. The framework supports a collaborative workflow in which machine-generated drafts accelerate expert translation, making archival texts more accessible while domain experts retain scholarly oversight by validating the output.
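Contribution (i) compares two training objectives. As a hedged illustration, the sketch below computes the ORPO objective (odds ratio preference optimization, Hong et al., 2024) for a single preference pair: a supervised negative log-likelihood on the preferred translation plus a weighted penalty, -log sigmoid of the log odds ratio between the preferred and dispreferred responses, where each response's odds are derived from its length-normalised likelihood. It assumes per-token log-probabilities have already been obtained from the model. The function names and the weight `lam` are hypothetical, are not taken from the article's training arguments, and this is not the authors' code.

```python
import math


def mean_logprob(token_logprobs):
    """Length-normalised log-likelihood: the average per-token log-probability."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob):
    """log(p / (1 - p)) with p = exp(avg_logprob); requires avg_logprob < 0."""
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def orpo_loss(chosen_logprobs, rejected_logprobs, lam=0.1):
    """ORPO loss for one preference pair (hypothetical helper).

    chosen_logprobs / rejected_logprobs: per-token log-probabilities of the
    preferred and dispreferred translations under the model being trained.
    lam: weight of the odds-ratio term (illustrative value only).
    """
    # Supervised term: negative log-likelihood of the preferred translation
    sft = -mean_logprob(chosen_logprobs)
    # Log odds ratio between preferred and dispreferred translations
    ratio = log_odds(mean_logprob(chosen_logprobs)) - log_odds(mean_logprob(rejected_logprobs))
    # -log(sigmoid(ratio)) is equal to softplus(-ratio)
    return sft + lam * softplus(-ratio)
```

Setting `lam=0` reduces the loss to plain supervised fine-tuning on the preferred responses; the odds-ratio term is what lets ORPO learn from dispreferred outputs without a separate frozen reference model.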

Information

Type: Research Article
Creative Commons licence: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open practices: Open materials
Copyright: © The Author(s), 2026. Published by Cambridge University Press

Figure 1. End-to-end training and evaluation pipeline.

Figure 2. Word-count distribution for early modern Dutch sentences in the training and test splits.

Table 1. Tokenizer performance comparison on early modern Dutch
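Table 1 compares how tokenizers handle early modern Dutch. One common way to quantify the fragmentation described in the abstract is fertility, the average number of subword tokens per whitespace-delimited word; whether Table 1 uses exactly this measure is not stated here. The toy sketch below uses a greedy longest-match segmenter and a small hypothetical vocabulary of modern Dutch spellings (not any real model's tokenizer) to show how historical spellings such as "verklaert" for "verklaart" or "huijs" for "huis" break into many more pieces.

```python
def greedy_tokenize(word, vocab):
    """Greedy longest-prefix segmentation; characters not covered by the
    vocabulary fall back to single-character tokens."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking to one character
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def fertility(sentence, vocab):
    """Average number of subword tokens per whitespace-delimited word."""
    words = sentence.lower().split()
    return sum(len(greedy_tokenize(w, vocab)) for w in words) / len(words)


# Hypothetical vocabulary built from modern Dutch spellings
VOCAB = {
    "de", "dat", "zij", "in", "het", "was", "huis", "getuige", "verklaart",
    "ge", "tuig", "ver", "klaar", "t", "e", "s",
}

modern = "de getuige verklaart dat zij in het huis was"
early_modern = "de getuijge verklaert dat sij in het huijs was"

print(fertility(modern, VOCAB))        # 1.0
print(fertility(early_modern, VOCAB))  # 26 tokens / 9 words, about 2.89
```

Real BPE or SentencePiece tokenizers segment differently, but the effect is the same in kind: orthographic variants absent from the vocabulary are split into short, less informative fragments.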

Table 2. Example of a high-quality translation by Llama-3-Unsloth showing preservation of legal syntax and formal register

Table 3. Example of an unsatisfactory translation: the Dutch text is copied verbatim rather than translated

Figure 3. Distribution of BERTScore values for all systems, ranked by mean.
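Figure 3 ranks systems by mean BERTScore. BERTScore embeds candidate and reference tokens with a pretrained encoder, greedily matches each token to its most similar counterpart by cosine similarity, and combines the resulting precision and recall into an F1 score. The sketch below assumes the token embeddings are already computed (toy vectors stand in for encoder outputs), omits the optional idf weighting and baseline rescaling offered by the reference implementation, and adds a hypothetical helper that ranks systems by their mean sentence-level score.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def bertscore_f1(cand_emb, ref_emb):
    """Greedy-matching BERTScore F1 without idf weighting or baseline rescaling.

    cand_emb, ref_emb: lists of token embeddings for candidate and reference.
    """
    # Precision: each candidate token matched to its most similar reference token
    precision = mean(max(cosine(c, r) for r in ref_emb) for c in cand_emb)
    # Recall: each reference token matched to its most similar candidate token
    recall = mean(max(cosine(r, c) for c in cand_emb) for r in ref_emb)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rank_by_mean(scores):
    """scores: {system: [sentence-level scores]} -> [(system, mean)], best first."""
    return sorted(((name, mean(vals)) for name, vals in scores.items()),
                  key=lambda item: item[1], reverse=True)
```

For example, `bertscore_f1([[1, 0], [0, 1]], [[1, 0]])` returns 2/3: recall is perfect, but only half of the candidate tokens find a close match in the reference.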

Figure 4. Distribution of METEOR scores for all models, ranked by mean.

Figure 5. Sentence-level METEOR distributions for the two best systems.
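Figures 4 and 5 report METEOR, which scores a candidate against a reference through unigram matches: precision P and recall R are combined as F_mean = P*R / (alpha*P + (1 - alpha)*R), and a fragmentation penalty gamma*(chunks / matches)^beta is then applied, where a chunk is a run of matched words that are contiguous in both sentences. The simplified sketch below matches exact lower-cased words only and aligns them greedily; full METEOR also matches stems and WordNet synonyms (as NLTK's `meteor_score` does), and the original metric prefers, among equally large alignments, the one with the fewest crossing links. The defaults alpha = 0.9, beta = 3, gamma = 0.5 follow NLTK's defaults; the article does not state here which implementation or parameters it used, and `meteor_exact` is a hypothetical helper.

```python
def meteor_exact(candidate, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """Simplified METEOR: exact unigram matches with a greedy alignment."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    used = set()
    alignment = []  # (candidate index, reference index) pairs in candidate order
    for i, word in enumerate(cand):
        for j, ref_word in enumerate(ref):
            if j not in used and ref_word == word:
                used.add(j)
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Count chunks: a new chunk starts whenever consecutive matches are not
    # adjacent in both the candidate and the reference
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1
    penalty = gamma * (chunks / matches) ** beta
    return fmean * (1 - penalty)
```

Reordering alone is penalised: scoring "declared the witness" against "the witness declared" keeps precision and recall at 1 but produces two chunks, lowering the score to 23/27 (about 0.85).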

Table A1. Bad translation from early modern Dutch to English by [Unsloth Mistral]

Table A2. Good translation from early modern Dutch to English by [Unsloth Mistral]

Table A3. Good translation from early modern Dutch to English by [Llama & Assistant]

Table A4. Poor translation from early modern Dutch to English by [Llama & Assistant]

Table A5. Translations of early modern Dutch text by different models

Table A6. Training arguments for different models
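Table A6 lists the training arguments used for each model; they are not reproduced here. To illustrate why a 7B-parameter model can be fine-tuned on a single consumer GPU, the back-of-envelope calculation below assumes a QLoRA-style setup: 4-bit frozen base weights plus LoRA adapters on every linear projection, with rank r = 16 chosen purely for illustration. This setup is typical of Unsloth but is an assumption, not the configuration reported in the article; the layer dimensions are those of the public Mistral-7B configuration, and the parameter total is approximate.

```python
# Hypothetical setting: LoRA adapters of rank r on every linear projection of a
# Mistral-7B-sized decoder, with the frozen base weights quantised to 4 bits.
HIDDEN = 4096         # model width
INTERMEDIATE = 14336  # feed-forward width
KV_DIM = 1024         # 8 key/value heads x 128 dimensions (grouped-query attention)
LAYERS = 32
BASE_PARAMS = 7.24e9  # approximate total parameter count


def lora_params(d_in, d_out, r):
    """LoRA adds A (d_in x r) and B (r x d_out) beside a frozen d_in x d_out weight."""
    return r * (d_in + d_out)


def trainable_lora_params(r):
    """Trainable adapter parameters summed over all decoder layers."""
    per_layer = (
        2 * lora_params(HIDDEN, HIDDEN, r)          # q_proj, o_proj
        + 2 * lora_params(HIDDEN, KV_DIM, r)        # k_proj, v_proj
        + 2 * lora_params(HIDDEN, INTERMEDIATE, r)  # gate_proj, up_proj
        + lora_params(INTERMEDIATE, HIDDEN, r)      # down_proj
    )
    return LAYERS * per_layer


def weights_gib(n_params, bits):
    """Storage for n_params weights at the given bit width, in GiB."""
    return n_params * bits / 8 / 2**30


print(trainable_lora_params(16))               # 41943040
print(round(weights_gib(BASE_PARAMS, 16), 1))  # 13.5 (16-bit base weights)
print(round(weights_gib(BASE_PARAMS, 4), 1))   # 3.4 (4-bit base weights)
```

Under these assumptions only about 42 million parameters (under 0.6 % of the model) receive gradients and optimizer state, and the frozen weights shrink from roughly 13.5 GiB to roughly 3.4 GiB; activations, the KV cache and quantization overhead add to this in practice.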

Table A7. Translation responses of various models

Table A8. Good translation from early modern Dutch to English by [Mistral-7B-Instruct-v0.3]

Table A9. Good translation from early modern Dutch to English by [Mistral-7B-Instruct-v0.3]

Table A10. Statistics of BERTScore values for all models

Table A11. Statistics of METEOR scores for all models

Figure A1. Heat map of BERTScore and METEOR scores.