Creating, Linking, and Analyzing Chinese and Korean Datasets: Digital Text Annotation in MARKUS and COMPARATIVUS

Published online by Cambridge University Press: 12 August 2020

Hilde De Weerdt*
Affiliation:
Leiden University
*Corresponding author. Email: h.g.d.g.de.weerdt@hum.leidenuniv.nl

Extract

MARKUS, a multilingual digital text annotation and analysis platform, allows historians and other researchers to construct datasets from primary sources available to them in full-text digital format. Originally designed for those working with pre-twentieth-century Chinese texts, MARKUS has developed into a multifunctional annotation platform that is particularly suited to the automated annotation, referencing, and visualization of named entities in modern and literary Chinese and premodern Korean texts. Many of its additional annotation features can be used to read and analyze texts in any language, as long as the electronic documents are encoded in Unicode, the most widely used character-encoding standard. Below I discuss the main goals and methodological features of MARKUS and the allied text comparison utility COMPARATIVUS, and illustrate them with examples of how MARKUS has been used in Chinese and Korean historical research.

Information

Type: Utilities
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits noncommercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © Cambridge University Press 2020

Figure 1. List of datasets that can be selected in MARKUS automated markup.