Digitizing Premodern Text with the Chinese Text Project

Published online by Cambridge University Press: 12 August 2020

Donald Sturgeon*
Affiliation:
Durham University, email: donald.j.sturgeon@durham.ac.uk

Abstract

The widespread availability of digitized premodern textual sources – together with increasingly sophisticated means for their manipulation – has brought enormous practical benefits to scholars whose work relies upon reference to their contents. While great progress has been made with the construction of ever more comprehensive database systems and archives, far more remains not only possible but also realistically achievable in the near future. This paper discusses some of the key challenges faced, and progress made towards solving them, in the context of a widely used open digital platform attempting to expand the range of digitized sources available while simultaneously increasing the scope of meaningful tasks that can be performed with them computationally. This paper aims to suggest how seemingly simple human-mediated additions to the digitized historical record – when combined with the power of digital systems to repeatedly perform mechanical tasks at enormous scales – quickly lead to transformative changes in the feasible scope of computational analysis of premodern writing.

Type
Utilities
Copyright
Copyright © Cambridge University Press 2020

References

1 I do not mean here to trivialize the considerable effort and expertise which goes into professional digitization.

2 Donald Sturgeon, “Large-scale Optical Character Recognition of Pre-modern Chinese Texts,” International Journal of Buddhist Thought and Culture 28.2 (2018), 11–44.

3 Additional rules are used to cover more complex cases, such as instances of textual corruption in the edition being transcribed: https://ctext.org/instructions/wiki-formatting.

4 A logical enhancement of the approach described here is to further subdivide texts, using markup to explicitly record which parts are authored by which persons (if known) and during which time periods—recording the information that a preface is of different authorship to the main text, say, or that the text being commented upon predates the commentary.

5 The contribution to this proportion of texts whose dates of authorship are imprecise is distributed equally across all years within the recorded range for that text.

6 Donald Sturgeon, “Chinese Text Project: A Dynamic Digital Library of Premodern Chinese,” Digital Scholarship in the Humanities (2019, Advance articles).

7 Hilde De Weerdt, Ming-kin Chu, and Hou-ieong Ho, “Chinese Empires in Comparative Perspective: A Digital Approach,” Verge: Studies in Global Asias 2.2 (2016), 58–69.

8 Donald Sturgeon, “Digital Approaches to Text Reuse in the Early Chinese Corpus,” Journal of Chinese Literature and Culture 5.2 (2018), 186–213.

11 Rainer Simon, Elton Barker, Leif Isaksen, and Pau de Soto Cañamares, “Linking Early Geospatial Documents, One Place at a Time: Annotation of Geographic Documents with Recogito,” e-Perimetron 10.2 (2015), 49–59.

12 De Weerdt et al., “Chinese Empires in Comparative Perspective.”

14 “Verifiable” here means having the ability to confirm that such a claim was made in some particular primary source text—not, of course, that the claim itself is historical fact.

15 Chao-Lin Liu, Chih-Kai Huang, Hongsu Wang, and Peter K. Bol, “Toward Algorithmic Discovery of Biographical Information in Local Gazetteers of Ancient China,” 29th Pacific Asia Conference on Language, Information and Computation (2015), 87–95.
