
Digitizing the Textual Heritage of the Premodern Islamicate World: Principles and Plans

Published online by Cambridge University Press:  31 January 2018

Matthew Thomas Miller
Affiliation:
Roshan Institute for Persian Studies, University of Maryland, College Park, Md.; e-mail: mtmiller@umd.edu
Maxim G. Romanov
Affiliation:
Department of History, University of Vienna, Vienna, Austria; e-mail: maxim.romanov@univie.ac.at
Sarah Bowen Savant
Affiliation:
Aga Khan University, Institute for the Study of Muslim Civilisations, London; e-mail: sarahsavant@aku.edu

Extract

The varied textual traditions of the premodern Islamicate World represent an opportunity and a problem for the Digital Humanities (DH). The opportunity lies in the sheer extent of this textual heritage: if we combine the textual output of premodern Persian and Arabic authors (not to mention Turkish and other less well-represented Islamicate languages), this body of texts constitutes arguably the largest written repository of human culture. Analytical methods developed for other linguistic heritages can be repurposed to make use of this wealth of texts, and efforts are now underway to apply to them a series of computationally enhanced methods that derive from a variety of disciplines (e.g., corpus linguistics, computational linguistics, the social sciences, and statistics). The application of these forms of analysis to these large new corpora promises new insights on premodern Islamicate cultures and the improvement of existing digital tools and methodologies.

Type
Roundtable
Copyright
Copyright © Cambridge University Press 2018 

References

NOTES

1 In alphabetical order.

2 For the guidelines, see http://mesana.org/resources/digital-scholarship.html. See also Todd Presner, “How to Evaluate Digital Scholarship,” Journal of Digital Humanities 1 (2012), accessed 18 September 2017, http://journalofdigitalhumanities.org/1-4/how-to-evaluate-digital-scholarship-by-todd-presner.

3 See, for example, the “Collaborators’ Bill of Rights” and the “Student Collaborators’ Bill of Rights” for important efforts to lay out foundational principles for equitable collaboration: Tanya Clement and Doug Reside, “Off the Tracks: Laying New Lines for Digital Humanities Scholars,” Media Commons Press, accessed 15 September 2017, http://mcpress.media-commons.org/offthetracks/part-one-models-for-collaboration-career-paths-acquiring-institutional-support-and-transformation-in-the-field/a-collaboration/collaborators%E2%80%99-bill-of-rights/; Haley Di Pressi, Stephanie Gorman, Miriam Posner, Raphael Sasayama, and Tori Schmitt, with contributions from Roderic Crooks, Megan Driscoll, Amy Earhart, Spencer Keralis, Tiffany Naiman, and Todd Presner, “A Student Collaborators’ Bill of Rights,” UCLA Center for Digital Humanities, accessed 15 September 2017, www.cdh.ucla.edu/news-events/a-student-collaborators-bill-of-rights/.

4 See al-Maktaba al-Shamila, accessed 15 September 2017, http://shamela.ws/.

5 See al-Maktaba al-Shiʿiyya, accessed 15 September 2017, http://shiaonlinelibrary.com.

6 See A Digital Corpus for Graeco-Arabic Studies, accessed 15 September 2017, https://www.graeco-arabic-studies.org/.

7 See Arabic Commentaries on the Hippocratic Aphorisms, accessed 15 September 2017, http://cordis.europa.eu/project/rcn/100847_en.html.

8 See Ganjoor, accessed 15 September 2017, https://ganjoor.net/.

9 For more on OpenITI mARkdown schema, see Maxim Romanov, “OpenITI mARkdown,” al-Raqmiyyat, accessed 15 September 2017, https://alraqmiyyat.github.io/mARkdown/. For more on CTS and specifically CapiTainS, see CapiTainS, accessed 15 September 2017, http://capitains.org/. For more on TEI, see Text Encoding Initiative, accessed 15 September 2017, http://www.tei-c.org/index.xml.

10 The OpenITI repository is available at https://github.com/OpenITI/, accessed 15 September 2017. For more on OpenITI CTS URNs, see Maxim Romanov, “OpenITI,” al-Raqmiyyat, accessed 15 September 2017, https://alraqmiyyat.github.io/OpenITI/.

11 Traditional OCR approaches work by segmenting page images into lines, then each line into words, and then each word into characters. Since segmentation is extremely problematic when it comes to connected, ligature-rich scripts, performance is consistently poor on the last two steps. In contrast to this approach, Kraken completely eliminates the issue of word/character segmentation by instead employing a form of machine learning called a neural network. Neural networks mimic the way we learn, enabling Kraken to “learn” from transcriptions (training data) to recognize letters in the images of entire lines of text. This new approach to OCR makes Kraken uniquely able to handle the wide variety of ligatures in connected scripts such as Arabic and Persian.

12 Benjamin Kiessling, Matthew Thomas Miller, Maxim Romanov, and Sarah Bowen Savant, “Important New Developments in Arabographic Optical Character Recognition (OCR),” al-ʿUsur al-Wusta, accessed 20 November 2017, http://islamichistorycommons.org/mem/wp-content/uploads/sites/55/2017/11/UW-25-Savant-et-al.pdf.

13 Generalized models incorporate script features from multiple typefaces and thus are less typeface specific and better able to handle typefaces for which we have not trained a specific model.
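
To make the line-level approach described in note 11 more concrete, the following is a minimal sketch of how a recognizer can turn a whole line into text without first cutting it into words or characters. It assumes a CTC-style ("connectionist temporal classification") output, in which the network scores every label, plus a special "blank", at each horizontal position of the line image. The notes do not state how Kraken decodes its network output, so this is an illustrative assumption, not Kraken's API; the function name, the BLANK index, and the toy scores are all hypothetical.

```python
from typing import Sequence

# Hypothetical convention: label 0 means "no character at this position".
BLANK = 0


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Turn per-position label scores for one text line into a string.

    frame_scores[t][k] is the score of label k at horizontal position t.
    Label k > 0 stands for alphabet[k - 1]; label 0 is the blank.
    No word or character boundaries are supplied: they emerge from
    collapsing repeated labels and dropping blanks.
    """
    decoded = []
    previous = BLANK
    for scores in frame_scores:
        # Pick the highest-scoring label at this position.
        label = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the label chosen at the previous position.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


# Toy input: five positions across a line, three labels (blank, "a", "b").
toy_scores = [
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.7, 0.2],    # "a" again, collapsed into the previous one
    [0.9, 0.05, 0.05],  # blank separates two distinct letters
    [0.2, 0.7, 0.1],    # a second "a"
    [0.1, 0.2, 0.7],    # "b"
]
result = greedy_ctc_decode(toy_scores, "ab")
```

Because a letter may span any number of positions, a ligature or connected letter form needs no explicit boundary, which is the property note 11 highlights for scripts such as Arabic and Persian.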
