Creating a Software Methodology to Analyze and Preserve Archaeological Legacy Data

Published online by Cambridge University Press: 28 March 2023

Emily C. Fletcher*
Department of Anthropology, Purdue University, West Lafayette, IN, USA
*Corresponding author (Fletch47@purdue.edu)

Abstract

Software now allows archaeologists to document excavations in more detail than ever before through rich, born-digital datasets. In comparison, paper documentation of past excavations (a valuable corpus of legacy data) is prohibitively difficult to work with. This pilot study explores creating custom software that converts paper field notes from the 1970s excavations of the Gulkana site into machine-readable text and maps compatible with born-digital data from subsequent excavations in the 1990s. The site, located in Alaska's Copper River Basin, is important to the archaeological understanding of metalworking innovation by precontact Northern Dene people, but it is underrepresented in the literature because no comprehensive map of the site exists. The process and results of digitizing this corpus are presented in the hope of aiding similar efforts by other researchers.

Information

Type
Article
Creative Commons
Creative Commons Attribution 4.0 (CC BY 4.0)
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Open Practices
Open materials
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press on behalf of Society for American Archaeology

FIGURE 1. Ethnolinguistic map of the Copper River region, including the Gulkana site and nearby sources of native copper (Wrangell and St. Elias Mountains). Adapted from Cooper 2011.

FIGURE 2. A scanned page from a field notebook. Full pages such as these can also be used as training images. Using full pages limits the time spent collecting training images but makes annotation more tedious. These pages also may not include a comparable number of instances of each character.
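Because a full page may not contain a comparable number of instances of each character, it can help to count the characters in a page's transcription before using it for training. The following is a minimal Python sketch of such a check; the function names and the `minimum` threshold are illustrative assumptions, not part of the study's software.

```python
from collections import Counter


def character_counts(transcription: str) -> Counter:
    """Count every non-whitespace character in a page transcription."""
    return Counter(ch for ch in transcription if not ch.isspace())


def underrepresented(counts: Counter, alphabet: str, minimum: int) -> list:
    """List the characters of `alphabet` seen fewer than `minimum` times.

    `minimum` is a hypothetical threshold chosen by the user; a Counter
    returns 0 for characters that never occur, so those are listed too.
    """
    return [ch for ch in alphabet if counts[ch] < minimum]


# Usage on a short transcribed line (illustrative only).
counts = character_counts("Depth 96 cm. below datum. Coordinates 5.02N 3.14E.")
print(underrepresented(counts, "0123456789", minimum=2))
```

Characters flagged this way are candidates for extra examples, for instance in composite training lines like the one in Figure 3.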

FIGURE 3. Example of a training image created by the researcher. This image was created by cropping an empty line from a scan of a field notebook, then overlaying characters cut and pasted from other locations in that notebook.
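The cut-and-paste procedure in Figure 3 was done by hand. If glyph crops are already available as image arrays, the same idea could be scripted, and recording where each glyph is pasted would also yield its bounding box. The sketch below assumes grayscale NumPy arrays with dark ink on light paper; the function name, left-to-right spacing, vertical centering, and darker-pixel blending are illustrative choices, not the researcher's procedure.

```python
import numpy as np


def compose_training_line(background, glyphs, x_start=5, spacing=3):
    """Paste glyph crops left to right onto a copy of an empty notebook line.

    background: 2-D uint8 grayscale array (an empty line cropped from a scan).
    glyphs: list of (char, 2-D uint8 array) pairs cut from elsewhere in the notebook.
    Returns the composed image and (char, x, y, w, h) boxes, where (x, y) is
    the glyph's top-left corner in image coordinates.
    """
    canvas = background.copy()
    boxes = []
    x = x_start
    for char, glyph in glyphs:
        h, w = glyph.shape
        if h > canvas.shape[0] or x + w > canvas.shape[1]:
            raise ValueError(f"glyph {char!r} does not fit on the line")
        y = (canvas.shape[0] - h) // 2  # center the glyph vertically on the line
        # Keep the darker pixel so pasted ink overlays the paper texture
        # instead of replacing it with the glyph's own light background.
        canvas[y:y + h, x:x + w] = np.minimum(canvas[y:y + h, x:x + w], glyph)
        boxes.append((char, x, y, w, h))
        x += w + spacing
    return canvas, boxes


# Synthetic arrays for illustration: a white 20 x 50 line and two black glyphs.
line = np.full((20, 50), 255, dtype=np.uint8)
glyph = np.zeros((10, 6), dtype=np.uint8)
image, boxes = compose_training_line(line, [("5", glyph), ("N", glyph)])
print(boxes)  # → [('5', 5, 5, 6, 10), ('N', 14, 5, 6, 10)]
```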

FIGURE 4. Using jTessBoxEditor to create a box file from the training image in Figure 2. The user draws boxes around each character. These boxes are represented on the left (and in the resulting box file) by a starting location and dimensions.
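jTessBoxEditor lists each box as a starting location and dimensions. Tesseract's box-file format stores one character per line as `<char> <left> <bottom> <right> <top> <page>`, with y measured from the bottom edge of the image. The sketch below converts boxes recorded as (x, y, width, height) with a top-left origin, as most image libraries use, into that line format; the function name and the input layout are assumptions.

```python
def to_box_lines(boxes, image_height, page=0):
    """Convert (char, x, y, w, h) boxes with a top-left origin into
    Tesseract box-file lines "<char> <left> <bottom> <right> <top> <page>".

    Tesseract measures y upward from the bottom of the image, so each
    box's vertical extent is flipped using the image height.
    """
    lines = []
    for char, x, y, w, h in boxes:
        left, right = x, x + w
        bottom = image_height - (y + h)  # lower edge, measured from the bottom
        top = image_height - y           # upper edge, measured from the bottom
        lines.append(f"{char} {left} {bottom} {right} {top} {page}")
    return lines


print(to_box_lines([("5", 5, 5, 6, 10)], image_height=20))  # → ['5 5 5 11 15 0']
```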

TABLE 1. Comparing the Accuracy and Time Investment of Identifying Location Data by Hand and Algorithmically.
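This section does not define how accuracy in Table 1 was measured. One common way to score an extraction step against a hand-checked reference is precision (the share of flagged words that are true location words) and recall (the share of true location words that were flagged). The sketch below assumes each location word is identified by a hypothetical (line number, token) pair so that repeated tokens stay distinct; it is not the scoring used in the study.

```python
def score(predicted: set, truth: set) -> dict:
    """Precision, recall, and F1 of flagged location words against a reference set.

    Items are assumed to be hashable identifiers such as (line_number, token).
    Empty inputs score 0.0 rather than raising ZeroDivisionError.
    """
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy identifiers for illustration only; these are not the study's results.
reference = {(1, "5.02N"), (1, "3.14E"), (2, "96")}
flagged = {(1, "5.02N"), (2, "96"), (3, "west")}
print(score(flagged, reference))
```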

TABLE 2. Cross Tabulation of Researcher and Algorithmic (ArchLocateR) Success at Identifying Location Words in the Text.
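A cross tabulation like Table 2 can be built by checking, for each true location word, whether the researcher and the algorithm each found it. The sketch below assumes all three inputs are sets of comparable identifiers, such as (line number, token) pairs; the function names and row and column labels are illustrative. It counts only true location words, so words flagged in error would need a separate tally.

```python
from collections import Counter


def cross_tab(truth: set, researcher: set, algorithm: set) -> Counter:
    """Count true location words by (found by researcher, found by algorithm)."""
    table = Counter()
    for item in truth:
        table[(item in researcher, item in algorithm)] += 1
    return table


def format_cross_tab(table: Counter) -> str:
    """Render the 2 x 2 table with researcher outcomes as rows."""
    lines = [f"{'':18}{'alg. found':>12}{'alg. missed':>13}"]
    for found, label in ((True, "researcher found"), (False, "researcher missed")):
        lines.append(f"{label:18}{table[(found, True)]:>12}{table[(found, False)]:>13}")
    return "\n".join(lines)


# Toy identifiers for illustration only; these are not the study's results.
words = {"a", "b", "c", "d"}
print(format_cross_tab(cross_tab(words, researcher={"a", "b"}, algorithm={"a", "c"})))
```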

FIGURE 5. Example of a relative location coordinate associated with a found object. Transcription: “Found what appears to be a charred seed in the yellowish-grey sandy silt near west wall of test pit. Depth 96 cm. below datum. Coordinates 5.02N 3.14E.”
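Notes like the one in Figure 5 state depth and position in a fairly regular phrasing ("Depth 96 cm" and "Coordinates 5.02N 3.14E"), so a rule-based pass can pull out the numbers. The regular expressions below are a hypothetical stand-in written for this one phrasing; they are not ArchLocateR's implementation, and real notes would likely need more patterns. Returning south and west readings as negative north and east values is an added assumption.

```python
import re

# "<number>N|S <number>E|W", e.g. "5.02N 3.14E" (hypothetical pattern).
COORD_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])")
# "Depth <number> cm", e.g. "Depth 96 cm." (hypothetical pattern).
DEPTH_RE = re.compile(r"[Dd]epth\s+(\d+(?:\.\d+)?)\s*cm")


def extract_location(note: str):
    """Return (north, east, depth_cm) from a transcribed note, None where absent."""
    north = east = depth = None
    coords = COORD_RE.search(note)
    if coords:
        north = float(coords.group(1)) * (1 if coords.group(2) == "N" else -1)
        east = float(coords.group(3)) * (1 if coords.group(4) == "E" else -1)
    depth_match = DEPTH_RE.search(note)
    if depth_match:
        depth = float(depth_match.group(1))
    return north, east, depth


note = ("Found what appears to be a charred seed in the yellowish-grey sandy silt "
        "near west wall of test pit. Depth 96 cm. below datum. "
        "Coordinates 5.02N 3.14E.")
print(extract_location(note))  # → (5.02, 3.14, 96.0)
```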

FIGURE 6. A comparison of maps of the Feature 40 corpus drawn manually by the researcher and generated algorithmically through steps 2 and 3 of this analysis. Labels have not been cleaned and are as they appear in the original text.
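This section does not show how steps 2 and 3 draw the maps. One plausible intermediate step is to write the extracted points to a table that GIS or plotting software can import, with east as x and north as y. The sketch below does this with Python's standard `csv` module; the column names and the (label, north, east, depth) input layout are assumptions, and labels are written uncleaned, as in Figure 6.

```python
import csv
import io


def points_to_csv(points) -> str:
    """Write (label, north, east, depth_cm) tuples as CSV point data.

    East becomes the x column and north the y column. Points missing either
    coordinate are skipped because they cannot be placed on a map.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["label", "x_east", "y_north", "depth_cm"])
    for label, north, east, depth in points:
        if north is None or east is None:
            continue
        writer.writerow([label, east, north, "" if depth is None else depth])
    return buffer.getvalue()


# Illustrative input only.
print(points_to_csv([("charred seed", 5.02, 3.14, 96.0), ("flake", None, 1.0, None)]))
```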

FIGURE 7. Comparison of the details included in the same section of the maps in Figure 6 (the northeast quadrant of unit N6E4).

Supplementary material

Fletcher supplementary material 1 (File, 622 bytes)

Fletcher supplementary material 2 (File, 1.4 KB)