
Semantic Segmentation of Archaeological Features on Public Lands: Case Study of Historical Cotton Terraces within the Piedmont National Wildlife Refuge, Georgia, USA

Published online by Cambridge University Press:  08 July 2025

Claudine Gravel-Miguel*
Affiliation:
Cultural Resource Sciences Program, New Mexico Consortium, Los Alamos, NM, USA Department of Anthropology, University of New Mexico, Albuquerque, NM, USA
Grant Snitker
Affiliation:
Cultural Resource Sciences Program, New Mexico Consortium, Los Alamos, NM, USA Department of Anthropology, University of New Mexico, Albuquerque, NM, USA
Jayde N. Hirniak
Affiliation:
Institute of Human Origins, School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
Katherine Peck
Affiliation:
Cultural Resource Sciences Program, New Mexico Consortium, Los Alamos, NM, USA Department of Anthropology, University of New Mexico, Albuquerque, NM, USA
Alex Fetterhoff
Affiliation:
Cultural Resource Sciences Program, New Mexico Consortium, Los Alamos, NM, USA Department of Anthropology, University of New Mexico, Albuquerque, NM, USA
Corresponding author: Claudine Gravel-Miguel; Email: cgravelmiguel@newmexicoconsortium.org

Abstract

The logistics, costs, and capacity needed to complete extensive archaeological pedestrian surveys to inventory cultural resources present challenges to public land managers. To address these issues, we developed a workflow combining lidar-derived imagery and deep learning (DL) models tailored for cultural resource management (CRM) programs on public lands. It combines Python scripts that fine-tune models to recognize archaeological features in lidar-derived imagery with QGIS denoising steps that improve the predictions’ performance and applicability. We present this workflow through an applied case study focused on detecting historic agricultural terraces in the Piedmont National Wildlife Refuge, Georgia, USA. For this project, we fine-tuned pretrained U-Net models to recognize agricultural terraces in imagery, identified the parameter settings that led to the highest recall for detecting terraces, and used those settings to train models on incremental dataset sizes, which allowed us to identify the minimum training size necessary to obtain satisfactory models. The results demonstrate effective models that detect most terraces even when trained on small datasets. This study provides a robust methodology that requires only basic proficiency in Python coding and expands DL applications in federal CRM by advancing lidar and machine learning methods for archaeological inventorying, monitoring, and preservation.

Resumen

The logistics, costs, and capacity needed to complete pedestrian archaeological surveys to inventory cultural resources present challenges to public land managers. To address these issues, we developed a workflow that combines lidar-derived imagery and deep learning models tailored for cultural resource management (CRM) programs on public lands. This workflow combines Python scripts that fine-tune models to recognize archaeological features in lidar-derived imagery with QGIS denoising filters that improve the predictions’ performance and applicability. We present this workflow through a case study focused on detecting historic agricultural terraces in the Piedmont National Wildlife Refuge, Georgia, USA. To carry it out, we spent several weeks geolocating terraces visible in lidar-derived imagery. To fine-tune pretrained U-Net models and teach them to recognize agricultural terraces in the imagery, we tested different combinations of parameters. We identified the settings that achieved the highest recall for detecting terraces and used them to train models on incrementally larger datasets. This process allowed us to determine the minimum training size necessary to obtain satisfactory model performance. The results present highly effective models that are sensitive enough to detect most terraces, even when trained on small datasets.
This study therefore provides a robust methodology that requires basic proficiency in Python coding but expands deep learning applications in federal CRM by advancing lidar and machine learning methods for the inventorying, monitoring, and preservation of archaeological resources.

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Society for American Archaeology.

Figure 1. (Left) Example of terraces visible on an SLRM (20 m radius) tile and (right) its associated binary mask showing the location of terraces (white) versus the background (black).
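The tile-and-mask pairing in Figure 1 can be sketched as a simple thresholding step. Assuming the terrace annotations have already been rasterized into a label raster (the tiny `labels` array below is a hypothetical stand-in, not the authors' data), any nonzero pixel becomes white (terrace) and the rest black (background):

```python
import numpy as np

# Hypothetical label raster: nonzero pixels mark annotated terraces,
# with different integers distinguishing individual terrace polygons.
labels = np.array([[0, 2, 2],
                   [0, 0, 1],
                   [3, 0, 0]])

# Collapse all terrace labels into a single white class (255) on black (0).
mask = (labels > 0).astype(np.uint8) * 255
print(mask)
```

In a real workflow the label raster would come from rasterizing annotated polygons over each SLRM tile, but the thresholding step is the same.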


Figure 2. Study area of the PNWR and the coverage of the lidar-derived DEM (in grayscale) within its broader spatial context (the white arrow points to the white dot that shows its location). Basemap: ESRI satellite imagery.


Figure 3. Workflow from raw data to mapped terraces.


Figure 4. Diagrams of the different U-Net structures (drawn by CGM, based on the visualization in Ronneberger et al. 2015). Top: VGG16 and VGG19 (the gray boxes show the convolutions that are not present in VGG16); bottom: ResNet18.


Figure 5. Set of terraces visible in three one-band visualization maps (SLRM with 10 m radius, PosOp, and Slope), as well as the RGB image they produce when combined as a three-band tile (SLRM in the red channel, PosOp in the green channel, and Slope in the blue channel).
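As a rough illustration of the band combination described in Figure 5 (a minimal sketch, not the authors' actual code), each one-band visualization can be rescaled to 0–255 and stacked into a three-band RGB tile with NumPy. The `normalize_band` helper and the random toy bands are assumptions for the example:

```python
import numpy as np

def normalize_band(band):
    """Rescale a single visualization band to the 0-255 uint8 range."""
    lo, hi = band.min(), band.max()
    scaled = (band - lo) / (hi - lo) if hi > lo else np.zeros_like(band)
    return (scaled * 255).astype(np.uint8)

def make_three_band_tile(slrm, posop, slope):
    """Stack SLRM (red), PosOp (green), and Slope (blue) into one RGB tile."""
    return np.dstack([normalize_band(b) for b in (slrm, posop, slope)])

# Toy 4x4 arrays standing in for real lidar-derived rasters.
rng = np.random.default_rng(0)
slrm, posop, slope = (rng.random((4, 4)) for _ in range(3))
tile = make_three_band_tile(slrm, posop, slope)
print(tile.shape, tile.dtype)  # (4, 4, 3) uint8
```

In practice the three input rasters would be co-registered tiles exported from the same DEM, so they share shape and georeferencing.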


Table 1. Augmentations Used in This Research.


Table 2. Parameters Tested in the Order They Were Tested.


Figure 6. Extent of the different training datasets against the extent of the lidar data (black) and the annotated terraces (white lines).


Table 3. Number of Tiles Used for Each Training on Smaller Datasets.


Figure 7. Example of the object-by-object metric workflow. Windows A–C are from a different area than windows D and E: (A) predicted terraces shown as black pixels; (B) polygons labeled with their area (m²); each polygon’s color shows whether it will be automatically deleted (dark gray) or kept (white) based on the 1,600 m² denoising threshold; (C) denoised predicted polygons; (D) example of annotated terraces; (E) superposition of the annotated terraces and the predicted terraces. The fill colors show how they are used to calculate the object-by-object metrics: TP and FN are annotations that do and do not overlap with predictions, respectively, whereas FP are predictions that do not overlap with annotations.
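The denoising and object-by-object counting logic in Figure 7 can be sketched in pure Python. This is a simplified stand-in for the real QGIS vector workflow: axis-aligned boxes `(xmin, ymin, xmax, ymax)` replace arbitrary polygons, and all function names and toy coordinates are hypothetical:

```python
def area(box):
    """Area of an axis-aligned box (xmin, ymin, xmax, ymax), in m2."""
    xmin, ymin, xmax, ymax = box
    return (xmax - xmin) * (ymax - ymin)

def overlaps(a, b):
    """True if two axis-aligned boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def denoise(predictions, min_area=1600):
    """Drop predicted polygons smaller than the denoising threshold (m2)."""
    return [p for p in predictions if area(p) >= min_area]

def object_metrics(annotations, predictions):
    """TP/FN count annotations by overlap; FP counts unmatched predictions."""
    tp = sum(any(overlaps(a, p) for p in predictions) for a in annotations)
    fn = len(annotations) - tp
    fp = sum(not any(overlaps(p, a) for a in annotations) for p in predictions)
    recall = tp / (tp + fn) if annotations else 0.0
    return {"TP": tp, "FN": fn, "FP": fp, "recall": recall}

# Toy example: two annotated terraces, three raw predictions.
annotated = [(0, 0, 100, 40), (200, 0, 300, 40)]
predicted = [(10, 5, 90, 35),       # overlaps the first annotation -> TP
             (500, 500, 520, 510),  # tiny noise blob, removed by denoising
             (400, 0, 480, 40)]     # survives denoising, matches nothing -> FP
cleaned = denoise(predicted)
metrics = object_metrics(annotated, cleaned)
print(metrics)  # {'TP': 1, 'FN': 1, 'FP': 1, 'recall': 0.5}
```

With real polygon geometries, the `area` and `overlaps` helpers would be replaced by the equivalent QGIS (or Shapely) geometry operations; the counting logic is unchanged.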


Figure 8. Smoothed metrics per epoch for the 200-epoch training of the best parameters (smoothing span 0.2). The gray represents the 95% confidence interval around the smoothed value.


Table 4. Object-by-Object Performance Metrics after Automatic Denoising of the Predicted Vector Map.


Figure 9. Examples of true positives (a), false positives (b), and false negatives (c) identified by the model: (a) a small and indistinct terrace (left) and more regular terraces that are likely modern (right); (b) a pedestrian path that was incorrectly flagged; (c) an annotated terrace, not predicted by the model, that is likely a pedestrian path. The basemaps show slope overlaid on a hillshade map for increased topographic visibility. All white scale bars represent 50 m.


Figure 10. Metrics per epoch for each model trained on different dataset sizes. The gray represents the 95% confidence interval around the smoothed value (span 0.2). Refer to Figure 6 for the extent of the training dataset for each model.


Figure 11. Values extracted at the location of terrace centroids. Top: median slope within a 100 m radius. Bottom: number of terrace centroids within a 250 m radius. The notches show the 95% confidence interval around the median.
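The neighbor-count half of Figure 11 (terrace centroids within a 250 m radius) can be sketched with NumPy broadcasting. This is an illustrative assumption of how such counts might be computed, not the authors' code; the `neighbors_within` name and toy coordinates are invented for the example:

```python
import numpy as np

def neighbors_within(centroids, radius=250.0):
    """For each centroid, count the other centroids within `radius` meters."""
    pts = np.asarray(centroids, dtype=float)
    # Pairwise Euclidean distance matrix via broadcasting: (n, n).
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    within = d <= radius
    np.fill_diagonal(within, False)  # a centroid is not its own neighbor
    return within.sum(axis=1)

# Toy centroids in projected meters: two close together, one isolated.
pts = [(0, 0), (100, 0), (1000, 1000)]
counts = neighbors_within(pts)
print(counts)  # [1 1 0]
```

The pairwise matrix is O(n²) in memory, so for refuge-scale centroid sets a spatial index (e.g., a k-d tree) would be the practical choice; the counting logic is the same.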


Figure 12. Comparison of selected test predictions from two models trained on southern terraces.


Table 5. Object-by-Object Metrics Calculated on Denoised Predictions (Using Threshold 2,000) Filtered to Focus Only on the Northeastern Region That Was Used for Testing in All Models.


Figure 13. Terraces predicted by our best model, cleaned using the 2,000 m² threshold and clipped to the extent of the PNWR.