The Legality and Ethics of Web Scraping in Archaeology

Published online by Cambridge University Press: 19 April 2024

Jonathan Paige*
Affiliation:
Department of Anthropology, University of Missouri, Columbia, MO, and Center for Archaeological Research, University of Texas, San Antonio, TX
* (jonathan.n.paige@gmail.com, corresponding author)

Abstract

Web scraping, the practice of automating the collection of data from websites, is a key part of how the internet functions, and it is an increasingly important part of the research tool kit for scientists, cultural resources professionals, and journalists. There are few resources intended to train archaeologists in how to develop web scrapers. Perhaps more importantly, there are also few resources that outline the normative, ethical, and legal frameworks within which scraping of archaeological data is situated. This article is intended to introduce archaeologists to web scraping as a research method, as well as to outline the norms concerning scraping that have evolved since the 1990s, and the current state of US legal frameworks that touch on the practice. These norms and legal frameworks continue to evolve, representing an opportunity for archaeologists to become more involved in how scraping is practiced and how it should be regulated in the future.

Web scraping, the practice of automating the collection of data from websites, is a key part of how the internet functions and, increasingly, an important part of the research tool kit for scientists, cultural resources professionals, and journalists. There are few resources intended to train archaeologists in how to develop web scrapers. Perhaps more importantly, there are also few resources that describe the normative, ethical, and legal frameworks within which the scraping of archaeological data is situated. This paper aims to introduce archaeologists to web scraping as a research method, as well as to outline the norms related to scraping that have evolved since the 1990s and the current state of the US legal frameworks that touch on this practice. These norms and legal frameworks continue to evolve, which represents an opportunity for archaeologists to become more involved in how scraping is practiced and regulated in the future.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press on behalf of Society for American Archaeology

FIGURE 1. Robots.txt file associated with eBay.com in 1998. The only restriction on web scrapers was the exclusion of the service “roverbot.”

FIGURE 2. RoverBot.com website as it stood in December 1996. This was the only web-scraping program that eBay.com disallowed from scraping data on its website in 1998.

FIGURE 3. The first fraction of 475 lines of the robots.txt file associated with eBay.com as of July 2023. Most kinds of web scraping are disallowed.
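
The robots.txt files described in these captions are machine-readable, and a well-behaved scraper consults them before requesting a page. The following is a minimal sketch of that check using Python's standard-library `urllib.robotparser`. It assumes a hypothetical robots.txt modeled on the Figure 1 description (a single rule block excluding "roverbot"); the file contents, the `example-research-bot` agent name, the `is_allowed` helper, and the example.com URL are illustrative assumptions, not a copy of eBay's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written to match the Figure 1 description:
# one named crawler is excluded and no other restrictions are stated.
ROBOTS_TXT = """\
User-agent: roverbot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network request
    # is needed for this illustration.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URL and agent names; substitute the site and the
    # user-agent string that a real scraper would announce.
    target = "https://www.example.com/listings"
    for agent in ("roverbot", "example-research-bot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, target))
```

Under these assumptions the check reports that `roverbot` is refused and that any other agent is permitted, because the file contains no default (`*`) rule. A file like the one in Figure 3, with hundreds of lines, would be evaluated the same way but would refuse far more requests.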