The Neotoma Paleoecology Database, a multiproxy, international, community-curated data resource

Published online by Cambridge University Press: 17 January 2018

John W. Williams*
Affiliation:
Department of Geography, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA; Center for Climatic Research, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
Eric C. Grimm
Affiliation:
Department of Earth Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
Jessica L. Blois
Affiliation:
School of Natural Sciences, University of California, Merced, Merced, California 95343, USA
Donald F. Charles
Affiliation:
Earth and Environmental Science, Drexel University and Patrick Center, Academy of Natural Sciences of Drexel University, Philadelphia, Pennsylvania 19103, USA
Edward B. Davis
Affiliation:
Department of Earth Sciences and Museum of Natural and Cultural History, University of Oregon, Eugene, Oregon 97403, USA
Simon J. Goring
Affiliation:
Department of Geography, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
Russell W. Graham
Affiliation:
Department of Geosciences, College of Earth and Mineral Sciences, The Pennsylvania State University, State College, Pennsylvania 16802, USA
Alison J. Smith
Affiliation:
Department of Geology, Kent State University, Kent, Ohio 44242, USA
Michael Anderson
Affiliation:
SpatialIT, State College, Pennsylvania 16802, USA
Joaquin Arroyo-Cabrales
Affiliation:
Laboratorio de Arqueozoología, Instituto Nacional de Antropología e Historia, 06060 Ciudad de Mexico, CDMX, Mexico
Allan C. Ashworth
Affiliation:
Department of Geosciences, North Dakota State University, Fargo, North Dakota 58108, USA
Julio L. Betancourt
Affiliation:
National Research Program, Water Mission Area, U.S. Geological Survey, Reston, Virginia 20192, USA
Brian W. Bills
Affiliation:
Center for Environmental Informatics, The Pennsylvania State University, State College, Pennsylvania 16802, USA
Robert K. Booth
Affiliation:
Earth and Environmental Sciences Department, Lehigh University, Bethlehem, Pennsylvania 18015, USA
Philip I. Buckland
Affiliation:
Environmental Archaeology Lab, Department of Historical, Philosophical and Religious Studies, Umeå University, Umeå SE-90187, Sweden
B. Brandon Curry
Affiliation:
Illinois State Geological Survey, Champaign, Illinois 61820, USA
Thomas Giesecke
Affiliation:
Department of Palynology and Climate Dynamics, Albrecht-von-Haller-Institute for Plant Sciences, University of Göttingen, Göttingen, Germany
Stephen T. Jackson
Affiliation:
Southwest Climate Science Center, U.S. Geological Survey, Tucson, Arizona 85721, USA; Department of Geosciences, University of Arizona, Tucson, Arizona 85721, USA
Claudio Latorre
Affiliation:
Departamento de Ecología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Casilla 114-D, Santiago, Chile; Institute of Ecology and Biodiversity (IEB), Santiago, Chile
Jonathan Nichols
Affiliation:
Lamont-Doherty Earth Observatory, Palisades, New York 10964, USA
Timshel Purdum
Affiliation:
Academy of Natural Sciences of Drexel University, Philadelphia, Pennsylvania 19103, USA
Robert E. Roth
Affiliation:
Department of Geography, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA; Cartography Lab, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
Michael Stryker
Affiliation:
National Research Program, Water Mission Area, U.S. Geological Survey, Reston, Virginia 20192, USA
Hikaru Takahara
Affiliation:
Laboratory of Forest Vegetation Dynamics, Kyoto Prefectural University, Hangi-cho, Shimogamo, Sakyo-ku, Kyoto 606-8522, Japan
*Corresponding author at: Department of Geography, 550 North Park St., University of Wisconsin-Madison, Madison, Wisconsin 53706, USA. E-mail address: jww@geography.wisc.edu (J.W. Williams).

Abstract

The Neotoma Paleoecology Database is a community-curated data resource that supports interdisciplinary global change research by enabling broad-scale studies of taxon and community diversity, distributions, and dynamics during the large environmental changes of the past. By consolidating many kinds of data into a common repository, Neotoma lowers costs of paleodata management, makes paleoecological data openly available, and offers a high-quality, curated resource. Neotoma’s distributed scientific governance model is flexible and scalable, with many open pathways for participation by new members, data contributors, stewards, and research communities. The Neotoma data model supports, or can be extended to support, any kind of paleoecological or paleoenvironmental data from sedimentary archives. Data additions to Neotoma are growing and now include >3.8 million observations, >17,000 datasets, and >9200 sites. Dataset types currently include fossil pollen, vertebrates, diatoms, ostracodes, macroinvertebrates, plant macrofossils, insects, testate amoebae, geochronological data, and the recently added organic biomarkers, stable isotopes, and specimen-level data. Multiple avenues exist to obtain Neotoma data, including the Explorer map-based interface, an application programming interface, the neotoma R package, and digital object identifiers. As the volume and variety of scientific data grow, community-curated data resources such as Neotoma have become foundational infrastructure for big data science.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © University of Washington. Published by Cambridge University Press, 2018
Figure 1 Papers citing Neotoma and its constituent databases.

Figure 2 (color online) Neotoma serves many communities and acts as a boundary organization (Guston, 2001) among these communities. Neotoma serves paleoecologists by providing a high-quality repository for their paleoecological data, with value added via digital object identifiers to facilitate data citation, data curation, and a flexible data model. Neotoma serves data users by providing a well-structured, open-access, and easy-to-use source of paleoecological data, specializing in time scales that bridge the boundary between global change ecology and geology (Jackson and Hobbs, 2009; Dietl and Flessa, 2011; Betancourt, 2012; Jackson and Blois, 2015; Kidwell, 2015; Jackson, in press). In return, these communities generate new questions and analytical approaches for paleoecological data. Neotoma serves educators, students, and the general public seeking to learn about the past distributions of charismatic species such as the Pleistocene megafauna and the effects of climate change on species distribution and diversity. Neotoma also serves as a boundary organization between geoscientists and computer scientists, passing data, new research questions, best practices and protocols, and geoscientific use cases and priorities.

Figure 3 (color online) Diagram of Neotoma’s governance structure. Neotoma is governed by a leadership council, which is populated by elected members serving four-year terms. The executive working group coordinates day-to-day operations and reports to the leadership council. Other working groups coordinate education and outreach activities, build informatics and development activities, cultivate international partnerships, and handle membership requests and leadership elections. Constituent databases and the data stewards within these databases are charged with uploading data to Neotoma, setting data standards and vocabularies, adopting and harmonizing taxonomies, and deciding default age models. These constituent databases are organized by taxonomic group or paleoecological proxy type and often are further subdivided by region or time period. The Neotoma governance system is extensible, such that new members can readily join and new constituent databases can form.

Figure 4 (color online) Diagram of the Neotoma software ecosystem. Data preparation and cleaning for upload to Neotoma are handled by the Tilia software (https://www.tiliait.com/), which has password-protected access for data stewards to upload data sets, update age models, and correct errors. Data are stored in the Neotoma relational database, which is deployed in SQL Server and currently hosted at Pennsylvania State University’s Center for Environmental Informatics. Neotoma data can be discovered, explored, viewed, and obtained through multiple platforms. Neotoma Explorer, with its graphical map-based interface, is designed for first-pass data explorations, new users, and educational and student groups. The application programming interfaces (APIs) and the neotoma R package are intended for programmatic access and for users who wish to do large-volume searches of Neotoma data holdings. Tilia can also download data sets from Neotoma, which is useful for data visualizations and for data stewards needing to update data sets or looking for examples of prepared Tilia files.

Figure 5 The Neotoma data model handles different kinds of sampling designs by paleoecologists through a flexible hierarchical system consisting of sites, collection units, analysis units, samples, and datasets. Sites are the field locations from which paleoecological data are obtained and can contain multiple collection units. Collection Units are the specific point-level locations within sites from which data are obtained and can contain multiple analysis units. Analysis Units are the specific depth horizons from which data are obtained and can contain multiple samples. A Sample is a single piece of material extracted from an analysis unit, for which a single kind of measurement is made (e.g., analyzed for fossil pollen, stable isotopic analyses, etc.). A Dataset comprises all samples of a single data set type in a single collection unit (e.g., all pollen samples from a single core).
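The hierarchy described in this caption can be illustrated with a minimal Python sketch. It is not Neotoma's actual schema: the class names, field names, and example values below are hypothetical and chosen only to mirror the caption's terms (site, collection unit, analysis unit, sample, dataset).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """One piece of material from an analysis unit, measured one way."""
    dataset_type: str  # e.g. "pollen" (illustrative label, not a Neotoma vocabulary term)


@dataclass
class AnalysisUnit:
    """A specific depth horizon within a collection unit."""
    depth_cm: float
    samples: List[Sample] = field(default_factory=list)


@dataclass
class CollectionUnit:
    """A point-level location within a site, such as a single core."""
    name: str
    analysis_units: List[AnalysisUnit] = field(default_factory=list)

    def datasets(self) -> Dict[str, List[Sample]]:
        """Group all samples in this collection unit by dataset type."""
        grouped: Dict[str, List[Sample]] = {}
        for unit in self.analysis_units:
            for sample in unit.samples:
                grouped.setdefault(sample.dataset_type, []).append(sample)
        return grouped


@dataclass
class Site:
    """A field location that can contain several collection units."""
    name: str
    collection_units: List[CollectionUnit] = field(default_factory=list)


# Hypothetical example: one core with two depth horizons.
core = CollectionUnit(
    name="Core A",
    analysis_units=[
        AnalysisUnit(10.0, [Sample("pollen"), Sample("stable isotopes")]),
        AnalysisUnit(20.0, [Sample("pollen")]),
    ],
)
site = Site(name="Example Lake", collection_units=[core])
datasets = core.datasets()
```

Here `datasets` holds one pollen dataset with two samples and one stable isotope dataset with one sample, matching the caption's definition of a dataset as all samples of one type in one collection unit.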

Figure 6 (color online) History of data uploads to Neotoma, expressed as number of observations (left) and data sets (right). Neotoma launched in 2009 with a number of data sets already in it, mostly pollen and vertebrates, representing prior database building efforts from the Global Pollen Database and FAUNMAP efforts. Rate of data uploads accelerated after 2013, when the new Neotoma data model was established and Tilia’s data upload and validation routines were written. The number of data sets is relatively even among several major data set types (vertebrates, pollen, geochronological data) with recent rapid growth of ostracode and diatom data sets. The number of pollen observations (left) is large relative to the number of data sets (right) because pollen data sets often have many samples (e.g., many samples per core) and many variables per sample (i.e., dozens of taxa per sample). As other taxa- and sample-rich data sets are added to Neotoma (e.g., diatoms, ostracodes), their relative proportions will quickly increase.

Table 1 Constituent Databases in Neotoma and the number of datasets in each.

Figure 7 (color online) Tilia’s interface for stewards to add new taxonomic names to Neotoma’s Taxa table. Names are placed within a taxonomic tree, and each taxon name is assigned a unique identifier. Stewards can also upload a citation for the source of that taxonomic name.
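As a rough illustration of the idea in this caption (names placed in a taxonomic tree, each with a unique identifier and an optional source citation), the following Python sketch models a small taxa table. The `Taxon` and `TaxaTable` classes and their fields are assumptions made for illustration; they do not reproduce Tilia's interface or the actual structure of Neotoma's Taxa table.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Taxon:
    taxon_id: int                   # unique identifier assigned on insertion
    name: str
    parent_id: Optional[int]        # None for a root of the tree
    citation: Optional[str] = None  # source of the taxonomic name, if supplied


class TaxaTable:
    """Hypothetical in-memory stand-in for a table of taxon names."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self._by_id: Dict[int, Taxon] = {}

    def add(self, name: str, parent_id: Optional[int] = None,
            citation: Optional[str] = None) -> Taxon:
        """Add a name under an existing parent and assign it a new identifier."""
        if parent_id is not None and parent_id not in self._by_id:
            raise KeyError(f"unknown parent taxon id: {parent_id}")
        taxon = Taxon(next(self._next_id), name, parent_id, citation)
        self._by_id[taxon.taxon_id] = taxon
        return taxon

    def lineage(self, taxon_id: int) -> List[str]:
        """Return names from the root of the tree down to the given taxon."""
        names: List[str] = []
        current: Optional[int] = taxon_id
        while current is not None:
            taxon = self._by_id[current]
            names.append(taxon.name)
            current = taxon.parent_id
        return list(reversed(names))


taxa = TaxaTable()
family = taxa.add("Pinaceae")
genus = taxa.add("Pinus", parent_id=family.taxon_id)
species = taxa.add("Pinus strobus", parent_id=genus.taxon_id)
path = taxa.lineage(species.taxon_id)
```

Storing only a parent identifier per row keeps each name's position in the tree recoverable by walking upward, which is one common way to represent a hierarchy in a relational table.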

Supplementary material: File

Williams et al. supplementary material
