Horses in the Cloud: big data exploration and mining of fossil and extant Equus (Mammalia: Equidae)

Published online by Cambridge University Press:  21 October 2016

Bruce J. MacFadden
Affiliation:
Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, U.S.A. E-mail: bmacfadd@flmnh.ufl.edu
Robert P. Guralnick
Affiliation:
Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, U.S.A. E-mail: bmacfadd@flmnh.ufl.edu

Abstract

Extant species of the genus Equus (e.g., horses, asses, and zebras) are widely distributed today on all continents except Antarctica. Extinct species of Equus represented by fossils were likewise widely distributed in the Pliocene and even more so during the Pleistocene. To understand the efficacy of “big data” for (paleo)biogeographic analyses, location records (latitude, longitude) and fossil occurrences for the genus Equus were mined and further explored from six databases: iDigBio, Paleobiology Database, VertNet, BISON, Neotoma, and GBIF. These were chosen based on a priori knowledge of where relevant data might be aggregated. We also realized that these databases have different objectives and data sources and would therefore provide a useful comparative study of the widespread taxon Equus in space and time.

The mining of Equus data from these six sources yielded a combined total of 123.8K location records, including 116.2K fossil specimens. These include individual points that are unique, that is, only occurring in one of these databases, and those that are duplicated in multiple databases. Of the six databases, three (iDigBio, Paleobiology Database, and GBIF) were judged to be the most useful in the Equus use case. Most of the databases are biased toward North American records, thus limiting the reconstruction of the actual distribution of the genus Equus in space and time outside of this continent. Although Equus has a large number of digitally accessible records, fundamentally interesting questions pertaining to evolutionary dynamics and extinction geography are still a challenge for these kinds of biodiversity databases due primarily to the lack of sufficiently dense and precise temporal data.
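
The distinction between unique points and points duplicated across databases can be illustrated with a minimal Python sketch. It is not the authors' procedure: matching on rounded coordinates is an assumption made here for illustration (a real comparison would more likely rely on identifiers such as institution codes and catalog numbers), and the function name, the rounding precision, and the sample records are all hypothetical.

```python
from collections import defaultdict

def split_unique_and_shared(records, precision=2):
    """Group (source, lat, lon) records by rounded coordinates.

    Returns two dicts mapping each point to the set of databases that
    hold it: points found in one database only, and points found in
    more than one. Matching on rounded coordinates is an assumption.
    """
    sources_by_point = defaultdict(set)
    for source, lat, lon in records:
        key = (round(lat, precision), round(lon, precision))
        sources_by_point[key].add(source)
    unique = {p: s for p, s in sources_by_point.items() if len(s) == 1}
    shared = {p: s for p, s in sources_by_point.items() if len(s) > 1}
    return unique, shared

# Hypothetical records invented for illustration; not data from the study.
records = [
    ("iDigBio", 29.65, -82.32),
    ("GBIF", 29.65, -82.32),
    ("PBDB", 40.00, -105.25),
]
unique, shared = split_unique_and_shared(records)
```

With these sample records, the point reported by both iDigBio and GBIF is classified as shared and the PBDB point as unique. The choice of precision controls how close two points must be to count as the same location, so any tally of duplicates depends on that assumption.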

Information

Type
Paleobiology Letters - Rapid Communication
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © 2016 The Paleontological Society. All rights reserved

Table 1 Data standards that pertain to the temporal, age, and related geological context of fossil specimens contained in the databases described here.

Table 2 Summary characteristics of the six databases used in this metaresearch study of Equus. See text for discussion of iDigBio, PBDB, and GBIF and Supplementary Document 1 for VertNet, BISON, and Neotoma.

Figure 1 Plots of location for Equus records (retrieved 10 August 2016) using the integrated mapping function of iDigBio. A, All specimens. B, Fossil specimens.

Figure 2 Plot of 1.6K occurrence records of Equus using the integrated mapping function of PBDB (retrieved 14 February 2016).

Figure 3 Plots of fossil and extant Equus location data using the integrated mapping function in GBIF (retrieved 10 August 2016). A, All Equus, i.e., extant and fossil. B, Fossil Equus.

Figure 4 Comparison of all Equus location data. A, iDigBio (extant and fossil). B, PBDB (fossil). C, GBIF (extant and fossil).