AI and paleontology: effects of vertebrate fossil sample size on machine learning image classification

Published online by Cambridge University Press: 25 February 2026

Bruce J. MacFadden*
Affiliation:
Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, U.S.A.
Cristobal A. Barberis
Affiliation:
Adaptive Computing, 1100 5th Avenue S #201, Naples, Florida 34102, U.S.A.
Maria C. Vallejo-Pareja
Affiliation:
Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, U.S.A.
Samantha P. Zbinden
Affiliation:
Department of Integrative Biology, The University of Texas at Austin, Austin, Texas 78712, U.S.A.
Victor J. Perez
Affiliation:
Natural and Historic Resources Division, Prince George’s County Parks and Recreation, Maryland-National Capital Park and Planning Commission, Upper Marlboro, Maryland 20772, U.S.A.
Stephanie R. Killingsworth
Affiliation:
Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, U.S.A.
Department of Geological Sciences, University of Florida, Gainesville, Florida 32611, U.S.A.
Kenneth W. Marks
Affiliation:
17356 S. Parker Road, Homer Glen, Illinois 60491, U.S.A.
Dévi Hall
Affiliation:
541 Palm Drive, Key Largo, Florida 33037, U.S.A.
Arthur Porto
Affiliation:
Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, U.S.A.
*Corresponding author: Bruce J. MacFadden; Email: bmacfadd@ufl.edu

Abstract

With the growing application of artificial intelligence (AI) and machine learning (ML), great potential exists to leverage these technologies in paleontology. Relative to many other scientific fields, a challenge of applying ML to paleontology is small sample sizes, particularly for fossil vertebrates. Shark teeth, which are abundant in the fossil record, provide a model system for applying ML across varying sample sizes. Here we use six classes (taxa) of Neogene shark teeth for taxonomic identification, based on a curated dataset of 3150 images. Each class was evaluated using an 80% training and 20% validation split, with a separate, external test set of 25 samples per class. Pretrained models perform well (accuracy > 90%), providing a strong baseline for classification. However, enabling fine-tuning of the ML model to identify fossil shark teeth improves performance considerably. Sample size per class also affects the accuracy of the models’ classifications. The smallest sample size (n = 50 individuals per class) yielded a mean accuracy of 93.4%, whereas accuracy plateaued at ~99% between 200 and 500 images per class. Confidence likewise increases with larger samples, from 81.8% (n = 50 individuals per class) to >90% (n = 300 to 500 individuals per class). Misidentifications followed consistent patterns, reflecting morphological similarities and/or poor preservation. Artificially enlarging the training datasets using data augmentation improves the confidence of identifications. This research indicates that relatively small samples of vertebrate species (~50 to 500 individuals per class) can effectively train an ML model to identify these shark teeth with high accuracy.
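
The per-class 80%/20% training and validation split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes images are tracked as filenames grouped by class, and the function name `stratified_split`, the fixed seed, and the filenames are hypothetical. The external test set (25 specimens per class) is assumed to be held out before this step so that it never enters training or validation.

```python
import random

def stratified_split(items_by_class, train_frac=0.8, seed=0):
    """Split each class independently into training and validation subsets.

    items_by_class maps a class name to a list of image identifiers.
    Returns (train, val), each a list of (image_id, class_name) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(items_by_class):
        shuffled = list(items_by_class[label])
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_frac)
        train += [(item, label) for item in shuffled[:n_train]]
        val += [(item, label) for item in shuffled[n_train:]]
    return train, val


# Hypothetical example: 50 images in each of the six classes of Figure 1.
classes = ["Carcharodon carcharias", "Carcharodon hastalis", "Carcharhinus sp.",
           "Galeocerdo cuvier", "Hemipristis serra", "Otodus megalodon"]
images = {c: [f"{c}_{i:03d}.jpg" for i in range(50)] for c in classes}
train, val = stratified_split(images)
print(len(train), len(val))  # 240 60 (40 and 10 per class)
```

Splitting within each class, rather than over the pooled images, keeps every class at exactly the same 80/20 proportion, which matters when classes are compared at equal sample sizes.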

Information

Type: Methodological Advances
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use and/or adaptation of the article.
Copyright © The Author(s), 2026. Published by Cambridge University Press on behalf of Paleontological Society

Figure 1. Representative examples of the six machine learning (ML) classes of Neogene shark teeth used in this study, lingual view: A, Carcharodon carcharias (UF 131981); B, Carcharodon hastalis (UF 231961); C, Carcharhinus sp. (UF 228647); D, Galeocerdo cuvier (UF/TRO 13692); E, Hemipristis serra (UF/TRO 7145); F, Otodus megalodon (USNM PAL 530605). Not to scale (normalized to vertical dimension).

Figure 2. Theoretical machine learning (ML) performance with increasing sample size per class. Performance metrics, such as accuracy or confidence, follow a logarithmic-like curve of diminishing returns that plateaus at a model optimum. Dashed vertical line indicates the point of diminishing returns, beyond which larger sample sizes yield optimal model performance.
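
One way to locate the plateau sketched in Figure 2 from measured results is to find the smallest sample size after which each further increase adds less than a chosen tolerance. The sketch below is an illustration only: the helper `plateau_point` and the 0.25 percentage-point tolerance are assumptions, and the input is a synthetic saturating exponential (a pure logarithm never levels off), not results from this study.

```python
import math

def plateau_point(sizes, values, tol=0.5):
    """Smallest sample size after which every further step in `sizes`
    improves the metric by less than `tol` (same units as `values`)."""
    pairs = sorted(zip(sizes, values))
    for i, (size, _) in enumerate(pairs):
        later = pairs[i:]
        gains = [b[1] - a[1] for a, b in zip(later, later[1:])]
        if all(g < tol for g in gains):
            return size
    return None  # only reached when the input is empty


# Synthetic saturating curve (not data from this study): acc(n) = 99 - 30 e^(-n/40).
sizes = [50, 100, 150, 200, 300, 400, 500]
accuracy = [99 - 30 * math.exp(-n / 40) for n in sizes]
print(plateau_point(sizes, accuracy, tol=0.25))  # 200
```

The result depends directly on the tolerance, so any plateau estimate of this kind should report the threshold used.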

Table 1. Sampling of the fossil shark teeth used to develop the digital image datasets for the analyses presented here. Generic assignment for Otodus megalodon follows Cappetta (2012) and Shimada et al. (2016). Generic assignment for Carcharodon hastalis follows Ehret et al. (2013; also see Rojas 2012). Also see Supplementary Material 2. Institutional abbreviations: CMM, Calvert Marine Museum; NCMNS, North Carolina Museum of Natural Sciences; UCMP, University of California Museum of Paleontology; UF, University of Florida, Florida Museum; USNM, National Museum of Natural History. For private collections, see “Acknowledgments.”

Figure 3. Individual variation of an associated dentition of Otodus megalodon, illustrating both monognathic and dignathic heterodonty. Scale bar, 5 cm. Modified from Perez et al. (2021).

Figure 4. Morphological and taphonomic variation of the lingual surfaces of selected examples of the six taxonomic groups included in this study. Conventional orientation is with apex down for upper teeth and apex up for lower teeth. All specimens are contained in the UF Vertebrate Paleontology collection (https://www.floridamuseum.ufl.edu/vertpaleo/). A–G, Otodus megalodon: A, UF 245000, B, UF 17932, C, UF 229802, D, UF/TRO 8488, E, UF 311000, F, UF/TRO 5343, G, UF/TRO 5600; H–M, Carcharodon hastalis: H, UF/TRO 7664, I, UF/TRO 7125, J, UF/TRO 14350, K, UF/TRO 15970, L, UF/TRO 11513, M, UF/TRO 7128; N–S, Carcharodon carcharias: N, UF 132025, O, UF 123304, P, UF 234841, Q, UF/TRO 2369, R, UF 85092, S, UF 123105; T–Y, Galeocerdo cuvier: T, UF/TRO 11010, U, UF 233649, V, UF/TRO 11590, W, UF/TRO 13684, X, UF/TRO 6010, Y, UF/TRO 2384; Z–AF, Hemipristis serra: Z, UF/TRO 7131, AA, UF/TRO 13597, AB, UF/TRO 7598, AC, UF/TRO 7145, AD, UF 232388, AE, UF/TRO 11092, AF, UF/TRO 7146; AG–AR, Carcharhinus spp.: AG, UF 233474, AH, UF 80284, AI, UF 17920a, AJ, UF 123090, AK, UF 80026, AL, UF 233674, AM, UF 17920b, AN, UF 80621, AO, UF 17920c, AP, UF 17920d, AQ, UF 17920e, AR, UF 17920f. Scale bars, 1 cm.

Figure 5. Comparison of a fossil shark tooth of Galeocerdo cuvier (UF 234137): A, digital photograph at a resolution of 384 pixels; B, saliency map showing pixel “hotspots,” defined by saliency values greater than ~0.90.
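
The caption does not state how the saliency values were scaled. Assuming they are min-max normalized to [0, 1], the hotspots in panel B correspond to a simple threshold mask. The helper `hotspot_mask` and the toy map below are hypothetical, and the sketch does not compute the saliency map itself.

```python
def hotspot_mask(saliency, threshold=0.90):
    """Min-max normalize a 2-D saliency map to [0, 1], then flag pixels above threshold."""
    flat = [v for row in saliency for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant map: no pixel stands out
        return [[False for _ in row] for row in saliency]
    return [[(v - lo) / (hi - lo) > threshold for v in row] for row in saliency]


# Toy 3 x 3 map with two strong pixels in the middle row.
toy = [[0.0, 0.2, 0.1],
       [0.3, 5.0, 4.8],
       [0.1, 0.2, 0.4]]
mask = hotspot_mask(toy)
print(sum(v for row in mask for v in row))  # 2
```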

Figure 6. Box plots of model accuracy and confidence scores as sample size increases from 50 to 500 specimen images per class (ipc; x-axis). A, mean accuracy for model v1, without fine-tuning enabled; B, mean confidence scores for model v1, without fine-tuning enabled; C, mean accuracy for model v2, with fine-tuning enabled; D, mean confidence scores for model v2, with fine-tuning enabled. Mean indicated by “X”; median indicated by horizontal black line within colored boxes.
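
The accuracy and confidence values summarized in Figure 6 can be illustrated for a single test iteration, followed by box-plot statistics across iterations. This sketch assumes that "confidence" means the top-1 softmax probability of the predicted class, which the caption does not specify; the function names and the logits are hypothetical.

```python
import math
import statistics

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def evaluate(logits_per_image, true_labels):
    """Accuracy (%) and mean confidence (%) for one test iteration.
    Confidence is taken as the top-1 softmax probability (an assumption)."""
    correct, confidences = 0, []
    for logits, truth in zip(logits_per_image, true_labels):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        if pred == truth:
            correct += 1
        confidences.append(probs[pred])
    return (100 * correct / len(true_labels),
            100 * statistics.mean(confidences))

def summarize(scores):
    """Box-plot statistics across iterations; 'inclusive' quartiles match
    NumPy's default linear interpolation for percentiles."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"mean": statistics.mean(scores), "median": median, "q1": q1, "q3": q3}


# Hypothetical logits for three test images and two classes.
acc, conf = evaluate([[2.0, 0.0], [0.0, 3.0], [1.0, 0.0]], [0, 1, 1])
print(round(acc, 1), round(conf, 1))  # 66.7 85.5
print(summarize([90, 92, 94, 96, 98]))
```

Note that confidence is recorded for every prediction, including wrong ones, so a model can be confidently incorrect; this is one reason accuracy and confidence are reported separately.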

Figure 7. Box plots of the effect of data augmentation (DA) at 50, 100, and 150 specimen images per class (ipc) using a fine-tuned model, showing accuracy (%) (A, C, E) and confidence scores (B, D, F). Abbreviations on x-axis: nDA, no data augmentation; RG, random grayscale DA; HF, horizontal flip DA; RG + HF, random grayscale and horizontal flip combined. Mean indicated by “X”; median indicated by horizontal black line within colored boxes. Data and relevant statistics are in Supplementary Material 6.
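
The two augmentations compared in Figure 7 can be sketched on a toy image stored as nested lists of RGB tuples. This is not the study's pipeline: the helper names, the ITU-R BT.601 luma weights used for the grayscale conversion, and the application probabilities (0.1 for grayscale and 0.5 for flipping, chosen here as illustrative values) are all assumptions.

```python
import random

def horizontal_flip(image):
    """Mirror an image left to right; image is a list of rows of (R, G, B) tuples."""
    return [list(reversed(row)) for row in image]

def to_grayscale(image):
    """Replace each pixel with its ITU-R BT.601 luma, copied to all three channels."""
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            y = round(0.299 * r + 0.587 * g + 0.114 * b)
            new_row.append((y, y, y))
        out.append(new_row)
    return out

def augment(image, rng, use_gray, use_flip, p_gray=0.1, p_flip=0.5):
    """Randomly apply RG and/or HF. The settings nDA, RG, HF, and RG + HF correspond to
    (use_gray, use_flip) = (False, False), (True, False), (False, True), (True, True).
    The probabilities are assumed values, not taken from the study."""
    if use_gray and rng.random() < p_gray:
        image = to_grayscale(image)
    if use_flip and rng.random() < p_flip:
        image = horizontal_flip(image)
    return image


# Toy 1 x 2 image: a red pixel next to a green pixel.
img = [[(255, 0, 0), (0, 255, 0)]]
rng = random.Random(0)
print(augment(img, rng, use_gray=True, use_flip=True, p_gray=1.0, p_flip=1.0))
# [[(150, 150, 150), (76, 76, 76)]]
```

Keeping three identical channels after grayscale conversion lets augmented images pass through the same RGB input layer as the originals.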

Figure 8. Training and validation metrics (80%/20% split), including loss and accuracy, for three sample sizes of the Neogene fossil sharks: (A, D) 50 specimen images per class (ipc), (B, E) 150 ipc, and (C, F) 500 ipc.

Table 2. Total number of misidentifications of test specimens per sample size (all 10 iterations) for model v2 (Supplementary Material 7). Note that some specimens may be misidentified multiple times in different iterations.

Figure 9. Patterns of tooth image misidentifications and the number of specimens affected for each of the six taxonomic classes studied (Supplementary Material 7). Blue bars record the total number of misidentifications for each class; orange bars record the number of distinct specimens misidentified in each class.
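
The distinction between the blue and orange bars, and the caveat in Table 2 that a specimen can be misidentified in more than one iteration, amounts to counting events versus distinct specimens. A minimal sketch, assuming results are logged as one record per test image per iteration and tallied by true class; the record layout, function name, and specimen labels are hypothetical.

```python
from collections import Counter

def misidentification_counts(records):
    """records: (iteration, specimen_id, true_class, predicted_class) tuples.

    Returns two Counters keyed by true class: the total number of
    misidentifications (blue bars) and the number of distinct specimens
    misidentified at least once (orange bars)."""
    totals = Counter()
    specimens = {}
    for _iteration, specimen, truth, predicted in records:
        if predicted != truth:
            totals[truth] += 1
            specimens.setdefault(truth, set()).add(specimen)
    distinct = Counter({cls: len(ids) for cls, ids in specimens.items()})
    return totals, distinct


# Hypothetical records: specimen "A" is misidentified in two iterations.
records = [
    (1, "A", "Carcharhinus sp.", "Galeocerdo cuvier"),
    (2, "A", "Carcharhinus sp.", "Galeocerdo cuvier"),
    (2, "B", "Carcharhinus sp.", "Carcharodon hastalis"),
    (2, "C", "Otodus megalodon", "Otodus megalodon"),
]
totals, distinct = misidentification_counts(records)
print(totals["Carcharhinus sp."], distinct["Carcharhinus sp."])  # 3 2
```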

Figure 10. Confusion matrices for model v2 from the 10th iteration of external testing (n = 25 per class). A, trained with 50 specimen images per class (ipc); B, trained with 150 ipc; C, trained with 500 ipc.
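
A confusion matrix tabulates true against predicted classes, so off-diagonal cells are misidentifications and the diagonal sum divided by the number of test images gives accuracy. The caption does not state which axis holds the true class, so the rows-as-true-class layout below is an assumption, as are the function name and the single hypothetical error in the example.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes and columns are predicted classes (a common convention)."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


classes = ["Carcharodon carcharias", "Carcharodon hastalis", "Carcharhinus sp.",
           "Galeocerdo cuvier", "Hemipristis serra", "Otodus megalodon"]
# Hypothetical test run: 25 images per class, one C. carcharias tooth called O. megalodon.
truth = [c for c in classes for _ in range(25)]
predicted = list(truth)
predicted[0] = "Otodus megalodon"
cm = confusion_matrix(truth, predicted, classes)
accuracy = sum(cm[i][i] for i in range(len(classes))) / len(truth)
print(cm[0], round(100 * accuracy, 1))  # [24, 0, 0, 0, 0, 1] 99.3
```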

Figure 11. Comparisons of digital photos and saliency maps for the five misidentifications (involving four specimens). A, Carcharodon carcharias (CMM-V-10324), identified as Otodus megalodon at 50 specimen images per class (ipc) (B) and 150 ipc (C); D, E, Carcharhinus sp. (UF 17846AF), identified as Galeocerdo cuvier at 50 ipc; F, G, Carcharhinus sp. (UF 230862), identified as Galeocerdo cuvier at 50 ipc; H, I, Carcharhinus sp. (UF 233726), identified as Carcharodon hastalis at 50 ipc. Tooth images are scaled to their saliency maps but are not to scale relative to the other specimens illustrated here.