
Automatic taxonomic identification based on the Fossil Image Dataset (>415,000 images) and deep convolutional neural networks

Published online by Cambridge University Press:  17 June 2022

Xiaokang Liu, Shouyi Jiang, Rui Wu, Wenchao Shu, Jie Hou, Yongfang Sun, Jiarui Sun, Daoliang Chu, Yuyang Wu, Haijun Song*

Affiliation (all authors):
State Key Laboratory of Biogeology and Environmental Geology, School of Earth Sciences, China University of Geosciences, Wuhan 430074, China. E-mail: xkliu@cug.edu.cn, jsy@cug.edu.cn, ruicug@163.com, wenchaoshu@live.cn, hjbb@cug.edu.cn, yongfangsun@cug.edu.cn, s@cug.edu.cn, chudl@cug.edu.cn, wuyuyang01@gmail.com, haijunsong@cug.edu.cn

*Corresponding author.

Abstract

The rapid and accurate taxonomic identification of fossils is of great significance in paleontology, biostratigraphy, and other fields. However, taxonomic identification is often labor-intensive and tedious, and acquiring the extensive prior knowledge needed for a taxonomic group requires long-term training. Moreover, identification results are often inconsistent across researchers and communities. Accordingly, in this study, we used deep learning to support taxonomic identification. We used web crawlers to collect the Fossil Image Dataset (FID) from the Internet, obtaining 415,339 images belonging to 50 fossil clades. We then trained three powerful convolutional neural networks on a high-performance workstation. The Inception-ResNet-v2 architecture achieved an average accuracy of 0.90 on the test dataset when transfer learning was applied. The clades of microfossils and vertebrate fossils exhibited the highest identification accuracies, 0.95 and 0.90, respectively. In contrast, clades of sponges, bryozoans, and trace fossils, which have varied morphologies or few samples in the dataset, performed below 0.80. Visual explanation methods further highlighted the discrepancies among different fossil clades and suggested similarities between the identifications made by machine classifiers and those made by taxonomists. Collecting large paleontological datasets from various sources, such as the literature, digitization of dark data, citizen-science data, and public data from the Internet, may further enhance deep learning methods and their adoption. Such developments may also lead to image-based systematic taxonomy being replaced by machine-aided classification in the future, with pioneering studies likely to focus on microfossils and some invertebrate fossils. To contribute to this development, we deployed our model on a server for public access at www.ai-fossil.com.
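The top-1 and top-3 test accuracies reported for the networks can be computed directly from a classifier's output scores. A minimal numpy sketch, not the authors' code (the function name and toy data are ours):

```python
import numpy as np

def top_k_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(logits, axis=1)[:, -k:]        # indices of the top-k classes
    hits = np.any(topk == labels[:, None], axis=1)
    return hits.mean()

# toy scores for 3 samples over 3 classes (illustration only)
logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 2, 0])
print(top_k_accuracy(logits, labels, k=1))  # 0.333...: only the first sample is top-1 correct
print(top_k_accuracy(logits, labels, k=3))  # 1.0: k equals the number of classes
```

Top-3 accuracy is the more forgiving metric: the prediction counts as correct if the true clade appears anywhere in the three highest-scoring classes.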

Information

Type
Featured Article
Creative Commons
CC BY-NC-SA
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press on behalf of The Paleontological Society

Figure 1. Statistics of publications and citations on the topics of machine learning (ML) and convolutional neural networks (CNNs) in geoscience and its multidisciplinary fields, from Web of Science (up to 13 August 2021).


Figure 2. Example images of each class in our dataset, which contains 50 clades (Table 1). Specimens are not to scale. The source URLs of the images are provided in Supplementary Table S1.


Table 1. Number of samples for the three subsets and each class.


Figure 3. Schematic of a convolutional neural network, modified from Krizhevsky et al. (2012). FC layer, fully connected layer.
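The building block of the architecture sketched in Figure 3 is the 2-D convolution. A minimal single-channel numpy illustration (our own sketch; deep learning libraries implement the operation as cross-correlation, as here, and slide many such kernels over multi-channel inputs):

```python
import numpy as np

def conv2d_single(img, kernel):
    """'Valid' 2-D convolution of one channel with one kernel, the elementary
    operation behind each convolutional layer (implemented as cross-correlation)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

edge = np.array([[1.0, -1.0]])              # simple horizontal-gradient kernel
img = np.array([[0.0, 0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0, 1.0]])
print(conv2d_single(img, edge))             # responds only at the vertical edge
```

Stacking such layers with nonlinearities and pooling, and ending with fully connected (FC) layers, yields the classifier of Figure 3.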


Table 2. Experiments for the three deep convolutional neural network (DCNN) architectures. In the “Load weights” column, pre-trained parameters were used for variable initialization (i.e., transfer learning). In the “Train layers” column, the trained/frozen layer settings for Inception-v4 and Inception-ResNet-v2 follow the methods of Liu and Song (2020); for PNASNet-5-large, the trainable layers include cell_6, cell_7, cell_8, cell_9, cell_10, cell_11, aux_7, and final_layer in Liu et al. (2018b). “DA with RC” indicates data augmentation with random cropping; the randomly cropped image covers 0.4–1 of the original image (except in experiment 7, which used a range of 0.65–1). Other data augmentation methods follow Szegedy et al. (2017) and Liu et al. (2018b). All experiments used batch normalization, dropout (0.8), and the Adam optimizer. The input size of Inception-v4 and Inception-ResNet-v2 is 299 × 299, and that of PNASNet-5-large is 331 × 331. The decay rate for experiment 6 is 0.96 when training epochs <15 and 0.9 when training epochs ≥15. Experiment 13 was trained on the reduced Fossil Image Dataset (FID). During training, we logged the output (including train/validation loss and accuracy) every 1,000 iterations and evaluated the model's performance every two epochs. The maximum training/validation accuracy and minimum training/validation loss are the best results among all outputs; similarly, the maximum top-1/top-3 test accuracies are the best of the whole training process.
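The “DA with RC” augmentation, a random crop covering 0.4–1 of the original image area, can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' pipeline: the function name is ours, and the resize back to the network's 299 × 299 or 331 × 331 input size is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_area_crop(img, min_frac=0.4, max_frac=1.0):
    """Crop a random window covering min_frac..max_frac of the image area."""
    h, w = img.shape[:2]
    frac = rng.uniform(min_frac, max_frac)
    scale = np.sqrt(frac)                      # linear scale giving the target area fraction
    ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]

crop = random_area_crop(np.zeros((299, 299, 3), dtype=np.uint8))
print(crop.shape)  # between roughly (189, 189, 3) and (299, 299, 3)
```

Cropping by area fraction (rather than side length) keeps the fraction of the specimen visible to the network within the stated 0.4–1 range.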


Figure 4. Curves showing the (A) training loss, (B) training accuracy, (C) validation loss, and (D) validation accuracy of three deep convolutional neural network architectures during training. Experiments 8, 10, 11, and 14 are from Table 2. The fluctuations of the validation loss/accuracy may result from a higher learning rate. With a lower learning rate, more training epochs could smooth the curves and improve the accuracy, but it would also take longer to train the model, given that it currently takes 40–100 hours to train 40 epochs (depending on whether the deeper half of the layers or all layers were fine-tuned).
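One plausible reading of the experiment 6 schedule from Table 2 (decay rate 0.96 while epochs <15, 0.9 thereafter) is a multiplicative per-epoch decay. This is our own sketch of that interpretation, not the authors' training code:

```python
def lr_schedule(base_lr, epochs):
    """Per-epoch learning rates: multiply by 0.96 per epoch for epochs < 15,
    then by 0.9 per epoch thereafter (hypothetical reading of experiment 6)."""
    lr, lrs = base_lr, []
    for epoch in range(epochs):
        lrs.append(lr)
        lr *= 0.96 if epoch < 15 else 0.9
    return lrs

lrs = lr_schedule(0.01, 40)
print(lrs[1])   # ~0.0096 after the first decay step
```

Switching to the steeper 0.9 rate after epoch 15 shrinks the learning rate faster late in training, which is one way to damp the validation fluctuations noted above.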


Figure 5. Distribution of individual clade recall against the volume of training images in the Fossil Image Dataset (FID). Δrecall equals recall on the FID minus recall on the reduced FID.
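Per-clade recall, and hence Δrecall, follows directly from the predictions on each dataset. A minimal numpy sketch on toy labels (the function name and data are ours):

```python
import numpy as np

def per_class_recall(y_true, y_pred, n_classes):
    """Recall for class c = correctly identified samples of c / all true samples of c."""
    recall = np.full(n_classes, np.nan)        # NaN for classes absent from y_true
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            recall[c] = (y_pred[mask] == c).mean()
    return recall

y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])
print(per_class_recall(y_true, y_pred, 2))  # [0.5, 0.666...]
```

Δrecall is then the element-wise difference between the recall vectors obtained on the full FID and on the reduced FID.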


Table 3. Optimum performance from experiment 10 of Inception-ResNet-v2, evaluated on both the validation and test datasets to reduce the effect of occasional fluctuations in the data.


Figure 6. Receiver operating characteristic (ROC) curves for the average over 50 clades (dashed curves) and for the five highest- and five lowest-performing classes, from the validation and test datasets. AUC denotes the area under the ROC curve; an area close to 1 is the ideal scenario. The black dashed line bisects the ROC space (AUC = 0.5), indicating a random prediction.
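In this one-vs.-rest setting, each per-clade AUC can be computed without plotting the curve, via the rank-statistic equivalence; the 50-clade average is then simply the mean of the per-clade values. A minimal numpy sketch (the toy scores are ours):

```python
import numpy as np

def binary_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic: the
    fraction of (positive, negative) pairs ranked correctly, ties counting half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.3, 0.1])   # classifier scores for one clade vs. the rest
labels = np.array([1, 1, 0, 0])           # 1 = belongs to the clade
print(binary_auc(scores, labels))          # 1.0: perfect separation
```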


Figure 7. Visual explanation of samples from the test set, including the original image, gradient-weighted class activation mapping (Grad-CAM) fused with the original image, and guided Grad-CAM. The lower rectangle shows the predicted label and its probability. U–X were predicted incorrectly; their true labels are crinoid, sponge, trace fossil, and bivalve, respectively. Red (blue) regions correspond to a high (low) contribution score for the prediction in Grad-CAM. Specimens are not to scale. The image URLs are provided in Supplementary Table S1.
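The Grad-CAM maps shown here weight the last convolutional layer's feature maps by gradient-derived importances. The core computation is compact; a numpy sketch on hypothetical activations (our illustration, not the authors' implementation):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map: weight each feature map by the spatial mean of its
    gradient, sum over channels, apply ReLU, and normalize for display.
    feature_maps, gradients: (H, W, K) arrays from the last convolutional layer."""
    weights = gradients.mean(axis=(0, 1))                         # one weight per channel
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)  # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                                     # scale to [0, 1]
    return cam

# hypothetical activations/gradients standing in for real network outputs
fmap = np.random.default_rng(0).random((8, 8, 16))
grad = np.random.default_rng(1).random((8, 8, 16))
print(grad_cam(fmap, grad).shape)  # (8, 8); upsampled to image size for the overlay
```

The resulting low-resolution map is upsampled and blended with the input image to produce the red/blue overlays in the figure.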


Figure 8. Visualization of the feature maps of different layers from Inception-ResNet-v2. From convd2_3 to mixed_7b (Supplementary Fig. S1), layers become deeper. The first column of each layer is the averaged feature map (A), and the remaining columns (B–J) show nine example feature maps from that layer. The dimensions of convd2_3, convd2_5, mixed_5b, mixed_6a, mixed_7a, and conv_7b are 147 × 147 × 64, 71 × 71 × 192, 35 × 35 × 320, 17 × 17 × 1088, 8 × 8 × 2080, and 8 × 8 × 1536, respectively. A schematic of the Inception-ResNet-v2 architecture is provided in Supplementary Fig. S1. Yellow (blue) pixels correspond to higher (lower) activations.


Figure 9. Feature visualization of the feature maps extracted from the final global average pooling layer of the Inception-ResNet-v2 architecture, using t-distributed stochastic neighbor embedding (t-SNE) on 2,000 random images from the test set (40 images per class; overall accuracy 0.88). The classes, in alphabetical order, are listed in Table 1. Some samples that cluster into other groups are shown in the rectangles, with their input images and predicted labels. Specimens are not to scale. The image URLs are provided in Supplementary Table S1.
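The t-SNE projection in Figure 9 maps each image's 1,536-dimensional pooled feature vector to a 2-D point. A minimal scikit-learn sketch on random stand-in features (in the real pipeline the features would come from the network's global average pooling layer, and the perplexity setting here is our own choice):

```python
import numpy as np
from sklearn.manifold import TSNE

# random stand-in for 1536-d global-average-pooled feature vectors of 50 images
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 1536))

emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(features)
print(emb.shape)  # (50, 2): one 2-D point per image
```

Points that land inside another clade's cluster, like the boxed examples in the figure, indicate images whose learned features resemble those of a different group.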