
Addressing the “open world”: detecting and segmenting pollen on palynological slides with deep learning

Published online by Cambridge University Press:  26 August 2025

Jennifer T. Feng
Affiliation:
Department of Plant Biology, University of Illinois, Urbana, Illinois 61801, U.S.A.; Department of Anthropology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.
Sandeep Puthanveetil Satheesan
Affiliation:
National Center for Supercomputing Applications, Urbana, Illinois 61801, U.S.A.
Shu Kong
Affiliation:
Faculty of Science and Technology, University of Macau, Macau 999078, China
Timme H. Donders
Affiliation:
Department of Physical Geography, Utrecht University, 3584 CS Utrecht, The Netherlands
Surangi W. Punyasena*
Affiliation:
Department of Plant Biology, University of Illinois, Urbana, Illinois 61801, U.S.A.; National Center for Supercomputing Applications, Urbana, Illinois 61801, U.S.A.
*Corresponding author: Surangi W. Punyasena; Email: spunya1@illinois.edu

Abstract

Fossil pollen analysis is an “open-world” problem in paleontology for which there is a long-standing need for automated identification and classification. In the open world, categorical classes are imbalanced, test classes are not known a priori, and test data are captured across different domains. Pollen samples capture large numbers of specimens that include both common and abundant types and rare and sometimes novel taxa. Pollen is morphologically diverse, and its features can be altered during fossilization. Additionally, there is little standardization in the imaging of pollen samples. Therefore, generalized workflows for automated pollen analysis require techniques that are robust to these differences and can work with microscope images. We focus on a critical first step, the initial detection of pollen specimens on a palynological slide, and review how existing methods can be employed to build robust and generalizable analysis pipelines. First, we demonstrate how a mixture-of-experts approach—the fusion of a general pollen detector with an expert model trained on minority classes—can be used to address taxonomic biases in detections, particularly the missed detections of rarer pollen types. Second, we demonstrate the efficiency of domain fine-tuning in addressing domain gaps—differences in image magnification and resolution across microscopes and of taxa across different sample sources. Third, we demonstrate the importance of continual learning workflows, which integrate expert feedback, in training detection models from incomplete data. Finally, we demonstrate how cutting-edge segmentation models can be used to refine and clean detections for downstream deep learning classification models.

Information

Type
Methodological Advances
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Paleontological Society

Figure 1. A, Example of a slide scan (~20.5 × 20.5 mm), image stack (7 or 9 focal planes with 3 or 4 μm step size), image tile (1040 × 1392 pixels), and crop (800 × 800 pixels). Comparison of 1040 × 1392 pixel image tiles taken with a (B) standard upright microscope (0.146 μm/pixel) and (C) slide-scanning microscope (0.225 μm/pixel). The 40 × 40 μm boxes highlight Lycopodium spores in each image for comparison. Noticeable differences include color, brightness, and scale. The upright microscope domain was used in training the general pollen detection model (GPDM) and small-grain detection model (SGDM), and the slide-scanning microscope domain was used for domain fine-tuning and continual learning.
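The pixel sizes quoted in the caption allow lengths to be converted between micrometers and pixels in each imaging domain. The following is a minimal sketch of that arithmetic only; the function and variable names are illustrative and do not come from the paper.

```python
# Pixel sizes (micrometers per pixel) quoted in the Figure 1 caption.
UM_PER_PIXEL = {"upright": 0.146, "slide_scanner": 0.225}

def um_to_pixels(length_um, um_per_pixel):
    """Convert a physical length in micrometers to a length in pixels."""
    return length_um / um_per_pixel

def tile_size_um(height_px, width_px, um_per_pixel):
    """Physical footprint (height, width) of an image tile in micrometers."""
    return height_px * um_per_pixel, width_px * um_per_pixel

for domain, scale in UM_PER_PIXEL.items():
    box_px = um_to_pixels(40, scale)          # the 40 um comparison box
    h_um, w_um = tile_size_um(1040, 1392, scale)
    print(f"{domain}: 40 um box = {box_px:.0f} px; tile = {h_um:.0f} x {w_um:.0f} um")
```

With these values, the same 40 μm box spans roughly 274 pixels on the upright microscope and roughly 178 pixels on the slide scanner, which is the scale difference visible between panels B and C.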


Table 1. Summary of the two different slide imaging and image tiling methods used in this paper, identifying several variables: the sediment core source for the pollen sample, the techniques described in the paper used on those samples, the number of slides used in the analysis, the microscope used for imaging, the lens magnification and image resolution, the number of focal planes in the image stacks, the image stack focal plane step size, the focal depth range of the image stack, the slide image tiling method, and the tile dimensions. Samples were imaged using two different but comparable bright-field microscopy methods, making use of the available institutional resources at Utrecht University and the University of Illinois, Urbana-Champaign. GPDM, general pollen detection model.


Figure 2. The 18 most common palynomorphs (>20 training examples, representing 92% of the total dataset): A, Alnus; B, Apiaceae; C, Asteraceae Tubuliflorae-type; D, Cyperaceae; E, Hedyosmum; F, Myrica; G, Myrsine; H, Plantago; I, Polylepis spp.; J, Poaceae; K, Podocarpus; L, Valeriana; M, Cecropia; N, Melastomataceae; O, Urticaceae-Moraceae; P, Huperzia; Q, Isoetes; and R, Lycopodium clavatum (exotic marker). Black borders indicate taxa with medium to large pollen grains (A–L), medium-gray borders indicate taxa with small grains (M–O), and light gray borders indicate plant spores included in the annotated dataset (P–R). Scale bars, 10 μm.


Figure 3. Setups for: A, the general pollen detection model (GPDM), B, the mixture-of-experts technique, C, transfer learning across imaging domains, and D, continual learning with human-in-the-loop annotation. A, The process of training the GPDM by splitting the annotated dataset into training and validation sets, used respectively for training the model and for model selection. The validation set was then passed through the model to obtain a list of detections and associated confidence scores. Using the detections, we drew a precision–recall curve for model evaluation. B–D are variations on A. In B, we trained a small-grain detection model (SGDM) on a subset of the PAL IV data, selecting only image stacks that contained Urticaceae-Moraceae, Melastomataceae, or Cecropia grains and revising the masks to only represent these taxa. The SGDM was then fused with the GPDM into a single pipeline. In C, we fine-tuned the GPDM on slides from a new domain, PAL 1999. In D, we implemented a continual learning workflow. We used Slide 1 to fine-tune the GPDM in TP0, producing the TP1 detector, then used the TP1 detector to detect pollen in image stacks from Slide 2. In the fine-tuning stage, experts manually verified detections so that the detections served as new training data to fine-tune detectors. The process was repeated continually in subsequent time steps.
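The evaluation step in panel A, in which detections and their confidence scores are turned into a precision–recall curve, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes each detection has already been matched to ground truth (the matching rule is not given in the caption), and the function names and toy data are hypothetical.

```python
def precision_recall_curve(detections, n_ground_truth):
    """Return (precisions, recalls), one point per detection in ranked order.

    detections: list of (confidence, is_true_positive) pairs. Matching each
    detection to ground truth is assumed to be done beforehand.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_pos = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_pos += int(is_tp)
        precisions.append(true_pos / rank)          # TP / all detections kept
        recalls.append(true_pos / n_ground_truth)   # TP / all annotated grains
    return precisions, recalls

def precision_at_recall(precisions, recalls, target_recall):
    """Best precision among operating points that reach the target recall."""
    reachable = [p for p, r in zip(precisions, recalls) if r >= target_recall]
    return max(reachable) if reachable else None

# Toy example: 5 detections scored against 4 ground-truth grains.
toy = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.1, False)]
precisions, recalls = precision_recall_curve(toy, n_ground_truth=4)
print("maximum recall:", max(recalls))
print("precision at 75% recall:", precision_at_recall(precisions, recalls, 0.75))
```

Lowering the confidence threshold moves along the curve toward higher recall and, usually, lower precision. Quantities such as maximum recall, or precision at a fixed recall level, are read off the curve in this way.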


Figure 4. The detection workflow. A, One plane of the image stack, overlaid with the original square annotations. B, The ground-truth distance transform mask, created from the annotations. C, The softmax layer, one of the model outputs. D, The predicted distance transform mask, a second model output. E, The predicted pollen grain centers, determined by calculating the peaks in the distance transform mask. F, The detection mask, created using the predicted pollen grain centers and radii, overlaid on the image with confidence scores below each detection. The detection mask was thresholded at a confidence score of 0.025. Note that a single image is used solely to illustrate our workflow. In our study, training and evaluation images were not duplicated.
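Panels B and E can be illustrated with a small NumPy sketch: build a distance-transform-style mask from square annotations, then recover grain centers as its peaks. The caption does not specify how the mask is constructed, so this sketch makes two assumptions: each square annotation is replaced by its inscribed disk, and the peak value is taken as the grain radius. All names are hypothetical, and the trained network that predicts the mask is not represented.

```python
import numpy as np

def disk_distance_mask(shape, boxes):
    """Build a distance-transform-style target from square annotations.

    boxes: list of (row_center, col_center, side_length) in pixels.
    Assumption: each square is replaced by its inscribed disk, and each pixel
    stores its distance to the disk edge (0 outside), peaking at the center.
    """
    rows, cols = np.indices(shape)
    mask = np.zeros(shape, dtype=float)
    for r0, c0, side in boxes:
        radius = side / 2.0
        dist_to_center = np.hypot(rows - r0, cols - c0)
        mask = np.maximum(mask, np.clip(radius - dist_to_center, 0.0, None))
    return mask

def find_peaks(mask, min_value=1.0):
    """Return (row, col, radius) for pixels strictly above all 8 neighbors."""
    padded = np.pad(mask, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = center >= min_value
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbor = padded[1 + dr : padded.shape[0] - 1 + dr,
                              1 + dc : padded.shape[1] - 1 + dc]
            is_peak &= center > neighbor
    return [(int(r), int(c), float(mask[r, c])) for r, c in zip(*np.nonzero(is_peak))]

# Two synthetic annotations: a 16-pixel and a 10-pixel square.
mask = disk_distance_mask((64, 64), [(20, 20, 16), (45, 40, 10)])
print(find_peaks(mask))
```

Because the mask value at a disk center equals the disk radius, each peak yields both a center and a radius, from which a circular detection mask like the one in panel F could be drawn.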


Figure 5. Results of general pollen detection model (GPDM), application of the mixture-of-experts technique, and application of transfer learning. A, For the mixture-of-experts technique, comparison of the blue (GPDM) and yellow (fused model) curves shows how the addition of an expert model trained only on small pollen grains improves maximum model recall from 93% to 95%. B, Comparison of detections from the general model, the small-grain model, and the fused model. Boxes in each panel indicate ground-truth labels, and circles indicate detections. The color of the detections is arbitrary. Confidence scores are shown adjacent to each detection. Small-grain detection model (SGDM) confidence scores in the fused model have been calibrated as described in the text. A validation-set image stack containing a Cecropia pollen grain and Isoetes spore was fed into the two models. The GPDM detected the Isoetes spore with high confidence but missed the Cecropia grain. The SGDM was able to detect the Cecropia grain but had poor localization accuracy and low confidence for the Isoetes spore. The fused model kept both detections. C, Blue and yellow bars indicate model recall by taxon for the GPDM and the fused (GPDM + SGDM) models, respectively (left y-axis). The black and gray lines indicate the abundance distribution of taxa in the GPDM training and validation datasets, respectively (right y-axis). Both models were thresholded at the confidence value that yielded 20% precision. The taxa highlighted in orange are particularly small-grained taxa, which have low rates of detection relative to the number of training examples. Hatched bars indicate taxa that had no training examples but were in the validation set and detected by the detector. mAP, mean average precision; PAL IV.
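The fusion behavior shown in panel B can be sketched as a simple merge of two detection lists, with duplicates resolved in favor of the higher confidence score. This is an illustrative sketch only. The paper's calibration procedure is described in its main text and is not reproduced here, so `expert_scale` is a placeholder linear rescaling, and `min_separation` is a hypothetical distance below which two detections are treated as the same grain.

```python
import math

def fuse_detections(general, expert, expert_scale=1.0, min_separation=20.0):
    """Merge two detection lists, keeping the higher-confidence duplicate.

    Each detection is (x, y, confidence), with x and y in pixels.
    expert_scale: placeholder for score calibration (assumption).
    min_separation: hypothetical center distance, in pixels, below which two
    detections are treated as the same grain.
    """
    pooled = list(general) + [(x, y, s * expert_scale) for x, y, s in expert]
    pooled.sort(key=lambda d: d[2], reverse=True)
    kept = []
    for x, y, score in pooled:
        # Keep a detection only if no stronger detection is already nearby.
        if all(math.hypot(x - kx, y - ky) >= min_separation for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept

# Toy example echoing panel B: the general model finds only the spore; the
# expert model finds the small grain plus a weaker, offset spore detection.
general = [(100.0, 120.0, 0.95)]
expert = [(300.0, 310.0, 0.60), (108.0, 125.0, 0.20)]
print(fuse_detections(general, expert, expert_scale=0.5))
```

In this toy case, the fused output keeps the general model's spore detection and the expert model's small-grain detection, and drops the expert's weaker duplicate of the spore.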


Figure 6. Comparison of the lightest and darkest purple curves shows the improvement of the model after fine-tuning. Maximum recall increased from 82% to 93% and precision increased from 10% to 46% at the 80% recall level. The medium purple curve, representing the performance of a model trained from scratch on the PAL 1999 domain, shows that training from scratch on a small dataset is not as effective as fine-tuning.


Figure 7. A–C, The improvement in model performance from fine-tuning in three time periods. A, The light blue curve represents the performance of the general pollen detection model (GPDM) on the TP0 validation set. The darker blue line shows the increase in model performance after fine-tuning on the TP0 training set with ground-truth annotations. B, The second time step, TP1. The dashed gray curve is the same as the dark blue curve in A. Comparing the dashed gray curve with the light blue curve shows the drop in model performance when switching slides and introducing a domain gap. Comparing the light and dark blue curves, we see that fine-tuning on the expert-verified TP0 model detections of the TP1 training set improves model performance again. C, A similar pattern is shown in the next time step, TP2. D, A comparison of the performance of the three models on the same validation set, here the combined TP0 + TP1 + TP2 validation set. mAP, mean average precision.


Figure 8. Illustration of segmentation of cropped and detected pollen images using Segment Anything Model 2 (SAM-2). Top row shows the cropped pollen grain images produced by our U-Net convolutional neural network (CNN) pollen detection model. Taxon identifications from left to right: Amaranthaceae, Lycopodium, Alnus, Poaceae. Bottom row shows the grain segmented with SAM-2 using an input prompt. The prompt is generated by the detection model and is illustrated with a small green cross.