
PhyloKey: a novel method to rapidly and reliably identify species in complex, species-rich genera, and an opportunity for ‘non-molecular museomics’

Published online by Cambridge University Press: 22 September 2023

Robert Lücking*
Affiliation:
Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, 14195 Berlin, Germany
Bibiana Moncada
Affiliation:
Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, 14195 Berlin, Germany; Licenciatura en Biología, Universidad Distrital Francisco José de Caldas, Torre de Laboratorios, Herbario, Bogotá D.C., Colombia; Research Associate, Science & Education, The Field Museum, Chicago, IL 60605, USA
Manuela Dal Forno
Affiliation:
Botanical Research Institute of Texas, Fort Worth Botanic Garden, Fort Worth, TX 76107, USA; Research Associate, Department of Botany, Smithsonian Institution, National Museum of Natural History, Washington, DC 20560, USA
*Corresponding author: Robert Lücking; Email: r.luecking@bo.berlin

Abstract

We present a novel identification tool called PhyloKey, based on the method of morphology-based phylogenetic binning developed within the software package RAxML. This method takes a reference data set of species for which both molecular and morphological data are available, computes a molecular reference tree, maps the morphological characters onto the tree, and computes character weights based on their level of consistency versus homoplasy using maximum likelihood (ML) and maximum parsimony (MP). Additional units for which only morphological data are known are then binned onto the reference tree, and bootstrap support values are calculated for alternative placements. Here, this approach is modified to work as an identification tool that uses the same character coding approach as interactive keys. However, rather than identifying individual samples through a progressive filtering process as characters are entered or selected, query samples are binned in batch mode to all possible alternative species in the tree, with the bootstrap support values of all alternative placements of a sample summing to 100%. Once a character matrix has been scored, a large number of specimens can thus be identified at once in a short time, and all possible alternative identifications are immediately apparent and can be evaluated based on their bootstrap support values. We illustrate this approach using the basidiolichen genus Cora, which was recently shown to contain hundreds of species. We also demonstrate how the PhyloKey approach can aid the restudy of herbarium samples, adding further value to these collections and contributing large quantitative data matrices to ‘non-molecular museomics’. Our analysis showed that PhyloKey identifies species correctly with as few as 50% of the characters sampled, depending on the nature of the reference tree and the character weighting scheme.
Overall, a molecular reference tree worked best, but a randomized reference tree gave more consistent results, whereas a morphological reference tree performed less well. Surprisingly, even (i.e. equal) character weighting gave the best results, followed by parsimony weighting and then maximum likelihood weighting.
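The idea of binning a query onto every alternative species with bootstrap support can be illustrated with a minimal sketch. It assumes a toy rooted reference tree, one state symbol per character and a hypothetical query specimen, and it is not RAxML's implementation. Placement is scored here with weighted Fitch parsimony instead of ML. The query is only attached as sister to a terminal species, which corresponds to an identification rather than a placement on internal branches. Bootstrap replicates resample characters with replacement, and a replicate in which several placements tie is split equally among them, so the support values of all alternative placements sum to 100%. All names (REF_TREE, STATES, bootstrap_support, etc.) are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy data: a rooted binary reference tree as nested tuples and
# one string of character states per taxon ('?' = missing / not scored).
REF_TREE = ((("A", "B"), "C"), ("D", "E"))
STATES = {
    "A": "00110",
    "B": "00111",
    "C": "01100",
    "D": "11000",
    "E": "11001",
    "query": "0011?",  # specimen to identify; last character not scored
}


def fitch(tree, states, char, observed):
    """Fitch parsimony for one unordered character: returns (state set, steps)."""
    if isinstance(tree, str):
        s = states[tree][char]
        return (set(observed) if s == "?" else {s}), 0
    left, right = tree
    ls, lsteps = fitch(left, states, char, observed)
    rs, rsteps = fitch(right, states, char, observed)
    common = ls & rs
    if common:
        return common, lsteps + rsteps
    return ls | rs, lsteps + rsteps + 1


def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def attach(tree, target, query):
    """Copy of the tree with `query` attached as sister to leaf `target`."""
    if isinstance(tree, str):
        return (tree, query) if tree == target else tree
    return tuple(attach(child, target, query) for child in tree)


def weighted_length(tree, states, chars, weights):
    """Weighted parsimony length over a (possibly resampled) list of characters."""
    total = 0.0
    for c in chars:
        observed = {seq[c] for seq in states.values() if seq[c] != "?"}
        if observed:  # skip characters that nobody scored
            total += weights[c] * fitch(tree, states, c, observed)[1]
    return total


def best_placements(tree, states, query, chars, weights):
    """Species next to which the query attaches with the lowest weighted length."""
    scores = {sp: weighted_length(attach(tree, sp, query), states, chars, weights)
              for sp in leaves(tree)}
    best = min(scores.values())
    return [sp for sp, s in scores.items() if abs(s - best) < 1e-9]


def bootstrap_support(tree, states, query, weights, n_reps=100, seed=1):
    """Percentage of character-bootstrap replicates supporting each placement."""
    rng = random.Random(seed)
    n_chars = len(weights)
    support = Counter()
    for _ in range(n_reps):
        sample = [rng.randrange(n_chars) for _ in range(n_chars)]
        best = best_placements(tree, states, query, sample, weights)
        for sp in best:  # a tied replicate is split equally among placements
            support[sp] += 1 / len(best)
    return {sp: 100 * v / n_reps for sp, v in support.items()}


support = bootstrap_support(REF_TREE, STATES, "query", weights=[1.0] * 5)
print(support)  # A and B receive identical support; all values sum to 100
```

In this toy example the query lacks a score for the only character separating A from B, so both placements always receive identical support. This is the kind of ambiguity that the batch output is meant to make visible, rather than hiding it behind a single answer.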

Information

Type
Standard Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press on behalf of the British Lichen Society

Figure 1. Selected characters and character states used to score Cora species and samples. For the complete list of characters and states, see Table 1. A, sutures in C. suturifera Nugra et al. B, rugose surface in C. auriculeslia B. Moncada et al. C, narrowly undulate surface in C. celestinoa B. Moncada et al. D, broadly undulate surface in C. imi Lücking et al. E, pitted surface in C. elephas Lücking et al. F, setose upper surface in C. barbifera B. Moncada et al. G, strigose upper surface in C. hirsuta (B. Moncada & Lücking) Moncada & Lücking. H, soredia in C. hawksworthiana Dal-Forno et al. I, viaduct-shaped upper cortex in C. leslactuca Lücking et al. J, paraplectenchymatous upper cortex in Corella melvinii (Chaves et al.) Lücking et al. K, papillae in the lower medulla in Cora haledana Dal-Forno et al. L, adnate hymenophore in C. soredavidia. M, concentric hymenophore in C. viliewoa Lücking et al. N, cyphelloid hymenophore in C. benitoana B. Moncada et al. O, pigment (after rewetting) in C. rubrosanguinea Nugra et al. In colour online.

Table 1. Characters used to score Cora species and samples. For visual character definitions, see Fig. 1, and for additional details on character definitions, see Dal Forno et al. (2022).

Table 2. Details of herbarium specimens held at B and traditionally identified as Cora pavonia or Dictyonema glabratum, used for the PhyloKey test.

Table 3. Character weights used in the different setups to bin the Cora samples onto the reference tree. For ‘even’, all characters were weighted equally. The MP and ML weights were derived from the corresponding weight vector algorithm implemented in RAxML, depending on the underlying reference tree. MP = maximum parsimony, ML = maximum likelihood, Mol = molecular reference tree, Mor = morphological reference tree, Ran = randomized reference tree. Note the differences in character weights between MP and ML approaches and between underlying reference trees.
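As a rough, hypothetical stand-in for a parsimony-based weight vector (RAxML's actual weight-vector algorithm may differ), each character can be weighted by its consistency index on a fixed reference tree: the minimum possible number of steps divided by the observed Fitch parsimony length. Characters that fit the tree without homoplasy then receive a weight of 1, and homoplastic characters receive less; ‘even’ weighting corresponds to setting every weight to 1. The tree, data and function names below are invented for illustration, and giving constant characters a weight of 1 is an assumption.

```python
# Hypothetical toy reference tree and morphological matrix ('?' = missing).
REF_TREE = ((("A", "B"), "C"), ("D", "E"))
STATES = {"A": "00110", "B": "00111", "C": "01100", "D": "11000", "E": "11001"}


def fitch(tree, states, char, observed):
    """Fitch parsimony for one unordered character: returns (state set, steps)."""
    if isinstance(tree, str):
        s = states[tree][char]
        return (set(observed) if s == "?" else {s}), 0
    left, right = tree
    ls, lsteps = fitch(left, states, char, observed)
    rs, rsteps = fitch(right, states, char, observed)
    common = ls & rs
    if common:
        return common, lsteps + rsteps
    return ls | rs, lsteps + rsteps + 1


def ci_weights(tree, states):
    """Consistency index (minimum steps / observed steps) per character."""
    n_chars = len(next(iter(states.values())))
    weights = []
    for c in range(n_chars):
        observed = {seq[c] for seq in states.values() if seq[c] != "?"}
        min_steps = len(observed) - 1  # unordered: one change per extra state
        if min_steps <= 0:
            weights.append(1.0)  # constant or unscored: weight 1 by assumption
            continue
        steps = fitch(tree, states, c, observed)[1]
        weights.append(min_steps / steps)
    return weights


weights = ci_weights(REF_TREE, STATES)
print(weights)  # [1.0, 1.0, 1.0, 1.0, 0.5]: the last character is homoplastic
```

Because the function takes the tree as an argument, the same matrix yields different weight vectors on a molecular, a morphological or a randomized reference tree, in line with the differences between reference trees noted in Table 3.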

Table 4. Results of the ‘simulated’ Cora test samples with an increasing number of missing characters. MP = maximum parsimony; ML = maximum likelihood.

Figure 2. Performance, measured as a combined score (number of matches and mean bootstrap support), in relation to the type of reference tree for Cora samples with a decreasing number of sampled characters (e.g. minus_01 means one character fewer than the 20 scored, and so forth, so that minus_19 means that only one character was used). In colour online.
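The degraded test samples behind the minus_01 to minus_19 series can be sketched by recoding k randomly chosen characters of a fully scored sample as missing. This assumes that characters were removed at random (as the caption of Fig. 3 suggests) and that each level is drawn independently rather than nested; the sample string and function names are hypothetical, and only the minus_XX labels follow the caption.

```python
import random

# Hypothetical fully scored sample: 20 characters, one state symbol each.
SAMPLE = "01201100210120110201"


def mask_characters(sample, k, rng):
    """Copy of `sample` with k randomly chosen characters recoded as '?'."""
    masked = list(sample)
    for i in rng.sample(range(len(sample)), k):
        masked[i] = "?"
    return "".join(masked)


def simulate_series(sample, max_missing, seed=0):
    """One degraded copy per level, labelled minus_01 ... minus_<max_missing>."""
    rng = random.Random(seed)  # levels drawn independently (assumption)
    return {f"minus_{k:02d}": mask_characters(sample, k, rng)
            for k in range(1, max_missing + 1)}


series = simulate_series(SAMPLE, max_missing=19)
print(series["minus_19"])  # only one of the 20 characters remains scored
```

Each degraded copy can then be binned in the same batch as the others, so a single run yields one identification per missing-character level.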

Figure 3. Labelled classification tree resulting from phylogenetic binning of the simulated Cora samples onto the randomized reference tree under an even weighting scheme. Blue species names = correctly binned samples with a decreasing number of randomly sampled characters; orange filled circles = incorrectly binned samples with a low number of randomly sampled characters and ≥ 70% bootstrap support. For the detailed tree, see Supplementary Material File S7 (available online). In colour online.
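The outcome categories distinguished in Fig. 3 can be written as a small rule over the bootstrap support values (in percent) of one binned sample. The function, the labels and the toy support values are hypothetical and are not results of this study; treating a tie for the top placement as its own category is an assumption made here.

```python
from collections import Counter


def classify(true_species, support, threshold=70.0):
    """Label one binned sample from its bootstrap support per species (%)."""
    top = max(support.values())
    best = [sp for sp, v in support.items() if v == top]
    if best == [true_species]:
        return "correct"
    if true_species in best:
        return "tied"  # true species shares the top placement (assumption)
    # Wrong top placement: flag it if support reaches the 70% threshold.
    return "incorrect, supported" if top >= threshold else "incorrect, unsupported"


# Hypothetical toy support values, not results from this study.
results = [
    ("C. imi", {"C. imi": 82.0, "C. elephas": 18.0}),
    ("C. imi", {"C. elephas": 74.0, "C. imi": 26.0}),
    ("C. imi", {"C. elephas": 55.0, "C. imi": 45.0}),
]
tally = Counter(classify(truth, support) for truth, support in results)
print(tally)
```

Only the second toy sample would correspond to an orange filled circle in Fig. 3, since it is wrongly binned with at least 70% support.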

Supplementary material: Lücking et al. supplementary material (File, 284.2 KB)