Bayesian inference of phylogenetic trees is not misled by correlated discrete morphological characters

Xueer Liu; Chi Zhang

doi:10.1017/pab.2025.10076


Published online by Cambridge University Press: 10 October 2025


Xueer Liu: Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing 100044, China; University of Chinese Academy of Sciences, Beijing 101408, China
Chi Zhang*: Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing 100044, China; University of Chinese Academy of Sciences, Beijing 101408, China
*Corresponding author: Chi Zhang; Email: zhangchi@ivpp.ac.cn


Abstract

Morphological characters are central to phylogenetic inference, especially for fossil taxa for which genomic data are unavailable. While Bayesian methods have gained popularity in recent years, they typically assume characters evolve independently, despite known correlations among characters. Here, we assess the impact of character correlation and evolutionary rate heterogeneity on Bayesian phylogenetic inference using extensive simulations of binary characters evolving under independent and correlated models. We find that Bayesian inference assuming character independence accurately recovers tree topologies even when characters are strongly correlated or evolve under heterogeneous rates. However, branch lengths or clock rates tend to be underestimated, particularly under extreme rate heterogeneity. These biases are partially corrected using models that integrate over character-state heterogeneity. Our results demonstrate that Bayesian methods are robust to violations of character independence in topological inference, supporting their continued use in morphological phylogenetics.

Information

Type: Article
Information: Paleobiology, First View, pp. 1–9

DOI: https://doi.org/10.1017/pab.2025.10076
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of Paleontological Society

Non-technical Summary

Scientists often use morphological traits to figure out how fossil species are related. A popular method to do this assumes each trait changes on its own, even though many traits can be linked. This study used computer simulations to see how much this assumption affects the results. The researchers found that even when traits are connected or change at different rates, the method still does a good job figuring out the species tree. However, it can make mistakes in estimating how fast the traits changed over time. Some improved models help fix these errors. Overall, the study shows that current methods work well for figuring out relationships among species using morphological traits.

Introduction

Phylogenetic inference is essential for answering various questions in evolutionary biology. Despite the tremendous amount of genomic data available, morphological characters remain the primary or sole source of information for inferring phylogenies of fossil taxa and studying deep-time divergence (Lee and Palci 2015; Donoghue and Yang 2016). Discrete characters are the main type of data and are traditionally analyzed under maximum parsimony. In recent years, model-based methods, including maximum likelihood and Bayesian inference, have been shown to have comparable or better performance in inferring phylogenies (Wright and Hillis 2014; O’Reilly et al. 2016, 2018; Brown et al. 2017; Puttick et al. 2017, 2019; Smith 2019; Keating et al. 2020). Under these methods, characters are treated as independent features. The simplest model for discrete characters is the Mk model (Lewis 2001), in which the rates of change among the k states are equal. The most frequently used Mkv model (with suffix “v”; Lewis 2001) is a variant that accounts for the ascertainment bias of coding only variable characters.

Correlation among characters has long been recognized. One classical example is the tail-presence and tail-color problem (Maddison 1993), in which the two characters are logically dependent. Several algorithmic solutions have been proposed to handle such cases, under either parsimony- or model-based criteria (Brazeau et al. 2019; Goloboff et al. 2021; Hopkins and St. John 2021; Tarasov 2023). Another example is that some characters are functionally or developmentally dependent (Beaulieu and Donoghue 2013; Leslie et al. 2015; Billet and Bardin 2019). This study mainly deals with the latter but also covers the former if the inapplicable states are governed by a hidden process (Tarasov 2021). In general, correlated discrete characters can be modeled by a Markov chain with rates among states as parameters (Pagel 1994; Pagel and Meade 2006). However, such a model is typically employed in phylogenetic comparative methods to study the evolution of two or three characters given fixed trees (Pagel et al. 2004; Pagel and Meade 2006; Beaulieu and Donoghue 2013; Billet and Bardin 2019), and is rarely used for inference of phylogenetic trees, because the number of parameters grows so dramatically with the number of characters that the model quickly becomes unidentifiable. Instead, all inference methods (parsimony, maximum likelihood, and Bayesian) typically assume that all characters are independent (Felsenstein 1985a).

Simulation studies have shown that Bayesian inference assuming character independence outperforms parsimony-based solutions in the case of logical dependence (Simões et al. 2022). However, no study so far has investigated how the Bayesian method performs in the general case of character dependence. Previous simulations used the simplest Mkv model for inference (Wright and Hillis 2014; O’Reilly et al. 2016, 2018; Puttick et al. 2017, 2019; Smith 2019; Keating et al. 2020); thus, rate heterogeneity in state changes and across characters was not considered. Those studies also focused on non-clock (unrooted) trees. Herein we perform computer simulations to study the performance of Bayesian inference assuming character independence, with data simulated under either independent or dependent evolution under various conditions of evolutionary rate heterogeneity. We perform both non-clock and tip-dating analyses; in the latter case, fossil ages are used and the results are dated (rooted) timetrees (Pyron 2011; Ronquist et al. 2012a; Zhang et al. 2016).

Methods

Markov Models

In general, discrete character evolution can be modeled by a Markov chain with a Q-matrix specifying the rates of changes (Pagel 1994). We first describe the model for a single binary character, then the models for a doublet and a triplet of correlated binary characters. For simplicity, we do not further consider correlation of four or more characters, or characters with more than two states.

For a binary character, the changes between states 0 and 1 are determined by this instantaneous rate matrix

$$ Q_1 = \lambda \begin{bmatrix} -\pi_1 & \pi_1 \\ \pi_0 & -\pi_0 \end{bmatrix}, $$

and the transition probability matrix is

$$ P(t) = \begin{bmatrix} \pi_0 + \pi_1 \mathrm{e}^{-\lambda t} & \pi_1 - \pi_1 \mathrm{e}^{-\lambda t} \\ \pi_0 - \pi_0 \mathrm{e}^{-\lambda t} & \pi_1 + \pi_0 \mathrm{e}^{-\lambda t} \end{bmatrix}. $$

This model extends the Mk model (Lewis [2001], in which π₀ = π₁ = 0.5), allowing the equilibrium state frequencies to vary (Ronquist and Huelsenbeck 2003; Klopfstein et al. 2015; Wright et al. 2016), and is a two-state variant of the F81 model (Felsenstein 1981). It has two free parameters (λ and π₀). The average rate of change is 2λπ₀π₁. Because λ and t are multiplied together in the transition probability matrix, they are not identifiable without further assumptions about the time and/or the rate.
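The closed form of P(t) above can be checked numerically. The following Python sketch is illustrative only: the function name and the parameter values are assumptions made for the example, not settings from the study. It confirms that each row of P(t) sums to one, that P(s)P(t) = P(s + t), that π is stationary, and that only the product λt enters the matrix.

```python
import numpy as np

def transition_matrix(lam, pi0, t):
    """Closed-form P(t) for the two-state model with rate matrix Q1."""
    pi1 = 1.0 - pi0
    decay = np.exp(-lam * t)
    return np.array([
        [pi0 + pi1 * decay, pi1 - pi1 * decay],
        [pi0 - pi0 * decay, pi1 + pi0 * decay],
    ])

lam, pi0 = 1.5, 0.3            # illustrative values only
pi = np.array([pi0, 1.0 - pi0])

P_short = transition_matrix(lam, pi0, 0.2)
P_long = transition_matrix(lam, pi0, 0.5)
P_sum = transition_matrix(lam, pi0, 0.7)

# Each row is a probability distribution.
rows_sum_to_one = np.allclose(P_short.sum(axis=1), 1.0)
# Chapman-Kolmogorov property: P(s) P(t) = P(s + t).
composes = np.allclose(P_short @ P_long, P_sum)
# pi is the stationary distribution: pi P(t) = pi.
stationary = np.allclose(pi @ P_short, pi)
# Doubling lam while halving t leaves P unchanged (lam and t are confounded).
same_product = np.allclose(transition_matrix(2 * lam, pi0, 0.1), P_short)

print(rows_sum_to_one, composes, stationary, same_product)
```

The last check is the non-identifiability noted above: without an external constraint on time or rate, the data inform only λt.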

For a doublet of binary characters, the general model for the four state pairs, 00, 01, 10 and 11, is introduced with eight free parameters (Pagel 1994). The model is not necessarily time reversible, and the Q-matrix may have complex eigenvalues and eigenvectors. For mathematical convenience, we reparametrize the Q-matrix as

$$ Q_2 = \begin{bmatrix} \cdot & a\pi_2 & b\pi_3 & 0 \\ a\pi_1 & \cdot & 0 & c\pi_4 \\ b\pi_1 & 0 & \cdot & d\pi_4 \\ 0 & c\pi_2 & d\pi_3 & \cdot \end{bmatrix} = \begin{bmatrix} \cdot & a & b & 0 \\ a & \cdot & 0 & c \\ b & 0 & \cdot & d \\ 0 & c & d & \cdot \end{bmatrix} \begin{bmatrix} \pi_1 & 0 & 0 & 0 \\ 0 & \pi_2 & 0 & 0 \\ 0 & 0 & \pi_3 & 0 \\ 0 & 0 & 0 & \pi_4 \end{bmatrix}, $$

with {a, b, c, d} as the exchangeability rates and π = {π₁, π₂, π₃, π₄} as the equilibrium state frequencies for the four state pairs. The model is then time-reversible with seven free parameters. This can be viewed as a special case of the GTR model (Tavaré 1986; Yang 1994a). Setting q₁₂ = q₃₄, q₁₃ = q₂₄, q₂₁ = q₄₃, and q₃₁ = q₄₂ results in independent evolution with four parameters (π is derived from {a, b, c, d}), and the model becomes equivalent to the Mk model under the further constraint a = b = c = d (as few as one free parameter).

Similarly, we can use this rate matrix,

$$ Q_3 = \begin{bmatrix} \cdot & a & b & 0 & i & 0 & 0 & 0 \\ a & \cdot & 0 & c & 0 & j & 0 & 0 \\ b & 0 & \cdot & d & 0 & 0 & k & 0 \\ 0 & c & d & \cdot & 0 & 0 & 0 & l \\ i & 0 & 0 & 0 & \cdot & e & f & 0 \\ 0 & j & 0 & 0 & e & \cdot & 0 & g \\ 0 & 0 & k & 0 & f & 0 & \cdot & h \\ 0 & 0 & 0 & l & 0 & g & h & \cdot \end{bmatrix} \begin{bmatrix} \pi_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \pi_2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \pi_3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \pi_4 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \pi_5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & \pi_6 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \pi_7 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \pi_8 \end{bmatrix}, $$

for a triplet of binary characters with eight states, 000, 001, 010, 011, 100, 101, 110 and 111. Simultaneous changes of two or three states are assumed to be negligible, so their rates are set to zero. The model has 19 free parameters. The average rate is −∑ᵢ πᵢqᵢᵢ, where qᵢᵢ is the i-th diagonal element of the Q-matrix.
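The parameterization Q = R·diag(π) can be made concrete with a short Python sketch for the doublet case. The helper `build_q` and all numerical values are assumptions made for illustration. The sketch fills in the diagonal so that rows sum to zero, rescales to a chosen average rate −∑ᵢ πᵢqᵢᵢ, and verifies time reversibility through detailed balance (πᵢqᵢⱼ = πⱼqⱼᵢ).

```python
import numpy as np

def build_q(exchange, pi, mean_rate):
    """Q = R * diag(pi), diagonal set so rows sum to zero, rescaled to mean_rate."""
    R = np.array(exchange, dtype=float)
    pi = np.asarray(pi, dtype=float)
    Q = R @ np.diag(pi)                    # q_ij = r_ij * pi_j for i != j
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))    # q_ii = -(sum of the other entries in row i)
    current = -np.sum(pi * np.diag(Q))     # average rate: -sum_i pi_i q_ii
    return Q * (mean_rate / current)

# Doublet states ordered 00, 01, 10, 11; values are illustrative only.
a, b, c, d = 0.4, 0.1, 0.3, 0.2
R2 = [[0, a, b, 0],
      [a, 0, 0, c],
      [b, 0, 0, d],
      [0, c, d, 0]]
pi = np.array([0.1, 0.2, 0.3, 0.4])

Q2 = build_q(R2, pi, mean_rate=2.0)
flux = pi[:, None] * Q2                    # flux_ij = pi_i * q_ij

rows_zero = np.allclose(Q2.sum(axis=1), 0.0)
reversible = np.allclose(flux, flux.T)     # detailed balance
stationary = np.allclose(pi @ Q2, 0.0)     # pi Q = 0
avg_rate = -np.sum(pi * np.diag(Q2))

print(rows_zero, reversible, stationary, round(float(avg_rate), 6))
```

The same helper applies to the triplet case by passing the 8 × 8 exchangeability pattern of Q₃ and eight frequencies.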

Simulation Procedure

We first generated variable timetrees from a birth–death process using TreeSim in R (Stadler 2011) with a birth rate of 5.0 and a death rate of 4.0, conditioned on a root age of 1.0. The ages are on a relative scale and can be arbitrarily rescaled depending on the chosen time unit. From the trees simulated, we kept 100 trees with no more than 50 tips to make sure the data size is manageable. We simply treated the extinct tips as fossils, without further sampling fossils along the tree. The distribution of tree lengths and the numbers of extant and extinct tips are shown in Figure 1.

Figure 1. The distribution of tree length (A) and the numbers of extant and extinct tips (B) of the simulated trees.

For each tree, we then simulated evolution of discrete morphological characters along the tree under various models and settings (Table 1). The general procedure was to generate exponential waiting times and use the jump chain given the Q-matrix (Yang 2014: section 12.5.4). This is particularly useful when the transition probability matrix is hard to derive. The starting state at the root was randomly drawn from the equilibrium frequencies (π). Only characters that were variable at the tips were kept, following empirical practice.
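The waiting-time and jump-chain procedure can be sketched for a single branch as follows. This is a minimal Python illustration under stated assumptions, not the authors' simulation code: the function name, the frequencies, and the branch length are hypothetical, and a full simulation would apply the same step recursively from the root to every tip.

```python
import numpy as np

def simulate_branch(Q, start, length, rng):
    """Simulate a continuous-time Markov chain along one branch; return the end state."""
    state, elapsed = start, 0.0
    while True:
        exit_rate = -Q[state, state]
        if exit_rate <= 0.0:                 # absorbing state: no further change
            return state
        elapsed += rng.exponential(1.0 / exit_rate)   # exponential waiting time
        if elapsed >= length:                # next change falls beyond the branch end
            return state
        jump = Q[state].copy()
        jump[state] = 0.0
        state = rng.choice(len(jump), p=jump / exit_rate)   # jump chain

rng = np.random.default_rng(1)
pi = np.array([0.3, 0.7])                    # illustrative frequencies
# Two-state Q1 scaled so that the average rate 2 * lam * pi0 * pi1 equals 1.0.
Q1 = np.array([[-pi[1], pi[1]],
               [pi[0], -pi[0]]]) / (2 * pi[0] * pi[1])

root = rng.choice(2, p=pi)                   # root state drawn from pi
ends = [simulate_branch(Q1, root, 0.5, rng) for _ in range(5)]
print(root, ends)
```

Because only Q is needed, the same function works unchanged for the 4-state and 8-state matrices, where a closed-form P(t) is less convenient.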

Table 1. Models and settings used in the simulations and inferences. See “Methods” for the explanations of the symbols.

For independent binary characters, the simplest model is fixing π₀ = π₁ = 0.5 (referred to as M2v herein) for all characters, representing homogeneous evolution. To introduce heterogeneity in character states, we drew π₀ from a uniform distribution independently for each character and let π₁ = 1 − π₀ (the F81-alike extension, referred to as F2v herein). We rescaled Q₁ so that the average rate per character (i.e., the base clock rate) is 1.0, so that the branch lengths in the tree are measured by distance. To further introduce heterogeneity along time, each branch length for each character was multiplied by a relative rate r independently drawn from a lognormal distribution with mean 1.0 and variance 4.0, representing the most heterogeneous case (referred to as “no common mechanism” [NCM]; Tuffley and Steel 1997). We recorded a moderate size of 200 variable characters in the data matrix in each replicate of the three settings.
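Random number libraries usually parameterize the lognormal distribution on the log scale, so the stated mean and variance need converting first. The Python sketch below assumes that the mean 1.0 and variance 4.0 refer to the rate r itself rather than to log r, which the text does not state explicitly. The branch length is a hypothetical value.

```python
import numpy as np

mean, variance = 1.0, 4.0                      # values stated in the text
sigma2 = np.log(1.0 + variance / mean**2)      # log-scale variance: ln(5)
mu = np.log(mean) - sigma2 / 2.0               # log-scale mean: -ln(5) / 2

# Analytic check that (mu, sigma2) reproduce the requested moments.
implied_mean = np.exp(mu + sigma2 / 2.0)
implied_var = (np.exp(sigma2) - 1.0) * np.exp(2.0 * mu + sigma2)

rng = np.random.default_rng(0)
branch_length = 0.1                            # hypothetical branch length
n_characters = 200
r = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n_characters)
effective_lengths = branch_length * r          # one effective length per character

print(round(float(implied_mean), 6), round(float(implied_var), 6))
```

Under this reading the distribution is strongly right-skewed: its median is exp(μ) = 1/√5 ≈ 0.45, so most character-by-branch combinations evolve more slowly than the base rate while a few evolve much faster.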

We used the rate matrix Q₂ to simulate pairs of binary characters (referred to as G4v herein). We drew {a, b, c, d} and {π₁, π₂, π₃, π₄} from a symmetric Dirichlet distribution with parameter 10 (representing slight correlation) or 1.0 (severe correlation) for each doublet. We rescaled Q₂ to have an average rate of 2.0 (per character rate being 1.0). To further introduce heterogeneous evolutionary rates, each branch length for each doublet was multiplied by an independent relative rate r, as described above. We recorded 200 characters (i.e., 100 doublets) and ensured that all characters were variable in the data matrix.

Similarly, with three settings, we rescaled Q₃ to have an average rate of 3.0 (per character rate being 1.0), and simulated triplets of binary characters (referred to as G8v). We recorded 201 variable binary characters (i.e., 67 triplets) but discarded the last character, so that we still had 200 characters in the data matrix. Considering that many empirical datasets are much smaller, we repeated the simulations under the same procedure with 50 variable correlated characters.

Phylogenetic Inference

Each data matrix was analyzed using the Bayesian phylogenetic inference software MrBayes 3.2.7 (Ronquist et al. 2012b). All characters were treated as independent, no matter how they were simulated. They were also treated as a single partition, meaning the branch lengths are shared by all characters (referred to as “common mechanism”; Tuffley and Steel 1997). This setting reflects the practice in most empirical analyses.

MrBayes supports both the M2v and F2v models. The M2v model has no free parameter other than the tree topology and branch lengths, while the F2v model has an extra parameter, π₀, which is averaged using a discretized symmetric beta prior with parameter α (Wright et al. 2016). We used an exponential hyperprior with mean 1.0 (Exp(1)) for α by default. For datasets simulated under NCM, we partially accommodated rate variation among characters using a discrete gamma distribution (Yang 1994b) (F2v+G; Table 1).

The non-clock analyses used the morphological data only and the branch lengths were measured by distance. As we simulated timetrees with both extant and extinct tips, we further incorporated the tip ages in another round of tip-dating analyses, so that we could disentangle the times and clock rates. The tip ages were assigned their true values, assuming they are perfectly known. Following common practice, we specified diffuse Exp(1) priors for the root age and mean clock rate, and used a constant-rate fossilized birth–death prior (Stadler 2010) for the timetree and an independent lognormal relaxed clock (Drummond et al. 2006) for the evolutionary rate variation.

For each inference, two independent Markov chain Monte Carlo runs were executed, each for 8 million generations with a sampling frequency of 200. The first 35% of samples were discarded as burn-in, and the remaining samples from the two runs were combined after checking consistency. We made sure the average standard deviation of split frequencies was below 0.02, and the effective sample sizes were all greater than 100. In rare cases, we had to resume the analysis or double the chain length until these criteria were satisfied. The posterior tree samples were summarized as a 50% majority-rule consensus tree.
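As a rough guide to the size of the posterior sample implied by these settings, the arithmetic can be written out in a few lines of Python. This sketch ignores the initial state of each chain, which some software also records, so the exact counts may differ by one per run.

```python
generations = 8_000_000
sample_every = 200
burnin_fraction = 0.35
runs = 2

samples_per_run = generations // sample_every             # initial state not counted
burnin = round(samples_per_run * burnin_fraction)
kept_per_run = samples_per_run - burnin
combined = kept_per_run * runs

print(samples_per_run, burnin, kept_per_run, combined)    # 40000 14000 26000 52000
```

An effective sample size above 100 is therefore a small fraction of the roughly 52,000 retained samples, as expected for autocorrelated chains.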

Missing Data

The main procedure involves no missing data. We also repeated the analyses with 50% missing states in the extinct taxa and 10% missing in the extant taxa, mimicking the observation in empirical datasets. Specifically, we replaced each state by a question mark in the data matrix with the corresponding probability (i.e., 0.1 for extant and 0.5 for extinct taxa). Such replacement was performed randomly on the generated binary characters rather than on the doublets or triplets.
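The masking step can be sketched in Python as follows. The function name, the toy matrix, and the extinct/extant labels are hypothetical. Each cell is masked independently, in line with the statement that replacement acts on individual binary characters rather than on doublets or triplets.

```python
import numpy as np

def mask_states(matrix, is_extinct, p_extant=0.1, p_extinct=0.5, rng=None):
    """Replace each cell by '?' independently, with a taxon-specific probability."""
    rng = np.random.default_rng() if rng is None else rng
    matrix = np.asarray(matrix, dtype="<U1")
    # One probability per taxon (row), broadcast across all characters (columns).
    p = np.where(np.asarray(is_extinct), p_extinct, p_extant)[:, None]
    missing = rng.random(matrix.shape) < p
    return np.where(missing, "?", matrix)

# Toy matrix: 4 taxa (rows) by 6 binary characters (columns); hypothetical data.
data = np.array([list("010011"),
                 list("110001"),
                 list("011010"),
                 list("000111")])
is_extinct = np.array([False, False, True, True])

masked = mask_states(data, is_extinct, rng=np.random.default_rng(42))
print(["".join(row) for row in masked])
```

Because masking happens after the variable-character filter, a character can become uninformative in the masked matrix. The sketch does not attempt to detect or remove such characters.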

Tree Distance Metrics

We employed both the Quartet (Estabrook et al. 1985) and Mutual Clustering Information (MCI; Smith 2020) metrics for comparing the inferred tree with the true tree generating the data. The MCI metric is a generalized Robinson-Foulds (RF) distance metric (Robinson and Foulds 1981) that is information based and less saturated; thus it is recommended over the RF metric (Smith 2020). The Quartet metric also has several advantages over the RF metric and is also recommended (Smith 2019).

Both distance metrics conflate accuracy and precision (Keating et al. 2020). Thus, we also calculated the Strict Joint Assertion (SJA; the number of quartets resolved identically in both trees divided by the number resolved, either identically or differently, in both trees; Estabrook et al. 1985) as a measure of accuracy, and the percentage of resolved internal branches (the number of internal branches in the estimated consensus tree divided by that in the true tree) as a measure of precision.
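Both summary measures reduce to simple ratios once the quartet and branch counts are available. The Python sketch below uses hypothetical counts and function names. In practice the counts would come from a quartet-comparison tool, which is not reimplemented here.

```python
from math import comb

def strict_joint_assertions(same, different):
    """Accuracy as defined in the text: s / (s + d) over quartets resolved in both trees."""
    resolved_in_both = same + different
    if resolved_in_both == 0:
        return float("nan")          # no quartet is resolved in both trees
    return same / resolved_in_both

def resolution(internal_estimated, internal_true):
    """Precision: internal branches in the consensus tree over those in the true tree."""
    return internal_estimated / internal_true

# Hypothetical 10-tip comparison.
n_tips = 10
total_quartets = comb(n_tips, 4)     # 210 possible quartets
s, d = 150, 10                       # the remaining quartets are unresolved in the consensus
sja = strict_joint_assertions(s, d)
prec = resolution(5, n_tips - 3)     # a binary unrooted tree has n - 3 internal branches

print(total_quartets, sja, round(prec, 3))
```

Quartets left unresolved in the consensus do not enter the SJA ratio, which is why a poorly resolved tree can still score high on accuracy while scoring low on precision.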

The quartet-related metrics were calculated using the package Quartet in R (Smith 2019) and the MCI metric using the package TreeDist in R (Smith 2020).

Results

We aim to investigate the performance of Bayesian phylogenetic inference using the M2v and F2v models by comparing the inferred tree with the true tree used to simulate the data. The Quartet and MCI metrics measure the topological differences, and the tree lengths in non-clock analyses and tree heights in tip-dating analyses represent the branch-length estimates.

We first look at the results from data without missing states. The first two scenarios represent rate homogeneous evolution, and the models used in the inference can match that in the simulation, in which M2v is a special case of F2v. They are the best-case scenarios and act as a baseline. The results do show that the topologies and branch lengths are inferred with good accuracy (Figs. 2A–D, 3A–D, cases 1, 2). For the following four scenarios, the rates in the Q-matrix are quite similar, as they were generated from a Dirichlet distribution with parameter 10. As a result, the performance of the Bayesian inference is almost the same as when there is no rate variation (Figs. 2A–D, 3A–D, cases 3–6).

Figure 2. Tree distance metrics (Quartet and Mutual Clustering Information [MCI]) comparing the inferred tree with the true tree generating the data. Each violin plot contains 100 replicates. The left four panels show the results of non-clock analyses (A, C, E, G), while the right four panels show the results of tip-dating analyses (B, D, F, H). Panels labeled “w/ missing” (E–H) indicate scenarios with missing data. The numbers on the x-axis correspond to the following experiments (simulation model vs. inference model): 1, M2v–vs–M2v; 2, M2v–vs–F2v; 3, G4v(α = 10)–vs–M2v; 4, G4v(α = 10)–vs–F2v; 5, G8v(α = 10)–vs–M2v; 6, G8v(α = 10)–vs–F2v; 7, F2v(α = 1)–vs–M2v; 8, F2v(α = 1)–vs–F2v; 9, G4v(α = 1)–vs–M2v; 10, G4v(α = 1)–vs–F2v; 11, G8v(α = 1)–vs–M2v; 12, G8v(α = 1)–vs–F2v; 13, F2v(α = 1, v = 4)–vs–M2v; 14, F2v(α = 1, v = 4)–vs–F2v; 15, G4v(α = 1, v = 4)–vs–M2v; 16, G4v(α = 1, v = 4)–vs–F2v; 17, G8v(α = 1, v = 4)–vs–M2v; 18, G8v(α = 1, v = 4)–vs–F2v.

Figure 3. Relative bias (posterior mean minus the true value, then divided by the true value) and relative width of credibility interval (CI) (95% CI width divided by the true value) for each of the following experiments (simulation model vs. inference model): 1, M2v–vs–M2v; 2, M2v–vs–F2v; 3, G4v(α = 10)–vs–M2v; 4, G4v(α = 10)–vs–F2v; 5, G8v(α = 10)–vs–M2v; 6, G8v(α = 10)–vs–F2v; 7, F2v(α = 1)–vs–M2v; 8, F2v(α = 1)–vs–F2v; 9, G4v(α = 1)–vs–M2v; 10, G4v(α = 1)–vs–F2v; 11, G8v(α = 1)–vs–M2v; 12, G8v(α = 1)–vs–F2v; 13, F2v(α = 1, v = 4)–vs–M2v; 14, F2v(α = 1, v = 4)–vs–F2v; 15, G4v(α = 1, v = 4)–vs–M2v; 16, G4v(α = 1, v = 4)–vs–F2v; 17, G8v(α = 1, v = 4)–vs–M2v; 18, G8v(α = 1, v = 4)–vs–F2v. Each violin plot contains 100 replicates. The left four panels show the tree lengths from non-clock analyses (A, C, E, G), while the right four panels show the tree heights from tip-dating analyses (B, D, F, H). Panels labeled “w/ missing” (E–H) indicate scenarios with missing data.
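The two summaries defined in the caption of Figure 3 can be computed from a posterior sample as in the Python sketch below. The function name and the mock posterior are hypothetical. The interval is taken as equal-tailed, which is an assumption: the source does not state whether equal-tailed or highest-posterior-density intervals were used.

```python
import numpy as np

def relative_summaries(posterior_samples, true_value, level=0.95):
    """Return (relative bias, relative CI width) for one posterior sample."""
    samples = np.asarray(posterior_samples, dtype=float)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(samples, [tail, 1.0 - tail])   # equal-tailed interval
    rel_bias = (samples.mean() - true_value) / true_value
    rel_width = (upper - lower) / true_value
    return rel_bias, rel_width

rng = np.random.default_rng(7)
true_length = 2.0
# Mock posterior with mean 1.6, i.e. a tree length underestimated by about 20%.
samples = rng.gamma(shape=50.0, scale=1.6 / 50.0, size=10_000)

bias, width = relative_summaries(samples, true_length)
print(round(float(bias), 2), round(float(width), 2))
```

A negative relative bias corresponds to underestimation, and dividing by the true value makes replicates with different tree sizes comparable within one violin plot.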

The hardest situation appears to be when the data were simulated under F2v and each character has its own stationary frequencies (π₀ and π₁). The M2v model certainly does not account for this, resulting in larger tree distances (Fig. 2A–D, case 7) and underestimated tree lengths (Fig. 3A, case 7). In the tip-dating analyses, the tree height estimates are barely affected (Fig. 3B, case 7), but the clock rate is underestimated (Fig. 4A, case 7). The inference model of F2v is supposed to match the simulation condition; however, we did not estimate individual frequencies for each character due to identifiability issues. Instead, we averaged π₀ (and π₁ = 1 − π₀) using a discretized symmetric beta distribution (Wright et al. 2016). This strategy can correct the bias of tree-length or clock-rate estimates (Figs. 3A, 4A, case 8), but results in similar tree distances as using the M2v model (Fig. 2A–D, case 8). Having a further look at the accuracy (SJA) and precision (tree resolution) metrics, we find that the larger tree distances under M2v are largely contributed by decreased accuracy (Supplementary Figs. S1, S2, case 1 vs. case 7), whereas those under F2v are largely contributed by decreased precision (Supplementary Figs. S1, S2, case 2 vs. case 8).

Figure 4. Relative bias (posterior mean minus the true value, then divided by the true value) and relative width of credibility interval (CI) (95% CI width divided by the true value) of the base clock rate for each of the following experiments (simulation model vs. inference model): 1, M2v–vs–M2v; 2, M2v–vs–F2v; 3, G4v(α = 10)–vs–M2v; 4, G4v(α = 10)–vs–F2v; 5, G8v(α = 10)–vs–M2v; 6, G8v(α = 10)–vs–F2v; 7, F2v(α = 1)–vs–M2v; 8, F2v(α = 1)–vs–F2v; 9, G4v(α = 1)–vs–M2v; 10, G4v(α = 1)–vs–F2v; 11, G8v(α = 1)–vs–M2v; 12, G8v(α = 1)–vs–F2v; 13, F2v(α = 1, v = 4)–vs–M2v; 14, F2v(α = 1, v = 4)–vs–F2v; 15, G4v(α = 1, v = 4)–vs–M2v; 16, G4v(α = 1, v = 4)–vs–F2v; 17, G8v(α = 1, v = 4)–vs–M2v; 18, G8v(α = 1, v = 4)–vs–F2v. Each violin plot contains 100 replicates. The left two panels are scenarios without missing data (A and C), while the right two panels labeled with “w/ missing” (B and D) indicate scenarios with missing data.

Surprisingly, severe correlation in each pair or triplet of characters does not increase but instead decreases the tree distances (Fig. 2A–D, cases 9–12), although they still present higher distances than the homogeneous ones (Fig. 2A–D, cases 1–6). This results from both slightly increased accuracy and precision (Supplementary Figs. S1, S2, cases 7–12), likely because the rate heterogeneity for each character in these settings is slightly lower than that under independent evolution, which is reflected in the estimates of the shape parameter of the symmetric beta distribution (Supplementary Material, log files). Having more correlated characters (three vs. two) makes almost no difference in how the inference results are affected. Using the F2v model in inference cannot match the simulation models, but it is still helpful in slightly correcting the bias of underestimating tree length (Fig. 3A, cases 9–12) or clock rate (Fig. 4A, cases 9–12).

The most heterogeneous scenarios involve rate heterogeneity both among characters and across branches (NCM). However, using the simplest M2v model as well as the F2v model can achieve comparable performance as when there is no rate heterogeneity for inference of tree topology (Fig. 2A–D, cases 13–18, Supplementary Figs. S1, S2, cases 13–18). Evolutionary rate heterogeneity across branches appears to retain strong phylogenetic signal in the data. On the other hand, branch-length or clock-rate estimates are more biased, with M2v showing the most severe underestimation and narrowest credibility interval (CI) width (Figs. 3A, 4A, cases 13–18).

Empirical data typically contain many missing states. When 50% of states in the fossil taxa and 10% in the extant taxa are missing on average, we observe similar patterns as when there is no missing state, with decreased precision but similar accuracy (Figs. 2E–H, 3E–H, 4B, D, Supplementary Figs. S3, S4). In other words, missing data mostly result in more unresolved nodes in the trees and larger CIs of the tree lengths, but for the resolved part of the tree, the accuracy is similar to that when there is no missing data. Similar patterns are also observed when the number of characters is much smaller (50 vs. 200), with largely decreased precision and slightly decreased accuracy (Supplementary Fig. S5).

Discussion

The Bayesian method has been demonstrated to have good accuracy when data were generated under either common or no common mechanisms (Wright and Hillis 2014; O’Reilly et al. 2016, 2018; Puttick et al. 2017, 2019; Smith 2019; Keating et al. 2020). We moved one step further and introduced character correlation in the simulations. Both the non-clock and tip-dating analyses suggest that Bayesian inference assuming character independence does not mislead the inference of tree topology when character correlation and rate heterogeneity are present. This is quite reassuring, as correlation and NCM have been argued to be quite common in morphological characters, and model-based methods have been criticized for not accounting for them (Goloboff et al. 2018, 2019).

However, when the interest is the branch lengths, they can be biased toward underestimation when evolutionary rate variation is high among characters and along branches. Such variation can be modeled by general Markov processes in theory, but these are typically not practical in inference. Unlike molecular sequences where the same nucleotide across sites has the same biological meaning, morphological characters coded as 0, for example, have different meanings among characters; thus using one parameter for all the 0s would be pointless, whereas unlinking all of them would result in too many parameters. The best strategy so far has been using the F2v model, in which the state frequencies are averaged analogous to averaging the site rates (Wright et al. 2016). According to our simulation results, it is recommended over the M2v model in all the scenarios we have tested. However, the F2v model only accounts for rate variation among character states. To further account for rate variation along branches, we could subdivide the data into multiple partitions (e.g., according to the anatomical regions) and infer independent evolutionary rates for each partition (e.g., using unlinked clock models; Lee 2016; Zhang and Wang 2019). Bear in mind, though, we should keep enough (probably at least dozens of) characters in each partition to avoid overparameterization, especially when the data contain a large portion of missing states. Alternatively, a new method has been developed to account for rate variation both across characters and along branches by switching the rates among different rate regimes (Khakurel and Höhna 2025).

In the tip-dating analyses, we fixed the fossil ages to their true values. Hence the inferred tree heights (root ages) are reliable in all different conditions. This implies that incorporating accurate fossil information is crucial for dating divergence times, even when the morphological evolutionary model is mis-specified (Klopfstein et al. 2019). In practice, however, uncertainties in fossil ages and prior for the timetree (root age in particular) are likely to decrease the accuracy (Barido-Sottani et al. 2019; Luo et al. 2019). Depending on the data and models, the situation can become rather complicated (Simões et al. 2020; May et al. 2021). Optimistically, when the timetree is presumably reliable, evolutionary rate estimates could be refined using subsequent comparative methods for the characters of interest under more complex models (Pennell et al. 2014; Revell 2024).

We considered only correlated discrete morphological characters in this study. It is worth noting that there is a large body of literature on correlated continuous traits. The evolution of such traits is typically modeled by Brownian motion (BM) (Felsenstein 1973, 1985b; Freckleton 2012) or an Ornstein–Uhlenbeck (OU) process (Uhlenbeck and Ornstein 1930; Felsenstein 1988; Hansen 1997; Butler and King 2004), and trait correlations are described by the variance–covariance matrix of the model. Related to this, the threshold model (Wright 1934; Felsenstein 2005) is a promising alternative for correlated discrete characters: the observed discrete state depends on whether an underlying continuous trait (called the liability) is above a threshold value. Although the BM and OU models have been well studied mathematically, practical implementations for phylogenetic inference are sparse (Álvarez-Carretero et al. 2019; Hassler et al. 2022; Zhang et al. 2023). The main reason is that these models are parameter-rich and that developing efficient computational methods is technically challenging. This therefore appears to be an important area for further improvement.
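
As a minimal sketch of the threshold idea for two correlated binary characters, the Python code below evolves two liabilities along a single branch under Brownian motion with correlation rho, then converts each liability to a discrete state by comparing it with a threshold. The function names, the unit rates, and the threshold of zero are assumptions made for illustration; the code is not taken from the simulation scripts of this study.

```python
import math
import random


def evolve_liabilities(start, rho, t, rng):
    """Evolve two liabilities along a branch of length t under Brownian
    motion with unit rates (an assumption) and correlation rho.
    The 2 x 2 covariance matrix t * [[1, rho], [rho, 1]] is sampled
    through its Cholesky factor."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    sd = math.sqrt(t)
    d1 = sd * z1
    d2 = sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return (start[0] + d1, start[1] + d2)


def to_states(liabilities, threshold=0.0):
    """Map each liability to state 1 if it exceeds the threshold, else 0."""
    return tuple(1 if x > threshold else 0 for x in liabilities)


if __name__ == "__main__":
    rng = random.Random(2025)  # fixed seed so the run is repeatable
    end = evolve_liabilities((0.0, 0.0), rho=0.8, t=1.0, rng=rng)
    print(end, to_states(end))
```

With rho set to 1, both liabilities receive identical increments, so two characters that start from the same liability always end in the same state; intermediate values of rho give partially correlated states.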

Conclusion

Our results demonstrate that Bayesian inference of phylogenetic trees is remarkably robust to violations of the character independence assumption. Topological inference remains accurate across a range of realistic evolutionary scenarios, including strong correlation and substantial evolutionary rate heterogeneity among morphological characters. However, our analyses also reveal that branch lengths or clock rates may be systematically underestimated under simpler models when rate variation is present. To mitigate this, we recommend using models that average over character-state frequencies (e.g., F2v) and, when feasible, incorporating rate variation across partitions.

While this study focuses on discrete morphological characters, future work should extend to continuous trait and threshold models that more directly account for trait correlations. Overall, our findings support the continued and expanded use of Bayesian methods in morphological phylogenetics and call for methodological innovations to improve branch-length estimation under complex evolutionary processes.

Acknowledgments

This research was supported by the National Key Research and Development Program of China (2023YFF0804502 to C.Z.) and the National Natural Science Foundation of China (42172006 to C.Z.).

Author Contribution

C.Z. designed the study and led the analyses. X.L. performed the simulations. C.Z. led the interpretation of the results and writing of the manuscript. X.L. and C.Z. revised the manuscript and gave final approval for publication.

Competing Interests

The authors declare no competing interests.

Data Availability Statement

Electronic Supplementary Material is available from the GitHub Digital Repository: https://github.com/zhangchicool/morphSim.

Footnotes

Handling Editor: Rachel Warnock

The authors contributed equally to this study.

References

Literature Cited

Álvarez-Carretero, S., Goswami, A., Yang, Z., and dos Reis, M. 2019. Bayesian estimation of species divergence times using correlated quantitative characters. Systematic Biology 68:967–986. doi:10.1093/sysbio/syz015

Barido-Sottani, J., Aguirre-Fernández, G., Hopkins, M. J., Stadler, T., and Warnock, R. 2019. Ignoring stratigraphic age uncertainty leads to erroneous estimates of species divergence times under the fossilized birth-death process. Proceedings of the Royal Society B 286:20190685. doi:10.1098/rspb.2019.0685

Beaulieu, J. M., and Donoghue, M. J. 2013. Fruit evolution and diversification in campanulid angiosperms. Evolution 67:3132–3144. doi:10.1111/evo.12180

Billet, G., and Bardin, J. 2019. Serial homology and correlated characters in morphological phylogenetics: modeling the evolution of dental crests in placentals. Systematic Biology 68:267–280. doi:10.1093/sysbio/syy071

Brazeau, M. D., Guillerme, T., and Smith, M. R. 2019. An algorithm for morphological phylogenetic analysis with inapplicable data. Systematic Biology 68:619–631. doi:10.1093/sysbio/syy083

Brown, J. W., Parins-Fukuchi, C., Stull, G. W., Vargas, O. M., and Smith, S. A. 2017. Bayesian and likelihood phylogenetic reconstructions of morphological traits are not discordant when taking uncertainty into consideration: a comment on Puttick et al. Proceedings of the Royal Society B 284:20170986. doi:10.1098/rspb.2017.0986

Butler, M. A., and King, A. A. 2004. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. American Naturalist 164:683–695. doi:10.1086/426002

Donoghue, P. C. J., and Yang, Z. 2016. The evolution of methods for establishing evolutionary timescales. Philosophical Transactions of the Royal Society B 371:20160020. doi:10.1098/rstb.2016.0020

Drummond, A., Ho, S., Phillips, M., and Rambaut, A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biology 4:e88. doi:10.1371/journal.pbio.0040088

Estabrook, G. F., McMorris, F. R., and Meacham, C. A. 1985. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Systematic Zoology 34:193–200. doi:10.2307/2413326

Felsenstein, J. 1973. Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics 25:471–492.

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17:368–376. doi:10.1007/BF01734359

Felsenstein, J. 1985a. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791. doi:10.1111/j.1558-5646.1985.tb00420.x

Felsenstein, J. 1985b. Phylogenies and the comparative method. American Naturalist 125:1–15. doi:10.1086/284325

Felsenstein, J. 1988. Phylogenies and quantitative characters. Annual Review of Ecology and Systematics 19:445–471. doi:10.1146/annurev.es.19.110188.002305

Felsenstein, J. 2005. Using the quantitative genetic threshold model for inferences between and within species. Philosophical Transactions of the Royal Society B 360:1427–1434. doi:10.1098/rstb.2005.1669

Freckleton, R. P. 2012. Fast likelihood calculations for comparative analyses. Methods in Ecology and Evolution 3:940–947. doi:10.1111/j.2041-210X.2012.00220.x

Goloboff, P. A., Torres, A., and Arias, J. S. 2018. Weighted parsimony outperforms other methods of phylogenetic inference under models appropriate for morphology. Cladistics 34:407–437. doi:10.1111/cla.12205

Goloboff, P. A., Pittman, M., Pol, D., and Xu, X. 2019. Morphological data sets fit a common mechanism much more poorly than DNA sequences and call into question the Mkv model. Systematic Biology 68:494–504.

Goloboff, P. A., Laet, J. D., Ríos-Tamayo, D., and Szumik, C. A. 2021. A reconsideration of inapplicable characters, and an approximation with step-matrix recoding. Cladistics 37:596–629. doi:10.1111/cla.12456

Hansen, T. F. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351. doi:10.1111/j.1558-5646.1997.tb01457.x

Hassler, G. W., Magee, A. F., Zhang, Z., Baele, G., Lemey, P., Ji, X., Fourment, M., and Suchard, M. A. 2022. Data integration in Bayesian phylogenetics. Annual Review of Statistics and Its Application 10:353–377. doi:10.1146/annurev-statistics-033021-112532

Hopkins, M. J., and St. John, K. 2021. Incorporating hierarchical characters into phylogenetic analysis. Systematic Biology 70:1163–1180. doi:10.1093/sysbio/syab005

Keating, J. N., Sansom, R. S., Sutton, M. D., Knight, C. G., and Garwood, R. J. 2020. Morphological phylogenetics evaluated using novel evolutionary simulations. Systematic Biology 69:897–912. doi:10.1093/sysbio/syaa012

Khakurel, B., and Höhna, S. 2025. A covarion model for phylogenetic estimation using discrete morphological datasets. bioRxiv 660793.

Klopfstein, S., Vilhelmsen, L., and Ronquist, F. 2015. A nonstationary Markov model detects directional evolution in hymenopteran morphology. Systematic Biology 64:1089–1103. doi:10.1093/sysbio/syv052

Klopfstein, S., Ryer, R., Coiro, M., and Spasojevic, T. 2019. Mismatch of the morphology model is mostly unproblematic in total-evidence dating: insights from an extensive simulation study. bioRxiv 679084.

Lee, M. S. Y. 2016. Multiple morphological clocks and total-evidence tip-dating in mammals. Biology Letters 12:20160033. doi:10.1098/rsbl.2016.0033

Lee, M. S. Y., and Palci, A. 2015. Morphological phylogenetics in the genomic age. Current Biology 25:R922–R929. doi:10.1016/j.cub.2015.07.009

Leslie, A. B., Beaulieu, J. M., Crane, P. R., Knopf, P., and Donoghue, M. J. 2015. Integration and macroevolutionary patterns in the pollination biology of conifers. Evolution 69:1573–1583. doi:10.1111/evo.12670

Lewis, P. O. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Systematic Biology 50:913–925. doi:10.1080/106351501753462876

Luo, A., Duchêne, D. A., Zhang, C., Zhu, C.-D., and Ho, S. Y. W. 2019. A simulation-based evaluation of tip-dating under the fossilized birth–death process. Systematic Biology 69:325–344. doi:10.1093/sysbio/syz038

Maddison, W. P. 1993. Missing data versus missing characters in phylogenetic analysis. Systematic Biology 42:576–581. doi:10.1093/sysbio/42.4.576

May, M. R., Contreras, D. L., Sundue, M. A., Nagalingum, N. S., Looy, C. V., and Rothfels, C. J. 2021. Inferring the total-evidence timescale of marattialean fern evolution in the face of model sensitivity. Systematic Biology 70:1232–1255. doi:10.1093/sysbio/syab020

O’Reilly, J. E., Puttick, M. N., Parry, L., Tanner, A. R., Tarver, J. E., Fleming, J., Pisani, D., and Donoghue, P. C. J. 2016. Bayesian methods outperform parsimony but at the expense of precision in the estimation of phylogeny from discrete morphological data. Biology Letters 12:20160081. doi:10.1098/rsbl.2016.0081

O’Reilly, J. E., Puttick, M. N., Pisani, D., and Donoghue, P. C. J. 2018. Probabilistic methods surpass parsimony when assessing clade support in phylogenetic analyses of discrete morphological data. Palaeontology 61:105–118. doi:10.1111/pala.12330

Pagel, M. 1994. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proceedings of the Royal Society B 255:37–45.

Pagel, M., and Meade, A. 2006. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. American Naturalist 167:808–825. doi:10.1086/503444

Pagel, M., Meade, A., and Barker, D. 2004. Bayesian estimation of ancestral character states on phylogenies. Systematic Biology 53:673–684. doi:10.1080/10635150490522232

Pennell, M. W., Eastman, J. M., Slater, G. J., Brown, J. W., Uyeda, J. C., FitzJohn, R. G., Alfaro, M. E., and Harmon, L. J. 2014. geiger v2.0: an expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics 30:2216–2218. doi:10.1093/bioinformatics/btu181

Puttick, M. N., O’Reilly, J. E., Tanner, A. R., Fleming, J. F., Clark, J., Holloway, L., Lozano-Fernandez, J., et al. 2017. Uncertain-tree: discriminating among competing approaches to the phylogenetic analysis of phenotype data. Proceedings of the Royal Society B 284:20162290. doi:10.1098/rspb.2016.2290

Puttick, M. N., O’Reilly, J. E., Pisani, D., and Donoghue, P. C. J. 2019. Probabilistic methods outperform parsimony in the phylogenetic analysis of data simulated without a probabilistic model. Palaeontology 62:1–17. doi:10.1111/pala.12388

Pyron, R. A. 2011. Divergence time estimation using fossils as terminal taxa and the origins of Lissamphibia. Systematic Biology 60:466–481. doi:10.1093/sysbio/syr047

Revell, L. J. 2024. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 12:e16505. doi:10.7717/peerj.16505

Robinson, D. F., and Foulds, L. R. 1981. Comparison of phylogenetic trees. Mathematical Biosciences 53:131–147. doi:10.1016/0025-5564(81)90043-2

Ronquist, F., and Huelsenbeck, J. P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574. doi:10.1093/bioinformatics/btg180

Ronquist, F., Klopfstein, S., Vilhelmsen, L., Schulmeister, S., Murray, D. L., and Rasnitsyn, A. P. 2012a. A total-evidence approach to dating with fossils, applied to the early radiation of the Hymenoptera. Systematic Biology 61:973–999. doi:10.1093/sysbio/sys058

Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M. A., and Huelsenbeck, J. P. 2012b. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61:539–542. doi:10.1093/sysbio/sys029

Simões, T. R., Caldwell, M. W., and Pierce, S. E. 2020. Sphenodontian phylogeny and the impact of model choice in Bayesian morphological clock estimates of divergence times and evolutionary rates. BMC Biology 18:191. doi:10.1186/s12915-020-00901-5

Simões, T. R., Vernygora, O. V., de Medeiros, B. A. S., and Wright, A. M. 2022. Handling logical character dependency in phylogenetic inference: extensive performance testing of assumptions and solutions using simulated and empirical data. Systematic Biology 72:662–680. doi:10.1093/sysbio/syad006

Smith, M. R. 2019. Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets. Biology Letters 15:20180632. doi:10.1098/rsbl.2018.0632

Smith, M. R. 2020. Information theoretic generalized Robinson–Foulds metrics for comparing phylogenetic trees. Bioinformatics 36:5007–5013. doi:10.1093/bioinformatics/btaa614

Stadler, T. 2010. Sampling-through-time in birth-death trees. Journal of Theoretical Biology 267:396–404. doi:10.1016/j.jtbi.2010.09.010

Stadler, T. 2011. Simulating trees with a fixed number of extant species. Systematic Biology 60:676–684. doi:10.1093/sysbio/syr029

Tarasov, S. 2021. Integration of anatomy ontologies and evo-devo using structured Markov models suggests a new framework for modeling discrete phenotypic traits. Systematic Biology 68:698–716. doi:10.1093/sysbio/syz005

Tarasov, S. 2023. New phylogenetic Markov models for inapplicable morphological characters. Systematic Biology 72:681–693. doi:10.1093/sysbio/syad005

Tavaré, S. 1986. Some probabilistic and statistical problems on the analysis of DNA sequences. Lectures on Mathematics in the Life Sciences 17:57–86.

Tuffley, C., and Steel, M. 1997. Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bulletin of Mathematical Biology 59:581–607. doi:10.1007/BF02459467

Uhlenbeck, G. E., and Ornstein, L. S. 1930. On the theory of the Brownian motion. Physical Review 36:823–841. doi:10.1103/PhysRev.36.823

Wright, A. M., and Hillis, D. M. 2014. Bayesian analysis using a simple likelihood model outperforms parsimony for estimation of phylogeny from discrete morphological data. PLoS ONE 9:e109210. doi:10.1371/journal.pone.0109210

Wright, A. M., Lloyd, G. T., and Hillis, D. M. 2016. Modeling character change heterogeneity in phylogenetic analyses of morphology through the use of priors. Systematic Biology 65:602–611. doi:10.1093/sysbio/syv122

Wright, S. 1934. An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19:506. doi:10.1093/genetics/19.6.506

Yang, Z. 1994a. Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution 39:105–111. doi:10.1007/BF00178256

Yang, Z. 1994b. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution 39:306–314. doi:10.1007/BF00160154

Yang, Z. 2014. Molecular evolution: a statistical approach. Oxford University Press, Oxford. doi:10.1093/acprof:oso/9780199602605.001.0001

Zhang, C., and Wang, M. 2019. Bayesian tip dating reveals heterogeneous morphological clocks in Mesozoic birds. Royal Society Open Science 6:182062. doi:10.1098/rsos.182062

Zhang, C., Stadler, T., Klopfstein, S., Heath, T., and Ronquist, F. 2016. Total-evidence dating under the fossilized birth-death process. Systematic Biology 65:228–249. doi:10.1093/sysbio/syv080

Zhang, R., Drummond, A. J., and Mendes, F. K. 2023. Fast Bayesian inference of phylogenies from multiple continuous characters. Systematic Biology 73:102–124. doi:10.1093/sysbio/syad067

Figure 1. The distribution of tree length (A) and the numbers of extant and extinct tips (B) of the simulated trees.

Table 1. Models and settings used in the simulations and inferences. See “Methods” for the explanations of the symbols.

Figure 2. Tree distance metrics (Quartet and Mutual Clustering Information [MCI]) comparing the inferred tree with the true tree generating the data. Each violin plot contains 100 replicates. The left four panels show the results of non-clock analyses (A, C, E, G), while the right four panels show the results of tip-dating analyses (B, D, F, H). Panels labeled “w/ missing” (E–H) indicate scenarios with missing data. The numbers on the x-axis correspond to the following experiments (simulation model vs. inference model): 1, M2v–vs–M2v; 2, M2v–vs–F2v; 3, G4v(α = 10)–vs–M2v; 4, G4v(α = 10)–vs–F2v; 5, G8v(α = 10)–vs–M2v; 6, G8v(α = 10)–vs–F2v; 7, F2v(α = 1)–vs–M2v; 8, F2v(α = 1)–vs–F2v; 9, G4v(α = 1)–vs–M2v; 10, G4v(α = 1)–vs–F2v; 11, G8v(α = 1)–vs–M2v; 12, G8v(α = 1)–vs–F2v; 13, F2v(α = 1, v = 4)–vs–M2v; 14, F2v(α = 1, v = 4)–vs–F2v; 15, G4v(α = 1, v = 4)–vs–M2v; 16, G4v(α = 1, v = 4)–vs–F2v; 17, G8v(α = 1, v = 4)–vs–M2v; 18, G8v(α = 1, v = 4)–vs–F2v.

Figure 3. Relative bias (posterior mean minus the true value, then divided by the true value) and relative width of credibility interval (CI) (95% CI width divided by the true value) for each of the following experiments (simulation model vs. inference model): 1, M2v–vs–M2v; 2, M2v–vs–F2v; 3, G4v(α = 10)–vs–M2v; 4, G4v(α = 10)–vs–F2v; 5, G8v(α = 10)–vs–M2v; 6, G8v(α = 10)–vs–F2v; 7, F2v(α = 1)–vs–M2v; 8, F2v(α = 1)–vs–F2v; 9, G4v(α = 1)–vs–M2v; 10, G4v(α = 1)–vs–F2v; 11, G8v(α = 1)–vs–M2v; 12, G8v(α = 1)–vs–F2v; 13, F2v(α = 1, v = 4)–vs–M2v; 14, F2v(α = 1, v = 4)–vs–F2v; 15, G4v(α = 1, v = 4)–vs–M2v; 16, G4v(α = 1, v = 4)–vs–F2v; 17, G8v(α = 1, v = 4)–vs–M2v; 18, G8v(α = 1, v = 4)–vs–F2v. Each violin plot contains 100 replicates. The left four panels show the tree lengths from non-clock analyses (A, C, E, G), while the right four panels show the tree heights from tip-dating analyses (B, D, F, H). Panels labeled “w/ missing” (E–H) indicate scenarios with missing data.

Figure 4. Relative bias (posterior mean minus the true value, then divided by the true value) and relative width of credibility interval (CI) (95% CI width divided by the true value) of the base clock rate for each of the following experiments (simulation model vs. inference model): 1, M2v–vs–M2v; 2, M2v–vs–F2v; 3, G4v(α = 10)–vs–M2v; 4, G4v(α = 10)–vs–F2v; 5, G8v(α = 10)–vs–M2v; 6, G8v(α = 10)–vs–F2v; 7, F2v(α = 1)–vs–M2v; 8, F2v(α = 1)–vs–F2v; 9, G4v(α = 1)–vs–M2v; 10, G4v(α = 1)–vs–F2v; 11, G8v(α = 1)–vs–M2v; 12, G8v(α = 1)–vs–F2v; 13, F2v(α = 1, v = 4)–vs–M2v; 14, F2v(α = 1, v = 4)–vs–F2v; 15, G4v(α = 1, v = 4)–vs–M2v; 16, G4v(α = 1, v = 4)–vs–F2v; 17, G8v(α = 1, v = 4)–vs–M2v; 18, G8v(α = 1, v = 4)–vs–F2v. Each violin plot contains 100 replicates. The left two panels are scenarios without missing data (A and C), while the right two panels labeled with “w/ missing” (B and D) indicate scenarios with missing data.
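
The two summary statistics defined in the captions of Figures 3 and 4 can be written as a short Python sketch. The function and argument names are ours for illustration, and the inputs are assumed to be the posterior mean, the bounds of the 95% credibility interval, and the true simulated value (assumed to be nonzero).

```python
def relative_bias(posterior_mean, true_value):
    """Posterior mean minus the true value, divided by the true value.
    Negative values indicate underestimation."""
    return (posterior_mean - true_value) / true_value


def relative_ci_width(ci_lower, ci_upper, true_value):
    """Width of the 95% credibility interval divided by the true value."""
    return (ci_upper - ci_lower) / true_value


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from this study.
    print(relative_bias(0.8, 1.0))
    print(relative_ci_width(0.5, 1.5, 2.0))
```

Dividing by the true value puts replicates simulated on trees of different lengths or heights on a common scale, so they can be pooled in a single violin plot.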
