
An implementation of the Brownian motion model for Bayesian phylogenetic inference using continuous traits with missing states

Published online by Cambridge University Press: 26 December 2025

Ziye Wang
Affiliation:
School of Life Sciences, Peking University, Beijing 100871, China
Chi Zhang*
Affiliation:
Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing 100044, China
*Corresponding author: Chi Zhang; Email: zhangchi@ivpp.ac.cn

Abstract

Continuous morphological traits play a crucial role in phylogenetic inference, yet they are often discretized due to limited software support and challenges of handling missing data efficiently. We present a new implementation of the Brownian motion model for continuous trait evolution in the Bayesian phylogenetics software MrBayes. Our approach efficiently accommodates any proportion of missing data and supports evolutionary rate variation across characters and data partitions. It is compatible with both non-clock and relaxed clock models. We validate the implementation through simulations and apply it to empirical datasets of pterosaurs and ancient humans, demonstrating that continuous traits can improve phylogenetic resolution. This development expands the methodological tool kit for morphological and total-evidence phylogenetics and is applicable across diverse taxonomic groups.

Information

Type
Methodological Advances
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Paleontological Society

Figure 1. Pruning algorithm without and with missing states. One continuous trait with state values at the tips of a five-taxon tree is used for illustration. We assume the root is on the branch leading to taxon A. The algorithm traverses the internal nodes of the tree in post-order, that is, I, J, K, and the root. For internal node k with two descendant nodes i and j, we calculate the contrast ($x_k = m_i - m_j$), ancestral state ($m_k$), and transformed branch length ($v_k$). The contrasts follow independent normal distributions. According to the pulley principle, the root position does not affect the likelihood and can be placed anywhere in the tree. When the states at taxa B and D are missing, we simply prune the branches leading to them, resulting in a star tree with three tips, A, C, and E. Refer to the pseudocode in the Appendix for a more rigorous calculation.
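
The pruning step described in this caption can be illustrated with a minimal Python sketch. It is not the MrBayes implementation, and the Appendix pseudocode remains the authoritative description. The `Node` class, the function names, and the single rate parameter `sigma2` are hypothetical choices made for illustration. The caption does not give the update formulas for the ancestral state and the transformed branch length, so the sketch assumes the standard independent-contrasts formulas and treats a missing tip by passing the sibling's state upward with the two branch lengths summed, which is equivalent to pruning the missing tip.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    length: float                      # branch length above this node
    value: Optional[float] = None      # trait value at a tip; None means missing
    children: Tuple["Node", ...] = ()  # empty for tips, two nodes otherwise


def prune(node: Node, sigma2: float):
    """Post-order pass returning (state, transformed length, log-likelihood).

    The state is None when the whole subtree has no observed data.
    """
    if not node.children:
        return node.value, node.length, 0.0

    (mi, vi, li), (mj, vj, lj) = (prune(c, sigma2) for c in node.children)
    logl = li + lj

    # Missing data: drop the empty side and merge the remaining branch
    # with the branch above this node (assumed handling, see lead-in).
    if mi is None and mj is None:
        return None, node.length, logl
    if mi is None:
        return mj, vj + node.length, logl
    if mj is None:
        return mi, vi + node.length, logl

    # Contrast x_k = m_i - m_j is normal with variance sigma2 * (v_i + v_j).
    x = mi - mj
    var = sigma2 * (vi + vj)
    logl += -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)

    # Weighted-average ancestral state and lengthened branch above node k.
    mk = (mi * vj + mj * vi) / (vi + vj)
    vk = node.length + vi * vj / (vi + vj)
    return mk, vk, logl


def bm_log_likelihood(root: Node, sigma2: float = 1.0) -> float:
    """Sum of the log densities of all contrasts in the tree."""
    return prune(root, sigma2)[2]
```

In this sketch a tree in which one tip is missing gives the same value as the smaller tree with that tip removed and its sibling's branch extended, which mirrors the reduction to tips A, C, and E shown in the figure.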


Figure 2. Quartet distance metric comparing the inferred tree topology with the true tree (A and B), and root-mean-square error (RMSE; C and D) and mean widths of 95% highest posterior density (HPD) intervals (E and F) of the estimated branch lengths. Each violin plot contains 100 replicates. The four data types are: (1) 200 variable binary discrete characters; (2) 100 variable binary discrete characters and 100 continuous traits; (3) 200 continuous traits; and (4) 200 continuous traits with 50% missing states in the extinct taxa and 10% missing in the extant taxa.
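
As a rough illustration of two of the summaries named in this caption, the following Python sketch computes a root-mean-square error and the width of a 95% HPD interval from posterior samples. The function names and the sample-based HPD rule (narrowest window containing the requested fraction of sorted samples) are assumptions made for illustration. The caption does not state how the authors computed or averaged these quantities across branches.

```python
import math
from typing import Sequence


def rmse(estimates: Sequence[float], truths: Sequence[float]) -> float:
    """Root-mean-square error between estimated and true branch lengths."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


def hpd_width(samples: Sequence[float], mass: float = 0.95) -> float:
    """Width of the narrowest interval holding `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    return min(s[i + k - 1] - s[i] for i in range(n - k + 1))
```

Under these assumptions, a mean HPD width for one replicate would be the average of `hpd_width` over the posterior samples of each branch length.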


Figure 3. Tanglegram comparing the 50% majority-rule consensus tree from the Bayesian tip-dating analysis (A) to the strict consensus of the most parsimonious trees from TNT (B) using both continuous (normalized) and discrete morphological characters of pterosaurs. The relaxed clock model is independent-lognormal shared between the two partitions in the tip-dating analysis.


Figure 4. Tanglegram comparing the 50% majority-rule consensus tree using both continuous (normalized) and discrete morphological characters (A) with that using discretized continuous and discrete characters (B) of ancient humans. Both analyses used the white noise relaxed clock unlinked between the two partitions. The original study (tree on the right) enforced 10 topological constraints referring to the most parsimonious tree from TNT using both continuous and discrete characters (Ni et al. 2021).