Measuring the degree of starshape in genealogies – summary statistics and demographic inference

Published online by Cambridge University Press:  30 July 2009

KONRAD LOHSE*
Affiliation:
Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK
JEROME KELLEHER
Affiliation:
Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK
*Corresponding author. Tel: +44 (0)131 650 5508. e-mail: K.R.Lohse@sms.ed.ac.uk
Summary

The degree of starshape of a genealogy is readily detectable using summary statistics and can be taken as a surrogate for the effect of past demography and other non-neutral forces. Summary statistics such as Tajima's D and related measures are commonly used for this. However, it is well known that because of their neglect of the genealogy underlying a sample such neutrality tests are far from ideal. Here, we investigate the properties of two types of summary statistics that are derived by considering the genealogy: (i) genealogical ratios based on the number of mutations on the rootward branches, which can be inferred from sequence data using a simple algorithm and (ii) summary statistics that use properties of a perfectly star-shaped genealogy. The power of these measures to detect a history of exponential growth is compared with that of standard summary statistics and a likelihood method for the single and multi-locus case. Statistics that depend on pairwise measures such as Tajima's D have comparatively low power, being sensitive to the random topology of the underlying genealogy. When analysing multi-locus data, we find that the genealogical measures are most powerful. Provided reliable outgroup information is available they may constitute a useful alternative to full likelihood estimation and standard tests of neutrality.
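
For reference, the pairwise dependence of Tajima's D mentioned above can be made concrete with a minimal sketch. It assumes an infinite-sites alignment given as equal-length strings and uses the standard published constants for Tajima's D; the function name `tajimas_d` and the input format are illustrative assumptions and are not taken from the paper.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences (illustrative helper).

    Assumes an infinite-sites model, so every variable column counts as one
    segregating site. Requires n >= 4 so that the variance term is positive.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must have equal length")

    # S: number of segregating (variable) sites.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return float("nan")  # D is undefined without variation

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)
    ) / n_pairs

    # Standard constants from the definition of Tajima's D.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D contrasts the pairwise estimate of theta (pi) with Watterson's (S / a1).
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Every mutation is a singleton, as on the terminal branches of a star-shaped genealogy.
star_like = ["1000", "0100", "0010", "0001"]
# All mutations sit on the two rootward branches of a balanced genealogy.
balanced = ["1100", "1100", "0011", "0011"]
print(tajimas_d(star_like), tajimas_d(balanced))
```

In the two toy samples the number of segregating sites is the same; only the way mutations are shared among sequences differs, which is what moves the pairwise term and hence the sign of D.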

Information

Type
Paper
Copyright
Copyright © Cambridge University Press 2009

Fig. 1. Random genealogy of a sample of 10 sequences. The root partitions the sample into two subclades of size 3 and 7. Rootward branches are shown in bold and terminal branches as dotted lines; mutations are represented as crosses. The time interval until the last coalescence event, T2, is shorter than average under the SNM. In this example S=30, ηR=7, ηRmin=2 and ηe=14.

Fig. 2. Power of summary statistics and likelihood method against exponential growth rate A=0–50. n=10, θ=20. Each point is based on 10 000 replicate simulations. The power of the likelihood method was estimated from 100 replicates (see large filled circles and error bars).

Fig. 3. Power of summary statistics against exponential growth rate A=0–50. n=50, θ=20. Note the different range (0–1) on the y-axis compared with Fig. 2.

Fig. 4. Power of summary statistics to detect a history of exponential growth (A=8) against θ. n=10.

Fig. 5. The effect of topological asymmetry on statistical power (simulation parameters as in Fig. 2). Genealogies of Fig. 2 were sorted according to the partition by the root (shown above plot). Only the most asymmetrical partition (9, 1) (a) and one other case (7, 3) (b) are shown. Results for the other three partitions were very similar to (b). Note that since lineages are exchangeable, all asymmetrical partitions have the same probability Pa=2/(n−1) (Tajima, 1983, eqn (2)).
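
The partition probability quoted in the caption of Fig. 5 can be checked with a short sketch. It assumes the standard random-joining topology of the coalescent (any two lineages are equally likely to coalesce next); the function names `root_partition_prob` and `simulate_root_partition` are illustrative and do not come from the paper. The probability of the symmetrical partition, 1/(n−1) for even n, is not stated in the caption and is included here as a known consequence of the same result.

```python
import random
from fractions import Fraction


def root_partition_prob(n, i):
    """Probability that the root splits n sequences into subclades {i, n - i}.

    The partition is treated as unordered: asymmetrical partitions have
    probability 2/(n - 1), the symmetrical one (i == n - i) has 1/(n - 1).
    """
    if not 1 <= i <= n - 1:
        raise ValueError("subclade size must lie between 1 and n - 1")
    if 2 * i == n:
        return Fraction(1, n - 1)
    return Fraction(2, n - 1)


def simulate_root_partition(n, rng):
    """Draw one root partition by joining random pairs of lineages.

    Each lineage is tracked only by the number of sampled sequences below it.
    Merging stops when two lineages remain; these are the two rootward branches.
    """
    sizes = [1] * n
    while len(sizes) > 2:
        i, j = rng.sample(range(len(sizes)), 2)
        merged = sizes[i] + sizes[j]
        sizes = [s for k, s in enumerate(sizes) if k not in (i, j)]
        sizes.append(merged)
    return tuple(sorted(sizes, reverse=True))


n = 10
# Exact probabilities for the five unordered partitions of n = 10.
for i in range(1, n // 2 + 1):
    print((n - i, i), root_partition_prob(n, i))

# Monte Carlo frequency of the most asymmetrical partition (9, 1).
rng = random.Random(1)
draws = [simulate_root_partition(n, rng) for _ in range(20000)]
print(draws.count((9, 1)) / len(draws))
```

For n=10 the four asymmetrical partitions and the single symmetrical one account for all possible root partitions, so their probabilities sum to one.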

Fig. 6. Power of summary statistics to detect a history of growth (A=8) using the mean across multiple loci, plotted against the number of loci. n=10. (A) θ=20. (B) Assuming mutational rate heterogeneity (θ gamma distributed with α=2 and E[θ]=20).

Fig. 7. Power of summary statistics in the variance-based tests across multiple loci for three different growth rates (from left to right A=2, 4, 8). (A) θ=20. (B) Assuming mutational rate heterogeneity (θ gamma distributed with α=2 and E[θ]=20).