
Population genetic inference using a fixed number of segregating sites: a reassessment

Published online by Cambridge University Press:  21 January 2008

SEBASTIÁN E. RAMOS-ONSINS*
Affiliation:
Max Planck Institute of Chemical Ecology, Hans-Knöll-Strasse 8, 07745 Jena, Germany
SYLVAIN MOUSSET
Affiliation:
Biocenter, Department of Biology II, University of Munich, 82152 Planegg-Martinsried, Germany
THOMAS MITCHELL-OLDS
Affiliation:
Max Planck Institute of Chemical Ecology, Hans-Knöll-Strasse 8, 07745 Jena, Germany
WOLFGANG STEPHAN
Affiliation:
Biocenter, Department of Biology II, University of Munich, 82152 Planegg-Martinsried, Germany
*Corresponding author. e-mail: sramosonsins@ub.edu

Summary

Coalescent theory is commonly used to perform population genetic inference at the nucleotide level. Here, we examine the procedure that fixes the number of segregating sites (henceforth the FS procedure). In this approach a fixed number of segregating sites (S) are placed on a coalescent tree (independently of the total and internode lengths of the tree). Thus, although widely used, the FS procedure does not strictly follow the assumptions of coalescent theory and must be considered an approximation of (i) the standard procedure that uses a fixed population mutation parameter θ, and (ii) procedures that condition on the number of segregating sites. We study the differences in the false positive rate for nine statistics by comparing the FS procedure with the procedures (i) and (ii), using several evolutionary models with single-locus and multilocus data. Our results indicate that for single-locus data the FS procedure is accurate for the equilibrium neutral model, but problems arise under the alternative models studied; furthermore, for multilocus data, the FS procedure becomes inaccurate even for the standard neutral model. Therefore, we recommend a procedure that fixes the θ value (or alternatively, procedures that condition on S and take into account the uncertainty of θ) for analysing evolutionary models with multilocus data. With single-locus data, the FS procedure should not be employed for models other than the standard neutral model.
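The contrast between the FS procedure and the fixed-θ procedure can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes a neutral panmictic coalescent without recombination, time measured in units of 2N generations (so mutations fall on the tree as a Poisson process with rate θ/2 per unit branch length), and it tracks only the coalescent intervals rather than a full tree topology. All function names are hypothetical.

```python
import random

def coalescent_intervals(n, rng):
    """Waiting times T_k while k lineages remain, in units of 2N generations."""
    return {k: rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)}

def total_length(intervals):
    """Total tree length L_n = sum over k of k * T_k."""
    return sum(k * t for k, t in intervals.items())

def poisson(lam, rng):
    """Draw a Poisson(lam) variate by counting unit-rate arrivals before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def fixed_theta(n, theta, rng):
    """Fixed-theta procedure: S is random and depends on the tree length."""
    intervals = coalescent_intervals(n, rng)
    return poisson(theta * total_length(intervals) / 2, rng)

def fixed_s(n, s, rng):
    """FS procedure: always exactly s mutations, whatever the tree length.

    Returns how many mutations land in each interval (keyed by k), with
    probability proportional to that interval's share k * T_k of the tree.
    """
    intervals = coalescent_intervals(n, rng)
    levels = list(intervals)
    weights = [k * intervals[k] for k in levels]
    counts = dict.fromkeys(levels, 0)
    for k in rng.choices(levels, weights=weights, k=s):
        counts[k] += 1
    return counts

if __name__ == "__main__":
    rng = random.Random(1)
    # Under fixed theta, S changes from tree to tree.
    print([fixed_theta(20, 5.0, rng) for _ in range(10)])
    # Under FS, the number of mutations is the same for every tree.
    print(sum(fixed_s(20, 20, rng).values()))
```

In this sketch a short tree and a long tree receive the same number of mutations under `fixed_s`, which is the sense in which the FS procedure places S independently of tree length.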

Information

Type
Research Article
Copyright
Copyright © Cambridge University Press 2007

Fig. 1. Shape index of a coalescent tree. The shape index is the ratio of the branch lengths in the upper and lower parts of the tree, X/Yn (see text).

Fig. 2. Differences between the sizes of the FS and Fθ procedures for each S separately. (A) Neutral model: In the upper panel the distribution of S values obtained with the Fθ procedure using θ=0·0057 is shown. In the middle and lower panels the size differences for statistics for the upper (97·5%) and the lower (2·5%) tails, respectively, are presented. Note that large S values, although causing large differences, contribute very little to the total. (B) Hitchhiking model: In the upper panel the distribution of S values obtained with the Fθ procedure using θ=0·0127 is shown. In the middle and lower panels the size differences for statistics for the upper (90%) and the lower (10%) tails, respectively, are presented. Abbreviations are explained in Section 2. Critical values are not calculated for B (lower tail) or for Kw and Hw (upper tail), because these statistics are not conservative with recombination.

Table 1. Difference (in percentage) between the FS and Fθ procedures for several critical values

Fig. 3. Differences between the sizes of the FS and FSθU procedures. Critical values are obtained for each of the nine neutrality tests for the 2·5% upper and lower tails. Abbreviations are indicated in Section 2.

Table 2. Difference (in percentage) between the FS and the FSθU procedures for several critical values

Fig. 4. Effect of the number of loci on the probability of rejecting the neutral panmictic model for nine neutrality tests. (A) Differences between the sizes of the FS and FSθU procedures with no recombination in the upper 2·5% tail, given different numbers of loci. (B) Differences between the sizes of the FS and FSθU procedures with no recombination in the lower 2·5% tail, given different numbers of loci. Abbreviations are explained in Section 2. Critical values are not calculated for B (lower tail) or for Kw and Hw (upper tail), because these statistics are not conservative with recombination. S was fixed at 20 for each locus. Plots obtained using S values from a distribution compatible with θ=0·0057 gave equivalent results (not shown), although for a small number of loci we observed a large variance.

Fig. 5. Posterior densities of the total length Ln of a coalescent tree with n=20 are shown for the different procedures and no recombination. Results based on the FS, FSθ and FSθU procedures are displayed. For FSθ the θ parameter was arbitrarily set to 20/a2(20).
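The kind of comparison shown in this figure can be sketched in a few lines. The sketch rests on several assumptions that are not taken from the paper: conditioning on S under a fixed θ is done by importance weighting (each simulated tree is weighted by the Poisson probability of observing S mutations on it), time is in units of 2N generations, and θ is set to S divided by the harmonic sum a_n = Σ 1/i for i = 1..n−1, an illustrative choice. The FSθU procedure is not covered, and the function names are hypothetical.

```python
import math
import random

def tree_length(n, rng):
    """Total length L_n of a neutral coalescent tree (units of 2N generations)."""
    return sum(k * rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1))

def log_poisson_pmf(s, lam):
    """Log of the Poisson probability of s events when the mean is lam."""
    return s * math.log(lam) - lam - math.lgamma(s + 1)

def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation of values."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

def length_summaries(n, s, theta, reps, rng):
    """Return (mean, sd) of L_n under FS and under fixed theta conditioned on S."""
    lengths = [tree_length(n, rng) for _ in range(reps)]
    # FS: S carries no information about the tree, so all trees weigh the same.
    fs = weighted_mean_sd(lengths, [1.0] * reps)
    # Fixed theta, conditioned on S: weight each tree by P(S | L, theta).
    logw = [log_poisson_pmf(s, theta * length / 2) for length in lengths]
    top = max(logw)  # subtract the maximum for numerical stability
    cond = weighted_mean_sd(lengths, [math.exp(lw - top) for lw in logw])
    return fs, cond

if __name__ == "__main__":
    n, s = 20, 20
    a_n = sum(1 / i for i in range(1, n))
    fs, cond = length_summaries(n, s, s / a_n, 50000, random.Random(3))
    print("FS (mean, sd):", fs)
    print("fixed theta given S (mean, sd):", cond)
```

Under these assumptions the FS summaries are those of the unconditioned tree length, whereas conditioning on S with a fixed θ concentrates the distribution of L_n around lengths compatible with the observed S.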

Table 3. Mean and standard deviation (in parentheses) of the posterior distribution of L20

Table 4. Mean and standard deviation (in parentheses) of the tree shape

Fig. 6. Probability distribution of Ln for four alternative models using n=20 and S=20. The FS and FSθU procedures are displayed in each case. Parameter values are the same as in Table 1 but intragenic recombination was set to 4Nr=10 (except for panel D where it was zero). (A) Island model. (B) Logistic expansion model. (C) Bottleneck model. (D) Hitchhiking model.

Fig. 7. Effect of recombination on average Ln for the FS, FSθ (θ=0·0057) and FSθU procedures in a neutral panmictic population using n=20 and S=20.