
Testing temporal changes in allele frequencies: a simulation approach

Published online by Cambridge University Press: 14 October 2010

EDSON SANDOVAL-CASTELLANOS*
Affiliation:
Laboratorio de Genética Ecológica y Evolución, Instituto de Ecología, Universidad Nacional Autónoma de México, Circuito exterior de Ciudad Universitaria, Mexico City, P.C. 04510, Mexico
*Tel: +(52)(55)56229005. e-mail: esandoval@miranda.ecologia.unam.mx

Summary

Analysis of the temporal variation in allele frequencies is useful for studying microevolutionary processes. However, many statistical methods routinely used to test temporal changes in allele frequencies fail to establish a proper hypothesis or have theoretical or practical limitations. Here, a Bayesian statistical test is proposed in which the distribution of the distances among sample frequencies is approximated by computer simulations, and hypergeometric sampling is considered instead of binomial sampling. To validate the test and compare its performance with that of other tests, agent-based model (ABM) simulations were run for a variety of scenarios, and two real molecular data sets were analysed. The results showed that the simulation test (ST) maintained the nominal significance level (α=0·05) across a wide range of parameter combinations, whereas other tests were sensitive to the effect of genetic drift or binomial sampling. The differences between binomial and hypergeometric sampling were more complex than expected, and a novel effect was described. This study suggests that the ST is especially useful for studies with small populations and many alleles, as in microsatellite or sequencing molecular data.

Information

Type
Research Papers
Copyright
Copyright © Cambridge University Press 2010

Fig. 1. Model followed by the simulations. Each ellipse represents a population. The term outside the ellipse indicates the population size, and the term inside represents the frequency of one allele at a particular locus. The subscript indicates the generation number. The effective population is a subset of the total population, as are the samples (they represent samples drawn without replacement). The total population is obtained by random mating of a very large number of gametes and can therefore be modelled as sampling with replacement from the previous effective population. The algorithm replaces these sampling processes with the generation of random numbers, either hypergeometric (hrn) or multinomial (mrn). When the samples are separated by t generations, the intermediate non-sampled generations can be simulated using only a binomial deviate (see text).
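The sampling scheme in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `simulate_generation` and `draw_sample` are hypothetical, population sizes are treated directly as numbers of gene copies (ploidy is ignored), and the sizes in the usage lines are illustrative. It relies on NumPy's `Generator.multinomial` (sampling with replacement) and `Generator.multivariate_hypergeometric` (sampling without replacement, NumPy 1.18 or later).

```python
import numpy as np


def simulate_generation(eff_counts, total_size, eff_size, rng):
    """Advance one generation (hypothetical helper, not from the paper).

    eff_counts : allele counts in the current effective population.
    Returns the allele counts of the new total population and of the
    new effective population.
    """
    eff_counts = np.asarray(eff_counts)
    freqs = eff_counts / eff_counts.sum()
    # Total population: sampling WITH replacement from the previous
    # effective population (multinomial random number, "mrn").
    total_counts = rng.multinomial(total_size, freqs)
    # Effective population: a subset of the total population drawn
    # WITHOUT replacement (hypergeometric random number, "hrn").
    new_eff_counts = rng.multivariate_hypergeometric(total_counts, eff_size)
    return total_counts, new_eff_counts


def draw_sample(total_counts, sample_size, rng):
    """Draw a sample without replacement from the total population."""
    return rng.multivariate_hypergeometric(total_counts, sample_size)


# Illustrative run: two alleles at frequency 0.5, N = 2000, Ne = N/2,
# five generations, then a sample of 100 (assumed values).
rng = np.random.default_rng(2010)
eff = np.array([500, 500])
for _ in range(5):
    total, eff = simulate_generation(eff, total_size=2000, eff_size=1000, rng=rng)
sample = draw_sample(total, sample_size=100, rng=rng)
print(sample / sample.sum())
```

Using the multivariate forms keeps the sketch valid for any number of alleles; with two alleles they reduce to the binomial and ordinary hypergeometric cases.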

Table 1. Results for the two real data sets analysed. In the snail data set, the numbers under the label ‘No. of generations between samples’ indicate the number of generations between the first and last samples, while the numbers inside the parentheses indicate the generations between consecutive samples. Grey cells indicate significant results (α=0·05), and * indicates analyses that were not possible because of a lack of data or because the locus was monomorphic

Fig. 2. Percentage of significant tests obtained for different values of: (A) total population sizes (N), (B) numbers of generations between samples; and an increasing number of samples/generations, with two alleles (C) or three alleles (D). The default settings were (unless otherwise indicated) as follows: N=10 000, Ne=N/2, t=5, two samples with sizes S0=S5=100, number of alleles k=2, initial allele frequencies of 0·5, sampling Plan II and fully Bayesian algorithm. In (C) and (D), only one generation separated consecutive samples, and the initial allele frequencies were 0·95/0·05 and 0·7/0·28/0·02, respectively.

Fig. 3. Percentages of significant tests obtained for different numbers of samples and alleles and for three different population sizes, N, as indicated in the plots. The higher numbers of alleles/samples could not be used with the smaller population sizes (N=100 and 150), because the likelihood of an allele being lost is very high with a large number of alleles and a small number of organisms (the general simulations do not allow samples with missing alleles). The number of samples was the same as the number of alleles for each run. The two were increased simultaneously simply to assess the combined effect of the parameters, because the goal was not to analyse the effect of each parameter (that was already done) but to assess the magnitude that the departures can reach with certain ‘problematic’ combinations of parameters. For that purpose, the following settings were used: Ne=N/2, one generation between consecutive samples, equal sample sizes Si=20, sampling Plan II and fully Bayesian algorithm. The initial allele frequencies were as follows: 0·5/0·5 (two alleles), 0·25/0·25/0·5 (three alleles), 0·167/0·167/0·167/0·5 (four alleles), 0·125/0·125/0·125/0·125/0·5 (five alleles), 0·1/0·1/0·1/0·1/0·1/0·5 (six alleles), 0·083/…/0·083/0·5 (seven alleles), 0·071/…/0·071/0·5 (eight alleles), 0·0625/…/0·0625/0·5 (nine alleles) and 0·055/…/0·055/0·5 (ten alleles).

Fig. 4. Difference in the percentage of significant tests between binomial and hypergeometric STs, and for tests over the FT statistic. The plots show the percentage of significant tests obtained with the binomial ST minus the percentage obtained with the hypergeometric ST, as a function of sample size. Two S/Ne ratios (S/Ne=0·9 and S/Ne=0·5) and two sampling plans (Plan I, before reproduction, and Plan II, after reproduction) were used. Curves correspond to: (A) S/Ne=0·9, Plan II; (B) S/Ne=0·5, Plan II; (C) S/Ne=0·9, Plan I; (D) S/Ne=0·5, Plan I. A curve in the positive region means that the percentage of significant binomial STs was larger than the percentage of significant hypergeometric STs, and a curve in the negative region means the converse. The default settings were as follows: N=2Ne, t=5 (generations between samples), k=2, initial allele frequencies of 0·5/0·5 and fully Bayesian algorithm. Note that, since the S/Ne and N/Ne ratios were fixed for each curve, the x-axis indicates not only increasing sample sizes but also increasing N and Ne.
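As background for why the two sampling models can diverge, the standard variances of an allele count X in a sample of size n can be compared. The symbols below are assumptions introduced for illustration and are not taken from the paper: M is the size of the population from which the sample is drawn and p is the allele frequency in that population.

```latex
\begin{align*}
\operatorname{Var}_{\mathrm{bin}}(X) &= n\,p\,(1-p), \\
\operatorname{Var}_{\mathrm{hyp}}(X) &= n\,p\,(1-p)\,\frac{M-n}{M-1}, \\
\frac{\operatorname{Var}_{\mathrm{hyp}}(X)}{\operatorname{Var}_{\mathrm{bin}}(X)}
  &= \frac{M-n}{M-1} \;\approx\; 1-\frac{n}{M} \qquad (M \gg 1).
\end{align*}
```

If, as Fig. 1 suggests, samples are taken from the total population (M=N, n=S) and ploidy is ignored, then N=2Ne gives S/N=0·45 for S/Ne=0·9 and S/N=0·25 for S/Ne=0·5, so the hypergeometric variance would be roughly 0·55 and 0·75 times the binomial variance, respectively. This is only the single-draw variance ratio under those assumptions; it does not by itself account for the pattern of differences shown in the curves.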

Fig. 5. (A) Percentage of significant tests obtained when the frequency of one allele in the ‘real population’ (the ABM simulation) was increased by a given factor (s=1·01×, 1·05× and 1·1×) each generation, emulating a positive selection process. (B, C) Percentage of significant tests when incorrect values of N were used by the tests (x-axis) for two different values of the real population size (those used in the general simulations): N=1000 (B) and N=10 000 (C). The default parameters used were as follows: N=10 000, Ne=N/2, t=5, S0=S5=100, number of alleles k=2, initial allele frequencies of 0·5, sampling Plan II and fully Bayesian algorithm.
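The selection emulation in panel (A) can be illustrated with a small deterministic sketch. It is an assumption-laden example rather than the paper's code: the helper name `apply_selection` is hypothetical, and the rescaling of the remaining alleles so that the frequencies still sum to one is an assumption, since the caption does not say how the other alleles are adjusted.

```python
import numpy as np


def apply_selection(freqs, s, favoured=0):
    """Multiply the frequency of one allele by s (hypothetical helper).

    The remaining alleles are rescaled proportionally so that the
    frequencies sum to one (an assumption, not stated in the caption).
    """
    freqs = np.asarray(freqs, dtype=float)
    new = freqs.copy()
    # Cap at 1 so that the favoured allele cannot exceed fixation.
    new[favoured] = min(freqs[favoured] * s, 1.0)
    others = np.arange(len(freqs)) != favoured
    rest = freqs[others].sum()
    if rest > 0:
        new[others] = freqs[others] * (1.0 - new[favoured]) / rest
    return new


# Five generations with s = 1.1 and no drift or sampling noise.
p = np.array([0.5, 0.5])
for _ in range(5):
    p = apply_selection(p, s=1.1)
print(p)
```

In a full simulation this step would be combined with the drift and sampling steps each generation; here it is isolated to show the size of the deterministic shift alone.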