A two-step method for detecting selection signatures using genetic markers

Published online by Cambridge University Press: 01 June 2010

DANIEL GIANOLA*
Affiliation:
Department of Animal Sciences and Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA; Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway; Department of Animal Sciences, Georg-August-Universität, Göttingen, Germany
HENNER SIMIANER
Affiliation:
Department of Animal Sciences, Georg-August-Universität, Göttingen, Germany
SABER QANBARI
Affiliation:
Department of Animal Sciences, Georg-August-Universität, Göttingen, Germany
*Corresponding author. e-mail: gianola@ansci.wisc.edu

Summary

A two-step procedure is presented for analysis of θ (FST) statistics obtained for a battery of loci, which eventually leads to a clustered structure of values. The first step uses a simple Bayesian model for drawing samples from posterior distributions of θ-parameters, but without constructing Markov chains. This step assigns a weakly informative prior to allelic frequencies and does not make any assumptions about evolutionary models. The second step regards samples from these posterior distributions as ‘data’ and fits a sequence of finite mixture models, with the aim of identifying clusters of θ-statistics. Hopefully, these would reflect different types of processes and would assist in interpreting results. Procedures are illustrated with hypothetical data, and with published allelic frequency data for type II diabetes in three human populations, and for 12 isozyme loci in 12 populations of the argan tree in Morocco.
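
The first step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a Beta(1/2, 1/2) prior on each allelic frequency (consistent with the Beta(199 + 1/2, 1 + 1/2) posterior quoted in Fig. 1), and it assumes θ is defined as the variance of allelic frequencies among populations divided by p̄(1 − p̄), which is one common form of FST. The counts for population N (10 copies out of 60 alleles) are inferred from the captions of Figs 1 and 4 (209 − 199 and 260 − 200). The function names are hypothetical.

```python
import random


def theta_from_freqs(freqs):
    """Assumed definition: among-population variance over p_bar * (1 - p_bar)."""
    n_pops = len(freqs)
    p_bar = sum(freqs) / n_pops
    denom = p_bar * (1.0 - p_bar)
    if denom == 0.0:
        # All populations fixed for the same allele: no differentiation.
        return 0.0
    variance = sum((p - p_bar) ** 2 for p in freqs) / n_pops
    return variance / denom


def sample_theta(counts, n_draws=2000, prior=0.5, seed=0):
    """Draw independent samples of theta at one locus (no Markov chain).

    counts: list of (copies_of_allele, alleles_counted), one pair per population.
    prior:  both parameters of the assumed Beta(prior, prior) prior.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Each population's frequency has an independent Beta posterior.
        freqs = [rng.betavariate(x + prior, n - x + prior) for x, n in counts]
        draws.append(theta_from_freqs(freqs))
    return draws


# Hypothetical populations M and N (counts for N inferred from the captions).
theta_draws = sample_theta([(199, 200), (10, 60)])
posterior_mean_theta = sum(theta_draws) / len(theta_draws)
```

Because every draw is an independent sample from a closed-form posterior, no burn-in or convergence checks are needed; the resulting draws can be summarised directly or passed to the second step.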

Information

Type
Research Papers
Copyright
Copyright © Cambridge University Press 2010
Fig. 1. Posterior density (thick line) of the allelic frequency p at a locus for which 199 copies have been observed out of 200 alleles counted in hypothetical population M; the posterior distribution is $\mathrm{Beta}\left(199 + \tfrac{1}{2},\ 1 + \tfrac{1}{2}\right)$. The thin line is the density of a normal approximation to the sampling distribution of the maximum likelihood estimator.
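
The contrast drawn in Fig. 1 can be reproduced numerically. The sketch below assumes the usual binomial normal approximation, with mean equal to the maximum likelihood estimate and standard error sqrt(p̂(1 − p̂)/n); the variable names are illustrative. It shows that, for a frequency this close to 1, the normal approximation places appreciable probability above 1, whereas the Beta posterior is confined to the interval (0, 1).

```python
import math
from statistics import NormalDist

copies, alleles = 199, 200

# Beta posterior parameters quoted in the caption: Beta(199 + 1/2, 1 + 1/2).
a = copies + 0.5
b = (alleles - copies) + 0.5
posterior_mean = a / (a + b)
posterior_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Normal approximation to the sampling distribution of the ML estimator
# (assumed form: mean p_hat, standard error sqrt(p_hat * (1 - p_hat) / n)).
p_hat = copies / alleles
std_err = math.sqrt(p_hat * (1.0 - p_hat) / alleles)
approx = NormalDist(mu=p_hat, sigma=std_err)

# Probability the normal approximation assigns to impossible values p > 1.
mass_above_one = 1.0 - approx.cdf(1.0)
```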

Fig. 2. Posterior density of θl for the hypothetical example of populations M and N.

Fig. 3. Empirical cumulative distribution function of θl for the hypothetical example of populations M and N.

Fig. 4. Posterior density of the allelic frequency p under the ‘null’ model for hypothetical populations M and N: 209 copies of Al are observed out of 260 alleles screened.

Fig. 5. Posterior density of θl under the null model for the hypothetical example of populations M and N.

Fig. 6. Density of the posterior distribution of θKCNJ11 obtained from allelic frequencies in Myles et al. (2007).

Table 1. Allelic frequencies at 12 isozyme loci in each of 12 argan tree populations, adapted from Petit et al. (1998) by making all loci bi-allelic. A1–A12 represent frequencies of the ‘A’ allele at loci 1–12; No. A1–No. A12 are the observed numbers of copies of the alleles. The number of ‘a’ alleles can be calculated from the number of individuals sampled and the number of ‘A’ alleles observed.

Fig. 7. Box plot of the posterior distributions of θ-parameters in 12 isozyme loci of the argan tree in Morocco (data originally from Petit et al., 1998).

Fig. 8. Non-parametric density estimates of θ values (based on 2000 samples for each of 12 loci), logit(θ) and Gompit(θ). All samples are treated as homogeneous, i.e. as generated from the same stochastic process.

Table 2. Comparison of mixture models with 2, 3 or 4 components fitted to the 12 posterior means of θ-parameters and their logit or Gompit transforms in the argan tree data of Petit et al. (1998). Models are compared by AIC; models with the smallest values are favoured and indicated in boldface.
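
The kind of model comparison summarised in Table 2 can be illustrated with a small expectation-maximisation (EM) fit. This is a sketch only: it assumes univariate Gaussian components and two clusters, the function names are hypothetical, and the input values are made up for illustration (they are not the argan tree estimates). AIC is computed as 2k − 2 log L, with k = 5 free parameters for a two-component mixture (two means, two standard deviations, one mixing weight).

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=200, min_sigma=1e-3):
    """EM for a two-component Gaussian mixture; returns estimates and AIC."""
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise the means from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    start_sigma = max((xs[-1] - xs[0]) / 4.0, min_sigma)
    sigma = [start_sigma, start_sigma]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: conditional probabilities of cluster membership.
        resp = []
        for x in values:
            dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weight[k] = nk / len(values)
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk
            # Floor on sigma avoids a degenerate, unbounded likelihood.
            sigma[k] = max(math.sqrt(var), min_sigma)
    loglik = sum(
        math.log(max(sum(weight[k] * normal_pdf(x, mu[k], sigma[k])
                         for k in range(2)), 1e-300))
        for x in values
    )
    n_params = 5
    return {"mu": mu, "sigma": sigma, "weight": weight,
            "resp": resp, "aic": 2 * n_params - 2.0 * loglik}


# Made-up theta values for illustration only.
example_thetas = [0.02, 0.03, 0.025, 0.04, 0.5, 0.55, 0.6]
fit = fit_two_component_mixture(example_thetas)
```

Models with 3 or 4 components would be fitted in the same way and compared on the "aic" value, with the smallest value preferred. The "resp" entries are conditional membership probabilities of the kind reported in Table 3.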

Table 3. Conditional probabilities of membership to one of two clusters for mixture models fitted to the posterior means of θ for the 12 loci in the argan tree, and their logit, log(θ/(1−θ)), and Gompit, −log(−log(θ)), transformations (boldfaced probability indicates the cluster with largest probability of membership).
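
The two transformations named in the caption map θ from the unit interval to the whole real line, which suits mixture components with unbounded support. A minimal sketch of both transforms and their inverses follows; the function names are illustrative, and θ is assumed to lie strictly between 0 and 1.

```python
import math


def logit(theta):
    """log(theta / (1 - theta)), defined for 0 < theta < 1."""
    return math.log(theta / (1.0 - theta))


def inv_logit(value):
    return 1.0 / (1.0 + math.exp(-value))


def gompit(theta):
    """-log(-log(theta)), defined for 0 < theta < 1."""
    return -math.log(-math.log(theta))


def inv_gompit(value):
    return math.exp(-math.exp(-value))
```

The logit is symmetric about θ = 1/2, whereas the Gompit is asymmetric, so the two transforms can lead to different cluster assignments for the same θ values.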