
Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

  • Joseph Felsenstein

It is known that, under neutral mutation at a known mutation rate, a sample of nucleotide sequences within which there is assumed to be no recombination allows estimation of the effective size of an isolated population. This paper investigates the case of very long sequences, where each pair of sequences allows a precise estimate of the divergence time of those two gene copies. The average divergence time of all pairs of copies estimates twice the effective population number, and an estimate can also be derived from the number of segregating sites. Alternatively, one can estimate the genealogy of the copies. This paper shows how a maximum likelihood estimate of the effective population number can be derived from such a genealogical tree. The pairwise and segregating-sites estimates are shown to be much less efficient than this maximum likelihood estimate, and this is verified by computer simulation. The result implies that there is much to gain by explicitly taking the tree structure of these genealogies into account.
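
The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's own simulation code. It assumes a diploid population of effective size N under the standard Kingman coalescent, in which k lineages coalesce at rate k(k-1)/(4N) per generation, and it assumes the genealogy is known without error (the limit of very long sequences). In that limit the segregating-sites estimate is taken to be the total tree length divided by 4 times the sum of 1/i for i = 1, ..., n-1. The function names `simulate_estimates` and `compare` are hypothetical.

```python
import random
from statistics import mean, pvariance


def simulate_estimates(n, N, rng):
    """Simulate one coalescent genealogy of n gene copies and return
    (pairwise, segregating-sites, maximum-likelihood) estimates of N.

    Assumptions: diploid population, tree and times known exactly.
    """
    sizes = [1] * n      # sampled copies descended from each current lineage
    t = 0.0              # time before present, in generations
    pair_sum = 0.0       # sum of divergence times over all pairs of copies
    tree_len = 0.0       # total branch length of the genealogy
    ml_sum = 0.0         # sum over intervals of k(k-1) * interval length
    k = n
    while k > 1:
        # Waiting time while k lineages remain.
        interval = rng.expovariate(k * (k - 1) / (4.0 * N))
        t += interval
        tree_len += k * interval
        ml_sum += k * (k - 1) * interval
        # Merge two lineages chosen at random; every pair of copies that
        # spans the two merged groups diverged at time t.
        i, j = rng.sample(range(k), 2)
        pair_sum += sizes[i] * sizes[j] * t
        sizes[i] += sizes[j]
        sizes.pop(j)
        k -= 1

    n_pairs = n * (n - 1) / 2
    pairwise = (pair_sum / n_pairs) / 2.0        # mean pair time estimates 2N
    a_n = sum(1.0 / m for m in range(1, n))
    segregating = tree_len / (4.0 * a_n)         # E[tree length] = 4N * a_n
    ml = ml_sum / (4.0 * (n - 1))                # maximizes interval likelihood
    return pairwise, segregating, ml


def compare(n=20, N=1000.0, reps=2000, seed=1):
    """Return {estimator name: (mean, variance)} over simulated genealogies."""
    rng = random.Random(seed)
    runs = [simulate_estimates(n, N, rng) for _ in range(reps)]
    names = ("pairwise", "segregating", "ml")
    return {name: (mean(col), pvariance(col))
            for name, col in zip(names, zip(*runs))}


if __name__ == "__main__":
    for name, (m, v) in compare().items():
        print(f"{name:12s} mean={m:10.1f} variance={v:14.1f}")
```

Under these assumptions all three estimators are unbiased for N, so the quantity of interest is the variance column: the genealogy-based estimate should show the smallest variance and the pairwise estimate the largest, in line with the abstract's claim about relative efficiency.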

Genetics Research
  • ISSN: 0016-6723
  • EISSN: 1469-5073
  • URL: /core/journals/genetics-research