
Estimating effective population size from samples of sequences: a bootstrap Monte Carlo integration method

  • Joseph Felsenstein

We would like to use maximum likelihood to estimate parameters such as the effective population size Ne or, if we do not know mutation rates, the product 4Neμ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population, it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets, for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That this is so depends on a theorem which is not proven, but which seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, Ne or 4Neμ. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.
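One way to read the procedure described above is as importance sampling. If the sampled genealogies G_i are drawn with density proportional to the data likelihood P(D | G), then the likelihood of θ = 4Neμ is proportional to the average prior density P(G_i | θ) over the sample. The Python sketch below illustrates that reading and is not the paper's implementation. It assumes a Kingman coalescent prior with time measured in expected mutations per site, and it takes the coalescent interval lengths of each sampled tree as given. The function names and the `intervals` input format are hypothetical, and the step that fits a maximum likelihood clock tree to each bootstrap data set is omitted.

```python
import math
import random


def bootstrap_sites(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` is a list of equal-length sequence strings (hypothetical format).
    """
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def log_coalescent_prior(intervals, theta):
    """Log prior density of one genealogy under the Kingman coalescent (assumed).

    intervals[i] is the length of time during which k = i + 2 lineages exist,
    in units of expected mutations per site; theta stands for 4*Ne*mu.
    """
    total = 0.0
    for i, t in enumerate(intervals):
        k = i + 2
        total += math.log(2.0 / theta) - k * (k - 1) * t / theta
    return total


def relative_log_likelihood(sampled_intervals, theta):
    """Log of the mean prior density over the sampled trees.

    Equals the log-likelihood of theta up to an additive constant, provided
    the trees were sampled in proportion to P(D | G).
    """
    logs = [log_coalescent_prior(iv, theta) for iv in sampled_intervals]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logs) / len(logs))


def estimate_theta(sampled_intervals, grid):
    """Return the grid value of theta that maximizes the estimated curve."""
    return max(grid, key=lambda th: relative_log_likelihood(sampled_intervals, th))
```

In use, each bootstrap replicate from `bootstrap_sites` would be passed to a tree-estimation program of one's choice, the interval lengths read off the resulting clock tree, and the collected intervals handed to `estimate_theta` with a grid of candidate θ values. Whether bootstrap trees really are distributed in proportion to the likelihood is the unproven assumption noted in the abstract.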

Genetics Research
  • ISSN: 0016-6723
  • EISSN: 1469-5073
  • URL: /core/journals/genetics-research