Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm


Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) maximum likelihood estimation under the multinomial model and (ii) a deterministic greedy algorithm, on both simulated and real (HapMap) data sets. Our results suggest that PHASE is also a method of choice for pooled DNA data. The main reason for the improvement over the other approaches is assumed to be the same as with individually genotyped data: the biologically motivated model of PHASE takes into account the correlated genealogical histories of the haplotypes by modelling mutation and recombination. We also consider the efficiency of DNA pooling and the influence of pool size on the accuracy of the estimates. Our results agree with earlier findings that the pool size should be relatively small, only 2–5 individuals in our examples, to provide reliable estimates of population haplotype frequencies.
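
To make the pooled-data setting concrete, the following is a minimal sketch of the kind of baseline named in (i): an EM iteration for maximum likelihood haplotype frequencies under a multinomial model. It is not the modified PHASE algorithm and is not taken from the paper. It assumes biallelic SNPs, error-free allele counts, equal pool sizes, and few enough loci that all haplotype configurations can be enumerated. The function names (compatible_configs, multinomial_prob, em_pooled) are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import factorial


def compatible_configs(pool_counts, n_hap, haplotypes):
    """List every multiset of n_hap haplotypes whose per-locus allele-1
    totals equal the observed pool counts (brute-force enumeration)."""
    target = tuple(pool_counts)
    configs = []
    for combo in combinations_with_replacement(range(len(haplotypes)), n_hap):
        sums = tuple(sum(haplotypes[h][j] for h in combo)
                     for j in range(len(target)))
        if sums == target:
            configs.append(Counter(combo))
    return configs


def multinomial_prob(config, freqs, n_hap):
    """Multinomial probability of one haplotype configuration."""
    coef = factorial(n_hap)
    prob = 1.0
    for h, count in config.items():
        coef //= factorial(count)
        prob *= freqs[h] ** count
    return coef * prob


def em_pooled(pools, n_hap, n_loci, n_iter=200):
    """Estimate haplotype frequencies from pooled allele counts by EM.

    pools  : list of tuples, allele-1 count per locus in each pool
    n_hap  : haplotypes per pool (2 x number of diploid individuals)
    n_loci : number of SNP loci
    """
    haplotypes = list(product((0, 1), repeat=n_loci))
    freqs = [1.0 / len(haplotypes)] * len(haplotypes)  # uniform start
    per_pool = [compatible_configs(p, n_hap, haplotypes) for p in pools]
    for _ in range(n_iter):
        expected = [0.0] * len(haplotypes)
        for configs in per_pool:
            # E-step: posterior weight of each compatible configuration
            weights = [multinomial_prob(c, freqs, n_hap) for c in configs]
            total = sum(weights)
            for c, w in zip(configs, weights):
                for h, count in c.items():
                    expected[h] += count * w / total
        # M-step: normalise expected haplotype counts
        norm = sum(expected)
        freqs = [e / norm for e in expected]
    return dict(zip(haplotypes, freqs))


if __name__ == "__main__":
    # Three pools of one individual each (2 haplotypes), two loci.
    # The pool (1, 1) is ambiguous: {00, 11} or {01, 10}.
    print(em_pooled([(0, 0), (0, 0), (1, 1)], n_hap=2, n_loci=2))
```

The number of compatible configurations grows quickly with pool size and number of loci, which gives some intuition for why small pools are easier to resolve. Unlike PHASE, this sketch uses no genealogical model: every haplotype is treated as an unrelated category.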

Corresponding author
*Corresponding author. Tel: (358) 9-191-51419. Fax: (358) 9-191-51400. e-mail:
Genetics Research
  • ISSN: 0016-6723
  • EISSN: 1469-5073
  • URL: /core/journals/genetics-research