
Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection

  • S. Toghiani (a1), S. E. Aggrey (a2) (a3) and R. Rekaya (a1) (a4) (a2)


The availability of high-density single nucleotide polymorphism (SNP) genotyping platforms has provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs) can be estimated that are more accurate than conventional breeding values. This superiority of genomic selection is achievable only when high-density SNP panels are used to track the genes and quantitative trait loci (QTL) affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density, low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially by their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of the resulting GEBVs, a simulation was carried out that varied the updating of the reference population, the distance between the reference and testing sets, and the approach used to estimate GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than in the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, the decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between the reference and testing populations is small. As the generational interval increases, imputation accuracy decays, although not at an alarming rate. In the absence of updating of the reference population, the accuracy of GEBVs decayed substantially within one or two generations, at a rate of 20% to 25% per generation. When the reference population was updated by 1% or 5% every generation, the decay in accuracy was 8% to 11% after seven generations using true and imputed genotypes. These results indicate that imputed genotypes provide a viable alternative, even after several generations, as long as the reference and training populations are appropriately updated to reflect the genetic change in the population.
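The abstract does not state how the two accuracies were measured. As a rough illustration only, the sketch below assumes two common choices: imputation accuracy as the concordance rate between true and imputed genotypes (coded 0/1/2), and GEBV accuracy as the Pearson correlation between true breeding values and GEBVs. The function names and the example values are hypothetical and are not taken from the study.

```python
from math import sqrt


def genotype_concordance(true_genotypes, imputed_genotypes):
    """Fraction of genotype calls (coded 0/1/2) that match exactly.

    Assumed definition of imputation accuracy; the study may use another.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    if not true_genotypes:
        raise ValueError("genotype vectors must not be empty")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Assumed definition of GEBV accuracy: correlation between true
    breeding values and estimated genomic breeding values.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative data: ten SNP genotypes for one animal,
# with a single imputation error at the eighth marker.
true_snps = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
imputed_snps = [0, 1, 2, 1, 0, 2, 1, 2, 0, 2]
print(genotype_concordance(true_snps, imputed_snps))

# Made-up true breeding values and GEBVs for five animals.
true_bv = [1.2, -0.4, 0.8, -1.1, 0.3]
gebv = [0.9, -0.2, 0.5, -0.7, 0.1]
print(pearson_correlation(true_bv, gebv))
```

Under these assumptions, tracking both quantities generation by generation, with and without adding newly genotyped animals to the reference set, would give decay curves of the kind the abstract summarises.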




