
An ensemble-based approach to imputation of moderate-density genotypes for genomic selection with application to Angus cattle

Published online by Cambridge University Press:  18 July 2012

CHUANYU SUN*
Affiliation:
Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA
XIAO-LIN WU
Affiliation:
Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA; Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA
KENT A. WEIGEL
Affiliation:
Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA
GUILHERME J. M. ROSA
Affiliation:
Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI 53706, USA
STEWART BAUCK
Affiliation:
Merial Limited, Duluth, GA 30096, USA
BRENT W. WOODWARD
Affiliation:
Merial Limited, Duluth, GA 30096, USA
ROBERT D. SCHNABEL
Affiliation:
Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
JEREMY F. TAYLOR
Affiliation:
Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
DANIEL GIANOLA
Affiliation:
Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA; Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI 53706, USA
*Corresponding author: 1675 Observatory Dr., Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA. Tel: +6082637824. E-mail: csun28@wisc.edu

Summary

Imputation of moderate-density genotypes from low-density panels is of increasing interest in genomic selection because it can dramatically reduce genotyping costs. Several imputation software packages have been developed, but they vary in imputation accuracy, and imputed genotypes may be inconsistent among methods. An AdaBoost-like approach is proposed to combine imputation results from several independent software packages, i.e. Beagle (v3.3), IMPUTE (v2.0), fastPHASE (v1.4), AlphaImpute, findhap (v2) and Fimpute (v2), with each package serving as a basic classifier in an ensemble-based system. The ensemble-based method computes weights sequentially for all classifiers and combines results from the component methods via weighted majority ‘voting’ to determine unknown genotypes. The data included 3078 registered Angus cattle, each genotyped with the Illumina BovineSNP50 BeadChip. SNP genotypes on three chromosomes (BTA1, BTA16 and BTA28) were used to compare imputation accuracy among methods, and the application involved the imputation of 50K genotypes covering 29 chromosomes based on a set of 5K genotypes. Imputation accuracy of the six packages ranged from 0·8677 to 0·9858, with Beagle and Fimpute being the most accurate. The proposed ensemble method was better than any of these packages, but the sequence of independent classifiers in the voting scheme affected imputation accuracy. The ensemble systems yielding the best imputation accuracies were those that had Beagle as the first classifier, followed by one or two methods that utilized pedigree information. A salient feature of the proposed ensemble method is that it can resolve imputation inconsistencies among different imputation methods, leading to a more reliable system for imputing genotypes than the independent methods.
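
The weighting-and-voting idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary says only that the approach is "AdaBoost-like", so the multi-class AdaBoost (SAMME) weight formula used here is an assumption, as are the 0/1/2 genotype coding and the function names `sequential_weights` and `weighted_vote`. Each list of calls stands in for the output of one imputation package on animals whose true genotypes are known.

```python
import math
from collections import defaultdict


def sequential_weights(predictions, truth, n_classes=3):
    """Compute one voting weight per classifier, in the order given.

    predictions: list of call lists (one per classifier), genotypes coded 0/1/2
    truth:       known genotypes for the same validation positions
    Assumption: SAMME-style update, alpha = ln((1 - err) / err) + ln(K - 1).
    """
    n = len(truth)
    sample_w = [1.0 / n] * n  # start with uniform weights on validation genotypes
    alphas = []
    for calls in predictions:
        # Weighted error of this classifier under the current sample weights
        err = sum(w for w, c, t in zip(sample_w, calls, truth) if c != t)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        alphas.append(alpha)
        # Up-weight genotypes this classifier got wrong, then renormalise,
        # so later classifiers are judged mainly on the harder genotypes
        sample_w = [w * math.exp(alpha) if c != t else w
                    for w, c, t in zip(sample_w, calls, truth)]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return alphas


def weighted_vote(calls, alphas):
    """Return the genotype with the largest summed classifier weight."""
    score = defaultdict(float)
    for call, alpha in zip(calls, alphas):
        score[call] += alpha
    return max(score, key=score.get)


# Toy validation data (hypothetical): three classifiers, four known genotypes
truth = [0, 1, 2, 1]
clf_a = [0, 1, 2, 0]
clf_b = [0, 1, 1, 1]
clf_c = [2, 1, 2, 1]
alphas = sequential_weights([clf_a, clf_b, clf_c], truth)

# Impute one unknown genotype on which the three classifiers disagree
imputed = weighted_vote([0, 1, 1], alphas)
```

Because the sample weights are updated after each classifier, the weight a classifier receives depends on which classifiers came before it. This is one way to see why the order of classifiers in the voting scheme can affect accuracy, as the summary reports.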

Information

Type
Research Papers
Copyright
Copyright © Cambridge University Press 2012

Table 1. Number of animals and number of SNP markers with known genotypes in the training and testing sets

Table 2. Summary statistics of the bootstrap distribution of imputation accuracy obtained using each of the six software packages on chromosomes 1, 16 and 28*†‡

Fig. 1. Box plots of imputation accuracy on (a) chromosome 1, (b) chromosome 16 and (c) chromosome 28, obtained using six imputation software packages and five ensemble methods. Results are obtained from 50 bootstrap replicates. For x-axis labels, 1 = ‘Beagle3.3’; 2 = ‘IMPUTE2.0’; 3 = ‘fastPHASE1.4’; 4 = ‘findhap version 2’; 5 = ‘AlphaImpute’; 6 = ‘Fimpute version 2’; 7–11 = five ensemble systems.

Fig. 2. Kernel density plots of imputation accuracy for 720 ensemble methods obtained on (a) chromosome 1, (b) chromosome 16 and (c) chromosome 28.

Fig. 3. Comparison of imputation accuracy evaluated on 29 autosomes in registered Angus cattle using 6 independent imputation packages and 5 ensemble systems. For EnS1-5, the figure gives the average accuracy of the 5 ensembles.

Table A1. Evaluated on bovine chromosome 1 (BTA1)

Table A2. Evaluated on bovine chromosome 16 (BTA16)

Table A3. Evaluated on bovine chromosome 28 (BTA28)

Table A4. List of the top 120 ensemble systems and combinations