Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods

Published online by Cambridge University Press:  14 October 2010

GUSTAVO DE LOS CAMPOS*
Affiliation:
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA; International Maize and Wheat Improvement Center (CIMMYT), Ap. Postal 6-641, 06600, México DF, México
DANIEL GIANOLA
Affiliation:
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA
GUILHERME J. M. ROSA
Affiliation:
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA
KENT A. WEIGEL
Affiliation:
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA
JOSÉ CROSSA
Affiliation:
International Maize and Wheat Improvement Center (CIMMYT), Ap. Postal 6-641, 06600, México DF, México
*Corresponding author: 1665 University Boulevard, Ryals Public Health Building 414, AL 35294, USA. e-mail: gcampos@uab.edu

Summary

Prediction of genetic values is a central problem in quantitative genetics. Over many decades, such predictions have been successfully accomplished using information on phenotypic records and family structure usually represented with a pedigree. Dense molecular markers are now available in the genome of humans, plants and animals, and this information can be used to enhance the prediction of genetic values. However, the incorporation of dense molecular marker data into models poses many statistical and computational challenges, such as how models can cope with the genetic complexity of multi-factorial traits and with the curse of dimensionality that arises when the number of markers exceeds the number of data points. Reproducing kernel Hilbert spaces regressions can be used to address some of these challenges. The methodology allows regressions on almost any type of prediction sets (covariates, graphs, strings, images, etc.) and has important computational advantages relative to many parametric approaches. Moreover, some parametric models appear as special cases. This article provides an overview of the methodology, a discussion of the problem of kernel choice with a focus on genetic applications, algorithms for kernel selection and an assessment of the proposed methods using a collection of 599 wheat lines evaluated for grain yield in four mega environments.

Information

Type
Research Papers
Copyright
Copyright © Cambridge University Press 2010
Fig. 1. Alternative models for prediction of genetic values. Phenotypic records (y) were always the sum of a genetic signal (g) and a vector of Gaussian residuals (ε). Models differed in how g was represented, as described in the figure. BL, Bayesian LASSO; RKHS, reproducing kernel Hilbert spaces regression; λ, LASSO regularization parameter; θ, RKHS bandwidth parameter; $\sigma_{\cdot}^{2}$, variance parameter; KA, kernel averaging; N(., .), normal density; DE(.), double-exponential density.

Fig. 2. Histogram of the evaluations of the Gaussian kernel $K(i,i')=\exp\{-\theta k^{-1} d_{ii'}\}$ by value of the bandwidth parameter (θ=0·25, left; θ=7, right). Here, $d_{ii'}=\lVert \mathbf{x}_{i}-\mathbf{x}_{i'}\rVert^{2}$ is the squared Euclidean distance between marker codes $\mathbf{x}_{i}=(x_{i1},\ldots,x_{ip})'$ and $\mathbf{x}_{i'}=(x_{i'1},\ldots,x_{i'p})'$, and $k=\max_{(i,i')}\{\lVert \mathbf{x}_{i}-\mathbf{x}_{i'}\rVert^{2}\}$.
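
The kernel in this caption can be evaluated directly from a matrix of marker codes. The following is a minimal sketch in plain Python, not the authors' implementation: the function names (`squared_distances`, `gaussian_kernel`) and the three toy marker vectors are hypothetical and chosen only for illustration. It shows how dividing by the largest squared distance k puts θ on a scale that does not depend on the number of markers.

```python
import math


def squared_distances(X):
    """Pairwise squared Euclidean distances d[i][j] = ||x_i - x_j||^2."""
    n = len(X)
    return [
        [sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
        for i in range(n)
    ]


def gaussian_kernel(X, theta):
    """K[i][j] = exp(-theta * d[i][j] / k), where k is the largest d[i][j]."""
    d = squared_distances(X)
    k = max(max(row) for row in d)
    if k == 0:
        raise ValueError("all marker vectors are identical; k would be zero")
    return [[math.exp(-theta * dij / k) for dij in row] for row in d]


# Hypothetical marker codes for three lines and four markers (illustration only).
markers = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

K_small = gaussian_kernel(markers, 0.25)  # small bandwidth parameter
K_large = gaussian_kernel(markers, 7.0)   # large bandwidth parameter
```

With the small value of θ every off-diagonal entry stays close to 1, whereas with the large value the entries for distant pairs approach 0, which is the contrast between the two histograms described in the caption.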

Fig. 3. Estimated posterior mean of the residual variance versus values of the bandwidth parameter, θ, by environment and model. $K_{\theta}$ is a marker-based RKHS regression with bandwidth parameter θ. Pedigree & Markers $K_{\theta}$ uses pedigree and markers; here, θ is the value of the bandwidth parameter for markers. Pedigree & Markers $K_{0{\cdot}25}+K_{7}$ uses pedigree and markers with kernel averaging (KA). E1–E4: environments where the lines were evaluated.

Fig. 4. Estimated mean squared error (MSE) between cross-validation (CV) predictions (yHat) and observations (y) versus values of the bandwidth parameter, θ, by environment and model. $K_{\theta}$ is a marker-based RKHS regression with bandwidth parameter θ. Pedigree & Markers $K_{\theta}$ uses pedigree and markers; here, θ is the value of the bandwidth parameter for markers. Pedigree & Markers $K_{0{\cdot}25}+K_{7}$ uses pedigree and markers with KA. E1–E4: environments where the lines were evaluated.
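
The quantity plotted in this figure, the MSE between CV predictions and observations, can be sketched as follows. This is an illustrative sketch only: the fold assignment, the number of folds, the helper names (`cv_mse`, `mean_predictor`) and the toy data are assumptions, and the training-mean predictor is a stand-in for whichever model is being assessed, not the RKHS regression itself.

```python
import random


def cv_mse(y, fit_predict, n_folds=10, seed=0):
    """MSE between observations y and out-of-fold predictions.

    fit_predict(train_idx, test_idx) must return one prediction per index in
    test_idx, using only the records indexed by train_idx for fitting.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # disjoint folds
    y_hat = [None] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        for i, pred in zip(test, fit_predict(train, test)):
            y_hat[i] = pred
    return sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat)) / len(y)


def mean_predictor(y):
    """Placeholder model: predict every held-out record by the training mean."""
    def fit_predict(train, test):
        m = sum(y[i] for i in train) / len(train)
        return [m] * len(test)
    return fit_predict


# Toy phenotypes (illustration only).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mse = cv_mse(y, mean_predictor(y), n_folds=3)
```

To compare models or bandwidth values in this way, the same folds (same `seed`) would be reused for every candidate so that differences in MSE reflect the model rather than the partition.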

Table A1. Posterior mean (SD) of residual variance by model and environment

Table A2. MSE between realized phenotypes and CV predictions, by model and environment

Fig. A1. Posterior mean of the variance of the regression on the pedigree, $\sigma_{a}^{2}$, versus values of the bandwidth parameter, θ, by environment and model. Pedigree & Markers $K_{\theta}$ uses pedigree and markers; here, θ is the value of the bandwidth parameter for markers. Pedigree & Markers $K_{0{\cdot}25}+K_{7}$ uses pedigree and markers with KA. E1–E4: environments where the lines were evaluated.