
Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods

Published online by Cambridge University Press:  14 October 2010

GUSTAVO DE LOS CAMPOS*
Affiliation:
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA International Maize and Wheat Improvement Center (CIMMYT), Ap. Postal 6-641, 06600, México DF, México
DANIEL GIANOLA
Affiliation:
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA
GUILHERME J. M. ROSA
Affiliation:
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA
KENT A. WEIGEL
Affiliation:
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA
JOSÉ CROSSA
Affiliation:
International Maize and Wheat Improvement Center (CIMMYT), Ap. Postal 6-641, 06600, México DF, México
*Corresponding author: 1665 University Boulevard, Ryals Public Health Building 414, AL 35294, USA. e-mail: gcampos@uab.edu

Summary

Prediction of genetic values is a central problem in quantitative genetics. Over many decades, such predictions have been successfully accomplished using information on phenotypic records and family structure, usually represented with a pedigree. Dense molecular markers are now available in the genome of humans, plants and animals, and this information can be used to enhance the prediction of genetic values. However, the incorporation of dense molecular marker data into models poses many statistical and computational challenges, such as how models can cope with the genetic complexity of multi-factorial traits and with the curse of dimensionality that arises when the number of markers exceeds the number of data points. Reproducing kernel Hilbert spaces regressions can be used to address some of these challenges. The methodology allows regressions on almost any type of prediction set (covariates, graphs, strings, images, etc.) and has important computational advantages relative to many parametric approaches. Moreover, some parametric models appear as special cases. This article provides an overview of the methodology, a discussion of the problem of kernel choice with a focus on genetic applications, algorithms for kernel selection and an assessment of the proposed methods using a collection of 599 wheat lines evaluated for grain yield in four mega-environments.

Type
Research Papers
Copyright
Copyright © Cambridge University Press 2010

1. Introduction

Prediction of genetic values is relevant in plant and animal breeding, as well as for assessing the probability of disease in medicine. Standard genetic models view phenotypic outcomes (yi; i=1, …, n) as the sum of a genetic signal (gi) and of a residual (∊i), that is: yi=gi+∊i. The statistical learning problem consists of uncovering genetic signal from noisy data, and predictions (ĝi) are constructed using phenotypic records and some type of knowledge about the genetic background of individuals.

Family structure, usually represented as a pedigree, and phenotypic records have been used for the prediction of genetic values in plants and animals over several decades (e.g. Fisher, 1918; Wright, 1921; Henderson, 1975). In pedigree-based models (P), a genealogy is used to derive the expected degree of resemblance between relatives, measured as Cov(gi, gi′), and this provides a means for smoothing phenotypic records.

Dense molecular marker panels are now available in humans and in many plant and animal species. Unlike pedigree data, genetic markers allow follow-up of Mendelian segregation, a term that in additive models and in the absence of inbreeding accounts for 50% of the genetic variability. However, incorporating molecular markers into models poses several statistical and computational challenges, such as how models can cope with the genetic complexity of multi-factorial traits (e.g. Gianola & de los Campos, 2008), and with the curse of dimensionality that arises when a large number of markers is considered. Parametric and semi-parametric methods address these two issues in different ways.

In parametric regression models for dense molecular markers (e.g. Meuwissen et al., 2001), gi is a parametric regression on marker covariates, xik, with k=1, …, p indexing markers. The linear model takes the form gi = Σk xik βk, where βk is the regression of yi on xik. Often, p>>n and some shrinkage estimation method such as ridge regression (Hoerl & Kennard, 1970a, 1970b) or the LASSO (Least Absolute Shrinkage and Selection Operator; Tibshirani, 1996), or their Bayesian counterparts, is used to estimate marker effects. Among the latter, those using marker-specific shrinkage, such as the Bayesian LASSO of Park & Casella (2008) or methods BayesA or BayesB of Meuwissen et al. (2001), are the most commonly used. In linear regressions, dominance and epistasis may be accommodated by adding appropriate interactions between marker covariates to the model; however, the number of predictor variables is extremely large and modelling interactions is only feasible to a limited degree.
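As a point of reference for this parametric setting, the following is a minimal ridge-regression sketch with simulated data (the marker matrix, effect sizes and penalty value are hypothetical, not the authors' code); it illustrates shrinkage estimation of marker effects when p >> n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 1000                                        # many more markers than records
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)     # hypothetical marker codes (0/1/2)
beta_true = np.zeros(p)
beta_true[:20] = rng.normal(0.0, 0.5, 20)               # a few markers with non-null effects
y = X @ beta_true + rng.normal(0.0, 1.0, n)

lam = 50.0                                              # ridge penalty; in practice chosen by CV
# Ridge estimate: beta_hat = (X'X + lam*I)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
g_hat = X @ beta_hat                                    # estimated genetic values
```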

Reproducing kernel Hilbert spaces (RKHS) regressions have been proposed for semi-parametric regression on marker genotypes, e.g. Gianola et al. (2006) and Gianola & van Kaam (2008). In RKHS, markers are used to build a covariance structure among genetic values; for example, Cov(gi, gi′) ∝ K(xi, xi′), where xi and xi′ are vectors of marker genotypes and K(., .), the reproducing kernel (RK), is some positive definite (PD) function (de los Campos et al., 2009a). This semi-parametric approach has several attractive features: (a) the methodology can be used with almost any type of information set (e.g. covariates, strings, images and graphs). This is particularly important because techniques for characterizing genomes change rapidly; (b) some parametric methods for genomic selection (GS) appear as special cases and (c) computations are performed in an n-dimensional space. This provides RKHS methods with a great computational advantage relative to some parametric methods, especially when p>>n.

This article discusses and evaluates the use of RKHS regressions for genomic-enabled prediction of genetic values of complex traits. Section 2 gives a brief review of RKHS regressions. A special focus is placed on the problem of kernel choice. We discuss cases where a genetic model (e.g. additive infinitesimal) is used to choose the kernel and others where the RK is chosen based on its properties (e.g. predictive ability). Section 3 presents an application to an extensive plant breeding data set where some of the methods discussed in Section 2 are evaluated. Concluding remarks are provided in Section 4.

2. RKHS regression

RKHS methods have been used in many areas of application such as spatial statistics (e.g. 'Kriging'; Cressie, 1993), scatter-plot smoothing (e.g. smoothing splines; Wahba, 1990) and classification problems (e.g. support vector machines; Vapnik, 1998), just to mention a few. Estimates in RKHS regressions can be motivated as solutions to a penalized optimization problem in an RKHS or as posterior modes in a certain class of Bayesian models. A brief description of RKHS estimates in the context of penalized estimation is given first in section 2(i), with its Bayesian interpretation introduced later in section 2(ii). A representation of RKHS regressions that uses orthogonal basis functions is given in section 2(iii). This section ends in 2(iv) with a discussion of the problem of kernel choice.

(i) Penalized estimation in RKHS

A standard problem in statistical learning consists of extracting signal from noisy data. The learning task can be described as follows (Vapnik, 1998): given data {(yi, ti)}, i=1, …, n, originating from some functional dependency, infer this dependency. The pattern relating input, ti ∈ T, and output, yi ∈ Y, variables can be described with an unknown function, g, whose evaluations are gi=g(ti). For example, ti may be a vector of marker genotypes, ti=xi, and g may be a function assigning a genetic value to each genotype. Inferring g requires defining a collection (or space) of functions from which an element, ĝ, will be chosen via a criterion (e.g. a penalized residual sum of squares or a posterior density) for comparing candidate functions. Specifically, in RKHS, estimates are obtained by solving the following optimization problem:

ĝ = arg min_{g ∈ H} [l(g, y) + λ||g||H²],   (1)

where g ∈ H denotes that the optimization problem is performed within the space of functions H, an RKHS; l(g, y) is a loss function (e.g. some measure of goodness of fit); λ is a parameter controlling trade-offs between goodness of fit and model complexity; and ||g||H² is the square of the norm of g on H, a measure of model complexity. A technical discussion of RKHS of real-valued functions can be found in Wahba (1990); here, we introduce some elements that are needed to understand how ĝ is obtained.

Hilbert spaces are complete linear spaces endowed with a norm that is the square root of the inner product in the space. The Hilbert spaces that are relevant for our discussion are RKHS of real-valued functions, here denoted as H. An important result, known as the Moore–Aronszajn theorem (Aronszajn, 1950), states that each RKHS is uniquely associated with a PD function, that is, a function K(ti, ti′) satisfying Σi Σi′ αi αi′ K(ti, ti′) > 0 for all sequences {αi} with αi ≠ 0 for some i. This function, K(ti, ti′), also known as the RK, provides basis functions and an inner product (therefore a norm) to H. Therefore, choosing K(ti, ti′) amounts to selecting H, the space of functions where (1) is solved.

Using that duality, Kimeldorf & Wahba (1971) showed that the finite-dimensional solution of (1) admits a linear representation g(ti) = Σi′ αi′ K(ti, ti′), or in matrix notation, g = Kα, where K = {K(ti, ti′)} is an n×n matrix whose entries are the evaluations of the RK at pairs of points in T. Further, in this finite-dimensional setting, ||g||H² = α′Kα. Using this in (1) and setting l(g, y) to be a residual sum of squares, one obtains ĝ = Kα̂, where α̂ is the solution of

α̂ = arg min_α [(y − Kα)′(y − Kα) + λα′Kα],   (2)

and y = {yi} is a data vector. The first-order conditions of (2) lead to [K′K + λK]α̂ = K′y. Further, since K = K′ and K⁻¹ exists, pre-multiplication by K⁻¹ yields α̂ = [K + λI]⁻¹y. Therefore, the estimated conditional expectation function is ĝ = Kα̂ = K[K + λI]⁻¹y = P(λ, K)y, where P(λ, K) = K[K + λI]⁻¹ is a smoother or influence matrix.

The input information, ti ∈ T, enters into the objective function and into the solution only through K. This allows using RKHS for regression with any class of information sets (vectors, graphs, images, etc.) on which a PD function can be evaluated; the choice of kernel becomes the key element of model specification.
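A minimal numerical sketch of this estimator (toy inputs, a Gaussian RK and a λ value chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
t = rng.uniform(0, 1, size=(n, 2))                       # toy input points
d2 = ((t[:, None, :] - t[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
K = np.exp(-5.0 * d2)                                     # Gaussian RK evaluated at all pairs

y = np.sin(2 * np.pi * t[:, 0]) + rng.normal(0, 0.2, n)   # noisy signal
lam = 0.1                                                 # regularization parameter

# alpha_hat = (K + lam*I)^{-1} y ;  g_hat = K alpha_hat = P(lam, K) y
alpha_hat = np.linalg.solve(K + lam * np.eye(n), y)
g_hat = K @ alpha_hat                                     # fitted conditional expectation
```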

(ii) Bayesian interpretation

From a Bayesian perspective, α̂ can be viewed as a posterior mode in the following model: y = Kα + ε; p(ε, α|σ², σg²) = N(ε|0, Iσ²) N(α|0, K⁻¹σg²). The relationship between RKHS regressions and Gaussian processes was first noted by Kimeldorf & Wahba (1970) and has been revisited by many authors (e.g. Harville, 1983; Speed, 1991). Following de los Campos et al. (2009a), one can change variables in the above model, with g = Kα, yielding

y = g + ε;  p(ε, g|σ², σg²) = N(ε|0, Iσ²) N(g|0, Kσg²).   (3)

Thus, from a Bayesian perspective, the evaluations of functions can be viewed as Gaussian processes satisfying Cov(gi, gi′) ∝ K(ti, ti′). The fully Bayesian RKHS regression assumes unknown variance parameters, and the model becomes

y = g + ε;  p(ε, g, σ², σg²) = N(ε|0, Iσ²) N(g|0, Kσg²) p(σ², σg²),   (4)

where p(σ², σg²) is a (proper) prior density assigned to the variance parameters.

(iii) Representation using orthogonal random variables

Representing model (4) with orthogonal random variables simplifies computations greatly and provides additional insights into the nature of the RKHS regressions. To this end, we make use of the eigenvalue (EV) decomposition (e.g. Golub & Van Loan, 1996) of the kernel matrix, K = ΛΨΛ′, where Λ is a matrix of eigenvectors satisfying Λ′Λ = I; Ψ = Diag{Ψj}, with Ψ1 ⩾ Ψ2 ⩾ … ⩾ Ψn > 0, is a diagonal matrix whose non-zero entries are the EVs of K; and j = 1, …, n indexes eigenvectors (i.e. columns of Λ) and the associated EVs. Using these, (4) becomes

y = Λδ + ε;  p(ε, δ, σ², σg²) = N(ε|0, Iσ²) N(δ|0, Ψσg²) p(σ², σg²),  with g = Λδ.   (5)

To see the equivalence of (4) and (5), note that Λδ is multivariate normal because so is δ. Further, E(Λδ) = ΛE(δ) = 0 and Cov(Λδ) = ΛΨΛ′σg² = Kσg². Therefore, equations (4) and (5) are two parameterizations of the same probability model. However, equation (5) is much more computationally convenient, as discussed next.

The joint posterior distribution of (5) does not have a closed form; however, draws can be obtained using a Gibbs sampler. Sampling regression coefficients from the corresponding fully conditional distribution, p(δ|y, σ², σg²), is usually the most computationally demanding step. From standard results for Bayesian linear models, one can show that δ|ELSE ~ N(C⁻¹ỹ, C⁻¹σ²), where ELSE denotes everything else other than δ, C = [Λ′Λ + σ²σg⁻²Ψ⁻¹] = Diag{1 + σ²σg⁻²Ψj⁻¹} and ỹ = Λ′y. This simplification occurs because Λ′Λ = I and Ψ = Diag{Ψj}. The fully conditional distribution of δ is multivariate normal, and the (co)variance matrix, σ²C⁻¹, is diagonal; therefore, the elements of δ are conditionally independent. Moreover, p(δj|ELSE) is normal, centred at [1 + σ²σg⁻²Ψj⁻¹]⁻¹ỹ·j and with variance σ²[1 + σ²σg⁻²Ψj⁻¹]⁻¹. Here, ỹ·j = λj′y, where λj is the jth eigenvector (i.e. the jth column of Λ). Note that model unknowns are not required for computing ỹ·j, implying that these quantities remain constant across iterations of a sampler. The only quantities that need to be updated are the shrinkage factors [1 + σ²σg⁻²Ψj⁻¹]⁻¹ and the variances σ²[1 + σ²σg⁻²Ψj⁻¹]⁻¹. If model (5) is extended to include other effects (e.g. an intercept or some fixed effects), the right-hand side of the mixed model equations associated with p(δ|ELSE) will need to be updated at each iteration of the sampler; however, the matrix of coefficients remains diagonal and this simplifies computations greatly (see Appendix).

In equation (5), the conditional expectation function is a linear combination of eigenvectors: g = Λδ = Σj λj δj. The EVs are usually sorted such that Ψ1 ⩾ Ψ2 ⩾ … ⩾ Ψn > 0. The prior variance of the regression coefficients is proportional to the EVs, that is, Var(δj) ∝ Ψj. Therefore, the extent of shrinkage increases as j does. For most RKs, the decay of the EVs is such that for the leading EVs [1 + σ²σg⁻²Ψj⁻¹] is close to one, yielding negligible shrinkage of the corresponding regression coefficients. Therefore, linear combinations of the first eigenvectors can be seen as components of g that are (essentially) not penalized.
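A small numerical sketch of this computational shortcut (toy kernel and data; σ² and σg² held fixed here rather than sampled):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
Z = rng.normal(size=(n, 5))
K = Z @ Z.T + 1e-6 * np.eye(n)         # toy positive-definite kernel
y = rng.normal(size=n)

evals, evecs = np.linalg.eigh(K)        # K = Lambda Psi Lambda'
order = np.argsort(evals)[::-1]         # sort EVs in decreasing order, as in the text
psi, Lam = evals[order], evecs[:, order]

sigma2, sigma2_g = 1.0, 1.0             # fixed here; unknown in the Bayesian model
y_tilde = Lam.T @ y                     # computed once; constant across MCMC iterations

shrink = 1.0 / (1.0 + sigma2 / (sigma2_g * psi))   # [1 + sigma^2 sigma_g^-2 Psi_j^-1]^{-1}
delta_mean = shrink * y_tilde           # conditional posterior means of the delta_j
g_hat = Lam @ delta_mean                # implied genetic values
```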

(iv) Choosing the RK

The RK is a central element of model specification in RKHS. Kernels can be chosen so as to represent a parametric model, or based on their ability to predict future observations. Examples of these two approaches are discussed next.

The standard additive infinitesimal model of quantitative genetics (e.g. Fisher, 1918; Henderson, 1975) is an example of a model-driven kernel (e.g. de los Campos et al., 2009a). Here, the information set (a pedigree) consists of a directed acyclic graph and K(ti, ti′) gives the expected degree of resemblance between relatives under an additive infinitesimal model. Another example of an RKHS regression with a model-derived kernel is the case where K is chosen to be a marker-based estimate of a kinship matrix (usually denoted as G; cf. Ritland, 1996; Lynch & Ritland, 1999; Eding & Meuwissen, 2001; Van Raden, 2007; Hayes & Goddard, 2008). An example of a (co)variance structure derived from a quantitative trait locus (QTL) model is given in Fernando & Grossman (1989).

Ridge regression and its Bayesian counterpart (Bayesian ridge regression, BRR) can also be represented using (4) or (5). A BRR is defined by y = Xβ + ε and p(ε, β, σ², σβ²) = N(ε|0, Iσ²) N(β|0, Iσβ²) p(σ², σβ²). To see how a BRR constitutes a special case of (5), one can make use of the singular value decomposition (e.g. Golub & Van Loan, 1996) of X = UDV′. Here, U (n×n) and V (p×n) are matrices whose columns are orthogonal, and D = Diag{ξj} is a diagonal matrix whose non-null entries are the singular values of X. Using this in the data equation, we obtain y = UDV′β + ε = Uδ + ε, where δ = DV′β. The distribution of δ is multivariate normal because so is that of β. Further, E(δ) = DV′E(β) = 0 and Cov(δ) = DV′VD′σβ² = DD′σβ²; thus, δ ~ N[0, Diag{ξj²}σβ²]. Therefore, a BRR can be equivalently represented using (5) with Λ = U and Ψ = Diag{ξj²}. Note that using Λ = U and Ψ = Diag{ξj²} in (5) implies K = UDD′U′ = UDV′VD′U′ = XX′ in (4). Habier, Fernando & Dekkers (2007) argue that as the number of markers increases, XX′ approaches the numerator relationship matrix, A. From this perspective, XX′ can also be viewed just as another choice of estimate of a kinship matrix. However, the derivation of the argument follows the standard treatment of quantitative genetic models, where genotypes are random and marker effects are fixed, whereas in a BRR the opposite is true (see Gianola et al., 2009 for further discussion).
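The identity K = XX′ = UDD′U′ underlying this equivalence can be checked numerically; the sketch below uses a hypothetical centred marker matrix (the centring is an assumption for illustration, not stated in the text).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 200
X = rng.binomial(2, 0.4, size=(n, p)).astype(float)     # hypothetical marker codes
X -= X.mean(axis=0)                                     # centring (assumed choice)

U, d, Vt = np.linalg.svd(X, full_matrices=False)        # X = U D V'
K_from_svd = U @ np.diag(d ** 2) @ U.T                  # U D D' U'
K_direct = X @ X.T                                      # XX'
assert np.allclose(K_from_svd, K_direct)                # the two kernels coincide
```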

In the examples given above, the RK was defined in such a manner that it represents a parametric model. An appeal of using parametric models is that estimates can be interpreted in terms of the theory used for deriving K. For example, if K = A then σa² is interpretable as an additive genetic variance and σg²(σg² + σ²)⁻¹ can be interpreted as the heritability of the trait. However, these models may not be optimal from a predictive perspective. Another approach (e.g. Shawe-Taylor & Cristianini, 2004) views RKs as smoothers, with the choice of kernel based on predictive ability or some other criterion. Moreover, the choice of the kernel may become a task of the algorithm.

For example, one can index a Gaussian kernel with a bandwidth parameter, θ, so that K(ti, ti′|θ) = exp{−θ d(ti, ti′)}. Here, d(ti, ti′) is some distance function and θ controls how fast the covariance function drops as points get further apart as measured by d(ti, ti′). The bandwidth parameter may be chosen by cross-validation (CV) or with Bayesian methods (e.g. Mallick et al., 2005). However, when θ is treated as uncertain in a Bayesian model with Markov chain Monte Carlo (MCMC) methods, the computational burden increases markedly because the RK must be re-computed every time a new sample of θ becomes available. It is computationally easier to evaluate model performance over a grid of values of θ; this is illustrated in section 3.
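For illustration, a Gaussian kernel can be evaluated over a grid of bandwidths as follows (hypothetical genotypes and an illustrative grid, not the values used in section 3); each kernel in the grid could then be plugged into the RKHS model and compared, e.g. by CV.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 100
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)          # hypothetical genotypes
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)      # squared Euclidean distances d(t_i, t_i')

theta_grid = [0.01, 0.05, 0.1, 0.5]                          # illustrative bandwidths
kernels = {theta: np.exp(-theta * D) for theta in theta_grid}  # one kernel matrix per theta
```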

The (co)variance structure implied by a Gaussian kernel is not derived from any mechanistic consideration; therefore, no specific interpretation can be attached to the bandwidth parameter. However, using results for infinitesimal models under epistasis one could argue that a high degree of epistatic interaction between additive infinitesimal effects may induce a highly local (co)variance pattern in the same way that large values of θ do. This argument is revisited later in this section.

The decay of the EVs controls, to a certain extent, the shrinkage of estimates of δ and, with this, the trade-offs between goodness of fit and model complexity. Transformations of the EVs (indexed with unknown parameters) can also be used to generate a family of kernels. One such example is the diffusion kernel Kα = Λ Diag{exp(αΨj)} Λ′ (e.g. Kondor & Lafferty, 2002). Here, α>0 is used to control the decay of the EVs. In this case, the bandwidth parameter can be interpreted as a quantity characterizing the diffusion of signal (e.g. heat) along the edges of a graph, with smaller values being associated with more diffusion.

A third way of generating families of kernels is to use closure properties of PD functions (Shawe-Taylor & Cristianini, 2004). For example, linear combinations of PD functions, K(ti, ti′) = Σr Kr(ti, ti′)σgr², with σgr² ⩾ 0, are PD as well. From a Bayesian perspective, the σgr² are interpretable as variance parameters. To see this, consider extending (4) to two random effects so that g = g1 + g2 and p(g1, g2|σg1², σg2²) = N(g1|0, K1σg1²) N(g2|0, K2σg2²). It follows that g ~ N(0, K1σg1² + K2σg2²), or equivalently g ~ N(0, K̃σg²), where K̃ = K1(σg1²/σg²) + K2(σg2²/σg²) and σg² = σg1² + σg2². Therefore, fitting an RKHS with two random effects is equivalent to using K̃ in (4). Extending this argument to r kernels one obtains K̃ = Σr Kr(σgr²/σg²), where σg² = Σr σgr². For example, one can obtain a sequence of kernels, {Kr}, by evaluating a Gaussian kernel over a grid of values of a bandwidth parameter, {θr}. The variance parameters, σgr², associated with each kernel in the sequence can be viewed as weights. Inferring these variances amounts to inferring a kernel, K̃, which can be seen as an approximation to an optimal kernel. We refer to this approach as kernel selection via kernel averaging (KA); an example of this is given in section 3.
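A minimal sketch of the KA construction: the implied kernel is a variance-weighted average of the candidate kernels (toy inputs; the variance components are fixed here for illustration, whereas in the Bayesian model they are inferred).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 25
pts = rng.normal(size=(n, 3))                                  # toy inputs
D = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)    # squared distances

thetas = [0.25, 7.0]
K_list = [np.exp(-th * D) for th in thetas]                    # sequence of candidate kernels

# Variance components associated with each kernel (fixed here; inferred in the Bayesian model)
sigma2_g = np.array([0.8, 0.2])
weights = sigma2_g / sigma2_g.sum()
K_tilde = sum(w * K for w, K in zip(weights, K_list))          # implied (averaged) kernel
```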

The Hadamard (or Schur) product of PD functions is also PD; that is, if K1(ti, ti′) and K2(ti, ti′) are PD, so is K(ti, ti′) = K1(ti, ti′) K2(ti, ti′); in matrix notation, this is usually denoted as K = K1#K2. From a genetic perspective, this formulation can be used to accommodate non-additive infinitesimal effects (e.g. Cockerham, 1954; Kempthorne, 1954). For example, under random mating, linkage equilibrium and in the absence of selection, K = A#A = {a(i, i′)²} gives the expected degree of resemblance between relatives under an infinitesimal model for additive×additive interactions. For epistatic interaction between infinitesimal additive effects of qth order, the expected (co)variance structure is K = {a(i, i′)^(q+1)}. Therefore, for q ⩾ 1 and i ≠ i′, the prior correlation, a(i, i′)^(q+1) in the absence of inbreeding, decreases as q increases, i.e. the kernel becomes increasingly local as the degree of epistatic interaction increases, producing an effect similar to that of a bandwidth parameter of a Gaussian kernel.
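The epistatic kernels just described are element-wise (Hadamard) powers of A; a toy example (the relationship matrix below is hypothetical, normally it would be computed from a pedigree):

```python
import numpy as np

# Toy additive relationship matrix (hypothetical)
A = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.25],
              [0.25, 0.25, 1.00]])

K_axa = A * A          # A#A: additive-by-additive infinitesimal interactions
q = 3
K_epi = A ** (q + 1)   # element-wise power: qth-order epistatic interactions
# Off-diagonal entries shrink toward zero as q grows, i.e. the kernel becomes more local.
```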

3. Application to plant breeding data

Some of the methods discussed in the previous section were evaluated using a data set consisting of a collection of historical wheat lines from the Global Wheat Breeding Programme of CIMMYT (International Maize and Wheat Improvement Center). In plant breeding programmes, lines are selected based on their expected performance and collecting phenotypic records is expensive. An important question is whether phenotypes collected on ancestor lines, together with pedigrees and markers, can be used to predict performance of lines for which phenotypic records are not available yet. If so, breeding programmes could perform several rounds of selection based on marker data only, with phenotypes measured every few generations. The reduction in generation interval attainable by selection based on markers may increase the rate of genetic progress and, at the same time, the cost of phenotyping would be reduced (e.g. Bernardo & Yu, 2007; Heffner et al., 2009). Thus, assessing the ability of a model to predict future outcomes is central in breeding programmes.

The study presented in this section attempted to evaluate: (a) how much could be gained in predictive ability by incorporating marker information into a pedigree-based model, (b) how sensitive these results are with respect to the choice of kernel, (c) whether or not Bayesian KA is effective for selecting kernels and (d) how RKHS performs relative to a parametric regression model, the Bayesian LASSO (BL; Park & Casella, 2008).

(i) Materials and methods

The data comprise family, marker and phenotypic information of 599 wheat lines that were evaluated for grain yield (GY) in four environments. Single-trait models were fitted to data from each environment. Marker information consisted of genotypes for 1447 Diversity Array Technology (DArT) markers, generated by Triticarte Pty. Ltd (Canberra, Australia; http://www.triticarte.com.au). Pedigree information was used to compute additive relationships between lines (i.e. twice the kinship coefficient; Wright, 1921) using the Browse application of the International Crop Information System, as described in McLaren et al. (2005).

A sequence of models was fitted to the entire data set and in a CV setting. Figure 1 gives a summary of the models considered. In all environments, phenotypes were represented using the equation yi = μ + gi + ∊i, where yi (i=1, …, 599) is the phenotype of the ith line; μ is an effect common to all lines; gi is the genetic value of the ith line; and ∊i is a line-specific residual. Phenotypes were standardized to a unit variance in each of the environments. Residuals were assumed to follow a normal distribution, ∊i ~ N(0, σ²), where σ² is the residual variance. The conditional distribution of the data was p(y|μ, g, σ²) = N(y|1μ + g, Iσ²), where g = (g1, …, gn)′. Models differed in how gi was modelled.

Fig. 1. Alternative models for prediction of genetic values. Phenotypic records (y) were always the sum of a genetic signal (g) and a vector of Gaussian residuals (ε). Models differed on how g was represented, as described in the figure. BL, Bayesian LASSO; RKHS, reproducing kernel Hilbert spaces regression; λ, LASSO regularization parameter; θ, RKHS bandwidth parameter; σ·2, variance parameter; KA, kernel averaging; N(., .), normal density; DE(.), double-exponential density.

In a standard infinitesimal additive model (P, standing for pedigree model), genetic values are g = a with p(a|σa²) = N(0, Aσa²), where σa² is the additive genetic variance and A = {a(i, i′)}, as before, is the numerator relationship matrix among lines computed from the pedigree. This is an RKHS with K = A.

For marker-based models (M), two alternatives were considered: the BL and RKHS regression. In the BL, genetic values were a linear function of marker covariates, g = Xβ, where X is an incidence matrix of marker genotype codes and β = (β1, …, βp)′, the vector of regression coefficients, was inferred using the BL of Park & Casella (2008). Following de los Campos et al. (2009b), the regularization parameter of the BL was assigned a prior density that is flat over a fairly wide range. This model is denoted as MBL.

In marker-based RKHS regressions (MK,θ), g = fθ, where fθ = (fθ,1, …, fθ,n)′ was assigned a Gaussian prior with null mean and (co)variance matrix Cov(fθ) ∝ Kθ = {exp(−θk⁻¹dii′)}. Here, θ is a bandwidth parameter, dii′ = ||xi − xi′||² is the squared Euclidean distance between marker codes xi = (xi1, …, xip)′ and xi′ = (xi′1, …, xi′p)′, and k = max(i,i′){||xi − xi′||²}. Models were fitted over a grid of values of θ and are denoted as MK,θ. The optimal value of the bandwidth parameter is expected to change with many factors such as: (a) the distance function; (b) the number of markers, allelic frequencies and coding of markers, all factors affecting the distribution of observed distances and (c) the genetic architecture of the trait, a factor affecting the expected prior correlation of genetic values (see section 2(iv)). We generated a grid of values, θ ∊ {0·1, 0·25, 0·5, 0·75, 1, 2, 3, 5, 7, 10}, that for this data set allowed exploring a wide variety of kernels. Figure 2 gives a histogram of the evaluations of the kernel for two extreme values of the bandwidth parameter; θ=0·25 gives very high prior correlations, while θ=7 gives a kernel matrix with very low correlations in the off-diagonal.
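As an illustration of how Kθ could be computed from a marker matrix, the following is a minimal sketch; X below is a hypothetical stand-in for the DArT genotype matrix (assumed 0/1 coding), not the actual data.

```python
import numpy as np

def gaussian_marker_kernel(X, theta):
    """K(i, i') = exp(-theta * d_ii' / k), where d_ii' is the squared Euclidean
    distance between marker codes and k is the largest observed distance."""
    sq = (X ** 2).sum(axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared Euclidean distances
    d = np.maximum(d, 0.0)                            # guard against tiny negative values
    k = d.max()                                       # k = max_(i,i') ||x_i - x_i'||^2
    return np.exp(-theta * d / k)

rng = np.random.default_rng(6)
X = rng.binomial(1, 0.5, size=(599, 50)).astype(float)   # hypothetical 0/1 marker codes
K_025 = gaussian_marker_kernel(X, 0.25)                  # very high prior correlations
K_7 = gaussian_marker_kernel(X, 7.0)                     # much more local kernel
```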

Fig. 2. Histogram of the evaluations of the Gaussian kernel K(i, i′) = exp{−θk⁻¹dii′} by value of the bandwidth parameter (θ=0·25, left; θ=7, right). Here, dii′ = ||xi − xi′||² is the squared Euclidean distance between marker codes xi = (xi1, …, xip)′ and xi′ = (xi′1, …, xi′p)′, and k = max(i,i′){||xi − xi′||²}.

A model where g was the sum of two components, g = f0·25 + f7, with p(f0·25, f7|σg1², σg2²) = N(f0·25|0, K0·25σg1²) N(f7|0, K7σg2²), was fitted as well. This model is referred to as MKA, standing for marker-based model with 'kernel averaging'. Note that K0·25 and K7 provide very different kernels (see Fig. 2). With more extreme values of the bandwidth parameter, marker information is virtually lost. Indeed, choosing θ=0 gives a kernel matrix full of ones and θ→∞ gives Kθ→I, and averaging these two kernels gives a resulting (co)variance structure that does not use marker information at all.

Finally, a sequence of models including pedigree and marker data (PM) was obtained by setting: g = a + Xβ, denoted as PMBL; g = a + fθ, θ ∊ {0·1, 0·25, 0·5, 0·75, 1, 2, 3, 5, 7, 10}, denoted as PMK,θ; and g = a + f0·25 + f7, denoted as PMKA.

In all models, variance parameters were treated as unknown and assigned identical independent scaled inverse chi-square prior distributions with three degrees of freedom and scale parameters equal to 1, p(σ·²) = χ⁻²(σ·²|df=3, S=1). Samples from posterior distributions for each of the models were obtained with a Gibbs sampler (see de los Campos et al., 2009b, for the case of MBL and PMBL, and the Appendix for RKHS models). Inferences were based on all 35 000 samples obtained after discarding 2000 samples as burn-in. The distribution of prediction errors was estimated using a 10-fold CV (e.g. Hastie et al., 2009).
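As a sketch of how a 10-fold CV estimate of prediction error can be computed, the following uses the penalized (non-MCMC) form of the RKHS predictor for brevity; the kernel K, the value of lam and the fold assignment are illustrative, not those used in the article.

```python
import numpy as np

def cv_mse_rkhs(K, y, lam, n_folds=10, seed=0):
    """10-fold CV mean-squared error for an RKHS smoother g_hat = K (K + lam*I)^{-1} y."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds                 # roughly balanced fold labels
    sq_err = []
    for f in range(n_folds):
        trn, tst = folds != f, folds == f
        K_trn = K[np.ix_(trn, trn)]
        alpha = np.linalg.solve(K_trn + lam * np.eye(trn.sum()), y[trn])
        y_hat = K[np.ix_(tst, trn)] @ alpha              # predictions for left-out lines
        sq_err.append((y[tst] - y_hat) ** 2)
    return np.concatenate(sq_err).mean()
```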

(ii) Results

Figure 3 shows the posterior means of the residual variance in MK,θ and PMK,θ versus values of the bandwidth parameter θ obtained when the models were fitted to the entire data set. Each panel in Fig. 3 corresponds to one environment, and the horizontal lines give the posterior means of the residual variance from P and PMKA. Table A1 of the Appendix gives estimates of the posterior means and posterior standard deviations of the residual variance from each of the 25 models, by environment. The posterior means of the residual variances indicate that models M and PM fitted the data better than P, and PMKA almost always gave better fit than MK,θ and PMK,θ. In all environments, the posterior mean of the residual variance decreased monotonically with θ; this was expected because Kθ becomes increasingly local as the bandwidth parameter increases. In environments 2, 3 and 4, the slopes of the curves relating the posterior mean of the residual variance to θ were gentler for PMK,θ than for MK,θ. This occurs because in PMK,θ the regression function has two components, one of which, the regression on the pedigree, is not a function of the bandwidth parameter. Models MBL and PMBL did not fit the training data as well as most of their RKHS counterparts, with posterior means of the residual variance close to those of MK,0·1 and PMK,0·5, respectively (see Table A1 of the Appendix).

Fig. 3. Estimated posterior mean of the residual variance versus values of the bandwidth parameter, θ, by environment and model. Kθ is a marker-based RKHS regression with bandwidth parameter θ; Pedigree & Markers Kθ uses pedigree and markers; here, θ is the value of the bandwidth parameter for markers. Pedigree & Markers K0·25+K7 uses pedigree and markers with kernel averaging (KA). E1–E4: environments where the lines were evaluated.

The contribution of a, that is, the regression on the pedigree, to the conditional expectation function, g, can be assessed via the posterior mean of σa² (see Fig. A1 in the Appendix). The posterior mean of σa² was larger in P models than in their PM counterparts; this was expected, because in P the regression on the pedigree is the only component of the conditional expectation function that contributes to phenotypic variance. Within PMK,θ, the posterior mean of σa² was lowest at intermediate values of the bandwidth parameter. At extreme values of θ, the RK may not represent the types of patterns present in the data and, thus, the estimated conditional expectation function depends more strongly on the regression on the pedigree (larger values of σa²).

Plots in Fig. 4 give the estimated mean-squared error (MSE) between CV predictions and observations versus values of the bandwidth parameter (x-axis), by environment and model. The predictive MSEs of the P and PMKA models are displayed as horizontal dashed lines, and those for the BL (both MBL and PMBL) are shown at the bottom of the panels. Table A2 in the Appendix gives the estimated MSE by model and environment.

Fig. 4. Estimated MSE between CV predictions (yHat) and observations (y) versus values of the bandwidth parameter, θ, by environment and model. Kθ is a marker-based RKHS regression with bandwidth parameter θ; Pedigree & Markers Kθ uses pedigree and markers; here, θ is the value of the bandwidth parameter for markers. Pedigree & Markers K0·25+K7 uses pedigree and markers with KA. E1–E4: environments where the lines were evaluated.

Overall, models including marker information had better predictive ability than pedigree-based models. For example, relative to P, using PMKA yielded decreases in MSE between CV predictions and observations of 20·4, 8·8, 7·0 and 11·0% for E1 through E4, respectively (Table A2 in the Appendix). Thus, it appears that sizable gains in predictive ability can be attained by considering markers and pedigrees jointly, as in PMKA. These results are in agreement with some empirical studies (e.g. Corrada Bravo et al., 2009; de los Campos et al., 2009b) that provided evidence of a gain in predictive ability from jointly considering markers and pedigree information. However, marker density in this study was relatively low; as marker density increases, it is expected that the relative importance of considering pedigree information will decrease (e.g. Calus & Veerkamp, 2007).

As shown in Fig. 4, the value of the bandwidth parameter that gave the best predictive ability was in the range (2, 4), except for environment E2, in which values of θ near one performed slightly better. The value of the bandwidth parameter that was optimal from the perspective of predictive ability was similar in M and PM models (Fig. 4 and Table A2 in the Appendix). However, the difference between the predictive ability of PMK,θ and MK,θ models was larger for extreme values of θ, indicating that PM models are more robust than M models with respect to the choice of θ. Again, this occurs because PMK,θ involves some form of KA (between the RK evaluated on the pedigree, A, and the one evaluated on marker genotypes, Kθ).

In all environments, KA had an estimated predictive MSE that was either close to or lower than the one obtained with any specific value of the bandwidth parameter (Fig. 4 and Table A2 in the Appendix). This was observed both in models with and without pedigree. These results suggest that KA can be an effective way of choosing the RK. Finally, PMKA had higher predictive ability than PMBL; this suggests a superiority of semi-parametric methods. However, PMBL outperformed PMK,θ for extreme values of the bandwidth parameter, illustrating, again, the importance of kernel selection. Moreover, the superiority of RKHS methods may not generalize to other traits or populations.

Using data from US Jersey sires (n=1446) genotyped with the BovineSNP50 BeadChip (42 552 single-nucleotide polymorphisms, SNPs), de los Campos et al. (2010) compared the predictive ability of several RKHS models for predicted transmitting abilities of milk production, protein content and daughter pregnancy rate. Models evaluated in that study were: (a) BRR, i.e. K=XX′; (b) a Gaussian kernel evaluated over a grid of values of the bandwidth parameter, i.e. Kθ; (c) KA using the two most extreme kernels in the sequence {Kθ}; and (d) a model where K was a marker-based estimate of a kinship matrix, i.e. K=G. Results in that study are in agreement with findings reported here in that using KA gave predictive ability similar to that achieved with the best performing kernel in the sequence {Kθ}. The comparison between KA, BRR and K=G yielded mixed results: for milk yield all models performed similarly; however, for protein content BRR and G outperformed KA, and the opposite was observed for daughter fertility, illustrating that the optimal choice of kernel may be trait dependent.

4. Concluding remarks

Incorporating molecular markers into models for prediction of genetic values poses important statistical and computational challenges. Ideally, models for dense molecular markers should be: (a) able to cope with the curse of dimensionality; (b) flexible enough to capture the complexity of quantitative traits and (c) amenable for computations. RKHS regressions can be used to address some of these challenges.

Coping with the curse of dimensionality and with complexity. In RKHS, the curse of dimensionality is controlled by defining a notion of smoothness of the unknown function with respect to pairs of points in input space, Cov[g(ti), g(ti′)] ∝ K(ti, ti′). The choice of RK becomes a central element of model specification in RKHS regressions.

As a framework, RKHS is flexible enough to accommodate many non-parametric and some parametric methods, including some classical choices such as the infinitesimal model. The frontier between parametric and non-parametric methods becomes fuzzy; models are better thought of as decision rules (i.e. maps from data to estimates) and are best evaluated based on performance. Predictive ability appears as a natural choice for evaluating model performance from a breeding perspective.

From a non-parametric perspective, kernels are chosen based on their properties (e.g. predictive ability). To a certain extent, this choice can be made a task of the algorithm. KA offers a computationally convenient method for kernel selection, and results of this study, as well as those of de los Campos et al. (2010), suggest that KA is an effective strategy for kernel selection.

Computational considerations. RKHS methods offer enormous computational advantages relative to most parametric methods for regression on molecular markers. This occurs for two reasons: (a) the model can be represented in terms of n unknowns and (b) factorizations such as the EV or singular value decompositions can be used to arrive at highly efficient algorithms. Unfortunately, these benefits cannot be exploited in linear models, y = Xβ + ε, with marker-specific prior variances of effects, such as BayesA or the Bayesian LASSO. This provides RKHS with a great computational advantage relative to those methods, especially when p>>n.

Contribution of marker genotypes to prediction of genetic values. Unlike pedigrees, molecular markers allow tracing Mendelian segregation; potentially, this should allow better predictions of genetic values. Results from this study confirm this expectation. Overall, PM models outperformed P models. Further, most RKHS regressions yielded better predictions than those attained with the Bayesian LASSO. However, this did not occur for every RK, indicating that the choice of the kernel is one of the main challenges when applying kernel-based methods. As stated, our results as well as those of de los Campos et al. (2010) suggest that KA provides an effective way of choosing a kernel.

Future challenges. In the kernels used in this study all SNPs contributed equally to the RK. As the number of available markers increases, a high number is expected to be located in regions of the genome that are not associated with genetic variability of a quantitative trait. Ideally, the RK should weight each marker based on some measure of its contribution to genetic variance. In linear models such as the Bayesian LASSO, or methods BayesA or BayesB, the prior variances of marker effects, which are marker specific, act as weights assigned to each of the markers (e.g. de los Campos et al., 2009b).

In RKHS models, one could think of kernels where the contribution of each marker to the kernel is weighted according to some measure of its contribution to genetic variance. For example, one could derive weighted estimates of kinship in which each marker obtains a differential contribution. Alternatively, with a Gaussian kernel, one could think of attaching a bandwidth parameter to each marker. For example, one could use K(xi, xi′) = exp{−Σk θk d(xik, xi′k)}, where θk and d(xik, xi′k) are a bandwidth parameter and a distance function associated with the kth marker.
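A sketch of such a marker-weighted Gaussian kernel follows (the per-marker bandwidths θk are hypothetical inputs, e.g. obtained in a first step from single-marker analyses; the genotypes are simulated):

```python
import numpy as np

def weighted_gaussian_kernel(X, theta):
    """K(x_i, x_i') = exp{ -sum_k theta_k * (x_ik - x_i'k)^2 }, one bandwidth per marker."""
    n, p = X.shape
    S = np.zeros((n, n))
    for k in range(p):
        diff = X[:, [k]] - X[:, [k]].T        # pairwise differences at marker k
        S += theta[k] * diff ** 2             # accumulate weighted squared distances
    return np.exp(-S)

rng = np.random.default_rng(7)
X = rng.binomial(2, 0.3, size=(40, 30)).astype(float)   # hypothetical genotypes
theta = rng.uniform(0.0, 0.1, size=30)                  # hypothetical per-marker bandwidths
K = weighted_gaussian_kernel(X, theta)
```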

An approach similar to the one described above was evaluated by Long et al. (2010), who used radial-basis functions evaluated on principal components (as opposed to individual markers) derived from marker genotypes. Results of that study indicate that the use of input-specific bandwidth parameters may improve predictive ability relative to a model based on a single bandwidth parameter. However, inferring these weights (or bandwidth parameters) poses several statistical challenges when p>>n. This occurs because the kernel must be re-computed every time the bandwidth parameters are updated. A natural alternative is to use two-step procedures, with a first step in which an approximation to the weights (or bandwidth parameters) is obtained (e.g. with some form of single-marker regression) and a second step where genetic values are inferred. Irrespective of whether single-step or two-step approaches are used, the development and evaluation of algorithms for computing weighted kernels seem to constitute a central area of research for the application of RKHS to genomic models.

APPENDIX

1. Gibbs sampler

The Appendix describes a Gibbs sampler for a Bayesian RKHS regression. The parameterization is as in equation (5), extended to two random effects and with the inclusion of an intercept. Extension of the model to more than two random effects is straightforward. The derivation of the fully conditional distributions presented here uses standard results for Bayesian linear models (e.g. Gelman et al., 2004; Sorensen & Gianola, 2002).

Let K1 = Λ1Ψ1Λ1′ and K2 = Λ2Ψ2Λ2′ be the EV decompositions of the two kernel matrices. Extending (5) to two random effects and including an intercept, the data equation and likelihood function become y = 1μ + Λ1δ1 + Λ2δ2 + ε and p(y|μ, δ1, δ2, σ²) = N(y|1μ + Λ1δ1 + Λ2δ2, Iσ²), respectively. The joint prior is (upon assuming a flat prior for μ)

p(δ1, δ2, σ², σg1², σg2²) = N(δ1|0, Ψ1σg1²) N(δ2|0, Ψ2σg2²) χ⁻²(σ²|df, S) χ⁻²(σg1²|dfg1, Sg1) χ⁻²(σg2²|dfg2, Sg2).

Above, χ⁻²(·|df·, S·) denotes a scaled inverse chi-square density with degrees of freedom df· and scale parameter S·, with the parameterization presented in Gelman et al. (2004).

The joint posterior density is proportional to the product of the likelihood and the prior; thus,

p(μ, δ1, δ2, σ², σg1², σg2²|y) ∝ N(y|1μ + Λ1δ1 + Λ2δ2, Iσ²) N(δ1|0, Ψ1σg1²) N(δ2|0, Ψ2σg2²) χ⁻²(σ²|df, S) χ⁻²(σg1²|dfg1, Sg1) χ⁻²(σg2²|dfg2, Sg2).

The Gibbs sampler draws samples of the unknowns from their fully conditional distributions; with the conjugate priors chosen, all fully conditional distributions have closed forms, as described next.

Intercept. Parameter μ enters only in the likelihood; therefore,

p(μ|ELSE) ∝ N(yμ|1μ, Iσ²),

where yμ = y − Λ1δ1 − Λ2δ2, and ELSE denotes all other unknowns except for μ. The fully conditional distribution is then normal with mean n⁻¹1′yμ and variance n⁻¹σ².

Regression coefficients. The fully conditional distribution of δ1 is

p(δ1|ELSE) ∝ N(yδ1|Λ1δ1, Iσ²) N(δ1|0, Ψ1σg1²),

where yδ1 = y − 1μ − Λ2δ2. This is known to be a multivariate normal distribution with mean (covariance matrix) equal to the solution (σ² times the inverse of the matrix of coefficients) of the following system of equations: [Λ1′Λ1 + σ²σg1⁻²Ψ1⁻¹]δ1 = Λ1′yδ1. Using Λ1′Λ1 = I, the system becomes [I + σ²σg1⁻²Ψ1⁻¹]δ1 = Λ1′yδ1. Since Ψ1 is diagonal, so is the matrix of coefficients of the above system, implying that the elements of δ1 are conditionally independent. Moreover, p(δ1j|ELSE) is normal, centred at [1 + σ²σg1⁻²Ψ1j⁻¹]⁻¹ỹ1j and with variance σ²[1 + σ²σg1⁻²Ψ1j⁻¹]⁻¹, where ỹ1j = λ1j′yδ1. Here, λ1j is the jth column (eigenvector) of Λ1.

By symmetry, the fully conditional distribution of δ2 is also multivariate normal and the associated system of equations is [I + σ²σg2⁻²Ψ2⁻¹]δ2 = Λ2′yδ2, where yδ2 = y − 1μ − Λ1δ1.

Variance parameters. The fully conditional distribution of the residual variance is

p(σ²|ELSE) ∝ N(y|1μ + Λ1δ1 + Λ2δ2, Iσ²) χ⁻²(σ²|df, S),

where ε = y − 1μ − Λ1δ1 − Λ2δ2. The above is a scaled inverse chi-square distribution with degrees of freedom n + df and scale parameter (ε′ε + df·S)/(n + df).

The fully conditional distribution of σg1² is p(σg1²|ELSE) ∝ N(δ1|0, Ψ1σg1²) χ⁻²(σg1²|dfg1, Sg1), which is a scaled inverse chi-square distribution with degrees of freedom n + dfg1 and scale parameter (Σj δ1j²Ψ1j⁻¹ + dfg1·Sg1)/(n + dfg1). Here, Ψ1j is the jth EV of K1. Similarly, the fully conditional distribution of σg2² is scaled inverse chi-square with degrees of freedom n + dfg2 and scale parameter (Σj δ2j²Ψ2j⁻¹ + dfg2·Sg2)/(n + dfg2).
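For reference, a compact sketch of the sampler described in this Appendix, reduced to a single kernel plus an intercept for brevity (toy data; burn-in omitted; not the authors' implementation). Extending it to two kernels follows the same pattern.

```python
import numpy as np

def gibbs_rkhs(y, K, n_iter=2000, df=3.0, S=1.0, seed=0):
    """Gibbs sampler for y = 1*mu + Lambda*delta + e, with delta ~ N(0, Psi*sg2),
    e ~ N(0, I*s2), and scaled-inverse-chi-square priors on s2 and sg2."""
    rng = np.random.default_rng(seed)
    n = len(y)
    psi, Lam = np.linalg.eigh(K)
    keep = psi > 1e-8                         # drop numerically null eigenvalues
    psi, Lam = psi[keep], Lam[:, keep]
    m = len(psi)

    mu, delta, s2, sg2 = y.mean(), np.zeros(m), 1.0, 1.0
    g_samples = []
    for _ in range(n_iter):
        # intercept: normal with mean n^-1 1'(y - Lambda*delta), variance n^-1 s2
        y_mu = y - Lam @ delta
        mu = rng.normal(y_mu.mean(), np.sqrt(s2 / n))
        # regression coefficients: diagonal system, elements conditionally independent
        y_t = Lam.T @ (y - mu)
        c = 1.0 + s2 / (sg2 * psi)
        delta = rng.normal(y_t / c, np.sqrt(s2 / c))
        # residual variance: scaled inverse chi-square
        e = y - mu - Lam @ delta
        s2 = (e @ e + df * S) / rng.chisquare(n + df)
        # kernel variance: scaled inverse chi-square
        sg2 = (delta @ (delta / psi) + df * S) / rng.chisquare(m + df)
        g_samples.append(Lam @ delta)
    return np.mean(g_samples, axis=0)         # posterior mean of genetic values
```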

2. Tables and Figures

Table A1. Posterior mean (SD) of residual variance by model and environment

E1–E4 are the four environments where wheat lines were evaluated; Kθ are (Bayesian) RKHS models using a Gaussian kernel evaluated at marker-genotypes with bandwidth parameter θ; K0·25+K7 is a model that includes two Gaussian kernels differing only in the value of θ.

Table A2. MSE between realized phenotypes and CV predictions, by model and environment

E1–E4 are the four environments where wheat lines were evaluated; Kθ are (Bayesian) RKHS models using a Gaussian kernel evaluated at marker genotypes with bandwidth parameter θ; K0·25+K7·00 is a model that includes two Gaussian kernels differing only in the value of θ.

Fig. A1. Posterior mean of the variance of the regression on the pedigree, σa², versus values of the bandwidth parameter, θ, by environment and model. Pedigree & Markers Kθ uses pedigree and markers; here, θ is the value of the bandwidth parameter for markers. Pedigree & Markers K0·25+K7 uses pedigree and markers with KA. E1–E4: environments where the lines were evaluated.

The authors thank Vivi Arief from the School of Land Crop and Food Sciences of the University of Queensland, Australia, for assembling the historical wheat phenotypic and molecular marker data and for computing the additive relationships between the wheat lines. We acknowledge valuable comments from Grace Wahba, David B. Allison, Martin Schlather, Emilio Porcu and two anonymous reviewers. Financial support from the Wisconsin Agriculture Experiment Station and NSF grant DMS-044371 is acknowledged.


References

Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society 68, 337–404.
Bernardo, R. & Yu, J. (2007). Prospects for genome-wide selection for quantitative traits in maize. Crop Science 47, 1082–1090.
Calus, M. P. L. & Veerkamp, R. F. (2007). Accuracy of breeding values when using and ignoring the polygenic effect in genomic breeding value estimation with a marker density of one SNP per cM. Journal of Animal Breeding and Genetics 124, 362–388.
Cockerham, C. C. (1954). An extension of the concept of partitioning hereditary variance for analysis of covariance among relatives when epistasis is present. Genetics 39, 859–882.
Corrada Bravo, H., Lee, K. E., Klein, B. E. K., Klein, R., Iyengar, S. K. & Wahba, G. (2009). Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences, USA 106, 8128–8133.
Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
de los Campos, G., Gianola, D. & Rosa, G. J. M. (2009a). Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation. Journal of Animal Science 87, 1883–1887.
de los Campos, G., Naya, H., Gianola, D., Crossa, J., Legarra, A., Manfredi, E., Weigel, K. & Cotes, J. M. (2009b). Predicting quantitative traits with regression models for dense molecular markers and pedigrees. Genetics 182, 375–385.
de los Campos, G., Gianola, D., Rosa, G. J. M., Weigel, K. A., Vazquez, A. I. & Allison, D. B. (2010). Semi-parametric marker-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. In: Proceedings of the 9th World Congress on Genetics Applied to Livestock Production, Leipzig, Germany, in press.
Eding, J. H. & Meuwissen, T. H. E. (2001). Marker based estimates of between and within population kinships for the conservation of genetic diversity. Journal of Animal Breeding and Genetics 118, 141–159.
Fernando, R. L. & Grossman, M. (1989). Marker assisted selection using best linear unbiased prediction. Genetics Selection Evolution 21, 467–477.
Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh 52, 399–433.
Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B. (2004). Bayesian Data Analysis. London, UK: Chapman and Hall.
Gianola, D. & de los Campos, G. (2008). Inferring genetic values for quantitative traits non-parametrically. Genetics Research 90, 525–540.
Gianola, D. & van Kaam, J. B. C. H. M. (2008). Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178, 2289–2303.
Gianola, D., Fernando, R. L. & Stella, A. (2006). Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics 173, 1761–1776.
Gianola, D., de los Campos, G., Hill, W. G., Manfredi, E. & Fernando, R. L. (2009). Additive genetic variability and the Bayesian alphabet. Genetics 183, 347–363.
Golub, G. H. & Van Loan, C. F. (1996). Matrix Computations, 3rd edn. Baltimore and London: The Johns Hopkins University Press.
Habier, D., Fernando, R. L. & Dekkers, J. C. M. (2007). The impact of genetic relationship information on genome-assisted breeding values. Genetics 177, 2389–2397.
Harville, D. A. (1983). Discussion on a section on interpolation and estimation. In David, H. A. & David, H. T. (eds), Statistics: An Appraisal, pp. 281–286. Ames, IA: The Iowa State University Press.
Hastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. New York, NY: Springer.
Hayes, B. J. & Goddard, M. E. (2008). Prediction of breeding values using marker-derived relationship matrices. Journal of Animal Science 86, 2089–2092.
Heffner, E. L., Sorrells, M. E. & Jannink, J. L. (2009). Genomic selection for crop improvement. Crop Science 49, 1–12.
Henderson, C. R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics 31, 423–447.
Hoerl, A. E. & Kennard, R. W. (1970a). Ridge regression: biased estimation for non-orthogonal problems. Technometrics 12, 55–67.
Hoerl, A. E. & Kennard, R. W. (1970b). Ridge regression: biased estimation for non-orthogonal problems. Technometrics 12, 69–82.
Kempthorne, O. (1954). The correlation between relatives in a random mating population. Proceedings of the Royal Society of London B 143, 103–113.
Kimeldorf, G. S. & Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics 41, 495–502.
Kimeldorf, G. S. & Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications 33, 82–95.
Kondor, R. I. & Lafferty, J. (2002). Diffusion kernels on graphs and other discrete inputs. In: Proceedings of the 19th International Conference on Machine Learning (ICML 2002).
Long, N., Gianola, D., Rosa, G., Weigel, K., Kranis, A. & González-Recio, O. (2010). Radial basis function regression methods for predicting quantitative traits using SNP markers. Genetics Research, in press.
Lynch, M. & Ritland, K. (1999). Estimation of pairwise relatedness with molecular markers. Genetics 152, 1753–1766.
Mallick, B., Ghosh, D. & Ghosh, M. (2005). Bayesian kernel-based classification of microarray data. Journal of the Royal Statistical Society, Series B 2, 219–234.
Meuwissen, T. H. E., Hayes, B. J. & Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829.
McLaren, C. G., Bruskiewich, R., Portugal, A. M. & Cosico, A. B. (2005). The International Rice Information System: a platform for meta-analysis of rice crop data. Plant Physiology 139, 637–642.
Park, T. & Casella, G. (2008). The Bayesian LASSO. Journal of the American Statistical Association 103, 681–686.
Ritland, K. (1996). Estimators for pairwise relatedness and individual inbreeding coefficients. Genetics Research 67, 175–186.
Shawe-Taylor, J. & Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge, UK: Cambridge University Press.
Sorensen, D. & Gianola, D. (2002). Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. New York: Springer-Verlag.
Speed, T. (1991). [That BLUP is a good thing: the estimation of random effects]: comment. Statistical Science 6, 42–44.
Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B 58, 267–288.
Van Raden, P. M. (2007). Genomic measures of relationship and inbreeding. Interbull Bulletin 37, 33–36.
Vapnik, V. (1998). Statistical Learning Theory. New York, NY: Wiley.
Wahba, G. (1990). Spline Models for Observational Data. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Wright, S. (1921). Systems of mating. I. The biometric relations between parents and offspring. Genetics 6, 111–123.