
Radial basis function regression methods for predicting quantitative traits using SNP markers

Published online by Cambridge University Press: 29 July 2010

NANYE LONG*
Affiliation:
Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA
DANIEL GIANOLA
Affiliation:
Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA; Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA
GUILHERME J. M. ROSA
Affiliation:
Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA
KENT A. WEIGEL
Affiliation:
Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA
ANDREAS KRANIS
Affiliation:
Aviagen Ltd, Newbridge, Midlothian EH28 8SZ, UK
OSCAR GONZÁLEZ-RECIO
Affiliation:
Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, 28040 Madrid, Spain
*Corresponding author. Nanye Long, Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA. e-mail: nlong@wisc.edu

Summary

A challenge when predicting total genetic values for complex quantitative traits is that an unknown number of quantitative trait loci may affect phenotypes via cryptic interactions. If markers are available, assuming that their effects on phenotypes are additive may lead to poor predictive ability. Non-parametric radial basis function (RBF) regression, which does not assume a particular form of the genotype–phenotype relationship, was investigated here by simulation and by analysis of body weight (BW) and food conversion rate (FCR) data in broilers. The simulation included a toy example with an arbitrary non-linear genotype–phenotype relationship, in which five scenarios with broad-sense heritabilities of 0·1, 0·25, 0·5, 0·75 and 0·9 were created. In addition, a whole-genome simulation was carried out under three gene action modes: pure additive, additive+dominance and pure epistasis. In all analyses, a training set was used to fit the model and a testing set was used to evaluate predictive performance, measured by the correlation and the predictive mean-squared error (PMSE) on the testing data. A linear additive model known as Bayes A was used as a benchmark. Two RBF models were examined: one with single nucleotide polymorphism (SNP)-specific weights (RBF I) and one with a common weight shared by all SNPs (RBF II). Results indicated that, in the presence of complex genotype–phenotype relationships (i.e. non-linearity and non-additivity), RBF outperformed Bayes A in predicting total genetic values using SNP markers. Extending Bayes A to include additive, dominance and epistatic effects could improve its prediction accuracy. RBF I was generally better than RBF II and was able to identify relevant SNPs in the toy example.
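The contrast between SNP-specific weights (RBF I) and a single common weight (RBF II) can be illustrated with a small sketch. Several choices here are assumptions, not the paper's method: Gaussian basis functions centred on a subset of training genotypes, a weighted squared distance with weights θ, and basis coefficients estimated by ridge-penalised least squares (the paper reports posterior means, so its fitting is Bayesian). The names `rbf_basis`, `solve` and `fit_rbf`, the penalty `lam`, and the toy data are hypothetical.

```python
import math
import random


def rbf_basis(x, center, theta):
    """Gaussian basis exp(-sum_k theta_k * (x_k - c_k)**2).

    Distinct theta values per SNP mimic RBF I; a list of identical
    values mimics the single common weight of RBF II (assumed form).
    """
    d2 = sum(t * (a - b) ** 2 for t, a, b in zip(theta, x, center))
    return math.exp(-d2)


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def fit_rbf(X, y, centers, theta, lam=1e-3):
    """Fit y ~ mu + sum_j alpha_j * phi_j(x) by ridge and return a predictor.

    Each center defines one basis function, so len(centers) plays the
    role of the number of basis functions (assumed reading of RBF50-RBF300).
    """
    mu = sum(y) / len(y)
    phi = [[rbf_basis(x, c, theta) for c in centers] for x in X]
    k = len(centers)
    # Ridge normal equations: (Phi^T Phi + lam * I) alpha = Phi^T (y - mu)
    a = [[sum(row[i] * row[j] for row in phi) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * (yi - mu) for row, yi in zip(phi, y)) for i in range(k)]
    alpha = solve(a, b)

    def predict(x):
        return mu + sum(w * rbf_basis(x, c, theta) for w, c in zip(alpha, centers))

    return predict


if __name__ == "__main__":
    # Hypothetical toy data: 10 SNPs coded 0/1/2; only SNPs 0 and 1 act,
    # through a non-additive product term, plus Gaussian noise.
    rng = random.Random(1)
    X = [[rng.randint(0, 2) for _ in range(10)] for _ in range(300)]
    y = [x[0] * x[1] + rng.gauss(0.0, 0.5) for x in X]
    train_X, train_y = X[:200], y[:200]
    test_X, test_y = X[200:], y[200:]
    theta = [0.1] * 10  # one common weight (RBF II style)
    predict = fit_rbf(train_X, train_y, centers=train_X[:30], theta=theta)
    y_hat = [predict(x) for x in test_X]
    pmse = sum((t - p) ** 2 for t, p in zip(test_y, y_hat)) / len(test_y)
    print(f"test PMSE against observed y: {pmse:.3f}")
```

Passing a `theta` list with different values per SNP gives the RBF I flavour: a SNP whose weight is near zero no longer contributes to the distance, which is one way to read the statement that RBF I can flag relevant SNPs.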

Type: Paper
Copyright © Cambridge University Press 2010

Table 1. Correlation and PMSE1 on test data in the toy example. Scenarios 1–5 correspond to broad-sense heritabilities of 0·1, 0·25, 0·5, 0·75 and 0·9, respectively. corr(E(y), ŷ): correlation between expected and predicted phenotype; $\mathrm{PMSE1} = n_{\mathrm{test}}^{-1} \sum_{i=1}^{n_{\mathrm{test}}} \left( E(y_i) - \hat{y}_i \right)^2$, where $n_{\mathrm{test}} = 500$ is the sample size in the testing set. Bayes AA, Bayes AAD and Bayes AADE denote Bayes A models containing additive marker effects only, additive + dominance effects, and additive + dominance + pairwise epistatic effects, respectively.
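The Bayes AA, AAD and AADE variants differ in which marker covariates enter the model. The sketch below shows one way to build those covariates; the codings (additive −1/0/1, a heterozygote indicator for dominance, and products of additive codes for pairwise epistasis) are common conventions assumed here, not taken from the paper, and the function names are hypothetical.

```python
from itertools import combinations


def additive_code(g):
    """Genotype as allele count 0/1/2 -> centred code -1/0/1 (assumed)."""
    return g - 1


def dominance_code(g):
    """1 for heterozygotes, 0 for either homozygote (assumed)."""
    return 1 if g == 1 else 0


def design_row(genotypes, model="AA"):
    """Marker covariates for one individual.

    "AA": additive codes only; "AAD": plus dominance codes;
    "AADE": plus additive-by-additive products for every SNP pair.
    """
    if model not in ("AA", "AAD", "AADE"):
        raise ValueError(f"unknown model: {model}")
    a = [additive_code(g) for g in genotypes]
    row = list(a)
    if model in ("AAD", "AADE"):
        row += [dominance_code(g) for g in genotypes]
    if model == "AADE":
        row += [a[j] * a[k] for j, k in combinations(range(len(a)), 2)]
    return row


# Column counts for p SNPs are p, 2p and 2p + p(p - 1)/2.
print([len(design_row([0, 1, 2, 1], m)) for m in ("AA", "AAD", "AADE")])
# prints [4, 8, 14]
```

The pairwise block grows quadratically in the number of SNPs, so including all epistatic pairs quickly dominates the size of the model.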


Fig. 1. Posterior means of θ in RBF I and of SNP effects g in Bayes A in the toy example. The five scenarios correspond to ratios of genetic to residual variance of 1:9 (scenario 1), 1:3 (scenario 2), 1:1 (scenario 3), 3:1 (scenario 4) and 9:1 (scenario 5).
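The variance ratios above map onto the broad-sense heritabilities listed for Table 1 through $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$, with $\sigma_g^2$ the genetic and $\sigma_e^2$ the residual variance. A quick arithmetic check (the function name is hypothetical):

```python
from fractions import Fraction


def broad_sense_h2(genetic, residual):
    """H^2 = sigma_g^2 / (sigma_g^2 + sigma_e^2), kept exact with Fraction."""
    return Fraction(genetic, genetic + residual)


ratios = {1: (1, 9), 2: (1, 3), 3: (1, 1), 4: (3, 1), 5: (9, 1)}
for scenario, (g, e) in ratios.items():
    print(scenario, f"{g}:{e}", float(broad_sense_h2(g, e)))
# prints, one scenario per line: 0.1, 0.25, 0.5, 0.75, 0.9
```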


Table 2. Correlation and PMSE2 in the testing sets of the BW (body weight) and FCR (food conversion rate) data. corr(y, ŷ): correlation between observed and predicted phenotype; $\mathrm{PMSE2} = n_{\mathrm{test}}^{-1} \sum_{i=1}^{n_{\mathrm{test}}} \left( y_i - \hat{y}_i \right)^2$, where $n_{\mathrm{test}}$ is the sample size in the testing set.
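PMSE1 and PMSE2 differ only in the reference that predictions are compared against: expected values $E(y_i)$, known only in simulation, or observed phenotypes $y_i$ in the real BW and FCR data. A minimal sketch of both metrics and of the Pearson correlation (function names hypothetical):

```python
import math


def pmse(reference, predicted):
    """Mean squared difference over the test set.

    Pass expected values E(y_i) as `reference` for PMSE1 (simulated data)
    or observed phenotypes y_i for PMSE2 (real data).
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)


def pearson(u, v):
    """Pearson correlation, e.g. corr(E(y), y_hat) or corr(y, y_hat)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)
```

Correlation ignores the location and scale of the predictions while PMSE does not: for any non-constant `y`, `pearson(y, [2 * v + 5 for v in y])` is 1, yet the matching PMSE is large. That difference is one reason to report both.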


Fig. 2. Density of prediction accuracy of Bayes A and RBF II in the whole-genome simulation (50 replications) for each of the three scenarios (pure additive, additive+dominance and pure epistasis). Prediction accuracy is corr(E(y), ŷ). RBF50 to RBF300 denote RBF II with 50 to 300 basis functions, the number in each label giving the number of basis functions.


Fig. 3. Density of PMSE1 of Bayes A and RBF II in the whole-genome simulation (50 replications) for each of the three scenarios (pure additive, additive+dominance and pure epistasis). $\mathrm{PMSE1} = n_{\mathrm{test}}^{-1} \sum_{i=1}^{n_{\mathrm{test}}} \left( E(y_i) - \hat{y}_i \right)^2$, where $n_{\mathrm{test}} = 500$ is the sample size in the testing set. RBF50 to RBF300 denote RBF II with 50 to 300 basis functions, the number in each label giving the number of basis functions.
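Figs 2 and 3 plot densities of replicate-level accuracy and PMSE1 over 50 replications. The captions do not say how the densities were estimated; a Gaussian kernel density estimate with the normal-reference bandwidth $h = 1.06\,\mathrm{SD}\,n^{-1/5}$ is one standard choice and is assumed in this sketch. The function name and the example values are hypothetical.

```python
import math
import statistics


def gaussian_kde(values, bandwidth=None):
    """Return x -> estimated density of `values` using a Gaussian kernel.

    Without an explicit bandwidth, the normal-reference rule
    h = 1.06 * SD * n**(-1/5) is used (assumed choice).
    """
    n = len(values)
    if bandwidth is not None:
        h = bandwidth
    else:
        h = 1.06 * statistics.stdev(values) * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return density


# Hypothetical replicate accuracies (not from the paper), evaluated on a grid.
accs = [0.61, 0.64, 0.58, 0.66, 0.63]
f = gaussian_kde(accs)
curve = [(0.5 + 0.01 * i, f(0.5 + 0.01 * i)) for i in range(21)]
```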


Table 3. Prediction accuracy and PMSE1 for the three scenarios (pure additive, additive+dominance and pure epistasis) in the whole-genome simulation. Prediction accuracy is corr(E(y), ŷ). $\mathrm{PMSE1} = n_{\mathrm{test}}^{-1} \sum_{i=1}^{n_{\mathrm{test}}} \left( E(y_i) - \hat{y}_i \right)^2$, where $n_{\mathrm{test}} = 500$ is the sample size in the testing set. RBF50 to RBF300 denote RBF II with 50 to 300 basis functions, the number in each label giving the number of basis functions. Results are averages over 50 replications, with standard deviations in parentheses.
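Each cell of Table 3 condenses 50 replicate values into "mean (SD)". A sketch of that summary follows; the caption does not say whether the SD uses an n or n − 1 denominator, so the sample (n − 1) version from Python's `statistics.stdev` is assumed. The function name is hypothetical.

```python
import statistics


def mean_sd(values, digits=3):
    """Format replicate results as 'mean (SD)' using the sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # n - 1 denominator; needs at least 2 values
    return f"{m:.{digits}f} ({s:.{digits}f})"


print(mean_sd([0.5, 0.7]))  # prints 0.600 (0.141)
```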


Fig. 4. Density plots of r² (squared correlation between two SNP loci) for pairwise SNP linkage disequilibrium (LD) in the BW and FCR data. Exhaustive pairwise computation was not feasible (24 126 931 pairs from 6947 SNPs in BW; 6 056 940 pairs from 3481 SNPs in FCR), so repeated random subsampling of SNPs was used instead. In each repetition, 50 SNPs were sampled at random and r² was calculated for every pair among them. The r² values from all repetitions were pooled to approximate the LD level among SNPs and used for plotting. Ten repetitions were performed for each dataset.
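The subsampling scheme above can be sketched as follows. Assumptions not stated in the caption: genotypes are coded as allele counts 0/1/2, r² is the squared Pearson correlation of those codes (a genotype-based approximation rather than a haplotype-based estimate), and SNPs are drawn without replacement within each repetition. Function names are hypothetical.

```python
import math
import random
from itertools import combinations


def r_squared(a, b):
    """Squared Pearson correlation between two SNPs' genotype codes."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")  # monomorphic SNP: r^2 undefined
    return cov * cov / (va * vb)


def sampled_ld(genotypes, snps_per_rep=50, reps=10, seed=0):
    """Approximate the r^2 distribution by repeated SNP subsampling.

    genotypes: list of SNP columns, each a list of 0/1/2 codes.
    Each repetition samples `snps_per_rep` SNPs without replacement and
    computes r^2 for every pair among them; results are pooled.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        chosen = rng.sample(range(len(genotypes)), snps_per_rep)
        for j, k in combinations(chosen, 2):
            values.append(r_squared(genotypes[j], genotypes[k]))
    return values


if __name__ == "__main__":
    # Exhaustive pair counts quoted in the caption vs. pairs per repetition.
    print(math.comb(6947, 2), math.comb(3481, 2), math.comb(50, 2))
    # prints 24126931 6056940 1225
```

With 50 SNPs and 10 repetitions this evaluates 12 250 pairs per dataset instead of millions, at the cost of approximating the r² distribution rather than computing it exactly.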