
LASSO with cross-validation for genomic selection

M. Graziano Usai, Mike E. Goddard and Ben J. Hayes

We used a least absolute shrinkage and selection operator (LASSO) approach to estimate marker effects for genomic selection. The least angle regression (LARS) algorithm and cross-validation were used to define the best subset of markers to include in the model. The LASSO–LARS approach was tested on two data sets: a simulated data set with 5865 individuals and 6000 single nucleotide polymorphisms (SNPs), and a mouse data set with 1885 individuals genotyped for 10 656 SNPs and phenotyped for a number of quantitative traits. In the simulated data, three approaches were used to split the reference population into training and validation subsets for cross-validation: (1) random splitting across the whole population; (2) random sampling of the validation set from the last generation only, within families; and (3) random sampling of the validation set from the last generation only, across families. The highest accuracy was obtained by random splitting across the whole population. The accuracy of genomic estimated breeding values (GEBVs) in the candidate population obtained by LASSO–LARS was 0·89 with 156 explanatory SNPs. This was higher than the accuracies obtained by best linear unbiased prediction (BLUP) and a Bayesian method (BayesA), which were 0·75 and 0·84, respectively. In the mouse data, 1600 individuals were randomly allocated to the reference population. For the remaining 285 individuals, the GEBVs estimated by LASSO–LARS were more accurate than those obtained by BLUP and BayesA for weight at six weeks, and slightly less accurate for growth rate and body length. It was concluded that the LASSO–LARS approach is a good alternative method for estimating marker effects in genomic selection, particularly when the cost of genotyping can be reduced by using a limited subset of markers.
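The idea of estimating SNP effects with the LASSO and letting cross-validation decide how many markers stay in the model can be illustrated with a small plain-Python sketch. It is not the authors' implementation, and several choices are assumptions: the LASSO is fitted by cyclic coordinate descent on a fixed grid of penalties rather than by the LARS algorithm (LARS traces the whole LASSO path, while this sketch only visits the grid points), the objective uses the (1/2n) scaling of the residual sum of squares, cross-validation uses random k-fold splitting across the whole reference set and scores each penalty by mean squared prediction error, and accuracy is taken as the correlation between GEBVs and simulated true breeding values. The toy data generator, the function names (`lasso_cd`, `cv_choose_lambda`, `simulate`) and the penalty grid are all hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the coordinate-wise LASSO update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_sweeps=500, tol=1e-8):
    """Minimise (1/2n)*||y - mu - Xb||^2 + lam*sum_j |b_j| by cyclic coordinate descent.

    X is a list of individuals, each a list of SNP genotypes coded 0/1/2.
    Returns (mu, b); b[j] == 0.0 means SNP j is excluded from the model.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    # Centred genotype columns, so the intercept can be recovered at the end.
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    ss = [sum(v * v for v in col) / n for col in cols]
    r = [yi - ybar for yi in y]  # residuals when every b_j is zero
    b = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, col in enumerate(cols):
            if ss[j] == 0.0:
                continue  # monomorphic SNP: carries no information
            # Covariance of SNP j with the partial residual (its own effect added back).
            rho = sum(c * ri for c, ri in zip(col, r)) / n + ss[j] * b[j]
            delta = soft_threshold(rho, lam) / ss[j] - b[j]
            if delta != 0.0:
                r = [ri - c * delta for c, ri in zip(col, r)]
                b[j] += delta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    mu = ybar - sum(m * bj for m, bj in zip(means, b))
    return mu, b


def predict(mu, b, X):
    """GEBV = intercept + sum of estimated SNP effects times genotypes."""
    return [mu + sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def pearson(a, c):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    ma, mc = sum(a) / len(a), sum(c) / len(c)
    sab = sum((u - ma) * (v - mc) for u, v in zip(a, c))
    saa = sum((u - ma) ** 2 for u in a)
    scc = sum((v - mc) ** 2 for v in c)
    return sab / (saa * scc) ** 0.5 if saa > 0 and scc > 0 else 0.0


def cv_choose_lambda(X, y, lambdas, k=4, seed=1):
    """k-fold CV with random splitting across the whole reference set; returns the
    lambda with the smallest mean squared prediction error (an assumed criterion)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_mse = None, float("inf")
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            mu, b = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            pred = predict(mu, b, [X[i] for i in fold])
            sse += sum((y[i] - yhat) ** 2 for i, yhat in zip(fold, pred))
        if sse / len(y) < best_mse:
            best_lam, best_mse = lam, sse / len(y)
    return best_lam


def simulate(n=120, p=30, n_qtl=3, seed=7):
    """Hypothetical toy data: 0/1/2 genotypes, a few SNPs with effect 1.0, unit noise."""
    rng = random.Random(seed)
    X = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(p)] for _ in range(n)]
    qtl = set(rng.sample(range(p), n_qtl))
    tbv = [sum(row[j] for j in qtl) for row in X]  # true breeding values
    y = [g + rng.gauss(0.0, 1.0) for g in tbv]
    return X, y, tbv


if __name__ == "__main__":
    X, y, tbv = simulate()
    lambdas = [0.5, 0.3, 0.2, 0.1, 0.05, 0.02]
    # First 100 individuals form the reference population, the last 20 are candidates.
    lam = cv_choose_lambda(X[:100], y[:100], lambdas)
    mu, b = lasso_cd(X[:100], y[:100], lam)
    gebv = predict(mu, b, X[100:])
    print("lambda:", lam, "SNPs kept:", sum(bj != 0.0 for bj in b),
          "accuracy:", round(pearson(gebv, tbv[100:]), 2))
```

Each penalty value corresponds to one point on the LASSO path, so choosing it by cross-validation plays the role of choosing how many SNPs have non-zero effects; in this sketch, SNPs whose estimate is exactly `0.0` would not need to be genotyped in candidates.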

*Corresponding author. Settore Genetica e Biotecnologie, AGRIS-Sardegna, Loc. Bonassai, Km 18·6 S. S. Sassari-Fertilia, 07040, Olmedo (SS), Italy. Tel: +39 079387318. Fax: +39-079389450. e-mail:
Genetics Research
  • ISSN: 0016-6723
  • EISSN: 1469-5073
  • URL: /core/journals/genetics-research