
Population and quantitative genomic properties of the USDA soybean germplasm collection

Alencar Xavier, Rima Thapa, William M. Muir and Katy Martin Rainey

This study is the first assessment of the entire soybean [Glycine max (L.) Merr.] collection of the United States Department of Agriculture (USDA) National Plant Germplasm System to report quantitative and population genomic parameters. It also provides new insight into the structure of soybean germplasm. Germplasm studies enable plant breeders to incorporate novel genetic resources into breeding pipelines to improve valuable agronomic traits. We conducted comprehensive analyses of the 19,652 soybean accessions in the USDA-ARS germplasm collection, genotyped with the SoySNP50K iSelect BeadChip SNP array, to elucidate the quantitative properties of subpopulations inferred through hierarchical clustering with Ward's D agglomeration method and Nei's standard genetic distance. Based on the linkage disequilibrium of unlinked loci, we estimated the effective population size to be approximately 106 individuals. The cladogram indicated eight major clusters, each with particular properties with regard to major quantitative traits. Among these, cluster 3 represents tropical and semi-tropical genetic material, cluster 5 displays large seeds and may represent food-grade germplasm, and cluster 7 represents the undomesticated material in the germplasm collection. The average FST among clusters was 0.22, and a total of 914 SNPs were exclusive to specific clusters. Our classification and characterization of the germplasm collection into major clusters provide valuable information about the genetic resources available to soybean breeders and researchers.
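
The clustering step described in the abstract can be sketched as follows. This is a minimal illustration in Python (NumPy/SciPy), not the authors' code or data. It assumes biallelic SNP genotypes coded 0/1/2, treats each accession as a "population" whose allele frequency is its genotype divided by two, and computes Nei's (1972) standard genetic distance D = -ln(Jxy / sqrt(Jx Jy)) between every pair before Ward agglomeration. The helper names (`nei_standard_distance`, `ward_clusters`, `simulate_inbred_panel`) and the simulated panel are hypothetical. It also assumes that "Ward's D" refers to R's `hclust(method = "ward.D")`; because SciPy's `ward` method squares its inputs inside the Lance-Williams update (matching R's "ward.D2"), the sketch passes square-rooted distances so that the merges are driven by D itself.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def nei_standard_distance(geno):
    """Pairwise Nei (1972) standard genetic distance between accessions.

    geno: (n_accessions, n_loci) array of biallelic SNP genotypes coded 0/1/2.
    Each accession is treated as a population with allele frequency p = g / 2.
    """
    p = np.asarray(geno, dtype=float) / 2.0  # frequency of the counted allele
    q = 1.0 - p                               # frequency of the other allele
    n_loci = p.shape[1]
    # J_xy: mean over loci of sum_k p_xk * p_yk (summed over both alleles)
    jxy = (p @ p.T + q @ q.T) / n_loci
    jx = np.diag(jxy)                         # J_x is the diagonal of J_xy
    identity = jxy / np.sqrt(np.outer(jx, jx))  # Nei's normalized identity I
    # Clip to (0, 1]: guards against log(0) and rounding slightly above 1
    identity = np.clip(identity, 1e-12, 1.0)
    d = -np.log(identity)
    d = (d + d.T) / 2.0                       # enforce exact symmetry
    np.fill_diagonal(d, 0.0)
    return d


def ward_clusters(dist, n_clusters):
    """Ward agglomeration on a square distance matrix, cut into n_clusters groups."""
    condensed = squareform(dist, checks=False)
    # SciPy's 'ward' squares distances in its Lance-Williams update (like R's
    # ward.D2); passing sqrt(D) makes it operate on D itself, as R's ward.D does.
    tree = linkage(np.sqrt(condensed), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def simulate_inbred_panel(n_groups=3, per_group=5, n_loci=200, flip=0.05, seed=1):
    """Hypothetical toy panel: homozygous lines derived from a few founders."""
    rng = np.random.default_rng(seed)
    founders = rng.integers(0, 2, size=(n_groups, n_loci)) * 2  # 0 or 2
    rows, truth = [], []
    for k, founder in enumerate(founders):
        for _ in range(per_group):
            g = founder.copy()
            mask = rng.random(n_loci) < flip
            g[mask] = 2 - g[mask]             # switch homozygous genotype
            rows.append(g)
            truth.append(k)
    return np.array(rows), np.array(truth)


if __name__ == "__main__":
    geno, truth = simulate_inbred_panel()
    labels = ward_clusters(nei_standard_distance(geno), n_clusters=3)
    print("true groups:   ", truth)
    print("Ward clusters: ", labels)
```

For fully homozygous accessions (typical of soybean lines), Jx = Jy = 1 and D reduces to -ln of the fraction of loci at which two accessions share a genotype. With real data one would cut the tree at eight clusters (`n_clusters=8`) to match the eight major clusters reported above; the toy panel only checks that three well-separated founder groups are recovered.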

Journal: Plant Genetic Resources (ISSN 1479-2621; EISSN 1479-263X)

Supplementary materials
  • Xavier et al. supplementary material 1 (PDF, 1.7 MB)
  • Xavier et al. supplementary material 2 (file type unknown, 1.3 MB)