
An approach to the development of a core set of germplasm using a mixture of qualitative and quantitative data

  • Rupam Kumar Sarkar (a1), Prabina Kumar Meher (a1), S. D. Wahi (a1), T. Mohapatra (a2) and A. R. Rao (a1)...

Developing a representative, well-diversified core set with minimal duplication of accessions and maximal diversity from a larger germplasm population is essential for breeders involved in crop improvement programmes. Most existing methodologies for identifying a core set are based on either qualitative or quantitative data. In this study, an approach to identifying a core set of germplasm from a mixture of qualitative (single nucleotide polymorphism genotyping) and quantitative data was proposed. For this purpose, six combined distance measures, formed by pairing each of three distance measures for quantitative data with each of two distance measures for qualitative data, were proposed and evaluated. The combined distance matrices were used as inputs to seven clustering procedures to classify the germplasm population into homogeneous groups. Subsequently, the optimum numbers of clusters obtained from all clustering procedures under the different combined distance measures were identified on a consensus basis, and the average cluster robustness value across the identified optimum clusters was calculated for each clustering procedure. Three allocation methods were then applied to sample accessions from the clusters identified by the clustering procedure with the highest average cluster robustness value, and the sampled accessions were used to formulate a core set. In addition, an index was proposed for evaluating the diversity of the core set. The results reveal that the combined distance measure A1B2 (the average range-standardized absolute difference for quantitative data combined with the rescaled average absolute difference for qualitative data), with three clusters identified by the k-means clustering algorithm and the proportional allocation method, was suitable for identifying a core set from a collection of rice germplasm.
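The combined distance measure A1B2 described above can be illustrated with a minimal sketch. The abstract names only its two components, so several details here are assumptions: SNP genotypes are taken to be coded 0/1/2, "rescaled" is read as dividing the average absolute difference by its largest possible value (2), and the two components are combined as a weighted average with a hypothetical weight `w` (equal weighting by default). The function names are illustrative and are not the authors' implementation.

```python
from itertools import combinations


def a1_distance(x, y, ranges):
    """Average range-standardized absolute difference over quantitative traits."""
    # Traits with zero range carry no information and are skipped.
    terms = [abs(a - b) / r for a, b, r in zip(x, y, ranges) if r > 0]
    return sum(terms) / len(terms) if terms else 0.0


def b2_distance(g, h, max_diff=2):
    """Average absolute difference of SNP codes, rescaled to [0, 1].

    ASSUMPTION: genotypes are coded 0/1/2, so the largest per-locus
    difference is 2 and dividing by it maps the average onto [0, 1].
    """
    return sum(abs(a - b) for a, b in zip(g, h)) / (len(g) * max_diff)


def combined_a1b2(quant, snp, w=0.5):
    """Symmetric matrix of w * A1 + (1 - w) * B2 for all pairs of accessions.

    quant: one list of trait values per accession.
    snp:   one list of 0/1/2 genotype codes per accession.
    ASSUMPTION: the weighted-average combination and w = 0.5 are hypothetical.
    """
    n = len(quant)
    # Range of each trait across all accessions, used for standardization.
    ranges = [max(col) - min(col) for col in zip(*quant)]
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        value = (w * a1_distance(quant[i], quant[j], ranges)
                 + (1 - w) * b2_distance(snp[i], snp[j]))
        d[i][j] = d[j][i] = value
    return d
```

Changing `w` shifts emphasis between the trait and marker components; the appropriate weighting is not stated in the abstract, and the resulting matrix would be the input to a clustering procedure.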
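The proportional allocation method mentioned above assigns each cluster a share of the core set in proportion to its size. The sketch below is one way to do this under stated assumptions: fractional quotas are rounded with the largest-remainder rule so the allocations sum exactly to the requested core size, and accessions are then drawn by simple random sampling within each cluster. Both the rounding rule and the within-cluster sampling are assumptions rather than details given in the abstract, and the function names are hypothetical.

```python
import random


def proportional_allocation(cluster_sizes, core_size):
    """Split core_size among clusters in proportion to their sizes."""
    total = sum(cluster_sizes)
    if not 0 <= core_size <= total:
        raise ValueError("core_size must lie between 0 and the number of accessions")
    quotas = [core_size * n / total for n in cluster_sizes]
    alloc = [int(q) for q in quotas]  # floor of each non-negative quota
    # ASSUMPTION: largest-remainder rounding hands the leftover slots to the
    # clusters whose quotas had the largest fractional parts (ties keep order).
    leftover = core_size - sum(alloc)
    order = sorted(range(len(quotas)), key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in order[:leftover]:
        alloc[c] += 1
    return alloc


def sample_core(clusters, core_size, seed=None):
    """Draw the allocated number of accessions from each cluster at random.

    clusters: list of lists of accession identifiers, one list per cluster.
    ASSUMPTION: simple random sampling without replacement within clusters.
    """
    rng = random.Random(seed)
    alloc = proportional_allocation([len(c) for c in clusters], core_size)
    return [acc for members, k in zip(clusters, alloc) for acc in rng.sample(members, k)]
```

Because each allocation is at most the ceiling of its quota, no cluster is asked for more accessions than it contains when the core size does not exceed the collection size.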
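The abstract does not define cluster robustness. One common definition, from resampling-based consensus clustering, takes the consensus index of two accessions as the fraction of resampling runs, among those in which both were sampled, that placed them in the same cluster; a cluster's robustness is the mean consensus index over pairs of its members. The sketch below assumes that definition and a hypothetical input format (one dict per resampling run, mapping accession index to cluster label); how the runs themselves are generated is not shown.

```python
from itertools import combinations


def consensus_matrix(n, runs):
    """Pairwise fraction of runs (where both items were sampled) that co-clustered them."""
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:
        # Sorted keys make every pair come out with i < j.
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        if sampled[i][j]:
            m[i][j] = m[j][i] = together[i][j] / sampled[i][j]
    return m


def cluster_robustness(m, reference):
    """Mean consensus index over member pairs of each reference cluster.

    Singleton clusters have no pairs and are left out of the result.
    """
    members = {}
    for item, label in enumerate(reference):
        members.setdefault(label, []).append(item)
    result = {}
    for label, items in members.items():
        pairs = list(combinations(items, 2))
        if pairs:
            result[label] = sum(m[i][j] for i, j in pairs) / len(pairs)
    return result


def average_robustness(m, reference):
    """Average of the per-cluster robustness values."""
    r = cluster_robustness(m, reference)
    if not r:
        raise ValueError("no cluster has two or more members")
    return sum(r.values()) / len(r)
```

Under this reading, the clustering procedure whose reference partition gives the largest `average_robustness` would be the one carried forward to the allocation step.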

Plant Genetic Resources
  • ISSN: 1479-2621
  • EISSN: 1479-263X