A comparative performance of clustering procedures for mixture of qualitative and quantitative data – an application to black gram

  • Rupam Kumar Sarkar, A. R. Rao, S. D. Wahi and K. V. Bhat

Knowledge of the genetic diversity of germplasm of breeding material is invaluable in crop improvement programmes. Frequently, qualitative and quantitative data are used separately to assess the genetic diversity of crop genotypes. When diversity is assessed on qualitative and quantitative traits separately, a problem arises if the clusters formed from the two types of data do not correspond with each other. This study compares five clustering procedures in black gram genotypes, using qualitative traits, quantitative traits and mixture data, on the criterion of the weighted average of the observed proportion of misclassification. The INDOMIX- and PRINQUAL-based clustering procedures, i.e. the INDOMIX and PRINQUAL methods in conjunction with the k-means clustering procedure, perform better than the other clustering procedures, followed by clustering based on either quantitative or qualitative data alone. The INDOMIX- and PRINQUAL-based procedures can help breeders to capture the variation present in both qualitative and quantitative trait data simultaneously and to resolve the ambiguity over the degree of correspondence between clusterings based on either qualitative or quantitative traits alone.
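
The comparison criterion named in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so the version below is an assumption: each cluster's observed proportion of misclassification is the share of its genotypes that some classifier reassigns to a different cluster, and the weights are the relative cluster sizes. The function name `weighted_misclassification` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_misclassification(cluster_labels, reclassified_labels):
    """Return per-cluster misclassification proportions and their weighted average.

    Assumption (not stated in the abstract): weights are relative cluster sizes,
    and a genotype counts as misclassified when a classifier places it in a
    cluster other than the one the clustering procedure assigned.
    """
    if len(cluster_labels) != len(reclassified_labels):
        raise ValueError("label sequences must have the same length")
    if len(cluster_labels) == 0:
        raise ValueError("at least one genotype is required")

    # Number of genotypes assigned to each cluster by the clustering procedure.
    sizes = Counter(cluster_labels)
    # Number of genotypes per cluster that the classifier moved elsewhere.
    errors = Counter(
        c for c, r in zip(cluster_labels, reclassified_labels) if c != r
    )

    # Observed proportion of misclassification within each cluster.
    proportions = {c: errors[c] / n for c, n in sizes.items()}

    # Average of the proportions, weighted by relative cluster size.
    total = sum(sizes.values())
    weighted = sum((sizes[c] / total) * proportions[c] for c in sizes)
    return proportions, weighted


# Hypothetical labels for ten genotypes in three clusters.
assigned = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
reclassified = [0, 0, 0, 1, 1, 1, 1, 2, 0, 2]
proportions, weighted = weighted_misclassification(assigned, reclassified)
print(proportions, weighted)
```

Under this assumed definition, a lower weighted average indicates that the clusters are better separated, which is how one procedure could be ranked against another.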

Plant Genetic Resources
  • ISSN: 1479-2621
  • EISSN: 1479-263X
  • URL: /core/journals/plant-genetic-resources
Supplementary materials

  • Rao Supplementary Material 1: Word, 92 KB
  • Rao Supplementary Data 2: unknown file type, 6 KB
  • Rao Supplementary Data 1: unknown file type, 925 bytes
  • Rao Supplementary Data 3: unknown file type, 7 KB
  • Rao Supplementary Material 2: Word, 54 KB

