A comparative performance of clustering procedures for mixture of qualitative and quantitative data – an application to black gram

Rupam Kumar Sarkar; A. R. Rao; S. D. Wahi; K. V. Bhat

doi:10.1017/S1479262111000827

Published online by Cambridge University Press: 25 July 2011

Rupam Kumar Sarkar: Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
A. R. Rao*: Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
S. D. Wahi: Biometrics Division, Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
K. V. Bhat: National Bureau of Plant Genetic Resources, New Delhi 110 012, India
*Corresponding author. E-mail: arrao@iasri.res.in

Abstract

Knowledge of the genetic diversity of germplasm of breeding material is invaluable in crop improvement programmes. Qualitative and quantitative data are frequently used separately to assess the genetic diversity of crop genotypes. When diversity is assessed on qualitative and quantitative traits separately, a problem can arise: the clusters formed from one type of data may not correspond to those formed from the other. This study compares five clustering procedures on black gram genotypes, using qualitative traits, quantitative traits and mixture data, with the weighted average of the observed proportion of misclassification as the criterion. The INDOMIX- and PRINQUAL-based clustering procedures, i.e. the INDOMIX and PRINQUAL methods used in conjunction with k-means clustering, perform better than the other clustering procedures, followed by clustering based on either quantitative or qualitative data alone. The INDOMIX- and PRINQUAL-based procedures can help breeders capture the variation present in qualitative and quantitative trait data simultaneously, and resolve the ambiguity over the degree of correspondence between clusterings based on either qualitative or quantitative traits alone.
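The abstract names the comparison criterion but does not define it. As a minimal sketch only, the Python below assumes that each procedure's cluster labels are checked against labels from some reclassification step (for example a discriminant analysis refitted on the clusters), and that each cluster's misclassification proportion is weighted by its share of the genotypes. The function name `weighted_misclassification` and the example labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def weighted_misclassification(assigned, reclassified):
    """Cluster-size-weighted average of per-cluster misclassification proportions.

    assigned     -- cluster label given to each genotype by a clustering procedure
    reclassified -- label given to the same genotype by a reclassification step
                    (assumed here; the abstract does not specify one)
    """
    if len(assigned) != len(reclassified):
        raise ValueError("label sequences must have equal length")
    if not assigned:
        raise ValueError("at least one genotype is required")

    n = len(assigned)
    sizes = Counter(assigned)  # genotypes per cluster
    errors = Counter(a for a, r in zip(assigned, reclassified) if a != r)

    # p_k = errors_k / size_k is the observed misclassification proportion
    # of cluster k; w_k = size_k / n is its assumed weight.
    return sum((sizes[k] / n) * (errors[k] / sizes[k]) for k in sizes)


# Hypothetical labels for eight genotypes in three clusters.
assigned = [1, 1, 1, 2, 2, 2, 3, 3]
reclassified = [1, 1, 2, 2, 2, 2, 3, 1]
score = weighted_misclassification(assigned, reclassified)
```

Under this assumption a lower score indicates a better-separated clustering. With weights proportional to cluster size the value equals the overall share of misclassified genotypes; a different weighting scheme would change the result.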

Keywords

cluster analysis; genetic diversity; mixture data; qualitative traits; quantitative traits; RAPD

Information

Type: Research Article
Information: Plant Genetic Resources, Volume 9, Issue 4, December 2011, pp. 523–527

DOI: https://doi.org/10.1017/S1479262111000827
Copyright: Copyright © NIAB 2011


