
An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering Algorithms

Published online by Cambridge University Press:  01 January 2025

Glenn W. Milligan*
Affiliation:
The Ohio State University
* Requests for reprints should be sent to Glenn W. Milligan, Faculty of Management Sciences, 356 Hagerty Hall, The Ohio State University, Columbus, Ohio 43210.

Abstract

An evaluation of several clustering methods was conducted. Artificial clusters that exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.
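The design described in the abstract (construct well-separated clusters, perturb them, then compare K-means started from random seed points against a more informed start) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the Gaussian cluster generator, the single additive-noise perturbation, the farthest-first seeding heuristic standing in for an "alternative starting procedure", and the Rand index as the recovery measure. The abstract does not state which error types, starting procedures, or recovery measure the study used, so this is not a reproduction of the study's method.

```python
import random
from itertools import combinations


def make_clusters(centers, n_per, spread, rng):
    """Draw n_per Gaussian points around each center; return points and true labels."""
    points, labels = [], []
    for k, center in enumerate(centers):
        for _ in range(n_per):
            points.append(tuple(rng.gauss(m, spread) for m in center))
            labels.append(k)
    return points, labels


def perturb(points, sd, rng):
    """Hypothetical error type: add Gaussian noise to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sd) for x in p) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd-style K-means started from the given seed points."""
    centroids = list(seeds)
    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign


def farthest_first_seeds(points, k):
    """Illustrative informed start: each new seed is the point farthest from all chosen seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    centers = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]  # assumed, well separated
    clean, truth = make_clusters(centers, n_per=20, spread=0.5, rng=rng)
    noisy = perturb(clean, sd=0.5, rng=rng)

    # Average recovery over several random-seed starts versus one informed start.
    random_scores = [
        rand_index(truth, kmeans(noisy, rng.sample(noisy, 3))) for _ in range(20)
    ]
    informed = rand_index(truth, kmeans(noisy, farthest_first_seeds(noisy, 3)))
    print("mean Rand index, random seeds:", sum(random_scores) / len(random_scores))
    print("Rand index, farthest-first seeds:", informed)
```

Random seed points can land two seeds in the same true cluster, which K-means may not recover from; an informed start avoids that failure mode. The size of the gap depends entirely on the assumed cluster layout and noise level.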

Information

Type
Original Paper
Copyright
Copyright © 1980 The Psychometric Society

