An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering Algorithms

Glenn W. Milligan

doi:10.1007/BF02293907

Published online by Cambridge University Press: 01 January 2025

Glenn W. Milligan*
Affiliation: The Ohio State University
* Requests for reprints should be sent to Glenn W. Milligan, Faculty of Management Sciences, 356 Hagerty Hall, The Ohio State University, Columbus, Ohio 43210.

Abstract

An evaluation of several clustering methods was conducted. Artificial clusters that exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.
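The design summarized above (construct known clusters, perturb them, cluster the perturbed data, score recovery) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the study's procedure: the function names `rand_index`, `perturb`, `kmeans`, and `recovery` are hypothetical, additive Gaussian error stands in for the six error types (which the abstract does not list), the Rand index stands in for the recovery measure, and the "informed" seeds are the true cluster centers, a placeholder for the two alternative starting procedures that the abstract does not describe.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs on which two partitions agree (the Rand index)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def perturb(points, sd, rng):
    """Add independent Gaussian error to every coordinate (assumed error type)."""
    return [tuple(x + rng.gauss(0.0, sd) for x in p) for p in points]


def kmeans(points, seeds, max_iter=100):
    """Plain K-means from the given seed points; returns one label per point."""
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def recovery(seed_strategy, sd=1.0, n_per_cluster=20, rng_seed=0):
    """Build three well-separated clusters, perturb them, and score recovery."""
    rng = random.Random(rng_seed)
    true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # illustrative layout
    points, truth = [], []
    for label, (cx, cy) in enumerate(true_centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            truth.append(label)
    noisy = perturb(points, sd, rng)
    if seed_strategy == "random":
        seeds = rng.sample(noisy, len(true_centers))  # random seed points
    elif seed_strategy == "informed":
        seeds = true_centers  # placeholder for a better starting procedure
    else:
        raise ValueError("seed_strategy must be 'random' or 'informed'")
    return rand_index(truth, kmeans(noisy, seeds))
```

Averaging `recovery("random", rng_seed=s)` and `recovery("informed", rng_seed=s)` over many values of `s` would mimic the Monte Carlo comparison on a very small scale; the numbers such a toy produces say nothing about the results reported in the article.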

Keywords

clustering algorithms; clustering validation; Monte Carlo research

Information

Type: Original Paper
Information: Psychometrika, Volume 45, Issue 3, September 1980, pp. 325–342

DOI: https://doi.org/10.1007/BF02293907
Copyright: Copyright © 1980 The Psychometric Society
