Unsupervised Machine Learning for Clustering in Political and Social Research

Philip D. Waggoner

doi:10.1017/9781108883955

Series: Elements in Quantitative and Computational Methods for the Social Sciences

Published online by Cambridge University Press: 15 December 2020

Author affiliation: Philip D. Waggoner, University of Chicago

Summary

In the age of data-driven problem-solving, applying sophisticated computational tools to explain substantive phenomena is a valuable skill. Yet applying such methods assumes an understanding of the data, structure, and patterns that influence the broader research program. This Element offers researchers and teachers an introduction to clustering, a prominent class of unsupervised machine learning for exploring and understanding latent, non-random structure in data. It covers a suite of widely used clustering techniques and provides R code and real data to facilitate interaction with the concepts. After setting the stage for clustering, the Element details the following algorithms: agglomerative hierarchical clustering, k-means clustering, and Gaussian mixture models, along with, at a higher level, fuzzy C-means clustering, DBSCAN, and partitioning around medoids (k-medoids) clustering.

Keywords

clustering, unsupervised machine learning, computational social science, R

Type: Element

DOI: https://doi.org/10.1017/9781108883955

Online ISBN: 9781108883955

Publisher: Cambridge University Press

Print publication: 28 January 2021
