
Modern Dimension Reduction

Published online by Cambridge University Press: 10 July 2021

Philip D. Waggoner
Affiliation: University of Chicago

Summary

Data are not only ubiquitous in society, but are increasingly complex in both size and dimensionality. Dimension reduction offers researchers and scholars the ability to make such complex, high-dimensional data spaces simpler and more manageable. This Element offers readers a suite of modern unsupervised dimension reduction techniques, along with hundreds of lines of R code, to efficiently represent the original high-dimensional data space in a simplified, lower-dimensional subspace. Launching from the earliest dimension reduction technique, principal components analysis, and using real social science data, I introduce and walk readers through the application of the following techniques: locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection, self-organizing maps, and deep autoencoders. The result is a well-stocked toolbox of unsupervised algorithms for tackling the complexities of high-dimensional data so common in modern society. All code is publicly accessible on GitHub.
Type: Element
Online ISBN: 9781108981767
Publisher: Cambridge University Press
Print publication: 05 August 2021
