Modern Dimension Reduction

Philip D. Waggoner

doi:10.1017/9781108981767

Data are not only ubiquitous in society but are also increasingly complex in both size and dimensionality. Dimension reduction gives researchers and scholars the ability to make such complex, high-dimensional data spaces simpler and more manageable. This Element offers readers a suite of modern unsupervised dimension reduction techniques, along with hundreds of lines of R code, for efficiently representing the original high-dimensional data space in a simplified, lower-dimensional subspace. Beginning with the earliest dimension reduction technique, principal components analysis, and using real social science data, I introduce and walk readers through the application of the following techniques: locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), self-organizing maps, and deep autoencoders. The result is a well-stocked toolbox of unsupervised algorithms for tackling the complexities of the high-dimensional data so common in modern society. All code is publicly accessible on GitHub.
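The core idea is to represent a high-dimensional data space in a lower-dimensional subspace, starting from principal components analysis. A minimal sketch of that idea in base R follows. It assumes the built-in `USArrests` data (50 observations, 4 variables) and `stats::prcomp`, chosen only for illustration. It is not the Element's own code or data.

```r
# Minimal PCA sketch (illustrative assumption: base R's USArrests data and
# stats::prcomp; not the Element's own code or data).
X <- scale(USArrests)          # center and scale each of the 4 variables
pca <- prcomp(X)               # principal components computed via SVD

# Represent each of the 50 observations in a 2-dimensional subspace
# spanned by the first two principal components.
Z <- pca$x[, 1:2]

# Share of total variance captured by each component, and cumulatively.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained), 3))
head(Z)
```

The cumulative variance shows how much of the original 4-dimensional structure the 2-dimensional representation retains. The nonlinear techniques listed above replace this linear projection with other ways of preserving structure.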
