Out of One, Many: Using Language Models to Simulate Human Samples

Lisa P. Argyle; Ethan C. Busby; Nancy Fulda; Joshua R. Gubler; Christopher Rytting; David Wingate

doi:10.1017/pan.2023.2


Published online by Cambridge University Press: 21 February 2023


Lisa P. Argyle*: Department of Political Science, Brigham Young University, Provo, UT, USA. e-mail: lpargyle@byu.edu
Ethan C. Busby: Department of Political Science, Brigham Young University, Provo, UT, USA. e-mail: ethan.busby@byu.edu
Nancy Fulda: Department of Computer Science, Brigham Young University, Provo, UT, USA. e-mail: nfulda@cs.byu.edu
Joshua R. Gubler: Department of Political Science, Brigham Young University, Provo, UT, USA. e-mail: jgub@byu.edu
Christopher Rytting: Department of Computer Science, Brigham Young University, Provo, UT, USA. e-mail: christophermichaelrytting@gmail.com
David Wingate: Department of Computer Science, Brigham Young University, Provo, UT, USA. e-mail: wingated@cs.byu.edu
*Corresponding author: Lisa P. Argyle


Abstract

We propose and explore the possibility that language models can be studied as effective proxies for specific human subpopulations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the “algorithmic bias” within one such tool, the GPT-3 language model, is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property algorithmic fidelity and explore its extent in GPT-3. We create “silicon samples” by conditioning the model on thousands of sociodemographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and sociocultural context that characterizes human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.
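The silicon-sampling idea in the abstract has two mechanical parts: rendering each real respondent's sociodemographics as a first-person backstory that conditions the model, and comparing the resulting silicon responses with the human ones. The sketch below illustrates both under stated assumptions and is not the authors' code. The backstory template and its field names (`race`, `gender`, `ideology`, `party`) are hypothetical, the language-model call itself is omitted (silicon answers are passed in as a plain list standing in for parsed model completions), and the two comparison measures are standard choices picked for illustration: total variation distance for marginal response distributions, and Cramér's V for whether an association (e.g., party and vote) is reproduced.

```python
from collections import Counter
from math import sqrt


def backstory_prompt(person: dict) -> str:
    """Render one respondent's survey demographics as a first-person backstory.

    Hypothetical template: field names and wording are illustrative
    assumptions, not the published prompt.
    """
    return (
        f"Racially, I am {person['race']}. I am {person['gender']}. "
        f"Ideologically, I describe myself as {person['ideology']}. "
        f"Politically, I am a {person['party']}. "
        # The prompt ends mid-sentence so the model's continuation is the answer.
        "In the 2016 presidential election, I voted for"
    )


def distribution(responses: list) -> dict:
    """Empirical share of each response category."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two categorical distributions (0 = identical, 1 = disjoint)."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def cramers_v(xs: list, ys: list) -> float:
    """Cramér's V for the association between two paired categorical variables."""
    n = len(xs)
    row_totals, col_totals = Counter(xs), Counter(ys)
    cells = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        raise ValueError("Cramér's V needs at least two levels in each variable")
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            chi2 += (cells[(a, b)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Toy usage with made-up data (no real survey or model output).
human_party = ["Democrat", "Democrat", "Republican", "Republican"]
human_vote = ["Clinton", "Clinton", "Trump", "Trump"]
silicon_vote = ["Clinton", "Trump", "Trump", "Trump"]  # stand-in for parsed completions

print(backstory_prompt({"race": "white", "gender": "male",
                        "ideology": "conservative", "party": "Republican"}))
print(total_variation(distribution(human_vote), distribution(silicon_vote)))  # 0.25
print(cramers_v(human_party, human_vote), cramers_v(human_party, silicon_vote))
```

In a real application the silicon list would come from parsing the model's continuation of each backstory prompt, and the comparisons would be repeated within demographic subgroups rather than only on the pooled sample; both steps are left out of this sketch.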

Keywords

artificial intelligence; machine learning; computational social science; public opinion

Type: Article
Information: Political Analysis, Volume 31, Issue 3, July 2023, pp. 337–351

DOI: https://doi.org/10.1017/pan.2023.2
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology


Footnotes

Edited by Jeff Gill


Argyle et al. Dataset

https://doi.org/10.7910/DVN/JPV20K

Argyle et al. supplementary material (PDF, 1.2 MB)
