Nationally Representative, Locally Misaligned: The Biases of Generative Artificial Intelligence in Neighborhood Perception

Published online by Cambridge University Press: 06 October 2025

Paige Bollen
Affiliation: Department of Political Science, Ohio State University, Columbus, OH, USA
Joe Higton
Affiliation: Department of Politics, New York University, New York, NY, USA
Melissa Sands*
Affiliation: Department of Government, London School of Economics, London, UK
*Corresponding author: Melissa Sands; Email: mlsands@gmail.com

Abstract

Researchers across disciplines increasingly use Generative Artificial Intelligence (GenAI) to label text and images or as pseudo-respondents in surveys. But of which populations are GenAI models most representative? We use an image classification task—assessing crowd-sourced street view images of urban neighborhoods in an American city—to compare assessments generated by GenAI models with those from a nationally representative survey and a locally representative survey of city residents. While GenAI responses, on average, correlate strongly with the perceptions of a nationally representative survey sample, the models poorly approximate the perceptions of those actually living in the city. Examining perceptions of neighborhood safety, wealth, and disorder reveals a clear bias in GenAI toward national averages over local perspectives. GenAI is also better at recovering relative distributions of ratings, rather than mimicking absolute human assessments. Our results provide evidence that GenAI performs particularly poorly in reflecting the opinions of hard-to-reach populations. Tailoring prompts to encourage alignment with subgroup perceptions generally does not improve accuracy and can lead to greater divergence from actual subgroup views. These results underscore the limitations of using GenAI to study or inform decisions in local communities but also highlight its potential for approximating “average” responses to certain types of questions. Finally, our study emphasizes the importance of carefully considering the identity and representativeness of human raters or labelers—a principle that applies broadly, whether GenAI tools are used or not.

Information

Type
Letter
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Political Methodology
Figure 1. GPT’s average evaluations of wealth, daytime safety, nighttime safety, and disorder compared to average evaluations of U.S. and Detroit samples. Note: The top panel shows human samples’ average perceptions plotted against GPT’s average assessments. Each dot represents an image. The diagonal dashed line represents where the two are perfectly equivalent. Correlation coefficients and LOESS and linear regression lines with 95% confidence intervals are shown. The bottom panel displays the outcome of pairwise two-sample t-tests comparing the means of human- and GPT-derived ratings for each image.

Figure 2. GPT’s average evaluations of wealth, daytime safety, nighttime safety, and disorder compared to average evaluations of women in the U.S. and Detroit samples. Note: The top panel shows human samples’ average perceptions plotted against GPT’s average assessments. Each dot represents an image. The diagonal dashed line represents where the two are perfectly equivalent. Correlation coefficients and LOESS and linear regression lines with 95% confidence intervals are shown.

Figure 3. GPT’s average evaluations of wealth, daytime safety, nighttime safety, and disorder compared to average evaluations of men in the U.S. and Detroit samples. Note: The top panel shows human samples’ average perceptions plotted against GPT’s average assessments. Each dot represents an image. The diagonal dashed line represents where the two are perfectly equivalent. Correlation coefficients and LOESS and linear regression lines with 95% confidence intervals are shown.

Supplementary material: File
Bollen et al. supplementary material (File, 9 MB)

Supplementary material: Link
Bollen et al. Dataset (Link)