Measuring and Modeling Neighborhoods

Published online by Cambridge University Press:  02 February 2024

CORY McCARTAN*
Affiliation:
New York University, United States
JACOB R. BROWN*
Affiliation:
Boston University, United States
KOSUKE IMAI*
Affiliation:
Harvard University, United States
*Corresponding author: Cory McCartan, Faculty Fellow, Center for Data Science, New York University, United States, corymccartan@nyu.edu.
Jacob R. Brown, Assistant Professor, Department of Political Science, Boston University, United States, jbrown13@bu.edu.
Kosuke Imai, Professor, Department of Government and Department of Statistics, Harvard University, United States, imai@harvard.edu.

Abstract

Granular geographic data present new opportunities to understand how neighborhoods are formed, and how they influence politics. At the same time, the inherent subjectivity of neighborhoods creates methodological challenges in measuring and modeling them. We develop an open-source survey instrument that allows respondents to draw their neighborhoods on a map. We also propose a statistical model to analyze how the characteristics of respondents and local areas determine subjective neighborhoods. We conduct two surveys: collecting subjective neighborhoods from voters in Miami, New York City, and Phoenix, and asking New York City residents to draw a community of interest for inclusion in their city council district. Our analysis shows that, holding other factors constant, white respondents include census blocks with more white residents in their neighborhoods. Similarly, Democrats and Republicans are more likely to include co-partisan areas. Furthermore, our model provides more accurate out-of-sample predictions than standard neighborhood measures.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of American Political Science Association
Figure 1. Map with Brush Tool Used to Draw Neighborhoods

Figure 2. Descriptive Statistics for Respondent Neighborhoods

Figure 3. Model Schematic
Note: The respondent’s location is indicated by the black house in the center. Blocks are labeled in the order in which they are considered for inclusion in the neighborhood, with the number indicating the graph-theoretical distance and the letter the spatial distance tiebreaker. Blocks shaded purple have been included in the neighborhood, whereas blocks shaded light orange have been excluded. Parentheses around a block label indicate that it will not be considered for inclusion in the neighborhood because none of its neighboring blocks which are closer to the respondent belong to the neighborhood.
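
The consideration order and the skipping rule described in the Figure 3 note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `consideration_order`, `grow_neighborhood`, `adjacency`, `centroids`, and `include` are hypothetical, "closer to the respondent" is read as a smaller graph distance, the respondent's own block is assumed to be included from the start, and the `include` callback stands in for the model's inclusion decision.

```python
from collections import deque
from math import hypot


def consideration_order(adjacency, centroids, home):
    """Return (order, hops): blocks sorted by graph distance from `home`,
    with ties broken by straight-line distance between centroids."""
    hops = {home: 0}
    queue = deque([home])
    while queue:  # breadth-first search yields the hop count of each block
        block = queue.popleft()
        for nb in adjacency[block]:
            if nb not in hops:
                hops[nb] = hops[block] + 1
                queue.append(nb)

    hx, hy = centroids[home]

    def spatial(block):
        x, y = centroids[block]
        return hypot(x - hx, y - hy)

    order = sorted((b for b in hops if b != home),
                   key=lambda b: (hops[b], spatial(b)))
    return order, hops


def grow_neighborhood(adjacency, centroids, home, include):
    """Visit blocks in consideration order and return (included, skipped).

    `include(block)` is a placeholder for the inclusion decision.
    """
    order, hops = consideration_order(adjacency, centroids, home)
    included = {home}  # assumption: the respondent's own block is always in
    skipped = []       # blocks never considered (parenthesized in Figure 3)
    for block in order:
        # A block is considered only if some neighbor that is closer to
        # the respondent (fewer hops) already belongs to the neighborhood.
        has_closer_member = any(
            nb in included and hops[nb] < hops[block]
            for nb in adjacency[block]
        )
        if not has_closer_member:
            skipped.append(block)
            continue
        if include(block):
            included.add(block)
    return included, skipped
```

Because a block is skipped whenever none of its closer neighbors was included, the resulting neighborhood is always connected to the respondent's block through the adjacency graph.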

Figure 4. Illustration of Kernel Function across a Range of Values of the $ \alpha $ Parameter, Indicated by Different Colors
Note: The length scale shown here is arbitrary; in the model, it is estimated as the L parameter.

Figure 5. Selected Full Model Coefficient Posteriors, Scaled to Show the Percentage Point Change in Probability of a Block’s Inclusion for a Baseline Probability of 50%
Note: Plotted are 90% and 50% credible intervals, with posterior medians displayed to the right of each interval. Section S5 of the Supplementary Material contains the full results table for the other variables specified in the “Model Specification” section.

Figure 6. Posterior Median of the Difference in F1 Scores between a Neighborhood Predicted by the Model and a Circular Neighborhood of the Same Radius (Top) or a Census Tract (Bottom)
Note: The boxplot shows the variation in this median difference across the respondents included in the model fitting (left plot) and excluded from the model fitting (right plot). Positive values indicate the model outperforming the circular baseline, on average, for a particular respondent. The baseline model includes geographic information only while the full model also includes demographic information. Section S5 of the Supplementary Material contains the full results tables for the full and baseline models.
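
The F1 comparison in Figure 6 can be illustrated with a small sketch. It assumes that a neighborhood is represented as a set of block identifiers and that every block counts equally; the caption does not say whether blocks are weighted by area or population, so the unweighted version here is an assumption. The function names `f1_score` and `f1_difference` are hypothetical. The figure reports the posterior median of this difference over model draws, whereas the sketch computes the difference for a single predicted neighborhood.

```python
def f1_score(predicted, drawn):
    """F1 overlap between a predicted set of blocks and the drawn set."""
    predicted, drawn = set(predicted), set(drawn)
    true_pos = len(predicted & drawn)
    if true_pos == 0:
        return 0.0  # no overlap (also covers empty inputs)
    precision = true_pos / len(predicted)  # share of predicted blocks drawn
    recall = true_pos / len(drawn)         # share of drawn blocks predicted
    return 2 * precision * recall / (precision + recall)


def f1_difference(model_blocks, baseline_blocks, drawn_blocks):
    """Positive when the model prediction matches the drawn neighborhood
    better than the baseline (for example, a circle or a census tract)."""
    return (f1_score(model_blocks, drawn_blocks)
            - f1_score(baseline_blocks, drawn_blocks))
```

A positive `f1_difference` corresponds to the "model outperforming the baseline" reading described in the note.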

Figure 7. The Left Plot Shows the Racial Demographics of Area Surrounding the Example Respondent
Note: The subjective neighborhood drawn by this respondent is indicated by the solid black line and each census block is shaded based on the percent white of its population. The right plot shows the difference in the posterior probability of a block being included in the respondent’s neighborhood between the full and baseline models. The baseline model includes geographic information only while the full model also includes demographic information. Blue areas are relatively more likely to be included under the full model, while orange areas are relatively less likely to be included.

Figure 8. Descriptive Statistics for Respondent Communities of Interest

Figure 9. Selected Full Model Coefficient Posteriors, Scaled to Show the Percentage Point Change in Probability of a Block’s Inclusion for a Baseline Probability of 50%
Note: Plotted are 90% and 50% credible intervals, with posterior medians displayed to the right of each interval. Section S5 of the Supplementary Material contains the full results table.

Figure 10. Posterior Median of the Difference in F1 Scores between a Community of Interest Predicted by the Model and a Circular Neighborhood of the Same Radius (Top) or a Census Tract (Bottom)
Note: The boxplot shows the variation in this median difference across the respondents included in the model fitting (left plot) and excluded from the model fitting (right plot). Positive values indicate the model outperforming the circle (tract), on average, for a particular respondent. The baseline model includes geographic information only while the full model also includes demographic information.

Figure 11. On the Left, a Map Visualizing the Consensus Community of a Synthetic Residential Population of a Single Census Block, Which is Marked with a White Asterisk
Note: Darker blocks are those which are included in a higher proportion of synthetic residents’ predicted neighborhoods. On the right, the trade-off between the size of the community of interest and the degree of consensus is plotted for communities in two areas: one with high and one with low racial diversity.
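
The quantities behind Figure 11 can be sketched as follows, assuming each synthetic resident's predicted neighborhood is given as a set of block identifiers. The names `inclusion_proportions` and `consensus_community`, and the idea of defining the community by a simple cutoff on the inclusion proportion, are illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter


def inclusion_proportions(neighborhoods):
    """Share of predicted neighborhoods that contain each block."""
    if not neighborhoods:
        raise ValueError("at least one predicted neighborhood is required")
    counts = Counter(block for nbhd in neighborhoods for block in set(nbhd))
    total = len(neighborhoods)
    return {block: count / total for block, count in counts.items()}


def consensus_community(neighborhoods, threshold):
    """Blocks included by at least `threshold` (0 to 1) of the residents."""
    shares = inclusion_proportions(neighborhoods)
    return {block for block, share in shares.items() if share >= threshold}
```

Raising `threshold` demands more agreement and can only shrink the resulting set, which mirrors the trade-off between community size and degree of consensus shown in the right panel.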

Supplementary material

McCartan et al. supplementary material 1 (File, 368.7 KB)

McCartan et al. supplementary material 2 (File, 4.2 MB)