The Power of Characters: Evaluating Machine Learning-Modified Bayesian Improved Surname Geocoding Inference of Race in Redistricting

Published online by Cambridge University Press: 22 May 2024

John A. Curiel*, Survey Research Group, YouGov Inc., New York City, NY, USA
Kevin DeLuca, Department of Political Science, Yale University, New Haven, CT, USA
*Corresponding author: John A. Curiel; Email: jcuriel@mit.edu

Abstract

Identifying racial disparities in policy and politics is a pressing area of research in the United States. Whereas early work relied on potentially noisy correlations between county or precinct demographics and election outcomes, the advent of Bayesian Improved Surname Geocoding (BISG) vastly improved the estimation of race by drawing on voter lists. Machine learning (ML)-modified BISG, in turn, offers accuracy gains over the static, and potentially outdated, surname dictionaries used in traditional BISG. However, it is unclear to what extent these improvements in estimating voters' race might substantively alter the policy and political implications of redistricting. We therefore assess the potential gains of ML-modified BISG in estimating race for the purpose of drawing majority-minority districts. We evaluate an ML-modified BISG program against traditional BISG estimates in correctly estimating the race of voters when creating majority-minority congressional districts in North Carolina and Georgia, and state assembly districts in Wisconsin. Our results demonstrate that ML-modified BISG offers substantive gains over traditional BISG, especially in diverse political geographic units. Further, we find meaningful improvements in accuracy when estimating the racial composition of majority-minority districts. We conclude with recommendations on when and how to use the two methods, as well as how to ensure transparency and confidence in BISG-related research.
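
For readers unfamiliar with BISG, the core calculation is Bayes' rule. The Python sketch below assumes the textbook formulation, in which surname and geography are conditionally independent given race, so that P(race | surname, geography) is proportional to P(race | surname) x P(geography | race). It is illustrative only: it is not the authors' code or any ML-modified BISG package. The five race categories, the toy probabilities, and the helper names `bisg_posterior` and `expected_share` are hypothetical. As the abstract describes it, the ML-modified variant replaces or supplements the static surname dictionary with a learned model; that model is not reproduced here. Estimating a district's racial composition as the mean of its voters' posteriors is likewise an assumption for illustration, not necessarily the paper's procedure.

```python
# Minimal BISG sketch: illustrative only, not the authors' code or any ML-BISG package.
# Assumes surname and geography are conditionally independent given race, so
#   P(race | surname, geo) is proportional to P(race | surname) * P(geo | race).
# The five race categories and all probabilities below are hypothetical toy values.

RACES = ("white", "black", "hispanic", "asian", "other")


def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine a surname prior with a geographic likelihood via Bayes' rule.

    p_race_given_surname: dict race -> P(race | surname), e.g. from a surname dictionary
    p_geo_given_race: dict race -> P(geo | race), the share of each group's
        population living in the voter's geographic unit (e.g. a census block)
    Returns dict race -> normalized posterior P(race | surname, geo).
    """
    unnormalized = {r: p_race_given_surname[r] * p_geo_given_race[r] for r in RACES}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("surname and geography give zero support to every race")
    return {r: v / total for r, v in unnormalized.items()}


def expected_share(posteriors, race):
    """Hypothetical helper: a district's expected share of `race` as the mean posterior."""
    if not posteriors:
        raise ValueError("district has no voters")
    return sum(p[race] for p in posteriors) / len(posteriors)


# Toy voter: hypothetical surname prior and geographic likelihood.
surname_prior = {"white": 0.50, "black": 0.40, "hispanic": 0.05, "asian": 0.03, "other": 0.02}
geo_likelihood = {"white": 0.001, "black": 0.004, "hispanic": 0.001, "asian": 0.001, "other": 0.001}

post = bisg_posterior(surname_prior, geo_likelihood)
print(round(post["black"], 3))  # 0.727: geography shifts weight toward "black"

# A two-voter toy "district" in which the second voter is certainly white.
district = [post, {"white": 1.0, "black": 0.0, "hispanic": 0.0, "asian": 0.0, "other": 0.0}]
print(round(expected_share(district, "black"), 3))  # 0.364
```

Averaging posteriors, rather than counting each voter's single most likely race, carries individual-level uncertainty into the district estimate. This excerpt does not state which aggregation the paper itself uses.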

Information

Type: Original Article
License: Creative Commons Attribution (CC BY 4.0). This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press on behalf of the State Politics and Policy Section of the American Political Science Association
Figure 1. Distribution of differences in error by race assignment and state.

Figure 2. Surname dictionary BISG vs. ML differences in error – North Carolina.

Figure 3. Surname dictionary BISG vs. ML differences in error – Georgia.

Figure 4. BISG vs. ZRP accuracy – North Carolina.

Figure 5. BISG vs. ZRP accuracy – Georgia.

Figure 6. BISG vs. ZRP accuracy – Wisconsin state assembly proposals.

Supplementary material: Curiel and DeLuca Dataset