
Minmaxing of Bayesian Improved Surname Geocoding and Geography Level Ups in Predicting Race

Published online by Cambridge University Press:  29 November 2021

Jesse T. Clark
Postdoctoral Research Associate, Princeton University, Princeton, NJ
John A. Curiel
Assistant Professor, Ohio Northern University, Ada, OH, USA
Tyler S. Steelman*
Department of Political Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA. E-mail:
Corresponding author Tyler S. Steelman


Racial identification is a critical factor in understanding a multitude of important outcomes in many fields. However, inferring an individual’s race from ecological data is prone to bias and error. This process was only recently improved via Bayesian improved surname geocoding (BISG). With surname and geographic-based demographic data, it is possible to more accurately estimate individual racial identification than ever before. However, the level of geography used in this process varies widely. Whereas some existing work makes use of geocoding to place individuals in precise census blocks, a substantial portion either skips geocoding altogether or relies on estimation using surname or county-level analyses. Presently, the trade-offs of such variation are unknown. In this letter, we quantify those trade-offs through a validation of BISG on Georgia’s voter file using both geocoded and nongeocoded processes and introduce a new level of geography—ZIP codes—to this method. We find that when estimating the racial identification of White and Black voters, nongeocoded ZIP code-based estimates are acceptable alternatives. However, census blocks provide the most accurate estimations when imputing racial identification for Asian and Hispanic voters. Our results document the most efficient means to sequentially conduct BISG analysis to maximize racial identification estimation while simultaneously minimizing data missingness and bias.
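As a rough illustration of the updating step that BISG relies on, the sketch below combines a surname-based prior with a geographic likelihood using Bayes' rule: P(race | surname, geography) is proportional to P(race | surname) × P(geography | race). This is a minimal sketch, not the authors' implementation. The function name `bisg_posterior` and all probabilities are hypothetical values chosen for illustration; they do not come from the Georgia voter file or from Census tables. The geography could be a census block, ZIP code, or county, which is the choice the letter evaluates.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine a surname prior with a geographic likelihood via Bayes' rule.

    Both arguments are dicts keyed by the same race categories.
    Returns a dict of posterior probabilities that sum to 1.
    """
    # Unnormalized posterior: prior from the surname times likelihood of the geography.
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no race category has positive probability")
    # Normalize so the posterior probabilities sum to 1.
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs (illustrative only, not real data).
# P(race | surname): share of people with this surname in each category.
surname_prior = {"white": 0.60, "black": 0.30, "hispanic": 0.05, "asian": 0.05}
# P(geography | race): share of each category's population living in this unit.
geo_likelihood = {"white": 0.001, "black": 0.004, "hispanic": 0.002, "asian": 0.001}

posterior = bisg_posterior(surname_prior, geo_likelihood)
most_likely = max(posterior, key=posterior.get)
print(most_likely, round(posterior[most_likely], 3))
```

In this toy case the geography shifts the most probable category away from the one the surname alone would suggest. That shift is why the level of geography supplying the likelihood can matter for accuracy.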

© The Author(s) 2021. Published by Cambridge University Press on behalf of the Society for Political Methodology



Edited by Jeff Gill


Supplementary material: PDF

Clark et al. supplementary material