It’s All in the Name: A Character-Based Approach to Infer Religion

Published online by Cambridge University Press:  23 March 2023

Rochana Chaturvedi
Affiliation:
Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA. Email: rchatu2@uic.edu
Sugat Chaturvedi*
Affiliation:
Science Policy Research Unit (SPRU), University of Sussex, Sussex House, Falmer, Brighton BN1 9RH, UK. Email: sc2057@sussex.ac.uk
*Corresponding author: Sugat Chaturvedi

Abstract

Large-scale microdata on group identity are critical for studies on identity politics and violence but remain largely unavailable for developing countries. We use personal names to infer religion in South Asia, where religion is a salient social division and yet disaggregated data on it are scarce. Existing work predicts religion using a dictionary-based method and therefore cannot classify unseen names. We provide character-based machine-learning models that classify even unseen names with high accuracy. Our models are also much faster and hence scalable to large datasets. We explain the classification decisions of one of our models using the layer-wise relevance propagation technique. The character patterns learned by the classifier are rooted in the linguistic origins of names. We apply our models to infer the religion of electoral candidates using historical data on Indian elections and observe a trend of declining Muslim representation. Our approach can be used to detect identity groups across the world for whom the underlying names might have different linguistic roots.
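
The contrast the abstract draws between dictionary lookup and character-based classification can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple naive Bayes classifier over character n-grams, and the names, the labels `group_a` and `group_b`, and the helpers `char_ngrams`, `dictionary_lookup`, and `CharNgramNB` are all hypothetical toy stand-ins rather than anything taken from the paper's data or code.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_min=1, n_max=3):
    """Character n-grams of a name, with ^ and $ marking its boundaries."""
    padded = f"^{name.lower()}$"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class CharNgramNB:
    """Toy multinomial naive Bayes over character n-grams (illustrative only)."""

    def fit(self, names, labels):
        self.class_totals = Counter(labels)
        self.counts = defaultdict(Counter)
        for name, label in zip(names, labels):
            self.counts[label].update(char_ngrams(name))
        # Vocabulary of all n-grams seen in training, used for smoothing.
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, name):
        n_names = sum(self.class_totals.values())
        scores = {}
        for label, grams in self.counts.items():
            total = sum(grams.values())
            score = math.log(self.class_totals[label] / n_names)
            for gram in char_ngrams(name):
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                score += math.log((grams[gram] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy training data; the labels are placeholders for identity groups.
train_names = ["abdul", "abdullah", "abid", "abbas",
               "ramesh", "suresh", "mahesh", "rajesh"]
train_labels = ["group_a"] * 4 + ["group_b"] * 4

# A dictionary-based method can only return labels for names it has stored.
dictionary = dict(zip(train_names, train_labels))


def dictionary_lookup(name):
    return dictionary.get(name.lower())


model = CharNgramNB().fit(train_names, train_labels)

unseen = "dinesh"  # not in the training list
print(dictionary_lookup(unseen))  # the dictionary has no entry for this name
print(model.predict(unseen))      # the character model still assigns a label
```

The unseen name shares the suffix n-grams `esh` and `sh$` with one group's training names, so the character model can still assign it a label, whereas the lookup table has nothing to return.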

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology

Table 1 Descriptive statistics.

Figure 1 Relative character frequency heatmaps for REDS data.

Table 2 Heatmaps for Muslim and non-Muslim names using LRP on distinctive REDS test set names.

Table 3 Most relevant n-grams among Muslim and non-Muslim names.

Table 4 Results on test sets. Standard errors are reported in parentheses. The highest score for a metric within a panel is marked in bold. The evaluation for Name2community and Language Model is based on names classified unambiguously.

Figure 2 Density plots of absolute difference in reported and estimated religious counts per 1,000 people.

Figure 3 Muslim representation in Indian Politics during 1962–2021.

Supplementary material

Chaturvedi and Chaturvedi Dataset (link)

Chaturvedi and Chaturvedi supplementary material: Appendices A-G (PDF, 1.5 MB)