
A Bayesian mixture model captures temporal and spatial structure of voting blocs within longitudinal referendum data

Published online by Cambridge University Press: 09 January 2026

John O’Brien*
Affiliation:
Department of Mathematics, Bowdoin College, Brunswick, ME, USA

Abstract

The estimation of voting blocs is an important statistical inquiry in political science. However, the scope of these analyses is usually restricted to roll call data, where individual votes are directly observed. Here, we examine a Bayesian mixture model with Dirichlet-multinomial components to infer voting blocs within longitudinal referendum data. This model infers the mixture of voting blocs within each municipality from statewide referendum data aggregated at the municipal level. As a case study, we analyze the vote totals of Maine referendum questions balloted from 2008 to 2019 for 423 municipalities. Using a birth–death Markov chain Monte Carlo approach to inference, we recover the posterior distribution on the number of voting blocs, the support for each question within each bloc, and the blocs’ mixture within each municipality. We find that these voting blocs are structured by geography and largely consistent across the study period. The model finds that blocs exhibit both spatial gradients and discontinuities in their structure. Examining the statistical fit of the model, we uncover a small number of questions that are inconsistent with the statewide bloc structure and note that the content of these questions relates to specific regions. We conclude with possible statistical extensions, connections to other statistical frameworks in political science, and possible locations for applying the model.
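The abstract names Dirichlet-multinomial components but does not spell out their likelihood. As a minimal sketch, assuming each component assigns a vector of category counts (for example the (yes, no) counts for one question in one municipality) a Dirichlet-multinomial probability with concentration vector alpha, the log-probability and a generic finite-mixture log-likelihood can be computed as follows. The function names and the weights/alphas parametrization are hypothetical; the article's exact model structure, including how blocs mix within a municipality, may differ.

```python
import math
from typing import Sequence


def dm_logpmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of category counts under a Dirichlet-multinomial.

    counts: observed counts per category (e.g. [yes, no] for one question).
    alpha:  positive concentration parameters, one per category.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a0 = sum(alpha)
    # Normalizing part: n! * Gamma(a0) / Gamma(n + a0)
    log_p = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    # Per-category part: Gamma(x + a) / (x! * Gamma(a))
    for x, a in zip(counts, alpha):
        log_p += math.lgamma(x + a) - math.lgamma(x + 1) - math.lgamma(a)
    return log_p


def mixture_loglik(counts, weights, alphas) -> float:
    """Hypothetical finite mixture: log sum_k w_k * DM(counts | alpha_k).

    Uses log-sum-exp, since probabilities of vote counts in the hundreds
    underflow if exponentiated directly.
    """
    terms = [math.log(w) + dm_logpmf(counts, a)
             for w, a in zip(weights, alphas) if w > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

With two categories the Dirichlet-multinomial reduces to the beta-binomial, so `dm_logpmf([2, 0], [1.0, 1.0])` equals log(1/3), the probability of two yes votes out of two under a uniform Beta(1, 1) prior.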

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of EPS Academic Ltd.

Table 1. Summary of referendums on each ballot in the study period. Total votes are measured as the highest total number of votes on each ballot. Median total refers to the median total number of votes across municipalities in the data set.
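The two summary columns in Table 1 can be read as simple aggregations. The sketch below assumes a hypothetical layout in which each municipality contributes a list of (yes, no) counts, one pair per question on a single ballot. It treats a question's total as yes plus no (blank ballots ignored) and takes each municipality's total to be its largest question total on that ballot; this is one reading of the caption, not a confirmed definition.

```python
from statistics import median


def ballot_summary(votes):
    """Total votes and median municipal total for one ballot (assumed definitions).

    votes: list over municipalities; each entry is a list of (yes, no)
    pairs, one per question on the ballot (hypothetical layout).
    """
    n_questions = len(votes[0])
    # Statewide total for each question, summed over municipalities
    question_totals = [sum(m[q][0] + m[q][1] for m in votes)
                       for q in range(n_questions)]
    total_votes = max(question_totals)
    # Per-municipality total: highest yes + no count on this ballot (assumption)
    muni_totals = [max(y + n for y, n in m) for m in votes]
    return total_votes, median(muni_totals)
```

For three toy municipalities, `ballot_summary([[(10, 5), (8, 6)], [(20, 10), (25, 4)], [(3, 2), (4, 2)]])` returns `(50, 15)`.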


Table 2. Notation and interpretations for parameters of the statistical model of the data


Figure 1. (a) Posterior distribution for $K$ for the complete data set. Because small clusters are often transient, the posterior after filtering out these clusters is also shown. (b) Co-occupancy matrix for the voting blocs of the complete data set, using the representative clustering for $K=6$. The color scale runs from blue at $0$ to red at $0.96$. (c) Projection of the voting bloc co-occupancy matrix onto the map of Maine. Voting bloc names are described in the Results section. Cluster (voting bloc) proportions are calculated against the representative clustering.
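A co-occupancy matrix like the one in panel (b) is commonly computed as the posterior probability that two municipalities share a bloc. A minimal sketch, assuming the MCMC output is stored as a hypothetical `samples` array of bloc labels (one row per draw, one column per municipality), estimates that matrix and then picks a representative clustering by a least-squares criterion: the sampled labelling whose co-membership matrix is closest to the co-occupancy matrix. The least-squares choice is a common convention named here as an assumption; the article may select its representative clustering differently.

```python
import numpy as np


def co_occupancy(samples: np.ndarray) -> np.ndarray:
    """Fraction of MCMC draws in which municipalities i and j share a bloc.

    samples: (S, N) integer array of bloc labels, S draws by N municipalities.
    """
    samples = np.asarray(samples)
    S, N = samples.shape
    C = np.zeros((N, N))
    for z in samples:
        # 1.0 where the two municipalities carry the same label in this draw
        C += (z[:, None] == z[None, :]).astype(float)
    return C / S


def representative_clustering(samples: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Sampled labelling whose co-membership matrix has least squared error to C."""
    samples = np.asarray(samples)
    losses = [np.sum(((z[:, None] == z[None, :]).astype(float) - C) ** 2)
              for z in samples]
    return samples[int(np.argmin(losses))]
```

Only agreement between labels enters the calculation, so the result is unaffected by label switching between draws.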


Figure 2. Representative clusterings for each four-year election cycle, from 2008–2011 to 2016–2019. Maps are subscripted with the starting year of the cycle. The number of blocs is the posterior mode of $K$ after filtering out clusters smaller than 5 municipalities. Representative clusterings are determined as in Figure 1(c).
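Filtering small clusters before tabulating $K$ can be sketched as follows. Assuming a hypothetical `samples` collection of bloc-label sequences, one per MCMC draw, each draw counts only the clusters with at least `min_size` municipalities (5, following the caption), and the mode of the resulting distribution gives the number of blocs. Ties are broken toward the smaller $K$ here, which is an arbitrary choice.

```python
from collections import Counter


def filtered_k_posterior(samples, min_size=5):
    """Posterior over K counting only clusters with >= min_size members.

    samples: iterable of label sequences, one per MCMC draw (hypothetical layout).
    Returns (posterior dict {K: probability}, posterior mode of K).
    """
    ks = []
    for z in samples:
        sizes = Counter(z).values()  # municipalities per bloc in this draw
        ks.append(sum(1 for s in sizes if s >= min_size))
    counts = Counter(ks)
    total = len(ks)
    posterior = {k: c / total for k, c in sorted(counts.items())}
    # max() returns the first maximal key; keys are sorted, so ties favor smaller K
    mode = max(posterior, key=posterior.get)
    return posterior, mode
```

A draw with blocs of sizes 6, 5, and 1 contributes $K=2$, because the singleton bloc is filtered out.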


Figure 3. Barcode plot summarizing support for each question within each municipality minus the overall level of support for that question across all municipalities. Questions are arranged vertically by year, and municipalities are arranged horizontally according to the same voting bloc clustering as in Figure 1. Grey lines mark the boundaries between voting blocs and between years.
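The barcode matrix can be assembled directly from vote counts. This sketch assumes hypothetical `yes` and `no` arrays of shape (municipalities, questions) and a `labels` vector of bloc assignments. It takes the overall level of support to be the pooled statewide share (total yes over total votes); an unweighted mean of municipal shares would be an equally plausible reading of the caption.

```python
import numpy as np


def barcode_matrix(yes, no, labels):
    """Municipal support minus overall support, with questions as rows.

    yes, no: (N, Q) vote counts; labels: (N,) bloc labels used to order columns.
    Returns the (Q, N) centered matrix and the column order applied.
    """
    yes = np.asarray(yes, dtype=float)
    no = np.asarray(no, dtype=float)
    share = yes / (yes + no)                            # (N, Q) municipal yes-share
    overall = yes.sum(axis=0) / (yes + no).sum(axis=0)  # (Q,) pooled statewide share
    centered = share - overall                          # broadcast over municipalities
    order = np.argsort(labels, kind="stable")           # group columns by bloc
    return centered[order].T, order
```

A stable sort keeps municipalities within a bloc in their input order, so any within-bloc ordering (for example by year-averaged support) can be applied beforehand.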


Figure 4. Two-dimensional PCA projection and t-SNE embedding of the referendum data, using the Jensen-Shannon divergence for pairwise distances. Municipalities are colored according to the mixture proportions used in Figure 1, with the same colors as in Figure 1(c).
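One way to build such distance-based projections is sketched below. It assumes each municipality is represented by its vector of yes-shares across questions, defines the distance between two municipalities as the square root of the Jensen-Shannon divergence between their per-question yes/no distributions averaged over questions (natural logarithm), and uses classical multidimensional scaling, which is PCA applied to the double-centered squared-distance matrix, for the two-dimensional projection. The averaging scheme and the use of classical MDS are assumptions, not the article's documented procedure.

```python
import numpy as np


def _kl_bern(p, m):
    """Elementwise KL(Bernoulli(p) || Bernoulli(m)), with 0 * log 0 taken as 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(p > 0, p * np.log(p / m), 0.0)
        t0 = np.where(p < 1, (1 - p) * np.log((1 - p) / (1 - m)), 0.0)
    return t1 + t0


def js_distance_matrix(shares):
    """Pairwise sqrt of the question-averaged Jensen-Shannon divergence.

    shares: (N, Q) yes-shares per municipality and question (assumed input).
    """
    S = np.asarray(shares, dtype=float)
    N = S.shape[0]
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            m = 0.5 * (S[i] + S[j])  # per-question midpoint distribution
            jsd = 0.5 * _kl_bern(S[i], m) + 0.5 * _kl_bern(S[j], m)
            D[i, j] = D[j, i] = np.sqrt(np.mean(jsd))
    return D


def classical_mds(D, dim=2):
    """Classical MDS: eigendecomposition of the double-centered squared distances."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # Gram matrix implied by D
    vals, vecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]   # keep the largest `dim`
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

For the t-SNE panel, the same distance matrix could be passed to scikit-learn's `TSNE` with `metric="precomputed"`, which requires `init="random"`.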


Figure 5. Median and standard deviation of the estimated question fit $(q-p)$ across all municipalities, by question. Five questions, marked in red, exhibit a higher standard deviation. Plots of the estimated question fit for each municipality, by question, are given in Supplementary Figures 1–3.
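The per-question summary in this figure amounts to column-wise statistics of a municipalities-by-questions matrix. The sketch assumes hypothetical arrays `q` and `p` of the same shape whose difference is the estimated fit (the caption does not define q and p, so no particular reading is implied), and flags the `n_flag` questions with the largest standard deviation, mirroring the highlighted questions. The use of the sample standard deviation (`ddof=1`) is also an assumption.

```python
import numpy as np


def question_fit_summary(q, p, n_flag=5):
    """Median and standard deviation of (q - p) per question, plus flagged questions.

    q, p: (N, Q) arrays over municipalities and questions (hypothetical inputs).
    Returns (median per question, std per question, sorted flagged indices).
    """
    fit = np.asarray(q, dtype=float) - np.asarray(p, dtype=float)
    med = np.median(fit, axis=0)
    sd = np.std(fit, axis=0, ddof=1)           # sample standard deviation
    flagged = np.argsort(sd)[::-1][:n_flag]    # questions with the largest spread
    return med, sd, np.sort(flagged)
```

A question can have a median fit near zero yet a large spread when the model over- and under-predicts support in different regions, which is why the two statistics are shown together.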


Figure 6. Model fit for a typical question (left) and a “local” question (right). Quality of fit is plotted on a grey-to-red scale, with white indicating a perfect fit (0) and red the worst observed fit (0.06).
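Maps of this kind can be sketched by first forming each municipality's model-implied support as a mixture-weighted average of bloc-level support and then rescaling the absolute deviation from the observed support by the worst observed value, so that 0 is a perfect fit and 1 corresponds to the worst fit (0.06 in the figure). The weighted-average form, the absolute deviation, and the names `W`, `theta`, and `q_obs` are illustrative assumptions rather than the article's stated definitions.

```python
import numpy as np


def mixture_support(W, theta):
    """Model-implied support (assumed form): p[m, q] = sum_k W[m, k] * theta[k, q].

    W: (N, K) bloc mixture weights per municipality, rows summing to 1.
    theta: (K, Q) support for each question within each bloc.
    """
    return np.asarray(W, dtype=float) @ np.asarray(theta, dtype=float)


def fit_color_scale(q_obs, p_pred):
    """Absolute fit |q - p| and the same values rescaled to [0, 1] by the worst fit."""
    err = np.abs(np.asarray(q_obs, dtype=float) - np.asarray(p_pred, dtype=float))
    worst = err.max()
    scaled = err / worst if worst > 0 else np.zeros_like(err)
    return err, scaled
```

The rescaled values can be fed to any sequential colormap; anchoring the top of the scale at the worst observed fit matches the caption's description of red.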

Supplementary material: File

O’Brien supplementary material

File 10.8 MB
Supplementary material: Link

O’Brien Dataset
