
A Bayesian Alternative to Synthetic Control for Comparative Case Studies

Published online by Cambridge University Press: 13 July 2021

Xun Pang
Affiliation:
Department of International Relations, Tsinghua University, Beijing, China. E-mail: xpang@tsinghua.edu.cn
Licheng Liu
Affiliation:
Department of Political Science, Massachusetts Institute of Technology, Cambridge, MA, USA. E-mail: liulch@mit.edu
Yiqing Xu*
Affiliation:
Department of Political Science, Stanford University, Stanford, CA, USA. E-mail: yiqingxu@stanford.edu
*Corresponding author: Yiqing Xu

Abstract

This paper proposes a Bayesian alternative to the synthetic control method for comparative case studies with a single or multiple treated units. We adopt a Bayesian posterior predictive approach to Rubin’s causal model, which allows researchers to make inferences about both individual and average treatment effects on treated observations based on the empirical posterior distributions of their counterfactuals. The prediction model we develop is a dynamic multilevel model with a latent factor term to correct biases induced by unit-specific time trends. It also considers heterogeneous and dynamic relationships between covariates and the outcome, thus improving precision of the causal estimates. To reduce model dependency, we adopt a Bayesian shrinkage method for model searching and factor selection. Monte Carlo exercises demonstrate that our method produces more precise causal estimates than existing approaches and achieves correct frequentist coverage rates even when sample sizes are small and rich heterogeneities are present in data. We illustrate the method with two empirical examples from political economy.
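The posterior predictive logic described in the abstract can be illustrated with a minimal sketch. It assumes posterior draws of the untreated counterfactuals are already available from some fitted model; the function name, argument names, and array layout are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def att_from_counterfactual_draws(y_obs, y0_draws, level=0.95):
    """Summarize treatment effects from posterior draws of counterfactuals.

    y_obs    : (n_treated, n_periods) observed outcomes of treated units
               in post-treatment periods.
    y0_draws : (n_draws, n_treated, n_periods) posterior draws of the
               untreated counterfactuals (assumed to be given).
    Returns the posterior mean of the ATT per period and the lower and
    upper bounds of an equal-tailed interval at the requested level.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y0_draws = np.asarray(y0_draws, dtype=float)

    # Individual effect for each draw: observed minus counterfactual.
    effect_draws = y_obs[None, :, :] - y0_draws

    # Averaging over treated units gives one ATT draw per period.
    att_draws = effect_draws.mean(axis=1)  # shape (n_draws, n_periods)

    alpha = 1.0 - level
    lower, upper = np.quantile(att_draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return att_draws.mean(axis=0), lower, upper
```

The same `effect_draws` array could also be summarized unit by unit, which corresponds to the individual treatment effects mentioned above.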

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of the Society for Political Methodology
Figure 1 A graphic representation of the dynamic multilevel latent factor model (DM-LFM). The shaded nodes represent observed data, including untreated outcomes and covariates; the unshaded nodes represent “missing” data (treated counterfactuals) and parameters. Only one (treated) unit i is shown. $T_{0i} = a_{i}-1$ is the last period before the treatment starts to affect unit i. The focus of the graph is Period t. Covariates in periods other than t, as well as relationships between parameters and $y_{i1}(c)$, $y_{iT_{0i}}(c)$, and $y_{iT}(c)$, are omitted for simplicity.

Table 1. Total number of parameters.

Figure 2 Estimated average treatment effect on the treated (ATT): difference-in-differences (DiD) versus dynamic multilevel latent factor model (DM-LFM). The above figures show the ATT estimates and their 95% credibility intervals from the DiD and DM-LFM estimators, both implemented with Bayesian Markov Chain Monte Carlo (MCMC) algorithms. The red dashed lines represent the true ATT of the five treated units.

Figure 3 Posterior distributions of coefficients. (a) and (b) show the posterior mean and 95% credibility intervals of the invariant component and time-varying component of $\boldsymbol {\beta }_{it}$, respectively. (c) shows the posterior distribution of each $\boldsymbol {\omega }_{\boldsymbol {\gamma }}$, which captures the extent to which a factor influences the outcome.

Figure 4 Comparison of model performance: root mean square errors (RMSE) and coverage. The above figures show the RMSE (a) and the coverage rate of 95% credibility intervals (b) of the average treatment effect on the treated (ATT) estimates using four sets of models: (1) models that do not include factors or covariates; (2) models that include 10 factors but no covariates; (3) models that include 10 factors and covariates with fixed coefficients; and (4) models that include 10 factors and covariates with varying coefficients.
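The two performance measures named in this caption can be computed as in the sketch below. It is an illustration under stated assumptions, not the authors' simulation code: the function name and inputs are hypothetical, and each array element is taken to be one Monte Carlo replication's ATT estimate and interval bounds.

```python
import numpy as np

def rmse_and_coverage(estimates, lowers, uppers, truth):
    """RMSE of point estimates and coverage rate of their intervals.

    estimates, lowers, uppers : arrays with one entry per simulation run.
    truth                     : the true ATT (scalar, or an array that
                                broadcasts against the other inputs).
    """
    estimates = np.asarray(estimates, dtype=float)
    lowers = np.asarray(lowers, dtype=float)
    uppers = np.asarray(uppers, dtype=float)

    # Root mean square error of the point estimates around the truth.
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))

    # Share of runs whose interval contains the true value.
    coverage = np.mean((lowers <= truth) & (truth <= uppers))
    return float(rmse), float(coverage)
```

For nominal 95% intervals, a coverage value close to 0.95 across many replications is what "correct frequentist coverage" refers to.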

Figure 5 Actual and estimated counterfactuals for West Germany. This figure shows the per capita gross domestic product (GDP) of actual West Germany and the posterior means (with 95% credibility intervals) of its untreated outcome from 1960 to 2003. The within-sample prediction is very precise and the interval is too narrow to be seen in the figure between 1960 and 1989.

Figure 6 Estimated effect of reunification on West Germany. This figure shows the estimated treatment effect of reunification on West Germany’s per capita gross domestic product (GDP) using the synthetic control method (SCM) (a) and dynamic multilevel latent factor model (DM-LFM) (b), as well as the result from a placebo test using DM-LFM in which the 1987–1989 period (before reunification) is taken as “treated” (c).

Figure 7 The effect of election day registration (EDR) on voter turnout. This figure shows the estimated average treatment effect of EDR on voter turnout in the United States using Gsynth (Xu 2017) (a) and dynamic multilevel latent factor model (DM-LFM) (b), respectively. On the x-axis, the positive integers indicate the duration of treatment while the pre-treatment years are labeled as nonpositive integers. Period 1 is the first presidential election year in which a state implements EDR. Because the treated states adopted EDR at different points in time, the number of treated units decreases as p increases.

Table 2. Comparing TSCS methods for comparative case studies.

Supplementary material: Link

Pang et al. Dataset

Supplementary material: PDF

Pang et al. supplementary material
