How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on 67 Replicated Studies

Published online by Cambridge University Press: 3 May 2024

Apoorva Lal, Independent Researcher
Mackenzie Lockhart, Institution for Social and Policy Studies, Yale University, New Haven, CT 06511, USA
Yiqing Xu*, Department of Political Science, Stanford University, Stanford, CA 94305, USA
Ziwen Zu, Department of Political Science, University of California, San Diego, La Jolla, CA 92093, USA
*Corresponding author: Yiqing Xu; Email: yiqingxu@stanford.edu

Abstract

Instrumental variable (IV) strategies are widely used in political science to establish causal relationships, but the identifying assumptions required by an IV design are demanding, and assessing their validity remains challenging. In this paper, we replicate 67 articles published in three top political science journals from 2010 to 2022 and identify several concerning patterns. First, researchers often overestimate the strength of their instruments because of non-i.i.d. error structures such as clustering. Second, IV estimates are often highly uncertain, and the commonly used t-test for two-stage least squares (2SLS) estimates frequently underestimates this uncertainty. Third, in most replicated studies, 2SLS estimates are significantly larger in magnitude than ordinary least squares (OLS) estimates, and their absolute ratio is inversely related to the strength of the instrument in observational studies (a pattern not observed in experimental ones), suggesting potential violations of unconfoundedness or the exclusion restriction in the former. We provide a checklist and software to help researchers avoid these pitfalls and improve their practice.

Type: Article

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright © The Author(s), 2024. Published by Cambridge University Press on behalf of The Society for Political Methodology
Figure 1 IV studies published in the APSR, AJPS, and JOP. Our criteria exclude IV models that appear only in the Supplementary Material, are estimated in dynamic panel settings, involve multiple endogenous variables, or use nonlinear link functions. Non-replicability is primarily due to missing data, coding errors, or both.

Table 1 Data availability and replicability of IV articles.

Table 2 Types of instruments.

Figure 2 Original versus effective and bootstrapped F. Circles represent applications without a clustering structure, and triangles represent applications with a clustering structure. Studies that do not report an F-statistic are shown in red. The original F-statistics are obtained from the authors' original model specifications and choices of variance estimators in the 2SLS regressions. They may differ from those reported in the articles because of misreporting.
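The gap between original and effective or bootstrapped F-statistics under clustering can be illustrated with a small simulation. The Python sketch below is a minimal illustration under stated assumptions, not the procedure implemented in ivDiag: the data are hypothetical and simulated with an instrument that varies only at the cluster level, there is a single instrument and no covariates, the cluster-robust variance is CR0 without a finite-sample correction, and the bootstrap resamples whole clusters. With one instrument, the Montiel Olea–Pflueger effective F reduces to the robust (here cluster-robust) Wald F, so `F_cr` stands in for the effective F.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical clustered design: G clusters of m units each. The instrument z
# varies only across clusters, and the first-stage error has a cluster-level
# component, so errors are correlated within clusters (non-i.i.d.).
G, m = 40, 25
cluster = np.repeat(np.arange(G), m)
z = np.repeat(rng.normal(size=G), m)
d = 0.5 * z + np.repeat(rng.normal(size=G), m) + rng.normal(size=G * m)


def first_stage(z, d):
    """OLS of d on [1, z]; returns the design matrix, coefficients, residuals."""
    X = np.column_stack([np.ones_like(z), z])
    beta, *_ = np.linalg.lstsq(X, d, rcond=None)
    return X, beta, d - X @ beta


X, beta, e = first_stage(z, d)
n, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
pi_hat = beta[1]

# Classic variance, valid only under i.i.d. homoskedastic errors.
var_iid = (e @ e) / (n - k) * XtX_inv[1, 1]

# Cluster-robust (CR0) sandwich variance: sum outer products of cluster scores.
meat = np.zeros((k, k))
for g in range(G):
    in_g = cluster == g
    score = X[in_g].T @ e[in_g]
    meat += np.outer(score, score)
var_cr = (XtX_inv @ meat @ XtX_inv)[1, 1]

# Pairs cluster bootstrap: resample whole clusters with replacement and
# re-estimate the first-stage coefficient on each bootstrap sample.
B = 500
pi_boot = np.empty(B)
for b in range(B):
    draw = rng.integers(0, G, size=G)
    idx = np.concatenate([np.flatnonzero(cluster == g) for g in draw])
    pi_boot[b] = first_stage(z[idx], d[idx])[1][1]
var_boot = pi_boot.var(ddof=1)

F_iid = pi_hat**2 / var_iid    # conventional first-stage F (t-squared, i.i.d.)
F_cr = pi_hat**2 / var_cr      # equals the effective F with one instrument
F_boot = pi_hat**2 / var_boot  # bootstrapped F
print(f"i.i.d. F = {F_iid:.1f}, cluster-robust F = {F_cr:.1f}, "
      f"bootstrapped F = {F_boot:.1f}")
```

Because the instrument is constant within clusters, the i.i.d. F treats 1,000 correlated observations as if they were independent; in this setup it is typically about an order of magnitude larger than the cluster-robust and bootstrapped versions, which is the kind of overstatement the circles and triangles in the figure compare.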

Figure 3 Comparison of 2SLS and OLS analytic SEs. Subfigure (a) shows the distribution of the ratio between $\widehat{SE}(\hat{\tau}_{2SLS})$ and $\widehat{SE}(\hat{\tau}_{OLS})$, both obtained analytically. Subfigure (b) plots the relationship between $|\hat{\rho}(d, \hat{d})|$, the absolute value of the estimated correlation coefficient between $d$ and $\hat{d}$, and the ratio (on a logarithmic scale). In one study, the analytic $\widehat{SE}(\hat{\tau}_{2SLS})$ is much smaller than $\widehat{SE}(\hat{\tau}_{OLS})$; we suspect that the former severely underestimates the true SE of the 2SLS estimate, likely because of a clustering structure.
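The link between the SE ratio and $|\hat{\rho}(d, \hat{d})|$ is an algebraic identity in the simplest case. With one instrument, an intercept, no other covariates, and classic homoskedastic SEs, the first-stage fitted values satisfy $\sum_i(\hat{d}_i - \bar{d})^2 / \sum_i(d_i - \bar{d})^2 = \hat{\rho}(d, \hat{d})^2$, so $\widehat{SE}(\hat{\tau}_{2SLS}) / \widehat{SE}(\hat{\tau}_{OLS}) = (\hat{\sigma}_{2SLS} / \hat{\sigma}_{OLS}) / |\hat{\rho}(d, \hat{d})|$. The Python sketch below checks this numerically on hypothetical simulated data; it is an illustration under these assumptions, not the estimator used in the replications, which include covariates and robust or clustered SEs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical data: u confounds d and y; z shifts d only.
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.2 * z + u + rng.normal(size=n)
y = 1.0 * d + u + rng.normal(size=n)

X = np.column_stack([np.ones(n), d])   # regressors: intercept, treatment
Z = np.column_stack([np.ones(n), z])   # instruments: intercept, z

# OLS with the classic (homoskedastic) SE.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
r_ols = y - X @ b_ols
se_ols = np.sqrt(r_ols @ r_ols / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])

# 2SLS: project X onto Z, regress y on the projection, and compute the
# classic SE from structural residuals that use the actual d (not d_hat).
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_iv = np.linalg.lstsq(X_hat, y, rcond=None)[0]
r_iv = y - X @ b_iv
se_iv = np.sqrt(r_iv @ r_iv / (n - 2) * np.linalg.inv(X_hat.T @ X_hat)[1, 1])

rho = np.corrcoef(d, X_hat[:, 1])[0, 1]          # rho(d, d_hat)
sigma_ratio = np.sqrt((r_iv @ r_iv) / (r_ols @ r_ols))
print(f"SE ratio            = {se_iv / se_ols:.3f}")
print(f"sigma ratio / |rho| = {sigma_ratio / abs(rho):.3f}")
print(f"1 / |rho|           = {1 / abs(rho):.3f}")
```

Because OLS minimizes the residual sum of squares, $\hat{\sigma}_{2SLS} \ge \hat{\sigma}_{OLS}$, so in this simple setting the ratio cannot fall below $1/|\hat{\rho}(d, \hat{d})| \ge 1$. An analytic 2SLS SE that is much smaller than the OLS SE is therefore a warning sign that the variance estimator is misspecified.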

Figure 4 Alternative inferential methods. In subfigures (a)–(c), we compare original p-values with those from alternative inferential methods, testing the null hypothesis that $\tau = 0$. Both axes use a square-root scale. Original p-values are taken from the original articles or calculated using standard-normal approximations of z-scores. Solid circles represent Arias and Stasavage (2019), where the authors argue for a null effect using an IV strategy. Bootstrap-c and bootstrap-t represent percentile methods based on 2SLS estimates and t-statistics, respectively, using the original model specifications. Hollow triangles in subfigure (c) indicate unbounded 95% CIs from the Anderson–Rubin (AR) test obtained by test inversion. Subfigure (d) presents $tF$ procedure results for 54 single-instrument designs. Green dots represent studies that remain statistically significant at the 5% level under the $tF$ procedure; red dots represent those that do not. Subfigures (a)–(c) are inspired by Figure 3 in Young (2022), and subfigure (d) by Figure 3 in Lee et al. (2022).
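AR confidence sets are obtained by test inversion: for each candidate value $\tau_0$, regress $y - \tau_0 d$ on the instrument and keep $\tau_0$ if the instrument's coefficient is not significantly different from zero. The Python sketch below illustrates this under assumptions that are ours, not the paper's: hypothetical simulated data, one instrument, no covariates, HC0 heteroskedasticity-robust SEs, a chi-squared(1) reference distribution, and a finite grid. A grid search only approximates the set; a set that reaches the grid edges may be unbounded, and the reported interval is the hull of a set that can in principle be disjoint.

```python
import math

import numpy as np


def ar_stat(y, d, z, tau0):
    """Anderson-Rubin statistic for H0: tau = tau0 with one instrument.

    Regresses y - tau0 * d on [1, z] and returns the squared HC0-robust
    t-statistic on z; under H0 it is approximately chi-squared(1)."""
    Z = np.column_stack([np.ones_like(z), z])
    y_tilde = y - tau0 * d
    g, *_ = np.linalg.lstsq(Z, y_tilde, rcond=None)
    e = y_tilde - Z @ g
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    meat = (Z * e[:, None] ** 2).T @ Z            # HC0 meat: sum of e_i^2 z_i z_i'
    V = ZtZ_inv @ meat @ ZtZ_inv
    return g[1] ** 2 / V[1, 1]


# Hypothetical simulated data with a confounder u.
rng = np.random.default_rng(2)
n = 1000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.3 * z + u + rng.normal(size=n)
y = 0.5 * d + u + rng.normal(size=n)

# p-value against tau = 0 from the chi-squared(1) tail: P(X > s) = erfc(sqrt(s / 2)).
stat0 = ar_stat(y, d, z, 0.0)
p0 = math.erfc(math.sqrt(stat0 / 2))

# Invert the test over a grid: keep every tau0 that is not rejected at 5%.
grid = np.linspace(-5, 5, 2001)
crit = 3.841                                      # 95th percentile of chi-squared(1)
accepted = np.array([ar_stat(y, d, z, t) < crit for t in grid])
ci = (grid[accepted].min(), grid[accepted].max()) if accepted.any() else None

if ci is None:
    print("No grid value is accepted: the AR set is empty on this grid.")
else:
    print(f"95% AR set (hull on the grid): [{ci[0]:.2f}, {ci[1]:.2f}]")
    if accepted[0] or accepted[-1]:
        print("The set reaches the grid boundary and may be unbounded.")
print(f"AR p-value for tau = 0: {p0:.4f}")
```

Unlike the 2SLS t-test, the AR test does not rely on a strong first stage for its size, which is why its confidence sets can be wide or unbounded when the instrument is weak.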

Figure 5 Relationship between OLS and 2SLS estimates. In subfigure (a), both axes are normalized by the reported OLS SE estimates, with the gray band representing the $[-1.96, 1.96]$ interval. Subfigure (b) displays a histogram of the logarithm of the absolute ratio between reported 2SLS and OLS coefficients. Subfigures (c) and (d) plot the relationship between $|\hat{\rho}(d, \hat{d})|$ and the ratio of 2SLS to OLS estimates. Gray and red circles represent observational and experimental studies, respectively. Subfigure (d) highlights studies whose OLS results are statistically significant at the 5% level and are claimed as part of the main findings.
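One mechanism consistent with an inverse relationship between the 2SLS/OLS ratio and $|\hat{\rho}(d, \hat{d})|$ is that a small violation of the exclusion restriction is amplified by a weak first stage. If $d = \pi z + v$ and $y = \tau d + \gamma z + \varepsilon$ with $z$ otherwise exogenous, the 2SLS estimand is $\tau + \gamma / \pi$, so its bias grows as $\pi$ (and hence $|\rho(d, \hat{d})|$) shrinks, while OLS stays close to $\tau$ when $\gamma$ is small. The Python simulation below is a hypothetical illustration of this single mechanism; the values of $\tau$, $\gamma$, and $\pi$ are arbitrary, and it is not a model of any replicated study.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
tau, gamma = 0.2, 0.05  # hypothetical true effect and small direct effect of z on y


def simulate(pi):
    """One draw with first-stage slope pi; returns |rho(d, d_hat)|, OLS, 2SLS."""
    z = rng.normal(size=n)
    d = pi * z + rng.normal(size=n)
    y = tau * d + gamma * z + rng.normal(size=n)  # gamma != 0 violates exclusion
    c = np.cov(np.vstack([z, d, y]))
    ols = c[1, 2] / c[1, 1]                       # cov(d, y) / var(d)
    iv = c[0, 2] / c[0, 1]                        # cov(z, y) / cov(z, d): 2SLS with one instrument
    rho = abs(np.corrcoef(z, d)[0, 1])            # equals |rho(d, d_hat)| with one instrument
    return rho, ols, iv


results = []
for pi in [1.0, 0.5, 0.2, 0.1]:
    rho, ols, iv = simulate(pi)
    results.append((pi, rho, ols, iv))
    print(f"pi = {pi:.1f}: |rho| = {rho:.2f}, OLS = {ols:.3f}, 2SLS = {iv:.3f}, "
          f"2SLS/OLS = {iv / ols:.2f}, 2SLS estimand = {tau + gamma / pi:.3f}")
```

As the first stage weakens, the same direct effect $\gamma$ produces an increasingly inflated 2SLS estimate relative to OLS. A violation of unconfoundedness for $z$ (a correlation between $z$ and $\varepsilon$) would be scaled by $1/\pi$ in the same way.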

Table 3 Summary of replication results

Figure 6 Replicated OLS and 2SLS estimates with 95% CIs (Rueda 2017, Table 5, column 1). The outcome is citizens' reports of vote buying. The treatment is the actual polling station size. The instrument is the polling station size predicted by the rules limiting the number of voters per polling station. The magnitude of the 2SLS estimate is slightly larger than that of the OLS estimate. Similar figures for each of the 70 IV designs are shown in the SM. This plot is made with ivDiag, an open-source R package.

Supplementary material: Lal et al. supplementary material (file, 5.2 MB)