## 1 Introduction

Instrumental-variable (IV) estimation is an important tool that applied researchers across the social and behavioral sciences use to address noncompliance with the assigned treatment in randomized experiments and unobserved confounding in observational studies (see, e.g., Sovey and Green 2011; Dunning 2012). To date, the predominant framework for IV, introduced by Angrist, Imbens, and Rubin (1996), has been based on potential outcomes. In contrast to classic structural equation models (Haavelmo 1943), which assume constant treatment effects, IV analysis based on potential outcomes allows for heterogeneous treatment effects (under the assumption of monotonicity, see below). This framework uses principal stratification (Frangakis and Rubin 2004) to classify the population into distinct, nonoverlapping but unobserved groups: compliers, always-takers, never-takers, and defiers.

The principal strata are defined by their reaction to the instrument, that is, the assignment to take the treatment (or not). *Compliers* always comply with the assigned treatment: they take the treatment when assigned to it and do not take it when not assigned. *Always-takers* always take the treatment, independent of whether they are assigned or not. *Never-takers* never take the treatment, independent of whether they are assigned or not. *Defiers* always do the opposite of the assignment: when assigned, they do not take the treatment, and when not assigned, they do take it. One implication of subjects' agency over whether or not to comply with the assigned treatment is that comparing those who take the treatment to those who refuse it does not provide a causal estimate of the treatment effect, since the reasons for (non-)compliance might well be correlated with other characteristics that affect the outcome. In this context, IV analysis based on the potential outcome framework offers a clearly defined set of assumptions that point-identify the treatment effect for the complier subpopulation: the local average treatment effect (LATE). The corresponding treatment effects for noncompliers (i.e., always-takers, never-takers, and defiers) are, by definition, not identified (Angrist, Imbens, and Rubin 1996).

The focus of IV analysis on compliers raises questions about how different this group is from the noncompliers and by how much the LATE differs from the average treatment effect (ATE). The latter is often the main quantity of interest for applied researchers but differs from the LATE if treatment effects are heterogeneous. These questions typically become more important the weaker the instrument is and the larger the proportion of noncompliers. The "localness" of the LATE led some researchers to doubt its usefulness for the study of economic and political phenomena (Deaton 2009; Heckman and Urzua 2010). Like other scholars (e.g., Imbens 2010), we believe that rather than abandoning IV altogether, researchers should pay more attention to the potential limitations of the LATE. A transparent discussion of the external validity (or lack thereof) of the LATE beyond compliers should build on an explicit comparison of compliers, always-takers, and never-takers. In this letter, we introduce such a method for profiling compliers and noncompliers, present corresponding statistical software, and illustrate the approach.

Profiling compliers and noncompliers as discussed in this article can play three important roles in assessing the generalizability of the LATE. First, heterogeneity in treatment responses across compliance strata (and, therefore, how much the LATE and ATE differ) is typically driven by both observable and unobservable variables. Thus, finding that compliers and noncompliers are highly similar in terms of their observable covariates does not imply that they are also similar in terms of unobserved variables, nor does it suggest that we can generalize the LATE to the ATE without invoking additional assumptions (see, e.g., the reweighting method of Aronow and Carnegie 2013). But if profiling reveals that compliers and noncompliers differ with regard to observable covariates that are likely to be predictive of treatment effect size, we know that any attempt to directly learn about the ATE from the LATE is prone to lead to biased inferences. Second, different instruments for the same treatment often estimate different LATEs. One example of this is provided by Angrist and Pischke (2009), who compare two instruments that increase the propensity to have a third child; namely, whether the first two children are twins or have the same gender. Other examples where different instruments for the same treatment exist can be found in the literature on the economic and political returns to education and on the effects of get-out-the-vote campaigns. Comparing the compliers defined by the different instruments can help explain the differences in the corresponding LATEs. Third, one can often conjecture how a slightly stronger (or weaker) instrument than the one actually used might be able to convert some marginal never-takers to compliers (and vice versa).
For example, in the context of the *Washington Post* study discussed below, a slightly stronger instrument might take the form of a financial reward for actually reading the newspaper (in addition to the free subscription). Comparing the actual never-takers and compliers allows us to form expectations about the treatment effects of the additional compliers encouraged by a stronger instrument and their similarity to the LATE in the original study. And if the researcher cannot (further) increase the strength of the instrument, this comparison allows for a better understanding of the never-takers who cannot be pushed to take the treatment.

In the following, we provide a simple method to characterize compliers and noncompliers in terms of their covariates. We discuss the assumptions required for profiling and show that, if the instrument is randomly assigned, they are weaker than the assumptions needed to identify the LATE. Our method can be applied to IV analysis in both experimental and observational studies, whether or not the exclusion restriction holds.

While the idea of profiling compliers and noncompliers is not new, it is rarely done in practice. We reviewed all seventy-one papers using IV (including fuzzy regression discontinuity) designs that were published in the *American Journal of Political Science*, the *American Political Science Review*, the *Journal of Politics*, and *Political Analysis* between January 2013 and December 2018. None of these papers provides information on whether and how compliers differ from the rest of the sample.$^{1}$ In economics, the share is only slightly higher: of the 280 papers using IV or fuzzy regression discontinuity designs that were published during the same time period in the *American Economic Review*, *Econometrica*, the *Journal of Political Economy*, and the *Quarterly Journal of Economics*, 10 articles compared compliers to the entire sample in terms of their covariates.$^{2}$ A cursory reading of recent empirical papers in demography, epidemiology, and sociology reveals that profiling of compliers and noncompliers is similarly rare. Profiling is likely unpopular due to both a lack of awareness among researchers and the limitations or complexity of existing methods.

Prior research has used a few approaches for profiling. Pinotti (2017) profiles compliers by regressing an interaction of the covariate and the treatment on the instrumented treatment using two-stage least squares (2SLS) and, based on this, compares the covariate means of compliers to those of the total sample. This approach is inefficient as it only leverages compliers assigned to treatment but disregards information about compliers assigned to control. Angrist and Pischke (2009) exploit variation in the first stage across covariate groups to estimate the relative likelihood that compliers (compared to the entire sample) will have a particular characteristic. Since it is focused on ratios, this method is limited to binary (or dichotomized) covariates. As an alternative method that is not limited to binary covariates, Baiocchi, Cheng, and Small (2014) reweight the sample to estimate the covariate mean of compliers. A similar approach is Abadie's (2003) $\kappa$-weighting scheme. In addition to weighting the local average response function (LARF) regression of the outcome on the treatment and the covariates, the $\kappa$ weights can also be used to weight binary and continuous covariates to the subsample of compliers. In the standard LARF regression that conditions on covariates, iterative least squares algorithms might fail to find the weights that correspond to the global residual-sum-of-squares minimum since the estimated weights will be negative for always-takers and never-takers by construction.$^{3}$ Note, however, that when the instrument is randomized, the $\kappa$ weights can be simplified such that no optimization is needed to estimate covariate means for compliers. In that case, the $\kappa$ weights offer an alternative but equivalent estimator for the complier mean to ours.
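The simplification under a randomized instrument can be sketched in a few lines. The sketch below assumes the standard statement of Abadie's weights, $\kappa = 1 - D(1-Z)/\mathbb{P}(Z=0) - (1-D)Z/\mathbb{P}(Z=1)$, with the assignment probability estimated by the sample share of assigned units; the function name `kappa_complier_mean` is hypothetical and is not part of any published software.

```python
from typing import Sequence


def kappa_complier_mean(x: Sequence[float], d: Sequence[int], z: Sequence[int]) -> float:
    """Kappa-weighted covariate mean for compliers (randomized binary instrument).

    Assumption: P(Z=1) does not depend on covariates, so it is estimated
    by the overall share of units assigned to treatment.
    """
    n = len(x)
    p = sum(z) / n  # estimated P(Z=1)
    # Weight is negative for observable always-takers (d=1, z=0) and
    # observable never-takers (d=0, z=1); it equals 1 for all other units.
    kappa = [1 - di * (1 - zi) / (1 - p) - (1 - di) * zi / p for di, zi in zip(d, z)]
    return sum(k * xi for k, xi in zip(kappa, x)) / sum(kappa)
```

No optimization is involved: the weights are a closed-form function of the realized treatment, the realized instrument, and the assignment share.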

Given the limited popularity of these approaches, we developed a general, simple—and, arguably, more intuitive—method. In the following, we detail the assumptions needed to identify the sample means of covariates for compliers, always-takers, and never-takers. In conjunction with the software that implements our estimator for R and STATA, we hope that this will facilitate the profiling of compliers and noncompliers as a standard practice accompanying every IV analysis.

The rest of this letter is structured as follows: the next section provides an informal summary that conveys the core idea of our approach in a nonmathematical fashion. The following section then discusses the assumptions needed to identify the covariate means of complier, always-taker, and never-taker strata in more formal terms. We then apply this estimator to provide descriptive statistics of compliers and noncompliers in a randomized encouragement experiment on the effect of reading the *Washington Post* on voting behavior and public opinion. A second application, presented in the Supplementary Materials (SM), focuses on compliers and noncompliers in a study on the effects of watching *Fox News* on voting in a referendum on affirmative action. Both examples show how profiling compliers and noncompliers is an important first step in gauging the external validity of the effect estimates. A brief conclusion discusses possible ways to generalize the proposed method beyond the binary instrument and binary treatment case and points the reader to the software that implements our estimator.

## 2 Intuition

Before we more formally discuss our method of profiling compliers and noncompliers, this section provides a nonmathematical summary of the core idea. Consider an IV scenario with a binary treatment, a binary, scalar instrument, and two-sided noncompliance. Assume that the instrument is independently assigned and that there are no defiers (such that the study group consists of compliers, always-takers, and never-takers). Under these two assumptions, we can directly identify the compliance status of some units by comparing their instrument and treatment values: subjects assigned to the control group who take the treatment are “observable” always-takers, and subjects assigned to the treatment group who do not take the treatment are “observable” never-takers. Because observable and nonobservable always-takers and never-takers, respectively, have the same covariate distribution if the instrument is independently assigned, we can directly estimate the covariate means for these two subpopulations. In contrast, compliers cannot be identified at the individual level since compliers assigned to the control group are, with respect to their realized instrument and treatment values, observably identical to never-takers assigned to control; and compliers assigned to the treatment group are observably identical to always-takers assigned to the treatment. However, by subtracting the weighted covariate mean of observable always-takers and never-takers from the covariate mean of the entire sample, we can back out the covariate mean for compliers.
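To make the subtraction concrete, consider a purely hypothetical example (the numbers are illustrative and not taken from any study). Suppose the full-sample mean age is 49, observable always-takers make up 20% of the control group and have a mean age of 60, and observable never-takers make up 50% of the treatment group and have a mean age of 50. The implied complier share is $1-0.2-0.5=0.3$, and the complier mean age follows as

$$\frac{49-0.2\times 60-0.5\times 50}{0.3}=\frac{12}{0.3}=40.$$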

## 3 Method

Following the notation of Angrist, Imbens, and Rubin (1996), who discuss IV analysis within the potential outcome framework, we assume that every observation has two binary, fixed, and unobserved potential treatment indicators ($D(0)$ and $D(1)$) that realize a binary, observed treatment indicator ($D$). The treatment $D$ depends on the binary, scalar instrument, $Z$, i.e., $D=D(Z)$. If the unit is assigned to treatment, then $Z=1$, and $Z=0$ otherwise. Together, the instrument and treatment indicators define four subpopulations, the principal strata, as shown in Table 1.
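The mapping from potential treatment indicators to principal strata can be written out as a small lookup. This is only an illustrative sketch of the definitions given above; the function names are hypothetical.

```python
def principal_stratum(d0: int, d1: int) -> str:
    """Return the principal stratum implied by the pair (D(0), D(1))."""
    strata = {
        (0, 1): "complier",      # takes the treatment only when assigned
        (1, 1): "always-taker",  # takes the treatment regardless of assignment
        (0, 0): "never-taker",   # never takes the treatment
        (1, 0): "defier",        # does the opposite of the assignment
    }
    return strata[(d0, d1)]


def observed_treatment(d0: int, d1: int, z: int) -> int:
    """Realized treatment D = D(Z): only one potential treatment is observed."""
    return d1 if z == 1 else d0
```

A complier and a never-taker who are both assigned to control have the same realized pair $(D, Z) = (0, 0)$, which is why the stratum of such a unit cannot be read off the data directly.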

Since only $D$, but not $D(0)$ and $D(1)$, is observed, the principal stratum to which a unit belongs is unknown without further assumptions. In order to identify, and estimate, the mean of a pretreatment (and preinstrument) covariate $X$, including uncertainty estimates for each of the four subpopulations, we impose two identifying assumptions. Since these two assumptions are similar to the assumptions needed to identify the LATE, and weaker if the instrument is randomly assigned, profiling compliers and noncompliers comes at little additional cost.

### Assumption 1 (Monotonicity).

$D(1)\geqslant D(0)$.

### Assumption 2 (Independence of the instrument).

$(D(0),D(1),X)\perp\!\!\!\perp Z$.

The first assumption, monotonicity, rules out defiers, for whom $D(1)<D(0)$. The second assumption, independence of the instrument, implies that the instrument is assigned independently of a unit's compliance stratum and covariate value. Note that Assumption 2 is both weaker and stronger than what is needed to identify the LATE. On the one hand, for profiling, researchers do not have to invoke the exclusion restriction, which assumes that the instrument is also independent of the two potential outcomes. This implies that we can characterize compliers and noncompliers even when the exclusion restriction is violated and the instrument affects the outcome through channels other than the treatment.$^{4}$ On the other hand, for profiling, we have to assume that the covariate $X$ is independent of the instrument. The latter assumption will hold trivially if the instrument is randomly assigned. In addition to these two assumptions, the probability of assignment has to lie strictly between 0 and 1, $0<\mathbb{P}(Z=1)<1$, and the instrument must influence the treatment, $\mathbb{E}[D|Z=1]\neq \mathbb{E}[D|Z=0]$ (first stage), as otherwise the fraction of compliers is 0.
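The two additional conditions can be checked on the sample before profiling. The following is a minimal sketch with a hypothetical helper name; it reports the assignment share and the first-stage difference, which under monotonicity equals the complier share.

```python
from typing import Dict, Sequence


def design_diagnostics(d: Sequence[int], z: Sequence[int]) -> Dict[str, float]:
    """Report the assignment share and the first stage E[D|Z=1] - E[D|Z=0]."""
    n_z1 = sum(z)
    n_z0 = len(z) - n_z1
    if n_z1 == 0 or n_z0 == 0:
        # Both assignment arms must be populated, i.e., 0 < P(Z=1) < 1.
        raise ValueError("P(Z=1) must lie strictly between 0 and 1")
    take_z1 = sum(di for di, zi in zip(d, z) if zi == 1) / n_z1
    take_z0 = sum(di for di, zi in zip(d, z) if zi == 0) / n_z0
    return {"p_z": n_z1 / len(z), "first_stage": take_z1 - take_z0}
```

A first stage close to zero signals that the estimated complier share is small, in which case the complier mean is estimated with little precision.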

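These assumptions are sufficient to identify the covariate means for always-takers,

$$\mathbb{E}[X|D(1)=D(0)=1]=\mathbb{E}[X|D=1,Z=0],\tag{1}$$

by focusing on the observable subset of nonencouraged always-takers. Equivalently, we can identify the covariate means of never-takers by focusing on the observable never-takers:

$$\mathbb{E}[X|D(1)=D(0)=0]=\mathbb{E}[X|D=0,Z=1].\tag{2}$$

Next, we turn to compliers. First, we employ the Law of Iterated Expectations to decompose the population mean of $X$ into a linear combination of the strata means, weighted by the size of each stratum:

$$\begin{aligned}\mathbb{E}[X]={}&\mathbb{E}[X|D(1)>D(0)]\,\mathbb{P}(D(1)>D(0))+\mathbb{E}[X|D(1)=D(0)=1]\,\mathbb{P}(D(1)=D(0)=1)\\&+\mathbb{E}[X|D(1)=D(0)=0]\,\mathbb{P}(D(1)=D(0)=0)+\mathbb{E}[X|D(1)<D(0)]\,\mathbb{P}(D(1)<D(0)).\end{aligned}\tag{3}$$

Invoking the monotonicity and independence assumptions, we can drop the defier term and substitute all potential treatment indicators with their observed counterparts, except for $\mathbb{E}[X|D(1)>D(0)]$:

$$\begin{aligned}\mathbb{E}[X]={}&\mathbb{E}[X|D(1)>D(0)]\,\bigl(\mathbb{P}(D=1|Z=1)-\mathbb{P}(D=1|Z=0)\bigr)\\&+\mathbb{E}[X|D=1,Z=0]\,\mathbb{P}(D=1|Z=0)+\mathbb{E}[X|D=0,Z=1]\,\mathbb{P}(D=0|Z=1).\end{aligned}\tag{4}$$

Solving equation (4) for $\mathbb{E}[X|D(1)>D(0)]$, we get

$$\mathbb{E}[X|D(1)>D(0)]=\frac{\mathbb{E}[X]-\mathbb{E}[X|D=1,Z=0]\,\mathbb{P}(D=1|Z=0)-\mathbb{E}[X|D=0,Z=1]\,\mathbb{P}(D=0|Z=1)}{\mathbb{P}(D=1|Z=1)-\mathbb{P}(D=1|Z=0)},\tag{5}$$

such that the entire right-hand side is written in terms of observed quantities and the covariate means for compliers are identified.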

Building on the intuition developed in the preceding section, Table 2 stratifies the population by realized treatment ($D$) and assignment ($Z$) to help examine these identification results. After ruling out defiers using the monotonicity assumption, we can identify the always-taker and never-taker strata from their directly observable subsets in the off-diagonal cells. The independence of the instrument assumption ensures that the observable and nonobservable always-takers and never-takers are exchangeable. The identifiability of the compliers that are not directly observable (on the main diagonal) follows from the observation that we can subtract the contribution of the always-takers and never-takers from the overall population mean to back out the mean of the complier stratum.

Having established the assumptions needed to identify the different strata, we now turn to estimation. We construct our estimator by replacing the population means and shares in equations (1), (2), and (5) with their sample analogs (Manski 1988). While the derivation of the estimators is relegated to SM Section 1, we briefly sketch here the sample analog of the main result from equation (5).

Let $N$ be the sample size, $K_{nt}$ the number of observable never-takers, $K_{at}$ the number of observable always-takers, and $N_{Z=z}$ the number of units with realized instrument value $z$. Let $x_{i}$ be the covariate value for the $i$th unit, and let $d_{i}$ and $z_{i}$ denote its realized treatment and instrument values. We write the estimators for the covariate mean of the entire sample, $\hat{\mu}$, for always-takers, $\hat{\mu}_{at}$, and for never-takers, $\hat{\mu}_{nt}$, as

$$\hat{\mu}=\frac{1}{N}\sum_{i=1}^{N}x_{i},\qquad \hat{\mu}_{at}=\frac{1}{K_{at}}\sum_{i=1}^{N}x_{i}\,d_{i}(1-z_{i}),\qquad \hat{\mu}_{nt}=\frac{1}{K_{nt}}\sum_{i=1}^{N}x_{i}\,(1-d_{i})\,z_{i}.$$

By weighting these covariate means by the estimated sample shares of compliers, $\hat{\pi}_{co}$, always-takers, $\hat{\pi}_{at}$, and never-takers, $\hat{\pi}_{nt}$, as in equation (5), we can estimate the covariate mean for compliers, $\hat{\mu}_{co}$, as follows:

$$\hat{\mu}_{co}=\frac{\hat{\mu}-\hat{\pi}_{at}\,\hat{\mu}_{at}-\hat{\pi}_{nt}\,\hat{\mu}_{nt}}{\hat{\pi}_{co}},\qquad \hat{\pi}_{at}=\frac{K_{at}}{N_{Z=0}},\quad \hat{\pi}_{nt}=\frac{K_{nt}}{N_{Z=1}},\quad \hat{\pi}_{co}=1-\hat{\pi}_{at}-\hat{\pi}_{nt}.$$
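A minimal sketch of this sample-analog estimator follows. It is an illustration under the assumptions stated in this section (binary treatment, binary instrument, monotonicity, independent assignment, two-sided noncompliance) and is not the code of the authors' software; the function name `profile_strata` and the layout of the returned dictionary are hypothetical.

```python
from typing import Dict, Sequence


def profile_strata(
    x: Sequence[float], d: Sequence[int], z: Sequence[int]
) -> Dict[str, Dict[str, float]]:
    """Estimate shares and covariate means for compliers, always-takers, never-takers."""
    n = len(x)
    n_z1 = sum(z)
    n_z0 = n - n_z1
    # Observable always-takers: treated although assigned to control.
    x_at = [xi for xi, di, zi in zip(x, d, z) if di == 1 and zi == 0]
    # Observable never-takers: untreated although assigned to treatment.
    x_nt = [xi for xi, di, zi in zip(x, d, z) if di == 0 and zi == 1]

    pi_at = len(x_at) / n_z0
    pi_nt = len(x_nt) / n_z1
    pi_co = 1.0 - pi_at - pi_nt

    mu = sum(x) / n
    mu_at = sum(x_at) / len(x_at)  # requires at least one observable always-taker
    mu_nt = sum(x_nt) / len(x_nt)  # requires at least one observable never-taker
    # Back out the complier mean from the full-sample mean.
    mu_co = (mu - pi_at * mu_at - pi_nt * mu_nt) / pi_co

    return {
        "sample": {"share": 1.0, "mean": mu},
        "complier": {"share": pi_co, "mean": mu_co},
        "always-taker": {"share": pi_at, "mean": mu_at},
        "never-taker": {"share": pi_nt, "mean": mu_nt},
    }
```

The function takes one covariate at a time; profiling several covariates amounts to calling it once per covariate with the same `d` and `z`.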

Because the shares of compliers, always-takers, and never-takers are unknown and have to be estimated, the derivation of the standard error for the complier mean is somewhat tedious. Given the extremely low computational costs in this context, we use the bootstrap method to obtain standard errors that reflect the estimation uncertainty in both the covariate means and the sample proportions. The SM details the results from a series of Monte Carlo experiments that verify that the empirical coverage rate closely tracks the nominal coverage rate of the 95% bootstrap confidence interval across different sample sizes. These simulations also confirm that the bias in means decreases at the expected rate as the sample size grows.
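The bootstrap idea can be sketched as follows. This is a plain nonparametric bootstrap that resamples whole units with replacement and recomputes the complier mean in each resample; the exact resampling scheme used in the authors' software may differ, and the function names, the default number of replications, and the seed are assumptions made for illustration.

```python
import random
from statistics import stdev
from typing import List, Sequence


def complier_mean(x: Sequence[float], d: Sequence[int], z: Sequence[int]) -> float:
    """Sample-analog complier mean (binary d and z, no defiers)."""
    n = len(x)
    n_z1 = sum(z)
    n_z0 = n - n_z1
    rows = list(zip(x, d, z))
    sum_at = sum(xi for xi, di, zi in rows if di == 1 and zi == 0)
    sum_nt = sum(xi for xi, di, zi in rows if di == 0 and zi == 1)
    k_at = sum(1 for _, di, zi in rows if di == 1 and zi == 0)
    k_nt = sum(1 for _, di, zi in rows if di == 0 and zi == 1)
    pi_co = 1.0 - k_at / n_z0 - k_nt / n_z1
    # sum_at / n_z0 equals the always-taker share times the always-taker mean.
    return (sum(x) / n - sum_at / n_z0 - sum_nt / n_z1) / pi_co


def bootstrap_se(
    x: Sequence[float], d: Sequence[int], z: Sequence[int],
    reps: int = 500, seed: int = 0,
) -> float:
    """Bootstrap standard error of the complier mean."""
    rng = random.Random(seed)
    n = len(x)
    draws: List[float] = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample units with replacement
        try:
            draws.append(complier_mean([x[i] for i in idx],
                                       [d[i] for i in idx],
                                       [z[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples (empty arm or zero complier share)
    return stdev(draws)
```

Because the shares and the means are recomputed in every resample, the resulting standard error reflects uncertainty in both, which is the property the text emphasizes.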

## 4 Application

Gerber, Karlan, and Bergan (2009) report the results of a randomized field experiment in which subjects were sent a free ten-week subscription to either the *Washington Post* or the *Washington Times*. For this analysis, we focus on those $N=503$ subjects who were randomized to the *Post* or the control group and completed the baseline and the follow-up survey. The baseline survey was conducted in September 2005 and asked the respondents to, inter alia, indicate their gender, age, past turnout, and their preference for a Democratic or Republican governor. The follow-up survey was conducted during the week after the November 2005 Virginia gubernatorial election and asked the respondents about the newspapers they receive, how frequently they read them, their voting behavior in the gubernatorial election, and their attitudes on a range of topics. Based on these data, and using ordinary least squares regression to estimate intention-to-treat (ITT) effects, the authors estimate the effect of receiving a free *Post* subscription on a range of outcomes including political knowledge, policy positions, turnout, and voting for the Democratic or Republican candidate. Among all the outcomes considered, only the ITT effect on voting for the Democratic candidate is statistically significant: getting a free subscription to the *Post* increases the probability of selecting the Democrat by 7.9 percentage points ($p<0.082$) among voters.$^{5}$ SM Table S.1 shows the corresponding IV analysis. For this, we code $D=1$ if the subjects report that they received the *Post* and 0 otherwise.$^{6}$ Using 2SLS, we estimate a LATE of 22.3 percentage points ($p<0.083$) for respondents who comply with the *Post* assignment.

Next, we use the estimators described above to profile compliers and noncompliers. Figure 1 shows the covariate means for the entire sample and the sample shares and covariate means for compliers, always-takers, and never-takers across eight socio-political characteristics, all measured pretreatment in the baseline survey. Numerical estimates are provided in SM Table S.2. About 34.1% of the subjects are compliers, 20.3% are always-takers, and 45.6% are never-takers.

The upper panel of Figure 1 shows that all strata have relatively similar turnout levels in 2001, 2002, and 2004. However, we find some meaningful differences in terms of socio-demographic characteristics. Compliers tend to be younger than never-takers, who, in turn, are younger than always-takers; and compliers are less likely to be female compared to always-takers and never-takers (the latter group features the highest share of women). In terms of political preferences, compliers and always-takers are both less likely to support the Republican candidate compared to never-takers, while always-takers are slightly more likely to prefer none of the candidates compared to compliers and never-takers.

These differences have two main implications for our ability to generalize the estimates from the subsample of compliers. First, since the causal effect of treatment receipt is only identified for compliers, any attempt to directly generalize those estimates to always-takers and never-takers is purely speculative. Second, who is (and is not) a complier is a function of the instrument and is fixed only in the context of the particular study analyzed. Therefore, a slightly stronger instrument might be expected to encourage some never-takers to become compliers. In this study, participants in the encouragement group were simply sent a postcard announcing that they had won a free subscription to the *Post*. Thus, combining the free subscription with, e.g., a financial incentive to read the newspaper might strengthen the instrument by incentivizing some marginal never-takers to become compliers. Would we expect the treatment effect for these additional compliers to be the same as the LATE in the original study? Our profiling method is crucial for answering this question: given the differences in age and gender between compliers and never-takers and the preexisting differences in support for the Republican candidate (all factors that are likely related to changes in party attachment and party vote switching; see, e.g., Campbell *et al.* 1980), we have little reason to assume that the LATE estimated for compliers can be generalized to other study subjects.

A second application, discussed in the SM, profiles compliers and noncompliers in a randomized encouragement experiment on the effect of watching *Fox News* on support for affirmative action (Albertson and Lawrence 2009). In this context, we find politically meaningful and statistically significant differences in the level of political interest and frequency of media consumption between compliers and never-takers.

## 5 Conclusion

In this letter, we introduced a simple method of profiling compliers, always-takers, and never-takers based on their (pretreatment) covariate characteristics. Like many prior studies on IV, we focused on a case with a binary treatment and a scalar, binary instrument. In principle, our proposed method could be generalized to categorical or continuous treatments, and nonbinary (and even multivariate) instruments, by considering all combinations of instrument and treatment levels. However, the compliers will likely change across instrument and treatment levels, which creates a heretofore unsolved aggregation problem (see Abadie 2003). For the case of a binary instrument and a categorical or continuous treatment, one could dichotomize the treatment variable and profile compliers and noncompliers at the treatment level at which the first stage is strongest. A similar method could be used to dichotomize a categorical or continuous instrument. When resorting to this approximation, researchers should keep in mind that compliers and noncompliers who comply at different instrument and treatment levels might look different.
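The dichotomization step for a nonbinary treatment can be sketched as a search over candidate cutoffs. The sketch below makes two assumptions that the text leaves open: the treatment is dichotomized as "at or above the cutoff", and the strength of the first stage is measured by the absolute difference in take-up between assignment arms (a first-stage test statistic would be an alternative criterion). The function name is hypothetical.

```python
from typing import Sequence, Tuple


def strongest_cutoff(
    t: Sequence[float], z: Sequence[int], cutoffs: Sequence[float]
) -> Tuple[float, float]:
    """Return the cutoff with the largest absolute first stage, and that first stage.

    For each cutoff c, the treatment is dichotomized as D = 1 if t >= c, and
    the first stage is the share with D = 1 among Z = 1 minus that among Z = 0.
    """
    if not cutoffs:
        raise ValueError("no cutoffs supplied")
    n_z1 = sum(z)
    n_z0 = len(z) - n_z1
    best_cutoff, best_fs = cutoffs[0], 0.0
    for i, c in enumerate(cutoffs):
        share_z1 = sum(1 for ti, zi in zip(t, z) if zi == 1 and ti >= c) / n_z1
        share_z0 = sum(1 for ti, zi in zip(t, z) if zi == 0 and ti >= c) / n_z0
        fs = share_z1 - share_z0
        if i == 0 or abs(fs) > abs(best_fs):
            best_cutoff, best_fs = c, fs
    return best_cutoff, best_fs
```

The binary indicator defined by the selected cutoff can then be passed to the profiling estimator in place of $D$, keeping in mind the caveat above that compliers at other cutoffs may look different.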

At least for the binary treatment, binary instrument case considered here, we recommend that researchers make it a standard practice to augment any IV analysis with reporting descriptive statistics of pretreatment covariates for the complier and noncomplier samples along with their shares. These statistics should form the basis of an informed discussion of the differences between compliers, always-takers, and never-takers. Such a discussion will increase our understanding of how “local” the LATE is, and forms the first step toward gauging the extent to which the findings derived for compliers can or cannot be generalized to other strata. We hope that this paper, in conjunction with the accompanying, easy-to-use software package ivdesc (for R available at CRAN and for STATA at http://github.com/sumtxt/ivdesc), facilitates the adoption of this practice.

## Acknowledgements

We would like to thank Alisha Esshaki for compiling an overview of IV applications in the social and behavioral sciences and Julian Schüssler, Luke Keele, the participants at PolMeth XXXVI, the editor Sunshine Hillygus, and two anonymous reviewers for excellent comments.

**Data Availability Statement**

The replication materials for this paper can be found at Marbach and Hangartner (2019).

## Supplementary material

For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2019.48.