1 Introduction
In pairwise randomized experiments, researchers collect pairs of units, where the two units in each pair share identical or similar values of matched-on variables, and randomly assign treatment to one unit in each pair (Imbens and Rubin 2015, ch. 10). The difference-in-means of outcomes between the treated and control groups estimates the average treatment effect (hereafter, ATE) on the outcome without bias. This design is an efficient tool for achieving pretreatment balance, in particular when the number of units is small (Donner and Klar 2000, 32).
Nonetheless, a major problem of pairwise randomized experiments is attrition: the outcomes of some units are missing (Donner and Klar 2000, 40; Glennerster and Takavarasha 2013, 159; Hayes and Moulton 2009, 72–74). Moreover, attrition might be nonignorable, that is, it may be related to the value of an outcome (Allison 2002, 4–5). The typical solutions to attrition, inverse probability weighting and (multiple) imputation techniques, do not work in this situation (Allison 2002; Little and Rubin 2002). Accordingly, if analysts are not satisfied with bounds (Imai and Jiang 2018), they will usually employ one of the following two methods.
The first, but naive, approach to attrition is the unitwise deletion estimator (UDE). That is, analysts delete only the units whose outcomes are missing and apply the difference-in-means estimator to all the remaining units to estimate the ATE (Glennerster and Takavarasha 2013, 159). It is, however, well known that the UDE can be biased if attrition is nonignorable (e.g., Little and Rubin 2002, 41–44).
The second textbook tool to address attrition is the pairwise deletion estimator (PDE), which deletes missing units as well as the other units in the same pairs and calculates the difference-in-means by using units in the remaining pairs only (Donner and Klar 2000, 40; Imai, King, and Nall 2009, 44). The PDE protects the balance of matched-on variables, but this is not sufficient to preserve the distribution of treatment effects across units and thus does not guarantee unbiased estimation of the ATE. Rather, as Dunning (2011) and Gerber and Green (2012) warn, the PDE can be biased. Scholars may also suspect that since the PDE discards more units, it will be less efficient than the UDE (Hayes and Moulton 2009, 73–74).
Even though both the UDE and the PDE can be biased, analysts must use one of them when no other option is available. Previous studies do not examine the bias and variance of these estimators in this situation. This paper takes the design of pairwise randomization seriously, derives the properties of the two estimators and their variance estimators, and recommends the PDE rather than the UDE. Proofs of the propositions, detailed comments on them, and an application example are available in the Supplementary Material.
2 Finite Sample
2.1 Setting
Suppose that there are $N (\geq 2)$ pairs and each pair is composed of two units. Let $Y_{ij}$ , $R_{ij}$ , and $X_{ij}$ denote the realized outcome, realized response, and treatment status, respectively, for unit i of pair j where $i =1,2$ and $j = 1,2,\ldots ,N$ . $R_{ij}$ is equal to one (zero) when $Y_{ij}$ is observed (missing). $X_{ij}$ is equal to one (zero) when unit i of pair j is assigned to the treated (control) group. Since this is a pairwise randomized experiment, for every pair j, either $X_{1j}=1, X_{2j}=0$ or $X_{1j}=0, X_{2j}=1$ holds.
I make the stable unit treatment value assumption (Imbens and Rubin 2015). $Y_{ij}(1)$ and $R_{ij}(1)$ denote the potential outcome and response for unit i of pair j if treatment is assigned to the unit ( $X_{ij} = 1$ ), respectively. $R_{ij}(1)$ is equal to one (zero) when $Y_{ij} = Y_{ij}(1)$ is observed (missing) in the case of treatment. $Y_{ij}(0)$ and $R_{ij}(0)$ are defined similarly for the case of control.
The main estimand in this section, the finite sample ATE, is denoted by $\tau \equiv E \{ Y_{ij}(1) - Y_{ij}(0) \} \equiv \frac {1}{2N} \sum _{j=1}^{N} \sum _{i=1}^{2} \{ Y_{ij}(1) - Y_{ij}(0) \}$ . The full sample estimator is denoted by $\hat {\tau }_{F} \equiv E(Y_{ij} \mid X_{ij} = 1) - E(Y_{ij} \mid X_{ij} = 0)$ . This estimator is not available in the case of attrition. Rather, analysis of $\hat {\tau }_{F}$ provides reference benchmarks against which this study compares the properties of the UDE and PDE. The UDE is denoted by $\hat {\tau }_{U} \equiv E(Y_{ij} \mid X_{ij} = 1, R_{ij} = 1) - E(Y_{ij} \mid X_{ij} = 0, R_{ij} = 1)$ . Obviously, it can be defined only when $N_{t} \equiv \sum _{j=1}^{N} \sum _{i=1}^{2} X_{ij} R_{ij} \geq 1$ and $N_{c} \equiv \sum _{j=1}^{N} \sum _{i=1}^{2} (1 - X_{ij}) R_{ij} \geq 1$ . The PDE is denoted by $\hat {\tau }_{P} \equiv E(Y_{ij} \mid X_{ij} = 1, R_{1j} = R_{2j} = 1) - E(Y_{ij} \mid X_{ij} = 0, R_{1j} = R_{2j} = 1)$ . It can be defined only when $N_{tc} \equiv \sum _{j=1}^{N} R_{1j} R_{2j} \geq 1$ .
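As an illustration of the two definitions, the following is a minimal Python sketch that computes $\hat {\tau }_{U}$ and $\hat {\tau }_{P}$ from observed data. The data layout (a list of pairs, each holding two `(y, x, r)` tuples), the function name `ate_estimators`, and the toy values are assumptions made for this example only.

```python
from statistics import mean

def ate_estimators(pairs):
    """Return (UDE, PDE) for pairs given as two (y, x, r) tuples each.

    y: outcome (ignored when r == 0), x: treatment indicator,
    r: response indicator. An estimator is None when undefined.
    """
    def diff_in_means(units):
        treated = [y for y, x, r in units if x == 1]
        control = [y for y, x, r in units if x == 0]
        if not treated or not control:
            return None  # N_t or N_c (or N_tc) is zero
        return mean(treated) - mean(control)

    all_units = [u for pair in pairs for u in pair]
    # UDE: drop only the units whose outcome is missing.
    ude = diff_in_means([u for u in all_units if u[2] == 1])
    # PDE: keep only pairs in which both outcomes are observed.
    complete = [u for pair in pairs
                if pair[0][2] == 1 and pair[1][2] == 1
                for u in pair]
    pde = diff_in_means(complete)
    return ude, pde

# Hypothetical toy data: four pairs, two of which lose one unit.
toy_pairs = [
    ((5.0, 1, 1), (3.0, 0, 1)),
    ((4.0, 0, 1), (None, 1, 0)),  # treated unit missing
    ((6.0, 1, 1), (None, 0, 0)),  # control unit missing
    ((2.0, 0, 1), (7.0, 1, 1)),
]
ude, pde = ate_estimators(toy_pairs)
print(ude, pde)  # 3.0 3.5
```

In this toy example the UDE uses six units and the PDE uses the four units of the two complete pairs, so the two estimates differ.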
In order to clarify the properties of estimators in the propositions below, for $x=0,1$ , this study denotes the between-pair and within-pair deviations of potential outcome by $\beta _{ij}(x) \equiv \{ Y_{1j}(x) + Y_{2j}(x) \}/2 - E \{ Y_{i^{\prime }j^{\prime }}(x) \}$ and $\omega _{ij}(x) \equiv Y_{ij}(x) - \{ Y_{1j}(x) + Y_{2j}(x) \}/2$ .
This study assumes pairwise randomization of treatment assignment. That is, each $X_{1j}$ is ignorable and independent, and $\Pr (X_{1j} = 1) = 1/2$ . In addition, the following three assumptions of potential responses are optional; I will invoke one of them in each of the propositions below. “FS” stands for the “finite sample.” First, I specify the assumption under which no outcomes are missing and thus $\hat {\tau }_{F}$ is available.
Assumption 1 No Attrition: FS
$\forall i, j, R_{ij}(1) = R_{ij}(0) = 1.$
Second, we assume a perfect match in the sense that the potential responses are the same between the units in every pair. For instance, in the cases of (monozygotic) twins and littermates of the same sex, matched-on variables (e.g., [part of] DNA) may completely explain the missingness pattern. Under this assumption, $N_{t}$ and $N_{c}$ are constant regardless of treatment assignment, and the properties of $\hat {\tau }_{U}$ can be expressed simply enough to understand its essence.
Assumption 2 Unitwise Matched Attrition: FS
$\forall j, R_{1j}(1) = R_{2j}(1), R_{1j}(0) = R_{2j}(0).$
Finally, we consider the assumption under which $N_{tc}$ is constant regardless of treatment assignment, when the properties of $\hat {\tau }_{P}$ can be presented concisely. This assumption holds if (but not only if) either (or both) of the following two scenarios is true in every pair j. The first scenario is when attrition is unitwise matched ( $R_{1j}(1) = R_{2j}(1), R_{1j}(0) = R_{2j}(0)$ ). (Therefore, Assumption 2 always leads to Assumption 3 but not vice versa.) The second scenario is when each unit is either an “always-reporter” ( $R_{ij}(1) = R_{ij}(0) = 1$ ) or a “never-reporter” ( $R_{ij}(1) = R_{ij}(0) = 0$ ) (Gerber and Green 2012, 225). For instance, in the cases of blind tests, subliminal stimuli, and administrative records, it is likely that $R_{ij}(1) = R_{ij}(0)$ .
Assumption 3 Pairwise Matched Attrition: FS
$\forall j, R_{1j}(1) R_{2j}(0) = R_{1j}(0) R_{2j}(1)$ .
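The relation between Assumptions 2 and 3 can be checked by brute force over all sixteen response patterns of a single pair. The sketch below is only an illustration; the encoding of a pair as `((R_1j(1), R_1j(0)), (R_2j(1), R_2j(0)))` and the function names are assumptions of this example.

```python
from itertools import product

def unitwise_matched(pair):
    """Assumption 2 for one pair: R_1j(1) = R_2j(1) and R_1j(0) = R_2j(0)."""
    (r1_t, r1_c), (r2_t, r2_c) = pair
    return r1_t == r2_t and r1_c == r2_c

def pairwise_matched(pair):
    """Assumption 3 for one pair: R_1j(1) R_2j(0) = R_1j(0) R_2j(1)."""
    (r1_t, r1_c), (r2_t, r2_c) = pair
    return r1_t * r2_c == r1_c * r2_t

# Every combination of the four binary potential responses in a pair.
patterns = [((a, b), (c, d)) for a, b, c, d in product([0, 1], repeat=4)]
satisfy_2 = [p for p in patterns if unitwise_matched(p)]
satisfy_3 = [p for p in patterns if pairwise_matched(p)]

# An always-reporter paired with a never-reporter (second scenario).
mixed_pair = ((1, 1), (0, 0))

print(len(satisfy_2), len(satisfy_3))  # 4 10
print(all(p in satisfy_3 for p in satisfy_2))  # True
print(pairwise_matched(mixed_pair), unitwise_matched(mixed_pair))  # True False
```

Every pattern that satisfies Assumption 2 also satisfies Assumption 3, while the always-reporter and never-reporter pair satisfies Assumption 3 only.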
2.2 Bias
In this section, the operator $\mathbb {E}(\cdot )$ takes expectation over the random assignment of the treatment. Now, I present the bias of each ATE estimator.
Proposition 1 Bias of ATE Estimators: FS
(1) Under Assumption 1, $\mathbb {E}(\hat {\tau }_{F} ) - \tau = 0.$
(2) Under Assumption 3 and $N_{tc} \geq 1$ , $\mathbb {E}(\hat {\tau }_{P} )- \tau = E \{ \beta _{ij} (1) - \beta _{ij} (0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}.$
(3) Under Assumption 2 and $N_{t}, N_{c} \geq 1$ , $\mathbb {E}(\hat {\tau }_{U}) - \tau = E\{ \beta _{ij} (1) \mid R_{ij}(1) = 1\} - E\{ \beta _{ij} (0) \mid R_{ij}(0) = 1\}$ .
A few remarks are in order. First, unless we assume ignorable attrition, not only $\hat {\tau }_{U}$ but also $\hat {\tau }_{P}$ is biased for $\tau $ . Second, $\hat {\tau }_{P}$ has a causal interpretation under a weaker assumption than $\hat {\tau }_{U}$ . Under Assumption 3 and $N_{tc} \geq 1$ , I define a kind of local average treatment effect (LATE) of “always-reporting pairs” by $\tau _{P} \equiv E \{ Y_{ij}(1) - Y_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}$ . This is a causal effect because it is a principal effect (Frangakis and Rubin 2002) where the corresponding principal stratum is the set of pairs such that $R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1$ . It follows that $\hat {\tau }_{P}$ is unbiased for $\tau _{P}$ : $\mathbb {E}(\hat {\tau }_{P} ) - \tau _{P} = 0$ . This argument may correctly remind readers of instrumental variable estimation for noncompliance cases. By contrast, even under Assumption 2 and $N_{t}, {N_{c} \geq 1}$ , it is difficult to interpret $\hat {\tau }_{U}$ as a causal effect unless $R_{ij}(1) = R_{ij}(0)$ for all i and j, in which case $\hat {\tau }_{U}$ is reduced to $\hat {\tau }_{P}$ .
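The unbiasedness of $\hat {\tau }_{P}$ for $\tau _{P}$ can be illustrated by enumerating every treatment assignment in a small example. The sketch below assumes the always-reporter and never-reporter scenario of Assumption 3; the potential outcomes, the tuple layout `(Y(1), Y(0), always_reporter)`, and all names are hypothetical.

```python
from itertools import product
from statistics import mean

# Hypothetical potential outcomes: pairs[j][i] = (Y(1), Y(0), always_reporter).
pairs = [
    ((3.0, 1.0, True), (4.0, 2.5, True)),
    ((6.0, 2.0, True), (5.0, 4.0, False)),   # unit 2 never reports
    ((2.0, 2.0, True), (1.0, 0.0, True)),
    ((7.0, 3.0, False), (8.0, 6.0, False)),  # neither unit reports
]

def pde(assignment):
    """PDE for one assignment; assignment[j] is True if unit 1 of pair j is treated."""
    diffs = []
    for (u1, u2), first_treated in zip(pairs, assignment):
        if not (u1[2] and u2[2]):
            continue  # pairwise deletion
        treated, control = (u1, u2) if first_treated else (u2, u1)
        diffs.append(treated[0] - control[1])
    # Each kept pair has one treated and one control unit, so the
    # difference-in-means equals the mean within-pair difference.
    return mean(diffs)

# All 2^N assignments are equally likely under pairwise randomization.
expected_pde = mean(pde(a) for a in product([True, False], repeat=len(pairs)))

# LATE of always-reporting pairs and the finite sample ATE.
tau_p = mean(u[0] - u[1] for p in pairs if p[0][2] and p[1][2] for u in p)
tau = mean(u[0] - u[1] for p in pairs for u in p)

print(expected_pde, tau_p, tau)  # 1.125 1.125 1.9375
```

In this example the expectation of the PDE equals $\tau _{P}$ exactly but differs from $\tau $ , in line with Proposition 1 (2).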
2.3 Variance
In this section, $\mathrm {Var} (\cdot )$ and $\mathrm {Cov} (\cdot , \cdot )$ denote the finite sample variance and covariance, respectively, and the operator $\mathbb {V}^2 (\cdot )$ takes variance over the random assignment of the treatment. Here are the variances of the three ATE estimators.
Proposition 2 Variance of ATE Estimators: FS
(1) Under Assumption 1,
$$ \begin{align*} \mathbb{V}^2 (\hat{\tau}_{F}) = \frac{1}{N} [ \mathrm{Var} \{ \omega_{ij}(1) \} + \mathrm{Var} \{ \omega_{ij}(0) \} + 2 \mathrm{Cov} \{ \omega_{ij}(1), \omega_{ij}(0) \}]. \end{align*} $$
(2) Under Assumption 3 and $N_{tc} \geq 1$ ,
$$ \begin{align*} \mathbb{V}^2 (\hat{\tau}_{P}) & = \frac{1}{N_{tc}} [ \mathrm{Var} \{ \omega_{ij}(1) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \} \\ & \quad + \mathrm{Var} \{ \omega_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}\\ & \quad + 2 \mathrm{Cov} \{ \omega_{ij}(1), \omega_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}].\end{align*} $$
(3) Under Assumption 2 and $N_{t}, N_{c} \geq 1$ ,
$$ \begin{align*} \mathbb{V}^2 (\hat{\tau}_{U})& = \frac{1}{N_{t}} \mathrm{Var} \{ \omega_{ij}(1) \mid R_{ij}(1) = 1\} + \frac{1}{N_{c}} \mathrm{Var} \{ \omega_{ij}(0) \mid R_{ij}(0) = 1\}\\ & \quad + \frac{2 N_{tc}}{ N_{t} N_{c} } \mathrm{Cov} \{ \omega_{ij}(1), \omega_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}. \end{align*} $$
It is worth mentioning that even though the number of units used for estimation is not smaller for $\hat {\tau }_{U}$ than for $\hat {\tau }_{P}$ (i.e., $N_{t}, N_{c} \geq N_{tc}$ ), $\hat {\tau }_{U}$ can be less efficient than $\hat {\tau }_{P}$ (i.e., $\mathbb {V}^2 (\hat {\tau }_{U})> \mathbb {V}^2(\hat {\tau }_{P})$ ) if $\mathrm {Var} \{ \omega _{ij}(x) \mid R_{1j}(x) = R_{2j}(x) = 1, R_{1j}(1 - x) = R_{2j}(1 - x) = 0 \}$ is sufficiently larger than $ \mathrm {Var} \{ \omega _{ij}(x) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}$ for $x=0,1$ .
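Proposition 2 (1) can be checked numerically by enumerating all assignments in a small example without attrition. The sketch below assumes that $\mathrm {Var} (\cdot )$ and $\mathrm {Cov} (\cdot , \cdot )$ are averages over all $2N$ units, in the same way as the operator $E(\cdot )$ ; this normalization, the toy potential outcomes, and all names are assumptions of the example.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical potential outcomes with no attrition: pairs[j][i] = (Y(1), Y(0)).
pairs = [((3.0, 1.0), (4.0, 2.5)), ((6.0, 2.0), (5.0, 4.0)), ((2.0, 2.0), (1.0, 0.0))]
N = len(pairs)

def tau_f(assignment):
    """Full sample estimator; assignment[j] is True if unit 1 of pair j is treated."""
    return mean((u1[0] - u2[1]) if first_treated else (u2[0] - u1[1])
                for (u1, u2), first_treated in zip(pairs, assignment))

# Left-hand side: variance over the 2^N equally likely assignments.
estimates = [tau_f(a) for a in product([True, False], repeat=N)]
design_variance = pvariance(estimates)

# Within-pair deviations omega_ij(1) and omega_ij(0) for all 2N units.
omega_1 = [u[0] - (p[0][0] + p[1][0]) / 2 for p in pairs for u in p]
omega_0 = [u[1] - (p[0][1] + p[1][1]) / 2 for p in pairs for u in p]

# The deviations average to zero, so mean squares are the variances.
var_1 = mean(w * w for w in omega_1)
var_0 = mean(w * w for w in omega_0)
cov_10 = mean(a * b for a, b in zip(omega_1, omega_0))

# Right-hand side of Proposition 2 (1).
formula = (var_1 + var_0 + 2 * cov_10) / N

print(abs(design_variance - formula) < 1e-12)  # True
```

Only the within-pair deviations enter the variance; the between-pair deviations cancel because every pair contributes one treated and one control unit.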
2.4 Variance Estimator
How should we estimate the variances of these three ATE estimators? For certain situations, some scholars advocate “breaking the matches” (Lynn and McCulloch 1992), namely, analyzing data from pairwise randomized experiments as if they are completely randomized experiments. Thus, I begin with the Neyman variance estimator. Suppose that Assumption 1 holds. For $x=0,1$ , if $X_{ij} = x$ , a natural estimator of $\omega _{ij}(x)$ is $\hat {\omega }_{ij}(x) \equiv Y_{ij} - E(Y_{ij} \mid X_{ij} = x)$ . Accordingly, we may estimate $\mathrm {Var} \{ \omega _{ij}(x) \}$ in the first and second terms of the equation in Proposition 2 (1) by
The third term of the equation in Proposition 2 (1) “is generally impossible to estimate empirically because we never observe both $Y_{ij}(1)$ and $Y_{ij}(0)$ for the same unit” (Imbens and Rubin 2015, 92). Thus, if we dismiss the third term, we derive the Neyman variance estimator of $\hat {\tau }_{F}$ as
Similarly, without Assumption 1, if $N_{tc} \geq 2$ or $N_{t}, N_{c} \geq 2$ , we can derive the Neyman variance estimators of $\hat {\tau }_{P}$ or $\hat {\tau }_{U}$ as
respectively.
Nonetheless, researchers do not have to give up estimating the third terms in the equations in Proposition 2. A merit of pairwise randomized experiments is that even if $X_{1j}=1, R_{2j}=1$ , analysts may estimate $\omega _{1j}(0)$ by $\hat {\omega }_{1j}(0) \equiv -\hat {\omega }_{2j}(0)$ because $\omega _{1j}(0) = - \omega _{2j}(0)$ . This finding is the most important contribution of this study. Therefore, when $N_{tc} \geq 2$ , we may estimate $\mathrm {Cov} \{ \omega _{ij}(1), \omega _{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}$ in the third terms by
where $Y_{tj} = \sum ^{2}_{i=1} X_{ij} Y_{ij}$ and $Y_{cj} = \sum ^{2}_{i=1} (1 - X_{ij}) Y_{ij}$ . Accordingly, when $N_{tc} \geq 2$ , I propose the adjusted Neyman variance estimators as
It turns out that the adjusted Neyman variance estimator of $\hat {\tau }_{F}$ is reduced to what scholars recommend (Gerber and Green 2012, 77; Imai 2008, 4861; Imbens and Rubin 2015, 227). This paper extends it to $\hat {\tau }_{P}$ and, in particular, $\hat {\tau }_{U}$ . I now derive the properties of these variance estimators.
Proposition 3 Bias of the Neyman Variance Estimators: FS
(1) Under Assumption 1,
$$ \begin{align*} \mathbb{E}\{ \widehat{\mathbb{V}}^{\text{Neyman}} (\hat{\tau}_{F})\} - \mathbb{V}^2(\hat{\tau}_{F}) = \frac{1}{N - 1} [ \mathrm{Var} \{ \beta_{ij}(1) \} + \mathrm{Var} \{ \beta_{ij}(0) \} ] - \frac{2}{N} \mathrm{Cov} \{ \omega_{ij}(1), \omega_{ij}(0) \}. \end{align*} $$
(2) Under Assumption 3 and $N_{tc} \geq 2$ ,
$$ \begin{align*} \begin{split} \mathbb{E} \{ \widehat{\mathbb{V}}^{\text{Neyman}} (\hat{\tau}_{P}) \} - \mathbb{V}^2(\hat{\tau}_{P}) &= \frac{1}{N_{tc} - 1} [ \mathrm{Var} \{ \beta_{ij}(1) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \} \\ & \quad + \mathrm{Var} \{ \beta_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \} ]\\ &\quad - \frac{2}{N_{tc}} \mathrm{Cov} \{ \omega_{ij}(1), \omega_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}. \end{split} \end{align*} $$
(3) Under Assumption 2 and $N_{t}, N_{c} \geq 2$ ,
$$ \begin{align*} \begin{split} \mathbb{E}\{ \widehat{\mathbb{V}}^{\text{Neyman}} (\hat{\tau}_{U})\} - \mathbb{V}^2(\hat{\tau}_{U}) & = \frac{1}{N_{t} - 1} \mathrm{Var} \{ \beta_{ij}(1) \mid R_{ij}(1) = 1 \} + \frac{1}{N_{c} - 1} \mathrm{Var} \{ \beta_{ij}(0) \mid R_{ij}(0) = 1 \}\\ & \quad - \frac{2N_{tc}}{N_{t} N_{c}} \mathrm{Cov} \{ \omega_{ij}(1), \omega_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}. \end{split} \end{align*} $$
Proposition 4 Bias of the Adjusted Neyman Variance Estimators: FS
(1) Under Assumption 1,
$$ \begin{align*} \mathbb{E} \{ \widehat{\mathbb{V}}^{\text{Adj-Neyman}} (\hat{\tau}_{F}) \} - \mathbb{V}^2(\hat{\tau}_{F}) = \frac{1}{N - 1} \mathrm{Var} \{ \beta_{ij}(1) - \beta_{ij}(0) \} \geq 0. \end{align*} $$
(2) Under Assumption 3 and $N_{tc} \geq 2$ ,
$$ \begin{align*} \begin{split} &\quad \mathbb{E} \{ \widehat{\mathbb{V}}^{\text{Adj-Neyman}} (\hat{\tau}_{P})\} - \mathbb{V}^2(\hat{\tau}_{P}) \\ &= \frac{1}{N_{tc} - 1} \mathrm{Var} \{ \beta_{ij}(1)- \beta_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \} \geq 0. \end{split} \end{align*} $$
(3) Under Assumption 2 and $N_{tc} \geq 2$ ,
$$ \begin{align*} \begin{split} &\quad \mathbb{E} \{ \widehat{\mathbb{V}}^{\text{Adj-Neyman}} (\hat{\tau}_{U})\} - \mathbb{V}^2(\hat{\tau}_{U}) \\ &= \frac{1}{N_{t} - 1} \mathrm{Var} \{ \beta_{ij}(1) \mid R_{ij}(1) = 1 \} + \frac{1}{N_{c} - 1}\mathrm{Var} \{ \beta_{ij}(0) \mid R_{ij}(0) = 1 \} \\ &\quad -\frac{2 N^{2}_{tc}}{ N_{t} N_{c} (N_{tc} - 1)} \mathrm{Cov} \{ \beta_{ij}(1), \beta_{ij}(0) \mid R_{1j}(1) = R_{2j}(1) = R_{1j}(0) = R_{2j}(0) = 1 \}. \end{split} \end{align*} $$
A merit of $\widehat{\mathbb{V}}^{\text {Adj-Neyman}}(\hat {\tau }_{F})$ and $\widehat{\mathbb{V}}^{\text {Adj-Neyman}}(\hat {\tau }_{P})$ is that they cannot have negative bias (and thus they are conservative). By contrast, the other variance estimators can be downwardly biased.
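The conservativeness in Proposition 4 (1) can be illustrated with the textbook paired variance estimator, $\sum _{j} (D_{j} - \bar {D})^2 / \{ N (N - 1) \}$ , where $D_{j}$ is the treated-minus-control difference in pair j. The sketch below assumes that this textbook form is what the adjusted Neyman variance estimator of $\hat {\tau }_{F}$ reduces to, and that $\mathrm {Var} (\cdot )$ averages over all units; the toy potential outcomes and all names are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical potential outcomes with no attrition: pairs[j][i] = (Y(1), Y(0)).
pairs = [((3.0, 1.0), (4.0, 2.5)), ((6.0, 2.0), (5.0, 4.0)), ((2.0, 2.0), (1.0, 0.0))]
N = len(pairs)

def pair_differences(assignment):
    """Treated-minus-control outcome in each pair under one assignment."""
    return [(u1[0] - u2[1]) if first_treated else (u2[0] - u1[1])
            for (u1, u2), first_treated in zip(pairs, assignment)]

def paired_variance_estimate(diffs):
    """Textbook paired estimator: sum_j (d_j - mean)^2 / {n (n - 1)}."""
    return variance(diffs) / len(diffs)

assignments = list(product([True, False], repeat=N))

# True design variance of the full sample estimator.
true_variance = pvariance([mean(pair_differences(a)) for a in assignments])

# Expectation of the variance estimator over all assignments.
expected_estimate = mean(paired_variance_estimate(pair_differences(a))
                         for a in assignments)
bias = expected_estimate - true_variance

# Right-hand side of Proposition 4 (1): Var{beta(1) - beta(0)} / (N - 1).
pair_mean_effects = [mean(u[0] - u[1] for u in p) for p in pairs]
claimed_bias = pvariance(pair_mean_effects) / (N - 1)

print(abs(bias - claimed_bias) < 1e-9, bias >= 0)  # True True
```

Applying the same function to the complete pairs only gives an analogous estimate for $\hat {\tau }_{P}$ , under the further assumption that the adjusted estimator for the PDE takes the same form on those pairs.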
3 Super-Population
3.1 Setting
Following Imai (2008) and Imbens and Rubin (2015, chs. 6 and 10, esp. pp. 109 and 229), this paper supposes that the above N pairs of a finite sample are drawn from a super-population that is composed of $N^{*} (> N)$ pairs, with $N^{*}$ large, but countable. I define super-population variables and operators in the same way as the corresponding finite sample variables and operators, and denote them by adding superscript * to each term. In particular, the main estimand of this section, the super-population ATE, is denoted by $\tau ^{*} \equiv E^{*} \{ Y^{*}_{i^{*}j^{*}}(1) - Y^{*}_{i^{*}j^{*}}(0) \} \equiv \frac {1}{2N^{*}} \sum _{j^{*}=1}^{N^{*}} \sum _{i^{*}=1}^{2} \{ Y^{*}_{i^{*}j^{*}}(1) - Y^{*}_{i^{*}j^{*}}(0) \}$ .
I assume random sampling of pairs. In addition, the following three assumptions of potential responses are optional in the same spirit as the finite sample case. SP stands for the “super-population.”
Assumption 1* No Attrition: SP
$\forall i^{*}, j^{*}, R^{*}_{i^{*}j^{*}}(1) = R^{*}_{i^{*}j^{*}}(0) = 1.$
Assumption 2* Unitwise Matched Attrition: SP
$\forall j^{*}, R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1), R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0).$
Assumption 3* Pairwise Matched Attrition: SP
$\forall j^{*}, R^{*}_{1j^{*}}(1) R^{*}_{2j^{*}}(0) = R^{*}_{1j^{*}}(0) R^{*}_{2j^{*}}(1).$
Note that unlike the case of a finite sample, in the super-population perspective, $N_{t}$ , $N_{c}$ , and $N_{tc}$ are not constant across sampling of pairs even under Assumption 2* or 3* .
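The point that $N_{tc}$ varies across samples can be seen in a small simulation. The sketch below draws samples of pairs without replacement from a hypothetical super-population in which half of the pairs are always-reporting pairs; the population size, the sample size, the seed, and all names are assumptions of this example.

```python
import random

rng = random.Random(0)  # fixed seed so the draws are reproducible

# Hypothetical super-population of N* = 1000 pairs;
# True marks a pair in which both units always report.
population = [j % 2 == 0 for j in range(1000)]

N = 5  # number of pairs in each finite sample

# N_tc in each of 200 random samples of N pairs (drawn without replacement).
n_tc_draws = [sum(rng.sample(population, N)) for _ in range(200)]

print(sorted(set(n_tc_draws)))  # the distinct values of N_tc that occurred
```

Because $N_{tc}$ takes several values across draws, the super-population results are stated conditional on events such as $N_{tc} \geq 1$ .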
3.2 Bias
In this section, the operator $\mathbb {E}^{*}(\cdot )$ takes expectation not only over the random assignment of the treatment but also over the random sampling of the pairs. Define
The bias of each ATE estimator is as follows.
Proposition 1* Bias of ATE Estimators: SP
(1) Under Assumption 1* , $\mathbb {E}^{*} ( \hat {\tau }_{F} ) - \tau ^{*}= 0.$
(2) Under Assumption 3* ,
$$ \begin{align*} \mathbb{E}^{*} ( \hat{\tau}_{P} \mid N_{tc} \geq 1) - \tau^{*} = E^{*} \{ \beta^{*}_{i^{*}j^{*}}(1) - \beta^{*}_{i^{*}j^{*}}(0) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \}. \end{align*} $$
(3) Under Assumption 2* ,
$$ \begin{align*} \begin{split} &\quad \mathbb{E}^{*} ( \hat{\tau}_{U} \mid N_{t}, N_{c} \geq 1)- \tau^{*} \\ &= \Big\{ \mathbb{E}^{*} \Big( \frac{ N_{tc} }{ N_{t}} \Big| N_{t}, N_{c} \geq 1 \Big) \bar{\beta}^{*}(1 \mid 1, 1) + \mathbb{E}^{*} \Big( \frac{N_{t} - N_{tc} }{ N_{t}} \Big| N_{t}, N_{c} \geq 1 \Big) \bar{\beta}^{*}(1 \mid 1, 0) \Big\} \\ &\quad - \Big\{ \mathbb{E}^{*} \Big( \frac{ N_{tc} }{ N_{c}} \Big| N_{t}, N_{c} \geq 1 \Big) \bar{\beta}^{*}(0 \mid 1, 1) + \mathbb{E}^{*} \Big( \frac{ N_{c} - N_{tc} }{ N_{c}} \Big| N_{t}, N_{c} \geq 1 \Big) \bar{\beta}^{*}(0 \mid 0, 1) \Big\}. \end{split} \end{align*} $$
As in the case of a finite sample, unless we assume ignorable attrition, not only $\hat {\tau }_{U}$ but also $\hat {\tau }_{P}$ is biased for $\tau ^{*}$ . It also holds that $\mathbb {E}^{*}(\hat {\tau }_{P} ) - \tau ^{*}_{P} = 0$ . Furthermore, note that in general,
because
3.3 Variance
In this section, $\mathrm {Var}^{*} (\cdot )$ and $\mathrm {Cov}^{*} (\cdot , \cdot )$ denote the super-population variance and covariance, respectively, and the operators $\mathbb {V}^{2*} (\cdot )$ and $\mathbb {V}^{*} (\cdot , \cdot )$ take variance and covariance, respectively, not only over the random assignment of the treatment but also over the random sampling of the pairs. In addition, I assume that under Assumption 3* , either $\lim _{N^{*} \rightarrow \infty } N^{*}_{tc} = \infty $ or $\lim _{N^{*} \rightarrow \infty } N^{*}_{tc} < 2$ holds. I also assume that under Assumption 2* , the same conditions hold not only for $N^{*}_{tc}$ but also for $N^{*}_{t} - N^{*}_{tc}$ and $N^{*}_{c} - N^{*}_{tc}$ .
Below, I derive the super-population variance of the three ATE estimators in the limit. Note that I increase $N^{*}$ , not N.
Proposition 2* Variance of ATE Estimators: SP
(1) Under Assumption 1* ,
$$ \begin{align*}\lim_{N^{*} \rightarrow \infty} \mathbb{V}^{2*}( \hat{\tau}_{F} ) = \frac{1}{N} [ \mathrm{Var}^{*} \{ \beta^{*}_{i^{*}j^{*}}(1) - \beta^{*}_{i^{*}j^{*}}(0) \} + \mathrm{Var}^{*} \{ \omega^{*}_{i^{*}j^{*}}(1) + \omega^{*}_{i^{*}j^{*}}(0) \} ]. \end{align*} $$
(2) Under Assumption 3* ,
$$ \begin{align*}&\lim_{N^{*} \rightarrow \infty} \mathbb{V}^{2*}( \hat{\tau}_{P}\mid N_{tc} \geq 1)\\& = \mathbb{E}^{*} \Big( \frac{1}{N_{tc}} \Big| N_{tc} \geq 1\Big) [ \mathrm{Var}^{*} \{\beta^{*}_{i^{*}j^{*}}(1) - \beta^{*}_{i^{*}j^{*}}(0)\mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \} \\& \quad+ \mathrm{Var}^{*} \{ \omega^{*}_{i^{*}j^{*}}(1) + \omega^{*}_{i^{*}j^{*}}(0)\mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \}].\end{align*} $$
(3) Under Assumption 2* ,
$$\begin{align*}& \lim_{N^{*} \rightarrow \infty} \mathbb{V}^{2*}( \hat{\tau}_{U} \mid N_{t}, N_{c} \geq 1)\\ &=\Big(\Big[ \mathbb{E}^{*}\Big( \frac{N_{tc} }{N_{t}^2}\Big| N_{t}, N_{c} \geq 1 \Big )\{ \tilde{\beta}^{2*}(1 \mid 1, 1) + \tilde{\omega}^{2*}(1 \mid 1, 1) \} \\ &\quad + \mathbb{E}^{*}\Big( \frac{N_{t} - N_{tc} }{N_{t}^2}\Big| N_{t}, N_{c} \geq 1 \Big ) \{ \tilde{\beta}^{2*}(1 \mid 1, 0) + \tilde{\omega}^{2*} (1 \mid 1, 0) \} \Big] \\ &\quad +\Big[ \mathbb{E}^{*}\Big( \frac{N_{tc} }{N_{c}^2}\Big| N_{t}, N_{c} \geq 1 \Big )\{ \tilde{\beta}^{2*}(0 \mid 1, 1) + \tilde{\omega}^{2*} (0 \mid 1, 1) \} \\ &\quad + \mathbb{E}^{*}\Big( \frac{N_{c} - N_{tc} }{N_{c}^2}\Big| N_{t}, N_{c} \geq 1 \Big ) \{ \tilde{\beta}^{2*}(0 \mid 0, 1) + \tilde{\omega}^{2*}(0 \mid 0, 1) \} \Big] \\ &\quad - 2 \mathbb{E}^{*}\Big (\frac{N_{tc} }{ N_{t} N_{c} } \Big| N_{t}, N_{c} \geq 1 \Big ) [ \mathrm{Cov}^{*}\{ \beta^{*}_{i^{*}j^{*}}(1), \beta^{*}_{i^{*}j^{*}}(0) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \}\\ &\quad - \mathrm{Cov}^{*} \{ \omega^{*}_{i^{*}j^{*}}(1), \omega^{*}_{i^{*}j^{*}}(0) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \} ]\Big) \\ &\quad +\Big[ \mathbb{V}^{2*} \Big( \frac{N_{tc}}{N_{t}} \Big| N_{t}, N_{c} \geq 1 \Big) \{ \bar{\beta}^{*}(1 \mid 1, 1) - \bar{\beta}^{*}(1 \mid 1, 0) \}^2 \\ &\quad + \mathbb{V}^{2*} \Big( \frac{N_{tc}}{N_{c}} \Big| N_{t}, N_{c} \geq 1 \Big) \{ \bar{\beta}^{*}(0 \mid 1, 1) - \bar{\beta}^{*}(0 \mid 0, 1) \}^2\\ &\quad - 2 \mathbb{V}^{*} \Big( \frac{N_{tc}}{N_{t}}, \frac{N_{tc}}{N_{c}} \Big| N_{t}, N_{c} \geq 1 \Big) \{ \bar{\beta}^{*}(1 \mid 1, 1) - \bar{\beta}^{*}(1 \mid 1, 0) \} \{ \bar{\beta}^{*}(0 \mid 1, 1) - \bar{\beta}^{*}(0 \mid 0, 1) \}\Big],\end{align*} $$
where
$$ \begin{align*} \tilde{\beta}^{2*}(x \mid r_{1}, r_{0}) & = \mathrm{Var}^{*} \{ \beta^{*}_{i^{*}j^{*}}(x) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = r_{1}, R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = r_{0} \}\\ \tilde{\omega}^{2*}(x \mid r_{1}, r_{0}) & = \mathrm{Var}^{*} \{ \omega^{*}_{i^{*}j^{*}}(x) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = r_{1}, R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = r_{0} \}. \end{align*} $$
Like the case of a finite sample, $\mathbb {V}^{2*}( \hat {\tau }_{U} \mid N_{tc} \geq 1)$ can be larger than $\mathbb {V}^{2*}( \hat {\tau }_{P}\mid N_{tc} \geq 1)$ .
3.4 Variance Estimator
Finally, I show the super-population biases of the two variance estimators in the limit.
Proposition 3* Bias of the Neyman Variance Estimators: SP
(1) Under Assumption 1* ,
$$ \begin{align*} \lim_{N^{*} \rightarrow \infty} [\mathbb{E}^{*} \{ \widehat{\mathbb{V}}^{\text{Neyman}}(\hat{\tau}_{F}) \} - \mathbb{V}^{2*} ( \hat{\tau}_{F} )] = \frac{2}{N} [ \mathrm{Cov}^{*} \{ \beta^{*}_{i^{*}j^{*}}(1), \beta^{*}_{i^{*}j^{*}}(0) \} - \mathrm{Cov}^{*} \{ \omega^{*}_{i^{*}j^{*}}(1), \omega^{*}_{i^{*}j^{*}}(0) \} ]. \end{align*} $$
(2) Under Assumption 3* ,
$$ \begin{align*} & \lim_{N^{*} \rightarrow \infty} [ \mathbb{E}^{*} \{ \widehat{\mathbb{V}}^{\text{Neyman}}(\hat{\tau}_{P}) \mid N_{tc} \geq 2 \} - \mathbb{V}^{2*} ( \hat{\tau}_{P} \mid N_{tc} \geq 2) ]\\ &= 2 \mathbb{E}^{*} \Big( \frac{1}{N_{tc}} \Big| N_{tc} \geq 2\Big) [ \mathrm{Cov}^{*} \{ \beta^{*}_{i^{*}j^{*}}(1), \beta^{*}_{i^{*}j^{*}}(0) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \}\\ & \quad - \mathrm{Cov}^{*} \{ \omega^{*}_{i^{*}j^{*}}(1), \omega^{*}_{i^{*}j^{*}}(0) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \} ]. \end{align*} $$
(3) Under Assumption 2* ,
$$ \begin{align*} & \lim_{N^{*} \rightarrow \infty} [ \mathbb{E}^{*} \{ \widehat{\mathbb{V}}^{\text{Neyman}}(\hat{\tau}_{U})\mid N_{t}, N_{c} \geq 2 \} - \mathbb{V}^{2*} ( \hat{\tau}_{U} \mid N_{t}, N_{c} \geq 2) ] \\ &= 2 \mathbb{E}^{*}\Big( \frac{N_{tc} }{ N_{t} N_{c} } \Big| N_{t}, N_{c} \geq 2\Big) [ \mathrm{Cov}^{*} \{ \beta^{*}_{i^{*}j^{*}}(1), \beta^{*}_{i^{*}j^{*}}(0) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \}\\ & \quad - \mathrm{Cov}^{*} \{ \omega^{*}_{i^{*}j^{*}}(1), \omega^{*}_{i^{*}j^{*}}(0) \mid R^{*}_{1j^{*}}(1) = R^{*}_{2j^{*}}(1) = R^{*}_{1j^{*}}(0) = R^{*}_{2j^{*}}(0) = 1 \} ] \\ & \quad + \Big[ \mathbb{E}^{*} \Big\{ \frac{ N_{tc} (N_{t} - N_{tc}) }{N_{t}^2(N_{t} - 1)} \Big| N_{t}, N_{c} \geq 2 \Big\} - \mathbb{V}^{2*} \Big( \frac{N_{tc} }{N_{t}} \Big| N_{t}, N_{c} \geq 2 \Big) \Big] \{ \bar{\beta}^{*}(1 \mid 1, 1) - \bar{\beta}^{*}(1 \mid 1, 0) \}^2 \\ & \quad + \Big[ \mathbb{E}^{*} \Big\{ \frac{N_{tc} (N_{c} - N_{tc}) }{N_{c}^2(N_{c} - 1)} \Big| N_{t}, N_{c} \geq 2 \Big\} - \mathbb{V}^{2*} \Big( \frac{N_{tc} }{N_{c}} \Big| N_{t}, N_{c} \geq 2 \Big) \Big] \{ \bar{\beta}^{*}(0 \mid 1, 1) - \bar{\beta}^{*}(0 \mid 0, 1) \}^2 \\ & \quad + 2 \mathbb{V}^{*} \Big( \frac{N_{tc} }{N_{t}}, \frac{N_{tc} }{N_{c}} \Big| N_{t}, N_{c} \geq 2 \Big) \{ \bar{\beta}^{*}(1 \mid 1, 1) - \bar{\beta}^{*}(1 \mid 1, 0) \} \{ \bar{\beta}^{*}(0 \mid 1, 1) - \bar{\beta}^{*}(0 \mid 0, 1) \} . \end{align*} $$
Proposition 4* Bias of the Adjusted Neyman Variance Estimators: SP
(1) Under Assumption 1* ,
$$ \begin{align*} \lim_{N^{*} \rightarrow \infty} [ \mathbb{E}^{*} \{ \widehat{\mathbb{V}}^{\text{Adj-Neyman}}(\hat{\tau}_{F}) \} - \mathbb{V}^{2*} ( \hat{\tau}_{F} ) ] =0. \end{align*} $$
(2) Under Assumption 3* ,
$$ \begin{align*} \lim_{N^{*} \rightarrow \infty} [ \mathbb{E}^{*} \{ \widehat{\mathbb{V}}^{\text{Adj-Neyman}}(\hat{\tau}_{P}) \mid N_{tc} \geq 2 \} - \mathbb{V}^{2*} ( \hat{\tau}_{P} \mid N_{tc} \geq 2) ] =0. \end{align*} $$
(3) Under Assumption 2* ,
$$ \begin{align*} & \lim_{N^{*} \rightarrow \infty} [ \mathbb{E}^{*} \{ \widehat{\mathbb{V}}^{\text{Adj-Neyman}}(\hat{\tau}_{U})\mid N_{tc} \geq 2 \} - \mathbb{V}^{2*} ( \hat{\tau}_{U} \mid N_{tc} \geq 2) ] \\ &= \Big[ \mathbb{E}^{*} \Big\{ \frac{ N_{tc} (N_{t} - N_{tc}) }{N_{t}^2(N_{t} - 1)} \Big| N_{tc} \geq 2 \Big\} - \mathbb{V}^{2*} \Big( \frac{N_{tc} }{N_{t}} \Big| N_{tc} \geq 2 \Big) \Big] \{ \bar{\beta}^{*}(1 \mid 1, 1) - \bar{\beta}^{*}(1 \mid 1, 0) \}^2 \\ & \quad + \Big[ \mathbb{E}^{*} \Big\{ \frac{N_{tc} (N_{c} - N_{tc}) }{N_{c}^2(N_{c} - 1)} \Big| N_{tc} \geq 2 \Big\} - \mathbb{V}^{2*} \Big( \frac{N_{tc} }{N_{c}} \Big| N_{tc} \geq 2 \Big) \Big] \{ \bar{\beta}^{*}(0 \mid 1, 1) - \bar{\beta}^{*}(0 \mid 0, 1) \}^2 \\ & \quad + 2 \mathbb{V}^{*} \Big( \frac{N_{tc} }{N_{t}}, \frac{N_{tc} }{N_{c}} \Big| N_{tc} \geq 2 \Big) \{ \bar{\beta}^{*}(1 \mid 1, 1) - \bar{\beta}^{*}(1 \mid 1, 0) \} \{ \bar{\beta}^{*}(0 \mid 1, 1) - \bar{\beta}^{*}(0 \mid 0, 1) \} . \end{align*} $$
To my knowledge, Proposition 4* (2) is new, and it is surprising. Even if we do not assume ignorable attrition, the adjusted Neyman variance estimator is unbiased for the super-population variance of the PDE in the limit. Here is an intuitive explanation: If we regard always-reporting pairs as an alternative super-population and apply Proposition 4* (1) to it, we obtain Proposition 4* (2). In a nutshell, $\widehat{\mathbb{V}}^{\text {Adj-Neyman}}(\hat {\tau }_{P})$ corresponds to the design of pairwise randomization. On the other hand, the bias directions of the other variance estimators are unknown.
4 Conclusion
Nonignorable attrition in pairwise randomized experiments has attracted less attention than it should. This paper shows that both the UDE and the PDE can be biased (Propositions 1 and 1*). Nonetheless, the practical advice of this paper is simple: use the PDE rather than the UDE. The reasons are summarized as follows:
1. The PDE can be regarded as a kind of local average treatment effect for always-reporting pairs under a milder assumption than the UDE.
2. Compared with the UDE, the PDE is not necessarily less efficient (Propositions 2 and 2*).
3. The adjusted Neyman variance estimator for the PDE is conservative in a finite sample and unbiased in a super-population, which is not the case for the UDE (Propositions 4 and 4*).
Finally, the Neyman variance estimator can have either positive or negative bias for both ATE estimators (Propositions 3 and 3* ) and thus is not recommended.
Acknowledgments
Earlier versions, whose titles were “Blocking Reduces, if not Removes, Attrition Bias” or “Missing Data under the Matched-Pair Design: A Practical Guide,” were presented at the Workshop on Directional Statistics, the Institute of Statistical Mathematics (ISM), Tokyo, July 5, 2012, the Asian Political Methodology Meeting (APMM), Tokyo Institute of Technology, January 6–7, 2014, the Annual Meeting of the Midwest Political Science Association (MPSA), Chicago, April 16–19, 2015, the Annual Summer Meeting of Society for Political Methodology, Rochester, July 23–25, 2015, International Methods Colloquium, September 26, 2015, the Annual Meeting of the Japanese Political Science Association, Chiba University, October 10, 2015, the APMM, University of Sydney, January 9–11, 2017, the Annual Meeting of the MPSA, Chicago, April 6–9, 2017, the Japanese Economic Association Spring Meeting, Ritsumeikan University, June 24–25, 2017, and workshops at Dartmouth College, ISM, Osaka University, University of California, San Diego, and Washington University in St. Louis. I appreciate comments from Raymond Duch, Justin Esarey, Mototsugu Fukushige, Jeff Gill, Donald Green, Yoichi Hizen, Yusaku Horiuchi, Kosuke Imai, Luke Keele, Gary King, Dean Lacy, Charles McClean, Ryan T. Moore, Brendan Nyhan, Fredrik Sävje, Keith Schnakenberg, Susumu Shikano, Patrick Tucker, and Takahide Yanagi.
Funding
This work was supported by Japan Society for the Promotion of Science [grant numbers KAKENHI JP26285031, JP16K13340]; and Computer Centre of Gakushuin University [no grant number].
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2020.51.
Data Availability Statement
The replication materials for the supplementary material of this paper can be found at Fukumoto (2020).
Conflict of Interest
There is no conflict of interest to disclose.