Instrumental Variable Quantile Regression with Misclassification

This paper studies the instrumental variable quantile regression model (Chernozhukov and Hansen, 2005) when a binary treatment variable is possibly misclassified and endogenous. It presents two identification results. First, I show that, under the stochastic monotonicity condition (Small and Tan, 2007; DiNardo and Lee, 2011), the reduced-form quantile treatment effect is biased towards zero compared to the structural quantile treatment effect and can therefore be used as a lower bound for it. The reduced-form quantile treatment effect is the quantile treatment effect of the instrumental variable on the outcome variable, and it is available even without any measurement of the treatment variable. Second, I derive moment conditions for the structural quantile function under standard assumptions about the measurement error. The moment conditions can be used for inference via existing methods for moment inequalities.


Introduction
The instrumental variable quantile regression model (Chernozhukov and Hansen, 2005) is one of the most widely used tools for measuring heterogeneous treatment effects in the presence of endogeneity. In many empirical applications, the treatment variable is also potentially mismeasured, so it is empirically relevant to ask how researchers can use the instrumental variable quantile regression model with a mismeasured treatment variable. For example, Chernozhukov and Hansen (2004) use the instrumental variable quantile regression model to investigate the quantile treatment effects of 401(k) participation on saving behavior, but the pension plan type is subject to measurement error in survey datasets. Using the Health and Retirement Study, Gustman, Steinmeier, and Tabatabai (2008) estimate that around one-fourth of the survey respondents misclassified their pension plan type. To the best of my knowledge, however, no paper has investigated the instrumental variable quantile regression model when a binary treatment variable is potentially misclassified and endogenous.
This paper presents two identification results on the structural quantile function. First, I consider the reduced-form quantile treatment effect, that is, the quantile treatment effect of the instrumental variable on the outcome variable. This analysis can be useful to empirical studies for a few reasons. Most importantly, empirical researchers routinely estimate the reduced-form quantile treatment effect as a part of their data analysis, e.g., Bitler, Hoynes, and Domina (2016). Although it has been used in empirical studies, the reduced-form quantile treatment effect has not been formally related to the structural quantile treatment effect. In addition, the reduced-form quantile treatment effect does not use the treatment variable or its measurement, and therefore it is available even when a measurement does not exist.
I show that, under the rank similarity condition and the stochastic monotonicity condition, the reduced-form quantile treatment effect is biased towards zero and can be used as a lower bound for the structural quantile treatment effect. The stochastic monotonicity condition, proposed by Small and Tan (2007) and DiNardo and Lee (2011), requires that the instrumental variable weakly increases (resp. decreases) the probability of being treated (resp. untreated) for each value of the unobserved heterogeneity in the outcome equation. It is weaker than the deterministic monotonicity condition (Imbens and Angrist, 1994; Angrist, Imbens, and Rubin, 1996) because it allows defiers to exist. Second, I derive moment conditions for the structural quantile function in the presence of measurement error. The identifying power comes from the exogeneity of the measurement error, which is widely used in the measurement error literature (e.g., Bound, Brown, and Mathiowetz, 2001) and yields exclusion restrictions similar to those in Henry, Kitamura, and Salanié (2014). Using simulated datasets, I demonstrate that inference based on those moment conditions can be implemented via an existing method for moment inequalities (Bugni, Canay, and Shi, 2016). I use an inference method for partially identified parameters because the structural quantile function is not point-identified for some data generating processes, no matter how many values the instrumental variable takes.

Related literature
Several papers have considered measurement error in regressors in the quantile regression framework, e.g., Chesher (1991), Schennach (2008), Montes-Rojas (2009), Firpo, Galvao, and Song (2015), and Song (2016). They focus on the case in which the mismeasured regressor is continuously distributed, whereas this paper focuses on a discrete treatment variable, for which the measurement error has to be nonclassical. Ura (2015) investigates the local average treatment effect model with a mismeasured treatment. The local average treatment effect model is also a model for heterogeneous treatment effects in the presence of endogeneity, but it has a different structure from the instrumental variable quantile regression model. Mahajan (2006), Lewbel (2007), and Hu (2008) consider the identification problem of the average treatment effect (or, more generally, the conditional density function of the outcome variable given the true treatment variable) when a discrete treatment variable is mismeasured. Their identification strategy is based on the assumption that the true treatment variable (or the individual treatment effect) is exogenous, and there is no straightforward way to generalize their results to an endogenous treatment. Frazis and Loewenstein (2003) and DiTraglia and García-Jimeno (2015) study a regression model in which a binary variable is potentially misclassified and endogenous. Their approach is based on a homogeneous treatment effect, which does not hold in the quantile treatment effect framework.

Instrumental variable quantile regression model with misclassification
My analysis is based on the instrumental variable quantile regression model of Chernozhukov and Hansen (2005) and, for the sake of simplicity, omits covariates other than the treatment variable. $Y$ is an outcome variable, $D^*$ is a binary treatment variable taking values in $\{0, 1\}$, and $Z$ is an instrumental variable. The structural quantile function $q(d^*, u)$ relates the outcome $Y$ to the treatment variable $D^*$ and the error term $(U_0, U_1)$:
$$Y = q(D^*, U_{D^*}).$$
The random variable $q(d^*, U_{d^*})$ is the potential outcome when $D^* = d^*$. The parameter of interest is the $\tau$-th quantile of the counterfactual outcome: $q(d^*, \tau)$.
I will maintain Assumption 1 throughout this paper. The first condition (i) requires that the outcome variable $Y$ is continuously distributed over the real line. The second condition (ii) is the exogeneity of the instrumental variable $Z$. In this paper I focus on the local exclusion restriction at $\tau$, which suffices to derive the testable implication, Eq. (1), in Chernozhukov and Hansen (2005) for the structural quantile function at $\tau$. The local restriction is a weaker condition than the full independence between $Z$ and $U_{d^*}$ (Chesher, 2003). The last condition (iii) is the rank similarity condition, in which the two unobserved heterogeneity terms $U_0$ and $U_1$ have the same distribution given the endogenous treatment assignment. The rank similarity condition is a restriction on the unobserved heterogeneity in the outcome equation and has been used for investigating heterogeneous treatment effects (e.g., Doksum, 1974; Heckman, Smith, and Clements, 1997; Chernozhukov and Hansen, 2004).
Under the rank similarity condition, Chernozhukov and Hansen (2005) obtain the following relationship between the distribution of $(Y, D^*, Z)$ and the structural quantile function $q(d^*, \tau)$.
Theorem 2. Define $U = U_{D^*}$. Under Assumption 1, Eq. (1) holds.
In this paper, I consider a measurement error problem in the treatment variable. Instead of the true treatment variable $D^*$, a measurement $D$ of $D^*$ is observed in the dataset. The measurement $D$ is not necessarily equal to the truth $D^*$, that is, $D$ may misclassify the treatment status. As a result, the equality (1) cannot be directly used for identifying the structural quantile function.
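For orientation, the main testable implication of Chernozhukov and Hansen (2005) is commonly stated as a conditional probability restriction. The display below is a sketch of that restriction in this paper's notation, assuming no covariates and a binary $D^*$; it is not necessarily the exact form of Eq. (1) here.

```latex
\[
P\bigl(Y \le q(D^*, \tau) \,\big|\, Z = z\bigr) = \tau
\quad\Longleftrightarrow\quad
\sum_{d^* \in \{0,1\}} P\bigl(Y \le q(d^*, \tau),\ D^* = d^* \,\big|\, Z = z\bigr) = \tau .
\]
```

The second form makes the difficulty visible: each summand involves the joint event with the true treatment $D^*$, which is not observed when only the measurement $D$ is available.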

Reduced-form quantile treatment effect
I start the identification analysis by comparing the reduced-form quantile treatment effect $Q_{Y|Z=z_1}(\tau) - Q_{Y|Z=z_0}(\tau)$ and the structural quantile treatment effect $q(1, \tau) - q(0, \tau)$ when $Z$ is a binary variable taking the value $z_0$ or $z_1$. Theorem 3 shows that, under the stochastic monotonicity condition in (2), the reduced-form quantile treatment effect is biased towards zero relative to the structural quantile treatment effect.
Theorem 3. Suppose that Assumption 1 holds and that the stochastic monotonicity condition holds:
$$f_{U,D^*|Z=z_1}(u, 1) \ge f_{U,D^*|Z=z_0}(u, 1) \quad\text{and}\quad f_{U,D^*|Z=z_1}(u, 0) \le f_{U,D^*|Z=z_0}(u, 0) \tag{2}$$
for every $u \in [0, 1]$.
(a) There is some unknown constant $\kappa \in [0, 1]$ such that
$$Q_{Y|Z=z_1}(\tau) - Q_{Y|Z=z_0}(\tau) = \kappa\,\bigl(q(1, \tau) - q(0, \tau)\bigr).$$
The stochastic monotonicity condition in (2) was proposed by Small and Tan (2007) and DiNardo and Lee (2011). It assumes a positive relationship between the treatment variable $D^*$ and the instrumental variable $Z$ in which, for every possible realization $u$ of $U$, the probability of being treated $f_{U,D^*|Z=z}(u, 1)$ is weakly increasing in $z$, and the probability of being untreated $f_{U,D^*|Z=z}(u, 0)$ is weakly decreasing in $z$. Theorem 3 (b) requires a strict inequality at $\tau$ in the stochastic monotonicity condition to guarantee $\kappa > 0$.
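The attenuation of the reduced-form effect can be illustrated numerically. The following is a minimal sketch under an assumed toy design, not the paper's simulation design: the outcome equation `y = effect * d_star + e_u`, the cutoffs `-0.5` and `0.5`, and the function name `simulate_reduced_form_qte` are all hypothetical choices made so that rank similarity and stochastic monotonicity hold by construction.

```python
import numpy as np

def simulate_reduced_form_qte(n=400_000, rho=0.5, effect=1.0, tau=0.5, seed=0):
    """Return (reduced-form QTE, structural QTE) at quantile tau for a toy design."""
    rng = np.random.default_rng(seed)
    # Binary instrument, independent of the unobservables.
    z = rng.integers(0, 2, size=n)
    # Correlated standard normals: a Gaussian copula expressed on the normal scale.
    e_u = rng.standard_normal(n)
    e_v = rho * e_u + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # ASSUMED threshold-crossing treatment: the cutoff rises with z, so the
    # probability of treatment weakly increases in z for every value of e_u.
    cutoff = np.where(z == 1, 0.5, -0.5)
    d_star = (e_v <= cutoff).astype(int)
    # ASSUMED rank-invariant outcome: q(d, u) = effect * d + Phi^{-1}(u),
    # so q(1, tau) - q(0, tau) = effect at every tau.
    y = effect * d_star + e_u
    # The reduced form uses only (Y, Z); no measurement of D* is needed.
    reduced = np.quantile(y[z == 1], tau) - np.quantile(y[z == 0], tau)
    return reduced, effect

reduced_qte, structural_qte = simulate_reduced_form_qte()
print(f"reduced-form QTE: {reduced_qte:.3f}, structural QTE: {structural_qte:.3f}")
```

In this design the reduced-form effect has the same sign as the structural effect and is smaller in magnitude, which is the lower-bound behavior described in the text. Because `d_star` never enters the reduced-form calculation, the result would be unchanged if only a misclassified measurement were available.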

Identified set for the structural quantile function
This section considers the use of the potentially misclassified treatment variable $D$ and provides the sharp identified set for the structural quantile function $q(\cdot, \tau)$. To extract some information about the truth $D^*$ from its measurement $D$, I impose restrictions on the misclassification probabilities.
Assumption 4 (i) is that the measurement error does not depend on $(Y, Z)$. It is a widely used assumption in the literature on measurement error (e.g., Mahajan, 2006; Lewbel, 2007; and Hu, 2008). Assumption 4 (ii) is that the measurement $D$ is positively correlated with the true treatment variable $D^*$, as in Hausman, Abrevaya, and Scott-Morton (1998).
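The positive-correlation requirement can be restated in terms of the misclassification probabilities. The derivation below is a sketch that assumes the labeling $p_0 = P(D = 1 \mid D^* = 0)$ and $p_1 = P(D = 0 \mid D^* = 1)$, and writes $\pi = P(D^* = 1)$; the symbol $\pi$ and this labeling are introduced only for illustration.

```latex
\begin{align*}
E[D D^*] &= (1 - p_1)\,\pi, \\
E[D]     &= (1 - p_1)\,\pi + p_0\,(1 - \pi), \\
\operatorname{Cov}(D, D^*)
         &= (1 - p_1)\,\pi - \pi\bigl[(1 - p_1)\,\pi + p_0\,(1 - \pi)\bigr] \\
         &= \pi (1 - \pi)\,(1 - p_0 - p_1).
\end{align*}
```

Under this labeling, and provided $0 < \pi < 1$, the measurement is positively correlated with the truth exactly when $p_0 + p_1 < 1$.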
To introduce the identification result, I define the sharp identified set for $q(\cdot, \tau)$. I let $Q$ be the set of possible structural quantile functions $q$, $P$ be the set of possible misclassification probabilities $p = (p_0, p_1)$, and $F$ be the set of possible distributions $f_{U,D^*,Z}$. The data generating process is characterized by $(q, p, f_{U,D^*,Z})$, and the underlying parameter $(q, p, f_{U,D^*,Z})$ generates the distribution of the observables $(Y, D, Z)$. Given the distribution $F_{Y,D,Z}$ for $(Y, D, Z)$, the sharp identified set for $(q, p, f_{U,D^*,Z})$ is defined as the set of elements of $Q \times P \times F$ which generate $F_{Y,D,Z}$. The sharp identified set for $q(\cdot, \tau)$ is the projection of the sharp identified set for $(q, p, f_{U,D^*,Z})$ on $q(\cdot, \tau)$.
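How the misclassification probabilities link observed and latent distributions can be sketched as follows. The display reads Assumption 4 (i) as $P(D = d \mid D^*, Y, Z) = P(D = d \mid D^*)$ and uses the assumed labeling $p_0 = P(D = 1 \mid D^* = 0)$, $p_1 = P(D = 0 \mid D^* = 1)$; it is an illustration of the mixture structure, not a restatement of the paper's own equations.

```latex
\begin{align*}
P(Y \le y, D = 1 \mid Z = z)
  &= (1 - p_1)\, P(Y \le y, D^* = 1 \mid Z = z) + p_0\, P(Y \le y, D^* = 0 \mid Z = z), \\
P(Y \le y, D = 0 \mid Z = z)
  &= p_1\, P(Y \le y, D^* = 1 \mid Z = z) + (1 - p_0)\, P(Y \le y, D^* = 0 \mid Z = z).
\end{align*}
% If p_0 + p_1 < 1, the system can be inverted for known (p_0, p_1):
\[
P(Y \le y, D^* = 1 \mid Z = z)
  = \frac{(1 - p_0)\, P(Y \le y, D = 1 \mid Z = z) - p_0\, P(Y \le y, D = 0 \mid Z = z)}
         {1 - p_0 - p_1}.
\]
```

Since $(p_0, p_1)$ is unknown, the inversion holds only for each candidate value of the misclassification probabilities, which is one way to see why they enter as nuisance parameters.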
The sharp identified set for $q(\cdot, \tau)$ under Assumptions 1 and 4 is characterized by moment equalities and inequalities.
Theorem 5. Assume that all elements in $Q \times P \times F$ satisfy Assumptions 1 and 4.
(a) Given a distribution $f_{Y,D,Z}$ for the observed variables, if $(y_0, y_1)$ belongs to the sharp identified set for $q(\cdot, \tau)$, then …
(b) If $Q \times P \times F$ contains all the values $(q, p, f_{U,D^*,Z})$ satisfying Assumptions 1 and 4, then the above conditions are necessary and sufficient for $(y_0, y_1)$ to be in the sharp identified set for $q(\cdot, \tau)$.

Under-identification even with large variation in Z
The structural quantile function is not point-identified in general unless there is additional information on the model primitives $(q, p, f_{U,D^*,Z})$. The failure of point identification happens regardless of the order condition based on Eq. (6), where the number of parameters is 4 and the number of equations is the number of support points of $Z$. Theorem 6 states this failure formally for a class of data generating processes. This finding has practical implications for estimation and inference, since asymptotic arguments for partially identified parameters are quite different from those for point-identified parameters.
The assumptions in Theorem 6 are satisfied in standard settings. These assumptions do not contradict Assumptions 1 and 4. Condition (i) is a regularity condition. Condition (ii) is that the treatment variable can have a non-zero effect on the outcome at quantile index $\tau$. Condition (iii) is that the treatment variable can be exogenous. Condition (iv) is a condition on the size of the parameter space $Q \times P \times F$: any parameter value that is a small perturbation (indexed by $\varepsilon$) of $(q, p, f_{U,D^*,Z})$ also belongs to $Q \times P \times F$.

Inference
To construct a confidence region for the parameter of interest $q(\cdot, \tau)$, this paper recommends the use of subvector inference methods for moment inequality models (Romano and Shaikh, 2008; Bugni et al., 2016; and Kaido, Molinari, and Stoye, 2016), since the identified set for $q(\cdot, \tau)$ in Theorem 5 involves the nuisance parameters $(p_0, p_1)$. In the next section, I use the minimum resampling test in Bugni et al. (2016) among the existing methods. The misclassification probability $(p_0, p_1)$ can also be of interest, and one can also construct a confidence region for $q(\cdot, \tau)$ and $(p_0, p_1)$ jointly. See Canay and Shaikh (2016) for various testing procedures in moment inequality models.

Monte Carlo simulations
Consider the following data generating process. $Z$ is a binary random variable taking $z_0$ with probability 0.5 and $z_1$ with probability 0.5. $U$ and $V$ are distributed according to the two-dimensional Gaussian copula with correlation equal to $\rho$. Assume that $Z$ is independent of $(U, V)$. $D^*$ is a binary random variable with $D^* = 1\{V \le \dots\}$. The measurement $D$ is equal to the true value $D^*$ with probability 0.75, that is, $p_{d^*} = 0.25$.
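The treatment and measurement part of such a design can be simulated in a few lines. This is a sketch under stated assumptions: the threshold `0.5 + gamma * (z - 0.5)` for $V$ and the function name `simulate_misclassified_sample` are hypothetical stand-ins, since the exact threshold is not reproduced above, and the outcome equation is omitted.

```python
import numpy as np
from statistics import NormalDist

def simulate_misclassified_sample(n=200_000, rho=0.5, gamma=0.2, p_flip=0.25, seed=0):
    """Draw (Z, D*, D) with a misclassification probability of p_flip for both arms."""
    rng = np.random.default_rng(seed)
    # Z takes each of its two values with probability 0.5.
    z = rng.integers(0, 2, size=n)
    # Gaussian copula with correlation rho, kept on the normal scale.
    e_u = rng.standard_normal(n)
    e_v = rho * e_u + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # ASSUMED threshold: V <= 0.5 + gamma * (z - 0.5), mapped to the normal scale.
    norm = NormalDist()
    cut = np.array([norm.inv_cdf(0.5 - gamma / 2), norm.inv_cdf(0.5 + gamma / 2)])
    d_star = (e_v <= cut[z]).astype(int)
    # The measurement flips the truth with probability p_flip, independently of (Y, Z).
    flip = rng.random(n) < p_flip
    d = np.where(flip, 1 - d_star, d_star)
    return z, d_star, d

z, d_star, d = simulate_misclassified_sample()
mis_rate = np.mean(d != d_star)
first_stage_true = d_star[z == 1].mean() - d_star[z == 0].mean()
first_stage_obs = d[z == 1].mean() - d[z == 0].mean()
print(f"misclassification rate: {mis_rate:.3f}")
print(f"first stage with D*: {first_stage_true:.3f}, with D: {first_stage_obs:.3f}")
```

With symmetric misclassification probabilities of 0.25, the first stage computed from the measurement is about $(1 - 0.25 - 0.25) = 0.5$ times the first stage computed from the true treatment, which shows how the measurement error weakens the apparent strength of the instrument.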
Consider the parameter values for $(\rho, \gamma)$ in Table 1. $\rho$ controls the degree of endogeneity in $D^*$, and $\gamma$ represents the strength of the instrumental variable. Table 1 also shows the values of the population parameters.
For each design, I conduct Monte Carlo experiments with sample size 1000. Figures 1-3 describe the coverage probabilities of the confidence intervals for the structural quantile treatment effect $\theta = q(1, \tau) - q(0, \tau)$ at $\tau = 0.5$. I use the subvector inference method in Bugni et al. (2016) to compute the confidence intervals with size 10%, and the coverage probabilities in the figures are based on 2000 simulations. The tuning parameters follow the suggestions in Bugni et al. (2016). All the figures demonstrate that the inference method works well in finite samples.
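When reading the simulated coverage probabilities, it helps to keep the Monte Carlo error in mind. The calculation below is a back-of-the-envelope sketch, assuming independent simulation draws and a parameter value at which the true coverage equals the nominal level $1 - 0.10 = 0.90$.

```latex
\[
\operatorname{se}(\hat{c})
  = \sqrt{\frac{0.90 \times 0.10}{2000}}
  = \sqrt{0.000045}
  \approx 0.0067,
\qquad
0.90 \pm 2 \times 0.0067 \approx [0.887,\ 0.913].
\]
```

Under these assumptions, simulated coverage anywhere in roughly this band is consistent with exact nominal coverage. For partially identified parameters, coverage above the nominal level at some points of the identified set is also expected.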

Conclusion
This paper investigates the instrumental variable quantile regression model when a binary regressor is possibly misclassified and endogenous. I show that the reduced-form quantile treatment effect is a lower bound (with the same sign) for the structural quantile treatment effect, under the rank similarity condition and the stochastic monotonicity condition. I also characterize the sharp identified set for the structural quantile function under the rank similarity condition and standard assumptions on the measurement error.

Figures 1-3. Coverage probabilities for the designs in Table 1. The solid curve represents the coverage probability and the dotted line represents the identified set.

A Proofs
Proof of Theorem 2. For the sake of completeness, I repeat the proof of Chernozhukov and Hansen (2005) under the local exclusion restriction at τ in Assumption 1. The first equation follows from where the second to last equality comes from Assumption 1 (iii) and the last comes from Assumption 1 (ii).