A SIMPLE NONPARAMETRIC APPROACH FOR ESTIMATION AND INFERENCE OF CONDITIONAL QUANTILE FUNCTIONS

In this paper, we present a new nonparametric method for estimating a conditional quantile function and develop its weak convergence theory. The proposed estimator is computationally easy to implement and automatically ensures quantile monotonicity by construction. For inference, we propose to use a residual bootstrap method. Our Monte Carlo simulations show that this new estimator compares well with the check-function-based estimator in terms of estimation mean squared error. The bootstrap confidence bands yield adequate coverage probabilities. An empirical example uses a dataset of Canadian high school graduate earnings, illustrating the usefulness of the proposed method in applications.


INTRODUCTION
Quantile regression has become a very useful tool in economics, statistics, and other social sciences. Concerns about quantile crossing and model misspecification (He, 1997; Angrist, Chernozhukov, and Fernández-Val, 2006) have led to the development of nonparametric estimation of conditional quantile functions (Chaudhuri, 1991; Koenker, Ng, and Portnoy, 1994; Yu and Jones, 1998). An important framework in this methodology is the following nonparametric location-scale model, in which the outcome variable Y_i and the covariate X_i are related by

Y_i = m_0(X_i) + σ_0(X_i)ε_i, for i = 1, …, n,  (1)

where ε_i is a zero-mean error term that is independent of X_i, and m_0(·) and σ_0(·) > 0 are the unknown location and scale functions, respectively.
Model (1) generalizes the location-scale-based linear quantile regression model. In addition, compared with homoskedastic nonparametric quantile regression models (Chaudhuri, 1991; Honda, 2004), i.e., Y_i = m_0(X_i) + u_i, where the error term u_i has zero mean and is independent of X_i, the location-scale model is more flexible in capturing heterogeneity across individuals. As a result, it has received a great deal of attention in the literature (Fan and Gijbels, 1996; Van Keilegom and Akritas, 1999; Heuchenne and Van Keilegom, 2007; Einmahl and Van Keilegom, 2008; Neumeyer, 2008, 2009a; Florens, Simar, and Van Keilegom, 2014; Birke, Neumeyer, and Volgushev, 2017). For example, Chen, Dahl, and Khan (2005) use a censored nonparametric location-scale regression model to study how the median-length unemployment insurance spell in New Jersey varies with age, whereas Florens et al. (2014) use a flexible nonparametric location-scale model to estimate a frontier function. There is a rich literature on nonparametrically estimating the error's distribution function and/or testing independence between the covariate X and the idiosyncratic error u based on the nonparametric location-scale model. For example, Akritas and Van Keilegom (2001) study the nonparametric location-scale model (1) and establish the weak convergence of the process √n(F̂(·) − F(·)), where F̂ is the empirical distribution function of the nonparametrically estimated residuals ε̂_i = (Y_i − m̂_n(X_i))/σ̂_n(X_i), and m̂_n(X_i) and σ̂_n(X_i) are nonparametric estimators of m_0(X_i) and σ_0(X_i), respectively. Birke et al. (2017) investigate the empirical independence process of covariates and estimated errors in a nonparametric location-scale conditional quantile model and prove a weak convergence result.
Cheng (2002), Einmahl and Van Keilegom (2008), Neumeyer (2009a, 2009b), and Kiwitt and Neumeyer (2012) consider specification tests for model (1), including tests of independence between the error and the covariate X and tests of monotonicity of the nonparametric conditional quantile function, and they establish bootstrap versions of the weak convergence results.
In this paper, we also consider a location-scale model given in (1), but differing from the existing literature, we focus on the problem of estimating the conditional quantile function of Y_i given X_i = x within a location-scale quantile model framework. In particular, we propose a new estimator that is theoretically and computationally simple. The simplicity of our approach is rooted in an identification strategy that differs from prior studies in the context of quantile estimation. Specifically, we assume that E[ε_i] = 0 and E[ε_i²] = 1, which is merely a normalization and hence is not restrictive. This assumption is not new and is, in fact, ubiquitous in mean regression models. When it comes to quantile regression, however, the literature has taken a different route. For example, He (1997) requires that ε_i have a median of zero and that |ε_i| have a median of one, which, of course, can also be achieved without loss of generality. Similar quantile-type normalizations have been imposed in, for instance, Chaudhuri (1991), Horowitz and Lee (2005), Chen et al. (2005), and Birke et al. (2017).
An appealing consequence of our normalization is that the conditional quantile function q_τ(x) of Y_i given X_i = x takes a particularly simple closed form:

q_τ(x) = m_0(x) + σ_0(x)Q_ε(τ), for all τ ∈ (0,1),

where Q_ε(τ) is the τth quantile of ε_i. Note that m_0(x) and σ_0²(x) are the conditional mean and conditional variance functions of Y_i given X_i = x, respectively, both of which can be estimated by standard nonparametric methods (Fan and Yao, 1998). Thus, if m̂_n(x) and σ̂_n(x) are estimators of m_0(x) and σ_0(x), respectively (which can be obtained from mean regression), then q_τ(x) can be estimated by

q̂_τ(x) = m̂_n(x) + σ̂_n(x)Q̂_ε(τ),

where Q̂_ε(τ) is the empirical quantile function of the regression residuals. Here, we stress that one can use any nonparametric method (kernel or series) to estimate m_0(x) and σ_0(x). As long as they are consistent, the resulting q̂_τ(x) will be consistent as well in view of equation (4).
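To make the closed form concrete, here is a minimal numerical sketch that estimates m_0 and σ_0² by local constant (Nadaraya-Watson) regression and plugs in the empirical residual quantile. The DGP, Gaussian kernel, and bandwidth below are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1500
X = rng.uniform(-1.0, 1.0, n)
eps = rng.standard_normal(n)                     # E[eps] = 0, E[eps^2] = 1
Y = np.sin(np.pi * X) + (0.5 + 0.25 * X**2) * eps

def nw(x0, X, Z, h):
    """Local constant (Nadaraya-Watson) estimate of E[Z | X = x0]."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)       # Gaussian kernel weights
    return w @ Z / w.sum()

h = 0.15
m_fit = np.array([nw(x, X, Y, h) for x in X])                  # fitted conditional means
v_fit = np.array([nw(x, X, (Y - m_fit) ** 2, h) for x in X])   # fitted conditional variances
resid = (Y - m_fit) / np.sqrt(np.maximum(v_fit, 1e-12))        # standardized residuals

def q_hat(x0, tau):
    """q_hat_tau(x0) = m_hat(x0) + sigma_hat(x0) * empirical tau-quantile of residuals."""
    m0 = nw(x0, X, Y, h)
    s0 = np.sqrt(max(nw(x0, X, (Y - m_fit) ** 2, h), 0.0))
    return m0 + s0 * np.quantile(resid, tau)

taus = np.linspace(0.1, 0.9, 17)
curve = [q_hat(0.3, t) for t in taus]
```

Because σ̂(x) ≥ 0 and the empirical quantile function is nondecreasing in τ, the fitted quantile curves cannot cross, which is the monotonicity-by-construction property discussed above.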
In the literature, there are two classes of estimation strategies for the conditional quantile function. The first is tailored to the location-scale model (1). As mentioned previously, a common feature of these methods is that they adopt quantile-type normalizations. Although these normalizations are not restrictive, the resulting estimation steps are not as straightforward as ours. For example, the restricted regression quantile approach proposed by He (1997) consists of three different quantile regressions. Following the estimation procedure proposed by Dette and Volgushev (2008), which is designed to avoid quantile crossing, Birke et al. (2017) present an estimator constructed from a transformation of a kernel estimator of the conditional cumulative distribution function (CDF), where the transformation entails an auxiliary distribution function, a kernel function, and a bandwidth parameter (in addition to those used in kernel estimation). In contrast, our estimator automatically guarantees monotonicity in the quantile index τ. Therefore, we need no sorting (Chernozhukov, Fernández-Val, and Galichon, 2010; Qu and Yoon, 2015), isotonization (Mammen, 1991), or transformation of any kind (Dette and Volgushev, 2008). This desirable property of our approach stems from the fact that we exploit the location-scale structure, whereas the alternative monotonization methods have their own merits in non-location-scale settings.
The second class of strategies for estimating the conditional quantile function is fully nonparametric. There are two popular methods in this area: inverting an estimator of the conditional distribution function and using the check-function-based approach. The inverse-CDF-based estimator is of the form q̂_τ(x) = inf{y : F̂_n(y|x) ≥ τ}, for τ ∈ (0,1), where F̂_n(y|x) is a nonparametric estimator of the conditional distribution function F(y|x). The check-function-based approach delivers

q̌_τ(x) = argmin_a Σ_{i=1}^n ρ_τ(Y_i − a)K((X_i − x)/h),

where ρ_τ(u) = u(τ − 1(u < 0)) is the check function, K(·) is a kernel function, and h is a bandwidth parameter. It is well known that the choice of smoothing parameter is of crucial importance in nonparametric estimation, and some data-driven methods, such as the least-squares cross-validation (LS-CV) method, can lead to optimal smoothing parameter selection. However, for the inverse-CDF-based estimator, the LS-CV method requires O(n³) computation time (Li, Lin, and Racine, 2013), where n is the sample size. By contrast, our quantile estimator requires only O(n²) computation time. Moreover, while it is difficult to obtain a derivative estimator and derive its asymptotic theory if one uses the inverse-CDF-based method to estimate the quantile function, our method also yields an estimate of the derivative function when the local linear estimation method is used. In addition, as in Racine and Li (2017), when the data have unbounded support, our estimator tends to have lower finite-sample estimation mean squared errors (MSEs).
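For comparison, the local constant check-function estimator admits an equally short sketch: minimizing the kernel-weighted check loss over a constant is the same as taking a kernel-weighted τ-quantile of Y, which in turn coincides with inverting the Nadaraya-Watson estimate of F(y|x). The DGP and tuning choices below are illustrative.

```python
import numpy as np

def check_quantile(x0, tau, X, Y, h):
    """Local constant check-function estimate: the kernel-weighted tau-quantile
    of Y, i.e. the minimizer of sum_i rho_tau(Y_i - a) K((X_i - x0)/h) over a.
    Equivalently, it inverts the weighted empirical CDF F_hat(y | x0)."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)       # Gaussian kernel weights
    order = np.argsort(Y)
    cdf = np.cumsum(w[order]) / w.sum()          # F_hat(Y_(k) | x0)
    k = np.searchsorted(cdf, tau)                # smallest k with F_hat >= tau
    return Y[order][min(k, len(Y) - 1)]

rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(-1.0, 1.0, n)
Y = X + rng.standard_normal(n)                   # true q_tau(x) = x + z_tau
q50 = check_quantile(0.0, 0.5, X, Y, 0.2)
```

The sketch makes the computational point visible: every evaluation is a weighted quantile, and cross-validating it requires re-solving this weighted-quantile problem at each held-out point.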
We exploit the location-scale structure in constructing our estimator and require the existence of higher-order moments in deriving the asymptotic distribution theory. By contrast, the inverse-CDF-based and check-function-based estimators do not impose these restrictions. Therefore, we acknowledge that our model setup is restrictive relative to these two methods. However, in practice, researchers are often prepared to make some model assumptions, and our proposed method can be useful in empirical applications, given its simplicity. Furthermore, our method is not intended to replace the existing inverse-CDF-based or check-function-based estimators, which are well established in the literature. Instead, it is an additional tool that complements these existing methods.
Although our method avoids quantile crossings as a result of the location-scale structure of the model, it does not prevent the model from being misspecified; that is, the true quantile function may not have a location-scale specification. This is a somewhat more subtle issue than the indication of misspecification via quantile crossings (Chernozhukov et al., 2010; Phillips, 2015). Following Racine and Li (2017), if possible misspecification of the location-scale model is a concern, we suggest using a pretest procedure to detect it. Using a pretest estimator avoids severe misspecification with regard to the location-scale model assumption. We apply this pretest method to some data generating processes (DGPs) that violate the location-scale assumption. Simulation results show that the pretest procedure works well in guarding against severe misspecification. Note that our model structure is the same as that of Racine and Li (2017), but our estimation procedure is much simpler. In Section 6.1, we use simulations to compare the finite-sample performance of our proposed estimator with that of the check-function-based estimator and Racine and Li's (2017) estimator.
We reiterate that the simplicity of our estimation hinges on an identification strategy that is common in the context of mean regression but (somewhat surprisingly) less appreciated in the quantile regression literature. We derive Bahadur expansions for our estimator, which, in turn, deliver its asymptotic distribution; both are uniform in the quantile index. For inference, we propose a residual bootstrap procedure. These uniformity results are useful, for example, for constructing confidence bands for the conditional quantile function and for testing distributional hypotheses, such as the homogeneity or equality of quantile treatment effects, as in Koenker and Xiao (2002) and Qu and Yoon (2015). As an illustration, we consider the problem of constructing confidence bands. Our simulation results further show that the bootstrap confidence intervals have adequate coverage probabilities under the different error distributions that we consider. For an empirical illustration, we apply our method to a Canadian dataset of high school graduate earnings and examine how the conditional quantiles of log-wage (the logarithm of the wage) vary across quantile indexes τ ∈ (0,1).
The remainder of the paper is organized as follows. Section 2 describes the methodology in detail. Section 3 introduces assumptions and establishes the asymptotic theory for our proposed estimator. Section 4 considers the bootstrap inference. We extend our local constant conditional quantile estimator to the local linear case in Section 5. In Section 6, we use Monte Carlo simulations to examine the finite-sample performances of the proposed quantile estimator and the bootstrap confidence interval. Section 7 contains an illustrative empirical application. Section 8 concludes. Proofs of the main theoretical results of the paper can be found in the Supplementary Material.

METHODOLOGY
In this section, we describe a two-step procedure for estimating the conditional quantile function q_τ(x). In the first step, we use undersmoothed smoothing parameters b_1 and b_2 to estimate the conditional mean function m_0(x) and the conditional standard deviation (scale) function σ_0(x). This, in turn, allows us to construct the residuals {ε̂_i}_{i=1}^n and then use their empirical quantile function as an estimator of Q_ε(τ). In the second step, we use, possibly optimally selected, smoothing parameters h_1 and h_2 to estimate m_0(x) and σ_0(x). These estimators, together with the estimator of Q_ε(τ) from the first step, yield the final estimator of q_τ(x) = m_0(x) + σ_0(x)Q_ε(τ). We focus on the local constant estimator; the extension to the local linear case is given in Section 5.
We fix x and treat q_τ(x) as a process indexed by τ ∈ T = [τ̲, τ̄] with 0 < τ̲ ≤ τ̄ < 1. Let X ⊂ R denote the support of X. We assume that X is a union of bounded and closed sets. Because the local constant kernel estimator has a large estimation bias in the boundary region, we use a trimming function to eliminate observations near the boundary. For expositional simplicity, we consider the simple case X = [0,1] to illustrate how to construct a trimming function. Let h_1 and h_2 be the smoothing parameters used to estimate m_0(x) and σ_0(x), respectively. To avoid the boundary bias problem, similar to Birke et al. (2017), we define a trimmed set X_n := [δ_n, 1 − δ_n], where δ_n = 2 max{h_1, h_2}. We then only consider x ∈ X_n. Note that as n grows, the trimmed set X_n approaches (0,1), so that asymptotically we only trim out the boundary points {0} and {1} (which have zero Lebesgue measure). In addition, we assume the kernel function K(·) has support [−1,1]. Then, in kernel estimation, when n is sufficiently large, a term like K((X_i − x)/h_j) with x ∈ X_n is nonzero only for observations with X_i ∈ [δ_n/2, 1 − δ_n/2]. We will not distinguish between the trimming sets [δ_n/2, 1 − δ_n/2] and [δ_n, 1 − δ_n], and we denote both by X_n. We are now ready to present the details of the two-step estimation procedure.

Step 1. We estimate the quantile function of the error term based on nonparametric residuals. First, we estimate m_0(x) and σ_0(x) by the nonparametric local constant kernel method with the undersmoothed parameters b_1 and b_2 (see Assumption 3.5):

m̂_{b_1}(x) = Σ_{i=1}^n K((X_i − x)/b_1)Y_i 1_{i,n} / Σ_{i=1}^n K((X_i − x)/b_1)1_{i,n},
σ̂²_{b_2}(x) = Σ_{i=1}^n K((X_i − x)/b_2)(Y_i − m̂_{b_1}(X_i))² 1_{i,n} / Σ_{i=1}^n K((X_i − x)/b_2)1_{i,n},

where 1_{i,n} = 1(X_i ∈ X_n). The residuals are ε̂_i = (Y_i − m̂_{b_1}(X_i))/σ̂_{b_2}(X_i), and Q̂_ε(τ) is the empirical τth quantile of the residuals with X_i ∈ X_n.
Step 2. Estimate m_0(·) and σ_0(·) with the smoothing parameters h_1 and h_2 (where we allow for optimally selected smoothing parameters):

m̂_{h_1}(x) = Σ_{i=1}^n K((X_i − x)/h_1)Y_i 1_{i,n} / Σ_{i=1}^n K((X_i − x)/h_1)1_{i,n},
σ̂²_{h_2}(x) = Σ_{i=1}^n K((X_i − x)/h_2)(Y_i − m̂_{h_1}(X_i))² 1_{i,n} / Σ_{i=1}^n K((X_i − x)/h_2)1_{i,n}.

Finally, the τth conditional quantile estimator of Y given X = x is given by

q̂_τ(x) = m̂_{h_1}(x) + σ̂_{h_2}(x)Q̂_ε(τ),

where σ̂_{h_2}(x) = √(σ̂²_{h_2}(x)).
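A compact sketch of the two-step procedure follows. The DGP and bandwidths are illustrative, and a Gaussian kernel is used for simplicity even though the formal assumptions use a compactly supported kernel (Section 6 notes that a Gaussian kernel also works well in practice).

```python
import numpy as np

def nw(x0, X, Z, h, keep=None):
    """Local constant estimate of E[Z | X = x0], optionally trimming observations."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    if keep is not None:
        w = w * keep
    return w @ Z / w.sum()

rng = np.random.default_rng(2)
n = 1200
X = rng.uniform(0.0, 1.0, n)
Y = X**2 + (0.5 + 0.5 * X) * rng.standard_normal(n)

h1 = h2 = 0.10                     # Step 2 bandwidths (could be CV-selected)
b1 = b2 = 0.05                     # Step 1 bandwidths (undersmoothed)
delta = 2 * max(h1, h2)
keep = ((X >= delta) & (X <= 1 - delta)).astype(float)   # trimming indicator 1(X_i in X_n)

# Step 1: standardized residuals and their empirical quantile function
m_b = np.array([nw(x, X, Y, b1, keep) for x in X])
v_b = np.array([nw(x, X, (Y - m_b) ** 2, b2, keep) for x in X])
resid = ((Y - m_b) / np.sqrt(np.maximum(v_b, 1e-12)))[keep == 1.0]

# Step 2: mean and scale at the evaluation point, then plug in
def q_hat(x0, tau):
    m0 = nw(x0, X, Y, h1, keep)
    s0 = np.sqrt(max(nw(x0, X, (Y - m_b) ** 2, h2, keep), 0.0))
    return m0 + s0 * np.quantile(resid, tau)
```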
Remark 2.1. In this paper, the smoothing parameters b_1 and b_2 are used only for the preliminary estimates of m_0(X_i) and σ_0²(X_i) that are needed for estimating the quantile Q_ε(τ). After we obtain Q̂_ε(τ), we estimate q_τ(x) by estimating m_0(x) and σ_0²(x) with the smoothing parameters h_1 and h_2.

Remark 2.2. In practice, one can select h_1 and h_2 using the leave-one-out cross-validation method. Specifically, h_1 minimizes the leave-one-out least-squares criterion for the mean regression, and h_2 minimizes the analogous criterion with the squared residuals as the response, where σ̂²_{−i,h}(X_i) is the leave-one-out kernel estimator of σ_0²(X_i) with the smoothing parameter h. However, when we construct uniform confidence bands, we use undersmoothed smoothing parameters h_j, j = 1, 2.
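The leave-one-out selection in Remark 2.2 can be sketched as follows for the mean-regression bandwidth (the variance-function bandwidth is selected analogously, with the squared residuals as the response); the grid and DGP are illustrative.

```python
import numpy as np

def loo_cv_score(h, X, Z):
    """Leave-one-out least-squares CV score for a local constant fit of Z on X."""
    D = (X[:, None] - X[None, :]) / h
    W = np.exp(-0.5 * D**2)                 # Gaussian kernel weight matrix
    np.fill_diagonal(W, 0.0)                # leave observation i out
    fit = W @ Z / W.sum(axis=1)             # held-out fitted values
    return np.mean((Z - fit) ** 2)

rng = np.random.default_rng(3)
n = 400
X = rng.uniform(-1.0, 1.0, n)
Y = np.sin(np.pi * X) + 0.3 * rng.standard_normal(n)

grid = np.linspace(0.02, 0.5, 25)
scores = [loo_cv_score(h, X, Y) for h in grid]
h_cv = grid[int(np.argmin(scores))]
```

A grid search like this is O(n²) per candidate bandwidth, which is the source of the O(n²) computational cost cited in the Introduction.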

ASYMPTOTIC THEORY
We make the following regularity assumptions.
Assumption 3.1. (i) {(X_i, Y_i)}_{i=1}^n is an i.i.d. sample; (ii) the error term ε_i has mean zero and unit variance and is independent of X_i, for all i = 1, …, n; (iii) X, the support of X, is a union of closed and bounded intervals in R.
Assumption 3.2. The kernel function K(·) has support [−1,1]. It is a symmetric and bounded density function, and it is twice continuously differentiable on (−1,1).

Assumption 3.3. Let f(x) be the density function of X evaluated at x ∈ X. (i) Both m_0(x) and σ_0(x) are twice continuously differentiable in x in the interior of X, and their second derivative functions are uniformly bounded over X; (ii) f(·) and σ_0(·) are bounded away from zero on X.

Assumption 3.4. (i) The support E of the error term ε is convex; (ii) the distribution function F_ε of ε is twice continuously differentiable, and its second derivative function is bounded on E; (iii) the density function f_ε of ε is uniformly continuous on E and positive in a neighborhood of [Q_ε(τ̲), Q_ε(τ̄)].

Assumption 3.5. The bandwidths h_j and b_j, j = 1, 2, satisfy, as n → ∞: (i) h_j = c_j n^{−1/5} for some finite constant c_j > 0; (ii) b_j = c̃_j n^{−β} for some finite constant c̃_j > 0 and β ∈ (1/5, 1/3).

Assumption 3.1 is fairly standard. Although the i.i.d. assumption is maintained throughout the paper, the results can be easily extended to the weakly dependent data case. Assumption 3.2 includes standard restrictions on the kernel function. Assumption 3.3 imposes smoothness conditions on m_0(·), σ_0(·), and the density function f(·); we also require f(·) and the scale σ_0(·) to be positive over the support for identification reasons. Assumption 3.4 is a sufficient condition for the desired asymptotic property of the empirical quantile estimator Q̂_ε(τ). Assumption 3.5 allows for optimal smoothing on h_1 and h_2 but requires undersmoothing for b_1 and b_2; the assumption implies that nb_j³/(log n)² → ∞, j = 1, 2. We impose undersmoothing on b_j, j = 1, 2, so that the estimation error from Q̂_ε(τ) is asymptotically negligible. Throughout this paper, we use the notation ġ(x) = dg(x)/dx and g̈(x) = d²g(x)/dx², where g is a function of x, such as m_0 or σ_0, or estimators thereof (m̂, σ̂, etc.).

THEOREM 3.1. Under Assumptions 3.1-3.5, for any fixed x in the interior of X,

with zero mean, and the covariance structure is given by
Theorem 3.1 is a weak convergence result, and it immediately implies the pointwise convergence of q̂_τ(x) (for any fixed τ ∈ T), as described in the following corollary.
COROLLARY 3.1. Under Assumptions 3.1-3.5, the asymptotic normality of q̂_τ(x) holds for any x in the interior of X and any τ ∈ T, where D_1(x), D_2,τ(x), ν_0, and c are the same as in Theorem 3.1.
Remark 3.1. Note that the leading bias of the quantile estimator q̂_τ(x) includes two terms: (a) D_1(x), the estimation bias in m̂(x), and (b) D_2,τ(x), the estimation bias in σ̂(x). The asymptotic variance of q̂_τ(x) is related to the estimation variances of m̂(x) and σ̂(x), as well as the covariance between the two. If one uses a higher-order kernel to estimate σ_0(x), as in Racine and Li (2017), then σ̂(x) will converge faster than m̂(x). Therefore, the leading bias of our quantile estimator simplifies to D_1(x), and the asymptotic variance collapses to ν_0 σ_0²(x)/f(x). These coincide with the leading bias and variance in Racine and Li (2017). Then, by Propositions 4.4 and 4.5 of Racine and Li (2017), we know that (1) when the error ε has unbounded support, our estimator tends to have a smaller estimation MSE than the check-function-based estimator, and (2) if ε has bounded support with a density function bounded away from zero, the check-function-based method is likely to have a smaller estimation MSE than our proposed method.
Remark 3.2. In practice, one can use data-driven methods to select the optimal smoothing parameters; Theorem 3.1 and Corollary 3.1 still hold with data-dependent smoothing parameters by the stochastic equicontinuity argument of Li and Li (2010).

Remark 3.3.
Note that in the two-step estimation procedure described in Section 2, when estimating σ_0²(x), we use m̂_λ(X_j), j ∈ {1, …, n}, instead of m̂_λ(x) (λ denotes a generic smoothing parameter). Fan and Yao (1998) recommend the same approach, as it leads to the bias D_2,τ containing only two terms. If one used m̂_λ(x) instead, the bias D_2,τ would contain three terms.
Remark 3.4. A co-editor suggests that our identification conditions may be relaxed to Median(ε_i) = 0 and Median(|ε_i|) = 1 to allow for fat-tailed distributions of ε_i. These normalization conditions were considered in Birke et al. (2017), whose results can be applied to estimate the conditional quantile function q_τ(x) under these normalizations. Unlike our model setup, however, this approach does not lead to a simple closed-form estimator. We leave a formal investigation of this approach as a topic for future research.
Remark 3.5. Zhao and Xiao (2014) develop an innovative approach to constructing efficient estimators of conditional mean regression functions via quantile regressions. Their proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. They use the check-function-based method to estimate the conditional quantile functions. Under the location-scale model framework, one may use our proposed method to estimate the conditional quantile function. Then, combining information over multiple quantiles, as suggested by Zhao and Xiao (2014), may lead to a more efficient estimator of the conditional mean function. A theoretical investigation of such an estimation method is beyond the scope of the present paper.

BOOTSTRAP INFERENCE
In this section, we turn our attention to inference on the conditional quantile function estimator. Many applied researchers are interested in the distributional treatment effects of a policy intervention. The inferential theory can be used to analyze issues such as (i) constructing a confidence band for the conditional quantile estimator and (ii) testing distributional hypotheses such as the homogeneity or equality of quantile treatment effects. In this paper, we focus on (i) and use the bootstrap method to construct confidence intervals for the proposed conditional quantile estimator. There are two motivations for using a bootstrap method for inference. First, the conventional analytic asymptotic approximation entails estimation of covariances whose forms are somewhat complicated (Calonico, Cattaneo, and Titiunik, 2014, 2018). Second, bootstrapping often yields more reliable and accurate results (Beran, 1982; Hall, 1992). We use the residual bootstrap method to construct the confidence intervals.

Uniform Bootstrap Confidence Interval Algorithm
To avoid calculating the complicated leading bias terms of the conditional quantile estimator, we suggest using undersmoothed smoothing parameters h_j, j = 1, 2, in constructing bootstrap confidence bands. We use h*_j and q̂*_τ(x) to denote the corresponding smoothing parameters and the quantile estimator computed from the bootstrap sample, respectively. For α ∈ (0,1), let

c*_{1−α}(x) = inf{c > 0 : P_W(sup_{τ∈T} |q̂*_τ(x) − q̂_τ(x)| ≤ c) ≥ 1 − α},

where P_W is computed with respect to the bootstrap randomness conditional on the data. Then, our level 1 − α confidence band is given by

[q̂_τ(x) − c*_{1−α}(x), q̂_τ(x) + c*_{1−α}(x)], τ ∈ T.

We propose a three-step procedure to construct the uniform bootstrap confidence intervals for the quantile process q_τ(x) over the set T. Let B denote the number of bootstrap replications. For each b = 1, …, B:

Step 1. Generate the bootstrap errors {ε*_i}_{i=1}^n by random draws with replacement from the standardized residuals {ε̂_i}_{i=1}^n, where ε̂_i is defined as in (5). Then, construct the bootstrap sample by Y*_i = m̂_{b_1}(X_i) + σ̂_{b_2}(X_i)ε*_i.

Step 2. Discretize T into a grid of quantile indexes {τ_1, …, τ_m}, where m is the number of grid points. Then, use the bootstrap sample {(X_i, Y*_i)}_{i=1}^n to compute q̂*_{τ_j}(x) = m̂*(x) + σ̂*(x)Q̂*_ε(τ_j), where m̂*(x) and σ̂*(x) denote the estimators of m_0(x) and σ_0(x) based on the bootstrap sample, and Q̂*_ε(τ_j) is obtained in the same way as in Step 1 of Section 2, with the smoothing parameters b_j, j = 1, 2.
Step 3. Calculate the maximum distance between q̂*_{τ_j}(x) and q̂_{τ_j}(x) over the grid: T*_n(x) = max_{1≤j≤m} |q̂*_{τ_j}(x) − q̂_{τ_j}(x)|.

The above three steps produce B bootstrap supremum statistics T*_{n,1}(x), …, T*_{n,B}(x). Sort them in ascending order and take the ⌈B(1 − α)⌉th order statistic as the critical value, where ⌈·⌉ is the nearest integer function and α ∈ (0,1) is the nominal size.
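The three steps above can be sketched as follows. The sample size, bandwidths, bootstrap size, and quantile grid are illustrative, and `h_star` stands in for the further-undersmoothed bootstrap bandwidth h*_j.

```python
import numpy as np

def nw(x0, X, Z, h):
    """Local constant estimate of E[Z | X = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    return w @ Z / w.sum()

def ls_quantile_fit(X, Y, h, x0, taus):
    """Location-scale quantile curve at x0, plus the pieces needed to resample."""
    m = np.array([nw(x, X, Y, h) for x in X])
    s = np.sqrt(np.maximum(np.array([nw(x, X, (Y - m) ** 2, h) for x in X]), 1e-12))
    e = (Y - m) / s                               # standardized residuals
    m0 = nw(x0, X, Y, h)
    s0 = np.sqrt(max(nw(x0, X, (Y - m) ** 2, h), 0.0))
    return m0 + s0 * np.quantile(e, taus), m, s, e

rng = np.random.default_rng(4)
n, B, alpha = 300, 100, 0.10
X = rng.uniform(-1.0, 1.0, n)
Y = X + (0.8 + 0.2 * X**2) * rng.standard_normal(n)
taus = np.linspace(0.1, 0.9, 9)                   # Step 2: grid on T = [0.1, 0.9]
x0, h, h_star = 0.0, 0.20, 0.15                   # h_star: further undersmoothed

q_hat, m, s, e = ls_quantile_fit(X, Y, h, x0, taus)
sups = np.empty(B)
for b in range(B):
    e_star = rng.choice(e, size=n, replace=True)          # Step 1: resample residuals
    Y_star = m + s * e_star                               # bootstrap sample
    q_star, *_ = ls_quantile_fit(X, Y_star, h_star, x0, taus)
    sups[b] = np.max(np.abs(q_star - q_hat))              # Step 3: sup distance
c = np.sort(sups)[int(np.ceil(B * (1 - alpha))) - 1]      # critical value
lower, upper = q_hat - c, q_hat + c                       # uniform band at x0
```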

Uniform Bootstrap Confidence Interval Theory
In this subsection, we establish the validity of the bootstrap method. We first replace Assumption 3.5 with Assumption 4.1 below.
Assumption 4.1. The bandwidths h_j and h*_j, j = 1, 2, satisfy, as n → ∞: (i) h_j = c_j n^{−γ} for some finite and positive constant c_j, with γ ∈ (1/5, 1/3); (ii) h*_j = c*_j n^{−γ*} for some finite and positive constant c*_j, with γ* > γ and γ* ∈ (1/5, 1/3).

Remark 4.1. Assumption 4.1(i) implies that h_j² = o((nh_j)^{−1/2}) and nh_j³/(log n)² → ∞; that is, the bias terms are asymptotically negligible compared with the variance terms. Assumption 4.1(ii) requires that h*_j, the smoothing parameter used in estimating the conditional quantile function with the bootstrap sample, be further undersmoothed relative to h_j, for j = 1, 2. It also implies that n(h*_j)^{3+δ} → ∞ for some δ > 0 and that log(n)/(nh_j) = o(1/(nh*_j)), for j = 1, 2. As suggested by Hall (1992), undersmoothing the estimates of the unknown functions avoids having to estimate the complicated bias terms in the bootstrap. Theorem 4.1 shows that the bootstrap is consistent, and hence the bootstrap confidence band has an adequate coverage ratio, as stated in Corollary 4.1 below.
COROLLARY 4.1. If Assumptions 3.1-3.4 and 4.1 hold, then for any x in the interior of X, the bootstrap confidence band attains the nominal coverage probability 1 − α asymptotically.

Remark 4.2. Given the established uniformity in τ ∈ T, one may choose as many grid points as computational capacity allows. We stress, however, that the discretization here is purely computational rather than statistical in nature. This is in contrast to Qu and Yoon (2015), whose estimators as processes are obtained by extrapolating (in a particular fashion) point estimators at a finite number of grid points.

Remark 4.3. As a referee correctly pointed out, because the residual distribution can be estimated at the √n-rate (if we further impose the condition b_j = O(n^{−1/4}), j = 1, 2) while the nonparametric functions m and σ² are estimated at slower rates, the estimator behaves as if the residual distribution were known in advance. Therefore, the process q̂_τ(x) (indexed by τ ∈ T) converges (without the bootstrap) in a simple "modular" fashion: once the CDF of the residuals is estimated at a (uniform) √n-rate and Bahadur expansions for m̂ and σ̂² are obtained, the Bahadur representation for the final estimator follows easily. Therefore, a minor contribution of the paper is to establish Bahadur expansions for m̂ and σ̂ under the weaker condition b_j = o(n^{−1/5}), j = 1, 2, and the main technical innovation of this paper is to establish the bootstrap consistency.

THE LOCAL LINEAR QUANTILE ESTIMATOR
Up to now, we have concentrated on the local constant conditional quantile estimator. In this section, we briefly discuss how to extend the earlier results to local linear estimation of the conditional quantile function. For brevity and to avoid replicating similar proofs, we present the simple case of estimating the τth conditional quantile function q_τ(x) at a fixed τ ∈ T, and we omit the trimming indicator function 1_{i,n} = 1(X_i ∈ X_n). Below, we first introduce some notation.
Define a_τ(x) = (q_τ(x), q̇_τ(x))^T, a_1(x) = (m_0(x), ṁ_0(x))^T, and a_2(x) = (σ_0(x), σ̇_0(x))^T, where the superscript "T" denotes the transpose. Let â_1(x), â_2(x), and Q̂_{ε,LL}(τ) be the local linear estimators of a_1(x), a_2(x), and Q_ε(τ), respectively, defined in the Supplementary Material. Then, the local linear estimator of a_τ(x) is given by

â_{τ,LL}(x) = â_1(x) + Q̂_{ε,LL}(τ)â_2(x),

where m̂_{h,LL} and ṁ_{h,LL} denote the local linear estimators of m_0(x) and ṁ_0(x) with the smoothing parameter h. Similarly, σ̂_{h,LL} and σ̇_{h,LL} are the local linear estimators of σ_0(x) and σ̇_0(x) with the smoothing parameter h. Q̂_{ε,LL}(τ) is the estimator of Q_ε(τ), derived in the same way as in the local constant case, except that the estimators of m_0(x) and σ_0(x) are replaced with their local linear counterparts.
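A minimal sketch of the local linear construction: each local linear fit returns a level and a derivative, so the plug-in rule yields both q̂_{τ,LL}(x) and an estimate of its derivative in x, using the chain rule σ̇_0 = (dσ_0²/dx)/(2σ_0). The DGP and bandwidth are illustrative.

```python
import numpy as np

def local_linear(x0, X, Z, h):
    """Local linear fit of Z on X at x0: returns (level, derivative)."""
    u = X - x0
    w = np.exp(-0.5 * (u / h) ** 2)              # Gaussian kernel weights
    A = np.column_stack([np.ones_like(u), u])    # regressors (1, X_i - x0)
    WA = A * w[:, None]
    beta = np.linalg.solve(A.T @ WA, WA.T @ Z)   # weighted least squares
    return beta[0], beta[1]

rng = np.random.default_rng(5)
n = 3000
X = rng.uniform(-1.0, 1.0, n)
Y = X**2 + (0.5 + 0.2 * X) * rng.standard_normal(n)

h = 0.12
m_fit = np.array([local_linear(x, X, Y, h)[0] for x in X])
v_fit = np.array([local_linear(x, X, (Y - m_fit) ** 2, h)[0] for x in X])
resid = (Y - m_fit) / np.sqrt(np.maximum(v_fit, 1e-12))

def q_ll(x0, tau):
    """Local linear quantile estimate and its derivative in x."""
    m0, dm0 = local_linear(x0, X, Y, h)
    s2, ds2 = local_linear(x0, X, (Y - m_fit) ** 2, h)
    s0 = np.sqrt(max(s2, 1e-12))
    Q = np.quantile(resid, tau)
    # chain rule: d sigma / dx = (d sigma^2 / dx) / (2 sigma)
    return m0 + s0 * Q, dm0 + (ds2 / (2 * s0)) * Q

level, slope = q_ll(0.5, 0.5)
```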
Remark 5.1. Compared with the local constant result in Corollary 3.1, the local linear estimator q̂_{τ,LL}(x) has fewer bias terms, while its asymptotic variance is the same.
With essentially the same proof techniques, we can show that the weak convergence result also holds for the local linear estimator for any fixed x in the interior of X, where the limit G(τ) is the same Gaussian process as in Theorem 3.1. Using calculations similar to those described in Section 4.1, one can construct uniform bootstrap confidence intervals for the local linear conditional quantile estimator.

MONTE CARLO SIMULATION
We conduct numerical experiments to assess two issues: (i) the finite-sample estimation MSE performance of the proposed estimators relative to the check-function-based estimator and the conditional quantile estimator of Racine and Li (2017) and (ii) the performance of the estimated uniform confidence bands.
Assumption 3.2 requires that the kernel function have bounded support. This assumption can be relaxed to allow, for example, a Gaussian kernel, at the cost of a more tedious proof. Throughout the simulations in this section, we use the Gaussian kernel. In Section S3 of the Supplementary Material (Table S3.1), we also report the performance of our method using the Epanechnikov kernel to examine whether the estimation results are sensitive to the choice of kernel function. Our simulations show that the estimation results using different kernels are quite similar, which supports our claim that, in practice, one can use a Gaussian kernel. For the check-function-based estimator, we also use a Gaussian kernel (the Epanechnikov kernel yields similar simulation results).
Next, we discuss bandwidth selection. For our proposed method, the smoothing parameters are selected by the LS-CV method, as described in Remark 2.2. For the check-function-based method, we choose the bandwidth h by minimizing the cross-validation function

CV(h) = Σ_{j=1}^n ρ_τ(Y_j − q̌_{−j,τ}(X_j; h)),

where q̌_{−j,τ}(X_j; h) is the leave-one-out local constant check-function-based estimator of the conditional τth quantile of Y given X_j (denoted by q_τ(X_j)) with bandwidth h.

Tables 1-3 report the mean MSEs for the Gaussian, χ²(5), and exponential error DGPs, respectively. For the Gaussian and χ²(5) errors (Tables 1 and 2), both our proposed method and Racine and Li's method have smaller estimation MSEs than the check-function-based method in all cases considered. For the exponential error (Table 3), we observe that, in general, the check-function-based method performs better at lower quantiles, whereas our method and Racine and Li's method dominate it at middle and upper quantiles. Next, we compare our method with Racine and Li's method. For the Gaussian error, our method has a smaller MSE in most cases. For the other two errors, Racine and Li's method generally performs better at lower and middle quantiles, whereas our method works better at higher quantiles.
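The cross-validation criterion for the check-function bandwidth described above can be sketched as follows (vectorized over observations; the DGP, quantile level, and grid are illustrative assumptions):

```python
import numpy as np

def rho(u, tau):
    """Check (pinball) loss."""
    return u * (tau - (u < 0))

def loo_check_cv(h, tau, X, Y):
    """Leave-one-out CV criterion: sum of check losses of the held-out
    local constant check-function estimates q_check_{-j,tau}(X_j; h)."""
    n = len(X)
    W = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
    np.fill_diagonal(W, 0.0)                     # leave observation j out
    order = np.argsort(Y)
    Ys = Y[order]
    cdf = np.cumsum(W[:, order], axis=1) / W.sum(axis=1, keepdims=True)
    idx = np.minimum((cdf < tau).sum(axis=1), n - 1)   # weighted tau-quantile index
    return np.sum(rho(Y - Ys[idx], tau))

rng = np.random.default_rng(6)
n = 300
X = rng.uniform(-1.0, 1.0, n)
Y = np.sin(np.pi * X) + 0.5 * rng.standard_normal(n)
grid = np.linspace(0.05, 0.6, 12)
scores = [loo_check_cv(h, 0.5, X, Y) for h in grid]
h_cv = grid[int(np.argmin(scores))]
```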

A Pretest Estimator
Our proposed estimator utilizes the location-scale model structure, whereas the check-function-based method does not. Hence, when the location-scale structure does not hold, our estimator can be inferior to the check-function-based estimator. In practice, researchers may be concerned about possible model misspecification. Following Racine and Li (2017), we suggest conducting a pretest: proceed with the check-function-based method if the test rejects the location-scale specification, and use our proposed method otherwise. This procedure leads to a pretest estimator. For the reader's convenience, we present the pretest procedure below.
The pretest method first assesses the adequacy of the location-scale specification using the Kolmogorov-Smirnov test proposed by Einmahl and Van Keilegom (2008). The null hypothesis is that X_i and ε_i are independent, and the test statistic is

T_KS = √n sup_{x,t} |F̂_{X,ε̂}(x,t) − F̂_X(x)F̂_ε̂(t)|,

where F̂_{X,ε̂}(x,t), F̂_X(x), and F̂_ε̂(t) are empirical CDFs and ε̂_i = (Y_i − m̂(X_i))/σ̂(X_i). We use the resampling bootstrap to obtain the critical values of the test statistic. We only need to estimate m̂ and σ̂ nonparametrically once to obtain the residuals {ε̂_i}_{i=1}^n. In each bootstrap loop, we resample {ε̂_i}_{i=1}^n and compute T*_KS from the bootstrap sample, where ε̂* denotes the bootstrap residual. The empirical distribution of T*_KS is used to obtain the critical values of the null distribution of T_KS. Because the computation of T*_KS does not involve any optimization, the resampling procedure is fairly fast.
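A sketch of the pretest statistic and its resampling bootstrap follows. Here i.i.d. draws stand in for the standardized residuals ε̂_i, the sup is taken over the observed sample points only, and the sample size and bootstrap size are illustrative simplifications.

```python
import numpy as np

def ks_independence(X, E):
    """T_KS = sqrt(n) * sup |F_{X,E}(x,t) - F_X(x) F_E(t)|, with the sup
    taken over the observed sample points (a simplification)."""
    n = len(X)
    MX = X[None, :] <= X[:, None]                # MX[i, j] = 1(X_j <= X_i)
    ME = E[None, :] <= E[:, None]
    FX = MX.mean(axis=1)                         # F_X at each X_i
    FE = ME.mean(axis=1)                         # F_E at each E_i
    FXE = (MX & ME).mean(axis=1)                 # joint CDF at each (X_i, E_i)
    return np.sqrt(n) * np.max(np.abs(FXE - FX * FE))

rng = np.random.default_rng(7)
n, B = 300, 200
X = rng.uniform(-1.0, 1.0, n)
E = rng.standard_normal(n)                       # stand-in for standardized residuals
T = ks_independence(X, E)                        # observed statistic
# Resampling bootstrap: redrawing E with replacement breaks any dependence on X,
# so the bootstrap statistics mimic the null distribution of T_KS.
T_star = np.array([ks_independence(X, rng.choice(E, n, replace=True)) for _ in range(B)])
crit = np.quantile(T_star, 0.95)
reject = T > crit
```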
We now examine the performance of the pretest estimator. We consider the following location-shape DGP that violates the location-scale model assumption, where $X_i \sim \mathrm{Uniform}[-1,1]$. We consider two different error DGPs: (1) $\epsilon_i(X_i)$ is a Gaussian mixture of N(−1,0.5) and N(3,0.5), and (2) $\epsilon_i(X_i)$ is a Gaussian mixture of N(−1,1) and N(1,1). For both cases, the mixing probability is $(1 + X_i)/2 \in [0,1]$. Figure 1 presents the two error distributions. We use the same mean function as given in (8). For each replication, we first conduct the model specification test proposed by Einmahl and Van Keilegom (2008), with the null hypothesis being the location-scale model.5 The pretest estimator is the check-function-based estimator if the null is rejected, and our proposed estimator otherwise.
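The location-shape error DGP above can be simulated as follows. This is an illustrative sketch, not the paper's code: `draw_mixture_errors` is a hypothetical helper, and the second argument of N(·,·) is treated here as a standard deviation (the paper may intend a variance); the mixing probability $(1+X_i)/2$ is taken directly from the text.

```python
import numpy as np

def draw_mixture_errors(X, mu=(-1.0, 3.0), sd=(0.5, 0.5), rng=None):
    """Location-shape errors: eps_i(X_i) ~ N(mu[1], sd[1]) with probability
    p_i = (1 + X_i)/2 and ~ N(mu[0], sd[0]) otherwise, so the error *shape*
    changes with X, violating the location-scale structure."""
    rng = np.random.default_rng() if rng is None else rng
    p = (1.0 + X) / 2.0
    pick = rng.uniform(size=X.shape) < p
    means = np.where(pick, mu[1], mu[0])
    sds = np.where(pick, sd[1], sd[0])
    return rng.normal(means, sds)

rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=100_000)
eps = draw_mixture_errors(X, rng=rng)
# Near X = 1 the mixing probability is close to 1, so the errors concentrate
# around mu[1] = 3; near X = -1 they concentrate around mu[0] = -1.
```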
Tables 4 and 5 report the mean MSEs of the pretest estimator. The MSE is calculated similarly to the way it was calculated in Section 6.1. Table 4 reveals that the pretest estimator mimics the behavior of the check-function-based estimator. The table also reveals that the check-function-based method, which does not require the location-scale assumption, outperforms our estimator when the separation between the two normal distributions is large (i.e., the misspecification is severe). Table 5 suggests that our estimator holds its own when the misspecification is mild.
The simulation results show that the pretest procedure works well in guarding against severe misspecification.

Bootstrap Uniform Confidence Interval Coverage Ratio
In this subsection, we examine the coverage ratios of 95% and 90% uniform confidence intervals with sample sizes n = 100, 200, and 400. We conduct R = 1,000 Monte Carlo replications, and for each replication, we generate B = 500 bootstrap statistics to construct uniform confidence intervals. We consider five evaluation points, $x_0 \in \{-2/3, -1/3, 0, 1/3, 2/3\}$, and three error distributions: N(0,1), $\chi^2(5)$, and exp(1). The coverage ratio is defined as the number of times the confidence bands contain the complete quantile curve over $\tau \in T = [0.1, 0.9]$, divided by the number of Monte Carlo replications.

Our theoretical analysis assumes that $h_j$ is undersmoothed relative to the optimal smoothing parameter that balances the estimated squared bias and variance, and that $h^*_j$ is further undersmoothed with respect to $h_j$, for j = 1, 2. In the simulations, we use both the optimal and undersmoothed bandwidths. For undersmoothing, we choose $h_j = n^{-\alpha_1} h^{opt}_j$ and $h^*_j = n^{-\alpha_2} h^{*opt}_j$, where $(\alpha_1, \alpha_2) = (1/20, 1/10)$ so that $h^*_j = o(h_j)$, and $h^{opt}_j$ and $h^{*opt}_j$ denote the optimal bandwidths obtained from leave-one-out LS-CV, j = 1, 2. Table 6 presents the 95% coverage ratios of the uniform confidence interval using a Gaussian kernel. We observe that, first, as the sample size increases, the coverage ratios get closer to the nominal coverage probabilities; second, optimal smoothing can lead to undercoverage in some cases, and undersmoothing improves the coverage.

5 The Einmahl and Van Keilegom (2008) test is for independence between the covariate(s) X and the error term $\epsilon$ in the location-scale model (1). One constructs the Kolmogorov-Smirnov test statistic $T_{KS} = \sqrt{n}\sup_{x,t}\big|\hat{F}_{X,\hat{\epsilon}}(x,t) - \hat{F}_X(x)\hat{F}_{\hat{\epsilon}}(t)\big|$, where $\hat{F}_{X,\hat{\epsilon}}(x,t)$, $\hat{F}_X(x)$, and $\hat{F}_{\hat{\epsilon}}(t)$ are empirical CDFs and $\hat{\epsilon} = (y - \hat{m}(x))/\hat{\sigma}(x)$. We use the resampling bootstrap to obtain the critical values.

Table 5. Mean MSEs of the pretest estimator, location-shape Gaussian mixture of N(−1,1) and N(1,1).
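The undersmoothing rule and the uniform-coverage bookkeeping can be sketched as follows. This is a minimal sketch: the value `h_opt = 0.5` is a placeholder (in practice it comes from leave-one-out LS-CV), and both function names are hypothetical.

```python
import numpy as np

def undersmooth(h_opt, n, alpha):
    """Undersmoothed bandwidth h = n^{-alpha} * h_opt, with alpha in
    {1/20, 1/10} as in the simulations."""
    return n ** (-alpha) * h_opt

def uniform_coverage(lower, upper, truth):
    """Indicator that the band [lower, upper] contains the true quantile
    curve at every point of the tau grid (arrays indexed by tau)."""
    return float(np.all((lower <= truth) & (truth <= upper)))

n = 400
h1 = undersmooth(0.5, n, 1 / 20)      # first-step bandwidth from placeholder h_opt
h1_star = undersmooth(h1, n, 1 / 10)  # further undersmoothed, so h* = o(h)
# The coverage ratio is the mean of the per-replication indicators returned
# by uniform_coverage over the R Monte Carlo replications.
```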
In general, our proposed bootstrap method has adequate coverage.6 The 90% coverage ratios of the uniform confidence interval using a Gaussian kernel are shown in Table 7; we observe features similar to those in Table 6.

6 As pointed out in Calonico et al. (2014), if one uses the distributional approximation to construct the confidence interval, optimal smoothing will lead to a nonnegligible bias; conventional bias correction can also have very poor performance, because it may add to the finite-sample variability of the usual t-statistic. They proposed a robust bias-corrected t-statistic that accounts for this variability.
In Section S3 in the Supplementary Material (Tables S3.2 and S3.3), we also report the 95% and 90% coverage ratios of uniform confidence intervals over $\tau \in T = [0.2, 0.8]$. We observe features similar to those in Tables 6 and 7, although, in general, the coverage ratios are more adequate than those over $\tau \in T = [0.1, 0.9]$. The reason is that, as $\tau$ approaches 0 or 1, the quantile estimation becomes less precise, which undermines the coverage of the bootstrap confidence interval.7

7 In Section S3 in the Supplementary Material (Table S3.4), we also report the 95% and 90% coverage ratios of the uniform confidence intervals over $\tau \in T = [0.2, 0.8]$ with an Epanechnikov kernel. The simulations show that the bootstrap confidence band coverage is quite similar across kernels, suggesting that, in practice, the estimated bootstrap uniform confidence interval is not sensitive to the choice of kernel function.
As a co-editor and a referee suggested, one may also consider the nonparametric resampling bootstrap method. The resampling bootstrap can be conducted in three steps, similar to the three-step residual bootstrap procedure described in Section 4.1. The first step generates the bootstrap sample $\{(X^*_i, Y^*_i)\}_{i=1}^n$ by drawing with replacement from the original sample $\{(X_i, Y_i)\}_{i=1}^n$; the last two steps are the same as in the residual bootstrap procedure. In Section S3 in the Supplementary Material (Tables S3.5 and S3.6), we report the 95% and 90% coverage ratios of the resampling uniform confidence interval over $\tau \in [0.2, 0.8]$, under the same DGPs as in the residual bootstrap case.
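Step 1 of this resampling (pairs) bootstrap can be sketched as follows. This is a minimal sketch: `pairs_bootstrap_sample` is a hypothetical name, and the quadratic data-generating line is purely illustrative.

```python
import numpy as np

def pairs_bootstrap_sample(X, Y, rng):
    """Step 1 of the nonparametric resampling (pairs) bootstrap: draw
    (X*_i, Y*_i) i.i.d. with replacement from the observed pairs."""
    idx = rng.integers(0, len(Y), size=len(Y))
    return X[idx], Y[idx]

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 50)
Y = X ** 2 + rng.normal(0, 0.1, size=50)  # illustrative data only
Xb, Yb = pairs_bootstrap_sample(X, Y, rng)
# Each bootstrap observation is one of the original pairs, so the X-Y link
# (including any heteroskedasticity) is preserved without estimating residuals.
```

Unlike the residual bootstrap, this scheme needs no first-stage estimates of $m_0$ and $\sigma_0$ to generate the bootstrap data, at the cost of not imposing the location-scale structure under the null.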
One referee also suggested a score bootstrap method. We implement the score bootstrap method using the following steps.
Note that we only need to estimate $\sigma_0(X_i)$, $\sigma_0(x)$, and $f(x)$ once in (11) and (12); we do not have to re-estimate them in each bootstrap loop. Second, we compute the score-bootstrap version of the conditional quantile estimator $\hat{q}^*_{\tau_j}(x)$, defined through (11) and (12), respectively, where $\hat{Q}(\tau_j)$ is obtained in the same way as in Step 1 of Section 2, with the smoothing parameter $b_j$, j = 1, 2.
The last step is the same as Step 3 of the residual bootstrap described in Section 4.1, and we can also construct the uniform confidence interval accordingly.
In Section S3 in the Supplementary Material (Tables S3.7 and S3.8), we report the 95% and 90% coverage ratios of score-bootstrap uniform confidence intervals, under the same DGPs as in the residual bootstrap case. The simulation results show that the score bootstrap works reasonably well. In addition, Table S3.9 presents a comparison of computation times for the residual bootstrap and the score bootstrap. The score bootstrap takes only about 20%-30% of the computation time of the residual bootstrap for the sample sizes we considered: with sample size n = 100, it takes about 30% of the residual bootstrap's computation time, and as the sample size increases to n = 400, this figure drops to 20%. The score bootstrap is therefore computationally more efficient, and we expect that for larger sample sizes (say, in a big-data scenario), its computational advantage over the residual-based bootstrap will be even more substantial.

AN EMPIRICAL APPLICATION
In this section, we present an illustrative empirical example to compare the performances of our proposed method with those of the check-function-based method.

Canadian High School Graduate Earnings
We use a Canadian cross-section wage dataset. The data consist of a random sample taken from the 1971 Canadian Census Public Use Tapes of male individuals with common education (grade 13). The sample size is 205. The data contain two variables: "logwage" (the logarithm of wages) and "age" (in years). We are interested in estimating the conditional quantiles of logwage given age, $q_\tau(\mathrm{age})$, as well as its derivative function. We therefore use the local linear method to estimate $m_0(x)$. However, applying the local linear method to estimate $\sigma_0(x)$ sometimes yields a negative estimated value of $\hat{\sigma}_{h_2}(x)$. To avoid this problem, we first use the local constant method to estimate $\sigma_0(x)$; we then take the derivative of $\hat{\sigma}_{h_2}(x)$ with respect to x to obtain an estimate of $\dot{\sigma}_0(x)$ (we use a Gaussian kernel, so the kernel function is differentiable).

Figure 2 plots the estimated Canadian logwage quantile curves using our proposed method and the check-function-based method. We see that, at low quantiles ($\tau$ = 0.1 and 0.15), wages start to decrease after age 35, whereas at upper quantiles ($\tau$ = 0.85 and 0.9), wages start to decrease around age 55. It is likely that high-quantile individuals have higher unobserved characteristics (such as ability) that cause their wages to continue to increase until age 55, whereas low-quantile individuals may hold more labor-intensive jobs whose wages start to decrease at a much younger age. In Section S3 in the Supplementary Material, we provide estimation results using wage (instead of logwage) as the dependent variable.

Figure 3 plots the corresponding quantile derivative curves. The derivative curve estimated by the check-function-based method is very wiggly, whereas our proposed method gives a much smoother estimated derivative curve. From Figure 3, we can see that the low quantiles' derivative functions have steeper downward slopes than those of the upper quantiles.
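The scale-estimation device described above, a local constant fit of the squared residuals with a Gaussian kernel differentiated analytically, can be sketched as follows. This is a minimal sketch with hypothetical function names; `m_hat` stands in for the first-stage local linear fit of $m_0$, which is not reproduced here.

```python
import numpy as np

def nw_and_derivative(x0, X, Z, h):
    """Nadaraya-Watson (local constant) estimate of E[Z | X = x0] and its
    analytic derivative in x0, using the differentiable Gaussian kernel."""
    u = (X - x0) / h
    w = np.exp(-0.5 * u ** 2)
    dw = w * u / h                 # d/dx0 of exp(-(X - x0)^2 / (2 h^2))
    S, T = w.sum(), (w * Z).sum()
    dS, dT = dw.sum(), (dw * Z).sum()
    g = T / S
    return g, (dT * S - T * dS) / S ** 2   # quotient rule

def sigma_and_derivative(x0, X, Y, m_hat, h):
    """Local constant fit of s(x) = E[(Y - m(X))^2 | X = x]; then
    sigma_hat = sqrt(s_hat) and d sigma_hat/dx = s_hat' / (2 sqrt(s_hat)).
    The local constant fit of squared residuals is nonnegative by
    construction, unlike a local linear fit."""
    Z = (Y - m_hat(X)) ** 2
    s, ds = nw_and_derivative(x0, X, Z, h)
    s = max(s, 1e-12)              # guard against exact zeros
    return np.sqrt(s), ds / (2.0 * np.sqrt(s))
```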
All the derivative curves reach their maximum deceleration at around age 40. After age 40, the low-quantile derivative curves remain in the negative region, meaning that logwage continues to decrease as age increases, although the rate of decline is smaller than at age 40. In contrast, for high quantiles, the derivative curves remain positive until close to age 55. The smooth curves estimated by our method thus allow us to better interpret the estimation results, whereas it is hard to do so with the wiggly curves obtained from the check-function-based method.

CONCLUSION
In this paper, we propose a new and easy-to-implement nonparametric method for estimating conditional quantile functions. We derive the asymptotic theory and provide a practical procedure for constructing uniform confidence bands. The proposed conditional quantile estimator compares well with its check-function counterpart, and the bootstrap confidence interval has adequate coverage probabilities. An empirical application using a Canadian cross-section wage dataset showcases the appealing features of our proposed method in practice.

Notes: Subfigures (I)-(V) plot the conditional quantiles of logwage given age, with age at 24, 27, 38, 49, and 56 (which correspond to the 10th, 25th, 50th, 75th, and 90th percentiles of age, respectively).
There are many directions in which one can extend the results of this paper to more general settings. For example, one can allow for time-series-dependent data, as in Han et al. (2016), or allow for the nonstationary panel data considered in Chen and Khan (2008). Another extension is to allow the covariate $X_i$ to be endogenous, as in Horowitz and Lee (2007), Su and Hoshino (2016), and Kaplan and Sun (2017). We leave these as possible future research topics.

A. Some Useful Lemmas
To avoid the slow convergence rate in the boundary region, we estimate $m_0(x)$ and $\sigma_0(x)$ for $x \in \mathcal{X}_n$, where $\mathcal{X}_n$ is the trimmed set defined earlier (see the beginning of Section 2). Asymptotically, $P(X \in \mathcal{X}_n) \to P(X \in \mathcal{X}) = 1$. In this appendix, we present a few lemmas that are used to prove the main results of the paper. The proofs of these lemmas are given in Section S1 in the Supplementary Material.

LEMMA A.1. Suppose that Assumptions 3.1-3.3 and 3.5 hold. Then, we have: where $B_1(x) = \frac{1}{2}\mu_2\big[2\dot{m}_0(x)\dot{f}(x)/f(x) + \ddot{m}_0(x)\big]$, with $\mu_2 = \int u^2 K(u)\,du$, and where $B_2(x) = \frac{\mu_2}{4\sigma_0(x)}\big[2\dot{\delta}_2(x)\dot{f}(x)/f(x) + \ddot{\delta}_2(x)\big]$, with $\delta_2(x) = \sigma_0^2(x)$.
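As a quick sanity check on the kernel constant appearing in $B_1(x)$ and $B_2(x)$: the second moment $\mu_2 = \int u^2 K(u)\,du$ equals 1 for the standard Gaussian kernel (it is the variance of N(0,1)). A numerical sketch, assuming a standard Gaussian kernel:

```python
import numpy as np

# mu_2 = int u^2 K(u) du; for K = standard normal density, mu_2 = Var(N(0,1)) = 1.
u = np.linspace(-10.0, 10.0, 200001)
K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
mu2 = ((u ** 2) * K).sum() * (u[1] - u[0])  # Riemann sum on a fine grid
```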

B. Proofs of Main Results
Before proving Theorem 3.1, we first give the following proposition. Under Assumptions 3.1-3.5 and using a proof similar to that of Theorem 2 in Akritas and Van Keilegom (2001), we can show that the process $\hat{F}(\cdot) - F(\cdot)$ is of order $o_p(n^{-2/5})$, where $\hat{F}$ is the empirical distribution function of the residuals $\{\hat{\epsilon}_i\}_{i=1}^n$. Then, by Theorem 3.9.4 (the delta method) of van der Vaart and Wellner (1996), we obtain the result (B.1).