Testing for a General Class of Functional Inequalities

In this paper, we propose a general method for testing inequality restrictions on nonparametric functions. Our framework covers many nonparametric testing problems in a unified way, with a number of possible applications in auction models, game theoretic models, wage inequality, and revealed preferences. Our test involves a one-sided version of $L_{p}$ functionals of kernel-type estimators $(1\leq p<\infty)$ and is easy to implement in general, mainly due to its recourse to the bootstrap method. The bootstrap procedure is based on the nonparametric bootstrap applied to kernel-based test statistics, with estimated "contact sets." We provide regularity conditions under which the bootstrap test is asymptotically valid uniformly over a large class of distributions, including cases in which the limiting distribution of the test statistic is degenerate. Our bootstrap test is shown to exhibit good power properties in Monte Carlo experiments, and we provide a general form of the local power function. As an illustration, we consider testing implications from auction theory, provide primitive conditions for our test, and demonstrate its usefulness by applying it to real data. We supplement this example with a second empirical illustration in the context of wage inequality.


Introduction
In this paper, we propose a general method for testing inequality restrictions on nonparametric functions. To describe our testing problem, let $v_{\tau,1}, \ldots, v_{\tau,J}$ denote nonparametric real-valued functions on $\mathbf{R}^d$ for each index $\tau \in \mathcal{T}$, where $\mathcal{T}$ is a subset of a finite dimensional space. We focus on testing
$$H_0 : \max\{v_{\tau,1}(x), \cdots, v_{\tau,J}(x)\} \leq 0 \text{ for all } (x, \tau) \in \mathcal{X} \times \mathcal{T}, \text{ against}$$
$$H_1 : \max\{v_{\tau,1}(x), \cdots, v_{\tau,J}(x)\} > 0 \text{ for some } (x, \tau) \in \mathcal{X} \times \mathcal{T}, \qquad (1.1)$$
where $\mathcal{X} \times \mathcal{T}$ is a domain of interest. We propose a one-sided $L_p$ integrated test statistic based on nonparametric estimators of $v_{\tau,1}, \ldots, v_{\tau,J}$. We provide general asymptotic theory for the test statistic and suggest a bootstrap procedure to compute critical values. We establish that our test has correct uniform asymptotic size and is not conservative. We also determine the asymptotic power of our test under fixed alternatives and some local alternatives.
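To fix ideas, a one-sided $L_p$ statistic of this kind can be sketched numerically. The following minimal illustration takes $J = 2$ and $p = 1$ on a grid; the function names, the Riemann-sum integration, the aggregation over $j$ by a maximum, and the toy inputs are our own choices for illustration, not the paper's exact construction (given in Section 4).

```python
import numpy as np

def one_sided_lp_statistic(vhat, sigma_hat, r_n, p=1, dx=1.0):
    """One-sided L_p statistic: integrate over the grid the quantity
    max_j [max(r_n * vhat_j / sigma_j, 0)]^p, approximated by a Riemann
    sum with grid spacing dx.  vhat and sigma_hat have shape (J, n_grid)."""
    scaled = r_n * vhat / sigma_hat          # studentized, scaled estimates
    pos_part = np.maximum(scaled, 0.0) ** p  # one-sided transform
    return np.sum(np.max(pos_part, axis=0)) * dx

# toy example: two "estimated" functions on a grid over [0, 1]
x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
vhat = np.vstack([0.2 - x,             # positive for x < 0.2 -> a violation
                  -np.ones_like(x)])   # always satisfies the inequality
theta = one_sided_lp_statistic(vhat, np.ones_like(vhat), r_n=1.0, p=1, dx=dx)
print(theta > 0)  # True: the statistic picks up the violated region only
```

Because the negative parts are truncated at zero, points where all inequalities hold strictly contribute nothing to the statistic, which is what drives the contact-set analysis discussed later.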
We allow for a general class of nonparametric functions, including, as special cases, conditional mean, quantile, hazard, and distribution functions and their derivatives. For example, $v_{\tau,j}(x) = P(Y_j \leq \tau \mid X = x)$ can be the conditional distribution function of $Y_j$ given $X = x$, or $v_{\tau,j}(x)$ can be the $\tau$-th quantile of $Y_j$ conditional on $X = x$. We can also allow for transformations of these functions satisfying some regularity conditions. The nonparametric estimators we consider are mainly kernel-type estimators, but they can be more general, provided that they satisfy certain Bahadur-type linear expansions.
Inequality restrictions on nonparametric functions often arise as testable implications of economic theory. For example, in first-price auctions, Guerre, Perrigne, and Vuong (2009, GPV hereafter) show that the quantiles of the observed equilibrium bid distributions with different numbers of bidders should satisfy a set of inequality restrictions (Equation (5) of GPV). If the auctions are heterogeneous, so that the private values are affected by observed characteristics, we may consider conditionally exogenous participation with a conditional version of the restrictions (see Section 3.2 of GPV). Such restrictions take the form of multiple inequalities for linear combinations of nonparametric conditional quantile functions. Our test can then be used to check whether the restrictions hold jointly, uniformly over quantiles and observed characteristics in a certain range. In this paper, we use this auction example to illustrate the usefulness of our general framework. To the best of our knowledge, no alternative test is available in the literature for this kind of example.
In addition to GPV, a large number of auction models are associated with some forms of functional inequalities. See, for example, Haile and Tamer (2003), Haile, Hong, and Shum (2003), Aradillas-López, Gandhi, and Quint (2013a), Aradillas-López, Gandhi, and Quint (2013b), and Krasnokutskaya, Song, and Tang (2013), among others. Our method can be used to make inference in their setups, while allowing for continuous covariates.
Econometric models of games form a branch of the literature related to, but distinct from, auction models. In this literature, inference in many game theoretic models has recently been based on partial identification or functional inequalities. For example, see Tamer (2003), Andrews, Berry, and Jia (2004), Berry and Tamer (2007), Aradillas-López and Tamer (2008), Ciliberto and Tamer (2009), Beresteanu, Molchanov, and Molinari (2011), Galichon and Henry (2011), Chesher and Rosen (2012), and Aradillas-López and , among others. See de Paula (2013) and the references therein for a broad view of recent developments in this literature. Our general method provides researchers in this field a new inference tool when they have continuous covariates.
Inequality restrictions also arise in testing revealed preferences. Blundell, Browning, and Crawford (2008) used revealed preference inequalities to provide nonparametric bounds on average consumer responses to price changes. In addition, Blundell, Kristensen, and Matzkin (2014) used the same inequalities to bound quantile demand functions. Our framework could be used to test revealed preference inequalities either for average demand functions or for quantile demand functions. See also Hoderlein and Stoye (2013) and Kitamura and Stoye (2013) for related work on testing revealed preference inequalities.
In addition to the literature mentioned above, many results on partial identification can be written as functional inequalities (see, e.g., Imbens and Manski (2004), Manski (2003), Manski (2007), Manski and Pepper (2000), Tamer (2010), and references therein). In Section 3, we provide a couple of motivating examples of partially identified econometric models (one from Chesher and Rosen (2014) and the other from Khan, Ponomareva, and Tamer (2013)) for which our testing approach can be used to construct confidence regions but to which none of the currently available methods can be applied.
Our framework has several distinctive merits. First, our proposal is easy to implement in general, mainly due to its recourse to the bootstrap method. The bootstrap procedure is based on nonparametric bootstrap applied to kernel-based test statistics. We establish the general asymptotic (uniform) validity of the bootstrap procedure under high level conditions and provide low level conditions for an empirical example based on GPV.
Second, our proposed test is shown to exhibit good power properties in both finite and large samples. Good power is achieved by using critical values that adapt to the binding restrictions of the functional inequalities. This can be done in various ways; in this paper, we follow the "contact set" approach of Linton, Song, and Whang (2010) and propose bootstrap critical values. As is shown in this paper, the bootstrap critical values yield significant power improvements. Furthermore, we find through our local power analysis that this class of tests exhibits dual convergence rates depending on the Pitman direction, and in many cases the faster of the two rates achieves the parametric rate of $\sqrt{n}$, despite the use of kernel-type test statistics.
Third, we establish the asymptotic validity of the proposed test uniformly over a large class of distributions, without imposing restrictions on the covariance structure among the nonparametric estimates of $v_{\tau,j}(\cdot)$, thereby allowing for degenerate cases. Such a uniformity result is crucial for ensuring good finite sample properties of tests whose (pointwise) limiting distribution under the null hypothesis exhibits various forms of discontinuity. The discontinuity in the context of this paper is highly complex, as the null hypothesis involves inequality restrictions on multiple (or even a continuum of) nonparametric functions. We establish the uniform validity of the test in a way that covers these various forms of discontinuity. Our new uniform asymptotics may be of independent interest in many other contexts.
Much of the recent literature on testing inequality restrictions focuses on conditional moment inequalities (see footnote 1). Research on conditional moment inequalities includes Andrews and Shi (2013), Andrews and Shi (2014), Armstrong (2011a), Armstrong (2011b), Armstrong and Chan (2013), Chernozhukov, Lee, and Rosen (2013), Chetverikov (2011), Fan and Park (2014), Khan and Tamer (2009), Kim (2009), Lee, Song, and Whang (2013), Menzel (2009), and Ponomareva (2010), among others. In contrast, this paper's approach naturally covers a wide class of inequality restrictions among nonparametric functions to which the moment inequality framework does not apply, or applies only cumbersomely. Such examples include testing multiple inequalities defined by differences in conditional quantile functions uniformly over covariates and quantiles (see footnote 2). If we restrict our attention to conditional moment inequalities, then our approach is most comparable to the moment selection approach of Andrews and Shi (2013). Our general framework is also related to testing qualitative nonparametric hypotheses such as monotonicity in mean regression. See, for example, Baraud, Huet, and Laurent (2005), Chetverikov (2012), Dümbgen and Spokoiny (2001), and Ghosal, Sen, and van der Vaart (2000), among many others. See also Lee, Linton, and Whang (2009) and Delgado and Escanciano (2012) for testing stochastic monotonicity.
Among the aforementioned papers, Chernozhukov, Lee, and Rosen (2013) developed a sup-norm approach to testing inequality restrictions on nonparametric functions using pointwise asymptotics, which in principle could be extended to test general functional inequalities as in (1.1) (see footnote 3). Example 4 of Chernozhukov, Lee, and Rosen (2013) considered the case of one inequality involving a conditional quantile function at a particular quantile, but it is far from trivial to extend this example to multiple inequalities of differences in conditional quantile functions uniformly over a range of quantiles. As this paper demonstrates through empirical applications, such testing problems arise frequently in the fields of industrial organization and labor economics (see Sections 3.3 and 3.4).

Footnote 1: There exists a large literature on inference in models with a finite number of unconditional moment inequality restrictions. Some examples include Andrews and Barwick (2012), Andrews and Guggenberger (2009), Andrews and Soares (2010), Beresteanu and Molinari (2008), Bugni (2010), Canay (2010), Chernozhukov, Hong, and Tamer (2007), Galichon and Henry (2009), Romano and Shaikh (2008), Romano and Shaikh (2010), and Rosen (2008), among others.

Footnote 2: A working paper version (Andrews and Shi 2009) of Andrews and Shi (2013) covers testing moment inequalities indexed by $\tau \in \mathcal{T}$, but their framework does not appear to be easily extendable to deal with functions of multiple conditional quantiles, such as differences in conditional quantiles.
The uniformity result in this paper is non-standard since our test is based on asymptotically non-tight processes, in contrast to Andrews and Shi (2013) who convert conditional moment inequalities into an infinite number of unconditional moment inequalities. This paper's development of asymptotic theory draws on the method of Poissonization (see, e.g., Horváth (1991) and Giné, Mason, and Zaitsev (2003)). For applications of this method, see Anderson, Linton, and Whang (2012) for inference on a polarization measure,  for testing for conditional treatment effects, and Lee, Song, and Whang (2013) for testing inequalities for nonparametric regression functions using the numerator of the Nadaraya-Watson estimator (based on pointwise asymptotics). Also, see Mason and Polonik (2009) and Biau, Cadre, Mason, and Pelletier (2009) for support estimation.
The remainder of the paper is as follows. Section 2 gives an informal description of our general framework by introducing test statistics and critical values and by providing intuitions behind our approach. In Section 3, we present four motivating examples, which include two examples of partially identified models and two empirical examples demonstrating the usefulness of our test. The first empirical example is based on GPV, and the second concerns testing functional inequalities in the context of wage inequality, inspired by Acemoglu and Autor (2011). In Section 4, we establish the uniform asymptotic validity of our bootstrap test using high-level conditions. We also provide a class of distributions for which the asymptotic size is exact. In Section 5, we give primitive conditions for the uniform asymptotic validity of our inference method for the first empirical example in Section 3. In Section 6, we establish consistency of our test and its local power properties. Section 7 concludes. The Appendices consist of two parts. The first part presents results of Monte Carlo experiments and more examples of testing functional inequalities, including an alternative statistic for the first empirical example and tests of monotonicity with respect to a covariate in conditional expectation, cumulative distribution, and quantile functions. The remaining part provides all the proofs of the theorems.

Footnote 3: Our test involves a one-sided version of $L_p$-type functionals of nonparametric estimators ($1 \leq p < \infty$). We regard the sup-norm and $L_p$-norm approaches as complementary, each with its own strengths and weaknesses. For example, our test and also the test of Andrews and Shi (2013) have higher power against relatively flat alternatives, whereas the test of Chernozhukov, Lee, and Rosen (2013) has higher power against sharply-peaked alternatives. See the results of Monte Carlo experiments reported in Appendix I. See also Andrews and Shi (2013), Andrews and Shi (2014), and Chernozhukov, Lee, and Rosen (2013) for related discussions and further Monte Carlo evidence.
2. General Overview

2.1. Test Statistics. We present a general overview of this paper's framework by introducing test statistics and critical values. To ease the exposition, we confine our attention to the case of $J = 2$ here. The definitions and formal results for general $J$ are given later in Section 4.
Throughout this paper, we assume that $\mathcal{T}$ is a connected compact subset of a Euclidean space. This entails little loss of generality because, when $\mathcal{T}$ is a finite set, we can redefine our test statistic by treating the elements of $\mathcal{T}$ as part of the finite index $j$ indexing the nonparametric functions.
2.2. Bootstrap Critical Values. As we shall see later, the asymptotic distribution of the test statistic exhibits complex forms of discontinuity as one perturbs the data generating process. This suggests that the finite sample properties of the asymptotic critical values may not be stable. Furthermore, the location-scale normalization requires nonparametric estimation and thus a further choice of tuning parameters, which can worsen the finite sample properties of the critical values further. To address these issues, this paper develops a bootstrap procedure.

Footnote 4: Permitting the convergence rate $r_{n,j}$ to differ across $j \in \mathcal{N}_J$ can be convenient when the nonparametric estimators have different convergence rates. For example, this accommodates a situation where one jointly tests the non-negativity and monotonicity of a nonparametric function.

Footnote 5: While our framework permits the case where $\hat\sigma_{\tau,j}(x)$ is simply chosen to be 1, we allow for a more general case where $\hat\sigma_{\tau,j}(x)$ is a consistent estimator of some nonparametric quantity.
For example, the set $B_{n,\{1\}}(c_n)$ is the set of points $(x, \tau)$ such that $|v_{n,\tau,1}(x)/\sigma_{n,\tau,1}(x)|$ is close to zero, while $v_{n,\tau,2}(x)/\sigma_{n,\tau,2}(x)$ is negative and bounded away from zero. We call sets such as $B_{n,\{1\}}(c_n)$, $B_{n,\{2\}}(c_n)$, and $B_{n,\{1,2\}}(c_n)$ contact sets. Now, comparing (2.2) with (2.1) reveals that the limiting distribution of $\hat\theta$ under the null hypothesis does not depend on points outside the union of the contact sets. Thus the main idea of this paper is to base the bootstrap critical values on the quantity on the right hand side of (2.2) instead of the last integral in (2.1). As we explain shortly in the next subsection, this leads to a test that is uniformly valid and exhibits substantial improvement in power.
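The classification into estimated contact sets can be sketched as follows for $J = 2$. This is a hypothetical illustration: the function name, the grid representation, and the particular decision rules (compare studentized estimates with the tuning parameter $c_n$) are our own rendering of the description above, not the paper's formal definitions.

```python
import numpy as np

def contact_sets(u1, u2, c_n):
    """Classify grid points into estimated contact sets (J = 2), following
    the verbal description in the text.  u1, u2 are studentized estimates
    v_j / sigma_j on a grid; c_n is the tuning parameter.
    B1:  inequality 1 close to binding (|u1| <= c_n) and 2 clearly slack;
    B2:  inequality 2 close to binding and 1 clearly slack;
    B12: both inequalities within c_n of binding."""
    near1, near2 = np.abs(u1) <= c_n, np.abs(u2) <= c_n
    B1 = near1 & (u2 < -c_n)
    B2 = near2 & (u1 < -c_n)
    B12 = near1 & near2
    return B1, B2, B12

x = np.linspace(-1.0, 1.0, 201)
u1 = -x**2            # binding (zero) only near x = 0
u2 = -1.0 + 0.0 * x   # slack everywhere
B1, B2, B12 = contact_sets(u1, u2, c_n=0.1)
print(B1.sum() > 0, B2.sum(), B12.sum())  # True 0 0
```

In this toy configuration only the first inequality is ever close to binding, so the bootstrap critical value would be driven entirely by the neighborhood of $x = 0$.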
Then it is shown later that the test has asymptotically correct size, i.e.,
$$\limsup_{n\rightarrow\infty} \sup_{P \in \mathcal{P}_0} P\{\text{reject } H_0\} \leq \alpha, \qquad (2.6)$$
where $\mathcal{P}_0$ is the collection of potential distributions that satisfy the null hypothesis.
2.3. Obtaining Tuning Parameters. To construct $\hat c_n$, we suggest the following procedure. First, define the bootstrap sup statistic $S_n^*$, where $\varepsilon > 0$ is a small number. Then, set
$$\hat c_n = C_{cs}\, (\log \log n)\, q_{1-\alpha_n}(S_n^*), \qquad (2.7)$$
where $q_{1-\alpha_n}(S_n^*)$ is the $(1-\alpha_n)$-th quantile of the bootstrap distribution of $S_n^*$ with $\alpha_n = 0.1/\log n$, and $C_{cs}$ is a "sensitivity" constant that needs to be chosen by the researcher. Although the rule-of-thumb for $\hat c_n$ in (2.7) is not completely data-driven, it has the advantage of being invariant to the scale of $\hat u_{\tau,j}(x)$, due to the term $q_{1-\alpha_n}(S_n^*)$; see Chernozhukov, Lee, and Rosen (2013) for a similar idea (see footnote 7). This data-dependent choice of $\hat c_n$ is encompassed by the theoretical framework of this paper, while many other choices are also admitted (see footnote 8).

To implement our bootstrap test, it is necessary to fix three constants, $\eta$, $\varepsilon$, and $C_{cs}$, in addition to the bandwidth used in kernel-based nonparametric estimation. Based on our experience in Monte Carlo experiments, we suggest the following rule of thumb: set $\eta$ and $\varepsilon$ to be small numbers, say $\eta = \varepsilon = 10^{-6}$, and check sensitivity with respect to $C_{cs}$ by varying it over a certain range. In particular, we recommend taking $C_{cs} = 0.5$ and performing a sensitivity check by increasing the value of $C_{cs}$ up to 1.5 (see footnote 9).

Regarding the bandwidth selection, we suggest the following rule. First, choose a bandwidth, say $\tilde h$, using a readily available bandwidth selection rule that is typically designed for the purpose of optimal estimation (e.g., see Fan and Gijbels (1996) for local polynomial estimators). When $d = 1$ and the underlying function is twice continuously differentiable, such a bandwidth has the form $\tilde h = C n^{-1/5}$ for some constant $C$. Second, if necessary, modify $\tilde h$ so that it satisfies the regularity conditions imposed in this paper. For example, in the case of estimating conditional quantile functions, Assumption AUC-3 in Section 5 is satisfied by the choice $h = n^{-s}$ with $1/4 < s < 1/3$ if the local linear estimator is used with $d = 1$. Then we can take $h = \tilde h \times n^{1/5} \times n^{-s}$ for some $s$ satisfying $1/4 < s < 1/3$.

Footnote 7: Note that $q_{1-\alpha_n}(S_n^*)$ is the $(1-\alpha_n)$ quantile of the supremum of $\hat s^*_{\tau,j}(x)$ over $(j, \tau, x)$ for a sufficiently small $\varepsilon$, provided that $\hat s^*_{\tau,j}(x)$ is non-degenerate. Note also that $1-\alpha_n$ converges to 1 as $n$ gets large. This observation leads to the choice of $\hat c_n$ in (2.7) proportional to $q_{1-\alpha_n}(S_n^*)$ times a very slowly growing term, $\log \log n$, to ensure that $\hat c_n$ diverges to infinity, but as slowly as possible, while retaining scale invariance.

Footnote 8: See Assumption A4(ii) below for sufficient conditions for a data-dependent choice of $\hat c_n$. It is not hard to see that the conditions are satisfied once the uniform convergence rates of $\hat v_{\tau,j}(x)$ and $\hat\sigma_{\tau,j}(x)$ and their bootstrap versions hold as required in Assumptions A3, A5, B2, and B3.

Footnote 9: The rationale behind this particular recommendation is that, in the Monte Carlo experiments reported in Appendix I, our test performed well with $C_{cs} = 0.5$, and we would like to be on the more conservative side when checking the sensitivity to $C_{cs}$.

2.4. Discontinuity, Uniformity, and Power. Many tests of inequality restrictions exhibit discontinuity in their limiting distributions under the null hypothesis. When the inequality restrictions involve nonparametric functions, this discontinuity takes a complex form, as emphasized in Section 5 of Andrews and Shi (2013).
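Returning to the tuning parameters of Section 2.3, the rule-of-thumb in (2.7) can be sketched as follows. This is a hypothetical illustration: the bootstrap draws here are placeholders, and the function name is ours; in practice $S_n^*$ would come from the bootstrap procedure described above.

```python
import numpy as np

def rule_of_thumb_c_n(boot_sup_stats, n, C_cs=0.5):
    """Rule-of-thumb tuning parameter from (2.7):
    c_n = C_cs * (log log n) * q_{1-a_n}(S*_n),
    where q_{1-a_n} is the empirical (1 - a_n)-quantile of the bootstrap
    sup statistics S*_n and a_n = 0.1 / log n."""
    alpha_n = 0.1 / np.log(n)
    q = np.quantile(boot_sup_stats, 1.0 - alpha_n)
    return C_cs * np.log(np.log(n)) * q

rng = np.random.default_rng(0)
boot_sup = np.abs(rng.standard_normal(999))   # placeholder bootstrap draws
c_n = rule_of_thumb_c_n(boot_sup, n=500, C_cs=0.5)
# sensitivity check recommended in the text: vary C_cs up to 1.5
c_n_hi = rule_of_thumb_c_n(boot_sup, n=500, C_cs=1.5)
print(c_n_hi / c_n)  # 3.0: c_n scales linearly in C_cs
```

Because the quantile level $1-\alpha_n$ tends to 1 and the $\log\log n$ factor grows very slowly, $\hat c_n$ diverges, but only just, which is the behavior the contact-set estimation requires.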
To see the discontinuity problem in our context, let $\{(Y_i, X_i)\}_{i=1}^n$ be i.i.d. copies of an observable bivariate random vector $(Y, X) \in \mathbf{R} \times \mathbf{R}$, where $X_i$ is a continuous random variable with density $f$. We consider a simple testing example: here, with the subscript $\tau$ suppressed, we set $J = 1$, $r_{n,1} = \sqrt{nh}$, $p = d = 1$, and define the estimator using a kernel, where $K$ is a nonnegative, univariate kernel function with compact support and $h$ is a bandwidth.
When $\liminf_{n\rightarrow\infty} Q(B_{n,1}(0)) > 0$, with $Q(B_{n,1}(0))$ denoting the Lebesgue measure of $B_{n,1}(0)$, we can show that the leading term on the right hand side of (2.10) is asymptotically $N(0, \sigma_0^2)$ for some $\sigma_0^2 > 0$. On the other hand, the second term vanishes in probability as $n \rightarrow \infty$ under $H_0$, for each $x \in \mathcal{X}\backslash B_{n,1}(0)$. This asymptotic theory is pointwise in $P$ (with $P$ fixed and letting $n \rightarrow \infty$) and may not be adequate for finite sample approximation. There are two sources of discontinuity. First, the pointwise asymptotic theory essentially regards the drift component $\sqrt{nh}\, v_{n,1}(x)$ as $-\infty$, whereas in finite samples the component can be very negative, but not $-\infty$. Second, even if the nonparametric function $\sqrt{nh}\, v_{n,1}(x)$ changes continuously, the contact set $B_{n,1}(0)$ may change discontinuously in response (see footnote 10). While there is no discontinuity in the finite sample distribution of the test statistic, discontinuity may arise in its pointwise asymptotic distribution. Furthermore, the complexity of the discontinuity makes it harder to trace its source when $J > 2$. As a result, asymptotic validity of the test established pointwise in $P$ is not a good justification of the test. We need to establish asymptotic validity that is uniform in $P$ over a reasonable class of probabilities.
Under regularity conditions, bootstrap critical values based on the least favorable configuration (LFC) can be shown to yield tests that are asymptotically valid uniformly in $P$. However, they are often too conservative in practice. Using a critical value based on the contact sets also yields an asymptotically valid test, and yet $\hat\theta^*_{LFC} > \hat\theta^*_1$ in general. Thus the bootstrap tests that use the contact set have better power properties than those that do not. The power improvement is substantial in many simulation designs and can be important in real-data applications (see footnote 11). Now, let us see how the choice of $c^*_{\alpha,\eta} \equiv \max\{c^*_\alpha,\; h^{1/2}\eta + \hat a^*\}$ (with $d = 1$ here) leads to bootstrap inference that is valid even when the test statistic becomes degenerate under the null hypothesis. The degeneracy arises when the inequality restrictions hold with large slackness, so that the convergence in (2.11) holds with $\sigma_0^2 = 0$, and hence $h^{-1/2}(\hat\theta - a_{n,1}) = o_P(1)$.

Footnote 10: For example, take $\sqrt{nh}\, v_{n,1}(x) = -x^2/n$ on $\mathcal{X} = [-1, 1]$, and let $v_0(x) \equiv 0$. Then $\sqrt{nh}\, v_{n,1}(x)$ converges to $v_0(x)$ uniformly in $x \in \mathcal{X}$ as $n \rightarrow \infty$. However, for each $n$, $B_{n,1}(0) = \{x \in \mathcal{X} : \sqrt{nh}\, v_{n,1}(x) = 0\} = \{0\}$, which does not converge in Hausdorff distance to $B_1(0) \equiv \{x \in \mathcal{X} : v_0(x) = 0\} = \mathcal{X}$.
Note that, for the sake of validity only, one may replace $h^{1/2}\eta$ by a fixed constant, say $\bar\eta > 0$. However, this choice would render the test asymptotically too conservative. The choice of $h^{1/2}\eta$ in this paper makes the test asymptotically exact for a wide class of probabilities, while preserving the uniform validity in both the degenerate and nondegenerate cases (see footnote 12). The precise class of probabilities under which the test becomes asymptotically exact is presented in Section 4.
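The degeneracy-robust critical value just discussed can be sketched as follows. This is a minimal illustration under our own naming: `boot_stats` stands in for the bootstrap draws of the (centered, scaled) statistic, and `a_hat_star` for the bootstrap centering term $\hat a^*$; only the formula $c^*_{\alpha,\eta} = \max\{c^*_\alpha, h^{1/2}\eta + \hat a^*\}$ is taken from the text.

```python
import numpy as np

def critical_value(boot_stats, alpha, h, eta, a_hat_star):
    """Degeneracy-robust critical value (d = 1):
    c*_{alpha,eta} = max{c*_alpha, h^{1/2} * eta + a_hat_star},
    where c*_alpha is the (1 - alpha)-quantile of the bootstrap statistics.
    The h^{1/2} * eta floor keeps the test valid when the bootstrap
    distribution degenerates, while shrinking with h so the test is not
    rendered conservative the way a fixed floor would."""
    c_star = np.quantile(boot_stats, 1.0 - alpha)
    return max(c_star, np.sqrt(h) * eta + a_hat_star)

# degenerate case: all bootstrap draws are zero -> the floor binds
boot = np.zeros(999)
cv = critical_value(boot, alpha=0.05, h=0.1, eta=1e-6, a_hat_star=0.0)
print(cv == np.sqrt(0.1) * 1e-6)  # True
```

In the nondegenerate case the bootstrap quantile $c^*_\alpha$ dominates and the floor is inert, which is why the adjustment does not distort the test where the ordinary bootstrap already works.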
There are two remarkable aspects of the local power behavior of our bootstrap test. First, the test exhibits two different convergence rates along different directions of Pitman local alternatives. Second, despite the fact that the test uses the approach of local smoothing by kernel as in Härdle and Mammen (1993), the faster of the two convergence rates achieves a parametric rate of $\sqrt{n}$. To see this more closely, let us return to the simple example in (2.8) and consider local alternatives of the form
$$v_n(x) = v_0(x) + \frac{\delta(x)}{b_n}, \qquad (2.13)$$
where $v_0(x) \leq 0$ for all $x \in \mathcal{X}$, $\delta(x) > 0$ for some $x \in \mathcal{X}$, and $b_n \rightarrow \infty$ as $n \rightarrow \infty$ such that $v_n(x) > 0$ for some $x \in \mathcal{X}$. The function $\delta(\cdot)$ represents a Pitman direction of the local alternatives. Suppose that the test has nontrivial local power against local alternatives of the form in (2.13), but trivial power whenever $b_n$ in (2.13) is replaced by a sequence $b_n'$ that diverges faster than $b_n$. In this case, we say that the test has convergence rate equal to $b_n$ against the Pitman direction $\delta$.

Footnote 11: There may exist an alternative approach to improving the power of our test. Romano, Shaikh, and Wolf (2013) proposed a computationally attractive two-step method for testing a finite number of unconditional moment inequalities. It is an interesting topic to extend their two-step approach to our setup, but it is beyond the scope of this paper.

Footnote 12: Our fixed positive constant $\eta$ plays a role similar to a fixed constant in Andrews and Shi (2013)'s modification of the sample variance-covariance matrix of unconditional moment conditions, transformed by instruments ($\varepsilon$ in their notation in equation (3.5) of Andrews and Shi (2013)).
As we show later, our test has two types of convergence rates, depending on the choice of $\delta(x)$: for one class of directions $\delta(\cdot)$, the test achieves the parametric rate $b_n = \sqrt{n}$, while for another class, it achieves the slower rate $b_n = \sqrt{n}\, h^{1/4}$. See Section 6.2 for heuristics behind these results. In Section 6.3, the general form of the local power function is derived.

Motivating Examples
In this section, we first provide two examples of partially identified econometric models for which our testing approach can be used to construct confidence regions. One example is based on the generalized instrumental variables models of Chesher and Rosen (2014), and the other is from a panel data model of Khan, Ponomareva, and Tamer (2013). In addition, we give two empirical examples. The first empirical example concerns testing auction models following GPV, and the second concerns testing functional inequalities via differences-in-differences in conditional quantiles, inspired by Acemoglu and Autor (2011). None of the four examples given in this section is easily handled by existing inference methods when continuous covariates exist; however, they are all special cases of our general framework.
Appendix II gives more examples of testing problems that can be included in our general framework. In particular, these additional examples include new methods for testing monotonicity with respect to a covariate by constructing one-sided L p -type functionals in a suitable fashion in three examples: one in mean regression, another in conditional distribution function, and the third in quantile regression.
3.1. Generalized Instrumental Variables Models. First, we consider the generalized instrumental variables models of Chesher and Rosen (2014). Specifically, we illustrate the usefulness of our framework using Example 5 of Chesher and Rosen (2014) with the restriction that the structural error $U$ is independent of the instrument $Z$. In Example 5 of Chesher and Rosen (2014), the outcome variable $Y_1$ is fully observed, whereas the endogenous explanatory variable $Y_2^*$ is interval censored. One of the semiparametric specifications imposed in Chesher and Rosen (2014) assumes a linear index for the structural function without any parametric specification of the distribution of $U$. In this specification, Chesher and Rosen (2014) show that full independence between $U$ and $Z$ implies that the identified set for the structural parameter $\beta$ is given by the set of $b$'s satisfying the inequalities in (3.1). The identified set in (3.1) is a simplified version of the identified set obtained in Section 4 of Chesher and Rosen (2014), without including exogenous explanatory variables. Then, a confidence region for $\beta$ can be obtained by inverting our test pointwise in $b$.

3.2. Panel Data Models with Endogenous Censoring. Consider a panel data model of Khan, Ponomareva, and Tamer (2013). In their framework, a researcher only observes $\{(Y_{it}, D_{it}, X_{it}) : i = 1, \ldots, n,\ t = 1, \ldots, T\}$, where $\alpha_i$ is an unobserved fixed effect that can be correlated with $X_i = (X_{i1}, \ldots, X_{iT})$ and $U_i = (U_{i1}, \ldots, U_{iT})$. Khan, Ponomareva, and Tamer (2013) consider endogenous censoring and obtain bounds under alternative modeling assumptions. When $\alpha_i + U_{it}$ has the same distribution as $\alpha_i + U_{is}$ conditional on $X_i$ for $t \neq s$ (which they call Model 1), they show that the identified set is the set of $b$'s that satisfy the corresponding inequalities for every $(y, x)$ and every $t = 1, \ldots, T$. Then, to construct a confidence region for $\beta$, we may take the following route: for each $j = 1, \ldots, T$, we define $v_{\tau,j}$ and carry out our test pointwise in $b$. Khan, Ponomareva, and Tamer (2013) focus on the case where covariates have a discrete distribution with finite support; our method provides inference for the case of continuous covariates. Our general framework also applies to other partially identified panel data models; see, for example, Jun, Lee, and Shin (2011), Li and Oka (2013), and Rosen (2012).

3.3.1. Testing Problem. Suppose that the number $I$ of bidders can take two values, 2 and 3 (that is, $I \in \{2, 3\}$). For each $\tau$ such that $0 < \tau < 1$, let $q_k(\tau|x)$ denote the $\tau$-th conditional quantile (given $X = x$) of the observed equilibrium bid distribution when the number of bidders is $I = k$, where $k = 2, 3$. A conditional version of Equation (5) of GPV (with $I_1 = 2$ and $I_2 = 3$ in their notation) provides the testing restrictions in (3.2), which hold for any $\tau \in (0, 1]$ and any $x \in \mathrm{supp}(X)$, where $\mathrm{supp}(X)$ is the (common) support of $X$ and $b$ is the left endpoint of the support of the observed bids (see footnote 13). The restrictions in (3.2) rest on conditionally exogenous participation, under which the latent private value distribution is independent of the number of bidders conditional on observed characteristics ($X$), e.g., appraisal values.
A slightly weaker version of (3.2) can be put into our general testing problem in (1.1) (see footnote 14). That is, we can test the null hypothesis in (3.3) for any $(\tau, x) \in \mathcal{T} \times \mathcal{X} \subset (0, 1] \times \mathrm{supp}(X)$.

Footnote 13: In GPV, it is assumed that for $I = k$, the support of the observed equilibrium bid distribution is $[b, b_k] \subset [0, \infty)$ with $b < b_k$, where $k = 2, 3$. Note that $b$ is common across $k$'s, while the $b_k$'s are not.

Footnote 14: If necessary, we may test the strict inequalities (3.1) instead of the weak inequalities (3.2). However, such a test would require a test statistic different from ours and needs a separate treatment.
The example in (3.3) illustrates that, in order to test the implications of auction theory, it is essential to test the null hypothesis uniformly in $\tau$ and $x$. More specifically, testing over a wide range of $\tau$ is important because the testable implications are expressed in terms of conditional stochastic dominance relations. Furthermore, testing the relations uniformly over $x$ is natural since the theoretical predictions given by conditionally exogenous participation should hold for any realization of observed auction heterogeneity. The example also shows that it is important to go beyond the $J = 1$ case and to allow a general $J > 1$. In fact, if the number of bidders can take more than two values, there can be many more functional inequalities (see Corollary 1 of GPV). Finally, we note that $v_{\tau,1}(x)$ and $v_{\tau,2}(x)$ are not in the form of conditional moment inequalities, and each involves two different conditional quantile functions indexed by $\tau$. Therefore, tests developed for conditional moment inequalities are not directly applicable to this empirical example. There exist related but distinct papers regarding this empirical example; see, e.g., Marmer, Shneyerov, and Xu (2013), who developed a nonparametric test for selective entry, and Gimenes and Guerre (2013), who proposed augmented quantile regression for first-price auction models.

3.3.2. Test Statistic.
To implement the test, it is necessary to estimate conditional quantile functions. To estimate q j (τ|x), j = 2, 3, we may use a local polynomial quantile regression estimator, say q̂ j (τ|x). We then construct the estimators v̂ τ,j (x), j = 1, 2, where b̂ is a consistent estimator of b. 15 Testing (3.3) can then be carried out using {v̂ τ,j (x) : j = 1, 2} based on our general framework. In this application, our test statistics take the following forms: (3.4) Note that in (3.4), we set σ̂ τ,j (x) ≡ 1. In fact, it is possible to develop an alternative test statistic by rewriting (3.3) in terms of distribution functions. Appendix II.1 illustrates the usefulness and flexibility of our framework by reconsidering the implications from GPV using a test statistic based on estimating conditional cumulative distribution functions.
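As a concrete illustration, once the curves v̂ τ,j (x) have been estimated on a grid, the one-sided L p statistic of the form in (3.4) reduces to summing censored positive parts against integration weights. The sketch below is schematic, not the paper's implementation: the grid, the weights representing dQ, and the input curves are hypothetical placeholders, and σ̂ τ,j (x) ≡ 1 as in (3.4).

```python
import numpy as np

def one_sided_lp_statistic(v_hat, weights, p=1, scale=None):
    """One-sided L_p statistic: sum over grid points of
    weights * sum_j max(v_hat_j, 0)**p.

    v_hat   : (G, J) array of estimated functions at G grid points of (x, tau),
              one column per inequality j = 1, ..., J.
    weights : (G,) integration weights representing dQ(x, tau).
    scale   : optional (G, J) array of normalizers sigma_hat; omitted here,
              matching sigma_hat = 1 in (3.4).
    """
    v = np.asarray(v_hat, dtype=float)
    if scale is not None:
        v = v / scale
    pos = np.maximum(v, 0.0) ** p          # one-sided censoring [v]_+^p
    return float(np.sum(weights * pos.sum(axis=1)))

# toy example with J = 2: both curves weakly negative, so the statistic is 0
grid = np.linspace(0, 1, 101)
w = np.full(grid.size, 1.0 / grid.size)            # uniform dQ on the grid
v = np.column_stack([-np.abs(grid), -0.5 - grid])  # placeholder estimates
print(one_sided_lp_statistic(v, w, p=1))  # 0.0
print(one_sided_lp_statistic(v, w, p=2))  # 0.0
```

Any strictly positive patch of v̂ τ,j (x) on the grid makes the statistic strictly positive, which is what the test picks up.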
3.3.3. Empirical Results. We now present empirical results using the timber auction data used in Lu and Perrigne (2008). 16 They used the timber auction data to estimate bidders' risk aversion, taking advantage of bidding data from ascending auctions as well as from first-price sealed-bid auctions. In our empirical example, we use only the latter auctions with 2 and 3 bidders, and we use the appraisal value as the only covariate X i (d = 1). Summary statistics and a visual presentation of the data are given in Table 1 and Figure 2. It can be seen from Table 1 that average bids become higher as the number of bidders increases from 2 to 3. The top panel of Figure 2 suggests that this more aggressive bidding also holds conditional on appraisal values. Before estimation, the covariate was transformed to lie between 0 and 1 by studentizing it and then applying the standard normal CDF transformation. The bottom panel of Figure 2 shows local linear estimates of the conditional quantile functions at τ = 0.1, 0.5, 0.9. 17 In this figure, estimates are only shown between the 10% and 90% sample quantiles of the covariate.
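The covariate transformation described above (studentize, then apply the standard normal CDF) can be sketched as follows; the appraisal-value data here are simulated placeholders, not the Lu and Perrigne (2008) data.

```python
import math
import numpy as np

def to_unit_interval(x):
    """Studentize x, then apply the standard normal CDF
    Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), mapping the covariate into (0, 1)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

rng = np.random.default_rng(0)
appraisal = rng.uniform(1000.0, 5000.0, size=200)  # fake appraisal values
u = to_unit_interval(appraisal)
print(u.min() > 0.0 and u.max() < 1.0)  # True
```

The transform is monotone, so conditional quantiles in the transformed covariate correspond one-to-one to those in the original appraisal value.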
The 10% conditional quantiles are almost identical between auctions with two bidders (I = 2) and those with three bidders (I = 3). In contrast, the 50% and 90% conditional quantiles are higher with three bidders for most values of the appraisal value. There is a crossing of the two conditional median curves at the lower end of appraisal values.
To check whether the inequalities in (3.3) hold in this empirical example, we plot estimates of v τ,1 (x) and v τ,2 (x) in Figure 3. The top panel of the figure shows 20 estimated curves of v̂ τ,1 (x), each representing a particular conditional quantile, ranging from the 10th to the 90th percentile. There are strictly positive values of v̂ τ,1 (x) at the lower end of appraisal values. The bottom panel of Figure 3 depicts 20 estimated curves of v̂ τ,2 (x), showing that they are all strictly negative. The test based on (3.4) can tell us formally whether the positive values of v̂ τ,1 (x) at the lower end of appraisal values can be viewed as evidence against the economic restrictions imposed by (3.3).
against the null hypothesis beyond random sampling errors. Therefore, we have not found any evidence against the economic implications imposed by (3.3).
3.4. Empirical Example 2: Testing Functional Inequalities in the Context of Wage Inequality. We now give an example based on Acemoglu and Autor (2011), who depict changes in log hourly wages by percentile relative to the median. Specifically, they consider the following differences-in-differences in quantiles, for time periods t and s and for quantiles τ, where q t (τ|x) denotes the τ-quantile of log hourly wages conditional on X = x in year t. Acemoglu and Autor (2011) consider males and females together in their Figure 9a, males only in Figure 9b, and females only in Figure 9c. Thus, in their setup, the only covariate X is gender.

3.4.2. Test Statistic.
To implement the test, we again use a local polynomial quantile regression estimator, say q̂ t (τ|x). Then ∆ t,s (τ, x) can be estimated by its sample analogue ∆̂ t,s (τ, x), and testing (3.5) can be carried out using v̂ τ,t,s (x) = −∆̂ t,s (τ, x). 19 Here, to reflect the different sample sizes between the two time periods, we choose the normalization accordingly, where n j and h j are the sample size and the bandwidth used for nonparametric estimation for year j = t, s.
18 Note that H 0 in (3.5) includes the case ∆ t,s (τ, x) ≡ 0, which does not correspond to the notion of polarization. In view of this, our null hypothesis in (3.5) can be regarded as a weak form of the polarization hypothesis, whereas a stricter version requires that the inequality in (3.5) hold strictly for some high and low quantiles. 19 Note that the null hypothesis is written as positivity in (3.5). Hence v̂ τ,t,s (x) is defined accordingly. [Figure 5 caption: the two panels plot −∆̂ 1988,1974 (τ, x) and −∆̂ 2008,1988 (τ, x), respectively, where x is age in years.]
3.4.3. Empirical Results. We used the CPS data extract of Acemoglu and Autor (2011). 20 In our empirical example, we use age in years as the only covariate. Summary statistics and a visual presentation of the data are given in Table 2 and Figure 4. Note that Figure 4 replicates the basic patterns of Figure 9 of Acemoglu and Autor (2011).
We now turn to the conditional version of Figure 4, using age as a conditioning variable. As an illustration, let X be an interval of ages between 25 and 60 and let T = [0.1, 0.9]. To check whether the inequalities ∆ t,s (τ, x) ≥ 0 hold for each value of (x, τ) ∈ X × T, we plot estimates of v̂ τ,t,s (x) = −∆̂ t,s (τ, x) in Figure 5. The top panel of the figure shows 5 estimated curves of v̂ τ,1988,1974 (x), each representing a particular conditional quantile, and the bottom panel shows the corresponding graph for the period 1988-2008. 21 By construction, the estimated curve is a flat line at zero when τ = 0.5. Consistent with Figure 4, the lower quantiles seem to violate the null hypothesis, especially for the period 1974-1988. As before, our test can tell us formally whether the positive values of v̂ τ,t,s (x) lead to rejection of the null hypothesis of polarization of wage growth.
We considered both the L 1 and L 2 test statistics described in (3.6). As before, the contact set was estimated with ĉ n = C cs log log(n) q 1−0.1/log(n) (S n *) with r n = √(nh). 22 We checked the sensitivity to the tuning parameters with C cs ∈ {0.5, 1, 1.5}.
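A minimal sketch of this tuning-parameter rule, reading it as ĉ n = C cs · log log(n) · q a (S n *) with a = 1 − 0.1/log(n), where q a (·) is the empirical a-quantile of the bootstrap statistics S n *. The bootstrap draws below are placeholders, and the exact reading of the rule is an assumption based on the display above.

```python
import numpy as np

def contact_set_threshold(boot_stats, n, C_cs=1.0):
    """Contact-set tuning parameter
    c_hat_n = C_cs * log(log n) * q_{1 - 0.1/log n}(S*_n),
    where q_a(.) is the empirical a-quantile of the bootstrap statistics.
    C_cs in {0.5, 1, 1.5} reproduces the sensitivity check in the text."""
    a = 1.0 - 0.1 / np.log(n)
    q = np.quantile(boot_stats, a)
    return C_cs * np.log(np.log(n)) * q

rng = np.random.default_rng(1)
S_star = np.abs(rng.standard_normal(999))  # placeholder bootstrap draws
n = 10_000
for C in (0.5, 1.0, 1.5):
    print(round(contact_set_threshold(S_star, n, C), 3))
```

The threshold scales linearly in C cs, which is why the reported sensitivity check varies only that constant.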
For the period 1974-1988, we rejected the null hypothesis at the 1% level across all three values of C cs . However, for the period 1988-2008, we failed to reject the null hypothesis at the 5% level for any value of C cs . Therefore, the changing patterns of the US wage distribution around 1988, reported in Acemoglu and Autor (2011), seem to hold up conditional on age as well.

Uniform Asymptotics under General Conditions
In this section, we establish the uniform asymptotic validity of our bootstrap test using high-level conditions. We also provide a class of distributions for which the asymptotic size is exact. We first define the set of distributions we consider.
Definition 1. Let P denote the collection of the potential joint distributions of the observed random vectors that satisfy Assumptions A1-A6, and B1-B4 given below. Let P 0 ⊂ P be the sub-collection of potential distributions that satisfy the null hypothesis.
Let || · || denote the Euclidean norm throughout the paper. For any given sequence of subcollections P n ⊂ P, any sequence of real numbers b n > 0, and any sequence of random vectors Z n , we say that Z n /b n → P 0, P n -uniformly, or Z n = o P (b n ), P n -uniformly, if for any a > 0, limsup n→∞ sup P ∈P n P{||Z n || > a b n } = 0. Similarly, we say that Z n = O P (b n ), P n -uniformly, if for any a > 0, there exists M > 0 such that limsup n→∞ sup P ∈P n P{||Z n || > M b n } < a. We also define their bootstrap counterparts. Let P* denote the probability under the bootstrap distribution. For any given sequence of subcollections P n ⊂ P, any sequence of real numbers b n > 0, and any sequence of random vectors Z n *, we say that Z n */b n → P* 0, P n -uniformly, or Z n * = o P* (b n ), P n -uniformly, if for any a > 0, sup P ∈P n P{P*{||Z n *|| > a b n } > a} → 0 as n → ∞. Similarly, we say that Z n * = O P* (b n ), P n -uniformly, if for any a > 0, there exists M > 0 such that limsup n→∞ sup P ∈P n P{P*{||Z n *|| > M b n } > a} < a. In particular, when we say Z n = o P (b n ) or O P (b n ), P-uniformly, it means that the convergence holds uniformly over P ∈ P, and when we say Z n = o P (b n ) or O P (b n ), P 0 -uniformly, it means that the convergence holds uniformly over all the probabilities in P that satisfy the null hypothesis.

21 As before, the underlying conditional quantile functions are estimated via the local linear quantile regression estimator with the kernel function K(u) = 1.5[1 − (2u)²] × 1{|u| ≤ 0.5}. One important difference from the first empirical example is that we used the CPS sample weights, which were incorporated by multiplying them by the kernel weight for each observation. Finally, the bandwidth was h = 2.5 for all years. 22 To accommodate different sample sizes across years, we set n = (n 1974 + n 1988 + n 2008 )/3 in computing ĉ n .

4.1. Test Statistics and Critical Values in General Form.
First, let us extend the test statistics and the bootstrap procedure to the general case of J ≥ 1. Let Λ p : R^J → [0, ∞) be a nonnegative, increasing function indexed by p ≥ 1. While the theory of this paper can be extended to various general forms of the map Λ p , we focus on the following type: Λ p (v 1 , ⋯, v J ) = Σ_{j=1}^J [v j ]_+^p, where for a ∈ R, [a]_+ = max{a, 0}. The test statistic is defined as θ̂ = ∫_{X×T} Λ p (û τ,1 (x), ⋯, û τ,J (x)) dQ(x, τ).
To motivate our bootstrap procedure, it is convenient to begin with the following lemma. Let us first introduce some notation. Define 𝒩 J ≡ 2^{N J }\{∅}, i.e., the collection of all the nonempty subsets of N J ≡ {1, 2, ⋯, J}. For any A ∈ 𝒩 J and v = (v 1 , ⋯, v J ) ∈ R^J, we define v A to be v except that for each j ∈ N J \A, the j-th entry of v A is zero, and let Λ A,p (v) ≡ Λ p (v A ). That is, Λ A,p (v) is a "censoring" of Λ p (v) outside the index set A. Now, we define a general version of contact sets: for A ∈ 𝒩 J and for c n,1 , c n,2 > 0,

(4.3) B n,A (c n,1 , c n,2 ) ≡ {(x, τ) ∈ X × T : |r n,j v n,τ,j (x)/σ n,τ,j (x)| ≤ c n,1 for all j ∈ A, and r n,j v n,τ,j (x)/σ n,τ,j (x) < −c n,2 for all j ∈ N J \A},

where σ n,τ,j (x) is a "population" version of σ̂ τ,j (x) (see, e.g., Assumption A5 below). When c n,1 = c n,2 = c n for some c n > 0, we write B n,A (c n ) = B n,A (c n,1 , c n,2 ).
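In finite samples, the estimated contact sets can be computed by classifying each grid point according to which inequalities are close to binding. The sketch below implements the classification rule of (4.3) on a hypothetical grid of studentized statistics; the inputs and cutoffs are placeholders.

```python
from itertools import combinations
import numpy as np

def contact_sets(u, c1, c2):
    """Partition grid points into the sets B_{n,A}(c1, c2) of (4.3).

    u : (G, J) array of studentized statistics r_n * v_hat / sigma_hat at
        G grid points of (x, tau).  A grid point g belongs to the set indexed
        by A iff |u[g, j]| <= c1 for all j in A and u[g, j] < -c2 for all
        j outside A.
    Returns a dict mapping each nonempty A (a tuple of indices) to the array
    of grid-point indices it contains.
    """
    G, J = u.shape
    out = {}
    for size in range(1, J + 1):
        for A in combinations(range(J), size):
            in_A = np.zeros(J, dtype=bool)
            in_A[list(A)] = True
            near = np.all(np.abs(u[:, in_A]) <= c1, axis=1)   # close to binding on A
            far = np.all(u[:, ~in_A] < -c2, axis=1)           # well inside elsewhere
            out[A] = np.flatnonzero(near & far)
    return out

u = np.array([[0.1, -9.0],    # near-binding in j = 0 only
              [0.2, 0.3],     # near-binding in both inequalities
              [-9.0, -9.0]])  # deep inside the null region: in no contact set
B = contact_sets(u, c1=1.0, c2=1.0)
print(B[(0,)], B[(0, 1)], B[(1,)])
```

Points deep inside the null region fall in no contact set, which is why the bootstrap approximation can be non-degenerate only on the near-binding region.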
Lemma 1. Suppose that Assumptions A1-A3 and A4(i) in Section 4.2 hold. Suppose further that c n,1 > 0 and c n,2 > 0 are sequences such that √(log n){c n,1^{−1} + c n,2^{−1}} → 0 as n → ∞. Then, as n → ∞, sup P ∈P 0 P{θ̂ ≠ Σ_{A∈𝒩 J} ∫_{B n,A (c n,1 , c n,2 )} Λ A,p (û τ,1 (x), ⋯, û τ,J (x)) dQ(x, τ)} → 0, where P 0 is the set of potential distributions of the observed random vector under the null hypothesis.
The lemma above shows that the test statistic θ̂ is uniformly approximated by the integral with its domain restricted to the contact sets B n,A (c n,1 , c n,2 ) in large samples. The asymptotic result is remarkable in the sense that the approximation error between θ̂ and the expression on the right-hand side is o P (ε n ) for any ε n → 0. The result of Lemma 1 suggests a bootstrap procedure that mimics the representation of θ̂ in Lemma 1.
The explicit condition for ĉ n is found in Assumption A4 below. Given the bootstrap counterparts {[v̂ τ,j *(x), σ̂ τ,j *(x)] : j ∈ N J }, we define our bootstrap test statistic θ̂* analogously. Let c α * be the (1 − α)-th quantile of the bootstrap distribution of θ̂*, and take c α,η * = max{c α *, h^{d/2} η + â*} as our critical value, where η > 0 is a small fixed number. One of the main technical contributions of this paper is to present precise conditions under which this bootstrap proposal works. We present and discuss them in subsequent sections.
To see the intuition for the bootstrap validity, first note that the uniform convergence of r n,j {v̂ τ,j (x) − v n,τ,j (x)} over (x, τ) implies that

(4.4) B n,A (c n,L , c n,U ) ⊂ B̂ A (ĉ n ) ⊂ B n,A (c n,U , c n,L )

with probability approaching one, whenever P{c n,L ≤ ĉ n ≤ c n,U } → 1. Therefore, if √(log n)/c n,L → 0, then, letting ŝ τ,j (x) ≡ r n,j (v̂ τ,j (x) − v n,τ,j (x))/σ̂ τ,j (x), we obtain, by Lemma 1 and the null hypothesis, a bound on θ̂ with probability approaching one in terms of the sum over A of integrals of Λ A,p (ŝ τ,1 (x), ⋯, ŝ τ,J (x)) over the estimated contact sets. When the last sum has a nondegenerate limit, we can approximate its distribution by the bootstrap distribution, where the inequality follows from (4.4). 23 Thus the critical value is read from the bootstrap distribution of θ̂*. On the other hand, if the last sum in (4.5) has a limiting distribution degenerate at zero, we simply take a small positive number η to control the size of the test. This results in our choice of c α,η * = max{c α *, h^{d/2} η + â*}.
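The critical-value rule c*_{α,η} = max{c*_α, h^{d/2} η + â*} is straightforward to compute from bootstrap draws. The sketch below uses placeholder draws of θ̂* and a placeholder centering term â*; it is an illustration of the rule, not the paper's implementation.

```python
import numpy as np

def bootstrap_critical_value(theta_star, a_star, h, d, alpha=0.05, eta=1e-3):
    """Critical value c*_{alpha,eta} = max{c*_alpha, h^{d/2} * eta + a_star},
    where c*_alpha is the (1 - alpha) empirical quantile of the bootstrap
    statistics theta_star.  The fixed eta > 0 guards against the degenerate
    case where the limiting distribution is concentrated at zero."""
    c_alpha = np.quantile(theta_star, 1.0 - alpha)
    return max(c_alpha, h ** (d / 2.0) * eta + a_star)

rng = np.random.default_rng(2)
theta_star = rng.chisquare(df=1, size=2000) * 0.01  # placeholder bootstrap draws
c = bootstrap_critical_value(theta_star, a_star=0.0, h=0.1, d=1, alpha=0.05)
print(c > 0)  # True
```

When the bootstrap draws collapse to zero (the degenerate case), the rule returns h^{d/2} η + â*, so the critical value stays strictly positive.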

4.2. High-Level Regularity Conditions. In this section, we provide the high-level conditions needed to develop general results. We assume that S ≡ X × T is a compact subset of a Euclidean space. We begin with the following assumption.
Assumption A1. (Asymptotic Linear Representation) For each j ∈ N J ≡ {1, ⋯, J}, there exists a nonstochastic function v n,τ,j (·) : R^d → R such that (a) v n,τ,j (x) ≤ 0 for all (x, τ) ∈ S under the null hypothesis, and (b) as n → ∞, the asymptotic linear representation (4.6) holds, where the distribution of X i is absolutely continuous with respect to Lebesgue measure, 24 and β n,x,τ,j : R^L × R^d → R is a function which may depend on n ≥ 1.
Assumption A1 requires that there exist a nonparametric function v n,τ,j (x) around which the asymptotic linear representation holds uniformly in P ∈ P, and that v n,τ,j (x) ≤ 0 under the null hypothesis. The required rate of convergence in (4.6) is o P (h^{d/2}) instead of o P (1). We need this stronger convergence rate primarily because θ̂ − a n is O P (h^{d/2}) for some nonstochastic sequence a n . 25 When v̂ τ,j (x) is a sample mean of i.i.d. random quantities involving nonnegative kernels and σ̂ n,τ (x) = 1, we may take v n,τ,j (x) = E v̂ τ,j (x), and then the o P (h^{d/2}) term is in fact precisely equal to 0. If the original nonparametric function v τ,j (·) satisfies some smoothness conditions, we may take v n,τ,j (x) = v τ,j (x), and handle the bias part E v̂ τ,j (x) − v τ,j (x) using standard arguments to deduce the error rate o P (h^{d/2}). Assumption A1 admits both set-ups. For instance, consider the simple example in Section 2.4; there, the asymptotic linear representation in Assumption A1 can be shown to hold with the influence function chosen as in (2.9). The following assumption for β n,x,τ,j essentially defines the scope of this paper's framework. 24 Throughout the paper, we assume that X i ∈ R^d is a continuous random vector. It is straightforward to extend the analysis to the case where X i has a subvector of discrete random variables. 25 To see this more clearly, assume that T = {τ}, p = 1, and J = 1, suppress the subscripts τ and j from the notation, and take σ̂(x) = 1 for simplicity. In the case where v n (x) = 0, we can write θ̂ as a leading term plus an error term R n , where R n has at least the same convergence rate as the remainder term in the asymptotic linear representation for v̂(x). It can be shown that the leading term is asymptotically normal, using the method of Poissonization. Hence h^{−d/2} θ̂ − h^{−d/2} a n becomes asymptotically normal if R n = o P (h^{d/2}). This is where the faster error rate in the asymptotic linear representation in Assumption A1 plays a role.
Assumption A2. (Kernel-Type Condition) For some compact K 0 ⊂ R^d that does not depend on P ∈ P or n, it is satisfied that β n,x,τ,j (y, u) = 0 for all u ∈ R^d \K 0 , all (x, τ, y) ∈ X × T × Y j , and all j ∈ N J , where Y j denotes the support of Y ij .
Assumption A2 can be immediately verified when the asymptotic linear representation in (4.6) is established. This condition is satisfied in particular when the asymptotic linear representation involves a multivariate kernel function with bounded support in a multiplicative form. In such a case, the set K 0 depends only on the choice of the kernel function, not on any model primitives.
Assumption A3 requires that v̂ τ,j (x) − v n,τ,j (x) have the uniform convergence rate of O P (r n,j^{−1} √(log n)) uniformly over P ∈ P. Lemma 2 in Section 4.4 provides some sufficient conditions for this convergence.
We now introduce conditions for the bandwidth h and the tuning parameter c n for the contact sets.
Assumption A4. (Rate Conditions for Tuning Parameters) (i) As n → ∞, h → 0, √(log n)/r n → 0, and n^{−1/2} h^{−d−ν 1 } → 0 for some arbitrarily small ν 1 > 0, where r n ≡ min j∈N J r n,j . (ii) For each n ≥ 1, there exist nonstochastic sequences c n,L > 0 and c n,U > 0 such that c n,L ≤ c n,U , inf P ∈P P{c n,L ≤ ĉ n ≤ c n,U } → 1, and √(log n)/c n,L + c n,U /r n → 0 as n → ∞.
The requirement that √(log n)/r n → 0 is easily satisfied in most cases, where r n increases at a polynomial rate in n. Assumption A4(ii) requires that ĉ n increase faster than √(log n) but more slowly than r n with probability approaching one.
Assumption A5 requires that the scale normalization σ̂ τ,j (x) be asymptotically well defined. The condition precludes the case where the estimator σ̂ τ,j (x) converges to a map that becomes zero at some point (x, τ) in S. Assumption A5 is usually satisfied by an appropriate choice of σ̂ τ,j (x). When one chooses σ̂ τ,j (x) = 1, which is permitted in our framework, Assumption A5 is immediately satisfied with σ n,τ,j (x) = 1. Again, returning to the simple example considered in Section 2.4, it is straightforward to see that under regularity conditions, with the subscript τ suppressed, σ̂ 1 ²(x) = σ n,1 ²(x) + o P (1). The convergence can be strengthened to a uniform convergence when σ 1 ²(x) is bounded away from zero uniformly over x ∈ X and P ∈ P, so that Assumption A5 holds.
We now introduce assumptions about the moment conditions for β n,x,τ,j (·, ·) and other regularity conditions. For τ ∈ T and ε 1 > 0, let S τ (ε 1 ) denote a set that contains the relevant domain in its interior, where K 0 is the same as in Assumption A2.
Assumption A6. (i) There exist M ≥ 2(p + 2), C > 0, and ε 1 > 0 such that the stated moment bounds hold. Assumption A6(i) requires that the conditional moments of β n,x,τ,j (Y ij , z) be bounded. Assumption A6(ii) is a technical condition on the distribution of X i . The third inequality in Assumption A6(ii) is satisfied if the distribution of X i is uniformly tight in P, and follows, for example, if sup P ∈P E||X i || < ∞. The first inequality in Assumption A6(ii) requires that there be a common compact set outside which the distribution of X i still has positive probability mass uniformly over P ∈ P. The main thrust of Assumption A6(ii) lies in the requirement that such a compact set be independent of P ∈ P. While it is necessary to make this technical condition explicit as stated here, the condition itself appears very weak. This paper's asymptotic analysis adopts the approach of Poissonization (see, e.g., Horváth (1991) and Giné, Mason, and Zaitsev (2003)). However, existing methods of Poissonization are not readily applicable to our testing problem, mainly due to the possibility of local or global redundancy among the nonparametric functions. In particular, the conditional 26 The conditional expectation E P [|β n,x,τ,j (Y ij , u)| | X i = x] conditions on an event of probability zero and hence is not well defined according to Kolmogorov's definition of conditional expectations. See, e.g., Proschan and Presnell (1998) for this problem. Here we define the conditional expectation in an elementary way, using the conditional densities or conditional probability mass functions of (Y ij , Y ik ) given X i = x, depending on whether (Y ij , Y ik ) is continuous or discrete.
covariance matrix of β n,x,τ,j (Y ij , u)'s across different (x, τ, j)'s given X i can be singular in the limit. Since the empirical researcher rarely knows a priori the local relations among nonparametric functions, it is important that the validity of the test is not sensitive to the local relations among them, i.e., the validity should be uniform in P .
This paper deals with this challenge in three steps. First, we introduce a Poissonized version of the test statistic and apply a certain form of regularization to facilitate the derivation of its limiting distribution uniformly in P ∈ P, i.e., regardless of singularity or degeneracy in the original test statistic. Second, we use a Berry-Esseen-type bound to compute the finite sample influence of the regularization bias and let the regularization parameter go to zero carefully, so that the bias disappears in the limit. Third, we translate the limiting distribution thus computed into that of the original test statistic, using a so-called de-Poissonization lemma. This is how the uniformity issue in this complex situation is handled, through the Poissonization method combined with the method of regularization.

4.3. Asymptotic Validity of the Bootstrap Procedure.
Recall that E * and P * denote the expectation and the probability under the bootstrap distribution. We make the following assumptions forv * τ,j (x).
and β n,x,τ,j is the real-valued function introduced in Assumption A1.
Assumption B1 is the asymptotic linear representation of the bootstrap estimator v̂ τ,j *(x). The proof of this representation typically proceeds in a way similar to that of the original asymptotic linear representation in Assumption A1. Assumptions B2 and B3 are the bootstrap versions of Assumptions A3 and A5.
Note that Assumption B4 is stronger than the bandwidth condition in Assumption A4(i). The main reason is that we need to prove that for some a ∞ > 0, we have a n = a ∞ + o(h^{d/2}) and a n * = a ∞ + o P (h^{d/2}), P-uniformly, where a n is an appropriate location normalizer of the test statistic, and a n * is a bootstrap counterpart of a n . To show these, we utilize a Berry-Esseen-type bound for a nonlinear transform of a sum of independent random variables. Since the approximation error depends on the moment bounds for the sum, the bandwidth condition in Assumption B4 takes a form that involves the M > 0 in Assumption A6.
We now present the result of the uniform validity of our bootstrap test.
Theorem 1. Suppose that Assumptions A1-A6 and B1-B4 hold. Then limsup n→∞ sup P ∈P 0 P{θ̂ > c α,η *} ≤ α.

One might ask whether the bootstrap test 1{θ̂ > c α,η *} is asymptotically exact, i.e., whether the inequality in Theorem 1 holds as an equality. As we show below, the answer is affirmative in general. The remaining issue is a precise formulation of a subset of P 0 such that the rejection probability of the bootstrap test achieves the level α asymptotically, uniformly over the subset.
To see when the test will have asymptotically exact size, we apply Lemma 1 to find that, with probability approaching one, θ̂ is given by the sum of integrals over the contact sets B n,A (c n,U , c n,L ), where c n,U > 0 and c n,L > 0 are nonstochastic sequences that satisfy Assumption A4(ii). 27 We fix a positive sequence q n → 0 and split the right-hand side into an integral over B n,A (q n ) and a remainder term. Under the null hypothesis, we have v n,τ,j (x) ≤ 0, and hence the last sum is bounded with probability approaching one. Using the uniform convergence rate in Assumption A3, we find that, as long as Q(B n,A (c n,U , c n,L )\B n,A (q n )) → 0 fast enough, the second term in (4.7) vanishes in probability. As for the first integral, since |r n,j v n,τ,j (x)/σ τ,j (x)| ≤ q n for all j ∈ A and all x ∈ B n,A (q n ), we use the Lipschitz continuity of the map Λ A,p on a compact set to approximate the leading sum in (4.7) by θ̄ 1,n (q n ), provided that λ n and q n converge to zero fast enough. We specify the conditions in Theorem 2 below. Let us now deal with θ̄ 1,n (q n ). First, it can be shown that there are sequences of nonstochastic numbers a n (q n ) ∈ R and σ n (q n ) > 0, depending on q n , that center and scale θ̄ 1,n (q n ); we provide the precise formulae for σ n (q n ) and a n (q n ) in Section 6.3. Since the distribution of h^{−d/2} {θ̄ 1,n (q n ) − a n (q n )}/σ n (q n ) is approximated by the bootstrap distribution of h^{−d/2} {θ̂* − a n (q n )}/σ n (q n ) in large samples, the bootstrap critical value c α * will dominate h^{d/2} η + â* > 0 provided a lower bound on σ n (q n ) holds for all n ≥ 1. We can show that â* − a n (q n ) = o P (h^{d/2}), which follows if λ n in (4.8) vanishes sufficiently fast. Hence c α * becomes approximately equal to our bootstrap critical value c α,η *. This leads to the following formulation of probabilities.
The following theorem establishes the asymptotic exactness of the size of the bootstrap test over P ∈ P n (λ n , q n ) ∩ P 0 .
Theorem 2. Suppose that Assumptions A1-A6 and B1-B4 hold, and let λ n → 0 and q n → 0 be positive sequences converging to zero sufficiently fast. Then the rejection probability of our bootstrap test converges to α uniformly over P n (λ n , q n ) ∩ P 0 .

Theorem 2 shows that the rejection probability of our bootstrap test achieves exactly the level α uniformly over the set of probabilities in P n (λ n , q n ) ∩ P 0 . If v n,τ,j (x) ≡ 0 for each (x, τ) and for each j (the least favorable case, say P LFC ), then it is obvious that the distribution P LFC belongs to P n (λ n , q n ) for any positive sequences λ n → 0 and q n → 0. This would be the only case of asymptotically exact size if bootstrap critical values were obtained as in (2.12), without contact set estimation. By estimating the contact sets and obtaining a critical value based on them, Theorem 2 establishes the asymptotic uniform exactness of the bootstrap test for distributions that need not satisfy v n,τ,j (x) ≡ 0 everywhere.

4.4. Sufficient Conditions for the Uniform Convergences in Assumptions A3 and B2.
This subsection gives sufficient conditions that yield Assumptions A3 and B2. The result is formalized in the following lemma.
Lemma 2. (i) Suppose that Assumptions A1-A2 hold and that for each j ∈ N J there exist finite constants C, γ j > 0 and a positive sequence δ n,j > 0 such that the local L 2 -continuity condition (4.11) holds for all n ≥ 1, with δ n,j = n^{s 1,j } and h = n^{s 2 } for some s 1,j , s 2 ∈ R. Furthermore, assume that the stated rate condition holds for some small ν > 0. Then Assumption A3 holds.
The condition (4.11) is the local L 2 -continuity condition for β n,x,τ,j , which Andrews (1994) called a "Type IV class." The condition is satisfied by numerous maps, whether continuous or discontinuous, as long as regularity conditions for the random vector (Y i , X i ) are satisfied. 28 Typically, δ n,j diverges to infinity at a polynomial rate in h^{−1}. The constant γ j is 2 or can be smaller than 2, depending on the smoothness of the underlying function b n,ij (x, τ). The value of γ j does not affect the asymptotic theory of this paper, as long as it is strictly positive.

Verifying High-Level Conditions for the First Empirical Example
In this section, we use the auction model of GPV to illustrate how to verify the high-level regularity conditions in Section 4. 29

5.1. Details on Estimating Conditional Quantile Functions. We provide further details on the empirical example considered in Section 3.3. Assume that q k (τ|x) is (r + 1)-times continuously differentiable with respect to x, where r ≥ 1. We use a local polynomial estimator. Let {(B iℓ , X i , L i ) : ℓ = 1, . . . , L i , i = 1, . . . , n} denote the observed data, where {B iℓ : ℓ = 1, . . . , L i } denotes the L i observed bids in the i-th auction, X i a vector of observed characteristics for the i-th auction, and L i the number of bids for the i-th auction, taking values in N L ≡ {2, ⋯, L̄}. In our application, L̄ = 3. 28 Chen, Linton, and Van Keilegom (2003, Theorem 3) introduced its extension to functions indexed partly by infinite-dimensional parameters, and called it local uniform L 2 -continuity. For further discussions, see Andrews (1994) and Chen, Linton, and Van Keilegom (2003). 29 Similarly, one may derive primitive conditions for the second empirical example, since it is also concerned with estimating conditional quantile functions. Hence we omit the details.
Assume that the data {(B iℓ , X i , L i ) : ℓ = 1, . . . , L i , i = 1, . . . , n} are i.i.d. across i and that the B iℓ 's are i.i.d. across ℓ conditional on X i and L i . To implement the test, it is necessary to estimate b. In our application, we use b̂ = min{B iℓ : ℓ = 1, . . . , L i , i = 1, . . . , n}, that is, the overall sample minimum.
For each x = (x 1 , . . . , x d ), the r-th order local polynomial quantile regression estimator of q k (τ|x) can be obtained by minimizing a kernel-weighted check-function criterion, where e 1 is a column vector whose first entry is one and the rest are zero. Note that all bids are combined in each auction since we consider symmetric bidders. For u = (u 1 , ⋯, u d ) ∈ A r and an (r + 1)-times differentiable map f on R^d, we define the corresponding higher-order derivative of f.
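For intuition, the r = 0 (local constant) special case of this estimator has a closed form: it is the kernel-weighted τ-quantile, which minimizes the weighted check function Σ i K((X i − x)/h) ρ τ (Y i − q) over q. The sketch below implements only this special case with simulated data; the general r-th order case replaces the constant q with a polynomial in (X i − x) and requires a linear-programming or iterative solver.

```python
import numpy as np

def kernel(u):
    """Kernel K(u) = 1.5 * (1 - (2u)^2) * 1{|u| <= 0.5}, as used in the
    empirical sections of the paper."""
    return 1.5 * (1.0 - (2.0 * u) ** 2) * (np.abs(u) <= 0.5)

def local_constant_quantile(y, x, x0, tau, h):
    """r = 0 special case of local polynomial quantile regression: the
    weighted tau-quantile of y with weights K((x - x0)/h), which minimizes
    the kernel-weighted check function over a constant fit."""
    w = kernel((x - x0) / h)
    keep = w > 0
    y, w = y[keep], w[keep]
    order = np.argsort(y)
    y, w = y[order], w[order]
    cum = np.cumsum(w)
    # smallest y whose cumulative weight reaches tau * (total weight)
    return y[np.searchsorted(cum, tau * cum[-1])]

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 5000)
y = 2.0 * x + rng.standard_normal(5000)  # true conditional median: 2 * x0
print(round(float(local_constant_quantile(y, x, x0=0.5, tau=0.5, h=0.2)), 1))
```

The weighted-quantile formula works because the subgradient of the weighted check function crosses zero exactly where the cumulative kernel weight first reaches τ times the total weight.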

5.2. Primitive Conditions for the Example. Let us present primitive conditions for the auction example of GPV.
(ii) The density f of X is continuously differentiable on R d with a derivative bounded uniformly over P ∈ P.
(iv) P {L i = k|X i = x} is bounded away from zero uniformly over x ∈ S τ (ε), τ ∈ T and P ∈ P, and is continuously differentiable in x with a derivative bounded uniformly over x ∈ S τ (ε), τ ∈ T and P ∈ P.
Assumption AUC-3. (i) K is compact-supported, nonnegative, bounded, and Lipschitz continuous on the interior of its support, with ∫K(u)du = 1 and ∫K(u)||u||² du > 0. (ii) The bandwidth conditions hold as n → ∞. Assumption AUC-1(i) is a standard assumption used in the local polynomial approach, where one approximates q k (·|x) by a linear combination of its derivatives through a Taylor expansion, except that the approximation is required to behave well uniformly over P ∈ P. Assumption AUC-2 is made to prevent the degeneracy of the asymptotic linear representation of γ̂ τ,k (x) − γ τ,k (x) uniformly over x ∈ S τ (ε), τ ∈ T, and P ∈ P. Assumptions AUC-3(i) and (ii) are conditions for the kernel and the bandwidth. For example, the choice of h = n^{−s} with 1/(2(r + 1)) < s < 1/(3(d + ν)) satisfies the bandwidth condition. The small ν > 0 is introduced to satisfy Assumption B4. Assumption AUC-4 holds in general because the extreme order statistic is super-consistent, with the n^{−1} rate of convergence. Recall that e 1 is a unit vector whose first element is one and all other elements are zeros.
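The sufficient bandwidth condition above, 1/(2(r + 1)) < s < 1/(3(d + ν)) for h = n^{−s}, is easy to check numerically. The sketch below simply reports the admissible range of exponents, with ν set to a small placeholder value.

```python
def bandwidth_exponent_range(r, d, nu=0.01):
    """Admissible exponents s for h = n**(-s) under the sufficient condition
    1/(2(r+1)) < s < 1/(3(d+nu)) stated with Assumption AUC-3(ii).
    Returns the open interval (lo, hi), or None when it is empty."""
    lo = 1.0 / (2.0 * (r + 1))
    hi = 1.0 / (3.0 * (d + nu))
    return (lo, hi) if lo < hi else None

print(bandwidth_exponent_range(r=1, d=1))  # nonempty interval for d = 1
print(bandwidth_exponent_range(r=1, d=5))  # None: d = 5 needs a higher order r
```

The interval shrinks as d grows, so higher-dimensional covariates force a higher polynomial order r, a standard curse-of-dimensionality trade-off.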
Theorem AUC. If Assumptions AUC-1, AUC-2, AUC-3, and AUC-4 hold, then Assumptions A1-A3, A5-A6, and B1-B4 hold with the following definitions.

It is worth commenting on the linear expansion derived in Theorem AUC. The term α n,x,τ,k (Y i , z) is not mean zero conditional on X i , since the bias terms are included inside l̃ τ (·). Also, note that M n,τ,k (x) depends on n and contains the smoothing bias as well. However, the results obtained in Theorem AUC are sufficient to verify the high-level conditions of the paper.
The main part of the proof is to establish a uniform error rate for the asymptotic linear representation, building on Guerre and Sabbah (2012). 30 Our proof uses some arguments of Guerre and Sabbah (2012), who employ a maximal inequality of Massart (2007, Theorem 6.8). 31 The theoretical novelty in our derivation of the linear expansion in Theorem AUC is that we have obtained an approximation that is uniform in (x, τ) as well as in P. To the best of our knowledge, there is no established result on linear expansions of local polynomial quantile regression estimators that hold uniformly over the three aspects (x, τ, P) simultaneously. Therefore, our results may be of independent interest and can be useful in other contexts.

Power Properties
In this section, we go back to the general setup in Section 4 and consider the power properties of the bootstrap test. In Section 6.1, we establish the consistency of our test. Section 6.2 provides heuristic arguments behind the local power properties of our tests, and Section 6.3 presents the local power function in a general form. 32

6.1. Consistency. First, to show the consistency of our test, we make the following assumption.
The pointwise convergence v n,τ,j (x) = v τ,j (x) + o(1) typically holds by an appropriate choice of v n,τ,j (x). In many examples, condition (6.1) is implied by Assumptions A1-A6. If we revisit the simple example considered in Section 2.4, it is straightforward to see that under Assumptions A1-A6, with the subscript τ suppressed, v n,1 (x) = v 1 (x) + o(1), and (6.1) holds easily. 30 See Lemma QR2 in Appendix B. Our asymptotic approximation is based on plugging in the asymptotic linear expansion directly. There is a recent proposal by Mammen, Van Keilegom, and Yu (2013), who developed nonparametric tests for parametric specifications of regression quantiles and showed that calculating moments of linear expansions of nonparametric quantile regression estimators might work better, in the sense that their approach requires less stringent conditions on the dimension of covariates and the choice of the bandwidth. It is an interesting topic for future research whether their ideas can be applied to our setup. 31 The main difference is that the convergence rate obtained by Guerre and Sabbah (2012) is uniform over h in some interval, while our result is uniform over P ∈ P. 32 The local power results in this section are more general than those of Lee, Song, and Whang (2013). In particular, the results accommodate a wider class of local alternatives that may not converge to the least favorable case.
We now establish the consistency of our proposed test as follows.
6.2. Local Power Analysis: Definitions and Heuristics. In this section, we investigate the local power properties of our test. For the local power analysis, we formally define the space of Pitman directions. Let D be the collection of R J -valued bounded functions on X × T such that at least one of the components of any δ ∈ D is a non-zero function a.e. For a given vector of sequences b n = (b n,1 , · · ·, b n,J ) such that b n,j → ∞, and δ ∈ D, we consider local alternatives of the form in (6.2), a sequence of Pitman local alternatives that consists of three components: v 0 τ,j (x), b n , and δ τ,j (x). The first component v 0 τ,j (x) determines where the sequence of local alternatives converges to. For example, if v 0 τ,j (x) ≡ 0 for all (x, τ, j), then we have a sequence of local alternatives that converges to the least favorable case. We allow for negative values of v 0 τ,j (x), so that we also include local alternatives that do not converge to the least favorable case.
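In many applications, the drifting sequences just described can be written explicitly. As a concrete illustration (the display below is our reconstruction from the three components named in the text, not a quotation of (6.2)):

```latex
% Pitman local alternatives: baseline v^0, rate b_n, direction \delta
v_{n,\tau,j}(x) \;=\; v^{0}_{\tau,j}(x) \;+\; \frac{\delta_{\tau,j}(x)}{b_{n,j}},
\qquad (x,\tau)\in\mathcal{X}\times\mathcal{T},\quad j=1,\ldots,J,
```

with v 0 τ,j fixing the limit point of the drift, b n,j → ∞ its rate, and δ its Pitman direction.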
From here on, we assume the local alternative hypotheses of the form in (6.2). We fix v 0 τ,j (x) and identify each local alternative with a pair (b n , δ) for each Pitman direction δ ∈ D.
The following definitions are useful to explain our local power results.
Definition 3. (i) Given a Pitman direction δ ∈ D, we say that an α-level test, 1{T > c α }, has nontrivial local power against (b n , δ) if, under the local alternatives (b n , δ), its rejection probability converges to a limit strictly greater than α, and we say that the test has trivial local power against (b n , δ) if, under the local alternatives (b n , δ), its rejection probability converges to α. (ii) Given a collection D, we say that a test has convergence rate b n against D if the test has nontrivial local power against (b n , δ) for some δ ∈ D, and has trivial local power against (b′ n , δ) for all δ ∈ D and all sequences b′ n such that b′ n,j /b n,j → ∞ as n → ∞, for all j = 1, . . . , J.
One of the remarkable aspects of the local power properties is that our test has two types of convergence rates. More specifically, there exists a partition (D 1 , D 2 ) of D such that our test has one rate b n against D 1 and another rate b′ n against D 2 . Furthermore, in many nonparametric inequality testing environments, the faster of the two rates achieves the parametric rate of √n. To see this more closely, let us assume the set-up of testing inequality restrictions on a mean regression function in Section 2.4, and consider local alternatives with v 0 (x) ≤ 0 for all x ∈ X and δ ∈ D. First, we set b n = √n. Then, under this local alternative hypothesis (b n , δ), we can verify that (6.4) holds with probability approaching one. Under regularity conditions, the right-hand side of (6.4) is approximated by the expression in (6.5). The leading term in (6.5) converges in distribution to Z 1 ∼ N (0, σ 2 0 ), precisely as in (2.11). Furthermore, the limit a n,δ of the last term in (6.5) can be computed by an expansion. We conclude that, under the local alternatives, the magnitude of the last term in the limit determines the local power of the test. Thus, under Pitman local alternatives such that (6.6) holds, the test has nontrivial power against √n-converging Pitman local alternatives. Note that the integral in (6.6) is defined on the population contact set B 0 (0). Thus, the test has nontrivial power unless the contact set has Lebesgue measure zero or δ(·) is "too often negative" on the contact set.
When the integral in (6.6) is zero, we consider the local alternatives (b′ n , δ) with a slower convergence rate b′ n = n 1/2 h 1/4 . Following arguments similar to those above, even when the integral of δ(x)/σ 1 (x) over B 0 (0) is zero, the test still has nontrivial power against n 1/2 h 1/4 -converging Pitman local alternatives, if the Pitman directions are such that the integral of δ 2 (x)/σ 2 1 (x) over B 0 (0) is positive. When inf x∈X σ 2 1 (x) > c > 0 for some c > 0 (recall Assumption A5) and Q(B 0 (0)) > 0, we have the positivity of the latter integral, and the set {D 1 , D 2 } becomes a partition of D. Thus the bootstrap test has a convergence rate of √n against D 1 and rate n 1/2 h 1/4 against D 2 . In the next section, Corollary 1 provides a general result on this phenomenon of dual convergence rates of our bootstrap test.
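To get a feel for the gap between the two rates just discussed, here is a small numerical sketch; the bandwidth choice h = n −1/5 is our assumption for illustration only, not a prescription of the paper.

```python
# Compare the two convergence rates from Section 6.2: b_n = sqrt(n) against D_1
# versus b'_n = n^(1/2) h^(1/4) against D_2, under the (assumed) bandwidth
# h = n^(-1/5), for which b'_n = n^(9/20).

def local_alternative_rates(n: int) -> tuple[float, float]:
    h = n ** (-1 / 5)
    b1 = n ** 0.5              # parametric rate against D_1
    b2 = n ** 0.5 * h ** 0.25  # slower rate against D_2
    return b1, b2

for n in (100, 10_000, 1_000_000):
    b1, b2 = local_alternative_rates(n)
    # the ratio b2 / b1 = h^(1/4) = n^(-1/20) shrinks very slowly with n
    print(f"n={n:>9}  sqrt(n)={b1:10.1f}  n^(1/2)h^(1/4)={b2:10.1f}  ratio={b2 / b1:.3f}")
```

Because the ratio decays only like n −1/20, the loss from the slower rate against D 2 is modest at realistic sample sizes.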
Here, [W (1) n,τ 1 ,τ 2 (x, u), W (2) n,τ 1 ,τ 2 (x, u)] is a mean zero R 2J -valued Gaussian random vector whose covariance matrix is given by the preceding display. The multiple integral in (6.8) is nonnegative. The limit of the quantity σ 2 n (q n ) as n → ∞, if it is positive, is nothing but the asymptotic variance of the test statistic θ̂ (after location-scale normalization). Not surprisingly, the asymptotic variance does not depend on points (x, τ ) of X × T at which v n,τ,j (x)/σ n,τ,j (x) stays strictly below zero, as is expressed through its dependence on the contact sets B n,A (q n ) and the "truncated map" Λ x,τ (·) involving the sets A in N J .
We first make the following assumptions.
Assumption C2 can also be shown to hold in many examples. When appropriate smoothness conditions for v τ,j (x) hold and a suitable (possibly higher-order) kernel function is used, we can take v n,τ,j (x) in Assumption A1 to be identical to v τ,j (x), and hence Assumption C2 is implied by (6.2). For the simple example in Section 2.4, Assumption C2 holds if we take v n,j (x) = Ev̂ j (x).

The local asymptotic power function is based on the asymptotic normal approximation of the distribution of θ̂ (after scale and location normalization) under the local alternatives. For this purpose, we define the sequence of probability sets that admit the normal approximation under local alternatives.

Definition 4. For any positive sequence λ n → 0, define P 0 n (λ n ), where P 0 n (λ n ) is equal to P n (λ n , q n ) except that B n,A (c n,U , c n,L ) and B n,A (q n ) are replaced by B 0 n,A (c n,U , c n,L ) and B 0 n,A (q n ) for all A ∈ N J , and q n is set to zero.
To give a general form of the local power function, let us define maps ψ n,A,τ (·; x). The local power properties of the bootstrap test are mainly determined by the slope and the curvature of this function. Accordingly, we define the corresponding first- and second-order derivative quantities, provided the first and second derivatives in the definition exist.
To appreciate Assumption C3, consider the case where J = 2, A = {1, 2}, and W (1) n,τ,τ (x, 0) has a distribution denoted by G n . Choose y 1 ≥ y 2 without loss of generality. Then the three quantities in the definition are certainly all differentiable in (y 1 , y 2 ).
The following theorem offers the local power function of the bootstrap test in a general form.
Theorem 4. Suppose that Assumptions A1-A6, B1-B4, C1-C2, and C3(i) hold, together with the stated rate conditions as n → ∞. Then for each sequence P n ∈ P 0 n (λ n ), n ≥ 1, which satisfies the local alternative hypothesis (b n , δ), the limiting rejection probability is a standard normal probability with shift µ 1 (δ), where Φ denotes the standard normal cdf.

Theorem 4 shows that if we take b n such that b n,j = r n,j h −d/2 for each j = 1, . . . , J, the local asymptotic power of the test against (b n , δ) is determined by the shift µ 1 (δ). Thus, the bootstrap test has nontrivial local power against (b n , δ) if and only if µ 1 (δ) > 0. The test is asymptotically biased against (b n , δ) such that µ 1 (δ) < 0.
Suppose that (6.14) µ 1 (δ) = 0 for all A ∈ N J , i.e., δ τ,σ has positive and negative parts that precisely cancel out in the integration. Then, we show that the bootstrap test has nontrivial asymptotic power against local alternatives that converge to the null hypothesis at a rate slower than n −1/2 .
Theorem 5. Suppose that the conditions of Theorem 4 and Assumption C3(ii) hold. Then for each sequence P n ∈ P 0 n (λ n ), n ≥ 1, which satisfies the local alternative hypothesis (b n , δ) for some δ ∈ D such that µ 1 (δ) = 0 and b n = (r n,j h −d/4 ) J j=1 , the local power function is determined by the curvature of ψ n,A,τ , as follows. The local power function depends on the limit of the curvature of the function ψ n,A,τ (y; x) at y = 0, for all A ∈ N J . When the function is strictly concave at 0 in the limit, −ψ̈ A,τ (0; x) is positive definite on X × T , and in this case, the bootstrap test has nontrivial power whenever δ τ,σ (x) is nonzero on a set whose intersection with B 0 n (0) has Lebesgue measure greater than c > 0 for all n ≥ 1, for some c > 0.
From Theorems 4 and 5, it is seen that the phenomenon of dual convergence rates holds generally for our tests. To state the result formally, define the sets D 1 and D 2 of Pitman directions accordingly. When lim inf n→∞ Q(B 0 n (0)) > 0, the set {D 1 , D 2 } becomes a partition of the space of Pitman directions D.
Corollary 1. Suppose that the conditions of Theorem 5 hold. Then the bootstrap test has convergence rate b n = (r n,j h −d/2 ) J j=1 against D 1 , and convergence rate b n = (r n,j h −d/4 ) J j=1 against D 2 .
When r n,j 's diverge to infinity at the usual nonparametric rate r n,j = n 1/2 h d/2 as in many kernel-based estimators, the test has a parametric rate of convergence b n = √ n and nontrivial local power against D 1 . However, the test has a convergence rate slower than the parametric rate against D 2 .
When the r n,j 's diverge more slowly than the rate n 1/2 h d/2 , as in the case of kernel-based derivative estimators, the test has a convergence rate slower than the parametric rate. In Appendix II.2, we present several nonparametric tests for monotonicity where d = 1, J = 1, and r n,1 = n 1/2 h 3/2 . In this case, the monotonicity tests have convergence rate b n = n 1/2 h against D 1 , and convergence rate b n = n 1/2 h 5/4 against D 2 .

Conclusions
In this paper, we have proposed a general method for testing inequality restrictions on nonparametric functions and have illustrated its usefulness by looking at two particular empirical applications. We regard our examples as just some illustrative applications and believe that our framework can be useful in a number of other settings.
Our bootstrap test is based on a one-sided version of L p functionals of kernel-type estimators (1 ≤ p < ∞). We have provided regularity conditions under which the bootstrap test is asymptotically valid uniformly over a large class of distributions and have also provided a class of distributions for which the asymptotic size is exact. We have shown the consistency of our test and have obtained a general form of the local power function.
There are different notions of efficiency for nonparametric tests, and hence there is no compelling sense of an asymptotically optimal test for the hypothesis considered in this paper. See Nikitin (1995) and Bickel, Ritov, and Stoker (2006) for a general discussion. It would be interesting to consider a multiscale version of our test based on a range of bandwidths to see if it achieves adaptive rate-optimality against a sequence of smooth alternatives along the lines of Armstrong and Chan (2013) and Chetverikov (2011).

Appendices
Appendix I reports the results of Monte Carlo experiments, and Appendix II presents more examples of testing problems that can be included in our general framework. Appendix A gives the proofs of Theorems 1-5, Appendix B provides the proof of Theorem AUC, and Appendices C and D offer auxiliary results for the proofs of Theorems 1-5.

I. Monte Carlo Experiments
This part of the appendix reports the finite-sample performance of our proposed test for the Monte Carlo design considered in Andrews and Shi (2013, Section 10.3, hereafter AS). The null hypothesis takes the form of an inequality restriction with a fixed θ. AS generated a random sample of (Y, X) with Ũ i ∼ N (0, 1) and σ = 1, where f (·) is a function that determines the shape of the alternative. AS considered two such functions, f AS1 and f AS2 . Both have steep slopes, f AS1 being a roughly plateau-shaped function and f AS2 a roughly double-plateau-shaped function. AS considered several Monte Carlo designs and compared their tests with those of Chernozhukov, Lee, and Rosen (2013, hereafter CLR) and Lee, Song, and Whang (2013). LSW1 refers to the test of Lee, Song, and Whang (2013), which uses conservative standard normal critical values based on the least favorable configuration. LSW2 refers to the test of this paper, which uses bootstrap critical values based on the estimated contact set. The tuning parameter is chosen by the rule ĉ n = C cs log log(n)q 1−0.1/ log(n) (S * n ), where C cs ∈ {0.4, 0.5, 0.6}.
In this paper, we used the same statistic for Lee, Song, and Whang (2013) as reported in AS. Specifically, we used the L 1 version of the test with the inverse standard error weight function. In implementing the test, we used K(u) = (3/2)(1 − (2u) 2 )I(|u| ≤ 1/2) and h = 2 × ŝ X × n −1/5 , where I(A) is the usual indicator function that has value one if A is true and zero otherwise, and ŝ X is the sample standard deviation of X. Thus, the only difference between the new test (which we call LSW2) and Lee, Song, and Whang (2013) (which we call LSW1) is the choice of critical values: LSW1 uses standard normal critical values based on the least favorable configuration, whereas LSW2 uses bootstrap critical values based on the estimated contact set. For contact set estimation, we use the rule for ĉ n given above. Table 3 shows that the coverage probabilities of LSW2 are much closer to the nominal level than those of LSW1. When c = 0.4 and n = 100 or 250, we see some under-coverage for LSW2, but it disappears as n gets larger. Table 4 reports the false coverage probabilities (FCPs). Figures in columns (1)-(5) are "CP-corrected" by AS, whereas those in columns (6)-(8) are not. However, CP-correction would not change the results for either n ≥ 500 or c ≥ 0.5, since in each of these cases we have over-coverage. We can see that in terms of FCPs, LSW2 performs much better than LSW1 in all DGPs. Furthermore, the performance of LSW2 is equivalent to that of AS for DGP1, DGP3, and DGP4, and is superior to AS for DGP2. Overall, our simulation results show that our new test is a substantially improved version of LSW1 and is now very much comparable to AS. The relatively poor performance of CLR in Tables 3 and 4 is mainly due to the experimental design. If the underlying function is sharply peaked, as in the reported simulations of Chernozhukov, Lee, and Rosen (2013), CLR performs better than AS.
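To make the implementation concrete, here is a minimal sketch combining the kernel and bandwidth rule quoted above with a one-sided L 1 statistic. The local-constant (Nadaraya-Watson) estimator and the omission of the inverse-standard-error weighting are our simplifications; the statistic used in the paper studentizes v̂(x) pointwise.

```python
import numpy as np

def K(u):
    # Kernel from the text: K(u) = (3/2)(1 - (2u)^2) * I(|u| <= 1/2)
    u = np.asarray(u, dtype=float)
    return 1.5 * (1.0 - (2.0 * u) ** 2) * (np.abs(u) <= 0.5)

def one_sided_l1_stat(y, x, grid, h):
    """Unstudentized one-sided L1 statistic: integrate the positive part of a
    local-constant estimate vhat(x) of E[Y | X = x] over the grid."""
    w = K((x[None, :] - grid[:, None]) / h)      # (n_grid, n) kernel weights
    denom = np.maximum(w.sum(axis=1), 1e-12)     # guard against empty windows
    vhat = (w * y[None, :]).sum(axis=1) / denom
    dx = grid[1] - grid[0]
    return np.sum(np.maximum(vhat, 0.0)) * dx

# Illustration: E[Y|X=x] = x violates H0: E[Y|X=x] <= 0 on x > 0.
x = np.linspace(-1.0, 1.0, 500)
y = x
h = 2.0 * x.std(ddof=1) * len(x) ** (-1 / 5)     # bandwidth rule from the text
grid = np.linspace(-0.9, 0.9, 181)
print(one_sided_l1_stat(y, x, grid, h))          # positive: violation detected
```

In practice the statistic would be compared with a bootstrap critical value computed from resampled data with the contact-set rule for ĉ n described above.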
In unreported simulations, we confirmed that CLR performs better than LSW2 as well. This is very reasonable since CLR is based on the sup-norm statistic, whereas ours is based on the one-sided L p norm. Therefore, we may conclude that AS, CLR, and LSW2 complement each other.

II. Further Examples of Testing Functional Inequalities
II.1. Testing Functional Inequalities in the Auction Model via Estimating Conditional Cumulative Distribution Functions. This appendix illustrates the usefulness and flexibility of our framework by reconsidering the implications from GPV in terms of conditional stochastic dominance. Specifically, relative to the test statistic in the main text (based on estimating conditional quantile functions), we consider a related but distinct test statistic based on estimating conditional cumulative distribution functions.
We may rewrite (3.3) as (II.1), where G k (·|x) is the CDF of the observed bid (conditional on X = x) when the number of bidders is I = k (k = 2, 3). Recall that in GPV, the support of the observed bid is [b, b k ]. Note that, strictly speaking, the restrictions in (II.1) are not identical to those in (3.3), since τ in (3.3) is limited to a compact strict subset of (0, 1).
To implement the test, it is necessary to know b k (k = 2, 3), in addition to the value of b. As before, in our application, we set b to be the overall minimum bid, and b k the maximum bid when the number of bids is k, for k = 2, 3. To construct the test statistic, it is necessary to estimate the conditional CDFs G k (·|x). Note that, again, all bids are combined in each auction (see the definition of B̂ i (b)), since we consider symmetric bidders. The sum statistic is convenient for testing (II.1), since b 2 can be different from b 3 . The test statistic then takes the form given above, where Q is Lebesgue measure. Note that we did not normalize v̂ j (b, x) by its pointwise standard error here. One advantage of this is that we can test the null hypothesis on the full support [b, b k ] (k = 2, 3) without an elaborate use of a trimming function or a decaying weight function at the boundary.
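As a concrete sketch of this CDF-based construction, the code below estimates kernel-weighted conditional empirical CDFs and integrates the positive part of their difference. The triangular kernel, the specific dominance direction G 2 (b|x) − G 3 (b|x) ≤ 0, and all variable names are our illustrative assumptions; the paper's statistic sums over the relevant inequalities in (II.1) and integrates over (b, x).

```python
import numpy as np

def cond_cdf(b_grid, bids, x_obs, x0, h):
    # Kernel-weighted conditional empirical CDF:
    # Ghat(b | x0) = sum_i 1{B_i <= b} K((X_i - x0)/h) / sum_i K((X_i - x0)/h),
    # with a triangular kernel (our choice for this sketch).
    w = np.maximum(1.0 - np.abs((x_obs - x0) / h), 0.0)
    w = w / max(w.sum(), 1e-12)
    return np.array([(w * (bids <= b)).sum() for b in b_grid])

def one_sided_l1_cdf_stat(b_grid, bids2, x2, bids3, x3, x0, h):
    # Unstudentized one-sided L1 statistic at a single covariate value x0 for
    # the illustrative restriction G2(b|x0) - G3(b|x0) <= 0 (no standard-error
    # weighting, as in this appendix).
    d = cond_cdf(b_grid, bids2, x2, x0, h) - cond_cdf(b_grid, bids3, x3, x0, h)
    db = b_grid[1] - b_grid[0]
    return np.sum(np.maximum(d, 0.0)) * db
```

The statistic is zero when the sample CDFs satisfy the dominance restriction everywhere on the grid, and strictly positive when Ĝ 2 exceeds Ĝ 3 somewhere; in a full implementation it would also be integrated over x and compared with a bootstrap critical value.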
II.2. Nonparametric Tests of Monotonicity: an L p Approach. In this appendix, we present new methods for testing monotonicity by constructing one-sided L p -type functionals in a suitable fashion. Suppose that we observe n independent and identically distributed random vectors {(Y i , X i ) : i = 1, . . . , n} from the joint distribution of random variables Y and X, where Y is the dependent variable and X is a univariate explanatory variable. We consider testing monotonicity in three examples: one in mean regression, another in the conditional distribution function, and the third in quantile regression. In what follows, we focus on the case that J = 1; however, it is straightforward to extend the analysis to the J > 1 case with a multivariate vector Y i .

II.2.1. Testing monotonicity of mean regression. Let X ⊂ R be the region of our interest in the domain of the regression function E[Y |X = x], and consider testing the hypothesis that this function is monotone on X . Let K be a one-dimensional kernel function and h a bandwidth, and define a U -process v̂(x) with a suitable choice of r n . It can be shown that the U -process v̂(x) has an asymptotic linear representation with a remainder term R n that is of smaller order than the leading term. Therefore, we have r n = √ nh 3 . In a contemporaneous paper, Chetverikov (2012) proposed an adaptive test using the sup-norm statistic of a studentized version of a U -process, including v̂(x) as a special case. The test based on (II.3) is an alternative to the sup-norm test of Chetverikov (2012). The test of Chetverikov (2012) is closely related to the tests proposed in Ghosal, Sen, and van der Vaart (2000). They developed monotonicity tests for the function m(·) in the transformation model φ(Y ) = m(X) + ε, where φ(·) is a monotone function and X and ε are independent. In their setup, independence between X and ε is indispensable, but φ(·) can be unknown as long as it is strictly monotone.
They constructed sup-norm and time-spent test statistics (S 1,n and S 2,n of Ghosal, Sen, and van der Vaart (2000, page 1060)) using a kernel-weighted sign U -process U n (x). Ghosal, Sen, and van der Vaart (2000) show that the limit of h −1 EU n (x) is less than or equal to zero if and only if ∂m(x)/∂x ≥ 0. As before, we may develop a test based on (II.4) with a redefined v̂(x) = h −1 U n (x) and the same r n .
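A minimal runnable sketch of a localized sign-type U-statistic in this spirit is given below. The uniform kernel, the scaling, and the omitted h-normalization are our simplifications, so this should not be read as the exact statistic of Ghosal, Sen, and van der Vaart (2000); the sign convention, however, matches the text: the statistic is non-positive in expectation where m is increasing.

```python
import numpy as np

def sign_ustat(y, x, x0, h):
    """Localized second-order U-statistic (simplified sketch):
    U(x0) = 2/(n(n-1)) * sum_{i<j} sign(y_j - y_i) sign(x_i - x_j)
                                   * 1{|x_i - x0| <= h/2} 1{|x_j - x0| <= h/2}.
    Concordant (increasing) pairs contribute negatively, so U(x0) <= 0 in
    expectation when m is increasing near x0; U(x0) > 0 signals a local
    violation of monotonicity."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    k = (np.abs(x - x0) <= h / 2).astype(float)  # uniform kernel window
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sign(y[j] - y[i]) * np.sign(x[i] - x[j]) * k[i] * k[j]
    return 2.0 * total / (n * (n - 1))

# Noiseless illustration: an increasing m gives the minimal value -1,
# a decreasing m gives the maximal value +1.
x = np.linspace(0.0, 1.0, 30)
print(sign_ustat(x, x, 0.5, 1.0), sign_ustat(-x, x, 0.5, 1.0))
```

The one-sided L p test would integrate the positive part of (a studentized version of) this statistic over x.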
In addition to the sup-norm and time-spent test statistics, Ghosal, Sen, and van der Vaart (2000, page 1070) suggested test statistics that are similar to our one-sided L p statistics; however, they did not provide asymptotic theory, remarking that there are no limit theorems for one-sided L p functionals of a stationary Gaussian process. In contrast, we can obtain the limiting distribution of our suggested test statistic in (II.3) by a direct approximation of θ̂ via Poissonization techniques, without going through strong approximation results such as those of Rio (1994) and Chernozhukov, Lee, and Rosen (2013).
II.2.2. Testing stochastic monotonicity. Let F Y |X (·|x) denote the distribution of Y conditional on X = x, where (Y, X) is a pair of random variables whose joint distribution is absolutely continuous with respect to Lebesgue measure. We assume that the function F Y |X (y|x) is continuously differentiable with respect to x for each y. Consider testing the hypothesis H 0 : ∂F Y |X (y|x)/∂x ≤ 0 for all (y, x) ∈ Y × X , where Y ⊂ R and X ⊂ R are domains of interest.
In this subsection, we consider a U -process v̂(y, x) defined for (y, x) ∈ Y × X . Lee, Linton, and Whang (2009) proposed a nonparametric test of stochastic monotonicity using the sup-norm statistic based on v̂(y, x). As mentioned in Lee, Linton, and Whang (2009), under the regularity conditions imposed in this paper, the limit of Ev̂(y, x) as n → ∞ is less than or equal to zero if and only if H 0 holds. Again, this suggests that we develop a one-sided L p test based on v̂(y, x) with r n = √ nh 3 . The one-sided L p functional-based test complements the sup-norm test of Lee, Linton, and Whang (2009). Delgado and Escanciano (2012) proposed an alternative approach based on the sup-norm of the difference between the empirical copula function and its least concave majorant.
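In the same spirit, here is a simplified sketch of a localized U-statistic for stochastic monotonicity. The exact weights in Lee, Linton, and Whang (2009) differ; the sign convention below is chosen so that, as in the text, the expectation is non-positive exactly under H 0.

```python
import numpy as np

def stoch_mono_ustat(y, x, y0, x0, h):
    """Simplified localized U-statistic for H0: dF_{Y|X}(y0|x)/dx <= 0 at x0.
    Pairs with x_j > x_i contribute 1{y_j <= y0} - 1{y_i <= y0}, whose
    expectation F(y0|x_j) - F(y0|x_i) is nonpositive under H0 (F nonincreasing
    in x); a positive value of the statistic signals a violation."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    k = (np.abs(x - x0) <= h / 2).astype(float)  # uniform kernel window
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (float(y[j] <= y0) - float(y[i] <= y0)) \
                     * np.sign(x[j] - x[i]) * k[i] * k[j]
    return 2.0 * total / (n * (n - 1))
```

For example, data with Y = X (so that Y is stochastically increasing in X, and H 0 holds) produce a negative value, while Y = 1 − X produces a positive one; the one-sided L p test would integrate the positive part over (y, x).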
II.2.3. Testing Monotonicity of Quantile Regression. Let q(τ |x) denote the τ -th quantile of Y conditional on X = x, where τ ∈ (0, 1). In this subsection, we consider testing monotonicity of quantile regression. The null hypothesis and the alternative hypothesis are given in (II.7), where X ⊂ {(x 1 , x 2 ) ∈ R 2 : x 1 ≤ x 2 } and T ⊂ (0, 1). The null hypothesis states that the quantile functions are increasing on X for all τ ∈ T , and the alternative hypothesis is the negation of the null hypothesis. If T is a singleton, then testing (II.7) amounts to testing monotonicity of quantile regression at a fixed quantile.
Suppose that q(τ |x) is continuously differentiable on X for each τ ∈ T . Then one natural approach is to test the sign restriction on the derivative of q(τ |x). In other words, we again develop a test based on (II.6), with v̂(τ, x) now being the local polynomial estimator of ∂q(τ |x)/∂x and r n = √ nh 3 . Our general framework covers various other forms of monotonicity tests for quantile regression. For example, one might be interested in monotonicity of an interquartile regression function. More specifically, let τ 1 < τ 2 be chosen from (0, 1) and write ∆q τ 1 ,τ 2 (x) ≡ q(τ 2 |x) − q(τ 1 |x). The null hypothesis of monotonicity of the interquartile regression function states that q τ 2 ,j (x) − q τ 1 ,j (x) is increasing on X for all j ∈ N J , and the alternative hypothesis is its negation. This type of monotonicity can be used to investigate whether income inequality (in terms of an interquartile comparison) becomes more severe as a certain demographic variable X increases. Once again, we can consider a test based on (II.3), with v̂(x) now being the local polynomial estimator of [∂q(τ 2 |x)/∂x − ∂q(τ 1 |x)/∂x] and r n = √ nh 3 .
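For concreteness, the interquartile monotonicity hypotheses just described can be written explicitly (our reconstruction from the surrounding definitions, with J = 1 and the subscript j suppressed):

```latex
H_0:\ \Delta q_{\tau_1,\tau_2}(x_1) \;\le\; \Delta q_{\tau_1,\tau_2}(x_2)
      \quad \text{for all } (x_1, x_2) \in \mathcal{X},
\qquad
H_1:\ \Delta q_{\tau_1,\tau_2}(x_1) \;>\; \Delta q_{\tau_1,\tau_2}(x_2)
      \quad \text{for some } (x_1, x_2) \in \mathcal{X},
```

where, as above, X ⊂ {(x 1 , x 2 ) ∈ R 2 : x 1 ≤ x 2 }.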
Appendix A. Proofs of Theorems 1-5 The roadmap of Appendix A is as follows. Appendix A begins with the proofs of Lemma 1 (the representation ofθ) and Lemma 2 (the uniform convergence ofv τ,j (x)). Then we establish auxiliary results, Lemmas A1-A4, to prepare for the proofs of Theorems 1-3. The brief descriptions of these auxiliary results are given below.
Lemma A1 establishes asymptotic representation of the location normalizers for the test statistic both in the population and in the bootstrap distribution. The crucial implication is that the difference between the population version and the bootstrap version is of order o P (h d/2 ), P-uniformly. The result is in fact an immediate consequence of Lemma D12 in Appendix D.
Lemma A2 establishes uniform asymptotic normality of the representation of θ̂ and its bootstrap version. The asymptotic normality results use the method of Poissonization, as in Giné, Mason, and Zaitsev (2003) and Lee, Song, and Whang (2013). However, in contrast to this preceding research, the results established here are much more general and hold uniformly over a wide class of probabilities. The lemma relies on Lemmas C7-C9 in Appendix C and their bootstrap versions, Lemmas D7-D9 in Appendix D. These results are employed to obtain the uniform asymptotic normality of the representation of θ̂ in Lemma A2.
Lemma A3 establishes that each estimated contact set B̂ A (ĉ n ) is covered by its enlarged population version, and covers its shrunk population version, with probability approaching one uniformly over P ∈ P. In fact, this is an immediate consequence of the uniform convergence results for v̂ τ,j (x) and σ̂ τ,j (x) in Assumptions 3 and 5. Lemma A3 is used later when we replace the estimated contact sets by their appropriate population versions, eliminating the need to deal with the estimation errors in B̂ A (ĉ n ).
Lemma A4 presents the approximation result of the critical values for the original and bootstrap test statistics in Lemma A2, by critical values from the standard normal distribution uniformly over P ∈ P. Although we do not propose using the normal critical values, the result is used as an intermediate step for justifying the use of the bootstrap method in this paper. Obviously, Lemma A4 follows as a consequence of Lemma A2.
Equipped with Lemmas A1-A4, we proceed to prove Theorem 1. For this, we first use the representation result of Lemma 1 for θ̂. In doing so, we use B A (c n,L , c n,U ) as a population version of B̂ A (ĉ n ). This is because B A (c n,L , c n,U ) ⊂ B̂ A (ĉ n ) with probability approaching one by Lemma A3, which makes the bootstrap test statistic θ̂ * dominate the one that involves B A (c n,L , c n,U ) in place of B̂ A (ĉ n ). The distribution of the latter bootstrap version with B A (c n,L , c n,U ) is asymptotically equivalent to the representation of θ̂ with B A (c n,L , c n,U ) after location-scale normalization, as long as the limiting distribution is nondegenerate. When the limiting distribution is degenerate, we use the second component h d/2 η + â * in the definition of c * α,η to ensure the asymptotic validity of the bootstrap procedure. For both the degenerate and nondegenerate cases, Lemma A1, which enables one to replace â * by an appropriate population version, is crucial.
The proof of Theorem 2, which shows the asymptotic exactness of the bootstrap test, modifies the proof of Theorem 1 substantially. Instead of using the representation result of Lemma 1 for θ̂ with B n,A (c n,L , c n,U ), we now use the same version but with B n,A (c n,U , c n,L ). This is because, for asymptotic exactness, we need to approximate the original and bootstrap quantities by versions using B n,A (q n ) for small q n , and to do this, we need to control the remainder term in the bootstrap statistic with the integral domain B̂ A (ĉ n )\B n,A (q n ). By our choice of B n,A (c n,U , c n,L ), and by the fact that B̂ A (ĉ n ) ⊂ B n,A (c n,U , c n,L ) with probability approaching one by Lemma A3, we can bound the remainder term by a version with the integral domain B n,A (c n,U , c n,L )\B n,A (q n ). This remainder term then vanishes by the condition on λ n and q n in the definition of P n (λ n , q n ).
The rest of the proofs are devoted to proving the power properties of the bootstrap procedure. Theorem 3 establishes consistency of the bootstrap test. Theorems 4 and 5 establish local power functions under Pitman local drifts. The proofs of Theorems 4-5 are similar to the proof of Theorem 2, as we need to establish the asymptotically exact form of the rejection probability for the bootstrap test statistic. Nevertheless, we need to employ some delicate arguments to deal with the Pitman local alternatives, and need to expand the rejection probability to obtain the final results. For this, we first establish Lemmas A5-A7. Essentially, Lemma A5 is a version of the representation result of Lemma 1 under local alternatives. Lemma A6 and Lemma A7 parallel Lemma A1 and Lemma 2 under local alternatives.
Let us begin by proving Lemma 1. First, recall the relevant definitions.

Proof of Lemma 1. It suffices to show the following two statements.

Step 1: As n → ∞, the stated convergence holds, where we recall B n (c n,1 , c n,2 ) ≡ ∪ A∈N J B n,A (c n,1 , c n,2 ).
Step 2: For each A ∈ N J , as n → ∞, First, we prove Step 1. We write the integral in the probability as We first show that when (x, τ ) ∈ S\B n (c n,1 , c n,2 ), we have A n (x, τ ) = ∅ under the null hypothesis. Suppose that (x, τ ) ∈ S\B n (c n,1 , c n,2 ) but to the contrary, A n (x, τ ) is nonempty. By the definition of A n (x, τ ), we have (x, τ ) ∈ B n,An(x,τ ) (c n,1 , c n,2 ). However, since S\B n (c n,1 , c n,2 ) = S ∩ ∩ A∈N J B c n,A (c n,1 , c n,2 ) ⊂ B c n,An(x,τ ) (c n,1 , c n,2 ), this contradicts the fact that (x, τ ) ∈ S\B n (c n,1 , c n,2 ). Hence whenever (x, τ ) ∈ S\B n (c n,1 , c n,2 ), where o P (1) is uniform over (x, τ ) ∈ S and over P ∈ P by Assumption A5. Fix a small ε > 0. We have for all j ∈ N J , for all (x, τ ) ∈ S\B n (c n,1 , c n,2 ) → 1, as n → ∞, where the last convergence follows because A n (x, τ ) = ∅ for all (x, τ ) ∈ S\B n (c n,1 , c n,2 ). Therefore, with probability approaching one, the term in (A.3) is bounded by S\Bn(c n,1 ,c n,2 ) where 1 J is a J-dimensional vector of ones. Using the definition of Λ p (v), bound the above integral by Note that by Assumption A3, Fix any arbitrarily large M > 0 and denote by E n the event that The term (A.5), when restricted to this event E n , is bounded by which becomes zero from some large n on, given that (c n,1 ∧ c n,2 )/ √ log n → ∞. Since sup P ∈P 0 P E c n → 0 as n → ∞ and then M → ∞ by Assumption A3, we obtain the desired result of Step 1.
As for Step 2, we have for any small ε > 0 and for all j ∈ N J \A, P( r n,j v n,τ,j (x)/σ τ,j (x) < −(c n,1 ∧ c n,2 )/(1 + ε) for all (x, τ ) ∈ B n,A (c n,1 , c n,2 ) ) (A.6) ≥ P( r n,j v n,τ,j (x)/σ n,τ,j (x) < −{(c n,1 ∧ c n,2 )/(1 + ε)}{1 + o P (1)} for all (x, τ ) ∈ B n,A (c n,1 , c n,2 ) ) → 1, similarly as before. Let s τ,A (x) be a J-dimensional vector whose j-th entry is r n,j v̂ n,τ,j (x)/σ τ,j (x) if j ∈ A, and r n,j {v̂ n,τ,j (x) − v n,τ,j (x)}/σ τ,j (x) if j ∈ N J \A. Since inf x∈X σ τ,j (x) is bounded away from zero by Assumption A5, we obtain the desired bound as n → ∞, using either definition of Λ p (v) in (4.1), where 1 −A is the J-dimensional vector whose j-th entry is zero if j ∈ A and one if j ∈ N J \A, and the last inequality holds with probability approaching one by (A.6). Note that by Assumption A3 and by the fact that √ log n{c −1 n,1 + c −1 n,2 } → 0, we deduce the claim for any j ∈ N J , and we obtain the desired result from (A.7). Now let us turn to the proof of Lemma 2 in Section 4.4.
Step 1 is carried out by some elementary moment calculations, whereas Step 2 is proved using a maximal inequality of Massart (2007, Theorem 6.8).
Proof of Step 1: It is not hard to see that , for some C 1 > 0, C > 0. The last bound follows by the uniform fourth moment bound for b n,ij (x, τ ) assumed in Lemma 2. Note that . Note that the indicator function 1 n,ij in the definition of β a n,x,τ,j does not depend on (x, τ ) of β a n,x,τ,j . Using (4.11) in Lemma 2 and following (part of) the arguments in the proof of Theorem 3 of Chen, Linton, and Van Keilegom (2003), we find that there exist C 1 > 0 and C 2,j > 0 such that for all ε > 0, where N [] (ε, F n,j , L 2 (P )) denotes the ε-bracketing number of the class F n,j with respect to the L 2 (P )-norm and N (ε, X × T , || · ||) denotes the ε-covering number of the space X × T with respect to the Euclidean norm || · ||. The last inequality follows by the assumption that X and T are compact subsets of a Euclidean space. The class F n,j is uniformly bounded by 1/2. Let {[β a n,x k ,τ k ,j (·, (· − x k )/h)/M n,j − ∆ k (·, ·)/M n,j , β a n,x k ,τ k ,j (·, (· − x k )/h)/M n,j + ∆ k (·, ·)/M n,j ] : By the previous covering number bound, we can take N n,j ≤ C 1 ((εM n,j /δ n,j ) ∧ 1) −C 2,j , and Note that for any k ≥ 2, E |b a n,ij (x, τ )/M n,j | k ≤ E b 2 n,ij (x, τ ) /M 2 n,j ≤ CM −2 n,j h d = C(log n)/n, by the fact that |b a n,ij (x, τ )/M n,j | ≤ 1/2. Furthermore, where the first inequality follows because |∆ k (Y ij , X i )/M n,j | ≤ 1. Therefore, by Theorem 6.8 of Massart (2007), we have (from sufficiently large n on) where C 1 , C 2 , C 3 , and C 4 are positive constants. (The inequality above follows because √ log n/ √ n → 0 as n → ∞.) The leading integral has a domain restricted to [0, δ n,j /M n,j ], so that it is equal to After multiplying by M n,j /h d/2 , the last term is of order because δ n,j = n s 1,j and h = n s 2 for some s 1,j , s 2 ∈ R. Also, note that after multiplying by M n,j /h d/2 = √ n/ √ log n, the last term in (A.10) (with minus sign) becomes where the inequality follows because √ log n ≥ 1 for all n ≥ e ≡ exp(1). 
Collecting the results for both terms on the right-hand side of (A.10), we obtain the desired result of Step 2.
Using Le Cam's Poissonization lemma in Giné and Zinn (1990) (Proposition 2.2 on p.855) and following the arguments in the proof of Theorem 2.2 of Giné (1997), we deduce that where N i 's are i.i.d. Poisson random variables with mean 1 and independent of {( Using the same arguments as in the proof of (i), we find that the first expectation is O( √ log n) uniformly in P ∈ P. Using independence, we write the second expectation as which, as shown in the proof of part (i), is O( √ log n), uniformly in P ∈ P.
For further proofs, we introduce new notation. Define for any positive sequences c n,1 and c n,2 , and any v ∈ R J , We let a R n (c n,1 , c n,2 ) ≡ where z N,τ (x) and z * N,τ (x) are random vectors whose j-th entries are respectively given by and N is a Poisson random variable with mean n and independent of {Y i , X i } ∞ i=1 . We also define a n (c n,1 , c n,2 ) ≡ ∫ E[Λ x,τ (W (1) n,τ,τ (x, 0))] dQ(x, τ ).
Proof of Lemma A1. The proof is essentially the same as the proof of Lemma D12 in Appendix D.
(ii) The proof can be done in the same way as in the proof of (i), using Lemmas D7 and D9(i) in Appendix D instead of Lemmas C7 and C9(i) in Appendix C.
Lemma A3. Suppose that Assumptions A1-A5 hold. Then for any sequences c n,L , c n,U > 0 satisfying Assumption A4(ii), and for each A ∈ N J , inf P ∈P P {B n,A (c n,L , c n,U ) ⊂ B A (ĉ n ) ⊂ B n,A (c n,U , c n,L )} → 1, as n → ∞.
Proof of Lemma A3. By using Assumptions A3-A5, and following the proof of Theorem 2, Claim 1 in Linton, Song, and Whang (2010), we can complete the proof. Details are omitted.
Proof of Lemma A4. (i) The statement immediately follows from the first statement of Lemma A2(i) and Lemma A1.
Proof of Theorem 1. By Lemma 1, we have as n → ∞. Since under the null hypothesis, we have v n,τ,j (·)/σ τ,j (·) ≤ 0 for all j ∈ N J , with probability approaching one by Assumption A5, we have be denoted by c α * n,L . By Lemma A3 and Assumption A4(ii), with probability approaching one, This implies that as n → ∞, There exists a sequence of probabilities {P n } n≥1 ⊂ P 0 such that where {w n } ⊂ {n} is a certain subsequence, and θ wn and c * wn,α,η are the same as θ and c * α,η except that the sample size n is now replaced by w n . By Assumption A6(i), {σ n (c n,L , c n,U )} n≥1 is a bounded sequence. Therefore, there exists a subsequence {u n } n≥1 ⊂ {w n } n≥1 , such that σ un (c un,L , c un,U ) converges. We consider two cases: Case 1: lim n→∞ σ un (c un,L , c un,U ) > 0, and Case 2: lim n→∞ σ un (c un,L , c un,U ) = 0. In both cases, we will show below that (A.23) limsup n→∞ P un {θ un > c * un,α,η } ≤ α.
Since along {w n }, P wn {θ wn > c * wn,α,η } converges, it does so along any subsequence of {w n }. Therefore, the above limsup is equal to the last limit in (A.22). This completes the proof. Proof of (A.23) in Case 1: We write P un {θ un > c * un,α,η } as By Lemma A1, the leading probability is equal to Since η > 0 and lim n→∞ σ un (c un,L , c un,U ) = 0, the leading probability vanishes by Lemma C9(ii).
Proof of Theorem 2. We focus on probabilities P ∈ P n (λ n , q n ) ∩ P 0 . Recalling the definition of u n,τ (x;σ) ≡ [r n,j v n,τ,j (x)/σ τ,j (x)] j∈N J and applying Lemma 1 along with the condition that log n/c n,U < log n/c n,L → 0, as n → ∞, we find that with probability approaching one, Since under P ∈ P 0 , u n,τ (x;σ) ≤ 0 for all x ∈ S, with probability approaching one by Assumption 5, the last term multiplied by h −d/2 is bounded by (from some large n on) where the second to the last equality follows because Q (B n,A (c n,U , c n,L )\B n,A (q n )) ≤ λ n by the definition of P n (λ n , q n ), and the last equality follows by (4.10).
On the other hand, From the definition of Λ p in (4.1), the last difference (in absolute value) is bounded by

where [a] A denotes the vector a with the j-th entry set to zero for all j ∈ N J \A, and C > 0 is a constant that does not depend on n ≥ 1 or P ∈ P. We have sup (x,τ )∈B n,A (qn) [u n,τ (x;σ)] A ≤ q n (1 + o P (1)), by the null hypothesis and by Assumption A5. Also, by Assumptions A3 and A5, sup (x,τ )∈B n,A (qn) [ŝ τ (x)] A = O P log n .
Therefore, we conclude that The last O P (1) term is o P (1) by the condition for q n in (4.10). Thus we find that . Now let us consider the bootstrap statistic. We write By Lemma A3, we find that inf P ∈P P {B n,A (ĉ n ) ⊂ B n,A (c n,U , c n,L )} → 1, as n → ∞, so that with probability approaching one. The last term multiplied by h −d/2 is bounded by where the second-to-last equality follows by Assumption B2 and the definition of P n (λ n , q n ), and the last equality follows by (4.10). Thus, we conclude that (A.26) h −d/2 (θ * − a n (q n ))/σ n (q n ) = h −d/2 (θ * n (q n ) − a n (q n ))/σ n (q n ) + o P * (1), P n (λ n , q n )-uniformly. Using the same arguments, we also observe that where the last equality uses Lemma A1. Let the (1 − α)-th percentile of the bootstrap distribution of θ * (q n ) be denoted by c α * n (q n ). Then by (A.26), we have (A.28) h −d/2 (c * α − a n (q n ))/σ n (q n ) = h −d/2 (c α * n (q n ) − a n (q n ))/σ n (q n ) + o P * (1), P n (λ n , q n )-uniformly.
By Lemma A4(ii) and by the condition that σ n (q n ) ≥ η/Φ −1 (1 − α), the leading term on the right hand side is equal to by the restriction σ n (q n ) ≥ η/Φ −1 (1 − α) in the definition of P n (λ n , q n ) and (A.27). Using this, and following the proof of Step 1 in the proof of Theorem 2, we deduce that P {h −d/2 (θ − a n (q n ))/σ n (q n ) > h −d/2 (c * α,η − a n (q n ))/σ n (q n )} = P {h −d/2 (θ n (q n ) − a n (q n ))/σ n (q n ) > h −d/2 (c * α − a n (q n ))/σ n (q n )} + o(1) where the first equality uses (A.25), (A.29), and the second equality uses (A.28). Since σ n (q n ) ≥ η/Φ −1 (1 − α) > 0 for all P ∈ P n (λ n , q n ) ∩ P 0 by definition, using the same arguments as in the proof of Lemma A4, we obtain that the last probability is equal to uniformly over P ∈ P n (λ n , q n ) ∩ P 0 .
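Throughout the arguments above, the bootstrap critical value is simply the (1 − α)-th percentile of the bootstrap distribution of the test statistic. A minimal generic sketch of this percentile computation (illustrative function and variable names; the paper's statistic is the kernel-based one-sided L p functional, not the sample mean used here):

```python
import random

def bootstrap_critical_value(data, stat, alpha=0.05, n_boot=2000, seed=0):
    """(1 - alpha)-th percentile of the nonparametric-bootstrap
    distribution of `stat` (generic illustration only)."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        # resample n observations with replacement
        resample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(stat(resample))
    draws.sort()
    # index of the empirical (1 - alpha) percentile
    k = min(n_boot - 1, int((1.0 - alpha) * n_boot))
    return draws[k]

data = [0.8, 1.1, 0.4, 1.7, 0.9, 1.3, 0.6, 1.0]
c_hat = bootstrap_critical_value(data, stat=lambda s: sum(s) / len(s))
```

The test then rejects when the statistic computed on the original sample exceeds the percentile so obtained.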

Furthermore, we have for any
Proof of Lemma A5. Consider the first statement. Let λ be either d/2 or d/4. We write S\B 0 n (c n,1 ,c n,2 ) where u 0 τ (x;σ) ≡ (r n,1 v 0 n,τ,1 (x)/σ τ,1 (x), · · ·, r n,J v 0 n,τ,J (x)/σ τ,J (x)) and Note that δ τ,σ (x) is bounded with probability approaching one by Assumption A3. Also note that for each j ∈ N J , by Assumption A3. Hence we obtain the desired result, using the same arguments as in the proof of Lemma 1. Given (A.32), the proof of the second statement proceeds in the same way as that of the first statement.
Proof of Lemma A6. The result follows immediately from Lemma D12 in Appendix D.
Proof of Lemma A7. Note that by the definition of P 0 n (λ n ), we have Hence we can follow the proof of Lemma A2 to obtain the desired results.
Proof of Theorem 4. Using Lemma A5, we find that with probability approaching one. We write the leading sum as where R n ≡ Σ A∈N J ∫ B 0 n,A (c n,U ,c n,L )\B 0 n,A (0) Λ A,p (ŝ τ (x) + u τ (x;σ)) dQ(x, τ ).
As for the bootstrap counterpart, note that δ τ,j (x) is bounded and σ n,τ,j (x) is bounded away from zero uniformly over (x, τ ) ∈ S and n ≥ 1, and hence (A.38) sup as n → ∞. By (A.38), the difference between r n,j v n,τ,j (x)/σ n,τ,j (x) and r n,j v 0 n,τ,j (x)/σ n,τ,j (x) vanishes uniformly over (x, τ ) ∈ S. Therefore, combining this with Lemma A3, we find that as n → ∞. Now with probability approaching one, As for the last sum, it is bounded by with probability approaching one by (A.39). The above sum multiplied by h −d/2 is bounded by by Assumption B2 and the rate condition for λ n . Thus, we conclude that Let c α * n (0) be the (1 − α)-th quantile of the bootstrap distribution of θ * (0) and let γ α * n (0) be the (1 − α)-th quantile of the bootstrap distribution of .
Proof of Theorem 5. First, observe that Lemma A5 continues to hold. This can be seen by following the proof of Lemma A5 and noting that (A.32) becomes here yielding the same convergence rate. The rest of the proof is the same. Similarly, Lemma A6 continues to hold also under the modified local alternatives of (6.2) with b n,j = r n,j h −d/4 . We define We follow the proof of Theorem 4 and take up arguments from (A.43). Observe that By the Dominated Convergence Theorem, Now we can use the above result by replacing δ τ,σ (x) by δ U τ,σ,κ (x) and δ L τ,σ,κ (x) and follow the proof of Theorem 4 to obtain the desired result.

Appendix B. Proofs of Results for the Example in Section 5
We first offer a general asymptotic linear representation theorem for quantile regression functions that can be useful for other purposes. While the proof employs some arguments from Guerre and Sabbah (2012), the result is different from theirs. The main difference is that their result focuses on uniformity in the bandwidth h over some range, whereas ours focuses on uniformity in P .
Let (B , X , L) , with B ≡ (B 1 , · · ·, B L̄ ) ∈ R L̄ , and X ∈ R d , be a random vector such that the joint distribution of (B , X ) is absolutely continuous with respect to Lebesgue measure and L is a discrete random variable taking values in N L̄ ≡ {1, 2, · · ·, L̄}. For each x ∈ R d and k ∈ N L̄ , the conditional distribution of B l given (X, L) = (x, k) is the same across l = 1, · · ·, k.
Suppose that we are given a random sample {(B i , X i , L i ) } n i=1 of (B , X , L) . We use a local polynomial method, similar to Chaudhuri (1991a) and Chaudhuri (1991b). Assume that q k (τ |x) is (r + 1)-times continuously differentiable with respect to x, where r ≥ 1. Then, we construct an estimator γ τ,k (x) as follows: and h is a bandwidth that goes to zero as n → ∞.
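As a concrete illustration of the kernel-weighted check-function minimization behind this estimator, the local-constant special case (r = 0, uniform kernel) reduces to a τ-quantile over the observations in the bandwidth window. The sketch below is ours, with illustrative names; the paper's estimator uses a full polynomial basis of order r:

```python
def local_quantile(x0, xs, ys, tau, h):
    """Local-constant kernel quantile regression: minimize
    sum_i 1{|x_i - x0| <= h} * rho_tau(y_i - q) over q.
    With a uniform kernel the minimizer is the tau-quantile of the
    y_i whose x_i fall in the window [x0 - h, x0 + h]."""
    window = sorted(y for x, y in zip(xs, ys) if abs(x - x0) <= h)
    if not window:
        raise ValueError("empty kernel window; increase h")
    total = len(window)
    cum = 0
    for y in window:
        cum += 1
        if cum >= tau * total:  # smallest y with at least tau mass below
            return y
    return window[-1]

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
q_hat = local_quantile(2.0, xs, ys, tau=0.5, h=1.5)  # window ys: 2, 3, 4
```

The degree-r case replaces the constant q by a polynomial c(z) of order r in the local covariate, estimated by the same check-function criterion.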
We make the following assumptions.
(ii) The density f of X is continuously differentiable on R with a derivative bounded uniformly over P ∈ P.
We define where we recall c(z) = (z u ) u∈Ar , for z ∈ R d , and γ τ,k (x) = (γ τ,k,u (x)) u∈Ar with We also define for a, b ∈ R |Ar| , Lemma QR1. Suppose that Assumptions QR1-QR3 hold. Let {δ 1n } ∞ n=1 and {δ 2n } ∞ n=1 be positive sequences such that δ 1n = O P (1) and δ 2n ≤ δ 1n from some large n on. Then for each k ∈ N L , the following holds uniformly over P ∈ P: where we recall the definition of M n,τ,k (x) as Proof of Lemma QR1. (i) Define where the dependence on P is through q k (τ |x 1 ) and γ τ,k (x 1 ). We also let It is not hard to see that is a residual from the Taylor expansion of q k (τ |x 1 ) and X is bounded, and the derivatives from the Taylor expansion are bounded uniformly over P ∈ P. Let f ∆ τ,k,x (t|x ) be the conditional density of ∆ x,τ,lk,i given Since f τ,k (·|x ) is bounded uniformly over x ∈ S τ (ε) and over τ ∈ T (Assumption QR2(iii)), we conclude that for some C > 0 that does not depend on P ∈ P, We will use the results in (B.3) and (B.5) later.
For any g ∈ G n /L and any m ≥ 1, Therefore, by (B.12), for some constants C 1 , C 2 > 0, the following holds for any m ≥ 2: By (B.8), (B.10), and (B.12), and the definition of b n and s n in (B.13), there exist constants (The term b 2 n ε is obtained by chaining the second and third inequalities of (B.8) and using the fact that δ = ε and the uniform bound in (B.5). The last inequality follows because b n → 0 as n → ∞.) We define ε̄ = ε 1/2 and bound the last term by C 3 b m−2 n ε̄ 2 , for some C 3 > 0, because b n ≤ 1 from some large n on. The entropy bound in (B.11) as a function of ε remains the same except for a different constant C > 0 there. Now by Theorem 6.8 of Massart (2007) and (B.11), there exist C 1 , C 2 > 0 such that n ∧ − log ε n dε + C 1 (b n + s n ) log n ≤ C 2 s n n log n + C 2 b n log n = O δ 2n √ log n n 1/4 h d/4 , where the last equality follows by the definitions of b n and s n in (B.13) and by Assumption QR3(ii). (ii) Define λ τ,4n (S i ; x) ≡ ∆ x,τ,lk,i and L k,1 ≡ {l τ (λ τ,4n (·; x)) : τ ∈ T , x ∈ S τ (ε)}, and L k,2 ≡ {λ τ,2n (·; x)λ τ,3n (·; x) : τ ∈ T , x ∈ S τ (ε)}. We write The leading term is an empirical process indexed by the functions in L k ≡ L k,1 · L k,2 . Approximating the indicator function in l τ by upper and lower Lipschitz functions and following arguments similar to those in the proof of (i), we find that for some constant C > 0. Note that we can take a constant function C as an envelope of L k . Then we follow the proof of Lemma 2 to obtain that By using (B.3) and (B.4), we find that Since √ nh r+1 → 0, we obtain the desired result. (iii) Recall the definition of g n,x,τ,k (S i ; s, b, a) in the proof of Lemma QR1(i). We write Using a change of variables, we rewrite By expanding the difference, we have where R n,x,i (a, b) denotes the remainder term in the expansion. As for the leading integral, Hence, for any sequences a n , b n , we can write E[ζ ∆ n,x,τ,k (a n , b n )] as where the last O(h r+1 a n b n ) term is due to (B.3).
We can bound where C 1 > 0 and C 2 > 0 are constants that do not depend on n or P ∈ P.
The following lemma is the bootstrap analogue of Lemma QR1.
(ii) Note that The difference between the first two terms on the right hand side is O( √ log n/(n 1/4 h d/4 )), uniformly in P ∈ P, as we have seen in (i). We apply Lemma QR1(iii) to the last expectation in (B.20) to obtain the desired result.
Proof of Lemma QR4. The proof uses Lemma QR3 in the same way as the proof of Lemma QR2 uses Lemma QR1. Details are omitted.
Appendix C. Proofs of Auxiliary Results for Lemma A2(i), Lemma A4(i), and Theorem 1

The eventual result in this appendix is Lemma C9, which is used to show the asymptotic normality of the location-scale normalized representation of θ and its bootstrap version, and to establish its asymptotic behavior in the degenerate case. For this, we first prepare Lemmas C1-C3. To obtain uniformity that covers the case of degeneracy, this paper uses a method of regularization, in which a diagonal matrix with small diagonal elements is added to the covariance matrix of the random quantities of interest. The regularized quantities having this covariance matrix do not suffer from degeneracy in the limit, even when the original quantities have a covariance matrix that is degenerate in the limit. Thus, for these regularized quantities, we can obtain uniform asymptotic theory using an appropriate Berry-Esseen-type bound. Then, we need to deal with the difference between the regularized covariance matrix and the original one. Lemma C1 is a simple linear-algebra result that is used to control this discrepancy.
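The regularization device can be seen concretely on a toy 2 × 2 example (ours, not part of the proofs): adding εI J to a covariance matrix lifts every eigenvalue by ε, so the regularized matrix is nonsingular even when the original one is degenerate.

```python
import math

def sym2x2_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]],
    via the closed form (a+c)/2 +- sqrt(((a-c)/2)^2 + b^2)."""
    mean = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - spread, mean + spread

# Degenerate covariance matrix (rank 1): perfectly correlated entries.
sigma = (1.0, 1.0, 1.0)  # entries a, b, c of [[1, 1], [1, 1]]
eps = 0.01               # regularization level
lo, hi = sym2x2_eigenvalues(*sigma)
lo_r, hi_r = sym2x2_eigenvalues(sigma[0] + eps, sigma[1], sigma[2] + eps)
```

The smallest eigenvalue moves from 0 to ε, which is exactly why the regularized quantities escape degeneracy.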
Lemma C2 has two sub-results from which one can deduce a uniform version of Lévy's continuity theorem. We have not seen such results in the literature or in monographs, so we provide a full proof. The result serves two functions. First, it enables one to deduce convergence in distribution in terms of convergence of cumulative distribution functions, and in terms of convergence of characteristic functions, in a manner that is uniform over a given collection of probabilities. The original form of convergence in distribution obtained from the Poissonization method in Giné, Mason, and Zaitsev (2003) is convergence of characteristic functions. Pointwise in P , this convergence certainly implies convergence of cumulative distribution functions, but it is not clear under what conditions this implication holds uniformly over a given class of probabilities. Lemma C2 essentially clarifies this issue.
Lemma C3 is an extension of the de-Poissonization lemma that appeared in Beirlant and Mason (1995). The proof is based on the proof of the same result in Giné, Mason, and Zaitsev (2003), but involves a substantial modification because, unlike their results, we need a version that holds uniformly over P ∈ P. This de-Poissonization lemma is used to transform the asymptotic distribution theory for the Poissonized version of the test statistic into that for the original test statistic.
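The reason Poissonization is useful here is the splitting property of the Poisson distribution: when the sample size is an independent Poisson(n) variable, the numbers of observations falling into disjoint regions are exactly independent Poisson variables. A small exact numerical check of this property (illustrative only, not part of the proofs):

```python
import math

def poisson_pmf(k, lam):
    # exact Poisson pmf computed on the log scale
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

n, p = 5.0, 0.3  # Poisson(n) points; each lands in region C w.p. p
k1, k2 = 1, 2    # target counts in C and in its complement

# Direct computation: only the total N = k1 + k2 contributes.
m = k1 + k2
binom = math.comb(m, k1) * p ** k1 * (1 - p) ** k2
direct = poisson_pmf(m, n) * binom

# Splitting property: N1 ~ Poisson(np) independent of N2 ~ Poisson(n(1-p)).
product = poisson_pmf(k1, n * p) * poisson_pmf(k2, n * (1 - p))
```

The two probabilities agree exactly, which is the independence that the Poissonized statistic exploits before de-Poissonization undoes the randomization of the sample size.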
Lemmas C4-C5 establish some moment bounds for a normalized sum of independent quantities. This moment bound is later used to control a Berry-Esseen-type bound, when we approximate those sums by corresponding centered normal random vectors.
Lemma C6 obtains an approximate version of the scale normalizer σ n . The approximate version involves a functional of a Gaussian random vector, which stems from approximating a normalized sum of independent random vectors by a Gaussian random vector using a Berry-Esseen-type bound. For this result, we use the regularization method mentioned above. Due to the regularization, we are eventually able to cover the degenerate case.
Lemma C7 is an auxiliary result that is used to establish Lemma C9 in combination with the de-Poissonization lemma (Lemma C3). Lemma C8 establishes the asymptotic normality of the Poissonized version of the test statistic. The proof discretizes the integral, thereby reducing it to a sum of 1-dependent random variables, and then applies the Berry-Esseen-type bound of Shergin (1993). Since the moment bounds in Lemmas C4-C5 are uniform over P ∈ P, the resulting asymptotic approximation is also uniform over P ∈ P. The lemma also presents a corresponding result for the degenerate case.
Finally, Lemma C9 combines the asymptotic distribution theory for the Poissonized test statistic in Lemma C7 with the de-Poissonization lemma (Lemma C3) to obtain the asymptotic distribution theory for the original test statistic. The result of Lemma C9 is used to establish the asymptotic normality result in Lemma A7.
The following lemma provides an inequality from matrix algebra.
Lemma C1. For any J × J positive semidefinite symmetric matrix Σ and any ε > 0, where ||A|| ≡ (tr(AA )) 1/2 for any square matrix A.
Remark 1. The main point of Lemma C1 is that the bound √ Jε is independent of the matrix Σ. Such a uniform bound is crucial for establishing the asymptotic validity of the test uniformly in P ∈ P.
Lemma C2. Suppose that V n ∈ R d is a sequence of random vectors and V ∈ R d is a random vector. We assume without loss of generality that V n and V live on the same measure space (Ω, F), and P is a given collection of probabilities on (Ω, F). Furthermore define F n (t) ≡ P {V n ≤ t} , and F (t) ≡ P {V ≤ t} .
(i) Suppose that the distribution P • V −1 is uniformly tight in {P • V −1 : P ∈ P}. Then for any continuous function f on R d taking values in [−1, 1] and for any ε ∈ (0, 1], we have On the other hand, if for each t ∈ R d , Proof of Lemma C2. (i) The proof uses arguments in the proof of Lemma 2.2 of van der Vaart (1998). Take a large compact rectangle B ⊂ R d such that P {V / ∈ B} < ε. Since the distribution of V is tight uniformly over P ∈ P, we can take such B independently of P ∈ P. Take a partition B = ∪ Jε j=1 B j and points x j ∈ B j such that J ε ≤ C d,1 ε −d , and |f (x) − f ε (x)| < ε for all x ∈ B, where C d,1 > 0 is a constant that depends only on d, and The second inequality follows because P {V / ∈ B} < ε. As for the last term, we let and observe that where C d,2 > 0 is a constant that depends only on d. The last inequality follows because for any rectangle B j , we have |P {V n ∈ B j } − P {V ∈ B j }| ≤ C d,2 b n for some C d,2 > 0. We conclude that where C d = C d,2 {C d,1 + 1}. The last inequality follows because ε ≤ 1.
(ii) We show the first statement. We first show that under the stated condition, the sequence {P • V −1 n } ∞ n=1 is uniformly tight uniformly over P ∈ P. That is, for any ε > 0, we show that there exists a compact set B ⊂ R d such that for all n ≥ 1, For this, we assume d = 1 without loss of generality, let P n denote the distribution of V n and consider the following: (using arguments in the proof of Theorem 3.3.6 of Durrett (2010)) Define ē n ≡ sup P ∈P sup t∈R |ϕ n (t) − ϕ(t)|. Using Theorem 3.3.8 of Durrett (2010), we bound the last term by The supremum of the right hand side terms over P ∈ P vanishes as we send n → ∞ and then u ↓ 0, by the assumption that sup P ∈P E|V | 2 < ∞. Hence the sequence {P • V −1 n } ∞ n=1 is uniformly tight uniformly over P ∈ P. Now, for each t ∈ R d , there exists a subsequence {n } ⊂ {n} and {P n } ⊂ P such that where F n (t; P n ) = P n {V n ≤ t} and F (t; P n ) = P n {V ≤ t} .
is uniformly tight (as shown above), there exists a subsequence {n k } ⊂ {n } such that is uniformly tight and hence there exists a further subsequence {n k j } ⊂ {n k } such that (C.4) F (t; P n k j ) → F * * (t), as j → ∞, for some cdf F * * . Since {n k j } ⊂ {n k }, we have from (C.3), By the condition of (ii), we have where ϕ n (u; P n ) = E Pn (exp (iuV n )) and ϕ (u; P n ) = E Pn (exp (iuV )) , and E Pn represents expectation with respect to the probability measure P n . Furthermore, by (C.4) and (C.5), and Lévy's Continuity Theorem, lim j→∞ ϕ n k j (u; P n k j ) and lim j→∞ ϕ(u; P n k j ) exist and coincide by (C.6). Therefore, for all t ∈ R d , In other words, lim n →∞ |F n (t; P n ) − F (t; P n )| = lim n →∞ |F n k j (t; P n k j ) − F (t; P n k j )| = 0.
Therefore, the first statement of (ii) follows by the last limit applied to (C.2). Let us turn to the second statement. Again, we show that {P • V −1 n } ∞ n=1 is uniformly tight uniformly in P ∈ P. Note that given a large rectangle B, There exists N such that for all n ≥ N , the first difference vanishes as n → ∞, uniformly in P ∈ P, by the condition of the lemma. As for the second term, we bound it by where V j is the j-th entry of V and B = × d j=1 [b j , a j ], b j < 0 < a j . By taking a j 's large enough, we make the last bound arbitrarily small independently of P ∈ P, because sup P ∈P EV 2 j < ∞ for each j = 1, · · ·, d. Therefore, {P • V −1 n } ∞ n=1 is uniformly tight uniformly in P ∈ P. Now, we turn to the proof of the second statement of (ii). For each u ∈ R d , there exists a subsequence {n } ⊂ {n} and {P n } ⊂ P such that where ϕ n (u; P n ) = E Pn exp(iu V n ) and ϕ(u; P n ) = E Pn exp(iu V ). By the condition in the second statement of (ii), for each t ∈ R d , is uniformly tight (as shown above), there exists a subsequence {n k } ⊂ {n } such that F n k (t; P n k ) → F * (t), as k → ∞, and hence by Lévy's Continuity Theorem, we have ϕ n k (u; P n k ) → ϕ * (u), as k → ∞. Similarly, we also have ϕ(u; P n k ) → ϕ * * (u), as k → ∞. By (C.7), we have F * (t) = F * * (t) and ϕ * (u) = ϕ * * (u). Therefore, lim n →∞ |ϕ n (u; P n ) − ϕ (u; P n )| = lim n →∞ |ϕ n k j (u; P n k j ) − ϕ(u; P n k j )| = 0.
Thus we arrive at the desired result.
Let {S n } ∞ n=1 be a sequence of random variables and P be a given set of probabilities P on a measure space on which (S n , U n (α P ), V n (α P )) lives, where α P ∈ (0, 1) is a quantity that may depend on P ∈ P and for some ε > 0, Furthermore, assume that for each n ≥ 1, the random vector (S n , U n (α P )) is independent of V n (α P ) with respect to each P ∈ P. For (t 1 , t 2 ) ∈ R 2 , let b n,P (t 1 , t 2 ; σ P ) ≡ P {S n ≤ t 1 , U n (α P ) ≤ t 2 } − P {σ P Z 1 ≤ t 1 , where Z 1 and Z 2 are independent standard normal random variables and σ 2 P > 0 for each P ∈ P. (Note that inf P ∈P σ 2 P is allowed to be zero.) where a n,P (ε) ≡ ε −d b n,P + ε, b n,P ≡ sup t 1 ,t 2 ∈R b n,P (t 1 , t 2 ; σ P ), and ε is the constant in (C.8).
(ii) Suppose further that for all t 1 , t 2 ∈ R, as n → ∞, Then, for all t ∈ R, we have as n → ∞, Remark 2. While the proof of Lemma C3 follows that of Lemma 2.4 of Giné, Mason, and Zaitsev (2003), it is worth noting that in contrast to Lemma 2.4 of Giné, Mason, and Zaitsev (2003) or Theorem 2.1 of Beirlant and Mason (1995), Lemma C3 gives an explicit bound for the difference between the conditional characteristic function of S n given N n (α P ) = n and the characteristic function of N (0, σ 2 P ). Under the stated conditions (in particular (C.8)), the explicit bound is shown to depend on P ∈ P only through b n,P . Thus in order to obtain a bound uniform in P ∈ P, it suffices to control α P and b n,P uniformly in P ∈ P.
By the condition of the lemma and Lemma C2(i), we have for any ε > 0, Note that a n,P (ε) depends on P ∈ P only through b n,P .
Following the proof of Lemma 2.4 of Giné, Mason, and Zaitsev (2003), we have uniformly over P ∈ P. Note that the equality comes after applying Stirling's formula to 2πP {N n (α P ) = n} and a change of variables from u to v/ √ n. (See the proof of Lemma 2.4 of Giné, Mason, and Zaitsev (2003).) The distribution of N n (α P ), being Poisson (n), does not depend on the particular choice of α P ∈ (0, 1), and hence the o(1) term is o(1) uniformly over t ∈ R and over P ∈ P. We follow the proof of Theorem 3 of Feller (1966, p.517) to observe that there exists n 0 > 0 such that uniformly over α ∈ [ε, 1 − ε], for all n > n 0 . Note that the distribution of V n (α P ) depends on P ∈ P only through α P ∈ [ε, 1 − ε] and ε does not depend on P . Since there exists n 1 such that for all n > n 1 , the previous inequality implies that for all n > max{n 0 , n 1 }, By (C.9) and (C.10), and from some large n on that does not depend on P ∈ P, we conclude that for each t ∈ R, as n → ∞. Since the right hand side does not depend on t ∈ R and P ∈ P, we obtain the desired result.
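The Stirling-formula step rests on the local limit P {N = n} ≈ (2πn) −1/2 for a Poisson(n) variable N , which can be verified numerically (an illustrative check, not part of the proof):

```python
import math

def poisson_pmf(k, lam):
    # exact pmf computed on the log scale to avoid overflow at large lam
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

n = 100
exact = poisson_pmf(n, n)                 # P{Poisson(n) = n}
local_clt = 1.0 / math.sqrt(2 * math.pi * n)
ratio = exact / local_clt                 # Stirling error ~ exp(-1/(12n))
```

At n = 100 the two quantities already agree to better than a tenth of a percent, and the error shrinks like 1/n.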
(ii) By the condition of the lemma and Lemma C2(ii), we have for any t, u ∈ R, as n → ∞. The rest of the proof is similar to that of (i). We omit the details.
Proof of Lemma C4. The proof proceeds by using Assumption A6(i) and following the proof of Lemma 4 of Lee, Song, and Whang (2013).
Let N be a Poisson random variable with mean n and independent of ( Let N 1 be a Poisson random variable with mean 1, independent of (Y Lemma C5. Suppose that Assumption A6(i) holds. Then for any m ∈ [2, M ] (with M > 0 being the constant in Assumption A6(i)) where C 1 , C 2 > 0 are constants that depend only on m.
Proof of Lemma C5. Let q n,τ,j (x) be the j-th entry of q n,τ (x). For the first statement of the lemma, it suffices to observe that for some positive constants C 1 and C , where the first inequality uses the definition of k n,τ,j,m , and the last inequality uses Lemma C4 and the fact that m ∈ [2, M ]. The second statement in (C.11) follows similarly. We consider the statements in (C.12), beginning with the first inequality. Let z N,τ,j (x) be the j-th entry of z N,τ (x). Then, using Rosenthal's inequality (e.g. (2.3) of Giné, Mason, and Zaitsev (2003)), we find that the m-th moment of z N,τ,j (x) is bounded, up to a constant, by max{(Eq 2 n,τ,j (x)) m/2 , n −m/2+1 E|q n,τ,j (x)| m }.
As for the second inequality in (C.12), we use the second inequality in (C.11) and apply Rosenthal's inequality in the same way as before to obtain the stated bound for some C > 0.
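Rosenthal's inequality bounds the m-th moment of a centered sum of independent variables by a constant times the larger of a variance term and a sum of m-th moments. For m = 4 and Rademacher summands both sides are exactly computable, which makes the two-regime structure visible (an illustrative check with our own constants, unrelated to the lemma's):

```python
from itertools import product

def fourth_moment_of_sum(n):
    """E|S|^4 for S = sum of n i.i.d. Rademacher (+/-1) variables,
    by exact enumeration of all 2^n sign patterns."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2 ** n

n = 8
exact = fourth_moment_of_sum(n)   # closed form: 3n^2 - 2n
variance_term = n ** 2            # (sum_i E xi_i^2)^(m/2) with m = 4
fourth_term = n                   # sum_i E|xi_i|^4
```

Here the variance term n^2 dominates, matching the first branch of the max in the Rosenthal-type bound; heavy-tailed summands would instead activate the m-th-moment branch.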
The following lemma offers a characterization of the scale normalizer of our test statistic.
Then for Borel sets B, B ⊂ S and A, A ⊂ N J , let The lemma below shows that σ R n,A,A (B, B ) and σ n,A,A (B, B ) are asymptotically equivalent uniformly in P ∈ P. We introduce some notation. Recall the definition of Σ n,τ 1 ,τ 2 (x, u), which is found below (6.7). Define, for ε > 0, where I J is the J dimensional identity matrix. Certainly Σ n,τ 1 ,τ 2 ,ε (x, u) is positive definite. We define where η 1 ∈ R J and η 2 ∈ R J are random vectors that are independent, and independent of (Y i , X i ) ∞ i=1 , each following N (0, εI J ), and z N,τ (x; η 1 ) ≡ z N,τ (x) + η 1 / √ nh d . We are prepared to state the lemma.
Then for any sequences of Borel sets B n , B n ⊂ S and for any A, A ⊂ N J , where o(1) vanishes uniformly in P ∈ P as n → ∞.
Remark 3. The main innovative element of Lemma C6 is that the result does not require that σ n,A,A (B n , B n ) be positive for each finite n or positive in the limit. Hence the result can be applied to the case where the scale normalizer σ R n,A,A (B n , B n ) is degenerate (either in finite samples or asymptotically).
Then it suffices for the lemma to show the following two statements.
Step 1: As n → ∞, Step 2: For some C > 0 that does not depend on ε or n, Then the desired result follows by sending n → ∞ and then ε ↓ 0, while chaining Steps 1 and 2. Proof of Step 1: We first focus on the first statement. For any vector a ∈ R 2J , [a] 1 indicates the vector of the first J entries of a, and [a] 2 the vector of the remaining J entries. By Theorem 9 of Magnus and Neudecker (2001, p. 208), Let q n,τ,j (x; η 1j ) ≡ p n,τ,j (x) + η 1j , where η 1j is the j-th entry of η 1 , and N 1 is a Poisson random variable with mean 1 and ( Let p n,τ (x) be the column vector of entries p n,τ,j (x) with j running in the set N J . Let [p (i) n,τ 1 (x), p (i) n,τ 2 (x + uh)] be i.i.d. copies of [p n,τ 1 (x), p n,τ 2 (x + uh)] and let η (i) 1 and η (i) 2 be i.i.d. copies of η 1 and η 2 . Define Note that The last sum has the same distribution as [η 1 , η 2 ] and the leading sum on the right-hand side has the same distribution as that of [z N,τ 1 (x), z N,τ 2 (x + uh)] . Therefore, we conclude that . Now we invoke the Berry-Esseen-type bound of Sweeting (1977, Theorem 1) to prove Step 1. By Lemma C5, we deduce that (C.20) sup for some C > 0. Also, recall the definition of ρ n,τ 1 ,τ 1 ,j,j (x, 0) in (6.7) and note that sup P ∈P j∈J (ρ n,τ 1 ,τ 1 ,j,j (x, 0) + ρ n,τ 2 ,τ 2 ,j,j (x, 0) + 2ε) ≤ C, for some C > 0 that depends only on J and ε by Lemma C4. Observe that by the definition of C n,p in (C.18), and (C.21), We find that for each u ∈ U, ||W (i) n,τ 1 ,τ 2 (x, u)|| 2 is equal to Therefore, E||W (i) n,τ 1 ,τ 2 (x, u)|| 3 is bounded by where C 1 > 0 and C 2 > 0 are constants depending only on J, and the last bound follows by (C.20).
Therefore, by Theorem 1 of Sweeting (1977), we find that with ε > 0 fixed and n → ∞, n,τ 1 ,τ 2 (x, u) − EC n,p Z n,τ 1 ,τ 2 (x, u) (C.23) Using similar arguments, we also deduce that for j = 1, 2, and A ⊂ N J , For some C > 0, Cov (Λ p (Z n,τ 1 ,τ 2 ,ε (x)), Λ p (Z n,τ 1 ,τ 2 ,ε (x + uh))) The last inequality follows because Z n,τ 1 ,τ 2 ,ε (x) and Z n,τ 1 ,τ 2 ,ε (x + uh) are centered normal random vectors with a covariance matrix that has a finite Euclidean norm by Lemma C4. Hence we apply the Dominated Convergence Theorem to deduce the first statement of Step 1 from (C.23). We turn to the second statement of Step 1. The statement immediately follows because for each u ∈ U, the covariance matrix of Σ −1/2 n,τ 1 ,τ 2 ,ε (x, u)ξ n,τ 1 ,τ 2 ,ε (x, u) is equal to the covariance matrix of [W (1) n,τ 1 ,τ 2 ,ε (x, u), W (2) n,τ 1 ,τ 2 ,ε (x, u)] and w τ 1 ,Bn (x)w τ 2 ,B n (x + uh) − w τ 1 ,Bn (x)w τ 2 ,B n (x) → 0, as n → ∞, for each u ∈ U, and for almost every x ∈ X (with respect to Lebesgue measure). Proof of Step 2: We consider the first statement. First, we write By Hölder's inequality, for C > 0 that depends only on P , where we set s = 2 and q = 1 if p = 1, and s = (p + 1)/(p − 1) and where Z ∈ R J is a centered normal random vector with identity covariance matrix I J . Also, we deduce that for some C > 0, by (C.12) of Lemma C5 and by the fact that 2s(p − 1) = 2(p + 1) ≤ M . Similarly, from some large n on, for some C > 0. Thus we conclude that for some C > 0, and that for some C > 0, Using similar arguments, we also find that for some C > 0, Therefore, there exist C 1 > 0 and C 2 > 0 such that from some large n on, Since the last multiple integral is finite, we obtain the first statement of Step 2.
We turn to the second statement of Step 2. Similarly as before, we write n,τ 1 ,τ 2 ,ε (x, u)). Now, observe that for C > 0 that does not depend on ε, we have, by Lemma C1(i), Using this, recalling the definitions of W (1) n,τ 1 ,τ 2 (x, u) and W (2) n,τ 1 ,τ 2 (x, u) in (C.17), and following the previous arguments, we obtain the second statement of Step 2.
Lemma C7. Suppose that for some small ν 1 > 0, n −1/2 h −d−ν 1 → 0, as n → ∞ and the conditions of Lemma C6 hold. Then there exists C > 0 such that for any sequence of Borel sets B n ⊂ S, and A ⊂ N J , from some large n on, Remark 4. The result is in the same spirit as Lemma 6.2 of Giné, Mason, and Zaitsev (2003).
(Also see Lemma A8 of Lee, Song and Whang (2013).) However, unlike these results, the location normalization here involves E[Λ A,p ( ], but with a stronger bandwidth condition. Like Lemma C6, the result of Lemma C7 does not require that the quantities √ nh d z n,τ (x) and √ nh d z N,τ (x) have a (pointwise in x) nondegenerate limit distribution.
Proof of Lemma C7. As in the proof of Lemma A8 of Lee, Song, and Whang (2013), it suffices to show that there exists C > 0, not depending on n, such that for any Borel set B ⊂ R, Step 1: Step 2: Since E|n −1/2 (n − N )| 2 does not depend on the joint distribution of (Y i , X i ), we have E|n −1/2 (n − N )| 2 = O(1) uniformly over P ∈ P. Combining this with the second statement of (C.12), the product on the right-hand side is O(n −1 ) uniformly over P ∈ P.
Proof of Claim 2: Let η 1 ∈ R J be the random vector defined prior to Lemma C6, and define As for the last term, since N and η 1 are independent, it is bounded by from some large n on.

Proof of Step 2: We can follow the proof of Lemma C6 to show that where C n,τ 1 ,τ 2 ,A,A (x, u) is defined in (C.14) and the o(1) term is uniform over P ∈ P. Now, observe that Therefore, for some C > 0. Now, observe that because T is a bounded set. This completes the proof of Step 2.
The next lemma shows the joint asymptotic normality of a Poissonized version of a normalized test statistic and a Poisson random variable. Using this result, we can apply the de-Poissonization lemma in Lemma C3. To define a Poissonized version of a normalized test statistic, we introduce some notation.
Let C ⊂ R d be a compact set that does not depend on P ∈ P and for which α P ≡ P {X ∈ R d \C} satisfies 0 < inf P ∈P α P ≤ sup P ∈P α P < 1. The existence of such a set C is assumed in Assumption A6(ii). For c n → ∞, we let B n,A (c n ; C) ≡ B n,A (c n ) ∩ (C × T ), where we recall the definition of B n,A (c n ) = B n,A (c n , c n ). (See the definition of B n,A (c n,1 , c n,2 ) before Lemma 1.) Define Let the µ A 's be real numbers indexed by A ⊂ N J , and define where we recall the definition of σ n,A,A (·, ·) prior to Lemma C6. Define The following lemma establishes the joint convergence of H n . In doing so, we need to be careful about uniformity in P ∈ P and the potential degeneracy of the normalized test statistic S n .
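The Poissonization device underlying the statistic above is simple to illustrate: the deterministic sample size n is replaced by an independent Poisson variable N with mean n, so that point counts of the sample over disjoint sets (such as C and its complement) become independent Poisson variables, which is exactly what the de-Poissonization lemma exploits. The sketch below uses illustrative names, not the paper's notation.

```python
import numpy as np

# Minimal sketch of Poissonization: draw N ~ Poisson(n) independently of
# the data, then take N observations. Counts over disjoint regions of a
# Poissonized sample are independent Poisson random variables.
rng = np.random.default_rng(1)
n = 50_000
N = rng.poisson(n)                  # Poisson sample size with mean n
X = rng.uniform(0.0, 1.0, size=N)   # Poissonized sample on [0, 1]

count_C = int(np.sum(X < 0.5))      # points falling in C = [0, 0.5)
count_rest = N - count_C            # points in the complement of C
```

Here count_C and count_rest are independent Poisson variables with means n/2 each, whereas under a fixed sample size n the two counts would be negatively correlated.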
Lemma C8. Suppose that the conditions of Lemma C6 hold and that c n → ∞ as n → ∞.
Remark 5. The joint convergence result is divided into two separate cases. The first covers the situation where S n is asymptotically nondegenerate uniformly in P ∈ P; the second covers the situation where S n is asymptotically degenerate for some P ∈ P.
First, we show the following statements.
Step 3: There exists c > 0 such that from some large n on, inf P ∈P λ min (C n ) > c.

Proof of Step 1: Observe that, by an inequality similar to (C.26) in the proof of Lemma C7, Using the fact that S is compact and does not depend on P ∈ P, for some constants C 1 , C 2 , C 3 > 0 that do not depend on P ∈ P, by the independence between η 1 and {z N,τ (x) : (x, τ ) ∈ S} and by the second statement of Lemma C5. From the fact that we obtain the desired result.

Proof of Step 2: Let Σ 2n,τ,ε be the covariance matrix of [(q n,τ (x) + η 1 ) ,Ũ n ] , where Ũ n = U n /P {X ∈ C}. We can write Σ 2n,τ,ε as The first matrix on the right-hand side is certainly positive semidefinite. Note that the (q (k) n,τ,j (x), Ū (k) n )'s with k = 1, · · ·, n are i.i.d. copies of (q n,τ,j (x), Ū n ), where N 1 is the Poisson random variable with mean 1 that is involved in the definition of q n,τ,j (x). As for A n,τ (x), note that for C 1 , C 2 > 0, We conclude that sup Therefore, from some large n on, As in (C.22), we find that for some C > 0, from some large n on, where the last inequality uses (C.33). As for the last expectation, note that by Rosenthal's inequality, we have sup for some C > 0. We apply the first statement of Lemma C5 to conclude that where [a] 1 of a vector a ∈ R J+1 denotes the vector of the first J entries of a, and [a] 2 the last entry. By Theorem 1 of Sweeting (1977), we find that (with ε > 0 fixed) where Z J+1 ∼ N (0, I J+1 ) and the W (i) n,τ (x; η 1 )'s are i.i.d. copies of W n,τ (x; η 1 ). Since O(n −1/2 h −d/2 ) = o(h d/2 ) (by the condition that n −1/2 h −d−ν → 0 as n → ∞), Noting that E[D n,τ,p (Z J+1 )] = 0, we conclude that By applying the Dominated Convergence Theorem, we obtain Step 2.
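For reference, a standard form of Rosenthal's inequality of the kind invoked above (this is the textbook statement, not the paper's exact display): for independent mean-zero random variables X 1 , . . . , X n and any r ≥ 2, there is a constant C r depending only on r such that

```latex
\mathbb{E}\Bigl|\sum_{i=1}^{n} X_i\Bigr|^{r}
\;\le\; C_r \Biggl( \sum_{i=1}^{n} \mathbb{E}|X_i|^{r}
\;+\; \Bigl( \sum_{i=1}^{n} \mathbb{E}\bigl[X_i^{2}\bigr] \Bigr)^{r/2} \Biggr).
```

The second term dominates in the regimes used here, which is what yields moment bounds of the stated order uniformly over P ∈ P.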

Proof of Step 3: First, we show that where o(1) is an asymptotically negligible term uniformly over P ∈ P. Note that . By Lemma C6, we find that for A, A ⊂ N J , Cov(ψ n,A , ψ n,A ) = σ n,A,A (B n,A (c n ; C), B n,A (c n ; C)) + o(1), uniformly in P ∈ P, yielding the desired result.
Combining Steps 1 and 2, we deduce that Let σ 2 1 ≡ inf P ∈P σ 2 n (C) and σ 2 2 ≡ inf P ∈P (1 − α P ). Note that for some C 1 > 0, by the condition of the lemma. A simple calculation gives us where the last inequality follows by (C.35) and (C.36). Taking ε small enough, we obtain the desired result.

Proof of Step 4: Suppose that liminf n→∞ inf P ∈P σ 2 n (C) > 0. Let κ be the diameter of the compact set K 0 introduced in Assumption A2, and let C be given as in the lemma. Let Z d be the set of d-tuples of integers, and let {R n,i : i ∈ Z d } be a collection of rectangles in R d such that R n,i = [a n,i 1 , b n,i 1 ] × · · · × [a n,i d , b n,i d ], where i j is the j-th entry of i and hκ ≤ b n,i j − a n,i j ≤ 2hκ for all j = 1, · · ·, d; no two distinct rectangles R n,i and R n,j have an intersection with nonempty interior; and the union of the rectangles R n,i , i ∈ Z d n , covers X from some sufficiently large n on, where Z d n is the set of d-tuples of integers whose absolute values are at most n.
We let and I n ≡ {i ∈ Z d n : B n,i ≠ ∅}. Then B n,i has Lebesgue measure m(B n,i ) bounded by C 1 h d , and the cardinality of I n is bounded by C 2 h −d for some positive constants C 1 and C 2 . Now define and also define B n,A,i (c n ) ≡ (B n,i × T ) ∩ B n,A (c n ). Then we can write S n /σ n (C) = Σ i∈In α n,i and U n = Σ i∈In u n,i .
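The point of this block decomposition is that each block sum interacts only with its immediate neighbours, so the summands form a 1-dependent array to which a CLT for dependent fields applies. A minimal one-dimensional analogue (a hypothetical construction, not the paper's blocks) makes the idea concrete: y i = e i · e i+1 depends on e i and e i+1 only, so y i and y j are independent whenever |i − j| > 1, yet the y i are not mutually independent.

```python
import numpy as np

# Minimal 1-dependent array: y_i = e_i * e_{i+1} with i.i.d. standard
# normal e's. Each y_i has mean 0 and variance 1, adjacent y's share one
# factor (1-dependence), and non-adjacent y's are independent.
rng = np.random.default_rng(2)
m = 200_000
e = rng.standard_normal(m + 1)
y = e[:-1] * e[1:]        # 1-dependent summands
S = y.sum() / np.sqrt(m)  # normalized sum; a CLT still applies
```

Here Cov(y i , y i+1 ) = E[e i e i+1 ² e i+2 ] = 0, so the variance of the normalized sum is still approximately 1 despite the dependence.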
By the definition of K 0 in Assumption A2, the definition of R n,i , and the properties of Poisson processes, one can see that {(α n,i , u n,i )} i∈In is a 1-dependent random field. (See Mason and Polonik (2009) for details.) For any q 1 , q 2 ∈ R, let y n,i ≡ q 1 α n,i + q 2 u n,i . The focus is on the convergence in distribution of Σ i∈In y n,i uniformly over P ∈ P. Without loss of generality, we choose q 1 , q 2 ∈ R\{0}. Observe that V ar P (Σ i∈In y n,i ) = q 2 1 + q 2 2 (1 − α P ) + 2q 1 q 2 c n,P + o(1), uniformly over P ∈ P, where c n,P = Cov(S n , U n ). On the other hand, using Lemma C4 and following the proof of Lemma A8 of Lee, Song, and Whang (2013), we deduce that (C.38) sup P ∈P Σ i∈In E|y n,i | r = o(1) as n → ∞, for any r ∈ (2, (2p + 2)/p]. By Theorem 1 of Shergin (1993), we have sup P ∈P sup t∈R |P {(q 2 1 + q 2 2 (1 − α P ) + 2q 1 q 2 c n,P ) −1/2 Σ i∈In y n,i ≤ t} − Φ(t)| ≤ C Σ i∈In E|y n,i | r = o(1), for some C > 0, by (C.38). Therefore, by Lemma C2(i), we have for each t ∈ R and each q ∈ R 2 \{0}, as n → ∞,

We first prove Lemmas D1-D3 to control the discrepancy between the sample version of the scale normalizer σ n and its population version. Then we proceed to prove Lemmas D4-D9, which run in parallel with Lemmas C4-C9 as their bootstrap counterparts. We finish this subsection with Lemmas D10-D12, which are crucial for dealing with the location normalization of the bootstrap test statistic. More specifically, Lemmas D10 and D11 are auxiliary moment bounds used to prove Lemma D12, and Lemma D12 essentially delivers the result of Lemma A1 in Appendix A; it is used to deal with the discrepancy between the population location normalizer and the sample location normalizer. Controlling this discrepancy to the rate o P (h d/2 ) is crucial for our purpose, because for computational reasons our bootstrap test statistic does not involve the sample version of the location normalizer a n .
The random variables N and N 1 are Poisson random variables with means n and 1, respectively, and are independent of (Y i , X i ) n i=1 . Let η 1 and η 2 be centered normal random vectors that are independent of each other and of ((Y i , X i ) n i=1 , N, N 1 ). We will specify their covariance matrices in the proofs below. Throughout the proofs, the bootstrap distribution P * and expectations E * are viewed as the distribution of Note that ρ̂ n,τ 1 ,τ 2 ,j,k (x, u) and k̂ n,τ,j,m (x) are bootstrap versions of ρ n,τ 1 ,τ 2 ,j,k (x, u) and k n,τ,j,m (x). The lemma below establishes that the bootstrap version ρ̂ n,τ 1 ,τ 2 ,j,k (x, u) is consistent for ρ n,τ 1 ,τ 2 ,j,k (x, u).
Lemma D4. Suppose that Assumption A6(i) holds and that for some C > 0, Then for all m ∈ [2, M ] and all ε ∈ (0, ε 1 ) (with M > 0 and ε 1 > 0 being the constants that appear in Assumption A6(i)), there exists C 1 ∈ (0, ∞) that does not depend on n such that for each j ∈ N J , sup As in the proof of Lemma D3, we note that Hence the desired statement follows from Lemma C4.
Proof of Lemma D5. Let q * n,τ,j (x) denote the j-th entry of q * n,τ (x). For the first statement of the lemma, it suffices to observe that for each ε ∈ (0, ε 1 ), there exist C 1 > 0 and C̄ 1 > 0 such that where the last inequality uses Lemma D4. The second inequality in (D.1) follows similarly.
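The nonparametric bootstrap behind P * and E * can be sketched in a few lines: resample (Y i *, X i *) with replacement from the original sample and compute moments conditional on the data. The names below are illustrative only; the sketch shows the two conditional-moment facts used repeatedly, namely that E * of the bootstrap sample mean equals the sample mean exactly, while its conditional variance is approximately the sample variance divided by n.

```python
import numpy as np

# Sketch of the nonparametric bootstrap: draw indices with replacement and
# evaluate bootstrap moments conditional on the observed sample.
rng = np.random.default_rng(3)
n = 10_000
Y = rng.standard_normal(n)               # stand-in for the observed data
idx = rng.integers(0, n, size=(500, n))  # 500 bootstrap resamples
Y_star = Y[idx]                          # bootstrap samples, one per row
boot_means = Y_star.mean(axis=1)

# E*[bootstrap mean] = sample mean exactly; Var*(bootstrap mean) ~ var(Y)/n.
```

Averaging boot_means over the 500 draws recovers the sample mean of Y to Monte Carlo accuracy, and their empirical variance matches var(Y)/n.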
Then for any sequences of Borel sets B n , B n ⊂ S and for any A, A ⊂ N J , where σ 2 n,A,A (B n , B n ) is as defined in (C.15).
Proof of Lemma D6. The proof is very similar to that of Lemma C6, so we only sketch it. Define for ε > 0, N,τ 2 (x + uh; η 2 ))), and g 2n,τ 1 ,τ 2 ,ε (x, u) ≡ Cov * (Λ A,p (Z n,τ 1 ,τ 2 ,ε (x)), Λ A ,p (Z n,τ 1 ,τ 2 ,ε (x + uh))), where [Z n,τ 1 ,τ 2 ,ε (x),Z n,τ 1 ,τ 2 ,ε (z)] is a centered normal R 2J -valued random vector with the same covariance matrix as that of under the product measure of the bootstrap distribution P * and the distribution of (η 1 , η 2 ). As in the proof of Lemma C6, it suffices to show the following two statements. (Step 2): For some C > 0 that does not depend on ε or n, The desired result then follows by sending n → ∞ and ε ↓ 0, chaining Steps 1 and 2 with the second convergence in Step 2 of the proof of Lemma C6. We first focus on the first statement of (Step 1). For any vector where [a] 1 of a vector a ∈ R 2J denotes the vector of the first J entries of a, and [a] 2 the vector of the remaining J entries. Also, as in (C.19), (D.6) λ min (Σ * n,τ 1 ,τ 2 ,ε (x, u)) ≥ ε.
Thus the proof of the lemma is complete.
Lemma D7. Suppose that for some small ν 1 > 0, n −1/2 h −d−ν 1 → 0 as n → ∞, and that the conditions of Lemma C6 hold. Then there exists C > 0 such that for any sequence of Borel sets B n ⊂ S and any A ⊂ N J , from some large n on, Proof of Lemma D7. We follow the proof of Lemma C7 and show that for some C > 0, we have the following: Step 1: Step 2: where the q * (i) n,τ (x; η (i) 1 )'s (i = 1, 2, · · · ) are as defined in the proof of Lemma D6, q * (i) n,τ,j (x; η (i) 1j ) is the j-th entry of q * (i) n,τ (x; η (i) 1 ), and σ 2 n,τ,j (x) = V ar * (q * (i) n,τ,j (x; η (i) 1j )) > 0, where V ar * denotes the variance with respect to the joint distribution of ((Y * i , X * i ) n i=1 , η (i) 1j ) conditional on (Y i , X i ) n i=1 . We apply Lemma 1(i) of Horváth (1991) to deduce that E * n i=N +1 q * (i) n,τ,j (x; η Using Claims 1 and 2, and following the arguments in the proof of Lemma C7, we obtain (Step 1).
Let C ⊂ R d , α P ≡ P {X ∈ R d \C}, and B n,A (c n ; C) be as introduced prior to Lemma C8. Define Let the µ A 's be real numbers indexed by A ⊂ N J , and define We let The following lemma is a bootstrap counterpart of Lemma C8.
Lemma D8. Suppose that the conditions of Lemma D6 hold and that c n → ∞, as n → ∞.
First, we show the following statements.
Step 3: There exists c > 0 such that from some large n on, inf P ∈P λ min (C n ) > c.

Proof of Step 1: Observe that As in the proof of Step 1 in the proof of Lemma C8, we deduce that dQ(x, τ ).
Using (D.12) and Lemma D5, and following the same arguments as in the proof of Step 2 in the proof of Lemma C8, we deduce that where [a] 1 of a vector a ∈ R J+1 denotes the vector of the first J entries of a, and [a] 2 the last entry. By Theorem 1 of Sweeting (1977), we find that (with ε > 0 fixed) The last probability vanishes uniformly in P ∈ P by (D.13). By applying the Dominated Convergence Theorem, we obtain (Step 2).

Proof of Step 4: We take {R n,i : i ∈ Z d } as in the proof of Step 4 of Lemma C8, and define α * n,i and u * n,i analogously, so that S * n /σ n (C) = Σ i∈In α * n,i and U * n = Σ i∈In u * n,i .
Hence, we find that S * n = o P * (1) in P . The desired result follows by applying Theorem 1 of Shergin (1993) to the sum U * n = Σ i∈In u * n,i , and then applying Lemma C2.
Lemma D9. Let C be the Borel set in Lemma D8.
(i) Suppose that the conditions of Lemma D8(i) are satisfied. Then for each a > 0, as n → ∞, (ii) Suppose that the conditions of Lemma D8(ii) are satisfied. Then for each a > 0, as n → ∞, Proof of Lemma D9. The proofs are precisely the same as those of Lemma C9, except that we use Lemma D8 instead of Lemma C8 here.
Proof of Lemma D10. We first establish the following fact. Fact: Suppose that W is a random vector such that E||W || 2 ≤ c W for some constant c W > 0. Then, for any r ≥ 2 and a positive integer m ≥ 1, where a m (r) = 2 m (r − 2) + 2, and C m > 0 is a constant that depends only on m and c W .
Proof of Lemma D11. The proof is precisely the same as that of Lemma D10, where we use Lemma D5 instead of Lemma C5.
For a sequence of Borel sets B n in S, λ ∈ {0, d/4, d/2}, A ⊂ N J , and a fixed bounded function δ on S, we let
a R n (B n ) ≡ ∫ Bn E Λ A,p ( √ nh d z N,τ (x) + h λ δ(x, τ )) dQ(x, τ ),
a R * n (B n ) ≡ ∫ Bn E * Λ A,p ( √ nh d z * N,τ (x) + h λ δ(x, τ )) dQ(x, τ ), and
a n (B n ) ≡ ∫ Bn E Λ A,p (W (1) n,τ,τ (x, 0) + h λ δ(x, τ )) dQ(x, τ ),
where z * N,τ (x) is a random vector whose j-th entry is given by
Lemma D12. Suppose that the conditions of Lemmas D10 and D11 hold and that, as n → ∞, for some small ν > 0. Then for any sequence of Borel sets B n in S, sup P ∈P |a R n (B n ) − a n (B n )| = o(h d/2 ) and sup P ∈P P {|a R * n (B n ) − a n (B n )| > ah d/2 } = o(1).