LIMIT THEORY FOR LOCALLY FLAT FUNCTIONAL COEFFICIENT REGRESSION

Functional coefficient (FC) regressions allow for systematic flexibility in the responsiveness of a dependent variable to movements in the regressors, making them attractive in applications where marginal effects may depend on covariates. Such models are commonly estimated by local kernel regression methods. This paper explores situations where responsiveness to covariates is locally flat or fixed. The paper develops new asymptotics that take account of shape characteristics of the function in the locality of the point of estimation. Both stationary and integrated regressor cases are examined. The limit theory of FC kernel regression is shown to depend intimately on functional shape in ways that affect rates of convergence, optimal bandwidth selection, estimation, and inference. In FC cointegrating regression, flat behavior materially changes the limit distribution by introducing the shape characteristics of the function into the limiting distribution through variance as well as centering. In the boundary case where the number of zero derivatives tends to infinity, near parametric rates of convergence apply in stationary and nonstationary cases. Implications for inference are discussed and a feasible pre-test inference procedure is proposed that takes unknown potential flatness into consideration and provides a practical approach to inference.


INTRODUCTION
Kernel approaches to nonparametric regression use localized versions of standard statistical methods to fit shape characteristics of nonlinear functions in statistical models. These methods have been extensively used in applied research across the social, business, and natural sciences. The methods are particularly useful in assessing the role of nonlinearities and parameter instabilities and are used in modeling cross section, time series, and panel data. An especially useful model for which these methods have been developed is functional coefficient (FC) regression. Such regressions allow the responses of a dependent variable to depend locally in a systematic way on movements in other variables. This paper demonstrates that the limit theory in FC regression depends on the functional shape¹ of the regression coefficient in ways that involve rates of convergence, asymptotic variance, bandwidth selection, and inference. Standard limit theory for kernel regression shows clearly how functional shape affects bias, which is well known to depend on the local first two derivatives of the regression function and the first derivative of the density of the covariate. The limit theory changes in material ways when these and possibly higher derivatives are zero at the point of estimation. Recent work on FC cointegrating regression (Phillips and Wang, 2021) pointed out dependence of the asymptotic variance on the first derivative of the functional coefficient in estimating cointegrating equations. But the effects of flat functional shape on the limit distribution, including the bias function and limiting variance, have not been explored in earlier nonparametric literature on FC regression in either stationary or nonstationary cases. There also appears to be no former research on the implications of flat functional shape on the limit theory for standard kernel density estimation or kernel regression.
Our thanks go to the Co-Editor, Liangjun Su, and two referees for most helpful comments on the earlier versions of this paper. Phillips acknowledges research support from the NSF under Grant No. SES 18-50860 at Yale University and a Kelly Fellowship at the University of Auckland. Wang acknowledges support from the National Natural Science Foundation of China (Grant No. 72103197). Address correspondence to Peter C. B. Phillips, Yale University, New Haven, CT, USA; e-mail: peter.phillips@yale.edu.
The present paper provides results and methods that address these deficiencies. Most nonparametric research concerns functional shape and seeks to explore and comment upon shape characteristics that relate to underlying economic ideas. Examples include (i) functions of interest like elasticities that may be flat, increasing or decreasing in response to other variables; (ii) functions where extrema are of interest in which zero derivatives play a key role; or (iii) climate responses to policy changes on CO₂ emissions, where the magnitudes of departures from null effects are particularly relevant.
New asymptotics are developed in the paper that specifically involve shape characteristics of the function in the locality of the point of estimation. In particular, locally flat behavior in the coefficient function is shown to have a major effect on the form of the asymptotic distribution as well as the rate of convergence, with important differences between stationary and nonstationary regressions. Local flatness in the coefficient function at some point in the covariate space may be regarded as an intermediate case between the usual FC model and regression with a fixed coefficient, allowing for responses of the dependent variable to be unresponsive to movements in other variables at this point in their support. The primary focus in this paper is to develop asymptotics for FC regression under such flatness conditions. Related effects to those described here, as well as the methods provided, may be expected to apply in other nonparametric regression models where flatness occurs in nonlinear nonparametric regressions.
¹ The terminology "functional shape" here refers to the local shape of the function at a given point as captured by the magnitude of its derivatives at that point. The terminology "flat functional shape" or "locally flat behavior," which is used extensively in the paper, refers to the fact that the first $L-1$ derivatives of the function at the local point are zero while the $L$-th derivative is nonzero, for some integer $L \geq 1$. A precise statement is given in Theorem 2.1.
A further contribution of the paper is the development of a feasible pre-test inference method. Local flatness in a functional coefficient is typically unknown a priori, although it may be hypothesized on the basis of some underlying theory. Direct estimation of the degree of flatness is shown to be challenging. The feasible pre-test approach takes potential flatness into consideration and is shown to deliver efficiency gains in the flat region over naive estimation and inference where potential flatness is simply ignored. Pre-test inference is found to work well in simulations and is competitive in performance with oracle inference where local flatness information is assumed to be known.
The paper is organized as follows. The new limit theory is given in Section 2, which covers both stationary and nonstationary FC regression. Section 3 discusses the implications of the limit theory for inference and proposes a practically feasible pre-test inference method. Section 4 provides simulation evidence corroborating the asymptotics. Section 5 concludes. Proofs of the main results, several subsidiary lemmas, and computation details are given in the Appendix. Additional technical details are provided in the Supplementary Material to this paper. Throughout the paper we use the notation $\equiv_d$ to signify equivalence in distribution, $\sim_a$ to signify asymptotic equivalence, $\rightsquigarrow$ to denote weak convergence on the relevant probability space, $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ to denote floor and ceiling functions, $[\cdot]$ to signify the rounded part of a real number, and $\mu_j(K) = \int_{\mathcal{K}} u^j K(u)\,du$, $\nu_j(K) = \int_{\mathcal{K}} u^j K^2(u)\,du$ for kernel moment functions, where $\mathcal{K}$ is the support of the kernel function $K$. According to the context, we use $:=$ and $=:$ to signify definitional equality. Unless otherwise indicated, $\int$ denotes $\int_0^1$.
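As a numerical illustration of the kernel moment functions $\mu_j(K)$ and $\nu_j(K)$ defined above, the following minimal sketch approximates them by a midpoint rule. The Epanechnikov kernel, the function names, and the number of grid steps are illustrative assumptions and are not taken from the paper.

```python
def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (an illustrative choice of K)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_moments(kernel, j, lower=-1.0, upper=1.0, steps=20000):
    """Midpoint-rule approximations of
    mu_j(K) = int u^j K(u) du  and  nu_j(K) = int u^j K(u)^2 du
    over the interval [lower, upper], taken here as the kernel support."""
    width = (upper - lower) / steps
    mu = 0.0
    nu = 0.0
    for i in range(steps):
        u = lower + (i + 0.5) * width  # midpoint of the i-th cell
        k = kernel(u)
        mu += (u ** j) * k * width
        nu += (u ** j) * k * k * width
    return mu, nu
```

For a symmetric density kernel, `kernel_moments(kernel, 0)` should return a first component close to one and `kernel_moments(kernel, 1)` a first component close to zero, which gives a quick check on any kernel passed in.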

ASYMPTOTIC THEORY FOR LOCALLY FLAT FC ESTIMATION
The standard FC regression model is a simple extension of linear regression, taking the following form
$$y_t = x_t'\beta(z_t) + u_t, \qquad (2.1)$$
in which the covariate $z_t$ determines the strength or weakness of the response of $y_t$ to the regressor $x_t$. The regressor $x_t$ is a $p \times 1$ time series, which may be stationary or nonstationary. The covariate $z_t$ is a $q \times 1$ time series and is commonly, although not always, assumed to be stationary. The error term $u_t$ is a scalar stationary process. In view of its flexibility as a convenient extension of fixed parameter regression, the model has been extensively studied and applied in econometrics.
A popular textbook reference is by Li and Racine (2007, Chap. 9.3). Many papers have studied estimation and inference in this model under various assumptions, including early work by Cai, Fan, and Yao (2000) and Tu and Wang (2020) on stationary regression and much subsequent work on nonstationary regressions covering both cointegrated and noncointegrated models (Juhl, 2005; Xiao, 2009; Cai, Li, and Park, 2009; Sun, Hsiao, and Li, 2011; Wang, Tu, and Chen, 2016; Tu and Wang, 2019; Wang, Phillips, and Tu, 2019; Tu and Wang, 2022).
Kernel weighted local least squares regression is a standard approach to estimate the functional coefficient $\beta(\cdot)$ in (2.1). The local level least squares estimate of $\beta(z)$ is
$$\hat\beta(z) = \Big(\sum_{t=1}^{n} x_t x_t' K_{tz}\Big)^{-1} \sum_{t=1}^{n} x_t y_t K_{tz}$$
with kernel function $K_{tz} = K((z_t - z)/h)$ and bandwidth $h$. The estimate $\hat\beta(z)$ may be decomposed in the usual manner into "bias" and "variance" terms as
$$\hat\beta(z) - \beta(z) = \Big(\sum_{t=1}^{n} x_t x_t' K_{tz}\Big)^{-1} \sum_{t=1}^{n} x_t x_t' \left[\beta(z_t) - \beta(z)\right] K_{tz} + \Big(\sum_{t=1}^{n} x_t x_t' K_{tz}\Big)^{-1} \sum_{t=1}^{n} x_t u_t K_{tz}. \qquad (2.2)$$
Under suitable regularity conditions the limit theory for $\hat\beta(z)$ is normal or mixed normal after standard corrections are employed for bias and suitable recentering or undersmoothing is employed (Phillips and Wang, 2021). These asymptotics lead to a theory of estimation and inference for stationary, cointegrating, and mixed regressor cases. Our treatment extends the existing limit theory to address the impact of locally flat behavior in the regression coefficient function $\beta(\cdot)$. We start with the stationary case.
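The local level estimator described above can be sketched in a few lines for the scalar regressor, scalar covariate case, where the matrix inverse reduces to a ratio of kernel-weighted sums. This is a minimal sketch rather than the authors' implementation: the Gaussian kernel and the names `gaussian_kernel` and `local_level_fc` are assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_level_fc(y, x, z, z0, h, kernel=gaussian_kernel):
    """Local level (kernel-weighted least squares) estimate of beta(z0)
    for scalar x_t and scalar z_t:
        beta_hat(z0) = sum_t x_t y_t K_tz / sum_t x_t^2 K_tz,
    with weights K_tz = K((z_t - z0) / h)."""
    num = 0.0
    den = 0.0
    for y_t, x_t, z_t in zip(y, x, z):
        k = kernel((z_t - z0) / h)
        num += x_t * y_t * k
        den += x_t * x_t * k
    return num / den
```

When the coefficient is constant and there is no noise, so that $y_t = b\,x_t$ for every $t$, the estimate equals $b$ for any bandwidth, which is a convenient sanity check. A kernel with unbounded support is used in the sketch so that the denominator does not vanish when the bandwidth is small.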

The FC Stationary Model
It is convenient for exposition to use a prototypical version of the model (2.1) in which the following conditions are assumed.
Assumption 1.
(i) The strictly stationary processes $\{x_t, z_t, u_t\}$ are $\alpha$-mixing processes with mixing numbers $\alpha(j)$ that satisfy $\sum_{j \geq 1} j^{c} [\alpha(j)]^{1-2/\delta} < \infty$ for some $\delta > 2$, $c > \tau(1 - 2/\delta)$ and $\tau > 1$ with finite moments of order $p > 2\delta > 4$. Further,
(ii) The density $f(z)$ of the scalar process $z_t$ and the joint density $f_{0,j}(s_0, s_j)$ of $z_t, z_{t+j}$ are bounded above and away from zero over their supports with uniformly bounded and continuous derivatives to the second order.
(iii) The kernel function $K(\cdot)$ is a bounded probability density function symmetric about zero with
The stationarity conditions in Assumption (i) accord with earlier work on nonparametric and FC kernel regression for which the mixing requirements are commonly used to enable development of asymptotic theory in time series FC regression (e.g., Fan and Yao, 2008; Cai et al., 2000). A stronger mixing decay rate condition $c > \tau(1 - 2/\delta)$ for some $\delta > 2$ and $\tau > 1$ in (i) is used in place of the more usual condition $c > 1 - 2/\delta$ to assist in the nonparametric limit distribution theory under dependence. The exogeneity condition in (i) is convenient for the limit theory. Relaxation of those conditions requires alternative methods such as FC instrumental variable methods and additional technical complications that are not within the goals of the present work to address. Heteroskedasticity is allowed as in Cai et al. (2000). The kernel assumptions in (iii) are commonly employed but when bandwidths are very small, as they are in some of the results herein, kernels with support $\mathcal{K}$ on the entire real line $\mathbb{R}$ are better suited, or other methods used to avoid finite sample failure in the kernel-weighted signal in the regression.
The smoothness conditions (iv) on $\beta(z_t)$ and its derivatives are needed for the theory developed here because the limiting bias expressions rely on higher order derivatives of $\beta(z_t)$. When the smoothness degree parameter $L$ is unknown and estimated, a stronger condition may be required to allow for potential overestimation of $L$ in practice. Condition (v) is standard in nonparametric work and specific rate conditions involving $(n,h)$ are given as needed in the results below. However, as shown in the analysis of limit behavior when $L \to \infty$, the optimal bandwidth may no longer satisfy the contraction condition $h \to 0$ in (v).
Our first result details the limit theory for the FC regression estimator $\hat\beta(z)$ in model (2.1) for the stationary case under locally flat conditions on the coefficient function.
THEOREM 2.1. If Assumption 1 holds, if $\beta(z)$ has derivatives $\beta^{(\ell)}(z) = 0$ at $z$ for all $\ell = 0,1,2,\ldots,L-1$ and some integer $L \geq 1$ for which $\beta^{(L)}(z) \neq 0$, then the following limit theory holds when nh
Theorem 2.1 shows that flatness in the functional coefficient $\beta(\cdot)$ at $z$ affects the limit theory of $\hat\beta(z)$ in the stationary $x_t$ regressor case only through the bias function in (2.3). The bias order $O(h^{L^*})$ and the functional form $G_L(z)$ are affected. The bias function $G_L(z)$ depends on the first two nonzero derivatives $\{\beta^{(\ell)}(z);\ \ell = L, L+1\}$ of $\beta(z)$, as well as the density $f(z)$ and its first derivative $f^{(1)}(z)$, the latter appearing as is usual in nonparametric regression. When $L$ is even the dependence is confined to the derivative $\beta^{(L)}(z)$ and the density $f(z)$. The limiting variance formula S (z) = xx is unchanged from the standard case without flatness and the convergence rate remains $\sqrt{nh}$. So local flatness in $\beta(z)$ affects the limit theory of FC regression only via the bias function.
As $L$ rises with an increasing degree of flatness in the regression coefficient at $z$, the bias function in (2.3), which is of order $O(h^{L^*})$, falls when $h \to 0$ as $n \to \infty$. When estimation bias falls it is natural to select a wider bandwidth to reduce variance. Correspondingly, the usual plug-in optimal bandwidth formula changes, with resulting adjustment to the convergence rate. This can be conveniently shown in the scalar coefficient function $\beta(z)$ case, for which the optimal bandwidth formula for minimizing asymptotic mean squared error can be deduced from (2.3) in the usual way, giving (using the scalar $x_t$ case to illustrate)
(2.6)
In the conventional case where $L = 1$ and $L^* = 2$, we have the usual optimal bandwidth rate $h_{opt*} = O(n^{-1/5})$. More generally, and taking $L$ to be even for convenience so that
(2.7)
For instance, when the functional coefficient has the polynomial form $\beta(z) = \sum_{j=0}^{q} a_j z^{L+j}$, which is locally flat to order $L-1$ at $z = 0$ when $a_0 \neq 0$, we have $\beta^{(L)}$ The same applies when the locally flat coefficient function $\beta(z)$ has the asymptotically regular form as $L \to \infty$ and the optimal bandwidth $h_{opt*}$ in (2.7) approaches the nonshrinking rate $O(1/n^{0}) = O(1)$. Hence, for large $L$ the associated optimal convergence rate is $\sqrt{n h_{opt*}}$, which approaches $\sqrt{n}$, giving a near-parametric convergence rate for extremely flat functions.
This behavior matches the heuristic that when a functional coefficient is nearly flat and bias is small from neighboring observation points, averaging over those observations by using a wider (or asymptotically nonshrinking) bandwidth is useful in reducing variance and thereby mean squared error.Note, however, that for this optimal choice of bandwidth as L → ∞, in such cases we have 1) following (2.7) for the case that L is even.So the bias term is, as usual, not negligible for the optimal choice of bandwidth.What Theorem 2.1, formula (2.7), and this asymptotic bias analysis show is that when the coefficient function is nearly flat in the neighborhood of the point of estimation, near parametric convergence rates are possible with the same limit normal distribution and variance as in other cases.
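The rate arithmetic behind this discussion can be made concrete with a short sketch. It assumes the usual bias-variance balance for the stationary case, with squared bias of order $h^{2L^*}$ set against variance of order $1/(nh)$, which gives a bandwidth of order $n^{-1/(2L^*+1)}$ and reproduces the $n^{-1/5}$ rate quoted above for $L = 1$. The rule $L^* = L + 1$ for odd $L$ and $L^* = L$ for even $L$ follows the usage in the remarks of the paper. The function names are hypothetical.

```python
from fractions import Fraction


def l_star(L):
    """Bias order L*: L + 1 when L is odd, L when L is even."""
    return L + 1 if L % 2 == 1 else L


def stationary_rates(L):
    """Exponents (in powers of n) for the stationary case, assuming the
    bandwidth balances squared bias h^(2L*) against variance 1/(nh):
        h_opt ~ n^(h_exp),  sqrt(n * h_opt) ~ n^(rate_exp)."""
    ls = l_star(L)
    h_exp = Fraction(-1, 2 * ls + 1)
    rate_exp = (1 + h_exp) / 2
    return h_exp, rate_exp
```

For example, `stationary_rates(1)` gives exponents $-1/5$ and $2/5$, while larger even values of `L` push the bandwidth exponent toward zero and the convergence exponent toward $1/2$, in line with the near-parametric rate described in the text.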

The FC Cointegrating Regression Model
For exposition we use a cointegrating regression equation with full rank I(1) exogenous regressors and functional coefficients. The model is a prototype of more complex systems and provides results that show the impact of flat behavioral characteristics in the functional coefficients on rates of convergence, estimation, inference, and bandwidth selection in a nonstationary framework. These simplifying conditions enable the use of standard kernel-weighted least squares regression. Similar analyses to those given here will be needed in more complex modeling environments under endogeneity and cointegrated equations with possibly cointegrated or even functionally cointegrated regressors. Extensions to address such complexities would involve procedures such as "fully modified" FCC kernel regression. Some related FM methods have been designed for the time varying parameter framework of cointegration (Phillips, Li, and Gao, 2017; Li, Phillips, and Gao, 2016; Gao and Phillips, 2013) and may be developed for FC cointegrating models. But they are not the subject of the present work and are left for future research.
The following assumption modifies the conditions of Assumption 1 and provides for a simple cointegrating regression analogue of model (2.1).

Assumption 2.
(i) $\{x_t\}$ is a full rank unit root process satisfying the functional law $\frac{1}{\sqrt{n}} x_{\lfloor n\cdot \rfloor} \rightsquigarrow B_x(\cdot)$, where $B_x$ is vector Brownian motion with variance matrix xx > 0. $\{u_t, z_t\}$ are strictly stationary $\alpha$-mixing scalar processes with mixing numbers $\alpha(j)$ that satisfy $\sum_{j \geq 1} j^{c} [\alpha(j)]^{1-2/\delta} < \infty$ for some $\delta > 2$ and $c > \tau(1-2/\delta)$ and $\tau > 1$ with finite moments of order $p > 2\delta > 4$. $\{x_t\}$ and $\{z_t\}$ are independent processes. Further, $E(u_t | x_s) = 0$ for all $t,s$, and E(u
The high level assumption (i) on the functional limit behavior of the regressor $x_t$ is convenient, commonly used, and justified by standard primitive conditions (e.g., Phillips and Solo, 1992). The independence assumption between $x_t$ and $z_t$ is imposed to get useful mixed normal limit theory and a convenient inferential framework.⁴ The exogeneity and homoskedasticity are also imposed as in Cai et al. (2009). There are ways of dealing with these restrictions, such as developing a fully modified version of the FC estimation procedure or some new version of IVX estimation (Phillips and Magdalinos, 2009) and limit theory tailored to nonstationary FC models. But these options deserve another paper. In further extensions of this type to FCC regression models, many of the findings of the present work on the effects of local flatness of the functional coefficient will be relevant and can be explored in future work. The remaining conditions are as in Assumption 1. The conditions on $\beta(z_t)$ and its derivatives are needed in the nonstationary case because they figure in the development and appear in the asymptotic variance formula.
THEOREM 2.2. If Assumption 2 holds, if $\beta(z)$ has derivatives $\beta^{(\ell)}(z) = 0$ at $z$ for all $\ell = 0,1,2,\ldots,L-1$ and some integer $L \geq 1$ for which $\beta^{(L)}(z) \neq 0$, and $E\|\beta^{(L)}(z_t)\|^2 < \infty$, then the following limit theory holds under the respective rate conditions indicated:
(2.10)
and where the bias function , just as in Theorem 2.1.
The division of the limit theory of FC cointegrating regression into three categories was discovered in Phillips and Wang (2021) for the case where $L = 1$. Theorem 2.2 extends those results to the general case and reveals the effect on both the limit theory and the convergence rate of local flatness in the coefficient function at the point of estimation. As shown in Phillips and Wang (2021) and as is evident in the proof of Theorem 2.2, the presence of multiple categories in the limit theory arises because two different sources of variability occur in the asymptotics: one from the random elements of the bias function and one from the sample covariance of the regressor and the equation error. Correspondingly, the form of the limit theory itself changes, according to the behavior of $nh^{2L}$.
Category (i) where $nh^{2L} \to 0$ is comparable to the stationary case, but with convergence rate $n\sqrt{h}$ that embodies the $O(\sqrt{n})$ order of the I(1) regressor $x_t$ and a limit variance matrix that replaces the stationary sample moment matrix limit xx with the corresponding quadratic functional $\int B_x B_x'$ for the nonstationary case in the limit matrix NS in (2.11). The bias function $h^{L^*} B_L(z)$ in the centering of $\hat\beta(z)$ is identical to the stationary case and has the same order $O(h^{L^*})$. Mixed normal limit theory, but with different rates of convergence and different variance matrices, applies in cases (i), (ii) and the intermediate case (iii).
Remark 2.1 (Convergence-rate optimal bandwidth order). In case (iii) of Theorem 2.2 where $nh^{2L} \to c$ for some constant $c \in (0,\infty)$, the bandwidth $h \sim_a (c/n)^{\frac{1}{2L}}$ and then the convergence rate in case (ii) becomes $\sqrt{n/h^{2L-1}} = O\big(n^{1-\frac{1}{4L}}\big)$. Similarly, the convergence rate in case (i) becomes $n\sqrt{h} = O\big(n^{1-\frac{1}{4L}}\big)$. This duality between the two cases implies that the convergence rates in cases (i) and (ii) merge to the same $O\big(n^{1-\frac{1}{4L}}\big)$ rate for the intermediate situation where the bandwidth satisfies $nh^{2L} \to c$. In fact, the case $nh^{2L} \to c \in (0,\infty)$ yields the maximum convergence rate outcome for FCC regression because, for the boundary cases where $nh^{2L} \to 0$ or $nh^{2L} \to \infty$, we find that the respective convergence rates are $n\sqrt{h} = o\big(n^{1-\frac{1}{4L}}\big)$ and $\sqrt{n/h^{2L-1}} = o\big(n^{1-\frac{1}{4L}}\big)$. Thus, the FCC kernel regression convergence rate is optimal in the intermediate case where $nh^{2L} \to c \in (0,\infty)$. The associated convergence-rate optimal bandwidth, denoted $h_{opt}$, is of order $O\big(n^{-\frac{1}{2L}}\big)$. The limit distribution is a mixture of the mixed normal component MN (0, L (z)) (which comes from the random element of the bias function) and the mixed normal component MN (0, NS (z)) (which comes from the usual equation error term). The coefficients in this mixture are $c^{\frac{1}{2} - \frac{1}{4L}}$ and $c^{-\frac{1}{4L}}$. For instance, if the locally flat function $\beta(z)$ has the asymptotically regular form $\beta(z) \sim_a a_0 z^{L}$ as $L \to \infty$, the variance matrix L (z) component in the asymptotic variance in (2.10) remains stable when $L$ is large just like the $c^{-\frac{1}{2L}}$ NS (z) component. Therefore the random element coming from the bias function cannot be ignored even when the functional coefficient is sufficiently flat at the point of estimation.
However, with $h = O\big(n^{-\frac{1}{2L}}\big)$, based on case (iii) of Theorem 2.2, the bias term cannot be neglected because, after scaling by the convergence rate, it is of order $n\sqrt{h}\,h^{L^*} \to \infty$ when $L \geq 2$. Therefore when discussing the optimal bandwidth order, we need to take the bias effect into consideration, not only the convergence rate. This requires examination of the Mean Squared Error (MSE) optimal bandwidth order, as given next.
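The merging of rates described in Remark 2.1 can be checked with exact rational arithmetic. The sketch below takes the case (i) rate to be $n\sqrt{h}$ and reads the case (ii) rate as $\sqrt{n/h^{2L-1}}$, which is an assumption about the garbled source expression, and evaluates both at the bandwidth order $h \sim n^{-1/(2L)}$. The function names are hypothetical.

```python
from fractions import Fraction


def case_i_rate_exponent(L):
    """Exponent of n in the case (i) rate n * sqrt(h) when h ~ n^(-1/(2L))."""
    gamma = Fraction(-1, 2 * L)
    return 1 + gamma / 2


def case_ii_rate_exponent(L):
    """Exponent of n in the rate sqrt(n / h^(2L-1)) when h ~ n^(-1/(2L)).
    The square-root form of this rate is an assumed reading of the text."""
    gamma = Fraction(-1, 2 * L)
    return Fraction(1, 2) - (2 * L - 1) * gamma / 2


def merged_rate_exponent(L):
    """Common exponent 1 - 1/(4L) stated in Remark 2.1."""
    return 1 - Fraction(1, 4 * L)
```

For every positive integer `L`, the three functions return the same value, for example $3/4$ when $L = 1$ and $11/12$ when $L = 3$.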
Remark 2.2 (Optimal bandwidth order). We explore the optimal bandwidth order with respect to Root Mean Squared Error (RMSE). Let $h = O(n^{\gamma})$, $-1 < \gamma < 0$, and $\hat\beta(z) - \beta(z) = O(n^{g_L(\gamma)})$. The exponent function $g_L(\gamma)$ in the latter expression represents the order of the RMSE, which is determined by the maximum of the bias order and the standard deviation order. The subindex $L$ in $g_L(\gamma)$ indicates that the RMSE order function varies with parameter $L$.
First consider the case where $L$ is odd, in which case $L^* = L + 1$. Based on result (i) of Theorem 2.2, when $nh^{2L} \to 0$ or equivalently $\gamma < -\frac{1}{2L}$, we have
Similarly, suppose $L$ is even, in which case $L^* = L$, and $g_L(\gamma)$ can be derived based on Theorem 2.2. Thus, if $nh^{2L} \to 0$ or equivalently $\gamma < -\frac{1}{2L}$, we have
when $L$ is even. Note that (2.14) and (2.15) can be combined as
(2.16)
for $L \geq 2$. The $g_L(\gamma)$ functions are plotted in Figure 1. Evidently, when $L = 1$, the RMSE optimal bandwidth order is $h_{opt*} = O(n^{-1/2})$, which equals the convergence-rate optimal bandwidth order $h_{opt}$. For $L \geq 2$, the optimal bandwidth is $h_{opt*} = O\big(n^{-\frac{2}{2L^*+1}}\big)$, which is smaller than the convergence-rate optimal bandwidth order $h_{opt} = O\big(n^{-\frac{1}{2L}}\big)$. The discrepancy between these two optimal bandwidth rates is due to the fact that when $L \geq 2$ bias dominates variance in result (iii) of Theorem 2.2. To reduce bias, the RMSE optimal bandwidth prefers to select a smaller order. When $L$ is large, we can see the order of $h_{opt*}$, viz., $-\frac{2}{2L^*+1}$, is close to zero and then $h_{opt*}$ diminishes to zero at a very slow rate as $n \to \infty$. This outcome is consistent with heuristics as $\beta(z)$ is close to a constant function at the estimation point $z$ when $L$ is large; and estimation of an almost constant function requires only a very low degree of localization.
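The bandwidth orders quoted in Remark 2.2 can be reproduced with a short calculation. The sketch assumes that in case (i), where $\gamma < -\frac{1}{2L}$, the bias order is $L^*\gamma$ and the standard deviation order is $-(1 + \gamma/2)$, the latter following from the convergence rate $n\sqrt{h}$. Balancing the two gives $\gamma = -\frac{2}{2L^*+1}$, and the boundary value $-\frac{1}{2L}$ is used when the balance point falls outside case (i), which happens only for $L = 1$. The function names are hypothetical and the displayed formulas of the paper are not reproduced here.

```python
from fractions import Fraction


def l_star(L):
    """Bias order L*: L + 1 when L is odd, L when L is even."""
    return L + 1 if L % 2 == 1 else L


def g_case_i(L, gamma):
    """RMSE exponent under case (i): max of bias order L* * gamma and
    standard deviation order -(1 + gamma / 2). Assumed form."""
    return max(l_star(L) * gamma, -1 - gamma / 2)


def rmse_optimal_exponent(L):
    """Bandwidth exponent gamma minimizing the RMSE order: the bias-variance
    balance point if it lies inside case (i), else the boundary -1/(2L)."""
    balance = Fraction(-2, 2 * l_star(L) + 1)
    boundary = Fraction(-1, 2 * L)
    return balance if balance < boundary else boundary
```

With these assumptions `rmse_optimal_exponent(1)` is $-1/2$ and `rmse_optimal_exponent(2)` is $-2/5$, and evaluating `g_case_i` at those exponents gives RMSE orders $-3/4$ and $-4/5$ respectively.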
Remark 2.3 (MSE optimal bandwidth formula). The above analysis tells us that the RMSE optimal bandwidth order, or equivalently, the MSE optimal bandwidth order, is achieved within case (i) of Theorem 2.2. Taking the standard approach to optimal bandwidth selection that balances bias and variance (and using the scalar $x_t$ case for convenience) leads to the following formula compared with the stationary regressor case given in (2.6)
(2.17)
(2.18)
If, as before, the functional coefficient is flat at $z = 0$ with polynomial form $\beta(z) = \sum_{j=0}^{q} a_j z^{L+j}$ and $a_0 \neq 0$, then x in (2.18). Again, the optimal bandwidth $h_{opt*} = O_p(n^{-2/(2L+1)})$ and for large $L$ the optimal bandwidth shrinks at a very slow rate and the associated optimal convergence rate $n\sqrt{h_{opt*}}$ approaches $n$, giving a near-parametric convergence rate for extremely flat functions. This suggests that a larger bandwidth is needed for large $L$. But in practice in both stationary and nonstationary cases, $L$ is typically unknown, so $c_L(z)$ and the optimal bandwidth order are also unknown in the absence of information about $\beta(z)$ and its derivatives. While estimation of optimal bandwidths by cross validation or by the use of derivative function estimates is possible, these methods typically lead to very slow convergence rates in optimal bandwidth formulas even in the simplest cases (Hall and Marron, 1987; Hall et al., 1991). So the above findings are likely to be mainly of importance and use in theoretical work.
Remark 2.4 (Asymptotics with MSE optimal bandwidth). When $L = 1$, choice of the MSE optimal bandwidth order $h_{opt*}$ leads to asymptotics that are determined according to case (iii) of Theorem 2.2 since $n h_{opt*}^{2L} = O(1)$ when $L = 1$. More specifically, the limit theory for $\hat\beta(z)$ is given by
which matches the result in Phillips and Wang (2021, Thm. 2.1(c)) for the standard case of no flatness in $\beta(z)$. In this case, the bias can be neglected because
(2.20)
In this case, the random bias component involving L (z) can be ignored asymptotically but the deterministic bias term cannot be neglected because n when $L \geq 2$. As $L \to \infty$, the fastest convergence rate approaches $O_p(n^{-1})$, leading to the parametric cointegrating regression convergence rate as $L \to \infty$. As in the stationary case, this matches heuristic arguments because $\beta(z)$ approaches a constant function at the estimation point $z$ when $L \to \infty$.

Procedures for Inference
When $L$ is known or is correctly hypothesized, test statistics for inference about the functional coefficient $\beta(z)$ can be constructed in the usual way. Following Phillips and Wang (2021), but allowing now for local flatness in the coefficient function, we start with the matrix normalization⁵
where
and
The statistic $T(z;L)$ follows the same design as the robust t-test statistic developed in Phillips and Wang (2021) for the nonflat case with $L = 1$. The bias component $h^{L^*} B_L(z)$ in (3.1) and the second term of ˆ n (z;L) in (3.2) are both infeasible in practical work unless $L$ is known or is stated as part of a null hypothesis such as
(3.4)
Section 3.3.1 below discusses some of the difficulties involved in the direct estimation of the derivative order parameter $L$. An alternative way to determine $L$ empirically is to test whether successive derivatives of $\beta(z)$ are zero at the point of estimation using consistent kernel estimates,⁶ $\hat\beta^{(\ell)}(z)$, of the derivative functions $\beta^{(\ell)}(z)$ and conducting inference to detect zero derivatives at the point of interest. In most practical cases this procedure would involve examination of only the first derivative or first two derivatives ($\ell = 1,2$). Section 3.3.3 provides an implementation of a two-step pre-test procedure. Fortunately, simulation evidence presented in Section 4.2 indicates that the two-step pre-test approach works well in terms of coverage compared with the infeasible test procedure that employs correct information about $L$.
⁵ To avoid dealing with higher order bias when performing inference in this section we assume that the higher order bias can be ignored after being scaled by the convergence rate. In the stationary case, condition $\sqrt{nh}\,h^{L^*+1} \to 0$ is adequate to ensure this. In the nonstationary case, condition $n\sqrt{h}\,h^{L^*+1} \to 0$ works for case (i). Conditions for cases (ii) and (iii) can be obtained in a similar fashion.
⁶ Derivative estimates can be obtained in the usual way by employing higher order polynomial approximation.
Under $H_0$ the statistic $T(z;L_0)$ may be used to construct a robust Hotelling's $T^2$ type statistic based on the quadratic form $T^2(z;L_0) = T(z;L_0)'T(z;L_0)$, so that
The following result shows that under the null $H_0$ with use of the correct value of $L$ the statistics $T(z;L_0)$ and $T^2(z;L_0)$ satisfy $T(z;L_0) \rightsquigarrow N(0,I_p)$ and $T^2(z;L_0) \rightsquigarrow \chi^2_p$ as $n \to \infty$. This pivotal limit theory provides a basis for performing inference about $\beta(z)$ when the functional coefficient is locally flat and the flatness parameter $L$ is correctly hypothesized. This approach covers both stationary and nonstationary regressor cases.

Test Power
When the null hypothesis is false and the true value of the functional coefficient $\beta(z) \neq \beta_0$ but the maintained hypothesis $L = L_0$ is correct, asymptotic power can be explored under local alternatives of the form
where $m(z)$ is a $p$-vector function whose modulus is bounded away from the origin and $\rho_n$ is a real sequence for which $\rho_n \to 0$. Let $\chi^2_p(\alpha)$ be the $1-\alpha$ right tail critical value of the $\chi^2_p$ distribution. Then, under $H_{1,\beta}$ we have
for any $\rho_n$ satisfying $\rho_n^2 nh \to \infty$ if $x_t$ is stationary and Assumption 1 holds or for $\rho_n$ satisfying $\rho_n^2 n^2 h \to \infty$ if $x_t$ is nonstationary, $nh^{2L_0} \to 0$ and Assumption 2 holds. To prove (3.5) first consider the stationary case. In view of Theorem 2.1 we have, under
just as in the proof of (A.25), we obtain
It follows that under $H_{1,\beta}$ and Assumption 1
so that (3.5) holds when $\rho_n^2 nh \to \infty$ and $m(z) \neq 0$ in the stationary case. In the nonstationary case, the analysis can be carried out separately depending on the rate of $nh^{2L_0}$. We take $nh^{2L_0} \to 0$ as an example. When $nh^{2L_0} \to 0$ and $nh \to \infty$, from Theorem 2.2(i) under $H_{1,\beta}$, we have
where
(3.10)
Hence, under $H_{1,\beta}$, Assumption 2 and with $nh^{2L_0} \to 0$ and $nh \to \infty$ we have
where $MN_m(\cdot,\cdot)$ signifies a mean mixture normal distribution.⁷ The test statistic $T^2(z;L_0)$ then diverges when $\rho_n^2 n^2 h \to \infty$ because it is asymptotically distributed as a mixture noncentral chi-square variate with the divergent noncentrality param-
It follows that (3.5) holds in the nonstationary case when $\rho_n^2 n^2 h \to \infty$ and $m(z) \neq 0$.
Results for $nh^{2L_0} \to \infty$ and $nh^{2L_0} \to c \in (0,\infty)$ can be obtained in similar ways and the details are omitted. In the case where $nh^{2L_0} \to \infty$, the condition
Before closing this section we point out that this test is not designed to detect alternatives specifically about $L$ in either stationary or nonstationary cases. To illustrate the difficulties involved in such alternatives, we take the stationarity case with $nh^{2L_0} \to 0$ and consider the alternative $H_{1,L}: \beta(z) = \beta_0,\ L_0 < L$ where the flatness degree $L$ exceeds the hypothesized $L_0$. To examine test power observe that
The order of the second term depends on both $L$ and $L_0$, noting that the order of $B_{L_0}(z)$ depends on $L_0$ and $L$ through the empirical estimates $\hat\beta^{(L_0)}(z)$ and $\hat\beta^{(L_0+1)}(z)$ that are used in the construction of the test statistic. Due to the fact that $L$ is unknown under $H_{1,L}$, the order of magnitude of the component $\sqrt{nh}\,\big[h^{L^*} B_L(z) - h^{L_0^*} B_{L_0}(z)\big]$ cannot be precisely determined and the power characteristics of the test are not known. Since these properties of the test in the case of departures of $L$ from the hypothesized $L_0$ are unknown, the statistic is not designed to test hypotheses concerning the flatness order $L$.

Direct Estimation of L.
The statistic $T(z;L)$ cannot be used in practical work if $L$ is unknown or is not part of the null or an explicit maintained hypothesis. A natural approach if this were not the case but if $L$ were directly estimable (by $\hat L$, say) would be to employ plug-in estimates $\hat\beta^{(\hat L)}(z)$ and $\hat\beta^{(\hat L+1)}(z)$ of the required derivatives of $\beta(z)$ in the bias and variance matrix components of $T(z;L)$. However, as the analysis below reveals, in the general case of unknown $L$ such a plug-in approach encounters difficulties because of the challenge of direct consistent estimation of $L$. Further, as earlier analysis reveals, the optimal bandwidth order in FC regression depends on the flatness degree parameter $L$. Since $L$ is a higher order property of an unknown nonparametric function $\beta(z)$, this dependence poses a subtle question of how to determine the bandwidth $h$ in estimation and inference.
In this respect, noting that β where zt lies on the line segment between z t and z and β (L) (z) = 0 by assumption, it follows that as n → ∞ and h → 0 as shown in Part 1 in the Supplementary Material.Of course, L † n is an infeasible rate estimator reliant on the unknown function β(•) in a neighborhood of z.It has a slow logarithmic convergence rate with Setting w tz = K tz / n t=1 K tz , this limit behavior suggests the following "plausible" practical estimate of L which can be computed using a preliminary bandwidth h satisfying h → 0 and nh → ∞.However, the estimator L is consistent only when bandwidth h is appropriately selected.Part 2 of the Supplementary Material provides a rigorous demonstration of this property of L in the cases L = 1 and L = 2.It is found that when L = 1, L is consistent only when condition nh 3 → c ∈ (0,∞] is satisfied and when L = 2, the consistency condition is In general, we expect that when nh 2L+1 → c ∈ (0,∞] holds L would be consistent for L ≥ 1.Hence, "appropriate selection" of the bandwidth for L to be consistent in general requires knowledge of L, making L infeasible in practice.Simulations that confirm these findings are also provided in Part 2 in the Supplementary Material. A further complication that should be mentioned is that even if a consistent estimator of L were available, bias correction requires specification of the bandwidth factor h(L) = h L in h L B L (z), which presents additional difficulties.For example, whereas the infeasible estimator L † n → p L, the consistency of L † n does not mean that h L † n ∼ a h L .At the end of Part 2 in the Supplementary Material, we show that h L † n is inconsistent for h L .Thus the slow rate of convergence of L † n interferes with consistent estimation of the factor h L needed for bias correction.In view of all these technical difficulties, the feasibility of direct consistent estimation of L requires further study and is left for future research.

Adaptive Statistic Design.
This section comments briefly on the possibility of constructing an adaptive test statistic that does not require knowledge of L. The idea stems from Remarks 3.2 and 3.3 in Phillips and Wang (2021) where a statistic is developed that incorporates bias and variance matrix estimators that do not involve L but instead rely on local information about the function obtained by kernel estimation.In principle, it is straightforward to extend this idea to the case where L > 1.Take the stationary case as an example.The adaptive bias estimator is defined as where the sample average 1 Unfortunately this adaptive bias estimator B(z) is not consistent for the true bias when L > 1 because local kernel estimation in β(z t ) − β(z) is insufficiently precise to capture the required derivative components.In consequence, the limit of B(z) has many additional terms when L > 1.Moreover, direct (bias correction) adjustment to achieve consistent bias estimation is not possible because the limit of B(z) depends on the unknown value of L.More details are provided in the Supplementary Material showing how the adaptive bias estimator fails in flat regions of the function where L > 1 in both stationary and nonstationary cases.There are further obstacles to inference in the adaptive bias estimator B(z) due to additional variation that affects the limit distribution of the bias centered term β(z) − β(z) − B(z).In the nonstationary case, the variance of this term depends on L and β (L) (z), making it difficult to estimate the limit variance adaptively without introducing further bias effects.These complications combine to make it difficult to design an adaptive statistic in cases where the flatness degree is unknown, leaving this pursuit as a further challenge for future research.
3.3.3. Pre-testing. In response to these challenges we propose a feasible pre-test method for practical work in cases where flatness may be suspected. The approach is to focus on the first two derivatives and determine $L$ by pre-testing whether these derivatives are zero. Inference is implemented using the self-normalized statistic $T(z;L)$ with $L$ replaced by the pre-testing estimate. A two-step pre-test estimation procedure is suggested.
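Step 1: Test $H_{0,\beta^{(1)}}: \beta^{(1)}(z) = 0$. If $H_{0,\beta^{(1)}}$ is rejected, conclude that $L = 1$ and the procedure stops. Otherwise, continue to Step 2.
Step 2: Test $H_{0,\beta^{(2)}}: \beta^{(2)}(z) = 0$. If $H_{0,\beta^{(2)}}$ is rejected, conclude that $L = 2$ and the procedure stops. Otherwise, we simply let $L = 3$.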
The procedure stops at L = 3 for simplicity and because this measure of flatness should suffice in most practical situations.Significance tests involving the derivatives are constructed as follows.To test H 0,β (1) we use limit theory for the derivative estimator β(1) (z).In our ongoing work (Wang and Phillips, 2022) using local p-th order polynomial estimation, the asymptotic theory for derivative estimators β(k) (z) (k = 0,1,2, . . .,p) in FC regressions are obtained.For example, with nonstationary x t and local linear (p = 1) estimation, the asymptotic theory for β(1) (z) (k = 1) is given by where ∼ a signifies asymptotic equivalence, , and B (x,2) = B x B x .From (3.13), it is not hard to see that the optimal bandwidth order that balances bias and asymptotic variance is n −2/7 .Moreover, with this optimal bandwidth order the first term on the right side of (3.13) is negligible relative to the second term as ).Further, result (3.13) remains asymptotically valid when local flatness exists as in that situation the first term is of smaller order than √ h/n and is again negligible when the optimal bandwidth order h = O(n −2/7 ) is used.Hence the t-ratio for testing H 0,β (1) : β (1) (z) = 0 can be constructed based on (3.13).The construction follows that of the t-ratio given in (3.1).Estimation of the asymptotic variance in (3.13) can be constructed in the same way as (3.2) since the asymptotic variances β,1,1 (z) and u,1,1 (z) in (3.13) take similar forms to those in (2.11) and (2.12) and are not repeated.For testing H 0,β (2) : β (2) (z) = 0, the asymptotic limit theory takes a more complicated form and is given in Appendix D.
Denoting the pre-test estimator of $L$ obtained in this way by $\hat{L}_{pre}$, tests can be constructed using the plug-in statistic $T(z;\hat{L}_{pre})$. Further, with $\hat{L}_{pre}$ available, the relevant optimal bandwidth order can be employed, based on (2.6) for stationary $x_t$ and Remark 2.2 for nonstationary $x_t$. In this way bandwidth is adaptively determined to account for any local flatness of the function. Simulation performance of the feasible statistic $T(z;\hat{L}_{pre})$ is reported in Section 4.2.
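As an illustration only, the two-step decision rule can be sketched in Python. The function name `pretest_flatness`, the two-sided 5% level, and the use of standard normal critical values are assumptions made for this sketch and are not taken from the paper. The derivative t-ratios are assumed to have been computed beforehand from the limit theory described above.

```python
from statistics import NormalDist


def pretest_flatness(t_beta1: float, t_beta2: float, level: float = 0.05) -> int:
    """Return the pre-test value of L (1, 2, or 3) from two derivative t-ratios.

    t_beta1, t_beta2: t-ratios for H0: beta^(1)(z) = 0 and H0: beta^(2)(z) = 0.
    level: assumed two-sided significance level (hypothetical default).
    """
    # Two-sided critical value from the standard normal distribution (assumption).
    crit = NormalDist().inv_cdf(1.0 - level / 2.0)
    if abs(t_beta1) > crit:
        # Step 1: first derivative is significantly nonzero, so L = 1.
        return 1
    if abs(t_beta2) > crit:
        # Step 2: second derivative is significantly nonzero, so L = 2.
        return 2
    # Neither derivative is significant: the procedure stops at L = 3.
    return 3


if __name__ == "__main__":
    print(pretest_flatness(3.0, 0.1))  # first derivative significant
    print(pretest_flatness(0.5, 2.5))  # only second derivative significant
    print(pretest_flatness(0.5, 0.3))  # neither significant
```

The returned value would then be used both in the plug-in statistic and in the choice of bandwidth order.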

SIMULATIONS
The simulation experiments that follow employ a simple prototypical framework for evaluating the adequacy of the asymptotic theory. We explore the behavior of the FC estimators and the adequacy of the limit theory in locally flat and nonflat regions of the function. The following sections consider estimation and inference in stationary and nonstationary cases separately.

Nonstationary x t
In the first experiment the model (2.1) is used with a single I(1) exogenous regressor $x_t$ generated as a random walk with iid $N(0,\sigma_x^2)$ innovations $\varepsilon_{xt}$ and zero initialization $x_0$, iid $N(0,\sigma_u^2)$ equation errors $u_t$, and iid $U[-1,2]$ covariates $z_t$. We set $\sigma_x^2 = 1$ and $\sigma_u^2 = 1$. Throughout the simulations, the number of replications used is 10,000 and the coefficient function is the quartic $\beta(z) = z^4$, for which the first three derivatives at $z_1 = 0$ are zero and $\beta^{(4)}(z_1) = 4!$, whereas $\beta^{(1)}(z_2) = 4 \neq 0$ at $z_2 = 1$. Figure 2 shows the mean bias (left panel), standard deviation (middle panel), and RMSE (right panel) for $\hat{\beta}(z)$ calculated at the points $\{z = 0, 1\}$ using samples of size $n = 100$, $400$, and $800$. In estimation we employ a Gaussian kernel and the bandwidth formula $h = \sigma_z \times n^{\gamma}$. The range $-0.90 \le \gamma \le -0.05$ is used to meet the condition $nh \to \infty$ and to avoid extremely small bandwidths for which there is considerable imprecision in the simulation estimates, as is evident in the plotted curves for the standard deviation and RMSE near the left limit of the domain of definition.$^{8}$ The plots show significant differences in estimator behavior between the two points of estimation $\{z = 0, 1\}$, which we summarize as follows.
(i) Bias increases as the bandwidth widens and the bandwidth power $\gamma \to 0$. For very wide bandwidths, estimates at both $z_1 = 0$ and $z_2 = 1$ suffer large bias. However, bias is smaller and usually much smaller at the point $z_1 = 0$ of locally flat functional form than at point $z_2 = 1$. These findings all match the asymptotic theory in Theorem 2.2, which shows that bias has order $h^{L^*}$, which is $h^4$ when $z_1 = 0$ where $L^* = L = 4$, compared with $h^2$ when $z_2 = 1$ where $L^* = L + 1 = 2$ with $L = 1$.
(ii) Standard deviation rises in estimation at both points of estimation as the bandwidth becomes very small when $\gamma \to -1$ or as bandwidth becomes very large when $\gamma \to 0$. This outcome corresponds to asymptotic theory where there are three convergence rates for the cases given in Theorem 2.2, where it is shown that the highest convergence rate (or minimum standard deviation) occurs in the intermediate bandwidth contraction case with $h = O(n^{-\frac{1}{2L}})$. When the bandwidth is very small ($\gamma$ close to $-1$), considerable volatility in the standard deviation estimates was found even with a large number of replications, particularly for smaller sample sizes. We therefore only report results for $\gamma \ge -0.90$ and some volatility in the estimates is evident in the graphics close to this lower limit. The standard deviation of $\hat{\beta}(z)$ at $z_2 = 1$ is seen to be substantially greater than that at $z_1 = 0$ except for small bandwidths, again matching the limit theory.
(iii) The RMSE curves demonstrate similar U-shaped patterns to those of the standard deviation curves. This simulation evidence corroborates the analysis in Remark 2.2, where it is shown that the RMSE order $g_L(\gamma)$ has a check function shape with $L = 4$. Further, the RMSE is considerably lower when $\beta(z)$ is flat at $z_1 = 0$ than when the coefficient function is rising at $z_2 = 1$. These gains hold throughout a wide range of bandwidth powers except for smaller bandwidths.
(iv) Across panels (a)-(c) in Figure 2, the main impact of larger sample sizes is the anticipated reduction in the bias, standard deviation, and RMSE, which applies to both $z_1 = 0$ and $z_2 = 1$ cases and across all bandwidth powers.

To better illustrate the optimal bandwidth order discussed in Remarks 2.1 and 2.2, we report the bandwidth power values corresponding to the minimum points of the standard deviation and RMSE curves from the simulations in Figure 2. Results are collected in Table 1 under the panel headed "$x_t$ is nonstationary". According to Remark 2.1, the convergence-rate, or equivalently, the standard-deviation optimal bandwidth order is achieved at $-\frac{1}{2L}$, which is $-\frac{1}{8} \approx -0.13$ for $z_1 = 0$ ($L = 4$) and $-\frac{1}{2}$ for $z_2 = 1$ ($L = 1$). Following Remark 2.2, the RMSE optimal bandwidth order is $-\frac{2}{2L+1} = -\frac{2}{9} \approx -0.22$ for $z_1 = 0$ ($L = 4$) and $-\frac{1}{2}$ for $z_2 = 1$ ($L = 1$). These are the figures reported in the last row of Table 1 for $n = \infty$.$^{9}$ Only when $L = 1$ are these two optimal bandwidth orders the same both here and for $z_2 = 1$ in Table 1. When $L = 4$, the convergence-rate optimal bandwidth power is larger than the RMSE optimal bandwidth power. In Table 1 it is evident that for $z_1 = 0$, the standard-deviation optimal bandwidth order estimates are larger than the RMSE optimal bandwidth order estimates. Moreover, as the sample size $n$ increases, the optimal bandwidth order estimates approach the corresponding limit values reported in the last row for $n = \infty$. These results again corroborate the analysis in Remarks 2.1 and 2.2 showing that the RMSE optimal bandwidth rate equals the convergence-rate optimal bandwidth order when $L = 1$ or is less than the convergence-rate optimal bandwidth order when $L \ge 2$.

$^{9}$ The numbers in this row are the optimal bandwidth orders based on the asymptotic theory as $n \to \infty$. For the case that $x_t$ is nonstationary, the standard-deviation optimal bandwidth order is the convergence-rate optimal bandwidth order analyzed in Remark 2.1. It is given as $-\frac{1}{2L}$, which is $-\frac{1}{8} \approx -0.13$ for $z_1 = 0$ ($L = 4$) and $-\frac{1}{2}$ for $z_2 = 1$ ($L = 1$). The RMSE optimal bandwidth follows Remark 2.2, which is $-\frac{2}{2L^*+1} = -\frac{2}{9} \approx -0.22$ for $z_1 = 0$ ($L = L^* = 4$) and $-\frac{1}{2}$ for $z_2 = 1$ ($L = 1$). For the case that $x_t$ is stationary, the RMSE optimal bandwidth order is $-\frac{1}{2L^*+1}$ as given in (2.6). Then the true value for $z_1 = 0$ ($L = L^* = 4$) is $-\frac{1}{9} \approx -0.11$ and that for $z_2 = 1$ ($L = 1$, $L^* = 2$) is $-\frac{1}{5} = -0.20$.
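The limiting bandwidth powers quoted for Table 1 can be reproduced with a few lines of exact arithmetic. The sketch below is illustrative only: the function names are hypothetical, the rule $L^* = L$ for even $L$ and $L^* = L + 1$ for odd $L$ is an assumption inferred from the two cases reported in the text ($L = 4$ and $L = 1$), and the nonstationary RMSE value for $L = 1$ is hard-coded at $-1/2$ as stated in the text rather than derived.

```python
from fractions import Fraction


def l_star(L: int) -> int:
    # Assumption: L* = L for even L and L + 1 for odd L
    # (matches L* = 4 at L = 4 and L* = 2 at L = 1 in the text).
    return L if L % 2 == 0 else L + 1


def sd_optimal_power_nonstationary(L: int) -> Fraction:
    # Convergence-rate (standard-deviation) optimal power: -1/(2L).
    return Fraction(-1, 2 * L)


def rmse_optimal_power(L: int, stationary: bool) -> Fraction:
    Ls = l_star(L)
    if stationary:
        # Stationary case: -1/(2L* + 1).
        return Fraction(-1, 2 * Ls + 1)
    if L == 1:
        # Value reported in the text for L = 1 (not derived here).
        return Fraction(-1, 2)
    # Nonstationary case with flatness: -2/(2L* + 1).
    return Fraction(-2, 2 * Ls + 1)


if __name__ == "__main__":
    for L in (4, 1):
        print(
            L,
            sd_optimal_power_nonstationary(L),
            rmse_optimal_power(L, stationary=False),
            rmse_optimal_power(L, stationary=True),
        )
```

Using `Fraction` keeps the powers exact, so they can be compared directly with the rounded figures $-0.13$, $-0.22$, and $-0.11$.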

Stationary x t
In the second experiment the same model (2.1) is used but with a stationary exogenous regressor $x_t$ generated by the autoregression $x_t = \theta x_{t-1} + \varepsilon_{xt}$ with iid $N(0,\sigma_x^2)$ innovations $\varepsilon_{xt}$ and zero initialization $x_0$, iid $N(0,\sigma_u^2)$ equation errors $u_t$, and iid $U[-1,2]$ covariates $z_t$. We set $\sigma_x^2 = 1$, $\sigma_u^2 = 1$, and $\theta = 0.5$. Again 10,000 replications are employed. The results for bias, standard deviation, and RMSE are shown in Figure 3. The plots for the stationary case mirror those in Figure 2 for the nonstationary FC cointegrating (FCC) case. The imprecision in the simulation estimates at small bandwidths is more severe than in the nonstationary case and results are accordingly reported here for the reduced bandwidth power region $-0.8 \le \gamma \le -0.05$. The findings for the stationary case are summarized below.
(v) The main difference with the nonstationary model occurs in the standard deviation curves. Unlike the nonstationary case, Theorem 2.1 shows that the convergence rate on the left hand side is unaffected by the local flatness parameter $L$ or the bandwidth rate condition $nh^{2L}$. We therefore expect to see monotonically decreasing standard deviation curves for both points of estimation $\{z = 0, 1\}$ as the bandwidth power $\gamma$ increases. From the middle panel of Figure 3, we observe that the standard deviation curve for $z_1 = 0$ indeed shows a decreasing pattern as $\gamma$ increases to 0, but that for $z_2 = 1$ the curve starts to rise slightly when $\gamma$ is close to 0. This is explained by the randomness that is present in the bias function in finite samples. Although the randomness in the bias function is of smaller order than that of the usual error term asymptotically and therefore does not figure in the limit theory, it can still affect finite sample performance. Moreover, the finite sample effects are less severe when the functional coefficient is locally flatter (with larger $L$) because the bias is smaller when $L$ is larger. This explains why a marked rise in the curve is only observed towards the right limit near $\gamma = 0$ of the domain of definition of the standard deviation curves for $z_2 = 1$ but not for the curves for $z_1 = 0$.
(vi) For both RMSE curves, there is also a clear minimum RMSE bandwidth choice as in the nonstationary case. Furthermore, the curves indicate that the minimum RMSE bandwidth power $\gamma$ is larger for estimation at $z_1 = 0$ than at $z_2 = 1$. Direct evidence of this difference is given by the estimates of the RMSE optimal bandwidth power reported in Table 1 under the panel "$x_t$ is stationary". These findings corroborate the analysis concerning the optimal RMSE bandwidth order following Theorem 2.1.
(vii) The plots in Figure 3 show the finite sample gains in estimation that occur from local flatness of the functional coefficient. These gains occur for bandwidths large enough to be well beyond the region where there is imprecision in the simulation estimates of the standard deviation and RMSE.
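For readers wishing to replicate the flavor of these experiments, the following is a minimal sketch of one simulation draw and the kernel estimate at a point. It assumes that model (2.1) takes the scalar form $y_t = \beta(z_t) x_t + u_t$ with $\beta(z) = z^4$, that $\sigma_z$ in the bandwidth formula is replaced by the sample standard deviation of $z_t$, and that estimation is by local level (kernel-weighted least squares) regression. The function names `simulate` and `fc_estimate` are hypothetical, and the sketch uses only the Python standard library.

```python
import math
import random


def simulate(n, stationary, theta=0.5, seed=0):
    """Draw (y_t, x_t, z_t), t = 1..n, with beta(z) = z**4 (assumed model form)."""
    rng = random.Random(seed)
    x_prev, data = 0.0, []  # zero initialization x_0 = 0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)  # innovation with sigma_x^2 = 1
        # AR(1) with coefficient theta if stationary, random walk otherwise.
        x = (theta * x_prev if stationary else x_prev) + eps
        z = rng.uniform(-1.0, 2.0)  # covariate z_t ~ U[-1, 2]
        u = rng.gauss(0.0, 1.0)  # equation error with sigma_u^2 = 1
        data.append(((z ** 4) * x + u, x, z))
        x_prev = x
    return data


def fc_estimate(data, z0, gamma):
    """Local level FC estimate of beta(z0) with a Gaussian kernel, h = sd(z) * n**gamma."""
    n = len(data)
    zs = [z for _, _, z in data]
    mean_z = sum(zs) / n
    sd_z = math.sqrt(sum((z - mean_z) ** 2 for z in zs) / (n - 1))
    h = sd_z * n ** gamma
    num = den = 0.0
    for y, x, z in data:
        # Gaussian kernel weight; the normalizing constant cancels in the ratio.
        k = math.exp(-0.5 * ((z - z0) / h) ** 2)
        num += x * y * k
        den += x * x * k
    return num / den


if __name__ == "__main__":
    for stationary in (False, True):
        sample = simulate(400, stationary=stationary, seed=1)
        print(stationary, fc_estimate(sample, 0.0, -0.2), fc_estimate(sample, 1.0, -0.2))
```

Repeating the draw over many seeds and a grid of $\gamma$ values would give Monte Carlo bias, standard deviation, and RMSE curves of the kind described above, although the exact figures reported in the paper are not reproduced here.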

Inference
This section reports findings on the finite sample performance of the t-ratios discussed in Section 3. Four statistics are considered: (i) the oracle t-ratio $T(z;\text{true } L)$, which assumes $L$, the derivatives $(\beta^{(L)}(z), \beta^{(L+1)}(z))$, and other components $\sigma_u^2$, $f(z)$, and $f^{(1)}(z)$ are known; (ii) the infeasible statistic $T(z;\text{true } L)$ in which the true value of $L$ is used and other components are estimated; (iii) the "naive" statistic $T(z;L = 1)$ where $L = 1$ is used as the simplest case without any attention to potential flatness; and (iv) the pre-test statistic $T(z;\hat{L}_{pre})$. The oracle and infeasible statistics provide two baselines against which to assess the relative performance of the naive and pre-test statistics. The same generating mechanism is used as in the previous section and we again consider the two evaluation points $z_1 = 0$ (with $L = 4$) and $z_2 = 1$ (with $L = 1$). The bandwidth formula $h = \sigma_z n^{\gamma}$ and a second order Epanechnikov kernel are employed throughout the computations. The naive t-ratio $T(z;L = 1)$ uses $\gamma = -1/2$ for nonstationary $x_t$ and $\gamma = -1/5$ for stationary $x_t$ in the computation of $\hat{\beta}(z)$ and $K_{tz}$. The infeasible statistic $T(z;\text{true } L)$ is identical to the naive choice when $L = 1$. When $L = 4$, it uses the optimal order $\gamma = -1/(2L^* + 1)$ for stationary $x_t$ and $\gamma = -2/(2L^* + 1)$ for nonstationary $x_t$ for the computation of $\hat{\beta}(z)$ and $K_{tz}$. Given a known $L$ or the estimate $\hat{L}_{pre}$, the oracle t-ratio $T(z;\text{true } L)$ and the pre-test statistic $T(z;\hat{L}_{pre})$ also use the optimal bandwidth order in the computation of $\hat{\beta}(z)$ and $K_{tz}$. Other details concerning computation are given in Appendix C.
The empirical densities of these t-ratio statistics are shown in Figure 4 for stationary $x_t$ and in Figure 5 for nonstationary $x_t$. From Figure 4, at the flat point $z_1 = 0$ the densities of the oracle statistic are evidently extremely close to the standard normal. Densities of the naive, infeasible, and pre-test statistics show some discrepancy from the standard normal, but the distribution of the infeasible statistic is closer to standard normal when the sample size is large. The improvement of the infeasible over the naive and pre-test statistics reveals the gains from knowledge of $L$, or equivalently, the consequences of ignoring or incorrectly estimating local flatness in the coefficient function at the point of flatness. The naive and pre-test methods share similar performance. At the nonflat point, the oracle statistic is the one closest to standard normal. The naive and infeasible statistics are identical in this case because the true value of $L$ is 1. The pre-test statistic is competitive with these. Compared to the performance at the flat point $z_1 = 0$, the densities are closer to standard normal at the nonflat point. In the nonstationary case in Figure 5, the oracle statistic is again extremely close to standard normal at the flat point. The naive, infeasible, and pre-test statistics are competitive in performance, although all are too densely distributed at the origin. The pre-test statistic is slightly closer to standard normal. At the nonflat point, the naive and the infeasible distributions are again identical and their performance is competitive with that of the oracle statistic. The pre-test statistic shares the same competitive performance as the naive and infeasible statistics. In summary, ignoring local flatness in the coefficient function seems to cause some efficiency loss at the flat point and the pre-test method appears to recover this lost efficiency slightly in the nonstationary case. To see these features more clearly and consider their implications for inference we explore the coverage rates and confidence interval lengths of the associated tests.
Table 2 reports coverage rates and lengths of the confidence intervals constructed at the two points $z_1 = 0$ and $z_2 = 1$ using the four statistics $T(z;\text{true } L)$, $T(z;\text{true } L)$, $T(z;L = 1)$, and $T(z;\hat{L}_{pre})$. Results are based on 20,000 replications. In all situations considered, the oracle statistic has the best performance, with coverage rates close to the nominal level and the narrowest confidence intervals in most cases. The infeasible statistic has the second best performance. In the nonstationary case, it has coverage rates close to nominal levels but with confidence intervals slightly wider than those of the oracle. In the stationary case, the efficiency loss of the infeasible statistic versus the oracle is manifest in the lower coverage rate, while the confidence interval can be slightly narrower than the oracle. The naive method is identical to the infeasible method when $L = 1$. When $L = 4$, the naive method shares similar coverage rates with the infeasible method, but with much wider confidence intervals, especially when $x_t$ is nonstationary. This finding reflects the efficiency loss of the naive method caused by ignoring local flatness when it is present. The pre-test method has very similar performance to the infeasible method, especially when $L = 1$. When $L = 4$, pre-testing achieves similar coverage rates to the infeasible statistic at the cost of slightly wider confidence intervals. It is worth noting that in the case of $L = 4$, the pre-test method significantly outperforms the naive method with much narrower confidence intervals. This finding reflects the efficiency gain of the estimator $\hat{L}_{pre}$ compared to simply treating $L$ as 1. In sum, the pre-testing and infeasible statistics share similar coverage rates but with slightly wider confidence bands around the flat region for the pre-test statistic. Pre-testing appears to deliver significant efficiency gains over the naive method with much narrower confidence bands in the flat case. These results indicate a preference for the pre-testing approach among the feasible statistics.

Table 2. Coverage rates and confidence interval length (in brackets) at points $(z_1, z_2)$ based on the oracle t-ratio $T(z;\text{true } L)$, the infeasible t-ratio $T(z;\text{true } L)$, the naive t-ratio $T(z;L = 1)$, and the pre-test t-ratio $T(z;\hat{L}_{pre})$.
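The two summary measures reported in Table 2 can be computed from replication output as in the following sketch. It is a generic illustration rather than the authors' code: it assumes each replication delivers a (bias-corrected) point estimate and a standard error, that intervals are symmetric normal-based intervals, and that the nominal level is 95%. The function name `coverage_and_length` is hypothetical.

```python
from statistics import NormalDist


def coverage_and_length(estimates, std_errors, truth, level=0.95):
    """Empirical coverage rate and average length of normal-based intervals."""
    # Two-sided critical value for the assumed nominal level.
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    covered, total_length = 0, 0.0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - crit * se, est + crit * se
        covered += int(lower <= truth <= upper)  # does the interval contain the truth?
        total_length += upper - lower
    m = len(estimates)
    return covered / m, total_length / m


if __name__ == "__main__":
    # Three artificial replications with unit standard errors and true value 0.
    rate, length = coverage_and_length([0.0, 0.1, 3.0], [1.0, 1.0, 1.0], truth=0.0)
    print(rate, length)
```

In the artificial example, two of the three intervals contain the true value, so the coverage rate is 2/3, and every interval has the same length of twice the critical value.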
Figure 6 further demonstrates the overall performance of the four statistics in the support of $z_t$. The coverage rate curves and lengths of the confidence intervals are plotted over the support $[-1,2]$ of $z_t$ for sample sizes $n = 200$ and $n = 800$. It is evident that the pre-test statistic has narrower confidence bands in the flat region (the vicinity of the flat point $z = 0$) and the phenomenon is stronger in the nonstationary case. These findings are explained by the results in Figure 7, which shows the empirical frequencies of the pre-test estimator $\hat{L}_{pre}$, where it is clear that $\hat{L}_{pre}$ tends to over-estimate $L$ in the flat region. The over-estimation is more severe in the nonstationary case, although it is mitigated by increasing the sample size. At the same time, Figure 6 shows that the coverage rate of the pre-test statistic is competitive with the naive and infeasible statistics in the flat region. As we move away from the flat point $z = 0$, we find: (i) that the confidence bands of the pre-test statistic grow wider and finally merge with the other three methods; and (ii) that the coverage rate suffers a very small drop and then merges with the naive and infeasible methods. The small drop in coverage rate may be caused by the inferior performance of the derivative significance test in the shape transformation region of the coefficient function. In sum, the pre-test method has much narrower confidence bands around the flat region of the function at the cost of slightly lower coverage rates in the transformation region. In terms of overall performance, these results provide some promising support for practical use of the pre-test method.

CONCLUSION
This paper extends existing limit theory in functional regression to accommodate locally constant coefficients in the regression model (2.1), allowing for both stationary and nonstationary regressors $x_t$. The findings show that, in the stationary case, the primary effects on the limit theory involve estimation bias, which in turn affects optimal bandwidth choice and optimal convergence rates. In the nonstationary case, both bias and dispersion are affected in the limit theory. As a result, the conditions that separate the limit theory into three different categories are affected by the flatness degree parameter. In particular, both bias and variance depend on the number $(L - 1)$ of zero derivatives in the coefficient function, with consequential effects on optimal bandwidth choice and rates of convergence. In the boundary case where $L \to \infty$, near parametric rates of convergence apply for both stationary and nonstationary cases. In both cases, locally flat functional coefficients make wider bandwidth choices beneficial compared with those implied by standard limit theory. But optimal bandwidth choice is complicated by the fact that bias-variance trade-offs may not correspond to optimal convergence rates and bias correction is more complex due to the locally flat behavior of the coefficient function.
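The near parametric rate in the boundary case can be seen from simple arithmetic in the stationary setting. The sketch below assumes that the RMSE optimal bandwidth is $h \sim n^{-1/(2L^*+1)}$, so that the bias order $h^{L^*}$ and the standard deviation order $(nh)^{-1/2}$ are both $n^{-L^*/(2L^*+1)}$. The function name is hypothetical and the calculation is a heuristic illustration, not a restatement of the theorems.

```python
from fractions import Fraction


def stationary_rate_exponent(l_star: int) -> Fraction:
    """Exponent r such that the RMSE is of order n**(-r) at the optimal bandwidth."""
    # With h ~ n^{-1/(2L*+1)}: h^{L*} = n^{-L*/(2L*+1)} and (nh)^{-1/2} = n^{-L*/(2L*+1)}.
    return Fraction(l_star, 2 * l_star + 1)


if __name__ == "__main__":
    for l_star in (2, 4, 8, 16, 1000):
        r = stationary_rate_exponent(l_star)
        # The exponent increases with L* toward the parametric value 1/2.
        print(l_star, r, float(r))
```

The exponent rises from 2/5 in the standard case $L^* = 2$ toward the parametric value 1/2 as $L^*$ grows.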
In closing it is worth mentioning that extensions of the type given here are relevant to existing asymptotic theory for nonparametric estimation whenever locally flat functional behavior is present in other models such as probability densities, models with nonstationary regressors that are more complex than I(1) processes, and models with time varying parameters. Common practice in the latter models, for instance, is to use weak trend formulations of the parameters, leading to time dependent coefficients of the form $\beta(\frac{t}{n})$. Trend formulations of this type in both stationary and nonstationary systems will lead to asymptotics that involve extensions similar to those developed here, particularly in the stationary regressor case where bias expressions, bias order, and optimal bandwidth choice will all be influenced by flatness in the function. Similarly, in time varying parameter cointegrated systems of the type studied in Phillips et al. (2017), the limit theory will be affected by locally flat regions of the coefficient function. An important simplification in both these cases is that the coefficient function $\beta(\cdot)$ is deterministic, which means that the bias component affects centering but will not contribute directly to variability and the form of the limit distribution, as it can do in models with nonstationary regressors. These are some extensions of the present theory that seem worthy of full investigation in future research.
In all of the above models, any regions of flatness in the function being estimated are typically unknown a priori, including the degree of local flatness, just as the function itself is unknown. Our analysis shows that in such cases the formulas based on standard asymptotics that are used to measure bias and variance in nonparametric estimation are only approximate and rates of convergence may be wrong, especially in cases of nonstationary regressors. After extensive attempts we have found it extremely challenging to devise a direct inference procedure that accurately accommodates information about unknown locally flat characteristics of a functional coefficient. Fortunately, the two-step pre-test procedure is found to work well and can achieve evident efficiency gains at the flat region over the naive approach that simply ignores the local flatness problem. Empirical estimation of the precise degree of local flatness and improved inferential procedures that take account of potential flatness both merit further research.

APPENDIX
This Appendix has four sections. Section A provides proofs of the main results in the paper, Section B contains proofs of useful supporting lemmas, Section C provides additional computational details, and Section D outlines some limit theory associated with testing that is based on local quadratic estimation.

A. PROOFS OF THE THEOREMS
Proof of Theorem 2.1.We analyze the components in the following normalized decomposition of the estimation error ⎛ Starting with the kernel-weighted signal matrix, we have Next, from the proof of Lemmas B.2(c) and (B.5), we have where 1 {L=odd} and C L (z) defined in (2.5).Upon normalization and using Lemma B.2(c), the first term in (A.1) is then The second term of (A.1) is, upon normalization and using Lemma B.2(b)(i), (1) but no central limit theorem holds, as shown in Lemma B.2(b)(ii).The final term of (A.1) is, after suitable normalization and using Lemma B.1(c)(i), Standardizing by the weighted signal matrix and recentering (A.1) we have the estimation error decomposition or, with each component appropriately standardized, as Using (A.4), (A.5) and (A.6) in (A.8), we have

10)
Using (A.3) and (A.4) we have which leads to where , giving the stated result for the first part, which holds whenever nh → ∞ ensuring the central limit theorem (A.12).
In cases where nh Proof of Theorem 2.2.Case (i) We start with the decomposition which can be obtained in the same way as (A.7).Rescale the components according to their asymptotic behavior, as determined in Lemma B.3, so that Then, since nh 2L → 0 in this case, we rescale the equation by n for the bias function.Hence, , as given in the stated result (2.8) for case (i).

Case (ii)
When nh 2L → ∞ the bandwidth goes to zero slower than O( 1√ n 1/2L ).To derive the limit theory in this case, rescale (A.14) by n/h 2L−1 , giving x t u t K tz B x B x −1 , giving the stated result (ii) of Theorem 2.2.

Case (iii)
Since It follows that the first and second terms on the right side of (A.14) have the same order and both therefore appear to contribute to the asymptotics. So, upon rescaling (A.14) by n 1− 1 4L we find that x t u t K tz . (A.22) The asymptotics are then jointly determined by the two terms of (A.22). Conditional on F x , these terms are uncorrelated as the conditional covariance involves the matrix E x t x s (x t η t u s K sz ) = 0.

(A.23)
From (A.22) and the bias function calculation (A.16) which continues to hold, it follows that when nh 2L → c > 0 This proves result (iii) of Theorem 2.2.
Proof of Theorem 3.1.(i) Stationary x t We assume L is known.Using Lemma B.1, σ 2 u → p σ 2 u , and any consistent derivative estimator β(L) (z) of β (L) (z), we have

z). (A.26)
Combining (A.26) and Theorem 2.1 gives and T2 (z;L) χ 2 p follows.Further, in view of (A.25) in the stationary case, the simpler estimate ˜ n (z;L) = ν 0 (K) σ 2 u n t=1 x t x t K tz , which is based solely on the variance term, can be employed and the same limit theory applies.
(ii) Nonstationary x t We again assume that L is known.We analyze each case of the Theorem in turn.
Case (a) Using Lemma B.3(c) we have 1 In place of (A.25) and again using a consistent derivative estimator β(L) (z) → p β (L) (z) and σ 2 u → p σ 2 u , we now have and Eu t K 2 tz = 0, so the limit processes B ζ K (r),B uK (r) are independent.The functional laws follow by standard weak convergence methods when nh → ∞ and (ii) follows by showing the O p (1) property directly whereas the CLT does not hold because of failure of the Lindeberg condition, just as in the proof of Phillips and Wang (2021, Lem. B.1(a) and the Lindeberg condition holds when nh → ∞, giving the stated result.Part (ii) follows because, although the stability condition continues to hold as in (B.2), the Lindeberg condition fails when nh → c ∈ [0,∞) as (B.3) no longer tends to zero.To see this, it suffices to look at the simple case with scalar x t , iid{(u t ,z t )} and independent, strictly stationary components with respective densities {f x (x),f u (u),f (s)} we have, given > 0  Phillips and Wang (2021) proved a related result when L = 1 in their Lemma B.1(b).A similar argument is employed here.But for general L we need to compute the first and second moments of η t = ξ βt − Eξ βt and deal with the precise form of the local behavior of the coefficient function β(•) in the neighborhood of the point of estimation z.To this end, the proof uses the following Taylor representations where β (j) (z) = 0;j = 1,...,L − 1 and with zp and zp on the line segment connecting z t and z.The first and second moments of η t may now be deduced.Specifically, using the symmetry of K, we have , and C L (z) are defined in the statement of the Lemma.Next

It follows that
Var where ) in a similar way to (B.5), and where zp is on the line segment connecting z and z + hp.Further, for j = 0 and using the joint density f 0,j (s 0 ,s j ) of z t ,z t+j we have We now deduce that the long run variance matrix of η t , or variance matrix of the standardized partial sum which follows from (B.6) and standard arguments concerning the o(1) magnitude of the sum of the autocovariances of kernel-weighted stationary processes.In particular, from the α mixing property of z t and using a sum splitting argument and results (B.5), (B.7) and (B.8) above, we have for a suitable choice of M → ∞ such that Mh → 0, M τ h → ∞, and M n → 0, with τ > 1, c > τ(1 − 2/δ) and δ > 2. It then follows by arguments similar to the central limit theory for weakly dependent kernel regression in Robinson (1983), Masry and Fan (1997), and Fan and Yao (2008, Thm. 6.5) that the standardized partial sum process of η t satisfies a triangular array functional law giving where B η,L is vector Brownian motion with variance matrix V ηη,L = ν 2L (K)f (z) (L!) 2 β (L) (z) β (L)  (z) .The effective sample size condition nh → ∞ is required for this result.but no invariance principle applies.Taking scalar x t and iid {z t } for ease of notation, the stability condition is satisfied so that 1 √ nh 2L+1 n t=1 η t = O p (1) but the Lindeberg condition fails.To show this, note that η t = ξ βt − Eξ βt = ξ βt + O(h L * +1 ).Given > 0, nh → ∞ and β (L)    Ibragimov and Phillips (2008, Thm. 4.3)

C. ADDITIONAL COMPUTATIONAL DETAILS
The following paragraphs provide further details of how the three statistics in Figures 4, 5, 6 and Table 2 were computed.
(i) Computation of the naive t-ratio $T(z; L = 1)$ follows the definition (3.1). Since the use of $T(z; L = 1)$ implies belief that $L = 1$, the optimal bandwidth order for that case is employed in the computation. For the computation of $\hat{\beta}(z)$ and $K_{tz}$ the bandwidth $h = \hat{\sigma}_z n^{\gamma}$ was used with $\gamma = -1/2$ for nonstationary $x_t$ and $\gamma = -1/5$ for stationary $x_t$. For the other unknown components $\beta^{(1)}(z)$, $\beta^{(2)}(z)$, $f(z)$ and $f^{(1)}(z)$ involved in (3.2) and (3.3), different bandwidth orders were used. Specifically, $f(z)$ used $\gamma = -1/5$, $f^{(1)}(z)$ used $\gamma = -1/7$, $\beta^{(1)}(z)$ was estimated using local linear estimation with $\gamma = -1/7$ for stationary $x_t$ and $\gamma = -2/7$ for nonstationary $x_t$, and $\beta^{(2)}(z)$ was estimated by local quadratic estimation with $\gamma = -1/9$ for stationary $x_t$ and $\gamma = -2/9$ for nonstationary $x_t$. These orders were selected from the optimal bandwidth order rules for local $p$-th order polynomial estimation of $\beta^{(p)}(z)$, $p = 1, 2$, and they draw on ongoing work by the authors on local $p$-th order polynomial estimation in functional coefficient regression.

10 Since $x_t$ is exogenous the covariance of the two processes is
(ii) For the infeasible statistic $T(z; \text{true } L)$, the true $L$ is used in the computation. For $L = 1$, it is identical to the naive choice. For $L = 4$, we need to estimate $\beta^{(4)}(z)$. We use local 4-th order polynomial estimation with bandwidth order $\gamma = -1/13$ for stationary $x_t$ and $\gamma = -2/13$ for nonstationary $x_t$. For the computation of $\hat{\beta}(z)$ and $K_{tz}$ the optimal orders $\gamma = -1/(2L^* + 1)$ for stationary $x_t$ and $\gamma = -2/(2L^* + 1)$ for nonstationary $x_t$ were used.
(iii) Computation of the pre-test based statistic $T(z; \hat{L}_{\mathrm{pre}})$ follows similar lines to that of the infeasible statistic, except that $L$ is replaced by the value $\hat{L}_{\mathrm{pre}}$ obtained from the two-step pre-test procedure.
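The bandwidth rules listed in the items above can be collected in a short Python sketch. It assumes that the scale factor in $h = \hat{\sigma}_z n^{\gamma}$ is the sample standard deviation of $z_t$ (the text does not say which scale estimate is used), and it takes $L^*$ as an input because its definition is given elsewhere in the paper. The names `GAMMA`, `bandwidth`, and `gamma_general` are hypothetical labels, not taken from the authors' code.

```python
import numpy as np

# Bandwidth exponents gamma quoted in the text, keyed by
# (quantity, regressor type). Key names are illustrative labels only.
GAMMA = {
    ("beta", "stationary"): -1 / 5,      # beta-hat(z) and K_tz under L = 1
    ("beta", "nonstationary"): -1 / 2,
    ("f", "any"): -1 / 5,                # density f(z)
    ("f1", "any"): -1 / 7,               # density derivative f^(1)(z)
    ("beta1", "stationary"): -1 / 7,     # local linear, beta^(1)(z)
    ("beta1", "nonstationary"): -2 / 7,
    ("beta2", "stationary"): -1 / 9,     # local quadratic, beta^(2)(z)
    ("beta2", "nonstationary"): -2 / 9,
    ("beta4", "stationary"): -1 / 13,    # local 4-th order, beta^(4)(z)
    ("beta4", "nonstationary"): -2 / 13,
}


def bandwidth(z, gamma):
    """Return h = sd(z) * n**gamma.

    Assumption: sd(z) is the sample standard deviation (ddof=1).
    """
    z = np.asarray(z, dtype=float)
    return z.std(ddof=1) * z.size ** gamma


def gamma_general(L_star, stationary=True):
    """Exponent -1/(2L*+1) (stationary) or -2/(2L*+1) (nonstationary)."""
    numerator = -1.0 if stationary else -2.0
    return numerator / (2 * L_star + 1)
```

For example, `bandwidth(z, GAMMA[("beta", "stationary")])` gives the bandwidth used for $\hat{\beta}(z)$ in the naive statistic with a stationary regressor, under the scale assumption stated above.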

Figure 1. Plots of $g_L(\gamma)$ for $L \geq 1$. See online version for all color graphics.
1), as shown in Part 1 of the Supplementary Material.

Figure 3. Stationary case: bias, standard deviation, and RMSE plots for the FC estimator $\hat{\beta}(z)$ at points $z_1 = 0$ and $z_2 = 1$ for the quartic coefficient function $\beta(z) = z^4$. The figures show bias, standard deviation, and RMSE in the left, middle, and right panels as functions of the bandwidth power $\gamma$ ($-0.80 \leq \gamma \leq -0.05$) in $h = \hat{\sigma}_z \times n^{\gamma}$ for Model (2.1) with a stationary autoregressive regressor $x_t$ with autoregressive coefficient $\theta = 0.5$ and sample sizes $n = 100$, $400$, and $800$.
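A small Monte Carlo along the lines of the Figure 3 design can be sketched as follows. Several ingredients are assumptions made only for illustration, since they are not stated in this section: a scalar model $y_t = x_t \beta(z_t) + u_t$, standard normal $z_t$, errors, and AR(1) innovations, a Gaussian kernel, the sample standard deviation as the scale in $h$, and the local constant form of the FC estimator. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(p):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * p ** 2) / np.sqrt(2.0 * np.pi)


def simulate(n, theta=0.5, rng=None):
    """Draw (y, x, z) from y_t = x_t * z_t**4 + u_t with AR(1) regressor x_t."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)   # assumed covariate distribution
    u = rng.standard_normal(n)   # assumed error distribution
    e = rng.standard_normal(n)   # assumed AR(1) innovations
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = theta * x[t - 1] + e[t]
    y = x * z ** 4 + u
    return y, x, z


def fc_estimate(y, x, z, z0, gamma):
    """Local constant FC estimate at z0 with bandwidth h = sd(z) * n**gamma."""
    h = z.std(ddof=1) * len(z) ** gamma
    k = gaussian_kernel((z - z0) / h)
    return np.sum(k * x * y) / np.sum(k * x * x)


def mc_summary(z0, gamma, n=400, reps=500, seed=0):
    """Return (bias, standard deviation, RMSE) of the estimate at z0."""
    rng = np.random.default_rng(seed)
    est = np.array(
        [fc_estimate(*simulate(n, rng=rng), z0, gamma) for _ in range(reps)]
    )
    truth = z0 ** 4
    bias = est.mean() - truth
    sd = est.std(ddof=1)
    rmse = np.sqrt(np.mean((est - truth) ** 2))
    return bias, sd, rmse
```

Calling `mc_summary(0.0, g)` and `mc_summary(1.0, g)` over a grid of exponents `g` between $-0.80$ and $-0.05$ traces out curves of the kind plotted in the three panels, subject to the assumptions listed above.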

Figure 6. Coverage rate (left scale) and length (right scale, with lines marked by circles) of the 95% confidence bands over the support of $z_t$ for $n = 200$ and $n = 800$, from 20,000 replications.

Figure 7. Empirical frequency of the pre-test estimator $\hat{L}_{\mathrm{pre}}$ over the support of $z_t$ for $n = 200$ and $n = 800$, from 20,000 replications.
following from Lemma B.3(d)(i). In view of Lemmas B.3(b) and B.3(c)(i) we have ∞), leading to failure in the Lindeberg condition. Part (d). (i) and (ii) of Part (d) follow in the same way as (i) and (ii) of Part (c).

LEMMA B.2. Under Assumption 1, if β
, (B.23) leading directly to the stated independence of $B_x$ and $B_{uK}$. When exogeneity is relaxed, it can be shown that (B.23) holds asymptotically as $n \to \infty$ and $h \to 0$ under a weak additional summability condition.
t K tz = O p (1) but no invariance principle holds. Part (d). (i) As in Part (a)(i), when nh → ∞ we have 1