ESTIMATES OF DERIVATIVES OF (LOG) DENSITIES AND RELATED OBJECTS

We estimate the density and its derivatives using a local polynomial approximation to the logarithm of an unknown density function f. The estimator is guaranteed to be non-negative and achieves the same optimal rate of convergence in the interior as on the boundary of the support of f. The estimator is therefore well-suited to applications in which non-negative density estimates are required, such as in semiparametric maximum likelihood estimation. In addition, we show that our estimator compares favorably with other kernel-based methods, both in terms of asymptotic performance and computational ease. Simulation results confirm that our method can perform as well as or better than these alternative methods in finite samples, even when they are used with optimal inputs, that is, an Epanechnikov kernel and an optimally chosen bandwidth sequence. We provide code in several languages.


INTRODUCTION
We propose a new nonparametric estimator for (the logarithm of) a density function and its derivatives that attains the optimal rate of convergence both in the interior and at the boundary of the support. Our density estimator is available in closed form and, unlike several alternatives, is guaranteed to be positive, which is appealing in some applications and critical in others, such as semiparametric maximum likelihood estimation (see, e.g., Klein and Spady, 1993). The new methodology differs from the previous literature in that it first estimates a function's derivatives, which, if desired, can then be used to construct an estimate of the function itself. Our general estimation strategy can also be applied to obtain estimates of other quantities of economic interest, including the density in regression discontinuity design models, the (reciprocal of the) propensity score, the inverse bid function in auction models, and any other application in which the density appears inside a logarithm or a denominator. Specifically, we consider an i.i.d. sequence of random variables {x_1, ..., x_n} with x_i distributed according to some unknown distribution F with density f > 0 on its support [0,U), where U can be infinite. The standard Rosenblatt-Parzen (RP) kernel density estimator is inconsistent at the boundary and is typically badly biased in finite samples at values of x near the boundary. In contrast, our method employs a local polynomial approximation of L(x) = log f(x) to obtain asymptotically normal estimates of L and its derivatives, away from, at, or near the boundary.
An advantage of using a polynomial approximation to the log-density instead of the density is that the estimated density can be guaranteed to be positive, which is not true for alternative boundary correction methods that use boundary kernels or a local polynomial approximation of f (Cheng, Fan, and Marron, 1997; Zhang and Karunamuni, 1998) or F (Lejeune and Sarda, 1992; Cattaneo, Jansson, and Ma, 2020). Unlike Loader (1996) and Hjort and Jones (1996), however, the computation of our estimator does not require solving a nonlinear system of equations that involves numerical integration. In fact, the estimator of derivatives of L may be expressed as the solution to a linear system of (weighted) local averages. Thus, our method can be characterized as a local method of moments, similar in spirit to local likelihood density estimation (Loader, 1996; Hjort and Jones, 1996, and much subsequent work) but computationally more similar to local polynomial regression (Lejeune and Sarda, 1992; Cheng, Fan, and Marron, 1997; Zhang and Karunamuni, 1998; Cattaneo, Jansson, and Ma, 2020). We therefore retain the computational ease of a local polynomial regression estimator while eliminating the possibility of negative density estimates. The estimator of f then obtains in explicit form with estimates of the derivatives of L as inputs.
Apart from its numerical advantages, our estimator for the density has the same first-order asymptotic properties as the local likelihood estimator when applied with the same bandwidth. When applied with a larger bandwidth, however, our estimator achieves a smaller asymptotic mean square error. We cannot generally compare the bias of our method with the biases of methods that use a polynomial approximation to f, but our estimator has the same asymptotic variance as traditional methods when they are applied using an optimal kernel and bandwidth sequence. Hence, our local polynomial approximation to L can be expected to outperform alternative estimators for f in finite samples when our bias is smaller, for example, when the log-density is in fact polynomial.
In large enough samples, an asymptotically unbiased version of our estimator achieves a smaller variance and therefore a smaller mean square error than the optimized alternatives. Importantly, our estimator realizes this improved performance without sacrificing non-negativity of the estimated density, as would be required to achieve the same asymptotic distribution using alternative methods. Our estimator requires the choice of more input parameters than RP does. This can be seen as a downside, and we could have hardwired specific choices of some of the input parameters to reduce the number that must be chosen. We have not done so since there can be instances in which the additional flexibility is of value.
We also note that the log-density or its derivatives may be of direct interest to the researcher, in which case our method may be an attractive alternative to transforming estimates of f and its derivatives to obtain the desired estimates. For instance, the generalized reflection method of Karunamuni and Alberts (2005) and Karunamuni and Zhang (2008) requires an estimate of L′(0), which they obtain using a finite difference approximation. Estimates of L can moreover be used as an input into other objects. One case that has already been mentioned is semiparametric maximum likelihood estimation, of which Klein and Spady (1993) is a classical example in which the likelihood objective can be written as a function of the log-density. But there are other important examples. For instance, in regression discontinuity design, estimation of the density at the discontinuity point can be of interest (see, e.g., Cattaneo, Jansson, and Ma, 2020). A second example would be the estimation of propensity scores, which are of importance in the estimation of treatment effects. A final example is the estimation of auction models, for which our estimator can be used to obtain estimates of the inverse strategy function; see, for example, Hickman and Hubbard (2015); Pinkse and Schurter (2019). These examples and more are discussed in Section 5. Finally, we provide code in several languages at https://github.com/kschurter/logdensity.

ESTIMATOR
We now discuss our main estimator, postponing the discussion of applications and variants to Section 5. Let L(y) = log f(y) denote the log-density function and assume it to possess at least S + 1 ≥ 2 derivatives at x, the point at which we wish to estimate L. Our estimator will be in the kernel family of estimators, and we denote our bandwidth by h. To allow specifically for x approaching the boundary, we introduce the notation z = min(x/h, 1).
In the first step of our estimation procedure, we estimate derivatives of L, which are subsequently used to construct an estimate of L itself. Our estimator of derivatives of L is based on the fact that for any differentiable function g : R → R^S with support [−z,1] and for which g(−z) = g(1) = 0, −∫_{−z}^{1} g′(t)f(x + th) dt = h ∫_{−z}^{1} g(t)L′(x + th)f(x + th) dt, (1) where L^(s) denotes the sth derivative of L. The identity (1) follows from integration by parts under the assumption that f is bounded on the domain of integration; a Taylor expansion of L′ around x then expresses the right side in terms of the derivatives of L at x. We define β_s = L^(s)(x) and gather these coefficients into the vector β = (β_1,...,β_S)′. We then estimate the integrals on the left and right sides of (1) by their sample analogs and estimate β by solving the resulting system of equations (2). Since (2) is linear in β, the solution will generally be unique. Moreover, we can choose g such that our estimator β̂ of β is in closed form.
There are many functions g that satisfy the desiderata outlined above. For the purpose of providing examples, we choose g to be a vector whose jth element is g_j(t) = (t + z)^j (t − 1). In Section 4, we describe the sense in which this choice of g may in fact be optimal.
Example 1. If S = 1, then β̂1 is simply the derivative of the logarithm of the RP estimator with kernel 6(t + z)(1 − t)(1 + z)^{−3} 1{−z ≤ t ≤ 1}.
Example 2. If z = 1 and g_j(u) = k(u)u^{j−1} for some symmetric, non-negative kernel function k, then (1) represents the first-order condition for the minimizer of the least squares criterion in a local polynomial regression of L′(x_i) on x_i, which would be infeasible because L′(x_i) is not observed.
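For concreteness, the closed form for S = 1 in the interior can be sketched in a few lines of code. The official implementations live in the repository cited above; the snippet below is an illustrative reconstruction under our reading of the sample analog of (1), with g(t) = t² − 1, and all function names and defaults are ours. It also checks the claim of Example 1 numerically against the log-derivative of an Epanechnikov RP estimate.

```python
import numpy as np

def beta1_hat(data, x, h):
    # Estimate of L'(x) = (log f)'(x) for S = 1 in the interior (z = 1),
    # using g(t) = t^2 - 1, which is proportional to minus the Epanechnikov
    # kernel.  Sample analog of the moment condition:
    #   beta1 = -sum g'(u_i) / (h * sum g(u_i)),   u_i = (x_i - x) / h.
    u = (data - x) / h
    u = u[np.abs(u) <= 1.0]          # g has support [-1, 1]
    return -np.sum(2.0 * u) / (h * np.sum(u**2 - 1.0))

def rp_log_derivative(data, x, h, eps=1e-6):
    # Numerical log-derivative of an Epanechnikov RP estimate, for comparison.
    def log_fhat(v):
        u = (data - v) / h
        k = 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)
        return np.log(np.sum(k) / (len(data) * h))
    return (log_fhat(x + eps) - log_fhat(x - eps)) / (2.0 * eps)
```

For Exp(1) data the log-density is exactly linear with L′ = −1, so at interior points β̂1 should be close to −1 and should agree with the RP log-derivative up to numerical differentiation error.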
Example 2 illustrates that the integration by parts in (1) can be viewed as a device for obtaining a feasible set of local moment conditions from an infeasible set of moment conditions involving L′. Thus, g fulfills a role similar to a kernel, but the restrictions we impose are different. Indeed, we require g(−z) = g(1) = 0 so that g(1)f(x + h) − g(−z)f(x − zh) is zero after integration by parts. Now, once we have an estimator β̂ of the derivatives of L at x, we can use it to construct estimators of f(x) and L(x). Indeed, substituting our local polynomial approximation to the log-density into ∫_{−z}^{1} m_z(t)f(x + th) dt and rearranging suggests an estimator for β_0 similar to that of Loader (1996), f̂(x) = f̂_m(x) / ∫_{−z}^{1} m_z(t) exp(Σ_{s=1}^{S} β̂_s (th)^s / s!) dt, (3) where f̂_m is an RP estimator using a non-negative kernel m_z with support [−z,1].
It should be apparent that f̂(x) cannot be negative, and the numerator is zero only if there are no data on the interval [x − zh, x + h].
If z < 1, then m_z can be thought of as a traditional boundary kernel such that the bias in the numerator of (3) is O(h). The role of the denominator is that it is an (asymptotically) biased estimator of the number one. Indeed, the denominator bias compensates for the numerator bias such that for S = 1, the bias of f̂(x) is again O(h²). We define L̂(x) = log f̂(x) and show that its asymptotic bias is β_{S+1}h^{S+1} times a constant that is independent of both n and fixed x. Computing f̂ is relatively simple because β̂ is simply a local least squares statistic and f̂ is a ratio. Although f̂'s denominator contains an integral, for values of S ≤ 2, which will be the most common scenario, the denominator in (3) obtains in closed form if m_z is a truncated Epanechnikov; for S = 1 this is demonstrated in Example 3. For other kernels and greater values of S, an asymptotically equivalent closed form expression can be obtained by expanding the denominator in (3) in terms of exponential Bell polynomials (Bell, 1927). Standard numerical integration methods can also be applied to this integral, in which case an advantage of our method is that this integral only needs to be computed once rather than at each iterate of a maximization routine, as in a local maximum likelihood approach.
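The ratio structure just described is easy to implement. Below is a minimal sketch of the S = 1 interior version of the density estimator, with the denominator integral computed by the trapezoidal rule rather than in closed form; function names and defaults are ours, and the kernel choices (Epanechnikov m, g(t) = t² − 1) are the ones used in the examples below.

```python
import numpy as np

def density_estimate(data, x, h, n_grid=2001):
    # Estimator (3) with S = 1, z = 1: an Epanechnikov RP numerator divided
    # by int_{-1}^{1} m(t) exp(beta1_hat * t * h) dt.  Positive by construction.
    u = (data - x) / h
    inside = np.abs(u) <= 1.0
    # slope of the log-density, with g(t) = t^2 - 1
    b1 = -np.sum(2.0 * u[inside]) / (h * np.sum(u[inside]**2 - 1.0))
    # numerator: RP estimate with m(t) = 3(1 - t^2)/4
    num = np.sum(0.75 * (1.0 - u[inside]**2)) / (len(data) * h)
    # denominator: trapezoidal rule on [-1, 1]
    t = np.linspace(-1.0, 1.0, n_grid)
    vals = 0.75 * (1.0 - t**2) * np.exp(b1 * t * h)
    den = (t[1] - t[0]) * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    return num / den
```

Because the exponential density has a linear log-density, this estimator should be nearly unbiased for Exp(1) data at interior points.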
The following examples compare the asymptotic behavior of our estimator and traditional approaches to kernel density estimation at and away from the boundary. Example 3 obtains a closed form expression for the denominator in (3), which is used in Examples 4 and 5 to obtain explicit expressions for the special cases z = 1 and z = 0 (away from the boundary and at the boundary, respectively).
Example 4. Suppose that S = 1 and x ≥ h, such that z = 1 and g(t) = t² − 1, which is proportional to minus the Epanechnikov kernel. So then β̂1 is simply the derivative of the logarithm of the RP estimator using the Epanechnikov kernel. If m_1 is also the Epanechnikov kernel, the denominator in (3) is then exactly 3(a cosh a − sinh a)/a³ with a = β̂1h. The denominator can be expanded around h = 0 to obtain the approximation 1 + β̂1²h²/10. The bias of L̂(x) is by the mean value theorem then seen to be h²β₂/10 + o(h²). Thus, the bias we introduced in the denominator offsets the bias present in the numerator.
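With an Epanechnikov m and z = 1, the denominator of (3) admits a closed form. The snippet below states our own derivation of that form, 3(a cosh a − sinh a)/a³ with a = β̂1h, and verifies it numerically, together with its small-h expansion 1 + a²/10; the function names are ours.

```python
import numpy as np

def denom_closed_form(a):
    # int_{-1}^{1} (3/4)(1 - t^2) exp(a t) dt, derived by integration by
    # parts: 3(a cosh a - sinh a)/a^3, where a = beta1_hat * h != 0.
    return 3.0 * (a * np.cosh(a) - np.sinh(a)) / a**3

def denom_numeric(a, n_grid=200001):
    # Trapezoidal-rule benchmark for the same integral.
    t = np.linspace(-1.0, 1.0, n_grid)
    dt = t[1] - t[0]
    vals = 0.75 * (1.0 - t**2) * np.exp(a * t)
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
```

Expanding cosh and sinh shows the closed form equals 1 + a²/10 + O(a⁴), which is the source of the bias correction in the denominator.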
In Example 4, we took x ≥ h and hence z = 1 to provide intuition. In the following example, we consider what happens at the boundary, that is, if x = z = 0.
Example 5. Again suppose that S = 1, but now let x = z = 0. Then g(t) = −(1 − t)t, such that β̂1 is now the derivative of the logarithm of the kernel density estimator at zero using the kernel 6(1 − t)t 1{0 ≤ t ≤ 1}.
If we again use an Epanechnikov in (3), then the denominator becomes, for β̂1 ≠ 0, 3(2ae^a − 2e^a + 2 − a²)/(2a³) with a = β̂1h, which can be expanded around h = 0 to obtain the approximation 1 + 3β̂1h/8 + β̂1²h²/10. Our results show that the bias of β̂1 is β₂h/2 + o(h). The bias of L̂(0) is by the delta method hence −7h²β₂/80 + o(h²). Again, the bias we introduced in the denominator offsets the bias in the numerator.
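At the boundary (z = 0), the truncated Epanechnikov m_0(t) = (3/2)(1 − t²) on [0,1] yields a denominator whose small-h expansion begins 1 + 3a/8 + a²/10 with a = β̂1h; the 3a/8 term is what cancels the O(h) bias of the boundary numerator. The check below is our own calculation, with names of our choosing.

```python
import numpy as np

def boundary_denom(a, n_grid=200001):
    # int_0^1 (3/2)(1 - t^2) exp(a t) dt, the denominator of (3) at z = 0
    # with a truncated Epanechnikov m_0, computed by the trapezoidal rule.
    t = np.linspace(0.0, 1.0, n_grid)
    dt = t[1] - t[0]
    vals = 1.5 * (1.0 - t**2) * np.exp(a * t)
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
```

Note that 3/8 = ∫t m_0(t) dt, the first moment of the truncated kernel, which is why the denominator mimics the numerator's boundary bias.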
Example 5 demonstrates that, unlike for traditional boundary kernel estimators, the bias of our estimator is O(h²) at the boundary, also. It may seem odd that the bias in Example 5 is less than that in Example 4, but note that the variance will be larger at the boundary, and one would hence generally choose a greater bandwidth.
We finish this section with a description of a multivariate version of our estimator. For S = 1, one could simply solve the analogous moment conditions for β̂, with d_x the dimension of x_i and g→(x) = ∏_{j=1}^{d_x} g^(j)(x_j), where the g^(j)'s satisfy the conditions described above for the univariate case. The corresponding estimator for the density f then takes the same ratio form as in the univariate case. The procedure for values of S > 1 is analogous to the univariate case but is substantially messier, since the number of parameters to be estimated grows as a power of d_x, just as it would for local polynomial estimation. The theoretical results below are for the univariate case, but extend to the multivariate case.

Derivatives of L
We first derive limit results for the vector β̂ of estimates of the derivatives of L. Since our estimator β̂ is defined as the inverse of a matrix times a vector, its bias, too, is the inverse of a matrix times a vector.
To simplify expressions for the asymptotic bias and variance of our estimator, we introduce objects that depend only on the choice of function g (which in turn depends on the proximity z to the boundary), as well as a diagonal matrix that depends on h; these are defined in Appendix A.1.
We are now in a position to state our first theorem.
THEOREM 1. Assume L is S + 1 times continuously differentiable (and hence f nonzero) in a neighborhood of x. Let h → 0 and nh³ → ∞ as n → ∞. Then, for a vector β̄ defined in Appendix A.1, β̂ − β̄ is asymptotically normal at the standard rates for local nonparametric derivative estimation.
The "in a neighborhood" condition comes from the fact that we specifically allow x = zh. Although x is allowed to vary with h and hence with n, our result is not really uniform since, as with other local nonparametric estimators, tightness obtains only in an h-neighborhood, that is, a neighborhood that shrinks with the sample size. The rates in Theorem 1 are the standard rates for local nonparametric derivative estimation.
Example 6. For S = 1, the bias and variance expressions of β̂1 simplify to β₂h(1 − z)/2 and 12/{f(x)(1 + z)³}, respectively. The interior case (z = 1) is more favorable than the boundary case (z = 0), as expected.
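Example 6's boundary case can be sanity-checked by simulation. For Exp(1) data the log-density is linear, so β₂ = 0, L′(0) = −1, and the boundary estimate should be centered at −1 with variance of roughly 12/{f(0)nh³}. The sketch below uses the g(t) = t(t − 1) of Example 5; as elsewhere, the function name and inputs are ours.

```python
import numpy as np

def beta1_boundary(data, h):
    # L'(0) estimate at the boundary (z = 0, S = 1) with g(t) = t(t - 1):
    #   beta1 = -sum g'(u_i) / (h * sum g(u_i)),  u_i = x_i / h in [0, 1].
    u = data / h
    u = u[(u >= 0.0) & (u <= 1.0)]
    return -np.sum(2.0 * u - 1.0) / (h * np.sum(u * (u - 1.0)))
```

With n = 200000 and h = 0.3, the implied standard deviation is about 0.05, so the estimate should sit well within 0.25 of −1.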

Density
We now continue with the results for f (x).
Let c_msz = ∫_{−z}^{1} m_z(t)t^s dt / s! and let c_mz be the vector with elements c_m1z, ..., c_mSz. Let further ω_z(t) = m_z(t) − c′_mz Γ⁻¹ g′(t), with the matrix Γ as defined in Appendix A.1, and let e_s denote the sth standard basis vector in R^S. The function ω_z is a kernel of order S + 1 or higher. To be clear, ω_z is not used to compute the density estimate; rather, it is a convenient object that arises in the asymptotic theory.
THEOREM 2. Assume L is S + 1 times continuously differentiable in a neighborhood of x, that f(x) > 0, and that h → 0 with nh → ∞ as n → ∞. Then f̂(x) is asymptotically normal, with asymptotic bias and variance determined by the kernel ω_z.
The asymptotic bias of our estimator is zero in some instances. For example, if S = 2 and z = 1, then the asymptotic bias is zero whenever m is a symmetric kernel function; this is natural since this choice is effectively equivalent to choosing a higher-order kernel ω_z, albeit that, unlike higher-order kernel density estimates, our estimates cannot be negative.
The following two examples derive the ω_z functions for the case in which m_z is uniform and x is at the boundary, and for the case in which m_z is a truncated Epanechnikov and x is anywhere in the support.
Example 8. Suppose that m_z is a truncated Epanechnikov. Then the asymptotic bias of L̂(x) is h²β₂ times a ratio that equals 1/10 for z = 1 and −7/80 for z = 0, and the asymptotic variance of the density estimator is proportional to f(x)∫ω_z²(t) dt, which equals 0.6f(x) for z = 1 and 4.01f(x) for z = 0.
As the above two examples demonstrate, deriving the asymptotic bias and variance for generic z can be a messy but straightforward exercise.

Limit results when f (x) = 0
Because the non-negativity of the proposed density estimator is one of its advantages over some alternative approaches (boundary kernels and local polynomial approximations to the density function), we further investigate the asymptotic behavior of our estimator in cases where one would expect these alternatives to struggle, that is, when the density is zero. Indeed, non-negativity would not be an advantage of the density estimator if it performed poorly in other ways whenever the density was close to zero. Fortunately, the bias of the density estimator converges at the usual rate when f(x) = 0 for fixed x > 0, and the variance converges even faster. Moreover, f̂(x)β̂1 converges to f′(x) faster when f(x) = 0 than when f(x) > 0 if x > 0, despite the fact that the log-density and its derivatives are not defined when the density is zero. We further suggest a modification to β̂ that increases the rate at which β̂1 diverges when f(0) = 0 and thereby reduces the order of the bias in the density at the boundary. If f(0) = 0, then the modified density estimator at the boundary is superconsistent.
Throughout this analysis, we maintain the standard smoothness assumption that f has two continuous derivatives and the assumption that f takes support on an interval [0,U). For fixed x > 0, these assumptions imply that f(x) is a relative minimum of the density, that is, f′(x) = 0. At the boundary, however, we admit the possibility that f′(0) > 0. Because the dominant terms in the asymptotic expansions depend on whether either or both of f′(x) and Γ12 are zero, we provide separate results to deal with these cases. The next theorem applies to the interior case, where f(x) = f′(x) = 0 and Γ12 = 0, which will be true if, for example, g is an even function. For simplicity, we specialize the theorems to the local-linear version of the estimator (S = 1).
Roughly speaking, the numerator in the definition of f̂ converges in probability because the dominant term in the variance expansion is of smaller order when f(x) = 0. The denominator of f̂ converges in probability to one because hβ̂1 is o_p(1). Hence, the density estimator is asymptotically equivalent to an RP estimator when f(x) = f′(x) = 0 and Γ12 = 0.
The fact that f̂(x) in Theorem 3 has asymptotic variance equal to zero suggests choosing a smaller bandwidth. Indeed, it can be shown that the optimal bandwidth is then of order n^{−1/3}, producing the convergence rate n^{−2/3} and a non-normal limit distribution. One cannot, however, make the bandwidth dependent on the value of f(x) itself, since f(x) is exactly what we wish to estimate. Further, a two-step approach, that is, replacing f̂(x) with zero if f̂(x) is below some cutoff, would converge even faster if f(x) = 0, but would produce discontinuous estimates and erratic behavior near the cutoff.
The fact that the derivative estimator is superconsistent may be surprising at first glance, but it is a property that is shared by the RP derivative estimator. To see this, note that if f(x) = 0 for x in the interior, then the RP estimates of f(x) and of f′(x) both converge faster than at the usual rates, where both results follow from bias-like expansions for nonparametric kernel density estimators, noting that twice continuous differentiability at x in the interior and f(x) = 0 imply that f′(x) = 0. The convergence rate of both our estimator of f′ and the RP derivative estimator is o_p(n^{−1/5}), but nothing more can be said.
We now turn to the boundary estimation problem, where we focus on the case x = 0, noting that these results easily extend to x = zh for fixed 0 ≤ z ≤ 1, as explained in the proof. The following theorem applies to the case where Γ12 or f′(0) is nonzero.
THEOREM 4. Let S = 1. Assume f is twice continuously differentiable at zero, f(0) = 0, and h = O(n^{−1/5}). Let β̂1* = β̂1 max{1, Ah^q β̂1} for some constants A > 0 and 0 < q < 1, and let f̂* denote the density estimator with β̂1 replaced by β̂1*. Then f̂*(0) is superconsistent.
Intuitively, the key idea is that it will become apparent that the density is zero because β̂1 diverges at the rate 1/h if Γ12 ≠ 0, or at the rate 1/h² if f′(0) > 0 and Γ12 = 0.
Whenever β̂1 is large, β̂1* gets an extra shove toward infinity, so that hβ̂1* diverges under the assumptions of Theorem 4, but β̂1* converges to the same limit as in Theorem 1 if f(0) > 0. Because hβ̂1* appears in the exponent in the denominator of (3), f̂* converges to zero very quickly when f(0) = 0. We note that the unmodified estimator f̂(0) works well whenever Γ12 = 0, because hβ̂1 then diverges at the rate of 1/h. Moreover, practitioners have control over Γ12 through the choice of g, so replacing β̂1 with β̂1* is unnecessary in Theorem 4 for suitable choices of g. On the other hand, practitioners may not want to allow the possibility that the density is zero at the boundary to drive this choice. Thus, substituting β̂1* provides flexibility to consider other criteria in selecting g while maintaining superconsistency when f(0) = 0.
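The modification itself is a one-liner. The sketch below implements it as we read the definition in Theorem 4, β̂1* = β̂1 max{1, Ah^q β̂1}; the defaults for the user-chosen constants A and q are ours.

```python
def beta1_modified(b1, h, A=1.0, q=0.5):
    # Modified slope estimate from Theorem 4 (as we read it):
    #   beta1* = beta1 * max(1, A * h^q * beta1).
    # When beta1_hat is large (density near zero at the boundary), beta1*
    # gets an extra push toward infinity; otherwise it is unchanged.
    # Requires A > 0 and 0 < q < 1.
    return b1 * max(1.0, A * h**q * b1)
```

With h = 0.01 and the defaults, a moderate estimate such as β̂1 = 2 is left alone, while a large estimate such as β̂1 = 50 is inflated to roughly 250.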
For the purposes of estimating f near the boundary when Γ12 is nonzero, we note that the left side of (2), after division by −Γ11, yields a standard RP estimator for f(0) that converges at the optimal rate whenever β̂1 diverges. One can therefore use a similar idea as in Theorem 4 to define a piecewise estimator for f(zh) that equals f̂ if h^q β̂1 < A and equals the RP estimator otherwise.

ASYMPTOTIC COMPARISONS
In this section, we explore the optimal (g, m_z) in the local linear case S = 1 and compare our optimized estimator with existing methods. We show that the optimal choice of g and m_z achieves the same asymptotic variance as an optimal RP estimator in the interior (z = 1), although their respective biases cannot be compared in general, in the sense that which bias is smaller depends on the density function and its second derivative at the point of estimation. We then consider the optimal choice of g and m_z at the boundary (z = 0), where we show that the truncated Epanechnikov m_z paired with g(t) = (t + z)(t − 1)² attains the same variance as an optimal boundary kernel (Zhang and Karunamuni, 1998), although the biases are again incomparable because our estimator's bias is proportional to f(x)L″(x) rather than f″(x), as in the case of RP estimators. We are, however, able to derive the relative asymptotic mean squared error (AMSE) of our estimation method compared with a local-likelihood based estimator. We show that our method with the cubic choice g(t) = (t + z)(1 − t)² attains the same AMSE in the interior and is more efficient at the boundary than the estimator in Loader (1996) with an Epanechnikov kernel.

Optimal Choice of g and m z in the Local Linear Case
In traditional RP kernel estimation, one typically derives the AMSE-optimal kernel in conjunction with the optimal choice of bandwidth sequence. When the bandwidth sequence is optimally scaled by a multiplicative factor that depends on the roughness and second moment of the kernel, one can show that the Epanechnikov kernel minimizes the AMSE of the estimator for the density among all non-negative second-order kernels. The non-negativity restriction ensures that the estimated density is non-negative and continuous at the point of evaluation. In this section, we similarly explore the optimal choice of inputs to our estimator under an analogous set of restrictions on g and m_z that guarantee that our density estimate f̂(x) is non-negative.
In addition, we impose the constraint that g does not change sign, that is, g is either non-negative or non-positive on [−z,1]. This restriction helps to keep the right side of (2) away from zero whenever the density is nonzero. Although this restriction is not necessary for the asymptotic analysis in Section 3, because the probability limit of the right side is (strictly) positive, the estimator may perform poorly in finite samples when ∫_{−z}^{1} g(t)f(x + th) dt is close to zero and f(x) > 0.
This restriction is also necessary in order to derive an interior solution to the input selection problem. Without the sign-change constraint, the optimal choice of g and m_z would yield an asymptotically unbiased estimator, which in turn implies that there is no optimal bandwidth sequence of order n^{−1/5}. For example, if g is built from a carefully selected quadratic polynomial a_z(t), then ω_z(t) will be a third-order kernel for z ∈ [0,1) and a fourth-order kernel for z = 1. One could then choose a bandwidth sequence proportional to n^{−1/5} that is large enough to make the asymptotic variance as small as desired. This approach would be analogous to using a higher-order kernel with a bandwidth of order n^{−1/5} in RP estimation, although the resultant ω_z is in fact less rough than the usual optimal fourth-order kernel because we do not require ω_z(1) = ω_1(−1) = 0. This asymptotically unbiased estimator can be desirable in some circumstances, but one must be cognizant of the potential risks of using a relatively large bandwidth and a g for which the right side of (2) can be zero in finite samples.
To summarize, we seek to minimize the AMSE subject to the constraints g(−z) = g(1) = 0, g ≥ 0, ∫_{−z}^{1} g(t) dt = 1, and m_z ≥ 0. Note that the constraints that g is non-negative and integrates to one are without loss of generality, because the estimator is invariant under scalar multiplication of g and we have already assumed that g does not change sign.
The AMSE of the density estimator depends on g and m_z only through the function ω_z. Provided that the bias is of order n^{−2/5}, the optimal bandwidth at a fixed x is proportional to n^{−1/5}, with a constant that is similar to the corresponding expression for RP estimators, except that the bias depends on L″(x) instead of f″(x). A standard argument using the calculus of variations shows that the Euler equation for optimality implies that ω_z is a quadratic polynomial. Unlike typical derivations of the optimal kernel for RP estimation, however, we do not directly require ω_z(1) = 0, ω_1(−1) = 0, or ω_1 ≥ 0. Our problem is slightly more complicated because we must determine the optimal quadratic ω_z that can be obtained from a pair (g, m_z) that satisfies the above constraints.
To this end, we note that the above constraints on (g, m_z) imply that ω_z(1) ≤ 0 and ω_1(−1) ≥ 0. In the interior, the non-negativity of ω_1(−1) is binding. Hence, we find that the optimal ω_1 is the Epanechnikov kernel, which can be obtained by selecting the Epanechnikov kernel for m_1. Because c_m11 is zero, the choice of g does not affect the asymptotic distribution given m_1. One can therefore choose g according to another objective. For instance, g(t) = 1 − t² minimizes the AMSE of the estimator for β_1.
On the other hand, at the boundary (z = 0), the constraint ω_0(1) ≤ 0 is binding. We find that the optimal ω_0 is given by ω_0(t) = 6 − 18t + 12t², which is the same optimal boundary kernel found in Zhang and Karunamuni (1998). One can produce this ω_0 using a truncated Epanechnikov kernel for m_0 and g(t) = t(t − 1)². Because this g differs from the optimal g for estimating f and β_1 at z = 1, we suggest g(t) = (t + z)(t − 1)(1 + zt − t) combined with the Epanechnikov m_z, because it is AMSE-optimal for f and β_1 in the interior and AMSE-optimal for f at the boundary.
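That ω_0(t) = 6 − 18t + 12t² is a valid second-order boundary kernel (unit mass, zero first moment, nonzero second moment on [0,1]) is easy to confirm numerically; the helper below is our own.

```python
import numpy as np

def kernel_moments(w, lo, hi, n_grid=100001):
    # Trapezoidal-rule moments int t^s w(t) dt for s = 0, 1, 2 on [lo, hi].
    t = np.linspace(lo, hi, n_grid)
    dt = t[1] - t[0]
    trap = lambda v: dt * (v.sum() - 0.5 * (v[0] + v[-1]))
    wt = w(t)
    return trap(wt), trap(t * wt), trap(t**2 * wt)

# Optimal boundary kernel at z = 0 from the text.
w0 = lambda t: 6.0 - 18.0 * t + 12.0 * t**2
```

Exact integration gives moments 1, 0, and −1/10, so the kernel integrates to one, kills the first-order bias term, and has the nonzero second moment that drives the O(h²) bias.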
If one attempts to derive optimal inputs for estimating the density local to the boundary, that is, at a drifting sequence x = zh for z ∈ (0,1), complications arise quickly. For example, at z = 1/3, the constraints on (g, m_z) do not bind and an asymptotically unbiased estimator of the density is feasible. The preceding analysis immediately goes awry because there is no optimal bandwidth of order n^{−1/5} for z = 1/3 given these inputs. At other values of z ∈ (0,1) for which the constraints do bind, the issue is that the above bandwidth sequence is generally suboptimal because ω_z depends on h through z, with the result that the first-order condition for optimality of the bandwidth sequence is insufficient for the global minimum. In light of these issues, we conclude that there is no continuous transformation that one can apply to the bandwidth sequences at z = 0 and z = 1 in order to optimally estimate the density at intermediate values of z. Therefore, one must necessarily choose a somewhat ad hoc bandwidth local to the boundary.
One such rule for specifying the bandwidth near the boundary is to vary the bandwidth linearly between the boundary and the interior, h(z) = (1 − z)h_0 + zh_1, where h_0 and h_1 are the optimal bandwidths at z = 0 and z = 1. When g and m_z are also chosen optimally for estimating the density at z = 1 and z = 0, h_0 = 2h_1, and this rule implies that the same window is used to estimate the density and its derivatives for all x < h_1.

Relative AMSE
The AMSE of the local-likelihood estimator for f(x) in Loader (1996) can be written in a form similar to (6). The relative AMSE of our proposed estimator and the local-likelihood estimator using an optimal kernel is then given by the ratio of the multiplicative constants that scale f(x)^{6/5} L″(x)^{2/5} n^{−4/5}. At z = 1, if one compares our optimal estimator to the local-likelihood based estimator using an optimal bandwidth sequence and the Epanechnikov kernel, the relative AMSE of the two estimators is one. In fact, the limiting distributions are identical. At z = 0, the optimal kernel to use with the local-likelihood estimator is triangular, that is, k(t) = (1 − |t|)1{|t| ≤ 1}, which yields the same asymptotic bias and variance as our estimator using the truncated Epanechnikov and g(t) = (t + z)(1 − t)². We should therefore expect our estimator with g(t) = (t + z)(1 − t)² and the local-likelihood estimator to perform similarly in finite samples. Thus, our density estimator's computational ease is its more salient advantage over the local-likelihood based approach, given these inputs. Of course, one could make an alternative choice of g and increase the bandwidth to widen the gap in AMSE, at the possible expense of a larger finite sample bias.

Optimal Inputs with Polynomial Approximations of Higher Order
For S > 1, the optimal (g, m_z) can again be reformulated as the optimal choice of a higher-order kernel ω_z. Unlike in RP estimation, however, the higher-order kernel does not necessarily entail the possibility of negative density estimates, since one can achieve a higher-order kernel ω_z using a non-negative m_z and a suitable g. For example, m_z(t) = 3(1 − t²)/4 and g_j(t) = (t + z)^{j+1}(t − 1) for j = 1,2,3 produce a fourth-order Epanechnikov kernel ω_1(t) = (15/32)(3 − 7t²)(1 − t²). As in the linear case, the restrictions ω_z(1) = ω_1(−1) = 0 are necessary for the fourth-order Epanechnikov kernel to be AMSE-optimal. Without these or similar constraints on g, one can choose the non-bandwidth inputs to achieve zero asymptotic bias, that is, make the squared bias asymptotically negligible relative to the variance. We must therefore restrict g to ensure an interior solution when we optimize the AMSE over (g, m_z) in the higher-order cases, as well.
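The fourth-order property of ω_1(t) = (15/32)(3 − 7t²)(1 − t²) (unit mass, moments one through three equal to zero, nonzero fourth moment) can likewise be verified numerically; names below are ours.

```python
import numpy as np

def moments(w, n=100001):
    # Trapezoidal-rule moments int_{-1}^{1} t^s w(t) dt for s = 0,...,4.
    t = np.linspace(-1.0, 1.0, n)
    dt = t[1] - t[0]
    trap = lambda v: dt * (v.sum() - 0.5 * (v[0] + v[-1]))
    return [trap(t**s * w(t)) for s in range(5)]

# Fourth-order Epanechnikov kernel from the text.
w1 = lambda t: (15.0 / 32.0) * (3.0 - 7.0 * t**2) * (1.0 - t**2)
```

Exact integration gives a fourth moment of −1/21, confirming that the kernel is of order four but not higher.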
In general, one also needs to restrict g to derive an AMSE-optimal choice for the estimation of the derivatives of L. The necessary conditions for the calculus of variations problem imply that each optimal g_j is a polynomial of degree at most S + 2. Because the scale of g does not affect the estimates and g_j(−z) = g_j(1) = 0, one only needs to optimize over the remaining roots of g_j for each j = 1,...,S. Characterizing the resulting minimization problem is routine, but it does not generally lead to an interior solution for the choice of bandwidth because the bias can be zero. For example, the bias can be made zero when S = 2 if ∫_{−z}^{1} g_j(t)t² dt = 0 for all j.

Treatment Effects
It is well-known (see, e.g., Hirano, Imbens, and Ridder, 2003) that under an unconfoundedness assumption the average treatment effect can be expressed as E[ty/p(x)] − E[(1 − t)y/{1 − p(x)}], where p is the propensity score, y the outcome variable, x a vector of regressors, and t a binary treatment variable. Let p_1 = Et be the unconditional treatment probability. Let further f_1 denote the regressor density function conditional on treatment and f_0 the density conditional on nontreatment. Then, by Bayes' rule, 1/p(x) = 1 + {(1 − p_1)/p_1} f_0(x)/f_1(x), such that the reciprocals of the propensity scores only depend on the ratios of the densities and the unconditional choice probabilities. In practice, p(x) is often estimated using a logistic functional form and possibly a series expansion in x, which implies that the logarithm of the odds ratio is a polynomial in x. Our log-polynomial approximation to f_1 and f_0 similarly implies that the odds ratio is log-polynomial. The data x will not typically be scalar-valued, so one could for instance use a linear index of regressors instead of the regressors themselves; see Section 5.3 for an example of how one might estimate p(x) = p*(x′θ_0). In any case, our approach is a natural local extension to the logit series estimator for p(x).
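Once log-density estimates are in hand, the reciprocal of the propensity score is a one-line transformation. The sketch below is a minimal illustration of the Bayes-rule identity, with our own function name; in practice the log-density arguments would be the log-polynomial estimates of this paper.

```python
import numpy as np

def inv_propensity(log_f1, log_f0, p1):
    # Reciprocal of the propensity score via Bayes' rule:
    #   1/p(x) = 1 + ((1 - p1)/p1) * f0(x)/f1(x).
    # Working with log-densities makes the density ratio exp of a difference,
    # which is numerically stable and guaranteed positive.
    return 1.0 + ((1.0 - p1) / p1) * np.exp(log_f0 - log_f1)
```

For example, with p_1 = 1/2 and x at a point where f_1 and f_0 coincide (such as the midpoint of two equal-variance normals), the formula returns 1/p(x) = 2, i.e. p(x) = 1/2.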

Auctions
In first-price, sealed-bid procurement models with independent private values, it is well known (see e.g., Guerre, Perrigne, and Vuong, 2000) that the inverse bid function is of the form b − F(b)/f(b), where F and f are the survivor and density functions of the minimum rival bid. Since the support of the bid distribution is assumed to have a lower bound in this literature (costs cannot be less than zero and hence neither are bids), boundary issues are a serious concern. So let ℓ(y) = F(y)/f(y) be the object of estimation. One way of estimating ℓ is to estimate F and f separately, where f is estimated using the machinery in the main part of this paper and F is estimated by the empirical survivor function. This estimator has all the features of the estimator discussed earlier in the paper, positivity and boundary correction, both of which are desirable in the empirical auction setting. In particular, if the underlying cost distribution is approximately exponential, then so is the bid distribution, and our estimator could be expected to work especially well.
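A minimal sketch of this plug-in strategy, with a plain Epanechnikov kernel density estimate standing in for the paper's log-polynomial estimator (so no boundary correction), on simulated Exp(1) bids, for which F(y)/f(y) is identically one:

```python
import numpy as np

rng = np.random.default_rng(0)
bids = rng.exponential(scale=1.0, size=5000)   # Exp(1) stand-in for minimum rival bids

def survivor(y, data):
    # empirical survivor function
    return np.mean(data > y)

def kde(y, data, h):
    # plain Epanechnikov kernel density estimate; the paper's log-polynomial
    # estimator would replace this step (and fix the boundary bias)
    u = (y - data) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return k.mean() / h

def ratio(y, data, h=0.3):
    # estimate of F(y)/f(y); for Exp(1) the true ratio is 1 at every y
    return survivor(y, data) / kde(y, data, h)

print([round(ratio(y, bids), 2) for y in (0.5, 1.0, 2.0)])
```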
The above approach is not specific to auction models. Indeed, consider the hazard function H(x) = f(x)/F(x), where F again denotes the survivor function. H can also be estimated using the machinery developed in our paper.

Semiparametric Maximum Likelihood
There are many examples of semiparametric maximum likelihood estimators. Here, we only consider a classical one, namely the Klein and Spady (1993) estimator of the coefficients in a semiparametric binary response model, which maximizes the quasi-log-likelihood Σ_i [y_i log p̂(x_i′θ) + (1 − y_i) log{1 − p̂(x_i′θ)}], where p̂ is an estimator of the choice probability. Klein and Spady apply techniques to ensure that the estimates p̂ are positive and less than one, including trimming and adding a sample-size-dependent constant. Our method can be helpful since p(v) = p_1 f_1(v)/f(v), where f_j is the density of the linear index for observations with y_i = j, p_1 is the unconditional choice probability, and f = p_1 f_1 + (1 − p_1) f_0 is the unconditional index density. Since log p(v) = log p_1 + log f_1(v) − log f(v), the infeasible contribution to the log-likelihood can be written in terms of log-densities, such that only the logs of the conditional and unconditional densities matter.
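The log-density decomposition of the choice probability can be verified numerically. The mixture below (two normal index densities and p_1 = 0.3) is an illustrative assumption, not the paper's design:

```python
import math

p1 = 0.3   # illustrative unconditional choice probability
f1 = lambda v: math.exp(-0.5 * (v - 1) ** 2) / math.sqrt(2 * math.pi)  # assumed density | y=1
f0 = lambda v: math.exp(-0.5 * v ** 2) / math.sqrt(2 * math.pi)        # assumed density | y=0
f  = lambda v: p1 * f1(v) + (1 - p1) * f0(v)   # unconditional index density

def log_p(v):
    # log p(v) = log p1 + log f1(v) - log f(v): only log-densities enter
    return math.log(p1) + math.log(f1(v)) - math.log(f(v))

def log_1mp(v):
    return math.log(1 - p1) + math.log(f0(v)) - math.log(f(v))

for v in (-2.0, 0.0, 1.5):
    p = p1 * f1(v) / f(v)
    assert abs(log_p(v) - math.log(p)) < 1e-12
    assert abs(log_1mp(v) - math.log(1 - p)) < 1e-12
```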
Obtaining conditions under which our estimator attains the semiparametric efficiency bound, as the Klein and Spady estimator does, is well beyond the scope of this paper. In Section 6, we provide simulation evidence that suggests our estimates of the log-densities are suitable substitutes for RP estimates in this application. Moreover, we show that an alternative method based on the infeasible score can outperform the Klein and Spady estimator.

Regression Discontinuity Design
One context in which the behavior of estimates near or at the boundary is of special importance is that of regression discontinuity design. For instance, Cattaneo, Jansson, and Ma (2020) provide a test of continuity of the density function at the boundary using a boundary density estimator that is similar to the estimator in Lejeune and Sarda (1992) in that it is based on a quadratic expansion of the distribution function. Compared to that approach, our method requires the density to be nonzero at the boundary, which is a requirement for the regression discontinuity framework in any case. The bottom line is that our method will work better if the log-density is approximately a low-order polynomial near the boundary, and theirs if the density itself is approximately a low-order polynomial. This is borne out by our simulation results.

Other Boundary-Correction Methods
The boundary correction method of Karunamuni and Zhang (2008) requires a well-behaved estimate of L′(0), which is exactly what our method provides.

The Density and its Derivative
The following simulation exercise compares the performance of our estimator for the density and its derivatives near the boundary with alternatives that also employ local polynomial approximations (Lejeune and Sarda, 1992; Loader, 1996; Cattaneo, Jansson, and Ma, 2020) and the generalized reflection method, which also estimates the derivative of the log-density near the boundary to remove the boundary effects of the RP estimator (Karunamuni and Alberts, 2005; Karunamuni and Zhang, 2008). For the local polynomial estimators, we use a local linear approximation to the density or log-density, depending on the method.21 We simulate 2,000 i.i.d. samples of size n = 500 and estimate f at points within one bandwidth of the boundary in order to compare the estimators away from, near, and at the boundary. Since the bandwidths differ across methods, so does the range of values at which the density is estimated in the presented results. The random variables are drawn from each of four distributions whose densities exhibit varying behaviors near their left boundary x = 0. The first is a beta distribution with density f_1(x) = θ_0(1 − x)^{θ_0 − 1}, which is in fact polynomial in x for integer values of θ_0; one would expect this to favor LS-CJM, although that is not reflected in our simulations when θ_0 > 3. The second design is a normal distribution with mean θ_0/2 and variance one, left-truncated at zero. This density is log-quadratic, which could favor our method and Loader's. The third and fourth designs are f_3(x) = (e^{−x} + θ_0 x e^{−x})/(1 + θ_0) and f_4(x) = (e^{−x} + θ_0 x² e^{−x})/(1 + 2θ_0).
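For replication purposes, the four designs can be sampled directly: f_1 is the Beta(1, θ_0) density, f_2 a left-truncated normal, and f_3 and f_4 are mixtures of Exp(1) with Gamma(2,1) and Gamma(3,1), respectively, since x e^{−x} and x² e^{−x}/2 are the Gamma(2,1) and Gamma(3,1) densities. A sketch in Python/numpy (the seed is arbitrary and the samplers are our own, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(42)
theta0, n = 4, 500

def draw_f1(n):
    # f1(x) = theta0 * (1 - x)^(theta0 - 1) is the Beta(1, theta0) density
    return rng.beta(1, theta0, n)

def draw_f2(n):
    # N(theta0/2, 1) left-truncated at zero, by simple rejection sampling
    out = np.empty(0)
    while out.size < n:
        z = rng.normal(theta0 / 2, 1.0, 2 * n)
        out = np.concatenate([out, z[z >= 0]])
    return out[:n]

def draw_f3(n):
    # f3 = Exp(1) w.p. 1/(1+theta0), Gamma(2,1) otherwise (x e^-x is Gamma(2,1))
    shape = np.where(rng.random(n) < 1 / (1 + theta0), 1.0, 2.0)
    return rng.gamma(shape, 1.0)

def draw_f4(n):
    # f4 = Exp(1) w.p. 1/(1+2*theta0), Gamma(3,1) otherwise (x^2 e^-x / 2 is Gamma(3,1))
    shape = np.where(rng.random(n) < 1 / (1 + 2 * theta0), 1.0, 3.0)
    return rng.gamma(shape, 1.0)

samples = {name: draw(n) for name, draw in
           [("f1", draw_f1), ("f2", draw_f2), ("f3", draw_f3), ("f4", draw_f4)]}
```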
For each simulation design, we use our local-linear estimator to estimate f and its derivative using g(t) = (t + z)(t − 1)(1 + zt − t) (PS),22 Loader's local-likelihood estimator (Loader), a local polynomial regression of the empirical CDF (LS-CJM), and a generalized reflection estimator (KZ).23 Wherever a kernel is required, we use the Epanechnikov kernel k(u) = 3(1 − u²)/4 or a truncated version thereof. This choice is not optimal at the boundary for Loader's estimator, but the efficiency loss is small.24 For our method and Loader's, we obtain an estimate of f′ by multiplying the estimate of f by the estimate of L′. We do not estimate f′(x) using the generalized reflection method (KZ) for x > 0 because Karunamuni and Zhang (2008) do not discuss estimation of the derivatives anywhere except at the boundary.
Where possible, we use the asymptotically optimal bandwidth sequences for z = 1 and z = 0. For intermediate values of z, we linearly interpolate the bandwidth.25 For the generalized reflection method, which requires separate bandwidths and a finite-difference approximation to estimate L′ in a first step, we do not develop a theory of the asymptotically optimal inputs. Instead, we select a finite-differencing scheme and choose a combination of auxiliary bandwidths so that the pilot estimate of L′ at the boundary and the density estimate at z = 1 have the same asymptotic variances as our method using g(t) = (t + z)(1 − t). Specifically, we use a main bandwidth equal to the asymptotically optimal bandwidth for our method at z = 1, and we estimate L′(0) using {log f_{nh_L}(2h_L) − log f_{nh_L}(0)}/(2h_L), where f_{nh_L}(2h_L) and f_{nh_L}(0) are respectively a kernel and a boundary-kernel estimator for the density whose variance is proportional to 1/(nh_L).
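The linear interpolation rule can be sketched as follows. The values of h_0 and h_1 below are illustrative assumptions; the final check confirms that when the boundary bandwidth equals twice the interior one (h_0 = 2h_1, assumed here for illustration), the upper edge of the estimation window, x + h, stays constant within h_1 of the boundary.

```python
def interp_bandwidth(x, h0, h1):
    # linear interpolation between the boundary (z = 0) and interior (z = 1)
    # bandwidths; x is the distance of the evaluation point from the boundary
    w = min(x / h1, 1.0)
    return h0 * (1.0 - w) + h1 * w

h0, h1 = 0.6, 0.3   # illustrative values with h0 = 2 * h1
assert interp_bandwidth(0.0, h0, h1) == h0
assert interp_bandwidth(h1, h0, h1) == h1
assert interp_bandwidth(5.0, h0, h1) == h1    # interior: bandwidth stays at h1
# with h0 = 2*h1, the upper window edge x + h equals 2*h1 near the boundary
for x in (0.0, 0.1, 0.2, 0.3):
    assert abs(x + interp_bandwidth(x, h0, h1) - 2 * h1) < 1e-12
```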
Comparing the RMSE of the density estimates at zero in Table 1 and the RMSE for the derivative in Table 2, there is no clear ranking of the estimators. Figure 1 depicts the bias and RMSE of the local linear estimators for θ_0 = 4. The linear approximation of f (LS-CJM) performs well for the beta distribution, but has difficulty estimating the truncated normal density, and the estimate is often negative.26 KZ is generally neither the best nor the worst of the estimators we consider here, but we acknowledge that we have not optimized the inputs into the KZ estimator as thoroughly as the other estimators' inputs. We also note that KZ provides a familiar benchmark away from the boundary because it is simply the RP density estimator using an Epanechnikov kernel for z = 1. As expected, PS and Loader perform similarly for all z in each design. All four estimators have the same asymptotic variance away from the boundary, but differences in the bias terms, which depend on L, f, and their derivatives, cause the relatively minor differences in the AMSE. Local to the boundary, however, we observe more significant differences between the estimators. In three of the four designs, the polynomial approximation to the density function has a greater AMSE than the log-polynomial approximation methods (PS and Loader), and the gap tends to widen at the boundary. The generalized reflection method is relatively well behaved except in the first design, in which the bias of the density estimate is very large and negative at the boundary, despite the fact that f_1(0) is well estimated.

22 The choice of g used here is a simple formula that interpolates the optimal choice at the boundary and away from the boundary.

23 The difference between the estimated coefficients in Cattaneo, Jansson, and Ma (2020) and Lejeune and Sarda (1992) is O_p(1/n). We only provide simulation results for CJM's local polynomial regression of the empirical distribution function on x_i, but we note that Lejeune and Sarda's local polynomial approximation to the empirical distribution is very similar.

24 The relative efficiency of Loader's estimator using the triangular and Epanechnikov kernels at the boundary is about 1.008, meaning the Epanechnikov kernel requires a sample size 1.008 times larger to achieve the same MSE as the triangular kernel. We therefore expect the Epanechnikov kernel with n = 504 to have the same MSE as the triangular kernel with n = 500.

25 Specifically, we use h = h_0{1 − min(x/h_1, 1)} + h_1 min(x/h_1, 1), where h_0 and h_1 are the asymptotically optimal bandwidths at z = 0 and z = 1 and x is the point of evaluation. This bandwidth selection rule implies that the same window is used to estimate the density and its derivatives at all points within h_1 of the boundary, that is, x + h = 2h_1 for all x < h_1.
In Figure 2, we plot the bias and RMSE for the derivative of f. In three of the four simulation designs, PS has a smaller RMSE than Loader in the interior region, which we expected because g(t) = (t + z)(t − 1) minimizes the AMSE of the estimator of L′ using the given bandwidth sequence. Figures 3 and 4 show the estimated log-density and its first derivative. Because the density estimated using a polynomial approximation to F may be negative, the logarithm of the LS-CJM estimator is often a large negative number and may be undefined. The sampling distributions of the estimators for the log-density and its derivative may be highly skewed as a result, and the bias and RMSE of the density estimate are less informative measures, especially when compared with the other estimators. In order to make reasonable visual comparisons, we plot the median error and median absolute error (MAE) instead of the bias and RMSE.

Semiparametric Maximum Likelihood
We now compare our approach with traditional RP estimation when the density is used in the semiparametric estimation of a binary choice model. Let y*_i = x_i′θ_0 + ε_i, where ε_i is a random disturbance independent of x_i. The data consist of an independent sample of observations (x_i, y_i) for i = 1,...,n, where y_i = 1{y*_i ≥ 0}. Klein and Spady (1993) show that the quasi-maximum likelihood estimator obtained by substituting a kernel-based estimator for p(x_i′θ) in (7) achieves the semiparametric efficiency bound under some smoothness assumptions. In order to adequately control the stochastic order of the bias and variance in the nonparametric estimates of p(x_i′θ), Klein and Spady suggest using either a fourth-order kernel with a fixed bandwidth or a second-order kernel with a locally adaptive bandwidth. If one uses a fourth-order kernel, the estimate of p(x_i′θ) can be negative. As a result, they recommend replacing the estimates of p(x_i′θ) and 1 − p(x_i′θ) with their squares in the log-likelihood and then dividing by two. We use the semiparametric estimator based on a fourth-order kernel with a fixed bandwidth to compare with our approach.
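The squaring device is equivalent to replacing the estimated probabilities by their absolute values, since log(x²)/2 = log|x|. A quick check of this equivalence (the values of p below are hypothetical, including a negative one of the kind a fourth-order kernel can produce):

```python
import math

def modified_ll_term(y, p):
    # Klein-Spady modification: replace p and 1-p by their squares and divide
    # by two, so the terms stay defined even when the estimate p is negative
    return 0.5 * (y * math.log(p ** 2) + (1 - y) * math.log((1 - p) ** 2))

def abs_ll_term(y, p):
    # the same term written with absolute values
    return y * math.log(abs(p)) + (1 - y) * math.log(abs(1 - p))

for y in (0, 1):
    for p in (-0.05, 0.3, 0.9, 1.1):
        assert abs(modified_ll_term(y, p) - abs_ll_term(y, p)) < 1e-12
```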
Using our approach, one can estimate log p(x_i′θ) with log ȳ + log f̂_1(x_i′θ) − log f̂(x_i′θ), where ȳ = Σ_i y_i/n, f̂_1 is an estimate of the density of x_i′θ conditional on y_i = 1, and f̂ is the estimated unconditional density. We first estimate the derivatives of the log-density at x_i′θ and then construct the density estimate at x_i′θ from them. As in Klein and Spady (1993), we then maximize the quasi-likelihood to obtain a semiparametric estimate of θ_0, though we do not need to modify the quasi-likelihood to allow for negative density estimates. We will refer to this as the PS-likelihood estimator. Alternatively, we can write the (population) score in terms of L_0′, L_1′, and L′, the log-derivatives of the conditional and unconditional densities of x_i′θ_0. We then estimate the score by substituting the estimated log-derivatives L̂_1′, L̂_0′, and L̂′ of the conditional and unconditional densities of x_i′θ. A semiparametric estimator for θ_0 is then obtained by solving for the value(s) of θ for which the score is zero. In case there are multiple solutions, we take the solution that attains the highest log-likelihood. We will refer to this as the PS-score estimator. Note that we estimate the score of the true model, as opposed to the derivative of the quasi-log-likelihood with respect to θ. Consequently, the PS-score and PS-likelihood estimators can be quite different. In the simulation results reported below, the correlation between the estimates is approximately 0.45, which is in fact weaker than the correlation between the PS-likelihood and Klein-Spady (KS) estimators (0.59).
In the following exercise, we use the same homoskedastic simulation design as in Klein and Spady (1993). The covariates are two-dimensional: x_i1 is chi-square distributed with three degrees of freedom, truncated at six on the right, and x_i2 is an independent standard normal variable truncated at two and negative two. Both covariates are standardized to have zero mean and unit variance. The random disturbance ε_i is standard normal and independent of the x's.
We take θ_01 = θ_02 = 1. Because θ_0 is only identified up to scale, we normalize ‖θ‖ = √2 and θ_1 > 0.28 For the KS estimator, we use a fourth-order Epanechnikov kernel to estimate the density of x_i′θ. For our estimators, we choose g_j(t) = (t + z_ℓ)^{S−j+2}(t − z_r)^{j+1}, where z_ℓ and z_r measure how close the point of evaluation is to the left and right boundaries of the support of x_i′θ, and we use a truncated second-order Epanechnikov kernel for m_z. In the interior, the resulting ω_1 is a fourth-order Epanechnikov kernel when S = 3. Hence, the pointwise variances of the dominant terms in the density estimates are the same for both KS and PS-likelihood. The observed differences between them can be largely attributed to differences in the bias and the boundary correction. For the PS-score estimator, we use S = 2 and recommend a bandwidth sequence proportional to n^{−2/7}.29 This bandwidth sequence converges to zero more quickly than the bandwidths used by Klein and Spady (1993) because the bias in the estimates of the log-derivatives of the density is proportional to h² when S = 2.
For the sake of comparison, however, we use the same interior bandwidth h ≈ 1.4 with all three methods.30 Near the boundaries of the support of the data, we multiply h by 2 − min(z_ℓ, z_r) so that the same window is used to estimate the log-density and its derivatives at all points within distance h of one of the boundaries.
The simulation was repeated 1,000 times. Table 3 reports the simulated bias, median error, variance, and median absolute deviation in the estimates of θ_2 from a sample of n = 100 observations. By design, the PS-likelihood and KS estimators share the same interior bandwidth and the same dominant-term variance.28 The PS-score estimator has the smallest variance, which one would also expect because it uses the same bandwidth with a lower-order polynomial approximation to the log-density. Interestingly, the PS-score estimator has the smallest bias, as well, despite using the same bandwidth. The smaller bias is partially due to the fact that the distribution of the PS-score estimator is less skewed than the other two sampling distributions, as evidenced by the median errors. The median error of the PS-likelihood estimator is much closer to zero and is less than that of PS-score, as we would expect from the higher-order local polynomial approximation. On the other hand, the PS-score estimator improves upon the KS estimator according to all four performance measures. The relatively large bias and median error in KS may be symptomatic of the frequently observed phenomenon in which higher-order kernels and polynomial approximations do not demonstrate their bias-reducing properties in small samples. In addition, the squaring of p̂ and 1 − p̂ in the modified quasi-likelihood adds another source of bias to the KS estimator. Asymptotically, this modification has no effect, but it essentially replaces the estimated p with its absolute value, since log(x²)/2 = log|x|, which naturally tends to decrease the variance and increase the bias of the KS estimator in finite samples.

28 Assuming θ_1 > 0 is a normalization because the propensity p(x′θ) is not restricted to be increasing in x′θ.

29 One should trim the observations as discussed in Klein and Spady (1993) to ensure that convergence in the quasi-score is uniform in θ, but we do not use any trimming in the reported simulation results.

30 Following Klein and Spady (1993), we select h proportional to n^{−1/6.02}. Unlike in their simulation exercise, however, we must also select the constant of proportionality in the bandwidth sequence because we do not use a locally adaptive bandwidth. To do so, we make a somewhat arbitrary appeal to Silverman's rule of thumb for the fourth-order Epanechnikov kernel, h = 2.1276 σ n^{−1/9}, where σ is the standard deviation of the observations.
This example illustrates that estimating the log-derivative of the density can provide a more direct route to the researcher's ultimate goal and may improve on the finite-sample performance of existing methods, even semiparametrically efficient ones. Though it would not be prudent to draw general conclusions based on this example, we hope it is uncontroversial to note that correcting for boundary issues and nonpositive density estimates has predictable effects on the bias and variance of semiparametric estimators. The approach developed in this paper provides a single framework for dealing with boundary and nonpositivity issues so that researchers can choose the inputs (h, g) and possibly m_z to meet their needs.

CONCLUSION
We develop asymptotically normal nonparametric estimators based on a log-polynomial approximation of the unknown density function. By approximating the log-density with a polynomial, we can guarantee that our estimated density is nonnegative; and by using a polynomial approximation instead of a local constant approximation, we achieve the optimal rates of convergence at the boundary of the support as well as in the interior.
Because our approach allows for a relatively large degree of customization (the researcher must specify a bandwidth, a kernel, and a vector-valued function g that is zero at the extremes of its support), we explore the optimal set of inputs. Unlike the standard analysis of optimal kernel and bandwidth inputs, our estimator is nonnegative at the point of evaluation under a relaxed set of constraints. Because these constraints were needed in order to derive the optimal kernel for use with alternative methods, there is no interior solution to the optimal choice of inputs using our approach. If one imposes the additional restriction that g cannot cross zero, the AMSE-minimizing inputs achieve the same asymptotic variance as the standard kernel density estimator with the Epanechnikov kernel in the interior and the optimal boundary kernel derived in Zhang and Karunamuni (1998) at the boundary. Otherwise, if the researcher uses a larger bandwidth, marginal reductions in the asymptotic mean square error are possible. In the extreme, an asymptotically unbiased estimator can be obtained using a suitable choice of g. This approach would be analogous to the use of a higher-order kernel in standard kernel density estimation, except that our methods would still guarantee a nonnegative estimate. More generally, our approach is based on a sample analog to partial integration and can be applied to other settings, as well, such as the estimation of hazard functions or propensity scores.
In simulation exercises, we explore the finite-sample behavior of our approach to density estimation as well as the estimation of a semiparametric binary choice model. Using the constrained-optimal choice of inputs designed to mimic the asymptotic behavior of alternative density estimators as closely as possible, we unsurprisingly find comparable finite-sample performance, especially when compared with the local likelihood estimator of Loader (1996). The advantage of our approach compared with Loader (1996) is that our estimator is computationally simpler, which can be an important feature when the density is used as an input in an estimation algorithm. As an illustration, we apply our approach to the estimation of the semiparametric binary choice model studied by Klein and Spady (1993) and find that our method reduces the finite-sample bias and variance in their Monte-Carlo experiment.

A. PROOFS

and where r_hx = _hx − R_hx β is the vector with elements Let ˆ_hx, R̂_hx be sample analogs of _hx, R_hx, for example
Proof. The right hand side equals

Proof. From (7) and (8) it follows that, by the mean value theorem and Lemma 1,

Proof. Repeat the steps of the proof of Lemma 2.

A.2.2. Distribution.

LEMMA 5. B̂_hx − B_hx = O_p(1/√(nh)) and

Proof. We first show the result for â_hx. The normality of the limit follows from a standard central limit theorem, for example, Eicker (1966). Because the estimator is linear and {x_i} is i.i.d., the asymptotic mean and variance are easily obtained; for example, the (j,s)-element of the asymptotic variance matrix is

Now, for any j,s = 1,...,S, which by standard kernel estimation theory has a limiting mean-zero normal distribution and is hence O_p(1).

Proof. We have

Proof. The bias has two components: one is due to the estimation of β and the other to the finite polynomial approximation. Consider the latter first. We have

For the bias due to the estimation of β, note that by Lemma 6 and Theorem 1 this bias is equal to

Proof. By Lemmas 7 and 8, the desired bias is

Proof. Follows from Lemmas 6 and 8 and Theorem 1.

A.3.2. Distribution.

Proof. The bias result follows from Lemma 9 and the definition of . For asymptotic normality and the variance formula, note that

Note further that by Lemma 6, the asymptotic distribution of D net of bias is governed by Σ_{s=1}^S h^s (β̂_s − β_s) c_{m_s z}, which by (9) and (10) and Lemma 4 is

Thus, which has the stated limit distribution by, for example, Eicker (1966).
Proof of Theorem 2. We show the result for f; the result for L follows from the delta method. By Lemma 12, we only need to consider N − f(x)D. Apply Lemmas 9 and 11.

Condition
Proof. We show the rates for ˆ_hx; the other rates obtain similarly. By a standard kernel variance expansion, we get

The stated results then follow by taking square roots of the relevant rate on the right-hand side of the last displayed equation.

Condition
E f_m − f(x)

Proof. These follow from standard kernel bias expansions.
Proof of Theorem 3. Because 11 = 0 and g can be multiplied by any constant without affecting the estimator, we assume without loss of generality that 11 = −1. Next, we observe that

Hence, h β̂_1 = o_p(1) by Lemmas 13 and 14 when f′(x) = f″(x) = 0 and g is symmetric so that 12 = 0. Thus,

The result involving f̂ β̂_1 follows from the fact that f̂ = O_p(h²) and h β̂_1 = o_p(1).
Proof of Theorem 4. Because 11 = 0 and g can be multiplied by any constant without affecting the estimator, we assume without loss of generality that 11 = −1.
In either case, the denominator in the definition of f * explodes.Thus, although the numerator is O p (n 1/5 ) because c mz1 is generically nonzero, f * converges exponentially toward zero.
If one considers the case x = zh then much the same derivations arise except that one has to take additional expansions.For instance, in the 12 = 0 and f (0) = 0 case, we would have

B. EXISTING METHODS
This appendix presents the key equations in a standardized notation for related estimators.

B.1. Local Polynomial Regression
The estimator analyzed in Cattaneo, Jansson, and Ma (2020) is a local polynomial regression of F_n(x_i) on x_i. That is, they estimate a smooth distribution and its derivatives by solving a kernel-weighted least squares regression of F_n(x_i) on powers of (x_i − x); the first-order conditions yield a linear system of equations. The difference between the estimated polynomial coefficients using (14) and (15) is O_p(1/n), which derives from the difference between n ∫_{x_{(i−1)}}^{x_{(i)}} k((y − x)/h)(y − x)^j dy and k((x_{(i)} − x)/h)(x_{(i)} − x)^j for j = 0,...,2(S + 1), where the subscripts denote the ith order statistic.
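A minimal sketch of such a local polynomial regression of the empirical CDF (our own illustrative implementation, not CJM's code): regress F_n(x_i) on a quadratic in (x_i − x) with kernel weights; the coefficient on the linear term estimates f(x). The Exp(1) sample and bandwidth below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
x_data = np.sort(rng.exponential(1.0, 2000))        # illustrative Exp(1) sample
Fn = np.arange(1, x_data.size + 1) / x_data.size    # empirical CDF at the order statistics

def cjm_density(x, h, S=1):
    # kernel-weighted LS fit of F_n(x_i) on a polynomial of degree S+1 in (x_i - x);
    # the coefficient on the linear term is the density estimate at x
    u = (x_data - x) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)   # Epanechnikov weights
    X = np.vander(x_data - x, N=S + 2, increasing=True)      # columns [1, d, d^2, ...]
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], Fn * sw, rcond=None)
    return beta[1]

print(round(cjm_density(1.0, h=0.4), 3))   # true value: f(1) = e^{-1}, about 0.368
```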

B.2. Local Likelihood Density Estimation
The estimator in Loader (1996) solves a locally weighted maximum likelihood problem. The first term in the objective represents the locally weighted log-likelihood of the data drawn from a log-polynomial density, while the second term comes from the constraint that the density must integrate to one. The first-order conditions for this problem yield the nonlinear system (1/n) Σ_i k((x_i − x)/h)(x_i − x)^j = ∫ k((y − x)/h)(y − x)^j exp{p_{S+1}(y − x)′b} dy, j = 0, 1,..., S + 1, where p_{S+1}(u) = (1, u,..., u^{S+1})′.
For j = 0, the above equation resembles (3), while for j > 0, the left side is a local sample moment, and the right side is the local population moment implied by the polynomial approximation to the log-density.

B.3. Generalized Reflection Method
The generalized reflection method uses a transformation of the data such that the transformed data take support on the entire real line, have a density that coincides with the density of the original data on [0, U), and have a square-integrable second derivative. A transformation that achieves these aims is the mapping {x_i} → {x_i, −ρ(x_i)}, where the function ρ is given by ρ(x) = x + β_1 x² + A β_1² x³ for some A > 1/3. The constant A ensures the transformation is strictly increasing; Karunamuni and Alberts (2005) and Karunamuni and Zhang (2008) use A = 0.55 in their simulations. The density estimator applies a standard kernel estimator to the augmented sample {x_i, −ρ_n(x_i)}, where ρ_n is an estimator of the function ρ. The authors suggest estimating ρ by substituting β_1 with {log f_n(h_0) − log f_n(0)}/h_0, where h_0 is a bandwidth and k_0 is a boundary kernel that satisfies ∫_{−1}^{0} k_0(t) dt = 1, ∫_{−1}^{0} t k_0(t) dt = 0, and ∫_{−1}^{0} t² k_0(t) dt = 0. We note, however, that the details of the finite-difference approximation to the derivative of log f can be modified as long as the estimate of β_1 converges at the rate n^{−1/5}, which is the optimal rate of convergence when f is assumed to have two continuous derivatives.
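A sketch of the transformation and the reflected pseudo-data. The Exp(1) sample and the plugged-in value β_1 = −1 (the true derivative of log f at zero for Exp(1)) are illustrative assumptions; in practice β_1 is estimated as described above. The assertions verify the monotonicity argument: ρ′ is a quadratic whose discriminant is negative whenever A > 1/3.

```python
import numpy as np

A = 0.55   # value used by Karunamuni-Alberts and Karunamuni-Zhang; any A > 1/3 works

def rho(x, b1):
    return x + b1 * x ** 2 + A * b1 ** 2 * x ** 3

def rho_prime(x, b1):
    return 1 + 2 * b1 * x + 3 * A * b1 ** 2 * x ** 2

# rho'(x) is a quadratic with discriminant 4*b1^2*(1 - 3A) < 0 when A > 1/3,
# so rho is strictly increasing for any b1
for b1 in (-2.0, -0.5, 0.5, 2.0):
    xs = np.linspace(0.0, 5.0, 101)
    assert np.all(rho_prime(xs, b1) > 0)

rng = np.random.default_rng(3)
x = rng.exponential(1.0, 1000)   # illustrative sample supported on [0, U)
b1 = -1.0                        # for Exp(1), L'(0) = -1 (estimated in practice)
augmented = np.concatenate([x, -rho(x, b1)])   # original data plus reflected pseudo-data
```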

Figure 1. The bias and RMSE for estimators of f_1, f_2, f_3, and f_4 (arranged from left to right) for θ_0 = 4.

Figure 2. The bias and RMSE for estimators of the derivatives of f_1, f_2, f_3, and f_4 (arranged from left to right) for θ_0 = 4.

Figure 3. The median error and MAE for estimators of L_1, L_2, L_3, and L_4 (arranged from left to right) for θ_0 = 4.

Figure 4. The median error and MAE for estimators of the derivatives of L_1, L_2, L_3, and L_4 (arranged from left to right) for θ_0 = 4.

Table 1. RMSE of the Estimators for the Density at the Boundary (θ_0 = 4).

Table 2. RMSE of the Estimators for the Derivative of the Density at the Boundary (θ_0 = 4).

Table 3. Simulation Results for Semiparametric Estimates of the Linear Parameter in a Binary Choice Model.

By construction, the two estimators are as similar as possible given the two different approaches to density estimation. It is therefore unsurprising that neither dominates the other in terms of bias and variance. The variance of the PS-likelihood estimator is noticeably larger than that of the KS estimator, which one would expect in light of the boundary correction, which reduces bias near the boundaries at the cost of an increased variance.
A.3.1. Bias. The next few lemmas are concerned with the asymptotic bias.

LEMMA 7. The bias in N − f(x) is f(x) Σ_{s=1}^{S+1} P_s(β_1,...,β_s) c_{m_s z} h^s + o(h^{S+1}), where P_s is a complete exponential Bell polynomial.
Under Condition 1, the rates in the following table apply to the variance of f_m, ˆ_hx, and R̂_hx.