Heights and quantitative arithmetic on stacky curves

In this paper we investigate a family of algebraic stacks, the so-called stacky curves, in the context of the general theory of heights on algebraic stacks due to Ellenberg, Satriano, and Zureick-Brown. We first give an elementary construction of a height which is seen to be dual to theirs. Next we count rational points of bounded E-S-ZB height on a particular stacky curve, answering a question of Ellenberg, Satriano, and Zureick-Brown. We then show that when the Euler characteristic of a stacky curve is non-positive, the E-S-ZB height coming from the anti-canonical divisor class fails to have the Northcott property. Next we prove that a generalized version of a conjecture of Vojta, applied to stacky curves with negative Euler characteristic and coarse space $\mathbb{P}^1$, is equivalent to the $abc$-conjecture. Finally, we prove that in the negative Euler characteristic case the purely "stacky" part of the E-S-ZB height exhibits the Northcott property.


Introduction
Two of the outstanding conjectures in number theory are the so-called Manin-Batyrev conjecture [12] on the density of rational points on open subschemes of Fano varieties with respect to a Weil height, and Malle's conjecture [18] on the number of number fields of bounded discriminant, fixed degree, and fixed Galois group. Both conjectures assert, roughly, that the number of objects of appropriate height at most $X$ satisfies an asymptotic formula of the form $C X^{\alpha} (\log X)^{\beta}$, where $C, \alpha, \beta$ are non-negative numbers with $C, \alpha > 0$, and that $C, \alpha, \beta$ can be computed explicitly within their respective geometric and arithmetic frameworks.
In a recent article J. Ellenberg, M. Satriano, and D. Zureick-Brown formulate a bold conjecture that encompasses both the Manin-Batyrev and Malle conjectures as special cases [9, Main Conjecture]. Their conjecture concerns counting rational points with respect to a new theory of heights on algebraic stacks, which extends Weil's theory of heights on varieties. While the Manin and Malle conjectures are well studied, comparatively little is known about the behaviour of rational points on algebraic stacks.
In this article, we study heights on the stacky analogue of a smooth projective algebraic curve defined over $\mathbb{Q}$, that is, a stacky curve. Just as working with algebraic curves greatly simplifies the theory of algebraic geometry, so working with stacky curves greatly simplifies the theory of heights on algebraic stacks. The theory of heights on our stacky curves has the benefit of being completely explicit, and can be understood in an elementary (but not easy) manner. Indeed, we will see that natural questions involving stacky curves lead to an equivalent formulation of the abc-conjecture.
Let $\mathcal{X}$ be a "nice" algebraic stack defined over a number field $K$. Ellenberg, Satriano, and Zureick-Brown show that there is a function $H_{\mathcal{X}}$ on $\mathrm{Vec}(\mathcal{X})$, where $\mathrm{Vec}(\mathcal{X})$ is the collection of isomorphism classes of finite rank vector bundles on $\mathcal{X}$, with the property that whenever $\mathcal{X} = X$ is a projective algebraic variety, $H_X$ has the following properties.
(1) The restriction $H_X|_{\mathrm{Pic}(X)}$ is Weil's height machine in the usual sense.
(2) If $E$ is a vector bundle on $X$ then $H_X(E) = H_X(\det E)$.
We call the above construction the Ellenberg-Satriano-Zureick-Brown height machine. The first point says that the E-S-ZB height machine recovers Weil's height machine when applied to an algebraic variety, and the second says that no new height functions are obtained for algebraic varieties.
The fact that heights are assigned to all finite rank vector bundles and not just line bundles is a crucial feature; indeed the height function that recovers Malle's conjecture in this framework comes from a vector bundle and not a line bundle. This demonstrates that this level of generality is necessary.
We now state the following version of the main conjecture of Ellenberg-Satriano-Zureick-Brown, which recovers (weak forms of) both the Batyrev-Manin conjecture and Malle's conjecture:
Conjecture 1.1 (Stacky Batyrev-Manin-Malle Conjecture of Ellenberg-Satriano-Zureick-Brown). Let $\mathcal{X}$ be a "nice" algebraic stack defined over a number field and let $E$ be a "nice" vector bundle on $\mathcal{X}$. Then there is an open dense substack $\mathcal{U}$ of $\mathcal{X}$ such that for all $\varepsilon > 0$ one has where $a(E)$ is a number depending at most on $E$.
Here "nice" means it satisfies a Northcott type property that allows one to count points in an open substack.
In the framework of [9] the Malle and Manin-Batyrev conjectures represent two extremes of their theory of heights. The Manin conjecture involves counting points on a projective variety with respect to a Weil height and no theory of algebraic stacks is required. On the other hand, Malle's conjecture cannot be interpreted as a problem about counting rational points on a scheme. Indeed, as shown in [9] Malle's conjecture involves counting rational points on the classifying stack BG where G is a finite group. The theory of algebraic stacks is essential for this interpretation of the Malle conjecture, and the standard theory of heights on projective algebraic varieties is insufficient for this purpose.
In this article we study the Ellenberg-Satriano-Zureick-Brown theory of heights on algebraic stacks in the specific case of stacky curves. The family of stacky curves we are interested in lies between the two extremes described above. The stacky curves we consider may be thought of as a smooth projective curve $C$ defined over a number field, along with the data of finitely many points $P$, each endowed with a "stabilizer" group of the form $\mathbb{Z}/m_P\mathbb{Z}$ with $m_P > 1$. The curve $C$ is called the coarse space of the stacky curve, and is the "best approximation" of the stacky curve by an algebraic variety. We think of a stacky curve as being built out of the coarse space by adding multiplicities to finitely many points.
We now give a description of our main results. First, we give a ground-up construction of a type of height function on stacky curves that recovers the E-S-ZB anti-canonical height on a stacky curve with coarse space $\mathbb{P}^1$. This construction is detailed in Section 3. We denote this height function by $H_{-K_{\mathcal{X}}}$. Due to the technical nature of this aspect of our work, we defer stating the results until the next section.
The theory developed in [9] leads to two main conjectures, which depend on the positivity properties of a certain function $\mathrm{edd} : \mathcal{X}(K) \to \mathbb{R}$. In the context of stacky curves, [9, Proposition 4.9] says that $\mathrm{edd}(x) = \log H_{-K_{\mathcal{X}}}(x)$. In other words, edd is the same as the logarithmic anti-canonical height. When edd(x) is positive we think of $\mathcal{X}$ as being "Fanoish" (see [9, Page 39] for a provisional definition of this term) and [9, Main Conjecture] should apply to the anti-canonical height. For stacky curves positivity is equivalent to the anti-canonical height having the Northcott property: there are finitely many points of bounded height. One of our main motivations for this paper is to answer a question of Ellenberg, Satriano, and Zureick-Brown about a stacky curve with "positive" anti-canonical height. Indeed, they asked whether one can obtain Conjecture 1.1 for the stacky curve which has coarse space $\mathbb{P}^1$ and three half-points. Our first theorem answers their question in the affirmative:
Theorem 1.2. Let $\mathcal{X}$ be the stacky curve obtained by adding three half-points to $\mathbb{P}^1$. Then the Stacky Batyrev-Manin-Malle Conjecture is true for $H_{-K_{\mathcal{X}}}$.
In fact we prove more than Theorem 1.2 states: we show that the number of points of bounded E-S-ZB anti-canonical height has an exact order of magnitude.
We note that P. Le Boudec has independently verified this case in a private communication.
A natural question which arises is when the anti-canonical height is "positive" in the sense described above. Equivalently, when does [9, Main Conjecture] apply to the anti-canonical height of a stacky curve? Precisely, when does the E-S-ZB anti-canonical height function produce a genuine height function, in the sense that it satisfies Northcott's property? We show that the positivity of the anti-canonical height is equivalent to the positivity of the Euler characteristic of the stacky curve. We can interpret our family of stacky curves as a mixture of geometry (the coarse space $\mathbb{P}^1$) and arithmetic (the addition of various stacky points). In the purely geometric case of algebraic curves, the geometry determines the arithmetic in the sense that the anti-canonical height has Northcott's property if and only if the curve has positive Euler characteristic. It turns out that in the stacky case the Euler characteristic measures not only the geometry of the coarse space (base scheme) but also the additional structure imposed by the stacky points. This leads to the following theorem, which is an exact analogue of the above statement regarding algebraic curves:
Theorem 1.3. Let $\mathcal{X}$ be a proper smooth stacky curve defined over $\mathbb{Q}$ that has coarse space $\mathbb{P}^1$ or is isomorphic to a smooth projective curve. Then the anti-canonical height $H_{-K_{\mathcal{X}}}$ has the Northcott property if and only if $\chi(\mathcal{X}) > 0$, where $\chi(\mathcal{X})$ is the Euler characteristic of $\mathcal{X}$.
The above result tells us that the E-S-ZB height machine, when applied to the tangent bundle, recovers the behavior of the Weil height machine when applied to the tangent bundle, providing additional evidence that the E-S-ZB theory of heights is the correct generalization of the classical theory. In particular, Theorem 1.3 demonstrates that Conjecture 1.1 is a direct generalization of the classical Batyrev-Manin conjecture for Fano varieties.
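As a hedged illustration of the criterion in Theorem 1.3, the following Python sketch computes the Euler characteristic of a stacky curve from its multiplicities. It assumes the standard orbifold formula $\chi = 2 - 2g - \sum_i (1 - 1/m_i)$, which is not written out in the text above; the function names `euler_characteristic` and `anticanonical_height_is_northcott` are ours and purely illustrative.

```python
from fractions import Fraction


def euler_characteristic(multiplicities, genus=0):
    """Euler characteristic of a stacky curve (assumed orbifold formula).

    chi = 2 - 2g - sum(1 - 1/m_i), where g is the genus of the coarse
    space and the m_i > 1 are the multiplicities of the stacky points.
    """
    chi = Fraction(2 - 2 * genus)
    for m in multiplicities:
        if m < 2:
            raise ValueError("stacky multiplicities must be integers > 1")
        chi -= 1 - Fraction(1, m)
    return chi


def anticanonical_height_is_northcott(multiplicities):
    """Criterion of Theorem 1.3 for coarse space P^1: Northcott iff chi > 0."""
    return euler_characteristic(multiplicities, genus=0) > 0


if __name__ == "__main__":
    # Three half-points on P^1, the case of Theorem 1.2.
    print(euler_characteristic([2, 2, 2]))
    # Four and five half-points: chi is zero, then negative.
    print(euler_characteristic([2, 2, 2, 2]), euler_characteristic([2] * 5))
```

Exact rational arithmetic via `Fraction` is used so that the borderline case $\chi = 0$ is detected without floating-point error.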
When edd(x) is not positive, Ellenberg, Satriano, and Zureick-Brown have proposed a generalized Vojta conjecture applicable to the case of algebraic stacks [9, Conjecture 4.23]. In the case of stacky curves, the stacky Vojta conjecture can be phrased in a particularly simple manner. The failure of edd(x) to be positive is equivalent to the failure of $H_{-K_{\mathcal{X}}}$ to have the Northcott property. Given that the anti-canonical height $H_{-K_{\mathcal{X}}}$ fails to have the Northcott property when the Euler characteristic is negative, a natural question is how far the function is from having Northcott's property. P. Vojta asked similar questions in his thesis (see [26] for an update) and indeed gave a far-reaching series of conjectures relating the geometry and arithmetic of algebraic varieties.
In [9, 4.7] it is speculated that [9, Conjecture 4.23] for stacky curves should follow from some version of the abc-conjecture. In the case of algebraic curves, Vojta's conjecture is known to be equivalent to the abc-conjecture. We show that, much like the case of algebraic curves, the stacky analogue of Vojta's conjecture in the curve case is equivalent to the abc-conjecture. We formulate this as follows:
Theorem 1.4. Let $\mathcal{X}$ be a proper smooth stacky curve defined over $\mathbb{Q}$ that has coarse space $\mathbb{P}^1$ or is isomorphic to a smooth projective curve. Further suppose that $\mathcal{X}$ has negative Euler characteristic. Then the following statements are equivalent:
(1) The abc-conjecture holds; and
(2) For all $\mathcal{X}$ satisfying the hypotheses of the theorem and for all $\delta > 0$ the function $H_{-K_{\mathcal{X}}} \cdot H^{\delta}$ has Northcott's property, where $H([x, y]) = \max\{|x|, |y|\}$ is the usual height function on $\mathbb{P}^1(\mathbb{Q})$.
Theorem 1.4 shows that Conjecture 4.23 in [9] is equivalent to the abc-conjecture, answering a question of Ellenberg, Satriano, and Zureick-Brown.
In [9] the authors wonder if the stacky Vojta conjecture might be more "in reach" for algebraic stacks obtained by rooting along a divisor $D$ on a scheme $X$. The proof of Theorem 1.4 shows that if there is some $m \geq 4$ such that item (2) in Theorem 1.4 holds for $\mathcal{X}_m = \mathcal{X}(\mathbb{P}^1 : ((0, 1, \infty), (m, m, m)))$ then a weak variant of the abc-conjecture can be derived. Specifically, there exists a number $c_m \geq 1$ such that for any coprime $a, b, c \in \mathbb{Z}$ with $a + b = c$ and any $\varepsilon > 0$ we have $\max\{|a|, |b|, |c|\} \ll_{\varepsilon, m} \mathrm{rad}(abc)^{c_m + \varepsilon}$.
In particular, any progress on the stacky Vojta conjecture for curves would lead to substantial progress on the abc-conjecture.
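To make the quantities in the inequality above concrete, here is a small Python sketch that computes the radical $\mathrm{rad}(abc)$ and the ratio $\log \max\{|a|, |b|, |c|\} / \log \mathrm{rad}(abc)$ for a coprime triple with $a + b = c$. The helper names `radical` and `abc_ratio` are hypothetical, and the trial-division factorization is only meant for small inputs; nothing here is taken from the proofs in this paper.

```python
from math import gcd, log


def radical(n):
    """Product of the distinct primes dividing the nonzero integer n."""
    n = abs(n)
    if n == 0:
        raise ValueError("the radical of 0 is undefined")
    rad = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            rad *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:  # whatever remains is a single prime factor
        rad *= n
    return rad


def abc_ratio(a, b):
    """Return log(max(|a|,|b|,|c|)) / log(rad(abc)) for c = a + b.

    The abc-conjecture asserts that for every eps > 0 this ratio exceeds
    1 + eps for only finitely many coprime triples.
    """
    if gcd(a, b) != 1:
        raise ValueError("a and b must be coprime")
    c = a + b
    return log(max(abs(a), abs(b), abs(c))) / log(radical(a * b * c))


if __name__ == "__main__":
    # 1 + 8 = 9 has radical 6, so the ratio is log 9 / log 6.
    print(abc_ratio(1, 8))
```

Triples with ratio larger than 1, such as $1 + 8 = 9$, are the ones that an exponent $c_m + \varepsilon$ in the displayed bound has to accommodate.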

A further elaboration of our ideas
In this section we motivate and describe our main results in more detail, and describe our ground-up height construction.
2.1. An elementary height machine on stacky curves. We adopt a bottom-up perspective on algebraic stacks. In other words, we define our algebraic stacks in terms of a base variety along with some extra data which is enough to uniquely construct an algebraic stack. As we are interested in a well-behaved family of stacky curves, this description will be particularly simple. The bottom-up point of view allows us to discuss the objects we are interested in in a concrete way that avoids technicalities and emphasizes the data most important for our purposes. The interested reader may consult [14] for general results involving the bottom-up perspective on algebraic stacks and [25, Lemma 5.3.10] for the case of stacky curves.
Using the bottom-up perspective, a stacky curve defined over a number field $K$ is uniquely determined by the following data:
(1) A smooth projective curve $X$ defined over $K$.
(2) A finite number of stacky points $P_1, \dots, P_r$ along with integer multiplicities $m_{P_i} = m_i > 1$ attached to each point $P_i$.
We use the notation $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ to denote the stacky curve with multiplicities $m_{P_i} = m_i$ at the points $P_i$. We will write $\mathcal{X}(X : (a, m))$ as an abbreviation. We identify the rational points of the stack $\mathcal{X}$ with those of the coarse space. That is, we require $\mathcal{X}(K) = X(K)$. In [9] some care is taken to work with the locus of stacky points on an algebraic stack. In particular, one must contend with the accumulation of infinitely many new stacky points. We ignore such difficulties since in the context of Conjecture 1.1 one may always remove such an accumulating collection of stacky points. Therefore, from the point of view of counting points asymptotically on stacky curves, this presents no issues.
We will now work exclusively with stacky curves of the shape $\mathcal{X}(\mathbb{P}^1_{\mathbb{Q}} : (a_1, m_1), \dots, (a_n, m_n))$, the algebraic stack obtained by replacing $\{a_1, \dots, a_n\} \subseteq \mathbb{P}^1$ with $B(\mathbb{Z}/m_j\mathbb{Z})$ for $j = 1, \dots, n$. For a precise definition of this object, see [14]. In [9] these are treated in Section 4.6. The points $a_1, \dots, a_n$ are "stacky points" which have an attached stabilizer group $\mathbb{Z}/m_j\mathbb{Z}$ and account for $\mathcal{X}(\mathbb{P}^1_{\mathbb{Q}} : (a, m))$ not being a projective variety. The other points behave like the points on the quasi-projective variety $\mathbb{P}^1 - \{a_1, \dots, a_n\}$. This construction is an example of a "bottom-up" description of an algebraic stack; the algebraic stack $\mathcal{X}(\mathbb{P}^1_{\mathbb{Q}} : (a_1, m_1), \dots, (a_n, m_n))$ is constructed from the data of $\mathbb{P}^1_{\mathbb{Q}}$ and the points with their associated multiplicities. In fact a large class of algebraic stacks can be constructed in this manner; see [14] for details, and for further references and recent appearances of these objects see [25], [2], [27], [9, 4.6].
To apply the E-S-ZB height machine to a stacky curve $\mathcal{X}(X : (a, m))$ we must choose a vector bundle on $\mathcal{X}(X : (a, m))$. We will be primarily interested in the stacky curves $\mathcal{X}(\mathbb{P}^1_{\mathbb{Q}} : (a_1, m_1), \dots, (a_r, m_r))$ and in the case where the vector bundle is a line bundle. As in the classical case one may work with divisors rather than line bundles. It suffices to consider divisors of the form where $D$ is a divisor on $\mathbb{P}^1$ and $0 \leq c_i \leq m_i - 1$. To associate a height to such a divisor, we associate a height to each $c_i[a_i]$ and extend linearly.
Motivated by [9] we have the following construction. For each stacky point $a_i = [\alpha_i : \beta_i]$ with $\alpha_i, \beta_i$ coprime integers we associate to it the linear form $\ell_i(x, y) = \alpha_i y - \beta_i x$. For each $m_i$ define $\phi_{m_i}(n)$ to be the smallest positive integer such that $n\phi_{m_i}(n)$ is a perfect $m_i$-th power. The height function associated to $c_i[a_i]$ is then The linear form $\ell_i$ takes into account the point $a_i$ and $\phi_{m_i}$ accounts for the multiplicity of $a_i$, while the power $c_i$ accounts for the multiple of $[a_i]$. The introduction of the functions $\phi_{m_i}$ is due to [9], and working with these functions is a key feature of stacky curves with coarse space $\mathbb{P}^1$. As any divisor $D$ on $\mathbb{P}^1$ is linearly equivalent to $\frac{\deg D}{2}(-K_{\mathbb{P}^1})$ we can define a height function for any divisor whenever $x, y$ are coprime integers.
We call the above construction the Stacky Curve Height Machine. The benefit of this construction is that it is completely explicit and requires no machinery from the theory of algebraic stacks to compute. For example, in [9] one computes the height using a stacky version of Arakelov theory and an object known as a tuning stack. Such difficulties can be avoided in our context. In fact, this construction detects the S-integral points of the stacky curve $\mathcal{X}$ in a suitable sense. In particular a point $(x : y)$ is an integral point of $\mathcal{X}(\mathbb{P}^1_{\mathbb{Q}} : (a_1, m_1), \dots, (a_r, m_r))$ if and only if is the Euler characteristic of the stacky curve, which is defined as the degree of the anti-canonical divisor.
When we wish to emphasize the dependence on $(a, m)$ we write $H_{-K_{\mathcal{X}}}([x : y]) = H_{(a,m)}(x, y)$.
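The two ingredients of the construction that are stated explicitly above, the linear form $\ell_i$ and the function $\phi_m$, are easy to compute. The Python sketch below is only an illustration under stated assumptions: the names `linear_form` and `phi` are ours, `phi` is applied to $|n|$ (a convention we assume for negative arguments), and the factorization uses naive trial division. It deliberately does not assemble a full height, since the displayed formula is not reproduced here.

```python
def prime_factorization(n):
    """Return {prime: exponent} for a positive integer n (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi(m, n):
    """Smallest positive integer f such that |n| * f is a perfect m-th power.

    Assumption: negative n is handled through its absolute value.
    """
    if m < 2:
        raise ValueError("m must be an integer greater than 1")
    n = abs(n)
    if n == 0:
        raise ValueError("phi is not defined at 0 (the stacky point itself)")
    result = 1
    for p, e in prime_factorization(n).items():
        # Raise each prime exponent to the next multiple of m.
        result *= p ** ((-e) % m)
    return result


def linear_form(alpha, beta, x, y):
    """The form l(x, y) = alpha*y - beta*x attached to the point [alpha : beta]."""
    return alpha * y - beta * x


if __name__ == "__main__":
    # For m = 2, phi returns the square-free part: 12 * 3 = 36.
    print(phi(2, 12))
    # Stacky point -1 = [-1 : 1] gives l(x, y) = -(x + y).
    print(linear_form(-1, 1, 3, 5))
```

For $m = 2$ the function `phi` returns the square-free part of its argument, which matches the appearance of $\mathrm{sqf}$ in the examples with half-points.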

2.2. Properties of the anti-canonical E-S-ZB height $H_{-K_{\mathcal{X}}}$. The Northcott property of the naive height on $\mathbb{P}^1$ implies that the E-S-ZB anti-canonical height $H_{-K_{\mathcal{X}}}$ has the Northcott property whenever $\chi(\mathcal{X}) > 0$. On the other hand, if $\chi(\mathcal{X}) \leq 0$ then it is not at all obvious whether $H_{-K_{\mathcal{X}}}$ should have the Northcott property. The following question is fundamental: Let $L$ be a line bundle on $\mathcal{X}(X : (a, m))$ and let $H_L$ be the associated E-S-ZB height. When does $H_L$ have the Northcott property? We will tackle this question when $L = -K_{\mathcal{X}}$ and $X = \mathbb{P}^1$, leaving the general case for future study. In this situation, this amounts to studying the positivity properties of the function edd(x) defined in [9]. This is important for the following reason: when edd(x) is sufficiently positive, [9, Main Conjecture] should apply, while when edd(x) is negative the stacky Vojta conjecture of [9] should be applicable.
This may seem surprising for the following reason: in the case $m = (2, \dots, 2)$ of length $n \geq 5$, one might expect the height (2.2) to have the Northcott property, based on the heuristic that although the exponent of the naive height is negative, the complexity of the equations should compensate for this. For example consider the following height x, y are coprime integers and $\mathrm{sqf}(|x|)$ is the square-free part of $x$. The Northcott property asks if there are infinitely many $[x : y] \in \mathbb{P}^1(\mathbb{Z})$ with $H([x : y])^2 \leq T$ for some number $T$. A priori it may seem difficult to find infinitely many coprime pairs of integers $(x, y)$ such that sqf(|
If we assume that the E-S-ZB theory should behave roughly like its classical counterpart we can argue the converse: when $\chi(\mathcal{X}) \leq 0$, the height $H_{-K_{\mathcal{X}}}$ should fail to have the Northcott property.
Arithmetic intuition from the theory of curves: For an algebraic variety $X$ defined over a number field $K$, we say that a function $H_X$ has the strong Northcott property if for any positive numbers $T, D$ the set
(2.4) $N(X, K; D, T) = \{P \in X(\overline{\mathbb{Q}}) : H_X(P) \leq T, [K(P) : K] \leq D\}$
is finite. Here $K(P)$ is the finite extension of $K$ obtained by adjoining the coordinates of $P$. We introduce some new notions to make our statements as precise as possible.
Definition 2.2 ([13]). Let $C$ be a smooth and geometrically integral projective curve defined over a number field $K$. Let $e \geq 1$ be an integer. Define The arithmetic degree of irrationality of $C$ is defined to be $\mathrm{a.irr}_K(C) = \min\{e \in \mathbb{Z}_{\geq 1} : C_e \text{ is infinite}\}$.
The arithmetic degree of irrationality is always finite since the gonality of a curve provides an upper bound (see [13] for an introduction and further references). Indeed, if $C$ is a smooth projective and geometrically integral curve defined over a number field $K$ with gonality $d$ then there is a degree $d$ finite surjective morphism $\phi : C \to \mathbb{P}^1_K$, which provides infinitely many points on $C$ of degree at most $d$. The notion of arithmetic irregularity can be related to the failure of the Northcott property for the anti-canonical height.
Proposition 2.3. Let $C$ be a smooth projective and geometrically integral curve defined over a number field $K$ with $\chi(C) \leq 0$. Let $-h_C$ be the logarithmic anti-canonical height of $C$. Then
Proof. First suppose that $\chi(C) = 0$. Then $g(C) = 1$ and $C$ is a genus 1 curve. Therefore the canonical divisor is trivial. Then we have that $\mathrm{a.irr}_K(C) \in \{1, 2\}$. If $\mathrm{a.irr}_K(C) = 1$ then $C$ has infinitely many $K$-points, and thus $-M_1 \leq -h_C(P) \leq M_1$ for some constant $M_1$, and so $\min\{D \in \mathbb{Z}_{\geq 1} : \#N(C, K; D, M_1) = \infty\} = 1$. Conversely if $\min\{D \in \mathbb{Z}_{\geq 1} : \#N(C, K; D, M_1) = \infty\} = 1$ then there are infinitely many $K$-points on $C$ and we have $\mathrm{a.irr}_K(C) = 1$. If $\mathrm{a.irr}_K(C) = 2$ then $C(K)$ is finite. Thus $\min\{D \in \mathbb{Z}_{\geq 1} : \#N(C, K; D, M_1) = \infty\} = 2$. Conversely if $\min\{D \in \mathbb{Z}_{\geq 1} : \#N(C, K; D, M_1) = \infty\} = 2$ then $C(K)$ is finite. We may now assume that $\chi(C) < 0$, which means $g(C) > 1$. Therefore $C$ is of general type, and $h_C$ is a height associated to the canonical divisor. As $g > 1$ the canonical divisor is ample, therefore there is some constant $T$ with $h_C(P) \geq T$ for all $P \in C(K)$. Therefore, $-h_C(P) \leq -T$ for all $P \in C(K)$. Thus
The anti-canonical height of an algebraic curve of non-positive Euler characteristic defined over a number field always fails to have the strong Northcott property. In the case of stacky curves of the form $\mathcal{X}(\mathbb{P}^1 : (a, m))$ we have that the arithmetic irregularity and gonality are both equal to one. It is clear that there are infinitely many rational points, and the coarse space mapping gives a cover $\mathcal{X} \to \mathbb{P}^1$. Therefore if the E-S-ZB theory behaves analogously to the theory of Weil heights, we expect the E-S-ZB anti-canonical height on stacky curves with non-positive Euler characteristic to also fail to have Northcott's property.
Geometric intuition from the interplay of positivity properties and the theory of heights: Let $X$ be a smooth projective variety defined over a number field $K$. If $L$ is a line bundle on $X$ there is a height function $H_L$ associated to $L$. If $L$ is ample then $H_L$ has the Northcott property, while if $L$ is not ample we expect $H_L$ to fail to have the Northcott property. Thus, to determine when $H_L$ has the Northcott property on a stacky curve $\mathcal{X}$, we should determine when $L$ is an "ample line bundle". There does not seem to be a general theory of ample line bundles on algebraic stacks; however, for stacky curves there is a natural notion extending the classical definition. Let $\mathcal{X}$ be a nice stacky curve. Then there is a natural notion of degree for line bundles, extending the usual notion of the degree of a line bundle on a curve. It is then natural to call a line bundle $L$ on $\mathcal{X}$ ample if $\deg_{\mathcal{X}} L > 0$. Thus one might expect that $H_L$ has the Northcott property if and only if $\deg_{\mathcal{X}} L > 0$. Now take $L = T_{\mathcal{X}}$. Then one has $\deg_{\mathcal{X}} T_{\mathcal{X}} = \chi(\mathcal{X})$, and consequently we should expect that the anti-canonical height $H_{T_{\mathcal{X}}}$ has the Northcott property if and only if $\deg_{\mathcal{X}} T_{\mathcal{X}} = \chi(\mathcal{X}) > 0$.
In summary, Theorem 2.1 shows that the arithmetic and geometric intuitions described above prove to be correct when $X$ is birational to $\mathbb{P}^1$. In other words, the E-S-ZB theory behaves as predicted by the classical situation of curves and the classical theory of heights associated to ample line bundles. This answers a question posed by Ellenberg. It also suggests that the theory of heights may be of help in developing a theory of positivity in the setting of algebraic stacks. Indeed, one potential way of proceeding in certain cases is the following. It is relatively easy to show that for any line bundle $\mathcal{L}$ on a stacky curve $\mathcal{X} = \mathcal{X}(X : (a, m))$ one can find an integer $N_{\mathcal{X}}$ such that $N_{\mathcal{X}} \mathcal{L} = \pi_{\mathcal{X}}^* L$ for some line bundle $L$ on $X$; here $\pi_{\mathcal{X}}$ is the coarse space mapping. Since there is a good theory of ampleness on $X$ one might try to say that $\mathcal{L}$ is ample if and only if $N_{\mathcal{X}} \mathcal{L}$ is ample. This is equivalent to demanding that $\deg_{\mathcal{X}} \mathcal{L} > 0$ as
As the second notion may be extended to certain other classes of algebraic stacks, our results suggest that this may be an appropriate generalization of ampleness in arithmetic settings. On the other hand, this notion of ampleness only depends on the coarse space of the curve, which makes it potentially unsatisfactory from a stacky point of view. Getting to the bottom of this tension seems to be very interesting and deserves further study.
The proof of Theorem 2.1 uses the following theorem about elliptic curves: be a non-singular binary quartic form. Then there exists a square-free $d \in \mathbb{Z}$ such that the curve has a rational point and such that its Jacobian has positive rank as an elliptic curve defined over $\mathbb{Q}$.
The proof of Theorem 2.4 was provided to us by A. Shnidman in [20], and we gratefully acknowledge his assistance.
Combining these results gives the following uniform statement.
Corollary 2.5. Let $\mathcal{X}$ be a smooth proper stacky curve defined over $\mathbb{Q}$ such that $\mathcal{X}$ has coarse space $\mathbb{P}^1_{\mathbb{Q}}$ or $\mathcal{X}$ is a projective algebraic curve. Let $H_{\mathcal{X}}$ be the height associated to the anti-canonical divisor $-K_{\mathcal{X}}$. Then $\chi(\mathcal{X}) > 0$ if and only if $H_{\mathcal{X}}$ has the strong Northcott property.
Proof. If $\mathcal{X}$ is an algebraic stack and not an algebraic curve then it is of the form $\mathcal{X} = \mathcal{X}(\mathbb{P}^1_{\mathbb{Q}} : (a, m))$ and (2.1) gives the desired result. On the other hand, if $\mathcal{X} = C$ is a smooth projective and geometrically integral curve then $\chi(C) \leq 0$ implies that the anti-canonical height of $C$ does not have the strong Northcott property, by the fact that $\mathrm{a.irr}_K(C) < \infty$.
Corollary 2.5 compels the following question: how should one define the arithmetic irregularity of a stacky curve $\mathcal{X} = \mathcal{X}(X : (a, m))$? There are two natural definitions that come to mind when $\chi(\mathcal{X}) \leq 0$. We might define $\mathrm{a.irr}_K(\mathcal{X}(X : (a, m))) = \mathrm{a.irr}_K(X)$. That is, the arithmetic irregularity of the stack is the arithmetic irregularity of the underlying base curve/coarse space. On the other hand, one might try to use (2.3) to define the arithmetic irregularity of $\mathcal{X}$ when $\chi(\mathcal{X}) < 0$ as Our results show that these definitions are the same when $\chi(\mathcal{X}) \leq 0$ and $\mathcal{X} = \mathcal{X}(\mathbb{P}^1_{\mathbb{Q}} : (a, m))$, precisely as occurs in the case of algebraic curves.
In cases where we can prove that the Northcott property fails, according to [9] there should be a stacky Vojta conjecture. This can be explained in the following elementary manner. We would like to know whether the height $H_{(a,m)}$ can be modified to recover the Northcott property. This has the following motivation from the classical setting. Let $L$ be a line bundle on a smooth projective variety $X$. Let $M$ be a chosen ample line bundle on $X$. Then one considers $\inf\{t \in \mathbb{R}_{\geq 0} : L + tM \text{ is ample}\}$ as a measure of how far $L$ is from being an ample line bundle. We would like to ask similar questions for our height functions. Difficulties arise because the E-S-ZB height machine is not functorial in the usual sense; in the setting of algebraic varieties one can work with a linear space of divisors and then apply the height machine, which by functoriality will respect the linear structure. Such methods are not immediately available to us. Instead we will apply the height machine, and then apply linear operations. Our motivating question is as follows: if $\mathcal{X} = \mathcal{X}(\mathbb{P}^1 : (a, m))$ with $\chi(\mathcal{X}) \leq 0$, what can be said about $\inf\{t \in \mathbb{R}_{\geq 0} : H_{\mathbb{P}^1}^{t} H_{-K_{\mathcal{X}}} \text{ has the Northcott property}\}$?
Clearly, if we change the exponent in the classical part of the height so that it is positive, then we will recover the Northcott property. In fact we expect that something far less drastic suffices.
For a real number $\delta$ and the curve $\mathcal{X}(\mathbb{P}^1 : (a, m))$, define the height We then see that $H_{(a,m)} = H^{\delta(m)}_{(a,m)}$. Next put
(2.6) $\gamma(\mathcal{X}) = \inf\{\delta \in \mathbb{R} : H^{\delta}_{(a,m)} \text{ has the Northcott property on } \mathcal{X}\}$.
In fact $\gamma(\mathcal{X})$ depends only on $m$, so we may also write it as $\gamma(m)$. We make the following conjecture:
Conjecture 2.6 (Northcott conjecture for stacky curves with coarse space $\mathbb{P}^1$). For all $(a, m)$, we have $\gamma(\mathcal{X}) = \min\{\delta(m), 0\}$.
Conjecture 2.6 is in fact a version of Vojta's conjecture for stacky curves, and agrees with a conjecture of Ellenberg, Satriano, and Zureick-Brown in [9]. Towards this conjecture, we have the following:
Combined with Theorem 2.1 the conjecture predicts that the set of $\delta \in \mathbb{R}$ such that $H^{\delta}_{(a,m)}$ has the Northcott property is an interval of the form $(\delta(m), \infty)$ when $\delta(m) < 0$ and $(0, \infty)$ when $\delta(m) \geq 0$. Therefore, while Theorem 2.1 tells us we cannot count points with $H_{(a,m)}$, Conjecture 2.6 predicts that we can count points using $H^{\delta(m)+\varepsilon}_{(a,m)}$ for any $\varepsilon > 0$.
Turning back to our motivating question, we find that Conjecture 2.6 predicts that $\inf\{t \in \mathbb{R}_{\geq 0} : H_{\mathbb{P}^1}^{t} H_{T_{\mathcal{X}}} \text{ has the Northcott property}\} = 0$ when $\chi(\mathcal{X}) \leq 0$. Furthermore, the infimum is a limit point that is not achieved. From this perspective it predicts that the anti-canonical height $H_{-K_{\mathcal{X}}}$ lies on the boundary of those line bundles with the Northcott property. Furthermore, Conjecture 2.6 suggests that the Euler characteristic of the curve can be recovered in a natural way from the E-S-ZB theory of heights, as the smallest possible exponent for the classical part of the height such that the resulting function fails to have the Northcott property.
We proceed to prove that Conjecture 2.6 is a consequence of the abc-conjecture. However, it seems that we are very far from being able to prove a result as strong as Conjecture 2.6 unconditionally.
In fact Conjecture 2.6 is equivalent to the abc-conjecture; see Theorem 1.4. The proof of the converse is quite different and so we give it in a separate subsection.
2.3. Quantitative arithmetic on stacky curves. In the positive Euler characteristic case, we consider a particular family of stacky curves, which includes an important example suggested by J. Ellenberg, and show that our theory of heights matches [9] in this instance. Finally, we verify a specific instance of the main conjecture in [9] given by Ellenberg-Satriano-Zureick-Brown using analytic methods. We remark that P. Le Boudec had obtained the same result as us in independent work (private communication).
We study the expression (2.3) a bit more carefully. It is easy to deduce that $\delta(m) \geq 0$ only if $n \leq 4$, and $\delta(m) > 0$ only if $n \leq 3$. We will not consider the case $n \leq 2$ in this paper.
If we assume $m_1 \leq m_2 \leq m_3$ then the only cases with positive Euler characteristic are $m_1 = m_2 = 2$ (with $m_3$ arbitrary), and $m_1 = 2$, $m_2 = 3$, $m_3 \in \{3, 4, 5\}$. In each of these cases the Northcott property for $H_{(a,m)}$ holds trivially.
We now focus on the simplest cases, where $m_1 = m_2 = 2$ and $m_3 = m$, $m \geq 2$. Using that $\mathrm{PGL}_2$ acts 3-transitively on $\mathbb{P}^1$, we reduce to the case $\{a_1, a_2, a_3\} = \{0, -1, \infty\}$. For $[x, y] \in \mathbb{P}^1$ we may then set $z_{m-1}$ square-free. In this notation, the E-S-ZB height is given by We normalize the height so that the exponent of the "classical part" is equal to one, to obtain the normalized height $z_{m-1}$ square-free and pairwise coprime, We prove the following theorem, which gives a crude upper bound for $N_m(T)$:
Theorem 2.9. Let $\mathcal{X} = \mathcal{X}(\mathbb{P}^1 : (0, 2), (\infty, 2), (-1, m))$ and let $H_m$ be the height function on $\mathcal{X}$ defined by (2.8). Then for any $\varepsilon > 0$ we have
When $m = 2$ the upper bound of Theorem 2.9 is essentially the trivial bound, but it is non-trivial as soon as $m > 2$. In general we expect the exponent in Theorem 2.9 to be equal to the lower bound. Indeed, this can be verified when $m = 2$. Even more, we can give an exact order of magnitude for $N_2(T)$:
Theorem 2.10. There exist positive numbers $c_1, c_2, c_3$ such that
In particular, we confirm the stacky Manin-Batyrev conjecture [9, Main Conjecture] for $\mathcal{X}(\mathbb{P}^1_{\mathbb{Q}} : (a, 2), (b, 2), (c, 2))$. For this stacky curve, [9, Main Conjecture] predicts that $N_2(T) = O_{\varepsilon}(T^{1/2+\varepsilon})$. Our theorem gives an exact order of magnitude for $N_2(T)$. We remark, once again, that P. Le Boudec had obtained the same result. Further, our counting arguments are similar to those of Le Boudec in [3], which studies the equation (7.4).
The other cases with positive Euler characteristic do not yield to the simple analytic counting arguments used to prove Theorem 2.10, though in principle counting rational points by height is a well-posed problem. We plan on returning to this issue in the future.
We illustrate how the stacky curve height machine (equation 2.1) allows one to detect integral points on $M$-curves. In this case the standard height is given by $H_s(a, b) = \max\{|a|, |b|\}$ and the stacky height is given by (2.7). They are equal precisely when $|\operatorname{sqf}(a) \operatorname{sqf}(b) \operatorname{sqf}(a + b)| = 1$, or in the notation of (7.4), when $|x_1| = |x_2| = |x_3| = 1$. Equation (7.4) then turns into $\pm y_1^2 \pm y_2^2 = \pm y_3^2$, and up to rearranging we are essentially counting points on the conic

(2.10) $y_1^2 + y_2^2 = y_3^2$.

Therefore if we denote by $N(T)$ the number of integral points (in the sense of Definition 3.22) on $\mathbb{P}^1_{2,2,2}$ then:

Corollary 2.11. There exist positive numbers $c_1, c_2, c_3$ such that for all $T > c_3$ we have

The proof is elementary, since the curve can be explicitly parametrized by

Theorem 2.10 and Corollary 2.11 imply that asymptotically 0 percent of the rational points in $\mathcal{X}(\mathbb{P}^1 : (0, 2), (-1, 2), (\infty, 2))(\mathbb{Q})$ are integral, in the sense of Darmon (Definition 3.22).
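As a rough illustration of why counting on the conic (2.10) is elementary, the sketch below enumerates primitive solutions using the classical parametrization $(u^2 - v^2, 2uv, u^2 + v^2)$. This is assumed to be the parametrization intended above, since the displayed formula is not reproduced here; the function names are ours.

```python
from math import gcd


def conic_point(u, v):
    """Classical parametrization of y1^2 + y2^2 = y3^2."""
    return (u * u - v * v, 2 * u * v, u * u + v * v)


def primitive_points(bound):
    """Primitive solutions with u <= bound, u > v >= 1 coprime of opposite parity."""
    points = []
    for u in range(2, bound + 1):
        for v in range(1, u):
            if gcd(u, v) == 1 and (u - v) % 2 == 1:
                points.append(conic_point(u, v))
    return points


for y1, y2, y3 in primitive_points(4):
    print(y1, y2, y3)
```

Because $y_3 = u^2 + v^2$, bounding the height by $T$ amounts to counting coprime pairs $(u, v)$ in a disc of radius $\sqrt{T}$.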
To close off this subsection we note that in [2] Bhargava and Poonen study situations where the rational and integral points of a stacky curve satisfy the Hasse principle. Motivated by this work we prove that the integral points on $\mathcal{X}(\mathbb{P}^1 : (a_1, 2), \dots, (a_n, 2))$ satisfy the Hasse principle.
. Then X has integral points if and only if the ternary quadratic form defines a conic with a rational point.
Notation. We write $d_k(n)$ for the number of ways of writing $n$ as a product of $k$ (not necessarily distinct) positive integers, and write $d(n) = d_2(n)$ for the usual divisor function. We will also use big-$O$ notation as well as Landau's notation. In particular, we indicate any dependencies in the subscripts; if there are no subscripts, then the implied constants are absolute.
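For concreteness, here is a small sketch of $d_k(n)$. It assumes the factorizations are counted as ordered $k$-tuples, which is the reading under which $d_2(n)$ is the usual divisor function; the function name `d_k` is ours.

```python
def d_k(n, k):
    """Number of ordered k-tuples of positive integers with product n (k >= 1)."""
    if k == 1:
        return 1
    # Choose the first factor d, then factor n // d into k - 1 parts.
    return sum(d_k(n // d, k - 1) for d in range(1, n + 1) if n % d == 0)


print(d_k(12, 2))  # the divisors of 12 are 1, 2, 3, 4, 6, 12
print(d_k(12, 3))
```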

(Stacky) Heights on $M$-curves
In this section we give an alternative construction of the height functions constructed in [9] in a special case: we construct the ESZ-B heights associated to line bundles on stacky curves with coarse space $\mathbb{P}^1_{\mathbb{Q}}$. This construction is elementary as it avoids the language of algebraic stacks.
For our construction, we eschew the theory of algebraic stacks in favour of Darmon's $M$-curves, an essentially equivalent theory that emphasizes the bottom-up perspective on algebraic stacks. In other words, we only keep track of the minimum amount of data needed to construct the stack. This is analogous to only keeping track of a particular Weierstrass equation of an elliptic curve.
Definition 3.1 (Darmon, [7]). Let $K$ be a number field. An $M$-curve over $K$ consists of the following data:
• A smooth projective curve $X$ defined over $K$, and
• For each $P \in X(K)$ a multiplicity $m_P \in \mathbb{Z}_{\geq 1} \cup \{\infty\}$ with $m_P = 1$ for all but finitely many $P$.
• The $K$-rational points of $\mathcal{X}$ are defined to be $X(K)$.
Before continuing let us fix some notation. We assume that $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ is an $M$-curve as in Definition 3.1. We furthermore assume that $1 < m_{P_i} < \infty$ for each $i$.

3.1. Translation between $M$-curves and stacky curves. We now explain the connection between stacky curves and $M$-curves. Recall that given a nice stacky curve $\mathcal{X}$ over a number field $K$ there is a morphism $\pi : \mathcal{X} \to X$ to a smooth projective curve $X$ called the coarse space morphism, which is the universal morphism from $\mathcal{X}$ to a scheme. Often it is more practical to construct $\mathcal{X}$ from $X$ by specifying a collection of points $P_1, \dots, P_r$ in the coarse space $X$ and attaching cyclic stabilizer groups $\mu_{P_i}$ to each $P_i$. This is the bottom-up approach to constructing an algebraic stack. One can think of this as specifying the ramification data of the coarse space morphism $\pi : \mathcal{X} \to X$. We often think of these points as "fractional points", because in the divisor class group of the associated curve we have added the point $\frac{1}{\#\mu_{P_i}} P_i$. In other words, we think of a stacky curve as a smooth curve $X$ with a choice of points $P_1, \dots, P_r$ with stabilizer groups $\mu_{P_i}$ attached to each $P_i$. This data defines an $M$-curve $\mathcal{X} = (X : (P_1, \#\mu_{P_1}), \dots, (P_r, \#\mu_{P_r}))$. Conversely, given an $M$-curve $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ with each $1 < m_{P_i} < \infty$ we consider the stacky curve with stacky points $P_i$ having the stabilizer group $\mu_{m_{P_i}}$. In this way one may establish a bijection between, on the one hand, smooth proper geometrically connected Deligne-Mumford stacks of dimension 1 over $K$ with stacky points defined over $K$ that contain an open dense subscheme and possess a projective coarse moduli space and, on the other hand, $M$-curves over $K$ with finite multiplicities.
3.2. Construction of heights. We will give an alternative construction of heights on a stacky curve associated to line bundles. Our construction only depends on the definition of an $M$-curve and basic arithmetic. We then show that our height construction corresponds to the heights associated to line bundles in [9] when the coarse space is $\mathbb{P}^1_{\mathbb{Q}}$. As in the classical setting, we will work with height functions up to some bounded function. Line bundles on a stacky curve can be described as follows.

Lemma 3.2. Let $\mathcal{X} = \mathcal{X}(X : (P_1, \dots, P_r), (m_1, \dots, m_r))$. Let $\mathcal{O}_X(P)$ be the line bundle associated to the divisor $P$ on $X$. Then there are line bundles $L_{P_i}$ on $\mathcal{X}$ such that

$L_{P_i}^{\otimes m_i} \cong \pi_{\mathcal{X}}^* \mathcal{O}_X(P_i)$,

where $\pi_{\mathcal{X}} : \mathcal{X} \to X$ is the coarse space map. Moreover, we have that any line bundle $L$ on $\mathcal{X}$ can be uniquely written as

$L \cong \pi_{\mathcal{X}}^* M \otimes \bigotimes_{i=1}^r L_{P_i}^{\otimes d_i}$, with $0 \leq d_i < m_i$ and $M$ a line bundle on $X$.

The lemma tells us that every line bundle on $\mathcal{X}$ decomposes canonically as a line bundle on the coarse space $X$ and some stacky line bundles $L_{P_i}$ that do not arise as line bundles on $X$. However, $L_{P_i}^{\otimes m_i}$ does arise from a line bundle on $X$. On the other hand, in [9, Definition 2.21] a height function $H_{\mathcal{V}}$ decomposes as

(3.3) $H_{\mathcal{V}} = H^{\mathrm{st}}_{\mathcal{V}} \cdot \prod_\nu \delta_{\mathcal{V};\nu}$.

This decomposition highlights the new features of heights on algebraic stacks. The height $H_{\mathcal{V}}$ is no longer additive, in the sense that $H_{\mathcal{V}^{\otimes N}} \neq H_{\mathcal{V}}^{N}$ in general. Furthermore, the height is no longer stable under field extensions. However, $H^{\mathrm{st}}_{\mathcal{V}}$ is stable under field extensions and additive. On the other hand, $H_{\mathcal{V}}$ does not possess a canonical decomposition into local factors, yet $\prod_\nu \delta_{\mathcal{V};\nu}$ does admit such a decomposition.
Given a line bundle $L$ on $\mathcal{X}$ we write $L \cong \pi_{\mathcal{X}}^* M \otimes \bigotimes_{i=1}^r L_{P_i}^{\otimes d_i}$ using Lemma 3.2. Taking inspiration from the decomposition (3.3) we will build the stable height $H^{\mathrm{st}}_L$ from Weil height functions $H_M$ and $H_{P_i}$ on $X(K)$. The local factors $\prod_\nu \delta_{\mathcal{V};\nu}$ will depend on the purely stacky data of the pairs $(P_i, m_i)$. We will construct the local factors $\prod_\nu \delta_{\mathcal{V};\nu}$ from heights $H_{L_{P_i}}$ associated to the stacky line bundles $L_{P_i}$.
When working with a stacky curve $\mathcal{X} = \mathcal{X}(\mathbb{P}^1_{\mathbb{Q}}, (\mathbf{a}, \mathbf{m}))$ this becomes particularly explicit. The stable part of the height $H^{\mathrm{st}}_{\mathcal{V}}$ will be a Weil height function on $\mathbb{P}^1_{\mathbb{Q}}$, in other words a function of the form $H^{\mathrm{st}}_{\mathcal{V}}(x : y) = \max\{|x|, |y|\}^d$ when $x, y$ are coprime integers. To describe the local terms we will construct heights $H_{L_{P_i}^{\otimes d_i}}$ which each possess a decomposition

and recover the local decomposition as

We first construct the stable or classical part of the height. For this we require the notion of degree of a line bundle on an $M$-curve.
We may now define the stable height, which will correspond to the part of the height which can be computed classically.
Definition 3.4. Let $\mathcal{X} = \mathcal{X}(X : (P_1, \dots, P_r), (m_1, \dots, m_r))$ and let $L$ be a line bundle on $\mathcal{X}$ with

$L \cong \pi_{\mathcal{X}}^* M \otimes \bigotimes_{i=1}^r L_{P_i}^{\otimes d_i}$,

where $0 \leq d_i < m_i$ and $M$ is a line bundle on $X$, with $\pi_{\mathcal{X}}$ being the coarse space map. We define the stable height associated to $L$ as

The stable height can be defined abstractly for all stacky height functions. Later, in Proposition 4.3, we prove that this definition matches that in [9]. The stable height should be thought of as the part of the height that can be computed using classical height functions. Our heights will decompose as a product (3.4). We now concentrate on constructing the height function $H_{L_{P_i}^{\otimes d_i}}$ associated to the stacky line bundle $L_{P_i}^{\otimes d_i}$.
Fix an $M$-curve $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ defined over a number field $K$. We further choose a finite set of primes $S$ of $\mathcal{O}_K$ containing all the primes of bad reduction and all infinite places of $K$. We further choose a smooth and proper model $\mathfrak{X}$ of $X$ over $\mathcal{O}_{K,S}$. Everything we do is relative to this choice of model and the finite set of primes $S$.
To define meaningful heights we require the basics of arithmetic intersection theory as described by Darmon. It is unsurprising that intersection arises in the construction of heights, as this already occurs in the classical setting via Arakelov's construction of heights through arithmetic intersection theory.
Definition 3.5 (Darmon, [7]). Fix an $M$-curve $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ defined over a number field $K$. Choose a finite set of primes $S$ of $\mathcal{O}_K$ containing all the primes of bad reduction and all infinite places of $K$. We further choose a smooth and proper model $\mathfrak{X}$ of $X$ over $\mathcal{O}_{K,S}$. Let $P, Q$ be distinct points in $X(K)$ and let $\nu$ be a place of $K$ with $\nu \notin S$. Take $\mathfrak{p}_\nu \subset \mathcal{O}_K$ to be the prime ideal associated to $\nu$. We define the intersection multiplicity of $P$ and $Q$ at $\nu$ as follows.
$(P \cdot Q)_\nu := \max\{m : \text{the images of } P, Q \text{ in } \mathfrak{X}(\mathcal{O}_{K,S}/\mathfrak{p}_\nu^m) \text{ are equal}\}$, where the maximum over the empty set is defined to be 0.
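To make the definition concrete, the following sketch computes the intersection multiplicity in the simplest setting, which is an assumption on our part: $K = \mathbb{Q}$, $X = \mathbb{P}^1$ with its standard model over $\mathbb{Z}$, and points given by primitive integer coordinates. Two such points $[x : y]$ and $[a : b]$ agree modulo $p^n$ exactly when $p^n$ divides $bx - ay$, so the maximum in the definition is $\operatorname{ord}_p(bx - ay)$. All function names are ours.

```python
def ord_p(n, p):
    """p-adic valuation of a nonzero integer n."""
    if n == 0:
        raise ValueError("valuation of 0 is infinite")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def agree_mod(t, P, p, n):
    """Do primitive points t = [x:y] and P = [a:b] coincide in P^1(Z/p^n)?"""
    (x, y), (a, b) = t, P
    return (b * x - a * y) % p**n == 0


def intersection_multiplicity(t, P, p, cap=64):
    """Largest n <= cap such that t and P agree modulo p^n (0 if none)."""
    n = 0
    while n < cap and agree_mod(t, P, p, n + 1):
        n += 1
    return n


t, P = (7, 2), (1, -1)
print(intersection_multiplicity(t, P, 3), ord_p(P[1] * t[0] - P[0] * t[1], 3))
```

The `cap` argument is only a safeguard against looping forever when the two points coincide.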
We now package all the intersection multiplicities together while taking into account the arithmetic of the field extension $K \mid \mathbb{Q}$.

Definition 3.6. Fix an $M$-curve $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ defined over a number field $K$. Choose a finite set of primes $S$ of $\mathcal{O}_K$ containing all the primes of bad reduction and all infinite places of $K$. We further choose a smooth and proper model $\mathfrak{X}$ of $X$ over $\mathcal{O}_{K,S}$. Given a prime $\mathfrak{p}_\nu \subseteq \mathcal{O}_K$ we let

We then package these together as

The integer $\lambda(P, t)$ is an exponential version of the familiar looking intersection product

This is the first stage in the construction of the product of local terms in (3.3). We now show how to compute $\lambda_{S,\mathfrak{X}}$ in the following situations, which cover many cases of interest.

$[x : y] = t \in \mathbb{P}^1(\mathbb{Z})$ with $x, y$ coprime integers. Let $\nu$ be the place associated to a prime $p_\nu$.
Let $E/\mathbb{Q}$ be the elliptic curve given by the affine equation $y^2 = x^3 + Ax + B$ with $A, B \in \mathbb{Z}$. Let $S$ be the set consisting of the primes of bad reduction of $E$ and the infinite place. Let $P = (a, b) \in E(\mathbb{Z})$. If $t = (x, y)$ is any other point in $E(\mathbb{Z})$ then moving to the projective model we have that for any $\nu \notin S$ with associated prime

Let us be more explicit and take $E$ to be given by $y^2 = x(x - 1)(x - \lambda)$ with $\lambda$ an integer and $S$ the set of primes dividing $\lambda$ along with the infinite prime. Take $P = (0, 0) = O$. Now take $(x, y) \in E(\mathbb{Z})$ with $\operatorname{ord}_p(x) = a > 1$ and $\operatorname{ord}_p(y) = b > 1$. Then we have $x = x' p^a$, $y = y' p^b$ and so

and so $p$ is a prime of bad reduction. Thus $b \leq a$ when $\operatorname{ord}_p(x), \operatorname{ord}_p(y) > 1$. Therefore we have $\lambda(O, (x, y)) = \prod_{p \nmid \lambda,\ p \mid \gcd(x,y)} p^{\operatorname{ord}_p(y)}$.
Thus $\lambda(O, (x, y))$ measures the size of the $y$-coordinate of a point which has a non-trivial greatest common divisor.
We will define

The information contained in $\lambda(P, t)$ is the collection of intersection multiplicities $(t \cdot P)_p$. As we are only interested in these exponents we will demand that

We remained purposely agnostic until this point as to which properties $N_{m_i,d_i}$ must have, to allow for a wide variety of potential height functions in all cases. However, we suspect that we can precisely identify the relevant functions $N_{m_i,d_i}$. See Question 4.6. We now specify the functions $N_{m_i,d_i}$ which we will use.

Definition 3.9. We now construct the functions $N_m : \mathbb{Z}/m\mathbb{Z} \to \mathbb{Z}_{\geq 0}$ that will be relevant to us.
(1) The most canonical approach is to consider the remainder of any representative of $n$ modulo $m$ of a given equivalence class. We define the canonical size function as $N_{m,\mathrm{can}}([r]) = r$ for $0 \leq r < m$.
(2) $\mathbb{Z}/m\mathbb{Z}$ also has a canonical involution that can be composed with the canonical function defined above. We define
(3) More generally one could consider

for any $d \in \mathbb{Z}$. With this notation, $N_{m,-} = N_{m,1}$.
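A minimal sketch of the two size functions in (1) and (2) follows. The formula for $N_{m,-}$ is not reproduced above, so the code assumes the canonical involution is $[r] \mapsto [-r]$, which sends the class of $0$ to itself and a nonzero residue $r$ to $m - r$. The function names are ours.

```python
def n_can(n, m):
    """Canonical size function: remainder of any representative n modulo m."""
    return n % m


def n_minus(n, m):
    """Assumed anti-canonical size function: n_can composed with [r] -> [-r]."""
    return (-n) % m


m = 5
for r in range(m):
    print(r, n_can(r, m), n_minus(r, m))
```

Under this reading the two sizes of a nonzero residue always sum to $m$, while the zero class has size $0$ in both cases.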
We can now define the height functions associated to the stacky line bundles $L_{P_i}^{\otimes d_i}$.

Definition 3.10. Fix an $M$-curve $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ defined over a number field $K$. Choose a finite set of primes $S$ of $\mathcal{O}_K$ containing all the primes of bad reduction and all infinite places of $K$. We further choose a smooth and proper model $\mathfrak{X}$ of $X$ over $\mathcal{O}_{K,S}$. Consider the line bundle $L_{P_i}^{d_i}$ on $\mathcal{X}$ given by Lemma 3.2. The stacky height function associated to $L_{P_i}^{d_i}$ is a function

Returning to the situation of a fixed $M$-curve $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ we now may define a stacky height machine for line bundles on $\mathcal{X}$.
Definition 3.11 (The stacky height machine). Fix an $M$-curve $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ defined over a number field $K$. Choose a finite set of primes $S$ of $\mathcal{O}_K$ containing all the primes of bad reduction and all infinite places of $K$. We further choose a smooth and proper model $\mathfrak{X}$ of $X$ over $\mathcal{O}_{K,S}$. Consider the line bundle

be the function that identifies an integer with its remainder modulo $m_i$. The stacky height associated to $L$ is defined to be

We call

the classical part of the height $H_L$ and

the stacky part of the height $H_L$.
Corollary 3.12. Fix an $M$-curve $\mathcal{X} = (\mathbb{P}^1 : (P_1, m_1), \dots, (P_r, m_r))$ defined over a number field $K$. Choose a finite set of primes $S$ of $\mathcal{O}_K$ containing all the primes of bad reduction and all infinite places of $K$. We further choose a smooth and proper model $\mathfrak{X}$ of $X$ over $\mathcal{O}_{K,S}$. The canonical height function $H_{K_{\mathcal{X}}}$ arises from choosing the canonical size function $N_{m_i,\mathrm{can}}$. In other words,

On the other hand the anti-canonical height function $H_{-K_{\mathcal{X}}}$ arises by choosing the anti-canonical size function $N_{m_i,-} : \mathbb{Z}/m_i\mathbb{Z} \to \mathbb{Z}_{\geq 0}$ that sends a residue $r$ to $m_i - r$. That is,

Proof. This follows directly from the definition, 4.4, and the fact that $K_{\mathcal{X}}$ corresponds to the line bundle

We now introduce two multiplicative functions $\phi_m$ and $r_m$ that depend on an integer $m \geq 1$. The function $\phi_m$ will be the multiplicative function associated to $N_{m,-} : \mathbb{Z}/m\mathbb{Z} \to \mathbb{Z}_{\geq 0}$ and $r_m$ will correspond to $N_{m,\mathrm{can}} : \mathbb{Z}/m\mathbb{Z} \to \mathbb{Z}_{\geq 0}$. These functions will be crucial to understanding heights on the $M$-curve $\mathcal{X}(\mathbb{P}^1_{\mathbb{Q}}, (P_1, m_1), \dots, (P_r, m_r))$. The functions $\phi_m$ and $r_m$ are dual to one another in a certain sense. This can be seen on the level of the functions $N_m$ as

This duality is key to understanding the non-linear aspects of heights on stacky curves, for example the differences between the height functions $H_L$ and $H_{L^{-1}}$. To understand the duality between $\phi_m$ and $r_m$ we define a function $\operatorname{rad}_m(x)$. We have that $\phi_m$ and $r_m$ satisfy a functional equation

$\phi_m(x) \, r_m(x) = \operatorname{rad}_m(x)^m$.

The function $\operatorname{rad}_m(x)$ provides a lower bound on the usual radical of an integer, in other words $\operatorname{rad}_m(x) \leq \operatorname{rad}(x)$. These two properties allow us to relate the stacky Vojta conjecture of [9] to the $abc$-conjecture. We think of $r_m$ as the function associated to $N_{m,\mathrm{can}}$ and $\phi_m$ as the function associated to $N_{m,-}$.

Proposition 3.14. Let $x \in \mathbb{Z}_{\geq 0}$. Then we have

In particular we have
From the above formulas we obtain the following. The above gives an alternative relationship between these height functions. After taking $\frac{1}{m}$-th powers the height functions decompose $\operatorname{rad}_m$ into a "canonical" part and an "anti-canonical" part. With these definitions in hand we define height functions on the $M$-curve $\mathcal{X}$ to be functions built from the functions $\phi_{m_P}(\lambda(P, t))^{1/m_P}$ and the classical Weil heights on the coarse space $X$.
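The relationship between $\phi_m$, $r_m$ and $\operatorname{rad}_m$ can be explored numerically. The explicit definitions are not reproduced in this section, so the sketch below assumes the prime-power values $\phi_m(p^k) = p^{(-k) \bmod m}$, $r_m(p^k) = p^{k \bmod m}$, and $\operatorname{rad}_m(p^k) = p$ if $m \nmid k$ and $1$ otherwise, extended multiplicatively. These choices are inferred from the surrounding statements (duality with $N_{m,-}$ and $N_{m,\mathrm{can}}$, and $\phi_m(x) = 1$ exactly for $m$-th powers) and are not taken from a displayed definition. All function names are ours.

```python
def factorize(n):
    """Prime factorization of a nonzero integer as {prime: exponent}."""
    if n == 0:
        raise ValueError("0 has no prime factorization")
    n, factors, p = abs(n), {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def _multiplicative(x, local):
    """Extend a function on prime powers (p, k) -> integer multiplicatively."""
    result = 1
    for p, k in factorize(x).items():
        result *= local(p, k)
    return result


def phi_m(x, m):
    return _multiplicative(x, lambda p, k: p ** ((-k) % m))  # assumed definition


def r_m(x, m):
    return _multiplicative(x, lambda p, k: p ** (k % m))  # assumed definition


def rad_m(x, m):
    return _multiplicative(x, lambda p, k: p if k % m else 1)  # assumed definition


def rad(x):
    return _multiplicative(x, lambda p, k: p)


x, m = 2000, 3  # 2000 = 2^4 * 5^3
print(phi_m(x, m), r_m(x, m), rad_m(x, m), rad(x))
```

With these definitions the product $\phi_m(x) r_m(x)$ is an $m$-th power of a divisor of $\operatorname{rad}(x)$, and $\phi_m(s) = |s|^{m-1}$ for square-free $s$.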
Corollary 3.16. Fix an $M$-curve $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ defined over a number field $K$. Choose a finite set of primes $S$ of $\mathcal{O}_K$ containing all the primes of bad reduction and all infinite places of $K$, and a smooth and proper model $\mathfrak{X}$ of $X$ over $\mathcal{O}_{K,S}$. Consider the line bundle

In particular, when $X = \mathbb{P}^1$ we have

where $\chi(\mathcal{X}) = -\deg K_{\mathcal{X}}$ is the Euler characteristic of $\mathcal{X}$.
One interesting feature of the heights given by (3.11) is that they differentiate between rational and integral points on $M$-curves; this is one of the main features of these heights. This provides additional evidence that Ellenberg, Satriano, and Zureick-Brown's theory of heights on algebraic stacks is an appropriate one.
The connection between our heights and those of [9] is the following, which is proved in Section 4.1.
Theorem 3.17. Fix an $M$-curve $\mathcal{X} = (\mathbb{P}^1_{\mathbb{Q}} : (P_1, m_1), \dots, (P_r, m_r))$. Choose $S$ to be the set of all finite primes of $\mathbb{Z}$ and let $\mathbb{P}^1_{\mathbb{Z}}$ be the canonical model of $\mathbb{P}^1_{\mathbb{Q}}$ over $\mathbb{Z}$. Let $L$ be a line bundle on $\mathcal{X}$. Let $H^{\mathrm{ESZ\text{-}B}}_L$ be the height constructed in [9] associated to $L$. Then there is some constant $C > 0$ with

That is to say, up to a constant the stacky height machine of Definition 3.11 agrees with the ESZ-B height machine of [9] when the coarse space is $\mathbb{P}^1_{\mathbb{Q}}$. We now explain how the functions $\phi_m$ and $r_m$ can be used to understand the difference between $H_L$ and $H_{L^{\otimes n}}$.
Proof. Since $\phi_m$ is multiplicative it suffices to prove the statement for $x = p^a$ where $p$ is some prime. Note that $\phi_m((p^a)^n) = p^{(m - na) \bmod m}$. Therefore $\phi_m((p^a)^{-d \bmod m}) = p^{(m + da) \bmod m}$. On the other hand, $r_m((p^a)^{d \bmod m}) = p^{da \bmod m} = p^{(m + da) \bmod m}$, as needed.
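The exponent bookkeeping in this proof can be checked exhaustively for small parameters. The sketch works only with exponents of a fixed prime and assumes, as inferred from the proof, that $\phi_m(p^k) = p^{(-k) \bmod m}$ and $r_m(p^k) = p^{k \bmod m}$; the helper names are ours.

```python
def phi_exponent(k, m):
    """Exponent of p in phi_m(p^k), under the assumed definition."""
    return (-k) % m


def r_exponent(k, m):
    """Exponent of p in r_m(p^k), under the assumed definition."""
    return k % m


def duality_holds(a, d, m):
    """Compare phi_m((p^a)^(-d mod m)) with r_m((p^a)^(d mod m)) on exponents."""
    left = phi_exponent(a * ((-d) % m), m)
    right = r_exponent(a * (d % m), m)
    return left == right


print(all(duality_holds(a, d, m)
          for m in range(1, 12) for a in range(0, 30) for d in range(-15, 16)))
```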
Theorem 3.19 (Duality theorem). Let $\mathcal{X} = \mathcal{X}(X : (P_1, m_1), \dots, (P_r, m_r))$ be an $M$-curve and let

Fix an integer $n \neq 0$ and write $n_i = n d_i \bmod m_i$.
(1) Then we always have
(2) If $n > 0$ then

In particular,

Therefore we have

Now we have that
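by the definition of the $r_i$. Now let $n > 0$. By Proposition 3.18 we have $\phi_m(\lambda(P_i, t)^{-dn \bmod m}) = r_m(\lambda(P_i, t)^{n d_i \bmod m_i})$, giving the desired conclusion of (2).

Corollary 3.20. Let $\mathcal{X} = \mathcal{X}(X : (P_1, m_1), \dots, (P_r, m_r))$ be an $M$-curve and let

Then

The following special cases are of particular interest.

Let $\mathcal{X} = (\mathbb{P}^1_{\mathbb{Q}}, ([0 : 1], m_1), ([1 : 0], m_2), ([1 : -1], m_3))$. Let $t = [x : y]$ with $x, y, x + y$ coprime integers. Then we have $H_{K_{\mathcal{X}}}(t) \cdot H_{-K_{\mathcal{X}}}(t) = \operatorname{rad}_{m_1}(x) \operatorname{rad}_{m_2}(y) \operatorname{rad}_{m_3}(x + y) \leq \operatorname{rad}(xy(x + y))$.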

By Theorem 3.19 we have
as needed. The remaining result follows from 3.7.
This suggests that the product of a stacky height with the stacky height of its dual may be of arithmetic interest. Indeed, for a single stacky point we see that the product of a stacky canonical height and anti-canonical height gives a lower bound

Moving to three stacky points gives a lower bound

It is now apparent that stacky heights can be related to the $abc$-conjecture. In fact the $abc$-conjecture would now follow from the following statement about stacky heights: Let $\mathcal{X} = (\mathbb{P}^1_{\mathbb{Q}}, ([0 : 1], m_1), ([1 : 0], m_2), ([1 : -1], m_3))$ and a constant $C_{\epsilon,\mathcal{X}} > 0$ such that

Let $\mathcal{X} = \mathcal{X}(\mathbb{P}^1_{\mathbb{Q}}, (P_1, m_1), \dots, (P_r, m_r))$ be an $M$-curve. Then can one always find constants $\epsilon(\mathcal{X}) > 0$ and $C_{\mathcal{X}} > 0$ such that
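The inequality $\operatorname{rad}_{m_1}(x) \operatorname{rad}_{m_2}(y) \operatorname{rad}_{m_3}(x + y) \leq \operatorname{rad}(xy(x + y))$ for coprime $x, y$ is easy to test numerically. The sketch assumes $\operatorname{rad}_m(x)$ is the product of the primes whose exponent in $x$ is not divisible by $m$ (inferred from context, since the displayed definition is not reproduced here); the function names are ours.

```python
from math import gcd


def factorize(n):
    """Prime factorization of a nonzero integer as {prime: exponent}."""
    if n == 0:
        raise ValueError("0 has no prime factorization")
    n, factors, p = abs(n), {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def rad_m(x, m):
    """Assumed definition: product of primes p with ord_p(x) not divisible by m."""
    result = 1
    for p, k in factorize(x).items():
        if k % m:
            result *= p
    return result


def rad(x):
    result = 1
    for p in factorize(x):
        result *= p
    return result


def stacky_radical_product(x, y, m1, m2, m3):
    """rad_{m1}(x) * rad_{m2}(y) * rad_{m3}(x + y) for coprime x, y."""
    return rad_m(x, m1) * rad_m(y, m2) * rad_m(x + y, m3)


# 1 + 8 = 9: every factor is a perfect power for the weights (2, 3, 2).
print(stacky_radical_product(1, 8, 2, 3, 2), rad(1 * 8 * 9))
```

Triples such as $1 + 8 = 9$ show that the left-hand side can be much smaller than the radical when the weights match the exponents that occur.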

3.3. Integral Points on $M$-Curves. Here we show that the height functions defined in (3.11) can be used to obtain information about integral points on $\mathcal{X}$. Fix a stacky curve $\mathcal{X} = \mathcal{X}(X : (\mathbf{p}, \mathbf{m}))$.
Let $H_{\mathcal{X},P_i}$ be the height function associated to the line bundle $L_{P_i}$. We have $H_{\mathcal{X},P_i}(t) = \phi_{m_i}(\lambda(P_i, t))^{1/m_i}$. We will be interested in the height function

This is the stacky part of the anti-canonical height. We could also work with the stacky part of the canonical height

We will prove that the set of integral points is contained in the set of points where $H_{D_{\mathcal{X}}}(t) = 1$. When we take $K = \mathbb{Q}$ we see that this condition is sufficient. In other words the $S$-integral points are those where the stacky part of the height is trivial. Following Darmon [7] we have the following notion of integral points on an $M$-curve:

Definition 3.22 (Darmon). Let $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ be an $M$-curve over a number field $K$, and $S$ a finite set of places of $K$ containing all primes of bad reduction. Let $\mathfrak{X}$ be a smooth proper model for $X$ over $\mathcal{O}_{K,S}$. The $(\mathfrak{X}, S)$-integral points of $\mathcal{X}$ (usually abbreviated to $S$-integral points of $\mathcal{X}$) are the points $t \in X(K)$ such that

(3.18) $(t \cdot P)_\nu \equiv 0 \bmod m_P$ for all $P \in X(K)$ and $\nu \notin S$.
We shall prove the following theorem.
Theorem 3.23. Let $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ be an $M$-curve over $K$ satisfying our assumptions and choose $S$ and a model $\mathfrak{X}$ as we have specified. Then we have the following conclusions. (1) In particular, the set of $S$-integral points of $\mathcal{X}$ is precisely the set of points where $H_{\mathcal{X},P_i}(t) = 1$ for all $i = 1, \dots, r$.
Fix a prime $\nu \notin S$ and write $(t \cdot P)_\nu = m_P^{e_{\nu,P}(t)} \cdot q_{\nu,P}(t)$ where $e_{\nu,P}(t) \geq 0$ and $q_{\nu,P}(t) \geq 0$ is not divisible by $m_P$. In other words $q_{\nu,P}(t)$ is the $m_P$-free part of $(t \cdot P)_\nu$. Set $N(\mathfrak{p}_\nu) = p_\nu^{f(\nu)}$. Then

Using the functions $\lambda(P, t)$ we can find subsets of the rational points that contain all integral points.

Proposition 3.24. Suppose that $m_P > 1$. Define $\mathcal{X}(P; K) = \{t \in X(K) : H_{\mathcal{X},P}(t) = 1\}$. Then $\mathcal{X}(\mathcal{O}_{K,S,\mathfrak{X}}) \subseteq \mathcal{X}(P; K)$.
We see that each point $P$ with multiplicity $m_P > 1$ imposes a height dropping condition on the set of integral points. Thus to study integral points it suffices to study

We obtain the following, that the stacky part of the anti-canonical height cuts out the integral points. In other words, the integral points are precisely those points where the stacky part of the anti-canonical height vanishes.
Example 3.26. Let $X = C$ be an elliptic curve. Then $H_{D_{\mathcal{X}}} = H_{-K_{\mathcal{X}}}$. In other words the integral points of the stacky elliptic curve are precisely the points where the anti-canonical height vanishes. Since $\phi_m(x) = 1 \iff r_m(x) = 1$ we have that the $S$-integral points of a stacky elliptic curve are also precisely the points where the stacky canonical height vanishes. If $\mathcal{X}$ is a scheme and $X$ is an elliptic curve then this is certainly true, as the canonical height is trivial and every rational point is integral, and vice versa.
Proof of Theorem 3.23. We have already shown part (1) of Theorem 3.23 in Proposition 3.24. We turn to part (2) and assume that $K = \mathbb{Q}$. We know that $\mathcal{X}(\mathcal{O}_{\mathbb{Q},S,\mathfrak{X}}) \subseteq \bigcap_{m_P > 1} \mathcal{X}(P; \mathbb{Q})$ by Proposition 3.24. We now show the reverse inclusion. Let $t \in X(\mathbb{Q})$ with $H_{\mathcal{X},P}(t) = 1$ for all $P$ with $m_P > 1$. Since $K = \mathbb{Q}$ we have that $N(\mathfrak{p}_\nu) = p_\nu$ and $f(\nu) = 1$ for all finite places $\nu$. Fix $P$ with $m_P > 1$. Towards a contradiction suppose that $(t \cdot P)_{\nu_0} \not\equiv 0 \bmod m_P$ for some $\nu_0 \notin S$. Then $e_{\nu_0,P}(t) = 0$. Notice that $H_{\mathcal{X},P}(t) = 1$ means that $\lambda(P, t)$ is an $m_P$-th power. Since $p_\nu \neq p_{\nu'}$ whenever $\nu \neq \nu'$, we have by unique factorization of integers that $(t \cdot P)_\nu = m_P z_\nu(t)$ for some integers $z_\nu(t)$. In particular for $\nu_0$ we have $p_{\nu_0}^{m_P^{e_{\nu_0,P}(t)} q_{\nu_0,P}(t)} = p_{\nu_0}^{m_P z_{\nu_0}(t)}$. Thus $z_{\nu_0}(t) m_P = q_{\nu_0,P}(t)$, which contradicts $q_{\nu_0,P}(t)$ being indivisible by $m_P$. Thus for all $m_P > 1$ and $\nu \notin S$ we have $(t \cdot P)_\nu \equiv 0 \bmod m_P$ and $t$ is an $S$-integral point of $\mathcal{X}$ by definition.
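Over $\mathbb{Q}$ the integrality criterion can be tested directly. The sketch below assumes the coarse space is $\mathbb{P}^1_{\mathbb{Q}}$ with $S = \{\infty\}$, points given in primitive integer coordinates, and $\lambda(P, t) = ay - bx$ up to sign for $P = [a : b]$, $t = [x : y]$; integrality then says that every prime exponent of $\lambda(P_i, t)$ is divisible by $m_i$. All function names are ours.

```python
def prime_exponents(n):
    """Exponents in the prime factorization of a nonzero integer."""
    if n == 0:
        raise ValueError("0 has no prime factorization")
    n, exps, p = abs(n), {}, 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps


def lam(P, t):
    """lambda(P, t) up to sign, for primitive P = [a:b] and t = [x:y]."""
    (a, b), (x, y) = P, t
    return a * y - b * x


def is_integral(t, stacky_points):
    """Check (t . P_i)_p = 0 mod m_i for every prime p and every (P_i, m_i)."""
    for P, m in stacky_points:
        value = lam(P, t)
        if value == 0:
            raise ValueError("t coincides with a stacky point")
        if any(k % m for k in prime_exponents(value).values()):
            return False
    return True


# Three half-points at 0 = [0:1], infinity = [1:0] and -1 = [1:-1].
half_points = [((0, 1), 2), ((1, 0), 2), ((1, -1), 2)]
print(is_integral((9, 16), half_points), is_integral((1, 2), half_points))
```

For the three half-points this recovers the condition that $x$, $y$ and $x + y$ are all squares up to sign, as with $9 + 16 = 25$.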

Stacky curves with coarse space $\mathbb{P}^1$
We focus on the situation when the base curve is $\mathbb{P}^1$. Let $X = \mathbb{P}^1_{\mathbb{Q}}$ and $S = \{\nu_\infty\}$ and take $L$ to be $\mathcal{O}_{\mathbb{P}^1}(1)$ so the ample height is the usual one. We consider the $M$-curve $\mathcal{X} = (\mathbb{P}^1_{\mathbb{Q}} : (P_1, m_1), \dots, (P_r, m_r))$. In this situation the $\lambda(P, t)$ can be easily computed.

Proof. We have that $(t \cdot P_i)_p = \max\{n : [x : y] \equiv [a_i : b_i] \bmod p^n\}$. Note that this means there is some $\lambda \not\equiv 0 \bmod p^n$ with $(x, y) \equiv \lambda (a_i, b_i) \bmod p^n$. Since $a_i, b_i$ have been taken coprime, $p$ does not divide both $a_i$ and $b_i$. Suppose that $p \nmid a_i$ (the other case is similar). Then $\lambda \equiv x a_i^{-1} \bmod p^n$ and therefore $b_i x - a_i y \equiv 0 \bmod p^n$. Thus $(t \cdot P_i)_p = \operatorname{ord}_p(b_i x - a_i y)$. Then we have that

Definition 4.2 (Euler characteristic of $M$-curves [7]). Let $\mathcal{X} = (X : (P_1, m_1), \dots, (P_r, m_r))$ be an $M$-curve. The Euler characteristic of $\mathcal{X}$ is defined by the formula

$\chi(\mathcal{X}) = 2 - 2g - \sum_{i=1}^r \left(1 - \frac{1}{m_i}\right)$,

where $g$ is the genus of the curve $X$. We define the genus of an $M$-curve by the formula $\chi(\mathcal{X}) = 2 - 2g(\mathcal{X})$.
We now begin assembling the necessary ingredients to compare our heights constructed in Section 3 to those in [9]. We first work with $H^{\mathrm{st}}_L$.

Proposition 4.3. Let $\mathcal{X} = \mathcal{X}(X : (P_1, \dots, P_r), (m_1, \dots, m_r))$ and let $L$ be a line bundle on $\mathcal{X}$ with

Proof. In [9] a general definition of the stable height is given. Let $m = \prod_{i=1}^r m_i$. By the properties of the stable height described in [9] we have that

It is a fact that if $L$ is a line bundle on $X$ then $H^{\mathrm{st,ESZ\text{-}B}}_{\pi_{\mathcal{X}}^* L} = H_L \circ \pi_{\mathcal{X}}$. Therefore we have

Taking $m$-th roots gives the desired inequality.
Corollary 4.4. Let $\mathcal{X} = \mathcal{X}(\mathbb{P}^1 : (P_1, \dots, P_r), (m_1, \dots, m_r))$ and let $L$ be a line bundle on $\mathcal{X}$. Then

Proof. Let $L$ be a line bundle on $\mathcal{X}$. Then we may write

On $\mathbb{P}^1$ we have that $\mathcal{O}_{\mathbb{P}^1}(P_i) \cong \mathcal{O}_{\mathbb{P}^1}(1)$. So by definition the stable height is

as needed.
We can now precisely define the heights on a stacky curve with coarse space P 1 Q .
Definition 4.5. Let $\mathcal{X} = (\mathbb{P}^1_{\mathbb{Q}}, (P_1, m_1), \dots, (P_r, m_r))$ be an $M$-curve. Set $P_i = [a_i : b_i]$ with $a_i, b_i$ coprime integers and $\ell_i(t) = b_i x - a_i y$ when $t = [x : y]$ for $x, y$ coprime integers.
Then we have

In particular we have that

Let $x : \operatorname{Spec} \mathbb{Q} \to \mathcal{X}$ be a rational point whose image is not any of the stacky points $P_i$. Then in [9] there is a 1-dimensional stack $\mathcal{C}$ called the tuning stack and a diagram

Moreover, the local discrepancies can be computed at a prime $p$ by

These degrees can be computed locally on $\mathcal{X}$ in terms of the stacky points $P_i$. In other words,

The local degree of $L$ at $P_i$ in $\mathbb{Q}/\mathbb{Z}$ is $\frac{d_i}{m_i}$. Following [9] we have the local degrees

We obtain that the contribution at $P_i$ to the local discrepancy at $p$ can be written as

Now suppose that $r_i = 0$. Then we have that

In other words we have shown that

Thus the local discrepancies are given by

Question 4.6. Our heights agree with the height constructed in [9] up to a bounded constant when $X = \mathbb{P}^1_{\mathbb{Q}}$. Does our stacky height machine recover the heights in [9] for all stacky curves over all number fields? If not, can one define different size functions $\tilde{N}_{m_i,d_i} : \mathbb{Z}_{\geq 0} \to \mathbb{Z}_{\geq 0}$ so that the ESZ-B height associated to $L$ is of the form

This result will follow provided one can show that the local degree of $\bar{x}^* L$ with respect to $P_i$ over a prime $p$ is

In this case, the argument given in Theorem 3.17 would give the desired result. One might further ask if these methods could be extended to compute the height functions of line bundles on certain higher dimensional analogues of stacky curves.

4.2. Morphisms of stacky curves. We will require some results on morphisms between stacky curves.

Definition 4.7 (Darmon, [7]). Let $\mathcal{X}_1 = (X_1, (P, m_P))$, $\mathcal{X}_2 = (X_2, (Q, m_Q))$ be $M$-curves defined over a number field $K$. A morphism of $M$-curves over $K$ is a morphism of algebraic curves $\pi : X_1 \to X_2$ defined over $K$ such that for all $P \in X_1(K)$ with $\pi(P) = Q$ we have that $m_Q \mid e_\pi(P) m_P$, where $e_\pi(P)$ is the ramification index of $\pi$ at $P$. We also call $\frac{e_\pi(P) m_P}{m_Q}$ the ramification index of $\pi$ at $P$ as a morphism of $M$-curves.

Now let $\mathcal{X} = (X : \mathbb{Q}; (P_1, m_1), \dots, (P_r, m_r))$ be an $M$-curve. For any $s < r$ choose positive divisors $d_i$ of $m_i$ for $i = 1, \dots, s$. Then there is a multiplicity lowering morphism $\pi(d_1, \dots, d_s) : \mathcal{X}(X : (P_1, m_1), \dots, (P_r, m_r)) \to \mathcal{X}(X : (P_1, d_1), \dots, (P_s, d_s))$ defined by the identity morphism on $X$. The usefulness of this notion can be seen by the following.

Proof. We have that

Direct computation shows that (U T ) T T = (det T )U T . We then have

Note that if $P = [a : b]$ and $t = [x, y]$ where $a, b$ and $x, y$ are coprime integers then $\lambda(P, t) = ay - bx$ by (4.1). In other words $\lambda(P, t) = L(P, t)$ as defined in (4.9).

$(\operatorname{rad}_m(\det \alpha^{-1}))^{-(m-1)} \phi_m(\lambda(P, t)) \leq \phi_m(\lambda(\alpha(P), \alpha(t))) \leq \operatorname{rad}_m(\det \alpha)^{m-1} \phi_m(\lambda(P, t))$

Proof. Let $L$ be as in (4.1). Then we have that

for some integers $d_1, d_2$ which account for common factors of $\alpha(P)$ and $\alpha(t)$. Let $n_1 = q_1 m + r_1$, $n_2 = q_2 m + r_2$ be integers with $0 \leq r_i < m$.
. Therefore using (4.6) Applying the same reasoning using α −1 we have that Therefore we have as required.

Similarly we have
We can get slightly worse, but more understandable, bounds as follows. We always have $\operatorname{rad}_m(x) \leq \operatorname{rad}(x)$. Note that we have that $\sum_{i=1}^r (1 - 1/m_i) = 2 - \chi(\mathcal{X}) = 2g(\mathcal{X})$. Thus we in fact have

Of particular note is that, when studying the Northcott property, we may change the height by an automorphism. Thus the Northcott property is stable under isomorphism, as expected.

4.3. Northcott property of the canonical height on stacky curves. We now investigate the properties of the canonical height on stacky curves, given by (4.12). When $\delta(\mathbf{m}) = 0$ we see that the canonical height exhibits a clear duality with the anti-canonical height, and so the same argument shows that Northcott's property will fail. When $\delta(\mathbf{m}) < 0$ we then see at once that $H_{(\mathbf{a},\mathbf{m})}$ will have Northcott's property as a consequence of the Weil height having Northcott's property. It remains to consider Northcott's property when $\delta(\mathbf{m}) > 0$. In this case we have $\mathbf{m} = (2, m_2, m_3)$ with

It suffices to show that for any such pair $(m_2, m_3)$ there exist integers $a, b, c$ such that the curve

has infinitely many primitive integral solutions. This is the content of Beukers' paper [1], and we are done.

On the Northcott property of canonical and anti-canonical heights on stacky curves
In this section we prove Theorem 1.3, starting with Theorem 2.1. We start with a reduction procedure for a curve $\mathcal{X}(\mathbb{P}^1 : (\mathbf{a}, \mathbf{m}))$, which we describe colloquially. By convention, we shall take our weight vectors $\mathbf{m}$ to have the property that $m_1 \leq m_2 \leq \dots \leq m_n$.
The above construction defines a morphism by the definition of a morphism of $M$-curves. It is totally ramified in the sense that if $i$ is some index that does not appear in $\mathbf{i}$ then $\pi_{\mathbf{i}}$ has ramification index $m_i$ at $P_i$. We use the term canonical because this type of construction can be used for any $M$-curve. In particular, by taking $\mathbf{i}$ to be the empty set, we obtain the coarse space morphism. We will show that if Theorem 2.1 holds for a totally ramified canonical covering of the shape $\mathcal{X}(\mathbb{P}^1 : (\mathbf{a}', \mathbf{m}'))$, where $\mathbf{a}', \mathbf{m}'$ are obtained from $\mathbf{a}, \mathbf{m}$ respectively by removing a subset of indices, then it also holds for $\mathcal{X}(\mathbb{P}^1 : (\mathbf{a}, \mathbf{m}))$: see Theorem 5.2 below.
Theorem 5.2. Let $\mathcal{X}(\mathbb{P}^1 : (a_1, m_1), \dots, (a_n, m_n))$ be a stacky curve. If the Northcott property fails for the height (2.2) for some totally ramified canonical cover of $\mathcal{X}$, then it will also fail for $\mathcal{X}$.
Proof. Let $\mathcal{X}$ be given as in the statement of Theorem 5.2. We may assume, after reindexing the points if necessary, that the Northcott property for the ESZ-B height fails for the totally ramified canonical cover given by $\mathcal{X}^{(1)} = \mathcal{X}(\mathbb{P}^1 : (a_1, m_1), \dots, (a_k, m_k))$ for some $k \leq n$. This implies that, for some positive number $C_k$ depending at most on $a_1, \dots, a_k$ and $m_1, \dots, m_k$, there are infinitely many integers $x, y$ such that $\prod_{i=1}^k \phi_{m_i}(\ell_i(x, y))^{1/m_i} \max\{|x|, |y|\}^{\delta(\mathbf{m}^{(k)})} < C_k$.
Next note that the quotient

Observe that $\phi_m(s) \leq |s|^{m-1}$ for any integer $s$, with equality if and only if $s$ is square-free. It follows that

and from here we immediately see from the triangle inequality that $Q \ll_{\mathbf{a}} 1$.
Thus, by replacing $C_k$ with a larger positive number if necessary, we see that the Northcott property also fails for $H_{(\mathbf{a},\mathbf{m})}$ on $\mathcal{X}$. Now, given Theorem 5.2, it remains to consider certain minimal choices of $\mathbf{m}$. We say that $\delta(\mathbf{m})$ is minimally non-negative if there is no subsequence $\mathbf{m}'$ of $\mathbf{m}$ such that $\delta(\mathbf{m}') \leq 0$. We have the following lemma characterizing the minimally non-negative tuples:

Lemma 5.3. Suppose $\mathbf{m} = (m_1, \dots, m_n)$ with $2 \leq m_1 \leq \dots \leq m_n$ is minimally non-negative. Then $n \leq 4$.
It remains to deal with minimally non-negative tuples with n = 3, 4. Before we proceed we will require Theorem 2.4, which we prove now.
Proof of Theorem 2.4. As we remarked earlier, the proof given here was provided to us by A. Shnidman in [20].
For a given non-singular binary quartic form $F \in \mathbb{Z}[x, y]$ given by $F(x, y) = a_4 x^4 + a_3 x^3 y + a_2 x^2 y^2 + a_1 x y^3 + a_0 y^4$ we write $C_F$ for the curve defined by:

The Jacobian of the genus one curve $C_{a,b}$ is the elliptic curve $E_{a,b}$ given by

where $I, J$ are the basic invariants given by $I(F) = 12 a_4 a_0 - 3 a_3 a_1 + a_2^2$ and $J(F) = 72 a_4 a_2 a_0 + 9 a_3 a_2 a_1 - 27 a_0 a_3^2 - 27 a_4 a_1^2 - 2 a_2^3$. By 2-descent, we see that $C_F$ corresponds to a class $c$ in $H^1(\mathbb{Q}, E_{a,b}[2])$. Note that for any integer $d$, the group

We now consider two cases. First suppose that $c$ does not come from 2-torsion. In this case it is immediate that $E$

This is because in this case $C$

If $c$ comes from 2-torsion, then we note that $C_F^{(d)}$ has positive rank, which completes the proof. With Theorem 2.4, we proceed to handle minimally non-negative tuples, starting with the case $n = 4$.
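The invariants $I(F)$ and $J(F)$ are straightforward to evaluate. The sketch below implements the two formulas exactly as stated above; the non-singularity test uses the classical identity that $4I^3 - J^2$ equals $27$ times the discriminant of $F$, which is a standard fact about binary quartics and not a statement taken from this paper. The function names are ours.

```python
def quartic_invariants(a4, a3, a2, a1, a0):
    """Invariants I, J of F = a4 x^4 + a3 x^3 y + a2 x^2 y^2 + a1 x y^3 + a0 y^4."""
    I = 12 * a4 * a0 - 3 * a3 * a1 + a2**2
    J = (72 * a4 * a2 * a0 + 9 * a3 * a2 * a1
         - 27 * a0 * a3**2 - 27 * a4 * a1**2 - 2 * a2**3)
    return I, J


def is_nonsingular(a4, a3, a2, a1, a0):
    """F has distinct roots iff 4 I^3 - J^2 != 0 (classical identity, see lead-in)."""
    I, J = quartic_invariants(a4, a3, a2, a1, a0)
    return 4 * I**3 - J**2 != 0


print(quartic_invariants(1, 0, 0, 0, 1))  # F = x^4 + y^4
print(is_nonsingular(1, 0, -2, 0, 1))     # F = (x^2 - y^2)^2 has repeated roots
```

Both invariants are unchanged when the coefficient list is reversed, which corresponds to swapping $x$ and $y$.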

5.1.
Minimally non-negative tuples with n = 4. We begin with the case m = (2, 2, 2, 2), and we will need Theorem 2.4. By 3-transivity of the action of PGL 2 on P 1 and Lemma 4.10, we may assume that three of the points are 0, 1, ∞, corresponding to the linear forms x, y, x + y in the variables x, y. We then write ℓ(x, y) = ax + by for the linear form representing the 4-th half-point.
Proof. In this case the height is given by $H(x, y) = \operatorname{sqf}(x) \operatorname{sqf}(y) \operatorname{sqf}(x + y) \operatorname{sqf}(ax + by)$, so this is equal to one if and only if each of $x$, $y$, $x + y$, $ax + by$ is a square. To wit, we set . This induces the equation , which is solvable and whose (primitive) integral solutions are parametrized by Inserting this into $ax + by$ gives a(2uv Given a solution $(a, b)$ to this diophantine equation, one obtains a genus one curve defined by which is isomorphic to an elliptic curve, since it has a rational point given by $w_0^2 = F_{a,b}(u_0, v_0)$. In particular, it must be isomorphic to its Jacobian. A simple calculation shows that the Jacobian of this genus one curve is given by the equation so it suffices to find $a, b$ such that $E_{a,b}$ has positive rank. We find that, on setting $u_0 = 1$, $v_0 = 5$ and $a = 17$, $b = -118$, the curve $E_{a,b}$ has positive rank, and therefore $w^2 = F_{a,b}(u, v)$ will have infinitely many integral solutions $(u, v, w)$. This gives infinitely many pairs $u, v$ such that $F_{17,-118}(u, v)$ is a square. This implies our result, since

The general case will follow by applying the same ideas in tandem with Theorem 2.4. Indeed, Theorem 2.4 gives that for any $a, b \in \mathbb{Z}$ such that $F_{a,b}(u, v) = a(u^2 - v^2)^2 + 4bu^2v^2$ is non-singular, there exists $d \in \mathbb{Z}$ such that $C_F^{(d)}(\mathbb{Q}) \ne \emptyset$ and $E_F^{(d)}$ has positive rank. Fixing such a $d$, we see that there are infinitely many co-prime integers $u, v, z$ such that $dz^2 = F(u, v)$. Recall that in this set-up we have $H(x, y) = 1 \cdot 1 \cdot 1 \cdot \operatorname{sqf}(F(u, v)) \le |d|$. This concludes the proof for the $m = (2, 2, 2, 2)$ case.
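To make the mechanism of the proof concrete, the following Python sketch evaluates the height $\operatorname{sqf}(x)\operatorname{sqf}(y)\operatorname{sqf}(x+y)\operatorname{sqf}(ax+by)$ at points coming from a Pythagorean parametrization. The particular parametrization $x = (u^2 - v^2)^2$, $y = (2uv)^2$ is the classical one and is an assumption here, since the displayed parametrization is not reproduced above; the function names are likewise illustrative.

```python
def sqf(n):
    """Square-free part of a non-zero integer: the d > 0 with |n| = d * k^2, d square-free."""
    n = abs(n)
    if n == 0:
        raise ValueError("sqf(0) is undefined")
    d, p = 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent % 2 == 1:
            d *= p
        p += 1
    return d * n  # the leftover n is 1 or a prime occurring to the first power


def height_2222(x, y, a, b):
    """Height for m = (2, 2, 2, 2) with linear forms x, y, x + y, a x + b y."""
    return sqf(x) * sqf(y) * sqf(x + y) * sqf(a * x + b * y)


def pythagorean_point(u, v):
    """Assumed parametrization: x, y and x + y = (u^2 + v^2)^2 are all squares."""
    return (u * u - v * v) ** 2, (2 * u * v) ** 2


x, y = pythagorean_point(2, 1)  # (9, 16), with x + y = 25
# The first three factors are 1, so the height reduces to sqf(a x + b y).
print(height_2222(x, y, 1, 2), sqf(1 * x + 2 * y))
```

On such points the height is governed entirely by the square-free part of the fourth linear form, which is why bounding $\operatorname{sqf}(F(u, v))$ by $|d|$ suffices.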
We now specialize to the points where z i = w j = 1 except for i = 2 and j = 1, 2, as well as x 1 = y 1 = 1. Applying Theorem 2.4 and using the same argument as in the (2, 2, 2, 2) case, we see that there is a choice of w 1 = d such that there are infinitely many choices of x 2 , y 2 , w 2 , z 2 satisfying The height of such a point is given by It follows that there are infinitely many points of bounded height, and so Northcott's property fails.

5.2.
Minimally non-negative tuples with n = 3. To complete the proof of Theorem 2.1, it remains to handle the cases when $n = 3$ and $\chi(X) \le 0$. We shall assume that $m_1 \le m_2 \le m_3$. We then note that $\delta(m) \le 0$ if and only if one of the following conditions is satisfied: (1) $m_1 \ge 3$; (2) $m_1 = 2$ and $m_2 \ge 4$; (3) $m_1 = 2$, $m_2 = 3$ and $m_3 \ge 6$. We deal with the first case. We then write

Now set $x_{i,j} = 1$ for $(i, j) \notin \{(1, 3), (2, 3), (3, 3), (3, 1)\}$ and $x_{1,3} = z_1$, $x_{2,3} = z_2$, $x_{3,3} = z_3$, $x_{3,1} = d$. Then the value of the height $H_{(a,m)}(x, y)$ in this case is given by Observe that It remains to choose $d$ so that the plane cubic curve has infinitely many rational points. This is an easy consequence of the seminal work of Stewart and Top [22, Theorem 7]. In particular, they showed that the number of cube-free integers $d$ with $|d| \le X$ such that the equation $x^3 + y^3 = d$ defines an elliptic curve with rank at least 2 is asymptotically greater than $X^{1/3}$. We of course do not need such a strong statement; indeed we only need one such $d$. This completes the proof for the case $m_1 \ge 3$.
We proceed to handle the case $m_1 = 2$, $m_2 \ge 4$. Using the same notation as in (5.2), we then set and set This gives a curve $dz_1^2 = z_3^4 - z_2^4$. We need to choose square-free $d$ so that this curve has infinitely many integral solutions, and such a $d$ exists by Theorem 2.4. The height $H_{(a,m)}(x, y)$ is given by

Note that $|z_3|^4 \asymp \max\{dz_1^2, z_2^4\}$, so we obtain the upper bound It follows that there are infinitely many integers $x, y$ such that $H_{(a,m)}(x, y)$ remains bounded.
Finally, we resolve the case $m_1 = 2$, $m_2 = 3$, $m_3 \ge 6$. In this case, we use the fact that there exist integers $a, b, c$ with $a$ square-free, $b$ cube-free, and $c$ 6-th power-free such that the equation has infinitely many primitive solutions (see [8, Section 6.3]). Thus, by fixing such a triple $(a, b, c)$ and setting $x_{5,j} = 1$ for all $j \ge 7$ we specialize a point on $X(\mathbb{P}^1 : (0, 2), (\infty, 3), (-1, 6))$ to the curve given by (5.3). The height of such a point is then bounded in terms of $a, b, c$ only, and is thus absolutely bounded. This shows that the Northcott property fails in this case as well.
This concludes the proof of Theorem 2.1.
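The three cases treated in this subsection ($m_1 \ge 3$; $m_1 = 2$, $m_2 \ge 4$; and $m_1 = 2$, $m_2 = 3$, $m_3 \ge 6$) can be checked to exhaust the triples with $\delta(m) \le 0$. The Python sketch below does this by brute force, under the assumption that for $n = 3$ the condition $\delta(m) \le 0$ is equivalent to $1/m_1 + 1/m_2 + 1/m_3 \le 1$ (the strict version of this inequality is the one used in the proof of Theorem 2.7). The function names are illustrative.

```python
from fractions import Fraction


def delta_is_nonpositive(m1, m2, m3):
    """Assumed criterion for n = 3: delta(m) <= 0 iff 1/m1 + 1/m2 + 1/m3 <= 1."""
    return Fraction(1, m1) + Fraction(1, m2) + Fraction(1, m3) <= 1


def case_of(m1, m2, m3):
    """Return which of the three cases applies to 2 <= m1 <= m2 <= m3, or None."""
    if m1 >= 3:
        return 1
    if m1 == 2 and m2 >= 4:
        return 2
    if m1 == 2 and m2 == 3 and m3 >= 6:
        return 3
    return None


# Exhaustive comparison over a small range of ordered triples.
mismatches = [
    (m1, m2, m3)
    for m1 in range(2, 9)
    for m2 in range(m1, 13)
    for m3 in range(m2, 13)
    if delta_is_nonpositive(m1, m2, m3) != (case_of(m1, m2, m3) is not None)
]
print(mismatches)
```

Exact rational arithmetic is used so that boundary triples such as $(2, 3, 6)$, $(2, 4, 4)$ and $(3, 3, 3)$ are classified without rounding error.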

5.3.
Proof of Theorem 2.7. The claim when $\delta(m) = 0$ is covered in Theorem 2.1, so we will not discuss it again. When $\delta(m) > 0$ we note that $n = 3$, and that in each such case there exist integers $a_m, b_m, c_m$ such that the equation has infinitely many primitive integral solutions; see for example [1]. This shows that the Northcott property fails for $H_0$.
We may now work with the case when $\delta(m) < 0$. We see that the height $H_0$ is bounded below by so it suffices to show that this quantity necessarily goes to infinity. We then use the notation from (5.2), to obtain the equation By convention, we have that $x_{i,j}$ is square-free for $1 \le i \le 3$ and $1 \le j \le m_i - 1$. Thus (5.4) is equal to Viewing these products as coefficients in (5.5), we see that if $H_0$ is to be bounded, then these coefficients must be bounded. Therefore, it suffices to check that for a fixed triple of integers $a, b, c$ the equation $ax^{m_1} + by^{m_2} + cz^{m_3} = 0$ has finitely many primitive integer solutions when $1/m_1 + 1/m_2 + 1/m_3 < 1$. But this is exactly the content of Darmon and Granville's paper [8], so we are done.
6. Northcott property of perturbed anti-canonical heights and the abc-conjecture

In this section we prove Theorem 1.4, starting with Theorem 2.8. We consider the problem of recovering Northcott's property for a modified E-S-ZB anti-canonical height on the stacky curve $X(\mathbb{P}^1 : (a, m))$. Here the modified height takes the shape Since has the Northcott property for all $\kappa > 0$. First assume that $\chi(X) = 0$. The Northcott property for the standard height implies that $H^{\delta}_{(a,m)}(x, y)$ has the Northcott property whenever $\delta > 0$. So $\inf\{\delta \in \mathbb{R} : H^{\delta}_{(a,m)} \text{ has the Northcott property}\} = 0 = \chi(X)$ as needed. Now suppose that $\chi(X) < 0$. Assume, without loss of generality, that $m_1 \le m_2 \le \cdots \le m_n$. We then write

It follows that
Suppose that the following inequality holds for any $\varepsilon > 0$, Then by multiplying both sides of the inequality by $\max\{|x|, |y|\}^{\chi(X)+\kappa}$ we obtain Taking $\varepsilon = \kappa/2$ we have that Thus $H^{\chi(X)+\kappa}_{(a,m)}(x, y)$ must have the Northcott property as it cannot remain bounded by the usual Northcott property for $\mathbb{P}^1$. Therefore we are done if we can confirm inequality (6.1). To do so we require the following proposition, due to Granville [15]:

Proposition 6.1 (Granville). Suppose that the abc-conjecture holds. Then for any binary form $F$ with non-zero discriminant and $\varepsilon > 0$ we have

In other words, if the abc-conjecture holds then the radical of $F(m, n)$ will be quite large compared to the variables $m, n$ (provided that the degree is at least 3).
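The qualitative content of Proposition 6.1 can be explored numerically. The Python sketch below computes the radical of $F(m, n)$ and the exponent $\log \operatorname{rad}(F(m, n)) / \log \max\{|m|, |n|\}$. The displayed bound of the proposition is not reproduced above; in the form in which Granville's result is usually quoted, the exponent is at least $\deg F - 2 - \varepsilon$ for coprime $m, n$ with $\max\{|m|, |n|\}$ large, and that shape is taken here as an assumption. The cubic form and the function names are illustrative only.

```python
from math import log


def rad(n):
    """Product of the distinct primes dividing the non-zero integer n."""
    n = abs(n)
    if n == 0:
        raise ValueError("rad(0) is undefined")
    r, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            r *= p
            while n % p == 0:
                n //= p
        p += 1
    return r * n  # the leftover n is 1 or a prime


def radical_exponent(F, m, n):
    """log rad(F(m, n)) / log max(|m|, |n|); needs max(|m|, |n|) > 1 and F(m, n) != 0."""
    return log(rad(F(m, n))) / log(max(abs(m), abs(n)))


def cubic(m, n):
    """Illustrative binary cubic m n (m + n), which has non-zero discriminant."""
    return m * n * (m + n)


# For (m, n) = (1, 8): F = 72 = 2^3 * 3^2, so rad(F) = 6.
print(rad(cubic(1, 8)), radical_exponent(cubic, 1, 8))
```

For this cubic, $\deg F - 2 = 1$, and the pair $(1, 8)$ gives an exponent slightly below 1, which illustrates why the statement is asymptotic and carries an $\varepsilon$.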
We will apply Proposition 6.1 to reduce the proof of Theorem 2.8 to a linear programming problem.
Applying Proposition 6.1 to the binary form in conjunction with the above observation, we obtain: Similarly for each i we have the bound (6.5) |ℓ i (x, y)| ≪ max{|x|, |y|}.
Taking logarithms and writing y i,j = log |z i,j |, we then have an optimization problem: where B = max{|x|, |y|}. Further, we have y i,j ≥ 0 for all i, j.
We emphasize that, at this point, integrality no longer plays a role, and neither do the syzygies relating the $z_{i,j}$'s. Indeed, we only need to solve the above linear program allowing arbitrary real inputs. We have that $c \in \mathbb{R}^N$ where $N = \sum_{i=1}^{n} m_i$.

Now put
Let $A$ be the matrix with rows representing the constraints, If we take $e_{i,j}$ to be a basis of $\mathbb{R}^N$ then the rows of $A$ are given by Finally let $b$ be the column vector with $n + 1$ entries representing the constraints given by (6.7) and (6.8). In other words we have Our linear programming problem is then the following: let $y = (y_{ij})$ ordered as above.

Minimize: $c^T y$ (6.15)
subject to: $Ay \ge b$ and $y \ge 0$.

The dual linear program is

Maximize: $b^T x$ (6.16)
subject to: $A^T x \le c$ and $x \ge 0$,
where $x = [x_0, x_1, \dots, x_n]$. We call a vector $x$ dual feasible if $A^T x \le c$ and $x \ge 0$, and a vector $y$ primal feasible if $Ay \ge b$ and $y \ge 0$. We have the following well known weak duality statement.

Lemma 6.2 (Weak duality). Let $A$ be an $m \times n$ matrix with real entries, $c$ an $n \times 1$ real vector and $b$ an $m \times 1$ real vector. Consider the primal linear program

Minimize: $c^T y$ subject to: $Ay \ge b$ and $y \ge 0$,

and the dual linear program

Maximize: $b^T x$ subject to: $A^T x \le c$ and $x \ge 0$.

Let $y$ be any primal feasible vector and $x$ a dual feasible vector. Then $b^T x \le c^T y$.

Proof. Let $A = (a_{i,j})$. Because $y$ is primal feasible we have $Ay \ge b$. Therefore for all $1 \le i \le m$ we have $\sum_{j=1}^{n} a_{i,j} y_j \ge b_i$. Multiplying by $x_i \ge 0$ and summing over all $i$ we have

(6.17) $\sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j} x_i y_j \ge \sum_{i=1}^{m} b_i x_i = b^T x$.

On the other hand because $x$ is dual feasible we have that $A^T x \le c$, so for each $1 \le j \le n$ we have $\sum_{i=1}^{m} a_{i,j} x_i \le c_j$. Multiplying by $y_j \ge 0$ and summing over all $j$ gives

(6.18) $\sum_{j=1}^{n} \sum_{i=1}^{m} a_{i,j} x_i y_j \le \sum_{j=1}^{n} c_j y_j = c^T y$.

Combining inequality (6.17) and inequality (6.18) gives $b^T x \le c^T y$.

Returning to our problem, the weak duality theorem tells us that it suffices to find a dual feasible solution We first show that $x$ is dual feasible. In this case $A$ is an $(n + 1) \times \sum_{i=1}^{n} m_i$ matrix. So a row of $A^T$ is indexed by a pair $(i, j)$ with $1 \le i \le n$ and $1 \le j \le m_i$. We have that the $(i, j)$ entry of $A^T x = x^T A$ can be computed as Therefore to show that $x$ is dual feasible for an arbitrary $X$ we need that In our case the $(i, j)$ entry of $A^T x$ is given by is a dual feasible solution and $b^T x = (-\chi(X) - \varepsilon) \log B$. By the weak duality theorem we have that $c^T y \ge b^T x = (-\chi(X) - \varepsilon) \log B$ for every primal feasible $y$.
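Weak duality is easy to sanity-check on a toy instance. The following Python sketch verifies feasibility of a primal vector $y$ and a dual vector $x$ and returns the gap $c^T y - b^T x$, which Lemma 6.2 says is non-negative. The matrix and vectors are a made-up two-variable example, not the linear program of this section, and the function names are illustrative.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def is_primal_feasible(A, b, y):
    """Check A y >= b and y >= 0, with A given as a list of rows."""
    return all(v >= 0 for v in y) and all(dot(row, y) >= bi for row, bi in zip(A, b))


def is_dual_feasible(A, c, x):
    """Check A^T x <= c and x >= 0; the columns of A are the rows of A^T."""
    columns = list(zip(*A))
    return all(v >= 0 for v in x) and all(dot(col, x) <= cj for col, cj in zip(columns, c))


def duality_gap(A, b, c, x, y):
    """Return c^T y - b^T x for feasible x, y; weak duality says this is >= 0."""
    if not is_primal_feasible(A, b, y):
        raise ValueError("y is not primal feasible")
    if not is_dual_feasible(A, c, x):
        raise ValueError("x is not dual feasible")
    return dot(c, y) - dot(b, x)


# Toy data (an assumption for illustration only).
A = [[1, 1], [1, 0]]
b = [2, 1]
c = [3, 2]
print(duality_gap(A, b, c, x=[1, 0], y=[2, 1]))  # a feasible pair with a positive gap
print(duality_gap(A, b, c, x=[2, 1], y=[1, 1]))  # a feasible pair with zero gap
```

A pair with zero gap certifies that both vectors are optimal; in the argument above only the inequality itself is needed, since any single dual feasible $x$ already bounds $c^T y$ from below.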

6.2.
Proof of Theorem 1.4. One direction of the theorem is provided by Theorem 2.8, which we proved in the previous subsection. It suffices to prove the converse.
Actually, for the converse we only need the assertion that for any $\kappa > 0$ and $m \ge 4$ the function $H_{-K_{X_m}}(x) \cdot H(x)^{\kappa}$ has Northcott's property, where $X_m = X(\mathbb{P}^1 : ((0, 1, \infty), (m, m, m)))$. To see this, let us fix $\varepsilon > 0$. Choose $0 < \kappa < \varepsilon/3$ and choose $m \in \mathbb{N}$ sufficiently large so that By hypothesis, we have Trivially, we see that $\phi_m(u) \le \operatorname{rad}(u)^{m-1}$ for all $u \in \mathbb{Z}$. Hence (6.19) implies (6.20) rad(x)

The height $H(x, y)$ is given by $H(x, y) = |x_1|^{1/2} |y_1|^{1/2} |z_1^{m-1} \cdots z_{m-1}|^{1/m} \max\{|x_1x_2^2|, |y_1y_2^2|\}^{1/m}$, where $x = x_1x_2^2$, $y = y_1y_2^2$ and (7.1) $x_1x_2^2 + y_1y_2^2 = z_1z_2^2 \cdots z_{m-1}^{m-1}z_m^m$, with $x_1, y_1, z_1, \dots, z_{m-1}$ square-free. We normalize the height by raising it to the $m$-th power, obtaining the bound From here we see that This bound and $|z_m| \ge 1$ imply that From here we obtain a crude upper bound for $N_m(T)$, which proves Theorem 2.9. Indeed, having chosen $x_1, y_1, z_1, \dots, z_{m-1}$ there are then $O(T^{1/m}/(|x_1y_1|^{1/2}|z_1 \cdots z_{m-1}|))$ possibilities for $z_m$. Having chosen $z_m$ as well, there are then $O_{\varepsilon}(T^{\varepsilon})$ possibilities for $x_2, y_2$, since $x_2, y_2$ are polynomially bounded so they will be determined by the norm-equation (7.1) up to a log factor. Thus, there are possible solutions to (7.1) satisfying the height bound (7.2). We evaluate this as

To give a lower bound, we choose square-free integers $a, b, c$ so that the curve $ax^2 + by^2 = cz^m$ has a primitive integral solution. Such a triple is guaranteed to exist; see [1]. Then we can parametrize some of the solutions by a triple of integral binary forms $(F, G, h)$ where $\deg F = \deg G = m$ and $\deg h = 2$. By The height is $|a|^{m/2}|b|^{m/2}|c|^{m-1} \max\{|ax^2|, |by^2|\}$, so if we treat $a, b, c$ as constants then $\max\{|x|, |y|\} \ll_{a,b,c} T^{1/2}$. Therefore, we are looking for solutions to the Thue inequality If we restrict $u, v$ so that $\max\{|u|, |v|\} \ll_{a,b,c} T^{1/(2m)}$, then we see that the above height bound is satisfied. Thus $N_m(T) \gg T^{1/m}$.

7.2.
Proof of Theorem 2.10. In this section, we prove Theorem 2.10. To do so we will show that $N_2(T) = O(T^{1/2}(\log T)^3)$ and give a separate argument to show that $N_2(T) \gg T^{1/2}(\log T)^3$. The incompatibility of these two arguments is the main reason why an asymptotic formula for $N_2(T)$ remains elusive.
We count rational points of bounded height on the curve $X(\mathbb{P}^1_{\mathbb{Q}} ; (0, 2), (-1, 2), (\infty, 2))$ with the height on $\mathbb{P}^1$ given by (2.7). On writing $a = x_1y_1^2$, $b = x_2y_2^2$, with $x_1, x_2$ square-free (note that this differs from the notation used elsewhere in the paper) we then have $H(a, b) = |x_1x_2| \operatorname{sqf}(x_1y_1^2 + x_2y_2^2) \max\{|x_1y_1^2|, |x_2y_2^2|\}$, and the max on the right hand side is dependent only on the relative size of $|a|, |b|$. If we write (7.4) $x_1y_1^2 + x_2y_2^2 = x_3y_3^2$, then we further obtain the expression We may assume without loss of generality that $|x_1y_1^2| \ge |x_2y_2^2|$ and $x_1 > 0$, so that We consider the problem of counting integral points on the variety defined by (7.4), subject to the constraint

To obtain the upper bound we must dissect (7.5) into suitable ranges. When $|x_1x_2x_3| \le T^{1/2}$ we fix $x_1, x_2, x_3$ and treat (7.4) as a diagonal ternary quadratic form, say $Q_x$. It is then the case that (7.6) $|y_i| \le \left(\frac{T}{|x_1x_2x_3| \cdot |x_i|}\right)^{1/2}$ for $i = 1, 2, 3$, and by Corollary 2 of [5] we then have the estimate for the number of $y \in \mathbb{Z}^3_{\ne 0}$ satisfying (7.5) and (7.4) provided that the quadratic form $Q_x$ has a rational zero. Otherwise it is clear that there will be no contribution. Thus we must estimate the corresponding sum over triples $x$ with $1 \le |x_1x_2x_3| \le T^{1/2}$ such that $Q_x$ has a rational zero. This is similar to the work of Guo in [16], except he counted with respect to the height $\|x\|_{\infty}$. Nevertheless the techniques are similar, and again this may be of independent interest.
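For small $T$ the counting function can be computed directly from the height formula $H(a, b) = |x_1x_2| \operatorname{sqf}(a + b) \max\{|a|, |b|\}$, using $|x_1| = \operatorname{sqf}(a)$ and $|x_2| = \operatorname{sqf}(b)$. The Python sketch below does this by brute force. It assumes that points are represented by coprime pairs $(a, b)$ up to sign and that the three stacky points ($a = 0$, $b = 0$, $a + b = 0$) are excluded; both conventions, and the function names, are assumptions made for illustration.

```python
from math import gcd


def sqf(n):
    """Square-free part of a non-zero integer."""
    n = abs(n)
    if n == 0:
        raise ValueError("sqf(0) is undefined")
    d, p = 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent % 2 == 1:
            d *= p
        p += 1
    return d * n


def height(a, b):
    """H(a, b) = sqf(a) sqf(b) sqf(a + b) max(|a|, |b|) for a, b, a + b all non-zero."""
    return sqf(a) * sqf(b) * sqf(a + b) * max(abs(a), abs(b))


def count_points(T):
    """Count coprime (a, b) with a > 0 (fixing the sign) and height at most T."""
    count = 0
    # Since H(a, b) >= max(|a|, |b|), it suffices to search |a|, |b| <= T.
    for a in range(1, T + 1):
        for b in range(-T, T + 1):
            if b == 0 or a + b == 0 or gcd(a, abs(b)) != 1:
                continue
            if height(a, b) <= T:
                count += 1
    return count


print(height(9, 16), height(1, 3), count_points(2))
```

The search is quadratic in $T$ and is only meant for checking small cases against the bounds discussed in this section.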
Next we must deal with the case when $|x_1x_2x_3| \ge T^{1/2}$. For this it suffices to observe from (7.6) that We then treat (7.4) as a linear form $L_y$ in $x$. We use this to show that the contribution for each $y$ is $O(T^{1/2}|y_1y_2y_3|^{-1} + 1)$, which gives an acceptable contribution upon summing over $y$.
For the lower bound, we first restrict $y_1, y_2, y_3 \in \mathbb{Z}_{\ne 0}$ satisfying $|y_1y_2y_3| \le T^{\delta}$ for some explicit $\delta > 0$ to be specified later. We note that to obtain the correct order of magnitude it is permissible to choose any $\delta > 0$.
Having fixed $y = (y_1, y_2, y_3)$, we consider the simultaneous conditions (7.4) and (7.5). This gives rise to a binary form inequality of the shape (7.7) $|x_1^2x_2(y_1^2x_1 + y_2^2x_2)| \le T y_3^2 y_1^{-2}$. Because $|y_1y_2y_3|$ is small, we can count the number of solutions $x$ to this inequality with reasonable precision. However, even with $|y_1y_2y_3|$ small, counting the number of solutions $x$ with enough uniformity still appears to be a challenging task, because the binary form in (7.7) is singular. This difficulty is exacerbated by the fact that we will eventually need to apply a square-free sieve to produce triples $x$ with each coordinate square-free.
To get around this issue, we simply count solutions to (7.7) with $x_1, x_2$ satisfying the inequalities $|x_iy_i^2| \le c_i T^{1/4} |y_1y_2y_3|^{1/2}$, $i = 1, 2$, for some positive numbers $c_1, c_2$. This has the effect that the long cusps inherent in (7.7) are removed, and reduces the problem to a more straightforward geometry of numbers question.

Upper bounds.
To obtain upper bounds, it is crucial to view (7.4) as a plane in $x_1, x_2, x_3$ when $|y_1y_2y_3| \le T^{1/2}$ and as a conic in $y_1, y_2, y_3$ when $|x_1x_2x_3| \le T^{1/2}$. We call the former the linear case and the latter the quadratic case. We proceed to deal with the linear case below.
The key is the following lemma on counting points in sublattices of Z 2 : Lemma 7.1. Let Λ ⊂ Z 2 be a lattice. Then for all positive real numbers R 1 , R 2 the number of primitive integral points contains at least two primitive vectors in Λ, say x 1 , x 2 , then since this rectangle is convex it contains the parallelogram with end points ±x 1 , ±x 2 . The area of this parallelogram is at least as large as det Λ, since the lattice spanned by x 1 , x 2 is a sublattice of Λ. It thus follows that R 1 R 2 ≫ det Λ. Otherwise, the rectangle [−R 1 , R 1 ] × [−R 2 , R 2 ] contains at most one primitive vector in Λ. This completes the proof.
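Lemma 7.1 can be tested experimentally on small examples. The Python sketch below counts, by brute force, the points of a sublattice $\Lambda \subset \mathbb{Z}^2$ lying in the rectangle $[-R_1, R_1] \times [-R_2, R_2]$ whose coordinates are coprime, and returns $\det \Lambda$ alongside. Two assumptions are made, since the displayed bound of the lemma is not reproduced above: that the bound has the shape $O(R_1R_2/\det \Lambda + 1)$, as the proof suggests, and that "primitive" means having coprime coordinates. The basis-based representation of $\Lambda$ and the function name are illustrative.

```python
from math import gcd


def count_primitive(basis, R1, R2):
    """Count (x, y) in the lattice spanned by `basis` with gcd(x, y) = 1,
    |x| <= R1 and |y| <= R2 (R1, R2 integers). Returns (count, |det|)."""
    (a, b), (c, d) = basis
    det = a * d - b * c
    if det == 0:
        raise ValueError("basis vectors must be linearly independent")
    count = 0
    for x in range(-R1, R1 + 1):
        for y in range(-R2, R2 + 1):
            if gcd(x, y) != 1:  # also discards (0, 0), since gcd(0, 0) = 0
                continue
            # By Cramer's rule, (x, y) = s (a, b) + t (c, d) with integers s, t
            # exactly when det divides both x d - y c and a y - b x.
            if (x * d - y * c) % det == 0 and (a * y - b * x) % det == 0:
                count += 1
    return count, abs(det)


# The full lattice Z^2 in the box [-1, 1]^2.
print(count_primitive(((1, 0), (0, 1)), 1, 1))
# A lopsided sublattice: second coordinate divisible by 5, box [-3, 3]^2.
print(count_primitive(((1, 0), (0, 5)), 3, 3))
```

In the second example the box is too short in the second coordinate to contain two independent lattice vectors, and only the pair $\pm(1, 0)$ is found, matching the dichotomy in the proof.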
The strength of this lemma is that it gives a strong upper bound even in lopsided boxes. Given (7.4), it follows that there is at least one i ∈ {2, 3} such that Without loss of generality, we assume that this holds for i = 2. Suppose that M 1 ≤ x 1 < 2M 1 . By (7.5), we have for the number of x 1 , x 2 , which then also determine x 3 . The two bounds coincide when and we get the bound O T 1/2 |y 2 y 3 |y 2 1 y 2 2 y 2 3 |y 1 | 3 + 1 = O T 1/2 |y 1 y 2 y 3 | + 1 for the number of x 1 , x 2 , x 3 given y 1 , y 2 , y 3 . Thus, we obtain an acceptable estimate whenever |y 1 y 2 y 3 | ≪ T 1/2 , since By partial summation, we have It follows that It remains to deal with the case when |y 1 y 2 y 3 | ≫ T 1/2 , where we instead fibre over x and consider zeroes of the corresponding diagonal quadratic forms Q x . Since |x i y 2 i | ≪ x 1 y 2 1 for i = 1, 2 by assumption, it follows that If |x 1 x 2 x 3 | ≫ T 1/2 , then x 3 1 y 6 1 ≫ T 3/2 ⇔ x 1 y 2 1 ≫ T 1/2 . This implies that |x 1 x 2 x 3 | · x 1 y 2 1 ≫ T, which violates (7.5) if the implied constants are sufficiently large. It thus follows that we must have |x 1 x 2 x 3 | ≪ T 1/2 in this case.
We now fix $x_1, x_2, x_3$ and consider (7.4) as a ternary quadratic form in $y_1, y_2, y_3$. We shall require the following version of Corollary 2 in [5], which is an analogue of Lemma 7.1:

Lemma 7.2. Let $x_1, x_2, x_3$ be pairwise co-prime square-free integers. Let $R_1, R_2, R_3$ be positive real numbers. Then the number of primitive solutions $y_1, y_2, y_3$ to the equation Since $|x_iy_i^2| \ll x_1y_1^2$ for $i = 1, 2$, it follows that This implies that Lemma 7.2 then implies that for fixed $x_1, x_2, x_3$ the number of primitive $y = (y_1, y_2, y_3)$ satisfying (7.4) is We now sum over primitive $x \in \mathbb{Z}^3$ satisfying $|x_1x_2x_3| \ll T^{1/2}$, with the property that the quadratic form $Q_x$ given by (7.4) has a rational zero. By the Hasse-Minkowski theorem, this is tantamount to the form $Q_x(y) = x_1y_1^2 + x_2y_2^2 - x_3y_3^2$ being everywhere locally soluble. The estimation of this is interesting in its own right and will be handled in a separate subsection.

7.2.2.
Counting soluble ternary quadratic forms. In this section, we consider the set x i square-free for i = 1, 2, 3, x 1 y 2 1 + x 2 y 2 2 − x 3 y 2 3 is everywhere locally soluble}. By a well-known theorem of Legendre (see [16]) the indicator function for S is given by We will now combine the ideas given in [16] and those in [11]. Put Since x 1 , x 2 , x 3 are pairwise coprime and square-free, it follows that where ω(n) is the number of distinct prime factors of n. It follows that where g expresses a product of Jacobi symbols. The sum (7.10) is expected to contribute the main term while the sum is expected to be negligible, due to the cancellation of characters.
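Legendre's criterion can be made explicit in a few lines. In its classical form, for non-zero, square-free, pairwise coprime integers $a, b, c$ not all of the same sign, the equation $ay_1^2 + by_2^2 + cy_3^2 = 0$ has a non-trivial integral solution if and only if $-bc$, $-ca$, $-ab$ are squares modulo $|a|$, $|b|$, $|c|$ respectively. The Python sketch below applies this with $(a, b, c) = (x_1, x_2, -x_3)$ and compares it with a naive search; the indicator function displayed in the text is not reproduced above, so this is the textbook statement rather than necessarily the exact form used there, and the function names are illustrative.

```python
def is_square_mod(n, m):
    """True if n is congruent to a square modulo |m| (m non-zero)."""
    m = abs(m)
    n %= m
    return any((r * r - n) % m == 0 for r in range(m))


def legendre_soluble(x1, x2, x3):
    """Legendre's test for x1 y1^2 + x2 y2^2 - x3 y3^2 = 0, assuming
    x1, x2, x3 are non-zero, square-free and pairwise coprime."""
    a, b, c = x1, x2, -x3
    all_positive = a > 0 and b > 0 and c > 0
    all_negative = a < 0 and b < 0 and c < 0
    if all_positive or all_negative:
        return False  # a definite form has no non-trivial real zero
    return (is_square_mod(-b * c, a)
            and is_square_mod(-c * a, b)
            and is_square_mod(-a * b, c))


def brute_force_soluble(x1, x2, x3, bound):
    """Search for a non-trivial zero with 0 <= y1, y2, y3 <= bound."""
    for y1 in range(bound + 1):
        for y2 in range(bound + 1):
            for y3 in range(bound + 1):
                if (y1, y2, y3) != (0, 0, 0) and x1 * y1 ** 2 + x2 * y2 ** 2 == x3 * y3 ** 2:
                    return True
    return False


for triple in [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 5), (3, 5, 2)]:
    print(triple, legendre_soluble(*triple), brute_force_soluble(*triple, bound=10))
```

The naive search can only confirm solubility, whereas the congruence test decides it, which is what makes the indicator function for $S$ amenable to character sum methods.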
By partial summation, we obtain: where Our situation differs from that of Guo in [16] since we are counting over triples with $|x_1x_2x_3| \le X$ rather than $\max\{|x_1|, |x_2|, |x_3|\} \le X$, which introduces some difficulties. However, this is exactly analogous to the situation encountered by Fouvry and Klüners in [11].
Our key proposition will be: Proposition 7.3. We have the asymptotic upper bound In fact we can refine Proposition 7.3 to give an asymptotic formula, but this is unnecessary for our purposes.
We proceed to prove Proposition 7.3 in the remainder of the section. We begin by showing that triples $(x_1, x_2, x_3)$ with $\mu^2(x_1x_2x_3) = 1$ and $\omega(x_1x_2x_3)$ large contribute negligibly. To wit, put By the triangle inequality, it is clear that n .
We now follow the strategy outlined in [11] and break up the set x 11 x 11 x 12 x 31 x 32 x 21 x 11 x 12 x 21 x 22 x 31 .
We then have the following lemma: The proof then follows.
To proceed, we shall require the following well-known lemma regarding character sums: We will also need the following variant of the Siegel-Walfisz theorem: Lemma 7.7. Let χ q be a primitive character modulo q ≥ 2. Then for every A > 1 we have We now consider, as in [11], the quantities (7.14) X † = (log X) 9 , X ‡ = exp (log X) 1/8 .
We now consider those $A$ with the property that at most 2 entries are larger than $X^{\ddagger}$. We dissect the sum according to the number $r \le 2$ of terms $A_{ij}$ greater than $X^{\ddagger}$. Let $n$ be the product of those $x_{ij}$ which are larger than $X^{\ddagger}$, and $m$ the product of the remaining ones. We sum over $A$ with such properties to obtain This is sufficiently small for our purposes.
We may now assume that $A_{ij} \ge X^{\ddagger}$ for at least three pairs $i, j$ with $1 \le i \le 3$, $1 \le j \le 2$. We now suppose that there exist $a \ne b$ such that $A_{a,2}, A_{b,1} \ge X^{\dagger}$.
We may now apply Lemma 7.7 to obtain the bound A a,2 p 1 · · · p ℓ−1 (log X) −A/9 + Ω, with A arbitrarily large. Note that p 1 · · · p ℓ−1 ≤ X, hence Choosing A large shows that this contribution is negligible.
Since $|y_1y_2y_3| \le T^{\delta}$, we obtain an acceptable error term provided that $\delta < 1/4$. This shows that

7.3.
Counting points with respect to the canonical height when $\chi(X) < 0$. In this section, we first prove that the number of quadratic points on a hyperelliptic curve given by the model where $F$ is an integral, non-singular binary form having degree $2g + 2$ with $g \ge 2$, is dominated by the "obvious" points given by triples $(x, y, F(x, y))$. To show that the proper quadratic points are negligible, we note that when $g = 2$ the proper quadratic points, which come in conjugate pairs, are in bijection with the rational points of the Jacobian $\mathrm{Jac}(C_F)$ via the correspondence $[P] \mapsto [P_1 + P_2] - K_{C_F}$, where $K_{C_F}$ is the canonical divisor. Thus in this case the proper quadratic points of bounded height are given by the rational points of bounded height in $\mathrm{Jac}(C_F)(\mathbb{Q})$, of which there are $O_F((\log T)^{r_F})$ many, where $r_F$ is the Mordell-Weil rank of $\mathrm{Jac}(C_F)$. For $g \ge 3$ the proper quadratic points are finite in number by Faltings' theorem. Thus, the number of quadratic points on $C_F$ is asymptotically equal to the number of rational points in $\mathbb{P}^1_{\mathbb{Q}}$ of bounded height.
By contrast, for $X = X(\mathbb{P}^1 : (a, m))$ with $\chi(X) < 0$, we get a much weaker result. This is because we have little control over the set of integers $x, y$ such that $\ell_i(x, y)$ is divisible by a large square for $i = 1, \dots, n$. Even with the abc-conjecture there is only so much that can be shown. In the case when $m = (2, \dots, 2)$ we have the following:

Theorem 7.8. Let $X = X(\mathbb{P}^1 : (a, m))$ be a stacky curve with $m = (2, \dots, 2)$ of length $n$ with $n \ge 5$. Let $N_{(a,m)}(T)$ be the number of rational points on $X$ satisfying $H_{(a,m)}(x, y) \le T$. Assume that the abc-conjecture holds. Then for any $\varepsilon > 0$ we have $N_{(a,m)}(T) \ll_{\varepsilon} T^{\frac{1}{n-3}+\varepsilon}$.
Therefore, the existence of the integers y 1 , y 2 , y 3 , and hence x, y, depends on whether this conic has a rational point.