Asymptotic Structure of Graphs with the Minimum Number of Triangles

We consider the problem of minimizing the number of triangles in a graph of given order and size and describe the asymptotic structure of extremal graphs. This is achieved by characterizing the set of flag algebra homomorphisms that minimize the triangle density.


Introduction
The famous theorem of Turán [Tur41] determines $\mathrm{ex}(n, K_r)$, the maximum number of edges in a graph with $n$ vertices that does not contain the $r$-clique $K_r$ (see also Mantel [Man07]). The unique extremal graph is the Turán graph $T_{r-1}(n)$, the complete $(r-1)$-partite graph of order $n$ whose part sizes differ by at most 1. Thus, for fixed $r$, we have $\mathrm{ex}(n, K_r) = \left(1 - \frac{1}{r-1} + o(1)\right)\binom{n}{2}$. Rademacher (unpublished, 1941) proved that a graph with $\mathrm{ex}(n, K_3) + 1$ edges has at least $\lfloor n/2 \rfloor$ triangles. This prompted Erdős [Erd55] to pose the more general problem: what is $g_r(m, n)$, the smallest number of $K_r$-subgraphs in a graph with $n$ vertices and $m$ edges? Various results have been obtained by Goodman [Goo59], Erdős [Erd62, Erd69], Moon and Moser [MM62], Bollobás [Bol76], Fisher [Fis89], Lovász and Simonovits [LS76, LS83], Razborov [Raz07, Raz08], Nikiforov [Nik11], Reiher [Rei12], and others.

* Partially supported by the National Science Foundation, Grant DMS-1100215.
† Part of this work was done while the author was at Steklov Mathematical Institute, supported by the Russian Foundation for Basic Research, and at Toyota Technological Institute, Chicago.
Let us consider the asymptotic question, that is, what is the limit
$$g_r(a) \stackrel{\mathrm{def}}{=} \lim_{n\to\infty} \frac{g_r\left(\lfloor a\binom{n}{2} \rfloor, n\right)}{\binom{n}{r}}$$
for any given $a$ and $r$? While it is not difficult to show that the limit exists, determining $g_r(a)$ is a much harder task that was accomplished only recently (for $r = 3$ by Razborov [Raz08], for $r = 4$ by Nikiforov [Nik11], and for $r \ge 5$ by Reiher [Rei12]).
If $a = 1$, we let $G$ be the complete graph $K_n$ and define $h(1) = 1$. For $a \in [0, 1]$, let $\mathcal{H}_{a,n}$ be the set of all possible graphs $G$ on $[n]$ that arise this way, $\mathcal{H}_a \stackrel{\mathrm{def}}{=} \cup_{n \in \mathbb{N}} \mathcal{H}_{a,n}$, and $\mathcal{H} \stackrel{\mathrm{def}}{=} \cup_{a \in [0,1]} \mathcal{H}_a$. If $a = 1 - \frac{1}{s}$ for integer $s \ge 1$, then we have two choices for $t$ but $c = \frac{1}{s}$ and $h(a) = 6\binom{s}{3}/s^3$ are still uniquely defined. In general, $\mathcal{H}_{a,n}$ has many non-isomorphic graphs and this seems to be one of the reasons why this extremal problem is so difficult.
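The quantities $t$, $c$ and $h(a)$ can be explored numerically. The sketch below is only an illustration under a stated assumption: it takes the construction to have $t$ parts of relative size $c$ and one part of relative size $1 - tc$, so that $c$ is the root with $c \ge 1/(t+1)$ of $2\left(\binom{t}{2}c^2 + tc(1-tc)\right) = a$ and $h(a) = 6\left(\binom{t}{3}c^3 + \binom{t}{2}c^2(1-tc)\right)$. The function names `t_of`, `c_of` and `h` are hypothetical helpers, not notation from the paper.

```python
from fractions import Fraction
from math import comb, sqrt

def t_of(a):
    """Integer t >= 1 with 1 - 1/t <= a < 1 - 1/(t+1), for 0 <= a < 1."""
    if not 0 <= a < 1:
        raise ValueError("need 0 <= a < 1")
    t = 1
    while a >= 1 - Fraction(1, t + 1):
        t += 1
    return t

def c_of(a):
    """Assumed part ratio c: root of 2*(C(t,2)c^2 + t*c*(1 - t*c)) = a with c >= 1/(t+1)."""
    t = t_of(a)
    disc = max(t * (t - a * (t + 1)), 0)  # clamp tiny negative rounding errors
    return (t + sqrt(disc)) / (t * (t + 1))

def h(a):
    """Assumed limiting triangle density of the construction; h(1) = 1 as in the text."""
    if a == 1:
        return 1.0
    t, c = t_of(a), c_of(a)
    return 6 * (comb(t, 3) * c**3 + comb(t, 2) * c**2 * (1 - t * c))

# At a = 1 - 1/s the values c = 1/s and h(a) = 6*C(s,3)/s^3 from the text are recovered.
for s in (2, 3, 4, 5):
    a = 1 - Fraction(1, s)
    print(s, c_of(a), h(a), 6 * comb(s, 3) / s**3)
```

Passing `a` as a `Fraction` keeps the choice of $t$ exact at the boundary values $a = 1 - \frac{1}{s}$, where floating-point rounding could otherwise pick the neighbouring interval.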
Although each of the papers [Raz08, Nik11, Rei12] implies the lower bound $g_3(a) \ge h(a)$, it is not clear how to extract the structural information about extremal graphs from these proofs.
Here we partially fill this gap by showing that, modulo changing a negligible proportion of adjacencies, the set $\mathcal{H}$ consists of all almost extremal graphs for the $g_3$-problem. Here is the formal statement.
Theorem 1.1 For every $\varepsilon > 0$ there are $\delta > 0$ and $n_0$ such that every graph $G$ with $n \ge n_0$ vertices and at most $(g_3(a) + \delta)\binom{n}{3}$ triangles, where $a = e(G)/\binom{n}{2}$, can be made isomorphic to some graph in $\mathcal{H}_{a,n}$ by changing at most $\varepsilon\binom{n}{2}$ adjacencies.
This theorem is obtained by building upon the flag algebra approach from [Raz08]. In order to prove it, we first have to characterize the set of extremal flag algebra homomorphisms for the $g_3$-problem. This is done in Theorem 2.1 of Section 2, where the precise statement can be found. This task requires some extra work in addition to the arguments in [Raz08] and is an example of how flag algebra calculations may lead to structural results about graphs. (For some other results of a similar type, see e.g. [DHM+12, HHK+11, Pik11, Pik12].)

Our initial motivation was the following conjecture of Lovász and Simonovits [LS76, Conjecture 1] for $r = 3$.
Conjecture 1.2 For every $r \ge 3$ there is $n_0$ such that for every $n \ge n_0$ and $m$ with $0 \le m \le \binom{n}{2}$ at least one of the $g_r(m, n)$-extremal graphs is obtained from a complete partite graph by adding a triangle-free graph inside one part.
If this conjecture is proved, then one may consider the problem of determining $g_r(m, n)$ as combinatorially solved: the number of $K_r$-subgraphs in such a graph $G$ is an explicit polynomial in $m$, $n$, and the part sizes, and the question reduces to its minimization over the integers. This task may be difficult but it involves no graph theory. In fact, it is not hard to show (see e.g. [Nik11, Section 3]) that the optimal part ratios are approximately those of the graphs in $\mathcal{H}_a$, where $a = m/\binom{n}{2}$. (However, our rounding $|V_1| = \lfloor cn \rfloor$, etc., was rather arbitrary: it was chosen just to have the family $\mathcal{H}_a$ well-defined.)

We hope that Theorem 1.1 may help in proving Conjecture 1.2 in the same way as the so-called stability approach is useful in obtaining exact results. One example where this approach succeeded is the clique minimization problem in the special case when $a = 1 - \frac{1}{t}$ for some integer $t \ge 2$. First, the results of Goodman [Goo59] (for $r = 3$) and Moon and Moser [MM62] (for $r \ge 4$) imply that for any $m, n$ we have
$$g_r(m, n) \ge \binom{t}{r}\left(\frac{n}{t}\right)^r, \qquad (5)$$
where the real $t$ is defined by $m = (1 - 1/t)n^2/2$. (Note that the Turán graph $T_t(n)$ shows that (5) is asymptotically best possible when $t$ is an integer; thus $g_r(1 - \frac{1}{t}) = r!\binom{t}{r}/t^r$ in this case.) While Moon and Moser [MM62] only indicate how to prove (5), a complete proof can be found in Lovász and Simonovits [LS83, Theorem 1], who also deduced that all almost extremal graphs are close to $T_t(n)$ in the edit distance:

Theorem 1.3 For every $r$ and $\varepsilon > 0$, there are $\delta > 0$ and $n_0$ such that, for any integer $t \ge r - 1$, every graph $G$ with $n \ge n_0$ vertices, $(1 - \frac{1}{t} \pm \delta)\binom{n}{2}$ edges, and at most $(g_r(1 - \frac{1}{t}) + \delta)\binom{n}{r}$ copies of $K_r$ can be made isomorphic to $T_t(n)$ by changing at most $\varepsilon\binom{n}{2}$ edges.
In fact, a sharper form of this result (with an explicit $\delta = \delta(r, t, \varepsilon, n)$) was proved by Lovász and Simonovits [LS83], who used it to establish Conjecture 1.2 when $\mathrm{ex}(n, K_s) \le m \le \mathrm{ex}(n, K_s) + \varepsilon n^2$ for some $\varepsilon = \varepsilon(r, s) > 0$. Unfortunately, we have not succeeded in proving the case $r = 3$ of Conjecture 1.2 so far.

This paper is organized as follows. We outline the main ideas behind flag algebras and state some of the key inequalities from [Raz08] in Section 2. There, we also state our result on the structure of $g_3$-extremal homomorphisms (Theorem 2.1) and show how this implies Theorem 1.1. Section 3 contains a sketch of the proof from [Raz08] that $g_3(a) = h(a)$. Theorem 2.1 is proved in Section 4.

Flag Algebras
In order to understand this paper the reader should be familiar with the concepts introduced in [Raz07]. We do not see any reasonable way of making this paper self-contained, without making it quite long and repeating large passages from [Raz07]. Therefore, we restrict ourselves to sketching the proofs in [Raz07,Raz08], during which we informally illustrate the main ideas by providing some analogs from the discrete world. This serves two purposes: to state the key inequalities from [Raz07, Raz08] that we need here and to provide some guiding intuition for the reader who is about to start reading [Raz07]. We stress that some flag algebra concepts do not have direct combinatorial analogs or require a plethora of constants to state them in terms of graphs. Here we just try to distill and present some motivational ideas.
Many proofs in extremal graph theory proceed by considering possible densities of small subgraphs and deriving various inequalities between them. These calculations often become very cumbersome and difficult to keep track of "by hand", especially since the number of non-isomorphic graphs increases very quickly with the number of vertices. One of the motivations behind introducing flag algebras was to develop a framework where the mechanical book-keeping part of the work is relegated to a computer.
So suppose that we have a graph G. Let n = |V (G)| be its order.
The density of a graph $F$ in $G$, denoted by $p(F, G)$, is the probability that a random $|V(F)|$-subset of $V(G)$ spans a subgraph isomorphic to $F$. The quantities that we are interested in are finite linear combinations $\sum_{i=1}^{s} \alpha_i\, p(F_i, G)$, where $F_i$ is a graph and $\alpha_i$ is a real constant. One can view a formal finite sum $\sum_{i=1}^{s} \alpha_i F_i$ as a function that evaluates to $\sum_{i=1}^{s} \alpha_i\, p(F_i, G)$ on input $G$. Since we would like to operate with these objects on computers, we try to keep redundancies to a minimum. In particular, the graphs $F_i$ are unlabeled and pairwise non-isomorphic. Let $\mathcal{F}^0$ consist of all (unlabeled non-isomorphic) graphs and let $\mathbb{R}\mathcal{F}^0$ be the vector space that has $\mathcal{F}^0$ for a basis. (The meaning of the superscript 0 will be explained a bit later.)

There are some relations which are identically true when it comes to evaluations on input $G$: for example, if $n \ge \ell \ge |V(\tilde F)|$ for some graph $\tilde F$ and we know the densities of all subgraphs on $\ell$ vertices, then the density of $\tilde F$ can be easily determined:
$$p(\tilde F, G) = \sum_{F \in \mathcal{F}^0_\ell} p(\tilde F, F)\, p(F, G), \qquad (6)$$
where $\mathcal{F}^0_\ell \subseteq \mathcal{F}^0$ consists of all graphs with exactly $\ell$ vertices. So it makes sense to factor over $\mathcal{K}^0$, the subspace of $\mathbb{R}\mathcal{F}^0$ generated by $\tilde F - \sum_{F \in \mathcal{F}^0_\ell} p(\tilde F, F)\, F$, over all choices of $\tilde F$ and $\ell \ge |V(\tilde F)|$. Let
$$\mathcal{A}^0 \stackrel{\mathrm{def}}{=} \mathbb{R}\mathcal{F}^0 / \mathcal{K}^0.$$
By (6), any element of $\mathcal{A}^0$ can still be identified with an evaluation on (sufficiently large) graphs.
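The density $p(F, G)$ and the averaging relation (6) can be checked by brute force on small graphs. The following sketch is an illustration only: graphs are represented as pairs `(number_of_vertices, edge_list)` on vertex set `0..k-1`, isomorphism is decided by trying all relabelings (so it is only practical for very small $F$), and the helper names `canon`, `induced`, `density` and `chain_rule_rhs` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import comb

def canon(k, edges):
    """Canonical form of a graph on 0..k-1: the least sorted edge list over all relabelings."""
    return min(
        tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in edges))
        for p in permutations(range(k))
    )

def induced(edges, subset):
    """Edges of the subgraph induced by `subset`, relabeled to 0..len(subset)-1."""
    idx = {v: i for i, v in enumerate(subset)}
    return [(idx[u], idx[v]) for u, v in edges if u in idx and v in idx]

def density(F, G):
    """p(F, G): fraction of |V(F)|-subsets of V(G) inducing a copy of F."""
    (k, f_edges), (n, g_edges) = F, G
    target = canon(k, f_edges)
    hits = sum(canon(k, induced(g_edges, S)) == target for S in combinations(range(n), k))
    return Fraction(hits, comb(n, k))

def chain_rule_rhs(F_small, G, ell):
    """Right-hand side of the averaging relation: sum over ell-vertex types F of p(F_small, F) p(F, G)."""
    n, g_edges = G
    types = Counter(canon(ell, induced(g_edges, S)) for S in combinations(range(n), ell))
    return sum(
        Fraction(count, comb(n, ell)) * density(F_small, (ell, list(t)))
        for t, count in types.items()
    )

K3 = (3, [(0, 1), (1, 2), (0, 2)])
# A 6-cycle with two chords at vertex 0; it contains exactly two triangles.
G = (6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 2), (0, 3)])
print(density(K3, G), chain_rule_rhs(K3, G, 4), chain_rule_rhs(K3, G, 5))
```

Only the isomorphism types that actually occur among the $\ell$-subsets of $G$ are summed over, since all other types have density zero in $G$.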
A theorem of Whitney [Whi32, Theorem 5a] implies that a linear combination $\sum_{i=1}^{m} \alpha_i F_i$ that always evaluates to 0 is necessarily in $\mathcal{K}^0$ (see the discussion in [HN11, Page 551]).
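Let some $F_i \in \mathcal{F}^0_{\ell_i}$ for $i = 1, 2$ be fixed. The product $p(F_1, G)\,p(F_2, G)$ is the probability that two random subsets $U_1, U_2 \subseteq V(G)$ of sizes $\ell_1$ and $\ell_2$, drawn independently, induce copies of $F_1$ and $F_2$ respectively. With probability $1 - O(1/n)$ (recall that $n = |V(G)|$), the sets $U_1$ and $U_2$ are disjoint. Let us condition on this event. The conditional distribution can be generated as follows: first pick a random $(\ell_1 + \ell_2)$-set $U$ and then take a random partition $U = U_1 \cup U_2$ into parts of sizes $\ell_1$ and $\ell_2$. For a graph $F$ on $\ell_1 + \ell_2$ vertices, let $p(F_1, F_2; F)$ be the probability that a random partition of $V(F)$ into parts of sizes $\ell_1$ and $\ell_2$ induces copies of $F_1$ and $F_2$ respectively. This probability depends on $F_1$, $F_2$, and $F$ only. Thus
$$p(F_1, G)\,p(F_2, G) = \sum_{F \in \mathcal{F}^0_{\ell_1 + \ell_2}} p(F_1, F_2; F)\, p(F, G) + O(1/n). \qquad (7)$$
Since we are interested in the case when $n \to \infty$, we formally define the product $F_1 \cdot F_2$ to be equal to $\sum_{F \in \mathcal{F}^0_{\ell_1 + \ell_2}} p(F_1, F_2; F)\, F \in \mathbb{R}\mathcal{F}^0$ and extend this multiplication to $\mathbb{R}\mathcal{F}^0$ by linearity.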
It is not surprising that this definition is compatible with the factorization by $\mathcal{K}^0$, making $\mathcal{A}^0$ into a commutative associative algebra with the empty graph being the multiplicative identity, see [Raz07, Lemma 2.4]. Unfortunately, we do not have the property that graph evaluations preserve multiplication exactly. This can be fixed if we take as input not just a single graph $G$ but a sequence of graphs $\{G_n\}$ which is convergent, by which we mean that $|V(G_1)| < |V(G_2)| < \dots$ (we call such sequences increasing) and for every graph $F$ the limit
$$\phi(F) \stackrel{\mathrm{def}}{=} \lim_{n \to \infty} p(F, G_n) \qquad (8)$$
exists. We extend $\phi$ by linearity to $\mathbb{R}\mathcal{F}^0$. It is routine to check that $\phi$ is compatible with the factorization by $\mathcal{K}^0$ and, in fact, gives an algebra homomorphism from $\mathcal{A}^0$ to $\mathbb{R}$ (which we still denote by $\phi$), see [Raz07, Theorem 3.3]. We say that $\phi$ is the limit of $\{G_n\}$ and, following the notation in [Raz07, Section 3.1], denote this as $\phi = \lim_{n \to \infty} p^{G_n}$.
Clearly, $\phi$ is non-negative, that is, $\phi(F) \ge 0$ for every graph $F$. Let $\mathrm{Hom}^+(\mathcal{A}^0, \mathbb{R})$ be the set of all non-negative homomorphisms.
It turns out that every non-negative homomorphism $\phi : \mathcal{A}^0 \to \mathbb{R}$ is the limit of some sequence of graphs. It is instructive to sketch a proof of this; see Lovász and Szegedy [LS06, Lemma 2.4] (or [Raz07, Theorem 3.3]) for details. Take some integer $n$. Since the identity $\sum_{F \in \mathcal{F}^0_n} F = 1$ holds in $\mathcal{A}^0$, we have that $\sum_{F \in \mathcal{F}^0_n} \phi(F) = 1$, that is, $\phi$ defines a probability distribution on $\mathcal{F}^0_n$. Let $G_{n,\phi} \in \mathcal{F}^0_n$ be drawn according to this distribution, with the choices for different values of $n$ being independent. Fix some $F$ and $\varepsilon > 0$. Let $n \ge |V(F)|$. Easy calculations show that the expectation of $p(F, G_{n,\phi})$ is exactly $\phi(F)$. Also, the variance of $p(F, G_{n,\phi})$, which can be expressed via counting pairs of $F$-subgraphs versus two independent copies of $F$, is $O(1/n)$. Chebyshev's inequality implies that the probability of the "bad" event $|p(F, G_{n,\phi}) - \phi(F)| > \varepsilon$ is $O(1/n)$, and the Borel-Cantelli Lemma shows that with probability 1 only finitely many bad events occur when $n$ runs over, for example, all squares. Since there are only countably many choices of $F$ and, for example, $\varepsilon \in \{1, \frac{1}{2}, \frac{1}{3}, \dots\}$, we conclude that $\{G_{n^2,\phi}\}$ converges to $\phi$ with probability 1. Thus the required convergent sequence exists.
If one wishes that the graph orders in the sequence span all natural numbers, one can pick some convergent sequence and fill all orders by uniformly "blowing up" its members, see e.g. [HHK+11, Section 2.2].
How can these concepts be useful for proving that $g_3(a) = h(a)$? Pick an increasing sequence of graphs $\{G_n\}$ of edge density $a + o(1)$ such that the limit of $p(K_3, G_n)$ exists and is equal to $g_3(a)$. A standard diagonalization argument shows that $\{G_n\}$ has a convergent subsequence; let $\phi$ be its limit. Then $\phi(K_2) = a$. Now, if we show that
$$\phi(K_3) \ge h(\phi(K_2)), \qquad (9)$$
then we can conclude that indeed $g_3(a) = h(a)$, as was done in [Raz08].
Let $\Phi \subseteq \mathrm{Hom}^+(\mathcal{A}^0, \mathbb{R})$ consist of all possible limits of convergent sequences $\{G_n\}$, where $G_n \in \mathcal{H}_a$ for some $a \in [0, 1]$ and all $n$. Equivalently, $\Phi$ can be defined as follows. Recall that the join $G_1 \vee \dots \vee G_k$ of graphs $G_1, \dots, G_k$ is obtained by taking their disjoint union and adding all edges in between. We define a similar operation on homomorphisms $\phi_1, \dots, \phi_k \in \mathrm{Hom}^+(\mathcal{A}^0, \mathbb{R})$. We need a more general construction where one specifies how much relative weight each $\phi_i$ has, by giving non-negative reals $\alpha_1, \dots, \alpha_k$ with sum 1. Let $n \to \infty$ and, for $i \in [k]$, let $G_{i,n}$ be a graph with $\lfloor \alpha_i n \rfloor$ vertices such that the sequence $\{G_{i,n}\}$ converges to $\phi_i$; as we have already remarked, it exists. Let $F_n = G_{1,n} \vee \dots \vee G_{k,n}$. Let the join $\phi = \vee(\phi_1, \dots, \phi_k; \alpha_1, \dots, \alpha_k)$ be the limit of $\{F_n\}$ (the limit clearly exists).
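For edges and triangles, the densities of a join can be written down directly by conditioning on which parts the sampled vertices fall into. The sketch below is an illustration derived from that sampling argument, not a formula quoted from the paper; `join_edge_triangle` is a hypothetical helper that takes, for each $\phi_i$, its weight $\alpha_i$, edge density and triangle density.

```python
from fractions import Fraction
from itertools import combinations

def join_edge_triangle(parts):
    """Edge and triangle density of a join.

    parts: list of (alpha_i, edge_density_i, triangle_density_i), with the alpha_i summing to 1.
    A sampled pair is an edge if it hits two different parts, or one part at an internal edge.
    A sampled triple is a triangle if it hits three different parts, or two vertices in part i
    that are adjacent plus one vertex elsewhere, or an internal triangle of a single part.
    """
    alphas = [p[0] for p in parts]
    assert abs(sum(alphas) - 1) < 1e-12
    edge = sum(a * a * e for a, e, _ in parts)
    edge += 2 * sum(x * y for x, y in combinations(alphas, 2))
    tri = sum(a**3 * t for a, _, t in parts)
    tri += 3 * sum(
        ai * ai * ei * aj
        for i, (ai, ei, _) in enumerate(parts)
        for j, (aj, _, _) in enumerate(parts)
        if i != j
    )
    tri += 6 * sum(x * y * z for x, y, z in combinations(alphas, 3))
    return edge, tri

third = Fraction(1, 3)
# Limit of balanced complete 3-partite graphs: three edgeless parts of weight 1/3.
turan3 = join_edge_triangle([(third, 0, 0)] * 3)
# One edgeless part of weight 1/3 joined with a triangle-free part of weight 2/3
# whose internal edge density is 1/2 (for instance a balanced complete bipartite limit).
mixed = join_edge_triangle([(third, 0, 0), (2 * third, Fraction(1, 2), 0)])
print(turan3, mixed)
```

Both calls give the same pair of densities, which illustrates why many different homomorphisms can share the same edge and triangle densities: only the edge density of the triangle-free part enters the count.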
Alternatively, we can define the join $\phi$ without appealing to convergence. To this end, it is enough to define the density of each graph $F \in \mathcal{F}^0$, and we do it as follows. Let $\mathrm{aut}(F)$ denote the number of automorphisms of $F$. Let [...] where the summation runs over all possible ways (up to isomorphism) to partition $V(F) = V_1 \cup \dots \cup V_k$ into $k$ labeled parts (allowing empty parts) so that the induced bipartite subgraph [...] The reader is welcome to formally check that the join is well-defined (with respect to the factorization by $\mathcal{K}^0$) and belongs to $\mathrm{Hom}^+(\mathcal{A}^0, \mathbb{R})$. (These facts are obvious from the first definition.)

Now, $\Phi$ is exactly the set of all possible joins [...] where $0$ denotes the (unique) non-negative homomorphism of edge density 0, $1/(t+1) \le c \le 1/t$, and $\psi$ is an arbitrary non-negative homomorphism with $\psi(K_3) = 0$ and [...]

Our next result states that the set of $g_3$-extremal homomorphisms is exactly $\Phi$.
Let us show that Theorem 2.1 implies Theorem 1.1. The shortest way is to refer to some known results about the so-called cut-distance $\delta_\Box$ that goes back to Frieze and Kannan [FK99]. We omit the definition of $\delta_\Box$ but refer the reader to [BCL+08, Definition 2.2].
Suppose for the sake of contradiction that Theorem 1.1 is false, which is witnessed by some $\varepsilon > 0$. Then we can find an increasing sequence $\{G_n\}$ of graphs with $p(K_3, G_n) \le g_3(p(K_2, G_n)) + o(1)$ that violates the conclusion of Theorem 1.1. By passing to a subsequence, we can assume that $\{G_n\}$ is convergent. Let $\phi_0 \in \mathrm{Hom}^+(\mathcal{A}^0, \mathbb{R})$ be its limit. Let $a = \phi_0(K_2)$. By (9), we know that $\phi_0(K_3) = h(a)$. By Theorem 2.1, $\phi_0 \in \Phi$ and we can choose a sequence $\{H_n\}$ of graphs in $\mathcal{H}$ [...] that converges to $\phi_0$. This convergence means that $G_n$ and $H_n$ have asymptotically the same statistics of fixed subgraphs. This does not necessarily imply that $G_n$ and $H_n$ are close in the edit distance. (For example, two typical random graphs of edge density 1/2 have similar subgraph statistics but are far in the edit distance.) However, the presence of a spanning complete partite graph in $H_n$ implies a similar conclusion about $G_n$ as follows.
Theorem 2.7 in Borgs et al. [BCL+08] gives that $\delta_\Box(G_n, H_n) = o(1)$, that is, the cut-distance between $G_n$ and $H_n$ tends to 0. (An important property of the cut-distance is that an increasing sequence $\{G_n\}$ is convergent if and only if it is Cauchy with respect to $\delta_\Box$.) By [BCL+08, Theorem 2.3], we can relabel $V(H_n)$ so that for every disjoint $S, T \subseteq V(G_n)$
$$\left| e_{G_n}(S, T) - e_{H_n}(S, T) \right| = o(v^2), \qquad (11)$$
where $v = v(n)$ is the number of vertices in $G_n$ and $e_G(S, T)$ denotes the number of edges of $G$ between $S$ and $T$. Informally, this means that the graphs $G_n$ and $H_n$ have almost the same edge distribution with respect to cuts. Take the partition [...]; then, by (11), we conclude that the number of $S$-$T$ edges that are missing from [...], for otherwise a random partition $V_i = S \cup T$ would contradict (11). Thus, by changing $o(v^2)$ adjacencies in $G_n$, we can assume that the graphs $G_n$ and $H_n$ coincide except for the subgraph induced by $U$. Suppose that $|U| = \Omega(n)$, for otherwise we are easily done. We have [...]

Lemma 2.2 For every $\varepsilon > 0$ there are $\delta > 0$ and $n_0$ such that for every $K_3$-free graph $G$ on $n \ge n_0$ vertices and every integer $s$ with
$$e(G) < s \le \min\left(e(G) + \delta n^2, \lfloor n^2/4 \rfloor\right) \qquad (12)$$
one can change at most $\varepsilon n^2$ adjacencies in $G$ so that the new graph is still $K_3$-free and has exactly $s$ edges.
Proof. Clearly, it is enough to show how to ensure at least $s$ edges in the final $K_3$-free graph. Given $\varepsilon > 0$, choose small positive $c \gg \delta$. Let $n$ be large and let $s$ satisfy (12). Let $m = e(G)$. We can assume that, for example, $m \ge \varepsilon n^2/3$. Also, assume that $m \le \lfloor n^2/4 \rfloor - cn^2$, for otherwise we are done by the Stability Theorem of Erdős [Erd67] and Simonovits [Sim68], which implies that $G$ can be transformed into the Turán graph $T_2(n)$ by changing at most $\varepsilon n^2$ adjacencies.
The number $p$ of paths of length 2 in $G$ is $\sum_{x \in V(G)} \binom{d(x)}{2}$, which is at least $n\binom{2m/n}{2}$ by the convexity of the function $\binom{x}{2}$. By averaging, there is an edge $xy \in E(G)$ that belongs to at least $2p/m \ge 4m/n - 2 \ge 4m/n - \delta n$ such paths (the number of such paths is just the number of edges between the set $\{x, y\}$ and its complement).
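The counting in this step is easy to confirm on a concrete triangle-free graph. The sketch below is an illustration only; `two_path_stats` is a hypothetical helper, and the averaging bound $2p/m \ge 4m/n - 2$ is the one obtained by dividing the convexity bound by $m$.

```python
def binom2(x):
    """x choose 2, extended to real x."""
    return x * (x - 1) / 2

def two_path_stats(n, edges):
    """Return (p, convexity_lower_bound, best_edge_count, average_per_edge).

    p counts paths of length 2; an edge xy lies on (d(x) - 1) + (d(y) - 1) of them.
    """
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    m = len(edges)
    p = sum(d * (d - 1) // 2 for d in deg)
    lower = n * binom2(2 * m / n)
    best = max(deg[u] + deg[v] - 2 for u, v in edges)
    return p, lower, best, 2 * p / m

# Complete bipartite graph K_{2,3}: parts {0, 1} and {2, 3, 4}.
n = 5
edges = [(u, v) for u in (0, 1) for v in (2, 3, 4)]
p, lower, best, average = two_path_stats(n, edges)
print(p, lower, best, average, 4 * len(edges) / n - 2)
```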
Let $G'$ be obtained from $G$ by adding $cn$ clones of $x$ and $cn$ clones of $y$. Thus $G'$ has $n' = (1 + 2c)n$ vertices and $m' \ge m + cn\left(\frac{4m}{n} - \delta n\right) + (cn)^2$ edges. If we take a random $n$-subset $U$ of $V(G')$, then each edge of $G'$ is included with probability $\binom{n}{2}/\binom{n'}{2}$. Thus there is a choice of an $n$-set $U$ such that the number of edges in $H = G'[U]$ is at least the average, which in turn is at least
$$\left(m + cn\left(\frac{4m}{n} - \delta n\right) + (cn)^2\right) \binom{n}{2} \Big/ \binom{(1+2c)n}{2}.$$
This is at least $m + \delta n^2 \ge s$ by our assumption on $m$. Since $G$ and $H$ coincide on the set $V(G) \cap V(H)$ of at least $n - 2cn$ vertices, $G$ can be transformed into the $K_3$-free graph $H$ by changing at most $2cn^2 \le \varepsilon n^2$ adjacencies, as required.

Sketch of Proof of $\phi(K_3) \ge h(\phi(K_2))$
Let us sketch the proof of (9) from [Raz07, Raz08], being consistent with the notation defined there. Consider the "defect" functional $f(\phi) = \phi(K_3) - h(\phi(K_2))$, where $h$ is defined by (4). We can identify each homomorphism $\phi \in \mathrm{Hom}(\mathcal{A}^0, \mathbb{R})$ with the sequence of its values on graphs. Let us equip all products with the pointwise convergence (or product) topology. The set $\mathrm{Hom}(\mathcal{A}^0, \mathbb{R})$ is a closed subset of $\mathbb{R}^{\mathcal{F}^0}$ as the intersection of closed subsets corresponding to the relations that an algebra homomorphism has to satisfy. Thus the set $\mathrm{Hom}^+(\mathcal{A}^0, \mathbb{R})$ of non-negative homomorphisms is closed too. Moreover, it lies inside the compact space $[0, 1]^{\mathcal{F}^0}$, so it is compact as well. Since $h(x)$ is a continuous function (including the special point $x = 1$), our functional $f$ is also continuous and assumes its smallest value on $\mathrm{Hom}^+(\mathcal{A}^0, \mathbb{R})$ at some non-negative homomorphism $\phi_0$. Fix one such $\phi_0$ for the rest of the proof. Let $a = \phi_0(K_2)$. Let $t = t(a)$ and $c = c(a)$ be defined as in the Introduction. Let $b = \phi_0(K_3)$. We have to show that $b \ge h(a)$.

If $a = \phi_0(K_2) \le 1/2$, then $h(a) = 0$ and there is nothing to do. Let us write an explicit formula for the function $h(x)$ defined in (4) when $1 - \frac{1}{t} \le x \le 1 - \frac{1}{t+1}$: [...] (13)

If $a = 1 - \frac{1}{t}$ or $a = 1 - \frac{1}{t+1}$, then we are done by Goodman's bound (that is, the case $r = 3$ of (5)). So let us assume that $a$ lies in the open interval $(1 - \frac{1}{t}, 1 - \frac{1}{t+1})$. Here the function $h_t(x)$ is differentiable and it is routine to see that $h_t'(a) = 3(t - 1)c$. A calculation-free intuition is that if we add one edge to $H \in \mathcal{H}_a$ then the number of triangles increases by $((t - 1)c + o(1))n$ (while the effect of the change in the part sizes is relatively negligible); so we expect that $h_t'(a)/\binom{n}{2} \approx (t - 1)cn/\binom{n}{3}$.

Let us see which properties $\phi_0$ has. Let $\{G_n\}$ converge to $\phi_0$ with $|V(G_n)| = n$. Let $\varepsilon > 0$ be fixed.
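The value $h_t'(a) = 3(t-1)c$ can be verified by a short computation. The derivation below is a sketch under an assumption that is not spelled out in this section: it parametrizes both the edge density and the triangle density by $c$, taking the construction to have $t$ parts of relative size $c$ and one part of relative size $1 - tc$.

```latex
% Assumed parametrization (t parts of size c, one part of size 1 - tc):
\begin{align*}
a(c) &= 2\left(\binom{t}{2}c^2 + tc(1-tc)\right) = 2tc - t(t+1)c^2, \\
h(c) &= 6\left(\binom{t}{3}c^3 + \binom{t}{2}c^2(1-tc)\right) = t(t-1)c^2\bigl(3 - 2(t+1)c\bigr).
\end{align*}
% Differentiate both with respect to c:
\begin{align*}
\frac{da}{dc} &= 2t\bigl(1 - (t+1)c\bigr), &
\frac{dh}{dc} &= 6t(t-1)c\bigl(1 - (t+1)c\bigr).
\end{align*}
% For c > 1/(t+1) the common factor 1 - (t+1)c is non-zero, so
\[
h_t'(a) = \frac{dh/dc}{da/dc} = 3(t-1)c .
\]
```

The same answer follows from the heuristic in the text: adding one edge raises the edge density by $1/\binom{n}{2}$ and the triangle density by about $(t-1)cn/\binom{n}{3}$, and $(t-1)cn\binom{n}{2}/\binom{n}{3} = 3(t-1)c\,n/(n-2) \to 3(t-1)c$.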
It is impossible that at least $\varepsilon n^2$ edges of $G_n$ are each in more than $((t - 1)c + \varepsilon)n$ triangles: by removing a uniformly spread subset of these edges we get a change that is noticeable in the limit and strictly decreases the defect functional $f$. Thus, if we pick a random edge from $E(G_n)$, then with probability $1 - o(1)$ there are at most $((t - 1)c + o(1))n$ triangles containing this edge. (Note that $G_n$ has $\Omega(n^2)$ edges by our assumption $a \ge 1/2$.) The corresponding flag algebra statement [Raz08, (3.3)] reads
$$\phi_0^E(K_3^E) \le \tfrac{1}{3} h_t'(a) \quad \text{a.e.} \qquad (14)$$
Let us informally explain (14). It involves counting triangles that contain a specified edge. Let $\mathcal{F}^E$ consist of $E$-flags, by which we mean graphs with some two adjacent vertices being labeled as 1 and 2. Any isomorphism has to preserve the labels. We may represent elements of $\mathcal{F}^E$ as $(G; x_1, x_2)$, where $G \in \mathcal{F}^0$ is a graph and $x_i \in V(G)$ is the vertex that gets label $i$. Suppose that we wish to keep track of various subgraph densities and their finite linear combinations for $E$-flags. We can view $(F; y_1, y_2) \in \mathcal{F}^E$ as an evaluation on $\mathcal{F}^E$ that on input $(G; x_1, x_2)$ returns $p((F; y_1, y_2), (G; x_1, x_2))$, the probability that the $E$-subflag of $G$ induced by a random $|V(F)|$-set $X$ with $\{x_1, x_2\} \subseteq X \subseteq V(G)$ is isomorphic to $(F; y_1, y_2)$.
Again, if we know the densities of all $E$-flags with $\ell \ge |V(F)|$ vertices, then we can determine the density of $(F; y_1, y_2)$ by the analog of (6). So we can define the corresponding linear subspace $\mathcal{K}^E$ and let $\mathcal{A}^E \stackrel{\mathrm{def}}{=} \mathbb{R}\mathcal{F}^E/\mathcal{K}^E$. The obvious analog of (7) holds, and the corresponding coefficients define a multiplication on $\mathbb{R}\mathcal{F}^E$ that turns $\mathcal{A}^E$ into a commutative algebra. The multiplicative identity is $E \in \mathcal{F}^E$, the unique $E$-flag on $K_2$. As in the unlabeled case, the limits of convergent sequences of $E$-flags are precisely non-negative algebra homomorphisms from $\mathcal{A}^E$ to the reals ([Raz07, Theorem 3.3]). Now, we can turn $G_n$ into an $E$-flag by taking a random edge uniformly from $E(G_n)$ and randomly labeling its endpoints by 1 and 2. Thus for each $n$ we have a probability distribution on $E$-flags which weakly converges to the (unique) distribution on $\mathrm{Hom}^+(\mathcal{A}^E, \mathbb{R})$, see [Raz08, Section 3.2]. In (14), $\phi_0^E$ denotes the extension of $\phi_0$ (that is, a random homomorphism drawn according to this distribution) while $K_3^E$ is the unique $E$-flag with the underlying graph being $K_3$.
Let us consider the effect of removing a vertex $x$ from $G_n$. When we first remove the $d(x)$ edges at $x$, the edge density goes down by $d(x)/\binom{n}{2}$. Next, when we remove the (now isolated) vertex $x$, the edge density is multiplied by $\binom{n}{2}/\binom{n-1}{2} = 1 + \frac{2}{n} + O(n^{-2})$. Thus the edge density changes by $-d(x)/\binom{n}{2} + 2a/n + O(n^{-2})$. Likewise, the triangle density changes by $-K_3^1(x)/\binom{n}{3} + 3b/n + O(n^{-2})$, where $K_3^1(x)$ is the number of triangles containing $x$. Thus for all but at most $\varepsilon n$ vertices $x$ we have $(-2d(x)/n + 2a)\,h_t'(a) < -3K_3^1(x)/\binom{n}{2} + 3b + \varepsilon$, for otherwise by removing such vertices (and taking the limit as $n \to \infty$) we can strictly decrease the defect functional $f$. In the flag algebra language this reads
$$\left(-2\phi_0^1(K_2^1) + 2a\right) h_t'(a) \le -3\phi_0^1(K_3^1) + 3b \quad \text{a.e.}, \qquad (15)$$
where $\mathcal{F}^1$ consists of all graphs with one vertex labeled 1, $K_2^1, K_3^1 \in \mathcal{F}^1$ "evaluate" the edge and triangle density at the labeled vertex, and $\phi_0^1 \in \mathrm{Hom}^+(\mathcal{A}^1, \mathbb{R})$ is the random extension of $\phi_0$.
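The first-order formula for the change in edge density can be compared with the exact change using rational arithmetic. The sketch below is an illustration only; the function names are hypothetical, and the error bound $4/(n(n-2))$ used in the check comes from expanding $\binom{n}{2}/\binom{n-1}{2} = n/(n-2)$ and using $d(x) \le n - 1$.

```python
from fractions import Fraction
from math import comb

def exact_change(n, m, d):
    """Exact change in edge density when a vertex of degree d is deleted
    from a graph with n vertices and m edges."""
    return Fraction(m - d, comb(n - 1, 2)) - Fraction(m, comb(n, 2))

def first_order_change(n, m, d):
    """The approximation -d / C(n,2) + 2a / n from the text, with a = m / C(n,2)."""
    a = Fraction(m, comb(n, 2))
    return -Fraction(d, comb(n, 2)) + 2 * a / n

# Balanced complete bipartite graph on 200 vertices: m = 100 * 100, every degree is 100.
n, m, d = 200, 100 * 100, 100
error = exact_change(n, m, d) - first_order_change(n, m, d)
print(float(exact_change(n, m, d)), float(first_order_change(n, m, d)), float(error))
```

For this regular graph the exact change is positive but of order $n^{-2}$: the loss of $d(x)$ edges and the renormalization by $n/(n-2)$ cancel to first order, consistent with the later remark that the averaged form of (15) is the identity $0 = 0$.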
Note that if we take the expectation of each side of (15) with respect to the random $\phi_0^1 \in \mathrm{Hom}^+(\mathcal{A}^1, \mathbb{R})$, then we get 0. (A calculation-free intuition is that the edge/triangle density of a graph $G$ is equal to the average density of edges/triangles sitting on a random vertex of $G$.) Thus we conclude that (15) is in fact an equality a.e. ([Raz08, (3.2)]).
How can (14) and (15) be converted into statements about $\phi_0$? If, for example, one applies the averaging operator $\llbracket \cdot \rrbracket_1$ ([Raz07, Section 2.2]) to (15), that is, takes the expected value of (15) over $\phi_0^1$, then one obtains the identity $0 = 0$, as we have just mentioned. However, one can multiply both sides of (15) by some 1-flag $F$ and then average. (In terms of graphs this corresponds to weighting vertices of $G_n$ proportionally to the density of $F$-subgraphs rooted at them.) What sufficed in [Raz07, Raz08] was to take $F = K_2^1$. Denoting $e = K_2^1$ for convenience and rearranging terms, we get ([Raz08, (3.4)]):
$$\phi_0\left(3\llbracket e K_3^1 \rrbracket_1 - 2h_t'(a)\llbracket e^2 \rrbracket_1\right) = a\left(3b - 2a h_t'(a)\right). \qquad (16)$$

Applying the operator $\llbracket \cdot \rrbracket_E$ (averaging over $\phi_0^E$) directly to (14) is not useful. Namely, if we take a graph $G \in \mathcal{H}_a$, then the graph analog of (14) may have slack for edges that connect two larger parts; thus the obtained inequality will not be best possible. The trick in [Raz07] was first to multiply (14) by the $E$-flag $\bar{P}_3^E$ whose graph is the complement of the 3-vertex path. (Thus each edge of $\mathcal{H}_a$ with slack gets weight 0.) We obtain ([Raz08, (3.5)]): [...] (17)

We will also need the following identity which may be routinely checked (compare with [Raz08, Lemma 3.2]): [...] (18) where $K_{s,t}$ is the complete bipartite graph with part sizes $s$ and $t$. (Thus $\bar{K}_{1,3}$ is a triangle plus an isolated vertex.) Also, we have [...] (19)

Now, if we apply $\phi_0$ to (18) and (19) and combine with (16) and (17), then we obtain the following inequality (cf. [Raz08, (3.6)]): [...] (20)

If $\phi_0(\bar{K}_{1,3}) = 0$ and $\phi_0(K_4)$ is equal to the limiting $K_4$-density in $\mathcal{H}_a$, then the right-hand side of (20) is exactly $h(a)$. Thus it remains to bound $\phi_0(K_4)$ from below. In particular, we are already done if $a \le 2/3$ since every graph in $\mathcal{H}_a$ has no (or very few) copies of $K_4$; this is what was done in [Raz07]. Of course, the result of Nikiforov [Nik11], who determined $g_4(a)$ for all $a$, would suffice here, but in order to prove our new Theorem 2.1 we need to analyze the argument of [Raz08] further.
Following [Raz08, Page 612] define [...] Then, for example, (15), which is an equality a.e., can be rewritten as [...] (21)

Also, let us apply the averaging operator $\llbracket \cdot \rrbracket_{E,1}$ to (14). Informally speaking, given the labeled vertex $x_1 \in V(G_n)$, we pick the second labeled vertex $x_2$ uniformly at random and take the expectation of (14) multiplied by the indicator function of $x_1$ and $x_2$ being adjacent. Since $\llbracket K_3^E \rrbracket_{E,1} = K_3^1$ and $\llbracket E \rrbracket_{E,1} = e$, we get ([Raz08, (3.8)]) [...] (22)

From (21) and (22) we obtain [...] (23)

Now let us take any individual $\phi^1 \in \mathrm{Hom}^+(\mathcal{A}^1, \mathbb{R})$ for which (21)-(23) hold. Let [...] (24), see [Raz08, Page 612]. Informally, we take an arbitrary vertex $x$ of $G_n$ and assume that the density of edges/triangles containing $x$ satisfies (21)-(23). Then $\psi$ corresponds to taking the subgraph $H_n$ of $G_n$ induced by the neighborhood of $x$. For example, the edge density of $H_n$ can be calculated by taking the triangle density at $x$ and multiplying it by $\binom{n-1}{2}/\binom{d(x)}{2}$. In the flag algebra formalism this reads ([Raz08, (3.13)]) [...] where following [Raz08, Page 612] we define [...] Some calculations based on Goodman's bound (5) show that ([Raz08, (3.15)]) [...]

Summarizing (in the graph theory language): the degree of a typical $x \in V(G_n)$ determines the edge density of $G_n[N(x)]$, the subgraph induced by the neighborhood $N(x)$ of $x$. Moreover, this density is below $1 - \frac{1}{t} + o(1)$. This gives us a strategy for bounding the number of $K_4$'s in $G_n$ from below: use induction on $t$ to bound the number of $K_3$'s in $N(x)$ and then sum this over all $x \in V(G_n)$ (and divide by 4). Unfortunately, this bound on $\psi(K_3)$ involves radicals and it is not clear how to average it. Another complication is that $t(\psi(K_2))$ may assume different values for different choices of $\phi^1$.
Both of these difficulties are overcome by proving the following lower bound on $\phi^1(K_4^1) = \psi(K_3)(\phi^1(e))^3$, which is a linear function of $\phi^1(e)$ that does not depend on $t(\psi(K_2))$ ([Raz08, (3.24)]): [...] (28) where, for $1 \le s \le t - 1$, $\eta_s$ is the unique root of the equation [...] that lies in the interval $[\mu, 2\mu]$, see [Raz08, (3.17)]. Thus the random extension $\phi_0^1$ satisfies (28) a.e. and we can average it, obtaining a lower bound on $\phi_0(K_4)$, which is Inequality (3.25) in [Raz08]. (Note that the expectation of $\phi_0^1(K_4^1)$ is $\phi_0(K_4)$.) It turns out that this lower bound, when substituted into (20), suffices for proving the desired conclusion $b \ge h(a)$. The derivations (also those of (28)) are rather messy and do not involve any genuine flag algebra calculations. So we omit them and refer the reader to [Raz08] for all details.
Proof of Theorem 2.1
All notation here is compatible with that of [Raz07, Raz08]. We exclusively work with the theory of graphs. As before, let $0$, $1$, and $E$ denote the (unique) types with respectively 0, 1 and 2 (adjacent) vertices. Also, $\rho \stackrel{\mathrm{def}}{=} K_2 \in \mathcal{F}^0_2$ and $e \stackrel{\mathrm{def}}{=} K_2^1 \in \mathcal{F}^1_2$ are the (unique) 0- and 1-flags having two adjacent vertices.
Suppose first that $a = 1 - \frac{1}{s}$ for some integer $s$. Apply Theorem 1.3 to any sequence $\{G_n\}$ convergent to $\phi_0$, say with $|V(G_n)| = n$, to conclude that $G_n$ is $o(n^2)$-close to the Turán graph $T_s(n)$ in the edit distance. Clearly, when we change $o(n^2)$ edges in $G_n$, the density of any fixed graph $F$ changes by $o(1)$, so $\phi_0$ is still the limit of $\{G_n\}$. Since the limit of $\{T_s(n)\}$ is in $\Phi$, we are done in this case.
So let $a$ lie in the open interval $(1 - \frac{1}{t}, 1 - \frac{1}{t+1})$. Let $c$ be defined by (3). We assume that the reader is familiar with the proof in [Raz08] (that we sketched in Section 3) and we use the facts established there. In particular, we know that $b = h(a)$. This gives some noticeable simplifications: [...]

Since any type $\sigma$ is simply a labeled graph [Raz07, Section 2], after unlabeling its vertices, $\sigma$ can be viewed as an element of $\mathcal{F}^0$. If $\phi_0(\sigma) > 0$, then the random extension $\phi_0^\sigma \in \mathrm{Hom}^+(\mathcal{A}^\sigma, \mathbb{R})$ is well-defined ([Raz08, Section 3.2]). Let $S^\sigma(\phi_0)$ denote its support, that is, the smallest closed subset of $\mathrm{Hom}^+(\mathcal{A}^\sigma, \mathbb{R})$ of measure 1. A useful property of the support is that if some closed property has measure 1, then every element of $S^\sigma(\phi_0)$ has this property. We fix an arbitrary $\phi^1 \in S^1(\phi_0)$. Inequalities (21)-(23) hold a.e., thus $\phi^1$ satisfies them. In particular, (23) simplifies to $0 < \frac{tc}{2} \le \phi^1(e) \le tc < 1$.
So, we can define ψ by (24). Let us prove that ψ is extremal (that is, has the smallest possible triangle density given its edge density).
The intuition behind the following claim is as follows. Identity (21) gives a linear relation between triangle and edge densities via a vertex. By Claim 4.1 we know that (21) also holds for the subgraph induced by the neighborhood of almost every vertex x ∈ V (G). If we average this for all choices of x, then we get some linear relation between the densities of K 4 , K 3 , and K 2 that has to hold for all extremal homomorphisms. Repeating we get a linear relation for K 5 , K 4 , and K 3 , and so on.
Proof. This is true for $r = 3$ as $\phi_0(K_3) = g_3(a)$. The general case follows from Claim 4.2 by induction on $r$.
The following lemma verifies Theorem 2.1 if $\phi_0$ is the limit of complete partite graphs. Note the special role of $c_0$: it corresponds to the proportion of vertices that are in parts of relative size $o(1)$.
Proof. The claim is routine to establish, see the calculations in [Nik11, Section 3].
Suppose first that $\phi_0(\bar{P}_3) = 0$, where $\bar{P}_3$ denotes the complement of the 3-vertex path. Take a sequence $\{G_n\}$ convergent to $\phi_0$. By the Induced Removal Lemma of Alon, Fischer, Krivelevich, and Szegedy [AFKS00], we can change $o(|V(G_n)|^2)$ adjacencies in $G_n$ and ensure that $G_n$ has no induced $\bar{P}_3$-subgraph. As is well known, a graph without an induced copy of $\bar{P}_3$ is complete multipartite. Here, Theorem 2.1 follows from Lemma 4.4.
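The well-known fact used here (no induced copy of an edge plus an isolated vertex if and only if the graph is complete multipartite) can be confirmed exhaustively for small orders. The sketch below is an illustration only, with hypothetical helper names; the edgeless graph counts as complete multipartite with a single part.

```python
from itertools import combinations

def has_induced_edge_plus_vertex(n, edges):
    """True if some 3 vertices span exactly one edge (an induced complement of the 3-vertex path)."""
    E = {frozenset(e) for e in edges}
    return any(
        sum(frozenset(pair) in E for pair in combinations(triple, 2)) == 1
        for triple in combinations(range(n), 3)
    )

def is_complete_multipartite(n, edges):
    """True if the classes {v} + non-neighbours of v partition the vertex set consistently."""
    E = {frozenset(e) for e in edges}
    cls = [
        frozenset(u for u in range(n) if u == v or frozenset((u, v)) not in E)
        for v in range(n)
    ]
    # Consistent classes are independent sets, and vertices in different classes are adjacent.
    return all(cls[u] == cls[v] for v in range(n) for u in cls[v])

def equivalence_holds(n):
    """Check the equivalence over all labeled graphs on n vertices."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        edges = [pair for i, pair in enumerate(pairs) if mask >> i & 1]
        if is_complete_multipartite(n, edges) == has_induced_edge_plus_vertex(n, edges):
            return False
    return True

print(equivalence_holds(4), equivalence_holds(5))
```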
So suppose that $\phi_0(\bar{P}_3) > 0$. We need a few auxiliary results in this case. For a graph $F \in \mathcal{F}^0_\ell$, let $F^{(1)} \in \mathcal{F}^1_{\ell+1}$ be the 1-flag obtained by adding a new vertex $x$ that is connected to all vertices of $F$ (i.e., taking the join $F \vee K_1$) and labeling $x$ as 1.
Inequality (17) is also used in the proof, so it has to be tight. Since we assumed that $\phi_0(\bar{P}_3) > 0$, we have that $\phi_0(\llbracket \bar{P}_3^E K_3^E \rrbracket_E) > 0$, where $\bar{P}_3^E$ is the unique $E$-flag on $\bar{P}_3$. But [...]

The two graphs in Figure 1, called $G_1$ and $G_2$, will play a special role.
Claim 4.6 $\phi_0(G_1) = \phi_0(G_2) = 0$.
Proof. We apply the same strategy (although with much more involved calculations) as the one used to prove (34). Namely, we make up an analog of (20) that is tight on extremal homomorphisms and such that the "overall slackness" involved will cover $G_1$ and $G_2$.
Form the element $f^E \in \mathcal{F}^E_4$ as follows: [...] where $P_4^{E,c}, P_4^{E,b}, F^E \in \mathcal{F}^E_4$ are shown in Figure 2. Since (17) is tight, and [...] $F^E$ contain $\bar{P}_3^E$, this implies that [...] (35) (Recall that $h_t$ is just the restriction of $h$ to the interval $[1 - \frac{1}{t}, 1 - \frac{1}{t+1}]$ as defined by (13).) Thus, by (14), we can multiply the left-hand side of (35) by $f^E$, obtaining a true inequality. If we apply the averaging operator $\llbracket \cdot \rrbracket_E$ to this new inequality, we get that [...] Next, similarly to [Raz08, (3.4)] but multiplying [Raz08, (3.2)] (i.e. our formula (15), which is an equality) by $K_3^1$ rather than by $e$, we obtain
$$\phi_0\left(3\llbracket (K_3^1)^2 \rrbracket_1 - 2h_t'(a)\llbracket e K_3^1 \rrbracket_1\right) = b\left(3b - 2a h_t'(a)\right).$$
Let $\phi_i'$ be the limit of complete partite graphs that have the same edge and triangle density as $\phi_i$ and let $\phi_0' = \vee(\phi_1', \phi_2'; \alpha_1, \alpha_2)$. Then $\phi_0'$ is also extremal and all the claims that we proved for $\phi_0$ apply to $\phi_0'$. In particular, $\phi_0'(K_{t+2}) = 0$. Thus $\phi_0'$ is the limit of complete partite graphs with $s \le t + 1$ parts. Assume that $s \ge 3$, for otherwise we are done. By Lemma 4.4, the parts have the correct ratios. We conclude that one of the $\phi_i'$, say $\phi_1'$, has equal part sizes and thus $\phi_1' = \phi_1$. By induction, we have $\phi_2 \in \Phi$. The required structure of the join $\phi_0$ routinely follows from Lemma 4.4.
This finishes the proof of Theorem 2.1.