Short proofs for long induced paths

We present a modification of the depth-first search algorithm, suited for finding long induced paths. We use it to give simple proofs of the following results. We show that the induced size-Ramsey number of paths satisfies $\hat{R}_{\mathrm{ind}}(P_n)\leq 5\cdot 10^7n$, thus giving an explicit constant in the linear bound and improving the previous bound, whose large constant came from a regularity lemma argument by Haxell, Kohayakawa and {\L}uczak. We also provide a bound for the $k$-color version, showing that $\hat{R}_{\mathrm{ind}}^k(P_n)=O(k^3\log^4k)n$. Finally, we present a new short proof of the fact that the binomial random graph in the supercritical regime, $G(n,\frac{1+\varepsilon}{n})$, typically contains an induced path of length $\Theta(\varepsilon^2) n$.


Introduction
In this article, we give short proofs for two well-known problems regarding finding long induced paths in random graphs.
The first problem concerns the induced size-Ramsey number of paths. For a graph $H$ we define the $k$-colour induced size-Ramsey number of $H$, denoted by $\hat{R}_{\mathrm{ind}}^k(H)$, as the smallest number $m$ such that there exists a graph $G$ on $m$ edges such that for every $k$-colouring of the edges of $G$, there is a monochromatic copy of $H$ which is an induced subgraph of $G$. In 1987, Graham and Rödl [15] asked if the induced size-Ramsey numbers of paths $P_n$ are linear in $n$ (for any fixed number of colours). This was confirmed by Haxell, Kohayakawa and Łuczak [16], who showed that $\hat{R}_{\mathrm{ind}}^k(P_n) \le c_k n$ for every fixed $k$. Their proof is quite technical and is based on the regularity lemma, hence the derived constants $c_k$ are astronomically large. We revisit this problem and give a short and rather simple proof of the fact that the induced size-Ramsey numbers of paths are linear. Moreover, we obtain an explicit absolute constant for the 2-colour version, and give a bound polynomial in $k$ for $c_k$ in the general case, for any fixed number of colours $k$.
Theorem 1.1. $\hat{R}_{\mathrm{ind}}(P_n) \le 5\cdot 10^7 n$ for all large enough $n$.
Theorem 1.2. $\hat{R}_{\mathrm{ind}}^k(P_n) = O(k^3\log^4 k)n$.
The second classical problem we address is finding a linear-sized induced path in a binomial random graph $G(n,p)$ in the supercritical regime, i.e. when $p = \frac{1+\varepsilon}{n}$ for a sufficiently small positive constant $\varepsilon$. We give a short alternative proof of the following result, originally due to Suen [25].
Theorem 1.3. There exists a constant $\varepsilon_0 > 0$ such that for all positive $\varepsilon < \varepsilon_0$, the random graph $G \sim G(n, \frac{1+\varepsilon}{n})$ with high probability (whp) contains an induced path of length $\frac{\varepsilon^2 n}{5}$.
We remark that the dependency on $\varepsilon$ is optimal: it is known that even the length of a longest (not necessarily induced) path is whp $O(\varepsilon^2)n$ (see, e.g., [17]). Let us also note that Suen's result is slightly stronger in the sense that the constant $1/5$ can be replaced by any constant smaller than 1.
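To make the definition of the induced size-Ramsey number concrete, the following Python sketch checks by brute force whether an edge-coloured host graph $G$ contains a monochromatic path of a given length that is induced in $G$. The helper name `has_mono_induced_path` is ours and purely illustrative, and here the length of a path is its number of edges. The example shows why the induced requirement matters: in $K_4$ every induced path has at most one edge, whatever the colouring, so complete graphs are useless as host graphs.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set

# An edge-coloured host graph G: each edge {a, b} of G is mapped to its colour.
Colouring = Dict[FrozenSet[int], int]


def has_mono_induced_path(colouring: Colouring, length: int) -> bool:
    """Return True if some colour class contains a path with `length` edges
    which is an induced subgraph of the host graph G (all edges of `colouring`)."""
    adj: Dict[int, Set[int]] = {}
    for edge in colouring:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def extend(path: List[int], colour: int) -> bool:
        if len(path) == length + 1:          # `length` edges means length + 1 vertices
            return True
        last = path[-1]
        for v in adj[last]:
            if v in path or colouring[frozenset((last, v))] != colour:
                continue
            # Inducedness: in G, v may be adjacent to `last` only, not to earlier path vertices.
            if any(w in adj[v] for w in path[:-1]):
                continue
            if extend(path + [v], colour):
                return True
        return False

    return any(extend([v], c) for c in set(colouring.values()) for v in adj)


k4_edges = [frozenset(e) for e in combinations(range(4), 2)]
# In K_4 any three vertices span a triangle, so no induced path has two edges.
print(any(has_mono_induced_path(dict(zip(k4_edges, cols)), 2)
          for cols in product((0, 1), repeat=6)))                # False

c5 = {frozenset((i, (i + 1) % 5)): 0 for i in range(5)}          # the 5-cycle, one colour
print(has_mono_induced_path(c5, 3), has_mono_induced_path(c5, 4))  # True False
```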
We give some background on these two problems and prove the results above in Sections 3 and 4. The common tool which we use in our proofs is a modified version of the depth-first search (DFS) graph search algorithm. By nature, DFS is very suitable for finding long paths in graphs. Our version is tailored for finding long induced paths, specifically in graphs with certain local density conditions, hence it comes in handy for applications in random graphs. We will first present the proof of Theorem 1.3, where the DFS algorithm is used directly in the random graph $G(n,p)$. Subsequently, in the proofs of Theorems 1.1 and 1.2, we apply it in a monochromatic subgraph of a random graph. Throughout, we treat large numbers like integers whenever this has no effect on the argument.

DFS for induced paths
In the standard DFS algorithm, we explore the vertices one by one, always following one branch as far as possible before we start backtracking. Given a graph $G$, the idea is to keep track of three sets of vertices $U$, $T$, $S$, where $T$ is the set of unexplored vertices, $S$ is the set of vertices whose exploration is complete, and the remaining vertices $U$ are kept in a stack. At every step we look at the vertex $u$ which was the last one added to $U$, and try to find a neighbour $t$ of $u$ in $T$. If we succeed, we move $t$ to $U$, and if not, we move $u$ to $S$. It is easy to see that $G[U]$ always contains a spanning path, but this path need not be induced. In our modified version of the DFS algorithm (described below), after finding $t$, we also check whether $t$ has any neighbours in $U$ other than $u$, and if so, we move $t$ to $S$. This ensures that $U$ always spans an induced path in $G$, and makes the algorithm suitable for finding long induced paths in sparse expanders.
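A minimal Python sketch of the standard three-set DFS described above (the function and variable names are ours, not from the text). It records the stack $U$ after every step. On a triangle, $U$ always spans a path, but the path $0,1,2$ has the chord $\{0,2\}$, which is exactly the situation the modification rules out.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]   # adjacency sets


def standard_dfs(adj: Graph, order: List[int]) -> List[List[int]]:
    """Standard DFS with the sets U (stack), T (unexplored) and S (finished).
    Returns a snapshot of the stack U after every step."""
    T = list(order)          # unexplored vertices, scanned in the given order
    U: List[int] = []        # the stack
    S: Set[int] = set()      # vertices whose exploration is complete
    snapshots = []
    while U or T:
        if not U:
            U.append(T.pop(0))                       # start a new branch
        else:
            u = U[-1]                                # last vertex added to U
            t = next((x for x in T if x in adj[u]), None)
            if t is None:
                S.add(U.pop())                       # u is finished: backtrack
            else:
                T.remove(t)
                U.append(t)                          # follow the branch
        snapshots.append(list(U))
    return snapshots


def spans_path(U: List[int], adj: Graph) -> bool:
    """Consecutive stack vertices are adjacent, so G[U] has a spanning path."""
    return all(U[i + 1] in adj[U[i]] for i in range(len(U) - 1))


def is_induced_path(U: List[int], adj: Graph) -> bool:
    """U is a path with no edges between non-consecutive vertices."""
    return spans_path(U, adj) and all(
        U[j] not in adj[U[i]] for i in range(len(U)) for j in range(i + 2, len(U))
    )


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
snaps = standard_dfs(triangle, [0, 1, 2])
print(snaps)                                        # [[0], [0, 1], [0, 1, 2], [0, 1], [0], []]
print(all(spans_path(U, triangle) for U in snaps))  # True
print(is_induced_path([0, 1, 2], triangle))         # False: the chord {0, 2}
```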
More precisely, our goal is the following. Given two graphs $G'$ and $G$ on the same vertex set with $G' \subseteq G$, we want to find a long induced path in $G$ whose edges all lie in $G'$. When $G' = G$ this boils down to finding a long induced path in $G$, but with the Ramsey question mentioned before in mind, it is convenient to formulate the algorithm with two input graphs: in that application, $G'$ will be a monochromatic subgraph of the coloured host graph $G$. In applications we usually run the algorithm up to a certain stage, and by analyzing it we conclude that the input graphs contain a suitable induced path.
The algorithm is a graph search algorithm which visits all the vertices in the following manner. As input, it receives graphs $G' = (V, E')$ and $G = (V, E)$ with $E' \subseteq E$, and an ordering $\pi$ of $V$. The algorithm maintains four sets of vertices $U$, $T$, $S_1$ and $S_2$. The set $T$ is the set of unvisited vertices, $S_1$ and $S_2$ are the sets of discarded vertices, while $U = V \setminus (T \cup S_1 \cup S_2)$ is the set of remaining vertices, which are kept in a stack (the last vertex to enter $U$ is the first to leave). At every stage of the algorithm, $U$ will induce a path in $G$ with all edges belonging also to $G'$. In the beginning we set $S_1 = S_2 = U = \emptyset$ and $T = V$, and we stop when $U = T = \emptyset$. The algorithm is carried out in rounds, and in each round we proceed as follows.
Beginning of round
1. If $U$ is empty, we take the first vertex in $T$ according to $\pi$, remove it from $T$ and push it to $U$.
2. Otherwise, let $u$ be the vertex on the top of the stack $U$. Now we query $T$ for vertices $t \in T$ such that $(u, t)$ is an edge in $E'$, by scanning $T$ according to the ordering $\pi$. We have one of the following scenarios, given by Steps 3 and 4.
3. If an appropriate $t$ is found, we query $U \setminus \{u\}$ for vertices $u' \in U \setminus \{u\}$ such that $(t, u') \in E$, by scanning $U \setminus \{u\}$ according to the ordering $\pi$.
(a) If all the answers are negative, remove $t$ from $T$ and push it to $U$.
(b) If we get at least one positive answer, remove $t$ from $T$ and add it to $S_2$.
4. If no such $t$ is found, remove $u$ from $U$ and add it to $S_1$.

End of round
For technical reasons, after the last round we also query, for membership in $E'$, all pairs of vertices in $V$ which were not queried before, so that all pairs of vertices in the graph have been explored (the first paragraph of the proof of Theorem 1.3 shows why we want to query all pairs). This completes the algorithm.
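The rounds above can be sketched in Python as follows, assuming both graphs are given explicitly as adjacency sets, so the final step of querying all remaining pairs is not needed. The names are illustrative and not taken from the text. The sketch asserts properties (A) and (D) listed below after every round. On the 6-cycle with $G' = G$ it finds an induced path on five vertices and discards the last vertex into $S_2$; with $G = K_4$ and $G'$ a path it cannot do better than one edge, since every longer path has a chord in $K_4$.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]   # adjacency sets


def is_induced_path(path: List[int], gp: Graph, g: Graph) -> bool:
    """Property (D): `path` is a path in G' and an induced path in G."""
    pos = {v: i for i, v in enumerate(path)}
    if any(w in pos and abs(pos[w] - i) != 1 for i, v in enumerate(path) for w in g[v]):
        return False                                  # a chord in G
    return all(path[i + 1] in gp[path[i]] for i in range(len(path) - 1))


def induced_path_dfs(gp: Graph, g: Graph, order: List[int]) -> Tuple[List[int], Set[int], Set[int]]:
    """Modified DFS with inputs G' = gp and G = g (gp a subgraph of g) and ordering `order`.
    Returns the longest stack U seen, and the final sets S1 and S2."""
    T = list(order)
    U: List[int] = []
    S1: Set[int] = set()
    S2: Set[int] = set()
    best: List[int] = []
    while U or T:
        if not U:                                     # Step 1
            U.append(T.pop(0))
        else:
            u = U[-1]                                 # Step 2: scan T in pi-order for G'-edges
            t = next((x for x in T if x in gp[u]), None)
            if t is None:                             # Step 4
                S1.add(U.pop())
            else:                                     # Step 3: scan U \ {u} for G-edges
                T.remove(t)
                if any(w in g[t] for w in U[:-1]):
                    S2.add(t)                         # 3(b): t would create a chord
                else:
                    U.append(t)                       # 3(a)
        assert is_induced_path(U, gp, g)              # property (D)
        assert not any(gp[s] & set(T) for s in S1)    # property (A)
        if len(U) > len(best):
            best = list(U)
    return best, S1, S2


cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
best, S1, S2 = induced_path_dfs(cycle6, cycle6, list(range(6)))
print(best, sorted(S1), sorted(S2))   # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [5]

k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
path_gp = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
best2, _, S2b = induced_path_dfs(path_gp, k4, [0, 1, 2, 3])
print(best2, S2b)                     # [0, 1] {2}
```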
The following properties of the algorithm will play an important role in analyzing it in the sections that follow.
(A) At every point, all pairs between $S_1$ and $T$ have been queried, and none of them is in $E'$.
(B) Every time we enter Step 3, the size of $U \cup S_1 \cup S_2$ increases by 1, and it never decreases.
(C) At every point, every vertex in $S_2$ has at least two neighbours (in $G$) in $U \cup S_1$.
(D) At every point of the algorithm, $U$ induces a path in $G$ with all its edges being also in $G'$.
Properties (A), (B) and (D) hold for immediate reasons, while (C) holds since every vertex which lands in $S_2$ has at least two neighbours in $G$ in the current $U$, and all the vertices in this current $U$ either stay in $U$ or go to $S_1$. Now we give a result which shows that, given two graphs $G' \subseteq G$, if $G$ satisfies a local density condition and $G'$ has a certain expansion property, then $G'$ contains a long path induced in $G$. It follows from analyzing our modified DFS algorithm, and it will be used to prove our Ramsey results.
Given a graph $G$ and a subset of vertices $S$, we denote by $N_G(S)$ the external neighbourhood of $S$, that is, the set of vertices outside $S$ which have a neighbour in $S$.
Theorem 2.1. Let $G, G'$ be graphs on the same vertex set with $G' \subseteq G$, and let $s_1$, $s_2$ and $\ell$ be positive integers such that for every set of vertices $S$ the following hold:
If $|V(G)| \ge \ell + s_1 + s_2$, then $G'$ contains a path of length $\ell$ which is an induced path in $G$.
Proof. In order to find the path, we run the algorithm described above with input graphs $G'$ and $G$, and an arbitrary ordering $\pi$ of their vertices. Let us show that at the first point when either $|S_1| = s_1$ or $|S_2| = s_2$, $U$ induces a path of length $\ell$ (observe that such a point exists by the lower bound on $|V(G)|$). By (D), $U$ always induces a path in $G$ and all edges in the path are in $G'$, so this would give us precisely the path from the statement. Suppose for the sake of contradiction that $|U| \le \ell$.

Long induced paths in the supercritical regime
In this section, we prove Theorem 1.3. Determining the order of a largest induced path/tree in a random graph is a well-known problem with a long history [7,9,10,11,13,14,21,23,24,25]. Frieze and Jackson [13] showed that for every sufficiently large $d$, there exists a constant $\alpha(d) > 0$ such that whp the random graph $G(n, d/n)$ contains an induced path of length $\alpha(d)n$. Łuczak [23] and independently Suen [25] showed that one can take $\alpha(d) \sim \frac{\log d}{d}$ as $d \to \infty$. This is optimal up to a factor 2, as can be seen by a simple first moment calculation. Recently, the authors [4] obtained this "missing" factor in the lower bound, thus showing that whp the length of a longest induced path is asymptotically $\frac{2n}{d}\log d$. Here, we consider the case when $d$ is close to 1 (the so-called supercritical regime). Łuczak [23] and Suen [25] also showed that for any constant $d > 1$, whp there is an induced path of linear length, thus answering a question of Frieze and Jackson [13]. In particular, Suen [25] showed that one can take $\alpha(d)$ to be any constant smaller than $d^{-1}\int_1^d \frac{1-y(\xi)}{\xi}\,\mathrm{d}\xi$, where $y(\xi)$ is the smallest positive root of $y = e^{\xi(y-1)}$. From this, one can derive Theorem 1.3. Our goal here is to present a simple proof of this result. Suen's proof is also based on a version of the DFS algorithm; in particular, he uses it to find large $m$-ary trees, and then shows that the depth of one of the trees is large enough to guarantee a long path. Our version of the algorithm, combined with local density considerations, makes the analysis shorter and more straightforward.
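To see how Suen's bound behaves in the supercritical regime, the following sketch evaluates $d^{-1}\int_1^d \frac{1-y(\xi)}{\xi}\,\mathrm{d}\xi$ numerically, computing $1-y(\xi)$ by bisection and the integral by the midpoint rule (both numerical choices, and the function names, are ours). For $d = 1+\varepsilon$ the ratio of this constant to $\varepsilon^2$ comes out slightly below 1 and approaches 1 as $\varepsilon$ decreases, which is consistent with the remark after Theorem 1.3 that $1/5$ can be replaced by any constant smaller than 1.

```python
import math


def survival_probability(xi: float, iterations: int = 200) -> float:
    """Return rho = 1 - y(xi), where y(xi) is the smallest positive root of
    y = exp(xi * (y - 1)) for xi > 1. We bisect on g(rho) = 1 - rho - exp(-xi * rho),
    which is concave with g(0) = 0, positive below the root and negative above it."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # expm1 keeps g accurate when rho is tiny (xi close to 1)
        if -mid - math.expm1(-xi * mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def suen_constant(d: float, steps: int = 400) -> float:
    """d^{-1} * integral_1^d (1 - y(xi)) / xi dxi, via the midpoint rule."""
    h = (d - 1) / steps
    total = 0.0
    for k in range(steps):
        xi = 1 + (k + 0.5) * h
        total += survival_probability(xi) / xi
    return total * h / d


for eps in (0.1, 0.01, 0.001):
    print(eps, suen_constant(1 + eps) / eps**2)
```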
We will need the following (rather standard) definition, which helps us quantify how far the components of a graph are from being trees. If G has more than one connected component then let exc(G) be the sum of the excesses of each of its components.
The excess of a random graph in the supercritical regime typically comes overwhelmingly from the excess of its giant component, while the typical size of the giant component in terms of number of edges and number of vertices is well understood. We will use the following lemma (see, for example, Theorems 2.14 and 2.18 in [12], and set c = 1 + ε, for small enough ε).
We are now ready to prove Theorem 1.3. The argument closely follows that of Krivelevich and Sudakov [20] in the non-induced case. One key idea is to construct the random graph "on the fly" while the DFS algorithm is executed. The source of randomness is a sequence of independent Bernoulli random variables which is used to answer the queries made by the algorithm. We use the same notation as in Section 2.
Proof of Theorem 1.3. We run the algorithm defined in the previous section with $G' = G$ and an arbitrary ordering $\pi$ of the vertices. We feed the algorithm with a sequence $\{X_i\}_{i \in [N]}$ of i.i.d. Bernoulli random variables with mean $p = \frac{1+\varepsilon}{n}$, where $N = \binom{n}{2}$, so that the $i$-th new query of the algorithm is answered positively when $X_i = 1$, and negatively otherwise. By a new query, we mean a query made to a pair which has not been queried before (for instance, in Step 3 we might query an already exposed pair; for such a repeated query we simply take its previous answer). Therefore, the explored graph follows the distribution of $G(n,p)$, so our problem boils down to studying the properties of the random sequence $\{X_i\}_{i \in [N]}$.
First, let us show that the number of vertices in $S_2$ is always at most the excess of $G$. Each vertex in $S_2$, before leaving $T$, was adjacent to at least two vertices on the path induced by $U$, so it contributes at least one to the excess of $G$, as it adds one vertex but at least two edges to its own connected component. Crucially, notice that the sets of contributing edges of distinct vertices in $S_2$ are disjoint, as the (at least two) neighbouring vertices in $U$ are never added to $S_2$. Since whp $\mathrm{exc}(G) \le \varepsilon^3 n$, we have the same bound on $|S_2|$ whp.
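The following Python sketch imitates this setup: it exposes $G(n,p)$ lazily, answering each new query with a fresh Bernoulli($p$) draw and reusing earlier answers, runs the algorithm with $G' = G$, queries all remaining pairs at the end, and then compares $|S_2|$ with the excess. We assume here that the excess of a connected component $C$ is $|E(C)| - |V(C)| + 1$, so that trees have excess 0. Since the argument above bounds $|S_2|$ by $\mathrm{exc}(G)$ for every outcome, not only whp, the comparison should never fail. The values $n = 300$ and $\varepsilon = 0.5$ are arbitrary and far from the asymptotic regime of Theorem 1.3.

```python
import random
from typing import Dict, List, Set, Tuple


def excess(n: int, edges: List[Tuple[int, int]]) -> int:
    """Sum over components C of |E(C)| - |V(C)| + 1 (assumed definition of exc)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]        # path halving
            x = parent[x]
        return x

    components = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(edges) - n + components


def run_on_the_fly(n: int, p: float, seed: int) -> Tuple[int, int, int]:
    """Run the modified DFS with G' = G while exposing G(n, p) lazily.
    Returns (largest |U| seen, final |S2|, exc of the fully exposed graph)."""
    rng = random.Random(seed)
    answer: Dict[Tuple[int, int], bool] = {}     # answers to all pairs queried so far

    def query(a: int, b: int) -> bool:
        key = (min(a, b), max(a, b))
        if key not in answer:                   # new query: use a fresh Bernoulli(p) draw
            answer[key] = rng.random() < p
        return answer[key]                      # repeated query: reuse the old answer

    T: List[int] = list(range(n))               # ordering pi = 0, 1, ..., n - 1
    U: List[int] = []
    S1: Set[int] = set()
    S2: Set[int] = set()
    longest = 0
    while U or T:
        if not U:                               # Step 1
            U.append(T.pop(0))
        else:
            u = U[-1]                           # Step 2
            t = next((x for x in T if query(u, x)), None)
            if t is None:                       # Step 4
                S1.add(U.pop())
            else:                               # Step 3
                T.remove(t)
                if any(query(t, w) for w in U[:-1]):
                    S2.add(t)
                else:
                    U.append(t)
        longest = max(longest, len(U))
    for a in range(n):                          # finally, expose all remaining pairs
        for b in range(a + 1, n):
            query(a, b)
    edges = [pair for pair, present in answer.items() if present]
    return longest, len(S2), excess(n, edges)


N_VERTICES, EPS = 300, 0.5
for seed in range(3):
    print(run_on_the_fly(N_VERTICES, (1 + EPS) / N_VERTICES, seed))
```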
Suppose for the sake of contradiction that we always have $|U| \le \frac{\varepsilon^2 n}{5}$. For the analysis of the algorithm, we will focus on the pairs $(u,t)$ which were queried when $u$ was in $U$ and $t$ was in $T$, i.e. the pairs queried in Step 2 of the algorithm. Let us show that whp, at the point when we have queried $N_0 := \frac{\varepsilon n^2}{2}$ pairs of this type, $U$ is of size at least $\frac{\varepsilon^2 n}{5} + 1$, which would mean we are done by (D). Observe that we can assume that at some point we queried $N_0$ pairs of the mentioned type; indeed, when, say, $|T| = n/2$, by (A) we have queried at least $|T||S_1| = \frac{n}{2}\left(\frac{n}{2} - |S_2| - |U|\right) > \frac{n^2}{8}$ such pairs. Now, we observe that when we have queried $N_0$ pairs of the mentioned type, then $|S_1 \cup S_2| < n/3$; if this is not the case, then at some point before we must have had $|S_1 \cup S_2| = n/3$. Since at that point $|T| = n - |S_1| - |S_2| - |U| > n/2$, by (A) we had queried more than $|T||S_1| > |T|(n/3 - \varepsilon^3 n) > n^2/10$ pairs of the observed type, which is larger than $N_0$, a contradiction. So $|S_1 \cup S_2| < n/3$. When we have queried precisely $N_0$ of our pairs (in Step 2), the expected number of positive answers among them is $\frac{\varepsilon(1+\varepsilon)n}{2}$; hence, using Chernoff bounds, whp we get at least $\frac{\varepsilon(1+\varepsilon)n}{2} - n^{2/3}$ edges among the queried pairs, and hence at least this many vertices in $U \cup S_1 \cup S_2$, thanks to Property (B). Hence, we also have $|S_1| \ge \frac{\varepsilon(1+\varepsilon)n}{2} - n^{2/3} - \frac{\varepsilon^2 n}{5} - \varepsilon^3 n$. By (A) we have at least $|S_1||T| = |S_1|(n - |S_1| - |S_2| - |U|)$ queried pairs of the observed type, so we have
$$|S_1||T| \ge |S_1|\left(n - |S_1| - \frac{\varepsilon^2 n}{5} - \varepsilon^3 n\right) \ge \left(\frac{\varepsilon(1+\varepsilon)n}{2} - n^{2/3} - \frac{\varepsilon^2 n}{5} - \varepsilon^3 n\right)\left(n - \frac{\varepsilon(1+\varepsilon)n}{2} + n^{2/3}\right) = \frac{\varepsilon n^2}{2} + \frac{\varepsilon^2 n^2}{20} + O(\varepsilon^3 n^2) + O(n^{5/3})$$
(where the second inequality uses $\frac{\varepsilon(1+\varepsilon)n}{2} - n^{2/3} - \frac{\varepsilon^2 n}{5} - \varepsilon^3 n \le |S_1| < n/3$, so the product grows with $|S_1|$), contradicting the assumption on $N_0$ for all small enough $\varepsilon > 0$, which completes the proof.
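The final step can be checked numerically. Dividing by $n^2$ and letting $n \to \infty$ (so that the $n^{2/3}$ terms vanish), the sketch below substitutes the lower bound on $|S_1|$ into $|S_1|\left(n - |S_1| - \frac{\varepsilon^2 n}{5} - \varepsilon^3 n\right)$ and subtracts $N_0/n^2 = \varepsilon/2$; the function names are ours. Expanding by hand, the difference equals $\frac{\varepsilon^2}{20} - \frac{7\varepsilon^3}{5} + \frac{7\varepsilon^4}{20} + \frac{\varepsilon^5}{2}$, so it is positive only for small $\varepsilon$ (roughly $\varepsilon < 0.036$ with these constants), in line with the "for all small enough $\varepsilon$" of the proof.

```python
def margin(eps: float) -> float:
    """(Lower bound on |S1||T| minus N0) / n^2 in the limit n -> infinity,
    i.e. with the n^{2/3} terms dropped; the proof needs this to be positive."""
    s1_lower = eps * (1 + eps) / 2 - eps**2 / 5 - eps**3   # lower bound on |S1| / n
    # second factor n - |S1| - eps^2 n / 5 - eps^3 n, evaluated at |S1| = s1_lower * n
    second_factor = 1 - s1_lower - eps**2 / 5 - eps**3
    return s1_lower * second_factor - eps / 2             # N0 / n^2 = eps / 2


def margin_polynomial(eps: float) -> float:
    """The same quantity, expanded by hand as a polynomial in eps."""
    return eps**2 / 20 - 7 * eps**3 / 5 + 7 * eps**4 / 20 + eps**5 / 2


for eps in (0.001, 0.01, 0.03, 0.1):
    print(eps, margin(eps), margin_polynomial(eps))
```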

Induced size-Ramsey number of paths
The size-Ramsey number of $H$, denoted by $\hat{R}(H)$, is the smallest number $m$ such that there exists a graph $G$ on $m$ edges with the property that for every 2-colouring of the edges of $G$, there is a monochromatic copy of $H$ in $G$. This notion was introduced by Erdős, Faudree, Rousseau and Schelp [8], and over the past few decades there has been a lot of research devoted to studying this and other related Ramsey functions. One of the classical problems posed by Erdős was to determine the order of magnitude of $\hat{R}(P_n)$; he actually conjectured that $\hat{R}(P_n)/n \to \infty$, which was disproved by Beck [2], who showed $\hat{R}(P_n) = O(n)$. Since then, there has been a series of papers concerned with giving more precise bounds on $\hat{R}(P_n)$; for lower bounds see [1,2,3,6], and for upper bounds see [2,3,5,6,22]. The current records for lower and upper bounds are given by Bal and DeBiasio [1], and by Dudek and Prałat [6], respectively: $(3.75 + o(1))n \le \hat{R}(P_n) \le 74n$.
For the $k$-colour version of the size-Ramsey number of paths, almost tight asymptotic bounds are known in terms of $k$ [5,6,18,19]:
Concerning the induced size-Ramsey number of paths, Haxell, Kohayakawa and Łuczak [16] showed that $\hat{R}^k_{\mathrm{ind}}(P_n)$ is linear for any fixed $k$, but no reasonably small constant can be extracted from their proof even if $k = 2$, as it relies on the regularity lemma. We improve upon this considerably, showing that $\hat{R}^2_{\mathrm{ind}}(P_n) \le 5\cdot 10^7 n$ and $\hat{R}^k_{\mathrm{ind}}(P_n) \le O(k^3\log^4 k)n$. As in previous proofs, our "host graph" will be a sparse random graph $G(n, c/n)$, where $c$ is a sufficiently large constant. We have already seen in the last section that whp there is an induced path of linear length. The additional challenge here is to guarantee such a path even if an adversary may delete, say, half of the edges. Fortunately, the DFS algorithm presented in Section 2 is very robust and does not require the full randomness of the host graph; it performs well in "locally sparse" graphs with a mild expansion property (cf. Theorem 2.1). After a simple cleaning step, we can always guarantee such a pseudorandom graph in the densest colour class. Hence, our results are density-type results, i.e. we prove that any subset of edges forming an appropriate proportion of the whole graph contains a long path induced in the host graph.

The two-colour result
We first show a simple lemma which collects several useful properties of a random graph with parameters tailored for the proof of Theorem 1.1.
Lemma 4.1. Let $G \sim G(n, 64/n)$. Then whp the following hold.
1. Every vertex set $S$ of size at most $\frac{196n}{10^7}$ spans fewer than $\frac{12}{7}|S|$ edges.
2. Every two disjoint vertex sets $S, T$ of sizes $|S| = \frac{21n}{10^7}$ and $|T| \le \frac{175n}{10^7}$ satisfy $e(S, T) < \frac{95}{7}|S|$.
3. $G$ has $(1 + o(1))32n$ edges and $\Theta(n)$ isolated vertices.

Proof.
1. Let $p = 64/n$ and let $t = \frac{196n}{10^7}$. We bound the probability of the existence of a set $S$ of size at most $t$ which spans at least $\frac{12}{7}|S|$ edges by using the following simple union bound: $\sum_{i \le t}\binom{n}{i}\cdots$
2. $\ldots$ respectively (note that it suffices to consider sets $T$ of size exactly $t$). We get that the probability of a bad outcome is at most $\binom{n}{s}\binom{n}{t}\binom{ts}{95s/7}p^{95s/7} \le \left(\frac{10^7 ne}{21n}\right)^{s}\cdots$
3. These are standard facts, so we omit the proofs.
Proof of Theorem 1.1.
For large enough $n$, let $G$ be a fixed graph on $n$ vertices which satisfies all the properties given by Lemma 4.1. Let $\ell = \frac{7n}{10^7}$, $s_1 = 3\ell$, and $s_2 = 24\ell$; these are the parameters which we will use when applying Theorem 2.1.
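These parameters are chosen to match the constants in Lemma 4.1. The following exact check with Python's `fractions` module (the variable names are ours) confirms that $s_1 = \frac{21n}{10^7}$, $s_2 + \ell = \frac{175n}{10^7}$ and $\ell + s_1 + s_2 = \frac{196n}{10^7}$, and that about $32n$ edges divided by $\ell$ gives $\frac{32\cdot 10^7}{7} \approx 4.57\cdot 10^7 < 5\cdot 10^7$, the constant of Theorem 1.1.

```python
from fractions import Fraction

# All sizes are multiples of n / 10^7; we factor out n and work with exact fractions.
UNIT = Fraction(1, 10**7)

ell = 7 * UNIT        # path length ell = 7n / 10^7
s1 = 3 * ell          # s_1 = 3 ell
s2 = 24 * ell         # s_2 = 24 ell

assert s1 == 21 * UNIT               # the size |S| in Property 2 of Lemma 4.1
assert s2 + ell == 175 * UNIT        # the bound on |T| in Property 2
assert ell + s1 + s2 == 196 * UNIT   # the size bound in Property 1

# Property 3 gives about 32n edges; dividing by ell gives the constant of Theorem 1.1.
edges_per_unit_length = 32 / ell
assert edges_per_unit_length == Fraction(32 * 10**7, 7)    # about 4.57 * 10^7
assert edges_per_unit_length < 5 * 10**7
print(float(edges_per_unit_length))
```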
Consider an arbitrary 2-colouring of $G$ and let $G_1$ be the subgraph formed by the edges of the majority colour (containing no isolated vertices); since $G$ has $\Theta(n)$ isolated vertices, $G_1$ is of order at most $(1-\varepsilon)n$ for some fixed $\varepsilon > 0$, and it has at least $(1-o(1))16n$ edges. Let $G'$ be the graph obtained from $G_1$ by successively removing vertices of degree at most 16, for as long as there are such vertices. $G'$ is not empty, as otherwise, since each removed vertex takes at most 16 edges with it, $G_1$ would contain at most $(1-\varepsilon)16n$ edges. Furthermore, we have $|E(G')| \ge \delta(G')|V(G')|/2 \ge \frac{17}{2}|V(G')|$, so by Property 1 of Lemma 4.1 we have $|V(G')| > \frac{196n}{10^7} = \ell + s_1 + s_2$. We will apply Theorem 2.1 to the graphs $G'$ and $G[V(G')]$. Notice that Property 1 from Lemma 4.1 translates directly to the first condition of Theorem 2.1; let us now show that the second condition is also satisfied. Suppose towards a contradiction that there is a set $S \subseteq V(G')$ such that $|S| = s_1 = \frac{21n}{10^7}$ and $|N_{G'}(S)| < s_2 + \ell = \frac{175n}{10^7}$. Note that
$$e_G(S, N_{G'}(S)) \ge \delta(G')|S| - 2e_{G'}(S) \ge 17|S| - 2\cdot\frac{12}{7}|S| = \frac{95}{7}|S|,$$
where the second inequality follows from Property 1; this contradicts Property 2 applied to the sets $S$ and $N_{G'}(S)$. So we can apply Theorem 2.1, and find the required monochromatic path in $G'$, which is induced in $G$. Since $G$ has at most $(1+o(1))32n$ edges, and we can always find a monochromatic induced path of length $\ell = \frac{7n}{10^7}$ in any 2-colouring of $E(G)$, this gives the required bound on $\hat{R}_{\mathrm{ind}}(P_{7n/10^7})$, namely $\hat{R}_{\mathrm{ind}}(P_\ell) \le (1+o(1))\frac{32\cdot 10^7}{7}\ell \le 5\cdot 10^7\ell$.
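The cleaning step can be sketched as follows: keep the edges of the majority colour and repeatedly delete vertices of degree at most 16, which leaves a subgraph of minimum degree at least 17 (this is the usual peeling procedure for computing a 17-core). The host graph $G(n, 64/n)$ with a uniformly random 2-colouring, $n = 1000$ and the seed are illustrative assumptions of ours; the proof itself handles an arbitrary colouring of a graph satisfying Lemma 4.1. The block also checks the degree arithmetic $17 - 2\cdot\frac{12}{7} = \frac{95}{7}$ and $\frac{17}{2} > \frac{12}{7}$ used above.

```python
import random
from collections import deque
from fractions import Fraction
from typing import Dict, List, Set, Tuple

# Degree arithmetic from the proof: 17|S| - 2 * (12/7)|S| = (95/7)|S|, and 17/2 > 12/7.
assert Fraction(17) - 2 * Fraction(12, 7) == Fraction(95, 7)
assert Fraction(17, 2) > Fraction(12, 7)


def random_two_coloured_graph(n: int, p: float, seed: int) -> List[Tuple[int, int, int]]:
    """Edges (a, b, colour) of G(n, p) with a uniformly random 2-colouring (illustrative)."""
    rng = random.Random(seed)
    return [(a, b, rng.randrange(2))
            for a in range(n) for b in range(a + 1, n) if rng.random() < p]


def majority_colour_core(coloured_edges: List[Tuple[int, int, int]],
                         min_degree: int = 17) -> Dict[int, Set[int]]:
    """Take G1 = edges of the majority colour (no isolated vertices), then repeatedly
    delete vertices of degree < min_degree (i.e. at most 16) to obtain G'."""
    counts: Dict[int, int] = {}
    for _, _, c in coloured_edges:
        counts[c] = counts.get(c, 0) + 1
    majority = max(counts, key=counts.get)
    adj: Dict[int, Set[int]] = {}
    for a, b, c in coloured_edges:
        if c == majority:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    queue = deque(v for v, nbrs in adj.items() if len(nbrs) < min_degree)
    while queue:
        v = queue.popleft()
        if v not in adj:              # already deleted (a vertex can be queued twice)
            continue
        for w in adj.pop(v):
            adj[w].discard(v)
            if len(adj[w]) < min_degree:
                queue.append(w)
    return adj


n = 1000
core = majority_colour_core(random_two_coloured_graph(n, 64 / n, seed=1))
print(len(core), min(len(nbrs) for nbrs in core.values()))
```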

The multicolour result
In this section we again show an auxiliary lemma about random graphs with certain parameters, which is then used to prove Theorem 1.2.