The Nearest Unvisited Vertex Walk on Random Graphs

We revisit an old minor topic in algorithms, the deterministic walk on a finite graph which always moves toward the nearest unvisited vertex until every vertex is visited. There is an elementary connection between this cover time and ball-covering (metric entropy) measures. For some familiar models of random graphs, this connection allows the order of magnitude of the cover time to be deduced from first passage percolation estimates. Establishing sharper results seems a challenging problem.


Introduction
Consider a connected undirected graph $G$ on $n$ vertices, where the edges $e$ have positive real lengths $\ell(e)$. Note we do not assume $G$ is planar. Consider also an entity (let's call it a robot) that can move at speed 1 along edges. There are many different rules one might specify for how the robot chooses which edge to take after reaching a vertex, for instance the "random walk" rule, which chooses edge $e$ with probability proportional to $\ell(e)$ or $1/\ell(e)$. One well-studied aspect of the random walk is the cover time, the time until every vertex has been visited; see [6] for references to special examples and surprisingly deep connections with other fields. This article instead concerns what we will call the nearest unvisited vertex (NUV) walk, defined as follows. A path of edges has a length, the sum of edge-lengths, and the distance $d(v, v^*)$ between vertices is the length of the shortest path. For simplicity assume all such distances are distinct, so the shortest path is unique. Now the NUV walk is the deterministic walk defined in words by: after arriving at a vertex, next move at speed 1 along the path to the closest unvisited vertex, and continue until every vertex has been visited. In symbols, from initial vertex $v_0$ the vertices can be written $v_0, v_1, v_2, \ldots, v_{n-1}$ in order of first visit, where $v_{i+1}$ is the vertex $v \notin \{v_0, \ldots, v_i\}$ minimizing $d(v_i, v)$, and the length of the walk is
\[ L = L_{NUV} = L_{NUV}(G, v_0) = \sum_{i=0}^{n-2} d(v_i, v_{i+1}). \]
There are several types of question one can ask.
• The order of magnitude of L for a general graph?
• Sharper estimates of L for specific models of random graph?
• Structural properties of the NUV path in different contexts?
The first question has been studied in the context of TSP (travelling salesman problem) heuristics and robot motion, and a 2012 survey of the general area, under the name online graph exploration, is given in [15]. We record the key basic facts in section 2.
The main purpose of this paper is to point out that the connection with ball-covering enables (in some simple probability models) the order of magnitude of $L$ to be deduced easily from known first passage percolation estimates. Details for the lattice with randomized edge-lengths, and the complete graph with randomized edge-lengths, are in section 4. Another purpose is to point out that the second and third questions above have apparently never been studied. The NUV rule on a deterministic graph is "fragile" in the sense that small changes in the length of an edge might affect a large proportion of the walk, and it is possible that introducing random edge-lengths might "smooth" the typical properties of the walk on a random graph. Note that as an algorithm the NUV walk is somewhat similar to the greedy (Prim's) algorithm for the MST (minimum spanning tree) in that both grow a connected graph one edge at a time. Recall that for the MST there is an intrinsic criterion for whether an edge $e$ is in the MST (if and only if there is no alternative path between its endpoints all of whose edges are shorter than $\ell(e)$), which enables a martingale proof [14] of the central limit theorem for the length $L_{MST}$ for the Euclidean model in section 3 (complete graph on random points in the square). There is no such intrinsic criterion for the NUV walk, so to improve the order-of-magnitude result (Corollary 4 below) for $L_{NUV}$ in that model one would need some other kind of control over the geometry of the set of points visited before each step. One particularly interesting model for random graphs is the "mean-field model of distance" (section 4.2), where the exact asymptotic constants for the lengths of the TSP tour and the MST are known: can they also be calculated for the NUV walk?
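As an illustration of the NUV rule defined above, here is a minimal Python sketch. The function names `nuv_walk` and `make_adj`, the adjacency-dictionary input format, and the four-vertex example graph are assumptions made for this illustration and do not come from the paper. From the current vertex the sketch runs Dijkstra's algorithm and stops at the first unvisited vertex it settles, which is the closest unvisited vertex.

```python
import heapq

def nuv_walk(adj, start):
    """Nearest unvisited vertex walk (illustrative sketch).

    adj maps each vertex to a list of (neighbor, edge_length) pairs;
    this input format is an assumption of the sketch.
    Returns (order_of_first_visit, total_length).
    """
    visited = {start}
    order = [start]
    total = 0.0
    current = start
    while len(visited) < len(adj):
        # Dijkstra from the current vertex; stop at the first unvisited vertex settled.
        best = {current: 0.0}
        heap = [(0.0, current)]
        settled = set()
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if u in settled:
                continue
            settled.add(u)
            if u not in visited:
                target, step = u, d
                break
            for w, length in adj[u]:
                nd = d + length
                if nd < best.get(w, float("inf")):
                    best[w] = nd
                    heapq.heappush(heap, (nd, w))
        if target is None:
            raise ValueError("graph is not connected")
        visited.add(target)
        order.append(target)
        total += step
        current = target
    return order, total

def make_adj(edges):
    """Build an undirected adjacency dictionary from (u, v, length) triples."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    return adj

# Hypothetical example: the path 3 - 0 - 1 - 2 with lengths 1.7, 1.0, 2.5.
example = make_adj([(3, 0, 1.7), (0, 1, 1.0), (1, 2, 2.5)])
order, total = nuv_walk(example, 0)
print(order, total)
```

In this example the walk started at 0 moves to 1, then to 2, and must finally backtrack along the whole path to reach 3, which shows how a single step of the walk may traverse several edges.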

Relation with ball-covering
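A basic mathematical observation is that $L_{NUV}$ is related to ball-covering. Given $r > 0$ define $N(r) = N(G, r)$ to be the minimal size of a set $S$ of vertices such that every vertex is within distance $r$ from some element of $S$; in other words, such that the union over $s \in S$ of the balls of radius $r$ centered at $s$ covers the entire graph.
Proposition 1 (i) $N(r) \le 1 + L_{NUV}/r$, $0 < r < \infty$.
(ii) $L_{NUV} \le 2 \int_0^{\Delta/2} N(r)\,dr$, where $\Delta = \max_{v,w} d(v, w)$ is the diameter of the graph.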
Proof. In one direction the relation is almost obvious. As above, write the vertices as $v_0, v_1, v_2, \ldots, v_{n-1}$ in order of first visit by the NUV walk, and say $v_i$ has rank $i$. Write $\zeta(v_i) = \sum_{j=0}^{i-1} d(v_j, v_{j+1})$ for the length of the walk up to $v_i$. Select vertices $(z(k), 0 \le k \le k^* - 1)$ along the walk by selecting the first vertex at distance $> r$ along the walk after the previous selected vertex. That is, $z(k) = v_{I(k)}$ where $I(0) = 0$ and for $k \ge 0$
\[ I(k+1) = \min\{ i > I(k) : \zeta(v_i) - \zeta(v_{I(k)}) > r \} \]
until no such $i$ exists. By construction every vertex is within distance $r$ of some $z(k)$, and the number $k^*$ of selected vertices is at most $1 + L_{NUV}/r$. This establishes (i).
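The selection rule in this proof can be checked numerically. The following is a minimal sketch, assuming the complete graph on random points in the unit square with Euclidean distances (so the NUV walk is simply the greedy nearest-unvisited-point rule); the function names, the seed, and the values of $r$ are hypothetical choices for illustration.

```python
import math
import random

def nuv_order(points, start=0):
    """Order of first visit of the NUV walk on the complete Euclidean graph."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        cur = order[-1]
        nxt = min(unvisited, key=lambda v: math.dist(points[cur], points[v]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def walk_length(points, order):
    """Total length of the walk visiting the points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def select_centers(points, order, r):
    """Select z(0), z(1), ...: each is the first vertex more than r along
    the walk after the previously selected vertex."""
    centers = [order[0]]
    since_last = 0.0  # walk length since the last selected vertex
    for prev, cur in zip(order, order[1:]):
        since_last += math.dist(points[prev], points[cur])
        if since_last > r:
            centers.append(cur)
            since_last = 0.0
    return centers

def is_cover(points, centers, r, tol=1e-9):
    """True if every point is within distance r of some selected center."""
    return all(
        min(math.dist(p, points[c]) for c in centers) <= r + tol for p in points
    )

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
order = nuv_order(points)
L = walk_length(points, order)
for r in (0.05, 0.1, 0.3):
    centers = select_centers(points, order, r)
    print(r, len(centers), 1 + L / r, is_cover(points, centers, r))
```

The selected set is a valid $r$-cover, so its size is an upper bound on $N(r)$; the sketch does not compute $N(r)$ itself, which is hard on a general graph.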
In the opposite direction, write $D(v_i) = d(v_i, v_{i+1})$ for the length of the path (which may encompass several edges) from the rank-$i$ vertex to the rank-$(i+1)$ vertex, and $D(v_{n-1}) = 0$. The argument rests upon the following simple observation. Figure 1 provides an illustration.
Lemma 2 Fix a vertex $v^*$ and a real $r > 0$, and consider the set of vertices within distance $r$ from $v^*$:
\[ B(v^*, r) := \{ v : d(v, v^*) \le r \}. \]
Then $D(v) \le 2r$ for all $v \in B(v^*, r)$ except perhaps for the vertex $\bar v$ of highest NUV-rank within $B(v^*, r)$.
Proof. When the NUV walk first visits $v_i \in B(v^*, r)$ with $v_i \ne \bar v$, there is then some first unvisited vertex $\tilde v$ on the minimum-length path from $v_i$ to $\bar v$, and so
\[ D(v_i) \le d(v_i, \tilde v) \le d(v_i, \bar v) \le 2r, \]
the final inequality using the triangle inequality via $v^*$.
Now by considering a set $S(r)$ of $N(r)$ vertices such that every vertex is within distance $r$ from some element of $S(r)$, Lemma 2 implies
\[ \text{the number of vertices } w \text{ with } D(w) > 2r \text{ is at most } N(r). \tag{1} \]
Because $D(w)$ is bounded by the graph diameter $\Delta$,
\[ L_{NUV} = \sum_w D(w) = \int_0^{\Delta} |\{ w : D(w) > x \}|\,dx \le \int_0^{\Delta} N(x/2)\,dx = 2 \int_0^{\Delta/2} N(r)\,dr, \]
which is (ii).
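Lemma 2 lends itself to a direct numerical check. The sketch below assumes, purely for illustration, the complete graph on random points in the unit square with Euclidean distances; the function names, seed, and radii are hypothetical. For every ball it lists the vertices $v$ with $D(v) > 2r$ other than the highest-ranked vertex of the ball, and the lemma says that list should always be empty.

```python
import math
import random

def nuv_order(points, start=0):
    """Order of first visit of the NUV walk on the complete Euclidean graph."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        cur = order[-1]
        nxt = min(unvisited, key=lambda v: math.dist(points[cur], points[v]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def step_lengths(points, order):
    """D[v] = distance from v to the next vertex of the walk (0 for the last)."""
    D = {order[-1]: 0.0}
    for a, b in zip(order, order[1:]):
        D[a] = math.dist(points[a], points[b])
    return D

def lemma2_exceptions(points, order, center, r, tol=1e-12):
    """Vertices in B(center, r) with D(v) > 2r, excluding the vertex of
    highest NUV-rank in the ball. Lemma 2 asserts this list is empty."""
    rank = {v: i for i, v in enumerate(order)}
    D = step_lengths(points, order)
    ball = [v for v in range(len(points))
            if math.dist(points[center], points[v]) <= r]
    top = max(ball, key=rank.get)
    return [v for v in ball if v != top and D[v] > 2 * r + tol]

rng = random.Random(2)
points = [(rng.random(), rng.random()) for _ in range(80)]
order = nuv_order(points)
violations = [
    (c, r)
    for c in range(len(points))
    for r in (0.05, 0.1, 0.2, 0.4)
    if lemma2_exceptions(points, order, c, r)
]
print("violations of Lemma 2:", violations)
```

Since each ball of an $r$-cover contributes at most one vertex with $D(w) > 2r$, the same check underlies the counting bound (1).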
Remarks. The simple formulation of Proposition 1 is more implicit than explicit in the literature we have found. Part (i) is a less sharp version of a more complex lemma used in [17] to prove Corollary 3 below. In the context of TSP or robot exploration heuristics the NUV algorithm is typically (e.g. in [10,12]) mentioned only briefly before continuing to better algorithms. From an algorithmic viewpoint, calculating $N(r)$ on a general graph is not simple, so part (ii) of Proposition 1 is not so relevant, but as we see in section 4 it is very helpful in providing order-of-magnitude bounds for familiar models of random networks.

Two classical results
Two classical results follow readily from the formulation of Proposition 1.
Write $L_{TSP} = L_{TSP}(G, v_0)$ for the length of the shortest walk starting from $v_0$ and visiting every vertex. So $L_{NUV} \ge L_{TSP}$ and it is natural to ask how large the ratio can be. This was answered in [17].
Corollary 3 Let $a(n)$ be the maximum, over all connected $n$-vertex graphs with edge lengths and all initial vertices, of the ratio $L_{NUV}/L_{TSP}$. Then $a(n) = O(\log n)$.
Proof. The argument for Proposition 1(i) is unchanged if we use the TSP path instead of the NUV path, so in fact gives the stronger result $N(r) \le 1 + L_{TSP}/r$, $0 < r < \infty$. Now apply Proposition 1(ii) and note that $\Delta \le L_{TSP}$ and $N(r) \le n$, so
\[ L_{NUV} \le 2 \int_0^{L_{TSP}/2} \min(n, 1 + L_{TSP}/r)\,dr \le (3 + 2 \log n)\, L_{TSP}, \]
the second inequality by splitting the integral at $r = L_{TSP}/n$.
There are examples to show that the $O(\log n)$ bound cannot be improved; see [12,10,9,17]. As noted in the elementary expository article [3], in constructing such an example the key point is to make the bound in Lemma 2 tight, in the sense that for various values of $r$ with $1 \ll r \ll n$ there are distinguished vertices separated by distance $r$ along the TSP path such that the NUV path from one to the next is order $r$. One can make such examples planar [10], embedded in the plane with edge-lengths as Euclidean length, and with edge-lengths constrained to a neighborhood of 1. But such constructions seem very artificial.
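For very small instances the ratio $L_{NUV}/L_{TSP}$ can be computed by brute force. The sketch below is an illustration only: it assumes the complete graph on a handful of random points with Euclidean distances, finds the shortest path from $v_0$ through all points by enumerating permutations, and compares it with the NUV length. The function names and the choice of 8 points are hypothetical, and random instances of this kind are not the extremal examples discussed above.

```python
import itertools
import math
import random

def path_length(points, order):
    """Length of the path visiting the points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def nuv_order(points, start=0):
    """Order of first visit of the NUV walk on the complete Euclidean graph."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        cur = order[-1]
        nxt = min(unvisited, key=lambda v: math.dist(points[cur], points[v]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def tsp_path_length(points, start=0):
    """Shortest path from `start` through all points, by exhaustive search.
    Only feasible for very small n (here (n-1)! permutations)."""
    others = [v for v in range(len(points)) if v != start]
    return min(
        path_length(points, (start,) + perm)
        for perm in itertools.permutations(others)
    )

def ratio(seed, n=8):
    """L_NUV / L_TSP for one random instance of n points in the unit square."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return path_length(points, nuv_order(points)) / tsp_path_length(points)

ratios = [ratio(seed) for seed in range(5)]
print(ratios)
```

Every ratio is at least 1, as it must be since the NUV walk is one particular walk visiting every vertex.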
Here is the second classical result. See [18] for one proof and the early history of this result.
Corollary 4 There is a constant $A$ such that, for the complete graph on $n$ arbitrary points in the unit square, with Euclidean lengths,
\[ L_{NUV} \le A n^{1/2}. \]
Note this implies the well known corresponding result $L_{TSP} \le A n^{1/2}$.
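Proof. In this context there is a numerical constant $C$ such that $N(r) \le C/r^2$, and trivially $N(r) \le n$, and so Proposition 1(ii) gives
\[ L_{NUV} \le 2 \int_0^{\infty} \min(n, C/r^2)\,dr = 4 C^{1/2} n^{1/2}. \]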

The order of magnitude question
What is the size of $L_{NUV}$ for a typical graph? That is a very vague question, but let us attempt a discussion anyway. It is convenient to scale distances so that the typical distance from a vertex to its closest neighbor is order 1. Examples mentioned above show that $L_{NUV}$ can still be as large as order $n \log n$. But intuition suggests that for natural examples $L_{NUV}$ is of order $n$ rather than larger order. For this it is certainly necessary, but not sufficient, that the length $L_{MST}$ of the MST is $O(n)$. Proposition 1(ii) provides a quantitative criterion: it is sufficient that $N(r)/n$ is order $r^{-\alpha}$ for some $\alpha > 1$. Informally this corresponds to "dimension > 1", as illustrated in the examples in section 4.

Other questions in the deterministic setting
It is not clear what other results might hold for general graphs $G$. One can ask about the variability of $L_{NUV}(G, v)$ as $v$ varies. Clearly it can be arbitrarily concentrated, e.g. on the complete graph with edge-lengths arbitrarily close to 1. On the other hand, consider the linear graph $G_n$ on vertices $\{0, 1, \ldots, n-1\}$ with slowly decreasing edge-lengths $\ell(i-1, i) = 1 - i/n^2$. Here there is a factor of 2 variability in $L_{NUV}(G, v)$ as $v$ varies. We do not see any easy example with large variability, prompting the following question.
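The factor of 2 in the linear graph example can be seen in a short computation. This is a minimal sketch with hypothetical function names: vertex $i$ is placed at the position given by the sum of the edge-lengths to its left, so graph distance is the absolute difference of positions, and the NUV length is computed for every starting vertex.

```python
def linear_positions(n):
    """Positions of vertices 0..n-1 on a line, where edge (i-1, i)
    has length 1 - i/n^2 as in the example above."""
    pos = [0.0]
    for i in range(1, n):
        pos.append(pos[-1] + 1 - i / n**2)
    return pos

def nuv_length_on_line(pos, start):
    """Length of the NUV walk on the linear graph from the given start."""
    unvisited = set(range(len(pos))) - {start}
    cur, total = start, 0.0
    while unvisited:
        nxt = min(unvisited, key=lambda v: abs(pos[v] - pos[cur]))
        total += abs(pos[nxt] - pos[cur])
        unvisited.remove(nxt)
        cur = nxt
    return total

n = 50
pos = linear_positions(n)
lengths = [nuv_length_on_line(pos, v) for v in range(n)]
print("min:", min(lengths), "max:", max(lengths), "ratio:", max(lengths) / min(lengths))
```

Because each edge is slightly shorter than the one to its left, a walk started at an interior vertex first runs to the right-hand end and then has to return across the whole graph; started at vertex 1 it covers almost twice the length of the walk started at either end.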
In this context it is perhaps more natural to extend the NUV walk to a tour which finally returns to its start. Note that in the linear graph example above, , so one can ask whether there is a general bound for some average of One can also consider overlap of edges used in walks from different starts. Note that if two vertices are each other's nearest neighbor then every NUV walk uses their linking edge. One can ask how small can be the proportion of time spent by the walk started at $v$ in edges used also by the walk started at $v'$, though we hesitate to formulate a conjecture.

The three levels of randomness
Introducing randomness leads to different questions. There are three ways one can introduce randomness. One can simply randomize the starting vertex. This suggests the following conjecture, modifying Open Problem 5.
, where the initial vertex V is uniform random, is bounded over all finite graphs.
A second level of randomness is to start with a given deterministic $G$ but then consider the random graph $\mathcal{G}$ in which the edge-lengths $\ell(e)$ are replaced by independent random lengths $\ell^*(e)$ with Exponential(mean $\ell(e)$) distribution. So here we have a random variable $L^*(G) = L_{NUV}(\mathcal{G}, V)$ where again the initial vertex $V$ is uniform random. In this model of random graphs $\mathcal{G}$ there are results [2] for first passage percolation which say that the percolation time is weakly concentrated (as in the weak law of large numbers) around its mean provided no single edge contributes non-negligibly to the total time. So one can ask whether a similar result holds for $L^*(G)$.
The third level of randomness involves more specific models of random graphs, which we will consider in the next sections.

Random points in the square
One very special model of random graph is to take the complete graph on $n$ random (i.i.d., that is independent identically distributed, uniform) points in the unit square, with Euclidean edge-lengths. Figure 2 shows a realization of the corresponding NUV walk with $n = 800$ random points, and Table 1 shows some simulation data for the lengths $L^*_n$. The qualitative behavior seen in simulations corresponds to intuition: the walk starts to traverse through most (but not all) vertices in any small region, goes through different regions as some discrete analog of a space-filling curve, and near the end has to capture missed patches.
Table 1: Simulation data for lengths $L^*_n$ in the random points in unit square model.
Simulations and data in this model by Yechen Wang.
For discussion, to adhere to our scaling convention (distance to nearest neighbor is order 1) we take the square to have area $n$ and write $L_n = n^{1/2} L^*_n$ for the length of the NUV walk. Intuition, thinking of $L_n$ as the sum of $n$ order-1 lengths, suggests there are limit constants
\[ c := \lim_n n^{-1} E L_n, \qquad \sigma := \lim_n n^{-1/2}\, \mathrm{s.d.}(L_n). \tag{2} \]
Our small-scale simulation data suggests this holds in the present model with $c \approx 0.9$ and $\sigma \approx 0.5$. How generally this holds is a natural question, and we defer further discussion to section 5. Corollary 4 implies $E L_n \le An$, which is all that we know rigorously. But there are many questions one can ask. As well as the limits (2) one might conjecture there are concentration bounds and a Gaussian limit for $n^{-1/2}(L_n - E L_n)$. For TSP length, existence of a limit constant is known via subadditivity arguments [19,21] and concentration via now-classical Talagrand arguments, and for MST length the Gaussian limit is also known by martingale arguments [14]. Alas it seems hard to find any rigorous such arguments for the NUV walk. One might also bear in mind that, for the random walk cover time problem, the two-dimensional case is the hardest to analyze sharply, so this might also hold for the NUV walk.
In any of our models, by considering the length as $L_n(G_n, V_n)$ for a uniform random starting vertex $V_n$, we can consider the variance decomposition
\[ \mathrm{var}\, L_n = \mathrm{var}\, E(L_n | G_n) + E\, \mathrm{var}(L_n | G_n) \]
where the first term represents the variability due to the random graph and the second term represents the variability due to the starting vertex. In simulations of the present model, for $n = 100$ the two terms are roughly equal. Figure 3 superimposes the NUV walks from three different starts, in a realization of the present model, giving some impression of the extent of overlap.
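The variance decomposition can be estimated by simulation along the following lines. This is a small-scale sketch under stated assumptions, not the simulation code behind the figures quoted above: the function names, the sample sizes, and the seed are hypothetical. For each simulated point set the length is computed from every starting vertex, which gives the conditional mean and variance given the graph exactly; the outer expectation and variance are then estimated over the sampled point sets.

```python
import math
import random
import statistics

def nuv_length(points, start):
    """Length of the NUV walk on the complete Euclidean graph from `start`."""
    unvisited = set(range(len(points))) - {start}
    cur, total = start, 0.0
    while unvisited:
        nxt = min(unvisited, key=lambda v: math.dist(points[cur], points[v]))
        total += math.dist(points[cur], points[nxt])
        unvisited.remove(nxt)
        cur = nxt
    return total

def variance_decomposition(n, num_graphs, rng):
    """Empirical versions of var E(L|G), E var(L|G) and var L, with the
    start uniform over all n vertices and G sampled num_graphs times."""
    cond_means, cond_vars, pooled = [], [], []
    for _ in range(num_graphs):
        points = [(rng.random(), rng.random()) for _ in range(n)]
        lengths = [nuv_length(points, v) for v in range(n)]
        cond_means.append(statistics.fmean(lengths))
        cond_vars.append(statistics.pvariance(lengths))
        pooled.extend(lengths)
    between = statistics.pvariance(cond_means)  # variability due to the graph
    within = statistics.fmean(cond_vars)        # variability due to the start
    total = statistics.pvariance(pooled)
    return between, within, total

between, within, total = variance_decomposition(n=25, num_graphs=10, rng=random.Random(3))
print("graph term:", between, "start term:", within, "total:", total)
```

With equal group sizes and population variances the empirical decomposition is exact, so the two terms printed always add up to the pooled variance; a much larger number of graphs would be needed for reliable estimates of the individual terms.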

Relation with first passage percolation
For graphs with i.i.d. random edge-lengths, one can seek to find the correct order of magnitude of $L_{NUV}$ by combining Proposition 1(ii) with known first passage percolation (FPP) results. Here is the basic example.

The 2-dimensional grid
Consider the random graph $G_m$ that is the $m \times m$ grid, that is the subgraph of the Euclidean lattice $\mathbb{Z}^2$, assigned i.i.d. edge-lengths $\ell(e) > 0$. Because the shortest edge-length at a given vertex is $\Omega(1)$, clearly $L_{NUV}$ is $\Omega(m^2)$.
Proposition 7 The sequence $(m^{-2} L_{NUV}(G_m), m \ge 2)$ is tight.
We conjecture that in fact $m^{-2} L_{NUV}(G_m)$ converges in probability to a constant, but we do not see any simple argument. Table 2 shows simulation data, where $\ell(e)$ has Exponential(1) distribution.
Table 2: Simulation data for lengths $L(G_m)$ in the grid model.
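A simulation of the grid model can be sketched as follows. The function names, the grid size $m = 8$, and the seed are assumptions for illustration; this is not the code used for Table 2. The walk repeatedly runs Dijkstra's algorithm from the current vertex until it settles an unvisited vertex. The sketch also computes the elementary lower bound behind the $\Omega(m^2)$ remark: each step into a new vertex is at least as long as the shortest edge at that vertex.

```python
import heapq
import random

def grid_graph(m, rng):
    """m x m grid with i.i.d. Exponential(1) edge-lengths."""
    adj = {(i, j): [] for i in range(m) for j in range(m)}
    for i in range(m):
        for j in range(m):
            for di, dj in ((1, 0), (0, 1)):
                a, b = (i, j), (i + di, j + dj)
                if b in adj:
                    length = rng.expovariate(1.0)
                    adj[a].append((b, length))
                    adj[b].append((a, length))
    return adj

def nearest_unvisited(adj, source, visited):
    """Dijkstra from `source`, stopping at the first vertex not in `visited`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u not in visited:
            return u, d
        for w, length in adj[u]:
            if d + length < dist.get(w, float("inf")):
                dist[w] = d + length
                heapq.heappush(heap, (d + length, w))
    raise ValueError("no unvisited vertex is reachable")

def nuv_length(adj, start):
    """Total length of the NUV walk from `start`."""
    visited, cur, total = {start}, start, 0.0
    while len(visited) < len(adj):
        cur, step = nearest_unvisited(adj, cur, visited)
        visited.add(cur)
        total += step
    return total

def min_edge_lower_bound(adj, start):
    """Sum over vertices other than `start` of the shortest incident edge."""
    return sum(min(length for _, length in adj[v]) for v in adj if v != start)

m = 8
adj = grid_graph(m, random.Random(4))
total = nuv_length(adj, (0, 0))
print("L / m^2 =", total / m**2)
print("lower bound / m^2 =", min_edge_lower_bound(adj, (0, 0)) / m**2)
```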
Proof. For a vertex $v$ of $G_m$ write $B(v, r)$ for the random set of vertices $v'$ with $d(v, v') \le r$, and write $D(v, r)$ for the non-random set of vertices $v'$ with Euclidean distance $\|v - v'\| \le r$. Standard results for FPP on $\mathbb{Z}^2$ going back to [13] (see [4] Theorem 3.41 for recent discussion) imply that there exist constants $c_1, c_2, c_3$ (depending on the distribution of $\ell(e)$) such that
\[ P(D(v, r) \not\subseteq B(v, c_1 r)) \le c_2 \exp(-c_3 r), \quad 0 < r < \infty. \tag{3} \]
The remainder of the proof is conceptually straightforward. Given large $m$ and $r$, there is a set $S(m, r)$ of at most $a_1 m^2/r^2$ vertices of $G_m$ such that $\cup_{v \in S(m,r)} D(v, r)$ covers $G_m$, and note $D(v, r)$ contains at most $a_2 r^2$ vertices; here $a_1$ and $a_2$ are absolute constants. By Markov's inequality and (3), the probability that the number of $v$ in $S(m, r)$ with $D(v, r) \not\subseteq B(v, c_1 r)$ exceeds a given $s > 0$ is at most $a_1 m^2 r^{-2} c_2 \exp(-c_3 r)/s$. Apply this with $s = m^2 r^{-2} \exp(-c_3 r/2)$, for which the bound becomes $a_1 c_2 \exp(-c_3 r/2)$. Observe that, outside the event above, we can define a vertex-set $S^+(m, r)$ as the union of $S(m, r)$ and all the vertices in all the discs $D(v, r)$ with $v \in S(m, r)$ and $D(v, r) \not\subseteq B(v, c_1 r)$, and then $\cup_{v \in S^+(m,r)} B(v, c_1 r)$ covers $G_m$ and $S^+(m, r)$ has cardinality at most
\[ n_m(r) := a_1 m^2/r^2 + s a_2 r^2 = a_1 m^2/r^2 + a_2 m^2 \exp(-c_3 r/2). \]

So we have shown
\[ P(N(G_m, c_1 r) > n_m(r)) \le a_1 c_2 \exp(-c_3 r/2). \tag{4} \]
This holds for fixed $r$, but because $N(G_m, r)$ and $n_m(r)$ are decreasing in $r$ we have inclusion of events, for $j = 1, 2, \ldots$
Applying (4) and summing over $j$, where $\Phi$ depends on the distribution of $\ell(e)$ but not on $m$, and $\Phi(r_0) \downarrow 0$ as $r_0 \to \infty$.
The central point is that the argument depends only on some bound like (3), which one expects to hold very generally in FPP-like settings in dimension > 1. For instance FPP on a large family of connected random geometric graphs is studied in [8] and it seems plausible that results from that topic can be used to prove that $L_{NUV}$ is $O(n)$ on such $n$-vertex graphs.
The next example is infinite dimensional, and the bound (6) below will be the analog of the bound (3) above.

The mean-field model of distance
Take the complete graph on $n$ vertices and assign to edges i.i.d. random lengths with Exponential(mean $n$) distribution. This "mean-field model of distance" $G_n$ turns out to be surprisingly tractable, because the smallest edge-lengths at a given vertex are distributed (in the $n \to \infty$ limit) as the points of a rate-1 Poisson point process on $(0, \infty)$, and as regards short edges the graph is locally tree-like. A now classical result of Frieze [7] proves that the length of the MST is asymptotically $\zeta(3) n$, and a remarkable result of Wästlund [20] formalizing ideas of Mézard-Parisi [16] shows that the length of the TSP path is asymptotically $cn$ for an explicit constant $c = 2.04\ldots$. Might it be possible to get a similar explicit result for the NUV length? Corollary 8 below gives the correct order of magnitude by essentially the same method as above.
Table 3: Simulation data for lengths $L_n$ in the mean-field model.
and Table 3 is loosely consistent with that. As in section 3, by considering the length as $L_n(G_n, V_n)$ for a uniform random starting vertex $V_n$, we can consider the variance decomposition
\[ \mathrm{var}\, L_n = \mathrm{var}\, E(L_n | G_n) + E\, \mathrm{var}(L_n | G_n) \]
where the first term represents the variability due to the random graph and the second term represents the variability due to the starting vertex. In simulations with $n = 100$ the former variance term is around 30 times larger than the second term, consistent with the general conjectures (section 2.5) that the initial state $v$ typically has little influence on $L_{NUV}(G, v)$.
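The mean-field model is easy to simulate at small scale. The following sketch uses hypothetical function names and an arbitrary choice of $n = 40$; it is not the code behind Table 3. It draws i.i.d. Exponential(mean $n$) edge-lengths, converts them to shortest-path distances with the Floyd-Warshall algorithm (a choice made here for brevity), and then runs the NUV walk as the greedy rule on that distance matrix.

```python
import random

def mean_field_distances(n, rng):
    """Shortest-path distance matrix for the complete graph on n vertices
    with i.i.d. Exponential(mean n) edge-lengths."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # expovariate takes the rate, so rate 1/n gives mean n
            d[i][j] = d[j][i] = rng.expovariate(1.0 / n)
    # Floyd-Warshall all-pairs shortest paths, in place
    for k in range(n):
        dk = d[k]
        for i in range(n):
            di, dik = d[i], d[i][k]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return d

def nuv_walk(d, start=0):
    """NUV walk as the greedy rule on a shortest-path distance matrix.
    Returns (order_of_first_visit, total_length)."""
    unvisited = set(range(len(d))) - {start}
    order, total, cur = [start], 0.0, start
    while unvisited:
        nxt = min(unvisited, key=lambda v: d[cur][v])
        total += d[cur][nxt]
        unvisited.remove(nxt)
        order.append(nxt)
        cur = nxt
    return order, total

n = 40
d = mean_field_distances(n, random.Random(5))
order, total = nuv_walk(d)
print("L / n =", total / n)
```

The greedy rule on the distance matrix gives the same order of first visits as the walk on the graph, because every intermediate vertex on the shortest path to the closest unvisited vertex has already been visited.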

Corollary 8
For the mean-field model of distance $G_n$, the sequence $(n^{-1} L_{NUV}(G_n), n \ge 2)$ is tight.
Proof. We first record a simple estimate. As before, for a vertex $v \in [n] = \{1, 2, \ldots, n\}$ write $B_n(v, r) = \{v' : d(v, v') \le r\}$ for the ball of radius $r$ in $G_n$. Conceptually we want to consider balls around $s$ randomly chosen vertices, but by symmetry this is equivalent to using the first $s$ vertices, which is notationally simpler. So define the vertex-set $C_n(s, r)$ as the complement of $\cup_{i \le s} B_n(i, r)$, and then by appending to $[s]$ every vertex in that complement,
\[ N(G_n, r) \le s + |C_n(s, r)|. \tag{5} \]
The $n \to \infty$ limit distribution of the process $(|B_n(v, r)|, 0 \le r < \infty)$ over a fixed $r$-interval is well known to be the standard Yule process $(Y(r), 0 \le r < \infty)$, for which $Y(r)$ has exactly Geometric($e^{-r}$) distribution. (This is part of the theory surrounding the PWIT [1].) Choosing $r_1 = \frac{1}{3} \log n$ so that $\exp(r_1) = n^{1/3}$, it is easy to see (birthday problem) that the distribution of $(|B_n(v, r)|, 0 \le r \le r_1)$ agrees with the distribution of $(Y(r), 0 \le r \le r_1)$ outside an event $A_n(v)$ of probability $\delta_n = O(n^{-1/4}) \to 0$ as $n \to \infty$. Summing over $v$, from (5) we can write, for $r \le r_1$,
\[ N(G_n, r) \le s_n(r) + X_n + Y_n(r) \]
where $E X_n \le n \delta_n$ and $E Y_n(r) \le n e^{-r/2}$.
For the tail of the integral, the diameter $\Delta$ of $G_n$ is known [11] to be asymptotically $3 \log n$ and so by monotonicity of $N(r)$
\[ n^{-1} \int_{r_1}^{\Delta/2} N(G_n, r)\,dr = O(n^{-1} \cdot N(G_n, r_1) \cdot \log n) \to 0 \text{ in probability.} \]
Because $\delta_n^{1/2} \log n \to 0$ and $n^{-1} N(G_n, r) \le 1$ for $r \le r_0$, these bounds establish that the sequence $n^{-1} \int_0^{\Delta/2} N(G_n, r)\,dr$ is tight, which by Proposition 1(ii) implies the sequence $(n^{-1} L_{NUV}(G_n), n \ge 2)$ is tight.

Final Remarks
Our results are conceptually merely consequences of Proposition 1, and further progress would require some other technique. One possible general approach is via local weak convergence [1,5]. Our three specific models each have local weak convergence limits (complete graph on a Poisson point process on the infinite plane; i.i.d. edge-lengths on the infinite lattice; the PWIT) and intuitively the conjectured limits $\lim_n n^{-1} E L_n$ are the mean step-lengths in an appropriately defined NUV walk on the limit infinite graph. Can this intuition be made rigorous?
In fact one expects the limits in our models to be collections of disjoint doubly-infinite walks which cover the graph. This relates to a longstanding folklore problem: for the NUV walk on the complete-graph Poisson point process on the infinite plane, estimate the number of never-visited vertices in the radius-r ball, as r → ∞.
For another possible direction of analysis, consider the Figure 1 sketch of one possible trajectory for the NUV path through a given ball. In general there will be many possible trajectories, depending on the graph outside the ball, but can one find restrictions on the possibilities, extending the obvious restriction: if two vertices are each other's nearest neighbor, then every NUV walk, after visiting the first, immediately visits the second?
Intuitively, for $1 \ll r_1 \ll r_2$, given the subgraph in the ball $B(v^*, r_2)$, in a random graph there will typically be only a few possibilities for the NUV trajectory within $B(v^*, r_1)$.
A final issue involves the variance of $L_{NUV}$ in random graph models. We expect order $n$ "each other's nearest neighbor" pairs, and then the randomness of edge-lengths suggests that the contribution to the variance of $L_{NUV}$ from these edges alone must be at least order $n$ (in our conventional scaling). However our small-scale simulation results in Tables 2 and 3 cast some doubt on this conjectured lower bound.