The Power of Two Choices for Random Walks

We apply the power-of-two-choices paradigm to a random walk on a graph: rather than moving to a uniform random neighbour at each step, a controller is allowed to choose from two independent uniform random neighbours. We prove that this allows the controller to significantly accelerate the hitting and cover times in several natural graph classes. In particular, we show that the cover time becomes linear in the number $n$ of vertices on discrete tori and bounded degree trees, of order $\mathcal{O}(n \log \log n)$ on bounded degree expanders, and of order $\mathcal{O}(n (\log \log n)^2)$ on the Erd\H{o}s-R\'{e}nyi random graph in a certain sparsely connected regime. We also consider the algorithmic question of computing an optimal strategy, and prove a dichotomy in efficiency between computing strategies for hitting and cover times.


Introduction
The power of choice paradigm asserts that when a random process is offered a choice between two or more uniformly selected options, as opposed to being supplied with just one, then a series of choices can be made to improve the overall performance. This idea was first applied to the 'balls into bins' model [5,9,31], where it was proved that the power of choice decreases the maximum load from $\Theta\big(\frac{\log n}{\log\log n}\big)$ to $\Theta(\log \log n)$ when assigning $n$ balls to $n$ bins. The power of choice was later extensively studied for random graphs under the broader class of rule-based random graph processes, known as Achlioptas processes; see for example [1,10,11,33,34] and references therein. The power of choice has also been studied with regard to the Preferential Attachment process for growing a random connected graph; in this context, the choices may have a powerful effect on the degree distribution, see e.g. [24,30].
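The balls-into-bins effect is simple to observe empirically. The following simulation (our illustration, not taken from the cited works) compares the maximum load when each of $n$ balls is placed in one uniformly random bin versus the less loaded of two uniform choices:

```python
import random

def loads_one_choice(n, rng):
    """Throw n balls into n bins, each into a uniformly random bin."""
    loads = [0] * n
    for _ in range(n):
        loads[rng.randrange(n)] += 1
    return loads

def loads_two_choices(n, rng):
    """Throw n balls into n bins; each ball samples two uniform bins
    (with replacement) and goes into the currently less loaded one."""
    loads = [0] * n
    for _ in range(n):
        i, j = rng.randrange(n), rng.randrange(n)
        loads[i if loads[i] <= loads[j] else j] += 1
    return loads

rng = random.Random(0)
n = 100_000
one = loads_one_choice(n, rng)
two = loads_two_choices(n, rng)
print(max(one), max(two))  # the two-choice maximum load is typically much smaller
```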
In this paper, we extend the power-of-two-choices paradigm to random walks on graphs. We show that for many natural classes of graphs, this results in a significant speed-up of the cover and hitting times, which are the expected times to visit all vertices or any fixed vertex from a worst-case start vertex. We study the choice random walk (CRW), which at every step is offered two uniformly random independently sampled neighbours (with repetition) of the current location and (with full knowledge of the graph) must choose one as the next step; see Section 2 for more details. We prove that the cover time of the CRW decreases to $\Theta(n)$ for grids (i.e. finite quotients of $\mathbb{Z}^d$) and bounded degree trees on $n$ vertices, and that the cover time of expander graphs decreases to $\mathcal{O}(n \log \log n)$. We note that for the simple random walk (SRW) these cover times are all $\Omega(n \log n)$ and some are $\Theta(n^2)$ [2]. We also consider computational questions relating to choosing a good strategy: we show that an optimal strategy for minimising a hitting time can be computed in polynomial time, but choosing an optimal strategy for minimising the cover time is NP-hard. See Section 1.2 for more details and other results.
Part of our motivation is to improve the efficiency of random walks used in algorithmic applications such as searching, routing, self-stabilization and query processing in wireless networks, peer-to-peer networks and other distributed systems. One practical setting where routing using the power of choice walk may be advantageous is in relatively slowly evolving dynamic networks such as the internet. For example, say a packet has a target destination v and each node stores a pointer to a neighbour which it believes leads most directly to v. If this network is perturbed, then the deterministic scheme may get stuck in 'dead ends', whereas a random walk would avoid this fate. The CRW which prefers edges pointed to by a node may be the best of both worlds, as it would also avoid traps but may see a speed-up over the SRW when the original paths are still largely intact.

Related literature
To the best of our knowledge, Avin and Krishnamachari [3] were the first to apply the principle of the power of choice to random walks. However, their version only considers a simple choice rule where the vertex with fewer previous visits is always preferred, and ties are broken randomly. This is in the spirit of balanced allocations, the origin of the power-of-two-choices paradigm. Their results are mainly empirical and suggest a decrease in the variance of the cover time, and a significant improvement in visit load balancing. This is related to the greedy random walk of Orenshtein and Shinkar [32], which chooses uniformly from adjacent vertices that have not yet been visited (if possible). This model is well studied for expanders [8,17]. The power of choice has also been studied in the context of deterministic random walks and the rotor-router model [7,18].
Perhaps closest to our work, Azar, Broder, Karlin, Linial and Phillips [4] introduced the ε-biased random walk (ε-BRW) where at each step with probability ε > 0 a controller can choose a neighbour of the current vertex to move to, otherwise a uniformly random one is selected. The model is quite similar to ours in the sense that the controller has full knowledge of the graph when choosing a neighbour. They obtained bounds on the stationary probabilities and show that optimal strategies for maximising or minimising stationary probabilities or hitting times can be computed in polynomial time. There is some overlap with our results in Section 7, where in particular Theorem 7.4 uses a clever substitution from [4] to express an optimisation problem as a linear program. One major difference is that Azar, Broder, Karlin, Linial and Phillips restrict their study to time-independent strategies and do not investigate cover times. Three of the authors of this paper have recently extended [4] to the time-dependent setting and studied cover times for ε-BRWs [25]. The conference paper [22] collects some of our results on the CRW from here and on the ε-BRW from [25] giving a comparison between the two processes.
Azar, Broder, Karlin, Linial and Phillips [4] suggest that the most natural choice of bias for the ε-BRW is $\varepsilon = \Theta(1/d_{\max})$, where $d_{\max}$ is the maximum degree. It is shown in [22, Prop. 1] that the CRW can emulate the ε-BRW provided $\varepsilon \leq 1/d_{\max}$. However, the reverse does not hold unless the bias ε is close to 1; the main obstacle is that avoiding a particular next step is much more difficult for the ε-BRW. Further evidence that the CRW is more powerful than the ε-BRW is in the cover time bounds we prove for the CRW in Theorem 6.1 and for the time-dependent version of the ε-BRW in [25, Thm. 3.2]. For the most natural choice $\varepsilon = \Theta(1/d_{\max})$, these bounds differ by a factor which is almost linear in $d_{\max}$, suggesting that the CRW deals better with high-degree graphs than the ε-BRW.
With regard to complexity questions, we note that for the SRW, hitting times can be expressed as the solution to a set of n linear equations and can therefore be computed in polynomial time. Determining the complexity of computing the cover time, however, is far more challenging and still remains open [2,Open Problem 6.35]. Significant progress was made by Ding, Lee and Peres [20] who discovered a deterministic polynomial time O(1)-approximation algorithm for the cover time. In this paper, we show that computing an optimal strategy for the cover time of the CRW is NP-hard.

Our results
In this section, we shall present the main results we have obtained for the CRW. The numbering of these theorems corresponds to where they appear in the paper, although some theorem statements have been simplified for ease of exposition.
The CRW is not reversible in general; however, we show that it can emulate certain reversible chains. Combining this with the well-known connection between electrical networks and reversible Markov chains, we obtain the following general bound on the maximum hitting time $t^{\mathrm{two}}_{\mathrm{hit}}(G)$ between any two vertices of a graph $G$.

Theorem 1.1. For any finite graph $G$, we have $t^{\mathrm{two}}_{\mathrm{hit}}(G) < \min\{3|E|, n^2\}$.

This is tight up to constants at both ends of the density spectrum and improves considerably over the well-known $\mathcal{O}(n|E|)$ worst-case bound for the SRW. A witness to tightness for sparse graphs is traversing a path from end to end, and for dense graphs hitting a vertex connected by a single edge to a clique.
Most of this paper focuses on the cover time $t^{\mathrm{two}}_{\mathrm{cov}}(G)$ of the CRW on a graph $G$ under an optimal strategy. For the SRW, the maximum hitting time $t_{\mathrm{hit}}$ between any two vertices determines the cover time up to a $\log n$ factor by Matthews' bound [28, Ch. 11.2]. However, due to the effect of the choices, this does not apply to the CRW, and so we develop other methods to bound $t^{\mathrm{two}}_{\mathrm{cov}}$. The next result implies that $t^{\mathrm{two}}_{\mathrm{cov}}(T)$ is linear for a bounded degree tree $T$:

Theorem 1.2. For every $d \in \mathbb{N}$ and every $n$-vertex tree $T$ with maximum degree $d$, we have $t^{\mathrm{two}}_{\mathrm{cov}}(T) \leq 8dn$.

Our strategy for achieving this changes with time and covers the vertices of $T$ in a prescribed order.
Next, we obtain a similar result for $d$-dimensional grids and tori. The proof technique is different: we show that there exists a CRW strategy for the infinite $d$-dimensional grid under which the CRW becomes strongly recurrent. In particular, the expected crossing time of any edge is finite. We use this to deduce:

Theorem 1.3. For any $d$, and any $d$-dimensional $n$-vertex torus or grid $G$, we have $t^{\mathrm{two}}_{\mathrm{cov}}(G) = \Theta(n)$ and $t^{\mathrm{two}}_{\mathrm{hit}}(G) = \Theta(\mathrm{diam}(G)) = \Theta(n^{1/d})$.
Avin and Krishnamachari [3] conjecture a speed-up for their aforementioned local power of two choice walk on the two-dimensional grid. Theorem 1.3 corroborates this for our global version of the process but does not yet prove their conjecture.
We develop a method for boosting the probabilities of rare events in the CRW, which gives bounds on hitting and cover times. Perhaps the most important application of these methods is to expander graphs:

Theorem 1.4. For every sequence $(G_n)_{n\in\mathbb{N}}$ of bounded degree expanders, where $G_n$ has $n$ vertices, we have $t^{\mathrm{two}}_{\mathrm{cov}}(G_n) = \mathcal{O}(n \log \log n)$.

Theorem 1.4 is in fact an immediate corollary of a more general bound (Theorem 6.1), bounding $t^{\mathrm{two}}_{\mathrm{cov}}(G)$ in terms of the hitting time (of the SRW), relaxation time and degree discrepancy. In particular, these bounds apply w.h.p. to the random $d$-regular graph for fixed $d$. Another application of these methods gives the following bound for the Erdős–Rényi random graph, showing a significant improvement on cover time for the regime with subpolynomial growth of the average degree.

Theorem 1.5. Let $G \sim \mathcal{G}(n, p)$ where $np \geq c \ln n$ for some fixed $c > 1$ and $\log np = o(\log n)$. Then w.h.p. $t^{\mathrm{two}}_{\mathrm{cov}}(G) = \mathcal{O}\big(n (\log \log n)^2\big)$.
Finally, Section 7 deals with the computational complexity of computing optimal strategies to minimise hitting and cover times. We show the following dichotomy: an optimal strategy to hit a set of vertices can be computed efficiently, whereas choosing between two cover time strategies is NP-hard. More precisely, we have:

Theorem 1.6. For any graph $G$, $S \subset V$ and $x \in V \setminus S$, a strategy minimising the hitting time of $S$ from $x$ can be computed in time $\mathrm{poly}(|V|)$.
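One concrete way to approximate optimal CRW hitting times (an illustrative sketch of ours, not the paper's algorithm, which works via a linear program) is value iteration on the optimality equations: from $v$ the controller is offered two uniform neighbours with replacement and moves to the one with the smaller continuation value.

```python
def crw_hitting_times(adj, target, iters=5000):
    """Approximate optimal CRW hitting times h(v) of `target` by iterating
    h(v) = 1 + (1/d(v)^2) * sum_{i,j} min(h(i), h(j)),
    where i, j range over neighbours of v (two choices, with replacement)."""
    h = {v: 0.0 for v in adj}
    for _ in range(iters):
        new = {}
        for v in adj:
            if v == target:
                new[v] = 0.0
                continue
            d = len(adj[v])
            s = sum(min(h[i], h[j]) for i in adj[v] for j in adj[v])
            new[v] = 1.0 + s / (d * d)
        h = new
    return h

def srw_hitting_times(adj, target, iters=5000):
    """Same iteration for the SRW: h(v) = 1 + mean of h over neighbours."""
    h = {v: 0.0 for v in adj}
    for _ in range(iters):
        h = {v: 0.0 if v == target else
             1.0 + sum(h[u] for u in adj[v]) / len(adj[v]) for v in adj}
    return h

# Hypothetical example: a 6-cycle; hit vertex 0 from the antipodal vertex 3.
adj = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
crw, srw = crw_hitting_times(adj, 0), srw_hitting_times(adj, 0)
print(crw[3], srw[3])  # the CRW value is strictly smaller
```

The iteration converges monotonically to the minimal expected hitting times; the polynomial-time algorithm of Theorem 1.6 solves these optimality equations exactly.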
Notice that any strategy for covering a graph must specify a set of choice preferences from every vertex for every possible set of vertices covered, and thus may have size exponential in $n$. This makes the second half of the dichotomy as phrased above sound somewhat modest. However, we show that even in the 'on-line' setting, where one is given the set covered so far, just choosing the next step (outputting something of polynomial size) is NP-hard.

Theorem 1.7. Given the covered set $X$ and position $v$ of the walk at some time, it is NP-hard to choose the next step from two neighbours of $v$ so as to minimise the expected time for the CRW to visit every vertex not in $X$, assuming an optimal strategy is followed thereafter.
Our proof shows that this remains NP-hard if G is constrained to have maximum degree 3. To the best of our knowledge, this is the first intractability result for processes involving random walks with choice.
Our results for fundamental graph topologies are summarised in Table 1, along with the corresponding hitting and cover times for the SRW for ease of comparison.

Preliminaries
The CRW is a discrete time stochastic process $(X_t)_{t \geq 0}$ on the vertices of a connected graph $G = (V, E)$, influenced by a controller. The starting state is a fixed vertex; at each time $t \in \mathbb{N}$ the controller is presented with two neighbours $\{c^t_1, c^t_2\}$ of the current state $X_t$, chosen uniformly at random with replacement, and must choose one of these neighbours as the next state $X_{t+1}$. We assume that at each time $t$ the controller knows the graph $G$, its current position $X_t \in V$, and $\mathcal{H}_t$, the history of the process so far. The controller has access to arbitrary computational resources and an infinite string of random bits $\omega$ in order to choose $X_{t+1}$ from $\{c^t_1, c^t_2\}$. A CRW strategy is a function which, given any $t$, $\mathcal{H}_t$ and $\{c^t_1, c^t_2\} \subseteq N(X_t)$, outputs one of the two offered neighbours as the next state.

We say that a CRW strategy is unchanging if it is independent of both time and the history of the walk. We say that an unchanging strategy is reversible if the Markov chain it defines is reversible. We recall that any reversible Markov chain is identically distributed with a random walk on an edge-weighted graph, as explained, for example, in [2]; we shall make use of this representation. For many graphs with a high degree of symmetry, we can find good reversible strategies, and we can then use tools from the theory of reversible Markov chains to analyse the CRW on these graphs. The strategies we consider may use random bits in addition to those used for choosing $\{c^t_1, c^t_2\}$; we say a strategy is deterministic if no additional random bits are used. If we are trying to minimise the expected hitting time of a given vertex, it is easy to see that there is an unchanging, deterministic optimal strategy. However, it need not be reversible; an example where it is not is given in Figure 1. We shall use reversible strategies to bound the hitting time of the optimal strategy; these will also in general not be deterministic.
For a strategy $\alpha$ and for a vertex $v$ and distinct neighbours $i, j$, let $\alpha^j_{v,i} \in [0, 1]$ be the probability that when the walk is at $v$ it chooses $i$ when offered $\{i, j\}$ as choices, that is, $\alpha^j_{v,i} := \mathbb{P}\big(X_{t+1} = i \mid X_t = v,\ \{c^t_1, c^t_2\} = \{i, j\}\big)$ (this probability is also conditional on $\mathcal{H}_t$, but we suppress this for notational convenience). These are the only parameters we may vary, but we shall find it convenient to define $\alpha^i_{v,i} := 1/2$ for each $i$ adjacent to $v$. Thus, for all distinct neighbours $i, j$ of $v$,
$$\alpha^j_{v,i} + \alpha^i_{v,j} = 1. \qquad (1)$$
Since each unordered pair of neighbours is offered with probability $2/d(v)^2$, the transition probabilities $q_{v,i}$ for the strategy $\alpha$ are then given by
$$q_{v,i} = \frac{2}{d(v)^2} \sum_{j \in N(v)} \alpha^j_{v,i}. \qquad (2)$$
For a family of parameters $\alpha^j_{v,i}$ to yield a valid set of transition probabilities $q_{v,i}$, they must satisfy
$$\sum_{i \in N(v)} q_{v,i} = 1 \qquad (3)$$
for every $v \in V$. Notice that any weights satisfying (1) also satisfy (3). Let $C^{\mathrm{two}}_v(G)$ denote the minimum expected time (taken over all strategies) for the CRW to visit every vertex of $G$ starting from $v$, and define the cover time $t^{\mathrm{two}}_{\mathrm{cov}}(G) := \max_{v\in V} C^{\mathrm{two}}_v(G)$. Analogously, let $H^{\mathrm{two}}_x(y)$ denote the minimum expected time for the CRW to reach $y$, which may be a single vertex or a set of vertices, starting from a vertex $x$, and define the hitting time $t^{\mathrm{two}}_{\mathrm{hit}}(G) := \max_{x,y\in V} H^{\mathrm{two}}_x(y)$. We drop the superscript from this notation when referring to the associated quantities for the SRW.
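To make the parameterisation concrete, here is a small sketch (our illustration; the degree-3 vertex and preference values are hypothetical) that computes the induced transition probabilities from a family of pairwise preferences $\alpha^j_{v,i}$, using the fact that each unordered pair of neighbours is offered with probability $2/d(v)^2$:

```python
from fractions import Fraction

def transition_probs(alpha, d):
    """alpha[(i, j)] = probability of choosing neighbour i when offered {i, j};
    must satisfy alpha[(i, j)] + alpha[(j, i)] = 1, with alpha[(i, i)] = 1/2
    by convention. Neighbours are labelled 0..d-1."""
    q = []
    for i in range(d):
        total = sum(alpha[(i, j)] for j in range(d))
        # q_{v,i} = (2 / d^2) * sum_j alpha^j_{v,i}
        q.append(Fraction(2, d * d) * total)
    return q

half = Fraction(1, 2)
d = 3
# A strategy at a degree-3 vertex: always take neighbour 0 when it is offered.
alpha = {(0, 0): half, (1, 1): half, (2, 2): half,
         (0, 1): Fraction(1), (1, 0): Fraction(0),
         (0, 2): Fraction(1), (2, 0): Fraction(0),
         (1, 2): half, (2, 1): half}
q = transition_probs(alpha, d)
print(q)  # q_0 = 5/9, the probability that neighbour 0 is offered at all
```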

Bounds from weighted graphs
In this section, we analyse CRW strategies which emulate a random walk on a weighted graph. We prove a tight general bound on hitting times and show that any vertex of a graph with maximum degree 3 can be hit in time proportional to its distance from the start vertex.

An extremal hitting time result
In this section, we prove that $t^{\mathrm{two}}_{\mathrm{hit}}(G) = \mathcal{O}(e(G))$ for an arbitrary graph $G$, where $e(G)$ is the number of edges. This bound is best possible up to the implied constants: for sparse graphs, the path has $t^{\mathrm{two}}_{\mathrm{hit}}$ around $2e(G)$. For dense graphs, a clique with a pendant path, where the length of the path is growing much slower than the size of the clique, gives $t^{\mathrm{two}}_{\mathrm{hit}}$ around $3n^2/8$.

Lemma 3.1. Fix a vertex v, and partition its neighbours into two sets, A and B.
There is an unchanging strategy for the CRW such that whenever the walker is at v, it moves to a random neighbour according to the probability distribution in which every vertex in B is twice as likely as every vertex in A.
Proof. Fix some number $p \in [0, 1]$ and consider the following strategy for moving from $v$. If offered two choices from the same set, choose between them uniformly at random, but if offered one choice from $A$ and one choice from $B$, choose the one from $A$ with probability $p$. Clearly, all elements of $A$ are equiprobable, as are all elements of $B$, so it is sufficient to show that for some $p$ this strategy chooses an element of $A$ with probability $q = \frac{|A|}{|A|+2|B|}$. If this is the case, each element of $A$ will be chosen with probability $\frac{1}{|A|+2|B|}$ and each element of $B$ with probability $\frac{2}{|A|+2|B|}$. The probability of choosing an element of $A$ is $\frac{|A|^2}{(|A|+|B|)^2}$ when $p = 0$ and $\frac{|A|}{|A|+|B|}$ when $p = 1/2$. Since $(|A| + |B|)^2 \geq |A|(|A| + 2|B|)$, we have $\frac{|A|^2}{(|A|+|B|)^2} \leq q \leq \frac{|A|}{|A|+|B|}$, and hence for some $p \in [0, 1/2]$ we have the required probability by continuity.
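The value of $p$ guaranteed by this continuity argument can in fact be written in closed form; the following check (our computation, not part of the paper's proof) verifies that $p = \frac{|B|}{2(|A|+2|B|)}$ produces exactly the required distribution:

```python
from fractions import Fraction

def prob_choose_A(a, b, p):
    """Probability of moving into A when the a + b neighbours are split into
    A (size a) and B (size b): two uniform choices with replacement; if one
    option is in A and one in B, pick the A-option with probability p."""
    n = a + b
    both_A = Fraction(a * a, n * n)
    mixed = Fraction(2 * a * b, n * n)
    return both_A + mixed * p

a, b = 3, 4
target = Fraction(a, a + 2 * b)       # each B-vertex twice as likely as each A-vertex
p = Fraction(b, 2 * (a + 2 * b))      # explicit solution (our derivation)
print(p, prob_choose_A(a, b, p))      # p = 2/11 hits the target 3/11 exactly
```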
By considering the strategy at each vertex separately, we immediately get the following consequence.
Corollary 3.2. Let $G = (V, E)$ be a locally finite weighted graph with weight function $w\colon E \to \mathbb{R}^+$, having the property that for any two incident edges $xy, xz$ either $w(xy) = w(xz)$, or $w(xy) = 2w(xz)$, or $2w(xy) = w(xz)$. Then there is an unchanging strategy for the CRW on $G$ which simulates the random walk defined by the weights $w$.
Here, by the random walk defined by the weights $w$, we mean the reversible Markov chain where the transition probability from a vertex $x$ to a neighbour $y$ is proportional to $w(xy)$. For a weighted graph $(G, w)$, write $w(G) = \sum_{e\in E(G)} w(e)$.

Lemma 3.3. Let $(G, w)$ be a finite weighted graph, and let $x$ be a vertex such that every edge incident with $x$ has weight 1. Then for any vertex $y$ adjacent to $x$, we have $H_y(x) \leq 2w(G) - d(x)$, where $d(x)$ is the degree of $x$.

Proof. Since the stationary distribution is given by $\pi(v) = w(v)/2w(G)$, where $w(v)$ is the total weight of edges incident with $v$, the expected return time to $x$ is $1/\pi(x) = 2w(G)/d(x)$. Conditioning on the first step, $1/\pi(x) = 1 + \frac{1}{d(x)}\sum_{z \sim x} H_z(x) \geq 1 + \frac{1}{d(x)} H_y(x)$, and rearranging gives $H_y(x) \leq 2w(G) - d(x)$.

We now restate and prove our result for CRW hitting times.
Proof. We have to show that the above bounds apply to $H^{\mathrm{two}}_y(x)$ for two arbitrary vertices $x, y$. Define a weight function $w(uv) := 2^{-\min(d(x,u),\, d(x,v))}$. Note that $w$ satisfies the requirements of Corollary 3.2, so we can bound $H^{\mathrm{two}}_y(x)$ by the corresponding hitting time of the random walk on $(G, w)$. We will now bound the latter hitting time.
Write $D$ for the maximum distance of a vertex from $x$, and $V_k$ for the set of vertices at distance exactly $k$ from $x$. Note that if $y \in V_{k+1}$ then the walk must pass through each of $V_k, \ldots, V_1$ before reaching $x$, so it suffices to bound the expected time to move from each layer to the next. For each $0 \leq k \leq D - 1$, let $G_k$ be the simple weighted graph obtained by deleting $\bigcup_{i<k} V_i$ and identifying the vertices in $V_k$ to give a vertex $v_k$; if a vertex in $V_{k+1}$ has multiple edges to $V_k$, delete all but one of them to leave a simple graph. Since removing edges between $V_{k+1}$ and $V_k$ cannot reduce the hitting time of $V_k$, the hitting time of $V_k$ in $(G, w)$ from any vertex of $V_{k+1}$ is at most the corresponding hitting time of $v_k$ in $G_k$. Note that the latter hitting time is unchanged by multiplying all weights by $2^k$, and since every $z \in V_{k+1}$ is then adjacent to $v_k$ by an edge of weight 1, Lemma 3.3 applies to the rescaled $G_k$. We sum the resulting bounds over $0 \leq k \leq D-1$. If $e$ is an edge between $V_j$ and $V_{j+1}$, then the contribution of $e$ to the $k$th term of this sum is $2^{k-j+1}$ if $k < j$, at most 1 if $k = j$ and 0 otherwise, so its total contribution is less than 3, and is less than 2 if $e$ is one of the edges deleted to make $G_j$ simple. If $e$ is an edge within $V_j$, then its contribution to the $k$th term is $2^{k-j+1}$ if $k < j$ and 0 otherwise, so its total contribution is less than 2. The first bound follows. Note that of the edges of the first type which are not deleted, there is exactly one from each vertex (other than $x$) to a vertex in a lower layer of $G$, and so these edges form a tree. Thus, there are $n - 1$ such edges, whose contribution is bounded by 3, and at most $\binom{n}{2} - (n - 1)$ other edges, whose contribution is bounded by 2, giving a bound of $3(n-1) + 2\big(\tbinom{n}{2} - (n-1)\big) = 2\tbinom{n}{2} + n - 1 = n^2 - 1$.

Cover times of subcubic graphs
In this section, we prove that the CRW cover time of any subcubic graph is linear in the number of vertices, where we remind the reader that a subcubic graph is a graph with maximum degree 3.

Proposition 3.4.
Let $G$ be any connected graph of maximum degree 3. Then $H^{\mathrm{two}}_u(v) \leq 9$ for any $uv \in E(G)$. If in addition $G$ is finite with $n$ vertices, then $t^{\mathrm{two}}_{\mathrm{cov}}(G) = \mathcal{O}(n)$.
If $G$ has $n$ vertices, let $v$ be any vertex and choose a spanning walk in the graph starting at $v$ and having at most $2n - 3$ edges. Such a walk always exists; take, for example, a depth-first exploration of a spanning tree. Proceed in at most $2n - 3$ rounds, in each round using the strategy above to hit the next vertex of the walk. Each round has expected duration at most 9 by the first part of the proposition, and so $t^{\mathrm{two}}_{\mathrm{cov}}(G) \leq 18n - 27$.
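A spanning walk of the required length can be generated by a depth-first search that stops as soon as the last vertex is discovered; a sketch (our illustration, with a hypothetical 4-vertex graph):

```python
def spanning_walk(adj, root):
    """Return a walk from root visiting every vertex, via DFS on a spanning
    tree; the walk is cut after the last vertex is first discovered, so it
    uses at most 2n - 3 edges for n >= 2."""
    walk, seen = [root], {root}
    last_new = 0
    stack = [(root, iter(adj[root]))]
    while stack:
        v, it = stack[-1]
        for u in it:
            if u not in seen:
                seen.add(u)
                walk.append(u)
                last_new = len(walk) - 1
                stack.append((u, iter(adj[u])))
                break
        else:
            stack.pop()
            if stack:
                walk.append(stack[-1][0])  # backtrack along the tree edge
    return walk[:last_new + 1]

# A path with a branch, given as adjacency lists.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
w = spanning_walk(adj, 0)
print(w)  # [0, 1, 2, 1, 3]: 4 edges <= 2*4 - 3 = 5
```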
Remark. Since $t^{\mathrm{two}}_{\mathrm{hit}}(G) \leq t^{\mathrm{two}}_{\mathrm{cov}}(G)$, the hitting time is also linear. Even for 3-regular graphs the diameter can grow linearly, so this is best possible.

Trees
In this section, we show that $t^{\mathrm{two}}_{\mathrm{cov}}(T) = \Theta(n)$ for trees $T$ of bounded degree. Moreover, we will prove that we can specify an arbitrary (closed) walk $W$ traversing each edge of $T$ once in each direction and cover the vertices of $T$ in the order dictated by $W$ in linear expected time. This is the gist of the following result:

Theorem 1.2. For every $d \in \mathbb{N}$ and every tree $T$ with maximum degree $d$, we have $\sum_{x,y\in V(T),\, xy\in E(T)} H^{\mathrm{two}}_x(y) \leq 8dn$.
This result will be proved by realising a strategy to cover $T$ as a sequence of weighted walks, and then bounding the hitting times in these walks using the Essential Edge Lemma. We shall now remind the reader of the setting and statement of this lemma: we say that an edge $vx$ of a graph is essential if its removal would disconnect the graph into two components $A(v, x)$ and $A(x, v)$, say, containing $v$ and $x$, respectively. Let $E(v, x)$ be the set of edges of $A(v, x)$.
Lemma 4.1 (Essential Edge Lemma). Let $vx$ be an essential edge of a weighted graph $(G, w)$. Then
$$H_v(x) = \frac{2\sum_{e \in E(v,x)} w(e) + w(vx)}{w(vx)},$$
where $H$ is the hitting time of the reversible Markov chain defined by these edge weights.
We now define the CRW strategies we use in the proof of Theorem 1.2. Given a tree $T$, we pick an arbitrary 'root' vertex $r \in V(T)$. In order to obtain an upper bound on $H^{\mathrm{two}}_x(y)$ for $x, y \in V(T)$ such that $xy \in E(T)$, we follow the (unchanging) strategy $\sigma_{xy}$ making the following choices at each vertex $v$:

Reduce the distance to $y$ if possible; otherwise, choose uniformly an option that increases the distance to $r$ if at least one is available. (5)
In other words, σ xy prefers the unique neighbour w of v with d(w, y) < d(v, y), avoids the unique neighbour z with d(z, r) < d(v, r) and is indifferent among all other neighbours of v. We emphasise that r was an arbitrary vertex, but it is important for our calculations below that it is fixed for all σ xy , x, y ∈ V(T).
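The strategy $\sigma_{xy}$ is easy to implement mechanically; the sketch below (our illustration; tie-breaking among indifferent options is made deterministic rather than uniform for simplicity) selects among two offered neighbours using BFS distances to $y$ and $r$:

```python
from collections import deque

def distances(adj, src):
    """Breadth-first distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def sigma_choice(adj, r, y, v, offered):
    """One step of the strategy sigma_{xy} at vertex v: prefer the neighbour
    closer to the target y; otherwise avoid the neighbour closer to the
    root r (returning the first eligible candidate for illustration)."""
    d_y, d_r = distances(adj, y), distances(adj, r)
    toward_y = [u for u in offered if d_y[u] < d_y[v]]
    if toward_y:
        return toward_y[0]
    away_r = [u for u in offered if d_r[u] >= d_r[v]]
    return away_r[0] if away_r else offered[0]

# Hypothetical example: a path 0-1-2-3 rooted at 0; at v = 1 aiming for y = 3,
# the strategy prefers 2 over 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sigma_choice(adj, r=0, y=3, v=1, offered=[0, 2]))  # -> 2
```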
Since the strategy $\sigma_{xy}$ is unchanging, there is an assignment of weights $w_{x,y}(e)$, $e \in E(T)$, such that the corresponding random walk (as defined after Corollary 3.2) is equidistributed with the CRW under strategy $\sigma_{xy}$ when both walks start at $x$ and stop when first visiting $y$. These weights can be multiplied by any positive constant without changing the random walk they define, and we normalise by fixing $w_{x,y}(xy) = 1$ for concreteness. The rest of the weights can be calculated explicitly, and so we can apply Lemma 4.1 to give the bound
$$H^{\mathrm{two}}_x(y) \leq 2\sum_{e \in E(T)} w_{x,y}(e), \qquad (6)$$
with the understanding that we set $w_{x,y}(e) = 0$ if $y$ separates $x$ from $e$, as such an edge does not contribute to the sum in Lemma 4.1. The bound (6) expresses $H^{\mathrm{two}}_x(y)$ as a sum of contributions of each $e \in E(T)$. The main surprise in the proof of Theorem 1.2 is the following lemma, which says that for each $e \in E(T)$, the sum of these contributions $w_{x,y}(e)$ over all $x, y \in V(T)$, $xy \in E(T)$, is bounded.

Lemma 4.2. For every $e \in E(T)$, we have $\sum_{x,y\in V(T),\, xy\in E(T)} w_{x,y}(e) \leq 4d$.

An obvious double-counting argument involving (6) will then establish Theorem 1.2. We emphasise that this sum is taken over all ordered pairs of adjacent vertices. The proof of Lemma 4.2 is based on the fact that, for a fixed $e$, the values $w_{x,y}(e)$ decay fast with the distance $d(xy, e)$, and even more so in the direction of $r$. (Here, we define the distance between two edges $xy, wz \in E$ to be $d(xy, wz) := \min\{d(x, w), d(y, z), d(x, z), d(y, w)\}$.) The following two propositions will yield quantitative bounds on the speed of this decay.

Proposition 4.3. Let $G$ be any graph, $x \in V(G)$, and $v \in N(x)$. Consider a CRW strategy that when at $x$ always chooses $v$ when that choice is available, and otherwise chooses each of the available options with probability 1/2. Then for every $w \in N(x) \setminus \{v\}$,
$$\frac{q_{x,w}}{q_{x,v}} = \frac{d(x) - 1}{2d(x) - 1} < \frac{1}{2}.$$

Proof. We have $q_{x,v} = 1 - (1 - 1/d(x))^2 = \frac{2d(x)-1}{d(x)^2}$, since $v$ is chosen whenever it is among the options. Each $w \neq v$ is chosen with equal probability, and only when $v$ is not among the options. Thus $q_{x,w} = \frac{1 - q_{x,v}}{d(x) - 1} = \frac{d(x)-1}{d(x)^2}$, and the claimed ratio follows.

Proposition 4.4. Let $G$ be any graph, $v \in V(G)$, and $y \in N(v)$. Consider a CRW strategy that when at $v$ chooses $y$ only if both offered options are $y$, and otherwise chooses each of the remaining options with equal probability. Then for every $w \in N(v) \setminus \{y\}$,
$$\frac{q_{v,y}}{q_{v,w}} = \frac{1}{d(v) + 1}.$$

Proof. As in the proof of Proposition 4.3, we have $q_{v,y} = 1/d(v)^2$, since $y$ is chosen only when both options equal $y$, and each $w \neq y$ is chosen with equal probability otherwise, so that $q_{v,w} = \frac{1 - 1/d(v)^2}{d(v) - 1} = \frac{d(v)+1}{d(v)^2}$.

Armed with these propositions, we can now prove Lemma 4.2.
Proof of Lemma 4.2. Fix $e \in E(T)$. We split $\ell := \sum_{x,y\in V(T),\, xy\in E(T)} w_{x,y}(e)$ as a sum $\ell = \sum_{i\in\mathbb{N}} \ell_i$ of 'layers' $\ell_i$, corresponding roughly to distance from $e$, and show that $\ell_i$ decays exponentially in $i$.
Let $P$ be the path from $e$ to $r$ in $T$ (excluding $e$), and let $L_0$ be the set of all edges of $P$ and all edges incident with $P$ (including $e$). Let $\ell_0 := \sum_{x,y\in V(T),\, xy\in L_0} w_{x,y}(e)$ be the total weight assigned to $e$ by pairs of adjacent vertices forming edges of $L_0$. Define $L_i$, $i \geq 1$, recursively as the set of edges incident with $L_{i-1}$ not contained in $\bigcup_{j<i} L_j$, and let $\ell_i := \sum_{x,y\in V(T),\, xy\in L_i} w_{x,y}(e)$. See Figure 2 for an illustration of the different sets $L_i$.

Claim
The following inequalities hold: $\ell_0 \leq 2d$, and $\ell_i \leq \ell_{i-1}/2$ for every $i \geq 1$.

Proof. Write $e = x_0x_1$ and let $x_1, \ldots, x_k$, where $x_k = r$, be the vertices of $P$ as they appear from $e$ to $r$.

Recall that $w_{x_{i+1},x_i}(e) = 0$ for every $i \geq 1$, as $e$ is separated from $x_{i+1}$ by removing $x_i$ and thus does not contribute to the sum in the formula for $H_{x_{i+1}}(x_i)$ from Lemma 4.1. In the other direction, we have $w_{x_i,x_{i+1}}(x_{j-1}x_j)/w_{x_i,x_{i+1}}(x_jx_{j+1}) < 1/2$ because this ratio coincides with the ratio of the corresponding transition probabilities by the definitions. Moreover, at each $x_j$, $1 \leq j < i$, the strategy $\sigma_{x_ix_{i+1}}$ makes the same choices as $\sigma_{x_jx_{j+1}}$, hence by Proposition 4.3 again $w_{x_i,x_{i+1}}(e) < (1/2)^i$, with the convention that $x_0x_1 = e$; this follows by multiplying these fractions for $j$ ranging from 1 to $i$. For each of the at most $d - 1$ other edges $x_iz \neq e$ of $L_0$ incident with $x_i$, where $i \geq 1$, we use the rather generous bound $w_{x_i,z}(e) < (1/2)^{i-1}$, which is true by similar arguments. Adding these contributions, we obtain $\ell_0 \leq 2d$.

We will bound the contribution of the edges incident with $yv$ to $\ell_i$ in terms of the contribution of $yv$ to $\ell_{i-1}$. For this, let $vw \in L_i$ with $yv \in L_{i-1}$. Note that $v$ separates $w$ from $r$, and so first, $w_{w,v}(e) = 0$. Second, this implies that $\sigma_{vw}$ avoids moving from $v$ to $y$ whenever possible by (5). Thus, Proposition 4.4 bounds $w_{v,w}(e)$ in terms of $w_{y,v}(e)$ for each edge $yv \in L_{i-1}$, and adding together, noting that at most one end vertex $v$ of $yv$ is incident with edges in $L_i$ by construction, we finally deduce $\ell_i \leq \ell_{i-1}/2$.

Combining both parts of the Claim proves our statement, as $\ell = \sum_i \ell_i \leq 2\ell_0 \leq 4d$.
It is now easy to complete the proof of Theorem 1.2.
Proof of Theorem 1.2. By (6), we have
$$\sum_{x,y\in V(T),\, xy\in E(T)} H^{\mathrm{two}}_x(y) \leq 2 \sum_{x,y\in V(T),\, xy\in E(T)} \sum_{e\in E(T)} w_{x,y}(e).$$
Changing the order of summation, and then applying Lemma 4.2 to each summand, we bound the right-hand side by $2 \sum_{e\in E(T)} 4d = 8d(n - 1) \leq 8dn$.

Infinite graphs and cover time of tori
In this section, we bound the cover time of the $d$-dimensional discrete torus $\mathbb{Z}^d_k$, which has $n = k^d$ vertices. Here, we think of the dimension $d$ as being fixed while the side length $k$ grows. In order to prove a linear bound on the cover time, we will instead consider the infinite lattice $\mathbb{Z}^d$ and infinite (but locally finite) graphs more generally.
For infinite graphs, it is meaningless to ask about the CRW cover time, but still interesting to ask about hitting times. The most fundamental question is whether these can be made finite, which corresponds to asking for positive recurrence.
Definition 5.1. A graph is positive choice recurrent (PCR) if there exists an unchanging strategy for the CRW such that the expected return time to any given vertex is finite. A graph is strongly PCR if for every p ∈ (0, 1) there exists an unchanging CRW strategy such that expected return times are finite for the process which, at every time step, takes a step of the CRW with that strategy with probability p and a step of the SRW otherwise.
A natural question is whether there is a strategy under which the walk becomes a transient Markov chain. The answer is always yes: fixing a root $r$ and giving each edge $uv$ weight $2^{\min(d(u,r),\, d(v,r))}$ produces a suitable weighting to apply Corollary 3.2. This weighted graph is transient because any infinite geodesic starting at the root has total resistance at most 2 (see e.g. [29, Thm. 2.3]), and taking other edges into account cannot increase the effective resistance to infinity.
While positive recurrence is the property which will be useful to us, we might also ask for the weaker property of choice recurrence, where we simply require return times to be almost surely finite. It is possible for a graph to be choice recurrent but not PCR; indeed, there are graphs which are recurrent under the SRW but not PCR.
Remark. Proposition 3.4 implies that any graph of maximum degree 3 is PCR. This is not true for higher degrees, since for the infinite 4-regular tree any strategy is more likely to move away from a given target vertex than towards it.
Note that $\mathbb{Z}^d = \mathbb{Z}^{d-1} \square \mathbb{Z}$, where $\square$ denotes the Cartesian product. We will need the following result about Cartesian products of PCR graphs.

Lemma 5.2. If $G$ is PCR, $H$ is strongly PCR and both $G, H$ are regular, then $G \square H$ is PCR.
Proof. Define the $p$-product of two time-homogeneous Markov chains $A, B$ to be the chain with state space $S(A) \times S(B)$ where at each time step with probability $p$ a transition of $B$ occurs, and otherwise a transition of $A$ occurs. If both chains are irreducible and positive recurrent, then so is the $p$-product (this follows easily from the existence of stationary distributions). Now we define a strategy for the CRW on $G \square H$ as follows. If at least one of the choices given is a move in the $H$ co-ordinate, we make such a move. Now the probability of exactly one of the options being a move in $H$ is $\frac{2rs}{(r+s)^2}$, where $G$ is $r$-regular and $H$ is $s$-regular, and the probability of both options being moves in $H$ is $\frac{s^2}{(r+s)^2}$. Thus, conditional on at least one option being in $H$, both are in $H$ with probability $\frac{s}{2r+s}$. There is a strategy on $H$, for this probability of having two choices, which reaches the root in finite expected time; whenever we move in the $H$ co-ordinate we use this strategy. If both choices are moves in $G$, then we follow the appropriate strategy for the random walk with two choices in $G$. The resulting Markov chain is the $\frac{2rs+s^2}{(r+s)^2}$-product of positive recurrent Markov chains on $G$ and $H$, hence positive recurrent.
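The elementary probability computations in this proof are easy to verify; a quick sketch (our illustration, with hypothetical degrees $r = 4$, $s = 2$):

```python
from fractions import Fraction

def product_step_probs(r, s):
    """On G □ H with G r-regular and H s-regular, each uniform neighbour of a
    vertex is a G-move with probability r/(r+s) and an H-move with probability
    s/(r+s). Returns (P[exactly one H-option], P[both H-options],
    P[both H-options | at least one H-option])."""
    n = r + s
    one_h = Fraction(2 * r * s, n * n)
    both_h = Fraction(s * s, n * n)
    return one_h, both_h, both_h / (one_h + both_h)

one_h, both_h, cond = product_step_probs(4, 2)
print(cond)  # s/(2r + s) = 2/10 = 1/5
```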
The same argument shows that if in addition $G$ is strongly PCR, then so is $G \square H$. Lemma 5.2 allows us to conclude that $\mathbb{Z}^d$ is PCR, and consequently to obtain bounds on its cover and hitting times.

Hitting and cover times in expanders
In this section, we prove bounds on the cover and hitting times of the CRW on a graph $G$ in terms of fundamental parameters. First, we introduce our notation. Let $G$ be a graph with $n$ vertices, and write $d_{\max}$, $d_{\min}$ and $d_{\mathrm{avg}}$ for the maximum, minimum and average degree of $G$, respectively. Let $t_{\mathrm{rel}}$ be the relaxation time of $G$, defined as $\frac{1}{1-\lambda_2}$, where $\lambda_2$ is the second largest eigenvalue of the transition matrix of the lazy random walk (LRW) on $G$ with loop probability 1/2. Recall that $t_{\mathrm{hit}}$ is the maximum over all pairs $u, v \in V$ of the expected time it takes the SRW to reach $u$ from $v$. Our first result bounds the CRW cover time.
Theorem 6.1 bounds $t^{\mathrm{two}}_{\mathrm{cov}}(G)$ for any connected $n$-vertex graph $G$ in terms of $t_{\mathrm{hit}}$, $t_{\mathrm{rel}}$ and the degree discrepancy of $G$. We also bound hitting times. First, we define the exponent $\gamma_d = \log_d\!\big(\frac{d^2}{2d-1}\big)$; note that $\gamma_d$ is increasing in $d$, $\gamma_d < 1$ and $1 - \gamma_d \sim 1/\log_2 d$. Also recall that for a set $S \subseteq V$ we let $\pi(S) = \sum_{s\in S} \pi(s)$ be the stationary probability of $S$.

Theorem 6.2. For any graph $G$, and any $x \in V$ and $S \subset V$, we have $H^{\mathrm{two}}_x(S) \leq 12 \cdot \pi(S)^{-\gamma_{d_{\max}}} \cdot t_{\mathrm{rel}} \cdot \ln n$; this bound also holds for return times.

We say that a sequence of graphs $(G_n)$ is a sequence of expanders if $t_{\mathrm{rel}}(G_n) = \mathcal{O}(1)$. Theorems 6.1 and 6.2 yield the following corollary:

Theorem 1.4. For every sequence $(G_n)_{n\in\mathbb{N}}$ of bounded degree $n$-vertex expanders, we have $t^{\mathrm{two}}_{\mathrm{cov}}(G_n) = \mathcal{O}(n \log \log n)$ and $t^{\mathrm{two}}_{\mathrm{hit}}(G_n) \leq n^{\alpha}$ for some fixed $\alpha < 1$.

These are significantly less than the corresponding cover and hitting times of the SRW, which are $\Theta(n \log n)$ and $\Theta(n)$, respectively [2, Thm. 10.1]. Theorems 6.1 and 6.2 will follow from Theorem 6.3 below. For a given graph $G$, we consider possible trajectories of a (non-lazy) walker, that is, finite sequences of vertices in which any two consecutive vertices are adjacent; the length of a trajectory will be the number of steps taken. In the following, we use bold characters to denote trajectories in $G$, and if $u \in V(G)$, then $\mathbf{u}$ will denote the length-0 trajectory from $u$. Fix a non-negative integer $t$ and a set $\mathcal{S}$ of trajectories of length $t$.
Let $p_{\mathbf{x},S}$ denote the probability that extending a trajectory $\mathbf{x}$ to length $t$ according to the law of a SRW results in a member of $S$. Let $q_{\mathbf{x},S}$ denote the corresponding probability under the CRW law; this probability will depend on the particular strategy used. This function can encode the probabilities of many events of interest, such as 'the graph is covered by time $t$', 'the walk is in a set $X$ at time $t$' or 'the walk has hit a vertex $x$ by time $t$'. However, let us emphasise that our result in fact applies to any possible event.

Theorem 6.3. Let $G$ be a graph, $u \in V$, $t > 0$ and $S$ be a set of trajectories of length $t$ from $u$. Then there exists a strategy for the CRW such that $q_{\mathbf{u},S} \ge p_{\mathbf{u},S}^{\gamma_{d_{\max}}}$.
We also give an analogue of Theorem 6.3 for bad events. This analogue, unlike Theorem 6.3, gives an exponent which does not depend on the maximum degree $d_{\max}$ of $G$, and so a significant reduction is possible even if $d_{\max}$ is large.

Theorem 6.4. Let $G$ be a graph, $u \in V$, $t > 0$, and $S$ be a set of trajectories of length $t$ from $u$. Then there exists a strategy for the CRW such that $q_{\mathbf{u},S} \le p_{\mathbf{u},S}^2$.
Remark. The exponent 2 in Theorem 6.4 is best possible, since we have equality whenever $t = 1$, and therefore also when $t > 1$ but every trajectory of the SRW of length $t-1$ has the same probability of reaching $S$. Similarly, the exponent given in Theorem 6.3 is best possible, as evidenced by the case where this probability is $1/d_{\max}$ for every trajectory of length $t-1$.
After stating two technical lemmas in Section 6.2, we explain an alternative way of viewing the CRW in Section 6.3, which enables us to complete the proofs of Theorems 6.3 and 6.4. To motivate the importance of Theorem 6.3, we begin by showing how it implies our main results on cover and hitting times.

Deducing Theorems 6.1 and 6.2 from Theorem 6.3
In order to prove our main bounds from the key tool, Theorem 6.3, we must first overcome the obstacle that Theorem 6.3 is expressed in terms of the SRW probabilities, whereas our bounds involve the relaxation time, which is defined in terms of the LRW. The reason for using the LRW to define relaxation time is to ensure that the associated Markov chain is aperiodic. Our next lemma resolves this issue by relating the relaxation time to SRW probabilities.
Write $p^{(t)}_{x,\cdot}$ and $\tilde p^{(t)}_{x,\cdot}$ for the distributions of the SRW and LRW, respectively, after $t$ steps started at $x$, and write $\pi(S)$ for the stationary probability of a set $S$ (note that the two walks have the same stationary distribution).

Lemma 6.5. For any graph $G$, $S \subset V$ and $x \in V$, there exists $t \le 4 t_{\mathrm{rel}} \ln n$ such that $p^{(t)}_{x,S} \ge \pi(S)/3$.

Proof.
If $G$ is bipartite, then we may find a subset $\tilde S \subseteq S$ which lies entirely within one part and satisfies $\pi(\tilde S) \ge \pi(S)/2$. Otherwise, the SRW is aperiodic and we set $\tilde S = S$. We now consider the multigraph $\bar G$ formed from $G$ by contracting $\tilde S$ to a single vertex $\tilde s$, retaining all edges (with edges inside $\tilde S$ becoming loops at $\tilde s$). Retaining edges ensures that the stationary probability of $\tilde s$ in $\bar G$ is precisely $\pi(\tilde S)$. Let $\bar\lambda_2$ be the second largest eigenvalue of the LRW on $\bar G$. Then for any $x \notin \tilde S$ and $t \ge 0$, by [28, (12.11)], we have $\big|\tilde p^{(t)}_{x,\tilde s} - \pi(\tilde S)\big| \le \sqrt{\pi(\tilde S)/\pi(x)} \cdot e^{-t(1-\bar\lambda_2)}$. It follows that if we run the LRW on $\bar G$ for $T = \log\big(3/\sqrt{\pi(\tilde S)\pi(x)}\big)/(1-\bar\lambda_2)$ steps, then $\tilde p^{(T)}_{x,\tilde s} \ge 2\pi(\tilde S)/3$. Now, we can express the distribution of the LRW as $\tilde p^{(T)}_{x,\tilde s} = \mathbb{E}\big[p^{(X_T)}_{x,\tilde s}\big]$, where the random variable $X_T \sim \mathrm{Bin}(T, 1/2)$ is the number of non-lazy steps taken by the LRW in time $T$. Thus, there is some $t \le T$ with $p^{(t)}_{x,\tilde S} \ge 2\pi(\tilde S)/3 \ge \pi(S)/3$. We can assume $n \ge 2$, or else the result holds trivially, so $\log\big(3/\sqrt{\pi(\tilde S)\pi(x)}\big) \le \log 3 + 2\log n \le 4\log n$. Finally, [2, Cor. 3.27] gives that $\bar\lambda_2 \le \lambda_2$, so $T \le 4 t_{\mathrm{rel}} \ln n$.
Our strategy to bound the cover time will be to emulate the SRW until most of the vertices are covered, only using the additional strength of the CRW when there are few uncovered vertices remaining. We will need a simple lemma to bound how long the first stage takes.

Lemma 6.6. Let $U(t)$ be the number of vertices not yet visited at time $t$ by a SRW on a graph, and let $T_{n/2^x}$ be the number of SRW steps taken before $U \le n/2^x$. Then $\mathbb{E}\big[U(2(x+1)t_{\mathrm{hit}})\big] \le n/2^{x+1}$ and $\mathbb{E}\big[T_{n/2^x}\big] \le 4(x+1)\, t_{\mathrm{hit}}$.

Proof.
Let $v \in V$. Then by Markov's inequality, $\mathbb{P}_w\big[X_t \ne v\ \forall\, 0 \le t \le 2t_{\mathrm{hit}}\big] \le 1/2$ for any $w \in V$. Thus, the probability that $v$ is not visited by time $2x \cdot t_{\mathrm{hit}}$ is at most $2^{-x}$ by submultiplicativity, and so the expected number of unvisited vertices at time $2x \cdot t_{\mathrm{hit}}$ is at most $n \cdot 2^{-x}$. By the above, $\mathbb{E}\big[U(2(x+1)t_{\mathrm{hit}})\big] \le n/(2 \cdot 2^x)$, and so $\mathbb{P}\big[U(2(x+1)t_{\mathrm{hit}}) \ge n/2^x\big] \le 1/2$ by Markov's inequality. Considering sections of length $2(x+1)t_{\mathrm{hit}}$ separately, and continuing until one section covers the required number of vertices, we use in expectation at most two such sections; thus $\mathbb{E}\big[T_{n/2^x}\big] \le 4(x+1)t_{\mathrm{hit}}$.
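The halving behaviour in Lemma 6.6 is easy to observe empirically. The following seeded simulation (our illustration; the cycle and the parameters are arbitrary choices, not from the paper) runs a SRW on the 20-cycle, where $t_{\mathrm{hit}} = 100$, and checks that the average number of unvisited vertices after $2x \cdot t_{\mathrm{hit}}$ steps is within the lemma's bound:

```python
import random

random.seed(1)

def unvisited_after(n, steps):
    """Run a SRW on the n-cycle from vertex 0 and count unvisited vertices."""
    pos, seen = 0, {0}
    for _ in range(steps):
        pos = (pos + random.choice((-1, 1))) % n
        seen.add(pos)
    return n - len(seen)

n, t_hit = 20, 100   # t_hit = (n/2)^2 = 100 on the 20-cycle
x, trials = 3, 200
mean = sum(unvisited_after(n, 2 * x * t_hit) for _ in range(trials)) / trials
print(mean)          # empirically well below the bound n / 2**x = 2.5
```

In practice the average is far below $n/2^x$; the lemma only needs the crude submultiplicative estimate.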
We now have what we need to prove the cover and hitting time bounds.
Proof of Theorem 6.1. For convenience, we write $\gamma = \gamma_{d_{\max}}$. We first emulate the SRW (i.e. set $\alpha^z_{x,y} = 1/2$ for all $x, y, z \in V(G)$ with $y, z \in \Gamma(x)$) until all but $m = n/\log^C n$ vertices have been visited, for some $C$ to be specified later. Let $\tau_1$ be the expected time to complete this phase. Then, by Lemma 6.6, we have $\tau_1 \le 4 t_{\mathrm{hit}} \cdot (C \log_2 \log n + 1)$.
We cover the remaining vertices in $m$ phases, labelled $m, m-1, \dots, 1$, each of which reduces the number of uncovered vertices by 1. In phase $i$, a set of $i$ vertices is still uncovered, and we write $S_i$ for this set. By Lemma 6.5, for any vertex $x$ there is some $t \le 4 t_{\mathrm{rel}} \log n$ such that $p^{(t)}_{x,S_i} \ge \pi(S_i)/3 \ge d_{\min}\, i/(3 n d_{\mathrm{avg}})$, and thus $q^{(t)}_{x,S_i} \ge \big(d_{\min}\, i/(3 n d_{\mathrm{avg}})\big)^{\gamma}$ by Theorem 6.3. Since we can achieve this probability of hitting a vertex of $S_i$ within the next $4 t_{\mathrm{rel}} \log n$ steps from any starting point, the expected number of attempts needed is at most $\big(d_{\min}\, i/(3 n d_{\mathrm{avg}})\big)^{-\gamma}$, meaning that the expected time required to complete phase $i$ is at most $4 t_{\mathrm{rel}} \log n \cdot \big(3 n d_{\mathrm{avg}}/(d_{\min}\, i)\big)^{\gamma}$. Hence, the expected time $\tau_2$ to complete all $m$ phases satisfies $\tau_2 \le 4 t_{\mathrm{rel}} \log n \cdot \big(3 n d_{\mathrm{avg}}/d_{\min}\big)^{\gamma} \sum_{i=1}^{m} i^{-\gamma}$. Then, since $\sum_{i=1}^{m} i^{-\gamma} \le m^{1-\gamma}/(1-\gamma)$, we obtain $\tau_2 \le 4 t_{\mathrm{rel}} \log n \cdot \big(3 n d_{\mathrm{avg}}/d_{\min}\big)^{\gamma} \cdot m^{1-\gamma}/(1-\gamma)$. For the first bound, we choose $C = \log\big((d_{\mathrm{avg}}/d_{\min}) \cdot t_{\mathrm{rel}} \cdot \log^2 n\big)\big/\big((1-\gamma) \cdot \log\log n\big)$; then, since $\log^{C(1-\gamma)} n = (d_{\mathrm{avg}}/d_{\min})\, t_{\mathrm{rel}} \log^2 n$ and $\gamma < 1$, this gives $\tau_2 = O(n)$ by the bound above. Since in any graph $t_{\mathrm{hit}} = \Omega(n)$, the total time is therefore $O(\tau_1)$, and for this value of $C$ we have $\tau_1 = O\big(t_{\mathrm{hit}} \cdot \log\big((d_{\mathrm{avg}}/d_{\min})\, t_{\mathrm{rel}} \log^2 n\big)/(1-\gamma)\big)$, as required.

Proof of Theorem 6.2. Write $T = 4 t_{\mathrm{rel}} \ln n$. For any $x \in V$ and $S \subset V$, Lemma 6.5 gives a $t \le T$ such that $p^{(t)}_{x,S} \ge \pi(S)/3$, and Theorem 6.3 consequently gives a strategy for the CRW such that $q^{(t)}_{x,S} \ge (\pi(S)/3)^{\gamma}$. Thus, for any target set $S$ and start vertex $x$, we need in expectation at most $(3/\pi(S))^{\gamma}$ attempts to hit $S$, each of at most $T$ steps, since if an attempt fails, ending at some vertex $z$, we have the same bound on the probability of hitting $S$ from $z$. Therefore, there is a strategy for the CRW with hitting time $H^{\mathrm{two}}_x(S) \le 12 \cdot \pi(S)^{-\gamma} \cdot t_{\mathrm{rel}} \ln n$. The second result follows since for any vertex $v$ we have $\pi(v) \ge d_{\min}/(n\, d_{\mathrm{avg}})$.

The max choice and min choice operations
In this section, we introduce two operators which represent the effect of making optimal choices for a single step of the random walk, assuming that the effects of choice on future steps are already known, and prove inequalities relating them to power means. Define the max choice operator $MC_2 : [0,\infty)^m \to [0,\infty)$ by $MC_2(x_1, \dots, x_m) = \frac{1}{m^2}\sum_{i,j=1}^{m} \max\{x_i, x_j\}$. For $p \in \mathbb{R} \setminus \{0\}$, the $p$-power mean $M_p$ of non-negative reals $x_1, \dots, x_m$ is defined by $M_p(x_1, \dots, x_m) = \big(\frac{1}{m}\sum_{i=1}^{m} x_i^p\big)^{1/p}$. We use a key lemma which could be described as a multivariate anti-convexity inequality.

Lemma 6.7. For any $d \ge m \ge 1$ and non-negative reals $x_1, \dots, x_m$, we have $MC_2(x_1, \dots, x_m) \ge M_{1/\gamma_d}(x_1, \dots, x_m)$.
Proof. By the power-mean inequality, since $\gamma_m^{-1} \ge \gamma_d^{-1}$, it is sufficient to prove the case $m = d$. We show this by induction on $d$; we have equality for $d = 1$. Suppose that either $d = 2$, or $d \ge 3$ and the result holds for $d-1$. Without loss of generality, using symmetry and homogeneity of both operators, we may assume that $\max\{x_1, \dots, x_d\} = x_d = 1$.
We first claim that we may further assume $x_1 = \cdots = x_{d-1}$. If $d = 2$, this claim is trivial. If $d \ge 3$, set $\bar x = M_{1/\gamma_{d-1}}(x_1, \dots, x_{d-1})$. Then $MC_2(x_1, \dots, x_{d-1}, 1) - M_{1/\gamma_d}(x_1, \dots, x_{d-1}, 1) \ge MC_2(\bar x, \dots, \bar x, 1) - M_{1/\gamma_d}(\bar x, \dots, \bar x, 1)$,
where the first inequality uses the induction hypothesis for $d-1$ and the second uses the power-mean inequality. Thus, replacing $x_1, \dots, x_{d-1}$ by $\bar x, \dots, \bar x$ does not increase the difference between the two operators, proving the claim. Next, we claim that the function $x \mapsto M_{1/\gamma_d}(x, \dots, x, 1)$ is convex on $[0,1]$. Since $MC_2(x, \dots, x, 1)$ is linear, and the two functions agree at 0 (by choice of $\gamma_d$) and at 1, this will complete the proof.

Lemma 6.7 will be used to prove Theorem 6.3. In order to prove Theorem 6.4, we will need a corresponding inequality for an appropriate operator. To that end, we define the min choice operator $mC_2 : [0,\infty)^m \to [0,\infty)$ by $mC_2(x_1, \dots, x_m) = \frac{1}{m^2}\sum_{i,j=1}^{m} \min\{x_i, x_j\}$.

Lemma 6.8. For any non-negative reals $x_1, \dots, x_m$, we have $mC_2(x_1, \dots, x_m) \le M_{1/2}(x_1, \dots, x_m)$.

Proof. Observe that $\min\{x_i, x_j\} \le \sqrt{x_i x_j}$, and hence $mC_2(x_1, \dots, x_m) \le \frac{1}{m^2}\sum_{i,j} \sqrt{x_i}\sqrt{x_j} = \big(\frac{1}{m}\sum_{i} \sqrt{x_i}\big)^2 = M_{1/2}(x_1, \dots, x_m)$.
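Both operators and the power means are one-liners, and the two inequalities of this section, $MC_2(x_1,\dots,x_m) \ge M_{1/\gamma_d}(x_1,\dots,x_m)$ for $m \le d$ and $mC_2 \le M_{1/2}$, can be spot-checked numerically. The sketch below is our illustration only:

```python
import math
import random

def power_mean(xs, p):
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)

def max_choice(xs):
    # MC_2: average of max over ordered pairs (including repeats)
    m = len(xs)
    return sum(max(a, b) for a in xs for b in xs) / m ** 2

def min_choice(xs):
    m = len(xs)
    return sum(min(a, b) for a in xs for b in xs) / m ** 2

def gamma(d):
    return math.log(d * d / (2 * d - 1), d)

random.seed(0)
for _ in range(1000):
    d = random.randint(2, 6)
    xs = [random.random() for _ in range(d)]
    assert max_choice(xs) >= power_mean(xs, 1 / gamma(d)) - 1e-12
    assert min_choice(xs) <= power_mean(xs, 1 / 2) + 1e-12
print("both inequalities verified on random inputs")
```

Such a check is of course no substitute for the proofs above, but it makes the role of the exponent $1/\gamma_d$ tangible.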

The tree gadget for graphs
In this section, we prove Theorem 6.3. To achieve this, we introduce the tree gadget which encodes trajectories of length at most t from u in a rooted graph (G, u) by vertices of an arborescence (T t , r), that is, a tree with all edges oriented away from the root r. Given (G, u), we represent each trajectory of length i t started from u in G as a node at distance i from the root r in the tree T t . The root r represents the trajectory of length 0 from u. There is an edge from x to y in T t if x is obtained from y by deleting the final vertex. See Figure 3 for an illustration of the tree gadget.
Also, for $\mathbf{x} \in V(T_t)$, let $\Gamma^+(\mathbf{x}) = \{\mathbf{y} \in V(T_t) : \mathbf{xy} \in E(T_t)\}$ be the offspring of $\mathbf{x}$ in $T_t$; as usual, we write $d^+(\mathbf{x})$ for the number of offspring. Write $|\mathbf{x}|$ for the length of the trajectory $\mathbf{x}$.

Proof of Theorem 6.3. For ease of notation, we write $\eta = 1/\gamma_{d_{\max}}$. To each node $\mathbf{x}$ of the tree gadget $T_t$, we assign the value $q_{\mathbf{x},S}$ under the CRW strategy of preferring the choice which extends to a trajectory $\mathbf{y} \in \Gamma^+(\mathbf{x})$ giving a higher value of $q_{\mathbf{y},S}$. This is well defined because both the strategy and the values $q_{\mathbf{x},S}$ can be computed in a 'bottom up' fashion starting at the leaves, where if $\mathbf{x} \in V(T_t)$ is a leaf then $q_{\mathbf{x},S}$ is 1 if $\mathbf{x} \in S$ and 0 otherwise. Suppose $\mathbf{x}$ is not a leaf. The controller is presented with two uniformly random offspring $\mathbf{y}, \mathbf{z} \in \Gamma^+(\mathbf{x})$ and chooses $\mathbf{y}$ if $q_{\mathbf{y},S} \ge q_{\mathbf{z},S}$, and $\mathbf{z}$ otherwise. Thus, we have $q_{\mathbf{x},S} = \frac{1}{d^+(\mathbf{x})^2}\sum_{\mathbf{y},\mathbf{z} \in \Gamma^+(\mathbf{x})} \max\{q_{\mathbf{y},S}, q_{\mathbf{z},S}\} = MC_2\big((q_{\mathbf{y},S})_{\mathbf{y} \in \Gamma^+(\mathbf{x})}\big)$.
We define the following potential function $\Phi^{(i)}$ on the $i$th generation of the tree gadget $T_t$: $\Phi^{(i)} = \sum_{|\mathbf{x}| = i} p_{\mathbf{x}}\, q_{\mathbf{x},S}^{\eta}$, where the sum ranges over all trajectories $\mathbf{x}$ of length $i$ and $p_{\mathbf{x}}$ is the probability that the SRW follows $\mathbf{x}$. Notice that if $\mathbf{xy} \in E(T_t)$ then $p_{\mathbf{y}} = p_{\mathbf{x}}/d^+(\mathbf{x})$. Also, since each $\mathbf{y}$ with $|\mathbf{y}| = i$ has exactly one parent $\mathbf{x}$ with $|\mathbf{x}| = i-1$, we can write $\Phi^{(i)} = \sum_{|\mathbf{x}| = i-1} \sum_{\mathbf{y} \in \Gamma^+(\mathbf{x})} \frac{p_{\mathbf{x}}}{d^+(\mathbf{x})}\, q_{\mathbf{y},S}^{\eta}$. We now show that $\Phi^{(i)}$ is non-increasing in $i$. By combining the two expressions above, the difference $\Phi^{(i-1)} - \Phi^{(i)}$ is given by $\sum_{|\mathbf{x}| = i-1} p_{\mathbf{x}} \big( q_{\mathbf{x},S}^{\eta} - \frac{1}{d^+(\mathbf{x})} \sum_{\mathbf{y} \in \Gamma^+(\mathbf{x})} q_{\mathbf{y},S}^{\eta} \big)$. Recalling that $q_{\mathbf{x},S} = MC_2\big((q_{\mathbf{y},S})_{\mathbf{y} \in \Gamma^+(\mathbf{x})}\big)$, to establish $\Phi^{(i-1)} - \Phi^{(i)} \ge 0$ it is sufficient to show that the inequality $q_{\mathbf{x},S}^{\eta} \ge \frac{1}{d^+(\mathbf{x})} \sum_{\mathbf{y} \in \Gamma^+(\mathbf{x})} q_{\mathbf{y},S}^{\eta}$ holds whenever $\mathbf{x}$ is not a leaf. Raising both sides to the power $1/\eta = \gamma_{d_{\max}}$, since $d^+(\mathbf{x}) \le d_{\max}$ this inequality holds by Lemma 6.7, and thus $\Phi^{(i)}$ is non-increasing in $i$.
Observe that $\Phi^{(0)} = q_{\mathbf{u},S}^{\eta}$. Also, if $|\mathbf{x}| = t$ then $q_{\mathbf{x},S} = 1$ if $\mathbf{x} \in S$ and 0 otherwise. It follows that $\Phi^{(t)} = \sum_{\mathbf{x} \in S} p_{\mathbf{x}} = p_{\mathbf{u},S}$. Thus, since $\Phi^{(i)}$ is non-increasing, $q_{\mathbf{u},S}^{\eta} = \Phi^{(0)} \ge \Phi^{(t)} = p_{\mathbf{u},S}$, as required.
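For events of the form 'hit a target within $t$ steps', the value $q_{\mathbf{x},S}$ depends only on the final vertex of $\mathbf{x}$ and the remaining time, so the bottom-up computation on the tree gadget collapses to a small dynamic program. The following sketch (our illustration; the path graph and parameters are arbitrary) computes $p$ and $q$ and checks the guarantee of Theorem 6.3:

```python
import math
from functools import lru_cache

# Path 0-1-2-3-4 (d_max = 2); event: 'hit vertex 4 within t steps'.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
target, t = 4, 8

@lru_cache(maxsize=None)
def p(v, s):
    """SRW probability of hitting `target` within s steps from v."""
    if v == target:
        return 1.0
    if s == 0:
        return 0.0
    return sum(p(y, s - 1) for y in adj[v]) / len(adj[v])

@lru_cache(maxsize=None)
def q(v, s):
    """CRW probability under the greedy 'prefer the better offer' strategy."""
    if v == target:
        return 1.0
    if s == 0:
        return 0.0
    vals = [q(y, s - 1) for y in adj[v]]
    m = len(vals)
    return sum(max(a, b) for a in vals for b in vals) / m ** 2

d_max = max(len(ns) for ns in adj.values())
gamma = math.log(d_max ** 2 / (2 * d_max - 1), d_max)
print(p(0, t), q(0, t))  # choice boosts 0.359375 to about 0.831
assert q(0, t) >= p(0, t) ** gamma
```

Here $\gamma_{d_{\max}} = \gamma_2 = \log_2(4/3)$, and the single inequality checked at the end is exactly the conclusion $q_{\mathbf{u},S} \ge p_{\mathbf{u},S}^{\gamma_{d_{\max}}}$.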
Theorem 6.4 now follows similarly to Theorem 6.3.
Proof of Theorem 6.4. Construct the tree gadget to height $t$. We associate each node $\mathbf{x}$ with the probability $q_{\mathbf{x},S}$ under a strategy which always prefers the smaller value. For a leaf this is simply the indicator $\mathbb{1}_{\{\mathbf{x} \in S\}}$, whereas for an internal vertex it is given by $mC_2\big((q_{\mathbf{y},S})_{\mathbf{y} \in \Gamma^+(\mathbf{x})}\big)$. We define a potential function by $\Phi^{(i)} = \sum_{|\mathbf{x}| = i} p_{\mathbf{x}} \sqrt{q_{\mathbf{x},S}}$. As before, $\Phi^{(0)} = \sqrt{q_{\mathbf{u},S}}$ and $\Phi^{(t)} = p_{\mathbf{u},S}$. Further, for each internal vertex $\mathbf{x}$ we have, using Lemma 6.8, $\sqrt{q_{\mathbf{x},S}} \le \frac{1}{d^+(\mathbf{x})} \sum_{\mathbf{y} \in \Gamma^+(\mathbf{x})} \sqrt{q_{\mathbf{y},S}}$. Summing over all $\mathbf{x}$ at level $i$, we obtain $\Phi^{(i)} \le \Phi^{(i+1)}$ for each $i < t$, and consequently $\sqrt{q_{\mathbf{u},S}} = \Phi^{(0)} \le \Phi^{(t)} = p_{\mathbf{u},S}$, as required.

Random graphs
We now consider CRW hitting and cover times in the Erdős–Rényi random graph $G(n, p)$. This is the probability distribution over all $n$-vertex simple graphs generated by sampling each possible edge independently with probability $p$; see [12] for more details.

Theorem 1.5. Let $G \sim G(n, p)$, where $np \ge c \ln n$ for some fixed $c > 1$ and $\log np = o(\log n)$. Then w.h.p. $G$ satisfies $t_{\mathrm{rel}} = O(1)$, $t_{\mathrm{hit}} = O(n)$ and $d_{\min}, d_{\max} = \Theta(np)$, and hence the bounds of Theorems 6.1 and 6.2 apply with these parameters.
Proof. To begin, we show that the graph is almost regular w.h.p.
Claim. For $p$ as above, $d_{\min}, d_{\max} = \Theta(np)$ w.h.p.

Proof of claim.
In $G \sim G(n, p)$, since each edge is present independently with probability $p$, each degree $d(u)$ is distributed as a binomial random variable $\mathrm{Bin}(n-1, p)$. The Chernoff bound [14, Thm. 3.2] states that for any $\lambda > 0$, $\mathbb{P}\big[\mathrm{Bin}(n, p) \ge np + \lambda\big] \le \exp\big(-\frac{\lambda^2}{2(np + \lambda/3)}\big)$. Thus, by a union bound over all vertices, $d_{\max} \le 5np$ w.h.p. For $d_{\min}$, note that the expected number of vertices of degree $k$ is given by $x_k = n\binom{n-1}{k}p^k(1-p)^{n-1-k}$. We shall consider $k = \kappa np$ for $\kappa \le 1/2$; in this case $x_k/x_{k-1} = \frac{(n-k)p}{k(1-p)} \ge 2$, and so the expected number of vertices with degree at most $k$ is $O(x_k)$.
Observe that $x_k \to 0$ for a sufficiently small constant $\kappa$ (using that $np \ge c \ln n$ with $c > 1$), so $d_{\min} = \Omega(np)$ w.h.p.

Cooper & Frieze [16] show that for $np = c \ln n$, $c > 1$, w.h.p. the conductance of $G(n, p)$ is at least $1/6$, implying that $t_{\mathrm{rel}} = O(1)$ [28, Thm. 13.14]. For larger values of $np$, Coja-Oghlan [15, Thm. 1.2] showed that there exists some $c' < \infty$ such that for $np \ge c' \log n$ the spectral gap of the normalised Laplacian of $G(n, p)$ is $\Omega(1)$. Since the normalised Laplacian $\mathcal{L}$ is similar to the random walk Laplacian $L$, and the latter is given by $L = I - P$, we see that also in this range $t_{\mathrm{rel}} = O(1)$. We have shown that, in this regime, $G(n, p)$ is almost regular and has constant relaxation time w.h.p.; thus $t_{\mathrm{hit}} = O(n)$ w.h.p. by [13, Thm. 5.2]. Theorems 6.1 and 6.2 now yield the results.
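The Chernoff estimate used for $d_{\max}$ can be compared against the exact binomial tail; the snippet below (our illustration, with arbitrary parameters in the regime $np \ge c \ln n$, $c > 1$) confirms that the expected number of vertices of degree at least $5np$ is negligible:

```python
import math

n = 1000
p = 2 * math.log(n) / n   # np = 2 ln n, i.e. c = 2

def binom_tail(m, prob, k0):
    """Exact P[Bin(m, prob) >= k0]."""
    return sum(math.comb(m, k) * prob ** k * (1 - prob) ** (m - k)
               for k in range(k0, m + 1))

# Union bound: expected number of vertices of degree >= 5np.
tail = binom_tail(n - 1, p, math.ceil(5 * n * p))
print(n * tail)   # many orders of magnitude below 1
```

The same exact computation on the lower tail illustrates why $c > 1$ is needed: the union bound over $n$ vertices only just beats the polynomial tail when $np$ is barely above $\ln n$.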
Thus, the CRW gives a significant improvement in the cover and hitting times whenever the degrees of $G(n, p)$ are subpolynomial in $n$.

Computing optimal choice strategies
In this section, we focus on the following problem: given a graph $G$ and an objective, how can we compute a series of choices for the walk which achieves the given objective in optimal expected time? In particular, we consider the following computational problems related to our main objectives of maximising or minimising hitting times, cover times and stationary probabilities $\pi_v$.
Stat(G, w): Find a CRW strategy minimising/maximising $\sum_{v \in V} w_v \pi_v$ for vertex weights $w_v \ge 0$.
Hit(G, v, S): Find a CRW strategy minimising $H^{\mathrm{two}}_v(S)$ for a given $S \subseteq V(G)$ and $v \in V(G)$.
Cov(G, v): Find a CRW strategy minimising $C^{\mathrm{two}}_v(G)$ for a given $v \in V(G)$.
The analogous problems to Stat(G, w) and Hit(G, v, S) were studied in [4] for the BRW. While Stat is not one of our primary objectives, we include it here both as a natural problem to consider and because of its relationship to Hit in the case where $w$ is the indicator function of a set $S$; we shall abuse notation by writing Stat(G, S) for this case. Clearly, for Stat we must restrict ourselves to unchanging strategies for the stationary probabilities $\pi_v$ to be well defined; we shall show that Hit also has an unchanging optimal strategy. For Hit and Cov, there are two possible interpretations of what it means to 'find' a CRW strategy. Perhaps the most natural is to compute a sequence of optimal choices in an online fashion, that is, at each time step to compute which of the two offered choices to accept. For any particular walk, with suitable memoisation, at most a polynomial number of such computations will be required for either problem: which choice to accept depends only on the current vertex, the two choices, and in the case of Cov the vacant set, which can change at most $n$ times. We might alternatively want to compute a complete optimal strategy in advance; for Hit this requires only a polynomial number of single-choice computations, but for Cov the number of possible situations our strategy must cover is exponential. However, we shall show that Cov is hard even for individual choices.

A polynomial-time algorithm for Stat and Hit
First, we show how the (unknown) optimal values $H^{\mathrm{two}}_x(v)$ determine an optimal strategy for Hit(G, ·, v). In the following two lemmas, we will need to work with a multigraph $F$; in this context, the choice offered at each stage is between two random edges from the current vertex.
Lemma 7.2. Let $F$ be a multigraph and $v \in V(F)$, and order the vertices of $F$ as $v_1, v_2, \dots$ so that $H^{\mathrm{two}}_{v_i}(v)$ is non-decreasing in $i$. Let $\beta$ be the deterministic unchanging strategy given by $\beta^{v_k}_{v_i, v_j} = 1$ whenever $j < k$; that is, of the two offered edges, always take the one leading to the vertex of smaller index. Then $\beta$ is optimal (among all strategies) for Hit(F, x, v) for every $x \ne v$, and also for the problem of minimising $\mathbb{E}_v[\tau^+_v]$, the expected return time to $v$.

Proof. Fix an optimal strategy $\alpha$ for Hit(F, x, v), and for each $y \in \Gamma(x)$ write $q_y$ for the probability that the first step under this strategy is from $x$ to $y$. Recall that $q_y = \sum_{z \in \Gamma(x)} 2\alpha^z_{x,y}/d(x)^2$. Now, given that the first step is at $y$, an optimal strategy for the remaining steps is precisely an optimal strategy for Hit(F, y, v), and thus $H^{\mathrm{two}}_x(v) = 1 + \sum_{y \in \Gamma(x)} q_y H^{\mathrm{two}}_y(v)$. Suppose there exist $y, z \in \Gamma(x)$ with $H^{\mathrm{two}}_y(v) < H^{\mathrm{two}}_z(v)$ but $\alpha^z_{x,y} < 1$ at the first step. By instead (at time 1 only) always choosing $y$ in preference to $z$, the expected hitting time is decreased, contradicting optimality. If instead $H^{\mathrm{two}}_y(v) = H^{\mathrm{two}}_z(v)$, then the expected hitting time does not depend on $\alpha^z_{x,y}$, and so any strategy satisfying these conditions at time 1, and thereafter following an optimal strategy, is itself optimal.
It follows by induction that following β for k turns and thereafter following α is optimal; since this gives arbitrarily good approximations of the expected hitting time under β, β is itself optimal for Hit (F, x, v), and, since the definition of β does not depend on x, for Hit(F, y, v) for any y = v.
Next, we show that $\beta$ is also an optimal strategy for minimising $\mathbb{E}_v[\tau^+_v]$. Suppose not, and let $\gamma$ be an optimal strategy. Write $q^{\gamma}_x$ for the probability of moving from $v$ to $x$ at time 1 under $\gamma$, so that $\mathbb{E}_v[\tau^+_v] = 1 + \sum_x q^{\gamma}_x H^{\mathrm{two}}_x(v)$. Exchanging the first-step behaviour of $\gamma$ for that of $\beta$ changes this quantity by an amount which is non-positive by choice of $\beta$. Thus, after a sequence of such changes, we obtain that $\beta$ itself is optimal for minimising $\mathbb{E}_v[\tau^+_v]$.
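The unchanging strategy of Lemma 7.2 is easy to compute in practice by value iteration on the optimality equations $H(x) = 1 + \frac{1}{d(x)^2}\sum_{y,z \in \Gamma(x)} \min\{H(y), H(z)\}$. The following sketch (ours; the 6-cycle is just an example) does this and compares against the SRW hitting times $k(n-k)$ on the cycle:

```python
# CRW hitting times of vertex 0 on the 6-cycle by value iteration:
# from each vertex the controller accepts the offered neighbour with
# the smaller current estimate, exactly as in the strategy beta.
n = 6
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
H = {v: 0.0 for v in adj}
for _ in range(10000):
    H = {v: 0.0 if v == 0 else
            1 + sum(min(H[y], H[z]) for y in adj[v] for z in adj[v])
            / len(adj[v]) ** 2
         for v in adj}
srw = {k: k * (n - k) for k in range(n)}        # SRW hitting times on a cycle
print(H, srw)
assert all(H[v] <= srw[v] + 1e-9 for v in adj)  # choice never hurts
assert H[3] < srw[3]                            # and strictly helps here
```

On this instance the iteration converges to $H(1) = 17/9$, $H(2) = 32/9$ and $H(3) = 41/9$, against the SRW values 5, 8 and 9.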

Lemma 7.3. For any simple graph $G$ of order $n$ and every pair of vertices $x, y$ with $H^{\mathrm{two}}_x(S) \ne H^{\mathrm{two}}_y(S)$ under a given non-random unchanging strategy, we have $|H^{\mathrm{two}}_x(S) - H^{\mathrm{two}}_y(S)| \ge (2D)^{-n}$, where $D := \mathrm{LCM}\big((d(x)^2)_{x \in V}\big)$.
Proof. Note that the hitting times $(h_x)_{x \in V}$ of $S$ for any given unchanging strategy are uniquely determined by the equations $h_x = 1 + \sum_y P_{x,y} h_y$ for $x \notin S$ and $h_x = 0$ for $x \in S$, where $P$ is the transition matrix for the strategy. This set of equations can be written as $Ah = b$, where $A := I - Q$, $Q_{i,j} = P_{i,j}$ if $i \notin S$ and 0 otherwise, and $b$ is a 0-1 vector. Notice that $A$ is diagonally dominant, and from any row where equality occurs there is a path of non-zero entries to a strictly dominant row. It is straightforward to check that such a matrix is invertible: see for example [6, Lem. 3.2]. For any non-random strategy, and in particular for the optimal strategy described above, every transition probability from $x$ is a multiple of $d(x)^{-2}$. Thus, all the elements of $A$ can be put over a common denominator $D$, where $D := \mathrm{LCM}\big((d(x)^2)_{x \in V}\big) < (n!)^2 < n^{2n}/2$.
By Cramer's rule, $h = A^{-1}b = C^{\mathsf T}b/\det A$, where $C$ is the matrix of cofactors. Each entry in $C$ can be put over a common denominator which is at most $D^n$, and so the same applies to each entry of $C^{\mathsf T}b$. Also, $|\det A| < 2^n$ by Hadamard's inequality [26, Thm. 7.8.1]. It follows that if two hitting times differ, they differ by at least $(2D)^{-n}$.
For any graph $G$ and weighting $w : V \to [0,\infty)$ on the vertices of $G$, we can phrase Stat(G, w) as an optimisation problem as follows, where we encode our actions using the probabilities $\alpha^z_{x,y} = \mathbb{P}\big[X_{t+1} = y \mid X_t = x,\ c = \{y, z\}\big]$ from Section 2: maximise $\sum_{v \in V} w_v \pi_v$ subject to $\pi(y) = \sum_{x \in \Gamma(y)} \pi(x) \sum_{z \in \Gamma(x)} \frac{2\alpha^z_{x,y}}{d(x)^2}$ for all $y \in V$, $\sum_v \pi(v) = 1$, and $\alpha^z_{x,y} + \alpha^y_{x,z} = 1$, $\alpha^z_{x,y} \ge 0$ for all relevant $x, y, z$. (13) For minimising the stationary probabilities, we maximise $-1$ times the objective function.
The quadratic terms in (13) can be eliminated using the same substitution as in [4, Thm. 6]; we can then solve (13) as a linear program.

Theorem 7.4. For any graph $G$ and weights $w$, a solution to Stat(G, w) can be computed to within an additive error $\varepsilon > 0$ in time polynomial in $n$ and $\log(1/\varepsilon)$.

Proof. We prove the simple graph case; this proof may easily be extended to multigraphs with suitably adapted notation. The optimisation problem (13) above can be rephrased as a linear program by making the substitution $r_{x,y,z} = \pi(x) \cdot \alpha^z_{x,y}$. Either the ellipsoid method or Karmarkar's algorithm will approximate the solution to within an additive $\varepsilon > 0$ in time which is polynomial in the dimension of the problem and $\log(1/\varepsilon)$; see for example [23, 27].
We now show how one can use this linear program to determine the hitting times.

Theorem 1.6. For any graph $G$ and any $S \subset V$, a solution to Hit(G, x, S) for every $x \in V \setminus S$ can be computed in time poly(n).
Proof. Contract $S$ to a single vertex $v$ to obtain a multigraph $F$; where a vertex $x$ has more than one edge to $S$ in $G$, retain multiple edges between $x$ and $v$ in $F$. Note that $F$ has at most $n$ vertices and at most $n^2$ edges. Provided that the CRW on $G$ has not yet reached $S$, there is a natural correspondence between strategies on $G$ and $F$ with the same transition probabilities, and it follows that $H^{\mathrm{two}}_x(S)$ for $G$ and $H^{\mathrm{two}}_x(v)$ for $F$ are equal for any $x \in V(G) \setminus S$. We compute an optimal strategy for Stat(F, {v}) to within an additive error of $\varepsilon := n^{-10n^2}$; note that $\log(1/\varepsilon) = o(n^3)$, and so this may be done in time poly(n) by Theorem 7.4. Applying Lemma 7.2 to $F$ and Lemma 7.3 to $G$, using the equality of corresponding hitting times, implies that this strategy has $\alpha^z_{x,y} > 1/2$ whenever $H^{\mathrm{two}}_y(v) < H^{\mathrm{two}}_z(v)$, and so rounding each of the probabilities $\alpha^z_{x,y}$ to the nearest integer gives an optimal strategy (on $F$) for every $x$, which may easily be converted to an optimal strategy for $G$.

A hardness result for Cov
We show that in general even the online version of Cov (G, v) is NP-hard. To that end, we introduce the following problem, which represents a single decision in the online version. The input is a graph G, a current vertex u, two vertices v and w which are adjacent to u, and a visited set X, which must be connected and contain u.
NextStep (G, u, v, w, X): Choose whether to move from u to either v or w so as to minimise the expected time for the CRW to visit every vertex not in X, assuming an optimal strategy is followed thereafter.
Any such problem may arise during a random walk with choice on G starting from any vertex in X, no matter what strategy was followed up to that point, since with positive probability no real choice was offered in the walk up to that point.

Theorem 1.7
NextStep is NP-hard, even if G is constrained to have maximum degree 3.

Proof.
We give a (Cook) reduction from the NP-hard problem of either finding a Hamilton path in a given graph H or determining that none exists. This is known to be NP-hard even if H is restricted to have maximum degree 3 [21]. We shall find it more convenient to work with the following problem, which takes as input a graph G, a current vertex u and a connected visited set X containing u.
BestStep (G, u, X): Choose a neighbour of u to move to so as to minimise the expected time for the CRW to visit every vertex not in X, assuming an optimal strategy is followed thereafter.
We may solve BestStep(G, u, X) by computing NextStep(G, u, v, w, X) for every pair $v, w$ of neighbours of $u$; since all optimal neighbours must be preferred to all others, this will identify a set of one or more optimal choices for BestStep(G, u, X). Consequently, it is sufficient to reduce the Hamilton path search problem to BestStep. Given an $n$-vertex graph $H$, construct the graph $G$ as follows. First, replace each edge of $H$ by a path of length $2cn^2$ through new vertices. Next, add a new pendant path of length $n^3$ starting at the midpoint of each path corresponding to an edge of $H$. Finally, add edges to form a cycle consisting of the end vertices of these pendant paths (in any order). Note that if $H$ has maximum degree 3, then so does $G$.
Fix a starting vertex u and a non-empty unvisited set Y ⊆ V(H) \ {u} and set X = V(G) \ Y. (The purpose of the second and third stages of the construction is to make X connected without affecting the optimal strategy.) Suppose that H contains at least one path of length |Y| starting at u which visits every vertex of Y; in particular if Y = V(H) \ {u} this is a Hamilton path of H. We claim that any optimal next step is to move towards the next vertex on some such path. Assuming the truth of this claim, an algorithm to find a Hamilton path starting at x, if one exists, is to set u = x and Y = V(H) \ {x}, then find the vertex y such that moving towards y is optimal, set u = y and remove y from Y, then continue. If this fails to find a Hamilton path, repeat for other possible choices of x.
To prove the claim, first we argue by induction that there is a strategy to visit every vertex in $Y$ in expected time $(4cn^2 + O(n))|Y|$, where the implied constant does not depend on $c$. This is clearly true for $|Y| = 0$. Let $y$ be the next vertex on a suitable path in $H$, and let $z$ be the middle vertex of the path corresponding to the edge $uy$. Attempting to reach $z$ by a straightforward strategy, the distance to $z$ evolves as a random walk with probability 3/4 of decreasing unless the current location is a branch vertex. We thus reach $z$ in expected time $2cn^2$ plus an additional constant time for each visit to $u$, of which we expect $O(d(u)) = O(n)$, giving a total expected time of $2cn^2 + O(n)$ (if the walker is forced to a different branch vertex first, the expected time to return from this point is polynomial in $n$, but this event occurs with exponentially small probability). Similarly, the time taken to reach $y$ from $z$ is $2cn^2 + O(1)$. Once $y$ is reached, there is (by choice of $y$) a path of length $|Y| - 1$ in $H$ starting from $y$ and visiting all of $Y \setminus \{y\}$. Thus, by induction, the required bound holds. Secondly, suppose that an optimal first step in a strategy from $u$ moves towards a vertex $y'$ of $H$ which is not the first vertex on a suitable path. Since the expected remaining time decreases whenever an optimal step is taken, two successive optimal steps cannot be in opposite directions unless the walker visits a vertex of $Y$ in between. Thus, the optimal strategy is to continue in the direction of $y'$ if possible, and such a strategy reaches $y'$ before returning to $u$ with at least constant probability $p$, taking at least $2cn^2$ steps. Note that the expected time taken to reach another vertex of $H$ from a vertex in $H$, even if the walker is purely trying to minimise this quantity, is at least $4cn^2$, and from either $u$ or $y'$ at least $|Y|$ such transitions are necessary to cover $Y$.
Thus, such a strategy, conditioned on the first step being in the direction of $y'$, has expected time at least $4cn^2|Y| + 2pcn^2$, which, for a suitable choice of $c$, proves the claim.

Computing Cov via Markov decision processes
To compute a solution for Cov(G, v), we can encode the cover time problem as a hitting time problem on a (significantly) larger graph.

Lemma 7.5. Given a graph $G$, let $\vec G$ be the directed graph with vertex set $\{(u, T) : u \in T \subseteq V\}$ and an edge from $(u, T)$ to $(u', T \cup \{u'\})$ for each edge $uu' \in E(G)$; write $\vec v = (v, \{v\})$ and $W = \{(u, V) : u \in V\}$. Then the optimal strategies for Cov(G, v) and for Hit($\vec G$, $\vec v$, W) correspond.

Proof. There is a natural bijection between the out-edges in $G$ from $u$ and those in $\vec G$ from $(u, T)$ for any $u \in V$, $T \subseteq V$. This extends to a natural bijection from finite walks (which we may think of as a vertex together with a history) in $G$ starting from $v$ to walks in $\vec G$ starting from $\vec v$, and also to a measure-preserving bijection between the choices which may be offered from $u$ and from $(u, T)$. Thus, there is a natural bijection between strategies for the two walks, and both the choices offered and any random bits used may be coupled so that corresponding strategies produce corresponding walks. Since the walk in $G$ has covered $V$ if and only if the walk in $\vec G$ has hit some vertex in $W$, the times at which these events first occur are identically distributed for corresponding strategies, and in particular the sets of optimal strategies correspond.
In light of Lemma 7.5, it may appear that we can solve Cov(G, v) by converting it to an instance of Hit($\vec G$, $\vec v$, W) and appealing to Theorem 1.6. This is unfortunately not the case, as $\vec G$ is a directed graph and Theorem 1.6 cannot handle directed graphs. Lemma 7.5 is still of use, as we can phrase Hit in terms of Markov decision processes (MDPs), and then standard results tell us that an optimal strategy for the problem can be computed in finite time.
An MDP is a discrete-time finite-state stochastic process controlled by a sequence of decisions [19]. At each step, a controller specifies a probability distribution over a set of actions which may be taken, and this has a direct effect on the next step of the process. Costs are associated with each step/action, and the aim of the controller is to minimise the total cost of performing a given task, for example hitting a given state. In our setting, the actions are orderings of the vertices in each neighbourhood, and the cost of each step/action is one unit of time. The problem Hit($\vec G$, $\vec v$, W) is then an instance of the optimal first passage problem, which is known to be computable in finite time [19].

Corollary 7.6. For any graph $G$ and $v \in V$, an optimal policy for the problem Cov(G, v) can be computed in exponential time.
Proof. We first encode the problem Cov(G, v) as the problem Hit($\vec G$, $\vec v$, W) as described in Lemma 7.5. As mentioned above, Hit($\vec G$, $\vec v$, W) is an instance of the optimal first passage problem, which for a given graph, start vertex and target set can be computed in finite time using either policy iteration or linear programming; see for example [19, Ch. 5, Cor. 1]. Examination of the linear program on [19, p. 58] reveals that there is a constraint for every ordering of the neighbours of each vertex. Since $\vec G$ has at most $n2^n$ vertices and each of these has at most $n$ neighbours, we see that there are at most $n2^n \cdot n! \le e^{n^3}$ constraints. It follows that this linear program can be solved in time $\mathrm{poly}(e^{n^3})$; thus Cov(G, v) $\in$ EXP.
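The state-space blow-up of Lemma 7.5 can be made concrete on a toy instance. The sketch below (ours; the 4-cycle is illustrative) runs value iteration over all states $(v, T)$ and recovers the optimal expected CRW cover time, which comes out strictly below the SRW cover time $n(n-1)/2 = 6$ of the cycle:

```python
def subsets(xs):
    out = [frozenset()]
    for x in xs:
        out += [s | {x} for s in out]
    return out

# Optimal CRW cover time on the 4-cycle via value iteration on states (v, T):
# C(v, T) = 1 + (1/d^2) * sum over offered pairs of the better continuation.
n = 4
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
full = frozenset(range(n))
states = {(v, T | {v}) for v in range(n) for T in subsets(range(n))}
C = {s: 0.0 for s in states}
for _ in range(2000):
    newC = {}
    for v, T in states:
        if T == full:
            newC[(v, T)] = 0.0
        else:
            opts = [C[(y, T | {y})] for y in adj[v]]
            d = len(opts)
            newC[(v, T)] = 1 + sum(min(a, b) for a in opts for b in opts) / d ** 2
    C = newC
cover = C[(0, frozenset({0}))]
print(cover)   # converges to 4.0, versus 6 for the SRW
```

Already here the walk runs over $n \cdot 2^{n-1}$ states; this is exactly the exponential dependence that Corollary 7.6 inherits, and why Section 7.1 avoids the MDP route for Hit.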
Remark. Since in our setting actions are orderings of neighbourhoods, the space of actions may be factorial in the size of the graph. The algorithms for computing Hit from [19] used to establish Corollary 7.6 are polynomial in the number of actions, and thus will not yield a polynomial-time algorithm for the problem. This is why we resisted appealing to MDP theory when developing a polynomial-time algorithm for Hit on undirected graphs in Section 7.1.

Summary
In this paper, we proposed a new random walk process inspired by the power of choice paradigm. We derived several quantitative bounds on the hitting and cover times and also presented a dichotomy with regard to computing optimal strategies.
While we were able to show that on an expander graph the CRW significantly outperforms the SRW in terms of its cover time, we do not yet know the exact order of magnitude of $t^{\mathrm{two}}_{\mathrm{cov}}$. In fact, we do not have any lower bound on $t^{\mathrm{two}}_{\mathrm{cov}}$ improving on the trivial $\Omega(n)$ for any sequence of bounded degree graphs. Constructing a sequence of graphs $(G_n)$, especially expanders, with $t^{\mathrm{two}}_{\mathrm{cov}}(G_n) = \omega(n)$ would be very interesting.
We have shown that Cov ∈ EXP and that the problem is NP-hard. It would be interesting to find a complexity class for which the problem is complete, and we suspect it is PSPACE-complete.