ON STATE-INDEPENDENT IMPORTANCE SAMPLING FOR THE GI|GI|1 TANDEM QUEUE1

In this paper, we consider a d-node GI|GI|1 tandem queue with i.i.d. inter-arrival process and service processes that are independent of each other. Our main interest is to estimate the probability to reach a high level N in a busy cycle of the system using simulation. As crude simulation does not give a sufficient precision in reasonable time, we use importance sampling. We introduce a method to find a state-independent change of measure and we show that this is equivalent to a change of measure that was earlier, but implicitly, described by Parekh and Walrand [8]. We also show that this change of measure is the only exponential state-independent change of measure that may result in an asymptotically efficient estimator. Lastly, we provide necessary conditions for this state-independent change of measure to give an asymptotically efficient estimator.


INTRODUCTION
Rare events can play an important role in many practical situations, including in logistics or telecommunications systems. For instance, the event in which some storage buffer becomes too full may lead to expensive loss of material, while an overflowing data buffer may lead to loss of important information. Even though such events may have a very low probability of occurring, their impact on the performance of the system as a whole can be profound, which explains why it may be important to obtain accurate estimates of such probabilities.
Many other examples exist, but the ones we mentioned here can be modeled as overflow in a queueing system, which is the topic of this paper.
Importance sampling is one of the methods used to estimate the (small) probability of a so-called rare event using stochastic simulation. In importance sampling, the event of interest is made less rare by changing the underlying probability distributions. This change of the probability distributions is also called the change of measure or tilting. During the simulation, one keeps track of the likelihood ratio, which is the ratio between the probabilities in the original system and the probabilities in the changed system. The results of the simulation are weighted by this ratio and hence one obtains an unbiased estimator.
In this paper, we consider importance sampling for GI|GI|1 tandem queues. More specifically, we are interested in estimating the probability that in a busy cycle of the queueing system the total number of customers reaches some high level N . The goal is to obtain a so-called asymptotically efficient estimator, so that the relative error of the estimator grows less than exponentially with N .
One of the first papers to consider importance sampling in queueing networks is by Parekh and Walrand [8]. Their interest is in the same probability as in the current paper. To estimate this probability for the single queue, they propose a simple, explicit, change of measure, and for networks of queues, they implicitly describe how to find a change of measure. Their proposed change of measure is state-independent, that is, the change of measure does not depend on the current state of the system. In the remainder of this paper, the change of measure proposed by Parekh and Walrand will be referred to as the P&W change of measure. To determine this change of measure, an equation needs to be solved. Frater and Anderson [6] partially solved the equation proposed by P&W, resulting in a simpler but still implicit description of the P&W state-independent change of measure for a class of GI|GI|1 tandem queues.
Sadowsky [9] shows that for the single GI|GI|m queue the P&W change of measure gives an asymptotically efficient estimator under some mild conditions. However, Glasserman and Kou [7] show that for the M |M |1 tandem queue the P&W change of measure may or may not give an asymptotically efficient estimator. They provide necessary conditions and (other) sufficient conditions for asymptotic efficiency. De Boer [3] extends these results, but also shows that the P&W change of measure is the only state-independent change of measure that can possibly yield an asymptotically efficient estimator for the M |M |1 tandem queue.
To the best of our knowledge, no results on asymptotic efficiency for the GI|GI|1 tandem queue had been obtained so far. The generalization from M |M |1 tandem queues to GI|GI|1 tandem queues is important because in practice, queueing networks usually do not have a Markovian arrival and/or service process. The contribution of this paper is threefold. First, we introduce another, simpler, method to obtain a change of measure for GI|GI|1 tandem queues, based on knowledge of the decay rate of the probability of interest which has been determined in Buijsrogge et al. [2]. This method is not implicit, and we show that it is equivalent to the earlier method, in the sense that it results in the same change of measure as P&W. Secondly, we show that the change of measure proposed by P&W is the only exponential state-independent change of measure that may give an asymptotically efficient estimator. Lastly, we provide necessary conditions for this exponential state-independent change of measure to give an asymptotically efficient estimator.
Based on results for the M |M |1 tandem queue, it is clear that for the GI|GI|1 tandem queue the P&W change of measure does not always give an asymptotically efficient estimator. In [4,5], Dupuis et al. prove that a certain state-dependent change of measure is asymptotically efficient for Markovian networks. This change of measure roughly coincides with the P&W change of measure in most of the state space, but deviates from it near the edges. We expect the same for the GI|GI|1 case, which motivates our interest in the (state-independent) P&W change of measure: even though it fails to be asymptotically efficient in some models, it seems plausible that it will be an important ingredient for any asymptotically efficient state-dependent change of measure. This paper is structured as follows. In Section 2, we introduce the model and the change of measure as derived from the decay rate obtained in Buijsrogge et al. [2]. In Section 3, we show that this is equivalent to the change of measure of Frater and Anderson [6] and thus P&W. In Section 4, we show that this P&W change of measure is the only exponential stateindependent change of measure that can give an asymptotically efficient estimator. Other necessary conditions for the state-independent change of measure to give an asymptotically efficient estimator are presented in Section 5. In Section 6, we give some numerical results, and the conclusions are presented in Section 7.

The model
In this paper, we consider d GI|GI|1 queues in tandem; in Section 5 and 6, we consider the special case d = 2. Let A k be the inter-arrival time at queue 1 between customers k and k + 1 and let B (j) k be the service time of customer k at queue j. The arrival process and all service processes are assumed to be i.i.d. and are independent of each other. After service completion at queue j < d, the customer enters queue j + 1, so there is no probabilistic routing. Note that the arrival process at queue j > 1 is obviously not independent and identically distributed. When the customer finishes service at queue d, the customer leaves the system. Starting with customer 1 in queue 1 and all other queues empty, we are interested in the event that there are N customers in the system before the system is empty again. We define K N as the index of the first customer who reaches the overflow level N . Likewise, K 0 is the index of the first customer after customer 1 who sees an empty system upon arrival. Let K = min(K 0 , K N ). Then the indicator 1{K = K N } defines if we have reached our rare event in the busy cycle or not, and the probability of this rare event, denoted by p N , is We denote the distribution functions of A k and B (j) k by F A and F B (j) , respectively, and their moment generating functions by M A (t) and M B (j) (t); for notational convenience, we let Λ A (t) = log M A (t) and Λ B (j) (t) = log M B (j) (t). Throughout this paper, we assume that for all j = 1, . . . , d, M B (j) (t) exists for some t > 0. We also assume that the system is stable, that is, E[B (j) ] < E[A] ∀j = 1, . . . , d. If at least one of the queues would be unstable, then our event of interest would not be rare and, therefore, no importance sampling would be needed in order to obtain a good estimation of the probability of the event. Also, we make the non-triviality assumptions that P(B (j) > A) > 0 for at least one queue j, so that the number of customers can reach any high level N in a busy cycle of the system, and that P A > d j=1 B (j) > 0, so that the system can become empty. Under these assumptions, it is shown in [2] that the decay of p N is given by where θ * is the minimum of θ (1) , . . . , θ (d) , with θ (j) given as or equivalently for all queues j. We say θ (j) = ∞ when M A (−θ)M B (j) (θ) < 1 for all θ > 0, which only happens when P(B (j) > A) = 0, see Lemma A.1. As a consequence of the stability and non-triviality assumption, 0 < θ * < ∞.

Importance sampling simulation
In importance sampling, the rare event is made less rare by changing the underlying probability distribution. For a single GI|GI|1 queue, so d = 1, it is suggested by Parekh and Walrand [8] to apply an exponential tilt θ = θ (1) , with θ (1) as in (2), for both the inter-arrival times and the service times in the following way, where F θ A (a) and F θ B (1) (b) denote the distribution functions under the change of measure. It is shown by Sadowsky in [9] that this change of measure results in an asymptotically efficient estimator, assuming that E[B (1) ] < E[A] (stability), that P(B (1) > A) > 0 (non-triviality), and that P(B (1) < M) = 1 for some finite constant M (bounded service times). The last assumption is the only real restriction, but it was claimed that this is a mere technicality, and not essential for the result to hold.
Let us now consider d GI|GI|1 queues in tandem and let θ = (θ 0 , . . . , θ d ) be a vector of exponential tilts. Then E θ [·] and P θ (·) denote expected values and probabilities under this change of measure θ, and we denote the distribution function and the moment generating function of a random variable X under this change of measure as F θ , respectively. For the distribution functions of A and B (j) we have Note that the difference compared with Eqs. (3) and (4) is that now the inter-arrival time distribution and service time distributions of all queues j may be tilted differently. As a result, the moment generating functions of A and B (j) under the change of measure θ are and we find the expected values under the change of measure θ to be In the sequel it will become clear that the "best" tilt θ * = (θ * 0 , . . . , θ * d ) is such that θ * 0 = θ * j * and θ * j = 0, j = 0, j * , where j * is the "bottleneck queue" in some sense. Next, we discuss how to find queue j * and the tilt-parameter θ * j * . In Section 3 it turns out that the change of measure described above is, in fact, the P&W change of measure, although this is not immediately clear from [8].

Specific change of measure θ *
Based on our knowledge of the decay rate from [2], see (1), we propose a specific change of measure. We start by solving the equations in (2), then we define the notion of bottleneck queue in the following way. (2) for all j, and θ * = min j θ (j) .

Assumption 2.1: In addition to
• stability of the system, that is, • non-triviality, that is, P(B (j) > A) > 0 for at least one queue j and

(all mentioned earlier), we assume that
• the bottleneck queue is unique, that is, θ * < θ (j) for all j = j * ; and • the inequality in definition (2) for j = j * holds with equality: Due to the uniqueness assumption, we are now ready to introduce the change of measure based on [2]: it is simply a θ-tilt as given in (5) and (6), where we choose the exponential tilt to be θ = θ * with θ * = (θ * , 0, . . . , 0, θ * , 0, . . . , 0). This means that we only tilt the inter-arrival times and the service times of the bottleneck queue j * , with the same tilting parameter θ * .
We will refer to this change of measure as the θ * -tilt. As mentioned earlier, in Section 3 we will show that this θ * -tilt coincides with the P&W change of measure for cases in which this is properly defined (that is, when (10) holds), and in Section 4 we will show that it is the only reasonable exponential change of measure, since taking θ = θ * will result in an estimator that is not asymptotically efficient.

Remark 2.1:
The notion of bottleneck queue mentioned in Definition 2.1 does not necessarily coincide with that of the ρ-bottleneck queue, that is, queue j * is not necessarily the queue with the largest server utilization ρ. However, both notions may yield the same bottleneck; for example, in case of an M |M |1 tandem queue, this is always the case.

COMPARISON WITH FRATER AND ANDERSON
In this section, we compare our method to obtain j * and θ * for the change of measure for the GI|GI|1 tandem queue to the earlier developed method by Frater and Anderson [6]. They presented one way to obtain a change of measure for the GI|GI|1 tandem queue in the early 90s. Their method is based on Parekh and Walrand [8] and is written in an implicit form. In Section 3.1, we present the method of Frater and Anderson; then in Section 3.2, we show that the two are equivalent in all cases where they are properly defined.

Method by Frater and Anderson [6]
In [6], the change of measure proposed by Parekh and Walrand is further explored. Based on large deviations theory, Parekh and Walrand defined a cost function H that needs to be minimized in order to find the change of measure. Frater and Anderson simplify this function (see (37) in [6]) to where λ 1 is the arrival rate at queue 1 and μ j the service rate of queue j, where each rate is just the inverse of the corresponding expectation, and where the primes denote that the values should be optimized to find the change of measure. Furthermore, h A (·) and h B (j) (·) denote the Cramér transforms of the inter-arrival time distribution and the service time distribution at queue j, respectively (where the Cramér transform of a random variable X is defined as h X (y) = sup s [sy − log M X (s)]). Finally, R is the index of the rightmost unstable queue under the change of measure, that is, R is the largest index j for which μ j < λ 1 under the change of measure. (Note that Frater and Anderson write M instead of R.) Then they explain how to find the minimum of (11). They show that for all queues j = R the optimal value of μ j is μ j and since h B (j) (1/μ j ) = 0 (see [8]) this implies that H reduces to a function of λ 1 , μ R and R in the following way, see (43) in [6]. Next, they note the two problems that remain in order to find the change of measure: 1. to find the value of R that is optimal, that is, the value of R that minimizes H(λ 1 , μ R , R), 2. given R, to find the values of λ 1 and μ R that minimize H(λ 1 , μ R , R).
Assuming the first problem is solved, that is, given R, the solution of the second problem is not hard, using a similar method as for the single GI|GI|1 queue, and Frater and Anderson show how to obtain the optimal values of λ 1 and μ R , referring to [8]. From these values, again using [8], the change of measure now follows, which prescribes exponential tilting of the distributions such that their rates become equal to the optimal rates. This change of measure turns out to be precisely as in (5) and (6) above, with the tilting vector given by where the latter is such that it satisfies As a result, the expectations of A and B (R) under the change of measure become 1/λ 1 and 1/μ R , as they should, so given the optimal value of R the problem is solved. However, finding the optimal R is difficult since (the index of) the rightmost unstable queue under the change of measure depends on this change of measure itself. Only for a certain class of problems, Frater and Anderson show that R can be chosen simply as the "rho-bottleneck" (see Remark 2.1 above). For the general case, they need to calculate for each possible value of R the optimal λ 1 and μ R , and then substitute these in H(λ 1 , μ R , R) to obtain a function H(R) that only depends on R, after which the optimal value R needs to be picked such that it minimizes H(R). When there are multiple candidates for R they seem to suggest that R should be chosen as large as possible, but this is not entirely clear to us.
Finally, it has to be checked whether under the resulting change of measure θ, the corresponding R is indeed the rightmost unstable queue. If this is not the case it is not clear how to proceed, but we will show in Section 3.2 that R is indeed the rightmost unstable queue under the change of measure.

Comparison of the two methods
In this section, we show that the method based on the decay rate (as described in Section 2.3) and the method by Frater and Anderson [6] (as described in Section 3.1) are equivalent. First of all it is clear that both methods consider the same type of exponential tilting based on (5) and (6), and that the tilting vectors θ * and θ have the same structure, so that only the inter-arrival times and the service times of one of the queues are tilted. Frater and Anderson find optimal values for λ 1 and μ R , where R is the particular queue to be tilted, but as described above this optimization is equivalent to finding the corresponding θ (R) .
(In fact, their use of λ 1 and μ j , j = 1, . . . , d in minimizing (11) and (12) can be seen as an alternative (one-to-one) parametrization to optimize the tilting parameters θ 0 and θ j , j = 1, . . . , d.) Given the optimal value of R, the value of the tilting parameter θ (R) is given in the same way as the θ (j) in our method, compare (13) with (2), and note that (13) also shows that [6] and [8] only consider cases in which (2) holds with equality (as we assume for our j * , see Assumption 2.1).
As a consequence, the change of measure is exactly the same for both methods if the same queue is tilted. Therefore, we only need to show that the bottleneck queue j * as described in Section 2.3 minimizes the function H(R), and then do the "Frater and Anderson check" to see if queue j * is indeed the rightmost unstable queue under the change of measure, as described in Section 3.1. We show these statements in the following two lemmas.
The first lemma relates the θ * -bottleneck queue to minimizing H(R). We start by briefly motivating how to rewrite H(R). As mentioned, for fixed R, Frater and Anderson choose the values for λ and μ R which minimize the function H(λ 1 , μ R , R) in (12). The optimization is done in exactly the same manner as was done by Parekh and Walrand in [8] for the single GI|GI|1 queue. We will not copy the details but only mention they set the partial derivatives of H(λ 1 , μ R , R) with respect to λ and μ R equal to zero, and combine this with properties of the Cramér transform and with the implicit assumption that (13) holds; for more details see Equations (37)-(44) in [8]. The result is simply that H(R) can be written as Proof: As mentioned before, θ (R) coincides with our θ * when j * = R. Indeed, H(j) = −Λ A (−θ (j) ) is minimal for the choice j = j * since θ * < θ (j) for all j = j * , and −Λ A (−θ) is a strictly increasing function of θ.
In the second lemma, we check that queue j * is the rightmost unstable queue in the θ * -tilted system.
and (ii) all queues k > j * are stable under the θ * -tilt, which proves the lemma.
(i) We say that a queue is unstable when the service rate of that queue is smaller than the local arrival rate to that queue. Under the θ * -tilt the service rate at queue We know that f (0) = f (θ * ) = 0 and f (0) < 0. By convexity of the log moment generating functions it must hold that f (θ * ) > 0 and so it follows, using (8) and (9), . We conclude this part of the proof by show- For a graphical interpretation, see Figure 1. Again using (8) and (9), we have for any where the first and the final inequality follow from the convexity of Λ B (k) (θ) and Λ B (j * ) (θ), and the second inequality follows by definition and uniqueness of θ Hence we have that queue j * is unstable under the θ * -tilt and, in particular, (ii) Finally, we show that queue j * is the rightmost unstable queue under the θ * -tilt. If j * = d this statement is trivial, so suppose for the remainder of the proof that j * < d. By (i), the arrival rate for queue j * + 1 is equal to the service rate of the unstable queue j * . Queue j * + 1 is stable, as we have by Eq. (14). Since queue j * + 1 is stable, the arrival rate for queue j * + 2 also equals the service rate of queue j * . Now look at any queue k ∈ [j * + 2, d] (if any). If all queues between j * and k are stable, queue k is also stable because , which follows immediately from Eq. (14), and hence the arrival rate at queue k equals the service rate of queue j * . Thus, by induction, the result follows.
Theorem 3.3: Under Assumption 2.1 and when R is unique, we have j * = R, and hence the θ * -tilt described in Section 2.3 and the P&W method as described in Frater and Anderson give the same change of measure.
Proof: Consider the θ * -tilt with bottleneck queue j * , then j * is the rightmost unstable queue in the θ * -tilted system by Lemma 3.2. We also know, in view of the uniqueness of j * , that θ * < min j =j * θ (j) and by Lemma 3.1 that j * minimizes H(R). Hence j * = R.
The equivalence of the two corresponding changes of measure is then immediate from the fact that both are based on (5) and (6) with θ = θ = θ * .
We note that −H(j * ) = Λ A (−θ * ) is the rate of decay, see (1). This implies that the large deviations approximation made in Parekh and Walrand is actually good.

THE θ-TILT IS NOT ASYMPTOTICALLY EFFICIENT WHEN θ = θ *
Having determined that the θ-tilt is the same as the P&W change of measure by Parekh and Walrand [8], in this section we show that it is the only exponential state-independent change of measure that may give an asymptotically efficient estimator. In Section 4.1, we introduce the likelihood ratio L θ of a path that reaches level N in a busy cycle of the system and give the mathematical definition of asymptotic efficiency in terms of the second moment of this random variable. Then in Section 4.2, we show the main result of this section, Theorem 4.3.

Definitions
Suppose we use the exponential change of measure θ. Remembering that by definition we have K = min(K 0 , K N ), we let the likelihood ratio L θ of a path that consists of K arrivals be, Here, k j is the number of initiated services in queue j just before the K-th arrival, formally defined as where n j is the number of customers in queue j just before the K-th arrival. When K = K N it holds that d k=1 n k = N − 1, so in that case we can also write k j = K − N + d k=j+1 n k + 1{n j > 0}.

Remark 4.1:
In principle, one could reduce the estimator variance a bit further by dividing the likelihood ratio in (15) by the likelihood ratio of the remaining service times upon reaching level N (but for a clearer presentation we decided not to do this).
Under the tilt θ, L θ 1{K = K N } is an unbiased estimator for p N , that is, The goal of importance sampling simulation is to get an asymptotically efficient estimator, which can be defined as follows (see for example [7]).

Definition 4.1: An unbiased estimator is asymptotically efficient if
Note that we always have lim sup N →∞ log E θ [(L θ ) 2 1{K = K N }]/ log p N ≤ 2 by Jensen's inequality. Hence, alternatively, we could replace the inequality in Definition 4.1 by an equality sign (and the liminf by a limit).
The meaning of the definition is that for an asymptotically efficient estimator, the second moment vanishes at twice the rate of the estimator itself. As a consequence, the relative error increases sub-exponentially.
Using (1), we find that the estimator is asymptotically efficient if

Main result
In this section, we show that using an exponential tilt other than the P&W change of measure cannot give an asymptotically efficient estimator. By the above, we need to show that an estimator based on the tilt θ = θ * satisfies lim sup Before we state the theorem, we need the following lemmas. Even though the statements seem obvious, they are not entirely trivial (especially the first one when d > 1); we present the proofs in the appendix.
where E is the event that the system never empties. Moreover, we have as N → ∞ that  Proof: Consider an exponential tilt θ = θ * , then the goal is to show (17) when θ = θ * .
To rewrite the second moment of the likelihood ratio in terms of the expectation under θ * , rather than θ, notice that Let E denote the event that the system never empties, as in Lemma 4.1. We find From (15) and then (5) and (6) it follows that and so the first term of the right-hand side of (18) is greater than or equal to Observe that with probability 1 we have Conditional on the event E, for which we have P θ * (E) > 0, we can replace K by K N and note that with probability 1 the liminf is a constant as K N − N ≤ k j ≤ K N . Then applying Lemmas 4.1 and 4.2, we can remove the conditioning from all terms of f (θ) and change the liminf to a limit in the first term. Thus, we have We now first show that a unique minimum of the above (and hence the tightest lower bound of (18)) is achieved at θ = θ * and conclude the proof by showing that f (θ * ) = Λ A (−θ * ). To find the minimum of f (θ), we note that we only have to consider θ such that Λ B (j) (θ j ) − θ j E θ * [B (j) ] ≤ 0 for all j = 1, . . . , d, so that we can take this constant out of the liminf which then becomes limsup. It is not hard to see that such θ exists (for example, by the convexity of Λ B (j) (θ j ) and by (9), we have for all . For all such θ we can write We take partial derivatives of f (θ): These partial derivatives are zero if and only if θ j = θ * j , j = 0, . . . , d, since all limsups exist and are strictly positive constants. Since the log-moment generating functions Λ A (−θ 0 ) and Λ B (j) (θ j ), j = 1, . . . , d, are strictly convex functions (unless their distributions are deterministic), the right-hand side of (19) is a strictly convex function (unless all distributions are deterministic, but this is ruled out by the non-triviality and stability assumption). Therefore, and because θ * is one of the values of θ for which (19) holds, we are justified in concluding that θ * is indeed a global minimum. Hence f (θ) is minimal only for θ = θ * .

Remark 4.2:
Note that the tilt θ = θ * can still give an asymptotically efficient estimator, but this is not guaranteed.

Remark 4.3:
In case of a single queue, Sadowsky showed that the θ * -tilt is the unique change of measure that is asymptotically efficient, see [9, Theorem 3]; however, he assumes bounded support for the service time distribution, which we do not need.

NECESSARY CONDITIONS FOR ASYMPTOTIC EFFICIENCY WHEN d = 2
Having found that the θ * -tilt is the only exponential state-independent change of measure that can possibly give an asymptotically efficient estimator, we show that additional conditions are needed for this change of measure to actually give an asymptotically efficient estimator. In this section, we assume that we have two queues in tandem (d = 2). First, we derive the conditions in Section 5.1, then we zoom in to the Markovian case and compare with earlier work in Section 5.2.

Derivation of necessary conditions
To work with (16), we first rewrite the likelihood L θ as given in (15), using (5) and (6).
To rewrite the denominator of (20), we note the following relation for I j , the idle time of queue j during the busy cycle, when K = K N , whereB (j) is the residual service time of the customer in service (if any) in queue j just before the overflow level N is reached; in the event that queue j is empty just before N is reached (which is unlikely when j = j * ), we setB (j) = 0. Combining with (16) and (20) we have asymptotic efficiency when For the numerator, we distinguish between two cases, depending on which queue is the bottleneck. When this is queue 1 (j * = 1), we have K − 1 − k j * = n 1 − 1{n 1 > 0}, so that we have asymptotic efficiency if and only if lim sup When queue 2 is the bottleneck queue (j * = 2), we have where we used that lim sup N →∞ 1/N log M A (−θ * ) 2(N −1) equals the right-hand side in (16), 2Λ A (−θ * ). In the sequel we will give necessary conditions for these inequalities to hold by considering specific sample paths which are very unlike the "typical" paths that lead to overflow. The advantage of this approach, which is also used in Glasserman and Kou [7] for the Markovian case, is that the chosen unlikely paths are easy to analyze, and the process spends much time on the boundaries of the state space which we know is problematic for asymptotic efficiency, at least in the Markovian case.
The specific paths we will consider are illustrated in Figure 2, and will be used in the proofs of the following theorems. After stating the theorems we will consider the Markovian case, also comparing with De Boer [3] and Glasserman and Kou [7].
We start with the necessary condition for asymptotic efficiency when queue 2 is the bottleneck queue since this is the easiest case, and show what it looks like for some special cases, including the M |M |1 tandem queue case.
Theorem 5.1: Consider 2 GI|GI|1 queues in tandem and suppose queue 2 is the bottleneck queue (j * = 2). Under Assumption 2.1 a necessary condition for asymptotic efficiency of the θ * -tilt is where More specifically, when the service times of the first queue are exponentially distributed with rate μ 1 , this condition becomes and for an M |M |1 tandem queue with arrival rate λ and service rates μ 1 and μ 2 (with μ 1 > μ 2 ), the condition becomes Proof: Consider the specific sample path with no service completions before level N is reached, that is, the path that moves N − 1 steps to the right from (1, 0) to (N, 0), see Figure 2, left panel. Based on this path we find a lower bound on the left-hand side of (22).
Since for this path 1{n 2 > 0} = 0,B (2) = 0 and I 2 = where the inequality follows because we only consider one possible path to reach the overflow level. Thus the general necessary condition for asymptotic efficiency in (23) follows. When B (1) ∼ exp(μ 1 ), the argument of the logarithm of the left-hand side in (23) reduces to so the necessary condition for asymptotic efficiency becomes which, using (7), leads to (24). Finally, (25) follows immediately from (24) by noting that θ * = μ 2 − λ for an M |M |1 tandem queue with j * = 2.
Next, we provide a necessary condition for an asymptotically efficient estimator when queue 1 is the bottleneck queue. Here, we will assume that the service times of the second queue are exponentially distributed (with rate μ 2 ) in order to give a useful expression for the necessary conditions. We also consider some special cases, including the M |M |1 tandem queue.
Theorem 5.2: Consider 2 GI|GI|1 queues in tandem and suppose queue 1 is the bottleneck queue (j * = 1) and that the service times of queue 2 are exponentially distributed with rate μ 2 . Under Assumption 2.1 a necessary condition for asymptotic efficiency of the θ * -tilt is where, as before, F θ * A (x) and F θ * B (1) (y) denote the probability distribution functions of A and B (1) under the tilt θ * . More specifically, when both queues have exponential services with rates μ 1 and μ 2 respectively, this condition becomes and for an M |M |1 tandem queue with arrival rate λ and service rates μ 1 and μ 2 (with μ 1 < μ 2 ), the condition becomes Proof: We determine a lower bound on the expected value in (21) by considering the sample path that alternates between 0 and 1 customers in queue 1, with no departures from queue 2, until level N is reached; that is, the path moves from (1, 0) to (0, 1), (1, 1), (0, 2), (1, 2), (0, 3), . . . , to (1, N − 1), see Figure 2, right panel. It is not hard to see that on this path we have B (1) k < A k , k = 1, . . . , N − 1, and also 1 . Obviously every B (1) k should be smaller than B (1) 1 + B (2) 1 as well, but this condition is implied by the above.
Also on this path we have n 1 = 0,B (1) = 0, K = K N = N and k 1 = N − 1, so where the first inequality follows because there are more paths that reach the overflow level than just the one we consider here. Next, since B (2) has the memoryless property, we can write where the B 1,k are i.i.d. copies of B 1 , independent of all else and d = denotes an equality in distribution. Note that, by assuming j * = 1, the service times of queue 2 remain exponentially distributed with rate μ 2 under the θ * -tilt, and hence still have the memoryless property. As a consequence, the right-hand side of (29) can be written as where in the last step the independence of A i and B (1) i is used. The general necessary condition for asymptotic efficiency in (26) now follows from applying (21) and B (2) ∼ exp(μ 2 ). When B (1) ∼ exp(μ 1 ) (and B (2) ∼ exp(μ 2 ) as before), the left-hand side of (26) reduces to from which (27) follows by using (7). Finally (28) follows from (27) by noting that θ * = μ 1 − λ for an M |M |1 tandem queue with j * = 1.

Comparison of necessary conditions for the M |M |1 tandem queue
In this section, we will make a comparison with earlier papers in the Markovian case. Since these papers always consider simulation in discrete time, we will first explain how this relates to our current work.

Continuous-time vs. discrete-time models.
In the current paper, we represent the GI/GI/1 queueing systems in continuous time, and we simulate in continuous time, by which we mean that we tilt the (typically continuous) distributions of the A k and B Alternatively, if all distributions are exponential, the system state can also be represented by a discrete-time Markov chain, embedded at transition epochs, which can also be simulated. We will refer to this as simulation in discrete time.
Parekh and Walrand [8] consider both simulation in continuous and in discrete time. For the single Markovian queue, they show that their heuristic for both continuous and discrete-time leads to the same change of measure, namely an interchange of the arrival and service rates (or probabilities). In the same way, any exponential change of measure in the discrete-time Markov chain (changing transition probabilities) can easily be shown to be equivalent to an exponential change of measure in the corresponding continuous-time Markov chain (changing transition rates).
We will now compare all known conditions for asymptotic efficiency from De Boer [3] and Glasserman and Kou [7], who apply simulation in discrete time, and the current paper. For ease of comparison, we will normalize the (continuous time) rates such that λ + μ 1 + μ 2 = 1, so that on the interior of the state space they coincide with the (discrete time) transition probabilities .

Queue 2 is bottleneck.
First, we consider the case in which queue 2 is the bottleneck, which now means that μ 1 > μ 2 . With the normalization, our necessary condition in Theorem 5.1 becomes μ 1 + 4μ 2 ≤ 2, and since μ 1 > μ 2 it follows in particular that a necessary condition for asymptotic efficiency is μ 2 < 2/5. This is stricter than μ 2 ≤ √ 2 − 1, which was obtained in Glasserman and Kou [7], simulating in discrete time. Thus, if μ 2 ∈ [2/5, √ 2 − 1] our estimator cannot be asymptotically efficient, while the estimator in [7] could be asymptotically efficient. Although this situation seems very unlikely, we cannot rule out the possibility since the two estimators are different.

Queue 1 is bottleneck.
Next, we look at the case in which queue 1 is the bottleneck, which for the M |M |1 tandem queue means that μ 1 < μ 2 . For this case, importance sampling has never been studied analytically before, because in the Markovian network both servers are interchangeable without changing the probability of overflow, see [10]. Nevertheless in [3] it is shown by numerical computations that in terms of asymptotic efficiency both servers are not interchangeable. Before we continue to summarize all known necessary conditions for the M |M |1 tandem queue in a figure, we present the "missing" result in Glasserman and Kou [7], namely a necessary condition for asymptotic efficiency of the estimator for simulation in discrete time, when queue 1 is the bottleneck queue. The proof is completely analogous to their approach for the other case, except that we consider the path in the right panel of Figure 2 (in discrete time), rather than the left panel.
Proposition 5.3: For an M |M |1 tandem queue, simulated in discrete time, with arrival rate λ and service rates μ 1 and μ 2 such that λ < μ 1 < μ 2 (queue 1 is bottleneck), a necessary condition for asymptotic efficiency of the corresponding estimator is Proof: In this case, the change of measure (here denoted as Q) prescribes to interchange λ and μ 1 . The definition for asymptotic efficiency is (cf. the continuous-time analog in Definition 4.1), Here L is the likelihood ratio of the path as simulated in discrete time, that is, where the product is taken over all transitions t i on the path, and P(t i ) and Q(t i ) are the probabilities of t i under the original and changed measure respectively. In order to get a lower bound on E Q [L 2 1{K = K N }], we consider the path in the right panel of Figure 2 (in discrete time) and note the following. For each transition, t i which is an arrival to queue 1, the contribution P(t i )/Q(t i ) to the likelihood ratio is In order to reach the overflow level, there are N − 1 arrivals to queue 1 (as we start with one customer in queue 1). For each departure from queue 1, except for the first one, the contribution to the likelihood ratio is The contribution to the likelihood ratio of the first departure from queue 1 is In total, there are N − 1 departures from queue 1. Therefore, the total likelihood ratio for this path is Similarly, the probability of this path under the change of measure Q, is ( , so that the left-hand side of (30) is at least Solving (30) concludes the proof.

Comparison.
We are now ready to summarize all necessary conditions from [3], [7] and this section for the M |M |1 tandem queue (with the convention that λ + μ 1 + μ 2 = 1) in Figure 3. For each estimator this figure shows for which parameter settings the estimator is certainly not asymptotically efficient, and for which settings it could be: • Between the dash-dotted (blue) lines the change of measure as discussed in the current paper (simulated in continuous time) does not give an asymptotically efficient estimator according to Theorems 5.1 and 5.2. • Between the dashed (red) lines the change of measure as discussed in [7] (simulated in discrete time) does not give an asymptotically efficient estimator according to [7] and Proposition 5.3. • Between the dotted (yellow) and solid (black) line the change of measure as discussed in [7] (simulated in discrete time) does not give an asymptotically efficient estimator according to [3].
When we compare the areas where asymptotic efficiency is certainly not attained, as derived from considering some unlikely path, that is, the area between the blue dash-dotted lines (simulated in continuous time), and the area between the red dashed lines (simulated in discrete time), we see that the first is largest. The method used by De Boer [3] gives an even bigger area for the discrete-time estimator, but this approach is different and only for the case where queue 2 is the bottleneck. Unfortunately this method cannot be used for simulation in continuous time.

NUMERICAL RESULTS
In this section, we give an example of the conditions that have been shown in the previous section. In order to easily show both bottleneck cases in one figure, we consider a tandem queue with exponentially distributed service times. Also we compare our results for the M |M |1 tandem queue with the results obtained by De Boer [3].
In Figure 4, we give an example to show that the necessary conditions for asymptotic efficiency are not always satisfied. In Tables 1 and 2, we show some simulation results for parameters as in Figures 3 and 4. In these tables RE denotes the relative error, that is, 1.96 times the standard deviation of the estimator divided by the mean of the estimator, and AE is given by where S is the total number of simulations, L(i) is the likelihood ratio in simulation i, and I(i) indicates whether level N has been reached in simulation i or not. This value should be 2 in case of asymptotic efficiency as N goes to infinity, see also Definition 4.1 and the text below it.
In Table 1, we present the results in case of a two node M |M |1 tandem queue, where the parameters are chosen such that the second queue is the bottleneck, and the necessary conditions for asymptotic efficiency given by De Boer [3] and Glasserman and Kou [7] are satisfied, while the conditions in Theorem 5.1 are not satisfied. In Table 2, we give results for a tandem queue with uniform arrivals and exponential service times at both queues, where the first queue is the bottleneck. Again, the parameters are such that the necessary conditions in Theorem 5.2 are not satisfied.   These tables suggest that the estimators are asymptotically efficient, as AE tends to 2 when N goes to infinity, although, in fact, they are not since they do not satisfy the conditions in Theorems 5.1 and 5.2. We can explain this in the following way. Firstly, in the proofs of Theorems 5.1 and 5.2 we considered very unlikely paths. So it is likely that these paths did not occur during these simulations and therefore it still seems that the estimator is asymptotically efficient.
Secondly, from Figures 5 and 6 we can indeed see that there are (very) unlikely paths with a large contribution to the likelihood ratio.
These figures show AE for three fixed values of N against the number of simulation runs S. We see that, even though for increasing N the value of AE seems to increase to 2 (as in Tables 1 and 2), the value of AE is clearly decreasing as the number of simulations increases. Therefore, the estimator cannot be asymptotically efficient. Moreover, the big jumps are caused by (rare) paths that have a large contribution to the likelihood ratio and they suggest that there exist paths that are even more unlikely to occur. Those paths  Table 1 is asymptotically efficient, while we proved that it is not. Figure 6. Similar possible explanation as in Figure 5, but corresponding to the situation as in Table 2. probably have an even larger contribution to the likelihood ratio such that the estimator is not asymptotically efficient.
Next, we compare our results for the M |M |1 tandem queue with the results obtained numerically by De Boer [3]. Both our and [3]'s results concern P&W changes of measure, but ours in continuous time and [3]'s in discrete time; cf. Section 5.2.1. In order to compare all these results, we use the convention that λ + μ 1 + μ 2 = 1 and we transform Figure 3 from [3], see Figure 7, such that λ/μ 1 and λ/μ 2 are along the x -axis and y-axis, respectively (which has been used more often throughout this paper). What we see in Figure 7 is that when queue 1 is the bottleneck queue our necessary condition is within the blue area. When queue 2 is the bottleneck queue it seems that our necessary condition does not coincide with the numerical results of [3]. This means that there are Figure 7. Figure 3 of [3], displayed with different x -axis and y-axis, together with our necessary conditions from Theorems 5.1 and 5.2. The green part has bounded relative error, the blue part does not give an asymptotically efficient estimator but has finite variance and the red part has infinite variance. Between the black lines, our necessary conditions are not satisfied.
certain parameter choices where our estimator is not asymptotically efficient, while the numerical results of [3] tell that the estimator considered there has bounded relative error. This can be explained by the fact that in [3] the queueing system is simulated in discrete time, while we simulate it in continuous time.

CONCLUSIONS
Parekh and Walrand [8] introduced a method to estimate the probability that the total number of customers in a queueing network reaches some level N in a busy cycle using simulation, but unfortunately for a network of GI|GI|1 queues it is not clear how to do so. Frater and Anderson [6] found this change of measure for the GI|GI|1 tandem queue, but only specified it implicitly, in the form of a minimization over all possible "guesses" for which queue would become the rightmost unstable queue. Fortunately, there is another way to explicitly find the change of measure for the GI|GI|1 tandem queue, based on the decay rate determined in Buijsrogge et al. [2]. In the present paper, we have shown that these two methods result in the same change of measure for the GI|GI|1 tandem queue for all cases where they are properly defined.
Also, we proved that this change of measure is the only exponential change of measure that can possibly result in an asymptotically efficient estimator. In other words, for a state-independent change of measure one should only consider using the P&W change of measure. However, using this change of measure does not guarantee asymptotic efficiency.
We have identified some additional necessary conditions for this change of measure to be asymptotically efficient in case of a two node tandem queue.
For future research, it seems useful to look for sufficient conditions for asymptotic efficiency, examine the tightness of the necessary conditions or maybe to improve the likelihood ratio with respect to Remark 4.1. However, it may be better to focus on state-dependent change of measures as these could be asymptotically efficient in the whole parameter space.