STOCHASTIC OPTIMAL DYNAMIC CONTROL OF GI_m/GI_m/1_n QUEUES WITH TIME-VARYING WORKLOADS

Motivated by applications in areas such as cloud computing and information technology services, we consider GI/GI/1 queueing systems under workloads (arrival and service processes) that vary according to one discrete time scale and under controls (server capacity) that vary according to another discrete time scale. We take a stochastic optimal control approach and formulate the corresponding optimal dynamic control problem as a stochastic dynamic program. Under general assumptions for the queueing system, we derive structural properties for the optimal dynamic control policy, establishing that the optimal policy can be obtained through a sequence of convex programs. We also derive fluid and diffusion approximations for the problem and propose analytical and computational approaches in these settings. Computational experiments demonstrate the benefits of our theoretical results over standard heuristics.


INTRODUCTION
We consider a general capacity management and planning problem motivated by applications in diverse fields such as cloud computing, information technology services delivery, and modern energy management. In these applications, a high volume of demand arrives at the system and it is required that the system serves this demand in a timely fashion according to some form of service level agreement (SLA) guarantee. For this purpose, resource capacity has to be allocated to meet the demand and satisfy these guarantees. In addition, several important common features have been observed in these applications that significantly impact the performance of the system:
• Time-varying demand,
• Timing of control,
• Server speed selection, and
• Server failure and departure.
The combination of these problem features results in a general class of very complex resource capacity management and planning problems.
A realistic model should accommodate the statistical patterns of the demand as it varies over time. It is also quite common in many of these systems that the time-varying workloads exhibit periodic or cyclic patterns. Seasonal effects and product cycles are typical examples. In many cases, different forms and sources of periodicity can mix to create very complex behaviors; for example, seasonality, which is usually tangled with account cycles, might be coupled with the introduction of new products, and thus with product cycles. Hence, it is necessary to treat the times between shifts in the statistical pattern of the demand as random, where there is uncertainty in both the magnitude and direction of the pattern shifts.
It is unrealistic to assume that capacity decisions can be made at the same time scale as the demand arrivals; namely, we should not expect the service capacity to react in real time to any anomalies and changes in the demand, which would otherwise result unnecessarily in prohibitive costs. Meanwhile, the time scale at which control is applied should be short enough to affect the performance that will be measured against the SLA guarantees. Therefore, the time scale for control should reflect not only the physical constraints on applying control, but also the right balance between operational cost and SLA penalties.
Modern servers can be tuned to operate at different speeds. Different costs, largely related to different levels of energy consumption, are incurred at different speeds. At the same time, the servers will have different performance outcomes. While managing a large-scale data center of servers, it is beneficial to have the option of speeding up or slowing down the rate at which each server operates, for both cost and performance purposes, since it is widely known that network performance is not necessarily monotonic with respect to the service rate of individual servers.
It is not uncommon in typical large-scale data centers that servers might leave the system under consideration for various reasons [14]. This includes servers being relocated to run other applications, being temporarily unavailable due to maintenance or upgrade, or leaving permanently due to physical failure. It has been observed that such an exodus of resource capacity can happen at a rather stable rate and that the impact of these effects on the performance of the system is not negligible. Hence, it is reasonable to assume that the server capacity decays at a particular rate over control periods, which aggregates the different server departure and/or failure phenomena mentioned above, and thus allows the capacity planning process to factor these effects into its decision process.
The backbone of our mathematical model is a general queueing system under a Markov-modulated demand process. As we mentioned above, the demand model needs to be tractable but versatile enough to accommodate the randomness in both the time intervals between the demand pattern shifts as well as the pattern shifts themselves. A continuous-time Markov chain model is employed to capture these characterizations of the demand pattern shifts. Although we fix the time scale of our control periods, the SLA and operational cost will be factored into the objective function, which is an indirect way of determining the timing of the control. This way, the optimal control policy will include the time scale of the control. In the first part of the paper, we use a single-server model with controllable service rate to capture features such as time-varying demand, control scaling, and server failure. Some of these features are modeled approximately, but we believe our mathematical model captures fundamental trade-offs that provide key insights while making it possible to solve a very complex stochastic dynamic program. In the second part of the paper, we discuss how our theoretical results can be extended to multi-server queues, which can model these features more precisely and, more importantly, are capable of modeling servers operating at different speeds.

Related Work
Time-varying stochastic systems have been employed quite frequently as a model for studying system performance. For queues and queueing networks with time-varying inputs, there is an extensive literature devoted to their asymptotic performance; see, for example, [10,15]. The control of time-varying systems is studied in the area of inventory management; see, for example, [2,13]. The optimal control of an Erlang loss system is studied in [1], where structural properties of the optimal solution as well as asymptotically optimal heuristic policies are obtained. It is worth emphasizing that queueing control is exercised only at discrete time periods in the present paper. We refer to [7,12] and further references therein for work in which server capacity can be adjusted at all times.
More recently, different online algorithms have been developed for a general class of resource allocation problems that are somewhat related to those considered in this paper; see, for example, [5,9]. An optimal control approach is developed in [6] for the case when different resources have to be utilized to fulfill a common demand.

Our Contributions
We develop a general model with some important fundamental features.
• We provide a general and unified capacity planning model, based on a stochastic optimal control approach with general assumptions on the arrival and service patterns of the demand. Our approach also considers features such as the timing of control, server speeds, and server failure and/or departure.
• Under these very general assumptions, we derive important structural properties for the stochastic optimal control problem. More specifically, we show that the value function is concave with respect to the capacity (service rate), and hence the optimal policy can be obtained through a sequence of convex programming problems. These structural properties allow us to compute the optimal policy for problem sizes of interest, and also lead to critical-value (threshold) optimal control policies as surrogates for larger-sized problems.
• For the general model, we also derive forms of fluid and diffusion approximations for the stochastic optimal control problem, which enable us to derive easy-to-implement heuristic policies. The performance of the system under these heuristic policies is investigated via computational experiments.
The rest of the paper is organized as follows. Section 2 presents the detailed mathematical model of the queueing system. Structural properties of the general problem are derived in Section 3. Fluid and diffusion approximations together with a heuristic analysis and related computational experiments are considered in Section 4. A discussion of a generalized model is presented in Section 5, and the paper concludes with a summary in Section 6.

MODEL AND FORMULATION
Consider a GI/GI/1 queue over a time horizon of length L under time-varying workloads and time-varying control of the service rate, with L finite or infinite. The arrival and service processes vary according to an exogenous discrete time scale from one workload period to the next, indexed by m = 1, . . . , M, where workload periods have length T_w. The control of the single-server service rate varies according to an independent decision-making discrete time scale from one control period to the next, indexed by n = 1, . . . , N, where control periods have length T_c. The period lengths T_w and T_c can be deterministic or random variables; to simplify the exposition, we focus on the case where T_c is deterministic. We use the notation GI_m/GI_m/1_n to refer to this general class of queueing systems. Let {A_m(t); t ≥ 0} and {S_m(t); t ≥ 0} denote the arrival and service processes for workload period m, where the statistical pattern of the workload evolves over a finite set of states j = 1, . . . , J according to a Markov chain. More specifically, the workload starts period m = 1 in state j with probability α_j, remains in state j for the duration of the period of length T_w, and then transitions at the end of period m = 1 to state j′ with probability P_{1,jj′}, for all j, j′ = 1, . . . , J. These workload dynamics continue for all subsequent periods m = 2, . . . , M: starting in any state j, the workload remains in state j for the duration of the period of length T_w and then transitions to state j′ with probability P_{m,jj′} at the end of period m.
The objective of our stochastic optimal control problem formulation is to determine the service rates μ_n for every control period n = 1, . . . , N that maximize net-benefit in expectation over the entire time horizon of length L = N · T_c, subject to model inputs and constraints. Let W_n denote the amount of workload remaining in the queue at the beginning of control period n, {D_n(t); t ≥ 0} denote the workload process over the time interval [t^c_{n−1}, t^c_n), D_n denote the aggregate amount of workload arriving during control period n, and B_n denote the proportion of time the server is busy over the interval [t^c_{n−1}, t^c_n). The quantity W_n and the sequence of previous service rate decisions μ_k, k = 1, . . . , n − 1, are known with certainty at the start of control period n. In addition, the stochastic process {D_n(t); t ≥ 0} is probabilistically characterized by the sequence of arrival and service processes {A_m(t); t ≥ 0} and {S_m(t); t ≥ 0} over the time interval [t^c_{n−1}, t^c_n), the random variable D_n is probabilistically characterized by {D_n(t); t ≥ 0}, and the random variable B_n is probabilistically characterized by and dependent upon W_n, {D_n(t); t ≥ 0} and the control decision μ_n. The dynamics of the workload over the control periods take the form

W_{n+1} = W_n + D_n − μ_n B_n T_c, n = 1, . . . , N, (1)

where we assume there is no server idle time between any two control periods unless all the workload has been served.
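To make this recursion concrete, the following minimal Python sketch (the helper name `period_update` and the event-driven arrival representation are our own illustrative choices, not part of the model specification) steps through one control period, draining workload at rate μ_n whenever work is present, and returns both W_{n+1} and the busy fraction B_n:

```python
def period_update(w, mu, Tc, arrivals):
    """One control period of the workload recursion.

    w        -- workload at the start of the period (W_n)
    mu       -- service rate held fixed over the period (mu_n)
    Tc       -- control period length (T_c)
    arrivals -- list of (t, s) pairs: arrival epoch t in [0, Tc) and job size s

    Returns (W_{n+1}, B_n).  The server drains workload at rate mu whenever
    workload is positive; between consecutive events the time to empty is w/mu.
    """
    busy = 0.0
    t_prev = 0.0
    for t, s in sorted(arrivals) + [(Tc, 0.0)]:  # sentinel closes the period
        dt = t - t_prev
        served_time = min(dt, w / mu) if mu > 0 else 0.0
        busy += served_time                      # time spent serving work
        w = max(w - mu * dt, 0.0)                # drain, but never below zero
        w += s                                   # add the arriving job
        t_prev = t
    return w, busy / Tc
```

By construction the outputs satisfy the identity W_{n+1} = W_n + D_n − μ_n B_n T_c, with D_n the sum of the job sizes in the period.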
Our objective function is based on the expected total discounted net-benefit over the entire time horizon, with discount factor β, in which rewards are gained for the completion of workload within each control period as a function R(μ_n, B_n, T_c), costs are incurred for the service-rate capacity deployed for each control period as a function C(μ_n, T_c), and penalties are incurred for server idleness and workload delay within each control period as functions P_I(μ_n, B_n, T_c) and P_D(W_n, D_n, μ_n, B_n, T_c), respectively. We consider the reward and cost functions to be respectively given by

R(μ_n, B_n, T_c) = r μ_n B_n T_c and C(μ_n, T_c) = c μ_n T_c,

with revenue rate r ≥ 0 and cost rate c ≥ 0. Similarly, we consider the penalty functions to be given by

P_I(μ_n, B_n, T_c) = p_I μ_n (1 − B_n) T_c and P_D(W_n, D_n, μ_n, B_n, T_c) = p_D (W_n + D_n − μ_n B_n T_c),

with idle and delay penalty rates p_I ≥ 0 and p_D ≥ 0, respectively.
Define the service rate vector μ := (μ_1, . . . , μ_N). We then have the following general stochastic dynamic program formulation of our stochastic optimal control problem over the time horizon of length L = N · T_c:

max_μ E[ Σ_{n=1}^{N} β^{n−1} ( R(μ_n, B_n, T_c) − C(μ_n, T_c) − P_I(μ_n, B_n, T_c) − P_D(W_n, D_n, μ_n, B_n, T_c) ) ], (2)

subject to W_{n+1} = W_n + D_n − μ_n B_n T_c and μ_n ≥ 0, n = 1, . . . , N, (3)

where β is the discount factor and the expectation is over the probability space (Ω, F, P) of the GI_m/GI_m/1_n queue. The service rate vector μ comprises the decision variables over the time horizon that we seek to obtain, with all other variables as input parameters.
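The expectation in this formulation can be estimated by straightforward Monte Carlo simulation. The sketch below is a simplified stand-in, not the paper's procedure: it assumes i.i.d. exponential per-period demands, hypothetical linear reward/cost/penalty forms built from the rates r, c, p_I, p_D, and a fluid approximation B_n ≈ min(1, (W_n + D_n)/(μ_n T_c)) within each period:

```python
import random

def expected_net_benefit(mu, Tc, beta, r, c, p_I, p_D, mean_D,
                         n_paths=2000, seed=0):
    """Monte Carlo estimate of the discounted net-benefit objective for a
    fixed service-rate vector mu = (mu_1, ..., mu_N).

    Assumed (hypothetical) per-period forms: reward r*mu_n*B_n*Tc, capacity
    cost c*mu_n*Tc, idleness penalty p_I*mu_n*(1-B_n)*Tc, and delay penalty
    p_D*W_{n+1}; within a period we use the fluid approximation
    B_n = min(1, (W_n + D_n)/(mu_n*Tc)).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        w, val = 0.0, 0.0
        for n, m in enumerate(mu):
            D = rng.expovariate(1.0 / mean_D)  # i.i.d. demand per period
            B = min(1.0, (w + D) / (m * Tc)) if m > 0 else 0.0
            w_next = w + D - m * B * Tc        # workload recursion
            val += beta ** n * (r * m * B * Tc - c * m * Tc
                                - p_I * m * (1 - B) * Tc - p_D * w_next)
            w = w_next
        total += val
    return total / n_paths
```

With zero capacity in every period, the estimate reduces to the expected cumulative backlog penalty, which provides a simple sanity check on the simulator.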

GENERAL SOLUTION
In this section, we consider our main results for the general solution of the stochastic dynamic program (2) and (3). We start with structural properties of the optimal dynamic control policy and establish that the general solution can be obtained through a sequence of convex programs. We then turn to consider a surrogate problem of the original stochastic dynamic program and establish that it is equipped with a critical-value (threshold) optimal dynamic control policy.

Structural Properties
For any control period n and any non-negative service rate μ_n, the expected discounted net-benefit can be rewritten as

U_n(μ_n, W_n) := E[ R(μ_n, B_n, T_c) − C(μ_n, T_c) − P_I(μ_n, B_n, T_c) − P_D(W_n, D_n, μ_n, B_n, T_c) ]. (4)

Let J_n(W_n) be the value function for maximizing the expected discounted net-benefit of the stochastic dynamic program of interest over the time horizon from control period n to control period N with W_n workload in the queue at the start of control period n.
Then we can formulate our stochastic dynamic program in terms of the Bellman optimality equations

J_n(W_n) = max_{μ_n ≥ 0} { U_n(μ_n, W_n) + β E[ J_{n+1}(W_{n+1}) ] }, n = 1, . . . , N, (5)

where we assume J_{N+1}(W_{N+1}) = 0, with discount factor β. It is important to note that the probabilistic characterization of the random variable D_n is based on the stochastic process {D_n(t); t ≥ 0}, which in turn is probabilistically characterized by the sequence of time-varying workload processes {A_m(t); t ≥ 0} and {S_m(t); t ≥ 0} over the time interval [t^c_{n−1}, t^c_n). One of the most interesting and important quantities in the Bellman optimality equations is μ_n B_n. We next present the following key monotonicity result for this quantity under the general stochastic dynamic control model.

Lemma 3.1: For each control period n, the quantity μ_n B_n is increasing and concave with respect to μ_n almost surely.

Proof: Note that μ_n B_n T_c represents the amount of work that is served within control period n. It is easy to see that μ_n B_n is a piecewise linear function of μ_n consisting of two pieces, the first having positive slope and the second having slope zero. Hence, as T_c is a constant, it naturally follows that μ_n B_n is increasing and concave with respect to μ_n almost surely.
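A standard way to solve Bellman recursions of this kind numerically is backward induction on a discretized workload grid. The sketch below is only illustrative: it replaces the general workload process with a small set of i.i.d. demand scenarios and uses a hypothetical per-period net benefit built from the rates r, c, p_I, p_D together with the fluid busy-fraction approximation B = min(1, (w + D)/(μT_c)):

```python
def solve_dp(N, Tc, beta, r, c, p_I, p_D, demands, w_grid, mu_grid):
    """Backward induction for Bellman equations of the form (4)-(5) on a
    discretized workload grid.

    demands -- list of (value, prob) scenarios for the per-period demand D_n,
               assumed i.i.d. across periods for brevity.
    Returns (J, policy): J[n][i] is the value-to-go and policy[n][i] the best
    service rate at workload w_grid[i] in period n (0-indexed).
    """
    def interp(vals, w):
        # piecewise-linear interpolation of the next-period value function
        if w <= w_grid[0]:
            return vals[0]
        if w >= w_grid[-1]:
            return vals[-1]
        for i in range(len(w_grid) - 1):
            if w <= w_grid[i + 1]:
                lam = (w - w_grid[i]) / (w_grid[i + 1] - w_grid[i])
                return (1 - lam) * vals[i] + lam * vals[i + 1]

    J = [[0.0] * len(w_grid) for _ in range(N + 1)]  # terminal value is 0
    policy = [[0.0] * len(w_grid) for _ in range(N)]
    for n in range(N - 1, -1, -1):
        for i, w in enumerate(w_grid):
            best_val, best_mu = float("-inf"), 0.0
            for mu in mu_grid:
                val = 0.0
                for D, p in demands:
                    B = min(1.0, (w + D) / (mu * Tc)) if mu > 0 else 0.0
                    w_next = w + D - mu * B * Tc   # workload recursion (1)
                    stage = (r * mu * B * Tc - c * mu * Tc
                             - p_I * mu * (1 - B) * Tc - p_D * w_next)
                    val += p * (stage + beta * interp(J[n + 1], w_next))
                if val > best_val:
                    best_val, best_mu = val, mu
            J[n][i], policy[n][i] = best_val, best_mu
    return J, policy
```

Because the per-period objective is concave in μ (by the results above), the inner maximization over `mu_grid` could be replaced by a convex-programming step; the grid search here is only for transparency.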
Since the expectation operator preserves monotonicity and concavity, we also have the following result.

Corollary 3.2: For each control period n, E[μ_n B_n] is increasing and concave with respect to μ_n.

By similar arguments, we obtain another key monotonicity result. Recall that the random variable B_n depends on the stochastic process {D_n(t); t ≥ 0} and the control decision μ_n, and that the process {D_n(t); t ≥ 0} depends on the sequence of arrival and service processes {A_m(t); t ≥ 0} and {S_m(t); t ≥ 0} over the time interval [t^c_{n−1}, t^c_n).

Lemma 3.3: On the set of service rates for which μ_n B_n < T_c, the quantity μ_n B_n is linear with respect to W_n.

Generally speaking, there are important implications with respect to both performance and cost management when the server is busy throughout the entire period. A more detailed analysis that includes such effects is beyond the scope of this paper, and therefore we assume that the decision variable μ_n will always be taken from the set in which μ_n B_n < T_c is guaranteed. Then, from Lemma 3.3, μ_n B_n is always linear with respect to W_n.
We now can present our first main result of this section for the finite-horizon version of the stochastic optimal control problem.
Theorem 3.1: For each control period n = 1, 2, . . . , N < ∞, there exists a finite service-rate capacity allocation that realizes the global optimal solution of the problem (4), (5) starting at control period n with a given initial workload W_n; this optimal solution is the service-rate capacity planning policy that employs the capacity μ*_n, where μ*_n is the smallest service-rate capacity assignment that maximizes U_n(μ_n, W_n).
Proof: The proof proceeds by induction, where we first consider the basis step n = N. Suppose that the remaining workload is W_N at the beginning of the last control period and that the total amount of workload to arrive over the last control period is D_N. We then conclude that a finite optimal service-rate capacity μ_N can be obtained through the solution of a corresponding convex program; by definition, the analogous properties hold for control period N − 1 for any pair (μ, W). Next, as part of the induction step, suppose the above statements are true for n + 1 and consider the problem starting at control period n, for which we have four induction-step properties. Now, for each sample path in control period n, consider U_{n+1}(μ_{n+1}, W_{n+1}) as a function of μ_n B_n, from which it is readily verified that the decision variable μ_n is a convex function of μ_n B_n and thus −μ_n is a concave function of μ_n B_n. Meanwhile, W_{n+1} is a linear function of μ_n B_n with negative slope. Moreover, since μ_n is taken from the set in which μ_n B_n < T_c is guaranteed, J_{n+1}(W_{n+1}) is increasing and linear in W_{n+1}. Combining all these properties, we conclude that J_{n+1}(W_{n+1}) is a concave function of μ_n B_n. Since the expectation operator preserves concavity, a corresponding convex program can be employed to obtain the optimal solution.
Furthermore, from the perspective of the stochastic dynamic program with respect to μ_n, we conclude that there is a finite optimal service-rate capacity μ_n that corresponds to the solution of the convex program for control period n. From (5), together with the corresponding properties for any pair (μ, W), we conclude the desired result for any W ≥ 0, completing the proof. We next establish our second main result of this section for the infinite-horizon version of the stochastic optimal control problem. To this end, let us denote by U^N_n(μ_n, W_n) and J^N_n(W_n) the Bellman optimality equations in (4) and (5) with finite N.
Theorem 3.2: There exists a finite service-rate capacity vector that realizes the global optimal solution of the infinite-horizon problem (4), (5), and this optimal solution is the service-rate capacity planning policy that employs the capacity μ*_n, where μ*_n is the smallest service-rate capacity assignment that maximizes U^∞_1(μ, W).
Proof: The statements concerning the limits U^∞_1 and J^∞_1, as well as the properties of U^∞_1 and J^∞_1, directly follow from Theorem 3.1 provided there exists a finite non-decreasing function J(W) such that J^N_1(W) ≤ J(W) for all N. From the induction step of Theorem 3.1, we know there exists a finite non-decreasing function U(μ, W) such that U^N_1(μ, W) ≤ U(μ, W) for all N. Replacing the right-hand side of (4) with U(μ, W) and applying the induction arguments of Theorem 3.1 to this revised stochastic optimization problem reveals the existence of a finite non-decreasing function J(W) such that J^N_1(W) ≤ J(W) for all N. This, together with the convex programming arguments of Theorem 3.1, completes the proof.
In Theorems 3.1 and 3.2 we establish crucial structural properties of the optimal solution for our stochastic dynamic program (2), (3) in the case of finite and infinite time horizons, respectively. The original stochastic dynamic program is an extremely difficult problem to solve, and these structural properties are important from a theoretical perspective, as well as for developing corresponding stochastic optimal control policies in practice. In particular, we have established second-order monotonicity properties for the value function of the stochastic dynamic program (4), (5), and have ensured that the stochastic optimal control policy which solves this dynamic program can be obtained through the solutions of a sequence of convex programs. Clearly, these convex program solutions take into account the probabilistic characterization and dependence of the random variable B_n through {D_n(t); t ≥ 0} and μ_n, and the probabilistic characterization and dependence of the stochastic process {D_n(t); t ≥ 0} through the workload processes {A_m(t); t ≥ 0} and {S_m(t); t ≥ 0}. We note that, within the context of our general results, the stochastic optimal control policy tends not to leave W_n either very high or very low. Hence, upon solving the convex programs for appropriately selected discrete values of W_n, only a relatively small number of possible values needs to be considered for each control period. This in turn can significantly reduce the computational effort needed for computing the sequence of convex programs as part of a stochastic optimal control policy in practice.
Since the stochastic optimal control policy for each control period n, obtained as the solution of a sequence of convex programs, depends on the initial condition W_n in a non-linear fashion, one should not expect the optimal solution to be of a critical-value (threshold) type, which is often sought in many stochastic dynamic programming problems. This further speaks to the difficulty of the general stochastic dynamic program (2), (3), even with steps taken to reduce the computational effort required. Hence, to address these complexities, we next turn to a surrogate problem of the original stochastic dynamic program.

A Surrogate Problem
We consider in this section an approximation to the stochastic dynamic programming problem (2), (3) in which the dependence of the convex programming on W n is assumed to be linear. Such a surrogate problem is identified with the goal that the corresponding stochastic optimal dynamic control policy will be of a critical-value (threshold) type. Indeed, we will establish that this surrogate problem is equipped with a threshold optimal dynamic control policy, together with establishing performance bounds on the corresponding approximate solution relative to the original stochastic dynamic programming solution.
Specifically, the approximation we consider here consists of revising the Bellman optimality equations as in (6) and (7), where we assume J̃_{N+1}(W_{N+1}) = 0 and denote by μ_n B̃_n T_c the work that would be served assuming there is no initial workload in the system and the service-rate capacity is μ_n; that is, we decouple the overall workload from W_n at the start of the control period and from the arrivals within the control period. The per-period control decision of this approximation is physically equivalent to a system in which there is an infinitely fast server available to serve all of the initial workload at the beginning of the control period. Observe that, in the case of (6), (7), the stochastic dynamic program depends on W_n only linearly, which allows us to establish that there is a critical-value (threshold) type of optimal control policy for the revised stochastic dynamic program. Namely, as a result of the first term in the expectation of (7), for each control period n we only need to solve one convex programming problem, independent of W_n, in order to determine the optimal service-rate capacity policy. Meanwhile, for any μ_n, we certainly know that μ_n B̃_n ≤ μ_n B_n almost surely; from the setup of the problem, we also have corresponding bounds on the per-period net benefit, and it then follows that the gap between the surrogate solution and the optimal solution can be bounded as in (8). The above arguments directly lead to the following approximation of the solution of our general stochastic dynamic program (2), (3) and its performance bounds.

Theorem 3.3: There exists a critical-value (threshold) optimal dynamic control policy for the surrogate stochastic dynamic program (6) and (7). In addition, this optimal control policy has the performance guarantee given in (8).
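Computationally, the surrogate policy thus reduces to one one-dimensional concave maximization per control period, independent of W_n. A minimal sketch, assuming only that the per-period expected net benefit g(μ) is concave (as established above) and can be evaluated, is ternary search; note that it returns a maximizer, not necessarily the smallest one as required by the theorems:

```python
def threshold_capacity(g, mu_lo, mu_hi, tol=1e-8):
    """Ternary search for the per-period concave maximization of the
    surrogate problem: g(mu) is the expected per-period net benefit, assumed
    concave in mu, and the same maximizer is reused for every workload W_n.
    """
    lo, hi = mu_lo, mu_hi
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1   # maximum lies to the right of m1
        else:
            hi = m2   # maximum lies to the left of m2
    return (lo + hi) / 2
```

In practice g(μ) would itself be a Monte Carlo or numerical-integration estimate of the expectation in (7); any noise in that estimate would call for a more robust one-dimensional optimizer.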

APPROXIMATE SOLUTIONS
In this section we consider various approximate solutions of the stochastic dynamic program (2), (3) primarily within a single workload period m. We start with a heuristic analysis, including comparisons of system performance under heuristic policies with respect to optimal solutions. We then derive forms of fluid and diffusion approximations for the general stochastic optimal control problem. Lastly, we present the results of computational experiments that investigate some of the properties identified herein; this includes exploring a highly dynamic workload scenario.

Heuristic Analysis
Let us begin by conducting a heuristic analysis with the goal of obtaining some qualitative properties of the stochastic dynamic program (2), (3). For simplicity, we assume there is no discounting (β = 0) throughout this section.
First, we provide an equivalent representation of the objective function (2). From (1), we directly have

B_n = (W_n + D_n − W_{n+1}) / (μ_n T_c), (9)

which allows one to express B_n in terms of W_n and W_{n+1}. Then we can rewrite the objective function (2) using (9) to obtain the equivalent expressions (10)–(12). Note that the first term in (12) corresponds to the exogenous demand process over which we have no control. Moreover, for a well-controlled system and moderately large N, the last term in (12) is of constant order and therefore does not play a significant role from a qualitative standpoint. Hence, maximizing the expected profit P({μ_n}) is equivalent to minimizing the remaining cost and penalty terms (13). In other words, one wants to strike a balance between the sequence of controls {μ_n} and the sequence of workloads {W_n} relative to the corresponding costs and penalties. As is clear from (9), W_{n+1} is non-increasing in μ_n for a fixed demand process. However, even for fixed D_n and μ_n, there is significant variability in W_{n+1} depending on the arrival pattern of the customers. Furthermore, the exact cost implications are complicated by the idleness cost, since capacity in excess of the unknown and uncertain demand is penalized both at the nominal procurement cost rate c and also at the idleness rate p_I. Two intuitive control policies will be analyzed to shed light on some fundamental relationships for the problem. We first consider an on-off policy that alternates between setting μ_n = 0 for successive periods n = 1, . . . , K and then setting μ_{K+1} = μ (for some constant μ). In this case, no capacity or idleness costs are incurred for the K periods of inaction, with the sole penalty based on accumulated work; the resulting expected delay penalty grows quadratically in the length of the off-period (assuming E[D_n] is constant over the K periods). However, even if we choose the constant on-capacity μ = W_{K+1}/T_c in order to clear all the backlog, the expected revenue is rKE[D_1] and the sole expected cost (since there are no idleness or delay costs while clearing the backlog) is the capacity cost cμT_c = cKE[D_1].
This, together with the expected delay penalty above, yields an expression for the expected profit over the K + 1 periods; we will later show that this expected profit can be significantly improved upon. Roughly speaking, the insight is that if we make μ_n too small, so as to consistently leave a fraction of the demand unfulfilled, then the backlog grows quadratically in the number of periods, whereas any possible profit accruable by clearing that backlog at a future time grows only linearly in the same number of periods.
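The quadratic-versus-linear comparison is easy to verify numerically. The following sketch (the function name and the convention of charging p_D on the backlog carried at the end of each period are our illustrative assumptions) accumulates the delay penalty of the on-off policy over the K off-periods with deterministic demand d per period:

```python
def onoff_delay_penalty(K, d, p_D):
    """Cumulative delay penalty of the on-off policy over the K off-periods.

    With deterministic demand d per period and penalty p_D charged on the
    backlog carried at the end of each period, the total is
    sum_{n=1}^{K} p_D * n * d = p_D * d * K * (K + 1) / 2.
    """
    backlog, penalty = 0.0, 0.0
    for _ in range(K):
        backlog += d                 # no service while the server is off
        penalty += p_D * backlog     # end-of-period backlog charge
    return penalty
```

The result equals p_D·d·K(K+1)/2, quadratic in K, while the revenue rKd recovered by clearing the backlog in period K + 1 grows only linearly in K.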
Next, we consider an off-line policy that is assumed to know the actual values taken by the random demands D_n in advance, but not the arrival pattern. It is reasonable in this case to set μ_n = D_n/T_c, with the goal of clearing the demand within the period in which it arrives; but owing to the uncertain arrival times of the demand, there is a non-trivial probability that either workload is carried over or B_n < 1, so that we incur additional backlog and idleness penalties. The performance of this off-line policy can be better understood by studying a related M/G/1 queueing system where, instead of assuming the actual offered demand is known, we assume knowledge of the first two moments of the service demand random variable S. We then show how one can determine the optimal capacity μ that balances all revenue and costs. Recall the objective function (10), which is the expected value of profit. Consider the case where we do not partition the planning horizon into periods and instead want to maximize profit for an M/G/1 queueing system with Poisson arrivals at rate λ, independent and identically distributed (i.i.d.) service demands S_n drawn from a general distribution, and constant server speed/capacity set at μ. To maintain stability, we assume the system load ρ = λE[S]/μ < 1. The average workload (or work in system) is known to be [16]

λE[S²] / (2(1 − ρ)).

We then can express the expected profit per unit time as a function of the capacity μ, given in (14). Note that the backlog penalty rate p_D is a crucial parameter, since it helps determine how much service capacity one should provision over and above the bare minimum amount λE[S]. It is clear that the first term in (14) is a constant, and therefore in order to maximize expected profit we want to minimize the remaining cost function f(μ) over μ > λE[S]. Such a minimizer exists because f(μ) → ∞ both as μ ↓ λE[S] and as μ → ∞.
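Since the exact composition of f appears in an elided display, the sketch below should be read as an assumption-laden illustration: we take f(μ) to combine the capacity cost cμ, an idleness penalty p_I(μ − λE[S]) on unused capacity, and the delay penalty p_D times the M/G/1 mean workload λE[S²]/(2(1 − ρ)), and minimize it numerically by ternary search (valid here because each term is convex on μ > λE[S]):

```python
def optimal_mg1_capacity(lam, ES, ES2, c, p_I, p_D, tol=1e-9):
    """Numerically minimize a hypothesized f(mu) for the M/G/1 heuristic.

    f combines capacity cost, idleness penalty on unused capacity, and the
    delay penalty on the M/G/1 mean workload lam*ES2/(2*(1 - rho)) with
    rho = lam*ES/mu; the exact composition of f in (15) is an assumption here.
    """
    a = lam * ES                      # minimum stabilizing capacity

    def f(mu):
        rho = a / mu
        mean_workload = lam * ES2 / (2.0 * (1.0 - rho))
        return c * mu + p_I * (mu - a) + p_D * mean_workload

    lo = a * (1.0 + 1e-9)             # just above the stability boundary
    hi = a * 100.0 + 100.0            # crude upper bracket
    while hi - lo > tol * (1.0 + hi):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)
```

For i.i.d. Exp(θ) service times with λ = 1, θ = 2, c = 1, p_I = 0, p_D = 2, this returns μ* ≈ 1.0, matching the closed-form first-order condition for this particular choice of f.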
Differentiating f with respect to μ and setting the derivative to zero, we obtain the first-order condition (15). Solving the non-linear Eq. (15) renders the optimal capacity μ*_MG1. In Section 4.4, we will consider the special case when the service times are i.i.d. Exp(θ), whereupon E[S] = 1/θ and E[S²] = 2/θ². Now, we turn to consider lazy processing and workload-based policies. The following heuristic provides a lower bound on the optimal profit: in each period, set the processing rate to be μ_lazy,n = W_n/T_c. Such a lazy processing approach is particularly attractive because it eliminates the need to estimate the time-varying characteristics of the demand, by simply processing work with a delay of one period. Within this setting, we guarantee that the server never idles and hence B_n = 1; moreover, all the work that arrives in a period is left unprocessed, so that W_{n+1} = D_n. The objective function (11) also simplifies (since there is no idling cost); assuming the system starts empty, we have W_1 = 0 and the objective function reduces further to the expression (16). The lazy processing approach suggests the following parametric family of policies with multiplicative factor α > 0: μ_n = αW_n/T_c. For 0 ≤ α ≤ 1, it is still true that B_n = 1 and hence W_{n+1} = (1 − α)W_n + D_n. It is not difficult to show that E[W_n] ≈ (1/α)E[D_n], and upon comparing the resulting expected profit with (16), we observe that any α < 1 provides worse expected profit than α = 1.
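The approximation E[W_n] ≈ E[D_n]/α is simple to check by simulating the linear recursion W_{n+1} = (1 − α)W_n + D_n directly (exponential demands and the burn-in length are our illustrative choices):

```python
import random

def lazy_alpha_mean_workload(alpha, mean_D, n_periods=20000, seed=0):
    """Simulate the multiplicative lazy policy mu_n = alpha*W_n/T_c, under
    which W_{n+1} = (1 - alpha)*W_n + D_n; the long-run mean workload should
    be approximately E[D]/alpha."""
    rng = random.Random(seed)
    w, total = 0.0, 0.0
    burn = n_periods // 10            # discard the transient from W_1 = 0
    for n in range(n_periods):
        w = (1 - alpha) * w + rng.expovariate(1.0 / mean_D)
        if n >= burn:
            total += w
    return total / (n_periods - burn)
```

For example, with α = 0.5 and E[D] = 1, the long-run average workload concentrates near 2, consistent with E[W_n] ≈ E[D_n]/α.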
It is a difficult problem to ascertain the optimal α* for the family of lazy multiplicative policies, in part because it is hard to characterize how α* depends on the model parameters such as c and p_D. We have considered so far in this section two kinds of policies: (1) workload-based policies and (2) an M/G/1-based policy. Both have their respective benefits: the workload-based policies do not need to estimate the statistics of the demand process and are self-regulating, but at the expense of an added backlog penalty; whereas the M/G/1-based policy tries to use process statistics and estimates to set the right capacity that stays fixed over a large number of consecutive periods, but at the expense of reduced flexibility and an added penalty due to excess capacity when the instantaneous load is low. We believe that the optimal policy uses a hybrid setting for capacity that depends on both W_n and an estimate of expected demand over the next period, for example μ_n = W_n/T_c + αμ*_MG1.

Fluid Model Analysis
We next investigate the more formal fluid model approximation (e.g., see [3]), again considering the stochastic dynamic program over a single workload period m. We assume the demand D_n in each control period n is a deterministic quantity, arriving at the constant rate D_n/T_c. Let μ*_n(w) denote the optimal policy (i.e., the optimal capacity level to set) for control period n when w units of workload are observed at the beginning of the period. For the fluid model, μ*_n(w) ∈ [0, μ̄_n(w)], where μ̄_n(w) := (w + D_n)/T_c; that is, μ̄_n(w) is the server speed needed to process the sum of the starting workload and the newly arriving workload.
Proof: To simplify the presentation, we provide the details of the proof for a two-control-period problem; the proof for problems with more control periods follows similarly. The one-period-to-go optimal value function reads as in (18), and hence (19) follows. Upon substituting (19) into (18), we obtain the one-period-to-go value in closed form. Now consider the two-period-to-go optimal value function, which after some term manipulation can be expressed in terms of z := max{r − c, −p_D}. The optimal solution can then be broken down into three cases.
1. If z = r − c, that is, r − c > −p_D, we have that μ*_{N−1}(w) = μ̄_{N−1}(w).
2. If z = −p_D (or, equivalently, r − c ≤ −p_D) and in the meantime r − c > −2p_D, we also have that μ*_{N−1}(w) = μ̄_{N−1}(w).
3. If r − c ≤ −2p_D, we have that μ*_{N−1}(w) = 0.
In words, Theorem 4.1 states that, depending on which of the intervals I_n the quantity r − c falls into, the optimal policy is to always deplete the workload to zero until a certain control period and then set the capacity level to zero thereafter. The key quantity r − c can naturally be viewed as a profitability index of the system.
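The case analysis above can be summarized as a simple threshold rule. The sketch below assumes, extrapolating the pattern of the two-period cases, that with k control periods remaining the workload is depleted whenever r − c > −k·p_D and the server is shut off otherwise; the function name and argument layout are ours:

```python
def fluid_capacity(n, w, D, Tc, r, c, p_D, N):
    """Fluid-optimal capacity sketch for control period n (1-based) of an
    N-period horizon, given starting workload w and period demand D.
    Assumption (consistent with the two-period cases): with k = N - n + 1
    periods remaining, deplete the workload iff r - c > -k * p_D."""
    periods_to_go = N - n + 1
    if r - c > -periods_to_go * p_D:
        # Deplete: process the backlog plus the newly arriving fluid.
        return (w + D) / Tc
    # Unprofitable: paying the backlog penalties beats processing.
    return 0.0
```

For example, with r − c = −1.5 p_D the rule depletes while two or more periods remain and idles in the final period, matching cases 2 and 3 above.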
The structure of the optimal policy in the fluid model approximation suggests that a good policy for the original stochastic dynamic program is to set the capacity level for control period n such that the expected remaining workload at the end of the period equals some target level w*_n. Note that, for the fluid model approximation, some of the w*_n are simply zero, according to Theorem 4.1.
The above insights from the fluid model approximation inspire the following control policy:
1. At the beginning of control period 1, observe the current workload w and solve the linear program min over (μ_1, . . . , μ_N) of the objective (25).
2. Set the server speed to the optimal μ_1 that results from solving the above linear program.
3. At the beginning of control period 2, repeat the above two steps, treating the remaining horizon as an (N − 1)-period problem.
4. Continue the above three steps until the end.
Note that the linear program objective function (25) is motivated by the equivalent expression for the N-period objective function (12), with the non-controllable terms of (12) dropped. Moreover, the recursion (26) is simply a linear approximation that relates the workload to the capacity level. By re-solving the linear program so as to achieve a desirable target workload path at each control period, the above policy exercises good control over the evolution of the stochastic system.
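The rolling-horizon step can be sketched with an off-the-shelf LP solver. Since (25) and (26) are not reproduced here, the code below uses a stand-in objective (capacity cost plus backlog penalty, omitting revenue and idling terms) and the linearized recursion w_{n+1} ≥ w_n + D_n − μ_n T_c; all names are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def fluid_lp_first_capacity(w0, D, Tc, c, p_D):
    """Solve a stand-in for the fluid LP over the remaining horizon and
    return the first-period capacity.

    Decision variables: x = (mu_1, ..., mu_N, w_2, ..., w_{N+1}).
    Assumed objective: minimize c*Tc*sum(mu) + p_D*sum(w).
    Linearized recursion: w_{n+1} >= w_n + D_n - Tc*mu_n, all vars >= 0.
    """
    N = len(D)
    cost = np.concatenate([c * Tc * np.ones(N), p_D * np.ones(N)])
    A_ub = np.zeros((N, 2 * N))
    b_ub = np.zeros(N)
    for n in range(N):                # 0-based index n is period n+1
        A_ub[n, n] = -Tc              # -Tc * mu for this period
        A_ub[n, N + n] = -1.0         # -w at the end of this period
        if n == 0:
            b_ub[n] = -(w0 + D[0])    # w_1 = w0 is a known constant
        else:
            A_ub[n, N + n - 1] = 1.0  # +w at the start of this period
            b_ub[n] = -D[n]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, None)] * (2 * N))
    return res.x[0], res

# With a backlog penalty much larger than the capacity cost, the LP
# drains the starting workload plus the new demand immediately.
mu1, res = fluid_lp_first_capacity(4.0, [2.0, 2.0, 2.0], 1.0, 1.0, 10.0)
```

Only μ_1 is implemented; subsequent periods re-solve the LP with the realized workload, as in the policy description above.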

Diffusion Approximation
Now, since the system state at each control period over a single workload period can be viewed as the time-T_c workload of a G/G/1 queue, we consider the use of the diffusion approximation result for the workload process of a G/G/1 queue (see [4]) to derive an explicit approximating expression for the transition probability of the stochastic dynamic program. Let 1/λ denote the mean interarrival time, 1/μ the mean service time, and c_a^2 (c_s^2) the squared coefficient of variation of the interarrival (service) times, and define ρ := λ/μ. Then the time-T_c workload is approximated by W̃(T_c), where W̃(·) is a reflected Brownian motion with initial level w(1 − ρ), drift −1, and variance (c_a^2 + c_s^2)μ^{−1}. Furthermore, using the time-dependent distribution of the reflected Brownian motion (see [8]), we obtain (28), which serves as the action-dependent transition probability function of the continuous-state-space, continuous-action-space Markov decision process at hand. Specifically, by the value function expression in Section 3 and relation (9), the diffusion-approximated dynamic programming recursion boils down to a one-dimensional optimization subject to (28). This can be solved by first discretizing the state and/or action space and then using the value iteration method.
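For concreteness, the time-dependent distribution of a reflected Brownian motion with drift −1 and variance σ² started at w₀ admits a classical closed form (see [8]), P(W̃(t) ≤ x) = Φ((x − w₀ + t)/(σ√t)) − e^{−2x/σ²} Φ((−x − w₀ + t)/(σ√t)), which can be discretized into a transition matrix for value iteration. A minimal sketch, with grid design and function names ours:

```python
import numpy as np
from scipy.stats import norm

def rbm_cdf(x, w0, t, sigma2):
    """P(W(t) <= x) for reflected Brownian motion with drift -1 and
    variance sigma2, started at w0 (classical closed form, cf. [8])."""
    s = np.sqrt(sigma2 * t)
    return (norm.cdf((x - w0 + t) / s)
            - np.exp(-2.0 * x / sigma2) * norm.cdf((-x - w0 + t) / s))

def transition_matrix(grid, t, sigma2):
    """Discretized transition kernel: P[i, j] approximates the probability
    that the workload moves from grid[i] into the cell around grid[j].
    Assumes grid[0] == 0; mass above the last midpoint is lumped into
    the last cell."""
    edges = np.concatenate([[grid[0]], (grid[:-1] + grid[1:]) / 2.0])
    P = np.zeros((grid.size, grid.size))
    for i, w0 in enumerate(grid):
        cdf = np.array([rbm_cdf(e, w0, t, sigma2) for e in edges] + [1.0])
        P[i] = np.diff(cdf)
    return P
```

Each row of the resulting matrix can then serve as the state-transition distribution inside a standard value-iteration loop over the discretized workload levels.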

Computational Experiments
We end this section by presenting the results of computational experiments that quantitatively investigate some of the results and properties derived above. To ground the discussion, we make a few basic assumptions. First, we assume r > c, since this setting represents a realistic situation where the system is capable of earning a positive profit. Second, motivated by (13) and the discussion preceding it, we fix c = 1 and p_I = 1 and study the performance of control policies as a function of changes in p_D; roughly speaking, the performance of the system is a function of the ratio (c + p_I)/p_D, and therefore only p_D needs to be varied. We set T_c = 1 and the number of control periods to N = 100. Further, we consider a system with a low arrival rate, that is, λ = λ_l = 1, and another with a high arrival rate, that is, λ = λ_h = 5. As mentioned in Section 4.1, we solve (15) to obtain μ*_MG1 by assuming that service demands are i.i.d. Exp(θ), whereupon E[S] = 1/θ and E[S^2] = 2/θ^2. Without loss of generality, we set θ = 1, and therefore E[S] = 1 and E[S^2] = 2. As in Section 4.1, we set the discount factor β = 0 for simplicity.
Before proceeding to describe our computational results, we briefly summarize some of the key insights derived earlier in this section. At the end of Section 4.1, we considered a heuristic control policy that depends on a linear combination of W_n, the backlog workload at the beginning of a period, and some estimate of the expected demand that will be realized in the period, such as μ*_MG1. Our fluid analysis in Section 4.2 confirms this insight: specifically, when r > c, Theorem 4.1 recommends that the system always drain fluid at the depletion rate μ̄_n(w), given that r − c always belongs to the interval I_N by assumption. In essence, our heuristic control policy is an affine decision rule that depends only on the current workload (as opposed to the whole sequence of past workloads) and an affine term that can be chosen proportional to some estimate of anticipated demand, such as λE[S] or μ*_MG1.

For our first set of computational results, we focus on the low-arrival-rate case where λ = λ_l = 1. Figure 1 depicts the expected profit curves as a function of α for three different values of p_D = 1, 5, 10. As expected, the expected profit decreases with an increased backlog penalty; further, the value of α that maximizes profit increases with p_D. This latter trend is depicted in Figure 2, where we consider the best value of α as a function of p_D = 1, . . . , 10. Observe that α* = 0 as long as p_D ≤ c + p_I, which corresponds to our intuition above that in this case the capacity maintenance and idleness costs dominate the backlog penalty, and hence a purely workload-based policy renders optimal expected profits within this family of linear decision rules. Note also that α* appears to grow linearly in p_D once p_D > c + p_I. It is worth pointing out that α allows one to balance between choosing a server with high average utilization or with low average workload.
Hence, we observe in Figure 3 that while increasing α leads to a system with a decreasing average workload and hence a lower backlog penalty, it also leads to the server idling more and thus incurring a higher idling penalty.
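The workload/idleness trade-off behind Figure 3 can be reproduced qualitatively by simulating the affine rule μ_n = W_n/T_c + αλE[S] under a period-level fluid recursion; the Poisson demand model and the crediting of all unused capacity as idleness are simplifying assumptions of this sketch:

```python
import numpy as np

def workload_idle_tradeoff(alpha, lam=1.0, Tc=1.0, n_periods=50000, seed=1):
    """Simulate the linear decision rule mu_n = W_n/Tc + alpha*lam*E[S]
    (with E[S] = 1) under the period-level fluid recursion
        W_{n+1} = max(W_n + D_n - mu_n*Tc, 0),
    where D_n ~ Poisson(lam*Tc) is the work arriving in period n.
    Returns (average workload, average unused capacity per period)."""
    rng = np.random.default_rng(seed)
    w, tot_w, tot_idle = 0.0, 0.0, 0.0
    for _ in range(n_periods):
        mu = w / Tc + alpha * lam        # affine capacity rule, E[S] = 1
        d = rng.poisson(lam * Tc)
        next_w = max(w + d - mu * Tc, 0.0)
        tot_idle += mu * Tc - (w + d - next_w)  # capacity left unused
        tot_w += next_w
        w = next_w
    return tot_w / n_periods, tot_idle / n_periods
```

Raising α should drive the average workload down while the unused (idle) capacity grows, mirroring the qualitative trade-off reported for Figure 3.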
Next, we consider the high-arrival-rate case where λ = λ_h = 5. We observe that a nonzero α always appears to provide the highest expected profit, as illustrated in Figures 4 and 5. The dependence between α* and p_D, however, still appears to follow a fairly linear trend, as shown in Figure 5.
Finally, we consider a case where the demand arrival rate alternates between λ = λ_l = 1 in every odd-numbered control period and λ = λ_h = 5 in every even-numbered control period. We further suppose that the system operator has some indication of this highly variable demand pattern. Note that both of the control approaches compared here, a fluid-model-based policy and a demand-based policy, are similar in spirit to the off-line policy discussed in Section 4.1, since we assume some knowledge of future demand. Through this last computational experiment, we seek to investigate the importance of employing a linear decision rule wherein the capacity depends on the backlogged demand from the previous period. With such a highly variable load, it is almost futile to try to calculate the required capacity based solely on an estimate of expected demand. Figures 6 and 7 demonstrate the stark difference between the two policies: whereas the fluid-model control policy is able to maintain expected profit within a reasonable range over all values of α, expected profit under the demand-based control policy exhibits a large sensitivity to the correct choice of α. Moreover, the expected profit for this latter policy is also lower than that accrued when using the linear decision rule.

GENERALIZED MODELS
To streamline the exposition in earlier sections, we postponed the inclusion of certain problem features, mentioned in the introduction, in our general stochastic model, optimization formulation, and optimal solution. We now discuss how these problem features can be incorporated into our analysis and results of earlier sections.
One of the features not included in our earlier analysis concerns the failure and/or departure of servers. In this case, for each control period n, if the service-rate capacity is set to a level μ_n at the beginning of the control period, then the realized service-rate capacity varies over time within the interval [t^c_{n−1}, t^c_n) according to μ_n(t). To this end, we consider both an additive model and a multiplicative model for μ_n(t):
• additive model: μ_n(t) = μ_n − η(t − (n − 1)T_c), for t ∈ [t^c_{n−1}, t^c_n);
• multiplicative model: μ_n(t) = μ_n e^{−γ(t − (n − 1)T_c)}, for t ∈ [t^c_{n−1}, t^c_n);
where η > 0 and γ > 0 are two fixed parameters. It is then readily verified that the structural properties we derived in Section 3 continue to hold, and that the computational methods for obtaining both the general and surrogate problem solutions are easily adapted to the corresponding GI_m/GI_m/1_n queues with time-varying service rates.
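Under either degradation model, the average realized service rate over a control period follows from a direct integration of μ_n(t) over [0, T_c]; the helper names below are illustrative, and the additive formula is only meaningful while μ_n − ηT_c ≥ 0 (otherwise the rate should be truncated at zero):

```python
import math

def avg_rate_additive(mu, eta, Tc):
    """Average realized rate over a period of length Tc under the additive
    model mu(t) = mu - eta*t: (1/Tc) * integral = mu - eta*Tc/2."""
    return mu - eta * Tc / 2.0

def avg_rate_multiplicative(mu, gamma, Tc):
    """Average realized rate under mu(t) = mu*exp(-gamma*t):
    (1/Tc) * integral = mu * (1 - exp(-gamma*Tc)) / (gamma*Tc)."""
    return mu * (1.0 - math.exp(-gamma * Tc)) / (gamma * Tc)
```

As γ → 0 the multiplicative average tends to μ_n, and both formulas quantify how much nominal capacity is lost to degradation within a single control period.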
On the other hand, to faithfully model the detailed dynamics of server failures and departures, it is essential to consider a multi-server model. Doing so also allows us to incorporate the phenomenon that a server can operate at different speeds, together with the corresponding cost/benefit trade-offs in such a queueing system. Suppose that, for each control period n, we need to determine a vector X_n that consists of the number of servers operating at each of the speed levels s = 1, 2, . . . , S_L. The objective function of our stochastic dynamic programs can then be revised accordingly; as a specific example, we can replace the objective (13) with a version in which the cost factor c is replaced by a vector {c_s} that reflects the different costs incurred for operating servers at the different speed levels. Note that W_n, n = 1, 2, . . . , N, will then follow the much more complex dynamics of a multi-server queueing system with different speed levels for each server, where each server has an independent clock governing its failure and departure from the system. However, even in such very complex instances of our general stochastic dynamic programs, the corresponding optimal solutions continue to enjoy the general structural properties we derived earlier in the paper. We note, however, that detailed fluid and diffusion approximations of such systems require sophisticated studies of multi-server queues with time-varying service rates, which is the subject of future work.

CONCLUSIONS
In this paper, we considered a general class of stochastic dynamic control problems for a queueing system with time-varying Markov-modulated arrival and service processes (indexed by m) and time-varying control of the service rates of the system (indexed by n). Briefly reiterating the contributions of our study, we formulated a stochastic dynamic control problem for maximizing the expected total discounted net-benefit under very general assumptions on the customer arrival and service distributions. For this general formulation, we established fundamentally important structural properties of the optimal solution, including the second-order monotonicity of some key performance metrics as well as results on the concavity of the objective function for our net-benefit maximization problem. These results reduced the calculation of the value function to a sequence of convex programming problems, which in turn allows the computation of the value functions for various real-world problem sizes of practical interest. Although computing the value function for larger-sized problems remains prohibitively difficult due to the inherent curse of dimensionality, the structural properties established in this paper (e.g., concavity) make it possible to employ various additional techniques, such as those from approximate dynamic programming, to compute approximate optimal solutions; refer to, for example, [11]. Furthermore, making use of the derived structural properties, we also demonstrated that there is a critical-value (threshold) type of approximate optimal control policy that can be fairly easily computed together with certain performance guarantees.
We also investigated different options for approximations within a single workload period m. The first class of approximations consists of several parameterized control policies that are commonly used in real applications, along with some observations of system behaviors under such heuristics. Both theoretical and computational results have been obtained on the performance of the system under these control policies. Furthermore, we investigated both fluid and diffusion approximations, which have been successfully applied in general to the analysis and control of complex queueing systems and networks. We obtained the optimal solution for the fluid model, which suggests a certain target level for the expected remaining workload; a heuristic control policy based on these observations was evaluated in our computational experiments. A diffusion approximation provides a further refinement of these models and bridges the differences between the heuristics and the exact solutions. Various computational experiments were conducted, demonstrating and quantifying our results and related observations.