Part VI discusses queueing analysis where the arrival process and/or service process are generally distributed.
We start with Chapter 20, where we study empirical job size distributions from computing workloads. These are often characterized by heavy tails, very high variance, and decreasing failure rate. Importantly, these are very different from the Markovian (Exponential) distributions that have enabled the Markov-chain-based analysis that we have done so far.
New distributions require new analysis techniques. The first of these, the method of phase-type distributions, is introduced in Chapter 21. Phase-type distributions allow us to represent general distributions as mixtures of Exponential distributions. This in turn enables the modeling of systems involving general distributions using Markov chains. However, the resulting Markov chains are very different from what we have seen before and often have no simple solution. We introduce matrix-analytic techniques for solving these chains numerically. Matrix-analytic techniques are very powerful. They are efficient and highly accurate. Unfortunately, they are still numerical techniques, meaning that they can only solve “instances” of the problem, rather than solving the problem symbolically in terms of the input variables.
In Chapter 22 we consider a new setting: networks of Processor-Sharing (PS) servers with generally distributed job sizes. These represent networks of computers, where each computer time-shares among several jobs. We again exploit the idea of phase-type distributions to analyze these networks, proving the BCMP product form theorem for networks with PS servers. The BCMP theorem provides a simple closed-form solution for a very broad class of networks of PS servers.
In Chapter 21, we saw one application for phase-type (PH) distributions: If we need to analyze a system whose workload involves distributions that are non-Exponential (e.g., high-variability workloads), then we can use a PH distribution to at least match 2 or 3 moments of that workload distribution. This allows us to represent the system via a Markov chain, which we can often solve via matrix-analytic methods.
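For concreteness, here is a minimal Python sketch of one standard two-moment fit: matching a given mean and squared coefficient of variation (SCV ≥ 1) with a two-phase hyperexponential under the “balanced means” convention. The function name and example numbers are illustrative, not from the text.

```python
import math

def fit_h2_balanced_means(mean, scv):
    """Fit a two-phase hyperexponential (H2) to a given mean and squared
    coefficient of variation (SCV), using the common 'balanced means'
    convention p1/mu1 = p2/mu2. Requires SCV >= 1."""
    assert scv >= 1.0, "H2 can only match SCV >= 1"
    p1 = 0.5 * (1.0 + math.sqrt((scv - 1.0) / (scv + 1.0)))
    p2 = 1.0 - p1
    mu1 = 2.0 * p1 / mean   # balanced means: p1/mu1 = p2/mu2 = mean/2
    mu2 = 2.0 * p2 / mean
    return (p1, mu1), (p2, mu2)

# Example: match a distribution with mean 1 and SCV 10 (high variability).
(p1, mu1), (p2, mu2) = fit_h2_balanced_means(1.0, 10.0)
# Sanity check: first two moments of the fitted H2.
m1 = p1 / mu1 + p2 / mu2                  # should be 1.0
m2 = 2 * p1 / mu1**2 + 2 * p2 / mu2**2    # SCV = m2/m1^2 - 1 should be 10.0
print(m1, m2 / m1**2 - 1)
```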
In this chapter we see another application of PH distributions. Here, we are interested in analyzing networks of Processor-Sharing (time-sharing) servers (a.k.a. PS servers). It will turn out that networks of PS servers exhibit product form solutions, even under general service times. This is in contrast to networks of FCFS servers, which require Exponential service times. Our proof of this PS result will rely on phase-type distributions. This result is part of the famous BCMP theorem [16].
Review of Product Form Networks
So far we have seen that all of the following networks have product form:
Open Jackson networks: These assume probabilistic routing, FCFS servers with Exponentially distributed service times, Poisson arrivals, and unbounded queues.
Open classed Jackson networks: These are Jackson networks in which the outside arrival rates and routing probabilities can depend on the “class” of the job.
Closed Jackson networks
Closed classed Jackson networks
We have also seen (see Exercise 19.3) that Jackson networks with load-dependent service rates have product form. Here the service rate can depend on the number of jobs at the server. This is useful for modeling the effects of parallel processing.
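As a minimal sketch of what product form buys us computationally, the following Python snippet solves the traffic equations for a hypothetical three-server open Jackson network and evaluates the product-form limiting probabilities; all rates and routing probabilities here are invented for illustration.

```python
import numpy as np

r  = np.array([1.0, 0.5, 0.0])       # outside (Poisson) arrival rate into each server
P  = np.array([[0.0, 0.5, 0.3],      # P[i,j] = prob. a job finishing at i goes to j
               [0.0, 0.0, 0.6],      # (remaining probability leaves the network)
               [0.2, 0.0, 0.0]])
mu = np.array([4.0, 3.0, 3.0])       # Exponential service rate at each server

# Traffic equations: lambda_j = r_j + sum_i lambda_i * P[i,j].
lam = np.linalg.solve(np.eye(3) - P.T, r)
rho = lam / mu                       # must all be < 1 for stability
assert np.all(rho < 1)

def pi(n):
    """Product-form limiting probability of state (n1, n2, n3):
    each server behaves like an independent M/M/1."""
    return np.prod((1 - rho) * rho**np.array(n))

print(rho, pi((0, 0, 0)), pi((2, 1, 0)))
```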
Part V involves the analysis of multi-server and multi-queue systems.
We start in Chapter 14 with the M/M/k server farm model, where k servers all work “cooperatively” to handle incoming requests from a single queue. We derive simple closed-form formulas for the distribution of the number of jobs in the M/M/k. We then exploit these formulas in Chapter 15 to do capacity provisioning for the M/M/k. Specifically, we answer questions such as, “What is the minimum number of servers needed to guarantee that only a small fraction of jobs are delayed?” We derive simple answers to these questions in the form of square-root staffing rules. In these two chapters and the exercises therein, we also consider questions pertaining to resource allocation, such as whether a single fast server is superior to many slow servers, and whether a single central queue is superior to having a queue at each server.
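As a sketch of how such provisioning questions can be answered numerically, the following Python snippet uses the standard Erlang-C expression for the M/M/k probability of queueing, with offered load R = λ/μ; the 20% delay target and function names are illustrative choices, not the text’s.

```python
import math

def erlang_c(k, R):
    """Erlang-C: probability that an arriving job must queue in an
    M/M/k with offered load R = lambda/mu (requires R < k)."""
    assert R < k
    idle_terms = sum(R**i / math.factorial(i) for i in range(k))
    queue_term = (R**k / math.factorial(k)) * (k / (k - R))
    return queue_term / (idle_terms + queue_term)

def min_servers(R, target):
    """Smallest k for which the probability of queueing is <= target."""
    k = math.floor(R) + 1
    while erlang_c(k, R) > target:
        k += 1
    return k

R = 100.0                  # e.g., lambda = 100 jobs/sec, mu = 1 job/sec
k = min_servers(R, 0.20)
print(k, k - R)            # excess servers grow like sqrt(R), not like R
```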
We then move on to analyzing networks of queues, consisting of multiple servers, each with its own queue, with probabilistic routing of packets (or jobs) between the queues. In Chapter 16 we build up the fundamental theory needed to analyze networks of queues. This includes time-reversibility and Burke's theorem. In Chapter 17, we apply our theory to Jackson networks of queues. We prove that these have product form, and we derive the limiting distribution of the number of packets at each queue. Our proofs introduce the concept of Local Balance, which we use repeatedly in derivations throughout the book.
In Chapter 22 we studied the M/G/1/PS queue and derived simple closed-form solutions for π_n, E[N], and E[T] (assuming G is any Coxian distribution).
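For reference, those closed forms are insensitive to G beyond its mean: writing ρ = λE[S], the standard statements are

```latex
\pi_n = (1-\rho)\,\rho^n, \qquad
E[N] = \frac{\rho}{1-\rho}, \qquad
E[T] = \frac{E[S]}{1-\rho}.
```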
In this chapter we move on to the M/G/1/FCFS queue. We have already had some exposure to thinking about the M/G/1/FCFS. Using the matrix-analytic techniques of Chapter 21, we saw that we could solve the M/PH/1/FCFS queue numerically, where PH represents an arbitrary phase-type distribution. However, we still do not have a simple closed-form solution for the M/G/1/FCFS that lets us understand the effect of load and job size variability on mean response time.
This chapter introduces a simple technique, known as the “tagged job” technique, which allows us to obtain a simple expression for mean response time in the M/G/1/FCFS queue. The technique will not allow us to derive the variance of response time, nor will it help us understand the higher moments of the number of jobs in the M/G/1/FCFS – for those, we will need to wait until we get to transform analysis in Chapter 25. Nonetheless, the resulting simple formula for mean response time will lead to many insights about the M/G/1 queue and optimal system design for an M/G/1 system.
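The simple expression in question is the well-known Pollaczek-Khinchine formula. As a sketch of the insight it gives, the following Python snippet (all numbers illustrative) compares two workloads with the same mean but different variability:

```python
def mg1_fcfs_ET(lam, ES, ES2):
    """Pollaczek-Khinchine mean response time for M/G/1/FCFS:
    E[T] = E[S] + lam * E[S^2] / (2 * (1 - rho)), with rho = lam * E[S]."""
    rho = lam * ES
    assert rho < 1
    return ES + lam * ES2 / (2 * (1 - rho))

lam, ES = 0.8, 1.0
ES2_expo = 2 * ES**2    # Exponential job sizes: E[S^2] = 2 E[S]^2 (SCV = 1)
ES2_hv   = 11 * ES**2   # same mean, SCV = 10: E[S^2] = (SCV + 1) E[S]^2

print(mg1_fcfs_ET(lam, ES, ES2_expo))  # 5.0
print(mg1_fcfs_ET(lam, ES, ES2_hv))    # 23.0 -- variability hurts FCFS
```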
The Inspection Paradox
We motivate this chapter by asking several questions. We will come back to these questions repeatedly throughout the chapter. By the end of the chapter everything will be clear.
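As a preview of the paradox itself, here is a small simulation sketch (Python; the distribution and sample sizes are illustrative): an observer who inspects a renewal process at a random time lands in an interval with probability proportional to its length, so the inspected interval has mean E[S²]/E[S] rather than E[S].

```python
import random

random.seed(1)
intervals = [random.expovariate(1.0) for _ in range(100_000)]  # mean E[S] = 1

# A uniformly random inspection time lands in an interval with probability
# proportional to its length, i.e., length-biased sampling:
picks = random.choices(intervals, weights=intervals, k=100_000)

print(sum(intervals) / len(intervals))  # ~1.0  (mean of a typical interval)
print(sum(picks) / len(picks))          # ~2.0  = E[S^2]/E[S] for Exponential(1)
```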
Until now, we have only considered scheduling policies that do not have any knowledge of the job sizes. In this chapter and the next two chapters, we will look at size-based scheduling policies, starting with non-preemptive size-based policies (this chapter) and followed by preemptive size-based policies (next two chapters). The size-based policies that we will be studying include the following:
SJF: Shortest-Job-First, a non-preemptive policy (this chapter).
PSJF: Preemptive-Shortest-Job-First.
SRPT: Shortest-Remaining-Processing-Time, a preemptive policy.
It will be convenient to evaluate these size-based policies as special cases of priority queueing, so we start by analyzing priority queues, which are important in their own right.
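As a sketch of where that analysis leads, the classical mean waiting times for the M/G/1 queue with non-preemptive priority classes (a result commonly credited to Cobham) can be computed as below; the per-class rates and moments are illustrative, and class 1 is assumed to have the highest priority.

```python
def priority_wait(lam, ES, ES2):
    """Mean waiting time per class in an M/G/1 queue with non-preemptive
    priorities (class 0 of the lists = highest priority)."""
    W0 = sum(l * s2 for l, s2 in zip(lam, ES2)) / 2   # mean residual work
    rho = [l * s for l, s in zip(lam, ES)]
    waits, sigma_prev = [], 0.0
    for k in range(len(lam)):
        sigma_k = sigma_prev + rho[k]                  # cumulative load
        waits.append(W0 / ((1 - sigma_prev) * (1 - sigma_k)))
        sigma_prev = sigma_k
    return waits

# Two illustrative classes with Exponential sizes of mean 1 (E[S^2] = 2):
print(priority_wait(lam=[0.3, 0.4], ES=[1.0, 1.0], ES2=[2.0, 2.0]))
```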
Size-based scheduling is a very important topic, which is why we devote three chapters to it. The proper size-based scheduling policy can greatly improve the performance of a system. It costs nothing to alter your scheduling policy (no money, no new hardware), so the performance gain comes for free. The above size-based policies are implemented in real systems. For web servers serving static content, SRPT scheduling has been implemented in the Linux kernel to schedule HTTP requests [92]. It has also been used to combat transient overload in web servers [162]. Priority queues are likewise prevalent in computer systems. Prioritization of jobs is used in databases to provide differentiated levels of service, whereby high-priority transactions (those that bring in lots of money) are given priority over low-priority transactions (those that are less lucrative).
We have seen many examples of systems questions that can be answered by modeling the system as a Markov chain. For a system to be well modeled by a Markov chain, it is important that its workloads have the Markovian property. For example, if job sizes and interarrival times are independent and Exponentially distributed, and routing is probabilistic between the queues, then the system can typically be modeled easily using a CTMC. However, if job sizes or interarrival times are distributed according to a distribution that is not memoryless, for example Uniform(0, 100), then it is not at all clear how a Markov chain can be used to model the system.
In this chapter, we introduce a technique called “the method of stages” or “the method of phases.” The idea is that almost all distributions can be represented quite accurately by a mixture of Exponential distributions, known as a phase-type distribution (PH). We will see how to represent distributions by PH distributions in Section 21.1. Because PH distributions are made up of Exponential distributions, once all arrival and service processes have been represented by PH distributions, we will be able to model our system as a CTMC, as shown in Section 21.2.
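Concretely, a PH distribution is often written as a pair (α, T), where α gives the initial phase probabilities and T is the generator restricted to the transient phases; the moments are then E[X^n] = n! · α(−T)^{−n}·1. A minimal Python sketch, using Erlang-2 (two Exponential stages in series) as the example PH distribution:

```python
import numpy as np
from math import factorial

# Erlang-2 with rate mu per stage: start in phase 1, move to phase 2, exit.
mu = 2.0
alpha = np.array([1.0, 0.0])
T = np.array([[-mu,  mu],
              [0.0, -mu]])

def ph_moment(n, alpha, T):
    """E[X^n] = n! * alpha * (-T)^{-n} * 1 (standard PH moment formula)."""
    M = np.linalg.matrix_power(np.linalg.inv(-T), n)
    return factorial(n) * alpha @ M @ np.ones(len(alpha))

print(ph_moment(1, alpha, T))   # 1.0 = 2/mu, mean of Erlang-2 with mu = 2
print(ph_moment(2, alpha, T))   # 1.5 = 6/mu^2, second moment
```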
The Markov chains that result via the method of phases are often much more complex than Markov chains we have seen until now. They typically cannot be solved in closed form. Thus, in Section 21.3, we introduce the matrix-analytic method, a very powerful numerical method that allows us to solve many such chains that come up in practice.
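To give a flavor of the mechanics before Section 21.3 develops them properly, here is a small Python sketch for a quasi-birth-death (QBD) chain. Under one common block convention (A0 = up one level, A1 = local, A2 = down one level), the repeating levels satisfy π_{n+1} = π_n R, where R is the minimal nonnegative solution of A0 + R A1 + R² A2 = 0; a simple fixed-point iteration finds it. The example chain (an M/E2/1 queue) and its rates are illustrative.

```python
import numpy as np

# M/E2/1: Poisson(lam) arrivals, Erlang-2 service with rate mu per stage.
# Phase = current service stage; level = number of jobs in system.
lam, mu = 1.0, 3.0                      # load rho = lam * (2/mu) = 2/3 < 1
A0 = lam * np.eye(2)                    # arrival: up one level, same phase
A1 = np.array([[-(lam + mu), mu],       # local: finish stage 1, enter stage 2
               [0.0, -(lam + mu)]])
A2 = np.array([[0.0, 0.0],              # finish stage 2: job departs,
               [mu,  0.0]])             # next job starts in stage 1

R = np.zeros((2, 2))
for _ in range(1000):                   # fixed-point iteration for R
    R = -(A0 + R @ R @ A2) @ np.linalg.inv(A1)

print(np.max(np.abs(A0 + R @ A1 + R @ R @ A2)))   # residual ~ 0
print(np.max(np.abs(np.linalg.eigvals(R))))        # spectral radius < 1
```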
We have alluded several times during this book to the fact that computing workloads have highly variable job sizes (service requirements) that are not well described by an Exponential distribution. This chapter is a story of my own experience in studying UNIX jobs in the mid-1990s, as a PhD student at U.C. Berkeley. Results of this research are detailed in [84, 85]. The story serves as both an introduction to empirical measurements of computer workloads and as a case study of how a deeper understanding of computer workloads can lead to improved computer system designs. The remaining chapters in the book address modeling and performance evaluation of systems with high-variability workloads.
Grad School Tales … Process Migration
In the mid-1990s, an important research area was CPU load balancing in a Network of Workstations (known at U.C. Berkeley as the “N.O.W. project”). The idea in CPU load balancing is that CPU-bound jobs might benefit from being migrated from a heavily loaded workstation to a more lightly loaded workstation in the network. CPU load balancing is still important in today’s networks of servers. It is not free, however: Migration can be expensive if the job has a lot of “state” that has to be migrated with the job (e.g., lots of open files associated with the job), as is common for jobs that have been running for a while. When the state associated with a job is large, the time to migrate the job to another machine is high, and hence it might not be worth migrating that job.
In today’s high-volume world, almost no websites, compute centers, or call centers consist of just a single server. Instead, a “server farm” is used. A server farm is a collection of servers that work together to handle incoming requests. Each request might be routed to a different server, so that servers “share” the incoming load. From a practical perspective, server farms are often preferable to a single “super-fast” server because of their low cost (many slow servers are cheaper than a single fast one) and their flexibility (it is easy to increase/decrease capacity as needed by adding/removing servers). These practical features have made server farms ubiquitous.
In this chapter, we study server farms where there is a single queue of requests and where each server, when free, takes the next request off the queue to work on. Specifically, there are no queues at the individual servers. We defer discussion of models with queues at the individual servers to the exercises and later chapters.
The two systems we consider in this chapter are the M/M/k system and the M/M/k/k system. In both, the first “M” indicates that we have memoryless interarrival times, and the second “M” indicates memoryless service times. The third field denotes that k servers share a common pool of arriving jobs. For the M/M/k system, there is no capacity constraint, and this common pool takes the form of an unbounded FCFS queue, as shown later in Figure 14.3, where each server, when free, grabs the job at the head of the queue to work on.
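In the M/M/k/k, by contrast, there is no queue: an arrival that finds all k servers busy is blocked and lost. Its blocking probability is given by the Erlang-B formula; the following Python sketch computes it via the standard numerically stable recursion (example values illustrative).

```python
def erlang_b(k, R):
    """Erlang-B: probability an arrival is blocked in an M/M/k/k
    (k servers, no queue) with offered load R = lambda/mu, computed
    via the recursion B(j) = R*B(j-1) / (j + R*B(j-1)), B(0) = 1."""
    B = 1.0
    for j in range(1, k + 1):
        B = R * B / (j + R * B)
    return B

# Example: 10 servers, offered load 7 => several percent of arrivals blocked.
print(erlang_b(10, 7.0))
```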