This chapter begins our study of Markov chains, specifically discrete-time Markov chains. In this chapter and the next, we limit our discussion to Markov chains with a finite number of states. Our focus in this chapter will be on understanding how to obtain the limiting distribution for a Markov chain.
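As a taste of what is to come, here is a minimal numerical sketch (using our own illustrative two-state chain, not one from the chapter): raising the transition matrix to a high power makes every row converge to the limiting distribution.

```python
import numpy as np

# A hypothetical 2-state weather chain: state 0 = sunny, state 1 = rainy.
# Entry P[i][j] is the probability of moving from state i to state j.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Row i of P^n holds the n-step transition probabilities out of state i,
# so for large n every row approaches the limiting distribution.
Pn = np.linalg.matrix_power(P, 50)
print(Pn)  # every row is approximately [0.833, 0.167]
```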
In the last two chapters we studied many tail bounds, including those from Markov, Chebyshev, Chernoff, and Hoeffding. We also studied a tail approximation based on the Central Limit Theorem (CLT). In this chapter we will apply these bounds and approximations to an important problem in computer science: the design of hashing algorithms. In fact, hashing is closely related to the balls-and-bins problem that we recently studied in Chapter 19.
This part of the book is devoted to randomized algorithms. A randomized algorithm is simply an algorithm that uses a source of random bits, allowing it to make random moves. Randomized algorithms are extremely popular in computer science because (1) they are highly efficient (have low runtimes) on every input, and (2) they are often quite simple.
In the previous chapter, we studied individual continuous random variables. We now move on to discussing multiple random variables, which may or may not be independent of each other. Just as in Chapter 3 we used a joint probability mass function (p.m.f.), we now introduce the continuous counterpart, the joint probability density function (joint p.d.f.). We will use the joint p.d.f. to answer questions about the expected value of one random variable, given some information about the other random variable.
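For reference, the standard definitions look like this (standard notation; the book's own notation may differ slightly): the conditional density is the joint density renormalized by the marginal, and the conditional expectation integrates against it.

```latex
% Conditional density and conditional expectation via the joint p.d.f.
f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)},
\qquad
\mathbb{E}[X \mid Y = y] = \int_{-\infty}^{\infty} x \, f_{X \mid Y}(x \mid y)\, dx .
```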
This final part of the book is devoted to the topic of Markov chains. Markov chains are an extremely powerful tool used to model problems in computer science, statistics, physics, biology, and business – you name it! They are used extensively in AI/machine learning, computer science theory, and in all areas of computer system modeling (analysis of networking protocols, memory management protocols, server performance, capacity provisioning, disk protocols, etc.). Markov chains are also very common in operations research, including supply chain, call center, and inventory management.
We have studied several common continuous distributions: the Uniform, the Exponential, and the Normal. However, if we turn to computer science quantities, such as file sizes, job CPU requirements, IP flow times, and so on, we find that none of these are well represented by the continuous distributions that we’ve studied so far. To understand the type of distributions that come up in computer science, it’s useful to start with a story.
This chapter introduces randomized algorithms. We start with a discussion of the differences between randomized algorithms and deterministic algorithms. We then introduce the two primary types of randomized algorithms: Las Vegas algorithms and Monte Carlo algorithms. This chapter and its exercises will contain many examples of randomized algorithms, all of the Las Vegas variety. In Chapter 22 we will turn to examples of the Monte Carlo variety.
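To fix ideas, here is a minimal sketch of a classic Las Vegas algorithm (our own illustrative example, not necessarily one from the chapter): the answer returned is always correct; only the running time is random.

```python
import random

def find_match(arr, target):
    """Las Vegas search: always returns a correct index of `target`;
    only the number of probes is random. Assumes `target` occurs in `arr`."""
    while True:
        i = random.randrange(len(arr))
        if arr[i] == target:
            return i

# If half of the entries equal the target, each probe succeeds with
# probability 1/2, so the expected number of probes is 2.
arr = ['a', 'b'] * 10
print(find_match(arr, 'a'))
```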
In this part of the book we delve deeply into understanding the tail of a random variable, namely the probability that the random variable exceeds some value. While we briefly touched on this topic in Section 5.9, in Chapter 18 we derive much more sophisticated tail bounds, including Chernoff bounds and Hoeffding bounds.
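In their generic form (the standard derivation: Markov's inequality applied to the random variable e^{tX}, then optimized over t), Chernoff bounds look like this:

```latex
% Generic Chernoff bound: for any t > 0,
P(X \ge a) = P\!\left(e^{tX} \ge e^{ta}\right)
\le \frac{\mathbb{E}\!\left[e^{tX}\right]}{e^{ta}},
\quad \text{hence} \quad
P(X \ge a) \le \min_{t > 0}\, e^{-ta}\, \mathbb{E}\!\left[e^{tX}\right].
```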
In Chapter 3, we studied several common discrete distributions. In this chapter we will learn how to obtain their mean, or expectation. We will also cover some useful tools that help us to simplify deriving expectations, such as the linearity of expectation result and deriving expectations by conditioning.
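For reference, these two tools take the following standard forms (stated here for discrete random variables): linearity of expectation requires no independence, and conditioning breaks an expectation into cases.

```latex
% Linearity of expectation (no independence needed) and
% computing an expectation by conditioning on a second random variable Y.
\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y],
\qquad
\mathbb{E}[X] = \sum_{y} \mathbb{E}[X \mid Y = y]\, P(Y = y).
```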
In Chapter 6, we covered a type of generating function known as the z-transform, which is particularly well suited to discrete, integer-valued random variables. In this chapter, we will introduce a new type of generating function, called the Laplace transform, which is particularly well suited to common continuous random variables.
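In standard notation (the hat and tilde symbols below are one common convention and may differ from the book's), the two transforms are:

```latex
% z-transform (probability generating function) of a non-negative,
% integer-valued X, and Laplace transform of a non-negative continuous X.
\widehat{X}(z) = \mathbb{E}\!\left[z^{X}\right] = \sum_{k=0}^{\infty} z^{k}\, P(X = k),
\qquad
\widetilde{X}(s) = \mathbb{E}\!\left[e^{-sX}\right] = \int_{0}^{\infty} e^{-sx} f_X(x)\, dx .
```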
An important and ubiquitous continuous distribution is the Normal distribution (also called the Gaussian). Normal distributions occur frequently in statistics, economics, natural sciences, and social sciences. For example, IQs approximately follow a Normal distribution. Men’s heights and weights are approximately Normally distributed, as are women’s heights and weights.
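For reference, the Normal density with mean \mu and variance \sigma^2 has the standard form:

```latex
% Density of the Normal (Gaussian) distribution with mean mu, variance sigma^2.
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},
\qquad -\infty < x < \infty .
```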
The “budgeting for SDGs” (B4SDGs) paradigm seeks to coordinate the budgeting process of the fiscal cycle with the sustainable development goals (SDGs) set by the United Nations. Integrating the goals into public financial management systems is crucial for an effective alignment of national development priorities with the objectives set in the 2030 Agenda. Within the dynamic process defined in the B4SDGs framework, the step of SDG budget tagging is a precondition for subsequent budget diagnostics. However, developing a national SDG taxonomy requires substantial investment of time and of human and administrative resources. Such costs are exacerbated in least-developed countries, which are often characterized by constrained institutional capacity. Automating SDG budget tagging could be a cost-effective solution. We use well-established text analysis and machine learning techniques to explore the scope and scalability of automatically labeling budget programs within the B4SDGs framework. The results show that, while our classifiers can achieve high accuracy, they face limitations when trained on data that is not representative of the institutional setting considered. These findings imply that a national government trying to integrate SDGs into its planning and budgeting practices cannot rely solely on artificial intelligence (AI) tools and off-the-shelf coding schemes. Our results are relevant to academics and the broader policymaking community, contributing to the debate around the strengths and weaknesses of adopting computer algorithms to assist decision-making processes.
We have alluded to the fact that probability is useful in the performance analysis and design of computer systems. Queueing theory is an area of applied probability which directly targets systems performance. Here the “system” might refer to a computer system, a call center, a healthcare system, a manufacturing system, a banking system, or one of many other examples. Markov chains (particularly continuous-time chains) are just one of many tools used in queueing theory. In this final part of the book, we provide a very brief introduction to queueing theory. For a much more in-depth coverage, see [35].
At this point, we have discussed many discrete and continuous distributions. This chapter shows how we can generate instances of these distributions and others. This is helpful when performing simulations of computer systems, as in Chapter 14. For example, we might have a computer system where the interarrival times of jobs are well modeled by an Exponential distribution and the job sizes (service requirements) are well modeled by a Pareto distribution. To simulate the system, we need to be able to generate instances of Exponential and Pareto random variables.
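As a sketch of one standard generation method, inverse transform sampling (the function names and parameters here are our own illustrative choices), both distributions can be generated from a single Uniform(0,1) sample by inverting the c.d.f.:

```python
import math
import random

def exponential(rate):
    """Inverse transform: if U ~ Uniform(0,1), then -ln(1-U)/rate ~ Exp(rate)."""
    u = random.random()
    return -math.log(1.0 - u) / rate   # random() is in [0,1), so 1-u > 0

def pareto(alpha, x_min=1.0):
    """Inverse transform for a Pareto with tail P(X > x) = (x_min/x)^alpha."""
    u = random.random()
    return x_min / (1.0 - u) ** (1.0 / alpha)

# Example: interarrival times with rate 2 (mean 1/2), job sizes with tail
# index 2.5.
arrivals = [exponential(2.0) for _ in range(5)]
sizes = [pareto(2.5) for _ in range(5)]
print(arrivals, sizes)
```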
So far we have only talked about discrete-time Markov chains (DTMCs) with a finite number of states. Now we move on to infinite-state DTMCs. For a Markov chain with an infinite number of states, one can still imagine a transition probability matrix, P, but the matrix has infinite dimension.
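For concreteness, here is one illustrative infinite-state chain (our own example, not necessarily the chapter's): a random walk on the non-negative integers that steps right with probability p and otherwise steps left, staying put at state 0.

```latex
% Transition matrix of a random walk on the states 0, 1, 2, ...:
% from state i, move to i+1 with probability p, otherwise to max(i-1, 0).
P =
\begin{pmatrix}
1-p    & p      & 0      & 0      & \cdots \\
1-p    & 0      & p      & 0      & \cdots \\
0      & 1-p    & 0      & p      & \cdots \\
\vdots &        & \ddots & \ddots & \ddots
\end{pmatrix}
```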