QUEUES FEATURE IN our daily lives like never before. From the checkout counter in the community grocery store to customer support over the phone, queues are theatres of great social and engineering drama. The entire business operations of many leading companies are geared towards providing hassle-free customer support and experience – the timely and effective resolution of client queries about services. For others, such as a multiplex cinema operator managing ticket sales, it is a matter of effective traffic management and resource optimization. Sometimes a queue may not involve humans at all, as in the case of a database query to a computer server that is routed through a job queue. How a queue moves in time and how services are offered over epochs determine how profitably businesses operate and how efficiently computer servers execute tasks. All of this has a huge technological and economic impact. No wonder we have seen huge investments by stakeholders to upgrade and upscale hardware and software infrastructure, re-engineering queues towards greater system efficiency and profitability. The mathematical technology of queues is crafted out of models that investigate and replicate the stochastic behavior of engineering systems. This is the subject of our study in this chapter.
STATISTICAL EXPERIMENTS ENABLE us to make inferences from data about parameters that characterize a population. Generally speaking, inferences may be of two types, namely, deductive inference and inductive inference. Deductive inference pertains to conclusions based on a set of premises (propositions) and their synthesis. Deductive reasoning has a definitive character. For example: all men are mortal (first proposition); Socrates is a man (second proposition); hence, Socrates is mortal (deductive conclusion). On the other hand, inductive inference has a probabilistic character. One conducts an experiment and collects data. Based on these data, certain conclusions are drawn that may have a broader applicability beyond the contours of the particular experiment performed by the researcher. This generalization of the conclusions drawn from the particular experiment constitutes the framework of inductive reasoning. For example, the heights of a small group of people belonging to a certain population are measured. Upon finding that, for this small group, the average height of the men is greater than the average height of the women, it is inferred that the men of this population are generally taller than the women.
The formal practice of inductive reasoning dates back to the thesis of Gottfried Wilhelm Leibniz (see Figure 5.1). He was the first to propose that probability is a relation between hypothesis and evidence (data). His thesis was founded on three conceptual pillars: chance (probability), possibilities (realizable random events), and ideas (generalization of inferences by induction). We have encountered the first two concepts in earlier chapters of this textbook. In this chapter, we will delve into the third theme whereby we will discuss methods to draw conclusions from data derived from statistical experiments based on the principles of inductive reasoning.
Our lived experiences are punctuated by events that are sometimes a result of our purposeful intentions and at other times outcomes that happen by pure chance. Even at an abstract level, it is a very human endeavor to deduce meaning from seemingly random observations – an exercise whose primary objective is to derive a causal structure in observed phenomena. In fact, our whole intellectual pursuit that differentiates us from other beings can be understood through our inner urge to discover the very purpose of our existence and the conditions that make this possible. This eternal play between chance episodes and purposeful volition manifests in diverse situations that I have labored to recreate through computer simulations of realistic events. This play has a dual role – first, it binds together the flow of our varied experiences and, second, it offers us a perspective to assimilate our understanding of the events happening around us that affect us. In order to appreciate this play of chance and purpose, it is essential that students and readers have a conceptual grounding in probability, statistics, and stochastic processes. Therefore, several playful computer simulations and projects are interlaced with theoretical foundations and numerical examples – both solved and exercise problems. In this way, the presentation in this book remains true to its spirit of inviting thoughtful readers into the various aspects of this area of study.
Historical remark
The advent of a rigorous framework for studying probability and statistics dates back to the eighth century AD and is documented in the works of Al-Khalil, who was an Arab philologist. This branch of mathematics continues to be under development with major contributions from Soviet mathematician Andrey N. Kolmogorov, who developed the modern foundations of probability and statistical theory from a measure-theoretic standpoint in the twentieth century.
DISTRIBUTIONS ARE GENERALIZATIONS of mathematical functions from a purely technical standpoint. But perhaps it is most pertinent to begin by asking a more utilitarian question: why should we study distributions? Specifically, why should we study probability distributions? One of the motivations stems from a practical limitation of experimental measurements that is underlined by the uncertainty principle postulated by Werner Heisenberg (see Figure 2.1). The very fabric of reality and the structure of the scientific laws that govern our ability to understand physical phenomena demand a probabilistic (statistical) approach. Our inability to make infinite-precision measurements necessitates taking averages over many measurements, made under similar conditions, as a more reliable strategy to affix experimental values to unknowns with reasonable accuracy.
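This averaging strategy can be made concrete with a small simulation. The sketch below is illustrative only – the measured quantity, the noise scale, and the sample size are invented for this example, not taken from the text:

```python
import random

random.seed(0)
TRUE_VALUE = 9.81  # hypothetical quantity being measured (invented for illustration)

def measure():
    # One noisy measurement: the true value plus zero-mean Gaussian noise.
    return TRUE_VALUE + random.gauss(0, 0.5)

# A single measurement can be off by a sizeable fraction of the noise scale,
# but the average of many repeated measurements, taken under identical
# conditions, concentrates near the true value.
n = 10_000
estimate = sum(measure() for _ in range(n)) / n
print(round(estimate, 3))
```

The standard error of the average shrinks like the square root of the number of measurements, which is precisely why repetition under similar conditions yields reasonable accuracy despite noisy individual readings.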
The advent of the internet and sensor technology has enabled humankind to collect, store, and share data in bulk. In turn, access to a variety of data has amplified a different kind of problem, which is to devise an appropriate strategy to derive meaning from data. Indeed, extracting information from data has acquired the highest priority among tasks performed by engineers and scientists alike. State-of-the-art machine learning algorithms are used to process and analyze data in order to leverage maximum gains in developing new technology and creating a new body of knowledge.
Further, the data-rich tech-universe has inherent complexity in addition to its vastness in terms of sheer numbers. This complexity arises from the fact that the data is often embedded in a higher-dimensional space. For example, the data acquired by a camera hosted on a robot is in the form of multiple grayscale images (frames); each frame is constituted of a sequence of numbers representing the intensity of grayness of each pixel. If each image has a resolution of 100 × 100 (pixel count), then this image data is embedded in a 10000-dimensional space. Additionally, if the camera records 100 frames per second for one minute, then we have 6000 data points in a 10000-dimensional space. This is just an illustrative example of how a high-dimensional large data set may be generated. Quite evidently, not all of the 10000 dimensions host most of the information. One of the most important techniques that we will learn in this chapter will allow us to extract a lower-dimensional representation of the data set that retains sufficient information for the robot to navigate and perform its tasks.
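One standard technique of the kind alluded to here is principal component analysis (PCA); the sketch below shows the idea on synthetic data. The sizes are scaled down from the 6000 × 10000 camera example purely for speed, and the assumption that the data secretly varies along only 5 directions is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Scaled-down stand-in for the camera example: 600 "frames" of 1000 "pixels"
# each (instead of 6000 x 10000). The frames are built to vary along only
# 5 latent directions, plus a little measurement noise.
n_frames, n_pixels, true_rank = 600, 1000, 5
latent = rng.normal(size=(n_frames, true_rank))
mixing = rng.normal(size=(true_rank, n_pixels))
frames = latent @ mixing + 0.01 * rng.normal(size=(n_frames, n_pixels))

# PCA via the singular value decomposition of the mean-centered data:
centered = frames - frames.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
variance = s**2 / n_frames  # variance captured along each principal direction
explained = variance[:true_rank].sum() / variance.sum()
print(f"top {true_rank} of {n_pixels} directions explain {explained:.1%} of the variance")
```

Almost all of the variance lives in a handful of directions, so projecting each frame onto those directions gives a faithful low-dimensional representation of a nominally 1000-dimensional data set.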
MARKOV CHAINS WERE first formulated as a stochastic model1 by Russian mathematician Andrei Andreevich Markov. Markov spent most of his professional career at St. Petersburg University and the Imperial Academy of Science. During this time, he specialized in the theory of numbers, mathematical analysis, and probability theory. His work on Markov chains utilized finite square matrices (stochastic matrices) to show that the two classical results of probability theory, namely, the weak law of large numbers and the central limit theorem, can be extended to the case of sums of dependent random variables. Markov chains have wide scientific and engineering applications in statistical mechanics, financial engineering, weather modeling, artificial intelligence, and so on. In this chapter, we will look at a few applications as we build the concepts of Markov chains. Additionally, we will also implement a technique (using Markov chains) to solve a simple and practical engineering problem related to aircraft control and automation.
3.1 Chapter objectives
The chapter objectives are listed as follows.
1. Students will learn the definition and applications of Markov processes.
2. Students will learn the definition of the stochastic matrix (also known as the probability transition matrix) and perform simple matrix calculations to compute conditional probabilities.
3. Students will learn to solve engineering and scientific problems based on discrete time Markov chains (DTMCs) using multi-step transition probabilities.
4. Students will learn to compute return times and hitting times to Markov states.
5. Students will learn to classify different Markov states.
6. Students will learn to use the techniques of DTMCs introduced in this chapter to solve a complex engineering problem related to flight control operations.
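As a warm-up for objectives 2 and 3, the sketch below builds a stochastic matrix for a hypothetical two-state DTMC and computes multi-step transition probabilities via matrix powers; the weather interpretation and all numbers are invented for illustration, not taken from the chapter:

```python
import numpy as np

# Toy two-state weather chain: state 0 = "sunny", state 1 = "rainy".
# Row i of the stochastic matrix P holds the probabilities of moving from
# state i to each state in one step, so every row sums to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Multi-step transition probabilities: the n-step matrix is simply P**n.
P3 = np.linalg.matrix_power(P, 3)
print(P3[0, 1])  # P(rainy in 3 days | sunny today) = 0.156

# Long-run (stationary) distribution pi solves pi P = pi with sum(pi) = 1;
# here it is the left eigenvector of P for eigenvalue 1, normalized.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()
print(pi)  # approximately [0.833, 0.167]
```

The same matrix-power recipe scales to any finite DTMC, which is what makes the multi-step transition probabilities of objective 3 computationally routine.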
An intricate landscape of bias permeates biomedical research. In this groundbreaking exploration, the myriad sources of bias shaping research outcomes, from cognitive biases inherent in researchers to the selection of study subjects and data interpretation, are examined in detail. With a focus on randomized controlled trials, pharmacologic studies, genetic research, animal studies, and pandemic analyses, it illuminates how bias distorts the quest for scientific truth. Historical and contemporary examples vividly illustrate the impact of biases across research domains. Offering insights on recognizing and mitigating bias, this comprehensive work equips scientists and research teams with tools to navigate the complex terrain of biased research practices. A must-read for anyone seeking a deeper understanding of the critical role biases play in shaping the reliability and reproducibility of biomedical research.
Understanding change over time is a critical component of social science. However, data measured over time – time series – require their own set of statistical and inferential tools. In this book, Suzanna Linn, Matthew Lebo, and Clayton Webb explain the most commonly used time series models and demonstrate their applications using examples. The guide outlines the steps taken to identify a series, make determinations about exogeneity/endogeneity, and make appropriate modelling decisions and inferences. Detailing challenges and explanations of key techniques not covered in most time series textbooks, the authors show how navigating between data and models, deliberately and transparently, allows researchers to clearly explain their statistical analyses to a broad audience.
Owing to their innovative guarantee features, variable annuities have gained significant traction as retirement products in recent years. Amongst these guarantees, the guaranteed minimum income benefit (GMIB) stands out as an appealing rider that can be integrated into variable annuity contracts. In this research, we construct a comprehensive modelling framework that encompasses three sources of uncertainty, namely interest risk, mortality risk and investment risk, with the aim of valuing the GMIB. These risk factors are modelled stochastically whilst accounting for the interdependence between interest and mortality risks. The numéraire transformation technique is utilised in our approach, capitalising on the concepts of the forward and endowment-risk-adjusted measures. By considering two distinct settings of the Benefit Base functions, we derive an analytic solution for the GMIB. Our numerical findings demonstrate the superiority of our proposed methodology vis-à-vis the standard Monte Carlo simulation benchmark in terms of computational accuracy and efficiency, achieving a remarkable average reduction of 99% in computing time. Furthermore, we conduct an extensive sensitivity analysis to explore the impact of various model parameters on the value of the GMIB.
Lifetime pension pools—also known as group self-annuitization plans, pooled annuity funds, and variable payment life annuities in the literature—offer retirees lifelong income by collectively managing mortality risk and adjusting benefits based on the investment performance and the mortality experience within the pool. The benefit structure hinges on two key design parameters: the investment policy and the hurdle rate. However, past research offers limited guidance on optimal asset allocation in such settings, often relying on overly simplistic strategies. Furthermore, the choice of hurdle rate has received virtually no attention in the literature. This study addresses this gap by jointly analyzing optimal hurdle rates and investment strategies using a dynamic programming approach that allows for varying degrees of risk aversion via a hyperbolic absolute risk aversion utility function. Our findings reveal that, as risk aversion increases, the model favours more conservative portfolios and lower hurdle rates; conversely, lower risk aversion supports riskier allocations and higher hurdle rates. The threshold parameter—which reflects the minimum acceptable level of consumption—plays a critical role in shaping the hurdle rate behaviour.
We consider instrumental variables (IV) estimation of a possibly infinite order dynamic panel autoregressive (AR) process with individual effects. The estimation is based on the sieve AR approximation, with its lag order increasing with sample size. Transforming the variable to eliminate individual effects generates an endogeneity problem, particularly when the time series is only moderately long. IV approaches are useful to obtain well-behaved estimators in panels with large cross sections. We establish the consistency and asymptotic normality of the IV estimators, including the Anderson-Hsiao, generalized method of moments, and double filter IV (DFIV) estimators. The theoretical results are obtained under homoskedasticity using double asymptotics under which both the cross-sectional sample size and the length of the time series tend to infinity. The finite-sample performance of the estimators is examined using Monte Carlo simulation. Our preferred estimator is the DFIV estimator, as it exhibits excellent performance in terms of bias and coverage probability, despite its finite-sample distribution being relatively dispersed.
Core-periphery (CP) structure is frequently observed in networks whose nodes form two distinct groups: a small, densely interconnected core and a sparse periphery. Borgatti and Everett (Borgatti, S. P., & Everett, M. G. (2000). Models of core/periphery structures. Social Networks, 21(4), 375–395) proposed one of the most popular methods to identify and quantify CP structure by comparing the observed network with an “ideal” CP structure. While this metric has been widely used, an improved algorithm is still needed. In this work, we detail a greedy, label-switching algorithm to identify CP structure that is both fast and accurate. By leveraging a mathematical reformulation of the CP metric, our proposed heuristic offers an order-of-magnitude improvement in the number of operations compared to a naive implementation. We prove that the algorithm monotonically ascends to a local maximum while consistently yielding solutions within 90% of the global optimum on small toy networks. On synthetic networks, our algorithm exhibits superior classification accuracy and run-times compared to a popular competing method, and on one real-world network, it is 340 times faster.
We study the multiserver-job setting in the load-focused multilevel scaling limit, where system load approaches capacity much faster than the growth of the number of servers $n$. We consider the “1 and $n$” system, where each job requires either one server or all $n$. Within the multilevel scaling limit, we examine three regimes: load dominated by $n$-server jobs, 1-server jobs, or balanced. In each regime, we characterize the asymptotic growth rate of the boundary of the stability region and the scaled mean queue length. We demonstrate that mean queue length peaks near balanced load via theory, numerics, and simulation.
Given a collection $\mathcal{D} =\{D_1,D_2,\ldots ,D_m\}$ of digraphs on the common vertex set $V$, an $m$-edge digraph $H$ with vertices in $V$ is transversal in $\mathcal{D}$ if there exists a bijection $\varphi \,:\,E(H)\rightarrow [m]$ such that $e \in E(D_{\varphi (e)})$ for all $e\in E(H)$. Ghouila-Houri proved that any $n$-vertex digraph with minimum semi-degree at least $\frac {n}{2}$ contains a directed Hamilton cycle. In this paper, we provide a transversal generalisation of Ghouila-Houri’s theorem, thereby solving a problem proposed by Chakraborti, Kim, Lee, and Seo. Our proof utilises the absorption method for transversals, the regularity method for digraph collections, as well as the transversal blow-up lemma and the related machinery. As an application, when $n$ is sufficiently large, our result implies the transversal version of Dirac’s theorem, which was proved by Joos and Kim.