In this chapter, we describe a few discrete probability models to which we will come back repeatedly throughout the book. While there exists a vast array of well-studied random combinatorial structures (permutations, partitions, urn models, Boolean functions, polytopes, etc.), our focus is primarily on a limited number of graph-based processes, namely percolation, random graphs, Ising models, and random walks on networks. We will not attempt to derive the theory of these models exhaustively here. Instead we will employ them to illustrate some essential techniques from discrete probability. Note that the toolkit developed in this book is meant to apply to other probabilistic models of interest as well, and in fact many more will be encountered along the way. After a brief review of graph basics and Markov chain theory, we formally introduce our main models. We also formulate various key questions about these models that will be answered (at least partially) later on. We assume that the reader is familiar with the measure-theoretic foundations of probability. A refresher on all required concepts and results is provided in the appendix.
Branching processes, which are the focus of this chapter, arise naturally in the study of stochastic processes on trees and locally tree-like graphs. Similarly to martingales, finding a hidden branching process within a probabilistic model can lead to useful bounds and insights into asymptotic behavior. After a review of the extinction theory of branching processes and of a fruitful random-walk perspective, we give a couple of examples of applications in discrete probability. In particular we analyze the height of a binary search tree, a standard data structure in computer science. We also give an introduction to phylogenetics, where a “multitype” variant of the Galton–Watson branching process plays an important role; we use the techniques derived in this chapter to establish a phase transition in the reconstruction of ancestral molecular sequences. We end this chapter with a detailed look into the phase transition of the Erdős–Rényi graph model. The random-walk perspective mentioned above allows one to analyze the “exploration” of a largest connected component, leading to information about the “evolution” of its size as edge density increases.
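As a small illustration of the extinction theory mentioned above, the following sketch computes the extinction probability of a Galton–Watson process by iterating the offspring probability generating function from 0, which converges to its smallest fixed point in [0, 1]. The Poisson offspring distribution, the function name, and the iteration count are assumptions made for this example; they are not taken from the chapter.

```python
import math


def extinction_probability(mean_offspring, iterations=500):
    """Approximate the extinction probability of a Galton-Watson process.

    Assumes (for illustration only) a Poisson offspring distribution with
    the given mean, whose generating function is f(s) = exp(mean * (s - 1)).
    Iterating q <- f(q) from q = 0 increases to the smallest fixed point
    of f in [0, 1], which is the extinction probability.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(mean_offspring * (q - 1.0))
    return q


if __name__ == "__main__":
    for mean in (0.5, 2.0):
        print(mean, extinction_probability(mean))
```

With mean offspring at most 1 the iteration approaches 1 (almost sure extinction), while a mean above 1 gives a value strictly below 1. At the critical mean of exactly 1 convergence is slow, so a fixed iteration count only gives a rough approximation there.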
In this chapter, we turn to martingales, which play a central role in probability theory. We illustrate their use in a number of applications to the analysis of discrete stochastic processes. After some background on stopping times and a brief review of basic martingale properties and results, we develop two major directions. We show how martingales can be used to derive a substantial generalization of our previous concentration inequalities – from the sums of independent random variables we focused on previously to nonlinear functions with Lipschitz properties. In particular, we give several applications of the method of bounded differences to random graphs. We also discuss bandit problems in machine learning. In the second thread, we give an introduction to potential theory and electrical network theory for Markov chains. This toolkit in particular provides bounds on hitting times for random walks on networks, with important implications in the study of recurrence among other applications. We also introduce Wilson’s remarkable method for generating uniform spanning trees.
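Wilson's method builds a uniform spanning tree from loop-erased random walks. The sketch below is one common way to write it, assuming a connected, undirected graph given as an adjacency dictionary; the names `wilson_spanning_tree` and `grid_graph` are hypothetical helpers introduced for this example, not the chapter's notation.

```python
import random


def wilson_spanning_tree(adj, root, rng=None):
    """Sample a spanning tree with Wilson's loop-erased random walk method.

    adj maps each vertex to a list of its neighbors. The graph is assumed
    to be connected and undirected; otherwise the walk may never terminate.
    Returns a dict mapping every non-root vertex to its parent in the tree.
    """
    rng = rng or random.Random()
    in_tree = {root}
    parent = {}
    for start in adj:
        # Random walk from `start` until the current tree is hit. Overwriting
        # parent[u] on each visit keeps only the last exit from u, which
        # erases loops implicitly.
        u = start
        while u not in in_tree:
            parent[u] = rng.choice(adj[u])
            u = parent[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = parent[u]
    return parent


def grid_graph(n):
    """Adjacency dict of the n-by-n grid graph (illustrative helper)."""
    adj = {}
    for i in range(n):
        for j in range(n):
            candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            adj[(i, j)] = [(a, b) for a, b in candidates
                           if 0 <= a < n and 0 <= b < n]
    return adj


if __name__ == "__main__":
    tree = wilson_spanning_tree(grid_graph(3), (0, 0), random.Random(1))
    for vertex, par in sorted(tree.items()):
        print(vertex, "->", par)
```

The returned parent pointers describe a tree oriented toward the root; dropping the orientation gives the spanning tree itself.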
The yield of contact investigation on relapsed tuberculosis (TB) cases can guide strategies and resource allocation in the TB control programme. We conducted a retrospective cohort study to review the yield of contact investigation in relapsed TB cases and identify factors associated with TB infection (TBI) among close contacts of relapsed TB cases notified between 2018 and 2022 in Singapore. TB infection positivity was higher among contacts of relapsed cases which were culture-positive for Mycobacterium tuberculosis complex compared to those who were only polymerase chain reaction (PCR)-positive (14.8% vs. 12.3%). On multivariate analysis, after adjusting for age and gender of the index, gender, and existing comorbidities of contacts, factors independently associated with TBI were culture and smear positivity of the index (AOR 1.41, 95%CI 1.02–1.94), higher odds with every 10-year increase in age relative to contacts aged below 30, contacts who were not Singapore residents (AOR 2.09, 95%CI 1.46–2.97), and household contacts (AOR 2.19, 95%CI 1.44–3.34). Although the yield of screening was higher for those who were culture-positive compared to only PCR-positive relapsed cases, contact tracing for only PCR-positive cases may still be important in a country with moderate TB incidence, should resources allow.
We consider the propagation of a stochastic SIR-type epidemic in two connected populations: a relatively small local population of interest which is surrounded by a much larger external population. External infectives can temporarily enter the small population and contribute to the spread of the infection inside this population. The rules for entry of infectives into the small population as well as their length of stay are modeled by a general Markov queueing system. Our main objective is to determine the distribution of the total number of infections within both populations. To do this, the approach we propose consists of deriving a family of martingales for the joint epidemic processes and applying classical stopping time or convergence theorems. The study then focuses on several particular cases where the external infection is described by a linear branching process and the entry of external infectives obeys certain specific rules. Some of the results obtained are illustrated by numerical examples.
Thanks to its outstanding performance, boosting has rapidly gained wide acceptance among actuaries. Wüthrich and Buser (Data Analytics for Non-Life Insurance Pricing. Lecture notes available at SSRN. http://dx.doi.org/10.2139/ssrn.2870308, 2019) established that boosting can be conducted directly on the response under the Poisson deviance loss function and log-link, by adapting the weights at each step. This is particularly useful for analyzing low counts (typically, numbers of reported claims at policy level in personal lines). Huyghe et al. (Boosting cost-complexity pruned trees on Tweedie responses: The ABT machine for insurance ratemaking. Scandinavian Actuarial Journal. https://doi.org/10.1080/03461238.2023.2258135, 2022) adopted this approach to propose a new boosting machine with cost-complexity pruned trees. In this approach, trees included in the score progressively reduce to the root-node one, in an adaptive way. This paper reviews these results and presents the new BT package in R contributed by Willame (Boosting Trees Algorithm. https://cran.r-project.org/package=BT; https://github.com/GiregWillame/BT, 2022), which is designed to implement this approach for insurance studies. A numerical illustration demonstrates the relevance of the new tool for insurance pricing.
We introduce an extension to Kermack and McKendrick’s classic susceptible–infected–recovered (SIR) model in epidemiology, whose underlying mechanism of infection consists of individuals attending randomly generated social gatherings. This gives rise to a system of ordinary differential equations (ODEs) where the force of the infection term depends non-linearly on the proportion of infected individuals. Some specific instances yield models already studied in the literature, to which the present work provides a probabilistic foundation. The basic reproduction number is seen to depend quadratically on the average size of the gatherings, which may be helpful in understanding how restrictions on social gatherings affect the spread of the disease. We rigorously justify our model by showing that the system of ODEs is the mean-field limit of the jump Markov process corresponding to the evolution of the disease in a finite population.
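For orientation, the classic Kermack–McKendrick SIR system that the abstract extends can be integrated numerically in a few lines. The sketch below uses the standard bilinear force of infection `beta * s * i`, not the non-linear term of the paper's gathering-based model, and the forward Euler scheme, step size, and parameter names are assumptions chosen for simplicity.

```python
def simulate_sir(beta, gamma, s0, i0, dt=0.01, t_max=200.0):
    """Integrate the classic SIR equations with the forward Euler method.

    ds/dt = -beta * s * i
    di/dt =  beta * s * i - gamma * i
    dr/dt =  gamma * i

    s, i, r are population proportions. This is the baseline model only;
    the non-linear force of infection from the paper is NOT implemented.
    Returns the state (s, i, r) at time t_max.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    for _ in range(int(t_max / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r


if __name__ == "__main__":
    # Basic reproduction number beta / gamma = 2 in this illustrative run.
    print(simulate_sir(beta=0.5, gamma=0.25, s0=0.999, i0=0.001))
```

In this baseline model the basic reproduction number is `beta / gamma`; an outbreak takes off only when it exceeds 1, which the two regimes `beta > gamma` and `beta < gamma` make visible in the final recovered proportion.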
Accurately predicting neurosyphilis prior to a lumbar puncture (LP) is critical for the prompt management of neurosyphilis. However, a valid and reliable model for this purpose is still lacking. This study aimed to develop a nomogram for the accurate identification of neurosyphilis in patients with syphilis. The training cohort included 9,504 syphilis patients who underwent initial neurosyphilis evaluation between 2009 and 2020, while the validation cohort comprised 526 patients whose data were prospectively collected from January 2021 to September 2021. Neurosyphilis was observed in 35.8% (3,400/9,504) of the training cohort and 37.6% (198/526) of the validation cohort. The nomogram incorporated factors such as age, male gender, neurological and psychiatric symptoms, serum RPR, a mucous plaque of the larynx and nose, a history of other STD infections, and co-diabetes. The model exhibited good performance with concordance indexes of 0.84 (95% CI, 0.83–0.85) and 0.82 (95% CI, 0.78–0.86) in the training and validation cohorts, respectively, along with well-fitted calibration curves. This study developed a precise nomogram to predict neurosyphilis risk in syphilis patients, with potential implications for early detection prior to an LP.
Operational Risk is one of the most difficult risks to model. It is a large and diverse category covering anything from cyber losses to mis-selling fines; and from processing errors to HR issues. Data is usually lacking, particularly for low frequency, high impact losses, and consequently there can be a heavy reliance on expert judgement. This paper seeks to help actuaries and other risk professionals tasked with the challenge of validating models of operational risks. It covers the loss distribution and scenario-based approaches most commonly used to model operational risks, as well as Bayesian Networks. It aims to give a comprehensive yet practical guide to how one may validate each of these and provide assurance that the model is appropriate for a firm’s operational risk profile.
During the COVID-19 pandemic in Germany, a variety of societal activities were restricted to minimize direct personal interactions and, consequently, reduce SARS-CoV-2 transmission. The aim of the CoViRiS study was to investigate whether certain behaviours and societal factors were associated with the risk of sporadic symptomatic SARS-CoV-2 infections. Adult COVID-19 cases and frequency-matched population controls were interviewed by telephone regarding activities that involved contact with other people during the 10 days before illness onset (cases) or before the interview (controls). Associations between activities and symptomatic SARS-CoV-2 infection were analysed using logistic regression models adjusted for potential confounding variables. Data of 859 cases and 1 971 controls were available for analysis. The risk of symptomatic SARS-CoV-2 infection was lower for individuals who worked from home (adjusted odds ratio (aOR) 0.5; 95% confidence interval (CI) 0.3–0.6). Working in a health care setting was associated with a higher risk (aOR: 1.5; 95% CI: 1.1–2.1) as were private indoor contacts, personal contacts that involved shaking hands or hugging, and overnight travelling within Germany. Our results are in line with some of the public health recommendations aimed at reducing interpersonal contacts during the COVID-19 pandemic.
Viral marketing campaigns primarily target individuals who are central in social networks and hence have social influence. Marketing events, however, may attract a diverse audience. Despite the importance of event marketing, the influence of heterogeneous target groups is not yet well understood. In this paper, we define the Audience Selection (AS) problem, in which different sets of agents need to be evaluated and compared based on their social influence. A typical application of Audience Selection is choosing locations for a series of marketing events. The Audience Selection problem differs from the well-known Influence Maximization (IM) problem in two respects. Firstly, it deals with sets rather than nodes. Secondly, the sets are diverse, composed of a mixture of influential and ordinary agents. Thus, Audience Selection needs to assess the contribution of ordinary agents too, while IM only aims to find top spreaders. We provide a systematic test for ranking influence measures in the Audience Selection problem based on node sampling and on a novel statistical method, the Sum of Ranking Differences. Using a Linear Threshold diffusion model on two online social networks, we evaluate eight network measures of social influence. We demonstrate that the statistical assessment of these influence measures is remarkably different in the Audience Selection problem, when low-ranked individuals are present, from the IM problem, when we focus on the algorithm’s top choices exclusively.
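The Linear Threshold diffusion model used for the evaluation can be sketched as follows. This is a generic version of the model, assuming each node has a fixed threshold and activates once the total weight of its active in-neighbors reaches that threshold; the data layout and the function name `linear_threshold` are hypothetical, and the paper's exact weighting and sampling choices are not reproduced here.

```python
def linear_threshold(in_neighbors, seeds, thresholds):
    """Run a Linear Threshold diffusion until no further node activates.

    in_neighbors maps each node to a dict {in-neighbor: edge weight}; the
    incoming weights of a node are assumed to sum to at most 1.
    thresholds maps each node to its activation threshold. In the standard
    model these are drawn uniformly from [0, 1]; here they are passed in
    explicitly so that a run is reproducible.
    Returns the final set of active nodes.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, weights in in_neighbors.items():
            if node in active:
                continue
            influence = sum(w for u, w in weights.items() if u in active)
            if influence >= thresholds[node]:
                active.add(node)
                changed = True
    return active


if __name__ == "__main__":
    # Illustrative path graph a -> b -> c with unit weights.
    graph = {"a": {}, "b": {"a": 1.0}, "c": {"b": 1.0}}
    print(linear_threshold(graph, {"a"}, {"a": 0.5, "b": 0.5, "c": 0.5}))
```

Because activation is monotone (active nodes never deactivate), the order in which nodes are checked within a sweep does not change the final active set. The influence of a seed set, such as an event audience, can then be estimated by averaging the final active-set size over many random threshold draws.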
This article considers the individual equilibrium behavior and socially optimal strategy in a fluid queue with two types of parallel customers and incomplete fault. Assume that the working state and the incomplete fault state appear alternately in the buffer. In contrast to the linear revenue and expenditure structure, an exponential utility function can be constructed to obtain the equilibrium balking thresholds in the fully observable case. In addition, the steady-state probability distribution and the corresponding expected social benefit are derived based on the renewal process and the standard theory of linear ordinary differential equations. Furthermore, a reasonable entrance fee strategy is discussed under the condition that the fluid accepts the globally optimal strategies. Finally, the effects of the diverse system parameters on the entrance fee and the expected social benefit are explicitly illustrated by numerical comparisons.
We study 3-stage service policy computation oriented toward a 2-stage game-theoretic problem, convolutional neural network (CNN) based algorithm design, and simulation for a blockchained buffering system with federated learning. More precisely, based on the game-theoretic problem consisting of both “win-lose” and “win-win” 2-stage competitions, we derive a 3-stage dynamical service policy via a saddle point to a zero-sum game problem and a Nash equilibrium point to a non-zero-sum game problem. This policy concerns user selection, dynamic pricing, and online rate resource allocation via stable digital currency for the system. The main focus is on the design and analysis of the joint 3-stage service policy for given queue/environment state dependent pricing and utility functions. The asymptotic optimality and fairness of this dynamic service policy are justified by diffusion modeling with approximation theory. A general CNN based policy computing algorithm flow chart along the line of the so-called big model framework is presented. Simulation case studies are conducted for the system with three users, where only two of the three users can be selected into the service by a zero-sum dual cost game competition policy at a time point. Then, the selected two users get into service and share the system rate service resource through a non-zero-sum dual cost game competition policy. Applications of our policy in the future blockchain based Internet (e.g., metaverse and web3.0) and supply chain finance are also briefly illustrated.
Real-time data on individuals’ location may provide significant opportunities for managing emergency situations. For example, in the case of outbreaks, besides informing on the proximity of people, hence supporting contact tracing activities, location data can be used to understand spatial heterogeneity in virus transmission. However, individuals’ low consent to share their data, shown by the low penetration rate of contact tracing apps in several countries during the coronavirus disease-2019 (COVID-19) pandemic, re-opened the discussion among scientists and practitioners on the factors and conditions that lead citizens to share their positioning data. Following the Antecedents → Privacy Concerns → Outcomes (APCO) model, and based on Privacy Calculus and Reasoned Action Theories, the study investigates factors that cause university students to share their location data with public institutions during outbreaks. To this end, an explanatory survey was conducted in Italy during the second wave of COVID-19, collecting 245 questionnaire responses. Structural equation modeling was used to simultaneously investigate the role of trust, perceived benefit, and perceived risk as determinants of the intention to share location data during outbreaks. Results show that respondents’ trust in public institutions, the perceived benefits, and the perceived risk are significant predictors of the intention to disclose personal tracking data to public institutions. Results indicate that the latter two factors impact university students’ willingness to share data more than trust, prompting public institutions to rethink how they launch and manage the adoption process for these technological applications.
Bus Rapid Transit (BRT) has grown fast in the last 25 years, promising low-cost, rapid implementation, and large positive impacts. Despite advances, many systems in middle- and low-income countries face operational and financial issues, particularly in Latin America. Some practitioners, researchers, and decision makers, as well as the media, are questioning its ability to provide quality services. Is this the end of a trend? To answer this question, this paper explores the status of the BRT industry and literature on the topic, with a focus on Latin America, as well as the emblematic cases of Curitiba, Quito, Bogotá, Mexico, and Santiago. Overcrowding, lack of reliability, fare evasion, issues of safety and security, and poor maintenance are evident problems in these and other cities. They seem to be a result of institutional and financial constraints, as well as technical limitations of surface-based transit modes. BRT has been able to deliver high-capacity, fast, and reliable services, but requires permanent management and investment to face growing demand and aging infrastructure and vehicles, just like rail systems do. In addition, attention needs to be given to data, technology innovation, urban integration, and public participation to keep BRT as an integral part of multimodal high-quality sustainable mobility networks in the future.