Motivated by recent studies of big samples, this work aims to construct a parametric model which is characterized by the following features: (i) a ‘local’ reinforcement, i.e. a reinforcement mechanism mainly based on the last observations, (ii) a random persistent fluctuation of the predictive mean, and (iii) a long-term almost sure convergence of the empirical mean to a deterministic limit, together with a chi-squared goodness-of-fit result for the limit probabilities. This triple purpose is achieved by the introduction of a new variant of the Eggenberger–Pólya urn, which we call the rescaled Pólya urn. We provide a complete asymptotic characterization of this model, pointing out that, for a certain choice of the parameters, it has properties different from the ones typically exhibited by the other urn models in the literature. Therefore, beyond the possible statistical application, this work could be interesting for those who are concerned with stochastic processes with reinforcement.
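For orientation, here is a minimal simulation of the classical Eggenberger–Pólya urn from which the rescaled model departs; the rescaled variant's specific update rule is not reproduced here, and the initial composition, reinforcement size, and number of draws below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def polya_urn(b0, w0, reinforcement, n_draws):
    """Simulate a classical Eggenberger-Polya urn with two colours.

    At each step a ball is drawn with probability proportional to the
    current colour counts, and `reinforcement` extra balls of the drawn
    colour are added back.  Returns the sequence of draw indicators.
    """
    black, white = float(b0), float(w0)
    draws = np.empty(n_draws)
    for t in range(n_draws):
        x = rng.random() < black / (black + white)  # 1 if black is drawn
        draws[t] = x
        if x:
            black += reinforcement
        else:
            white += reinforcement
    return draws

draws = polya_urn(b0=1, w0=1, reinforcement=1, n_draws=10_000)
# In the classical urn the empirical mean converges to a *random* limit;
# the rescaled urn is designed so that the limit is deterministic.
print("empirical mean of draws:", draws.mean())
```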
We study a stochastic compartmental susceptible–infected (SI) epidemic process on a configuration model random graph with a given degree distribution over a finite time interval. We split the population of graph vertices into two compartments, namely, S and I, denoting susceptible and infected vertices, respectively. In addition to the sizes of these two compartments, we keep track of the counts of SI-edges (those connecting a susceptible and an infected vertex) and SS-edges (those connecting two susceptible vertices). We describe the dynamical process in terms of these counts and present a functional central limit theorem (FCLT) for them as the number of vertices in the random graph grows to infinity. The FCLT asserts that the counts, when appropriately scaled, converge weakly to a continuous Gaussian vector semimartingale process in the space of vector-valued càdlàg functions endowed with the Skorokhod topology. We discuss applications of the FCLT in percolation theory and in modelling the spread of computer viruses. We also provide simulation results illustrating the FCLT for some common degree distributions.
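A minimal Gillespie-style sketch of the SI dynamics on a configuration model graph, tracking the compartment sizes and the SI- and SS-edge counts whose fluctuations the FCLT describes; the degree distribution, infection rate, and horizon below are illustrative, not taken from the paper.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (not from the paper): n vertices with Poisson
# degrees, infection rate beta per SI-edge, time horizon T.
n, mean_deg, beta, T = 2000, 4.0, 0.5, 5.0

degs = rng.poisson(mean_deg, n)
if degs.sum() % 2:                       # a degree sequence needs an even sum
    degs[0] += 1
G = nx.Graph(nx.configuration_model(degs, seed=1))  # collapse multi-edges
G.remove_edges_from(nx.selfloop_edges(G))

infected = {int(v) for v in rng.choice(n, size=10, replace=False)}
t = 0.0
while t < T:
    # SI-edges (susceptible-infected pairs); each transmits at rate beta.
    si_edges = [(u, v) for u, v in G.edges if (u in infected) != (v in infected)]
    if not si_edges:
        break
    t += rng.exponential(1.0 / (beta * len(si_edges)))  # Gillespie step
    if t >= T:
        break
    u, v = si_edges[rng.integers(len(si_edges))]
    infected.add(v if u in infected else u)

ss = sum(u not in infected and v not in infected for u, v in G.edges)
si = sum((u in infected) != (v in infected) for u, v in G.edges)
print(f"|S|={n - len(infected)}  |I|={len(infected)}  SI-edges={si}  SS-edges={ss}")
```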
We analyse an additive-increase and multiplicative-decrease (also known as growth–collapse) process that grows linearly in time and that, at Poisson epochs, experiences downward jumps that are (deterministically) proportional to its present position. For this process, and also for its reflected versions, we consider one- and two-sided exit problems that concern the identification of the laws of exit times from fixed intervals and half-lines. All proofs are based on a unified first-step analysis approach at the first jump epoch, which allows us to give explicit, yet involved, formulas for their Laplace transforms. All eight Laplace transforms can be described in terms of two so-called scale functions associated with the upward one-sided exit time and with the upward two-sided exit time. All other Laplace transforms can be obtained from the above scale functions by taking limits, derivatives, integrals, and combinations of these.
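A short Monte Carlo sketch of the upward one-sided exit problem, assuming unit drift, Poisson jump rate `lam`, and a proportional jump factor `rho` in (0, 1); the parameter values are illustrative, and the simulation only approximates quantities that the paper derives in closed form via scale functions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters: jump rate lam, jump factor rho, start x0, level b.
lam, rho, x0, b = 1.0, 0.5, 0.2, 1.5

def upward_exit_time(x0, b, rng):
    """First time the growth-collapse process, started at x0, exceeds b.

    Between Poisson epochs the path grows at unit speed, so it crosses b
    before the next jump iff the exponential waiting time exceeds b - x.
    """
    t, x = 0.0, x0
    while True:
        w = rng.exponential(1.0 / lam)  # time until the next collapse
        if x + w >= b:                  # crosses the level before the jump
            return t + (b - x)
        t += w
        x = rho * (x + w)               # proportional downward jump

samples = np.array([upward_exit_time(x0, b, rng) for _ in range(20_000)])
q = 0.3                                 # Laplace argument
print("mean upward exit time:", samples.mean())
print("E[exp(-q * tau_b)]   :", np.exp(-q * samples).mean())
```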
We consider the simultaneous propagation of two contagions over a social network. We assume a threshold model for the propagation of the two contagions and use the formal framework of discrete dynamical systems. In particular, we study an optimization problem where the goal is to minimize the total number of new infections subject to a budget constraint on the total number of available vaccinations for the contagions. While this problem has been considered in the literature for a single contagion, our work considers the simultaneous propagation of two contagions. This optimization problem is NP-hard. We present two main solution approaches for the problem, namely an integer linear programming (ILP) formulation to obtain optimal solutions and a heuristic based on a generalization of the set cover problem. We carry out a comprehensive experimental evaluation of our solution approaches using many real-world networks. The experimental results show that our heuristic algorithm produces solutions that are close to the optimal solution and runs several orders of magnitude faster than the ILP-based approach for obtaining optimal solutions. We also carry out sensitivity studies of our heuristic algorithm.
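A toy sketch of the threshold dynamics with vaccination, to make the setting concrete; the greedy degree-based vaccination rule below is only a stand-in, not the paper's set-cover-based heuristic or its ILP, and the graph, seed sets, threshold, and budget are all made up.

```python
import networkx as nx

G = nx.karate_club_graph()
threshold = 2          # a vertex activates once >= 2 neighbours are active

def spread(G, seeds, vaccinated, threshold, rounds=10):
    """Synchronous threshold propagation of one contagion, skipping
    vaccinated vertices; returns the final set of infected vertices."""
    active = set(seeds) - vaccinated
    for _ in range(rounds):
        new = {v for v in G if v not in active and v not in vaccinated
               and sum(u in active for u in G[v]) >= threshold}
        if not new:
            break
        active |= new
    return active

seeds_1, seeds_2 = {0, 1}, {33, 32}    # seed sets of the two contagions
budget = 3
# Naive stand-in heuristic: vaccinate the highest-degree unseeded vertices.
candidates = sorted(G, key=G.degree, reverse=True)
vaccinated = set([v for v in candidates
                  if v not in (seeds_1 | seeds_2)][:budget])

total = len(spread(G, seeds_1, vaccinated, threshold) |
            spread(G, seeds_2, vaccinated, threshold))
print("vaccinated:", vaccinated, "total infections:", total)
```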
Beyond quantifying the amount of association between two variables, as was the goal in a previous chapter, regression analysis aims at describing that association and/or at predicting one of the variables based on the others. Examples of applications where this is needed abound in engineering and a broad range of industries. For example, in the insurance industry, when pricing a policy, the predictor variable encapsulates the available information about what is being insured, and the response variable is a measure of risk that the insurance company would take if underwriting the policy. In this context, a procedure is evaluated solely on its performance at predicting that risk, and can otherwise be very complicated and have no simple interpretation. The chapter covers both local methods such as kernel regression (e.g., local averaging) and empirical risk minimization over a parametric model (e.g., linear models fitted by least squares). Cross-validation is introduced as a method for estimating the prediction power of a certain regression or classification method.
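A minimal sketch of kernel regression (Nadaraya–Watson local averaging) with the bandwidth chosen by 5-fold cross-validation; the synthetic data-generating model and the bandwidth grid are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data (illustrative): y = sin(2*pi*x) + noise.
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=200)

def kernel_regression(x_train, y_train, x_new, h):
    """Nadaraya-Watson estimator with a Gaussian kernel of bandwidth h
    (a local-averaging method)."""
    w = np.exp(-0.5 * ((x_new[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

# 5-fold cross-validation estimate of prediction error for each bandwidth.
folds = np.array_split(rng.permutation(200), 5)
for h in (0.02, 0.05, 0.1, 0.3):
    err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(200), test)
        pred = kernel_regression(x[train], y[train], x[test], h)
        err += np.mean((y[test] - pred) ** 2)
    print(f"h={h:<4}  CV mean squared error: {err / 5:.3f}")
```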
Measurements are often numerical in nature, which naturally leads to distributions on the real line. We start our discussion of such distributions in the present chapter, and in the process introduce the concept of a random variable, which is really a device to facilitate the writing of probability statements and the corresponding computations. We introduce objects such as the distribution function, survival function, and quantile function, any of which characterizes the underlying distribution.
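A small illustration of the three functions for one concrete distribution, using SciPy; the exponential distribution and the evaluation points are arbitrary choices.

```python
from scipy import stats

# Illustrative: an exponential distribution with rate 2 (scale = 1/2).
X = stats.expon(scale=0.5)

x, p = 1.0, 0.9
print("distribution function F(x) :", X.cdf(x))  # P(X <= x)
print("survival function 1 - F(x) :", X.sf(x))   # P(X > x)
print("quantile function F^{-1}(p):", X.ppf(p))  # smallest x with F(x) >= p
```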
Some experiments lead to considering not one, but several measurements. As before, each measurement is represented by a random variable, and these are stacked into a random vector. For example, in the context of an experiment that consists in flipping a coin multiple times, we defined in a previous chapter one random variable per coin flip, each indicating the result of that flip. These are then concatenated to form a random vector, compactly describing the outcome of the entire experiment. Concepts such as conditional probability and independence are introduced.
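A brief numerical illustration of the coin-flip random vector and of independence between its coordinates; the number of flips and replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

# Each row is one outcome of the experiment: a vector of 5 fair coin flips.
n_flips, n_experiments = 5, 100_000
flips = rng.integers(0, 2, size=(n_experiments, n_flips))

# Independence in action: P(flip 1 = heads and flip 2 = heads) should be
# close to P(flip 1 = heads) * P(flip 2 = heads) = 1/4.
joint = np.mean((flips[:, 0] == 1) & (flips[:, 1] == 1))
print("empirical joint probability:", joint)
```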
We consider an experiment that yields, as data, a sample of independent and identically distributed (real-valued) random variables with a common distribution on the real line. The estimation of the underlying mean and median is discussed at length, and bootstrap confidence intervals are constructed. Tests comparing the underlying distribution to a given distribution (e.g., the standard normal distribution) or a family of distributions (e.g., the normal family of distributions) are introduced. Censoring, which is very common in some clinical trials, is briefly discussed.
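A minimal sketch of a percentile bootstrap confidence interval for the median; the sample distribution and the number of bootstrap replicates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative i.i.d. sample from an exponential distribution.
sample = rng.exponential(size=100)

# Percentile bootstrap: resample with replacement, recompute the median.
boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(5000)
])
lo, hi = np.quantile(boot_medians, [0.025, 0.975])
print(f"sample median: {np.median(sample):.3f}")
print(f"95% percentile bootstrap CI: ({lo:.3f}, {hi:.3f})")
```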
In this chapter we introduce some tools for sampling from a distribution. We also explain how to use computer simulations to approximate probabilities and, more generally, expectations, which can allow one to circumvent complicated mathematical derivations. The methods that are introduced include Monte Carlo sampling/integration, rejection sampling, and Markov Chain Monte Carlo sampling.
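A minimal example of rejection sampling, targeting the density f(x) = 2x on [0, 1] with a uniform proposal; the target is chosen only so that the acceptance step is easy to verify.

```python
import numpy as np

rng = np.random.default_rng(6)

# Rejection sampling from f(x) = 2x on [0, 1] using the uniform proposal
# g = 1 with envelope constant c = 2, so that f(x) <= c * g(x) everywhere.
def rejection_sample(n, c=2.0):
    out = []
    while len(out) < n:
        x = rng.uniform()                # draw from the proposal g
        if rng.uniform() < (2 * x) / c:  # accept with probability f(x)/(c g(x))
            out.append(x)
    return np.array(out)

draws = rejection_sample(50_000)
print("sample mean:", draws.mean())      # target mean is 2/3
```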
An expectation is simply a weighted mean, and means are at the core of Probability Theory and Statistics. In Statistics, in particular, such expectations are used to define parameters of interest. It turns out that an expectation can be approximated by an empirical average based on a sample from the distribution of interest, and the accuracy of this approximation can be quantified via what is referred to as concentration inequalities.
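A short illustration of approximating an expectation by an empirical average, together with Hoeffding's concentration bound for [0, 1]-valued variables; the sample size and deviation level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)

# Approximate E[X] for X ~ Uniform(0, 1) by an empirical average, and
# compare the observed error with Hoeffding's inequality
# P(|mean - E[X]| >= t) <= 2 * exp(-2 n t^2) for variables in [0, 1].
n, t = 10_000, 0.01
xs = rng.uniform(size=n)
print("empirical average        :", xs.mean())   # true expectation is 1/2
print("observed deviation       :", abs(xs.mean() - 0.5))
print("Hoeffding bound at t=0.01:", 2 * np.exp(-2 * n * t**2))
```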
We derive formulae for the moments of the time of ruin in both ordinary and modified Sparre Andersen risk models without specifying either the inter-claim time distribution or the individual claim amount distribution. We illustrate the application of our results in the special case of exponentially distributed claims, as well as for the following ordinary models: the classical risk model, phase-type(2) risk models, and the Erlang($n$) risk model. We also show how the key quantities for modified models can be found.
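A Monte Carlo sketch of the time of ruin in the classical risk model with exponential claims (one of the special cases above); the parameter values are illustrative, and the closed-form ruin probability printed at the end is the standard Lundberg formula for this case, not a result reproduced from the paper.

```python
import numpy as np

rng = np.random.default_rng(8)

# Illustrative parameters: premium rate c, Poisson claim rate lam, exponential
# claims with mean 1/mu, initial surplus u0, and a finite simulation horizon.
u0, c, lam, mu, horizon = 5.0, 1.2, 1.0, 1.0, 500.0

def ruin_time(rng):
    """Time of ruin (first time the surplus drops below 0), or np.inf.

    The surplus decreases only at claim epochs, so it suffices to check
    it just after each claim.
    """
    t, u = 0.0, u0
    while t < horizon:
        w = rng.exponential(1.0 / lam)          # inter-claim time
        t += w
        u += c * w - rng.exponential(1.0 / mu)  # premiums earned minus claim
        if u < 0:
            return t
    return np.inf

times = np.array([ruin_time(rng) for _ in range(5000)])
ruined = times[np.isfinite(times)]
print("estimated ruin probability       :", ruined.size / times.size)
print("mean ruin time given ruin occurs :", ruined.mean())
# Lundberg's closed form for exponential claims:
# psi(u) = lam/(c*mu) * exp(-(mu - lam/c) * u).
print("closed-form ruin probability     :",
      lam / (c * mu) * np.exp(-(mu - lam / c) * u0))
```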