Analysis of experimental scalar data is tackled here. Starting from the basic analysis of a large number of well-behaved data, which eventually display Gaussian distributions, we move on to Bayesian inference and face the cases of few (or no) data, sometimes badly behaved. We first present methods to analyze data whose ideal distribution is known, and then we show methods to make predictions even when our ignorance about the data distribution is total. Finally, various resampling methods are provided to deal with time-correlated measurements, biased estimators, anomalous data, and under- or over-estimation of statistical errors.
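As a concrete illustration of the resampling idea mentioned above, the following minimal Python sketch estimates the error on a mean both by an ordinary bootstrap and by a blocked bootstrap that respects time correlations between successive measurements. The synthetic AR(1) data, block length, and number of resamples are illustrative assumptions, not values taken from the chapter.

import numpy as np

rng = np.random.default_rng(0)

def bootstrap_error(data, n_boot=2000):
    """Standard bootstrap estimate of the statistical error on the mean."""
    data = np.asarray(data)
    means = [rng.choice(data, size=len(data), replace=True).mean()
             for _ in range(n_boot)]
    return np.std(means, ddof=1)

def block_bootstrap_error(data, block_len=50, n_boot=2000):
    """Blocked bootstrap: resample contiguous blocks so that
    time correlations within a block are preserved."""
    data = np.asarray(data)
    n_blocks = len(data) // block_len
    blocks = data[:n_blocks * block_len].reshape(n_blocks, block_len)
    means = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_blocks, size=n_blocks)
        means.append(blocks[idx].mean())
    return np.std(means, ddof=1)

# Illustrative, time-correlated synthetic data (AR(1) process).
x = np.zeros(1000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()

print("naive bootstrap error:", bootstrap_error(x))
print("block bootstrap error:", block_bootstrap_error(x))

The naive bootstrap underestimates the error here because it ignores the correlations, which is exactly the pitfall the blocked variant is meant to address.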
As we realize that random walks, chain reactions, and recurrent events are all Markov chains, i.e., correlated processes without memory, in this chapter we derive a general theory, including the classification and properties of individual states and of whole chains. In particular, we focus on the building blocks of the theory, i.e., irreducible chains, presenting and proving a number of fundamental and useful theorems. We end up deriving the balance equation for the limit probability and the approach to the limit for long times, developing and applying the Perron–Frobenius theory for non-negative matrices and the spectral decomposition for non-Hermitian matrices. Among the applications of the theory, we highlight the ranking of Web pages by search engines.
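As a hedged illustration of the balance equation and of the search-engine application mentioned above, the Python sketch below computes the limit probability of a small irreducible chain by power iteration on its transition matrix. The toy link structure and the damping factor are invented for the example; they mimic the PageRank idea rather than reproducing any particular engine.

import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: iterate pi <- pi P until the limit probability
    satisfying the balance equation pi = pi P is reached."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_pi = pi @ P
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi

# Toy 4-page Web graph: row-stochastic matrix of link-following
# probabilities plus a teleportation term keeping the chain irreducible.
links = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [0, 0, 0, 1],
                  [1, 1, 0, 0]], dtype=float)
P_link = links / links.sum(axis=1, keepdims=True)
d = 0.85
P = d * P_link + (1 - d) / 4

pi = stationary_distribution(P)
print("limit probabilities (page ranking):", np.round(pi, 3))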
Yet another memoryless correlated discrete process is considered: recurrent events. These are classified in different ways, and a whole theory is developed to describe the possible behaviors. Special attention is devoted to the proof of the limit probability theorem, whose lengthy details are reported in an appendix, so as not to scare readers. The theory of recurrent events is particularly useful because many of its properties and mathematical theorems can be straightforwardly translated into the more general theory of Markov chains.
This chapter is devoted to correlations. We take up the central limit theorem once again, first with a couple of specific examples solved with considerable – but instructive – effort: Markov chains and recurrent events. Then, we generalize the machinery of generating functions to multivariate, correlated systems of stochastic variables, until we are able to prove the central limit theorem and the large deviations theorem for correlated events. We go back to the Markov chain central limit example to show how the theorem massively simplifies things. Eventually, we show how correlations and the lack of a Gaussian central limit are linked to phase transitions in statistical physics.
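For orientation, the form the central limit theorem takes for a stationary, correlated sequence (for instance a function of an ergodic Markov chain with summable correlations) is a standard result: the single-variable variance is replaced by a sum over autocovariances,
\begin{equation*}
\frac{1}{\sqrt{N}} \sum_{t=1}^{N} \left( X_t - \langle X \rangle \right) \xrightarrow{d} \mathcal{N}\left(0, \sigma^2\right), \qquad \sigma^2 = C_0 + 2 \sum_{k=1}^{\infty} C_k, \qquad C_k = \mathrm{Cov}(X_0, X_k),
\end{equation*}
so that the Gaussian limit survives as long as the correlations decay fast enough for the series to converge.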
An overview of probability distributions and their properties is given, to provide readers with fundamental concepts, computational methods, and their applications to the study of stochastic problems. Binomial, Poisson, Gaussian, and Cauchy–Lorentz distributions are examined in detail, computing their moments and cumulants, as long as they do not diverge. Very useful tools, such as multivariate Gaussian integrals, the Laplace maximum method, and the properties of the Euler Gamma function, are reported in the appendices.
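For reference, the lowest moments of the distributions named above are standard results, with the Cauchy–Lorentz case illustrating the divergence just mentioned:
\begin{align*}
\text{Binomial}(n,p):\quad & \langle X \rangle = np, & \mathrm{Var}(X) &= np(1-p);\\
\text{Poisson}(\lambda):\quad & \langle X \rangle = \lambda, & \mathrm{Var}(X) &= \lambda;\\
\text{Gaussian}(\mu,\sigma^2):\quad & \langle X \rangle = \mu, & \mathrm{Var}(X) &= \sigma^2;\\
\text{Cauchy--Lorentz}(x_0,\gamma):\quad & \text{mean and variance do not exist (the integrals diverge).}
\end{align*}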
In this chapter we continue to present the theory of Markov chains, specializing in reversible chains, for which the very important relation of detailed balance is shown to hold. This is a basic concept also in continuous processes, and the foundation of the property termed “equilibrium” in thermodynamics and statistical mechanics. The major application we present is the Monte Carlo method for estimating thermodynamic averages, which maps an average over a statistical ensemble onto the dynamics of a reversible Markov chain through its states. We introduce the Metropolis algorithm and apply it to the Ising model for ferromagnetism.
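A minimal Python sketch of the Metropolis algorithm for the two-dimensional Ising model is given below; the lattice size, temperature, and number of sweeps are illustrative choices, not values from the chapter.

import numpy as np

rng = np.random.default_rng(1)

def metropolis_sweep(spins, beta):
    """One Metropolis sweep over an L x L Ising lattice with periodic
    boundaries: propose single-spin flips and accept each with
    probability min(1, exp(-beta * dE)), which enforces detailed
    balance with respect to the Boltzmann distribution."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        neighbours = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                      + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * neighbours   # energy change of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins

# Illustrative run: 16 x 16 lattice below the critical temperature.
L, beta, n_sweeps = 16, 0.6, 500
spins = rng.choice([-1, 1], size=(L, L))
magnetisations = []
for sweep in range(n_sweeps):
    spins = metropolis_sweep(spins, beta)
    if sweep >= 100:                          # discard equilibration sweeps
        magnetisations.append(abs(spins.mean()))
print("estimated |magnetisation| per spin:", np.mean(magnetisations))

The time average over the Markov chain plays the role of the ensemble average, which is precisely the mapping described in the abstract.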
Understanding the properties of lower-carbon concrete products is essential for their effective utilization. Insufficient empirical test data hinders practical adoption of these emerging products, and a lack of training data limits the effectiveness of current machine learning approaches for property prediction. This work employs a random forest machine learning model combined with a just-in-time approach, utilizing newly available data throughout the concrete lifecycle to enhance predictions of 28- and 56-day concrete strength. The machine learning hyperparameters and inputs are optimized through a novel unified metric that combines prediction accuracy and uncertainty estimates through the coefficient of determination and the distribution of uncertainty quality. This study concludes that optimizing solely for accuracy selects a different model than optimizing with the proposed unified accuracy-and-uncertainty metric. Experimental validation compares the 56-day strength of two previously unseen concrete mixes to the machine learning predictions. Even with the sparse dataset, predictions of 56-day strength for the two mixes were experimentally validated to within a 90% confidence interval when using slump as an input, and were further improved by using 28-day strength.
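The Python sketch below illustrates only the general idea of pairing a random forest prediction with a tree-spread uncertainty estimate and folding both into a single selection score; the synthetic features, the coverage check, and the equal weighting are assumptions made for this illustration and are not the unified metric defined in the paper.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a concrete-mix dataset (hypothetical features,
# e.g. cement ratio, slump, 28-day strength).
X = rng.uniform(size=(200, 3))
y = 30 + 25 * X[:, 2] + 5 * X[:, 1] + rng.normal(scale=2.0, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)

# Per-tree predictions give both a point estimate and a spread (uncertainty).
tree_preds = np.stack([t.predict(X_test) for t in model.estimators_])
y_pred, y_std = tree_preds.mean(axis=0), tree_preds.std(axis=0)

accuracy = r2_score(y_test, y_pred)
# Illustrative calibration term: fraction of test points inside the
# +/- 2-sigma band implied by the tree spread.
coverage = np.mean(np.abs(y_test - y_pred) <= 2 * y_std)
unified_score = 0.5 * accuracy + 0.5 * coverage   # assumed weighting, for illustration only
print(f"R^2 = {accuracy:.3f}, coverage = {coverage:.3f}, unified score = {unified_score:.3f}")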
We develop an approximation for the buffer overflow probability of a stable tandem network in dimensions three or more. The overflow event in terms of the constrained random walk representing the network is the following: the sum of the components of the process hits n before hitting 0. This is one of the most commonly studied rare events in the context of queueing systems and the constrained processes representing them. The approximation is valid for almost all initial points of the process and its relative error decays exponentially in n. The analysis is based on an affine transformation of the process and the problem; as $n\rightarrow \infty$ the transformed process converges to an unstable constrained random walk. The approximation formula consists of the probability of the limit unstable process hitting a limit boundary in finite time. We give an explicit formula for this probability in terms of the utilization rates of the nodes of the network.
Keeping an up-to-date three-dimensional (3D) representation of buildings is a crucial yet time-consuming step for Building Information Modeling (BIM) and digital twins. To address this issue, we propose the ICON (Intelligent CONstruction) drone, an unmanned aerial vehicle (UAV) designed to navigate indoor environments autonomously and generate point clouds. The ICON drone is built on a 250 mm quadcopter frame with a Pixhawk flight controller and is equipped with an onboard computer, a Red-Green-Blue-Depth (RGB-D) camera, and an Inertial Measurement Unit (IMU) sensor. The UAV navigates autonomously using visual-inertial odometry and frontier-based exploration. The RGB images collected during the flight are used for 3D reconstruction and semantic segmentation. To improve reconstruction accuracy in weak-texture areas of indoor environments, we propose depth-regularized planar-based Gaussian splatting reconstruction, where monocular depth estimation serves as extra supervision for weak-texture areas. The final outputs are point clouds with building component and material labels. We tested the UAV in three scenes in an educational building: the classroom, the lobby, and the lounge. Results show that the ICON drone could: (1) explore all three scenes autonomously, (2) generate absolute-scale point clouds with F1-scores of 0.5806, 0.6638, and 0.8167 compared to point clouds collected using a high-fidelity terrestrial LiDAR scanner, and (3) label the point clouds with corresponding building components and materials with mean intersection over union of 0.588 and 0.629. The reconstruction algorithm is further evaluated on ScanNet, and results show that our method outperforms previous methods by a large margin in 3D reconstruction quality.
The rise of large language models (LLMs) has marked a substantial leap toward artificial general intelligence. However, the utilization of LLMs in the (re)insurance sector remains a challenging problem because of the gap between general capabilities and domain-specific requirements. Two prevalent methods for domain specialization of LLMs are prompt engineering and fine-tuning. In this study, we aim to evaluate the efficacy of LLMs, enhanced with prompt engineering and fine-tuning techniques, on quantitative reasoning tasks within the (re)insurance domain. It is found that (1) compared to prompt engineering, fine-tuning with a task-specific calculation dataset provides a remarkable leap in performance, even exceeding the performance of larger pre-trained LLMs; (2) when acquired task-specific calculation data are limited, supplementing LLMs with a domain-specific knowledge dataset is an effective alternative; and (3) enhanced reasoning capabilities, surpassing mere computational skills, should be the primary focus for LLMs when tackling quantitative tasks. Moreover, the fine-tuned models demonstrate a consistent aptitude for common-sense reasoning and factual knowledge, as evidenced by their performance on public benchmarks. Overall, this study demonstrates the potential of LLMs to serve as powerful AI assistants and solve quantitative reasoning tasks in the (re)insurance sector.
The Erdős–Sós Conjecture states that every graph with average degree exceeding $k-1$ contains every tree with $k$ edges as a subgraph. We prove that there are $\delta \gt 0$ and $k_0\in \mathbb N$ such that the conjecture holds for every tree $T$ with $k \ge k_0$ edges and every graph $G$ with $|V(G)| \le (1+\delta )|V(T)|$.
This article considers a three-dimensional latent factor model in the presence of one set of global factors and two sets of local factors. We show that the numbers of global and local factors can be estimated uniformly and consistently. Given the number of global and local factors, we propose a two-step estimation procedure based on principal component analysis (PCA) and establish the asymptotic properties of the PCA estimators. Monte Carlo simulations demonstrate that they perform well in finite samples. An application to the dataset of international trade reveals the relative importance of different types of factors.
The Secretary of the US Department of Health & Human Services, Robert Kennedy Jr, is leading a political agenda against vaccination. This is undermining the delivery of life-saving vaccination programmes and the provision of evidence-based information on the safety and effectiveness of vaccines for the public and health professionals. Inconsistent and conflicting messaging between health practitioners and government health agencies erodes trust in public health programmes, creating a vacuum which is often filled with mis/disinformation that presents severe consequences for families. Due to the transnational spread of diseases, we consider the implications of events in the US for routine childhood vaccination programmes in the UK. Public health agencies across the world need to be ‘Kennedy ready’; pragmatic steps must be taken to mitigate the threats posed to vaccine confidence and the control of vaccine-preventable diseases.
We seek to understand the factors that drive mortality in the contiguous United States using data that are indexed by county and year and grouped into 18 different age bins. We propose a model that adds two important contributions to existing mortality studies. First, we treat age as a random effect. This is an improvement over previous models because it allows the model in one age group to borrow information from other age groups. Second, we utilize Gaussian processes to create nonlinear covariate effects for predictors such as unemployment rate, race, and education level. This allows a more flexible relationship to be modeled between mortality and these predictors. Recognizing that the United States is expansive and diverse, we allow many of these effects to vary by location. This flexibility in how predictors relate to mortality has not been used in previous mortality studies and results in a more accurate model and a more complete understanding of the factors that drive mortality. Both the multivariate nature of the model and the spatially varying nonlinear predictors will advance the study of mortality and allow us to better examine the relationships between the predictors and mortality.
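To make the idea of a Gaussian-process covariate effect concrete, the short Python sketch below draws smooth nonlinear candidate curves for the effect of a single standardized predictor (say, unemployment rate) from a squared-exponential GP prior; the kernel parameters and the grid are illustrative assumptions, not the paper's specification.

import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(x1, x2, length_scale=0.15, variance=1.0):
    """Squared-exponential covariance: nearby covariate values receive
    strongly correlated effects, which is what makes the curves smooth."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

# Grid of a standardized predictor and three GP prior draws of its
# nonlinear effect (e.g. on log-mortality).
x = np.linspace(0.0, 1.0, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # jitter for numerical stability
effects = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print("three sampled nonlinear effect curves, shape:", effects.shape)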
The Erdős–Simonovits stability theorem is one of the most widely used theorems in extremal graph theory. We obtain an Erdős–Simonovits type stability theorem in multi-partite graphs. Different from the Erdős–Simonovits stability theorem, our stability theorem in multi-partite graphs says that if the number of edges of an $H$-free graph $G$ is close to that of the extremal graphs for $H$, then $G$ has a well-defined structure but may be far away from the extremal graphs for $H$. As applications, we strengthen a theorem of Bollobás, Erdős, and Straus and solve, in a stronger form, a conjecture posed by Han and Zhao concerning the maximum number of edges in multi-partite graphs that do not contain vertex-disjoint copies of a clique.
We consider the hypergraph Turán problem of determining $ex(n, S^d)$, the maximum number of facets in a $d$-dimensional simplicial complex on $n$ vertices that does not contain a simplicial $d$-sphere (a homeomorph of $S^d$) as a subcomplex. We show that if there is an affirmative answer to a question of Gromov about sphere enumeration in high dimensions, then $ex(n, S^d) \geq \Omega (n^{d + 1 - (d + 1)/(2^{d + 1} - 2)})$. Furthermore, this lower bound holds unconditionally for 2-LC (locally constructible) spheres, which includes all shellable spheres and therefore all polytopes. We also prove an upper bound on $ex(n, S^d)$ of $O(n^{d + 1 - 1/2^{d - 1}})$ using a simple induction argument. We conjecture that the upper bound can be improved to match the conditional lower bound.
QuickSelect (also known as Find), introduced by Hoare ((1961) Commun. ACM 4 321–322), is a randomized algorithm for selecting a specified order statistic from an input sequence of $n$ objects, or rather their identifying labels, usually known as keys. The keys can be numeric or symbol strings, or indeed any labels drawn from a given linearly ordered set. We discuss various ways in which the cost of comparing two keys can be measured, and we measure the efficiency of the algorithm by the total cost of such comparisons.
We define and discuss a closely related algorithm known as QuickVal and a natural probabilistic model for the input to this algorithm; QuickVal searches (almost surely unsuccessfully) for a specified population quantile $\alpha \in [0, 1]$ in an input sample of size $n$. Call the total cost of comparisons for this algorithm $S_n$. We discuss a natural way to define the random variables $S_1, S_2, \ldots$ on a common probability space. For a general class of cost functions, Fill and Nakama ((2013) Adv. Appl. Probab. 45 425–450) proved under mild assumptions that the scaled cost $S_n / n$ of QuickVal converges in $L^p$ and almost surely to a limit random variable $S$. For a general cost function, we consider what we term the QuickVal residual:
\begin{equation*} \rho_n := \frac{S_n}{n} - S. \end{equation*}
The residual is of natural interest, especially in light of previous analogous work on the sorting algorithm QuickSort (Bindjeme and Fill (2012) 23rd International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA'12), Discrete Mathematics and Theoretical Computer Science Proceedings, AQ, Association: Discrete Mathematics and Theoretical Computer Science, Nancy, pp. 339–348; Neininger (2015) Random Struct. Algorithms 46 346–361; Fuchs (2015) Random Struct. Algorithms 46 677–687; Grübel and Kabluchko (2016) Ann. Appl. Probab. 26 3659–3698; Sulzbach (2017) Random Struct. Algorithms 50 493–508). In the case $\alpha = 0$ of QuickMin with unit cost per key-comparison, we are able to calculate – à la Bindjeme and Fill (2012) for QuickSort – the exact (and asymptotic) $L^2$-norm of the residual. We take the result as motivation for the scaling factor $\sqrt{n}$ for the QuickVal residual for general population quantiles and for general cost. We then prove in general (under mild conditions on the cost function) that $\sqrt{n}\,\rho_n$ converges in law to a scale mixture of centered Gaussians, and we also prove convergence of moments.
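For readers who want the mechanics of QuickSelect in front of them, here is a minimal Python sketch with unit cost per key comparison, so the counter plays the role of the total comparison cost discussed above; the recursion and random pivot choice are the textbook randomized version, not code from the paper.

import random

def quickselect(keys, k, counter):
    """Return the k-th smallest key (0-indexed), counting key comparisons.
    `counter` is a one-element list used as a mutable comparison counter."""
    if len(keys) == 1:
        return keys[0]
    pivot = random.choice(keys)
    lower, equal, higher = [], [], []
    for key in keys:
        counter[0] += 1                    # one key comparison against the pivot
        if key < pivot:
            lower.append(key)
        elif key > pivot:
            higher.append(key)
        else:
            equal.append(key)
    if k < len(lower):
        return quickselect(lower, k, counter)
    if k < len(lower) + len(equal):
        return pivot
    return quickselect(higher, k - len(lower) - len(equal), counter)

# Illustrative run: select the median of n random keys and report the
# total comparison cost, analogous to S_n with unit cost per comparison.
n = 10_000
sample = [random.random() for _ in range(n)]
cost = [0]
median = quickselect(sample, n // 2, cost)
print("median:", round(median, 4), " comparisons per key:", cost[0] / n)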