We consider stationary configurations of points in Euclidean space that are marked by positive random variables called scores. The scores are allowed to depend on the relative positions of other points and on outside sources of randomness. Such models have been thoroughly studied in stochastic geometry, e.g. in the context of random tessellations or random geometric graphs. It turns out that in a neighborhood of a point with an extreme score it is possible to rescale the positions and scores of nearby points to obtain a limiting point process, which we call the tail configuration. Under some assumptions on the dependence between scores, this local limit determines the global asymptotics for extreme scores within increasing windows in $\mathbb{R}^d$. The main result establishes the convergence of rescaled positions and clusters of high scores to a Poisson cluster process, quantifying the idea of the Poisson clumping heuristic of Aldous (1989) in the point process setting. In contrast to existing results, our framework allows for explicit calculation of essentially all extremal quantities related to the limiting behavior of extremes. We apply our results to models based on (marked) Poisson processes where the scores depend on the distance to the $k$th nearest neighbor and where scores are allowed to propagate through a random network of points depending on their locations.
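Below is a minimal sketch of the kind of scored configuration described above, under the simplest assumption of a homogeneous Poisson process with the $k$th-nearest-neighbor score mentioned in the last sentence; the intensity, window size, and $k$ are illustrative choices, not values from the paper.

```python
# Score each point of a homogeneous Poisson process in a window by the
# distance to its k-th nearest neighbour (illustrative parameters).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
intensity, side, k, d = 1.0, 50.0, 3, 2

n = rng.poisson(intensity * side**d)           # Poisson number of points
points = rng.uniform(0.0, side, size=(n, d))   # uniform locations in the window

tree = cKDTree(points)
# query returns the point itself at distance 0, so ask for k + 1 neighbours
dists, _ = tree.query(points, k=k + 1)
scores = dists[:, k]                           # distance to the k-th nearest neighbour

i_max = np.argmax(scores)
print("largest score:", scores[i_max], "at point", points[i_max])
```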
Simulations of critical phenomena, such as wildfires, epidemics, and ocean dynamics, are indispensable tools for decision-making. Many of these simulations are based on models expressed as Partial Differential Equations (PDEs). PDEs are invaluable inductive inference engines, as their solutions generalize beyond the particular problems they describe. Methods and insights acquired by solving the Navier–Stokes equations for turbulence can be very useful in tackling the Black–Scholes equations in finance. Advances in numerical methods, algorithms, software, and hardware over the last 60 years have enabled simulation frontiers that were unimaginable a couple of decades ago. However, there are increasing concerns that such advances are not sustainable. The energy demands of computers are soaring, while the availability of vast amounts of data and Machine Learning (ML) techniques is challenging classical methods of inference and even the need for PDE-based forecasting of complex systems. I believe that the relationship between ML and PDEs needs to be reset. PDEs are not the only answer to modeling, and ML is not necessarily a replacement but a potent companion to human thinking. Algorithmic alloys of scientific computing and ML present a disruptive potential for the reliable and robust forecasting of complex systems. In order to achieve these advances, we argue for a rigorous assessment of their relative merits and drawbacks and for the adoption of probabilistic thinking to develop complementary concepts between ML and scientific computing. The convergence of AI and scientific computing opens new horizons for scientific discovery and effective decision-making.
We study convergence rates, in mean, for the Hausdorff metric between a finite set of stationary random variables and their common support, which is assumed to be a compact subset of $\mathbb{R}^d$. We propose two different approaches for this study. The first is based on the notion of a minimal index, introduced in this paper in the spirit of the extremal index, which is widely used in extreme value theory. The second approach is based on a $\beta$-mixing condition together with a local-type dependence assumption. More precisely, all our results concern stationary $\beta$-mixing sequences satisfying a tail condition, known as the $(a, b)$-standard assumption, together with a local-type dependence condition, or stationary sequences satisfying the $(a, b)$-standard assumption and having a positive minimal index. We prove that the optimal rates of the independent and identically distributed setting can be reached. We apply our results to stationary Markov chains on a ball, and to a class of Markov chains on a circle or on a torus. We use simulations to study the particular examples of a Möbius Markov chain on the unit circle and of a Markov chain on the unit square wrapped on a torus.
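To make the object of study concrete, here is a small simulation, under the simplest iid-uniform assumption rather than the paper's dependent setting, of the mean Hausdorff distance between $n$ sample points and their support (taken to be the unit disk); the grid resolution and sample sizes are arbitrary.

```python
# Approximate E[d_H(sample, M)] for n iid uniform points on the unit disk M.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)

# dense reference grid standing in for the support M = closed unit disk
g = np.linspace(-1, 1, 400)
xx, yy = np.meshgrid(g, g)
grid = np.column_stack([xx.ravel(), yy.ravel()])
grid = grid[(grid**2).sum(axis=1) <= 1.0]

def hausdorff_to_support(sample):
    # the sample lies inside M, so d_H is the largest gap from M to the sample
    tree = cKDTree(sample)
    return tree.query(grid)[0].max()

for n in [100, 1000, 10000]:
    reps = []
    for _ in range(20):
        pts = rng.uniform(-1, 1, size=(4 * n, 2))      # rejection sampling
        pts = pts[(pts**2).sum(axis=1) <= 1.0][:n]     # n uniform points in the disk
        reps.append(hausdorff_to_support(pts))
    print(n, np.mean(reps))   # decays roughly like (log n / n)^{1/2} in d = 2
```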
Previously, we reported the persistence of the bacterial pathogen Neisseria meningitidis on fomites, indicating a potential route for environmental transmission. The current goal was to identify proteins that vary among strains of meningococci that have differing environmental survival. We carried out a proteomic analysis of two strains that differ in their potential for survival outside the host. The Group B epidemic strain NZ98/254 and Group W carriage strain H34 were cultured either at 36 °C, 5% CO2, and 95% relative humidity (RH) corresponding to host conditions in the nasopharynx, or at lower humidities of 22% or 30% RH at 30 °C, for which there was greater survival on fomites. For NZ98/254, the shift to lower RH and temperature was associated with increased abundance of proteins involved in metabolism, stress responses, and outer membrane components, including pili and porins. In contrast, H34 responded to lower RH by decreasing the abundance of multiple proteins, indicating that the lower viability of H34 may be linked to decreased capacity to mount core protective responses. The results provide a snapshot of bacterial proteins and metabolism that may be related to normal fitness, to the greater environmental persistence of NZ98/254 compared to H34, and potentially to differences in transmission and pathogenicity.
This work concerns stochastic differential equations with jumps. We prove convergence of the solutions to a sequence of (possibly degenerate) stochastic differential equations with jumps when the coefficients converge in an appropriate sense. We then analyze some special cases and give concrete, verifiable conditions.
This article studies the robustness of quasi-maximum-likelihood estimation in hidden Markov models when the regime-switching structure is misspecified. Specifically, we examine the case where the data-generating process features a hidden Markov regime sequence with covariate-dependent transition probabilities, but estimation proceeds under a simplified mixture model that assumes regimes are independent and identically distributed. We show that the parameters governing the conditional distribution of the observables can still be consistently estimated under this misspecification, provided certain regularity conditions hold. Our results highlight a practical benefit of using computationally simpler mixture models in settings where regime dependence is complex or difficult to model directly.
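A simulation sketch of the misspecification scenario just described (all numerical values are illustrative, not from the article): data are generated from a two-state hidden Markov model with covariate-dependent transition probabilities, but an iid two-component Gaussian mixture is fitted by EM, and the emission parameters are still recovered.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
T = 50_000
mu, sigma = np.array([-1.0, 2.0]), np.array([1.0, 0.5])   # true emission parameters

# hidden regime with switching probability driven by a covariate z_t
z = rng.normal(size=T)
s = np.zeros(T, dtype=int)
for t in range(1, T):
    p_switch = 1.0 / (1.0 + np.exp(-(0.5 * z[t] - 1.0)))  # logistic in z_t
    s[t] = 1 - s[t - 1] if rng.random() < p_switch else s[t - 1]
y = rng.normal(mu[s], sigma[s])

# EM for an iid two-component Gaussian mixture (the misspecified model)
w, m, sd = np.array([0.5, 0.5]), np.array([-0.5, 0.5]), np.array([1.0, 1.0])
for _ in range(200):
    dens = w * norm.pdf(y[:, None], m, sd)          # T x 2 component densities
    r = dens / dens.sum(axis=1, keepdims=True)      # responsibilities
    w = r.mean(axis=0)
    m = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)
    sd = np.sqrt((r * (y[:, None] - m) ** 2).sum(axis=0) / r.sum(axis=0))

print("estimated means:", m, "estimated sds:", sd)  # close to (-1, 2) and (1, 0.5)
```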
We develop explicit bounds for the tail of the distribution of the all-time supremum of a random walk with negative drift, where the increments have a truncated heavy-tailed distribution. As an application, we consider a ruin problem in the presence of reinsurance.
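A crude Monte Carlo illustration of the quantity being bounded, with made-up parameters: the tail of $M=\sup_n S_n$ for a random walk with truncated Pareto-type increments and negative drift.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 2_000, 2_000
alpha, trunc, drift = 1.5, 50.0, 2.5

# Pareto(alpha)-type jumps truncated at `trunc`, shifted to give negative drift
# (the mean increment is about -0.78 with these values)
x = np.minimum(rng.pareto(alpha, size=(n_paths, n_steps)), trunc) - drift
m = np.maximum(0.0, np.cumsum(x, axis=1).max(axis=1))   # sup over the horizon

for u in [5.0, 10.0, 25.0, 50.0]:
    print(f"P(M > {u}) ~ {np.mean(m > u):.4f}")
```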
We study the uniform convergence rates of nonparametric estimators for a probability density function and its derivatives when the density has a known pole. Such situations arise in some structural microeconometric models, for example in auction, labor, and consumer search models, where uniform convergence rates of density functions are important for nonparametric and semiparametric estimation. Existing uniform convergence rates based on Rosenblatt’s kernel estimator are derived under the assumption that the density is bounded, so they are not applicable when the density has a pole. We treat the pole nonparametrically and show that, under mild conditions, various kernel-based estimators can attain, uniformly over an appropriately expanding support, any convergence rate slower than the optimal rate for bounded densities.
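As a concrete illustration of the difficulty (our own toy example, not the paper's estimator): a Rosenblatt kernel estimator applied to $f(x)=1/(2\sqrt{x})$ on $(0,1]$, which has a pole at zero, degrades as the evaluation point approaches the pole.

```python
import numpy as np

rng = np.random.default_rng(4)
# X = U^2 for U ~ Uniform(0,1) has density f(x) = 1/(2 sqrt(x)) on (0,1]
x = rng.uniform(size=100_000) ** 2
h = 0.01                                  # bandwidth (illustrative choice)

def kde(t):
    # Rosenblatt kernel estimator with a Gaussian kernel at point t
    u = (t - x) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h

for t in [0.5, 0.1, 0.02, 0.005]:
    print(f"t={t:>6}: kde={kde(t):7.3f}, true={1 / (2 * np.sqrt(t)):7.3f}")
```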
We investigate geometric properties of invariant spatio-temporal random fields $X\colon\mathbb M^d\times \mathbb R\to \mathbb R$ defined on a compact two-point homogeneous space $\mathbb M^d$ in any dimension $d\ge 2$, and evolving over time. In particular, we focus on chi-squared-distributed random fields, and study the large-time behavior (as $T\to +\infty$) of the average on $[0,T]$ of the volume of the excursion set on the manifold, i.e. of $\lbrace X(\cdot, t)\ge u\rbrace$ (for any $u >0$). The Fourier components of $X$ may have short or long memory in time, i.e. integrable or non-integrable temporal covariance functions. Our argument follows the approach developed in Marinucci et al. (2021) and allows us to extend their results for invariant spatio-temporal Gaussian fields on the two-dimensional unit sphere to the case of chi-squared distributed fields on two-point homogeneous spaces in any dimension. We find that both the asymptotic variance and the limiting distribution, as $T\to +\infty$, of the average empirical volume turn out to be non-universal, depending on the memory parameters of the field $X$.
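The time-averaged excursion volume can be illustrated with a crude discretization, replacing the homogeneous space by a flat grid and using a field with chi-squared marginals that is AR(1) in time; all choices below are illustrative and much simpler than the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(8)
q, T, n, rho, u = 2, 2_000, 64, 0.9, 1.5    # dof, time steps, grid, AR(1), level

g = rng.normal(size=(q, n, n))              # q Gaussian components, N(0,1) marginals
acc = 0.0
for t in range(T):
    g = rho * g + np.sqrt(1 - rho**2) * rng.normal(size=(q, n, n))
    acc += np.mean((g**2).sum(axis=0) >= u) # fraction of grid in the excursion set

# compare with the pointwise value P(chi^2_2 >= 1.5) = exp(-0.75) ~ 0.472
print("time-averaged excursion volume fraction:", acc / T)
```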
The efficient integration of outpatient (OPD) and inpatient (IPD) care is a critical challenge in modern healthcare, essential for maximising patient-centred care and resource utilisation. OPD, encompassing preventive services, routine check-ups, and chronic disease management, aims to minimise the need for hospitalisations. Conversely, IPD remains crucial for acute interventions and complex medical needs. This paper investigates the nuanced relationship between OPD and IPD: specifically, we seek to determine whether increased OPD utilisation, while improving overall health, leads to a reduction or an increase in subsequent IPD utilisation, duration, and associated costs. We analyse anonymised data from Indian organisations providing company-sponsored OPD and IPD insurance to their employees between 2021 and 2024. By examining the correlation between OPD utilisation patterns and IPD outcomes, we aim to provide data-driven insights for effective healthcare integration strategies. Furthermore, we explore the feasibility of developing a personalised “Wellbeing Rating” derived from longitudinal OPD insurance utilisation data. This automated methodology aims to provide a continuous, dynamic health assessment, moving beyond the limitations of traditional, sporadic medical examinations by leveraging the comprehensive data inherent in insured OPD coverage.
We introduce a new class of heavy-tailed distributions for which any weighted average of independent and identically distributed random variables is larger than one such random variable in (usual) stochastic order. We show that many commonly used extremely heavy-tailed (i.e., infinite-mean) distributions, such as the Pareto, Fréchet, and Burr distributions, belong to this class. The established stochastic dominance relation can be further generalized to allow negatively dependent or non-identically distributed random variables. In particular, the weighted average of non-identically distributed random variables dominates their distribution mixtures in stochastic order.
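A quick empirical check of the dominance claim in one infinite-mean case (Pareto with tail index $0.8$; the weights and thresholds below are arbitrary): the weighted average should have a pointwise larger survival function than a single copy.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, n = 0.8, 1_000_000
x1, x2 = 1 + rng.pareto(alpha, size=(2, n))     # iid Pareto(alpha) on [1, inf)
avg = 0.3 * x1 + 0.7 * x2                        # weighted average, weights sum to 1
single = 1 + rng.pareto(alpha, size=n)

for u in [2.0, 10.0, 100.0, 1000.0]:
    print(f"u={u:>7}: P(avg > u)={np.mean(avg > u):.5f}  "
          f"P(X > u)={np.mean(single > u):.5f}")
```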
Temporal variability and methodological differences in data normalization, among other factors, complicate effective trend analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) wastewater surveillance data and its alignment with coronavirus disease 2019 (COVID-19) clinical outcomes. As there is no consensus approach for these analyses yet, this study explored the use of piecewise linear trend analysis (joinpoint regression) to identify significant trends and trend turning points in SARS-CoV-2 RNA wastewater concentrations (normalized and non-normalized) and corresponding COVID-19 case rates in the greater Las Vegas metropolitan area (Nevada, USA) from mid-2020 to April 2023. The analysis period was stratified into three distinct phases based on temporal changes in testing protocols, vaccination availability, SARS-CoV-2 variant prevalence, and public health interventions. While other statistical methodologies may require fewer parameter specifications, joinpoint regression provided an interpretable framework for characterization and comparison of trends and trend turning points, revealing sewershed-specific variations in trend magnitude and timing that also aligned with known variant-driven waves. Week-level trend agreement corroborated previous findings demonstrating a close relationship between SARS-CoV-2 wastewater surveillance data and COVID-19 outcomes. These findings guide future applications of advanced statistical methodologies and support the continued integration of wastewater-based epidemiology as a complementary approach to traditional COVID-19 surveillance systems.
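A minimal sketch of the joinpoint idea on synthetic data, fitting a single joinpoint by grid search (production joinpoint software additionally selects the number of joinpoints and tests their significance; the series below is made up, not the study's data).

```python
import numpy as np

rng = np.random.default_rng(6)
weeks = np.arange(80, dtype=float)
true = np.where(weeks < 45, 0.06 * weeks, 0.06 * 45 - 0.09 * (weeks - 45))
y = true + rng.normal(0, 0.15, size=weeks.size)   # e.g. log10 RNA concentration

def fit_one_joinpoint(t, y):
    best = (np.inf, None, None)
    for k in t[5:-5]:                  # candidate joinpoints, edges excluded
        # design: intercept, slope, slope change after k (continuous at k)
        X = np.column_stack([np.ones_like(t), t, np.maximum(t - k, 0)])
        beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = res[0] if res.size else np.sum((y - X @ beta) ** 2)
        if sse < best[0]:
            best = (sse, k, beta)
    return best

sse, k, beta = fit_one_joinpoint(weeks, y)
print(f"estimated joinpoint at week {k:.0f}; "
      f"slopes {beta[1]:.3f} then {beta[1] + beta[2]:.3f}")
```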
Human toxocariasis is a worldwide parasitic disease caused by zoonotic roundworms of the genus Toxocara, which can cause blindness and epilepsy. The aim of this study was to estimate the risk of food-borne transmission of Toxocara spp. to humans in the UK by developing mathematical models created in a Bayesian framework. Parameter estimation was based on published experimental studies and field data from southern England, with qPCR Cq values used as a measure of eggs in spinach portions and ELISA optical density data as an indirect measure of larvae in meat portions. The average human risk of Toxocara spp. infection, per portion consumed, was estimated as 0.016% (95% CI: 0.000–0.100%) for unwashed leafy vegetables and 0.172% (95% CI: 0.000–0.400%) for undercooked meat. The average proportion of meat portions estimated positive for Toxocara spp. larvae was 0.841% (95% CI: 0.300–1.400%), compared to 0.036% (95% CI: 0.000–0.200%) of spinach portions containing larvated Toxocara spp. eggs. Overall, the models estimated a low risk of infection with Toxocara spp. by consuming these foods. However, given the potentially severe human health consequences of toxocariasis, intervention strategies to reduce environmental contamination with Toxocara spp. eggs and correct food preparation are advised.
We prove an ergodic theorem for Markov chains indexed by the Ulam–Harris–Neveu tree over large subsets with arbitrary shape, under two assumptions: (i) with high probability, two vertices in the large subset are far from each other, and (ii) with high probability, those two vertices have their common ancestor close to the root. The assumption on the common ancestor can be replaced by a regularity assumption on the Markov transition kernel. We verify that these assumptions are satisfied for some usual trees. Finally, with Markov chain Monte Carlo considerations in mind, we prove that when the underlying Markov chain is stationary and reversible, the line graph (that is, the classical Markov chain) yields minimal variance for the empirical average estimator among trees with a given number of nodes. In doing so, we prove that the Hosoya–Wiener polynomial is minimized over $[{-}1,1]$ by the line graph among trees of a given size.
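The final claim can be checked numerically for small trees. The sketch below (using networkx; the tree choices are arbitrary) evaluates the Hosoya–Wiener polynomial $H(G,x)=\sum_{\{u,v\}} x^{d(u,v)}$ at a few points of $[-1,1]$, with the path graph attaining the smallest value.

```python
import networkx as nx

def hosoya_wiener(G, x):
    # H(G, x) = sum over unordered vertex pairs of x^{d(u, v)}
    dist = dict(nx.all_pairs_shortest_path_length(G))
    nodes = list(G)
    return sum(x ** dist[u][v]
               for i, u in enumerate(nodes) for v in nodes[i + 1:])

n = 8
trees = {
    "path": nx.path_graph(n),          # the line graph of the abstract
    "star": nx.star_graph(n - 1),
    "caterpillar": nx.from_edgelist(
        [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (2, 6), (3, 7)]),
}
for x in [-0.9, -0.3, 0.3, 0.9]:
    vals = {name: round(hosoya_wiener(G, x), 3) for name, G in trees.items()}
    print(x, vals)                     # the path is minimal at every x
```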
We analyse a Markovian SIR epidemic model where individuals either recover naturally or are diagnosed, leading to isolation and potential contact tracing. Our focus is on digital contact tracing via a tracing app, considering both its standalone use and its combination with manual tracing. We prove that as the population size $n$ grows large, the epidemic process converges to a limiting process which, unlike for typical epidemic models, is not a branching process, due to dependencies created by contact tracing. However, by grouping to-be-traced individuals into macro-individuals, we derive a multi-type branching process interpretation, allowing computation of the reproduction number $R$. This is then converted to an individual reproduction number $R^\mathrm{(ind)}$ which, in contrast to $R$, decays monotonically with the fraction of app-users, while both share the same threshold at 1. Finally, we compare digital-only contact tracing and manual-only contact tracing, proving that the critical fraction of app-users, $\pi_{\mathrm{c}}$, required for $R=1$ is higher than the corresponding critical fraction, $p_{\mathrm{c}}$, of manually contact-traced individuals.
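Schematically, $R$ for a multi-type branching process is the Perron root of the mean offspring matrix. The sketch below computes it for a made-up $2\times 2$ matrix parameterized by the app-user fraction; the matrix and its parameters are purely illustrative and not the paper's model.

```python
import numpy as np

def reproduction_number(pi, beta=1.8, delta=0.3):
    # pi: fraction of app-users; beta: mean offspring without tracing;
    # delta: factor by which tracing cuts offspring of app-user infectors.
    # M[i, j] = mean number of type-j cases produced by one type-i case
    # (types: 0 = app-user, 1 = non-user); hypothetical structure.
    M = np.array([[beta * pi * delta, beta * (1 - pi)],
                  [beta * pi,         beta * (1 - pi)]])
    return np.max(np.abs(np.linalg.eigvals(M)))   # Perron root

for pi in np.linspace(0.0, 1.0, 6):
    print(f"pi = {pi:.1f}  R = {reproduction_number(pi):.3f}")
```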
This chapter delves into the theory and application of reversible Markov Chain Monte Carlo (MCMC) algorithms, focusing on their role in Bayesian inference. It begins with the Metropolis–Hastings algorithm and explores variations such as component-wise updates and the Metropolis-Adjusted Langevin Algorithm (MALA). The chapter also discusses Hamiltonian Monte Carlo (HMC) and the importance of scaling MCMC methods to high-dimensional models and large datasets. Key challenges in applying reversible MCMC to large-scale problems are addressed, with a focus on computational efficiency and algorithmic adjustments that improve scalability.
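As a reference point for the chapter's starting algorithm, here is a textbook random-walk Metropolis–Hastings sampler; the target and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_target(x):
    return -0.5 * x @ x              # unnormalised log-density: standard normal

def metropolis_hastings(n_samples, dim=2, step=0.8):
    x = np.zeros(dim)
    lp = log_target(x)
    out = np.empty((n_samples, dim))
    for i in range(n_samples):
        prop = x + step * rng.normal(size=dim)    # symmetric Gaussian proposal
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept/reject
            x, lp = prop, lp_prop
        out[i] = x
    return out

samples = metropolis_hastings(20_000)
print("sample mean:", samples.mean(axis=0), "sample var:", samples.var(axis=0))
```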