Acute gastrointestinal infections (AGIs) can lead to significant morbidity and mortality. In diagnosing AGI, culture-independent diagnostic tests offer advantages over traditional methods and increase the chance of detecting multiple pathogens (co-detection). A retrospective analysis of data from a tertiary pediatric hospital was conducted to characterize occurrence of AGI co-detections and compare outcomes with patients who had only one AGI pathogen detected. Medical records were obtained for patients with stool samples tested using BioFire FilmArray GI Panel between 1 January 2016 and 31 December 2020. Data were described using descriptive statistics, correlation analysis, and logistic regression to identify risk factors and estimate co-detection rates. During the study period, 12,753 patients had a total of 17,159 stool samples tested. Of these, 8,212 (47.9%) tested positive, with 6,040 (73.6%) being single detections and 2,172 (26.4%) being co-detections. Patients with single detection experienced higher hospitalization rates than patients with co-detection. Patients 1–4 years old exhibited the highest co-detection rate relative to other age groups, while Hispanic/Latino individuals were 1.75 times more likely to have co-detection than other races. This study emphasizes the significance of understanding pathogen interactions concerning clinical characteristics and epidemiology of AGI, and the necessity for effective diagnostic strategies and optimal healthcare resource allocation.
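The reported percentages use two different denominators, which a short check makes explicit. The sketch below only recomputes the figures quoted in the abstract; the variable names are ours, and the choice of denominators is inferred from the wording rather than stated by the authors.

```python
# Counts quoted in the abstract.
samples_tested = 17159
positive = 8212
single_detection = 6040
co_detection = 2172

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

# Positivity is taken relative to all samples tested (assumed denominator).
positivity = percent(positive, samples_tested)

# Single and co-detection shares are taken relative to positive samples.
single_share = percent(single_detection, positive)
co_share = percent(co_detection, positive)

print(positivity, single_share, co_share)
```

The single and co-detection counts sum to the number of positive samples, so the two shares partition the positives.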
Addressing and predicting degenerative phenomena in domains such as health care and engineering, two fundamental fields of vital importance for society, offers valuable insights into early warning steps and critical event forecasting, leading to far-reaching implications for safety and resource allocation. By harnessing the power of data-driven insights, prognostics becomes the principal component of predicting such phenomena. Developing clustering techniques as feature extractors acts as an intermediate step between the raw incoming data and prognostics and provides the opportunity to unveil hidden relationships within complex datasets. However, when only limited, noisy, and multimodal data are available in a label-free format, extracting meaningful features requires extensive preprocessing and unreliable, complicated models. This hinders the development of methods that are adaptable across diverse domains and that favor robustness and interpretability. In this regard, this study introduces a novel unsupervised deep clustering model for feature extraction in degenerative phenomena. The model innovatively extracts prognostic-related features from raw data via clustering analysis, characterized by an increasing monotonic behavior representing system deterioration. This monotonicity is partial rather than complete, to incorporate the potential occurrence of oscillations in the degradation trajectory of the system or noise-related data, reflecting real-world scenarios. Its performance, robustness, generalizability, and interpretability are evaluated across diverse domains utilizing three datasets from health care and engineering featuring limited, noisy, high-dimensional, and multimodal raw signals. Results show that the model extracts meaningful prognostic-related features in both domains and all datasets, without a significant alteration in its architecture and independently of the chosen prognostic algorithm.
We study time-inhomogeneous random walks on finite groups in the case where each random walk step need not be supported on a generating set of the group. When the supports of the random walk steps satisfy a natural condition involving normal subgroups of quotients of the group, we show that the random walk converges to the uniform distribution on the group and give bounds for the convergence rate using spectral properties of the random walk steps. As an application, we use the moment method of Wood to prove a universality theorem for cokernels of random integer matrices allowing some dependence between entries.
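A toy case shows how a walk can still reach the uniform distribution when no single step is supported on a generating set. The sketch below is our own illustration on the cyclic group $\mathbb{Z}_6$, not the paper's condition or proof: the step distributions, function names, and the choice of group are all assumptions.

```python
from fractions import Fraction

def uniform_on(support, n):
    """Uniform distribution on `support`, as a list indexed by elements of Z_n."""
    return [Fraction(1, len(support)) if g in support else Fraction(0)
            for g in range(n)]

def convolve(p, q):
    """Distribution of X + Y mod n for independent X ~ p and Y ~ q."""
    n = len(p)
    out = [Fraction(0)] * n
    for a in range(n):
        for b in range(n):
            out[(a + b) % n] += p[a] * q[b]
    return out

def total_variation_to_uniform(p):
    """Total variation distance between p and the uniform distribution."""
    n = len(p)
    return sum(abs(x - Fraction(1, n)) for x in p) / 2

n = 6
step_a = uniform_on({0, 3}, n)     # supported on the subgroup of order 2
step_b = uniform_on({0, 2, 4}, n)  # supported on the subgroup of order 3

start = uniform_on({0}, n)         # walk starts at the identity
after_a = convolve(start, step_a)
after_ab = convolve(after_a, step_b)
after_aa = convolve(after_a, step_a)  # repeating step_a never leaves {0, 3}

tv_a = total_variation_to_uniform(after_a)
tv_ab = total_variation_to_uniform(after_ab)
tv_aa = total_variation_to_uniform(after_aa)
```

Neither support generates $\mathbb{Z}_6$ on its own, yet alternating the two steps gives the exact uniform distribution, because every element of $\mathbb{Z}_6$ is uniquely a sum of one element from each subgroup. Repeating only `step_a` stays trapped on $\{0, 3\}$.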
Powassan virus (POWV), a tick-borne flavivirus, is an emerging public health threat in the United States. In New York State (NYS), incidence of human POWV infection has increased in recent years. We describe the epidemiology of confirmed and probable POWV infection cases reported in NYS from 2013 to 2023. A total of 44 human cases were reported over the study period, with the highest incidence rates in Columbia and Putnam counties. Most cases occurred in White, non-Hispanic males over age 50. Hospitalization was reported in 91% of cases, and 11% were fatal. Human case data and tick surveillance results were analysed to assess spatiotemporal patterns of disease emergence. Spatial analysis revealed clustering of human cases in the Capital and Metropolitan regions of NYS. The prevalence of POWV in adult tick populations increased significantly statewide, and entomological risk was positively but modestly correlated to disease incidence at the ZIP code level. These findings suggest that POWV infection is emerging in geographically concentrated areas of NYS, highlighting the need for enhanced surveillance and targeted prevention efforts in high-risk regions.
We consider a generalization of the forest fire model on $\mathbb{Z}_+$ with ignition at zero only, studied by Volkov (2009 ALEA 6, 399–414). Unlike that model, we allow delays in the spread of the fires and the non-zero burning time of individual ‘trees’. We obtain some general properties for this model, which cover, among others, the phenomenon of an ‘infinite fire’, not present in the original model.
'High-Dimensional Probability,' winner of the 2019 PROSE Award in Mathematics, offers an accessible and friendly introduction to key probabilistic methods for mathematical data scientists. Streamlined and updated, this second edition integrates theory, core tools, and modern applications. Concentration inequalities are central, including classical results like Hoeffding's and Chernoff's inequalities, and modern ones like the matrix Bernstein inequality. The book also develops methods based on stochastic processes – Slepian's, Sudakov's, and Dudley's inequalities, generic chaining, and VC-based bounds. Applications include covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, and machine learning. New to this edition are 200 additional exercises, alongside extra hints to assist with self-study. Material on analysis, probability, and linear algebra has been reworked and expanded to help bridge the gap from a typical undergraduate background to a second course in probability.
We develop a methodology for conducting inference on extreme quantiles of unobserved individual heterogeneity (e.g., heterogeneous coefficients and treatment effects) in panel data and meta-analysis settings. Inference is challenging in such settings: only noisy estimates of heterogeneity are available, and central limit approximations perform poorly in the tails. We derive a necessary and sufficient condition under which noisy estimates are informative about extreme quantiles, along with sufficient rate and moment conditions. Under these conditions, we establish an extreme value theorem and an intermediate order theorem for noisy estimates. These results yield simple optimization-free confidence intervals (CIs) for extreme quantiles. Simulations show that our CIs have favorable coverage and that the rate conditions matter for the validity of inference. We illustrate the method with an application to firm productivity differences across areas of varying population density. By analyzing the left tails of the productivity distributions, we find no evidence of stronger firm selection in more densely populated areas.
Carbapenem-resistant Enterobacterales were isolated from the outlet of a wastewater treatment plant in Kristianstad in southern Sweden during the spring and summer of 2024. MALDI-ToF MS identification and subsequent whole-genome sequencing identified eight Klebsiella pneumoniae strains belonging to ST437 and ST873 and 10 Escherichia coli strains belonging to ST167, ST648, ST1284, and ST8346. All strains, except K. pneumoniae ST873, were NDM-5 positive. K. pneumoniae ST437 and E. coli ST8346 carried two carbapenemase genes, blaNDM-5 and blaOXA-181, as well as the extended-spectrum β-lactamase (ESBL) gene blaCTX-M-15. These two multi-drug-resistant ST variants, which are widespread globally but have not previously been detected in clinical settings in Sweden, have now been detected in treated wastewater from a mid-sized Swedish town.
In this work, by considering coherent systems comprising independent components with discrete lifetimes, we introduce the notion of discrete-time signature and then discuss some of its properties. With the use of the introduced signature, a stochastic ordering result is also established. We then introduce transformation formulas for the discrete-time signature to facilitate the comparison of systems of different sizes. Some examples are also presented to illustrate all the results developed here.
This paper presents a reliability-constrained Bayesian optimization framework for structural design under uncertainty, addressing challenges in stochastic optimization where the objectives and constraints are defined implicitly by potentially expensive numerical models. Our approach explicitly accounts for parameter uncertainty using results from Bayesian quadrature for uncertainty propagation in Gaussian process surrogate models. The method accommodates arbitrary probability distributions and employs gradient-based optimization for acquisition function maximization, strategically selecting sample points to minimize numerical model evaluations. We demonstrate our algorithm’s superior performance over random search and conventional Bayesian optimization through both an analytical test function and a prestressed tie-beam design case study, showing its practical applicability to structural optimization problems.
Analyzing topics and emotions in social media activism offers valuable insights into the competing voices that shape digital discourse. However, existing research has largely neglected the influence of geographic and linguistic diversity on public dialogue during crises. To address this gap, it is essential to recognize the varied perspectives of local communities and language groups. Doing so helps uncover specific local needs, ensures more inclusive representation, and supports the development of solutions that are responsive to the local context. We leverage machine learning models to analyze 1,036,111 public tweets from the #NoMore movement, including tweets containing #NoMore, #EthiopiaPrevails, and #SayNoMore. Our analysis examined the differences in content, emotional responses, and user influence by comparing tweets from Ethiopia and the United States (US), as well as those written in English and Amharic. The findings reveal distinct societal perspectives, emotional expressions, and opinion dynamics. Ethiopian users emphasized local issues with higher fear and joy responses, while users from the US leaned toward peace-related themes with spikes in anger. Amharic tweets focused on domestic concerns with greater emotional intensity than English tweets. These insights help surface region and language-specific perspectives often marginalized in mainstream coverage, paving the way for more inclusive and effective approaches to societal challenges.
We consider the filtering problem associated with partially observed McKean–Vlasov stochastic differential equations (SDEs). The model consists of data that are observed at regular and discrete times; the objective is to compute the conditional expectation of (functionals of) the solutions of the SDE at the current time. This problem is challenging even in the ordinary SDE case and requires numerical approximations. Based on the ideas in Ben Rached et al. (2024) and dos Reis et al. (2023), we develop a new particle filter (PF) and multilevel particle filter (MLPF) to approximate the aforementioned expectations. We prove under assumptions that, for $\varepsilon>0$, to obtain a mean square error of $\mathcal{O}(\varepsilon^2)$ the PF has a cost per observation time of $\mathcal{O}(\varepsilon^{-5})$, and the MLPF costs $\mathcal{O}(\varepsilon^{-4})$ (best case) or $\mathcal{O}(\varepsilon^{-4}\log(\varepsilon)^2)$ (worst case). Our theoretical results are supported by numerical experiments.
The marked Hawkes risk process is a compound point process where the occurrence and amplitude of past events impact the future. Since data in real life are acquired over a discrete time grid, we propose a strong discrete-time approximation of the continuous-time risk process obtained by embedding from the same Poisson measure. We then prove trajectorial convergence results in both fractional Sobolev spaces and the Skorokhod space, hence extending the theorems proven in Huang and Khabou ((2023). Stoch. Process. Appl. 161, 201–241) and Kirchner ((2016). Stoch. Process. Appl. 126(8), 2494–2525). We also provide upper bounds on the convergence speed with explicit dependence on the size of the discretization step, the time horizon, and the regularity of the kernel.
An increasing number of reports highlight the potential of machine learning (ML) methodologies over the conventional generalised linear model (GLM) for non-life insurance pricing. In parallel, national and international regulatory institutions are accentuating their focus on pricing fairness to quantify and mitigate algorithmic differences and discrimination. However, comprehensive studies that assess both pricing accuracy and fairness remain scarce. We propose a benchmark of the GLM against mainstream regularised linear models and tree-based ensemble models under two popular distribution modelling strategies (Poisson-gamma and Tweedie), with respect to key criteria including estimation bias, deviance, risk differentiation, competitiveness, loss ratios, discrimination and fairness. Pricing performance and fairness were assessed simultaneously on the same samples of premium estimates for GLM and ML models. The models were compared on two open-access motor insurance datasets, each with a different type of cover (fully comprehensive and third-party liability). While no single ML model outperformed across both pricing and discrimination metrics, the GLM significantly underperformed for most. The results indicate that ML may be considered a realistic and reasonable alternative to current practices. We advocate that benchmarking exercises for risk prediction models should be carried out to assess both pricing accuracy and fairness for any given portfolio.
This paper presents an actuarially oriented approach for estimating health state utility values using an enhanced EQ-5D-5L framework that incorporates demographic heterogeneity directly into a Generalised Linear Model (GLM). Using data from 148 patients with Stage IV non-small cell lung cancer (NSCLC) in South Africa, an inverse Gaussian GLM was fitted with demographic variables and EQ-5D-5L domain responses to explain variation in visual analogue scale (VAS) scores. Model selection relied on Akaike Information Criterion, Bayesian Information Criterion, and residual deviance, and extensive diagnostic checks confirmed good calibration, no overdispersion, and strong robustness under bootstrap validation. The final model identified age, gender, home language, and financial dependency as significant predictors of perceived health, demonstrating that utility values differ meaningfully across demographic groups. By generating subgroup-specific estimates rather than relying on uniform value sets, the framework supports more context-sensitive cost-effectiveness modelling and fairer resource allocation. Although developed in the South African NSCLC setting, the methodology is generalisable and offers actuaries and health economists a replicable tool for integrating population heterogeneity into Health Technology Assessment, pricing analysis, and value-based care.
Surrogate models have gained widespread popularity for their effectiveness in replacing computationally expensive numerical analyses, particularly in scenarios such as design optimization procedures, requiring hundreds or thousands of simulations. While one-shot sampling methods—where all samples are generated in a single stage without prior knowledge of the required sample size—are commonly adopted in the creation of surrogate models, these methods face significant limitations. Given that the characteristics of the underlying system are generally unknown prior to training, adopting one-shot sampling can lead to suboptimal model performance or unnecessary computational costs, especially in complex or high-dimensional problems. This paper addresses these challenges by proposing a novel, model-independent adaptive sampling approach with batch selection, termed Cross-Validation Batch Adaptive Sampling for High-Efficiency Surrogates (CV-BASHES). CV-BASHES is first validated using two analytical functions to explore its flexibility and accuracy under different configurations, confirming its robustness. Comparative studies on the same functions with two state-of-the-art methods, maximum projection (MaxPro) and scalable adaptive sampling (SAS), demonstrate the superior accuracy and robustness of CV-BASHES. Its applicability is further demonstrated through a geotechnical application, where CV-BASHES is used to develop a surrogate model to predict the horizontal deformation of a diaphragm wall supporting a deep excavation. Results show that CV-BASHES efficiently selects training samples, reducing the dataset size while maintaining high surrogate accuracy. By offering more efficient sampling strategies, CV-BASHES streamlines and enhances the process of creating machine learning models as surrogates for tackling complex problems in general engineering disciplines.
Pretesting for exogeneity has become routine in many empirical applications involving instrumental variables (IVs) to decide whether the ordinary least squares or IV-based method is appropriate. Guggenberger (2010a, Econometric Theory, 26, 369–382) shows that the second-stage test – based on the outcome of a Durbin-Wu-Hausman-type pretest in the first stage – exhibits extreme size distortion, with asymptotic size equal to 1 when the standard critical values are used. In this paper, we first show that both conditional and unconditional on the data, standard wild bootstrap procedures are invalid for two-stage testing. Second, we propose an identification-robust two-stage test statistic that switches between OLS-based and weak-IV-robust statistics. Third, we develop a size-adjusted wild bootstrap approach for our two-stage test that integrates specific wild bootstrap critical values with an appropriate size-adjustment method. We establish uniform validity of this procedure under conditional heteroskedasticity or clustering in the sense that the resulting tests achieve correct asymptotic size, regardless of whether the identification is strong or weak. Our procedure is especially valuable for empirical researchers facing potential weak identification. In such settings, its power advantage is notable: whereas weak-IV-robust methods maintain correct size but often suffer from relatively low power, our approach achieves better performance.
We consider two-person zero-sum semi-Markov games with incomplete reward information on one side under the expected discount criterion. First, we prove that the value function exists and satisfies the Shapley equation. From the Shapley equation, we construct an optimal policy for the informed player. Second, to show the existence of an optimal policy for the uninformed player, we introduce an auxiliary dual game and establish the relationship between the primal game and the dual game. By this relationship, we also prove the existence of the value function of the dual game, and then construct an optimal policy for the uninformed player in the primal game. Finally, we develop two iterative algorithms to compute $\varepsilon$-optimal policies for the informed player and the uninformed player, respectively.
A random variable $\xi$ has a light-tailed distribution (for short, is light-tailed) if it possesses a finite exponential moment, ${\mathbb{E}} \, {\exp}{(\lambda \xi)} <\infty$ for some $\lambda >0$, and has a heavy-tailed distribution (is heavy-tailed) if ${\mathbb{E}} \, {\exp}{(\lambda\xi)} = \infty$ for all $\lambda>0$. Leipus et al. (2023 AIMS Math. 8, 13066–13072) presented a particular example of a light-tailed random variable that is the minimum of two independent heavy-tailed random variables. We show that this phenomenon is universal: any light-tailed random variable with right-unbounded support may be represented as the minimum of two independent heavy-tailed random variables. Moreover, a more general fact holds: these two independent random variables may have as heavy-tailed distributions as we wish. Further, we extend the latter result to the minimum of any finite number of independent random variables. We also comment on possible generalizations of our result to the case of dependent random variables.
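The mechanism behind such a representation can be made explicit with a standard identity for independent random variables. The following is a sketch in our own notation: the tail functions $\overline{F}$, $\overline{F}_1$, $\overline{F}_2$ and the functions $h_i$ are assumptions introduced for illustration, not symbols taken from the paper.

```latex
For independent $\xi_1$, $\xi_2$ with right-unbounded support,
\[
\mathbb{P}\bigl(\min(\xi_1,\xi_2) > x\bigr)
  = \mathbb{P}(\xi_1 > x)\,\mathbb{P}(\xi_2 > x),
\qquad\text{i.e.}\qquad
\overline{F}(x) = \overline{F}_1(x)\,\overline{F}_2(x).
\]
Writing $h_i(x) = -\log \overline{F}_i(x)$, the minimum is light-tailed exactly when
\[
\liminf_{x\to\infty} \frac{h_1(x) + h_2(x)}{x} > 0,
\]
while each $\xi_i$ is heavy-tailed exactly when
\[
\liminf_{x\to\infty} \frac{h_i(x)}{x} = 0, \qquad i = 1, 2.
\]
```

Both conditions can hold together only if $h_1(x)/x$ and $h_2(x)/x$ come close to zero along different sequences of $x$, so that at every large $x$ at least one of the two tails is exponentially thin.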
Longevity risk significantly impacts the reserve adequacy ratio of annuity issuers, thereby reducing product profitability. Effectively managing this risk has thus become a priority for insurance companies. A natural hedging strategy, which involves balancing longevity risk through an optimised portfolio of life insurance and annuity products, offers a promising solution and has attracted considerable academic attention in recent years. In this study, we construct a realistic portfolio scenario comprising annuities and life insurance policies across various ages and genders. By applying Cholesky decomposition, we transform the portfolio into an uncorrelated linear model. Our objective function minimises the variance in portfolio value changes, allowing us to explore the impact of mortality on longevity risk mitigation through natural hedging. Using actuarial mathematics and the Bayesian MCMC algorithm, we analyse the factors influencing the hedging effectiveness of a portfolio with minimised variance. Empirical findings indicate that the optimal life-to-annuity ratio is influenced by multiple factors, including gender, age, projection period, and forecast horizon. Based on these findings, we recommend that insurance companies adjust their business structures and actively pursue product innovation to enhance longevity risk management.
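The role of the Cholesky step can be illustrated on a two-line portfolio. This is a minimal sketch under stated assumptions, not the authors' model: the covariance matrix is hypothetical, the function names are ours, and the portfolio is reduced to one annuity position and one life insurance position.

```python
import math

def cholesky(matrix):
    """Lower-triangular L with L L^T equal to a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def portfolio_variance(weights, lower):
    """Variance of w^T X when X = L Z and Z has uncorrelated unit-variance entries."""
    n = len(weights)
    # Loadings on the uncorrelated factors are the entries of L^T w.
    loadings = [sum(lower[i][j] * weights[i] for i in range(n)) for j in range(n)]
    return sum(c * c for c in loadings)

# Hypothetical covariance of value changes: [annuity, life insurance].
# The negative off-diagonal entry encodes the natural hedge.
cov = [[4.0, -2.4],
       [-2.4, 9.0]]
L = cholesky(cov)

unhedged = portfolio_variance([1.0, 0.0], L)

# Variance-minimising life-to-annuity ratio for one unit of annuity exposure.
hedge_ratio = -cov[0][1] / cov[1][1]
hedged = portfolio_variance([1.0, hedge_ratio], L)

print(round(unhedged, 4), round(hedge_ratio, 4), round(hedged, 4))
```

Once the exposures are expressed through uncorrelated factors, the portfolio variance is a plain sum of squares, which is what makes the minimisation straightforward. In this toy case, adding the life position lowers the variance below that of the annuity alone.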