An increasing number of reports highlight the potential of machine learning (ML) methodologies over the conventional generalised linear model (GLM) for non-life insurance pricing. In parallel, national and international regulatory institutions are sharpening their focus on pricing fairness to quantify and mitigate algorithmic differences and discrimination. However, comprehensive studies that assess both pricing accuracy and fairness remain scarce. We propose a benchmark of the GLM against mainstream regularised linear models and tree-based ensemble models under two popular distribution modelling strategies (Poisson-gamma and Tweedie), with respect to key criteria including estimation bias, deviance, risk differentiation, competitiveness, loss ratios, discrimination and fairness. Pricing performance and fairness were assessed simultaneously on the same samples of premium estimates for GLM and ML models. The models were compared on two open-access motor insurance datasets, each with a different type of cover (fully comprehensive and third-party liability). While no single ML model outperformed across both pricing and discrimination metrics, the GLM significantly underperformed on most of them. The results indicate that ML may be considered a realistic and reasonable alternative to current practices. We advocate that benchmarking exercises for risk prediction models should be carried out to assess both pricing accuracy and fairness for any given portfolio.
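For readers unfamiliar with the Tweedie modelling strategy mentioned above, here is a minimal sketch using scikit-learn's `TweedieRegressor` on synthetic data. The features, exposure handling, and hyperparameters are invented for illustration; this is not the benchmarked pipeline from the paper.

```python
# Sketch: a Tweedie GLM for pure-premium modelling on synthetic motor data.
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))             # hypothetical rating factors
exposure = rng.uniform(0.5, 1.0, size=1000)
# Synthetic claim costs with many zeros, mimicking motor-insurance severity
y = np.where(rng.random(1000) < 0.9, 0.0, rng.gamma(2.0, 500.0, size=1000))

# power in (1, 2) selects the compound Poisson-gamma (Tweedie) family
glm = TweedieRegressor(power=1.5, alpha=0.1, link="log")
glm.fit(X, y / exposure, sample_weight=exposure)
premiums = glm.predict(X) * exposure       # expected pure premium per policy
print(premiums[:3])
```

The `power` parameter interpolates between the Poisson (`power=1`) and gamma (`power=2`) limits, which is one way to see how the two distribution strategies in the benchmark relate.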
This paper presents an actuarially oriented approach for estimating health state utility values using an enhanced EQ-5D-5L framework that incorporates demographic heterogeneity directly into a Generalised Linear Model (GLM). Using data from 148 patients with Stage IV non-small cell lung cancer (NSCLC) in South Africa, an inverse Gaussian GLM was fitted with demographic variables and EQ-5D-5L domain responses to explain variation in visual analogue scale (VAS) scores. Model selection relied on Akaike Information Criterion, Bayesian Information Criterion, and residual deviance, and extensive diagnostic checks confirmed good calibration, no overdispersion, and strong robustness under bootstrap validation. The final model identified age, gender, home language, and financial dependency as significant predictors of perceived health, demonstrating that utility values differ meaningfully across demographic groups. By generating subgroup-specific estimates rather than relying on uniform value sets, the framework supports more context-sensitive cost-effectiveness modelling and fairer resource allocation. Although developed in the South African NSCLC setting, the methodology is generalisable and offers actuaries and health economists a replicable tool for integrating population heterogeneity into Health Technology Assessment, pricing analysis, and value-based care.
Surrogate models have gained widespread popularity for their effectiveness in replacing computationally expensive numerical analyses, particularly in scenarios such as design optimization procedures, requiring hundreds or thousands of simulations. While one-shot sampling methods—where all samples are generated in a single stage without prior knowledge of the required sample size—are commonly adopted in the creation of surrogate models, these methods face significant limitations. Given that the characteristics of the underlying system are generally unknown prior to training, adopting one-shot sampling can lead to suboptimal model performance or unnecessary computational costs, especially in complex or high-dimensional problems. This paper addresses these challenges by proposing a novel, model-independent adaptive sampling approach with batch selection, termed Cross-Validation Batch Adaptive Sampling for High-Efficiency Surrogates (CV-BASHES). CV-BASHES is first validated using two analytical functions to explore its flexibility and accuracy under different configurations, confirming its robustness. Comparative studies on the same functions with two state-of-the-art methods, maximum projection (MaxPro) and scalable adaptive sampling (SAS), demonstrate the superior accuracy and robustness of CV-BASHES. Its applicability is further demonstrated through a geotechnical application, where CV-BASHES is used to develop a surrogate model to predict the horizontal deformation of a diaphragm wall supporting a deep excavation. Results show that CV-BASHES efficiently selects training samples, reducing the dataset size while maintaining high surrogate accuracy. By offering more efficient sampling strategies, CV-BASHES streamlines and enhances the process of creating machine learning models as surrogates for tackling complex problems in general engineering disciplines.
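To make the adaptive-sampling idea concrete, here is a generic cross-validation-driven loop on a 1-D toy function. This is not the CV-BASHES algorithm itself, only an illustration of the underlying principle: points where leave-one-out surrogate error is large attract the next samples.

```python
# Generic CV-driven adaptive sampling sketch (not CV-BASHES).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_model(x):                      # stand-in for a costly simulation
    return np.sin(3 * x) + 0.5 * x

X = np.linspace(0.0, 3.0, 5).reshape(-1, 1)  # small one-shot initial design
y = expensive_model(X).ravel()
rng = np.random.default_rng(5)

for batch in range(3):                       # three adaptive rounds of size one
    loo_err = np.empty(len(X))
    for k in range(len(X)):                  # leave-one-out surrogate errors
        mask = np.arange(len(X)) != k
        gp = GaussianProcessRegressor(alpha=1e-6).fit(X[mask], y[mask])
        loo_err[k] = abs(gp.predict(X[k:k + 1])[0] - y[k])
    worst = X[np.argmax(loo_err), 0]         # refine near the worst-predicted point
    x_new = np.clip(worst + 0.3 * rng.standard_normal(), 0.1, 2.9)
    X = np.vstack([X, [[x_new]]])
    y = np.append(y, expensive_model(x_new))

print(X.shape)                               # 5 initial + 3 adaptive samples
```

The actual method adds batch selection and model independence on top of this basic loop; the sketch only shows why CV errors are a usable surrogate-quality signal.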
Pretesting for exogeneity has become routine in many empirical applications involving instrumental variables (IVs) to decide whether the ordinary least squares or IV-based method is appropriate. Guggenberger (2010a, Econometric Theory, 26, 369–382) shows that the second-stage test – based on the outcome of a Durbin-Wu-Hausman-type pretest in the first stage – exhibits extreme size distortion, with asymptotic size equal to 1 when the standard critical values are used. In this paper, we first show that both conditional and unconditional on the data, standard wild bootstrap procedures are invalid for two-stage testing. Second, we propose an identification-robust two-stage test statistic that switches between OLS-based and weak-IV-robust statistics. Third, we develop a size-adjusted wild bootstrap approach for our two-stage test that integrates specific wild bootstrap critical values with an appropriate size-adjustment method. We establish uniform validity of this procedure under conditional heteroskedasticity or clustering in the sense that the resulting tests achieve correct asymptotic size, regardless of whether the identification is strong or weak. Our procedure is especially valuable for empirical researchers facing potential weak identification. In such settings, its power advantage is notable: whereas weak-IV-robust methods maintain correct size but often suffer from relatively low power, our approach achieves better performance.
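For context, the baseline wild bootstrap the paper builds on can be sketched in a few lines for an OLS slope under heteroskedasticity. This is the standard Rademacher-weight procedure, not the paper's size-adjusted two-stage variant, and all data here are synthetic.

```python
# Sketch: standard wild bootstrap SE for an OLS slope (Rademacher weights).
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + np.abs(x))  # heteroskedastic errors

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]               # OLS fit
resid = y - X @ beta

slopes = []
for _ in range(500):
    w = rng.choice([-1.0, 1.0], size=n)                   # Rademacher multipliers
    y_star = X @ beta + resid * w                         # wild bootstrap sample
    slopes.append(np.linalg.lstsq(X, y_star, rcond=None)[0][1])
se_boot = np.std(slopes)                                  # bootstrap SE of the slope
print(round(float(se_boot), 3))
```

The paper's contribution is precisely that applying this kind of procedure naively to the two-stage (pretest-then-test) setting is invalid, which motivates the size-adjusted critical values.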
We consider two-person zero-sum semi-Markov games with incomplete reward information on one side under the expected discount criterion. First, we prove that the value function exists and satisfies the Shapley equation. From the Shapley equation, we construct an optimal policy for the informed player. Second, to show the existence of an optimal policy for the uninformed player, we introduce an auxiliary dual game and establish the relationship between the primal game and the dual game. By this relationship, we also prove the existence of the value function of the dual game, and then construct an optimal policy for the uninformed player in the primal game. Finally, we develop two iterative algorithms to compute $\varepsilon$-optimal policies for the informed player and the uninformed player, respectively.
A random variable $\xi$ has a light-tailed distribution (for short, is light-tailed) if it possesses a finite exponential moment, ${\mathbb{E}} \, {\exp}{(\lambda \xi)} <\infty$ for some $\lambda >0$, and has a heavy-tailed distribution (is heavy-tailed) if ${\mathbb{E}} \, {\exp}{(\lambda\xi)} = \infty$ for all $\lambda>0$. Leipus et al. (2023, AIMS Math. 8, 13066–13072) presented a particular example of a light-tailed random variable that is the minimum of two independent heavy-tailed random variables. We show that this phenomenon is universal: any light-tailed random variable with right-unbounded support may be represented as the minimum of two independent heavy-tailed random variables. Moreover, a more general fact holds: these two independent random variables may have as heavy-tailed distributions as we wish. Further, we extend the latter result to the minimum of any finite number of independent random variables. We also comment on possible generalizations of our result to the case of dependent random variables.
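The phenomenon can be illustrated by a standard "alternating blocks" construction (an intuition sketch under our own simplifying choices, not necessarily the representation built in the paper). Pick $0 = a_0 < a_1 < \cdots$ with $a_{n+1}/a_n \to \infty$ and let the two survival functions take turns decaying:

```latex
\bar F_X(t) \text{ decays like } e^{-t} \text{ on } [a_{2n}, a_{2n+1})
  \text{ and stays constant on } [a_{2n+1}, a_{2n+2});\\
\bar F_Y(t) \text{ stays constant where } \bar F_X \text{ decays, and decays like } e^{-t} \text{ otherwise.}
```

Each variable is heavy-tailed, since its arbitrarily long flat stretches defeat every exponential moment; yet by independence $\Pr(\min(X,Y)>t)=\bar F_X(t)\,\bar F_Y(t)$, and at every $t$ the decayed portions of the two factors jointly cover $[0,t]$, so the product decays like $e^{-t}$ and the minimum is light-tailed.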
Longevity risk significantly impacts the reserve adequacy ratio of annuity issuers, thereby reducing product profitability. Effectively managing this risk has thus become a priority for insurance companies. A natural hedging strategy, which involves balancing longevity risk through an optimised portfolio of life insurance and annuity products, offers a promising solution and has attracted considerable academic attention in recent years. In this study, we construct a realistic portfolio scenario comprising annuities and life insurance policies across various ages and genders. By applying Cholesky decomposition, we transform the portfolio into an uncorrelated linear model. Our objective function minimises the variance in portfolio value changes, allowing us to explore the impact of mortality on longevity risk mitigation through natural hedging. Using actuarial mathematics and the Bayesian MCMC algorithm, we analyse the factors influencing the hedging effectiveness of a portfolio with minimised variance. Empirical findings indicate that the optimal life-to-annuity ratio is influenced by multiple factors, including gender, age, projection period, and forecast horizon. Based on these findings, we recommend that insurance companies adjust their business structures and actively pursue product innovation to enhance longevity risk management.
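The Cholesky decorrelation step described above can be sketched in a few lines of numpy; the covariance matrix of portfolio value changes is invented here purely for illustration.

```python
# Sketch: decorrelating correlated value changes via Cholesky decomposition.
import numpy as np

cov = np.array([[4.0, 1.2, 0.8],
                [1.2, 3.0, 0.5],
                [0.8, 0.5, 2.0]])              # hypothetical covariance of value changes
L = np.linalg.cholesky(cov)                    # cov = L @ L.T

rng = np.random.default_rng(4)
x = rng.multivariate_normal(np.zeros(3), cov, size=20000).T  # correlated changes
y = np.linalg.solve(L, x)                      # transformed components are uncorrelated
print(np.round(np.cov(y), 2))                  # empirical covariance ~ identity
```

After this transform the portfolio-variance objective separates across components, which is what makes the minimum-variance optimisation tractable.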
This paper addresses the gap between theoretical modeling of cyber risk propagation and empirical analysis of loss characteristics by introducing an approach that integrates the two. We model the development of cyber loss counts over time using a discrete-time susceptible-infected-recovered (SIR) process, link these counts to covariates, and model loss severity with regression models. By incorporating temporal and covariate-dependent transition rates, we eliminate the scaling effect of population size on infection counts, revealing the true underlying dynamics. Simulations show that this susceptible-infected-recovered framework significantly improves aggregate loss prediction accuracy, providing a more effective and practical tool for actuarial assessments and risk management in the cyber risk context.
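An illustrative discrete-time SIR recursion of the kind described is sketched below; the parameter values and the particular time-dependence of the transmission rate are invented for this sketch, and working in population fractions is one simple way to remove the population-size scaling the abstract mentions.

```python
# Sketch: a discrete-time SIR recursion with a time-dependent transmission rate.
def sir_step(s, i, r, beta, gamma):
    """One discrete step; beta may be time- or covariate-dependent."""
    new_inf = beta * s * i          # new infections (frequency-dependent mixing)
    new_rec = gamma * i             # recoveries
    return s - new_inf, i + new_inf - new_rec, r + new_rec

s, i, r = 0.99, 0.01, 0.0           # fractions of the population, not raw counts
history = []
for t in range(50):
    beta_t = 0.4 * (1 - 0.01 * t)   # e.g. a declining transmission rate over time
    s, i, r = sir_step(s, i, r, beta_t, gamma=0.2)
    history.append(i)
print(max(history))                 # peak infected fraction
```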
We developed a dynamic COVID-19 Vaccination Barrier Index (CVBI) at the census-tract level in Clark County, Nevada and assessed its geographic disparities and relationship with COVID-19 vaccination rates over time. Using monthly census-tract data from December 2020 to June 2022, the CVBI integrated demographic, socioeconomic, environmental, housing, and transportation variables, alongside surrogates for vaccination accessibility and vaccine hesitancy. Lagged weighted quantile sum regression was applied to construct monthly indices, while a Besag-York-Mollié model assessed associations with vaccination rates. The results revealed consistent vaccination barriers such as living in group quarters, housing inadequacy, and population density across all vaccination statuses (partial, full, booster). Rural and northern Clark County, especially northeastern Las Vegas, exhibited higher CVBI scores that correlated negatively with vaccination rates. Booster vaccination patterns differed, displaying fewer significantly vulnerable tracts. The dynamic nature of barriers is evident, highlighting temporal shifts in the significance of variables like driving distance to vaccine sites. This study emphasizes the importance of dynamic, localized assessments in identifying vaccination barriers, guiding public health interventions, and informing resource allocation to enhance vaccine accessibility during pandemics.
COVID-19 led to a pandemic in 2020, officially arriving in Colombia on 6 March 2020. As in other parts of the world, the spread of the virus was underestimated due to the lack of diagnostic tests and follow-up protocols. The present study estimates the number of daily COVID-19 infections compatible with theoretical knowledge of the disease, seroprevalence studies, and records of daily deaths due to the disease. To this end, the REMEDID (Retrospective Methodology to Estimate Daily Infections from Deaths) algorithm was applied in nine Colombian cities. On average, official records detected only around 13% of the peak number of infected persons in the first wave and dated that peak with a delay of 25 days. In addition, there was an average delay of 30 days in detecting the first cases. In particular, in Bogotá, the city with the highest number of infections in Colombia, we observed that (1) the first infected person arrived on 26 January 2020, 40 days before the official registration; (2) the peak of infections was around 6 times higher than that recorded in official statistics; and (3) this peak was reached on 8 July 2020, 39 days before the official registration date.
We investigate some investment problems related to maximizing the expected utility of the terminal wealth in a continuous-time Itô–Markov additive market. In this market, the prices of financial assets are described by Markov additive processes that combine Lévy processes with regime-switching models. We give explicit expressions for the solutions to the portfolio selection problem for the hyperbolic absolute risk aversion (HARA) utility, the exponential utility, and the extended logarithmic utility. In addition, we demonstrate that the solutions for the HARA utility are stable in terms of weak convergence when the parameters vary in a suitable way.
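For reference, one common parametrization of the HARA family (the notation here is generic and may differ from the paper's):

```latex
U(x) \;=\; \frac{1-\gamma}{\gamma}\left(\frac{a x}{1-\gamma}+b\right)^{\gamma},
\qquad a>0,\; b\ge 0,\; \gamma<1,\; \gamma\neq 0.
```

Its risk tolerance $-U'(x)/U''(x) = x/(1-\gamma) + b/a$ is affine in wealth, which is what makes the class "hyperbolic"; the exponential and logarithmic utilities also treated in the paper arise as limiting cases ($\gamma \to -\infty$ and $\gamma \to 0$, respectively).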
We study bond and site Bernoulli percolation models on $\mathbb{Z}^d$ for $d \geq 3$ with parameter $p$, in both the oriented and non-oriented versions. The main macroscopic quantity of interest is the probability of long-range order, and the existence of a non-trivial threshold is well established. Precise numerical results for the threshold values are available in the literature, but mathematically rigorous bounds are mostly restricted to two-dimensional lattices. Utilizing dynamical coupling techniques, we introduce a comprehensive set of new rigorous upper bounds that corroborate existing numerical values.
The Latent Position Model (LPM) is a popular approach for the statistical analysis of network data. A central aspect of this model is that it assigns nodes to random positions in a latent space, such that the probability of an interaction between each pair of individuals or nodes is determined by their distance in this latent space. A key feature of this model is that it allows one to visualize nuanced structures via the latent space representation. The LPM can be further extended to the Latent Position Cluster Model (LPCM), to accommodate the clustering of nodes by assuming that the latent positions are distributed following a finite mixture distribution. In this paper, we extend the LPCM to accommodate missing network data and apply this to non-negative discrete weighted social networks. By treating missing data as “unusual” zero interactions, we propose a combination of the LPCM with the zero-inflated Poisson distribution. Statistical inference is based on a novel partially collapsed Markov chain Monte Carlo algorithm, where a Mixture-of-Finite-Mixtures (MFM) model is adopted to automatically determine the number of clusters and optimal group partitioning. Our algorithm features a truncated absorb-eject move, which is a novel adaptation of an idea commonly used in collapsed samplers, within the context of MFMs. Another aspect of our work is that we illustrate our results on 3-dimensional latent spaces, maintaining clear visualizations while achieving more flexibility than 2-dimensional models. The performance of this approach is illustrated via three carefully designed simulation studies, as well as four different publicly available real networks, where some interesting new perspectives are uncovered.
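The zero-inflation idea can be written down directly. A minimal sketch of the zero-inflated Poisson pmf follows; the parameter names `pi` (extra-zero probability) and `lam` (Poisson rate) are generic choices, not the paper's notation.

```python
# Sketch: zero-inflated Poisson pmf, mixing a point mass at zero with Poisson counts.
import math

def zip_pmf(k, lam, pi):
    """P(Y = k): structural zeros with probability pi, else Poisson(lam)."""
    base = math.exp(-lam) * lam**k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * base

# With pi = 0.3 and lam = 2, zeros combine structural and Poisson zeros
p0 = zip_pmf(0, 2.0, 0.3)
print(round(p0, 4))
```

In the network setting, the structural-zero component is what lets "unusual" zero interactions stand in for missing edges while ordinary zero weights remain Poisson-distributed.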
The generalized Gompertz distribution—an extension of the standard Gompertz distribution as well as the exponential distribution and the generalized exponential distribution—offers more flexibility in modeling survival or failure times, as it introduces an additional parameter that can account for different shapes of hazard functions. This enhances its applicability in fields such as actuarial science, reliability engineering and survival analysis, where more complex survival models are needed to accurately capture the underlying processes. The effect of heterogeneity has recently attracted increased interest. In this article, multivariate chain majorization methods are exploited to develop stochastic ordering results for extreme-order statistics arising from independent heterogeneous generalized Gompertz random variables with an increased degree of heterogeneity.
The dynamics of information diffusion on social media platforms vary significantly between individual communities and the broader population. This study explores and compares the differences between community-based interventions and population-wide approaches in adjusting the spread of information. We first examine the temporal dynamics of social media groups, assessing their behavior through metrics such as time-dependent posts and retweets. Using functional data analysis, we investigate Twitter activities related to incidents such as the Skripal/Novichok case. We present three ways to quantify disparities between communities and uncover the strategies used by each group to promote specific narratives. We then compare the impact of targeted, community-based interventions with that of broader, population-wide responses in shaping the diffusion of information. Through this analysis, we identify key differences in how communities engage with and amplify information, revealing distinct patterns in the diffusion process. Our findings provide a comparative framework for understanding the relative consequences of different intervention strategies, offering insights into how targeted and broad approaches influence public discourse across social media platforms.
As data are becoming increasingly important resources for municipal administrations in the context of urban development, formalization of urban data governance (DG) is considered a prerequisite to systematic municipal data practice for the common good. Unlike for larger cities, it is unclear how common such formalized DG is in rural districts and small towns. We therefore mapped the status quo in small municipalities in Germany as a case exemplifying the broader phenomenon. We systematically searched online for policy documents on DG in all metropolitan regions, all rural districts, and a quota sample of nearly a sixth of all German small towns. We then performed content analysis of the identified documents along predefined categories of urban development. Results show that hardly any small towns have relevant policy documents. Rural districts are somewhat more active in formally defining DG. The identified policy documents tend to address mostly economic activities, social infrastructure, and demography, whereas ‘housing’ and ‘urban design and public space’ are among the least-mentioned categories of urban development.
Regular inspections of civil structures and infrastructure, performed by professional inspectors, are costly and demanding in terms of time and safety requirements. Additionally, the outcome of inspections can be subjective and inaccurate, as it relies on the inspector’s expertise. To address these challenges, autonomous inspection systems offer a promising alternative. However, existing robotic inspection systems often lack adaptive positioning capabilities and integrated crack labeling, limiting detection accuracy and their contribution to long-term dataset improvement. This study introduces a fully autonomous framework that combines real-time crack detection with adaptive pose adjustment, automated recording and labeling of defects, and integration of RGB-D and LiDAR sensing for precise navigation. Damage detection is performed using YOLOv5, a widely used detection model, which analyzes the RGB image stream to detect cracks and generates labels for dataset creation. The robot autonomously adjusts its position based on confidence feedback from the detection algorithm, optimizing its vantage point for improved detection accuracy. Experimental inspections showed an average confidence gain of 18% (exceeding 20% for certain crack types), a reduction in size estimation error from 23.31% to 10.09%, and a decrease in the detection failure rate from 20% to 6.66%. While quantitative validation during field testing proved challenging due to dynamic environmental conditions, qualitative observations aligned with these trends, suggesting the system’s potential to reduce manual intervention in inspections. Moreover, the system enables automated recording and labeling of detected cracks, contributing to the continuous improvement of machine learning models for structural health monitoring.
International travel is thought to be a major risk factor for developing gastrointestinal illness in England. Transmission is thought to be more likely in countries with lower food hygiene standards, poorer sanitation, and a lack of access to clean water. However, many studies are conducted within travel clinic settings, which may bias findings. Here, we present a community-based case–control study of returning English travellers, using cases of gastrointestinal illness notified to UKHSA.
All Cryptosporidiosis, Giardiasis, non-typhoidal Salmonellosis, and Shigellosis cases notified to the UK Health Security Agency (UKHSA) between 1 July 2023 and 15 October 2023 were asked to complete an anonymous electronic questionnaire if they had travelled during their incubation period. Asymptomatic travellers were recruited as controls via a market research panel and asked to complete the same questionnaire. A destination water, sanitation, and hygiene (WASH) score was derived from the WHO ‘Attributable fraction of diarrhoea to inadequate WASH’ dataset. Demographics, travel details, and exposures while travelling were compared by Pearson’s chi-squared test, and pathogen- and destination-specific multivariable analyses were performed using a forward stepwise approach.
A total of 653 cases and 483 controls were included. The odds of being a case were significantly higher when travelling to countries outside of the EU (OR: 4.6, 95% CI: 3.5–6.0; p < 0.001) and to countries with a high-risk WASH score (OR: 6.6, 95% CI: 4.9–9.1; p < 0.001), particularly Egypt, Mexico, Tunisia, and Turkey. For those travelling to a low-risk destination, eating undercooked meat or fish and swallowing water from environmental water sources were significantly associated with higher odds of illness by multivariable analysis (p < 0.05). At high-risk destinations, eating foods consumed on excursions, swallowing water from environmental sources, and eating foods from hotel buffets were significantly associated with higher odds of being a case.
Travel to popular tourist destinations is a potentially under-recognized risk factor for acquiring gastrointestinal infections. Exposures at low-risk destinations were broadly similar to risk factors in the UK. Exposures in high-risk destinations highlighted potential risks associated with catered hotels and tourist excursions which should be explored further.
A well-known theorem of Nikiforov asserts that any graph with a positive $K_{r}$-density contains a logarithmic blowup of $K_r$. In this paper, we explore variants of Nikiforov’s result in the following form: given $r,t\in \mathbb{N}$, when does a positive $K_{r}$-density imply the existence of a significantly larger (almost linear size) blowup of $K_t$? Our results include:
• For an $n$-vertex ordered graph $G$ with no induced monotone path $P_{6}$, if its complement $\overline {G}$ has positive triangle density, then $\overline {G}$ contains a biclique of size $\Omega ({n \over {\log n}})$. This strengthens a recent result of Pach and Tomon. For general $k$, let $g(k)$ be the minimum $r\in \mathbb{N}$ such that for any $n$-vertex ordered graph $G$ with no induced monotone $P_{2k}$, if $\overline {G}$ has positive $K_r$-density, then $\overline {G}$ contains a biclique of size $\Omega ({n \over {\log n}})$. Using concentration of measure and the isodiametric inequality on high dimensional spheres, we provide constructions showing that, surprisingly, $g(k)$ grows quadratically. On the other hand, we relate the problem of upper bounding $g(k)$ to a certain Ramsey problem and determine $g(k)$ up to a factor of 2.
• Any incomparability graph with positive $K_{r}$-density contains a blowup of $K_r$ of size $\Omega ({n \over {\log n}}).$ This confirms a conjecture of Tomon in a stronger form. In doing so, we obtain a strong regularity type lemma for incomparability graphs with no large blowups of a clique, which is of independent interest. We also prove that any $r$-comparability graph with positive $K_{(2h-2)^{r}+1}$-density contains a blowup of $K_h$ of size $\Omega (n)$, where the constant $(2h-2)^{r}+1$ is optimal.
The ${n \over {\log n}}$ size of the blowups in all our results is optimal up to a constant factor.