An increasing number of reports highlight the potential of machine learning (ML) methodologies over the conventional generalised linear model (GLM) for non-life insurance pricing. In parallel, national and international regulatory institutions are sharpening their focus on pricing fairness in order to quantify and mitigate algorithmic differences and discrimination. However, comprehensive studies that assess both pricing accuracy and fairness remain scarce. We propose a benchmark of the GLM against mainstream regularised linear models and tree-based ensemble models under two popular distribution modelling strategies (Poisson-gamma and Tweedie), with respect to key criteria including estimation bias, deviance, risk differentiation, competitiveness, loss ratios, discrimination and fairness. Pricing performance and fairness were assessed simultaneously on the same samples of premium estimates for GLM and ML models. The models were compared on two open-access motor insurance datasets, each with a different type of cover (fully comprehensive and third-party liability). While no single ML model outperformed the others across both pricing and discrimination metrics, the GLM significantly underperformed on most of them. The results indicate that ML may be considered a realistic and reasonable alternative to current practices. We advocate that benchmarking exercises for risk prediction models should be carried out to assess both pricing accuracy and fairness for any given portfolio.
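As a minimal illustration of the kind of frequency benchmark described above (not the paper's pipeline; the features are hypothetical and the data synthetic), a Poisson GLM and a Poisson gradient-boosted model can be compared on deviance and portfolio-level estimation bias as follows:

# Minimal frequency-modelling benchmark (illustrative only): a Poisson GLM
# versus a Poisson gradient-boosted tree model on synthetic policy data.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_poisson_deviance

rng = np.random.default_rng(0)
n = 20_000
X = np.column_stack([
    rng.integers(18, 90, n),          # driver age (hypothetical feature)
    rng.integers(0, 30, n),           # vehicle age (hypothetical feature)
    rng.uniform(0.5, 1.0, n),         # exposure in years
])
true_rate = np.exp(-2.0 + 0.01 * (X[:, 0] - 40) ** 2 / 100 - 0.03 * X[:, 1])
y = rng.poisson(true_rate * X[:, 2])  # simulated claim counts

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
glm = PoissonRegressor(alpha=1e-4).fit(X_tr, y_tr)
gbm = HistGradientBoostingRegressor(loss="poisson").fit(X_tr, y_tr)

for name, model in [("GLM", glm), ("GBM", gbm)]:
    pred = np.clip(model.predict(X_te), 1e-6, None)
    dev = mean_poisson_deviance(y_te, pred)
    bias = pred.sum() / y_te.sum()    # portfolio-level estimation bias
    print(f"{name}: mean Poisson deviance={dev:.4f}, predicted/observed claims={bias:.3f}")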
This paper presents an actuarially oriented approach for estimating health state utility values using an enhanced EQ-5D-5L framework that incorporates demographic heterogeneity directly into a Generalised Linear Model (GLM). Using data from 148 patients with Stage IV non-small cell lung cancer (NSCLC) in South Africa, an inverse Gaussian GLM was fitted with demographic variables and EQ-5D-5L domain responses to explain variation in visual analogue scale (VAS) scores. Model selection relied on Akaike Information Criterion, Bayesian Information Criterion, and residual deviance, and extensive diagnostic checks confirmed good calibration, no overdispersion, and strong robustness under bootstrap validation. The final model identified age, gender, home language, and financial dependency as significant predictors of perceived health, demonstrating that utility values differ meaningfully across demographic groups. By generating subgroup-specific estimates rather than relying on uniform value sets, the framework supports more context-sensitive cost-effectiveness modelling and fairer resource allocation. Although developed in the South African NSCLC setting, the methodology is generalisable and offers actuaries and health economists a replicable tool for integrating population heterogeneity into Health Technology Assessment, pricing analysis, and value-based care.
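A minimal sketch of the modelling step, assuming hypothetical column names and simulated data rather than the study's patient records, could look as follows in statsmodels:

# Illustrative inverse Gaussian GLM for VAS-type scores (hypothetical data and columns).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 148
df = pd.DataFrame({
    "vas": rng.uniform(20, 95, n),                  # visual analogue scale score
    "age": rng.integers(35, 85, n),
    "gender": rng.choice(["F", "M"], n),
    "financially_dependent": rng.choice([0, 1], n),
    "mobility": rng.integers(1, 6, n),              # EQ-5D-5L domain level (1-5)
})

model = smf.glm(
    "vas ~ age + gender + financially_dependent + mobility",
    data=df,
    family=sm.families.InverseGaussian(sm.families.links.Log()),
).fit()
print(model.summary())
print("AIC:", model.aic, " Residual deviance:", model.deviance)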
Surrogate models have gained widespread popularity for their effectiveness in replacing computationally expensive numerical analyses, particularly in scenarios such as design optimization procedures that require hundreds or thousands of simulations. While one-shot sampling methods—where all samples are generated in a single stage without prior knowledge of the required sample size—are commonly adopted in the creation of surrogate models, these methods face significant limitations. Given that the characteristics of the underlying system are generally unknown prior to training, adopting one-shot sampling can lead to suboptimal model performance or unnecessary computational costs, especially in complex or high-dimensional problems. This paper addresses these challenges by proposing a novel, model-independent adaptive sampling approach with batch selection, termed Cross-Validation Batch Adaptive Sampling for High-Efficiency Surrogates (CV-BASHES). CV-BASHES is first validated using two analytical functions to explore its flexibility and accuracy under different configurations, confirming its robustness. Comparative studies on the same functions with two state-of-the-art methods, maximum projection (MaxPro) and scalable adaptive sampling (SAS), demonstrate the superior accuracy and robustness of CV-BASHES. Its applicability is further demonstrated through a geotechnical application, where CV-BASHES is used to develop a surrogate model to predict the horizontal deformation of a diaphragm wall supporting a deep excavation. Results show that CV-BASHES efficiently selects training samples, reducing the dataset size while maintaining high surrogate accuracy. By offering more efficient sampling strategies, CV-BASHES streamlines and enhances the process of creating machine learning models as surrogates for tackling complex problems in general engineering disciplines.
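The sketch below illustrates the general idea of cross-validation-driven batch adaptive sampling with a Gaussian process surrogate; it is a generic illustration under assumed settings, not the published CV-BASHES algorithm, and the expensive simulator is replaced by a cheap analytical stand-in.

# Sketch of cross-validation-driven adaptive batch sampling (generic illustration,
# not the published CV-BASHES algorithm).
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_predict

def f(x):                                   # stand-in for an expensive simulator
    return np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])

d, batch, n_rounds = 2, 5, 6
sampler = qmc.LatinHypercube(d=d, seed=0)
X = sampler.random(10)                      # small initial design
y = f(X)

for _ in range(n_rounds):
    # Cross-validation errors at the training points act as a model-independent
    # proxy for local surrogate inaccuracy.
    cv_pred = cross_val_predict(GaussianProcessRegressor(), X, y, cv=5)
    cv_err = np.abs(cv_pred - y)

    cand = sampler.random(500)              # candidate pool
    dists = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    # Score candidates by the CV error of their nearest training point, weighted
    # by distance so the batch also spreads out (space filling).
    score = cv_err[nearest] * dists.min(axis=1)

    new_pts = []
    for _ in range(batch):                  # greedy batch selection
        i = int(np.argmax(score))
        new_pts.append(cand[i])
        score -= score[i] * np.exp(-10 * np.linalg.norm(cand - cand[i], axis=1))
    new_pts = np.array(new_pts)
    X, y = np.vstack([X, new_pts]), np.concatenate([y, f(new_pts)])

print("final training set size:", len(X))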
Pretesting for exogeneity has become routine in many empirical applications involving instrumental variables (IVs) to decide whether the ordinary least squares or IV-based method is appropriate. Guggenberger (2010a, Econometric Theory, 26, 369–382) shows that the second-stage test – based on the outcome of a Durbin-Wu-Hausman-type pretest in the first stage – exhibits extreme size distortion, with asymptotic size equal to 1 when the standard critical values are used. In this paper, we first show that, both conditionally and unconditionally on the data, standard wild bootstrap procedures are invalid for two-stage testing. Second, we propose an identification-robust two-stage test statistic that switches between OLS-based and weak-IV-robust statistics. Third, we develop a size-adjusted wild bootstrap approach for our two-stage test that integrates specific wild bootstrap critical values with an appropriate size-adjustment method. We establish uniform validity of this procedure under conditional heteroskedasticity or clustering, in the sense that the resulting tests achieve correct asymptotic size regardless of whether identification is strong or weak. Our procedure is especially valuable for empirical researchers facing potential weak identification. In such settings, its power advantage is notable: whereas weak-IV-robust methods maintain correct size but often suffer from relatively low power, our approach achieves better performance.
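For orientation, the sketch below shows how wild bootstrap critical values are generated in the simplest setting, a heteroskedasticity-robust t-test with Rademacher weights under a restricted (null-imposed) fit; the paper's size-adjusted two-stage procedure is considerably more involved and is not reproduced here.

# Generic wild bootstrap of a heteroskedasticity-robust t-statistic with
# Rademacher weights (not the paper's size-adjusted two-stage procedure).
import numpy as np

def hc_tstat(y, X, j=1):
    """t-statistic for beta_j using an HC0 (White) variance estimate."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    V = XtX_inv @ (X.T * u**2) @ X @ XtX_inv
    return beta[j] / np.sqrt(V[j, j])

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 0.0 * X[:, 1] + np.abs(X[:, 1]) * rng.normal(size=n)   # heteroskedastic errors

t_obs = hc_tstat(y, X)

# Impose the null beta_1 = 0 (intercept-only fit), then resample residuals.
b0 = y.mean()
u_r = y - b0
t_boot = []
for _ in range(999):
    v = rng.choice([-1.0, 1.0], size=n)      # Rademacher multipliers
    y_star = b0 + u_r * v
    t_boot.append(hc_tstat(y_star, X))

crit = np.quantile(np.abs(t_boot), 0.95)
print(f"|t| = {abs(t_obs):.3f}, wild bootstrap 5% critical value = {crit:.3f}")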
We consider two-person zero-sum semi-Markov games with incomplete reward information on one side under the expected discount criterion. First, we prove that the value function exists and satisfies the Shapley equation. From the Shapley equation, we construct an optimal policy for the informed player. Second, to show the existence of an optimal policy for the uninformed player, we introduce an auxiliary dual game and establish the relationship between the primal game and the dual game. By this relationship, we also prove the existence of the value function of the dual game, and then construct an optimal policy for the uninformed player in the primal game. Finally, we develop two iterative algorithms to compute $\varepsilon$-optimal policies for the informed player and the uninformed player, respectively.
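As a point of reference, the Shapley equation in the much simpler fully observed, discounted zero-sum stochastic game can be iterated numerically by solving a matrix game at each state; the toy example below (with made-up payoffs and transitions) illustrates that fixed-point structure only, not the paper's incomplete-information semi-Markov algorithms.

# Shapley-equation value iteration for a small discounted zero-sum stochastic
# game with full information (a simplification of the paper's setting).
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game with payoff A to the row (max) player."""
    m, k = A.shape
    c = np.r_[np.zeros(m), -1.0]            # variables (x_1..x_m, v); maximize v
    A_ub = np.c_[-A.T, np.ones(k)]          # -A^T x + v <= 0 for every column strategy
    A_eq = np.c_[np.ones((1, m)), 0.0]      # mixed strategy sums to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun

# Two states, two actions per player; r[s] is the stage payoff matrix and
# P[s, a, b] is the next-state distribution (toy numbers).
r = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [3.0, 0.0]]])
P = np.full((2, 2, 2, 2), 0.5)
beta = 0.9                                   # discount factor

V = np.zeros(2)
for _ in range(200):                         # iterate the Shapley operator
    V = np.array([matrix_game_value(r[s] + beta * P[s] @ V) for s in range(2)])
print("approximate game values per state:", V)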
A random variable $\xi$ has a light-tailed distribution (for short, is light-tailed) if it possesses a finite exponential moment, ${\mathbb{E}} \, {\exp}{(\lambda \xi)} <\infty$ for some $\lambda >0$, and has a heavy-tailed distribution (is heavy-tailed) if ${\mathbb{E}} \, {\exp}{(\lambda\xi)} = \infty$ for all $\lambda>0$. Leipus et al. (2023, AIMS Math. 8, 13066–13072) presented a particular example of a light-tailed random variable that is the minimum of two independent heavy-tailed random variables. We show that this phenomenon is universal: any light-tailed random variable with right-unbounded support may be represented as the minimum of two independent heavy-tailed random variables. Moreover, a more general fact holds: the two independent random variables may have distributions with tails as heavy as we wish. Further, we extend the latter result to the minimum of any finite number of independent random variables. We also comment on possible generalizations of our result to the case of dependent random variables.
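The elementary identity underlying such representations is that, for independent $\xi_1$ and $\xi_2$,
$$\mathbb{P}\big(\min(\xi_1,\xi_2)>x\big)=\mathbb{P}(\xi_1>x)\,\mathbb{P}(\xi_2>x),$$
so representing a light-tailed $\xi$ as the minimum of independent random variables amounts to factorizing its tail $\bar F(x)$ into two factors, each of which must itself be heavy enough to destroy every exponential moment.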
Longevity risk significantly impacts the reserve adequacy ratio of annuity issuers, thereby reducing product profitability. Effectively managing this risk has thus become a priority for insurance companies. A natural hedging strategy, which involves balancing longevity risk through an optimised portfolio of life insurance and annuity products, offers a promising solution and has attracted considerable academic attention in recent years. In this study, we construct a realistic portfolio scenario comprising annuities and life insurance policies across various ages and genders. By applying Cholesky decomposition, we transform the portfolio into an uncorrelated linear model. Our objective function minimises the variance in portfolio value changes, allowing us to explore the impact of mortality on longevity risk mitigation through natural hedging. Using actuarial mathematics and the Bayesian MCMC algorithm, we analyse the factors influencing the hedging effectiveness of a portfolio with minimised variance. Empirical findings indicate that the optimal life-to-annuity ratio is influenced by multiple factors, including gender, age, projection period, and forecast horizon. Based on these findings, we recommend that insurance companies adjust their business structures and actively pursue product innovation to enhance longevity risk management.
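A stripped-down sketch of the variance-minimization idea for a two-block portfolio (one annuity block, one life block), using a Cholesky factor to generate correlated value changes under an assumed covariance matrix rather than the paper's Bayesian MCMC mortality model:

# Simplified natural-hedging sketch: choose the amount of life insurance w per
# unit of annuity that minimizes the variance of portfolio value changes.
# (Illustrative only; the covariance of value changes is assumed, not estimated.)
import numpy as np

rng = np.random.default_rng(3)

# Assumed covariance of value changes (annuity, life) under mortality shocks:
# longevity improvements raise annuity values and lower life insurance values.
Sigma = np.array([[4.0, -2.5],
                  [-2.5, 3.0]])
L = np.linalg.cholesky(Sigma)                    # generate correlated scenarios
dV = rng.standard_normal((100_000, 2)) @ L.T     # columns: [d_annuity, d_life]
d_annuity, d_life = dV[:, 0], dV[:, 1]

# Var(d_annuity + w * d_life) is minimized at w* = -Cov(dA, dL) / Var(dL).
w_star = -np.cov(d_annuity, d_life)[0, 1] / d_life.var(ddof=1)

print(f"variance-minimizing life-to-annuity ratio: {w_star:.3f}")
print("unhedged sd :", d_annuity.std())
print("hedged sd   :", (d_annuity + w_star * d_life).std())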
This paper addresses the gap between theoretical modeling of cyber risk propagation and empirical analysis of loss characteristics by introducing a novel framework that integrates the two. We model the development of cyber loss counts over time using a discrete-time susceptible-infected-recovered process, linking these counts to covariates and modeling loss severity with regression models. By incorporating temporal and covariate-dependent transition rates, we eliminate the scaling effect of population size on infection counts, revealing the true underlying dynamics. Simulations show that this susceptible-infected-recovered framework significantly improves aggregate loss prediction accuracy, providing a more effective and practical tool for actuarial assessments and risk management in the cyber risk context.
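A minimal simulation of covariate-dependent, discrete-time susceptible-infected-recovered dynamics, with made-up parameters and a single seasonal covariate, might look as follows:

# Minimal discrete-time SIR simulation with a covariate-dependent infection
# rate, illustrating the loss-count dynamics (all parameters are made up).
import numpy as np

rng = np.random.default_rng(4)
N, T = 10_000, 52                    # firms at risk, weekly periods
S, I, R = N - 10, 10, 0

theta = np.array([0.2, 0.5])         # coefficients of the log transmission rate
gamma = 0.3                          # weekly recovery (remediation) probability
new_infections = []

for t in range(T):
    x_t = np.array([1.0, np.sin(2 * np.pi * t / 52)])  # intercept + seasonal covariate
    beta_t = np.exp(theta @ x_t)                        # time-varying transmission rate
    p_inf = 1.0 - np.exp(-beta_t * I / N)               # per-susceptible infection prob.
    n_inf = rng.binomial(S, p_inf)
    n_rec = rng.binomial(I, gamma)
    S, I, R = S - n_inf, I + n_inf - n_rec, R + n_rec
    new_infections.append(n_inf)

# Aggregate expected losses: counts times the mean of an assumed lognormal severity.
severity_mean = np.exp(10 + 0.5 * 1.5**2)
agg_loss = np.array(new_infections) * severity_mean
print("total expected aggregate loss:", agg_loss.sum())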
We developed a dynamic COVID-19 Vaccination Barrier Index (CVBI) at the census-tract level in Clark County, Nevada and assessed its geographic disparities and relationship with COVID-19 vaccination rates over time. Using monthly census-tract data from December 2020 to June 2022, the CVBI integrated demographic, socioeconomic, environmental, housing, and transportation variables, alongside surrogates for vaccination accessibility and vaccine hesitancy. Lagged weighted quantile sum regression was applied to construct monthly indices, while a Besag-York-Mollié model assessed associations with vaccination rates. The results revealed consistent vaccination barriers such as living in group quarters, housing inadequacy, and population density across all vaccination statuses (partial, full, booster). Rural and northern Clark County, especially northeastern Las Vegas, exhibited higher CVBI scores that correlated negatively with vaccination rates. Booster vaccination patterns differed, displaying fewer significantly vulnerable tracts. The dynamic nature of barriers is evident, highlighting temporal shifts in the significance of variables like driving distance to vaccine sites. This study emphasizes the importance of dynamic, localized assessments in identifying vaccination barriers, guiding public health interventions, and informing resource allocation to enhance vaccine accessibility during pandemics.
We investigate some investment problems related to maximizing the expected utility of the terminal wealth in a continuous-time Itô–Markov additive market. In this market, the prices of financial assets are described by Markov additive processes that combine Lévy processes with regime-switching models. We give explicit expressions for the solutions to the portfolio selection problem for the hyperbolic absolute risk aversion (HARA) utility, the exponential utility, and the extended logarithmic utility. In addition, we demonstrate that the solutions for the HARA utility are stable in terms of weak convergence when the parameters vary in a suitable way.
We study bond and site Bernoulli percolation models on $\mathbb{Z}^d$ for $d \geq 3$ with parameter p, in both the oriented and non-oriented versions. The main macroscopic quantity of interest is the probability of long-range order, and the existence of a non-trivial threshold is well established. Precise numerical results for the threshold values are available in the literature, but mathematically rigorous bounds are mostly restricted to two-dimensional lattices. Utilizing dynamical coupling techniques, we introduce a comprehensive set of new rigorous upper bounds that corroborate existing numerical values.
The Latent Position Model (LPM) is a popular approach for the statistical analysis of network data. A central aspect of this model is that it assigns nodes to random positions in a latent space, such that the probability of an interaction between each pair of individuals or nodes is determined by their distance in this latent space. A key feature of this model is that it allows one to visualize nuanced structures via the latent space representation. The LPM can be further extended to the Latent Position Cluster Model (LPCM), to accommodate the clustering of nodes by assuming that the latent positions are distributed following a finite mixture distribution. In this paper, we extend the LPCM to accommodate missing network data and apply this to non-negative discrete weighted social networks. By treating missing data as “unusual” zero interactions, we propose a combination of the LPCM with the zero-inflated Poisson distribution. Statistical inference is based on a novel partially collapsed Markov chain Monte Carlo algorithm, where a Mixture-of-Finite-Mixtures (MFM) model is adopted to automatically determine the number of clusters and optimal group partitioning. Our algorithm features a truncated absorb-eject move, which is a novel adaptation of an idea commonly used in collapsed samplers, within the context of MFMs. Another aspect of our work is that we illustrate our results on 3-dimensional latent spaces, maintaining clear visualizations while achieving more flexibility than 2-dimensional models. The performance of this approach is illustrated via three carefully designed simulation studies, as well as four different publicly available real networks, where some interesting new perspectives are uncovered.
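The core of the observation model can be written compactly; the sketch below evaluates a zero-inflated Poisson log-likelihood given latent positions in a 3-dimensional space, with assumed parameter names, and deliberately omits the partially collapsed MCMC machinery.

# Zero-inflated Poisson likelihood for weighted edges given latent positions
# (illustrative; the paper's partially collapsed MCMC sampler is not shown).
import numpy as np
from scipy.special import gammaln

def zip_lpcm_loglik(Y, Z, alpha, p_zero):
    """Log-likelihood of an undirected count network Y under a ZIP latent
    position model with rate_ij = exp(alpha - ||z_i - z_j||)."""
    n = Y.shape[0]
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    lam = np.exp(alpha - dist)
    iu = np.triu_indices(n, k=1)
    y, l = Y[iu], lam[iu]

    log_pois = y * np.log(l) - l - gammaln(y + 1)
    ll_zero = np.log(p_zero + (1 - p_zero) * np.exp(-l))   # structural or sampling zero
    ll_pos = np.log1p(-p_zero) + log_pois                  # genuine positive count
    return np.where(y == 0, ll_zero, ll_pos).sum()

rng = np.random.default_rng(5)
n, d = 30, 3                                  # 3-dimensional latent space
Z = rng.normal(size=(n, d))
Y = rng.poisson(0.5, size=(n, n)); Y = np.triu(Y, 1); Y = Y + Y.T
print(zip_lpcm_loglik(Y, Z, alpha=1.0, p_zero=0.2))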
The generalized Gompertz distribution—an extension of the standard Gompertz distribution as well as the exponential distribution and the generalized exponential distribution—offers more flexibility in modeling survival or failure times, as it introduces an additional parameter that can account for different shapes of hazard functions. This enhances its applicability in various fields such as actuarial science, reliability engineering and survival analysis, where more complex survival models are needed to accurately capture the underlying processes. The effect of heterogeneity has attracted increased interest in recent years. In this article, multivariate chain majorization methods are exploited to develop stochastic ordering results for extreme order statistics arising from independent heterogeneous generalized Gompertz random variables with an increased degree of heterogeneity.
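In one commonly used parameterization, the generalized Gompertz distribution has cumulative distribution function
$$F(x)=\Big(1-\exp\!\Big(-\tfrac{\lambda}{c}\big(e^{cx}-1\big)\Big)\Big)^{\theta},\qquad x>0,\ \lambda,c,\theta>0,$$
so that $\theta=1$ recovers the Gompertz distribution, the limit $c\to 0$ recovers the generalized exponential distribution, and both together recover the exponential distribution; the extra shape parameter $\theta$ is what allows non-monotone hazard shapes.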
The dynamics of information diffusion on social media platforms vary significantly between individual communities and the broader population. This study explores and compares the differences between community-based interventions and population-wide approaches in adjusting the spread of information. We first examine the temporal dynamics of social media groups, assessing their behavior through metrics such as time-dependent posts and retweets. Using functional data analysis, we investigate Twitter activities related to incidents such as the Skripal/Novichok case. We present three ways to quantify disparities between communities and uncover the strategies used by each group to promote specific narratives. We then compare the impact of targeted, community-based interventions with that of broader, population-wide responses in shaping the diffusion of information. Through this analysis, we identify key differences in how communities engage with and amplify information, revealing distinct patterns in the diffusion process. Our findings provide a comparative framework for understanding the relative consequences of different intervention strategies, offering insights into how targeted and broad approaches influence public discourse across social media platforms.
As data are becoming increasingly important resources for municipal administrations in the context of urban development, formalization of urban data governance (DG) is considered a prerequisite to systematic municipal data practice for the common good. Unlike for larger cities, it is unclear how common such formalized DG is in rural districts and small towns. We therefore mapped the current status quo in small municipalities in Germany as a case exemplifying the broader phenomenon. We systematically searched online for policy documents on DG in all metropolitan regions, all rural districts, and a quota sample of nearly a sixth of all German small towns. We then performed content analysis of the identified documents along predefined categories of urban development. Results show that hardly any small towns have relevant policy documents at their disposal. Rural districts are somewhat more active in formally defining DG. The identified policy documents tend to address mostly economic activities, social infrastructure, and demography, whereas “housing” and “urban design and public space” are among the least-mentioned categories of urban development.
Regular inspections of civil structures and infrastructure, performed by professional inspectors, are costly and demanding in terms of time and safety requirements. Additionally, the outcome of inspections can be subjective and inaccurate, as it relies on the inspector’s expertise. To address these challenges, autonomous inspection systems offer a promising alternative. However, existing robotic inspection systems often lack adaptive positioning capabilities and integrated crack labelling, limiting detection accuracy and their contribution to long-term dataset improvement. This study introduces a fully autonomous framework that combines real-time crack detection with adaptive pose adjustment, automated recording and labelling of defects, and integration of RGB-D and LiDAR sensing for precise navigation. Damage detection is performed using YOLOv5, a widely used detection model, which analyzes the RGB image stream to detect cracks and generates labels for dataset creation. The robot autonomously adjusts its position based on confidence feedback from the detection algorithm, optimizing its vantage point for improved detection accuracy. Inspection experiments showed an average confidence gain of 18% (exceeding 20% for certain crack types), a reduction in size estimation error from 23.31% to 10.09%, and a decrease in the detection failure rate from 20% to 6.66%. While quantitative validation during field testing proved challenging due to dynamic environmental conditions, qualitative observations aligned with these trends, suggesting the system’s potential to reduce manual intervention in inspections. Moreover, the system enables automated recording and labeling of detected cracks, contributing to the continuous improvement of machine learning models for structural health monitoring.
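The confidence-feedback idea can be sketched conceptually as follows; the crack-specific weights, camera stream and robot-motion interface are hypothetical placeholders, and only the public YOLOv5 hub API is taken as given.

# Conceptual sketch of confidence-feedback repositioning with YOLOv5.
# get_frame() and move_robot() are hypothetical placeholders for the camera
# stream and the robot-motion interface.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def detection_confidence(frame):
    """Return the highest detection confidence in an RGB frame."""
    results = model(frame)
    det = results.xyxy[0]            # tensor rows: [x1, y1, x2, y2, conf, class]
    return float(det[:, 4].max()) if len(det) else 0.0

def adjust_pose(get_frame, move_robot, target_conf=0.8, step=0.05, max_steps=10):
    """Nudge the camera pose until detection confidence stops improving."""
    best = detection_confidence(get_frame())
    for _ in range(max_steps):
        if best >= target_conf:
            break
        move_robot(dx=step)          # candidate move toward the defect
        conf = detection_confidence(get_frame())
        if conf > best:
            best = conf              # keep the new vantage point
        else:
            move_robot(dx=-step)     # revert the move and stop searching
            break
    return best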
A well-known theorem of Nikiforov asserts that any graph with a positive $K_{r}$-density contains a logarithmic blowup of $K_r$. In this paper, we explore variants of Nikiforov’s result in the following form. Given $r,t\in \mathbb{N}$, when does a positive $K_{r}$-density imply the existence of a significantly larger (almost linear-sized) blowup of $K_t$? Our results include:
• For an $n$-vertex ordered graph $G$ with no induced monotone path $P_{6}$, if its complement $\overline {G}$ has positive triangle density, then $\overline {G}$ contains a biclique of size $\Omega ({n \over {\log n}})$. This strengthens a recent result of Pach and Tomon. For general $k$, let $g(k)$ be the minimum $r\in \mathbb{N}$ such that for any $n$-vertex ordered graph $G$ with no induced monotone $P_{2k}$, if $\overline {G}$ has positive $K_r$-density, then $\overline {G}$ contains a biclique of size $\Omega ({n \over {\log n}})$. Using concentration of measure and the isodiametric inequality on high-dimensional spheres, we provide constructions showing that, surprisingly, $g(k)$ grows quadratically. On the other hand, we relate the problem of upper bounding $g(k)$ to a certain Ramsey problem and determine $g(k)$ up to a factor of 2.
• Any $n$-vertex incomparability graph with positive $K_{r}$-density contains a blowup of $K_r$ of size $\Omega ({n \over {\log n}}).$ This confirms a conjecture of Tomon in a stronger form. In doing so, we obtain a strong regularity type lemma for incomparability graphs with no large blowups of a clique, which is of independent interest. We also prove that any $n$-vertex $r$-comparability graph with positive $K_{(2h-2)^{r}+1}$-density contains a blowup of $K_h$ of size $\Omega (n)$, where the constant $(2h-2)^{r}+1$ is optimal.
The ${n \over {\log n}}$ blowup sizes in all our results are optimal up to a constant factor.