Large deviations of the largest and smallest eigenvalues of $\mathbf{X}\mathbf{X}^\top/n$ are studied in this note, where $\mathbf{X}_{p\times n}$ is a $p\times n$ random matrix with independent and identically distributed (i.i.d.) sub-Gaussian entries. The assumption imposed on the dimension $p$ and the sample size $n$ is $p=p(n)\rightarrow\infty$ with $p(n)={\mathrm{o}}(n)$. This study generalizes a result obtained in [3].
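As a quick numerical illustration of the objects involved (not taken from the note), the following sketch computes the extreme eigenvalues of $\mathbf{X}\mathbf{X}^\top/n$ for standard Gaussian (hence sub-Gaussian) entries with $p$ much smaller than $n$; the values of $n$, $p$ and the Gaussian choice are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustration only: extreme eigenvalues of X X^T / n for i.i.d. standard
# Gaussian (hence sub-Gaussian) entries, with p much smaller than n.
n, p = 20_000, 200
X = rng.standard_normal((p, n))
S = X @ X.T / n

eigvals = np.linalg.eigvalsh(S)
print("smallest eigenvalue:", eigvals[0])   # close to (1 - sqrt(p/n))^2 for unit-variance entries
print("largest eigenvalue: ", eigvals[-1])  # close to (1 + sqrt(p/n))^2 for unit-variance entries
```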
Climate change is expected to increase the frequency and intensity of extreme weather events. To properly assess the increased economic risk of these events, actuaries can benefit from relying on expert models and opinions from multiple sources, which requires the use of model combination techniques. From non-parametric to Bayesian approaches, different methods rely on varying assumptions, potentially leading to very different results. In this paper, we apply multiple model combination methods to an ensemble of 24 experts in a pooling approach and use the differences in outputs from the different combinations to illustrate how one can gain additional insight from using multiple methods. The densities obtained from pooling in Montreal and Quebec City highlight the significant changes in higher quantiles obtained through different combination approaches. Areal reduction factor and quantile projected changes are used to show that consistency, or lack thereof, across approaches reflects the uncertainty of combination methods. This shows how an actuary using multiple expert models should consider more than one combination method to properly assess the impact of climate change on loss distributions, since a single method can lead to overconfidence in projections.
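To make the combination step concrete, here is a minimal sketch of two common pooling rules (a linear and a log-linear opinion pool) applied to hypothetical expert densities; the expert means, spreads and weights are made up for illustration and are not the ensemble used in the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical expert opinions, each summarised by a normal density for some
# projected quantity of interest (values are illustrative only).
grid = np.linspace(-50.0, 150.0, 2001)
dx = grid[1] - grid[0]
experts = [stats.norm(loc=mu, scale=sd)
           for mu, sd in [(10, 15), (25, 20), (40, 30), (5, 10)]]
pdfs = np.array([e.pdf(grid) for e in experts])
w = np.full(len(experts), 1.0 / len(experts))   # equal weights

# Linear opinion pool: weighted arithmetic mean of the densities.
linear_pool = w @ pdfs

# Log-linear (geometric) pool: weighted geometric mean, renormalised.
log_pool = np.exp(w @ np.log(pdfs + 1e-300))
log_pool /= log_pool.sum() * dx

def quantile(pdf, q):
    """Invert the (discretised) CDF of a density given on `grid`."""
    cdf = np.cumsum(pdf) * dx
    return grid[np.searchsorted(cdf, q)]

# Higher quantiles are typically where combination methods disagree the most.
for q in (0.5, 0.95, 0.99):
    print(q, round(quantile(linear_pool, q), 1), round(quantile(log_pool, q), 1))
```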
Let a random geometric graph be defined in the supercritical regime for the existence of a unique infinite connected component in Euclidean space. Consider the first-passage percolation model with independent and identically distributed random variables on the random infinite connected component. We provide sufficient conditions for the existence of the asymptotic shape, and we show that the shape is a Euclidean ball. We give some examples exhibiting the result for Bernoulli percolation and the Richardson model. In the latter case we further show that it converges weakly to a nonstandard branching process in the joint limit of large intensities and slow passage times.
We focus on modelling categorical features and improving the predictive power of neural networks with mixed categorical and numerical features in supervised learning tasks. The goal of this paper is to challenge the currently dominant approach in actuarial data science with a new neural network architecture and a new training algorithm. The key proposal is to use a joint embedding for all categorical features, instead of separate entity embeddings, to determine the numerical representation of the categorical features, which is fed, together with all other numerical features, into the hidden layers of a neural network with a target response. In addition, we postulate that the numerical representation of the categorical features and the other parameters of the hidden layers should be initialized with parameters trained with (denoising) autoencoders in unsupervised learning tasks, instead of being initialized at random. Since autoencoders for categorical data play an important role in this research, they are investigated in more depth in the paper. We illustrate our ideas with experiments on a real data set with claim numbers, and we demonstrate that our approach achieves a higher predictive power of the network.
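The following PyTorch sketch shows one way such an architecture could look: a single joint embedding layer shared by all (one-hot-encoded) categorical features, concatenated with the numerical features and fed through hidden layers towards a claim-count response. The layer sizes, the Poisson-rate output and the pre-training hook are assumptions for illustration, not the paper's exact specification.

```python
import torch
import torch.nn as nn

class JointEmbeddingNet(nn.Module):
    """Sketch only: one joint embedding shared by all categorical features,
    instead of one entity embedding per feature."""

    def __init__(self, n_onehot_cat, n_num, emb_dim=8, hidden=32):
        super().__init__()
        # One linear map applied to the concatenated one-hot encodings of all
        # categorical features = the joint numerical representation.
        self.joint_embedding = nn.Linear(n_onehot_cat, emb_dim, bias=False)
        self.body = nn.Sequential(
            nn.Linear(emb_dim + n_num, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x_cat_onehot, x_num):
        z = self.joint_embedding(x_cat_onehot)           # numerical representation
        out = self.body(torch.cat([z, x_num], dim=1))
        return torch.exp(out)                            # e.g. Poisson rate for claim counts

# The paper's second idea: initialise the joint embedding (and hidden layers)
# with weights pre-trained by a (denoising) autoencoder on the categorical
# data rather than at random, e.g. (names hypothetical):
#   model.joint_embedding.weight.data.copy_(pretrained_encoder_weight)
```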
Due to the presence of reporting and settlement delay, claim data sets collected by non-life insurance companies are typically incomplete, with right-censored claim count and claim severity observations. Current practice in non-life insurance pricing tackles these right-censored data via a two-step procedure. First, best estimates are computed for the number of claims that occurred in past exposure periods and for the ultimate claim severities, using the incomplete, historical claim data. Second, pricing actuaries build predictive models to estimate technical, pure premiums for new contracts by treating these best estimates as actual observed outcomes, hereby neglecting their inherent uncertainty. We propose an alternative approach that brings valuable insights for both non-life pricing and reserving. As such, we effectively bridge these two key actuarial tasks that have traditionally been discussed in silos. To this end, we develop a granular occurrence and development model for non-life claims that tackles reserving and at the same time resolves the inconsistency in traditional pricing techniques between actual observations and imputed best estimates. We illustrate our proposed model on an insurance as well as a reinsurance portfolio. The advantages of our proposed strategy are most compelling in the reinsurance illustration, where large uncertainties in the best estimates originate from long reporting and settlement delays, low claim frequencies and heavy (even extreme) claim sizes.
COVID-19 impacts population health equity. While mRNA vaccines protect against serious illness and death, little New Zealand (NZ) data exist about the impact of Omicron – and the effectiveness of vaccination – on different population groups. We aim to examine the impact of Omicron on Māori, Pacific, and Other ethnicities and how this interacts with age and vaccination status in the Te Manawa Taki Midland region of NZ. Daily COVID-19 infection and hospitalisation rates (1 February 2022 to 29 June 2022) were calculated for Māori, Pacific, and Other ethnicities for six age bands. A multivariate logistic regression model quantified the effects of ethnicity, age, and vaccination on hospitalisation rates. Per-capita Omicron cases were highest and occurred earliest among Pacific (9 per 1,000) and Māori (5 per 1,000) people and were highest among 12–24-year-olds (7 per 1,000). Hospitalisation was significantly more likely for Māori people (odds ratio (OR) = 2.03), Pacific people (OR = 1.75), over 75-year-olds (OR = 39.22), and unvaccinated people (OR = 4.64). Length of hospitalisation is strongly related to age. COVID-19 vaccination reduces hospitalisations for older individuals and Māori and Pacific populations. Omicron inequitably impacted Māori and Pacific people through higher per-capita infection and hospitalisation rates. Older people are more likely to be hospitalised and for longer.
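A sketch of the kind of multivariate logistic regression described above, using statsmodels on simulated data; the column names, category labels and simulated coefficients are hypothetical stand-ins, not the study data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000

# Hypothetical case-level data: ethnicity, age band and vaccination status.
df = pd.DataFrame({
    "ethnicity":  rng.choice(["Maori", "Pacific", "Other"], n),
    "age_band":   rng.choice(["0-11", "12-24", "25-49", "50-64", "65-74", "75+"], n),
    "vaccinated": rng.integers(0, 2, n),
})
# Simulated outcome for illustration only; real effects come from the study data.
logit_p = (-4 + 0.7 * (df.ethnicity == "Maori") + 0.6 * (df.ethnicity == "Pacific")
           + 3.5 * (df.age_band == "75+") + 1.5 * (df.vaccinated == 0))
df["hospitalised"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

fit = smf.logit("hospitalised ~ C(ethnicity, Treatment(reference='Other')) "
                "+ C(age_band) + C(vaccinated)", data=df).fit(disp=0)
print(np.exp(fit.params).round(2))   # odds ratios, cf. the ORs quoted above
```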
Let $(Z_n)_{n\geq0}$ be a supercritical Galton–Watson process. Consider the Lotka–Nagaev estimator for the offspring mean. In this paper we establish self-normalized Cramér-type moderate deviations and Berry–Esseen bounds for the Lotka–Nagaev estimator. The results are believed to be optimal or near-optimal.
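For reference, writing $Z_n$ for the size of generation $n$ (with $Z_0=1$), the Lotka–Nagaev estimator of the offspring mean $m=\mathbb{E}[Z_1]$ is $\widehat{m}_n = Z_{n+1}/Z_n$; the moderate deviation results quantify the tails of the estimation error $\widehat{m}_n - m$ after self-normalization.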
Development of robust concrete mixes with a lower environmental impact is challenging due to natural variability in constituent materials and a multitude of possible combinations of mix proportions. Making reliable property predictions with machine learning can facilitate performance-based specification of concrete, reducing material inefficiencies and improving the sustainability of concrete construction. In this work, we develop a machine learning algorithm that can utilize intermediate target variables and their associated noise to predict the final target variable. We apply the methodology to specify a concrete mix that has high resistance to carbonation, and another concrete mix that has low environmental impact. Both mixes also fulfill targets on the strength, density, and cost. The specified mixes are experimentally validated against their predictions. Our generic methodology enables the exploitation of noise in machine learning, which has a broad range of applications in structural engineering and beyond.
For a $k$-uniform hypergraph $\mathcal{H}$ on vertex set $\{1, \ldots, n\}$ we associate a particular signed incidence matrix $M(\mathcal{H})$ over the integers. For $\mathcal{H} \sim \mathcal{H}_k(n, p)$ an Erdős–Rényi random $k$-uniform hypergraph, ${\mathrm{coker}}(M(\mathcal{H}))$ is then a model for random abelian groups. Motivated by conjectures from the study of random simplicial complexes we show that for $p = \omega (1/n^{k - 1})$, ${\mathrm{coker}}(M(\mathcal{H}))$ is torsion-free.
In this paper we study the asymptotic behaviour of a random uniform parking function $\pi_n$ of size n. We show that the first $k_n$ places $\pi_n(1),\ldots,\pi_n(k_n)$ of $\pi_n$ are asymptotically independent and identically distributed (i.i.d.) and uniform on $\{1,2,\ldots,n\}$, for the total variation distance when $k_n = {\rm{o}}(\sqrt{n})$, and for the Kolmogorov distance when $k_n={\rm{o}}(n)$, improving results of Diaconis and Hicks. Moreover, we give bounds for the rate of convergence, as well as limit theorems for certain statistics such as the sum or the maximum of the first $k_n$ parking places. The main tool is a reformulation using conditioned random walks.
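As a small illustration of the object being studied (not code from the paper), the following sketch draws uniform random parking functions by rejection and inspects the first few places; the values of $n$, $k$ and the number of repetitions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_parking_function(n):
    """Uniform random parking function of size n via rejection sampling.

    A sequence (a_1, ..., a_n) with values in {1, ..., n} is a parking function
    iff its nondecreasing rearrangement b satisfies b_i <= i for all i.
    Simple but slow: the acceptance probability (n+1)^(n-1) / n^n is roughly e/n.
    """
    while True:
        a = rng.integers(1, n + 1, size=n)
        if np.all(np.sort(a) <= np.arange(1, n + 1)):
            return a

# The first k = o(sqrt(n)) places are asymptotically i.i.d. uniform on {1, ..., n}.
n, k, reps = 200, 5, 1000
first_places = np.array([random_parking_function(n)[:k] for _ in range(reps)])
print("empirical mean of a first place:", first_places.mean())  # ~ (n + 1) / 2 = 100.5
```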
When subjected to a sudden, unanticipated threat, human groups characteristically self-organize to identify the threat, determine potential responses, and act to reduce its impact. Central to this process is the challenge of coordinating information sharing and response activity within a disrupted environment. In this paper, we consider coordination in the context of responses to the 2001 World Trade Center (WTC) disaster. Using records of communications among 17 organizational units, we examine the mechanisms driving communication dynamics, with an emphasis on the emergence of coordinating roles. We employ relational event models (REMs) to identify the mechanisms shaping communications in each unit, finding a consistent pattern of behavior across units with very different characteristics. Using a simulation-based “knock-out” study, we also probe the importance of different mechanisms for hub formation. Our results suggest that, while preferential attachment and pre-disaster role structure generally contribute to the emergence of hub structure, temporally local conversational norms play a much larger role in the WTC case. We discuss broader implications for the role of microdynamics in driving macroscopic outcomes, and for the emergence of coordination in other settings.
We investigated cardiovascular disease (CVD) risk associated with latent tuberculosis infection (LTBI) (Aim 1) and LTBI therapy (Aim 2) in British Columbia, a low-tuberculosis-incidence setting. A total of 49,197 participants had valid LTBI test results. A Cox proportional hazards model was fitted, adjusting for potential confounders. Compared with participants who tested LTBI negative, testing LTBI positive was associated with an 8% higher CVD risk in the complete case data (adjusted hazard ratio (HR): 1.08, 95% CI: 0.99-1.18), a statistically significant 11% higher risk when missing confounder values were imputed using multiple imputation (HR: 1.11, 95% CI: 1.02-1.20), and a 10% higher risk when additional proxy variables supplementing known unmeasured confounders were incorporated in the high-dimensional disease risk score technique to reduce residual confounding (HR: 1.10, 95% CI: 1.01-1.20). Also, compared with participants who tested negative, CVD risk was 27% higher among people who were LTBI positive but did not complete LTBI therapy (HR: 1.27, 95% CI: 1.04-1.55), whereas the risk was similar in people who completed LTBI therapy (HR: 1.04, 95% CI: 0.87-1.24). Findings were consistent across different sensitivity analyses. We concluded that LTBI is associated with an increased CVD risk in low-tuberculosis-incidence settings, with a higher risk associated with incomplete LTBI therapy and an attenuated risk when therapy is completed.
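A minimal sketch of a Cox proportional hazards fit of the kind described, using the lifelines package on simulated data; the variable names, effect sizes and simulated cohort are illustrative assumptions, not the British Columbia data.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 2000

# Hypothetical cohort: LTBI test result plus a couple of stand-in confounders.
df = pd.DataFrame({
    "ltbi_positive": rng.integers(0, 2, n),
    "age":           rng.normal(50, 12, n),
    "smoker":        rng.integers(0, 2, n),
})

# Simulated time to CVD event with a modest LTBI effect, plus random censoring.
hazard = 0.01 * np.exp(0.10 * df["ltbi_positive"] + 0.03 * (df["age"] - 50))
time_to_event = rng.exponential(1.0 / hazard)
censor_time = rng.uniform(1.0, 15.0, n)
df["duration"] = np.minimum(time_to_event, censor_time)
df["cvd_event"] = (time_to_event <= censor_time).astype(int)

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="cvd_event")
cph.print_summary()   # the exp(coef) column gives adjusted hazard ratios, cf. the HRs above
```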
The global and uneven spread of COVID-19, mirrored at the local scale, reveals stark differences along racial and ethnic lines. We respond to the pressing need to understand these divergent outcomes via neighborhood level analysis of mobility and case count information. Using data from Chicago over 2020, we leverage a metapopulation Susceptible-Exposed-Infectious-Removed model to reconstruct and simulate the spread of SARS-CoV-2 at the ZIP Code level. We demonstrate that exposures are mostly contained within one’s own ZIP Code and demographic group. Building on this observation, we illustrate that we can understand epidemic progression using a composite metric combining the volume of mobility and the risk that each trip represents, while separately these factors fail to explain the observed heterogeneity in neighborhood level outcomes. Having established this result, we next uncover how group level differences in these factors give rise to disparities in case rates along racial and ethnic lines. Following this, we ask what-if questions to quantify how segregation impacts COVID-19 case rates via altering mobility patterns. We find that segregation in the mobility network has contributed to inequality in case rates across demographic groups.
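A stripped-down metapopulation SEIR sketch in the spirit of the model described above: patches stand in for ZIP Codes, and a row-stochastic mobility matrix couples them. The mixing rule, parameter values and two-patch setup are simplifying assumptions for illustration, not the calibrated Chicago model.

```python
import numpy as np

def simulate_seir(M, beta, sigma=1/4, gamma=1/7, days=180, seed_patch=0):
    """Discrete-time metapopulation SEIR on population fractions per patch.

    M[i, j] is the share of patch i's contacts occurring in patch j (a crude
    stand-in for trip volumes); beta[i] scales the per-contact risk for
    residents of patch i (a crude stand-in for trip risk).
    """
    k = M.shape[0]
    S = np.full(k, 1.0); E = np.zeros(k); I = np.zeros(k); R = np.zeros(k)
    I[seed_patch] = 1e-4
    S[seed_patch] -= 1e-4
    for _ in range(days):
        prevalence_at_destination = M.T @ I            # infection pressure per patch
        foi = beta * (M @ prevalence_at_destination)   # force of infection felt at home
        new_E, new_I, new_R = foi * S, sigma * E, gamma * I
        S, E, I, R = S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R
    return R  # cumulative attack rate per patch

# Two groups with mostly within-group mobility, echoing the containment noted above.
M = np.array([[0.9, 0.1],
              [0.1, 0.9]])
attack = simulate_seir(M, beta=np.array([0.35, 0.25]))
print("final attack rates per patch:", attack.round(3))
```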
The International Maritime Organization, along with a number of European countries (under the Paris MoU), introduced port state control (PSC) inspections of vessels in national ports in 1982 to evaluate their compliance with safety and security regulations. This study discusses how PSC data share common characteristics with fundamental Big Data theories, and how, by interpreting them as Big Data, their governance and transparency can be treated as a Big Data challenge so that value can be gained from their use. Thus, viewed through the lens of Big Data, PSC data should exhibit volume, velocity, variety, value, and complexity in order to support, in the best possible way, both officers ashore and on board in keeping the vessel in the best possible condition for sailing. To this end, this paper employs Big Data theories broadly used within the academic and business environment on dataset characteristics and on how to access the value from Big Data and Analytics. The research concludes that PSC data provide valid information to the shipping industry. However, the inability of PSC data to present the complete picture of PSC regimes and ports challenges the maritime community’s attempts to achieve a safer and more sustainable industry.
We introduce a new test for a two-sided hypothesis involving a subset of the structural parameter vector in the linear instrumental variables (IVs) model. Guggenberger, Kleibergen, and Mavroeidis (2019, Quantitative Economics, 10, 487–526; hereafter GKM19) introduce a subvector Anderson–Rubin (AR) test with data-dependent critical values that has asymptotic size equal to nominal size for a parameter space that allows for arbitrary strength or weakness of the IVs and has uniformly nonsmaller power than the projected AR test studied in Guggenberger et al. (2012, Econometrica, 80(6), 2649–2666). However, GKM19 imposes the restrictive assumption of conditional homoskedasticity (CHOM). The main contribution here is to robustify the procedure in GKM19 to arbitrary forms of conditional heteroskedasticity. We first adapt the method in GKM19 to a setup where a certain covariance matrix has an approximate Kronecker product (AKP) structure which nests CHOM. The new test equals this adaptation when the data are consistent with AKP structure as decided by a model selection procedure. Otherwise, the test equals the AR/AR test in Andrews (2017, Identification-Robust Subvector Inference, Cowles Foundation Discussion Papers 3005, Yale University) that is fully robust to conditional heteroskedasticity but less powerful than the adapted method. We show theoretically that the new test has asymptotic size bounded by the nominal size and document improved power relative to the AR/AR test in a wide array of Monte Carlo simulations when the covariance matrix is not too far from AKP.
In deception research, little consideration is given to how the framing of the question might impact the decision-making process used to reach a veracity judgment. People use terms such as “sure” to describe their uncertainty about an event (i.e., aleatory) and terms such as “chance” to describe their uncertainty about the world (i.e., epistemic). In the present study, the effect of such uncertainty framing on veracity judgments was considered. By manipulating the wording of the veracity question, the effect of uncertainty framing on deception detection was measured. The data show no difference in veracity judgments between the two uncertainty framing conditions, suggesting that such judgments may rely on a robust and invariant cognitive process.
Consider a set of n vertices, where each vertex has a location in $\mathbb{R}^d$, sampled uniformly from the unit cube in $\mathbb{R}^d$, and an associated weight. Construct a random graph by placing edges independently for each vertex pair with a probability that is a function of the distance between the locations and of the vertex weights.
Under appropriate integrability assumptions on the edge probabilities that imply sparseness of the model, after appropriately blowing up the locations, we prove that the local limit of this random graph sequence is the (countably) infinite random graph on $\mathbb{R}^d$ with vertex locations given by a homogeneous Poisson point process, having weights which are independent and identically distributed copies of limiting vertex weights. Our set-up covers many sparse geometric random graph models from the literature, including geometric inhomogeneous random graphs (GIRGs), hyperbolic random graphs, continuum scale-free percolation, and weight-dependent random connection models.
We prove that the limiting degree distribution is mixed Poisson and the typical degree sequence is uniformly integrable, and we obtain convergence results on various measures of clustering in our graphs as a consequence of local convergence. Finally, as a byproduct of our argument, we prove a doubly logarithmic lower bound on typical distances in this general setting.
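For concreteness, here is a sketch of sampling one member of the covered class, a GIRG-style graph: uniform locations, heavy-tailed i.i.d. weights, and an edge probability decaying in the distance. The connection function, exponents and sizes are illustrative assumptions rather than the paper's general set-up.

```python
import numpy as np

rng = np.random.default_rng(0)

# GIRG-style sample: n uniform locations in [0, 1]^d with i.i.d. Pareto weights;
# edge probability min(1, (w_u * w_v / (n * dist^d))^alpha) on the torus.
n, d, alpha, tau = 2000, 2, 1.5, 2.8
x = rng.random((n, d))
w = (1.0 - rng.random(n)) ** (-1.0 / (tau - 1))     # heavy-tailed weights

diff = np.abs(x[:, None, :] - x[None, :, :])
diff = np.minimum(diff, 1.0 - diff)                 # torus distance, no boundary effects
dist = np.linalg.norm(diff, axis=2)

with np.errstate(divide="ignore"):
    p_edge = np.minimum(1.0, (w[:, None] * w[None, :] / (n * dist**d)) ** alpha)
np.fill_diagonal(p_edge, 0.0)

upper = np.triu(rng.random((n, n)) < p_edge, k=1)   # one coin flip per vertex pair
adj = upper | upper.T

degrees = adj.sum(axis=1)
print("mean degree:", degrees.mean().round(2), "max degree:", degrees.max())
```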
We investigated the potential effects of COVID-19 public health restrictions on the prevalence and distribution of Neisseria gonorrhoeae (NG) genotypes in our Queensland isolate population in the first half of the year 2020. A total of 763 NG isolates were genotyped to examine gonococcal strain distribution and prevalence for the first 6 months of 2020, with 1 January 2020 to 31 March 2020 classified as ‘pre’ COVID-19 restrictions (n = 463) and 1 April 2020 to 30 June 2020 classified as ‘post’ COVID-19 restrictions (n = 300). Genotypes most prevalent ‘pre’ restrictions remained proportionally high ‘post’ restrictions, with some significantly increasing ‘post’ restrictions. However, genotype diversity was significantly reduced ‘post’ restrictions. Overall, it seems public health restrictions (9–10 weeks) were not sufficient to affect rates of infection or reduce the prevalence of well-established genotypes in our population, potentially due to reduced access to services or health-seeking behaviours.
We construct a class of non-reversible Metropolis kernels as a multivariate extension of the guided-walk kernel proposed by Gustafson (Statist. Comput. 8, 1998). The main idea of our method is to introduce a projection that maps the state space to a totally ordered group. By using a Haar measure, we construct a novel Markov kernel, termed the Haar mixture kernel, which is of interest in its own right. This is achieved by inducing a topological structure on the totally ordered group. Our proposed method, the $\Delta$-guided Metropolis–Haar kernel, is constructed by using the Haar mixture kernel as a proposal kernel. In terms of effective sample size per second, the proposed non-reversible kernel is at least 10 times more efficient than the random-walk Metropolis kernel and the Hamiltonian Monte Carlo kernel for a logistic regression model and a discretely observed stochastic process.
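For intuition, here is a one-dimensional sketch of the guided-walk kernel that the paper extends: the state is augmented with a direction $v\in\{-1,+1\}$, proposals always move in direction $v$, and $v$ is flipped only on rejection. The Gaussian step size and the standard normal target are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def guided_walk(logpi, x0, n_iter=10_000, step=0.5):
    """1D guided-walk Metropolis kernel (Gustafson, 1998): non-reversible,
    yet it leaves the target pi(x) invariant."""
    x, v = x0, 1.0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        y = x + v * abs(step * rng.standard_normal())  # always move in direction v
        if np.log(rng.random()) < logpi(y) - logpi(x):
            x = y            # accept: keep the current direction
        else:
            v = -v           # reject: reverse direction
        chain[t] = x
    return chain

# Toy target: standard normal.  The Delta-guided Metropolis-Haar kernel of the
# paper generalises this construction to multivariate targets via a projection
# onto a totally ordered group.
chain = guided_walk(lambda x: -0.5 * x**2, x0=0.0)
print("mean ~ 0, variance ~ 1:", chain.mean().round(2), chain.var().round(2))
```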