We derive a new theoretical lower bound for the expected supremum of drifted fractional Brownian motion with Hurst index $H\in(0,1)$ over finite and infinite time horizons. Extensive simulation experiments indicate that our lower bound outperforms Monte Carlo estimates based on very dense grids for $H\in(0,\tfrac{1}{2})$. Additionally, we derive the Paley–Wiener–Zygmund representation of a linear fractional Brownian motion in the general case and give an explicit expression for the derivative of the expected supremum at $H=\tfrac{1}{2}$ in the sense of Bisewski, Dȩbicki and Rolski (2021).
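For intuition about why dense-grid Monte Carlo can be beaten for small $H$, the following minimal sketch (our own illustration, not the authors' bound) estimates the expected supremum by Cholesky simulation on a grid; discretization biases the estimate downward, and the bias worsens as $H$ decreases. All parameter values are placeholders.

```python
import numpy as np

def expected_sup_dfbm(H=0.3, c=-1.0, T=1.0, n_grid=400, n_paths=2000, seed=0):
    """Grid Monte Carlo estimate of E[sup_{0<=t<=T} (B_H(t) + c*t)].

    Grid estimates are biased low (the true sup is attained between
    grid points), which is why a theoretical lower bound can exceed
    them for small H.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n_grid, T, n_grid)  # t=0 handled separately (value 0)
    # fBm covariance: Cov(B_H(s), B_H(u)) = (s^{2H} + u^{2H} - |s-u|^{2H}) / 2
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_grid))
    paths = rng.standard_normal((n_paths, n_grid)) @ L.T + c * t
    return np.maximum(paths.max(axis=1), 0.0).mean()  # sup includes t=0

print(expected_sup_dfbm())
```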
State-of-the-art machine-learning models are a popular choice for modeling and forecasting energy behavior in buildings because, given enough data, they are good at finding spatiotemporal patterns and structures even in scenarios whose complexity prohibits analytical description. However, their architecture typically has no physical correspondence to the mechanistic structures linked with the governing physical phenomena. As a result, their ability to generalize to unobserved timesteps depends on how well the data represent the dynamics of the observed system, which is difficult to guarantee in real-world engineering problems such as control and energy management in digital twins. In response, we present a framework that combines lumped-parameter models in the form of linear time-invariant (LTI) state-space models (SSMs) with unsupervised reduced-order modeling in a subspace-based domain adaptation (SDA) approach, a type of transfer-learning (TL) technique. Traditionally, SDA exploits labeled data from one domain to predict in a different but related target domain for which labeled data are limited. We introduce a novel SDA approach in which, instead of labeled data, we leverage the geometric structure of the LTI SSM governed by well-known heat-transfer ordinary differential equations to forecast unobserved timesteps beyond the available measurement data, by geometrically aligning the physics-derived and data-derived embedded subspaces. In this initial exploration, we evaluate the physics-based SDA framework on a demonstrative heat conduction scenario, varying the thermophysical properties of the source and target systems to demonstrate the transferability of mechanistic models from physics to observed measurement data.
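The geometric alignment step can be pictured with a classical subspace-alignment computation (in the style of Fernando et al.); the numpy sketch below is our own illustration under that reading, with hypothetical inputs `X_phys` (trajectories simulated from the LTI SSM) and `X_meas` (measured data), and is not the paper's exact algorithm.

```python
import numpy as np

def align_subspaces(X_phys, X_meas, d=5):
    """Subspace alignment: map the physics-derived PCA basis onto the
    data-derived one and embed both domains in shared coordinates.

    X_phys, X_meas: (samples, features) arrays, assumed column-centred.
    """
    # Leading d right-singular vectors span each domain's subspace.
    P_s = np.linalg.svd(X_phys, full_matrices=False)[2][:d].T  # (features, d)
    P_t = np.linalg.svd(X_meas, full_matrices=False)[2][:d].T
    M = P_s.T @ P_t                    # closed-form alignment of P_s onto P_t
    Z_phys = X_phys @ P_s @ M          # physics samples in the target frame
    Z_meas = X_meas @ P_t              # measurement samples in their own frame
    return Z_phys, Z_meas
```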
A proliferation of data-generating devices, sensors, and applications has led to unprecedented amounts of digital data. We live in an era of datafication, one in which life is increasingly quantified and transformed into intelligence for private or public benefit. When used responsibly, this offers new opportunities for public good. The potential of data is evident in the possibilities offered by open data and data collaboratives—both instances of how wider access to data can lead to positive and often dramatic social transformation. However, three key forms of asymmetry currently limit this potential, especially for already vulnerable and marginalized groups: data asymmetries, information asymmetries, and agency asymmetries. These asymmetries limit human potential, both practically and psychologically, leading to feelings of disempowerment and eroding public trust in technology. Existing methods of limiting asymmetries (such as open data or consent), as well as some alternatives under consideration (data ownership, collective ownership, personal information management systems), fall short of adequately addressing the challenges at hand. A new principle and practice of digital self-determination (DSD) is therefore required. The study and practice of DSD remain in their infancy. The characteristics we have outlined here are only exploratory, and much work remains to be done to better understand what works and what does not. We suggest the need for a new research framework or agenda to explore DSD and how it can address the asymmetries, imbalances, and inequalities—both in data and society more generally—that are emerging as key public policy challenges of our era.
This paper studies large N and large T conditional quantile panel data models with interactive fixed effects. We propose a nuclear norm penalized estimator of the coefficients on the covariates and the low-rank matrix formed by the interactive fixed effects. The estimator solves a convex minimization problem, not requiring pre-estimation of the (number of) interactive fixed effects. It also allows the number of covariates to grow slowly with N and T. We derive an error bound on the estimator that holds uniformly in the quantile level. The order of the bound implies uniform consistency of the estimator and is nearly optimal for the low-rank component. Given the error bound, we also propose a consistent estimator of the number of interactive fixed effects at any quantile level. We demonstrate the performance of the estimator via Monte Carlo simulations.
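A standard way to compute such a nuclear norm penalized estimator is proximal (sub)gradient descent, alternating a check-loss subgradient step with singular-value soft-thresholding of the low-rank component. The sketch below is an illustrative reconstruction under that assumption, not the paper's algorithm; step size, penalty, and iteration count are placeholders.

```python
import numpy as np

def nuclear_quantile_fit(Y, X, tau=0.5, lam=1.0, eta=1e-3, iters=500):
    """Sketch: minimize sum rho_tau(Y - X*beta - L) + lam * ||L||_*
    by subgradient steps on beta and a proximal (singular-value
    soft-thresholding) step on L.  Y: (N, T); X: (N, T, p)."""
    N, T, p = X.shape
    beta, L = np.zeros(p), np.zeros((N, T))
    for _ in range(iters):
        U = Y - np.einsum('ntp,p->nt', X, beta) - L
        G = tau - (U < 0)                      # subgradient of the check loss
        beta += eta * np.einsum('ntp,nt->p', X, G)
        # proximal step on L: gradient move, then soft-threshold singular values
        u, s, vt = np.linalg.svd(L + eta * G, full_matrices=False)
        L = u @ np.diag(np.maximum(s - eta * lam, 0.0)) @ vt
    return beta, L
```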
Pseudo cross-variograms appear naturally in the context of multivariate Brown–Resnick processes, and are a useful tool for analysis and prediction of multivariate random fields. We give a necessary and sufficient criterion for a matrix-valued function to be a pseudo cross-variogram, and further provide a Schoenberg-type result connecting pseudo cross-variograms and multivariate correlation functions. By means of these characterizations, we provide extensions of the popular univariate space–time covariance model of Gneiting to the multivariate case.
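For reference, the univariate Gneiting class being extended has the following form (with $\varphi$ completely monotone and $\psi$ positive with completely monotone derivative); the multivariate extensions in the paper replace these scalar ingredients with matrix-valued analogues:

```latex
% Gneiting's univariate space--time covariance on R^d x R:
C(h, u) \;=\; \frac{\sigma^2}{\psi(|u|^2)^{d/2}}\,
\varphi\!\left(\frac{\|h\|^2}{\psi(|u|^2)}\right),
\qquad (h, u) \in \mathbb{R}^d \times \mathbb{R}.
```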
Fuelled by the big data explosion, a new methodology for estimating sub-annual death probabilities has recently been proposed, opening new insurance business opportunities. This new approach exploits the detailed information available in millions of microdata records to develop seasonal-ageing indexes (SAIs), from which sub-annual (quarterly) life tables can be derived from annual tables. In this paper, we explore whether a shortcut can be taken in the estimation of SAIs and (life insurance) sub-annual death rates. We propose three different approximations, in which estimates are attained using only a few thousand data records, and assess their impact on several competitive markets defined from an actual portfolio of life insurance policies. Our analyses clearly point to the shortcuts as good practical alternatives that can be used in real-life insurance markets. Notably, we find that embracing the new quarterly based approach, even using only an approximation (shortcut), is economically preferable to using the associated annual table, offering a significant competitive advantage to the company adopting this innovation.
Given a graph $G$ and an integer $\ell \ge 2$, we denote by $\alpha _{\ell }(G)$ the maximum size of a $K_{\ell }$-free subset of vertices in $V(G)$. A recent question of Nenadov and Pehova asks for the best possible minimum degree conditions forcing clique-factors in $n$-vertex graphs $G$ with $\alpha _{\ell }(G) = o(n)$, which can be seen as a Ramsey–Turán variant of the celebrated Hajnal–Szemerédi theorem. In this paper we find the asymptotically sharp minimum degree threshold for $K_r$-factors in $n$-vertex graphs $G$ with $\alpha _\ell (G)=n^{1-o(1)}$ for all $r\ge \ell \ge 2$.
A random two-cell embedding of a given graph $G$ is obtained by choosing a random local rotation around every vertex. We analyse the expected number of faces of such an embedding, which is equivalent to studying its average genus. In 1991, Stahl [5] proved that the expected number of faces in a random embedding of an arbitrary graph of order $n$ is at most $n\log (n)$. While there are many families of graphs whose expected number of faces is $\Theta (n)$, none are known where the expected number would be super-linear. This led the authors of [1] to conjecture that there is a linear upper bound. In this note we confirm their conjecture by proving that for any $n$-vertex multigraph, the expected number of faces in a random two-cell embedding is at most $2n\log (2\mu )$, where $\mu$ is the maximum edge-multiplicity. This bound is best possible up to a constant factor.
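The quantity being averaged is easy to simulate: choose a uniformly random cyclic rotation at each vertex and count the orbits of the induced face-tracing permutation. The sketch below (conventions and function names are ours) estimates the expected number of faces of a multigraph given as a list of edges.

```python
import random
from collections import defaultdict

def average_faces(edges, trials=2000, seed=0):
    """Estimate the expected number of faces of a random two-cell
    embedding of the multigraph given by `edges` (list of vertex pairs).
    Darts (e, 0) and (e, 1) are the two orientations of edge e."""
    random.seed(seed)
    incident = defaultdict(list)          # vertex -> darts leaving it
    for e, (v, w) in enumerate(edges):
        incident[v].append((e, 0))
        incident[w].append((e, 1))
    total = 0
    for _ in range(trials):
        nxt = {}                          # successor in the local rotation
        for darts in incident.values():
            rot = darts[:]
            random.shuffle(rot)           # uniform cyclic rotation
            for i, d in enumerate(rot):
                nxt[d] = rot[(i + 1) % len(rot)]
        # Face tracing: after dart d, continue with the dart following
        # d's reversal in the rotation at d's head vertex.
        succ = {(e, s): nxt[(e, 1 - s)] for (e, s) in nxt}
        seen, faces = set(), 0
        for d in succ:
            if d not in seen:
                faces += 1
                while d not in seen:
                    seen.add(d)
                    d = succ[d]
        total += faces
    return total / trials

print(average_faces([(0, 1), (1, 2), (2, 0)]))  # triangle: always 2 faces
```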
One of the drivers of the push for open data as a form of corruption control is the belief that making government operations more transparent would make it possible to hold public officials accountable for how public resources are spent. These large datasets would then be open to public scrutiny and analysis, resulting in lower levels of corruption. Although data quality has been widely studied and many advances have been made, this work has not been extensively applied to open data, and some aspects of data quality have received more attention than others. One key aspect, however, seems to have been overlooked: accuracy. This gap motivated our inquiry: how is accurate open data produced, and how might breakdowns in this process introduce opportunities for corruption? We study a government agency within the Brazilian Federal Government to understand in what ways accuracy is compromised. Adopting a distributed cognition (DCog) theoretical framework, we found that the production of open data is not a neutral activity; rather, it is a distributed process performed by individuals and artifacts. This distributed cognitive process creates opportunities for data to be concealed and misrepresented. Two models mapping data production were generated, and their combination provided insight into how cognitive processes are distributed, how data flow and are transformed, stored, and processed, and where opportunities arise for data inaccuracies and misrepresentations. The results have the potential to aid policymakers in improving data accuracy.
Representative school data on past SARS-CoV-2 infection are scarce, and differences between pupils and staff remain ambiguous. We performed a nationwide prospective seroprevalence study among pupils and staff over time and in relation to determinants of infection, using Poisson regression and generalised estimating equations. A cluster random sample was selected with allocation by region and socioeconomic (SES) background. Surveys and saliva samples were collected in December 2020, March 2021, and June 2021, and also in October and December 2021 for primary pupils. We recruited 885 primary and 569 secondary pupils and 799 staff in 84 schools. Cumulative seroprevalence (95% CI) among primary pupils increased from 11.0% (7.6; 15.9) at baseline to 60.4% (53.4; 68.3) in December 2021. Group estimates were similar at baseline; however, in June they were significantly higher among primary staff (38.9% (32.5; 46.4)) than among pupils and secondary staff (24.2% (20.3; 28.8)). Infections were asymptomatic in 48–56% of pupils and 28% of staff. Seropositivity was associated with individual SES in pupils, and with school level, school SES, and language network in staff in June. Associations with behavioural characteristics were inconsistent. Seroconversion rates increased two- to four-fold after self-reported high-risk contacts, especially with adults. Seroprevalence studies using non-invasive sampling can inform public health management.
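As a rough illustration of the modelling step, here is a minimal Poisson GEE sketch with an exchangeable working correlation within schools, on synthetic data; all column names (`seropositive`, `age_group`, `ses`, `wave`, `school_id`) are hypothetical stand-ins for the study's variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic stand-in for the long-format survey data.
rng = np.random.default_rng(1)
n = 600
df = pd.DataFrame({
    "seropositive": rng.integers(0, 2, n),
    "age_group": rng.choice(["pupil", "staff"], n),
    "ses": rng.choice(["low", "high"], n),
    "wave": rng.choice(["Dec20", "Mar21", "Jun21"], n),
    "school_id": rng.integers(0, 40, n),
})

# Poisson GEE on a binary outcome yields prevalence ratios, while the
# exchangeable working correlation accounts for clustering in schools.
fit = smf.gee("seropositive ~ age_group + ses + wave",
              groups="school_id", data=df,
              family=sm.families.Poisson(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(np.exp(fit.params))  # prevalence ratios
```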
Large deviations of the largest and smallest eigenvalues of $\mathbf{X}\mathbf{X}^\top/n$ are studied in this note, where $\mathbf{X}_{p\times n}$ is a $p\times n$ random matrix with independent and identically distributed (i.i.d.) sub-Gaussian entries. The assumption imposed on the dimension $p$ and the sample size $n$ is $p=p(n)\rightarrow\infty$ with $p(n)={\mathrm{o}}(n)$. This study generalizes one result obtained in [3].
Climate change is expected to increase the frequency and intensity of extreme weather events. To properly assess the resulting increase in economic risk, actuaries can benefit from relying on expert models and opinions from multiple sources, which requires model combination techniques. From non-parametric to Bayesian approaches, different methods rely on different assumptions, potentially leading to very different results. In this paper, we apply multiple model combination methods to an ensemble of 24 experts in a pooling approach and use the differences in outputs across combinations to illustrate how one can gain additional insight from using multiple methods. The densities obtained from pooling in Montreal and Quebec City highlight the significant changes in higher quantiles obtained through different combination approaches. Areal reduction factors and projected quantile changes are used to show that consistency, or lack thereof, across approaches reflects the uncertainty of combination methods. This shows that an actuary using multiple expert models should consider more than one combination method to properly assess the impact of climate change on loss distributions, since a single method can lead to overconfidence in projections.
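The simplest of these combination methods is the linear opinion pool, which averages the expert densities pointwise; a toy sketch with hypothetical Gaussian experts:

```python
import numpy as np

def linear_pool(densities, weights=None):
    """Linear opinion pool on a common grid: f(x) = sum_k w_k f_k(x),
    with non-negative weights summing to one (equal by default)."""
    densities = np.asarray(densities)               # shape (K, n_grid)
    K = densities.shape[0]
    w = np.full(K, 1.0 / K) if weights is None else np.asarray(weights)
    return w @ densities

# Toy usage: 24 hypothetical experts, each reporting a Gaussian density.
x = np.linspace(-5.0, 15.0, 400)
means = np.random.default_rng(2).normal(5.0, 2.0, size=24)
experts = [np.exp(-0.5 * ((x - m) / 1.5) ** 2) / (1.5 * np.sqrt(2 * np.pi))
           for m in means]
pooled = linear_pool(experts)                       # mixture density on x
```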
Let a random geometric graph be defined in the supercritical regime for the existence of a unique infinite connected component in Euclidean space. Consider the first-passage percolation model with independent and identically distributed random variables on the random infinite connected component. We provide sufficient conditions for the existence of the asymptotic shape, and we show that the shape is a Euclidean ball. We give some examples exhibiting the result for Bernoulli percolation and the Richardson model. In the latter case we further show that it converges weakly to a nonstandard branching process in the joint limit of large intensities and slow passage times.
We focus on modelling categorical features and improving the predictive power of neural networks with mixed categorical and numerical features in supervised learning tasks. The goal of this paper is to challenge the currently dominant approach in actuarial data science with a new neural network architecture and a new training algorithm. The key proposal is to use a joint embedding for all categorical features, instead of separate entity embeddings, to determine the numerical representation of the categorical features, which is fed, together with all other numerical features, into the hidden layers of a neural network that predicts a target response. In addition, we postulate initializing the numerical representation of the categorical features and the other parameters of the hidden layers with parameters trained with (denoising) autoencoders in unsupervised learning tasks, instead of using random initialization. Since autoencoders for categorical data play an important role in this research, they are investigated in more depth in the paper. We illustrate our ideas with experiments on a real data set of claim counts, and we demonstrate that the proposed approach achieves higher predictive power.
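One concrete reading of a "joint embedding" is a single dense layer applied to the concatenated one-hot encodings of all categorical features, rather than one embedding table per feature. The PyTorch sketch below follows that reading; dimensions are placeholders, and the autoencoder pretraining step is only indicated in a comment.

```python
import torch
import torch.nn as nn

class JointEmbeddingNet(nn.Module):
    """One shared ('joint') embedding for all categorical features;
    self.joint.weight is the matrix one would initialise from a
    pretrained (denoising) autoencoder instead of at random."""

    def __init__(self, cardinalities, num_numeric, embed_dim=16, hidden=64):
        super().__init__()
        offsets = torch.cumsum(torch.tensor([0] + list(cardinalities)), 0)[:-1]
        self.register_buffer("offsets", offsets)
        self.joint = nn.Linear(sum(cardinalities), embed_dim)
        self.head = nn.Sequential(
            nn.Linear(embed_dim + num_numeric, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer codes -> one multi-hot vector
        multi_hot = torch.zeros(x_cat.shape[0], self.joint.in_features)
        multi_hot.scatter_(1, x_cat + self.offsets, 1.0)
        return self.head(torch.cat([self.joint(multi_hot), x_num], dim=1))

net = JointEmbeddingNet(cardinalities=[3, 5], num_numeric=4)
```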
Due to the presence of reporting and settlement delay, claim data sets collected by non-life insurance companies are typically incomplete, with right-censored claim count and claim severity observations. Current practice in non-life insurance pricing tackles these right-censored data via a two-step procedure. First, best estimates are computed for the number of claims that occurred in past exposure periods and for the ultimate claim severities, using the incomplete, historical claim data. Second, pricing actuaries build predictive models to estimate technical, pure premiums for new contracts by treating these best estimates as actual observed outcomes, thereby neglecting their inherent uncertainty. We propose an alternative approach that brings valuable insights for both non-life pricing and reserving, effectively bridging two key actuarial tasks that have traditionally been discussed in silos. To this end, we develop a granular occurrence and development model for non-life claims that tackles reserving and at the same time resolves the inconsistency in traditional pricing techniques between actual observations and imputed best estimates. We illustrate our proposed model on an insurance as well as a reinsurance portfolio. The advantages of our proposed strategy are most compelling in the reinsurance illustration, where large uncertainties in the best estimates originate from long reporting and settlement delays, low claim frequencies, and heavy (even extreme) claim sizes.
COVID-19 impacts population health equity. While mRNA vaccines protect against serious illness and death, little New Zealand (NZ) data exist about the impact of Omicron – and the effectiveness of vaccination – on different population groups. We aim to examine the impact of Omicron on Māori, Pacific, and Other ethnicities and how this interacts with age and vaccination status in the Te Manawa Taki Midland region of NZ. Daily COVID-19 infection and hospitalisation rates (1 February 2022 to 29 June 2022) were calculated for Māori, Pacific, and Other ethnicities for six age bands. A multivariate logistic regression model quantified the effects of ethnicity, age, and vaccination on hospitalisation rates. Per-capita Omicron cases were highest and occurred earliest among Pacific (9 per 1,000) and Māori (5 per 1,000) people and were highest among 12–24-year-olds (7 per 1,000). Hospitalisation was significantly more likely for Māori people (odds ratio (OR) = 2.03), Pacific people (OR = 1.75), over 75-year-olds (OR = 39.22), and unvaccinated people (OR = 4.64). Length of hospitalisation is strongly related to age. COVID-19 vaccination reduces hospitalisations for older individuals and Māori and Pacific populations. Omicron inequitably impacted Māori and Pacific people through higher per-capita infection and hospitalisation rates. Older people are more likely to be hospitalised and for longer.
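The reported odds ratios come from a model of this general shape; the sketch below fits a logistic regression on synthetic stand-in data (the real study uses case-level Te Manawa Taki records) and exponentiates the coefficients.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the case-level data.
rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({
    "hospitalised": rng.integers(0, 2, n),
    "ethnicity": rng.choice(["Maori", "Pacific", "Other"], n),
    "age_band": rng.choice(["0-11", "12-24", "25-49", "50-74", "75+"], n),
    "vaccinated": rng.integers(0, 2, n),
})

fit = smf.logit("hospitalised ~ C(ethnicity, Treatment(reference='Other'))"
                " + age_band + vaccinated", data=df).fit()
print(np.exp(fit.params))  # odds ratios, the scale reported in the abstract
```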
Let $(Z_n)_{n\geq0}$ be a supercritical Galton–Watson process. Consider the Lotka–Nagaev estimator for the offspring mean. In this paper we establish self-normalized Cramér-type moderate deviations and Berry–Esseen bounds for the Lotka–Nagaev estimator. The results are believed to be optimal or near-optimal.
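Recall that the Lotka–Nagaev estimator of the offspring mean $m$ is $\widehat{m}_n = Z_{n+1}/Z_n$. A toy simulation with Poisson offspring (parameters are illustrative):

```python
import numpy as np

def lotka_nagaev(m=2.0, n=15, seed=4):
    """Simulate a supercritical Galton-Watson process with Poisson(m)
    offspring and return the Lotka-Nagaev estimator Z_{n+1}/Z_n."""
    rng = np.random.default_rng(seed)
    Z = [1]                                   # Z_0 = 1
    for _ in range(n + 1):
        Z.append(rng.poisson(m, size=Z[-1]).sum() if Z[-1] > 0 else 0)
    return Z[n + 1] / Z[n] if Z[n] > 0 else float("nan")

print(lotka_nagaev())  # close to m = 2.0 on survival, for moderate n
```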
Development of robust concrete mixes with a lower environmental impact is challenging due to natural variability in constituent materials and a multitude of possible combinations of mix proportions. Making reliable property predictions with machine learning can facilitate performance-based specification of concrete, reducing material inefficiencies and improving the sustainability of concrete construction. In this work, we develop a machine learning algorithm that can utilize intermediate target variables and their associated noise to predict the final target variable. We apply the methodology to specify a concrete mix that has high resistance to carbonation, and another concrete mix that has low environmental impact. Both mixes also fulfill targets on the strength, density, and cost. The specified mixes are experimentally validated against their predictions. Our generic methodology enables the exploitation of noise in machine learning, which has a broad range of applications in structural engineering and beyond.
For a $k$-uniform hypergraph $\mathcal{H}$ on vertex set $\{1, \ldots, n\}$ we associate a particular signed incidence matrix $M(\mathcal{H})$ over the integers. For $\mathcal{H} \sim \mathcal{H}_k(n, p)$ an Erdős–Rényi random $k$-uniform hypergraph, ${\mathrm{coker}}(M(\mathcal{H}))$ is then a model for random abelian groups. Motivated by conjectures from the study of random simplicial complexes we show that for $p = \omega (1/n^{k - 1})$, ${\mathrm{coker}}(M(\mathcal{H}))$ is torsion-free.