Following an outbreak of Salmonella Typhimurium in Wales in July 2021 associated with sheep meat and offal, further genetically related cases were detected across the UK. Cases were UK residents with laboratory-confirmed Salmonella Typhimurium in the same 5-single-nucleotide polymorphism (SNP) single-linkage cluster with a specimen date between 01/08/2021 and 31/12/2022. We described cases using routine (UK) and enhanced (Wales only) surveillance data. Exposures in cases in Wales were compared with those of controls with non-Typhimurium Salmonella infection. Environmental Health Practitioners and the Food Standards Agency investigated supply chains of food premises reported by ≥2 cases. Animal, carcass, and environmental samples taken for diagnostic or monitoring purposes for gastrointestinal pathogens were included in microbiological investigations. We identified 142 cases: 75% in England, 23% in Wales, and 3% in Scotland. Median age was 32 years, and 59% were male. Direct contact with sheep was associated with becoming a case (aOR: 14, 95% CI: 1.4–145) but was reported by few cases (6/32). No single food item, premises, or supplier linked all cases. Multi-agency collaboration enabled the identification of isolates in the same 5-SNP single-linkage cluster from a sheep carcass at an English abattoir and in ruminant, wildlife, poultry, and environmental samples, suggesting multiple vehicles and pathways of infection.
In this paper we consider the filtering of partially observed multidimensional diffusion processes that are observed regularly at discrete times. This is a challenging problem which requires the use of advanced numerical schemes based upon time-discretization of the diffusion process and then the application of particle filters. Perhaps the state-of-the-art method for moderate-dimensional problems is the multilevel particle filter of Jasra et al. (SIAM J. Numer. Anal. 55 (2017), 3068–3096). This is a method that combines multilevel Monte Carlo and particle filters. The approach in that article is based intrinsically upon an Euler discretization method. We develop a new particle filter based upon the antithetic truncated Milstein scheme of Giles and Szpruch (Ann. Appl. Prob. 24 (2014), 1585–1620). We show empirically for a class of diffusion problems that, for $\epsilon>0$ given, the cost to produce a mean squared error (MSE) of $\mathcal{O}(\epsilon^2)$ in the estimation of the filter is $\mathcal{O}(\epsilon^{-2}\log(\epsilon)^2)$. In the case of multidimensional diffusions with non-constant diffusion coefficient, the method of Jasra et al. (2017) requires a cost of $\mathcal{O}(\epsilon^{-2.5})$ to achieve the same MSE.
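As a rough illustration of the Euler-based setup that the multilevel approach builds on (not the antithetic truncated Milstein method developed in this paper), a bootstrap particle filter for a toy one-dimensional partially observed diffusion might look as follows; the model, step size, and observation noise are assumptions made for the sketch.

```python
# Minimal sketch (not the paper's method): a bootstrap particle filter for a
# partially observed 1-D diffusion, using an Euler-Maruyama time discretization.
# The drift, diffusion, step sizes, and noise levels below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def drift(x):      return -x                          # assumed drift b(x)
def diffusion(x):  return 1.0 + 0.1 * np.sin(x)       # assumed diffusion sigma(x)

def euler_step(x, dt):
    """One Euler-Maruyama step for dX = b(X) dt + sigma(X) dW."""
    return x + drift(x) * dt + diffusion(x) * np.sqrt(dt) * rng.standard_normal(x.shape)

def particle_filter(obs, n_particles=1000, level=6, obs_sd=0.5):
    """Bootstrap particle filter; observations arrive at unit times,
    and each unit interval is discretized into 2**level Euler steps."""
    dt = 2.0 ** (-level)
    x = np.zeros(n_particles)                          # initial state X_0 = 0
    means = []
    for y in obs:
        for _ in range(int(1 / dt)):                   # propagate to next observation time
            x = euler_step(x, dt)
        logw = -0.5 * ((y - x) / obs_sd) ** 2          # Gaussian observation density (up to const.)
        w = np.exp(logw - logw.max()); w /= w.sum()
        means.append(np.sum(w * x))                    # estimate of the filter mean
        x = x[rng.choice(n_particles, n_particles, p=w)]   # multinomial resampling
    return np.array(means)

print(particle_filter(obs=[0.3, -0.1, 0.2], n_particles=500))
```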
Since its establishment in 2014, Data for Policy (https://dataforpolicy.org) has emerged as a prominent global community promoting interdisciplinary research and cross-sector collaborations in the realm of data-driven innovation for governance and policymaking. This report presents an overview of the community’s evolution from 2014 to 2023 and introduces its six-area framework, which provides a comprehensive mapping of the data for policy research landscape. The framework is based on extensive consultations with key stakeholders involved in the international committees of the annual Data for Policy conference series and the open-access journal Data & Policy (https://www.cambridge.org/core/journals/data-and-policy), published by Cambridge University Press. By presenting this inclusive framework, along with the guiding principles and future outlook for the community, this report serves as a vital foundation for continued research and innovation in the field of data for policy.
Occurrence of cryptosporidiosis has been associated with weather conditions in many settings internationally. We explored statistical clusters of human cryptosporidiosis and their relationship with severe weather events in New Zealand (NZ). Notified cases of cryptosporidiosis from 1997 to 2015 were obtained from the national surveillance system. Retrospective space–time permutation was used to identify statistical clusters. Cluster data were compared to severe weather events in a national database. SaTScan analysis detected 38 statistically significant cryptosporidiosis clusters. Around a third (34.2%, 13/38) of these clusters showed temporal and spatial alignment with severe weather events. Of these, nearly half (46.2%, 6/13) occurred in the spring. Only five (38%, 5/13) of these clusters corresponded to a previously reported cryptosporidiosis outbreak. This study provides additional evidence that severe weather events may contribute to the development of some cryptosporidiosis clusters. Further research on this association is needed as rainfall intensity is projected to rise in NZ due to climate change. The findings also provide further arguments for upgrading the quality of drinking water sources to minimize contamination with pathogens from runoff from livestock agriculture.
In the traditional multidimensional credibility models developed by Jewell (1973, Operations Research Center, pp. 73–77), the estimation of the hypothetical mean vector involves complex matrix manipulations, which can be challenging to implement in practice. Additionally, the estimation of hyperparameters becomes even more difficult in high-dimensional risk variable scenarios. To address these issues, this paper proposes a new multidimensional credibility model based on the conditional joint distribution function for predicting future premiums. First, we develop an estimator of the joint distribution function of a vector of claims using linear combinations of indicator functions based on past observations. By minimizing the integral of the expected quadratic distance function between the proposed estimator and the true joint distribution function, we obtain the optimal linear Bayesian estimator of the joint distribution function. Using the plug-in method, we obtain an explicit formula for the multidimensional credibility estimator of the hypothetical mean vector. In contrast to the traditional multidimensional credibility approach, our newly proposed estimator does not involve a matrix as the credibility factor, but rather a scalar. This scalar is composed of both population information and sample information, and it still has the essential property of being increasing in the sample size. Furthermore, the new estimator based on the joint distribution function can be naturally extended and applied to estimate the process covariance matrix and risk premiums under various premium principles. We further illustrate the performance of the new estimator by comparing it with the traditional multidimensional credibility model using bivariate exponential-gamma and multivariate normal distributions. Finally, we present two real examples to demonstrate the findings of our study.
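For orientation only (this is the classical one-dimensional case, not the estimator proposed in the paper), the Bühlmann credibility premium already has the scalar-factor, increasing-in-$n$ form that the abstract contrasts with matrix-valued credibility factors:

```latex
% Classical Buhlmann credibility (one-dimensional case, for context only).
\[
  \widehat{\mu}(\theta) \;=\; Z\,\bar{X} \;+\; (1-Z)\,\mu,
  \qquad
  Z \;=\; \frac{n}{\,n + \sigma^{2}/\tau^{2}\,},
\]
% where $\mu = \mathbb{E}[\mu(\Theta)]$, $\sigma^{2} = \mathbb{E}[\operatorname{Var}(X\mid\Theta)]$,
% $\tau^{2} = \operatorname{Var}(\mu(\Theta))$, and $Z \uparrow 1$ as the sample size $n$ grows.
```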
Collecting network data directly from network members can be challenging. One alternative involves inferring a network from observed groups, for example, inferring a network of scientific collaboration from researchers’ observed paper authorships. In this paper, I explore when an unobserved undirected network of interest can accurately be inferred from observed groups. The analysis uses simulations to experimentally manipulate the structure of the unobserved network to be inferred, the number of groups observed, the extent to which the observed groups correspond to cliques in the unobserved network, and the method used to draw inferences. I find that when a small number of groups are observed, an unobserved network can be accurately inferred using a simple unweighted two-mode projection, provided that each group’s membership closely corresponds to a clique in the unobserved network. In contrast, when a large number of groups are observed, an unobserved network can be accurately inferred using a statistical backbone extraction model, even if the groups’ memberships are mostly random. These findings offer guidance for researchers seeking to indirectly measure a network of interest using observations of groups.
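A minimal sketch of the simple unweighted two-mode projection mentioned above, using made-up group (authorship) data; the backbone-extraction alternative is noted in a comment.

```python
# Minimal sketch (illustrative only): inferring a network from observed groups
# via an unweighted two-mode projection -- two actors are linked if they
# co-occur in at least one observed group. The group data below are made up.
from itertools import combinations

groups = [                       # e.g., observed paper authorships
    {"ana", "bo", "carla"},
    {"bo", "carla", "dev"},
    {"ana", "eli"},
]

edges = set()
for members in groups:
    for u, v in combinations(sorted(members), 2):
        edges.add((u, v))        # unweighted projection: any co-membership => edge

print(sorted(edges))
# With many, noisier observed groups, the abstract instead recommends a
# statistical backbone-extraction model applied to the (weighted) projection.
```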
We explore the limiting spectral distribution of large-dimensional random permutation matrices, assuming the underlying population distribution possesses a general dependence structure. Let $\textbf{X} = (\textbf{x}_1,\ldots,\textbf{x}_n) \in \mathbb{C}^{m \times n}$ be an $m \times n$ data matrix after self-normalization ($n$ samples and $m$ features), where $\textbf{x}_j = (x_{1j}^{*},\ldots, x_{mj}^{*})^{*}$. Specifically, we generate a permutation matrix $\textbf{X}_\pi$ by permuting the entries of $\textbf{x}_j$ $(j=1,\ldots,n)$ and demonstrate that the empirical spectral distribution of $\textbf{B}_n = ({m}/{n})\textbf{U}_{n} \textbf{X}_\pi \textbf{X}_\pi^{*} \textbf{U}_{n}^{*}$ weakly converges to the generalized Marčenko–Pastur distribution with probability 1, where $\textbf{U}_n$ is a sequence of $p \times m$ non-random complex matrices. The conditions we require are $p/n \to c > 0$ and $m/n \to \gamma > 0$.
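For reference (the classical special case only, not the generalized law used in the paper), the standard Marčenko–Pastur density for ratio $c \in (0,1]$ and unit variance is:

```latex
% Standard Marcenko--Pastur density, of which the generalized Marcenko--Pastur
% law in the abstract is an extension.
\[
  f_c(x) \;=\; \frac{1}{2\pi c x}\,\sqrt{(b - x)(x - a)}, \qquad a \le x \le b,
\]
\[
  a = \bigl(1-\sqrt{c}\,\bigr)^{2}, \qquad b = \bigl(1+\sqrt{c}\,\bigr)^{2}.
\]
```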
The betweenness centrality of a graph vertex measures how often this vertex is visited on shortest paths between other vertices of the graph. In the analysis of many real-world graphs or networks, the betweenness centrality of a vertex is used as an indicator for its relative importance in the network. In particular, it is among the most popular tools in social network analysis. In recent years, a growing number of real-world networks have been modeled as temporal graphs instead of conventional (static) graphs. In a temporal graph, we have a fixed set of vertices and there is a finite discrete set of time steps, and every edge might be present only at some time steps. While shortest paths are straightforward to define in static graphs, temporal paths can be considered “optimal” with respect to many different criteria, including length, arrival time, and overall travel time (shortest, foremost, and fastest paths). This leads to different concepts of temporal betweenness centrality, posing new challenges on the algorithmic side. We provide a systematic study of temporal betweenness variants based on various concepts of optimal temporal paths.
Computing the betweenness centrality for vertices in a graph is closely related to counting the number of optimal paths between vertex pairs. While in static graphs computing the number of shortest paths is easily doable in polynomial time, we show that counting foremost and fastest paths is computationally intractable (#P-hard), and hence the computation of the corresponding temporal betweenness values is intractable as well. For shortest paths and two selected special cases of foremost paths, we devise polynomial-time algorithms for temporal betweenness computation. Moreover, we explore the distinction between strict (ascending time labels) and non-strict (non-descending time labels) temporal paths. In our experiments with established real-world temporal networks, we demonstrate the practical effectiveness of our algorithms, compare the various betweenness concepts, and derive recommendations on their practical use.
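As a small illustration of one of the optimality notions discussed above (not the paper's counting algorithms), earliest-arrival ("foremost") times from a source can be computed by a single scan over time-sorted edges; the temporal-graph representation and traversal convention are assumptions made for the sketch.

```python
# Minimal sketch (illustrative only): earliest-arrival ("foremost") times from a
# source in a temporal graph whose edges are (u, v, t) triples, assuming
# instantaneous traversal at time t.
def earliest_arrival(edges, source, start=0, strict=False):
    """Scan edges in nondecreasing time order; edge (u, v, t) is usable if we
    reached u no later than t (non-strict) or strictly before t (strict).
    Note: ties at equal timestamps may need a second pass in the non-strict case."""
    arrival = {source: start}
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in arrival and (arrival[u] < t or (not strict and arrival[u] <= t)):
            arrival[v] = min(arrival.get(v, float("inf")), t)
    return arrival

temporal_edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 3)]
print(earliest_arrival(temporal_edges, "a"))   # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```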
Rectal swabs of 104 patients who underwent abdominal surgery were screened for ESBL producers. Sequence types (STs) and resistance genes were identified by whole-genome sequencing of 46 isolates from 17 patients. All but seven isolates were assigned to recognized STs. While 18 ESBL-producing E. coli (EPEC) strains were of unique STs, ESBL-producing K. pneumoniae (EPKP) strains were mainly ST14 or ST15. Eight patients harboured strains of the same ST before and after abdominal surgery. The most prevalent resistance genes in E. coli were blaEC (69.57%), blaCTX-M (65.22%), and blaTEM (36.95%), while blaSHV was present only in K. pneumoniae (41.30%). Overall, genes encoding β-lactamases of classes A (blaCTX-M, blaTEM, blaZ), C (blaSHV, blaMIR, and blaDHA), and D (blaOXA) were identified, the most prevalent variants being blaCTX-M-15, blaTEM-1B, blaSHV-28, and blaOXA-1. Interestingly, blaCMY-2, the most common pAmpC β-lactamase gene reported worldwide, and the mobile colistin resistance gene mcr-10-1 were also identified. The presence of blaCMY-2 and mcr-10-1 is concerning as they may constitute a potentially high risk of pan-resistant post-surgical infections. It is imperative that healthcare professionals monitor intra-abdominal surgical site infections rigorously to prevent transmission of faecal ESBL carriage in high-risk patients.
Ross River virus (RRV), the most medically and economically important arbovirus in Australia, has been the most prevalent arbovirus infection in humans for many years. Infected humans and horses often suffer similar clinical symptoms. We conducted a prospective longitudinal study over a 3.5-year period to investigate the exposure dynamics of RRV in three foal cohorts (n = 32) born in a subtropical region of South East Queensland, Australia, between 2020 and 2022. RRV-specific seroconversion was detected in 56% (n = 18) of foals, with a median time to seroconversion, after waning of maternal antibodies, of 429 days (95% CI: 294–582). The median age at seroconversion was 69 weeks (95% CI: 53–57). Seroconversion events were only detected between December and March (Southern Hemisphere summer) over the entire study period. Cox proportional hazards regression analyses revealed that seroconversions were significantly (p < 0.05) associated with air temperature in the month of seroconversion. Time-lags in meteorological variables were not significantly (p > 0.05) associated with seroconversion, except for relative humidity (p = 0.036 at a 2-month time-lag). This contrasts with findings for RRV infection in humans, where cases peaked between March and May (autumn) and showed 0–3-month time-lags for various meteorological risk factors. Therefore, horses may be suitable sentinels for monitoring active arbovirus circulation and could be used for early arbovirus outbreak detection in human populations.
The global incidence of syphilis is increasing. Maintaining continuity of care is a challenge for the control of sexually transmitted diseases. In this study, we assessed differences in follow-up and serological decline between community- and hospital-diagnosed patients in Israel. A historical cohort study was conducted using the Israel National Syphilis Center (NSC) repository. Patients with a positive non-specific Venereal Disease Research Laboratory (VDRL) test between 2011 and 2020 were included. Rates of serological follow-up and serological titre decreases were compared between hospital- and community-diagnosed patients. The study included 4,445 patients: 2,596 (58.4%) were diagnosed in community clinics and 1,849 (41.6%) in hospitals. Of community-diagnosed patients, 1,957 (75.4%) performed follow-up testing, compared with 834 (51.2%) hospital-diagnosed patients (p < 0.001). On multivariate analysis, the odds ratio of serological follow-up among community-diagnosed patients was 2.8 times that of hospital-diagnosed patients (95% confidence interval (CI): 2.2–3.5). There were 1,397 (71.4%) community-diagnosed patients with a serological titre decrease, compared with 626 (74.9%) hospital-diagnosed patients (p = 0.03). On multivariate analysis, this difference diminished. Serological follow-up testing is suboptimal and was performed more often among patients initially diagnosed in the community than in hospitals. Continuity of care should be improved to promote successful patient care and prevent disease spread.
The protection number of a vertex $v$ in a tree is the length of the shortest path from $v$ to any leaf contained in the maximal subtree where $v$ is the root. In this paper, we determine the distribution of the maximum protection number of a vertex in simply generated trees, thereby refining a recent result of Devroye, Goh, and Zhao. Two different cases can be observed: if the given family of trees allows vertices of outdegree $1$, then the maximum protection number is on average logarithmic in the tree size, with a discrete double-exponential limiting distribution. If no such vertices are allowed, the maximum protection number is doubly logarithmic in the tree size and concentrated on at most two values. These results are obtained by studying the singular behaviour of the generating functions of trees with bounded protection number. While a general distributional result by Prodinger and Wagner can be used in the first case, we prove a variant of that result in the second case.
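A minimal sketch of the protection number itself, on an assumed toy tree: a leaf has protection number 0, and an internal vertex has 1 plus the minimum protection number over its children.

```python
# Minimal sketch (illustrative only): protection numbers in a rooted tree.
# The protection number of v is the distance from v to the nearest leaf in
# the subtree rooted at v: 0 for a leaf, otherwise 1 + min over children.
def protection_numbers(children, root):
    prot = {}
    def visit(v):
        kids = children.get(v, [])
        if not kids:
            prot[v] = 0                              # a leaf is its own nearest leaf
        else:
            prot[v] = 1 + min(visit(c) for c in kids)
        return prot[v]
    visit(root)
    return prot

# Toy tree:  0 -> 1, 2;  1 -> 3, 4;  3 -> 5
tree = {0: [1, 2], 1: [3, 4], 3: [5]}
p = protection_numbers(tree, 0)
print(p, "maximum protection number:", max(p.values()))
```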
High-cardinality categorical features are pervasive in actuarial data (e.g., occupation in commercial property insurance). Standard categorical encoding methods like one-hot encoding are inadequate in these settings.
In this work, we present a novel Generalised Linear Mixed Model Neural Network (“GLMMNet”) approach to the modelling of high-cardinality categorical features. The GLMMNet integrates a generalised linear mixed model in a deep learning framework, offering the predictive power of neural networks and the transparency of random effects estimates, the latter of which cannot be obtained from entity embedding models. Further, its flexibility to deal with any distribution in the exponential dispersion (ED) family makes it widely applicable to many actuarial contexts and beyond. To facilitate the application of GLMMNet to large datasets, we use variational inference to estimate its parameters—both a traditional mean-field version and versions that utilise textual information underlying the high-cardinality categorical features.
We illustrate and compare the GLMMNet against existing approaches in a range of simulation experiments as well as in a real-life insurance case study. A notable feature for both our simulation experiment and the real-life case study is a comparatively low signal-to-noise ratio, which is a feature common in actuarial applications. We find that the GLMMNet often outperforms or at least performs comparably with an entity-embedded neural network in these settings, while providing the additional benefit of transparency, which is particularly valuable in practical applications.
Importantly, while our model was motivated by actuarial applications, it can have wider applicability. The GLMMNet would suit any applications that involve high-cardinality categorical variables and where the response cannot be sufficiently modelled by a Gaussian distribution, especially where the inherent noisiness of the data is relatively high.
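For concreteness, here is a minimal sketch of the entity-embedding baseline that the GLMMNet is compared against (not the GLMMNet itself, which replaces the embedding with GLMM random effects estimated by variational inference); all dimensions and layer sizes are arbitrary assumptions.

```python
# Minimal sketch (illustrative only): an entity-embedding network for one
# high-cardinality categorical feature plus continuous covariates.
import torch
import torch.nn as nn

class EntityEmbedNet(nn.Module):
    def __init__(self, n_categories, emb_dim=8, n_continuous=5):
        super().__init__()
        self.embed = nn.Embedding(n_categories, emb_dim)   # learned vector per category level
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + n_continuous, 32),
            nn.ReLU(),
            nn.Linear(32, 1),                               # e.g., mean of the response
        )

    def forward(self, cat_idx, x_cont):
        z = self.embed(cat_idx)                             # (batch, emb_dim)
        return self.mlp(torch.cat([z, x_cont], dim=1)).squeeze(-1)

net = EntityEmbedNet(n_categories=500)                      # e.g., 500 occupation codes
out = net(torch.tensor([3, 120]), torch.randn(2, 5))
print(out.shape)                                            # torch.Size([2])
```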
We explore some of the risks related to Artificial Intelligence (AI) from an actuarial perspective based on research from a transregional industry focus group. We aim to define the key gaps and challenges faced when implementing and utilising modern modelling techniques within traditional actuarial tasks from a risk perspective and in the context of professional standards and regulations. We explore best practice guidelines to attempt to define an ideal approach and propose potential next steps to help reach the ideal approach. We aim to focus on the considerations, initially from a traditional actuarial perspective and then, if relevant, consider some implications for non-traditional actuarial work, by way of examples. The examples are not intended to be exhaustive. The group considered potential issues and challenges of using AI, related to the following key themes:
• Ethical
○ Bias, fairness, and discrimination
○ Individualisation of risk assessment
○ Public interest
• Professional
○ Interpretability and explainability
○ Transparency, reproducibility, and replicability
○ Validation and governance
• Lack of relevant skills available
• Wider themes
This paper aims to provide observations that could help inform industry and professional guidelines or discussion or to support industry practitioners. It is not intended to replace current regulation, actuarial standards, or guidelines. The paper is aimed at an actuarial and insurance technical audience, specifically those who are utilising or developing AI, and actuarial industry bodies.
The Internet of Things (IoT) and wearable computing are crucial elements of modern information systems and applications in which advanced features for user interactivity and monitoring are required. However, in the field of pervasive gaming, IoT has so far seen limited real-world application. In this work, we present a prototype of a wearable platform for pervasive games that combines IoT with wearable computing to enable the real-time monitoring of physical activity. The main objective of the solution is to promote the use of gamification techniques to enhance users’ physical activity through challenges and quests, creating a symbolic link between the virtual gameplay and the real-world environment without requiring a smartphone. With sensors and wearable devices integrated by design, the platform can monitor users’ physical activity in real time during the game. The system performance results highlight the efficiency and attractiveness of the wearable platform for gamifying physical activity.
Compressible anisothermal flows, which are commonly found in industrial settings such as combustion chambers and heat exchangers, are characterized by significant variations in density, viscosity, and heat conductivity with temperature. These variations lead to a strong interaction between the temperature and velocity fields that impacts the near-wall profiles of both quantities. Wall-modeled large-eddy simulations (LESs) rely on a wall model to provide a boundary condition (for example, the shear stress and the heat flux) that accurately represents this interaction despite the use of coarse cells near the wall, thereby achieving a good balance between computational cost and accuracy. In this article, the use of graph neural networks for wall modeling in LES is assessed for compressible anisothermal flow. Graph neural networks are a type of machine learning model that can learn from data and operate directly on complex unstructured meshes. Previous work has shown the effectiveness of graph neural network wall modeling for isothermal incompressible flows. This article develops the graph neural network architecture and training to extend their applicability to compressible anisothermal flows. The model is trained and tested a priori using a database of both incompressible isothermal and compressible anisothermal flows. The model is finally tested a posteriori for the wall-modeled LES of a channel flow and a turbine blade, neither of which was seen during training.
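As a rough illustration of the basic operation such a model performs (not the paper's architecture), a single mean-aggregation message-passing step on a small mesh-like graph might look as follows; the features, connectivity, and weights are random placeholders.

```python
# Minimal sketch (illustrative only): one mean-aggregation message-passing step
# on a small mesh-like graph, the core operation a graph neural network wall
# model applies to near-wall cells.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_feat, n_hidden = 5, 3, 4                # e.g., cell velocity/temperature features
features = rng.normal(size=(n_nodes, n_feat))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # undirected mesh connectivity

# Row-normalized adjacency (mean over neighbours, including self-loops).
adj = np.eye(n_nodes)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
adj /= adj.sum(axis=1, keepdims=True)

W = rng.normal(size=(n_feat, n_hidden))            # learnable weights in a real model
hidden = np.maximum(adj @ features @ W, 0.0)       # aggregate neighbours, transform, ReLU
print(hidden.shape)                                # (5, 4); would feed further layers
# A wall model would map such node embeddings to wall shear stress and heat flux.
```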
This paper examines the potential role of network analysis in understanding the powerful elites that pose a significant threat to peace and state-building within post-conflict contexts. This paper makes a threefold contribution. First, it identifies a gap in the scholarship surrounding international interventions, shedding light on shortcomings in their design and implementation strategies, and elucidating the influence these elites wield in the political and economic realms. Next, it delineates the essentials of the network analysis approach, addressing the information and data requirements and limitations inherent in its application in conflict environments. Finally, the paper provides valuable insights gleaned from the international operation in Guatemala known as the International Commission for Impunity in Guatemala, which specifically targeted illicit networks. The argument asserts that network analysis functions as a dual-purpose tool—serving as both a descriptive instrument to reveal, identify, and address the root causes of conflict and a predictive tool to enhance peace agreement implementation and improve decision-making. Simultaneously, it underscores the challenge of data analysis and translating network interventions into tangible real-life consequences for long-lasting results.