What drives changes in the thematic focus of state-linked manipulated media? We study this question in relation to a long-running Iranian state-linked manipulated media campaign that was uncovered by Twitter in 2021. Using a variety of machine learning methods, we uncover and analyze how this manipulation campaign’s topical themes changed in relation to rising Covid-19 cases in Iran. By using the topics of the tweets in a novel way, we find that increases in domestic Covid-19 cases engendered a shift in Iran’s manipulated media focus away from Covid-19 themes and toward international finance- and investment-focused themes. These findings underscore (i) the potential for state-linked manipulated media campaigns to be used for diversionary purposes and (ii) the promise of machine learning methods for detecting such behaviors.
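The abstract leaves the specific machine learning pipeline open; as a hedged illustration only, the sketch below extracts per-tweet topic shares with scikit-learn's latent Dirichlet allocation. The file name campaign_tweets.csv, the text column, and the choice of 20 topics are hypothetical assumptions, not details from the paper.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = pd.read_csv("campaign_tweets.csv")["text"]  # hypothetical file and column

# Bag-of-words matrix, dropping very rare and near-ubiquitous terms
dtm = CountVectorizer(max_df=0.95, min_df=5, stop_words="english").fit_transform(tweets)

# Per-tweet topic shares; aggregated by week, these shares could then be
# related to domestic Covid-19 case counts, as in the analysis described above
lda = LatentDirichletAllocation(n_components=20, random_state=0)
theta = lda.fit_transform(dtm)
```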
The Least Trimmed Squares (LTS) regression estimator is known to be very robust to the presence of “outliers”. It is based on a clear and intuitive idea: in a sample of size $n$, it searches for the $h$-subsample of observations with the smallest sum of squared residuals. The remaining $n-h$ observations are declared “outliers”. Fast algorithms for its computation exist. Nevertheless, the existing asymptotic theory for LTS, based on the traditional $\epsilon $-contamination model, shows that the asymptotic behavior of both the regression and scale estimators depends on nuisance parameters. Using a recently proposed new model, in which the LTS estimator is maximum likelihood, we show that the asymptotic behavior of both the LTS regression and scale estimators is free of nuisance parameters. Thus, with the new model as a benchmark, standard inference procedures apply while allowing a broad range of contamination.
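As a minimal sketch of that search idea, in the spirit of the fast LTS algorithms (random elemental starts refined by concentration steps; not a reference implementation):

```python
import numpy as np

def c_step(X, y, subset, h):
    """One concentration step: fit OLS on the current subset, then keep
    the h observations with the smallest squared residuals."""
    beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    r2 = (y - X @ beta) ** 2
    return np.argsort(r2)[:h], beta

def lts(X, y, h, n_starts=500, n_steps=10, seed=None):
    """Crude LTS search: many random p-point starts, each improved by C-steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best = (np.inf, None, None)
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)   # elemental start
        for _ in range(n_steps):
            subset, beta = c_step(X, y, subset, h)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()    # trimmed sum of squares
        if obj < best[0]:
            best = (obj, beta, subset)
    return best  # (objective, coefficients, h-subsample of "inliers")
```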
We model voting behaviour in the multi-group setting of a two-tier voting system using sequences of de Finetti measures. Our model is defined by using the de Finetti representation of a probability measure (i.e. as a mixture of conditionally independent probability measures) describing voting behaviour. The de Finetti measure describes the interaction between voters and possible outside influences on them. We assume that for each population size there is a (potentially) different de Finetti measure, and as the population grows, the sequence of de Finetti measures converges weakly to the Dirac measure at the origin, representing a tendency toward weakening social cohesion in large populations. The resulting model covers a wide variety of behaviours, ranging from independent voting in the limit under fast convergence, through a critical convergence speed with its own pattern of behaviour, to a subcritical convergence speed that yields a model in line with empirical evidence from real-world voting data, contrary to previous probabilistic models used in the study of voting. These models can be used, e.g., to study the problem of optimal voting weights in two-tier voting systems.
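To make the construction concrete, here is a simulation sketch for one invented instance of such a sequence of de Finetti measures: the law of $Z_n = n^{-\alpha}U$ with $U\sim\mathrm{Uniform}(-1,1)$, which converges weakly to the Dirac measure at the origin. The specific mixing measure and the reading of the exponent regimes are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def voting_margins(n, alpha, n_trials=10_000):
    """Given Z_n = z, the n voters vote +1/-1 i.i.d. with P(+1) = (1+z)/2.
    Returns the CLT-scaled voting margins over many simulated elections."""
    z = n ** (-alpha) * rng.uniform(-1.0, 1.0, size=n_trials)
    plus = rng.binomial(n, (1.0 + z) / 2.0)   # number of +1 votes per election
    return (2 * plus - n) / np.sqrt(n)

# Fast convergence (alpha large): margins behave as under independent voting;
# slow convergence (alpha small): the common influence Z_n dominates.
for alpha in (1.0, 0.5, 0.25):
    print(alpha, voting_margins(100_000, alpha).std())
```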
The principal function of an open recirculating system (ORS) is to remove heat from power plant equipment. In particular, the presence of scale on the internal surfaces of ORS heat exchange equipment can reduce heat transfer efficiency, which leads to increased energy consumption and operating costs. The purpose of this article is to investigate the process of calcium carbonate (CaCO3) precipitation formation in terms of the components of the carbonate system and the parameters affecting the shift of carbonate equilibrium in an ORS. An appraisal model was used to represent the processes occurring during the operation of an ORS. In this study, it is demonstrated that water heating in ORS condensers leads to the release of carbon dioxide (CO2) from the water, while cooling in the cooling towers results in CO2 uptake by the water. These processes significantly influence the state of carbonate equilibrium within the ORS. The study used the results of chemical control of the make-up and cooling water at the Rivne Nuclear Power Plant (RNPP) ORS for 2022. Furthermore, the dependencies of changes in the components of the carbonate system on the pH levels of the make-up (pH 7.51–9.52) and cooling (pH 8.21–9.53) water were revealed, and changes in the cycles of concentration (CoC), total hardness (TH), total dissolved solids (TDS), and total alkalinity (TA) were estimated. Taking the obtained correlation dependencies into account, it was found that, in general, the lower the CoC, the smaller the reduction in TA, and that the cooling water pH may either increase or decrease, depending on the initial state of carbonate equilibrium of the make-up water. These findings enable the prediction and control of CaCO3 scale formation through continuous monitoring of water chemistry, making the process more efficient, reliable, and sustainable. The results emphasize the importance of data-driven modeling for optimizing water treatment and reducing operational costs in power plants by reducing CaCO3 scale formation.
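To make the pH dependence of the carbonate-system components concrete, the sketch below computes the closed-system speciation fractions over the reported pH ranges, using textbook dissociation constants for 25 °C; the constants for the actual RNPP water would depend on temperature and ionic strength, so the numbers are illustrative only.

```python
import numpy as np

# Standard carbonic-acid dissociation constants at 25 degC (pK1 = 6.35,
# pK2 = 10.33); values for real cooling-system water would differ.
K1, K2 = 10.0 ** -6.35, 10.0 ** -10.33

def carbonate_fractions(ph):
    """Ionization fractions of dissolved CO2/H2CO3*, HCO3-, and CO3^2-
    as a function of pH (closed-system speciation)."""
    h = 10.0 ** -ph
    denom = h * h + K1 * h + K1 * K2
    return h * h / denom, K1 * h / denom, K1 * K2 / denom

for ph in (7.51, 8.21, 9.53):   # ends of the make-up and cooling water pH ranges
    co2, hco3, co3 = carbonate_fractions(ph)
    print(f"pH {ph}: CO2* {co2:.3f}, HCO3- {hco3:.3f}, CO3^2- {co3:.3f}")
```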
This article examines high-dimensional covariates in regression discontinuity design (RDD) analysis. We introduce estimation and inference methods for RDD models that incorporate covariate selection while maintaining stability across various numbers of covariates. The proposed methods combine a localization approach using kernel weights with $\ell _{1}$-penalization to handle high-dimensional covariates. We provide both theoretical and numerical evidence demonstrating the efficacy of our methods. Theoretically, we present risk and coverage properties for our point estimation and inference methods. Conditions are given under which the proposed estimator becomes more efficient than the conventional covariate-adjusted estimator at the cost of an additional sparsity condition. Numerically, our simulation experiments and empirical examples show that the proposed methods are robust to the number of covariates, in terms of bias and variance for point estimation and of coverage probability and interval length for inference.
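A hedged sketch of how localization and $\ell_1$-penalization can be combined, not the paper's exact estimator: triangular-kernel weights localize to the cutoff, a kernel-weighted lasso selects covariates, and the RD effect is read off a weighted local linear fit.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def rdd_lasso(y, x, Z, cutoff=0.0, bw=1.0, alpha=0.1):
    """y: outcomes, x: running variable, Z: (n, k) covariate matrix.
    Returns the estimated jump in y at the cutoff."""
    u = (x - cutoff) / bw
    w = np.clip(1.0 - np.abs(u), 0.0, None)       # triangular kernel weights
    keep = w > 0
    u, w, y, Z = u[keep], w[keep], y[keep], Z[keep]
    d = (u >= 0).astype(float)                    # above-cutoff indicator
    sw = np.sqrt(w)[:, None]
    # kernel-weighted lasso: weighted demeaning plus row rescaling
    yc = y - np.average(y, weights=w)
    Zc = Z - np.average(Z, axis=0, weights=w)
    sel = Lasso(alpha=alpha, fit_intercept=False).fit(Zc * sw, yc * sw.ravel())
    Zsel = Zc[:, np.abs(sel.coef_) > 1e-8]        # selected covariates
    # weighted local linear RD regression with the selected covariates
    cols = [np.ones_like(u), d, u, d * u]
    if Zsel.size:
        cols.append(Zsel)
    X = np.column_stack(cols)
    fit = LinearRegression(fit_intercept=False).fit(X * sw, y * sw.ravel())
    return fit.coef_[1]                           # RD effect: jump at the cutoff
```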
The complex socioeconomic landscape of conflict zones demands innovative approaches to assess and predict vulnerabilities for crafting and implementing effective policies by the United Nations (UN) institutions. This article presents a groundbreaking Augmented Intelligence-driven Prediction Model developed to forecast multidimensional vulnerability levels (MVLs) across Afghanistan. Leveraging a symbiotic fusion of human expertise and machine capabilities (e.g., artificial intelligence), the model demonstrates a predictive accuracy ranging between 70% and 80%. This research not only contributes to enhancing the UN Early Warning (EW) Mechanisms but also underscores the potential of augmented intelligence in addressing intricate challenges in conflict-ridden regions. This article outlines the use of augmented intelligence methodology applied to a use case to predict MVLs in Afghanistan. It discusses the key findings of the pilot project, and further proposes a holistic platform to enhance policy decisions through augmented intelligence, including an EW mechanism to significantly improve EW processes, thereby supporting decision-makers in formulating effective policies and fostering sustainable development within the UN.
This paper defines and studies a broad class of shock models by assuming that a Markovian arrival process models the arrival pattern of shocks. Under the defined class, we show that the system’s lifetime follows a phase-type distribution. Further, we examine the age replacement policy for systems with a continuous phase-type distribution, identifying sufficient conditions for determining the optimal replacement time. Since phase-type distributions are dense in the class of lifetime distributions, our findings for the age replacement policy are widely applicable. We include numerical examples and graphical illustrations to support our results.
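As a numerical sketch of evaluating an age replacement policy for a continuous phase-type lifetime (the initial distribution, sub-generator, and cost values below are invented for illustration):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Hypothetical continuous phase-type lifetime: initial distribution `alpha`
# over the transient states and sub-generator matrix `S` (values invented).
alpha = np.array([1.0, 0.0])
S = np.array([[-2.0, 1.5],
              [0.0, -1.0]])

def survival(t):
    """Phase-type survival function: P(X > t) = alpha @ expm(S t) @ 1."""
    return float(alpha @ expm(S * t) @ np.ones(len(alpha)))

def cost_rate(T, c_fail=10.0, c_plan=1.0):
    """Long-run cost per unit time when replacing at failure (cost c_fail)
    or preventively at age T (cost c_plan), by the renewal-reward theorem."""
    mean_cycle, _ = quad(survival, 0.0, T)        # E[min(X, T)]
    s = survival(T)
    return (c_plan * s + c_fail * (1.0 - s)) / mean_cycle

opt = minimize_scalar(cost_rate, bounds=(0.05, 10.0), method="bounded")
print(f"optimal replacement age T* = {opt.x:.3f}, cost rate = {opt.fun:.3f}")
```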
We study the Markov chain Monte Carlo estimator for numerical integration for functions that need not be square integrable with respect to the invariant distribution. For chains with a spectral gap we show that the absolute mean error for $L^p$ functions, with $p \in (1,2)$, decreases like $n^{1/p-1}$, which is known to be the optimal rate. This improves currently known results, in which an additional parameter $\delta > 0$ appears and the convergence is of order $n^{(1+\delta)/p-1}$.
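A toy simulation of this setting, with an invented chain and integrand: a random-walk Metropolis chain with Uniform(0,1) invariant law (hence a spectral gap) and $f(x) = x^{-1/2}$, which belongs to $L^p$ exactly for $p < 2$ and integrates to 2.

```python
import numpy as np

rng = np.random.default_rng(1)

def rw_metropolis_uniform(n, step=0.5):
    """Random-walk Metropolis chain with Uniform(0,1) invariant distribution:
    proposals outside (0,1) are rejected, all others are accepted."""
    x = np.empty(n)
    x[0] = 0.5
    for i in range(1, n):
        prop = x[i - 1] + step * rng.uniform(-1.0, 1.0)
        x[i] = prop if 0.0 < prop < 1.0 else x[i - 1]
    return x

# Absolute error of the ergodic average for f(x) = x**(-1/2), true value 2
for n in (10**3, 10**4, 10**5):
    est = np.mean(rw_metropolis_uniform(n) ** -0.5)
    print(n, abs(est - 2.0))
```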
In February 2023, 52 cases of gastrointestinal illness were reported in customers of Takeaway A, South Wales. Shigella flexneri serotype 2a was the causative organism. An outbreak investigation was conducted to determine the extent and vehicle of the outbreak.
Following descriptive summary and environmental investigations, a case–control study was completed. Participants completed a telephone questionnaire on food, travel, and environmental exposures. A multivariable logistic regression model was built, including exposures with p-values < 0.2 and interactions identified on stratified analysis. Staff faecal samples were screened for Shigella sp.
Thirty-one cases and 29 controls were included in the study. Eighty-seven per cent of cases and 76% of controls ate from Takeaway A on 10 February 2023. Coleslaw was the main factor associated with illness (aOR: 200, 95% CI: 12–3220), and an interaction with cabbage was identified (aOR: 886, 95% CI: 26–30034). Shigella sp. was not detected in any staff samples.
Coleslaw was the most likely vehicle. Though the contamination route is unknown, a food handler is the most likely source. This large outbreak differs from recent European outbreaks, which have primarily been associated with sexual transmission. Although uncommon in the UK, S. flexneri should be considered as a cause of foodborne outbreaks.
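For illustration, here is a minimal sketch of the kind of multivariable logistic model with an interaction term described in the methods, using statsmodels; the file name and variable names are hypothetical stand-ins, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("outbreak_questionnaire.csv")  # hypothetical: one row per participant

# case: 1 for cases, 0 for controls; exposures are binary indicators.
# The coleslaw:cabbage interaction mirrors the one identified on stratified
# analysis; 'other_exposure' stands in for exposures retained at p < 0.2.
fit = smf.logit("case ~ coleslaw * cabbage + other_exposure", data=df).fit()
print(np.exp(fit.params))       # adjusted odds ratios
print(np.exp(fit.conf_int()))   # 95% confidence intervals
```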
Given a fixed small graph H and a larger graph G, an H-factor is a collection of vertex-disjoint subgraphs $H'\subset G$, each isomorphic to H, that cover the vertices of G. If G is the complete graph $K_n$ equipped with independent U(0,1) edge weights, what is the lowest total weight of an H-factor? This problem has previously been considered for $H=K_2$, for example. We show that if H contains a cycle, then the minimum weight is sharply concentrated around some $L_n = \Theta(n^{1-1/d^*})$ (where $d^*$ is the maximum 1-density of any subgraph of H). Some of our results also hold for H-covers, where the copies of H are not required to be vertex-disjoint.
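For example, for $H=K_3$ the maximum 1-density is $d^*=3/2$, so the result gives concentration around some $L_n=\Theta(n^{1/3})$. Below is a brute-force sketch of the quantity in question (exhaustive search over triangle factors, feasible only for very small $n$):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def min_triangle_factor(W):
    """Exhaustive search for the minimum-weight K_3-factor of a complete
    graph with symmetric weight matrix W (exponential time; small n only)."""
    def rec(rem):
        if not rem:
            return 0.0
        a = rem[0]
        best = np.inf
        for b, c in combinations(rem[1:], 2):
            w = W[a][b] + W[a][c] + W[b][c]
            rest = tuple(v for v in rem if v not in (a, b, c))
            best = min(best, w + rec(rest))
        return best
    return rec(tuple(range(len(W))))

n = 9                                     # must be divisible by 3
W = rng.uniform(size=(n, n))
W = np.triu(W, 1) + np.triu(W, 1).T       # independent U(0,1) edge weights
print(min_triangle_factor(W))
```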
We consider the problem of identifying the parameters of a time-homogeneous bivariate Markov chain when only one of the two variables is observable. We show that, subject to conditions that we spell out, the transition kernel and the distribution of the initial condition are uniquely recoverable (up to an arbitrary relabelling of the state space of the latent variable) from the joint distribution of four (or more) consecutive time-series observations. The result is, therefore, applicable to (short) panel data as well as to (stationary) time series data.
Since the implementation of the Basel III Accord, expected shortfall (ES) has gained increasing attention from regulators as a complement to value-at-risk (VaR). Because ES is not elicitable on its own but is jointly elicitable with VaR, jointly modeling VaR and ES has become a popular way to study ES. In this article, we develop model averaging for joint VaR and ES regression models that selects the two weight vectors by minimizing a jackknife criterion. We show the large sample properties of the estimators under potential model misspecification with increasing dimension of parameters, and the asymptotic optimality of the selected weights in the sense of minimizing the out-of-sample excess final prediction error. Simulation studies and three empirical analyses reveal good finite-sample performance.
Spatial analysis and disease mapping have the potential to enhance understanding of tuberculosis (TB) dynamics, whose spatial patterns may be complicated by a mix of short- and long-range transmission and long latency periods. TB notifications in Nam Dinh Province for individuals aged 15 and older from 2013 to 2022 were analyzed with a variety of spatio-temporal methods. The study commenced with an analysis of spatial autocorrelation to identify clustering patterns, followed by the evaluation of several candidate Bayesian spatio-temporal models. These models varied from simple assessments of spatial heterogeneity to more complex configurations incorporating covariates and interactions. The findings highlighted a peak in the TB notification rate in 2017, at 98 cases per 100,000 population, followed by a sharp decline in 2021. Significant spatial autocorrelation at the commune level was detected over most of the 10-year period. The Bayesian model that best balanced goodness-of-fit and complexity indicated that TB trends were associated with poverty: each percentage point increase in the proportion of poor households was associated with a 1.3% increase in TB notifications, emphasizing a significant socioeconomic factor in TB transmission dynamics. The integration of local socioeconomic data with spatio-temporal analysis could further enhance our understanding of TB epidemiology.
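As an illustration of the spatial-autocorrelation step, here is a minimal global Moran's I with a permutation test; x would hold commune-level notification rates and W a commune adjacency (spatial weight) matrix, both assumed given, with every commune having at least one neighbour.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x under spatial weight matrix W
    (rows are standardized to sum to one; every row must be nonzero)."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    Wr = W / W.sum(axis=1, keepdims=True)
    return len(z) / Wr.sum() * (Wr * np.outer(z, z)).sum() / (z @ z)

def morans_i_pvalue(x, W, n_perm=999, seed=0):
    """One-sided permutation p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    obs = morans_i(x, W)
    perms = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    return (1 + np.sum(perms >= obs)) / (n_perm + 1)
```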
Epidemic preparedness requires clear procedures and guidelines when a rapid risk assessment of a communicable disease threat is requested. In an evaluation of past risk assessments, we found that modifications to existing guidelines, such as the European Centre for Disease Prevention and Control’s (ECDC) rapid risk assessment operational tool, can strengthen this process. Therefore, we present alternative guidelines, in which we propose a unifying risk assessment terminology, describe how the risk question should be phrased by the risk manager, and redefine the probability and impact dimensions of risk, including a methodology to express uncertainty. In our approach, probability refers to the probability of the introduction of a disease into a specified population in a specified time period, and impact combines the magnitude of spread and the severity of the health outcomes. Based on the collected evidence, both the probability of introduction and the magnitude of spread are quantitatively expressed by expert judgements, providing an unambiguous risk assessment. We advise against summarizing the risk by a single qualification such as ‘low’ or ‘high’. These alternative guidelines, which are illustrated by a hypothetical example on mpox, have been implemented at Statens Serum Institut in Denmark and can benefit other public health institutes.
The additive reserving model assumes the existence of volume measures such that the corresponding expected loss ratios are identical for all accident years. While classical literature assumes these volumes are known, in practice, accurate volume measures are often unavailable. The issue of uncertain volume measures in the additive model was addressed in a generalization of the loss ratio method published in 2018. The derivation is rather complex and the method is computationally intensive, especially for large loss development triangles. This paper introduces an alternative approach that leverages the well-established EM algorithm, significantly reducing computational requirements.
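A hedged sketch of the alternating updates such an EM-style fit can take under an over-dispersed Poisson assumption, where expected increments factor as volume times loss ratio; this conveys the general idea, not the paper's exact algorithm, and the factors are only identified up to a common scale.

```python
import numpy as np

def fit_additive(X, n_iter=100):
    """Alternating fit of E[X[i, k]] = v[i] * m[k] on a runoff triangle X
    (NaN marks unobserved future cells). Each update is the Poisson
    M-step for one factor given the other."""
    obs = ~np.isnan(X)
    Xf = np.where(obs, X, 0.0)
    v = Xf.sum(axis=1)                                   # crude initial volumes
    m = np.ones(X.shape[1]) / X.shape[1]
    for _ in range(n_iter):
        m = Xf.sum(axis=0) / (obs * v[:, None]).sum(axis=0)
        v = Xf.sum(axis=1) / (obs * m[None, :]).sum(axis=1)
    return v, m                                          # identified up to scale
```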
This study explores the relationship between alter centrality in various social domains and the perception of linguistic similarity within personal networks. Linguistic similarity perception is defined as the extent to which individuals perceive others to speak similarly to themselves. A survey of 126 college students and their social connections (n = 1035) from the French-speaking region of Switzerland was conducted. We applied logistic multilevel regressions to account for the hierarchical structure of dyadic ties. The results show that alters holding central positions in supportive networks are positively associated with perceived linguistic similarity, while those who are central in conflict networks show a negative association. The role of ambivalence yielded mixed results, with a positive and significant association emerging when ambivalence was linked to family members.
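As a sketch of the kind of multilevel logistic specification described (an ego-level random effect over dyadic ties), using statsmodels' variational Bayes mixed GLM; the file name and predictor names are hypothetical stand-ins for the centrality measures described above.

```python
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("dyads.csv")   # hypothetical: one row per ego-alter tie

# Outcome: whether the ego perceives the alter as speaking similarly;
# an ego-level variance component accounts for ties nested within egos.
model = BinomialBayesMixedGLM.from_formula(
    "perceived_similarity ~ support_centrality + conflict_centrality"
    " + ambivalent_family",
    vc_formulas={"ego": "0 + C(ego_id)"},
    data=df,
)
result = model.fit_vb()          # variational Bayes fit
print(result.summary())
```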
Digital twins are a new paradigm for our time, offering the possibility of interconnected virtual representations of the real world. The concept is very versatile and has been adopted by multiple communities of practice, policymakers, researchers, and innovators. A significant part of the digital twin paradigm is about interconnecting digital objects, many of which have previously not been combined. As a result, members of the newly forming digital twin community are often talking at cross-purposes, based on different starting points, assumptions, and cultural practices. These differences are due to the philosophical world-views adopted within specific communities. In this paper, we explore the philosophical context which underpins the digital twin concept. We offer the building blocks for a philosophical framework for digital twins, consisting of 21 principles that are intended to help facilitate their further development. Specifically, we argue that the philosophy of digital twins is fundamentally holistic and emergentist. We further argue that in order to enable emergent behaviors, digital twins should be designed to reconstruct the behavior of a physical twin by “dynamically assembling” multiple digital “components”. We also argue that digital twins naturally include aspects relating to the philosophy of artificial intelligence, including learning and exploitation of knowledge. We discuss the following four questions: (i) What is the distinction between a model and a digital twin? (ii) What previously unseen results can we expect from a digital twin? (iii) How can emergent behaviors be predicted? (iv) How can we assess the existence and uniqueness of digital twin outputs?
This article establishes a data-driven modeling framework for lean hydrogen ($ {\mathrm{H}}_2 $)-air reaction rates for the Large Eddy Simulation (LES) of turbulent reactive flows. This is particularly challenging since $ {\mathrm{H}}_2 $ molecules diffuse much faster than heat, leading to large variations in burning rates, thermodiffusive instabilities at the subfilter scale, and complex turbulence-chemistry interactions. Our data-driven approach leverages a Convolutional Neural Network (CNN), trained to approximate filtered burning rates from emulated LES data. First, five different lean premixed turbulent $ {\mathrm{H}}_2 $-air flame Direct Numerical Simulations (DNSs) are computed, each with a distinct global equivalence ratio. Second, DNS snapshots are filtered and downsampled to emulate LES data. Third, a CNN is trained to approximate the filtered burning rates as a function of LES scalar quantities: progress variable, local equivalence ratio, and flame thickening due to filtering. Finally, the performance of the CNN model is assessed on test solutions never seen during training. The model retrieves burning rates with very high accuracy. It is also tested on two filter and downsampling parameter settings and on two global equivalence ratios lying between those used during training. For these interpolation cases, the model approximates burning rates with low error even though they were not included in the training dataset. This a priori study shows that the proposed data-driven machine learning framework is able to address the challenge of modeling lean premixed $ {\mathrm{H}}_2 $-air burning rates. It paves the way for a new modeling paradigm for the simulation of carbon-free hydrogen combustion systems.
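A minimal PyTorch sketch of a CNN of this kind, mapping the three filtered scalar fields to a filtered burning-rate field; the architecture, sizes, and random data below are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class BurningRateCNN(nn.Module):
    """3D CNN from filtered LES scalars (progress variable, local
    equivalence ratio, flame thickening) to the filtered burning rate."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):            # x: (batch, 3, nx, ny, nz)
        return self.net(x)

model = BurningRateCNN()
fields = torch.randn(8, 3, 16, 16, 16)    # stand-in for emulated LES inputs
target = torch.randn(8, 1, 16, 16, 16)    # stand-in for filtered DNS burning rates
loss = nn.functional.mse_loss(model(fields), target)
loss.backward()                           # one training step would follow
```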