In this short note, we prove the following analog of the Kővári–Sós–Turán theorem for intersection graphs of boxes. If $G$ is the intersection graph of $n$ axis-parallel boxes in $\mathbb{R}^d$ such that $G$ contains no copy of $K_{t,t}$, then $G$ has at most $ctn(\log n)^{2d+3}$ edges, where $c = c(d) > 0$ depends only on $d$. Our proof is based on exploring connections between boxicity, separation dimension and poset dimension. Using this approach, we also show that a construction of Basit, Chernikov, Starchenko, Tao and Tran of $K_{2,2}$-free incidence graphs of points and rectangles in the plane can be used to disprove a conjecture of Alon, Basavaraju, Chandran, Mathew and Rajendraprasad. We show that there exist graphs of separation dimension 4 having a superlinear number of edges.
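To make the objects concrete, here is a small Python sketch that is not part of the paper's proof. It builds the intersection graph of closed axis-parallel boxes, using the fact that two such boxes meet exactly when their projections overlap in every coordinate. It then checks for a $K_{2,2}$ by looking for two vertices with at least two common neighbours. The representation of a box as a list of `(lo, hi)` intervals and all function names are illustrative assumptions.

```python
from itertools import combinations

def boxes_intersect(a, b):
    """Closed axis-parallel boxes meet iff their intervals overlap in every coordinate."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))

def intersection_graph(boxes):
    """Edge list of the intersection graph; vertices are box indices."""
    return [(i, j) for i, j in combinations(range(len(boxes)), 2)
            if boxes_intersect(boxes[i], boxes[j])]

def contains_k22(n, edges):
    """K_{2,2} is a subgraph iff two distinct vertices share at least two neighbours."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return any(len(adj[u] & adj[v]) >= 2 for u, v in combinations(range(n), 2))

# Two horizontal and two vertical strips in the plane: every horizontal strip
# crosses every vertical one, but strips of the same orientation are disjoint.
strips = [
    [(0, 10), (0, 1)],   # H1
    [(0, 10), (3, 4)],   # H2
    [(1, 2), (0, 10)],   # V1
    [(5, 6), (0, 10)],   # V2
]
edges = intersection_graph(strips)
print(edges)                             # [(0, 2), (0, 3), (1, 2), (1, 3)]
print(contains_k22(len(strips), edges))  # True
```

The crossing-strips example is the simplest way a $K_{2,2}$ arises among boxes. The brute-force check is quadratic in $n$ and is meant only for small experiments.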
During the first wave of the severe acute respiratory syndrome-coronavirus-2 epidemic in the Netherlands, notifications consisted mostly of patients with relatively severe disease. To enable real-time monitoring of the incidence of mild coronavirus disease 2019 (COVID-19) – for which medical consultation might not be required – the Infectieradar web-based syndromic surveillance system was launched in mid-March 2020. Our aim was to quantify associations between Infectieradar participant characteristics and the incidence of self-reported COVID-19-like illness. Participants for this cohort study were recruited via a web announcement. After registering, participants completed weekly questionnaires reporting the occurrence of a set of symptoms. The incidence rate of COVID-19-like illness was estimated, and multivariable Poisson regression was used to estimate the relative risks associated with sociodemographic variables, lifestyle factors and pre-existing medical conditions. Between 17 March and 24 May 2020, 25 663 active participants were identified; they reported 7060 episodes of COVID-19-like illness over 131 404 person-weeks of follow-up. The incidence rate declined over the analysis period, consistent with the decline in notified cases. Male sex, age 65+ years and higher education were associated with a significantly lower COVID-19-like illness incidence rate (adjusted rate ratios (RRs) of 0.80 (95% CI 0.76–0.84), 0.77 (0.70–0.85) and 0.84 (0.80–0.88), respectively), whereas the baseline characteristics ever-smoker, asthma, allergies, diabetes, chronic lung disease, cardiovascular disease and children in the household were associated with a higher incidence (RRs of 1.11 (1.04–1.19) to 1.69 (1.50–1.90)). Web-based syndromic surveillance has proven useful for monitoring the temporal trends in, and risk factors associated with, the incidence of mild disease.
Increased relative risks observed for several patient factors could reflect a combination of exposure risk, susceptibility to infection and propensity to report symptoms.
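As a worked illustration of the quantities in this abstract, the sketch below computes an incidence rate per person-week and a crude rate ratio with a Wald 95% CI on the log scale. The overall rate uses the abstract's own totals. The subgroup counts needed for a rate ratio are not given, so the function is exercised only on hypothetical numbers. The published RRs are adjusted estimates from a multivariable Poisson model, and a crude ratio like this one ignores every other covariate.

```python
import math

def incidence_rate(events, person_time):
    """Events per unit of person-time (here: per person-week)."""
    return events / person_time

def crude_rate_ratio(events_a, time_a, events_b, time_b, z=1.96):
    """Crude rate ratio of group a vs group b with a Wald CI on the log scale.

    Unadjusted: unlike the abstract's RRs, no other covariates are accounted for.
    """
    rr = incidence_rate(events_a, time_a) / incidence_rate(events_b, time_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)  # SE of log(RR) for Poisson counts
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Overall rate from the abstract: 7060 episodes over 131 404 person-weeks.
overall = incidence_rate(7060, 131_404)
print(f"{100 * overall:.2f} episodes per 100 person-weeks")  # 5.37 episodes per 100 person-weeks

# Hypothetical subgroup counts, for illustration only.
rr, lo, hi = crude_rate_ratio(400, 10_000, 500, 10_000)
```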
Hong Kong is an intermediate tuberculosis (TB) burden city in the Asia Pacific region, where case notifications have declined only slowly over the last decade. Using 24-locus mycobacterial interspersed repetitive unit–variable number of tandem repeat (MIRU-VNTR) genotyping, we examined 534 Mycobacterium tuberculosis isolates collected from culture-positive hospitalised TB patients in a geographic region of the city with a population of 1.7 million. Overall, 286 (75%) were classified as Beijing genotype, of which 216 (76%) and 59 (21%) belonged to the modern and ancient sub-lineages, respectively. Only two cases were genetically clustered, and no spatial clustering was observed. Male gender, permanent residency in Hong Kong and birth in Hong Kong or Mainland China were associated with the Beijing genotype. The high prevalence of the modern Beijing sub-lineage was similar to that in East Asia, reflecting the pattern resulting from population migration. The paucity of clustering suggests that reactivation accounted for most TB cases, which was echoed by the observation that half of the patients were aged 60 years or above and by the presence of co-morbid medical conditions. The predominance of reactivation TB in intermediate burden localities implies that the detection and control of latent TB infection will be the major challenge in achieving TB elimination.
Population-based seroprevalence studies of coronavirus disease 2019 (COVID-19) in low- and middle-income countries are lacking. We investigated the seroprevalence of severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2) antibodies in Sergipe state, Northeast Brazil, using a rapid IgM–IgG antibody test and a fluorescence immunoassay. The seroprevalence was 9.3% (95% CI 8.5–10.1): 10.2% (95% CI 9.2–11.3) for women and 7.9% (95% CI 6.8–9.1) for men (P = 0.004). We observed a decline in seroprevalence with age, but the differences were not statistically significant: 0–19 years (9.9%; 95% CI 7.8–12.5), 20–59 years (9.3%; 95% CI 8.4–10.3) and ≥60 years (9.0%; 95% CI 7.5–10.8) (P = 0.517). The metropolitan area had a higher seroprevalence (11.7%, 95% CI 10.3–13.2) than the municipalities outside it (8.0%, 95% CI 7.2–8.9) (P < 0.001). These findings highlight the importance of serosurveillance for estimating the real impact of the COVID-19 outbreak, providing data to better understand the spread of the virus as well as information to guide stay-at-home measures and other policies. In addition, these results may serve as baseline data for following the progress of the COVID-19 outbreak as social restriction initiatives start to be relaxed in Brazil.
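The abstract reports prevalences with 95% CIs but states neither the sample sizes nor the interval method. As one common choice, and as an assumption rather than the authors' documented method, here is the Wilson score interval for a binomial proportion. It does not include any survey weighting or correction for test sensitivity and specificity.

```python
import math

def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a binomial proportion such as seroprevalence."""
    p = positives / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical example: 50 positives out of 100 tested.
lo, hi = wilson_interval(50, 100)
print(f"{lo:.4f}–{hi:.4f}")  # 0.4038–0.5962
```

Unlike the simple Wald interval, the Wilson interval never extends below 0 or above 1, which matters for low prevalences in small subgroups.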
Data and data science offer tremendous potential to address some of our most intractable public problems (including the Covid-19 pandemic). At the same time, recent years have shown some of the risks of existing and emerging technologies. An updated framework is required to balance potential and risk, and to ensure that data is used responsibly. Data responsibility is not itself a new concept. However, amid a rapidly changing technology landscape, it has become increasingly clear that the concept may need updating to keep up with new trends such as big data, open data, the Internet of Things, artificial intelligence and machine learning. This paper seeks to outline 10 approaches and innovations for data responsibility in the 21st century. The 10 emerging concepts we have identified include:
End-to-end data responsibility
Decision provenance
Professionalizing data stewardship
From data science to question science
Contextual consent
Responsibility by design
Data asymmetries and data collaboratives
Personally identifiable inference
Group privacy
Data assemblies
Each of these is described at greater length in the paper, and illustrated with examples from around the world. Put together, they add up to a framework or outline for policy makers, scholars, and activists who seek to harness the potential of data to solve complex social problems and advance the public good. Needless to say, the 10 approaches outlined here represent just a start. We envision this paper more as an exercise in agenda-setting than a comprehensive survey.
In this paper, a new three-parameter discrete family of distributions, the $r{\mathcal B}ell$ family, is introduced. The family is based on the series expansion of the r-Bell polynomials. The proposed model generalises the classical Poisson distribution and the recently proposed Bell and Bell–Touchard distributions. It exhibits interesting stochastic properties. Its probabilities can be computed by a recursive formula, which allows the probability function of the aggregate claim amount in the collective risk model to be calculated in terms of an integral equation. Univariate and bivariate regression models are presented. The univariate model is used to explain the number of out-of-use claims in an automobile insurance portfolio and shows good out-of-sample performance. The bivariate model is used to describe the numbers of out-of-use and parking claims jointly. This family provides an alternative to other distributions traditionally used to describe count data, such as the negative binomial and Poisson-inverse Gaussian models.
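The abstract does not give the $r{\mathcal B}ell$ probability function, so the sketch below covers only the one-parameter Bell distribution that the family generalises. Its probability function is $P(Y=y) = \theta^y e^{1-e^\theta} B_y / y!$, where $B_y$ are the Bell numbers. Its mean is $\theta e^\theta$. The Bell numbers are generated exactly with the Bell triangle. The r-Bell polynomial structure of the new family is not reproduced here.

```python
import math

def bell_numbers(n_max):
    """Bell numbers B_0..B_n_max via the Bell triangle (exact integers)."""
    bells, row = [1], [1]
    for _ in range(n_max):
        new_row = [row[-1]]               # each row starts with the last entry of the previous row
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells

def bell_pmf(y, theta, bells):
    """P(Y = y) = theta^y * exp(1 - e^theta) * B_y / y! for the Bell distribution."""
    return theta**y * math.exp(1 - math.exp(theta)) * (bells[y] / math.factorial(y))

bells = bell_numbers(60)
print(bells[:6])  # [1, 1, 2, 5, 15, 52]
probs = [bell_pmf(y, 0.5, bells) for y in range(61)]
mean = sum(y * p for y, p in enumerate(probs))  # close to 0.5 * e^0.5
```

Truncating at $y = 60$ is harmless for small $\theta$ because the tail is negligible. Larger $\theta$ would need a longer support or log-space arithmetic.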
The recent surge of data-driven methods in social policy has created new opportunities to assess existing poverty programs. The expectation is that combining advanced methods with more data can measure the effectiveness of public interventions more accurately and allow local initiatives to be tailored accordingly. Specifically, nonmonetary indicators are increasingly being measured at micro levels in order to target social exclusion in combination with poverty. However, the multidimensional character of poverty, local context and data matching pose challenges to data-driven analyses. By linking Dutch household-level data with local, policy-initiative-specific data, we present an exploratory study of the uptake of a local poverty pass. The goal is to unravel pass usage in terms of household income, location and the age of users. We find that income and age play a role in whether the pass is used, and that usage differs by neighborhood. The paper thereby feeds into the discourse on how to operationalize and design data matching work in the multidimensional space of poverty and nonmonetary government initiatives.
Human papillomavirus (HPV) has been confirmed as the causative agent of cervical cancer. In this study, a total of 301 880 women were recruited from four different regions of Western China, and an exfoliated cervical cell sample was collected from each woman for DNA isolation and purification. HPV genotypes were determined by polymerase chain reaction. The overall HPV prevalence was 18.24%, and high-risk (HR), low-risk (LR) and mixed infections accounted for 79.14%, 12.56% and 8.30% of HPV infections, respectively. The four most common HR HPV subtypes were HPV-52, 16, 58 and 53, accounting for 20.49%, 19.93%, 14.54% and 10.01%, respectively. Among LR HPV genotypes, HPV-6 was the most common (28.17%), followed by HPV-81 (9.09%) and HPV-11 (3.78%). Subgroup analysis by HPV genotype also showed that single-type infection was the most common (77.26%) among HPV-positive individuals. Among multiple infections, double infection was the most common (76.04%). The overall prevalence of HPV in Western China is high, and its distribution shows different patterns across age groups and regions. The genotypes HPV-53 and HPV-6 were frequently detected in this population and merit significant clinical attention.
Decentralized coordination is one of the fundamental challenges for societies and organizations. While extensively explored from a variety of perspectives, one issue that has received limited attention is human coordination in the presence of adversarial agents. We study this problem by situating human subjects as nodes on a network, and endowing each with a role, either regular (with the goal of achieving consensus among all regular players), or adversarial (aiming to prevent consensus among regular players). We show that adversarial nodes are, indeed, quite successful in preventing consensus. However, we demonstrate that having the ability to communicate among network neighbors can considerably improve coordination success, as well as resilience to adversarial nodes. Our analysis of communication suggests that adversarial nodes attempt to exploit this capability for their ends, but do so in a somewhat limited way, perhaps to prevent regular nodes from recognizing their intent. In addition, we show that the presence of trusted nodes generally has limited value, but does help when many adversarial nodes are present, and players can communicate. Finally, we use experimental data to develop computational models of human behavior and explore additional parametric variations: features of network topologies and densities, and placement, all using the resulting data-driven agent-based (DDAB) model.
We investigate joint modelling of longevity trends using the spatial statistical framework of Gaussian process (GP) regression. Our analysis is motivated by the Human Mortality Database (HMD) that provides unified raw mortality tables for nearly 40 countries. Yet few stochastic models exist for handling more than two populations at a time. To bridge this gap, we leverage a spatial covariance framework from machine learning that treats populations as distinct levels of a factor covariate, explicitly capturing the cross-population dependence. The proposed multi-output GP models straightforwardly scale up to a dozen populations and moreover intrinsically generate coherent joint longevity scenarios. In our numerous case studies, we investigate predictive gains from aggregating mortality experience across nations and genders, including by borrowing the most recently available “foreign” data. We show that in our approach, information fusion leads to more precise (and statistically more credible) forecasts. We implement our models in R, as well as a Bayesian version in Stan that provides further uncertainty quantification regarding the estimated mortality covariance structure. All examples utilise public HMD datasets.
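The abstract says populations are treated as levels of a factor covariate but does not spell out the kernel. A common construction, assumed here, is a separable kernel $k((t,i),(t',j)) = B_{ij}\,k_{SE}(t,t')$. In it, $B$ is a positive semidefinite between-population matrix and $k_{SE}$ is a squared-exponential kernel in calendar year. This pure-Python sketch computes the zero-mean GP posterior mean with a hand-written Cholesky solve. The matrix $B$, the lengthscale and the noise level are fixed by hand and are hypothetical, whereas the paper estimates its covariance structure. No mean function is used.

```python
import math

def se_kernel(t1, t2, lengthscale, variance):
    """Squared-exponential covariance in calendar time."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def multi_output_kernel(a, b, B, lengthscale=5.0, variance=1.0):
    """k((t, i), (t', j)) = B[i][j] * k_SE(t, t'); i and j index populations."""
    (t1, i), (t2, j) = a, b
    return B[i][j] * se_kernel(t1, t2, lengthscale, variance)

def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(L[r][k] * L[c][k] for k in range(c))
            if r == c:
                L[r][c] = math.sqrt(K[r][r] - s)
            else:
                L[r][c] = (K[r][c] - s) / L[c][c]
    return L

def cholesky_solve(L, y):
    """Solve (L L^T) x = y by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for r in range(n):
        z[r] = (y[r] - sum(L[r][k] * z[k] for k in range(r))) / L[r][r]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (z[r] - sum(L[k][r] * x[k] for k in range(r + 1, n))) / L[r][r]
    return x

def gp_posterior_mean(X, y, X_new, B, noise=1e-4, **kernel_args):
    """Zero-mean GP posterior mean at X_new given observations y at inputs X."""
    K = [[multi_output_kernel(a, b, B, **kernel_args) + (noise if r == c else 0.0)
          for c, b in enumerate(X)] for r, a in enumerate(X)]
    alpha = cholesky_solve(cholesky(K), y)
    return [sum(multi_output_kernel(x, xi, B, **kernel_args) * ai for xi, ai in zip(X, alpha))
            for x in X_new]

B = [[1.0, 0.8], [0.8, 1.0]]  # hypothetical between-population correlation
# One centred log-mortality observation for population 0 in 2000; predict both populations.
print(gp_posterior_mean([(2000, 0)], [-0.2], [(2000, 0), (2000, 1)], B, noise=1e-6))
```

The prediction for population 1 comes out at roughly $-0.16$, which is 80% of population 0's signal. Information is borrowed across populations through $B$, and the same mechanism produces the coherent joint scenarios the abstract describes.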
Brucellosis remains one of the main zoonoses worldwide. Epidemiological data on human brucellosis in Spain are scarce. The objective of this study was to assess the epidemiological characteristics of inpatient brucellosis in Spain between 1997 and 2015. A retrospective longitudinal descriptive study was performed. Data were requested from the Health Information Institute of the Ministry of Health and Equality, which provided the Minimum Basic Data Set of patients admitted to the National Health System. We also obtained data published in the System of Obligatory Notifiable Diseases. A total of 5598 cases were registered. The period incidence rate was 0.67 (95% CI 0.65–0.68) cases per 100 000 person-years. The number of cases and the annual incidence rates decreased progressively. A total of 3187 cases (56.9%) came from urban areas. The group most at risk comprised men around the fifth decade of life. The mean (±s.d.) hospital stay was 12.6 (±13.1) days. The overall lethality rate of the cohort was 1.5%. The number of inpatients diagnosed with brucellosis decreased exponentially. The group of patients at highest risk of brucellosis in our study was males under 45 years of age and of urban origin. The lethality rate has fallen to minimal values. Hospital discharge records are probably a good database for the epidemiological analysis of the hospital management of brucellosis and may offer a better information collection system than the notifiable diseases system (EDO in Spanish).
Epidemic intelligence activities are undertaken by the WHO Regional Office for Africa to support member states in the early detection of and response to outbreaks, in order to prevent the international spread of diseases. We reviewed the epidemic intelligence activities conducted by the organisation from 2017 to 2020, the processes used, key results and how lessons learned can be used to strengthen preparedness, early detection and rapid response to outbreaks that may constitute a public health event of international concern. A total of 415 outbreaks were detected and notified to WHO, using both indicator-based and event-based surveillance. Media monitoring contributed to the initial detection of a quarter of all events reported. The most frequently detected outbreaks were vaccine-preventable diseases, followed by food- and water-borne diseases, vector-borne diseases and viral haemorrhagic fevers. Rapid risk assessments generated evidence and provided the basis for WHO to trigger operational processes to provide rapid support to member states in responding to outbreaks with a potential for international spread. This is crucial in assisting member states in meeting their obligations under the International Health Regulations (IHR) (2005). Member states in the region require scaled-up support, particularly in preventing recurrent outbreaks of infectious diseases and in enhancing their event-based surveillance capacities with automated tools and processes.
The aim of this study was to systematically assess the association of smoking and cardiovascular disease (CVD) with disease progression among cases of novel coronavirus pneumonia (coronavirus disease 2019, COVID-19). The PubMed and Cochrane Library databases were searched for epidemiological data on COVID-19 cases and for literature on CVD published from 1 January to 6 October 2020. Two researchers independently conducted literature screening, data collection and risk-of-bias assessment of the included studies. RevMan 5.2 software was used for the meta-analysis, and funnel plots were used to assess publication bias. In total, 21 studies comprising 7041 COVID-19 cases were included. The meta-analysis showed that 14.0% (984/7027) of cases had a history of smoking and 9.7% (675/6931) had underlying CVD. Cases with a history of smoking had a higher rate of COVID-19 disease progression than those who had never smoked (OR 1.53, 95% CI 1.29–1.81, P < 0.00001), whereas no significant association was found between smoking status and COVID-19 disease progression (OR 1.23, 95% CI 0.93–1.63, P = 0.15). In addition, a history of smoking was associated with higher odds of death (OR 1.91, 95% CI 1.35–2.69, P = 0.0002). Underlying CVD was associated with higher odds of severe disease (OR 2.87, 95% CI 2.29–3.61, P < 0.00001) and of death (OR 3.05, 95% CI 1.82–5.11, P < 0.0001) among COVID-19 cases. Current evidence indicates that smoking is strongly associated with COVID-19 disease progression and mortality, and intensive tobacco control is imperative. Moreover, cases with CVD have a significantly elevated risk of disease progression and death when they contract COVID-19. However, the association between COVID-19 and CVD, and the potential role of smoking in the development of both, still require verification in larger, higher-quality studies.
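To show how pooled odds ratios like those above are typically combined, here is a sketch of inverse-variance fixed-effect pooling on the log-odds scale. Each study's standard error is recovered from its reported 95% CI. RevMan offers this method alongside others, such as Mantel–Haenszel and random-effects models, and the abstract does not say which one was used, so this is an assumption. The input triple below reuses the abstract's pooled OR only as a stand-in "study".

```python
import math

def log_or_se_from_ci(odds_ratio, lo, hi, z=1.96):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    return math.log(odds_ratio), (math.log(hi) - math.log(lo)) / (2 * z)

def pool_fixed_effect(studies, z=1.96):
    """Inverse-variance fixed-effect pooling of (OR, lower, upper) triples."""
    total_w = total_wy = 0.0
    for odds_ratio, lo, hi in studies:
        y, se = log_or_se_from_ci(odds_ratio, lo, hi, z)
        w = 1.0 / se**2                      # weight = inverse variance of log(OR)
        total_w += w
        total_wy += w * y
    est, se_pooled = total_wy / total_w, math.sqrt(1.0 / total_w)
    return math.exp(est), math.exp(est - z * se_pooled), math.exp(est + z * se_pooled)

# Two identical hypothetical studies: same OR, CI narrower by a factor sqrt(2) on the log scale.
print(pool_fixed_effect([(1.53, 1.29, 1.81)] * 2))
```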
The Scenario Weights for Importance Measurement (SWIM) package implements a flexible sensitivity analysis framework, based primarily on results and tools developed by Pesenti et al. (2019). SWIM provides a stressed version of a stochastic model, subject to model components (random variables) fulfilling given probabilistic constraints (stresses). Stresses can be applied to moments, probabilities of given events, and risk measures such as Value-at-Risk and Expected Shortfall. SWIM operates on a single set of simulated scenarios from a stochastic model and returns scenario weights, which encode the required stress and allow the impact of the stress on all model components to be monitored. The scenario weights are calculated to minimise the relative entropy with respect to the baseline model, subject to the stress applied. As well as calculating scenario weights, the package provides tools for the analysis of stressed models, including plotting facilities and evaluation of sensitivity measures. SWIM does not require additional evaluations of the simulation model or explicit knowledge of its underlying statistical and functional relations; hence, it is suitable for the analysis of black box models. The capabilities of SWIM are demonstrated through a case study of a credit portfolio model.
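SWIM itself is an R package, and the Python sketch below neither uses nor mimics its API. It illustrates the idea for one specific case, assumed here for concreteness: a stress on the mean of a single component. For that constraint, minimising relative entropy gives exponentially tilted weights $w_i \propto e^{\theta x_i}$. The parameter $\theta$ is found by bisection so that the weighted mean hits the target. The weights are normalised to average 1 across scenarios. Reusing the same weights on another component shows the stress propagating through the model without any re-simulation.

```python
import math

def mean_stress_weights(x, target, tol=1e-12, max_iter=500):
    """Scenario weights w_i ∝ exp(theta * x_i) whose weighted mean of x equals target."""
    if not min(x) < target < max(x):
        raise ValueError("target must lie strictly between min(x) and max(x)")
    n = len(x)

    def tilt(theta):
        shift = max(theta * xi for xi in x)            # guards exp() against overflow
        e = [math.exp(theta * xi - shift) for xi in x]
        s = sum(e)
        return sum(ei * xi for ei, xi in zip(e, x)) / s, e, s

    lo, hi = -1.0, 1.0
    while tilt(lo)[0] > target:   # the tilted mean is increasing in theta
        lo *= 2
    while tilt(hi)[0] < target:
        hi *= 2
    for _ in range(max_iter):     # bisection on theta
        mid = 0.5 * (lo + hi)
        if tilt(mid)[0] < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    _, e, s = tilt(0.5 * (lo + hi))
    return [n * ei / s for ei in e]   # normalised so the weights average 1

x = [float(i) for i in range(1, 11)]   # simulated scenarios of component X (mean 5.5)
y = [xi**2 for xi in x]                # another model component, Y = X^2
w = mean_stress_weights(x, target=6.0)
stressed_mean_y = sum(wi * yi for wi, yi in zip(w, y)) / len(y)
```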
We propose a new neighbouring prediction model for mortality forecasting. For each mortality rate at age $x$ in year $t$, $m_{x,t}$, we construct an image of neighbourhood mortality data around $m_{x,t}$, that is, $\mathcal{E}m_{x,t}(x_1, x_2, s)$, which includes mortality information for ages in $[x-x_1, x+x_2]$, lagging $k$ years ($1 \le k \le s$). Combined with a deep learning model, the convolutional neural network, this framework is able to capture the intricate nonlinear structure in the mortality data: the neighbourhood effect, which can go beyond the period, age and cohort directions used in classic mortality models. By performing an extensive empirical analysis on all 41 countries and regions in the Human Mortality Database, we find that the proposed models achieve superior forecasting performance. This framework can be further enhanced to capture the patterns and interactions between multiple populations.
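The construction of the neighbourhood image can be sketched as follows, under stated assumptions. Mortality rates are stored as `m[age][year]` with integer indices. Rows of the image are the lags $k = 1, \dots, s$ and columns are the ages $x-x_1, \dots, x+x_2$. The target $m_{x,t}$ itself is excluded. The row and column orientation and the refusal to build images at the table's edges are choices made for this illustration; the abstract does not say how boundaries are handled, and the CNN that consumes the images is not shown.

```python
def neighbourhood_image(m, x, t, x1, x2, s):
    """Rows: lags k = 1..s (years t-1..t-s); columns: ages x-x1..x+x2.

    m is indexed as m[age][year]; the target m[x][t] itself is not included.
    """
    if x - x1 < 0 or x + x2 >= len(m) or t - s < 0 or t > len(m[0]):
        raise IndexError("neighbourhood extends beyond the available ages or years")
    ages = range(x - x1, x + x2 + 1)
    return [[m[a][t - k] for a in ages] for k in range(1, s + 1)]

# Toy table with m[age][year] = 10 * age + year, so entries are easy to trace.
m = [[10 * age + year for year in range(10)] for age in range(20)]
image = neighbourhood_image(m, x=5, t=6, x1=2, x2=1, s=3)
for row in image:
    print(row)
# [35, 45, 55, 65]
# [34, 44, 54, 64]
# [33, 43, 53, 63]
```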
We investigated the likelihood of vaccination and the reasons for and against accepting a coronavirus disease 2019 (COVID-19) vaccine among adult residents of Finland. Vaccine acceptance declined from 70% in April to 64% in December 2020. Complacency and worry about side effects were the main reasons against vaccination, while concern about severe disease was a strong motive for vaccination. Convenience of vaccination and recommendations by healthcare workers were identified as enablers of vaccination among those aged under 50 years. Understanding the barriers and enablers behind vaccine acceptance is crucial for the successful implementation of COVID-19 vaccination programmes, which will be key to ending the pandemic.
This article discusses the stochastic behavior and reliability properties of the inactivity times of failed components in coherent systems under double monitoring. A mixture representation of the reliability function is obtained for the inactivity times of failed components, and some stochastic comparison results are also established. Furthermore, some sufficient conditions are developed in terms of the aging properties of the inactivity times of failed components. Finally, some numerical examples are presented to illustrate the theoretical results.
A hooking network is built by stringing together components randomly chosen from a set of building blocks (graphs with hooks). The vertices are endowed with "affinities" that dictate the attachment mechanism. We study the distance from the master hook to a node in the network chosen according to its affinity after many steps of growth. This distance is commonly called the depth of the chosen node. We present an exact result for the average depth and a rather general central limit theorem for the depth. The affinity model covers a wide range of attachment mechanisms, such as uniform attachment and preferential attachment, among others. Naturally, the limiting normal distribution is parametrized by the structure of the building blocks and their probabilities. We also take the point of view of a visitor uninformed about the affinity mechanism by which the network is built. To explore the network, such a visitor chooses nodes uniformly at random. We show that the distance distribution under such a uniform choice is similar to the one under random choice according to affinities.
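A minimal simulation of the simplest special case, assumed here for illustration, makes the notion of depth concrete. The only building block is a single edge whose end is the hook, so each growth step attaches one new node to an existing node. That node is chosen with probability proportional to its affinity, written here as a hypothetical function of out-degree. With all affinities equal (uniform attachment), the result is a random recursive tree. For that tree, the expected depth of a uniformly chosen node among $n$ is the standard value $H_n - 1$, with $H_n$ the $n$-th harmonic number, so the simulation can be checked against it. General building blocks and the paper's central limit theorem are not covered.

```python
import random

def grow_hooking_tree(n_nodes, affinity=lambda outdeg: 1.0, rng=None):
    """Depths of all nodes after growing a single-edge-block hooking network.

    Node 0 is the master hook. Each step picks an existing node with probability
    proportional to affinity(out-degree) and hooks one new node to it.
    """
    rng = rng or random.Random()
    depth, outdeg = [0], [0]
    for _ in range(n_nodes - 1):
        weights = [affinity(d) for d in outdeg]
        parent = rng.choices(range(len(depth)), weights=weights)[0]
        outdeg[parent] += 1
        depth.append(depth[parent] + 1)
        outdeg.append(0)
    return depth

def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

rng = random.Random(2024)
n, trials = 50, 1000
# Average depth of a uniformly chosen node, averaged over independent trees.
avg_depth = sum(sum(grow_hooking_tree(n, rng=rng)) / n for _ in range(trials)) / trials
print(f"simulated {avg_depth:.3f} vs H_n - 1 = {harmonic(n) - 1:.3f}")
```

Passing for example `affinity=lambda d: d + 1` gives a preferential-attachment variant, for which the uniform-attachment formula above no longer applies.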
Understanding core statistical properties and data features in mortality data is fundamental to the development of machine learning methods for demographic and actuarial applications of mortality projection. The study of statistical features in such data forms the basis for classification, regression and forecasting tasks. In particular, understanding the key statistical structure in such data can help improve accuracy in mortality projection and forecasting when constructing life tables. The ability to forecast mortality accurately is critical for the study of demography, life insurance product design and pricing, pension planning and insurance-based decision risk management. Though many stylised facts of mortality data have been discussed in the literature, we provide evidence for a novel, as yet unexplored statistical feature that is pervasive in national-level mortality data. Specifically, we demonstrate, first, strong evidence for the existence of long memory features in mortality data and, second, that such long memory structures display multifractality, a statistical feature that can act as a discriminator of mortality dynamics by age, gender and country. To achieve this, we first outline how we represent the persistence of long memory from an estimator perspective. We make a natural link between a class of long memory features and an attribute of stochastic processes based on fractional Brownian motion. This allows us to use well-established estimators for the Hurst exponent to study the long memory features of mortality data robustly and accurately. We then introduce to mortality analysis the notion from data science known as multifractality. This allows us to study the long memory persistence features of mortality data on different timescales. We demonstrate its accuracy for sample sizes commensurate with national-level age term structure historical mortality records.
A series of synthetic studies, as well as a comprehensive analysis of real death count data, demonstrates the pervasiveness of long memory structures in mortality data: both mono-fractal and multifractal functional features are verified to be present as stylised facts of national-level mortality data for most countries and most age groups by gender. We conclude by demonstrating how such features can be used in kernel clustering and mortality model forecasting to improve these actuarial applications.
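The abstract does not name its Hurst-exponent estimators. As one well-established option, shown here as an assumption, this is first-order detrended fluctuation analysis (DFA) in pure Python. The slope of $\log F(n)$ against $\log n$ estimates a scaling exponent $\alpha$. That exponent equals $H$ for fractional Gaussian noise, so $\alpha \approx 0.5$ indicates no long memory and $\alpha > 0.5$ indicates persistence. The synthetic series length and the window sizes are illustrative. National mortality series are much shorter, and the multifractal generalisation (which replaces the mean of squared fluctuations with $q$-th order moments) is not shown.

```python
import math
import random
from itertools import accumulate

def _ols(xs, ys):
    """Least-squares slope plus the means of xs and ys."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx, mx, my

def dfa_exponent(series, scales):
    """DFA-1 scaling exponent: slope of log F(n) on log n."""
    if max(scales) > len(series):
        raise ValueError("every scale must fit at least once into the series")
    mean = sum(series) / len(series)
    profile = list(accumulate(v - mean for v in series))   # integrated profile
    log_n, log_f = [], []
    for n in scales:
        n_seg = len(profile) // n          # non-overlapping windows; the tail is dropped
        t = list(range(n))
        total = 0.0
        for s in range(n_seg):
            seg = profile[s * n:(s + 1) * n]
            b, mt, ms = _ols(t, seg)       # local linear trend in this window
            total += sum((v - ms - b * (i - mt)) ** 2 for i, v in zip(t, seg)) / n
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(total / n_seg))   # log of the RMS fluctuation F(n)
    return _ols(log_n, log_f)[0]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]   # no long memory: alpha near 0.5
walk = list(accumulate(noise))                       # random walk: alpha near 1.5
scales = [16, 32, 64, 128, 256]
alpha_noise = dfa_exponent(noise, scales)
alpha_walk = dfa_exponent(walk, scales)
```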