This paper demonstrates how the combustion of fossil fuels for transport purposes can have health implications. Based on an original case study [the Hubei province in China, the epicentre of the coronavirus disease-2019 (COVID-19) pandemic], we collected data on atmospheric pollutants (PM2.5, PM10 and CO2) and economic growth (GDP), along with daily series on COVID-19 indicators (cases, resuscitations and deaths). We then adopted an innovative machine learning approach, applying a new image neural network model to investigate the causal relationships among economic, atmospheric and COVID-19 indicators. The empirical findings show that changes in economic activity substantially affect the levels of PM2.5, PM10 and CO2, which, in turn, generate significant variations in the spread of the COVID-19 epidemic and its associated lethality. As a robustness check, an optimisation algorithm further corroborates these results.
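The abstract does not specify the model's architecture; as a minimal sketch of the kind of lagged-association analysis described (not the authors' image neural network model), one could relate daily pollutant levels to later COVID-19 case counts with a lagged regression. The 14-day lag, the series names and the toy data below are all assumptions for illustration.

```python
# Minimal sketch (NOT the paper's image neural network model): a lagged
# linear regression relating pollutant levels to later COVID-19 cases.
# The 14-day lag and all toy series are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_days = 200
pm25 = rng.gamma(shape=5.0, scale=10.0, size=n_days)             # toy daily PM2.5
cases = 50 + 0.8 * np.roll(pm25, 14) + rng.normal(0, 5, n_days)  # toy case counts

lag = 14
X = np.column_stack([np.ones(n_days - lag), pm25[:-lag]])        # intercept + lagged PM2.5
y = cases[lag:]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated effect of PM2.5 (lag {lag} days) on cases: {beta[1]:.2f}")
```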
The development of transformative technologies for mitigating our global environmental and technological challenges will require significant innovation in the design, development, and manufacturing of advanced materials and chemicals. To achieve this innovation faster than what is possible by traditional human intuition-guided scientific methods, we must transition to a materials informatics-centered paradigm, in which synergies between data science, materials science, and artificial intelligence are leveraged to enable transformative, data-driven discoveries faster than ever before through the use of predictive models and digital twins. While materials informatics is experiencing rapidly increasing use across the materials and chemicals industries, broad adoption is hindered by barriers such as skill gaps, cultural resistance, and data sparsity. We discuss the importance of materials informatics for accelerating technological innovation, describe current barriers and examples of good practices, and offer suggestions for how researchers, funding agencies, and educational institutions can help accelerate the adoption of urgently needed informatics-based toolsets for science in the 21st century.
The quality of service in healthcare is constantly challenged by outlier events such as pandemics (e.g., COVID-19) and natural disasters (such as hurricanes and earthquakes). In most cases, such events lead to critical uncertainties in decision-making, as well as in multiple medical and economic aspects at a hospital. External (geographic) or internal (medical and managerial) factors lead to shifts in planning and budgeting but, most importantly, reduce confidence in conventional processes. In some cases, support from other hospitals proves necessary, which further complicates planning. This paper presents three data-driven methods that provide indicators to help healthcare managers organize their economics and identify the optimal plan for resource allocation and sharing. Conventional decision-making methods fall short in recommending validated policies for managers. Using reinforcement learning, genetic algorithms, the traveling salesman problem, and clustering, we experimented with different healthcare variables and present tools and outcomes that could be applied at health institutes. Experiments were performed; the results are recorded, evaluated, and presented.
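As a hedged illustration of one technique named above, a genetic algorithm (not the authors' implementation) can evolve an allocation of a fixed stock of ventilators across hospitals to minimise unmet demand; the demands, stock size and GA settings below are invented.

```python
# Toy genetic algorithm allocating a fixed ventilator stock across
# hospitals to minimise unmet demand. Illustrative only; demands, stock
# size and GA settings are assumptions, not the paper's data.
import random

random.seed(1)
DEMAND = [30, 55, 20, 70, 45]      # ventilators needed per hospital (toy)
STOCK = 180                        # total ventilators available

def random_allocation():
    cuts = sorted(random.randint(0, STOCK) for _ in range(len(DEMAND) - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [STOCK])]

def unmet_demand(alloc):
    return sum(max(d - a, 0) for d, a in zip(DEMAND, alloc))

def crossover(p1, p2):
    child = [(a + b) // 2 for a, b in zip(p1, p2)]
    child[0] += STOCK - sum(child)             # repair so the total stock is preserved
    return child

def mutate(alloc):
    a = alloc[:]
    i, j = random.sample(range(len(a)), 2)
    if a[i] > 0:                               # move one unit between two hospitals
        a[i] -= 1
        a[j] += 1
    return a

pop = [random_allocation() for _ in range(40)]
for _ in range(200):
    pop.sort(key=unmet_demand)
    parents = pop[:10]                         # keep the 10 fittest allocations
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(30)]
best = min(pop, key=unmet_demand)
print("allocation:", best, "unmet demand:", unmet_demand(best))
```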
Contemporary data tools such as online dashboards have been instrumental in monitoring the spread of the COVID-19 pandemic. These real-time interactive platforms allow citizens to understand the local, regional, and global spread of COVID-19 in a consolidated and intuitive manner. Despite this, little research has examined how citizens respond to the data on these dashboards, both with respect to the pandemic itself and to data governance issues such as privacy. In this paper, we seek to answer the research question: how can governments use data tools, such as dashboards, to balance the trade-offs between safeguarding public health and protecting data privacy during a public health crisis? This study used surveys and semi-structured interviews to understand the perspectives of the developers and users of COVID-19 dashboards in Hong Kong. A typology was also developed to assess how Hong Kong’s dashboards navigated trade-offs between data disclosure and privacy at a time of crisis compared to dashboards in other jurisdictions. Results reveal that two key factors were present in the design and improvement of COVID-19 dashboards in Hong Kong: informed actions based on open COVID-19 case data, and significant public trust built on data transparency. Finally, this study argues that norms surrounding the reporting of COVID-19 cases, as well as of cases in future pandemics, should be co-constructed among citizens and governments so that policies founded on such norms can be acknowledged as salient, credible, and legitimate.
Attempts to formalize inspection and monitoring strategies in industry have struggled to combine evidence from multiple sources (including subject matter expertise) in a mathematically coherent way. The perceived requirement for large amounts of data is often cited as the reason that quantitative risk-based inspection is incompatible with the sparse and imperfect information typically available to structural integrity engineers. Current industrial guidance is also limited in its methods of distinguishing the quality of inspections, as this is typically based on simplified (qualitative) heuristics. In this paper, Bayesian multi-level (partial pooling) models are proposed as a flexible and transparent method of combining imperfect and incomplete information, to support decision-making regarding the integrity management of in-service structures. This work builds on the established theoretical framework for computing the expected value of information, by allowing for partial pooling between inspection measurements (or groups of measurements). The method is demonstrated for a simulated example of a structure with active corrosion in multiple locations, acknowledging that the data will be associated with some precision, bias, and reliability. Quantifying the extent to which an inspection of one location can reduce uncertainty in damage models at remote locations is shown to influence many aspects of the expected value of an inspection. These results are considered in the context of current challenges in risk-based structural integrity management.
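As a minimal sketch of the partial-pooling idea (an empirical-Bayes shrinkage toy, not the paper's full Bayesian multi-level model), per-location corrosion estimates can be shrunk toward the across-location mean in proportion to the relative within- and between-location variability; all measurement values and variances below are invented.

```python
# Toy partial pooling: shrink per-location corrosion estimates toward the
# overall mean. An empirical-Bayes sketch, not the paper's Bayesian model;
# measurement values and variances are invented for illustration.
import numpy as np

# wall-loss measurements (mm) at four inspected locations (toy data)
measurements = [np.array([1.2, 1.4, 1.1]),
                np.array([2.0, 2.3]),
                np.array([0.8, 0.9, 1.0, 0.7]),
                np.array([1.6])]

means = np.array([m.mean() for m in measurements])
ns = np.array([len(m) for m in measurements])
sigma2 = 0.05                 # assumed within-location measurement variance
tau2 = np.var(means)          # crude between-location variance estimate

# precision-weighted shrinkage: sparsely measured locations borrow more strength
weight = tau2 / (tau2 + sigma2 / ns)
pooled = weight * means + (1 - weight) * means.mean()
for i, (raw, post) in enumerate(zip(means, pooled)):
    print(f"location {i}: raw {raw:.2f} mm -> partially pooled {post:.2f} mm")
```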
This study describes the development of a pilot sentinel school absence syndromic surveillance system. Using data from a sample of schools in England, we demonstrate the capability of this system to monitor the impact of disease on absences among school-aged children, taking the coronavirus disease 2019 (COVID-19) pandemic period as an example. Data were obtained from an online app service used by schools and parents to report children absent, including the reasons/symptoms relating to each absence. For 2019 and 2020, data were aggregated into daily counts of ‘total’ and ‘cough’ absence reports. There was a large increase in the number of absence reports in March 2020 compared to March 2019, corresponding to the first wave of the COVID-19 pandemic in England. Absence numbers then fell rapidly and remained low from late March 2020 until August 2020, while lockdown was in place in England. Compared to 2019, there was a large increase in the number of absence reports in September 2020 when schools re-opened in England, although the peak number of absences was smaller than in March 2020. This information can help provide context around school absence levels associated with COVID-19. The system also has the potential for further development to monitor the impact of other conditions on school absence, e.g. gastrointestinal infections.
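A hedged sketch of the aggregation step described above (daily counts of ‘total’ and ‘cough’ absence reports); the column names and toy records are assumptions, not the actual app's data schema.

```python
# Sketch of aggregating absence reports into daily 'total' and 'cough'
# counts, as described above. Column names ('date', 'reason') and the toy
# records are assumptions, not the actual app's data schema.
import pandas as pd

reports = pd.DataFrame({
    "date": ["2020-03-02", "2020-03-02", "2020-03-03", "2020-03-03"],
    "reason": ["cough", "appointment", "cough", "fever"],
})
reports["date"] = pd.to_datetime(reports["date"])

daily = (reports.groupby("date")
         .agg(total=("reason", "size"),
              cough=("reason", lambda r: (r == "cough").sum())))
print(daily)
```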
Given the rapid reductions in human mortality observed over recent decades and the uncertainty associated with their future evolution, a large number of mortality projection models have been proposed by actuaries and demographers in recent years. Many of these, however, suffer from being overly complex, thereby producing spurious forecasts, particularly over long horizons and for small, noisy data sets. In this paper, we exploit statistical learning tools, namely group regularisation and cross-validation, to provide a robust framework for constructing discrete-time mortality models by automatically selecting the most appropriate functions to best describe and forecast particular data sets. Most importantly, this approach produces bespoke models via a trade-off between complexity (to draw as much insight as possible from limited data sets) and parsimony (to prevent over-fitting to noise), with the trade-off tailored to the forecasting horizon of interest. This is illustrated using both empirical data from the Human Mortality Database and simulated data, with code made available in the user-friendly open-source R package StMoMo.
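As a generic illustration of trading off complexity against parsimony with cross-validated regularisation (not the paper's StMoMo group-regularisation implementation, which is in R), toy log-mortality rates can be fit with a cross-validated lasso over a basis of age functions, letting unnecessary terms shrink to zero; all data below are simulated.

```python
# Generic illustration of regularisation + cross-validation for selecting
# model terms (NOT the paper's StMoMo/group-regularisation implementation).
# Toy log-mortality rates are fit with cross-validated lasso over a basis
# of age functions; unnecessary higher-order terms are driven to zero.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
ages = np.arange(40, 90)
log_mx = -9.0 + 0.09 * ages + rng.normal(0, 0.05, ages.size)  # toy Gompertz-like data

X = PolynomialFeatures(degree=5, include_bias=False).fit_transform(ages[:, None] / 100.0)
model = LassoCV(cv=5).fit(X, log_mx)
print("selected coefficients:", np.round(model.coef_, 3))  # higher orders shrink to 0
```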
This third edition of Braun and Murdoch's bestselling textbook now includes discussion of the use and design principles of the tidyverse packages in R, including expanded coverage of ggplot2 and R Markdown. The expanded simulation chapter introduces the Box–Muller and Metropolis–Hastings algorithms. New examples and exercises have been added throughout. This is the only introduction you'll need to start programming in R, the computing standard for analyzing data. The book comes with real R code that teaches the standards of the language. Unlike other introductory books on the R system, this book emphasizes portable programming skills that apply to most computing languages, as well as techniques used to develop more complex projects. Solutions, datasets, and any errata are available from www.statprogr.science. Worked examples from real applications, hundreds of exercises, and downloadable code, datasets, and solutions make a complete package for anyone working in or learning practical data science.
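As an illustration of the first algorithm mentioned above, here is a minimal Box–Muller transform (the standard textbook construction, not code from the book, which uses R): it maps pairs of uniform variates to independent standard normals.

```python
# Minimal Box-Muller transform: two independent Uniform(0,1) draws are
# mapped to two independent standard normal draws. A standard textbook
# construction, not code taken from the book (which uses R).
import math
import random

def box_muller():
    u1 = 1.0 - random.random()     # shift to (0, 1] so log(u1) is defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

random.seed(0)
samples = [z for _ in range(5000) for z in box_muller()]
mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")  # ~0 and ~1
```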
Estimating the coronavirus disease-2019 (COVID-19) infection fatality rate (IFR) has proven to be particularly challenging, and rather controversial, because both the data on deaths and the data on the number of individuals infected are subject to many different biases. We consider a Bayesian evidence synthesis approach which, while simple enough for researchers to understand and use, accounts for many important sources of uncertainty inherent in both the seroprevalence and mortality data. With the understanding that the results of one's evidence synthesis analysis may be largely driven by which studies are included and which are excluded, we conduct two separate parallel analyses based on two lists of eligible studies obtained from two different research teams. The results from both analyses are rather similar. With the first analysis, we estimate the COVID-19 IFR to be 0.31% [95% credible interval (CrI) of (0.16%, 0.53%)] for a typical community-dwelling population where 9% of the population is aged over 65 years and where the gross domestic product at purchasing power parity (GDP at PPP) per capita is $17.8k (the approximate worldwide average). With the second analysis, we obtain 0.32% [95% CrI of (0.19%, 0.47%)]. Our results suggest that, as one might expect, lower IFRs are associated with younger populations (and may also be associated with wealthier populations). For a typical community-dwelling population with the age and wealth of the United States, we obtain IFR estimates of 0.43% and 0.41%; with the age and wealth of the European Union, we obtain IFR estimates of 0.67% and 0.51%.
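As a simplified stand-in for the kind of evidence synthesis described (not the paper's model), one can combine study-level IFR estimates on the logit scale under a normal likelihood and read the posterior off a grid; the study estimates and standard errors below are invented.

```python
# Toy evidence synthesis on the logit-IFR scale: a grid approximation of
# the posterior (flat prior) for a common mean across studies, assuming a
# normal likelihood per study. A simplified stand-in for the paper's
# Bayesian model; study estimates and standard errors are invented.
import numpy as np

logit_ifr = np.array([-5.9, -5.6, -6.2, -5.7])   # study-level logit(IFR) (toy)
se = np.array([0.20, 0.25, 0.30, 0.15])          # study standard errors (toy)

grid = np.linspace(-7.5, -4.5, 2001)             # candidate mean logit(IFR) values
log_post = sum(-0.5 * ((g - grid) / s) ** 2 for g, s in zip(logit_ifr, se))
post = np.exp(log_post - log_post.max())
post /= post.sum()

mean_logit = np.sum(grid * post)
ifr = 1 / (1 + np.exp(-mean_logit))
print(f"posterior mean IFR (toy data): {100 * ifr:.2f}%")
```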
We show that for an $n\times n$ random symmetric matrix $A_n$, whose entries on and above the diagonal are independent copies of a sub-Gaussian random variable $\xi$ with mean 0 and variance 1,
$$\mathbb{P}[s_n(A_n) \le \epsilon/\sqrt{n}] \le C\epsilon^{1/8} + 2e^{-\Omega(n^{1/2})}.$$
This improves a result of Vershynin, who obtained such a bound with $n^{1/2}$ replaced by $n^{c}$ for a small constant $c$, and $1/8$ replaced by $(1/8) - \eta$ (with implicit constants also depending on $\eta > 0$). Furthermore, when $\xi$ is a Rademacher random variable, we prove that
$$\mathbb{P}[s_n(A_n) \le \epsilon/\sqrt{n}] \le C\epsilon^{1/8} + 2e^{-\Omega(n^{1/2}\log^{1/4}n)}.$$
The special case $\epsilon = 0$ improves a recent result of Campos, Mattos, Morris, and Morrison, which showed that $\mathbb{P}[s_n(A_n) = 0] \le O(\exp(-\Omega(n^{1/2})))$. Notably, in a departure from the previous two best bounds on the probability of singularity of symmetric matrices, which had relied on somewhat specialized and involved combinatorial techniques, our methods fall squarely within the broad geometric framework pioneered by Rudelson and Vershynin, and suggest the possibility of a principled geometric approach to the study of the singular spectrum of symmetric random matrices. The main innovations in our work are new notions of arithmetic structure – the Median Regularized Least Common Denominator (MRLCD) and the Median Threshold, which are natural refinements of the Regularized Least Common Denominator (RLCD) introduced by Vershynin, and should be more generally useful in contexts where one needs to combine anticoncentration information of different parts of a vector.
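As a quick empirical look at the quantity being bounded (an illustration, not a verification of the theorem), one can sample random symmetric Rademacher matrices and examine the smallest singular value rescaled by $\sqrt{n}$; the matrix size and trial count below are arbitrary choices.

```python
# Empirical look at the smallest singular value of random symmetric
# Rademacher matrices (an illustration of the quantity bounded above,
# not a verification of the theorem). Size and trial count are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 50
smallest = []
for _ in range(trials):
    upper = rng.choice([-1.0, 1.0], size=(n, n))
    A = np.triu(upper) + np.triu(upper, 1).T   # symmetrise: copy strict upper to lower
    smallest.append(np.linalg.svd(A, compute_uv=False)[-1])

vals = np.sqrt(n) * np.array(smallest)          # rescale s_n by sqrt(n)
print(f"median of sqrt(n) * s_n(A_n) over {trials} trials: {np.median(vals):.3f}")
```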
In non-life insurance, the payment history can be predictive of the timing of a settlement for individual claims. Ignoring the association between the payment process and the settlement process could bias the prediction of outstanding payments. To address this issue, we introduce into the literature of micro-level loss reserving a joint modeling framework that incorporates longitudinal payments of a claim into the intensity process of claim settlement. We discuss statistical inference and focus on the prediction aspects of the model. We demonstrate applications of the proposed model in the reserving practice with a detailed empirical analysis using data from a property insurance provider. The prediction results from an out-of-sample validation show that the joint model framework outperforms existing reserving models that ignore the payment–settlement association.
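A generic shared-random-effects specification of such a joint model (a common textbook form, not necessarily the paper's exact formulation) links the longitudinal payment process $Y_i(t)$ of claim $i$ to its settlement intensity $\lambda_i(t)$ through a latent claim-level effect $b_i$:
$$Y_i(t) = x_i(t)^\top \beta + b_i + \varepsilon_i(t), \qquad \lambda_i(t) = \lambda_0(t)\exp\{\gamma^\top z_i + \alpha\, b_i\},$$
where $\lambda_0(t)$ is a baseline settlement hazard and the association parameter $\alpha$ captures the payment–settlement dependence; setting $\alpha = 0$ recovers reserving models that ignore it.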
The World Health Organization African region recorded its first laboratory-confirmed coronavirus disease-2019 (COVID-19) cases on 25 February 2020. Two months later, all 47 countries of the region were affected. The first anniversary of the pandemic occurred in a changed context, with the emergence of new variants of concern (VOC) and growing COVID-19 fatigue. This study describes the epidemiological trajectory of COVID-19 in the region, summarises the public health and social measures (PHSM) implemented and discusses their impact on the pandemic trajectory. As of 24 February 2021, the African region accounted for 2.5% of cases and 2.9% of deaths reported globally. Of the 13 countries that submitted detailed line listings of cases, the proportion of cases with at least one comorbid condition was estimated at 3.3% of all cases. Hypertension, diabetes and human immunodeficiency virus (HIV) infection were the most common comorbid conditions, accounting for 11.1%, 7.1% and 5.0% of cases with comorbidities, respectively. Overall, the case fatality ratio (CFR) in patients with comorbid conditions was higher than in patients without: 5.5% vs. 1.0% (P < 0.0001). Countries started to implement lockdown measures in early March 2020. This contributed to slowing the spread of the pandemic at an early stage, while the gradual easing of lockdowns from 20 April 2020 resulted in an upsurge. The second wave of the pandemic, which started in November 2020, coincided with the emergence of the new variants of concern. Only 0.08% of the population, from six countries, had received at least one dose of a COVID-19 vaccine. It is critical not only to learn from the past 12 months to improve the effectiveness of the current response, but also to start preparing health systems for subsequent waves of the current pandemic and for future pandemics.
As of 03 January 2021, the WHO African region is the least affected by the coronavirus disease-2019 (COVID-19) pandemic, accounting for only 2.4% of cases and deaths reported globally. However, concerns abound about whether the numbers of cases and deaths reported from the region reflect the true burden of the disease, and about how monitoring the pandemic trajectory can inform response measures.
We retrospectively estimated four key epidemiological parameters (the total number of cases, the number of missed cases, the detection rate and the cumulative incidence) using the COVID-19 prevalence calculator tool developed by Resolve to Save Lives. We used cumulative cases and deaths reported during the period 25 February to 31 December 2020 for each WHO Member State in the region, as well as population data, to estimate the four parameters of interest. The estimated total number of cases in the 42 of the region's 47 countries included in this study was 13 947 631 [95% confidence interval (CI): 13 334 620–14 635 502], against 1 889 512 cases reported, representing an overall detection rate of 13.5% (range: 4.2% in Chad to 43.9% in Guinea). The cumulative incidence of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) infection was estimated at 1.38% (95% CI: 1.31%–1.44%), with South Africa the highest [14.5% (95% CI: 13.9%–15.2%)] and Mauritius the lowest [0.1% (95% CI: 0.099%–0.11%)]. The low detection rate found in most countries of the WHO African region suggests the need to strengthen SARS-CoV-2 testing capacities and to adjust testing strategies.
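The headline detection-rate arithmetic can be reproduced directly from the two figures quoted above (a simple check using only numbers given in the abstract):

```python
# Reproducing the headline detection rate from the figures quoted above:
# detection rate = reported cases / estimated total cases.
reported = 1_889_512
estimated_total = 13_947_631
detection_rate = reported / estimated_total
print(f"overall detection rate: {100 * detection_rate:.1f}%")  # -> 13.5%
```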
The effectiveness of screening travellers during times of international disease outbreak is contentious, especially as the reduction in the risk of disease importation can be very small. Border screening typically consists of travellers being thermally scanned for signs of fever and/or completing a survey declaring any possible symptoms prior to admission to their destination country; while more thorough testing exists, it would generally prove more disruptive to deploy. In this paper, we describe a simple Monte Carlo based model that incorporates the epidemiology of coronavirus disease-2019 (COVID-19) to investigate the potential decrease in the risk of disease importation that might be achieved by requiring travellers to undergo screening upon arrival during the current pandemic. This is a purely theoretical study investigating the maximum impact that might be attained by deploying a test or testing programme at the point of entry, through which we may assess such action in the real world as a method of decreasing the risk of importation. We therefore assume ideal conditions, such as 100% compliance among travellers and the use of a ‘perfect’ test. In addition to COVID-19, we also apply the presented model to simulated outbreaks of influenza, severe acute respiratory syndrome (SARS) and Ebola for comparison. Our model only considers screening implemented at airports, air travel being the predominant method of international travel. Primary results showed that in the best-case scenario, screening at the point of entry may detect a maximum of 8.8% of travellers infected with COVID-19, compared to 34.8%, 9.7% and 3.0% for travellers infected with influenza, SARS and Ebola, respectively. While the results indicate that screening is more effective at preventing disease ingress when the disease in question has a shorter average incubation period, they suggest that screening at the point of entry alone is not sufficient to adequately protect a nation from the importation of COVID-19 cases.
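A minimal Monte Carlo sketch in the spirit of the model described (assuming a ‘perfect’ symptom test at entry, a lognormal incubation period and uniformly distributed infection times; the parameters are placeholders, not the paper's fitted values):

```python
# Minimal Monte Carlo sketch of entry screening: a traveller is detected
# only if symptoms have begun by arrival. Assumes a 'perfect' test, a
# lognormal incubation period, and uniformly distributed infection times;
# all parameters are illustrative placeholders, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# incubation period in days (toy lognormal, median ~5.1 days)
incubation = rng.lognormal(mean=np.log(5.1), sigma=0.45, size=n)
# time from infection to arrival, uniform over a 14-day travel window (toy)
time_to_arrival = rng.uniform(0, 14, size=n)

detected = time_to_arrival >= incubation     # symptomatic on or before arrival
print(f"fraction of infected travellers detected: {detected.mean():.1%}")
```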
The SARS-CoV-2 virus is rapidly evolving via mutagenesis, lengthening the pandemic and threatening public health. Until August 2021, 12 variants of SARS-CoV-2, designated variants of concern (VOC; Alpha to Delta) or variants of interest (VOI; Epsilon to Mu), with significant impact on transmissibility, morbidity, possible reinfection and mortality, had been identified. The VOC Delta (B.1.617.2), of Indian origin, is now the dominant and most contagious variant worldwide, as it binds strongly to the human ACE2 receptor, increases transmissibility and manifests considerable immune escape after natural infection or vaccination. Although the development and administration of SARS-CoV-2 vaccines, based on different technologies (mRNA, adenovirus carrier, recombinant protein, etc.), are very promising for the control of the pandemic, their effectiveness and neutralizing activity against VOCs vary significantly. In this review, we describe the most significant circulating variants of SARS-CoV-2 and the known effectiveness of currently available vaccines against them.
The use of a Kaplan–Meier (K–M) survival time approach is generally considered appropriate for reporting antimalarial efficacy trials. However, when a treatment arm has 100% efficacy, confidence intervals cannot be computed. Furthermore, methods that use probability rules to handle missing data, for instance multiple imputation, encounter the perfect prediction problem when a treatment arm has full efficacy, in which case all imputed values are treatment successes or all imputed values are failures. The use of a survival K–M method addresses this imputation problem in estimating the efficacy rates, also referred to as cure rates. We discuss the statistical challenges and propose a potential way forward.
The proposed approach uses K–M estimates as the main measure of efficacy. Confidence intervals can be computed using the binomial exact method, and p-values for comparing the difference in efficacy between treatments can be estimated using Fisher’s exact test. We emphasize that when efficacy rates are not 100% in both groups, the K–M approach remains the main strategy of analysis, given its statistical robustness in handling missing data, and confidence intervals can be computed under such scenarios.
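A hedged sketch of this toolkit using standard libraries (lifelines for the K–M estimate, statsmodels' Clopper–Pearson interval for the binomial exact CI, and SciPy's Fisher's exact test); all trial counts below are invented.

```python
# Sketch of the proposed analysis toolkit: Kaplan-Meier efficacy, a
# binomial exact (Clopper-Pearson) confidence interval, and Fisher's
# exact test for comparing arms. All trial counts are invented.
import numpy as np
from lifelines import KaplanMeierFitter
from scipy.stats import fisher_exact
from statsmodels.stats.proportion import proportion_confint

# follow-up day and failure indicator (1 = treatment failure) per patient (toy)
days = np.array([28, 28, 28, 14, 28, 21, 28, 28])
failed = np.array([0, 0, 0, 1, 0, 1, 0, 0])

km = KaplanMeierFitter().fit(days, event_observed=failed)
print("K-M success probability at day 28:",
      float(km.survival_function_at_times(28).iloc[0]))

# binomial exact (Clopper-Pearson) CI for 0 failures out of 40 (toy arm)
lo, hi = proportion_confint(count=0, nobs=40, alpha=0.05, method="beta")
print(f"exact 95% CI for the failure rate: ({lo:.3f}, {hi:.3f})")

# Fisher's exact test comparing failures between two arms (toy 2x2 table)
_, p = fisher_exact([[0, 40], [3, 37]])
print(f"Fisher's exact p-value: {p:.3f}")
```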
Ex post moral hazard arises when the insured has an unobservable influence on the size of a loss after its occurrence. In automobile (property) insurance, ex post moral hazard can manifest as an increase in the scope of the repairs and/or the value of the repairs. Both vehicle owners and auto repairers could gain from increasing the scope of repairs, while auto repairers would gain from an increase in the value of repairs. An analysis of 994 Australian road traffic crashes found that ex post moral hazard increased the value of repairs by 46.8 per cent, of which 9 percentage points were explained by an increase in the scope of the repairs, defined as an increase from 2 to 2.4 parts per auto repair.
In this paper, we present a method for generating a copula by composing two arbitrary n-dimensional copulas via a vector of bivariate functions, where the resulting copula is termed the multivariate composite copula. A necessary and sufficient condition on the vector guaranteeing the composite function to be a copula is given, and a general approach to constructing vectors satisfying this condition via bivariate copulas is provided. The multivariate composite copula offers a new framework for the construction of flexible multivariate copulas from existing ones, and it includes some known classes of copulas. It is shown that the multivariate composite copula has a clear probabilistic structure and that it satisfies uniform convergence as well as the reproduction property for its component copulas. Some further properties of multivariate composite copulas are discussed. Finally, numerical illustrations and an empirical example on financial data are provided to show the advantages of the multivariate composite copula, especially in capturing tail dependence.
This study endeavoured to further our understanding of the molecular epidemiology of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by sequencing and analysing the first full-length genome sequences obtained from 48 coronavirus disease-2019 (COVID-19) patients in five districts of Western Serbia in the period April 2020–July 2020. SARS-CoV-2 sequences from Western Serbia differed from the Wuhan sequence at a total of 128 SNPs. The phylogenetic structure of the local SARS-CoV-2 isolates suggested the existence of at least four distinct groups of SARS-CoV-2 strains in Western Serbia. The first group is most similar to a strain from Italy; these isolates included two 20A sequences and 15–30 20B sequences displaying a newly occurring set of four conjoined mutations. The second group is most similar to a strain from France, carrying two mutations and belonging to the 20A clade. The third group is most similar to a strain from Switzerland, carrying four co-occurring mutations and belonging to the 20B clade. The fourth group is most similar to another strain from France, displaying one mutation that gave rise to a single local isolate belonging to the 20A clade.