Whole-genome sequencing (WGS) has shown tremendous potential in rapid diagnosis of drug-resistant tuberculosis (TB). In the current study, we performed WGS on drug-resistant Mycobacterium tuberculosis isolates obtained from Shanghai (n = 137) and Russia (n = 78). We aimed to characterise the underlying and high-frequency novel drug-resistance-conferring mutations, and also to create valuable combinations of resistance mutations with high predictive sensitivity for the multidrug- and extensively drug-resistant tuberculosis (MDR/XDR-TB) phenotypes using a bootstrap method. Most strains belonged to the L2.2, L4.2, L4.4, L4.5 and L4.8 lineages. We found that WGS could predict 82.07% of phenotypically drug-resistant domestic strains. The prediction sensitivity for rifampicin (RIF), isoniazid (INH), ethambutol (EMB), streptomycin (STR), ofloxacin (OFL), amikacin (AMK) and capreomycin (CAP) was 79.71%, 86.30%, 76.47%, 88.37%, 83.33%, 70.00% and 70.00%, respectively. The mutation combination with the highest sensitivity for MDR prediction was rpoB S450L + rpoB H445A/P + katG S315T + inhA I21T + inhA S94A, with a sensitivity of 92.17% (0.8615, 0.9646), and the mutation combination with the highest sensitivity for XDR prediction was rpoB S450L + katG S315T + gyrA D94G + rrs A1401G, with a sensitivity of 92.86% (0.8158, 0.9796). The molecular information presented here will be of particular value for the rapid clinical detection of MDR- and XDR-TB isolates through laboratory diagnosis.
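The abstract reports sensitivities with intervals obtained by "a bootstrap method" but does not say which variant was used. As a minimal sketch, assuming a simple percentile bootstrap over isolates, the interval for a sensitivity estimate could be computed as below. The function name `bootstrap_sensitivity`, the resampling settings, and the counts in `calls` are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import random


def bootstrap_sensitivity(detected, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for sensitivity.

    detected: list of 0/1 flags, one per phenotypically resistant isolate
              (1 = the mutation combination flags the isolate as resistant).
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(detected)
    point = sum(detected) / n

    # Resample isolates with replacement and recompute sensitivity each time.
    replicates = []
    for _ in range(n_boot):
        resample = [detected[rng.randrange(n)] for _ in range(n)]
        replicates.append(sum(resample) / n)
    replicates.sort()

    # Percentile interval: take the alpha/2 and 1 - alpha/2 quantiles.
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Hypothetical example: 106 of 115 resistant isolates carry a listed mutation.
calls = [1] * 106 + [0] * 9
point, lower, upper = bootstrap_sensitivity(calls)
print(f"sensitivity = {point:.4f}, 95% interval = ({lower:.4f}, {upper:.4f})")
```

Other bootstrap variants (for example bias-corrected intervals) would give somewhat different bounds, so this should be read as one plausible reading of the method rather than a reproduction of it.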
Serosurveillance is an important epidemiologic tool for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), used to estimate infection rates and the degree of population immunity. There is no general agreement on which antibody biomarker(s) should be used, especially with the rollout of vaccines globally. Here, we used random forest models to demonstrate that a single spike or receptor-binding domain (RBD) antibody was adequate for classifying prior infection, while a combination of two antibody biomarkers performed better than any single marker for estimating time-since-infection. Nucleocapsid antibodies performed worse than spike or RBD antibodies for classification, but can be useful for estimating time-since-infection and for distinguishing infection-induced from vaccine-induced responses. Our analysis has the potential to inform the design of serosurveys for SARS-CoV-2, including decisions regarding the number of antibody biomarkers measured.
This paper addresses the task of modeling severity losses using segmentation when the data distribution does not fall into the usual regression frameworks. This situation is not uncommon in lines of business such as third-party liability insurance, where heavy-tails and multimodality often hamper a direct statistical analysis. We propose to use regression models based on phase-type distributions, regressing on their underlying inhomogeneous Markov intensity and using an extension of the expectation–maximization algorithm. These models are interpretable and tractable in terms of multistate processes and generalize the proportional hazards specification when the dimension of the state space is larger than 1. We show that the combination of matrix parameters, inhomogeneity transforms, and covariate information provides flexible regression models that effectively capture the entire distribution of loss severities.
The fact that a large proportion of insurance policyholders make no claims during a one-year period highlights the importance of zero-inflated count models when analyzing the frequency of insurance claims. There is a vast literature focused on the univariate case of zero-inflated count models, while work in the area of multivariate models is considerably less advanced. Given that insurance companies write multiple lines of insurance business, where the claim counts on these lines of business are often correlated, there is a strong incentive to analyze multivariate claim count models. Motivated by the idea of Liu and Tian (Computational Statistics and Data Analysis, 83, 200–222; 2015), we develop a multivariate zero-inflated hurdle model to describe multivariate count data with extra zeros. This generalization offers more flexibility in modeling the behavior of individual claim counts while also incorporating a correlation structure between claim counts for different lines of insurance business. We develop an application of the expectation–maximization (EM) algorithm to enable the statistical inference necessary to estimate the parameters associated with our model. Our model is then applied to an automobile insurance portfolio from a major insurance company in Spain. We demonstrate that the model performance for the multivariate zero-inflated hurdle model is superior when compared to several alternatives.
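For readers unfamiliar with hurdle models, the sketch below shows the standard univariate hurdle Poisson probability mass function: a point mass at zero plus a zero-truncated Poisson for positive counts. This is a deliberately simplified illustration and not the multivariate zero-inflated hurdle model developed in the paper; the function name `hurdle_poisson_pmf` and the parameter values are assumptions made for the example.

```python
import math


def hurdle_poisson_pmf(y, p_zero, lam):
    """P(Y = y) under a univariate hurdle Poisson model.

    p_zero: probability of observing zero claims (the "hurdle").
    lam:    rate of the Poisson distribution that, truncated at zero,
            governs the positive counts.
    """
    if y < 0:
        raise ValueError("count must be non-negative")
    if y == 0:
        return p_zero
    # Zero-truncated Poisson: renormalise by the probability of a positive count.
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return (1.0 - p_zero) * poisson / (1.0 - math.exp(-lam))


# Hypothetical parameters: 90% of policyholders file no claim in a year.
probs = [hurdle_poisson_pmf(y, p_zero=0.9, lam=1.5) for y in range(50)]
total = sum(probs)  # should be numerically close to 1
```

Because the zero probability is modelled separately from the positive counts, the share of zeros can be set freely without distorting the distribution of non-zero claim counts.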
This study investigated the characteristics of transmission routes of COVID-19 cluster infections (⩾10 linked cases within a short period) in Gangwon Province between 22 February 2020 and 31 May 2021. Transmission routes were divided into five major categories and 35 sub-categories according to the relationship between the infector and the infectee and the location of transmission. A total of 61 clusters occurred during the study period, including 1741 confirmed cases (55.7% of all confirmed cases (n = 3125)). The five major routes of transmission were as follows: ‘using (staying in) the same facility’ (50.7%), ‘cohabiting family members’ (23.3%), ‘social gatherings with acquaintances’ (10.8%), ‘other transmission routes’ (7.0%), and ‘social gatherings with non-cohabiting family members/relatives’ (5.5%). For transmission caused by using (staying in) the same facility, the highest number of confirmed cases was associated with churches, followed by medical institutions (inpatient), sports facilities, military bases, offices, nightlife businesses, schools, restaurants, day-care centres and kindergartens, and service businesses. Our analysis highlights specific locations with frequent transmission of infections, and transmission routes that should be targeted in situations where adherence to disease control rules is difficult.
Since the start of the coronavirus disease-2019 (COVID-19) pandemic, there has been interest in using wastewater monitoring as an approach for disease surveillance. A significant uncertainty, whose resolution would improve the interpretation of wastewater monitoring data, is the intensity and timing with which individuals shed RNA from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) into wastewater. By combining wastewater and case surveillance data sets from a university campus during a period of heightened surveillance, we inferred that individual shedding of RNA into wastewater peaks on average 6 days (50% uncertainty interval (UI): 6–7; 95% UI: 4–8) following infection, and that wastewater measurements are highly overdispersed [negative binomial dispersion parameter, k = 0.39 (95% credible interval: 0.32–0.48)]. This limits the utility of wastewater surveillance as a leading indicator of secular trends in SARS-CoV-2 transmission during an epidemic, and implies that it could be most useful as an early warning of rising transmission in areas where transmission is low or clinical testing is delayed or of limited capacity.
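To give a sense of what a dispersion parameter of k = 0.39 implies, the sketch below compares negative binomial and Poisson variances at the same mean. It assumes the common mean-dispersion parameterisation, in which the variance is mean + mean²/k; the abstract does not state which parameterisation the authors used, and the function name and the example mean of 100 are hypothetical.

```python
def negbin_variance(mean, k):
    """Variance of a negative binomial with the given mean and dispersion k.

    Assumes the mean-dispersion parameterisation: Var = mean + mean**2 / k.
    Smaller k means stronger overdispersion; as k grows large the variance
    approaches the Poisson variance (equal to the mean).
    """
    if mean < 0 or k <= 0:
        raise ValueError("mean must be non-negative and k must be positive")
    return mean + mean ** 2 / k


example_mean = 100.0  # hypothetical expected measurement
variance = negbin_variance(example_mean, k=0.39)
ratio_to_poisson = variance / example_mean  # equals 1 + mean / k
print(f"variance = {variance:.1f}, {ratio_to_poisson:.1f}x the Poisson variance")
```

Under this assumption the variance exceeds the Poisson variance by a factor of 1 + mean/k, which is why individual wastewater measurements are noisy guides to the underlying trend.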
It is unclear if – after symptom onset of a primary case of coronavirus disease-2019 (COVID-19) in a household – ensuing chains of transmission among household members occur and if household epidemiology of COVID-19 is modified by the different circulating variants. We analysed data from 52 774 household clusters to investigate the day of symptom onset of ensuing cases in households relative to the symptom onset of the primary case within the household. Irrespective of cluster size or age of the primary case, 95% of all secondary household cases had symptom onset within 14 days after the symptom onset of the primary case. Stratification by variant showed that the mean interval from symptom onset of the primary case to the symptom onset of secondary cases decreased significantly from 4.8 days (wild type) to 4.5 days (alpha) and 4.0 days (delta). Similarly, 95% of secondary cases occurred within 14 days (wild type), 12 days (alpha) and 10 days (delta). Our findings suggest that during dominant delta circulation – apart from rare individual constellations – a 10-day household quarantine after symptom onset of the primary case is sufficient for household contacts who remain COVID-free.
Rabies, a fatal and vaccine-preventable disease, is endemic throughout Africa. In 2016, a rabies outbreak occurred in black-backed jackals (Canis mesomelas) along the western boundary of Gauteng Province, South Africa. We investigated the possible drivers of the 2016 outbreak and established its origin. Using spatio-temporal locations of cases, we applied logistic regression and Geographic Information System techniques to investigate environmental covariates driving occurrences of emerging rabies cases in Gauteng Province. About 53.8% of laboratory-confirmed lyssaviruses in Gauteng Province in 2016 originated from jackals. Phylogenetic trees reconstructed from a partial region of the glycoprotein gene of these and historical rabies viruses (RABVs) demonstrated the lyssaviruses to be of canid origin with 97.7% nucleotide sequence similarity. The major cluster comprised jackal RABVs from the 2012 KwaZulu-Natal outbreak and the 2016 outbreak in Gauteng Province. The second cluster was composed of both jackal and dog RABVs. Both clusters correlated with independent RABV introductions into Gauteng by dogs and jackals, respectively. This study demonstrated an expansion of a jackal rabies cycle from North West Province into Gauteng Province during the 2016 dry period, as jackals ranged widely in search of food resources, leading to increased jackal-dog interactions, reminiscent of the intricate links of domestic and wildlife rabies cycles in South Africa.
This essential reference for students and scholars in the input-output research and applications community has been fully revised and updated to reflect important developments in the field. Expanded coverage includes construction and application of multiregional and interregional models, including international models and their application to global economic issues such as climate change and international trade; structural decomposition and path analysis; linkages and key sector identification and hypothetical extraction analysis; the connection of national income and product accounts to input-output accounts; supply and use tables for commodity-by-industry accounting and models; social accounting matrices; non-survey estimation techniques; and energy and environmental applications. Input-Output Analysis is an ideal introduction to the subject for advanced undergraduate and graduate students in many scholarly fields, including economics, regional science, regional economics, city, regional and urban planning, environmental planning, public policy analysis and public management.
We determine the asymptotics of the number of independent sets of size $\lfloor \beta 2^{d-1} \rfloor$ in the discrete hypercube $Q_d = \{0,1\}^d$ for any fixed $\beta \in (0,1)$ as $d \to \infty$, extending a result of Galvin for $\beta \in (1-1/\sqrt{2},1)$. Moreover, we prove a multivariate local central limit theorem for structural features of independent sets in $Q_d$ drawn according to the hard-core model at any fixed fugacity $\lambda>0$. In proving these results we develop several general tools for performing combinatorial enumeration using polymer models and the cluster expansion from statistical physics along with local central limit theorems.
In this article, we focus on data trust and data privacy, and how attitudes may be changing during the COVID-19 period. On balance, it appears that Australians are more trusting of organizations with regard to data privacy and less concerned about their own personal information and data than they were prior to the spread of COVID-19. The major determinant of this change in trust with regard to data was changes in general confidence in government institutions. Despite this improvement in trust with regard to data privacy, trust levels are still low.
Large-scale coordinated efforts have been dedicated to understanding the global health and economic implications of the COVID-19 pandemic. Yet, the rapid spread of discrimination and xenophobia against specific populations has largely been neglected. Understanding public attitudes toward migration is essential to counter discrimination against immigrants and promote social cohesion. Traditional data sources to monitor public opinion are often limited, notably due to slow collection and release activities. New forms of data, particularly from social media, can help overcome these limitations. While some bias exists, social media data are produced at an unprecedented temporal frequency and geographical granularity, are collected globally, and are accessible in real time. Drawing on a data set of 30.39 million tweets and natural language processing, this article aims to measure shifts in public sentiment about migration during early stages of the COVID-19 pandemic in Germany, Italy, Spain, the United Kingdom, and the United States. Results show an increase of migration-related tweets along with COVID-19 cases during national lockdowns in all five countries. Yet, we found no evidence of a significant increase in anti-immigration sentiment, as rises in the volume of negative messages are offset by comparable increases in positive messages. Additionally, we present evidence of growing social polarization concerning migration, showing high concentrations of strongly positive and strongly negative sentiments.
We estimate the density and its derivatives using a local polynomial approximation to the logarithm of an unknown density function f. The estimator is guaranteed to be non-negative and achieves the same optimal rate of convergence in the interior as on the boundary of the support of f. The estimator is therefore well-suited to applications in which non-negative density estimates are required, such as in semiparametric maximum likelihood estimation. In addition, we show that our estimator compares favorably with other kernel-based methods, both in terms of asymptotic performance and computational ease. Simulation results confirm that our method can perform similarly or better in finite samples compared to these alternative methods when they are used with optimal inputs, that is, an Epanechnikov kernel and optimally chosen bandwidth sequence. We provide code in several languages.
Novel navigation applications provide a driving behavior score for each finished trip to promote safe driving, which is mainly based on experts’ domain knowledge. In this paper, with automobile insurance claims data and associated telematics car driving data, we propose a supervised driving risk scoring neural network model. This one-dimensional convolutional neural network takes time series of individual car driving trips as input and returns a risk score in the unit range of (0,1). By incorporating the credibility average risk score of each driver, the classical Poisson generalized linear model for automobile insurance claims frequency prediction can be improved significantly. Hence, compared with non-telematics-based insurers, telematics-based insurers can discover more heterogeneity in their portfolio and attract safer drivers with premium discounts.
This paper highlights a tension between semiparametric efficiency and bootstrap consistency in the context of a canonical semiparametric estimation problem, namely the problem of estimating the average density. It is shown that although simple plug-in estimators suffer from bias problems preventing them from achieving semiparametric efficiency under minimal smoothness conditions, the nonparametric bootstrap automatically corrects for this bias and that, as a result, these seemingly inferior estimators achieve bootstrap consistency under minimal smoothness conditions. In contrast, several “debiased” estimators that achieve semiparametric efficiency under minimal smoothness conditions do not achieve bootstrap consistency under those same conditions.
Digital identity (eID) systems are a crucial piece in the digital services ecosystem. They connect individuals to a variety of socioeconomic opportunities but can also reinforce power asymmetries between organizations and individuals. Data collection practices can negatively impact an individual’s right to privacy, autonomy, and self-determination. Protecting individual rights, however, may be at odds with imperatives of profit maximization or national security. The use of eID technologies is hence highly contested. Current approaches to governing eID systems have been unable to fully address the trade-offs between the opportunities and risks associated with these systems. The responsible innovation (RI) literature provides a set of principles to govern disruptive innovations, such as eID systems, toward societally desirable outcomes. This article uses RI principles to develop a framework to govern eID systems in a more inclusive, responsible, and user-centered manner. The proposed framework seeks to complement existing practices for eID system governance by bringing forth principles of deliberation and democratic engagement to build trust amongst stakeholders of the eID system and deliver shared socioeconomic benefits.