Let G be a graph of minimum degree at least k and let $G_p$ be the random subgraph of G obtained by keeping each edge independently with probability p. We are interested in the size of the largest complete minor that $G_p$ contains when p = (1 + ε)/k with ε > 0. We show that with high probability $G_p$ contains a complete minor of order $\tilde{\Omega}(\sqrt{k})$, where the ~ hides a polylogarithmic factor. Furthermore, in the case where the order of G is also bounded above by a constant multiple of k, we show that this polylogarithmic term can be removed, giving a tight bound.
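As an illustration of the random subgraph described in this abstract, the following minimal Python sketch keeps each edge of a host graph independently with probability p = (1 + ε)/k. The host graph (a complete graph on k + 1 vertices, which has minimum degree exactly k), the values of k and ε, and the function name `random_subgraph` are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not attempt to find minors.

```python
import itertools
import random


def random_subgraph(edges, p, rng):
    """Keep each edge independently with probability p (bond percolation)."""
    return [e for e in edges if rng.random() < p]


# Hypothetical parameters, chosen only for illustration.
k = 20
eps = 0.5

# Hypothetical host graph: the complete graph on k + 1 vertices,
# in which every vertex has degree exactly k.
edges = list(itertools.combinations(range(k + 1), 2))

p = (1 + eps) / k            # edge-retention probability from the abstract
rng = random.Random(1)       # fixed seed so the run is reproducible
kept = random_subgraph(edges, p, rng)

print(f"p = {p:.3f}, kept {len(kept)} of {len(edges)} edges")
```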
Bayesian probability models uncertain knowledge and learning from observations. As a defining feature of optimal adversarial behaviour, Bayesian reasoning forms the basis of safety properties in contexts such as privacy and fairness. Probabilistic programming is a convenient implementation of Bayesian reasoning but the adversarial setting imposes obstacles to its use: approximate inference can underestimate adversary knowledge and exact inference is impractical in cases covering large state spaces. By abstracting distributions, the semantics of a probabilistic language, and inference, jointly termed probabilistic abstract interpretation, we demonstrate adversary models both approximate and sound. We apply the techniques to build a privacy-protecting monitor and describe how to trade off the precision and computational cost in its implementation while remaining sound with respect to privacy risk bounds.
This chapter offers an accessible introduction to the channel-based approach to Bayesian probability theory. This framework rests on algebraic and logical foundations, inspired by the methodologies of programming language semantics. It offers a uniform, structured and expressive language for describing Bayesian phenomena in terms of familiar programming concepts, like channel, predicate transformation and state transformation. The introduction also covers inference in Bayesian networks, which will be modelled by a suitable calculus of string diagrams.
Influenza vaccine effectiveness (VE) wanes over the course of a temperate climate winter season but few data are available from tropical countries with year-round influenza virus activity. In Singapore, a retrospective cohort study of adults vaccinated from 2013 to 2017 was conducted. Influenza vaccine failure was defined as hospital admission with polymerase chain reaction-confirmed influenza infection 2–49 weeks after vaccination. Relative VE was calculated by splitting the follow-up period into 8-week episodes (Lexis expansion) and comparing the odds of influenza infection in the first 8-week period after vaccination (weeks 2–9) with those in subsequent 8-week periods, using multivariable logistic regression adjusting for patient factors and influenza virus activity. Records of 19 298 influenza vaccinations were analysed with 617 (3.2%) influenza infections. Relative VE was stable for the first 26 weeks post-vaccination, but then declined for all three influenza types/subtypes to 69% at weeks 42–49 (95% confidence interval (CI) 52–92%, P = 0.011). VE declined fastest in older adults, in individuals with chronic pulmonary disease and in those who had been previously vaccinated within the last 2 years. Vaccine failure was significantly associated with a change in recommended vaccine strains between vaccination and observation period (adjusted odds ratio 1.26, 95% CI 1.06–1.50, P = 0.010).
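The episode-splitting step mentioned in this abstract can be sketched as follows. This is only an illustration of cutting the follow-up window (weeks 2–49) into consecutive 8-week episodes; the function names `split_follow_up` and `episode_index` are hypothetical, and the sketch does not reproduce the study's regression model or data.

```python
def split_follow_up(start_week, end_week, width=8):
    """Split the inclusive range [start_week, end_week] into
    consecutive episodes of `width` weeks (the last may be shorter)."""
    episodes = []
    week = start_week
    while week <= end_week:
        episodes.append((week, min(week + width - 1, end_week)))
        week += width
    return episodes


def episode_index(week, start_week=2, width=8):
    """Zero-based index of the episode that contains `week`."""
    return (week - start_week) // width


episodes = split_follow_up(2, 49)
for i, (first, last) in enumerate(episodes):
    print(f"episode {i}: weeks {first}-{last}")
```

Under these assumptions the first episode is weeks 2–9 (the reference period in the abstract) and the last is weeks 42–49.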
Two hundred days after the first confirmed case of COVID-19 in Brazil, the epidemic has rapidly spread in metropolitan areas and advanced throughout the countryside. We followed the temporal epidemic pattern in São Paulo State, the most populous in the country, the first to have a confirmed case of COVID-19, and the one with the largest number of cases so far. We analysed the number of new cases per day in each regional health department and calculated the effective reproduction number (Rt) over time. Social distancing measures, along with improved testing and isolation of positive cases, general population mask-wearing and standard health security protocols for essential and non-essential activities, were adopted and slowed the epidemic but were insufficient to stop transmission.
Most textbooks on regression focus on theory and the simplest of examples. Real statistical problems, however, are complex and subtle. This is not a book about the theory of regression. It is about using regression to solve real problems of comparison, estimation, prediction, and causal inference. Unlike other books, it focuses on practical issues such as sample size and missing data and a wide range of goals and techniques. It jumps right into methods and computer code you can use immediately. Real examples and real stories from the authors' experience demonstrate what regression can do and its limitations, with practical advice for understanding assumptions and implementing methods for experiments and observational studies. They make a smooth transition to logistic regression and GLM. The emphasis is on computation in R and Stan rather than derivations, with code available online. Graphics and presentation aid understanding of the models and model fitting.
The epidemic of coronavirus disease 2019 (COVID-19) began in China and has spread rapidly to many other countries. This study aimed to identify risk factors associated with delayed negative conversion of SARS-CoV-2 in COVID-19 patients. In this retrospective single-centre study, we included 169 consecutive patients with confirmed COVID-19 in Zhongnan Hospital of Wuhan University from 15th January to 2nd March. The cases were divided into two groups according to the median time of SARS-CoV-2 negative conversion. The differences between groups were compared. In total, 169 patients had a median virus negative conversion time of 18 days (interquartile range: 11–25) from symptom onset. Compared with the patients with short-term negative conversion, those with long-term conversion were older, had a higher incidence of comorbidities, chief complaints of cough and chest distress/shortness of breath and more severe illness on admission, higher levels of leucocytes, neutrophils, aspartate aminotransferase, creatine kinase and erythrocyte sedimentation rate (ESR), lower levels of CD3+CD4+ lymphocytes and albumin, and were more likely to receive mechanical ventilation. In multivariate analysis, cough, leucocytes, neutrophils and ESR were positively correlated with delayed virus negative conversion, and CD3+CD4+ lymphocytes were negatively correlated. The integrated indicator of leucocytes, neutrophils and CD3+CD4+ lymphocytes showed a good performance in predicting the negative conversion within 2 weeks (area under ROC curve (AUC) = 0.815), 3 weeks (AUC = 0.804), 4 weeks (AUC = 0.812) and 5 weeks (AUC = 0.786). In conclusion, longer quarantine periods might be more justified for COVID-19 patients with cough, higher levels of leucocytes, neutrophils and ESR and lower levels of CD3+CD4+ lymphocytes.
A general multi-type population model is considered, where individuals live and reproduce according to their age and type, but also under the influence of the size and composition of the entire population. We describe the dynamics of the population as a measure-valued process and obtain its asymptotics as the population grows with the environmental carrying capacity. Thus, a deterministic approximation is given, in the form of a law of large numbers, as well as a central limit theorem. This general framework is then adapted to model sexual reproduction, with a special section on serial monogamic mating systems.
Insurance companies make extensive use of Monte Carlo simulations in their capital and solvency models. To overcome the computational problems associated with Monte Carlo simulations, most large life insurance companies use proxy models such as replicating portfolios (RPs). In this paper, we present an example based on a variable annuity guarantee, showing the main challenges faced by practitioners in the construction of RPs: the feature engineering step and subsequent basis function selection problem. We describe how neural networks can be used as a proxy model and how to apply risk-neutral pricing on a neural network to integrate such a model into a market risk framework. The proposed model naturally solves the feature engineering and feature selection problems of RPs.
It is well known that in a small Pólya urn, i.e., an urn where the second largest real part of an eigenvalue is at most half the largest eigenvalue, the distribution of the numbers of balls of different colours in the urn is asymptotically normal under weak additional conditions. We consider the balanced case, and then give asymptotics of the mean and the covariance matrix, showing that after appropriate normalization, the mean and covariance matrix converge to the mean and covariance matrix of the limiting normal distribution.
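To make the setting of this abstract concrete, here is a minimal simulation sketch of a balanced two-colour Pólya urn. The replacement matrix [[1, 2], [2, 1]] is a hypothetical example, not one from the paper: its eigenvalues are 3 and -1, so the second largest real part (-1) is at most half the largest eigenvalue (1.5), which makes the urn small, and every row sums to 3, which makes it balanced. The function name `simulate_urn` is likewise an assumption.

```python
import random


def simulate_urn(replacement, initial, steps, rng):
    """Simulate a Pólya urn: draw a ball with probability proportional
    to the colour counts, then add row `colour` of `replacement`."""
    counts = list(initial)
    for _ in range(steps):
        colour = rng.choices(range(len(counts)), weights=counts)[0]
        for j, added in enumerate(replacement[colour]):
            counts[j] += added
    return counts


# Hypothetical balanced replacement matrix (each row sums to 3).
replacement = [[1, 2], [2, 1]]
initial = [1, 1]
steps = 1000

counts = simulate_urn(replacement, initial, steps, random.Random(0))
total = sum(counts)
print(counts, [c / total for c in counts])
```

Because the urn is balanced, the total number of balls after n draws is deterministic (here 2 + 3n); only the split between colours is random.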
This paper considers ergodic, continuous-time Markov chains $\{X(t)\}_{t \in (\!-\infty,\infty)}$ on $\mathbb{Z}^+=\{0,1,\ldots\}$. For an arbitrarily fixed $N \in \mathbb{Z}^+$, we study the conditional stationary distribution $\boldsymbol{\pi}(N)$ given the Markov chain being in $\{0,1,\ldots,N\}$. We first characterize $\boldsymbol{\pi}(N)$ via systems of linear inequalities and identify simplices that contain $\boldsymbol{\pi}(N)$, by examining the $(N+1) \times (N+1)$ northwest corner block of the infinitesimal generator $\textbf{\textit{Q}}$ and the subset of the first $N+1$ states whose members are directly reachable from at least one state in $\{N+1,N+2,\ldots\}$. These results are closely related to the augmented truncation approximation (ATA), and we provide some practical implications for the ATA. Next we consider an extension of the above results, using the $(K+1) \times (K+1)$ ($K > N$) northwest corner block of $\textbf{\textit{Q}}$ and the subset of the first $K+1$ states whose members are directly reachable from at least one state in $\{K+1,K+2,\ldots\}$. Furthermore, we introduce new state transition structures called (K, N)-skip-free sets, using which we obtain the minimum convex polytope that contains $\boldsymbol{\pi}(N)$.
The angular power spectrum is a natural tool to analyse the observed galaxy number count fluctuations. In a standard analysis, the angular galaxy distribution is sliced into concentric redshift bins and all correlations of its harmonic coefficients between bin pairs are considered, a procedure referred to as ‘tomography’. However, the unparalleled quality of data from upcoming spectroscopic galaxy surveys for cosmology will render this method computationally unfeasible, given the increasing number of bins. Here, we put to the test against synthetic data a novel method proposed in a previous study to save computational time. According to this method, the whole galaxy redshift distribution is subdivided into thick bins, neglecting the cross-bin correlations among them; each of the thick bins is, however, further subdivided into thinner bins, considering in this case all the cross-bin correlations. We create a simulated data set that we then analyse in a Bayesian framework. We confirm that the newly proposed method saves computational time and gives results that surpass those of the standard approach.
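A rough way to see where the saving described in this abstract could come from is to count the bin-pair spectra in each scheme. The sketch below assumes that every thick bin is split into the same number of thin bins and uses hypothetical bin counts (40 thin bins, grouped into 4 thick bins of 10); it counts auto- and cross-spectra only and says nothing about actual run time or about the accuracy of either method.

```python
def n_spectra_tomography(n_bins):
    """Number of auto- and cross-bin spectra when all pairs
    of n_bins bins are correlated: n(n + 1) / 2."""
    return n_bins * (n_bins + 1) // 2


def n_spectra_hybrid(n_thick, n_thin_per_thick):
    """Number of spectra when cross-correlations are kept only
    among the thin bins inside each thick bin."""
    return n_thick * n_spectra_tomography(n_thin_per_thick)


# Hypothetical bin counts, for illustration only.
full = n_spectra_tomography(40)
hybrid = n_spectra_hybrid(4, 10)
print(full, hybrid)
```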
Despite high exposure to Middle East respiratory syndrome coronavirus (MERS-CoV), the predictors for seropositivity in the context of husbandry practices for camels in Eastern Africa are not well understood. We conducted a cross-sectional survey to describe the camel herd profile and determine the factors associated with MERS-CoV seropositivity in Northern Kenya. We enrolled 29 camel-owning households and administered questionnaires to collect herd and household data. Serum samples collected from 493 randomly selected camels were tested for anti-MERS-CoV antibodies using a microneutralisation assay, and regression analysis was used to correlate herd and household characteristics with camel seropositivity. Households reared camels (median = 23 camels, IQR 16–56) and at least one other livestock species in two distinct herds: a home herd kept near homesteads, and a range/fora herd that resided far from the homestead. The overall MERS-CoV IgG seropositivity was 76.3%, with no statistically significant difference between home and fora herds. Significant predictors for seropositivity (P ⩽ 0.05) included camels 6–10 years old (aOR 2.3, 95% CI 1.0–5.2), herds with ⩾25 camels (aOR 2.0, 95% CI 1.2–3.4) and camels from Gabra community (aOR 2.3, 95% CI 1.2–4.2). These results suggest high levels of virus transmission among camels, with potential for human infection.
We consider a fractional Brownian motion with linear drift such that its unknown drift coefficient has a prior normal distribution and construct a sequential test for the hypothesis that the drift is positive versus the alternative that it is negative. We show that the problem of constructing the test reduces to an optimal stopping problem for a standard Brownian motion obtained by a transformation of the fractional Brownian motion. The solution is described as the first exit time from some set, and it is shown that its boundaries satisfy a certain integral equation, which is solved numerically.
We provide the first generic exact simulation algorithm for multivariate diffusions. Current exact sampling algorithms for diffusions require the existence of a transformation which can be used to reduce the sampling problem to the case of a constant diffusion matrix and a drift which is the gradient of some function. Such a transformation, called the Lamperti transformation, can be applied in general only in one dimension. So, completely different ideas are required for the exact sampling of generic multivariate diffusions. The development of these ideas is the main contribution of this paper. Our strategy combines techniques borrowed from the theory of rough paths, on the one hand, and multilevel Monte Carlo on the other.
An important problem in modeling networks is how to generate a randomly sampled graph with given degrees. A popular model is the configuration model, a network with assigned degrees and random connections. The erased configuration model is obtained when self-loops and multiple edges in the configuration model are removed. We prove an upper bound for the number of such erased edges for regularly-varying degree distributions with infinite variance, and use this result to prove central limit theorems for Pearson’s correlation coefficient and the clustering coefficient in the erased configuration model. Our results explain the structural correlations in the erased configuration model and show that removing edges leads to different scaling of the clustering coefficient. We prove that for the rank-1 inhomogeneous random graph, another null model that creates scale-free simple networks, the results for Pearson’s correlation coefficient as well as for the clustering coefficient are similar to the results for the erased configuration model.
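The construction described in this abstract can be sketched in a few lines: half-edges are paired uniformly at random to form the configuration model, and self-loops and repeated edges are then removed to obtain the erased configuration model. The degree sequence below and the function names `configuration_model` and `erase` are hypothetical, chosen for illustration; the sketch only counts erased edges on one small sample and does not reproduce the paper's bounds or limit theorems.

```python
import random


def configuration_model(degrees, rng):
    """Pair half-edges uniformly at random.
    Returns a list of edges that may contain self-loops and multi-edges."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even")
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


def erase(edges):
    """Drop self-loops and collapse multiple edges into a single edge."""
    return sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})


# Hypothetical degree sequence with an even sum.
degrees = [3, 3, 2, 2, 1, 1]
rng = random.Random(0)

multi = configuration_model(degrees, rng)
simple = erase(multi)
erased_count = len(multi) - len(simple)
print(f"{len(multi)} edges before erasing, {erased_count} erased")
```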
This study aimed to analyse the spatial–temporal distribution of COVID-19 mortality in Sergipe, Northeast Brazil. It was an ecological study utilising spatiotemporal analysis techniques that included all confirmed COVID-19 deaths in Sergipe from 2 April to 14 June 2020. Mortality rates were calculated per 100 000 inhabitants and the temporal trends were analysed using a segmented log-linear model. For spatial analysis, the Kernel estimator was used and the crude mortality rates were smoothed by the empirical Bayesian method. The space–time prospective scan statistics applied the Poisson probability distribution model. There were 391 COVID-19 registered deaths, with the majority among ⩾60 years old (62%) and males (53%). The most prevalent comorbidities were hypertension (40%), diabetes (31%) and cardiovascular disease (15%). An increasing mortality trend across the state was observed, with a higher increase in the countryside. An active spatiotemporal cluster of mortality comprising the metropolitan area and neighbouring cities was identified. The trend of COVID-19 mortality in Sergipe was increasing and the spatial distribution of deaths was heterogeneous with progression towards the countryside. Therefore, the use of spatial analysis techniques may contribute to surveillance and control of the COVID-19 pandemic.