This book looks at how numbers and statistics have been used to underpin quality in news reporting. In doing so, the aim is to challenge some common assumptions about how journalists engage with and use statistics in their quest for quality news. It seeks to improve our understanding of the use of data and statistics as a primary means for the construction of social reality. This is a task, in our view, that is urgent in times of 'post-truth' politics and the rise of 'fake news'. In this sense, the quest to produce 'quality' news, which seems to require incorporating statistics and engaging with data, is far more problematic and complex than is often acknowledged, however laudable and straightforward it sounds.
Building upon the success of the first edition, Statistics Using Stata uses the latest version of Stata to meet the needs of today's students. Engaging and accessible for students from a variety of mathematical backgrounds, this textbook integrates statistical concepts with the Stata (version 16) software package. It aligns Stata commands with examples based on real data, enabling students to understand statistics in a way that reflects statistical practice. Capitalizing on Stata's menu-driven 'point and click' and program syntax interface, the chapters guide students from the comfortable 'point and click' environment to the beginnings of statistical programming. Its coverage of essential topics gives instructors flexibility in curriculum planning and provides students with more advanced material to prepare for future work. Online resources - including solutions to exercises, PowerPoint slides, and Stata syntax (do-files) for each chapter - allow students to review independently and adapt code to analyze new problems.
The prevalence of Chagas disease has decreased in the Americas region due to vector control measures. However, non-vectorial transmission through blood transfusions and organ transplantation has gained importance in recent years. Screening of blood and organ donors is essential to reduce Trypanosoma cruzi transmission and could provide information to estimate population prevalence. We conducted a cross-sectional study on the prevalence of immunoglobulin G (IgG) antibodies against T. cruzi in healthy blood donors, solid organ donors and heart transplant recipients from 2012 to 2019. We found a total of 99 357 IgG T. cruzi results during the study period. The cumulative seroprevalence was 0.13% (95% confidence interval (CI) 0.10–0.15) in healthy blood donors, 0.53% (95% CI 0.06–1.92) in organ donors and 3.03% (95% CI 0.07–15.75) in heart transplant recipients. The seroprevalence trend in healthy blood donors showed an annual increase between 2012 and 2015 and a decrease in the following years. No trend was seen in organ donors or heart recipients. Adjusted rates did not differ by sex or age among blood donors. No significant increases in T. cruzi seroprevalence were found during the study period. T. cruzi transmission remains low.
Although invasive aspergillosis (IA) shares some risk factors with active pulmonary tuberculosis (PTB), the prevalence of IA in patients with suspected PTB remains unclear. During a period of 1 year (from January 2016 to December 2016), consecutive patients with suspected PTB were included in a referral TB hospital. Data, including demographic information and underlying diseases, were collected from medical records. All PTB cases were confirmed by mycobacterial culture (Lowenstein–Jensen medium). IA was diagnosed as proven or probable according to the criteria of the 2008 EORTC/MSG definitions. A descriptive analysis was performed to estimate the corresponding prevalence. During the study year, 1507 patients had a positive mycobacterial culture, with a mean age of 45.6 (s.d. 19.9) years and a female:male ratio of 1:4. Among the 82 patients with non-tuberculous mycobacterial diseases, two patients (2.44%, 95% CI 0.67–8.46%) were diagnosed with IA (one proven and one probable); among PTB patients (n = 1315), two (0.15%, 95% CI 0.04–0.55%) were diagnosed with probable IA, and both were retreatment cases. In addition, all four IA patients (100%) exhibited cavities in both lobes on radiograph. In China, the prevalence of IA is low in active PTB patients. However, when high-risk factors for IA are encountered in PTB patients, further investigations are required and empirical treatment for IA might be warranted.
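The confidence intervals quoted in the abstract above can be checked from the raw counts. The abstract does not say which interval method was used; as an assumption, the sketch below uses the Wilson score interval, which reproduces both reported intervals. The function name `wilson_interval` is illustrative and not taken from the paper.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion; returns (lower, upper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Counts taken from the abstract: IA cases among NTM patients and PTB patients.
for cases, total in [(2, 82), (2, 1315)]:
    lower, upper = wilson_interval(cases, total)
    print(f"{cases}/{total}: {100 * cases / total:.2f}% "
          f"(95% CI {100 * lower:.2f}-{100 * upper:.2f}%)")
```

Run as written, this prints 2.44% (95% CI 0.67-8.46%) for 2/82 and 0.15% (95% CI 0.04-0.55%) for 2/1315, matching the abstract. Agreement with the Wilson interval is suggestive, not proof, of the method the authors used.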
Drift analysis is one of the state-of-the-art techniques for the runtime analysis of randomized search heuristics (RSHs) such as evolutionary algorithms (EAs), simulated annealing, etc. The vast majority of existing drift theorems yield bounds on the expected value of the hitting time for a target state, for example the set of optimal solutions, without making additional statements on the distribution of this time. We address this lack by providing a general drift theorem that includes bounds on the upper and lower tail of the hitting time distribution. The new tail bounds are applied to prove very precise sharp-concentration results on the running time of a simple EA on standard benchmark problems, including the class of general linear functions. On all these problems, the probability of deviating by an r-factor in lower-order terms of the expected time decreases exponentially with r. The usefulness of the theorem outside the theory of RSHs is demonstrated by deriving tail bounds on the number of cycles in random permutations. All these results handle a position-dependent (variable) drift that was not covered by previous drift theorems with tail bounds. Finally, user-friendly specializations of the general drift theorem are given.
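To make the notion of a hitting-time distribution concrete, here is a minimal simulation sketch. It assumes the "simple EA" is the (1+1) EA with standard bit mutation and uses OneMax (count of one-bits) as the benchmark; both are common choices in runtime analysis, but the abstract names neither, and the function name and parameters are illustrative.

```python
import math
import random


def one_plus_one_ea_onemax(n, rng):
    """Run the (1+1) EA on OneMax; return the number of iterations until all bits are 1."""
    x = [rng.randrange(2) for _ in range(n)]
    fx = sum(x)
    steps = 0
    while fx < n:
        # Standard bit mutation: flip each bit independently with probability 1/n.
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        fy = sum(y)
        if fy >= fx:  # elitist selection: keep the offspring if it is not worse
            x, fx = y, fy
        steps += 1
    return steps


rng = random.Random(2024)
n = 50
times = sorted(one_plus_one_ea_onemax(n, rng) for _ in range(100))
mean = sum(times) / len(times)
print(f"mean hitting time: {mean:.1f}")
print(f"e*n*ln(n) (asymptotic leading term): {math.e * n * math.log(n):.1f}")
print(f"min / median / max: {times[0]} / {times[len(times) // 2]} / {times[-1]}")
```

The spread between the minimum, median and maximum gives an empirical view of the upper and lower tails that tail-bound drift theorems describe analytically.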
The aim was to analyse invasive pneumococcal disease (IPD) serotypes in children aged ⩽17 years according to clinical presentation and antimicrobial susceptibility. We conducted a prospective study (January 2012–June 2016). IPD cases were diagnosed by culture and/or real-time polymerase chain reaction (PCR). Demographic, microbiological and clinical data were analysed. Associations were assessed using the odds ratio (OR) and 95% confidence intervals (CI). Of the 253 cases, 34.4% were aged <2 years, 38.7% 2–4 years and 26.9% 5–17 years. Over 64% were 13-valent pneumococcal conjugate vaccine (PCV13) serotypes. 48% of the cases were diagnosed only by real-time PCR. Serotypes 3 and 1 were associated with complicated pneumonia (P < 0.05) and non-PCV13 serotypes with meningitis (OR 7.32, 95% CI 2.33–22.99) and occult bacteraemia (OR 3.6, 95% CI 1.56–8.76). Serotype 19A was more frequent in children aged <2 years and serotypes 3 and 1 in children aged 2–4 years and 5–17 years, respectively. 36.1% of cases were not susceptible to penicillin and 16.4% were also non-susceptible to cefotaxime. Serotypes 14, 24F and 23B were associated with non-susceptibility to penicillin (P < 0.05) and serotypes 11, 14 and 19A to cefotaxime (P < 0.05). Serotype 19A showed resistance to penicillin (P = 0.002). In conclusion, PCV13 serotypes were most frequent in children aged ⩽17 years, mainly serotypes 3, 1 and 19A. Non-PCV13 serotypes were associated with meningitis and occult bacteraemia and PCV13 serotypes with pneumonia. Non-susceptibility to antibiotics of non-PCV13 serotypes should be monitored.
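As a reminder of how an odds ratio and its 95% confidence interval are obtained from a 2x2 table, a minimal sketch follows. The counts are hypothetical and not taken from the study, and the Woolf (log-scale) interval is an assumption, since the abstract does not state which interval method was used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) CI for the 2x2 table [[a, b], [c, d]].

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(a=10, b=20, c=5, d=40)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as in the meningitis and occult bacteraemia results above, indicates an association at the 5% level.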
We prove two estimates for the expectation of the exponential of a complex function of a random permutation or subset. Using this theory, we find asymptotic expressions for the expected number of copies and induced copies of a given graph in a uniformly random graph with degree sequence $(d_1, \ldots, d_n)$ as $n \to \infty$. We also determine the expected number of spanning trees in this model. The range of degrees covered includes $d_j = \lambda n + O(n^{1/2+\varepsilon})$ for some λ bounded away from 0 and 1.
Vaccination has reduced the disease burden of vaccine-preventable diseases. However, the extent to which seasonal cycles of immunity could influence vaccine-induced immunity is not well understood. A national cross-sectional serosurveillance study performed in the Netherlands (Pienter-2) yielded data to investigate whether season of vaccination was associated with antibody responses induced by DT-IPV (diphtheria, tetanus and poliomyelitis), MMR (measles, mumps and rubella) and meningococcus C (MenC) vaccines in children. In total, 434 children met the inclusion criteria to study DT-IPV immunity, 811 for MMR and 311 for MenC. Differences in log(antibody levels) by season of vaccination were investigated with linear multivariable regression analyses. Seroconversion rates varied according to season of vaccination for rubella (90% of autumn-vaccinated children vs. 99% of winter-vaccinated had concentrations above cut-off levels). Summer-vaccinated boys showed a slower decline of tetanus antibodies (6% per month), in comparison with winter-vaccinated boys. In conclusion, season of vaccination showed little association with immunological protection. However, a number of associations were seen with a P-value of about 0.03; and adding data from a just-completed nationwide serological study might add more power to the current study. Further immunological and longitudinal investigations could help understand the mechanisms of seasonal influence in vaccine-induced responses.
SARS-CoV-2, the causative agent of coronavirus disease 19 (COVID-19), was identified in Wuhan, China. Since then, the novel coronavirus has been compared with influenza. Haematological parameters and inflammatory indexes are associated with severe illness in COVID-19 patients. In this study, the laboratory data of 120 COVID-19 patients, 100 influenza patients and 61 healthy controls were evaluated. Lower lymphocytes, eosinophils, basophils and platelets, and higher delta neutrophil index (DNI), neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR) were found in the COVID-19 and influenza groups compared to healthy controls. The eosinophils, lymphocytes and PLR made the highest contribution to differentiating COVID-19 patients from healthy controls (areas under the curve (AUCs): 0.819, 0.817 and 0.716, respectively; P < 0.0001 for all). For the NLR, the optimal cut-off value was 3.58, which resulted in a sensitivity of 30.8% and a specificity of 100% (AUC: 0.677, P < 0.0001). Higher leucocytes, neutrophils, DNI, NLR and PLR, and lower lymphocytes, red blood cells, haemoglobin and haematocrit levels were found in severe patients at the end of treatment. Nonsevere patients showed an upward trend for lymphocytes, eosinophils and platelets, and a downward trend for neutrophils, DNI, NLR and PLR. However, there was an increasing trend for eosinophils, platelets and PLR in severe patients. In conclusion, NLR and PLR can be used as biomarkers to distinguish COVID-19 patients from healthy people and to predict the severity of COVID-19. The increasing value of PLR during follow-up may be more useful than NLR for predicting disease severity.
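The two ratios and the cut-off rule described above can be sketched in a few lines. Only the NLR cut-off of 3.58 comes from the abstract; the blood-count values are hypothetical, and treating a value strictly above the cut-off as "positive" is an assumption about how the threshold was applied.

```python
NLR_CUTOFF = 3.58  # optimal NLR cut-off reported in the abstract


def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio (both counts in the same units)."""
    return neutrophils / lymphocytes


def plr(platelets, lymphocytes):
    """Platelet-to-lymphocyte ratio (both counts in the same units)."""
    return platelets / lymphocytes


def sensitivity_specificity(patient_values, control_values, cutoff):
    """Fraction of patients above the cut-off and of controls at or below it."""
    true_pos = sum(v > cutoff for v in patient_values)
    true_neg = sum(v <= cutoff for v in control_values)
    return true_pos / len(patient_values), true_neg / len(control_values)


# Hypothetical NLR values, for illustration only.
patients = [5.0, 2.0, 4.0, 1.5]
controls = [1.8, 2.2, 3.0]
sens, spec = sensitivity_specificity(patients, controls, NLR_CUTOFF)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

A high cut-off such as this one trades sensitivity for specificity: no control exceeds it, but many patients do not either, which is the pattern the abstract reports.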
This paper investigates a high-dimensional vector-autoregressive (VAR) model in mortality modeling and forecasting. We propose an extension of the sparse VAR (SVAR) model fitted on the log-mortality improvements, which we name “spatially penalized smoothed VAR” (SSVAR). By adaptively penalizing the coefficients based on the distances between ages, SSVAR not only allows a flexible data-driven sparsity structure of the coefficient matrix but simultaneously ensures interpretable coefficients including cohort effects. Moreover, by incorporating the smoothness penalties, divergence in forecast mortality rates of neighboring ages is largely reduced, compared with the existing SVAR model. A novel estimation approach that uses the accelerated proximal gradient algorithm is proposed to solve SSVAR efficiently. Similarly, we propose estimating the precision matrix of the residuals using a spatially penalized graphical Lasso to further study the dependency structure of the residuals. Using the UK and France population data, we demonstrate that the SSVAR model consistently outperforms the famous Lee–Carter, Hyndman–Ullah, and two VAR-type models in forecasting accuracy. Finally, we discuss the extension of the SSVAR model to multi-population mortality forecasting with an illustrative example that demonstrates its superiority in forecasting over existing approaches.
A cluster of 18 scarlet fever cases and large illness absenteeism (32%, 58/184) in a school prompted concern and further investigation. We conducted telephone interviews with parents to ascertain cases and better comprehend parents' views. We identified 19 cases, of which 13 reported scarlet fever diagnosis by a physician and only seven fulfilled the probable case definition. We concluded that the outbreak was far smaller than suspected and found that communication and reporting could be improved. Accurate information and communication are essential in an outbreak; the school's concern could have been alleviated sooner and response measures better targeted.
Variable annuity (VA) policies are typically issued on mutual funds invested in both fixed income and equity asset classes. However, due to the lack of specialized models to represent the dynamics of fixed income fund returns, the literature has primarily focused on studying long-term investment guarantees on single-asset equity funds. This article develops a mixed bond and equity fund model in which the fund return is linked to movements of the yield curve. Theoretical motivation for our proposed specification is provided through an analogy with a portfolio of rolling horizon bonds. Moreover, basis risk between the portfolio return and its risk drivers is naturally incorporated into our framework. Numerical results show that the fit of our model to Canadian VA data is adequate. Finally, the valuation of VAs is illustrated and it is found that the prevailing interest rate environment can have a substantial impact on guarantee costs.
The disjointness graph G = G(𝒮) of a set of segments 𝒮 in $\mathbb{R}^d$, $d \ge 2$, is a graph whose vertex set is 𝒮 and two vertices are connected by an edge if and only if the corresponding segments are disjoint. We prove that the chromatic number of G satisfies $\chi(G) \le (\omega(G))^4 + (\omega(G))^3$, where ω(G) denotes the clique number of G. It follows that 𝒮 has $\Omega(n^{1/5})$ pairwise intersecting or pairwise disjoint elements. Stronger bounds are established for lines in space, instead of segments.
We show that computing ω(G) and χ(G) for disjointness graphs of lines in space are NP-hard tasks. However, we can design efficient algorithms to compute proper colourings of G in which the number of colours satisfies the above upper bounds. One cannot expect similar results for sets of continuous arcs, instead of segments, even in the plane. We construct families of arcs whose disjointness graphs are triangle-free (ω(G) = 2), but whose chromatic numbers are arbitrarily large.
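The step from the colouring bound to the Ω(n^{1/5}) statement can be sketched as follows. This assumes n = |𝒮| (the abstract does not define n) and writes α(G) for the independence number of G, a symbol the abstract does not use. A clique in G is a set of pairwise disjoint segments, an independent set is a set of pairwise intersecting segments, and each colour class of a proper colouring is an independent set, so:

```latex
\[
  n \;\le\; \chi(G)\,\alpha(G)
    \;\le\; \bigl(\omega(G)^4 + \omega(G)^3\bigr)\,\alpha(G)
    \;\le\; 2\,\omega(G)^4\,\alpha(G)
    \;\le\; 2\,m^5,
  \qquad m := \max\{\omega(G),\, \alpha(G)\}.
\]
```

Hence $m \ge (n/2)^{1/5}$, so 𝒮 contains at least that many pairwise disjoint or pairwise intersecting segments. The third inequality uses $\omega(G) \ge 1$, which gives $\omega(G)^3 \le \omega(G)^4$.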
When we consider a probability distribution of how many people a COVID-19-infected person will transmit the disease to, two points become important. First, there could be super-spreaders in these distributions/networks; second, the Pareto principle could be valid in these distributions/networks, in the sense of the estimate that 20% of cases were responsible for 80% of local transmission. When we accept that these two points are valid, the distribution of transmission becomes a discrete Pareto distribution, which is a kind of power law. Given such a transmission distribution, we can simulate COVID-19 networks and find super-spreaders using centrality measures in these networks. In this research, we first transformed a transmission distribution from statistics and epidemiology into a transmission network in the sense of network science, and we then tried to determine who the super-spreaders are by using this network and the eigenvector centrality measure. We underline that determining the transmission probability distribution is a very important point in analysing the epidemic and in deciding which precautions to take.
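The pipeline described above (heavy-tailed transmission distribution, simulated network, centrality ranking) can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the truncated power law with exponent `alpha=2.0` and cap `k_max=50` stands in for the discrete Pareto distribution, the re-seeding rule is a simulation convenience, and the centrality is the usual eigenvector centrality computed by power iteration. None of these details come from the paper.

```python
import math
import random


def sample_secondary_cases(rng, alpha=2.0, k_max=50):
    """Draw k in 0..k_max with P(k) proportional to (k + 1) ** -alpha."""
    ks = list(range(k_max + 1))
    weights = [(k + 1) ** -alpha for k in ks]
    return rng.choices(ks, weights=weights)[0]


def build_transmission_tree(n_cases, rng):
    """Grow an undirected transmission tree with n_cases nodes from case 0."""
    adj = {0: set()}
    queue = [0]
    while len(adj) < n_cases:
        # If every chain has died out, restart from a random existing case.
        source = queue.pop(0) if queue else rng.randrange(len(adj))
        for _ in range(sample_secondary_cases(rng)):
            if len(adj) >= n_cases:
                break
            new = len(adj)
            adj[new] = {source}
            adj[source].add(new)
            queue.append(new)
    return adj


def eigenvector_centrality(adj, iters=500, tol=1e-12):
    """Power iteration on A + I; the shift avoids oscillation on trees."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        converged = max(abs(y[v] - x[v]) for v in adj) < tol
        x = y
        if converged:
            break
    return x


rng = random.Random(42)
network = build_transmission_tree(200, rng)
scores = eigenvector_centrality(network)
for case in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(f"case {case}: centrality {scores[case]:.3f}, contacts {len(network[case])}")
```

Cases with the highest scores are the candidate super-spreaders. Because eigenvector centrality also rewards being connected to well-connected cases, its ranking can differ from a simple count of secondary infections.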
We propose a stochastic model for claims reserving that captures dependence along development years within a single triangle. This dependence is based on a gamma process with a moving average form of order $p \ge 0$, which is achieved through the use of Poisson latent variables. We carry out Bayesian inference on model parameters and borrow strength across several triangles, coming from different lines of business or companies, through the use of hierarchical priors. We carry out a simulation study as well as a real data analysis. Results show that reserve estimates, for the real data set studied, are more accurate with our gamma dependence model than with the benchmark over-dispersed Poisson model, which assumes independence.
Catastrophic loss data are known to be heavy-tailed. Practitioners then need models that are able to capture both tail and modal parts of claim data. To this purpose, a new parametric family of loss distributions is proposed as a gamma mixture of the generalized log-Moyal distribution from Bhati and Ravi (2018), termed the generalized log-Moyal gamma (GLMGA) distribution. While the GLMGA distribution is a special case of the GB2 distribution, we show that this simpler model is effective in regression modeling of large and modal loss data. Regression modeling and applications to risk measurement are illustrated using a detailed analysis of a Chinese earthquake loss data set, comparing with the results of competing models from the literature. To this end, we discuss the probabilistic characteristics of the GLMGA and statistical estimation of the parameters through maximum likelihood. Further illustrations of the applicability of the new class of distributions are provided with the fire claim data set reported in Cummins et al. (1990) and a Norwegian fire losses data set discussed recently in Bhati and Ravi (2018).
In this study, an analysis of the Chilean public health response to mitigate the spread of COVID-19 is presented. The analysis is based on the daily transmission rate (DTR). The Chilean response has been based on dynamic quarantines, which are established, lifted or prolonged based on the percentage of infected individuals in the fundamental administrative sections, called communes. This analysis is performed at a national level, at the level of the Metropolitan Region (MR) and at the commune level in the MR according to whether the commune did or did not enter quarantine between late March and mid-May of 2020. The analysis shows a certain degree of efficacy in controlling the pandemic using the dynamic quarantine strategy. However, it also shows that apparent control has only been partially achieved to date. With this policy, the DTR falls only as far as 4%, where it settles, and the MR is the primary vector of infection at the country level. For this reason, we can conclude that the MR has not managed to control the disease, with variable results within its own territory.
It has become standard practice in the non-life insurance industry to employ generalized linear models (GLMs) for insurance pricing. However, these GLMs traditionally work only with a priori characteristics of policyholders, while nowadays we increasingly have a posteriori information of individual customers available across multiple product categories. In this paper, we therefore develop a framework to capture this a posteriori information over several product lines using a dynamic claim score. More specifically, we extend the bonus-malus-panel model of Boucher and Inoussa (2014) and Boucher and Pigeon (2018) to include claim scores from other product categories and to allow for nonlinear effects of these scores. The application of the proposed multi-product framework to a Dutch property and casualty insurance portfolio shows that customers’ individual claims experience can have a significant impact on the risk classification. Moreover, it indicates that considerably more profits can be gained by accounting for their multi-product claims experience.
Some studies have suggested that the Toll-like receptor 9 polymorphism (TLR9 rs352140) is closely related to the risk of bacterial meningitis (BM), but this is subject to controversy. This study set out to estimate whether the TLR9 rs352140 polymorphism confers an increased risk of BM. Relevant literature databases were searched, including PubMed, Embase, the Cochrane Library and China National Knowledge Infrastructure (CNKI), up to August 2020. Seven case-control studies from four publications were included in the present meta-analysis. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated to estimate associations between BM risk and the target polymorphism. Significant associations identified were allele contrast (A vs. G: OR 0.66, 95% CI 0.59–0.75, P < 0.001), homozygote comparison (AA vs. AG/GG: OR 0.62, 95% CI 0.49–0.78, P < 0.001), heterozygote comparison (A vs. G: OR 0.74, 95% CI 0.61–0.91, P = 0.005), recessive genetic model (AA vs. AG/GG: OR 0.78, 95% CI 0.65–0.93, P = 0.006) and dominant genetic model (AA vs. AG/GG: OR 0.70, 95% CI 0.57–0.85, P < 0.001). The findings indicate that, in contrast to some studies, the TLR9 rs352140 polymorphism is associated with a decreased risk of BM.