This chapter introduces the basic principles of hypothesis testing. We explain the ideas of the null and alternative hypothesis, and the role of the test statistic in quantifying the discrepancy between the null hypothesis and the observed data sample. The value of the test statistic is used to estimate the probability of incorrectly rejecting the null hypothesis (i.e. the test significance). We also discuss the Type II error probability and the related concept of test power. The process of statistical decision-making for formulated hypotheses is illustrated with the help of simple categorical data, assessed by the goodness-of-fit test. We also touch on the issues of sampling bias, suitable sample sizes, and briefly outline a different approach to exploring our hypotheses using Bayesian statistics. Finally, we discuss the limitations and failures of hypothesis testing in statistical practice and outline improvements to this approach. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use.
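As a rough illustration of the goodness-of-fit test mentioned above, the sketch below computes the Pearson chi-square statistic for two categories and converts it to a significance level. It is written in Python rather than the R used by the chapter, the counts are hypothetical, and the p-value helper is only valid for one degree of freedom (two categories).

```python
import math

def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Only valid for two categories (df = 1); uses P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical data: 100 individuals, null hypothesis of a 1:1 ratio.
observed = [60, 40]
expected = [50, 50]

stat = chi_square_statistic(observed, expected)  # test statistic
p = p_value_df1(stat)                            # estimated Type I error probability
reject_null = p < 0.05                           # decision at the 5% level
```

The test statistic grows with the discrepancy between observed and expected counts, so a small p-value indicates that such a discrepancy would be unlikely if the null hypothesis were true.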
This chapter provides a brief introduction to the structural equation model (SEM), which typically postulates more complex relationships between variables than that of linear regression. In SEM, a variable can serve both as a predictor for one or multiple other variables, and also as a response variable, affected by other predictors. The suitability of the SEM can be tested by comparing it with the observed data, and further evaluated by a range of fit indices. We demonstrate the use of the simplest form of SEM, which is equivalent to the formerly used method of path analysis. Finally, we discuss the extent to which we can use structural equation models to identify causal relationships in study systems. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, in this case employing the sem package.
The chapter starts by describing the appearance and interpretation of a regression tree, followed by a more detailed explanation of the recursive partitioning algorithm used in the construction of tree models. We describe how optimum tree complexity is chosen based on the results of a cross-validation procedure, and how a tree model can be simplified by pruning. The concepts of competing and surrogate predictors are touched upon, both of which enable a more effective application of the fitted tree models. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, in this case employing the rpart package.
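A single step of recursive partitioning can be sketched as follows. This is a minimal Python illustration, not the rpart implementation: it assumes one numeric predictor and picks the threshold that minimises the summed squared error of the two resulting groups. The function names and data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split on one predictor.

    The threshold is None if no split improves on the unsplit data.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse([v for _, v in pairs]))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            # Place the threshold midway between the neighbouring values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, total)
    return best

# Hypothetical data with an obvious jump in the response between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
threshold, total_sse = best_split(x, y)
```

A full tree would apply the same search recursively to each resulting group and over all predictors, then use cross-validation to decide how far to prune.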
We describe the Poisson distribution and how it is used to detect randomness in the spatial (or temporal) pattern of objects or events. We discuss several options for testing and quantifying the nature of spatial patterns, ranging from aggregated, through randomly arranged, up to regularly positioned objects. We demonstrate an approach based on comparing the variance and mean of count data, before moving to alternative methods that use the K-function alongside other similar functions, describing the change in spatial relationships across a range of spatial scales. Finally, we introduce the binomial distribution and how to estimate its p parameter (proportion of outcomes of a particular type within a set of observed objects) together with its confidence interval. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the spatstat and fitdistrplus packages.
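The variance-to-mean comparison and the estimate of the binomial p parameter can be sketched in a few lines. The Python below is an illustration under stated assumptions: the counts are hypothetical, and the Wilson score interval is one common choice of confidence interval, not necessarily the one used in the chapter.

```python
import math
import statistics

def dispersion_index(counts):
    """Sample variance of the counts divided by their mean.

    Values near 1 are consistent with a Poisson (random) pattern,
    values below 1 suggest regularity, values above 1 suggest aggregation.
    """
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return statistics.variance(counts) / mean

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical quadrat counts.
regular = dispersion_index([2, 2, 2, 2])     # identical counts: no variance
aggregated = dispersion_index([0, 0, 0, 8])  # all objects in one quadrat

# Hypothetical sample: 50 objects of the type of interest out of 100.
low, high = wilson_interval(50, 100)
```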
The normal (Gaussian) distribution plays an essential role in statistical theory and is often referred to in applied statistical analysis, unfortunately sometimes in an inappropriate manner. We first describe the properties of this distribution and its two parameters (mean and variance), paying particular attention to the standardized normal distribution. We then discuss deviations of observed distributions from the normal distribution, measured by their skewness and kurtosis. Another important part of this chapter is the description of specialised graphs and of the tests allowing us to check whether a set of values is likely to originate from a normal distribution. But we generally discourage the reader from performing statistical tests when checking for normality of observed distributions. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use.
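Skewness and kurtosis can be computed directly from the central moments of a sample. The sketch below uses the simple moment-based definitions (dividing by n, with no small-sample correction), which is an assumption here. Statistical packages often apply bias corrections and may report slightly different values.

```python
def central_moment(values, order):
    """Central moment of the given order, dividing by n."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Moment-based skewness m3 / m2^1.5 (0 for a symmetric sample)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2^2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical samples.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

sym_skew = skewness(symmetric)          # no asymmetry
sym_kurt = excess_kurtosis(symmetric)   # flatter than a normal distribution
skew_right = skewness(right_skewed)     # long right tail gives a positive value
```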
We start by focusing on the important assumption of additivity for the effects of predictors in ANOVA models. Using a practical example, we show how the predictors used to explain response variables in biology oftentimes operate on a multiplicative scale, and thus stress the need to take this into account when choosing an appropriate data transformation. We introduce three important monotonic transformations, namely the log transformation, arcsine transformation, and square-root transformation. We thoroughly discuss the advantages and possible dangers of individual transformation types. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use.
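The point about multiplicative effects can be made concrete with a small, entirely hypothetical example: if one factor doubles the response and another triples it, the effect of the first factor depends on the level of the second on the raw scale, but is constant after a log transformation. The arcsine helper assumes the usual arcsine square-root form for proportions.

```python
import math

def arcsine_transform(p):
    """Arcsine square-root transformation for a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))

# Hypothetical cell means: fertiliser doubles biomass, watering triples it.
baseline = 10.0
cells = {
    ("no_fert", "no_water"): baseline,
    ("fert", "no_water"): baseline * 2,
    ("no_fert", "water"): baseline * 3,
    ("fert", "water"): baseline * 2 * 3,
}

def fertiliser_effect(water, transform=lambda v: v):
    """Difference between fertilised and unfertilised means on a given scale."""
    return transform(cells[("fert", water)]) - transform(cells[("no_fert", water)])

# Raw scale: the effect differs between watering levels (non-additive).
raw_effects = (fertiliser_effect("no_water"), fertiliser_effect("water"))

# Log scale: the effect is the same at both watering levels (additive).
log_effects = (
    fertiliser_effect("no_water", math.log),
    fertiliser_effect("water", math.log),
)

# Square-root and arcsine transformations are applied value by value.
sqrt_count = math.sqrt(16)
asin_prop = arcsine_transform(0.25)
```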
The anabolic effects of androgen on skeletal muscles are thought to be mediated by androgen receptor (AR). Although multiple studies concerning the effects of AR in males have been performed, the molecular mechanisms of AR in skeletal muscles remain unclear. Here we first confirmed that satellite cells from mouse hindlimb muscles express AR. We then generated satellite cell-specific AR knockout mice using Pax7CreERT2 and ARL2/Y mice to test whether AR in satellite cells is necessary for muscle regeneration. Surprisingly, we found that muscle regeneration was compromised in both Pax7CreERT2(Fan)/+ control mice and Pax7CreERT2(Fan)/+;ARL2/Y mice compared to ARL2/Y mice. However, in Pax7CreERT2(Gaka)/+;ARL2/Y;R26tdTomato/+ mice, muscle regeneration did not differ significantly between control and mutant animals. These findings indicate that AR in satellite cells is not essential for muscle regeneration. We propose that Pax7CreERT2(Fan)/+ control mice should be included in all experiments, because these mice themselves show impaired muscle regeneration (a mild regeneration phenotype).
Introducing common shocks is a popular dependence modelling approach, with some recent applications in loss reserving. The main advantage of this approach is the ability to capture structural dependence coming from known relationships. In addition, it helps with the parsimonious construction of correlation matrices of large dimensions. However, complications arise in the presence of “unbalanced data”, that is, when the (expected) magnitude of observations over a single triangle, or between triangles, can vary substantially. Specifically, if a single common shock is applied to all of these cells, it can contribute insignificantly to the larger values and/or swamp the smaller ones, unless careful adjustments are made. This problem is further complicated in applications involving negative claim amounts. In this paper, we address this problem in the loss reserving context using a common shock Tweedie approach for unbalanced data. We show that the solution not only provides a much better balance of the common shock proportions relative to the unbalanced data, but is also parsimonious. Finally, the common shock Tweedie model also provides distributional tractability.
After the 2003 SARS epidemic, China started constructing a primary-level emergency response system and focused on strengthening and implementing policies and on resource allocation. After 17 years of restructuring, China's primary-level response capabilities towards public health emergencies have greatly improved. During the coronavirus disease 2019 epidemic, primary-level administrative and medical personnel, social organisations, volunteers, etc. have played a significant role in providing professional services using the primary-level emergency response system built over those 17 years. However, China's organisations did not learn their lesson from the SARS epidemic, and certain problems in the system have been exposed. By analysing the experience and shortcomings of China's disease prevention and control system at the primary level, we can focus on the development of disease control systems for major epidemics in the future.
We construct a high-order conditional distance covariance, which generalizes the notion of conditional distance covariance. The joint conditional distance covariance is defined as a linear combination of conditional distance covariances, which can capture the joint relation of many random vectors given one vector. Furthermore, we develop a new conditional independence test based on the joint conditional distance covariance. Simulation results indicate that the proposed method is very effective. We also apply our method to analyze the relationships of PM2.5 in five Chinese cities (Beijing, Tianjin, Jinan, Tangshan and Qinhuangdao) by the Gaussian graphical model.
The emergence of 2019 novel coronavirus disease (COVID-19) is currently a global concern. In this study, our goal was to explore the changing expression levels of acute-phase reaction proteins (APRPs) in the serum of COVID-19 patients and to elucidate the immunological characteristics of COVID-19. We recruited 72 COVID-19 patients, including 22 mild, 38 moderate and 12 severe cases. We also recruited 20 patients with community-acquired pneumonia (CAP) and 20 normal control subjects as a comparison. Fasting venous blood was taken to detect the content of complement 3 (C3), complement 4 (C4), C-reactive protein (CRP), serum amyloid A (SAA) and prealbumin (PA). When the COVID-19 group was compared with the CAP and normal control groups, respectively, the mean values of CRP and SAA in the COVID-19 group (including mild, moderate and severe patients) had increased significantly (P < 0.01), whereas the mean values of C3, C4 and PA had decreased (P < 0.01). For the asymptomatic or mildly symptomatic patients with COVID-19, the actual aggravation of disease may be more advanced than the clinical appearances. Meanwhile, the statistical analyses indicated that the development of COVID-19 brought about a significant increase in the content of CRP and SAA (P < 0.01), and a decline in the content of C3, C4 and PA (P < 0.01). These findings suggested that the changes in the level of APRPs could be used as indicators to identify the degree and progression of COVID-19, and the significant changes might demonstrate the aggravation of disease. This study provided a new approach to improve the clinical management plan and prognosis of COVID-19.
Given a fixed graph $H$, a real number $p \in (0, 1)$ and an infinite Erdős–Rényi graph $G \sim G(\infty, p)$, how many adjacency queries do we have to make to find a copy of $H$ inside $G$ with probability at least $1/2$? Determining this number $f(H, p)$ is a variant of the subgraph query problem introduced by Ferber, Krivelevich, Sudakov and Vieira. For every graph $H$, we improve the trivial upper bound of $f(H, p) = O(p^{-d})$, where $d$ is the degeneracy of $H$, by exhibiting an algorithm that finds a copy of $H$ in time $o(p^{-d})$ as $p$ goes to 0. Furthermore, we prove that there are 2-degenerate graphs which require $p^{-2+o(1)}$ queries, showing for the first time that there exist graphs $H$ for which $f(H, p)$ does not grow like a constant power of $p^{-1}$ as $p$ goes to 0. Finally, we answer a question of Feige, Gamarnik, Neeman, Rácz and Tetali by showing that for any $\delta < 2$, there exists $\alpha < 2$ such that one cannot find a clique of order $\alpha \log_2 n$ in $G(n, 1/2)$ in $n^{\delta}$ queries.
The Biofire® Film Array Meningitis Encephalitis (FAME) panel can rapidly diagnose common aetiologies but its impact in Colombia is unknown. We conducted a retrospective study of adults with CNS infections in one tertiary hospital in Colombia. The cohort was divided into two time periods: before and after the implementation of the Biofire® FAME panel in May 2016. A total of 98 patients were enrolled: 52 in the Standard of Care (SOC) group and 46 in the FAME group. The most common comorbidity was human immunodeficiency virus infection (47.4%). The median time to a change in therapy was significantly shorter in the FAME group than in the SOC group (3 vs. 137.3 h, P < 0.001). This difference was driven by the time to appropriate therapy (2.1 vs. 195 h, P < 0.001), achieved by identifying viral aetiologies. Overall outcomes and length of stay did not differ between the two groups (P > 0.2). The FAME panel detected six aetiologies that had negative cultures but missed identifying one patient with Cryptococcus neoformans. The introduction of the Biofire® FAME panel in Colombia has facilitated the identification of viral pathogens and has significantly reduced the time to the adjustment of empirical antimicrobial therapy.
Case fatality rate (CFR) and doubling time are important characteristics of any epidemic. For coronavirus disease 2019 (COVID-19), wide variations in the CFR and doubling time have been noted among various countries. Early in the epidemic, CFR calculations involving all patients as denominator do not account for the hospitalised patients who are ill and will die in the future. Hence, we calculated cumulative CFR (cCFR) using only patients whose final clinical outcomes were known at a certain time point. We also estimated the daily average doubling time. Calculating CFR using this method leads to temporal stability in the fatality rates; the cCFR stabilises at different values for different countries. The possible reasons for this are an improved outcome rate by the end of the epidemic and a wider testing strategy. The United States, France, Turkey and China had a high cCFR at the start due to a low outcome rate. By 22 April, Germany, China and South Korea had a low cCFR. China and South Korea controlled the epidemic and achieved high doubling times. The doubling time in Russia did not cross 10 days during the study period.
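The difference between the two denominators can be sketched as below. The abstract does not state its formulas, so this Python is only one plausible reading: the closed-case rate is taken as deaths divided by deaths plus recoveries, and the doubling time assumes constant exponential growth between two cumulative counts. All numbers are hypothetical.

```python
import math

def naive_cfr(deaths, confirmed_cases):
    """Deaths divided by all confirmed cases, including unresolved ones."""
    return deaths / confirmed_cases

def closed_case_cfr(deaths, recovered):
    """Deaths divided by cases with a known final outcome (assumed formula)."""
    return deaths / (deaths + recovered)

def doubling_time(cases_start, cases_end, days):
    """Days needed to double, assuming constant exponential growth."""
    return days * math.log(2) / math.log(cases_end / cases_start)

# Hypothetical early-epidemic snapshot: most patients are still in hospital.
naive = naive_cfr(deaths=20, confirmed_cases=1000)
closed = closed_case_cfr(deaths=20, recovered=180)

# Hypothetical growth: cumulative cases rise from 100 to 400 in 10 days.
t_double = doubling_time(100, 400, 10)
```

Early on, when few outcomes are known, the closed-case rate is much higher than the naive rate; the two converge as the proportion of resolved cases approaches one.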
As more people move back into densely populated cities, bike sharing is emerging as an important mode of urban mobility. In a typical bike-sharing system (BSS), riders arrive at a station and take a bike if it is available. After retrieving a bike, they ride it for a while, then return it to a station near their final destinations. Since space is limited in cities, each station has a finite capacity of docks, which cannot hold more bikes than its capacity. In this paper, we study BSSs with stations having a finite capacity. By an appropriate scaling of our stochastic model, we prove a mean-field limit and a central limit theorem for an empirical process of the number of stations with k bikes. The mean-field limit and the central limit theorem provide insight on the mean, variance, and sample path dynamics of large-scale BSSs. We also leverage our results to estimate confidence intervals for various performance measures such as the proportion of empty stations, the proportion of full stations, and the number of bikes in circulation. These performance measures have the potential to inform the operations and design of future BSSs.
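The kind of system described above can be explored with a toy simulation. The sketch below is not the paper's model: it assumes identical stations, uniformly random origins and destinations, and riders who keep riding when they find a full station, and it tracks only the sequence of events, not their timing. All parameter names and values are hypothetical.

```python
import random

def simulate_bss(n_stations=50, capacity=10, bikes_per_station=5,
                 arrival_rate=1.0, return_rate=0.5, n_events=20000, seed=0):
    """Toy Markov simulation of a bike-sharing system with finite docks.

    Returns (bikes_at_each_station, bikes_in_transit) after n_events events.
    """
    rng = random.Random(seed)
    bikes = [bikes_per_station] * n_stations
    in_transit = 0
    for _ in range(n_events):
        total_arrival = arrival_rate * n_stations
        total_return = return_rate * in_transit
        # Choose the next event type in proportion to its total rate.
        if rng.random() < total_arrival / (total_arrival + total_return):
            station = rng.randrange(n_stations)
            if bikes[station] > 0:         # an empty station loses the rider
                bikes[station] -= 1
                in_transit += 1
        else:
            station = rng.randrange(n_stations)
            if bikes[station] < capacity:  # a full station: rider keeps riding
                bikes[station] += 1
                in_transit -= 1
    return bikes, in_transit

capacity = 10
bikes, in_transit = simulate_bss(capacity=capacity)
n = len(bikes)

# Empirical fraction of stations holding k bikes, for k = 0..capacity.
fractions = [sum(b == k for b in bikes) / n for k in range(capacity + 1)]
prop_empty = fractions[0]
prop_full = fractions[capacity]
```

Repeating the run with different seeds gives a crude picture of the variability in the proportion of empty and full stations, which is what the limit theorems quantify for large systems.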
Catheter-related blood-stream infections (CRBSIs) are the most common healthcare-associated blood-stream infections. They can be diagnosed by either semi-quantitative or quantitative methods, which may differ in diagnostic accuracy. A meta-analysis was undertaken to compare the diagnostic accuracy of semi-quantitative and quantitative methods for CRBSI. A systematic search of Medline, Scopus, Cochrane and Embase databases up to January 2020 was performed, and the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool was used to evaluate the risk of bias among studies. The pooled sensitivity and specificity of the methods were determined and heterogeneity was evaluated using the χ² test and I². Publication bias was assessed using a funnel plot and Egger's test. In total, 45 studies were analysed with data from 11 232 patients. The pooled sensitivity and specificity of semi-quantitative methods were 85% (95% CI 79–90%) and 84% (95% CI 79–88%), respectively; for quantitative methods they were 85% (95% CI 79–90%) and 95% (95% CI 91–97%). Considerable heterogeneity was statistically evident (P < 0.001) for both methods, with a correspondingly symmetrical funnel plot that was confirmed by a non-significant Deeks' test. We conclude that both semi-quantitative and quantitative methods are highly useful for screening for CRBSI in patients and display high sensitivity and specificity. Quantitative methods, particularly paired quantitative cultures, had the highest sensitivity and specificity and can be used to identify CRBSI cases with a high degree of certainty.
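The quantities named above can be illustrated with a short sketch. It is deliberately simplified: pooling by summing the cells of the 2x2 tables is a naive stand-in for the random-effects models normally used in diagnostic meta-analysis, and the abstract does not say which model was fitted. The I² formula is the standard one based on Cochran's Q. The study tables are hypothetical.

```python
def sensitivity(tp, fn):
    """Proportion of true cases that the method detects."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of non-cases that the method correctly rules out."""
    return tn / (tn + fp)

def pooled_by_summing(tables):
    """Naively pool 2x2 tables given as (tp, fp, fn, tn) by summing cells."""
    tp = sum(t[0] for t in tables)
    fp = sum(t[1] for t in tables)
    fn = sum(t[2] for t in tables)
    tn = sum(t[3] for t in tables)
    return sensitivity(tp, fn), specificity(tn, fp)

def i_squared(q, n_studies):
    """I^2 in percent: share of variability beyond chance, from Cochran's Q."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical studies, each as (tp, fp, fn, tn).
tables = [(45, 5, 5, 45), (40, 10, 10, 40)]
pooled_sens, pooled_spec = pooled_by_summing(tables)

# Hypothetical Q statistics for 11 studies.
high_heterogeneity = i_squared(20.0, 11)
no_heterogeneity = i_squared(5.0, 11)
```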
Trichosporon is a yeast-like basidiomycete, a conditional pathogenic fungus that is rare in the clinic but often causes fatal infections in immunocompromised individuals. Trichosporon asahii is the most common pathogenic fungus in this genus and the occurrence of infections has dramatically increased in recent years. Here, we report a systematic literature review detailing 140 cases of T. asahii infection reported during the past 23 years. Statistical analysis shows that T. asahii infections were most frequently reported in immunodeficient or immunocompromised patients, commonly those with blood diseases. Antibiotic use, invasive medical equipment and chemotherapy were the leading risk factors for acquiring infection. In vitro susceptibility, clinical information and prognosis analysis showed that voriconazole is the primary drug of choice in the treatment of T. asahii infection. Combination treatment with voriconazole and amphotericin B did not show superiority over either drug alone. Finally, we found that the types of infections prevalent in China are significantly different from those in other countries. These results provide detailed information and relevant clinical treatment strategies for the diagnosis and treatment of T. asahii infection.