A prototypical (although somewhat idealized) workflow in any scientific investigation starts with the design of the experiment to probe a question or hypothesis of interest. The experiment is modeled using several plausible mechanisms. The experiment is conducted and the data are collected. These data are finally analyzed to identify the most adequate mechanism, meaning the one among those considered that best explains the data. Although an experiment is supposed to be repeatable, this is not always possible, particularly if the system under study is chaotic or random in nature. When this is the case, the mechanisms above are expressed as probability distributions. We then talk about probabilistic modeling --- albeit with not one but several probability distributions. It is as if we contemplate several probability experiments, and the goal of statistical inference is to decide on the most plausible one in view of the collected data. We introduce core concepts such as estimators, confidence intervals, and tests.
The chapter focuses on discrete probability spaces, where probability calculations are combinatorial in nature. Urn models are presented as the quintessential discrete experiments.
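As a concrete illustration, here is a minimal Python sketch of an urn calculation; the urn composition (6 red, 4 blue balls) and the draw size are invented for the example, and the probabilities follow the hypergeometric formula that governs draws without replacement.

```python
from math import comb

# Urn with 6 red and 4 blue balls; draw 3 without replacement.
# P(exactly k red) = C(6, k) * C(4, 3 - k) / C(10, 3)  (hypergeometric).
red, blue, draws = 6, 4, 3

for k in range(draws + 1):
    p = comb(red, k) * comb(blue, draws - k) / comb(red + blue, draws)
    print(f"P({k} red) = {p:.4f}")
```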
Statistics is the science of data collection and data analysis. We provide, in this chapter, a brief introduction to principles and techniques for data collection, traditionally divided into survey sampling and experimental design, each the subject of a rich literature. While most of this book is on mathematical theory, covering aspects of Probability Theory and Statistics, the collection of data is, by nature, much more practical, and often requires domain-specific knowledge. Careful data collection is of paramount importance: data that were improperly collected can be completely useless and unsalvageable by any technique of analysis. It is also worth keeping in mind that the collection phase is typically much more expensive than the analysis phase that ensues (e.g., clinical trials, car crash tests). Thus the collection of data should be carefully planned according to well-established protocols or with expert advice. We discuss the basics of data collection in this chapter.
This paper is concerned with the optimal allocation of redundant components to n-component coherent systems consisting of heterogeneous dependent components. We assume that the system is built up of L groups of different components, $L\geq 1$, where there are $n_i$ components in group i, and $\sum_{i=1}^{L}n_i=n$. The problem of interest is to allocate $v_i$ active redundant components to each component of type i, $i=1,\dots, L$. To determine the optimal values of $v_i$, we propose two cost-based criteria. One is based on the costs of renewing the failed components and of refreshing the surviving ones at the system failure time. The other is based on the costs of replacing the system at its failure time or at a predetermined time $\tau$, whichever occurs first. The expressions for the proposed functions are derived using the mixture representation of the system reliability function based on the notion of survival signature. We assume that a given copula function models the dependence structure between the components. In the particular case where the system has a series-parallel structure, we provide explicit formulas for the proposed cost-based functions. The results are discussed numerically for some specific coherent systems.
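To make the allocation setting concrete, the following Python sketch estimates by Monte Carlo the reliability of a small series system in which each component is backed by active redundant copies. It assumes independent exponential lifetimes with invented rates, group sizes, and allocation numbers $v_i$; the paper itself treats copula-dependent components and cost-based criteria, which are not reproduced here.

```python
import random

# Monte Carlo sketch: L = 2 component types, n_i components of type i,
# each type-i component backed by v_i active spares (a parallel block of
# size 1 + v_i). Lifetimes are independent exponentials here; the paper
# allows copula-based dependence between components.
random.seed(0)

rates = [1.0, 0.5]   # failure rate per type (illustrative values)
n = [2, 3]           # components per type
v = [1, 2]           # active spares allocated per component of each type
t, runs = 1.0, 50_000

def system_alive(t):
    # Series structure: every parallel block needs a copy surviving past t.
    for i, rate in enumerate(rates):
        for _ in range(n[i]):
            lifetimes = [random.expovariate(rate) for _ in range(1 + v[i])]
            if max(lifetimes) <= t:   # whole block failed by time t
                return False
    return True

reliability = sum(system_alive(t) for _ in range(runs)) / runs
print(f"Estimated system reliability at t = {t}: {reliability:.3f}")
```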
This chapter introduces Kolmogorov’s probability axioms and related terminology and concepts such as outcomes and events, sigma-algebras, probability distributions and their properties.
In this chapter we introduce and briefly discuss some properties of estimators and tests that make it possible to compare multiple methods addressing the same statistical problem. We discuss the notions of sufficiency and consistency, and various notions of optimality (including minimax optimality), both for estimators and for tests.
In a wide range of real-life situations, not one but several, even many, hypotheses are to be tested, and failing to account for multiple inference can lead to a grossly incorrect analysis. In this chapter we look closely at this important issue, describing some pitfalls and presenting remedies that ‘correct’ for this multiplicity. Combination tests assess whether there is evidence against any of the null hypotheses being tested. Other procedures aim instead at identifying the null hypotheses that are not congruent with the data while controlling some notion of error rate.
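For illustration, here is a short Python sketch of two standard multiplicity corrections, Bonferroni (controlling the family-wise error rate) and Benjamini-Hochberg (controlling the false discovery rate); the p-values are invented, and these procedures are given as common examples rather than as the chapter's specific remedies.

```python
# Toy multiplicity correction: Bonferroni and Benjamini-Hochberg (BH)
# applied to invented p-values.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
alpha = 0.05
m = len(pvals)

# Bonferroni: reject H_i whenever p_i <= alpha / m.
bonferroni = [p <= alpha / m for p in pvals]

# BH step-up: find the largest k with p_(k) <= k * alpha / m,
# then reject the k hypotheses with the smallest p-values.
order = sorted(range(m), key=lambda i: pvals[i])
k_max = 0
for rank, i in enumerate(order, start=1):
    if pvals[i] <= rank * alpha / m:
        k_max = rank
bh = [False] * m
for rank, i in enumerate(order, start=1):
    if rank <= k_max:
        bh[i] = True

print("Bonferroni rejections:", bonferroni)
print("BH rejections:        ", bh)
```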
Randomization was presented in a previous chapter as an essential ingredient in the collection of data, both in survey sampling and in experimental design. We argue here that randomization is the essential foundation of statistical inference: It leads to conditional inference in an almost canonical way, and allows for causal inference, which are the two topics covered in the chapter.
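As one concrete instance of randomization-based inference, the Python sketch below runs a simple permutation test: because a randomized assignment makes the treatment labels exchangeable under the null hypothesis, the observed difference in means can be compared with its distribution over re-randomizations. The data are invented for illustration.

```python
import random
import statistics

# Permutation (randomization) test for a difference in means.
random.seed(0)
treated = [5.1, 6.3, 5.8, 7.0, 6.1]
control = [4.8, 5.0, 5.5, 4.9, 5.6]

observed = statistics.mean(treated) - statistics.mean(control)
pooled = treated + control
n_t = len(treated)

extreme, runs = 0, 10_000
for _ in range(runs):
    random.shuffle(pooled)                       # re-randomize the labels
    diff = statistics.mean(pooled[:n_t]) - statistics.mean(pooled[n_t:])
    if abs(diff) >= abs(observed):
        extreme += 1

print(f"observed difference {observed:.2f}, permutation p-value {extreme / runs:.4f}")
```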
We prove a surprising symmetry between the law of the size $G_n$ of the greedy independent set on a uniform Cayley tree $\mathcal{T}_n$ of size n and that of its complement. We show that $G_n$ has the same law as the number of vertices at even height in $\mathcal{T}_n$ rooted at a uniform vertex. This enables us to compute the exact law of $G_n$. We also give a Markovian construction of the greedy independent set, which highlights the symmetry of $G_n$ and whose proof uses a new Markovian exploration of rooted Cayley trees that is of independent interest.
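The identity lends itself to a quick simulation check. The Python sketch below generates uniform Cayley trees via Prüfer sequences, computes the size of a random-order greedy independent set (which may differ in detail from the paper's construction), and compares its average with the average number of even-height vertices seen from a uniform root; the tree size and number of runs are arbitrary choices.

```python
import heapq
import random
from collections import deque

random.seed(1)

def random_cayley_tree(n):
    """Uniform labeled tree on n vertices via Prufer-sequence decoding."""
    adj = {v: [] for v in range(n)}
    if n < 2:
        return adj
    prufer = [random.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in prufer:
        degree[v] += 1
    heap = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(heap)
    for v in prufer:
        leaf = heapq.heappop(heap)
        adj[leaf].append(v)
        adj[v].append(leaf)
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(heap, v)
    u, w = heapq.heappop(heap), heapq.heappop(heap)
    adj[u].append(w)
    adj[w].append(u)
    return adj

def greedy_independent_set_size(adj):
    """Examine vertices in uniform random order; keep those with no kept neighbor."""
    order = list(adj)
    random.shuffle(order)
    chosen = set()
    for v in order:
        if not any(u in chosen for u in adj[v]):
            chosen.add(v)
    return len(chosen)

def even_height_count(adj):
    """Number of vertices at even depth from a uniformly chosen root (BFS)."""
    root = random.randrange(len(adj))
    depth = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in depth:
                depth[u] = depth[v] + 1
                queue.append(u)
    return sum(1 for d in depth.values() if d % 2 == 0)

n, runs = 50, 2000
g = [greedy_independent_set_size(random_cayley_tree(n)) for _ in range(runs)]
h = [even_height_count(random_cayley_tree(n)) for _ in range(runs)]
print(f"mean G_n: {sum(g) / runs:.2f}   mean even-height count: {sum(h) / runs:.2f}")
```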
Estimating a proportion is one of the most basic problems in statistics. Although basic, it arises in a number of important real-life situations. Examples include election polls, conducted to estimate the proportion of people that will vote for a particular candidate; quality control, where the proportion of defective items manufactured at a particular plant or assembly line needs to be monitored, and one may resort to statistical inference to avoid having to check every single item; and clinical trials, which are conducted in part to estimate the proportion of people that would benefit (or suffer serious side effects) from receiving a particular treatment. The fundamental model is that of Bernoulli trials. The binomial family of distributions plays a central role. Also discussed are sequential designs, which lead to negative binomial distributions.
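As a worked example, the Python sketch below computes a point estimate and a 95% Wilson confidence interval for a proportion from Bernoulli trials; the counts are invented, and the Wilson interval is one common choice among several.

```python
from math import sqrt

# Proportion estimate and 95% Wilson confidence interval (invented counts).
successes, trials = 54, 120
z = 1.96  # approximate 97.5% quantile of the standard normal

p_hat = successes / trials
denom = 1 + z**2 / trials
center = (p_hat + z**2 / (2 * trials)) / denom
half = (z / denom) * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
print(f"estimate {p_hat:.3f}, 95% Wilson CI ({center - half:.3f}, {center + half:.3f})")
```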
A study of dual-purpose sorghum response to anthracnose disease, growth, and yield was undertaken at the Derashe and Arba Minch trial sites during March–June of 2018 and 2019. Five sorghum varieties and Rara (local check) were arranged in a randomized complete block design with four replications. Variety Chelenko exhibited the tallest main-crop plant height (430 cm), while Dishkara was the tallest (196.65 cm) at ratoon-crop harvesting. Rara had a higher tiller number (main = 6.73, ratoon = 9.73) than the other varieties. The Dishkara and Chelenko varieties produced 50% and 10% more dry biomass yield (DBY), respectively, than the overall mean DBY, while Konoda produced 40% less. Although anthracnose infestation was highest on the varieties Konoda (percentage severity index [PSI] = 20.37%) and NTJ_2 (PSI = 32.19%), they produced significantly (p < .001) higher grain yield (3.89 t/ha) than the others. Under anthracnose pressure, the Chelenko and Dishkara varieties are suggested for dry matter yield, while NTJ_2 is suggested for grain yield production in the study area and similar agroecologies.
We consider an experiment that yields, as data, a sample of independent and identically distributed (real-valued) random variables with a common distribution on the real line. The estimation of the underlying mean and median is discussed at length, and bootstrap confidence intervals are constructed. Tests comparing the underlying distribution to a given distribution (e.g., the standard normal distribution) or to a family of distributions (e.g., the normal family) are introduced. Censoring, which is very common in some clinical trials, is briefly discussed.
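As a minimal illustration of the bootstrap, the Python sketch below builds a percentile bootstrap confidence interval for the median (the same recipe applies to the mean); the data are simulated and the number of resamples is an arbitrary choice.

```python
import random
import statistics

# Percentile bootstrap CI for the median of an i.i.d. sample.
random.seed(0)
data = [random.gauss(10, 2) for _ in range(100)]    # simulated sample

B = 2000
boot = []
for _ in range(B):
    resample = [random.choice(data) for _ in data]  # sample with replacement
    boot.append(statistics.median(resample))

boot.sort()
lo, hi = boot[int(0.025 * B)], boot[int(0.975 * B) - 1]
print(f"sample median {statistics.median(data):.2f}, 95% bootstrap CI ({lo:.2f}, {hi:.2f})")
```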
The creamatocrit is a simple technique for estimating the lipid content of milk, widely adopted for clinical and research purposes. We evaluated the effect of long-term cryogenic storage on the creamatocrit for human milk.
Methods
Frozen and thawed milk specimens (n = 18) were subjected to the creamatocrit technique. The specimens were reanalyzed after long-term cryogenic storage (10 years at −70°C). The correlation between pre- and post-storage values was tested, and their differences were analyzed using a Bland–Altman plot.
Results
The pre- and post-storage values were highly correlated (r = 0.960, p < .0001). The Bland–Altman plot revealed a positive association between their differences and means (Pitman’s test r = 0.743, p < .001), suggesting nonconstant bias across the creamatocrit range. Long-term storage of human milk may thus introduce a subtle bias to the creamatocrit relative to pre-storage values. Further research should evaluate whether this bias is statistically correctable.
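For readers unfamiliar with the method, here is a short Python sketch of the core Bland–Altman computation, bias and 95% limits of agreement, on invented paired values; it is a generic illustration, not the study's analysis.

```python
import statistics

# Bland-Altman computation on invented pre-/post-storage pairs.
pre  = [40.1, 35.2, 50.3, 44.8, 38.5, 61.0]
post = [41.0, 34.9, 52.1, 45.5, 38.0, 63.2]

diffs = [b - a for a, b in zip(pre, post)]
means = [(a + b) / 2 for a, b in zip(pre, post)]

bias = statistics.mean(diffs)
sd = statistics.stdev(diffs)
print(f"bias {bias:.2f}, 95% limits of agreement ({bias - 1.96*sd:.2f}, {bias + 1.96*sd:.2f})")

# Pitman-style check for nonconstant bias: correlate differences with means.
n = len(diffs)
mx, my = statistics.mean(means), statistics.mean(diffs)
cov = sum((x - mx) * (y - my) for x, y in zip(means, diffs)) / (n - 1)
r = cov / (statistics.stdev(means) * statistics.stdev(diffs))
print(f"correlation of differences with means: r = {r:.3f}")
```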
During military operations, soldiers are required to successfully complete numerous physical and cognitive tasks concurrently. Understanding the typical variance in research tools that may be used to provide insight into the interrelationship between physical and cognitive performance is therefore highly important. This study assessed the inter-day variability of two military-specific cognitive assessments, a Military-Specific Auditory N-Back Task (MSANT) and a Shoot-/Don’t-Shoot Task (SDST), in 28 participants. Limits of agreement with 95% confidence intervals, standard error of the mean, and smallest detectable change were calculated to quantify the typical variance in task performance. All parameters within the MSANT and SDST demonstrated no mean difference between trial visits in either the seated or walking condition, with equivalency demonstrated for the majority of comparisons. Collectively, these data provide an indication of the typical variance in MSANT and SDST performance, while demonstrating that both assessments can be used during seated and walking conditions.
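The agreement statistics named above can be computed as in the Python sketch below, which uses one common set of formulas from reliability studies (the study's exact definitions may differ); the paired trial scores are invented.

```python
import statistics
from math import sqrt

# Agreement statistics for two repeated trials per participant (invented scores).
trial1 = [12.0, 15.5, 11.2, 14.8, 13.1, 16.0, 12.7, 14.2]
trial2 = [12.6, 15.0, 11.9, 14.1, 13.5, 16.4, 12.2, 14.6]

diffs = [b - a for a, b in zip(trial1, trial2)]
bias = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)  # 95% limits of agreement
sem = sd_diff / sqrt(2)                               # standard error of measurement
sdc = 1.96 * sqrt(2) * sem                            # smallest detectable change
print(f"LoA ({loa[0]:.2f}, {loa[1]:.2f}), SEM {sem:.2f}, SDC {sdc:.2f}")
```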