Statistics is the science of data collection and data analysis. We provide, in this chapter, a brief introduction to principles and techniques for data collection, traditionally divided into survey sampling and experimental design, each the subject of a rich literature. While most of this book is on mathematical theory, covering aspects of Probability Theory and Statistics, the collection of data is, by nature, much more practical, and it often requires domain-specific knowledge. Careful data collection is of paramount importance: data that were improperly collected can be completely useless, and unsalvageable by any technique of analysis. It is also worth keeping in mind that the collection phase is typically much more expensive than the analysis phase that ensues (e.g., clinical trials, car crash tests). The collection of data should therefore be carefully planned, according to well-established protocols or with expert advice. We discuss the basics of data collection in this chapter.
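As a minimal sketch of these two modes of collection, here is a toy illustration in Python; the sampling frame, sample size, and balanced two-arm design are all assumptions made for the example, not taken from the chapter.

```python
import random

# Hypothetical sampling frame: units labeled 0..999 (illustrative only).
population = list(range(1000))

# Survey sampling: a simple random sample of n = 50 units, drawn without
# replacement, so every unit has the same inclusion probability.
sample = random.sample(population, k=50)

# Experimental design: a completely randomized two-arm design assigning
# 20 subjects to treatment or control by shuffling balanced labels.
subjects = list(range(20))
labels = ["treatment"] * 10 + ["control"] * 10
random.shuffle(labels)
assignment = dict(zip(subjects, labels))
```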
This chapter introduces Kolmogorov’s probability axioms and related terminology and concepts such as outcomes and events, sigma-algebras, probability distributions and their properties.
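For reference, a standard statement of the axioms, with notation assumed here: P is a probability measure defined on a sigma-algebra A of subsets of a sample space Omega.

```latex
\begin{align*}
  \text{(i)}\quad   & \mathbb{P}(A) \ge 0 \quad \text{for every } A \in \mathcal{A}, \\
  \text{(ii)}\quad  & \mathbb{P}(\Omega) = 1, \\
  \text{(iii)}\quad & \mathbb{P}\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr)
                      = \sum_{i=1}^{\infty} \mathbb{P}(A_i)
                      \quad \text{for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{A}.
\end{align*}
```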
Technology has transformed how people interact with one another. According to two recent Pew Research Center surveys (2021a; 2021b), 97 per cent of United States adults have a cell phone, 85 per cent have a smartphone, 93 per cent use the Internet, and 77 per cent have broadband Internet access at home. The Internet has opened countless doors by providing unprecedented access to information and connecting people. While we know from laboratory research that context and collaboration can influence memory, little is known about how virtual collaboration affects memory and whether in-person studies generalise to virtual contexts. In this article, we discuss the challenges, value, and broader relevance of extending laboratory-based memory research to online platforms. In doing so, we report a virtual collaborative memory paradigm, where we examine individual and social remembering in a fully online, chat-based setting, and present data from two completely virtual experiments. In Experiment 1, online participants studied a word list and, in a chatroom, recalled the words either alone (as controls) or with two other participants. Surprisingly, collaborative inhibition – the robust finding of lower recall in collaborative groups than controls – disappeared. This outcome occurred because participants who worked alone recalled less than what we see in in-person studies. In Experiment 2, where instructions were modified and an experimenter was present, individual performance improved, resulting in collaborative inhibition. We reflect on the contextualised nature of this effect in online settings, for both collaborative and individual remembering, and on the implications for the study of memory in the digital age.
In this chapter we introduce and briefly discuss some properties of estimators and tests that make it possible to compare multiple methods addressing the same statistical problem. We discuss the notions of sufficiency and consistency, and various notions of optimality (including minimax optimality), both for estimators and for tests.
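Two of these notions admit compact standard statements; the following display (in notation assumed here, not the chapter's own formulation) is a reference sketch.

```latex
% Consistency: the estimator converges in probability to the estimand.
\[
  \hat{\theta}_n \xrightarrow{\ \mathbb{P}\ } \theta
  \qquad \text{as } n \to \infty.
\]
% Minimax optimality: the estimator attains the smallest worst-case risk R.
\[
  \sup_{\theta} R\bigl(\theta, \hat{\theta}^{*}\bigr)
    = \inf_{\hat{\theta}} \sup_{\theta} R\bigl(\theta, \hat{\theta}\bigr).
\]
```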
In a wide range of real-life situations, not one but several, even many hypotheses are to be tested, and not accounting for multiple inference can lead to a grossly incorrect analysis. In this chapter we look closely at this important issue, describing some pitfalls and presenting remedies that ‘correct’ for this multiplicity. Combination tests assess whether there is evidence against any of the null hypotheses being tested. Other procedures aim instead at identifying the null hypotheses that are not congruent with the data while controlling some notion of error rate.
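As a sketch of both flavors of procedure, assuming only NumPy and SciPy (the function names are ours): Fisher's combination test for the global null, plus Bonferroni (family-wise error rate) and Benjamini–Hochberg (false discovery rate) corrections.

```python
import numpy as np
from scipy import stats

def fisher_combination(pvals):
    """Fisher's combination test: under the global null, -2 * sum(log p_i)
    follows a chi-squared distribution with 2m degrees of freedom."""
    p = np.asarray(pvals, dtype=float)
    return stats.chi2.sf(-2 * np.log(p).sum(), df=2 * p.size)

def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha/m; controls the family-wise error rate."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest i with p_(i) <= alpha * i / m
        reject[order[:k + 1]] = True
    return reject
```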
Randomization was presented in a previous chapter as an essential ingredient in the collection of data, both in survey sampling and in experimental design. We argue here that randomization is the essential foundation of statistical inference: it leads to conditional inference in an almost canonical way, and it allows for causal inference; these are the two topics covered in this chapter.
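To make the link concrete, here is a minimal randomization (permutation) test in Python; the function name and the difference-in-means statistic are choices made for this sketch, not the chapter's.

```python
import numpy as np

def permutation_test(treated, control, n_perm=10_000, seed=0):
    """Randomization test for a difference in means: under the null of no
    treatment effect the group labels are exchangeable, so re-randomizing
    them generates the null distribution of the statistic."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    pooled = np.concatenate([treated, control])
    n_t = treated.size
    observed = treated.mean() - control.mean()
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        hits += abs(perm[:n_t].mean() - perm[n_t:].mean()) >= abs(observed)
    return hits / n_perm   # two-sided p-value
```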
Estimating a proportion is one of the most basic problems in statistics. Although basic, it arises in a number of important real-life situations. Examples include election polls, conducted to estimate the proportion of people who will vote for a particular candidate; quality control, where the proportion of defective items manufactured at a particular plant or assembly line needs to be monitored, and one may resort to statistical inference to avoid having to check every single item; and clinical trials, which are conducted in part to estimate the proportion of people who would benefit (or suffer serious side effects) from receiving a particular treatment. The fundamental model is that of Bernoulli trials. The binomial family of distributions plays a central role. Also discussed are sequential designs, which lead to negative binomial distributions.
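A minimal numerical sketch, assuming the usual normal approximation (the Wald interval) rather than anything specific to the chapter:

```python
import math

def proportion_estimate(successes, n, z=1.96):
    """Sample proportion from n Bernoulli trials, with an approximate
    95% Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (max(0.0, p_hat - half), min(1.0, p_hat + half))

# Hypothetical election poll: 540 of 1000 respondents favor the candidate.
print(proportion_estimate(540, 1000))   # 0.54, interval roughly (0.509, 0.571)
```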
The response of dual-purpose sorghum to anthracnose disease, together with its growth and yield, was evaluated at the Derashe and Arba Minch trial sites during March–June of 2018 and 2019. Five sorghum varieties and Rara (a local check) were arranged in a randomized complete block design with four replications. The variety Chelenko exhibited the tallest main-crop plant height (430 cm), while Dishkara was the tallest (196.65 cm) at ratoon-crop harvesting. Rara had the highest tiller number (main = 6.73, ratoon = 9.73) among the varieties. The Dishkara and Chelenko varieties produced 50% and 10% more dry biomass yield (DBY) than the overall mean DBY, while Konoda produced 40% less. Although the anthracnose infestation was highest on the varieties Konoda (percentage severity index [PSI] = 20.37%) and NTJ_2 (PSI = 32.19%), they produced significantly (p < .001) higher grain yield (3.89 t/ha) than the others. Under anthracnose pressure, the Chelenko and Dishkara varieties are suggested for dry-matter yield, while NTJ_2 is suggested for grain-yield production, in the study area and in similar agroecologies.
We consider an experiment that yields, as data, a sample of independent and identically distributed (real-valued) random variables with a common distribution on the real line. The estimation of the underlying mean and median is discussed at length, and bootstrap confidence intervals are constructed. Tests comparing the underlying distribution to a given distribution (e.g., the standard normal distribution) or to a family of distributions (e.g., the normal family of distributions) are introduced. Censoring, which is very common in some clinical trials, is briefly discussed.
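As one concrete instance of the bootstrap intervals mentioned above, here is a percentile bootstrap for the median; the confidence level and resample count are choices made for this sketch.

```python
import numpy as np

def bootstrap_ci_median(x, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap interval for the median: resample the data with
    replacement, recompute the median, and take empirical quantiles."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    medians = np.array([
        np.median(rng.choice(x, size=x.size, replace=True))
        for _ in range(n_boot)
    ])
    alpha = 1 - level
    return tuple(np.quantile(medians, [alpha / 2, 1 - alpha / 2]))
```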