This appendix discusses some of the datasets that are used as examples throughout the text. They are all available online from www.cambridge.org/9781107607590.
Although R can load files directly from the web, it is good practice to download the individual files to a local directory so that they can be used off-line. The files needed for this chapter are rutherford.dat, reynolds.txt, hipparcos.txt.gz and pedroni.dat.
Michelson's speed of light data
A. A. Michelson, known to students of physics for the famous Michelson–Morley experiment, made great advances in precision optical measurements, particularly the measurement of the speed of light. Here we shall use a set of 100 measurements of the speed of light in air, taken in summer 1879, originally published by Michelson (1882) and reproduced by Stigler (1977).
The first few data values are shown in Table B.1. Each of the numbers represents the speed recorded in one ‘run’ of the apparatus, in units of km s⁻¹. Each run was in fact an average of several individual measurements. The 100 numbers, which together form a sample, are divided into five groups of 20, each group labelled an ‘experiment’. The speed measurements are in principle continuous, but Michelson's data have been rounded to the nearest 10 km s⁻¹. Stigler (1977) applied Michelson's own corrections to the modern value of c to give a value of 299 734.5 km s⁻¹ for the speed of light in air.
About thirty years ago there was much talk that geologists ought only to observe and not theorize, and I well remember someone saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours. How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service!
Charles Darwin (letter to Henry Fawcett, 18 September 1861)
How do we know if the model fitted to our data is actually a good match to the data? And how do we quantify the uncertainty on the estimates of the model's parameters? The first question can be addressed by significance testing, and the second can be answered using confidence intervals.
A thought experiment
We shall return to the thought experiment begun in Chapter 5, drawing from a bag containing sweets of two colours, red and green. But now let us imagine that we do not know the proportions of red and green sweets. Instead, we are allowed to draw 10 times from the bag, with replacement. A simple hypothesis is that the bag contains equal numbers of red and green. What do we say about this hypothesis if we get eight greens from our 10 draws?
Let's assume the bag contains equal proportions, and find the probability of getting data like ours.
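The calculation can be sketched in a few lines of code. Because the draws are made with replacement, each draw is independent with the same chance of green, so the number of greens in 10 draws follows a binomial distribution. The sketch below is a minimal illustration in Python rather than the R used elsewhere in the text. The function names `binomial_pmf` and `upper_tail` are hypothetical helpers, and reading ‘data like ours’ as ‘eight or more greens’ is an assumption made here for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k greens in n independent draws,
    each with probability p of being green."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k_obs: int, n: int, p: float) -> float:
    """Probability of k_obs or more greens in n draws (assumed
    here to be what 'data like ours' means)."""
    return sum(binomial_pmf(k, n, p) for k in range(k_obs, n + 1))


n_draws = 10   # draws from the bag, with replacement
n_green = 8    # greens observed
p_green = 0.5  # hypothesis: equal numbers of red and green

p_exact = binomial_pmf(n_green, n_draws, p_green)  # 45/1024
p_tail = upper_tail(n_green, n_draws, p_green)     # 56/1024

print(f"P(exactly {n_green} greens)  = {p_exact:.4f}")  # 0.0439
print(f"P({n_green} or more greens) = {p_tail:.4f}")    # 0.0547
```

Under the equal-proportions hypothesis, eight or more greens would occur in roughly 5% of such experiments. Counting equally extreme results in the other direction (two or fewer greens) would double this figure.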
Science is not about certainty; it is about dealing rigorously with uncertainty. The tools for this are statistical. Statistics and data analysis are therefore an essential part of the scientific method and modern scientific practice, yet most students of physical science get little explicit training in statistical practice beyond basic error handling. The aim of this book is to provide the student with both the knowledge and the practical experience to begin analysing new scientific data, to allow progress to more advanced methods and to gain a more statistically literate approach to interpreting the constant flow of data provided by modern life.
More specifically, if you work through the book you should be able to accomplish the following.
• Explain aspects of the scientific method, types of logical reasoning and data analysis, and be able to critically analyse statistical and scientific arguments.
• Calculate and interpret common quantitative and graphical statistical summaries.
• Use and interpret the results of common statistical tests for difference and association, and straight line fitting.
• Use the calculus of probability to manipulate basic probability functions.
• Apply and interpret model fitting, using, e.g., least squares or maximum likelihood.
• Evaluate and interpret confidence intervals and significance tests.
Students have asked me whether this is a book about statistics or data analysis or statistical computing. My answer is that they are so closely connected it is difficult to untangle them, and so this book covers areas of all three.
This book was written because I could not find a suitable textbook to use as the basis of an undergraduate course on scientific inference, statistics and data analysis. Although there are good books on different aspects of introductory statistics, those intended for physicists seem to target a post-graduate audience and cover either too much material or too much detail for an undergraduate-level first course. By contrast, the ‘Intro to stats’ books aimed at a broader audience (e.g. biologists, social scientists, medics) tend to cover topics that are not so directly applicable for physical scientists. Books aimed at mathematics students are usually written in a style that is inaccessible to most physics students, while those aimed at science students often adopt a recipe-book style that provides ready-made solutions to common problems but develops little understanding along the way.
This book is different. It focuses on explaining and developing the practice and understanding of basic statistical analysis, concentrating on a few core ideas that underpin statistical and data analysis, such as the visual display of information, modelling using the likelihood function, and simulating random data. Key concepts are developed using several approaches: verbal exposition in the main text, graphical explanations, case studies drawn from some of history's great physics experiments, and example computer code to perform the necessary calculations. The result is that, after following all these approaches, the student should both understand the ideas behind statistical methods and have experience in applying them in practice.