Broadly speaking, statistics is the numerical study of a problem. Unless the problem can be reduced to quantities measurable on a scale or capable of being expressed as a number, it is impossible to make a statistical study of it. Statistics is not, however, just concerned with the counting of individuals or the measuring of items. Its ramifications are far wider than that, and include the study of what are the right figures to collect and the correct interpretation to be placed on them. The politician trying to envisage the effects of different forms of taxation needs to know the estimated yields of each form of taxation proposed; and the local town councillor must be able to appreciate how the local rate is split up into various headings. The citizen today is deluged with White papers, Economic surveys and a multitude of reports not only from the Government but from banks, insurance companies and industrial firms, all of which present, and argue from, a mass of statistical data. An understanding of statistics and the treatment of numerical data is therefore essential and only by a patient study of the part played by figures in such reports can good decisions be made and policies understood and, if need be, criticised. It is necessary to recognise, moreover, the power and the limitations of statistical arguments, to learn how to obtain the full information from a set of figures and how to avoid the pitfalls which await the unwary.
The previous chapter was concerned with tests designed to determine, on the basis of a sample, whether the population mean was equal to some specified amount. An alternative test is required when it is desired to investigate whether the variability of the observations in a population has some specified value, on the basis of a sample of n observations from the population. Thus a local corporation might find it more economical to change all its streetlamp bulbs at a fixed time instead of waiting for individual complaints that a lamp has gone out and then sending a man to replace it. If this policy is to be successful, however, the variability of the bulbs must be small, because it would be wasteful to change all the bulbs if only a few were burnt out and many burning hours left in the rest. On the other hand, to delay changing them would result in angry protests from the residents in the streets concerned. Hence the corporation would require bulbs that have little variation about the nominal length of life. A sample of bulbs could be examined from time to time by burning them to extinction in order to see whether the variability was remaining constant.
Suppose that in the past the average length of life of the bulbs has turned out to be 1,608 burning-hours with a standard deviation of 381 hr.
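As a hedged illustration of how such a check might be carried out, the sketch below computes the usual chi-square statistic for testing a specified standard deviation, (n - 1)s²/σ₀², where s² is the sample variance. The text above does not say which test the chapter goes on to develop, so the choice of statistic is an assumption; the function name `variance_test_statistic` and the sample lifetimes are hypothetical and serve only to show the arithmetic. The hypothesised value of 381 hr is the one quoted above.

```python
from statistics import variance

def variance_test_statistic(sample, sigma0):
    """Return (n - 1) * s^2 / sigma0^2 for a sample and a hypothesised s.d."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s2 = variance(sample)  # sample variance with divisor n - 1
    return (n - 1) * s2 / sigma0 ** 2

# Hypothetical lifetimes in burning-hours (illustrative only, not from the text).
lifetimes = [1200, 1850, 1430, 2010, 1675, 990, 1720, 1560, 2240, 1405]

stat = variance_test_statistic(lifetimes, sigma0=381)
df = len(lifetimes) - 1
print(f"statistic = {stat:.2f} on {df} degrees of freedom")
```

The resulting value would be referred to a chi-square table with n - 1 degrees of freedom; an unusually large value suggests that the variability has increased, an unusually small one that it has decreased.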
The reader who has conscientiously worked through the preceding chapter and its examples will have come across many charts and diagrams of varying shapes and forms. In order to refer easily to the form taken by such charts and diagrams, two technical terms will now be introduced.
Frequency distribution. The first term, ‘frequency distribution’, was briefly introduced in chapter 3 and can be illustrated by stating that a table such as table 3.9, where the numbers of schoolboys with heights in different categories are recorded, gives the frequency distribution of the heights of the 100 schoolboys. Thus a frequency distribution merely gives the frequency with which individuals fall into a number of different categories. The interval chosen for the classification is referred to as the group interval, and the frequency in any particular group interval is the group frequency. The manner in which the group frequencies are distributed over the group intervals is referred to as the frequency distribution of the variable.
Table 4.1 is a frequency distribution of the number of incomes that lie between £2,000 and £10,000 in the year ending 5 April 1965. The group intervals in this case do not remain constant over the whole range of the variable (income) but, nevertheless, the resulting distribution is still a frequency distribution. The exercises at the end of chapter 4 give numerous frequency distributions of variables, such as deaths per day, the tensile strength of iron castings, the ages of the population, and the degree of cloudiness at Greenwich.
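The construction of a frequency distribution can be sketched in a few lines. The example below counts how many observations fall into each group interval and, like table 4.1, allows the intervals to be of unequal width. The function name `frequency_distribution`, the interval boundaries and the income figures are all hypothetical and are not taken from the book's tables.

```python
from bisect import bisect_right

def frequency_distribution(values, edges):
    """Count the values falling in each group interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Values outside the overall range are ignored.
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts

# Hypothetical incomes in pounds and unequal group intervals (illustrative only).
incomes = [2100, 2450, 2900, 3200, 3750, 4100, 4800, 5600, 7200, 9500]
edges = [2000, 3000, 5000, 10000]

for lower, upper, freq in zip(edges, edges[1:], frequency_distribution(incomes, edges)):
    print(f"{lower:>6} and under {upper:>6}: {freq}")
```

Each printed line gives one group interval and its group frequency; taken together the lines form the frequency distribution of the variable.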
Some of the basic ideas of sampling were introduced in chapter 8, but these ideas can now be taken a few stages further and linked in with a wider range of practical problems. The traditional method of acquiring knowledge about an aggregate of individuals (this aggregate is also sometimes referred to as a population or universe) is to enumerate them all. This is especially common in the economic field, for example, censuses of population, or of imports, or of production, or of distribution. These enumerations depend upon a complete count of all the individual human beings concerned, or the goods passing through Liverpool docks, etc. Until relatively recently governments have been entirely census-minded. The major statistics of agriculture, production, distribution, of the labour force and of unemployment have all been based on the census approach. Even for prices, where an exhaustive enumeration of each transaction is impossible to carry out, the tendency has always been to include as much as possible and to spread the cover of an index number over as many commodities as could conveniently be included. The index of retail prices that is published monthly provides an example of this tendency.
But censuses are expensive, even in a small country like our own. A population census is not normally attempted more than once every ten years. What is just as bad, a census may defeat its own object by resulting in such a mass of information that the collectors are snowed under.
During recent years the importance of the subject of statistics has become increasingly recognised and it is now studied not only by statistical specialists but by students from many different disciplines. It has also been recognised that the subject is a suitable one for all levels of educational activity in that it can provide, at quite an early stage, a unifying link between the theoretical and practical sides of many forms of scientific training. This book is an attempt to put across the main principles of statistical methods to those students who are fundamentally interested in the practical applications of the subject, but are not so much concerned with the philosophical bases of the concepts used.
The choice of what to include and what to omit has been difficult, and this second edition of the book includes some new material whilst omitting some of the earlier material. Primarily the aim has been to give a selection of the more commonly used tools and not to provide a complete set of statistical tools for use in each and every situation. The student is then left in the position where he should be able to appreciate what further tools are needed and he can usefully profit from a reading of the more advanced books on the subject that are available. To have included every technique in common use would have lengthened the present book very considerably and destroyed a greater part of its planned utility.
The usefulness of probability distributions was well established by such early mathematicians as Laplace (1749–1827) and Gauss (1777–1855). The idea that frequency distributions could be explained as a practical consequence of the laws of probability applied to everyday matters seized the imagination of the pioneers of mathematical statistics. Since a probability distribution is by its nature, in most instances, composed of an infinite number of items, and frequency distributions by their nature are composed of a finite number of items, these latter had to be thought of as samples from an underlying theoretical probability distribution. The problem that then arose was how to describe a probability distribution given only a sample from it. The mathematical difficulties of this seemed immense and such steps as were taken needed experimental verification to give the early workers confidence. Thus was born the sampling experiment. A close approximation to a probability distribution was created, samples were taken, combined and transformed in suitable ways and the resulting frequency chart of sampled values compared with the predictions of theory. Although mathematical techniques have developed to levels of sophistication that would astonish earlier workers, the value of sampling experiments in mathematical statistics still remains.
There are two main types of distribution from which samples are required. The first is where the statistical variable takes a continuous form, giving rise to a continuous probability density function; the second is where the statistical variable can take a discrete number of values.
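A sampling experiment of the kind described can be imitated with a pseudo-random number generator. The sketch below is an illustration under stated assumptions, not the book's own procedure: it draws repeated samples from a continuous uniform distribution on (0, 1), forms the mean of each sample, and compares the mean and variance of those sample means with the theoretical values 1/2 and 1/(12n). The function name `sampling_experiment`, the choice of distribution, the sample size and the seed are all hypothetical.

```python
import random
from statistics import mean, variance

def sampling_experiment(sample_size, n_samples, seed=0):
    """Return the means of n_samples samples drawn from the uniform(0, 1) distribution."""
    rng = random.Random(seed)  # fixed seed so that the experiment can be repeated
    return [
        mean([rng.random() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

n = 5
sample_means = sampling_experiment(sample_size=n, n_samples=5000)

print(f"observed mean of sample means: {mean(sample_means):.4f} (theory 0.5000)")
print(f"observed variance:             {variance(sample_means):.5f} (theory {1 / (12 * n):.5f})")
```

Close agreement between the observed and theoretical figures is what gave the early workers confidence in their mathematics; a discrete distribution could be sampled in the same way by replacing `rng.random()` with, say, `rng.randint(1, 6)`.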
This chapter will indicate how a knowledge of the theory of probability as outlined in the previous chapter can be used to make deductions as to the shape of the frequency distributions produced when certain types of experiment are repeated. The classes of experiments considered have two main characteristics in common:
(i) Each experiment is independent of the result of the preceding experiments. Thus the fact that a coin, when tossed, comes down heads does not affect the chance of the coin coming down heads at the succeeding tossing.
(ii) The quantity studied is the presence or absence of some characteristic; that is, there are only two classes to be considered and every event falls into one or other of these. These may be, for example, the heads or tails for a coin tossing experiment, under 6 ft. or over 6 ft. in height for men drawn from some population, or the presence or absence of some defect in articles made by a machine.
In the previous chapter it was found that if repeated independent drawings are made from the population under consideration, the proportion of individuals in the drawings possessing the characteristic concerned will approach the proportion in the whole population possessing it. But in all sets of drawings there will be some variation from the exact proportion, these variations depending on the size of the sets of drawings and the frequency with which the particular characteristic occurs.
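The dependence of this variation on the size of the set and on the frequency of the characteristic can be made concrete. Under the two conditions listed above, the number of individuals possessing the characteristic in a set of n drawings follows the standard binomial distribution; the abstract itself does not name it, so its use here is an assumption about where the chapter leads. The function names `binomial_pmf` and `proportion_sd` and the values of n and p are hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """Probabilities of 0, 1, ..., n occurrences in n independent drawings."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def proportion_sd(n, p):
    """Standard deviation of the observed proportion in n drawings."""
    return sqrt(p * (1 - p) / n)

# Chance of each possible number of heads in 4 tosses of a fair coin.
for k, prob in enumerate(binomial_pmf(4, 0.5)):
    print(f"{k} heads: {prob:.4f}")

# The spread of the observed proportion shrinks as the set of drawings grows,
# and is smaller for a rare characteristic (p = 0.1) than for p = 0.5.
for n in (10, 100, 1000):
    print(n, round(proportion_sd(n, 0.5), 4), round(proportion_sd(n, 0.1), 4))
```

The second loop shows both effects mentioned in the text: increasing the number of drawings narrows the variation about the population proportion, and the size of that variation also depends on how common the characteristic is.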
In the opening stages of a statistical inquiry the investigator will need to collect a large amount of raw material or data from which to extract the quantities relevant to the purposes that he has in mind. Thus the market research investigator may have to collect data on consumer preferences after four different sets of advertisements have appeared; the botanist may have to spend several days in a grassland area counting the number of shoots of Solidago glaberima per square foot; the traffic investigator may have to count traffic at a busy crossing; the agriculturalist may have to collect data concerning the quantity of fertiliser applied to wheat crops on all the farms in Sussex. The method and care given to the collection of this raw material is important. The strength of a chain lies in the strength of its weakest link and it is useless to reach intricate conclusions from insufficient or inaccurate data. Before making any form of elaborate analysis it is essential to know the limitations and accuracy of statistical material and to be aware of the kind of errors that can arise. In this chapter two of the most common sources of data will be considered in some detail. These are:
(i) Questionnaires. The data here are obtained by forms designed by the statistician and completed by the general public.
(ii) Observations. The data here are collected by the investigator himself recording the results of a series of observations but not necessarily relying on the public at large for his information.