Traditional courses for engineers in filtering and signal processing have been based on elementary linear algebra, Hilbert space theory and calculus. However, the key objective underlying such procedures is the (recursive) estimation of indirectly observed states given observed data; that is, one is computing conditional expected values given the observations. The correct setting for conditional expectation is a measurable space equipped with a probability measure, and the first object of this book is to provide an overview of the required measure theory. Secondly, conditional expectation, as an inverse operation, is best formulated as a form of Bayes’ theorem. A mathematically pleasing presentation of Bayes’ theorem is to consider processes as being defined initially under a “reference probability”: an idealized probability under which all the observations are independent and identically distributed, and hence a much more convenient measure under which to work. A suitably defined change of measure then transforms the distribution of the observations to its real-world form. This setting for the derivation of the estimation and filtering results enables more general results to be obtained in a transparent way.
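The recursive estimation of indirectly observed states described above can be sketched with a minimal discrete-state Bayes filter. The two-state chain, the transition matrix A and the observation likelihoods B below are illustrative assumptions for the sketch, not anything taken from the book.

```python
import numpy as np

# Hypothetical two-state hidden chain; all numbers are assumed for illustration.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # A[i, j] = P(next state j | current state i)
B = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # B[i, y] = P(observation y | state i)

def filter_step(prior, obs):
    """One recursive update: predict with A, then correct via Bayes' theorem."""
    predicted = prior @ A                     # time update
    unnormalised = predicted * B[:, obs]      # measurement update
    return unnormalised / unnormalised.sum()  # normalise to a probability

belief = np.array([0.5, 0.5])                 # flat initial prior
for y in [0, 0, 1]:                           # a short observation record
    belief = filter_step(belief, y)
```

Each pass through `filter_step` is exactly the conditional expectation of the state given the observations so far; the measure-theoretic machinery in the book justifies this recursion in far greater generality.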
I cannot conceal the fact here that in the specific application of these rules, I foresee many things happening which can cause one to be badly mistaken if he does not proceed cautiously.
James Bernoulli (1713, Part 4, Chapter III)
I. J. Good (1950) has shown how we can use probability theory backwards to measure our own strengths of belief about propositions. For example, how strongly do you believe in extrasensory perception?
Extrasensory perception
What probability would you assign to the hypothesis that Mr Smith has perfect extrasensory perception? More specifically, that he can guess right every time which number you have written down. To say zero is too dogmatic. According to our theory, this means that we are never going to allow the robot's mind to be changed by any amount of evidence, and we don't really want that. But where is our strength of belief in a proposition like this?
Our brains work pretty much the way this robot works, but we have an intuitive feeling for plausibility only when it's not too far from 0 db. We get fairly definite feelings that something is more than likely to be so or less than likely to be so. So the trick is to imagine an experiment. How much evidence would it take to bring your state of belief up to the place where you felt very perplexed and unsure about it?
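The thought experiment above can be made arithmetic on the decibel scale of evidence, e(p) = 10 log10(p/(1-p)). The prior of -100 db used below is an assumed figure purely for illustration.

```python
import math

def evidence_db(p):
    """Evidence, in decibels, corresponding to probability p."""
    return 10 * math.log10(p / (1 - p))

def probability(e_db):
    """Inverse map: probability corresponding to e_db decibels of evidence."""
    odds = 10 ** (e_db / 10)
    return odds / (1 + odds)

# Each correct guess of a digit (chance success probability 1/10) multiplies
# the odds on ESP by 10, i.e. adds 10 * log10(1 / (1/10)) = 10 db of evidence.
db_per_guess = 10 * math.log10(1 / (1 / 10))

# Assumed prior of -100 db (p = 10^-10): how many consecutive correct
# guesses would bring the belief up to 0 db (p = 1/2), the perplexed point?
prior_db = -100
guesses_needed = -prior_db / db_per_guess
```

Locating the prior is then a matter of imagining how many correct guesses it would take before one felt genuinely unsure, and reading the decibels off backwards.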
I protest against the use of infinite magnitude as something accomplished, which is never permissible in mathematics. Infinity is merely a figure of speech, the true meaning being a limit.
C. F. Gauss
The term ‘paradox’ appears to have several different common meanings. Székely (1986) defines a paradox as anything which is true but surprising. By that definition, every scientific fact and every mathematical theorem qualifies as a paradox for someone. We use the term in almost the opposite sense; something which is absurd or logically contradictory, but which appears at first glance to be the result of sound reasoning. Not only in probability theory, but in all mathematics, it is the careless use of infinite sets, and of infinite and infinitesimal quantities, that generates most paradoxes.
In our usage, there is no sharp distinction between a paradox and an error. A paradox is simply an error out of control; i.e. one that has trapped so many unwary minds that it has gone public, become institutionalized in our literature, and been taught as truth. It might seem incredible that such a thing could happen in an ostensibly mathematical field; yet we can understand the psychological mechanism behind it.
How do paradoxes survive and grow?
As we stress repeatedly, from a false proposition – or from a fallacious argument that leads to a false proposition – all propositions, true and false, may be deduced.
We have seen, in Chapter 7, how the great mathematician Leonhard Euler was unable to solve the problem of estimating eight orbital parameters from 75 discrepant observations of the past positions of Jupiter and Saturn. Thinking in terms of deductive logic, he could not even conceive of the principles by which such a problem could be solved. But, 38 years later, Laplace, thinking in terms of probability theory as logic, was in possession of exactly the right principles to resolve the great inequality of Jupiter and Saturn. In this chapter we develop the solution as it would be done today by considering a simpler problem, estimating two parameters from three observations. But our general solution, in matrix notation, will include Laplace's solution automatically.
Reduction of equations of condition
Suppose we wish to determine the charge e and mass m of the electron. The Millikan oil-drop experiment measures e directly. The deflection of an electron beam in a known electromagnetic field measures the ratio e/m. The deflection of an electron toward a metal plate due to attraction of image charges measures e²/m.
From the results of any two of these experiments we can calculate values of e and m. But all the measurements are subject to error, and the values of e and m obtained from different experiments will not agree. Yet each of the measurements does contain some information relevant to our question that is not contained in the others.
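One way to use all three measurements at once, in the spirit of the matrix solution developed in this chapter, is to take logarithms so that each experiment becomes linear in the unknowns x = (log e, log m), and then solve the overdetermined "equations of condition" by least squares. The numerical measurement values y below are invented placeholders, not real data.

```python
import numpy as np

# Design matrix for the three experiments, linear in (log e, log m):
#   Millikan drop:     log e           = y1
#   beam deflection:   log e - log m   = y2
#   image charges:   2 log e - log m   = y3
A = np.array([[1.0,  0.0],
              [1.0, -1.0],
              [2.0, -1.0]])

# Hypothetical noisy log-measurements (assumed values for illustration only).
y = np.array([-18.3, 11.2, -7.0])

# Least-squares solution of the overdetermined system of conditions.
x, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
log_e, log_m = x
```

Because the three rows of A are not consistent with any single (log e, log m) when the data are noisy, the least-squares estimate is the compromise that uses the information in all three experiments rather than discarding one.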
In several previous discussions we inserted parenthetic remarks to the effect that ‘there is still an essential point missing here, which will be supplied when we take up decision theory’. However, in postponing the topic until now, we have not deprived the reader of a needed technical tool, because the solution of the decision problem was, from our viewpoint, so immediate and intuitive that we did not need to invoke any underlying formal theory.
Inference vs. decision
The situation of appraising inference vs. decision arose as soon as we started applying probability theory to our first problem. When we illustrated the use of Bayes' theorem by sequential testing in Chapter 4, we noted that there is nothing in probability theory per se which could tell us where to put the critical levels at which the robot changes its decision: whether to accept the batch, reject it, or make another test. The location of these critical levels obviously depends in some way on value judgments as well as on probabilities; what are the consequences of making wrong decisions, and what are the costs of making further tests?
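The role of value judgments can be made concrete with a toy expected-loss calculation; the loss table below is entirely hypothetical, chosen only to show how the critical levels of the sequential test emerge from losses combined with probabilities.

```python
# Hypothetical loss table: decision -> (loss if batch is good, loss if bad).
LOSS = {
    "accept": (0.0, 100.0),   # accepting a bad batch is costly
    "reject": (15.0, 0.0),    # rejecting a good batch wastes it
    "test":   (3.0, 3.0),     # another test costs the same either way
}

def best_decision(p_bad):
    """Choose the decision minimising expected loss, given P(bad) = p_bad."""
    def expected_loss(d):
        loss_good, loss_bad = LOSS[d]
        return (1 - p_bad) * loss_good + p_bad * loss_bad
    return min(LOSS, key=expected_loss)
```

Probability theory supplies p_bad; only the loss table, a value judgment, determines where the accept/test/reject boundaries fall. Changing the cost of a further test moves the critical levels, with no change to the inference itself.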