Frank Ramsey's essay “Truth and Probability” represents the culmination of a long tradition at Cambridge of philosophical investigation into the foundations of probability and inductive inference; and in order to appreciate fully both the intellectual context within which Ramsey wrote and the major advance that his essay represents, it is essential to have some understanding of his predecessors at Cambridge. One of the primary purposes of this paper is to give the reader some sense of that background, identifying some of the principal personalities involved and the nature of their respective contributions; the other is to discuss just how successful Ramsey was in his attempt to construct a logic of partial belief.
“Truth and Probability” has a very simple structure. The first two sections of the essay discuss the two most important rival theories concerning the nature of probability that were current in Ramsey's day, those of Venn and Keynes. The next section then presents the alternative advocated by Ramsey, the simultaneous axiomatization of utility and probability as the expression of a consistent set of preferences. The fourth section then argues the advantages of this approach; and the last section confronts the problem of inductive inference central to English philosophy since the time of Hume. The present paper has a structure parallel to Ramsey's; each section discusses the corresponding section in Ramsey's paper.
ELLIS AND VENN
Ramsey's essay begins by disposing of the frequentist and credibilist positions; that is, the two positions advocated by his Cambridge predecessors John Venn (in The Logic of Chance, 1866) and John Maynard Keynes (in his Treatise on Probability, 1921); and thus the two positions certain to be known to his audience.
Abstract. The fiducial argument arose from Fisher's desire to create an inferential alternative to inverse methods. Fisher discovered such an alternative in 1930, when he realized that pivotal quantities permit the derivation of probability statements concerning an unknown parameter independent of any assumption concerning its a priori distribution.
The original fiducial argument was virtually indistinguishable from the confidence approach of Neyman, although Fisher thought its application should be restricted in ways reflecting his view of inductive reasoning, thereby blending an inferential and a behaviorist viewpoint. After Fisher attempted to extend the fiducial argument to the multiparameter setting, this conflict surfaced, and he then abandoned the unconditional sampling approach of his earlier papers for the conditional approach of his later work.
Initially unable to justify his intuition about the passage from a probability assertion about a statistic (conditional on a parameter) to a probability assertion about a parameter (conditional on a statistic), Fisher thought in 1956 that he had finally discovered the way out of this enigma with his concept of recognizable subset. But the crucial argument for the relevance of this concept was founded on yet another intuition – one which, now clearly stated, was later demonstrated to be false by Buehler and Feddersen in 1963.
Key words and phrases: Fiducial inference, R. A. Fisher, Jerzy Neyman, Maurice Bartlett, Behrens-Fisher problem, recognizable subsets.
Abstract. R. A. Fisher's account of the decline of inverse probability methods during the latter half of the nineteenth century identifies Boole, Venn and Chrystal as the key figures in this change. Careful examination of these and other writings of the period, however, reveals a different and much more complex picture. Contrary to Fisher's account, inverse methods – at least in modified form – remained theoretically respectable until the 1920s, when the work of Fisher and then Neyman caused their eclipse for the next quarter century.
Key words and phrases: R. A. Fisher, inverse probability, history of statistics.
R. A. Fisher was a lifelong critic of inverse probability. In the second chapter of his last book, Statistical Methods and Scientific Inference (1956), Fisher traced the history of what he saw as the increasing disaffection with Bayesian methods that arose during the second half of the nineteenth century. Fisher's account is one of the few that covers this neglected period in the history of probability, in effect taking up where Todhunter (1865) left off, and has often been cited (e.g., Passmore, 1968, page 550, n. 7 and page 551, n. 15; de Finetti, 1972, page 159; Shafer, 1976, page 25). The picture portrayed is one of gradual progress, the logical lacunae and misconceptions of the inverse methods being steadily recognized and eventually discredited.
But on reflection Fisher's portrait does not appear entirely plausible.
Abstract. A major difficulty for currently existing theories of inductive inference involves the question of what to do when novel, unknown, or previously unsuspected phenomena occur. In this paper one particular instance of this difficulty is considered, the so-called sampling of species problem.
The classical probabilistic theories of inductive inference due to Laplace, Johnson, de Finetti, and Carnap adopt a model of simple enumerative induction in which there are a prespecified number of types or species which may be observed. But, realistically, this is often not the case. In 1838 the English mathematician Augustus De Morgan proposed a modification of the Laplacian model to accommodate situations where the possible types or species to be observed are not assumed to be known in advance; but he did not advance a justification for his solution.
In this paper a general philosophical approach to such problems is suggested, drawing on work of the English mathematician J. F. C. Kingman. It then emerges that the solution advanced by De Morgan has a very deep, if not totally unexpected, justification. The key idea is that although “exchangeable” random sequences are the right objects to consider when all possible outcome-types are known in advance, exchangeable random partitions are the right objects to consider when they are not. The result turns out to be very satisfying. The classical theory has several basic elements: a representation theorem for the general exchangeable sequence (the de Finetti representation theorem), a distinguished class of sequences (those employing Dirichlet priors), and a corresponding rule of succession (the continuum of inductive methods).
This chapter introduces the notion of a ring, more specifically, a commutative ring with unity. The theory of rings provides a useful conceptual framework for reasoning about a wide class of interesting algebraic structures. Intuitively speaking, a ring is an algebraic structure with addition and multiplication operations that behave like we expect addition and multiplication should. While there is a lot of terminology associated with rings, the basic ideas are fairly simple.
Definitions, basic properties, and examples
Definition 9.1. A commutative ring with unity is a set R together with addition and multiplication operations on R, such that:
(i) the set R under addition forms an abelian group, and we denote the additive identity by 0R;
(ii) multiplication is associative; that is, for all a, b, c ∈ R, we have a(bc) = (ab)c;
(iii) multiplication distributes over addition; that is, for all a, b, c ∈ R, we have a(b + c) = ab + ac and (b + c)a = ba + ca;
(iv) there exists a multiplicative identity; that is, there exists an element 1R ∈ R, such that 1R · a = a = a · 1R for all a ∈ R;
(v) multiplication is commutative; that is, for all a, b ∈ R, we have ab = ba.
There are other, more general (and less convenient) types of rings: one can drop properties (iv) and (v) and still have what is called a ring.
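To make Definition 9.1 concrete, here is a minimal Python sketch (ours, not the text's; the class name Zmod is illustrative) of the ring of integers modulo m, together with spot-checks of properties (i)–(v) for a few elements:

class Zmod:
    """Elements of the ring of integers modulo m."""

    def __init__(self, value, m):
        assert m > 0
        self.m = m
        self.value = value % m

    def __add__(self, other):
        assert self.m == other.m
        return Zmod(self.value + other.value, self.m)

    def __mul__(self, other):
        assert self.m == other.m
        return Zmod(self.value * other.value, self.m)

    def __eq__(self, other):
        return self.m == other.m and self.value == other.value

    def __repr__(self):
        return f"{self.value} (mod {self.m})"

# Spot-check the ring axioms for a few elements of the ring with m = 7:
m = 7
a, b, c = Zmod(3, m), Zmod(5, m), Zmod(6, m)
zero, one = Zmod(0, m), Zmod(1, m)
assert a + zero == a                     # (i): 0_R is the additive identity
assert a * (b * c) == (a * b) * c        # (ii): associativity
assert a * (b + c) == a * b + a * c      # (iii): distributivity
assert one * a == a and a * one == a     # (iv): multiplicative identity
assert a * b == b * a                    # (v): commutativity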
This chapter introduces the notion of an abelian group. This is an abstraction that models many different algebraic structures, and yet despite the level of generality, a number of very useful results can be easily obtained.
Definitions, basic properties, and examples
Definition 8.1. An abelian group is a set G together with a binary operation ⋆ on G such that
(i) for all a, b, c ∈ G, a ⋆ (b ⋆ c) = (a ⋆ b) ⋆ c (i.e., ⋆ is associative),
(ii) there exists e ∈ G (called the identity element) such that for all a ∈ G, a ⋆ e = a = e ⋆ a,
(iii) for all a ∈ G there exists a′ ∈ G (called the inverse of a) such that a ⋆ a′ = e = a′ ⋆ a,
(iv) for all a, b ∈ G, a ⋆ b = b ⋆ a (i.e., ⋆ is commutative).
While there is a more general notion of a group, which may be defined simply by dropping property (iv) in Definition 8.1, we shall not need this notion in this text. The restriction to abelian groups helps to simplify the discussion significantly. Because we will only be dealing with abelian groups, we may occasionally simply say “group” instead of “abelian group.”
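As a quick sanity check (ours, not the text's), the following Python snippet verifies the four properties of Definition 8.1 by brute force for the group of integers modulo n under addition; the function name is illustrative:

def is_abelian_group_Zn(n):
    """Check Definition 8.1 for the integers mod n under addition."""
    elems = range(n)
    op = lambda a, b: (a + b) % n
    e = 0  # the additive identity
    for a in elems:
        assert op(a, e) == a and op(e, a) == a           # (ii) identity
        assert any(op(a, x) == e for x in elems)         # (iii) inverses
        for b in elems:
            assert op(a, b) == op(b, a)                  # (iv) commutativity
            for c in elems:
                assert op(a, op(b, c)) == op(op(a, b), c)  # (i) associativity
    return True

assert is_abelian_group_Zn(6)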
In this chapter, we discuss Euclid's algorithm for computing greatest common divisors. It turns out that Euclid's algorithm has a number of very nice properties, and applications that extend far beyond the computation of greatest common divisors.
The basic Euclidean algorithm
We consider the following problem: given two non-negative integers a and b, compute their greatest common divisor, gcd(a, b). We can do this using the well-known Euclidean algorithm, also called Euclid's algorithm.
The basic idea of Euclid's algorithm is the following. Without loss of generality, we may assume that a ≥ b ≥ 0. If b = 0, then there is nothing to do, since in this case, gcd(a, 0) = a. Otherwise, if b > 0, we can compute the integer quotient q ≔ ⌊a/b⌋ and remainder r ≔ a mod b, where 0 ≤ r < b. From the equation

a = bq + r,
it is easy to see that if an integer d divides both b and r, then it also divides a; likewise, if an integer d divides a and b, then it also divides r. From this observation, it follows that gcd(a, b) = gcd(b, r), and so by performing a division, we reduce the problem of computing gcd(a, b) to the “smaller” problem of computing gcd(b, r).
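The reduction above translates directly into code. Here is a minimal Python sketch of the basic Euclidean algorithm (an illustration, not the text's own presentation):

def euclid_gcd(a, b):
    """Return gcd(a, b) for non-negative integers a and b."""
    while b > 0:
        a, b = b, a % b   # replace (a, b) by (b, r), where r = a mod b
    return a

assert euclid_gcd(100, 35) == 5
assert euclid_gcd(7, 0) == 7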
In this chapter, we discuss basic definitions and results concerning matrices. We shall start out with a very general point of view, discussing matrices whose entries lie in an arbitrary ring R. Then we shall specialize to the case where the entries lie in a field F, where much more can be said.
One of the main goals of this chapter is to discuss “Gaussian elimination,” which is an algorithm that allows us to efficiently compute bases for the image and kernel of an F-linear map.
In discussing the complexity of algorithms for matrices over a ring R, we shall treat R as an “abstract data type,” so that the running times of algorithms will be stated in terms of the number of arithmetic operations in R. If R is a finite ring, such as ℤₘ, we can immediately translate this into a running time on a RAM (in later chapters, we will discuss other finite rings and efficient algorithms for doing arithmetic in them).
If R is, say, the field of rational numbers, a complete running time analysis would require an additional analysis of the sizes of the numbers that appear in the execution of the algorithm. We shall not attempt such an analysis here—however, we note that all the algorithms discussed in this chapter do in fact run in polynomial time when R = ℚ, assuming we represent rational numbers as fractions in lowest terms. Another possible approach for dealing with rational numbers is to use floating point approximations.
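As a concrete illustration of the rational-number case, here is a Python sketch (ours, not the text's algorithm verbatim) of Gaussian elimination over the rationals using exact fractions; it reduces a matrix to row echelon form, the form from which bases for the image and kernel of the corresponding linear map can be read off:

from fractions import Fraction

def row_echelon(A):
    """Reduce a list-of-lists matrix over the rationals to row echelon form, in place."""
    rows, cols = len(A), len(A[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        # find a row at or below r with a nonzero entry in column c
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        # eliminate column c from all rows below the pivot row
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= factor * A[r][j]
        r += 1
    return A

A = [[Fraction(x) for x in row]
     for row in [[2, 1, 1], [4, 3, 3], [8, 7, 9]]]
row_echelon(A)
assert A[1][0] == 0 and A[2][0] == 0 and A[2][1] == 0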
This chapter concerns itself with the question: how many primes are there? In Chapter 1, we proved that there are infinitely many primes; however, we are interested in a more quantitative answer to this question; that is, we want to know how “dense” the prime numbers are.
This chapter has a bit more of an “analytical” flavor than other chapters in this text. However, we shall not make use of any mathematics beyond that of elementary calculus.
Chebyshev's theorem on the density of primes
The natural way of measuring the density of primes is to count the number of primes up to a bound x, where x is a real number. For a real number x ≥ 0, the function π(x) is defined to be the number of primes up to x. Thus, π(1) = 0, π(2) = 1, π(7.5) = 4, and so on. The function π is an example of a “step function,” that is, a function that changes values only at a discrete set of points. It might seem more natural to define π only on the integers, but it is traditional to define it over the real numbers (and there are some technical benefits in doing so).
Let us first take a look at some values of π(x). Table 5.1 shows values of π(x) for x = 10³, 10⁶, …, 10¹⁸; that is, x = 10³ⁱ for i = 1, …, 6.
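For small x, the values of π(x) quoted above can be reproduced directly. The following Python sketch (not from the text) counts the primes up to x with a sieve of Eratosthenes:

def prime_pi(x):
    """Return the number of primes p <= x, for a real number x >= 0."""
    n = int(x)
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for q in range(p * p, n + 1, p):
                is_prime[q] = False
    return sum(is_prime)

assert prime_pi(1) == 0
assert prime_pi(2) == 1
assert prime_pi(7.5) == 4   # the primes 2, 3, 5, 7
assert prime_pi(1000) == 168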
In this chapter, we review standard asymptotic notation, introduce the formal computational model we shall use throughout the rest of the text, and discuss basic algorithms for computing with large integers.
Asymptotic notation
We review some standard notation for relating the rate of growth of functions. This notation will be useful in discussing the running times of algorithms, and in a number of other contexts as well.
Suppose that x is a variable taking non-negative integer or real values, and let g denote a real-valued function in x that is positive for all sufficiently large x; also, let f denote any real-valued function in x. Then
f = O(g) means that |f(x)| ≤ cg(x) for some positive constant c and all sufficiently large x (read, “f is big-O of g”),
f = Ω(g) means that f(x) ≥ cg(x) for some positive constant c and all sufficiently large x (read, “f is big-Omega of g”),
f = Θ(g) means that cg(x) ≤ f(x) ≤ dg(x) for some positive constants c and d and all sufficiently large x (read, “f is big-Theta of g”),
f = o(g) means that f/g → 0 as x → ∞ (read, “f is little-o of g”), and
f ∼ g means that f/g → 1 as x → ∞ (read, “f is asymptotically equal to g”).
Example 3.1. Let f(x) ≔ x² and g(x) ≔ 2x² − x + 1. Then f = O(g) and f = Ω(g). Indeed, f = Θ(g). ▪
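To spell out the constants in Example 3.1 (a verification not given in the text): since x² − x + 1 ≥ 0 for all x, we have x² ≤ 2x² − x + 1, so f = O(g) with c = 1; and since x² + x − 1 ≥ 0 for all x ≥ 1, we have 2x² − x + 1 ≤ 3x², so f(x) ≥ (1/3)g(x) and f = Ω(g) with c = 1/3. Taking c = 1/3 and d = 1 in the definition of Θ gives f = Θ(g).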