UNPRINCIPLED

Abstract It is widely thought that chance should be understood in reductionist terms: claims about chance should be understood as claims that certain patterns of events are instantiated. There are many possible reductionist theories of chance, differing as to which possible pattern of events they take to be chance-making. It is also widely taken to be a norm of rationality that credence should defer to chance: special cases aside, rationality requires that one’s credence function, when conditionalized on the chance-making facts, should coincide with the objective chance function. It is a shortcoming of a theory of chance if it implies that this norm of rationality is unsatisfiable. The primary goal of this paper is to show, on the basis of considerations concerning computability and inductive learning, that this shortcoming is more common than one would have hoped.

below that it is much more difficult than one might expect to find reductionist theories of chance that do not face difficulties of this sort.
Very roughly speaking: on Lewis's Best System account, the chance laws at a world are just good scientific summaries of the patterns of events at that world; and his Principal Principle says that if you are rational, then to the extent that you are confident in a chance hypothesis it should guide your estimates of probability. So, taken together, the package seems to imply that in order to be rational, you need to be a certain sort of universal learner: whatever the chance laws at your world are, if you are a good scientist and you see enough data, you should become confident in something like those laws and so you should eventually mimic those laws in your estimates of probability. But in many contexts we know that there can be no universal learner. 3
Almost all of the discussion below will take place in the special setting in which possible worlds can be modelled by infinite binary sequences (simple enough to be tractable, complex enough to provide a picture of inquiry in which typical questions are never definitively settled by a finite number of observations). The framework we will be using is introduced in Sections 1.1-1.3.
Section 2 is concerned with reductionist theories of chance. For present purposes, such a theory of chance for a given space of possible worlds is a map (satisfying certain mild technical conditions) that assigns to (some) worlds of that space a probability measure on that space; for a world assigned such a probability measure, the measure plays the familiar role of chance laws (telling us the chance for various events to happen, unconditionally or conditionally on other events having happened). This section also introduces five very simple paradigm examples of reductionist theories of chance for our chosen set of worlds. These will be used throughout as illustrations.
Section 3 is concerned with a weak version of the Principal Principle, the Chance-Credence Principle (CCP). We will see that it is straightforward to characterize the priors that satisfy CCP with respect to any given theory of chance, and that some of our paradigm examples do not admit any priors that satisfy this principle.
There would be something unsatisfactory about an account of rationality that required rational agents to be able to perform some absolutely impossible task (squaring the circle, say). In the same way, if we are interested in the rationality of finite agents (e.g., human beings and the machines that they build), then there is something unsatisfactory about any account of rationality that requires agents to be able to perform tasks that are impossible for finite agents (solving the Halting Problem, say). 4 We will see in Section 4 that each of our paradigm examples of theories of chance is unsatisfactory in this way. So there is a challenge here for those attracted to reductionist theories of chance and to the Principal Principle: to develop and to defend examples of reductionist theories of chance that are CCP-satisfiable by computable priors (or to explain how norms of rationality can require agents to perform tasks that are impossible for computable agents).
Motivated in part by the picture sketched above of the relationship between the Lewisian package (Best System account of chance + the Principal Principle) and learning, Section 5 considers weaker analogs of CCP (and versions of them that require priors to be computable). The upshot is that if CCP is replaced by a weak enough principle along these lines, then some more reductionist theories of chance are consistent with Bayesian rationality. But many obstacles remain: for those interested in the combination of a reductionist theory of chance and a Principal Principle-style account of the relation between chance and credence there is interesting work to be done.
Appendix A consists of some remarks concerning the relation between the Principal Principle, some of its principal rivals, and CCP. Appendix B plays some Whac-A-Mole against the complaint that at various points, my discussion shows that I am being stiff-necked, closed-minded, naive, or unimaginative in my handling of conditional probabilities and null sets.
Ground-rules: all probability measures are real-valued and countably additive; the Axiom of Choice is taken for granted; and, for the most part, the notational conventions of [18] are followed. Non-reductionist theories of chance are neglected throughout, not because they are somehow immune from the problems encountered below, but because their treatment would require an extension of the present framework.

Main characters.
a. The set of bits, $\mathsf{2} := \{0, 1\}$ (note the sans serif font: $\mathsf{2}$ denotes a set, 2 a number).
b. For each $n \in \mathbb{N}$, $\mathsf{2}^{n}$ is the set of n-bit strings (we include the case of the empty string of zero bits).
c. The set of all binary strings: $\mathsf{2}^{<\omega} := \bigcup_{n=0}^{\infty} \mathsf{2}^{n}$. If $\sigma$ and $\tau$ are binary strings we write $\sigma\tau$ for the string that results from concatenating $\sigma$ and $\tau$ (in that order). We call a binary string evenly split if it contains the same number of 0's as it does 1's.
d. Cantor space: the set $\mathsf{2}^{\omega}$ of all infinite binary sequences. 5 For $S \in \mathsf{2}^{\omega}$, we write $S(k)$ for the kth bit of S and $S{\upharpoonright}k$ for the string formed by concatenating the first k bits of S. We call a binary sequence evenly split if the limiting relative frequency of 0's in it is .5.
e. For any binary string $\sigma$, we denote by $[\![\sigma]\!]$ the set of $S \in \mathsf{2}^{\omega}$ whose initial bits are given by $\sigma$. We call the $[\![\sigma]\!]$ basic subsets of $\mathsf{2}^{\omega}$. 6 We use $\mathfrak{B}$ to denote the $\sigma$-algebra of subsets of $\mathsf{2}^{\omega}$ generated by the family of basic sets: $\mathfrak{B}$ is the smallest family of subsets of $\mathsf{2}^{\omega}$ that includes all the basic sets and which is closed under taking complements, countable unions, and countable intersections. We call the members of $\mathfrak{B}$ the Borel subsets of $\mathsf{2}^{\omega}$. 7
f. The space $\mathcal{P}$ of Borel probability measures on $\mathsf{2}^{\omega}$: each $\mu \in \mathcal{P}$ is a map from $\mathfrak{B}$ to [0, 1] such that: (a) $\mu(\mathsf{2}^{\omega}) = 1$; and (b) $\mu$ is countably additive (i.e., $\mu(\bigcup_k U_k) = \sum_k \mu(U_k)$, for any countable family $U_1, U_2, \ldots$ of pairwise disjoint Borel sets).

We call a map m from the family of basic sets to [0, 1] a probability function if: (i) $m(\mathsf{2}^{\omega}) = 1$; and (ii) for any binary string $\sigma$,

$$m([\![\sigma]\!]) = m([\![\sigma 0]\!]) + m([\![\sigma 1]\!]).$$ 8

Any Borel probability measure determines via restriction a probability function. And the restriction in fact determines the probability measure: probability measures that agree on all basic sets agree on all Borel sets. 9 Further, every probability function can be extended to a Borel probability measure. 10

Example 1.1 (Delta-functions). For any $S \in \mathsf{2}^{\omega}$, the delta-function concentrated on S is the Borel probability measure $\delta_S$ that, for any binary string $\sigma$, assigns the basic set $[\![\sigma]\!]$ probability one if S begins with $\sigma$, and otherwise assigns it probability zero.
Example 1.2 (Bernoulli measures). For any $r \in (0, 1)$, the Bernoulli measure with parameter r is the Borel probability measure $\beta_r$ such that for any binary string $\sigma$, $\beta_r([\![\sigma]\!]) = r^{k}(1-r)^{\ell}$, where k is the number of 0's in $\sigma$ and $\ell$ is the number of 1's in $\sigma$. So according to $\beta_r$, bits are selected by independent tosses of a coin with bias r in favour of 0's. We call $\beta_{.5}$ the fair coin measure.
Example 1.3 (Hybrids). At various points it will be helpful to have recourse to measures that initially behave like delta-functions before switching to the behaviour of the fair coin measure. For any binary string $\sigma$ we define the measure $\sigma\beta_{.5}$ as follows: $\sigma\beta_{.5}([\![\tau]\!])$ is 1 if $\tau$ is an initial segment of $\sigma$, is $1/2^{k}$ if $\tau$ is a k-bit extension of $\sigma$, and is 0 otherwise.
Since there is a natural bijection between binary strings and the basic sets that they determine, when we are focusing on the action of probability measures on basic sets we will often simplify notation by writing $\mu(\sigma)$ for $\mu([\![\sigma]\!])$ etc. Similarly, if b is a bit, and $\sigma$ is a binary string, the expression $\mu(b \mid \sigma)$ stands for $\mu([\![\sigma b]\!] \mid [\![\sigma]\!]) = \mu(\sigma b)/\mu(\sigma)$, the probability that $\mu$ gives for the next bit to be b, conditional on the sequence beginning with the bits of $\sigma$.
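The three families of measures in Examples 1.1-1.3 can be made concrete by computing their values on basic sets. The following is a minimal sketch, not part of the original text: binary strings are represented as tuples of 0's and 1's, an infinite sequence is represented by a function from positions (counted from 1) to bits, and the names `delta`, `bernoulli`, `hybrid`, and `conditional` are illustrative choices. Exact rational arithmetic is used so that the additivity condition (ii) can be checked without rounding.

```python
from fractions import Fraction

def delta(seq):
    """Delta-function concentrated on the sequence seq (a map k -> k-th bit, k >= 1)."""
    def mu(sigma):
        # Probability one if the sequence begins with sigma, zero otherwise.
        return Fraction(int(all(seq(i + 1) == b for i, b in enumerate(sigma))))
    return mu

def bernoulli(r):
    """Bernoulli measure with parameter r: each bit is 0 with probability r."""
    def mu(sigma):
        zeros = sigma.count(0)
        ones = len(sigma) - zeros
        return r ** zeros * (1 - r) ** ones
    return mu

def hybrid(prefix):
    """Hybrid measure: certain of `prefix`, then fair coin tosses."""
    def mu(sigma):
        n = min(len(sigma), len(prefix))
        if sigma[:n] != prefix[:n]:
            return Fraction(0)          # sigma disagrees with the prefix
        extra = max(0, len(sigma) - len(prefix))
        return Fraction(1, 2 ** extra)  # 1 on initial segments, 1/2^k on k-bit extensions
    return mu

def conditional(mu, b, sigma):
    """mu(b | sigma) = mu(sigma b) / mu(sigma); only defined when mu(sigma) > 0."""
    if mu(sigma) == 0:
        raise ZeroDivisionError("conditioning on a string of measure zero")
    return mu(sigma + (b,)) / mu(sigma)
```

For instance, `bernoulli(Fraction(1, 3))((0, 0, 1))` is (1/3)^2 (2/3) = 2/27, and `hybrid((0,))((0, 1, 1))` is 1/4.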

Computable numbers and measures.
A binary sequence $S \in \mathsf{2}^{\omega}$ is computable if there is a Turing machine that on input of a natural number k gives as output the first k bits of S.
We say that a binary string $\sigma = b_1 b_2 \ldots b_n$ is a k-bit approximation to a number x in the unit interval if $|x - \sum_{i=1}^{n} b_i 2^{-i}| < 2^{-k}$. A real number x in the unit interval is computable if there is a Turing machine that on input of a natural number k gives as output a k-bit approximation to x.
So, intuitively, a real number (or a sequence) is computable if and only if it can be computably approximated in a controlled way. Similarly, a real-valued function on a computably enumerable set is computable if there is a single machine that can approximate its value on any argument in a controlled way (this is stronger than just requiring that each of its values be computable). Specializing this to the case of probability measures (which, as we have seen, are determined by their restrictions to the basic subsets of $\mathsf{2}^{\omega}$), it is natural to define a Borel probability measure $\mu$ on $\mathsf{2}^{\omega}$ to be computable if there is a Turing machine that on input of a binary string $\sigma$ and a natural number k gives as output a k-bit approximation to $\mu([\![\sigma]\!])$. Note that a delta-function measure $\delta_S$ is computable if and only if the sequence S that it is concentrated on is, and that a Bernoulli measure $\beta_r$ is computable if and only if its parameter r is. Note, further, that since there are only countably many Turing machines, there can be only countably many computable sequences, computable real numbers, and computable probability measures.

8 An argument by induction shows that for any k > 1, any probability function satisfies a generalized form of condition (ii), with the sum ranging over all k-bit extensions of $\sigma$. From this it follows that any probability function is finitely additive as a function on basic sets. And in this case finite additivity implies countable (sub)additivity: recall from footnote 6 that in the product topology each basic set is both open and compact, so if a basic set arises as the union of a countable family of basic sets, then it also arises as the union of a finite subfamily.
9 See, e.g., [43, lemma 1.42].
10 This follows via the Carathéodory Extension Theorem (since the family of empty-or-basic subsets of $\mathsf{2}^{\omega}$ is a semi-ring and the extension of m to this family, via the stipulation $m(\emptyset) = 0$, is finitely additive and countably subadditive). See, e.g., [43, theorem 1.53].
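To illustrate what "approximating in a controlled way" amounts to, here is a small sketch of a procedure that witnesses the computability of a Bernoulli measure with rational parameter. It assumes the convention that a string $b_1 \ldots b_n$ is a k-bit approximation to x when the dyadic rational $0.b_1 \ldots b_n$ lies within $2^{-k}$ of x; the function names are hypothetical, and exact fractions stand in for the work a Turing machine would do.

```python
from fractions import Fraction

def value(bits):
    """The dyadic rational 0.b1 b2 ... bn denoted by a bit string."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))

def approximate(x, k):
    """Return a k-bit approximation to a rational x in [0, 1].

    We emit k + 1 binary digits, so the error is at most 2**-(k + 1),
    which is strictly less than 2**-k (this also covers x = 1).
    """
    bits = []
    for _ in range(k + 1):
        x *= 2
        bit = 1 if x >= 1 else 0
        bits.append(bit)
        x -= bit
    return tuple(bits)

def bernoulli_approx(r, sigma, k):
    """k-bit approximation to the Bernoulli(r) measure of the basic set of sigma.

    A single procedure handles every string sigma and every precision k,
    which is what the definition of a computable measure asks for.
    """
    zeros = sigma.count(0)
    exact = r ** zeros * (1 - r) ** (len(sigma) - zeros)
    return approximate(exact, k)
```

The point of the example is the uniformity: one procedure takes both the string and the requested precision as input, rather than there being a separate procedure for each value of the measure.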

Worlds and priors.
We are going to use binary sequences to model certain simple possible worlds: time at these worlds is discrete, has a first instant, and is infinite towards the future; the state of one of these worlds at a time is encodable by a single bit.These worlds are too simple to be realistic, but allow us to study problems about credence and chance in a controlled setting (and of course they provide models of subsystems of more complex worlds).
We are going to work in a Bayesian framework in which agents satisfy three conditions.
Probabilism: Each agent begins life in a credal state encoded in a probability measure $\mu$ (the agent's prior), defined over some salient set of possible worlds.
Evidential Regularity: Each agent's prior must assign non-zero weight to any possible evidence E that the agent might acquire.
Conditionalization: The credal state of an agent with prior $\mu$ and total evidence E is encoded by $\mu(\,\cdot \mid E)$.
In our setting, the natural explication of possible state of evidence is: a finite initial segment of a world. So we will call a probability measure $\mu \in \mathcal{P}$ a prior if it satisfies $\mu(\sigma) > 0$ for every binary string $\sigma$.

§2. Chance. Our focus here is on reductionist theories of chance. The spirit behind these (broadly Humean) approaches can be summarized as follows:

Philosophers are deeply divided on the status of modal truths in general and nomic truths in particular: truths related to physical possibility and necessity. Besides chance, prominent nomic phenomena include causation, counterfactuals, dispositions, and laws of nature. One approach to the nomic, going back to Hume and further to the medieval nominalists, holds that nomic facts about what could or would or must are always reducible to facts about what is. For the most part, Humeans do not deny the reality of nomic phenomena; they agree that there are laws of nature, chances, dispositions, etc. But they maintain that these things are derivative, determined by more fundamental, non-modal elements of reality. Thus Humeans might identify laws of nature with regularities in the history of physical events, and chances with relative frequencies. This ensures that whenever two possible worlds agree in non-nomic aspects, they also agree with respect to laws and chance. 11

What is a Humean reductionist theory of chance? The core idea is that: the chance facts are reducible to certain facts about patterns to be found in the whole panoply of events that happen in the history of our universe. What kinds of patterns? Broadly speaking, and unsurprisingly, they are stable and stochastic-looking or random-looking patterns. 12

In our setting, a complete history of a world is a binary sequence: a good candidate for an object that can be described in non-modal terms but which might or might not exhibit the sorts of patterns that could ground talk about chance. A probability measure $\mu$ on $\mathsf{2}^{\omega}$ can be thought of in the following terms: $\mu(0)$ gives a probability for the initial state of the world to be 0 and $\mu(1)$ gives the probability for it to be 1; and given any binary string $\sigma$, $\mu(0 \mid \sigma)$ and $\mu(1 \mid \sigma)$ give the probability for the next bit to be 0 or 1, if $\sigma$ gives the history so far. So we can picture $\mu$ as a book of instructions that tells Nature how to construct worlds: first, flip a coin of such and such bias to generate the first state; at any later time, flip a coin of this bias to generate the next state, if the history so far is that. More generally, for any (Borel) subset A of $\mathsf{2}^{\omega}$, and any binary string $\sigma$, $\mu(A \mid \sigma)$ gives the probability for A to obtain, given that $\sigma$ encodes an initial segment of the history of the world. 13

So in our setting, a reductionist theory of chance is an assignment of probability measures to (some) worlds. It is convenient to build a couple of weak technical conditions into the official definition.

Definition 2.1. A theory of chance is a (possibly merely partially defined) map $L : \mathsf{2}^{\omega} \to \mathcal{P}$ such that for any $\lambda$ in the image of L we have (i) that $L^{-1}(\lambda)$ is a Borel subset of $\mathsf{2}^{\omega}$ and (ii) that $\lambda(L^{-1}(\lambda)) > 0$.
Interpretation: if L is a theory of chance and $L(S) = \lambda$ for some $S \in \mathsf{2}^{\omega}$ and $\lambda \in \mathcal{P}$, then we think of L as saying that $\lambda$ encodes the facts about chance at S. We refer to any $\lambda \in \mathcal{P}$ in the image of L as a law of chance of L (or a chance law of L). If $\lambda$ is a law of chance of L, then we refer to the proposition (= set of worlds) $\Lambda_\lambda := L^{-1}(\lambda)$ as the chance hypothesis determined by $\lambda$. If, on the other hand, L(S) is undefined, then L says that there are no facts about chance at S. We will refer to the set of worlds $\Lambda_*$ at which L is undefined as the proposition that the world is lawless (it is the negation of the disjunction of all the chance hypotheses of the theory).
Clause (i) in Definition 2.1 requires that chance hypotheses be the sort of things that probabilities can be assigned to. 14 Clause (ii) rules out a certain sort of aberrant behaviour: a law of chance that says that there is zero chance that it is the law of chance. 15

In practice, we are most interested in theories of chance inspired by the heuristic considerations that underlie various classic philosophical programs for understanding chance reductively. Most prominent among these are: frequentist approaches, in which the chance of any event type is identified with the relative frequency with which that type of event occurs in some relevant reference class of events; and Best System approaches, in which the facts about chance at a world are provided by the (in general stochastic) hypothesis that provides the best summary of facts at that world, according to ordinary scientific standards (very roughly speaking, the best hypothesis is the one that in some relevant sense ideally balances simplicity, logical strength, and fit with the data). 16 It will be helpful below to also have in mind a third (not especially popular) approach, super-determinism, on which the only laws of chance are deterministic (i.e., are encoded in delta-functions).

Consider the fair coin measure, $\beta_{.5}$, on $\mathsf{2}^{\omega}$. What does it mean to think of it as the law of chance at a world $S \in \mathsf{2}^{\omega}$? Well, for starters, $\beta_{.5}$ tells you that the probability of 0 occurring at any time is .5, and that this probability is independent of what happens at other times. In order to be acceptable to frequentists, a theory of chance must assign the fair coin measure to all and only worlds that are evenly split between 0's and 1's. Advocates of the Best System approach may recognize some exceptions to this rule: perhaps (in the setting of finite worlds) it sometimes makes sense to take the fair coin measure to give the law of chance at a world at which the relative frequency of 0's is given by a number r that is close to .5, when r itself is sufficiently far from being simple. 17 And perhaps some evenly split sequences should be assigned a law of chance other than the fair coin measure: e.g., one might well want to assign a particularly simple $S \in \mathsf{2}^{\omega}$ the delta-function measure $\delta_S$ concentrated on S as its law of chance, whether or not S is evenly split. 18

It will be helpful to have in mind a few theories of chance that can be thought of as very simple-minded implementations of super-determinism, frequentism, and the Best System approach.
Example 2.2 (Super-Determinism). Each $S \in \mathsf{2}^{\omega}$ is assigned as its chance law $\delta_S$, the delta-function measure concentrated on S.
Example 2.3 (Computable Super-Determinism). Each computable $S \in \mathsf{2}^{\omega}$ is assigned $\delta_S$ as its chance law. All other worlds are lawless.
Example 2.4 (Basic Frequentism). If 0's have relative frequency r in $S \in \mathsf{2}^{\omega}$, then the law of chance for S is the Bernoulli measure $\beta_r$. If this relative frequency is not defined, S is lawless.
Example 2.5 (Computable Frequentism). If 0's have relative frequency r in $S \in \mathsf{2}^{\omega}$ and r is computable, then the law of chance for S is the Bernoulli measure $\beta_r$. If the relative frequency of 0's in S is uncomputable or undefined, then S is lawless.
There may seem to be little reason to opt for Computable Frequentism over Basic Frequentism: it doesn't seem to follow from the thought that probabilities are relative frequencies that those frequencies must be given by computable numbers. But a restriction to computable chance laws is entirely natural on the Best System approach. If we are thinking of candidate chance laws for a world as something like putative scientific summaries of that world, with the actual chance law (if there is one) being the optimal such summary (as judged by ordinary scientific standards), then it would appear to be essential that candidate chance laws be finitarily specifiable. After all, what would it mean to compare the overall virtues of two infinite texts by the ordinary standards of scientific (or literary) criticism? To directly specify a binary sequence (or, equivalently, a real number in the unit interval) would be to specify the bits that make it up, one by one: an infinitary task. Turing launched modern theoretical computer science with the assertion that "The 'computable' numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means." 19 Turing's proposal is that the finitarily specifiable real numbers are the Turing-computable real numbers. A probability measure on $\mathsf{2}^{\omega}$ can be thought of as a certain sort of real-valued function on the countably infinite space of finite binary strings. Direct specification of such an object would be a doubly infinitary task, involving the specification of an infinite number of real numbers. But some such objects are finitarily specifiable: the Turing-computable measures (and the success of the Church-Turing thesis gives us reason to think that these are all of the finitarily specifiable such objects). So it is natural for those interested in Best System approaches to restrict their attention to theories in which all chance laws are computable. 20

Example 2.6 (A Toy Best System Theory). Any computable $S \in \mathsf{2}^{\omega}$ is assigned the delta-function measure concentrated on S. Any incomputable S in which 0's have computable relative frequency r is assigned the Bernoulli measure $\beta_r$. All other S are lawless.
Remark 2.7. In a more plausible implementation of the best-system approach, it would be natural to further require that any sequence assigned a given computable probability measure as its law of chance must be algorithmically random relative to that measure. 21 It would also be natural to include further families of laws of chance: generalized Bernoulli measures (for which the bias of the coin used to determine the nth bit is allowed to depend on n); Markov chains (for which the bias of the coin used to determine the state at a time depends on the states at some fixed number of immediately preceding times); and so on.

Definition 2.8 (Proper and Improper Theories). Let L be a theory of chance. We call a law of chance $\lambda$ of L with chance hypothesis $\Lambda_\lambda$ proper if $\lambda(\Lambda_\lambda) = 1$; otherwise we call $\lambda$ improper. We call L proper if all of its laws of chance are proper; otherwise we call L improper.

Remark 2.9. Each of the theories of chance mentioned above is proper. This follows from three facts. (a) Each delta-function $\delta_S$ assigns measure one to {S}. (b) Each Bernoulli measure $\beta_r$ assigns measure one to the set of sequences in which 0's have relative frequency r (the strong law of large numbers). (c) Each Bernoulli measure $\beta_r$ assigns measure zero to each countable set.

Example 2.10 (An Improper Theory of Chance). As in Example 1.3, let $0\beta_{.5}$ be the probability measure that is certain that the first bit will be 0 but thinks that all subsequent bits are sampled by tossing a fair coin. Consider the theory of chance that assigns the fair coin measure to all sequences that begin with 1 and assigns $0\beta_{.5}$ to all sequences that begin with 0. In this theory $0\beta_{.5}$ is a proper chance law but the fair coin measure is an improper one: it assigns probability only 1/2 to its own chance hypothesis, the set of sequences that begin with 1.

§3. Chance and credence. It has seemed to many that rationality requires there to be a certain sort of relation between chance and credence: that in certain situations, if you are rational, then your credence in an event must coincide with the chance of that event.
Knowing only that the chance of drawing a red ball from an urn is 0.95, everyone agrees, in accordance with the law of likelihood, that a guess of 'red' about some trial is much better supported than one of 'not-red.' But nearly everyone will go further, and agree that 0.95 is a good measure of the degree to which 'red' is supported by the limited data. 22

[T]he chancemaking pattern in the arrangement of qualities must be something that would, if known, correspondingly constrain rational credence. Whatever makes it true that the chance of decay is 50% must also, if known, make it rational to believe to degree 50% that decay will occur. 23

I will codify the constraint on priors suggested as follows.
Definition 3.1 (CCP). Let L be a theory of chance. We say that a prior $\mu$ satisfies the Chance-Credence Principle (CCP) for L if: for any chance law $\lambda$ of L with chance hypothesis $\Lambda_\lambda$ and for any Borel subset A of $\mathsf{2}^{\omega}$ we have

$$\mu(A \mid \Lambda_\lambda) = \lambda(A).$$

If there exists such a prior, we say that L is CCP-satisfiable; otherwise we say that it is CCP-unsatisfiable.

Remark 3.2 (Nomenclature). The name Miller's Principle is often given to principles in this neighbourhood. 24 With more justice (and less snark) it might perhaps be called Hacking's Principle, in light of Hacking's somewhat earlier enunciation of a principle of this kind. 25 But, really, it seems hopeless to try to identify the originator of an idea like this: as Hacking notes in introducing his version of the principle, it "seems to be so universally accepted that it is hardly ever stated." 26

22 See [27, p. 136].
23 See [46, p. 478].
24 Because Miller [49] argued that a principle along these lines leads to contradiction.
25 See [27, esp. pp. 135 ff. and 193].
26 See [27, p. 135]. Hacking himself offers a reading on which Bayes was at least implicitly committed to something like it, and also warns that although "many people readily grant [this principle], it is by no means certainly correct" (p. 193).
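CCP can be thought of as a weak form of the Principal Principle (see Appendix A). What does CCP require?

Definition 3.3. Let $\mu$ be a prior and let $\lambda$ be a law of chance of a theory of chance L. We say that $\mu$ contains $\lambda$ as a nugget of truth if we can write $\mu = \nu + c \cdot \lambda$ with $c > 0$ and $\nu$ a (possibly trivial) measure that considers the chance hypothesis $\Lambda_\lambda$ of $\lambda$ a null set. 27

Proposition 3.4. Let L be a theory of chance and let $\mu$ be a prior on $\mathsf{2}^{\omega}$. Then $\mu$ satisfies CCP for L if and only if L has a countable number of chance laws, each of which is proper and contained in $\mu$ as a nugget of truth.

Proof. Suppose, first, that the chance laws of L are $\lambda_1, \lambda_2, \ldots$ (with chance hypotheses $\Lambda_1, \Lambda_2, \ldots$), that each $\lambda_k$ is proper, and that $\mu = \nu + \sum_k c_k \cdot \lambda_k$ with each $c_k > 0$ and $\nu$ a measure that considers each $\Lambda_k$ a null set. We show that $\mu$ satisfies CCP for L. Fix j = 1, 2, 3, ... and a Borel set A. We have

$$\mu(A \mid \Lambda_j) = \frac{\mu(A \cap \Lambda_j)}{\mu(\Lambda_j)} = \frac{\nu(A \cap \Lambda_j) + \sum_k c_k \lambda_k(A \cap \Lambda_j)}{\nu(\Lambda_j) + \sum_k c_k \lambda_k(\Lambda_j)} = \frac{c_j \lambda_j(A \cap \Lambda_j)}{c_j \lambda_j(\Lambda_j)} = \lambda_j(A),$$

where the first equality holds by definition; the second follows from the expansion for $\mu$ given above; the third follows from the fact that $\nu$ and the $\lambda_k$ other than $\lambda_j$ assign measure zero to $\Lambda_j$; and the fourth follows via the propriety of $\lambda_j$.

Suppose, on the other hand, that $\mu$ satisfies CCP for L. Let $\lambda$ be one of the chance laws of L and let $\Lambda_\lambda$ be the corresponding chance hypothesis.

(a) From the fact that CCP is satisfied, it follows in particular that the conditional probability $\mu(A \mid \Lambda_\lambda)$ must be well-defined. So $\mu(\Lambda_\lambda) > 0$. Since $\lambda$ was arbitrary, the corresponding fact must hold for each chance law of L. Since the chance hypotheses corresponding to distinct chance laws must be disjoint, this means that L must have only countably many laws of chance.
(b) CCP tells us that a certain condition holds for all Borel sets A, so in particular it holds when we set A to $\Lambda_\lambda$. So $\lambda$ is proper: $\lambda(\Lambda_\lambda) = \mu(\Lambda_\lambda \mid \Lambda_\lambda) = 1$.
(c) For any Borel set A, set $\nu(A) := \mu(A \setminus \Lambda_\lambda)$. Then $\nu(\Lambda_\lambda) = 0$ and, by CCP, $\mu(A) = \nu(A) + \mu(A \mid \Lambda_\lambda)\,\mu(\Lambda_\lambda) = \nu(A) + c \cdot \lambda(A)$, where $c := \mu(\Lambda_\lambda) > 0$: $\mu$ contains $\lambda$ as a nugget of truth.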

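Corollary 3.5. Let L be a proper theory of chance with countably many chance laws $\lambda_1, \lambda_2, \ldots$ (with chance hypotheses $\Lambda_1, \Lambda_2, \ldots$). A prior $\mu$ satisfies CCP for L if and only if it can be written in the form

$$\mu = \nu + \sum_k c_k \cdot \lambda_k$$

with each $c_k > 0$ and $\nu$ a (possibly trivial) measure on $\mathsf{2}^{\omega}$ such that $\nu(\Lambda_k) = 0$ for each k.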
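The mixture form in this corollary can be illustrated numerically. The sketch below is only a finite stand-in for the real thing: it takes two Bernoulli laws with rational parameters, equal weights, and a trivial remainder measure, all of which are choices made for illustration, and the helper names are hypothetical. CCP itself concerns conditionalization on chance hypotheses, which are sets of infinite sequences and so lie beyond what finite code can inspect; what the code can show is that such a mixture is a prior (every binary string gets positive weight) and that its predictions drift toward the component that fits the data.

```python
from fractions import Fraction
from itertools import product

def bernoulli(r):
    """Bernoulli measure with parameter r, evaluated on basic sets (tuples of bits)."""
    def law(sigma):
        zeros = sigma.count(0)
        return r ** zeros * (1 - r) ** (len(sigma) - zeros)
    return law

def mixture(weights, laws):
    """The measure sum_k c_k * law_k on basic sets (trivial remainder measure assumed)."""
    assert sum(weights) == 1 and all(c > 0 for c in weights)
    def mu(sigma):
        return sum(c * law(sigma) for c, law in zip(weights, laws))
    return mu

def positive_up_to(mu, n):
    """Check the defining property of a prior on all strings of length <= n."""
    return all(mu(sigma) > 0
               for length in range(n + 1)
               for sigma in product((0, 1), repeat=length))

# Illustrative choice: two candidate chance laws, each with weight 1/2.
laws = [bernoulli(Fraction(1, 4)), bernoulli(Fraction(3, 4))]
prior = mixture([Fraction(1, 2), Fraction(1, 2)], laws)

# Predictive probability of a 0 after twenty 0's: close to 3/4,
# the parameter of the law that fits this evidence best.
zeros = (0,) * 20
predictive = prior(zeros + (0,)) / prior(zeros)
```

Each component with positive weight is a nugget of truth in the sense of Definition 3.3 only in the full infinite setting; here the weights simply play the role of the coefficients $c_k$.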

Corollary 3.6. A theory of chance is CCP-satisfiable if and only if it is proper and has only countably many laws of chance.
Proof. The left to right implication follows immediately from the preceding proposition. For the other half, let L be a proper theory of chance with chance laws $\lambda_1, \lambda_2, \ldots$. It suffices to show the existence of a prior that contains each of the $\lambda_k$ as a nugget of truth.
Let $\mu_0 := \sum_k c_k \cdot \lambda_k$ be a mixture of the chance laws with each $c_k > 0$ and $\sum_k c_k = 1$, and let M be the set of binary strings $\sigma$ with $\mu_0(\sigma) = 0$. If M is empty, then $\mu_0$ is a prior and we can take $\mu = \mu_0$. Otherwise, enumerate M as $\sigma_1, \sigma_2, \ldots$ (M is countable, being a subset of $\mathsf{2}^{<\omega}$, and is infinite, since any extension of a string in M is in M). Then, recalling the notation of Example 1.3, $\mu_1 := \sum_k \frac{1}{2^k}\, \sigma_k\beta_{.5}$ is a probability measure concentrated on the L-lawless sequences. So we can take $\mu = \frac{1}{2}\mu_0 + \frac{1}{2}\mu_1$.

Of our examples of theories of chance from Section 2, Super-Determinism and Basic Frequentism are not CCP-satisfiable (they have too many chance laws), but Computable Super-Determinism, Computable Frequentism, and the Toy Best System account are. 28

§4. Chance, credence, and computability. As noted above, fans of the Principal Principle like to present it as Janus-faced, both giving us substantive information about the nature of chance and giving us a substantive constraint on rational Bayesian priors. From this perspective, something has gone wrong if we find that no computable prior satisfies CCP for a certain reductionist theory of chance: the theory of chance should be discarded; we need to replace CCP with some weaker condition relating credence and chance; or we need to provide an account of the normative force of a putative requirement of rationality that requires agents to perform supra-computable tasks.

Definition 4.1 (CCCP).
Let L be a theory of chance and let $\mu$ be a prior. We say that $\mu$ satisfies the Computable Chance-Credence Principle (CCCP) for L if $\mu$ is computable and satisfies CCP for L.
Definition 4.2. We call a theory of chance CCCP-satisfiable if there is a prior satisfying CCCP for it; otherwise we call it CCCP-unsatisfiable.
We are going to see that there are nontrivial obstructions to CCCP-satisfiability. Indeed, none of our paradigm examples of theories of chance are CCCP-satisfiable.

Learning and delta-functions.
Our first objective will be to show that Computable Super-Determinism and the Toy Best System Theory are not CCCP-satisfiable. We will approach this result indirectly via one of the classic computer science models of inductive learning (this will give us purchase in Section 4.3 on the question of what sort of theories are CCCP-satisfiable). 29

Definition 4.3. An extrapolating machine is a computable map $m : \mathsf{2}^{<\omega} \to \{0, 1\}$.
Definition 4.4. Let m be an extrapolating machine and let $S \in \mathsf{2}^{\omega}$. We say that m NV-learns S if there is an N such that $m(S{\upharpoonright}n) = S(n + 1)$ for all n > N.

Informal picture: a learning agent is being fed an infinite binary data stream S bit-by-bit and each time a bit is revealed, the agent makes a guess as to the identity of the next bit. An extrapolating machine m is a computable strategy that an agent might use for arriving at such guesses: when given a binary string $\sigma$ (a finite data set), m outputs a bit $m(\sigma)$. Extrapolating machine m successfully NV-learns S just in case from some point onwards in processing S, all of m's guesses are correct (here NV = next value). Note that any binary sequence NV-learned by an extrapolating machine must be computable.

Proposition 4.5. No extrapolating machine NV-learns every computable sequence.
Proof. Let $m_0$ be an extrapolating machine and define the binary sequence S as follows: the first bit of S is $1 - m_0(\emptyset)$ (the opposite of what $m_0$ predicts on empty input); and in general, $S(k + 1)$ is the opposite of what $m_0$ predicts when shown the first k bits of S. S is computable (since $m_0$ is). And $m_0$ does not NV-learn S (since it never predicts a bit correctly).
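The diagonal construction in this proof is easy to run. The following sketch builds, for any given guessing strategy, an initial segment of the sequence on which that strategy is always wrong; the two sample strategies (`majority` and `repeat_last`) are hypothetical stand-ins for arbitrary extrapolating machines, and strings are represented as tuples of bits.

```python
def diagonal_sequence(machine, n):
    """First n bits of the sequence that `machine` never predicts correctly.

    Each new bit is the opposite of the machine's guess on the bits so far,
    exactly as in the diagonal argument.
    """
    bits = []
    for _ in range(n):
        bits.append(1 - machine(tuple(bits)))
    return tuple(bits)

def majority(sigma):
    """Sample strategy: guess the bit seen most often so far (0 on ties)."""
    return 1 if 2 * sum(sigma) > len(sigma) else 0

def repeat_last(sigma):
    """Sample strategy: guess that the last bit repeats (0 on empty input)."""
    return sigma[-1] if sigma else 0

def mistakes(machine, prefix):
    """Positions n at which the machine, shown the first n bits, guesses wrongly."""
    return [n for n in range(len(prefix)) if machine(prefix[:n]) != prefix[n]]
```

Since the machine is computable, so is the sequence produced from it, which is why no single machine can NV-learn every computable sequence. For `repeat_last` the defeating sequence alternates, beginning 1, 0, 1, 0.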
Given a computable prior we can define an extrapolating machine m as follows.
Let κ be a computable number in the open unit interval that is not in the set , never uses κ as the conditional probability that the next bit will be 0 or that it will be 1). 30Define m : 2 < → 2 as follows: on input of 29 This model of learning has its roots [54, 55] and was codified and developed in [6] and in [16].For classic surveys see [2] and [50, chap.VII]. 30It is always possible to find such a κ: let 1 , 2 , ... be a computable enumeration of the binary strings; then (0| 1 ), (1| 1 ), (0| 2 ), (1| 2 ), ... is an enumeration of some computable real numbers; since we have , we can calculate any of these to any desired degree of accuracy; so the numbers listed are uniformly computable and so must not include all computable numbers in the unit interval (see Remark 4.14).A variant of this argument shows that we can choose a κ arbitrarily close to .5:otherwise the computable numbers in some open subinterval around .5 would be uniformly computable-and by omitting some initial bits, binary string , m outputs 0 if (0 | ) > κ and outputs 1 if (0 | ) < κ.Since κ and are computable, so is m . 31oposition 4.6.Let be a prior on 2 and let S be a binary sequence.If = + c • S , where c > 0 and is a measure, then m NV-learns S.
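Proof. Write ν = μ + c · δ_S as in the statement. We can assume without loss of generality that μ({S}) = 0. We then have

$$\nu(S(n+1) \mid S{\upharpoonright}n) = \frac{\nu(S{\upharpoonright}(n+1))}{\nu(S{\upharpoonright}n)} = \frac{\mu(S{\upharpoonright}(n+1)) + c}{\mu(S{\upharpoonright}n) + c},$$

since δ_S(S↾n) = 1 for all n. Since μ({S}) = 0, by making n sufficiently large we can make μ(S↾n), and therefore also μ(S↾(n + 1)), as small as we like.32 So the conditional probability that ν assigns to the next bit of S tends to 1, and from some point onwards m_ν predicts each bit of S correctly.

Proposition 4.7. Computable Super-Determinism and the Toy Best System Theory are CCCP-unsatisfiable.

Proof. Any prior on 2^ω that satisfied CCCP for one of these theories would in particular satisfy CCP for that theory, and so, by Proposition 3.4, would contain each computable delta-function measure as a nugget of truth. But then by Proposition 4.6, the extrapolating machine induced by that prior would NV-learn each computable sequence. But Proposition 4.5 shows that to be impossible. So Computable Super-Determinism and the Toy Best System Theory, although CCP-satisfiable, are CCCP-unsatisfiable: they admit priors that satisfy the weak form of the Principal Principle that we have been considering; but all such priors are uncomputable.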
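The mechanism behind Proposition 4.6 can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration: the prior is taken to be a mixture of the fair coin measure (weight 99/100) with a delta-function on the sequence 0101... (weight 1/100), and the cutoff is taken to be 3/5, a value that this particular prior never uses as a conditional probability. Exact rational arithmetic keeps the comparison with the cutoff unambiguous.

```python
from fractions import Fraction
from typing import List

C = Fraction(1, 100)     # assumed weight of the delta-function "nugget of truth"
KAPPA = Fraction(3, 5)   # assumed cutoff; never hit exactly by this prior


def target_bit(n: int) -> int:
    """Bit n (0-indexed) of the illustrative computable sequence S = 0101..."""
    return n % 2


def on_path(sigma: List[int]) -> bool:
    """True if sigma is an initial segment of S."""
    return all(b == target_bit(i) for i, b in enumerate(sigma))


def prior(sigma: List[int]) -> Fraction:
    """Probability of the cylinder set of sigma under (1 - C) * fair coin + C * delta_S."""
    fair = Fraction(1, 2 ** len(sigma))
    return (1 - C) * fair + (C if on_path(sigma) else 0)


def machine(sigma: List[int]) -> int:
    """Output 0 if the prior's conditional probability of a 0 exceeds KAPPA, else 1."""
    p_zero = prior(sigma + [0]) / prior(sigma)
    return 0 if p_zero > KAPPA else 1


def errors_on_target(n: int) -> List[int]:
    """Positions k < n at which the machine mispredicts bit k of S."""
    return [k for k in range(n)
            if machine([target_bit(i) for i in range(k)]) != target_bit(k)]
```

In this example the machine mispredicts only a few early bits of S and is correct from then on, which is the behaviour the proposition describes. The cutoff is fixed by hand here; the text does not require that a suitable cutoff be found computably.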
Remark 4.8. Adleman and Blum [1] show that if A is an oracle, then there exists a Turing machine with access to A that NV-learns every computable sequence if and only if A is high (i.e., A enables the computation of a function that grows more quickly than any computable function). So the problem of NV-learning all computable sequences is unsolvable, but strictly easier than the Halting Problem.

Learning and Bernoulli measures.
The above discussion shows that Computable Super-Determinists must rein in their ambitions, maintaining that only a well-behaved proper subset of computable delta-function measures can correspond to chance laws, if they want their theory to support the existence of a prior satisfying CCCP. We will see next that Computable Frequentists must likewise fall back to a position on which only some of the computable Bernoulli measures arise as laws of chance, if they want their theory to support the existence of a prior satisfying CCCP. And, of course, fans of the Toy Best System will be driven to make both moves at once.33

We begin by recalling some basic notions from the theory of algorithmic randomness deriving from [48]. A family U = {U_k}_{k∈N} of subsets of 2^ω is called a uniformly effective family of open sets if there is a Turing machine that on input k ∈ N outputs a sequence σ^k_1, σ^k_2, ... of binary strings with the feature that

$$U_k = \bigcup_{i} [\sigma^k_i],$$

where [σ] is the set of sequences that have σ as an initial segment. That is: each U_k can be effectively approximated as a union of basic subsets of 2^ω; and this approximation is uniform in k, in the sense that a single machine can handle each of the U_k.34 Let μ be a computable probability measure on 2^ω. Then an effective μ-null set is a subset T of 2^ω that can be written in the form T = ⋂_k U_k, for some uniformly effective family of open sets, {U_k}, satisfying μ(U_k) ≤ 2^{-k}.35 We say that a binary sequence is μ-Martin-Löf random if it does not belong to any effective μ-null set and we denote by ML_μ the set of such sequences. Note that each computable μ ∈ P assigns measure one to its ML_μ.36

Proposition 4.9. Let R = {r_k}_{k∈I} be a set of computable binary sequences and for each k ∈ I let μ_k be the Bernoulli measure whose parameter admits r_k as a binary expansion. If ν is a prior that can be written in the form ν = ρ + Σ_k c_k · μ_k (with ρ a possibly trivial measure and each c_k > 0), then there is an extrapolating machine that NV-learns each r_k.

we can transform a listing of the computable numbers in such a subinterval into a listing of all computable numbers in the unit interval. Note that it is not required for present purposes that the identification of a suitable cutoff κ for a given extrapolating machine need itself be a computable task.

31 We choose κ ∉ X because determining that two computable numbers are equal is not in general a computable task. On this point see, e.g., [53, theorem 3.3].

32 This follows from the general fact that measures exhibit continuity from above (see, e.g., [43, theorem 1.36]).
Proof. (1) We claim that for any m ∈ I and any binary sequence S, if S ∈ ML_{μ_m}, then S ∈ ML_ν. For suppose that S ∉ ML_ν. Then there exists a uniformly effective family of open sets, U_1, U_2, ..., with ν(U_k) ≤ 2^{-k} (for each k ∈ N) and S ∈ ⋂_{k=1}^∞ U_k. Choosing j large enough to ensure that 1/c_m ≤ 2^j, we have

$$\mu_m(U_{k+j}) \le \frac{1}{c_m}\,\nu(U_{k+j}) \le 2^{j} \cdot 2^{-(k+j)} = 2^{-k}$$

for each k, so the family U_{1+j}, U_{2+j}, ... witnesses that S ∉ ML_{μ_m}.

33 Much of the argument of this section was suggested by Christopher Porter (in private communication).

34 And if we equip 2^ω with the product topology (as in footnote 6), then each U_k is an open subset of 2^ω.

35 Commentary: such T are μ-null sets that can be effectively specified in a certain sense. To belong to a given effective μ-null set is to be special in a certain effectively specifiable way. The intuition behind the present approach is that a sequence is μ-random if it avoids each such null set. You can think of the μ-Martin-Löf random sequences as those that exhibit no effectively specifiable behaviour that would be arbitrarily surprising to agents expecting to see data sampled from μ.

36 Since there can only be countably many effective μ-null sets, their union must be a μ-null set.
(2) Next, we invoke a result due to Porter: for any computable measure ν on 2^ω, there is a computable measure ν′ on 2^ω such that for each r ∈ [0, 1], if ML_ν ∩ ML_{μ_r} ≠ ∅ (where μ_r is the Bernoulli measure with parameter r), then r ∈ ML_{ν′}.37 Applied to our case, this implies that there exists a computable probability measure ν′ such that R ⊆ ML_{ν′}.
(3) It follows that ν′ must assign positive probability to each {r_k}: otherwise, since ν′ and r_k are both computable, we could choose initial segments σ_1, σ_2, ... of r_k so that the family of basic sets [

Remark 4.11. Vitányi and Chater [63] develop a model of learning in which an agent viewing a binary data stream attempts to guess a code number for a Turing machine that calculates a probability measure on 2^ω relative to which the data stream is Martin-Löf random. Barmpalias et al. [5, theorem 1.6] show that the set of computable Bernoulli measures is not learnable in this sense; indeed, they further show that this set is learnable relative to an oracle if and only if that oracle is high. It is natural to conjecture that an analogous result holds in our setting: there exists a prior satisfying CCP for Computable Frequentism that is computable relative to a given oracle if and only if that oracle is high.

Satisfying CCCP.
So in order for a prior to satisfy CCCP for a given theory of chance, the set of sequences on which the theory's delta-function laws of chance are concentrated and the set of parameters of its Bernoulli measure laws of chance must both be NV-learnable. Here we consider the contours of NV-learnability. We denote by NV(m) the set of sequences NV-learned by extrapolating machine m.

Definition 4.12. A set F of binary sequences is dense if for each binary string σ, there is a sequence S in F with σ as an initial segment.38

Definition 4.13. A set F of binary sequences is uniformly computable if there is an enumeration S_1, S_2, S_3, ... of the elements of F such that the map that sends (k, ℓ) ∈ N^2 to S_k(ℓ) is computable.

Remark 4.14. It is immediate that each member of a uniformly computable set of sequences is computable. A straightforward diagonalization argument shows that the set of computable sequences is not itself uniformly computable. And since any finite set of computable sequences is uniformly computable and the union of any two uniformly computable sets of sequences is uniformly computable, it follows that any uniformly computable family of binary sequences must exclude infinitely many computable binary sequences. Moral: to say that a set of sequences is uniformly computable is to say that it is computationally tractable in a certain sense.
Proposition 4.15. Let F be a set of binary sequences. The following are equivalent: (i) F is uniformly computable and dense. (ii) There is an extrapolating machine m such that F = NV(m).
Proof. Let F ⊂ 2^ω be uniformly computable and dense. Fix an enumeration S_1, S_2, ... of the members of F such that the map (k, ℓ) → S_k(ℓ) is computable. Define an extrapolating machine m as follows: on input of an n-bit binary string σ, m finds the least k such that S_k↾n = σ and gives output m(σ) = S_k(n + 1) (such a k always exists since F is dense). For any S_k ∈ F, we can find n large enough so that there is no j < k with S_j ≠ S_k and S_j↾n = S_k↾n. It follows that m NV-learns each sequence in F. Suppose, on the other hand, that m NV-learns S and let σ be an initial segment of S that contains all of the bits that m predicts incorrectly. On input σ, m uses some S_k that has σ as an initial segment to predict the next bit. This prediction is correct, so m also uses S_k to predict the bit after that, and all subsequent bits. So S = S_k. So each sequence NV-learned by m is in F. So F = NV(m).
Suppose that there is an extrapolating machine m such that F = NV(m). Then F must be dense, since it must include, for each binary string σ, the sequence S_m^σ that has σ as an initial segment and in which all subsequent bits are chosen to vindicate m's predictions. Further, if σ_1, σ_2, σ_3, ... is a computable enumeration of the binary strings, then S_m^{σ_1}, S_m^{σ_2}, S_m^{σ_3}, ... is an enumeration of the sequences NV-learned by m and the map (k, ℓ) → S_m^{σ_k}(ℓ) is computable, since m and our enumeration of the strings are both computable. So F is uniformly computable.

Corollary 4.16. Let F be a set of binary sequences. There is an extrapolating machine that NV-learns each sequence in F if and only if F is a subset of a uniformly computable set of binary sequences.
Proof. Note that the union of a uniformly computable set of sequences with the set of periodic sequences (i.e., those sequences that can be written in the form σσσ... for some binary string σ) is both uniformly computable and dense.
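The machine used in the proof of Proposition 4.15 can be sketched concretely for one uniformly computable and dense family, the periodic sequences. This is a minimal illustration under assumptions of our own: the family is enumerated by listing the nonempty binary strings in length-then-lexicographic order and repeating each one forever (so some sequences are listed more than once), and positions are 0-indexed.

```python
from itertools import count, product
from typing import Iterator, List, Tuple


def nonempty_strings() -> Iterator[Tuple[int, ...]]:
    """Enumerate all nonempty binary strings, shortest first."""
    for length in count(1):
        for bits in product((0, 1), repeat=length):
            yield bits


def family_bit(period: Tuple[int, ...], pos: int) -> int:
    """Bit at position pos of the periodic sequence that repeats `period`."""
    return period[pos % len(period)]


def machine(sigma: List[int]) -> int:
    """Find the first listed sequence that extends sigma and output its next bit."""
    n = len(sigma)
    for period in nonempty_strings():
        if all(family_bit(period, i) == sigma[i] for i in range(n)):
            return family_bit(period, n)
    raise AssertionError("unreachable: the family is dense")


def errors(period: Tuple[int, ...], n: int) -> List[int]:
    """Positions k < n where the machine mispredicts the sequence repeating `period`."""
    seq = [family_bit(period, i) for i in range(n)]
    return [k for k in range(n) if machine(seq[:k]) != seq[k]]
```

The search in `machine` always terminates because the family is dense: the sequence that repeats `sigma` itself extends `sigma`. On the sequence 001001..., for example, the machine is wrong only while it is still following the all-zeros sequence and is correct once the data rule that candidate out.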
Remark 4.17. This corollary gives a version of one of the two standard characterizations of NV-learnability.39 The other is: a set F of computable binary sequences is a subset of some NV(m) if and only if it is a subclass of an abstract complexity class.40 One gloss that has been given to this latter characterization: "It says, in essence, that the [computably] extrapolable sequences are the ones that can be computed rapidly."41 Perhaps a more accurate gloss would be: for any computably NV-learnable set S of sequences, there is an at least somewhat natural notion of rapidly computable relative to which every sequence in S is rapidly computable.

39 It is usually given in the form: a set of sequences is a subset of some NV(m) if and only if it is a subset of a computably enumerable family of computable binary sequences. See the references in footnote 29.

40 See the references in footnote 29. Roughly, one defines an abstract complexity class by choosing a way of measuring the complexity of a computation (satisfying certain weak conditions) and choosing a computable upper bound on this complexity, and then restricting attention to sequences satisfying this bound. For a brief introduction to abstract complexity classes, see [32, sec. 12.7]. For a thorough treatment, see [50, chap. VII].

41 See [16, p. 127]. For natural senses in which any NV-learnable set of computable sequences forms a "small" subset of the set of computable sequences, see [22]. For further discussion, see [10].
All of this may seem congenial to fans of the Best System approach. True, they may have to recognize fewer chance laws than they might have originally hoped, but to the extent that we can think of there being some sort of complexity cut-off that rules out certain delta-function measures and Bernoulli measures as respectable chance laws, that may seem to uphold the spirit of the Best System approach. (But here it is important to keep in mind the arbitrariness entailed by the considerations canvassed in Remark 4.14.)

Let us end this discussion on a positive note.

Definition 4.18. We say that a set M of probability measures on 2^ω is uniformly computable if there is an enumeration μ_1, μ_2, ... of the elements of M and a Turing machine that, on input of a binary string σ and non-zero natural numbers j and k, gives as output a k-bit approximation (as in Section 1.2) to μ_j(σ).
Proposition 4.19. If L is a proper theory of chance whose set of chance laws is uniformly computable and includes a prior, then L is CCCP-satisfiable.
Proof. Let μ_1, μ_2, ... be an enumeration of the chance laws of L in virtue of which these laws are uniformly computable. Consider a measure of the form ν = Σ_k q_k · μ_k, with computable weights q_k > 0 that sum to one (e.g., q_k = 2^{-k} when there are infinitely many laws). It is a prior (since by assumption one of the μ_k's is). It is immediate that ν satisfies CCP for L. And it is straightforward to show that ν is computable.42

§5. Will we never learn? A reaction that people sometimes have to the Principal Principle: Who cares how my prior behaves conditional on an event E like that of the coin being fair? If I am a true Humean, I should regard this E as something that could never be part of my evidence, since it is a holistic fact about the history of the entire universe.43 Lewis has an answer to this challenge.44

To the believer in chance, chance is a proper subject to have beliefs about. Propositions about chance will enjoy various degrees of belief, and other propositions will be believed to various degrees conditionally upon them.45

To the subjectivist who believes in objective chance, particular or general propositions about chance are nothing special. We believe them to varying degrees. As new evidence arrives, our credence in them should wax and wane in accordance with Bayesian confirmation theory. It is reasonable to believe such a proposition, like any other, to the degree given by a reasonable initial credence function conditionalized on one's present total evidence.46

We are interested in the Principal Principle, not because we want to be prepared just in case we should learn the truth of some law of chance, but because chance laws can be confirmed and disconfirmed: we have varying degrees of credence in various chance laws and the Principal Principle is a constraint on the rational connection between those theoretical beliefs and concrete expectations about future events.47

Learnability is in fact an important (if largely subterranean) Lewisian theme in the paper in which the Principal Principle made its debut.48 He has recourse to nonstandard-valued priors because he thinks that rational priors must be regular (i.e., assign non-zero probability to each non-trivial proposition):

it is required as a condition of reasonableness: one who started out with an irregular credence function (and who then learned from experience by conditionalizing) would stubbornly refuse to believe some propositions no matter what the evidence in their favor.49

In a post-script he suggests that "if we start with a reasonable initial credence function and do enough feasible investigation, we may expect our credences to converge to the chances."50

Now, within the standard framework that we have been working in, only Bayesian agents who assign non-zero prior probability to a chance hypothesis can become more confident of that chance hypothesis through experience. And within our standard framework, no Bayesian agent can assign non-zero prior probability to each chance hypothesis of a theory of chance that has uncountably many laws of chance. It is natural to think that something has gone wrong here. Surely Basic Frequentism, despite having uncountably many laws of chance, should provide an extraordinarily hospitable environment for a Bayesian agent attempting to learn the chance facts! One option would be to simply excuse priors from having to defer to chance laws when they assign the corresponding chance hypothesis probability zero.51

Definition 5.1 (CCP*). Let L be a theory of chance and let ν be a prior. We say that ν satisfies CCP* for L if: for any Borel subset A of 2^ω and for any chance law μ such that ν assigns positive probability to Λ := L^{-1}(μ), we have ν(A | Λ) = μ(A).

Relative to Basic Frequentism, the fair coin measure satisfies CCP* (but not CCP of course). Someone who begins life with this prior is certain from birth that the chance of a 0 at any particular future time is 1/2 and remains certain of this no matter how strongly the data speak against this hypothesis and in favour of others.52 To accept CCP* as an explication of the idea that a rational agent's credence function should defer to chance is to give up entirely on the idea that part of what deference to chance involves is being open to learning chance hypotheses.53

I think that we should be interested in finding variants of CCP that require priors to be open to the learning of chance hypotheses.54 To this end, let us zoom out. The thought behind the Best System approach is that a chance law of a world is a sort of optimal scientific summary of the pattern of events at that world (relative to the ordinary standards of scientific practice). The thought behind the Principal Principle is that (in ordinary circumstances at least) to the extent that you are confident that the chance law of your world is given by μ, your credences should be close to the μ-chances. Suppose that a given prior ν satisfies at least the spirit of the Principal Principle and embodies the canons of scientific rationality (preferring simpler theories to more complex ones, etc.).55 Then it seems that if ν sees larger and larger portions of a world with chance law μ_0, it should become more and more confident that the law of chance is μ_0-like (since ν is a good scientist, it will eventually home in on the best scientific description of its world) and hence, upon being conditionalized on larger and larger data sets, the probabilities that ν assigns events will approach those that μ_0 assigns them (since ν satisfies the spirit behind the Principal Principle).

42 See, e.g., [10, proposition 4.7].

43 This makes sense as an application of the law of total probability (writing H for the event of Heads, A for the chance being .5, B for it being .35, and C for it being .8),
Let L be a proper theory of chance and let μ be a chance law of L with corresponding chance hypothesis Λ = L^{-1}(μ). Let S be typical in Λ. Our line of thought above tells us that as n → ∞, we expect that ν(· | S↾n) comes closer and closer to μ(· | S↾n) in some sense. So let us consider generalizations of CCP of the form: there exists a prior ν such that for every chance law μ, for μ-almost all sequences S in the corresponding chance hypothesis, the ν(· | S↾n) converge to the μ(· | S↾n) as n → ∞. The question is: What notion of convergence should we demand here?

Merging.

Recall that the total variation distance between probability measures μ and ν on 2^ω is

$$\sup_{A \in \mathcal{B}} |\mu(A) - \nu(A)|,$$

where B is the family of Borel subsets of 2^ω.56 The following is a specialization to our context of the notion of merging introduced in [14] and subsequently widely discussed by statisticians and game theorists.

52 Some seem to feel the temptation to say: Yes, but if the fair coin is your prior, that is exactly what you should do! That is of course correct according to subjective Bayesianism. But the Principal Principle is not consistent with subjective Bayesianism: it was put forward as part of an attempt to articulate constraints on rational priors beyond mere coherence. There is not much point to such constraints unless one is willing to say: some agents who are coherent and update by conditionalization are nonetheless irrational.

53 The same sort of case can be made against a reformulation of CCP in terms of regular conditional probabilities (see Appendix B.4). Dmitri Gallow pointed out to me that a version of this problem arises for the New Principle of Hall [28] and Lewis [46] (discussed in Appendix A): in the setting of finite worlds, the fair coin measure satisfies the New Principle for the theory of chance that assigns each world the Bernoulli measure whose parameter gives the relative frequency of 0's at that world.

54 Further remarks about alternative directions can be found in Appendix B.

55 I do not actually believe that it is possible to design priors that always prefer simpler hypotheses (see [9]). But here I bracket my various heretical views about the shortcomings of the Bayesian account of rationality.

56 For background on this notion of distance and its relation to other ways of topologizing spaces of probability measures, see, e.g., [26].

Definition 5.2 (Merging).
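Let μ and ν be probability measures on 2^ω. We say that ν merges with μ if there is a set of sequences of μ-measure one such that for each S in that set:

$$\lim_{n\to\infty} \sup_{A \in \mathcal{B}} \bigl|\nu(A \mid S{\upharpoonright}n) - \mu(A \mid S{\upharpoonright}n)\bigr| = 0,$$

where S↾n is the n-bit initial segment of S and B is the family of Borel subsets of 2^ω.

Definition 5.3 (M-CCP). Let L be a theory of chance. We say that a prior ν on 2^ω satisfies the Merging Chance-Credence Principle (M-CCP) for L if ν merges with each chance law of L.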
If a prior merges with a proper law of chance μ of a theory of chance, then we can take the set of sequences that witnesses merging to be a subset of the chance hypothesis of μ.

Definition 5.4. We call a theory of chance L M-CCP-satisfiable if there is a prior that satisfies M-CCP for L. If there is a computable prior with this feature, we call L M-CCCP-satisfiable.
A pair of classic results clarifies what it takes for a prior to merge with a probability measure. Recall that if μ_1 and μ_2 are measures on a measurable space, then we say that μ_1 is absolutely continuous with respect to μ_2 if for each measurable set A, μ_1(A) > 0 implies μ_2(A) > 0. In this case we write μ_1 ≪ μ_2.
Remark 5.5. Note that if we have some measures satisfying ν = μ_1 + μ_2, then μ_1, μ_2 ≪ ν. So, in particular, if a prior ν contains a proper law of chance μ as a nugget of truth, then μ ≪ ν.

Proposition 5.6 (Blackwell and Dubins [14]). Let μ be a probability measure on 2^ω and let ν be a prior on 2^ω. If μ ≪ ν, then ν merges with μ.

Proposition 5.7 (Lehrer and Smorodinsky [44]). Let μ be a probability measure on 2^ω and let ν be a prior on 2^ω. If ν merges with μ, then μ ≪ ν.
Each prior merges with uncountably many probability measures.57 But only countably many of these can be chance laws of any given reductionist theory of chance.

Proposition 5.8. Let L be a theory of chance. If L is M-CCP-satisfiable, then L has only countably many chance laws.
Proof. Suppose that ν satisfies the M-CCP for L. Let μ be a law of chance of L with chance hypothesis Λ = L^{-1}(μ). Since L is a theory of chance, we have μ(Λ) > 0. Since ν merges with μ, Proposition 5.7 then implies that ν(Λ) > 0. So ν must assign positive probability to each chance hypothesis of L. Since distinct chance hypotheses are disjoint, L can have only countably many chance laws.

Corollary 5.9. Super-Determinism and Basic Frequentism are M-CCP-unsatisfiable (and hence also M-CCCP-unsatisfiable).
As an immediate consequence of Remark 5.5 and Propositions 3.4 and 5.6 we have:

Proposition 5.10. If a prior satisfies CCP for a theory of chance, then it also satisfies M-CCP for that theory.

57 See, e.g., [10, proposition 4.4].

Corollary 5.11. A theory is M-CCP-satisfiable if it is CCP-satisfiable. In particular, Computable Super-Determinism, Computable Frequentism, and the Toy Best System Theory are M-CCP-satisfiable.

Remark 5.12 (Mixtures and Merging). Let {ν_k}_{k∈I} be a countable family of probability measures at least one of which is a prior and for each k ∈ I, let q_k > 0 with Σ_{k∈I} q_k = 1. Then ν := Σ_{k∈I} q_k · ν_k is a prior. And ν merges with any measure merged with by one of the ν_k: if μ is merged with by ν_k, then μ ≪ ν_k (by Proposition 5.7), so μ ≪ ν (given how ν was defined), and so ν merges with μ (by Proposition 5.6).
In the case of Computable Super-Determinism, a prior satisfies CCP if and only if it satisfies M-CCP. But, for more typical theories, we expect that there will be priors that satisfy M-CCP without also satisfying CCP.
Example 5.13. A theory L and a prior may satisfy M-CCP without satisfying CCP (even if L is proper and the prior is computable): let L be the theory that assigns the fair coin measure μ_.5 to every sequence and let ν† be the prior that thinks there is a .9 chance that the first bit is a 0 and thinks that each subsequent bit is chosen by flipping a fair coin. We have merging (because μ_.5 and ν† agree completely when conditionalized on any non-trivial initial segment) but CCP is violated.
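The two claims in this example can be checked on cylinder sets with exact arithmetic. The following Python sketch is an illustration only: the function names are our own, strings are represented as tuples of bits, and agreement is verified just for short strings (agreement on all cylinder sets is what yields agreement on all Borel events).

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]
Measure = Callable[[Bits], Fraction]  # probability of the cylinder set of a string


def fair(sigma: Bits) -> Fraction:
    """Fair coin measure of the cylinder set of sigma."""
    return Fraction(1, 2 ** len(sigma))


def dagger(sigma: Bits) -> Fraction:
    """First bit is 0 with probability 9/10; every later bit is a fair coin flip."""
    if not sigma:
        return Fraction(1)
    first = Fraction(9, 10) if sigma[0] == 0 else Fraction(1, 10)
    return first * Fraction(1, 2 ** (len(sigma) - 1))


def conditional(measure: Measure, tau: Bits, sigma: Bits) -> Fraction:
    """Probability that tau comes next, given that sigma has been observed."""
    return measure(sigma + tau) / measure(sigma)


def agree_after_first_bit(max_len: int) -> bool:
    """Check that both measures give the same conditionals on every nonempty sigma."""
    for n in range(1, max_len + 1):
        for sigma in product((0, 1), repeat=n):
            for k in range(max_len + 1):
                for tau in product((0, 1), repeat=k):
                    if conditional(fair, tau, sigma) != conditional(dagger, tau, sigma):
                        return False
    return True
```

Here `agree_after_first_bit(3)` holds, which reflects the merging claim, while `dagger((0,))` is 9/10 rather than the fair coin's 1/2, which is where CCP fails.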
Example 5.14. Improper theories of chance are never CCP-satisfiable. But they can be M-CCCP-satisfiable. Consider the theory of chance that assigns the measure ν† of the preceding example to each sequence that begins with 0 as its law of chance (and which considers sequences that begin with 1 to be lawless). This theory is CCP-unsatisfiable (because it is improper: ν† assigns its chance hypothesis probability .9). But the fair coin measure (considered as a prior) satisfies M-CCP for this theory.
We have a partial converse to Corollary 5.11.

Proposition 5.15. Any proper theory of chance that is M-CCP-satisfiable is also CCP-satisfiable.
Proof. Let L be a proper theory of chance that is also M-CCP-satisfiable. We know from Proposition 5.8 that L has a countable family of laws of chance: μ_1, μ_2, .... Let ν be a prior that merges with each μ_k. A result of Ryabko tells us that if K is a set of probability measures on 2^ω and there exists a probability measure that merges with each measure in K, then there exists a measure ν* that merges with each measure in K and which can be written as a sum of measures, each of which is a positive multiple of a measure in K.58 Applied to our case: there must be a ν* that is a weighted sum of the μ_k and which merges with each μ_k. By Proposition 5.7, each μ_k ≪ ν*, so each μ_k appears with positive weight in ν*. So ν* (or the result of adding something to it to make sure that we have a prior, as in the proof of Corollary 3.6) satisfies CCP for L (by Proposition 3.4).
Of course, it does not follow from this that M-CCCP-satisfiability implies CCCP-satisfiability for a proper theory of chance L: it could be that the only computable priors that merge with every chance law of L are not themselves weighted sums of the chance laws of L. So, in particular, the above results leave open the questions whether Computable Super-Determinism, Computable Frequentism, and the Toy Best System Theory are M-CCCP-satisfiable. We will see in Proposition 5.25 that Computable Super-Determinism is M-CCCP-unsatisfiable. We can use a variant of the proof of Proposition 4.9 to show that there are no computable priors that satisfy M-CCP for Computable Frequentism or for the Toy Best System.

Proposition 5.16. Let R = {r_k}_{k∈J} be a set of computable binary sequences and for each k ∈ J, let μ_k be the Bernoulli measure whose parameter admits the binary expansion r_k. If ν is a computable prior that merges with each μ_k, then there is an extrapolating machine that NV-learns each r_k.
Proof. It suffices to show that for each r_k ∈ R, ML_ν ∩ ML_{μ_k} ≠ ∅, and then to argue as in steps (2) and (3) of the proof of Proposition 4.9.

For each k ∈ J, let N_k be the set of sequences in which 0's have limiting relative frequency r_k. Then we can write ν = ρ + Σ_{k∈J} c_k · ν_k, where ρ is a (possibly trivial) measure that considers each N_k a null set, each c_k > 0, and each ν_k is a probability measure that assigns N_k probability one (so c_k · ν_k is the restriction of ν to N_k). Since ν merges with each μ_k, we know via Proposition 5.7 that μ_k ≪ ν, from which it follows that μ_k ≪ ν_k.

The ν_k may or may not be computable. But the notion of Martin-Löf randomness can be generalized to apply to arbitrary probability measures: we call a sequence S blindly Martin-Löf random relative to the probability measure μ if there is no uniformly effective family of open sets U_1, U_2, ... such that S ∈ ⋂_k U_k and μ(U_k) ≤ 2^{-k} for each k.59 We write BML_μ to denote the set of sequences blindly Martin-Löf random relative to the probability measure μ (of course BML_μ = ML_μ when μ is computable). Note that we have μ(BML_μ) = 1 for any probability measure μ.60

We claim that for any k ∈ J, we have BML_{ν_k} ∩ ML_{μ_k} ≠ ∅. For suppose otherwise. Then of course ν_k(ML_{μ_k}) = 0 (since ν_k(BML_{ν_k}) = 1). But since μ_k(ML_{μ_k}) = 1, this contradicts the fact that μ_k ≪ ν_k.

And we can now argue, as in the first step of the proof of Proposition 4.9, that since ν can be decomposed as a sum of measures in which each ν_k appears with non-zero weight, every sequence in BML_{ν_k} must also be in ML_ν. So we have, as desired, that for each r_k ∈ R, ML_ν ∩ ML_{μ_k} ≠ ∅.

Corollary 5.17. There are no computable priors that satisfy M-CCP for Computable Frequentism or for the Toy Best System.

Weak merging.
M-CCP is weaker than CCP and M-CCCP is weaker than CCCP. But for proper theories of chance M-CCP-satisfiability is equivalent to CCP-satisfiability, and it would be surprising if there were a significant gap between M-CCCP-satisfiability and CCCP-satisfiability for such theories. So it makes sense to consider a further alternative to CCP. Here is another criterion of learning that has played some role in game theory.61

59 For this notion, see [42]. For some purposes it is natural to work with another generalization of the notion of Martin-Löf randomness to the setting of arbitrary measures, on which one requires μ-random sequences to avoid all μ-null sets that are effectively definable relative to an oracle encoding μ. See [56].

60 The reasoning of footnote 36 carries over unchanged.

61 Weak merging was introduced in [38]. The presentation here follows that of Lehrer and Smorodinsky [44]. For further discussion, see [10] (where weak merging is called next-chance learning).
Definition 5.18 (Weak merging). Let μ and ν be probability measures on 2^ω. We say that ν weakly merges with μ if there is a set of sequences of μ-measure one such that for each S in that set:

$$\lim_{n\to\infty} \bigl|\nu(0 \mid S{\upharpoonright}n) - \mu(0 \mid S{\upharpoonright}n)\bigr| = 0,$$

where S↾n is the n-bit initial segment of S.

Definition 5.19 (WM-CCP). Let L be a theory of chance. We say that a prior ν on 2^ω satisfies the Weak Merging Chance-Credence Principle (WM-CCP) for L if ν weakly merges with each chance law of L.
If a prior weakly merges with a proper law of chance μ of a theory of chance, then we can take the set of sequences that witnesses weak merging to be a subset of the chance hypothesis of μ.

Definition 5.20. We call a theory of chance L WM-CCP-satisfiable if there is a prior that satisfies WM-CCP for L. If there is a computable prior with this feature, we say that L is WM-CCCP-satisfiable.
The following is immediate.

Proposition 5.21. If one probability measure merges with another, then the first also weakly merges with the second. If a theory of chance is CCP- or M-CCP-satisfied (CCCP- or M-CCCP-satisfied) by a given prior, then it is also WM-CCP-satisfied (WM-CCCP-satisfied) by it. If a theory is CCP- or M-CCP-satisfiable (CCCP- or M-CCCP-satisfiable) then it is also WM-CCP-satisfiable (WM-CCCP-satisfiable).
Remark 5.22 (Stronger than it looks). In moving from merging to weak merging we lower our sights: rather than requiring that, in the limit of large data sets sampled from μ, for every event, the judgement of ν of the probability of that event must conform to the judgement of μ (in a fashion uniform across events), we require this only for a special class of events, those concerning the identity of the next bit to be revealed.

But to require weak merging is to require more than may be immediately apparent. It turns out that ν weakly merges with μ if and only if: as the size of the data set goes to infinity, for each k, ν must asymptotically learn to defer to μ in a way uniform across all events that look k time-steps beyond the data about the chance of all such events.62 So merging is stronger than weak merging precisely in paying attention to events involving an infinite number of times (a type of event about whose epistemological salience a Humean might well be suspicious).
Example 5.23. Consider the Bayes-Laplace prior.63 This is the computable probability measure ν̄ determined by the rule that if σ is an m-bit string containing ℓ 0's, then

$$\bar{\nu}(0 \mid \sigma) = \frac{\ell + 1}{m + 2}.$$

Famously, there is a sense in which ν̄ is a Lebesgue-uniform mixture of the Bernoulli measures, and from this it follows that ν̄ weakly merges with each Bernoulli measure.64 It then follows that ν̄ satisfies WM-CCP (indeed, WM-CCCP) for Basic Frequentism and for Computable Frequentism.

62 See [44, definition 9 and remark 5].

63 For more on this prior, see, e.g., [21,

64 See, e.g., [23, 24].
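A short Python sketch can make the behaviour of this prior concrete. It assumes the standard rule-of-succession form of the Bayes-Laplace prior, on which the probability of a 0 after an m-bit string containing ℓ 0's is (ℓ + 1)/(m + 2); the function names are our own. The sketch also checks that chaining these conditionals reproduces the closed form ℓ!(m − ℓ)!/(m + 1)! obtained by integrating the Bernoulli likelihood against the uniform measure on the parameter.

```python
from fractions import Fraction
from math import factorial
from typing import List


def bl_next_zero(sigma: List[int]) -> Fraction:
    """Rule of succession: probability that the next bit is 0, given sigma."""
    m, zeros = len(sigma), sigma.count(0)
    return Fraction(zeros + 1, m + 2)


def bl_cylinder(sigma: List[int]) -> Fraction:
    """Probability of the cylinder set of sigma, built up one bit at a time."""
    prob = Fraction(1)
    for i, bit in enumerate(sigma):
        p_zero = bl_next_zero(sigma[:i])
        prob *= p_zero if bit == 0 else 1 - p_zero
    return prob


def closed_form(sigma: List[int]) -> Fraction:
    """Integral of p^zeros * (1 - p)^(m - zeros) over p in [0, 1]."""
    m, zeros = len(sigma), sigma.count(0)
    return Fraction(factorial(zeros) * factorial(m - zeros), factorial(m + 1))
```

On data in which 0's have limiting relative frequency 2/3 (for instance the deterministic pattern 0, 0, 1 repeated, used here only because it is reproducible and not because it is random), `bl_next_zero` approaches 2/3, which is the next-bit chance under the corresponding Bernoulli measure.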
This example shows that theories with uncountably many laws of chance can be WM-CCP-satisfiable (and even WM-CCCP-satisfiable), even though they must be CCCP-, CCP-, M-CCCP-, and M-CCP-unsatisfiable. But there are limits.

Proposition 5.24. Let L be a theory of chance and let ν be a prior that satisfies WM-CCP for L. Then the laws of chance of L include only countably many delta-functions. In particular, Super-Determinism is WM-CCP-unsatisfiable.
Proof. We generalize the notion of an extrapolating machine: an extrapolator is a (not necessarily computable) function from binary strings to bits. We extend the notion of NV-learning to extrapolators: extrapolator m NV-learns sequence S if, when shown S bit by bit, m is eventually always correct in its predictions about the next bit. The set of sequences NV-learned by an extrapolator m is always countable (each such sequence can be generated by extending some binary string by asking m what it expects to see next). Any prior ν determines an extrapolator m*_ν via the rule: m*_ν(σ) = 0 if ν(0 | σ) ≥ 1/2, and m*_ν(σ) = 1 otherwise. Suppose that prior ν satisfies WM-CCP for theory of chance L. Suppose that for some S ∈ 2^ω, δ_S is a law of chance of L. We can make ν(0 | S↾n) as close as we like to δ_S(0 | S↾n) by choosing n sufficiently large. It follows that m*_ν NV-learns S. Since m*_ν cannot NV-learn an uncountable set of sequences, there can only be countably many delta-functions among the laws of chance of L.
Computable Super-Determinism and the Toy Best System Theory are WM-CCP-satisfiable (being CCP-satisfiable). But they are not WM-CCCP-satisfiable.

Proposition 5.25. No computable prior weakly merges with each delta-function concentrated on a computable sequence.
Proof. Let a computable prior be given, with associated extrapolating machine m (as in Section 4.1), and let S be a binary sequence. Arguing as in Proposition 5.24, we see that if the prior weakly merges with the delta-function concentrated on S, then m NV-learns S. But Proposition 4.5 implies that no extrapolating machine NV-learns each computable sequence.
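Proposition 4.5 is not reproduced in this section. The sketch below illustrates the standard diagonal construction that results of this kind typically rest on, offered as an assumption about the underlying idea rather than as the paper's own proof: given any extrapolator, build the sequence that always does the opposite of what the extrapolator predicts. If the extrapolator is computable, so is this sequence, and the extrapolator is wrong about every one of its bits. The names `majority_extrapolator`, `diagonal_sequence`, and `count_errors` are hypothetical.

```python
from typing import Callable, List

# An extrapolator maps the bits seen so far to a predicted next bit.
Extrapolator = Callable[[List[int]], int]


def majority_extrapolator(history: List[int]) -> int:
    """Hypothetical example: predict the bit seen most often so far (0 on ties)."""
    return 1 if 2 * sum(history) > len(history) else 0


def diagonal_sequence(m: Extrapolator, n: int) -> List[int]:
    """First n bits of the sequence that always contradicts m's prediction."""
    bits: List[int] = []
    for _ in range(n):
        bits.append(1 - m(bits))
    return bits


def count_errors(m: Extrapolator, bits: List[int]) -> int:
    """Number of positions at which m mispredicts the next bit of `bits`."""
    return sum(1 for i, b in enumerate(bits) if m(bits[:i]) != b)


if __name__ == "__main__":
    adversary = diagonal_sequence(majority_extrapolator, 20)
    print(adversary, count_errors(majority_extrapolator, adversary))
```

The same extrapolator makes no errors on the all-0's sequence, so the point is not that it learns nothing, only that some computable sequence always escapes it.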
Remark 5.26 (Mixtures and weak merging). In Remark 5.12, we saw that when we build a prior by taking a non-trivial mixture of some measures, that prior merges with every probability measure merged with by any component of the mixture. The picture is different with weak merging: Ryabko and Hutter give an example of a prior and a probability measure whose equal-weight mixture fails to weakly merge with a delta-function measure that is weakly merged with by one of the two.65 So it is not obvious that you can build a prior suited to a Best System theory of chance by taking a mixture of a prior (such as the Bayes-Laplace prior) that weakly merges with a desirable range of Bernoulli measures with a measure that weakly merges with a desirable range of delta-function measures.

65 See [59, proposition 10]. In this connection, see also [44, corollary 6].

Toy Best System Theory:
The Toy Best System is not CCCP-satisfiable (see Proposition 4.7 or Corollary 4.10). It is also not WM-CCCP-satisfiable (Proposition 5.25), from which it follows that it is not M-CCCP-satisfiable (Proposition 5.21, or see Corollary 5.17). On the other hand, this theory is CCP-satisfiable (Proposition 3.4). So it is also M-CCP-satisfiable (Corollary 5.11) and WM-CCP-satisfiable (Proposition 5.21).

§7. Conclusion. The main goal of this paper has been to show that it is more difficult than one might think to find a consistent package consisting of a reductionist theory of chance and a principle encoding the rational relation between credence and chance, especially if one expects there to be rationally permitted computable priors.66

In Section 3 we saw that there exists a prior probability measure satisfying CCP relative to a given proper theory of chance if and only if that theory has only countably many chance laws. This is, of course, a direct consequence of the choice to use ordinary (i.e., countably additive) probability measures to model rational credal states. Some reductionists about chance may be happy to ignore theories with uncountably many chance laws (e.g., advocates of best-system approaches who take fully seriously the idea that each chance law should be something like a humanly expressible theory of the world). But many reductionists will, I suspect, regard Basic Frequentism, the laws of chance of which are the Bernoulli measures, as a theory that should be compatible with our account of the relation between chance and credence. In the first instance, such reductionists face a choice between accepting that credal states can be represented by objects more general than probability measures and replacing CCP with something weak enough to accommodate the existence of rationally permitted priors adapted to theories of chance with uncountably many chance laws.
In Section 4 we saw that even if priors exist that satisfy CCP relative to a given theory of chance, there may be no computable priors with this feature. In particular, this happens when the laws of chance of a theory include each delta-function measure concentrated on a computable sequence, the root of the problem being that the computable sequences, though enumerable, are not computably enumerable. A related problem arises for theories whose chance laws include each Bernoulli measure with a computable parameter. I do not think we should rest content with an account of chance and credence that tells us that rationality requires us to adopt a noncomputable credence function, any more than we would be satisfied with an account of rationality that required rational agents to be able to solve the Halting Problem. Some reductionists may be happy with the moral that certain prima facie attractive theories of chance are unsuited to computable agents; such reductionists will then need to be careful to avoid using computationally intractable sub-families of the delta-function measures and Bernoulli measures as laws of chance. Others may again be interested in investigating whether the problem can be avoided by generalizing the Bayesian framework or by endorsing a constraint of theories of chance on rationally permitted priors weaker than CCP.
There are a number of standard strategies that Bayesians have explored in other contexts in which the presence of an uncountable number of alternatives and/or awkward null sets causes trouble, including infinitesimal-valued credence functions, merely finitely additive probabilities, regular conditional probabilities, and primitive conditional probabilities. These are considered briefly in Appendix B, where it is argued that none of them offers much help in our current predicament.
Section 5 explored two ways of weakening CCP with an eye to satisfying those desiderata: M-CCP and WM-CCP. The latter turns out to be the more fruitful for our purposes. It says, roughly, that the rational priors are those with the feature that, almost certainly, in the limit of large data sets the posterior probabilities will converge to the true chances for events that depend on only finitely many times. WM-CCP has the attractive feature that it is satisfied by the computable Bayes-Laplace prior, both relative to Basic Frequentism (with its uncountable family of chance laws) and relative to Computable Frequentism (with its enumerable but computationally intractable family of chance laws). So it is a substantive constraint on priors that is consistent with the intuition that frequentism in its several forms is one of the more forgiving of reductionist approaches, and with the thesis that rationality should be consistent with computability. But like CCP, WM-CCP cannot be satisfied by any computable prior relative to a theory of chance that includes among its laws of chance all delta-function measures concentrated on computable sequences.
Other responses to the problems encountered in Sections 3 and 4 are of course possible. One of the most natural would be to reject the assumption that we have implicitly carried over from the literature on the Principal Principle: that our account of the relation between credence and chance should be the basis of a dichotomy between priors that are rationally permitted and those that are not rationally permitted. Perhaps we should instead think of considerations about chance as inducing a partial ordering, more rational than, on possible priors. For instance, given a theory of chance L and a prior, we could look for the largest sub-theory of L (= restriction of L to a subset of its domain of definition) such that the prior satisfies one of our principles (CCP, M-CCP, or WM-CCP) for that sub-theory, considering one prior to be no more rational than a second if the sub-theory of L associated with the first is itself a sub-theory of the sub-theory of L associated with the second. If there are priors that satisfy the given principle for a given theory of chance, then they will be fully rational in the sense that no prior is more rational than they are; but otherwise, there will just be more and less rational priors without any being fully rational.
Applied to CCP or to M-CCP, this strategy is not particularly enticing: it tells us that the fair coin measure is a more rational prior for Basic Frequentism than the Bayes-Laplace indifference prior, even though agents with the fair coin measure as their prior will remain certain that the chance that the next bit will be 0 is .5 no matter what data they see, while the posterior probabilities that agents with the Bayes-Laplace prior assign to that event will converge (almost surely) to the true chance (if one there be) in the limit of large data sets. But applied to WM-CCP, this strategy suggests a way forward for fans of best-system-type approaches in grappling with the problems surrounding delta-function measures concentrated on computable sequences (although some challenges would remain).

This is far from a complete account of possible responses to the problems exposed above. But I hope, at least, that one moral is clear: interesting work remains to be done for anyone who is attracted to Lewis's accounts of chance and credence and who is also inclined to take the computability of priors to be consistent with their rationality.

§Appendix A. Lewisiana. This appendix examines the relation between the approach adopted above and the main stream of literature on the Principal Principle. The two chief goals are to clarify the relation between the Chance-Credence Principle and the Principal Principle and to examine the proper role of improper theories of chance.
A.1. CCP and the principal principle. Lewis thought of his Principal Principle (PP) as a generalization of the sort of principle associated with Miller and Hacking.67 Roughly speaking: PP says not only that a rational prior should agree with the chance facts when conditionalized on the chance-making facts, but also that there are many further propositions that are screened off by the chance facts.68 CCP is intended to be a special case of (a reasonable explication of) Lewis's PP in which we ignore such further screened-off propositions (along with attendant complications about admissibility). For the curious, I offer further comments on the relation between CCP and PP.
1. As Lewis formulates it, PP has no relativity to a theory of chance. But such relativity is unavoidable: PP is intended to be compatible with a range of reductionist theories of chance (Lewis was not committed to the Best-System account when he first formulated PP); and priors that satisfy the spirit of PP with respect to one such theory may be incompatible with the spirit of PP with respect to another such theory (consider, e.g., the question whether it is rationally required to put non-zero prior credence on the sequence that eternally alternates between 0 and 1).

2. As Lewis formulates PP, it involves talking about the chances at a time, which in the present framework would involve conditionalizing a law of chance on a binary string. Of course, one such string is the empty string, and specializing a Lewis-style formulation to that case gives a version in which time no longer plays a role. And no content is lost via this specialization.69

3. Omitting all the bits about the screened-off event E and the time t, Lewis's PP reads:

Let C be any reasonable initial credence function. Let x be any real number in the unit interval. Let X be the proposition that the chance of A's holding equals x. Then C(A | X) = x.70

For Lewis, a proposition is a set of worlds. So in our context, the X must correspond to a subset of 2^ω. Which one? Recall that above we quoted Lewis as saying that the idea behind PP is that a rational prior should have the feature that it says that the chance of an outcome is 50% when conditionalized on the proposition that makes it true that the chance of that outcome is 50%. So in the schema above, X should be the proposition that the chance-making pattern obtains in virtue of which the chance of A is x. In the case where λ is the only law of chance that assigns A chance x, this is the pattern that makes λ be the law of chance, that is, the chance hypothesis Λ corresponding to λ. More generally, there may be more than one chance law that assigns A chance x, and then X will be the disjunction of the corresponding chance hypotheses. But if a prior satisfies a Lewis-style principle that deals in such disjunctive events, then it also satisfies one in the style of CCP (since you can always replace A by the conjunction of A with the proposition that a certain chance law holds, which lands you in the simple case considered above). And, at least in the case where the disjunction of chance laws under consideration involves only finitely many chance laws, the converse also holds.71

67 See [7, Letters 659 and 660].
68 It is not clear that Lewis achieves what he set out to do: if a prior satisfies CCP then it is (essentially) a weighted sum of the laws of chance, so the only freedom available in choosing a prior reduces to choosing these weights, and screening-off conditions are not going to help to fix these. In this connection, see the discussion of Pettigrew [51, sec. 1].
69 On this point, see [51, fns. 3 and 4].
70 Adapted from [45, p. 87]. See also [46, sec. 5].

A.2. What about the New Principle?
Each of the five paradigmatic theories of chance that we have been concerned with is proper. If we were to work with worlds whose complete histories corresponded to finite binary strings instead of to infinite binary sequences, it would have been difficult to find interesting proper theories of chance.72 For instance, since the fair coin measure assigns positive probability to each finite string, it can be a proper law of chance of a theory for finite worlds only if it is the sole law of chance.
However, there are improper theories of chance even in the infinite setting. Example 2.10 provided a toy example: a theory of chance in which all and only sequences beginning with 1 get the fair coin measure as their (improper) law of chance.
It is not obvious that it makes sense to assert the proposition corresponding to an improper law of chance. In Example 2.10, for instance, the relevant proposition implies both that the first bit will be 1 and that for each bit, there is only a 50-50 chance that it will be 1. Now, corresponding to any theory of chance L there is a proper theory of chance L̄ whose laws of chance are given by λ̄ := λ(• | L^{-1}(λ)) (so that L̄ = L if and only if L is proper).73 To my mind, it is more natural (because less Moore-paradoxical) to interpret someone apparently advancing an improper chance L as having chosen a clumsy way to assert L̄.74

The observation that the Principal Principle is inconsistent with improper theories of chance launched the large and messy literature on undermining with its many attempts to correct the Principal Principle. If you are the sort of person who likes improper theories of chance, then it is worth noting the following. Specialized to our setting, the New Principle of Hall [28] and Lewis [46] takes the following form.
Definition A.1. Let L be a theory of chance and let μ be a prior. We say that μ satisfies the New Principle for L if: for any chance law λ of L with chance hypothesis Λ = L^{-1}(λ) and for any Borel subset A of 2^ω, we have μ(A | Λ) = λ(A | Λ).

[...] it requires of priors that when conditionalized on the set of worlds corresponding to a law of chance of that theory they give the same probability to each event that the law of chance itself does. The discussion in the main body of this paper took for granted that the conditional probability of A given B is the ratio of two real numbers, the probability of A&B and the probability of B, and so is undefined when the probability of B is zero. This led to Proposition 3.4, and from there, along the path that led us to consider CCCP, M-CCP, and the rest. Many readers will want off the boat as soon as CCP appears, either because they think it is too intolerant of null sets (see the discussion of CCP* in Section 5) or because they prefer some more sophisticated approach to handling conditional probabilities.
Here I will briefly give my reasons for cleaving to the simple-minded approach followed above. In brief: allowing primitive conditional probabilities to model rational credal states would in any case push us in the direction pursued in Section 5; allowing merely finitely additive probabilities (or non-standard-valued probabilities) to model rational credal states would not help us to evade the delta-function version of the computational obstruction encountered in Section 4; and reformulating CCP in terms of regular conditional probabilities requires us to abandon the connection between learning and deference to chance discussed in Sections 1 and 5.

B.1. What about primitive conditional probabilities?
The idea that conditional probability should be taken as primitive rather than unconditional probability has its attractions. Why not just skip all the aggravation and work in a formalism in which the basic probabilistic notion takes two arguments, rather than one?
There are several ways of working this idea out.77 Note that under the most plausible ones, we can always conditionalize on logical truths and (when defined) the result of conditionalizing on a proposition is a probability measure. So we always have a surrogate within this framework for our ordinary one-place prior probabilities. So we can think of primitive conditional probabilities as plans of the following form: I will start life with a prior (an ordinary probability measure); if I face evidence assigned positive probability by that prior, then I update by conditionalization; if I face evidence corresponding to a null set of that prior, then I deploy a backup plan (essentially, a fall-back prior that happens to assign the evidence in question probability one).
Fix some fancy gizmos to your liking that encode primitive conditional probabilities. Cook up a version CCP‡ of CCP adapted to such fancy gizmos. Fix a well-behaved theory of chance L with uncountably many laws of chance such that some fancy gizmo satisfies CCP‡ for L. Let us call an ordinary prior on 2^ω L-admissible if it arises by conditionalizing a fancy gizmo satisfying CCP‡ for L on a logical truth. It will count against this approach if there are no L-admissible priors: this would tell us that in order to obey the principle that credence should defer to L-chance, we must assign prior probability zero to some possible finite data sets (and so be willing to bet our lives against observing such data etc.). It would also be bad if the fair coin measure were L-admissible: when it comes to forecasting probabilities of future events, someone with this prior never learns from experience, so it is hard to see how they could be deferring to L-chance.
So, presumably, there must be some interesting difference between priors that are L-admissible and those that are not. And since we are working in a context in which possible data sets are finite binary strings and in which all priors assign positive probability to each such string, any agent whose initial credal state is given by a fancy gizmo that determines an admissible prior is always going to behave exactly like someone whose initial credal state is given by that prior. Now, we know that this prior doesn't satisfy CCP (no prior does). This strongly suggests that it will be more fruitful to search directly for a generalization of CCP that focuses on the response of priors to finite data sets (as M-CCP and WM-CCP do) rather than proceeding indirectly via the apparatus of primitive conditional probabilities.

B.2. What about merely finitely additive priors?
Note, first, that merely finitely additive priors defined on σ-algebras are non-constructive objects: the existence of such objects cannot be proven without recourse to some choice-like axiom.78 So such objects can, presumably, be set aside given our interest in computable priors.
At the same time: any finitely additive probability measure on 2^ω defined only on the algebra of sets A generated by the basic sets is in fact countably additive (see footnote 8).79 So in order to be of interest for present purposes, a finitely additive prior would need to be defined on an algebra intermediate between the algebra A generated by the basic sets and the σ-algebra ℬ generated by them. Our notion of a computable probability measure on (2^ω, ℬ) exploits two facts: there is a natural sense in which by computably enumerating the binary strings, we effectively list the basic sets; and the behaviour of a probability measure on the basic sets determines its behaviour on all Borel sets. Presumably, in order for finitely additive probability measures defined on an algebra intermediate between A and ℬ to be candidates for computability, we would want to demand something similar: that there be a sense in which we can effectively list a set of generators for this algebra and that any finitely additive probability measure on this algebra is determined by its restriction to these generators.
Example B.1. Let C_0 be the algebra generated by the basic sets together with {0^ω} (the singleton set containing the all 0's sequence).80 In order to specify a finitely additive probability measure M on (2^ω, C_0) it suffices to specify: (i) the probability that M assigns to each basic set; (ii) the probability that M assigns to {0^ω} (freely chosen, subject to the constraint that it does not exceed R, the limit as n goes to infinity of the probability that M assigns to the set of sequences beginning with n 0's).81 So long as these data are computable, we can reasonably consider the finitely additive measure M : C_0 → [0, 1] to be computable.82

More generally, it is clear how to make sense of the notion of a computable but merely finitely additive probability measure on an algebra C_k whose generators include the basic sets together with k singleton sets of computable sequences.

Let C be the algebra generated by the basic sets together with the singleton sets of all of the computable sequences.83 A C-prior is a finitely additive probability measure on (2^ω, C) that assigns positive probability to each basic set. We say that a C-prior M satisfies C-CCP for Computable Frequentism if for each computable sequence S and each B ∈ C we have M(B | {S}) = δ_S(B). There are two reasons that this will not allow us to evade the obstruction to CCCP-satisfiability that we encountered in Section 4.
1. In order for us to think of a finitely additive measure M on C as being computable, we need to think of there being an effective listing of the generators of C, the basic sets and the singletons of computable sequences. This would appear to require that the computable sequences be uniformly computable, which they are not (see Remark 4.14).

2. Let M be a C-prior on (2^ω, C) and let μ be the unique prior on (2^ω, ℬ) whose restriction to basic sets coincides with that of M. Note that if M({S}) > 0 for some sequence S, then μ({S}) > 0 as well.84 So if M satisfies C-CCP for Computable Frequentism, then μ({S}) > 0 for each computable S, which means that μ cannot be computable. So M cannot be computable even in the weak sense that its restriction to the basic sets is computable in the usual sense.

B.3. What about priors taking non-standard values?
Many philosophers follow Lewis [45] in using as representors of rational credal states generalized probability measures taking values in extensions of the unit interval that include infinitesimals. The real part of such an object is a finitely additive probability measure. So the considerations adduced above against the helpfulness of merely finitely additive priors carry over to the case of non-standard-valued priors.

B.4. What about regular conditional probabilities?
In the following, 2^ω always carries the product topology (generated by the basic subsets), any subspace of 2^ω always carries the corresponding subspace topology, the unit interval always carries its standard topology (generated by its open subintervals), and P always carries the weak topology.85 Unless otherwise noted, when considered as measurable spaces, we take these spaces to be equipped with the Borel σ-algebra generated by their open sets.86

82 Let μ be the unique (countably additive) probability measure on (2^ω, ℬ) that agrees with M when restricted to basic sets. Then R = μ({0^ω}) and M is the restriction to C_0 of μ if and only if M({0^ω}) = R.
83 A finitely additive probability measure on C is determined by its behaviour on sets of the following kinds: basic sets, finite unions of singleton sets of computable sequences, and basic sets with such finite unions of singletons deleted. And C consists of finite disjoint unions of sets of these kinds (plus the empty set). See, e.g., [12, theorems 1.1.9(3) and 3.5.1(ii)].
84 See [12, propositions 3.2.7 and 3.3.1].
85 Recall that the weak topology on P can be characterized as follows: for any p, q ∈ Q with p < q and any basic set

For any theory of chance L, we denote by Λ+ the set of sequences to which L assigns chance laws and by Λ* the set of lawless sequences. We have all along required that each chance hypothesis of a theory of chance be Borel. The following is a natural strengthening of that requirement.

Definition B.2. A theory of chance L is measurable if it is measurable as a map from Λ+ to P.87

Our five paradigmatic reductionist theories of chance are measurable. This is more or less immediate for Super-Determinism, Computable Super-Determinism, Computable Frequentism, and the Toy Best System.88 And the measurability of Basic Frequentism is implicit in standard proofs of the de Finetti Representation Theorem.89

Remark B.3. 2^ω and P can both be viewed as computable metric spaces in a natural way.90 So it makes sense to ask whether an acceptable theory of chance should be computable as well as measurable. There is a powerful intuitive reason for resisting the idea that they should be: even Computable Frequentism is not computable as a map from 2^ω to P.91 However, Computable Frequentism satisfies the weaker condition of layerwise computability.92 So it would perhaps be reasonable to require an acceptable theory of chance to be layerwise computable.93

Definition B.4. Let L be a measurable theory of chance and let ℒ be the set of chance hypotheses of L. The kernel of L is the map p_L(• | •) : ℬ × ℒ → [0, 1] defined by p_L(B | Λ) = λ(B) (where λ is the chance law corresponding to Λ).94

Definition B.5. Let L be a proper and measurable theory of chance with kernel p_L. Let μ be a prior that considers the set of lawless sequences of L to be a null set. Then p_L is a proper regular conditional probability for μ if for each B ∈ ℬ we have

\[ \mu(B) = \int_{2^\omega} p_L(B \mid \Lambda(S)) \, d\mu(S) \]

(where we write Λ(S) for L^{-1}(L(S))).95

Example B.6. Suppose that L is a proper theory of chance with only countably many chance laws λ_1, λ_2, ... (with chance hypotheses Λ_1, Λ_2, ...). Then L is measurable (see footnote 88). If μ is a prior that assigns positive probability to each Λ_k and probability zero to the lawless sequences of L, then p_L is a proper regular conditional probability for μ if and only if, for each B ∈ ℬ,

\[ \mu(B) = \sum_k p_L(B \mid \Lambda_k) \, \mu(\Lambda_k), \]

which happens if and only if p_L(B | Λ_k) = μ(B ∩ Λ_k)/μ(Λ_k) for each k.

The core idea behind the Principal Principle, codified in the Chance-Credence Principle, is that, conditional on knowing which law of chance is actual, your credences in events should coincide with their actual chances. In the setting of a proper theory of chance with only countably many chance laws this says: Adopt any prior you like, so long as the probability measure that results from conditionalizing it on a chance hypothesis is the corresponding chance law.

The preceding example shows that one way to generalize this advice to the setting of uncountably many chance laws is to advise: Adopt any prior you like, so long as the kernel for the theory of chance under consideration is a proper regular conditional probability for it.

Think of this kernel as a plan: if you find out which chance hypothesis obtains, then you will adopt credences given by the corresponding chance law; requiring that the kernel be a proper regular conditional probability is a way of constraining priors so that this plan can be thought of as arising naturally out of the priors rather than being wholly arbitrary.

87 When considering a theory of chance L, it is sometimes helpful to consider the σ-algebra ℳ on 2^ω generated by Λ* together with the chance hypotheses of L. Note that if L is measurable, then ℳ is a sub-σ-algebra of ℬ.
88 For the case of Super-Determinism, see, e.g., [39, lemma 3.1]. For the others, note that for any theory of chance L and any Borel subset D of P, L^{-1}(D) will be a union of chance hypotheses of L. When L has only countably many chance laws, each such union is Borel (being a countable union of Borel sets).
89 See, e.g., [43, sec. 12.3]; thanks to Chris Mierzewski for this labour-saving observation. For a direct proof of the measurability of Basic Frequentism note that by [39, lemma 3.1.iii] it suffices to show that for any fixed string σ and any closed interval J with rational endpoints, the set of sequences S such that the Bernoulli measure with parameter r(S) assigns [[σ]] a probability in J is Borel (where r(S) is the limiting relative frequency of 0's in S, if defined). This will just be the set of sequences in which the limiting relative frequency of 0's lies in some closed interval, which is Borel; see, e.g., [40, p. 70].
90 See, e.g., [25, Appendix B].
91 The problem is that because for each r, the set of sequences in which 0's have relative frequency r forms a dense subset of 2^ω, there is no way to force the relative frequency of 0's in the first n bits of S to be close to the limiting relative frequency of 0's in S by choosing n sufficiently large. On this point see, e.g., [13, sec. IV].
92 See, e.g., [13, sec. IV]. For further discussion, see [33, sec. 3].
93 Thanks to Chris Mierzewski and to Francesca Zaffora Blando for the discussion of these points.
94 Let ℳ be as in footnote 87 and let S* be some L-lawless sequence. Define κ : ℬ × 2^ω → [0, 1] by κ(B, S) := p_L(B | L^{-1}(L(S))) (unless S is lawless, in which case set κ(B, S) = δ_{S*}(B)). Then, for any sequence S, κ(•, S) ∈ P (by the definition of p_L) and for any B ∈ ℬ, κ(B, •) is measurable as a function from (2^ω, ℳ) to the unit interval (by the measurability of L, via Lemma 3.1 of [39]). So p_L determines a probability kernel from (2^ω, ℳ) to (2^ω, ℬ). If Λ* is empty, then this probability kernel is proper, in the sense that for each M ∈ ℳ we have κ(M, S) = 1 if S ∈ M. Otherwise, it is proper modulo Λ* (in the sense that the given condition holds for any M disjoint from Λ*).
95 For a helpful philosophical introduction to regular conditional probabilities, see [20].

Blackwell and Dubins [15, sec. 1] observe that if (X, C, μ) is a probability space and D is a sub-σ-algebra of C, then a proper probability kernel κ : C × X → [0, 1] from (X, D) to (X, C) is a regular conditional probability for μ (in the standard sense) if and only if μ(•) = ∫ κ(•, S) dμ(S). Here we specialize to the case (X, C, μ) = (2^ω, ℬ, μ) and D = ℳ (with ℳ as in footnote 87). Since Λ* is by assumption a μ-null set, the distinction of footnote 94 between being proper and being proper modulo Λ* is immaterial.

Definition B.7 (CCP**). Let L be a proper and measurable theory of chance and let μ be a prior that considers the set of L-lawless sequences to be a null set. We say that μ satisfies CCP** for L if the kernel p_L of L is a regular conditional probability for μ.

Good news: the Bayes-Laplace prior of Example 5.23 satisfies CCP** for Basic Frequentism.96 Bad news:

Proposition B.8. Let L be a proper and measurable theory of chance. Then any law of chance of L that is also a prior satisfies CCP** for L.

Proof. Let λ be a prior that is also a law of chance of L (with chance hypothesis Λ). Let B be a Borel subset of 2^ω. Then we have

\[ \int_{2^\omega} p_L(B \mid \Lambda(s)) \, d\lambda(s) = \int_{\Lambda} p_L(B \mid \Lambda(s)) \, d\lambda(s) = \int_{\Lambda} p_L(B \mid \Lambda) \, d\lambda(s) = \int_{\Lambda} \lambda(B) \, d\lambda(s) = \lambda(B)\,\lambda(\Lambda) = \lambda(B), \]

where the first equality follows because L is proper (so that λ(Λ) = 1), the second because Λ(s) = Λ for each s ∈ Λ, the third by the definition of the kernel p_L, the fourth because λ(B) doesn't depend on the variable of integration, and the fifth again via propriety.

Corollary B.9. The fair coin measure satisfies CCP** for Basic Frequentism.

Example B.10 (Super-Determinism). Also worrying: every prior satisfies CCP** for Super-Determinism (i.e., the map L : S ↦ δ_S). For let μ be a prior and let B be a Borel subset of 2^ω. For any Borel set B and any binary sequence S, we have p_L(B | Λ(S)) = I_B(S), where I_B is the characteristic function of B. So ∫ p_L(B | Λ(S)) dμ(S) = ∫ I_B(S) dμ(S) = μ(B). (Super-Determinism is of course a pathological theory, but it would be considerably more reassuring to be told that no prior is rational for someone who hopes to defer to this theory of chance.)

96 On this point see, e.g., [43, sec. 8.3].

Acknowledgments. Previous versions of the paper were presented at NYU, CMU, and PSA 2022. For helpful correspondence and suggestions, thanks very much to Anonymous, Dave Baker, Cian Dorr, John Earman, Dmitri Gallow, Kevin Kelly, Alex

, so long as the probability function p, which represents your credences, satisfies p(H | A) = .5, p(H | B) = .35, and p(H | C) = .8. That is, the obvious calculation works here so long as your credences satisfy a basic form of the Principal Principle.