
TWO-SORTED FREGE ARITHMETIC IS NOT CONSERVATIVE

Published online by Cambridge University Press:  18 April 2022

STEPHEN MACKERETH*
Affiliation:
DEPARTMENT OF PHILOSOPHY, UNIVERSITY OF PITTSBURGH, PITTSBURGH, PA, USA
JEREMY AVIGAD
Affiliation:
DEPARTMENT OF PHILOSOPHY, CARNEGIE MELLON UNIVERSITY, PITTSBURGH, PA, USA. E-mail: avigad@cmu.edu

Abstract

Neo-Fregean logicists claim that Hume’s Principle (HP) may be taken as an implicit definition of cardinal number, true simply by fiat. A long-standing problem for neo-Fregean logicism is that HP is not deductively conservative over pure axiomatic second-order logic. This seems to preclude HP from being true by fiat. In this paper, we study Richard Kimberly Heck’s Two-Sorted Frege Arithmetic (2FA), a variation on HP which has been thought to be deductively conservative over second-order logic. We show that it isn’t. In fact, 2FA is not conservative over n-th order logic, for all $n \geq 2$. It follows that in the usual one-sorted setting, HP is not deductively Field-conservative over second- or higher-order logic.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of The Association for Symbolic Logic

1 Introduction

Frege [Reference Frege10Reference Frege12] sought to derive the theorems of arithmetic from nothing but basic logical laws and definitions. Such a derivation, called a logicist derivation of arithmetic, would provide the ultimate foundation for our arithmetical knowledge. It would justify the theorems of arithmetic once and for all by deriving them from principles that needed no justification—principles that were either self-evident (‘basic logical laws’) or true simply by stipulation (‘definitions’).

By his own lights, Frege did not manage to give a logicist derivation of arithmetic. But he did show how to derive a very powerful system of arithmetic from a single, natural principle, known as Hume’s Principle (HP).Footnote 1 Informally, HP says, ‘The number of Fs is equal to the number of Gs iff there is a one–one correspondence between the Fs and the Gs.’ In second-order logic, HP is expressible as the universal closure of

$$ \begin{align*}\# F = \# G \ \leftrightarrow\ \exists R(F \approx_R G), \end{align*} $$

where $\#$ is an operator that combines with monadic second-order variables $F,G$ to form terms of object type, and $ F \approx _R G$ abbreviates the statement that R is a one–one correspondence between the Fs and the Gs.Footnote 2 Then we have the following beautiful result:

Theorem 1.1 (Frege’s Theorem)

The theorems of second-order arithmetic are derivable in second-order logic from HP together with eliminative definitions of natural number, zero, and successor.Footnote 3

Neo-Fregean logicists, preeminently Hale and Wright [Reference Hale and Wright14], argue that Frege’s Theorem already yields a logicist derivation of arithmetic. They claim that HP may be taken as an implicit definition of the operator $\#$ (‘the number of’) in purely logical terms.Footnote 4 Hale and Wright’s notion of implicit definition is deeply controversial. For our purposes, the main point is that Hale and Wright conceive of implicit definitions as true simply by stipulation [Reference Hale and Wright14, p. 117]. Such definitions need no justification. They are true by fiat. So, if Hale and Wright are correct, Frege’s Theorem does indeed yield a logicist derivation of arithmetic.

Not just anything can be stipulated to be true. We cannot establish any new ‘substantive’ truths by fiat. No one could have established by fiat that the Morning Star is the Evening Star. Accordingly, it is natural to think that any legitimate stipulative definition must meet the following requirement, known as conservativeness:

Definition 1.2. Let T be a theory in a formal language L. Let $\Delta $ be a definition of one new sign, and let $L^+$ be the language obtained by adding that new sign to L. Assume that deductive systems for L and $L^+$ have been specified. Then $\Delta $ is conservative over T iff any L-formula that is derivable $_{L^+}$ from $T + \Delta $ is already derivable $_{L}$ from T.

Intuitively, a definition is conservative over our theory T just in case adding it to our theory does not yield any new theorems expressible entirely in old vocabulary. The definition does not settle any open questions that we already knew how to ask.

But HP is not conservative. More precisely, HP is not conservative over pure axiomatic second-order logic—which presumably ought to be the starting theory for aspiring logicists.Footnote 5 For HP proves a sentence DI in the language of pure second-order logic which says that the universe is Dedekind-infinite (‘there is a one–one mapping from the universe into itself that is not onto’). But DI is not a theorem of pure second-order logic. So, it seems that HP cannot be a legitimate stipulative definition. Call this the conservativeness problem for neo-Fregean logicism.
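
The pigeonhole idea behind this fact can be illustrated in a finite standard model. The following Python sketch is an informal illustration only, not part of the argument of the paper. It rests on assumptions of our own: the helper names are hypothetical, concepts are taken to be all subsets of a finite nonempty domain, and two finite concepts are treated as equinumerous exactly when they have the same size. Over a domain with n objects there are n + 1 equinumerosity classes, so no assignment of objects to concepts can respect HP.

```python
from itertools import combinations

def concepts(domain):
    """All subsets of a finite domain (the concepts of a standard model)."""
    items = list(domain)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def equinumerosity_classes(domain):
    """Group concepts by size; for finite sets, same size iff a bijection exists."""
    classes = {}
    for concept in concepts(domain):
        classes.setdefault(len(concept), []).append(concept)
    return classes

def hp_satisfiable(domain):
    """HP would need an injection from equinumerosity classes into the domain."""
    return len(equinumerosity_classes(domain)) <= len(set(domain))

for n in range(1, 5):
    domain = range(n)
    # Prints n, the number of classes (n + 1), and whether HP could hold.
    print(n, len(equinumerosity_classes(domain)), hp_satisfiable(domain))
```

The sketch only shows that HP has no finite standard models. The formal derivation of DI from HP inside second-order logic is a separate matter.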

The conservativeness problem is robust. Definitions that are conservative over pure second-order logic seem to be mathematically very weak, and hence unable to provide a foundation for arithmetic. Furthermore, adding more basic logical laws won’t help unless those laws suffice to prove DI. But it seems like a tall order to prove the existence of infinitely many objects from basic logical laws alone.

Hale and Wright respond to the conservativeness problem by denying that stipulative definitions must be conservative in the sense of Definition 1.2. Roughly speaking, they hold that stipulative definitions need only satisfy a modified conservativeness requirement, known as Field-conservativeness.Footnote 6 We set out to explore a different approach. Is it possible to find a variant of HP that is conservative in the standard deductive sense—the sense of Definition 1.2?

A promising direction is suggested by Heck’s work on the Julius Caesar problem [Reference Heck15, Reference Heck16]. Heck reconstrues Hume’s Principle as introducing a new sort of singular term into the language. Call the reconstrued principle two-sorted Hume’s Principle (2HP), and the theory that results from supplementing 2HP with logical axioms for the expanded language, two-sorted Frege Arithmetic (2FA). The theory 2FA interprets second-order arithmetic in the numerical sort. In particular, 2FA proves that the numerical universe is Dedekind-infinite. But there is no obvious witness to non-conservativeness, because the numerical sort is not part of the base language. Indeed, it has been claimed that 2FA is conservative over pure second-order logic [Reference Burgess3, p. 237, n. 7].

In this paper we prove that 2FA is not conservative over pure second-order logic. In fact, we prove something stronger. Our strategy is based on the following little fact:

Lemma 1.3. Let T be a theory in a formal language L, and let A be any L-sentence. Suppose that a sentence $\Delta $ is not conservative over $T + A$ . Then $\Delta $ is not conservative over T.

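Proof. Let $\varphi $ be an L-sentence such that $T + A + \Delta \vdash \varphi $, but $T + A \not \vdash \varphi $. By the Deduction Theorem, we have $T + \Delta \vdash A \rightarrow \varphi $. On the other hand, $T \not \vdash A \rightarrow \varphi $, since otherwise modus ponens would give $T + A \vdash \varphi $. As $A \rightarrow \varphi $ is an L-sentence, it witnesses that $\Delta $ is not conservative over T.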

In Section 7, we consider a theory w2FA that is much weaker than 2FA. We show that w2FA is non-conservative over pure second-order logic together with an axiom saying that the base universe is infinite.Footnote 7 In other words, even if we already know that there are infinitely many objects, w2FA tells us something new about them! Then from Lemma 1.3, it follows that w2FA, and hence 2FA, is non-conservative over pure second-order logic.

In Section 8, we show that for the weaker theory w2FA, the non-conservativeness vanishes if we strengthen the base theory in either of two natural ways. First, w2FA is conservative over third- or higher-order logic. Second, w2FA is conservative over second-order logic plus ‘the base universe is finite’.

In Section 9, we present a different proof that 2FA is not conservative over pure second-order logic. This proof shows that 2FA remains non-conservative over the stronger base theories discussed in the previous section. Specifically, we show that 2FA is not conservative over second-order logic plus ‘the base universe is finite’, and the proof of this fact generalizes to third- and higher-order logic.

In order to state and prove these results, we will need some preliminaries. In Section 2, we explain the logical setting for the paper: many-sorted axiomatic second-order logic. In Section 3, we explain how to construe Hume’s Principle in a many-sorted setting, and we define the theories w2FA and 2FA. In Section 4, we present some background material on first- and second-order arithmetic. In Section 5, we show how to formalize some facts about well-orderings and finiteness in second-order logic. In Section 6, we discuss the Fraenkel model, which is the minimal infinite model of pure second-order logic.

In Sections 7–9, we prove the main results. Lastly, in Section 10, we connect our work to the literature on Field-conservativeness and related notions. Our main result implies that in a one-sorted setting, HP is neither deductively Field-conservative nor deductively Caesar-neutral conservative over second- or higher-order logic. This answers some open problems raised by Shapiro and Weir [Reference Shapiro and Weir27, p. 298], Fine [Reference Fine9, p. 192, n. 1], and Studd [Reference Studd30, p. 597]. We conclude by mentioning some open problems of our own.

2 Many-sorted second-order logic

We work in axiomatic second-order logic with many sorts of singular terms and first-order variables. In this section we explain the logical framework in considerable generality.

In Section 2.1, we define ‘sort’. In Sections 2.2 and 2.3, we define second-order languages $\mathcal {L}_J[K]$ for any nonempty set of object sorts J and any set of constant symbols K. We present deductive systems and general semantics for these languages. In Section 2.4, we define the two many-sorted second-order languages that will be central to the rest of the paper, called the base language $\mathcal {L} := \mathcal {L}_{\{0\}}[\varnothing ]$ and the expanded language $\mathcal {L}^+ := \mathcal {L}_{\{0, n\}}[\{\#_0, \#_n \}]$ .

2.1 Sorts

Let J be any nonempty set of symbols. These symbols are called first-order sorts or object sorts.

Let $\mathit {Sorts}^2(J)$ be the set of all tuples $\langle j_1, \dots , j_n \rangle $ with $n\geq 1$ and $j_1, \dots , j_n \in J$ . These tuples are second-order relation sorts formed from J.

Let $\mathit {Sorts}^3(J)$ be the set of all tuples $\langle \tau _1, \dots , \tau _n \rangle $ with $n\geq 1$ and $\tau _1, \dots , \tau _n \in J \cup \mathit {Sorts}^2(J)$ , and with at least one of $\tau _1, \dots , \tau _n$ belonging to $\mathit {Sorts}^2(J)$ . These tuples are third-order relation sorts formed from J.

Let $\mathit {FnSorts}(J)$ be the set of all tuples $\langle \tau _1, \ldots , \tau _n; \tau _{n+1} \rangle $ with $n\geq 1$ and $\tau _1, \ldots , \tau _n, \tau _{n+1} \in J \cup \mathit {Sorts}^2(J)$ . These tuples are function sorts formed from J.

Let $\mathit {Sorts}(J) = J \cup \mathit {Sorts}^2(J) \cup \mathit {Sorts}^3(J) \cup \mathit {FnSorts}(J)$ .

Intuitively, $\langle \tau _1, \ldots , \tau _n \rangle $ is the sort of n-ary relations with arguments of sorts $\tau _1, \ldots , \tau _n$ , while $\langle \tau _1, \ldots , \tau _n; \tau _{n+1} \rangle $ is the sort of n-ary functions with arguments of sorts $\tau _1, \ldots , \tau _n$ and values of sort $\tau _{n+1}$ .

Example 2.4. Suppose $J = \{0, 1\}$ . Then $\langle 1, 1, 0 \rangle \in \mathit {Sorts}^2(J)$ , $\langle 0, 1, \langle 0, 1 \rangle \rangle \in \mathit {Sorts}^3(J)$ , and $\langle \langle 0 \rangle; 1 \rangle \in \mathit {FnSorts}(J)$ .
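
The sort definitions above can be checked mechanically. The following Python sketch is a hypothetical encoding of our own, not notation from the paper: object sorts are plain symbols, relation sorts are Python tuples, and function sorts are instances of an illustrative `Fn` class. It confirms the three memberships claimed in Example 2.4.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fn:
    """A function sort <t1, ..., tn; t_{n+1}> (illustrative encoding)."""
    args: tuple
    value: object

def in_sorts2(s, J):
    """Second-order relation sorts: nonempty tuples of object sorts."""
    return isinstance(s, tuple) and len(s) >= 1 and all(j in J for j in s)

def in_low(s, J):
    """Membership in J united with Sorts^2(J)."""
    return s in J or in_sorts2(s, J)

def in_sorts3(s, J):
    """Third-order relation sorts: at least one entry must be second-order."""
    return (isinstance(s, tuple) and len(s) >= 1
            and all(in_low(t, J) for t in s)
            and any(in_sorts2(t, J) for t in s))

def in_fnsorts(s, J):
    """Function sorts: arguments and value all lie in J united with Sorts^2(J)."""
    return (isinstance(s, Fn) and len(s.args) >= 1
            and all(in_low(t, J) for t in s.args)
            and in_low(s.value, J))

J = {0, 1}
print(in_sorts2((1, 1, 0), J))         # <1, 1, 0>
print(in_sorts3((0, 1, (0, 1)), J))    # <0, 1, <0, 1>>
print(in_fnsorts(Fn(((0,),), 1), J))   # <<0>; 1>
```

Note that `(1, 1, 0)` fails `in_sorts3`, since none of its entries is a second-order sort.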

In the languages $\mathcal {L}_J[K]$ , there will be no function variables and no third-order variables. We only allow variables of sorts $\tau \in J \cup \mathit {Sorts}^2(J)$ . However, there may be constant symbols of any sort $\tau \in \mathit {Sorts}(J)$ .

2.2 Languages without constant symbols

For any set of object sorts J, we define the second-order language $\mathcal {L}_J$ as follows:

  (i) The alphabet of $\mathcal {L}_J$ contains variables $x^j, y^j, z^j, \ldots $ for each object sort $j \in J$, and relation variables $X^{\tau }, Y^{\tau }, Z^{\tau }, \ldots $ for each second-order sort $\tau \in \mathit {Sorts}^2(J)$. There are no nonlogical constant symbols. The logical constants are $\neg , \,{\rightarrow}, \forall , =$.

  (ii) The terms of sort $\tau $ are the variables of sort $\tau $, for each $\tau \in J \cup \mathit {Sorts}^2(J)$.

  (iii) In atomic formulas, we require that the sorts match. More precisely, the atomic formulas are strings of the form $t_1^j = t_2^j$ and $T^{\langle j_1, \ldots , j_n \rangle } t_1^{j_1}, \dots , t_n^{j_n}$, where each $t^j$ is a term of sort $j \in J$, and $T^{\langle j_1, \ldots , j_n \rangle }$ is a term of sort $\langle j_1, \ldots , j_n \rangle $.

  (iv) If $\varphi , \psi $ are formulas and $x^j, X^\tau $ are variables, then $ \neg \varphi $, $ \varphi \rightarrow \psi $, $ \forall x^j \varphi $, $ \forall X^\tau \varphi $ are also formulas.

The deductive system for $\mathcal {L}_J$ is essentially equivalent to Shapiro’s D2 minus the axiom schema of choice [Reference Shapiro26, pp. 66–67]. Compare [Reference Enderton8, pp. 112–113]. Its axioms are all closed universal generalizations of the formulas depicted in Table 1. For legibility, we suppress sorts. But note that x, y, and t must all be of the same sort, and X and T must be of the same sort. This requirement is induced by the formation rules of the language.

Table 1 Deductive system for $\mathcal {L}_J$.

Let $\varphi , \psi $ be any formulas of $\mathcal {L}_J$ . Let $x,y, X$ be variables, and $t,T$ be terms. (Note that $x,y,t$ must all be of the same sort. Likewise, X and T must be of the same sort.) Let $\varphi (t)$ be the result of substituting t for all free occurrences of x in $\varphi $ . In (*), let $\alpha $ be any atomic formula of $\mathcal {L}_J$ , and let $\alpha '$ be any formula obtained from $\alpha $ by replacing zero or more occurrences of x with y. In Comprehension, we write $X \bar x$ to abbreviate $X ^{\langle j_1, \ldots , j_n \rangle } x_1^{j_1} \cdots x_n^{j_n}$ .

An $\mathcal {L}_J$ -prestructure $\mathcal {M}$ is a collection of nonempty sets $\{M_\tau : \tau \in J \cup \mathit {Sorts}^2(J) \}$ such that $M_{\langle j_1, \ldots , j_n \rangle } \subseteq \mathcal {P}(M_{j_1} \times \cdots \times M_{j_n})$ for all $j_1, \ldots , j_n \in J$ . Satisfaction and truth in $\mathcal {M}$ are defined inductively, taking variables of sort $\tau $ to range over domain $M_\tau $ .

A general $\mathcal {L}_J$ -structure is an $\mathcal {L}_J$ -prestructure in which the second-order comprehension axioms are satisfied. Our deductive system is sound and complete with respect to general $\mathcal {L}_J$ -structures.

A standard $\mathcal {L}_J$ -structure $\mathcal {M}$ is a general $\mathcal {L}_J$ -structure in which $M_{\langle j_1, \ldots , j_n \rangle }= \mathcal {P}(M_{j_1} \times \cdots \times M_{j_n})$ for all $j_1, \ldots , j_n \in J$ . So, a standard $\mathcal {L}_J$ -structure is fully specified by its object domains $\{M_j : j \in J\}$ . Our deductive system is sound but not complete with respect to standard structures.
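
The remark that a standard structure is fully specified by its object domains can be made concrete for finite domains. The Python sketch below is only an illustration with hypothetical helper names. It computes the relation domain $M_{\langle j_1, \ldots , j_n \rangle }$ as the full power set of the corresponding product.

```python
from itertools import chain, combinations, product

def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def standard_domain(object_domains, sort):
    """M_<j1,...,jn> = P(M_j1 x ... x M_jn), as required in a standard structure."""
    tuples = product(*(object_domains[j] for j in sort))
    return powerset(tuples)

M = {0: ["a", "b"]}                       # one object sort 0 with two objects
concepts = standard_domain(M, (0,))       # monadic relations on sort 0
relations = standard_domain(M, (0, 0))    # binary relations on sort 0
print(len(concepts), len(relations))      # prints: 4 16
```

A general structure may use smaller relation domains, provided the comprehension axioms remain satisfied. No such choice is available in the standard case.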

2.3 Languages with constant symbols

We will now sketch how to add constant symbols to the languages $\mathcal {L}_J$ .

For each $\tau \in \mathit {Sorts}(J)$ , let $K_\tau $ be a set of new symbols, called constant symbols. Each constant symbol is assigned to a particular sort $\tau $ , and is classified as an object, relation, or function constant accordingly. Assume that the $K_\tau $ ’s are pairwise disjoint, or use superscripts to keep track of sorts. Let $K = \bigcup _{\tau \in \mathit {Sorts}(J)} K_\tau $ .

Define the language $\mathcal {L}_J[K]$ as follows:

  (i) The alphabet of $\mathcal {L}_J[K]$ is the alphabet of $\mathcal {L}_J$ expanded by K.

  (ii) If $\tau \in J \cup \mathit {Sorts}^2(J)$, the atomic terms of sort $\tau $ are the variables $x^\tau $ and the constants in $K_\tau $.

    If $\tau \in \mathit {Sorts}^3(J)$, the atomic terms of sort $\tau $ are the constants in $K_\tau $.

    If $\tau = \langle \tau _1, \ldots , \tau _n; \tau _{n+1} \rangle \in \mathit {FnSorts}(J)$, and $f^\tau \in K_\tau $, and $t_1^{\tau _1}, \ldots , t_n^{\tau _n}$ are terms of the indicated sorts, then $f^\tau t_1^{\tau _1} \cdots t_n^{\tau _n}$ is a term of sort $\tau _{n+1}$.

  (iii) The atomic formulas are defined as in $\mathcal {L}_J$, except that we also allow atomic formulas of the form $T^{\tau } t_1^{\tau _1} \cdots t_n^{\tau _n}$ with $\tau = \langle \tau _1, \ldots , \tau _n \rangle \in \mathit {Sorts}^3(J)$.

  (iv) The inductive clauses generating the set of all formulas are unchanged.

The deductive system for $\mathcal {L}_J[K]$ is obtained from the deductive system for $\mathcal {L}_J$ by allowing $\varphi , \psi $ to range over $\mathcal {L}_J[K]$ -formulas, $\alpha $ to range over atomic $\mathcal {L}_J[K]$ -formulas, and adding axioms of Extensionality analogous to the axioms of Identity.Footnote 8

An $\mathcal {L}_J[K]$ -prestructure $\mathcal {M} = (\mathcal {S}, I)$ consists of an $\mathcal {L}_J$ -prestructure $\mathcal {S}$ together with an interpretation I of the constant symbols that meets the following three conditions:

  (i) If $c^j$ is an object constant of sort $j \in J$, then $I(c^j) \in M_j$.

  (ii) If $R^\tau $ is a relation constant of sort $\tau = \langle \tau _1, \ldots , \tau _n \rangle \in \mathit {Sorts}^2(J) \cup \mathit {Sorts}^3(J)$, then $I(R^\tau ) \in \mathcal {P}(M_{\tau _1} \times \cdots \times M_{\tau _n})$.

  (iii) If $f^\tau $ is a function constant of sort $\tau = \langle \tau _1, \ldots , \tau _n; \tau _{n+1} \rangle \in \mathit {FnSorts}(J)$, then $I(f^\tau )$ is a function from $M_{\tau _1} \times \cdots \times M_{\tau _{n}}$ into $M_{\tau _{n+1}}$.

General and standard $\mathcal {L}_J[K]$ -structures are defined analogously to $\mathcal {L}_J$ -structures.

2.4 The languages $\mathcal {L}$ and $\mathcal {L}^+$

We now define the two languages that will be at the center of the rest of the paper.

Definition 2.5. The base language is $\mathcal {L} := \mathcal {L}_{\{0\}}$ .

Definition 2.6. The expanded language is $\mathcal {L}^+ := \mathcal {L}_{\{0, n\}}[\{\#_0, \#_n\}]$ , where $\#_0$ and $\#_n$ are function constants of sorts $\langle \langle 0 \rangle; n \rangle $ and $\langle \langle n \rangle; n \rangle $ , respectively.

The logical axioms for $\mathcal {L}$ and $\mathcal {L}^+$ will be denoted by $\mathit {Ax}_{\mathcal {L}}$ and $\mathit {Ax}_{\mathcal {L}^+}$ , respectively.

Some notational conventions:

  (i) We generally drop the superscripts $0$, $\langle 0 \rangle $, $\langle 0, 0 \rangle $, $\ldots $.

  (ii) We generally write variables of sorts $\tau \in \{n\} \cup \mathit {Sorts}^2(\{n\})$ in boldface, and drop the superscripts n, $\langle n \rangle $, $\langle n, n \rangle $, $\ldots $.

  (iii) When we write second-order relation superscripts, we drop the angle brackets and commas. For example, we write $X^{n0}$ instead of $X^{\langle n,0 \rangle }$.

  (iv) We drop the subscripts from $\#_0$ and $\#_n$, writing $\#$ for both.

  (v) Following Frege, we refer to monadic relations as concepts.

3 Heck’s theory 2FA

Think of the base language $\mathcal {L}$ as our starting language, and $\mathit {Ax}_{\mathcal {L}}$ as our starting theory. Heck [Reference Heck15], [Reference Heck16, pp. 150–151] reconstrues Hume’s Principle as introducing a new, numerical sort of object (sort n), together with a host of new second-order relation sorts. The operator $\#$ (‘the number of’) may be applied to a concept variable of either sort, yielding a singular term of the numerical sort.

Definition 3.7. Weak two-sorted Hume’s Principle (w2HP) is the universal closure of:

$$ \begin{align*}\#F^0 = \#G^0 \ \leftrightarrow \ \exists R^{00}(F^0 \approx_{R^{00}} G^0). \end{align*} $$

Here, $F^0 \approx _{R^{00}} G^0$ abbreviates the statement that $R^{00}$ is a one–one correspondence between $F^0$ and $G^0$ .
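
For finite concepts, the right-hand side of w2HP can be tested by brute force. The Python sketch below is an illustration under assumptions of our own. The function names are hypothetical, and `is_correspondence` uses one common unpacking of 'one–one correspondence' (each F is paired with exactly one G and each G with exactly one F), which may differ in detail from the official abbreviation. The final check interprets $\#F$ as the size of F over a three-element base domain and compares both sides of w2HP for every pair of concepts.

```python
from itertools import chain, combinations, product

def is_correspondence(R, F, G):
    """Assumed unpacking: R pairs each F with exactly one G, and vice versa."""
    each_f_once = all(sum((x, y) in R for y in G) == 1 for x in F)
    each_g_once = all(sum((x, y) in R for x in F) == 1 for y in G)
    return each_f_once and each_g_once

def exists_correspondence(F, G):
    """Search every relation R inside F x G for a witness (tiny concepts only)."""
    pairs = list(product(F, G))
    candidates = chain.from_iterable(
        combinations(pairs, r) for r in range(len(pairs) + 1))
    return any(is_correspondence(set(R), F, G) for R in candidates)

# Toy reading of w2HP: #F is the size of F over a 3-element base domain.
domain = [0, 1, 2]
concepts = [set(c) for r in range(len(domain) + 1)
            for c in combinations(domain, r)]
print(all((len(F) == len(G)) == exists_correspondence(F, G)
          for F in concepts for G in concepts))
```

The search ranges over all relations rather than using sizes as a shortcut, so the comparison with `len` is a genuine check rather than a restatement.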

Intuitively, w2HP gives the criterion of identity for numbers belonging to base-sort concepts. It tells us how to count base-sort objects. But w2HP does not tell us how to count numbers. Since we do in fact count numbers, we are motivated to consider a stronger principle.

Definition 3.8. Two-sorted Hume’s Principle (2HP) is the conjunction of the universal closures of the following three $\mathcal {L}^+$ -formulas:

$$ \begin{align*} \#F^0 = \#G^0 \ &\leftrightarrow \ \exists R^{00}(F^0 \approx_{R^{00}} G^0), \\ \# F^n = \# G^n \ &\leftrightarrow \ \exists R^{nn}(F^n \approx_{R^{nn}} G^n), \\ \# F^n = \# G^0 \ &\leftrightarrow \ \exists R^{n0}(F^n \approx_{R^{n0}} G^0). \end{align*} $$

The first line is w2HP. The second line gives the criterion of identity for numbers belonging to numerical concepts. The third line gives the mixed criterion of identity, which tells us (e.g.) whether the number of Julio-Claudian emperors equals the number of prime numbers less than 12.
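
The mixed example can be worked through concretely. In the Python sketch below, which is only an informal illustration, Python integers stand in for objects of the numerical sort and strings stand in for base-sort objects. The variable names are our own. The pairing plays the role of a witness $R^{n0}$ for the third line of 2HP.

```python
# Base-sort concept: the five Julio-Claudian emperors.
emperors = ["Augustus", "Tiberius", "Caligula", "Claudius", "Nero"]

# Numerical concept: the prime numbers less than 12.
primes_below_12 = [p for p in range(2, 12) if all(p % d for d in range(2, p))]

# Candidate witness for the mixed clause: pair each prime with one emperor.
pairing = dict(zip(primes_below_12, emperors))

covers_primes = set(pairing) == set(primes_below_12)
covers_emperors = set(pairing.values()) == set(emperors)
one_one = len(set(pairing.values())) == len(pairing)

print(primes_below_12)                                  # [2, 3, 5, 7, 11]
print(covers_primes and covers_emperors and one_one)    # True
```

Since a one–one correspondence exists, the mixed criterion identifies the number of Julio-Claudian emperors with the number of primes less than 12.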

Using our superscript-dropping conventions, we may write 2HP as follows:

$$ \begin{align*} \#F = \#G \ &\leftrightarrow \ \exists R(F \approx_{R} G), \\ \# \mathbf{{F}} = \# \mathbf{{G}} \ &\leftrightarrow \ \exists \mathbf{{R}}(\mathbf{{F}} \approx_{\mathbf{{R}}} \mathbf{{G}}), \\ \# \mathbf{{F}} = \# G \ &\leftrightarrow \ \exists R^{n0}(\mathbf{{F}} \approx_{R^{n0}} G). \end{align*} $$

Definition 3.9. Weak two-sorted Frege Arithmetic (w2FA) is the theory whose logical axioms are $\mathit {Ax}_{\mathcal {L}^+}$ and whose sole nonlogical axiom is $\mathrm{w2HP}$ .Footnote 9 In other words,

$$ \begin{align*}\mathrm{w2FA} = \mathit{Ax}_{\mathcal{L}^+} + \mathrm{w2HP}.\end{align*} $$

Definition 3.10. Two-sorted Frege Arithmetic (2FA) is the theory whose logical axioms are $\mathit {Ax}_{\mathcal {L}^+}$ and whose sole nonlogical axiom is $\mathrm{2HP}$ . In other words,

$$ \begin{align*}\mathrm{2FA} = \mathit{Ax}_{\mathcal{L}^+} + \mathrm{2HP}.\end{align*} $$

Notice that the logical axioms of 2FA include full second-order comprehension for the expanded language. So, by Frege’s Theorem, 2FA interprets second-order arithmetic in the numerical sort. It follows that 2FA proves a sentence which says that the numerical universe is Dedekind-infinite. But this is not a witness to non-conservativeness, because the numerical sort is not part of the base language. Prima facie, it seems quite plausible that 2FA should be a conservative extension of $\mathit {Ax}_{\mathcal {L}}$ .

4 Arithmetic

We will study 2FA by comparing it with other, better-known systems of arithmetic. In Section 4.1, we describe the usual systems of first- and second-order arithmetic. In Section 4.2, we describe systems of arithmetic with no function symbols.

4.1 First- and second-order arithmetic

We begin with first-order arithmetic. For reference, see [Reference Hájek and Pudlák13, pp. 12–13, 28–29].

Definition 4.11. The language of Peano arithmetic, $L_{\mathrm{PA}}$ , is a classical first-order language with identity whose nonlogical vocabulary is $(0, S, \leq , +, \cdot )$ . Here, 0 is a constant symbol, S is a unary function symbol, $\leq $ is a binary relation symbol, and $+,\cdot $ are binary function symbols.

Definition 4.12. Robinson arithmetic, Q, is the theory in $L_{\mathrm{PA}}$ with the following eight axioms:

$$ \begin{align*} & 0 \neq Sx, \\ & Sx = Sy \rightarrow x=y, \\ & x \neq 0 \rightarrow \exists y (x = Sy), \\ & x+0 = x, \\ & x+Sy = S(x+y), \\ & x \cdot 0 = 0, \\ & x \cdot Sy = (x \cdot y) + x,\\ & x \leq y \leftrightarrow \exists z (z + x = y). \end{align*} $$
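
As a sanity check, the eight axioms can be tested against the standard natural numbers on a bounded range. The Python sketch below is our own illustration with a hypothetical function name. It is a finite spot check, not a proof. The existential quantifiers are searched only up to the relevant argument, which suffices in the standard model.

```python
def S(x):
    """Successor on the standard natural numbers."""
    return x + 1

def check_q(bound):
    """Test each axiom of Q for all arguments below `bound`."""
    rng = range(bound)
    return all([
        all(0 != S(x) for x in rng),
        all(x == y for x in rng for y in rng if S(x) == S(y)),
        all(any(x == S(y) for y in range(x)) for x in rng if x != 0),
        all(x + 0 == x for x in rng),
        all(x + S(y) == S(x + y) for x in rng for y in rng),
        all(x * 0 == 0 for x in rng),
        all(x * S(y) == (x * y) + x for x in rng for y in rng),
        # x <= y iff some z satisfies z + x = y; any such z is at most y.
        all((x <= y) == any(z + x == y for z in range(y + 1))
            for x in rng for y in rng),
    ])

print(check_q(20))
```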

Definition 4.13. Peano arithmetic, PA, is the result of adding to Q the following axiom schema of induction:

$$ \begin{align*} & \varphi(0) \wedge \forall x (\varphi(x) \rightarrow \varphi(Sx)) \rightarrow \forall x \varphi(x), \end{align*} $$

where $\varphi (x)$ is any formula of $L_{\mathrm{PA}}$ .

We write $(\forall x \leq t)(\cdots )$ to abbreviate $\forall x (x \leq t \rightarrow \cdots )$ , and similarly we write $(\exists x \leq t)(\cdots )$ . The quantifiers occurring in these expressions are said to be bounded.

An $L_{\mathrm{PA}}$ -formula is called bounded, or $\Sigma _0$ , if all quantifiers occurring in it are bounded.

An $L_{\mathrm{PA}}$ -formula is called $\Sigma _n$ ( $n \geq 0$ ) if it consists of a string of n alternating unbounded quantifiers, the first of which is existential, followed by a bounded formula. That is, a $\Sigma _n$ formula has the form $\exists x \forall y \exists z \forall w \cdots \theta $ , where $\theta $ is bounded.
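
Bounded formulas are significant because their truth in the standard model can be decided by finite search. The Python sketch below is an illustration with helper names of our own. It evaluates the bounded formula $SS0 \leq x \wedge (\forall u \leq x)(\forall v \leq x)(u \cdot v = x \rightarrow u = S0 \vee v = S0)$, which expresses that x is prime.

```python
def forall_leq(t, pred):
    """Bounded universal quantifier: (forall x <= t) pred(x)."""
    return all(pred(x) for x in range(t + 1))

def exists_leq(t, pred):
    """Bounded existential quantifier: (exists x <= t) pred(x)."""
    return any(pred(x) for x in range(t + 1))

def is_prime(x):
    """Evaluate the bounded primality formula at x."""
    return 2 <= x and forall_leq(x, lambda u: forall_leq(x, lambda v:
        u * v != x or u == 1 or v == 1))

def is_even(x):
    """Another bounded formula: (exists u <= x)(u + u = x)."""
    return exists_leq(x, lambda u: u + u == x)

print([x for x in range(20) if is_prime(x)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```

An unbounded quantifier admits no such finite search in general, which is why the $\Sigma _n$ classification counts unbounded quantifiers only.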

Definition 4.14. The theory $I\Sigma _n$ ( $n \geq 0$ ) is the result of adding to Q the axiom schema of induction above, restricted to $\Sigma _n$ formulas.

We now turn our attention to second-order arithmetic. For reference, see [Reference Simpson28, pp. 2–5].

Definition 4.15. The language of second-order arithmetic, $L_2$ , is a two-sorted language consisting of all the vocabulary of $L_{\mathrm{PA}}$ , together with denumerably many monadic second-order variables $X, Y, Z, \ldots $ and a second-order quantifier $\forall X$ . The atomic formulas of $L_2$ include all strings of the form $Xt$ , where t is a first-order term and X is a second-order variable.

The second-order variables of $L_2$ are usually called set variables, and the atomic formulas $Xt$ are sometimes written $t \in X$ . For our purposes, there is no difference between set variables and concept variables, and the predication relation $\in $ may be left implicit. Hence, $L_2$ may be regarded as an expansion of the monadic fragment of $\mathcal {L}$ .

Definition 4.16. Second-order arithmetic, $Z_2$ , is the theory in $L_2$ whose axioms are those of Q, together with the second-order induction axiom

$$ \begin{align*}X0 \wedge \forall x (Xx \rightarrow X(Sx)) \rightarrow \forall x Xx \end{align*} $$

and the second-order comprehension scheme

$$ \begin{align*}\exists X \forall x (Xx \leftrightarrow \varphi(x)) \end{align*} $$

for each formula $\varphi $ of $L_2$ not containing X free. As usual, $\varphi $ may contain parameters, i.e., free first- or second-order variables other than x.

4.2 First- and second-order arithmetic with no function symbols

In this section, we introduce an arithmetical language $L'$ in which successor, addition, and multiplication are rendered as relations (which may be only partially defined) instead of functions. This allows us to define $BA'$ , a weak system of arithmetic that does not assume the existence of infinitely many natural numbers. The main point of the section is to state Lemma 4.23 and prove Lemmas 4.25 and 4.28. We will use these lemmas in Section 9 only, so feel free to skip this section and return to it later.

For reference, see [Reference Hájek and Pudlák13, pp. 86–89, 233].

Definition 4.17. Let $L'$ be the classical first-order language with identity whose nonlogical vocabulary is $(0, S, \leq , A, M)$ . Here, 0 is a constant symbol, S and $\leq $ are binary relation symbols, and A and M are ternary relation symbols.

An $L'$ -formula is called bounded′, or $\Sigma _0'$ , if it contains only bounded quantifiers.

Definition 4.18. $BA'$ is the theory in $L'$ with the following axioms:

  1. $\leq $ is a discrete linear order with least element 0,

  2. $Sxy$ iff y is the upper neighbor of x with respect to $\leq $,

  3. Definitions of A and M:

    $$ \begin{align*} & Ax0z\leftrightarrow z= x, \\ & Syy' \wedge Szz' \rightarrow (Axyz \leftrightarrow Axy'z'), \\ & Mx0z \leftrightarrow z=0, \\ & Syy' \wedge Azxz' \rightarrow (Mxyz \leftrightarrow Mxy'z'), \end{align*} $$

  4. Commutativity and associativity of A and M, distributivity, monotonicity of addition, monotonicity of multiplication by a positive number, and $x \leq y \leftrightarrow (\exists u \leq y) Axuy$,

  5. Induction scheme for $\Sigma _0'$ formulas:

    $$ \begin{align*}\varphi(0) \wedge \forall x \forall y (\varphi(x) \wedge Sxy \rightarrow \varphi(y)) \rightarrow \forall x \varphi(x). \end{align*} $$
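
To see how relational vocabulary accommodates finite universes, consider the initial segment $\{0, \ldots , N\}$ with S, A, M read as the graphs of successor, addition, and multiplication restricted to that segment. The Python sketch below is our own illustration with hypothetical function names. It checks only the four defining axioms for A and M from item 3, not the whole of $BA'$, and then confirms that S fails to be total.

```python
def initial_segment(N):
    """Relations S, A, M on {0, ..., N}; results above N are simply absent."""
    D = range(N + 1)
    S = {(x, y) for x in D for y in D if y == x + 1}
    A = {(x, y, z) for x in D for y in D for z in D if x + y == z}
    M = {(x, y, z) for x in D for y in D for z in D if x * y == z}
    return D, S, A, M

def check_defining_axioms(N):
    """Brute-force check of the four axioms in item 3 on {0, ..., N}."""
    D, S, A, M = initial_segment(N)
    ax1 = all(((x, 0, z) in A) == (z == x) for x in D for z in D)
    ax2 = all(((x, y, z) in A) == ((x, y1, z1) in A)
              for x in D for (y, y1) in S for (z, z1) in S)
    ax3 = all(((x, 0, z) in M) == (z == 0) for x in D for z in D)
    # Azxz' says z + x = z'; only triples whose middle entry is x are relevant.
    ax4 = all(((x, y, z) in M) == ((x, y1, z1) in M)
              for x in D for (y, y1) in S for (z, x2, z1) in A if x2 == x)
    return ax1 and ax2 and ax3 and ax4

D, S, A, M = initial_segment(6)
print(check_defining_axioms(6))                       # True
print(all(any((x, y) in S for y in D) for x in D))    # False: 6 has no successor
```

With function symbols, every term would need a value, which is what forces the domain to be infinite. The relational rendering avoids that commitment.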

Definition 4.19. $I\Sigma _0'$ is the result of adding to $BA'$ axioms saying that $S,A,M$ define total functions, namely $\forall x \exists y Sxy$ , etc.

An $L'$ -formula is called $\Sigma _n'$ ( $n \geq 0$ ) if it consists of a string of n alternating unbounded quantifiers, the first of which is existential, followed by a bounded $'$ formula.

Definition 4.20. The theory $I\Sigma _n'$ ( $n \geq 0$ ) is the result of adding to $I\Sigma _0'$ the axiom schema of induction above, extended to $\Sigma _n'$ formulas.

We now state some useful facts about $BA'$ and its relatives.

Definition 4.21. Let $\mathfrak {D}$ be the conjunction of the following three $(L_{\mathrm{PA}} \cup L')$ -formulas:

$$ \begin{align*} Sx = y \ &\leftrightarrow \ Sxy, \\ x + y = z \ &\leftrightarrow \ Axyz, \\ x \cdot y = z \ &\leftrightarrow \ Mxyz. \end{align*} $$

For each $n \in \mathbb {N}$ , let $x \doteq n$ abbreviate the $L'$ -formula

$$ \begin{align*}(\exists u_1, \ldots, u_{n-1} \leq x)(S0u_1 \wedge Su_1u_2 \wedge \cdots \wedge Su_{n-1} x). \end{align*} $$

Lemmas 4.22 and 4.23 tell us that the theories $I\Sigma _n$ and $I\Sigma _n'$ are in a strong sense equivalent.

Lemma 4.22. Let $n \geq 0$ . Then $I\Sigma _n' + \mathfrak {D} \vdash I\Sigma _n$ , and conversely $I\Sigma _n + \mathfrak {D} \vdash I\Sigma _n'$ .

Lemma 4.23. Let $\varphi $ be a $\Sigma _n$ formula with $n \geq 1$ . Then there is a $\Sigma _n'$ formula $\varphi '$ with the same free variables as $\varphi $ such that $I\Sigma _n' + \mathfrak {D} \vdash \varphi \leftrightarrow \varphi '$ .

For proof, see [Reference Hájek and Pudlák13, pp. 88–89].Footnote 10

Lemma 4.24. $I\Sigma _0'$ and $BA'$ prove the same bounded $'$ formulas.

For proof, see [Reference Hájek and Pudlák13, p. 233].

Lemma 4.25. Let $\varphi (x_1, \ldots , x_k)$ be a bounded $'$ formula, and let $a_1, \ldots , a_k \in \mathbb {N}$ be such that $\mathbb {N} \vDash \varphi (a_1, \ldots , a_k)$ . Then

$$ \begin{align*}BA' \vdash x_1 \doteq a_1 \wedge \cdots \wedge x_k \doteq a_k \rightarrow \varphi(x_1, \ldots, x_k). \end{align*} $$

Proof. Let $\psi $ be the $L_{\mathrm{PA}}$ -formula obtained from $\varphi $ by replacing $Sxy$ , $Axyz$ , $Mxyz$ with $Sx = y$ , $x+y=z$ , $x \cdot y = z$ respectively. Observe that $\psi (S^{a_1}0, \ldots , S^{a_k}0)$ is a true bounded sentence of $L_{\mathrm{PA}}$ .

Now we argue as follows:

$$ \begin{align*} I\Sigma_0 &\vdash \psi(S^{a_1}0, \ldots, S^{a_k}0), \\ I\Sigma_0' + \mathfrak{D} &\vdash \psi(S^{a_1}0, \ldots, S^{a_k}0), \\ I\Sigma_0' + \mathfrak{D}&\vdash x_1 \doteq a_1 \wedge \cdots \wedge x_k \doteq a_k \rightarrow \psi(x_1, \ldots, x_k), \\ I\Sigma_0' + \mathfrak{D}&\vdash x_1 \doteq a_1 \wedge \cdots \wedge x_k \doteq a_k \rightarrow \varphi(x_1, \ldots, x_k), \\ I\Sigma_0' &\vdash x_1 \doteq a_1 \wedge \cdots \wedge x_k \doteq a_k \rightarrow \varphi(x_1, \ldots, x_k), \\ BA' &\vdash x_1 \doteq a_1 \wedge \cdots \wedge x_k \doteq a_k \rightarrow \varphi(x_1, \ldots, x_k). \end{align*} $$

The first line holds because $I\Sigma _0$ proves all true bounded sentences.Footnote 11 The second line follows by Lemma 4.22. Regarding the third line, it is easy to check that for each $n \in \mathbb {N}$ ,

$$ \begin{align*}I\Sigma_0' + \mathfrak{D} \vdash x = S^n 0 \leftrightarrow x \doteq {n}. \end{align*} $$

The fourth line follows by propositional logic, because $\varphi $ and $\psi $ differ only by applications of the equivalences in $\mathfrak {D}$ . The fifth line follows because $I\Sigma _0' + \mathfrak {D}$ is conservative over $I\Sigma _0'$ for $L'$ -formulas. The sixth line follows by Lemma 4.24.

Lastly, we describe a system of second-order arithmetic without function symbols.

Definition 4.26. The language $L_2'$ is just like $L_2$ , but with the vocabulary of $L'$ replacing the vocabulary of $L_{\mathrm{PA}}$ .

Definition 4.27. Let $Z_2'$ be the theory in $L_2'$ whose axioms are those of $I\Sigma _0'$ , plus the second-order induction axiom

$$ \begin{align*}X0 \wedge \forall x \forall y (Xx \wedge Sxy \rightarrow Xy) \rightarrow \forall x Xx \end{align*} $$

and the second-order comprehension scheme for $L_2'$ .

Lemma 4.28. $Z_2$ and $Z_2'$ are mutually interpretable. Indeed, $Z_2' + \mathfrak {D} \vdash Z_2$ , and conversely $Z_2 + \mathfrak {D} \vdash Z_2'$ .

Proof. We argue that $Z_2' + \mathfrak {D} \vdash Z_2$ . The other direction is easy.

Observe that $(Z_2' + \mathfrak {D}) \vdash (I\Sigma _0' + \mathfrak {D}) \vdash I\Sigma _0 \vdash Q$ . Furthermore, the two ways of formulating the second-order induction axiom are equivalent in the presence of $Sx=y \leftrightarrow Sxy$ .

It remains to show that $Z_2' + \mathfrak {D}$ proves the second-order comprehension scheme for $L_2$ . Take any $L_2$ -formula $\varphi $ . Let $\psi $ be the formula obtained from $\varphi $ by replacing each atomic predication $Xt$ with $\exists z(Xz \wedge z=t)$ , where z is a new variable. Then every non-atomic term in $\psi $ occurs in an equation $t_1 = t_2$ . These equations are $L_{\mathrm{PA}}$ -formulas. By Lemma 4.23, $Z_2' + \mathfrak {D}$ proves each $L_{\mathrm{PA}}$ -formula to be equivalent to an $L'$ -formula. So, there is an $L_2'$ -formula $\varphi '$ such that $Z_2' + \mathfrak {D} \vdash \varphi \leftrightarrow \varphi '$ . Now apply second-order comprehension to $\varphi '$ , and we are done.

5 Well-orderings and finiteness

In this section, we define ‘well-ordering’ in $\mathcal {L}$ , and we note that $\mathit {Ax}_{\mathcal {L}}$ proves that all well-orderings are comparable (Lemma 5.29). Then we define the notion of Stäckel-finiteness and prove the important lemma of induction on finite concepts (Lemma 5.32). We will use these lemmas throughout the paper.

For simplicity, we work in $\mathcal {L}$ . However, these notions can easily be extended to $\mathcal {L}^+$ .

Let $\varnothing $ denote the empty concept. Let V denote the universal concept.

Let $Y \subseteq X$ abbreviate $\forall x (Yx \rightarrow Xx)$ .

Let ‘ $(X,R)$ is a linear order’ abbreviate the formula

$$ \begin{align*}\forall x \forall y(Rxy \wedge Ryx \rightarrow x=y) &\wedge \forall x \forall y \forall z (Rxy \wedge Ryz \rightarrow Rxz)\\ &\wedge \forall x \forall y(Xx \wedge Xy \leftrightarrow (Rxy \vee Ryx)). \end{align*} $$

In other words, $(X,R)$ is a linear order just in case R is an antisymmetric, transitive, total relation on X.

Let ‘ $(X,R)$ is well-founded’ abbreviate

$$ \begin{align*}\forall Y (Y \neq \varnothing \wedge Y \subseteq X \rightarrow \exists x (Yx \wedge \forall y (Yy \rightarrow Rxy))). \end{align*} $$

Say that $(X,R)$ is a well-ordering if $(X,R)$ is a well-founded linear order.

We say that two well-orderings $(X, \leq _X)$ and $(Y, \leq _Y)$ are order-isomorphic, denoted $(X, \leq _X) \simeq _o (Y, \leq _Y)$ , just in case there is a bijection $f: X \to Y$ such that

$$ \begin{align*}\forall x \forall y (x \leq_X y \leftrightarrow f(x) \leq_Y f(y)). \end{align*} $$

Strictly speaking, we should represent f as a relation, but we will go on using functional notation informally.

If $(X,R)$ is a well-ordering, let $X \upharpoonright a$ be the initial segment of $(X,R)$ up to a, defined by

$$ \begin{align*}(X \upharpoonright a)x \ \leftrightarrow \ Xx \wedge Rxa. \end{align*} $$

We also regard $\varnothing $ as an initial segment of $(X,R)$ . An initial segment of $(X,R)$ is proper if it is not equal to X.

Let $(X, \leq _X) <_o (Y, \leq _Y)$ abbreviate the statement that $(X, \leq _X)$ is order-isomorphic with a proper initial segment of $(Y, \leq _Y)$ .

We borrow the next lemma from [Reference Ebels-Duggan7, p. 611].

Lemma 5.29 (Comparability of well-orderings)

It is provable from $\mathit {Ax}_{\mathcal {L}}$ that any two well-orderings $(X,\leq _X)$ and $(Y,\leq _Y)$ are comparable, in the sense that exactly one of the following holds:

$$ \begin{align*}(X, \leq_X) <_o (Y, \leq_Y), \quad (X, \leq_X) \simeq_o (Y, \leq_Y), \quad (X, \leq_X)>_o (Y, \leq_Y). \end{align*} $$

Proof. Copy the usual set-theoretic proof [Reference Jech18, pp. 18–19].

We now define the notion of Stäckel-finiteness.

If R is a binary relation, let $R^{-1}$ be the converse of R, defined by $R^{-1}xy \leftrightarrow Ryx.$

Definition 5.30. Say that $(X,R)$ is a double well-ordering if $(X,R)$ and $(X,R^{-1})$ are both well-orderings.

Say that X is Stäckel-finite, abbreviated $\mathit {Fin}(X)$ , if X admits a double well-ordering. That is,

$$ \begin{align*}\mathit{Fin}(X) \iff_{\hspace{-4 pt} \mathrm{df}}\ \ \exists R ( (X,R) \text{ is a double well-ordering}). \end{align*} $$
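For a finite carrier, the definitions of this section can be checked by brute force. The following Python sketch is only an illustration. The function names are hypothetical, the relation R is represented as a set of pairs, and the quantifiers are assumed to range over an explicitly given finite universe.

```python
from itertools import chain, combinations

def is_linear_order(universe, X, R):
    """Antisymmetric, transitive, and total on X (all quantifiers range over `universe`)."""
    antisym = all(not ((x, y) in R and (y, x) in R) or x == y
                  for x in universe for y in universe)
    trans = all(not ((x, y) in R and (y, z) in R) or (x, z) in R
                for x in universe for y in universe for z in universe)
    total = all(((x in X) and (y in X)) == ((x, y) in R or (y, x) in R)
                for x in universe for y in universe)
    return antisym and trans and total

def nonempty_subsets(X):
    items = list(X)
    return chain.from_iterable(combinations(items, k) for k in range(1, len(items) + 1))

def is_well_founded(X, R):
    """Every nonempty Y contained in X has an R-least element."""
    return all(any(all((x, y) in R for y in Y) for x in Y)
               for Y in nonempty_subsets(X))

def converse(R):
    return {(y, x) for (x, y) in R}

def is_double_well_ordering(universe, X, R):
    """Both (X, R) and (X, R^{-1}) are well-founded linear orders."""
    return all(is_linear_order(universe, X, S) and is_well_founded(X, S)
               for S in (R, converse(R)))

universe = set(range(5))
X = {1, 2, 3}
leq = {(x, y) for x in X for y in X if x <= y}
lt = {(x, y) for x in X for y in X if x < y}
print(is_double_well_ordering(universe, X, leq))  # the usual order on {1, 2, 3}
print(is_double_well_ordering(universe, X, lt))   # fails totality: R must be reflexive on X
```

The second call illustrates that the totality clause forces R to be reflexive on X, so the strict order does not count as a linear order in the sense defined above.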

Remark 5.31. The double well-ordering criterion is proposed as a definition of finiteness in [Reference Stäckel29]. The criterion is also discussed in [Reference Zermelo34, Reference Zermelo and Castelnuovo35]. For historical remarks, see [Reference Parsons25].

Stäckel-finiteness is strictly stronger than Dedekind-finiteness, in the sense that

$$ \begin{align*} \mathit{Ax}_{\mathcal{L}} &\vdash \mathit{Fin}(X) \rightarrow \mathit{DFin}(X), \\ \mathit{Ax}_{\mathcal{L}} &\not\vdash \mathit{DFin}(X) \rightarrow \mathit{Fin}(X), \end{align*} $$

where of course $\mathit {DFin}(X)$ abbreviates that X is Dedekind-finite. Indeed, $\mathit {Fin}(X) \rightarrow \mathit {DFin}(X)$ is a version of the pigeonhole principle. It is provable from $\mathit {Ax}_{\mathcal {L}}$ by induction on finite concepts (Lemma 5.32). On the other hand, the Fraenkel model (defined in Section 6) is a model of $\mathit {DFin}(V) + \neg \mathit {Fin}(V)$ , witnessing that $\mathit {Ax}_{\mathcal {L}} \not \vdash \mathit {DFin}(X) \rightarrow \mathit {Fin}(X)$ .

Lastly, we show that $\mathit {Ax}_{\mathcal {L}}$ proves a principle of induction on Stäckel-finite concepts.

Let $X \cup \{a\}$ be the concept defined by

$$ \begin{align*}(X \cup \{a\})x\ \leftrightarrow\ (Xx \vee x=a). \end{align*} $$

Lemma 5.32 (Induction on finite concepts)

Let $\varphi (X)$ be any formula of $\mathcal {L}$ . Then $\mathit {Ax}_{\mathcal {L}}$ proves the universal closure of

$$ \begin{align*}\varphi(\varnothing) \wedge \forall X \forall a (\mathit{Fin}(X) \wedge \varphi(X) \rightarrow \varphi(X \cup \{a\})) \rightarrow \forall X(\mathit{Fin}(X) \rightarrow \varphi(X)). \end{align*} $$

Proof. Assume the antecedent. Take any X such that $\mathit {Fin}(X)$ . Fix a double well-ordering $(X,R)$ , and let Y be defined by $Yx \leftrightarrow (Xx \wedge \varphi (X \upharpoonright x))$ . It suffices to show that $Y=X$ .

Suppose not. Since $(X,R)$ is a well-ordering, there is an R-least y such that $Xy \wedge \neg Yy$ . It is easy to see that y cannot be the R-least element of X. Since $(X,R^{-1})$ is a well-ordering, y has a unique $(X,R)$ -predecessor, call it z. By the minimality of y, we have $Yz$ , and hence $\varphi (X \upharpoonright z)$ . Also, it is easy to see that $\mathit {Fin}(X \upharpoonright z)$ . It follows that $\varphi ( (X \upharpoonright z) \cup \{y\})$ , which is to say $\varphi (X \upharpoonright y)$ . But this contradicts our choice of y.

6 The Fraenkel model

In this section, we define the Fraenkel model and show that it is a model of $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ (Lemmas 6.38 and 6.39). Then we show that the relations occurring in the Fraenkel model are exactly the sets definable by Boolean combinations of equalities with object parameters (Lemma 6.40). We will make good use of these facts in Section 7.

We remark that Lemma 6.40 implies that the Fraenkel model is the minimal infinite model of $\mathit {Ax}_{\mathcal {L}}$ —i.e., it is a submodel of any infinite model of $\mathit {Ax}_{\mathcal {L}}$ .

Definition 6.33. Let $A \subseteq \mathbb {N}^n$ and $E \subseteq \mathbb {N}$ . We say that E is a support of A if every permutation $\pi : \mathbb {N} \to \mathbb {N}$ that fixes E pointwise fixes A setwise:

$$ \begin{align*}(\forall e \in E)(\pi(e)=e) \implies \forall x_1, \ldots, x_n ( (x_1, \ldots, x_n ) \in A \leftrightarrow (\pi(x_1), \ldots, \pi(x_n) ) \in A ). \end{align*} $$

Using the notation $\pi (A) = \{ ( \pi (x_1), \ldots , \pi (x_n) ) \in \mathbb {N}^n: ( x_1, \ldots , x_n ) \in A \}$ , we can restate this property as follows: for every permutation $\pi :\mathbb {N}\to \mathbb {N}$ ,

$$ \begin{align*}(\forall e \in E)(\pi(e)=e) \implies \pi(A) = A. \end{align*} $$
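As a toy illustration of Definition 6.33, one can replace $\mathbb {N}$ by a small finite universe $\{0, \ldots , m-1\}$ and test the support condition by enumerating all permutations. This finite analogue is an assumption made purely for illustration (the definition itself concerns permutations of $\mathbb {N}$), and the name `is_support` is hypothetical.

```python
from itertools import permutations

def is_support(m, A, E):
    """Check that every permutation of range(m) fixing E pointwise maps A onto A.

    A is a set of n-tuples over range(m); E is a subset of range(m).
    """
    for image in permutations(range(m)):
        if all(image[e] == e for e in E):
            moved = {tuple(image[x] for x in t) for t in A}
            if moved != A:
                return False
    return True

m = 5
A = {(x,) for x in range(m) if x != 2}             # the unary set "x is not 2"
diag = {(x, x) for x in range(m)}                   # the binary set "x = y"
print(is_support(m, A, {2}), is_support(m, A, set()), is_support(m, diag, set()))
```

In this toy setting, the set defined by $x \neq 2$ needs the parameter 2 in its support, while the diagonal is fixed by every permutation and so has empty support.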

Definition 6.34. A set $A \subseteq \mathbb {N}^n$ is symmetric if it has a finite support $E \subseteq \mathbb {N}$ .

Definition 6.35. The Fraenkel model is the $\mathcal {L}$ -prestructure $\mathcal {M}$ whose object universe is $\mathbb {N}$ , and whose n-ary relations are the symmetric subsets of $\mathbb {N}^n$ . That is, writing $M_n$ for $M_{\langle 0, \ldots , 0 \rangle } \ (n \text{ zeroes})$ ,

$$ \begin{align*} M_0 &= \mathbb{N}, \\ M_n &= \{A \subseteq \mathbb{N}^n: A \text{ is symmetric} \}. \end{align*} $$

It is well known that $\mathcal {M}$ is a model of $\mathit {Ax}_{\mathcal {L}}$ (i.e., it is a general $\mathcal {L}$ -structure) [Reference Väänänen and Zalta32]. However, we are not aware of any English-language source that gives the proof. For the reader’s convenience, we present the proof from [Reference Asser1] in the next two lemmas.

Lemma 6.36. If $A \subseteq \mathbb {N}^n$ is symmetric, and $\sigma :\mathbb {N} \to \mathbb {N}$ is any permutation, then $\sigma (A) \subseteq \mathbb {N}^n$ is also symmetric.

Proof. Let E be a support for A. We show that $\sigma (E)$ is a support for $\sigma (A)$ . Indeed, take any permutation $\pi :\mathbb {N} \to \mathbb {N}$ that fixes $\sigma (E)$ pointwise. Then the permutation $\sigma ^{-1} \pi \sigma : \mathbb {N} \to \mathbb {N}$ fixes E pointwise, since $\pi (\sigma (e)) = \sigma (e)$ for every $e \in E$ . So, $(\sigma ^{-1} \pi \sigma )(A) = A$ , and hence $\pi (\sigma (A)) = \sigma (A)$ . Since $\sigma (E)$ is finite whenever E is, $\sigma (A)$ is symmetric.

Corollary 6.37. Each relation domain $M_n$ of the Fraenkel model is closed under the action (on $\mathbb {N}^n$ ) of permutations of $\mathbb {N}$ .

Lemma 6.38. The Fraenkel model is a model of $\mathit {Ax}_{\mathcal {L}}$ .

Proof. Let $\mathcal {M}$ be the prestructure defined above. We show that $\mathcal {M}$ satisfies Comprehension. Take any formula $\varphi (\bar x, \bar b, \bar B)$ of $\mathcal {L}$ , with free variables $\bar x = (x_1, \ldots , x_n)$ and parameters $\bar b = (b_1, \ldots , b_j)$ and $\bar B = (B_1, \ldots , B_k)$ drawn from $\mathcal {M}$ . Say that $A = \{\bar a \in \mathbb {N}^n : \mathcal {M} \vDash \varphi (\bar a, \bar b, \bar B)\}$ . We show that $A \in M_n$ .

Since the relation parameters $\bar B$ are drawn from $\mathcal {M}$ , each set $B_i$ has a finite support $E_i$ ( $i = 1, \ldots , k$ ). Let $E = \{b_1, \ldots , b_j \} \cup E_1 \cup \cdots \cup E_k$ . Clearly, E is finite. We show that E is a support for A.

Take any permutation $\pi : \mathbb {N} \to \mathbb {N}$ that fixes E pointwise, and take any $\bar a = (a_1, \ldots , a_n) \in \mathbb {N}^n$ . We check that $\bar a \in A \iff \pi (\bar a ) = ( \pi (a_1), \ldots , \pi (a_n) ) \in A$ . Indeed,

$$ \begin{align*} \bar a \in A &\iff \mathcal{M} \vDash \varphi(\bar a, \bar b, \bar B) \\ &\iff \mathcal{M} \vDash \varphi(\pi(\bar a), \pi(\bar b), \pi(\bar B))\\ &\iff \mathcal{M} \vDash \varphi(\pi(\bar a), \bar b, \bar B) \\ &\iff \pi(\bar a) \in A. \end{align*} $$

(Notation: $\pi (\bar b) = (\pi (b_1), \ldots , \pi (b_j))$ and $\pi (\bar B) = (\pi (B_1), \ldots , \pi (B_k))$ . By Lemma 6.36, each $\pi (B_i)$ is a parameter from $\mathcal {M}$ .) The second step works because permuting everything uniformly doesn’t change any truth-values relative to any variable-assignment. This is easily proved by induction on formulas. The third step works because $\pi $ fixes E pointwise, hence fixes all the parameters.

Lemma 6.39. The Fraenkel model is a model of $\neg \mathit {Fin}(V)$ .

Proof. In fact, we will prove something stronger: the Fraenkel model does not contain any linear ordering of the universe.

Consider any relation $R \subseteq \mathbb {N}^2$ with finite support $E \subseteq \mathbb {N}$ . Suppose for the sake of contradiction that R is a linear ordering of the universe. Since E is finite and R is total, we may choose distinct $a,b \in \mathbb {N} \setminus E$ such that $Rab$ . Let $\pi $ be any permutation fixing E pointwise such that $\pi (a)=b$ and $\pi (b)=a$ . Since E is a support of R, it follows that $Rba$ . But this contradicts the assumption that R is antisymmetric.

So, $\mathcal {M}$ contains no linear ordering of the universe. It follows that $\mathcal {M}$ contains no double well-ordering of the universe, i.e., $\mathcal {M} \vDash \neg \mathit {Fin}(V)$ .

We close this section by giving a simple characterization of symmetric sets.

Lemma 6.40. Let $E \subseteq \mathbb {N}$ be a finite set. A set $A \subseteq \mathbb {N}^n$ is symmetric with support E iff A is definable by a Boolean combination of equalities with parameters from E.

Proof. Define an equivalence relation $\sim _E$ on $\mathbb {N}^n$ , as follows:

$$ \begin{align*} (a_1, \ldots, a_n) \sim_E (b_1, \ldots, b_n) \iff [(&\forall i, j \leq n)(a_i = a_j \leftrightarrow b_i = b_j) \ \wedge \\ & (\forall e \in E)(\forall i \leq n)(a_i = e \leftrightarrow b_i = e)]. \end{align*} $$

In words: $\bar a \sim _E \bar b$ iff $\bar a$ and $\bar b$ are n-tuples with the same pattern of identity and distinctness which agree on members of E. It is easy to see that $\sim _E$ really is an equivalence relation.
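The relation $\sim _E$ is decidable by inspecting the two tuples directly. A minimal Python sketch follows; the name `equiv_E` is hypothetical, and tuples and the finite set E are assumed to be given as Python tuples and sets of natural numbers.

```python
def equiv_E(a, b, E):
    """Decide a ~_E b: same identity pattern, and agreement on members of E."""
    if len(a) != len(b):
        return False
    n = len(a)
    # (forall i, j <= n)(a_i = a_j <-> b_i = b_j)
    same_pattern = all((a[i] == a[j]) == (b[i] == b[j])
                       for i in range(n) for j in range(n))
    # (forall e in E)(forall i <= n)(a_i = e <-> b_i = e)
    same_on_E = all((a[i] == e) == (b[i] == e)
                    for e in E for i in range(n))
    return same_pattern and same_on_E

print(equiv_E((5, 5, 7), (8, 8, 9), {0, 1}))  # same pattern, no entry lies in E
print(equiv_E((0, 5), (1, 5), {0}))           # first entries disagree about membership in E
```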

( $\!{\implies}\kern-1pt\!$ ). Suppose $A \subseteq \mathbb {N}^n$ is symmetric with support E. Observe that A is a union of equivalence classes of $\sim _E$ . Indeed, if $\bar a \sim _E \bar b$ , then there is a permutation $\pi :\mathbb {N} \to \mathbb {N}$ fixing E pointwise such that $\pi (\bar a) = \bar b$ . Since E is a support of A, it follows that $\bar a \in A \leftrightarrow \bar b \in A$ .
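The permutation whose existence is asserted here can be constructed explicitly, and it can be taken to move only finitely many numbers. The Python sketch below is one possible construction, not necessarily the one the authors have in mind. The name `swap_permutation` is hypothetical, the input is assumed to satisfy $\bar a \sim _E \bar b$ , and the permutation is returned as a dictionary that is understood to be the identity off its keys.

```python
def swap_permutation(a, b, E):
    """Return a finite permutation pi with pi(e) = e for e in E and pi(a_i) = b_i.

    Assumes a ~_E b, which makes the partial map a_i -> b_i, e -> e
    well defined and injective.
    """
    pi = {e: e for e in E}
    for x, y in zip(a, b):
        pi[x] = y
    domain, image = set(pi), set(pi.values())
    # Domain and image have the same size, so the leftover points pair up.
    for x, y in zip(sorted(image - domain), sorted(domain - image)):
        pi[x] = y
    return pi

pi = swap_permutation((5, 5, 7), (7, 7, 9), {0})
print(pi)
```

On the example, the partial map sends 5 to 7 and 7 to 9 and fixes 0; the construction closes it up by sending 9 back to 5.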

Now, each equivalence class of $\sim _E$ is definable by a Boolean combination of equalities with parameters from E, of the following form:

$$ \begin{align*}\bigwedge_{\substack{i, j \leq n \\ i \neq j}} (\neg) \ x_i = x_j \ \wedge \ \bigwedge_{\substack{ i \leq n \\ e \in E }} (\neg) \ x_i = e. \end{align*} $$

(The parenthesized negations may or may not be present in each conjunct.) Furthermore, $\sim _E$ has only finitely many equivalence classes, because there are only finitely many possible patterns of identity and distinctness among $x_1, \ldots , x_n$ and the members of E. Hence, A is definable by a disjunction of formulas like the one above.

( $\Longleftarrow $ ). Suppose A is definable by a Boolean combination of equalities with parameters from E. We show that A is symmetric with support E.

Take any permutation $\pi :\mathbb {N}\to \mathbb {N}$ fixing E pointwise. Since $\pi $ is injective and fixes each member of E, we have, for all $x_i, x_j \in \mathbb {N}$ and $e \in E$ ,

$$ \begin{align*} x_i = x_j \ &\leftrightarrow \ \pi(x_i) = \pi(x_j), \\ x_i = e \ &\leftrightarrow \ \pi(x_i) = e. \end{align*} $$

By induction on formulas, it is easy to see that $\mathbb {N} \vDash \varphi (\bar x, \bar e) \leftrightarrow \varphi (\pi (\bar x), \bar e)$ for any Boolean combination of equalities $\varphi (\bar x, \bar e)$ . Since A is defined by some such Boolean combination, it follows that $\pi (A) = A$ .

Since $\pi $ was arbitrary, we conclude that A is symmetric with support E.

7 The non-conservativeness of w2FA

In this section, we prove Theorem 7.47, which says that w2FA is not conservative over $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ .

Here is the main idea of the proof. We have seen that $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ has a model whose relations are easy to describe in finitary terms (Section 6). Hence, $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ is a fairly weak theory; in fact it is mutually interpretable with first-order Peano arithmetic. (To show that $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ interprets PA, the trick is to code arithmetical statements as statements about finite concepts.) On the other hand, adding $\mathrm{w2FA}$ to $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ results in a much stronger theory, one which proves that the numerical sort is Dedekind-infinite and hence interprets second-order arithmetic. Second-order arithmetic is not conservative over Peano arithmetic. By means of a carefully chosen interpretation, this non-conservativeness can be transferred to the theories of interest to us. For example, $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ proves the interpretation of a consistency statement for Peano arithmetic, while $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ does not.

Let $X \approx Y$ abbreviate that there is a bijection between X and Y, in which case we say that X and Y are equinumerous concepts.

If $Ryz$ is a binary relation, let $R_y$ be the concept defined by $R_y z \leftrightarrow Ryz$ . (This is terrible notation, but we only use it in the following definition.)

Definition 7.41. Define $\mathit {Succ}, \mathit {Leq}, \mathit {Add}, \mathit {Mult}$ as follows:

$$ \begin{align*} \mathit{Succ}(X,Y) &\iff \exists a(\neg Xa \ \wedge\ Y \approx X\cup \{a\}), \\ \mathit{Leq}(X,Y) &\iff \exists X'(X \approx X' \ \wedge\ X' \subseteq Y), \\ \mathit{Add}(X,Y,Z) &\iff \exists Y' (Y \approx Y' \ \wedge\ X \cap Y' = \varnothing \ \wedge\ X \cup Y' \approx Z), \\ \mathit{Mult}(X,Y,Z) &\iff \exists R[ \forall y \forall z (Ryz \rightarrow (Yy \wedge Zz)) \wedge \forall y(Yy \rightarrow R_y \approx X) \\ & \quad \qquad \wedge \forall z(Zz \rightarrow \exists ! y Ryz) ]. \end{align*} $$

In other words, $\mathit {Mult}(X,Y,Z)$ says that Z is equinumerous with the union of $|Y|$ disjoint copies of X.
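For finite sets, where equinumerosity reduces to equality of cardinality, these definitions can be tested directly. The Python sketch below is an illustration under that assumption. The names `succ`, `mult_witness`, and `is_mult_witness` are hypothetical, and the universe is passed explicitly so that the existential quantifier in $\mathit {Succ}$ has a range.

```python
def succ(universe, X, Y):
    """Succ(X, Y): some a outside X has Y equinumerous with X plus a."""
    return any(a not in X and len(Y) == len(X | {a}) for a in universe)

def is_mult_witness(R, X, Y, Z):
    """Check the three conjuncts in the definition of Mult for a given R."""
    in_range = all(y in Y and z in Z for (y, z) in R)
    fibres_ok = all(len({z for (y2, z) in R if y2 == y}) == len(X) for y in Y)
    unique = all(len({y for (y, z2) in R if z2 == z}) == 1 for z in Z)
    return in_range and fibres_ok and unique

def mult_witness(X, Y, Z):
    """Build a witness R by cutting Z into |Y| blocks of size |X|, or return None."""
    ys, zs, k = sorted(Y), sorted(Z), len(X)
    if len(zs) != k * len(ys):
        return None
    return {(y, z) for i, y in enumerate(ys) for z in zs[i * k:(i + 1) * k]}

U = {0, 1, 2}
print(succ(U, {0}, {1, 2}))             # a successor exists inside U
print(succ(U, {0, 1, 2}, {0, 1, 2}))    # no a lies outside X, so Succ fails
R = mult_witness({"a", "b"}, {0, 1, 2}, set(range(10, 16)))
print(is_mult_witness(R, {"a", "b"}, {0, 1, 2}, set(range(10, 16))))
```

The second call shows why a finite universe is a problem: when X exhausts the universe, no witness a exists and $\mathit {Succ}$ is not total.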

Definition 7.42. Define the translation $\alpha : L_2 \to \mathcal {L}^+$ as follows.

Identify first-order variables of $L_{2}$ with base-sort concept variables of $\mathcal {L}^+$ . Identify second-order variables of $L_2$ with numerical-sort concept variables of $\mathcal {L}^+$ .

Relativize $\forall x$ to the formula $\mathit {Fin}(X)$ .

Relativize $\forall X$ to the formula $\mathit {FinNums}(\mathbf {X}) := \forall \mathbf {y}(\mathbf {X} \mathbf {y} \rightarrow \exists Y[\mathbf {y} = \#Y \wedge \mathit {Fin}(Y)])$ .

Translate predication and equality as follows:

$$ \begin{align*} (Xy)^\alpha &:= \mathbf{X}(\#Y), \\ (x=y)^\alpha &:= X \approx Y. \end{align*} $$

Translate $0, S, \leq , +, \cdot $ as follows:

$$ \begin{align*} (x=0)^\alpha &:= X = \varnothing, \\ (Sx = y)^\alpha &:= \mathit{Succ}(X,Y), \\ (x \leq y)^\alpha &:= \mathit{Leq}(X,Y), \\ (x+ y = z)^\alpha &:= \mathit{Add}(X,Y,Z), \\ (x \cdot y = z)^\alpha &:= \mathit{Mult}(X,Y,Z). \end{align*} $$

We may extend this translation to all $L_2$ -formulas via the usual techniques for eliminating definite descriptions. For example, write $SSx = y$ as $\exists z(Sx = z \wedge Sz = y)$ , and so on.

Lemma 7.43. Restricted to $L_{\mathrm{PA}}$ -formulas, the translation $\alpha : L_2 \to \mathcal {L}^+$ is an interpretation of PA in $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ .

Proof. Note that if $\varphi $ is an $L_{\mathrm{PA}}$ -formula, then $\varphi ^\alpha $ is an $\mathcal {L}$ -formula. We will show that $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ proves the $\alpha $ -translation of each axiom of PA, and also proves that $\mathit {Succ}$ , $\mathit {Add}$ , $\mathit {Mult}$ define total functions (up to $\approx $ ).

First we prove that $\mathit {Succ}$ defines a total function (up to $\approx $ ). In other words, we show that for any Stäckel-finite concepts $X, Y, Z$ ,

$$ \begin{align*} & \exists W (\mathit{Fin}(W) \wedge \mathit{Succ}(X, W)), \\ &\mathit{Succ}(X, Y) \wedge \mathit{Succ}(X, Z) \rightarrow Y \approx Z. \end{align*} $$

We reason in $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ . For the first claim, take any concept X such that $\mathit {Fin}(X)$ . Then X is not V. So, there exists a such that $\neg Xa$ . Then $\mathit {Succ}(X, X \cup \{a\})$ , and it is easy to check that $\mathit {Fin}(X \cup \{a\})$ . This gives us the first claim. The second claim is obtained simply by unpacking the definition of $\mathit {Succ}$ .

We postpone the proofs that $\mathit {Add}$ and $\mathit {Mult}$ define total functions (up to $\approx $ ).

The $\alpha $ -translations of the axioms of Q can be expressed as follows (after eliminating definite descriptions in a convenient way). For any Stäckel-finite concepts $X, Y, Z, Y', Z'$ ,

$$ \begin{align*} & \neg \mathit{Succ}(X, \varnothing), \\ & \mathit{Succ}(X, Z) \wedge \mathit{Succ}(Y,Z) \rightarrow X \approx Y, \\ & \mathit{Add}(X, \varnothing, Z) \leftrightarrow Z \approx X, \\ & \mathit{Succ}(Y, Y') \rightarrow (\mathit{Add}(X,Y',Z') \leftrightarrow \exists Z [\mathit{Fin}(Z) \wedge \mathit{Add}(X, Y, Z) \wedge \mathit{Succ}(Z,Z')]), \\ & \mathit{Mult}(X, \varnothing, Z) \leftrightarrow Z = \varnothing, \\ & \mathit{Succ}(Y, Y') \rightarrow (\mathit{Mult}(X,Y',Z') \leftrightarrow \exists Z [\mathit{Fin}(Z) \wedge \mathit{Mult}(X, Y, Z) \wedge \mathit{Add}(Z,X,Z')]), \\ & \mathit{Leq}(X,Y) \leftrightarrow \exists Z (\mathit{Fin}(Z) \wedge \mathit{Add}(Z,X,Y)). \end{align*} $$

(We drop the third axiom of Q, since it is redundant in PA.) It is tedious but straightforward to check that all of these claims are provable from $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ .

The previous step essentially provides us with recursive definitions of $\mathit {Add}$ and $\mathit {Mult}$ . Using these recursive definitions, it is then easy to prove that $\mathit {Add}$ and $\mathit {Mult}$ define total functions (up to $\approx $ ). For $\mathit {Add}$ , we must show that for any Stäckel-finite concepts $X, Y, Z, W$ ,

$$ \begin{align*} & \exists U (\mathit{Fin}(U) \wedge \mathit{Add}(X, Y, U)), \\ & \mathit{Add}(X, Y, Z) \wedge \mathit{Add}(X, Y, W) \rightarrow Z \approx W. \end{align*} $$

Both of these claims are provable by induction on the finite concept Y (Lemma 5.32), using the recursive definition of $\mathit {Add}$ . The proof for $\mathit {Mult}$ is similar.

Lastly, the $\alpha $ -translation of the induction scheme of PA follows from induction on finite concepts (Lemma 5.32 again).

Lemma 7.44. The translation $\alpha : L_2 \to \mathcal {L}^+$ is an interpretation of $Z_2$ in $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ .

Proof. By Lemma 7.43, we already know that the $\alpha $ -translation is an interpretation of PA in $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ , and hence in $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ . It remains to check that $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ proves the $\alpha $ -translations of the second-order induction and comprehension axioms.

The translation of the second-order induction axiom is equivalent to

$$ \begin{align*} & \mathbf{X}(\# \varnothing) \wedge \forall X \forall Y \left ( \mathit{Fin}(X) \wedge \mathbf{X}(\#X) \wedge \mathit{Succ}(X, Y) \rightarrow \mathbf{X}(\#Y) \right ) \rightarrow \forall X \left ( \mathit{Fin}(X) \rightarrow \mathbf{X}(\#X) \right ). \end{align*} $$

This is easily proved by induction on finite concepts, generalized to $\mathcal {L}^+$ -formulas. The generalization is proved in the same way as Lemma 5.32.

The comprehension scheme translates as follows:

$$ \begin{align*} \exists \mathbf{X} \left ( \mathit{FinNums}(\mathbf{X}) \wedge \forall Y \left( \mathit{Fin}(Y) \rightarrow \left ( \mathbf{X}(\#Y) \leftrightarrow \varphi^\alpha(Y) \right ) \right) \right ). \end{align*} $$

To prove this in $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ , apply comprehension (in $\mathcal {L}^+$ ) to the formula

$$ \begin{align*}\exists Y \left ( \mathbf{x} = \#Y \wedge \mathit{Fin}(Y) \wedge \varphi^\alpha(Y) \right ). \end{align*} $$

Then use w2FA and the fact that $\approx $ is a congruence with respect to $\varphi ^\alpha (Y)$ .

We will now define a translation $\beta : \mathcal {L} \to L_{\mathrm{PA}}$ inspired by the Fraenkel model, and show that it is an interpretation of $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ in PA.

Fix primitive recursive encodings of finite sets and sequences as natural numbers. For finite sequences, this amounts to specifying the following functions in $L_{\mathrm{PA}}$ :

  (i) for each $n\in \mathbb {N}$ , a primitive recursive function $\langle x_1, \ldots , x_n \rangle $ , which codes this tuple as a single number,

  (ii) primitive recursive functions $\mathit {length}(s)$ and $(s)_i$ , which return the length and the i-th element of the finite sequence coded by s.

We identify finite sets and sequences with their codes. We use the letter E for finite sets, and the letter s for finite sequences.

Fix a primitive recursive Gödel numbering of $L_{\mathrm{PA}}$ -formulas. We identify formulas with their Gödel numbers. For each formula $\varphi $ , let $\ulcorner \varphi \urcorner $ be a formal numeral that denotes (the Gödel number of) $\varphi $ .

Next, we describe $L_{\mathrm{PA}}$ -formulas $\mathit {BoolEq}$ , $\mathit {BoolSat}$ , $\mathit {pad}_n$ representing certain primitive recursive relations and functions.

Let $\mathit {BoolEq}(x, y, E)$ just in case: x is a Boolean combination of $L_{\mathrm{PA}}$ -equalities with exactly y free variables and with constant symbols drawn from $\{S^e0: e\in E\}$ .

Let $\mathit {BoolSat}(x,s)$ just in case: x is a Boolean combination of $L_{\mathrm{PA}}$ -equalities that is satisfied when the i-th variable of $L_{\mathrm{PA}}$ is assigned the value $(s)_i$ , for all $i \leq \mathit {length}(s)$ . This is primitive recursive, because truth and satisfaction for bounded ( $\Sigma _0$ ) formulas are primitive recursive notions.

For each $n\in \mathbb {N}$ , let $\mathit {pad}_n(x_1, \ldots , x_n, y_1, \ldots , y_n)=s$ just in case: s is the shortest finite sequence whose $x_i$ -th element is $y_i$ (for all $1 \leq i \leq n$ ) and whose other elements are all zero.
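To make the roles of $\mathit {pad}_n$ and $\mathit {BoolSat}$ concrete, here is a small Python sketch that works on uncoded objects rather than Gödel numbers. Everything about the representation is an assumption for illustration: formulas are nested tuples, a term is `("var", i)` or `("num", e)` (standing for $S^e0$ ), sequences are 1-indexed, and a variable whose index exceeds the length of the sequence is given the value 0.

```python
def pad(xs, ys):
    """Shortest sequence whose xs[i]-th entry (1-indexed) is ys[i]; other entries are 0."""
    s = [0] * max(xs, default=0)
    for x, y in zip(xs, ys):
        s[x - 1] = y
    return s

def term_value(t, s):
    kind, k = t
    if kind == "num":                       # the numeral S^k 0
        return k
    return s[k - 1] if 1 <= k <= len(s) else 0   # the k-th variable (assumed 0 if out of range)

def bool_sat(f, s):
    """Evaluate a Boolean combination of equalities under the assignment s."""
    op = f[0]
    if op == "eq":
        return term_value(f[1], s) == term_value(f[2], s)
    if op == "not":
        return not bool_sat(f[1], s)
    if op == "and":
        return bool_sat(f[1], s) and bool_sat(f[2], s)
    if op == "or":
        return bool_sat(f[1], s) or bool_sat(f[2], s)
    raise ValueError(f"unknown connective: {op!r}")

# "v_1 = S^9 0 and not v_1 = v_3", evaluated with v_3 := 7 and v_1 := 9
f = ("and", ("eq", ("var", 1), ("num", 9)), ("not", ("eq", ("var", 1), ("var", 3))))
s = pad([3, 1], [7, 9])
print(s, bool_sat(f, s))  # [9, 0, 7] True
```

The padding step is what lets a relation variable, coded as a formula, be applied to arbitrary object variables: the arguments are placed at the positions of the variables they instantiate, and all other positions are filled with zero.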

Definition 7.45. Define the translation $\beta : \mathcal {L} \to L_{\mathrm{PA}}$ as follows.

Let the variables of $L_{\mathrm{PA}}$ and the object variables of $\mathcal {L}$ be enumerated by $v_1, v_2, v_3, \ldots $ .

Translate each object variable $v_i$ of $\mathcal {L}$ by the even-numbered variable $v_{2i}$ . Translate each relation variable X of $\mathcal {L}$ by a distinct odd-numbered variable $v_X \in \{v_1, v_3, v_5, \ldots \}$ . In the last clause of the definition below, E is a fresh variable and n is the arity of X.

$$ \begin{align*} (X v_{i_1} \cdots v_{i_n})^\beta &:= \mathit{BoolSat}(v_{X}, \mathit{pad}_n(S^{i_1}0, \ldots, S^{i_n}0, v_{2i_1}, \ldots, v_{2i_n} ) ). \\ (v_i = v_j)^\beta &:= v_{2i} = v_{2j}. \\ (\varphi \rightarrow \psi)^\beta &:= \varphi^\beta \rightarrow \psi^\beta. \\ (\neg \varphi)^\beta &:= \neg \varphi^\beta. \\ (\forall v_i \ \varphi)^\beta &:= \forall v_{2i} \ \varphi^\beta. \\ (\forall X \ \varphi)^\beta &:= \forall v_{X} (\exists E \ \mathit{BoolEq}(v_{X}, S^n 0, E) \rightarrow \varphi^\beta). \end{align*} $$

Lemma 7.46. The translation $\beta : \mathcal {L} \to L_{\mathrm{PA}}$ is an interpretation of $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ in PA.

Proof. It is easy to check that the $\beta $ -translation of any non-comprehension axiom is a theorem of first-order logic, and hence is provable in PA.Footnote 12 It remains to show that PA proves the $\beta $ -translation of each comprehension axiom, and also that PA proves $(\neg \mathit {Fin}(V))^\beta $ .

The idea is to formalize the proofs of Lemmas 6.38, 6.39, and 6.40 in PA. The main obstacle is that we defined symmetric sets $A \subseteq \mathbb {N}^n$ in terms of arbitrary permutations of $\mathbb {N}$ , and it is not obvious how to formalize those in PA. But in fact we do not need arbitrary permutations. Say that a permutation $\pi :\mathbb {N} \to \mathbb {N}$ is essentially finite if $\pi (a) = a$ for all but finitely many $a \in \mathbb {N}$ . If we go through Section 6, replacing ‘permutation’ with ‘essentially finite permutation’ everywhere, we get exactly the same model, and all the proofs still work.

We formalize Lemma 6.40 as follows. Say that an $L_{\mathrm{PA}}$ -formula $\varphi (\bar x)$ is symmetric with support E just in case, for every essentially finite permutation $\pi $ ,

$$ \begin{align*}(\forall e \in E)(\pi(e) = e) \implies \forall \bar x (\varphi(\bar x) \leftrightarrow \varphi(\pi(\bar x))). \end{align*} $$

Then we prove a theorem scheme in PA which says: ‘An $L_{\mathrm{PA}}$ -formula is symmetric iff there is a Boolean combination of equalities coextensive with it.’ More precisely, let $\varphi (v_{i_1}, \ldots , v_{i_n})$ be any $L_{\mathrm{PA}}$ -formula with exactly the free variables displayed. Then PA proves the following: $\varphi (v_{i_1}, \ldots , v_{i_n})$ is symmetric with support E iff there exists y such that

$$ \begin{align*}\mathit{BoolEq}(y, S^n0, E) \wedge \forall \bar x (\mathit{BoolSat}(y, \mathit{pad}_n(S^{i_1}0, \ldots, S^{i_n}0, \bar x ) ) \leftrightarrow \varphi(\bar x)). \end{align*} $$

$(\!{\implies}\!\kern-1pt)$ . We reason in PA. Suppose that $\varphi (v_{i_1}, \ldots , v_{i_n})$ is symmetric with support E. Let $\psi _1, \ldots , \psi _m$ be all possible disjunctions of formulas of the form

$$ \begin{align*}\bigwedge_{\substack{j,k \leq n \\ j \neq k}} (\neg) \ v_{i_j} = v_{i_k} \ \wedge \ \bigwedge_{\substack{ j \leq n \\ e \in E }} (\neg) \ v_{i_j} = S^e 0, \end{align*} $$

where parenthesized negations may or may not be present. Argue that $\bar x \sim _E \bar y \rightarrow (\varphi (\bar x) \leftrightarrow \varphi (\bar y))$ , and hence

$$ \begin{align*}\forall \bar x (\varphi(\bar x) \leftrightarrow \psi_1(\bar x)) \vee \cdots \vee \forall \bar x (\varphi(\bar x) \leftrightarrow \psi_m(\bar x)). \end{align*} $$

Then observe that , for each $1 \leq i \leq m$ .Footnote 13 Reasoning by cases, we are done.

For the $(\Longleftarrow )$ direction, copy the rest of the proof of Lemma 6.40.

Next, we formalize Lemma 6.38. We replace $\mathcal {M} \vDash \varphi $ (‘ $\mathcal {M}$ satisfies $\varphi $ ’) with $\varphi ^\beta $ throughout. For each $\mathcal {L}$ -formula $\varphi (\bar x, \bar y, \bar Y)$ not containing X free, we wish to show that PA proves

$$ \begin{align*}(\forall \bar y \forall \bar Y \exists X \forall \bar x [X \bar x \leftrightarrow \varphi(\bar x, \bar y, \bar Y)])^\beta. \end{align*} $$

This basically says: ‘There is a Boolean combination of equalities coextensive with $\varphi (\bar x, \bar y, \bar Y)^\beta $ .’ By the formalized version of Lemma 6.40, it suffices to prove in PA that $\varphi (\bar x, \bar y, \bar Y)^\beta $ is a symmetric $L_{\mathrm{PA}}$ -formula. To do this, use induction on $\mathcal {L}$ -formulas $\varphi (\bar x, \bar X)$ to prove the following theorem scheme in PA:

$$ \begin{align*}\pi \text{ is an essentially finite permutation} \rightarrow (\forall \bar x \forall \bar X [\varphi(\bar x, \bar X) \leftrightarrow \varphi(\pi(\bar x), \pi(\bar X))])^\beta. \end{align*} $$

(This corresponds to our earlier observation that permuting everything uniformly doesn’t change any truth-values in $\mathcal {M}$ relative to any variable-assignment.) Then copy the rest of the proof of Lemma 6.38.

In the same way, it is easy to formalize Lemma 6.39 in PA.

We are now ready to prove the first main theorem of the paper.

Theorem 7.47. w2FA is not conservative over $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ .

Proof. Let $\mathit {Con}_{\mathrm{PA}}$ denote a standard consistency statement for PA. We claim that $(\mathit {Con}_{\mathrm{PA}})^\alpha $ is a witness to non-conservativeness. That is,

(1) $$ \begin{align} \mathit{Ax}_{\mathcal{L}} + \neg \mathit{Fin}(V) & \not\vdash (\mathit{Con}_{\rm PA})^\alpha, \end{align} $$
(2) $$ \begin{align} \mathrm{w2FA} + \neg \mathit{Fin}(V) & \vdash (\mathit{Con}_{\rm PA})^\alpha. \end{align} $$

Proof of claim (1)

Write $\vartriangleright $ for ‘interprets’. From Lemmas 7.43 and 7.46, we have

$$ \begin{align*}\mathrm{PA} \ \vartriangleright^\beta \ \mathit{Ax}_{\mathcal{L}} +\neg \mathit{Fin}(V) \ \vartriangleright^\alpha \ \mathrm{PA}. \end{align*} $$

Suppose for a contradiction that $\mathit {Ax}_{\mathcal {L}} +\neg \mathit {Fin}(V) \vdash (\mathit {Con}_{\mathrm{PA}})^\alpha $ . Then $\mathrm{PA} \vdash ((\mathit {Con}_{\mathrm{PA}})^\alpha )^\beta $ , and hence $\mathrm{PA} \vartriangleright ^{\beta \circ \alpha } \mathrm{PA} + \mathit {Con}_{\mathrm{PA}}$ . However, by a strong version of Gödel’s second incompleteness theorem, $\mathrm{PA} \not \vartriangleright (\mathrm{PA} + \mathit {Con}_{\mathrm{PA}})$ .Footnote 14 Contradiction.

Proof of claim (2)

It is well known that $Z_2 \vdash \mathit {Con}_{\mathrm{PA}}$ . Hence, by Lemma 7.44,

$$ \begin{align*}\mathrm{w2FA} + \neg \mathit{Fin}(V) \vdash (\mathit{Con}_{\mathrm{PA}})^\alpha. \end{align*} $$

Corollary 7.48. w2FA is not conservative over $\mathit {Ax}_{\mathcal {L}}$ .

For the proof, see Lemma 1.3.

Corollary 7.49. 2FA is not conservative over $\mathit {Ax}_{\mathcal {L}}$ .

8 w2FA is conservative over stronger base theories

It is surprising that w2FA is not conservative over $\mathit {Ax}_{\mathcal {L}}$ . However, the next two theorems establish some limits to the non-conservativeness of w2FA.

Theorem 8.50. w2FA is conservative over third-order logic.

Proof. Let $\mathcal {L}^3$ be the third-order analog of the base language $\mathcal {L}$ . Let $\mathit {Ax}_{\mathcal {L}^3}$ denote the axioms of the deductive system for $\mathcal {L}^3$ , including full third-order comprehension in the base sort. Note that w2FA still only includes second-order comprehension for the numerical sort.

Take any $\mathcal {L}^3$ -formula $\varphi $ , and suppose that $\mathrm{w2FA} + \mathit {Ax}_{\mathcal {L}^3} \vdash \varphi $ . We show that $\mathit {Ax}_{\mathcal {L}^3} \vdash \varphi $ . Our strategy is to define an interpretation of w2FA in $\mathit {Ax}_{\mathcal {L}^3}$ that leaves $\mathcal {L}^3$ -sentences fixed (up to renaming of bound variables). Under such an interpretation, any derivation of $\varphi $ from $\mathrm{w2FA} + \mathit {Ax}_{\mathcal {L}^3}$ is transformed into a derivation of $\varphi $ from $\mathit {Ax}_{\mathcal {L}^3}$ . The idea is to interpret each cardinality $\#X$ as the concept X from whence it came, with numerical-sort equality being interpreted as equinumerosity.

First, we define a pre-translation from variables of $\mathcal {L}^3 \cup \mathcal {L}^+$ into variables of $\mathcal {L}^3$ . Translate each variable of sort $\tau $ as a variable of sort $\tau ^*$ , where

$$ \begin{align*} 0^* &:= 0, \\ n^* &:= \langle 0 \rangle, \\ \langle \tau_1, \ldots, \tau_k \rangle^* &:= \langle \tau_1^*, \ldots, \tau_k^* \rangle. \end{align*} $$

In other words, $\tau ^*$ is obtained from $\tau $ by replacing each occurrence of n with $\langle 0 \rangle $ .

Set up the pre-translation so that distinct variables of $\mathcal {L}^3 \cup \mathcal {L}^+$ are translated as distinct variables of $\mathcal {L}^3$ . For example, let the base-sort concept variables be enumerated by $X_0, X_1, X_2, \ldots $ , and the numerical-sort object variables by $\mathbf {{v}}_0, \mathbf {{v}}_1, \mathbf {{v}}_2, \ldots $ . Then let the pre-translations be

$$ \begin{align*} X_i^* &:= X_{2i} , \\ \mathbf{{v}}_i^* &:= X_{2i+1}. \end{align*} $$

Similarly for other sorts.
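As an informal illustration, which is not part of the formal development, the sort translation $\tau \mapsto \tau ^*$ and the renumbering of variables can be sketched in Python. The encoding of sorts as strings and tuples, and all function names, are assumptions made for this sketch only.

```python
# Sorts are encoded as "0" (base objects), "n" (numerical objects), or a
# tuple (t1, ..., tk) standing for the relational sort <t1, ..., tk>.
# This encoding is hypothetical and used only for this sketch.

def star(sort):
    """Compute tau^*: replace every occurrence of n by <0>."""
    if sort == "0":
        return "0"
    if sort == "n":
        return ("0",)
    return tuple(star(t) for t in sort)

def star_concept_index(i):
    """Index j such that the base-sort concept variable X_i becomes X_j."""
    return 2 * i

def star_number_index(i):
    """Index j such that the numerical-sort object variable v_i becomes X_j."""
    return 2 * i + 1

# <n, 0> becomes <<0>, 0>: a numerical argument place turns into a concept place.
example = star(("n", "0"))

# Distinct source variables never share a target variable X_j.
targets = [star_concept_index(i) for i in range(50)]
targets += [star_number_index(i) for i in range(50)]
no_clash = len(targets) == len(set(targets))
```

Sending concept variables to even indices and numerical variables to odd indices is what keeps the two families of translated variables apart.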

We now define the translation $*: \mathcal {L}^3 \cup \mathcal {L}^+ \to \mathcal {L}^3$ . In the first and last lines, let $\tau = \langle \tau _1, \ldots , \tau _k \rangle $ be any second- or third-order sort. In the last line, $\mathit {Cong}_\approx ((X^\tau )^*)$ is a metalinguistic abbreviation of the statement: ‘ $\approx $ is a congruence for the relevant argument-places of $(X^\tau )^*$ ’, where the sort $\tau $ determines which argument-places are relevant.

$$ \begin{align*} (X^\tau x_1^{\tau_1} \cdots x_k^{\tau_k})^* &:= (X^\tau)^* (x_1^{\tau_1})^* \cdots (x_k^{\tau_k})^*. \\ (x = y)^* &:= x^* = y^*. \\ (\mathbf{x} = \mathbf{y})^* &:= \mathbf{x}^* \approx \mathbf{y}^*. \\ (\mathbf{x} = \#X)^* &:= \mathbf{x}^* \approx X^*. \\ (\varphi \rightarrow \psi)^* &:= \varphi^* \rightarrow \psi^*. \\ (\neg \varphi)^* &:= \neg \varphi^*. \\ (\forall x \ \varphi)^* &:= \forall x^* \ \varphi^*. \\ (\forall \mathbf{x} \ \varphi)^* &:= \forall \mathbf{x}^* \ \varphi^*. \\ (\forall X^\tau \, \varphi)^* &:= \begin{cases} \forall (X^\tau)^* \, \varphi^*, & \text{if } \tau \in \mathit{Sorts}^3(\{0\}), \\ \forall (X^\tau)^* (\mathit{Cong}_\approx((X^\tau)^*) \rightarrow \varphi^*), & \text{else.} \end{cases} \end{align*} $$

It is easy to check that the $*$ -translation of each axiom of w2FA is provable from $\mathit {Ax}_{\mathcal {L}^3}$ . So, the translation works.

To prove the next theorem, we need another little fact about conservativeness.

Lemma 8.51. Let T be a theory in a formal language L, and let A be any L-sentence. Suppose that a sentence $\Delta $ is conservative over $T + A$ and is also conservative over $T + \neg A$ . Then $\Delta $ is conservative over T.

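Proof. Take any $\varphi \in L$ , and suppose that $T + \Delta \vdash \varphi $ . We show that $T \vdash \varphi $ . Indeed, by weakening, by the conservativeness of $\Delta $ over $T + A$ , and by the deduction theorem, respectively, we have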

$$ \begin{align*} T + A + \Delta &\vdash \varphi, \\ T + A &\vdash \varphi, \\ T &\vdash A \rightarrow \varphi. \end{align*} $$

By the same reasoning, we also have $T \vdash \neg A \rightarrow \varphi $ . Hence, $T \vdash \varphi $ .
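The final step of this proof is ordinary reasoning by cases on A. As a sanity check, its propositional core can be stated in Lean 4; the theorem name and hypothesis names below are hypothetical, and the snippet abstracts away from the theory T entirely.

```lean
-- A and φ play the roles of the sentence A and the formula φ in Lemma 8.51.
-- h₁ and h₂ correspond to T ⊢ A → φ and T ⊢ ¬A → φ (names are hypothetical).
theorem cases_on_A (A φ : Prop) (h₁ : A → φ) (h₂ : ¬A → φ) : φ :=
  (Classical.em A).elim h₁ h₂
```

The appeal to `Classical.em` reflects the fact that the deductive system is classical, so $A \vee \neg A$ is available in T.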

Theorem 8.52. w2FA is conservative over $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ .

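Proof. Let $|V|=1$ abbreviate the formula $\forall x \forall y \ x=y$ . By Lemma 8.51, we may divide into cases according to whether $|V| = 1$ or $|V| \neq 1$ . Since $|V| = 1$ implies $\mathit {Fin}(V)$ , the first case amounts to the theory $\mathit {Ax}_{\mathcal {L}} + |V| = 1$ . The rest of the proof is contained in Lemmas 8.53 and 8.54.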

Lemma 8.53. w2FA is conservative over $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V) + |V| \neq 1$ .

Proof. We follow the same strategy as in Theorem 8.50. That is, we show how to define an interpretation $\dagger $ of w2FA in $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V) + |V| \neq 1$ that leaves $\mathcal {L}$ -sentences fixed (up to renaming of bound variables). The idea is to interpret cardinalities $\#X$ as pairs of base-sort objects. Specifically, we will fix distinct base-sort objects a and b, represent $\# (V \upharpoonright x)$ as $(x, a)$ , and represent $\#\varnothing $ as $(a,b)$ .

First, we define a pre-translation from variables of $\mathcal {L}^+$ into variables of $\mathcal {L}$ . Translate each variable of sort $\tau $ as a distinct variable or pair of variables of sort(s) $\tau ^\dagger $ , where

$$ \begin{align*} 0^\dagger &:= 0, \\ n^\dagger &:= 0, 0, \\ \langle \tau_1, \ldots, \tau_k \rangle^\dagger &:= \langle \tau_1^\dagger, \ldots, \tau_k^\dagger \rangle. \end{align*} $$

For example, $\langle n, 0, n \rangle ^\dagger = \langle 0,0,0,0,0\rangle $ and $\langle \langle n \rangle , n \rangle ^\dagger = \langle \langle 0,0 \rangle , 0,0\rangle $ .

Set up the pre-translation so that no variable of $\mathcal {L}$ is ever used twice. For definiteness, let the base-sort object variables be enumerated by $v_0, v_1, v_2, \ldots $ , and the numerical-sort object variables by $\mathbf {{v}}_0, \mathbf {{v}}_1, \mathbf {{v}}_2, \ldots $ . Then let the pre-translations of the object variables be

$$ \begin{align*} v_i^\dagger &:= v_{3i} ,\\ \mathbf{{v}}_i^\dagger &:= v_{3i+1} v_{3i+2}. \end{align*} $$

Similarly for second-order variables.
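The following Python sketch, offered only as an informal illustration, computes $\tau ^\dagger $ and checks the two examples given above. The encoding of sorts as strings and tuples, the choice to return a list of sorts, and the function names are assumptions of the sketch.

```python
# Sorts are encoded as "0" (base objects), "n" (numerical objects), or a
# tuple (t1, ..., tk) standing for the relational sort <t1, ..., tk>.
# This encoding is hypothetical and used only for this sketch.

def dagger(sort):
    """Compute tau^dagger as a list of sorts (n yields the two sorts 0, 0)."""
    if sort == "0":
        return ["0"]
    if sort == "n":
        return ["0", "0"]
    # Splice the translated argument sorts into one flat argument list.
    args = [s for t in sort for s in dagger(t)]
    return [tuple(args)]

def dagger_base_index(i):
    """Base-sort object variable v_i becomes v_{3i}."""
    return [3 * i]

def dagger_number_indices(i):
    """Numerical-sort object variable v_i becomes the pair v_{3i+1}, v_{3i+2}."""
    return [3 * i + 1, 3 * i + 2]

# No variable of the target language is ever used twice.
used = []
for i in range(50):
    used += dagger_base_index(i) + dagger_number_indices(i)
no_clash = len(used) == len(set(used))
```

For instance, `dagger(("n", "0", "n"))` yields a single five-place sort, matching $\langle n, 0, n \rangle ^\dagger = \langle 0,0,0,0,0\rangle $ .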

Now we define the interpretation $\dagger : \mathcal {L}^+ \to \mathcal {L}$ . Fix a well-ordering $\leq $ of V, and fix distinct base-sort objects $a \neq b$ . In the first and last lines, let $\tau = \langle \tau _1, \ldots , \tau _k \rangle $ be any second-order sort.

$$ \begin{align*} (X^\tau x_1^{\tau_1} \cdots x_k^{\tau_k})^\dagger &:= (X^\tau)^\dagger (x_1^{\tau_1})^\dagger \cdots (x_k^{\tau_k})^\dagger. \\ (v_i = v_j)^\dagger &:= v_{3i} = v_{3j}. \\ (\mathbf{{v}}_i = \mathbf{{v}}_j)^\dagger &:= v_{3i+1} = v_{3j+1} \wedge v_{3i+2} = v_{3j+2}. \\ (\mathbf{{v}}_i = \#X)^\dagger &:= (X \approx (V \upharpoonright v_{3i+1}) \wedge v_{3i+2} = a) \vee (X = \varnothing \wedge v_{3i+1} = a \wedge v_{3i+2} = b). \\ (\varphi \rightarrow \psi)^\dagger &:= \varphi^\dagger \rightarrow \psi^\dagger. \\ (\neg \varphi)^\dagger &:= \neg \varphi^\dagger. \\ (\forall v_i \ \varphi)^\dagger &:= \forall v_{3i} \ \varphi^\dagger. \\ (\forall \mathbf{{v}}_i \ \varphi)^\dagger &:= \forall v_{3i+1}\forall v_{3i+2} \ \varphi^\dagger. \\ (\forall X^\tau \, \varphi)^\dagger &:= \forall (X^\tau)^\dagger \, \varphi^\dagger. \end{align*} $$

In order to justify the interpretation of $\#$ , we must check that for each base concept X, there is a unique initial segment of $(V, \leq )$ that is equinumerous with X. For the existence claim, recall that $\mathit {Ax}_{\mathcal {L}}$ proves that any two well-orderings are comparable (Lemma 5.29). In particular, $(X,\leq )$ is order-isomorphic with a segment of $(V, \leq )$ , and hence X is equinumerous with that segment. For the uniqueness claim, use the pigeonhole principle (Remark 5.31).

Now it is easy to check that the $\dagger $ -translation of each axiom of w2FA is provable from $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V) + |V| \neq 1$ . So, the interpretation works.
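To make the pair coding concrete, here is a small Python sketch on a four-element base domain. It is an illustration under stated assumptions rather than part of the proof: the domain size, the choice of a and b, and the reading of $V \upharpoonright x$ as the initial segment $\{y : y \leq x\}$ are all assumptions of the sketch, and equinumerosity of finite sets is checked by comparing sizes.

```python
from itertools import combinations

N = 4                    # size of the finite base domain (illustrative choice)
V = list(range(N))       # well-ordered by the usual <= on integers
a, b = V[0], V[1]        # two distinct base objects; this needs |V| != 1

def segment(x):
    """V restricted to x, read here as the initial segment {y : y <= x}."""
    return {y for y in V if y <= x}

def card_code(X):
    """Pair of base objects standing in for the cardinality #X."""
    if not X:
        return (a, b)
    # There is exactly one x whose initial segment is equinumerous with X.
    (x,) = [x for x in V if len(segment(x)) == len(X)]
    return (x, a)

subsets = [set(c) for k in range(N + 1) for c in combinations(V, k)]

# Hume's Principle under the coding: equal codes iff equinumerous concepts.
hume = all(
    (card_code(X) == card_code(Y)) == (len(X) == len(Y))
    for X in subsets
    for Y in subsets
)
```

The second component of the pair separates the empty concept, coded as $(a,b)$ , from every nonempty concept, coded as $(x,a)$ ; this is where $a \neq b$ is used.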

Lemma 8.54. w2FA is conservative over $\mathit {Ax}_{\mathcal {L}} + |V| = 1$ .

Proof. Observe that $\mathit {Ax}_{\mathcal {L}} + |V| = 1$ is a categorical theory, and hence it is a complete theory. So, the only way that w2FA could be non-conservative over $\mathit {Ax}_{\mathcal {L}} + |V| = 1$ is if the combined theory $\mathrm{w2FA} + |V| = 1$ were inconsistent. But $\mathrm{w2FA} + |V| = 1$ is consistent: it has a model $\mathcal {M}$ with object domains $M_0 = \{a\}$ and $M_n = \{0,1\}$ and with $I(\#)$ being the function mapping each base-sort concept to its cardinality.

9 The non-conservativeness of 2FA

In the previous section, we established some limits to the non-conservativeness of w2FA. In this section, we will show that 2FA is more deeply non-conservative than w2FA. The main result is Theorem 9.67, which says that 2FA is non-conservative over $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ . Our proof of this result can be generalized to show that 2FA is non-conservative over pure axiomatic n-th order logic for any $n \geq 2$ , or even over simple type theory.

Roughly, the idea is to construct a Gödel sentence for $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ . By a variation on Gödel’s first incompleteness theorem, $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ does not prove its own Gödel sentence. On the other hand, $\mathrm{2FA} + \mathit {Fin}(V)$ does prove the Gödel sentence, because it is a powerful theory: it interprets second-order arithmetic in the new sort (and it is smart enough to relate that arithmetic to the Gödel sentence expressed in $\mathcal {L}$ ).

But $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ says that the universe is finite, so it cannot interpret Q. How, then, is it possible to pull off the Gödel argument? The trick is that $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ has arbitrarily large models. If $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ proved its own Gödel sentence, then any sufficiently large model would contain a witness to the paradoxical derivation, yielding a contradiction.

To implement this argument, it will be convenient to work with a definitional extension $T = \mathit {Ax}_{\mathcal {L} \cup L'} + \mathit {Fin}(V) + \Delta $ , which we now describe.

Definition 9.55. Let $\mathcal {L} \cup L' := \mathcal {L}_{\{0\}}[\{0,S,\leq , A,M\}]$ .

We identify variables of $L'$ with object variables of $\mathcal {L}$ . Thus,

  • $0$ is a base object constant,

  • S and $\leq $ are constants of sort $\langle 0, 0 \rangle $ ,

  • A and M are constants of sort $\langle 0, 0, 0 \rangle $ .

Let $\mathit {Ax}_{\mathcal {L} \cup L'}$ be the axioms of the deductive system for $\mathcal {L} \cup L'$ .

Definition 9.56. Let $\Delta $ be the conjunction of the following $(\mathcal {L}\cup L')$ -formulas:

  1. $(V,\leq )$ is a double well-ordering with least element $0$ ,

  2. $Sxy$ iff y is the upper neighbor of x with respect to $\leq $ ,

  3. Definitions of A and M:

    $$ \begin{align*} & Ax0z\leftrightarrow z= x, \\ & Syy' \wedge Szz' \rightarrow (Axyz \leftrightarrow Axy'z'), \\ & Mx0z \leftrightarrow z=0, \\ & Syy' \wedge Azxz' \rightarrow (Mxyz \leftrightarrow Mxy'z'). \end{align*} $$
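As an informal check, the four clauses for A and M can be tested in Python on a finite initial segment of the natural numbers. The sketch assumes the intended reading in which A and M are the graphs of addition and multiplication restricted to the domain; the domain size and all names are illustrative.

```python
N = 6
V = range(N)  # finite domain {0, ..., N-1} ordered by the usual <=

# Intended interpretations (an assumption of this sketch):
S = {(x, x + 1) for x in V if x + 1 < N}                         # successor
A = {(x, y, z) for x in V for y in V for z in V if x + y == z}   # addition
M = {(x, y, z) for x in V for y in V for z in V if x * y == z}   # multiplication

# Ax0z <-> z = x
clause1 = all(((x, 0, z) in A) == (z == x) for x in V for z in V)

# Syy' & Szz' -> (Axyz <-> Axy'z')
clause2 = all(
    ((x, y, z) in A) == ((x, y1, z1) in A)
    for x in V for (y, y1) in S for (z, z1) in S
)

# Mx0z <-> z = 0
clause3 = all(((x, 0, z) in M) == (z == 0) for x in V for z in V)

# Syy' & Azxz' -> (Mxyz <-> Mxy'z')
clause4 = all(
    ((x, y, z) in M) == ((x, y1, z1) in M)
    for x in V for (y, y1) in S
    for z in V for z1 in V if (z, x, z1) in A
)
```

Because the domain is finite, sums and products that would exceed the largest element are simply absent from A and M, which is why these symbols are relational rather than functional.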

Definition 9.57. Let $T = \mathit {Ax}_{\mathcal {L} \cup L'} + \mathit {Fin}(V) + \Delta $ .

Lemma 9.58. $T \vdash BA'$ .

Proof. It is obvious that T proves the universal closures of the first three axioms of $BA'$ . Furthermore, since $(V,\leq )$ is a well-ordering, we have induction for all $(\mathcal {L} \cup L')$ -formulas. Using induction, it is easy to prove the universal closures of the remaining axioms of $BA'$ .

We will now describe the construction of the Gödel sentence of T.

Fix a Gödel numbering of $\mathcal {L}\cup L'$ . We describe $L_{\mathrm{PA}}$ -formulas $\mathit {Der}_T$ , $\mathit {diag}$ representing certain primitive recursive notions.

Let $\mathit {Der}_T(x,y)$ just in case: x is the Gödel number of a T-derivation of a formula with Gödel number y.

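Let $\mathit {diag}(x)=y$ be a function with the following property: if n is the Gödel number of an $(\mathcal {L} \cup L')$ -formula $\theta (y)$ with exactly the free variable y, then $\mathit {diag}(n)$ is the Gödel number of the sentence

$$ \begin{align*}\forall y(y \doteq {n} \rightarrow \theta(y)). \end{align*} $$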

(The notation $y \doteq {n}$ is from Definition 4.21.) Note that $\mathit {diag}$ is modeled on the Gödel diagonal function: in essence, it substitutes into a formula its own Gödel number.

It is well known that recursive relations are $\Delta _1$ -definable in PA [Reference Hájek and Pudlák13, p. 18, theorem 0.45]. So, we may choose $\mathit {Der}_T$ and $\mathit {diag}$ so that $\mathit {Der}_T(x, \mathit {diag}(y))$ is a $\Sigma _1$ formula. By Lemma 4.23, there is an equivalent $\Sigma _1'$ formula $\varphi (x,y)$ of $L'$ such that, for any parameters $a,b \in \mathbb {N}$ ,

$$ \begin{align*}\mathbb{N} \vDash \varphi(a, b) \iff \mathbb{N} \vDash \mathit{Der}_T(S^a 0, \mathit{diag}(S^b 0)). \end{align*} $$

Let p be the Gödel number of $\forall x \neg \varphi (x,y)$ . Then $\mathit {diag}(p)$ is the Gödel number of G, where G is the following sentence:

$$ \begin{align*}G := \ \forall y(y \doteq {p} \rightarrow \forall x \neg \varphi(x,y)). \end{align*} $$

We say that G is the Gödel sentence of the theory T.
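The diagonal construction can be imitated with a toy Gödel numbering of strings. The Python sketch below is purely illustrative: the numbering via UTF-8 bytes, the ASCII rendering of formulas, and the spelling `=.` for $\doteq $ are assumptions of the sketch and have nothing to do with the actual numbering fixed above.

```python
def code(formula: str) -> int:
    """Toy Goedel numbering: read the UTF-8 bytes of the string as an integer."""
    return int.from_bytes(formula.encode("utf-8"), "big")

def decode(n: int) -> str:
    """Inverse of code on the numbers that code strings."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode("utf-8")

def diag(n: int) -> int:
    """If n codes theta(y), return the code of 'forall y (y =. n -> theta(y))'."""
    theta = decode(n)
    return code(f"forall y (y =. {n} -> {theta})")

# p codes the formula 'forall x not phi(x,y)', whose only free variable is y.
p = code("forall x not phi(x,y)")

# G mentions p, the code of its own matrix, exactly as in the definition of G.
G = decode(diag(p))
```

In this toy setting, `code(G) == diag(p)` holds by construction, mirroring the way G refers to itself through $\mathit {diag}$ .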

Lemma 9.59. The theory $T = \mathit {Ax}_{\mathcal {L} \cup L'} + \mathit {Fin}(V) + \Delta $ does not prove its own Gödel sentence G.

Proof. Suppose for the sake of contradiction that $T \vdash G$ . Let d be the Gödel number of a derivation of G. Then we have

$$ \begin{align*} & \mathbb{N} \vDash \mathit{Der}_T(S^d 0, \mathit{diag}(S^p 0)), \\ & \mathbb{N} \vDash \varphi(d,p). \end{align*} $$

Write $\varphi (x,y)$ as