1 Introduction
Frege [Reference Frege10–Reference Frege12] sought to derive the theorems of arithmetic from nothing but basic logical laws and definitions. Such a derivation, called a logicist derivation of arithmetic, would provide the ultimate foundation for our arithmetical knowledge. It would justify the theorems of arithmetic once and for all by deriving them from principles that needed no justification: principles that were either self-evident (‘basic logical laws’) or true simply by stipulation (‘definitions’).
By his own lights, Frege did not manage to give a logicist derivation of arithmetic. But he did show how to derive a very powerful system of arithmetic from a single, natural principle, known as Hume’s Principle (HP).Footnote ^{1} Informally, HP says, ‘The number of Fs is equal to the number of Gs iff there is a one–one correspondence between the Fs and the Gs.’ In second-order logic, HP is expressible as the universal closure of
$$ \begin{align*} \#F = \#G \leftrightarrow \exists R\, (F \approx _R G), \end{align*} $$
where $\#$ is an operator that combines with monadic second-order variables $F,G$ to form terms of object type, and $ F \approx _R G$ abbreviates the statement that R is a one–one correspondence between the Fs and the Gs.Footnote ^{2} Then we have the following beautiful result:
Theorem 1.1 (Frege’s Theorem)
The theorems of second-order arithmetic are derivable in second-order logic from HP together with eliminative definitions of natural number, zero, and successor.Footnote ^{3}
Neo-Fregean logicists, preeminently Hale and Wright [Reference Hale and Wright14], argue that Frege’s Theorem already yields a logicist derivation of arithmetic. They claim that HP may be taken as an implicit definition of the operator $\#$ (‘the number of’) in purely logical terms.Footnote ^{4} Hale and Wright’s notion of implicit definition is deeply controversial. For our purposes, the main point is that Hale and Wright conceive of implicit definitions as true simply by stipulation [Reference Hale and Wright14, p. 117]. Such definitions need no justification. They are true by fiat. So, if Hale and Wright are correct, Frege’s Theorem does indeed yield a logicist derivation of arithmetic.
Not just anything can be stipulated to be true. We cannot establish any new ‘substantive’ truths by fiat. No one could have established by fiat that the Morning Star is the Evening Star. Accordingly, it is natural to think that any legitimate stipulative definition must meet the following requirement, known as conservativeness:
Definition 1.2. Let T be a theory in a formal language L. Let $\Delta $ be a definition of one new sign, and let $L^+$ be the language obtained by adding that new sign to L. Assume that deductive systems for L and $L^+$ have been specified. Then $\Delta $ is conservative over T iff any L-formula that is derivable $_{L^+}$ from $T + \Delta $ is already derivable $_{L}$ from T.
Intuitively, a definition is conservative over our theory T just in case adding it to our theory does not yield any new theorems expressible entirely in old vocabulary. The definition does not settle any open questions that we already knew how to ask.
But HP is not conservative. More precisely, HP is not conservative over pure axiomatic second-order logic, which presumably ought to be the starting theory for aspiring logicists.Footnote ^{5} For HP proves a sentence DI in the language of pure second-order logic which says that the universe is Dedekind-infinite (‘there is a one–one mapping from the universe into itself that is not onto’). But DI is not a theorem of pure second-order logic. So, it seems that HP cannot be a legitimate stipulative definition. Call this the conservativeness problem for neo-Fregean logicism.
The conservativeness problem is robust. Definitions that are conservative over pure second-order logic seem to be mathematically very weak, and hence unable to provide a foundation for arithmetic. Furthermore, adding more basic logical laws won’t help unless those laws suffice to prove DI. But it seems like a tall order to prove the existence of infinitely many objects from basic logical laws alone.
Hale and Wright respond to the conservativeness problem by denying that stipulative definitions must be conservative in the sense of Definition 1.2. Roughly speaking, they hold that stipulative definitions need only satisfy a modified conservativeness requirement, known as Field-conservativeness.Footnote ^{6} We set out to explore a different approach. Is it possible to find a variant of HP that is conservative in the standard deductive sense, that is, the sense of Definition 1.2?
A promising direction is suggested by Heck’s work on the Julius Caesar problem [Reference Heck15, Reference Heck16]. Heck reconstrues Hume’s Principle as introducing a new sort of singular term into the language. Call the reconstrued principle two-sorted Hume’s Principle (2HP), and the theory that results from supplementing 2HP with logical axioms for the expanded language, two-sorted Frege Arithmetic (2FA). The theory 2FA interprets second-order arithmetic in the numerical sort. In particular, 2FA proves that the numerical universe is Dedekind-infinite. But there is no obvious witness to nonconservativeness, because the numerical sort is not part of the base language. Indeed, it has been claimed that 2FA is conservative over pure second-order logic [Reference Burgess3, p. 237, n. 7].
In this paper we prove that 2FA is not conservative over pure second-order logic. In fact, we prove something stronger. Our strategy is based on the following little fact:
Lemma 1.3. Let T be a theory in a formal language L, and let A be any L-sentence. Suppose that a sentence $\Delta $ is not conservative over $T + A$ . Then $\Delta $ is not conservative over T.
Proof. Let $\varphi $ be an L-sentence such that $T + A + \Delta \vdash \varphi $ , but $T + A \not \vdash \varphi $ . By the Deduction Theorem, we have $T + \Delta \vdash A \rightarrow \varphi $ , but $T \not \vdash A \rightarrow \varphi $ . Since $A \rightarrow \varphi $ is an L-sentence, it witnesses that $\Delta $ is not conservative over T.
In Section 7, we consider a theory w2FA that is much weaker than 2FA. We show that w2FA is nonconservative over pure second-order logic together with an axiom saying that the base universe is infinite.Footnote ^{7} In other words, even if we already know that there are infinitely many objects, w2FA tells us something new about them! Then from Lemma 1.3, it follows that w2FA, and hence 2FA, is nonconservative over pure second-order logic.
In Section 8, we show that for the weaker theory w2FA, the nonconservativeness vanishes if we strengthen the base theory in either of two natural ways. First, w2FA is conservative over third- or higher-order logic. Second, w2FA is conservative over second-order logic plus ‘the base universe is finite’.
In Section 9, we present a different proof that 2FA is not conservative over pure second-order logic. This proof shows that 2FA remains nonconservative over the stronger base theories discussed in the previous section. Specifically, we show that 2FA is not conservative over second-order logic plus ‘the base universe is finite’, and the proof of this fact generalizes to third- and higher-order logic.
In order to state and prove these results, we will need some preliminaries. In Section 2, we explain the logical setting for the paper: many-sorted axiomatic second-order logic. In Section 3, we explain how to construe Hume’s Principle in a many-sorted setting, and we define the theories w2FA and 2FA. In Section 4, we present some background material on first- and second-order arithmetic. In Section 5, we show how to formalize some facts about well-orderings and finiteness in second-order logic. In Section 6, we discuss the Fraenkel model, which is the minimal infinite model of pure second-order logic.
In Sections 7–9, we prove the main results. Lastly, in Section 10, we connect our work to the literature on Field-conservativeness and related notions. Our main result implies that in a one-sorted setting, HP is neither deductively Field-conservative nor deductively Caesar-neutral conservative over second- or higher-order logic. This answers some open problems raised by Shapiro and Weir [Reference Shapiro and Weir27, p. 298], Fine [Reference Fine9, p. 192, n. 1], and Studd [Reference Studd30, p. 597]. We conclude by mentioning some open problems of our own.
2 Many-sorted second-order logic
We work in axiomatic second-order logic with many sorts of singular terms and first-order variables. In this section we explain the logical framework in considerable generality.
In Section 2.1, we define ‘sort’. In Sections 2.2 and 2.3, we define second-order languages $\mathcal {L}_J[K]$ for any nonempty set of object sorts J and any set of constant symbols K. We present deductive systems and general semantics for these languages. In Section 2.4, we define the two many-sorted second-order languages that will be central to the rest of the paper, called the base language $\mathcal {L} := \mathcal {L}_{\{0\}}[\varnothing ]$ and the expanded language $\mathcal {L}^+ := \mathcal {L}_{\{0, n\}}[\{\#_0, \#_n \}]$ .
2.1 Sorts
Let J be any nonempty set of symbols. These symbols are called first-order sorts or object sorts.
Let $\mathit {Sorts}^2(J)$ be the set of all tuples $\langle j_1, \dots , j_n \rangle $ with $n\geq 1$ and $j_1, \dots , j_n \in J$ . These tuples are the second-order relation sorts formed from J.
Let $\mathit {Sorts}^3(J)$ be the set of all tuples $\langle \tau _1, \dots , \tau _n \rangle $ with $n\geq 1$ and $\tau _1, \dots , \tau _n \in J \cup \mathit {Sorts}^2(J)$ , and with at least one of $\tau _1, \dots , \tau _n$ belonging to $\mathit {Sorts}^2(J)$ . These tuples are the third-order relation sorts formed from J.
Let $\mathit {FnSorts}(J)$ be the set of all tuples $\langle \tau _1, \ldots , \tau _n; \tau _{n+1} \rangle $ with $n\geq 1$ and $\tau _1, \ldots , \tau _n, \tau _{n+1} \in J \cup \mathit {Sorts}^2(J)$ . These tuples are function sorts formed from J.
Let $\mathit {Sorts}(J) = J \cup \mathit {Sorts}^2(J) \cup \mathit {Sorts}^3(J) \cup \mathit {FnSorts}(J)$ .
Intuitively, $\langle \tau _1, \ldots , \tau _n \rangle $ is the sort of n-ary relations with arguments of sorts $\tau _1, \ldots , \tau _n$ , while $\langle \tau _1, \ldots , \tau _n; \tau _{n+1} \rangle $ is the sort of n-ary functions with arguments of sorts $\tau _1, \ldots , \tau _n$ and values of sort $\tau _{n+1}$ .
Example 2.4. Suppose $J = \{0, 1\}$ . Then $\langle 1, 1, 0 \rangle \in \mathit {Sorts}^2(J)$ , $\langle 0, 1, \langle 0, 1 \rangle \rangle \in \mathit {Sorts}^3(J)$ , and $\langle \langle 0 \rangle; 1 \rangle \in \mathit {FnSorts}(J)$ .
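As an illustrative aside, the sort definitions above can be checked mechanically. The following Python sketch is not part of the formal development. It assumes that relation sorts are encoded as Python tuples and function sorts as a hypothetical `FnSort` record; the names `is_sort2`, `is_sort3`, `is_fn_sort`, and `classify` are ours. Under that encoding it reproduces the three membership claims of Example 2.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FnSort:
    """Hypothetical encoding of a function sort <t1, ..., tn; t_{n+1}>."""
    args: tuple
    value: object


def is_sort2(s, J):
    """s is a nonempty tuple all of whose components are object sorts in J."""
    return isinstance(s, tuple) and len(s) >= 1 and all(j in J for j in s)


def _object_or_sort2(c, J):
    """c belongs to J or to Sorts^2(J)."""
    return (not isinstance(c, tuple) and c in J) or is_sort2(c, J)


def is_sort3(s, J):
    """Nonempty tuple over J and Sorts^2(J) with at least one Sorts^2 component."""
    return (
        isinstance(s, tuple)
        and len(s) >= 1
        and all(_object_or_sort2(c, J) for c in s)
        and any(is_sort2(c, J) for c in s)
    )


def is_fn_sort(s, J):
    """At least one argument; arguments and value all lie in J or Sorts^2(J)."""
    return (
        isinstance(s, FnSort)
        and len(s.args) >= 1
        and all(_object_or_sort2(c, J) for c in s.args)
        and _object_or_sort2(s.value, J)
    )


def classify(s, J):
    """Return which part of Sorts(J) contains s, or None if none does."""
    if not isinstance(s, (tuple, FnSort)) and s in J:
        return "object"
    if is_sort2(s, J):
        return "Sorts^2"
    if is_sort3(s, J):
        return "Sorts^3"
    if is_fn_sort(s, J):
        return "FnSorts"
    return None


J = {0, 1}
example_2_4 = [
    classify((1, 1, 0), J),            # <1, 1, 0>
    classify((0, 1, (0, 1)), J),       # <0, 1, <0, 1>>
    classify(FnSort(((0,),), 1), J),   # <<0>; 1>
]
```

The four parts of $\mathit {Sorts}(J)$ are pairwise disjoint under this encoding, so the order of the checks in `classify` does not affect the answer.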
In the languages $\mathcal {L}_J[K]$ , there will be no function variables and no third-order variables. We only allow variables of sorts $\tau \in J \cup \mathit {Sorts}^2(J)$ . However, there may be constant symbols of any sort $\tau \in \mathit {Sorts}(J)$ .
2.2 Languages without constant symbols
For any set of object sorts J, we define the second-order language $\mathcal {L}_J$ as follows:

(i) The alphabet of $\mathcal {L}_J$ contains variables $x^j, y^j, z^j, \ldots $ for each object sort $j \in J$ , and relation variables $X^{\tau }, Y^{\tau }, Z^{\tau }, \ldots $ for each second-order sort $\tau \in \mathit {Sorts}^2(J)$ . There are no nonlogical constant symbols. The logical constants are $\neg , \,{\rightarrow}, \forall , =$ .

(ii) The terms of sort $\tau $ are the variables of sort $\tau $ , for each $\tau \in J \cup \mathit {Sorts}^2(J)$ .

(iii) In atomic formulas, we require that the sorts match. More precisely, the atomic formulas are strings of the form $t_1^j = t_2^j$ and $T^{\langle j_1, \ldots , j_n \rangle } t_1^{j_1}, \dots , t_n^{j_n}$ , where each $t^j$ is a term of sort $j \in J$ , and $T^{\langle j_1, \ldots , j_n \rangle }$ is a term of sort $\langle j_1, \ldots , j_n \rangle $ .

(iv) If $\varphi , \psi $ are formulas and $x^j, X^\tau $ are variables, then $ \neg \varphi $ , $ \varphi \rightarrow \psi $ , $ \forall x^j \varphi $ , $ \forall X^\tau \varphi $ are also formulas.
The deductive system for $\mathcal {L}_J$ is essentially equivalent to Shapiro’s D2 minus the axiom schema of choice [Reference Shapiro26, pp. 66–67]. Compare [Reference Enderton8, pp. 112–113]. Its axioms are all closed universal generalizations of the formulas depicted in Table 1. For legibility, we suppress sorts. But note that x, y, and t must all be of the same sort, and X and T must be of the same sort. This requirement is induced by the formation rules of the language.
Let $\varphi , \psi $ be any formulas of $\mathcal {L}_J$ . Let $x,y, X$ be variables, and $t,T$ be terms. (Note that $x,y,t$ must all be of the same sort. Likewise, X and T must be of the same sort.) Let $\varphi (t)$ be the result of substituting t for all free occurrences of x in $\varphi $ . In (*), let $\alpha $ be any atomic formula of $\mathcal {L}_J$ , and let $\alpha '$ be any formula obtained from $\alpha $ by replacing zero or more occurrences of x with y. In Comprehension, we write $X \bar x$ to abbreviate $X ^{\langle j_1, \ldots , j_n \rangle } x_1^{j_1} \cdots x_n^{j_n}$ .
An $\mathcal {L}_J$ prestructure $\mathcal {M}$ is a collection of nonempty sets $\{M_\tau : \tau \in J \cup \mathit {Sorts}^2(J) \}$ such that $M_{\langle j_1, \ldots , j_n \rangle } \subseteq \mathcal {P}(M_{j_1} \times \cdots \times M_{j_n})$ for all $j_1, \ldots , j_n \in J$ . Satisfaction and truth in $\mathcal {M}$ are defined inductively, taking variables of sort $\tau $ to range over domain $M_\tau $ .
A general $\mathcal {L}_J$ structure is an $\mathcal {L}_J$ prestructure in which the second-order comprehension axioms are satisfied. Our deductive system is sound and complete with respect to general $\mathcal {L}_J$ structures.
A standard $\mathcal {L}_J$ structure $\mathcal {M}$ is a general $\mathcal {L}_J$ structure in which $M_{\langle j_1, \ldots , j_n \rangle }= \mathcal {P}(M_{j_1} \times \cdots \times M_{j_n})$ for all $j_1, \ldots , j_n \in J$ . So, a standard $\mathcal {L}_J$ structure is fully specified by its object domains $\{M_j : j \in J\}$ . Our deductive system is sound but not complete with respect to standard structures.
2.3 Languages with constant symbols
We will now sketch how to add constant symbols to the languages $\mathcal {L}_J$ .
For each $\tau \in \mathit {Sorts}(J)$ , let $K_\tau $ be a set of new symbols, called constant symbols. Each constant symbol is assigned to a particular sort $\tau $ , and is classified as an object, relation, or function constant accordingly. Assume that the $K_\tau $ ’s are pairwise disjoint, or use superscripts to keep track of sorts. Let $K = \bigcup _{\tau \in \mathit {Sorts}(J)} K_\tau $ .
Define the language $\mathcal {L}_J[K]$ as follows:

(i) The alphabet of $\mathcal {L}_J[K]$ is the alphabet of $\mathcal {L}_J$ expanded by K.

(ii) If $\tau \in J \cup \mathit {Sorts}^2(J)$ , the atomic terms of sort $\tau $ are the variables $x^\tau $ and the constants in $K_\tau $ .
If $\tau \in \mathit {Sorts}^3(J)$ , the atomic terms of sort $\tau $ are the constants in $K_\tau $ .
If $\tau = \langle \tau _1, \ldots , \tau _n; \tau _{n+1} \rangle \in \mathit {FnSorts}(J)$ , and $f^\tau \in K_\tau $ , and $t_1^{\tau _1}, \ldots , t_n^{\tau _n}$ are terms of the indicated sorts, then $f^\tau t_1^{\tau _1} \cdots t_n^{\tau _n}$ is a term of sort $\tau _{n+1}$ .

(iii) The atomic formulas are defined as in $\mathcal {L}_J$ , except that we also allow atomic formulas of the form $T^{\tau } t_1^{\tau _1} \cdots t_n^{\tau _n}$ with $\tau = \langle \tau _1, \ldots , \tau _n \rangle \in \mathit {Sorts}^3(J)$ .

(iv) The inductive clauses generating the set of all formulas are unchanged.
The deductive system for $\mathcal {L}_J[K]$ is obtained from the deductive system for $\mathcal {L}_J$ by allowing $\varphi , \psi $ to range over $\mathcal {L}_J[K]$ formulas, $\alpha $ to range over atomic $\mathcal {L}_J[K]$ formulas, and adding axioms of Extensionality analogous to the axioms of Identity.Footnote ^{8}
An $\mathcal {L}_J[K]$ prestructure $\mathcal {M} = (\mathcal {S}, I)$ consists of an $\mathcal {L}_J$ prestructure $\mathcal {S}$ together with an interpretation I of the constant symbols that meets the following three conditions:

(i) If $c^j$ is an object constant of sort $j \in J$ , then $I(c^j) \in M_j$ .

(ii) If $R^\tau $ is a relation constant of sort $\tau = \langle \tau _1, \ldots , \tau _n \rangle \in \mathit {Sorts}^2(J) \cup \mathit {Sorts}^3(J)$ , then $I(R^\tau ) \in \mathcal {P}(M_{\tau _1} \times \cdots \times M_{\tau _n})$ .

(iii) If $f^\tau $ is a function constant of sort $\tau = \langle \tau _1, \ldots , \tau _n; \tau _{n+1} \rangle \in \mathit {FnSorts}(J)$ , then $I(f^\tau )$ is a function from $M_{\tau _1} \times \cdots \times M_{\tau _{n}}$ into $M_{\tau _{n+1}}$ .
General and standard $\mathcal {L}_J[K]$ structures are defined analogously to $\mathcal {L}_J$ structures.
2.4 The languages $\mathcal {L}$ and $\mathcal {L}^+$
We now define the two languages that will be at the center of the rest of the paper.
Definition 2.5. The base language is $\mathcal {L} := \mathcal {L}_{\{0\}}$ .
Definition 2.6. The expanded language is $\mathcal {L}^+ := \mathcal {L}_{\{0, n\}}[\{\#_0, \#_n\}]$ , where $\#_0$ and $\#_n$ are function constants of sorts $\langle \langle 0 \rangle; n \rangle $ and $\langle \langle n \rangle; n \rangle $ , respectively.
The logical axioms for $\mathcal {L}$ and $\mathcal {L}^+$ will be denoted by $\mathit {Ax}_{\mathcal {L}}$ and $\mathit {Ax}_{\mathcal {L}^+}$ , respectively.
Some notational conventions:

(i) We generally drop the superscripts $0$ , $\langle 0 \rangle $ , $\langle 0, 0 \rangle $ , $\ldots $ .

(ii) We generally write variables of sorts $\tau \in \{n\} \cup \mathit {Sorts}^2(\{n\})$ in boldface, and drop the superscripts n, $\langle n \rangle $ , $\langle n, n \rangle $ , $\cdots $ .

(iii) When we write second-order relation superscripts, we drop the angle brackets and commas. For example, we write $X^{n0}$ instead of $X^{\langle n,0 \rangle }$ .

(iv) We drop the subscripts from $\#_0$ and $\#_n$ , writing $\#$ for both.

(v) Following Frege, we refer to monadic relations as concepts.
3 Heck’s theory 2FA
Think of the base language $\mathcal {L}$ as our starting language, and $\mathit {Ax}_{\mathcal {L}}$ as our starting theory. Heck [Reference Heck15], [Reference Heck16, pp. 150–151] reconstrues Hume’s Principle as introducing a new, numerical sort of object (sort n), together with a host of new second-order relation sorts. The operator $\#$ (‘the number of’) may be applied to a concept variable of either sort, yielding a singular term of the numerical sort.
Definition 3.7. Weak two-sorted Hume’s Principle (w2HP) is the universal closure of:
$$ \begin{align*} \#_0 F^0 = \#_0 G^0 \leftrightarrow \exists R^{00}\, (F^0 \approx _{R^{00}} G^0). \end{align*} $$
Here, $F^0 \approx _{R^{00}} G^0$ abbreviates the statement that $R^{00}$ is a one–one correspondence between $F^0$ and $G^0$ .
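Intuitively, w2HP gives the criterion of identity for numbers belonging to base-sort concepts. It tells us how to count base-sort objects. But w2HP does not tell us how to count numbers. Since we do in fact count numbers, we are motivated to consider a stronger principle.
Definition 3.8. Two-sorted Hume’s Principle (2HP) is the conjunction of the universal closures of the following three $\mathcal {L}^+$ formulas:
$$ \begin{align*} & \#_0 F^0 = \#_0 G^0 \leftrightarrow \exists R^{00}\, (F^0 \approx _{R^{00}} G^0), \\ & \#_n F^n = \#_n G^n \leftrightarrow \exists R^{nn}\, (F^n \approx _{R^{nn}} G^n), \\ & \#_0 F^0 = \#_n G^n \leftrightarrow \exists R^{0n}\, (F^0 \approx _{R^{0n}} G^n). \end{align*} $$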
The first line is w2HP. The second line gives the criterion of identity for numbers belonging to numerical concepts. The third line gives the mixed criterion of identity, which tells us (e.g.) whether the number of Julio-Claudian emperors equals the number of prime numbers less than 12.
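The mixed criterion can be illustrated on this very example. The Python sketch below is ours and makes two assumptions that the reader should keep in mind: relations are represented as finite sets of pairs, and ‘one–one correspondence’ is spelled out in one common way (every F is related to exactly one G, and every G has exactly one F related to it). The function name `is_one_one_correspondence` is hypothetical.

```python
def is_one_one_correspondence(R, F, G):
    """Check that R, a set of pairs, is a one-one correspondence between F and G.

    Assumed reading: each x in F is R-related to exactly one y in G,
    and each y in G has exactly one x in F that is R-related to it.
    """
    R, F, G = set(R), set(F), set(G)
    each_f_has_unique_g = all(sum((x, y) in R for y in G) == 1 for x in F)
    each_g_has_unique_f = all(sum((x, y) in R for x in F) == 1 for y in G)
    return each_f_has_unique_g and each_g_has_unique_f


# Base-sort concept: the Julio-Claudian emperors.
emperors = ["Augustus", "Tiberius", "Caligula", "Claudius", "Nero"]

# Numerical concept: the primes less than 12, found by trial division.
primes_below_12 = [p for p in range(2, 12) if all(p % d for d in range(2, p))]

# A mixed relation pairing each emperor with one prime.
R = set(zip(emperors, primes_below_12))

same_number = is_one_one_correspondence(R, emperors, primes_below_12)
```

Here `same_number` comes out true, so on this reading the two numbers are identical. Dropping one prime from the right-hand side leaves an emperor unpaired, and the check fails for that relation.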
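Using our superscript-dropping conventions, we may write 2HP as follows:
$$ \begin{align*} & \#F = \#G \leftrightarrow \exists R\, (F \approx _R G), \\ & \#\mathbf {F} = \#\mathbf {G} \leftrightarrow \exists \mathbf {R}\, (\mathbf {F} \approx _{\mathbf {R}} \mathbf {G}), \\ & \#F = \#\mathbf {G} \leftrightarrow \exists R^{0n}\, (F \approx _{R^{0n}} \mathbf {G}). \end{align*} $$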
Definition 3.9. Weak two-sorted Frege Arithmetic (w2FA) is the theory whose logical axioms are $\mathit {Ax}_{\mathcal {L}^+}$ and whose sole nonlogical axiom is $\mathrm{w2HP}$ .Footnote ^{9} In other words,
$$ \begin{align*} \mathrm{w2FA} := \mathit {Ax}_{\mathcal {L}^+} + \mathrm{w2HP}. \end{align*} $$
Definition 3.10. Two-sorted Frege Arithmetic (2FA) is the theory whose logical axioms are $\mathit {Ax}_{\mathcal {L}^+}$ and whose sole nonlogical axiom is $\mathrm{2HP}$ . In other words,
$$ \begin{align*} \mathrm{2FA} := \mathit {Ax}_{\mathcal {L}^+} + \mathrm{2HP}. \end{align*} $$
Notice that the logical axioms of 2FA include full second-order comprehension for the expanded language. So, by Frege’s Theorem, 2FA interprets second-order arithmetic in the numerical sort. It follows that 2FA proves a sentence which says that the numerical universe is Dedekind-infinite. But this is not a witness to nonconservativeness, because the numerical sort is not part of the base language. Prima facie, it seems quite plausible that 2FA should be a conservative extension of $\mathit {Ax}_{\mathcal {L}}$ .
4 Arithmetic
We will study 2FA by comparing it with other, better-known systems of arithmetic. In Section 4.1, we describe the usual systems of first- and second-order arithmetic. In Section 4.2, we describe systems of arithmetic with no function symbols.
4.1 First- and second-order arithmetic
We begin with first-order arithmetic. For reference, see [Reference Hájek and Pudlák13, pp. 12–13, 28–29].
Definition 4.11. The language of Peano arithmetic, $L_{\mathrm{PA}}$ , is a classical first-order language with identity whose nonlogical vocabulary is $(0, S, \leq , +, \cdot )$ . Here, 0 is a constant symbol, S is a unary function symbol, $\leq $ is a binary relation symbol, and $+,\cdot $ are binary function symbols.
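Definition 4.12. Robinson arithmetic, Q, is the theory in $L_{\mathrm{PA}}$ with the following eight axioms:
$$ \begin{align*} & Sx \neq 0, \\ & Sx = Sy \rightarrow x = y, \\ & x \neq 0 \rightarrow \exists y\, (x = Sy), \\ & x + 0 = x, \\ & x + Sy = S(x + y), \\ & x \cdot 0 = 0, \\ & x \cdot Sy = (x \cdot y) + x, \\ & x \leq y \leftrightarrow \exists z\, (z + x = y). \end{align*} $$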
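Definition 4.13. Peano arithmetic, PA, is the result of adding to Q the following axiom schema of induction:
$$ \begin{align*} \varphi (0) \wedge \forall x\, (\varphi (x) \rightarrow \varphi (Sx)) \rightarrow \forall x\, \varphi (x), \end{align*} $$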
where $\varphi (x)$ is any formula of $L_{\mathrm{PA}}$ .
We write $(\forall x \leq t)(\cdots )$ to abbreviate $\forall x (x \leq t \rightarrow \cdots )$ , and similarly we write $(\exists x \leq t)(\cdots )$ . The quantifiers occurring in these expressions are said to be bounded.
An $L_{\mathrm{PA}}$ formula is called bounded, or $\Sigma _0$ , if all quantifiers occurring in it are bounded.
An $L_{\mathrm{PA}}$ formula is called $\Sigma _n$ ( $n \geq 0$ ) if it consists of a string of n alternating unbounded quantifiers, the first of which is existential, followed by a bounded formula. That is, a $\Sigma _n$ formula has the form $\exists x \forall y \exists z \forall w \cdots \theta $ , where $\theta $ is bounded.
Definition 4.14. The theory $I\Sigma _n$ ( $n \geq 0$ ) is the result of adding to Q the axiom schema of induction above, restricted to $\Sigma _n$ formulas.
We now turn our attention to second-order arithmetic. For reference, see [Reference Simpson28, pp. 2–5].
Definition 4.15. The language of second-order arithmetic, $L_2$ , is a two-sorted language consisting of all the vocabulary of $L_{\mathrm{PA}}$ , together with denumerably many monadic second-order variables $X, Y, Z, \ldots $ and a second-order quantifier $\forall X$ . The atomic formulas of $L_2$ include all strings of the form $Xt$ , where t is a first-order term and X is a second-order variable.
The second-order variables of $L_2$ are usually called set variables, and the atomic formulas $Xt$ are sometimes written $t \in X$ . For our purposes, there is no difference between set variables and concept variables, and the predication relation $\in $ may be left implicit. Hence, $L_2$ may be regarded as an expansion of the monadic fragment of $\mathcal {L}$ .
Definition 4.16. Second-order arithmetic, $Z_2$ , is the theory in $L_2$ whose axioms are those of Q, together with the second-order induction axiom
$$ \begin{align*} \forall X\, (X0 \wedge \forall x\, (Xx \rightarrow X(Sx)) \rightarrow \forall x\, Xx) \end{align*} $$
and the second-order comprehension scheme
$$ \begin{align*} \exists X\, \forall x\, (Xx \leftrightarrow \varphi (x)) \end{align*} $$
for each formula $\varphi $ of $L_2$ not containing X free. As usual, $\varphi $ may contain parameters, i.e., free first- or second-order variables other than x.
4.2 First- and second-order arithmetic with no function symbols
In this section, we introduce an arithmetical language $L'$ in which successor, addition, and multiplication are rendered as relations (which may be only partially defined) instead of functions. This allows us to define $BA'$ , a weak system of arithmetic that does not assume the existence of infinitely many natural numbers. The main point of the section is to state Lemma 4.23 and prove Lemmas 4.25 and 4.28. We will use these lemmas in Section 9 only, so feel free to skip this section and return to it later.
For reference, see [Reference Hájek and Pudlák13, pp. 86–89, 233].
Definition 4.17. Let $L'$ be the classical first-order language with identity whose nonlogical vocabulary is $(0, S, \leq , A, M)$ . Here, 0 is a constant symbol, S and $\leq $ are binary relation symbols, and A and M are ternary relation symbols.
An $L'$ formula is called bounded′, or $\Sigma _0'$ , if it contains only bounded quantifiers.
Definition 4.18. $BA'$ is the theory in $L'$ with the following axioms:

1. $\leq $ is a discrete linear order with least element 0,

2. $Sxy$ iff y is the upper neighbor of x with respect to $\leq $ ,

3. Definitions of A and M:
$$ \begin{align*} & Ax0z\leftrightarrow z= x, \\ & Syy' \wedge Szz' \rightarrow (Axyz \leftrightarrow Axy'z'), \\ & Mx0z \leftrightarrow z=0, \\ & Syy' \wedge Azxz' \rightarrow (Mxyz \leftrightarrow Mxy'z'), \end{align*} $$ 
4. Commutativity and associativity of A and M, distributivity, monotonicity of addition, monotonicity of multiplication by a positive number, and $x \leq y \leftrightarrow (\exists u \leq y) Axuy$ ,

5. Induction scheme for $\Sigma _0'$ formulas:
$$ \begin{align*}\varphi(0) \wedge \forall x \forall y (\varphi(x) \wedge Sxy \rightarrow \varphi(y)) \rightarrow \forall x \varphi(x). \end{align*} $$
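To see concretely why rendering successor, addition, and multiplication as relations avoids any commitment to infinitely many numbers, one can test the defining clauses for A and M in a finite initial segment. The Python sketch below is only an illustration under our own assumptions: the domain $\{0, \ldots , N\}$ , the function names, and the brute-force search are not from the source, and only the four clauses of item 3 are tested, not the full axiom list.

```python
from itertools import product


def finite_model(N):
    """Interpret S, A, M as (partial) relations on the domain {0, ..., N}."""
    D = range(N + 1)
    S = {(x, y) for x in D for y in D if y == x + 1}
    A = {(x, y, z) for x, y, z in product(D, repeat=3) if x + y == z}
    M = {(x, y, z) for x, y, z in product(D, repeat=3) if x * y == z}
    return D, S, A, M


def satisfies_A_M_definitions(N):
    """Brute-force check of the four defining clauses for A and M."""
    D, S, A, M = finite_model(N)
    for x, z in product(D, repeat=2):
        # Ax0z <-> z = x
        if ((x, 0, z) in A) != (z == x):
            return False
        # Mx0z <-> z = 0
        if ((x, 0, z) in M) != (z == 0):
            return False
    for x, y, y1, z, z1 in product(D, repeat=5):
        # Syy' and Szz' -> (Axyz <-> Axy'z')
        if (y, y1) in S and (z, z1) in S:
            if ((x, y, z) in A) != ((x, y1, z1) in A):
                return False
        # Syy' and Azxz' -> (Mxyz <-> Mxy'z')
        if (y, y1) in S and (z, x, z1) in A:
            if ((x, y, z) in M) != ((x, y1, z1) in M):
                return False
    return True


D, S, A, M = finite_model(5)
clauses_hold = satisfies_A_M_definitions(5)
top_has_successor = any((5, y) in S for y in D)
```

In this finite interpretation the clauses hold although the top element has no successor and sums such as 3 + 3 have no value, which is exactly the partiality that function symbols would rule out.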
Definition 4.19. $I\Sigma _0'$ is the result of adding to $BA'$ axioms saying that $S,A,M$ define total functions, namely $\forall x \exists y Sxy$ , etc.
An $L'$ formula is called $\Sigma _n'$ ( $n \geq 0$ ) if it consists of a string of n alternating unbounded quantifiers, the first of which is existential, followed by a bounded $'$ formula.
Definition 4.20. The theory $I\Sigma _n'$ ( $n \geq 0$ ) is the result of adding to $I\Sigma _0'$ the axiom schema of induction above, extended to $\Sigma _n'$ formulas.
We now state some useful facts about $BA'$ and its relatives.
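Definition 4.21. Let $\mathfrak {D}$ be the conjunction of the following three $(L_{\mathrm{PA}} \cup L')$ formulas:
$$ \begin{align*} & Sx = y \leftrightarrow Sxy, \\ & x + y = z \leftrightarrow Axyz, \\ & x \cdot y = z \leftrightarrow Mxyz. \end{align*} $$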
For each $n \in \mathbb {N}$ , let $x \doteq n$ abbreviate the $L'$ formula
Lemmas 4.22 and 4.23 tell us that the theories $I\Sigma _n$ and $I\Sigma _n'$ are in a strong sense equivalent.
Lemma 4.22. Let $n \geq 0$ . Then $I\Sigma _n' + \mathfrak {D} \vdash I\Sigma _n$ , and conversely $I\Sigma _n + \mathfrak {D} \vdash I\Sigma _n'$ .
Lemma 4.23. Let $\varphi $ be a $\Sigma _n$ formula with $n \geq 1$ . Then there is a $\Sigma _n'$ formula $\varphi '$ with the same free variables as $\varphi $ such that $I\Sigma _n' + \mathfrak {D} \vdash \varphi \leftrightarrow \varphi '$ .
For proof, see [Reference Hájek and Pudlák13, pp. 88–89].Footnote ^{10}
Lemma 4.24. $I\Sigma _0'$ and $BA'$ prove the same bounded $'$ formulas.
For proof, see [Reference Hájek and Pudlák13, p. 233].
Lemma 4.25. Let $\varphi (x_1, \ldots , x_k)$ be a bounded $'$ formula, and let $a_1, \ldots , a_k \in \mathbb {N}$ be such that $\mathbb {N} \vDash \varphi (a_1, \ldots , a_k)$ . Then
Proof. Let $\psi $ be the $L_{\mathrm{PA}}$ formula obtained from $\varphi $ by replacing $Sxy$ , $Axyz$ , $Mxyz$ with $Sx = y$ , $x+y=z$ , $x \cdot y = z$ respectively. Observe that $\psi (S^{a_1}0, \ldots , S^{a_k}0)$ is a true bounded sentence of $L_{\mathrm{PA}}$ .
Now we argue as follows:
The first line holds because $I\Sigma _0$ proves all true bounded sentences.Footnote ^{11} The second line follows by Lemma 4.22. Regarding the third line, it is easy to check that for each $n \in \mathbb {N}$ ,
The fourth line follows by propositional logic, because $\varphi $ and $\psi $ differ only by applications of the equivalences in $\mathfrak {D}$ . The fifth line follows because $I\Sigma _0' + \mathfrak {D}$ is conservative over $I\Sigma _0'$ for $L'$ formulas. The sixth line follows by Lemma 4.24.
Lastly, we describe a system of second-order arithmetic without function symbols.
Definition 4.26. The language $L_2'$ is just like $L_2$ , but with the vocabulary of $L'$ replacing the vocabulary of $L_{\mathrm{PA}}$ .
Definition 4.27. Let $Z_2'$ be the theory in $L_2'$ whose axioms are those of $I\Sigma _0'$ , plus the second-order induction axiom
$$ \begin{align*} \forall X\, (X0 \wedge \forall x \forall y\, (Xx \wedge Sxy \rightarrow Xy) \rightarrow \forall x\, Xx) \end{align*} $$
and the second-order comprehension scheme for $L_2'$ .
Lemma 4.28. $Z_2$ and $Z_2'$ are mutually interpretable. Indeed, $Z_2' + \mathfrak {D} \vdash Z_2$ , and conversely $Z_2 + \mathfrak {D} \vdash Z_2'$ .
Proof. We argue that $Z_2' + \mathfrak {D} \vdash Z_2$ . The other direction is easy.
Observe that $(Z_2' + \mathfrak {D}) \vdash (I\Sigma _0' + \mathfrak {D}) \vdash I\Sigma _0 \vdash Q$ . Furthermore, the two ways of formulating the second-order induction axiom are equivalent in the presence of $Sx=y \leftrightarrow Sxy$ .
It remains to show that $Z_2' + \mathfrak {D}$ proves the second-order comprehension scheme for $L_2$ . Take any $L_2$ formula $\varphi $ . Let $\psi $ be the formula obtained from $\varphi $ by replacing each atomic predication $Xt$ with $\exists z(Xz \wedge z=t)$ , where z is a new variable. Then every nonatomic term in $\psi $ occurs in an equation $t_1 = t_2$ . These equations are $L_{\mathrm{PA}}$ formulas. By Lemma 4.23, $Z_2' + \mathfrak {D}$ proves each $L_{\mathrm{PA}}$ formula to be equivalent to an $L'$ formula. So, there is an $L_2'$ formula $\varphi '$ such that $Z_2' + \mathfrak {D} \vdash \varphi \leftrightarrow \varphi '$ . Now apply second-order comprehension to $\varphi '$ , and we are done.
5 Well-orderings and finiteness
In this section, we define ‘well-ordering’ in $\mathcal {L}$ , and we note that $\mathit {Ax}_{\mathcal {L}}$ proves that all well-orderings are comparable (Lemma 5.29). Then we define the notion of Stäckel-finiteness and prove the important lemma of induction on finite concepts (Lemma 5.32). We will use these lemmas throughout the paper.
For simplicity, we work in $\mathcal {L}$ . However, these notions can easily be extended to $\mathcal {L}^+$ .
Let $\varnothing $ denote the empty concept. Let V denote the universal concept.
Let $Y \subseteq X$ abbreviate $\forall x (Yx \rightarrow Xx)$ .
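Let ‘ $(X,R)$ is a linear order’ abbreviate the formula
$$ \begin{align*} \forall x \forall y \forall z\, [Xx \wedge Xy \wedge Xz \rightarrow (Rxy \wedge Ryx \rightarrow x = y) \wedge (Rxy \wedge Ryz \rightarrow Rxz) \wedge (Rxy \vee Ryx)]. \end{align*} $$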
In other words, $(X,R)$ is a linear order just in case R is an antisymmetric, transitive, total relation on X.
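Let ‘ $(X,R)$ is well-founded’ abbreviate
$$ \begin{align*} \forall Y\, [Y \subseteq X \wedge \exists x\, Yx \rightarrow \exists x\, (Yx \wedge \forall y\, (Yy \rightarrow Rxy))]. \end{align*} $$
Say that $(X,R)$ is a well-ordering if $(X,R)$ is a well-founded linear order.
We say that two well-orderings $(X, \leq _X)$ and $(Y, \leq _Y)$ are order-isomorphic, denoted $(X, \leq _X) \simeq _o (Y, \leq _Y)$ , just in case there is a bijection $f: X \to Y$ such that
$$ \begin{align*} \forall x \forall y\, [Xx \wedge Xy \rightarrow (x \leq _X y \leftrightarrow f(x) \leq _Y f(y))]. \end{align*} $$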
Strictly speaking, we should represent f as a relation, but we will go on using functional notation informally.
If $(X,R)$ is a well-ordering, let $X \upharpoonright a$ be the initial segment of $(X,R)$ up to a, defined by
$$ \begin{align*} (X \upharpoonright a)x \leftrightarrow Xx \wedge Rxa. \end{align*} $$
We also regard $\varnothing $ as an initial segment of $(X,R)$ . An initial segment of $(X,R)$ is proper if it is not equal to X.
Let $(X, \leq _X) <_o (Y, \leq _Y)$ abbreviate the statement that $(X, \leq _X)$ is order-isomorphic with a proper initial segment of $(Y, \leq _Y)$ .
We borrow the next lemma from [Reference Ebels-Duggan7, p. 611].
Lemma 5.29 (Comparability of well-orderings)
It is provable from $\mathit {Ax}_{\mathcal {L}}$ that any two well-orderings $(X,\leq _X)$ and $(Y,\leq _Y)$ are comparable, in the sense that exactly one of the following holds:
$$ \begin{align*} (X, \leq _X) <_o (Y, \leq _Y), \qquad (X, \leq _X) \simeq _o (Y, \leq _Y), \qquad (Y, \leq _Y) <_o (X, \leq _X). \end{align*} $$
Proof. Copy the usual set-theoretic proof [Reference Jech18, pp. 18–19].
We now define the notion of Stäckel-finiteness.
If R is a binary relation, let $R^{-1}$ be the converse of R, defined by $R^{-1}xy \leftrightarrow Ryx.$
Definition 5.30. Say that $(X,R)$ is a double well-ordering if $(X,R)$ and $(X,R^{-1})$ are both well-orderings.
Say that X is Stäckel-finite, abbreviated $\mathit {Fin}(X)$ , if X admits a double well-ordering. That is,
$$ \begin{align*} \mathit {Fin}(X) :\leftrightarrow \exists R\, ((X,R) \text{ is a double well-ordering}). \end{align*} $$
Remark 5.31. The double well-ordering criterion is proposed as a definition of finiteness in [Reference Stäckel29]. The criterion is also discussed in [Reference Zermelo34, Reference Zermelo and Castelnuovo35]. For historical remarks, see [Reference Parsons25].
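For small finite carriers, the double well-ordering criterion can be tested by brute force. The Python sketch below is an illustration only. It assumes that relations are finite sets of pairs, that a linear order is an antisymmetric, transitive, total relation on X, and that well-foundedness means every nonempty subconcept of X has a least element; the function names are ours.

```python
from itertools import combinations


def nonempty_subsets(X):
    """Yield every nonempty subset of the finite collection X."""
    items = list(X)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def is_linear_order(X, R):
    """R is antisymmetric, transitive, and total on X."""
    X = set(X)
    antisymmetric = all(
        x == y or not ((x, y) in R and (y, x) in R) for x in X for y in X
    )
    transitive = all(
        (x, z) in R or not ((x, y) in R and (y, z) in R)
        for x in X for y in X for z in X
    )
    total = all((x, y) in R or (y, x) in R for x in X for y in X)
    return antisymmetric and transitive and total


def is_well_founded(X, R):
    """Every nonempty subset Y of X has an R-least element."""
    return all(
        any(all((y, z) in R for z in Y) for y in Y) for Y in nonempty_subsets(X)
    )


def is_double_well_ordering(X, R):
    """Both (X, R) and (X, converse of R) are well-orderings."""
    converse = {(y, x) for (x, y) in R}
    return (
        is_linear_order(X, R)
        and is_well_founded(X, R)
        and is_well_founded(X, converse)
    )


X = {1, 2, 3}
usual_order = {(x, y) for x in X for y in X if x <= y}
identity_only = {(x, x) for x in X}

usual_is_double = is_double_well_ordering(X, usual_order)
identity_is_double = is_double_well_ordering(X, identity_only)
```

The usual order on a three-element carrier passes, while the identity relation fails because it is not total. A brute-force check of this kind cannot exhibit the interesting failures, which arise only on infinite carriers: the usual order on the natural numbers is a well-ordering whose converse has no least element.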
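Stäckel-finiteness is strictly stronger than Dedekind-finiteness, in the sense that
$$ \begin{align*} \mathit {Ax}_{\mathcal {L}} \vdash \mathit {Fin}(X) \rightarrow \mathit {DFin}(X) \quad \text{but} \quad \mathit {Ax}_{\mathcal {L}} \not \vdash \mathit {DFin}(X) \rightarrow \mathit {Fin}(X), \end{align*} $$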
where of course $\mathit {DFin}(X)$ abbreviates that X is Dedekind-finite. Indeed, $\mathit {Fin}(X) \rightarrow \mathit {DFin}(X)$ is a version of the pigeonhole principle. It is provable from $\mathit {Ax}_{\mathcal {L}}$ by induction on finite concepts (Lemma 5.32). On the other hand, the Fraenkel model (defined in Section 6) is a model of $\mathit {DFin}(V) + \neg \mathit {Fin}(V)$ , witnessing that $\mathit {Ax}_{\mathcal {L}} \not \vdash \mathit {DFin}(X) \rightarrow \mathit {Fin}(X)$ .
Lastly, we show that $\mathit {Ax}_{\mathcal {L}}$ proves a principle of induction on Stäckel-finite concepts.
Let $X \cup \{a\}$ be the concept defined by
$$ \begin{align*} (X \cup \{a\})x \leftrightarrow (Xx \vee x = a). \end{align*} $$
Lemma 5.32 (Induction on finite concepts)
Let $\varphi (X)$ be any formula of $\mathcal {L}$ . Then $\mathit {Ax}_{\mathcal {L}}$ proves the universal closure of
Proof. Assume the antecedent. Take any X such that $\mathit {Fin}(X)$ . Fix a double well-ordering $(X,R)$ , and let Y be defined by $Yx \leftrightarrow (Xx \wedge \varphi (X \upharpoonright x))$ . It suffices to show that $Y=X$ .
Suppose not. Since $(X,R)$ is a well-ordering, there is an R-least y such that $Xy \wedge \neg Yy$ . It is easy to see that y cannot be the R-least element of X. Since $(X,R^{-1})$ is a well-ordering, y has a unique $(X,R)$-predecessor, call it z. By the minimality of y, we have $Yz$ , and hence $\varphi (X \upharpoonright z)$ . Also, it is easy to see that $\mathit {Fin}(X \upharpoonright z)$ . It follows that $\varphi ( (X \upharpoonright z) \cup \{y\})$ , which is to say $\varphi (X \upharpoonright y)$ . But this contradicts our choice of y.
6 The Fraenkel model
In this section, we define the Fraenkel model and show that it is a model of $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ (Lemmas 6.38 and 6.39). Then we show that the relations occurring in the Fraenkel model are exactly the sets definable by Boolean combinations of equalities with object parameters (Lemma 6.40). We will make good use of these facts in Section 7.
We remark that Lemma 6.40 implies that the Fraenkel model is the minimal infinite model of $\mathit {Ax}_{\mathcal {L}}$ —i.e., it is a submodel of any infinite model of $\mathit {Ax}_{\mathcal {L}}$ .
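Definition 6.33. Let $A \subseteq \mathbb {N}^n$ and $E \subseteq \mathbb {N}$ . We say that E is a support of A if every permutation $\pi : \mathbb {N} \to \mathbb {N}$ that fixes E pointwise fixes A setwise:
$$ \begin{align*} \forall e \in E \ (\pi (e) = e) \rightarrow \forall x_1 \ldots \forall x_n \, \big[ ( x_1, \ldots , x_n ) \in A \leftrightarrow ( \pi (x_1), \ldots , \pi (x_n) ) \in A \big]. \end{align*} $$
Using the notation $\pi (A) = \{ ( \pi (x_1), \ldots , \pi (x_n) ) \in \mathbb {N}^n: ( x_1, \ldots , x_n ) \in A \}$ , we can restate this property as follows: for every permutation $\pi :\mathbb {N}\to \mathbb {N}$ ,
$$ \begin{align*} \forall e \in E \ (\pi (e) = e) \rightarrow \pi (A) = A. \end{align*} $$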
Definition 6.34. A set $A \subseteq \mathbb {N}^n$ is symmetric if it has a finite support $E \subseteq \mathbb {N}$ .
Definition 6.35. The Fraenkel model is the $\mathcal {L}$-prestructure $\mathcal {M}$ whose object universe is $\mathbb {N}$ , and whose n-ary relations are the symmetric subsets of $\mathbb {N}^n$ . That is, writing $M_n$ for $M_{\langle 0, \ldots , 0 \rangle } \ (n \text{ zeroes})$ ,
$$ \begin{align*} M_n = \{ A \subseteq \mathbb {N}^n : A \text{ is symmetric} \}. \end{align*} $$
It is well known that $\mathcal {M}$ is a model of $\mathit {Ax}_{\mathcal {L}}$ (i.e., it is a general $\mathcal {L}$-structure) [Reference Väänänen and Zalta32]. However, we are not aware of any English-language source that gives the proof. For the reader’s convenience, we present the proof from [Reference Asser1] in the next two lemmas.
Lemma 6.36. If $A \subseteq \mathbb {N}^n$ is symmetric, and $\sigma :\mathbb {N} \to \mathbb {N}$ is any permutation, then $\sigma (A) \subseteq \mathbb {N}^n$ is also symmetric.
Proof. Let E be a support for A. We show that $\sigma (E)$ , which is finite whenever E is, is a support for $\sigma (A)$ . Indeed, take any permutation $\pi :\mathbb {N} \to \mathbb {N}$ that fixes $\sigma (E)$ pointwise. Then the permutation $\sigma ^{-1} \pi \sigma : \mathbb {N} \to \mathbb {N}$ fixes E pointwise, since $\sigma ^{-1}(\pi (\sigma (e))) = \sigma ^{-1}(\sigma (e)) = e$ for each $e \in E$ . So, $(\sigma ^{-1} \pi \sigma )(A) = A$ , and hence $\pi (\sigma (A)) = \sigma (A)$ .
Corollary 6.37. Each relation domain $M_n$ of the Fraenkel model is closed under the action (on $\mathbb {N}^n$ ) of permutations of $\mathbb {N}$ .
Lemma 6.38. The Fraenkel model is a model of $\mathit {Ax}_{\mathcal {L}}$ .
Proof. Let $\mathcal {M}$ be the prestructure defined above. We show that $\mathcal {M}$ satisfies Comprehension. Take any formula $\varphi (\bar x, \bar b, \bar B)$ of $\mathcal {L}$ , with free variables $\bar x = (x_1, \ldots , x_n)$ and parameters $\bar b = (b_1, \ldots , b_j)$ and $\bar B = (B_1, \ldots , B_k)$ drawn from $\mathcal {M}$ . Say that $A = \{\bar a \in \mathbb {N}^n : \mathcal {M} \vDash \varphi (\bar a, \bar b, \bar B)\}$ . We show that $A \in M_n$ .
Since the relation parameters $\bar B$ are drawn from $\mathcal {M}$ , each set $B_i$ has a finite support $E_i$ ( $i = 1, \ldots , k$ ). Let $E = \{b_1, \ldots , b_j \} \cup E_1 \cup \cdots \cup E_k$ . Clearly, E is finite. We show that E is a support for A.
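Take any permutation $\pi : \mathbb {N} \to \mathbb {N}$ that fixes E pointwise, and take any $\bar a = (a_1, \ldots , a_n) \in \mathbb {N}^n$ . We check that $\bar a \in A \iff \pi (\bar a ) = ( \pi (a_1), \ldots , \pi (a_n) ) \in A$ . Indeed,
$$ \begin{align*} \bar a \in A & \iff \mathcal {M} \vDash \varphi (\bar a, \bar b, \bar B) \\ & \iff \mathcal {M} \vDash \varphi (\pi (\bar a), \pi (\bar b), \pi (\bar B)) \\ & \iff \mathcal {M} \vDash \varphi (\pi (\bar a), \bar b, \bar B) \\ & \iff \pi (\bar a) \in A. \end{align*} $$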
(Notation: $\pi (\bar b) = (\pi (b_1), \ldots , \pi (b_j))$ and $\pi (\bar B) = (\pi (B_1), \ldots , \pi (B_k))$ . By Lemma 6.36, each $\pi (B_i)$ is a parameter from $\mathcal {M}$ .) The second step works because permuting everything uniformly doesn’t change any truth-values relative to any variable-assignment. This is easily proved by induction on formulas. The third step works because $\pi $ fixes E pointwise, hence fixes all the parameters.
Lemma 6.39. The Fraenkel model is a model of $\neg \mathit {Fin}(V)$ .
Proof. In fact, we will prove something stronger: the Fraenkel model does not contain any linear ordering of the universe.
Consider any relation $R \subseteq \mathbb {N}^2$ with finite support $E \subseteq \mathbb {N}$ . Suppose for sake of contradiction that R is a linear ordering of the universe. Since R is total, we may choose distinct $a,b \in \mathbb {N} \setminus E$ such that $Rab$ . Let $\pi $ be any permutation fixing E such that $\pi (a)=b$ and $\pi (b)=a$ . Since E is a support of R, it follows that $Rba$ . But this contradicts the assumption that R is antisymmetric.
So, $\mathcal {M}$ contains no linear ordering of the universe. It follows that $\mathcal {M}$ contains no double wellordering of the universe, i.e., $\mathcal {M} \vDash \neg \mathit {Fin}(V)$ .
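The swapping argument in this proof can be replayed on a finite stand-in for the universe. The sketch below is an illustration under stated assumptions: `range(8)` replaces $\mathbb{N}$, the candidate support E is a small set, and the function name `swap_breaking_support` is hypothetical. It looks for distinct a, b outside E with Rab and reports whether the transposition of a and b moves R.

```python
def swap_breaking_support(R, E, universe):
    """Find distinct a, b outside E with (a, b) in R, and test whether the
    transposition swapping a and b (which fixes E pointwise) changes R.

    Returns ((a, b), changed) or None if no such pair exists.
    """
    outside = [x for x in universe if x not in E]
    for a in outside:
        for b in outside:
            if a != b and (a, b) in R:
                pi = {a: b, b: a}  # identity elsewhere
                image = {(pi.get(x, x), pi.get(y, y)) for (x, y) in R}
                return (a, b), image != R
    return None

universe = range(8)   # finite stand-in for the natural numbers (assumption)
R = {(x, y) for x in universe for y in universe if x <= y}  # a linear order
E = {0, 1, 2}         # candidate finite support

print(swap_breaking_support(R, E, universe))  # ((3, 4), True)
```

For any antisymmetric R the reported flag is `True`, because the image contains the pair (b, a), which R cannot contain alongside (a, b). This mirrors the contradiction derived in the proof.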
We close this section by giving a simple characterization of symmetric sets.
Lemma 6.40. Let $E \subseteq \mathbb {N}$ be a finite set. A set $A \subseteq \mathbb {N}^n$ is symmetric with support E iff A is definable by a Boolean combination of equalities with parameters from E.
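Proof. Define an equivalence relation $\sim _E$ on $\mathbb {N}^n$ , as follows:
$$ \begin{align*} \bar a \sim _E \bar b \ :\iff \ \forall i, j \leq n \ (a_i = a_j \leftrightarrow b_i = b_j) \ \wedge \ \forall i \leq n \ \forall e \in E \ (a_i = e \leftrightarrow b_i = e). \end{align*} $$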
In words: $\bar a \sim _E \bar b$ iff $\bar a$ and $\bar b$ are n-tuples with the same pattern of identity and distinctness which agree on members of E. It is easy to see that $\sim _E$ really is an equivalence relation.
( $\!{\implies}\kern1pt\!$ ). Suppose $A \subseteq \mathbb {N}^n$ is symmetric with support E. Observe that A is a union of equivalence classes of $\sim _E$ . Indeed, if $\bar a \sim _E \bar b$ , then there is a permutation $\pi :\mathbb {N} \to \mathbb {N}$ fixing E such that $\pi (\bar a) = \bar b$ .
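Now, each equivalence class of $\sim _E$ is definable by a Boolean combination of equalities with parameters from E, of the following form:
$$ \begin{align*} \bigwedge _{1 \leq i < j \leq n} (\neg ) \, x_i = x_j \ \wedge \bigwedge _{1 \leq i \leq n, \ e \in E} (\neg ) \, x_i = e. \end{align*} $$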
(The parenthesized negations may or may not be present in each conjunct.) Furthermore, $\sim _E$ has only finitely many equivalence classes, because there are only finitely many possible patterns of identity and distinctness among $x_1, \ldots , x_n$ and the members of E. Hence, A is definable by a disjunction of formulas like the one above.
( $\Longleftarrow $ ). Suppose A is definable by a Boolean combination of equalities with parameters from E. We show that A is symmetric with support E.
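Take any permutation $\pi :\mathbb {N}\to \mathbb {N}$ fixing E pointwise. That is, for all $x_i, x_j \in \mathbb {N}$ and $e \in E$ ,
$$ \begin{align*} x_i = x_j \leftrightarrow \pi (x_i) = \pi (x_j) \quad \text{and} \quad x_i = e \leftrightarrow \pi (x_i) = e. \end{align*} $$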
By induction on formulas, it is easy to see that $\mathbb {N} \vDash \varphi (\bar x, \bar e) \leftrightarrow \varphi (\pi (\bar x), \bar e)$ for any Boolean combination of equalities $\varphi (\bar x, \bar e)$ . Since A is defined by some such Boolean combination, it follows that $\pi (A) = A$ .
Since $\pi $ was arbitrary, we conclude that A is symmetric with support E.
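The $(\Longleftarrow)$ direction can be sanity-checked by brute force on a finite stand-in for $\mathbb{N}$. In the sketch below, which is only illustrative, the universe `range(6)`, the support `E = (0, 1)`, the sample formulas, and all function names are assumptions made for the example. A set defined by a Boolean combination of equalities with parameters from E should be fixed by every permutation that fixes E pointwise, whereas a set defined using a parameter outside E need not be.

```python
from itertools import permutations, product

U = range(6)   # finite stand-in for the natural numbers (assumption)
E = (0, 1)     # candidate support

def perms_fixing(E, U):
    """Yield every permutation of U (as a dict) that fixes E pointwise."""
    free = [u for u in U if u not in E]
    for image in permutations(free):
        pi = {e: e for e in E}
        pi.update(zip(free, image))
        yield pi

def extension(phi, n, U):
    """The set of n-tuples over U satisfying the predicate phi."""
    return {t for t in product(U, repeat=n) if phi(*t)}

def is_supported_by(A, E, U):
    """Check pi(A) == A for every permutation pi of U fixing E pointwise."""
    return all({tuple(pi[x] for x in t) for t in A} == A
               for pi in perms_fixing(E, U))

# Boolean combination of equalities with parameters 0 and 1 (both in E).
A = extension(lambda x1, x2: (x1 == x2) or (x1 == 0 and not x2 == 1), 2, U)
# Uses the parameter 3, which is outside E.
B = extension(lambda x: x == 3, 1, U)

print(is_supported_by(A, E, U))  # True
print(is_supported_by(B, E, U))  # False
```

The second check fails because some permutation fixing 0 and 1 moves 3, so E is not a support of B. It would succeed with the larger support `(0, 1, 3)`.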
7 The nonconservativeness of w2FA
In this section, we prove Theorem 7.47, which says that w2FA is not conservative over $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ .
Here is the main idea of the proof. We have seen that $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ has a model whose relations are easy to describe in finitary terms (Section 6). Hence, $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ is a fairly weak theory; in fact it is mutually interpretable with first-order Peano arithmetic. (To show that $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ interprets PA, the trick is to code arithmetical statements as statements about finite concepts.) On the other hand, adding $\mathrm{w2FA}$ to $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ results in a much stronger theory, one which proves that the numerical sort is Dedekind-infinite and hence interprets second-order arithmetic. Second-order arithmetic is not conservative over Peano arithmetic. By means of a carefully chosen interpretation, this nonconservativeness can be transferred to the theories of interest to us. For example, $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ proves the interpretation of a consistency statement for Peano arithmetic, while $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ does not.
Let $X \approx Y$ abbreviate that there is a bijection between X and Y, in which case we say that X and Y are equinumerous concepts.
If $Ryz$ is a binary relation, let $R_y$ be the concept defined by $R_y z \leftrightarrow Ryz$ . (This is terrible notation, but we only use it in the following definition.)
Definition 7.41. Define $\mathit {Succ}, \mathit {Leq}, \mathit {Add}, \mathit {Mult}$ as follows:
In other words, $\mathit {Mult}(X,Y,Z)$ says that Z is equinumerous with the union of $Y$ disjoint copies of X.
Definition 7.42. Define the translation $\alpha : L_2 \to \mathcal {L}^+$ as follows.
Identify first-order variables of $L_{2}$ with base-sort concept variables of $\mathcal {L}^+$ . Identify second-order variables of $L_2$ with numerical-sort concept variables of $\mathcal {L}^+$ .
Relativize $\forall x$ to the formula $\mathit {Fin}(X)$ .
Relativize $\forall X$ to the formula $\mathit {FinNums}(\mathbf {X}) := \forall \mathbf {y}(\mathbf {X} \mathbf {y} \rightarrow \exists Y[\mathbf {y} = \#Y \wedge \mathit {Fin}(Y)])$ .
Translate predication and equality as follows:
Translate $0, S, \leq , +, \cdot $ as follows:
We may extend this translation to all $L_2$ formulas via the usual techniques for eliminating definite descriptions. For example, write $SSx = y$ as $\exists z(Sx = z \wedge Sz = y)$ , and so on.
Lemma 7.43. Restricted to $L_{\mathrm{PA}}$ formulas, the translation $\alpha : L_2 \to \mathcal {L}^+$ is an interpretation of PA in $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ .
Proof. Note that if $\varphi $ is an $L_{\mathrm{PA}}$ formula, then $\varphi ^\alpha $ is an $\mathcal {L}$ formula. We will show that $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ proves the $\alpha $ translation of each axiom of PA, and also proves that $\mathit {Succ}$ , $\mathit {Add}$ , $\mathit {Mult}$ define total functions (up to $\approx $ ).
First we prove that $\mathit {Succ}$ defines a total function (up to $\approx $ ). In other words, we show that for any Stäckel-finite concepts $X, Y, Z$ ,
We reason in $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ . For the first claim, take any concept X such that $\mathit {Fin}(X)$ . Then X is not V. So, there exists a such that $\neg Xa$ . Then $\mathit {Succ}(X, X \cup \{a\})$ , and it is easy to check that $\mathit {Fin}(X \cup \{a\})$ . This gives us the first claim. The second claim is obtained simply by unpacking the definition of $\mathit {Succ}$ .
We postpone the proofs that $\mathit {Add}$ and $\mathit {Mult}$ define total functions (up to $\approx $ ).
The $\alpha $ translations of the axioms of Q can be expressed as follows (after eliminating definite descriptions in a convenient way). For any Stäckel-finite concepts $X, Y, Z, Y', Z'$ ,
(We drop the third axiom of Q, since it is redundant in PA.) It is tedious but straightforward to check that all of these claims are provable from $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ .
The previous step essentially provides us with recursive definitions of $\mathit {Add}$ and $\mathit {Mult}$ . Using these recursive definitions, it is then easy to prove that $\mathit {Add}$ and $\mathit {Mult}$ define total functions (up to $\approx $ ). For $\mathit {Add}$ , we must show that for any Stäckel-finite concepts $X, Y, Z, W$ ,
Both of these claims are provable by induction on the finite concept Y (Lemma 5.32), using the recursive definition of $\mathit {Add}$ . The proof for $\mathit {Mult}$ is similar.
Lastly, the $\alpha $ translation of the induction scheme of PA follows from induction on finite concepts (Lemma 5.32 again).
Lemma 7.44. The translation $\alpha : L_2 \to \mathcal {L}^+$ is an interpretation of $Z_2$ in $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ .
Proof. By Lemma 7.43, we already know that the $\alpha $ translation is an interpretation of PA in $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ , and hence in $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ . It remains to check that $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ proves the $\alpha $ translations of the second-order induction and comprehension axioms.
The translation of the second-order induction axiom is equivalent to
This is easily proved by induction on finite concepts, generalized to $\mathcal {L}^+$ formulas. The generalization is proved in the same way as Lemma 5.32.
The comprehension scheme translates as follows:
To prove this in $\mathrm{w2FA} + \neg \mathit {Fin}(V)$ , apply comprehension (in $\mathcal {L}^+$ ) to the formula
Then use w2FA and the fact that $\approx $ is a congruence with respect to $\varphi ^\alpha (Y)$ .
We will now define a translation $\beta : \mathcal {L} \to L_{\mathrm{PA}}$ inspired by the Fraenkel model, and show that it is an interpretation of $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ in PA.
Fix primitive recursive encodings of finite sets and sequences as natural numbers. For finite sequences, this amounts to specifying the following functions in $L_{\mathrm{PA}}$ :

(i) for each $n\in \mathbb {N}$ , a primitive recursive function $\langle x_1, \ldots , x_n \rangle $ , which codes this tuple as a single number,

(ii) primitive recursive functions $\mathit {length}(s)$ and $(s)_i$ , which return the length and the ith element of the finite sequence coded by s.
We identify finite sets and sequences with their codes. We use the letter E for finite sets, and the letter s for finite sequences.
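The particular encoding does not matter for what follows. As a minimal sketch of one possible choice (an assumption made for illustration, not the encoding fixed here), finite sequences can be coded with the Cantor pairing function, which is primitive recursive. The names `pair`, `unpair`, `encode`, `length`, and `elem` are hypothetical, and elements are indexed from 1.

```python
from math import isqrt

def pair(x, y):
    """Cantor pairing: a bijection from pairs of naturals to naturals."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    """Inverse of the Cantor pairing function."""
    w = (isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y

def encode(seq):
    """Code a finite sequence as pair(length, nested pairs of its elements)."""
    body = 0
    for x in reversed(seq):
        body = pair(x, body)
    return pair(len(seq), body)

def length(s):
    """Length of the sequence coded by s."""
    return unpair(s)[0]

def elem(s, i):
    """The i-th element (1-indexed) of the sequence coded by s."""
    _, body = unpair(s)
    for _ in range(i - 1):
        _, body = unpair(body)   # skip past the first i-1 elements
    return unpair(body)[0]

s = encode([3, 0, 7])
print(length(s), [elem(s, i) for i in range(1, length(s) + 1)])  # 3 [3, 0, 7]
```

Storing the length explicitly keeps sequences with trailing zeros distinct, for example `encode([1])` and `encode([1, 0])`.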
Fix a primitive recursive Gödel numbering of $L_{\mathrm{PA}}$ formulas. We identify formulas with their Gödel numbers. For each formula $\varphi $ , let be a formal numeral that denotes (the Gödel number of) $\varphi $ .
Next, we describe $L_{\mathrm{PA}}$ formulas $\mathit {BoolEq}$ , $\mathit {BoolSat}$ , $\mathit {pad}_n$ representing certain primitive recursive relations and functions.
Let $\mathit {BoolEq}(x, y, E)$ just in case: x is a Boolean combination of $L_{\mathrm{PA}}$ equalities with exactly y free variables and with constant symbols drawn from $\{S^e0: e\in E\}$ .
Let $\mathit {BoolSat}(x,s)$ just in case: x is a Boolean combination of $L_{\mathrm{PA}}$ equalities that is satisfied when the ith variable of $L_{\mathrm{PA}}$ is assigned the value $(s)_i$ , for all $i \leq \mathit {length}(s)$ . This is primitive recursive, because truth and satisfaction for bounded ( $\Sigma _0$ ) formulas are primitive recursive notions.
For each $n\in \mathbb {N}$ , let $\mathit {pad}_n(x_1, \ldots , x_n, y_1, \ldots , y_n)=s$ just in case: s is the shortest finite sequence whose $x_i$ th element is $y_i$ (for all $1 \leq i \leq n$ ) and whose other elements are all zero.
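To make $\mathit{BoolSat}$ and $\mathit{pad}_n$ concrete, here is a small sketch that works on Python lists and nested tuples rather than on Gödel codes. Everything about the representation is an assumption made for illustration: formulas are tuples tagged `"eq"`, `"not"`, `"and"`, `"or"`; terms are `("var", i)` for the ith variable or `("num", e)` for the numeral $S^e0$; positions are 1-indexed; and the positions passed to `pad` are assumed distinct.

```python
def pad(xs, ys):
    """Shortest sequence whose xs[i]-th element (1-indexed) is ys[i],
    with every other element zero. Assumes the positions in xs are distinct."""
    s = [0] * max(xs, default=0)
    for x, y in zip(xs, ys):
        s[x - 1] = y
    return s

def term_value(t, s):
    """Value of a term under the assignment s: variable i gets s[i-1]."""
    kind, k = t
    if kind == "var":
        return s[k - 1]
    return k  # ("num", e) stands for the numeral S^e 0

def bool_sat(phi, s):
    """Evaluate a Boolean combination of equalities under the assignment s."""
    op = phi[0]
    if op == "eq":
        return term_value(phi[1], s) == term_value(phi[2], s)
    if op == "not":
        return not bool_sat(phi[1], s)
    if op == "and":
        return bool_sat(phi[1], s) and bool_sat(phi[2], s)
    if op == "or":
        return bool_sat(phi[1], s) or bool_sat(phi[2], s)
    raise ValueError(f"unknown connective: {op!r}")

# v2 = v4  or  not (v2 = S^3 0)
phi = ("or", ("eq", ("var", 2), ("var", 4)),
             ("not", ("eq", ("var", 2), ("num", 3))))

print(pad([2, 4], [3, 3]))                 # [0, 3, 0, 3]
print(bool_sat(phi, pad([2, 4], [3, 3])))  # True
print(bool_sat(phi, pad([2, 4], [3, 5])))  # False
```

The evaluator recurses only on subformulas and compares numbers, which is the informal reason the corresponding relation on codes is primitive recursive.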
Definition 7.45. Define the translation $\beta : \mathcal {L} \to L_{\mathrm{PA}}$ as follows.
Let the variables of $L_{\mathrm{PA}}$ and the object variables of $\mathcal {L}$ be enumerated by $v_1, v_2, v_3, \ldots $ .
Translate each object variable $v_i$ of $\mathcal {L}$ by the even-numbered variable $v_{2i}$ . Translate each relation variable X of $\mathcal {L}$ by a distinct odd-numbered variable $v_X \in \{v_1, v_3, v_5, \ldots \}$ . In the last clause, E is a fresh variable and n is the arity of X.
Lemma 7.46. The translation $\beta : \mathcal {L} \to L_{\mathrm{PA}}$ is an interpretation of $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ in PA.
Proof. It is easy to check that the $\beta $ translation of any non-comprehension axiom is a theorem of first-order logic, and hence is provable in PA.Footnote ^{12} It remains to show that PA proves the $\beta $ translation of each comprehension axiom, and also that PA proves $(\neg \mathit {Fin}(V))^\beta $ .
The idea is to formalize the proofs of Lemmas 6.38, 6.39, and 6.40 in PA. The main obstacle is that we defined symmetric sets $A \subseteq \mathbb {N}^n$ in terms of arbitrary permutations of $\mathbb {N}$ , and it is not obvious how to formalize those in PA. But in fact we do not need arbitrary permutations. Say that a permutation $\pi :\mathbb {N} \to \mathbb {N}$ is essentially finite if $\pi (a) = a$ for all but finitely many $a \in \mathbb {N}$ . If we go through Section 6, replacing ‘permutation’ with ‘essentially finite permutation’ everywhere, we get exactly the same model, and all the proofs still work.
We formalize Lemma 6.40 as follows. Say that an $L_{\mathrm{PA}}$ formula $\varphi (\bar x)$ is symmetric with support E just in case, for every essentially finite permutation $\pi $ ,
Then we prove a theorem scheme in PA which says: ‘An $L_{\mathrm{PA}}$ formula is symmetric iff there is a Boolean combination of equalities coextensive with it.’ More precisely, let $\varphi (v_{i_1}, \ldots , v_{i_n})$ be any $L_{\mathrm{PA}}$ formula with exactly the free variables displayed. Then PA proves the following: $\varphi (v_{i_1}, \ldots , v_{i_n})$ is symmetric with support E iff there exists y such that
$(\!{\implies}\!\kern1pt)$ . We reason in PA. Suppose that $\varphi (v_{i_1}, \ldots , v_{i_n})$ is symmetric with support E. Let $\psi _1, \ldots , \psi _m$ be all possible disjunctions of formulas of the form
where parenthesized negations may or may not be present. Argue that $\bar x \sim _E \bar y \rightarrow (\varphi (\bar x) \leftrightarrow \varphi (\bar y))$ , and hence
Then observe that , for each $1 \leq i \leq m$ .Footnote ^{13} Reasoning by cases, we are done.
For the $(\Longleftarrow )$ direction, copy the rest of the proof of Lemma 6.40.
Next, we formalize Lemma 6.38. We replace $\mathcal {M} \vDash \varphi $ (‘ $\mathcal {M}$ satisfies $\varphi $ ’) with $\varphi ^\beta $ throughout. For each $\mathcal {L}$ formula $\varphi (\bar x, \bar y, \bar Y)$ not containing X free, we wish to show that PA proves
This basically says: ‘There is a Boolean combination of equalities coextensive with $\varphi (\bar x, \bar y, \bar Y)^\beta $ .’ By the formalized version of Lemma 6.40, it suffices to prove in PA that $\varphi (\bar x, \bar y, \bar Y)^\beta $ is a symmetric $L_{\mathrm{PA}}$ formula. To do this, use induction on $\mathcal {L}$ formulas $\varphi (\bar x, \bar X)$ to prove the following theorem scheme in PA:
(This corresponds to our earlier observation that permuting everything uniformly doesn’t change any truth-values in $\mathcal {M}$ relative to any variable-assignment.) Then copy the rest of the proof of Lemma 6.38.
In the same way, it is easy to formalize Lemma 6.39 in PA.
We are now ready to prove the first main theorem of the paper.
Theorem 7.47. w2FA is not conservative over $\mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V)$ .
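Proof. Let $\mathit {Con}_{\mathrm{PA}}$ denote a standard consistency statement for PA. We claim that $(\mathit {Con}_{\mathrm{PA}})^\alpha $ is a witness to nonconservativeness. That is,
$$ \begin{align*} & (1) \quad \mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V) \not \vdash (\mathit {Con}_{\mathrm{PA}})^\alpha , \\ & (2) \quad \mathrm{w2FA} + \neg \mathit {Fin}(V) \vdash (\mathit {Con}_{\mathrm{PA}})^\alpha . \end{align*} $$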
Proof of claim (1)
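Write $\vartriangleright $ for ‘interprets’. From Lemmas 7.43 and 7.46, we have
$$ \begin{align*} \mathrm{PA} \vartriangleright ^{\beta } \mathit {Ax}_{\mathcal {L}} + \neg \mathit {Fin}(V) \vartriangleright ^{\alpha } \mathrm{PA}. \end{align*} $$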
Suppose for a contradiction that $\mathit {Ax}_{\mathcal {L}} +\neg \mathit {Fin}(V) \vdash (\mathit {Con}_{\mathrm{PA}})^\alpha $ . Then $\mathrm{PA} \vdash ((\mathit {Con}_{\mathrm{PA}})^\alpha )^\beta $ , and hence $\mathrm{PA} \vartriangleright ^{\beta \circ \alpha } \mathrm{PA} + \mathit {Con}_{\mathrm{PA}}$ . However, by a strong version of Gödel’s second incompleteness theorem, $\mathrm{PA} \not \vartriangleright (\mathrm{PA} + \mathit {Con}_{\mathrm{PA}})$ .Footnote ^{14} Contradiction.
Proof of claim (2)
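It is well known that $Z_2 \vdash \mathit {Con}_{\mathrm{PA}}$ . Hence, by Lemma 7.44,
$$ \begin{align*} \mathrm{w2FA} + \neg \mathit {Fin}(V) \vdash (\mathit {Con}_{\mathrm{PA}})^\alpha . \end{align*} $$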
Corollary 7.48. w2FA is not conservative over $\mathit {Ax}_{\mathcal {L}}$ .
For proof, see Lemma 1.3.
Corollary 7.49. 2FA is not conservative over $\mathit {Ax}_{\mathcal {L}}$ .
8 w2FA is conservative over stronger base theories
It is surprising that w2FA is not conservative over $\mathit {Ax}_{\mathcal {L}}$ . However, the next two theorems establish some limits to the nonconservativeness of w2FA.
Theorem 8.50. w2FA is conservative over third-order logic.
Proof. Let $\mathcal {L}^3$ be the third-order analog of the base language $\mathcal {L}$ . Let $\mathit {Ax}_{\mathcal {L}^3}$ denote the axioms of the deductive system for $\mathcal {L}^3$ , including full third-order comprehension in the base sort. Note that w2FA still only includes second-order comprehension for the numerical sort.
Take any $\mathcal {L}^3$ formula $\varphi $ , and suppose that $\mathrm{w2FA} + \mathit {Ax}_{\mathcal {L}^3} \vdash \varphi $ . We show that $\mathit {Ax}_{\mathcal {L}^3} \vdash \varphi $ . Our strategy is to define an interpretation of w2FA in $\mathit {Ax}_{\mathcal {L}^3}$ that leaves $\mathcal {L}^3$ sentences fixed (up to renaming of bound variables). Under such an interpretation, any derivation of $\varphi $ from $\mathrm{w2FA} + \mathit {Ax}_{\mathcal {L}^3}$ is transformed into a derivation of $\varphi $ from $\mathit {Ax}_{\mathcal {L}^3}$ . The idea is to interpret each cardinality $\#X$ as the concept X from which it came, with numerical-sort equality being interpreted as equinumerosity.
First, we define a pretranslation from variables of $\mathcal {L}^3 \cup \mathcal {L}^+$ into variables of $\mathcal {L}^3$ . Translate each variable of sort $\tau $ as a variable of sort $\tau ^*$ , where
In other words, $\tau ^*$ is obtained from $\tau $ by replacing each occurrence of n with $\langle 0 \rangle $ .
Set up the pretranslation so that distinct variables of $\mathcal {L}^3 \cup \mathcal {L}^+$ are translated as distinct variables of $\mathcal {L}^3$ . For example, let the base-sort concept variables be enumerated by $X_0, X_1, X_2, \ldots $ , and the numerical-sort object variables by $\mathbf {{v}}_0, \mathbf {{v}}_1, \mathbf {{v}}_2, \ldots $ . Then let the pretranslations be
Similarly for other sorts.
We now define the translation $*: \mathcal {L}^3 \cup \mathcal {L}^+ \to \mathcal {L}^3$ . In the first and last lines, let $\tau = \langle \tau _1, \ldots , \tau _k \rangle $ be any second- or third-order sort. In the last line, $\mathit {Cong}_\approx ((X^\tau )^*)$ is a metalinguistic abbreviation of the statement: ‘ $\approx $ is a congruence for the relevant argument-places of $(X^\tau )^*$ ’, where the sort $\tau $ determines which argument-places are relevant.
It is easy to check that the $*$ translation of each axiom of w2FA is provable from $\mathit {Ax}_{\mathcal {L}^3}$ . So, the translation works.
To prove the next theorem, we need another little fact about conservativeness.
Lemma 8.51. Let T be a theory in a formal language L, and let A be any L-sentence. Suppose that a sentence $\Delta $ is conservative over $T + A$ and is also conservative over $T + \neg A$ . Then $\Delta $ is conservative over T.
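Proof. Take any $\varphi \in L$ , and suppose that $T + \Delta \vdash \varphi $ . We show that $T \vdash \varphi $ . Indeed,
$$ \begin{align*} T + \Delta \vdash \varphi & \implies T + A + \Delta \vdash \varphi \\ & \implies T + A \vdash \varphi \\ & \implies T \vdash A \rightarrow \varphi . \end{align*} $$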
By the same reasoning, we also have $T \vdash \neg A \rightarrow \varphi $ . Hence, $T \vdash \varphi $ .
Theorem 8.52. w2FA is conservative over $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ .
Proof. Let $V=1$ abbreviate the formula $\forall x \forall y \ x=y$ . By Lemma 8.51, we may divide into cases according to whether $V = 1$ or $V \neq 1$ . The rest of the proof is contained in Lemmas 8.53 and 8.54.
Lemma 8.53. w2FA is conservative over $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V) + V \neq 1$ .
Proof. We follow the same strategy as in Theorem 8.50. That is, we show how to define an interpretation $\dagger $ of w2FA in $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V) + V \neq 1$ that leaves $\mathcal {L}$ sentences fixed (up to renaming of bound variables). The idea is to interpret cardinalities $\#X$ as pairs of base-sort objects. Specifically, we will fix distinct base-sort objects a and b, represent $\# (V \upharpoonright x)$ as $(x, a)$ , and represent $\#\varnothing $ as $(a,b)$ .
First, we define a pretranslation from variables of $\mathcal {L}^+$ into variables of $\mathcal {L}$ . Translate each variable of sort $\tau $ as a distinct variable or pair of variables of sort(s) $\tau ^\dagger $ , where
For example, $\langle n, 0, n \rangle ^\dagger = \langle 0,0,0,0,0\rangle $ and $\langle \langle n \rangle , n \rangle ^\dagger = \langle \langle 0,0 \rangle , 0,0\rangle $ .
Set up the pretranslation so that no variable of $\mathcal {L}$ is ever used twice. For definiteness, let the base-sort object variables be enumerated by $v_0, v_1, v_2, \ldots $ , and the numerical-sort object variables by $\mathbf {{v}}_0, \mathbf {{v}}_1, \mathbf {{v}}_2, \ldots $ . Then let the pretranslations of the object variables be
Similarly for second-order variables.
Now we define the interpretation $\dagger : \mathcal {L}^+ \to \mathcal {L}$ . Fix a well-ordering $\leq $ of V, and fix distinct base-sort objects $a \neq b$ . In the first and last lines, let $\tau = \langle \tau _1, \ldots , \tau _k \rangle $ be any second-order sort.
In order to justify the interpretation of $\#$ , we must check that for each base concept X, there is a unique initial segment of $(V, \leq )$ that is equinumerous with X. For the existence claim, recall that $\mathit {Ax}_{\mathcal {L}}$ proves that any two well-orderings are comparable (Lemma 5.29). In particular, $(X,\leq )$ is order-isomorphic with a segment of $(V, \leq )$ , and hence X is equinumerous with that segment. For the uniqueness claim, use the pigeonhole principle (Remark 5.31).
Now it is easy to check that the $\dagger $ translation of each axiom of w2FA is provable from $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V) + V \neq 1$ . So, the interpretation works.
Lemma 8.54. w2FA is conservative over $\mathit {Ax}_{\mathcal {L}} + V = 1$ .
Proof. Observe that $\mathit {Ax}_{\mathcal {L}} + V = 1$ is a categorical theory, and hence it is a complete theory. So, the only way that w2FA could be nonconservative over $\mathit {Ax}_{\mathcal {L}} + V = 1$ is if the combined theory $\mathrm{w2FA} + V = 1$ were inconsistent. But $\mathrm{w2FA} + V = 1$ is consistent: it has a model $\mathcal {M}$ with object domains $M_0 = \{a\}$ and $M_n = \{0,1\}$ and with $I(\#)$ being the function mapping each base-sort concept to its cardinality.
9 The nonconservativeness of 2FA
In the previous section, we established some limits to the nonconservativeness of w2FA. In this section, we will show that 2FA is more deeply nonconservative than w2FA. The main result is Theorem 9.67, which says that 2FA is nonconservative over $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ . Our proof of this result can be generalized to show that 2FA is nonconservative over pure axiomatic nth-order logic for any $n \geq 2$ , or even over simple type theory.
Roughly, the idea is to construct a Gödel sentence for $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ . By a variation on Gödel’s first incompleteness theorem, $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ does not prove its own Gödel sentence. On the other hand, $\mathrm{2FA} + \mathit {Fin}(V)$ does prove the Gödel sentence, because it is a powerful theory: it interprets second-order arithmetic in the new sort (and it is smart enough to relate that arithmetic to the Gödel sentence expressed in $\mathcal {L}$ ).
But $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ says that the universe is finite, so it cannot interpret Q. How, then, is it possible to pull off the Gödel argument? The trick is that $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ has arbitrarily large models. If $\mathit {Ax}_{\mathcal {L}} + \mathit {Fin}(V)$ proved its own Gödel sentence, then any sufficiently large model would contain a witness to the paradoxical derivation, yielding a contradiction.
To implement this argument, it will be convenient to work with a definitional extension $T = \mathit {Ax}_{\mathcal {L} \cup L'} + \mathit {Fin}(V) + \Delta $ , which we now describe.
Definition 9.55. Let $\mathcal {L} \cup L' : = \mathcal {L}_{\{0\}}[\{0,S,\leq , A,M\}]$ .
We identify variables of $L'$ with object variables of $\mathcal {L}$ . Thus,

• $0$ is a base object constant,

• S and $\leq $ are constants of sort $\langle 0, 0 \rangle $ ,

• A and M are constants of sort $\langle 0, 0, 0 \rangle $ .
Let $\mathit {Ax}_{\mathcal {L} \cup L'}$ be the axioms of the deductive system for $\mathcal {L} \cup L'$ .
Definition 9.56. Let $\Delta $ be the conjunction of the following $(\mathcal {L}\cup L')$ formulas:

1. $(V,\leq )$ is a double well-ordering with least element $0$ ,

2. $Sxy$ iff y is the upper neighbor of x with respect to $\leq $ ,

3. Definitions of A and M:
$$ \begin{align*} & Ax0z\leftrightarrow z= x, \\ & Syy' \wedge Szz' \rightarrow (Axyz \leftrightarrow Axy'z'), \\ & Mx0z \leftrightarrow z=0, \\ & Syy' \wedge Azxz' \rightarrow (Mxyz \leftrightarrow Mxy'z'). \end{align*} $$
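The four clauses for A and M can be checked by brute force in a small finite model. The sketch below is only an illustration under stated assumptions: the universe V is taken to be $\{0, \ldots , 6\}$ with its usual order, S is the upper-neighbor relation, and A and M are taken to be the graphs of addition and multiplication truncated to V. The size `N = 7` and the function name `check_delta_clauses` are hypothetical choices made for the example.

```python
from itertools import product

N = 7                      # size of the finite universe (arbitrary choice)
V = range(N)

# S x y: y is the upper neighbor of x in the usual order on V.
S = {(x, x + 1) for x in V if x + 1 < N}
# A and M: graphs of addition and multiplication, truncated to V.
A = {(x, y, z) for x, y, z in product(V, repeat=3) if x + y == z}
M = {(x, y, z) for x, y, z in product(V, repeat=3) if x * y == z}

def check_delta_clauses():
    """Check the four defining clauses for A and M over the finite universe."""
    # A x 0 z  <->  z = x
    c1 = all(((x, 0, z) in A) == (z == x) for x in V for z in V)
    # S y y' and S z z'  ->  (A x y z <-> A x y' z')
    c2 = all(((x, y, z) in A) == ((x, y1, z1) in A)
             for x in V for (y, y1) in S for (z, z1) in S)
    # M x 0 z  <->  z = 0
    c3 = all(((x, 0, z) in M) == (z == 0) for x in V for z in V)
    # S y y' and A z x z'  ->  (M x y z <-> M x y' z')
    c4 = all(((x, y, z) in M) == ((x, y1, z1) in M)
             for x in V for (y, y1) in S
             for z in V for z1 in V if (z, x, z1) in A)
    return c1, c2, c3, c4

print(check_delta_clauses())  # (True, True, True, True)
```

In this model A and M are graphs of partial functions: a sum or product that would exceed the largest element of V simply has no value, which is what one expects when the universe is finite.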
Definition 9.57. Let $T = \mathit {Ax}_{\mathcal {L} \cup L'} + \mathit {Fin}(V) + \Delta $ .
Lemma 9.58. $T \vdash BA'$ .
Proof. It is obvious that T proves the universal closures of the first three axioms of $BA'$ . Furthermore, since $(V,\leq )$ is a well-ordering, we have induction for all $(\mathcal {L} \cup L')$ formulas. Using induction, it is easy to prove the universal closures of the remaining axioms of $BA'$ .
We will now describe the construction of the Gödel sentence of T.
Fix a Gödel numbering of $\mathcal {L}\cup L'$ . We describe $L_{\mathrm{PA}}$ formulas $\mathit {Der}_T$ , $\mathit {diag}$ representing certain primitive recursive notions.
Let $\mathit {Der}_T(x,y)$ just in case: x is the Gödel number of a T-derivation of a formula with Gödel number y.
Let $\mathit {diag}(x)=y$ be a function with the following property: if n is the Gödel number of an $(\mathcal {L} \cup L')$ formula $\theta (y)$ with exactly the free variable y, then
(The notation $y \doteq {n}$ is from Definition 4.21.) Note that $\mathit {diag}$ is modeled on the Gödel diagonal function: in essence, it substitutes into a formula its own Gödel number.
It is well known that recursive relations are $\Delta _1$ definable in PA [Reference Hájek and Pudlák13, p. 18, theorem 0.45]. So, we may choose $\mathit {Der}_T$ and $\mathit {diag}$ so that $\mathit {Der}_T(x, \mathit {diag}(y))$ is a $\Sigma _1$ formula. By Lemma 4.23, there is an equivalent $\Sigma _1'$ formula $\varphi (x,y)$ of $L'$ such that, for any parameters $a,b \in \mathbb {N}$ ,
Let p be the Gödel number of $\forall x \neg \varphi (x,y)$ . Then , where G is the following sentence:
We say that G is the Gödel sentence of the theory T.
Lemma 9.59. The theory $T = \mathit {Ax}_{\mathcal {L} \cup L'} + \mathit {Fin}(V) + \Delta $ does not prove its own Gödel sentence G.
Proof. Suppose for sake of contradiction that $T \vdash G$ . Let d be the Gödel number of a derivation of G. Then we have
Write $\varphi (x,y)$ as