1 Introduction
One of the fundamental driving questions of proof theory is the following: What is the computational content of a mathematical theorem, and how can it be exhibited? Proof mining, which emerged as a subfield of mathematical logic in the 1990s through the work of Ulrich Kohlenbach and his collaborators,Footnote 1 aims at answering that question by extracting this computational content from theorems with proofs as they are found in the mainstream mathematical literature. This is a nontrivial task, in particular as such proofs are prima facie noneffective, involving both classical logic as well as various noncomputational (set-theoretic) principles. However, backed by a logical apparatus relying on the utilization of various methods from proof theory like functional interpretations and majorizability, the program of proof mining has had great success in various areas of mathematics, in particular regarding (nonlinear) analysis and optimization (see in particular the recent surveys [Reference Kohlenbach27, Reference Kohlenbach, Sirakov, Ney de Souza and Viana28]).
Two areas that proof mining has previously only touched upon briefly are the fields of measure theory in general and probability theory in particular. Concretely, we refer to the works [Reference Arthan and Oliva1, Reference Avigad, Dean and Rute2, Reference Avigad, Gerhardy and Towsner3] which are essentially the only proof mining case studies in these areas preceding this work. From a practical perspective, this diffidence of proof mining regarding these areas is at least partially due to the fact that they are so far not substantiated by underlying logical methods as other areas of applications for proof mining are. This absence of a firm proof-theoretic foundation is largely due to a range of logical difficulties inherently present in the context of areas strongly reliant on set-theoretic methods, as is precisely the case with measure and probability theory. In any case, this lack of such a formal approach to these areas also renders the previous applications, to a certain degree, ad hoc.
It is the aim of this paper to extend the current logical methods used in proof mining so as to render them applicable to large classes of proofs from probability theory, in particular so that they allow for a logical explanation of the success of the previous case studies mentioned above (as well as of the various properties of the extracted content).
1.1 The logical foundations of proof mining
The fundamental logical “substrates” of the proof mining program are the so-called general logical metatheorems.Footnote 2 These use well-known proof interpretations like Gödel’s functional interpretation [Reference Gödel13], negative translations (see, e.g., [Reference Kuroda36]), and their extensionsFootnote 3 to provide a general result that quantifies and allows for the extraction of the computational content of large classes of theorems from their proofs. Furthermore, these proofs may involve classical logic and various noncomputational principles. In that way, proof mining, as substantiated by these metatheorems, has led to hundreds of applications in the last decades.
A crucial innovation in the techniques underlying these logical metatheorems was introduced by Kohlenbach in [Reference Kohlenbach25], marking the “modern age” of proof mining: The metatheorems for proof mining preceding [Reference Kohlenbach25] were based on “pure” systems for arithmetic in all finite types (see, e.g., [Reference Kohlenbach26, Reference Troelstra60]) and as such were restricted in their expressivity to dealing with spaces and structures that were representable as Polish metric spaces in the sense of Baire space (and thus separable). The paradigm first proposed in [Reference Kohlenbach25] was to extend the language of the underlying systems with additional abstract base types which, together with additional constants and governing axioms, could be used to talk about much larger and broader classes of spaces and objects on them beyond merely representable ones.Footnote 4
The class of spaces and objects treated in this fashion has grown considerably since then, ranging from fundamental examples like general (nonseparable) metric, hyperbolic, $\mathrm {CAT}(0)$, Banach and Hilbert spaces to much more involved objects like $\mathbb {R}$-trees, $L^p$-spaces, the dual of a (nonseparable) Banach space as well as monotone operators and nonlinear semigroups, among many others.
Further, the approach to represent various classes of spaces via abstract types has, together with an ingenious combination due to Kohlenbach (see the discussions in Section 8 later on for further details and references on this) of Gödel’s functional interpretation with (a suitable extension of) Howard’s notion of majorizability [Reference Howard and Troelstra19], resulted in logical metatheorems that, beyond the extractability of effective bounds from noneffective existence proofs, guarantee a high degree of uniformity of the respective data. These perspectives of using abstract types to represent general spaces, together with the use of notions of majorizability to induce uniformities, are also fundamental to the present paper.
1.2 An abstract approach to probability theory
In this paper, we follow a novel abstract approach towards a treatment of various fundamental notions from probability theory to avoid the range of issues which are present a priori in that context, as briefly mentioned before. Namely, already the most fundamental notions from probability like countable unions of measurable sets or
$\sigma $
-additive measures require the use of proof-theoretically strong comprehension principles to deal with the high quantifier complexity inherent in their defining axioms. For example, the existence of countable unions can be rather immediately recognized as a type of
$\Sigma _1$
-comprehension principle while the
$\sigma $
-additivity of a measure is likewise a strong existence property as it immediately implies various limit theorems for probabilities. Now, while this strength might be realized in extreme cases, the practice of probability theory suggests that in many situations, in particular where these objects are only discussed in an “abstract” way, theorems from probability theory can be given computational solutions with low complexity. An approach towards extractive proof theory based on a direct specification of these objects and notions would hence distort the complexity of bounds extracted from proofs in these situations.
We here provide a proof-theoretic approach which characterizes situations where these notions are indeed inherently “tame” in the sense that, although in principle being subject to well-known Gödelian phenomena, their mere presence and abstract use do not contribute to the strength of extractable bounds.Footnote 5
For that, we first focus on so-called probability contents (also called charges), that is, evaluations of sets which behave like a measure but are only finitely additive. This absence of the strong requirement of
$\sigma $
-additivity then also allows us to, at first, lift the restriction of the closure under countable unions of the space of events, so that we only consider these mappings to operate on (Boolean) algebras of sets (also called fields). In fact, these contents on algebras form a well-studied part of modern probability and measure theory with a rich theory and, as, for example, highlighted in the seminal book by K.P.S. Bhaskara Rao and M. Bhaskara Rao [Reference Bhaskara Rao and Bhaskara Rao6] on general (that is not necessarily
$[0,1]$
-valued) contents, they are in a way “more interesting, more difficult to handle, and perhaps more important than countably additive ones” (which, as a statement, is attributed to Bochner in the foreword of [Reference Bhaskara Rao and Bhaskara Rao6]).
To formally approach probability contents on algebras, we follow a so-called intensional approach, that is, we fundamentally employ new base types and constants to provide an abstract access to the involved objects instead of relying on particular representations. Concretely, over a base theory of arithmetic in all finite types, we use new abstract types to provide a quantifier-free access to the base set of the content space and the algebra of events over this set and employ constants utilizing these types to provide an abstract access to the related set-theoretic operations and the content. These new types and constants are then governed by admissible axioms, that is axioms of low computational strength, which simply describe the fundamental properties of these objects instead of extensionally specifying their precise structure through any kind of coding.
We then utilize this base system for contents on algebras to define extensions which allow us to treat countably infinite unions and hence to provide access to the theory of
$\sigma $
-algebras and associated
$\sigma $
-additive probability measures. Again, our treatment is intensional here, to avoid the inherent difficulties with infinite unions laid out before. Concretely, we approach infinite unions by adding a novel constant to the underlying system that provides a direct and abstract access to them as an operator associating a new measurable set, representing the union, with a sequence of measurable sets. However, instead of specifying that the resulting values indeed represent the unions in question through the strong associated (comprehension-type) axiom, which prompted us to consider probability contents on algebras in the first place, we only specify that it “behaves like” a union through a combination of admissible axioms and rules. Further, we provide a tame approach to the space of bounded and Borel-measurable functions to introduce the Lebesgue integral (already in the context of probability contents), similarly an object that relies on strong comprehension principles in its classical formulation. Also here, we opt for an intensional specification which, instead of describing said objects explicitly and completely, axiomatizes a general structure just adhering to some essential properties.
In that way, we arrive at suitable systems for probability contents on algebras of sets, and at various extensions of those systems for these other fundamental notions from probability theory. All of these we then endow with corresponding metatheorems on the extraction of computational bounds in the style of proof mining, based on (a monotone variant of) Gödel’s functional interpretation.
These metatheorems thereby provide a formal proof-theoretic perspective on how an abstract use of the central notions of probability theory, now formally characterized through our intensional approach, does not (artificially) contribute to the strength of extractable bounds. Concretely, the metatheorems formally guarantee that the complexity of the extractable information depends only on (and can be a priori bounded in terms of) the complexity of the principles used in the corresponding proof. In that way, while the main base system taken in this paper is actually one that does contain large amounts of comprehension (to illustrate the potential strength of systems which are amenable to proof mining methods), the approach to the various objects from probability theory taken here does not rely on these strong principles at all, as motivated before, and thus can also immediately be developed over suitably weak subsystems (as, e.g., the collection of systems introduced in [Reference Kohlenbach24] based on the Grzegorczyk hierarchy [Reference Grzegorczyk15]) where most of the mathematics discussed here can be similarly carried out but where then bounds of correspondingly low complexity could be guaranteed. In that way, the metatheorems established here indeed provide the right background to formally elucidate the extent of the phenomenon of proof-theoretic tameness for probability theory, which as discussed in detail before, was one of the main motivations for their design.
Beyond this tameness, one of the most crucial features of the new metatheorems presented in this paper is that they guarantee a high uniformity of the extracted content, in particular that it will be independent of the measure, the underlying set, and the algebra. As in the case of the first modern metatheorems mentioned before, this relies on a specific extension of the notion of majorizability due to Howard, which in our case utilizes the probability content (or measure) to provide a corresponding notion of majorizability for the new abstract types. In particular, we want to note that this is the first time in proof mining that an extension of Howard’s majorizability notion to abstract spaces is utilized that does not rely on any metric structure of the underlying spaces. In that way, the present metatheorems provide the first concrete logical explanation of the uniformities of extractable bounds observed in the previously mentioned proof mining case studies, as will be discussed further later on (see in particular Section 1.4).
1.3 Related work
The present approach to proof mining and probability theory is, in particular, to be distinguished from previous work on logical aspects of (quantitative) probability theory. On the proof-theoretic side, the main preceding work is that by Kreuzer [Reference Kreuzer35] on extracting computational content from proofs in measure theory. This approach, however, relies on strong forms of comprehension for treating measure spaces (represented via a specific coding), resulting in a rather restricted formal theory with a more limited scope of analyzable theorems, and which further does not guarantee any of the uniformity features for the quantitative information extracted thereby, in contrast to the present results. Outside of proof theory, finitizations of concepts from probability theory that enjoy similar uniformities as the ones considered here have also been obtained using tools from model theory, particularly ultraproducts [Reference Avigad and Iovino4, Reference Dueñez, Iovino and Iovino8, Reference Goldbring and Towsner14]. However, these works differ from the present one in both method and scope. The first crucial difference is that the proof-theoretic metatheorems presented here actually provide a method for extracting the uniform quantitative information which the model-theoretic approach can only infer the existence of. Beyond that, as already commented on before, in the present approach, the complexity of that information can be gauged beforehand based on the principles used in the proof, and the proof-theoretic tameness of that information can be guaranteed thereby. Lastly, the model-theoretic approach is essentially fixed to focus on convergence statements and uniformities relating to their so-called metastable formulations. In contrast, the proof-theoretic approach presented here does not have that limitation (which has a crucial impact on the range of possible applications as will also be discussed again in Section 1.4). 
These facts crucially separate our work from the model-theoretic approach. Nevertheless, the model-theoretic approaches are certainly not subsumed by our work but rather complementary. For one, the model-theoretic approaches to uniformities of rates of metastability in probability theory only rely on the truth of a statement, while our proof-theoretic results rely on the provability of the underlying theorem in an (albeit very strong) underlying theory. Also, we want to highlight that in particular the approach by Dueñez and Iovino [Reference Dueñez, Iovino and Iovino8] seems to rely, upon a closer inspection, on rather similar ideas for approaching some of the initial objects in question (e.g., by effectively treating probability contents on algebras instead of measures on
$\sigma $
-algebras and treating these underlying spaces with their operations as abstract entities), a fact that illustrates the apparent similarity of the problems that both the proof-theoretic and the model-theoretic approaches to these types of questions face and which in particular further highlights the naturalness of the approach followed here. However, our approach starts to crucially differ also conceptually instead of just methodologically from the work in [Reference Dueñez, Iovino and Iovino8] in the way that infinite unions and integration are treated in our systems and in the way that the uniformities of associated (extracted) bounds are guaranteed.
1.4 Applications, case studies, and extensions
Besides these logical considerations, the applications of the present metatheorems to probability theory are arguably the most important consequence of the present work. We hence use the last two sections of this paper to substantiate the applicability of the logical metatheorems introduced here.
At first, we outline in detail how the quantitative analysis of Egorov’s theorem as presented in the seminal case study for proof mining in measure theory from [Reference Avigad, Dean and Rute2] formalizes in our systems. Thereby, as mentioned before, we also provide the first logical explanation of the uniformities of the bounds extracted therein.Footnote 6
Furthermore, the analysis of the results from [Reference Avigad, Dean and Rute2] provided later highlights that these remain true for probability contents. This therefore illustrates that the notions and proofs produced in the work [Reference Avigad, Dean and Rute2], by following the finitary perspective of proof mining, allow for a lift of the underlying convergence result to the theory of contents, a qualitative result that complements the quantitative results produced in [Reference Avigad, Dean and Rute2] in the form of a corresponding rate. In that way, this points to an apparent empirical phenomenon which we want to highlight here: Finitary quantitative variants of notions and results from the theory of probability measures, as suggested by proof mining, seem to provide analogous versions suitable for the theory of probability contents. This view is largely corroborated by the further case studies developed since this work (as discussed below). Further, and more generally, these results thereby also highlight the naturalness of the theory of contents as an underlying medium for developing a logical account of probability theory in the sense of proof mining and fuel our confidence that already this system for probability contents will provide a suitable base for proof mining developments in the context of probability theory in the future. In that way, the fundamental relevance of (probability) contents, relating to the statement by Bochner quoted above, is essentially rediscovered by the proof-theoretic approach presented in this paper.
As another application of our novel logical approach, we establish a general so-called proof-theoretic transfer principle that allows for a lift of computational information on the relation between modes of convergence of sequences of real numbers to sequences of random variables. These results thereby provide a formal footing for this type of strategy, which is rather abundant in probability theory and in particular features in some recent case studies on proof mining and probability theory by the first author [Reference Neri43].
Besides these examples of application discussed here, the applicability of the systems presented in this paper is further substantiated by the fact that they explain the previously mentioned application [Reference Arthan and Oliva1] preceding this work as well as the very recent works in laws of large numbers [Reference Neri42, Reference Neri43], asymptotic behavior of stochastic processes [Reference Neri, Pischke, Powell, Beckmann, Oitavem and Manea45, Reference Neri and Powell47], and stochastic optimization [Reference Neri, Pischke and Powell44, Reference Neri and Powell46, Reference Pischke and Powell53] (along with forthcoming work by the authors together with Thomas Powell) as instances of the present methodology. In particular, in the context of these recent works, the extractive proof-theoretic perspective of this work was crucial for obtaining the respective results. The growing number of case studies thereby in particular empirically illustrates that the present abstract approach indeed has a rather broad applicability, as hoped for initially.
Beyond that, the present work lends itself both to further theoretical investigations on the extension of proof mining methods to further notions from probability and measure theory like Bochner integrals and martingales, among many others, as well as to substantiate and carry out many further (and potentially much more sophisticated) applications of proof mining in this area beyond those already mentioned. In particular, we want to mention that most ideas developed here could be extended, mutatis mutandis, to general finite contents and measures.
2 Preliminaries
The basic system that we rely on is the system
$\mathcal {A}^\omega =\textsf {WE}\mbox {-}\textsf {PA}^\omega +\textsf {QF}\mbox {-}\textsf {AC}+\textsf {DC}$
for classical analysis in all finite types as commonly used in proof mining, formalized via (a weakly extensional variant of) Peano arithmetic in all finite types together with a few choice principles (see, e.g., [Reference Kohlenbach25] where this notation for the system was, presumably, first introduced). As all systems introduced here will be extensions of this system
$\mathcal {A}^\omega $
, we sketch in this section the essential features relevant for this paper. For any further details, we refer to the works [Reference Kohlenbach26, Reference Troelstra60].
Here, we follow the definition of weakly extensional Peano arithmetic in all finite types
$\textsf {WE}\mbox {-}\textsf {PA}^\omega $
as, for example, given in [Reference Kohlenbach26] (see also [Reference Troelstra60]) and, accordingly, we do not recall all the defining features of
$\textsf {WE}\mbox {-}\textsf {PA}^\omega $
here and only focus on the four main aspects which are relevant in detail for this paper. In general, we denote function types using the bracket notation used in [Reference Kohlenbach26], that is,
$\rho (\tau )$
is the type of functions that map objects of type
$\tau $
to objects of type
$\rho $
, and we use T to denote the set of all finite types as usual, that is,
$$\begin{align*}0\in T,\quad \rho,\tau\in T\Rightarrow \rho(\tau)\in T. \end{align*}$$
As usual, we denote pure types by natural numbers by setting
$n+1:=0(n)$
. The four central properties of
$\textsf {WE}\mbox {-}\textsf {PA}^\omega $
that we need here are that, for one, the only primitive relation is equality at type
$0$
(denoted by
$=_0$
) and higher-type equality is only defined as an abbreviation via recursion with
$$\begin{align*}s=_{\rho(\tau)}t:=\forall x^\tau\left(sx=_\rho tx\right). \end{align*}$$
For another,
$\textsf {WE}\mbox {-}\textsf {PA}^\omega $
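crucially does not contain the full extensionality principles
$$\begin{align*}\forall z^{\tau(\rho)},x^\rho,y^\rho\left(x=_\rho y\rightarrow zx=_\tau zy\right). \end{align*}$$
Instead, it only contains the quantifier-free extensionality rule
$$\begin{align*}\frac{A_0\rightarrow s=_\rho t}{A_0\rightarrow r[s/x^\rho]=_\tau r[t/x^\rho]}, \end{align*}$$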
where
$A_0$
is a quantifier-free formula, s and t are terms of type
$\rho $
and r is a term of type
$\tau $
. This lack of full extensionality is essential for establishing results on program extraction from classical proofs, as the full extensionality axiom is not admissible in the context of the Dialectica interpretation, the main tool used later to extract bounds from (classical) proofs. However, the extensionality rule is admissible in that context and so, from an applied perspective, serves the much-needed purpose of reintroducing an admissible fragment of extensionality into the system, making it as flexible as possible in practice when dealing with proofs from the literature. We refer to [Reference Kohlenbach26] for a detailed discussion on this.
Further,
$\textsf {WE}\text {-}\textsf {PA}^\omega $
contains constants
${\underline {R}}_{\underline {\rho }}$
for simultaneous primitive recursion in the sense of Gödel [Reference Gödel13] and Hilbert [Reference Hilbert18] as governed by the axioms
$$\begin{align} \begin{cases} (R_i)_{\underline{\rho}}0\underline{y}\underline{z}=_{\rho_i}y_i\\ (R_i)_{\underline{\rho}}(Sx)\underline{y}\underline{z}=_{\rho_i}z_i(\underline{R}_{\underline{\rho}}x\underline{y}\underline{z})x \end{cases}\text{ for }i=1,...,k \end{align}$$
where
$\underline {\rho }=\rho _1,\dots ,\rho _k$
is a tuple of types,
$\underline {y}=y_1,\dots ,y_k$
with
$y_i$
of type
$\rho _i$
and
$\underline {z}=z_1,\dots ,z_k$
with
$z_i$
of type
$\rho _i(0)\underline {\rho }^t$
where we write
$\underline {\rho }^t:=(\rho _k)\dots (\rho _1)$
.
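The two recursor axioms above can be read as a simple iteration. The following is a minimal Python sketch, restricted for illustration to values of type $0$ (natural numbers); the names `simultaneous_rec` and `fib` are hypothetical and not part of the formal system. The initial values play the role of $\underline {y}$, and each step function $z_i$ receives the previous tuple of values followed by the recursion argument, matching the type $\rho _i(0)\underline {\rho }^t$.

```python
from typing import Callable, Sequence, Tuple


def simultaneous_rec(x: int, ys: Sequence, zs: Sequence[Callable]) -> Tuple:
    """Return the tuple ((R_1) x ys zs, ..., (R_k) x ys zs)."""
    if len(ys) != len(zs):
        raise ValueError("need one step function z_i per initial value y_i")
    # Base case: (R_i) 0 y z = y_i
    values = tuple(ys)
    # Successor case: (R_i)(S n) y z = z_i (R n y z) n
    for n in range(x):
        values = tuple(z(*values, n) for z in zs)
    return values


def fib(n: int) -> int:
    """Fibonacci numbers via simultaneous recursion on the pair (fib n, fib (n+1))."""
    ys = (0, 1)
    zs = (
        lambda a, b, _n: b,        # next first component
        lambda a, b, _n: a + b,    # next second component
    )
    return simultaneous_rec(n, ys, zs)[0]
```

Here the second component carries the auxiliary value that the first component needs in the next step, which is the typical reason for recursing on several values simultaneously.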
Lastly, due to the inclusion of the combinators of Schönfinkel [Reference Schönfinkel54] in the language of
$\textsf {WE}\mbox {-}\textsf {PA}^\omega $
, the system allows the definition of
$\lambda $
-abstraction in the sense that for any term t of type
$\tau $
and any variable x of type
$\rho $
, we can construct a term
$\lambda x.t$
of type
$\tau (\rho )$
such that the free variables of
$\lambda x.t$
are exactly those of t without x and so that
$$\begin{align*}(\lambda x.t)s=_\tau t[s/x] \end{align*}$$
for any term s of type
$\rho $
.
Next to
$\textsf {WE}\mbox {-}\textsf {PA}^\omega $
, we define, as usual, the principle of quantifier-free choice
$\textsf {QF}\mbox {-}\textsf {AC}$
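, that is,Footnote 7
$$\begin{align*}\forall\underline{x}\exists\underline{y}A_0(\underline{x},\underline{y})\rightarrow\exists\underline{Y}\forall\underline{x}A_0(\underline{x},\underline{Y}\underline{x}) \end{align*}$$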
with
$A_0$
quantifier-free and where the types of the variable tuples
$\underline {x}$
,
$\underline {y}$
are arbitrary, as well as the principle of dependent choice
$\textsf {DC}$
defined as the collection of
$\textsf {DC}^{\underline {\rho }}$
for all tuples of types
$\underline {\rho }$
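with
$$\begin{align*}\forall x^0,\underline{y}^{\underline{\rho}}\exists\underline{z}^{\underline{\rho}}A(x,\underline{y},\underline{z})\rightarrow\exists\underline{f}^{\underline{\rho}(0)}\forall x^0A(x,\underline{f}(x),\underline{f}(S(x))) \end{align*}$$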
where
$\underline {f}^{\underline {\rho }(0)}$
stands for
$f_1^{\rho _1(0)},\dots ,f_k^{\rho _k(0)}$
and A may now be arbitrary.
Over
$\mathcal {A}^\omega $
, we will have to rely on some chosen representation of the real numbers as a Polish space and for that we follow definitions and conventions given in [Reference Kohlenbach26]. In particular, rational numbers are represented using pairs of natural numbers and, in that context, we fix the same pairing function j as in [Reference Kohlenbach25]:
$$\begin{align*}j(n^0,m^0):=\begin{cases}\min u\leq_0(n+m)^2+3n+m[2u=_0(n+m)^2+3n+m]&\text{if existent},\\0^0&\text{otherwise}.\end{cases} \end{align*}$$
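As a small illustration, the pairing function can be sketched in Python as follows. The sketch uses the observation that $(n+m)^2+3n+m=(n+m)(n+m+1)+2n$ is always even, so the bounded search always succeeds and the "otherwise" case is never reached; the function name `j` mirrors the text, everything else is illustrative.

```python
def j(n: int, m: int) -> int:
    """Pairing function: the least u <= s with 2u = s, where s = (n+m)^2 + 3n + m."""
    s = (n + m) ** 2 + 3 * n + m
    # s = (n+m)(n+m+1) + 2n is even, so the search in the definition always finds u = s/2.
    if s % 2 == 0:
        return s // 2
    return 0  # unreachable fallback, kept only to mirror the case distinction


def pairs_up_to(total: int):
    """All pairs (n, m) of natural numbers with n + m <= total."""
    return [(n, d - n) for d in range(total + 1) for n in range(d + 1)]
```

On each diagonal $n+m=d$ the value is $d(d+1)/2+n$, so `j` enumerates the pairs diagonal by diagonal without gaps or repetitions.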
The usual arithmetical operations
$+_{\mathbb {Q}},\cdot _{\mathbb {Q}},\vert \cdot \vert _{\mathbb {Q}}$
, etc., are then primitive recursively definable through terms that operate on such codes and the usual relations
$=_{\mathbb {Q}}$
,
$<_{\mathbb {Q}}$
, etc., are definable via quantifier-free formulas.
For real numbers we then rely on a representation via fast converging Cauchy sequences of rational numbers with a fixed Cauchy modulus
$2^{-n}$
(see [Reference Kohlenbach26] for details), that is, via objects of type
$1$
, and we consider
$\mathbb {N}$
and
$\mathbb {Q}$
as being embedded in that representation via the constant sequences. Also here, the usual arithmetical operations like
$+_{\mathbb {R}}$
,
$\cdot _{\mathbb {R}}$
,
$\vert \cdot \vert _{\mathbb {R}}$
, etc., are primitive recursively definable through closed terms and the relations
$=_{\mathbb {R}}$
and
$<_{\mathbb {R}}$
, etc., now operating on type
$1$
objects, are representable by formulas of the underlying language. Naturally, these relations are not decidable anymore but are given by
$\Pi ^0_1$
- and
$\Sigma ^0_1$
-formulas, respectively.
In the context of this representation of reals, we will later rely on an operator
$\widehat {\cdot }$
which allows for an implicit quantification over all such fast-converging Cauchy sequences of rationals. Following [Reference Kohlenbach26], we define this operator via
$$\begin{align*}\widehat{x}n:=\begin{cases}xn&\text{if }\forall k<_0n\left( \vert xk-_{\mathbb{Q}}x(k+1)\vert_{\mathbb{Q}}<_{\mathbb{Q}}2^{-k-1}\right),\\ xk&\text{for } k<_0n \text{ least with }\vert xk-_{\mathbb{Q}}x(k+1)\vert_{\mathbb{Q}}\geq_{\mathbb{Q}}2^{-k-1}\text{ otherwise},\end{cases} \end{align*}$$
turning x of type
$1$
into a fast-converging Cauchy sequence
$\widehat {x}$
, and we refer to [Reference Kohlenbach26] for any further discussions of its properties.
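The case distinction defining $\widehat {x}$ can be sketched in Python as follows. As a simplifying assumption, the sketch lets x return exact rationals (`Fraction`) directly rather than natural-number codes of rationals; the name `hat` is hypothetical.

```python
from fractions import Fraction
from typing import Callable


def hat(x: Callable[[int], Fraction]) -> Callable[[int], Fraction]:
    """Turn an arbitrary sequence of rationals into a fast-converging Cauchy sequence."""
    def x_hat(n: int) -> Fraction:
        for k in range(n):
            # First index k < n where the Cauchy condition fails: freeze the value x(k).
            if abs(x(k) - x(k + 1)) >= Fraction(1, 2 ** (k + 1)):
                return x(k)
        # The condition holds for all k < n: keep the original value.
        return x(n)
    return x_hat
```

A sequence that already satisfies the condition is left unchanged, while a sequence that violates it at some least index k becomes constant with value x(k) from k onwards. In both cases consecutive values of the result differ by less than $2^{-n-1}$.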
In the context of the bound extraction theorems later on, we will rely on a canonical selection of a Cauchy sequence representing a given real number. Naturally, such an association will be noneffective. However, it will suffice that the operation behaves well enough w.r.t. the notion of majorization. Following [Reference Kohlenbach25], this can be achieved for non-negative numbers via the function
$(\cdot )_\circ $
defined by
$$\begin{align*}(r)_\circ(n):=j(2k_0,2^{n+1}-1), \end{align*}$$
where
$$\begin{align*}k_0:=\max k\left[\frac{k}{2^{n+1}}\leq r\right]. \end{align*}$$
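To illustrate the choice of $k_0$, the following Python sketch computes $k_0$ and the dyadic rational $k_0/2^{n+1}$ for a non-negative rational r. It works with the rational values only and deliberately leaves out the coding of rationals as natural numbers; the names `k0` and `dyadic_approx` are hypothetical.

```python
from fractions import Fraction
from math import floor


def k0(r: Fraction, n: int) -> int:
    """The largest natural number k with k / 2^(n+1) <= r, for r >= 0."""
    if r < 0:
        raise ValueError("this sketch only covers non-negative r")
    return floor(r * 2 ** (n + 1))


def dyadic_approx(r: Fraction, n: int) -> Fraction:
    """The rational k0 / 2^(n+1), which approximates r from below."""
    return Fraction(k0(r, n), 2 ** (n + 1))
```

By maximality of $k_0$, the approximation lies below r with error less than $2^{-(n+1)}$. It is also nondecreasing in n and monotone in r, which is the kind of behavior the majorizability arguments rely on.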
Later, we will need an extension of this function
$(\cdot )_\circ $
to all real numbers such that we retain these nice properties regarding majorizability and so, for
$r<0$
, we consider
$(r)_\circ $
to be defined byFootnote 8
where
$$\begin{align*}\bar{k}_0:=\max k\left[\frac{k}{2^{n+1}}\leq \vert r\vert\right]. \end{align*}$$
Then
$(r)_\circ (n)=-_{\mathbb {Q}}(\vert r\vert )_\circ (n)$
and we get the following lemma containing exactly the properties that we later need for this notion to be useful in the context of majorizability (extending Lemma 2.10 from [Reference Kohlenbach25]):
Lemma 2.1 (essentially [Reference Kohlenbach25, Lemma 2.10], see also [Reference Pischke51, Lemma 2.1]).
Let
$r\in \mathbb {R}$
. Then:
1. $(r)_\circ $ is a representation of r in the sense of the above (see again, e.g., [Reference Kohlenbach26]).
2. For $s\in [0,\infty )$, if $\vert r\vert \leq s$, then $(r)_\circ \leq _1 (s)_\circ $, that is, $(r)_\circ (n)\leq (s)_\circ (n)$ for all $n\in \mathbb {N}$.
3. $(r)_\circ $ is nondecreasing (as a type 1 function).
Lastly, we write
$r_\alpha $
for the unique real represented by
$\widehat {\alpha }$
for a given sequence
$\alpha \in \mathbb {N}^{\mathbb {N}}$
and we sometimes write
$[\alpha ](n)$
for the rational number represented by the n-th element of that sequence for better readability.
In terms of notation, we want to note that, to enhance readability, we will omit the subscripts of the arithmetical operations for
$\mathbb {R}$
everywhere. Further, we will omit the operation
$\cdot _{\mathbb {R}}$
often altogether. Similarly, we will also omit types of variables whenever convenient. Also, we will almost always omit types or related subscripts in proofs. Lastly, we throughout denote the powerset of a set X by
$2^X$
.
3 Systems for algebras of sets
In this section, we develop the underlying systems on which we will later bootstrap our treatment of probability contents and probability measures as well as of all the other respective extensions discussed before. As such, we begin with a treatment of algebras of sets as the most basic underlying algebraic notion that is essential for the theory of contents. For references on these basic definitions and their properties, unless mentioned otherwise, we mainly refer to [Reference Bhaskara Rao and Bhaskara Rao6].
Definition 3.1 (Algebra of sets).
Let
$\Omega $
be a set and
$S\subseteq 2^\Omega $
. Then S is called an algebra of sets (or simply an algebra) if
$\emptyset \in S$
and for any
$A,B\in S$
, it holds that
$A^c:=\Omega \setminus A\in S$
and
$A\cup B\in S$
.
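For finite ground sets, the conditions of Definition 3.1 can be checked directly. The following Python sketch does so; the function name `is_algebra` and the representation of sets as `frozenset` values are illustrative choices only.

```python
from typing import Iterable


def is_algebra(omega: Iterable, family: Iterable[Iterable]) -> bool:
    """Check whether `family` is an algebra of sets over the finite ground set `omega`."""
    ground = frozenset(omega)
    sets = {frozenset(a) for a in family}
    # Every member must be a subset of the ground set.
    if any(not a <= ground for a in sets):
        return False
    # The empty set belongs to the family.
    if frozenset() not in sets:
        return False
    # Closure under complement relative to the ground set.
    if any(ground - a not in sets for a in sets):
        return False
    # Closure under binary union.
    return all(a | b in sets for a in sets for b in sets)
```

For instance, over the ground set {1, 2, 3, 4} the family consisting of the empty set, {1, 2}, {3, 4}, and the whole set is an algebra, whereas the family consisting of the empty set, {1}, and the whole set is not, since the complement of {1} is missing.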
The approach that we take towards a formal system for algebras of sets is to use abstract types to represent both the underlying ground set
$\Omega $
as well as the algebra
$S\subseteq 2^\Omega $
. One then has to restore the structure of S as a collection of subsets over
$\Omega $
with certain operations on them by including additional constants that reintroduce these operations in this abstract setting.
Concretely, to form a system for the treatment of algebras, we extend the previously discussed set of types T by two new abstract types
$\Omega $
and S, forming the extended set of types
$T^{\Omega ,S}$
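defined by
$$\begin{align*}0,\Omega,S\in T^{\Omega,S},\quad \rho,\tau\in T^{\Omega,S}\Rightarrow \rho(\tau)\in T^{\Omega,S}, \end{align*}$$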
and, over the resulting language, we then utilize this augmented set of types to introduce the following new constants to induce the usual structure on the set represented by S in relation to
$\Omega $
as mentioned before:
• $\mathrm {eq}$ of type $0(\Omega )(\Omega )$;
• $\in $ of type $0(S)(\Omega )$;
• $\cup $ of type $S(S)(S)$;
• $(\cdot )^c$ of type $S(S)$;
• $\emptyset $ of type S;
• $c_\Omega $ of type $\Omega $.
The constant
$\mathrm {eq}$
serves as an abstract account of the equality relation between objects of type
$\Omega $
while the constant
$\in $
serves as an abstract account of the element relation between elements as objects of type
$\Omega $
and sets as objects of type S. The constants
$\cup $
and
$(\cdot )^c$
reintroduce the respective operations of union and complement for the abstract type S and
$\emptyset $
provides a constant representing the empty set. The constant
$c_\Omega $
in particular is intended to witness that the underlying set
$\Omega $
is nonempty. We often simply write
$A^c$
instead of
$(A)^c$
for A of type S. Further, we abbreviate
$\in x A =_0 0$
by
$x\in A$
and, similarly, we write
$x\not \in A$
for
$\in x A\neq _0 0$
. Lastly, we define
$\Omega :=\emptyset ^c$
as a notation for the top element of S.Footnote
9
Also regarding notation, we introduce intersections as an abbreviation by defining
$$\begin{align*}A\cap B:=\left(A^c\cup B^c\right)^c \end{align*}$$
for terms
$A^S, B^S$
.
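To make the role of the new constants concrete, the following Python sketch gives one possible finite model of them. It assumes a three-element ground set, takes S to be the full power set, follows the convention that the value 0 encodes truth, and obtains intersections from union and complement via De Morgan. All names are hypothetical.

```python
# Hypothetical finite model: Omega = {0, 1, 2} and S = all subsets of Omega.
OMEGA = frozenset({0, 1, 2})


def eq(x, y):
    """Constant eq of type 0(Omega)(Omega); the value 0 encodes 'true'."""
    return 0 if x == y else 1


def mem(x, A):
    """Constant 'in' of type 0(S)(Omega); the value 0 encodes 'x is in A'."""
    return 0 if x in A else 1


def union(A, B):
    """Constant for union, type S(S)(S)."""
    return A | B


def compl(A):
    """Constant for complement, type S(S)."""
    return OMEGA - A


EMPTY = frozenset()   # constant of type S representing the empty set
C_OMEGA = 0           # constant of type Omega witnessing nonemptiness
TOP = compl(EMPTY)    # the abbreviation for the top element of S


def inter(A, B):
    """Intersection as an abbreviation via De Morgan (an assumption of this sketch)."""
    return compl(union(compl(A), compl(B)))
```

In this model, `mem(x, A) == 0` plays the role of the abbreviation for membership, and `TOP` coincides with the whole ground set.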
We write
$x=_\Omega y$
as an abbreviation of
$\mathrm {eq} xy=_00$
for objects
$x^\Omega ,y^\Omega $
. Using
$\in $
, we introduce equality on S via the following abbreviation: for
$A^S$
and
$B^S$
, we define
$$\begin{align*}A=_SB:=\forall x^\Omega \left(x\in A\leftrightarrow x\in B\right). \end{align*}$$
Note that
$=_S$
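clearly is, provably, an equivalence relation. Furthermore, we introduce the abbreviation
$$\begin{align*}A\subseteq_SB:=\forall x^\Omega \left(x\in A\rightarrow x\in B\right) \end{align*}$$
for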
$A,B$
of type S and it is straightforward to show that
$\subseteq _S$
forms a partial order with respect to equality defined by
$=_S$
.
For axioms, we first specify that
$\mathrm {eq}$
represents an equivalence relation:
Further, we axiomatize that
$\in $
, as a relation, is bounded by
$1$
on all inputs and behaves as an element relation regarding the operations of union and complement as well as with respect to the empty set:
Based on the fact that inclusions of elements
$x^\Omega $
in elements
$A^S$
as facilitated by
$\in $
are quantifier-free assertions, the above axioms are (generalized)
$\Pi _1$
-sentences and so they are in particular immediately admissible in the context of bound extraction theorems based on the Dialectica interpretation (as will be discussed later in more detail).
Definition 3.2. We write
$\mathcal {F}^\omega $
for the system resulting from
$\mathcal {A}^\omega $
over the augmented language including the types
$\Omega ,S$
(where all the respective constants and axioms now are allowed to also refer to these new types, if applicable) by extending this system with the constants
$\mathrm {eq},\in ,\cup ,(\cdot )^c,\emptyset ,c_\Omega $
and the axioms
$(\mathrm {eq})_1$
–
$(\mathrm {eq})_2$
as well as
${({\in })}_1$
–
${({\in })}_4$
.
We now begin by showing some basic properties of the above operations on algebras provable in this system
$\mathcal {F}^\omega $
. For one, these amount to the essential algebraic properties of S as a subalgebra of the full Boolean algebra of the power set of
$\Omega $
. For another, they show that all algebraic operations on S behave in a provably extensional way.
Proposition 3.3. The operations
$\cup $
and
$(\cdot )^c$
are provably extensional in
$\mathcal {F}^\omega $
, that is,
$\mathcal {F}^\omega $
proves:
-
1.
$\forall A^S, {A'}^S, B^S, {B'}^S(A=_SA'\land B=_SB'\rightarrow A\cup B=_SA'\cup B')$
, -
2.
$\forall A^S, {A'}^S(A=_SA'\rightarrow A^c=_S{A'}^c)$
.
Further, all the axioms of Boolean algebras, instantiated using
$\cap ,\cup ,(\cdot )^c$
, and
$=_S$
, can be derived in
$\mathcal {F}^\omega $
. Lastly, over
$\mathcal {F}^\omega $
,
$A \subseteq _S B$
is equivalent to both
$A =_S B \cap A$
and
$B =_S A \cup B$
for terms
$A^S,B^S$
.
Proof. We only show items (1) and (2) as these illustrate the style of proof that one typically follows in
$\mathcal {F}^\omega $
to reason about the algebraic structure of S. The identities of Boolean algebras and the equivalent formulations of the order in terms of meet and join are then easily derived from the axioms
${({\in })}_2,\dots ,{({\in })}_4$
.
-
1. Fix
$A, A', B, B'$
and assume
$A=A'$
as well as
$B=B'$
. We need to show that
$$\begin{align*}\forall x(x\in A\cup B\leftrightarrow x\in A'\cup B'). \end{align*}$$
Let x be arbitrary. Then
$x\in A\cup B$
is equivalent to
$x\in A\lor x\in B$
by
${({\in })}_3$
. By assumption of
$A=A'$
and
$B=B'$
, we have that
$x\in A\lor x\in B$
is equivalent to
$x\in A'\lor x\in B'$
and so to
$x\in A'\cup B'$
by
${({\in })}_3$
, which yields the claim.
-
2. Fix
$A,A'$
and assume
$A=A'$
. We need to show that
$$\begin{align*}\forall x(x\in A^c\leftrightarrow x\in {A'}^c). \end{align*}$$
Let x be arbitrary. Then
$x\in A^c$
is equivalent to
$x \not \in A$
by
${({\in })}_4$
. By assumption of
$A=A'$
, we have that
$x\not \in A$
is equivalent to
$x\not \in A'$
and thus to
$x\in {A'}^c$
again by
${({\in })}_4$
.
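The order characterizations stated at the end of Proposition 3.3 can be sanity-checked by brute force in a small model. The sketch below assumes a three-element ground set with S the full power set and reads equality and inclusion on S pointwise over the ground set. It is a finite illustration only and says nothing about provability in the formal system.

```python
from itertools import combinations

OMEGA = frozenset({0, 1, 2})  # hypothetical ground set


def powerset(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def subset_S(A, B):
    """Pointwise reading of inclusion: every x in A also lies in B."""
    return all((x not in A) or (x in B) for x in OMEGA)


def equal_S(A, B):
    """Pointwise reading of equality on S."""
    return all((x in A) == (x in B) for x in OMEGA)


def order_characterizations_agree():
    """Check that A <= B, A = B & A and B = A | B are equivalent for all A, B."""
    for A in powerset(OMEGA):
        for B in powerset(OMEGA):
            lhs = subset_S(A, B)
            if lhs != equal_S(A, B & A) or lhs != equal_S(B, A | B):
                return False
    return True
```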
Remark 3.4. The constants
$\mathrm {eq}$
and
$\in $
are also immediately provably extensional in
$\mathcal {F}^\omega $
, as can be shown using the quantifier-free extensionality rule.
Using the recursor constants of the underlying language of
$\mathcal {F}^\omega $
in combination with the union operation
$\cup $
immediately allows one to also talk about arbitrary finite unions. Concretely, given a sequence of events
$A^{S(0)}$
and two natural numbers
$n^0\leq _0 m^0$
, we use the abbreviation
$$\begin{align*}\bigcup_{i=n}^{m}A(i):=R_S(m-n,A(n),\lambda B,x.(B\cup A(n+x+1))) \end{align*}$$
where
$R_S$
is a (single) type S recursor constant. For
$m<_0n$
, we simply set
$\bigcup _{i=n}^{m}A(i):=\emptyset $
. We then dually write
$$\begin{align*}\bigcap_{i=n}^mA(i):=\left(\bigcup_{i=n}^m(A(i))^c\right)^c. \end{align*}$$
It is easy to show by induction that the previous extensionality result for
$\cup $
extends to these finite unions.
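The recursive definition of finite unions can be mirrored in Python as follows. This is a sketch under the assumption that the recursor satisfies R(0, y, z) = y and R(k+1, y, z) = z(R(k, y, z), k). The argument order and the helper names are illustrative choices.

```python
def recursor(n, base, step):
    """Primitive recursion: R(0, y, z) = y and R(k+1, y, z) = z(R(k, y, z), k)."""
    acc = base
    for k in range(n):
        acc = step(acc, k)
    return acc


def big_union(A, n, m):
    """Finite union of A(n), ..., A(m); the empty set when m < n."""
    if m < n:
        return frozenset()
    return recursor(m - n, A(n), lambda B, x: B | A(n + x + 1))


def big_inter(A, n, m, omega):
    """Dual finite intersection, obtained by complementing the union of complements."""
    return omega - big_union(lambda i: omega - A(i), n, m)
```

For instance, with `A = lambda i: frozenset({i, i + 1})`, the call `big_union(A, 1, 3)` collects the elements 1 to 4.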
4 Systems for contents on algebras of sets
We now augment the previous system
$\mathcal {F}^\omega $
for the treatment of algebras so that we arrive at a system suitable for treating proofs from the theory of probability contents in the sense of the following definition:Footnote
10
Definition 4.1 (Contents).
Let
$\Omega $
be a set and
$S\subseteq 2^\Omega $
be an algebra. A content on S is a mapping
$\mu :S\to [0,\infty ]$
such that
$\mu (\emptyset )=0$
and
$\mu (A\cup B)=\mu (A)+\mu (B)$
for
$A,B\in S$
with
$A\cap B=\emptyset $
.
We say that
$\mu $
is a probability content if
$\mu (\Omega )=1$
.
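As a small illustration of Definition 4.1, the sketch below builds a probability content on the power set of a three-element set from point weights and checks additivity on disjoint sets. The weights and names are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset({"a", "b", "c"})
WEIGHT = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}  # hypothetical


def content(A):
    """Content of A as the sum of the point weights of its elements."""
    return sum((WEIGHT[x] for x in A), Fraction(0))


def is_additive():
    """Check mu(A | B) = mu(A) + mu(B) for all disjoint A, B in the power set."""
    items = sorted(OMEGA)
    sets = [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]
    return all(content(A | B) == content(A) + content(B)
               for A in sets for B in sets if not (A & B))
```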
We mainly denote probability contents by the symbol
$\mathbb {P}$
. Again, we mainly refer to [Reference Bhaskara Rao and Bhaskara Rao6] as a standard reference for the theory of contents.
The concrete approach that we now take for a formal system for probability contents on algebras is to introduce an additional constant
-
•
$\mathbb {P}$
of type
$1(S)$
to the language of the system
$\mathcal {F}^\omega $
. The first defining properties of
$\mathbb {P}$
as a probability content are then easily formalized in the underlying language as
These statements
$(\mathbb {P})_1$
and
$(\mathbb {P})_2$
are again purely universal statements and therefore immediately admissible in the context of metatheorems based on the (monotone) functional interpretation.
The last property of
$\mathbb {P}$
, that is, additivity, if formalized naively via
$$\begin{align*}\forall A^S, B^S(A\cap B=_S\emptyset\rightarrow \mathbb{P}(A \cup B) =_{\mathbb{R}} \mathbb{P}(A) + \mathbb{P}(B)) \end{align*}$$
is not purely universal based on the internal definition of
$=_S$
and is instead equivalent over
$\mathcal {F}^\omega $
(extended with the constant
$\mathbb {P}$
) to the following generalized
$\Pi _3$
-sentence:
Similar to how the class of so-called type
$\Delta $
sentences is treated in, for example, [Reference Günzel and Kohlenbach16], it is clear that this statement would be admissible in the context of bound extraction theorems based on the monotone Dialectica interpretation if the x could be conceived of as being bounded in a suitable sense relative to A and B. Now, as a matter of fact, a crucial perspective for our formal approach to deriving bound extraction theorems for these systems for algebras and probability contents will be that the whole space
$\Omega $
can be naturally regarded as uniformly bounded. In the context of a corresponding suitable extension of the notion of majorizability to
$\Omega $
which reflects this perspective via assuming that there is a uniform majorant for all
$x^\Omega $
, the above axiom actually has a trivial monotone functional interpretation and is thus admissible in the context of the approach to proof mining metatheorems via such a variant of the Dialectica interpretation. This will be discussed in full formal detail later on so that here, for now, we are content with just considering quantification over
$\Omega $
as “bounded” and so as “proof-theoretically harmless.”
However, if we were to admit the above sentence as the sole axiom, we would be tasked with deriving all the other properties of
$\mathbb {P}$
from this axiom, including monotonicity and thus extensionality of the content as a function, which would require many subtle manipulations of various equalities using the quantifier-free extensionality rule. We therefore instead opt for the following axiomatization which eases the formal development of these properties in the resulting system: For one, instead of additivity, we include the following generalized additivity law which holds for probability contents on algebras:
$$\begin{align*}\forall A^S, B^S\left(\mathbb{P}(A\cup B)=_{\mathbb{R}}\mathbb{P}(A)+\mathbb{P}(B)-\mathbb{P}(A\cap B)\right). \tag*{$(\mathbb{P})_3$} \end{align*}$$
This statement is purely universal and thus immediately admissible in the context of our approach to bound extraction theorems as before. The other property of
$\mathbb {P}$
that we then axiomatically add is that of the monotonicity of
$\mathbb {P}$
, that is,
$$\begin{align*}\forall A^S, B^S\left(A\subseteq_S B\rightarrow \mathbb{P}(A)\leq_{\mathbb{R}}\mathbb{P}(B)\right). \tag*{$(\mathbb{P})_4$} \end{align*}$$
Similar to the above, this statement is equivalent to the following (generalized)
$\Pi _3$
-statement
which in the context of the previously sketched extended notion of majorizability will later have a trivial monotone functional interpretation and thus be admissible in the context of our approach to proof mining metatheorems.
Adding these statements as axioms to the underlying system for algebras of sets, we derive the following system for probability contents on such algebras:
Definition 4.2. We write
$\mathcal {F}^\omega [\mathbb {P}]$
for the system resulting from
$\mathcal {F}^\omega $
by adding the above constant
$\mathbb {P}$
together with the axioms
$(\mathbb {P})_1$
–
$(\mathbb {P})_4$
.
We now begin with some immediate properties of
$\mathbb {P}$
provable in the above system.
Proposition 4.3. The following properties of
$\mathbb {P}$
are provable in
$\mathcal {F}^\omega [\mathbb {P}]$
:
-
1.
$\mathbb {P}$
is extensional w.r.t.
$=_S$
and
$=_{\mathbb {R}}$
, that is,
$$\begin{align*}\forall A^S,B^S\left( A=_SB\rightarrow \mathbb{P}(A)=_{\mathbb{R}}\mathbb{P}(B)\right). \end{align*}$$
-
2.
$\mathbb {P}$
is definite on
$\emptyset $
, that is,
$$\begin{align*}\forall A^S\left(\mathbb{P}(A)>_{\mathbb{R}}0\rightarrow A\neq_S\emptyset\right). \end{align*}$$
-
3.
$\mathbb {P}$
is additive, that is,
$$\begin{align*}\forall A^S, B^S(A\cap B=_S\emptyset\rightarrow \mathbb{P}(A \cup B) =_{\mathbb{R}} \mathbb{P}(A) + \mathbb{P}(B)). \end{align*}$$
-
4.
$\mathbb {P}$
respects the relative complements of subsets, that is,
$$\begin{align*}\forall A^S, B^S(B \subseteq_S A \rightarrow \mathbb{P}(A \cap B^c) =_{\mathbb{R}} \mathbb{P}(A) - \mathbb{P}(B)). \end{align*}$$
In particular, we also have
$$\begin{align*}\forall A^S (\mathbb{P}(A^c) =_{\mathbb{R}} 1 - \mathbb{P}(A)). \end{align*}$$
-
5.
$\mathbb {P}$
satisfies Boole’s inequality, that is,
$$\begin{align*}\forall A^{S(0)}, n^0\,\left(\mathbb{P}\kern1.5pt\left(\bigcup^n_{i=0} A(i)\right) \le_{\mathbb{R}} \sum_{i=0}^n \mathbb{P}(A(i))\right). \end{align*}$$
Proof.
-
1. Assume
$\mathbb {P}(A)>\mathbb {P}(B)$
. By axiom
$(\mathbb {P})_4$
, there exists an x such that
$x\in A$
and
$x\not \in B$
, that is,
$A\neq B$
. Similarly we derive
$A\neq B$
from
$\mathbb {P}(A)<\mathbb {P}(B)$
. Combined, we get that
$A= B$
implies
$\mathbb {P}(A)=\mathbb {P}(B).$
-
2. Assume
$\mathbb {P}(A)>0=\mathbb {P}(\emptyset )$
. Then by axiom
$(\mathbb {P})_4$
, we get an
$x\in A$
, that is,
$A\neq \emptyset $
. -
3. Let
$A,B$
be arbitrary with
$A\cap B=\emptyset $
. We have
$\mathbb {P}(A\cup B)=\mathbb {P}(A)+\mathbb {P}(B)-\mathbb {P}(A\cap B)$
by axiom
$(\mathbb {P})_3$
. As
$\mathbb {P}$
is extensional, we get
$\mathbb {P}(A\cap B)=\mathbb {P}(\emptyset )=0$
so that the above implies
$\mathbb {P}(A\cup B)=\mathbb {P}(A)+\mathbb {P}(B)$
as desired. -
4. Let
$E := A \cap B$
and
$F:= A \cap B^c$
. Then
$E \cap F = \emptyset $
(by the properties of algebras of sets). Thus, by additivity,
$\mathbb {P}(E \cup F) = \mathbb {P}(E) + \mathbb {P}(F)$
. We have that
$E \cup F = A$
(again by the properties of algebras of sets). Thus, by extensionality of
$\mathbb {P}$
, we have
$\mathbb {P}(A) = \mathbb {P}(A \cap B) + \mathbb {P}(A \cap B^c)$
. Now,
$B \subseteq A$
implies
$A \cap B = B$
by Proposition 3.3 and so the result follows from the extensionality of
$\mathbb {P}$
. -
5. This follows via a simple induction from the axiom
$(\mathbb {P})_3$
.
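Several items of Proposition 4.3 can be checked exhaustively in a small finite model. The following sketch assumes a four-point space with hypothetical rational weights and verifies the complement rule, the rule for relative complements, the generalized additivity law, and the two-set case of Boole's inequality.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(4))
WEIGHT = {0: Fraction(1, 10), 1: Fraction(2, 10),
          2: Fraction(3, 10), 3: Fraction(4, 10)}  # hypothetical weights summing to 1


def prob(A):
    """Probability content induced by the point weights."""
    return sum((WEIGHT[x] for x in A), Fraction(0))


def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_prop_4_3():
    sets = subsets(OMEGA)
    for A in sets:
        if prob(OMEGA - A) != 1 - prob(A):                    # complement rule
            return False
        for B in sets:
            if B <= A and prob(A - B) != prob(A) - prob(B):     # relative complement
                return False
            if prob(A | B) != prob(A) + prob(B) - prob(A & B):  # generalized additivity
                return False
            if prob(A | B) > prob(A) + prob(B):                 # Boole, two sets
                return False
    return True
```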
Contents on algebras enjoy certain continuity properties similar to continuity from above and below for measures but without the existence of limiting sets, that is, infinite unions, etc. (see, e.g., [Reference Bhaskara Rao and Bhaskara Rao6]) and we now discuss how the system
$\mathcal {F}^\omega [\mathbb {P}]$
recognizes Cauchy-variants of these properties.
For that, we introduce the following operation on terms of type
$S(0)$
that allows for the implicit quantification over a disjoint countable family of sets: given
$A^{S(0)}$
, we set
$(A\!\uparrow )(0)=A(0)$
and
$$\begin{align*}(A\!\uparrow)(n+1):=A(n+1)\cap \left(\bigcup^n_{i=0} A(i)\right)^c. \end{align*}$$
This operation thus turns A into a sequence of disjoint sets
$A\!\uparrow $
with the same (partial) union(s) and if A was already a disjoint family, then it is left unchanged by the operation.
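The disjointification operation can be sketched in Python as follows, with sequences of sets modelled as functions from natural numbers to frozensets. The sample sequence `A` is a hypothetical choice used only for illustration.

```python
def big_union(A, n):
    """Union of A(0), ..., A(n)."""
    out = frozenset()
    for i in range(n + 1):
        out |= A(i)
    return out


def up(A):
    """Return the disjointified sequence n -> (A up)(n)."""
    def A_up(n):
        if n == 0:
            return A(0)
        # A(n) intersected with the complement of the union of all earlier sets
        return A(n) - big_union(A, n - 1)
    return A_up


def A(i):
    """Hypothetical overlapping sequence of sets."""
    return frozenset({i, i + 1, 2 * i})
```

The resulting sequence `up(A)` is pairwise disjoint, has the same partial unions as `A`, and is left unchanged by a second application of `up`.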
We now begin with a Cauchy-type form of
$\sigma $
-additivity of
$\mathbb {P}$
as a content. For this, note that for a given
$A^{S(0)}$
, the sequence of partial sums
$$\begin{align*}\sum^n_{i=0}\mathbb{P}((A\!\uparrow)(i))=\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^n(A\!\uparrow)(i)\right)=\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^nA(i)\right) \end{align*}$$
is a monotone and bounded sequence of real numbers and thus is Cauchy. The following result that (already a weak fragment of)
$\textsf {WE}\mbox {-}\textsf {PA}^\omega $
suffices to prove the Cauchy formulation of the convergence of monotone and bounded sequences is well known:
Lemma 4.4 (folklore, see essentially [Reference Kohlenbach26]).
The system
$\textsf {WE}\mbox {-}\textsf {PA}^\omega $
(and actually already a weak fragment thereof) proves that
$$ \begin{align*} &\forall a^{1(0)}\Big(\forall n^0\left(0\le_{\mathbb{R}} a(n)\le_{\mathbb{R}} 1\land a(n) \le_{\mathbb{R}} a(n+1)\right)\\ &\qquad\to \forall k^0\exists N^0\forall n^0,m^0\ge_0 N\left(\vert a(n)-a(m)\vert <_{\mathbb{R}} 2^{-k}\right)\Big). \end{align*} $$
So, instantiating the above result with
$a(n)=\sum ^n_{i=0}\mathbb {P}((A\!\uparrow )(i))$
, we can derive that
$\mathcal {F}^\omega [\mathbb {P}]$
(and actually already a weak fragment thereof) can prove the Cauchy property of sequences of contents of increasing disjoint unions:
Proposition 4.5. The system
$\mathcal {F}^\omega [\mathbb {P}]$
(and actually already a weak fragment thereof) proves
$$\begin{align*}\forall A^{S(0)}\forall k^0\exists N^0\forall n^0,m^0\ge_0 N\left( \left\vert \sum_{i=0}^n\mathbb{P}((A\!\uparrow)(i)) - \sum_{i=0}^m\mathbb{P}((A\!\uparrow)(i))\right\vert<_{\mathbb{R}} 2^{-k}\right). \end{align*}$$
From Proposition 4.5, we can then immediately derive the following continuity theorems for contents.
Proposition 4.6. The system
$\mathcal {F}^\omega [\mathbb {P}]$
(and actually already a weak fragment thereof) proves:
-
1.
$\mathbb {P}$
is continuous from below, that is,
$$\begin{align*}\forall A^{S(0)}\left(\forall n^0\left(A(n) \subseteq_S A(n+1)\right) \to\forall k^0\exists N^0\forall n^0,m^0\ge_0 N\left( \left\vert \mathbb{P}(A(n)) - \mathbb{P}(A(m))\right\vert<_{\mathbb{R}} 2^{-k}\right)\right). \end{align*}$$
-
2.
$\mathbb {P}$
is continuous from above, that is,
$$\begin{align*}\forall A^{S(0)}\left(\forall n^0\left(A(n+1) \subseteq_S A(n)\right) \to\forall k^0\exists N^0\forall n^0,m^0\ge_0 N\left( \left\vert \mathbb{P}(A(n)) - \mathbb{P}(A(m))\right\vert<_{\mathbb{R}} 2^{-k}\right)\right). \end{align*}$$
Proof.
-
1. Note that
$A(n) \subseteq A(n+1)$
for all n implies that
$$\begin{align*}(A\!\uparrow)(n+1) = A(n+1)\cap A(n)^c \end{align*}$$
for any n. Thus by Proposition 4.3, (4), we have
$$\begin{align*}\mathbb{P}((A\!\uparrow)(n+1)) = \mathbb{P}(A(n+1))- \mathbb{P}(A(n)) \end{align*}$$
for all n. Thus we have
$$\begin{align*}\sum_{i=0}^n\mathbb{P}((A\!\uparrow)(i))=\mathbb{P}(A(0))+\sum_{i=0}^{n-1}\left(\mathbb{P}(A(i+1))-\mathbb{P}(A(i))\right)=\mathbb{P}(A(n)) \end{align*}$$
for any n. The result now follows from Proposition 4.5.
-
2. Observe that
$A(n+1) \subseteq A(n)$
for any n implies that
$A(n)^c \subseteq A(n+1)^c$
for all n. Thus, by
$(1)$
, we have
$$\begin{align*}\forall k^0\exists N^0\forall n^0,m^0\ge_0 N\left( \left\vert \mathbb{P}(A(n)^c) - \mathbb{P}(A(m)^c)\right\vert<_{\mathbb{R}} 2^{-k}\right). \end{align*}$$
By Proposition 4.3, (4), we get
$\mathbb {P}(A(n)^c)=1-\mathbb {P}(A(n))$
and from this the result follows.
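The telescoping argument used for continuity from below can be illustrated numerically. The sketch assumes the uniform probability content on a six-point space and a hypothetical increasing chain of sets, and it checks that the partial sums of the contents of the disjointified sets agree with the content of the n-th set of the chain.

```python
from fractions import Fraction

N = 6
OMEGA = frozenset(range(N))


def prob(A):
    """Uniform probability content on OMEGA (an assumption of this sketch)."""
    return Fraction(len(A), N)


def A(n):
    """Hypothetical increasing chain with A(n) contained in A(n+1)."""
    return frozenset(range(min(n + 1, N)))


def A_up(n):
    """Disjointified sequence: A(n) minus the union of all earlier sets."""
    if n == 0:
        return A(0)
    earlier = frozenset().union(*(A(i) for i in range(n)))
    return A(n) - earlier


def partial_sum(n):
    """Sum of the contents of (A up)(0), ..., (A up)(n)."""
    return sum((prob(A_up(i)) for i in range(n + 1)), Fraction(0))
```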
5 Systems for
$\sigma $
-algebras and probability measures
If we now further require the closure of the underlying algebra of sets under countable unions, we arrive at the notion of a
$\sigma $
-algebra which forms the algebraic basis for probability measures.
Definition 5.1 (
$\sigma $
-algebra).
Let
$\Omega $
be a set and
$S\subseteq 2^\Omega $
be an algebra. Then S is called a
$\sigma $
-algebra if for any
$(A_n)\subseteq S$
, it also holds that
$\bigcup _{n=0}^\infty A_n\in S$
.
The requirement that a content on a
$\sigma $
-algebra is also well-behaved w.r.t. these countable unions then leads to the notion of a measure on such an algebra.
Definition 5.2 (Measure).
Let
$\Omega $
be a set and
$S\subseteq 2^\Omega $
be a
$\sigma $
-algebra. A measure on S is a content
$\mu :S\to [0,\infty ]$
that is also
$\sigma $
-additive, that is,
$$\begin{align*}\mu\left(\bigcup_{n=0}^\infty A_n\right)=\sum_{n=0}^\infty\mu(A_n) \end{align*}$$
for any
$(A_n)\subseteq S$
with
$A_i\cap A_j=\emptyset $
for
$i\neq j$
.
The map
$\mu $
is called a probability measure if
$\mu (\Omega )=1$
. In that case,
$(\Omega ,S,\mu )$
is called a probability space.
In this section, we will now discuss how the previous system for algebras
$\mathcal {F}^\omega $
and its extension
$\mathcal {F}^\omega [\mathbb {P}]$
for treating probability contents can be augmented by a certain intensional treatment of countably infinite unions to provide an apt and tame formal basis for these notions.
5.1 Treating infinite unions tamely
Concretely, to treat countably infinite unions over algebras of sets tamely, we now extend the previous system
$\mathcal {F}^\omega $
with the following further constant
-
•
$\bigcup $
of type
$S(S(0))$
,
providing a term of type S for the resulting union of the sequence of sets coded by the input of type
$S(0)$
. So, in the context of suitable axioms specifying that
$\bigcup A$
for a given
$A^{S(0)}$
represents the union of all
$A(n)$
, we can formally induce that the algebra S is closed under these countable unions. The immediate axioms specifying the property that
$\bigcup A$
is the corresponding union are
as well as
specifying that
$\bigcup $
is the join of the elements
$A(n)$
in S (seen as a lattice). The first statement
${({\bigcup })}_1$
is immediately admissible in the context of the Dialectica interpretation as it is purely universal. The latter statement is naturally not admissible as is in the context of the usual approach to proof mining metatheorems as it contains a negative universal quantifier of type
$0$
that cannot be majorized (which, after all, is also why a uniform variant of arithmetical comprehension is necessary to fully define countable unions of sets, see, e.g., [Reference Kreuzer35] for further discussions).
Since we want to avoid this strong form of comprehension, so as not to distort the strength of the extracted bounds in general and to be able to guarantee a priori the extractability of proof-theoretically tame bounds from proofs, we opt for the next best thing and instead specify the union only intensionally by adding the following rule-variant of the above converse implication
$$\begin{align} \frac{F_{qf}\to \forall n^0\left( A(n)\subseteq_SB\right)}{F_{qf}\to \bigcup A\subseteq_S B} \tag*{${({\bigcup })}_2$} \end{align}$$
where A is a term of type
$S(0)$
, B is a term of type S and
$F_{qf}$
is a quantifier-free formula. So: If
$A(n)$
is provably bounded above by B w.r.t.
$\subseteq _S$
under some quantifier-free assumptions, then also
$\bigcup A$
is provably bounded above by B w.r.t.
$\subseteq _S$
under the same assumptions.
Definition 5.3. We write
$\mathcal {F}^\omega [\bigcup ]$
for the system resulting from
$\mathcal {F}^\omega $
by extending it with the constant
$\bigcup $
together with the axiom
${({\bigcup })}_1$
and the rule
${({\bigcup })}_2$
. Similarly, we write
$\mathcal {F}^\omega [\bigcup ,\mathbb {P}]$
for the system that arises from
$\mathcal {F}^\omega [\mathbb {P}]$
by adding the same constants, axioms, and rules.
Using this intensional approach to countable unions, we can also immediately provide an intensional treatment of countable intersections. For this, we first define
$A^c$
for an
$A^{S(0)}$
by setting
$A^c(n):= A(n)^c$
for any
$n^0$
. Using this notation, we then define the countable intersection of a collection of sets represented by an
$A^{S(0)}$
via
$$\begin{align*}\bigcap A:=\left(\bigcup A^c\right)^c. \end{align*}$$
We then get that analogs of the axiom
${({\bigcup })}_1$
and the rule
${({\bigcup })}_2$
, formulated appropriately for the intersection, are provable in our system
$\mathcal {F}^\omega [\bigcup ]$
:
Lemma 5.4. The following statement is provable in
$\mathcal {F}^\omega [\bigcup ]$
:
$$\begin{align*}\forall A^{S(0)}\forall n^0\left(\bigcap A\subseteq_S A(n)\right). \end{align*}$$
Further, in
$\mathcal {F}^\omega [\bigcup ]$
the following rule is derivable:
$$\begin{align*}\frac{F_{qf}\to \forall n^0\left( B\subseteq_S A(n)\right)}{F_{qf}\to B\subseteq_S \bigcap A}. \end{align*}$$
Proof. For the provability of the first statement, let
$x\in \bigcap A$
. By definition we have
$x\not \in \bigcup A^c$
. Then by axiom
${({\bigcup })}_1$
, we get
$x\not \in A(n)^c$
, that is,
$x\in A(n)$
for any n.
Now, for the rule, suppose that we provably have
$F_{qf}\to \forall n\left ( B\subseteq A(n)\right )$
. Then we also provably have
$F_{qf}\to \forall n(A(n)^c\subseteq B^c)$
and using the rule
${({\bigcup })}_2$
, we get
$F_{qf}\to \bigcup A^c\subseteq B^c$
. Thus also
$F_{qf}\to B\subseteq \left (\bigcup A^c\right )^c=\bigcap A$
as desired.
5.2 Handling probability measures
As we have seen in Propositions 4.5 and 4.6, the system
$\mathcal {F}^\omega [\mathbb {P}]$
already provides Cauchy-variants of the convergence of monotone sequences of events as well as of sums of disjoint events. In the theory of measures on
$\sigma $
-algebras, the resulting limits of course correspond to the measure of respective infinite unions or intersections. Thus, the natural question of whether and how this can be formally represented in the system
$\mathcal {F}^\omega [\bigcup ,\mathbb {P}]$
immediately arises. And while for a disjoint family represented by
$A^{S(0)}$
, the limit
$$\begin{align*}0\leq \mathbb{P}\kern1.5pt\left(\bigcup A\right)-\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^nA(i)\right)\to 0\text{ for }n\to\infty \end{align*}$$
holds true, there is in general no computable rate of convergence for this expression in the sense that even a function
$\varphi $
of type
$1$
such that
$$ \begin{align} \forall k^0\exists n\leq_0 \varphi(k)\left(\mathbb{P}\kern1.5pt\left(\bigcup A\right)-\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^nA(i)\right)\leq_{\mathbb{R}} 2^{-k}\right) \tag{$\circ$} \end{align} $$
is in general not computable (see Remark 5.5 for an example). Nevertheless, we want to point out that while therefore the convergence of the sequence
$\sum _{i=0}^n\mathbb {P}(A(i))$
towards
$\mathbb {P}(\bigcup A)$
cannot be provable in any system that allows for the extraction of computable and uniform bounds, the system
$\mathcal {F}^\omega [\bigcup ,\mathbb {P}]$
still provides an intensional version of that convergence in the following sense:
-
1. By Proposition 4.5, the sequence of partial sums
$\sum _{i=0}^n\mathbb {P}(A(i))$
is provably Cauchy. -
2. Using the additivity and monotonicity of
$\mathbb {P}$
, that is, axioms
$(\mathbb {P})_3$
and
$(\mathbb {P})_4$
, we get by
$\bigcup _{i=0}^nA(i)\subseteq _S \bigcup A$
that
$$\begin{align*}\sum_{i=0}^n\mathbb{P}(A(i))=_{\mathbb{R}}\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^n A(i)\right)\leq_{\mathbb{R}} \mathbb{P}\kern1.5pt\left(\bigcup A\right) \end{align*}$$
holds provably.
-
3. For any object
$B^S$
such that we provably have
$\forall n^0\left (A(n)\subseteq _S B\right )$
, we get
$\bigcup A\subseteq _S B$
using the rule
${({\bigcup })}_2$
and so we get provably in that case
$$\begin{align*}\mathbb{P}\kern1.5pt\left(\bigcup A\right)\leq_{\mathbb{R}} \mathbb{P}(B) \end{align*}$$
by monotonicity of
$\mathbb {P}$
.
So the value
$\mathbb {P}(\bigcup A)$
is at least intensionally specified to be the limit of the partial sums
$\sum _{i=0}^n\mathbb {P}(A(i))$
as
$\mathbb {P}(\bigcup A)$
is bounded below by this nondecreasing sequence of partial sums and intensionally bounded above by the probability of any set which provably sits above the given partial unions
$\bigcup _{i=0}^nA(i)$
.
The case that we want to make is now twofold: For one, as mentioned in the introduction, the theory of contents already exhausts large parts of the theory of measures in the sense that often already the properties of contents on algebras suffice to carry out proofs for properties of measures on
$\sigma $
-algebras (as will also be the case in the applications discussed later). For another, we want to argue that even in situations where one cannot make do with just finite unions and contents, such an intensional specification of countable unions and their measures might suffice for formalizing a given proof while still guaranteeing a priori the extractability of tame bounds. If that is not the case, then the result under consideration might be considered to be inherently “untame” and a full treatment of the comprehension principle needed to define the respective unions could be necessary. We therefore regard
$\mathcal {F}^\omega [\bigcup ,\mathbb {P}]$
as a suitable tame base system for treating probability measures on
$\sigma $
-algebras.
Remark 5.5. For an example where
$\varphi $
in (
$\circ $
) is not computable, we take
$(r_i)\subseteq (0,1)$
to be a sequence of computable real numbers such that
$$\begin{align*}a_n=\sum_{i=0}^nr_i\to a \le 1 \end{align*}$$
without a computable rate of convergence.Footnote
11
We now define
$\Omega =\mathbb {N}\cup \{\infty \}$
as well as
$S=2^\Omega $
. On the discrete sample space
$\Omega $
, we then define a probability mass function p via
$p(i)=r_{i}$
for
$i\in \mathbb {N}$
as well as
$p(\infty )=1-a$
. This p then as usual induces a probability measure
$\mathbb {P}$
on the
$\sigma $
-algebra S defined for
$A\subseteq \Omega $
via
$$\begin{align*}\mathbb{P}(A):=\sum_{\omega\in A}p(\omega). \end{align*}$$
Clearly
$(\Omega ,S,\mathbb {P})$
is a probability space where
$$\begin{align*}\mathbb{P}(\mathbb{N})-\sum_{i=0}^{n}\mathbb{P}(\{i\})=a-\sum_{i=0}^nr_i=a-a_n \end{align*}$$
cannot have a computable rate of convergence to
$0$
.
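The construction of Remark 5.5 can be traced numerically with a concrete sequence. The sketch below assumes the hypothetical choice r_i = 2^{-(i+2)}, so that a = 1/2, and indexes the points of the natural numbers from 0. This particular sequence does have a computable rate of convergence, so the code only illustrates the identity between the tail of the measure and a - a_n, not the noncomputability phenomenon itself.

```python
from fractions import Fraction

INF = "inf"  # stands for the additional point of the sample space


def r(i):
    """Hypothetical choice r_i = 2^-(i+2), each lying in (0, 1)."""
    return Fraction(1, 2 ** (i + 2))


a = Fraction(1, 2)  # limit of the partial sums of r for this choice


def p(omega):
    """Probability mass function on the natural numbers together with INF."""
    return 1 - a if omega == INF else r(omega)


def a_n(n):
    """Partial sum r_0 + ... + r_n."""
    return sum((r(i) for i in range(n + 1)), Fraction(0))


def tail(n):
    """P(N) minus the probability of {0, ..., n}, using P(N) = 1 - p(INF) = a."""
    prob_N = 1 - p(INF)
    return prob_N - sum((p(i) for i in range(n + 1)), Fraction(0))
```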
6 Intensional intervals, inverse mappings and measurable functions
In this section, we now extend the machinery of the previous logical systems so that we are able to deal with functions
$f:\Omega \to \mathbb {R}$
that are measurable in a certain suitable sense relative to algebras. As such, the treatment given here will be instrumental for our approach to integrable functions in Section 7, for the applications discussed in Section 9, and for the proof-theoretic transfer principles for implications between modes of convergence in Section 10. For this, we now first recall the essential definitions and basic results.
Definition 6.1 (Borel
$\sigma $
-algebra).
Let X be a topological space. The Borel
$\sigma $
-algebra
$\mathcal {B}(X)$
on X is the smallest
$\sigma $
-algebra on X that contains all open subsets of X.
We refer to [Reference Halmos17] as a standard reference for Borel
$\sigma $
-algebras in particular and measure theory in general (in particular regarding the well-definedness of the above definition for which one needs to see that the intersection of any family of (
$\sigma $
-)algebras is again a (
$\sigma $
-)algebra).
Crucial for us will be the notion of a generating set of a (
$\sigma $
-)algebra.
Definition 6.2 (Generators of a (
$\sigma $
-)algebra).
Let
$\Omega $
be a set and
$S\subseteq 2^\Omega $
be a (
$\sigma $
-)algebra. A generating set for S is a set
$S_0\subseteq 2^\Omega $
such that S is the smallest (
$\sigma $
-)algebra containing
$S_0$
.
In that terminology, the Borel
$\sigma $
-algebra
$\mathcal {B}(X)$
is the
$\sigma $
-algebra generated by the open subsets of the underlying topological space.
For the special case of the real numbers as a topological space with the usual topology induced by the metric distance, we in particular get the following canonical generators besides the open subsets of
$\mathbb {R}$
.
Lemma 6.3 (folklore, see, e.g., [Reference Halmos17]).
The Borel
$\sigma $
-algebra
$\mathcal {B}(\mathbb {R})$
on
$\mathbb {R}$
is generated by the collection of all closed intervals
$\{[a,b]\mid a,b\in \mathbb {R}\}$
.
One then arrives at the notion of a measurable function which we for simplicity only formulate for real-valued functions here.
Definition 6.4 (Borel-measurable function).
Let
$(\Omega ,S,\mu )$
be a content space. A function
$f:\Omega \to \mathbb {R}$
is called Borel-measurable if
$$\begin{align*}f^{-1}(B)\in S \end{align*}$$
for all
$B\in \mathcal {B}(\mathbb {R})$
.
As we will crucially use later on, Borel-measurability is simply characterized by a similar condition on a generating set.
Proposition 6.5 (folklore, see, e.g., [Reference Halmos17]).
Let
$(\Omega ,S,\mu )$
be a measure space. A function
$f:\Omega \to \mathbb {R}$
is Borel-measurable if, and only if,
$f^{-1}([a,b])\in S$
for all
$a,b\in \mathbb {R}$
.
If the underlying algebra S is not a
$\sigma $
-algebra and if
$\mu $
is only a content, then the requirement that the preimages of the above collection of intervals are included is still statable but might result in something weaker than full Borel-measurability. We call a function
$f:\Omega \to \mathbb {R}$
weakly Borel-measurable if the preimages of all intervals
$[a,b]$
for
$a,b\in \mathbb {R}$
are included in a corresponding algebra
$S\subseteq 2^\Omega $
. It is in particular this notion of weakly Borel-measurable functions that we will rely on later in the context of our approach towards Lebesgue integrals for probability contents. It should be noted that by requiring the inclusion
$f^{-1}([a,b])\in S$
for two real numbers
$a,b\in \mathbb {R}$
and an algebra S, we in particular also obtain that
$f^{-1}([a,b))=f^{-1}([a,b])\cap \left (f^{-1}([b,b])\right )^c\in S$
as S is an algebra.Footnote
12
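The identity expressing preimages of half-open intervals through preimages of closed intervals can be checked on a small example. The sketch assumes a finite ground set and a hypothetical function f taking values that are exactly representable as floating-point numbers.

```python
OMEGA = frozenset(range(8))


def f(x):
    """Hypothetical real-valued function on OMEGA."""
    return x / 2.0


def preimage_closed(a, b):
    """Preimage of the closed interval [a, b] under f."""
    return frozenset(x for x in OMEGA if a <= f(x) <= b)


def preimage_half_open(a, b):
    """Preimage of the half-open interval [a, b) under f."""
    return frozenset(x for x in OMEGA if a <= f(x) < b)


def half_open_via_closed(a, b):
    """Preimage of [a, b) as preimage of [a, b] minus the preimage of [b, b]."""
    return preimage_closed(a, b) & (OMEGA - preimage_closed(b, b))
```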
To formally deal with the notion of (weak) Borel-measurability, we thus need access to the collection of the closed intervals
$[a,b]$
for
$a,b\in \mathbb {R}$
generating the Borel-algebra. For this, we will introduce an intensional approach to real intervals in the next subsection to provide formal means of operating on these generators. These intensional variants of real intervals can then be processed by a general type of inverse map using which we will be able to state the measurability of a function formally.
6.1 Intensional Intervals
Concretely, we now provide a quantifier-free (and thus in a way intensional) account of the closed intervals
$[a,b]$
(and thus also of the half-open intervals
$[a,b)$
as discussed above) by introducing a further constant to the language:
• $[\cdot ,\cdot ]$ of type $0(1)(1)(1)$.
Given two inputs
$a^1, b^1$
, this function shall return a characteristic function for an intensional representation of the corresponding interval. For this, we add the following axioms:
Here, we wrote
$[a,b]$
for
$[\cdot ,\cdot ]ab$
as well as
$r\in [a,b]$
for
$[a,b](r)=_00$
. Note that this is an intensional representation of the set as we have
$a,b\in [a,b]$
but we cannot conclude from
$r=a$
or
$r=b$
that
$r\in [a,b]$
.
We also introduce the following notation used later: if we are given a system
$\mathcal {C}^\omega $
, we write
$\mathcal {C}^\omega [\mathrm {Int}]$
to denote the extension of
$\mathcal {C}^\omega $
by the above constant
$[\cdot ,\cdot ]$
and the axioms
$([\cdot ,\cdot ])_1$
–
$([\cdot ,\cdot ])_4$
for treating the closed intervals.
Similarly to the discussion above, these closed intervals then also provide quantifier-free access to the half-open intervals in the following way: We define
$[\cdot ,\cdot )$
of type
$0(1)(1)(1)$
via
Also for
$[\cdot ,\cdot )$
, we write
$[a,b)$
for
$[\cdot ,\cdot )ab$
as well as
$r\in [a,b)$
for
$[a,b)(r)=_00$
.
From this definition it follows almost immediately that $[\cdot ,\cdot )$ satisfies properties that intensionally specify the half-open intervals, similar to how the axioms $([\cdot ,\cdot ])_1$ – $([\cdot ,\cdot ])_4$ specify the closed intervals above. For one, we have $a\in [a,b)$ (for $a<_{\mathbb {R}}b$), but from $r=a$ we cannot infer $r\in [a,b)$. Likewise, we have $b\not \in [a,b)$, but from $r=b$ we cannot infer $r\not \in [a,b)$. This is collected in the following lemma:
Lemma 6.6. The system
$\mathcal {A}^\omega [\mathrm {Int}]$
proves the following properties of
$[\cdot ,\cdot )$
:
1. $\forall a^1,b^1,r^1\left ([a,b)(r)\leq _0 1\right )$,
2. $\forall a^1,b^1,r^1\left (r\in [a,b)\to a\leq _{\mathbb {R}} r<_{\mathbb {R}}b\right )$,
3. $\forall a^1,b^1,r^1\left (a<_{\mathbb {R}}r<_{\mathbb {R}} b\to r\in [a,b)\right )$,
4. $\forall a^1,b^1\left (a<_{\mathbb {R}}b\to a\in [a,b)\land b\not \in [a,b)\right )$.
We omit the proof as it is rather immediate.
6.2 The inverse map
We now provide a treatment of the inverse map for a given function f of type
$1(\Omega )$
. For this, we actually introduce a uniform operator into the language via a constant
• $(\cdot )^{-1}$ of type $0(\Omega )(0(1))(1(\Omega ))$
that provides an inverse map for any given function
$f^{1(\Omega )}$
in the sense that, writing
$f^{-1}$
for
$(\cdot )^{-1}f$
, the functional
$f^{-1}$
receives a subset of the reals coded via a characteristic function
$A^{0(1)}$
and maps this into a characteristic function
$f^{-1}A$
of type
$0(\Omega )$
coding a subset of the underlying space
$\Omega $
.
This type of map is then governed by the following two axioms:
Here, we wrote
$f(x)\in A$
for
$Af(x)=_00$
and
$x\in f^{-1}A$
for
$f^{-1}Ax=_00$
.
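The behaviour of such an inverse map can be mimicked by ordinary function composition. The sketch below is only an extensional toy model with hypothetical names: it assumes the reading that $x\in f^{-1}A$ holds exactly when $f(x)\in A$, represents reals by Python numbers, and follows the convention that the value 0 encodes membership.

```python
def closed_interval(a, b):
    """Characteristic function of [a, b]; the value 0 encodes membership."""
    return lambda r: 0 if a <= r <= b else 1


def inverse(f):
    """Return f^{-1}, mapping a characteristic function A on values
    to the characteristic function x -> A(f(x)) on points."""
    return lambda A: (lambda x: A(f(x)))


def square(x):
    return x * x


# x lies in square^{-1}([1, 4]) exactly when 1 <= x*x <= 4.
pre = inverse(square)(closed_interval(1, 4))
```

Unlike the constants of the formal system, this toy version decides membership at the endpoints, so it does not capture the intensional character of the axiomatized intervals.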
Also for this type of extension, we introduce the following generic notation: given a system
$\mathcal {C}^\omega $
, we write
$\mathcal {C}^\omega [\mathrm {Inv}]$
to denote the extension of
$\mathcal {C}^\omega $
by the constant
$(\cdot )^{-1}$
and the above axioms for treating the inverse map.
6.3 Measurability of functions
In the context of the intensional representations for closed intervals, generating the Borel
$\sigma $
-algebra on the reals, as well as using the general inverse map introduced before, we are now in the position of formulating the (weak) Borel-measurability of a function
$f^{1(\Omega )}$
formally in the underlying language by
The inner matrix (based on the fact that stating element relations via
$\in $
is quantifier-free and as we use the abbreviation
$x\in f^{-1}([a,b])$
for
$f^{-1}[a,b]x=_00$
as introduced in Section 6.2) is quantifier-free and thus the above sentence is a generalized
$\Pi _3$
-statement. Similar to the monotonicity statement of the content
$\mathbb {P}$
, this statement would be admissible in the context of the monotone Dialectica interpretation if the quantification over elements of type S could be conceived of as being bounded in a suitable sense. Similar to how we argued with the monotonicity of
$\mathbb {P}$
, a crucial perspective for our formal approach to bound extraction theorems later on will be that, besides the whole space
$\Omega $
, we will also be able to regard the space S as uniformly bounded, formally encapsulated via a corresponding suitable extension of the notion of majorizability to S used later. In that context, such a sentence has a trivial monotone functional interpretation and we will use this later to formulate admissible axioms stating that certain classes of functions are indeed measurable.
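For orientation, a sentence matching the description above (a universal quantifier over the interval endpoints, an existential quantifier over a set of type S, and a universal quantifier over points, followed by a quantifier-free matrix) can be sketched as follows. This is a hedged reading of the intended shape, not a quotation of the formal statement.

```latex
\forall a^1, b^1\; \exists A^S\; \forall x^\Omega
  \left( x\in f^{-1}([a,b]) \leftrightarrow x\in A \right)
```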
7 Treating integration over probability contents
We now want to extend the previous system
$\mathcal {F}^\omega [\mathbb {P}]$
for probability contents on algebras so that we can treat a certain class of integrable functions
$f:\Omega \to \mathbb {R}$
, in particular so that we obtain a suitable base for random variables and their moments as used in various applications, especially those illustrated in Section 9.
For the usual approach to the integral over contents, which mimics that of the Lebesgue integral, we mainly follow the exposition given in [Reference Bhaskara Rao and Bhaskara Rao6] (where the corresponding notion is introduced under the name of the “D-integral”) which we detail here to some degree to provide the necessary mathematical basics for the axiomatizations chosen later. Concretely, let
$\Omega $
be a set and S an algebra on it and let
$\mu $
be a finite content on S (i.e.,
$\mu (\Omega )<+\infty $
). One then first arrives at a notion of simple function that is completely analogous to how it is usually defined in the context of measure theory, that is, a simple function is a function
$f:\Omega \to \mathbb {R}$
of the form
$$\begin{align*}f(x)=\sum_{i=0}^n b_i\chi_{A_i}(x) \end{align*}$$
for given sets
$A_i\in S$
and values
$b_i\in \mathbb {R}$
.Footnote
13
For such a function f, the integral over
$\mu $
is simply defined as
$$\begin{align*}\int{f}\,\mathrm{d}\mu=\sum_{i=0}^n b_i\mu(A_i), \end{align*}$$
also in similarity to usual Lebesgue integrals over measures. A general function
$f:\Omega \to \mathbb {R}$
is now declared integrable if there is a sequence of simple functions
$f_n$
such that (1) the
$f_n$
converge to f in a suitable senseFootnote
14
and (2) the sequence satisfies
Such a sequence is called a determining sequence for f and using such a determining sequence, one then defines the integral of f via
Crucially, this limit is well-defined as the following general result shows:
Lemma 7.1 (Lemma 4.4.12 in [Reference Bhaskara Rao and Bhaskara Rao6]).
Let f be an integrable function and let
$(f_n)$
be a determining sequence for f. Then
$f_n-f$
is integrable and
This notion of a (D-)integral defined for contents then shares many of the familiar properties of Lebesgue integrals defined for measures. One particularly useful property is that every integrable function f is measurable in the following extended sense, called
$T_2$
-measurable in [Reference Bhaskara Rao and Bhaskara Rao6]: for any
$\varepsilon>0$
, there is a partition
$A_0,\dots , A_n\in S$
of
$\Omega $
such that
$\mu (A_0)<\varepsilon $
and
$\vert f(x)-f(y)\vert <\varepsilon $
for any
$x,y\in A_i$
and any i. In particular, for any function f that is measurable in this sense, one gets that f is integrable if, and only if,
$\vert f\vert $
is integrable (see, e.g., Corollary 4.4.19 in [Reference Bhaskara Rao and Bhaskara Rao6]).
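For orientation, the conditions on a determining sequence $(f_n)$ and the resulting definition of the integral can be sketched in the form in which they are usually stated for the D-integral. The exact formulation below is an assumption about the standard theory and should be checked against [Reference Bhaskara Rao and Bhaskara Rao6].

```latex
% (2): the simple functions f_n form a Cauchy sequence in mean
\lim_{n,m\to\infty}\int \vert f_n-f_m\vert \,\mathrm{d}\mu = 0,
% the integral of f defined through a determining sequence
\int f\,\mathrm{d}\mu := \lim_{n\to\infty}\int f_n\,\mathrm{d}\mu,
% the shape of the conclusion of Lemma 7.1
\lim_{n\to\infty}\int \vert f_n-f\vert \,\mathrm{d}\mu = 0.
```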
Now, a major part of the theory of integrals over contents (like, e.g., a nice correspondence between the so-called D- and S-integrals, the latter being similar in spirit to a Riemann-Stieltjes integral, and the fact that the notions of
$T_2$
-measurability and integrability coincide as shown by Theorem 4.5.7 in [Reference Bhaskara Rao and Bhaskara Rao6], among many others) depends on the assumption that the integrated functions are bounded. In that way, we will similarly require that all integrated functions are bounded.
In fact, using the proof-theoretic perspective of the approach taken here, we find that this assumption of the boundedness of functions is also suggested as a necessity in our formal approach by the notion of majorizability employed later. Concretely, as discussed before, to develop a proof-theoretically tame theory of algebras and contents, we have regarded
$\Omega $
and S as uniformly bounded in the sense that we later regard all elements of these spaces as uniformly majorized by the content of the full space
$\mu (\Omega )$
, that is, by
$1$
in the context of a probability content, which we denote in writing by
$1\gtrsim _\Omega x$
and
$1\gtrsim _S A$
for
$x\in \Omega $
and
$A\in S$
. While this will be discussed comprehensively and in full formal detail later, we here already look at what this definition entails for majorizable functions f of type
$1(\Omega )$
: a function
$f^*$
of type
$0(0)(0)$
is a majorant for f, written
$f^*\gtrsim _{1(\Omega )} f$
, if
Therefore, as
$1\gtrsim _\Omega x$
for any x, we in particular derive that
for any x, using the monotonicity of the coding of rational numbers (see, e.g., the discussion on p. 430 in [Reference Kohlenbach26]). Thus, the real number represented by
$f(x)$
is uniformly bounded by
$f^*(1)(0)+1$
for any x and so any majorizable function of type
$1(\Omega )$
represents a bounded function
$\Omega \to \mathbb {R}$
. In other words, boundedness of integrable functions is suggested as a necessary assumption by the chosen proof-theoretic methodology.
Lastly, we will actually require that the integrated functions are not only
$T_2$
-measurable in the sense discussed above but that they even are weakly Borel-measurable in the sense discussed before (i.e., that the preimages of closed intervals are included in the underlying algebra). Clearly, any bounded and weakly Borel-measurable function is also
$T_2$
-measurable in the previous sense and thus integrable as discussed before. While this class is slightly restricted compared to that of all integrable functions, we find that it has two central advantages: For one, it allows for a particularly smooth and proof-theoretically tame approach to the integral that is amenable to bound extraction theorems. For another, in virtually all previous ad hoc proof mining applications to measure and probability theory involving integrals, already the stronger (or, in the context of
$\sigma $
-algebras, equivalent) property of being Borel-measurable is assumed so that a treatment of this class still allows for our metatheorems established later to provide a proof-theoretic explanation of the respective extractions. Moreover, this is in particular true for the applications discussed later in Section 9. In that way, the approach to the integral via this assumption should be understood to be merely indicative regarding the possibilities of the present formal approach, where various other measurability and integrability notions could similarly be accommodated.
Now, the formal approach to the integral over a probability content is to abstractly encode a suitable subspace of bounded and weakly Borel-measurable functions closed under linear combinations, multiplications with characteristic functions, and absolute valuesFootnote 15 intensionally via the use of a characteristic function and then to introduce the integral for all functions from this space as well as the relevant closure properties using further constants and axioms. For this, we now initially introduce two further constants
• I of type $0(1(\Omega ))$,
• $\left \lVert \cdot \right \rVert _\infty $ of type $1(1(\Omega ))$,
into the language of
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Int},\mathrm {Inv}]$
. The first of these is the previously mentioned characteristic function providing an intensional account of a space closed under linear combinations, multiplications with characteristic functions and absolute values as well as containing only bounded and weakly Borel-measurable functions and the latter is used to formally introduce the supremum norm on these functionals. As initial axioms, we now therefore stipulate the following:
The axioms
$(I)_3$
and
$(I)_5$
will again be admissible later because of the extended notion of majorizability on
$\Omega $
and S. Note also that axioms
$(I)_4$
and
$(I)_5$
together specify
$\left \lVert f\right \rVert _\infty $
as the least upper bound on
$\vert f(x)\vert $
.Footnote
16
In the following, we will write
$\chi _A$
for the function
$\lambda x.(x\in A^c)$
.Footnote
17
Also, in the following we just briefly write
$\alpha f+ \beta g$
for
$\lambda x.(\alpha f(x)+\beta g(x))$
as well as
$\vert f\vert $
for
$\lambda x.(\vert f(x)\vert )$
and
$f\chi _A$
for
$\lambda x.\left (f(x)\chi _A(x)\right )$
.
As the operations $\max $ and $\min $ can be defined on functions using the absolute value via
$$\begin{align*}\max\{f,g\}=\frac{f+g+\vert f-g\vert}{2}\quad\text{and}\quad\min\{f,g\}=\frac{f+g-\vert f-g\vert}{2}, \end{align*}$$
we immediately get that axiom $(I)_7$, together with the closure under linear combinations, implies the closure of I under these operations. We have thus effectively axiomatized that I is in particular a Riesz space of bounded functions with the respective operations, and so our approach is similar to how abstract integration spaces are approached in the context of Daniell integrals [Reference Daniell7], where a similar collection of functions is presumed.
To deal with the integral, we add a further constant
• $\int {\cdot }\,\mathrm {d}\mathbb {P}$ of type $1(1(\Omega ))$.
The first two axioms for the integral are now that it behaves as expected on characteristic functions and that the integral is a linear function:
$$\begin{align} &\forall A^S\left( \int{\chi_A}\,\mathrm{d}\mathbb{P}=_{\mathbb{R}}\mathbb{P}(A)\right), \end{align}$$
$$\begin{align} &\forall f^{1(\Omega)}, g^{1(\Omega)},\alpha^1,\beta^1\left( f,g\in I\to \int{(\alpha f+\beta g)}\,\mathrm{d}\mathbb{P}=_{\mathbb{R}} \alpha\int{f}\,\mathrm{d}\mathbb{P}+\beta\int{g}\,\mathrm{d}\mathbb{P}\right). \end{align}$$
Using these two axioms, we immediately get that the integral behaves as expected on simple functions.
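The step from the two axioms to simple functions can be made concrete on a finite toy space. The following is a minimal sketch with assumed, hypothetical names: a simple function is a list of pairs `(b_i, A_i)`, and the content is the uniform probability content on the power set of a finite set. Linearity and the value on characteristic functions then force the integral to be the weighted sum of the contents.

```python
from fractions import Fraction


def uniform_content(omega):
    """Uniform probability content on all subsets of a finite set omega."""
    n = len(omega)
    return lambda A: Fraction(len(A), n)


def evaluate_simple(terms, x):
    """Value at x of the simple function sum_i b_i * chi_{A_i}."""
    return sum((Fraction(b) for b, A in terms if x in A), Fraction(0))


def integrate_simple(terms, content):
    """Integral of sum_i b_i * chi_{A_i}, computed as sum_i b_i * content(A_i)."""
    return sum((Fraction(b) * content(A) for b, A in terms), Fraction(0))


# Toy data (assumed for illustration only); the sets may overlap.
OMEGA = frozenset(range(4))
P = uniform_content(OMEGA)
TERMS = [(2, frozenset({0, 1})), (-1, frozenset({1, 2, 3}))]

integral = integrate_simple(TERMS, P)
# On a finite space the same value arises as the weighted sum of point values.
pointwise = sum(evaluate_simple(TERMS, x) * P({x}) for x in OMEGA)
```

Both `integral` and `pointwise` agree here, which reflects that the representation of a simple function by overlapping sets does not affect the value of the integral.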
The major other statement that we need to axiomatize is that any function in I is actually integrable in the sense that its integral is well-defined and arises as the limit of a sequence of integrals of simple functions. In the context of our standing assumption that our functions are bounded and weakly Borel-measurable, we can in particular derive the following general result that guarantees this property, and inspires the subsequent axiom:
Lemma 7.2 (essentially folklore).
Let
$\Omega $
be a set, S an algebra on
$\Omega $
and
$\mathbb {P}$
a probability content on S. Let f be a weakly Borel-measurable function and assume that
$\vert f\vert $
is bounded by
$b\in \mathbb {N}^*$
, that is,
$\vert f(x)\vert \leq b$
for all
$x\in \Omega $
. For a given k, define
$$\begin{align*}I_{k,i}=\left[-b+\frac{i}{2^k},-b+\frac{i+1}{2^k}\right) \text{ for }i=0,1,\dots,b2^{k+1}-2\text{ and }I_{k,b2^{k+1}-1}=\left[\frac{b2^k-1}{2^k},b\right]. \end{align*}$$
Then:
$$\begin{align*}\forall k\in\mathbb{N}\left(\int{\left\vert f-\sum_{i=0}^{b2^{k+1}-1}\left(-b+\frac{i}{2^{k}}\right)\chi_{f^{-1}(I_{k,i})}\right\vert}\,\mathrm{d}\mathbb{P}\leq 2^{-k}\right). \end{align*}$$
Proof. Let
$k\in \mathbb {N}$
. As all
$I_{k,i}$
are disjoint and cover
$[-b,b]$
and since
$\vert f\vert $
is bounded by b, their preimages under f are disjoint and cover
$\Omega $
. Thus
$$\begin{align*}1=\mathbb{P}(\Omega)=\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^{b2^{k+1}-1}f^{-1}(I_{k,i})\right)=\sum_{i=0}^{b2^{k+1}-1}\mathbb{P}(f^{-1}(I_{k,i})) \end{align*}$$
and for any
$k\in \mathbb {N}$
:
$$\begin{align*}f(x)=\sum_{i=0}^{b2^{k+1}-1} f(x)\chi_{f^{-1}(I_{k,i})}(x). \end{align*}$$
Further, for
$x\in f^{-1}(I_{k,i})$
, it clearly holds that
$$\begin{align*}\left\vert f(x)-\left(-b+\frac{i}{2^{k}}\right)\right\vert\leq \frac{1}{2^k}. \end{align*}$$
As k was arbitrary, the function f is integrable by Theorem 4.5.7 in [Reference Bhaskara Rao and Bhaskara Rao6] (use, e.g., the equivalence between (viii) and (v)). Thus, using the monotonicity and linearity of the integral on contents (see, e.g., Theorem 4.4.13 in [Reference Bhaskara Rao and Bhaskara Rao6]), we have
$$ \begin{align*} \int{\left\vert f-\sum_{i=0}^{b2^{k+1}-1}\left(-b+\frac{i}{2^{k}}\right)\chi_{f^{-1}(I_{k,i})}\right\vert}\,\mathrm{d}\mathbb{P}&=\int{\left\vert\sum_{i=0}^{b2^{k+1}-1}\left(f+b-\frac{i}{2^{k}}\right)\chi_{f^{-1}(I_{k,i})}\right\vert}\,\mathrm{d}\mathbb{P}\\&\leq \sum_{i=0}^{b2^{k+1}-1}\int{\left\vert f-\left(-b+\frac{i}{2^{k}}\right)\right\vert\chi_{f^{-1}(I_{k,i})}}\,\mathrm{d}\mathbb{P}\\&\leq \sum_{i=0}^{b2^{k+1}-1}\int{\frac{1}{2^k}\chi_{f^{-1}(I_{k,i})}}\,\mathrm{d}\mathbb{P}\\&\leq \sum_{i=0}^{b2^{k+1}-1}\frac{1}{2^k}\mathbb{P}(f^{-1}(I_{k,i}))\\&\leq \frac{1}{2^k}\sum_{i=0}^{b2^{k+1}-1}\mathbb{P}(f^{-1}(I_{k,i}))=\frac{1}{2^k}.\\[-47pt] \end{align*} $$
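The approximation in Lemma 7.2 can be checked numerically on a finite toy space. The sketch below rests on assumptions that are not part of the lemma: the space is finite, every singleton belongs to the algebra, the content is given by a dictionary `weights` of nonnegative rationals summing to 1, and all names are hypothetical. It computes the index $i$ of the dyadic interval $I_{k,i}$ containing $f(x)$ and the resulting mean error.

```python
import math
from fractions import Fraction


def dyadic_index(value, b, k):
    """Index i with value in I_{k,i}, for -b <= value <= b.

    The intervals are half-open except the last one, which is closed
    at b; hence the index is capped at b * 2**(k+1) - 1.
    """
    last = b * 2 ** (k + 1) - 1
    return min(math.floor((value + b) * 2 ** k), last)


def approximation_error(f, omega, weights, b, k):
    """Integral of |f - s_k|, where s_k takes the value -b + i / 2**k
    on the preimage of I_{k,i}."""
    error = Fraction(0)
    for x in omega:
        i = dyadic_index(f(x), b, k)
        step_value = -b + Fraction(i, 2 ** k)
        error += abs(f(x) - step_value) * weights[x]
    return error


# Toy data (assumed for illustration only): values -1, -2/3, ..., 1 and bound b = 1.
OMEGA = range(7)
WEIGHTS = {x: Fraction(1, 7) for x in OMEGA}


def f(x):
    return Fraction(x, 3) - 1
```

For every `k`, the value `approximation_error(f, OMEGA, WEIGHTS, 1, k)` stays below `Fraction(1, 2 ** k)`, in line with the bound of the lemma.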
To axiomatize the integrability of a function
$f\in I$
, it thus suffices to state the conclusion of the above lemma and although we could formalize this directly by employing the general inverse mapping and intensional intervals, we can actually avoid this machinery here at the mild expense of quantifying over the sequence of sets used in the simple functions instead of explicitly specifying them. Concretely, we consider the following third axiomFootnote
18
$$\begin{align} \forall f^{1(\Omega)}\forall k^0\exists A^{S(0)}\left(f\in I\to \int{\left\vert f-\sum_{i=0}^{2^{k+1}b_f-1}\left(-b_f+\frac{i}{2^{k}}\right)\chi_{A(i)}\right\vert}\,\mathrm{d}\mathbb{P}\leq_{\mathbb{R}} 2^{-k}\right), \end{align}$$
where we wrote
$b_f:=\left \lVert f\right \rVert _\infty (0)+1$
and have used that, since
$\left \lVert f\right \rVert _\infty \geq _{\mathbb {R}} \vert f(x)\vert $
for all x, it holds that the natural number
$b_f$
similarly bounds
$\vert f\vert $
. Again, by the later considerations on majorizability whereby also A of type
$S(0)$
can be regarded as uniformly bounded, this axiom will later be admissible in the context of our approach to proof mining metatheorems via the monotone functional interpretation.
Lastly, to devise a practical system, it will be convenient to include axiomatically (instead of discussing how it might be provable in the system) that the integral of a nonnegative function
$f\in I$
is nonnegative. Naively, this statement can be written as
$$\begin{align*}\forall f^{1(\Omega)}\left( f\in I\land \forall x^\Omega\left( f(x)\geq_{\mathbb{R}} 0\right)\to \int{f}\,\mathrm{d}\mathbb{P}\geq_{\mathbb{R}} 0\right) \end{align*}$$
which is not a priori admissible in the context of our approach to bound extraction theorems due to the hidden negative universal type
$0$
quantifier in
$\geq _{\mathbb {R}}$
. However, if we rewrite the above statement in the prenexed form
$$\begin{align*}\forall f^{1(\Omega)}\forall k^0\exists x^\Omega\exists j^0\left( f\in I\land \int{f}\,\mathrm{d}\mathbb{P}<_{\mathbb{R}} -2^{-k}\to \left( f(x)\leq_{\mathbb{R}} -2^{-j}\right)\right), \end{align*}$$
we can witness the quantifier over j simply by
$j=k$
which can be immediately seen using only very basic properties of the integral (see, e.g., Theorem 4.4.13 in [Reference Bhaskara Rao and Bhaskara Rao6]) so that we arrive at the axiom
$$\begin{align} \forall f^{1(\Omega)}\forall k^0\exists x^\Omega\left( f\in I\land \int{f}\,\mathrm{d}\mathbb{P}<_{\mathbb{R}} -2^{-k}\to\left( f(x)\leq_{\mathbb{R}} -2^{-k}\right)\right) \end{align}$$
which, in the context of our perspective that we regard quantification over
$\Omega $
as bounded, will later be admissible in the context of the monotone functional interpretation.
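The witnessing step behind this axiom can be illustrated on a finite toy space, where a point $x$ can actually be computed. This is a sketch under assumptions that go beyond the text: the space is finite, the content is given by nonnegative weights summing to 1, and the names are hypothetical. Since the integral is then a weighted average, a minimizer of $f$ serves as a witness.

```python
from fractions import Fraction


def negative_witness(f, omega, weights, k):
    """Return x with f(x) <= -2**(-k) if the integral of f is below
    -2**(-k), and None if the premise of the axiom fails."""
    integral = sum(f(x) * weights[x] for x in omega)
    threshold = -Fraction(1, 2 ** k)
    if integral >= threshold:
        return None
    # The minimum of f is at most the weighted average of f.
    return min(omega, key=f)


# Toy data (assumed for illustration only).
OMEGA = [0, 1, 2]
WEIGHTS = {x: Fraction(1, 3) for x in OMEGA}
VALUES = {0: Fraction(1), 1: Fraction(-2), 2: Fraction(0)}


def f(x):
    return VALUES[x]
```

With these data the integral equals $-1/3$, so for `k = 2` the premise holds and the point `1` is returned, while for `k = 1` the premise fails.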
We want to note that the axioms
${({\int })}_2$
–
${({\int })}_4$
roughly correspond to the four central properties of an abstract “I-integral” in the context of Daniell’s approach to integration [Reference Daniell7] and so, in some sense, our approach to the integral here can be regarded as a sort of effectivized implementation of the Daniell integral.
With all of these constants and axioms, we then arrive at the following system for integrals:
Definition 7.3. We write
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral}]$
for the system resulting from
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Int},\mathrm {Inv}]$
by adding the above constants
$I,\left \lVert \cdot \right \rVert _\infty ,\int {\cdot }\,\mathrm {d}\mathbb {P}$
together with the axioms
$(I)_1$
–
$(I)_8$
as well as
${({\int })}_1$
–
${({\int })}_4$
. Further, we write
$\mathcal {F}^\omega [\bigcup ,\mathbb {P},\mathrm {Integral}]$
for the system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral}]$
extended with the constant
$\bigcup $
and the axiom
${({\bigcup })}_1$
as well as the rule
${({\bigcup })}_2$
.
We end this section with some immediate properties of the integral over contents that are provable in our system. A more intricate use of the integrals will then be made in Section 9 where they (together with the boundedness and weak Borel-measurability) feature crucially in the formal explanation of a previous proof-mining application (and in particular highlight the usability of the above axiomatic approach to the integral with regard to the proof mining practice). As is common in proof mining, however, this approach to the integral enjoys a large degree of flexibility in the sense that it can, of course, be immediately augmented by further constants and axioms specifying other properties of the integral that might be crucial in a certain application, as also already mentioned before.
Lemma 7.4. The system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral}]$
proves:
-
1.
$\int {\cdot }\,\mathrm {d}\mathbb {P}$
is monotone w.r.t. pointwise inequality, that is,
$$\begin{align*}\forall f^{1(\Omega)}, g^{1(\Omega)}\left( f,g\in I\land \forall x^\Omega\left( f(x)\leq_{\mathbb{R}} g(x)\right) \to \int{f}\,\mathrm{d}\mathbb{P}\leq_{\mathbb{R}}\int{g}\,\mathrm{d}\mathbb{P}\right). \end{align*}$$
-
2.
$\int {\cdot }\,\mathrm {d}\mathbb {P}$
is extensional w.r.t. pointwise equality, that is,
$$\begin{align*}\forall f^{1(\Omega)},g^{1(\Omega)}\left( f,g\in I\land \forall x^\Omega\left( f(x)=_{\mathbb{R}}g(x)\right)\to \left\vert\int{f}\,\mathrm{d}\mathbb{P}-\int{g}\,\mathrm{d}\mathbb{P}\right\vert=_{\mathbb{R}}0\right). \end{align*}$$
-
3.
$\int {\cdot }\,\mathrm {d}\mathbb {P}$
is monotone w.r.t. inequality almost everywhere, that is,
$$ \begin{align*} \forall f^{1(\Omega)},g^{1(\Omega)}\bigg( f,g\in I\land & \exists A^S\left( \mathbb{P}(A)=0\land \forall x^\Omega\left( x\in A^c\to f(x)\leq_{\mathbb{R}}g(x)\right)\right)\\ & \to \int{f}\,\mathrm{d}\mathbb{P}\leq_{\mathbb{R}}\int{g}\,\mathrm{d}\mathbb{P}\bigg). \end{align*} $$
-
4.
$\int {\cdot }\,\mathrm {d}\mathbb {P}$
is extensional w.r.t. equality almost everywhere, that is,
$$ \begin{align*} \forall f^{1(\Omega)},g^{1(\Omega)}\bigg( f,g\in I\land & \exists A^S\left( \mathbb{P}(A)=0\land \forall x^\Omega\left( x\in A^c\to f(x)=_{\mathbb{R}}g(x)\right)\right)\\ & \to \left\vert\int{f}\,\mathrm{d}\mathbb{P}-\int{g}\,\mathrm{d}\mathbb{P}\right\vert=_{\mathbb{R}}0\bigg). \end{align*} $$
-
5.
$\int {\cdot }\,\mathrm {d}\mathbb {P}$
behaves well with absolute values, that is,
$$\begin{align*}\forall f^{1(\Omega)}\left( f\in I\to \left\vert\int{f}\,\mathrm{d}\mathbb{P}\right\vert\leq_{\mathbb{R}}\int{\vert f\vert}\,\mathrm{d}\mathbb{P}\right). \end{align*}$$
Proof.
1. If $\forall x\left ( f(x)\leq g(x)\right )$, note that we have $(g-f)(x)\geq 0$ for all x. By axiom ${({\int })}_4$, we thus have $\int {(g-f)}\,\mathrm {d}\mathbb {P}\geq 0$ and therefore we get $\int {f}\,\mathrm {d}\mathbb {P}\leq \int {g}\,\mathrm {d}\mathbb {P}$ by axiom ${({\int })}_2$.
2. This immediately follows from item (1).
3. By axiom $(I)_8$, we have $f\chi _{A^c},g\chi _{A^c}\in I$. As in particular $f\chi _{A^c}(x)\leq g\chi _{A^c}(x)$ holds for any x, we get
$$\begin{align*}\int{f\chi_{A^c}}\,\mathrm{d}\mathbb{P}\leq\int{g\chi_{A^c}}\,\mathrm{d}\mathbb{P} \end{align*}$$
by item (1). Similarly, as the axioms for the supremum norm imply that $(f-g)\chi _A(x)\leq \left \lVert f-g\right \rVert _\infty \chi _A(x)$ for all x, item (1) together with axiom ${({\int })}_1$ implies
$$\begin{align*}\int{(f-g)\chi_A}\,\mathrm{d}\mathbb{P}\leq\int{\left\lVert f-g\right\rVert_\infty\chi_A}\,\mathrm{d}\mathbb{P}=\left\lVert f-g\right\rVert_\infty\mathbb{P}(A)=0 \end{align*}$$
which yields
$$\begin{align*}\int{f\chi_A}\,\mathrm{d}\mathbb{P}\leq\int{g\chi_A}\,\mathrm{d}\mathbb{P}. \end{align*}$$
As $f(x)= f\chi _A(x)+f\chi _{A^c}(x)$ holds for all x (and similarly for g), we thus in particular get the claim using axiom ${({\int })}_2$.
4. This immediately follows from item (3).
5. Note that we have provably that $-\vert f(x)\vert \leq f(x)\leq \vert f(x)\vert $ for any x so that by item (1) and axiom ${({\int })}_2$, we have
$$\begin{align*}-\int{\vert f\vert}\,\mathrm{d}\mathbb{P}\leq\int{f}\,\mathrm{d}\mathbb{P}\leq\int{\vert f\vert}\,\mathrm{d}\mathbb{P}, \end{align*}$$
that is, that $\left \vert \int {f}\,\mathrm {d}\mathbb {P}\right \vert \leq \int {\vert f\vert }\,\mathrm {d}\mathbb {P}$.
Note that by item (2) of the above lemma together with the axioms on the supremum norm
$\left \lVert \cdot \right \rVert _\infty $
, we in particular have that
$\int {\cdot }\,\mathrm {d}\mathbb {P}$
is extensional w.r.t.
$\left \lVert \cdot \right \rVert _\infty $
in the sense that
$$\begin{align*}\forall f^{1(\Omega)},g^{1(\Omega)}\left( \left\lVert f-g\right\rVert_\infty=_{\mathbb{R}}0\to \left\vert\int{f}\,\mathrm{d}\mathbb{P}-\int{g}\,\mathrm{d}\mathbb{P}\right\vert=_{\mathbb{R}}0\right). \end{align*}$$
8 A bound extraction theorem
We now establish our main results, the bound extraction theorems, for the systems introduced previously. For that, as hinted at in the introduction, we follow the approach of the first metatheorems using abstract types presented in [Reference Kohlenbach25] (see also [Reference Gerhardy and Kohlenbach12, Reference Kohlenbach26]).Footnote
19
As the outline of our approach is rather standard in that way, we will sometimes only sketch the arguments instead of giving full, detailed proofs, only spelling out those parts that are sensitive to the new ideas introduced in this paper. Throughout, to ease notation, we write
$\mathcal {C}^\omega $
for the system
$\mathcal {F}^\omega $
or one of its extensions as discussed previously.
As in the works mentioned above, the main tool for the metatheorems presented here is Gödel’s Dialectica interpretation [Reference Gödel13] which is combined with a negative translation by Kuroda [Reference Kuroda36]. We recall the definitions of those central proof interpretations here:
Definition 8.1 [Reference Gödel13, Reference Troelstra60].
The Dialectica interpretation
$A^D=\exists \underline {x}\forall \underline {y} A_D(\underline {x},\underline {y})$
of a formula A in the language of
$\mathcal {C}^\omega $
(and its extensions) is defined via the following recursion on the structure of the formula:
1. $A^D:=A_D:=A$ for A being a prime formula.

If $A^D=\exists \underline {x}\forall \underline {y} A_D(\underline {x},\underline {y})$ and $B^D=\exists \underline {u}\forall \underline {v} B_D(\underline {u},\underline {v})$, we set

2. $(A\land B)^D:=\exists \underline {x},\underline {u}\forall \underline {y},\underline {v}(A\land B)_D$ where $(A\land B)_D(\underline {x},\underline {u},\underline {y},\underline {v}):=A_D(\underline {x},\underline {y})\land B_D(\underline {u},\underline {v})$,
3. $(A\lor B)^D:=\exists z^0,\underline {x},\underline {u}\forall \underline {y},\underline {v}(A\lor B)_D$ where $(A\lor B)_D(z^0,\underline {x},\underline {u},\underline {y},\underline {v}):=(z=0\rightarrow A_D(\underline {x},\underline {y}))\land (z\neq 0\rightarrow B_D(\underline {u},\underline {v}))$,
4. $(A\rightarrow B)^D:=\exists \underline {U},\underline {Y}\forall \underline {x},\underline {v}(A\rightarrow B)_D$ where $(A\rightarrow B)_D(\underline {U},\underline {Y},\underline {x},\underline {v}):=A_D(\underline {x},\underline {Y}\underline {x}\underline {v})\to B_D(\underline {U}\underline {x},\underline {v})$,
5. $(\exists z^\tau A(z))^D:=\exists z,\underline {x}\forall \underline {y}(\exists z^\tau A(z))_D$ where $(\exists z^\tau A(z))_D(z,\underline {x},\underline {y}):=A_D(\underline {x},\underline {y},z)$,
6. $(\forall z^\tau A(z))^D:=\exists \underline {X}\forall z,\underline {y}(\forall z^\tau A(z))_D$ where $(\forall z^\tau A(z))_D(\underline {X},z,\underline {y}):=A_D(\underline {X}z,\underline {y},z)$
.
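As a small worked instance of the clauses above, consider a prime formula $A_0(x,y)$, for which all tuples of witnesses and counterexamples are empty. This example is added for illustration only; it shows how clauses (5) and (6) turn a $\forall \exists $-statement into the existence of a witnessing functional $Y$.

```latex
\begin{align*}
(A_0(x,y))^D &= A_0(x,y)
  && \text{(clause 1, empty tuples)}\\
(\exists y\, A_0(x,y))^D &= \exists y\, A_0(x,y)
  && \text{(clause 5)}\\
(\forall x\,\exists y\, A_0(x,y))^D &= \exists Y\,\forall x\, A_0(x,Yx)
  && \text{(clause 6)}
\end{align*}
```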
Definition 8.2 [Reference Kuroda36].
The negative translation of A is defined by
$A':=\neg \neg A^*$
where
$A^*$
is defined by the following recursion on the structure of A:
1. $A^*:= A$ for prime A;
2. $(A\circ B)^*:= A^*\circ B^*$ for $\circ \in \{\land ,\lor ,\rightarrow \}$;
3. $(\exists x^\tau A)^*:=\exists x^\tau A^*$;
4. $(\forall x^\tau A)^*:=\forall x^\tau \neg \neg A^*$
.
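The recursion of Definition 8.2 is easy to run mechanically. The following is a minimal sketch on a hypothetical tuple encoding of formulas (tags `"prime"`, `"and"`, `"or"`, `"imp"`, `"exists"`, `"forall"`), with negation kept as a separate node `"not"` for readability. The encoding is an assumption for illustration and not part of the formal system.

```python
def neg(formula):
    """Negation, kept as its own node in this toy encoding."""
    return ("not", formula)


def star(formula):
    """The translation A -> A* of Definition 8.2."""
    tag = formula[0]
    if tag == "prime":
        return formula
    if tag in ("and", "or", "imp"):
        return (tag, star(formula[1]), star(formula[2]))
    if tag == "exists":
        return ("exists", formula[1], star(formula[2]))
    if tag == "forall":
        # A double negation is inserted after every universal quantifier.
        return ("forall", formula[1], neg(neg(star(formula[2]))))
    raise ValueError(f"unknown connective: {tag!r}")


def kuroda(formula):
    """The negative translation A' = not not A*."""
    return neg(neg(star(formula)))


# Example: the formula "forall x exists y P".
EXAMPLE = ("forall", "x", ("exists", "y", ("prime", "P")))
TRANSLATED = kuroda(EXAMPLE)
```

For the example, the result is the double negation of "for all x, not not exists y P", so only the universal quantifier and the whole formula receive double negations.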
For the combination of these two interpretations, the following soundness result is one of the two central technical tools in the context of the proof of the proof mining metatheorems. In that context, we define
$\mathcal {C}^{\omega -}$
as the system
$\mathcal {C}^\omega $
without the schemas
$\textsf {QF}\mbox {-}\textsf {AC}$
and
$\textsf {DC}$
.
Lemma 8.3 (essentially [Reference Kohlenbach25]).
Let
$\mathcal {P}$
be a set of universal sentences and let
$A(\underline {a})$
be an arbitrary formula (with only the variables
$\underline {a}$
free) in the language of
$\mathcal {F}^\omega $
. Then the rule
$$\begin{align*}\begin{cases}\mathcal{F}^\omega+\mathcal{P}\vdash A(\underline{a})\Rightarrow\\ \mathcal{F}^{\omega-}+(\mathrm{BR})+\mathcal{P}\vdash\forall\underline{a},\underline{y}(A')_D(\underline{t}\underline{a},\underline{y},\underline{a})\end{cases} \end{align*}$$
holds where
$\underline {t}$
is a tuple of closed terms of
$\mathcal {F}^{\omega -}+(\mathrm {BR})$
which can be extracted from the respective proof and
$(\mathrm {BR})$
is the schema of simultaneous bar-recursion of Spector [Reference Spector and Dekker58], here extended to all types from
$T^{\Omega ,S}$
(similar to, e.g., [Reference Kohlenbach26]).
This result extends to any suitable extension of the language of
$\mathcal {F}^\omega $
(e.g., by any kind of new types and constants) together with any number of additional universal axioms in that language.
We omit the proof as it is almost exactly the same as the proof given for the analogous soundness result in [Reference Kohlenbach25] (although this result from [Reference Kohlenbach25] is of course not formulated for the system
$\mathcal {F}^\omega $
).
Besides the soundness of the Dialectica interpretation (together with the negative translation), the other of the two central tools utilized in the metatheorems is that of majorizability. Originally introduced by Howard [Reference Howard and Troelstra19], the notion of majorizable functionals was later extended by Bezem [Reference Bezem5] to that of strongly majorizable functionals to provide a model for finite type arithmetic extended by the schema of bar recursion discussed before. In that way, this model of strongly majorizable functionals provides the crucial basis for proof mining metatheorems of systems allowing for dependent choice. In the context of the abstract types, we further need to consider an extension of this notion of strongly majorizable functionals to these new types. The first such extensions to abstract types have been devised in [Reference Gerhardy and Kohlenbach12, Reference Kohlenbach25]. However, in this setting (and essentially in all other settings for metatheorems proved afterwards), this extension was motivated by and based on the metric structure assumed for the respective classes of spaces that were treated. We thus find ourselves here at a “fork in the road,” where we have to extend the notion of majorizability sensibly to our types
$\Omega $
and S, both representing spaces which do not carry any metric structure. The key insight, already mentioned and motivated throughout the previous sections many times (e.g., in the context of the admissibility of the axioms containing existential quantifiers over variables of types
$\Omega $
, S or
$S(0)$
, etc.) is to
1. majorize objects $A^S$ by natural numbers bounding the measure of A,
2. majorize objects $x^\Omega $ by natural numbers bounding the measure of the full set $\Omega $ in S.
In the case of a probability measure, any object of type
$\Omega $
or S is therefore uniformly majorized by
$1$
, but with the phrasing of (1) and (2), we wanted to highlight the general idea of this approach, as it might also be feasible for more general finite contents.
In any case, similar to [Reference Gerhardy and Kohlenbach12, Reference Kohlenbach25], the majorants for objects with types from
$T^{\Omega ,S}$
are objects with types from T according to the following extended projection:
Definition 8.4 (essentially [Reference Gerhardy and Kohlenbach12]).
Define
$\widehat {\tau }\in T$
, given
$\tau \in T^{\Omega ,S}$
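by recursion on the structure via
$$\begin{align*}\widehat{0}:=0,\quad \widehat{\Omega}:=0,\quad \widehat{S}:=0,\quad \widehat{\tau(\xi)}:=\widehat{\tau}(\widehat{\xi}). \end{align*}$$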
The majorizability relation
$\gtrsim $
is then defined in tandem with the structure of all strongly majorizable functionals.
Definition 8.5 (essentially [Reference Gerhardy and Kohlenbach12, Reference Kohlenbach25]).
Let
$\Omega $
be a nonempty set,
$S\subseteq 2^\Omega $
be an algebra and
$\mathbb {P}$
be a probability content on S. The structure
$\mathcal {M}^{\omega ,\Omega ,S}$
and the majorizability relation
$\gtrsim _\rho $
are defined by
$$\begin{align*}\begin{cases} \mathcal{M}_0:=\mathbb{N}, n\gtrsim_0 m:=n\geq m\land n,m\in\mathbb{N},\\ \mathcal{M}_\Omega:= \Omega, n\gtrsim_\Omega x:= n\geq \mathbb{P}(\Omega)\land n\in \mathcal{M}_0,x\in \mathcal{M}_\Omega,\\ \mathcal{M}_{S}:= S, n\gtrsim_{S} A:= n\geq \mathbb{P}(A)\land n\in \mathcal{M}_0,A\in \mathcal{M}_{S},\\ f\gtrsim_{\tau(\xi)}x:=f\in \mathcal{M}_{\widehat{\tau}}^{\mathcal{M}_{\widehat{\xi}}}\land x\in \mathcal{M}_\tau^{\mathcal{M}_\xi}\\ \phantom{f\gtrsim_{\tau(\xi)}x:=}\land\forall g\in \mathcal{M}_{\widehat{\xi}},y\in \mathcal{M}_\xi(g\gtrsim_\xi y\rightarrow fg\gtrsim_\tau xy)\\ \phantom{f\gtrsim_{\tau(\xi)}x:=}\land\forall g,y\in \mathcal{M}_{\widehat{\xi}}(g\gtrsim_{\widehat{\xi}}y\rightarrow fg\gtrsim_{\widehat{\tau}}fy),\\ \mathcal{M}_{\tau(\xi)}:=\left\{x\in \mathcal{M}_\tau^{\mathcal{M}_\xi}\mid \exists f\in \mathcal{M}^{\mathcal{M}_{\widehat{\xi}}}_{\widehat{\tau}}:f\gtrsim_{\tau(\xi)}x\right\}. \end{cases} \end{align*}$$
So, as already discussed previously, though only informally, as
$\mathbb {P}$
is a probability content, we have
$1\gtrsim _S A$
for any
$A\in S$
(as
$\mathbb {P}(A)\leq \mathbb {P}(\Omega )=1$
) as well as
$1\gtrsim _\Omega x$
for any
$x\in \Omega $
(but in the way the model is defined above, the definition immediately makes sense in the context of general finite contents).
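To illustrate how the function-type clause of Definition 8.5 unfolds, consider the small type $S(0)$ of sequences of events. The following worked instance is obtained by specializing the definition to $\tau =S$ and $\xi =0$, assuming that the majorants of objects of types $0$ and S are natural numbers (i.e., $\widehat {\tau }=\widehat {\xi }=0$); it is meant as a sketch and not as part of the formal development.

$$\begin{align*} f\gtrsim_{S(0)}A \iff{} & f\in\mathbb{N}^{\mathbb{N}}\land A\in S^{\mathbb{N}}\\ &\land\forall n,m\in\mathbb{N}\left(n\geq m\rightarrow f(n)\geq\mathbb{P}(A(m))\right)\\ &\land\forall n,m\in\mathbb{N}\left(n\geq m\rightarrow f(n)\geq f(m)\right). \end{align*}$$

In particular, since $\mathbb {P}(A(m))\leq 1$ for every m, the constant function $\lambda n^0.1$ majorizes every sequence $A\in S^{\mathbb {N}}$, so that $\mathcal {M}_{S(0)}=S^{\mathbb {N}}$.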
Before moving on, we note that majorizability behaves nicely w.r.t. functions with multiple arguments as represented by their curried variants.
Lemma 8.6 ([Reference Gerhardy and Kohlenbach12, Reference Kohlenbach25], see also Kohlenbach [Reference Kohlenbach26, Lemma 17.80]).
Let
$\xi =\tau (\xi _k)\dots (\xi _1)$
. For
$x^*:\mathcal {M}_{\widehat {\xi _1}}\to (\mathcal {M}_{\widehat {\xi _2}}\to \dots \to \mathcal {M}_{\widehat {\tau }})\dots )$
and
$x:\mathcal {M}_{\xi _1}\to (\mathcal {M}_{\xi _2}\to \dots \to \mathcal {M}_\tau )\dots )$
, we have
$x^*\gtrsim _\xi x$
iff
1. $\forall y_1^*,y_1,\dots ,y_k^*,y_k\left (\bigwedge _{i=1}^k(y^*_i\gtrsim _{\xi _i}y_i)\rightarrow x^*y^*_1\dots y^*_k\gtrsim _\tau xy_1\dots y_k\right )$ and
2. $\forall y_1^*,y_1,\dots ,y_k^*,y_k\left (\bigwedge _{i=1}^k(y^*_i\gtrsim _{\widehat {\xi _i}}y_i)\rightarrow x^*y^*_1\dots y^*_k\gtrsim _{\widehat {\tau }} x^*y_1\dots y_k\right )$.
The other main structure featuring in the metatheorems is the structure of all set-theoretic functionals
$\mathcal {S}^{\omega ,\Omega ,S}$
, defined via
$\mathcal {S}_0:=\mathbb {N}$
,
$\mathcal {S}_\Omega := \Omega $
,
$\mathcal {S}_{S}:=S$
and
$$\begin{align*}\mathcal{S}_{\tau(\xi)}:=\mathcal{S}_\tau^{\mathcal{S}_\xi}. \end{align*}$$
Both structures
$\mathcal {S}^{\omega ,\Omega ,S}$
and
$\mathcal {M}^{\omega ,\Omega ,S}$
later turn into models of our systems if equipped with corresponding interpretations for the respective additional constants, with
$\mathcal {S}^{\omega ,\Omega ,S}$
serving as the structure for the intended standard models.
The proof of the bound extraction theorems follows the general high-level outline of most other such metatheorems: using functional interpretation and negative translation, one extracts realizers from (essentially)
$\forall \exists $
-theorems. These realizers have types from
$T^{\Omega ,S}$
. We then use majorizability to construct bounds for these realizers, depending only on majorants of the parameters, which are validated in a model based on
$\mathcal {M}^{\omega ,\Omega ,S}$
. In a final step, we can then recover truth in a model based on the usual full set-theoretic structure
$\mathcal {S}^{\omega ,\Omega ,S}$
if the types occurring in the axioms and the theorem are “low enough,” which we will call admissible. Concretely, following [Reference Gerhardy and Kohlenbach12, Reference Kohlenbach25], we introduce the following specific classes of types: We call a type
$\xi $
of degree n if
$\xi \in T$
and it has degree
$\leq n$
in the usual sense (see, e.g., [Reference Kohlenbach26]). Further we call
$\xi $
small if it is of the form
$\xi =\xi _0(0)\dots (0)$
for
$\xi _0\in \{0,\Omega ,S\}$
(including
$0,\Omega ,S$
) and call it admissible if it is of the form
$\xi =\xi _0(\tau _k)\dots (\tau _1)$
where each
$\tau _i$
is small and
$\xi _0\in \{0,\Omega ,S\}$
(also including
$0,\Omega ,S$
).
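As a quick orientation, the following list classifies a few types according to the definitions just given. The classification is a sketch computed directly from these definitions, writing $1=0(0)$ as usual.

$$\begin{align*} &S(0): &&\text{small (base type }S\text{, one argument of type }0),\\ &1(\Omega)=0(0)(\Omega): &&\text{admissible but not small (arguments of the small types }\Omega\text{ and }0),\\ &0(1)=0(0(0)): &&\text{admissible but not small (one argument of the small type }1),\\ &0(0(1)): &&\text{not admissible (the argument type }0(1)\text{ is not small}). \end{align*}$$

The first three types are those of the objects $A^{S(0)}$, $f^{1(\Omega )}$, and $A^{0(1)}$ occurring in the axioms and interpretations of this section.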
Further, also in analogy to [Reference Gerhardy and Kohlenbach12, Reference Kohlenbach25], we define certain subclasses of formulas satisfying certain type restrictions: A formula A is called a
$\forall $
-formula if
$A=\forall \underline {a}^{\underline {\xi }} A_{qf}(\underline {a})$
with
$A_{qf}$
quantifier-free and all types
$\xi _i$
in
$\underline {\xi }=(\xi _1,\dots ,\xi _k)$
are admissible. A formula A is called an
$\exists $
-formula if
$A=\exists \underline {a}^{\underline {\xi }}A_{qf}(\underline {a})$
with similar
$A_{qf}$
and
$\underline {\xi }$
.
The class
$\Delta $
already mentioned previously, which was originally introduced in [Reference Kohlenbach20, Reference Kohlenbach21] (and then lifted to abstract types in [Reference Günzel and Kohlenbach16]) to signify a collection of commonly occurring formulas with trivial monotone functional interpretations, is now similarly introduced in the context of the systems studied in this paper: a formula of type
$\Delta $
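is any formula of the form
$$\begin{align*}\forall \underline{a}^{\underline{\delta}}\exists \underline{b}\leq_{\underline{\sigma}}\underline{r}\underline{a}\forall \underline{c}^{\underline{\gamma}}A_{qf}(\underline{a},\underline{b},\underline{c}), \end{align*}$$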
where
$A_{qf}$
is quantifier-free, the types in
$\underline {\delta }$
,
$\underline {\sigma }$
, and
$\underline {\gamma }$
are admissible,
$\underline {r}$
is a tuple of closed terms of appropriate type,
$\leq $
is defined by recursion on the type via
1. $x\leq _0 y:=x\leq _0 y$,
2. $x\leq _\Omega y:=\mathbb {P}(\Omega )\leq _{\mathbb {R}}\mathbb {P}(\Omega )$,
3. $A\leq _{S} B:=\mathbb {P}(A)\leq _{\mathbb {R}}\mathbb {P}(B)$,
4. $x\leq _{\tau (\xi )} y:=\forall z^\xi (xz\leq _\tau yz)$,
and
$\underline {x}\leq _{\underline {\sigma }}\underline {y}$
is an abbreviation for
$x_1\leq _{\sigma _1}y_1\land \dots \land x_k\leq _{\sigma _k}y_k$
where
$\underline {x}$
,
$\underline {y}$
, and
$\underline {\sigma }$
are k-tuples of terms and types, respectively, such that
$x_i$
and
$y_i$
are of type
$\sigma _i$
.
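For instance, unfolding clauses (3) and (4) at the type $S(0)$ gives the following; this worked instance is only meant to illustrate the recursion.

$$\begin{align*} A\leq_{S(0)}B \;\equiv\; \forall n^0\left(A(n)\leq_S B(n)\right)\;\equiv\;\forall n^0\left(\mathbb{P}(A(n))\leq_{\mathbb{R}}\mathbb{P}(B(n))\right). \end{align*}$$

In particular, by the monotonicity of the content, $A\leq _{S(0)}\lambda n^0.\Omega $ holds for every $A^{S(0)}$, and by clause (2) any bound of the form $x\leq _\Omega y$ is trivially satisfied, so that bounded quantifiers with such bounds do not restrict the range of the quantified variable.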
Given a set
$\beth $
of formulas of type
$\Delta $
, we write
$\widetilde {\beth }$
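for the set of all Skolem normal forms
$$\begin{align*}\exists \underline{B}\leq_{\underline{\sigma}(\underline{\delta})}\underline{r}\,\forall \underline{a}^{\underline{\delta}}\forall \underline{c}^{\underline{\gamma}}A_{qf}(\underline{a},\underline{B}\underline{a},\underline{c}) \end{align*}$$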
for any
$\forall \underline {a}^{\underline {\delta }}\exists \underline {b}\leq _{\underline {\sigma }}\underline {r}\underline {a}\forall \underline {c}^{\underline {\gamma }}A_{qf}(\underline {a},\underline {b},\underline {c})$
in
$\beth $
.
Remark 8.7. We want to note briefly that all axioms that were previously discussed as admissible based on our extended notion of majorizability can actually be seen as statements of type
$\Delta $
in the context of the above definition. First, the axiom
$(\mathbb {P})_4$
can be equivalently rewritten as
and is thus immediately of type
$\Delta $
.Footnote
20
Second, also the axiom
$(I)_3$
can be rewritten with the additional boundedness information via
and thus is of type
$\Delta $
. Similarly, also the axiom
$(I)_5$
can be rewritten as an axiom of type
$\Delta $
as
Lastly, also the integrability axioms
${({\int })}_4$
and
${({\int })}_3$
can be rewritten as
$$\begin{align*}\forall f^{1(\Omega)}, k^0\exists x^\Omega\leq_\Omega c_\Omega\left( f\in I\land \int{f}\,\mathrm{d}\mathbb{P}<_{\mathbb{R}} -2^{-k}\to \left( f(x)\leq_{\mathbb{R}} -2^{-k}\right)\right) \end{align*}$$
and
$$ \begin{align*} \forall f^{1(\Omega)}, k^0\exists A^{S(0)}\leq_{S(0)}\lambda n^0.\Omega\left(f\in I\to \int{\left\vert f-\sum_{i=0}^{2^{k+1}b_f-1}\left(-b_f+\frac{i}{2^{k}}\right)\chi_{A(i)}\right\vert}\,\mathrm{d}\mathbb{P}\leq_{\mathbb{R}} 2^{-k}\right), \end{align*} $$
respectively, with
$b_f:=\left \lVert f\right \rVert _\infty (0)+1$
as before, which turns them into axioms of type
$\Delta $
.
Crucially, axioms of type
$\Delta $
are trivialized under the monotone functional interpretationFootnote
21
and we treat any axiom of type
$\Delta $
in
$\mathcal {C}^\omega $
(or any suitable extension) “in this spirit.” We here only write “in this spirit” as we actually do not use a monotone variant of the Dialectica interpretation but treat the functional interpretation part and the majorization part of the combined interpretation separately. In that way, we follow the approach given in [Reference Günzel and Kohlenbach16] (see also the recent [Reference Pischke51]) and treat axioms of type
$\Delta $
by employing a construction that converts a theory with axioms of such a type into a theory using only additional purely universal axioms formulated using the Skolem functions of these axioms. This new theory is then used in combination with the functional interpretation to extract the respective terms and then the proof proceeds as outlined before.
Concretely, we now proceed as follows: Let
$\beth $
be a set of axioms of type
$\Delta $
and write
$\widehat {\mathcal {C}}^\omega $
for
$\mathcal {C}^\omega $
without any of its axioms of type
$\Delta $
. Then, we form a new theory
$\overline {\mathcal {C}}^\omega _\beth $
from
$\widehat {\mathcal {C}}^\omega $
by adding the Skolem functionals
$\underline {B}$
of any axiom of type
$\Delta $
of
$\mathcal {C}^\omega +\beth $
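, say of the form
$$\begin{align*}\forall \underline{a}^{\underline{\delta}}\exists \underline{b}\leq_{\underline{\sigma}}\underline{r}\underline{a}\forall \underline{c}^{\underline{\gamma}}A_{qf}(\underline{a},\underline{b},\underline{c}), \end{align*}$$
as new constants to the language and simultaneously adding the corresponding “instantiated Skolem normal form,” that is,
$$\begin{align*}\underline{B}\leq_{\underline{\sigma}(\underline{\delta})}\underline{r}\land\forall \underline{a}^{\underline{\delta}}\forall \underline{c}^{\underline{\gamma}}A_{qf}(\underline{a},\underline{B}\underline{a},\underline{c}), \end{align*}$$
as a new axiom. Therefore, the system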
$\overline {\mathcal {C}}^\omega _\beth $
only extends
$\mathcal {F}^\omega $
by new types, constants, and universal axioms. Therefore, as mentioned before, Lemma 8.3 also applies to this system where the conclusion is then proved in
$\overline {\mathcal {C}}^{\omega -}_\beth +(\mathrm {BR})$
, that is,
$\overline {\mathcal {C}}^{\omega }_\beth $
with the principles
$\textsf {QF}\mbox {-}\textsf {AC}$
and
$\textsf {DC}$
removed and where the scheme of simultaneous bar-recursion is added.
In the case where the extension
$\mathcal {C}^\omega $
contains the rule
${({\bigcup })}_2$
, we assume for simplicity that, in the process of forming the extended theory, this rule is also removed in the sense that
$\overline {\mathcal {C}}^{\omega }_\beth $
does not contain the rule and for any provable premise
we add the corresponding conclusion
as an axiom of
$\overline {\mathcal {C}}^{\omega }_\beth $
.
We now move on to the central result of the majorization part of the chosen approach to the bound extraction theorems, stating that every closed term in the underlying language of the system in question is majorizable. As such, the result is similar to Lemma 9.11 in [Reference Gerhardy and Kohlenbach12].
Lemma 8.8. Let
$\mathcal {C}^\omega $
be (one of the previously discussed extensions of) the system
$\mathcal {F}^\omega [\mathbb {P}]$
and let
$\beth $
be a set of additional axioms of type
$\Delta $
. Let
$\Omega $
be a nonempty set and let
$S\subseteq 2^\Omega $
be an algebra (or, in the context of the constant
$\bigcup $
, a
$\sigma $
-algebra). Let
$\mathbb {P}$
be a probability content on S. Then
$\mathcal {M}^{\omega ,\Omega ,S}$
is a model of
$\overline {\mathcal {C}}^{\omega -}_\beth +(\mathrm {BR})$
, provided
$\mathcal {S}^{\omega ,\Omega ,S}\models \beth $
(with
$\mathcal {M}^{\omega ,\Omega ,S}$
and
$\mathcal {S}^{\omega ,\Omega ,S}$
defined via suitable interpretations of the additional constants in
$\mathcal {C}^\omega $
). Moreover, for any closed term t of
$\overline {\mathcal {C}}^{\omega -}_\beth +(\mathrm {BR})$
, one can construct a closed term
$t^*$
of
$\mathcal {A}^\omega +(\mathrm {BR})$
such that
$$\begin{align*}\mathcal{M}^{\omega,\Omega,S}\models t^*\gtrsim t. \end{align*}$$
Proof. The structure of the proof is standard and similar to proofs of related results from the literature (see, e.g., [Reference Kohlenbach26]). As such, we only discuss the interpretations of the new constants added to
$\mathcal {A}^\omega $
to form the respective theories as well as their majorizations. For the constants already contained in
$\mathcal {A}^\omega $
, we may choose suitable interpretations as in [Reference Kohlenbach26] and for majorizing a composition of terms, we may similarly proceed as outlined therein. For that, we now first focus on
$\mathcal {F}^\omega [\mathbb {P}]$
and assume that there are no further axioms of type
$\Delta $
beyond those already contained in
$\mathcal {F}^\omega [\mathbb {P}]$
. For any
$\mathcal {C}^\omega $
, we deal with any set
$\beth $
of additional axioms of type
$\Delta $
and the respectively induced constants later on by moving to the theory
$\overline {\mathcal {C}}^\omega _\beth $
.
Now, for the new constants added to
$\mathcal {A}^\omega $
to form
$\mathcal {F}^\omega [\mathbb {P}]$
, we consider the following interpretations (writing
$\mathcal {M}$
for
$\mathcal {M}^{\omega ,\Omega ,S}$
):
• $[\mathrm {eq}]_{\mathcal {M}}:=\text {the characteristic function of the equality relation in }\Omega $;
• $[\in ]_{\mathcal {M}}:=\text {the characteristic function of the element relation in }S$;
• $[\cup ]_{\mathcal {M}}:=\text {union in }S$;
• $[(\cdot )^c]_{\mathcal {M}}:=\text {complement in }S$;
• $[\emptyset ]_{\mathcal {M}}:=\text {the empty set in }S$;
• $[\mathbb {P}]_{\mathcal {M}}:= \lambda A^S.(\mathbb {P}(A))_\circ $ where $\mathbb {P}$ is the content fixed in the context of this lemma.
This is only well-defined in
$\mathcal {M}^{\omega ,\Omega ,S}$
if we can construct majorants of these objects. This we can do as follows:
• $\lambda x^0,y^0.1\gtrsim \mathrm {eq}$;
• $\lambda x^0,y^0.1\gtrsim \, \in $;
• $\lambda x^0,y^0.1\gtrsim \cup $;
• $\lambda x^0. 1\gtrsim (\cdot )^c$;
• $0\gtrsim \emptyset $;
• $\lambda x^0.(x)_\circ \gtrsim \mathbb {P}$.
Note that in the last item, the operation
$(x)_\circ $
is definable in
$\mathcal {A}^\omega $
via a closed term as x is of type
$0$
.
To justify that those terms really are majorants of the respective constants, we argue as follows: The first four items follow immediately from the fact that
$\mathbb {P}(A)\leq \mathbb {P}(\Omega )=1$
(i.e., that
$\mathbb {P}$
is a probability content), and the fifth item follows from
$\mathbb {P}(\emptyset )=_{\mathbb {R}}0$
. The last item follows immediately from Lemma 2.1 as clearly, if
$x\geq _{\mathbb {R}}\mathbb {P}(A)$
, then
$(x)_\circ \gtrsim (\mathbb {P}(A))_\circ $
.
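As a sample verification, the majorant for $\cup $ can be checked via the two conditions of Lemma 8.6 with $k=2$, $\xi _1=\xi _2=\tau =S$, and $x^*=\lambda x^0,y^0.1$. The following is a sketch of that check under the interpretation $[\cup ]_{\mathcal {M}}$ given above, where $n,m,n',m'$ range over natural numbers and $A,B$ over S.

$$\begin{align*} &(1)\quad n\gtrsim_S A\land m\gtrsim_S B\ \rightarrow\ x^*nm=1\geq\mathbb{P}(A\cup B),\text{ i.e., }x^*nm\gtrsim_S A\cup B,\\ &(2)\quad n\geq n'\land m\geq m'\ \rightarrow\ x^*nm=1\geq 1=x^*n'm',\text{ i.e., }x^*nm\gtrsim_0 x^*n'm'. \end{align*}$$

Condition (1) uses only that $\mathbb {P}(A\cup B)\leq \mathbb {P}(\Omega )=1$, and condition (2) is trivial for a constant functional.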
In the case where
$\mathcal {C}^\omega $
contains the respective additional constant
$\bigcup $
, a corresponding interpretation is naturally defined by
• $[\bigcup ]_{\mathcal {M}}:=\text {countably infinite union in }S$,
which is well-defined since in this context we assume that S is a
$\sigma $
-algebra. We can achieve majorization as before by exploiting that
$\mathbb {P}$
is a finite content with
• $\lambda f^{0(0)}.1\gtrsim \bigcup $.
Lastly, if the system
$\mathcal {C}^\omega $
contains the respective constants and axioms for treating integrals, we choose corresponding interpretations of the additional constants as follows:
• $\big [[\cdot ,\cdot ]\big ]_{\mathcal {M}}:=\lambda a^1,b^1,x^1.\begin {cases}0&\text {if }r_x\in [r_a,r_b];\\1&\text {otherwise};\end {cases}$
• $[(\cdot )^{-1}]_{\mathcal {M}}:=\lambda f^{1(\Omega )},A^{0(1)},x^\Omega .\begin {cases}0&\text {if }x\in f^{-1}(\{r_a\mid a^1: A(a)=_00\});\\1&\text {otherwise};\end {cases}$
• $[I]_{\mathcal {M}}:=$ the characteristic function of a set of bounded and weakly Borel-measurable functions $f^{1(\Omega )}$ which is closed under linear combinations, multiplication with characteristic functions, and absolute values;
• $[\left \lVert \cdot \right \rVert _\infty ]_{\mathcal {M}}:=\lambda f^{1(\Omega )}.\begin {cases}(\sup _{x\in \Omega }\vert f(x)\vert )_\circ &\text {if }f\text { is bounded};\\0&\text {otherwise};\end {cases}$
• $[\int {\cdot }\,\mathrm {d}\mathbb {P}]_{\mathcal {M}}:=\lambda f^{1(\Omega )}.\begin {cases}(\int {f}\,\mathrm {d}\mathbb {P})_\circ &\text {if }f\text { is bounded and weakly Borel-measurable};\\0&\text {otherwise};\end {cases}$ where the latter $\int {f}\,\mathrm {d}\mathbb {P}$ represents the usual integral defined over the content.
Note that here, we now rely on the extended operator
$(\cdot )_\circ $
operating on all real numbers as the integral of a general integrable function may be negative. As for majorization, we rely on the following constructions:
• $\lambda a^1,b^1,r^1.1\gtrsim [\cdot ,\cdot ]$;
• $\lambda f^{1(0)},A^{0(1)},x^0.1\gtrsim (\cdot )^{-1}$;
• $\lambda f^{1(0)}.1\gtrsim I$;
• $\lambda f^{1(0)}.(f(1)(0)+1)_\circ \gtrsim \left \lVert \cdot \right \rVert _\infty $;
• $\lambda f^{1(0)}.(f(1)(0)+1)_\circ \gtrsim \int {\cdot }\,\mathrm {d}\mathbb {P}$.
The first three items are immediate as we deal with characteristic functions. For the fourth, note that if
${f^*}^{1(0)}\gtrsim f^{1(\Omega )}$
, then
and as
$1\gtrsim x$
for any
$x^\Omega $
as
$1=\mathbb {P}(\Omega )$
, we have
$f^*(1)\gtrsim f(x)$
for any
$x^\Omega $
. Therefore, we have
for any
$x^\Omega $
, by the monotonicity of the coding of rational numbers (again, see, e.g., the discussion on p.430 in [Reference Kohlenbach26]). This implies
$f^*(1)(0)+1\geq _{\mathbb {R}} \left \lVert f\right \rVert _\infty $
so that the result follows from Lemma 2.1.
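The steps of this argument can be summarized in the following chain, where $x^\Omega $ is arbitrary. This is a sketch which assumes the representation of real numbers from [Reference Kohlenbach26], namely that a real number r represented by $g^1$ satisfies $\vert r\vert \leq _{\mathbb {R}} g(0)+1$, with $g(0)$ read as the natural number coding the first rational approximation of r and with the coding of rationals being monotone.

$$\begin{align*} f^*\gtrsim_{1(\Omega)}f &\ \Rightarrow\ \forall n^0\left(n\gtrsim_\Omega x\rightarrow f^*(n)\gtrsim_1 f(x)\right)\\ &\ \Rightarrow\ f^*(1)\gtrsim_1 f(x)\\ &\ \Rightarrow\ f^*(1)(0)\geq_0 f(x)(0)\\ &\ \Rightarrow\ f^*(1)(0)+1\geq_{\mathbb{R}}\vert f(x)\vert. \end{align*}$$

Taking the supremum over all $x\in \Omega $ then yields the stated bound on $\left \lVert f\right \rVert _\infty $.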
Lastly, note that for any bounded and weakly Borel-measurable function f, we have that
$\vert \int {f}\,\mathrm {d}\mathbb {P}\vert \leq _{\mathbb {R}}\int {\vert f\vert }\,\mathrm {d}\mathbb {P}$
so that
$$\begin{align*}\left\vert \int{f}\,\mathrm{d}\mathbb{P}\right\vert\leq_{\mathbb{R}}\int{\vert f\vert}\,\mathrm{d}\mathbb{P}\leq_{\mathbb{R}} \left\lVert\vert f\vert\right\rVert_\infty=_{\mathbb{R}}\left\lVert f\right\rVert_\infty\leq_{\mathbb{R}} f^*(1)(0)+1 \end{align*}$$
as before. The majorizability result then follows again from Lemma 2.1.
That
$\mathcal {M}^{\omega ,\Omega ,S}$
with these chosen interpretations is a model of
$\mathcal {C}^{\omega -}+(\mathrm {BR})$
can be shown similarly to analogous results (see, e.g., [Reference Kohlenbach26]). The intended interpretations of the constants of
$\mathcal {C}^\omega $
in
$\mathcal {S}^{\omega ,\Omega ,S}$
, turning
$\mathcal {S}^{\omega ,\Omega ,S}$
into a model of these systems, are defined in analogy to the corresponding model
$\mathcal {M}^{\omega ,\Omega ,S}$
defined above.
For treating the other additional axioms in
$\mathcal {C}^\omega +\beth $
of type
$\Delta $
beyond the axioms already contained in
$\mathcal {C}^\omega $
, we rely on the following argument (akin to [Reference Günzel and Kohlenbach16], Lemma 5.11) showing that
$\mathcal {S}^{\omega ,\Omega ,S}\models \beth $
implies
$\mathcal {M}^{\omega ,\Omega ,S}\models \widetilde {\beth }$
. The proof given in [Reference Günzel and Kohlenbach16] for Lemma 5.11 carries over, and we sketch it here: While
$\mathcal {M}^{\omega ,\Omega ,S}$
in general is not a model of the axiom of choice [Reference Kohlenbach22], one can show (similar to [Reference Kohlenbach22]) that
$\mathcal {M}^{\omega ,\Omega ,S}\models \mathsf {b}\text {-}\mathsf {AC}_{\Omega ,S}$
where
$$\begin{align*}\mathsf{b}\text{-}\mathsf{AC}_{\Omega,S}:=\bigcup_{\delta,\rho\in T^{\Omega,S}}\mathsf{b}\text{-}\mathsf{AC}^{\delta,\rho} \end{align*}$$
with
Now, for small types
$\rho $
, we have
$\mathcal {M}_\rho =\mathcal {S}_\rho $
while for admissible types
$\rho $
, we have
$\mathcal {M}_\rho \subseteq \mathcal {S}_\rho $
(for which it is important that admissible types take arguments of small types). For this, the proof given in [Reference Gerhardy and Kohlenbach12] carries over. Further, we need that it is provable in
$\mathcal {C}^{\omega -}$
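that
$$\begin{align*}\forall x^*,x,y\left(x^*\gtrsim_\rho x\land x\geq_\rho y\rightarrow x^*\gtrsim_\rho y\right)\qquad(\dagger) \end{align*}$$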
holds for all types
$\rho $
, which can be shown similarly to, for example, [Reference Kohlenbach26].
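Suppose now that
$$\begin{align*}\mathcal{S}^{\omega,\Omega,S}\models\forall \underline{a}^{\underline{\delta}}\exists \underline{b}\leq_{\underline{\sigma}}\underline{r}\underline{a}\forall \underline{c}^{\underline{\gamma}}A_{qf}(\underline{a},\underline{b},\underline{c}). \end{align*}$$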
Then also
$\mathcal {M}^{\omega ,\Omega ,S}$
is a model of this sentence: First the types of the variables which are universally quantified are admissible, so over
$\mathcal {M}^{\omega ,\Omega ,S}$
the domain of the universal quantifiers is reduced. For the witnesses for
$\underline {b}$
, which exist in
$\mathcal {S}^{\omega ,\Omega ,S}$
, note first that these could potentially live in
$\mathcal {M}^{\omega ,\Omega ,S}$
as the types of the variables in
$\underline {b}$
are admissible, that is, they take arguments of small types and map into small types. It thus only remains to be seen whether such a witness is majorizable for majorizable inputs. However, by the above argument, the terms in
$\underline {r}$
are all majorizable and if
$\underline {a}$
comes from
$\mathcal {M}^{\omega ,\Omega ,S}$
, then
$\underline {r}\underline {a}$
is majorizable. That we have
$\underline {b}\leq _{\underline {\sigma }}\underline {r}\underline {a}$
in
$\mathcal {M}^{\omega ,\Omega ,S}$
now implies that
$\underline {b}$
is majorizable by
$(\dagger )$
(and consequently the corresponding interpretations exist in
$\mathcal {M}^{\omega ,\Omega ,S}$
too). Lastly, it is rather immediate to see that
$\mathcal {M}^{\omega ,\Omega ,S}\models \beth $
implies
$\mathcal {M}^{\omega ,\Omega ,S}\models \widetilde {\beth }$
using
$\mathsf {b}\text {-}\mathsf {AC}_{\Omega ,S}$
.
From
$\mathcal {M}^{\omega ,\Omega ,S}\models \widetilde {\beth }$
, we immediately get that the corresponding Skolem functions have interpretations in
$\mathcal {M}^{\omega ,\Omega ,S}$
, that the corresponding structures defined by some canonical interpretations of those additional constants are indeed models of those variants of the systems where the corresponding Skolem functionals of these axioms are added and where the axioms themselves are replaced by their instantiated Skolem normal forms (i.e.,
$\overline {\mathcal {C}}^{\omega -}_\beth $
and its extensions) and, lastly, that the above majorizability result extends to these systems.
Note that, technically, these arguments were already needed in the above considerations to see that
$\mathcal {M}^{\omega ,\Omega ,S}$
really is a model of
$\mathcal {C}^{\omega -}+(\mathrm {BR})$
. However, we did not discuss this there explicitly as for those specific axioms of type
$\Delta $
belonging to
$\mathcal {C}^{\omega -}+(\mathrm {BR})$
, the types of the variables occurring in them are not only small but actually all among
$\{0,1,\Omega ,S,S(0)\}$
so that it was immediately clear that the models coincide at that level (essentially just by definition) and we thus omitted such a general discussion there.
Combined with the Dialectica interpretation, the main result we then arrive at is the following bound extraction result for classical proofs:
Theorem 8.9. Let
$\mathcal {C}^\omega $
be (one of the previously discussed extensions of) the system
$\mathcal {F}^\omega [\mathbb {P}]$
and let
$\beth $
be a set of formulas of type
$\Delta $
. Let
$\tau $
be admissible,
$\delta $
be of degree
$1$
and s be a closed term of
$\mathcal {C}^\omega $
of type
$\sigma (\delta )$
for admissible
$\sigma $
and let
$B_\forall (x,y,z,u)$
/
$C_\exists (x,y,z,v)$
be
$\forall $
-/
$\exists $
-formulas of
$\mathcal {C}^{\omega }$
with only
$x,y,z,u$
/
$x,y,z,v$
free. If
then one can extract a partial functional
$\Phi :\mathcal {S}_{\delta }\times \mathcal {S}_{\widehat {\tau }}\rightharpoonup \mathbb {N}$
which is total and (bar-recursively) computable on
$\mathcal {M}_\delta \times \mathcal {M}_{\widehat {\tau }}$
and such that for all
$x\in \mathcal {S}_\delta $
,
$z\in \mathcal {S}_\tau $
,
$z^*\in \mathcal {S}_{\widehat {\tau }}$
, if
$z^*\gtrsim z$
, then
holds whenever
$\mathcal {S}^{\omega ,\Omega ,S}\models \beth $
for
$\mathcal {S}^{\omega ,\Omega ,S}$
defined via any nonempty set
$\Omega $
and any algebra
$S\subseteq 2^\Omega $
(or, in the context of the constant
$\bigcup $
, any
$\sigma $
-algebra) together with any probability content
$\mathbb {P}$
on S (and with suitable interpretations of the additional constants). Further:
1. If $\widehat {\tau }$ is of degree $1$, then $\Phi $ is a total computable functional.
2. We may have tuples instead of single variables $x,y,z,u,v$ and a finite conjunction instead of a single premise $\forall u^0 B_\forall (x,y,z,u)$.
3. If the claim is proved without $\textsf {DC}$, then $\tau $ may be arbitrary and $\Phi $ will be a total functional on $\mathcal {S}_\delta \times \mathcal {S}_{\widehat {\tau }}$ which is primitive recursive in the sense of Gödel [Reference Gödel13] and Hilbert [Reference Hilbert18]. In that case, also plain majorization can be used instead of strong majorization (see, e.g., [Reference Kohlenbach26]).
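For orientation, the premise and the conclusion of Theorem 8.9 have the usual shape of such bound extraction results (compare Theorem 17.52 in [Reference Kohlenbach26]). The following rendering is a sketch assembled from the notation of the theorem, in particular from the premise $\forall u^0 B_\forall (x,y,z,u)$ mentioned in item (2) and the quantifier prefix $\forall x^\delta \forall y\leq _\sigma s(x)$ treated in the proof; the precise form is an assumption and should not be read as a verbatim statement.

$$\begin{align*} &\text{Premise:} && \mathcal{C}^\omega+\beth\vdash\forall x^\delta\,\forall y\leq_\sigma s(x)\,\forall z^\tau\left(\forall u^0B_\forall(x,y,z,u)\rightarrow\exists v^0C_\exists(x,y,z,v)\right),\\ &\text{Conclusion:} && \mathcal{S}^{\omega,\Omega,S}\models\forall y\leq_\sigma s(x)\left(\forall u\leq_0\Phi(x,z^*)\,B_\forall(x,y,z,u)\rightarrow\exists v\leq_0\Phi(x,z^*)\,C_\exists(x,y,z,v)\right). \end{align*}$$

In this form, the bound $\Phi (x,z^*)$ depends on z only through its majorant $z^*$ and is independent of y as well as of $\Omega $, S, and $\mathbb {P}$.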
Proof. First, assume that
$\mathcal {S}^{\omega ,\Omega ,S}\models \beth $
and (for simplicity) that
Clearly, the same statement is then also provable in
$\overline {\mathcal {C}}^\omega _\beth $
. By assumption,
$B_\forall (z,u)=\forall \underline {a} B_{qf}(z,u,\underline {a})$
and
$C_\exists (z,v)=\exists \underline {b} C_{qf}(z,v,\underline {b})$
for quantifier-free
$B_{qf}$
and
$C_{qf}$
. Thus, by prenexation, we get
Using Lemma 8.3 (which is applicable as
$\overline {\mathcal {C}}^\omega _\beth $
is an extension of
$\mathcal {F}^\omega $
only by new constants and purely universal axioms) and disregarding the realizers for
$\underline {a},\underline {b}$
, we get closed terms
$t_u,t_v$
of
$\overline {\mathcal {C}}^{\omega -}_\beth +(\mathrm {BR})$
such that
By Lemma 8.8 there are closed terms
$t^*_u,t^*_v$
of
$\mathcal {A}^\omega +(\mathrm {BR})$
such that
for all nonempty sets
$\Omega $
, any algebra (or
$\sigma $
-algebra)
$S\subseteq 2^\Omega $
and any probability content
$\mathbb {P}$
on S and where the constants are interpreted as in Lemma 8.8. Define
Then
holds for all
$z\in \mathcal {M}_\tau $
and
$z^*\in \mathcal {M}_{\widehat {\tau }}$
with
$z^*\gtrsim z$
. The conclusion that
$\mathcal {S}^{\omega ,\Omega ,S}$
satisfies the same sentence can be obtained as in the proof of Theorem 17.52 in [Reference Kohlenbach26], which we sketch here: Note that in the conclusion, we restrict ourselves to those z which have majorants
$z^*$
. As the type of z is admissible, it takes arguments of small type for which
$\mathcal {M}^{\omega ,\Omega ,S}$
and
$\mathcal {S}^{\omega ,\Omega ,S}$
coincide (recall the proof of Lemma 8.8). Therefore, any such
$z,z^*$
from
$\mathcal {S}^{\omega ,\Omega ,S}$
also live in
$\mathcal {M}^{\omega ,\Omega ,S}$
so that
$\Phi (z^*)$
is well-defined for
$z,z^*$
belonging to
$\mathcal {S}^{\omega ,\Omega ,S}$
with
$z^*\gtrsim z$
. In
$B_\forall $
, all types are admissible so that truth in
$\mathcal {S}^{\omega ,\Omega ,S}$
implies truth in
$\mathcal {M}^{\omega ,\Omega ,S}$
and similarly for
$C_\exists $
where thus truth in
$\mathcal {M}^{\omega ,\Omega ,S}$
implies truth in
$\mathcal {S}^{\omega ,\Omega ,S}$
. Lastly, as in Lemma 17.84 in [Reference Kohlenbach26], we can show that as
$\Phi $
is of type
$0(\widehat {\tau })$
, the interpretations of
$\Phi $
in
$\mathcal {S}^{\omega ,\Omega ,S}$
and
$\mathcal {M}^{\omega ,\Omega ,S}$
coincide on majorizable elements. All in all, the arguments above imply that
holds for all
$z\in \mathcal {S}_\tau $
and
$z^*\in \mathcal {S}_{\widehat {\tau }}$
with
$z^*\gtrsim z$
.
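In the simplified case without the parameters x and y, the steps of this proof can be summarized as follows. This is a sketch in the notation of the proof; in particular, defining $\Phi $ as the maximum of the two majorants is an assumption that follows the standard argument in [Reference Kohlenbach26].

$$\begin{align*} &\mathcal{C}^\omega+\beth\vdash\forall z^\tau\left(\forall u^0B_\forall(z,u)\rightarrow\exists v^0C_\exists(z,v)\right) &&\text{(assumption)}\\ &\overline{\mathcal{C}}^\omega_\beth\vdash\forall z^\tau\,\exists u^0,v^0,\underline{a},\underline{b}\left(B_{qf}(z,u,\underline{a})\rightarrow C_{qf}(z,v,\underline{b})\right) &&\text{(prenexation)}\\ &\overline{\mathcal{C}}^{\omega-}_\beth+(\mathrm{BR})\vdash\forall z^\tau\left(B_\forall(z,t_u(z))\rightarrow C_\exists(z,t_v(z))\right) &&\text{(Lemma 8.3)}\\ &\mathcal{M}^{\omega,\Omega,S}\models t^*_u\gtrsim t_u\land t^*_v\gtrsim t_v &&\text{(Lemma 8.8)}\\ &\Phi(z^*):=\max\{t^*_u(z^*),t^*_v(z^*)\} &&\text{(definition)}\\ &\mathcal{M}^{\omega,\Omega,S}\models\forall u\leq_0\Phi(z^*)\,B_\forall(z,u)\rightarrow\exists v\leq_0\Phi(z^*)\,C_\exists(z,v) &&\text{(for }z^*\gtrsim z). \end{align*}$$

The last line uses that $t_u(z)\leq t^*_u(z^*)\leq \Phi (z^*)$ and $t_v(z)\leq t^*_v(z^*)\leq \Phi (z^*)$ whenever $z^*\gtrsim z$.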
The additional
$\forall x^\delta \forall y\leq _\sigma s(x)$
can be treated as, for example, discussed in [Reference Kohlenbach25] and we thus omit any details. Similarly, item (1) can be shown as in the proof of Theorem 17.52 from [Reference Kohlenbach26] (see page 428 therein). Further, (2) is immediate and (3) follows from the fact that without
$\textsf {DC}$
, bar recursion becomes superfluous, and the model
$\mathcal {M}^{\omega ,\Omega ,S}$
can be avoided.
9 Applications of the metatheorems
In this section, we are concerned with applications of the above metatheorems. Concretely, we want to indicate how the systems introduced previously can be used together with their metatheorems to recognize previous (ad hoc) applications in the spirit of the proof mining program as proper instances of proof-theoretic bound extraction theorems. For this, we here focus on the seminal work [Reference Avigad, Dean and Rute2] by Avigad, Dean, and Rute. There, we find that the quantitative results obtained in [Reference Avigad, Dean and Rute2] for Egorov’s theorem, as well as the dominated convergence theorem, although in general being of bar-recursive complexity (which is due to the use of a certain principle of countable choice as we will later see formally), are nevertheless highly uniform, being in particular independent of the space, the measurable sets, and the measure. As already mentioned in the introduction, the authors of [Reference Avigad, Dean and Rute2] presumed that this independence can be explained using some instantiation of the notion of majorizability. In that way, the metatheorems proved before and the discussion in this section show that this intuition was correct and that the uniformities are a necessary consequence of the novel form of majorizability introduced in this paper.
However, as already discussed in the introduction, we want to mention that the focus on the work [Reference Avigad, Dean and Rute2] shall be understood to be merely indicative of the usefulness of the systems and metatheorems discussed before. In particular, essentially all other quantitative works on probability theory in the spirit of proof mining that have so far been considered in the literature can be similarly recognized as instances of the metatheorems proved here. Similarly, there too, the peculiar uniformities observed in practice are a priori guaranteed by the approach towards the metatheorems chosen here.
Now, the work [Reference Avigad, Dean and Rute2] is concretely concerned with interrelations between different modes of convergence for sequences of random variables. The most prominent of these modes, also based on its similarity to a usual notion of pointwise convergence of functions, is that of almost sure convergence.
Definition 9.1 (Almost sure convergence).
Let
$(\Omega ,S,\mathbb {P})$
be a probability space and
$(X_n)$
be a sequence of random variables
$X_n:\Omega \to \mathbb {R}$
(i.e.,
$X_n$
is measurable w.r.t. S and the Borel
$\sigma $
-algebra on
$\mathbb {R}$
). Then
$(X_n)$
is said to converge almost surely to a random variable
$X:\Omega \to \mathbb {R}$
if
This notion of almost sure convergence does not lend itself easily to a quantitative account of that convergence. Thus, in many cases where probability theorists are concerned with quantitative results (see, e.g., [Reference Luzia39, Reference Siegmund55]), they opt for a different, but equivalent, formulation of almost sure convergence known as almost uniform convergence.
Definition 9.2 (Almost uniform convergence).
Let
$(\Omega ,S,\mathbb {P})$
be a probability space and
$(X_n)$
be a sequence of random variables
$X_n:\Omega \to \mathbb {R}$
. Then
$(X_n)$
is said to converge almost uniformlyFootnote
22
to a random variable
$X:\Omega \to \mathbb {R}$
if for all
$\varepsilon , \delta>0$
there exists an
$N \in \mathbb {N}$
such that
The seminal result that these two notions of convergence are indeed equivalent is known as Egorov’s theorem [Reference Egoroff9]. Note that since
$(\Omega ,S,\mathbb {P})$
is a probability space and the
$X_n$
’s are random variables (and therefore measurable), the sets we take the probability of in the above definitions are measurable sets.
This result was analyzed quantitatively in [Reference Avigad, Dean and Rute2] and was then used in turn to provide a quantitative dominated convergence theorem for the Lebesgue integral. With the results from this section, we will be able to recognize this analysis as an instance of the preceding metatheorems for probability contents on algebras.
We now move towards formalizing the results from [Reference Avigad, Dean and Rute2] and for that purpose introduce a sequence of random variables into the system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral}]$
by adding an additional constant
$X^{1(\Omega )(0)}$
together with the axiom
to stipulate that the sequence in question belongs to our subspace of bounded and weakly Borel-measurable functions. It is clear that Theorem 8.9 extends from
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral}]$
to its extension by this constant X as any
$X(n)$
is trivially majorizable as it is bounded and the whole constant X is thus majorized by a maximum construction. In that system, using axiom
$(I)_3$
that asserts the weak Borel-measurability of any
$X(n)$
, it is then in particular a consequence of
$\Pi ^\Omega _1$
-
$\mathsf {AC}$
(and actually of
$\mathsf {b}\text {-}\mathsf {AC}_{\Omega ,S}$
by regarding quantification over the type S as bounded) that there exists a functional
$P^{S(0)(0)(0)}$
such that
We add such a P directly into the language of the system via a new constant of the appropriate type together with the above defining property as an axiom and denote the resulting system by
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
.
Using this functional P, we can then formally introduce the alternative way of formulating almost uniform convergence using finite unions as (implicitly) introduced in [Reference Avigad, Dean and Rute2], which both allows for a natural metastable variant and extends any discussion regarding this notion naturally to the context of contents:
Definition 9.3. We say that X converges almost uniformly with respect to finite unions if
$$\begin{align*}\forall k^0, a^0\exists b^0\forall c^0\left( \mathbb{P}\kern1.5pt\left(\bigcup_{i=b}^c\bigcup_{j=b}^c P(a,i,j)^c\right)\le_{\mathbb{R}} 2^{-k}\right). \end{align*}$$
It is immediately clear that this notion of almost uniform convergence w.r.t. finite unions is equivalent over probability spaces to the usual notion of almost uniform convergence. Further, we want to emphasize that, as already mentioned above, this mode was introduced in [Reference Avigad, Dean and Rute2] not explicitly but only implicitly: it is naturally suggested by the quantitative rendering of almost uniform convergence used in [Reference Avigad, Dean and Rute2] in their main quantitative result given in Theorem 3.1. Concretely, one immediately finds that a solution of the monotone functional interpretation of the negative translation of almost uniform convergence w.r.t. finite unions is exactly a function
$M(k,a)$
providing a
$2^{-k}$
-uniform bound for the
$2^{-a}$
-metastable convergence of the sequence coded by X as introduced in [Reference Avigad, Dean and Rute2].
Similarly, we can also give a formal representation of the notion of almost uniform metastable pointwise convergence as introduced in [Reference Avigad, Dean and Rute2] in the context of the system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
:
Definition 9.4. We say that X converges almost uniformly metastable pointwisely if
$$\begin{align*}\forall k^0,a^0,F^1\exists b^0\left( \mathbb{P}\kern1.5pt\left(\bigcap_{m=0}^b\bigcup_{i=m}^{F(m)}\bigcup_{j=m}^{F(m)}P(a,i,j)^c\right)\le_{\mathbb{R}} 2^{-k}\right). \end{align*}$$
In contrast to the mode from Definition 9.3, this mode of convergence was explicitly introduced in [Reference Avigad, Dean and Rute2] as a “more quantitatively friendly” version of the notion of almost sure convergence. In particular, as shown by Proposition 4.1 in [Reference Avigad, Dean and Rute2], the notion of almost uniform metastable pointwise convergence coincides over probability spaces with that of almost sure convergence. Also, as essentially already observed in [Reference Avigad, Dean and Rute2], a solution to the monotone functional interpretation of (the negative translation of) the statement of almost uniform metastable pointwise convergence is exactly a function
$M'(k,a)$
providing a
$2^{-k}$
-uniform bound on the
$2^{-a}$
-metastable pointwise convergence of the sequence coded by X as introduced in [Reference Avigad, Dean and Rute2].
In [Reference Avigad, Dean and Rute2], the authors now provide a quantitative version of Egorov’s theorem by constructing a solution of the monotone functional interpretation of (the negative translation of) almost uniform convergence w.r.t. finite unions from a solution of the monotone functional interpretation of (the negative translation of) the statement of almost uniform metastable pointwise convergence. In that way, they in particular provide a uniform quantitative rendering of the corresponding implication. We justify this application by showing in the following that already the system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
proves this implication between the above two modes of convergence. Besides thereby explaining the success and the uniformities of the quantitative version of Egorov’s theorem from [Reference Avigad, Dean and Rute2], the provability in
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
established here shows in particular that the corresponding results from [Reference Avigad, Dean and Rute2] are true for probability contents and not just for probability measures as treated in [Reference Avigad, Dean and Rute2]. That is, the authors inadvertently provided an “Egorov-like theorem” for probability contents. Concretely, this seems to be due in particular to the above renderings of the notions of almost sure and almost uniform convergence, introduced in [Reference Avigad, Dean and Rute2] via a finitary perspective informed by proof mining. These renderings provide exactly those alternative phrasings that are much more compatible with the notion of a content, owing to a computationally effective formulation using finite unions. In that way, the results from this section tie into the comments made in the introduction that the notions and proofs produced through the finitary perspective of proof mining seem to be well suited for a simultaneous lift of the results to the theory of contents.
We now first note that one direction of that equivalence can be easily witnessed in the system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
discussed previously.
Theorem 9.5. The system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
proves that if X converges almost uniformly with respect to finite unions, then X converges almost uniformly metastable pointwisely.
Proof. We reason in
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
. Let k, a, and F be given. Using that X converges almost uniformly w.r.t. finite unions, there exists a b such that
$$\begin{align*}\mathbb{P}\kern1.5pt\left(\bigcup_{i=b}^c\bigcup_{j=b}^c P(a,i,j)^c\right)\le 2^{-k} \end{align*}$$
for all c. Now, we in particular have
$$\begin{align*}\bigcap_{m=0}^b\bigcup_{i=m}^{F(m)}\bigcup_{j=m}^{F(m)} P(a,i,j)^c\subseteq \bigcup_{i=b}^{F(b)}\bigcup_{j=b}^{F(b)} P(a,i,j)^c \end{align*}$$
and therefore
$$\begin{align*}\mathbb{P}\kern1.5pt\left(\bigcap_{m=0}^b\bigcup_{i=m}^{F(m)}\bigcup_{j=m}^{F(m)} P(a,i,j)^c\right)\leq\mathbb{P}\kern1.5pt\left(\bigcup_{i=b}^{F(b)}\bigcup_{j=b}^{F(b)} P(a,i,j)^c\right)\leq 2^{-k} \end{align*}$$
follows from the monotonicity of
$\mathbb {P}$
. This yields that X converges almost uniformly metastable pointwisely.
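The set inclusion at the heart of this proof can also be checked mechanically on small finite examples. The following Python sketch is purely illustrative and not part of the formal development: it fixes the parameter a, models the sets P(a,i,j)^c as random subsets of a small finite sample space, and tests the inclusion for randomly chosen F and every admissible b. The names `union_block`, `metastable_set` and `check_inclusion`, as well as the chosen sizes, are our own assumptions.

```python
import random


def union_block(Pc, lo, hi):
    """Union of Pc[i][j] over lo <= i, j <= hi (empty if hi < lo)."""
    out = set()
    for i in range(lo, hi + 1):
        for j in range(lo, hi + 1):
            out |= Pc[i][j]
    return out


def metastable_set(Pc, F, b):
    """Intersection over m = 0..b of union_block(Pc, m, F[m])."""
    result = None
    for m in range(b + 1):
        block = union_block(Pc, m, F[m])
        result = block if result is None else result & block
    return result


def check_inclusion(trials=200, size=6, omega=range(8), seed=0):
    """Test the inclusion used in the proof on random finite instances.

    Pc[i][j] plays the role of P(a,i,j)^c for a fixed a (hypothetical data).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        Pc = [[{x for x in omega if rng.random() < 0.4} for _ in range(size)]
              for _ in range(size)]
        F = [rng.randrange(size) for _ in range(size)]
        for b in range(size):
            # The left-hand side intersects over m = 0..b, so it is contained
            # in its last block, which is the right-hand side.
            if not metastable_set(Pc, F, b) <= union_block(Pc, b, F[b]):
                return False
    return True
```

Such a check does of course not replace the proof; it merely makes visible that the inclusion only uses that an intersection is contained in each of its members.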
As mentioned before, a quantitative version of the converse of the above theorem is one of the main results of [Reference Avigad, Dean and Rute2]. To obtain it, the authors of [Reference Avigad, Dean and Rute2] mainly utilized a quantitative version of the following property of sequences of events.
Theorem 9.6 (Theorem 2.2 of [Reference Avigad, Dean and Rute2]).
For every sequence of events
$(A_n)$
, any functional
$M: \mathbb {N}^{\mathbb {N}}\to \mathbb {N}$
and any
$\lambda> \lambda ' > 0$
, there exists an
$n \in \mathbb {N}$
such that
$$\begin{align*}\mathbb{P}\kern1.5pt\left(\bigcap_{m = 0}^{M(F)}\bigcup_{j = m}^{F(m)}A_j\right) < \lambda'\text{ for all }F: \mathbb{N} \to \mathbb{N} \end{align*}$$
implies
$\mathbb {P}(A_n) < \lambda $
.
We now discuss in the following how (the proof of) this result can be formalized in our system for probability contents on algebras
$\mathcal {F}^\omega [\mathbb {P}]$
, justifying the existence and the uniformity of the quantitative result given in [Reference Avigad, Dean and Rute2] by means of our metatheorems. Concretely, we show:
Theorem 9.7. The system
$\mathcal {F}^\omega [\mathbb {P}]$
proves:
$$\begin{align*}\forall A^{S(0)}, M^{0(1)}, u^0, v^0>_0 u\exists n^0\left(\forall F^1\left(\mathbb{P}\kern1.5pt\left(\bigcap_{m = 0}^{M(F)}\bigcup_{j = m}^{F(m)}A(j)\right) \le_{\mathbb{R}} 2^{-v} \right)\to\mathbb{P}(A(n)) <_{\mathbb{R}} 2^{-u}\right). \end{align*}$$
In particular, as this theorem of
$\mathcal {F}^\omega [\mathbb {P}]$
has the correct logical form, our main Theorem 8.9 on the extraction of uniform computable bounds applies. We thus find that the existence of a computable bound on the existential quantifier over n is guaranteed a priori. Further, based on our notion of majorizability, this bound is guaranteed to be independent of the content space and the sequence of events, which matches exactly the properties of the bound explicitly calculated in [Reference Avigad, Dean and Rute2].
To now demonstrate Theorem 9.7, we in particular rely on the following lemma:
Lemma 9.8. The system
$\mathcal {F}^\omega [\mathbb {P}]$
proves:
$$\begin{align*}\forall A^{S(0)}, k^0\exists N^0\forall n^0\left(\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^n A(i) \cap \left(\bigcup_{i=0}^N A(i)\right)^c\right) <_{\mathbb{R}} 2^{-k} \right). \end{align*}$$
Proof. We reason in
$\mathcal {F}^\omega [\mathbb {P}]$
. Let
$A^{S(0)}$
and
$k^0$
be given. At first, note that Proposition 4.5 implies that
$$ \begin{align} \exists N \forall n \left(n \ge N \to \left\vert \sum_{i=0}^n\mathbb{P}((A\!\uparrow)(i)) - \sum_{i=0}^N\mathbb{P}((A\!\uparrow)(i))\right\vert< 2^{-k}\right). \tag{$*$} \end{align} $$
Take such an N and let n be arbitrary. If
$n<N$
, then
$$\begin{align*}\bigcup_{i=0}^n A(i) \cap \left(\bigcup_{i=0}^N A(i)\right)^c=\emptyset \end{align*}$$
and so by extensionality of
$\mathbb {P}$
, we get
$$\begin{align*}\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^n A(i) \cap \left(\bigcup_{i=0}^N A(i)\right)^c\right) = 0 \end{align*}$$
and are done. So suppose
$n \ge N$
. Then by
$(*)$
, we get
$$\begin{align*}\left\vert \sum_{i=0}^n\mathbb{P}((A\!\uparrow)(i)) - \sum_{i=0}^N\mathbb{P}((A\!\uparrow)(i))\right\vert < 2^{-k}. \end{align*}$$
Since all the
$(A\!\uparrow )(i)$
are disjoint (by definition of
$A\!\uparrow $
) and since we have
$\bigcup _{i=0}^j (A\!\uparrow )(i) = \bigcup _{i=0}^j A(i)$
for any j, we immediately derive
$$\begin{align*}\sum_{i=0}^j\mathbb{P}((A\!\uparrow)(i)) = \mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^j A(i) \right) \end{align*}$$
for any j by finite additivity and extensionality of
$\mathbb {P}$
. Thus, we in particular have
$$\begin{align*}\left\vert \mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^n A(i) \right) - \mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^N A(i) \right)\right\vert< 2^{-k} \end{align*}$$
and since
$n\geq N$
implies
$\bigcup _{i=0}^N A(i) \subseteq \bigcup _{i=0}^n A(i)$
, we obtain
$$\begin{align*}\mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^n A(i) \cap \left(\bigcup_{i=0}^N A(i)\right)^c\right) = \mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^n A(i) \right) - \mathbb{P}\kern1.5pt\left(\bigcup_{i=0}^N A(i) \right)< 2^{-k} \end{align*}$$
by Proposition 4.3.
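The two facts about the disjointification used in this proof can be illustrated on a finite example. The following Python sketch is only a sanity check under explicit assumptions: we take the uniform probability on a finite sample space, and we assume that the operation written as A-up-arrow above is the usual disjointification, that is, the i-th set is A(i) with all earlier A(j) removed (the definition is given earlier in the paper and not repeated in this section). All function names are hypothetical.

```python
from fractions import Fraction


def disjointify(A):
    """Assumed definition of the disjointification: drop from A[i] all points of A[0..i-1]."""
    seen, out = set(), []
    for event in A:
        out.append(set(event) - seen)
        seen |= set(event)
    return out


def union_upto(A, j):
    """Union of A[0], ..., A[j]."""
    out = set()
    for event in A[: j + 1]:
        out |= event
    return out


def prob(event, omega_size):
    """Uniform probability on a sample space with omega_size points."""
    return Fraction(len(event), omega_size)


def tail_mass(A, N, n, omega_size):
    """Probability of (union of A[0..n]) minus (union of A[0..N]), as bounded in Lemma 9.8."""
    return prob(union_upto(A, n) - union_upto(A, N), omega_size)
```

For instance, with the events {0,1}, {1,2}, {2,3,4}, {0} in a six-point space, the partial sums of the probabilities of the disjointified events agree with the probabilities of the partial unions, and `tail_mass` vanishes whenever n is below N.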
With that lemma, we are now in a position to give a formal proof of the main combinatorial theorem from [Reference Avigad, Dean and Rute2]:
Proof of Theorem 9.7.
Let
$A^{S(0)}$
,
$M^{0(1)}$
,
$u^0$
and
$v^0$
with
$v>u$
be given and suppose
$$\begin{align*}\forall F^1 \left(\mathbb{P}\kern1.5pt\left(\bigcap_{m = 0}^{M(F)}\bigcup_{j = m}^{F(m)}A(j)\right) \le 2^{-v} \right). \end{align*}$$
So, by the previous Lemma 9.8 applied to the sequence of events
$f_m^{S(0)}$
defined by
$f_m(k) = A(k+m)$
, we have
$$\begin{align*}\forall m\exists N \forall n\left(\mathbb{P}\kern1.5pt\left(\bigcup_{i=m}^{n+m} A(i) \cap \left(\bigcup_{i=m}^{N+m} A(i)\right)^c\right) < \frac{2^{-u}- 2^{-v}}{2^{m+1}}\right) \end{align*}$$
and so in particular
$$\begin{align*}\forall m\exists N\geq m \forall n\geq m\left(\mathbb{P}\kern1.5pt\left(\bigcup_{i=m}^{n} A(i) \cap \left(\bigcup_{i=m}^{N} A(i)\right)^c\right) < \frac{2^{-u}- 2^{-v}}{2^{m+1}}\right). \end{align*}$$
Thus, using
$\mathsf {AC}$
(actually, by switching from
$<$
to
$\leq $
in the above formulation and manipulating the bound slightly,
$\Pi ^0_1$
-
$\mathsf {AC}$
suffices), there exists a functional
$F^1$
such that for all m and
$n\geq m$
:
$$\begin{align*}\mathbb{P}\kern1.5pt\left(\bigcup_{i=m}^{n} A(i) \cap \left(\bigcup_{i=m}^{F(m)} A(i)\right)^c\right) < \frac{2^{-u}- 2^{-v}}{2^{m+1}}. \end{align*}$$
It is now easy to see that for this functional F, we have
$$ \begin{align*} A(M(F))&\subseteq \bigcap_{m = 0}^{M(F)}\bigcup_{i=m}^{M(F)} A(i) \\ &\subseteq\left( \bigcap_{m = 0}^{M(F)}\bigcup_{j = m}^{F(m)}A(j)\right) \cup \bigcup_{m = 0}^{M(F)} \left( \bigcup_{i=m}^{M(F)} A(i) \cap \left(\bigcup_{j = m}^{F(m)}A(j)\right)^c \right) \end{align*} $$
and so, by the sub-additivity and monotonicity of
$\mathbb {P}$
, we derive
$$\begin{align*}\mathbb{P}(A(M(F))) < 2^{-v} + \sum_{m=0}^{M(F)} \frac{2^{-u}- 2^{-v}}{2^{m+1}} < 2^{-u} \end{align*}$$
and so taking
$n:=M(F)$
for this functional F yields the claim.
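For the reader's convenience, the last estimate in the above proof can be unfolded as follows. This is only a spelled-out version of the arithmetic already used there, relying on the closed form of the finite geometric sum and on the fact that the difference of the two powers of two is positive since v is greater than u:
$$\begin{align*}2^{-v} + \sum_{m=0}^{M(F)} \frac{2^{-u}- 2^{-v}}{2^{m+1}} &= 2^{-v} + \left(2^{-u}- 2^{-v}\right)\left(1-2^{-(M(F)+1)}\right)\\ &< 2^{-v} + \left(2^{-u}- 2^{-v}\right) = 2^{-u}. \end{align*}$$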
Using this combinatorial result, we can now also prove the converse of Theorem 9.5. We thereby exhibit how a quantitative solution for Theorem 9.7 as obtained in [Reference Avigad, Dean and Rute2] can immediately be used, in conjunction with the proof-theoretic metatheorem established in Theorem 8.9, to derive a quantitative version of Egorov’s theorem in the sense of the above notions incorporating finite unions as presented in Theorem 3.1 of [Reference Avigad, Dean and Rute2], and how this in particular guarantees the observed uniformities of the rates a priori.
Theorem 9.9. The system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
proves that if X converges almost uniformly metastable pointwisely, then X converges almost uniformly with respect to finite unions.
Proof. Suppose that X does not converge almost uniformly with respect to finite unions, that is, that we have k and a such that
$$\begin{align*}\forall m\exists g\left( \mathbb{P}\kern1.5pt\left(\bigcup_{i=m}^g\bigcup_{j=m}^g P(a,i,j)^c\right)> 2^{-k}\right). \end{align*}$$
Using
$\textsf {QF}\mbox {-}\textsf {AC}$
(after suitably prenexing the hidden quantifiers), we get a functional G such that
$$\begin{align*}\forall m\left(\mathbb{P}\kern1.5pt\left(\bigcup_{i=m}^{G(m)}\bigcup_{j=m}^{G(m)} P(a,i,j)^c\right)> 2^{-k}\right). \end{align*}$$
For a contradiction, suppose now that X converges almost uniformly metastable pointwisely. By instantiating the corresponding notion with
$k+1$
and a, we get
$$\begin{align*}\forall F\exists b\left( \mathbb{P}\kern1.5pt\left(\bigcap_{m=0}^b\bigcup_{i=m}^{F(m)}\bigcup_{j=m}^{F(m)}P(a,i,j)^c\right)\leq 2^{-(k+1)}\right). \end{align*}$$
Thus, by
$\textsf {QF}\mbox {-}\textsf {AC}$
(again after suitably prenexing), we get a functional M such that
$$\begin{align*}\forall F\left( \mathbb{P}\kern1.5pt\left(\bigcap_{m=0}^{M(F)}\bigcup_{i=m}^{F(m)}\bigcup_{j=m}^{F(m)}P(a,i,j)^c\right)\leq 2^{-(k+1)}\right). \end{align*}$$
Defining
$M'(F)=M(\lambda n.\tilde {G}(F(n)))$
where
$\tilde {G}(n)=\max _{m \le n} G(m)$
, we get
$$\begin{align*}\forall F\left( \mathbb{P}\kern1.5pt\left(\bigcap_{m=0}^{M'(F)}\bigcup_{i =m}^{\tilde{G}(F(m))}\bigcup_{j=m}^{\tilde{G}(F(m))}P(a,i,j)^c\right)\leq 2^{-(k+1)}\right). \end{align*}$$
We now define a sequence of events A via
$$\begin{align*}A(n):=\bigcup_{i=n}^{G(n)}\bigcup_{j=n}^{G(n)}P(a,i,j)^c. \end{align*}$$
Then we have
$$\begin{align*}\bigcup_{n=m}^{F(m)}A(n) = \bigcup_{n=m}^{F(m)}\bigcup_{i=n}^{G(n)}\bigcup_{j=n}^{G(n)}P(a,i,j)^c\subseteq\bigcup_{i=m}^{\tilde{G}(F(m))}\bigcup_{j=m}^{\tilde{G}(F(m))}P(a,i,j)^c \end{align*}$$
for all m and F and therefore this implies
$$\begin{align*}\forall F\left( \bigcap_{m=0}^{M'(F)}\bigcup_{n=m}^{F(m)}A(n) \subseteq \bigcap_{m=0}^{M'(F)}\bigcup_{i=m}^{\tilde{G}(F(m))}\bigcup_{j=m}^{\tilde{G}(F(m))}P(a,i,j)^c\right). \end{align*}$$
By the monotonicity of
$\mathbb {P}$
, we get
$$\begin{align*}\forall F\left( \mathbb{P}\kern1.5pt\left(\bigcap_{m=0}^{M'(F)}\bigcup_{n=m}^{F(m)}A(n)\right) \leq 2^{-(k+1)}\right), \end{align*}$$
which yields, by Theorem 9.7, that there exists an m such that
$$\begin{align*}\mathbb{P}(A(m)) = \mathbb{P}\kern1.5pt\left(\bigcup_{i=m}^{G(m)}\bigcup_{j=m}^{G(m)} P(a,i,j)^c\right) < 2^{-k} \end{align*}$$
which is a contradiction.
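The two auxiliary constructions used in this proof, the running maximum of G and the modified functional M', are simple higher-order definitions. The following Python sketch illustrates them under the assumption that number-theoretic functions are modeled as Python callables on non-negative integers; the names `running_max` and `lift_modulus` are our own and the example data is hypothetical.

```python
def running_max(G):
    """Return G_tilde with G_tilde(n) = max of G(m) over m <= n."""
    def G_tilde(n):
        return max(G(m) for m in range(n + 1))
    return G_tilde


def lift_modulus(M, G):
    """Return M' with M'(F) = M(lambda n: G_tilde(F(n))), as in the proof above."""
    G_tilde = running_max(G)

    def M_prime(F):
        return M(lambda n: G_tilde(F(n)))
    return M_prime
```

The point of passing to the running maximum is monotonicity: G(n) is bounded by the running maximum at F(m) whenever n is at most F(m), which is exactly what the inclusion of the unions in the proof requires.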
As a last formal elucidation of some of the results from [Reference Avigad, Dean and Rute2], we turn to Theorem 3.2 therein, where the authors provide a quantitative version for a special case of the dominated convergence theorem, strengthening preceding results from Tao [Reference Tao59]. Concretely, they assume for this special case that the random variables are positive and bounded (w.l.o.g.) by
$1$
(which immediately yields the “uniform” majorizability of the sequence X and thus guarantees the full independence of the rate from X via the preceding metatheorems). The following result now establishes that the corresponding infinitary convergence result can be proved in our system for integrals over probability contents
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
and therefore makes it possible to recognize the quantitative results extracted in [Reference Avigad, Dean and Rute2] as an application of the metatheorems established in this paper.
Theorem 9.10. The system
$\mathcal {F}^\omega [\mathbb {P},\mathrm {Integral},X]$
proves: if X converges almost uniformly metastable pointwisely and satisfies
$\forall n^0,x^\Omega \left (0\leq _{\mathbb {R}} X(n)(x)\leq _{\mathbb {R}}1\right )$
, it holds that
$$\begin{align*}\forall k^0\exists n^0 \forall i^0,j^0\left(i,j\ge_0 n \to \left \vert\int{X(i)}\,\mathrm{d}\mathbb{P} - \int{X(j)}\,\mathrm{d}\mathbb{P}\right\vert\le_{\mathbb{R}} 2^{-k}\right). \end{align*}$$
Proof. Let k be given. Since, by Theorem 9.9, X converges almost uniformly w.r.t. finite unions, there exists an n such that
$$\begin{align*}\forall m\left( \mathbb{P}\kern1.5pt\left(\bigcup_{a=n}^m\bigcup_{b=n}^m P(k+1,a,b)^c\right)\le 2^{-(k+2)}\right). \end{align*}$$
Take
$i,j\geq n$
and define
$m=\max \{i,j\}$
as well as
$$\begin{align*}A:=\bigcup_{a=n}^m\bigcup_{b=n}^m P(k+1,a,b)^c. \end{align*}$$
Similar to the proof of item (4) from Lemma 7.4, we have
$$ \begin{align*} \left\vert\int{X(i)}\,\mathrm{d}\mathbb{P} - \int{X(j)}\,\mathrm{d}\mathbb{P}\right\vert&\le\int{\left\vert X(i)-X(j)\right\vert}\,\mathrm{d}\mathbb{P}\\ &=\int{\left\vert X(i)-X(j)\right\vert\chi_A}\,\mathrm{d}\mathbb{P} + \int{\left\vert X(i)-X(j)\right\vert\chi_{A^c}}\,\mathrm{d}\mathbb{P}. \end{align*} $$
As we have
$\left \vert X(i)(x)-X(j)(x)\right \vert \le 2$
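for all x, we get
$$\begin{align*}\int{\left\vert X(i)-X(j)\right\vert\chi_A}\,\mathrm{d}\mathbb{P}\le 2\,\mathbb{P}(A)\le 2\cdot 2^{-(k+2)} = 2^{-(k+1)}. \end{align*}$$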
On the other hand,
$x\in A^c$
yields
$$\begin{align*}x\in \bigcap_{a=n}^m\bigcap_{b=n}^m P(k+1,a,b) \end{align*}$$
which in particular gives, by definition of m, that
$x\in P(k+1,i,j)$
. From the definition of P, this in particular implies that
$$\begin{align*}\left\vert X(i)(x) -X(j)(x)\right\vert \le 2^{-(k+1)} \end{align*}$$
and so we have
$\left \vert X(i)(x) -X(j)(x)\right \vert \chi _{A^c}(x) \le 2^{-(k+1)}$
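which yields
$$\begin{align*}\int{\left\vert X(i)-X(j)\right\vert\chi_{A^c}}\,\mathrm{d}\mathbb{P}\le 2^{-(k+1)} \end{align*}$$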
and we are done.
10 Proof-theoretic transfer principles
In this last section, we present how our systems and metatheorems allow for the proof of a general type of result, which we call a proof-theoretic transfer principle, that allows one to transfer quantitative information on implications between modes of convergence of real numbers to corresponding quantitative information on implications between analogous modes of convergence for bounded random variables. In particular, as this type of reasoning is very common in the literature on the convergence of various iterations of random variables (see, e.g., [Reference Kolmogorov32] among many others), this transfer principle allows for a logical explanation of the strategy of providing a proof-theoretic analysis of such results in practice by mainly analyzing the underlying result on real numbers and then lifting this result together with some (often) simple modifications to random variables.
Concretely, to allow for a discussion of general modes of convergence for real numbers and random variables, we consider the following abstract formal setup: Throughout this section, we fix two
$\Pi _3$
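-formulas
$$\begin{align*}P(x,\underline{p}) :\equiv \forall a^0\exists b^0\forall c^0\, P_0(a,b,c,x,\underline{p}) \end{align*}$$
and
$$\begin{align*}Q(x,\underline{p}) :\equiv \forall a^0\exists b^0\forall c^0\, Q_0(a,b,c,x,\underline{p}) \end{align*}$$
where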
$P_0$
and
$Q_0$
are quantifier-free formulas which only have the indicated variables free. We understand P and Q as abstract representations of modes of convergence, with parameters
$\underline {p}$
, for a sequence of real numbers represented by x.
For an example, we may take
using the previous intensional intervals (where the above statement can thus be regarded as a quantifier-free statement). In that case, P represents the usual Cauchy property for x.
To allow for a discussion of these modes applied to random variables, we extend the system
$\mathcal {F}^\omega [\mathbb {P}]$
with four further constants
together with the axioms
$$ \begin{align*} & \forall \underline{p}^{\underline{\sigma}},a^0,b^0,c^0,z^\Omega(z \in P(a,b,c,\underline{p}) \leftrightarrow P_0(a,b,c,\lambda n.X(n)(z),\underline{p})),\\ & \forall \underline{p}^{\underline{\sigma}},a^0,b^0,c^0,z^\Omega(z \in Q(a,b,c,\underline{p}) \leftrightarrow Q_0(a,b,c,\lambda n.X(n)(z),\underline{p})),\\ & \qquad\qquad\qquad\forall n^0, z^\Omega(\tau(n) \ge_{\mathbb{R}} \vert X(n)(z)\vert ), \end{align*} $$
specifying that the properties
$P_0$
and
$Q_0$
(inducing the predicates P and Q) induce measurable sets pointwisely relative to the sequence of random variables
specified by X and that these random variables are all bounded via a suitable monotone sequence of bounds (i.e., that X as a constant is majorized by
$\tau $
). It is clear that Theorem 8.9 extends to this system, which we denote by
$\mathcal {U}^\omega $
, as all constants are majorizable and since the new axioms are purely universal.
In this extended language, we can then provide a formula that represents the property P if suitably lifted to the sequence of random variables represented by X:
Definition 10.1. We say that X satisfies P almost uniformly, and write
$P(X)$
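a.u., if
$$\begin{align*}\forall k^0, a^0\exists b^0\forall c^0\left( \mathbb{P}(P(a,b,c,\underline{p})^c)\le_{\mathbb{R}} 2^{-k}\right). \end{align*}$$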
Similarly, we define
$Q(X)$
a.u.
If we consider the previous example for
$P_0$
given in (∼), then by formulating
$P(X)$
a.u. in this case we recover the notion of almost uniform convergence with respect to finite unions as given in Definition 9.3 (i.e., the variant of almost uniform convergence implicitly considered in [Reference Avigad, Dean and Rute2]).
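We now turn to our main result that provides a relationship between statements of the form
$$\begin{align*}\forall \underline{p}^{\underline{\sigma}},x^{1(0)} (P(x,\underline{p}) \to Q(x,\underline{p})) \end{align*}$$
and statements of the form
$$\begin{align*}P(X)\text{ a.u.} \to Q(X)\text{ a.u.} \end{align*}$$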
and which thereby not only establishes an upgrade-type theorem from relations between modes of convergence for sequences of reals to sequences of random variables but also allows for a transfer of the computational information obtainable for the implication in the premise to the implication in the conclusion.
Theorem 10.2. Provably in
$\mathcal {U}^\omega $
, given functionals
$V,A,C$
such that
$$\begin{align*}\forall x,\underbrace{\underline{p},x^*,B,u,w}_{\omega}\left( \forall n(x^*(n)\geq \vert x(n)\vert)\land P_0(A\omega,B(A\omega),C\omega,x,\underline{p})\to Q_0(u,V\underline{p}x^*Bu,w,x,\underline{p})\right), \end{align*}$$
we can construct
$V',A',C'$
such that
$$\begin{align*}\forall \underbrace{\underline{p},B,k,u,w}_{\alpha}\left( \mathbb{P}(P(A'\alpha,Bk(A'\alpha),C'\alpha,\underline{p})^c)\leq 2^{-k}\to \mathbb{P}(Q(u,V'\underline{p}Bku,w,\underline{p})^c)\leq 2^{-k}\right). \end{align*}$$
Proof. Given such
$V,A,C$
, and
$\alpha =(\underline {p},B,k,u,w)$
, we define
$$ \begin{align*} & A'\alpha:=A\underline{p}\tau(Bk)uw,\\ & C'\alpha:=C\underline{p}\tau(Bk)uw,\\ & V'\underline{p}Bku:=V\underline{p}\tau(Bk)u. \end{align*} $$
Let z be arbitrary with
$z\in Q(u,V'\underline {p}Bku,w,\underline {p})^c$
. By the axioms of
$\mathcal {U}^\omega $
and the definition of
$V'$
, we have
$$ \begin{align*} z\in Q(u,V'\underline{p}Bku,w,\underline{p})^c&\leftrightarrow z\in Q(u,V\underline{p}\tau(Bk)u,w,\underline{p})^c\\ &\leftrightarrow \neg Q_0(u,V\underline{p}\tau(Bk)u,w,\lambda n.X(n)(z),\underline{p}) \end{align*} $$
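and the latter implies
$$\begin{align*}\neg P_0(A\underline{p}\tau(Bk)uw,Bk(A\underline{p}\tau(Bk)uw),C\underline{p}\tau(Bk)uw,\lambda n.X(n)(z),\underline{p}) \end{align*}$$
using the assumptions on
$V,A,C$
and that
$\tau (n)\geq \vert X(n)(z)\vert $
for any z. This in turn is by definition of
$A',V',C'$
equivalent to
$$\begin{align*}\neg P_0(A'\alpha,Bk(A'\alpha),C'\alpha,\lambda n.X(n)(z),\underline{p}) \end{align*}$$
and thus to
$$\begin{align*}z\in P(A'\alpha,Bk(A'\alpha),C'\alpha,\underline{p})^c. \end{align*}$$
Thus, we have
$$\begin{align*}Q(u,V'\underline{p}Bku,w,\underline{p})^c\subseteq P(A'\alpha,Bk(A'\alpha),C'\alpha,\underline{p})^c \end{align*}$$
as z above was arbitrary and therefore, we get
$$\begin{align*}\mathbb{P}(Q(u,V'\underline{p}Bku,w,\underline{p})^c)\leq \mathbb{P}(P(A'\alpha,Bk(A'\alpha),C'\alpha,\underline{p})^c) \end{align*}$$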
by the monotonicity of
$\mathbb {P}$
. This yields the claim.
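The construction of the lifted functionals in this proof is a plain substitution and can be mirrored by a few lines of code. The following Python sketch is an informal illustration only: functionals are modeled as Python callables, the parameter tuple is passed as a single argument `p`, and B is modeled as a two-argument function so that the partial application Bk becomes `lambda a: B(k, a)`. The name `transfer` and the toy realizers in the usage note are our own assumptions.

```python
def transfer(V, A, C, tau):
    """Build (V', A', C') from (V, A, C) by plugging in the bound tau and B(k, .).

    Mirrors the definitions in the proof:
      A'(p, B, k, u, w) := A(p, tau, Bk, u, w)
      C'(p, B, k, u, w) := C(p, tau, Bk, u, w)
      V'(p, B, k, u)    := V(p, tau, Bk, u)
    """
    def A_prime(p, B, k, u, w):
        return A(p, tau, lambda a: B(k, a), u, w)

    def C_prime(p, B, k, u, w):
        return C(p, tau, lambda a: B(k, a), u, w)

    def V_prime(p, B, k, u):
        return V(p, tau, lambda a: B(k, a), u)

    return V_prime, A_prime, C_prime
```

As a toy instance, consider the trivial implication from P to itself, which is realized by V returning B(u), A returning u, and C returning w. The lifted functional V' then simply returns B(k, u), so a rate for P(X) a.u. is passed through unchanged as a rate for the conclusion.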
This result, while at first glance rather technical and abstract, has a very concrete use recently observed in applications by the first author [Reference Neri43]. To illustrate this, we now briefly discuss the extent of the above result and its use in mathematics:
1. Observe that the conclusion of Theorem 10.2 is just a witnessed version of the Dialectica interpretation of
$$ \begin{align} P(X)\text{ a.u.} \to Q(X)\text{ a.u.} \tag{+} \end{align} $$
and therefore, under
$\textsf {AC}$
, this witnessed Dialectica interpretation in particular implies
$(+)$
. In that way, whenever the premise of Theorem 10.2 is established in
$\mathcal {U}^\omega $
, one immediately obtains the truth of
$(+)$
and so the above result allows for a lift from a (quantitative) result on real numbers to a true result for random variables. Furthermore, another main benefit of the conclusion of Theorem 10.2 is that it allows for the extraction of quantitative information in the sense that the functional
$V'$
provides a transformation of a rate for the premise
$P(X)$
a.u. into a rate for the conclusion
$Q(X)$
a.u. Even further, as
$V'$
can be constructed from V, this transformation of rates can be directly inferred from the transformation of rates V of the presumed result for real numbers.
2. The premise of Theorem 10.2 is “essentially” the Dialectica interpretation of the statement
$$ \begin{align} \forall \underline{p}^{\underline{\sigma}},x^{1(0)} (P(x,\underline{p}) \to Q(x,\underline{p})) \tag{++} \end{align} $$
in the sense that the functionals
$V,A,C$
represent realizers for this interpretation, with the additional assumption that these realizers are suitably uniform, depending only on upper bounds of the sequence
$x^{1(0)}$
. Although one can construct examples where such realizers do not possess this kind of uniformity, in practice, for many theorems of the form
$(++)$
that have a semi-constructive proof, such uniform realizers can be given. In particular, this is true for the forthcoming work by the first author on Kronecker’s Lemma [Reference Neri43] and an upcoming work by Oliva and Arthan on quantitative stochastic optimization [Reference Oliva48]. In both these cases, one uses reasoning about sequences of real numbers to obtain the analogous result about sequences of random variables and a computational interpretation can be given to this line of reasoning. Theorem 10.2 then in particular provides an abstract generalization of this procedure and explains how this reasoning is substantiated by logical results.
Lastly, in the following remark we discuss a counterexample illustrating the necessity of the majorizability of the sequence of random variables in Theorem 10.2:
Remark 10.3. For the above transfer principle to hold, the assumption of the boundedness of the sequence of random variables is necessary as the following example shows: Take
$\Omega := \mathbb {N}$
and let S be the collection of all finite and co-finite subsets of
$\mathbb {N}$
, that is,
Furthermore, define the content
$\mathbb {P}$
by
$\mathbb {P}(A) = 0$
if A is finite and
$\mathbb {P}(A) = 1$
if
$A^c$
is finite, for all
$A \in S$
. Now, we consider the two properties
for a sequence
$x=(x_n)$
of type
$1(0)$
. Clearly, both P and Q are
$\Pi _3^0$
-formulas and are trivially true for all sequences x. Therefore also
$P(x)\to Q(x)$
is trivially true. Further, we can easily give
$V,A,C$
that satisfy the assumptions of Theorem 10.2. Now, consider
for each n. Then the set
$Q(n,m)$
corresponding to
$Q_0$
is just
which belongs to S as it is finite.
$P_0$
is just represented by the full set
$\mathbb {N}$
. Therefore, X satisfies P almost uniformly but does not satisfy Q almost uniformly as any
$Q(n,m)$
has measure
$0$
.
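The content space used in this remark is easy to model concretely. The following Python sketch is a minimal illustration under our own representation choices: a set in S is stored as a flag together with a finite set, namely the set itself if it is finite and its complement otherwise. Only the algebra S and the content are modeled; the concrete properties and the sequence X of the remark are not reproduced. The class name `FinCofin` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinCofin:
    """A finite or co-finite subset of the natural numbers."""
    cofinite: bool
    data: frozenset  # the set itself if finite, otherwise its (finite) complement

    def complement(self):
        return FinCofin(not self.cofinite, self.data)

    def union(self, other):
        if not self.cofinite and not other.cofinite:
            # finite union finite is finite
            return FinCofin(False, self.data | other.data)
        if self.cofinite and other.cofinite:
            # the complement of the union is the intersection of the complements
            return FinCofin(True, self.data & other.data)
        fin, cof = (self, other) if other.cofinite else (other, self)
        # the complement of the union is the co-finite set's complement minus the finite set
        return FinCofin(True, cof.data - fin.data)

    def content(self):
        """The content from the remark: 0 on finite sets, 1 on co-finite sets."""
        return 1 if self.cofinite else 0
```

In this model, every finite set has content 0 and its complement has content 1, so the content is additive on a set and its complement, in line with the definition given in the remark.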
Acknowledgements
Both authors want to thank Thomas Powell, Ulrich Kohlenbach, Henry Towsner, and José Iovino for insightful discussions on the topics of this paper. We also want to thank the anonymous referee for the very careful reading of the manuscript and the resulting helpful suggestions which improved the paper in various places, in particular its presentation.
Competing interests
The authors have no competing interest to declare.
Financial support
The first author was partially supported by the EPSRC Centre for Doctoral Training in Digital Entertainment (EP/L016540/1). The second author was supported by the ‘Deutsche Forschungsgemeinschaft’ Project DFG KO 1737/6-2.

