Toward A Logical Theory Of Fairness and Bias

Fairness in machine learning has attracted considerable interest in recent years, owing to the propensity of algorithms trained on historical data to amplify and perpetuate historical biases. In this paper, we argue for a formal reconstruction of fairness definitions, not so much to replace existing definitions as to ground their application in an epistemic setting and allow for rich environmental modelling. To that end, we look at three notions, namely fairness through unawareness, demographic parity and counterfactual fairness, and formalise these in the epistemic situation calculus.


Introduction
Machine learning techniques have become pervasive across a range of applications, and are the source of considerable excitement but also debate. For example, they are now widely used in areas as disparate as recidivism prediction, consumer credit-risk analysis and insurance pricing (Chouldechova 2017; Khandani et al. 2010). In some of these applications, the prevalence of machine learning techniques has raised concerns about the potential for learned algorithms to become biased against certain groups. This issue is of particular concern when algorithms are used to make decisions that could have far-reaching consequences for individuals, for example in recidivism prediction (Chouldechova 2017; Angwin et al. 2016). Attributes with respect to which the algorithm should be "fair" are typically referred to as protected attributes. The values of these are often hidden from the view of the decision maker (whether automated or human). Many different fields might qualify as protected attributes in a given situation, including ethnicity, sex, age, nationality and marital status (Zemel et al. 2013). Ideally, such attributes should not affect any prediction made by "fair" algorithms. However, even in cases where it is clear which attributes should be protected, there are multiple (and often mutually exclusive) definitions of what it means for an algorithm to be unbiased with respect to these attributes, and there is disagreement within the academic community on what is most appropriate (Dwork et al. 2011; Kusner et al. 2017; Zafar et al. 2017a).
Moreover, even amid pressing concerns that algorithms currently in use may exhibit racial biases, there remains a lack of agreement about how to effectively implement fairness, given the complex socio-technical situations in which such applications are deployed and the background knowledge and context needed to assess the impact of outcomes (e.g., denying a loan to someone in need).
To address such issues broadly, an interesting argument has been championed by the symbolic community: by assuming a rich enough understanding of the application domain, we can encode machine ethics in a formal language. Of course, with recent advances in statistical relational learning, neuro-symbolic AI and inductive logic programming (Raedt et al. 2016; Muggleton et al. 2012), it is possible to integrate low-level pattern recognition based on sensory data with high-level formal specifications. For example, the Hera project (Lindner et al. 2017) allows several kinds of (rule-based) moral theory to be implemented. Geneth (Anderson and Anderson 2014) uses inductive logic programming to create generalised moral principles from the judgements of ethicists about particular ethical dilemmas, with the system's performance being evaluated using an ethical Turing test. On the formalisation side, the study of moral concepts has long been a favored topic in the knowledge representation community (Conway and Gawronski 2013; Alexander and Moore 2016; Czelakowski 1997; Hooker and Kim 2018), and can be further coupled with notions of beliefs, desires and intentions (Broersen et al. 2001; Georgeff et al. 1998). Finally, closer to the thrust of this paper, (Pagnucco et al. 2021) formalize consequentialist and deontological ethical principles in terms of "desirable" states in the epistemic situation calculus, and (Classen and Delgrande 2020) formalize obligations using situation calculus programs.

Contributions
Our thesis, in essence, is this: complementing the vibrant work in the ML community, it is worthwhile to study ethical notions in formal languages. This serves three broad objectives:
1. We can identify what the system needs to know versus what is simply true (Reiter 2001a; Halpern and Moses 2014) and better articulate how this knowledge should impact the agent's choices. It is worth remarking that epistemic logic has served as the foundation for investigating the impact of knowledge on plans and protocols (Levesque 1996; Lespérance et al. 2000; Halpern et al. 2009).
2. We implicitly understand that we can further condition actions against background knowledge (such as ontologies and databases), as well as notions such as intentions and obligations (Sardina and Lespérance 2010).
3. We can position the system's actions not simply as a single-shot decision or prediction, as is usual in the ML literature, but as a sequence of complex events that depend on observations and can involve loops and recursion: that is, in the form of programs (Levesque et al. 1997).
It would be beyond the scope of a single paper to illustrate the interplay between the three objectives except in some particular application scenario. Thus, we focus on the interplay between objectives 1 and 3, in the sense of advocating a "research agenda" rather than a single technical result or a demonstration of a single application. In particular, what we seek is a formal reconstruction of some fairness definitions, not so much to replace existing definitions as to ground their application in an epistemic, dynamic setting. Consequently, we look into three notions, namely fairness through unawareness, demographic parity and counterfactual fairness, and formalise these in the epistemic situation calculus (Scherl and Levesque 2003; Lakemeyer and Levesque 2011). Our contributions are as follows:
• Consider the notion of fairness through unawareness (FTU) in machine learning. Here, a "fair" classifier is one that predicts outputs without using any information about protected attributes. In a dynamic setting, imagine a (virtual or physical) robot that is acting in service of some objective $\phi$. For example, in a loan setting, which is classically treated as a static model in machine learning, we can expect intelligent automated agents to carry out many operations: check the yearly budget of the bank to determine the total amount to be loaned, rank applicants based on risk, determine the impact of recession, and ultimately synthesize a plan to achieve $\phi$ (loan approval). By virtue of FTU, however, it should never be the case that the agent has had access to protected information. In this paper, we provide a simple but general definition to capture that idea, in a manner that distinguishes what is true from what is known by the agent.
• Analogously, consider the notion of demographic parity (DP). It requires a classifier to be equally likely to make a positive prediction regardless of the value of the protected attribute. For example, the proportion of men who are granted loans equals the proportion of women granted loans. So, if $\phi(x)$ is the granting of a loan to individual $x$, how do we capture the notion that the agent has synthesized a plan that achieves $\phi(x)$ for both males as well as females? What would it look like for planning agents that want to conform to both FTU and DP? What if, instead of DP, we wished to only look at those granted loans, and among this group, we did not want the classifier to discriminate based on the individual's gender? For all these cases, we provide definitions in terms of the agent's mental state and action sequences that the agent knows will achieve $\phi(x)$ (Levesque 1996).
• Finally, counterfactual fairness insists that the prediction should not differ if the individual's protected attributes take on a different value. For a planning agent to ensure this, we would need to make sure that deleting facts about the current value of an individual $x$'s protected attribute and adding a different value still achieves $\phi(x)$ after the sequence. We characterize this using the notion of forgetting because we permit, in general, any arbitrary first-order theory for the initial knowledge base, and not just a database interpreted under the closed-world assumption.
These definitions can be seen to realize a specification for "fair" cognitive robots: that is, reasoning and planning agents (Lakemeyer and Levesque 2007) that ensure through the course of their acting that, say, they never gain knowledge about the protected attributes of individuals, and guarantee that individuals are not discriminated against based on the values of these attributes.
It should be clear that our definitions are loosely inspired by the ML notions. As such, our formalisations do not argue for one definition over another, nor challenge any existing definition. We do, however, believe that studying the effects of these definitions in a dynamic setting provides a richer context to evaluate their appropriateness. Moreover, a formalisation such as ours lends itself to various types of implementations. For example, the synthesis of (epistemic) programs and plans (Wang and Zhang 2005; Baral et al. 2017; Muise et al. 2015; Classen et al. 2008; McIlraith and Son 2002) that achieve goals in socio-technical applications in a fair manner is a worthwhile research agenda. Likewise, enforcing fairness constraints while factoring in the relationships between individuals in social networks (Farnadi et al. 2018), or otherwise contextualising attributes against other concepts in a relational knowledge base (Aziz et al. 2018; Fu et al. 2020), is also worthwhile. By stipulating an account in quantified logic, it becomes possible to further unify such proposals in a dynamic setting.
Logic and fairness. Let us briefly remark on closely related efforts. At the outset, note that although there has been considerable work on formalizing moral rules, there is no work (as far as we are aware) on the formalization of fairness and bias in a dynamic epistemic setting, where we need to explicate the interaction between actions, plans and meta-beliefs. However, there is some work that tackles epistemic and logical aspects.
For example, the work of (Kawamoto 2019) considers a statistical epistemic logic and its use for the formalisation of statistical accuracy as well as fairness, including the criterion of equality of opportunity. There are a few key differences from our work: that work is motivated by a probabilistic reconstruction of prediction systems by appealing to distance measures, and so knowledge is defined in terms of accessibility between worlds that are close enough. The language, moreover, allows for "measurement" variables that are interpreted statistically. In contrast, our account is not (yet) probabilistic, and if our account were to be extended in that fashion, the most obvious version would reason about degrees of belief (Bacchus et al. 1999; Belle and Lakemeyer 2017); see (Bacchus et al. 1996) for discussions on the differences between statistical belief and degrees of belief. Moreover, our account is dynamic, allowing for explicit modal operators for actions and programs. Consequently, our definitions are about studying how, say, the agent remains ignorant about protected attributes when executing a plan.
Be that as it may, the work of (Kawamoto 2019) leads to an account where fairness can be expressed as a logical property using predicates for protected attributes, remarkably similar in spirit to our approach if one were to ignore actions. This should, at the very least, suggest that such attempts are very promising, and for the future, it would be worthwhile to conduct a deeper investigation into how these formalisation attempts can be synthesized to obtain a general probabilistic logical account that combines the strengths of dynamic epistemic languages and statistical measures. (In a related vein to (Kawamoto 2019), (Liu and Lorini 2022) seek to axiomatize ML systems for the purpose of explanations in a modal logic.) An entirely complementary effort is the use of logic for verifying fair models (Ignatiev et al. 2020), where existing definitions and classifiers are encoded using logical functions and satisfiability modulo theories.
To summarize, all these differ from our work in that we are attempting to understand the interplay between bias, action and knowledge, and are not concerned with capturing classifiers as objects in our language. Thus, our work, as discussed above, can be seen as setting the stage for "fair" cognitive robots. There is benefit to unifying these streams, which we leave to the future.

A logic for knowledge and action
We now introduce the logic ES (Lakemeyer and Levesque 2004).¹ The non-modal fragment of ES consists of standard first-order logic with $=$. That is, connectives $\{\land, \forall, \neg\}$, syntactic abbreviations $\{\exists, \equiv, \supset\}$ defined from those connectives, and a supply of variables $\{x, y, \ldots, u, v, \ldots\}$. Different from the standard syntax, however, is the inclusion of (countably many) standard names (or simply, names) $R$ for both objects and actions, which will allow a simple, substitutional interpretation for $\forall$ and $\exists$. These can be thought of as special extra constants that satisfy the unique name assumption and an infinitary version of domain closure.
¹ Our choice of language may seem unusual, but it is worth noting that this language is a modal syntactic variant of the classical epistemic situation calculus that is better geared for reasoning about knowledge (Lakemeyer and Levesque 2011).
More importantly, it can be shown that reasoning about actions and knowledge reduces to first-order reasoning via the so-called regression and representation theorems (Lakemeyer and Levesque 2004). (For space reasons, we do not discuss such matters further here.) There are, of course, many works explicating the links between the situation calculus and logic programming; see, for example, (Lee and Palla 2012). See also works that link the situation calculus to planning, such as (Classen et al. 2008; Belle 2022; Sardina et al. 2004; Baier et al. 2007).
As in the situation calculus, to model immutable properties, we assume rigid predicates and functions, such as IsPlant(x) and father(x) respectively. To model changing properties, ES includes fluent predicates and functions of every arity, such as Broken(x) and height(x). Note that there is no longer a situation term as an argument in these symbols to distinguish the fluents from the rigids. For example, ES also includes distinguished fluent predicates Poss and SF to model the executability of actions and capture sensing outcomes respectively, but they are unary predicates, in contrast to the classical situation calculus (Reiter 2001b), because they no longer include situation terms. Terms and formulas are constructed as usual. The set of ground atoms $P$ is obtained, as usual, from names and predicates.
There are four modal operators in ES: $[a]$, $\Box$, $K$ and $O$. For any formula $\alpha$, we read $[a]\alpha$, $\Box\alpha$ and $K\alpha$ as "$\alpha$ holds after $a$", "$\alpha$ holds after any sequence of actions" and "$\alpha$ is known," respectively. Moreover, $O\alpha$ is to be read as "$\alpha$ is only-known." Given a sequence $\delta = a_1 \cdots a_k$, we write $[\delta]\alpha$ to mean $[a_1] \cdots [a_k]\alpha$. In classical situation calculus parlance, we would use $[a]\alpha$ to capture successor situations as properties that are true after an action in terms of the current state of affairs. Together with the $\Box$ modality, which captures quantification over situations and histories, basic action theories can be defined. As in the classical approach, one is interested in the entailments of the basic action theory.
Semantics. Recall that in the simplest setup of the possible-worlds semantics, worlds map propositions to $\{0, 1\}$, capturing the (current) state of affairs. ES is based on the very same idea, but extended to dynamical systems. So, suppose a world maps $P$ and $Z$ to $\{0, 1\}$. Here, $Z$ is the set of all finite sequences of action names, including the empty sequence $\langle\rangle$. Let $W$ be the set of all worlds, and $e \subseteq W$ be the epistemic state. By a model, we mean a triple $(e, w, z)$ where $z \in Z$. Intuitively, each world can be thought of as a situation calculus tree, denoting the properties true initially but also after every sequence of actions. $W$ is then the set of all such trees. Given a triple $(e, w, z)$, $w$ denotes the real world, and $z$ the actions executed so far.
To account for how knowledge changes after (noise-free) sensing, one defines $w' \sim_z w$, which is to be read as saying "$w'$ and $w$ agree on the sensing for $z$", as follows:
• if $z = \langle\rangle$, then $w' \sim_z w$ for every $w'$; and
• $w' \sim_{z \cdot a} w$ iff $w' \sim_z w$, $w'[Poss(a), z] = 1$ and $w'[SF(a), z] = w[SF(a), z]$.
This is saying that initially, we would consider all worlds compatible, but after actions, we would need the world $w'$ to agree on the executability of actions performed so far as well as agree on sensing outcomes. The reader might notice that this is clearly a reworking of the successor state axiom for the knowledge fluent in (Scherl and Levesque 2003).
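With this, we get a simple account of truth. We define the satisfaction of formulas wrt (with respect to) the triple $(e, w, z)$, and the semantics is defined inductively:
• $e, w, z \models p$ iff $p$ is an atom and $w[p, z] = 1$;
• $e, w, z \models \alpha \land \beta$ iff $e, w, z \models \alpha$ and $e, w, z \models \beta$;
• $e, w, z \models \neg\alpha$ iff $e, w, z \not\models \alpha$;
• $e, w, z \models \forall x\, \alpha$ iff $e, w, z \models \alpha^x_n$ for all names $n \in R$;
• $e, w, z \models [a]\alpha$ iff $e, w, z \cdot a \models \alpha$;
• $e, w, z \models \Box\alpha$ iff $e, w, z \cdot z' \models \alpha$ for all $z' \in Z$;
• $e, w, z \models K\alpha$ iff for all $w' \sim_z w$, if $w' \in e$ then $e, w', z \models \alpha$; and
• $e, w, z \models O\alpha$ iff for all $w' \sim_z w$, $w' \in e$ iff $e, w', z \models \alpha$.
We write $\Sigma \models \alpha$ (read as "$\Sigma$ entails $\alpha$") to mean that for every $M = (e, w, \langle\rangle)$, if $M \models \alpha'$ for all $\alpha' \in \Sigma$, then $M \models \alpha$.
Properties. Let us first begin by observing that given a model $(e, w, z)$, we do not require $w \in e$. It is easy to show that if we stipulated the inclusion of the real world in the epistemic state, $K\alpha \supset \alpha$ would be true. That is, suppose $K\alpha$. By the definition above, $w$ is surely compatible with itself after any $z$, and so $\alpha$ must hold at $w$. Analogously, properties regarding knowledge can be proven with comparatively simpler arguments in a modal framework, in relation to the classical epistemic situation calculus. Valid properties include:
1. $\Box(K\alpha \land K(\alpha \supset \beta) \supset K\beta)$;
2. $\Box(K\alpha \supset KK\alpha)$;
3. $\Box(\neg K\alpha \supset K\neg K\alpha)$;
4. $\Box(\forall x\, K\alpha \supset K\forall x\, \alpha)$; and
5. $\Box(\exists x\, K\alpha \supset K\exists x\, \alpha)$.
Note that such properties hold over all possible action sequences, which explains the presence of the $\Box$ operator on the outside. The first is about the closure of modus ponens within the epistemic modality. The second and third are on positive and negative introspection. The last two reason about quantification outside the epistemic modality, and what that means in terms of the agent's knowledge. For example, item 5 says that if there is some individual $n$ such that the agent knows Teacher(n), it follows that the agent believes $\exists x\, Teacher(x)$ to be true. This may seem obvious, but note that the property is really saying that the existence of an individual in some possible world implies that such an individual exists in all accessible worlds. It is because there is a fixed domain of discourse that these properties come out true; they are referred to as the Barcan formula.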
As seen above, the logic ES allows for a simple definition of the notion of only-knowing in the presence of actions (Levesque 1990), which allows one to capture both the beliefs as well as the non-beliefs of the agent. Using the modal operator $O$ for only-knowing, it can be shown that for any non-modal $\{\alpha, \beta\}$: $O\alpha \models K\beta$ if $\alpha \models \beta$, and $O\alpha \models \neg K\beta$ if $\alpha \not\models \beta$. That is, only-knowing a knowledge base also means knowing everything entailed by that knowledge base. Conversely, it also means not believing anything that is not entailed by the knowledge base. In that sense, $K$ can be seen as an "at least" epistemic operator, and $O$ captures both "at least" and "at most" knowing. This can be powerful to ensure, for example, that the agent provably does not know protected attributes.
We will now consider the axiomatization of a basic action theory in ES. But before explaining how successor state axioms are written, one might wonder whether a successor state axiom for $K$ is needed, as one would for Knows in the epistemic situation calculus. It turns out that, because the compatibility of the worlds already accounts for the executability of actions and sensing outcomes in accessible worlds, such an axiom is actually a valid property of the logic, relating $[a]K\alpha$ to what is known before $a$ and to the truth value of $SF(a)$. (As is usual, free variables are implicitly quantified from the outside.) Thus, what will be known after an action is understood in terms of what was known previously together with the sensing outcome. The example below will further clarify how SF works.
Basic Action Theories. To axiomatize the domain, we consider the analogue of the basic action theory in the situation calculus (Reiter 2001b). It consists of:
• axioms that describe what is true in the initial states, as well as what is known initially;
• precondition axioms that describe the conditions under which actions are executable using a distinguished predicate Poss;
• successor state axioms that describe the conditions under which changes happen to fluents after actions (incorporating Reiter's monotonic solution to the frame problem); and
• sensing axioms that inform the agent about the world using a distinguished predicate SF.
Note that foundational axioms as usually considered in Reiter's variant of the situation calculus (Reiter 2001b) are not needed as the tree-like nature of the situations is baked into the semantics.
Let us consider a simple example of a loan agency set up for the employees of a company. For simplicity, assume actions are always executable: $Poss(a) \equiv \mathit{true}$. Let us also permit a sensing axiom that allows one to look up if an individual is male: $SF(a) \equiv (a = isMale(x) \land Male(x)) \lor a \neq isMale(x)$. For simplicity, we assume binary genders, but it is a simple matter of using a predicate such as Gender(x, y) instead to allow individuals $x$ to take on gender $y$ from an arbitrary set.
Turning to successor state axioms, let us suppose having a loan is simply a matter of the manager approving, and unless the manager denies it at some point, the individual continues to hold the loan. For illustration purposes, we will consider a company policy that approves loans for those with high salaries. High salaries are enabled for an "eligible" individual if they are promoted by the manager, and salaries remain high unless they get demoted. Finally, we model eligibility and maleness as rigid, but this is not necessary, and we can permit actions that update the gender of individuals in the database. These are formalized as successor state axioms, where the left-hand side of the equivalence captures the idea that for every sequence of actions, the effect of doing $a$ on a predicate is given by the right-hand side of the equivalence.
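The prose above suggests successor state axioms of roughly the following shape. This is a sketch under stated assumptions rather than a definitive axiomatization: the action names deny, promote and demote are hypothetical (only approve and isMale are named elsewhere in the text), and the rigid predicates Eligible and Male need no axiom of their own.

```latex
\begin{align*}
\Box[a]\,\mathit{hasLoan}(x) &\equiv a = \mathit{approve}(x) \lor
   \bigl(\mathit{hasLoan}(x) \land a \neq \mathit{deny}(x)\bigr),\\
\Box[a]\,\mathit{highSalary}(x) &\equiv
   \bigl(a = \mathit{promote}(x) \land \mathit{Eligible}(x)\bigr) \lor
   \bigl(\mathit{highSalary}(x) \land a \neq \mathit{demote}(x)\bigr).
\end{align*}
```

Each right-hand side has the usual two disjuncts: an action that makes the fluent true, or the fluent already being true and the action not being the one that makes it false.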
We will lump the successor state, precondition and sensing axioms as $\Sigma_{dyn}$. The sentences that are true initially will be referred to by $\Sigma_0$; however, the agent cannot be expected to know everything that is true, and so let $\Sigma'_0$ be what is believed initially. It may seem natural to let $\Sigma'_0 \subseteq \Sigma_0$, but that is not necessary. The agent might be uncertain about what is true (e.g., $\Sigma_0$ might have $p$ but $\Sigma'_0$ has $p \lor q$ instead). However, for simplicity, we will require that agents at least believe the dynamics works as would the real world. Therefore, we consider entailments wrt the following background theory: $\Sigma = \Sigma_0 \land \Sigma_{dyn} \land O(\Sigma'_0 \land \Sigma_{dyn})$. In our example, there are two groups of individuals, $n_i$ and $n'_i$ for $i \in N$, the first male and the second female, the first considered eligible and the second not considered eligible; that is, $\Sigma_0 = \{Male(n_i), \neg Male(n'_i), Eligible(n_i), \neg Eligible(n'_i) \mid i \in N\}$. All that the agent knows is the eligibility of the individuals; that is, $\Sigma'_0 = \{Eligible(n_i), \neg Eligible(n'_i) \mid i \in N\}$. Note that $N$ here is any set, possibly an infinite one; that is, the language allows $N = \mathbb{N}$. For ease of readability, however, we let $N = \{1\}$ in our examples below, and we write $n_1$ as $n$ and $n'_1$ as $n'$.
It is worth quickly remarking that many features of the language are omitted here for simplicity. For example, ES can be extended with second-order variables (Classen and Lakemeyer 2008), which allows one to consider the equivalent of GOLOG programs (Levesque et al. 1997). Likewise, notions of probabilistic actions (Bacchus et al. 1999), epistemic achievability (Lespérance et al. 2000), and causality (Batusov and Soutchanski 2018), in addition to studying program properties (Classen 2018), are interesting dimensions to explore in the fairness context.
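To make the interplay of worlds, sensing and knowledge concrete, the loan domain with $N = \{1\}$ can be simulated over a finite set of worlds. The following is a minimal sketch, not the paper's implementation: the function names, the action names deny, promote and demote, and the assumption that all fluents are initially false are illustrative choices.

```python
from itertools import product

INDIVIDUALS = ("n", "n_prime")
RIGID = ("Male", "Eligible")

def all_worlds():
    """Every assignment of the rigid predicates to the individuals."""
    atoms = [(pred, x) for pred in RIGID for x in INDIVIDUALS]
    return [dict(zip(atoms, bits))
            for bits in product([False, True], repeat=len(atoms))]

def holds(world, z, atom):
    """Truth value of a ground atom in `world` after the action sequence `z`."""
    pred, x = atom
    if pred in RIGID:
        return world[atom]                  # rigid predicates never change
    value = False                           # assumption: fluents start out false
    for name, arg in z:                     # apply the successor state axioms
        if arg != x:
            continue
        if pred == "hasLoan":
            if name == "approve":
                value = True
            elif name == "deny":
                value = False
        elif pred == "highSalary":
            if name == "promote" and world[("Eligible", x)]:
                value = True
            elif name == "demote":
                value = False
    return value

def sf(world, action):
    """Sensing result: isMale(x) reports Male(x); other actions return true."""
    name, arg = action
    return world[("Male", arg)] if name == "isMale" else True

def compatible(w_prime, w, z):
    """w' ~_z w: agreement on sensing along z (Poss is always true here).
    SF only depends on rigid facts, so it need not be evaluated per prefix."""
    return all(sf(w_prime, a) == sf(w, a) for a in z)

def knows(e, w, z, atom, positive=True):
    """K(atom), or K(not atom) if positive=False, at the model (e, w, z)."""
    return all(holds(w2, z, atom) == positive
               for w2 in e if compatible(w2, w, z))

# Real world: n is male and eligible, n_prime is female and not eligible.
REAL = {("Male", "n"): True, ("Eligible", "n"): True,
        ("Male", "n_prime"): False, ("Eligible", "n_prime"): False}
# The agent only knows eligibility, so its epistemic state leaves gender open.
E = [w for w in all_worlds()
     if w[("Eligible", "n")] and not w[("Eligible", "n_prime")]]
```

With these definitions, `knows(E, REAL, (("approve", "n"),), ("hasLoan", "n"))` holds, whereas the agent knows neither Male(n) nor its negation until the sensing action isMale(n) is performed.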
Forgetting. In some of the definitions of fairness, we will need to force the setting where information about protected attributes is forgotten. While standard ML approaches propose to do this via column deletion (e.g., remove all entries for the gender attribute), a richer notion is arguably needed for a first-order knowledge base. We appeal to the notion of forgetting (Lin and Reiter 1994).
Lin and Reiter defined the notion of forgetting, which is adapted to ES below. They show that while forgetting ground atoms is first-order definable, forgetting relations needs second-order logic. We only focus on the case of atoms, but it would be interesting to study how fairness notions are affected when protected attributes are completely absent from a theory.
Suppose $S$ denotes a finite set of ground atoms. We write $\mathcal{M}(S)$ to mean the set of all truth assignments to $S$. Slightly abusing notation, given a ground atom $p$, we write $w' \sim_p w$ to mean that $w'$ and $w$ agree on everything initially, except maybe $p$. That is, for every atom $q \neq p$, $w[q, \langle\rangle] = w'[q, \langle\rangle]$. Next, for every non-empty action sequence $z$ and every atom $q'$, $w[q', z] = w'[q', z]$.
Definition. Given a formula $\phi$ not mentioning modalities, we say $\phi'$ is the result of forgetting atom $p$, denoted $Forget(\phi, p)$, if for any world $w$, $w \models \phi'$ iff there is a $w'$ such that $w' \models \phi$ and $w \sim_p w'$. Inductively, given a set of atoms $\{p_1, \ldots, p_k\}$, define $Forget(\phi, \{p_1, \ldots, p_k\})$ as $Forget(\cdots Forget(Forget(\phi, p_1), p_2) \cdots, p_k)$.
It is not hard to show that forgetting amounts to setting an atom to true everywhere or setting it to false everywhere. In other words:
Proposition. $Forget(\phi, S) \equiv \bigvee_{M \in \mathcal{M}(S)} \phi[M]$, where $\phi[M]$ is $\phi$ evaluated under $\bigwedge_i (p_i = b_i)$, that is, with each proposition $p_i \in S$ accorded (and replaced by) the truth value $b_i \in \{0, 1\}$ given by $M$.
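For propositional formulas, the proposition above can be checked by brute force. The sketch below represents a formula as a Python function over truth assignments; the helper names and the atoms `Male_n` and `Eligible_n` are illustrative assumptions, and the code only covers the ground, non-modal case.

```python
from itertools import product

def assignments(atoms):
    """All truth assignments to `atoms`, as dicts."""
    for bits in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, bits))

def forget(phi, p):
    """Forget(phi, p): phi with p set to true, or phi with p set to false."""
    return lambda m: phi({**m, p: True}) or phi({**m, p: False})

def forget_all(phi, atoms_to_forget):
    """Forget a set of atoms by forgetting them one at a time."""
    for p in atoms_to_forget:
        phi = forget(phi, p)
    return phi

def equivalent(f, g, atoms):
    """True iff f and g agree on every assignment to `atoms`."""
    return all(bool(f(m)) == bool(g(m)) for m in assignments(atoms))

ATOMS = ("Male_n", "Eligible_n")
kb = lambda m: m["Male_n"] and m["Eligible_n"]     # Male(n) and Eligible(n)
forgotten = forget(kb, "Male_n")                   # should reduce to Eligible(n)
```

Here `forgotten` is equivalent to Eligible(n) alone: the information about the protected atom is gone, while everything the knowledge base said about the other atom is retained.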
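Abusing notation, we extend the notion of forgetting an atom $p$ to basic action theories and the background theory by applying it solely to what is true/known initially: $Forget(\Sigma_0 \land \Sigma_{dyn}, p)$ is defined as $Forget(\Sigma_0, p) \land \Sigma_{dyn}$, and $Forget(\Sigma, p)$ is defined as $Forget(\Sigma_0, p) \land \Sigma_{dyn} \land O(Forget(\Sigma'_0, p) \land \Sigma_{dyn})$. One of the benefits of lumping the knowledge of the agent as an objective formula in the context of the only-knowing operator is the relatively simple definition of forgetting.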
Proposition. Suppose $\phi$ is non-modal. Suppose $p$ is an atom. For every objective $\psi$ such that $Forget(\phi, p) \models \psi$, it is also the case that $O(Forget(\phi, p)) \models K\psi$.
Because $O\phi \models K\psi$ for every $\{\phi, \psi\}$ provided $\phi \models \psi$, the above statement holds immediately. Insofar as we are concerned with a non-modal initial theory and the effects of forgetting, our definition of $Forget(\Sigma, p)$ above (notational abuse notwithstanding) suffices. In contrast, forgetting with arbitrary epistemic logical formulas is far more involved (Zhang and Zhou 2009).

Existing notions
As discussed, we will not seek to simply retrofit existing ML notions in a logical language; rather we aim to identify the principles and emphasize the provenance of unfair actions in complex events. Nonetheless, it is useful to revisit a few popular definitions to guide our intuition.
Fairness through unawareness. Fairness through unawareness (FTU) is the simplest definition of fairness; as its name suggests, an algorithm is "fair" if it is unaware of the protected attribute $a_p$ of a particular individual when making a prediction (Kusner et al. 2017).
Definition. For some set of attributes $X$, any mapping $f : X \to \hat{y}$ where $a_p \notin X$ satisfies fairness through unawareness (Kusner et al. 2017). (Assume $y$ denotes the true label.)
This prevents the algorithm from learning direct bias on the basis of the protected attribute, but does not prevent indirect bias, which the algorithm can learn by exploiting the relationship between other training variables and the protected attribute (Pedreschi et al. 2008; Hardt et al. 2016). Moreover, if any of the training attributes are allocated by humans, there is the potential for bias to be introduced.
Statistical measures of fairness. Rather than defining fairness in terms of the scope of the training data, much of the existing literature instead assesses whether an algorithm is fair on the basis of a number of statistical criteria that depend on the predictions made by the algorithm (Hardt et al. 2016; Kusner et al. 2017; Zemel et al. 2013). One widely used and simple criterion is demographic parity (DP). In the case that the predicted outcome and the protected attribute $a_p$ are both binary variables, a classifier is said to satisfy demographic parity (Hardt et al. 2016) if: $P(\hat{y} = 1 \mid a_p = 1) = P(\hat{y} = 1 \mid a_p = 0)$. By this definition, a classifier is considered fair if it is equally likely to make a positive prediction regardless of the value of the protected attribute $a_p$.
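As a small worked illustration of the criterion, the two conditional probabilities can be estimated as positive-prediction rates on a sample. The data and function names below are made up for illustration; the sketch assumes binary predictions and a binary protected attribute, as in the definition above.

```python
def positive_rate(predictions, protected, group):
    """Empirical estimate of P(y_hat = 1 | a_p = group)."""
    selected = [y for y, a in zip(predictions, protected) if a == group]
    if not selected:
        raise ValueError(f"no individuals with protected value {group}")
    return sum(selected) / len(selected)

def demographic_parity_gap(predictions, protected):
    """Absolute difference between the two positive rates; 0 means parity."""
    return abs(positive_rate(predictions, protected, 1)
               - positive_rate(predictions, protected, 0))

# Hypothetical predictions for eight individuals, four per group.
y_hat = [1, 0, 1, 1, 0, 1, 0, 1]
a_p = [1, 1, 1, 1, 0, 0, 0, 0]
gap = demographic_parity_gap(y_hat, a_p)   # 3/4 versus 2/4
```

On this sample the positive rates are 3/4 and 2/4, so the classifier that produced `y_hat` would not satisfy demographic parity exactly.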
Fairness and the individual. Another problem with statistical measures is that, provided that the criterion is satisfied, an algorithm will be judged to be fair regardless of the impact on individuals. In view of that, various works have introduced fairness metrics which aim to ensure that individuals are treated fairly, rather than simply considering the statistical impact on the population as a whole (Dwork et al. 2011; Kusner et al. 2017). Counterfactual fairness (CF), for example, was proposed as a fairness criterion in (Kusner et al. 2017). The fundamental principle behind this definition of fairness is that the outcome of the algorithm's prediction should not be altered if different individuals within the sample training set were allocated different values for their protected attributes (Kusner et al. 2017). This criterion is written in the following form: $P(\hat{y}_{A_p \leftarrow a_p} = y \mid X = x, A_p = a_p) = P(\hat{y}_{A_p \leftarrow a'_p} = y \mid X = x, A_p = a_p)$ for all $y$ and $a'_p$. The notation $\hat{y}_{A_p \leftarrow a_p}$ is understood as "the value of $\hat{y}$ if $A_p$ had taken the value $a_p$" (Kusner et al. 2017).
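In a deterministic toy model, the counterfactual criterion reduces to checking that each individual's prediction is unchanged when the protected attribute is flipped while the latent background is held fixed. The structural equation and the two predictors below are invented for illustration and are a simplification of the probabilistic definition of Kusner et al.

```python
def features(u, a):
    """Assumed structural equation: the observed feature x depends on the
    latent background u and on the protected attribute a."""
    return {"x": u + 2 * a, "a": a}

def predict_from_x(f):
    """Predictor that thresholds the observed feature (which depends on a)."""
    return int(f["x"] >= 3)

def predict_from_u(f):
    """Predictor that first removes the effect of a, so it depends on u only."""
    return int(f["x"] - 2 * f["a"] >= 2)

def counterfactually_fair(predictor, units):
    """True iff flipping a (holding u fixed) never changes the prediction."""
    return all(predictor(features(u, a)) == predictor(features(u, 1 - a))
               for u, a in units)

# Hypothetical individuals as (latent background u, protected attribute a).
UNITS = [(0, 0), (1, 1), (2, 0), (3, 1)]
```

Here `predict_from_x` fails the check (the individual `(1, 1)` is predicted 1 but would be predicted 0 with the attribute flipped), whereas `predict_from_u` passes it.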

Formalizing Fairness
At the outset, let us note a few salient points about our formalizations of FTU, DP and CF:
1. Because we are not modeling a prediction problem, our definitions below should be seen as being loosely inspired by existing notions rather than faithful reconstructions. In particular, we will look at "fair outcomes" after a sequence of actions. Indeed, debates about problems with the mathematical notions of fairness in single-shot prediction problems are widespread (Dwork et al. 2011; Kusner et al. 2017; Zafar et al. 2017a), leading to recent work on the long-term effects of fairness (Creager et al. 2020). However, we are ignoring probabilities in the formalization in the current work only to better study the principles behind the above notions. We suspect that with a probabilistic epistemic dynamic language (Bacchus et al. 1999), the definitions might resemble mainstream notions almost exactly and yet organically use them over actions and programs, which is attractive.
2. The first-order nature of the language, such as quantification, will allow us to easily differentiate fairness for an individual versus groups. In the mainstream literature, this has to be argued informally, and the intuition grasped meta-linguistically.
3. Because we model the real world in addition to the agent's knowledge, we will be able to articulate what needs to be true vs just believed by the agent. In particular, our notion of equity will refer to the real world.
4. De-re vs de-dicto knowledge will mean having versus not having information about protected attributes respectively. Sensing actions can be set up to enable de-re knowledge if need be, but it is easy to see in what follows that de-dicto is preferable.
5. Action sequences can make predicates true, and this will help us think about equity in terms of balancing opportunities across instances of protected attributes (e.g., making some property true so that we achieve gender balance).
Fairness through unawareness. Let us begin with FTU: recall that it requires that the agent does not know the protected attributes of the individuals. To simplify the discussion, let us assume we are concerned with one such attribute $\theta(x)$, say, Male(x), in our examples for concreteness. We might be interested in achieving hasLoan(x) or highSalary(x), for example, either for all $x$ or for some individual.
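Definition. A sequence $\delta = a_1 \cdots a_k$ implements FTU for $\phi$ wrt protected attribute $\theta(x)$ iff $\Sigma \models [\delta]K\phi$; and for every $\delta' \leq \delta$ (that is, every prefix of $\delta$): $\Sigma \models [\delta']\neg\exists x\, K\theta(x)$.
The attractiveness of a first-order formalism is that in these and other definitions below where we quantify over all individuals, it is immediate to limit the applicability of the conditions wrt specific individuals. Suppose $n$ is such an individual. Then: $\delta$ implements FTU for $\phi$ wrt $\theta(x)$ for individual $n$ iff $\Sigma \models [\delta]K\phi$; and for every $\delta' \leq \delta$: $\Sigma \models [\delta']\neg K\theta(n)$.
Example. Consider $\Sigma$ from the loan example above, Male(x) as the protected attribute, and suppose $\delta = approve(n) \cdot approve(n')$. It is clear that $\delta$ implements FTU for both the universal $\phi = \forall x\, hasLoan(x)$ as well as an individual $\phi = hasLoan(n)$. Throughout the history, the agent does not know the gender of the individual.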
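The example can be replayed mechanically over a finite set of worlds. The sketch below is an illustration under simplifying assumptions, not the paper's machinery: worlds only record the gender of the two individuals, nobody holds a loan initially, the action deny is a hypothetical name, and the unawareness check is whether Male(x) is known for some x after each prefix of the sequence.

```python
from itertools import product

PEOPLE = ("n", "n_prime")
REAL = {"n": True, "n_prime": False}       # Male(x) in the real world
# Epistemic state: the agent considers every gender assignment possible.
E = [dict(zip(PEOPLE, bits))
     for bits in product([False, True], repeat=len(PEOPLE))]

def sf(world, action):
    """isMale(x) senses Male(x); every other action trivially returns true."""
    name, x = action
    return world[x] if name == "isMale" else True

def accessible(z):
    """Worlds of E that agree with the real world on all sensing results in z."""
    return [w for w in E if all(sf(w, a) == sf(REAL, a) for a in z)]

def has_loan(z, x):
    """hasLoan(x) after z; it depends only on the actions, not on the world."""
    value = False
    for name, arg in z:
        if arg == x and name == "approve":
            value = True
        elif arg == x and name == "deny":
            value = False
    return value

def knows_protected(z):
    """Exists x such that Male(x) holds in every accessible world after z."""
    worlds = accessible(z)
    return any(all(w[x] for w in worlds) for x in PEOPLE)

def implements_ftu(delta, goal):
    """Goal is known after delta, and no prefix of delta reveals Male(x).
    Since hasLoan is world-independent here, knowing the goal coincides
    with the goal being true."""
    prefixes = [delta[:i] for i in range(len(delta) + 1)]
    return goal(delta) and not any(knows_protected(p) for p in prefixes)

everyone_has_loan = lambda z: all(has_loan(z, x) for x in PEOPLE)
delta = (("approve", "n"), ("approve", "n_prime"))
delta_with_sensing = (("isMale", "n"),) + delta
```

The sequence `delta` passes the check, while `delta_with_sensing` fails because the first prefix already lets the agent know Male(n). A stricter variant, not attempted here, would also rule out knowing that Male(x) is false.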
Before turning to other notions, let us quickly reflect on proxy variables. Recall that in the ML literature, these are variables that indirectly provide information about protected attributes. We might formalize this using entailment:

Definition. Given a protected attribute θ(x) and theory Σ, let the proxy set Proxy(θ(x)) be the set of predicates {η_1(x), …, η_k(x)} such that Σ ⊨ ∀x(η_i(x) ⊃ θ(x)), for i ∈ {1, …, k}. That is, given the axioms in the background theory, η_i(x) tells us about θ(x).
Example. Suppose the agent knows the following sentence: ∀x(EtonForBoys(x) ⊃ Male(x)). Let us assume EtonForBoys(x) is rigid, like Male(x). Let us also assume that K(EtonForBoys(n)). It is clear that having information about this predicate for n would mean the agent can infer that n is male.
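The entailment test behind the proxy set can be illustrated with a small brute-force sketch. It is a propositional simplification and an assumption of ours: the universally quantified axiom is checked for a single generic individual by enumerating truth assignments, and the predicate Tall is a hypothetical non-proxy added for contrast.

```python
from itertools import product

# Predicates of one generic individual; "Tall" is hypothetical.
PREDICATES = ["EtonForBoys", "Male", "Tall"]

def models(axioms):
    """Enumerate all truth assignments that satisfy every axiom."""
    for values in product([True, False], repeat=len(PREDICATES)):
        world = dict(zip(PREDICATES, values))
        if all(axiom(world) for axiom in axioms):
            yield world

def entails(axioms, sentence):
    """axioms |= sentence, checked by exhaustive enumeration."""
    return all(sentence(world) for world in models(axioms))

def proxy_set(axioms, theta):
    """Predicates eta with axioms |= (eta implies theta)."""
    return [eta for eta in PREDICATES
            if eta != theta
            and entails(axioms, lambda w, eta=eta: (not w[eta]) or w[theta])]

# Background theory: EtonForBoys(x) implies Male(x).
SIGMA = [lambda w: (not w["EtonForBoys"]) or w["Male"]]

proxies = proxy_set(SIGMA, "Male")
# Adding the fact EtonForBoys(n) lets the agent infer Male(n).
reveals_gender = entails(SIGMA + [lambda w: w["EtonForBoys"]], lambda w: w["Male"])
gender_known_without_fact = entails(SIGMA, lambda w: w["Male"])
```

In this toy setting only EtonForBoys qualifies as a proxy, and the gender becomes entailed only once the proxy fact about the individual is added to the theory.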
The advantage of looking at entailment in our definitions is that we do not need to isolate the proxy set at all: whatever information we might have about the proxy set and its instances, all we really need to check is whether Σ ⊨ ∃x Kθ(x).

Demographic parity. Let us now turn to DP. In the probabilistic context, DP refers to the proportion of individuals in the domain: say, the proportion of males promoted is the same as the proportion of females promoted. In logical terms, although FTU permitted its definition to apply to both groups and individuals, DP is, by definition, necessarily a quantified constraint. In contrast, CF will stipulate conditions solely on individuals.

Definition.
To reiterate, in probabilistic terms, the proportion of men who are promoted equals the proportion of women who are promoted. In the categorical setting, the agent knows that all men are promoted as well as that all women are promoted.
Note that even though the agent does not know the gender of the individuals, in every possible world, regardless of the gender assigned to an individual n in that world, n has the loan. In other words, all men and all women hold the loan. This is de-dicto knowledge of the genders, and it is sufficient to capture the thrust of DP.
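The de-dicto reading can be sketched by brute force over possible worlds. This is an illustrative toy model under our own assumptions: DP is read as "after δ, in every world the agent considers possible, all men and all women have the loan", the domain is {n, n′}, gender is unknown, and each step of δ approves one individual.

```python
from itertools import product

INDIVIDUALS = ["n", "n_prime"]

def possible_worlds():
    """Every gender assignment is considered possible; nobody has a loan yet."""
    return [
        {"Male": dict(zip(INDIVIDUALS, genders)),
         "hasLoan": {x: False for x in INDIVIDUALS}}
        for genders in product([True, False], repeat=len(INDIVIDUALS))
    ]

def approve(world, x):
    """Assumed effect of approve(x): hasLoan(x) becomes true; Male is rigid."""
    return {"Male": dict(world["Male"]),
            "hasLoan": {**world["hasLoan"], x: True}}

def run(worlds, delta):
    """Apply the approvals in delta to every possible world."""
    for x in delta:
        worlds = [approve(w, x) for w in worlds]
    return worlds

def parity_in_world(w):
    """All men have the loan and all women have the loan, in this world."""
    men_ok = all(w["hasLoan"][x] for x in INDIVIDUALS if w["Male"][x])
    women_ok = all(w["hasLoan"][x] for x in INDIVIDUALS if not w["Male"][x])
    return men_ok and women_ok

def implements_dp(delta):
    """De-dicto DP: parity holds in every world considered possible after delta."""
    return all(parity_in_world(w) for w in run(possible_worlds(), delta))

def knows_everyones_gender(worlds):
    """De-re knowledge: all possible worlds agree on Male(x) for each x."""
    return all(len({w["Male"][x] for w in worlds}) == 1 for x in INDIVIDUALS)

dp_both = implements_dp(["n", "n_prime"])
dp_only_n = implements_dp(["n"])
de_re = knows_everyones_gender(run(possible_worlds(), ["n", "n_prime"]))
```

Approving both individuals yields parity in every possible world even though the agent never learns anyone's gender, whereas approving only n does not.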
We might be tempted to propose a stronger requirement, stipulating de-re knowledge:

That is, the agent knows whether x is male or not, for every x.

Example.
FTU-DP. In general, since we do not wish the agent to know the values of protected attributes, vanilla DP is more attractive. Formally, we may impose an FTU-style constraint of not knowing on any fairness definition. For example,

Definition.
Again, it is worth remarking that mixing and matching constraints is straightforward in a logic, and the semantical apparatus provides us with the tools to study the resulting properties.
One can also consider situations where some knowledge of protected attributes is useful, both to ensure parity and to account for special circumstances. In such cases, the protected attribute itself could be "hidden" in a more general class, which is easy enough to do in a relational language.
Example. Suppose we introduce a new predicate for underrepresented groups. We might have, for example: ∀x((¬Male(x) ∨ … ∨ RaceMinority(x)) ⊃ Underrepresented(x)). This could be coupled with a sensing axiom of the sort: SF(checkU(x)) ≡ Underrepresented(x). Add the predicate definition and the sensing axiom to the initial theory and the dynamic axioms in Σ, respectively. Consider δ = checkU(n) · checkU(n′) · approve(n) · approve(n′). Then δ implements strong DP for hasLoan(x) wrt attribute Underrepresented(x). That is, both represented and underrepresented groups have loans.
Equality of opportunity. One problem with DP is that (unless the instance rate of y = 1 happens to be the same in both the a_p = 0 group and the a_p = 1 group), the classifier cannot achieve 100% classification accuracy and satisfy the fairness criterion simultaneously (Hardt et al. 2016). Also, there are scenarios where this definition is completely inappropriate because the instance rate of y = 1 differs so starkly between different demographic groups. Finally, there are also concerns that statistical parity measures fail to account for fair treatment of individuals (Dwork et al. 2011). Nonetheless, it is often regarded as the most appropriate statistical definition when an algorithm is trained on historical data (Zafar et al. 2017b; Zemel et al. 2013).
A modification of demographic parity is "equality of opportunity" (EO). By this definition, a classifier is considered fair if, among those individuals who meet the positive criterion, the rate of correct prediction is identical, regardless of the value of the protected attribute (Hardt et al. 2016). This condition can be expressed as (Hardt et al. 2016): P(ŷ = 1 | a_p = a, y = 1) = P(ŷ = 1 | a_p = a′, y = 1) ∀ a, a′. In (Hardt et al. 2016), it is pointed out that a classifier can simultaneously satisfy equality of opportunity and achieve perfect prediction, whereby ŷ = y (prediction = true label) in all cases.
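The contrast between the two statistical criteria can be checked on a tiny hand-made dataset. The sketch below is illustrative only: the records are invented, the field names a_p, y and y_hat are our own, and EO is computed as equal true positive rates across groups, following the prose description above.

```python
import math

# Invented records: group 0 has base rate 0.5, group 1 has base rate 0.25.
# The predictor is perfect: y_hat equals y everywhere.
RECORDS = (
    [{"a_p": 0, "y": y, "y_hat": y} for y in (1, 1, 0, 0)]
    + [{"a_p": 1, "y": y, "y_hat": y} for y in (1, 0, 0, 0)]
)

def mean(values):
    """Average of a list, or None if the list is empty."""
    return sum(values) / len(values) if values else None

def positive_rate(records, group):
    """P(y_hat = 1 | a_p = group): the quantity compared by demographic parity."""
    return mean([r["y_hat"] for r in records if r["a_p"] == group])

def true_positive_rate(records, group):
    """P(y_hat = 1 | a_p = group, y = 1): the quantity compared by EO."""
    return mean([r["y_hat"] for r in records
                 if r["a_p"] == group and r["y"] == 1])

def equal_across_groups(records, metric):
    """True if the metric takes the same value in every group where it is defined."""
    groups = sorted({r["a_p"] for r in records})
    values = [metric(records, g) for g in groups]
    values = [v for v in values if v is not None]
    return all(math.isclose(v, values[0]) for v in values)

satisfies_dp = equal_across_groups(RECORDS, positive_rate)
satisfies_eo = equal_across_groups(RECORDS, true_positive_rate)
```

With unequal base rates, the perfect predictor in this example satisfies EO but not DP, which is the trade-off described in the text.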
In the logical setting, this can be seen as a matter of only looking at individuals that satisfy a criterion, such as being eligible for promotion or not being too old to run for office.
Definition. A sequence δ implements EO for φ(x) wrt attribute θ(x) and criterion η(x) iff:

Example. Consider δ = promote(n) · promote(n′), let φ(x) = highSalary(x) and the criterion η(x) = Eligible(x). Although the promote action for n′ does not lead her to obtain a high salary, because we condition the definition only on eligible individuals, δ does indeed implement EO. Note again that the agent does not know the gender of n′, but in every possible world, regardless of the gender n′ is assigned, n′ is known to be ineligible. In contrast, n is eligible and δ leads to n having a high salary. That is, every eligible male now has a high salary, and every eligible female also has a high salary. (It just so happens there are no eligible females, but we will come to that.)

In general, the equality of opportunity criterion might well be better applied in instances where there is a known underlying discrepancy in positive outcomes between two different groups, and this discrepancy is regarded as permissible. However, as we might observe in our background theory, there is systematic bias in that no woman is considered eligible.
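The example can be replayed in a small possible-worlds sketch. The model is an assumption reconstructed from the surrounding examples rather than the paper's basic action theory: the agent knows that n is eligible and n′ is not, does not know anyone's gender, and promote(x) yields highSalary(x) only if x is eligible. EO is read here as parity restricted to eligible individuals in every possible world, and the unrestricted parity check is included for contrast.

```python
from itertools import product

INDIVIDUALS = ["n", "n_prime"]
# Assumed to be known by the agent: only n is eligible.
ELIGIBLE = {"n": True, "n_prime": False}

def possible_worlds():
    """Gender is unknown, eligibility is known, nobody has a high salary yet."""
    return [
        {"Male": dict(zip(INDIVIDUALS, genders)),
         "Eligible": dict(ELIGIBLE),
         "highSalary": {x: False for x in INDIVIDUALS}}
        for genders in product([True, False], repeat=len(INDIVIDUALS))
    ]

def promote(world, x):
    """Assumed effect: promote(x) gives x a high salary only if x is eligible."""
    new = {key: dict(value) for key, value in world.items()}
    if world["Eligible"][x]:
        new["highSalary"][x] = True
    return new

def run(delta):
    """Apply the promotions in delta to every possible world."""
    worlds = possible_worlds()
    for x in delta:
        worlds = [promote(w, x) for w in worlds]
    return worlds

def parity(world, restrict_to_eligible):
    """All (eligible) men and all (eligible) women have a high salary."""
    def relevant(x):
        return world["Eligible"][x] or not restrict_to_eligible
    men_ok = all(world["highSalary"][x] for x in INDIVIDUALS
                 if world["Male"][x] and relevant(x))
    women_ok = all(world["highSalary"][x] for x in INDIVIDUALS
                   if not world["Male"][x] and relevant(x))
    return men_ok and women_ok

def implements_eo(delta):
    return all(parity(w, restrict_to_eligible=True) for w in run(delta))

def implements_dp(delta):
    return all(parity(w, restrict_to_eligible=False) for w in run(delta))

delta = ["n", "n_prime"]
eo_holds = implements_eo(delta)
dp_holds = implements_dp(delta)
```

In this toy model δ passes the eligibility-restricted check but fails unrestricted parity, since n′ never obtains a high salary.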
Counterfactual fairness. Let us now turn to CF. The existing definition forces us to consider a "counterfactual world" where the protected attribute values are reversed, and to ensure that the action sequence still achieves the goal.
The definition of CF is well-intentioned, but does not quite capture properties that might enable equity. Indeed, there is a gender imbalance in the theory, in the sense that only the male employee is eligible for promotions and the female employee can never become eligible. Yet CF does not quite capture this. Let us revisit the example with getting high salaries:

Example. Consider δ = promote(n) for property highSalary(n) wrt attribute Male(n). It is clear that δ implements CF because the gender is irrelevant given that n is eligible. However, given δ′ = promote(n′), we see that δ′ does not implement CF for highSalary(n′) wrt Male(n′). Because n′ is not eligible, highSalary(n′) does not become true after the promotion.
Equity. Among the many growing criticisms of formal definitions of fairness is that notions such as CF fail to capture systemic injustices and imbalances. We do not suggest that formal languages would address such criticisms, but they provide an opportunity to study desirable augmentations to the initial knowledge or action theory.
Rather than propose a new definition, let us take inspiration from DP, which seems fairly reasonable except that it is stated in the context of what the agent knows. Keeping in mind a desirable "positive" property such as Eligible(x), let us consider DP but at the world level:

Definition. Given a theory Σ, protected attribute θ(x), positive property η(x), where x is the individual, define strong equity:

In general, it may not be feasible to ensure that properties hold for all instances of both genders. For example, there may be only a handful of C-level executives, and we may wish that there are executives of both genders.
We assume weak equity and focus on FTU below.The definitions could be extended to strong equity or other fairness notions depending on the modelling requirements.
Definition. A sequence δ = a_1 ⋯ a_k implements equitable FTU for φ wrt protected attribute θ(x) and property η(x) iff (a) either weak equity holds in Σ and δ implements FTU; or (b) δ implements equitable FTU for φ wrt θ(x) and η(x) for the updated theory Forget(Σ, S), where

Note that we are assuming that N is finite here because we have only defined forgetting wrt finitely many atoms. Otherwise, we would need a second-order definition.
Example. Consider δ = promote(n) · promote(n′) for goal φ = ∀x(highSalary(x)) wrt protected attribute Male(x) and property Eligible(x). It is clear that weak equity does not hold for Σ because there is a female who is not eligible. In this case, consider Σ′ = Forget(Σ, S) where S = {Eligible(n), Eligible(n′)}. With that, Σ′ no longer mentions that n is eligible, so the promotion actions do not lead to anyone having a high salary. So δ does not enable knowledge of φ.
Example. Let us consider Σ′ that is like Σ except that Eligible(x) is not rigid, and can be affected using the action make(x): [a]Eligible(x) ≡ Eligible(x) ∨ (a = make(x)). That is, either an individual is eligible already or the manager makes them eligible. Of course, δ = promote(n) · promote(n′) from above still does not implement equitable FTU, because we have not considered any actions yet to make individuals eligible. However, consider δ′ = make(n) · make(n′) · promote(n) · promote(n′). Because Σ does not satisfy weak equity, we turn to the second condition of the definition. On forgetting, no one is eligible in the updated theory, but the first two actions in δ′ make both n and n′ eligible, after which they are both promoted. So δ′ enables knowledge of ∀x(highSalary(x)). Thus, the actions have made clear that eligibility is the first step in achieving gender balance, after which promotions guarantee that there are individuals of both genders with high salaries.
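The effect of forgetting followed by make and promote can be sketched as follows. This is a simplified illustration of condition (b) only, under our own assumptions: forgetting Eligible(n) and Eligible(n′) is modelled by letting eligibility vary freely across possible worlds, promote(x) gives a high salary only to an eligible x, and unawareness is read as never knowing whether Male(x) holds after any prefix.

```python
from itertools import product

INDIVIDUALS = ["n", "n_prime"]

def worlds_after_forgetting():
    """After forgetting the Eligible atoms, both gender and eligibility are unknown."""
    size = len(INDIVIDUALS)
    return [
        {"Male": dict(zip(INDIVIDUALS, bits[:size])),
         "Eligible": dict(zip(INDIVIDUALS, bits[size:])),
         "highSalary": {x: False for x in INDIVIDUALS}}
        for bits in product([True, False], repeat=2 * size)
    ]

def do(action, world):
    """Assumed successor-state behaviour of make(x) and promote(x)."""
    name, x = action
    new = {key: dict(value) for key, value in world.items()}
    if name == "make":  # make(x) makes x eligible
        new["Eligible"][x] = True
    elif name == "promote" and world["Eligible"][x]:
        new["highSalary"][x] = True  # promotion pays off only for eligible x
    return new

def knows(worlds, prop):
    """K(prop): prop holds in every world the agent considers possible."""
    return all(prop(w) for w in worlds)

def knows_whether_male(worlds, x):
    return len({w["Male"][x] for w in worlds}) == 1

def achieves_goal_unaware(delta, goal):
    """Goal is known after delta, and gender is never known along the way."""
    worlds = worlds_after_forgetting()
    for step in range(len(delta) + 1):
        if step > 0:
            worlds = [do(delta[step - 1], w) for w in worlds]
        if any(knows_whether_male(worlds, x) for x in INDIVIDUALS):
            return False
    return knows(worlds, goal)

def everyone_high_salary(w):
    return all(w["highSalary"].values())

promote_only = [("promote", "n"), ("promote", "n_prime")]
make_then_promote = [("make", "n"), ("make", "n_prime")] + promote_only

result_promote_only = achieves_goal_unaware(promote_only, everyone_high_salary)
result_make_then_promote = achieves_goal_unaware(make_then_promote, everyone_high_salary)
```

Under these assumptions, promotion alone does not yield knowledge of the goal once eligibility is forgotten, whereas making both individuals eligible first does.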

Conclusions
In this paper, we looked into notions of fairness from the machine learning literature, and inspired by these, we attempted a formalization in an epistemic logic. Although we limited ourselves to categorical knowledge and noise-free observations, we enrich the literature by considering actions. Consequently, we looked into three notions: fairness through unawareness, demographic parity and counterfactual fairness, but then expanded these notions to also tackle equality of opportunity as well as equity. We were also able to mix and match constraints, showing the advantage of a logical approach, where one can formally study the properties of (combinations of) definitions. Using a simple basic action theory, we were nonetheless able to explore these notions using action sequences.
As mentioned earlier, this is only a first step, and as argued in works such as (Pagnucco et al. 2021; Dehghani et al. 2008; Halpern and Kleiman-Weiner 2018), there is much promise in looking at ethical AI using rich logics. In fact, our aim in this paper was not necessarily to reconstruct existing ML notions faithfully, but rather to study their underlying principles. This is primarily because we are not focusing on single-shot prediction problems but on how actions, plans and programs might implement fairness and de-biasing. The fact that fairness was defined in terms of actions making knowledge of the goal true, exactly as one would in planning (Levesque 1996), is no accident.
State-of-the-art analysis in fairness is now primarily based on false positives and false negatives (Verma and Rubin 2018). So we think that, as the next step, a probabilistic language such as (Bacchus et al. 1999) could bring our notions closer to mainstream definitions, but now in the presence of actions. In the long term, the goal is to logically capture bias in the presence of actions as well as repeated harms caused by systemic biases (Creager et al. 2020). Moreover, the use of logics not only serves notions such as verification and correctness but, as we argue, could also provide a richer landscape for exploring ethical systems in the presence of background knowledge and context. This would enable the use of formal tools (model theory, proof strategies and reasoning algorithms) to study the long-term impact of bias while ensuring fair outcomes throughout the operational life of autonomous agents embedded in complex sociotechnical applications.
Of course, a logical study such as ours perhaps has the downside that the language of the paper is best appreciated by researchers in knowledge representation, and is not immediately accessible to a mainstream machine learning audience. On the other hand, there is considerable criticism directed at single-shot prediction models for not building in sufficient context and commonsense.