
Using Pareto simulated annealing to address algorithmic bias in machine learning

Published online by Cambridge University Press:  04 May 2022

William Blanzeisky
Affiliation:
School of Computer Science, University College Dublin, Dublin 4, Ireland. E-mail: william.blanzeisky@ucdconnect.ie
Pádraig Cunningham
Affiliation:
School of Computer Science, University College Dublin, Dublin 4, Ireland. E-mail: padraig.cunningham@ucd.ie

Abstract

Algorithmic bias arises in machine learning when models that may have reasonable overall accuracy are biased in favor of ‘good’ outcomes for one side of a sensitive category, for example, gender or race. The bias will manifest as an underestimation of good outcomes for the under-represented minority. In a sense, we should not be surprised that a model might be biased when it has not been ‘asked’ not to be; reasonable accuracy can be achieved by ignoring the under-represented minority. A common strategy to address this issue is to include fairness as a component in the learning objective. In this paper, we consider including fairness as an additional criterion in model training and propose a multi-objective optimization strategy using Pareto Simulated Annealing that optimizes for both accuracy and underestimation bias. Our experiments show that this strategy can identify families of models whose members represent different accuracy/fairness tradeoffs. We demonstrate the effectiveness of this strategy on two synthetic and two real-world datasets.
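The core idea of Pareto Simulated Annealing as described in the abstract — a simulated-annealing search whose acceptance rule scalarizes multiple objectives with perturbed weights, while a nondominated archive accumulates the accuracy/fairness tradeoff front — can be sketched in miniature. This is not the paper's Algorithm 1; it is an illustrative sketch on a toy bi-objective problem, where two simple functions stand in for the (error, underestimation) objectives, and the move operator, weight-perturbation step, and cooling schedule are assumptions.

```python
import math
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, cand):
    """Keep the archive mutually nondominated; add cand if nothing dominates it."""
    if any(dominates(obj, cand[1]) for _, obj in archive):
        return archive
    archive = [(s, obj) for s, obj in archive if not dominates(cand[1], obj)]
    archive.append(cand)
    return archive

def pareto_sa(objectives, init, neighbor, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimal Pareto Simulated Annealing sketch over two objectives."""
    rng = random.Random(seed)
    x = init
    fx = objectives(x)
    archive = [(x, fx)]
    w = [0.5, 0.5]          # objective weights for the scalarized acceptance rule
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        fy = objectives(y)
        # Perturb the weights so the search sweeps different parts of the front.
        i = rng.randrange(2)
        w[i] = min(1.0, max(0.0, w[i] + rng.uniform(-0.1, 0.1)))
        w[1 - i] = 1.0 - w[i]
        # Weighted improvement; accept worse moves with annealing probability.
        delta = sum(wi * (a - b) for wi, a, b in zip(w, fy, fx))
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = y, fy
            archive = update_archive(archive, (x, fx))
        t *= cooling
    return archive

# Toy stand-ins for (error, underestimation): their optima conflict, so the
# archive traces a tradeoff front rather than a single best model.
objs = lambda x: (x * x, (x - 2.0) ** 2)
front = pareto_sa(objs, init=5.0,
                  neighbor=lambda x, r: x + r.gauss(0.0, 0.2))
```

In the paper's setting, a "solution" would be a model's parameters, the neighbor step a parameter perturbation, and the two objectives the model's error and an underestimation measure such as UEI; the returned archive then plays the role of the Pareto fronts shown in Figure 3.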

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Figure 1. A comparison of the two underestimation measures UEI and US$_{\textrm{S}}$ on the four datasets summarized in Table 1. The UEI score is an aggregate across all feature/outcome combinations and as such can hide the impact on the protected group (S=0). This is evident when we compare the Recidivism and Adult scores.


Figure 2. Relationship between underestimation index (UEI) and accuracy.


Figure 3. Pareto fronts obtained using Pareto Simulated Annealing on four datasets.


Algorithm 1: High-level pseudocode for Pareto Simulated Annealing algorithm to mitigate underestimation.


Table 1. Summary details of the four datasets


Figure 4. UEI and accuracy scores for the four models on the test data.


Figure 5. $US_{S=0}$ and accuracy scores for the four models on the test data.