
Neural network attribution methods for problems in geoscience: A novel synthetic benchmark dataset

Published online by Cambridge University Press:  09 June 2022

Antonios Mamalakis*
Affiliation:
Department of Atmospheric Science, Colorado State University, Fort Collins, Colorado, USA
Imme Ebert-Uphoff
Affiliation:
Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, Colorado, USA Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, Colorado, USA
Elizabeth A. Barnes
Affiliation:
Department of Atmospheric Science, Colorado State University, Fort Collins, Colorado, USA
*
*Corresponding author. E-mail: amamalak@rams.colostate.edu

Abstract

Despite the increasingly successful application of neural networks to many problems in the geosciences, their complex and nonlinear structure makes the interpretation of their predictions difficult, which limits model trust and prevents scientists from gaining physical insights about the problem at hand. Many different methods have been introduced in the emerging field of eXplainable Artificial Intelligence (XAI), which aims at attributing the network’s prediction to specific features in the input domain. XAI methods are usually assessed using benchmark datasets (such as MNIST or ImageNet for image classification). However, an objective, theoretically derived ground truth for the attribution is lacking for most of these datasets, making the assessment of XAI in many cases subjective. Also, benchmark datasets specifically designed for problems in geosciences are rare. Here, we provide a framework, based on the use of additively separable functions, to generate attribution benchmark datasets for regression problems for which the ground truth of the attribution is known a priori. We generate a large benchmark dataset and train a fully connected network to learn the underlying function that was used for simulation. We then compare estimated heatmaps from different XAI methods to the ground truth in order to identify examples where specific XAI methods perform well or poorly. We believe that attribution benchmarks such as the ones introduced herein are of great importance for further application of neural networks in the geosciences, and for more objective assessment and accurate implementation of XAI methods, which will increase model trust and assist in discovering new science.
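The benchmark framework described in the abstract can be sketched in a few lines: because the response $ Y=F\left(\mathbf{X}\right)={\sum}_i{C}_i\left({x}_i\right) $ is additively separable, the ground-truth attribution of feature $ i $ for any sample is simply $ {C}_i\left({x}_i\right) $. The snippet below is a minimal illustration of this idea, not the paper's exact setup: the sizes `d` and `N`, the identity covariance, and the `tanh`-based choice of $ {C}_i $ are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, N = 10, 1000  # number of input features and samples (illustrative sizes)

# Step 1: draw N independent realizations of X from a multivariate Normal.
# (Identity covariance is a placeholder; the paper uses a climate-like covariance.)
X = rng.multivariate_normal(np.zeros(d), np.eye(d), size=N)

# Additively separable response F(x) = sum_i C_i(x_i).
# Each C_i here is an arbitrary illustrative nonlinearity, not the paper's
# piecewise-linear construction.
coeffs = rng.normal(size=d)

def C(i, x):
    """Illustrative per-feature nonlinear function C_i."""
    return coeffs[i] * np.tanh(x)

# Step 2: response Y and, for free, the per-feature ground-truth attribution.
attributions = np.stack([C(i, X[:, i]) for i in range(d)], axis=1)  # (N, d)
Y = attributions.sum(axis=1)                                        # (N,)
```

Because the per-feature terms sum exactly to the response, each row of `attributions` is an objective ground-truth heatmap against which any XAI method's output can be compared.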

Information

Type
Methods Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Figure 1. Schematic overview of the general idea of the paper for a climate prediction setting. In step 1 (Section 2), we generate $ N $ independent realizations of a random vector $ \mathbf{X}\in {\mathbb{R}}^d $ from a multivariate Normal Distribution. In step 2 (also Section 2), we generate a response $ Y\in \mathbb{R} $ to the synthetic input $ \mathbf{X} $, using a known nonlinear function $ F $. In step 3 (Section 3), we train a fully connected NN using the synthetic data $ \mathbf{X} $ and $ Y $ to approximate the function $ F $. The NN learns a function $ \hat{F} $. Lastly, in step 4 (Section 4), we compare the XAI heatmaps estimated from different XAI methods to the ground truth, which represents the function $ F $ and has been objectively derived for any sample $ n=1,2,\dots, N $.

Figure 2. Schematic representation and actual examples of local piece-wise linear functions $ {C}_i $, for $ K=5 $.
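A continuous piecewise-linear function with $ K=5 $ segments, as pictured in Figure 2, can be built by linear interpolation between $ K+1 $ node points. The construction below is one hypothetical parameterization (random node values over a fixed input range), offered only to make the idea concrete; the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 5  # number of linear pieces, as in Figure 2

# K+1 breakpoints spanning a plausible input range, with random node values;
# linear interpolation between nodes yields K linear segments.
breakpoints = np.linspace(-3.0, 3.0, K + 1)
values = np.cumsum(rng.normal(size=K + 1))

def C_i(x):
    """Continuous, local piecewise-linear function with K linear segments."""
    return np.interp(x, breakpoints, values)
```

Between any two consecutive breakpoints the function is exactly linear, so its contribution to the response, and hence the ground-truth attribution, is known in closed form for every input value.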

Figure 3. Performance of different XAI methods for the sample $ n=\mathrm{979,476} $ in the testing set. The XAI performance is assessed by comparing the estimated heatmaps to the ground truth of attribution for $ F $. All heatmaps are standardized with the corresponding maximum (in absolute terms) heatmap value. Red (blue) color corresponds to positive (negative) contribution to (or gradient of) the response/prediction, with darker shading representing higher (in absolute terms) value. For all methods apart from Deep Taylor and LRP$ {}_{\alpha =1,\beta =0} $, the correlation coefficient between the heatmap and the ground truth is also provided. For the methods Deep Taylor and LRP$ {}_{\alpha =1,\beta =0} $, the correlation with the absolute ground truth is given to account for the fact that these two methods do not distinguish between positive and negative attributions (by construction).
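The evaluation metric used in Figures 3–5 (the correlation coefficient between an estimated heatmap and the ground-truth attribution, with the absolute ground truth used for sign-agnostic methods) can be sketched as below. The function name `heatmap_score` and its signature are illustrative, not from the paper.

```python
import numpy as np

def heatmap_score(heatmap, ground_truth, use_abs=False):
    """Pearson correlation between an XAI heatmap and the ground-truth attribution.

    Set use_abs=True to compare against |ground_truth|, for methods such as
    Deep Taylor and LRP (alpha=1, beta=0) that by construction do not
    distinguish positive from negative attributions.
    """
    gt = np.abs(ground_truth) if use_abs else np.asarray(ground_truth)
    # Standardize each map by its maximum absolute value, as in the figures
    # (this leaves the correlation coefficient unchanged).
    h = np.asarray(heatmap) / np.max(np.abs(heatmap))
    g = gt / np.max(np.abs(gt))
    return np.corrcoef(h.ravel(), g.ravel())[0, 1]
```

A perfect attribution yields a score of 1, and the score is invariant to positive rescaling of either map, which makes heatmaps from differently scaled XAI methods directly comparable.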

Figure 4. Same as Figure 3, but for the sample $ n=\mathrm{995,903} $ in the testing set.

Figure 5. Summary of the performance of different XAI methods. Histograms of the correlation coefficients between different XAI heatmaps and the ground truth of attribution for 100,000 testing samples. (a) Results of Input × Gradient for the linear model and the network. (b,c) Results of XAI methods when applied to the network.