
Towards learned emulation of interannual water isotopologue variations in General Circulation Models

Published online by Cambridge University Press:  04 October 2023

Jonathan Wider*
Affiliations:
Institut für Umweltphysik, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany; Interdisziplinäres Zentrum für Wissenschaftliches Rechnen, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany; Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany; Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany
Jakob Kruse
Affiliations:
Institut für Umweltphysik, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany; Interdisziplinäres Zentrum für Wissenschaftliches Rechnen, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany
Nils Weitzel
Affiliations:
Institut für Umweltphysik, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany; Department of Geosciences, University of Tübingen, Tübingen, Germany
Janica C. Bühler
Affiliation:
Department of Geosciences, University of Tübingen, Tübingen, Germany
Ullrich Köthe
Affiliation:
Interdisziplinäres Zentrum für Wissenschaftliches Rechnen, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany
Kira Rehfeld
Affiliations:
Institut für Umweltphysik, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany; Department of Geosciences, University of Tübingen, Tübingen, Germany; Department of Physics, University of Tübingen, Tübingen, Germany
* Corresponding author: Jonathan Wider; Email: jonathan.wider@ufz.de

Abstract

Simulating abundances of stable water isotopologues, that is, molecules differing in their isotopic composition, within climate models allows for comparisons with proxy data and, thus, for testing hypotheses about past climate and validating climate models under varying climatic conditions. However, many models are run without explicitly simulating water isotopologues. We investigate the possibility of replacing the explicit physics-based simulation of oxygen isotopic composition in precipitation with machine learning methods. These methods estimate the isotopic composition at each time step from given fields of surface temperature and precipitation amount. We implement convolutional neural networks (CNNs) based on the successful UNet architecture and test whether a spherical network architecture outperforms the naive approach of treating Earth’s latitude-longitude grid as a flat image. In a case study on a last millennium run with the iHadCM3 climate model, we find that the emulations explain roughly 40% of the temporal variance in the isotopic composition on interannual and monthly timescales, with spatially varying emulation quality. The tested CNNs substantially outperform simple baseline models such as random forests and pixel-wise linear regression. A modified version of the standard UNet architecture for flat images yields results that are as good as the predictions of the spherical CNN. Differences in how isotopes are implemented across climate models likely contribute to the observed deterioration of emulation results when the emulator is tested on data from climate models other than the one used for training. Future work toward stable water-isotope emulation might focus on achieving robust climate–oxygen isotope relationships or on exploring the set of possible predictor variables.
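To make the "pixel-wise linear regression" baseline concrete, here is a minimal sketch. It assumes fields stored as NumPy arrays of shape (time, lat, lon) and an ordinary least-squares fit with an intercept at each grid cell, using temperature and precipitation as the two predictors. The function names are hypothetical, and the exact baseline setup in the paper (for example, whether inputs are standardized first) may differ.

```python
import numpy as np

def fit_pixelwise_linear(tas, pr, d18o):
    """Fit d18o ~ a + b*tas + c*pr independently at every grid cell.

    All inputs have shape (time, lat, lon). Returns coefficients of shape
    (3, lat, lon): intercept, temperature slope, precipitation slope.
    """
    n_time, n_lat, n_lon = d18o.shape
    coefs = np.empty((3, n_lat, n_lon))
    for i in range(n_lat):
        for j in range(n_lon):
            # Design matrix for this pixel: columns [1, temperature, precipitation]
            X = np.column_stack([np.ones(n_time), tas[:, i, j], pr[:, i, j]])
            # Least-squares solution is the first element of lstsq's result tuple
            coefs[:, i, j] = np.linalg.lstsq(X, d18o[:, i, j], rcond=None)[0]
    return coefs

def predict_pixelwise_linear(coefs, tas, pr):
    """Apply per-pixel coefficients to predictor fields of shape (time, lat, lon)."""
    # coefs[k] has shape (lat, lon) and broadcasts over the time axis
    return coefs[0] + coefs[1] * tas + coefs[2] * pr
```

Each grid cell gets its own regression, so such a baseline cannot exploit spatial context, which is one motivation for convolutional architectures like the UNet.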

Information

Type
Application Paper
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. Our approach to the emulation of $ {\delta}^{18}\mathrm{O} $ in precipitation: for each time step, we use surface temperature and precipitation amount as predictor variables. Subsequently, the data are standardized pixel-wise by subtracting the mean and dividing by the standard deviation at each pixel (top right). Means and standard deviations are computed from the training set of the investigated climate model simulation. We use a machine learning emulation model (ML Regressor) to obtain a standardized estimate of $ {\delta}^{18}\mathrm{O} $. The emulator output (bottom right) is then de-standardized using the training-set mean and standard deviation of $ {\delta}^{18}\mathrm{O} $ at every pixel to arrive at the final emulation result (bottom left). When applying the ML model to data from climate models other than the one used for training (e.g., in the cross-comparison experiment in Section 3.4), we use the mean and standard deviation from the training set of the new model.
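The pre- and post-processing in Figure 1 can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' code. It assumes arrays of shape (time, lat, lon), the population standard deviation (the `np.std` default), and a generic callable standing in for the trained ML regressor. The small `eps` term and all function names are hypothetical additions.

```python
import numpy as np

def pixel_stats(train):
    """Per-pixel mean and standard deviation over the time axis (axis 0)."""
    return train.mean(axis=0), train.std(axis=0)

def standardize(x, mean, std, eps=1e-12):
    # Subtract the pixel-wise mean and divide by the pixel-wise std;
    # eps (hypothetical) guards against division by zero at constant pixels.
    return (x - mean) / (std + eps)

def destandardize(z, mean, std, eps=1e-12):
    # Inverse of standardize: maps emulator output back to physical units.
    return z * (std + eps) + mean

def emulate(regressor, tas, pr, train_tas, train_pr, train_d18o):
    """Standardize predictors, apply a regressor, de-standardize the result.

    `regressor` is any callable mapping standardized (tas, pr) fields to a
    standardized d18O field; the training arrays supply the statistics.
    """
    t_mean, t_std = pixel_stats(train_tas)
    p_mean, p_std = pixel_stats(train_pr)
    d_mean, d_std = pixel_stats(train_d18o)
    z = regressor(standardize(tas, t_mean, t_std), standardize(pr, p_mean, p_std))
    return destandardize(z, d_mean, d_std)
```

For the cross-comparison setting, the same `emulate` call would receive the other model's training arrays, so that the statistics come from the new model, as the caption describes, while the regressor itself stays fixed.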


Figure 2. Statistical properties of the iHadCM3 $ {\delta}^{18}\mathrm{O} $ data: (a) mean state of the isotopic composition ($ {\delta}^{18}\mathrm{O} $) in precipitation; (b) standard deviation of $ {\delta}^{18}\mathrm{O} $ on an annual timescale; (c) absolute correlations of $ {\delta}^{18}\mathrm{O} $ with temperature (green) and precipitation amount (brown) on an interannual timescale. For each grid cell, only the stronger of the two correlations is shown.
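Panel (c) merges two per-grid-cell correlation maps into one. A minimal NumPy sketch of that step follows. It assumes annual-mean fields of shape (year, lat, lon) and Pearson correlation, since the caption does not name the correlation measure. The function names are hypothetical.

```python
import numpy as np

def pixelwise_corr(x, y):
    """Pearson correlation along the time axis (axis 0) at every grid cell."""
    xa = x - x.mean(axis=0)
    ya = y - y.mean(axis=0)
    return (xa * ya).sum(axis=0) / np.sqrt((xa**2).sum(axis=0) * (ya**2).sum(axis=0))

def stronger_predictor(d18o, tas, pr):
    """Return max(|r_T|, |r_P|) per cell and which variable attains it.

    Inputs are annual-mean fields of shape (year, lat, lon).
    """
    r_t = np.abs(pixelwise_corr(d18o, tas))
    r_p = np.abs(pixelwise_corr(d18o, pr))
    # Label each cell by the predictor with the larger absolute correlation
    winner = np.where(r_t >= r_p, "temperature", "precipitation")
    return np.maximum(r_t, r_p), winner
```

Taking absolute values means that a strongly negative relationship (as is often the case between precipitation amount and $ {\delta}^{18}\mathrm{O} $) counts as strong as a positive one.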


Figure 3. Test-set emulation performance of the best ML emulation method. The bluer the colors, the better the emulation. Blue colors ($ {R}^2>0 $) indicate regions in which the performance is better than a trivial baseline model that returns the correct test-set mean at every time step. The map shows the average of the $ {R}^2 $ scores over 10 runs. Additionally, we show the time series of the ML emulation (green, mean over 10 runs) and the true simulation data (black) for grid boxes next to two ice core drilling sites: panel (b) “NGRIP” (Greenland) and panel (c) “WAIS Divide” (West Antarctica).
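The color scale in Figure 3 follows from how $ {R}^2 $ is defined relative to the test-set mean. The minimal sketch below assumes emulated and simulated test-period fields of shape (time, lat, lon) and averages the per-run $ {R}^2 $ maps, as the caption states. The function names are hypothetical.

```python
import numpy as np

def pixelwise_r2(pred, truth):
    """Coefficient of determination at each grid cell over the time axis.

    The reference model predicts the test-set mean of `truth` at every time
    step, so R^2 > 0 means the emulator beats that baseline, R^2 = 0 ties it,
    and R^2 = 1 is a perfect emulation.
    """
    ss_res = ((truth - pred) ** 2).sum(axis=0)
    ss_tot = ((truth - truth.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def mean_r2_over_runs(preds, truth):
    """Average the R^2 maps of several independently trained runs."""
    return np.mean([pixelwise_r2(p, truth) for p in preds], axis=0)
```

Averaging the $ {R}^2 $ maps of individual runs generally gives a different value from computing $ {R}^2 $ on the ensemble-mean prediction; the sketch follows the caption's wording of averaging the scores.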


Figure 4. Typical emulation results on the iHadCM3 dataset: we show anomalies as output by the ML emulator (“Emulation”) and the “true” result in the simulation dataset (“Ground truth”). The anomalies are computed with respect to the training-set mean. The selected time step is the one at which the anomaly correlation coefficient (ACC) between emulation and ground truth takes its median value.
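The time step shown in Figure 4 is chosen by its ACC. A hedged sketch of that selection follows. It assumes an unweighted, centered Pearson correlation between the emulated and true anomaly maps at each time step, because the caption specifies neither centering nor area weighting. It also assumes the step "closest to the median" is picked when the number of steps is even. The function names are hypothetical.

```python
import numpy as np

def anomaly_correlation(emul, truth, clim):
    """Spatial anomaly correlation coefficient for each time step.

    emul, truth: arrays of shape (time, lat, lon); clim: (lat, lon) training-set
    mean. Returns one centered Pearson correlation of the anomaly maps per step.
    """
    # Anomalies with respect to the training-set mean, flattened over space
    a = (emul - clim).reshape(emul.shape[0], -1)
    b = (truth - clim).reshape(truth.shape[0], -1)
    # Center each map before correlating (assumed, not stated in the caption)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))

def median_acc_step(acc):
    """Index of the time step whose ACC is closest to the median ACC."""
    return int(np.argmin(np.abs(acc - np.median(acc))))
```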


Table 1. Globally averaged $ {R}^2 $ scores for the different ML emulation methods.
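The caption does not state how the per-cell $ {R}^2 $ maps are averaged globally. One common choice on a regular latitude-longitude grid, shown here purely as an assumption, weights each grid cell by the cosine of its latitude, because an unweighted mean over-represents the small polar cells. The function name is hypothetical.

```python
import numpy as np

def global_mean(field, lats_deg):
    """Area-weighted mean of a (lat, lon) field on a regular latitude-longitude grid.

    Grid-cell area on such a grid scales with cos(latitude), so each row is
    weighted accordingly. NaN cells (e.g. zero-variance pixels) are ignored.
    """
    w = np.cos(np.deg2rad(lats_deg))[:, None] * np.ones_like(field)
    valid = ~np.isnan(field)
    return float((field[valid] * w[valid]).sum() / w[valid].sum())
```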


Figure 5. Results for the cross-prediction task: a UNet is trained on the iHadCM3 training dataset, and its performance is then evaluated on the test sets of various climate models; the $ {R}^2 $ scores shown are averages over 10 runs.

Supplementary material: Wider et al. supplementary material (File, 4.2 MB)