A climate index collection based on model data

Published online by Cambridge University Press: 02 May 2023

Marco Landt-Hayen*
Affiliation:
Information Systems and Data Mining, Christian-Albrechts-Universität zu Kiel, Kiel, Germany; Ocean Circulation and Climate Dynamics, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
Willi Rath
Affiliation:
Ocean Circulation and Climate Dynamics, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
Sebastian Wahl
Affiliation:
Ocean Circulation and Climate Dynamics, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
Nils Niebaum
Affiliation:
Ocean Circulation and Climate Dynamics, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
Martin Claus
Affiliation:
Ocean Circulation and Climate Dynamics, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany; Faculty of Mathematics and Natural Sciences, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
Peer Kröger
Affiliation:
Information Systems and Data Mining, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
Corresponding author: Marco Landt-Hayen; Email: mlandt-hayen@geomar.de

Abstract

Machine learning (ML), and in particular deep learning (DL), methods advance the state of the art for many hard problems, for example, image classification, speech recognition, or time series forecasting. In the domain of climate science, ML and DL are known to be effective for identifying causally linked modes of climate variability, which is key to understanding the climate system and to improving the predictive skill of forecast systems. To attribute climate events in a data-driven way, we need sufficient training data, which is often limited for real-world measurements. The data science community provides standard data sets for many applications. As a new data set, we introduce a consistent and comprehensive collection of climate indices typically used to describe Earth System dynamics. To this end, we use 1000-year control simulations from Earth System Models. The data set is provided as an open-source framework that can be extended and customized to individual needs. It allows users to develop new ML methodologies and to compare results to existing methods and models as a benchmark. For example, we use the data set to predict rainfall in the African Sahel region and the El Niño Southern Oscillation with various ML models. Our aim is to build a bridge between the data science community and researchers and practitioners from the domain of climate science to jointly improve our understanding of the climate system.

Information

Type
Data Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press
Table 1. All indices included in the CICMoD data set, with their acronyms and spatial domains, ordered by the underlying feature.

Figure 1. Pairwise correlation coefficients of all CICMoD indices derived from FOCI data.

Figure 2. Fidelity check on validation data: Sahel rainfall predictions (black line) from MLP models on FOCI data (upper part) and CESM data (lower part), respectively, compared to true targets shown as a bar plot.

Table 2. Evaluating model performance for predicting Sahel rainfall with linear regression (lin. reg.) and MLP models trained on FOCI and CESM data, respectively.

Figure 3. Pairwise correlation coefficients of Nino 3.4 index with various lead times (current phase, 3 and 6 months into the future) used as targets and input features, both derived from FOCI data.
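
To illustrate what lead-time targets of this kind look like in practice, the following is a minimal sketch, not the authors' code. It uses a synthetic AR(1) series as a stand-in for a monthly climate index (an assumption; no CICMoD data is loaded), and the helper `lead_time_targets` is a hypothetical name introduced here. Each target column is the index shifted so that the value at month t is the index value k months later, and the Pearson correlation is then computed between the current index and each target.

```python
import numpy as np
import pandas as pd


def lead_time_targets(index: pd.Series, leads=(0, 3, 6)) -> pd.DataFrame:
    """Build one target column per lead time (hypothetical helper).

    Column `lead_k` at row t holds index[t + k]; the last k rows are NaN.
    """
    return pd.DataFrame({f"lead_{k}": index.shift(-k) for k in leads})


# Synthetic stand-in for a monthly index: an AR(1) process (assumption,
# chosen only so that the series has some memory across months).
rng = np.random.default_rng(0)
n_months = 1200
noise = rng.standard_normal(n_months)
values = np.zeros(n_months)
for t in range(1, n_months):
    values[t] = 0.9 * values[t - 1] + noise[t]
index = pd.Series(values, name="synthetic_index")

targets = lead_time_targets(index)

# Pearson correlation of the current index with each lead-time target.
# Rows containing NaN (the shifted tail) are dropped pair by pair.
correlations = targets.corrwith(index)
print(correlations)
```

For a series with persistence, the correlation is expected to decay as the lead time grows, which is the kind of pattern a correlation plot over lead times makes visible.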

Figure 4. Fidelity check on the first 500 months of validation data: predictions (black line) from CNN models on FOCI data (left-hand side) and CESM data (right-hand side), compared to true targets shown as a bar plot for various lead times. (a,b) Current phase. (c,d) Three months into the future. (e,f) Six months into the future.

Table 3. Evaluating model performance for predicting ENSO with CNN and LSTM models trained on FOCI and CESM data, respectively.