Hostname: page-component-77f85d65b8-hzqq2 Total loading time: 0 Render date: 2026-03-29T11:12:35.788Z Has data issue: false hasContentIssue false

A spatiotemporal stochastic climate model for benchmarking causal discovery methods for teleconnections

Published online by Cambridge University Press:  27 September 2022

Xavier-Andoni Tibau*
Affiliation:
Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Datenwissenschaften, Jena, Germany Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany
Christian Reimers
Affiliation:
Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Datenwissenschaften, Jena, Germany Friedrich Schiller University Jena, Computer Vision Group, Jena, Germany
Andreas Gerhardus
Affiliation:
Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Datenwissenschaften, Jena, Germany
Joachim Denzler
Affiliation:
Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Datenwissenschaften, Jena, Germany Friedrich Schiller University Jena, Computer Vision Group, Jena, Germany
Veronika Eyring
Affiliation:
Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany University of Bremen, Institute of Environmental Physics (IUP), Bremen, Germany
Jakob Runge
Affiliation:
Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Datenwissenschaften, Jena, Germany Technische Universität Berlin, Faculty of Electrical Engineering and Computer Science, Berlin, Germany
*
*Corresponding author. E-mail: xavier.tibau@dlr.de

Abstract

Teleconnections that link climate processes at widely separated spatial locations form a key component of the climate system. Their analysis has traditionally been based on means, climatologies, correlations, or spectral properties, which cannot always reveal the dynamical mechanisms between different climatological processes. More recently, causal discovery methods based either on time series at grid locations or on modes of variability, estimated through dimension-reduction methods, have been introduced. A major challenge in the development of such analysis methods is a lack of ground truth benchmark datasets that have facilitated improvements in many parts of machine learning. Here, we present a simplified stochastic climate model that outputs gridded data and represents climate modes and their teleconnections through a spatially aggregated vector-autoregressive model. The model is used to construct benchmarks and evaluate a range of analysis methods. The results highlight that the model can be successfully used to benchmark different causal discovery methods for spatiotemporal data and show their strengths and weaknesses. Furthermore, we introduce a novel causal discovery method at the grid level and demonstrate that it has orders of magnitude better performance than the current approaches. Improved causal analysis tools for spatiotemporal climate data are pivotal to advance process-based understanding and climate model evaluation.

Information

Type
Methods Paper
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Figure 0

Figure 1. Overview of network estimation methods from two perspectives: nodes can be defined as individual grid locations or at the mode level (vertical dimension). Links can be based on correlation or causation (horizontal dimension). For instance, some nodes in a network may be at the node level and others at the grid level (e.g., correlation maps of El Niño–Southern Oscillation). Causal links may be defined in the bivariate Granger causality case or in a multivariate framework with PCMCI.

Figure 1

Figure 2. Representation of the Mapped-PCMCI algorithm. (a) Perform a dimensionality reduction method on a gridded dataset to obtain a mapping between the gridded data and its lower-dimensional representation. (b) Apply a causal discovery method to obtain the parents and the estimated causal network from the resulting time series. (c) Estimate the lagged causal effects for all links to obtain a coefficient matrix. (d) “Invert” the dimensionality reduction mapping to obtain the causal network and the causal effects at the grid level.

Figure 2

Figure 3. Illustration of the spatially aggregated vector-autoregressive model. Climate fields $ {\mathbf{y}}_t $ evolving as given in equation (16) are represented by time series on a regular grid. Climate modes of variability are defined by regions $ {W}_i $ of covariant Gaussian noise $ {\varepsilon}_t\sim \mathcal{N}\left({\mu}_y,{\Sigma}_y\right) $ where $ {\Sigma}_y $ denotes the spatial covariance matrix of the grid level noise. Time series at the grid level are mapped to the mode level by $ {\mathbf{x}}_t=W{\mathbf{y}}_t $. The interdependencies between the modes (causal network), here represented by black arrows, are defined by $ \Phi $, where $ {\Phi}_{ij}\left(\tau \right) $ denotes the effect of the $ j $th mode into the $ i $th mode at time lag $ \tau $.

Figure 3

Figure 4. Example of a four-mode spatially aggregated vector-autoregressive model and network estimation based on PCA–PCMCI and Varimax–PCMCI. When the algorithm used to detect spatial patterns is not suitable to recover them from data (in this specific case, principal component analysis [PCA]), then both spatial and causal inferred conclusions may be wrong. (a) The ground truth. Each row represents the weight vector of a mode (columns of $ W $). In red, the underlying causal network is shown ($ \mathcal{G} $). (b) The first four components of Varimax estimated weights ($ \hat{W} $) and in red the estimated PCMCI causal network ($ \hat{\mathcal{G}} $). (c) The first four components of PCA estimated weights ($ \hat{W} $) and in red the estimated PCMCI causal network ($ \hat{\mathcal{G}} $).

Figure 4

Figure 5. Example of a two-mode spatially aggregated vector-autoregressive model and grid-level network estimations. Ground truth (a), network estimated by Mapped-PCMCI (b), by PCMCI at the grid level (c), and by correlation at the grid level (d). The blue arrows indicate dependencies at lag 1, and the brown arrows indicate dependencies at lag 2. The light blue and light brown arrows indicate a false positive link detected by the method. For the sake of visualization, autodependencies in the same grid point are not shown, and only a small fraction of false positives are shown ($ \le \hskip0.35em 50\% $ and $ \le \hskip0.35em 10\% $ in (c) and (d), respectively). Table 1 shows the performance metrics for each method. Details of the structural causal model can be found in Section 3.4.2.

Figure 5

Table 1. Performance metrics for the structural causal model described in Section 3.4.2. PCMCI and Mapped-PCMCI use $ T=1,000 $, $ {\tau}_{\mathrm{min}}=1 $, $ {\tau}_{\mathrm{max}}=2 $, $ {\alpha}_{PC}=0.2 $, and $ \alpha =0.05 $. PCMCI uses the Varimax+ algorithm (Algorithm 3 in Appendix C in the Supplementary Material) with $ N=2 $ components and a significance level of 0.01. The metrics used to evaluate the detection of the links in the true graph are precision ($ {\Pr}^{\mathrm{\mathcal{M}}} $) and recall ($ {\operatorname{Re}}^{\mathrm{\mathcal{M}}} $). To assess the coefficient of the detected links, the mean square error ($ {\mathrm{MSE}}^{\mathrm{\mathcal{E}}} $) and the mean relative absolute error ($ {\mathrm{MRAE}}^{\mathrm{\mathcal{E}}} $) have been used. For more details, see Section 4.1.3.

Figure 6

Table 2. Summary of experiments for evaluating the causal methods presented in Table 3. For each experiment, we simulate and evaluate 100 SAVAR realizations to obtain confidence intervals of evaluation metrics.

Figure 7

Table 3. Methods used for causal discovery on estimated modes. Step 1 corresponds to the dimensionality reduction method, Step 2 is the causal graph estimation method, and Step 3 refers to the link coefficient estimation.

Figure 8

Figure 6. Comparison of the methods of Table 3 for different challenges. The different subfigures represent the change of the performance metrics (y-axis) for the different methods evaluated (described in Section 4.1.2) as a function of (a) the number of time samples available, (b) the ratio of individual noise in each grid point and the covariance ($ \lambda $ in equation (16)), (c) the number of variables, and (d) the strength of the autocorrelation ($ {\Phi}^{ii}\left(\tau \right) $) of each variable. The performance metrics evaluate the reconstruction of the modes and the signal using mean absolute Pearson correlation (MAPW and MAPX, respectively), the Precision (PrM) and Recall (ReM) of the estimated Causal Graph G, and the goodness of the $ \Phi $-coefficient estimation with the mean squared error (MSEε) and the mean relative absolute error (MRAEε). The shaded areas depict the 95% range of the corresponding metric across the 100 repetitions.

Figure 9

Figure 7. Comparison of the methods of Table 3 for different challenges (continued). The different subfigures represent the change of the performance metrics (y-axis) for the different methods evaluated (described in Section 4.1.2) as a function of (a) the number of links between the modes, (b) the resolution of the spatial grid, (c) a combination of increasing the number of links between modes and the strength its corresponding links, and (d) nonstationary trends applied to each climate variable, where the trends correspond to a slow Ornstein–Uhlenbeck process. The performance metrics evaluate the reconstruction of the modes and the signal using mean absolute Pearson correlation (MAPW and MAPX, respectively), the Precision (PrM) and Recall (ReM) of the estimated Causal Graph G, and the goodness of the coefficient estimation with the mean squared error (MSEε) and the mean relative absolute error (MRAEε). The shaded areas depict the 95% range of the corresponding metric across the 100 repetitions.

Figure 10

Table 4. Methods used for causal discovery at the grid level, dimensionality reduction (Step 1), causal graph estimation (Step 2), link coefficient estimation (Step 3), and the inversion of the dimensionality reduction step (Step 4). Note that the last four methods correspond to different implementations of Mapped-PCMCI, described in detail in Section 2.3.

Figure 11

Table 5. Datasets used for causal discovery at the grid level. The first one, Syntheticdataset, is the simplest; modes do not overlap, and $ {D}_x\hskip0.35em =\hskip0.35em {\mathbf{I}}_N $, $ {D}_y\hskip0.35em =\hskip0.35em {\mathbf{I}}_L $, and $ \lambda \hskip0.35em =\hskip0.35em 1 $. In the second one, Surface Pressure dataset (low resolution), the data are closer to true climate data, and modes are extracted from a reanalysis dataset and overlap. However, dimensionality is still low, and the noise follows the same distribution as the first one. In the third one, Surface Pressure dataset, the dimensionality is larger, modes overlap, and the noise is inferred from data without enforcing any particular spatial distribution.

Figure 12

Figure 8. Comparison of causal discovery methods at the grid level. Comparison of grid-level causal discovery methods (Table 4) for different datasets (Table 5). Each row represents a different dataset that becomes more challenging for Mapped-PCMCI. The Synthetic data corresponds to a SAVAR model fulfilling all the assumptions of Mapped-PCMCI. For the second one, the Surface Pressure dataset (low resolution), we have used a dataset with mode patterns extracted from a pressure reanalysis dataset regridding the data to $ 53\times 27 $ and using a Gaussian noise following the pattern of the modes. This dataset violates the assumption of Mapped-PCMCI of nonoverlapping modes. Finally, the Surface Pressure is the second dataset with higher resolution ($ 70\times 36 $) inferring the spatial noise distribution directly from the data. This later dataset does not fulfill two of the requirements of Mapped-PCMCI, namely, nonoverlapping modes and noise structure emerging from modes’ spatial patterns. Note that, for MSEε, the scale of the ordinate axis is logarithmic, and moreover, for PrM and for both Surface Pressure datasets, the ordinate axis starts at 0.9. The performance metrics evaluate the reconstruction of the signal using mean absolute Pearson correlation (MAPX) and the Precision (PrM) and Recall (ReM) of the estimated Causal Graph G. Each boxplot is generated from 100 repetitions.

Supplementary material: PDF

Tibau et al. supplementary material

Tibau et al. supplementary material

Download Tibau et al. supplementary material(PDF)
PDF 920.5 KB