
Clustering of causal graphs to explore drivers of river discharge

Published online by Cambridge University Press:  04 July 2023

Wiebke Günther*
Affiliation:
Institute of Data Science, German Aerospace Center, Jena, Germany
Peter Miersch
Affiliation:
Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
Urmi Ninad
Affiliation:
Institute of Data Science, German Aerospace Center, Jena, Germany; Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
Jakob Runge
Affiliation:
Institute of Data Science, German Aerospace Center, Jena, Germany; Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
*Corresponding author: Wiebke Günther; Email: wiebke.guenther@dlr.de

Abstract

This work aims to classify catchments through the lens of causal inference and cluster analysis. In particular, it uses causal effects (CEs) of meteorological variables on river discharge while relying only on easily obtainable observational data. The proposed method combines time series causal discovery with CE estimation to develop features for a subsequent clustering step. Several ways to customize and adapt the features to the problem at hand are discussed. In an application example, the method is evaluated on 358 European river catchments. The resulting clusters are analyzed in terms of the causal mechanisms that drive them and their environmental attributes.

Information

Type
Methods Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. Geographic overview and distribution of European catchment characteristics. (a) Area, (b) average elevation, (c) average slope, and (d) forest cover.

Figure 2. Illustration of total causal effect estimation in a time series graph for $ \tau =2 $. The CE $ {\Psi}_{21}(2) $ of $ {X}_{t-2}^1=: X $ on $ {X}_t^2=: Y $ is computed by taking the product of the path coefficients (link labels) along each directed path from $ X $ to $ Y $ and summing these products over all such paths, that is, $ {\Psi}_{21}(2)={\Phi}_{1,1}(1){\Phi}_{2,1}(1)+{\Phi}_{2,1}(1){\Phi}_{2,2}(1) $.
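
The path-sum rule in this caption can be checked numerically. The sketch below is an illustration only, not the authors' implementation: it assumes a linear model with lagged links only (no contemporaneous links), and the coefficient values, the function names `matmul` and `total_effects`, and the nested-list layout are all hypothetical. Under that assumption, summing path products is equivalent to the recursion $ \Psi(\tau) = \sum_{s} \Phi(s)\,\Psi(\tau - s) $ with $ \Psi(0) $ the identity matrix.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_effects(phi, tau_max):
    """Return [Psi(0), ..., Psi(tau_max)] for a linear lagged model.

    phi[s - 1][j][i] is the path coefficient Phi_{j,i}(s) of the link
    X^i_{t-s} -> X^j_t. Psi(tau)[j][i] then equals the sum, over all
    directed paths from X^i_{t-tau} to X^j_t, of the product of the
    path coefficients along the path.
    """
    n = len(phi[0])
    identity = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    psi = [identity]
    for tau in range(1, tau_max + 1):
        acc = [[0.0] * n for _ in range(n)]
        # The last link of every path has some lag s; the rest of the
        # path is covered by Psi(tau - s).
        for s in range(1, min(tau, len(phi)) + 1):
            term = matmul(phi[s - 1], psi[tau - s])
            acc = [[acc[r][c] + term[r][c] for c in range(n)]
                   for r in range(n)]
        psi.append(acc)
    return psi


# Hypothetical lag-1 coefficients (index 0 is X^1, index 1 is X^2).
phi_1 = [[0.5, 0.0],   # Phi_{1,1}(1), Phi_{1,2}(1)
         [0.4, 0.3]]   # Phi_{2,1}(1), Phi_{2,2}(1)

psi = total_effects([phi_1], tau_max=2)

# The two paths named in the caption, multiplied out by hand.
by_hand = phi_1[0][0] * phi_1[1][0] + phi_1[1][0] * phi_1[1][1]
print(psi[2][1][0], by_hand)
```

With only lag-1 links, `psi[2]` is the matrix square of `phi_1`, so its entry for the effect of $ X^1 $ on $ X^2 $ agrees with the two-path sum written in the caption.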

Figure 3. Cluster results for different choices of features. For a brief explanation of these and other feature extraction methods, see Section 2.2. In the following, $ {X}^1 $ denotes temperature, $ {X}^2 $ precipitation, and $ {X}^3 $ discharge. The features correspond to (a) average lagged CE of $ {X}^1 $ on $ {X}^3 $, (b) average lagged CE of $ {X}^2 $ on $ {X}^3 $, (c) average lagged CE of $ {X}^i $ on $ {X}^j $ for all $ i\ne j $, (d) maximal lagged CE of $ {X}^i $ on $ {X}^j $ for all $ i\ne j $, (e) path coefficients, and (f) ACE of $ {X}^i $ on the system for all $ i $.
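
As a rough illustration of how feature choices (c) and (d) could feed a clustering step, the sketch below builds one feature vector per catchment from lagged CEs and groups the vectors. Several things here are assumptions rather than details from the article: the toy CE values, the helper names, the reading of "maximal" as the lagged CE of largest magnitude, and the use of plain k-means (Lloyd's algorithm) as a stand-in for whatever clustering algorithm the authors used.

```python
def average_lagged_ce(psi, i, j):
    """Mean over lags 1..tau_max of the CE of variable i on variable j."""
    lags = psi[1:]  # psi[0] is the lag-0 entry and is skipped
    return sum(p[j][i] for p in lags) / len(lags)


def maximal_lagged_ce(psi, i, j):
    """Lagged CE of i on j with the largest magnitude (sign kept)."""
    return max((p[j][i] for p in psi[1:]), key=abs)


def feature_vector(psi, extract):
    """One feature per ordered pair (i, j) with i != j."""
    n = len(psi[0])
    return [extract(psi, i, j) for i in range(n) for j in range(n) if i != j]


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm, seeded with the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels


def toy_psi(t_eff, p_eff):
    """Hypothetical CEs for lags 0..2; index 0 is temperature,
    1 precipitation, 2 discharge. Only discharge is affected."""
    zero = [[0.0] * 3 for _ in range(3)]
    lag1 = [[0.0] * 3, [0.0] * 3, [t_eff, p_eff, 0.0]]
    lag2 = [[0.0] * 3, [0.0] * 3, [t_eff / 2, p_eff / 2, 0.0]]
    return [zero, lag1, lag2]


# Four made-up catchments: two temperature-driven, two precipitation-driven.
catchments = [toy_psi(-0.4, 0.1), toy_psi(0.05, 0.6),
              toy_psi(-0.5, 0.2), toy_psi(0.0, 0.7)]

features = [feature_vector(psi, average_lagged_ce) for psi in catchments]
labels = kmeans(features, k=2)
print(labels)
```

Passing `maximal_lagged_ce` instead of `average_lagged_ce` to `feature_vector` gives the analogue of feature choice (d); restricting the pairs to a single `(i, j)` gives one-dimensional features as in (a) and (b).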

Figure 4. Histograms of one-dimensional CE-based features (right column) in comparison to clustering based on feature vectors that include the average joint CE of temperature on precipitation, temperature on discharge, precipitation on temperature, and precipitation on discharge (left and middle column). For instance, the plot in the upper right corner shows the frequency of values of the average lagged CE of temperature on discharge within the different clusters. One can see that in the catchments of cluster 1 (blue), the average lagged CE of temperature on discharge is relatively low in comparison to the other clusters. Colors correspond to the clusters illustrated in Figure 3.

Figure 5. Distribution of catchment attributes within each cluster. The attributes are slope in degrees times 10, average altitude in m above sea level, area in $ {\mathrm{km}}^2 $, proportion of basin covered by forest, proportion of basin covered by impervious surfaces, and mean annual rain volume in $ 100\;{\mathrm{km}}^3 $. Colors correspond to the clusters illustrated in Figure 3. In the following, $ {X}^1 $ denotes temperature, $ {X}^2 $ precipitation, and $ {X}^3 $ discharge. The features correspond to (a) average lagged CE of $ {X}^1 $ on $ {X}^3 $, (b) average lagged CE of $ {X}^2 $ on $ {X}^3 $, (c) average lagged CE of $ {X}^i $ on $ {X}^j $ for all $ i\ne j $, (d) maximal lagged CE of $ {X}^i $ on $ {X}^j $ for all $ i\ne j $, (e) path coefficients, and (f) ACE of $ {X}^i $ on the system for all $ i $.

Supplementary material

Günther et al. supplementary material 1 (PDF, 9.7 MB)

Günther et al. supplementary material 2 (PDF, 6.6 MB)