Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes

Published online by Cambridge University Press: 23 September 2025

Kenza Tazi*
Affiliation:
Department of Engineering, University of Cambridge, Cambridge, UK; British Antarctic Survey, Cambridge, UK
Andrew Orr
Affiliation:
British Antarctic Survey, Cambridge, UK
J. Scott Hosking
Affiliation:
British Antarctic Survey, Cambridge, UK; The Alan Turing Institute, London, UK
Richard E. Turner
Affiliation:
Department of Engineering, University of Cambridge, Cambridge, UK; The Alan Turing Institute, London, UK
*Corresponding author: Kenza Tazi; Email: kt484@cam.ac.uk

Abstract

Water resources from the Indus Basin sustain over 270 million people. However, water security in this region is threatened by climate change. This is especially the case for the upper Indus Basin, where most frozen water reserves are expected to decrease significantly by the end of the century, leaving rainfall as the main driver of river flow. However, future precipitation estimates from global climate models differ greatly for this region. To address this uncertainty, this paper explores the feasibility of using probabilistic machine learning to map large-scale circulation fields, better represented by global climate models, to local precipitation over the upper Indus Basin. More specifically, Gaussian processes are trained to predict monthly ERA5 precipitation data over a 15-year horizon. This paper also explores different Gaussian process model designs, including a non-stationary covariance function to learn complex spatial relationships in the data. Going forward, this approach could be used to make more accurate predictions from global climate model outputs and better assess the probability of future precipitation extremes.

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Map of Indus Basin showing elevation (TROPOMI, 2019), the Indus River (Kelso and Patterson, 2010, in blue) and the UIB (NASA JPL, 2013, watershed boundary shown with dashed line). The inset shows the watershed’s location with respect to the Asian continent.

Figure 2. ERA5 monthly precipitation over the UIB between 1970 and 2004. The average annual and seasonal precipitation over the UIB (left) and time series for three locations in the UIB (right) are shown. The time series show the typical precipitation regimes in the basin (Figure 3) and are fitted with a linear regression model (dotted green lines) to identify trends over the studied time period.

Figure 3. UIB precipitation regimes. Clusters are generated by applying soft $ k $-means to ERA5 monthly precipitation data from 1970 to 2004. The figure shows the clusters for a weighting threshold of 0.6, following the approach of Lalchand et al. (2022).

Table 1. Considered model features with dimensions and abbreviations. These monthly variables are taken or derived from ERA5 (Hersbach et al., 2020), with the exception of the atmospheric indices, which are calculated by NOAA (2011) using historical observations.

Table 2. Performance of the single-location (SLM), cluster (Khyber, Gilgit, Ngari), and whole-basin (WBM) models. The RMSE is given in $ \mathrm{mm}\;{\mathrm{d}}^{-1} $.

Figure 4. Test $ {R}^2 $ (left), RMSE (centre), and MLL (right) for the full single-location model across the UIB evaluated on all precipitation data between 2005 and 2019.

Figure 5. Learnt non-stationary kernel lengthscales over the UIB for longitude, latitude and their magnitude.

Figure B1. Model MLL as a function of selected input features for the single-location GPs. The plot illustrates the greedy feature selection process, where the best previous kernel design is used to test the next most likely feature. Negative changes imply the feature improves model predictive skill. Features associated with positive changes decrease model predictive skill and are not included in subsequent covariance functions.
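
The greedy procedure described in this caption can be sketched as a simple forward-selection loop. This is a minimal illustration only, not the authors' code: the function name `greedy_forward_selection`, the `score` callable (standing in for "fit a GP with these features and return the validation MLL"), the candidate ordering, and the toy feature names are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def greedy_forward_selection(
    candidates: Iterable[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Keep a candidate feature only if it lowers the validation score."""
    selected: List[str] = []
    best = float("inf")  # lower is better, as for the MLL in Figure B1
    for feature in candidates:  # candidates are tried in the order given
        trial = selected + [feature]
        trial_score = score(trial)
        if trial_score < best:
            # Negative change: the feature improves skill, so keep it.
            selected, best = trial, trial_score
        # Otherwise the feature is dropped and never revisited.
    return selected, best


# Hypothetical stand-in for fitting a GP and computing its validation MLL.
def toy_score(features: List[str]) -> float:
    changes = {"a": -0.5, "b": 0.2, "c": -0.1}  # made-up score changes
    return 1.0 + sum(changes[f] for f in features)


selected, best = greedy_forward_selection(["a", "b", "c"], toy_score)
print(selected, round(best, 2))
```

In this toy run, feature "b" raises the score and is discarded, while "a" and "c" are retained. Because each decision is made once and never revisited, the result depends on the order in which candidates are tried.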

Figure B2. As Figure B1 for the cluster GPs. Although EOF500C2 reduces the validation MLL of the Ngari GP, we exclude this feature from the final model because it drives the variance of previously selected features to very low values.

Figure B3. As Figure B1 for the stationary whole-basin GP.

Table C1. Feature variance contribution for the cluster (Khyber, Gilgit, Ngari) and stationary whole-basin (UIB) models. The variances are normalized with respect to the training set distribution, with higher values representing larger contributions towards predicting precipitation. Empty entries correspond to features that were not chosen during the feature selection experiments.

Author comment: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R0/PR1

Comments

No accompanying comment.

Review: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R0/PR2

Conflict of interest statement

None

Comments

Review report: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes

This paper presents a study that uses Gaussian processes to predict precipitation over the upper Indus Basin (UIB) region. Several model configurations, including single-location, clustered and whole-basin models, are evaluated. The work serves well as a novel proof of concept for applying probabilistic AI models with uncertainty quantification to future precipitation prediction in the UIB, a region highly vulnerable to climate change.

The paper is well written, with clear methodology, appropriate data and impactful applications. It demonstrates the feasibility of using GCM variables to predict precipitation over the complex mountain terrain of the UIB with a probabilistic machine learning model (Gaussian process). This machine learning based approach represents a novel alternative to traditional dynamical down-scaling approaches. As an application paper, it is suitable for publication in Environmental Data Science (EDS).

The methodology is sound and well executed. The choice of data (ERA5) is justified. The design of experiments is appropriate, with proper performance metrics. Different experimental setups and the trade-offs between configurations are explored and discussed. The paper presents the work clearly and is of an appropriate length, with clear figures and tables that effectively support the narrative. There is the right amount of description of the background and methodology, which makes the paper easy to read for people from either the machine learning community or the environmental science community.

To further improve the paper, I have two suggestions. First, there are well-known AI down-scaling studies in the literature that use models such as GANs and diffusion models and achieve notable results. The authors' argument that probabilistic machine learning models like GPs can address uncertainties is valid; however, it would still be beneficial to review some of the existing AI down-scaling literature in addition to the reviews on dynamical down-scaling. Second, if possible, a discussion of the feature selection outcomes in Appendix B would be interesting, especially a comparison with human knowledge to understand whether the machine learning algorithm is more sensitive to the features that human experts consider most important. I had some concerns regarding the computational scalability of GPs, but these have been well addressed in the limitations and future work sections. I recommend that the paper be accepted with minor additions to the literature review and discussion.

Review: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R0/PR3

Conflict of interest statement

No.

Comments

This work is an extension of work submitted earlier to the Climate Informatics conference. The paper explores a machine learning approach to predicting precipitation over the upper Indus Basin from large-scale climate circulation.

The current version improves upon the previous work, but I have a few suggestions for the authors to consider.

1. Feature selection is important for readers to understand the model performance. If the authors want to keep the feature selection details in the Appendix, they should at least state in the main text which major variables are included in each model.

2. Many input variables (the non-EOF ones) are strongly correlated. Would there be a benefit to performing a PCA on the input features before feeding them into the GP model?

3. The GP model can also quantify the uncertainty of its output, which is one of its advantages. This can be useful for assessing the model estimates (e.g., understanding whether the difference between the estimate and the target is within the range of the model uncertainty). It would be good to show some analysis of the GP model output uncertainty.

4. More efficient GP models have recently been developed that may address the computational cost issues the authors mention. The authors may want to explore these new GP implementations as well.
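
One simple form the uncertainty analysis in point 3 could take is an empirical coverage check: count how often the target falls inside the model's central predictive interval. The sketch below is illustrative only and assumes a Gaussian predictive distribution summarised by a mean and standard deviation per prediction; the function name `interval_coverage` and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist
from typing import Sequence


def interval_coverage(
    y_true: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
    level: float = 0.95,
) -> float:
    """Fraction of targets inside the central `level` predictive interval."""
    # Half-width of the interval in units of the predictive std (Gaussian assumption).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inside = [abs(y - m) <= z * s for y, m, s in zip(y_true, mean, std)]
    return sum(inside) / len(inside)


# Made-up values purely for illustration.
y_true = [1.0, 2.0, 3.0, 10.0]
mean = [1.1, 2.5, 2.9, 4.0]
std = [0.5, 0.5, 0.5, 0.5]
coverage = interval_coverage(y_true, mean, std)
print(coverage)
```

For a well-calibrated model, the empirical coverage should sit close to the nominal level. Coverage well below it suggests overconfident predictions, and coverage well above it suggests predictive intervals that are too wide. If the model is fitted to transformed precipitation, the check would need to be applied in the transformed space or with appropriately back-transformed intervals.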

Recommendation: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R0/PR4

Comments

No accompanying comment.

Decision: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R0/PR5

Comments

No accompanying comment.

Author comment: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R1/PR6

Comments

NA

Review: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

The replies to the earlier review comments are satisfactory. I recommend the publication of this manuscript without further comments.

Recommendation: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R1/PR8

Comments

No accompanying comment.

Decision: Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes — R1/PR9

Comments

No accompanying comment.