
Time series predictions in unmonitored sites: a survey of machine learning techniques in water resources

Published online by Cambridge University Press:  22 January 2025

Jared D. Willard*
Affiliation:
Computing Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
Charuleka Varadharajan
Affiliation:
Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Xiaowei Jia
Affiliation:
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
Vipin Kumar
Affiliation:
Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
*
Corresponding author: Jared D. Willard; Email: jwillard@lbl.gov

Abstract

Prediction of dynamic environmental variables in unmonitored sites remains a long-standing challenge for water resources science. The majority of the world’s freshwater resources have inadequate monitoring of the critical environmental variables needed for management. Yet the need for widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent due to climate and land use change over the past decades and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction, owing to their ability to extract information from large, diverse data sets. We review relevant state-of-the-art applications of machine learning for streamflow, water quality, and other water resources prediction and discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics and process knowledge into classical, deep learning, and transfer learning methodologies. The analysis here suggests most prior efforts have focused on deep learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series predictions in unmonitored sites, including how to incorporate dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques into modern machine learning frameworks.

Information

Type
Survey Paper
Creative Commons
CC BY-NC
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Example of an LSTM network model with directly concatenated site characteristics and dynamic inputs.
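As a minimal illustration of the concatenation scheme in Figure 1, the sketch below (NumPy, with illustrative shapes and random placeholder data) tiles a site's static characteristic vector across all time steps and joins it feature-wise with the dynamic inputs, producing the per-time-step input sequence an LSTM would consume:

```python
import numpy as np

# Hypothetical shapes: T daily time steps, D dynamic drivers (e.g., precipitation,
# air temperature), S static site characteristics (e.g., drainage area, soils).
T, D, S = 365, 5, 8
rng = np.random.default_rng(0)
dynamic = rng.random((T, D))   # time-varying inputs, one row per day
static = rng.random(S)         # one fixed vector per site

# Repeat the static vector at every time step and concatenate feature-wise,
# yielding the (T, D + S) sequence fed to the LSTM.
static_tiled = np.tile(static, (T, 1))
lstm_input = np.concatenate([dynamic, static_tiled], axis=1)
print(lstm_input.shape)  # (365, 13)
```

In a trained model the same tiling is applied identically at every site, which is what lets a single network generalize across monitored and unmonitored locations.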


Figure 2. Example of a combination static feature encoder neural network with an LSTM network model.
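The encoder variant in Figure 2 can be sketched in the same style: instead of concatenating raw site characteristics, a small feed-forward encoder first compresses them into an embedding. In this NumPy sketch the encoder weights are random placeholders; in practice they are trained jointly with the LSTM:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, S, E = 365, 5, 8, 4   # E = size of the learned static embedding
dynamic = rng.random((T, D))
static = rng.random(S)

# One-hidden-layer static feature encoder (untrained placeholder weights).
W1, b1 = rng.standard_normal((S, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, E)), np.zeros(E)
embedding = np.tanh(np.tanh(static @ W1 + b1) @ W2 + b2)

# Concatenate the embedding with the dynamic inputs at every time step.
lstm_input = np.concatenate([dynamic, np.tile(embedding, (T, 1))], axis=1)
print(lstm_input.shape)  # (365, 9)
```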


Figure 3. Conceptual example of transductive and inductive graph learning. In both panels, $ \mathcal{F} $ is a model learned during training. Blue and red nodes represent training entities with data and test entities without any data, respectively. In transductive graph learning, the model has access to the nodes and edges associated with test entities during training, but no new nodes can be introduced during testing. In inductive graph learning, the model is trained on an initial graph without any knowledge of the test entities, but it can generalize to new nodes during testing.
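The inductive case in Figure 3 can be hinted at with a toy aggregation rule: because the model is a shared function of node neighborhoods rather than a table of per-node parameters, it applies unchanged to nodes never seen in training. In this sketch the scalar features, edges, and mean aggregation are illustrative stand-ins for a learned $ \mathcal{F} $:

```python
import numpy as np

# Toy stand-in for an inductive graph model: the "model" is a shared
# aggregation rule (mean of neighbor features) with no per-node parameters,
# so it can be evaluated on nodes absent during training.
def aggregate(features, neighborhoods):
    return np.stack([features[sorted(nb)].mean(axis=0) for nb in neighborhoods])

# Training graph: three nodes with scalar features, fully connected.
train_feats = np.array([[1.0], [2.0], [3.0]])
train_neighbors = [{1, 2}, {0, 2}, {0, 1}]
train_embeddings = aggregate(train_feats, train_neighbors)

# Inductive step: a new node with feature 4.0 attaches to nodes 0 and 1;
# the same function produces its embedding without retraining.
all_feats = np.vstack([train_feats, [[4.0]]])
new_embedding = aggregate(all_feats, [{0, 1}])
print(new_embedding)  # [[1.5]]
```

A transductive model, by contrast, would carry parameters tied to specific node identities and could not be queried on the appended node.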


Figure 4. Process diagram of the Meta Transfer Learning framework. Models are first built from data-rich source domains. The metamodel is trained on characteristics extracted from the source domains to predict the performance metrics obtained when transferring models between source domains. Given a target system or domain, the metamodel then predicts how well each source model will perform on the target system. Adapted from Willard et al. (2021a).
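The workflow in Figure 4 can be caricatured in a few lines; here the linear metamodel, the randomly generated "measured" transfer metrics, and the domain characteristics are all placeholder assumptions, and the actual framework in Willard et al. (2021a) uses measured inter-domain transfer performance and more flexible learners:

```python
import numpy as np

rng = np.random.default_rng(1)

# N data-rich source domains, each summarized by K characteristics
# (e.g., climate indices, drainage area). For every ordered pair (i, j) we
# suppose the error of transferring source model i to source domain j has
# been measured; the metamodel regresses that error on the concatenated
# characteristics of the two domains.
N, K = 10, 3
chars = rng.random((N, K))
pair_feats, pair_err = [], []
for i in range(N):
    for j in range(N):
        if i != j:
            pair_feats.append(np.concatenate([chars[i], chars[j]]))
            pair_err.append(rng.random())  # placeholder measured metric
X, y = np.array(pair_feats), np.array(pair_err)

# Linear metamodel fit via least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# For an unmonitored target domain, score every source model and pick the
# one with the lowest predicted transfer error.
target = rng.random(K)
scores = np.array([np.concatenate([chars[i], target]) @ w for i in range(N)])
best_source = int(np.argmin(scores))
print(best_source)
```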


Table 1. Literature table