
Time series predictions in unmonitored sites: a survey of machine learning techniques in water resources

Published online by Cambridge University Press:  22 January 2025

Jared D. Willard*
Affiliation:
Computing Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
Charuleka Varadharajan
Affiliation:
Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Xiaowei Jia
Affiliation:
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
Vipin Kumar
Affiliation:
Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
*
Corresponding author: Jared D. Willard; Email: jwillard@lbl.gov

Abstract

Prediction of dynamic environmental variables in unmonitored sites remains a long-standing challenge for water resources science. The majority of the world’s freshwater resources have inadequate monitoring of the critical environmental variables needed for management. Yet the need for widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent due to climate and land use change over the past decades and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction, owing to their ability to extract information from large, diverse data sets. We review relevant state-of-the-art applications of machine learning for streamflow, water quality, and other water resources prediction and discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics and process knowledge into classical, deep learning, and transfer learning methodologies. The analysis here suggests most prior efforts have focused on deep learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series predictions in unmonitored sites, including how to incorporate dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques into modern machine learning frameworks.

Information

Type
Survey Paper
Creative Commons
CC BY-NC
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Example of an LSTM network model with directly concatenated site characteristics and dynamic inputs.
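As a minimal illustration of the concatenation scheme in Figure 1, the sketch below (NumPy, with illustrative shapes and random placeholder data) tiles a site's static characteristic vector across all time steps and joins it feature-wise with the dynamic inputs, producing the per-time-step input sequence an LSTM would consume:

```python
import numpy as np

# Hypothetical shapes: T daily time steps, D dynamic drivers (e.g., precipitation,
# air temperature), S static site characteristics (e.g., drainage area, soils).
T, D, S = 365, 5, 8
rng = np.random.default_rng(0)
dynamic = rng.random((T, D))   # time-varying inputs, one row per day
static = rng.random(S)         # one fixed vector per site

# Repeat the static vector at every time step and concatenate feature-wise,
# yielding the (T, D + S) sequence fed to the LSTM.
static_tiled = np.tile(static, (T, 1))
lstm_input = np.concatenate([dynamic, static_tiled], axis=1)
print(lstm_input.shape)  # (365, 13)
```

In a trained model the same tiling is applied identically at every site, which is what lets a single network generalize across monitored and unmonitored locations.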


Figure 2. Example of a combination static feature encoder neural network with an LSTM network model.
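The encoder variant in Figure 2 can be sketched in the same style: instead of concatenating raw site characteristics, a small feed-forward encoder first compresses them into an embedding. In this NumPy sketch the encoder weights are random placeholders; in practice they are trained jointly with the LSTM:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, S, E = 365, 5, 8, 4   # E = size of the learned static embedding
dynamic = rng.random((T, D))
static = rng.random(S)

# One-hidden-layer static feature encoder (untrained placeholder weights).
W1, b1 = rng.standard_normal((S, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, E)), np.zeros(E)
embedding = np.tanh(np.tanh(static @ W1 + b1) @ W2 + b2)

# Concatenate the embedding with the dynamic inputs at every time step.
lstm_input = np.concatenate([dynamic, np.tile(embedding, (T, 1))], axis=1)
print(lstm_input.shape)  # (365, 9)
```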


Figure 3. Conceptual example of transductive and inductive graph learning. In both panels, $ \mathcal{F} $ is a model learned during training. Blue and red nodes represent training entities with data and test entities without any data, respectively. In transductive graph learning, the model has access to the nodes and edges associated with test entities during training, but no new nodes can be introduced during testing. In inductive graph learning, the model is trained on an initial graph without any knowledge of the test entities, but it can generalize to new nodes during testing.
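The inductive case in Figure 3 can be hinted at with a toy aggregation rule: because the model is a shared function of node neighborhoods rather than a table of per-node parameters, it applies unchanged to nodes never seen in training. In this sketch the scalar features, edges, and mean aggregation are illustrative stand-ins for a learned $ \mathcal{F} $:

```python
import numpy as np

# Toy stand-in for an inductive graph model: the "model" is a shared
# aggregation rule (mean of neighbor features) with no per-node parameters,
# so it can be evaluated on nodes absent during training.
def aggregate(features, neighborhoods):
    return np.stack([features[sorted(nb)].mean(axis=0) for nb in neighborhoods])

# Training graph: three nodes with scalar features, fully connected.
train_feats = np.array([[1.0], [2.0], [3.0]])
train_neighbors = [{1, 2}, {0, 2}, {0, 1}]
train_embeddings = aggregate(train_feats, train_neighbors)

# Inductive step: a new node with feature 4.0 attaches to nodes 0 and 1;
# the same function produces its embedding without retraining.
all_feats = np.vstack([train_feats, [[4.0]]])
new_embedding = aggregate(all_feats, [{0, 1}])
print(new_embedding)  # [[1.5]]
```

A transductive model, by contrast, would carry parameters tied to specific node identities and could not be queried on the appended node.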


Figure 4. Process diagram of the Meta Transfer Learning framework. Models are first built from data-rich source domains. The metamodel is trained on characteristics extracted from the source domains to predict the performance metrics obtained when transferring models between source domains. Given a target system or domain, the metamodel then predicts how well each source model will perform on the target system. Adapted from Willard et al. (2021a).
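The workflow in Figure 4 can be caricatured in a few lines; here the linear metamodel, the randomly generated "measured" transfer metrics, and the domain characteristics are all placeholder assumptions, and the actual framework in Willard et al. (2021a) uses measured inter-domain transfer performance and more flexible learners:

```python
import numpy as np

rng = np.random.default_rng(1)

# N data-rich source domains, each summarized by K characteristics
# (e.g., climate indices, drainage area). For every ordered pair (i, j) we
# suppose the error of transferring source model i to source domain j has
# been measured; the metamodel regresses that error on the concatenated
# characteristics of the two domains.
N, K = 10, 3
chars = rng.random((N, K))
pair_feats, pair_err = [], []
for i in range(N):
    for j in range(N):
        if i != j:
            pair_feats.append(np.concatenate([chars[i], chars[j]]))
            pair_err.append(rng.random())  # placeholder measured metric
X, y = np.array(pair_feats), np.array(pair_err)

# Linear metamodel fit via least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# For an unmonitored target domain, score every source model and pick the
# one with the lowest predicted transfer error.
target = rng.random(K)
scores = np.array([np.concatenate([chars[i], target]) @ w for i in range(N)])
best_source = int(np.argmin(scores))
print(best_source)
```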


Table 1. Literature table