Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks

Shivam Singh; Simon Michael Papalexiou; Hebatallah M. Abdelmoaty; Tom Hartvigsen; Antonios Mamalakis

doi:10.1017/eds.2026.10039

Published online by Cambridge University Press: 28 May 2026

Shivam Singh*: Affiliation:
University of Virginia, USA
Simon Michael Papalexiou: Affiliation:
University of Calgary, Canada
Hebatallah M. Abdelmoaty: Affiliation:
University of Calgary, Canada
Tom Hartvigsen: Affiliation:
University of Virginia, USA
Antonios Mamalakis: Affiliation:
University of Virginia, USA
*: Corresponding author: Shivam Singh; Email: wpa8me@virginia.edu

Abstract

Accurately downscaling precipitation from coarse to high spatial resolutions remains a critical challenge in climate and hydrometeorological modeling. A key limitation is the frequent misclassification of dry and wet regions, which compromises the realism and reliability of high-resolution outputs. To address this, we propose a deep learning-based downscaling framework that explicitly models dry/wet classification by transforming low-resolution 6×6 precipitation inputs into high-resolution 60 × 60 binary classification fields. We evaluate two architectures, a convolutional encoder-decoder and a conditional Wasserstein generative adversarial network (WGAN), utilizing three training strategies: (1) using binary wet/dry inputs, (2) using precipitation intensity inputs, and (3) using precipitation intensity inputs and adding physical constraints. Models are trained and validated on both synthetically generated precipitation fields and real radar-estimated hourly precipitation data over the contiguous United States. Performance is assessed using metrics including the overall probability of zero ($ {\mathrm{P}}_0 $) and spatial autocorrelations. Results show that incorporating intensity information improves dry/wet classification, while adding physical constraints further enhances accuracy, generalization, and physical consistency, especially for WGAN models. The convolutional encoder-decoder produces smoother outputs with stable performance regarding marginal statistics, whereas the WGAN generates sharper boundaries and more realistic dry/wet fields, improving on the spatial dependence structure. Furthermore, we demonstrate that the derived dry/wet classification fields can be used as binary masks to bias correct downscaled precipitation fields, enhancing both statistical fidelity and spatial realism. 
These findings highlight the value of a physically informed bias-correction strategy for improving the spatial realism of high-resolution precipitation fields from coarse-scale inputs.

Keywords

drizzle bias correction; dry/wet classification; generative models; precipitation downscaling; Wasserstein GAN (WGAN)

Information

Type: Application Paper
Information: Environmental Data Science, Volume 5, 2026, e12

DOI: https://doi.org/10.1017/eds.2026.10039
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices: Open data Open materials
Copyright: © The Author(s), 2026. Published by Cambridge University Press

Impact Statement

Underestimation of dry regions due to drizzle bias remains a persistent limitation in statistical and deep learning-based precipitation downscaling, undermining the reliability of high-resolution products for drought assessment and hydrologic applications. This study addresses the problem by training convolutional encoder–decoder and WGAN models on both synthetic and radar-based datasets to predict high-resolution dry/wet classifications from coarse-resolution inputs under varying conditioning strategies. While convolutional encoder–decoders reproduce marginal dry/wet statistics reliably, the WGAN with dry-region constraints yields more spatially coherent and physically consistent outputs, particularly regarding storm boundaries and spatial autocorrelation structure. We demonstrate that the predicted dry/wet masks offer an efficient correction mechanism for mitigating drizzle bias in existing downscaled outputs, enhancing their utility in water resource management, hydrologic modeling, and climate impact assessments.

1. Introduction

Precipitation is a fundamental driver of the hydrological cycle and underpins a wide range of applications in water resources, agriculture, flood risk assessment, and climate impact studies. However, global climate models and most reanalysis products represent precipitation at relatively coarse spatial resolutions, typically ranging from about 0.25° to over 1°, which limits their ability to capture localized storm structure, the spatial intermittency of precipitation, and extremes (Maraun et al., 2010; Giorgi and Gutowski, 2015; Dueben and Bauer, 2018; Wang et al., 2021; Kumar et al., 2023; Baghanam et al., 2024; Lopez-Gomez et al., 2024). As a result, precipitation downscaling has become an essential tool for translating coarse-scale climate information into spatially refined precipitation fields suitable for hydrologic and impact-oriented applications. 
Dynamical downscaling based on regional climate models (RCMs) and convection-permitting models (CPMs) offers higher resolution but remains computationally expensive, making its use in long-term simulations or large ensembles prohibitive (Gao et al., 2012; Giorgi and Gutowski, 2015; Gutowski et al., 2020; Potter et al., 2020; Hobeichi et al., 2023; Nishant et al., 2023; Rahimi et al., 2024). Consequently, statistical and machine learning-based downscaling methods have gained momentum due to their efficiency, scalability, reduced computational demands, and rapid deployment capabilities in translating coarse-scale climate information into finer scales suitable for impact modeling (Vrac et al., 2007; Liu et al., 2020; Tabari et al., 2021; Miralles et al., 2022; Rampal et al., 2024).

A vast majority of statistical downscaling efforts, however, focus primarily on correcting precipitation intensity and improving mean or variance estimates, often neglecting the characterization of dry and wet states (AghaKouchak et al., 2012; Mamalakis et al., 2017; Pan et al., 2021; Chen et al., 2024; Luo et al., 2025). This oversight introduces significant biases in climate risk assessments, as the misclassification of dry/wet regimes can strongly affect applications such as storm impact analysis, drought investigation, and agricultural planning, where the presence or absence of rainfall is more critical than marginal errors in precipitation amounts (Mendoza Paz and Willems, 2023; Vogel et al., 2023; Rahimi et al., 2024). Particularly in arid and semi-arid regions, or during convective extremes, traditional downscaling methods often produce unrealistic drizzle or fail to preserve extended dry spells because they treat precipitation as a purely continuous variable optimized for mean squared error rather than accounting for its intermittent nature (Maraun, 2016; Potter et al., 2020; Suliman et al., 2020; Harder et al., 2023). 
Machine learning (ML) and deep learning (DL) approaches, especially generative adversarial networks (GANs), have recently emerged as promising tools for downscaling precipitation fields due to their capacity to learn complex, nonlinear patterns and replicate fine-scale spatial features (Leinonen et al., 2020; Scher and Peßenteiner, 2021; Duncan et al., 2022; González-Abad et al., 2023; Murukesh et al., 2024). GAN-based approaches have been shown to better preserve precipitation structures, spatial heterogeneity, and fine-scale variability compared to a range of conventional statistical and deterministic deep learning downscaling methods (Leinonen et al., 2020; Harris et al., 2022; Murukesh et al., 2024; Papalexiou and Mamalakis, 2025).

However, even state-of-the-art GAN models tend to prioritize intensity reproduction, often failing to generate true dry regions and instead producing low-intensity spurious precipitation, especially near wet-dry boundaries (Harris et al., 2022; Papalexiou and Mamalakis, 2025). This “drizzle bias” is not merely a technical artifact; it fundamentally distorts the representation of the hydrological cycle, compromises the reliability of precipitation patterns, and can mislead water resource assessments and climate impact analyses (Lazoglou et al., 2024). Papalexiou and Mamalakis (2025) systematically demonstrate this challenge through a controlled experiment using synthetic storm fields. Their evaluation of multiple neural network architectures, including a Wasserstein GAN (WGAN), highlights that while the WGAN outperforms conventional models in preserving spatial structure and variability in precipitation, it still fails to accurately replicate the probability of zero precipitation ($ {\mathrm{P}}_0 $) and produces biased representations of extreme events. Notably, none of the evaluated models, including the WGAN, was able to naturally reproduce dry regions, instead producing light precipitation (drizzle bias) across all grid cells. This behavior highlights a persistent dry/wet misclassification problem that remains a fundamental challenge in precipitation downscaling research. 
Recent studies reinforce this observation, pointing to a general trend where downscaling efforts, whether based on deep learning, traditional statistical approaches, or hybrid models, struggle with accurately capturing precipitation intermittency and the discrete nature of rainfall occurrence (Baño-Medina et al., 2022; Nishant et al., 2023; Murukesh et al., 2024). For example, Chen et al. (2024) emphasize that despite improvements in spatial realism, many ML-based downscaling models underestimate dry days and overproduce light precipitation, failing to reproduce observed precipitation occurrence frequencies. Similarly, studies like those of Nishant et al. (2023) and Luo et al. (2025) highlight that statistical bias correction techniques often adjust mean values but do little to fix the structural misclassification between wet and dry states. Addressing this gap is critical, especially as climate change is expected to intensify precipitation extremes and alter dry spell dynamics (Berrang-Ford et al., 2011; Stocker, 2014; Lesnikowski et al., 2015; Magnan, 2016; Yazdandoost et al., 2021), making accurate dry/wet classification more consequential than ever. Incorporating classification-driven objectives within downscaling frameworks can substantially enhance the realism of downscaled datasets, ensuring that not only precipitation intensity but also the occurrence is accurately reproduced. 
Such an approach would improve hydrological model inputs, enhance the representation of precipitation occurrence and associated dry and wet conditions, and provide more actionable climate information for resource managers and policymakers.

This study proposes a generative adversarial network (GAN) framework designed to address dry/wet misclassification in precipitation downscaling. The framework incorporates explicit dry/wet classification objectives during model training to learn the spatial occurrence of precipitation, thereby reducing false wet and false dry predictions. The resulting dry/wet predictions explicitly represent the binary nature of precipitation occurrence (wet versus dry states) and can be applied as spatial masks to correct intensity-based downscaled precipitation fields. In convolutional and generative deep learning–based precipitation downscaling approaches, dry/wet corrections are often implemented using post-processing procedures based on fixed thresholds or rule-based criteria to suppress spurious low-intensity precipitation. In contrast, the proposed approach derives spatially coherent dry/wet masks directly from a data-trained generative model (WGAN), enabling the preservation of storm structure and spatial continuity. This data-driven masking strategy improves the physical consistency and interpretability of downscaled precipitation fields and reduces drizzle-related artifacts in hydrometeorological applications.
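The masking idea described above can be illustrated with a minimal NumPy sketch. It assumes the simplest possible reading, in which the predicted dry/wet mask is applied by zeroing every pixel marked dry; the function names `apply_dry_wet_mask` and `probability_of_zero` and the toy values are hypothetical and are not taken from the study's code.

```python
import numpy as np

def apply_dry_wet_mask(intensity, wet_mask):
    """Set precipitation to zero wherever the mask marks a pixel as dry."""
    intensity = np.asarray(intensity, dtype=float)
    wet_mask = np.asarray(wet_mask).astype(bool)
    if intensity.shape != wet_mask.shape:
        raise ValueError("intensity and wet_mask must share the same shape")
    return np.where(wet_mask, intensity, 0.0)

def probability_of_zero(field):
    """Fraction of pixels with exactly zero precipitation (P0)."""
    return float(np.mean(np.asarray(field) == 0))

# Toy 2x2 example (assumed values): a downscaled field with drizzle everywhere.
downscaled = np.array([[0.02, 1.50],
                       [0.01, 0.03]])
predicted_mask = np.array([[0, 1],
                           [0, 0]])  # 1 = wet, 0 = dry
corrected = apply_dry_wet_mask(downscaled, predicted_mask)

print(probability_of_zero(downscaled), probability_of_zero(corrected))
```

In this toy case the uncorrected field has no dry pixels at all, while the masked field recovers the dry fraction implied by the mask and leaves the wet pixel's intensity unchanged.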

2. Data

2.1. Synthetic data

This study utilizes synthetic precipitation fields generated using the Complete Stochastic Modeling System (CoSMoS), as described in Papalexiou (2022). CoSMoS is designed to simulate hydroclimatic fields with prescribed marginal distributions, spatiotemporal autocorrelation structures, advection velocities, and anisotropy (Papalexiou et al., 2021). Its flexibility enables the creation of synthetic precipitation fields with user-defined statistical properties, making it ideal for evaluating downscaling model performance under controlled conditions. The same dataset was used by Papalexiou and Mamalakis (2025) to assess the capability of deep learning models in precipitation downscaling tasks. In this study, synthetically generated precipitation fields at a fine spatial resolution, with a grid size of 60×60, are treated as a reference dataset, enabling controlled evaluation of downscaling models against a known ground truth. The corresponding low-resolution (6×6) synthetic fields are obtained by spatially aggregating the 60×60 fields using non-overlapping block averaging, consistent with the downscaling configuration adopted throughout the study. The grid dimensions describe the domain discretization of the synthetic fields and are not intended to represent a specific physical spatial resolution or to be spatially equivalent to radar-based datasets. Following aggregation, the 60×60 high-resolution precipitation fields were converted into binary dry/wet masks by thresholding (precipitation $ >0 $ is labeled wet; see Section 3.3.1), and these binary fields were used as the training targets. A total of 15,000 storm fields were used, comprising an equal mix of three distinct types: isotropic, anisotropic, and cyclonic storms. 
The dataset was randomly partitioned into 67% for training, 13% for validation, and 20% for testing, providing a balanced and robust approach for model development and assessment.
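The aggregation and masking steps described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the study's code: `block_average` and `to_wet_mask` are hypothetical names, and a random gamma field with an artificially dry left half stands in for a CoSMoS storm field.

```python
import numpy as np

def block_average(field, factor):
    """Aggregate a 2-D field by non-overlapping factor x factor block means."""
    field = np.asarray(field, dtype=float)
    rows, cols = field.shape
    if rows % factor or cols % factor:
        raise ValueError("field dimensions must be divisible by factor")
    blocks = field.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

def to_wet_mask(field, threshold=0.0):
    """Binary mask: 1 where precipitation exceeds the threshold, else 0."""
    return (np.asarray(field) > threshold).astype(np.uint8)

# Stand-in for one synthetic storm field (assumed, not a CoSMoS sample).
rng = np.random.default_rng(0)
hr = rng.gamma(shape=0.5, scale=2.0, size=(60, 60))
hr[:, :30] = 0.0                # make the left half of the domain dry

lr = block_average(hr, 10)      # 60x60 -> 6x6 low-resolution input
target = to_wet_mask(hr)        # 60x60 binary training target

print(lr.shape, target.shape)
```

Because block averaging preserves the domain mean, the 6×6 input and the 60×60 reference field carry the same total precipitation, and any 6×6 cell that is exactly zero covers a fully dry 10×10 block.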

2.2. Radar data

To assess model performance on real-world precipitation, we utilized hourly accumulated precipitation fields from the NCEP Multi-Radar/Multi-Sensor (MRMS) system, accessed through the Iowa Environmental Mesonet (IEM). The MRMS dataset provides high-resolution precipitation measurements at $ 1\;\mathrm{km}\times 1\;\mathrm{km} $ spatial resolution, making it well-suited for evaluating the ability of the models to capture fine-scale storm structure and dry/wet boundaries. For this study, a subset of MRMS data was extracted over 600×600 km spatial domains from hourly precipitation fields during a three-month winter period (November–January 2023). This period was selected to capture a wide range of cold-season precipitation characteristics and spatial intermittency typical of winter storm systems, including stratiform precipitation, frontal systems, and embedded convective features, while keeping the dataset computationally manageable for analysis (Chen et al., 2023; Akinsanola et al., 2024). To ensure geographical diversity and exposure to different climatic regimes, four distinct 600×600 km subregions were selected based on regional storm activity and hydroclimatic variability. The locations of these subregions are shown in Supplementary Figure S1, and their geographical details are summarized in Table 1.

Table 1.

Geographical locations of regions selected to extract high-resolution hourly precipitation with varied spatiotemporal patterns

| S. No. | Region | Latitude range | Longitude range |
|---|---|---|---|
| 1 | Western region | 46°N to 52°N | 126°W to 120°W |
| 2 | Southeastern region | 26°N to 32°N | 86°W to 80°W |
| 3 | Mid-South region | 30°N to 36°N | 100°W to 94°W |
| 4 | Eastern region | 40°N to 46°N | 77°W to 71°W |

To keep the radar experiments consistent with the synthetic ones, the MRMS data were preprocessed into the same input–output configuration, consisting of 6×6 low-resolution (LR) inputs and 60×60 high-resolution (HR) targets. This was achieved through non-overlapping block aggregation of the native 1 km precipitation fields, yielding 60×60 grids at an effective resolution of 10 km and corresponding 6×6 grids at 100 km resolution. Following aggregation, the 60×60 HR precipitation fields were converted into binary dry/wet masks using the same thresholding procedure applied in the synthetic experiments, as described in Section 2.1. This preprocessing ensures that the radar-based evaluation data structurally match the synthetic input–output configuration, without implying physical spatial-scale equivalence between the two datasets.

To focus performance assessment on nontrivial precipitation cases, samples exhibiting zero precipitation across the entire subregion at a given time step were excluded. This filtering step is necessary because one of the model configurations applies a hard constraint that preserves dry regions during training (Section 3.3). Under that constraint, fully dry samples would lead to trivial predictions and provide limited diagnostic value. Excluding these cases ensures that the analysis emphasizes situations requiring reconstruction of precipitation structure and accurate delineation of dry and wet areas. In total, 8,828 radar images were initially collected across the four regions. After filtering fully dry samples, 6,368 radar images remained. These samples were randomly shuffled using a fixed random seed to ensure reproducibility and then split into training, validation, and test sets containing 3,200, 1,248, and 1,920 samples, respectively. All splits were chosen to be exact multiples of the batch size (32) to ensure consistent batching during training and evaluation. The training set was used for model fitting, the validation set for hyperparameter tuning and early stopping, and the test set exclusively for performance assessment. This preprocessed radar dataset provides a rigorous benchmark for assessing the ability of the proposed models to reproduce fine-scale dry/wet precipitation occurrence and storm morphology in real-world conditions.
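The filtering and splitting procedure described above could look roughly like the following NumPy sketch. The function name `filter_and_split`, the seed value, and the small demo array are assumptions for illustration; the text states only that a fixed seed was used, not its value.

```python
import numpy as np

def filter_and_split(fields, sizes=(3200, 1248, 1920), seed=0):
    """Drop fully dry samples, shuffle reproducibly, and split by fixed sizes."""
    fields = np.asarray(fields)
    # A sample is kept only if at least one pixel has precipitation > 0.
    has_rain = fields.reshape(len(fields), -1).max(axis=1) > 0
    kept = fields[has_rain]
    if len(kept) < sum(sizes):
        raise ValueError("not enough non-dry samples for the requested split")
    order = np.random.default_rng(seed).permutation(len(kept))
    n_train, n_val, n_test = sizes
    train = kept[order[:n_train]]
    val = kept[order[n_train:n_train + n_val]]
    test = kept[order[n_train + n_val:n_train + n_val + n_test]]
    return train, val, test

# Small demo with hypothetical sizes: 7 wet samples and 3 fully dry ones.
demo = np.zeros((10, 6, 6))
demo[:7, 0, 0] = 1.0
train, val, test = filter_and_split(demo, sizes=(4, 2, 1), seed=42)

print(len(train), len(val), len(test))
```

With the default sizes, the three subsets add up to the 6,368 retained radar images, and each subset length is a multiple of the batch size of 32.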

3. Methods

In this study, we present a deep learning-based framework that utilizes a convolutional encoder-decoder and WGAN to correct dry/wet classification bias and improve the representation of precipitation occurrence in downscaled fields. The methodology focuses on three main elements: (1) network architecture, which defines the core structure of the convolutional encoder-decoder and the WGAN, (2) training strategies incorporating different conditioning approaches, which guide the learning process to improve the representation of storm features, and (3) model evaluation using robust and interpretable performance metrics.

3.1. Convolutional encoder–decoder

We implement a convolutional encoder–decoder neural network to perform precipitation occurrence downscaling, with the objective of reconstructing high-resolution 60×60 dry/wet masks from coarse 6×6 low-resolution inputs. The network is designed to capture the spatial organization of precipitation and accurately delineate dry and wet regions. The model follows a standard encoder–decoder structure commonly used in image-to-image learning tasks (Badrinarayanan et al., 2017; Meghani et al., 2023; Singh and Goyal, 2023; Abdelmoaty et al., 2025). The encoder progressively extracts feature representations through convolutional layers with ReLU activation functions, capturing both local and large-scale spatial dependencies. Each convolutional layer is followed by a max-pooling operation that reduces the spatial resolution of the feature maps while retaining essential structural information, enabling the learning of hierarchical representations. The decoder reconstructs high-resolution binary precipitation occurrence fields using transposed convolutional layers that progressively increase spatial resolution. Feature reconstruction relies on the hierarchical representations learned by the encoder. To maintain architectural consistency with the WGAN-based generative framework, a Gaussian noise field is concatenated with the low-resolution input and provided to the encoder–decoder, although under binary cross-entropy training, the model behaves largely deterministically. A schematic of the network architecture is shown in Figure 1a.

3.2. Conditional Wasserstein GAN (WGAN)

While convolutional encoder–decoder networks can effectively capture the large-scale spatial organization of precipitation occurrence, training with deterministic loss functions often leads to overly smooth or repetitive binary patterns and limited representation of fine-scale spatial variability (Kalantar et al., 2021; Choubey et al., 2024; Rampal et al., 2024; Abdelmoaty et al., 2025; Papalexiou and Mamalakis, 2025). To address these limitations and improve the realism of downscaled dry/wet precipitation fields, we extend the encoder–decoder framework using a conditional WGAN. The WGAN consists of two components: (i) a generator (convolutional encoder–decoder) that performs precipitation occurrence downscaling and (ii) a critic network that evaluates the realism of the generated high-resolution binary outputs (Goodfellow, 2016). Unlike the original GAN formulation, which relies on an adversarial objective implemented via binary cross-entropy loss, the WGAN optimizes the Wasserstein distance, yielding improved training stability and more informative gradient signals (Arjovsky et al., 2017; Gulrajani et al., 2017; Creswell et al., 2018; Xu et al., 2022; Han and Guan, 2023). 
The inclusion of a gradient penalty (WGAN-GP) further stabilizes training and mitigates mode-collapse behavior (Gulrajani et al., 2017). The WGAN is formulated as a conditional generative model to ensure consistency between the generated high-resolution dry/wet occurrence fields and the corresponding low-resolution precipitation input. Conditioning is implemented at both components (generator and critic) of the WGAN. In the generator, the low-resolution precipitation field is provided explicitly as input and propagated through the encoder–decoder network. In the critic, the low-resolution field is concatenated with intermediate feature representations derived from the high-resolution binary field, allowing the critic to assess whether the generated dry/wet patterns are statistically consistent with the coarse-scale structure of the input (Leinonen et al., 2020; Glawion et al., 2023).
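
For orientation, the conditional WGAN-GP objectives can be written in the standard form of Gulrajani et al. (2017), extended with a conditioning field. The notation is an assumption introduced here and is not taken from this section: $ D $ is the critic, $ G $ the generator, $ x $ a real high-resolution dry/wet field, $ c $ the low-resolution input, $ z $ the noise field, $ \tilde{x}=G\left(c,z\right) $ a generated field, and $ \lambda $ the penalty weight, whose value is not stated in this section.

$$ {\mathcal{L}}_{\mathrm{critic}}={\mathbb{E}}_{\tilde{x}}\left[D\left(\tilde{x},c\right)\right]-{\mathbb{E}}_{x}\left[D\left(x,c\right)\right]+\lambda \,{\mathbb{E}}_{\hat{x}}\left[{\left({\left\Vert {\nabla}_{\hat{x}}D\left(\hat{x},c\right)\right\Vert}_2-1\right)}^2\right] $$

$$ {\mathcal{L}}_{\mathrm{gen}}=-{\mathbb{E}}_{\tilde{x}}\left[D\left(\tilde{x},c\right)\right] $$

Here $ \hat{x}=\epsilon x+\left(1-\epsilon \right)\tilde{x} $ with $ \epsilon \sim U\left[0,1\right] $ is a random interpolate between a real and a generated field. The first two terms of the critic loss estimate the negative Wasserstein distance, and the penalty term pushes the critic's gradient norm toward 1 so that it stays approximately 1-Lipschitz.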

As illustrated in Figure 1b, this conditioning is implemented at the feature-representation level rather than through strict pixel-wise alignment. Although these intermediate feature maps do not preserve exact spatial correspondence with either the low- or high-resolution grids, this design choice is intentional (Abdelmoaty et al., 2025; Papalexiou and Mamalakis, 2025). The aim is to enforce structural and statistical consistency across spatial scales rather than pixel-level correspondence, thereby allowing the critic to focus on large-scale coherence and the realistic spatial organization of dry/wet patterns. The generator is initialized using the pretrained encoder–decoder network (Section 3.1) and subsequently refined through adversarial training to minimize the Wasserstein distance between real and generated binary precipitation occurrence fields. This adversarial training enables the generator to learn complex spatial structures and sharper dry/wet boundaries than achievable under deterministic optimization alone. For the remainder of the study, this adversarially trained generator is referred to as the WGAN generator. The encoder–decoder generator and critic contain 121,633 and 382,729 trainable parameters, respectively (Figure 1).

Figure 1.

The architectures of (a) convolutional encoder–decoder (Generator) and (b) critic used in this study as components of WGAN.

Figure 1. Long description

Panel (a), at the top, outlines the generator as a convolutional encoder–decoder. The encoder starts with two parallel inputs: a coarse image (6×6) and a noise field drawn from $ \mathcal{N}\left(0,1\right) $ (6×6). These are concatenated to form a 6×6×2 input, followed by a convolution layer (6×6×64) and max pooling (3×3×64). The decoder begins with a transposed convolution (6×6×32), followed by convolution (14×14×32), transposed convolution (30×30×32), convolution (30×30×32), transposed convolution (60×60×32), and convolution (60×60×1), and ends with a hard constraint (60×60).

Panel (b), at the bottom, shows the critic. The high-resolution image (60×60) is processed by convolution (58×58×64), max pooling (29×29×64), convolution (13×13×32), max pooling (7×7×32), and convolution (6×6×8). In parallel, the coarse image (6×6) is concatenated, convolved to 6×6×9, then 6×6×32. Both branches are flattened and merged, followed by four dense layers with 1152, 256, 128, and 64 units, ending in the output node.

3.3. Training strategy

Models were trained using the Adam optimizer (Kingma and Ba, 2014) with a batch size of 32. Training was performed independently for the synthetic CoSMoS dataset and the MRMS radar dataset. For each dataset, samples were shuffled with a fixed random seed and divided into training, validation, and test sets. The validation set was used for hyperparameter selection, while the test set was reserved exclusively for performance evaluation.

To introduce stochastic variability in the generated outputs, the generator receives a stochastic noise input during both training and inference. Specifically, a noise field defined on the low-resolution grid is sampled from a standard normal distribution and concatenated channel-wise with the low-resolution precipitation input. Although the noise is sampled independently at each grid cell, the convolutional architecture of the generator transforms this input into spatially coherent variability, avoiding unphysical pixel-scale randomness and promoting realistic spatial structure. This stochastic input provides the primary source of variability in the generated dry/wet occurrence fields. When trained using a binary cross-entropy loss alone, the encoder–decoder network tends to suppress the influence of the noise input and converge toward a largely deterministic solution. In contrast, adversarial training within the WGAN framework encourages the generator to utilize the noise input to produce diverse yet realistic dry/wet spatial configurations. The critic does not generate variability itself, but instead constrains how stochastic variability is expressed by penalizing unrealistic spatial organization. As a result, the WGAN produces multiple plausible high-resolution dry/wet realizations conditioned on the same low-resolution input. For evaluation purposes, a fixed random seed is employed during inference to ensure reproducibility and consistent comparison across experiments. These models are trained and evaluated independently on the synthetic and MRMS datasets, and no transfer learning or cross-dataset generalization is assumed in this study.
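The construction of the stochastic generator input described above can be sketched as follows. This covers only the input assembly, not the generator itself; the name `build_generator_input` and the channel-last layout are assumptions, chosen to match the 6×6×2 input listed in the Figure 1 description.

```python
import numpy as np

def build_generator_input(lr_field, rng):
    """Stack the low-resolution field with an i.i.d. N(0, 1) noise channel."""
    lr_field = np.asarray(lr_field, dtype=float)
    noise = rng.standard_normal(lr_field.shape)   # independent per grid cell
    return np.stack([lr_field, noise], axis=-1)   # shape (6, 6, 2)

lr = np.zeros((6, 6))
x1 = build_generator_input(lr, np.random.default_rng(0))
x2 = build_generator_input(lr, np.random.default_rng(1))
x1_again = build_generator_input(lr, np.random.default_rng(0))

print(x1.shape)
```

Different seeds give different noise channels for the same low-resolution field, which is what allows multiple realizations per input, while reusing a seed reproduces the input exactly, as done for evaluation.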

To examine how different forms of low-resolution information influence precipitation occurrence reconstruction, three conditioning strategies were evaluated:

a) Binary input: The low-resolution input is provided as a binary dry/wet mask.
b) Intensity input: The low-resolution input consists of continuous precipitation intensity values.
c) Constrained input: In addition to intensity input, a hard dry constraint is imposed during training, which enforces that high-resolution pixels corresponding to dry low-resolution pixels remain dry.
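One possible reading of the hard dry constraint in (c) is sketched below as a mask that forces every high-resolution pixel inside a dry low-resolution cell to a wet probability of zero. The exact mechanism used during training is not specified here, so the masking form, the function name `apply_dry_constraint`, and the scale factor of 10 (60/6) are assumptions for illustration.

```python
import numpy as np

def apply_dry_constraint(p_wet_hr, x_lr, factor=10):
    """Zero the wet probability wherever the parent low-resolution cell is dry.

    p_wet_hr : (H*factor, W*factor) predicted wet probabilities
    x_lr     : (H, W) low-resolution precipitation (dry where <= 0)
    This masking form is an assumed implementation of the constraint.
    """
    dry_lr = (x_lr <= 0).astype(int)
    # Repeat each low-resolution cell over its factor x factor block.
    dry_hr = np.kron(dry_lr, np.ones((factor, factor), dtype=int)).astype(bool)
    return np.where(dry_hr, 0.0, p_wet_hr)

# One wet low-resolution cell: only its 10 x 10 block may stay wet.
x_lr = np.zeros((6, 6))
x_lr[0, 0] = 2.5
constrained = apply_dry_constraint(np.full((60, 60), 0.8), x_lr)
```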

For each conditioning strategy and model type (the convolutional encoder–decoder and the WGAN), models were trained using five independent training runs with different random initializations (seeds = [8889, 42, 100, 1025, 61]) to assess the robustness of the results with respect to stochastic training variability. This procedure is not intended as formal uncertainty quantification, but rather to evaluate the stability of model behavior across training realizations. The model output layer produces pixel-wise probabilities of wet occurrence. For evaluation, these probabilities were converted to binary dry/wet classifications using a threshold of 0.5, where values below 0.5 were classified as dry and values equal to or greater than 0.5 as wet.

3.3.1. Convolutional encoder–decoder network training

The convolutional encoder–decoder network was first trained independently as a binary precipitation occurrence downscaling model, with the objective of classifying each high-resolution pixel as dry or wet. Training was conducted for up to 500 epochs with early stopping based on validation loss, using a patience of 30 epochs to prevent overfitting. The network was optimized using the binary cross-entropy (BCE) loss, defined as

(Eq (1))

$$ {\mathcal{L}}_{\mathrm{BCE}}=-\frac{1}{N}\sum \limits_{i=1}^N\left[{y}_i\mathit{\log}\left({\hat{y}}_i\right)+\left(1-{y}_i\right)\mathit{\log}\left(1-{\hat{y}}_i\right)\right] $$

where $ N $ is the number of pixels in a batch, $ {y}_i\in \left\{0,1\right\} $ is the true dry/wet label derived from precipitation values using a threshold of precipitation $ >0 $ , and $ {\hat{y}}_i\in \left[0,1\right] $ is the predicted probability of a wet pixel. The final binary prediction is obtained by applying a threshold of 0.5 to $ {\hat{y}}_i $ . This stage produces a deterministic encoder–decoder model trained using the binary cross-entropy loss for pixel-wise dry/wet classification.
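Equation (1) can be written out directly in NumPy. The sketch below is illustrative only; the clipping constant `eps`, which guards against taking the logarithm of zero, is an assumption and is not part of Eq. (1).

```python
import numpy as np

def bce_loss(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy averaged over all pixels, following Eq. (1).

    y_true : array of 0/1 dry/wet labels
    y_prob : array of predicted wet probabilities in [0, 1]
    eps    : clipping constant (an assumption, for numerical safety only)
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    terms = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(terms))

# Labels derived with the precipitation > 0 threshold described above.
precip = np.array([[0.0, 1.2], [0.0, 0.3]])
labels = (precip > 0).astype(float)
loss = bce_loss(labels, np.full((2, 2), 0.5))
```

A prediction of 0.5 everywhere gives a loss of log 2, the value expected from a classifier that carries no information about the labels.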

3.3.2. Conditional WGAN training

To improve the realism and spatial variability of the reconstructed dry/wet fields, the pretrained encoder–decoder network was subsequently used to initialize the generator in a conditional Wasserstein GAN with gradient penalty (WGAN-GP) framework. Under this conditional WGAN-GP formulation, the critic and generator are optimized using the following objectives. The critic loss is defined as

(Eq (2))

$$ {\mathcal{L}}_{\mathrm{C}}={\unicode{x1D53C}}_{\overset{\sim }{x}\sim {P}_{\mathrm{g}}}\left[C\left(\overset{\sim }{x},{x}_{\mathrm{LR}}\right)\right]-{\unicode{x1D53C}}_{x\sim {P}_{\mathrm{r}}}\left[C\left(x,{x}_{\mathrm{LR}}\right)\right]+\lambda\;{\unicode{x1D53C}}_{\hat{x}}\left[{\left(\parallel {\nabla}_{\hat{x}}C\left(\hat{x},{x}_{\mathrm{LR}}\right){\parallel}_2-1\right)}^2\right] $$

where

$ x\sim {P}_{\mathrm{r}} $ denotes real high-resolution binary precipitation fields drawn from the real data distribution $ {P}_{\mathrm{r}} $ ,

$ \overset{\sim }{x}=G\left(z,{x}_{\mathrm{LR}}\right) $ denotes generated high-resolution samples produced by the generator $ G $ ,

$ \hat{x}=\varepsilon x+\left(1-\varepsilon \right)\overset{\sim }{x} $ denotes samples interpolated between real and generated data, with $ \varepsilon \sim \mathrm{Uniform}\left(0,1\right) $ ,

$ {x}_{\mathrm{LR}} $ denotes the corresponding low-resolution precipitation input used as conditioning information,

$ {P}_{\mathrm{g}} $ denotes the distribution of generated samples,

$ C\left(\bullet \right) $ denotes the critic, which assigns a scalar score to each sample conditioned on $ {x}_{\mathrm{LR}} $ ,

$ \lambda $ is the gradient penalty coefficient enforcing the Lipschitz constraint.

The generator loss is given by

(Eq (3))

$$ {\mathcal{L}}_{\mathrm{G}}=-{\unicode{x1D53C}}_{\overset{\sim }{x}\sim {P}_{\mathrm{g}}}\left[C\left(\overset{\sim }{x},{x}_{\mathrm{LR}}\right)\right] $$

The critic was updated three times per generator update, following established practice (Gulrajani et al., 2017; Abdelmoaty et al., 2025; Papalexiou and Mamalakis, 2025). The Adam optimizer was used for both the generator and the critic, with momentum parameters $ {\beta}_1=0.5 $ and $ {\beta}_2=0.9 $ . The gradient penalty coefficient was fixed at $ \lambda =10 $ , following Gulrajani et al. (2017). Models were trained for up to 50 epochs, and learning rates were selected empirically based on validation performance and training stability, with values of $ 1\times {10}^{-4} $ for the critic and $ 2\times {10}^{-4} $ for the generator.
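The structure of Eqs. (2) and (3) can be illustrated with a toy critic whose gradient is available in closed form. The critic below, a tanh of a linear score on flattened fields, is purely hypothetical and stands in for the convolutional critic used in the study; all function and variable names are assumptions of this sketch.

```python
import numpy as np

def critic(x, x_lr, w, v):
    """Toy critic C(x, x_LR) = tanh(x.w + x_LR.v); x: (B, D), x_lr: (B, d)."""
    return np.tanh(x @ w + x_lr @ v)

def critic_grad_x(x, x_lr, w, v):
    """Analytic gradient of the toy critic with respect to x, shape (B, D)."""
    s = x @ w + x_lr @ v
    return (1.0 - np.tanh(s) ** 2)[:, None] * w[None, :]

def wgan_gp_losses(x_real, x_fake, x_lr, w, v, lam=10.0, rng=None):
    """Return (critic loss, generator loss) following Eqs. (2) and (3)."""
    rng = np.random.default_rng() if rng is None else rng
    # Interpolate between real and generated samples, one epsilon per sample.
    eps = rng.uniform(0.0, 1.0, size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    grad_norm = np.linalg.norm(critic_grad_x(x_hat, x_lr, w, v), axis=1)
    penalty = lam * np.mean((grad_norm - 1.0) ** 2)
    c_fake = np.mean(critic(x_fake, x_lr, w, v))
    c_real = np.mean(critic(x_real, x_lr, w, v))
    return float(c_fake - c_real + penalty), float(-c_fake)
```

With all critic weights set to zero, the critic scores and gradients vanish, so the critic loss reduces to the penalty term $ \lambda {\left(0-1\right)}^2=10 $ and the generator loss is zero. This makes explicit that the penalty pulls the gradient norm toward one rather than toward zero.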

4. Evaluation for dry/wet storm prediction

Evaluating the performance of deep learning models for dry/wet classification requires robust and interpretable metrics that can effectively capture both classification accuracy and spatial consistency. In this work, we focus on two key metrics: probability of zero (P₀) and spatial autocorrelation, both of which are particularly relevant in precipitation downscaling (Papalexiou and Mamalakis, 2025).

4.1. Probability of zero

The probability of zero ( $ {P}_0 $ ) is used to quantify the proportion of dry pixels (zero precipitation) in a given image. This metric is especially important in assessing the model’s ability to reproduce the percentage of dry/wet regions correctly (Baño-Medina et al., 2020; Papalexiou, 2022; Papalexiou and Mamalakis, 2025). Mathematically, $ {P}_0 $ is defined as the fraction of pixels with zero precipitation (dry) within an image, expressed as:

(Eq (4))

$$ {P}_0=\frac{\sum_{i=1}^N1\left({X}_i=0\right)}{N} $$

Here, N is the total number of pixels in the image, and $ {X}_i $ is the binary value (0 for dry, 1 for wet) at pixel i.

By comparing $ {P}_0 $ values between the ground truth (e.g., radar observations) and model predictions (e.g., from a convolutional encoder-decoder or WGAN), we can assess whether the model tends to overestimate wet areas or fails to capture dry zones. This is particularly critical in storm boundary detection. In addition to the mean P₀, we also evaluate the mean bias and mean root mean squared error (RMSE) between predicted and observed $ {P}_0 $ values to assess systematic deviations in dryness representation.
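The computation of $ {P}_0 $ in Eq. (4), together with the bias and RMSE between predicted and observed values, can be sketched as follows. The function names are hypothetical, bias is taken as predicted minus observed (so that a negative value indicates underprediction of dryness), and the averaging across training runs is not shown; these are assumptions of the sketch.

```python
import numpy as np

def to_binary(p_wet, threshold=0.5):
    """Classify as wet (1) where probability >= threshold, else dry (0)."""
    return (p_wet >= threshold).astype(int)

def prob_zero(binary_fields):
    """P0 in percent for each field; input (n, H, W), output (n,)."""
    return 100.0 * np.mean(binary_fields == 0, axis=(1, 2))

def p0_bias_rmse(pred, obs):
    """Bias (predicted minus observed) and RMSE of P0 over a set of fields."""
    diff = prob_zero(pred) - prob_zero(obs)
    return float(np.mean(diff)), float(np.sqrt(np.mean(diff ** 2)))

# Two 4 x 4 scenes: the first is half wet in truth but predicted fully wet.
obs = np.zeros((2, 4, 4), dtype=int)
obs[0, :2, :] = 1
pred = np.zeros((2, 4, 4), dtype=int)
pred[0] = 1
bias, rmse = p0_bias_rmse(pred, obs)
```

In this toy case the observed $ {P}_0 $ values are 50% and 100% and the predicted values are 0% and 100%, so the bias is −25%, reflecting the overprediction of wet area in the first scene.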

4.2. Spatial autocorrelation

To complement the evaluation of model performance in predicting dry and wet regions, we also use lagged spatial autocorrelation to assess how well these models preserve the spatial structure of storm fields (Papalexiou and Mamalakis, 2025). This metric computes the Pearson autocorrelation coefficient between the original binary storm field and its shifted version in both the horizontal (longitudinal) and vertical (latitudinal) directions, for spatial lags up to six pixels. The autocorrelation is expressed as:

(Eq (5))

$$ r=\frac{\sum_{i=1}^N\left({X}_i-\overline{X}\right)\left({Z}_i-\overline{Z}\right)}{\sqrt{\sum_{i=1}^N{\left({X}_i-\overline{X}\right)}^2\;{\sum}_{i=1}^N{\left({Z}_i-\overline{Z}\right)}^2}} $$

Here, N denotes the total number of pixels in the field. $ {X}_i $ and $ {Z}_i $ represent the binary values (0 for dry, 1 for wet) at the ith pixel in the original and spatially shifted (lagged) fields, respectively. $ \overline{X} $ and $ \overline{Z} $ are the mean values of the original and shifted fields across all pixels. Higher lagged spatial autocorrelations indicate stronger structural dependence as a function of spatial lags. Together, the probability of zero and lagged spatial autocorrelation metrics provide a comprehensive evaluation framework for dry–wet storm prediction models, combining pixel-level accuracy with structural realism across multiple spatial scales.
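Equation (5) can be sketched in NumPy as the Pearson correlation between a field and its shifted copy. The treatment of the field edges is not stated above; the sketch assumes that only the overlapping pixels are used (no wrap-around), and the function name `lagged_autocorr` is hypothetical.

```python
import numpy as np

def lagged_autocorr(field, lag, axis):
    """Pearson correlation between a 2D field and its copy shifted by `lag`.

    axis=1 shifts horizontally, axis=0 vertically. Only overlapping pixels
    are compared (an assumption of this sketch). Returns NaN when either
    part has zero variance, e.g. an entirely dry field.
    """
    n = field.shape[axis]
    x = np.take(field, np.arange(0, n - lag), axis=axis).ravel().astype(float)
    z = np.take(field, np.arange(lag, n), axis=axis).ravel().astype(float)
    xd, zd = x - x.mean(), z - z.mean()
    denom = np.sqrt(np.sum(xd ** 2) * np.sum(zd ** 2))
    return float(np.sum(xd * zd) / denom) if denom > 0 else float("nan")

# Correlation curve for lags 1 to 6 on a random binary field.
rng = np.random.default_rng(0)
binary_field = (rng.uniform(size=(60, 60)) > 0.7).astype(int)
horizontal = [lagged_autocorr(binary_field, k, axis=1) for k in range(1, 7)]
vertical = [lagged_autocorr(binary_field, k, axis=0) for k in range(1, 7)]
```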

5. Results

5.1. Synthetic storm data

We evaluated the performance of the convolutional encoder–decoder and WGAN models on synthetic data using three conditioning settings: binary input, intensity input, and constrained input. For the convolutional encoder–decoder model, the training and validation loss curves behave differently across the settings (Figure 2a). With the binary input setting, the training loss decreases steadily, but the validation loss begins to diverge after approximately 20 epochs, indicating overfitting. In contrast, the intensity input setting generalizes better, with a flatter validation loss curve that reaches lower values, suggesting that the added precipitation intensity information helps the model learn more spatial context (Figure 2a). The most stable and lowest loss values are observed in the constrained input setting, where the inclusion of the dry-region constraint leads to faster convergence and improved generalization, highlighting the benefit of incorporating physically meaningful constraints for bias correction.

Figure 2.

Training and validation loss curves for convolutional encoder-decoder (top row, [a]) and WGAN (bottom row, [b]) models under three conditioning settings: binary input, intensity input, and constrained.

Six-panel line graph comparing training and validation or generator and critic loss curves across binary input, intensity input, and constrained settings for two model types. See long description.

Figure 2. Long description

The top row contains three line graphs labeled a, each showing loss on the y axis and epochs on the x axis for a convolutional encoder-decoder model. From left to right: Binary Input, Intensity Input, and Constrained. In each, blue solid lines represent training loss and orange dashed lines represent validation loss. For Binary Input, both losses decrease initially, with training loss continuing to decrease and validation loss flattening and slightly increasing after 20 epochs. For Intensity Input and Constrained, both losses drop sharply in the first 10 epochs, then flatten, with training loss consistently lower than validation loss. The bottom row contains three line graphs labeled b, each showing loss on the y axis and epochs on the x axis for a W G A N model. From left to right: Binary Input, Intensity Input, and Constrained. In each, blue solid lines represent generator loss and orange dashed lines represent critic loss. For Binary Input, generator loss increases rapidly and fluctuates around 100, while critic loss remains near zero. For Intensity Input, generator loss fluctuates widely between 0 and 7000, while critic loss stays near zero. For Constrained, generator loss peaks early near 250, then decreases and fluctuates between 0 and 200, while critic loss remains near zero throughout.

For the WGAN model, the generator and critic losses exhibit the oscillatory behavior characteristic of adversarial training (Figure 2b). When conditioned on binary inputs, loss fluctuations remain within a moderate range, indicating relatively stable generator–critic dynamics. In contrast, intensity-only conditioning leads to pronounced instability, with large-amplitude oscillations in the generator loss, reflecting difficulty in maintaining adversarial balance when relying solely on intensity information. Notably, incorporating an explicit dry-region constraint substantially stabilizes training, reducing both generator and critic loss variability after the initial training phase. This behavior suggests that dry constraints act as an effective regularizer, anchoring the adversarial learning process toward physically plausible dry/wet spatial configurations and improving training stability. While WGAN-GP loss magnitudes are not absolute indicators of convergence, their relative behavior across conditioning strategies provides insight into adversarial balance and robustness. Training stability was further assessed using complementary diagnostics, including monitoring the gradient penalty term and visually inspecting the spatial coherence and diversity of generated dry/wet fields throughout training. The model did not collapse to trivial solutions (e.g., uniformly dry or wet fields), and the Constrained input configuration consistently preserved realistic dry/wet spatial structures.

The predictions by the convolutional encoder-decoder are provided in Figure 3. We observe that predictions based on the binary input setting tend to produce coarse and overly smoothed wet areas, with poor alignment to fine-scale features in the high-resolution ground truth. The model struggles to reconstruct intricate structures like spirals or narrow wet regions due to limited spatial information in the binary input setting.

Figure 3.

Spatial representation of storm structures and dry/wet classification from synthetic test data. The first and second rows display the low-resolution and high-resolution storm fields, whereas the third and fourth rows present their binary equivalents, respectively. The fifth to seventh rows illustrate the predicted high-resolution storm fields generated by a convolutional encoder-decoder model trained under the three distinct settings. The results presented here are obtained after classifying the predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

A seven-row, eight-column multi-panel plot comparing low- and high-resolution storm fields, their binary forms, and three model prediction settings for dry and wet classification. See long description.

Figure 3. Long description

The layout consists of seven rows and eight columns of square panels. Each row is labeled on the left. The first row, labeled ‘Low-Resolution’, shows pixelated storm fields in pastel colors. The second row, ‘High-Resolution’, displays detailed storm structures with swirling patterns and a color bar below indicating intensity from 0 to 14. The third row, ‘Low-Res Binary’, and the fourth row, ‘High-Res Binary’, present binary versions of the first two rows, with dark blue for dry (0) and white for wet (1) regions. The fifth to seventh rows, labeled ‘Binary input’, ‘Intensity input’, and ‘Constrained’ under ‘Conv. Encoder-Decoder Predictions’, show predicted high-resolution binary storm fields from three different model settings. Each panel in these rows displays varying spatial patterns of dry and wet regions, with the most detailed structures in the ‘Intensity input’ and ‘Constrained’ rows. The bottom color bar maps binary values from 0 (dry) to 1 (wet).

The intensity input setting significantly improves the spatial accuracy and wet-region detection by the model. Fine-scale structures begin to emerge, although some noise and discontinuities persist. Similar results are produced when trained with a dry constraint. The WGAN model also exhibits similar behavior regarding the three different training settings, but the generated fields show sharper dry/wet boundaries compared to the smoother outputs produced by the convolutional encoder–decoder. The wet/dry fields show improved sharpness and are much more realistic (Figure 4). Using the binary input setting, WGAN predictions lack spatial detail, often failing to recover sub-grid storm patterns.

Figure 4.

Spatial representation of storm structures and dry/wet classification from synthetic test data. The first and second rows display the low-resolution and high-resolution storm fields, whereas the third and fourth rows present their binary equivalents, respectively. The fifth to seventh rows illustrate the predicted high-resolution binary storm fields generated by the WGAN model trained under the three distinct settings. The results presented here are obtained after classifying the predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

A multi-row grid compares low and high-resolution storm fields, their binary forms, and W G A N model predictions under three settings. See long description.

Figure 4. Long description

The grid contains seven rows and eight columns. The first row, labeled Low-Resolution, shows pixelated storm intensity maps with colors from pink to green. The second row, High-Resolution, displays detailed storm structures with a color bar below ranging from 0 to 14 millimeters per hour, colored from purple to green. The third row, Low-Res Binary, and fourth row, High-Res Binary, show corresponding binary maps where dark blue indicates wet (1) and white indicates dry (0). The fifth to seventh rows, labeled W G A N Predictions, present predicted high-resolution binary storm fields for three input types: Binary input (green), Intensity input (orange), and Constrained (green). Each prediction row visually matches the structure of the high-resolution binary row to varying degrees. The bottom color bar indicates binary classification, with 0 as dry and 1 as wet. All rows are aligned by storm sample across columns.

The intensity input setting improves the resolution and extent of wet regions, but the output contains fragmented structures and some overprediction in dry zones. The constrained input setting yields results similar to the intensity input setting, but is more physically consistent (Figure 4). Under the constraint, wet regions are more continuous and dry regions are better preserved. This reflects the regularizing effect of the dry constraint, which helps the generator maintain realistic spatial distributions. Overall, these visual results confirm the effectiveness of using intensity and dry-region information as conditioning inputs, with the constrained input setting consistently yielding better predictions in terms of both structure and realism. This pattern holds across both model architectures, although the convolutional encoder-decoder tends to produce smoother and more interpretable outputs, while WGAN results exhibit more pixel-level noise but sometimes sharper boundaries.

The performance of the trained models was evaluated in terms of their ability to correctly classify dry regions, quantified by the predicted probability of zero ( $ {P}_0 $ ). With the binary input setting, the convolutional encoder–decoder model displays the most pronounced bias (Bias: −8.37%, RMSE: 13.31%), with consistent underprediction of $ {P}_0 $ (i.e., overprediction of wet areas). This behavior stems from the limited information in the binary input, which lacks precipitation intensity information and makes the model prone to misclassifying uncertain regions as wet. A slight reversal occurs only in very dry scenes $ \left({P}_0>90\%\right) $ , where the model slightly overpredicts dry conditions. With the intensity input setting, performance improves substantially (Bias: 0.56%, RMSE: 2.99%). Predictions are more tightly aligned with the 1:1 line, but a consistent trend is observed: the model underpredicts dry conditions in wetter scenes $ \left({P}_0<50\%\right) $ , while in drier scenes $ \left({P}_0>50\%\right) $ , it slightly overpredicts dry conditions, as reflected by the shift of points above the diagonal (Figure 5a). This behavior indicates that while the intensity input setting helps anchor storm structure, the model still tends to slightly exaggerate coarse-resolution conditions: dry scenes are predicted drier and wet scenes are predicted wetter. When the dry constraint is added, the model predictions are similar to those with the intensity input (Bias: 0.66%, RMSE: 3.06%), but with a slightly more pronounced overprediction of dryness in the mid to high $ {P}_0 $ range. This shift likely results from the explicit enforcement of dry-region preservation, which encourages extrapolation of dry areas and reduces false positives in wet prediction. The result is a modest dry bias that enhances physical consistency, particularly in scenes with large-scale dry conditions.

Figure 5.

Comprehensive evaluation of model performance in predicting the Probability of Zero (P₀), defined as the percentage of dry pixels (i.e., pixels with zero precipitation) in each high-resolution output. The evaluation was conducted across five independently trained model runs for each of the three conditioning settings: Binary, Intensity, and Dry Constraint. (a) Convolutional encoder–decoder model. (b) WGAN model. Each scatter plot compares the predicted P₀ against the observed P₀ across the entire test set, providing insight into model calibration, bias, and consistency in preserving dry regions.

A six-panel scatter plot grid compares predicted versus observed P sub 0 for two models and three input types, showing calibration and error metrics. See long description.

Figure 5. Long description

The grid contains six scatter plots arranged in two rows and three columns. The top row, labeled a, represents the convolutional encoder–decoder model. The bottom row, labeled b, represents the W G A N model. Each column corresponds to a different conditioning setting: left is Binary Input, center is Intensity Input, right is Constrained. All panels plot Predicted P sub 0 percent on the y-axis against Observed P sub 0 percent on the x-axis, both ranging from 0 to 100. Each plot includes a dashed red 1 to 1 line. In the top-left panel (a, Binary Input), points are widely scattered below the 1 to 1 line, with Bias negative 8.37 and R M S E 13.32. The top-center panel (a, Intensity Input) shows points closely aligned to the 1 to 1 line, Bias 0.56, R M S E 2.99. The top-right panel (a, Constrained) has points tightly clustered along the 1 to 1 line, Bias 0.65, R M S E 3.06. The bottom-left panel (b, Binary Input) shows moderate scatter, Bias 2.83, R M S E 9.90. The bottom-center panel (b, Intensity Input) shows points near the 1 to 1 line, Bias negative 0.76, R M S E 2.97. The bottom-right panel (b, Constrained) has the tightest clustering, Bias negative 0.38, R M S E 2.64. Color density indicates point concentration, with yellow for high density and purple for low.

For the WGAN, predictions based on the binary input setting show reduced bias compared to the convolutional encoder-decoder (Bias: 2.83%, RMSE: 9.90%), but exhibit greater spread across samples. Using the intensity input setting improves alignment with the 1:1 line (Bias: −0.76%, RMSE: 2.97%) and reduces spread, but the model still underpredicts dry conditions. The constrained input setting provides the most balanced performance for the WGAN (Bias: −0.38%, RMSE: 2.64%), with predictions closely following the 1:1 line across the full range. The dry constraint halves the magnitude of the negative bias, from −0.76% in the intensity input setting to −0.38%, suggesting that the constraint helps correct underestimation of dry conditions and promotes more accurate dry-region reconstruction (Figure 5b).

Next, we evaluate the spatial structure of predicted high-resolution dry/wet fields using lagged spatial autocorrelation across pixel lags from 1 to 6 in both horizontal and vertical directions. For the convolutional encoder-decoder model (Supplementary Figure S2), the predicted lagged spatial autocorrelations using the binary input setting are consistently higher than the ground truth across all lags and directions, indicating over-smoothing and exaggerated spatial persistence. This pattern is consistent in both horizontal and vertical directions and is observed across all five training runs. As the spatial lag increases, the interquartile spread widens, particularly beyond lag 3, reflecting greater variability and instability in preserving long-range spatial structure. Conditioning on intensity input improves both alignment with the ground truth and model consistency. Although the autocorrelations still slightly exceed those of the ground truth, the interquartile range becomes narrower, indicating reduced variability and improved generalization. The constrained convolutional encoder-decoder yields results similar to the intensity input setting, with improved structural stability and only marginal differences in lagged autocorrelation magnitude and spread across seeds. Both conditioning strategies help regularize spatial continuity while limiting excessive smoothing.

For the WGAN (Figure 6), the interquartile spread is generally narrower than in the convolutional encoder-decoder across all three training settings, indicating greater consistency across different seeds, even when using binary inputs. The WGAN trained in the intensity input setting most consistently follows the ground-truth autocorrelation trend, with medians closely aligned to the ground truth across all lags and directions. This suggests that intensity conditioning helps preserve both short- and long-range spatial structure. Predictions from the binary input setting show wider variability across seeds and slightly lower correlations, particularly at higher lags (beyond lag 3), indicating limited spatial coherence when only binary dry/wet information is provided. The constrained input setting maintains fidelity comparable to that of the intensity input setting, along with enhanced spatial realism, but exhibits somewhat more inter-seed variability, possibly due to stricter regularization suppressing fine-scale features. Notably, in all three cases, WGAN predictions do not over-smooth the fields, as reflected by their ability to retain spatial correlation beyond short lags.

Figure 6.

Lagged spatial autocorrelation of binary dry/wet precipitation fields for increasing spatial lags (1–6 pixels) in the horizontal (top row) and vertical (bottom row) directions. Results are shown for the synthetic test dataset under the three conditioning strategies (binary input, intensity input, and constrained). Boxplots summarize the distribution of lagged spatial autocorrelation values across five independent training runs of the WGAN models with different random initializations (teal), compared against the ground truth fields (purple). Model outputs were converted to binary dry/wet maps using a probability threshold of 0.5 prior to analysis.

A six-panel boxplot matrix comparing lagged spatial autocorrelation for three conditioning strategies in horizontal and vertical directions. Ground truth and W G A N results are shown across increasing pixel lags. See long description.

Figure 6. Long description

The layout consists of two rows and three columns. The top row is labeled Horizontal Direction, the bottom row is labeled Vertical Direction. Columns from left to right are Binary Input, Intensity Input, and Constrained. Each panel plots lag (pixels) on the x-axis from 1 to 6, and correlation on the y-axis from 0 to 1. For each lag, two boxplots are shown: Ground Truth in purple and W G A N in teal. In all panels, correlation decreases as lag increases. For both directions, Ground Truth boxplots are generally higher and less variable at low lags, with W G A N distributions closely following but with greater spread, especially at higher lags. The Binary Input panels show the steepest decline in correlation with lag, while Intensity Input and Constrained panels show more gradual decreases. The legend at the bottom identifies Ground Truth and W G A N colors. All model outputs are thresholded at 0.5 probability before analysis.

Overall, the integration of intensity information and the dry-region constraint substantially improves the performance of both convolutional encoder-decoder and WGAN models in predicting high-resolution dry/wet fields. While the intensity input setting enhances the spatial localization of wet regions, the dry constraint setting provides an additional regularizing effect, reinforcing physical consistency and enhancing spatial realism. The convolutional encoder–decoder performs similarly to or slightly better than the WGAN regarding the marginal statistics of the dry/wet fields (see probability of zero in Figure 5 and more statistics in the Supplementary Figures S5 and S6), but it tends to produce overly smooth outputs with inflated spatial coherence. In contrast, WGAN fields, though sometimes noisier, better align with targeted spatial structures and maintain stronger consistency across different initialization seeds.

5.2. Radar-based storm data

The training of both convolutional encoder–decoder and WGAN models on radar-based precipitation data reveals several important patterns, many of which align with what was observed during training on synthetic storm data (Supplementary Figure S3). For the convolutional encoder–decoder model, all three training settings (binary input, intensity input, and constrained input) behave similarly to training on synthetic data (Figure 2a), with steady reductions in training loss, while the validation loss reaches a plateau within the first 20 epochs (Supplementary Figure S3a). The dry constraint yields the most favorable training, with both training and validation losses reaching consistently lower values than in the other settings and converging more smoothly, which reflects improved robustness and more physically consistent learning of dry/wet patterns. The WGAN also shows oscillatory training behavior similar to that observed on synthetic data (Supplementary Figure S3b). Generator loss exhibits large fluctuations under the binary input configuration, but these fluctuations become significantly narrower under the intensity input and constrained input settings. The difference in fluctuation range between synthetic and radar-based training may partly reflect differences in the scale of the conditioning inputs. In the synthetic experiments, the aggregated low-resolution intensity field can reach maximum values of approximately 1400 when the 60×60 synthetic precipitation fields are accumulated into 6×6 grids. In contrast, radar-based inputs correspond to hourly precipitation rates with maximum values on the order of several mm h⁻¹, resulting in a substantially smaller input magnitude. The difference in scale likely produces smoother gradients during optimization and contributes to differences in the observed fluctuation ranges (Supplementary Figure S3b). Additionally, while the critic loss appears relatively flat across configurations, its magnitude alone does not provide a direct indication of training stability in WGAN-GP. The generator loss curves show different fluctuation patterns across conditioning settings, which may reflect differences in adversarial training dynamics under the different conditioning strategies.
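The magnitude of the aggregated synthetic inputs follows from simple arithmetic: each low-resolution cell accumulates 10 × 10 = 100 high-resolution pixels, so a block whose pixels all sit near the upper end of the synthetic intensity scale (about 14, as in the color scale of Figures 3 and 4) sums to about 1400. The sketch below illustrates this with a hypothetical block-sum function; the uniform 14-valued field is an illustrative extreme, not data from the study.

```python
import numpy as np

def aggregate_blocks(field_hr, factor=10):
    """Sum a high-resolution field over non-overlapping factor x factor blocks.

    A (60, 60) field with factor=10 becomes a (6, 6) field of block totals.
    Summation (rather than averaging) is assumed from the text's use of
    'accumulated'.
    """
    h, w = field_hr.shape
    blocks = field_hr.reshape(h // factor, factor, w // factor, factor)
    return blocks.sum(axis=(1, 3))

# Illustrative extreme: every pixel at an intensity of 14.
field_hr = np.full((60, 60), 14.0)
field_lr = aggregate_blocks(field_hr)
max_lr = field_lr.max()  # 100 pixels x 14 = 1400
```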

Prediction results on radar-based precipitation data, shown in Figures 7 and 8, demonstrate significant differences in dry/wet classification accuracy across conditioning strategies and model architectures. As seen previously with synthetic data, the binary input setting yields the poorest fidelity in both convolutional encoder-decoder and WGAN models. The predictions under this setting are overly coarse and tend to underestimate wet regions, largely replicating the blocky structure of the low-resolution binary input rather than the nuanced patterns observed in the high-resolution target. This issue is especially apparent for smaller storm cells and isolated wet areas, which are either entirely missed or exaggerated in extent. The convolutional encoder-decoder predictions based on the binary input setting produce relatively large contiguous wet regions that do not align well with the ground truth, while WGAN predictions are more textured but still misrepresent the exact shape and spatial extent of wet zones (Figures 7, 8).

Figure 7.

Spatial representation of storm structures from the Radar dataset test sample and predictions from convolutional encoder–decoder models trained under the three conditioning settings. The first and second rows display the low-resolution and high-resolution precipitation fields, whereas the third and fourth rows show their corresponding binary fields. The fifth to seventh rows illustrate the predicted high-resolution fields generated under the binary, intensity, and constrained input settings. The results presented here are obtained after converting predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

A seven-row, six-column multi-panel heatmap and binary map grid comparing observed and predicted storm precipitation fields under different model input settings. See long description.

Figure 7. Long description

From the top left, the first row displays six low-resolution precipitation fields with blocky color patches, labeled Low-Resolution. The second row shows corresponding high-resolution precipitation fields with finer spatial detail, labeled High-Resolution. Both rows use a horizontal colorbar below the second row, ranging from 0 to 5 millimeters per hour, with colors from pink to green. The third row presents low-resolution binary fields, labeled Low-Res Binary, where dark blue indicates wet and white indicates dry. The fourth row shows high-resolution binary fields, labeled High-Res Binary, with more granular blue regions. The fifth to seventh rows are labeled Conv. Encoder-Decoder Predictions, with left-side labels for Binary Input, Intensity Input, and Constrained. Each of these rows contains six predicted high-resolution binary fields, with blue and white regions indicating wet and dry, respectively. The bottom colorbar, labeled Binary (0: Dry, 1: Wet), anchors the binary classification. The spatial evolution from top to bottom demonstrates increasing resolution and the effect of different model conditioning settings on predicted storm structure.

Figure 8.

Spatial representation of storm structures from the Radar dataset test sample and predictions from the WGAN trained under the three conditioning settings. The first and second rows display the low-resolution and high-resolution precipitation fields, whereas the third and fourth rows show their corresponding binary fields. The fifth to seventh rows illustrate the predicted high-resolution fields generated under the binary, intensity, and constrained input settings. The results presented here are obtained after converting predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

A seven-row, six-column grid of storm structure panels compares low and high-resolution precipitation fields, binary fields, and WGAN predictions under three input settings. See long description.

Figure 8. Long description

From top to bottom, the first row shows low-resolution precipitation fields with coarse color blocks. The second row displays high-resolution precipitation fields with finer, more detailed color gradients. Both use a horizontal colorbar below the second row labeled Intensity in millimeters per hour, ranging from zero to five. The third row presents low-resolution binary fields, with dark blue for wet and white for dry, showing blocky patterns. The fourth row shows high-resolution binary fields with more intricate wet-dry boundaries. The fifth to seventh rows are labeled WGAN Predictions, with blue sidebars indicating Binary input, orange for Intensity input, and green for Constrained input. Each of these rows contains high-resolution binary fields predicted under the respective input setting, visually similar to the true high-resolution binary fields but with subtle differences in wet-dry spatial distribution. The bottom colorbar, labeled Binary zero for dry and one for wet, anchors the binary rows. All columns align spatially, allowing direct comparison across input and prediction types for each storm sample.

When using intensity as an input, both models exhibit some improvements. The added gradient and structure in the low-resolution intensity input may provide additional spatial information, which could help the networks to better localize precipitation regions. The convolutional encoder-decoder predictions become more confined, and the spatial extent of comparatively large contiguous wet zones better matches the high-resolution target (Figure 7). The intensity input setting improves spatial coherence, enabling more realistic storm patterns, but often overestimates wet area extent, especially in WGAN predictions. Nevertheless, both models clearly benefit from the richer input, capturing small storm patches and distinguishing dry regions more accurately than in the binary input setting. Notably, the constrained input setting consistently yields the most faithful reconstructions of high-resolution binary fields for both the convolutional encoder-decoder and the WGAN. It effectively suppresses false positives in dry regions and recovers sharper wet-dry boundaries, demonstrating that dry-region guidance significantly enhances physical realism in the generated outputs (Figure 8). The most accurate outputs are again achieved by the WGAN when it is constrained.

Regarding the overall predicted probability of zero ( $ {P}_0 $ ; see Figure 9), for the convolutional encoder–decoder, the binary input setting shows the largest spread from the 1:1 line, with a near-zero bias (0.02%) and an RMSE of 7.67% (Figure 9a). The spread is particularly pronounced at mid-to-high observed $ {P}_0 $ values, indicating greater variability in predicted dry fractions under this input configuration. Introducing the intensity input improves the agreement with observations: RMSE drops to 2.80% and predictions align more closely with the 1:1 line, though a slight positive bias of 1.07% is observed. Under the constrained input setting, predictions remain closely clustered around the diagonal with reduced outliers, yielding an RMSE of 3.06% and a bias of 1.48%.

Figure 9.

Comprehensive evaluation of model performance in predicting the Probability of Zero (P₀) from a test sample of radar data. The evaluation was conducted across five independently trained models for each of the three conditioning settings: Binary, Intensity, and Dry Constraint. (a) Convolutional encoder–decoder model. (b) WGAN model.

Six-panel scatterplot matrix comparing predicted versus observed P₀ (%) for two models and three input types, with bias and RMSE values shown. See long description.

Figure 9. Long description

Top row panels correspond to the convolutional encoder–decoder model, labeled (a) on the left. Bottom row panels correspond to the WGAN model, labeled (b) on the left. Each panel plots predicted P₀ (%) on the y axis against observed P₀ (%) on the x axis, both ranging from 0 to 100 percent. The leftmost column is labeled Binary Input, the middle column Intensity Input, and the rightmost column Constrained. In the top left panel (a, Binary Input), bias is 0.02 and RMSE is 7.67. In the top middle panel (a, Intensity Input), bias is 1.07 and RMSE is 2.80. In the top right panel (a, Constrained), bias is 1.48 and RMSE is 3.06. In the bottom left panel (b, Binary Input), bias is negative 1.36 and RMSE is 6.13. In the bottom middle panel (b, Intensity Input), bias is negative 0.16 and RMSE is 1.80. In the bottom right panel (b, Constrained), bias is negative 0.49 and RMSE is 1.98. All panels show a dense cluster of points along the 1 to 1 line, with the tightest clustering and lowest RMSE in the intensity and constrained input conditions, especially for the WGAN model.

Across all input settings, the convolutional encoder–decoder tends to slightly overestimate the dry fraction in cases with higher observed dryness ( $ {P}_0>50\% $ ), indicating a tendency to underpredict wet conditions in relatively dry fields. The WGAN exhibits similar overall behavior but with smaller errors (Figure 9b). Under the binary input setting, the WGAN predictions exhibit a negative bias of –1.36% and an RMSE of 6.13%, with a greater spread relative to the intensity-based setting. The performance improves significantly under the intensity input setting, where RMSE decreases to 1.80% and the bias is reduced to –0.16%, with predictions closely aligned along the 1:1 line. The constrained input setting achieves similarly strong performance, with an RMSE of 1.98% and a bias of –0.49%. Although a slight underestimation of $ {P}_0 $ is observed for very dry cases ( $ {P}_0>50\% $ ), the results overall indicate improved consistency in predicted dry-region fractions. In general, both models benefit from intensity-based inputs, while the addition of dry-region constraints provides further improvements in prediction consistency, particularly for the WGAN framework.
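The $ {P}_0 $ statistics discussed above can be illustrated with a short sketch. It assumes that $ {P}_0 $ is the percentage of dry (0) pixels in each binary field and that bias is the mean of predicted minus observed $ {P}_0 $, which is consistent with a negative bias indicating too few dry pixels. The function names and the toy fields are illustrative and are not drawn from the authors' code.

```python
import numpy as np

def probability_of_zero(binary_fields):
    """Percentage of dry (0) pixels in each field; input shape (n, H, W)."""
    binary_fields = np.asarray(binary_fields)
    return 100.0 * (binary_fields == 0).mean(axis=(1, 2))

def bias_and_rmse(predicted, observed):
    """Mean error and root-mean-square error, in the units of the inputs."""
    error = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return error.mean(), np.sqrt((error ** 2).mean())

# Toy example with two 2x2 fields (hypothetical values).
observed = np.array([[[0, 0], [0, 1]], [[1, 1], [0, 0]]])
predicted = np.array([[[0, 0], [1, 1]], [[1, 1], [0, 0]]])
p0_obs = probability_of_zero(observed)     # 75% and 50% dry
p0_pred = probability_of_zero(predicted)   # 50% and 50% dry
bias, rmse = bias_and_rmse(p0_pred, p0_obs)
```

In this toy case the first predicted field has one dry pixel too few, so the bias is negative (−12.5 percentage points), mirroring how a wet-biased model appears below the 1:1 line.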

For the convolutional encoder–decoder, predicted fields consistently exhibit higher median correlations than the ground truth across all input settings and lag distances (Supplementary Figure S4). This is especially evident at mid-to-high lags, where the ground truth correlations naturally decay due to fine-scale variability, whereas convolutional encoder-decoder predictions maintain artificially elevated correlation. This behavior reflects the model’s tendency to produce overly smooth outputs, which suppress local noise and enhance spatial continuity beyond what is present in the actual data. Among input types, the intensity input and constrained input settings yield slightly more controlled correlation trends than the binary input setting, but the overall over-smoothing pattern persists. In all three settings, the convolutional encoder–decoder shows elevated interquartile variability, suggesting greater uncertainty and instability in capturing long-range spatial structures. The WGAN exhibits reduced interquartile variability compared to the convolutional encoder–decoder in both directions, suggesting more stable spatial pattern learning (Figure 10). Notably, the intensity input and constrained input settings yield autocorrelations whose medians closely track the ground truth across lags. The dry constraint regularizes spatial features effectively but introduces slightly higher seed-to-seed variability. These observations highlight that, while the convolutional encoder–decoder tends to over-persist spatial patterns, the WGAN better captures the true spatial extent and variability of dry/wet zones.
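One way to compute a lagged spatial autocorrelation of the kind discussed above is sketched below. It assumes the statistic is the Pearson correlation between a field and a copy of itself shifted by a given number of pixels along one axis, evaluated over the overlapping region only; the exact estimator used in the study may differ. The function name and the striped test field are illustrative.

```python
import numpy as np

def lagged_autocorrelation(field, lag, axis):
    """Pearson correlation between a 2-D field and itself shifted by `lag` pixels.

    axis=1 shifts along columns (horizontal); axis=0 shifts along rows (vertical).
    Returns nan when either overlapping part is constant (e.g. an all-dry field).
    """
    field = np.asarray(field, dtype=float)
    n = field.shape[axis]
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and the field size minus 1")
    # Overlapping parts of the field and its shifted copy.
    head = np.take(field, np.arange(0, n - lag), axis=axis).ravel()
    tail = np.take(field, np.arange(lag, n), axis=axis).ravel()
    if head.std() == 0 or tail.std() == 0:
        return float("nan")
    return float(np.corrcoef(head, tail)[0, 1])

# Toy binary field with alternating dry/wet columns (hypothetical).
stripes = np.tile([0, 1, 0, 1, 0, 1], (4, 1))
horizontal = [lagged_autocorrelation(stripes, lag, axis=1) for lag in (1, 2)]
vertical = lagged_autocorrelation(stripes, 1, axis=0)
```

For the striped field, the horizontal correlation alternates between approximately −1 and +1 with lag, while the vertical correlation is approximately +1 because every row is identical. An over-smoothed prediction would show the opposite tendency to real storms at larger lags: correlations that decay too slowly.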

Figure 10.

Lagged spatial autocorrelation of binary dry/wet precipitation fields for increasing spatial lags (1–6 pixels) in the horizontal (top row) and vertical (bottom row) directions. Results are shown for the radar test dataset under the three conditioning strategies (binary input, intensity input, and constrained). Boxplots summarize the distribution of lagged spatial autocorrelation values across five independently trained WGAN models with different random initializations (teal), compared against the ground truth fields (purple). Model outputs were converted to binary dry/wet maps using a probability threshold of 0.5 prior to analysis.

Six-panel boxplot chart comparing lagged spatial autocorrelation for three conditioning strategies and two directions, showing WGAN and ground truth distributions. See long description.

Figure 10. Long description

There are six panels in a two by three grid. The top row is labeled Horizontal Direction Correlation, the bottom row is Vertical Direction Correlation. Each row has three panels labeled left to right as Binary Input, Intensity Input, and Constrained. The x axis in all panels is Lag in pixels, ranging from 1 to 6. The y axis is Correlation, ranging from 0 to 1. Each panel contains boxplots for each lag value, with teal boxes representing WGAN results and purple boxes for ground truth. For both directions and all conditioning strategies, correlation decreases as lag increases. WGAN distributions generally overlap with ground truth but show greater spread, especially at higher lags. The legend at the bottom identifies Ground Truth in purple and WGAN in teal.

Similarly to the conclusions from the synthetic dataset, the constrained training leads to enhanced accuracy, physical realism, and structural alignment of dry/wet separations, especially for WGAN models. While predictions on radar data exhibit overall slightly lower performance due to real-world complexity, the relative gains introduced by using the intensity input setting and constrained input setting remain evident (see Supplementary Figures S7 and S8). Models trained with intensity input setting and constrained input setting not only achieve lower RMSE and reduced bias in terms of predicting the overall probability of dry conditions but also produce spatial patterns that better match the observed spatial autocorrelation structure. These results highlight the value of integrating a consistency constraint into data-driven downscaling frameworks to improve predictive performance, even in challenging real-world applications.

6. Discussion

To demonstrate the value of explicitly predicting dry/wet classification fields, we applied the dry/wet mask generated by the WGAN trained under the constrained input setting to post-process high-resolution precipitation intensity fields produced by separately trained intensity prediction models. These intensity models use the same architectures and training datasets described earlier for the dry/wet experiments, but are trained to predict continuous precipitation intensity rather than binary occurrence. The convolutional encoder–decoder is optimized using mean squared error (MSE) loss, while the WGAN retains the adversarial training framework (Wasserstein loss with gradient penalty) with intensity fields as targets. Results for synthetic data are presented in Figures 11 and 12, while corresponding results for radar data are provided in Supplementary Figures S10–S11. Both intensity models commonly exhibit drizzle bias, producing low, non-zero precipitation values in regions that should be entirely dry (Lazoglou et al., 2024). This behavior has been reported in previous studies of machine learning–based precipitation prediction and downscaling, where models trained with MSE or adversarial losses tend to underestimate the occurrence of truly dry events. Applying the binary dry/wet mask to the predicted intensity fields eliminates these spurious low-intensity values and enforces the spatial dryness constraint. This correction significantly improved the realism and accuracy of the downscaled precipitation fields.
In the synthetic data use-case, the uncorrected WGAN outputs exhibited a pronounced drizzle bias, which is also reflected in the flat-line behavior in the probability of zero ( $ {P}_0 $ ) plots, where predicted $ {P}_0 $ values remained systematically lower than the observed values. Applying the dry/wet mask corrected this behavior. Specifically, the raw downscaled precipitation fields showed a substantial negative bias in $ {P}_0 $ (−50.35%) and a high RMSE (58.45%), indicating a consistent overestimation of wet conditions (Figure 11a). After correction, the predicted P₀ values aligned closely with the 1:1 reference line, with the bias reduced to −0.54% and the RMSE to 2.93%. This substantial improvement demonstrates that integrating explicit dry-region awareness through classification masks meaningfully enhances the model’s ability to reproduce realistic dry/wet distributions in high-resolution downscaled precipitation fields.
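The correction described above amounts to an element-wise masking step. The sketch below assumes that the classifier outputs a per-pixel wet probability, that pixels at or above 0.5 are treated as wet (the figure captions state the 0.5 threshold but not how ties are handled), and that intensities in dry pixels are set to exactly zero. The function name and the toy arrays are illustrative.

```python
import numpy as np

def apply_dry_wet_mask(intensity, wet_probability, threshold=0.5):
    """Zero out predicted intensity wherever the classifier labels a pixel dry."""
    intensity = np.asarray(intensity, dtype=float)
    # True = wet. The >= comparison at the threshold is an assumption.
    wet = np.asarray(wet_probability) >= threshold
    return np.where(wet, intensity, 0.0)

# Toy 2x3 example (hypothetical values): drizzle and one negative value in dry pixels.
predicted_intensity = np.array([[0.02, 3.1, 0.05], [-0.01, 7.4, 0.03]])
wet_probability = np.array([[0.1, 0.9, 0.2], [0.3, 0.8, 0.4]])
corrected = apply_dry_wet_mask(predicted_intensity, wet_probability)

# Percentage of exactly-zero pixels before and after masking.
p0_raw = 100.0 * (predicted_intensity == 0).mean()
p0_corrected = 100.0 * (corrected == 0).mean()
```

In the toy field no raw pixel is exactly zero, so the raw $ {P}_0 $ is 0%, while masking restores four of six pixels to dry. This is the same mechanism by which small spurious intensities keep the uncorrected $ {P}_0 $ systematically below the observed values. Negative intensities inside wet pixels are left untouched in this sketch; how the authors handle them is not stated.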

Figure 11.

Improved dry/wet prediction and spatial structure following correction. (a) Applying a predicted dry/wet mask to WGAN-generated precipitation fields substantially reduces bias and RMSE in dry-pixel prediction (P₀). (b) Corrected outputs exhibit lagged spatial autocorrelations that more closely track the ground truth in both horizontal and vertical directions, indicating improved spatial coherence.

A four-panel comparison of WGAN and corrected precipitation predictions versus observed data, showing reduced bias and improved spatial correlations. See long description.

Figure 11. Long description

In the top row, the left scatterplot is titled ‘Observed vs WGAN’ with x-axis labeled Observed Po percent and y-axis labeled WGAN Po percent. The plot shows a dense cluster of points along the x-axis, with a dashed one-to-one line. Bias is minus 50.35 percent and RMSE is 58.45 percent. The right scatterplot is titled ‘Observed vs Corrected WGAN’ with x-axis labeled Observed Po percent and y-axis labeled Corrected Po percent. Points are tightly clustered along the one-to-one line. Bias is minus 0.54 percent and RMSE is 2.93 percent. In the bottom row, the left boxplot is titled ‘Horizontal Spatial Correlations’ with x-axis labeled Lag from 1 to 7 and y-axis labeled Correlation from 0 to 1. Three boxplot groups per lag represent Ground Truth (black), WGAN Prediction (cyan), and Corrected Prediction (magenta). The right boxplot is titled ‘Vertical Spatial Correlations’ with the same axes and color scheme. For both spatial correlation plots, corrected predictions closely track ground truth across lags, while WGAN predictions deviate more at higher lags.

Figure 12.

Enhanced realism in corrected precipitation fields. Corrected images, generated by masking WGAN-predicted intensities using the dry/wet classification, show improved alignment with the ground truth, eliminating false wet areas and preserving sharper storm boundaries. White areas represent dry regions, whereas negative WGAN predictions are shown in brown.

A five-by-five panel grid compares precipitation field processing stages from low-res input to corrected output, showing improved storm boundary sharpness and reduced false wet areas. See long description.

Figure 12. Long description

From left to right, each row displays: Column one, Low-Res Input, with coarse pixelated color blocks; column two, High-Res Image, showing detailed storm spirals and gradients; column three, Predicted Intensity, with smoother but sometimes overextended wet regions, including negative values in brown; column four, Dry/Wet Mask, with binary green (wet) and white (dry) segmentation; column five, Corrected Image, where predicted intensities are masked by the dry/wet classification, resulting in sharper storm boundaries and elimination of spurious wet regions. The bottom colorbars define storm intensity from 0.0 (white) to 15.0 (magenta) and dry/wet mask from 0 (white) to 1 (green). Each row represents a different storm event, with spatial features preserved and false positives reduced in the final column.

The benefits of the dry/wet mask correction extend beyond pixel-wise accuracy to improvements in spatial structure, as assessed through spatial autocorrelation in both horizontal and vertical directions. The corrected fields exhibit more stable and realistic autocorrelation patterns across all spatial lags, closely tracking the ground truth (Figure 11b, Supplementary Figure S10b). This indicates that the correction not only enhances dry/wet classification but also helps preserve spatial coherence and the structural integrity of storm systems. These quantitative improvements are further supported by visual inspection of sample storm fields (Figure 12, Supplementary Figure S11). Uncorrected WGAN outputs tend to diffusely extend light rainfall into surrounding dry areas, a clear manifestation of drizzle bias. In contrast, applying the dry/wet mask effectively delineates dry regions and storm boundaries. The corrected fields exhibit sharper edges, cleaner separation between wet and dry zones, and overall stronger visual agreement with the ground truth. Notably, fine-scale dry regions, often completely absent in the uncorrected predictions, are successfully recovered, highlighting the mask’s value in restoring both large- and small-scale storm features.

Overall, these results demonstrate the value of combining high-resolution intensity prediction with an explicit binary dry/wet mask. The predicted mask acts as a consistency constraint that guides the precipitation field toward more realistic dry/wet patterns. This two-stage modeling approach substantially reduces overestimation of wet conditions, improves the spatial organization of rainfall features, and ensures better alignment with both statistical and structural aspects of the observed data. Such physically constrained post-processing offers a practical and effective strategy for improving the realism and interpretability of deep learning-based downscaling models.

7. Conclusion

The study provides a comprehensive evaluation of deep learning-based downscaling approaches for high-resolution dry/wet classification using both synthetic and radar-based precipitation datasets. The convolutional encoder-decoder and WGAN were rigorously tested under three distinct training settings (binary input, intensity input, and constrained input) to assess their ability to reproduce spatially accurate and physically consistent dry/wet patterns. Two primary evaluation metrics were employed: the probability of zero ( $ {P}_0 $ ), which measures the accuracy of dry pixel prediction, and the lagged Pearson spatial autocorrelation, which quantifies spatial coherence and structural fidelity across multiple spatial lags. Across all experiments, the intensity input setting consistently improved model performance over the binary input setting, reducing drizzle bias and enhancing the spatial localization of wet regions for both the convolutional encoder–decoder and WGAN. The constrained input setting yielded the most balanced and generalizable results, especially for the WGAN, by reinforcing the extent and continuity of dry regions and reducing the occurrence of false wet predictions. While the convolutional encoder-decoder model tended to produce smoother outputs with somewhat overestimated spatial autocorrelation, it performed stably across different random initializations, with particularly strong results when trained under the intensity input setting. In contrast, the WGAN demonstrated sharper storm boundaries and greater inter-seed consistency, particularly when conditioned on both intensity and the dry-region constraint, and its spatial autocorrelations were more closely aligned with the ground truth, especially in the horizontal direction, for both synthetic and radar-based storm data.
Moreover, our results clearly demonstrated that incorporating a dry/wet mask significantly improves the downscaling outputs, effectively correcting the overestimation of wet conditions and allowing for the recovery of fine-scale dry features that were often missed in raw high-resolution predictions. Ultimately, a two-stage approach of separately predicting intensity and dry/wet masks proved highly effective, demonstrating that physically informed conditioning can substantially enhance the spatial accuracy and realism of deep learning-based precipitation downscaling.

These findings underscore the potential of deep learning models, particularly WGAN-based architectures with physically informed constraints, for downscaling coarse-resolution precipitation fields into spatially detailed and physically consistent dry/wet patterns. The demonstrated improvements in both statistical accuracy and spatial structure, especially through the use of a predicted dry/wet mask, offer a practical pathway toward more reliable and interpretable downscaling outputs. Such advances are especially valuable in applications requiring high-fidelity dry/wet delineation, including drought monitoring, hydrological modeling, and storm impact assessments. Future work should explore further architectural enhancements and extend evaluations to diverse climatic regimes and storm types to better understand the models’ generalizability and robustness for real-world applications.

Open peer review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2026.10039.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/eds.2026.10039.

Acknowledgements

We gratefully acknowledge the financial support provided by the Environmental Institute, University of Virginia, through the Strategic Investment Fund for carrying out this research work.

Author contribution

Conceptualization: S.S., S.M.P., H.M.A., A.M.; Data curation: S.S.; Formal analysis: S.S.; Investigation: S.S., S.M.P., H.M.A.; Methodology: S.S., S.M.P., A.M.; Validation: S.S., A.M.; Visualization: S.S., H.M.A.; Writing - original draft: S.S.; Writing - review & editing: S.S., S.M.P., H.M.A., T.H., A.M.; Funding acquisition: S.M.P., T.H., A.M.; Resources: S.M.P., A.M.; Supervision: S.M.P., T.H., A.M.; Project administration: T.H., A.M.

Competing interests

The authors declare none.

Data availability statement

The CoSMoS R package (Papalexiou et al., 2021), used for generating storms, is available at CRAN (R Core Team, 2021; https://cran.r-project.org/web/packages/CoSMoS/vignettes/vignette.html).

The hourly precipitation data used in this study were obtained from the NCEP Multi-Radar/Multi-Sensor (MRMS) system, accessed via the Iowa Environmental Mesonet (IEM) platform (https://mesonet.agron.iastate.edu/GIS/rasters.php?rid=4). The MRMS dataset provides high-resolution precipitation estimates at a 1 km × 1 km spatial scale and is publicly available for research purposes. These data were used to evaluate model performance on real-world storm structures and dry/wet classification accuracy. The code used for model training, inference, and evaluation is publicly available on GitHub at https://github.com/shivamsinghhada/Binary_precipitation_downscaling.

Footnotes

This research article was awarded Open Data and Open Materials badges for transparent practices. See the Data Availability Statement for details.

References

Abdelmoaty, HM, Papalexiou, SM, Mamalakis, A, Singh, S, Coia, V, Hairabedian, M, Szeftel, P and Grover, P (2025) Generative adversarial networks for downscaling hourly precipitation in the Canadian Prairies. Journal of Geophysical Research: Machine Learning and Computation 2(4). https://doi.org/10.1029/2025JH000678.Google Scholar

AghaKouchak, A, Mehran, A, Norouzi, H and Behrangi, A (2012) Systematic and random error components in satellite precipitation data sets. Geophysical Research Letters 39(9). https://doi.org/10.1029/2012GL051592.CrossRef Google Scholar

Akinsanola, AA, Chen, Z, Kooperman, GJ and Bobde, V (2024) Robust future intensification of winter precipitation over the United States. npj Climate and Atmospheric Science 7(1), 212. https://doi.org/10.1038/s41612-024-00761-8.CrossRef Google Scholar

Arjovsky, M, Chintala, S and Bottou, L (2017) Wasserstein GAN. Preprint. http://arxiv.org/abs/1701.07875.Google Scholar

Badrinarayanan, V, Kendall, A and Cipolla, R (2017) SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12), 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615.CrossRef Google Scholar PubMed

Baghanam, AH, Nourani, V, Bejani, M, Pourali, H, Kantoush, SA and Zhang, Y (2024) A systematic review of predictor screening methods for downscaling of numerical climate models. Earth-Science Reviews 253, 104773. https://doi.org/10.1016/j.earscirev.2024.104773.CrossRef Google Scholar

Banõ-Medina, J, Manzanas, R, Cimadevilla, E, Fernandez, J, Gonzalez-Abad, J, Cofinõ, AS and Gutierrez, JM (2022) Downscaling multi-model climate projection ensembles with deep learning (DeepESD): Contribution to CORDEX EUR-44. Geoscientific Model Development 15(17), 6747–6758. https://doi.org/10.5194/gmd-15-6747-2022.CrossRef Google Scholar

Baño-Medina, J, Manzanas, R and Gutiérrez, JM (2020) Configuration and intercomparison of deep learning neural models for statistical downscaling. Geoscientific Model Development 13(4), 2109–2124. https://doi.org/10.5194/gmd-13-2109-2020.CrossRef Google Scholar

Berrang-Ford, L, Ford, JD and Paterson, J (2011) Are we adapting to climate change? Global Environmental Change 21. https://doi.org/10.1016/j.gloenvcha.2010.09.012.CrossRef Google Scholar

Chen, J, Janke, T, Steinke, F and Lerch, S (2024) Generative machine learning methods for multivariate ensemble postprocessing. The Annals of Applied Statistics 18(1). https://doi.org/10.1214/23-AOAS1784.CrossRef Google Scholar

Chen, X, Leung, LR, Gao, Y, Liu, Y and Wigmosta, M (2023) Sharpening of cold-season storms over the western United States. Nature Climate Change 13(2), 167–173. https://doi.org/10.1038/s41558-022-01578-0.CrossRef Google Scholar

Choubey, D, Patil, V and Anand Kumar, M (2024) Cognitive chromatic image synthesis using UNET and GAN. In 2024 2nd International Conference on Recent Advances in Information Technology for Sustainable Development (ICRAIS), Manipal, India. Piscataway, NJ, USA: IEEE, pp. 178–183. https://doi.org/10.1109/ICRAIS62903.2024.10811702.CrossRef Google Scholar

Creswell, A, White, T, Dumoulin, V, Arulkumaran, K, Sengupta, B and Bharath, AA (2018) Generative adversarial networks: An overview. IEEE Signal Processing Magazine 35(1), 53–65. https://doi.org/10.1109/MSP.2017.2765202.CrossRef Google Scholar

Dueben, PD and Bauer, P (2018) Challenges and design choices for global weather and climate models based on machine learning. Geoscientific Model Development 11(10), 3999–4009. https://doi.org/10.5194/gmd-11-3999-2018.CrossRef Google Scholar

Duncan, J, Subramanian, S and Harrington, P (2022) Generative modeling of high-resolution global precipitation forecasts. Preprint. http://arxiv.org/abs/2210.12504.Google Scholar

Gao, XJ, Shi, Y, Zhang, D, Wu, J, Giorgi, F, Ji, Z and Wang, Y (2012) Uncertainties in monsoon precipitation projections over China: Results from two high-resolution RCM simulations. Climate Research 52. https://doi.org/10.3354/cr0108.CrossRef Google Scholar

Giorgi, F and Gutowski, WJ (2015) Regional dynamical downscaling and the CORDEX initiative. Annual Review of Environment and Resources 40. https://doi.org/10.1146/annurev-environ-102014-021217.CrossRef Google Scholar

Glawion, L, Polz, J, Kunstmann, H, Fersch, B and Chwala, C (2023) spateGAN: Spatio-temporal downscaling of rainfall fields using a cGAN approach. Earth and Space Science 10(10). https://doi.org/10.1029/2023EA002906.CrossRef Google Scholar

González-Abad, J, Baño-Medina, J and Cachá, IH (2023) On the use of deep generative models for perfect prognosis climate downscaling. Preprint. http://arxiv.org/abs/2305.00974.Google Scholar

Goodfellow, I (2016) NIPS 2016 tutorial: Generative adversarial networks. Preprint. http://arxiv.org/abs/1701.00160.Google Scholar

Gulrajani, I, Ahmed, F, Arjovsky, M, Dumoulin, V and Courville, AC (2017) Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems (NeurIPS 2017), vol. 30.Google Scholar

Gutowski, WJ, Ullrich, PA, Hall, A, Leung, LR, O’Brien, TA, Patricola, CM, Arritt, RW, Bukovsky, MS, Calvin, KV, Feng, Z, Jones, AD, Kooperman, GJ, Monier, E, Pritchard, MS, Pryor, SC, Qian, Y, Rhoades, AM, Roberts, AF, Sakaguchi, K, et al. (2020) The ongoing need for high-resolution regional climate models: Process understanding and stakeholder information. Bulletin of the American Meteorological Society 101(5), E664–E683. https://doi.org/10.1175/BAMS-D-19-0113.1.CrossRef Google Scholar

Han, Y and Guan, L (2023) GAN-based vertical federated learning for label protection in binary classification. arXiv preprint arXiv:2302.02245. Available at https://arxiv.org/abs/2302.02245.

Harder, P, Hernandez-Garcia, A, Ramesh, V, Yang, Q, Sattegeri, P, Szwarcman, D, Watson, CD, Rolnick, D and Watson, C (2023) Hard-constrained deep learning for climate downscaling. Journal of Machine Learning Research 24, 1–40.

Harris, L, McRae, ATT, Chantry, M, Dueben, PD and Palmer, TN (2022) A generative deep learning approach to stochastic downscaling of precipitation forecasts. Journal of Advances in Modeling Earth Systems 14(10). https://doi.org/10.1029/2022MS003120.

Hobeichi, S, Nishant, N, Shao, Y, Abramowitz, G, Pitman, A, Sherwood, S, Bishop, C and Green, S (2023) Using machine learning to cut the cost of dynamical downscaling. Earth’s Future 11(3). https://doi.org/10.1029/2022EF003291.

Kalantar, R, Messiou, C, Winfield, JM, Renn, A, Latifoltojar, A, Downey, K, Sohaib, A, Lalondrelle, S, Koh, D-M and Blackledge, MD (2021) CT-based pelvic T1-weighted MR image synthesis using UNet, UNet++ and cycle-consistent generative adversarial network (cycle-GAN). Frontiers in Oncology 11. https://doi.org/10.3389/fonc.2021.665807.

Kingma, DP and Ba, J (2014) Adam: A method for stochastic optimization. Preprint. http://arxiv.org/abs/1412.6980.

Kumar, B, Atey, K, Singh, BB, Chattopadhyay, R, Acharya, N, Singh, M, Nanjundiah, RS and Rao, SA (2023) On the modern deep learning approaches for precipitation downscaling. Earth Science Informatics 16(2), 1459–1472. https://doi.org/10.1007/s12145-023-00970-4.

Lazoglou, G, Economou, T, Anagnostopoulou, C, Zittis, G, Tzyrkalli, A, Georgiades, P and Lelieveld, J (2024) Multivariate adjustment of drizzle bias using machine learning in European climate projections. Geoscientific Model Development 17(11), 4689–4703. https://doi.org/10.5194/gmd-17-4689-2024.

Leinonen, J, Nerini, D and Berne, A (2020) Stochastic super-resolution for downscaling time-evolving atmospheric fields with a generative adversarial network. IEEE Transactions on Geoscience and Remote Sensing 59, 7211–7223. https://doi.org/10.1109/TGRS.2020.3032790.

Lesnikowski, AC, Ford, JD, Berrang-Ford, L, Barrera, M and Heymann, J (2015) How are we adapting to climate change? A global assessment. Mitigation and Adaptation Strategies for Global Change 20. https://doi.org/10.1007/s11027-013-9491-x.

Liu, Y, Ganguly, AR and Dy, J (2020) Climate downscaling using YNet. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York, NY, USA: ACM, pp. 3145–3153. https://doi.org/10.1145/3394486.3403366.

Lopez-Gomez, I, Wan, ZY, Zepeda-Núñez, L, Schneider, T, Anderson, J and Sha, F (2024) Dynamical-generative downscaling of climate model ensembles. Preprint. http://arxiv.org/abs/2410.01776.

Luo, Y, Zhang, K, Wang, W, Chen, X, Feng, J, Wang, H, Liu, W, Guo, C, Chen, C and Wang, X (2025) An improved statistical bias correction method for global climate model (GCM) precipitation projection: A case study on the CMCC-CM2-SR5 model projection in China’s Huaihe River basin. Journal of Hydrology: Regional Studies 57, 102146. https://doi.org/10.1016/j.ejrh.2024.102146.

Magnan, AK (2016) Climate change: Metrics needed to track adaptation. Nature 530. https://doi.org/10.1038/530160d.

Mamalakis, A, Langousis, A, Deidda, R and Marrocu, M (2017) A parametric approach for simultaneous bias correction and high-resolution downscaling of climate model rainfall. Water Resources Research 53(3), 2149–2170. https://doi.org/10.1002/2016WR019578.

Maraun, D (2016) Bias correcting climate change simulations - a critical review. Current Climate Change Reports 2. https://doi.org/10.1007/s40641-016-0050-x.

Maraun, D, Wetterhall, F, Ireson, AM, Chandler, RE, Kendon, EJ, Widmann, M, Brienen, S, Rust, HW, Sauter, T, Themeßl, M, Venema, VKC, Chun, KP, Goodess, CM, Jones, RG, Onof, C, Vrac, M and Thiele-Eich, I (2010) Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. Reviews of Geophysics 48(3), RG3003. https://doi.org/10.1029/2009RG000314.

Meghani, S, Singh, S, Kumar, N and Goyal, MK (2023) Predicting the spatiotemporal characteristics of atmospheric rivers: A novel data-driven approach. Global and Planetary Change 231, 104295. https://doi.org/10.1016/j.gloplacha.2023.104295.

Mendoza Paz, S and Willems, P (2023) The skill of statistical downscaling in future climate with high-resolution climate models as pseudo-reality. Journal of Hydrology: Regional Studies 48, 101477. https://doi.org/10.1016/j.ejrh.2023.101477.

Miralles, O, Steinfeld, D, Martius, O and Davison, AC (2022) Downscaling of historical wind fields over Switzerland using generative adversarial networks. Artificial Intelligence for the Earth Systems 1(4). https://doi.org/10.1175/aies-d-22-0018.1.

Murukesh, M, Golla, S and Kumar, P (2024) Downscaling and reconstruction of high-resolution gridded rainfall data over India using deep learning-based generative adversarial network. Modeling Earth Systems and Environment 10(2), 2221–2237. https://doi.org/10.1007/s40808-023-01899-9.

Nishant, N, Hobeichi, S, Sherwood, S, Abramowitz, G, Shao, Y, Bishop, C and Pitman, A (2023) Comparison of a novel machine learning approach with dynamical downscaling for Australian precipitation. Environmental Research Letters 18(9), 094006. https://doi.org/10.1088/1748-9326/ace463.

Pan, B, Anderson, GJ, Goncalves, A, Lucas, DD, Bonfils, CJW, Lee, J, Tian, Y and Ma, H (2021) Learning to correct climate projection biases. Journal of Advances in Modeling Earth Systems 13(10). https://doi.org/10.1029/2021MS002509.

Papalexiou, SM (2022) Rainfall generation revisited: Introducing CoSMoS-2s and advancing copula-based intermittent time series modeling. Water Resources Research 58(6). https://doi.org/10.1029/2021WR031641.

Papalexiou, SM and Mamalakis, A (2025) Machine unlearning: Bias correction in neural network downscaled storms. Journal of Hydrology, 134689. https://doi.org/10.1016/j.jhydrol.2025.134689.

Papalexiou, SM, Serinaldi, F and Porcu, E (2021) Advancing space-time simulation of random fields: From storms to cyclones and beyond. Water Resources Research 57(8). https://doi.org/10.1029/2020WR029466.

Potter, NJ, Chiew, FHS, Charles, SP, Fu, G, Zheng, H and Zhang, L (2020) Bias in dynamically downscaled rainfall characteristics for hydroclimatic projections. Hydrology and Earth System Sciences 24(6), 2963–2979. https://doi.org/10.5194/hess-24-2963-2020.

Rahimi, S, Huang, L, Norris, J, Hall, A, Goldenson, N, Risser, M, Feldman, DR, Lebo, ZJ, Dennis, E and Thackeray, C (2024) Understanding the Cascade: Removing GCM biases improves dynamically downscaled climate projections. Geophysical Research Letters 51(9). https://doi.org/10.1029/2023GL106264.

Rampal, N, Hobeichi, S, Gibson, PB, Baño-Medina, J, Abramowitz, G, Beucler, T, González-Abad, J, Chapman, W, Harder, P and Gutiérrez, JM (2024) Enhancing regional climate downscaling through advances in machine learning. Artificial Intelligence for the Earth Systems 3(2). https://doi.org/10.1175/aies-d-23-0066.1.

Scher, S and Peßenteiner, S (2021) Technical note: Temporal disaggregation of spatial rainfall fields with generative adversarial networks. Hydrology and Earth System Sciences 25(6), 3207–3225. https://doi.org/10.5194/hess-25-3207-2021.

Singh, S and Goyal, MK (2023) An innovative approach to predict atmospheric rivers: Exploring convolutional autoencoder. Atmospheric Research 289, 106754. https://doi.org/10.1016/j.atmosres.2023.106754.

Stocker, T (2014) Climate Change 2013: The Physical Science Basis: Working Group I Contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, UK: Cambridge University Press.

Suliman, AHA, Awchi, TA, Al-Mola, M and Shahid, S (2020) Evaluation of remotely sensed precipitation sources for drought assessment in semi-arid Iraq. Atmospheric Research 242, 105007. https://doi.org/10.1016/j.atmosres.2020.105007.

Tabari, H, Paz, SM, Buekenhout, D and Willems, P (2021) Comparison of statistical downscaling methods for climate change impact analysis on precipitation-driven drought. Hydrology and Earth System Sciences 25(6), 3493–3517. https://doi.org/10.5194/hess-25-3493-2021.

Vogel, E, Johnson, F, Marshall, L, Bende-Michl, U, Wilson, L, Peter, JR, Wasko, C, Srikanthan, S, Sharples, W, Dowdy, A, Hope, P, Khan, Z, Mehrotra, R, Sharma, A, Matic, V, Oke, A, Turner, M, Thomas, S, Donnelly, C and Duong, VC (2023) An evaluation framework for downscaling and bias correction in climate change impact studies. Journal of Hydrology 622, 129693. https://doi.org/10.1016/j.jhydrol.2023.129693.

Vrac, M, Stein, ML, Hayhoe, K and Liang, XZ (2007) A general method for validating statistical downscaling methods under future climate change. Geophysical Research Letters 34(18). https://doi.org/10.1029/2007GL030295.

Wang, F, Tian, D, Lowe, L, Kalin, L and Lehrter, J (2021) Deep learning for daily precipitation and temperature downscaling. Water Resources Research 57(4). https://doi.org/10.1029/2020WR029308.

Xu, W, Jang-Jaccard, J, Liu, T, Sabrina, F and Kwak, J (2022) Improved bidirectional GAN-based approach for network intrusion detection using one-class classifier. Computers 11(6), 85. https://doi.org/10.3390/computers11060085.

Yazdandoost, F, Moradian, S, Izadi, A and Aghakouchak, A (2021) Evaluation of CMIP6 precipitation simulations across different climatic zones: Uncertainty and model intercomparison. Atmospheric Research 250, 105369. https://doi.org/10.1016/j.atmosres.2020.105369.

Table 1. Geographical locations of regions selected to extract high-resolution hourly precipitation with varied spatiotemporal patterns.

Figure 1. Architectures of (a) the convolutional encoder–decoder (generator) and (b) the critic used in this study as components of the WGAN.

Figure 2. Training and validation loss curves for convolutional encoder-decoder (top row, [a]) and WGAN (bottom row, [b]) models under three conditioning settings: binary input, intensity input, and constrained.

Figure 3. Spatial representation of storm structures and dry/wet classification from synthetic test data. The first and second rows display the low-resolution and high-resolution storm fields, whereas the third and fourth rows present their binary equivalents, respectively. The fifth to seventh rows illustrate the predicted high-resolution binary storm fields generated by the convolutional encoder-decoder model trained under the three distinct settings. The results presented here are obtained after classifying the predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

Figure 4. Spatial representation of storm structures and dry/wet classification from synthetic test data. The first and second rows display the low-resolution and high-resolution storm fields, whereas the third and fourth rows present their binary equivalents, respectively. The fifth to seventh rows illustrate the predicted high-resolution binary storm fields generated by the WGAN model trained under the three distinct settings. The results presented here are obtained after classifying the predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

Figure 5. Comprehensive evaluation of model performance in predicting the Probability of Zero (P₀), defined as the percentage of dry pixels (i.e., pixels with zero precipitation) in each high-resolution output. The evaluation was conducted across five independently trained model runs for each of the three conditioning settings: Binary, Intensity, and Dry Constraint. (a) Convolutional encoder–decoder model. (b) WGAN model. Each scatter plot compares the predicted P₀ against the observed P₀ across the entire test set, providing insight into model calibration, bias, and consistency in preserving dry regions.
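The P₀ metric in Figure 5 can be illustrated with a short NumPy sketch. This is not the authors' code: the function names `probability_of_zero` and `binarize`, the `>=` comparison at the 0.5 threshold, and the toy 60 × 60 field are assumptions made purely for illustration.

```python
import numpy as np

def binarize(wet_prob, threshold=0.5):
    """Convert predicted wet probabilities into a 0/1 dry/wet map.

    Assumption: a pixel is wet when its probability is >= threshold.
    """
    return (np.asarray(wet_prob, dtype=float) >= threshold).astype(int)

def probability_of_zero(field):
    """Return P0, the percentage of dry (zero-valued) pixels in a 2-D field."""
    field = np.asarray(field)
    return 100.0 * np.mean(field == 0)

# Hypothetical toy case: a 20 x 20 wet square inside a dry 60 x 60 domain.
truth = np.zeros((60, 60), dtype=int)
truth[20:40, 20:40] = 1

# Hypothetical prediction: probabilities that are high only inside the square.
pred_prob = np.full((60, 60), 0.1)
pred_prob[20:40, 20:40] = 0.9

p0_truth = probability_of_zero(truth)
p0_pred = probability_of_zero(binarize(pred_prob))
```

Plotting `p0_pred` against `p0_truth` for every test sample would give a scatter of the kind described in the caption.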

Figure 6. Lagged spatial autocorrelation of binary dry/wet precipitation fields for increasing spatial lags (1–6 pixels) in the horizontal (top row) and vertical (bottom row) directions. Results are shown for the synthetic test dataset under the three conditioning strategies (binary input, intensity input, and constrained). Boxplots summarize the distribution of lagged spatial autocorrelation values across five independent training runs of the WGAN models with different random initializations (teal), compared against the ground truth fields (purple). Model outputs were converted to binary dry/wet maps using a probability threshold of 0.5 prior to analysis.
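A minimal sketch of one way to compute the lagged spatial autocorrelation summarized in Figure 6, assuming it is the Pearson correlation between a binary field and a copy of itself shifted by a given number of pixels. The function name `lagged_autocorr`, the axis convention, and the handling of constant fields are hypothetical choices, not details taken from the study.

```python
import numpy as np

def lagged_autocorr(field, lag, axis):
    """Pearson correlation between a 2-D field and itself shifted by `lag` pixels.

    axis=1 shifts along columns (horizontal); axis=0 shifts along rows (vertical).
    `lag` must be a positive integer smaller than the field size along `axis`.
    """
    field = np.asarray(field, dtype=float)
    if axis == 1:
        a, b = field[:, :-lag], field[:, lag:]
    else:
        a, b = field[:-lag, :], field[lag:, :]
    a, b = a.ravel(), b.ravel()
    if a.std() == 0.0 or b.std() == 0.0:
        return float("nan")  # undefined for an all-dry or all-wet field
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical toy field: a 20 x 20 wet square inside a dry 60 x 60 domain.
truth = np.zeros((60, 60))
truth[20:40, 20:40] = 1.0

horizontal = [lagged_autocorr(truth, k, axis=1) for k in range(1, 7)]
vertical = [lagged_autocorr(truth, k, axis=0) for k in range(1, 7)]
```

Repeating this for each test field and each trained model would yield the per-lag distributions that the boxplots summarize.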

Figure 7. Spatial representation of storm structures from the Radar dataset test sample and predictions from convolutional encoder–decoder models trained under the three conditioning settings. The first and second rows display the low-resolution and high-resolution precipitation fields, whereas the third and fourth rows show their corresponding binary fields. The fifth to seventh rows illustrate the predicted high-resolution fields generated under the binary, intensity, and constrained input settings. The results presented here are obtained after converting predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

Figure 8. Spatial representation of storm structures from the Radar dataset test sample and predictions from the WGAN trained under the three conditioning settings. The first and second rows display the low-resolution and high-resolution precipitation fields, whereas the third and fourth rows show their corresponding binary fields. The fifth to seventh rows illustrate the predicted high-resolution fields generated under the binary, intensity, and constrained input settings. The results presented here are obtained after converting predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

Figure 9. Comprehensive evaluation of model performance in predicting the Probability of Zero (P₀) from a test sample of radar data. The evaluation was conducted across five independently trained models for each of the three conditioning settings: Binary, Intensity, and Dry Constraint. (a) Convolutional encoder–decoder model. (b) WGAN model.

Figure 10. Lagged spatial autocorrelation of binary dry/wet precipitation fields for increasing spatial lags (1–6 pixels) in the horizontal (top row) and vertical (bottom row) directions. Results are shown for the radar test dataset under the three conditioning strategies (binary input, intensity input, and constrained). Boxplots summarize the distribution of lagged spatial autocorrelation values across five independently trained WGAN models with different random initializations (teal), compared against the ground truth fields (purple). Model outputs were converted to binary dry/wet maps using a probability threshold of 0.5 prior to analysis.

Figure 11. Improved dry/wet prediction and spatial structure following correction. (a) Applying a predicted dry/wet mask to WGAN-generated precipitation fields substantially reduces bias and RMSE in dry-pixel prediction (P₀). (b) Corrected outputs exhibit lagged spatial autocorrelations that more closely track the ground truth in both horizontal and vertical directions, indicating improved spatial coherence.

Figure 12. Enhanced realism in corrected precipitation fields. Corrected images, generated by masking WGAN-predicted intensities using the dry/wet classification, show improved alignment with the ground truth, eliminating false wet areas and preserving sharper storm boundaries. White space represents dry regions, whereas negative predictions from the WGAN are shown in brown.
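The masking step behind Figures 11 and 12 can be sketched as follows, under stated assumptions. The function name `apply_dry_wet_mask` is hypothetical, the 0.5 threshold follows the earlier captions, and the optional clipping of negative intensities inside wet regions is an illustrative choice that the source does not confirm.

```python
import numpy as np

def apply_dry_wet_mask(intensity, wet_prob, threshold=0.5, clip_negative=True):
    """Zero out predicted intensities wherever the classifier says 'dry'.

    intensity : 2-D array of downscaled precipitation intensities.
    wet_prob  : 2-D array of predicted wet probabilities, same shape.
    """
    intensity = np.asarray(intensity, dtype=float)
    wet = np.asarray(wet_prob, dtype=float) >= threshold
    corrected = np.where(wet, intensity, 0.0)
    if clip_negative:
        # Assumption: negative intensities are not physical and are set to zero.
        corrected = np.clip(corrected, 0.0, None)
    return corrected

# Hypothetical 3 x 3 example with drizzle and negative values in the raw output.
intensity = np.array([[0.02, -0.10, 1.50],
                      [0.01, 2.30, 0.80],
                      [0.00, 0.05, -0.20]])
wet_prob = np.array([[0.1, 0.2, 0.9],
                     [0.3, 0.8, 0.7],
                     [0.0, 0.4, 0.6]])
corrected = apply_dry_wet_mask(intensity, wet_prob)
```

In this toy case the small positive values in the classified-dry pixels, which would otherwise count as wet, are removed, so P₀ rises from 1 of 9 dry pixels to 6 of 9.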

Singh et al. supplementary material

DOI: https://doi.org/10.1017/eds.2026.10039.sm001

Author comment: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR1

DOI: https://doi.org/10.1017/eds.2026.10039.pr1

Shivam Singh

University of Virginia, United States

Revision round: 0

Role: author

Comments

May 19, 2025

Dear Editor,

I am pleased to submit our manuscript entitled “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” for consideration in Environmental Data Science.

In this study, we address a critical limitation in data-driven precipitation downscaling: the systematic overestimation of light precipitation, or “drizzle bias,” which undermines the accurate delineation of dry zones in high-resolution climate products. We develop and evaluate U-Net and Wasserstein GAN models trained on both synthetic and radar-based precipitation datasets, employing multiple conditioning strategies—including intensity fields and dry-region constraints—to predict high-resolution dry/wet classifications. Our results demonstrate that physically informed constraints significantly enhance the spatial structure, sharpness, and reliability of dry/wet delineation, with potential utility as a correction layer for statistical and deep learning downscaling frameworks.

We believe this contribution is well aligned with the scope of Environmental Data Science, particularly its focus on machine learning applications in environmental modeling, and the need for more interpretable and physically consistent data-driven methods in hydrology and climate science.

This manuscript has not been submitted elsewhere and is original work by the authors. We respectfully request that the paper be considered for peer review, and we welcome the opportunity to contribute to your journal.

Thank you for your time and consideration.

Sincerely,

Dr. Shivam Singh

(On behalf of all co-authors)

Postdoctoral Research Associate

University of Virginia

Email: wpa8me@virginia.edu

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR2

DOI: https://doi.org/10.1017/eds.2026.10039.pr2

Reviewer_1

Date of review: 11 July 2025

Revision round: 0

Role: reviewer

Recommendation/decision: minor-revision

Conflict of interest statement

Comments

This manuscript presents a comparative study of a deep learning (DL) model for downscaling precipitation data. It introduces a novel approach for high-resolution dry/wet classification and employs three distinct training strategies to effectively capture spatially and physically consistent dry/wet patterns. The authors conclude that the WGAN method, when combined with physically informed constraints, outperforms others in capturing these patterns.

The study is well structured and holds merit for publication; however, some improvements in language and clarity are required.

The authors are advised to improve the language.

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR3

DOI: https://doi.org/10.1017/eds.2026.10039.pr3

Marcello Iotti

CMCC Foundation - Euro-Mediterranean Center on Climate Change, Italy

Date of review: 11 December 2025

Revision round: 0

Role: reviewer

Recommendation/decision: major-revision

Conflict of interest statement

Reviewer declares none.

Comments

A Review of “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks”

(EDS-2025-0023)

Singh et al. (2025) propose a deep-learning framework for classifying dry versus wet conditions in downscaled precipitation fields. After outlining limitations in current downscaling approaches—particularly their tendency to produce unrealistic drizzle—the authors introduce a convolutional encoder–decoder architecture, used both as a standalone classifier and as the generator of a conditional WGAN.

They explore several conditioning strategies: (i) binary dry/wet LR fields, (ii) LR precipitation intensity, and (iii) LR precipitation intensity with an additional hard constraint enforcing that LR dry regions remain dry in the HR prediction.
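For illustration only, the hard constraint in strategy (iii) could be read as the following NumPy sketch. It assumes that each LR pixel maps onto a 10 × 10 block of HR pixels (6 × 6 to 60 × 60) and that an LR pixel is dry when its value is exactly zero. The function name `enforce_dry_constraint` is hypothetical, and the manuscript's actual implementation may differ.

```python
import numpy as np

def enforce_dry_constraint(hr_wet_prob, lr_field, factor=10):
    """Force HR pixels to dry wherever the parent LR pixel is dry.

    hr_wet_prob : (6 * factor, 6 * factor) array of predicted wet probabilities.
    lr_field    : (6, 6) array of LR precipitation intensities.
    """
    lr_wet = (np.asarray(lr_field) > 0).astype(float)
    # Repeat every LR pixel over its factor x factor block of HR pixels.
    hr_support = np.kron(lr_wet, np.ones((factor, factor)))
    return np.asarray(hr_wet_prob, dtype=float) * hr_support

# Hypothetical example: a single wet LR pixel and a uniform HR prediction.
lr = np.zeros((6, 6))
lr[2, 3] = 1.2
hr_prob = np.full((60, 60), 0.7)
constrained = enforce_dry_constraint(hr_prob, lr)
```

Under this reading, wet HR pixels can appear only inside the footprint of wet LR pixels, which is why the restrictiveness of the constraint matters.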

The models are trained and evaluated using both synthetic precipitation fields generated with the Complete Stochastic Modeling System (CoSMoS) and hourly accumulated precipitation fields from the NCEP Multi-Radar/Multi-Sensor (MRMS) system. Model performance is assessed through the probability of zero precipitation and spatial autocorrelation. Finally, the authors apply the dry/wet mask produced by their classifier to post-process high-resolution precipitation fields generated by an independent WGAN trained on precipitation intensity only.

Overall, the manuscript is well structured, and the research problem is relevant and potentially impactful for several hydro-meteorological applications. The idea of explicitly correcting the dry/wet classification following intensity downscaling is conceptually sound and represents a valuable direction for improving physical realism.

However, I have several concerns regarding methodological choices, including aspects of data preprocessing, aggregation procedures, architectural terminology, and the interpretation of GAN training behaviour. In addition, multiple sections lack clarity or would benefit from a more rigorous formulation. Several parts of the manuscript could also be reorganised, consolidated, or streamlined to improve overall readability.

For these reasons, I recommend major revisions before the manuscript can be reconsidered for publication.

Below I provide a list of specific comments aimed at improving the clarity, methodological rigour, and scientific quality of the manuscript.

L60–61

The expression “at finer spatial and temporal scales” is vague because the reference baseline is not specified. Please consider clarifying/rephrasing.

L91–92

“GANs excel in preserving storm structures and spatial heterogeneity (…) over traditional regression-based models.” GAN advantages have been demonstrated not only relative to regression-based models, but also compared to more sophisticated downscaling methods. You may want to broaden this statement.

L128–129

“… the binary nature of precipitation.” It seems you are referring to the binary nature of precipitation occurrence, not precipitation itself. Please clarify.

L129–130

The expression “traditional post-processing ‘unlearning’ or bias correction steps” is unclear, particularly concerning the meaning of ‘unlearning’ in this context; please clarify.

L131–132

Although the phrase “classification-augmented GAN framework” is understandable, the term “augmented” may be confusing. Consider rephrasing.

L134–135

Using regime to indicate dry/wet states is not ideal, as “precipitation regime” has a specific meaning in climate science. I recommend replacing it with “dry/wet state” or similar wording.

L146–148

You refer to synthetic datasets of size 60×60. This describes only the domain extent, not the spatial resolution or the physical scale of the fields. Because the study compares performance across synthetic and MRMS datasets, it should be clarified whether these datasets are spatially consistent not only in terms of extent, but also in terms of physical scale. GAN-based downscaling models may not generalize well across datasets with substantially different spatial structures.

Please clarify whether the synthetic CoSMoS fields were generated to mimic the spatial structure of MRMS fields, and whether the spatial consistency between the two datasets has been assessed (e.g., by comparing their spatial power spectra or other scale-dependent metrics).

L159–160

The procedure used to compute the LR counterpart of the synthetic dataset is not described in the previous section. Please specify.

L161–162

The MRMS dataset covers 600×600 km at 1 km resolution. From your description, the HR and LR targets seem to be 60×60 at 10 km resolution and 6×6 at 100 km resolution. If this interpretation is correct, please state it explicitly. Additionally, “aggregated (re-gridded)” mixes two distinct concepts; please specify the exact procedure applied: interpolation, aggregation, conservative remapping, or another method.
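To make the distinction concrete, plain block-mean aggregation, which is only one of the candidate procedures listed above and is not confirmed as the one used in the manuscript, can be sketched as follows. The function name `block_mean` and the random toy field are assumptions. The grid sizes follow the interpretation stated in this comment (600 × 600 at 1 km, 60 × 60 at 10 km, 6 × 6 at 100 km).

```python
import numpy as np

def block_mean(field, factor):
    """Aggregate a 2-D field by averaging non-overlapping factor x factor blocks."""
    field = np.asarray(field, dtype=float)
    rows, cols = field.shape
    if rows % factor or cols % factor:
        raise ValueError("field dimensions must be divisible by factor")
    blocks = field.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical 600 x 600 field at 1 km resolution.
rng = np.random.default_rng(0)
fine = rng.random((600, 600))

hr = block_mean(fine, 10)  # 60 x 60 grid, 10 km pixels
lr = block_mean(hr, 10)    # 6 x 6 grid, 100 km pixels
```

Block averaging preserves the domain mean, which interpolation-based re-gridding does not guarantee. This is one reason the exact procedure should be stated.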

L163–164

You refer to “evaluation data”, but based on later sections the radar dataset is actually used for training. Evaluation terminology may therefore be misleading. Perhaps replace with “used to assess performance” or similar wording.

L177–179

The first sentence repeats content from Lines 161–162. Additionally, the use of “evaluation dataset” can again be misleading. Please also clarify how dry days were filtered: did you remove only cases with zero precipitation across the entire domain, or did you apply a threshold to define dry conditions?

L186

Why was a subset of the radar images used after filtering?

L196

“… three major components.” Consider using an alternative expression such as “the methodology focuses on three main elements”.

Section 3.1

You refer to your architecture as a U-Net. However, the model lacks skip connections and a symmetric encoder–decoder structure (cf. Ronneberger et al. 2015, https://doi.org/10.1007/978-3-319-24574-4_28). Unless there is a specific justification, a more accurate description would be “convolutional encoder–decoder network”. Please revise or explain your choice of terminology.

L209

“reduces spatial dimension” — more accurately, this reduces resolution, not spatial extent.

L213 and elsewhere

The manuscript states that “the CNN upscales,” but the network actually performs downscaling (predicting high-resolution fields from low-resolution inputs). In atmospheric sciences, upscaling refers to the opposite process. Please check and correct the terminology.

L231–233

This appears to be the first explicit mention that the GAN is conditional. It would help to introduce the concept more smoothly, explaining that conditioning is used to encourage generated fields to be consistent with the LR input. Moreover, Fig. 1 suggests that the LR field is concatenated with intermediate feature maps rather than being provided at the network input; this technical detail should be explicitly described in the text. Finally, note that these intermediate feature maps may not preserve strict physical spatial alignment with the HR or LR fields. While this does not necessarily invalidate the approach, you could provide a brief comment or justification regarding this choice.

L241–242

“generator in case of WGAN” — this is already clear. The clarification is unnecessary.

L253–264

This description appears more appropriate in Section 3.1.

L255

Regarding the noise input, “conditioning” may not be the most appropriate term.

L256

What is meant by “spatially correlated noise”? Please clarify.

L263–264, 352–353

What is the rationale behind varying the random seed? Since the seed affects only the noise generation, one could argue for using a fixed seed for train/validation/test. Please justify the chosen approach.

L266–267

If the encoder/decoder suppresses noise effects, what is the motivation for including this noise?

L268–272

The manuscript states that variability in the generated outputs is “encouraged by the critic,” suggesting that the critic motivates the generator to utilize the noise input. While it is true that the training dynamics can influence how the generator exploits noise, it is more precise to clarify that the fundamental source of variability is the noise vector itself. I recommend revising the text to accurately reflect this concept.

Section 3.3

This section could benefit from reworking. For example:

• Standard GANs do not use the Wasserstein loss (L296).

• With conditional inputs, the equations require adaptation.

Consider either simplifying the mathematical exposition or making it rigorously consistent with your model.
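The two points above can be made concrete with a hedged sketch of the objectives. The notation is assumed here and is not taken from the manuscript: $x$ is the LR conditioning field, $y$ the HR target, $z$ the noise input, $G$ the generator, and $D$ the discriminator or critic.

A standard conditional GAN uses the log-loss objective

$$
\min_{G}\max_{D}\; \mathbb{E}_{x,y}\left[\log D(y \mid x)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D\left(G(x,z) \mid x\right)\right)\right],
$$

whereas a conditional Wasserstein GAN restricts the critic to 1-Lipschitz functions and drops the logarithms:

$$
\min_{G}\max_{\lVert D \rVert_{L}\le 1}\; \mathbb{E}_{x,y}\left[D(y \mid x)\right] - \mathbb{E}_{x,z}\left[D\left(G(x,z) \mid x\right)\right].
$$

How the Lipschitz condition is enforced (for example, by weight clipping or a gradient penalty) is a separate design choice that this sketch leaves open.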

L319

“Parameters” and “iterative”: please use precise terminology and clarify what aspect of hyperparameter determination was iterative.

L325

“The critic was trained for more steps (three) than the generator.” Please clarify: do you mean one generator update every three critic updates?

L326–329

I recommend consolidating the description of encoder/decoder CNN training separately from WGAN training.

L343–344

“Precipitation gradients” is unclear. Please clarify what gradients you are referring to.

L349–350

The meaning of the hard dry constraint could be clarified. Are HR pixels corresponding to dry LR pixels forced to be dry? Is this not overly restrictive?

L353–355

Although robustness to noise is a valid point, presenting it as “uncertainty quantification” is not accurate, since the study does not perform formal uncertainty analysis. I recommend removing this reference.

L359–361

Consider consolidating this text with the earlier paragraphs at Lines 330–351.

L380

“bias” — it appears you refer to the mean bias. Please clarify.

Figure 2b

The training curves raise concerns. In particular, the critic loss appears nearly flat around zero, while the generator loss oscillates strongly with large magnitude. This suggests that the critic may not be providing informative gradients to the generator, potentially indicating unstable training or mode collapse. The authors should comment on these observations and clarify whether and how they monitored the critic loss throughout training.

L519

“Pearson autocorrelation” — earlier you introduced spatial autocorrelation. Please adopt consistent terminology.

L560–562

The meaning of “input magnitude ~1400” is unclear. Are synthetic fields dimensionless?

More importantly, were the datasets normalized before being fed to the DL models, as is standard practice?

L564–566

This observation could also apply to the other experiments (see comment on Fig. 2b).

Recommendation: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR4

DOI: https://doi.org/10.1017/eds.2026.10039.pr4

Duncan Watson-Parris

University of Oxford Department of Physics, United Kingdom of Great Britain and Northern Ireland

Date of review: 19 December 2025

Revision round: 0

Role: Editor

Recommendation/decision: major-revision

Comments

Please be sure to respond to all of the reviewers' comments and concerns in your response.

Decision: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR5

DOI: https://doi.org/10.1017/eds.2026.10039.pr5

Claire Monteleoni

University of Colorado Boulder, United States

Revision round: 0

Role: Editor in Chief

Recommendation/decision: major-revision

Comments

No accompanying comment.

Author comment: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR6

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr6

Shivam Singh

University of Virginia, United States

Revision round: 1

Role: author

Comments

Jan 21, 2026

Dear Editor,

We are pleased to resubmit our revised manuscript entitled “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” for reconsideration in Environmental Data Science.

We sincerely thank you and the reviewers for the constructive feedback, which helped us improve the clarity and presentation of the manuscript. In this revised submission, we have carefully addressed all reviewer comments and provide a detailed point-by-point response in the accompanying response document. The revised manuscript has been substantially improved in terms of language clarity, methodological description, terminology consistency, and overall organization.

In this study, we address a critical limitation in data-driven precipitation downscaling: the systematic overestimation of light precipitation, or “drizzle bias,” which undermines the accurate delineation of dry regions in high-resolution climate products. We develop and evaluate a convolutional encoder–decoder model and a conditional Wasserstein GAN framework trained on both synthetic and radar-based precipitation datasets, using multiple conditioning strategies, including intensity-based inputs and physically informed dry-region constraints to improve high-resolution dry/wet classification. Our results demonstrate that incorporating physically informed constraints enhances the spatial structure, sharpness, and reliability of dry/wet delineation, with potential utility as a correction layer for statistical and deep-learning downscaling frameworks.

We believe this contribution aligns well with the scope of Environmental Data Science, particularly its focus on machine learning methods for environmental modeling and the development of more physically consistent and interpretable data-driven approaches in hydrology and climate science.

This manuscript has not been submitted elsewhere and represents original work by the authors. We respectfully request that the revised manuscript be reconsidered for publication in Environmental Data Science. Thank you for your time and consideration.

Sincerely,

Dr. Shivam Singh

(On behalf of all co-authors)

Postdoctoral Research Associate

University of Virginia

Email: wpa8me@virginia.edu

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR7

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr7

Reviewer_1

Date of review: 23 January 2026

Revision round: 1

Role: reviewer

Recommendation/decision: accept

Conflict of interest statement

Comments

The manuscript is ready for publication.

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR8

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr8

Marcello Iotti

CMCC Foundation - Euro-Mediterranean Center on Climate Change, Italy

Date of review: 26 February 2026

Revision round: 1

Role: reviewer

Recommendation/decision: minor-revision

Conflict of interest statement

Reviewer declares none.

Comments

A Review of “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” (EDS-2025-0023.R1)

I thank the authors for considering the comments I raised during the first round of review.

The quality of the manuscript has clearly improved, and most of the technical and scientific aspects have been clarified.

The current version of the paper is nearly acceptable.

I have some remaining concerns regarding the form of the manuscript, as well as some imprecisions I noticed while reading the current draft.

Moreover, some passages could benefit from further refinement and clarification.

I would therefore encourage the authors to carefully review the manuscript to improve clarity.

These observations do not constitute grounds for delaying acceptance.

For this reason, I recommend conditional acceptance with minor revisions, provided that the authors further refine the manuscript, based on, but not limited to, the comments below.

Specific Comments

L1-2: The title emphasizes the use of GANs, while the manuscript presents a comparative assessment of dry/wet classifiers based on both CNN and GAN architectures, under different input configurations. The authors may consider broadening the title to reflect both CNN and GAN, or reducing the emphasis on GANs.

L21-22 and elsewhere (resolution vs extent, storm input): I suggest removing brackets around “6x6” and “60x60” to avoid implying that these are resolutions rather than grid sizes. Regarding “storm input”, consider using “precipitation input” to reflect the broader types of events included in the radar dataset.

L31-32 and throughout (model naming): The GAN and CNN architectures could be referred to consistently as “the WGAN” and “the convolutional encoder–decoder”. Using the plural could imply that different conditioning or initialization strategies represent separate models, which is not the case in my opinion.

L37–39 and elsewhere (physically informed constraints): I suggest clarifying that your findings specifically highlight the benefit of the proposed strategy for this bias-correction task, rather than making a broad claim about physically informed constraints (i.e., those enforcing physical laws).

L61: “On the order of” → this is a range (0.25°-1°), rather than an order of magnitude.

L96: Remove “regression-based” unless justified, to avoid ambiguity.

L119-120: “Precipitation frequency” → Consider “precipitation event occurrence frequency”.

L129: “Extremes i.e. drought” → Drought represents only one extreme; please clarify.

L131: The encoder-decoder architecture is not included.

L155: Avoid phrases such as “fine spatial resolution of 60x60”; consider “synthetically generated storm field at a fine spatial resolution, with size 60x60”.

L167: “Access” → Did you mean “assess”?

L172-182: “The MRMS dataset spans a three-month winter period (November–January 2023)” - It seems that you extracted a subset of MRMS for your analysis. Please clarify. Also, consider consolidating the description of MRMS extraction.

L183: “To ensure consistency” → Consider “to assure consistent I/O configuration”.

L188-190: Please introduce the thresholding procedure in Sec. 2.1 before referencing it in later sections.

L215-219: Consider removing redundant statements about study goals.

L221: Avoid “for super-resolution”; it is not fully descriptive of the downscaling task.

L227: “Binary” unnecessary in this context.

L234: Please clarify “convolutional block” vs single convolutional layer.

L266-270: Consider consolidating with L274-275.

L277-278: “Design choice intentional and commonly adopted” → Add a reference or just drop the “commonly adopted” part.

L292: Move detailed storm counts to Sec. 2.1.

L318-327: The current description of the conditioning strategies may suggest that their impact is already established. I recommend clearly separating the description of the strategies from any claims about their effects. Any expected impact can be mentioned briefly as such, without implying that it has been assessed at this stage.

L336: Clarify “predicted probability” vs “predicted label”.

L336-337: You state before that the pixel-wise binary cross-entropy loss is optimized; avoid implying that accuracy is directly optimized.

L345-349 (Equation 2 and symbols): Please define all terms and correct any imprecisions.

L351-353: “Following established practice to maintain a well-trained critic” → Simplifying to “following established practice” would be clearer.

L359–366: Please clarify that the multiple training runs correspond to different random initializations used to assess robustness, rather than to distinct models.

The statement “Model outputs are probabilistic by design” appears to refer to the fact that the output layer provides pixel-wise wet/dry probabilities; I suggest making this explicit to avoid ambiguity and to improve technical clarity. Moreover, since this paragraph is not specific to the WGAN architecture, I recommend relocating it to the general methodological section.

L368: “Storm prediction” → Consider “dry/wet classification.”

L389: “The binary evaluation of dry and wet regions” → What you actually evaluate are the models.

L390: “Binary” unnecessary; use only if explicitly clarifying that you are predicting a binary wet/dry field.

L402-403: The repetition of the metric name and its brief definition appears redundant at this stage, as both have already been introduced and discussed.

L406–408 and throughout: Please use consistent naming for the conditioning settings (binary, intensity, and constrained input, as defined in Sec. 3.3) and avoid referring to them as different models.

L431-432: “Enforcing a fixed critic-to-generator update ratio” → Remove; it is a training setting, not a stability indicator.

L439–455 and elsewhere: I recommend avoiding repeated references to the same figure within a single paragraph.

L445: “Trends” → Consider “behaviour”.

L456: Clarify phrasing to indicate sharper, not “less smooth,” results.

L483, L552: Even if understandable, I recommend avoiding the term “cues”.

L501: As previously discussed, no formal uncertainty quantification is performed in the manuscript. I therefore suggest removing “reflecting increased uncertainty”.

L508 (and elsewhere): The term “physically consistent” is not ideal here, as the improvement on the metric does not imply enforcement of physical laws (and does not constitute a straightforward suggestion of physical consistency). I suggest replacing it with “consistent” or “accurate”.

L514: I suggest moving the initialization details (seed list) to the Methods section and clarifying—if not already done—that the reported independence refers to different initializations, not distinct models.

L584–586: Following your response to my previous comment #36, I suggest including the relevant information in the manuscript. Specifically, indicate that 1400 is the maximum value of the synthetic field, as “cumulative value” is too vague. Avoid phrasing such as “typically limited to about 5 mm h⁻¹”; stating the order of magnitude, or the maximum value would be more scientifically accurate and clearer for the reader. Additionally, for improved reproducibility, consider briefly mentioning in the Methods section your choice not to perform data scaling, as discussed in your response.

L582–594: After reviewing the results you provided and your responses to comments #34 and #37 (thank you for the clarifications), I think I understand the point you are making. However, some aspects of the results remain not entirely straightforward. In particular, the statement “the critic likely provides weak or uninformative gradients, as evidenced by unstable generator losses” might be somewhat strong.

Looking at Fig. 2b, it is not immediately clear from the generator training curves under the three conditioning settings which configuration performs better. For instance, for the intensity input, the two lower and the uppermost curves appear to increase asymptotically, making it difficult for me to confidently attribute better performance to this setting relative to the binary input. A similar observation applies to Fig. S3b, where the generator curves under all conditioning settings also show asymptotic increases.

These observations do not invalidate the results, especially when considering your discussion in the response. Since you note that “loss magnitudes in WGAN-GP differ fundamentally from those in standard GANs and are not directly interpretable as indicators of convergence or training stability”, I suggest a more cautious framing of statements such as:

“Under the Binary Input setting, the critic likely provides weak or uninformative gradients, as evidenced by unstable generator losses. In contrast, when dry-region constraints are applied alongside intensity inputs, the generator becomes more stable and better aligned across seeds, suggesting that the constraint indirectly enhances adversarial balance and learning by guiding both networks toward physically plausible outputs.”

Specifically, I recommend avoiding over-interpretation, as it is not straightforward from the provided plots to determine under which conditions the generator is more stable across different conditioning strategies. Rephrasing the discussion to focus on the observations from the training loss curves, incorporating the concepts you described in response to my comment #34, and framing interpretations more narrowly, while retaining your (correct) note on WGAN-GP loss magnitudes for the different training sets, would improve scientific precision and clarity.

L610–611, 614: “Convolutional encoder-decoder models trained under three different/distinct settings” → Consider consolidating these sentences to avoid redundancy and improve clarity.

L619: “From convolutional encoder–decoder models”—did you mean from the WGAN?

L626–627: “The added gradient” → As before, I suggest framing this as an interpretation.

L638–639: Consider suppressing “physically plausible.”

L648: Clarify “negative bias” (Fig. 9a reports 0.02).

L650: “Real-valued” unnecessary.

L693: “Autocorrelation curves” → Suppress “curves”.

L707 and L765–766: Instead of “physically informed constraint/filter”, I suggest referring to the constrained input as a “consistency constraint”, since no physical law is enforced. In contrast, the intensity input is, in my view, simply an input feature of the model.

L713: I apologize for having missed this point in the previous round of review. At this stage, you introduce a separate WGAN trained to predict precipitation intensity, but no details are provided about this model. It may be useful to briefly describe its architecture and training setup, even in a very concise form, possibly in the Supplementary Material.

L717–721: The statements “these models commonly exhibit drizzle bias” and “this is a well-documented limitation” would benefit from one or more supporting literature references, possibly recalling works already cited in the Introduction.

L827: In this paragraph, other URLs are reported in full, whereas the link to the code is provided only via “here” and appears to be broken. You may consider inserting the full link in textual form.

Recommendation: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR9

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr9

Duncan Watson-Parris

University of Oxford Department of Physics, United Kingdom of Great Britain and Northern Ireland

Date of review: 02 March 2026

Revision round: 1

Role: Editor

Recommendation/decision: minor-revision

Comments

Thank you for your recent revisions, which the reviewers agree significantly improved the manuscript. Please carefully consider the reviewers' latest (more minor) revisions in preparing a new version.

Decision: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR10

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr10

Julien Brajard

NERSC, Norway

Revision round: 1

Role: Editor in Chief

Recommendation/decision: minor-revision

Comments

No accompanying comment.

Author comment: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R2/PR11

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr11

Shivam Singh

University of Virginia, United States

Revision round: 2

Role: author

Comments

March 18, 2026

Dear Editor,

We are pleased to resubmit the revised version of our manuscript entitled “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” for reconsideration in Environmental Data Science.

We sincerely thank you and the reviewers for the constructive and thoughtful feedback. We are encouraged that the manuscript has been substantially improved and is close to publication, subject to minor revisions. We have carefully addressed all remaining comments. In this revision, we have focused on improving clarity, refining terminology, and ensuring consistency in presentation. A detailed, point-by-point response to all reviewer comments is provided in the accompanying document.

In this study, we address a key limitation in data-driven precipitation downscaling: the misrepresentation of dry and wet regions, often associated with drizzle bias, which affects the realism of high-resolution precipitation fields. We develop and evaluate a convolutional encoder–decoder model and a conditional Wasserstein GAN framework using synthetic and radar-based datasets under multiple conditioning strategies, including intensity-based inputs and a consistency constraint to improve dry/wet classification. Our results demonstrate that incorporating this constraint improves the representation of dry regions and enhances the overall reliability of downscaled precipitation fields. We further illustrate how the predicted dry/wet masks can be used to post-process intensity-based predictions, reducing spurious light precipitation and improving agreement with observed dry/wet distributions.
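As an illustration only, the mask-based post-processing step described above can be sketched as follows. This is a minimal sketch, not the implementation used in the manuscript: the function name `apply_dry_wet_mask`, the 0.5 probability threshold, and the toy arrays are assumptions introduced for the example.

```python
import numpy as np


def apply_dry_wet_mask(intensity, wet_probability, threshold=0.5):
    """Zero out downscaled intensities wherever the classifier predicts dry.

    intensity:       high-resolution precipitation field from an intensity model.
    wet_probability: pixel-wise wet probabilities from a dry/wet classifier.
    threshold:       hypothetical cut-off for labelling a pixel as wet.
    """
    intensity = np.asarray(intensity, dtype=float)
    wet_probability = np.asarray(wet_probability, dtype=float)
    if intensity.shape != wet_probability.shape:
        raise ValueError("intensity and wet_probability must share a shape")

    wet_mask = wet_probability >= threshold
    # Keep intensities at wet pixels; set predicted-dry pixels to exactly zero.
    return np.where(wet_mask, intensity, 0.0)


# Toy 2x2 example: the 0.02 value stands in for spurious light precipitation.
intensity = np.array([[0.02, 1.5], [0.0, 0.3]])
wet_probability = np.array([[0.1, 0.9], [0.2, 0.4]])
corrected = apply_dry_wet_mask(intensity, wet_probability)
```

In this toy case only the pixel with wet probability 0.9 retains its intensity; the choice of threshold controls the trade-off between removing spurious light precipitation and suppressing genuine wet pixels.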

We believe that this work aligns well with the scope of Environmental Data Science, particularly in advancing machine learning approaches for environmental modeling and improving the reliability and interpretability of data-driven downscaling methods.

This manuscript represents original work and has not been submitted elsewhere. We respectfully request that the revised manuscript be considered for publication in Environmental Data Science.

Thank you for your time and consideration.

Sincerely,

Dr. Shivam Singh

(On behalf of all co-authors)

Postdoctoral Research Associate

University of Virginia

Email: wpa8me@virginia.edu

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R2/PR12

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr12

Marcello Iotti

CMCC Foundation - Euro-Mediterranean Center on Climate Change, Italy

Date of review: 12 April 2026

Revision round: 2

Role: reviewer

Recommendation/decision: accept

Conflict of interest statement

Reviewer declares none.

Comments

A Review of “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” (EDS-2025-0023.R2)

The authors have addressed the points raised in the previous review rounds, and the manuscript has significantly improved as a result. In its current form, the overall quality of the work is suitable for publication.

From my side, I consider the review process to be complete and recommend acceptance of the manuscript. Below, I provide a set of final, mostly minor suggestions, which I encourage the authors to consider in order to further improve clarity, precision, and overall consistency of the text.

L22 “binary classification fields” → “binary dry/wet classification fields”

L23–24 “convolutional encoder-decoder and conditional Wasserstein Generative Adversarial Network (WGAN)” → “a convolutional encoder-decoder and a conditional Wasserstein Generative Adversarial Network (WGAN)”

L26 “and adding physical constraints” → “with an added consistency constraint”

L26 “trained and validated” → “trained and evaluated”, or “trained, validated, and evaluated”

L30 “physical constraints” → “the consistency constraint”

L31 Suppress “generalization, and physical consistency”

L31–32 “WGAN models” → “the WGAN”

L37 Suppress “physically informed”. Consider “…the value of this bias-correction strategy…”

L38 “and” → “the”

L46–47 “convolutional encoder-decoders… WGAN” → “the convolutional encoder-decoder… the WGAN”

L47–48 “dry-region constraints” → “a dry region constraint”

L48 Suppress “physically consistent”

L88 “…rather than its intermittent…” → “…rather than accounting for its intermittent nature…”

L136 “thereby reducing false wet and false dry predictions” → not demonstrated yet; consider “aiming at reducing…”

L143 Suppress “(WGAN)”

L144 “and spatial continuity” → “and the spatial continuity of the precipitation field”

L145 Suppress “physical consistency and interpretability”; consider replacing with “accuracy”

L164 “following aggregation” → “after aggregation”

L165–166 Introduce thresholding procedure here; remove current sentence

L177–179 → Improve wording; for example: “… a subset of MRMS hourly precipitation data was extracted …, for a three-month winter period ...”

L187–189 → Simplify; for example: “To ensure a consistent I/O configuration with the synthetic experiments, the MRMS data were processed to obtain 6×6 low-resolution inputs and 60×60 high-resolution targets”

L192 “Following aggregation” → “After aggregation”

L205 “diagnostic” → “training and diagnostic value”

L213–214 suppress “and early stopping”

L219–220 “convolutional encoder-decoder and WGAN” → “a convolutional encoder-decoder and a WGAN”

L256 Please be more specific at this point. For example, “a generator, using the same convolutional encoder-decoder architecture defined above, that performs… ”

L270 “allowing the critic” → “allowing it”

L274–275 “Convolutional encoder-decoder (Generator)” → “convolutional encoder-decoder / generator”

L275 Suppressing “as components of WGAN.” would make the caption clearer, as the network in Figure a is both the standalone convolutional encoder-decoder and the WGAN’s generator.

L289–290 Suppress reference to Fig. 1

L294–296 Remove repeated description of dataset split and shuffling (already introduced)

L297 “the generator” → “the convolutional encoder-decoder / generator”

L298 “stochastic noise input” → “noise input”

L308–309 “The critic does not generate variability itself ...”: generating variability is not the role of the critic. Consider simplifying to “The critic constrains how stochastic variability is expressed...”.

L313–314 “the model are trained and evaluated independently…” → remove this sentence and consolidate the description of transfer learning and cross-dataset generalization with L293–294.

L325–326 “five independent training runs…” → “five training runs with different random initializations”

L334 suppress “independently”

L335-336 Clarify “with the objective of classifying each HR pixel”. Consider “with the objective of producing HR dry/wet fields, starting from LR information”.

L341 “using a threshold of precipitation > 0” → Explicitly define thresholding, e.g. “1 if precipitation intensity is > 0”

L343–344 Remove “deterministic” (the model is stochastic by construction, even though training suppresses the input noise)

L358–360 Consolidate with L355

L361 Simplify to “C is the critic score”

L413: “a binary input” → “the binary input”

L417 Suppress duplicated Fig. 2a reference

L420 Replace “physically meaningful constraints” → “a consistency constraint”; remove the reference to bias correction at this stage, as it is introduced later in Sec. 6.

L428–429 “dry constraint” → “the dry constraint”

L429–430 “anchoring the adversarial learning process toward physically plausible dry/wet spatial configurations” is an interpretative sentence → rephrase cautiously

Caption Fig. 2: Clarify that the training/validation curves refer to the encoder–decoder, while for the GAN there are separate curves for the generator and the critic. Moreover, explicitly state that different lines correspond to different training initializations.

L457 “with a dry constraint” → “using the constrained input”

L460–461 “The wet/dry fields show improved sharpness and are much more realistic” remove redundancy and consolidate with the preceding sentence.

L474–475 Replace “physically consistent” → “more realistic / more accurate”. Additionally, remove the duplicated reference to Figure 4.

L476–477 Frame as interpretation

L481 Suppress “more interpretable output”

L482 Suppress “sometimes”

L502–503 “The result is a modest dry bias that enhances physical consistency” rephrase bias interpretation, e.g. “the result is a reduction in the dry bias, particularly...”

L519 “Dry Constraint” → “Constrained Input”

L532 Suppress “instability”

L535 Suppress “and improved generalization”

L536 “Constrained convolutional encoder-decoder” → “The conv. encoder-decoder with constrained input”

L546 “WGAN models” → “the WGAN model”

L550 “indicating” → split into two sentences, as the IQR, per se, is not directly related to consistency across different training initializations.

L592–593 specify a number or the exact order of magnitude (e.g., 10 mm/h)

L594: the difference in input scale alone could explain the observed fluctuation range in the loss curves. The relation between input scale and gradient smoothness is not straightforward; consider removing “produces smoother gradients during optimization”.

L611 “(Figures 7-8)” remove duplicated figure reference

L615 “models” → “model”

L632 “The added gradient” remove or reframe as interpretation

L642 Suppress “nevertheless”

L644–645 “The most accurate outputs are again achieved by the WGAN when it is constrained” → avoid repetition.

L648–649 “across all five independently trained models” → “across models trained with five different initializations”.

L653 “with a” remove repetition

L659 Suppress “with reduced outliers”

L666–667 “five independently trained WGAN” → “the WGAN trained with five different random initializations”.

L674–675 simplify sentence, remove “with greater spread relative to the intensity-based setting”

L679 Remove “but”

L687-688 “naturally decay due to fine-scale variability… artificially …” remove causal attribution to fine-scale variability. The statement attributing correlation decay at high spatial lags to “fine-scale variability” is not necessarily justified, as other factors may also contribute. I suggest removing this part and reporting only the observed behavior from the boxplots.

L691–692 “slightly more controlled correlation trends” → “correlations more consistent with the ground truth”

L694–695 “elevated interquartile variability, suggesting greater uncertainty and instability in capturing long-range spatial structures”. I understand that this refers to elevated IQR variability across the five different training initializations (seeds). If so, please specify this clearly. Moreover, as already discussed, I would suppress “uncertainty”.

L703 “training” → “input”

L705 “for WGAN models” → “for the WGAN”

L709–710 Clarify: “overall probability of dry condition” → “probability of zero (P₀)”

L710-711 “the observed spatial autocorrelation structure” → “the observed ground truth”

L724–726: this sentence essentially repeats content already mentioned above; consider removing it.

L733: Suppress “and enforcing spatial dryness constraint”.

L743 “awareness” → “information”

L749 “dry pixel prediction” → “probability of zero (P₀)”

L754 Suppress “stable”

L755 Suppress “patterns”

L757 Suppress “structural integrity”

L779 Remove “physically”

L780 Remove “interpretability”

L787 Remove “physically consistent”

L800 “their” → “its”

L807, L811: drop “physically informed”; as discussed there is no enforcement of physical laws in this study.

L812 Suppress “physically consistent”

L815 Remove “interpretable”

L834-835 “These data were used to evaluate model performance on real-world storm structures and dry/wet classification accuracy”. Suppress this sentence, as it is redundant with the Methods section and not appropriate for the Data Availability Section.
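Several of the suggestions above (L341, L709–710, L749) concern the thresholding rule and the probability of zero (P₀). For readers of this report, a minimal sketch of both quantities is given below, assuming the rule “1 if precipitation intensity is > 0” and taking P₀ as the fraction of dry pixels in a field; the function names and the toy field are illustrative and are not taken from the manuscript or its code.

```python
import numpy as np


def to_wet_dry(precip):
    """Label each pixel 1 (wet) if precipitation intensity is > 0, else 0 (dry)."""
    return (np.asarray(precip, dtype=float) > 0).astype(np.uint8)


def probability_of_zero(field):
    """Fraction of dry (zero-valued) pixels in a wet/dry field."""
    field = np.asarray(field)
    return float(np.mean(field == 0))


# Toy 2x3 field with two wet pixels and four dry pixels.
precip = np.array([[0.0, 0.4, 0.0], [1.2, 0.0, 0.0]])
labels = to_wet_dry(precip)
p0 = probability_of_zero(labels)
```

Comparing `probability_of_zero` between a predicted field and its ground truth gives the kind of dry-fraction bias that the manuscript's P₀ metric is meant to capture, under the assumptions stated above.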

Recommendation: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R2/PR13

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr13

Duncan Watson-Parris

University of Oxford Department of Physics, United Kingdom of Great Britain and Northern Ireland

Date of review: 15 April 2026

Revision round: 2

Role: Editor

Recommendation/decision: accept

Comments

I’m pleased to accept your manuscript. Please take care to address the (many) minor typos and text suggestions from the reviewer in preparing the final version.

Decision: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R2/PR14

Published online by Cambridge University Press: 28 May 2026

DOI: https://doi.org/10.1017/eds.2026.10039.pr14

Julien Brajard

NERSC, Norway

Revision round: 2

Role: Editor in Chief

Recommendation/decision: accept

Comments

No accompanying comment.

