Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks

Published online by Cambridge University Press:  28 May 2026

Shivam Singh*
Affiliation:
University of Virginia, USA
Simon Michael Papalexiou
Affiliation:
University of Calgary, Canada
Hebatallah M. Abdelmoaty
Affiliation:
University of Calgary, Canada
Tom Hartvigsen
Affiliation:
University of Virginia, USA
Antonios Mamalakis
Affiliation:
University of Virginia, USA
*
Corresponding author: Shivam Singh; Email: wpa8me@virginia.edu

Abstract

Accurately downscaling precipitation from coarse to high spatial resolutions remains a critical challenge in climate and hydrometeorological modeling. A key limitation is the frequent misclassification of dry and wet regions, which compromises the realism and reliability of high-resolution outputs. To address this, we propose a deep learning-based downscaling framework that explicitly models dry/wet classification by transforming low-resolution 6 × 6 precipitation inputs into high-resolution 60 × 60 binary classification fields. We evaluate two architectures, a convolutional encoder-decoder and a conditional Wasserstein generative adversarial network (WGAN), under three training strategies: (1) using binary wet/dry inputs, (2) using precipitation intensity inputs, and (3) using precipitation intensity inputs with added physical constraints. Models are trained and validated on both synthetically generated precipitation fields and real radar-estimated hourly precipitation data over the contiguous United States. Performance is assessed using metrics including the overall probability of zero ($\mathrm{P}_0$) and spatial autocorrelations. Results show that incorporating intensity information improves dry/wet classification, while adding physical constraints further enhances accuracy, generalization, and physical consistency, especially for WGAN models. The convolutional encoder-decoder produces smoother outputs with stable performance regarding marginal statistics, whereas the WGAN generates sharper boundaries and more realistic dry/wet fields, improving the representation of the spatial dependence structure. Furthermore, we demonstrate that the derived dry/wet classification fields can be used as binary masks to bias-correct downscaled precipitation fields, enhancing both statistical fidelity and spatial realism.
These findings highlight the value of a physically informed bias-correction strategy for improving the spatial realism of high-resolution precipitation fields from coarse-scale inputs.

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press
Table 1. Geographical locations of regions selected to extract high-resolution hourly precipitation with varied spatiotemporal patterns.

Figure 1. The architectures of (a) the convolutional encoder–decoder (generator) and (b) the critic used in this study as components of the WGAN.

Figure 2. Training and validation loss curves for the convolutional encoder–decoder (top row, [a]) and WGAN (bottom row, [b]) under three conditioning settings: binary input, intensity input, and constrained.

Figure 3. Spatial representation of storm structures and dry/wet classification from synthetic test data. The first and second rows display the low-resolution and high-resolution storm fields, whereas the third and fourth rows present their binary equivalents, respectively. The fifth to seventh rows illustrate the predicted high-resolution binary storm fields generated by the convolutional encoder–decoder trained under the three distinct settings. The results presented here are obtained after classifying the predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

Figure 4. Spatial representation of storm structures and dry/wet classification from synthetic test data. The first and second rows display the low-resolution and high-resolution storm fields, whereas the third and fourth rows present their binary equivalents, respectively. The fifth to seventh rows illustrate the predicted high-resolution binary storm fields generated by the WGAN trained under the three distinct settings. The results presented here are obtained after classifying the predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

Figure 5. Comprehensive evaluation of model performance in predicting the Probability of Zero (P₀), defined as the percentage of dry pixels (i.e., pixels with zero precipitation) in each high-resolution output. The evaluation was conducted across five independently trained model runs for each of the three conditioning settings: Binary, Intensity, and Dry Constraint. (a) Convolutional encoder–decoder model. (b) WGAN model. Each scatter plot compares the predicted P₀ against the observed P₀ across the entire test set, providing insight into model calibration, bias, and consistency in preserving dry regions.
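
The P₀ metric defined in this caption can be sketched in a few lines. This is only an illustration, not the authors' code: the function names `observed_p0` and `predicted_p0` are hypothetical, the toy fields are made up, and treating a probability of exactly 0.5 as wet is an assumption (the captions state only that a threshold of 0.5 is used).

```python
import numpy as np

def observed_p0(precip):
    """Percentage of pixels with exactly zero precipitation."""
    return 100.0 * float(np.mean(np.asarray(precip) == 0))

def predicted_p0(wet_prob, threshold=0.5):
    """Percentage of pixels classified as dry after thresholding wet probabilities."""
    wet = np.asarray(wet_prob) >= threshold  # assumption: ties count as wet
    return 100.0 * float(1.0 - wet.mean())

# Toy 60 x 60 example: left half dry, right half wet.
truth = np.zeros((60, 60))
truth[:, 30:] = 2.5                    # illustrative intensity value
prob = np.where(truth > 0, 0.9, 0.1)   # an idealized classifier output

print(observed_p0(truth), predicted_p0(prob))
```

Comparing the two values per test sample gives one point of the predicted-versus-observed scatter described in the caption.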

Figure 6. Lagged spatial autocorrelation of binary dry/wet precipitation fields for increasing spatial lags (1–6 pixels) in the horizontal (top row) and vertical (bottom row) directions. Results are shown for the synthetic test dataset under the three conditioning strategies (binary input, intensity input, and constrained). Boxplots summarize the distribution of lagged spatial autocorrelation values across five independent training runs of the WGAN with different random initializations (teal), compared against the ground truth fields (purple). Model outputs were converted to binary dry/wet maps using a probability threshold of 0.5 prior to analysis.
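
One way to compute a lagged spatial autocorrelation of a binary field is the Pearson correlation between the field and a copy of itself shifted by the lag along one axis. The sketch below assumes that definition (the review history refers to a Pearson autocorrelation, but the exact estimator used in the paper is not given here); `lagged_autocorr` is a hypothetical name.

```python
import numpy as np

def lagged_autocorr(field, lag, axis):
    """Pearson correlation between a 2-D field and itself shifted by `lag` pixels.

    axis=1 shifts along columns (horizontal), axis=0 along rows (vertical).
    Returns NaN when either overlapping part is constant (e.g. an all-dry field).
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    f = np.moveaxis(np.asarray(field, dtype=float), axis, 0)
    a = f[:-lag].ravel()   # pixels that have a neighbour `lag` steps ahead
    b = f[lag:].ravel()    # those neighbours
    if a.std() == 0 or b.std() == 0:
        return float("nan")
    return float(np.corrcoef(a, b)[0, 1])

# Toy binary field: alternating dry/wet columns.
stripes = np.tile(np.arange(60) % 2, (60, 1))
horizontal = [lagged_autocorr(stripes, k, axis=1) for k in range(1, 7)]
vertical = [lagged_autocorr(stripes, k, axis=0) for k in range(1, 7)]
```

For the striped toy field the horizontal values alternate in sign while the vertical values stay at one, which is a quick check that the two directions are handled separately.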

Figure 7. Spatial representation of storm structures from the radar dataset test sample and predictions from the convolutional encoder–decoder trained under the three conditioning settings. The first and second rows display the low-resolution and high-resolution precipitation fields, whereas the third and fourth rows show their corresponding binary fields. The fifth to seventh rows illustrate the predicted high-resolution fields generated under the binary, intensity, and constrained input settings. The results presented here are obtained after converting predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

Figure 8. Spatial representation of storm structures from the radar dataset test sample and predictions from the WGAN trained under the three conditioning settings. The first and second rows display the low-resolution and high-resolution precipitation fields, whereas the third and fourth rows show their corresponding binary fields. The fifth to seventh rows illustrate the predicted high-resolution fields generated under the binary, intensity, and constrained input settings. The results presented here are obtained after converting predicted probabilities into dry (0) and wet (1) using a threshold of 0.5.

Figure 9. Comprehensive evaluation of model performance in predicting the Probability of Zero (P₀) from a test sample of radar data. The evaluation was conducted across five independently trained model runs for each of the three conditioning settings: Binary, Intensity, and Dry Constraint. (a) Convolutional encoder–decoder model. (b) WGAN model.

Figure 10. Lagged spatial autocorrelation of binary dry/wet precipitation fields for increasing spatial lags (1–6 pixels) in the horizontal (top row) and vertical (bottom row) directions. Results are shown for the radar test dataset under the three conditioning strategies (binary input, intensity input, and constrained). Boxplots summarize the distribution of lagged spatial autocorrelation values across five independent training runs of the WGAN with different random initializations (teal), compared against the ground truth fields (purple). Model outputs were converted to binary dry/wet maps using a probability threshold of 0.5 prior to analysis.

Figure 11. Improved dry/wet prediction and spatial structure following correction. (a) Applying a predicted dry/wet mask to WGAN-generated precipitation fields substantially reduces bias and RMSE in dry-pixel prediction (P₀). (b) Corrected outputs exhibit lagged spatial autocorrelations that more closely track the ground truth in both horizontal and vertical directions, indicating improved spatial coherence.

Figure 12. Enhanced realism in corrected precipitation fields. Corrected images, generated by masking WGAN-predicted intensities using the dry/wet classification, show improved alignment with the ground truth, eliminating false wet areas and preserving sharper storm boundaries. White areas represent dry regions, whereas negative predictions from the WGAN are shown in brown.
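
The masking step described in this caption can be illustrated as follows. This is a minimal sketch under stated assumptions: `apply_dry_wet_mask` is a hypothetical name, the 0.5 threshold is taken from the other captions, and the small arrays are invented values rather than model output.

```python
import numpy as np

def apply_dry_wet_mask(intensity, wet_prob, threshold=0.5):
    """Keep predicted intensity where the classifier says wet, set it to zero elsewhere."""
    wet = np.asarray(wet_prob) >= threshold  # assumption: ties count as wet
    return np.where(wet, np.asarray(intensity, dtype=float), 0.0)

# Invented 2 x 2 example: drizzle and a small negative value in classified-dry pixels.
intensity = np.array([[0.20, -0.10],
                      [3.00,  0.05]])
wet_prob = np.array([[0.9, 0.2],
                     [0.8, 0.1]])

corrected = apply_dry_wet_mask(intensity, wet_prob)
p0_before = 100.0 * np.mean(intensity == 0)   # no exact zeros before masking
p0_after = 100.0 * np.mean(corrected == 0)    # half of the pixels are dry afterwards
```

In this toy case the masked field has no drizzle or negative values in the pixels classified as dry, while the intensities in wet pixels are left unchanged.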

Supplementary material: File

Singh et al. supplementary material
File 2.7 MB

Author comment: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR1

Comments

May 19, 2025

Dear Editor,

I am pleased to submit our manuscript entitled “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” for consideration in Environmental Data Science.

In this study, we address a critical limitation in data-driven precipitation downscaling: the systematic overestimation of light precipitation, or “drizzle bias,” which undermines the accurate delineation of dry zones in high-resolution climate products. We develop and evaluate U-Net and Wasserstein GAN models trained on both synthetic and radar-based precipitation datasets, employing multiple conditioning strategies—including intensity fields and dry-region constraints—to predict high-resolution dry/wet classifications. Our results demonstrate that physically informed constraints significantly enhance the spatial structure, sharpness, and reliability of dry/wet delineation, with potential utility as a correction layer for statistical and deep learning downscaling frameworks.

We believe this contribution is well aligned with the scope of Environmental Data Science, particularly its focus on machine learning applications in environmental modeling, and the need for more interpretable and physically consistent data-driven methods in hydrology and climate science.

This manuscript has not been submitted elsewhere and is original work by the authors. We respectfully request that the paper be considered for peer review, and we welcome the opportunity to contribute to your journal.

Thank you for your time and consideration.

Sincerely,

Dr. Shivam Singh

(On behalf of all co-authors)

Postdoctoral Research Associate

University of Virginia

Email: wpa8me@virginia.edu

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR2

Conflict of interest statement

No

Comments

This manuscript presents a comparative study of a deep learning (DL) model for downscaling precipitation data. It introduces a novel approach for high-resolution dry/wet classification and employs three distinct training strategies to effectively capture spatially and physically consistent dry/wet patterns. The authors conclude that the WGAN method, when combined with physically informed constraints, outperforms others in capturing these patterns.

The study is well structured and merits publication; however, some improvements in language and clarity are required.

The authors are advised to improve the language.

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

A Review of “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks”

(EDS-2025-0023)

Singh et al. (2025) propose a deep-learning framework for classifying dry versus wet conditions in downscaled precipitation fields. After outlining limitations in current downscaling approaches—particularly their tendency to produce unrealistic drizzle—the authors introduce a convolutional encoder–decoder architecture, used both as a standalone classifier and as the generator of a conditional WGAN.

They explore several conditioning strategies: (i) binary dry/wet LR fields, (ii) LR precipitation intensity, and (iii) LR precipitation intensity with an additional hard constraint enforcing that LR dry regions remain dry in the HR prediction.
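
The hard constraint in strategy (iii) can be pictured as follows. The sketch assumes the constraint is applied as a post-hoc mask on the predicted wet probabilities and that each LR pixel maps to a 10 × 10 block of HR pixels (6 × 6 to 60 × 60); the manuscript may enforce it differently, for example inside the network or through the loss. The name `apply_dry_constraint` is hypothetical.

```python
import numpy as np

def apply_dry_constraint(hr_wet_prob, lr_precip, factor=10):
    """Force HR pixels to dry wherever the co-located LR pixel is dry."""
    lr_wet = (np.asarray(lr_precip) > 0).astype(float)
    # Expand each LR pixel into a factor x factor block of the HR grid.
    hr_allowed = np.kron(lr_wet, np.ones((factor, factor)))
    return np.asarray(hr_wet_prob, dtype=float) * hr_allowed

# Toy case: only the top-left LR pixel is wet.
lr = np.zeros((6, 6))
lr[0, 0] = 1.2                       # illustrative intensity value
hr_prob = np.ones((60, 60))          # a classifier that predicts wet everywhere

constrained = apply_dry_constraint(hr_prob, lr)
```

Only the 100 HR pixels under the single wet LR pixel can remain wet, which also shows why such a constraint can be restrictive when the LR field is itself an aggregate.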

The models are trained and evaluated using both synthetic precipitation fields generated with the Complete Stochastic Modeling System (CoSMoS) and hourly accumulated precipitation fields from the NCEP Multi-Radar/Multi-Sensor (MRMS) system. Model performance is assessed through the probability of zero precipitation and spatial autocorrelation. Finally, the authors apply the dry/wet mask produced by their classifier to post-process high-resolution precipitation fields generated by an independent WGAN trained on precipitation intensity only.

Overall, the manuscript is well structured, and the research problem is relevant and potentially impactful for several hydro-meteorological applications. The idea of explicitly correcting the dry/wet classification following intensity downscaling is conceptually sound and represents a valuable direction for improving physical realism.

However, I have several concerns regarding methodological choices, including aspects of data preprocessing, aggregation procedures, architectural terminology, and the interpretation of GAN training behaviour. In addition, multiple sections lack clarity or would benefit from a more rigorous formulation. Several parts of the manuscript could also be reorganised, consolidated, or streamlined to improve overall readability.

For these reasons, I recommend major revisions before the manuscript can be reconsidered for publication.

Below I provide a list of specific comments aimed at improving the clarity, methodological rigour, and scientific quality of the manuscript.

L60–61

The expression “at finer spatial and temporal scales” is vague because the reference baseline is not specified. Please consider clarifying/rephrasing.

L91–92

“GANs excel in preserving storm structures and spatial heterogeneity (…) over traditional regression-based models.” GAN advantages have been demonstrated not only relative to regression-based models, but also compared to more sophisticated downscaling methods. You may want to broaden this statement.

L128–129

“… the binary nature of precipitation.” It seems you are referring to the binary nature of precipitation occurrence, not precipitation itself. Please clarify.

L129–130

The expression “traditional post-processing ‘unlearning’ or bias correction steps” is unclear, particularly concerning the meaning of ‘unlearning’ in this context; please clarify.

L131–132

“classification-augmented GAN framework” - even if understandable, the term augmented may be confusing. Consider rephrasing.

L134–135

Using regime to indicate dry/wet states is not ideal, as “precipitation regime” has a specific meaning in climate science. I recommend replacing it with “dry/wet state” or similar wording.

L146–148

You refer to synthetic datasets of size 60×60. This describes only the domain extent, not the spatial resolution or the physical scale of the fields. Because the study compares performance across synthetic and MRMS datasets, it should be clarified whether these datasets are spatially consistent not only in terms of extent, but also in terms of physical scale. GAN-based downscaling models may not generalize well across datasets with substantially different spatial structures.

Please clarify whether the synthetic CoSMoS fields were generated to mimic the spatial structure of MRMS fields, and whether the spatial consistency between the two datasets has been assessed (e.g., by comparing their spatial power spectra or other scale-dependent metrics).

L159–160

The procedure used to compute the LR counterpart of the synthetic dataset is not described in the previous section. Please specify.

L161–162

The MRMS dataset covers 600×600 km at 1 km resolution. From your description, the HR and LR targets seem to be 60×60 at 10 km resolution and 6×6 at 100 km resolution. If this interpretation is correct, please state it explicitly. Additionally, “aggregated (re-gridded)” mixes two distinct concepts; please specify the exact procedure applied: interpolation, aggregation, conservative remapping, or another method.
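
To make the distinction concrete, plain block-mean aggregation (one of the candidate procedures named above) can be sketched as below. The choice of block averaging and the grid sizes 600 → 60 → 6 follow the reviewer's reading of the text and are assumptions; the manuscript's actual procedure is exactly what this comment asks the authors to specify. `block_mean` is a hypothetical helper and the random field is a stand-in for a radar image.

```python
import numpy as np

def block_mean(field, factor):
    """Aggregate a 2-D field by averaging non-overlapping factor x factor blocks."""
    f = np.asarray(field, dtype=float)
    h, w = f.shape
    if h % factor or w % factor:
        raise ValueError("field dimensions must be divisible by factor")
    return f.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
fine = rng.random((600, 600))    # stand-in for a 600 x 600 field at 1 km
hr = block_mean(fine, 10)        # 60 x 60 grid, 10 km pixels
lr = block_mean(hr, 10)          # 6 x 6 grid, 100 km pixels
```

Block averaging preserves the domain mean, which interpolation-based re-gridding does not guarantee; that difference is one reason the two terms should not be used interchangeably.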

L163–164

You refer to “evaluation data”, but based on later sections the radar dataset is actually used for training. Evaluation terminology may therefore be misleading. Perhaps replace with “used to assess performance” or similar wording.

L177–179

The first sentence repeats content from Lines 161–162. Additionally, the use of “evaluation dataset” can again be misleading. Please also clarify how dry days were filtered: did you remove only cases with zero precipitation across the entire domain, or did you apply a threshold to define dry conditions?

L186

Why was a subset of the radar images used after filtering?

L196

“… three major components.” Consider using an alternative expression such as “the methodology focuses on three main elements”.

Section 3.1

You refer to your architecture as a U-Net. However, the model lacks skip connections and a symmetric encoder–decoder structure (cf. Ronneberger et al. 2015, https://doi.org/10.1007/978-3-319-24574-4_28). Unless there is a specific justification, a more accurate description would be “convolutional encoder–decoder network”. Please revise or explain your choice of terminology.

L209

“reduces spatial dimension” — more accurately, this reduces resolution, not spatial extent.

L213 and elsewhere

The manuscript states that “the CNN upscales,” but the network actually performs downscaling (predicting high-resolution fields from low-resolution inputs). In atmospheric sciences, upscaling refers to the opposite process. Please check and correct the terminology.

L231–233

This appears to be the first explicit mention that the GAN is conditional. It would help to introduce the concept more smoothly, explaining that conditioning is used to encourage generated fields to be consistent with the LR input. Moreover, Fig. 1 suggests that the LR field is concatenated with intermediate feature maps rather than being provided at the network input; this technical detail should be explicitly described in the text. Finally, note that these intermediate feature maps may not preserve strict physical spatial alignment with the HR or LR fields. While this does not necessarily invalidate the approach, you could provide a brief comment or justification regarding this choice.

L241–242

“generator in case of WGAN” — this is already clear. The clarification is unnecessary.

L253–264

This description appears more appropriate in Section 3.1.

L255

Regarding the noise input, “conditioning” may not be the most appropriate term.

L256

What is meant by “spatially correlated noise”? Please clarify.

L263–264, 352–353

What is the rationale behind varying the random seed? Since the seed affects only the noise generation, one could argue for using a fixed seed for train/validation/test. Please justify the chosen approach.

L266–267

If the encoder/decoder suppresses noise effects, what is the motivation for including this noise?

L268–272

The manuscript states that variability in the generated outputs is “encouraged by the critic,” suggesting that the critic motivates the generator to utilize the noise input. While it is true that the training dynamics can influence how the generator exploits noise, it is more precise to clarify that the fundamental source of variability is the noise vector itself. I recommend revising the text to accurately reflect this concept.

Section 3.3

This section could benefit from reworking. For example:

• Standard GANs do not use the Wasserstein loss (L296).

• With conditional inputs, the equations require adaptation.

Consider either simplifying the mathematical exposition or making it rigorously consistent with your model.
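
For reference, a commonly used form of the conditional Wasserstein objective is given below. It is a generic formulation, not necessarily the equation used in the manuscript; the symbols $x$ (low-resolution input), $y$ (high-resolution target), $z$ (noise), $G$ (generator), and $D$ (critic, restricted to the set $\mathcal{D}$ of 1-Lipschitz functions) are introduced here for illustration.

```latex
\min_{G} \; \max_{D \in \mathcal{D}} \;
  \mathbb{E}_{(x, y)}\bigl[ D(y, x) \bigr]
  - \mathbb{E}_{x, z}\bigl[ D\bigl(G(z, x), x\bigr) \bigr]
```

Both the critic and the generator receive $x$, which is the adaptation that conditional inputs require relative to the unconditional case.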

L319

“Parameters” and “iterative” - please use precise terminology and clarify what aspect of hyperparameter determination was iterative.

L325

“The critic was trained for more steps (three) than the generator.” Please clarify: do you mean one generator update every three critic updates?
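
The reading suggested in this comment, one generator update after every three critic updates, corresponds to the schedule sketched below. The function `update_schedule` is hypothetical and only lists the order of updates; it does not reproduce the authors' training loop.

```python
def update_schedule(n_generator_steps, n_critic=3):
    """Order of updates when the critic is updated n_critic times per generator update."""
    steps = []
    for _ in range(n_generator_steps):
        steps.extend(["critic"] * n_critic)  # critic is refined first
        steps.append("generator")            # then one generator update
    return steps

schedule = update_schedule(2)
```

Here `schedule` contains three critic updates followed by one generator update, repeated twice.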

L326–329

I recommend consolidating the description of encoder/decoder CNN training separately from WGAN training.

L343–344

“Precipitation gradients” is unclear. Please clarify what gradients you are referring to.

L349–350

The meaning of the hard dry constraint could be clarified. Are HR pixels corresponding to dry LR pixels forced to be dry? Is this not overly restrictive?

L353–355

Although robustness to noise is a valid point, presenting it as “uncertainty quantification” is not accurate, since the study does not perform formal uncertainty analysis. I recommend removing this reference.

L359–361

Consider consolidating this text with the earlier paragraphs at Lines 330–351.

L380

“bias” — it appears you refer to the mean bias. Please clarify.

Figure 2b

The training curves raise concerns. In particular, the critic loss appears nearly flat around zero, while the generator loss oscillates strongly with large magnitude. This suggests that the critic may not be providing informative gradients to the generator, potentially indicating unstable training or mode collapse. The authors should comment on these observations and clarify whether and how they monitored the critic loss throughout training.

L519

“Pearson autocorrelation” — earlier you introduced spatial autocorrelation. Please adopt consistent terminology.

L560–562

The meaning of “input magnitude ~1400” is unclear. Are synthetic fields dimensionless?

More importantly, were the datasets normalized before being fed to the DL models, as is standard practice?

L564–566

This observation could also apply to the other experiments (see comment on Fig. 2b).

Recommendation: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR4

Comments

Please be sure to respond to all of the reviewers' comments and concerns in your response.

Decision: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R0/PR5

Comments

No accompanying comment.

Author comment: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR6

Comments

Jan 21, 2026

Dear Editor,

We are pleased to resubmit our revised manuscript entitled “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” for reconsideration in Environmental Data Science.

We sincerely thank you and the reviewers for the constructive feedback, which helped us improve the clarity and presentation of the manuscript. In this revised submission, we have carefully addressed all reviewer comments and provide a detailed point-by-point response in the accompanying response document. The revised manuscript has been substantially improved in terms of language clarity, methodological description, terminology consistency, and overall organization.

In this study, we address a critical limitation in data-driven precipitation downscaling: the systematic overestimation of light precipitation, or “drizzle bias,” which undermines the accurate delineation of dry regions in high-resolution climate products. We develop and evaluate a convolutional encoder–decoder model and a conditional Wasserstein GAN framework trained on both synthetic and radar-based precipitation datasets, using multiple conditioning strategies, including intensity-based inputs and physically informed dry-region constraints, to improve high-resolution dry/wet classification. Our results demonstrate that incorporating physically informed constraints enhances the spatial structure, sharpness, and reliability of dry/wet delineation, with potential utility as a correction layer for statistical and deep-learning downscaling frameworks.

We believe this contribution aligns well with the scope of Environmental Data Science, particularly its focus on machine learning methods for environmental modeling and the development of more physically consistent and interpretable data-driven approaches in hydrology and climate science.

This manuscript has not been submitted elsewhere and represents original work by the authors. We respectfully request that the revised manuscript be reconsidered for publication in Environmental Data Science. Thank you for your time and consideration.

Sincerely,

Dr. Shivam Singh

(On behalf of all co-authors)

Postdoctoral Research Associate

University of Virginia

Email: wpa8me@virginia.edu

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR7

Conflict of interest statement

No

Comments

The manuscript is ready for publication.

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR8

Conflict of interest statement

Reviewer declares none.

Comments

A Review of “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” (EDS-2025-0023.R1)

I thank the authors for considering the comments I raised during the first round of review.

The quality of the manuscript has clearly improved, and most of the technical and scientific aspects have been clarified.

The current version of the paper is nearly acceptable.

I have some remaining concerns regarding the form of the manuscript, as well as some imprecisions I noticed while reading the current draft.

Moreover, some passages could benefit from further refinement and clarification.

I would therefore encourage the authors to carefully review the manuscript to improve clarity.

These observations do not constitute grounds for delaying acceptance.

For this reason, I recommend conditional acceptance with minor revisions, provided that the authors further refine the manuscript, based on, but not limited to, the comments below.

Specific Comments

L1-2: The title emphasizes the use of GANs, while the manuscript presents a comparative assessment of dry/wet classifiers based on both CNN and GAN architectures, under different input configurations. The authors may consider broadening the title to reflect both CNN and GAN, or reducing the emphasis on GANs.

L21-22 and elsewhere (resolution vs extent, storm input): I suggest removing brackets around “6x6” and “60x60” to avoid implying that these are resolutions rather than grid sizes. Regarding “storm input”, consider using “precipitation input” to reflect the broader types of events included in the radar dataset.

L31-32 and throughout (model naming): The GAN and CNN architectures could be referred to consistently as “the WGAN” and “the convolutional encoder–decoder”. Using the plural could imply that different conditioning or initialization strategies represent separate models, which is not the case in my opinion.

L37–39 and elsewhere (physically informed constraints): I suggest clarifying that your findings specifically highlight the benefit of the proposed strategy for this bias-correction task, rather than making a broad claim about physically informed constraints (i.e., those enforcing physical laws).

L61: “On the order of” → this is a range (0.25°-1°), rather than an order of magnitude.

L96: Remove “regression-based” unless justified, to avoid ambiguity.

L119-120: “Precipitation frequency” → Consider “precipitation event occurrence frequency”.

L129: “Extremes i.e. drought” → Drought represents only one extreme; please clarify.

L131: The encoder-decoder architecture is not included.

L155: Avoid phrases such as “fine spatial resolution of 60x60”; consider “synthetically generated storm field at a fine spatial resolution, with size 60x60”.

L167: “Access” → Did you mean “assess”?

L172-182: “The MRMS dataset spans a three-month winter period (November–January 2023)” - It seems that you extracted a subset of MRMS for your analysis. Please clarify. Also, consider consolidating the description of MRMS extraction.

L183: “To ensure consistency” → Consider “to assure consistent I/O configuration”.

L188-190: Please introduce the thresholding procedure in Sec. 2.1 before referencing it in later sections.

L215-219: Consider removing redundant statements about study goals.

L221: Avoid “for super-resolution”; it is not fully descriptive of the downscaling task.

L227: “Binary” unnecessary in this context.

L234: Please clarify “convolutional block” vs single convolutional layer.

L266-270: Consider consolidating with L274-275.

L277-278: “Design choice intentional and commonly adopted” → Add a reference or just drop the “commonly adopted” part.

L292: Move detailed storm counts to Sec. 2.1.

L318-327: The current description of the conditioning strategies may suggest that their impact is already established. I recommend clearly separating the description of the strategies from any claims about their effects. Any expected impact can be mentioned briefly as such, without implying that it has been assessed at this stage.

L336: Clarify “predicted probability” vs “predicted label”.

L336-337: You state before that the pixel-wise binary cross-entropy loss is optimized; avoid implying that accuracy is directly optimized.

L345-349 (Equation 2 and symbols): Please define all terms and correct any imprecisions.

L351-353: “Following established practice to maintain a well-trained critic” → Simplifying to “following established practice” would be clearer.

L359–366: Please clarify that the multiple training runs correspond to different random initializations used to assess robustness, rather than to distinct models.

The statement “Model outputs are probabilistic by design” appears to refer to the fact that the output layer provides pixel-wise wet/dry probabilities; I suggest making this explicit to avoid ambiguity and to improve technical clarity. Moreover, since this paragraph is not specific to the WGAN architecture, I recommend relocating it to the general methodological section.

L368: “Storm prediction” → Consider “dry/wet classification.”

L389: “The binary evaluation of dry and wet regions” → What you actually evaluate are the models.

L390: “Binary” unnecessary; use only if explicitly clarifying that you are predicting a binary wet/dry field.

L402-403: The repetition of the metric name and its brief definition appears redundant at this stage, as both have already been introduced and discussed.

L406–408 and throughout: Please use consistent naming for the conditioning settings (binary, intensity, and constrained input, as defined in Sec. 3.3) and avoid referring to them as different models.

L431-432: “Enforcing a fixed critic-to-generator update ratio” → Remove; it is a training setting, not a stability indicator.

L439–455 and elsewhere: I recommend avoiding repeated references to the same figure within a single paragraph.

L445: “Trends” → Consider “behaviour”.

L456: Clarify phrasing to indicate sharper, not “less smooth,” results.

L483, L552: Although the term is understandable, I recommend avoiding “cues”.

L501: As previously discussed, no formal uncertainty quantification is performed in the manuscript. I therefore suggest removing “reflecting increased uncertainty”.

L508 (and elsewhere): The term “physically consistent” is not ideal here, as the improvement on the metric does not imply enforcement of physical laws (and does not constitute a straightforward suggestion of physical consistency). I suggest replacing it with “consistent” or “accurate”.

L514: I suggest moving the initialization details (seed list) to the Methods section and clarifying—if not already done—that the reported independence refers to different initializations, not distinct models.

L584–586: Following your response to my previous comment #36, I suggest including the relevant information in the manuscript. Specifically, indicate that 1400 is the maximum value of the synthetic field, as “cumulative value” is too vague. Avoid phrasing such as “typically limited to about 5 mm h⁻¹”; stating the order of magnitude or the maximum value would be more scientifically accurate and clearer for the reader. Additionally, for improved reproducibility, consider briefly mentioning in the Methods section your choice not to perform data scaling, as discussed in your response.

L582–594: After reviewing the results you provided and your responses to comments #34 and #37 (thank you for the clarifications), I think I understand the point you are making. However, some aspects of the results remain not entirely straightforward. In particular, the statement “the critic likely provides weak or uninformative gradients, as evidenced by unstable generator losses” might be somewhat strong.

Looking at Fig. 2b, it is not immediately clear from the generator training curves under the three conditioning settings which configuration performs better. For instance, for the intensity input, the two lower and the uppermost curves appear to increase asymptotically, making it difficult for me to confidently attribute better performance to this setting relative to the binary input. A similar observation applies to Fig. S3b, where the generator curves under all conditioning settings also show asymptotic increases.

These observations do not invalidate the results, especially when considering your discussion in the response. Since you note that “loss magnitudes in WGAN-GP differ fundamentally from those in standard GANs and are not directly interpretable as indicators of convergence or training stability”, I suggest a more cautious framing of statements such as:

“Under the Binary Input setting, the critic likely provides weak or uninformative gradients, as evidenced by unstable generator losses. In contrast, when dry-region constraints are applied alongside intensity inputs, the generator becomes more stable and better aligned across seeds, suggesting that the constraint indirectly enhances adversarial balance and learning by guiding both networks toward physically plausible outputs.”

Specifically, I recommend avoiding over-interpretation, as it is not straightforward from the provided plots to determine under which conditions the generator is more stable across different conditioning strategies. Rephrasing the discussion to focus on the observations from the training loss curves, incorporating the concepts you described in response to my comment #34, and framing interpretations more narrowly, while retaining your (correct) note on WGAN-GP loss magnitudes for the different training sets, would improve scientific precision and clarity.

L610–611, 614: “Convolutional encoder-decoder models trained under three different/distinct settings” → Consider consolidating these sentences to avoid redundancy and improve clarity.

L619: “From convolutional encoder–decoder models”—did you mean from the WGAN?

L626–627: “The added gradient” → As before, I suggest framing this as an interpretation.

L638–639: Consider suppressing “physically plausible.”

L648: Clarify “negative bias” (Fig. 9a reports 0.02).

L650: “Real-valued” unnecessary.

L693: “Autocorrelation curves” → Suppress “curves”.

L707 and L765–766: Instead of “physically informed constraint/filter”, I suggest referring to the constrained input as a “consistency constraint”, since no physical law is enforced. In contrast, the intensity input is, in my view, simply an input feature of the model.

L713: I apologize for having missed this point in the previous round of review. At this stage, you introduce a separate WGAN trained to predict precipitation intensity, but no details are provided about this model. It may be useful to briefly describe its architecture and training setup, even in a very concise form, possibly in the Supplementary Material.

L717–721: The statements “these models commonly exhibit drizzle bias” and “this is a well-documented limitation” would benefit from one or more supporting literature references, possibly recalling works already cited in the Introduction.

L827: In this paragraph, other URLs are reported in full, whereas the link to the code is provided only via “here” and appears to be broken. You may consider inserting the full link in textual form.

Recommendation: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR9

Comments

Thank you for your recent revisions, which the reviewers agree significantly improved the manuscript. Please carefully consider the reviewers' latest (more minor) revisions in preparing a new version.

Decision: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R1/PR10

Comments

No accompanying comment.

Author comment: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R2/PR11

Comments

March 18, 2026

Dear Editor,

We are pleased to resubmit the revised version of our manuscript entitled “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks” for reconsideration in Environmental Data Science.

We sincerely thank you and the reviewers for the constructive and thoughtful feedback. We are encouraged that the manuscript has been substantially improved and is close to publication, subject to minor revisions. We have carefully addressed all remaining comments. In this revision, we have focused on improving clarity, refining terminology, and ensuring consistency in presentation. A detailed, point-by-point response to all reviewer comments is provided in the accompanying document.

In this study, we address a key limitation in data-driven precipitation downscaling: the misrepresentation of dry and wet regions, often associated with drizzle bias, which affects the realism of high-resolution precipitation fields. We develop and evaluate a convolutional encoder–decoder model and a conditional Wasserstein GAN framework using synthetic and radar-based datasets under multiple conditioning strategies, including intensity-based inputs and a consistency constraint to improve dry/wet classification. Our results demonstrate that incorporating this constraint improves the representation of dry regions and enhances the overall reliability of downscaled precipitation fields. We further illustrate how the predicted dry/wet masks can be used to post-process intensity-based predictions, reducing spurious light precipitation and improving agreement with observed dry/wet distributions.

We believe that this work aligns well with the scope of Environmental Data Science, particularly in advancing machine learning approaches for environmental modeling and improving the reliability and interpretability of data-driven downscaling methods.

This manuscript represents original work and has not been submitted elsewhere. We respectfully request that the revised manuscript be considered for publication in Environmental Data Science.

Thank you for your time and consideration.

Sincerely,

Dr. Shivam Singh

(On behalf of all co-authors)

Postdoctoral Research Associate

University of Virginia

Email: wpa8me@virginia.edu

Review: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R2/PR12

Conflict of interest statement

Reviewer declares none.

Comments

A Review of “Correcting Dry/Wet Classification Bias in Precipitation Downscaling via Generative Adversarial Networks”

(EDS-2025-0023.R2)

The authors have addressed the points raised in the previous review rounds, and the manuscript has significantly improved as a result. In its current form, the overall quality of the work is suitable for publication.

From my side, I consider the review process to be complete and recommend acceptance of the manuscript. Below, I provide a set of final, mostly minor suggestions, which I encourage the authors to consider in order to further improve clarity, precision, and overall consistency of the text.

L22 “binary classification fields” → “binary dry/wet classification fields”

L23–24 “convolutional encoder-decoder and conditional Wasserstein Generative Adversarial Network (WGAN)” → “a convolutional encoder-decoder and a conditional Wasserstein Generative Adversarial Network (WGAN)”

L26 “and adding physical constraints” → “with an added consistency constraint”

L26 “trained and validated” → “trained and evaluated”, or “trained, validated, and evaluated”

L30 “physical constraints” → “the consistency constraint”

L31 Suppress “generalization, and physical consistency”

L31–32 “WGAN models” → “the WGAN”

L37 Suppress “physically informed”. Consider “…the value of this bias-correction strategy…”

L38 “and” → “the”

L46–47 “convolutional encoder-decoders… WGAN” → “the convolutional encoder-decoder… the WGAN”

L47–48 “dry-region constraints” → “a dry region constraint”

L48 Suppress “physically consistent”

L88 “…rather than its intermittent…” → “…rather than accounting for its intermittent nature…”

L136 “thereby reducing false wet and false dry predictions” → not demonstrated yet; consider “aiming at reducing…”

L143 Suppress “(WGAN)”

L144 “and spatial continuity” → “and the spatial continuity of the precipitation field”

L145 Suppress “physical consistency and interpretability”; consider replacing with “accuracy”

L164 “following aggregation” → “after aggregation”

L165–166 Introduce thresholding procedure here; remove current sentence

L177–179 → Improve wording; for example: “… a subset of MRMS hourly precipitation data was extracted …, for a three-month winter period ...”

L187–189 → Simplify; for example: “To ensure a consistent I/O configuration with the synthetic experiments, the MRMS data were processed to obtain 6×6 low-resolution inputs and 60×60 high-resolution targets”

L192 “Following aggregation” → “After aggregation”

L205 “diagnostic” → “training and diagnostic value”

L213–214 suppress “and early stopping”

L219–220 “convolutional encoder-decoder and WGAN” → “a convolutional encoder-decoder and a WGAN”

L256 Please be more specific at this point. For example, “a generator, using the same convolutional encoder-decoder architecture defined above, that performs… ”

L270 “allowing the critic” → “allowing it”

L274–275 “Convolutional encoder-decoder (Generator)” → “convolutional encoder-decoder / generator”

L275 Suppressing “as components of WGAN.” would make the caption clearer, as the network in Figure a is both the standalone convolutional encoder-decoder and the WGAN’s generator.

L289–290 Suppress reference to Fig. 1

L294–296 Remove repeated description of dataset split and shuffling (already introduced)

L297 “the generator” → “the convolutional encoder-decoder / generator”

L298 “stochastic noise input” → “noise input”

L308–309 “The critic does not generate variability itself ...” → Generating variability is not the role of the critic. Consider simplifying to “The critic constrains how stochastic variability is expressed...”.

L313–314 “the model are trained and evaluated independently…” → remove this sentence and consolidate the description of transfer learning and cross-dataset generalization with L293–294.

L325–326 “five independent training runs…” → “five training runs with different random initializations”

L334 suppress “independently”

L335-336 Clarify “with the objective of classifying each HR pixel”. Consider “with the objective of producing HR dry/wet fields, starting from LR information”.

L341 “using a threshold of precipitation > 0” → Explicitly define thresholding, e.g. “1 if precipitation intensity is > 0”

L343–344 Remove “deterministic” (the model is stochastic by construction, even though training suppresses the input noise)

L358–360 Consolidate with L355

L361 Simplify to “C is the critic score”

L413: “a binary input” → “the binary input”

L417 Suppress duplicated Fig. 2a reference

L420 Replace “physically meaningful constraints” → “a consistency constraint”; remove the reference to bias correction at this stage, as it is introduced later in Sec. 6.

L428–429 “dry constraint” → “the dry constraint”

L429–430 “anchoring the adversarial learning process toward physically plausible dry/wet spatial configurations” is an interpretative sentence → rephrase cautiously

Caption Fig. 2: Clarify that the training/validation curves refer to the encoder–decoder, while for the GAN there are separate curves for the generator and the critic. Moreover, explicitly state that different lines correspond to different training initializations.

L457 “with a dry constraint” → “using the constrained input”

L460–461 “The wet/dry fields show improved sharpness and are much more realistic” remove redundancy and consolidate with the preceding sentence.

L474–475 Replace “physically consistent” → “more realistic / more accurate”. Additionally, remove the duplicated reference to Figure 4.

L476–477 Frame as interpretation

L481 Suppress “more interpretable output”

L482 Suppress “sometimes”

L502–503 “The result is a modest dry bias that enhances physical consistency” rephrase bias interpretation, e.g. “the result is a reduction in the dry bias, particularly...”

L519 “Dry Constraint” → “Constrained Input”

L532 Suppress “instability”

L535 Suppress “and improved generalization”

L536 “Constrained convolutional encoder-decoder” → “The conv. encoder-decoder with constrained input”

L546 “WGAN models” → “the WGAN model”

L550 “indicating” → split into two sentences, as the IQR, per se, is not directly related to consistency across different training initializations.

L592–593 specify a number or the exact order of magnitude (e.g., 10 mm/h)

L594: the difference in input scale alone could explain the observed fluctuation range in the loss curves. The relation between input scale and gradient smoothness is not straightforward; consider removing “produces smoother gradients during optimization”.

L611 “(Figures 7-8)” remove duplicated figure reference

L615 “models” → “model”

L632 “The added gradient” remove or reframe as interpretation

L642 Suppress “nevertheless”

L644–645 “The most accurate outputs are again achieved by the WGAN when it is constrained” → avoid repetition.

L648–649 “across all five independently trained models” → “across models trained with five different initializations”.

L653 “with a” remove repetition

L659 Suppress “with reduced outliers”

L666–667 “five independently trained WGAN” → “the WGAN trained with five different random initializations”.

L674–675 simplify sentence, remove “with greater spread relative to the intensity-based setting”

L679 Remove “but”

L687-688 “naturally decay due to fine-scale variability… artificially …” → Remove the causal attribution to fine-scale variability. Attributing the correlation decay at high spatial lags to “fine-scale variability” is not necessarily justified, as other factors may also contribute. I suggest removing this part and reporting only the observed behavior from the boxplots.

L691–692 “slightly more controlled correlation trends” → “correlations more consistent with the ground truth”

L694–695 “elevated interquartile variability, suggesting greater uncertainty and instability in capturing long-range spatial structures”. I understand that this refers to elevated IQR variability across the five different training initializations (seeds). If so, please specify this clearly. Moreover, as already discussed, I would suppress “uncertainty”.

L703 “training” → “input”

L705 “for WGAN models” → “for the WGAN”

L709–710 Clarify: “overall probability of dry condition” → “probability of zero (P₀)”

L710-711 “the observed spatial autocorrelation structure” → “the observed ground truth”

L724–726: this sentence essentially repeats content already mentioned above; consider removing it.

L733: Suppress “and enforcing spatial dryness constraint”.

L743 “awareness” → “information”

L749 “dry pixel prediction” → “probability of zero (P₀)”

L754 Suppress “stable”

L755 Suppress “patterns”

L757 Suppress “structural integrity”

L779 Remove “physically”

L780 Remove “interpretability”

L787 Remove “physically consistent”

L800 “their” → “its”

L807, L811: drop “physically informed”; as discussed there is no enforcement of physical laws in this study.

L812 Suppress “physically consistent”

L815 Remove “interpretable”

L834-835 “These data were used to evaluate model performance on real-world storm structures and dry/wet classification accuracy”. Suppress this sentence, as it is redundant with the Methods section and not appropriate for the Data Availability Section.

Recommendation: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R2/PR13

Comments

I’m pleased to accept your manuscript. Please take care to address the (many) minor typos and text suggestions from the reviewer in preparing the final version.

Decision: Correcting dry/wet classification bias in precipitation downscaling via generative adversarial networks — R2/PR14

Comments

No accompanying comment.