On the effectiveness of neural operators at zero-shot weather downscaling

Published online by Cambridge University Press:  27 March 2025

Saumya Sinha*
Affiliation:
National Renewable Energy Laboratory, Golden, CO, USA.
Brandon Benton
Affiliation:
National Renewable Energy Laboratory, Golden, CO, USA.
Patrick Emami
Affiliation:
National Renewable Energy Laboratory, Golden, CO, USA.
*
Corresponding author: Saumya Sinha; Email: saumya.sinha@nrel.gov

Abstract

Machine-learning (ML) methods have shown great potential for weather downscaling. These data-driven approaches provide a more efficient alternative for producing high-resolution weather datasets and forecasts compared to physics-based numerical simulations. Neural operators, which learn solution operators for a family of partial differential equations, have shown great success in scientific ML applications involving physics-driven datasets. Neural operators are grid-resolution-invariant and are often evaluated on higher grid resolutions than they are trained on, i.e., zero-shot super-resolution. Given their promising zero-shot super-resolution performance on dynamical systems emulation, we present a critical investigation of their zero-shot weather downscaling capabilities, which is when models are tasked with producing high-resolution outputs using higher upsampling factors than are seen during training. To this end, we create two realistic downscaling experiments with challenging upsampling factors (e.g., 8x and 15x) across data from different simulations: the European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5) and the Wind Integration National Dataset Toolkit. While neural operator-based downscaling models perform better than interpolation and a simple convolutional baseline, we show the surprising performance of an approach that combines a powerful transformer-based model with parameter-free interpolation at zero-shot weather downscaling. We find that this Swin-Transformer-based approach mostly outperforms models with neural operator layers in terms of average error metrics, whereas an Enhanced Super-Resolution Generative Adversarial Network-based approach is better than most models in terms of capturing the physics of the ground truth data. We suggest their use in future work as strong baselines.

Information

Type
Application Paper
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© Alliance for Sustainable Energy, LLC, 2025. Published by Cambridge University Press
Figure 1. Overview of the neural operator and non-neural-operator zero-shot weather downscaling approaches. We show 5x to 15x zero-shot downscaling as an example. (a,b) For neural operators, the interpolation scale factor is the same as the upsampling factor, e.g., the bicubic interpolation layer upsamples to 5x during training and 15x during evaluation. (c) For regular neural networks (e.g., SwinIR), the model is trained to output at 5x (e.g., using a learnable upsampler such as sub-pixel convolution). At test time, the model generates a 5x output which is then interpolated 3x more to produce the final 15x HR output.
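The two inference strategies in Figure 1 can be illustrated with a minimal sketch. This is not the paper's implementation: the `model` callables below are hypothetical stand-ins for trained networks, and bicubic interpolation is approximated with `scipy.ndimage.zoom` (order 3).

```python
import numpy as np
from scipy.ndimage import zoom

def neural_operator_zero_shot(lr_field, model, factor):
    """Approach (a,b): the interpolation layer in the neural-operator
    pipeline upsamples by the evaluation factor directly (5x at train
    time, 15x at test time); `model` is grid-resolution-invariant."""
    upsampled = zoom(lr_field, factor, order=3)  # bicubic-like upsampling
    return model(upsampled)

def fixed_scale_zero_shot(lr_field, model_5x, extra_factor):
    """Approach (c): a network with a learnable 5x upsampler (e.g.,
    sub-pixel convolution) outputs at 5x; the result is interpolated
    a further 3x to reach the 15x target resolution."""
    hr_5x = model_5x(lr_field)
    return zoom(hr_5x, extra_factor, order=3)

# Toy stand-ins for the trained models (no learning involved).
identity = lambda x: x
upsample_5x = lambda x: zoom(x, 5, order=3)

lr = np.random.rand(16, 16)                              # LR input grid
out_a = neural_operator_zero_shot(lr, identity, 15)      # 240 x 240
out_c = fixed_scale_zero_shot(lr, upsample_5x, 3)        # 240 x 240
assert out_a.shape == out_c.shape == (16 * 15, 16 * 15)
```

Both paths reach the same 15x output grid; the difference is whether the parameter-free interpolation happens before the learned mapping (neural operators) or after it (fixed-scale networks such as SwinIR).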

Figure 2. The two regions used for the ERA5 to WTK downscaling experiments. The 600 × 1600 (800 × 800) region is shown in black (blue). We use NREL’s rex (Rossol and Buster, 2021) tools to rasterize the WTK dataset.

Table 1. ERA5 to ERA5 wind speed downscaling results. MSE has units $ {\left(m/s\right)}^2 $, MAE $ m/s $ and IN $ m/s $. We bold the best-performing model among all the models and underline the best-performing neural operator model. Results for the other channels are in Appendix C.

Figure 3. Figures (a) and (b) show kinetic energy spectrum plots for ERA5 standard downscaling and zero-shot downscaling, respectively. Kinetic energy is normalized and wavenumber is measured relative to the domain size. A zoomed-in panel (right) beside each main plot highlights the key region of interest. ESRGAN appears to be the best at capturing the ground truth energy spectra across all wavenumbers for standard downscaling (a) and at medium-to-high wavenumbers for zero-shot downscaling (b). DAFNO performs best at the highest wavenumbers in (b).
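A radially averaged kinetic energy spectrum of the kind plotted in Figures 3 and 5 can be sketched as follows. This is an assumed implementation, not the authors' code: it takes the u and v velocity components on a square grid, bins spectral energy into integer wavenumber shells (relative to the domain size), and normalizes the spectrum to sum to one.

```python
import numpy as np

def kinetic_energy_spectrum(u, v):
    """Radially averaged kinetic energy spectrum of a 2D velocity field.
    Wavenumber is relative to the domain size; the spectrum is normalized,
    matching the plotting convention described in the caption."""
    n = u.shape[0]  # assumes a square grid
    # 2D spectral kinetic energy density: 0.5 * (|u_hat|^2 + |v_hat|^2)
    e2d = 0.5 * (np.abs(np.fft.fft2(u))**2 + np.abs(np.fft.fft2(v))**2)
    kx = np.fft.fftfreq(n) * n          # integer wavenumbers along x
    ky = np.fft.fftfreq(n) * n          # integer wavenumbers along y
    k = np.sqrt(kx[:, None]**2 + ky[None, :]**2)
    kbins = np.arange(1, n // 2 + 1)
    # Sum energy in annular shells |k| in [kb - 0.5, kb + 0.5)
    spectrum = np.array([e2d[(k >= kb - 0.5) & (k < kb + 0.5)].sum()
                         for kb in kbins])
    return kbins, spectrum / spectrum.sum()

u = np.random.randn(64, 64)             # placeholder wind components
v = np.random.randn(64, 64)
k, ek = kinetic_energy_spectrum(u, v)
assert len(k) == 32 and np.isclose(ek.sum(), 1.0)
```

Comparing `ek` for a model's output against `ek` for the HR ground truth at high wavenumbers is what distinguishes models that recover fine-scale structure (e.g., ESRGAN) from those that smooth it out.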

Figure 4. ERA5 wind speed visualizations in $ m/s $ generated from the zero-shot downscaling. We zoom in on a small region for better comparison. SwinIR captures finer details of the HR image than the neural operator models, especially in the zoomed-in region. It also performs better over regions with complex terrain (e.g., the mountain ranges in North and South America).

Table 2. ERA5 to WTK wind speed downscaling results. MSE has units $ {\left(m/s\right)}^2 $, MAE $ m/s $ and IN $ m/s $. We aggregate the error metrics over u and v wind velocity channels. We bold the best-performing model among all the models and underline the best-performing neural operator model

Figure 5. Figures (a) and (b) show kinetic energy spectrum plots for the ERA5 to WTK standard downscaling and zero-shot downscaling, respectively. Kinetic energy is normalized and wavenumber is measured relative to the domain size. A zoomed-in panel (right) beside each main plot highlights the key region of interest. ESRGAN matches the HR spectrum at all wavenumbers for the standard downscaling (a). ESRGAN also comes closest to matching the ground truth energy spectrum for the zero-shot downscaling (b), but the gap increases at higher wavenumbers. SwinIR and EDSR rank second to ESRGAN in both (a) and (b).

Figure 6. WTK wind speed visualization in $ m/s $ generated from the zero-shot downscaling (the figure shows results on one of the two regions). We observe ESRGAN’s downscaled outputs (followed by SwinIR’s) to be sharper, with better details, than those of the neural operator models.

Table 3. Ablation study on the effect of the local layers of Liu-Schiaffini et al. (2024) on FNO, DFNO, and DUNO. We show the MSE for all the experiment setups. The local layers mostly benefit FNOs on the downscaling tasks but do not improve the performance of the DFNO or DUNO models.

Table 4. Model parameters and training wall-clock time recorded on a single NVIDIA H100 GPU for all the baseline and neural-operator-based models used in the ERA5 to WTK downscaling setup. SwinIR achieves superior average error metrics (e.g., MSE), as shown in Table 2, while having only a marginally higher parameter count than the downscaling neural-operator-based models (except DAFNO). ESRGAN is the best model at matching the ground truth energy spectrum (Figures 5a, 5b) and is second to DAFNO in parameter count. Notably, ESRGAN takes the longest to train, followed by SwinIR; both take longer than all the neural-operator-based models.

Figure 7. Plot comparing the mean-squared error (MSE) against the number of modes (n_modes) in the Fourier Neural Operator (FNO) model for ERA5 to WTK downscaling.

Figure 8. Energy spectrum plots comparing the effect of different numbers of modes ({16, 32, 64, 128, 160}) in the FNO model for ERA5 to WTK downscaling.

Figure 9. Plot comparing the MSE against the number of modes (n_modes) in the FNO model for ERA5 to ERA5 downscaling. In this case (as discussed in A.4), the standard downscaling FNO model performs a sweep over {8, 16, 32, 64, 128} number of modes, and the zero-shot downscaling model sweeps over {8, 16, 32, 64} number of modes.

Table 5. Ablation study on the effect of replacing convolutional RRDB blocks with residual Swin-Transformer blocks (RSTB) in the DXNO models. The table shows the MSE for the ERA5 to WTK downscaling experiments using the DFNO and DUNO models. DXNO with RRDB blocks shows better performance.

Table 6. ERA5 to ERA5 temperature downscaling results. MSE has units $ {K}^2 $, MAE $ K $ and IN $ K $. We bold the best-performing model among all the models and underline the best-performing neural operator model

Table 7. ERA5 to ERA5 total column water vapor downscaling results. MSE has units $ {\left( kg/{m}^2\right)}^2 $, MAE $ kg/{m}^2 $ and IN $ kg/{m}^2 $. We bold the best-performing model among all the models and underline the best-performing neural operator model

Author comment: On the effectiveness of neural operators at zero-shot weather downscaling — R0/PR1

Comments

Dear Editor,

We would like to submit our work, “On the Effectiveness of Neural Operators at Zero-Shot Weather Downscaling,” for consideration in the Environmental Data Science journal’s special issue on “Tackling Climate Change with Machine Learning.” We believe our work on weather downscaling will be a great fit for this journal and the special issue. We confirm that this work has not been published elsewhere and that all the authors have approved the final submitted manuscript. We also want to inform you that we have added a link to part of the dataset we work with in the manuscript, and we are working on releasing the other part soon.

Thanks,

Saumya

Review: On the effectiveness of neural operators at zero-shot weather downscaling — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

The authors extensively investigated the performance of neural operators in zero-shot super-resolution for atmospheric downscaling – an essential and challenging task. They tested the performance of various neural operators (i.e., FNO, DFNO, UNO, etc.) and a couple of baseline models (e.g., bicubic, SRCNN, SwinIR, etc.). The analyses were conducted mainly on downscaling the wind speed field under two scenarios: (1) using ERA5 only and (2) downscaling ERA5 to the WTK dataset. The results show that the SwinIR model outperformed all neural operators, which consistently showed worse MSE and kinetic energy spectra. It is an interesting work, as I have often wondered about, and been concerned about, the overall capability of neural operators (e.g., FNO) in zero-shot super-resolution. The manuscript is easy to read. Therefore, I suggest a moderate revision with one major comment followed by minor suggestions.

My main concern is determining the optimal hyperparameters/architecture of the neural operators, particularly the cutoff frequency of the various FNOs, which is critical to the training performance. This is not only due to the need for hyperparameter tuning in every deep learning task but also because, in this specific problem, the adopted cutoff frequency could significantly affect the extent to which the FNOs capture the high-frequency/wavenumber dynamics. For example, the worse performance of FNOs could result from a low frequency cutoff. However, the authors did not disclose the hyperparameter tuning process despite separating the dataset into training/validation/test parts. To that end, I would suggest the following:

- Describe the hyperparameter tuning for each model (and its computational cost)

- Describe the optimal hyperparameters used by each model and the corresponding overall number of parameters/weights

- Evaluate the impact of frequency cutoff on the training. For instance, one can plot the model performance regarding MSE and kinetic energy spectrum versus the frequency cutoff.

Minor comments

Lines 33-34, Pg 2: Please describe the different physical processes captured by coarse and fine resolutions in the weather model (e.g., ERA5).

Lines 6-8, Pg 6: As ERA5 and WTK are generated by two different models, a worse performance is expected when using a model trained on ERA5 to predict the behavior of WTK. Please describe the need and what it means for other model simulations.

Lines 31-40, Pg 6: Do UNO and CNO keep the resolution-invariant property since both adopt UNet which fixes the input/grid size? If so, please briefly describe how.

Section 4: Please describe the training details, including the optimizer, scheduler, training epochs, and the hyperparameter tuning step (see the main comment). Please also comment on the training time for each model.

Section 4.2: Why were 2-m temperature and the total column water vapor not evaluated in ERA5 to WTK downscaling?

Section 5: If SwinIR performed better than NOs in the downscaling task, did it also outperform NOs in the test dataset? If not, one might conclude that SwinIR is outstanding in downscaling. Otherwise, the better performance of SwinIR is likely attributed to the fact that it has been better trained, which led back to my main comment on how the hyperparameters were tuned for NOs.

Figures 3 and 5: It’s hard to tell the lines apart. Please plot the difference in power spectra between HR and each model, and please use more distinguishable colors.

Section 5.3: Incorporating local layers did not help improve the NO performance, which is ‘contradictory’ to what they are supposed to do. Please comment on that.

Tables 4 and 5: The two tables are listed without discussion or mention in the main manuscript. Please briefly discuss them in the main text.

Review: On the effectiveness of neural operators at zero-shot weather downscaling — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

Page 5 Line 41: \theta should be defined with two components, i.e., \theta is the union of \theta_1 for the neural network and \theta_2 for the neural operator.

Page 7 Line 27: ‘This ERA5 dataset consists of image snapshots at 24-hour intervals over an eight year period.’ Isn’t ERA5 hourly data?

Page 7 Line 28: Why is the dataset split not based entirely on time order? E.g., 2007, 2008, 2010, and 2011 for training, then 2012 and 2013 for validation, and 2014 and 2015 for testing.

Where is the year 2009?

Page 7 Line 30: Please elaborate on ‘we extract eight patches of size 64x64 (for the zero-shot downscaling) from each image to obtain HR images for training’.

What is the input LR size and what is the HR size? It would be great if you could provide a table detailing the input/output sizes.

Section 4.1

Provide more details on dataset size.

Is the LR data generated with bicubic interpolation in both the standard and zero-shot downscaling cases?

Section 4.2

It should be emphasized that a single model is trained for both wind regions.

Provide more details on dataset size.

Is there no data for year 2008 and 2009?

(‘The year 2007 is split 80/20 between the training and validation. We keep the year 2010 for testing’)

Section 4

Given the evaluation results in Section 5, it appears that SwinIR can be a more effective architecture for spatial feature extraction than CNNs.

As a result, it is possible that the performance bottleneck for DXNO is the RRDB, which is also CNN-based. Thus, I think it is important to have an ablation study replacing the RRDB within DXNO with a transformer-based architecture.

Recommendation: On the effectiveness of neural operators at zero-shot weather downscaling — R0/PR4

Comments

I have received two reviewer reports for “On the Effectiveness of Neural Operators at Zero-Shot Weather Downscaling”. From these reports, major revisions are required before publication of this manuscript. I am therefore returning the manuscript to you so you may make the changes suggested by the reviewers.

Decision: On the effectiveness of neural operators at zero-shot weather downscaling — R0/PR5

Comments

No accompanying comment.

Author comment: On the effectiveness of neural operators at zero-shot weather downscaling — R1/PR6

Comments

The authors thank the reviewers for their valuable feedback and thoughtful comments. We appreciate the reviewers’ time and effort in helping us improve our work, and we hope the revised manuscript addresses all the concerns.

Review: On the effectiveness of neural operators at zero-shot weather downscaling — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

This revision has addressed my concerns effectively. I would recommend accepting this manuscript.

Review: On the effectiveness of neural operators at zero-shot weather downscaling — R1/PR8

Conflict of interest statement

Reviewer declares none.

Comments

I would like to thank the authors for addressing my comments. I particularly like the investigation of the cutoff mode’s impact on the NOs’ performance. That the model with the lowest mode count (i.e., 16) was found to have the best performance suggests that the FNO is not suitable for zero-shot atmospheric downscaling, where finer features are important yet irregular. I suggest the publication of the manuscript.

Recommendation: On the effectiveness of neural operators at zero-shot weather downscaling — R1/PR9

Comments

No accompanying comment.

Decision: On the effectiveness of neural operators at zero-shot weather downscaling — R1/PR10

Comments

No accompanying comment.