
WindDragon: automated deep learning for regional wind power forecasting

Published online by Cambridge University Press:  19 March 2025

Julie Keisler*
Affiliation:
EDF R&D, Palaiseau, France; University of Lille, INRIA, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, France
Etienne Le Naour*
Affiliation:
EDF R&D, Palaiseau, France; Sorbonne Université, CNRS, ISIR, Paris, France
Corresponding authors: Julie Keisler and Etienne Le Naour; Emails: julie.keisler@inria.fr; etienne.le-naour@edf.fr

Abstract

Achieving net-zero carbon emissions by 2050 necessitates the integration of substantial wind power capacity into national power grids. However, the inherent variability and uncertainty of wind energy present significant challenges for grid operators, particularly in maintaining system stability and balance. Accurate short-term forecasting of wind power is therefore essential. This article introduces an innovative framework for regional wind power forecasting over short-term horizons (1–6 h), employing a novel Automated Deep Learning regression framework called WindDragon. Specifically designed to process wind speed maps, WindDragon automatically creates Deep Learning models leveraging Numerical Weather Prediction (NWP) data to deliver state-of-the-art wind power forecasts. We conduct extensive evaluations on data from France for the year 2020, benchmarking WindDragon against a diverse set of baselines, including both deep learning and traditional methods. The results demonstrate that WindDragon achieves substantial improvements in forecast accuracy over the considered baselines, highlighting its potential for enhancing grid reliability in the face of increased wind power integration.

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Global scheme for wind power forecasting. Every 6 h, the NWP model produces hourly forecasts. Each map is processed independently by the regressor, which maps the grid to the wind power at the same timestamp.


Figure 2. Data preparation for the region Auvergne-Rhône-Alpes. The wind farms are represented in red. The first image shows the distribution of wind farms across the administrative region.


Figure 3. WindDragon’s meta-model for wind power forecasting.


Table 1. Layers available and their associated hyperparameters in the WindDragon search space (for the first and the second graph)


Figure 4. Visual illustration of the XGB two-step approach on the Auvergne-Rhône-Alpes region.


Figure 5. CNN architecture applied to the Grand Est region.


Table 2. National results: metrics computed on the aggregation of the regional forecasts for each model. The best results are highlighted in bold and the second-best results are underlined


Table 3. Regional results. The best results are highlighted in bold and the second-best results are underlined


Figure 6. Wind power forecasts for a week in January 2020. The figure displays the ground truth as dotted lines, and the forecasts from the two top-performing models, WindDragon and CNN.


Figure 7. Error comparison between WindDragon and the CNN. The dotted vertical lines in Figure 7a,b mark the start of each new NWP forecast.


Figure 8. Comparison of CNN and WindDragon performance over 20 quantiles. The two figures show WindDragon’s superiority over the CNN across the entire distribution, and particularly in the distribution tails.


Figure 9. WindDragon search algorithm (Mutant-UCB) convergence: NMAE through time for each region.


Figure A1. Architecture found by WindDragon on Grand Est.


Figure A2. Architecture found by WindDragon on Auvergne-Rhône-Alpes.


Figure A3. Architecture found by WindDragon on Hauts-de-France.


Figure A4. Architecture found by WindDragon on Île-de-France.


Figure A5. Architecture found by WindDragon on Occitanie.


Figure A6. Weekly comparative visuals.

Author comment: WindDragon: automated deep learning for regional wind power forecasting — R0/PR1

Comments

Dear Editors:

We are writing to submit our manuscript “WindDragon: Enhancing Wind Power Forecasting with Automated Deep Learning” to the Journal of Environmental Data Science.

Our manuscript presents an optimization tool to automatically find efficient deep neural networks to forecast aggregated wind power generation at the level of a region or a country. These models are based on wind speed maps from numerical weather prediction (NWP) forecasts and take advantage of their spatio-temporal aspect. These methods could play a crucial role in the smooth operation of power grids in the context of massive renewable energy integration.

Our submission has the following keywords: Wind Power Forecasting, Deep Learning, Renewable Energies and Automated Machine Learning.

As the corresponding author, I confirm that all co-authors below consent to my submission of this manuscript to the Journal of Environmental Data Science.

Sincerely,

Julie Keisler (EDF Lab Paris-Saclay, University of Lille & INRIA Paris)

Etienne Le-Naour (EDF Lab Paris-Saclay & Sorbonne Université)

Review: WindDragon: automated deep learning for regional wind power forecasting — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

The paper presents WindDragon, an AutoML framework that shows improvement in forecasting aggregated short-term wind power compared to other deep learning forecasting techniques. It is overall well-written and the general framework is convincing. However, I have some questions and comments, mainly regarding how the results are evaluated.

Major comments

- Could you please provide training/inference times and set-ups for the CNN, WindDragon and the ViT?

- Is WindDragon an application of DRAGON or a new framework? How does it differ from DRAGON?

- page 3, l.13: Could you please explain in which sense DRAGON is more flexible than, and different from, other AutoML frameworks? Could you also state why these differences are advantageous for your task, i.e. why you chose DRAGON and not one of the other AutoML frameworks you are citing?

- Could you please explain how you use XGBoost to predict the wind power output? Can XGBoost account for the turbine locations as visualized in Figure 4? Is the prediction from speed to power deterministic? How did you train it?

- Table 1+2: In all cases the rankings based on MAE and NMAE are the same. Please choose at least one more metric that differs from MAE more than NMAE does (e.g. RMSE to penalize large errors) and provide reasoning for your choice. I suggest choosing your metric such that it proves that structures which are important for your task of interest are predicted properly by your model. This can help to show the improvement of WindDragon over the CNN, compare page 7, l. 21 “superior accuracy, particularly in predicting high peak values”. It would be nice if this was represented by a metric as well. At the moment I would find it difficult to choose WindDragon over a simple CNN as the CNN shows very good performance as well. Please make sure that the advantages of WindDragon are better elaborated.

- Table 1 caption: What does “sum of the regional forecasts for each models” mean? Is it the sum of MAEs over several consecutive timesteps? Do you report one forecast run, i.e. 6 hours, or an average? You report more results on page 7, l. 28; is this a theoretical extrapolation and, if so, how many days are the baseline? Please provide more details on how exactly you compute the errors.

- Could you please elaborate on why your “method holds promise for extending the forecasting horizon” (page 8, l.43)?

Minor comments

- “These models are especially useful when the wind farms are scattered on the map (see Figure 3)”, page 2, l.47. I guess this refers to your model accounting for the locations of turbines. Describing it differently could help in understanding this location-awareness. Same for page 3, l.3.

- Could you please give a reference when stating that vision transformers excel at segmenting images (page 3, l.4)?

- page 3, l. 26: “in our case a value is predicted from a 2D map”. I think you could be more precise here and explicitly state that the 2D map contains the wind speeds. Is the predicted value wind speed or power?

- Please correct typos in your manuscript, e.g. page 3, l.11 “Compare to other…” or “Wind Generation forecast” instead of wind power generation forecast in the caption of Figure 2, page 4, l.21 “more recent forecasts” instead of most recent forecasts, l.24 “botto-up”.

- page 4, l.21: What is the main motivation for having six-hourly forecasts? Is it data availability?

- Figure 3: I think choosing a different color for the farms (black? and white wherever there is no wind data instead of purple) would improve visibility of the farms. It would also improve understanding if the lon-lat scales were the same in all subplots, then one could immediately see that the turbines are at the same locations. I assume that the background color where no turbines are is the wind speed at one time point, please add this to the caption. I think the first image of the series is not necessary for understanding the procedure.

- Figure 4: Please make sure m.s^-1 is more legible.

- page 6, l.21: I think it would be better to cite vit-pytorch instead of the github name of the owner of the repository.

- Figure 7+8: Maybe plot the difference to the ground truth instead of the ground truth and the individual forecasts. This could help make the improvement of WindDragon over the CNN more visible.

Review: WindDragon: automated deep learning for regional wind power forecasting — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

This paper develops an automated deep learning model for regional/national wind power forecasting, based on the previously developed DRAGON framework. Case studies based on three years of data show that the developed WindDragon model outperforms benchmarks, including CNN and ViT. While the paper shows some interesting findings, major issues should be addressed:

1. The current literature review shows that the authors are new to this field. Please clearly point out the motivation, innovation, and contribution of this paper.

2. Please include key information in the abstract, including motivation, innovation, key result statistics, conclusions.

3. The innovation compared to DRAGON is not clear.

4. Please add more details about the DRAGON algorithm.

5. How did the authors handle new wind plants and wind curtailment?

6. It’s not clear how the authors optimized the CNN for fair comparison.

7. Please include the training curves to show the training process.

8. There seems to be no significant improvement compared to CNNs.

9. Please show more time series illustrating the different performance of the models.

Recommendation: WindDragon: automated deep learning for regional wind power forecasting — R0/PR4

Comments

Thank you very much for submitting your manuscript for review. Based on the reviews, we cannot immediately accept this manuscript for publication. Some of the main issues highlighted by reviewers include a lack of clarity concerning the difference between WindDragon and DRAGON, a lack of a comprehensive literature review, and a need to revise the introduction to more clearly state the motivation, innovations and contributions. However, we would reconsider a revised version based on the comments provided by both reviewers.

Decision: WindDragon: automated deep learning for regional wind power forecasting — R0/PR5

Comments

No accompanying comment.

Author comment: WindDragon: automated deep learning for regional wind power forecasting — R1/PR6

Comments

We are grateful to the Editor-in-Chief, the Editor and the reviewers for their careful reading and comments. Below we comment on the changes we have made in response to the various suggestions made by the reviewers. In the updated version of the paper, we have written our changes in blue.

The main changes we have made are as follows:

- Better contextualising our paper and stating the industrial needs it addresses in the introduction.

- Adding a full state of the art section to detail where our approach stands in both the wind power forecasting and AutoML communities.

- Detailing in the WindDragon section how exactly our approach differs from DRAGON.

- Changing the search algorithm from an Evolutionary Algorithm to Mutant-UCB to achieve even better results.

- Comparing the performance of the CNN and WindDragon in more detail.

Reviewer 1

Major remarks

Could you please provide training/inference times and set-ups for the CNN, WindDragon and the ViT?

Thank you for your comment. The CNN structure was found by hand through trial and error, whereas WindDragon’s “training” includes an automatic search for these structures. Once a suitable model is found, the inference of the two methods is similar, as it amounts to the inference of a single neural network. It therefore seems difficult to compare the training/inference of the different methods. However, thanks to your comment, we have included a graph in the Experiments section showing the error of the best model found so far by WindDragon during training, to give an idea of the computational time required. We point out in the conclusion that this computation time is long and is an area for improvement (although this problem is inherent in any AutoML technique).

Is WindDragon an application of DRAGON or a new framework? How does it differ from DRAGON?

Thank you for your comment. DRAGON is an open-source Python package which provides high-level building blocks that can be used to define a search space and build search algorithms. With WindDragon we used those tools to propose a search space and a search algorithm adapted to the specific wind power forecasting task. Following your comment, we have revised Section 3 to highlight the difference between WindDragon and DRAGON. More details can be found in that section.

page 3, l.13: Could you please explain in which sense DRAGON is more flexible than, and different from, other AutoML frameworks? Could you also state why these differences are advantageous for your task, i.e. why you chose DRAGON and not one of the other AutoML frameworks you are citing?

Thanks for your question. In short, DRAGON is more flexible than other AutoML frameworks because it allows the creation of search spaces that mix different layer types (convolution, attention, MLP, etc.) and does not constrain the general structure of the neural networks. Following your remark, we added a new related-work paragraph, “AutoDL for wind power forecasting”. See Section 2 for more explanation.

Could you please explain how you use XGBoost to predict the wind power output?

Yes, of course. For the XGBoost baseline we follow a two-stage procedure:

- For a given NWP wind speed map $X_t \in \mathbb{R}^{m \times n}$, we compute the mean of the wind speed map, denoted $\bar{X}_t \in \mathbb{R}$.

- Suppose our training dataset is composed of $T$ timestamps. The XGB regressor is a simple mapping between the mean of the wind speed map and the wind power generation for that map. Formally:
$$XGB \colon \mathbb{R} \longrightarrow \mathbb{R}, \qquad \bar{X}_t \longmapsto XGB(\bar{X}_t) = \hat{y}_t.$$

We then fit the XGB regressor on the training dataset $\{(\bar{X}_t, y_t)\}_{t=1}^{T}$.
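For concreteness, a minimal Python sketch of this two-stage baseline (the hyperparameter values and variable names are illustrative, not those used in the paper):

```python
import numpy as np
from xgboost import XGBRegressor

def fit_mean_xgb(X_train: np.ndarray, y_train: np.ndarray) -> XGBRegressor:
    """X_train: (T, m, n) NWP wind speed maps; y_train: (T,) wind power."""
    # Stage 1: collapse each (m, n) wind speed map to its spatial mean.
    x_bar = X_train.reshape(len(X_train), -1).mean(axis=1)
    # Stage 2: regress wind power on the scalar mean wind speed.
    model = XGBRegressor(n_estimators=300, max_depth=4)  # illustrative values
    model.fit(x_bar.reshape(-1, 1), y_train)
    return model

def predict_mean_xgb(model: XGBRegressor, X: np.ndarray) -> np.ndarray:
    x_bar = X.reshape(len(X), -1).mean(axis=1)
    return model.predict(x_bar.reshape(-1, 1))
```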

Can XGBoost account for the turbine locations as visualized in Figure 4?

The XGBoost baseline cannot take turbine locations into account because it is trained on the mean of the wind speed map. In fact, the purpose of this baseline is to show the performance of a model that relies only on aggregated information, not spatial information (as opposed to the deep learning baselines).

Is the prediction from speed to power deterministic? How did you train it?

Yes, it is deterministic. It is trained with a mean squared error (MSE) loss. For uncertainty quantification, the WindDragon framework (as well as all the other baselines) could be trained with pinball losses.
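For illustration, a minimal PyTorch sketch of a pinball (quantile) loss that could serve such a probabilistic extension (the standard formulation, not code from the paper):

```python
import torch

def pinball_loss(y_hat: torch.Tensor, y: torch.Tensor, q: float) -> torch.Tensor:
    # Standard quantile loss: under-forecasts are weighted by q,
    # over-forecasts by (1 - q).
    diff = y - y_hat
    return torch.mean(torch.maximum(q * diff, (q - 1) * diff))
```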

Table 1+2: In all cases the rankings based on MAE and NMAE are the same. Please choose at least one more metric that differs from MAE more than NMAE does (e.g. RMSE to penalize large errors) and provide reasoning for your choice. I suggest choosing your metric such that it proves that structures which are important for your task of interest are predicted properly by your model. This can help to show the improvement of WindDragon over the CNN, compare page 7, l. 21 “superior accuracy, particularly in predicting high peak values”. It would be nice if this was represented by a metric as well.

Thanks for your remark. We compared the rankings of the CNN and WindDragon using the MSE and the RMSE, and for all regions WindDragon outperforms the CNN. In industry, the NMAE is the most widely used metric for regional wind power forecasting. The MAE gives an idea of the quantities of energy involved, and in particular shows that not all regions are equally important to forecast, since some produce much less wind power than others. For example, the forecasts for the PACA region are very poor regardless of the method used, but given the very low level of production this is not a problem. However, to take your comment into account, we have compared the performance of the CNN and WindDragon in more detail in Section 4: by time of day, by month and across the production distribution. We show in Figure 7 that on the highest quantile of the distribution, WindDragon forecasts are significantly better than those from the CNN.
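For reference, a minimal sketch of the two metrics, assuming the NMAE normalizes the MAE by installed capacity (a common industry convention; the paper's exact normalization may differ):

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def nmae(y_true: np.ndarray, y_pred: np.ndarray, capacity: float) -> float:
    # Assumed convention: MAE normalized by the region's installed capacity.
    return mae(y_true, y_pred) / capacity
```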

At the moment I would find it difficult to choose WindDragon over a simple CNN as the CNN shows very good performance as well. Please make sure that the advantages of WindDragon are better elaborated.

For this kind of national-scale problem, even a small gain in percentage terms can have a huge impact, as highlighted in the Results paragraph (iii). But we agree with you that the results could be improved. This is why we changed our search algorithm and used Mutant-UCB instead of the Evolutionary Algorithm, which gave us much better results. The algorithm is described in detail in Section 3. Thanks to it, WindDragon achieves an error 19% lower than the CNN's, which we believe is very advantageous from an operational point of view.

Table 1 caption: What does “sum of the regional forecasts for each models” mean? Is it the sum of MAEs over several consecutive timesteps? Do you report one forecast run, i.e. 6 hours, or an average? You report more results on page 7, l. 28; is this a theoretical extrapolation and, if so, how many days are the baseline? Please provide more details on how exactly you compute the errors.

About the national prediction: let us denote the national wind power generation as $y^{\text{nat}}$ and its approximation as $\hat{y}^{\text{nat}}$. We do the same for the French regions, denoting the $i$-th region as $y^{r_i}$ and its approximation as $\hat{y}^{r_i}$.

Our proposed model and all the baselines are trained on each region independently; these are the results reported in Table 3. The national forecast $\hat{y}^{\text{nat}}$ is defined as $\hat{y}^{\text{nat}} = \sum_{i=1}^{I} \hat{y}^{r_i}$, where $I$ denotes the total number of regions. This two-stage procedure for the national forecast is far more effective than directly forecasting the national wind power generation from the whole national wind speed map.

About the test timestamps: we are not sure we fully understand the question, but the reported test results concern the year 2020 (the average error over the 24*365 hourly timestamps of 2020).

About the 6-hour forecasts: our method (and all the baselines) does not perform forecasting as such, but solves a regression problem between a wind speed map at time $t$ and the corresponding wind power generation at time $t$. The 6-hour horizon concerns the NWP maps that we use as input to the regressors. The NWP maps are the output of the HRES model, which runs four times per day and forecasts 6 hours at a time.

About the error: thank you for your remark. In the revised version we provide more details on how exactly we compute the errors.

Could you please elaborate on why your “method holds promise for extending the forecasting horizon” (page 8, l.43)?

Sorry, we made a mistake. We have changed this statement in the conclusion and now mention that applying the method to longer horizons is future work.

Minor remarks

Thank you for your questions and suggestions in the minor remarks section. We have addressed your recommendations and questions in the revised version of the manuscript. For detailed responses and updates, please refer to this updated version.

Reviewer 2

Remarks

The current literature review shows that the authors are new to this field. Please clearly point out the motivation, innovation, and contribution of this paper. Please include key information in the abstract, including motivation, innovation, key result statistics, conclusions.

Thanks for your remark. Following your suggestion, we have rewritten the abstract, introduction and related works. We position ourselves in the literature as forecasting wind power at the regional scale, rather than at the farm scale as many papers do. This positioning is made clear in the abstract. We have justified the industrial need for such a forecast in the introduction. In a new Section 2, “State of the art”, we detail the methods used in the literature for regional forecasts, and we broaden the scope slightly to forecasts at the scale of a single wind turbine in order to detail the neural networks used in wind power forecasting.

The innovation compared to DRAGON is not clear.

Thanks for your question. In short, DRAGON is a package providing tools to create AutoDL frameworks for a particular task. We used those tools to create WindDragon, specific to the wind power generation forecasting problem. At the end of Section 2, we have added a presentation of the tools offered by DRAGON. In Section 3, entitled WindDragon, we present how we have used these tools, highlighting the innovative parts of our approach (data processing, objective function, search space, search algorithm, etc.).

Please add more details about the DRAGON algorithm.

We added a full subsection (2.4) detailing DRAGON.

How did the authors handle new wind plants and wind curtailment?

For new installations, the load factor (production divided by installed capacity) allows us to account for the production of new wind farms. Moreover, our models do not need much training data to be effective, so we use two years of data to forecast the following year. Over such a period, the fleet of wind farms does not change very much. If the geographical distribution changes significantly and performance suffers as a result, WindDragon has to be run again to recover well-performing models. As for curtailments, they are barely visible at this scale because they are buried in the mass of wind farms. They are also difficult to predict without data from the wind farms themselves, so we have not taken them into account.
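As a small illustration of this normalization (a sketch with our own variable names, not the authors' code):

```python
def load_factor(production_mw: float, installed_capacity_mw: float) -> float:
    # Normalizing production by installed capacity keeps the target
    # comparable as new wind farms come online.
    return production_mw / installed_capacity_mw
```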

It’s not clear how the authors optimized the CNN for fair comparison.

The CNN structure was found by hand through trial and error, with a simple grid search to optimize the hyperparameters (e.g. the number of layers, the kernel sizes, the activation functions). The aim of this paper is to show that using AutoDL can largely improve on the performance of a model found by hand.

Please include the training curves to show the training process.

We detailed the CNN training process in the Experimental set-up paragraph within Section 4. As the training is standard (AdamW for 200 epochs), we do not think it is of much importance for this paper. Regarding WindDragon, following your remark we detailed the objective function (building, training and evaluating a deep neural network) in Section 3.

There seems to be no significant improvement compared to CNNs.

These first results could indeed be improved. We changed our search algorithm and used Mutant-UCB instead of the Evolutionary Algorithm, which gave us much better results. The algorithm is described in detail in Section 3. Thanks to it, WindDragon achieves an error 19% lower than the CNN's, which we believe is very advantageous from an operational point of view.

Please show more time series illustrating the different performance of the models.

We added performance comparisons by time of day, month and wind generation distribution. We hope that these new graphs allow the reader to see that WindDragon consistently outperforms the CNN.

Review: WindDragon: automated deep learning for regional wind power forecasting — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

Thank you for answering my questions, the paper is more concise and better readable now. My main comment regarding the improvement of WindDragon over the CNN has been addressed properly. I also like that you now assess accuracy in the tails of the distribution.

- The connection between DRAGON and WindDragon is clearer now. I am still not fully certain to which types of problems WindDragon is constrained as compared to DRAGON. It would be helpful to pinpoint the differences concretely, e.g. by comparing the two frameworks in the form of a table or adding one sentence to Section 3. I understand that WindDragon is more constrained to the concrete question. This can be helpful and necessary for some related questions (I guess one example would be the extension of the framework to other regions) but also limits generalizability. Concrete questions I’d be interested in: What are the limits of WindDragon compared to DRAGON? What is the input data to DRAGON vs WindDragon? Which additional tools does DRAGON provide, i.e. in which parts is DRAGON more flexible than WindDragon? Which choices did you have to make to build WindDragon out of DRAGON and which ones did you make that are not necessary and could potentially be adapted for other wind power tasks?

- Regarding Table 2: I still don’t understand what “sum of the regional forecasts for each models” means. Do you mean “MAE is the cumulative error based on the regional results”? The errors of Table 3 do not add up to the numbers in Table 2. How is the NMAE in Table 2 computed? I do not understand how Table 2 is generated out of Table 3 as described on page 10, l.38.

- Small additional comment regarding errors: you could also compute a skill score to quantify the added value of WindDragon over the CNN. This is a common procedure and I think it would be valuable as, depending on the score, skill scores are easier to interpret than raw figures.

- Figure 7b,d shows seasonal differences in the normalized monthly error: In summer, where the errors are higher, the improvement of WindDragon over the CNN is greater than in winter. This could be particularly valuable as there is usually less wind power in summer, which makes the generated power more important and accurate predictions more relevant. I think in both normalized errors there is a positive correlation between the error of the CNN and the absolute improvement of WindDragon: the more wrong the CNN is, the more it helps to use WindDragon. This is just a visual investigation; maybe you want to check whether this is actually true and point this out in your results (you only mention the seasonal difference briefly, but I think this is a big pro argument for WindDragon).

- Increase font sizes in Figure 7.

- Please correct inconsistencies in language/typos. Examples include

- Page 2, line 45 starts with “if”. I think this is not correct.

- Page 3, line 21 “Transfert strategy”.

- Terms such as “machine learning” or “computer vision” are usually not capitalized in sentences.

- Page 5, line 42 “more complex setups might be imagined” sounds very vague; consider rephrasing or leaving out such vague sentences, or moving them to the discussion.

- m.s^-1 in Figure 4. Consider using the LaTeX SI package.

- Please make sure all citations are displayed correctly, e.g. p.3, l.46 and p.2, l.20.

- Page 4, line 40 “*A few works tried to apply AutoDL*”. I think it would be better to say “A few works apply AutoDL”

- Figure 8: Please give more information in the caption of the figure. I also think that if you showed the relative or absolute difference between the two plots, a “convexish” function would underscore your interpretation that WindDragon particularly improves predictions in the tails. I think these improvements in the tails and across seasons are what make the predictions of WindDragon more valuable compared to the CNN, and I would appreciate it if you pointed this out in the conclusion.

Review: WindDragon: automated deep learning for regional wind power forecasting — R1/PR8

Conflict of interest statement

Reviewer declares none.

Comments

My comments have been well addressed.

Recommendation: WindDragon: automated deep learning for regional wind power forecasting — R1/PR9

Comments

No accompanying comment.

Decision: WindDragon: automated deep learning for regional wind power forecasting — R1/PR10

Comments

No accompanying comment.

Author comment: WindDragon: automated deep learning for regional wind power forecasting — R2/PR11

Comments

We are grateful to the Editor-in-Chief, the Editor and the reviewers for their careful reading and comments. Below we comment on the changes we have made in response to the major remarks made by Reviewer 1. As for the minor comments, we have taken almost all of them into account.

The connection between DRAGON and WindDragon is clearer now. I am still not fully certain to which types of problems WindDragon is constrained as compared to DRAGON. It would be helpful to pinpoint the differences concretely, e.g. by comparing the two frameworks in the form of a table or adding one sentence to Section 3. I understand that WindDragon is more constrained to the concrete question. This can be helpful and necessary for some related questions (I guess one example would be the extension of the framework to other regions) but also limits generalizability. Concrete questions I’d be interested in: What are the limits of WindDragon compared to DRAGON? What is the input data to DRAGON vs WindDragon? Which additional tools does DRAGON provide, i.e. in which parts is DRAGON more flexible than WindDragon? Which choices did you have to make to build WindDragon out of DRAGON and which ones did you make that are not necessary and could potentially be adapted for other wind power tasks?

Thank you for your comment. DRAGON and WindDragon are not two competing AutoDL frameworks. DRAGON is a toolbox, and WindDragon uses these tools to create an AutoDL framework applied to wind power forecasting based on NWP maps. DRAGON provides computing objects that can be used to encode deep neural network architectures and hyperparameters, and WindDragon uses these objects to build a framework applicable to the use case. A relevant analogy can be drawn between DRAGON and PyTorch, a toolbox for designing neural networks: WindDragon can be regarded as a neural network architecture that uses PyTorch’s tools to perform a specific task, making it irrelevant to speak of input data for either DRAGON or PyTorch.

Regarding Table 2: I still don’t understand what “sum of the regional forecasts for each models” means. Do you mean “MAE is the cumulative error based on the regional results”? The errors of Table 3 do not add up to the numbers in Table 2. How is the NMAE in Table 2 computed? I do not understand how Table 2 is generated out of Table 3 as described on page 10, l.38.

To get the national MAE per model:

- We first produce one forecast per region.

- For each timestep, we sum the forecasts over all regions.

- We compute the MAE of this aggregated signal.

Thanks to your question, we have added the mathematical formulation of these steps in the “Results” section on page 10.
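For concreteness, a minimal sketch of these three steps (illustrative names, not the authors' code):

```python
import numpy as np

def national_mae(forecasts: dict, actuals: dict) -> float:
    """forecasts/actuals: region name -> (T,) arrays of hourly power."""
    regions = list(forecasts)
    # Sum forecasts and observations over regions at each timestep,
    # then compute the MAE of the aggregated national signal.
    y_hat_nat = np.sum([forecasts[r] for r in regions], axis=0)
    y_nat = np.sum([actuals[r] for r in regions], axis=0)
    return float(np.mean(np.abs(y_nat - y_hat_nat)))
```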

Small additional comment regarding errors: you could also compute a skill score to quantify the added value of WindDragon over the CNN. This is a common procedure and I think it would be valuable as, depending on the score, skill scores are easier to interpret than raw figures.

We already give the percentage of improvement in the paragraph “WindDragon’s superior performances” at the end of page 11, but following your remark we have added the skill score per quantile in Figure 8b.
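As an illustration, an MAE-based skill score of WindDragon against the CNN as the reference could be computed as follows (a standard definition; the names are ours):

```python
import numpy as np

def mae_skill_score(y: np.ndarray, y_model: np.ndarray, y_ref: np.ndarray) -> float:
    # 1 - MAE_model / MAE_ref: positive values mean the model improves
    # on the reference (here, WindDragon vs. the CNN).
    mae_model = np.mean(np.abs(y - y_model))
    mae_ref = np.mean(np.abs(y - y_ref))
    return float(1.0 - mae_model / mae_ref)
```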

Figure 7b,d shows seasonal differences in the normalized monthly error: In summer, where the errors are higher, the improvement of WindDragon over the CNN is greater than in winter. This could be particularly valuable as there is usually less wind power in summer, which makes the generated power more important and accurate predictions more relevant. I think in both normalized errors there is a positive correlation between the error of the CNN and the absolute improvement of WindDragon: the more wrong the CNN is, the more it helps to use WindDragon. This is just a visual investigation; maybe you want to check whether this is actually true and point this out in your results (you only mention the seasonal difference briefly, but I think this is a big pro argument for WindDragon).

Thank you for your suggestion. We have tested the correlation between the CNN error and WindDragon’s improvement. Our results show that this relationship exists but is rather weak, with a correlation of 0.5. Although the trend is visible in certain seasons, it does not hold overall.
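The check described above can be sketched as follows (hypothetical variable names):

```python
import numpy as np

def error_improvement_correlation(cnn_err: np.ndarray, wd_err: np.ndarray) -> float:
    """cnn_err, wd_err: (T,) absolute errors of the CNN and WindDragon."""
    # Pearson correlation between the CNN's error and the absolute
    # improvement WindDragon brings at each timestep.
    improvement = cnn_err - wd_err
    return float(np.corrcoef(cnn_err, improvement)[0, 1])
```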

Figure 8: Please give more information in the caption of the Figure. I also think if you show the relative or absolute difference between the two plots a “convexish” function would underscore your interpretation that WindDragon particularly improves predictions in the tails. I think these improvements in tails and seasons is what makes the predictions of WindDragon more valuable compared to the CNN and I would appreciate it if you pointed this out in the conclusion.

Thank you for your suggestion. We have tested adding an absolute/relative difference curve between the WindDragon MAE and the CNN MAE to verify the suggested convex trend. Our results show that, although WindDragon significantly improves predictions at the extremes, the shape of the curve is not strictly convex. However, in the conclusion we have emphasised the improvement provided by WindDragon in the tails of the distribution and in difficult seasonal conditions.

Recommendation: WindDragon: automated deep learning for regional wind power forecasting — R2/PR12

Comments

No accompanying comment.

Decision: WindDragon: automated deep learning for regional wind power forecasting — R2/PR13

Comments

No accompanying comment.