Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France

Published online by Cambridge University Press: 15 September 2025

Eloi Lindas*
Affiliation:
Laboratoire des Sciences du Climat et de l’Environnement (LSCE), IPSL, CEA/CNRS/UVSQ, Université Paris-Saclay, Gif-sur-Yvette, France; Atos Inno’Lab TS Bezons, Atos, Bezons, France
Yannig Goude
Affiliation:
Laboratoire de Mathématiques d’Orsay (LMO), Faculté des Sciences d’Orsay, CNRS, Université Paris-Saclay, Orsay, France; EDF R&D Lab, OSIRIS, EDF, Palaiseau, France
Philippe Ciais
Affiliation:
Laboratoire des Sciences du Climat et de l’Environnement (LSCE), IPSL, CEA/CNRS/UVSQ, Université Paris-Saclay, Gif-sur-Yvette, France
*Corresponding author: Eloi Lindas; Email: eloi.lindas@lsce.ipsl.fr

Abstract

Accurate prediction of nondispatchable renewable energy sources is essential for grid stability and price prediction. Regional power supply forecasts are usually obtained indirectly through a bottom-up aggregation of plant-level forecasts, incorporate lagged power values, and do not exploit the potential of spatially resolved data. This study presents a comprehensive methodology for predicting solar and wind power production at a country scale in France using machine learning models trained with spatially explicit weather data combined with spatial information about production sites’ capacity. A dataset is built spanning from 2012 to 2023, using daily power production data from Réseau de Transport d’Electricité (the national grid operator) as the target variable, with daily weather data from ECMWF Re-Analysis v5, production sites’ capacity and location, and electricity prices as input features. Three modeling approaches are explored to handle spatially resolved weather data: spatial averaging over the country, dimension reduction through principal component analysis, and a computer vision architecture to exploit complex spatial relationships. The study benchmarks state-of-the-art machine learning models as well as hyperparameter tuning approaches based on cross-validation methods on daily power production data. Results indicate that cross-validation tailored to time series is best suited to reaching low error. We find that neural networks tend to outperform traditional tree-based models, which face challenges in extrapolation due to the increasing renewable capacity over time. Model performance ranges from 4% to 10% in normalized root-mean-squared error for a midterm horizon, achieving error metrics similar to those of local models established at a single-plant level, highlighting the potential of these methods for regional power supply forecasting.

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Global framework of this study represented schematically.

Figure 2. Power supply and capacity time series for wind and solar in France for the period of interest. The power capacity curves have been smoothed to a yearly resolution.

Figure 3. Illustration of power-weighted weather maps creation for wind.
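
The caption does not spell out how the power-weighted maps are built. One plausible reading, assumed here and not confirmed by the source, is that each grid cell of a weather field is scaled by that cell's share of the installed capacity. The function name and the toy grids below are hypothetical.

```python
import numpy as np

def power_weighted_map(weather: np.ndarray, capacity: np.ndarray) -> np.ndarray:
    """Weight a gridded weather field by the share of installed capacity per cell.

    weather:  (n_lat, n_lon) field, e.g. daily mean wind speed.
    capacity: (n_lat, n_lon) installed capacity aggregated per grid cell (MW).
    Assumed weighting scheme; the paper's exact construction may differ.
    """
    total = capacity.sum()
    if total <= 0:
        raise ValueError("capacity grid must contain some installed capacity")
    weights = capacity / total   # weights sum to 1 over the grid
    return weather * weights     # cells without plants contribute 0

# Toy 2x2 grid: capacity sits only in the top-left and bottom-right cells.
wind = np.array([[8.0, 6.0], [4.0, 10.0]])
cap = np.array([[30.0, 0.0], [0.0, 10.0]])
weighted = power_weighted_map(wind, cap)
```

Under this assumption, summing the weighted map gives the capacity-weighted mean of the weather variable, so cells with no production sites no longer influence the input.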

Figure 4. Representation of the three modeling approaches used in this work to make use of weather maps.
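
The first two of the three approaches can be sketched in a few lines of NumPy. This is a minimal illustration on random data, not the authors' pipeline: the array shapes, the function names, and the number of components are assumptions, and the computer vision approach is left out because it needs a deep learning framework.

```python
import numpy as np

def spatial_average(maps: np.ndarray) -> np.ndarray:
    """(n_days, n_lat, n_lon) -> (n_days,): one scalar per day."""
    return maps.mean(axis=(1, 2))

def pca_features(maps: np.ndarray, n_components: int) -> np.ndarray:
    """(n_days, n_lat, n_lon) -> (n_days, n_components) via SVD-based PCA."""
    flat = maps.reshape(maps.shape[0], -1)      # one row per day
    centered = flat - flat.mean(axis=0)         # remove the per-pixel mean
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # principal component scores

rng = np.random.default_rng(0)
maps = rng.normal(size=(50, 4, 5))              # hypothetical stack of daily maps
avg = spatial_average(maps)
pcs = pca_features(maps, n_components=3)
```

Spatial averaging discards all spatial structure, while PCA keeps the dominant spatial patterns as a handful of features. In a forecasting setting the projection would normally be fitted on the training period only and then applied to later data.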

Figure 5. Different cross-validation procedures considered in this work represented schematically. For Hold-Out and K-Fold, only the method without prior random shuffling is represented.
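
To make the difference between the procedures concrete, the sketch below generates index splits for K-Fold without shuffling and for an expanding-window scheme, which is one common form of cross-validation tailored to time series. The expanding-window variant and both function names are assumptions for illustration; the figure may depict other variants.

```python
import numpy as np

def blocked_kfold(n_samples: int, n_splits: int):
    """K-Fold without shuffling: each contiguous block is the validation set once."""
    indices = np.arange(n_samples)
    for fold in np.array_split(indices, n_splits):
        train = np.setdiff1d(indices, fold)     # everything outside the block
        yield train, fold

def expanding_window(n_samples: int, n_splits: int):
    """Time-ordered splits: train only on samples preceding the validation block."""
    blocks = np.array_split(np.arange(n_samples), n_splits + 1)
    for k in range(1, n_splits + 1):
        train = np.concatenate(blocks[:k])      # training window grows each split
        yield train, blocks[k]

kfold_splits = list(blocked_kfold(12, 3))
ts_splits = list(expanding_window(12, 3))
```

In the first K-Fold split the model is trained on samples 4 to 11 and validated on samples 0 to 3, so it sees the future of the validation period. The expanding-window splits never do, which matters when installed capacity grows over time.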

Table 1. Average and standard deviation of computing times for 1 iteration for each cross-validation method in seconds

Figure 6. Results of different cross-validation techniques for random forest on solar. Each axis represents a monitored quantity for a given HPO optimization procedure. The values for each method are plotted as points, and only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure 7. Robustness of cross-validation procedure regarding the dataset size for random forest on solar. The marker indicates the average $ \mid \Delta \varepsilon \mid $, while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

Table 2. Benchmark results for different models using three different modeling approaches on the solar dataset

Figure 8. Power capacity, occlusion attribution, and regional realized power supply for early and late 2023 for wind. Occlusion is an interpretation method that hides part of the input and measures how this affects the CNN prediction. The larger the impact, the more important the hidden part (Zeiler and Fergus, 2014). Power supply data are obtained from RTE for all of France’s regions (NUTS1).
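
The occlusion idea described in the caption can be sketched as follows. This is a generic illustration, not the authors' implementation: `toy_model` is a hypothetical stand-in for a trained CNN, and the patch size and fill value are assumptions.

```python
import numpy as np
from typing import Callable

def occlusion_map(predict: Callable[[np.ndarray], float], image: np.ndarray,
                  patch: int, fill: float = 0.0) -> np.ndarray:
    """Absolute change in prediction when each patch x patch block is hidden."""
    baseline = predict(image)
    n_rows, n_cols = image.shape
    importance = np.zeros_like(image, dtype=float)
    for r in range(0, n_rows, patch):
        for c in range(0, n_cols, patch):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = fill   # hide one block
            delta = abs(predict(occluded) - baseline)
            importance[r:r + patch, c:c + patch] = delta
    return importance

# Stand-in for a trained CNN: only the top-left 2x2 corner drives the output.
def toy_model(x: np.ndarray) -> float:
    return float(x[:2, :2].sum())

weather_map = np.ones((4, 4))
attribution = occlusion_map(toy_model, weather_map, patch=2)
```

Only the top-left block receives a nonzero score here, because hiding any other block leaves the toy prediction unchanged.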

Table A1. Description of climate variables

Figure C1. Results of different cross-validation techniques for boosted trees on solar. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure C2. Robustness of cross-validation procedure regarding dataset size for boosted trees on solar. The marker indicates the average $ \mid \Delta \varepsilon \mid $, while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

Figure C3. Results of different cross-validation techniques for feed-forward neural network on solar. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure C4. Robustness of cross-validation procedure regarding dataset size for feed-forward neural network on solar. The marker indicates the average $ \mid \Delta \varepsilon \mid $, while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

Figure D1. Results of different cross-validation techniques for random forest on wind. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure D2. Robustness of cross-validation procedure regarding dataset size for random forest on wind. The marker indicates the average $ \mid \Delta \varepsilon \mid $, while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

Figure D3. Results of different cross-validation techniques for boosted trees on wind. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure D4. Robustness of cross-validation procedure regarding dataset size for boosted trees on wind. The marker indicates the average $ \mid \Delta \varepsilon \mid $, while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

Figure D5. Results of different cross-validation techniques for feed-forward neural network on wind. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure D6. Robustness of cross-validation procedure regarding dataset size for feed-forward neural network on wind. The marker indicates the average $ \mid \Delta \varepsilon \mid $, while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

Table E1. Benchmark results for different models using three different modeling approaches on the wind dataset

Table F1. Comparison of ENTSO-E day-ahead renewable forecast performance for France with our model forecast performance in 2023 (test set)

Table G1. Comparison of our model performance when adding Gaussian noise to the weather inputs to mimic weather forecast data
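
A minimal sketch of such a perturbation is given below. The caption does not state how the noise is scaled, so the choice here (a standard deviation proportional to each variable's own spread) is an assumption, as are the function name and the `relative_std` parameter.

```python
import numpy as np

def add_forecast_noise(weather: np.ndarray, relative_std: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise scaled by each variable's own std.

    weather:      (n_days, n_features) reanalysis inputs.
    relative_std: noise std as a fraction of each feature's std (assumed scaling).
    """
    scale = relative_std * weather.std(axis=0, keepdims=True)
    return weather + rng.normal(size=weather.shape) * scale

rng = np.random.default_rng(42)
clean = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))   # synthetic stand-in data
noisy = add_forecast_noise(clean, relative_std=0.1, rng=rng)
```

Evaluating a trained model on `noisy` instead of `clean` gives a rough idea of how much accuracy could be lost when reanalysis inputs are replaced by imperfect weather forecasts.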

Author comment: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR1

Comments

Cover Letter

Eloi LINDAS

Commissariat à l’Energie Atomique (CEA)/ Laboratoire des Sciences du Climat et de l’Environnement (LSCE)

Orme des Merisiers, Bat 714, 91190 Saint-Aubin

06/12/2024

Dear Editors,

I am writing to submit our manuscript titled “Towards Accurate Forecasting of Renewable Energy: Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France” for consideration in Environmental Data Science as an Application paper.

Our work addresses a challenge in the transition to renewable energy: accurate forecasting of regional solar and wind power supply. Using over a decade of spatially resolved weather and production data, we developed a dataset and benchmarked state-of-the-art machine learning models across 3 modeling approaches. Our findings give insights and recommendations on how to select a cross-validation procedure to estimate model generalization error as precisely as possible. The work also demonstrates the effectiveness of vision-based models in capturing complex spatial relationships, significantly enhancing forecasting accuracy at a national scale for France.

This study contributes to the journal’s focus on data-driven approaches for sustainable decision-making with:

1. A dataset that integrates spatially explicit weather, generation, and market data spanning 2012–2023.

2. Insights for practitioners, such as recommendations for cross-validation methods tailored to time series forecasting.

3. A benchmark of state-of-the-art machine learning models, exploring techniques from dimension reduction to computer vision.

We believe this research will interest a wide audience, including data scientists, energy forecasters, and policymakers, as it advances methodologies to support the integration of renewable energy into the grid.

We confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere. We have no conflicts of interest to disclose and all authors approved this submission.

Thank you for considering our manuscript. We look forward to the possibility of contributing to Environmental Data Science. To address any questions or provide additional materials, please contact me at: eloi.lindas@lsce.ipsl.fr

Sincerely,

Eloi LINDAS (On behalf of the co-authors)

Review: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

TITLE: Towards Accurate Forecasting of Renewable Energy: Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France

Manuscript ID: EDS-2024-0110

Manuscript Type: Application Paper

Summary

In the paper, the authors propose a machine learning approach to national-scale French solar and wind power generation forecasting from geographically explicit weather and production data. The method applies heterogeneous modelling approaches and rigorous validation and demonstrates the superior performance of neural networks compared with traditional approaches. I found this paper interesting and important in the energy sector. However, the manuscript requires some improvements. Some general comments are:

General Comments

1. Provide a list of abbreviations used in the paper.

2. Arrange keywords in alphabetical order.

3. I suggest you put table captions above tables and figure captions below figures.

4. Using MAPE and R^2 as evaluation metrics is usually not recommended when working with renewable energies such as solar and wind energy production. The authors are advised to read the following paper: Hong et al. (2020), Energy Forecasting: A Review and Outlook, https://doi.org/10.1109/OAJPE.2020.3029979. See, for example, the section on Common Issues. Instead, the authors can use other evaluation metrics such as MBE, MASE, etc.

5. While achieving error levels comparable to single-plant models is a welcome development, the 4-10% nRMSE for a midterm horizon still suggests room for greater accuracy.

6. Page 14, Table: it appears that combining PCA with the models proposed in the study did not improve performance on the datasets. I suggest that the authors combine PCA with shrinkage methods such as Lasso. Combining these techniques, such as using PCA for initial dimensionality reduction followed by Lasso for feature selection on the reduced set, can often yield the most robust and accurate time series forecasting models.

7. The fact that neural networks have been found to outperform tree-based models due to the latter’s susceptibility to extrapolation difficulties, which rise with increasing renewable capacity, suggests a need for alternative methods to counter this inherent weakness of tree-based algorithms.

8. In the “Conclusion” section, authors should avoid summarising the aspects they have already stated in the body of the manuscript. Instead, they should interpret their findings at a higher level of abstraction than in the previous sections of the manuscript. The authors should highlight whether or to what extent they have addressed the necessity identified within the “Introduction” section (the identified gap). The authors should avoid restating everything they did once again. Instead, they should emphasise what their findings mean to the readers, making the “Conclusions” section interesting and memorable to them. The authors should not restate what they have done or what the article does. They should focus instead on what they have discovered and, most importantly, on what their findings mean.
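
For readers unfamiliar with the metrics named in the comments above, the sketch below gives common textbook definitions of MBE, MASE, and nRMSE. The sign convention for MBE, the naive one-step benchmark in MASE, and the normalizing constant for nRMSE are assumptions; neither the review nor the abstract states which conventions the paper uses. The numbers are toy values.

```python
import numpy as np

def mbe(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean bias error; positive means over-prediction (assumed convention)."""
    return float(np.mean(y_pred - y_true))

def mase(y_true: np.ndarray, y_pred: np.ndarray,
         y_train: np.ndarray, season: int = 1) -> float:
    """MAE scaled by the in-sample MAE of a naive lag-`season` forecast."""
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)

def nrmse(y_true: np.ndarray, y_pred: np.ndarray, norm: float) -> float:
    """RMSE divided by a normalizing constant, e.g. installed capacity."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)) / norm)

y_train = np.array([10.0, 12.0, 11.0, 13.0])
y_true = np.array([14.0, 12.0])
y_pred = np.array([13.0, 13.0])

bias = mbe(y_true, y_pred)
scaled = mase(y_true, y_pred, y_train)
normalized = nrmse(y_true, y_pred, norm=20.0)   # 20.0 is a hypothetical capacity
```

In this toy case the errors of +1 and -1 cancel in the bias, while MASE and nRMSE still register them, which is why a bias metric is usually reported alongside a magnitude metric.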

Specific Comments

1. Page 3, line 28, proper referencing is required on Malvoni et al. On line 47 (ARIMA, SARIMAX…), what are the three dots for?

2. Page 5, line 42, change figure 2 to Figure 2.

3. Page 6, line 21, change “grrid” to grid. Attend to all minor typos in the manuscript.

Review: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

The study presents a comprehensive methodology for predicting solar and wind power production at a country scale in France using machine learning models trained with spatially explicit weather data combined with spatial information about production sites’ capacity.

Authors present three different modeling approaches to handle the weather-gridded data to forecast daily wind and PV power production. What is the reason for selecting these three specific modeling techniques?

It would be better to mention the size of the dataset in Table 3.

Recommendation: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR4

Comments

Both reviewers recognized the relevance of the article and provided only minor comments for improvement. Based on their evaluations, my recommendation is for a minor revision.

Decision: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR5

Comments

No accompanying comment.

Author comment: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R1/PR6

Comments

Recommendation: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R1/PR7

Comments

The authors have fully addressed the reviewers’ comments, and I am pleased to recommend the manuscript for acceptance. Congratulations.

Decision: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R1/PR8

Comments

No accompanying comment.