
Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems

Published online by Cambridge University Press:  10 April 2025

Gabriel Kasmi*
Affiliation:
MINES Paris, Université PSL, Centre Observation Impacts Energie (O.I.E.), 06904 Sophia-Antipolis, France; RTE France, Direction de la Recherche et du Développement, 92973 Paris La Défense, France
Laurent Dubus
Affiliation:
RTE France, Direction de la Recherche et du Développement, 92973 Paris La Défense, France; WEMC (World Energy & Meteorology Council), Norwich NR4 7TJ, UK
Yves-Marie Saint-Drenan
Affiliation:
MINES Paris, Université PSL Centre Observation Impacts Energie (O.I.E.), 06904 Sophia-Antipolis, France
Philippe Blanc
Affiliation:
MINES Paris, Université PSL Centre Observation Impacts Energie (O.I.E.), 06904 Sophia-Antipolis, France
*
Corresponding author: Gabriel Kasmi; Email: gabriel.kasmi@minesparis.psl.eu

Abstract

Photovoltaic (PV) energy is growing rapidly and is crucial for the decarbonization of electric systems. However, centralized registries recording the technical characteristics of rooftop PV systems are often missing, making it difficult to monitor this growth accurately. The lack of monitoring could threaten the integration of PV energy into the grid. To avoid this situation, remote sensing of rooftop PV systems using deep learning has emerged as a promising solution. However, existing techniques are not reliable enough to be used by public authorities or transmission system operators (TSOs) to construct up-to-date statistics on the rooftop PV fleet. This lack of reliability stems from deep learning models being sensitive to distribution shifts. This work comprehensively evaluates the effects of distribution shifts on the classification accuracy of deep learning models trained to detect rooftop PV panels on overhead imagery. We construct a benchmark to isolate the sources of distribution shifts and introduce a novel methodology that leverages explainable artificial intelligence (XAI) and a scale-wise decomposition of the input image and of the model’s decision to understand how distribution shifts affect deep learning models. Finally, based on our analysis, we introduce a data augmentation technique designed to improve the robustness of deep learning classifiers under varying acquisition conditions. Our proposed approach outperforms competing methods and can close the gap with more demanding unsupervised domain adaptation methods. We conclude with practical recommendations for mapping PV systems using overhead imagery and deep learning models.

Information

Type
Methods Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Examples of images of the same PV panels but with different providers and acquisition dates (top: Google, bottom: IGN).


Figure 2. Test images on which a model trained on Google images (downsampled to a ground sampling distance (GSD) of 0.2 m/pixel, “Google baseline”) is evaluated. “Google 0.1 m/pixel” corresponds to the source Google images before downsampling and evaluates the effect of varying image resolution. “Google Spatial Shift” corresponds to Google images taken outside of France. “IGN” corresponds to images depicting the same installations as the Google baseline but from a different provider.


Figure 3. Decomposition of a PV panel into scales.


Figure 4. Image and associated two-level dyadic wavelet transform with indications to interpret the wavelet transform of the image. “Horizontal,” “diagonal,” and “vertical” indicate the direction of the detail coefficients. The direction is the same at all levels.
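The dyadic decomposition illustrated in Figure 4 can be sketched with a two-level Haar transform. The following is a minimal, illustrative NumPy version (the wavelet filters actually used in the paper may differ, and sub-band naming conventions vary between libraries):

```python
import numpy as np

def haar_level(img):
    """One level of a 2D Haar transform: returns the approximation plus
    horizontal, vertical, and diagonal detail sub-bands at half resolution."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    approx     = (a[:, 0::2] + a[:, 1::2]) / 2.0
    horizontal = (d[:, 0::2] + d[:, 1::2]) / 2.0
    vertical   = (a[:, 0::2] - a[:, 1::2]) / 2.0
    diagonal   = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return approx, horizontal, vertical, diagonal

img = np.random.rand(64, 64)
a1, h1, v1, d1 = haar_level(img)  # level 1: finest-scale details (32 x 32)
a2, h2, v2, d2 = haar_level(a1)   # level 2: coarser details (16 x 16)

# with this normalization, the four sub-bands of a level sum back to the
# top-left pixel of each 2 x 2 block of the input
assert np.allclose(a1 + h1 + v1 + d1, img[0::2, 0::2])
```

As in the figure, the detail directions are the same at every level; only the scale of the structures they capture changes.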


Figure 5. A scattering propagator $ {U}_J $ applied to $ x $ computes each $ U\left[{\lambda}_1\right]x=\mid x\star {\psi}_{\lambda_1}\mid $ and outputs $ {S}_J\left[\varnothing \right]x=x\star {\phi}_{2^J} $ (black arrow). Applying $ {U}_J $ to each $ U\left[{\lambda}_1\right]x $ computes all $ U\left[{\lambda}_1,{\lambda}_2\right]x $ and outputs $ {S}_J\left[{\lambda}_1\right]x=U\left[{\lambda}_1\right]x\star {\phi}_{2^J} $ (black arrows). Applying $ {U}_J $ iteratively to each $ U\left[p\right]x $ outputs $ {S}_J\left[p\right]x=U\left[p\right]x\star {\phi}_{2^J} $ (black arrows) and computes the next path layer. Figure borrowed from Bruna and Mallat, 2013. Note: in the figure, the input $ x $ corresponds to $ f $, and $ \lambda ={2}^jr $ is a frequency variable corresponding to the $ {j}^{th} $ scale with $ r $ rotations.
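The cascade described in this caption can be illustrated in one dimension. The sketch below uses hypothetical Gabor band-pass filters standing in for $ {\psi}_{\lambda} $ and a Gaussian low-pass standing in for $ {\phi}_{2^J} $; it is an illustration of the modulus-then-average structure, not the 2D implementation used in the paper:

```python
import numpy as np

def gabor(sigma, xi, n=33):
    """Complex band-pass filter (modulated Gaussian), standing in for psi."""
    t = np.arange(n) - n // 2
    g = np.exp(-t**2 / (2 * sigma**2))
    return g * np.exp(1j * xi * t) / g.sum()

def gaussian(sigma, n=33):
    """Low-pass averaging filter, standing in for phi."""
    t = np.arange(n) - n // 2
    g = np.exp(-t**2 / (2 * sigma**2))
    return g / g.sum()

conv = lambda x, h: np.convolve(x, h, mode="same")

x = np.random.rand(256)                        # input signal (f in the figure)
phi = gaussian(8.0)                            # low-pass phi_{2^J}
psi1, psi2 = gabor(4.0, 2.0), gabor(8.0, 1.0)  # band-pass psi_{lambda}

S0 = conv(x, phi)           # zeroth order: x * phi
U1 = np.abs(conv(x, psi1))  # U[lambda1]x = |x * psi_lambda1|
S1 = conv(U1, phi)          # S_J[lambda1]x = U[lambda1]x * phi
U2 = np.abs(conv(U1, psi2)) # U[lambda1, lambda2]x: next path layer
S2 = conv(U2, phi)          # S_J[lambda1, lambda2]x
```

The modulus discards phase while the final low-pass averaging provides local translation invariance, which is why the scattering coefficients are stable, non-negative descriptors.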


Figure 6. Decomposition in the wavelet domain of the important regions for a model’s prediction with the WCAM.


Figure 7. Illustration of the effect of our data augmentation method on a sample of images.


Table 1. F1 score and decomposition into true positive, true negative, false positive, and false negative rates of the classification accuracy of a CNN model trained on Google images (Google baseline) and tested on the three instances of distribution shifts: the GSD (Google 0.1 m/pixel), the geographical variability (Google Spatial Shift), and the acquisition conditions (IGN).
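For readers less familiar with these metrics, the decomposition reported in the tables can be computed from binary predictions as follows (an illustrative helper, not the paper’s evaluation code):

```python
def confusion_rates(y_true, y_pred):
    """F1 score plus true/false positive/negative rates for binary labels
    (1 = PV panel present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    pos, neg = tp + fn, tn + fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / pos if pos else 0.0  # recall is the true positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"F1": f1,
            "TPR": recall,
            "TNR": tn / neg if neg else 0.0,
            "FPR": fp / neg if neg else 0.0,
            "FNR": fn / pos if pos else 0.0}
```

For example, `confusion_rates([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])` yields an F1 score of 2/3, with a false negative rate of 1/3: a missed panel lowers both the recall (TPR) and the F1 score.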


Table 2. F1 score and decomposition into true positive, true negative, false positive, and false negative rates of the classification accuracy of the Scattering Transform model trained on Google images and deployed on IGN images. The best results are bolded.


Figure 8. Analysis with the WCAM of the CNN’s prediction on an image no longer recognized as a PV panel.


Table 3. F1 score and decomposition into true positive, true negative, false positive, and false negative rates for models trained on Google with different mitigation strategies, evaluated on IGN images. The Oracle corresponds to a model trained on IGN images with standard augmentations. Best results are bolded, second-best results are underlined, values highlighted in red indicate the worst performance, and values in orange indicate the second-worst performance.


Table 4. F1 score and true positive, true negative, false positive, and false negative rates, computed on the Google dataset. ERM was trained on Google images and the Oracle on IGN images.


Figure A1. Model explanations using the GradCAM (Selvaraju et al., 2020) for some true positives, false positives, true negatives, and false negatives. The redder, the higher the contribution of an image region to the predicted class (1 for true and false positives, and 0 for true and false negatives).


Figure B1. Flowchart of the wavelet scale attribution method (WCAM). Source: Kasmi et al., 2023b.


Figure D1. Visualization of the different data augmentation techniques implemented in this work.


Table E1. F1 score and decomposition into true positive, true negative, false positive, and false negative rates for models trained on Google with different mitigation strategies, evaluated on IGN images. The Oracle corresponds to a model trained on IGN images with standard augmentations. Best results are bolded, second-best results are underlined, values highlighted in red indicate the worst performance, and values in orange indicate the second-worst performance.


Figure E1. Evaluation of the different domain adaptation methods with the WCAM. Each column corresponds to one example. The first and third rows depict the images from Google and IGN, respectively, and the second and fourth rows show the associated WCAMs.


Table F1. F1 score and decomposition into true positive, true negative, false positive, and false negative rates for models trained on Google with different mitigation strategies, evaluated on Google images.


Table F2. F1 score and decomposition into true positive, true negative, false positive, and false negative rates of the classification accuracy of the Scattering Transform model trained on Google images and deployed on IGN images.


Figure G1. Analysis with the WCAM of the CNN’s prediction on an image no longer recognized as a PV panel.


Figure G2. Analysis with the WCAM of the CNN’s prediction on an image no longer recognized as a PV panel.


Figure G3. Analysis with the WCAM of the CNN’s prediction on an image that remains insensitive to varying acquisition conditions.


Figure G4. WCAMs on IGN of models trained on Google with different augmentation techniques.


Figure G5. Evaluation of the different domain adaptation methods with the WCAM. Each column corresponds to one example. The first and third rows depict the images from Google and IGN, respectively, and the second and fourth rows show the associated WCAMs.


Figure G6. Evaluation of the different domain adaptation methods with the WCAM. Each column corresponds to one example. The first and third rows depict the images from Google and IGN, respectively, and the second and fourth rows show the associated WCAMs.

Author comment: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R0/PR1

Comments

Dear Editor and Co-Guest editors,

We are pleased to submit the manuscript of our work “Space-scale Exploration of the Poor Reliability of Deep Learning Models: the Case of the Remote Sensing of Rooftop Photovoltaic Systems” to this Special Collection of Environmental Data Science. This manuscript is an enriched version of our work “Can We Reliably Improve the Robustness to Image Acquisition of Remote Sensing of PV Systems?”, accepted as a poster at the “Tackling Climate Change with Machine Learning” workshop during NeurIPS 2023.

Deep learning algorithms have been extensively used in recent years to detect rooftop PV systems from aerial images. However, the data produced by these algorithms are unreliable, as deep learning models are sensitive to distribution shifts. In practical terms, this means that a model trained on a given dataset generalizes poorly to new images, preventing, for instance, the update of the rooftop PV registry of a given location.

This work introduces a novel methodology based on explainable artificial intelligence (XAI) to understand the sensitivity of deep learning models trained to detect rooftop photovoltaic (PV) systems on aerial imagery. We then propose a data augmentation technique to mitigate this sensitivity and draw some practical recommendations regarding the training process and the choice of the training data.

This work improves our understanding of the limitations of deep learning algorithms in applied settings and introduces a methodology to alleviate these limitations. Therefore, it paves the way for using deep learning models to address the lack of information regarding small-scale photovoltaic (PV) systems, ultimately favoring their insertion into the grid. We believe that our manuscript is a good fit for the Special Collection as it presents research at the intersection of machine learning and climate change by contributing to lifting limitations of deep learning algorithms applied in climate change-related topics (in our case, the integration of renewable energy sources such as PV).

We confirm that neither the manuscript nor any parts of its content are currently under consideration for publication with or published in another journal. All authors have approved the manuscript and agree with its submission to Environmental Data Science.

Best regards,

The Authors.

Review: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R0/PR2

Conflict of interest statement

No competing financial or non-financial interests.

Comments

Article review:

Title: Space-scale Exploration of the Poor Reliability of Deep Learning Models: the Case of the Remote Sensing of Rooftop Photovoltaic Systems

Overall:

Detecting rooftop photovoltaic (PV) installations from aerial imagery is a critical machine learning (ML) task with significant implications for addressing climate change. By facilitating the integration, planning, and monitoring of rooftop PV systems, this task supports the global transition to renewable energy. Deep learning (DL) methods, which have driven considerable advances in image segmentation and object detection within mainstream computer vision, can also address crucial challenges in this less-explored domain. However, the focus of ML research on more popular tasks has led to a limited understanding of the performance, reliability, and robustness of existing and emerging methods when applied to specialized areas like remote sensing for the renewable energy sector, as explored in this study.

This study contributes by identifying domain-specific challenges that existing ML techniques must overcome to improve rooftop PV detection. Among these challenges, a key issue highlighted by the authors is the distribution shift between training and testing data, specifically, the out-of-distribution performance of current ML and DL models. This is a significant concern in applying ML to domains where data is not identically and independently distributed (iid), a common assumption in traditional ML approaches based on empirical risk minimization (ERM).

To address this problem, various techniques have been developed, including those based on geometric DL and physics-informed ML. These methods aim to encode desired symmetries into ML models, often providing more advanced alternatives to data augmentation, which may now be less effective in many contexts. Further competing methods are found in domain adaptation.

Despite these advances, the authors' contribution is valuable in advancing our understanding of the specific challenges within this application domain. However, there is a need to broaden the scope and refine the narrative to more effectively position this work within the broader ML landscape.

1. Introduction:

The length of the introduction is great. It is concise yet informative, making it accessible to a broader non-expert audience while effectively conveying the paper’s main focus. However, the narrative surrounding the broader issue of distribution shifts in data, particularly how real-world data often deviates from the iid assumption, and the resulting challenges for ML and DL methods under today’s prevalent paradigms, including empirical risk minimization (ERM), could be further refined. The challenge here is to maintain simplicity and clarity for readers without a deep ML background.

Detailed Suggestions:

• Narrative Enhancement: Identifying rooftop PV installations, but also potential rooftop areas without PV installations, is crucial for monitoring PV growth. Solving this task not only helps in tracking current PV adoption but also in promoting further expansion, which is essential for accelerating the global energy transition. Additionally, identifying these areas is vital for safe integration into the power system, including planning necessary grid expansions, storage capacities, and other infrastructure.

• Page 2, Line 14: Clarify what is meant by “generated data” and “reliability.” Specifically, provide context on how the data was generated and what aspects of reliability are being assessed.

• Page 2, Line 15: Define what is meant by a “deep learning-based registry of rooftop PV systems.” Provide a brief explanation or context to help the reader understand this concept.

• Page 2, Line 17: Consider whether advancing super-resolution imaging techniques challenge the proposed minimum resolutions. Refer to relevant research, such as this https://www.jmlr.org/papers/volume23/21-0635/21-0635.pdf paper, to provide a more nuanced discussion.

• Page 2, Lines 18-19: Discuss how this https://www.tandfonline.com/doi/full/10.1080/07038992.2024.2363236#abstract might relate to or challenge your work. This could add depth to your argument.

• Narrative Enhancement: I agree that understanding the impact of distribution shifts on the generalization performance of ML models, particularly for rooftop PV detection, is critical. However, the discussion would benefit from connecting this to the broader issue of generalization performance across various ML tasks under ERM techniques. Consider this https://proceedings.mlr.press/v139/koh21a to explicitly address this connection.

• Page 2, Line 24: Use the word “highlight” instead of “disentangle” for clarity and precision.

• Page 2, Line 25: When stating, “to understand how these shifts affect the deep learning models,” be more specific. Clarify what aspects of the models' performance or behavior you are investigating, as this will help readers grasp the significance of your analysis here.

• Page 2, Lines 26-29: If the solutions you propose are agnostic to the specific DL models used, meaning they are generally applicable across different architectures, this should be explicitly highlighted. It would strengthen your argument and the relevance of your work.

• The repositories for code and model data are well-organized. Great job on this.

2. Related Work:

• Page 3, Line 28: Use “depends” instead of “depending”.

• I agree that understanding the underlying reasons for poor performance is crucial in designing better ML-based solutions. The authors make this point effectively. However, it would be beneficial to verify whether other studies on rooftop PV detection have also identified these reasons, even if they did not explicitly focus on “poor reliability.” There might be existing literature that addresses similar challenges.

• Page 3, Lines 38-52: Several important aspects seem to be missing here.

First, to provide a more complete understanding of the problem, it is essential to discuss the importance of the iid assumption in the context of widely used ERM algorithms in ML. Breaking this assumption is a well-known cause of poor generalization performance, regardless of the specific application domain. Additionally, it is well-established that real-world data, especially remote-sensing data, often does not satisfy the iid assumption.

Second, a discussion on epistemic uncertainty is missing in this context. Addressing this type of uncertainty is crucial when dealing with non-iid data, as it directly impacts model reliability and generalization.

Third, there is a lack of discussion about methods that go beyond data augmentation, such as geometric DL and physics-informed ML. These methods encode symmetries (such as invariances) directly into the model architecture, reducing sample complexity and often outperforming data augmentation. For instance, training data augmentation is increasingly seen as a less sophisticated approach to encoding invariances compared to methods like Group Equivariant Convolutional Networks (G-CNNs). G-CNNs, for example, can inherently account for not just translation (which CNNs handle through translation equivariance) but also rotations and reflections in relevant patterns, here PV panels across images. For further reference, see https://proceedings.mlr.press/v48/cohenc16.html
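As a point of comparison for the reviewer’s remark on G-CNNs, rotation invariance can also be obtained without architectural changes by averaging a model’s score over a rotation group at test time. The toy NumPy sketch below illustrates this for the C4 group of 90° rotations; the `score` function is a hypothetical stand-in for any classifier:

```python
import numpy as np

def rot90_invariant(score_fn, img):
    """Average a classifier score over the four 90-degree rotations of the
    input. The result is exactly invariant to the C4 rotation group: a cheap
    test-time alternative to encoding the symmetry in the architecture,
    as G-CNNs do."""
    return float(np.mean([score_fn(np.rot90(img, k)) for k in range(4)]))

# toy stand-in for a classifier score: mean of the top-left quadrant
# (clearly not rotation invariant on its own)
score = lambda im: im[: im.shape[0] // 2, : im.shape[1] // 2].mean()

img = np.random.rand(8, 8)
s = rot90_invariant(score, img)
s_rot = rot90_invariant(score, np.rot90(img))  # same score on a rotated input
```

Unlike a G-CNN, this only enforces invariance of the final score (and costs four forward passes), rather than building equivariance into every intermediate representation.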

3. Data

It would be helpful to provide more detail on how the datasets are utilized in the study. Specifically, it would be beneficial to explain how the double annotations for the 7,686 PV systems are incorporated into your analysis. Additionally, consider offering a brief overview of the empirical experiments conducted, highlighting what each dataset is intended to demonstrate or what comparisons are being made between the datasets. This context would clarify the role of each dataset in your study and enhance the reader’s understanding of the overall research approach.

4. Methods

The methods section currently lacks clarity and has significant room for improvement. In its current form, it is, in my opinion, the weakest part of the manuscript.

• The first sentence is an excellent summary of the research question. Consider making this statement equally explicit and concrete in the Introduction and Discussion sections.

• Page 5, Lines 25-27: What are the three factors or methods you’re referring to? It is helpful to mention them briefly for clarity: geographic shifts, acquisition conditions, resolution.

• Page 5, Lines 25-37: This paragraph is difficult to understand, and I found it challenging to grasp the methodology you’re proposing to answer your overarching research question. Except for the first sentence, the rest of the paragraph needs significant revision to provide a clear, step-by-step overview of your methodology and experimental design.

4.1 Disentangling the Sources of Distribution Shifts in Overhead Images

• Page 5, Lines 42-43: It appears that only geographical variability can be observed independently when resolution and acquisition conditions are held constant. In cases where resolution and acquisition conditions vary, it seems that only their combined effects can be evaluated based on the available data, without further processing. If this is correct, it’s important to clarify that the analysis isn’t fully disentangled across these three factors based on the raw data, and that this limitation can be overcome by up-/down-sampling the images.

• If I understand correctly, your approach involves isolating the effect of resolution variations by training the model on a standardized 20cm/pixel resolution. You then assess performance across both the same resolution (20cm/pixel) and a different resolution (10cm/pixel). Additionally, you test performance under identical acquisition conditions (Google images upsampled to 20cm/pixel) and differing acquisition conditions (IGN images at the original 20cm/pixel). These tests are conducted both within a geographically in-distribution context and a geographically out-of-distribution context. It’s important to emphasize that your out-of-distribution analysis refers solely to the geographical aspect (inside France vs. outside France), and does not explicitly account for other factors such as the distribution of pixels within these images. I encourage using “spatial shifts” or “geographic shifts” instead of in-distribution and out-of-distribution here, as the latter conflict with the in-distribution and out-of-distribution statements with respect to the resolution and acquisition conditions of training vs. testing data. This confused me personally until I read through to the end of the paper.

• Question: Would your observations be consistent if you were to reverse the approach, including a downsample of IGN images to conduct a more comprehensive four-fold analysis of your current baselines? This could include:

◦ Baseline 1 (current): Google 20cm/pixel

◦ Baseline 2: Google 10cm/pixel

◦ Baseline 3: IGN 20cm/pixel

◦ Baseline 4: IGN 10cm/pixel

This might be an interesting idea to strengthen your observations and potentially uncover new, important relationships.
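The resampling step underlying such a four-fold comparison can be as simple as 2×2 block averaging. A minimal sketch (the resampling kernel actually used in the paper may differ):

```python
import numpy as np

def downsample_2x(img):
    """Halve the resolution by 2x2 block averaging,
    e.g. 0.1 m/pixel -> 0.2 m/pixel over the same ground extent."""
    h = img.shape[0] - img.shape[0] % 2
    w = img.shape[1] - img.shape[1] % 2
    img = img[:h, :w]                      # crop to an even size
    return 0.25 * (img[0::2, 0::2] + img[0::2, 1::2]
                   + img[1::2, 0::2] + img[1::2, 1::2])

tile_10cm = np.random.rand(400, 400)       # 40 m x 40 m tile at 0.1 m/pixel
tile_20cm = downsample_2x(tile_10cm)       # same tile at 0.2 m/pixel
```

Block averaging preserves the mean radiometry of the tile, which keeps the two resolutions comparable apart from the loss of fine-scale detail.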

5. Results

5.1. Deep Models Are Mostly Sensitive to Varying Acquisition Conditions, Leading to an Increase in False Negatives

This section is concise and clear. Well done. However, the inconsistent use of “resolution” and “ground sampling distance” throughout the paper can cause confusion. The term “ground sampling distance” is technical jargon that might make the paper harder to read for a broader audience. I recommend using “resolution” consistently throughout the manuscript.

5.2. The Scattering Transform Shows That Clean, Fine-Scale Features Are Transferable but Poorly Discriminative

These results are intriguing. However, there’s an inconsistency in how you highlight the best performance in your tables. For consistency, consider using the same markers across all tables (e.g., bold for best, yellow for second-best, and red for worst or problematic cases).

5.3. CNNs Are Sensitive to the Distortion of Coarse-Scale Discriminative Features

• Other options for measuring the distance between images include calculating the Euclidean distance between images in an embedded vector space after encoding by your CNN, or, on the raw image data, using the JS/KL-divergence between the distribution of SSIMs. Please explain why you chose your specific method for measuring distance between images.

• The relevance of the p-values should be clarified—what should readers infer from them? Similarly, explain what the Pearson Correlation Coefficient (PCC) measures, particularly its role in assessing the linear correlation between two sets of data.

• Page 12, Lines 13-22: This section is difficult to understand and would benefit from improved clarity. Consider restructuring the explanation to make it more accessible.
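For reference, the Pearson correlation coefficient (PCC) mentioned above measures the strength of the linear relationship between two samples. A plain-Python illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient: linear correlation between two
    samples, ranging from -1 (perfect inverse linear relationship)
    to 1 (perfect linear relationship)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, `pearson([1, 2, 3], [2, 4, 6])` evaluates to 1.0 (up to floating-point rounding) and `pearson([1, 2, 3], [3, 2, 1])` to -1.0; an associated p-value then quantifies how likely such a correlation would be under the null hypothesis of no linear relationship.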

5.4. Pathways Towards Improving Robustness to Acquisition Conditions

• There seems to be a discrepancy between your described observations and the numbers in Table 3. Please double-check the data to ensure accuracy in your statements.

• It’s great that you discuss the trade-off between recall and the F1-score, as this provides a fairer and more balanced comparison.

• For the benefit of a broader, non-ML audience, clarify that Recall is the True Positive Rate. It might also be helpful to include this clarification in all relevant results tables.

• Once again, use the same markers for best, worst, etc., across all results tables.

5.4.2. On the Role of the Input Data: Practical Recommendations for Training Data

• The study could benefit significantly from running all experiments in reverse, as previously suggested. Specifically, consider conducting a four-fold set of experiments by down-sampling your IGN images. This could provide additional insights and strengthen your findings. I don’t see this as a requirement though.

General Remarks:

• The suggested approach for improving robustness to acquisition conditions would benefit from a discussion and analysis of domain adaptation as a competing method. While the presented results are still relevant, incorporating this perspective could add depth to the placement of your paper within the larger scope of the ML landscape.

• The Methods section needs significant improvement in clarity. As it stands, it is the weakest part of the manuscript and difficult to navigate. I strongly encourage revising this section to enhance readability and comprehension.

• In contrast, the Results section is well-written and stands out as the strongest part of the manuscript. It effectively communicates the findings.

• Regarding the design of experiments, the choice of terminology for “out-of-distribution” is somewhat misleading. It would be more accurate to use terms like “spatial shift” or “geographic shift” to describe the isolated evaluation of images from France versus those from outside France, rather than implying a broader distribution shift.

• Consider whether the term “ground sampling distance” is necessary, especially if “resolution” suffices to convey the same meaning. Using “resolution” would make the text more accessible to a general audience and avoid unnecessary complexity.

• Lastly, review whether ‘aerial imagery’ or ‘satellite imagery’ would be more appropriate than ‘overhead imagery.’ Choosing the right term will help ensure clarity and precision in describing the data sources.

Review: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

The paper is well-written, well-organized, and offers a thorough review of prior research.

It would be good to include descriptions and plots of the ‘Blurring’ and ‘Blurring + Wavelet Perturbation’ approaches in the main text of the paper, as these are key contributions of the work.

Recommendation: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R0/PR4

Comments

Thank you very much for submitting your manuscript for review. Based on the reviews, we cannot immediately accept this manuscript for publication. However, we would reconsider a revised version based on the reviewer comments.

The reviewers were overall positive in their assessment of the work. However, both reviewers highlighted deficiencies of the Methods section, and we encourage the authors to use these reviews to strengthen the manuscript.

Decision: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R0/PR5

Comments

No accompanying comment.

Author comment: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R1/PR6

Comments

Prof. Claire Monteleoni

Chief Editor

Environmental Data Science journal

January 19th, 2025

Dear Prof. Monteleoni,

We are grateful to you and the reviewers for your thoughtful feedback and for recognizing the potential of our manuscript, “Space-scale Exploration of the Poor Reliability of Deep Learning Models: The Case of the Remote Sensing of Rooftop Photovoltaic Systems”. We confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere.

This paper addresses the challenges of reliably detecting rooftop photovoltaic (PV) systems using deep learning on overhead imagery, a critical task for monitoring PV deployment and supporting grid integration. It comprehensively evaluates the effects of distribution shifts on model performance, introduces a novel methodology leveraging explainable AI and multi-scale analysis, and proposes a data augmentation technique that significantly improves classifier robustness under varying conditions.

The comments provided by the Reviewers have significantly enhanced the quality of our work, and we have addressed them in detail in the attached document. In our revised submission, we have made the following key improvements:

1. Strengthened Methods Section: We have followed the feedback of Reviewer 1 and strengthened our methods by providing additional experiments and checking the consistency of our results. We also introduced a comparison and a discussion with domain adaptation methods, thus broadening the scope of our work.

2. Refined narrative: following the remarks of Reviewer 1, we have refined our narrative and believe that the contributions of our work, within both the machine learning and the environmental sciences landscapes, are now clearer.

We deeply appreciate the reviewers’ and editor’s efforts, which have helped us refine our manuscript and strengthen its impact. We believe that the revised version effectively addresses all the feedback and enhances the clarity and significance of our contributions.

We have no conflicts of interest to disclose. This work is done as a part of my postdoctoral contract, which is sponsored by Réseau de transport d’électricité (RTE France), the French transmission system operator.

Thank you for your time and consideration.

Sincerely,

Dr. Gabriel Kasmi

Postdoctoral Researcher, Laboratoire Observation, Impacts, Energie (O.I.E.) Mines ParisTech

Review: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

Paper is well written and provides necessary details to replicate the research.

Review: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R1/PR8

Conflict of interest statement

Reviewer declares none.

Comments

All issues raised appropriately addressed. Good job.

Recommendation: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R1/PR9

Comments

Dear Authors,

I am pleased to inform you that your manuscript “Space-scale Exploration of the Poor Reliability of Deep Learning Models: The Case of the Remote Sensing of Rooftop Photovoltaic Systems” has been accepted for publication in Environmental Data Science.

We appreciate the revisions you made in response to reviewer feedback, which have strengthened the manuscript further.

We look forward to seeing your work published and hope you will continue to contribute to Environmental Data Science in the future.

Decision: Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems — R1/PR10

Comments

No accompanying comment.