
Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data

Published online by Cambridge University Press:  02 January 2025

Maria Carolina Novitasari*
Affiliation:
Department of Electronic and Electrical Engineering, University College London, London, United Kingdom
Johannes Quaas
Affiliation:
Leipzig Institute for Meteorology, Universität Leipzig, Leipzig, Germany ScaDS.AI - Center for Scalable Data Analytics and AI, Leipzig, Germany
Miguel R. D. Rodrigues
Affiliation:
Department of Electronic and Electrical Engineering, University College London, London, United Kingdom
*
Corresponding author: Maria Carolina Novitasari; Email: maria.novitasari.20@ucl.ac.uk

Abstract

High-resolution simulations such as the ICOsahedral Non-hydrostatic Large-Eddy Model (ICON-LEM) can be used to understand the interactions among aerosols, clouds, and precipitation processes that currently represent the largest source of uncertainty involved in determining the radiative forcing of climate change. Nevertheless, due to the exceptionally high computing cost required, this simulation-based approach can only be employed for a short period within a limited area. Despite the potential of machine learning to alleviate this issue, the associated model and data uncertainties may impact its reliability. To address this, we developed a neural network (NN) model powered by evidential learning, which is easy to implement, to assess both data (aleatoric) and model (epistemic) uncertainties, and applied it to satellite observation data. By differentiating whether uncertainties stem from the data or the model, we can adapt our strategies accordingly. Our study focuses on estimating autoconversion rates: the process by which small droplets (cloud droplets) collide and coalesce into larger droplets (raindrops). This process is one of the key contributors to precipitation formation in liquid clouds and is crucial for a better understanding of cloud responses to anthropogenic aerosols and, subsequently, climate change. We demonstrate that incorporating evidential regression enhances the model’s credibility by accounting for uncertainties without compromising performance or requiring additional training or inference. Additionally, the uncertainty estimation shows good calibration and provides valuable insights for future enhancements, potentially encouraging more open discussions and exploration, especially in the field of atmospheric science.
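For context, under the Normal-Inverse-Gamma parameterization of deep evidential regression (following Amini et al., 2020; symbols as in Figure 2, not taken verbatim from the manuscript), the prediction and the two uncertainties have closed forms:

```latex
\mathbb{E}[\mu] = \gamma \ \text{(prediction)}, \qquad
\mathbb{E}[\sigma^2] = \frac{\beta}{\alpha-1} \ \text{(aleatoric)}, \qquad
\mathrm{Var}[\mu] = \frac{\beta}{\upsilon(\alpha-1)} \ \text{(epistemic)}.
```

Note that the epistemic term is the aleatoric term divided by the evidence parameter $ \upsilon $, so model uncertainty shrinks as accumulated evidence grows, while data uncertainty does not.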

Information

Type
Application Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. General framework. The left side of the image illustrates the climate science-based procedures we apply to our dataset to generate input–output pairs for training and testing. The center of the image represents our ML framework, which also includes uncertainty quantification. The right side depicts the satellite observation data we used and the procedure to predict the autoconversion rates from the satellite data, including its inherent uncertainty. ¹ ICOsahedral Non-hydrostatic Large-Eddy Model; ² Uncertainty Quantification; ³ Moderate Resolution Imaging Spectroradiometer.


Figure 2. The architecture of our evidential NN model, with cloud effective radius ($ {r}_e $) as the input and the autoconversion rate ($ {a}_u $ or $ \gamma $) as the output, along with three other evidential parameters: $ \upsilon $, $ \alpha $, and $ \beta $.
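The four outputs named in the caption are the parameters of a Normal-Inverse-Gamma distribution. As a rough, framework-agnostic sketch of such an output head — a plain-NumPy illustration under assumed weights `W`, `b`, not the authors' implementation — the constrained outputs and the standard closed-form uncertainty split look like:

```python
import numpy as np

def evidential_head(z, W, b):
    """Map a hidden representation z to the four Normal-Inverse-Gamma
    parameters (gamma, upsilon, alpha, beta). gamma is unconstrained;
    softplus enforces upsilon > 0, beta > 0, and alpha > 1."""
    raw = z @ W + b                          # shape (..., 4)
    softplus = lambda x: np.log1p(np.exp(x))
    gamma = raw[..., 0]                      # predicted mean (autoconversion rate)
    upsilon = softplus(raw[..., 1])          # evidence for the mean
    alpha = softplus(raw[..., 2]) + 1.0      # shape of the variance posterior
    beta = softplus(raw[..., 3])             # scale of the variance posterior
    return gamma, upsilon, alpha, beta

def uncertainties(upsilon, alpha, beta):
    """Closed-form decomposition for the NIG posterior."""
    aleatoric = beta / (alpha - 1.0)               # E[sigma^2], data uncertainty
    epistemic = beta / (upsilon * (alpha - 1.0))   # Var[mu], model uncertainty
    return aleatoric, epistemic
```

In this parameterization the epistemic estimate is simply the aleatoric one divided by the evidence `upsilon`, which is why the two can be reported separately at no extra training or inference cost.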


Figure 3. Histogram of the log-transformed autoconversion rates.


Figure 4. The spread-skill plot of the deep evidential regression model (based on the shallow NN model) with varying evidential regularizer coefficients.


Figure 5. Evaluation of uncertainty estimation on simulation data (ICON) via discard test.


Table 1. Evaluation of autoconversion rate prediction results on simulation data (ICON) using evidential ML models, including both shallow neural network (NN) and deep neural network (DNN) architectures, across various testing scenarios: (1) ICON-LEM Germany, (2) Cloud-top ICON-LEM Germany, and (3) Cloud-top ICON-NWP Holuhraun


Figure 6. Visualization of the autoconversion prediction results of ICON-LEM Germany and ICON-NWP Holuhraun. The left side of the image depicts the ground truth, the middle shows the prediction results obtained from the NN model, and the right side displays the difference between the two. The top panel (a) compares the ground truth and predictions from ICON-LEM Germany at a resolution of 1 km, while the second panel (b) focuses on cloud-top information only at a resolution of 1 km. The third (c) and fourth (d) panels illustrate the comparison between ground truth and predictions for the ICON-NWP Holuhraun data at a horizontal resolution of 2.5 km, again focusing on cloud-top information only.


Figure 7. Mean, standard deviation (Std), median, and percentiles (p25, p75) of cloud-top ICON and MODIS variables over Germany (a and b) and Holuhraun (c and d): cloud effective radius (CER) and autoconversion rates (Aut).


Figure 8. (a) Aleatoric and (b) epistemic uncertainty estimates of the autoconversion rates prediction (kg m⁻³ s⁻¹) on atmospheric simulation data (ICON) over Germany.


Figure 9. (a) Aleatoric and (b) epistemic uncertainty estimates of the autoconversion rates prediction (kg m⁻³ s⁻¹) on satellite data (MODIS) over Germany.


Figure 10. (a) Aleatoric and (b) epistemic uncertainty estimates of the autoconversion rates prediction (kg m⁻³ s⁻¹) on atmospheric simulation data (ICON) over Holuhraun.


Figure 11. (a) Aleatoric and (b) epistemic uncertainty estimates of the autoconversion rates prediction (kg m⁻³ s⁻¹) on satellite data (MODIS) over Holuhraun.

Author comment: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R0/PR1

Comments

Cover Letter for the Environmental Data Science Journal Submission

Subject: Journal Cover Letter Submission for “Cloudy with a Chance of Uncertainty: Autoconversion Rates Forecasting via Evidential Regression from Satellite Data”

Dear Prof. Claire Monteleoni,

I hope you’re well.

I am writing to submit a manuscript entitled “Cloudy with a Chance of Uncertainty: Autoconversion Rates Forecasting via Evidential Regression from Satellite Data” for consideration for publication in Environmental Data Science, as part of the Collection Issue in collaboration with Climate Change AI, which hosted a workshop at NeurIPS 2023.

Our research showcases the potential of predicting one of the key processes of precipitation formation—autoconversion rates—directly from satellite data using evidential regression, with the inclusion of both data and model uncertainty estimation. Our current work represents a continuation and enhancement of our previous research efforts. By seamlessly incorporating uncertainty estimation into the prediction of autoconversion rates, evidential regression offers a new perspective that goes beyond prediction. This enhanced credibility and improved understanding of uncertainties have the potential to foster greater transparency and trustworthiness in machine learning results, paving the way for a broader discussion, particularly within the domain of atmospheric science. Furthermore, our work has been selected as a spotlight at CCAI Workshop at NeurIPS 2023 for additional consideration.

We confirm that this manuscript is not under consideration for publication elsewhere and that all co-authors have approved the submission. Additionally, we disclose that there are no conflicts of interest regarding this work.

Thank you for considering our manuscript for publication. We sincerely hope that the reviewers and editors find our research valuable to the scientific community.

Should you require any further information or have any queries regarding our submission, please do not hesitate to contact me at maria.novitasari.20@ucl.ac.uk.

We eagerly await your feedback and the possibility of sharing our work with the readers of Environmental Data Science.

Sincerely,

Maria Carolina Novitasari

Review: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

The application context concerning aerosol–cloud–precipitation interactions is a complex and still poorly understood subject. Understanding and effectively managing prediction uncertainty is crucial. The study deals with a model for direct autoconversion rate extraction from satellite observations. The authors distinguish between aleatoric (related to the data) and epistemic (related to the model) uncertainties.

-As the spatial observations considered are not direct measurements of the parameter of interest, I don’t quite understand how the hidden variables that have an impact on the spatial observation come into play. They will come into play in the model, which is therefore not a deterministic model that links inputs and outputs and may, for example, be ambiguous.

-The consequences of the Gaussian uncertainty hypothesis in the context of the application are not questioned.

-What does the expression ‘The degree of uncertainty’ (l.33) mean? For machine learning models we generally use the notions of bias and variance of the estimator. I doubt that the learning algorithm is the cause of the model’s error.

-The model learns the relationships between the input variable (cloud effective radius (CER)) and the variable of interest (autoconversion rates) from simulated data. Satellite data is only used in the final application stage. The error introduced by the differences between the simulated data used for training and the satellite data used for application is not mentioned. How does it come into play?

-Figure 1 does not provide a clear understanding of the different stages, the legend is imprecise and the learning section is not explicit.

-Section 2.1 is not sufficiently developed and does not provide a clear understanding of what data we want to model (images, time series, features), the size of the datasets, or the learning, validation and testing strategies. “The specific time window of investigation spans from 09:55 UTC to 13:20 UTC”: what time step? How many images? The elements in the rest of the paper help us to understand better, but understanding would be easier if the description of the data were more complete in the data section.

-‘We split the data into 80% for training and validation, and 20% for testing’, but the total number of samples (pixels or images) and how this split is achieved are not specified. Are the data sets independent?

-In section 2.2 a pre-processing stage is applied to the data, and it seems that the model chosen is based on a Gaussian hypothesis. This normalisation hypothesis is therefore fundamental. Is the Gaussian character of the data after normalisation verified? What about the satellite data? Are their statistical properties similar?

-Section 2.3 repeats the equations from section 3.3 of Alexander Amini’s paper identically but does not provide the information needed to understand how the model described is applied. “Given a set of …”: what exactly are x and y in the application?

-In section 2.4 “Testing Scenarios” the distinction between data sets is well specified, but their role in the study is not obvious. There is mention of ‘testing scenarios’, but what are the learning scenarios? The number of points is indicated, but what is the variability of the data from a meteorological point of view? How many distinct cloud systems are used for the different stages of model development (learning/validation/testing)?

-L128-130. What is normalised? The error? The output?

-Table 1 is not in the right place. It is not clear on which data the model tested in the three test scenarios was trained and validated.

-Does Table 4 not display the same performance as indicated in Tables 1, 2, and 3, but in a more concise manner?

-Figure 4: “the autoconversion rates obtained from the simulation output and the predicted autoconversion rates from satellite data demonstrated statistical concordance”. This remark does not exactly correspond to what is observed in Figure 4. The discrepancies between the simulated and observed distributions of input (CER) and output (Aut) data are not visible in Figure 4. Why can we consider that the model trained on simulations is suitable for spatial observations? To what extent can these discrepancies be considered negligible?

-Figures 6 to 9: The data is not symmetrically distributed around the modeled uncertainties; is this because the Gaussian assumption is not valid? A comment is missing to explain this situation.

All of these comments likely stem from a writing problem: the paper fails to provide the elements necessary for a full understanding of the application context. Specifically, the description of the data and the learning strategy is highly inadequate.

Review: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

The study presents a computationally efficient approach to analyze the autoconversion process in liquid clouds using model simulations and satellite data. Evidential learning is employed to estimate both aleatoric and epistemic uncertainties, reducing overall computational costs and enhancing trust in the neural network’s predictions. Results indicate that aleatoric (data) uncertainty contributes significantly to overall uncertainty, suggesting that modifying the model architecture may not yield significant improvements. This is also supported to some extent by evaluating the performance of two different neural networks (i.e., varying the number of parameters). Instead, prioritizing improvements to data quality or incorporating crucial features like cloud optical thickness or cloud droplet number concentration per layer may be more effective, although such information is lacking in satellite data. Future efforts may well focus on estimating these features to improve autoconversion rate estimation and reduce uncertainty in observational datasets.

The present study is of interest to the atmospheric science community in particular, but to a broader extent to the Earth science community applying machine learning (ML) methods. The paper is technically correct and scientifically sound. However, the paper clearly needs re-organization and further details of the datasets and the methods. The results, nevertheless, support the authors’ conclusions. The concluding remarks nicely summarize the work, as well as present its broader implications. Although the paper requires a major revision, I would recommend the present study for publication after my comments below are addressed.

Specific comments:

Abstract. The goal of the study is not clearly defined here. However, it is well presented in the concluding remarks. I suggest re-writing the abstract to better address the overall goal, key points and broader implications of this analysis. Currently, reading the abstract it is not clear if the focus is on improving a physical model via ML, better understanding of physical processes, trust in ML algorithms, or a combination of the above.

L31-35. How can ML alleviate the outlined challenges? Please elaborate.

L57-61. What issues? It is not clear. I think that the key issue this method addresses is the known overconfident predictions of ML algorithms, e.g., neglecting the variance and stochasticity of atmospheric physical processes. This needs to be described here in the introduction, along with the current work in this regard.

L62-67. Novitasari et al. (2023) needs to be better described, as this is the motivation and starting point of this analysis. A reader should not need to read it to understand what was done (starting point), the limitations (motivation) and the contribution here.

Figure 1 would really benefit from a detailed text that walks the reader through the framework. Currently, it is really up to the reader to understand the logic and the approach.

L93-96. Where does the ICON-NWP dataset fit into the general framework (Fig. 1)? What is the resolution of the simulation? Which microphysics scheme is used? Time steps and number of samples?

L105-109. Single input-output neural net?

L114. Which additional standard scaling technique? There are a number of them which can result in very different normalization of the data, and subsequently affecting the results.

L134. A sketch of the evidential neural net would be very informative.

L136-143. Is this all shown? Where?

L159-160. Explain the parameters determining the normal inverse gamma distribution. This is important to understand how the uncertainties are captured.

L183. It would be beneficial to describe in a nutshell the aleatoric and epistemic uncertainties and how they are captured by the different parameters.

L206-214. ICON-NWP simulation is not well described. It needs a substantial revision.

Sect. 3.2 describes in detail the neural nets, but I believe this belongs to the method section.

Table 4 is largely redundant, right? Most information is given in Tables 1-3.

L280-281. What do R2 and SSIM tell us? Help the reader here.

L280-281. Is it surprising or expected that the performance for the ICON-NWP simulation drops? Why? Despite being a different simulation, the neural net performs well, why is so? These are interesting points not discussed in the manuscript.

Figure 5. Left panel shows the same spread-skill for l=1e-6 as in Fig. 2? If so, this is not needed, right? The panel on the right is not well explained; at least I do not understand the plot and cannot evaluate the performance of the neural net. In any case, Sect. 3.5 is a repetition of, and belongs to, Sect. 3.1.

L337-339. Sure, since autoconversion rates also depend on other factors and environmental conditions not seen by the neural net, right? Therefore, this is not surprising. Actually, can this be included in the epistemic uncertainty (e.g., architecture of the model, i.e., missing key input variables)? I believe this needs a discussion.

Minor comments:

L64. “… we have not…” -> “… they have not…” or “… they neglected…”

L86. Sentence describing the icosahedral grid.

L113. Interpretability and stability of the neural net?

L116. R2 is the coefficient of determination?

L257-260. Redundant. Remove?

Tables 1-3. Combining these tables into a single one would be clearer and save space.

Recommendation: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R0/PR4

Comments

As you can see, both reviewers have acknowledged that your work is relevant for publication, regarding both the application and the method. Congratulations! Nevertheless, some details are missing concerning the description of the method, the explanation of the results, and the assumptions made. I recommend that you address the reviewers’ comments in a revised version.

Decision: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R0/PR5

Comments

No accompanying comment.

Author comment: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R1/PR6

Comments

Dear Prof. Monteleoni,

Thank you for your email and for the detailed feedback on our manuscript entitled “Cloudy with a Chance of Uncertainty: Autoconversion Rates Forecasting via Evidential Regression from Satellite Data” (EDS-2024-0003). We appreciate the time and effort of the reviewers and the editor in reviewing our submission.

We have carefully reviewed the comments provided and have revised our manuscript accordingly. We have attached a detailed response letter addressing each of the reviewers' and the editor’s comments. The revised manuscript and response letter are included in the submission package.

We hope that the revisions meet the expectations and address the concerns raised. Please let us know if further adjustments are needed.

Thank you once again for considering our revised manuscript for publication in Environmental Data Science. We look forward to your feedback.

Sincerely,

Maria Carolina Novitasari

Review: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

Thank you for the remarkable work you’ve done to answer the various questions and comments I had on my reading of the initial paper. I really appreciated the effort to educate in the response I received and the genuine desire to improve the paper.

Recommendation: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R1/PR8

Comments

Thank you for the revised version, and the thorough responses with a very nice discussion about the definition of the sources of uncertainty.

Decision: Cloudy with a chance of uncertainty: autoconversion rates forecasting via evidential regression from satellite data — R1/PR9

Comments

No accompanying comment.