
Uncertainty quantification for deep learning

Published online by Cambridge University Press:  17 December 2025

Peter Jan van Leeuwen*
Affiliation:
Department of Atmospheric Science, Colorado State University, USA
Jui-Yuan Christine Chiu
Affiliation:
Department of Atmospheric Science, Colorado State University, USA
Chen-Kuang Kevin Yang
Affiliation:
Department of Atmospheric Science, Colorado State University, USA
*
Corresponding author: Peter Jan van Leeuwen; Email: peter.vanleeuwen@colostate.edu

Abstract

We present a critical survey on the consistency of uncertainty quantification used in deep learning and highlight partial uncertainty coverage and many inconsistencies. We then provide a comprehensive and statistically consistent framework for uncertainty quantification in deep learning, targeting regression problems, that accounts for all major sources of uncertainty: input data, training and testing data, neural network weights, and machine-learning model imperfections. We systematically quantify each source by applying Bayes’ theorem and conditional probability densities and introduce a fast, practical implementation method. We demonstrate its effectiveness on a simple regression problem and a real-world application: predicting cloud autoconversion rates using a neural network trained on aircraft measurements from the Azores and guided by a two-moment bin model of the stochastic collection equation. In this application, uncertainty from the training and testing data dominates, followed by input data, neural network model, and weight variability. Finally, we highlight the practical advantages of this methodology, showing that explicitly modeling training data uncertainty improves robustness to new inputs that fall outside the training data, and enhances model reliability in real-world scenarios.

Information

Type
Methods Paper
Creative Commons
Creative Commons License - CC BY-NC
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Training and testing data (blue dots) and example model (orange line).


Figure 2. Posterior pdfs derived from Bagging (black vertical line), quantile regression (black curve), and the new methodology (blue line), for input values −2 (left), 2 (middle), and 5 (right). The pdfs can be compared with the samples in Figure 1. The orange vertical line is the prediction using the true model. (Note that this is not the true prediction because the input value has uncertainty.) In the left panel, the Bagging vertical lines fall around −0.07, outside the plot range. Bagging shows a degenerate pdf, and the quantile regression pdf is too narrow because uncertainties in training, testing, and new input data are ignored.


Figure 3. The normalized likelihood values (i.e., the importance weights) for the $ 400 $ neural networks.
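The normalized likelihood values in Figure 3 are importance weights obtained by normalizing per-network likelihoods so they sum to one. The following is a minimal, hypothetical sketch of that normalization for an ensemble of 400 networks; the log-likelihood values here are random placeholders, not the paper's actual values, and the normalization uses the standard log-sum-exp trick for numerical stability rather than any specific code from the authors.

```python
import numpy as np

# Placeholder log-likelihoods, one per trained network (400 in Figure 3).
rng = np.random.default_rng(0)
log_likelihoods = rng.normal(size=400)

# Normalize in log space (log-sum-exp trick): subtracting the maximum
# before exponentiating avoids overflow/underflow.
shifted = log_likelihoods - log_likelihoods.max()
weights = np.exp(shifted)
weights /= weights.sum()

# The weights now form a discrete probability distribution over networks.
assert np.isclose(weights.sum(), 1.0)
```

A near-uniform weight distribution indicates all ensemble members contribute; a few dominant weights indicate degeneracy of the kind discussed for particle filters.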


Figure 4. Examples of total uncertainty pdf in the output autoconversion rate for four input vectors. The blue curves are the total uncertainty pdfs; the red bar is the autoconversion rate calculated by the baseline neural network ($ {w}_0 $) without uncertainty quantification; black bars are the Bagging output samples. Note the wide variety of shapes of the blue uncertainty pdfs and the small spread in the black bars, demonstrating the inadequacy of the Bagging approach to represent uncertainty.


Figure 5. Same as Figure 4, showing full uncertainty pdfs (blue curves) and the contribution from input uncertainty and model uncertainty (black curves).


Figure B1. Examples of total uncertainty pdf in the output autoconversion rate for 50 input vectors. The blue curves are the total uncertainty pdfs; the black curves are derived by considering only the uncertainty in the new input vector and the uncertainty in the neural network. The red bar is the autoconversion rate calculated by traditional deep learning without uncertainty quantification. The black bars are the output samples generated using Bagging and used as a measure of uncertainty for the Bagging approach.

Author comment: Uncertainty quantification for deep learning — R0/PR1

Comments

Dear editor,

This paper discusses uncertainty quantification in machine learning, providing a critical review of existing methodologies and developing a new method that does not have the many shortcomings of the existing methodologies. Specifically, the new method takes all uncertainties into account in a systematic and consistent manner, which has been lacking in the existing literature. Also, existing reviews have been found to be less critical of UQ methods than they perhaps should have been. The manuscript compares UQ methods on a simple example to provide a basic understanding of the workings of the different methods, and on a real-world example.

This manuscript has been submitted under the same title and reviewed before, but we apologize for missing the deadline for reply. We have incorporated all comments in the new version and largely rewritten the manuscript to enhance clarity.

We hope the manuscript can be a useful paper for EDS,

Best wishes,

Peter Jan

Review: Uncertainty quantification for deep learning — R0/PR2

Conflict of interest statement

No competing interests

Comments

Dear Editor, Dear Authors,

First, I would like to apologise for the long delay in delivering my review.

I have read the manuscript with great interest and I found it almost ready to be published as it is. The study attempts a very comprehensive, and much needed, critical assessment of the uncertainty quantification capability in deep learning models. The Authors leverage the tools and language of data assimilation and root the treatment in a Bayesian framework, which makes the manuscript rigorous and smooth.

I have only very minor points and a few requests for clarification, all listed below in order of appearance in the text.

1. I found the Introduction too long. I do think, however, that the content is all necessary and functional. Thus I would suggest splitting the Introduction into subsections, for example on “Bagging”, “Monte Carlo dropout”, etc. Also, I think the Authors could better acknowledge diffusion models in their discussion, possibly by giving them a specific subsection in the introduction.

2. Page 4, line 3. “... weigh...” should read “... weight”.

3. Final discussion in Section 2. Can the difference in the neural network architecture be fully described by a differing number of weights? Please expand a bit.

4. Two lines after Eq. 20: “...under...” should instead read “...in...”?

5. The discussion at the end of Section 3.3.2 alludes to the degeneracy typically encountered in particle filters. In that context, degeneracy is also caused by the repeated multiplication of smaller and smaller weights as a consequence of the temporal iterations, which are missing in the present setting. Could you clarify my possible misunderstanding?

6. The impact of choosing the prior with power four in Eq. 37 is not clear. How much would the results and conclusions depend on that choice?

7. Page 19, line 6. The figure number is missing.

8. Last line of the first paragraph of Section 6: a point is missing in its end.

9. Page 21, second paragraph, line 5: “by by”.

Alberto Carrassi

Review: Uncertainty quantification for deep learning — R0/PR3

Conflict of interest statement

N/A

Comments

The authors propose a unified Bayesian framework for uncertainty quantification (UQ) in deep learning, with applications to a toy regression problem and a real-world atmospheric science dataset (ACE-ENA autoconversion). The key contribution is an explicit factorization of predictive uncertainty into components from input data, training/testing data, network weights, and model structure, along with a novel equal-likelihood training scheme to mitigate ensemble weight degeneracy. The work is potentially impactful for geoscientific machine learning, where calibrated and interpretable UQ is critical. I should note that I do not have the expertise to evaluate the method, its robustness, or its appropriateness for the task, and the use-case is fairly narrow, so the scope of my review is somewhat limited.

As mentioned, I am only vaguely familiar with the appropriate literature, but it would be useful to provide a more complete discussion of UQ approaches (especially as the paper is described as a survey). For example, while the introduction discusses bagging, deep ensembles, and MC dropout, it omits several widely adopted methods like conformal prediction, Laplace approximations (Laplace Redux, SWAG), and evidential deep learning.

I’m also a bit confused by calling this a ‘survey paper’ since it introduces a substantive new method (I think). This should be reworded to focus on the methodological contribution.

I found the comment about 20-100 ensemble members to be a bit thin. It is not demonstrated in the presented experiments and should be softened or supported with additional results.

Out of curiosity I asked ChatGPT about the methods and it provided the following feedback which may or may not be useful:

The framework assumes independence between training and testing data uncertainties, uses heuristic neighborhood definitions in test-data space, and relies on equalizing training loss as a proxy for uniform proposal weights. These assumptions should be explicitly tested or caveated. Sensitivity analyses (neighborhood size, input covariance, tolerance around target loss) are needed.

Minor points:

• Page 1: “United State of America” → “United States of America.”

• Page 8: “bbaseline” → “baseline.”

• Page 9: “uncertainty..” → “uncertainty.”

• Page 12: “This futher” → “This further.”

• Page 13: “deppends” → “depends.”

• Page 10: “Fig. ??”. Replace with correct figure numbering.

• Ensure consistent use of “dropout” (not “drop-out”), “pdf” (probability density function), and definitions for acronyms (e.g., ARM, EMOS if added).

Recommendation: Uncertainty quantification for deep learning — R0/PR4

Comments

Your manuscript has been favorably reviewed. The topic of uncertainty quantification (UQ) is timely, and your work is well supported by a strong theoretical foundation rooted in the Bayesian framework. The reviewers have raised a few minor points that I recommend you address in a revised version.

In particular, please consider clarifying the classification of the manuscript as a “Survey paper.” You may wish to justify this choice more explicitly or consider reclassifying it under a more appropriate article type.

Decision: Uncertainty quantification for deep learning — R0/PR5

Comments

No accompanying comment.

Author comment: Uncertainty quantification for deep learning — R1/PR6

Comments

Dear editor,

Please find enclosed a revised version of ‘Uncertainty quantification for Deep Learning’. We would like to change the submission type from ‘Survey’ to ‘Methods’ as indicated by you and by one of the reviewers.

We hope this version meets the standard of Environmental Data Science.

Best wishes,

Peter Jan van Leeuwen, on behalf of all authors

Review: Uncertainty quantification for deep learning — R1/PR7

Conflict of interest statement

N/A

Comments

The authors have addressed my concerns. I cannot comment on the methodology itself but the example application seems solid.

Review: Uncertainty quantification for deep learning — R1/PR8

Conflict of interest statement

No Conflict

Comments

I am happy with the Authors' responses and with the new version of the manuscript that I consider now suitable for publication.

Recommendation: Uncertainty quantification for deep learning — R1/PR9

Comments

Dear author,

I am pleased to recommend your manuscript for acceptance. The comments from the reviewers were well addressed. The paper reads very nicely and will be very useful.

Decision: Uncertainty quantification for deep learning — R1/PR10

Comments

No accompanying comment.