
Machine learning for stochastic parametrization

Published online by Cambridge University Press:  02 January 2025

Hannah M. Christensen*
Affiliation:
Department of Physics, University of Oxford, Oxford, UK
Salah Kouhen
Affiliation:
Department of Physics, University of Oxford, Oxford, UK
Greta Miller
Affiliation:
Department of Physics, University of Oxford, Oxford, UK
Raghul Parthipan
Affiliation:
Department of Computer Science, University of Cambridge, Cambridge, UK; British Antarctic Survey, Cambridge, UK
*Corresponding author: Hannah M. Christensen; Email: hannah.christensen@physics.ox.ac.uk

Abstract

Atmospheric models used for weather and climate prediction are traditionally formulated in a deterministic manner. In other words, given a particular state of the resolved scale variables, the most likely forcing from the subgrid scale processes is estimated and used to predict the evolution of the large-scale flow. However, the lack of scale separation in the atmosphere means that this approach is a large source of error in forecasts. Over recent years, an alternative paradigm has developed: the use of stochastic techniques to characterize uncertainty in small-scale processes. These techniques are now widely used across weather, subseasonal, seasonal, and climate timescales. In parallel, recent years have also seen significant progress in replacing parametrization schemes using machine learning (ML). This has the potential to both speed up and improve our numerical models. However, the focus to date has largely been on deterministic approaches. In this position paper, we bring together these two key developments and discuss the potential for data-driven approaches for stochastic parametrization. We highlight early studies in this area and draw attention to the novel challenges that remain.

Information

Type
Position Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Coarse-graining studies provide evidence for stochastic parametrizations. (a) The pdf of “true” subgrid temperature tendencies derived from a high-resolution simulation, conditioned on the tendency predicted by a deterministic forecast model ($T_{fc}$: colors in legend). (b) Mean “true” tendency conditioned on $T_{fc}$. For this forecast model, positive temperature tendencies are well calibrated, while negative temperature tendencies are biased cold. (c) Standard deviation of the “true” tendency conditioned on $T_{fc}$. For this forecast model, the uncertainty in the “true” tendency increases with the magnitude of the low-resolution forecast tendency. Figure adapted from Christensen (2020).
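
The conditional statistics in panels (b) and (c) can be illustrated with a minimal sketch. The data below are synthetic and the noise model (spread growing with the magnitude of the forecast tendency) is an assumption chosen only to mimic the qualitative behaviour described in the caption; the function name `conditional_stats` and the bin edges are hypothetical and are not taken from Christensen (2020).

```python
import random
import statistics

def conditional_stats(forecast, truth, bin_edges):
    """Mean and standard deviation of `truth` within bins of `forecast`.

    Returns one (lower_edge, upper_edge, mean, std) tuple per bin that
    contains at least two samples.
    """
    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        sample = [t for f, t in zip(forecast, truth) if lo <= f < hi]
        if len(sample) >= 2:
            results.append((lo, hi, statistics.mean(sample), statistics.stdev(sample)))
    return results

# Synthetic data (an assumption for illustration, not from the paper):
# the "true" tendency equals the forecast tendency plus noise whose
# spread increases with the magnitude of the forecast tendency.
rng = random.Random(0)
forecast = [rng.uniform(-4.0, 4.0) for _ in range(20000)]
truth = [f + rng.gauss(0.0, 0.2 + 0.5 * abs(f)) for f in forecast]

stats = conditional_stats(forecast, truth, [-4, -2, 0, 2, 4])
for lo, hi, mean, std in stats:
    print(f"T_fc in [{lo}, {hi}): mean = {mean:.2f}, std = {std:.2f}")
```

Because the bins have finite width, each standard deviation also includes the spread of the forecast tendency within the bin; narrower bins would reduce that contribution.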

Figure 2. Reliability curve for convection occurrence estimated by the random forest (green line), which is close to perfect reliability (grey line). The random forest was developed for use as a stochastic convection trigger function. The circle sizes are proportional to the log of the number of samples per bin; there are many more non-convection events (91%) than convection events (9%). Figure adapted from Miller et al. (submitted).
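
A reliability curve of the kind described in this caption can be computed as in the following sketch, which bins forecast probabilities and compares the mean forecast probability in each bin with the observed event frequency. The data are synthetic and reliable by construction; the function name `reliability_curve`, the number of bins, and the skewed probability distribution (chosen so that events are rare, loosely echoing the class imbalance in the caption) are all assumptions, not details of the random forest in Miller et al.

```python
import random

def reliability_curve(probs, outcomes, n_bins=10):
    """Return (mean forecast probability, observed frequency, count)
    for each non-empty probability bin of equal width."""
    sum_p = [0.0] * n_bins
    sum_o = [0] * n_bins
    count = [0] * n_bins
    for p, o in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sum_p[k] += p
        sum_o[k] += o
        count[k] += 1
    return [
        (sum_p[k] / count[k], sum_o[k] / count[k], count[k])
        for k in range(n_bins)
        if count[k] > 0
    ]

# Synthetic forecasts (assumption): probabilities skewed towards zero so
# that events are rare, with outcomes drawn from the forecast probability
# itself, so the forecasts are perfectly reliable by construction.
rng = random.Random(1)
probs = [rng.random() ** 10 for _ in range(50000)]
outcomes = [1 if rng.random() < p else 0 for p in probs]

curve = reliability_curve(probs, outcomes)
for mean_p, obs_freq, n in curve:
    print(f"forecast {mean_p:.2f}  observed {obs_freq:.2f}  n = {n}")
```

For a reliable forecast the points lie close to the diagonal, where the observed frequency equals the forecast probability.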

Figure 3. (a) The classic cellular automaton, the Game of Life, after 70 rule iterations from random initial conditions. (b) A set of rules discovered using a genetic algorithm, after 70 iterations from a random initial condition. (c) An example of fitness convergence for a genetic algorithm scheme.
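
For reference, the standard Game of Life rule shown in the first panel can be sketched as below. The grid size, the initial live fraction, and the periodic boundary conditions are assumptions made for illustration; the caption does not specify them, and this sketch does not reproduce the genetic-algorithm rule search of the other panels.

```python
import random

def life_step(grid):
    """Apply one Game of Life update to a 2D list of 0/1 cells,
    using periodic (wrap-around) boundaries."""
    n_rows, n_cols = len(grid), len(grid[0])
    new = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            neighbours = sum(
                grid[(i + di) % n_rows][(j + dj) % n_cols]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            # A dead cell with exactly 3 live neighbours is born;
            # a live cell with 2 or 3 live neighbours survives.
            if neighbours == 3 or (grid[i][j] == 1 and neighbours == 2):
                new[i][j] = 1
    return new

# Random initial condition (assumed 32 x 32 grid, about half the cells alive).
rng = random.Random(2)
grid = [[1 if rng.random() < 0.5 else 0 for _ in range(32)] for _ in range(32)]

for _ in range(70):  # 70 rule iterations, as in the caption
    grid = life_step(grid)

print("live cells after 70 iterations:", sum(map(sum, grid)))
```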

Figure 4. Cloud fractions as a function of height (model levels) for forecasts of 200 hours. Observed cloud fraction is compared to that from the operational deterministic parametrization, and to two stochastic ML models. The Baseline ML model is a simple feed-forward neural network, whilst the Mixed ML model separates the modeling task into a binary categorization problem and a continuous prediction problem. These are probabilistic models, and three sampled trajectories are shown for both. The Mixed ML model is better able to create and remove cloud. Adapted from Parthipan (2024).

Author comment: Machine learning for stochastic parametrization — R0/PR1

Comments

Dear Editor,

I can confirm this paper has already undergone review for the Climate Informatics 2024 conference. I uploaded a brief response to the reviewers' comments with my other files. No changes were required.

All the best,

Hannah

Review: Machine learning for stochastic parametrization — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

>Summary: In this section please explain in your own words what problem the paper addresses and what it contributes to solving it.

The paper is a position paper, outlining the potential benefits of applying machine learning approaches to stochastic parameterisation of weather and climate models, and summarising some of the progress made so far.

>Relevance and Impact: Is this paper a significant contribution to interdisciplinary climate informatics?

Yes, I believe this paper is a significant contribution to interdisciplinary climate informatics. While there are no novel results in this paper (results appear from very recent literature, including a coauthor’s PhD thesis), it clearly identifies a research gap, collects the relevant early literature and sets out a path forward. While this is nearer the edge of my field expertise, the paper seems well structured, comprehensive in its treatment of this new and fast-evolving subject, and is well written. I learned a lot, and I think this would be a valuable addition to the literature.

>Detailed Comments

My only surprise, and slight sadness, is that this paper (alongside, I think, much of the literature on stochastic parameterisation) does not engage more with the literature on uncertainty quantification (UQ) for complex models. This isn’t the fault of the authors, but a wider issue of siloing in the two fields.

Review: Machine learning for stochastic parametrization — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

>Summary: In this section please explain in your own words what problem the paper addresses and what it contributes to solving it.

This position paper reviews the current status of stochastic parameterization and then explains the progress and potential of using machine learning to further improve stochastic parameterization in weather and climate modeling frameworks.

>Relevance and Impact: Is this paper a significant contribution to interdisciplinary climate informatics?

Using machine learning to improve weather prediction and climate projection is a highly active research domain in climate informatics. Stochastic parameterization is important for better capturing model uncertainty and for better achieving the goals of weather and climate modeling.

>Detailed Comments

This position paper provides a great overview of parameterization for weather and climate modeling. The authors carefully summarized existing developments in stochastic parameterization, including recent advances in using machine learning for this purpose. They articulated the major questions faced by the community in using machine learning for stochastic parameterization, which provides a good starting point for other researchers to tackle these challenges. I am also very encouraged to see the mention of the new development of MUMIP, which could provide training datasets to advance the domain.

My only suggestion would be that the authors briefly address the explainability of machine learning and its potential impact on applications to stochastic parameterization.

Recommendation: Machine learning for stochastic parametrization — R0/PR4

Comments

This article was accepted into the Climate Informatics 2024 Conference after the authors addressed the comments in the reviews provided. It has been accepted for publication in Environmental Data Science on the strength of the Climate Informatics review process.

Decision: Machine learning for stochastic parametrization — R0/PR5

Comments

No accompanying comment.