
A temporal stochastic bias correction using a machine learning attention model

Published online by Cambridge University Press:  02 January 2025

Omer Nivron*
Affiliation:
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
Damon J. Wischik
Affiliation:
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
Mathieu Vrac
Affiliation:
Laboratoire des Sciences du Climat et de l’Environnement, Institut Pierre-Simon Laplace, Paris, France
Emily Shuckburgh
Affiliation:
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
Alex T. Archibald
Affiliation:
Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
*Corresponding author: Omer Nivron; Email: on234@cam.ac.uk

Abstract

Climate models are biased with respect to real-world observations. They usually need to be adjusted before being used in impact studies. The suite of statistical methods that enable such adjustments is called bias correction (BC). However, BC methods currently struggle to adjust temporal biases because they mostly disregard the dependence between consecutive time points. As a result, climate statistics with long-range temporal properties, such as the number of heatwaves and their frequency, cannot be corrected accurately. This makes it more difficult to produce reliable impact studies on such climate statistics. This article offers a novel BC methodology to correct temporal biases. This is made possible by rethinking the philosophy behind BC. We introduce BC as a time-indexed regression task with stochastic outputs. Rethinking BC enables us to adapt state-of-the-art machine learning (ML) attention models and thereby learn different types of biases, including temporal asynchronicities. With a case study of the number of heatwaves in Abuja, Nigeria, and Tokyo, Japan, we show more accurate results than current climate model outputs and alternative BC methods.

Information

Type
Methods Paper
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Inference Task for Estimating Future Observations: The top panel outlines the data available to us for estimating the potential continuation of observed values post-1989 (dashed vertical line), based on historical observations (green line) and both past and future climate model outputs (blue line). The bottom panel displays a red line representing one possible continuation, sampled from the forecasting model, which estimates $ {P}_{\theta}\left({\boldsymbol{O}}^{\star }|\boldsymbol{o},\boldsymbol{g}\right) $.

Figure 2. Construction of Training Data for Tokyo, Japan: Column I displays full sequences from 1948 to 1988 for climate models (blue) and observations (green). Dashed orange lines indicate the selected slice for each data row, a process termed “Window Selection.” Column II zooms into the selected window, featuring a vertical black line at a randomly selected “prediction index” ($ {t}_j $), from which we aim to estimate the observations until $ {t}_h $. Column III illustrates the “Pruning” operation, where time points before and after $ {t}_j $ are randomly selected from both the climate model and observations. The observed values to be estimated are concealed, and the chosen time points, called “prediction tufts,” are highlighted with red tufts. This column adapts the data for use in ML sequential models. Note that we have dropped the example index $ i $ for readability, but $ k,j,h $ change from one example to the next.
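The window selection and pruning steps described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `make_training_example`, its parameters, and the choice to sample paired (model, observation) context points are all assumptions made for the example.

```python
import random

def make_training_example(model, obs, window_len, n_context, n_target, rng):
    """Build one training row from two aligned daily series (hypothetical helper).

    model, obs: equal-length lists of daily values.
    Returns the prediction index, the context points, the target inputs
    (observations hidden) and the hidden target observations.
    """
    assert len(model) == len(obs) >= window_len
    # Window selection: pick a random slice of the full sequence.
    start = rng.randrange(len(obs) - window_len + 1)
    times = list(range(start, start + window_len))
    # Prediction index t_j: observations from here on are to be estimated.
    j = rng.randrange(n_context, window_len - n_target + 1)
    t_j = times[j]
    # Pruning: keep a random subset of time points on each side of t_j.
    context_t = sorted(rng.sample(times[:j], n_context))
    target_t = sorted(rng.sample(times[j:], n_target))
    context = [(t, model[t], obs[t]) for t in context_t]
    target_inputs = [(t, model[t]) for t in target_t]  # observed values concealed
    target_labels = [obs[t] for t in target_t]
    return t_j, context, target_inputs, target_labels
```

Calling it with, say, `window_len=60`, `n_context=20` and `n_target=10` yields one row; repeating the call with fresh random draws gives rows whose window, prediction index and retained points all differ, as the caption notes.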

Figure 3. Illustration of the inference procedure. Given a trained Taylorformer model with parameters $ \theta $, we generate a time-series sequentially. In the top row, we generate a value $ {o}_1^{\star } $, which is then plugged into the condition set in the second row, and so forth.
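The sequential generation in Figure 3 amounts to sampling one value at a time and feeding it back into the condition set. The sketch below shows that loop only. The `predict` callable is a stand-in for a trained model, and the Gaussian predictive distribution and the `toy_predict` rule are assumptions for illustration rather than a description of the Taylorformer.

```python
import random

def sample_trajectory(predict, context, future_inputs, rng):
    """Draw one trajectory by sampling each future value in turn.

    predict(condition, t, g) -> (mean, std) stands in for a trained model.
    context: list of (time, model value, observed value) triples.
    future_inputs: list of (time, model value) pairs in time order.
    """
    condition = list(context)  # copy so the caller's context is not modified
    trajectory = []
    for t, g in future_inputs:
        mean, std = predict(condition, t, g)
        o_star = rng.gauss(mean, std)      # sample o*_t from the predictive distribution
        trajectory.append(o_star)
        condition.append((t, g, o_star))   # the sample joins the condition set
    return trajectory

def toy_predict(condition, t, g):
    """Hypothetical predictor: model value plus the most recent offset."""
    _, last_g, last_o = condition[-1]
    return g + (last_o - last_g), 0.5
```

Running `sample_trajectory` several times with the same inputs gives several distinct trajectories, which is what makes a box plot of a derived statistic possible.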

Figure 4. Comparative Analysis of “number of heatwaves” Trends in Tokyo, Japan (1989–2008): The number of periods featuring at least three consecutive days with temperatures exceeding 22°C is shown. The IPSL climate model predictions are represented by red triangles, which generally underestimate the observations. Actual observations are indicated by a vertical orange line. The Taylorformer temporal BC is depicted using horizontal box plots, with whiskers indicating the 1st and 3rd quartiles. Markers for other BC methods are indicated at the bottom of the figure.
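The heatwave statistic used in these figures, the number of periods with at least three consecutive days above a threshold, can be computed with a simple run counter. This is a minimal sketch. Counting each uninterrupted run once, however long it lasts, is an assumption about how the periods are tallied, and `count_heatwaves` is a name chosen for the example.

```python
def count_heatwaves(temps, threshold, min_days=3):
    """Count runs of at least min_days consecutive values above threshold."""
    count = 0
    run = 0
    for temp in temps:
        if temp > threshold:
            run += 1
            if run == min_days:  # count each run once, when it first qualifies
                count += 1
        else:
            run = 0
    return count
```

For example, `count_heatwaves([23, 23, 23, 21, 23, 23, 20, 25, 25, 25, 25], 22)` returns 2: the two-day spell in the middle is too short, and the final four-day spell counts once.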

Figure 5. Comparative Analysis of “number of heatwaves” Trends in Abuja, Nigeria (1989–2008): The number of periods featuring at least three consecutive days with temperatures exceeding 24°C is shown. The IPSL climate model predictions are represented by red triangles, which overestimate the observations. A vertical orange line indicates actual observations. The Taylorformer temporal BC is depicted using horizontal box plots, with whiskers indicating the 1st and 3rd quartiles. Markers for other BC methods are indicated at the bottom of the figure.

Figure 6. Changes in the distribution of daily maximum temperatures above the 90th quantile in Tokyo: This plot compares the distribution of temperatures between two periods: 1968–1988 (X-axis) and 1989–2008 (Y-axis).

Figure 7. Changes in the distribution of daily maximum temperatures above the 90th quantile in Abuja: This plot compares the distribution of temperatures between two periods: 1968–1988 (X-axis) and 1989–2008 (Y-axis).

Figure 8. The distribution of daily observed (ERA5) maximum temperatures ($ {}^{\circ}C $) in Tokyo, Japan changed from 1968–1988 (blue histogram) to 1989–2009 (orange histogram) in February (left) and July (right).

Figure 9. The distribution of daily observed (ERA5) maximum temperatures ($ {}^{\circ}C $) in Abuja, Nigeria changed from 1968–1988 (blue histogram) to 1989–2009 (orange histogram) in February (left) and July (right).

Figure 10. The distribution of the modeled (IPSL) ensemble average of maximum daily temperatures in Tokyo, Japan changed between 1968–1988 (blue histogram) and 1989–2009 (orange histogram) in February (left) and July (right). Qualitatively, the change differs from that in the observed record (see Figure 8).

Figure 11. The distribution of the modeled (IPSL) ensemble average of maximum daily temperatures in Abuja, Nigeria changed between 1968–1988 (blue histogram) and 1989–2009 (orange histogram) in February (left) and July (right). Qualitatively, the change differs from that in the observed record (see Figure 9).

Figure 12. The distribution of observed (ERA5) daily maximum temperature above $ 22{}^{\circ}C $ in Tokyo shifted between 1968–1988 (blue histogram) and 1989–2009 (orange histogram) in July (left) and August (right).

Figure 13. The distribution of observed (ERA5) daily maximum temperature above $ 30{}^{\circ}C $ in Abuja shifted between 1968–1988 (blue histogram) and 1989–2009 (orange histogram) in January (left) and February (right).

Figure 14. The distribution of modeled (IPSL) daily maximum temperature above $ 22{}^{\circ}C $ in Tokyo shifted between 1968–1988 (blue histogram) and 1989–2009 (orange histogram) in January (left) and February (right). Qualitatively, the change differs from that in the observed record (see Figure 12).

Figure 15. The distribution of modeled (IPSL) daily maximum temperature above $ 30{}^{\circ}C $ in Abuja shifted between 1968–1988 (blue histogram) and 1989–2009 (orange histogram) in January (left) and February (right). Qualitatively, the change differs from that in the observed record (see Figure 13).

Figure 16. The task involves predicting future values using training points (“+”) located left of the vertical black line. The predicted mean (solid blue line) aligns closely with these points and quickly reverts to zero beyond this boundary. The light blue shading indicates the $ 95\% $ confidence interval. Sample trajectories from the predicted probability model are shown as colored lines to the right of the black line. These demonstrate that target variables distant from the training data maintain the same marginal distribution, mean, and variance, as evidenced by the vertical slice marked by green dashed lines. This time-series view highlights correlations among points, offering insights not visible from a simple histogram of training points.

Figure 17. Predictions are made using training points (“+”) left of the vertical black line. The predicted mean (solid blue line) aligns accurately with these points and, upon extrapolation to the right, adapts to typical “seasonality” behaviors instead of becoming non-informative (refer to Figure 16). The light blue shading indicates the $ 95\% $ confidence interval for each x-axis value. Various sample trajectories from the probability model are shown as colored lines, while the dashed green lines illustrate the marginal distribution at corresponding x-axis values.

Figure 18. This figure uses a toy example of a Gaussian Process-based BC regression (details in Appendix 13) to illustrate correction of systematic time asynchronicities between “climate model” and observations. Left panel: The pseudo observations (blue points) are both vertically and horizontally shifted relative to the pseudo climate model (red points). Right panel: The GP model’s mean (golden solid line) successfully captures the vertical and horizontal shifts, as shown by the alignment of the pseudo observations (solid blue line) with the pseudo climate model’s full time-series (solid red line). Traditional quantile mapping approaches with a fixed reference period would fail to recognize that the processes are identical, barring a mean shift.

Table 1. Average MSE and Log-Likelihood for Abuja and Tokyo (1989–2008)

Figure 19. QQ plots for Tokyo, Japan 1989–2008 for maximum daily temperatures (Tmax) for observations (X-axis) versus BC methods (Y-axis): The climate model (IPSL) quantiles (top left plot) are very different from the observations (ERA5). For mean shift (bottom left), EQM (bottom right), and our method Temporal BC (top right), the quantiles are captured quite well with minor differences.
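For readers unfamiliar with the two baseline methods named in this caption, the sketch below gives generic textbook forms of a mean shift and an empirical quantile mapping (EQM). It is not the implementation used in the paper: the function names and the rank-matching rule for choosing the observed order statistic are assumptions for illustration.

```python
from bisect import bisect_right

def mean_shift(value, model_hist, obs_hist):
    """Shift a model value by the difference of the historical means."""
    offset = sum(obs_hist) / len(obs_hist) - sum(model_hist) / len(model_hist)
    return value + offset

def empirical_quantile_map(value, model_hist, obs_hist):
    """Map a model value to the observed value at the same empirical rank."""
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    # Number of historical model values not exceeding the input value.
    rank = bisect_right(model_sorted, value)
    # Matching observed order statistic: ceil(rank * n_obs / n_model) - 1, clamped.
    idx = (rank * len(obs_sorted) + len(model_sorted) - 1) // len(model_sorted) - 1
    idx = min(len(obs_sorted) - 1, max(0, idx))
    return obs_sorted[idx]
```

Both functions treat each value on its own, using only the historical marginal distributions, so neither uses information about the order of days in the series.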

Figure 20. QQ plots for Abuja, Nigeria 1989–2008 for maximum daily temperatures (Tmax) for observations (X-axis) versus BC methods (Y-axis): The climate model (IPSL) quantiles (top left plot) are very different from the observations. For mean shift (bottom left) and EQM (bottom right), the quantiles are captured more accurately, but our method Temporal BC (top right) seems to be the most accurate.

Figure 21. Partial auto-correlation plot for Tokyo, Japan, 1989–2008 for lags 2–14 and for a randomly chosen initial condition run. Our Taylorformer temporal BC (horizontal blue lines) outputs multiple trajectories and, therefore, multiple partial auto-correlation values at each lag; it seems to underestimate the observed partial auto-correlation (horizontal orange line) for lags 2 and 3 and capture it well for higher lags. See the markers at the bottom of the figure to interpret the partial auto-correlation for other BC methods.
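The partial auto-correlation shown in these plots can be estimated from a series in several ways. The sketch below uses the Durbin-Levinson recursion on sample autocorrelations, which is one standard estimator and not necessarily the one used for the figure. The helper names and the AR(1) test series are assumptions for illustration.

```python
import random

def autocorr(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]

def partial_autocorr(x, max_lag):
    """Return the partial auto-correlations for lags 1..max_lag."""
    r = autocorr(x, max_lag)
    pacf = []
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf

def simulate_ar1(n, phi, seed):
    """Hypothetical test series: x_t = phi * x_{t-1} + Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x
```

For an AR(1) series the partial auto-correlation should be close to the AR coefficient at lag 1 and close to zero at higher lags, which gives a quick sanity check of the estimator. Applied to each sampled trajectory in turn, it yields one value per trajectory at every lag.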

Figure 22. Partial auto-correlation plot for Abuja, Nigeria, 1989–2008 for lags 2–14 and for a randomly chosen initial condition run. Our Taylorformer temporal BC (horizontal blue lines) outputs multiple trajectories and, therefore, multiple partial auto-correlation values at each lag; it seems to capture the observed partial auto-correlation (horizontal orange line) for all lags except lag 3. See the markers at the bottom of the figure to interpret the partial auto-correlation for other BC methods.

Figure 23. Comparison of trend evolution of Climate Model, Observations and Temporal BC in Tokyo, Japan (1989–2008): This figure illustrates the 5-year averages of daily maximum temperatures, with each plot representing a different initial-condition run. The Temporal BC model is depicted by a blue horizontal line, which does not necessarily align with the trend of the initial-condition climate model, shown as a red horizontal line. Furthermore, neither model consistently follows the trend observed in the actual data, represented by an orange horizontal line.

Figure 24. Comparison of trend evolution of Climate Model, Observations and Temporal BC in Abuja, Nigeria (1989–2008): This figure illustrates the 5-year averages of daily maximum temperatures, with each plot representing a different initial-condition run. The Temporal BC model is depicted by a blue horizontal line, which does not necessarily align with the trend of the initial-condition climate model, shown as a red horizontal line. Furthermore, neither model consistently follows the trend observed in the actual data, represented by an orange horizontal line.

Figure 25. Comparative Analysis of “number of heatwaves” Trends in Tokyo, Japan (1989–2008): The number of periods featuring at least three consecutive days with temperatures exceeding 24°C is shown. The IPSL climate model predictions are represented by red triangles, which generally underestimate the observations. Actual observations are indicated by a vertical orange line. The Taylorformer temporal BC is depicted using horizontal box plots, with whiskers indicating the 1st and 3rd quartiles. Markers for other BC methods are indicated at the bottom of the figure.

Figure 26. Comparative Analysis of “number of heatwaves” Trends in Tokyo, Japan (1989–2008): The number of periods featuring at least three consecutive days with temperatures exceeding 26°C is shown. The IPSL climate model predictions are represented by red triangles, which generally underestimate the observations. A vertical orange line indicates actual observations. The Taylorformer temporal BC is depicted using horizontal box plots, with whiskers indicating the 1st and 3rd quartiles. Markers for other BC methods are indicated at the bottom of the figure.

Figure 27. Comparative Analysis of “number of heatwaves” Trends in Tokyo, Japan (1989–2008): The number of periods featuring at least three consecutive days with temperatures exceeding 22°C is shown. The IPSL climate model predictions are represented by red triangles, which generally underestimate the observations. A vertical orange line indicates actual observations. The Taylorformer temporal BC is depicted using horizontal box plots, with whiskers indicating the 1st and 3rd quartiles. Markers for other BC methods are indicated at the bottom of the figure.

Figure 28. Comparative Analysis of “number of heatwaves” Trends in Abuja, Nigeria (1989–2008): The number of periods featuring at least three consecutive days with temperatures exceeding 26°C is shown. The IPSL climate model predictions are represented by red triangles, which generally overestimate the observations. A vertical orange line indicates actual observations. The Taylorformer temporal BC is depicted using horizontal box plots, with whiskers indicating the 1st and 3rd quartiles. Markers for other BC methods are indicated at the bottom of the figure.

Figure 29. Comparative Analysis of “number of heatwaves” Trends in Abuja, Nigeria (1989–2008): The number of periods featuring at least three consecutive days with temperatures exceeding 28°C is shown. The IPSL climate model predictions are represented by red triangles, which generally overestimate the observations. A vertical orange line indicates actual observations. The Taylorformer temporal BC is depicted using horizontal box plots, with whiskers indicating the 1st and 3rd quartiles. Markers for other BC methods are indicated at the bottom of the figure.

Figure 30. Comparative Analysis of “number of heatwaves” Trends in Abuja, Nigeria (1989–2008): The number of periods featuring at least three consecutive days with temperatures exceeding 30°C is shown. The IPSL climate model predictions are represented by red triangles, which generally overestimate the observations. A vertical orange line indicates actual observations. The Taylorformer temporal BC is depicted using horizontal box plots, with whiskers indicating the 1st and 3rd quartiles. Markers for other BC methods are indicated at the bottom of the figure.

Figure 31. When collapsing a time-series (top) to a histogram (bottom), the temporal information is lost – a toy illustration.
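The point of Figure 31 can be checked numerically: two series with identical histograms can have very different run-based statistics. The toy values and the `count_runs` helper below are invented for illustration and are not taken from the paper's data.

```python
def count_runs(values, threshold, min_len=3):
    """Count runs of at least min_len consecutive values above threshold."""
    count = 0
    run = 0
    for v in values:
        if v > threshold:
            run += 1
            if run == min_len:
                count += 1
        else:
            run = 0
    return count

# Two five-day warm spells separated by cooler days.
series = [20] * 10 + [25] * 5 + [20] * 10 + [25] * 5
# The same 30 values rearranged so that warm days never occur back to back.
reordered = [25, 20] * 10 + [20] * 10

same_histogram = sorted(series) == sorted(reordered)
spells_original = count_runs(series, 22)
spells_reordered = count_runs(reordered, 22)
```

Here `same_histogram` is true, yet the original ordering contains two qualifying warm spells and the rearranged one contains none, so any method that only matches histograms cannot distinguish the two.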

Author comment: A temporal stochastic bias correction using a machine learning attention model — R0/PR1

Comments

Dear Editor,

We wish to submit an original research article entitled “A Temporal Stochastic Bias Correction using a Machine Learning Attention Model” for consideration by the Environmental Data Science Journal.

In this paper, we tackle the problem of biases in climate models. Specifically, we propose a novel bias correction methodology to correct temporal biases. With a case study of heatwave duration statistics in Abuja, Nigeria, and Tokyo, Japan, we show more accurate results than current climate model outputs and alternative BC methods. Through this work, we aim to advance the correction of climate model biases and, subsequently, the impact studies that depend on them.

We have no conflicts of interest to disclose.

Please address all correspondence concerning this paper to me at on234@cam.ac.uk

Review: A temporal stochastic bias correction using a machine learning attention model — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

>Summary: In this section please explain in your own words what problem the paper addresses and what it contributes to solving it.

Many climate model bias adjustment methods correct marginal distributional properties, but do not, or only implicitly, correct the temporal dependence structure present in the climate model. In this paper the authors introduce a novel bias adjustment method to address biases in temporal dependencies, which may be relevant for climate statistics that depend on temporal structure, for example the duration of heatwaves. They introduce a probability model for bias adjustment, which they parametrize by the Taylorformer model, a transformer-based deep learning model. They introduce ways of training the model and apply it to the bias correction of a CMIP6 model at two city locations. They find lower RMSE for their proposed correction method as well as good performance in modelling 3-day temperature threshold exceedances.

>Relevance and Impact: Is this paper a significant contribution to interdisciplinary climate informatics?

The paper contains some interesting ideas, however the evaluation, benchmarking and embedding into the literature is insufficient to judge the quality of the contribution. A major revision or resubmission of the paper is necessary to address these points. I expand upon this in the detailed comments.

>Detailed Comments

The paper under review contains some interesting ideas; however, I have some major concerns, in short: 1) insufficient engagement with the literature and wrong benchmarks, 2) wrong classification of the method, 3) insufficient evaluation.

1) Insufficient engagement with the literature and wrong benchmarks

I believe the authors’ engagement with and discussion of previous and related literature in the bias adjustment space is insufficient. They claim that they develop the first bias adjustment method that is designed to correct temporal dependencies. However, I suggest the authors consider the works by Vrac and Friederichs 2015, Mehrotra and Sharma 2015, 2016, 2019 and others cited therein, which all addressed this problem previously. In order to warrant publication, I believe it is necessary that the authors 1) discuss these alternative methods and embed their own suggested method in the literature, 2) benchmark their bias adjustment method against at least one or two previous bias adjustment methods designed to address this problem. This is connected to the second point.

2) Wrong embedding within other methods and wrong benchmarks

The task as it is currently applied, mapping a CMIP6 model to reanalysis data on a city level, is not bias adjustment but rather downscaling (joint with bias adjustment), as it comes with a substantial scale increase (~8 times resolution increase). This is a different task, and many downscaling methods already explicitly consider and correct the temporal structure (see Maraun and Widmann 2018 for an overview of downscaling, or consider the weather generator literature, e.g. Chandler 2020, which uses similar methods). However, none of the benchmark bias adjustment methods used in the paper is suitable for downscaling (in particular, they do not add any local scale variability, which has long been argued to be crucial for downscaling, e.g. Maraun 2013), and thus it is not surprising that they are outperformed. I believe that if the authors wish to cast their method as joint bias adjustment and downscaling, they need to choose more adequate benchmark methods and discuss the downscaling literature. Alternatively, if they wish to present a bias adjustment method, they should make the correction without a scale increase and study more grid points. This relates to the last major point.

3) Insufficient evaluation

The evaluation of the method presented in the paper is insufficient to judge the quality of the contribution. As of now, the evaluation mostly considers RMSE and boxplots of heatwave occurrence. As bias correction is not a prediction/reconstruction task, RMSE is not a particularly useful metric. Consider, for example, a climate emulator 1 (CE1) that perfectly models the internal variability of the climate system and a climate emulator 2 (CE2) that only ever outputs the average temperature. CE2 is an unrealistic representation of the climate system but will generally have lower RMSE (reduced variance of simulators might lead to lower RMSE but a worse representation of extremes). The boxplot of heatwave occurrence is useful, but in the reviewer’s opinion this does not warrant the conclusion of “stunning results”. I suggest the authors look into other studies, e.g. Cannon et al. 2015 or the VALUE framework, for standardized bias adjustment evaluation. Particularly as the authors aim to correct temporal properties, reporting autocorrelation across correction methods would be important. As bias correction is usually seen as a distribution matching task, QQ-plots over the validation period would be relevant. Also, the authors need to consider more locations, as it is not sufficient to draw conclusions from only two grid cells (especially as bias adjustment is usually applied over a grid/larger area). Finally, bias adjustment is generally not of interest when applied to historical periods, but rather when applied to a future period (and thus under a distribution shift). A discussion of how this method can be applied to future projections, as well as an evaluation of such an application considering aspects like trend preservation, is important to judge whether the method is sensible for applications.
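The CE1 versus CE2 argument about RMSE can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the review: daily temperatures are treated as independent Gaussian draws, CE1 is an independent realisation from the same distribution, and the values of `mu` and `sigma` are arbitrary.

```python
import math
import random

def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def compare_emulators(n=20000, mu=25.0, sigma=3.0, seed=0):
    """Return (RMSE of CE1, RMSE of CE2) against a simulated observed record."""
    rng = random.Random(seed)
    observed = [rng.gauss(mu, sigma) for _ in range(n)]
    # CE1: correct variability, but an independent realisation of the weather.
    ce1 = [rng.gauss(mu, sigma) for _ in range(n)]
    # CE2: no variability at all, always the mean temperature.
    ce2 = [mu] * n
    return rmse(observed, ce1), rmse(observed, ce2)
```

Under these assumptions the expected RMSE is about `sigma * sqrt(2)` for CE1 and about `sigma` for CE2, so the emulator with no variability scores better even though it can never produce an extreme.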

Minor remarks:

• As noted in the major remarks section, the discussion of the literature on bias adjustment is insufficient as is. Crucial literature for some of the methods is not cited. Methods such as ECDFM (Li et al. 2010, citation number: 25) are incorrectly classified as EQM, and there are other small oversights. I suggest carefully checking the references and method classifications.

• I find the paper very difficult to read in parts. I suggest simplifying and streamlining the notation, for example the authors might consider whether the number of indices in formulas is needed and might be reduced. I also suggest cleaning up some of the figure labels, possibly removing some of the temperature threshold plots in the appendix (it may not be necessary to consider the same statistic for ~30 runs and many different thresholds) and rephrasing of some of the conclusions e.g. remove the phrasing of “stunning results”.

• I believe section 3.1 on the univariate model does not add much to the reader’s understanding and might be removed. Then, whilst staying within the page limit, the authors might be able to add some important details on the Taylorformer model used as well as the log-likelihood fit, which should improve the reader’s understanding. Also, in section 3.1, just because the mean shift bias correction can be obtained by a linear regression model, this does not mean that the correction is based on such a model. To warrant its inclusion in the paper, I believe the probability model needs more discussion, especially as the equality of observation and climate model time indices is very atypical (given that climate model run and observation are not in synchronicity). Also, as the most common bias adjustment method is quantile mapping, it would be important to discuss how it fits into this framework.

References:

Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias Correction of GCM Precipitation by Quantile Mapping: How Well Do Methods Preserve Changes in Quantiles and Extremes? Journal of Climate, 28(17), 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1

Chandler, R. E. (2020). Multisite, multivariate weather generation based on generalised linear models. Environmental Modelling and Software, 134. https://doi.org/10.1016/j.envsoft.2020.104867

Maraun, D. (2013). Bias Correction, Quantile Mapping, and Downscaling: Revisiting the Inflation Issue. Journal of Climate, 26(6), 2137–2143. https://doi.org/10.1175/JCLI-D-12-00821.1

Maraun, D., & Widmann, M. (2018). Statistical Downscaling and Bias Correction for Climate Research. Cambridge University Press.

Mehrotra, R., & Sharma, A. (2015). Correcting for systematic biases in multiple raw GCM variables across a range of timescales. Journal of Hydrology, 520, 214–223. https://doi.org/10.1016/j.jhydrol.2014.11.037

Mehrotra, R., & Sharma, A. (2016). A Multivariate Quantile-Matching Bias Correction Approach with Auto- and Cross-Dependence across Multiple Time Scales: Implications for Downscaling. Journal of Climate, 29(10), 3519–3539. https://doi.org/10.1175/JCLI-D-15-0356.1

Mehrotra, R., & Sharma, A. (2019). A Resampling Approach for Correcting Systematic Spatiotemporal Biases for Multiple Variables in a Changing Climate. Water Resources Research, 55(1), 754–770. https://doi.org/10.1029/2018WR023270

Vrac, M., & Friederichs, P. (2015). Multivariate—Intervariable, Spatial, and Temporal—Bias Correction. Journal of Climate, 28(1), 218–237. https://doi.org/10.1175/JCLI-D-14-00059.1

Recommendation: A temporal stochastic bias correction using a machine learning attention model — R0/PR3

Comments

This article was accepted into Climate Informatics 2024 Conference after the authors addressed the comments in the reviews provided. It has been accepted for publication in Environmental Data Science on the strength of the Climate Informatics Review Process.

Decision: A temporal stochastic bias correction using a machine learning attention model — R0/PR4

Comments

No accompanying comment.