AtmoDist: Self-supervised representation learning for atmospheric dynamics

Published online by Cambridge University Press:  21 February 2023

Sebastian Hoffmann
Affiliation:
Institut für Simulation und Graphik, Otto-von-Guericke-Universität Magdeburg, Magdeburg, Germany
Christian Lessig*
Affiliation:
Institut für Simulation und Graphik, Otto-von-Guericke-Universität Magdeburg, Magdeburg, Germany
*Corresponding author. E-mail: christian.lessig@ovgu.de

Abstract

Representation learning has proven to be a powerful methodology in a wide variety of machine-learning applications. For atmospheric dynamics, however, it has so far not been considered, arguably due to the lack of large-scale, labeled datasets that could be used for training. In this work, we show how to sidestep this difficulty and introduce a self-supervised learning task that is applicable to a wide variety of unlabeled atmospheric datasets. Specifically, we train a neural network on the simple yet intricate task of predicting the temporal distance between atmospheric fields from distinct but nearby times. We demonstrate that training with this task on the ERA5 reanalysis dataset leads to internal representations that capture intrinsic aspects of atmospheric dynamics. For example, when employed as a loss function in other machine-learning applications, the derived AtmoDist distance leads to improved results compared to the $ {\mathrm{\ell}}_2 $-loss. For downscaling, one obtains higher-resolution fields that match the true statistics more closely than previous approaches, and for the interpolation of missing or occluded data, the AtmoDist distance leads to results that contain more realistic fine-scale features. Since it is obtained from observational data, AtmoDist also provides a novel perspective on atmospheric predictability.
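
As an illustration of the pretext task described above, the following sketch shows how training pairs could be constructed from a temporal sequence of atmospheric fields; the array layout, the function name sample_pair, and the parameter max_lag are hypothetical and not taken from the authors' implementation.

```python
# Illustrative sketch (not the authors' code): sampling a training pair for the
# pretext task, i.e. two nearby atmospheric fields labeled with their temporal
# separation. `fields`, `max_lag`, and `sample_pair` are placeholder names.
import numpy as np

def sample_pair(fields, max_lag, rng=np.random.default_rng()):
    """Draw two nearby fields and return them with their temporal separation."""
    # fields: array of shape (T, H, W, C), e.g. patches of divergence and vorticity.
    t1 = int(rng.integers(0, len(fields) - max_lag))
    dt = int(rng.integers(1, max_lag + 1))   # temporal separation in time steps
    return fields[t1], fields[t1 + dt], dt   # (X_t1, X_t2, label)
```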

Information

Type
Methods Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. Overview of the methodology for AtmoDist. From a temporal sequence of atmospheric fields (bottom), two nearby ones are selected at random (red) and stored together with their temporal separation $ \Delta t $ as a training sample. Both fields are then passed through the same representation network (blue), which embeds them into a high-dimensional feature space (left). These embeddings are subsequently used by the tail network to predict the temporal separation $ \Delta t $ (top, orange). The whole architecture is trained end-to-end. Once training is completed, the embeddings can be used in downstream tasks, for example, through a distance measure $ d\left({X}_{t_1},{X}_{t_2}\right) $ in embedding space.
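
A minimal sketch of the last step mentioned in the caption, namely using the trained representation network to define a distance in embedding space; the encoder callable and the plain $ {\mathrm{\ell}}_2 $ norm between embeddings are assumptions for illustration, and the normalization described in Appendix A.3 is not reproduced here.

```python
# Hedged sketch: a distance d(X_t1, X_t2) defined via embeddings of a trained
# representation network. `encoder` stands in for the representation network;
# the exact norm and normalization used for AtmoDist may differ.
import numpy as np

def embedding_distance(encoder, x1, x2):
    e1, e2 = encoder(x1), encoder(x2)   # embed both fields into feature space
    return float(np.linalg.norm(np.ravel(e1) - np.ravel(e2)))
```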

Table 1. Overview of the data used in this work.

Figure 2. The AtmoDist network used for learning the pretext task. Numbers after layer names indicate the number of filters/feature maps of an operation. The comparison network is only required during training and can be discarded afterward.
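
The caption above summarizes the two-part architecture: a shared representation network that embeds each input field, and a comparison (tail) network that predicts the time lag from the pair of embeddings. The following PyTorch-style sketch only mirrors this structure; layer counts, filter sizes, and the way the two embeddings are combined are placeholders rather than the published configuration.

```python
# Structural sketch of a siamese pretext network: shared encoder + comparison head.
# All layer sizes are illustrative; only the overall layout follows the description.
import torch
import torch.nn as nn

class PretextNet(nn.Module):
    def __init__(self, num_lags, in_channels=2):
        super().__init__()
        # Shared (siamese) representation network; kept for downstream tasks.
        self.representation = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Comparison (tail) network; only needed during pretext training.
        self.comparison = nn.Sequential(
            nn.Linear(2 * 128, 256), nn.ReLU(),
            nn.Linear(256, num_lags),   # classification over possible time lags
        )

    def forward(self, x1, x2):
        e1 = self.representation(x1)
        e2 = self.representation(x2)
        return self.comparison(torch.cat([e1, e2], dim=1))
```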

Figure 3. Loss (left) and Top-1 accuracy (right) during training calculated on the training (1979–1998) and the evaluation dataset (2000–2005). Drops in loss correspond to learning rate reductions. The best loss and accuracy are achieved in epoch 27; afterward the network begins to overfit.

Figure 4. Mean $ {\mathrm{\ell}}_1 $-norm (left) and mean $ {\mathrm{\ell}}_2 $-norm (right) between samples that are a fixed time interval apart, calculated on the training set. Shaded areas indicate standard deviation. For comparability, the AtmoDist distance has been normalized in each case with the method described in Appendix A.3. To give equal weight to divergence and vorticity, both have been normalized to zero mean and unit variance before calculating grid-point-wise metrics.
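
For reference, a sketch of how the baseline curves in this figure could be computed: the mean and standard deviation of the $ {\mathrm{\ell}}_2 $ distance between fields a fixed number of time steps apart, after the per-channel normalization mentioned in the caption. Array shapes and names are illustrative.

```python
# Sketch: mean l2 distance between fields separated by a fixed lag, with each
# channel normalized to zero mean and unit variance beforehand.
import numpy as np

def mean_l2_by_lag(fields, max_lag):
    # fields: (T, H, W, C); normalize each channel over the whole dataset
    fields = (fields - fields.mean(axis=(0, 1, 2))) / fields.std(axis=(0, 1, 2))
    means, stds = [], []
    for lag in range(1, max_lag + 1):
        diff = (fields[lag:] - fields[:-lag]).reshape(len(fields) - lag, -1)
        dist = np.linalg.norm(diff, axis=1)
        means.append(dist.mean())
        stds.append(dist.std())
    return np.array(means), np.array(stds)
```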

Figure 5. Divergence field (left) and its reconstruction produced by the autoencoder (right). While the reconstructed field lacks finer details, large-scale structures are properly captured by the autoencoder.

Figure 6. The confusion matrix shows the accuracy for the evaluation set as a function of predicted time lag and actual time lag. The side-diagonals indicate that AtmoDist is able to infer the time of day for an atmospheric state with high precision solely from a local patch of divergence and vorticity fields, but might err on the day. A logarithmic color scale has been used to better highlight the side-diagonals.
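
A minimal sketch of how such a confusion matrix can be accumulated from integer lag classes; the arrays y_true and y_pred are placeholders for the actual and predicted time lag classes on the evaluation set.

```python
# Sketch: confusion matrix of actual vs. predicted time lag classes.
import numpy as np

def confusion_matrix(y_true, y_pred, num_lags):
    cm = np.zeros((num_lags, num_lags), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # rows: actual lag, columns: predicted lag
    return cm
```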

Figure 7. Accuracy of AtmoDist in correctly predicting that two patches are 48 hr apart, shown as a function of location and with an error margin of 3 hr (i.e., predictions of 45 and 51 hr are also counted as correct). The red rectangle in the lower left corner indicates the patch size used as input for the network.
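
The tolerant accuracy used here can be written compactly; in the sketch below, lags are assumed to be encoded as class indices in steps of 3 hr, as implied by the caption, so the 3 hr margin corresponds to a tolerance of one class.

```python
# Sketch: accuracy with a tolerance of +/- one lag class (i.e. +/- 3 hr).
import numpy as np

def tolerant_accuracy(y_true, y_pred, tol=1):
    return float(np.mean(np.abs(y_true - y_pred) <= tol))
```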

Figure 8. Uncurated set of downscaled divergence fields over the Gulf of Thailand at different time steps. Coastlines are shown in the first ground truth field and then omitted for better comparability.

Figure 9. Left: The energy spectrum starting from wavenumber 200 and averaged over the whole evaluation period. The spectra below wavenumber 200 are almost identical. The spectrum has been calculated by first converting divergence and vorticity to eastward and northward wind components, and then evaluating the kinetic energy. Right: Semivariogram of divergence.

Figure 10. Histogram of reconstruction errors measured in $ {\mathrm{\ell}}_2 $ norm (left) and difference of total variation (right) for relative vorticity. We define the difference of total variation between the original field $ f $ and its super-resolved approximation $ g $ as $ {d}_{\mathrm{tv}}\left(f,g\right)={\int}_{\mathcal{D}}\left|\nabla f(x)\right|-\left|\nabla g(x)\right| dx $. Values closer to zero are better. Despite performing better with regard to the $ {\mathrm{\ell}}_2 $ reconstruction error, the $ {\mathrm{\ell}}_2 $-based super-resolution performs worse with regard to the difference of total variation. Notice that the approach by Stengel et al. (2020) minimizes the $ {\mathrm{\ell}}_2 $ reconstruction error during training. Interestingly, all three approaches have solely negative total variation differences, implying that the super-resolved fields are overly smooth compared to the ground truth fields. Similar results are obtained for divergence.
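
A discrete version of the total variation difference defined above can be sketched with finite differences; grid spacing and boundary handling are simplified, and uniform cell areas are assumed.

```python
# Sketch: discrete total-variation difference d_tv(f, g) on a regular grid.
# Negative values indicate that g is smoother than f.
import numpy as np

def tv_difference(f, g):
    def total_variation(x):
        gy, gx = np.gradient(x)                 # finite-difference gradients
        return np.sqrt(gx**2 + gy**2).sum()     # sum of gradient magnitudes
    return total_variation(f) - total_variation(g)
```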

Table 2. Better/worse scores for local statistics of GAN-based super-resolution.

Figure 11. Interpolated vorticity fields (center, in the yellow square) for deleted regions; from left to right: ground truth, AtmoDist, $ {\mathrm{\ell}}_2 $-norm, and autoencoder. Both the $ {\mathrm{\ell}}_2 $-loss and the autoencoder-loss produce overly smooth results. The AtmoDist-based reconstruction captures more of the higher-frequency features present in the data, although it suffers from some blocking artifacts. The horizontal artifacts for the autoencoder occurred frequently in our results.

Figure 12. Semivariograms for the reconstruction of partially occluded fields for divergence (left) and vorticity (right). The autoencoder performs better on divergence than the $ {\mathrm{\ell}}_2 $-loss while the roles are interchanged for vorticity. We hypothesize that this is due to the larger high-frequency content of divergence.
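
For completeness, an empirical semivariogram $ \gamma (h)=\frac{1}{2}E\left[{\left(Z\left(x+h\right)-Z(x)\right)}^2\right] $ can be estimated as in the simplified sketch below, which, purely for brevity, computes lags only along the zonal grid axis.

```python
# Simplified sketch: empirical semivariogram along the zonal (longitude) axis.
import numpy as np

def semivariogram(field, max_lag):
    # field: (H, W) grid; returns gamma for lags of 1..max_lag grid cells
    return np.array([0.5 * np.mean((field[:, h:] - field[:, :-h]) ** 2)
                     for h in range(1, max_lag + 1)])
```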

Figure 13. Left: Kernel density estimate of the vorticity distribution at Milan (Italy). The $ {\mathrm{\ell}}_2 $-based GAN achieves a Wasserstein distance of $ 5.3\cdot {10}^{-6} $ while our approach achieves a Wasserstein distance of $ 2.0\cdot {10}^{-6} $. The autoencoder-based GAN yields significantly worse statistics. Right: Reconstruction error measured as the difference of total variation of divergence for the $ {\mathrm{\ell}}_2 $-based super-resolution as a function of time. To highlight the oscillations, the errors have been smoothed by a 30-day moving average. The oscillations are also present in the AtmoDist-based super-resolution, when comparing vorticity, or when the reconstruction error is measured using the $ {\mathrm{\ell}}_2 $ norm.
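
The Wasserstein distances quoted in the caption compare the distributions of ground-truth and super-resolved vorticity values at a single grid point; a minimal sketch using scipy is given below, with the two time series as placeholder inputs.

```python
# Sketch: 1D Wasserstein distance between local vorticity distributions.
from scipy.stats import wasserstein_distance

def local_wasserstein(true_series, sr_series):
    # Both inputs: 1D time series of vorticity at one location (e.g. Milan).
    return wasserstein_distance(true_series, sr_series)
```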

Figure 14. The energy spectrum (left), semivariogram (center), and distribution of total variation difference errors (right) for models trained with different maximum temporal separations $ \Delta {t}_{\mathrm{max}} $ in our ablation study. The semivariogram and error distributions are calculated on divergence, but qualitatively similar results are obtained for vorticity.

Table 3. Mean and standard deviations calculated on the training dataset (1979–1998) on model level 120 for divergence and relative vorticity.

Figure 15. Mean SSIM and PSNR as a function of the temporal separation $ \Delta t $. Since in both cases higher quantities indicate more similarity between samples, we apply the following transformations to make the plots comparable to Figure 4: SSIM: $ y=1-\left(1+ SSIM\left({X}_{t_1},{X}_{t_2}\right)\right)/2 $; PSNR: $ y=50\hskip0.1em dB- PSNR\left({X}_{t_1},{X}_{t_2}\right) $.
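
The transformations stated in the caption can be applied directly to standard SSIM and PSNR implementations; the sketch below uses scikit-image, and the 50 dB offset is the constant given above.

```python
# Sketch: turning SSIM and PSNR (similarities) into dissimilarities as in Figure 15.
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def ssim_dissimilarity(x1, x2, data_range):
    return 1.0 - (1.0 + structural_similarity(x1, x2, data_range=data_range)) / 2.0

def psnr_dissimilarity(x1, x2, data_range):
    return 50.0 - peak_signal_noise_ratio(x1, x2, data_range=data_range)
```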

Figure 16. Uncurated set of downscaled vorticity fields over the Mediterranean Sea and Eastern Europe at different time steps. Coastlines are shown in the first ground truth field and then omitted for better comparability.