
TemperatureGAN: generative modeling of regional atmospheric temperatures

Published online by Cambridge University Press: 02 January 2025

Emmanuel Balogun*
Affiliation:
Mechanical Engineering, Stanford University, Stanford, CA, USA
Ram Rajagopal
Affiliation:
Civil and Environmental Engineering, Stanford University, Stanford, CA, USA
Arun Majumdar*
Affiliation:
Mechanical Engineering, Stanford University, Stanford, CA, USA
*
Corresponding authors: Emmanuel Balogun and Arun Majumdar; Emails: ebalogun@stanford.edu; amajumdar@stanford.edu

Abstract

Stochastic generators are useful for estimating climate impacts on various sectors. Projecting climate risk in these sectors, e.g., energy systems, requires generators that are accurate (statistically resemble the ground truth), reliable (do not produce erroneous examples), and efficient. Leveraging data from the North American Land Data Assimilation System, we introduce TemperatureGAN, a Generative Adversarial Network conditioned on months, regions, and time periods, to generate atmospheric temperatures at 2 m above ground at an hourly resolution. We propose evaluation methods and metrics to measure the quality of generated samples. We show that TemperatureGAN produces high-fidelity examples with good spatial representation and temporal dynamics consistent with known diurnal cycles.

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. TemperatureGAN model framework. Regional temperature maps are passed as input to the discriminator during training; the generator, however, never sees the training data.

Figure 2. Illustration of the data aggregation scheme for a specific region $ R $ and the chosen month $ M $ of January. For a 4-year period $ {k}_i $, all the 24-hour time series (31 per year, because January has 31 days) from all four Januaries are grouped into the same bucket (124 examples in total) and share the same labels.
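
The bucket arithmetic in this caption can be checked with a short sketch. It is only an illustration, not the authors' pipeline: the function names `period_index` and `bucket_days` are hypothetical, and the start year (1979) and 4-year period length are taken from the period labels used in the table captions (period 0 = 1979–1982, period 8 = 2011–2014).

```python
from collections import defaultdict
from datetime import date, timedelta

START_YEAR = 1979   # assumption: period 0 starts in 1979, per the captions
PERIOD_LEN = 4      # assumption: each period k_i spans 4 calendar years


def period_index(year):
    """Map a calendar year to its 4-year period index k_i."""
    return (year - START_YEAR) // PERIOD_LEN


def bucket_days(n_periods):
    """Group every calendar day into a (month, period) bucket.

    Each day stands for one 24-hour temperature series of a fixed region,
    so the bucket size equals the number of examples sharing a label.
    """
    buckets = defaultdict(list)
    day = date(START_YEAR, 1, 1)
    end = date(START_YEAR + n_periods * PERIOD_LEN, 1, 1)
    while day < end:
        buckets[(day.month, period_index(day.year))].append(day)
        day += timedelta(days=1)
    return buckets


buckets = bucket_days(n_periods=1)
january_k0 = buckets[(1, 0)]    # all January days in 1979-1982
february_k0 = buckets[(2, 0)]   # includes the leap day of 1980
print(len(january_k0), len(february_k0))
```

Under these assumptions the January bucket for period 0 holds 4 × 31 = 124 days, matching the count in the caption.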

Figure 3. Generator $ G $ architecture. The sampled noise is concatenated with the learned label embedding and passed through a dense, fully-connected (FC) linear block with Rectified Linear Unit (ReLU) activation functions. The output of the linear block is passed through a series of convolution layers with batch normalization to obtain the desired ($ 8\times 8\times 24 $) output shape.

Figure 4. Spatial discriminator $ {D}_{\mathrm{s}} $ architecture. The inputs to the spatial discriminator $ {D}_{\mathrm{s}} $ are the $ 8\times 8\times 24 $ temperature maps. The input is passed through a series of 2D convolution layers, with the output of each layer serving as input to the next. The final convolution layer outputs a two-dimensional $ 24\times 1 $ array, which is flattened into a one-dimensional vector before it is passed through a dense FC linear block to produce a score.

Figure 5. Temporal discriminator $ {D}_{\mathrm{t}} $ architecture. The inputs to the temporal discriminator $ {D}_{\mathrm{t}} $ are the $ 8\times 8\times 24 $ temperature maps, followed by a gradient computation that yields $ \frac{\partial T}{\partial t} $. The output from the convolution layers is flattened and passed through a series of fully-connected (FC) linear layers to produce a score.
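
As a rough illustration of the gradient step mentioned in this caption, the sketch below approximates $ \frac{\partial T}{\partial t} $ with a forward difference along the hour axis of an $ 8\times 8\times 24 $ map. The caption does not state which differencing scheme the model uses, so the forward difference, the function name `temporal_gradient`, and the hourly step `dt_hours` are all assumptions.

```python
import numpy as np


def temporal_gradient(temps, dt_hours=1.0):
    """Forward-difference estimate of dT/dt along the last (hour) axis.

    temps: array of shape (8, 8, 24), one temperature map per hour.
    Returns an array of shape (8, 8, 23), in K per hour.
    Assumption: a simple forward difference; the paper may use another scheme.
    """
    return np.diff(temps, axis=-1) / dt_hours


# Synthetic map that warms by exactly 2 K per hour at every grid cell.
hours = np.arange(24.0)
sample = 280.0 + 2.0 * hours * np.ones((8, 8, 1))
grad = temporal_gradient(sample)
print(sample.shape, grad.shape)
```

A forward difference shortens the time axis by one step; `np.gradient(temps, axis=-1)` would be an alternative that keeps all 24 steps by using one-sided differences at the boundaries.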

Figure 6. Ground-truth (top) and generated (bottom) hourly snapshots of samples of a summer day in the California Bay Area (2011–2014).

Figure 7. Ground-truth (top) and generated (bottom) hourly snapshots of samples of a winter day in Nevada (2011–2014).

Figure 8. Histograms with kernel density estimate (KDE) plots comparing ground-truth (blue) and generated (red) empirical distributions for a $ {1}^{\circ}\times {1}^{\circ } $ region around Los Angeles, California (2011–2014).

Figure 9. Q–Q plot envelopes for winter (9a) and summer (9b) in the Nevada region.

Figure 10. SPAC’D plots for the Nevada, San Francisco Bay, Washington, and Oregon regions (top to bottom). The left plots show early training steps and the right plots show later training steps. The red line is the baseline against which the model is compared in each region (1979–1982). The model evidently learns some of the structure of the temperature fields later in training.

Figure 11. Temporal gradient distribution plots, Los Angeles (1979–1982).

Figure 12. Histograms with kernel density estimate plots for maximum daily temperatures, Nevada (1979–1982).

Figure 13. Four-year time-series plots of daily maximum temperatures for the ground truth, GAN, and WeaGETS, Nevada (1979–1982).

Table 1. FDTD (K) for period 0 (1979–1982), San Francisco Bay area; avg. distance 0.4 K/°C

Table 2. FDTD (K) for period 8 (2011–2014), San Francisco Bay area; avg. distance 0.6459 K/°C (Note: the model was not trained on data from this period)

Table 3. FDTD (K) for period 0 (1979–1982), Portland, Oregon area; avg. distance 0.4750 K/°C

Table 4. FDTD (K) for period 8 (2011–2014), Portland, Oregon area; avg. distance 0.3655 K/°C (Note: the model was not trained on data from this period)

Table 5. TGDD values. $ {k}_0 $ (1979–1982)

Table A1. Generator $ G $ architecture

Table A2. Discriminator $ {D}_{\mathrm{s}} $ architecture

Table A3. Discriminator $ {D}_{\mathrm{t}} $ architecture

Figure A1. Each conditional variable is passed into a model block that transforms it into a higher dimensional learned embedding, resulting in the labels being mapped from $ {\mathrm{\mathbb{R}}}^{15} $ to $ {\mathrm{\mathbb{R}}}^{100} $.

Figure A2. SPAC’D computation description. Using the feature vector, a correlation matrix is computed. The matrix L1 norm distance between the true and generated correlation matrices is calculated and reported as SPAC’D.
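
The computation described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature vector is assumed to be the flattened 64-cell spatial map of each sample, the correlation is taken across samples, and "matrix L1 norm" is read as the induced 1-norm (maximum absolute column sum) given by `np.linalg.norm(..., ord=1)`. The function names `spatial_correlation` and `spacd` are hypothetical.

```python
import numpy as np


def spatial_correlation(samples):
    """Correlation matrix between grid cells, estimated across samples.

    samples: array of shape (n_samples, 8, 8).
    Assumption: each map is flattened into a 64-element feature vector.
    Returns a (64, 64) correlation matrix.
    """
    features = samples.reshape(samples.shape[0], -1)
    return np.corrcoef(features, rowvar=False)


def spacd(real, generated):
    """Matrix 1-norm distance between real and generated correlation matrices.

    Assumption: the induced 1-norm (max absolute column sum); an entrywise
    sum, np.abs(diff).sum(), is another possible reading of "L1 norm".
    """
    diff = spatial_correlation(real) - spatial_correlation(generated)
    return np.linalg.norm(diff, ord=1)


# Synthetic stand-ins for real and generated temperature maps.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8, 8))
generated = rng.normal(size=(200, 8, 8))
corr_shape = spatial_correlation(real).shape
same = spacd(real, real)
different = spacd(real, generated)
print(corr_shape, same, different)
```

With this definition, identical sample sets give a distance of zero, and lower values indicate that the generated fields reproduce the spatial correlation structure of the real fields more closely.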

Figure B1. Diurnal cycles generated by models trained with a temporal gradient penalty (rows 3 and 4) are visibly of higher quality than those from models without the penalty (rows 1 and 2). Note that the actual temperature values in these plots do not matter, as we randomly sample from the real and generated examples to visually compare only the temporal patterns of the real and generated data.

Table B1. Effect of temporal gradient penalty.

Table B2. FDTD (K) for period 0 (1979–1982), Nevada area; avg. distance 0.6 K/°C

Table B3. FDTD (K) for period 8 (2011–2014), Nevada area; avg. distance 0.5192 K/°C

Table B4. FDTD (K) for period 3 (1991–1994), Portland, Oregon area; avg. distance 0.2731 K/°C

Figure B2. Q–Q plot envelopes for the Nevada region using 100 TemperatureGAN generated samples.

Figure B3. Q–Q plot envelopes for the Portland region using 100 TemperatureGAN generated samples.

Figure B4. Q–Q plot envelopes for the Washington region using 100 TemperatureGAN generated samples.

Figure B5. Q–Q plot envelopes for the San Francisco Bay region using 100 TemperatureGAN generated samples.

Figure B6. Monthly ECDF plots for Los Angeles (LA). Observe that the distributions shift rightward for hotter months, with the tails in particular stretching, indicating more warming over 24 years.

Figure B7. Los Angeles County temperature sample distributions for each month, shown as kernel density estimate plots for period $ {k}_0 $ (1979–1982).

Figure B8. Generated monthly distributions while varying the period variable for the Los Angeles (LA) County region. Plots show $ {k}_0,{k}_2,{k}_4,{k}_6 $.

Figure B9. Month-region-period based sample distributions from TemperatureGAN. A positive distribution shift can be observed for some months. Compared with the real data distribution below, the generative model captures the distribution shifts from $ {k}_0 $ to $ {k}_6 $ that exist in specific months within that region. $ {k}_0 $ (blue, 1979–1982), $ {k}_6 $ (red, 2003–2006) for the LA County region.

Figure B10. Month-region-period based sample distributions from the ground-truth data. $ {k}_0 $ (blue, 1979–1982), $ {k}_6 $ (red, 2003–2006) for the LA County region.