Identifying compound weather drivers of forest biomass loss with generative deep learning

Abstract Globally, forests are net carbon sinks that partly mitigates anthropogenic climate change. However, there is evidence of increasing weather-induced tree mortality, which needs to be better understood to improve forest management under future climate conditions. Disentangling drivers of tree mortality is challenging because of their interacting behavior over multiple temporal scales. In this study, we take a data-driven approach to the problem. We generate hourly temperate weather data using a stochastic weather generator to simulate 160,000 years of beech, pine, and spruce forest dynamics with a forest gap model. These data are used to train a generative deep learning model (a modified variational autoencoder) to learn representations of three-year-long monthly weather conditions (precipitation, temperature, and solar radiation) in an unsupervised way. We then associate these weather representations with years of high biomass loss in the forests and derive weather prototypes associated with such years. The identified prototype weather conditions are associated with 5–22% higher median biomass loss compared to the median of all samples, depending on the forest type and the prototype. When prototype weather conditions co-occur, these numbers increase to 10–25%. Our research illustrates how generative deep learning can discover compounding weather patterns associated with extreme impacts.


Impact Statement
Tree mortality is a complex phenomenon involving multiple processes at different temporal scales.Here, we rely on very long simulations of a forest model and develop a method based on generative deep learning to find the relationship between complex weather patterns and tree mortality.The generative nature of the method allows for the generation of new realistic weather conditions outside of the provided samples, which are associated with high biomass loss in a forest.Furthermore, the method can be applied to different weather-driven impacts, adding to the growing pool of explainable/interpretable data-driven methods for Earth system sciences.

Introduction
Forests are the predominant atmospheric carbon sink on land that play a crucial role in mitigating anthropogenic climate change.Forests sequestered 7.6 GtCO 2 e/yr on average between 2000 and 2019 (Harris et al., 2021), which amounted to 22% of the total fossil CO 2 emissions during 2020 (Friedlingstein et al., 2022).Additionally, forests regulate microclimates, sustain biodiversity, and provide ecosystem services such as the production of timber, edible wild plants and fungi, pest control, and increasing nitrogen availability (Felipe-Lucia et al., 2018).However, sudden and unexpected weather-related forest mortality events have been observed in recent years across the globe, including in ecosystems that were not previously considered at risk (Hartmann et al., 2022).In the European Union, the net carbon sink from forest land decreased by 12% between 2010 and 2018 (Pilli et al., 2022).Weather conditions often interact with disturbances such as insect outbreaks, diseases and forest fires to cause forest mortality (Sturrock et al., 2011;Millar and Stephenson, 2015;Jain et al., 2022).With weather extremes expected to increase in intensity and frequency in the future (Seneviratne et al., 2021), an improved understanding of the weather drivers of forest mortality is necessary for the development of effective forest conservation and management strategies.
Understanding the influence of weather drivers on large-scale tree mortality is challenging because forest mortality depends on complex interactions between weather events, the presence of pests and diseases, and forest characteristics such as canopy height, tree cover, biomass, and diversity (Bastos et al., 2023).The forest structure and diversity determine forest resistance to climatic or biotic stressors (for example, a forest with less biomass requires less water for its growth), and this structure, in turn, depends on previous weather conditions (López-Serrano et al., 2005;Bequet et al., 2011;Ježík et al., 2011;Rais et al., 2021;Mahecha et al., 2022;Li et al., 2023).This means that present-day tree mortality can be influenced by weather conditions from previous years.Extreme climate events such as droughts and heatwaves are common drivers of tree mortality (Allen et al., 2010;Anderegg et al., 2013;McDowell et al., 2013;Park Williams et al., 2013;Choat et al., 2018;Senf et al., 2018;Brodribb et al., 2020).However, due to the complex interactions between climatic, biotic, and forest state factors, forest mortality events can also be driven by the temporally compounding effects of multiple weather events, such as observed in sequential droughts (Zscheischler et al., 2018(Zscheischler et al., , 2020;;Sánchez-Pinillos et al., 2022;Bastos et al., 2023).
Modeling forests with process-based models can aid in improving the scientific understanding of forest dynamics.There is a long tradition in ecology of using individual-based forest models to answer a wide range of scientific questions (Shugart et al., 2018), which simulate each tree's growth within a forest stand.These forest gap models were first developed for North America and have become one of the most widely used model types in ecology (Bugmann, 2001).Various process-based models are currently employed, each with different strengths and weaknesses (Mahnken et al., 2022).
Due to the complexity of the simulated processes and their convolution through time, it is challenging to disentangle weather conditions associated with high forest mortality, even in a model environment.Hence, a better understanding of such processes is needed (Bugmann et al., 2019).Currently, two widely used approaches exist to identify the drivers of tree mortality in forests.First, one can gain insight through forward simulations from models (Taccoen et al., 2019;Scheel et al., 2022).Forcing the model with perturbed weather data and analyzing the output in a conventional sensitivity analysis setting has two main challenges.Due to the long-range temporal dependency of weather variables, the forcing data have many input features.The large number of features translates into an even larger number of combinations of weather, making a comprehensive analysis computationally infeasible.Furthermore, defining ways to modify multivariate weather data so that the modified data are physically realistic is not trivial due to structured correlation within the variables.Nevertheless, forward simulations are helpful in analyzing individual events.Second, statistical methods such as linear regression, generalized additive models, principal component analysis, and random forests can be used to understand the impact of weather drivers (Park Williams et al., 2013;Matusick et al., 2018;Xu et al., 2018;Crouchet et al., 2019;Senf et al., 2020;Forzieri et al., 2022;Jaime et al., 2022;Le Grix et al., 2023).Linear methods typically fail to capture e4-2 Mohit Anand et al. complex nonlinear forest dynamics, and conventional machine learning methods like random forests or support vector machines require predefining predictors from an often extensive set of potential candidates.
Deep learning (DL) methods can capture the complex temporal dynamics in high-dimensional data and have been extensively used for time series forecasting and classification problems (Wang et al., 2017;Zhao et al., 2017;Wu et al., 2020;Lim and Zohren, 2021;Marcolongo et al., 2022).Despite high predictive accuracy, DL methods are often criticized for their lack of explainability.Most current weatherinduced impact research utilizes post hoc methods, which interpret an existing, trained model (for example SHapley Additive exPlanations (SHAP) values, Layer-wise Relevance Propagation (LRP), feature importance analysis, and regression activation maps (Wolanin et al., 2020;Labe and Barnes, 2021;Yan et al., 2022;Liu et al., 2023;Mamalakis et al., 2023;Sweet et al., 2023;Wadoux et al., 2023).The advantage of post hoc methods is that they can be used with any model architecture and, therefore, do not require a trade-off of predictive model performance.However, such approaches explain model decisions on specified data points and do not provide a holistic view of the entire decision function (Bordt et al., 2022).An alternative approach is to use intrinsically explainable machine learning models (Rudin, 2019).A method in this approach is to learn a lower-dimensional representation of the input and relate it to the output with a more explainable function (Alvarez-Melis and Jaakkola, 2018;Kim et al., 2018;Chen et al., 2019;Varando et al., 2021;Yang et al., 2021;Gautam et al., 2022).A generative DL architecture like the variational autoencoder (VAE) is one way to learn a lower-dimensional explainable data representation.VAEs learn a lower-dimensional representation (latent variables) of a very high-dimensional dataset by minimizing a reconstruction error and constraining the latent distribution.
In this study, we develop a VAE-based methodology to generate lower-dimensional representations of complex and correlated weather conditions.We apply and evaluate its performance on simulated data as a proof-of-concept, making use of long simulations of multiple forest types driven by synthetic weather data from a weather generator.By using modeled data, we are able to compare our results to the known processes of the model and thereby validate our novel, data-driven approach.With the help of the developed method, we can search for representations (weather prototypes) that are associated with high biomass loss events and thereby identify compound weather drivers of tree mortality.We also explore the relationship between compounding weather prototypes and biomass loss.Our developed methodology could, in future, be applied to real-world data (such as inventory datasets from national forest inventories), for which no ground truth is typically available for method validation.

Data and methods
An overview of the methodology is shown in Figure 1.We use simulated weather from a weather generator to run a parameterized forest model for the Hainich beech forest in Germany at 51.08°N and 10.43°E.We use composites and logistic regression analysis to obtain a general overview of the weather and biomass loss relationships.Next, a VAE is used to learn a low-dimensional representation of weather conditions.These representations are then used to identify weather prototypes associated with high levels of biomass loss based on the forest simulation for three different types of forests, namely beech, pine, and spruce.
2.1.Weather simulation AWE-GEN is an advanced stochastic weather generator that can generate realistic high-resolution (hourly) weather variables (Fatichi et al., 2011).It can reproduce low-and high-frequency characteristics of weather variables, such as the interannual variability of precipitation and temperature, and has been widely used in hydrology and ecological modeling (Mastrotheodoros et al., 2020;Meili et al., 2020;Hirschberg et al., 2021;Marcolongo et al., 2022).For this study, AWE-GEN is calibrated on 40 years (1979-2019) of bias-adjusted ERA5 reanalysis data designed explicitly for impact studies (Cucchi et al., 2020) for the location of the Hainich forest.AWE-GEN requires precipitation, cloud cover, air temperature, shortwave radiation, relative humidity, wind speed, and atmospheric pressure at hourly time step for calibration.For the cloud cover, a single level of hourly ERA5 data is used (Hersbach et al., 2020).We compute relative humidity and specific humidity from pressure and temperature.Ultimately, 200,000 years of hourly stationary weather is generated after calibration.
The simulation's mean annual temperature and precipitation are 8.5 C and 693 mm, respectively.The monthly seasonal cycle of temperature and precipitation is well captured by AWE-GEN, with slightly too warm temperatures in the weather generator (Figure 2a).Reanalysis data show strong negative correlations between monthly radiation and precipitation throughout the year and positive correlations between radiation and temperature during the summer (Figure 2b).The weather generator generally captures these correlations, albeit with weaker magnitude (Figure 2c).

Simulation of forest dynamics
We simulate forest dynamics with the forest gap model FORMIND, a process-based, individual-oriented, grid-based forest growth model (Fischer et al., 2015(Fischer et al., , 2016)).It describes forest succession in small-scale forest patches (20 m × 20 m).The main processes considered are tree growth, mortality, recruitment, and competition; the trees within a forest patch compete for space, light, and water.Carbon balance-based ecophysiological processes, such as photosynthesis, respiration, and carbon allocation, are used to calculate individual tree growth.Photosynthesis is related to temperature with a bell-shaped curve, whereas water shortage linearly reduces photosynthesis.In addition, deciduous species only sequester CO 2 during the temperature-dependent growing season.Maintenance respiration is also modified by temperature following the well-established Q 10 approach (full model description in Bohn et al., 2014).
In the FORMIND model, mortality processes occur due to competition for space and reduced growth rate.Space-competition mortality occurs under optimal growing conditions when crowns occupy more space than is available.The model then randomly removes trees until the remaining crowns have sufficient space.In addition, temperature also affects Net Primary Productivity (NPP) through maintenance respiration.NPP is used to calculate the diameter increment of each tree.The increment in diameter affects the biomass loss as there is stochastic diameter increment-dependent mortality included in FORMIND (Bohn et al., 2014, www.formind.org).FORMIND has been frequently used to simulate temperate forest dynamics (Bohn et al., 2014;Rödig et al., 2017;Bohn et al., 2018;Bugmann et al., 2019;Holtmann et al., 2021).
Here, we conduct three simulations, each time simulating only beech, pine, or spruce trees.Beech trees are late-successional, shade-tolerant, and deciduous, whereas pine is a fast-growing, light-demanding, evergreen needle-leaf tree.Spruce is a moderately fast-growing, shade-tolerant, evergreen, needle-leaf tree.The 200,000 years of AWE-GEN-generated precipitation and temperature data, together with solar irradiance (computed as a fraction of shortwave radiation), are used as an input to FORMIND.The specific parameters of the model are derived from various sources.For instance, we use yield tables to fit allometric and growth functions.Other parameters are obtained by taking the average over several field measurements, like wood density or leaf area index.For more details, see Bohn et al. (2014).The simulations are conducted for 200 hectares of land to reduce the internal stochasticity of the model.The forest model starts simulating the growth of trees from barren land and thus requires a spin-up period of roughly 2,000 years to reach a quasi-equilibrium.The simulations are conducted in chunks of 10,000 years for computational efficiency.Therefore, we do not analyze the first 2,000 years for each of these parts and are finally left with 160,000 (8,000 × 20) years of forest data.Only the 160,000 years of weather is then used for further analysis.The simulations, even though potentially strongly differing from real-world behavior, have enough complexity to validate our proof-of-concept.
FORMIND simulates multiple variables at a daily and yearly time scale.For this analysis, we use biomass loss (BL) on a yearly scale.Instead of the fraction of trees dying yearly, BL refers to the fraction of biomass lost yearly.For example, the BL may be low when some young (therefore small) trees die but high when the same number of old (therefore giant) trees die.BL combines different types of mortality, namely background mortality, which depends on the genus; size-dependent mortality, which accounts for the mortality of young trees; and diameter increment-dependent mortality, which accounts for the fact that older trees have a higher mortality rate.We also consider five forest structure variables at a yearly time scale: age, stem volume, cumulative leaf area index, height, and diameter of each tree.The information on the state variables is condensed as histograms with 51 bins.
As this study focuses on extreme biomass loss, we convert BL to a binary value by thresholding at the 90th percentile.To understand extreme BL (y t ), the weather conditions in the current year and the two previous years (x d tÀ2 ,x d tÀ1 , x d t ) are considered in the analysis.By definition, x d t has a dimension of 12 × 3 for 12 months and three variables (temperature, precipitation, and radiation).The forest structure variables at the end of the third year before the considered BL year (x s tÀ3 ) are also used in some of our analyses.x s t thus refers to the five forest structure variables described by 51 binned distributions each.When the subscript is not specified, x s (51 × 5) refers to x s tÀ3 and y (1 × 1) refers to y t .

Composites and logistic regression
Composites are the difference of the mean weather for years with extremely high BL with respect to the mean weather conditions for all the years.We first construct three years of extreme BL weather composites (BL ≥ 90th percentile) to identify general weather conditions associated with high BL.We also train a logistic regression model, which can be used to identify meteorological drivers of large impacts (Bevacqua et al., 2021;Vogel et al., 2021): where ℙ BL ≥ 90thpercentile ð Þ is the probability of extremely high BL and x ¼ vectorize . The logistic regression model is used to understand the relationship between three years of monthly weather (x d ) and BL y at the end of the third year.In addition to dynamic weather conditions (x d ), the BL also depends on the forest structure.Therefore, we also train a logistic regression model with the forest structure variables (x s ) to quantify their importance.We assess the goodness of fit of the regression models using the Critical Success Index (CSI, Donaldson et al., 1975), F 1 -Score and Average Precision (with definitions provided in Appendix B).

Variational autoencoder as a generative model
A variational autoencoder (VAE) is a form of DL model that can extract nonlinear feature representations from high-dimensional data, which has been widely used in Earth sciences, weather applications and remote sensing (Camps-Valls et al., 2021;Tibau et al., 2021).AVAE is a generative model that can learn low-dimensional representations of complex data in an unsupervised way.It comprises an encoder, with the parameters ϕ, that estimates the latent representation and a decoder part, with parameters θ, that reconstructs the input samples from the latent dimensions.Overall, the aim is to estimate the loglikelihood of some observed data x.
This can be done by using variational Bayesian inference.We are interested in the posterior distribution p zjx ð Þ, where x are our observations and z are latent variables.However, this posterior is often intractable because we cannot compute the evidence or denominator of Bayes' theorem, p x ð Þ, because the latent variables, z, must be marginalized out.The central concept of variational inference (VI) is to use optimization to find a more tractable distribution q z ð Þ from a family of distributions Q such that it is close to the desired posterior distribution p zjx ð Þ.In VI, proximity is defined using the Kullback-Leibler Divergence (D KL k ½ ), so the goal is to retrieve an optimal q z ð Þ that minimizes , which can be interpreted as minimizing the relative entropy between the two distributions.Mathematically, this is equivalent to maximizing the evidence lower bound (ELBO) (also sometimes called the variational lower bound or the negative variational free energy) (Kingma and Welling, 2013): where x and z represent the input data and latent variables respectively.The ELBO has two components.The KL-Divergence term enforces how close the latent dimensions z distribution should be to a specified prior distribution p θ z ð Þ, while the other term aims to minimize reconstruction error, log p θ xjz ð Þ.In the case of β-VAE, the hyperparameter β weighs the KL-Divergence term of the loss function: Therefore, a value of β > 1 pushes the model to learn a latent distribution closer to the specified distribution at the cost of a higher reconstruction error (Higgins et al., 2017) and vice versa for β < 1. Different latent distributions can be learned through VAE (Bond-Taylor et al., 2022), and here we choose a normally distributed prior with a diagonal covariance matrix with zero mean and unit variance.This particular choice of prior helps compute the KL-Divergence analytically.The input data (x ¼ x d ) consists of three weather variables, each a monthly time series of three years (Section 2.2).
We have modified the standard β-VAE by making a shared decoder for each weather variable (Figure 3).This is because each weather variable is a time series, and we expect the decoder to learn the general features of time series data.The encoder outputs two 32-dimensional vectors, one for the mean e4-6 Mohit Anand et al.
and the other for the log variance.Then, 32-dimensional latent values (following a multivariate normal distribution) can be sampled from these parameters.The decoder always takes in 16-dimensional latent variables to reconstruct one of the weather variables.The decoder takes eight weather-specific and eight shared dimensions and returns one reconstructed weather variable.This is done for each weather variable, and to obtain the full reconstructed output, three passes of the decoder are required.The eight shared dimensions capture the correlation between the weather variables (Figure 2b).As a design choice, the first eight dimensions are selected to represent radiation, the next eight for precipitation, the next eight for temperature, and the last eight are shared between all variables.The encoders and decoders can have different architectures.In the encoder, a convolution layer is used to learn the different patterns in the data, increasing the number of dimensions, followed by max-pooling (Scherer et al., 2010), which reduces the number of dimensions.Batch normalization (Ioffe and Szegedy, 2015) is used, followed by LeakyReLU (Maas et al., 2013) as the nonlinear function.This step is repeated twice for 256 and 512 feature kernels.After reshaping the output, two dense layers generate the mean and the log variance vector.Samples from the latent dimensions are then fed to the decoder.First, a dense layer increases the dimensions to reshape it into 9 × 1 × 32.This is followed by upsampling, transposed convolution (Zeiler et al., 2010), and batch normalization with Leaky-ReLU as an activation function.Like in the encoder, this is repeated twice, leading to an output of dimensions 36 × 1 × 1.Finally, a convolutional layer is used without any activation function to get the output (see Figure 3 for more details).
The entire network is trained through backpropagation (Rumelhart et al., 1986).The inputs are normalized based on the statistics of the training set.Adam optimizer is used with a learning rate of 0.0001, the upsampling and downsampling size is 2, 1 ð Þ, and the kernel size is 3, 1 ð Þ with stride as 1.LeakyReLU is used with the value α ¼ 0:3.The train, validation, and test split is 80%, 10%, and 10%, respectively.The total sample size available for training is 126,976.The model is trained for 150 epochs with a β ¼ 1:5 (cf.Equation 1), and the batch size is 256.The computation was done on NVIDIA A100 GPU using TensorFlow v2.9 (TensorFlow Developer ( 2022)).The model training time is less than 20 minutes.

Identifying climate prototypes associated with extremely high BL
To find the latent dimensions that are associated with extremely high BL, for each latent dimension, we compute the mean over samples that are associated with high BL and the mean over the remaining samples.We consider latent dimensions relevant for increasing BL when the difference between the two sample means exceeds half of the standard deviation (0.5).
We create weather prototypes associated with high BL based on the identification step above.For each identified latent dimension of interest, we change its value by a fixed amount s in the direction that increases the BL while setting all other dimensions to 0. If the absolute magnitude of the correlation between two dimensions is greater than 0.3.In that case, we also change the value of the correlated dimension, assuming a bivariate Gaussian distribution with a covariance that results in this correlation.Finally, we use the decoder to generate the climate prototype P i|s for a given change s in dimensions i of the latent space.Compound climate prototypes are generated similarly by selecting the two latent dimensions of interest and changing both in the direction that increases BL, thus generating prototypes P i∧j .We generate univariate prototypes with s ¼ 2σ and compound prototypes with s ¼ 2σ/number of dimensions, denoted by P i|2σ and P i∧j|σ , respectively.
We further test whether samples corresponding to the prototypes are associated with generally higher levels of BL.To this end, we assign weather conditions in the original input space to a univariate prototype if, in the latent space, the value of the corresponding latent dimension exceeds s, i.e., two standard deviations.Similarly, a sample corresponds to a compound prototype if, in latent space, values in the two corresponding latent dimensions concurrently exceed s, i.e., one standard deviation.

Composites and logistic regression
The median BL is 1.8%, 2.4%, and 3.3% per year for beech, pine, and spruce, respectively (Figure 4).The BL distribution is highly right-skewed, with a few years of experiencing extremely high BL values.Excess kurtosis for BL values is 24.8, 17.4, and 8.4 for beech, pine, and spruce forests.Generally, a lower BL is associated with lower radiation and temperature averaged over the preceding three years but higher precipitation, and vice versa for higher BL.
When we increase the temporal resolution of the composites to monthly values, we find that extremely high BL is associated with dry conditions in all forests, especially over the previous 18 months (Figure 5a,c,e).Another prominent feature is the strong positive temperature anomaly from May to September of the BL year.For beech, composites show no temperature anomalies in October t but positive anomalies in November t and December t .For pine and spruce, the signal is reversed for these three months; anomalies are strongly positive in October t but negative in November t and December t .
Radiation has an overall weaker signal that is negatively correlated with precipitation.
Predicting BL based on monthly weather predictors reaches average precision values between 0.64 (beech) and 0.70 (pine) when no structure variables are included (Table 1).Including structure variables as predictors increases accuracy substantially, reaching values between 0.76 and 0.85.Generally, the performance of the linear model is similar for pine and spruce forests and higher than for the beech forest.The coefficients of the logistic regression (without structure variables) generally mirror the behavior of the composites (Figure 5b,d,e).Coefficients associated with precipitation for high BL years are negative, increasing in magnitude until May tÀ1 and staying constant until July t , finally decreasing until the end of the time series.The coefficients for temperature are relatively small, except between April t and September t .The coefficients associated with radiation are close to zero most of the time.The logistic regression coefficients and composite plots show similar trends in all the variables except for radiation in year t .We explore the latent dimensions z i ð Þ of the trained β-VAE.Each three-year input sample is associated with one point in the 32-dimensional latent space.We test whether certain dimensions of this space are associated with weather conditions that lead to higher BL.Generally, we find high variability in the shift in distribution between the samples associated with extremely high BL and the remaining samples across the 32 dimensions (Figure 6, left).For all forests, z 14 exhibits the largest absolute difference in the means (Figure 6, right) followed by z 22 and z 12 .Here, z 12 shows a higher absolute mean difference than z 14 for beech and vice versa for pine and spruce.All three dimensions show significant (p < 0:01) mean absolute differences between years with extreme and nonextreme BL > 0:5 for all forests and are selected to construct the prototypes of weather patterns associated with the high BL in the following.We also test whether extreme BL is predictable when only relying on these three dimensions using a logistic regression model.Prediction accuracy is much lower than when monthly weather predictors are used, reaching average precision between 0.30 and 0.34, but is substantially higher than random (compare Table 3 with Table 1).When relying only on these three dimensions, beech and pine BL are better explained by the latent dimensions than spruce BL.
Our VAE architecture allows us to generate precipitation, temperature, and radiation independently from a sample in the latent space.Still, correlations exist between weather variables in the input space (Section 2.1).These correlations propagate to correlations between latent dimensions.We observe that the latent dimensions responsible solely for radiation (z 1:8 ) are negatively correlated (correlation < À 0:3) with the latent dimensions responsible solely for precipitation (z 9:16 ), respectively (Figure A1 in Appendix A).All other dimensions are not correlated.

Weather prototypes and link to BL
The three selected dimensions z 14 , z 12 and z 22 are used to generate weather prototypes P 14|2σ , P 12|2σ , and P 22|2σ .By construction, z 14 and z 12 only affect the reconstruction of precipitation (Section 2.4).However, because of the correlation between z 14 and z 6 , as well as z 12 and z 4 (Figure A1), changing z 14 and z 12 also requires changing the values of z 6 and z 4 to obtain realistic prototypes.The prototype based on z 22 is generated by only changing z 22 .
Prototypes generated from the selected latent dimensions represent basic weather conditions associated with increased BL (Figure 7).Prototype P 14|2σ and P 12|2σ , by design, only show changes in precipitation and radiation.Both prototypes feature negative precipitation anomalies and positive radiation anomalies that are correlated.P 14|2σ is characterized by negative precipitation anomalies in all three summers, coincident with higher radiation.Precipitation is also lower in the autumn of the second year.The lowest precipitation values occur in the year t followed by the year tÀ2 and the year tÀ1 .P 12|2σ has a different pattern and is characterized by two periods of negative precipitation anomalies.The first one occurs between March tÀ2 and July tÀ1 and is less pronounced than the negative anomaly in the second period, which occurs in the first five months of the last year.We only see slightly positive precipitation anomalies in the last half of year t .In both prototypes, maximum precipitation anomalies are around 0.5 mm/d.Prototype P 22|2σ only features temperature variations, with positive temperature anomalies in three periods.The first period starts from March tÀ2 to August tÀ2 , the second from the March tÀ1 and continues for the entire year.The last high-temperature period starts from March t and ends in September t .There is a short period at the beginning of the last year where we see negative temperature anomalies.Based on their anomaly patterns, we call prototypes P 14|2σ Dry Summers, P 12|2σ Dry Winters and P 22|2σ Hot Summers.We generate two compound prototypes based on the selected three mortality-influencing latent dimensions, namely P 12∧22|σ and P 14∧22|σ .While the univariate prototypes are created by moving two standard deviations along a single latent dimension in the direction of increasing BL, compound prototypes are created by moving one standard deviation in two latent dimensions (Section 2.5).P 14∧22|σ corresponds to dry and hot summers, while P 12∧22|σ corresponds to dry winters and hot summers.Weather conditions associated with compound prototypes occur more frequently in the training dataset (2.5-3.4% of all samples) than those associated with univariate prototypes (1.7-2.2%).
Weather conditions that are associated with any of the prototypes (Section 2.5) are generally associated with a higher BL in all forest types when compared to all weather conditions (Figure 8).The effect is particularly strong in the tail of the distribution; the 90th percentile in the BL distribution is about twice as large for the prototypes compared to all samples (Table 4).From the univariate prototypes, P 14|2σ (Dry Summers) is associated with the highest increase in both median and 90th-percentile BL across all forest types (10.4-21.5% and 29.8-107.3%,respectively).Weather conditions associated with the compound prototypes P 14∧22|σ and P 12∧22|σ are generally associated with higher BL compared to weather conditions associated with any of the univariate prototypes, with P 14∧22|σ (Dry and Hot Summers) leading to slightly higher median BL (9.7-24.0%)compared to P 12∧22|σ (dry winters and hot summers, 10.8-24.6%).

Discussion
This study illustrates how novel VAE-based architectures can learn weather prototypes associated with high BL in a simulated forest setting based on the weather for a given time period.Working with simulated Figure 8. Biomass loss density plot for all samples (top), samples contributing to univariate prototypes (2nd to 4th row) and samples contributing to compound prototypes (last two rows) for beech (left), pine (middle), and spruce (right).Vertical dashed and dash-dotted lines represent the median and 90th percentile in each distribution.

e4-12
Mohit Anand et al. data allows us to avoid the complexities of real-world data arising from nonstationarity and confounders' influence.Applicability of these methods beyond stationary weather requires careful validation and potential modifications to the developed methodology, which is beyond the scope of this paper.Lastly, the reconstruction errors, though small and similar for all the weather variables, impact generating realistic weather prototypes.Very high reconstruction errors would lead to reconstructions different from the input.Within these limits, we test the applicability of our approach.The method could further be adapted for use with observational data in future research.
Here, we learn the weather representations in a completely unsupervised setting and then try to find the latent dimensions associated with high BL.Our generative model can produce realistic weather prototypes.Of the three univariate prototypes generated, dry summers and dry winters feature negative precipitation anomalies and positive radiation anomalies, while hot summers feature positive temperature anomalies in summers.All identified weather prototypes are associated with higher BL overall, particularly in the tail of the BL distribution.Dry Summers lead to the highest increase in BL within the univariate prototypes.However, when combined, dry and hot summers lead to the highest increase in BL overall.Gazol and Camarero (2022) also found that European forest mortality frequently cooccurs with dry and hot summers.Vegetation models and Earth system models also generally support this finding (Zscheischler et al., 2014a(Zscheischler et al., , 2014b;;Bastos et al., 2021;Tschumi et al., 2022Tschumi et al., , 2023)).Our analysis shows that compound weather prototypes lead to higher BL than univariate weather prototypes despite occurring more frequently.This is consistent with real-world observations, where concurrent hot and dry conditions have often been identified as primary drivers of forest mortality (Allen et al., 2010;Anderegg et al., 2013;Crouchet et al., 2019;Hammond et al., 2022;Yi et al., 2022).
Some weather variables used are correlated (Figure 2), so we modify the traditional β-VAE architecture to work well with correlated time series.In particular, we design our encoder to take multivariate, highdimensional weather data and learn three sets of eight weather-specific latent dimensions, each corresponding to one of precipitation, radiation, and temperature.The final eight latent dimensions contain information about all three weather variables.The decoder takes 16 (eight weather-specific and eight shared) latent dimensions and outputs one of the weather variables.The rationale is that our decoder only learns the structure of the weather time series, and the latent representation has all the information about the time series.While working with correlated weather variables, our architecture helps us learn unbiased weather representations (Table 2).In earlier tests with the standard β-VAE architecture, we found that the two more correlated weather variables (precipitation and radiation) were better represented than the less correlated ones (temperature).Though we expected all the correlations between weather variables to be captured in the eight shared latent dimensions, this was not the case.We find a one-to-one correlation in all latent dimensions related to solar radiation and precipitation (z 1:8 and z 9:16 , Figure A1).The correlations in latent dimensions highlight that radiation and precipitation are correlated for most months of the year.Our method respects this correlation when generating prototypes (Section 2.5).Autoencoders have been used to learn representations of weather variables (Guevara et al., 2021;Heinze-Deml et al., 2021;Oliveira et al., 2022;Ahn et al., 2023).However, so far, most work focuses on learning spatial representations of weather variables or univariate spatiotemporal representations instead of a multivariate temporal latent representation of variables.Huamin et al. (2020) try to learn time series representation, but only after transforming the time series to a recurrence plot, learning the latent representation of 2-D images for a univariate time series.The method we develop in this study can learn representations of multivariate time series and separate the representations specific to weather variables while considering the correlation between the variables.The latent variables can capture the patterns in the weather variables, but they have no inherent information about forest structure or BL.Thus, they do not learn representations specific to high BL or given forest structures.In unsupervised settings, VAEs cannot learn the disentangled representation without inductive biases (Locatello et al., 2019).Including BL and forest structure information would introduce an inductive bias and possibly improve representation learning.In contrast, by not including forest structure, we identify weather prototypes leading to higher overall BL rather than conditioned to specific forest structures.The use of β-VAE may also be a limitation, as the reconstruction error is higher than that of standard VAE for β > 1 (in Equation 1), possibly due to a trade-off of reconstruction error against disentanglement aligned with human intuition (Burgess et al., 2018).Fil et al. (2021) show how β > 1 does not always yield the best qualitative disentanglement for complex datasets.There is no clear consensus on the correct choice of β, and one must subjectively balance the trade-off between disentanglement and reconstruction error.
Compared to simple composite analysis and logistic regression, the VAE allows us to understand better the diverse weather patterns associated with high BL.In particular, we identify three weather prototypes and their possible combinations, which would not be possible using simple analysis.Nevertheless, the composite plots and logistic regression coefficients also give interesting insights.The composites of anomalies (Figure 5) show drier weather on average before BL years alongside higher temperatures in the last year.Unlike pine and spruce, we see negative temperature anomalies for beech forests in October, which rise in November and December of the previous year.This may be explained by the difference in vegetation period between the forest types.In the FORMIND beech forest model, the dynamic vegetation period ends when the ten-day mean air temperature falls below 9 °C.Therefore, a negative temperature in October would lead to an early end of the vegetation period.Higher temperatures after this may lead to high BL by increasing maintenance respiration and inducing plant water stress.For needle-leaf forests like pine and spruce, the simulated vegetation period is year-round, so the effect of lower temperature leading to higher BL dominates.In general, temperature has both positive and negative effects on plant productivity.Increasing temperature has been reported to increase phenology-driven carbon uptake for temperate forests (Keenan et al., 2014), and the higher the carbon uptake, the healthier the forest.At the same time, high temperatures increase maintenance respiration for plants and can increase the atmosphere's evaporative demand (Millar and Stephenson, 2015), leading to water stress.
Using a DL model that can capture high-dimensional relationships enables us to study the relationship between BL and the preceding years of weather in more detail.There is strong evidence that it is essential to consider multiple years of weather when analyzing tree mortality in the real world.For instance, an assessment of many different forest mortality events across the globe showed that those are typically associated with negative precipitation anomalies two years before the mortality year (Hammond et al., 2022).In response to the Europe-wide 2003 drought, mortality peaked two years after the driest year (Senf et al., 2020;George et al., 2021).On the other hand, the drought in 2018 led to peak forest mortality in the same year (George et al., 2021).We chose the weather of the previous two years along with the current year to analyze the BL in the current year.Accounting for three years of weather emerged as a good tradeoff between model complexity and accuracy in our model simulations.
Adding information about the forest structure (x s ) alongside the weather variables (x d ) increases the predictive accuracy for all forest types (Table 1).It is well known that forest structures play an essential role in forest mortality (van Mantgem et al., 2009;Lines et al., 2010;Young et al., 2017;Restaino et al., 2019).Our VAE does not use any information about the forest structure or forest mortality in the current setting, which may limit its usability.Adding this information to the input or the latent space might lead to a better representation learning for characterizing the extreme BL years.
In our study, we run FORMIND for 200 ha, for which we then see a somewhat deterministic response of forest dynamics to weather conditions.For smaller areas, stochasticity dominates forest mortality (Eid e4-14 Mohit Anand et al.  and Øyen, 2003;Vanderwel et al., 2013).It is important to highlight that in this study, each simulated forest contains only one variety of trees.Most forests are more diverse and include multiple plant functional types.Thus, our interpretations of weather drivers apply to extensive homogeneous forests with a particular kind of tree.

Conclusion
In this study, we develop an interpretable machine learning-based method to identify compound weather drivers of BL in forests using generative DL on simulated forest dynamics.The proposed method is designed to work well with correlated time series variables, which are often present in climate-related data.In our application, where we use simulations of beech, spruce, and pine forests in a temperate climate, we find that sequential hot summers, sequential dry summers, and dry winters are the weather patterns chiefly associated with high BL.Furthermore, we demonstrate that combinations of these weather patterns are associated with even higher BL.The approach thus enables us to generate hypotheses for compound drivers of extreme impacts, which can then be verified by using, for instance, real-world observations.In principle, the method is able to generate new realistic weather conditions associated with high BL.Overall, the method is a step toward an improved understanding of the weather drivers of significant impacts by using deep generative models.

B. Metrics
Three metrics are used for measuring the performance of the logistic regression models.CSI is equal to the total number of correct events predicted (true-positives) divided by the total number of events predicted (true-positives + false-positives) plus the number of misses (false-negatives).
CSI ¼ TP TP + FP + FN : F 1 -score is the harmonic mean of precision and recall, which can also be written as a function of true-positives, false-positives, and false-negatives.

Figure 1 .
Figure 1.Illustration of the methodology.ERA5 data are used to calibrate AWE-GEN.AWE-GEN is used to generate realistic weather conditions.A VAE is used to learn low-dimensional weather representations and generate prototypes.Based on simulated weather conditions, FORMIND simulates forest dynamics, including mortality and biomass loss.FORMIND-derived biomass loss is used to select weather prototypes associated with high biomass loss.The explainable AI method used here consists of the training of VAE together with the generation of weather prototypes relevant for high biomass loss induced by forest mortality.

Figure 2 .
Figure 2. (a) Monthly mean temperature (red) and precipitation (blue) of the bias-adjusted ERA5 (dotted line) and simulated by AWE-GEN (continuous line).Correlation between monthly means across all three variable combinations for all months for the bias-adjusted ERA5 data (b) and data simulated by AWE-GEN (c).

Figure 4 .
Figure 4. Probability density estimate of biomass loss (left axis), along with three-year average standardized temperature (red), radiation (yellow) and precipitation (blue) (right axis).Vertical lines indicate the median, mean, and 90th percentile of the biomass loss distribution.

Figure 5 .
Figure 5. Composite of monthly standardized weather anomalies associated with the 90th percentile of biomass loss (left) for beech (a), pine (c), and spruce (e).Coefficients of logistic regression without structure variables (right) for beech (b), pine (d), and spruce (f).
The VAE architecture with encoder and decoder blocks.The encoder takes in 36 × 3 dimensional weather variables and outputs two 32-dimensional vectors of mean and log variance of a normal distribution, which define the latent space.The decoder takes samples in the latent variable space associated with specific weather variables in the input space (distinguished by different colors) to generate reconstructions of each of the individual weather variables.Three passes of the decoder are required to reconstruct all three weather variables.

Table 1 .
Performance metrics for the logistic regression with weather variables only (x d ) and with structure variables (x d and x s ) Weather representations via β-VAE Initial tests with vanilla β-VAE with similar layers showed better reconstruction for the correlated variables (precipitation and radiation) under-representing temperature.The modified β-VAE achieves similar reconstruction errors for all three weather variables (Table2) by introducing a shared decoder (Section 2.4).Losses and mean standard errors are similar between training, validation, and test datasets, suggesting that the model does not overfit.

Table 2 .
Training, validation and test losses and mean standard errors (MSE) for β-VAE Figure6.Absolute mean difference between samples in the latent dimensions representative of three-year periods leading to extreme and nonextreme BL for beech (a), pine (c), and spruce (e).The latent dimensions are sorted in descending order with respect to the absolute mean difference.Box plot of three latent dimensions with the highest absolute mean difference associated with nonextreme (orange) and extreme BL (purple) for beech (b), pine (d), and spruce (f).https://doi.org/10.1017/eds.2024.2Published online by Cambridge University Press

Table 3 .
Logistic regression with z 14 , z 22 and z 12 Figure7.Weather prototypes.Note that the y-axis ranges differ between the univariate prototypes (first three rows) and the compound prototypes (last two rows).https://doi.org/10.1017/eds.2024.2Published online by Cambridge University Press

Table 4 .
Percentage increase in the median and the 90th percentile biomass loss for different weather prototypes and different types of forests https://doi.org/10.1017/eds.2024.2Published online by Cambridge University Press