
Spatiotemporal self-supervised pre-training on satellite imagery improves food insecurity prediction

Published online by Cambridge University Press:  18 December 2023

Ruben Cartuyvels*
Affiliation:
Department of Computer Science, KU Leuven, Leuven, Belgium
Tom Fierens
Affiliation:
Department of Computer Science, KU Leuven, Leuven, Belgium
Emiel Coppieters
Affiliation:
Department of Computer Science, KU Leuven, Leuven, Belgium
Marie-Francine Moens
Affiliation:
Department of Computer Science, KU Leuven, Leuven, Belgium
Damien Sileo
Affiliation:
Department of Computer Science, KU Leuven, Leuven, Belgium
Corresponding author: Ruben Cartuyvels; Email: ruben.cartuyvels@kuleuven.be

Abstract

Global warming will cause unprecedented changes to the world. Predicting events such as food insecurity in specific regions of the earth is a valuable way to prepare adequate policy responses. Existing food insecurity prediction models are based on handcrafted features such as population counts, food prices, or rainfall measurements. However, finding useful features is challenging, and data scarcity hinders accuracy. We leverage unsupervised pre-training of neural networks to automatically learn useful features from widely available Landsat-8 satellite images. We train neural feature extractors to predict whether pairs of images come from spatially close or distant regions, on the assumption that close regions should have similar features. We also integrate a temporal dimension into our pre-training to capture the temporal trends of satellite images, which further improves accuracy. We show that with unsupervised pre-training on a large set of satellite images, neural feature extractors achieve a macro F1 of 65.4% on the Famine Early Warning Systems Network dataset, a 24% improvement over handcrafted features. We further show that our pre-training method yields better features than supervised learning and previous unsupervised pre-training techniques. We demonstrate the importance of the proposed time-aware pre-training and show that the pre-trained networks can predict food insecurity even with limited availability of labeled data.
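The pre-training objective described above trains a feature extractor so that spatiotemporally close tiles get similar embeddings. A minimal triplet-style sketch of such a contrastive loss, assuming Euclidean distances between embeddings and an illustrative margin of 1.0 (the paper's exact loss formulation may differ):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pushes the anchor embedding at least `margin`
    closer to the positive (a spatiotemporally nearby tile) than to
    the negative (a distant tile)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))
```

A loss of zero means the nearby tile is already embedded at least `margin` closer to the anchor than the distant tile, so the extractor has learned the desired spatial similarity for that triple.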

Information

Type
Methods Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. SSSL: for a given sample, positive samples are those that are closer to it in time than the temporal threshold and closer to it in space than the spatial threshold.
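The pair-selection rule in this caption can be sketched as follows (an illustrative Python sketch, assuming samples carry a date and longitude/latitude coordinates in degrees and using the haversine great-circle distance; the paper's exact distance measure and thresholds may differ):

```python
import math
from datetime import date

def is_positive(sample_a, sample_b, D_t_days, D_g_km):
    """A pair is positive iff it is both temporally close (within D_t)
    and spatially close (within D_g). Samples are (date, lat, lon)."""
    (t_a, lat_a, lon_a), (t_b, lat_b, lon_b) = sample_a, sample_b
    dt = abs((t_a - t_b).days)
    # Great-circle distance in km via the haversine formula.
    R = 6371.0  # mean earth radius in km
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    dphi = phi_b - phi_a
    dlmb = math.radians(lon_b - lon_a)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(dlmb / 2) ** 2)
    dist_km = 2 * R * math.asin(math.sqrt(h))
    return dt <= D_t_days and dist_km <= D_g_km
```

With the "admin" variant mentioned in Figure 5, the spatial test would instead compare administrative-unit membership rather than coordinate distance.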


Figure 2. Examples of $ 145\times 145 $ pixel tiles taken from composite Landsat-8 images of Somalia, exported from Google Earth Engine (GEE; only RGB bands visualized), with corresponding IPC scores. Note that the difference between images with different IPC scores is not easily discernible.


Figure 3. IPC score distribution (a) for each country in the dataset from 2009 to 2020 and (b) for Somalia per year from 2013 to 2020. Note that IPC score 5 occurs only in 2011 in Somalia.


Figure 4. (a) Geography of pre-train data splits: train data are used for SSSL pre-training, validation data are used to select the best checkpoint after pre-training, and out-of-domain data are set aside. (b) Geography of downstream IPC score prediction data splits: train data are used for IPC score classification, validation data are used for early stopping and selecting the best checkpoint, out-of-domain and in-domain test data are used for evaluation.


Table 1. Comparison of (pre-training) dataset sizes in related work


Figure 5. Macro F1 on validation set $ {\mathcal{D}}_{val}^{ipc} $ using different configurations of positive and negative pairs (determined by temporal threshold $ {D}_t $ and spatial threshold $ {D}_g $) for SSSL pre-training, with $ {D}_t $ and $ {D}_g $ denoted on the x-axis. The baseline in this plot always predicts the majority class. “admin” means using administrative units instead of longitude/latitude to define spatial positive pairs.


Table 2. Macro F1 on the in-domain and out-of-domain test set of the SSSL model with spatial and temporal positive pairs vs. baselines: Tile2Vec (also with spatial and temporal pairs), the data augmentation-based model of Patacchiola and Storkey (2020), ImageNet pre-training, random initialization, and the random forest (RF) of Andree et al. (2020). The best result per column is marked in bold.


Figure 6. Test macro F1 on $ {\mathcal{D}}_{test}^{ipc} $ with frozen (a) and unfrozen (b) CNN backbone weights for models with different weight initializations using increasing amounts of labeled training data.


Figure 7. Test macro F1 on $ {\mathcal{D}}_{test}^{ipc- temp} $ with frozen (a) and unfrozen (b) CNN backbone weights for neural networks with different weight initializations and a random forest when predicting an increasing number of time steps into the future (one step corresponds to 3–4 months, two to 6–8 months, and three to 9–12 months).


Figure 8. Test macro F1 on $ {\mathcal{D}}_{\mathrm{test}}^{\mathrm{ipc}} $ of the SSSL model with unfrozen CNN weights (magenta line, right vertical axis), and ground truth (red) and predicted (blue) IPC score distributions (violin plots, left vertical axis), both versus the season of the IPC score measurement (x-axis). Note that only four IPC scores are depicted, since only four out of five possible IPC scores occur in Somalia between 2013 and 2020.


Figure 9. Mean SHAP values per Landsat-8 band for 100 tiles per IPC score. A positive mean SHAP value for a given band and predicted IPC score means that strong activations for features in that band make the prediction of that IPC score more likely.
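The per-band aggregation behind this figure can be sketched as follows (an illustrative sketch only: the array shape and the way per-pixel SHAP attributions are obtained, e.g. from a SHAP explainer over the CNN, are assumptions not specified by the caption):

```python
import numpy as np

def mean_shap_per_band(shap_values):
    """Average per-pixel SHAP attributions over tiles and pixels.

    shap_values: array of shape (n_tiles, height, width, n_bands)
    holding attributions for one predicted IPC score.
    Returns one mean attribution per spectral band."""
    return shap_values.mean(axis=(0, 1, 2))
```

Averaging over tiles and pixels collapses the spatial dimensions, leaving a single signed contribution per spectral band, which is what the bars in the figure show.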


Table 3. Random forest performance for binary and multiclass predictions compared to pre-trained neural networks. The best result per column is marked in bold.


Figure A1. Macro F1 on validation set $ {\mathcal{D}}_{val}^{ipc} $ using different configurations of positive and negative pairs for Tile2Vec pre-training, with $ {D}_g $ and $ {D}_t $ denoted on the x-axis. The baseline in this plot always predicts the majority class. “admin” means using administrative units instead of longitude/latitude to define spatial positive pairs.


Figure B1. Test macro F1 on out-of-domain test set $ {\mathcal{D}}_{ood}^{ipc} $ with frozen (a) and unfrozen (b) CNN backbone weights for models with different weight initializations using increasing amounts of labeled training data.


Figure C1. Rows show example images with ground-truth IPC scores in ascending order (the first row shows an image with IPC score 1, etc.), and the last four columns show the SHAP values of the red, near-infrared, and first shortwave infrared input bands for output IPC score predictions of 1–4. The pixel contributions follow image features such as vegetation, and a single pixel can contribute in opposite directions to different IPC scores.