
Cross-modal distillation for flood extent mapping

Published online by Cambridge University Press:  07 November 2023

Shubhika Garg*
Affiliation:
Google Research
Ben Feinstein
Affiliation:
Google Research
Shahar Timnat
Affiliation:
Google Research
Vishal Batchu
Affiliation:
Google Research
Gideon Dror
Affiliation:
Google Research School of Computer Sciences, The Academic College of Tel Aviv–Yaffo, Tel Aviv, Israel
Adi Gerzi Rosenthal
Affiliation:
Google Research
Varun Gulshan
Affiliation:
Google Research
*
Corresponding author: Shubhika Garg; Email: shubhikagarg123@gmail.com

Abstract

The increasing intensity and frequency of floods are among the many consequences of our changing climate. In this work, we explore ML techniques that improve the flood detection module of an operational early flood warning system. Our method exploits an unlabeled dataset of paired multi-spectral and synthetic aperture radar (SAR) imagery to reduce the labeling requirements of a purely supervised learning approach. Prior works have used unlabeled data by deriving weak labels from it; however, our experiments show that a model trained this way still ends up memorizing the mistakes in those weak labels. Motivated by knowledge distillation and semi-supervised learning, we explore the use of a teacher to train a student with the help of a small hand-labeled dataset and a large unlabeled dataset. Unlike the conventional self-distillation setup, we propose a cross-modal distillation framework that transfers supervision from a teacher trained on a richer modality (multi-spectral images) to a student model trained on SAR imagery. The trained models are then tested on the Sen1Floods11 dataset. Our model outperforms the Sen1Floods11 baseline model trained on weak-labeled SAR imagery by an absolute margin of $ 6.53\% $ intersection over union (IoU) on the test split.
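In outline, the teacher's per-pixel class probabilities supervise the student on unlabeled SAR/multi-spectral pairs, while hand-labeled pixels contribute a standard cross-entropy term. The paper itself gives the full training details; the following is only a minimal NumPy sketch of such a combined objective, in which the function names, the weighting factor alpha, and the temperature T are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels=None,
                      alpha=0.5, T=2.0):
    """Hypothetical combined objective: hard-label cross-entropy on
    hand-labeled pixels plus a soft KL term that transfers the
    teacher's (temperature-smoothed) predictions to the student.

    student_logits, teacher_logits: arrays of shape (num_pixels, num_classes)
    labels: optional integer class indices of shape (num_pixels,)
    """
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    # KL(teacher || student), averaged over pixels.
    kl = np.sum(p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9)), axis=-1).mean()
    if labels is None:
        return kl
    # Cross-entropy on the hand-labeled pixels.
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-9).mean()
    return alpha * ce + (1 - alpha) * kl
```

In a cross-modal setup like the one described above, the teacher would consume both Sentinel-1 and Sentinel-2 channels while the student sees only Sentinel-1, so at inference time cloud-free multi-spectral input is no longer required.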

Information

Type
Application Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press
Figure 1. Selectively sampled Sentinel-1 and Sentinel-2 images taken during flooding events. Note: Clouds are heavily correlated with flooding events, and these examples visualize some Sentinel-2 images during such events. SAR images, on the other hand, can see through the clouds and serve as a more useful input for segmentation models during flooding events.

Figure 2. Red points highlight the regions from which Sen1Floods11 flooding event data points were sampled, and blue points indicate the same for the Floods208 dataset.

Figure 3. Selectively sampled data points from Sen1Floods11 weak-labeled data (first three rows) and the Floods208 dataset (last four rows). Note: In the weak label, the mapping is green: dry, blue: water, and white: clouded/invalid pixels. These examples highlight the poor quality of the weak labels.

Table 1. Summary of the key attributes of all the datasets used for training and evaluation

Figure 4. Randomly selected Sentinel-1 images from the training split with their corresponding label and edge map. Note: In the label, blue pixels denote water, peach pixels denote dry regions, and black pixels denote invalid pixels. The edge map shows the inner and outer edges in white and gray, respectively.

Figure 5. Selected examples from the training split demonstrating the effect of the amount of label noise on memorization of label mistakes. Note: (Left) The model learns these mistakes when the label is of very poor quality and is missing most of the river. (Right) The model can overcome these mistakes when there is less noise. The color scheme for the labels and predictions is the same as Figure 3, with the addition of black regions representing out-of-bounds pixels.

Figure 6. Improving a weak label from the training split using the water occurrence map. Note: The color scheme used in the weak label is the same as Figure 5.

Figure 7. Overview of our cross-modal distillation framework. Note: In the training stage, a teacher model using Sentinel-1 and Sentinel-2 images is used to train a student using only the Sentinel-1 image. At inference time, only the student is used to make predictions.

Table 2. Test split results of our model trained on Sen1Floods11 hand-labeled data at 16 m resolution

Table 3. Results of our Sentinel-1 supervised baseline models, improved weak-label supervised model, and our cross-modal distillation framework on Sen1Floods11 hand-label test split at 16 m resolution

Figure 8. Model inference visualization on selected images from the Sen1Floods11 hand-labeled test split. Note: Here, the weak supervised model refers to the weak supervised baseline trained on the Sen1Floods11 and Floods208 datasets. The color scheme used for the predictions and ground truth is the same as Figure 5. The Sentinel-2 image is not passed as input to the model and is shown only for visualization purposes. The ground truth is hand labeled on the Sentinel-2 image and can contain clouds labeled in white. It can be seen that cross-modal distillation produces sharper and more accurate results. The weak-labeled supervised baseline, on the other hand, sometimes misses large parts of the river due to mistakes learnt from the training data.

Figure 9. Selected examples showcasing the model's failure cases on the Sen1Floods11 hand-labeled test split, using the same color scheme as Figure 5. Note: The Sentinel-2 image is not passed as input to the model and is shown only for visualization purposes. The ground truth is hand labeled on the Sentinel-2 image. We can infer that the model struggles with segmenting water due to ambiguities in the Sentinel-1 image. We can also see that it sometimes fails to detect extremely thin rivers.

Table 4. Performance comparison of our cross-modal distillation model with other methods on all water hand labels from the Sen1Floods11 test set at 10 m resolution. Bold indicates the best IoU across all models.

Table 5. Validation split results for decoder stride comparison on Sen1Floods11 hand-labeled split

Table 6. Validation split results for loss comparison on Sen1Floods11 hand-labeled split