Pollution tracker: Finding industrial sources of aerosol emission in satellite imagery

Published online by Cambridge University Press: 03 July 2023

Peter Manshausen*
Affiliation:
Atmospheric, Oceanic and Planetary Physics, Department of Physics, University of Oxford, Oxford, United Kingdom
Duncan Watson-Parris
Affiliation:
Atmospheric, Oceanic and Planetary Physics, Department of Physics, University of Oxford, Oxford, United Kingdom; Scripps Institution of Oceanography and Halicioğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
Lena Wagner
Affiliation:
GAF AG, Munich, Germany
Pirmin Maier
Affiliation:
GAF AG, Munich, Germany
Sybrand J. Muller
Affiliation:
GAF AG, Munich, Germany
Gernot Ramminger
Affiliation:
GAF AG, Munich, Germany
Philip Stier
Affiliation:
Atmospheric, Oceanic and Planetary Physics, Department of Physics, University of Oxford, Oxford, United Kingdom
*Corresponding author: Peter Manshausen; Email: peter.manshausen@physics.ox.ac.uk

Abstract

The effects of anthropogenic aerosol, solid or liquid particles suspended in the air, are the biggest contributor to uncertainty in current climate perturbations. Heavy industry sites, such as coal power plants and steel manufacturers, large sources of greenhouse gases, also emit large amounts of aerosol in a small area. This makes them ideal places to study aerosol interactions with radiation and clouds. However, existing data sets of heavy industry locations are either not public, or suffer from reporting gaps. Here, we develop a supervised deep learning algorithm to detect unreported industry sites in high-resolution satellite data, using the existing data sets for training. For the pipeline to be viable at global scale, we employ a two-step approach. The first step uses 10 m resolution data, which is scanned for potential industry sites, before using 1.2 m resolution images to confirm or reject detections. On held-out test data, the models perform well, with the lower resolution one reaching up to 94% accuracy. Deployed to a large test region, the first stage model yields many false positive detections. The second stage, higher resolution model shows promising results at filtering these out, while keeping the true positives, improving the precision to 42% overall, so that human review becomes feasible. In the deployment area, we find five new heavy industry sites which were not in the training data. This demonstrates that the approach can be used to complement existing data sets of heavy industry sites.
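The two-step approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of a location as a latitude/longitude pair, and the second-stage threshold are assumptions made for illustration. Only the overall structure (a cheap low-resolution scan followed by a high-resolution confirmation) and the 50% first-stage cut-off mentioned in the caption of Figure 2 come from the text.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical representation of a candidate scene: (latitude, longitude).
Location = Tuple[float, float]


def two_stage_detect(
    locations: Iterable[Location],
    coarse_score: Callable[[Location], float],
    fine_score: Callable[[Location], float],
    coarse_threshold: float = 0.5,  # first-stage cut-off, as in the Figure 2 caption
    fine_threshold: float = 0.5,    # assumed; the source does not state this value
) -> List[Location]:
    """Scan all locations with the cheap model, then confirm with the costly one."""
    # Stage 1: the low-resolution model is run everywhere and keeps likely sites.
    candidates = [loc for loc in locations if coarse_score(loc) > coarse_threshold]
    # Stage 2: the high-resolution model is run only on the surviving candidates.
    return [loc for loc in candidates if fine_score(loc) > fine_threshold]


def precision(detections: Iterable[Location], known_sites: Set[Location]) -> float:
    """Fraction of detections that correspond to a known (true) site."""
    detections = list(detections)
    if not detections:
        return 0.0
    true_positives = sum(1 for d in detections if d in known_sites)
    return true_positives / len(detections)
```

The design point is cost: the high-resolution model only sees scenes that pass the first stage, so the expensive imagery is requested for a small fraction of the region, and the remaining detections are few enough for human review.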

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press
Figure 1. Example visualizations of the RGB channels of the Sentinel-2 patches in the data set. For a human, characteristic features of steel plants are, for example, the dark surface together with industrial structures, as in the first panel of the top row. For coal plants, cooling towers are common; note the bright crescent shape in the bottom right corner of the second panel of the top row. Labels show truth (tr.) and prediction (pred.) of our best model.

Table 1. Performance evaluation of the best model (ResNet50v2) on the held-out test data for both the lower and higher resolution cases

Table 2. High resolution model confusion matrix, normalized by the number of samples
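As a sketch of what "normalized by the number of samples" could mean in practice, the helper below divides every cell of the confusion matrix by the total number of samples, so that all entries sum to one. This reading is an assumption (per-class normalization is another common convention), and the function name and integer class encoding are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def normalized_confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> List[List[float]]:
    """Confusion matrix with rows = true class, columns = predicted class,
    each cell divided by the total number of samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    total = len(y_true)
    return [[cell / total for cell in row] for row in counts]
```

With this convention the diagonal entries add up to the overall accuracy, which makes the matrix easy to compare against a single accuracy figure.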

Figure 2. RGB images of the second stage higher resolution predictions in the deployment region. Scenes that the first stage classed as >50% probability of coal plant are downloaded at higher resolution from Bing Maps and fed into the model. The second stage detects 23 out of 24 coal plants that were present in the training set (top row, labeled “in training,” and with probability output). It also detects five “new” coal plants not in the data set from Global Energy Monitor, four of which are shown in the bottom row with their respective locations.

Supplementary material: PDF

Manshausen et al. supplementary material

Appendix
