Assessing the utility of machine learning for predicting food sufficiency: a case study in Malawi

Published online by Cambridge University Press: 23 July 2025

Andrew Tomes*
Affiliation: Evans School of Public Policy and Governance, University of Washington, Seattle, WA, USA
Shahrzad Gholami
Affiliation: Microsoft AI for Good Lab
Didier Alia
Affiliation: Evans School of Public Policy and Governance, University of Washington, Seattle, WA, USA
Conor Hennessy
Affiliation: Evans School of Public Policy and Governance, University of Washington, Seattle, WA, USA
Dafeng Xu
Affiliation: Evans School of Public Policy and Governance, University of Washington, Seattle, WA, USA
Cecilia Bitz
Affiliation: Atmospheric Sciences Department, University of Washington, Seattle, WA, USA
Rahul Dodhia
Affiliation: Microsoft AI for Good Lab
Juan Lavista Ferres
Affiliation: Microsoft AI for Good Lab
C. Leigh Anderson
Affiliation: Evans School of Public Policy and Governance, University of Washington, Seattle, WA, USA
*Corresponding author: Andrew Tomes; Email: altomes@uw.edu

Abstract

This study explores the potential of applying machine learning (ML) methods to identify and predict areas at risk of food insufficiency using a parsimonious set of publicly available data sources. We combine household survey data that captures monthly reported food insufficiency with remotely sensed measures of factors influencing crop production and maize price observations at the census enumeration area (EA) level in Malawi. We consider three machine-learning models of different levels of complexity suitable for tabular data (TabNet, random forests, and LASSO) and classical logistic regression and examine their performance against the historical occurrence of food insufficiency. We find that the models achieve similar accuracy levels with differential performance in terms of precision and recall. The Shapley additive explanation decomposition applied to the models reveals that price information is the leading contributor to model fits. A possible explanation for the accuracy of simple predictors is the high spatiotemporal path dependency in our dataset, as the same areas of the country are repeatedly affected by food crises. Recurrent events suggest that immediate and longer-term responses to food crises, rather than predicting them, may be the bigger challenge, particularly in low-resource settings. Nonetheless, ML methods could be useful in filling important data gaps in food crisis prediction, if followed by measures to strengthen food systems affected by climate change. Hence, we discuss the tradeoffs in training these models and their use by policymakers and practitioners.
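
The abstract's observation that models can reach similar accuracy while differing in precision and recall can be illustrated with a minimal Python sketch. The labels and predictions below are invented toy values, not data from the study, and the function name `classification_metrics` is hypothetical; only the formulas are the standard definitions of accuracy, precision, and recall.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels (1 = food insufficient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy data (invented): 10 EA-months, 4 of them truly food insufficient.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical model A flags few areas, but every flag is correct.
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical model B flags more areas: no misses, but two false alarms.
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

metrics_a = classification_metrics(y_true, model_a)
metrics_b = classification_metrics(y_true, model_b)
# Both have accuracy 0.8; A has precision 1.0 and recall 0.5,
# while B has precision 2/3 and recall 1.0.
```

In this toy case, accuracy alone cannot separate the two models. Which one is preferable depends on whether missed food-insufficient areas or false alarms are costlier to the user.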

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Average proportion of households with insufficient food in each EA by month and survey wave (subplot title). Error bars represent the 95% CI.

Table 1. Classification categories and their compositions

Figure 2. Selected spatial data means across EAs for the time range included in this study. From upper left to lower right, mean surface temperature, precipitation, soil moisture, vapor pressure, vapor pressure deficit, Palmer Drought Severity Index (PDSI), and vegetation index (NDVI).

Figure 3. Comparison of the two food insufficiency classifications and sampling intensity across the four survey waves. The c1 classification is shown on the left and c2 on the right. The dark bars indicate the number of food-insufficient EAs, while the light bars indicate the number of food-sufficient EAs for each month.

Table 2. Description of the variables used for each specification

Table 3. Summary statistics for selected variables; NDVI and PDSI are both indices and are unitless

Table 4. Evaluation metrics for the performance of a classifier

Figure 4. From left to right, comparisons for waves 1, 3, and 4 in terms of precision and recall across model specifications for all models at the c1 threshold using the within-wave training and testing datasets. The red dotted lines show the precision and recall fit with logistic regression and the minimal variable set. Points that fall in the upper right quadrant are more accurate, while those in the upper left outperform on recall but underperform on precision, and those in the lower right outperform on precision but underperform on recall. Abbreviations: logit: logistic regression, RF: random forests.

Table 5. Comparison of the simple logit fit to higher-performing models in wave 1 at the c1 threshold

Table 6. Comparison of the simple logit fit to higher-performing models in wave 3 at the c1 threshold

Table 7. Comparison of the simple logit fit to higher-performing models in wave 4 at the c1 threshold

Figure 5. Comparison of precision and recall across model specifications for all models at the c2 classification using the within-wave training and testing datasets. Waves 1, 3, and 4 are presented from left to right.

Table 8. Comparison of the best-performing models to the minimal logit model in terms of classification accuracy, precision, and recall using the c2 category in wave 1

Table 9. Comparison of the best-performing models in wave 3 at the c2 threshold

Table 10. Comparison of the best-performing models in wave 4 at the c2 threshold

Figure 6. SHAP decompositions for models trained and tested on data from wave 1 (top left), wave 3 (top right), and wave 4 (bottom), c1 classification. Abbreviations: inflationcpi: monthly CPI inflation; lsms_elev: elevation; mzinfl: monthly maize price inflation; mzprice: maize price; ppt: precipitation; qtr: quarter; vap: vapor pressure; vpd: vapor pressure deficit. “Lag” indicates how far before the observation (in months) the value was taken.

Table 11. Precision, recall, and accuracy of the nearest-neighbor matching (best-guess) approach for the c1 and c2 categories and each of the three waves for which predictions were generated
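
A best-guess baseline of the kind named in the Table 11 caption can be sketched as below. Only the caption is available here, so the matching rule (copy the label of the nearest earlier observation in the same calendar month), the distance measure, the data layout, and the function name `best_guess` are all assumptions for illustration, not the authors' procedure. The coordinates and labels are invented.

```python
import math


def best_guess(history, lat, lon, month):
    """Copy the label of the nearest earlier observation for the same calendar month.

    history: list of (lat, lon, month, label) tuples from earlier survey waves
             (hypothetical layout; label 1 = food insufficient).
    Returns None when no earlier observation exists for that month.
    """
    candidates = [h for h in history if h[2] == month]
    if not candidates:
        return None
    # Assumption: straight-line distance in degrees is adequate at this scale.
    nearest = min(candidates, key=lambda h: math.hypot(h[0] - lat, h[1] - lon))
    return nearest[3]


# Invented earlier-wave observations: (lat, lon, month, label).
history = [
    (-13.9, 33.7, 1, 1),
    (-15.8, 35.0, 1, 0),
    (-13.9, 33.7, 6, 0),
]

guess_north = best_guess(history, -14.0, 33.8, 1)    # nearest match has label 1
guess_south = best_guess(history, -15.7, 35.1, 1)    # nearest match has label 0
guess_missing = best_guess(history, -14.0, 33.8, 3)  # no observation for month 3
```

A baseline like this uses no weather or price inputs. If the same areas are affected repeatedly, as the abstract reports, it can still be hard to beat, which is why it is a useful benchmark for the ML models.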

Figure 7. Precision and recall scores across specifications and models on the c1 classification compared to the benchmark precision and recall of the naïve classifier (red lines). The left plot represents predictions on wave 2 derived from wave 1, the center predictions on wave 3 derived from waves 1 and 2, and the right predictions on wave 4 derived from waves 1, 2, and 3.

Figure 8. Precision and recall scores across specifications and models on the c2 classification compared to the benchmark precision and recall of the naïve classifier (red lines). The left plot represents predictions on wave 2 derived from wave 1, the center predictions on wave 3 derived from waves 1 and 2, and the right predictions on wave 4 derived from waves 1, 2, and 3.

Figure 9. SHAP decompositions for models trained and tested on data from wave 1 (top left), wave 3 (top right), and wave 4 (bottom), c1 classification. Abbreviations: inflationcpi: monthly CPI inflation; mzinfl: monthly maize price inflation; mzprice: maize price; ppt: precipitation; tmin: mean daily minimum temperature; tmean: mean daily temperature; tmax: mean daily maximum temperature; qtr: quarter; vap: vapor pressure; vpd: vapor pressure deficit. “Lag” indicates how far before the observation (in months) the value was taken.

Supplementary material

Tomes et al. supplementary material 1 (File, 264.7 KB)
Tomes et al. supplementary material 2 (File, 120.2 KB)
Tomes et al. supplementary material 3 (File, 290.8 KB)