Food security analysis and forecasting: A machine learning case study in southern Malawi

Published online by Cambridge University Press:  11 October 2022

Shahrzad Gholami*
Affiliation:
AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
Erwin Knippenberg
Affiliation:
Poverty and Equity Global Practice, The World Bank, Washington, District of Columbia 20433, USA
James Campbell
Affiliation:
Food Security Monitoring and Evaluation Programs, Catholic Relief Services, Baltimore, Maryland 21201, USA
Daniel Andriantsimba
Affiliation:
Food Security Monitoring and Evaluation Programs, Catholic Relief Services, Baltimore, Maryland 21201, USA
Anusheel Kamle
Affiliation:
Food Security Monitoring and Evaluation Programs, Catholic Relief Services, Baltimore, Maryland 21201, USA
Pavitraa Parthasarathy
Affiliation:
AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
Ria Sankar
Affiliation:
AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
Cameron Birge
Affiliation:
AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
Juan Lavista Ferres
Affiliation:
AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
*Corresponding author. E-mail: sgholami@microsoft.com

Abstract

Chronic food insecurity remains a challenge globally, exacerbated by climate change-driven shocks such as droughts and floods. Forecasting food insecurity levels and targeting vulnerable households is a priority for humanitarian programming to ensure timely delivery of assistance. In this study, we propose to harness a machine learning approach trained on high-frequency household survey data to infer the predictors of food insecurity and forecast household-level outcomes in near real-time. Our empirical analyses leverage the Measurement Indicators for Resilience Analysis (MIRA) data collection protocol implemented by Catholic Relief Services (CRS) in southern Malawi, a series of sentinel sites collecting household data monthly. When focusing on predictors of community-level vulnerability, we show that a random forest model outperforms other algorithms and that location and self-reported welfare are the best predictors of food insecurity. We also show performance results across several neural networks and classical models for various data modeling scenarios to forecast food security. We pose the forecasting problem as binary classification by dichotomizing the food security score at two different thresholds, which yields two different ratios of positive to negative class. Our best performing model has an F1 of 81% and an accuracy of 83% in predicting food security outcomes when the outcome is dichotomized at a threshold of 16 and the predictor features consist of the historical food security score along with 20 variables selected by artificial intelligence explainability frameworks. These results showcase the value of combining high-frequency sentinel site data with machine learning algorithms to predict future food insecurity outcomes.
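
As a minimal sketch of the dichotomization and evaluation described in the abstract, the Python snippet below turns a food security score into a binary label at the two thresholds (16 and 19) and computes accuracy and F1. It is an illustration under stated assumptions, not the authors' implementation: the function names `dichotomize` and `accuracy_and_f1` are hypothetical, the scores are made up, and treating scores at or above the threshold as the positive class is an assumption that the abstract does not state.

```python
from typing import List, Sequence, Tuple


def dichotomize(scores: Sequence[float], threshold: float) -> List[int]:
    """Map each score to 1 (positive class) or 0 (negative class).

    Assumption: a score at or above the threshold is the positive class.
    """
    return [1 if s >= threshold else 0 for s in scores]


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute accuracy and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return accuracy, f1


# Made-up scores for six hypothetical households (not data from the study).
scores = [4, 17, 22, 16, 9, 30]

labels_16 = dichotomize(scores, 16)
labels_19 = dichotomize(scores, 19)

# The same scores give different class balances at the two thresholds.
share_positive_16 = sum(labels_16) / len(labels_16)
share_positive_19 = sum(labels_19) / len(labels_19)

# Hypothetical predictions compared against hypothetical true labels.
acc, f1 = accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
```

Raising the threshold from 16 to 19 moves borderline households into the negative class, which is why the two thresholds produce different class ratios and why accuracy alone can be misleading when the classes are imbalanced.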

Information

Type
Research Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Figure 1. Machine learning workflow developed to study food security status of households and communities based on the Measurement Indicators for Resilience Analysis protocol.

Table 1. Comparison of the sample of retained households (HHs) versus HHs that dropped out

Figure 2. Malawi map and visual distribution of households.

Table 2. Measurement Indicators for Resilience Analysis coverage

Table 3. Basic HH demographics and livelihood information

Table 4. Coping strategy index module in the Measurement Indicators for Resilience Analysis high-frequency survey and weights assigned to each strategy to compute rCSI score

Figure 3. Reduced Coping Strategy Index score distributions.

Figure 4. Mean binary reduced Coping Strategy Index score distributions based on the two thresholds 19 and 16.

Figure 5. Machine learning dataset preparation and split scheme with cross-sectional data assumption where independent variables are used as predictor features for the reduced Coping Strategy Index score at the time step. Records are pooled together for HHs and time steps to generate a randomized data split. The horizontal axis denotes time, the vertical axis denotes HHs, and the depth axis denotes independent variables.

Figure 6. Time-series data point preparation and split scheme. The horizontal axis denotes time, the vertical axis denotes households, and the depth axis denotes independent variables.

Figure 7. Neural networks used for time-series forecasting.

Table 5. Evaluation metrics for performance of classifier

Table 6. Comparing classical ML model performances based on random partitioning of the dataset

Table 7. Predictive performance using previous rCSI + SHAP top 20 features and 15 months of data

Figure 8. Shapley additive explanations global and dependence plots.

Figure 9. Evaluation results versus time-series length (reduced Coping Strategy Index cutoff = 19).

Figure 10. Evaluation results versus time-series length (reduced Coping Strategy Index cutoff = 16).

Table 8. RF model performance across multiple steps ahead using rCSI + SHAP top 20 features and length of 12 months for time series

Table 9. Confusion matrix for the threshold 19 and random forest results for robustness check presented in Table 8

Table 10. Confusion matrix for the threshold 16 and random forest results for robustness check presented in Table 8

Table 11. MIRA high-frequency survey snippet.

Table 12. MIRA annual survey snippet.