Modeling Shrimp Income and Disease Risks Prevalence Using Econometric and Machine Learning Approaches: Evidence from Vietnam

Published online by Cambridge University Press:  29 April 2025

Brice M. Nguelifack
Affiliation:
Department of Mathematics, United States Naval Academy, MD, USA
Kim Anh T. Nguyen
Affiliation:
Department of Economics, Nha Trang University, Nha Trang City, Vietnam
Tram Anh T. Nguyen
Affiliation:
Department of Economics, Nha Trang University, Nha Trang City, Vietnam
Curtis Jolly*
Affiliation:
Department of Agricultural Economics and Rural Sociology, Auburn University, Auburn, AL, USA
*Corresponding author: Curtis Jolly; Email: cjolly@auburn.edu

Abstract

Constrained econometric techniques hamper investigations of disease prevalence and income risks in the shrimp industry. We employ an econometric model and machine learning (ML) to reduce model restrictions and improve understanding of the influence of diseases and climate on income and disease risks. Applying the models to interview data from 534 farmers allows us to identify the factors influencing shrimp income and disease risks. ML complemented the Just-Pope production model, and the partial dependency plots show nonlinear relationships between income, disease prevalence, and risk factors. The econometric and ML models generated complementary information for understanding income and disease prevalence risk factors.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Southern Agricultural Economics Association

Table 1. Descriptive statistics of selected socioeconomic and biological variables for 534 farmers

Figure 1. Distribution of the training and validation sets. The data points in the two sets exhibit similar patterns and span the same range of features, which suggests that the model is likely to perform well during validation. This alignment indicates that the validation set is representative of the training data, which is crucial for assessing the model’s generalization capability.
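One way to put a number on the similarity described in this caption is sketched below. It is only an illustration under stated assumptions: the 80/20 split, the two-sample Kolmogorov-Smirnov statistic, the helper names, and the synthetic `pond_area` values are all hypothetical and are not taken from the article.

```python
import random

def train_validation_split(values, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into training and validation lists."""
    rng = random.Random(seed)
    shuffled = values[:]
    rng.shuffle(shuffled)
    n_valid = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every observation <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap

# Synthetic stand-in for one feature (assumption: NOT the survey data).
rng = random.Random(42)
pond_area = [rng.lognormvariate(7.5, 0.6) for _ in range(534)]

train, valid = train_validation_split(pond_area, validation_fraction=0.2, seed=1)
gap = ks_statistic(train, valid)
print(len(train), len(valid), round(gap, 3))
```

A statistic near 0 means the two empirical distributions nearly coincide; a value near 1 means they barely overlap. Repeating the check for each feature gives a rough numerical counterpart to the visual comparison.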

Table 2. Frequencies and percentages of answers to shrimp risk management questions

Table 3. Results of the ordinary linear regression model with a log transformation of the response variable income risks

Table 4. Results of the ordinary linear regression model with a log transformation of the response variable disease risks

Figure 2. (a) Variable importance using the Random Forest (RF) model. Each bar represents the contribution of the corresponding variable to the model’s predictions. Variables with greater importance are more influential in driving the model’s predictions, indicating which factors are most critical for understanding the underlying patterns in the data. (b) Variable importance using the Cubist (CB) model, displayed in the same way.
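The caption does not state how importance is scored. As an illustration only, the sketch below uses permutation importance, a common model-agnostic score that is swapped in here as an assumption: it measures how much the prediction error grows when one variable's values are shuffled. The function `toy_model`, the variable names, and the data are hypothetical stand-ins, not the fitted RF or CB models.

```python
import random

def mean_squared_error(predict, rows, targets):
    """Average squared difference between predictions and observed targets."""
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, feature, n_repeats=10, seed=0):
    """Mean increase in MSE after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen feature replaced.
        permuted = [dict(row, **{feature: v}) for row, v in zip(rows, shuffled)]
        increases.append(mean_squared_error(predict, permuted, targets) - baseline)
    return sum(increases) / n_repeats

def toy_model(row):
    """Hypothetical stand-in for a fitted model; it ignores `dependents`."""
    return 2.0 / (1.0 + row["experience"]) + 0.01 * row["stocking_density"]

# Synthetic data (assumption: NOT the survey data).
rng = random.Random(1)
farms = [
    {
        "experience": rng.randint(1, 30),
        "stocking_density": rng.uniform(20, 120),
        "dependents": rng.randint(0, 6),
    }
    for _ in range(200)
]
targets = [toy_model(farm) + rng.gauss(0, 0.1) for farm in farms]

scores = {name: permutation_importance(toy_model, farms, targets, name) for name in farms[0]}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, round(score, 4))
```

A variable the model never uses scores zero, while variables the model relies on score higher. Sorting the scores produces the kind of ranked bar chart the caption describes.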

Figure 3. Variable importance using the Random Forest (RF) model on the left (a) and the support vector machine (SVM) model on the right (b). Here, the variables shown are those that the RF and SVM models identify as important for disease-prevalence-related risk.

Table 5. Significant variables from the J-P models and important variables from the ML models. For the J-P models, the + and - signs indicate that a variable is positively or negatively related to the response, respectively; for the RF, CB, and SVM models, a check mark indicates that the variable tends to be important in the model. Empty cells indicate that the variable is not significant (for J-P) or not important (for the other models).

Figure 4. Partial dependency plots of predicted income risk versus the top variables identified by the Random Forest model. Each curve shows how changes in one of the top variables affect the predicted income risk while holding other variables constant, which helps clarify the factors that drive income risk in the dataset. The panels show the relationship between income risk and (a) experience, (b) knowledge, (c) a place for sludge, (d) the number of crop years, (e) disease prevalence, and (f) pond size.
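A partial dependence curve of this kind can be computed for any prediction function, as the minimal sketch below shows. Strictly speaking, the other variables are not fixed at a single value: predictions are averaged over their observed values while the variable of interest is forced to each grid point. The function `toy_risk_model`, the variable names, and the data are hypothetical stand-ins, not the article's Random Forest or survey data.

```python
import random

def partial_dependence(predict, rows, feature, grid):
    """Return the average prediction with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_risk_model(row):
    """Hypothetical stand-in for a fitted model; NOT the paper's Random Forest."""
    return 2.0 / (1.0 + row["experience"]) + 0.01 * row["stocking_density"]

# Synthetic data (assumption: NOT the survey data).
rng = random.Random(0)
farms = [
    {"experience": rng.randint(1, 30), "stocking_density": rng.uniform(20, 120)}
    for _ in range(534)
]

grid = [1, 5, 10, 20, 30]
curve = partial_dependence(toy_risk_model, farms, "experience", grid)
for years, risk in zip(grid, curve):
    print(years, round(risk, 3))
```

Plotting `curve` against `grid` gives one panel of such a figure. A curve that bends rather than following a straight line is the kind of nonlinear relationship the abstract refers to.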

Figure 5. Partial dependency plots of predicted disease risk versus the top variables identified by the Random Forest model. Each curve shows how changes in one of the top variables affect the predicted disease risk while holding other variables constant, which helps clarify the factors that drive disease risk in the dataset. The panels show the relationship between disease risk and (a) the number of dependents living in the family home, (b) the number of years of experience, (c) crop years, (d) stocking density, and (e) having a place for sludge.

Figure 6. Interaction plots of prevalence-related risk.
(a) Pond size and stocking density. The plot reveals how varying stocking densities influence risk levels across pond sizes. Understanding this interaction is crucial for effective management practices, as it highlights the conditions under which prevalence risk may increase, enabling stakeholders to make informed decisions regarding optimal stocking strategies.
(b) Pond size and income. The plot reveals how variations in income levels influence risk across different pond sizes. This interaction is crucial for understanding the economic factors contributing to prevalence risk, enabling stakeholders to make informed decisions that balance financial outcomes with risk management strategies in aquaculture.
(c) Pond size (Area_m2, measured in square meters) and experience. The plot demonstrates how varying experience levels influence risk across different pond sizes. Understanding this interaction is essential for stakeholders, as it highlights how increased experience can mitigate risks associated with more extensive pond operations, providing insights for better management practices in aquaculture.
(d) Stocking density and income. The plot reveals how changes in stocking density affect risk levels at different income brackets. Understanding this interaction is crucial for aquaculture management, as it highlights how financial factors can influence risk associated with varying stocking practices, enabling stakeholders to optimize their strategies for sustainable and profitable operations.

Supplementary material

Nguelifack et al. supplementary material 1 (File, 397.5 KB)
Nguelifack et al. supplementary material 2 (File, 26.3 KB)
Nguelifack et al. supplementary material 3 (File, 82.4 KB)