
Nitrogen management with reinforcement learning and crop growth models

Published online by Cambridge University Press:  25 September 2023

Michiel G.J. Kallenberg*
Affiliation:
Laboratory of Geo-information Science and Remote Sensing, Wageningen University & Research, Wageningen, The Netherlands
Hiske Overweg
Affiliation:
Laboratory of Geo-information Science and Remote Sensing, Wageningen University & Research, Wageningen, The Netherlands
Ron van Bree
Affiliation:
Laboratory of Geo-information Science and Remote Sensing, Wageningen University & Research, Wageningen, The Netherlands
Ioannis N. Athanasiadis
Affiliation:
Laboratory of Geo-information Science and Remote Sensing, Wageningen University & Research, Wageningen, The Netherlands
*
Corresponding author: Michiel G.J. Kallenberg; Email: michiel.kallenberg@wur.nl

Abstract

The growing need for agricultural products and the challenges posed by environmental and economic factors have created a demand for enhanced agricultural systems management. Machine learning has increasingly been leveraged to tackle agricultural optimization problems, and in particular, reinforcement learning (RL), a subfield of machine learning, is a promising tool for data-driven discovery of future farm management policies. In this work, we present the development of CropGym, a Gymnasium environment in which a reinforcement learning agent can learn crop management policies using a variety of process-based crop growth models. As a use case, we report on the discovery of strategies for nitrogen application in winter wheat. An RL agent is trained to decide weekly on applying a discrete amount of nitrogen fertilizer, with the aim of balancing maximized yield against minimized environmental impact. Results show that close-to-optimal strategies are learned, competitive with standard practices set by domain experts. In addition, we evaluate, as an out-of-distribution test, whether the obtained policies are resilient against a change in climate conditions. We find that, when rainfall is sufficient, the RL agent remains close to the optimal policy. With CropGym, we aim to facilitate collaboration between the RL and agronomy communities to address the challenges of future agricultural decision-making.
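The setup described above follows the standard Gymnasium interaction pattern: once per simulated week the agent observes crop and weather state, picks a discrete fertilizer dose, and receives a reward trading off growth against nitrogen applied. The sketch below illustrates that pattern with a deliberately toy environment; the class, dose values, and growth dynamics are hypothetical stand-ins, not the actual CropGym API, which wraps process-based crop growth models instead.

```python
import random


class ToyNitrogenEnv:
    """Minimal Gymnasium-style environment sketch (hypothetical, not CropGym).

    One step = one week. The action is an index into a list of discrete
    fertilizer doses. The 'crop model' here is a toy proxy for illustration.
    """

    DOSES = [0, 20, 40, 60, 80]  # kg N/ha per action (illustrative values)
    SEASON_WEEKS = 26            # length of one growing-season episode
    BETA = 10.0                  # reward penalty per kg N applied

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.week = 0
        self.soil_n = 40.0   # plant-available soil nitrogen (kg/ha)
        self.biomass = 0.0   # toy crop biomass proxy
        return self._obs(), {}

    def step(self, action):
        dose = self.DOSES[action]
        self.soil_n += dose
        # Toy growth: weekly uptake limited by available N and weather.
        weather = self.rng.uniform(0.5, 1.0)
        uptake = min(self.soil_n, 5.0) * weather
        self.soil_n -= uptake
        self.biomass += uptake
        self.week += 1
        terminated = self.week >= self.SEASON_WEEKS
        # Reward balances biomass gain against the cost of fertilizer.
        reward = uptake * 100.0 - self.BETA * dose
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return (self.week, self.soil_n, self.biomass)


# Roll out a fixed, standard-practice-like policy: fertilize in mid-season.
env = ToyNitrogenEnv(seed=42)
obs, info = env.reset()
total_reward, done = 0.0, False
while not done:
    week = obs[0]
    action = 2 if 8 <= week < 12 else 0  # 40 kg N/ha for four weeks
    obs, reward, done, truncated, info = env.step(action)
    total_reward += reward
print(f"season reward: {total_reward:.1f}, biomass: {obs[2]:.1f}")
```

An RL agent (e.g., PPO or DQN, as evaluated in the paper) would replace the fixed policy in the rollout loop, learning which weeks and doses maximize the cumulative reward.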

Information

Type
Application Paper
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Table 1. Crop growth and weather variables exposed in the state space $ S $


Figure 1. Locations of the training site (red) and the out-of-distribution test site (blue), with CCAFS climate similarity indices (Villegas et al., 2011) of 1.0 (reference) and 0.573, respectively.


Table 2. Cumulative reward, nitrogen, and yield (median and associated 95% CI)


Figure 2. (a) Cumulative reward obtained and (b) nitrogen applied by each of the three agents. Each dot depicts a test year (n = 16). For most test years, RL is closer to Ceres than SP. (c) The difference in yield between the RL agent and the SP agent as a function of the difference in the amount of nitrogen applied. The dashed line indicates the break-even line, at which both agents achieve the same reward. Most test years lie above the break-even line, indicating that the RL agent's choice to apply a different amount of nitrogen pays off.


Figure 3. Policy visualization of the RL agent: (a) cumulative reward obtained and (b) nitrogen applied. Typically, the RL agent waits until spring for its first actions.


Figure 4. Scatter plot with regression lines of the average daily rainfall and the total amount of nitrogen applied by all three agents for (a) the northern climate and (b) the southern climate. The optimal amount of nitrogen, as determined by Ceres, depends substantially on rainfall. Presumably, the RL agent has learned to adopt this general trend. In dry years, when lack of rainfall impairs yield and the optimal amount of nitrogen is (close to) zero, the RL agent does not limit its nitrogen application sufficiently, as it arguably sticks to the general trend. In years with sufficient rainfall, the RL agent acts in line with the optimal policy. This effect is seen in both the northern climate and the southern climate.


Table 3. Out-of-distribution results: cumulative reward, nitrogen, and yield (median and 95% CI) in southern climate


Figure 5. Scatter plots comparing the PPO and DQN agents for (a) cumulative reward obtained, (b) nitrogen applied, and (c) yield obtained. Each point depicts a test year (n = 32).


Table 4. Cumulative reward, nitrogen, and yield (median and associated 95% CI) for DQN