Introduction
Over the last five decades, weed control technology development has focused primarily on herbicides; however, evaluation of alternative weed control technologies has continued, albeit at a slower pace. Many novel thermal technologies have been identified as potential alternatives to herbicides, including targeted lasers (Coleman et al. 2021; Couch and Gangstad 1974; Mathiassen et al. 2006), electrical discharge (Armyanov et al. 2000; Diprose and Benson 1984), and microwaves (Brodie et al. 2012; Sartorato et al. 2006), as reviewed in Bauer et al. (2020). Compared with herbicides, the use of these alternatives as whole-field treatments in large-scale cropping systems has not been viable given the intensive resource demands (energy, labor, and time). Yet, the ability to apply targeted thermal treatments specifically to weeds would make such treatments approximately as resource-efficient as herbicides in large-scale crop production systems (Coleman et al. 2019). This approach, referred to as site-specific weed control (SSWC), enables the in-crop use of nonselective and alternative weed control technologies. In-crop SSWC is strongly reliant on precise and reliable weed recognition within the crop, which can be achieved at varying degrees of specificity depending on the task at hand (Table 1) (Lopez-Granados 2011; Slaughter et al. 2008). Thus, without accurate weed recognition, the implementation of alternative weed control technologies for in-crop uses in large-scale cropping systems will not be successful.
The recent step-change in accessibility and performance of in-crop, image-based weed recognition tools has been driven by developments in three key areas: (1) gains in computational power, efficiency, and hardware accessibility; (2) improved image data availability for training complex algorithms; and (3) novel algorithm architectures. The broader technology industry [e.g., Google AI and Meta AI (previously Facebook)] and the computer science community advanced many of these areas through collaborative and open-source methods, which are now providing new opportunities for weed control (Fernández-Quintanilla et al. 2018). Despite these external technological advances, accurate in-crop weed recognition remains a significant challenge, particularly in large-scale production systems. The combination of plastic weed (and crop) morphology (Munier-Jolain et al. 2014; Nkoa et al. 2015) and broad environmental variability complicates reliable detection at speed.
Technological advancements are enabling the weed control industry to progress from being able to simply determine the presence of a weed in an image (weed detection), to identifying specific weed species and plant morphological characteristics (weed identification), and finally to being able to both characterize and locate weed species within images (weed recognition) (Table 1) in real time for highly targeted application. With this progression and rapidly rising interest in weed recognition research (Figure 1), it is critical that academic research groups and industry develop an understanding of how in-crop SSWC will advance, the tools required, technology limitations, and where future research should focus. This involves a chronologically based review of the developmental trajectory of technology in the context of weed control beyond a survey of the literature (Hamuda et al. 2016; Hasan et al. 2021; Rakhmatulin et al. 2021; Wang et al. 2019; Wu et al. 2021).
This review examines the progress of weed detection, identification, and recognition methods over the past 50 yr, to highlight the potential offered by recent developments in deep learning in the context of weed recognition in large-scale crop production systems. Definitions of key terms relevant to SSWC are provided for clarity, especially those used inconsistently in current literature. This review aims to investigate the most effective approach(es) for developing weed recognition capability that enables highly accurate SSWC for large-scale crop production systems.
1971 to Early 2000s: Introduction of Weed Detection and Computer Vision
From the outset of plant detection technology in the 1970s, the development of SSWC tools has followed a path of increasing complexity. It began with use in simple environments with green weeds in fallow before moving into in-crop weed recognition with highly variable conditions. The historical success of weed detection–based tools has been largely dependent on the ability to control the imaging environment, which enables the application of simple algorithms that rely on consistencies in spectral differences, lighting, background, and/or target appearance. During these initial 30 yr of research, SSWC commenced with the introduction of active reflectance-based detection of living (“green”) plants (Haggar et al. 1983; Hooper et al. 1976; Palmer and Owen 1971) with photoelectric diodes, progressing to weed recognition in highly controlled horticultural scenarios (Lee et al. 1999) with cameras and early machine learning algorithms.
Reflectance-Based Weed Detection
In general, reflectance-based methods work by analyzing light reflected from a scene, typically without spatial information. By analyzing and comparing different parts of the spectrum, they are able to discriminate between plant and nonplant material (reviewed by Peteinatos et al. 2014). The technology for weed detection in fallow scenarios emerged from research on plant detection for sugar beet thinning in the early 1970s (Hooper et al. 1976; Palmer and Owen 1971). The concept, which uses photodetectors to compare red and near-infrared reflectance ratios between green plant material and non-green plant residues and soil backgrounds, was later adapted for fallow weed control (Haggar et al. 1983). The light is either provided by an active source or passively provided by the sun. Importantly, the photodetectors used in this method lack spatial resolution. All the reflected light in the field of view of the photodetector is observed by the sensor as one mixed signal. If detection is triggered, the single sensor cannot determine where within this area the trigger was raised. Similarly, there is often not enough information to differentiate between types of plants. As a result, reflectance-based methods simply detect the presence of any plant within their field of view. This is known as “weed detection” (Table 1), and efficacy is largely driven by the usage context. The method is suited to fallow conditions (e.g., Haggar et al. 1983), where weeds can be defined as any living plant, whether invasive, crop regrowth, or self-regenerating.
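The red and near-infrared comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the trigger threshold, and the reflectance readings are assumptions chosen for the example and are not taken from the cited studies or from any commercial sensor.

```python
def plant_detected(red: float, nir: float, ratio_threshold: float = 2.0) -> bool:
    """Return True when the NIR/red reflectance ratio suggests green vegetation.

    red, nir: reflectance (0 to 1) integrated over the sensor's whole field of view.
    ratio_threshold: hypothetical trigger level, assumed for illustration.
    """
    if red <= 0:
        raise ValueError("red reflectance must be positive")
    return nir / red > ratio_threshold


# Assumed example readings: green leaves absorb red light and reflect strongly
# in the near infrared, whereas soil and residue reflect both bands more evenly.
readings = {
    "bare soil": (0.25, 0.30),
    "crop residue": (0.30, 0.40),
    "green weed": (0.05, 0.45),
}

for label, (red, nir) in readings.items():
    print(f"{label}: trigger={plant_detected(red, nir)}")
```

Because each reading is a single mixed signal, the function can only report that something green is present. It cannot say where in the field of view the plant is or what species it belongs to, which is why this approach is limited to weed detection.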
Because of their simplicity and early development, reflectance-based methods have been used for weed detection and spot spraying in large-scale fallow fields since the 1990s (Felton et al. 1991; Haggar et al. 1983; Shearer and Jones 1991; Visser and Timmermans 1996). These spot-spraying systems are now widely adopted by Australian crop producers (McCarthy et al. 2010; SPAA 2016) to target low-density (<1.0 plant 10 m⁻²) weed populations (Walsh and Powles 2022). The weed control savings enabled by fallow SSWC have driven the demand for systems that lead to similar savings targeting low-density weeds within crops.
Computer Vision and Machine Learning–Based Weed Detection
As digital technologies matured in the 1970s and 1980s, photodetectors generalized into the charge-coupled device (CCD) sensor and, consequently, more accessible digital cameras. Instead of individual photodetectors observing a single part of the spectrum through filters, the CCD combined multiple photodetectors arranged in a grid, making it possible to record digital images with inherent spatial information. Further development added sensitivity to multiple spectral bands (e.g., red, green, and blue) allowing color or even multispectral images to be recorded.
This new way of capturing spatial and spectral image data gave rise to the discipline of computer vision. In general, the goals of computer vision are to derive high-level information from digital images. Although humans have an intrinsic ability to analyze and understand images of crops and the contexts in which weeds might occur, this is a complex task to replicate in software. Early attempts to convert digital images into a higher-level understanding were predicated on computer-vision experts designing algorithms that could process aspects of images into “features” that could then be passed into classification algorithms (Figure 2).
Computer vision for weed detection and identification involves image pre-processing (e.g., color space transformation or image resizing), feature extraction (selecting which image attributes are relevant), and finally the application of a classification algorithm that uses these features to identify the weed (Figure 2) (Wang et al. 2019; Weis and Sökefeld 2010). Initial attempts at computer vision for species identification on eight crop and weed species (maize [Zea mays L.], soybean [Glycine max (L.) Merr.], tomato [Lycopersicon esculentum L.], johnsongrass [Sorghum halepense (L.) Pers.], jimsonweed [Datura stramonium L.], velvetleaf [Abutilon theophrasti Medik.], giant foxtail [Setaria faberi Herrm.], and common lambsquarters [Chenopodium album L.]) in 1986 achieved a modest 69% classification accuracy (Guyer et al. 1986). Results by the start of the millennium appeared to be improving, with up to 96.7% accuracy on five similar weed species (velvetleaf, giant foxtail, common lambsquarters, large crabgrass [Digitaria sanguinalis (L.) Scop.], and ivyleaf morningglory [Ipomoea hederacea Jacq.]) and a soil class in one example using a neural network (Burks et al. 2000b). Whereas the initial image attribute (feature) selection component was a manual process that relied upon experts, the classification algorithms that used these features to detect and/or identify weeds were often based on machine learning (Table 1). Machine learning is a process of optimizing algorithm performance by repeated prediction and error correction from a training dataset of annotated weed images.
During the training process, the algorithm modifies or “learns” its parameters (weights and biases) through an error feedback loop, in which the error is measured by a loss function (also called an objective function), so that its predictions improve over time. Although machine learning improved classification, the process of manual feature extraction struggled to manage the diversity of the field environment (Slaughter et al. 2008), even if weed and agronomy “experts” were involved in identifying important features to use (Golzarian and Frick 2011).
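The training process described above, in which weights and a bias are adjusted through repeated prediction and error correction, can be illustrated with a deliberately small sketch. It assumes a logistic-regression classifier trained by stochastic gradient descent on two hand-crafted features; the feature names, data values, learning rate, and number of epochs are all hypothetical and do not reproduce any of the cited systems.

```python
import math


def predict(weights, bias, features):
    """Probability that a sample is a weed (logistic function of a weighted sum)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, data):
    """Mean log loss: the loss (objective) function that measures prediction error."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for features, label in data:
        p = predict(weights, bias, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(data)


def train(data, epochs=500, learning_rate=0.5):
    """Repeated prediction and error correction over the annotated training set."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label  # prediction error
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical manually extracted features per plant: (greenness, leaf elongation).
# Label 1 = weed, 0 = crop. Values are invented for illustration.
training_data = [
    ((0.8, 0.2), 1), ((0.7, 0.3), 1), ((0.9, 0.1), 1),
    ((0.2, 0.8), 0), ((0.3, 0.9), 0), ((0.1, 0.7), 0),
]

loss_before = log_loss([0.0, 0.0], 0.0, training_data)
weights, bias = train(training_data)
loss_after = log_loss(weights, bias, training_data)
print(f"loss before training: {loss_before:.3f}, after training: {loss_after:.3f}")
```

Only the classifier is learned here. The two input features still have to be chosen and computed by a person, which is the limitation of manual feature extraction noted above.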
The types of features extracted by computer-vision experts can be divided into four general categories: (1) color (spectral), (2) shape, (3) texture, and (4) spatial context (e.g., planting arrangements); details on each category and extraction methods are reviewed by Zhang and Lu (2004) and Wang et al. (2019). In early computer vision research for weed detection, color features and vegetation indices formed a major component of image features (Woebbecke et al. 1995a). Yet, there were substantial drawbacks in the performance of algorithms due to color changes at different growth stages across a season and between days or periods with variable ambient lighting conditions (El-Faki et al. 2000; Wang et al. 2019; Woebbecke et al. 1995a). This is a common challenge in agriculture and external industries (Pinto et al. 2008) that continues for many color- and shape-based algorithms, even in more recent in-field efforts (Chang et al. 2012; Coleman et al. 2022). Weed and plant species identification during this period was largely restricted to highly controlled settings, where leaves were removed and image dataset sizes were typically fewer than 100 specimens (Gerhards et al. 1993; Guyer et al. 1986; Petry and Kühbauch 1989; Shearer and Holmes 1990) (Figure 3).
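As an illustration of a color feature, the sketch below computes the excess green index, a commonly used vegetation index of the form 2g − r − b on normalized color coordinates. The text above does not name a specific index, so this choice, the threshold, and the pixel values are assumptions made for the example.

```python
def excess_green(red: float, green: float, blue: float) -> float:
    """Excess green index (2g - r - b) on normalized chromatic coordinates."""
    total = red + green + blue
    if total == 0:
        return 0.0  # a black pixel carries no color information
    r, g, b = red / total, green / total, blue / total
    return 2 * g - r - b


def is_vegetation(pixel, threshold: float = 0.1) -> bool:
    """Classify one RGB pixel; the threshold is a hypothetical value."""
    return excess_green(*pixel) > threshold


# Assumed 8-bit RGB pixel values for illustration.
pixels = {
    "healthy green leaf": (60, 140, 50),
    "bare soil": (120, 100, 80),
    "stressed, reddish leaf": (110, 100, 90),
}

for label, pixel in pixels.items():
    print(f"{label}: ExG={excess_green(*pixel):.2f}, vegetation={is_vegetation(pixel)}")
```

In this invented example the stressed leaf falls below the threshold and is missed, which mirrors the drawback described above: a fixed color rule fails when plant color shifts with growth stage or lighting.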
Where computer vision is integrated with a machine response such as tine movement or spot spray application, the term machine vision is used, given the machine now has the capability to “see.” During the 1990s to early 2000s, machine vision systems were developed for high-value horticultural crops such as tomatoes, where slow travel speeds (e.g., under 3 km h⁻¹) and highly managed planting arrangements were appropriate (Lee et al. 1999; Slaughter 2014). These controlled environments and slow travel speeds allowed the more effective use of manually identified image features such as shape, color, and texture for weed detection algorithms on systems with highly constrained processing power compared to modern devices. In one of the first attempts at real-time, in-crop weed detection for selective herbicide application with a tractor-mounted machine vision system, Lee et al. (1999) used leaf shape features to classify individual leaves based on RGB images with a Bayesian classifier. This system detected 73.1% of tomato leaves and 68.8% of weeds at a forward speed of 1.2 km h⁻¹. Similarly, Åstrand and Baerveldt (2002) employed visual feature analysis to achieve 96% accuracy when differentiating unspecified weeds and sugar beet. The authors noted, however, that color features varied with the intensity of sunlight, a significant weakness of these early vision approaches. Despite adequate performance, the use of comparatively high spatial-resolution images was a limiting factor for real-time use, given the processing capability of systems at the time. In-field use was largely limited to 2 km h⁻¹, with detection algorithms requiring between 100 and 200 ms for processing per image (Fernández-Quintanilla et al. 2018; Slaughter et al. 2008).
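The link between travel speed and processing time can be made concrete with a short calculation of how far a machine travels while one image is being processed. The 2 km h⁻¹ speed and the 100 to 200 ms processing times are taken from the text above; the 10 km h⁻¹ comparison speed is an assumed figure included only to show how the distance scales.

```python
def distance_per_frame_m(speed_km_h: float, processing_ms: float) -> float:
    """Ground distance (m) covered while a single image is processed."""
    speed_m_s = speed_km_h / 3.6  # 1 km/h = 1/3.6 m/s
    return speed_m_s * processing_ms / 1000.0


for speed in (2.0, 10.0):  # 10 km/h is a hypothetical faster travel speed
    for ms in (100, 200):
        distance = distance_per_frame_m(speed, ms)
        print(f"{speed:4.1f} km/h, {ms} ms per image: {distance:.3f} m travelled")
```

At 2 km h⁻¹ the machine moves roughly 0.06 to 0.11 m per processed image, so consecutive images can still cover the ground without gaps. At the assumed 10 km h⁻¹ the same processing time corresponds to roughly 0.28 to 0.56 m, which shows why slow algorithms restricted early systems to slow travel speeds.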
Toward the end of the 1990s, research on overcoming challenges of visual environmental complexity diverged into (1) increasing spectral bands through multi- and hyperspectral imaging and/or (2) novel algorithms that employed more advanced techniques to make the most of lower cost digital camera technology, though these were also limited by processor speed and capacity. Hyperspectral sensors provide increased spectral range and resolution over conventional cameras designed for RGB color imagery. This improves the potential for modeling complex crop–weed scenarios (McCarthy et al. 2010; Slaughter et al. 2008). For example, in 2003, weed and crop discrimination using hyperspectral sensors achieved an accuracy above 95% in tomatoes (Slaughter et al. 2004), outperforming the 75% achieved by the color-based classification effort. The approach has its own difficulties, with recent research on hyperspectral detection of Palmer amaranth (Amaranthus palmeri S. Wats.) and large crabgrass finding that performance changes throughout the season and with variable weed densities (Basinger et al. 2022). Additionally, spectral imaging for plant discrimination (as reviewed in Lu et al. 2020) requires intensive computing resources and expensive imaging devices, which has resulted in a reduced interest in the development of this approach for commercial, in-crop weed recognition systems. The availability of low-cost and readily accessible RGB imaging devices also contributed to the declining interest in the spectral-imaging approach for commercial systems (Brown and Noble 2005). Recent reviews of weed detection (Lopez-Granados 2011) and machine learning in agriculture (Liakos et al. 2018) provide more detail on the future of multi- and hyperspectral research.
Developments in SSWC enabling technology from the 1970s to 2000s saw dramatic advances in performance and increasing relevance for large-scale systems. Reflectance-based weed detection systems for fallow SSWC became commercially available at the beginning of this period; then by 2002 one of the first end-to-end machine vision systems was being used in research settings (Åstrand and Baerveldt 2002). Although algorithm performance was a limiting factor, practical in-field use during this time was also substantially restrained by the processing power available for image analysis, which restricted image resolution and inference speed. The development of more advanced computer vision algorithms for agriculture over the next decade (e.g., convolutional neural networks [CNNs]; LeCun et al. 1989), coupled with gains in computing power and increased image data availability, would see field-ready developments in large-scale systems for real-time use by the end of the 2010s.
Early 2000s to 2012: Advances in Algorithm Performance
The reductionist approach applied in the early period of image analysis and computer vision tools was useful in establishing introductory-level SSWC in controlled, in-crop settings. Yet, as often identified by the authors in early studies, the manually selected features used in these systems were brittle. Changes in the environment or crop could render simple image feature selection ineffective in complex environments. A further complication was the plasticity of weed morphology, which varies with genotype and is influenced by temperature, moisture, light and nutrient availability, as well as the crop production environment (Maity et al. 2021; Munier-Jolain et al. 2014), increasing the difficulty in developing reliable detection and identification techniques. By and large, image features designed by human experts were not easily scalable to new tasks or able to cope with the variation in large-scale agriculture (Figure 3) (Dyrmann et al. 2016a). In the context of these limitations, the next wave of progress sought to use algorithm architectures with a greater ability to represent the complexity of conditions and morphology. This included the first use of so-called “neural network” methods and more robust feature engineering efforts. Whereas developments continued in non-neural network machine learning, as reviewed in Fernández-Quintanilla et al. (2018) and Wang et al. (2019), neural network architectures underpin the current state of weed detection, identification, and recognition for in-crop use and are the focus of the following sections.
Artificial Neural Networks
The capability to deal with the complexity of the in-crop environment in the development of weed recognition algorithms was enhanced with the use of artificial neural networks (ANNs). This comes from the improved ability of ANNs to describe a very large set of functions that represent weed diversity and hence patterns in images that would identify weed species. For example, Burks et al. (2005) used an ANN to classify images containing giant foxtail, large crabgrass, common lambsquarters, velvetleaf, and ivyleaf morningglory (Figure 4). Images were collected in controlled-illumination field settings, and the features were extracted using a manual method, achieving a classification accuracy up to 97%. Other ANN-only attempts have reported similar results from ground-based (Burks et al. 2000a; Yang et al. 2000) and, more recently, aerial platforms (Barrero et al. 2016). However, ANNs did not always outperform the state-of-the-art classification algorithms of the time, such as support vector machines (Wu and Wen 2009), likely leading to skepticism about their standalone utility. Despite the promise, the ANN still had the fundamental flaw of previous methods, in that the desired plant features such as color, shape, and texture had to be manually selected by the user, resulting in a lack of robustness in variable field conditions. Nevertheless, ANNs formed the backbone of CNNs and were a critical component in the progress toward weed recognition.
Convolutional Neural Networks
Weed recognition constraints associated with manual feature extraction were largely addressed with CNNs, which combine automatic selection and learning of image features with an ANN-type architecture. The first of these architectures, LeNet, was developed by LeCun et al. (1989) for identifying handwritten postal codes in images. LeNet represented a fundamental shift toward recognizing that the spatially connected nature of images could be learned through CNNs (Kamilaris and Prenafeta-Boldú 2018). The feature extraction component of a CNN, known as a kernel, moves over pixels in an image and automatically extracts features. Specific kernels and weightings for each image dimension (e.g., red, green, and blue) are learned in the training process. This removes the requirement for weed experts to identify relevant plant features during the feature extraction process, shifting their effort instead toward annotating weeds within images for training datasets (Khan et al. 2020). The use of CNNs to understand spatial relationships within an image represented a substantial improvement over previous methods (Dyrmann et al. 2016a; Hasan et al. 2021; Wang et al. 2019). Of particular importance was the ability to stack multiple feature extraction layers to develop what are known as “deep” architectures, which has been found to improve performance (Grinblat et al. 2016).
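The way a kernel moves over the pixels of an image to produce features can be sketched as follows. This is a simplified, single-channel illustration with a fixed, hand-written kernel; in a CNN the kernel values would be learned during training, and the image and kernel used here are invented for the example.

```python
def convolve2d(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1) and return the feature map.

    As in most CNN implementations, the kernel is applied without flipping
    (strictly a cross-correlation).
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy single-channel image: dark background (0) on the left, bright leaf (1) on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hand-written vertical-edge kernel; a CNN would learn values like these itself.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)  # [[0.0, 3.0, 3.0], [0.0, 3.0, 3.0]]
```

The feature map is zero over the uniform background and responds where the kernel overlaps the leaf boundary. Stacking many such layers, each with its own learned kernels, gives the deep architectures described above.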
Despite the benefits offered by automated feature extraction, spatially correlated information, and deep architectures, issues with training the algorithms resulted in a view during the early 2000s that CNNs were less effective than manual feature extraction methods (Khan et al. 2020). Whereas the depth of CNNs improved their ability to recognize weeds, the added complexity and size of the algorithms brought additional issues. These issues stemmed from a lack of large and diverse datasets for development, inadequate computational resources, and algorithmic issues during training that prevented optimum performance. Nevertheless, research persevered, and these flaws were largely resolved in the mid-2000s (Bengio et al. 2006; Goodfellow et al. 2016). The resolution of these problems revived interest in algorithms that were once considered difficult to train.
The seminal paper in the field is widely considered to be the work of Krizhevsky et al. (2012), who presented the first CNN to substantially outperform non-CNN classification attempts on the ImageNet challenge. This success established CNNs, and deep learning more generally, as a suite of algorithms that could address image complexity, a result that kick-started an era of rapid computer vision and deep-learning advancement.
Though the realization of the potential for CNNs defined the era, research into more advanced methods of weed recognition had continued and delivered some success. Improved species identification (Golzarian and Frick 2011; McCarthy et al. 2010) and better occlusion management (Hall et al. 2015; Haug et al. 2014), among other areas of research, had substantially increased weed recognition capability (Figure 3). Yet, complexity introduced by variation in environment and weed morphology continued to impede field performance (Chang et al. 2014; Fernández-Quintanilla et al. 2018). Concurrently, gains in computational power supported field research efforts in machine vision systems, while developments in deep learning algorithms set the framework for future success at the end of the 2010s.
2012 to 2015: The Rise of Deep Learning for Weed Recognition
Following the performance reported by Krizhevsky et al. (2012), the growing success of multilayered, deep networks attracted interest. Researchers focused on understanding how to create and train efficient network architectures, taking advantage of the flexibility and descriptiveness that a deep, multilayered network could provide. This field of research is known as “deep learning,” which is a subfield of machine learning, and consists of (1) multilayered models that use nonlinear data transformations, and (2) methods of supervised and unsupervised learning of features that produce progressively abstract layers (Deng and Yu 2013). Within the deep learning domain for image analysis, four key algorithm approaches provide increasing levels of information extraction from an image. From least to most informative, these are (1) whole-image classification (e.g., Olsen et al. 2019), (2) bounding-box object detection (e.g., Gao et al. 2020), (3) pixel-wise semantic segmentation (e.g., Lottes et al. 2020), and (4) instance segmentation (e.g., Champ et al. 2020). Whole-image classification (Figure 5A) is the simplest but least information-rich method, producing a single predicted label for an image. However, it gives no indication of which pixels correspond to the predicted class. Bounding-box detection (Figure 5B) methods output the pixel coordinates of boxes where individual weeds have been detected, providing more spatial detail. A disadvantage of bounding-box methods is that they cannot trace the shape of the objects they detect; they are limited to labeling rectangular regions.
In contrast, semantic segmentation (Figure 5C) is a pixel-wise approach to image recognition, classifying individual pixels as belonging to a certain class. Although it can trace the shape of weeds at a pixel level, it is unable to separate individual weeds. Thus, it is unable to predict how many weeds are within the scene. Instance segmentation (Figure 5D) combines the advantages of bounding-box detection and semantic segmentation. Like bounding-box detection, instance segmentation can locate individual “instances” within an image, and like semantic segmentation, it can trace the individual pixels that belong to each detected object. The extra information captured by instance segmentation comes at a cost. The tradeoff for greater detail in the output is greater training effort (more fine-detailed annotation) and higher computational requirements due to the generally “deeper” nature of the networks (Rakhmatulin et al. 2021). As a result, per-image processing speeds typically decrease from image classification to object detection to semantic segmentation to instance segmentation architectures as architecture size increases.
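The difference in information content between the four output types can be shown by starting from the richest output and deriving the others from it. The sketch below assumes a tiny, invented instance mask in which 0 is background and each positive integer marks one weed; the function names are hypothetical and the example says nothing about how any particular network produces these outputs.

```python
# Invented instance-segmentation output: 0 = background, 1 and 2 = two separate weeds.
instance_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 2],
]


def to_semantic_mask(mask):
    """Semantic segmentation keeps the weed pixels but drops instance identity."""
    return [[1 if value > 0 else 0 for value in row] for row in mask]


def to_bounding_boxes(mask):
    """Bounding boxes keep one (x_min, y_min, x_max, y_max) rectangle per weed."""
    boxes = {}
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == 0:
                continue
            x_min, y_min, x_max, y_max = boxes.get(value, (x, y, x, y))
            boxes[value] = (min(x_min, x), min(y_min, y), max(x_max, x), max(y_max, y))
    return boxes


def to_image_label(mask):
    """Whole-image classification keeps only a single label for the image."""
    return "weed" if any(value > 0 for row in mask for value in row) else "no weed"


semantic_mask = to_semantic_mask(instance_mask)
boxes = to_bounding_boxes(instance_mask)

print("image label:", to_image_label(instance_mask))
print("bounding boxes:", boxes)
print("weed pixels in semantic mask:", sum(map(sum, semantic_mask)))
print("weed count from instances:", len(boxes))
```

The conversions run in one direction only. Once the instance identities are discarded, the semantic mask still holds the nine weed pixels but can no longer report that they belong to two plants, and the boxes keep the count but lose the leaf outlines.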
With a greater number of network layers, deep learning increases the ability of an algorithm to represent complex image features, while being robust to fluctuations in environmental conditions (Bengio et al. 2013). The improvements in performance and subsequent increase in popularity have primarily been driven by (1) access to large quantities of labeled training data (in non-plant datasets) (Russakovsky et al. 2015); (2) increased computational power and parallelism with graphics-processing units (GPUs) (Oh and Jung 2004); and (3) more effective, open-source algorithms. Yet, it was not until Lee et al. (2015) and Hall et al. (2015) that the very first deep learning CNNs were trained for weed leaf identification, achieving accuracies of 99.5% and 97.3%, respectively. The conclusions were that deep learning and CNNs consistently yielded superior performance compared to previously used, non-CNN-based methods. These results are supported by more recent comparative non-deep and deep learning classification studies (Gogul and Kumar 2017; Šulc and Matas 2017). The rapid increase in reported accuracy during this period, as illustrated in Figure 3, supports the conclusion that deep learning is the path forward for in-crop weed recognition. At this stage, with research focusing on validation studies for weed/plant identification, it became increasingly clear that the transition to deep learning resulted in increases in both accuracy and the ability of a trained algorithm to perform outside of its training dataset in complex and occluded environments (Dyrmann et al. 2016a; Kamilaris and Prenafeta-Boldú 2018; Sapkota et al. 2022; Wang et al. 2019).
During this period, improvements in open-source software and hardware tools facilitated the development and implementation of deep learning for machine vision. These new technologies helped kick-start a wave of community-driven initiatives and gave rise to the development of weed recognition algorithms for commercial in-crop SSWC in large-scale cropping systems (Table 2). In the 2000s, deep learning and CNNs were solely the domain of computer scientists, as the platforms used to implement the algorithms were inaccessible to most users. This changed in the 2010s with the release of open-source deep learning tools such as Caffe (Jia et al. Reference Jia, Shelhamer, Donahue, Karayev, Long, Girshick, Guadarrama and Darrell2014), Tensorflow (Abadi et al. Reference Abadi, Barham, Chen, Chen, Davis, Dean, Devin, Ghemawat, Irving, Isard, Kudlur, Levenberg, Monga, Moore, Murray, Steiner, Tucker, Vasudevan, Warden, Wicke, Yu and Zheng2016), and Pytorch (Paszke et al. Reference Paszke, Gross, Massa, Lerer, Bradbury, Chanan, Killeen, Lin, Gimelshein and Antiga2019), among many others. These tools reduced the barrier to entry for deep learning evaluation and facilitated its testing on weed-specific datasets. Concurrent with software development, the gains in GPU performance and low-cost computers helped bring deep learning for machine vision into agriculture and weed control.
Note to Table 2: Recent and small projects may be missing due to rapid developments in this space.
By the end of these 4 yr, with the fast-paced advancements in the performance of CNNs, research efforts became more focused on the use of deep learning for weed recognition in more realistic large-scale in-crop scenarios. Whereas transformational developments occurred during this period that established the framework for use in large-scale crop production systems, the methods continued to fall short in key areas such as computational speed (inference speed), weed-specific data availability, ability to handle variable conditions (generalizability), and algorithm performance that met the requirements for in-field use in large-scale cropping programs.
2016 to 2022: Deep Learning for In-Crop Weed Recognition
Over time, deep learning has become more accessible to developers with non–computer science backgrounds and those without powerful computers, creating more widespread interest in image-based weed recognition among weed researchers and the weed control industry in general. The interest stems not only from ease of use but also from the fact that issues concerning data, algorithms, and deployment are now less of a barrier to applied research and in-field use. The improved ability of deep learning to manage environmental and plant variability increases the potential number of specialized applications for precision weed control in a variety of production settings. This increase in research interest is evident in the rapid growth in publications meeting “weed recognition,” “weed identification,” or “weed detection” criteria on Scopus, with a research output that has more than quadrupled over the last 5 yr (Figure 1).
As the field of deep learning for weed recognition has matured, research is pivoting from feasibility assessments (Dyrmann et al. Reference Dyrmann, Karstoft and Midtiby2016a; Lee et al. Reference Lee, Chan, Wilkin and Remagnino2015) toward understanding the interactions between biology and deep learning (e.g., growth stages, species similarity) (Teimouri et al. Reference Teimouri, Dyrmann, Nielsen, Mathiassen, Somerville and Jørgensen2018). This includes optimizing architecture design (Hu et al. Reference Hu, Coleman, Zeng, Wang and Walsh2020; Xu et al. Reference Xu, Zhai, Zhao, Jiao, Kong, Zhou and Gao2021) and/or selection (Chen et al. Reference Chen, Lu, Li and Young2021; Sharpe et al. Reference Sharpe, Schumann, Yu and Boyd2019b); data management (Hu et al. Reference Hu, Sapkota, Thomasson and Bagavathiannan2021a; Skovsen et al. Reference Skovsen, Dyrmann, Mortensen, Laursen, Gislum, Eriksen, Farkhani, Karstoft and Jorgensen2019); and algorithm training (Farkhani et al. Reference Farkhani, Skovsen, Dyrmann, Jørgensen and Karstoft2021; Gao et al. Reference Gao, French, Pound, He, Pridmore and Pieters2020; Hu et al. Reference Hu, Thomasson and Bagavathiannan2021b; Hussain et al. Reference Hussain, Farooque, Schumann, Abbas, Acharya, McKenzie-Gopsill, Barrett, Afzaal, Zaman and Cheema2021). As the field matures, research will closely examine the efficacy of different approaches to address the weed recognition challenge of large-scale cropping systems. The specifics of current deep learning architectures, training methods, and evaluation characteristics for weed recognition are reviewed extensively by Hasan et al. (Reference Hasan, Sohel, Diepeveen, Laga and Jones2021) and Wang et al. (Reference Wang, Zhang and Wei2019), with available datasets and limitations reviewed in Lu and Young (Reference Lu and Young2020). 
The following sections seek to contextualize important aspects of deep learning approaches within both the chronology of weed recognition development and the relevant agronomy that guides SSWC use.
Cropping System Context
In developing weed recognition for crop production systems, it is critical to identify the opportunities and constraints presented by crop–weed interactions that can be exploited or guarded against in algorithm development. For example, consistent and predictable crop planting arrangements in raised-bed or highly tilled systems (e.g., row spacing, plant spacing, uniformly tilled background) can simplify deep learning decisions with assumptions of (1) no occlusion (Zhuang et al. Reference Zhuang, Li, Bagavathiannan, Jin, Yang, Meng, Li, Li, Wang, Chen and Yu2022), (2) incorporated crop sequence information (Lottes et al. Reference Lottes, Behley, Milioto and Stachniss2018), (3) included crop markers (Kennedy et al. Reference Kennedy, Fennimore, Slaughter, Nguyen, Vuong, Raja and Smith2020), and (4) clearly defined crop rows for unsupervised learning (Pérez-Ortiz et al. Reference Pérez-Ortiz, Peña, Gutiérrez, Torres-Sánchez, Hervás-Martínez and López-Granados2015). Horticultural and wide-row cropping systems that contain these four attributes formed much of the initial success in developing accurate deep learning–based weed recognition algorithms (Bah et al. Reference Bah, Hafiane, Canals and Emile2019; Huang et al. Reference Huang, Wu, Sun, Ma, Jiang and Qi2020). Large-scale systems with dense canopies, unpredictable occlusion, plant spacing, and variable crop–weed morphological stages put greater emphasis on the algorithm for reliable recognition. Dyrmann et al. (Reference Dyrmann, Jørgensen and Midtiby2017) trained an object detection architecture DetectNet to detect broadleaf and grass weeds in wheat (Triticum aestivum L.) under heavy leaf occlusion with image data collected from a high-speed platform (Laursen et al. Reference Laursen, Jørgensen, Dyrmann and Poulsen2017). The algorithm detected 46.3% of weeds, encountering issues with significant overlap. Su et al. (Reference Su, Qiao, Kong and Sukkarieh2021) minimized occlusion by using a camera between rows of wheat to detect rigid ryegrass (Lolium rigidum Gaudin) and an unspecified broad-leaved weed category. The approach recalled up to 92% of weeds present in the area between crop rows, benefiting from the constrained inter-row environment. As research continues into large-scale, more complex environments, the ability to exploit crop agronomy and cultural practices is likely to be reduced, with reliance predominantly on advanced architectures and training methods (Picon et al. Reference Picon, San-Emeterio, Bereciartua-Perez, Klukas, Eggers and Navarra-Mestre2022).
Weed Recognition Algorithm Output
The complexity and diversity of in-crop weed recognition datasets, and their interaction with the strengths and weaknesses of different algorithm architectures, make it difficult to prescribe one-size-fits-all approaches. Selecting the level of specificity provided by the algorithm output (Figure 5) affects the challenges faced during the training and evaluation processes and is dictated by the resulting weed control effort. Controlling invasive species in rangelands may only require whole-image classification (Olsen et al. Reference Olsen, Konovalov, Philippa, Ridd, Wood, Johns, Banks, Girgenti, Kenny, Whinney, Calvert, Azghadi and White2019) if the control treatment is coarse (e.g., spot spraying), whereas the application of laser weed control treatments requires knowledge of plant morphology to enable targeting of growing points and other critical plant parts (Champ et al. Reference Champ, Mora-Fallas, Goëau, Mata-Montero, Bonnet and Joly2020). Exploring how different architectures affect performance, Sharpe et al. (Reference Sharpe, Schumann and Boyd2019a) found that DetectNet detected all Carolina geranium (Geranium carolinianum L.) growing among plasticulture strawberry plants, compared to just 21% for the image classification architectures tested. In contrast, Zhuang et al. (Reference Zhuang, Li, Bagavathiannan, Jin, Yang, Meng, Li, Li, Wang, Chen and Yu2022) found that image classification algorithms outperformed object detection algorithms for broadleaf weed seedlings in wheat. The difference lies in the data complexity, quantity, and quality; annotation grouping (specific classes vs. grouped “broadleaf”), strategy, and quality; and algorithm selection and training process.
A quantitative comparison is difficult without access to both datasets and contextual information, and there is limited research that dives deeper into how weed appearance may interact with algorithm architectures (e.g., are image classification algorithms better suited for grass species detection over bounding-box object detection?). Nonetheless, the object detection approach in the first instance enabled leaf-based annotation, which was more successful than whole-plant detection at finding weed instances. The method is likely to have a greater level of detection resilience, with weeds still detected even if individual leaves were missed.
Within a general algorithm type (e.g., image classification), there are many architectures that perform differently on the same dataset (Figure 6). A common occurrence in research is to compare the performance of many different architectures to determine which option best meets the data and performance requirements (e.g., Ma et al. Reference Ma, Deng, Qi, Jiang, Li, Wang and Xing2019 and Chen et al. Reference Chen, Lu, Li and Young2021). Increased algorithm size through larger parameter numbers does not necessarily correlate with algorithm performance, and often a screening of different architectures may be required at the outset (Chen et al. Reference Chen, Lu, Li and Young2021; Jin et al. Reference Jin, Sun, Che, Bagavathiannan, Yu and Chen2022).
Beyond architecture selection, understanding the specificity of each weed class by grouping weeds in broader classes or as individual species influences overall algorithm performance. In a plasticulture setting, grasslike, broad-leaved, and sedge (Cyperus spp.) weeds were detected between the rows of plasticulture with an object detection network YOLO v3 (Sharpe et al. Reference Sharpe, Schumann, Yu and Boyd2019b), whereby the algorithm performed better when distinguishing the three classes individually than when pooled as a broad group of weeds for general detection. Similar performance gains have been found when more specific classes were used for tea shoot detection (Li et al. Reference Li, He, Jia, Lv, Chen, Qiao and Wu2021); however, research to date has not identified appropriate annotation strategies for individual weed morphologies. Considering the rapid advancements in the deep learning and associated hardware fields, a prescriptive approach is unlikely to be beneficial in the long term.
Data Collection, Quality, and Availability
Access to large datasets of annotated images was a critical factor in the progress of deep learning applications. These datasets were harvested from the Internet (e.g., ImageNet) and comprise images of “everyday” scenes. Unfortunately, agriculture and weeds were not a substantial part of these collections. As a result, datasets and image data quality remain a challenge for deep learning–based weed recognition systems. Supervised learning is the predominant method of deep learning used for weed recognition and requires human input through the annotation of weeds present in each image, a highly time-consuming process and a significant barrier to widespread development (Lu and Young Reference Lu and Young2020; Wang et al. Reference Wang, Zhang and Wei2019). Annotating weeds in cropping-system images requires expertise in plant species identification, which makes outsourcing to online, paid annotation platforms difficult. Even among trained plant consultants, an error rate of 12% was reported (Dyrmann et al. Reference Dyrmann, Midtiby and Jørgensen2016b). Assisted and corrective annotation approaches, such as the open-source RootPainter (Smith et al. Reference Smith, Han, Petersen, Olsen, Giese, Athmann, Dresbøll and Thorup-Kristensen2020), target annotations at model mistakes or low-confidence areas, which improves the algorithm more efficiently. Alternatively, synthetic images generated through cut-and-paste approaches (Gao et al. Reference Gao, French, Pound, He, Pridmore and Pieters2020; Hu et al. Reference Hu, Thomasson and Bagavathiannan2021b), generative adversarial networks (Madsen et al. Reference Madsen, Dyrmann, Jørgensen and Karstoft2019), or 3D weed datasets (Di Cicco et al. Reference Di Cicco, Potena, Grisetti and Pretto2017; Hu et al. Reference Hu, Thomasson, Reberg-Horton, Mirsky and Bagavathiannan2022) can replace or supplement field-collected images to improve model performance without the need for prohibitively large manual annotation efforts.
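As a rough illustration of the cut-and-paste idea, the sketch below pastes a masked plant cutout onto a background image and derives the bounding-box annotation automatically. It is a minimal sketch under stated assumptions rather than the procedure of any cited study: the function name `paste_cutout`, the array shapes, and the toy pixel values are hypothetical, and the cutout is assumed to fit entirely within the background.

```python
import numpy as np


def paste_cutout(background, cutout, cutout_mask, top, left):
    """Paste masked cutout pixels onto a copy of the background.

    background:  H x W x 3 image array
    cutout:      h x w x 3 image array containing one plant
    cutout_mask: h x w boolean array, True where the plant is
    Returns the synthetic image and its box as (x_min, y_min, x_max, y_max).
    Assumes the cutout lies fully inside the background.
    """
    h, w = cutout_mask.shape
    out = background.copy()
    region = out[top:top + h, left:left + w]   # view into `out`
    region[cutout_mask] = cutout[cutout_mask]  # overwrite plant pixels only
    ys, xs = np.nonzero(cutout_mask)
    bbox = (int(left + xs.min()), int(top + ys.min()),
            int(left + xs.max()), int(top + ys.max()))
    return out, bbox


# Toy example: black "soil" background and a cross-shaped "weed" cutout.
background = np.zeros((6, 8, 3), dtype=np.uint8)
cutout = np.full((3, 3, 3), 200, dtype=np.uint8)
cutout_mask = np.array([[0, 1, 0],
                        [1, 1, 1],
                        [0, 1, 0]], dtype=bool)

synthetic, bbox = paste_cutout(background, cutout, cutout_mask, top=2, left=4)
print(bbox)
```

Because the paste location is chosen by the script, the annotation comes for free; the manual effort shifts to producing clean cutouts and masks in the first place.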
The performance of deep learning algorithms has been shown to increase with larger quantities of training data (Hestness et al. Reference Hestness, Narang, Ardalani, Diamos, Jun, Kianinejad, Patwary, Yang and Zhou2017; Sun et al. Reference Sun, Shrivastava, Singh and Gupta2017; Zhuang et al. Reference Zhuang, Li, Bagavathiannan, Jin, Yang, Meng, Li, Li, Wang, Chen and Yu2022). Efforts to mitigate this bottleneck and provide greater access to image data include platforms such as Weed-AI (https://weed-ai.sydney.edu.au/), which offers upload, download, and standardization of agricultural metadata. Several public weed datasets exist (Table 3); however, the quantity of images within each dataset (30,000 or fewer) is many orders of magnitude lower than those in the largest generic datasets such as ImageNet (Deng et al. Reference Deng, Dong, Socher, Li, Li and Fei-Fei2009), Pascal VOC, and COCO, which have images of everyday objects and scenes.
Entirely unsupervised learning techniques group data without intervention, although they have drawbacks in their ability to generalize into new data. There is limited research on their use for weed recognition. Developing unsupervised approaches based on CNN-based anomaly detection that exploit the crop growth similarity and treat weeds as abnormalities may reduce the reliance on large, annotated datasets altogether for late-season weed recognition, where weed escapes stand out against homogeneous crop backgrounds. Weakly supervised methods that rely on clear soil backgrounds, no occlusion, and rows have been proposed (Bah et al. Reference Bah, Hafiane, Canals and Emile2019; Hu et al. Reference Hu, Thomasson and Bagavathiannan2021b) but are limited to these less complex environments with defined agronomic contexts, as discussed previously.
Besides image quantity, the influence of image quality (e.g., resolution, camera angle, and lighting conditions) and plant morphology (growth stage and size) on the performance of deep learning algorithms is important, though not well understood (Wang et al. Reference Wang, Zhang and Wei2019). Prior to the widespread use of deep learning, it was acknowledged that higher image spatial resolution increased the quantity of data on which algorithms can operate, likely improving performance on smaller weeds at the cost of greater hardware requirements (Brown and Noble Reference Brown and Noble2005; Fernández-Quintanilla et al. Reference Fernández-Quintanilla, Peña, Andújar, Dorado, Ribeiro, López-Granados, Fernandez-Quintanilla, Peña, Andújar, Dorado, Ribeiro and Lopez-Granados2018). More recent investigations of the effect of resolution on deep learning performance for weed recognition have found either reduced or unchanged performance (Zhuang et al. Reference Zhuang, Li, Bagavathiannan, Jin, Yang, Meng, Li, Li, Wang, Chen and Yu2022) or increased performance (Hu et al. Reference Hu, Sapkota, Thomasson and Bagavathiannan2021a). In the latter, Hu et al. (Reference Hu, Sapkota, Thomasson and Bagavathiannan2021a) found that image resolution was most beneficial for object detection and segmentation tasks; however, consistency between the training and testing (or inference) data was critical. Algorithms trained on specific resolutions or blur levels did not perform well when tested on datasets with different resolutions and higher blur. If consistency was not possible, it was recommended that the full diversity of expected conditions be included in the training dataset instead. In contrast, Zhuang et al. (Reference Zhuang, Li, Bagavathiannan, Jin, Yang, Meng, Li, Li, Wang, Chen and Yu2022) reported reductions in performance with increasing image size (from 200 × 200 pixels to 400 × 400 pixels) for small architectures such as AlexNet.
The variability in findings is consistent with research in the field of medical imaging, in which some tasks show higher performance at low to medium resolutions rather than the highest resolution images (Sabottke and Spieler Reference Sabottke and Spieler2020).
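One way to include a diversity of resolutions in a training set, as recommended when training and inference conditions cannot be matched, is to degrade training images at random. The following is a minimal sketch under stated assumptions, not the protocol used in the cited studies: the function names, the block-averaging method, and the factor set (1, 2, 4) are all hypothetical choices, and images are assumed to be H × W × C arrays.

```python
import numpy as np


def degrade_resolution(image, factor):
    """Simulate a coarser capture of an H x W x C image.

    Pixels are averaged in factor x factor blocks and then repeated back to
    the original grid. Edges that do not divide evenly are cropped, so the
    output can be slightly smaller than the input.
    """
    h, w = image.shape[:2]
    h_crop, w_crop = h - h % factor, w - w % factor
    img = image[:h_crop, :w_crop].astype(np.float32)
    blocks = img.reshape(h_crop // factor, factor, w_crop // factor, factor, -1)
    low = blocks.mean(axis=(1, 3))  # one value per block and channel
    restored = low.repeat(factor, axis=0).repeat(factor, axis=1)
    return np.rint(restored).astype(image.dtype)


def random_resolution_augment(image, rng, factors=(1, 2, 4)):
    """Apply a randomly chosen degradation factor (1 leaves the image as is)."""
    factor = int(rng.choice(factors))
    return degrade_resolution(image, factor)


rng = np.random.default_rng(seed=0)
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
augmented = random_resolution_augment(image, rng)
halved = degrade_resolution(image, 2)
```

Applying such a transform during training exposes the model to several effective resolutions; whether this recovers the performance lost to a train-test mismatch would need to be verified on the dataset at hand.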
Unlike other research fields, variability in lighting conditions and plant growth stage are complicating factors in weed recognition. Differential lighting across the day, year, and between weather conditions changes the appearance of plants and may cause harsh shadows, impeding the performance of algorithms (Hasan et al. Reference Hasan, Sohel, Diepeveen, Laga and Jones2021). Under natural lighting, Quan et al. (Reference Quan, Feng, Lv, Wang, Zhang, Liu and Yuan2019) found that sunny conditions decreased maize–weed detection F1-score with a Faster RCNN architecture from 98.46% in cloudy conditions to 94.60%. Changes in plant appearance are also likely to affect model performance, though research is sparse. In one study, growth stages were also observed to affect precision, with the detection accuracy of two- to five-leaf maize 0.53% higher on average than the six- to seven-leaf seedlings. Besides developing more resilient weed recognition algorithms capable of managing field-scale variability, growth stage detection may also offer opportunities for more targeted application of weed control treatments. Information on the location of weeds at different growth stages would provide additional management tools for farmers to understand weed distribution and problem areas. Improving our understanding of the influence of environmental conditions and plant morphology on recognition performance will be important in managing in-field deployment during periods of known increased false-positive and false-negative rates. Further research in this space should identify weaknesses in existing architectures and approaches.
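For readers less familiar with the F1-score reported above, it is the harmonic mean of precision and recall and therefore penalizes both false positives and false negatives. The short sketch below computes it from detection counts; the counts are hypothetical and are not taken from Quan et al.

```python
def detection_scores(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts. Undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 weeds found, 5 false alarms, 10 weeds missed.
precision, recall, f1 = detection_scores(tp=90, fp=5, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```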
Training and Evaluation
Different training and evaluation methods have been found to influence how effective or appropriate an algorithm may be for weed recognition in large-scale cropping and whether the on-paper perception of performance matches the reality in the field. Whereas Tensorflow and Pytorch are both widely used tools for deep learning research, training, and deployment, weed recognition models trained using the Pytorch framework were found to marginally outperform models trained using Tensorflow, with peak accuracy values of 97% and 96%, respectively (Hussain et al. Reference Hussain, Farooque, Schumann, Abbas, Acharya, McKenzie-Gopsill, Barrett, Afzaal, Zaman and Cheema2021). Understanding the influence of machine learning development tools such as these requires more attention, given the ubiquitous nature of both Tensorflow and Pytorch and the impact if there are consistent and repeatable weaknesses. After the training process, fairly evaluating the algorithm for performance is critical. Weed recognition models are typically evaluated using a range of different metrics, as discussed in detail in Hasan et al. (Reference Hasan, Sohel, Diepeveen, Laga and Jones2021); however, there is very limited research on how these metrics translate into in-crop weed control under field conditions. For example, the intersection-over-union (IoU) metric for segmentation models provides an understanding of how many pixels were predicted correctly. Yet, for a simple fallow spray operation, a weed only needs to be detected and its morphology not precisely estimated, making a low IoU score not necessarily representative of the in-field performance. The converse is also true, where models that have high performance on paper may not translate well into the dusty, variable in-field conditions. With respect to spot-spraying, Salazar-Gomez et al. (Reference Salazar-Gomez, Darbyshire, Gao, Sklar and Parsons2021) proposed the weed coverage rate, which incorporates both model accuracy and sprayer resolution into a performance model. It provides an indication of the percentage of weeds that would be controlled and would be more relevant to field scenarios than typically reported metrics such as precision, mean average precision, recall, and accuracy.
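The gap between a pixel-level metric and in-field outcome can be made concrete with a small sketch. It computes IoU for a toy prediction and then the fraction of true weed pixels that would fall inside a coarse, grid-shaped spray footprint triggered by that prediction. This is a simplified illustration under stated assumptions and is not the weed coverage rate as defined by Salazar-Gomez et al.: the square nozzle footprint, the cell size, and all function names are hypothetical.

```python
import numpy as np


def iou(pred, truth):
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(pred, truth).sum()
    return float(np.logical_and(pred, truth).sum() / union) if union else 1.0


def sprayed_cells(pred, cell):
    """Mark every cell x cell footprint containing any predicted weed pixel."""
    sprayed = np.zeros_like(pred, dtype=bool)
    h, w = pred.shape
    for top in range(0, h, cell):
        for left in range(0, w, cell):
            if pred[top:top + cell, left:left + cell].any():
                sprayed[top:top + cell, left:left + cell] = True
    return sprayed


def weed_fraction_covered(pred, truth, cell):
    """Fraction of true weed pixels lying inside a sprayed footprint."""
    total = truth.sum()
    if not total:
        return 1.0
    return float(np.logical_and(sprayed_cells(pred, cell), truth).sum() / total)


truth = np.zeros((8, 8), dtype=bool)
truth[1:4, 1:4] = True   # a 3 x 3 pixel weed
pred = np.zeros((8, 8), dtype=bool)
pred[2, 2] = True        # the model finds only the center pixel

score = iou(pred, truth)
covered = weed_fraction_covered(pred, truth, cell=4)
print(f"IoU={score:.2f}, weed pixels covered={covered:.2f}")
```

Here a prediction with an IoU of about 0.11 still triggers a footprint that covers the whole weed, which is the situation described above for coarse spray operations. With a finer footprint or a narrow laser beam, the same prediction would cover far less of the plant.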
2022 and Beyond: The Future of Weed Recognition
As the development trajectory of weed recognition continues, trends suggest that research will focus on better identification of fine-grained weed morphology for increasingly targeted weed control, alongside architectures that include rather than avoid large-scale complexity. In the initial phase, there has been substantial interest in using weed recognition technologies for spot-spraying herbicide application with traditional sprayers and nozzle systems that have a spatially coarse weed control footprint. Looking ahead, weed recognition is likely to provide greater opportunities for increasingly targeted weed control such as more precise herbicide application, lasers, and electrical weeding, among others. Incorporating temporal data with spatial weed data would provide new insights into weed movement and the potential for density predictions before emergence, and incorporating area-wide information on weather, resistance status, and even crop yield could improve management processes and weed control method selection by autonomous platforms. Developing tools for the early detection and mitigation of herbicide resistance becomes feasible when a high degree of species-level detection can occur at low densities when monitored remotely. The theme of SSWC development moving from controlled areas to complex systems approaches is likely to continue as more and varied data contributes to the decision to control weeds.
Weed Recognition for Nonchemical Weed Control
Highly targeted methods of nonchemical weed control, including lasers and electrical weeding, have been proposed as viable alternatives to herbicides when used on a site-specific basis in low-density weed scenarios (Coleman et al. Reference Coleman, Stead, Rigter, Xu, Johnson, Brooker, Sukkarieh and Walsh2019). Despite their potential, these systems require highly detailed information on weed location and morphological details, including growing points, leaf locations, size, and species for effective targeting, energy estimation, and autonomous delivery. The detection of precise targeting locations such as growing points and plant centers has been shown in more controlled settings, by incorporating the predictable sequence of crop plants within the row into a row model (Lottes et al. Reference Lottes, Behley, Milioto and Stachniss2018, Reference Lottes, Behley, Milioto, Stachniss, Chebrolu, Milioto and Stachniss2020), or annotating plant nodes for object detection models from multiple viewing points (Boogaard et al. Reference Boogaard, Rongen and Kootstra2020). Simpler, barycenter methods have also been proposed, but error between plant center and predicted center could result in narrow laser beams missing the target entirely (Champ et al. Reference Champ, Mora-Fallas, Goëau, Mata-Montero, Bonnet and Joly2020). For reliable targeting of different species, there is a requirement to detect and track features that represent a growing point instead of estimating the centroid based on plant sequences and barycenter methods. Laser damage models have been developed that adjust laser power based on species and estimated biomass (Marx et al. Reference Marx, Barcikowski, Hustedt, Hustedt, Haferkamp, Rath, Hustedt, Haferkamp and Rath2012; Rakhmatulin and Andreasen Reference Rakhmatulin and Andreasen2020), which would require the real-time prediction of these parameters in the field. Species prediction with deep learning has already been shown in numerous studies (Hasan et al. Reference Hasan, Sohel, Diepeveen, Laga and Jones2021); however, real-time biomass estimation, growth stage determination, or plant organ detection from single images are less well understood and require input from the field of high-throughput plant phenotyping, where such methods are required for fine-detailed analysis of plant traits (Arunachalam and Andreasson Reference Arunachalam and Andreasson2021).
Weed Recognition for Weed Risk Profiling
As weed recognition algorithms advance, development has moved from managing variability with controlled environments, to adopting deep learning methods that can manage complexity themselves. Now, trends in external industries suggest that the next phase is for the development of deep learning–based architectures that do not just avoid complexity but incorporate diverse data sources using variability to their advantage. Research from Google AI recently demonstrated the potential for an algorithm capable of doing many thousands of tasks (Barham et al. Reference Barham, Chowdhery, Dean, Ghemawat, Hand, Hurt, Isard, Lim, Pang, Roy, Saeta, Schuh, Sepassi, El, Thekkath and Wu2022; Dean Reference Dean2021). The approach used an architecture that activated different regions depending on the task at hand. Taxonomic approaches to weed recognition have been proposed (Skovsen et al. Reference Skovsen, Dyrmann, Mortensen, Laursen, Gislum, Eriksen, Farkhani, Karstoft and Jorgensen2019) that would allow a model to select the most confident level of specificity in its prediction for a weed. Future models that have learned taxonomic relationships to detect different species of plants could be deployed on imagery on regional scales for area-wide understandings of weed prevalence. Such maps would provide insights into the prevalence of certain species outside of field margins, and thus the risk that this weed may be present in particular fields given the incorporation of weather and agronomy data. A more flexible approach to weed recognition may improve the ability of these systems to operate in unseen areas and over large regions incorporating not just image data, but previous application maps, weather information, soil information, and crop agronomy.
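The idea of a model reporting the most specific taxonomic level at which it is confident can be sketched as a simple roll-up of species probabilities. This is an illustrative assumption about how such a selection could work, not the method proposed by Skovsen et al.: the lookup table, the 0.8 threshold, the probabilities, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup: species -> (genus, family).
TAXONOMY = {
    "Lolium rigidum": ("Lolium", "Poaceae"),
    "Lolium perenne": ("Lolium", "Poaceae"),
    "Avena fatua": ("Avena", "Poaceae"),
    "Geranium carolinianum": ("Geranium", "Geraniaceae"),
}


def most_specific_prediction(species_probs, threshold=0.8):
    """Return (level, name, probability) for the finest taxonomic level whose
    summed probability reaches the threshold, or ("unresolved", None, 0.0)."""
    genus_probs = defaultdict(float)
    family_probs = defaultdict(float)
    for species, p in species_probs.items():
        genus, family = TAXONOMY[species]
        genus_probs[genus] += p    # species probabilities roll up to genus
        family_probs[family] += p  # and to family
    levels = (("species", species_probs),
              ("genus", genus_probs),
              ("family", family_probs))
    for level, probs in levels:    # finest level first
        name, p = max(probs.items(), key=lambda item: item[1])
        if p >= threshold:
            return level, name, p
    return "unresolved", None, 0.0


# The model cannot separate the two ryegrass species but is sure of the genus.
probs = {"Lolium rigidum": 0.45, "Lolium perenne": 0.40,
         "Avena fatua": 0.10, "Geranium carolinianum": 0.05}
level, name, confidence = most_specific_prediction(probs)
print(level, name, round(confidence, 2))
```

In this example the prediction falls back from species to genus, which may still be specific enough to select a control treatment.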
Besides area-wide management, species and morphology-level weed recognition would enable SSWC platforms to conduct risk assessments of the likely impact of each weed on crop yield and the likely herbicide resistance risk of each weed. Weed risk profiles based on species, morphology, past detections in the location, herbicide application history, and current crop agronomy would improve the identification of an appropriate control treatment for that weed. For example, Norsworthy et al. (Reference Norsworthy, Ward, Shaw, Llewellyn, Nichols, Webster, Bradley, Frisvold, Powles, Burgos, Witt and Barrett2012) proposed 12 best management practices focused on reducing the risk of herbicide resistance that require additional information on weed biology, herbicide labels, and weed morphology. The data could be provided by more generalized weed recognition algorithms enabling more accurate, real-time risk assessments of herbicide resistance evolution and hence more appropriate weed control application. Toward rationalizing the application of treatments, there may be instances where a weed may not pose a risk and could be ignored or monitored for possible future control (Gerhards et al. Reference Gerhards, Andújar Sanchez, Hamouz, Peteinatos, Christensen and Fernandez-Quintanilla2022). Given the existing prevalence of yield maps and field histories, it is reasonable to expect that architectures such as these could learn how weed control decisions affected localized yield to inform future weed control decisions. Incorporating complexity instead of simply managing it for weed control decision making is likely the future of SSWC and should change the way weeds are approached over the next 50 yr of development.
The agricultural industry has a high level of anticipation surrounding deep learning–based weed recognition and the subsequent benefits for SSWC. As we have illustrated, the idea of weed detection, identification, and recognition is not novel, having been in development over the last 50 yr; however, advancements in deep learning algorithms and supporting software and hardware have enabled widespread development for horticultural and large-scale systems. Promisingly, deep learning research has shown that the performance of CNNs has continued to improve with the release of novel, open-source algorithm architectures and when trained with increasing quantities of data. It is likely that in-crop performance will improve if weed datasets increase in size, diversity, and accessibility and the industry continues to adopt the most recent algorithms or develops weed recognition–specific architectures. Just as ImageNet paved the way for data availability and algorithm development in machine learning, there is an opportunity in weed recognition to capture research interest in complex image analysis problems through the development of large-scale weed image databases. Yet much about the biological interactions with machine learning remains unknown. Explainable AI or machine learning is an emerging field of research that aims to show how “black box” decisions are made by trained models. An improved understanding of how complex models function could help optimize their integration and use with biological systems. Research on real-time growth stage and weed morphology identification for highly targeted methods of weed control is sparse. Furthermore, most existing methodologies used for weed recognition were developed in nonagricultural industries, where the architecture design was tuned for the task at hand and adapted for agriculture.
There are likely benefits from the development of weed recognition–specific algorithm architectures from large-scale image datasets that attempt to replicate the impact ImageNet had for broader deep learning research; however, this requires access to such public datasets.
Given the unprecedented rate of progress in weed recognition technologies over the last decade, the next 50 yr are likely to herald step-changes in technology. Trends in current development suggest that short-term research will focus on larger, multi-modal systems. These systems would incorporate large amounts of diverse farm data to better predict the required weed control method, which may be a risk-based decision to ignore the weed. The development of weed recognition with performance at the requisite scale and reliability for agricultural systems is creating a new potential for weed control at the individual plant level.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/wet.2022.84
Acknowledgments
This work has been funded by the Grains Research and Development Corporation (GRDC). The authors would like to acknowledge the discussions with Professor Paul Neve at the University of Copenhagen for helping shape ideas around weed risk profiling.
Conflict of Interest
The authors declare no conflict of interest.