
Toward low-cost automated monitoring of life below water with deep learning

Published online by Cambridge University Press: 13 June 2023

Devi Ayyagari*
Affiliation:
Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
Corey Morris
Affiliation:
Department of Fisheries and Oceans, St. John’s, NL, Canada
Joshua Barnes
Affiliation:
National Research Council Canada, St. John’s, NL, Canada
Christopher Whidden
Affiliation:
Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
*
Corresponding author: Devi Ayyagari; Email: devi.ayyagari@dal.ca

Abstract

Oceans will play a crucial role in our efforts to combat the growing climate emergency. Researchers have proposed several strategies to harness greener energy through oceans and to use oceans as carbon sinks. However, the risks these strategies might pose to the ocean and marine ecosystems are not well understood. It is imperative that we quickly develop a range of tools to monitor ocean processes and marine ecosystems alongside the technology to deploy these solutions into the oceans on a large scale. Large arrays of inexpensive cameras placed deep underwater, coupled with machine learning pipelines that automatically detect, classify, count, and estimate fish populations, have the potential to continuously monitor marine ecosystems and help study the impacts of these solutions on the ocean. In this paper, we successfully demonstrate the application of YOLOv4 and YOLOv7 deep learning models to detect and classify six species of fish in a dark, artificially lit underwater video dataset captured 500 m below the surface, achieving mAPs of 76.01% and 85.0%, respectively. We show that 2,000 images per species are sufficient to train a species classification model for this low-light environment. This research is a first step toward systems that autonomously monitor fish deep underwater while causing as little disruption as possible. As such, we discuss the advances that will be needed to apply such systems on a large scale and propose several avenues of research toward this goal.
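To make the detection step concrete, the following is a minimal sketch of running a Darknet-format YOLOv4 model on a single video frame with OpenCV's DNN module. The config/weights filenames, video filename, and generic species labels are placeholders for illustration, not artifacts released with this paper, and the authors' actual pipeline may differ.

    import cv2
    import numpy as np

    # Placeholder class list and model files: assumptions for illustration,
    # not the artifacts used in the paper.
    CLASSES = ["species_1", "species_2", "species_3",
               "species_4", "species_5", "species_6"]

    net = cv2.dnn.readNetFromDarknet("yolov4-fish.cfg", "yolov4-fish.weights")
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

    cap = cv2.VideoCapture("underwater_clip.mp4")
    ok, frame = cap.read()
    if ok:
        class_ids, confidences, boxes = model.detect(
            frame, confThreshold=0.25, nmsThreshold=0.45)
        for cid, conf, (x, y, w, h) in zip(
                np.ravel(class_ids), np.ravel(confidences), boxes):
            print(f"{CLASSES[int(cid)]}: {conf:.2f} at x={x}, y={y}, w={w}, h={h}")
    cap.release()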

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press
Figure 1. Intersection over union.
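Intersection over union (IoU) divides the overlap area of a predicted and a ground-truth box by the area of their union. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

    def iou(box_a, box_b):
        # Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    # Example: two 10x10 boxes offset by 5 px overlap in a 5x5 region,
    # so IoU = 25 / (100 + 100 - 25) = 0.1428...
    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))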

Figure 2. (a) Variation of brightness in the dataset. (b) Variation of average brightness over time. Brightness is measured as the average of the “Value” channel in HSV color space across the frames of each video.
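A minimal sketch of this brightness metric, assuming BGR frames read with OpenCV (the function and file names are illustrative):

    import cv2
    import numpy as np

    def video_brightness(path):
        # Average the HSV "Value" channel (0-255) over every frame of a video.
        cap = cv2.VideoCapture(path)
        frame_means = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            frame_means.append(hsv[:, :, 2].mean())  # channel 2 is "Value"
        cap.release()
        return float(np.mean(frame_means)) if frame_means else 0.0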

Table 1. Number of unique objects in each class (sample frequency) and number of videos with annotated objects from each class (video frequency) in the manually annotated label set.

Table 2. Number of objects of each species of fish in the training, validation, and test splits.

Figure 3. mAP at different IoU thresholds for models trained with 200, 500, 1,000, 2,000, and all samples of fish in each class using the YOLOv4 architecture.

Figure 4. mAP at different IoU thresholds for models trained with 200, 500, 1,000, 2,000, and all samples of fish in each class using the YOLOv7 architecture.

Figure 5. mAP of the models trained with 2,000 samples in each class for both the YOLOv4 and YOLOv7 architectures, evaluated at IoU thresholds of 0.4 and 0.6.

Figure 6. Confusion matrix for the YOLOv4 model trained with 2,000 samples in each class, evaluated at an IoU threshold of 0.4.

Figure 7. Confusion matrix for the YOLOv7 model trained with 2,000 samples in each class, evaluated at an IoU threshold of 0.4.
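One common way to compute such a detection confusion matrix is to greedily match each prediction to the highest-IoU unmatched ground truth above the threshold, with an extra background row/column for misses and false positives. A sketch under those assumptions (the paper's exact matching rules may differ):

    import numpy as np

    def box_iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def detection_confusion_matrix(gts, preds, n_classes, iou_thr=0.4):
        # gts, preds: lists of (class_id, (x1, y1, x2, y2)); preds sorted by
        # descending confidence. Row = true class, column = predicted class;
        # index n_classes stands for "background" (miss / false positive).
        cm = np.zeros((n_classes + 1, n_classes + 1), dtype=int)
        unmatched = set(range(len(gts)))
        for p_cls, p_box in preds:
            best_i, best_iou = -1, iou_thr
            for i in unmatched:
                overlap = box_iou(gts[i][1], p_box)
                if overlap >= best_iou:
                    best_i, best_iou = i, overlap
            if best_i >= 0:
                cm[gts[best_i][0], p_cls] += 1
                unmatched.remove(best_i)
            else:
                cm[n_classes, p_cls] += 1   # detection with no ground truth
        for i in unmatched:
            cm[gts[i][0], n_classes] += 1   # ground truth with no detection
        return cm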

Figure 8. Examples of fish objects identified in water frames by the YOLOv7 model trained with 2,000 samples in each class and evaluated at an IoU threshold of 0.6. Water frames are frames with no manually annotated fish objects.

Figure 9. Brightness versus mean confidence of predictions for each video in the test set for the YOLOv4 model trained with 2,000 samples in each class, evaluated at IoU thresholds of 0.4 and 0.6.

Figure 10. Brightness versus mean confidence of predictions for each video in the test set for the YOLOv7 model trained with 2,000 samples in each class, evaluated at IoU thresholds of 0.4 and 0.6.

Figure 11. Brightness versus mAP of predictions for each video in the test set for the YOLOv4 model trained with 2,000 samples in each class, evaluated at IoU thresholds of 0.4 and 0.6.

Figure 12. Brightness versus mAP of predictions for each video in the test set for the YOLOv7 model trained with 2,000 samples in each class, evaluated at IoU thresholds of 0.4 and 0.6.