
MonoVisual3DFilter: 3D tomatoes’ localisation with monocular cameras using histogram filters

Published online by Cambridge University Press:  18 September 2024

Sandro Augusto Costa Magalhães*
Affiliation:
Faculty of Engineering, University of Porto, Porto 4200-465, Portugal; INESC TEC – Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, Porto 4200-465, Portugal
Filipe Neves dos Santos
Affiliation:
INESC TEC – Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, Porto 4200-465, Portugal
António Paulo Moreira
Affiliation:
Faculty of Engineering, University of Porto, Porto 4200-465, Portugal; INESC TEC – Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, Porto 4200-465, Portugal
Jorge Manuel Miranda Dias
Affiliation:
Institute of Systems and Robotics, Department of Electrical Engineering and Computers, University of Coimbra, Coimbra, Portugal; Khalifa University of Science, Technology, and Research, Abu Dhabi, United Arab Emirates (UAE)
*
Corresponding author: Sandro Augusto Costa Magalhães; Email: sandro.a.magalhaes@inesctec.pt

Abstract

Performing tasks in agriculture, such as fruit monitoring or harvesting, requires perceiving the objects’ spatial position. RGB-D cameras are limited in open-field environments due to lighting interference. Therefore, in this study, we aim to answer the research question: “How can we use and control monocular sensors to perceive objects’ position in the 3D task space?” To this end, we applied histogram filters (Bayesian discrete filters) to estimate the position of tomatoes on the tomato plant through the MonoVisual3DFilter algorithm. Two kernel filters were studied: the square kernel and the Gaussian kernel. The implemented algorithm was evaluated in simulation, with and without Gaussian and random noise, and on a testbed under laboratory conditions. The algorithm reported a mean absolute error lower than 10 mm in simulation and lower than 20 mm on the laboratory testbed, at an assessing distance of about 0.5 m. The results are therefore viable for real environments and should improve at closer assessing distances.
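The weight-update step of a histogram (discrete Bayes) filter with the two studied kernels can be sketched as follows. This is one plausible reading of the approach, assuming the state space is decomposed into a regular grid of cells and each detection casts a viewing ray; the grid resolution, kernel parameters, and ray model are illustrative assumptions, not the paper’s exact implementation.

```python
import numpy as np

def square_kernel(d, half_width=0.05):
    """Uniform weight inside a band of +/- half_width metres around the ray."""
    return (np.abs(d) <= half_width).astype(float)

def gaussian_kernel(d, sigma=0.2):
    """Gaussian weight N(0, sigma) of the distance to the ray."""
    return np.exp(-0.5 * (d / sigma) ** 2)

def update_weights(cells, weights, origin, direction, kernel):
    """Multiply each cell's weight by the kernel of its perpendicular
    distance to the viewing ray cast through a detection, then renormalise."""
    direction = direction / np.linalg.norm(direction)
    v = cells - origin
    t = v @ direction                       # projection onto the ray
    closest = origin + np.outer(t, direction)
    d = np.linalg.norm(cells - closest, axis=1)
    w = weights * kernel(d)
    return w / w.sum()                      # renormalised belief

# Regular grid of candidate positions (cell centres), in metres.
xs = np.linspace(-0.5, 0.5, 21)
cells = np.array([[x, y, z] for x in xs for y in xs for z in xs])
weights = np.full(len(cells), 1.0 / len(cells))   # uniform prior

# One simulated viewpoint looking along +z from 0.5 m away.
weights = update_weights(cells, weights,
                         np.array([0.0, 0.0, -0.5]),
                         np.array([0.0, 0.0, 1.0]),
                         gaussian_kernel)
estimate = cells[np.argmax(weights)]   # MAP estimate of the object position
```

Fusing several viewpoints simply repeats `update_weights` with each new ray, which is how intersecting views (as in Figure 4) concentrate the belief around the true position.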

Information

Type
Research Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Table I. Comparison with the literature.


Figure 1. The AgRob v16 robot from INESC TEC, designed to operate in open-field and controlled agricultural environments.


Figure 2. Simulated environment to validate the histogram filter’s effectiveness. The green spheres are the objects being detected, representing the tomatoes, and the black box is the bounding box camera looking at the spheres.


Figure 3. Laboratory testbed used to evaluate the histogram filter algorithm.


Figure 4. Intersection between multiple viewpoints in the 2D plane.


Algorithm 1. Histogram filter – updating weights


Figure 5. Intersection of the camera views in the decomposed space.


Figure 6. Conversion between the camera’s and sensor’s frames (blue – sensor’s frame; black – camera’s frame).


Figure 7. View of the spheres by the bounding box camera at each fixed viewpoint. The green square boxes around the spheres are the bounding boxes returned by the bounding box camera for the detected spheres.


Figure 8. View of the tomatoes in the testbed at each pose of the OAK-1 camera. The blue squares around the tomatoes mark the tomatoes detected by the OAK-1 bounding box camera using a custom-trained YOLO v8 tiny detector. Inside each bounding box are the detected class (tomato) and the detection confidence. Each row is an experiment, for a total of six experiments, and each figure contains the number of tomatoes being detected.


Figure 9. Iteration of the histogram filter during simulation for detecting the six spheres, considering a square kernel. (a) Decomposition of the state space at the beginning of the algorithm; (b) detection at the end of the first viewpoint; (c) detection at the end of the second viewpoint; (d) detection at the end of the third viewpoint.


Figure 10. Iteration of the histogram filter during simulation for detecting the six spheres, considering a Gaussian kernel, $\mathcal{N}(0, 0.2)$. (a) Decomposition of the state space at the beginning of the algorithm; (b) detection at the end of the first viewpoint; (c) detection at the end of the second viewpoint; (d) detection at the end of the third viewpoint.


Figure 11. Error in estimating the position of the spheres in simulation without noise.


Figure 12. Error in estimating the position of the spheres in simulation with noise.


Figure 13. Error in estimating the position of the tomatoes at the testbed.


Table II. Error computations to the testbed experiments for the different kernels and centre estimation methods.


Figure 14. Error for the case of partial occlusion of Figs. 8j, 8k, and 8l.


Figure 15. RGB and depth images from the MiDaS v3.1 DPT SWIN2 Large 384 for estimating the tomatoes’ distance to the camera’s sensor.


Figure 16. Calibration curve to estimate the absolute depth in metres to the camera sensor for the MiDaS CNN.
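MiDaS outputs relative (affine-invariant) inverse depth, so a calibration against known distances is needed to recover metres. A minimal sketch of such a calibration, assuming an affine model in inverse depth and using made-up sample values rather than the curve actually fitted in the paper:

```python
import numpy as np

# Hypothetical calibration samples: MiDaS relative output vs. the
# measured distance (in metres) to the camera sensor.
relative = np.array([900.0, 620.0, 450.0, 330.0, 250.0])
metric = np.array([0.30, 0.45, 0.60, 0.80, 1.05])

# Fit 1/z = a * relative + b by ordinary least squares.
A = np.vstack([relative, np.ones_like(relative)]).T
a, b = np.linalg.lstsq(A, 1.0 / metric, rcond=None)[0]

def to_metres(midas_value):
    """Convert a MiDaS relative value to absolute depth in metres."""
    return 1.0 / (a * midas_value + b)
```

With this model, larger MiDaS values (closer objects) map to smaller metric depths, matching the decreasing shape a calibration curve of this kind typically has.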


Figure 17. Maximum speedup analysis for parallelisation according to Amdahl’s law for the Gaussian and square kernels.
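Amdahl’s law bounds the achievable speedup by the serial fraction of the algorithm: with parallel fraction $p$ on $n$ workers, the speedup cannot exceed $1/((1-p) + p/n)$, saturating at $1/(1-p)$. A minimal sketch, with a placeholder parallel fraction rather than a value measured in the paper:

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup with parallel fraction p on n workers,
    per Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)

# Example: if 90% of the kernel update parallelises, eight workers
# give at most ~4.7x, and no number of workers can exceed 10x.
print(round(amdahl_speedup(0.9, 8), 2))   # 4.71
```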