
Three-dimensional optical microrobot orientation estimation and tracking using deep learning

Published online by Cambridge University Press:  05 December 2024

Sunil Choudhary
Affiliation:
Institut des Systemes Intelligents et de Robotique (ISIR), Sorbonne University, CNRS, Paris, France
Ferhat Sadak*
Affiliation:
Institut des Systemes Intelligents et de Robotique (ISIR), Sorbonne University, CNRS, Paris, France Department of Mechanical Engineering, Bartin University, Bartin, Türkiye
Edison Gerena
Affiliation:
Institut des Systemes Intelligents et de Robotique (ISIR), Sorbonne University, CNRS, Paris, France MovaLife microrobotics, Paris, France
Sinan Haliyo
Affiliation:
Institut des Systemes Intelligents et de Robotique (ISIR), Sorbonne University, CNRS, Paris, France
Corresponding author: Ferhat Sadak, Email: fsadak@bartin.edu.tr

Abstract

Optical microrobots are actuated by a laser in a liquid medium using optical tweezers. To create visual control loops for robotic automation, this work describes a deep learning-based method for orientation estimation of optical microrobots, focusing on detecting 3-D rotational movements and localizing microrobots and trapping points (TPs). We integrated and fine-tuned You Only Look Once (YOLOv7) and Deep Simple Online Real-time Tracking (DeepSORT) algorithms, improving microrobot and TP detection accuracy by $\sim 3$% and $\sim 11$%, respectively, at the 0.95 Intersection over Union (IoU) threshold in our test set. Additionally, it increased mean average precision (mAP) by 3% at the 0.5:0.95 IoU threshold during training. Our results showed a 99% success rate in trapping events with no false-positive detections. We introduced a model that employs EfficientNet as a feature extractor combined with custom convolutional neural networks (CNNs) and feature fusion layers. To demonstrate its generalization ability, we evaluated the model on an independent in-house dataset comprising 4,757 image frames, in which microrobots executed simultaneous rotations across all three axes. Our method provided mean rotation angle errors of $1.871^\circ$, $2.308^\circ$, and $2.808^\circ$ for the X (yaw), Y (roll), and Z (pitch) axes, respectively. Compared to pre-trained models, our model provided the lowest error in the Y and Z axes while offering competitive results for the X-axis. Finally, we demonstrated the explainability and transparency of the model's decision-making process. Our work contributes to the field of microrobotics by providing an efficient 3-axis orientation estimation pipeline, with a clear focus on automation.
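The per-axis mean rotation angle errors quoted above can be computed as a mean absolute angular difference between predicted and ground-truth angles. The following is a minimal illustrative sketch, not the authors' code; the function name and the wrap-around handling (mapping differences into $(-180^\circ, 180^\circ]$) are our assumptions:

```python
def mean_angle_error(pred_deg, true_deg):
    """Mean absolute angular error in degrees for one axis.

    Each predicted/ground-truth difference is wrapped into (-180, 180]
    so that, e.g., 179 deg vs. -179 deg counts as a 2-degree error.
    """
    assert len(pred_deg) == len(true_deg) and pred_deg, "need matched, non-empty lists"
    total = 0.0
    for p, t in zip(pred_deg, true_deg):
        # Wrap the signed difference into (-180, 180] before taking |.|
        d = (p - t + 180.0) % 360.0 - 180.0
        total += abs(d)
    return total / len(pred_deg)
```

In this framing, the model would be evaluated once per axis (X/yaw, Y/roll, Z/pitch) over all frames of the test or in-house dataset, yielding one scalar error per axis as reported in the abstract.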

Information

Type
Research Article
Creative Commons
CC BY-NC-SA 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Outline of our experimental configuration and approach to data preparation. (a) Setup for collecting microrobot data using an optical micromanipulation platform. (b) Microrobot transfer using a micropipette under an optical microscope. (c) Optical tweezer (OT) setup and input device used to record data. (d) Computer-aided design model of the microrobot (dimensions in $\mu m$). Sample microrobot tracking and TP localization dataset collection: (e) at random rotations in the image plane, (f) at different brightness levels, (g) in a noisy micro-environment, and (h) under various trapping conditions. Sample 3-D microrobot orientation estimation dataset collection cases: (i) at ($48.6^\circ$, $1.2^\circ$, $15.76^\circ$), (j) at ($3.4^\circ$, $-36.6^\circ$, $8.2^\circ$), and (k) at ($0^\circ$, $0^\circ$, $-31.2^\circ$).


Figure 2. Comparison between orientations computed from the input device (Omega7 measure) and kinematic transforms (ground truth).


Table I. Orientation estimation network parameters (SER = squeeze excitation ratio, R = repeats, IC = input channel, and OC = output channel).


Figure 3. The illustration of the proposed model architecture.


Figure 4. An illustration of some of the main hyperparameter tuning procedures. The Y-axis represents the fitness score, the X-axis represents hyperparameter values, and higher concentrations are denoted by the yellow color code [37].


Figure 5. Training and validation loss versus number of epochs.


Figure 6. Performance evaluation of the optimized YOLOv7 and DeepSORT model over 100 epochs based on: (a) mAP@0.5, (b) mAP@0.5:0.95, (c) training and validation loss for the YOLOv7 and DeepSORT model, and (d) training and validation loss for the optimized YOLOv7 and DeepSORT model [37].


Table II. Comparison of TP detection methods.


Figure 7. Evaluation of the optimized YOLOv7 and DeepSORT model performance at different IoU thresholds on the test set [37].


Figure 8. Evaluation of model performance on the test set.


Table III. Metrics for different axes on the test dataset.


Table IV. Performance evaluation of our proposed model for different axes on the in-house dataset.


Figure 9. Evaluation of model performance on the in-house dataset.


Figure 10. Explainability of the microrobot and TP localization using Score-CAM. (a) Input image, (b) Score-CAM heatmap results, (c) saliency maps, and (d) detection and tracking results for the microrobot.


Figure 11. Explainability of the microrobot orientation estimation model with input images and SHAP value images.


Table V. Comparison of metrics between the proposed model and pre-trained models for orientation around the X, Y, and Z axes.


Table VI. Comparison of our work with the state-of-the-art for orientation angle estimation (OM = optical microscope).