
RGB-D image-based real-time pose estimation algorithm for mobile robots with rectangular body

Published online by Cambridge University Press:  11 August 2025

Haoran Tang
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Bo Wang*
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Xiaofei Zhou
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Zhimin Han
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Qiang Lv
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Corresponding author: Bo Wang; Email: wangbo@hdu.edu.cn

Abstract

In this study, we introduce a real-time pose estimation algorithm for a class of mobile robots with rectangular bodies (e.g., common automated guided vehicles) by integrating odometry and RGB-D images. First, a lightweight object detection model is designed based on the visual information. Then, a pose estimation algorithm is proposed that exploits the depth value variations within the detected target region, which exhibit specific patterns due to the robot’s three-dimensional geometry and the observation perspective (termed “differentiated depth information”). To improve the robustness of object detection and pose estimation, a Kalman filter is further constructed that incorporates odometry data. Finally, a series of simulations and experiments demonstrate the method’s effectiveness. The experiments show that the proposed algorithm achieves a speed of over 20 frames per second (FPS) together with good estimation accuracy on a mobile robot equipped with an NVIDIA Jetson Nano Developer Kit.
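
The abstract only outlines how the odometry and the RGB-D-based pose estimate are fused. The sketch below, in Python, illustrates one plausible realization under stated assumptions: an extended-Kalman-style filter in which odometry (linear and angular velocity) drives the prediction step and the vision-derived planar pose (x, y, yaw) serves as the measurement. The class name PoseKalmanFilter, the state layout, and the noise values are illustrative assumptions, not the authors' implementation.

import numpy as np

class PoseKalmanFilter:
    """Minimal planar-pose filter sketch: odometry drives the prediction,
    an RGB-D-derived pose measurement drives the correction."""

    def __init__(self, x0, P0, Q, R):
        self.x = np.asarray(x0, dtype=float)   # state [x, y, yaw]
        self.P = np.asarray(P0, dtype=float)   # state covariance
        self.Q = np.asarray(Q, dtype=float)    # process (odometry) noise
        self.R = np.asarray(R, dtype=float)    # measurement (vision) noise

    def predict(self, v, w, dt):
        # Unicycle-style motion model driven by odometry
        # (v: linear velocity, w: angular velocity, dt: time step).
        x, y, yaw = self.x
        self.x = np.array([x + v * np.cos(yaw) * dt,
                           y + v * np.sin(yaw) * dt,
                           yaw + w * dt])
        # Jacobian of the motion model with respect to the state.
        F = np.array([[1.0, 0.0, -v * np.sin(yaw) * dt],
                      [0.0, 1.0,  v * np.cos(yaw) * dt],
                      [0.0, 0.0,  1.0]])
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z):
        # Direct pose measurement from the RGB-D pipeline: z = [x, y, yaw].
        H = np.eye(3)
        y_res = np.asarray(z, dtype=float) - self.x          # innovation
        y_res[2] = (y_res[2] + np.pi) % (2 * np.pi) - np.pi  # wrap yaw residual
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)                  # Kalman gain
        self.x = self.x + K @ y_res
        self.P = (np.eye(3) - K @ H) @ self.P


if __name__ == "__main__":
    kf = PoseKalmanFilter(x0=[0.0, 0.0, 0.0], P0=np.eye(3) * 0.1,
                          Q=np.diag([0.02, 0.02, 0.01]),
                          R=np.diag([0.05, 0.05, 0.03]))
    kf.predict(v=0.5, w=0.1, dt=0.05)     # odometry prediction step
    kf.update([0.026, 0.001, 0.006])      # vision-based pose correction
    print(kf.x)

In such a scheme, the high-rate odometry keeps the pose estimate available between camera frames, while each RGB-D detection corrects the accumulated odometric drift; the noise matrices Q and R control how strongly each source is trusted.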

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

