
RGB-D image-based real-time pose estimation algorithm for mobile robots with rectangular body

Published online by Cambridge University Press:  11 August 2025

Haoran Tang
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Bo Wang*
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Xiaofei Zhou
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Zhimin Han
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Qiang Lv
Affiliation:
School of Automation and International Joint Research Laboratory for Autonomous Robotic Systems, Hangzhou Dianzi University, Hangzhou, China
Corresponding author: Bo Wang; Email: wangbo@hdu.edu.cn

Abstract

In this study, we introduce a real-time pose estimation algorithm for a class of mobile robots with rectangular bodies (e.g., common automated guided vehicles) by integrating odometry and RGB-D images. First, a lightweight object detection model is designed based on the visual information. Then, a pose estimation algorithm is proposed that exploits the depth value variations within the detected target region, which exhibit specific patterns due to the robot’s three-dimensional geometry and the observation perspective (termed “differentiated depth information”). To improve the robustness of object detection and pose estimation, a Kalman filter is further constructed that incorporates odometry data. Finally, a series of simulations and experiments demonstrate the method’s effectiveness. The experiments show that the proposed algorithm achieves a speed of over 20 frames per second (FPS) together with good estimation accuracy on a mobile robot equipped with an NVIDIA Jetson Nano Developer Kit.
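
The abstract only outlines how the odometry and the RGB-D-based pose estimate are fused. The sketch below, in Python, illustrates one plausible realization under stated assumptions: an extended-Kalman-style filter in which odometry (linear and angular velocity) drives the prediction step and the vision-derived planar pose (x, y, yaw) serves as the measurement. The class name PoseKalmanFilter, the state layout, and the noise values are illustrative assumptions, not the authors' implementation.

import numpy as np

class PoseKalmanFilter:
    """Minimal planar-pose filter sketch: odometry drives the prediction,
    an RGB-D-derived pose measurement drives the correction."""

    def __init__(self, x0, P0, Q, R):
        self.x = np.asarray(x0, dtype=float)   # state [x, y, yaw]
        self.P = np.asarray(P0, dtype=float)   # state covariance
        self.Q = np.asarray(Q, dtype=float)    # process (odometry) noise
        self.R = np.asarray(R, dtype=float)    # measurement (vision) noise

    def predict(self, v, w, dt):
        # Unicycle-style motion model driven by odometry
        # (v: linear velocity, w: angular velocity, dt: time step).
        x, y, yaw = self.x
        self.x = np.array([x + v * np.cos(yaw) * dt,
                           y + v * np.sin(yaw) * dt,
                           yaw + w * dt])
        # Jacobian of the motion model with respect to the state.
        F = np.array([[1.0, 0.0, -v * np.sin(yaw) * dt],
                      [0.0, 1.0,  v * np.cos(yaw) * dt],
                      [0.0, 0.0,  1.0]])
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z):
        # Direct pose measurement from the RGB-D pipeline: z = [x, y, yaw].
        H = np.eye(3)
        y_res = np.asarray(z, dtype=float) - self.x          # innovation
        y_res[2] = (y_res[2] + np.pi) % (2 * np.pi) - np.pi  # wrap yaw residual
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)                  # Kalman gain
        self.x = self.x + K @ y_res
        self.P = (np.eye(3) - K @ H) @ self.P


if __name__ == "__main__":
    kf = PoseKalmanFilter(x0=[0.0, 0.0, 0.0], P0=np.eye(3) * 0.1,
                          Q=np.diag([0.02, 0.02, 0.01]),
                          R=np.diag([0.05, 0.05, 0.03]))
    kf.predict(v=0.5, w=0.1, dt=0.05)     # odometry prediction step
    kf.update([0.026, 0.001, 0.006])      # vision-based pose correction
    print(kf.x)

In such a scheme, the high-rate odometry keeps the pose estimate available between camera frames, while each RGB-D detection corrects the accumulated odometric drift; the noise matrices Q and R control how strongly each source is trusted.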

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

