
A semantic visual SLAM based on improved mask R-CNN in dynamic environment

Published online by Cambridge University Press:  03 October 2024

Kang Zhang
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, China Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, China
Chaoyi Dong*
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, China Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, China Engineering Research Center of Large Energy Storage Technology, Ministry of Education, Hohhot, China
Hongfei Guo*
Affiliation:
Inner Mongolia Academy of Science and Technology, Hohhot, China
Qifan Ye
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, China Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, China
Liangliang Gao
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, China Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, China
Shuai Xiang
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, China Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, China
Xiaoyan Chen
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, China Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, China Engineering Research Center of Large Energy Storage Technology, Ministry of Education, Hohhot, China
Yi Wu
Affiliation:
Inner Mongolia Academy of Science and Technology, Hohhot, China
*
Corresponding authors: Chaoyi Dong; Email: dongchaoyi@imut.edu.cn; Hongfei Guo; Email: ghf-2005@163.com

Abstract

To address the low positioning accuracy and weak robustness of existing visual simultaneous localization and mapping (VSLAM) systems in dynamic environments, this article proposes a semantic VSLAM (Sem-VSLAM) approach based on deep learning. The Sem-VSLAM algorithm adds a parallel semantic segmentation thread to the visual odometry of the open-source ORB-SLAM2. First, while ORB features are extracted from an RGB-D image, the frame is semantically segmented, and the segmentation results are checked and repaired. Then, the feature points belonging to dynamic objects are eliminated using semantic information and motion consistency detection, and the pose is estimated from the feature points remaining after this elimination. Finally, a 3D point cloud map is constructed using the tracking and semantic information. Experiments on the Technical University of Munich (TUM) public dataset demonstrate the usefulness of the Sem-VSLAM algorithm. The results show that, in a highly dynamic environment, Sem-VSLAM reduces the absolute trajectory error and relative pose error of pose estimation by about 95% compared with ORB-SLAM2 and by about 14% compared with VO-YOLOv5s, while the average tracking time per frame is 61 ms. These results verify that Sem-VSLAM effectively improves robustness and positioning accuracy in highly dynamic environments while retaining satisfactory real-time performance, and therefore achieves a better mapping effect in such environments.
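The dynamic-feature elimination step described above combines a semantic mask (removing points on movable objects such as people) with a geometric motion consistency check based on the epipolar constraint. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the pixel-distance threshold, and the binary dynamic-object mask are all illustrative assumptions.

```python
import numpy as np

def epipolar_distance(p1, p2, F):
    """Pixel distance from p2 to the epipolar line of p1 under fundamental matrix F."""
    p1_h = np.array([p1[0], p1[1], 1.0])
    p2_h = np.array([p2[0], p2[1], 1.0])
    line = F @ p1_h  # epipolar line in image 2: a*x + b*y + c = 0
    return abs(line @ p2_h) / np.hypot(line[0], line[1])

def filter_dynamic_points(matches, F, dyn_mask, thresh=1.0):
    """Keep matches that lie outside semantically dynamic regions and
    satisfy the epipolar (motion consistency) constraint.

    matches  : list of ((x1, y1), (x2, y2)) matched pixel coordinates
    F        : 3x3 fundamental matrix between the two frames
    dyn_mask : boolean H x W mask, True on dynamic-object pixels (e.g. 'person')
    """
    kept = []
    for p1, p2 in matches:
        x, y = int(round(p2[0])), int(round(p2[1]))
        if dyn_mask[y, x]:                          # inside a dynamic-object mask
            continue
        if epipolar_distance(p1, p2, F) > thresh:   # violates epipolar constraint
            continue
        kept.append((p1, p2))
    return kept
```

Pose estimation would then proceed on `kept` only; in practice the fundamental matrix itself is estimated robustly (e.g. with RANSAC) before this check is applied.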

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. ORB-SLAM2 system framework.


Figure 2. The system framework of Sem-VSLAM.


Figure 3. An illustration of multi-view epipolar geometric constraints.


Figure 4. The flowchart of the motion consistency detection algorithm.


Figure 5. Judgment criteria of segmentation result states.


Figure 6. TUM datasets in different environments.


Table I. Training and experimental computer parameters.


Table II. The detection performance of the backbone-changed mask R-CNN networks and YOLOv5s network.


Figure 7. Dynamic feature point elimination process.


Figure 8. Dynamic feature point elimination process.


Figure 9. Comparison of dynamic feature point elimination effects (TUM dataset).


Figure 10. Comparison of dynamic feature point elimination effects (laboratory environment dataset).


Table III. RMSE and SD comparison of ATE.


Table IV. RMSE and SD comparison of RPE.


Table V. RMSE and SD comparison of relative rotation errors.


Figure 11. Comparison of RMSE and SD data in ATE (VO-YOLOv5s).


Figure 12. Comparison of RMSE and SD data in RPE (VO-YOLOv5s).


Figure 13. Comparison of RMSE and SD data in relative rotation error (VO-YOLOv5s).


Figure 14. ATE and RPE of the three algorithms on the fr3_walking_xyz dataset.


Figure 15. ATE and RPE of the three algorithms on the fr3_walking_rpy dataset.


Figure 16. ATE and RPE of the three algorithms on the fr3_walking_halfsphere dataset.


Figure 17. ATE and RPE of the three algorithms on the fr3_walking_static dataset.


Table VI. RMSE comparison of VSLAM algorithm ATE in dynamic environment.


Figure 18. Comparison of RMSE of VSLAM algorithm ATE in dynamic environment.


Table VII. Real-time performance comparison of VSLAM algorithms in dynamic environments (ms).


Figure 19. 3D point cloud map under official public data.


Figure 20. 3D point cloud map constructed in the laboratory environment.