A robot should be able to estimate an accurate and dense 3D model of its environment (a map), along with its pose relative to it, all of it in real time, in order to be able to navigate autonomously without collisions.
As the robot moves from its starting position and the estimated map grows, the computational and memory footprint of a dense 3D map increases and might exceed the robot capabilities in a short time. However, a global map is still needed to maintain its consistency and plan for distant goals, possibly out of the robot field of view.
In this work, we address such problem by proposing a real-time stereo mapping pipeline, feasible for standard CPUs, which is locally dense and globally sparse and accurate. Our algorithm is based on a graph relating poses and salient visual points, in order to maintain a long-term accuracy with a small cost. Within such framework, we propose an efficient dense fusion of several stereo depths in the locality of the current robot pose.
We evaluate the performance and the accuracy of our algorithm in the public datasets of Tsukuba and KITTI, and demonstrate that it outperforms single-view stereo depth. We release the code as open-source, in order to facilitate the system use and comparisons.