
Learning “where to look” for visual exploration of autonomous mobile robot in indoor unknown environments

Published online by Cambridge University Press: 04 September 2025

Sheng Jin
Affiliation:
Institute of Robotics and Autonomous Systems, Tianjin Key Laboratory of Intelligent Unmanned Swarm Technology and System, School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Advanced Perception and Intelligent Equipment Engineering Research Center of Jiangsu Province, School of Smart Manufacturing and Intelligent Transportation, Suzhou City University, Suzhou, China
Shuai-Dong Yang
Affiliation:
Advanced Perception and Intelligent Equipment Engineering Research Center of Jiangsu Province, School of Smart Manufacturing and Intelligent Transportation, Suzhou City University, Suzhou, China
Hao-Yu Wang
Affiliation:
Institute of Robotics and Autonomous Systems, Tianjin Key Laboratory of Intelligent Unmanned Swarm Technology and System, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Chen-Yang Zhang
Affiliation:
School of Civil Engineering and Architecture, Changzhou Institute of Technology, Changzhou, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Qing-Hao Meng*
Affiliation:
Institute of Robotics and Autonomous Systems, Tianjin Key Laboratory of Intelligent Unmanned Swarm Technology and System, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Corresponding author: Qing-Hao Meng; Email: qh_meng@tju.edu.cn

Abstract

Visual exploration is a task in which a camera-equipped robot seeks to visit all navigable areas of an environment in the shortest possible time. Most existing visual exploration methods rely on a camera rigidly fixed to the robot's body, so the camera view is controlled only through the robot's own movements. Coupling the camera orientation to the robot's body, however, gives up an extra degree of freedom that could be used to gather more visual information. In this work, we adjust the camera orientation during robot motion with a novel camera view planning (CVP) policy to improve exploration efficiency. Specifically, we reformulate the CVP problem as a reinforcement learning problem, which raises two new challenges: 1) how to learn an effective CVP policy in complex indoor environments and 2) how to synchronize it with the robot's motion. To address these issues, we design a reward function that accounts for the explored area, the observed semantic objects, and the motion conflicts between the camera and the robot's body. Moreover, to better coordinate the camera and body policies, the CVP policy makes its decisions from the body actions together with egocentric 2D spatial maps that encode exploration, occupancy, and trajectory information. Experimental results show that with the proposed CVP policy, the explored area increases on average by 21.72% in the small-scale indoor scene with few structured obstacles and by 25.6% in the large-scale indoor scene with cluttered obstacles.
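To make the reward structure described in the abstract concrete, the following is a minimal Python sketch of a per-step reward that combines newly explored area, newly observed semantic objects, and a penalty for motion conflict between the camera and the robot's body. The weights, the StepInfo fields, and the conflict test are illustrative assumptions, not the paper's exact formulation.

# Sketch of a CVP-style reward: coverage gain + semantic gain - conflict penalty.
# All names and weights below are hypothetical placeholders for illustration.
from dataclasses import dataclass


@dataclass
class StepInfo:
    new_area_m2: float          # area added to the exploration map this step
    new_semantic_objects: int   # semantic objects observed for the first time
    camera_turn_rad: float      # commanded camera rotation this step
    body_turn_rad: float        # commanded body rotation this step


def cvp_reward(step: StepInfo,
               w_area: float = 1.0,
               w_sem: float = 0.5,
               w_conflict: float = 0.2) -> float:
    """Hypothetical per-step reward for a camera view planning (CVP) policy."""
    # Reward coverage gain and newly observed semantic objects.
    reward = w_area * step.new_area_m2 + w_sem * step.new_semantic_objects
    # Penalize opposing camera/body rotations as a simple proxy for motion conflict.
    if step.camera_turn_rad * step.body_turn_rad < 0:
        reward -= w_conflict * abs(step.camera_turn_rad - step.body_turn_rad)
    return reward


if __name__ == "__main__":
    print(cvp_reward(StepInfo(new_area_m2=1.8, new_semantic_objects=2,
                              camera_turn_rad=0.3, body_turn_rad=-0.2)))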

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

