
Learning “where to look” for visual exploration of autonomous mobile robot in indoor unknown environments

Published online by Cambridge University Press: 04 September 2025

Sheng Jin
Affiliation:
Institute of Robotics and Autonomous Systems, Tianjin Key Laboratory of Intelligent Unmanned Swarm Technology and System, School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Advanced Perception and Intelligent Equipment Engineering Research Center of Jiangsu Province, School of Smart Manufacturing and Intelligent Transportation, Suzhou City University, Suzhou, China
Shuai-Dong Yang
Affiliation:
Advanced Perception and Intelligent Equipment Engineering Research Center of Jiangsu Province, School of Smart Manufacturing and Intelligent Transportation, Suzhou City University, Suzhou, China
Hao-Yu Wang
Affiliation:
Institute of Robotics and Autonomous Systems, Tianjin Key Laboratory of Intelligent Unmanned Swarm Technology and System, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Chen-Yang Zhang
Affiliation:
School of Civil Engineering and Architecture, Changzhou Institute of Technology, Changzhou, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Qing-Hao Meng*
Affiliation:
Institute of Robotics and Autonomous Systems, Tianjin Key Laboratory of Intelligent Unmanned Swarm Technology and System, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Corresponding author: Qing-Hao Meng; Email: qh_meng@tju.edu.cn

Abstract

Visual exploration is a task in which a camera-equipped robot seeks to visit all navigable areas of an environment in the shortest possible time. Most existing visual exploration methods rely on a camera rigidly fixed to the robot's body, so the camera view is controlled only through the robot's own movements. Coupling the camera orientation to the robot's body, however, gives up an extra degree of freedom that could be used to gather more visual information. In this work, we adjust the camera orientation during robot motion with a novel camera view planning (CVP) policy to improve exploration efficiency. Specifically, we reformulate the CVP problem as a reinforcement learning problem, which raises two new challenges: 1) how to learn an effective CVP policy in complex indoor environments and 2) how to synchronize it with the robot's motion. To address these issues, we design a reward function that accounts for the explored area, the observed semantic objects, and the motion conflicts between the camera and the robot's body. Moreover, to better coordinate the camera and body policies, the CVP policy makes its decisions from the body actions together with egocentric 2D spatial maps that encode exploration, occupancy, and trajectory information. Experimental results show that with the proposed CVP policy, the explored area increases on average by 21.72% in the small-scale indoor scene with few structured obstacles and by 25.6% in the large-scale indoor scene with cluttered obstacles.
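To make the reward structure described in the abstract concrete, the following is a minimal Python sketch of a per-step reward that combines newly explored area, newly observed semantic objects, and a penalty for motion conflict between the camera and the robot's body. The weights, the StepInfo fields, and the conflict test are illustrative assumptions, not the paper's exact formulation.

# Sketch of a CVP-style reward: coverage gain + semantic gain - conflict penalty.
# All names and weights below are hypothetical placeholders for illustration.
from dataclasses import dataclass


@dataclass
class StepInfo:
    new_area_m2: float          # area added to the exploration map this step
    new_semantic_objects: int   # semantic objects observed for the first time
    camera_turn_rad: float      # commanded camera rotation this step
    body_turn_rad: float        # commanded body rotation this step


def cvp_reward(step: StepInfo,
               w_area: float = 1.0,
               w_sem: float = 0.5,
               w_conflict: float = 0.2) -> float:
    """Hypothetical per-step reward for a camera view planning (CVP) policy."""
    # Reward coverage gain and newly observed semantic objects.
    reward = w_area * step.new_area_m2 + w_sem * step.new_semantic_objects
    # Penalize opposing camera/body rotations as a simple proxy for motion conflict.
    if step.camera_turn_rad * step.body_turn_rad < 0:
        reward -= w_conflict * abs(step.camera_turn_rad - step.body_turn_rad)
    return reward


if __name__ == "__main__":
    print(cvp_reward(StepInfo(new_area_m2=1.8, new_semantic_objects=2,
                              camera_turn_rad=0.3, body_turn_rad=-0.2)))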

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

