A Q-learning approach based on human reasoning for navigation in a dynamic environment

  • Rupeng Yuan, Fuhai Zhang, Yu Wang, Yili Fu and Shuguo Wang

A Q-learning approach is often used for navigation in static environments, where the state space is easy to define. In this paper, a new Q-learning approach is proposed for navigation in dynamic environments that imitates human reasoning. As a model-free method, Q-learning does not require a model of the environment in advance. In the proposed approach, the state space and the reward function are defined according to human perception and human evaluation, respectively: approximate regions, rather than accurate measurements, are used to define states. Moreover, to respect the limits of the robot's dynamics, the actions available in each state are computed by introducing a dynamic window that takes those dynamics into account. The conducted tests show that the obstacle avoidance rate of the proposed approach reaches 90.5% after training, and that the robot always operates within its dynamic limits.
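The combination described above can be sketched in a few lines: a tabular Q-learning update whose action set at each state is restricted to velocities reachable under acceleration limits, in the spirit of the dynamic window approach. This is a minimal illustration, not the paper's implementation; the parameter values, the state encoding, and all function names are assumptions for the sketch.

```python
import random
from collections import defaultdict

# Q-table over discrete states; the paper defines states from approximate
# regions around the robot rather than exact sensor measurements.
Q = defaultdict(float)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration (illustrative values)

def dynamic_window(v, v_max=1.0, a_max=0.5, dt=0.1, n=5):
    """Velocities reachable within one control step given acceleration
    limits — a simplified dynamic window (values are placeholders)."""
    lo = max(0.0, v - a_max * dt)
    hi = min(v_max, v + a_max * dt)
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]

def choose_action(state, actions):
    """Epsilon-greedy selection restricted to dynamically feasible actions."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """Standard Q-learning backup, maximizing only over feasible actions."""
    best_next = max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Because the maximization in the backup runs only over the dynamic-window action set, the learned policy never selects a velocity the robot cannot reach in one step, which is how the approach keeps the robot below its dynamics limitation.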

  • ISSN: 0263-5747
  • EISSN: 1469-8668
  • URL: /core/journals/robotica