
Autonomous Vehicular Landings on the Deck of an Unmanned Surface Vehicle using Deep Reinforcement Learning

Published online by Cambridge University Press:  08 April 2019

Riccardo Polvara*
Affiliation:
Lincoln Centre for Autonomous Systems Research, School of Computer Science, College of Science, University of Lincoln, Brayford Pool, Lincoln LN6 7TS, UK
Sanjay Sharma
Affiliation:
Autonomous Marine Systems Research Group, School of Engineering, Faculty of Science and Engineering, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK. E-mail: sanjay.sharma@plymouth.ac.uk
Jian Wan
Affiliation:
Autonomous Marine Systems Research Group, School of Engineering, Faculty of Science and Engineering, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK. E-mail: jian.wan@plymouth.ac.uk
Andrew Manning
Affiliation:
Autonomous Marine Systems Research Group, School of Engineering, Faculty of Science and Engineering, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK. E-mail: a.manning@plymouth.ac.uk
Robert Sutton
Affiliation:
Autonomous Marine Systems Research Group, School of Engineering, Faculty of Science and Engineering, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK. E-mail: r.sutton@plymouth.ac.uk
*Corresponding author. E-mail: rpolvara@lincoln.ac.uk

Summary

Autonomous landing on the deck of a boat or an unmanned surface vehicle (USV) is the minimum requirement for increasing the autonomy of water monitoring missions. This paper introduces an end-to-end control technique based on deep reinforcement learning for landing an unmanned aerial vehicle on a visual marker located on the deck of a USV. The proposed solution consists of a hierarchy of Deep Q-Networks (DQNs) used as high-level navigation policies that address the two phases of the flight: the marker detection and the descending manoeuvre. A few technical improvements are proposed to stabilize the learning process, such as the combination of vanilla and double DQNs, and a partitioned buffer replay. Simulated studies demonstrated the robustness of the proposed algorithm against different perturbations acting on the marine vessel. The performance obtained is comparable with that of a state-of-the-art method based on template matching.
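To make the two stabilisation techniques named in the summary concrete, the following is a minimal Python sketch, not the authors' implementation: a partitioned buffer replay that stores positive, negative, and neutral transitions separately, and the double-DQN target in which the online network selects the next action while the target network evaluates it. The class name, the reward-sign partitioning rule, and the sampling fractions are illustrative assumptions rather than details taken from the paper.

```python
import random
from collections import deque

import numpy as np


class PartitionedReplayBuffer:
    """Replay memory split by transition outcome, so rare terminal
    experiences are not crowded out by frequent neutral ones."""

    def __init__(self, capacity_per_partition=50_000):
        self.partitions = {
            "positive": deque(maxlen=capacity_per_partition),
            "negative": deque(maxlen=capacity_per_partition),
            "neutral": deque(maxlen=capacity_per_partition),
        }

    def add(self, state, action, reward, next_state, done):
        # Illustrative rule: partition by the sign of the reward.
        key = "positive" if reward > 0 else "negative" if reward < 0 else "neutral"
        self.partitions[key].append((state, action, reward, next_state, done))

    def sample(self, batch_size, fractions=(0.25, 0.25, 0.5)):
        """Draw a fixed fraction of the batch from each partition."""
        batch = []
        for key, frac in zip(("positive", "negative", "neutral"), fractions):
            pool = self.partitions[key]
            n = min(int(batch_size * frac), len(pool))
            if n > 0:
                batch.extend(random.sample(pool, n))
        random.shuffle(batch)
        return batch


def double_dqn_targets(batch, online_q, target_q, gamma=0.99):
    """Double-DQN target: the online network selects the greedy next
    action, the target network evaluates it, reducing over-estimation
    of Q-values. online_q/target_q map a state batch to Q-value arrays."""
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    next_actions = np.argmax(online_q(next_states), axis=1)                    # selection
    next_values = target_q(next_states)[np.arange(len(batch)), next_actions]   # evaluation
    return rewards + gamma * next_values * (1.0 - dones)
```

Sampling a fixed fraction from each partition keeps the rare terminal transitions (successful and failed landings) represented in every training batch despite the far more frequent neutral flight transitions, which is the usual motivation for partitioning the replay memory.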

Type: Articles
Copyright: © Cambridge University Press 2019
