
Autonomous Vehicular Landings on the Deck of an Unmanned Surface Vehicle using Deep Reinforcement Learning

Published online by Cambridge University Press:  08 April 2019

Riccardo Polvara*
Affiliation:
Lincoln Centre for Autonomous Systems Research, School of Computer Science, College of Science, University of Lincoln, Brayford Pool, Lincoln LN6 7TS, UK
Sanjay Sharma
Affiliation:
Autonomous Marine Systems Research Group, School of Engineering, Faculty of Science and Engineering, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK. E-mail: sanjay.sharma@plymouth.ac.uk
Jian Wan
Affiliation:
Autonomous Marine Systems Research Group, School of Engineering, Faculty of Science and Engineering, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK. E-mail: jian.wan@plymouth.ac.uk
Andrew Manning
Affiliation:
Autonomous Marine Systems Research Group, School of Engineering, Faculty of Science and Engineering, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK. E-mail: a.manning@plymouth.ac.uk
Robert Sutton
Affiliation:
Autonomous Marine Systems Research Group, School of Engineering, Faculty of Science and Engineering, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK. E-mail: r.sutton@plymouth.ac.uk
*Corresponding author. E-mail: rpolvara@lincoln.ac.uk

Summary

Autonomous landing on the deck of a boat or an unmanned surface vehicle (USV) is the minimum requirement for increasing the autonomy of water monitoring missions. This paper introduces an end-to-end control technique based on deep reinforcement learning for landing an unmanned aerial vehicle on a visual marker located on the deck of a USV. The proposed solution consists of a hierarchy of Deep Q-Networks (DQNs) used as high-level navigation policies that address the two phases of the flight: the marker detection and the descending manoeuvre. A few technical improvements are proposed to stabilize the learning process, such as the combination of vanilla and double DQNs, and a partitioned buffer replay. Simulated studies demonstrated the robustness of the proposed algorithm against different perturbations acting on the marine vessel. The performance obtained is comparable with that of a state-of-the-art method based on template matching.
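To make the two stabilisation techniques named in the summary concrete, the following is a minimal Python sketch, not the authors' implementation: a partitioned buffer replay that stores positive, negative, and neutral transitions separately, and the double-DQN target in which the online network selects the next action while the target network evaluates it. The class name, the reward-sign partitioning rule, and the sampling fractions are illustrative assumptions rather than details taken from the paper.

```python
import random
from collections import deque

import numpy as np


class PartitionedReplayBuffer:
    """Replay memory split by transition outcome, so rare terminal
    experiences are not crowded out by frequent neutral ones."""

    def __init__(self, capacity_per_partition=50_000):
        self.partitions = {
            "positive": deque(maxlen=capacity_per_partition),
            "negative": deque(maxlen=capacity_per_partition),
            "neutral": deque(maxlen=capacity_per_partition),
        }

    def add(self, state, action, reward, next_state, done):
        # Illustrative rule: partition by the sign of the reward.
        key = "positive" if reward > 0 else "negative" if reward < 0 else "neutral"
        self.partitions[key].append((state, action, reward, next_state, done))

    def sample(self, batch_size, fractions=(0.25, 0.25, 0.5)):
        """Draw a fixed fraction of the batch from each partition."""
        batch = []
        for key, frac in zip(("positive", "negative", "neutral"), fractions):
            pool = self.partitions[key]
            n = min(int(batch_size * frac), len(pool))
            if n > 0:
                batch.extend(random.sample(pool, n))
        random.shuffle(batch)
        return batch


def double_dqn_targets(batch, online_q, target_q, gamma=0.99):
    """Double-DQN target: the online network selects the greedy next
    action, the target network evaluates it, reducing over-estimation
    of Q-values. online_q/target_q map a state batch to Q-value arrays."""
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    next_actions = np.argmax(online_q(next_states), axis=1)                    # selection
    next_values = target_q(next_states)[np.arange(len(batch)), next_actions]   # evaluation
    return rewards + gamma * next_values * (1.0 - dones)
```

Sampling a fixed fraction from each partition keeps the rare terminal transitions (successful and failed landings) represented in every training batch despite the far more frequent neutral flight transitions, which is the usual motivation for partitioning the replay memory.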

Type: Articles
Copyright: © Cambridge University Press 2019
