
Pre-training with non-expert human demonstration for deep reinforcement learning

Gabriel V. de la Cruz, Yunshu Du and Matthew E. Taylor


Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using deep neural networks as function approximators that learn directly from raw input images. However, learning directly from raw images is data inefficient: the agent must learn a feature representation of complex states in addition to learning a policy. As a result, deep RL typically learns slowly and often requires a prohibitively large amount of training time and data to reach reasonable performance, which makes it impractical in real-world settings where data are expensive. In this work, we improve data efficiency in deep RL by addressing one of these two learning goals: feature learning. We leverage supervised learning to pre-train on a small set of non-expert human demonstrations and empirically evaluate our approach using asynchronous advantage actor-critic (A3C) algorithms in the Atari domain. Our results show significant improvements in learning speed, even when the provided demonstrations are noisy and of low quality.
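The abstract splits the agent's job into feature learning and policy learning, and addresses the former by supervised pre-training on human demonstrations. Below is a minimal, pure-Python sketch of that two-stage idea under stated assumptions: a tiny fully connected network on made-up 2-D states stands in for a convolutional network on Atari frames; the pre-training objective is assumed to be softmax cross-entropy on the demonstrator's chosen action; and only the hidden (feature) layer is assumed to be transferred, with freshly initialized policy and value heads. Every function name here (`pretrain`, `init_actor_critic`, and so on) is hypothetical, the A3C training loop itself is omitted, and this is not the authors' implementation.

```python
import math
import random

# Hypothetical sketch, not the authors' code: a one-hidden-layer network is
# pre-trained to classify the demonstrator's action, and its hidden (feature)
# layer is then copied into a fresh actor-critic network.


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights is a list of n_out rows of length n_in."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def linear(w, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, z) for z in v]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]


def pretrain(demos, n_in, n_hidden, n_actions, epochs=200, lr=0.1, seed=0):
    """Supervised pre-training with softmax cross-entropy and plain SGD.

    demos: list of (state_features, action) pairs; labels may be noisy.
    Returns (feature_layer, classification_head).
    """
    rng = random.Random(seed)
    w1, b1 = init_layer(n_in, n_hidden, rng)
    w2, b2 = init_layer(n_hidden, n_actions, rng)
    data = list(demos)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, a in data:
            h_pre = linear(w1, b1, x)
            h = relu(h_pre)
            p = softmax(linear(w2, b2, h))
            # Cross-entropy gradient w.r.t. the logits is p - one_hot(a).
            d_logits = [pk - (1.0 if k == a else 0.0) for k, pk in enumerate(p)]
            # Backpropagate to the hidden layer before w2 is modified.
            d_h = [sum(d_logits[k] * w2[k][j] for k in range(n_actions))
                   for j in range(n_hidden)]
            d_pre = [g if hp > 0 else 0.0 for g, hp in zip(d_h, h_pre)]
            for k in range(n_actions):
                for j in range(n_hidden):
                    w2[k][j] -= lr * d_logits[k] * h[j]
                b2[k] -= lr * d_logits[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * d_pre[j] * x[i]
                b1[j] -= lr * d_pre[j]
    return (w1, b1), (w2, b2)


def predict(feature_layer, head, x):
    """Most probable action under the pre-trained classifier."""
    w1, b1 = feature_layer
    w2, b2 = head
    p = softmax(linear(w2, b2, relu(linear(w1, b1, x))))
    return max(range(len(p)), key=p.__getitem__)


def init_actor_critic(feature_layer, n_actions, seed=1):
    """Copy the pre-trained feature layer; policy and value heads start fresh."""
    rng = random.Random(seed)
    w1, b1 = feature_layer
    return {
        "features": ([row[:] for row in w1], b1[:]),
        "policy": init_layer(len(w1), n_actions, rng),
        "value": init_layer(len(w1), 1, rng),
    }


# Toy usage: 2-D "states" and two actions. The demonstrator picks action 0
# when s[0] > s[1], and every 10th label is flipped to mimic a noisy,
# non-expert human.
rng = random.Random(42)
states = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
clean = [0 if s[0] > s[1] else 1 for s in states]
noisy = [1 - a if i % 10 == 0 else a for i, a in enumerate(clean)]
features, head = pretrain(list(zip(states, noisy)), n_in=2, n_hidden=8, n_actions=2)

held_out = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
accuracy = sum(predict(features, head, s) == (0 if s[0] > s[1] else 1)
               for s in held_out) / len(held_out)
ac_net = init_actor_critic(features, n_actions=2)
print(f"held-out accuracy of the pre-trained classifier: {accuracy:.2f}")
```

Treating the demonstrator's action as a class label is what lets imperfect demonstrations still be useful in this toy setting: a minority of wrong labels shifts the learned boundary only slightly, and the copied hidden layer is merely a starting point that subsequent RL training would keep updating. How many layers to transfer, and whether to reuse the pre-trained output layer, are design choices this sketch does not settle.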



