
Action learning and grounding in simulated human–robot interactions

  • Oliver Roesler and Ann Nowé


To interact with humans in a natural way, robots need to be able to learn new tasks autonomously. The most natural way for a human to instruct another agent, whether human or robot, to perform a task is through natural language. Natural human–robot interaction therefore also requires robots to understand natural language, i.e., to extract the meaning of words and phrases. To do this, words and phrases need to be linked to their corresponding percepts through grounding. Afterward, agents can learn the optimal micro-action patterns to reach the goal states of the desired tasks. Most previous studies have investigated either action learning or word grounding, but not both, and they often used only a small set of tasks as well as very short and unnaturally simplified utterances. In this paper, we introduce a framework that uses reinforcement learning to learn actions for several tasks and cross-situational learning to ground actions, object shapes and colors, and prepositions. The proposed framework is evaluated through a simulated interaction experiment between a human tutor and a robot. The results show that the framework can be used for both action learning and grounding.
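The core idea of cross-situational learning is that a word's referent can be resolved statistically: across many situations, the percept that co-occurs with a word most consistently is taken as its meaning. The following is a minimal illustrative sketch of that idea, not the paper's actual model; the example utterances, percept symbols, and the simple count-and-argmax rule are hypothetical simplifications.

```python
from collections import defaultdict

def cross_situational_learning(situations):
    """Toy cross-situational learner.

    Each situation pairs an utterance (a list of words) with the set of
    percepts present in the scene. Word-percept co-occurrence counts are
    accumulated across situations, and each word is finally mapped to the
    percept it co-occurred with most often.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, percepts in situations:
        for word in words:
            for percept in percepts:
                counts[word][percept] += 1
    # For each word, pick the percept with the highest co-occurrence count.
    return {word: max(pc, key=pc.get) for word, pc in counts.items()}

# Hypothetical tutor utterances paired with scene percepts. No single
# situation disambiguates a word, but the statistics across situations do.
situations = [
    (["push", "red", "cube"], {"PUSH", "RED", "CUBE"}),
    (["lift", "red", "ball"], {"LIFT", "RED", "BALL"}),
    (["push", "blue", "ball"], {"PUSH", "BLUE", "BALL"}),
    (["lift", "blue", "cube"], {"LIFT", "BLUE", "CUBE"}),
]
lexicon = cross_situational_learning(situations)
```

After these four situations, every word has a unique most frequent co-occurring percept (e.g. "push" appears with PUSH twice but with every other percept at most once), so the learner grounds each word correctly despite the ambiguity within any single situation.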



