
Manipulate as human: learning task-oriented manipulation skills by adversarial motion priors

Published online by Cambridge University Press:  11 June 2025

Ziqi Ma
Affiliation:
ParisTech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, P.R. China
Changda Tian
Affiliation:
Department of Automation, Shanghai Jiao Tong University, Shanghai, P.R. China
Yue Gao*
Affiliation:
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, P.R. China. Shanghai Innovation Institute, Shanghai, P.R. China
*
Corresponding author: Yue Gao; Email: yuegao@sjtu.edu.cn

Abstract

In recent years, there has been growing interest in developing robots and autonomous systems that can interact with humans in a more natural and intuitive way. One of the key challenges in achieving this goal is to enable these systems to manipulate objects and tools in a manner similar to that of humans. In this paper, we propose a novel approach for learning human-style manipulation skills by using adversarial motion priors, which we name HMAMP. The approach leverages adversarial networks to model the complex dynamics of tool and object manipulation together with the aim of the manipulation task. The discriminator is trained on a combination of real-world data and simulation data generated by the agent, and it drives the training of a policy that produces realistic motion trajectories matching the statistical properties of human motion. We evaluated HMAMP on one challenging manipulation task, hammering, and the results indicate that HMAMP is capable of learning human-style manipulation skills that outperform current baseline methods. Additionally, we demonstrate the potential of HMAMP for real-world applications by performing the hammering task on a real robot arm. Overall, HMAMP represents a significant step towards developing robots and autonomous systems that interact with humans in a more natural and intuitive way by learning to manipulate tools and objects in a manner similar to how humans do.

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Manipulating tools with robot arms is a long-standing area of study in the field of robot intelligence. To effectively manipulate an object with a tool and achieve a specific goal, robots must develop a comprehensive understanding of the environment based on sensor data and then perform intricate physical interactions with targets. Several prior efforts have focused on data-driven methods that aim to learn reliable tool representations for manipulation tasks. In particular, the application of end-to-end deep neural networks has gained popularity for acquiring such representations [Reference Ainetter and Fraundorfer1–Reference Kalashnikov, Irpan, Pastor, Ibarz, Herzog, Jang, Quillen, Holly, Kalakrishnan, Vanhoucke and Levine3]. These methods generate latent tool-object representations through end-to-end neural networks and extensive datasets, eliminating the need to manually craft staged pipelines and define object features. However, these black-box methods often lack interpretability and compactness, and they may not fully account for subsequent manipulation processes, as they do not consider actions beyond the successful tool grasp.

Some works use keypoints to define the tool and the environment in order to formulate the manipulation task mathematically. Qin et al. [Reference Qin, Fang, Zhu, Fei-Fei and Savarese4] and Manuelli et al. [Reference Manuelli, Gao, Florence and Tedrake5] express the tool and the task in keypoints and devise optimization algorithms to generate manipulation actions. By acquiring tool keypoints through supervised or reinforcement learning and defining task keypoints in the environment, these approaches formulate Quadratic Programming problems to generate robot movement trajectories within the task context. Another recent study by Turpin et al. [Reference Turpin, Wang, Tsogkas, Dickinson and Garg6] also adopts keypoints to represent tool objects; however, it employs reinforcement learning to predict tool affordances using an intricately designed reward function.

Figure 1. Difference in hammering between humans and robots. When humans hammer a nail, they swing the hammer in the direction opposite to the strike in order to store energy, while robots focus only on achieving the task and ignore this important action.

Nevertheless, previous studies focus on achieving task completion in complex manipulation scenarios and often overlook whether the manipulation trajectory of the robot is similar to that of a human being. For instance, when faced with intricate tasks such as hammering a nail, shown in Figure 1, these methods typically produce a straightforward policy: after grasping a hammer, they position it directly above the nail and strike down until contact occurs. However, human hammering involves a distinct action: the buildup of kinetic energy by swinging the hammer in the opposite direction before striking above the nail's position. This natural human-style approach stores energy in the hammer during the swing and releases it upon impact with the nail. Such nuanced behavior is challenging to learn using conventional optimization methods because encoding the swing action as constraints in an objective function proves difficult. The works in refs. [Reference Edmonds, Gao, Liu, Xie, Qi, Rothrock, Zhu, Wu, Lu and Zhu7–Reference Zhang, Jiao, Wang, Zhu, Zhu and Liu9] use various sensors to capture the effects that humans exert on the tool and on the target objects in order to generalize tool use from humans to robots. These methods offer a preliminary way of learning human movement through its physical effects; however, their inputs depend on tactile data, which are expensive to acquire in everyday settings and limit the spread of the method.

Recent advances have seen a surge in deep reinforcement learning algorithms for manipulation tasks [Reference Johns10–Reference Zorina, Carpentier, Sivic and Petrík12], owing in part to the ease of data collection, highlighting their potential for teaching robots tool-based tasks. Imitation learning has also progressed, with supervised training using human-teleoperated demonstrations [Reference Zhang, McCarthy, Jow, Lee, Chen, Goldberg and Abbeel13] or hand-manipulated trajectories [Reference Johns10]. The research in ref. [Reference Zorina, Carpentier, Sivic and Petrík12] introduces a robot tool manipulation strategy based on human manipulation videos. They create a simulation environment aligned with the guidance video and compute robot states with guided policy samples and trajectory optimization. Although effective, this approach involves solving optimization problems for each aligned environment, which consumes considerable time and computing resources.

In order to teach a robot human manipulation skills in a more natural and intuitive way, we combine adversarial motion priors (AMP) with a reinforcement learning problem. AMP [Reference Peng, Ma, Abbeel, Levine and Kanazawa14] is a cutting-edge approach that first appeared in computer graphics; it uses an adversarial network to learn a "style" from a reference motion dataset. The reward function consists of a style reward that encourages the agent to replicate trajectories similar to those in the dataset and a task reward that assesses whether the agent achieves the task while mimicking the motion style. We adapt these style and task rewards to teach a robot arm human-like tool manipulation skills using demonstration video clips. Our approach involves competitive training between a policy network and an adversarial network. The policy network is trained using both task-specific and adversarial rewards to generate a policy that accomplishes the task in a human style. The adversarial network acts as a discriminator, determining the origin of state transitions and providing a reward that effectively motivates the training of the agent. Because AMP guides the robot to learn human-like manipulation skills, we name the method HMAMP. The contributions of our work are:

  • We introduce an implementation of task-oriented reinforcement learning combined with style in the manipulation domain and evaluate its performance on the hammering task.

  • We provide an approach, whose training data are easy to acquire, for learning a robot tool-manipulation policy in a human style.

  • We construct a tool-manipulation environment in simulation and verify that HMAMP is also useful in the real world.

2. Related work

2.1. Manipulation of tools

Tool use has been a fundamental issue in cognitive science studies that seek to comprehend the nature of intelligence [Reference Sanz, Call and Boesch15–Reference Van Lawick-Goodall17]. To enable robots to perform complex tasks that require the use of tools, many advanced studies have focused on recognizing affordance-specific features on tool objects [Reference Chen, Liang, Chen, Sun and Zhang18,Reference Xu, Chu, Tang, Liu and Vela19], which describe the potential physical interaction between object and manipulator and associate the relevant regions with planned sequential actions. These approaches have been successful in equipping robots with the capability to understand how objects may serve different purposes. In addition to recognizing the features of tool objects, learning and planning are essential components in the manipulation of tools, as demonstrated in various studies [Reference Qin, Fang, Zhu, Fei-Fei and Savarese4,Reference Turpin, Wang, Tsogkas, Dickinson and Garg6,Reference Murali, Liu, Marino, Chernova and Gupta20]. Researchers have also explored methods of incorporating real-time feedback and environmental factors to improve the accuracy and precision of tool grasping [Reference Al-Shanoon and Lang21,Reference Ribeiro, de Queiroz Mendes and Grassi22]. Some studies aim to identify suitable tools for a given task by learning, with a DNN model, an embedding that relates the grasped tool, the desired action, and the target goal [Reference Saito, Ogata, Funabashi, Mori and Sugano23,Reference Sun and Gao24].

2.2. Learning from human videos

Plenty of recent research has explored utilizing human videos to improve the efficiency of RL in robots. Some works use data from an egocentric view to enable robots to learn human skills [Reference Nair, Rajeswaran, Kumar, Finn and Gupta25,Reference Xiong, Fu, Zhang, Bao, Zhang, Huang, Xu, Garg and Lu26]. However, because of the variety of data sources and slight differences in viewpoint, it is difficult to use the pre-trained representation in a specific manipulation task. To bridge the domain gap, another line of work utilizes in-domain human demonstrations, where the sequence of human poses is recorded by motion capture [Reference Taheri, Ghorbani, Black and Tzionas27] or from a third-person view [Reference Xiong, Li, Chen, Bharadhwaj, Sinha and Garg28]. Data of this kind have a narrower disparity between the human and robot domains, which makes it possible to construct efficient reward functions for training imitation learning algorithms. Instead of extracting and re-targeting the whole human pose from the video, we focus on the motion of a few important joints and the motion of the tool, which allows a flexible transfer from human morphology to robot morphology. By combining the task-oriented approach with an AMP, we enable our system to learn robot arm movements from unstructured motion data.

2.3. Generative adversarial imitation learning

Generative adversarial imitation learning (GAIL) [Reference Ho and Ermon29] is inspired by the idea developed for generative adversarial networks (GAN) [Reference Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville and Bengio30]. It aims to train generators that learn policies matching the trajectory distribution of the dataset, while the discriminators serve as reward functions that judge whether the generated behaviors look like the demonstrations. Using a small number of demonstrations from experts, GAIL learns both the policy and the reward function of the unknown environment. Although these methods have demonstrated success in low-dimensional domains [Reference Ho and Ermon29], their performance in high-dimensional tasks is less impressive. Recently, Peng et al. [Reference Peng, Ma, Abbeel, Levine and Kanazawa14] introduced AMP, which integrates task goals with generative adversarial imitation learning. This allows simulated agents to perform high-level tasks by learning to imitate behaviors from extensive motion datasets. Escontrela et al. [Reference Escontrela, Peng, Yu, Zhang, Iscen, Goldberg and Abbeel31] also apply this adversarial technique with a limited number of reference motion clips to learn locomotion skills for legged robots. In the manipulation domain, we apply this technique to guide robots to interpret and perform the behavior shown in demonstrations, and we show that the agents have more flexibility to perform more natural and feasible behaviors.

3. Learning tool manipulation policy

We model the problem of learning human-like tool-manipulation skills as a Markov Decision Process. The goal of reinforcement learning is to find the parameters $\theta$ that optimize the policy $\pi _{\theta }$ such that the expected discounted return $J(\theta )=\mathbb{E}_{\pi _{\theta }}\left [\sum _{t=0}^{T-1} \gamma ^{t} r_{t}\right ]$ is maximized. We then propose a mathematical abstraction of the tool manipulation task and introduce the design of the reward function.
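
As a minimal illustration of this objective, the discounted return of a single rollout can be estimated as in the following sketch (the reward terms $r_t$ themselves are defined in Section 3.2):

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Compute sum_{t=0}^{T-1} gamma^t * r_t for one episode."""
    ret = 0.0
    for t, r in enumerate(rewards):
        ret += (gamma ** t) * r
    return ret

# J(theta) is approximated by averaging this quantity over many episodes
# sampled from the current policy pi_theta.
```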

3.1. Tool keypoint definition

For each manipulation task with a tool, we assume that the tool object interacts with an environment containing one or more target objects. Inspired by ref. [Reference Qin, Fang, Zhu, Fei-Fei and Savarese4], in the HMAMP framework, we manually define a set of keypoints for each task, consisting of a set of keypoints on the tool $\boldsymbol{K}_{\boldsymbol{o}}$ and a set of keypoints in the environment $\boldsymbol{K}_{\boldsymbol{e}}$ . Specifically, we consider $\boldsymbol{K}_{\boldsymbol{o}} =\left [ \boldsymbol{x_g}, \boldsymbol{x_f}, \boldsymbol{x_m} \right ]$ , where $\boldsymbol{x}_{\boldsymbol{g}}$ characterizes the grasping position on the tool, $\boldsymbol{x}_{\boldsymbol{f}}$ characterizes the functional part that makes contact with the target object, and $\boldsymbol{x}_{\boldsymbol{m}}$ represents an auxiliary point that helps determine the orientation of the tool. Environment keypoints are represented as $\boldsymbol{K}_{\boldsymbol{e}} =\left [ \boldsymbol{x}_{\boldsymbol{c}}\right ]$ , where $\boldsymbol{x}_{\boldsymbol{c}}$ denotes the position where the target interacts with the tool. The keypoints of the tool and environment are mainly used to determine the goal reward to be optimized in the reinforcement learning problem.
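
A minimal sketch of how these keypoints could be represented in code; the class names and the hammering example values are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ToolKeypoints:          # K_o = [x_g, x_f, x_m]
    x_g: np.ndarray           # grasping position on the tool
    x_f: np.ndarray           # functional point that contacts the target
    x_m: np.ndarray           # auxiliary point fixing the tool's orientation

@dataclass
class EnvKeypoints:           # K_e = [x_c]
    x_c: np.ndarray           # position where the target interacts with the tool

# Hammering example (positions in meters, purely illustrative):
hammer = ToolKeypoints(x_g=np.array([0.0, 0.0, 0.0]),
                       x_f=np.array([0.0, 0.0, 0.25]),   # hammer head
                       x_m=np.array([0.05, 0.0, 0.25]))  # side of the head
nail = EnvKeypoints(x_c=np.array([0.4, 0.1, 0.02]))      # nail head on the table
```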

3.2. Rewards design

As mentioned in ref. [Reference Peng, Ma, Abbeel, Levine and Kanazawa14], the total reward $r_t$ combines a goal reward $r^g_t$ , which is task-specific and describes the completion of the task, and a style reward $r^s_t$ , which evaluates whether the behaviors produced by the agent are similar to behaviors drawn from the distribution of the reference motion set. The more similar the behaviors are, the higher the style reward. The proportion between the style reward and the goal reward is adjusted manually before training:

(1) \begin{equation} r_t = \alpha ^g r^g_t + \beta ^s r^s_t \end{equation}

3.2.1. Goal reward about task

The goal reward is task-specific and must be designed accordingly. For tool manipulation tasks that involve contact with the environment, we design the goal reward $r^g_t$ as the sum of two terms at each instant $t$ :

(2) \begin{equation} \begin{aligned} r^g_t &= \omega ^f r^f_t + \omega ^d r^d_t \\ r^f_t &= \begin{cases} \lVert F^{s}_t(\boldsymbol{x}_f,\boldsymbol{x}_c)\rVert /F^d & \lVert F^{s}_t\rVert \leq F^d \\ 1 & \lVert F^{s}_t\rVert \gt F^d \end{cases} \\ r^d_t &= 1- \tanh \left ( \lVert \boldsymbol{x}_f - \boldsymbol{x}_c\rVert _t\right ) \end{aligned} \end{equation}

The first term $r_t^f$ encourages task completion; it is defined via the contact force exerted by the tool on the target object, captured by a force sensor in the simulation. The detected force is encouraged to converge towards the desired force $F^d$ . This reward takes nonzero values only when contact occurs, and once a contact force is detected, the episode terminates. The second term $r_t^d$ is defined between the tool function point $\boldsymbol{x}_f$ and the environment target point $\boldsymbol{x}_c$ to guide the policy towards minimizing the distance between tool and target. The $\tanh$ function bounds the reward to $[0, 1]$ . These terms are weighted with manually specified coefficients $\omega ^f$ and $\omega ^d$ ; note that the magnitude of the second term is much lower than that of the first.
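
A sketch of the goal reward computation under these definitions; the function signature and sensor interface are assumptions, and the default weights follow the values given later in Section 4.2:

```python
import numpy as np

def goal_reward(contact_force: np.ndarray, x_f: np.ndarray, x_c: np.ndarray,
                F_d: float = 100.0, w_f: float = 1e5, w_d: float = 1.0) -> float:
    """Goal reward r^g_t = w_f * r^f_t + w_d * r^d_t from Eq. (2)."""
    # Force term: ratio of sensed contact force to the desired force, capped at 1,
    # and nonzero only when contact actually occurs.
    f_norm = float(np.linalg.norm(contact_force))
    r_f = min(f_norm / F_d, 1.0) if f_norm > 0.0 else 0.0
    # Distance term: drive the tool function point towards the target point.
    r_d = 1.0 - np.tanh(np.linalg.norm(x_f - x_c))
    return w_f * r_f + w_d * r_d
```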

3.2.2. Style reward with motion prior

As proposed in ref. [Reference Peng, Ma, Abbeel, Levine and Kanazawa14], we define a discriminator $D_{\phi }$ as a neural network with parameters $\phi$ . The discriminator is trained to distinguish whether a transition $(s, s')$ is a fake one produced by the agent or a true one sampled from the real motion distribution $d^{\mathcal{M}}$ .

The objective of the discriminator is formulated as:

(3) \begin{equation} \begin{aligned} \underset {\phi }{\operatorname {argmin}}\quad &\mathbb{E}_{d^{\mathcal{M}}(s,s')}\left [\left (D_{\phi }(s,s')-1\right )^2\right ] \\+&\mathbb{E}_{d^\pi (s,s')}\left [\left (D_{\phi }\left (s,s'\right )+1\right )^2\right ] \\ +&\dfrac {w^{gp}}{2} \mathbb{E}_{d^{\mathcal{M}}(s,s')} \left [\lVert \nabla _{(s,s')}D_{\phi }\left (s,s'\right )\rVert ^2\right ] \end{aligned} \end{equation}

The first two terms encourage the discriminator to differentiate between state transitions produced by the policy and those drawn from the reference motion data. They were proposed in LSGAN [Reference Mao, Li, Xie, Lau and Wang32] to address the vanishing-gradient problem caused by the standard GAN objective, which usually uses a sigmoid cross-entropy loss. LSGAN instead optimizes the $\chi ^2$ divergence between the reference distribution and the policy distribution, which may alleviate the mode collapse problem and lead to more stable training [Reference Mao, Li, Xie, Lau, Zhen and Smolley33]. The last term in (3) is a gradient penalty that penalizes nonzero gradients on real samples, which helps avoid oscillations and improves training stability. The coefficient $w^{gp}$ in the formula is adjusted manually.
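
A PyTorch-style sketch of this objective, assuming `disc` maps a batch of concatenated transitions $(s, s')$ to scalar scores; this is an illustration under those assumptions, not the authors' exact implementation:

```python
import torch

def discriminator_loss(disc: torch.nn.Module,
                       real_trans: torch.Tensor,   # (s, s') sampled from reference motions
                       fake_trans: torch.Tensor,   # (s, s') produced by the current policy
                       w_gp: float = 1.0) -> torch.Tensor:
    """LSGAN loss with a gradient penalty on real samples, as in Eq. (3)."""
    # Least-squares targets: +1 for reference transitions, -1 for policy transitions.
    loss_real = ((disc(real_trans) - 1.0) ** 2).mean()
    loss_fake = ((disc(fake_trans) + 1.0) ** 2).mean()

    # Gradient penalty: penalize nonzero gradients of D w.r.t. the real inputs.
    real = real_trans.detach().requires_grad_(True)
    scores = disc(real)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=real,
                                create_graph=True)[0]
    grad_pen = (grads.norm(2, dim=-1) ** 2).mean()

    return loss_real + loss_fake + 0.5 * w_gp * grad_pen
```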

The style reward is then defined by:

(4) \begin{equation} r^s_t = \max \left [0, 1-\gamma ^d (D_{\phi }(s,s')-1)^2 \right ] \end{equation}

With the additional offset and scale, the style reward is bounded in $[0, 1]$ .
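
The style reward of Eq. (4) then follows directly from the discriminator score, as in this small sketch (with $\gamma^d = 0.25$ as listed in Section 4.2):

```python
import torch

def style_reward(disc_score: torch.Tensor, gamma_d: float = 0.25) -> torch.Tensor:
    """r^s_t = max(0, 1 - gamma_d * (D(s, s') - 1)^2), which lies in [0, 1]."""
    return torch.clamp(1.0 - gamma_d * (disc_score - 1.0) ** 2, min=0.0)
```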

Figure 2. Framework of HMAMP. From human manipulation video clips, we extract the keypoints of the human arm and the manipulation tool. We then align keypoints between the robot arm in simulation and the real-world human motion clips. The AMP discriminator judges whether an action sequence is a real human expert motion or one generated by the policy network. The AMP reward and the task reward for the manipulation task are summed to form the total reward for RL training.

The training process of the policy and the discriminator is shown in Figure 2. The agent interacts with the environment and produces a state transition $(s,s')$ ; the observation from the environment is used to calculate the goal reward $r^g_t$ . The discriminator takes state transitions from the simulated environment and from the reference motion clips to calculate the style reward $r^s_t$ . Finally, the combined reward is used to optimize the policy and the discriminator competitively. The training details are given in Algorithm 1.
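
A simplified sketch of one such training iteration is given below (the full procedure is in Algorithm 1; the method names `sample_action`, `score`, `ppo_update`, and `update` are assumed interfaces, and PPO and buffer details are omitted):

```python
def hmamp_iteration(env, policy, disc, ref_dataset,
                    alpha_g=0.6, beta_s=0.4, gamma_d=0.25):
    """One iteration: roll out the policy, mix task and style rewards,
    then update the policy (PPO) and the discriminator (Eq. (3))."""
    rollout = []
    s, done = env.reset(), False
    while not done:
        a = policy.sample_action(s)
        s_next, r_goal, done, _ = env.step(a)               # env supplies r^g_t
        d = disc.score(s, s_next)                           # discriminator output D(s, s')
        r_style = max(0.0, 1.0 - gamma_d * (d - 1.0) ** 2)  # Eq. (4)
        rollout.append((s, a, alpha_g * r_goal + beta_s * r_style, s_next))  # Eq. (1)
        s = s_next
    policy.ppo_update(rollout)                              # maximize the combined reward
    disc.update(real=ref_dataset.sample_transitions(),      # LSGAN loss with grad. penalty
                fake=[(s0, s1) for (s0, _, _, s1) in rollout])
```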

Algorithm 1. HMAMP: Learning Human-like Manipulation Skills by Adversarial Motion Prior

4. Training

4.1. Data preprocess

The raw data used for HMAMP consist of video clips capturing manipulation skills from a third-person perspective. This type of data offers several advantages: it minimizes the domain gap between simulation and reality, and it is relatively easy to acquire, making it a practical choice for training. In this work, we create a dataset that records the human skill of hammering. The duration of the human motion in each video clip is less than one second, and we collected five video clips of the hammering movement performed by two persons. We then use the popular keypoint detection algorithm BlazePose [Reference Bazarevsky, Grishchenko, Raveendran, Zhu, Zhang and Grundmann34] to detect human joints including the hip, elbow, wrist, and hand, and we use a CV algorithm to extract time series of tool keypoints by manually marking them on the tool.
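
As an illustration, the human joint trajectories could be extracted from each clip with MediaPipe's BlazePose implementation roughly as follows (a sketch assuming MediaPipe's Python API; the hand is approximated here by the index-finger landmark, since BlazePose does not output a single "hand" point, and tool keypoints are tracked separately):

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
JOINTS = {  # joints used for retargeting; RIGHT_INDEX stands in for the hand
    "hip": mp_pose.PoseLandmark.RIGHT_HIP,
    "elbow": mp_pose.PoseLandmark.RIGHT_ELBOW,
    "wrist": mp_pose.PoseLandmark.RIGHT_WRIST,
    "hand": mp_pose.PoseLandmark.RIGHT_INDEX,
}

def extract_joint_trajectories(video_path: str):
    """Return a list of per-frame {joint_name: (x, y, z)} dictionaries."""
    trajectories = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks is None:
                continue  # skip frames where no person is detected
            lm = result.pose_landmarks.landmark
            trajectories.append({name: (lm[idx].x, lm[idx].y, lm[idx].z)
                                 for name, idx in JOINTS.items()})
    cap.release()
    return trajectories
```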

To retarget human motion to robot motion, it is necessary to construct an effective transfer function that maps the human world space to the robot world space. Numerous studies [Reference Geng, Lee and Hülse35–Reference Suárez, Rosell and García37] have explored motion retargeting between these two domains, and any established retargeting method can be applied in this process. In our work, we adopt a simple and straightforward approach: direct mapping. While there are significant differences in topology between the human arm and the robot arm, they share corresponding joints, such as the human elbow, wrist, and hand, which align with certain robot joints and the end-effector. By mapping key human joints to their robotic counterparts, we achieve a rough but effective transfer of motion from the human domain to the robot domain.
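
A minimal sketch of the direct-mapping idea; the correspondence table and the world-space transform are illustrative assumptions, not the exact mapping used in the paper:

```python
import numpy as np

# Hypothetical correspondence between selected human joints and robot frames.
HUMAN_TO_ROBOT = {
    "elbow": "forearm_link",
    "wrist": "wrist_link",
    "hand": "end_effector",
}

def retarget_frame(human_joints: dict, T_world: np.ndarray) -> dict:
    """Map human joint positions (human/camera frame) to robot-frame targets.

    T_world is a 4x4 homogeneous transform from the human world space to the
    robot world space (e.g. estimated from the table plane plus a scale factor).
    """
    targets = {}
    for joint, robot_frame in HUMAN_TO_ROBOT.items():
        p = np.append(np.asarray(human_joints[joint]), 1.0)  # homogeneous coords
        targets[robot_frame] = (T_world @ p)[:3]
    return targets
```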

4.2. Model details

The policy used in HMAMP is PPO; the hidden layers of the policy network have sizes [512, 256, 128] with exponential linear unit (ELU) activations. The policy outputs a distribution, represented by a mean and standard deviation, from which the target joint positions are sampled. The target joint positions are then fed to our customized PD controllers to compute the motor torques. The policy is trained on an observation $o_t$ derived from the state, which contains environment information such as the hammer and nail positions and robot information such as joint angles, joint velocities, end-effector orientation, and previous actions. The discriminator is an MLP with hidden layers of size [1024, 512] and ELU activations. It takes the orientations of the robot joints and of the tool as input. The values of all manually determined parameters are: $\alpha ^g = 0.6,{} \beta ^s = 0.4, \gamma ^d = 0.25, \omega ^f = 10^5, \omega ^d = 1, w^{gp} = 1, F^d = 100$ .
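
For instance, the PD controller stage that turns sampled joint targets into torques could look like the following sketch (gain values are placeholders; the actual gains are tuned per joint):

```python
import numpy as np

def pd_torques(q_target: np.ndarray, q: np.ndarray, qd: np.ndarray,
               kp: np.ndarray, kd: np.ndarray) -> np.ndarray:
    """Convert target joint positions sampled from the policy into motor torques."""
    return kp * (q_target - q) - kd * qd

# Example for a 7-DoF arm (placeholder gains):
kp = np.full(7, 40.0)
kd = np.full(7, 2.0)
tau = pd_torques(q_target=np.zeros(7), q=np.zeros(7), qd=np.zeros(7), kp=kp, kd=kd)
```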

4.3. Simulation

The robot we use to train the policy is a Kinova Gen3 with a 2F-85 gripper; the corresponding joint mapping between the Gen3 and a human arm is shown in Figure 3.

Figure 3. Direct mapping between human and robot arm. Some joints and the gripper of a Kinova Gen3 are mapped to act as human hip, elbow, wrist, and hand.

We selected Isaac Gym [38] as our simulation platform due to its ability to accelerate the RL training process using GPU resources. The policy was trained in parallel across 2048 agents, utilizing a single NVIDIA RTX 3090 GPU. The entire training process required 11 h of wall-clock time and spanned 60,000 training epochs. Each RL episode lasted a maximum of 152 steps, corresponding to 3 s of simulated time, and terminated early if the termination criteria were met. The policy operated at a control frequency of 50 Hz during the simulation.

4.4. Termination

An episode terminates and the next one starts when the robot satisfies the termination criteria. These include a task-completion signal when the hammer knocks the nail, and a collision signal when a force is detected between robot components, between the robot and the table, or between the hammer and the table.

4.5. Domain randomization and training process

In order to improve the robustness of the HMAMP policy and facilitate the transfer of the learned policy from simulation to the real world, we apply domain randomization during training. In detail, we randomize the coefficient of friction applied to the hammer and nail within $[0.5, 1.25]$ and scale the joint-level PD gains by a factor in $[0.9, 1.1]$ . In addition, the same observation noise as in ref. [Reference Zorina, Carpentier, Sivic and Petrík12] is added during the training phase, where the Cartesian position observation noise is $\pm 0.01\,{\rm m}$ and the joint position observation noise is $\pm 0.02\,{\rm rad}$ .
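
A sketch of how these randomizations could be drawn, with the ranges above; how the sampled values are applied to the simulator is omitted and depends on the simulator API:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_episode() -> dict:
    """Physical parameters drawn once per episode."""
    return {
        "friction": rng.uniform(0.5, 1.25),      # hammer/nail friction coefficient
        "pd_gain_scale": rng.uniform(0.9, 1.1),  # multiplier on joint-level PD gains
    }

def noisy_observation(cart_pos: np.ndarray, joint_pos: np.ndarray):
    """Observation noise added at every step during training."""
    cart = cart_pos + rng.uniform(-0.01, 0.01, size=cart_pos.shape)       # +-0.01 m
    joints = joint_pos + rng.uniform(-0.02, 0.02, size=joint_pos.shape)   # +-0.02 rad
    return cart, joints
```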

The training curves in Figure 4 show the confrontation and balance between the style reward and the goal reward. In the early stage, the goal reward has a strong guiding effect, while in the late stage the AMP discriminator converges quickly, giving the robot's movements a human style.

Figure 4. Training curves of HMAMP. The left figure shows the evolution of the reach reward and the knock force reward, while the right figure shows the discriminator loss and gradient during training. The two figures show the confrontation and balance between the style reward and the goal reward. In the early stage, the goal reward has a strong guiding effect, while in the late stage the AMP discriminator converges quickly, giving the trajectory of the robot a human style.

5. Experiment

In this section, we present the experimental setup and results to demonstrate the effectiveness of HMAMP in learning task-oriented, human-like manipulation skills. To evaluate the performance of our method, we conducted a series of comparative experiments against two baseline approaches: a direct path-planning control policy and a reinforcement learning (RL) approach without AMP. The experimental results were recorded and analyzed quantitatively to highlight the contributions of HMAMP. The evaluation focused on three key aspects: the quality of the learned manipulation skills, task completion efficiency, and the similarity of the robot’s movements to human behavior.

5.1. Comparative experiment in simulation

5.1.1. Task definition

The chosen manipulation task for the experiments involves knocking a nail with a hammer. The experiments are conducted on the Isaac Gym simulation platform, using the model of a 7-DoF Kinova Gen3 arm with a 2F-85 gripper to complete the task. At the start of the task, the hammer is securely grasped by the robotic arm, which begins in its initial home position, with the gripper oriented perpendicular to the platform (see Figure 6). The nail is placed arbitrarily on the manipulation platform. The objective of the task is to successfully hammer the nail while replicating a human style of movement.

5.1.2. Baseline methods

We compare HMAMP against the following baseline methods:

  • Direct Path-Planning Control Policy (DPPCP): This baseline approach determines manipulation actions using predefined path-planning strategies. In this method, proportional-derivative (PD) control is employed to generate a planned trajectory guiding the hammer from its initial position to the nail.

  • Reinforcement Learning without AMP (RL-noAMP): This baseline approach uses a standard reinforcement learning method for the agent to acquire manipulation skills. The configuration of this approach is identical to that of HMAMP, except that the AMP component is removed. The training process follows the same procedure as in HMAMP.

5.1.3. Evaluation metrics

To quantitatively evaluate the performance of each approach, we define the following criteria (a computational sketch of these metrics follows the list):

  • Knock Impulse: A measure of the knock effect received by the nail, calculated as $I = \int F_{nail}(t) \, dt$ . A large impulse means the nail receives a large accumulated force during the strike.

  • Energy Efficiency: The energy used by the arm is $E = \sum _{i=1}^{n} \int \tau _i(t) \cdot \omega _i(t) \, dt$ , where $n$ is the number of arm joints. Energy efficiency is the ratio of the knock impulse received by the nail to the energy expended by the arm: $\eta = I/E$

  • Vertical Force Ratio: The vertical force ratio reflects how efficiently the force is applied in the vertical direction, which is critical for tasks involving hammering. Higher ratios indicate more effective and efficient nail hammering: $\text{Vertical Force Ratio} = \frac {F_{\text{vertical, nail}}}{F_{\text{total, nail}}}$

  • Frechet Distance: A quantitative measure of motion similarity between human and robot arm manipulation [Reference Aronov, Har-Peled, Knauer, Wang and Wenk39]. The smaller the Frechet distance between two trajectories is, the more similar their shapes are.
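
The sketch below shows how these metrics could be computed from logged simulation time series (array shapes and time step `dt` are assumptions; the discrete Fréchet distance is implemented directly rather than with any particular library):

```python
import numpy as np

def knock_impulse(force: np.ndarray, dt: float) -> float:
    """I = integral of F_nail(t) dt, approximated by a Riemann sum."""
    return float(np.sum(force) * dt)

def energy(torques: np.ndarray, joint_vels: np.ndarray, dt: float) -> float:
    """E = sum_i integral of tau_i(t) * omega_i(t) dt; arrays have shape (T, n)."""
    return float(np.sum(torques * joint_vels) * dt)
    # Energy efficiency is then eta = knock_impulse(...) / energy(...)

def vertical_force_ratio(f_vertical: np.ndarray, f_total: np.ndarray) -> float:
    """Share of the nail force applied along the vertical axis."""
    return float(np.sum(f_vertical) / np.sum(f_total))

def discrete_frechet(P: np.ndarray, Q: np.ndarray) -> float:
    """Discrete Frechet distance between two trajectories of shape (T, 3)."""
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    ca = np.zeros((n, m))
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return float(ca[-1, -1])
```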

5.1.4. Results in simulation

We implemented the two baseline control strategies, DPPCP and RL-noAMP, as described in Section 5.1.2, within the simulation environment. Comparative experiments were conducted to evaluate their performance against our proposed method. Each method was tested across 10 trials, and the average values for each evaluation criterion were calculated to ensure stable and reliable performance measurements.

Table I. Comparison results of HMAMP with baselines.

Table I presents a comprehensive comparison of the approaches, highlighting key performance metrics. HMAMP consistently outperforms both DPPCP and RL-noAMP in all evaluated aspects. Notably, it has an overwhelming advantage in the impulse received by the nail, which is the most critical measure for hammering nails. The table also shows that HMAMP hammers more efficiently, since both its energy efficiency and its vertical force ratio exceed those of the other methods. In addition, HMAMP yields the smallest Frechet distance to the human manipulation trajectories, which reflects the effectiveness of our method in learning human motion styles. This can be seen more intuitively in Figure 5.

The experimental results clearly demonstrate the effectiveness of the proposed HMAMP framework in learning task-oriented, human-like manipulation skills through the use of AMP. The integration of AMP significantly improves task completion efficiency, knock impulse, and energy efficiency, resulting in superior performance compared to both the direct path-planning approach and traditional reinforcement learning (RL) methods.

Figure 5. Movement trajectory of the end of the robotic arm in Cartesian space. The end-of-arm motion trajectory obtained by HMAMP is the most similar to the human expert trajectory.

5.2. Real robot arm experiment

Figure 6. Experiment in simulation and the real world. The first row shows the human knocking motion clips that we used as motion priors. The second row shows the HMAMP policy in simulation; the robot successfully completes the task with the desired manipulation trajectory. The third row shows HMAMP implemented in the real world on a Kinova Gen3, and the fourth row shows details of hammering a nail in the real world.

We employed the same arm and the same task as in simulation for the real-world experiments. The environment setup closely mirrored the simulated scenario to ensure the applicability of our approach in practical settings.

Our proposed approach, which incorporates AMP into reinforcement learning (RL), was integrated into the control system of the robotic arm. The parameters and settings were optimized based on the training results obtained in the simulation environment. The robotic arm was tasked with performing the manipulation task using the learned skills.

Figure 6 showcases the manipulation effect of HMAMP on the real robot arm. The sequence of images illustrates the robotic arm successfully completing the manipulation task with precision and human-like motion. The trajectory followed by the arm demonstrates smoothness, accuracy, energy efficiency, and human manipulation skills, confirming the benefits of incorporating AMPs.

6. Conclusion

In this paper, we presented a novel approach named HMAMP to enable robotic arms to perform tool manipulation with human-like skills. Our method integrates AMP with deep reinforcement learning to capture complex manipulation dynamics. By leveraging both real-world motion data and synthetic motion data generated through simulation, we demonstrated the ability of our approach to surpass existing techniques in learning human-style manipulation behaviors. The evaluation of the challenging hammering task highlighted the effectiveness of our method and its potential for real-world applications. This research bridges the gap between robotic and human capabilities, paving the way for more intuitive and natural human-robot interactions. The proposed framework serves as a foundation for future research aimed at developing robots with advanced manipulation skills, envisioning a future where machines seamlessly mimic human manipulation.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/S0263574725001444.

Author contribution

Ziqi Ma, Changda Tian, and Yue Gao designed the study. Ziqi Ma and Changda Tian wrote the code. Ziqi Ma conducted the experiments and data gathering. Ziqi Ma and Changda Tian performed statistical analyses. Ziqi Ma wrote the article.

Financial support

This work was supported by the National Natural Science Foundation of China (Grant No. 92248303 and No. 62373242) and the Shanghai Municipal Science and Technology Major Project (Grant No. 2021SHZDZX0102).

Competing interests

The authors declare that no conflicts of interest exist.

Ethical approval

Not applicable.

References

Ainetter, S. and Fraundorfer, F., "End-to-End Trainable Deep Neural Network for Robotic Grasp Detection and Semantic Segmentation from RGB," 2021 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2021) pp. 13452–13458.
Fang, K., Zhu, Y., Garg, A., Kurenkov, A., Mehta, V., Fei-Fei, L. and Savarese, S., "Learning task-oriented grasping for tool manipulation from simulated self-supervision," Int. J. Robot. Res. 39(2-3), 202–216 (2020).
Kalashnikov, D., Irpan, A., Pastor, P., Ibarz, J., Herzog, A., Jang, E., Quillen, D., Holly, E., Kalakrishnan, M., Vanhoucke, V. and Levine, S., "Qt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation," arXiv preprint arXiv:1806.10293 (2018).
Qin, Z., Fang, K., Zhu, Y., Fei-Fei, L. and Savarese, S., "Keto: Learning Keypoint Representations for Tool Manipulation," 2020 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2020) pp. 7278–7285.
Manuelli, L., Gao, W., Florence, P. and Tedrake, R., "kpam: Keypoint Affordances for Category-Level Robotic Manipulation," Robotics Research: The 19th International Symposium ISRR (Springer, 2022) pp. 132–157.
Turpin, D., Wang, L., Tsogkas, S., Dickinson, S. and Garg, A., "Gift: Generalizable interaction-aware functional tool affordances without labels," arXiv preprint arXiv:2106.14973 (2021).
Edmonds, M., Gao, F., Liu, H., Xie, X., Qi, S., Rothrock, B., Zhu, Y., Wu, Y. N., Lu, H. and Zhu, S.-C., "A tale of two explanations: Enhancing human trust by explaining robot behavior," Sci. Robot. 4(37), eaay4663 (2019).
Liu, H., Zhang, C., Zhu, Y., Jiang, C. and Zhu, S.-C., "Mirroring Without Overimitation: Learning Functionally Equivalent Manipulation Actions," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33 (2019) pp. 8025–8033.
Zhang, Z., Jiao, Z., Wang, W., Zhu, Y., Zhu, S.-C. and Liu, H., "Understanding physical effects for effective tool-use," IEEE Robot. Autom. Lett. 7(4), 9469–9476 (2022).
Johns, E., "Coarse-to-Fine Imitation Learning: Robot Manipulation from a Single Demonstration," 2021 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2021) pp. 4613–4619.
Wang, C., Fan, L., Sun, J., Zhang, R., Fei-Fei, L., Xu, D., Zhu, Y. and Anandkumar, A., "Mimicplay: Long-horizon imitation learning by watching human play," (2023).
Zorina, K., Carpentier, J., Sivic, J. and Petrík, V., "Learning to manipulate tools by aligning simulation to video demonstration," (2021).
Zhang, T., McCarthy, Z., Jow, O., Lee, D., Chen, X., Goldberg, K. and Abbeel, P., "Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation," 2018 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2018) pp. 5628–5635.
Peng, X. B., Ma, Z., Abbeel, P., Levine, S. and Kanazawa, A., "Amp: Adversarial motion priors for stylized physics-based character control," ACM Trans. Graphics (TOG) 40(4), 1–20 (2021).
Sanz, C., Call, J. and Boesch, C., Tool Use in Animals: Cognition and Ecology (Cambridge University Press, 2013).
St Amant, R. and Horton, T. E., "Revisiting the definition of animal tool use," Anim. Behav. 75(4), 1199–1208 (2008).
Van Lawick-Goodall, J., "Tool-Using in Primates and Other Vertebrates," In: Advances in the Study of Behavior, vol. 3 (Academic Press, 1971) pp. 195–249.
Chen, W., Liang, H., Chen, Z., Sun, F. and Zhang, J., "Learning 6-dof task-oriented grasp detection via implicit estimation and visual affordance," (2022).
Xu, R., Chu, F.-J., Tang, C., Liu, W. and Vela, P. A., "An affordance keypoint detection network for robot manipulation," IEEE Robot. Autom. Lett. 6(2), 2870–2877 (2021).
Murali, A., Liu, W., Marino, K., Chernova, S. and Gupta, A., "Same object, different grasps: Data and semantic knowledge for task-oriented grasping," CoRR abs/2011.06431 (2020).
Al-Shanoon, A. and Lang, H., "Robotic manipulation based on 3-d visual servoing and deep neural networks," Robot. Auton. Syst. 152(C), 104041 (2022).
Ribeiro, E. G., de Queiroz Mendes, R. and Grassi, V., "Real-time deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation," Robot. Auton. Syst. 139(C), 103757 (2021).
Saito, N., Ogata, T., Funabashi, S., Mori, H. and Sugano, S., "How to select and use tools? Active perception of target objects using multimodal deep learning," CoRR abs/2106.02445 (2021).
Sun, M. and Gao, Y., "Gater: Learning grasp-action-target embeddings and relations for task-specific grasping," IEEE Robot. Autom. Lett. 7(1), 618–625 (2022).
Nair, S., Rajeswaran, A., Kumar, V., Finn, C. and Gupta, A., "R3m: A Universal Visual Representation for Robot Manipulation," 6th Annual Conference on Robot Learning (2022).
Xiong, H., Fu, H., Zhang, J., Bao, C., Zhang, Q., Huang, Y., Xu, W., Garg, A. and Lu, C., "Robotube: Learning Household Manipulation from Human Videos with Simulated Twin Environments," 6th Annual Conference on Robot Learning (2022).
Taheri, O., Ghorbani, N., Black, M. J. and Tzionas, D., "GRAB: A dataset of whole-body human grasping of objects," CoRR abs/2008.11200 (2020).
Xiong, H., Li, Q., Chen, Y., Bharadhwaj, H., Sinha, S. and Garg, A., "Learning by watching: Physical imitation of manipulation skills from human videos," CoRR abs/2101.07241 (2021).
Ho, J. and Ermon, S., "Generative Adversarial Imitation Learning," Advances in Neural Information Processing Systems 29 (2016).
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y., "Generative adversarial networks," Commun. ACM 63(11), 139–144 (2020).
Escontrela, A., Peng, X. B., Yu, W., Zhang, T., Iscen, A., Goldberg, K. and Abbeel, P., "Adversarial Motion Priors make Good Substitutes for Complex Reward Functions," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2022) pp. 25–32.
Mao, X., Li, Q., Xie, H., Lau, R. Y. K. and Wang, Z., "Multi-class generative adversarial networks with the L2 loss function," CoRR abs/1611.04076 (2016).
Mao, X., Li, Q., Xie, H., Lau, R., Zhen, W. and Smolley, S., "On the effectiveness of least squares generative adversarial networks," IEEE Trans. Pattern Anal. Mach. Intell. 41(12), 2947–2960 (2018).
Bazarevsky, V., Grishchenko, I., Raveendran, K., Zhu, T., Zhang, F. and Grundmann, M., "Blazepose: On-device real-time body pose tracking," arXiv preprint arXiv:2006.10204 (2020).
Geng, T., Lee, M. and Hülse, M., "Transferring human grasping synergies to a robot," Mechatronics 21(1), 272–284 (2011).
Gioioso, G., Salvietti, G., Malvezzi, M. and Prattichizzo, D., "Mapping synergies from human to robotic hands with dissimilar kinematics: An approach in the object domain," IEEE Trans. Robot. 29(4), 825–837 (2013).
Suárez, R., Rosell, J. and García, N., "Using Synergies in Dual-Arm Manipulation Tasks," 2015 IEEE International Conference on Robotics and Automation (ICRA) (2015) pp. 5655–5661.
NVIDIA, "Isaac Gym – preview release: Physics simulation environment for reinforcement learning research," (2023). https://developer.nvidia.com/isaac-gym.
Aronov, B., Har-Peled, S., Knauer, C., Wang, Y. and Wenk, C., "Fréchet Distance for Curves, Revisited," Algorithms–ESA 2006: 14th Annual European Symposium, Zurich, Switzerland, September 11-13, 2006, Proceedings 14 (Springer, 2006) pp. 52–63.