
NGANet: Neighborhood-aware Graph Aggregation Network for 6D pose estimation in robotic grasping

Published online by Cambridge University Press: 19 August 2025

Lu Chen (Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China)
Yuhao Zheng (Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China)
Peng Wu (Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China)
Jing Yang* (School of Automation and Software Engineering, Shanxi University, Taiyuan, China)
Yan Gao (Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China)
Jingyang Liu (Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China)
*Corresponding author: Jing Yang; Email: yangjing199002@sxu.edu.cn

Abstract

6D pose estimation perceives an object's position and orientation in 3D space and plays a critical role in robotic grasping. However, traditional sparse keypoint-based methods rely on a limited number of feature points, which restricts their performance under occlusion and viewpoint variation. To address this issue, we propose a novel Neighborhood-aware Graph Aggregation Network (NGANet) for precise pose estimation, which combines fully convolutional networks and graph convolutional networks (GCNs) to establish dense 2D–3D and 3D–3D correspondences. The $K$-nearest neighbor algorithm is integrated to build neighborhood relationships within isolated point clouds, and GCNs then aggregate local geometric features. Combined with mesh data, both surface details and topological shape can be modeled. A positional encoding attention mechanism is introduced to adaptively fuse these multimodal features into a unified, spatially coherent representation of pose-specific features. Extensive experiments show that NGANet achieves higher estimation accuracy than existing methods on the LINEMOD and Occlusion-LINEMOD datasets, and its effectiveness is further validated in real-world scenarios.
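The three computational steps named in the abstract (building a $K$-nearest-neighbor graph over an isolated point cloud, aggregating local geometric features with a graph convolution, and fusing two feature streams with positional-encoding attention) can be illustrated compactly. The following is a minimal PyTorch sketch, not the authors' implementation: `knn_graph`, `NeighborhoodAggregation`, and `PositionalAttentionFusion`, along with all dimensions and hyperparameters, are illustrative assumptions, and the aggregation step follows the EdgeConv pattern popularized by DGCNN.

```python
# A minimal sketch (not the authors' code) of the ideas named in the abstract.
# All module names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


def knn_graph(points: torch.Tensor, k: int) -> torch.Tensor:
    """Return the indices of the k nearest neighbors of each point.

    points: (N, 3) point cloud; output: (N, k) neighbor indices.
    """
    dists = torch.cdist(points, points)        # (N, N) pairwise distances
    # Drop column 0 of the sorted result: each point's nearest "neighbor"
    # is itself at distance zero.
    return dists.topk(k + 1, largest=False).indices[:, 1:]


class NeighborhoodAggregation(nn.Module):
    """EdgeConv-style local aggregation over a kNN graph (after DGCNN)."""

    def __init__(self, in_dim: int, out_dim: int, k: int = 16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Linear(2 * in_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, points: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        idx = knn_graph(points, self.k)        # (N, k)
        neighbors = feats[idx]                 # (N, k, C) gathered features
        center = feats.unsqueeze(1).expand_as(neighbors)
        # Concatenate each center feature with the relative neighbor
        # features, then max-pool over the neighborhood.
        edge = torch.cat([center, neighbors - center], dim=-1)
        return self.mlp(edge).max(dim=1).values  # (N, out_dim)


class PositionalAttentionFusion(nn.Module):
    """Fuse two feature streams with attention conditioned on 3D position."""

    def __init__(self, dim: int):
        super().__init__()
        self.pos_enc = nn.Linear(3, dim)       # learned positional encoding
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, points, feats_a, feats_b):
        pe = self.pos_enc(points)              # (N, dim)
        q = (feats_a + pe).unsqueeze(0)        # query: stream A + position
        kv = (feats_b + pe).unsqueeze(0)       # key/value: stream B + position
        fused, _ = self.attn(q, kv, kv)
        return fused.squeeze(0)                # (N, dim) fused representation


# Usage example: aggregate point-cloud features, then fuse them with a
# stand-in for mesh-branch features of the same size.
pts = torch.rand(1024, 3)
agg = NeighborhoodAggregation(in_dim=3, out_dim=128)
cloud_feats = agg(pts, pts)                    # coordinates as initial features
mesh_feats = torch.rand(1024, 128)             # placeholder mesh features
fusion = PositionalAttentionFusion(dim=128)
fused = fusion(pts, cloud_feats, mesh_feats)   # (1024, 128)
```

In this reading, the kNN graph restores local connectivity to an otherwise unordered point set, the graph aggregation captures surface detail, and the positional encoding lets the attention step align the two modalities in a spatially coherent way, which matches the fusion described in the abstract.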

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

