Published online by Cambridge University Press: 19 August 2025
6D pose estimation recovers an object’s position and orientation in 3D space and plays a critical role in robotic grasping. However, traditional sparse keypoint-based methods rely on a limited number of feature points, which degrades their performance under occlusion and viewpoint variation. To address this issue, we propose a novel Neighborhood-aware Graph Aggregation Network (NGANet) for precise pose estimation, which combines fully convolutional networks and graph convolutional networks (GCNs) to establish dense 2D–3D and 3D–3D correspondences. The $K$-nearest neighbor algorithm builds neighborhood relationships within otherwise isolated point clouds, and GCNs then aggregate local geometric features over these neighborhoods. When combined with mesh data, both surface details and topological shape can be modeled. A positional encoding attention mechanism adaptively fuses these multimodal features into a unified, spatially coherent representation of pose-specific features. Extensive experiments show that the proposed NGANet achieves higher estimation accuracy on the LINEMOD and Occlusion-LINEMOD datasets, and its effectiveness is further validated in real-world scenarios.
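To make the neighborhood-building step concrete, the following is a minimal illustrative sketch (not the authors’ NGANet implementation): it uses the $K$-nearest neighbor algorithm to connect otherwise isolated points in a point cloud, then performs one mean-aggregation pass over each neighborhood, the basic local operation underlying GCN-style feature aggregation. All function names and parameters here are illustrative assumptions.

```python
import numpy as np


def knn_indices(points, k):
    """Return indices of the k nearest neighbors (excluding self) for each point."""
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)          # exclude each point from its own neighborhood
    return np.argsort(dist2, axis=1)[:, :k]  # shape (N, k)


def aggregate_features(features, neighbors):
    """Mix each point's features with the mean of its neighbors' features."""
    neighbor_feats = features[neighbors]      # shape (N, k, D)
    pooled = neighbor_feats.mean(axis=1)      # local neighborhood summary
    return 0.5 * (features + pooled)          # simple self + neighborhood combination


rng = np.random.default_rng(0)
pts = rng.standard_normal((128, 3))           # toy point cloud of 128 points
nbrs = knn_indices(pts, k=8)
agg = aggregate_features(pts, nbrs)           # here the "features" are just coordinates
print(agg.shape)                              # one aggregated feature vector per point
```

In a learned GCN layer the mean pooling would be replaced by weighted, trainable aggregation, but the graph structure it operates on is built exactly this way: edges from each point to its $K$ nearest neighbors.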