Published online by Cambridge University Press: 19 August 2025
6D pose estimation recovers an object’s position and orientation in 3D space and plays a critical role in robotic grasping. However, traditional sparse keypoint-based methods rely on a limited number of feature points, which degrades their performance under occlusion and viewpoint variation. To address this issue, we propose a novel Neighborhood-aware Graph Aggregation Network (NGANet) for precise pose estimation, which combines fully convolutional networks and graph convolutional networks (GCNs) to establish dense 2D–3D and 3D–3D correspondences. The $K$-nearest neighbor algorithm builds neighborhood relationships within otherwise isolated point clouds, and GCNs then aggregate local geometric features over these neighborhoods. When combined with mesh data, both surface details and topological shape can be modeled. A positional encoding attention mechanism adaptively fuses these multimodal features into a unified, spatially coherent representation of pose-specific features. Extensive experiments show that the proposed NGANet achieves higher estimation accuracy on the LINEMOD and Occlusion-LINEMOD datasets, and its effectiveness is further validated in real-world scenarios.
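To make the neighborhood-building step concrete, the following is a minimal illustrative sketch (not the authors’ NGANet implementation): it uses the $K$-nearest neighbor algorithm to connect otherwise isolated points in a point cloud, then performs one mean-aggregation pass over each neighborhood, the basic local operation underlying GCN-style feature aggregation. All function names and parameters here are illustrative assumptions.

```python
import numpy as np


def knn_indices(points, k):
    """Return indices of the k nearest neighbors (excluding self) for each point."""
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)          # exclude each point from its own neighborhood
    return np.argsort(dist2, axis=1)[:, :k]  # shape (N, k)


def aggregate_features(features, neighbors):
    """Mix each point's features with the mean of its neighbors' features."""
    neighbor_feats = features[neighbors]      # shape (N, k, D)
    pooled = neighbor_feats.mean(axis=1)      # local neighborhood summary
    return 0.5 * (features + pooled)          # simple self + neighborhood combination


rng = np.random.default_rng(0)
pts = rng.standard_normal((128, 3))           # toy point cloud of 128 points
nbrs = knn_indices(pts, k=8)
agg = aggregate_features(pts, nbrs)           # here the "features" are just coordinates
print(agg.shape)                              # one aggregated feature vector per point
```

In a learned GCN layer the mean pooling would be replaced by weighted, trainable aggregation, but the graph structure it operates on is built exactly this way: edges from each point to its $K$ nearest neighbors.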