
Vision-based food handling system for high-resemblance random food items

Published online by Cambridge University Press:  21 February 2024

Yadan Zeng, Yee Seng Teoh, Guoniu Zhu, Elvin Toh and I-Ming Chen

Affiliation: Robotics Research Centre of the School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore

Corresponding author: Yadan Zeng; Email: yadan001@e.ntu.edu.sg

Abstract

The rise in the number of automated robotic kitchens has accelerated the need for advanced food handling systems, with an emphasis on food analysis, including ingredient classification, pose recognition, and assembly strategy. Selecting the optimal piece from a pile of similarly shaped food items is a challenge for automated meal-assembly systems. To address this, we present a constructive assembling algorithm, introducing a unique approach for food pose detection, Fast Image to Pose Detection (FI2PD), and a closed-loop packing strategy. Powered by a convolutional neural network (CNN) and a pose retrieval model, FI2PD constructs a 6D pose from RGB images alone. The method employs a coarse-to-fine approach, leveraging the CNN to pinpoint object orientation and position, alongside a pose retrieval process for target selection and 6D pose derivation. Our closed-loop packing strategy, aided by the Item Arrangement Verifier, ensures precise arrangement and system robustness. Additionally, we introduce our FdIngred328 dataset of nine food categories, spanning fake and real foods, together with automatically generated data based on synthetic techniques. Our method achieves a success rate of 97.9% for object recognition and pose detection. Notably, integrating the closed-loop strategy into the meal-assembly process yielded a success rate of 90%, outperforming systems lacking the closed-loop mechanism.
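The abstract outlines a coarse-to-fine pipeline (a CNN for coarse position and orientation, then pose retrieval for the full 6D pose) wrapped in a closed-loop packing cycle guarded by the Item Arrangement Verifier. The sketch below illustrates how such a pipeline could be wired together; it is not the authors' implementation, and every name and interface in it (detect_candidates, retrieve_6d_pose, the camera/robot/verifier objects) is a hypothetical placeholder inferred from the abstract.

```python
# Minimal sketch of the coarse-to-fine, closed-loop flow described in the
# abstract. All names and interfaces are hypothetical placeholders, not
# the authors' actual API.

from dataclasses import dataclass

import numpy as np


@dataclass
class Candidate:
    """One detected food item from the coarse CNN stage."""
    class_id: int           # ingredient category
    position_px: tuple      # coarse 2D position in the image
    orientation_rad: float  # coarse in-plane orientation
    score: float            # detection confidence


def detect_candidates(rgb: np.ndarray) -> list[Candidate]:
    """Coarse stage: a CNN localizes items and their in-plane orientation."""
    raise NotImplementedError("stand-in for the trained detector")


def retrieve_6d_pose(rgb: np.ndarray, cand: Candidate) -> np.ndarray:
    """Fine stage: pose retrieval lifts the coarse 2D estimate to a full
    6D pose (here a 4x4 homogeneous transform) by matching against
    stored pose templates."""
    raise NotImplementedError("stand-in for the pose-retrieval model")


def assemble_meal(camera, robot, verifier, n_slots: int) -> bool:
    """Closed-loop packing: pick the best candidate, place it, then let
    the arrangement verifier confirm the placement before moving on."""
    filled = 0
    while filled < n_slots:
        rgb = camera.capture()
        candidates = detect_candidates(rgb)
        if not candidates:
            return False  # nothing pickable left in the pile
        best = max(candidates, key=lambda c: c.score)
        pose = retrieve_6d_pose(rgb, best)
        robot.pick_and_place(pose, slot=filled)
        # The verifier closes the loop: a slot only counts as filled
        # once a fresh image confirms the item is correctly arranged.
        if verifier.check(camera.capture(), slot=filled):
            filled += 1
    return True
```

The design point mirrored here is that success is judged by re-imaging the tray rather than by trusting the planned motion, which is what would let a closed-loop system of this kind recover from misplacements instead of silently propagating them.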

Type: Research Article

Copyright: © The Author(s), 2024. Published by Cambridge University Press

Supplementary material: Zeng et al. supplementary material (File, 19.1 MB)