
Real-time mouth posture estimation for meal-assisting robots

Published online by Cambridge University Press:  24 October 2025

Yuhe Fan
Affiliation:
College of Mechanical and Electrical Engineering, Harbin Engineering University, Building No.61, Nantong Street No. 145, Harbin, 150001, China
Lixun Zhang*
Affiliation:
College of Mechanical and Electrical Engineering, Harbin Engineering University, Building No.61, Nantong Street No. 145, Harbin, 150001, China
Canxing Zheng
Affiliation:
Department of Anorectal Surgery, Weifang People’s Hospital, Weifang, Shandong, China
Zhenhan Wang
Affiliation:
College of Mechanical and Electrical Engineering, Harbin Engineering University, Building No.61, Nantong Street No. 145, Harbin, 150001, China
Zekun Yang
Affiliation:
College of Mechanical and Electrical Engineering, Harbin Engineering University, Building No.61, Nantong Street No. 145, Harbin, 150001, China
Feng Xue
Affiliation:
College of Mechanical and Electrical Engineering, Harbin Engineering University, Building No.61, Nantong Street No. 145, Harbin, 150001, China
Huaiyu Che
Affiliation:
College of Mechanical and Electrical Engineering, Harbin Engineering University, Building No.61, Nantong Street No. 145, Harbin, 150001, China
Xingyuan Wang
Affiliation:
College of Mechanical and Electrical Engineering, Harbin Engineering University, Building No.61, Nantong Street No. 145, Harbin, 150001, China
*
Corresponding author: Lixun Zhang; Email: zhanglixun@hrbeu.edu.cn

Abstract

In the fields of meal-assisting robotics and human–robot interaction (HRI), real-time and accurate mouth posture estimation is critical for ensuring interaction safety and improving user experience. However, diverse mouth opening degrees, variations in orientation, and external factors such as lighting conditions and occlusions pose significant challenges to real-time, accurate estimation. To address these issues, this paper proposes a novel method for point cloud fitting and posture estimation of mouth opening degrees (FP-MODs). The proposed method leverages both RGB and depth images captured from a single viewpoint, integrating geometric modeling with point cloud processing techniques to achieve robust and accurate mouth posture estimation. The innovation of this work lies in the hypothesis that different mouth-opening states can be effectively described by distinct geometric shapes: closed mouths are modeled by spatial quadratic surfaces, half-open mouths by spatial ellipses, and fully open mouths by spatial circles. Based on these hypotheses, we developed algorithms that fit the corresponding geometric models to point clouds extracted from mouth regions. Specifically, for the closed-mouth state, a least squares optimization fits a spatial quadratic surface to the point cloud data; for the half-open and fully open states, inverse projection is combined with least squares fitting to model the mouth contour as a spatial ellipse or circle, respectively. Finally, to evaluate the effectiveness of the proposed FP-MODs method, extensive real-world experiments were conducted under varying conditions, including different orientations and various types of mouths. The results demonstrate that the proposed method achieves high accuracy and robustness. This study provides a theoretical foundation and technical support for improving HRI and food-delivery safety in robotics.
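As an illustrative sketch only, and not the authors' implementation, the Python snippet below shows how a spatial quadratic surface might be fitted to a mouth-region point cloud by linear least squares, and how a spatial circle might be recovered by projecting contour points onto their best-fit plane before a 2D least-squares circle fit. The quadratic parameterization z = ax^2 + by^2 + cxy + dx + ey + f, the function names, and the synthetic sanity check are assumptions introduced for illustration.

import numpy as np

def fit_quadratic_surface(points):
    """Least-squares fit of z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to an
    (N, 3) point cloud (illustrative model for a closed mouth; this
    parameterization is an assumption, not the paper's exact model)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs  # [a, b, c, d, e, f]

def fit_spatial_circle(points):
    """Fit a circle to a roughly planar (N, 3) contour point cloud
    (illustrative model for a fully open mouth):
    1) estimate the best-fit plane via SVD,
    2) project the points into that plane,
    3) solve the algebraic 2D circle fit by linear least squares."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    u, v, normal = vt[0], vt[1], vt[2]          # normal = direction of least variance
    p2d = np.column_stack([centered @ u, centered @ v])
    # Circle as x^2 + y^2 + D*x + E*y + F = 0, which is linear in (D, E, F).
    A = np.column_stack([p2d[:, 0], p2d[:, 1], np.ones(len(p2d))])
    b = -(p2d[:, 0] ** 2 + p2d[:, 1] ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    radius = float(np.sqrt(cx**2 + cy**2 - F))
    center3d = centroid + cx * u + cy * v       # circle center back in 3D
    return center3d, normal, radius

if __name__ == "__main__":
    # Synthetic sanity check (assumed data): noisy 2 cm circle in a tilted plane.
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 2.0 * np.pi, 500)
    circle = np.column_stack([0.02 * np.cos(t), 0.02 * np.sin(t), np.zeros_like(t)])
    tilt = np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(0.3), -np.sin(0.3)],
                     [0.0, np.sin(0.3), np.cos(0.3)]])
    cloud = circle @ tilt.T + rng.normal(0.0, 1e-4, circle.shape)
    _, _, r = fit_spatial_circle(cloud)
    print(f"estimated radius: {r:.4f} m")        # expect about 0.0200

A spatial ellipse for the half-open state could be fitted analogously by replacing the 2D circle model with a general conic fit in the projected plane; the paper's actual inverse-projection pipeline may differ.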

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press


Supplementary material
Fan et al. supplementary material (File, 24 KB)