
Enhancing real-time patient activity recognition for consistent performance in varying illumination and complex indoor environment

Published online by Cambridge University Press:  04 September 2025

Manoj Kumar Sain*
Affiliation:
Department of Electronics and Communication, The LNM Institute of Information Technology, Jaipur, India
Current affiliation: Department of Computer Science, Banasthali Vidyapith, Niwai, Rajasthan, India
Rabul Laskar
Affiliation:
Department of Electronics and Communication, National Institute of Technology, Silchar, Assam, India
Joyeeta Singha
Affiliation:
Department of Electronics and Communication, The LNM Institute of Information Technology, Jaipur, India
Sandeep Saini
Affiliation:
Department of Electronics and Communication, The LNM Institute of Information Technology, Jaipur, India
*Corresponding author: Manoj Kumar Sain; Email: manoj.sain@lnmiit.ac.in

Abstract

Human Activity Recognition (HAR) is of significant importance in healthcare and human-machine interaction. However, recognizing actions from 2D information faces challenges such as occlusion, illumination variation, cluttered backgrounds, and viewpoint variation. These hurdles are particularly pronounced in indoor patient-monitoring settings, where fluctuating lighting and cluttered backgrounds compromise the accuracy of activity recognition systems. To address this, a new architecture named IlluminationRevive is proposed, which combines an encoder-decoder convolutional neural network (CNN) with image post-processing blocks to enhance the visual quality of input frames. A new dataset comprising seven indoor physical activities is also introduced, recorded from thirty participants aged 20–45. For classification, a hybrid fusion architecture is proposed that integrates motion sequence information with body joint features. The classification model takes as input generated Skeleton Motion History Images (SMHIs), human joint motion features extracted from video frames, and novel kinematic and geometric features computed over window frames. By integrating ResNet50-ViT (Residual Network-50 layers, Vision Transformer) and CNN-BiLSTM (Convolutional Neural Network-Bidirectional Long Short-Term Memory) layers, the model extracts both spatial and temporal feature vectors. The proposed classification model was evaluated against state-of-the-art models on the LNMIIT-SMD (The LNM Institute of Information Technology-Skeleton Motion Dataset) and the established NTU-RGBD (Nanyang Technological University Red Green Blue and Depth) dataset to assess the effectiveness of the architecture. Results demonstrate the superiority of the proposed model, which achieves accuracies of 98.21% on real-time data, 98.45% on the proposed dataset, and 97.12% on the NTU-RGBD dataset. This high-accuracy, low-latency approach enhances robotic perception for healthcare applications, enabling service robots to perform real-time patient monitoring and assistive tasks in dynamic indoor environments.
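
As a rough illustration of the two-stream fusion design described above, the following PyTorch sketch pairs a ResNet50 backbone with a small Transformer encoder over the SMHI input and a 1D CNN followed by a BiLSTM over the per-frame joint-feature sequence, concatenating the two embeddings before classification. The layer widths, window length, joint-feature dimension, and class count (seven activities) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a ResNet50+Transformer / CNN-BiLSTM fusion classifier.
# All hyperparameters below are assumed for illustration only.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class HybridHARSketch(nn.Module):
    def __init__(self, num_classes=7, joint_feat_dim=64):
        super().__init__()
        # Spatial stream: ResNet50 (final FC removed) feeding a lightweight
        # Transformer encoder, standing in for the ResNet50-ViT block.
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 2048, 1, 1)
        enc_layer = nn.TransformerEncoderLayer(d_model=2048, nhead=8, batch_first=True)
        self.vit_like = nn.TransformerEncoder(enc_layer, num_layers=2)

        # Temporal stream: 1D CNN over the joint-feature sequence, then a BiLSTM.
        self.temporal_cnn = nn.Sequential(
            nn.Conv1d(joint_feat_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.bilstm = nn.LSTM(128, 128, batch_first=True, bidirectional=True)

        # Fusion head: concatenate the two stream embeddings and classify.
        self.classifier = nn.Linear(2048 + 256, num_classes)

    def forward(self, smhi, joint_seq):
        # smhi:      (B, 3, 224, 224) Skeleton Motion History Image
        # joint_seq: (B, T, joint_feat_dim) kinematic/geometric features per frame
        spatial = self.cnn(smhi).flatten(1)                  # (B, 2048)
        spatial = self.vit_like(spatial.unsqueeze(1)).squeeze(1)
        temporal = self.temporal_cnn(joint_seq.transpose(1, 2)).transpose(1, 2)
        _, (h, _) = self.bilstm(temporal)                    # h: (2, B, 128)
        temporal = torch.cat([h[0], h[1]], dim=1)            # (B, 256)
        return self.classifier(torch.cat([spatial, temporal], dim=1))


if __name__ == "__main__":
    model = HybridHARSketch()
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 30, 64))
    print(logits.shape)  # torch.Size([2, 7])
```

Late fusion by concatenation is only one plausible reading of the paper's "hybrid fusion"; the published model may weight or attend over the streams differently.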

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press

