
Enhancing real-time patient activity recognition for consistent performance in varying illumination and complex indoor environment

Published online by Cambridge University Press:  04 September 2025

Manoj Kumar Sain*
Affiliation:
Department of Electronics and Communication, The LNM Institute of Information Technology, Jaipur, India
Current affiliation: Department of Computer Science, Banasthali Vidyapith, Niwai, Rajasthan, India
Rabul Laskar
Affiliation:
Department of Electronics and Communication, National Institute of Technology, Silchar, Assam, India
Joyeeta Singha
Affiliation:
Department of Electronics and Communication, The LNM Institute of Information Technology, Jaipur, India
Sandeep Saini
Affiliation:
Department of Electronics and Communication, The LNM Institute of Information Technology, Jaipur, India
*Corresponding author: Manoj Kumar Sain; Email: manoj.sain@lnmiit.ac.in

Abstract

Human Activity Recognition (HAR) is of significant importance in healthcare and human-machine interaction. However, recognizing actions from 2D information faces challenges such as occlusion, illumination variation, cluttered backgrounds, and viewpoint variation. These hurdles are particularly pronounced in indoor patient-monitoring settings, where fluctuating lighting and cluttered backgrounds compromise the accuracy of activity recognition systems. To address this, a new architecture named IlluminationRevive is proposed, which combines an encoder-decoder convolutional neural network (CNN) with image post-processing blocks to enhance the visual quality of input frames. A new dataset comprising seven indoor physical activities is also introduced, recorded from thirty participants aged 20–45. For classification, a hybrid fusion architecture is proposed that integrates motion sequence information with body joint features. The classification model takes as input generated Skeleton Motion History Images (SMHIs), human joint motion features extracted from video frames, and novel kinematic and geometric features computed over window frames. By integrating ResNet50-ViT (Residual Network-50 layers, Vision Transformer) and CNN-BiLSTM (Convolutional Neural Network-Bidirectional Long Short-Term Memory) layers, the model extracts both spatial and temporal feature vectors. The proposed classification model was evaluated against state-of-the-art models on the LNMIIT-SMD (The LNM Institute of Information Technology-Skeleton Motion Dataset) and the established NTU-RGBD (Nanyang Technological University Red Green Blue and Depth) dataset to assess the effectiveness of the architecture. Results demonstrate the superiority of the proposed model, which achieves accuracies of 98.21% on real-time data, 98.45% on the proposed dataset, and 97.12% on the NTU-RGBD dataset. This high-accuracy, low-latency approach enhances robotic perception for healthcare applications, enabling service robots to perform real-time patient monitoring and assistive tasks in dynamic indoor environments.
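
As a rough illustration of the two-stream fusion design described above, the following PyTorch sketch pairs a ResNet50 backbone with a small Transformer encoder over the SMHI input and a 1D CNN followed by a BiLSTM over the per-frame joint-feature sequence, concatenating the two embeddings before classification. The layer widths, window length, joint-feature dimension, and class count (seven activities) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a ResNet50+Transformer / CNN-BiLSTM fusion classifier.
# All hyperparameters below are assumed for illustration only.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class HybridHARSketch(nn.Module):
    def __init__(self, num_classes=7, joint_feat_dim=64):
        super().__init__()
        # Spatial stream: ResNet50 (final FC removed) feeding a lightweight
        # Transformer encoder, standing in for the ResNet50-ViT block.
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 2048, 1, 1)
        enc_layer = nn.TransformerEncoderLayer(d_model=2048, nhead=8, batch_first=True)
        self.vit_like = nn.TransformerEncoder(enc_layer, num_layers=2)

        # Temporal stream: 1D CNN over the joint-feature sequence, then a BiLSTM.
        self.temporal_cnn = nn.Sequential(
            nn.Conv1d(joint_feat_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.bilstm = nn.LSTM(128, 128, batch_first=True, bidirectional=True)

        # Fusion head: concatenate the two stream embeddings and classify.
        self.classifier = nn.Linear(2048 + 256, num_classes)

    def forward(self, smhi, joint_seq):
        # smhi:      (B, 3, 224, 224) Skeleton Motion History Image
        # joint_seq: (B, T, joint_feat_dim) kinematic/geometric features per frame
        spatial = self.cnn(smhi).flatten(1)                  # (B, 2048)
        spatial = self.vit_like(spatial.unsqueeze(1)).squeeze(1)
        temporal = self.temporal_cnn(joint_seq.transpose(1, 2)).transpose(1, 2)
        _, (h, _) = self.bilstm(temporal)                    # h: (2, B, 128)
        temporal = torch.cat([h[0], h[1]], dim=1)            # (B, 256)
        return self.classifier(torch.cat([spatial, temporal], dim=1))


if __name__ == "__main__":
    model = HybridHARSketch()
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 30, 64))
    print(logits.shape)  # torch.Size([2, 7])
```

Late fusion by concatenation is only one plausible reading of the paper's "hybrid fusion"; the published model may weight or attend over the streams differently.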

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press

