
Machine learning models for sequential motion recognition in human activity for logistics operations efficiency improvement

Published online by Cambridge University Press:  27 January 2025

Chih-Feng Cheng
Affiliation:
Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, Taiwan
Chiuhsiang Joe Lin*
Affiliation:
Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, Taiwan
Qin-Xuan Hu
Affiliation:
Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, Taiwan
Corresponding author: Chiuhsiang Joe Lin; Email: cjoelin@mail.ntust.edu.tw

Abstract

Human activity recognition (HAR) is a vital component of human–robot collaboration: recognizing the operational elements of an operator’s task is essential for effective collaboration, and HAR plays a key role in achieving it. However, recognizing human activity in an industrial setting differs from recognizing activities of daily living, because an operator’s activity must be divided into fine elements to ensure efficient task completion; despite this, there is relatively little related research in the literature. This study develops machine learning models to classify the sequential movement elements of a task. To illustrate this, three logistics operations in an integrated circuit (IC) design house were studied, with participants wearing 13 inertial measurement units manufactured by Xsens to mimic the tasks. The kinematics data were collected to develop the machine learning models. Preprocessing of the time series data applied two normalization methods and three different window lengths, and eleven features were extracted from the processed data to train the classification models. Model validation was carried out using the subject-independent method, with data from three participants excluded from the training dataset. The results indicate that the developed models can efficiently classify operational elements when the operator performs the activity accurately. Incorrect classifications occurred, however, when the operator missed an operation or performed the task awkwardly. RGB video clips helped identify these misclassifications, which supervisors can use for training purposes and industrial engineers for work improvement.
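The pipeline the abstract describes — normalize the kinematics stream, segment it into fixed-width windows, extract time-domain features per window, train a classifier, and validate on held-out participants — can be sketched as below. This is a minimal illustration on synthetic data, not the authors’ implementation: the window width, the feature set (a 5-feature subset standing in for the paper’s 11), and the nearest-centroid classifier (a stand-in for the paper’s SVM and NB models) are all assumptions for illustration.

```python
import numpy as np

WIDTH, STEP = 50, 25  # illustrative window width/step; the study tunes widths of 287-530 frames

def min_max_normalize(x):
    """Min-max (MM) normalization per channel, one of the two methods mentioned."""
    mn, mx = x.min(axis=0), x.max(axis=0)
    return (x - mn) / np.where(mx > mn, mx - mn, 1.0)

def window_features(sig):
    """Slide a fixed-width window over the stream and extract simple time-domain
    features (mean, std, min, max, RMS per channel); the study extracts 11 features."""
    feats = []
    for s in range(0, len(sig) - WIDTH + 1, STEP):
        w = sig[s:s + WIDTH]
        feats.append(np.concatenate([
            w.mean(0), w.std(0), w.min(0), w.max(0),
            np.sqrt((w ** 2).mean(0)),
        ]))
    return np.array(feats)

def simulate_participant(rng):
    """Synthetic stand-in for one participant's IMU recording: two movement
    elements (S1, S2) of 300 frames x 3 channels with different signal levels."""
    stream = np.vstack([rng.normal(0.0, 0.3, (300, 3)),   # element S1
                        rng.normal(1.0, 0.3, (300, 3))])  # element S2
    frame_labels = np.array([0] * 300 + [1] * 300)
    X = window_features(min_max_normalize(stream))
    # Label each window by the element at its midpoint frame.
    y = np.array([frame_labels[s + WIDTH // 2]
                  for s in range(0, len(stream) - WIDTH + 1, STEP)])
    return X, y

rng = np.random.default_rng(0)
data = [simulate_participant(rng) for _ in range(5)]

# Subject-independent validation: the last two participants are held out entirely.
X_tr = np.vstack([X for X, _ in data[:3]])
y_tr = np.concatenate([y for _, y in data[:3]])
X_te = np.vstack([X for X, _ in data[3:]])
y_te = np.concatenate([y for _, y in data[3:]])

# Nearest-centroid classifier as a simple stand-in for the paper's SVM/NB models.
centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((X_te[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
accuracy = (pred == y_te).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Windows that straddle the boundary between two movement elements mix both classes, which is one reason window width is a tuning parameter: the paper reports a different optimal width per task (Table 2).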

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press
Figure 1. The positions of 13 XSENSs on the front and back of the human body.

Figure 2. Operational elements decomposition of the assembly of the inner box (T1).

Figure 3. Operational elements decomposition of the assembly of outer carton (T2).

Figure 4. Operational elements decomposition of the final packing of the carton for shipment (T3).

Table 1. Time-domain features applied in this study

Figure 5. The Microsoft Kinect Skeleton diagram reveals significant distortions in body posture compared to the actual body position.

Table 2. The optimal time window sizes determined for the tasks in this study

Table 3. The best model for the inner box assembly (T1)

Figure 6. The confusion matrix of the best SVM model with MA normalization and a window width of 287 frames for T1 in the participant-independent validation phase.

Figure 7. The video clip of Participant 2 performing the S2 element of the assembly of the inner box (T1) from (a) to (c) in order.

Figure 8. The video clip of Participant 2 performing the S4 element of the assembly of the inner box (T1) from (a) to (c) in order.

Table 4. The best model for the outer carton assembly (T2)

Figure 9. The confusion matrix of the best NB model with MM normalization and a window width of 441 frames for T2 in the participant-independent validation phase.

Figure 10. The video clip of Participant 1 performing the S1 element of the assembly of the outer carton (T2).

Figure 11. The video clip of Participant 1 performing the S3 element of the assembly of the outer carton (T2).

Table 5. The best model for the packing task (T3)

Figure 12. The confusion matrix of the best SVM model with MM normalization and a window width of 530 frames for T3 in the participant-independent validation phase. All predictions are correct.

Table 6. The comparison of the best models in the literature and the current study