1. Introduction
Space significantly influences key human activities such as collaboration, learning, and creative thinking by shaping the activity process and human behavior (Thoring et al., 2019; Mehta & Zhu, 2009). For instance, workspaces outfitted with advanced information and communication technologies can enhance social interaction and cognitive functions, thereby fostering collaboration and new knowledge creation (Peschl & Fundneider, 2014). Similarly, spatial features that encourage physical activity and relaxation, such as flexible furniture and lounge areas, can stimulate creative thinking and support reflection and problem-solving (Meinel et al., 2017).
To investigate these spatial influences, researchers commonly implement video-based behavioral analysis, which involves recording entire activity sessions and systematically annotating observed behaviors (Cash et al., 2015). However, this manual method is time-consuming and tedious, as it often requires repeatedly reviewing large volumes of video data (Brudy et al., 2018b). Moreover, relying solely on the researcher's observation can introduce subjectivity and potential inaccuracy.
Advances in computer vision, particularly vision-based artificial intelligence (AI), offer promising solutions to these challenges. Recent progress in spatio-temporal feature representation and action classification shows that vision-based AI models can automatically detect, track, and segment human subjects and objects in a given space, providing both visualized outputs and quantitative metrics of observed behaviors (Abou Elassad et al., 2020; Jaouedi et al., 2022). Moreover, these models show promising capabilities in recognizing human posture, actions, and facial expressions in videos (Al-Faris et al., 2020). With improvements in algorithmic performance and expansions of behavior datasets, vision-based AI demonstrates great potential to support human behavior analysis and understanding in design research.
Despite these advancements, challenges remain when applying vision-based AI in the design domain. Specifically, there is a significant need for domain-specific datasets that reflect the specific behaviors relevant to design practice, and for a closer alignment of AI's current capabilities with the particular analytic tasks required in design studies. To better understand and leverage AI capabilities in design research, our work seeks to investigate the following research questions:
1. How can we bridge specific behavior analysis tasks in design research and current AI capabilities for behavior analysis?
2. Which AI models are currently available and well suited for video-based behavior analysis tasks common in design research?
3. Which spatial behaviors commonly occur and are observed in design research contexts?
4. How can we apply these vision-based AI models to analyze behaviors in real physical spaces in design research, and what are the advantages and limitations of these models?
The contributions of this work are four-fold. First, we propose a framework for utilizing vision-based AI models for spatial behavior analysis tasks in design research. Second, we compile and evaluate a set of suitable vision-based AI models—considering their usability, capabilities, and applications—to guide researchers in the effective selection and employment of AI tools. Third, we identify and categorize relevant spatial behaviors observed in workspaces into four distinct groups, drawing on insights from previous behavior studies and our own design research. This categorization helps pinpoint specific research tasks where AI can enhance behavior analysis. Finally, we apply the proposed framework and selected AI models to video data collected in design research settings. Through these applications, we assess the models' performance, discussing both their advantages and limitations, and ultimately offer practical experiences and new insights for researchers interested in integrating AI into design research.
2. Related Work
2.1. Video-based human behavior analysis in design research
Video-based human behavior analysis involves recording activities on video and then observing, annotating, and analyzing behavior for deeper insight (Jordan & Henderson, 1995; Aggarwal & Ryoo, 2011). In design research, this approach has been used to study collaboration, creativity, and social interaction. For instance, Brudy et al. (2018a) examined how shared screens influence team decision-making and sensemaking via video-based open coding. Similarly, Cash et al. (2015) leveraged multi-perspective video recordings to identify complex behavior patterns in product design, organizational processes, and management tasks, enabling multi-level behavior analysis. Jakobsen & Hornbæk (2014) used video coding to assess communication frequency, attention, and spatial preferences, indicating team interaction dynamics.
However, manual video annotation is both time-consuming and labor-intensive (Nebeling et al., 2015). To mitigate this, visualization tools such as VisTACO (Tang et al., 2010), EagleView (Brudy et al., 2018b), and MIRIA (Büschel et al., 2021) provide insights into spatial behaviors (e.g., distance, orientation, movement) and offer data visualizations such as scatterplots, heatmaps, and 3D trails. Yet these tools still require substantial manual effort in observation and annotation, underscoring the need for more automated and efficient analysis methods.
2.2. State of the art: vision-based AI for human behavior analysis
With the development of AI in computer vision, vision-based AI models have been widely researched to detect and recognize human behaviors from video data (Li & Zhu, 2024; Pareek & Thakkar, 2021; Jaouedi et al., 2022). In education settings, researchers can detect students' attention states in different classrooms by assessing their facial expressions, hand gestures, and body postures using AI models (Ashwin & Guddeti, 2020). In healthcare, vision-based AI is applied for behavior monitoring and posture correction (Sharma et al., 2022). In workspace scenarios, large vision models can support the detection of overexertion behavior in the office (Marfia & Roccetti, 2017) and workspace occupancy monitoring (Zou et al., 2017), facilitating human well-being and efficient space usage in the office. AI-based computer vision methods have also been applied in human-computer interaction to develop user-friendly interfaces and operating systems (Sharma et al., 2023). These advancements highlight the potential of vision-based AI to streamline and enhance behavior analysis in diverse spatial and contextual settings, including design research.
3. Methodology
The previous studies highlight the need for behavior analysis methods requiring less manual effort and the potential of AI in this field. Building on these insights, our work investigates how to leverage AI capabilities for behavior analysis tasks in design research; to this end, we conducted a workspace study and applied vision-based AI models to the resulting video data.
Data Set. We designed three rooms: the activating room, the relaxing room, and the neutral room (as a control condition), as shown in Fig. 1. Participants worked individually for 30 minutes in each setting. Each session was recorded from a top-down camera view, resulting in approximately 45 hours of video data. We used data from 10 participants: eight for defining workspace behaviors through qualitative coding and two for testing AI models.

Figure 1. Room setups
AI Model Research. We searched for open-source, actively maintained vision-based AI models capable of motion tracking, human detection, object recognition, facial expression analysis, and posture/action analysis. Models were identified through literature reviews, GitHub, Hugging Face, and relevant model hubs, and then filtered by reported accuracy above 75% (Wu et al., 2023) and by core functionalities.
Qualitative Video Coding. Based on the method proposed by Saldaña (2021), we coded eight hours of video from eight participants in ATLAS.ti to identify workspace behaviors. We annotated behaviors with behavior codes (e.g., “turning on a light,” “sitting on a stool”) and timestamps, then grouped similar behaviors into clusters and categorized them following a taxonomy adapted from Larsen et al. (2021).
4. Results
4.1. A framework for using AI for behavior analysis in design research
We developed a framework that leverages AI for video-based human behavior analysis in design research (Fig. 2). The process begins with deconstructing complex human behaviors into more specific target behaviors, allowing researchers to identify key objects and contextual elements. These factors define the behavior features, which guide the selection and placement of camera systems—including the number, types, and optimal angles of cameras—to ensure that critical aspects of the behavior are adequately captured on video. Next, vision-based AI models process these video inputs to extract behavior features using capabilities such as detection, tracking, segmentation, and recognition. For instance, AI tools can detect and track both humans and objects, or recognize human postures such as standing and sitting. By representing the observed objects and their relationships, the AI models generate analysis results stored in accessible data formats. These outputs can include visualizations, analytic graphs, and qualitative annotations (e.g., object and behavior labels).

Figure 2. Framework for video-based human behavior analysis utilizing AI in design research
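To make the framework's stages concrete, the sketch below shows one possible way to record them as a simple analysis plan before any model is run. It is purely illustrative: the class and field names are our own and are not part of the framework's formal definition.

```python
# Illustrative sketch only: capturing the framework's stages (target behavior,
# behavior features, camera setup, AI capabilities, outputs) as a plain data object.
# All names below are hypothetical and chosen for readability.
from dataclasses import dataclass, field


@dataclass
class BehaviorAnalysisPlan:
    target_behavior: str                    # e.g. "manipulate cushion"
    key_objects: list[str]                  # objects involved in the behavior
    behavior_features: list[str]            # measurable features, e.g. "hand-object overlap"
    camera_setup: str                       # number, type, and angle of cameras
    ai_capabilities: list[str]              # detection, tracking, segmentation, recognition
    outputs: list[str] = field(default_factory=lambda: ["annotations", "visualizations"])


plan = BehaviorAnalysisPlan(
    target_behavior="manipulate cushion",
    key_objects=["hand", "cushion"],
    behavior_features=["hand-cushion overlap"],
    camera_setup="one top-down camera, sampled at 2 fps",
    ai_capabilities=["segmentation", "tracking"],
)
```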
4.2. Vision-based AI models for behavior analysis
Table 1 summarizes a selection of open-source vision-based AI models identified for behavior analysis tasks. Pose estimation models, such as AlphaPose, PoseNet, DensePose, and OpenPose, can detect and track key points of the whole body, including the face, body, hands, and feet, enabling the capture of subtle human actions in complex behavior analysis. YOLOv8, an object detection model, can detect and track humans and objects and classify diverse objects with name labels. SAM2 is an object segmentation model with which users can identify segments of interest in a video using positive and negative prompts. MMAction2 and SlowFast provide comprehensive frameworks for action recognition, supporting various algorithms and integrating with popular datasets. Moreover, DeepFace can analyze facial expressions to predict human emotions.
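As an illustration of how such models can be used in practice, the following is a minimal sketch of person detection and tracking with YOLOv8 through the ultralytics Python package. It is an example under our own assumptions (file name, model variant, and output handling), not the exact pipeline of any cited study.

```python
# Minimal sketch: detect and track people in a recorded session with YOLOv8.
# Assumes the `ultralytics` package is installed and "session.mp4" exists;
# class 0 in the pretrained COCO weights corresponds to "person".
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Run the built-in tracker over the video, keeping only person detections.
for frame_idx, result in enumerate(model.track(source="session.mp4", classes=[0], stream=True)):
    if result.boxes is None:
        continue
    for box in result.boxes:
        x, y, w, h = box.xywh[0].tolist()                        # box centre and size (pixels)
        track_id = int(box.id[0]) if box.id is not None else -1  # -1 if no track assigned
        print(f"frame {frame_idx}: person {track_id} at ({x:.0f}, {y:.0f})")
```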
Table 1. Vision-based AI models for behavior analysis.

4.3. Behavior classifications in workspace design research
We qualitatively coded the behaviors of eight participants in approximately eight hours of video and present the results in Table 2.
We defined 106 spatial behavior codes in the workspace, grouped them into 32 behaviors, and then classified them into four categories: object manipulation, spatial movement, posture, and eye gaze. The frequency (Freq.) column reports the overall number of occurrences of each behavior, and the following columns show each behavior's occurrence percentage in the activating room (Act.), the neutral room 1 (Neu.1), the relaxing room (Rlx.), and the neutral room 2 (Neu.2). Specific actions, such as manipulating the light and leaning over the desk, show considerable differences in frequency depending on the workspace arrangement. General behaviors, such as moving, standing, and reading, occurred with relatively balanced frequencies across the different workspaces.
Table 2. Observed Behaviors by Qualitative Video Coding (Percentages by Room).

Note: “-” indicates not applicable to this room due to the workspace element setup.
4.4. Applications of AI for human behavior analysis in the workspace
We applied the proposed AI-based behavior analysis framework to the selected test videos. Specifically, we used a segmentation AI model (SAM2) and a recognition AI toolkit (MMAction2) from Section 4.2 to analyze one representative behavior in each behavior category defined in Section 4.3. We then evaluated the AI-generated results.
Spatial Movement. In this behavior category, we focused on using AI models to observe movement-related behavior patterns, such as walking and changing position, in the activating and neutral rooms. Using SAM2 (Ravi et al., 2024), we tracked the human movement path, as illustrated in Fig. 3. Specifically, we used SAM2 to track the participant's central body point at a rate of two frames per second, resulting in a sequence of 3,600 yellow points over the 30-minute video. Linking these points by timecode generated the observed movement path. The coordinate axes correspond to the original image frame (1080 × 1920 pixels). Figs. 4b and 4d display the participant's stay positions and movement paths in the activating and neutral rooms, respectively.

Figure 3. AI-tracked human movement path in a 30-minute video in two rooms
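The path-reconstruction step can be sketched as follows, assuming the per-frame person masks produced by SAM2 are already available as boolean NumPy arrays (the mask extraction itself is omitted); frame size and sampling rate are placeholders to be adapted.

```python
# Minimal sketch: turn per-frame segmentation masks (e.g. SAM2 output at 2 fps)
# into a movement path by linking mask centroids in time order.
import numpy as np
import matplotlib.pyplot as plt


def mask_centroid(mask: np.ndarray) -> tuple[float, float]:
    """Return the (x, y) pixel centroid of a boolean segmentation mask."""
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())


def plot_movement_path(masks: list[np.ndarray], frame_size=(1080, 1920)) -> None:
    """Plot centroids linked in time order; frame_size is an assumed (width, height) in pixels."""
    points = [mask_centroid(m) for m in masks if m.any()]
    xs, ys = zip(*points)
    plt.plot(xs, ys, "-o", markersize=2)
    plt.xlim(0, frame_size[0])
    plt.ylim(frame_size[1], 0)   # image coordinates: y grows downwards
    plt.title("Reconstructed movement path")
    plt.show()
```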
To evaluate accuracy, we reviewed a result video composed of the 3,600 analyzed frames and identified mis-tracked frames. Out of 3,600 frames, the AI incorrectly tracked nine, achieving 99.75% accuracy in path tracking.
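This accuracy follows directly from the frame counts:

(3600 − 9) / 3600 = 3591 / 3600 = 0.9975 = 99.75%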
Object Manipulation. As an example of object manipulation, we examined participants' interactions with a soft cushion on the desk (labeled “Manipulate cushion” in Table 2). We analyzed when and for how long the participant interacted with the cushion. We used SAM2 to detect, segment, and track the relevant objects (the human hands and the cushion) twice per second and to cover them with masks. Based on the overlap of the different masks, we captured the timestamps and durations of human-cushion interactions. The AI-estimated interaction timestamps are represented by the orange line in Fig. 4.
We then compared these AI-estimated interaction timestamps with the timestamps coded by two human researchers (blue line in Fig. 4). According to the human-coded results, the video contained eight human-cushion interactions, all of which were correctly identified by the AI, but with one additional false detection (nine estimated interactions in total). Among the eight correctly detected interactions, five showed start and end times closely aligning with the human researchers' analysis, while three exhibited larger discrepancies in timing.

Figure 4. AI-estimated human-cushion interactions in 30 minutes of video in the activating room
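The overlap-based extraction of interaction intervals can be sketched as follows, assuming paired boolean masks for the hand and the cushion at each sampled frame (2 fps); the exact mask post-processing in our pipeline may differ.

```python
# Minimal sketch: derive human-cushion interaction intervals from mask overlap.
# `hand_masks` and `cushion_masks` are boolean arrays, one pair per sampled frame.
import numpy as np


def overlap_series(hand_masks, cushion_masks) -> list[bool]:
    """True wherever the hand mask and cushion mask share at least one pixel."""
    return [bool(np.logical_and(h, c).any()) for h, c in zip(hand_masks, cushion_masks)]


def to_intervals(flags: list[bool], fps: float = 2.0) -> list[tuple[float, float]]:
    """Convert a per-frame boolean series into (start_s, end_s) interaction intervals."""
    intervals, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i / fps                      # interaction begins
        elif not flag and start is not None:
            intervals.append((start, i / fps))   # interaction ends
            start = None
    if start is not None:                        # still interacting at the end of the video
        intervals.append((start, len(flags) / fps))
    return intervals
```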
Posture. We employed MMAction2 (Contributors, 2020) to analyze participants' postures during the first five minutes of the videos recorded in both the activating and neutral rooms. More specifically, within MMAction2, we used VideoMAE (Tong et al., 2022), an action recognition model, combined with the Atomic Visual Actions (AVA) dataset. The AI model generated posture predictions at a frequency of one frame per second and only reported results with confidence scores above 0.48. Figure 5 presents the AI-predicted postures, their frequency, and their temporal distribution.

Figure 5. AI-predicted postures in 5 minutes of video in two rooms
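The thresholding of the per-second predictions can be sketched as follows; the tuple-based input format is our own assumption about how the predictions are exported, not MMAction2's native API.

```python
# Minimal sketch: keep only confident per-second posture predictions and count them.
from collections import Counter

CONF_THRESHOLD = 0.48   # only predictions above this score are reported


def filter_and_count(predictions):
    """predictions: list of (second, label, score); returns kept predictions and label counts."""
    kept = [(t, label) for t, label, score in predictions if score > CONF_THRESHOLD]
    return kept, Counter(label for _, label in kept)


# Hypothetical example with three per-second predictions
kept, freq = filter_and_count([(0, "sit", 0.91), (1, "stand", 0.35), (2, "touch", 0.62)])
print(freq)   # Counter({'sit': 1, 'touch': 1}) -- "stand" is dropped (score below 0.48)
```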
We evaluated the AI-based posture recognition against human-coded reference data using three performance indicators: recall, precision, and F1-score (Yacouby & Axman, 2020) (Table 3). The AI model produced posture predictions for the 300 frames of each 5-minute video; these predictions were then compared frame by frame with the video annotations from the human researcher. For “get up,” “jump,” and “fall down,” the data were insufficient to compute these indicators, so these actions were excluded from the table.
Recall (1) quantifies the proportion of researcher-identified behavior instances correctly detected by the AI:

Recall = TP / (TP + FN)    (1)

where TP denotes true positives (frames in which both the AI and the researcher identified the posture) and FN false negatives (frames identified by the researcher but missed by the AI).
“Stand” showed low recall (18.4%), whereas “Bend/Bow” and “Touch” both exceeded 90%, indicating strong coverage by the AI. Precision (2) measures the proportion of correct AI predictions among all AI predictions:

Precision = TP / (TP + FP)    (2)

where FP denotes false positives (frames in which the AI predicted the posture although the researcher did not).
Except for “Stand,” all postures surpassed 63% precision, with particularly strong results for “Walk” and “Touch.” Finally, the F1-score (3), the harmonic mean of recall and precision, provides an overall performance metric:

F1 = 2 × (Precision × Recall) / (Precision + Recall)    (3)
“Carry/Hold” and “Touch” achieved high F1-scores, refecting robust overall detection, while “Stand” remained problematic under the tested conditions.
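A minimal sketch of this frame-by-frame comparison is given below, assuming both the AI output and the human coding are represented as one set of posture labels per analysed frame; it illustrates the metric definitions rather than our exact evaluation script.

```python
# Minimal sketch: recall, precision, and F1 for one posture label, computed
# frame-by-frame against human-coded reference labels.
def frame_metrics(ai_frames, human_frames, target):
    """ai_frames / human_frames: one set of posture labels per analysed frame."""
    pairs = list(zip(ai_frames, human_frames))
    tp = sum(1 for a, h in pairs if target in a and target in h)        # both agree
    fp = sum(1 for a, h in pairs if target in a and target not in h)    # AI only
    fn = sum(1 for a, h in pairs if target not in a and target in h)    # researcher only
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else float("nan")
    return recall, precision, f1


# Hypothetical example with three frames
print(frame_metrics([{"sit"}, {"stand"}, {"touch"}],
                    [{"sit"}, {"sit"}, {"touch"}], target="sit"))
```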
Table 3. Performance analysis of AI posture recognition for participant 3382

5. Discussion
5.1. Performance of AI models for behavior analysis
In this study, current vision-based open-source AI models showed promising capabilities, especially in tracking, detection, and segmentation. Moreover, several AI models demonstrate good usability for non-experts, providing no-code browser interfaces and demos. Notably, tools like SAM2 and YOLOv8 provide accessible interfaces and comprehensive documentation, making them more user-friendly for design researchers with limited coding backgrounds. In contrast, more specialized tools such as MMAction2 may remain challenging for researchers without coding experience.
Overall, the tested AI models achieved adequate accuracy in most tasks. In spatial movement analysis, the AI-generated movement paths closely matched the human-coded references, demonstrating high accuracy. In object manipulation, the AI effectively identified all interactions with the object, though it struggled to estimate interaction durations; these inaccuracies may be caused by the proximity of the human hand to the object. In posture recognition, the AI's overall accuracy was moderate, achieving F1-scores above 57% for most predicted postures.
Two main types of AI error emerged in posture recognition: (1) failure to respond to a target behavior when it occurred, and (2) misclassifications, most notably erroneous predictions of standing and bending/bowing. These issues are likely related to the camera's top-down angle, which resulted in substantial occlusion of participants' lower bodies and thus limited the model's ability to interpret torso positions accurately. In contrast, the upper body and hands were generally visible, facilitating more accurate recognition of hand-related actions. For instance, the system effectively identified interactions between the hands and the environment, such as carrying/holding and touching objects. However, it struggled to infer their specific context and purpose. As a result, it grouped a range of specific behaviors (e.g., adjusting the height of a desk, holding a chair's armrest, or using a mobile phone) under a single generic category, “carry/hold”.
5.2. Limitations and future work
Several limitations apply to this study. We used only top-down camera angles; incorporating multiple camera perspectives could improve the accuracy of posture and action recognition, especially for lower-body movements. Moreover, accurately detecting eye gaze from video alone is significantly limited by camera angles and the inherent limitations of AI models. Integrating sensor-based technologies (e.g., eye-tracking devices) with AI-driven video analysis may offer a more comprehensive approach. While AI-based posture and action recognition has great potential for making behavior analysis more automatic, its direct application in design research is still challenging. Achieving higher accuracy and context-specific understanding will likely require domain-specific datasets and advanced algorithms tailored to design-related behaviors.
Moving forward, we plan to refine and expand the space-related behavior categories proposed in Section 4.3, ensuring they reflect the common needs of human behavior analysis and design research. Moreover, we will break down the generally observed actions into fine-grained actions to improve the generalizability of AI-detected behaviors across diverse design-related research scopes. Building on that, we aim to develop an AI-enhanced analysis toolkit to detect behavior patterns in design-related research more automatically. Future work will also include direct comparisons between human-coded and AI-based behavior analysis to more thoroughly assess the advantages and limitations of current AI models in design research contexts.
6. Conclusions
This study introduced a framework for leveraging vision-based AI models to analyze human behavior in spatial environments. By aligning AI capabilities with specific analytical tasks, we aimed to enhance the efficiency of video-based behavior analysis and reduce the extensive manual effort traditionally involved. Moreover, we compiled currently available vision-based AI models and tools that offer reliable tracking, detection, and segmentation capabilities, providing a reference for design researchers. Additionally, we identified a set of commonly observed spatial behaviors and tested suitable AI tools on them.
Our applications demonstrated how identified behavior categories—ranging from spatial movement to posture recognition—could be examined following the proposed framework. These practical implementations provide valuable guidance for researchers seeking to incorporate AI into their analyses, illustrating the processes and considerations necessary for successful integration. Finally, our evaluation of the AI models' performance offers new insights for design researchers, pointing toward strategies for refining AI-enhanced human behavior analysis and integrating emerging AI technologies into the study of human behavior in design settings.