
Measuring joint attention in co-creation through automatic human activity recognition

Published online by Cambridge University Press: 12 August 2025

Tao Shen
Affiliation:
College of Design and Innovation, Tongji University, Shanghai, China
Yanyi Li
Affiliation:
College of Surveying and Geo-Informatics, Tongji University, Shanghai, China
Yongqi Lou*
Affiliation:
College of Design and Innovation, Tongji University, Shanghai, China
Chun Liu*
Affiliation:
College of Surveying and Geo-Informatics, Tongji University, Shanghai, China
Danwen Ji
Affiliation:
College of Design and Innovation, Tongji University, Shanghai, China
Man Zhang
Affiliation:
Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai, China
Ying Li
Affiliation:
College of Design and Innovation, Tongji University, Shanghai, China
* Corresponding authors: Yongqi Lou and Chun Liu; Emails: louyongqi@tongji.edu.cn; liuchun@tongji.edu.cn

Abstract

Joint attention is a critical component of co-creation in design research, linking cognitive actors through dynamic interactions. This study introduces an approach that uses deep learning algorithms to quantify joint attention objectively, an advance over traditional subjective methods. We developed an optimized deep learning algorithm, YOLO-TP, to accurately identify participants’ engagement in design workshops. We video-recorded design workshops and analyzed the recordings with YOLO-TP to track and measure instances of joint attention. The key findings show that the algorithm quantifies joint attention with high reliability and that its measurements correlate well with known measures of intersubjectivity and co-creation effectiveness. The approach provides a more objective measure of joint attention and also allows real-time analysis of collaborative interactions. These results suggest that integrating automated human activity recognition into co-creation can significantly enhance the understanding and facilitation of collaborative design processes, potentially leading to more effective design outcomes.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2025. Published by Cambridge University Press
Table 1. List of important references

Table 2. Reference source comparison table of important indicators

Figure 1. YOLO v5s network structure diagram.

Figure 2. Improvement of V2 compared with V1.

Figure 3. ASFF module structure diagram.

Figure 4. SKAttention module structure diagram.

Figure 5. Schematic of the improved YOLO-TP.

Figure 6. Comparison of the effect between the improved YOLO-TP network structure and the traditional network: (a) comparison of personnel target detection accuracy, (b) comparison of mAP50 and mAP50:90 accuracy indicators, (c) comparison of F1-Score model evaluation indicators and (d) comparison of training accuracy loss.

Figure 7. Test results of real design workshop environment.

Figure 8. Example of tagged information picture after YOLO-TP processing.

Table 3. List of video data of the design workshop

Figure 9. Relationship between image coordinates and real coordinates.

Figure 10. Key indicator statistics flow chart.

Table 4. Reliability evaluation table for 8 indicators (compared with manual interpretation)

Table 5. Factor loadings of the generalized principal component

Table 6. Linear combination coefficients and weights

Table A1. Statistical algorithm flow of the number of people in the scene

Table A2. Statistical algorithm flow of the number of activity tracks

Table A3. Statistical algorithm flow of the number of people in key areas

Table A4. Statistical algorithm flow of time of appearance of people in key areas

Table A5. Statistical algorithm flow of frequency of eye contact

Table A6. Statistical algorithm flow of the frequency of common facial expressions

Table A7. Statistical algorithm flow of mutual social distance

Table A8. Statistical algorithm flow of the frequency of common attention

Table A9. Comparison between algorithm calculation of 8 indicators and manual interpretation data