
Assessing two methods of webcam-based eye-tracking for child language research

Published online by Cambridge University Press: 07 May 2024

Margaret Kandel*
Department of Psychology, Harvard University, USA

Jesse Snedeker
Department of Psychology, Harvard University, USA

*Corresponding author: Margaret Kandel; Email: mkandel@g.harvard.edu

Abstract

We assess the feasibility of conducting web-based eye-tracking experiments with children using two methods of webcam-based eye-tracking: automatic gaze estimation with the WebGazer.js algorithm and hand annotation of gaze direction from recorded webcam videos. Experiment 1 directly compares the two methods in a visual-world language task with five- to six-year-old children. Experiment 2 more precisely investigates WebGazer.js' spatiotemporal resolution with four- to twelve-year-old children in a visual-fixation task. We find that it is possible to conduct web-based eye-tracking experiments with children in both supervised (Experiment 1) and unsupervised (Experiment 2) settings; however, the webcam eye-tracking methods differ in their sensitivity and accuracy. Webcam video annotation is well suited to detecting the fine-grained looking effects relevant to child language researchers. In contrast, WebGazer.js gaze estimates appear noisier and less temporally precise. We discuss the advantages and disadvantages of each method and provide recommendations for researchers conducting child eye-tracking studies online.
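
For readers unfamiliar with the automatic method, the snippet below is a minimal sketch of how WebGazer.js streams gaze estimates. It illustrates the library's documented public API in JavaScript; it is not the code used in these experiments, and the comments are our own.

    // Minimal WebGazer.js setup: stream (x, y) gaze estimates in page pixels.
    // Assumes webgazer.js has already been loaded on the page via a <script> tag.
    webgazer.setGazeListener((data, elapsedTime) => {
        if (data == null) return;                  // no gaze estimate for this video frame
        console.log(data.x, data.y, elapsedTime);  // viewport coordinates + ms since begin()
    }).begin();                                    // start webcam capture and gaze prediction

    // When the task is finished, stop tracking and release the webcam:
    // webgazer.end();

A visual-world experiment built on this pattern would typically record each (x, y, elapsedTime) triple rather than logging it, for offline alignment with trial events.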

Information

Type: Article
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Figure 1. Example Experiment 1A (left) and Experiment 1B (right) trials. Each competitor image (e.g., mitten) appeared with its own target in the cohort condition (e.g., milk, right) and with another target in the control condition (e.g., banana, left). Image canvas borders turned from gray to purple when WebGazer estimated eye-gaze to fall on the image. Stills include images from Duñabeitia et al. (2018) and Rossion and Pourtois (2004).

Figure 2. Mean WebGazer looks to the target and competitor images by condition in Experiment 1A. Ribbons indicate standard error. Vertical lines indicate average target word duration. Shading indicates when looks to the target image differed from chance.

Figure 3. Mean WebGazer looks to the competitor image by condition in Experiment 1A. Ribbons indicate standard error. Vertical lines indicate average target word duration. Shading indicates when looks between conditions were reliably different in the cluster analysis.

Figure 4. Mean WebGazer looks to the target image, competitor image, and distractor images (collapsed) by condition in Experiment 1B. Ribbons indicate standard error. Vertical lines indicate average target word duration. Shading indicates the temporal overlap of the clusters when target-side looks differed from chance in both the horizontal and vertical directions.

Figure 5. Boxplot of participant WebGazer fixation proportions to the target and non-target images in the Experiment 1B control trials from 1200 to 2000 ms after target onset. Mean fixation proportions for each image are labeled and identified by black diamonds. The gray points represent participant means.

Figure 6. Mean WebGazer looks to the competitor image by condition in Experiment 1B. Ribbons indicate standard error. Vertical lines indicate average target word duration.

Figure 7. Mean looks to the target and competitor images by condition in the Experiment 1A annotated webcam video data and in the WebGazer data from the same participants. Ribbons indicate standard error. Vertical lines indicate average target word duration. Shading indicates when looks to the target image differed from chance.

Figure 8. Mean looks to the competitor image by condition in the Experiment 1A annotated webcam video data and in the WebGazer data from the same participants. Ribbons indicate standard error. Vertical lines indicate average target word duration. Shading indicates when looks between conditions reliably differed.

Figure 9. Mean looks to the target and competitor images by condition in the Experiment 1B annotated webcam video data and in the WebGazer data from the same participants. Ribbons indicate standard error. Vertical lines indicate average target word duration. Shading indicates the temporal overlap of the clusters when target-side looks differed from chance in both the horizontal and vertical directions.

Figure 10. Boxplot of participant fixation proportions to the target and non-target images in the Experiment 1B control trials from 700 to 2000 ms after target onset for the annotated webcam video data. Mean fixation proportions for each image are labeled and identified by black diamonds. The gray points represent participant means.

Figure 11. Mean looks to the competitor image by condition in the Experiment 1B annotated webcam video data and in the WebGazer data from the same participants. Ribbons indicate standard error. Vertical lines indicate average target word duration. Shading indicates when looks between conditions reliably differed.

Table 1. Experiment 2 participant ages

Figure 12. The 13 possible target stimulus locations in Experiment 2. The panel represents the full experiment screen (the axis labels indicate percentage of screen-size).

Table 2. Experiment 2 mean participant calibration scores by age group

Figure 13. Mean Euclidean distance (in percentage of screen-size) from the target stimulus over the course of the trial. Error bars indicate standard deviation. Ribbons indicate standard error.
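
Here and in Figures 14 and 15, "Euclidean distance in percentage of screen-size" can be read as ordinary Euclidean distance over screen-normalized coordinates. The formula below is our gloss of the metric, assuming the WebGazer estimate (x_g, y_g) and the target center (x_t, y_t) are both expressed as percentages of screen width and height:

    d = \sqrt{(x_g - x_t)^2 + (y_g - y_t)^2}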

Figure 14. Mean Euclidean distance (in percentage of screen-size) from the target stimulus over the course of the trial, broken down by participant calibration score. Ribbons indicate standard error.

Figure 15. Mean Euclidean distance (in percentage of screen-size) from the target stimulus over the course of the trial, broken down by participant age bin. Ribbons indicate standard error.

Figure 16. Density plots indicating estimated looks on the screen 500–1500 ms after target onset for each possible target location. Each panel represents the full experiment screen (the axis labels indicate percentage of screen-size), and the black crosses indicate the center of the target locations.

Figure 17. Quadrant looks over time for the Experiment 2 child participants and Slim and Hartsuiker's (2022) adult participants. Ribbons indicate standard error. Shading indicates the temporal overlap of the clusters when target-side looks differed from chance in both the horizontal and vertical directions.