Analyzing Complex Educational Data: A Data Analytic Framework for Integrating Structured and Unstructured Eye-Tracking Data

Published online by Cambridge University Press: 23 March 2026

Luyang Fang
Affiliation:
Statistics, University of Georgia, USA
Shiyu Wang*
Affiliation:
Educational Psychology, University of Georgia, USA
Yinghan Chen
Affiliation:
Mathematics and Statistics, University of Nevada Reno, USA
Susu Zhang
Affiliation:
Psychology; Statistics, University of Illinois at Urbana-Champaign, USA
Zichu Liu
Affiliation:
Educational Psychology, University of Georgia, USA
Wenxuan Zhong
Affiliation:
Statistics, University of Georgia, USA
*Corresponding author: Shiyu Wang; Email: swang44@uga.edu

Abstract

The growing use of computer-based assessments has produced complex process data that capture learners’ cognitive and behavioral processes in real time. Among these, eye-tracking data provide rich temporal information on how individuals attend to and process visual information during problem solving. Yet analyzing such high-dimensional, temporally dependent, and multimodal data remains a methodological challenge. This study introduces a two-component data-analytic framework (DAK) for integrating and interpreting structured and unstructured data in educational assessments. The first component employs a time-aware long short-term memory (LSTM) autoencoder to extract latent features representing dynamic visual attention patterns. The model extends conventional architectures by incorporating fixation duration and elapsed time between actions, using a data-driven temporal decay function, and optimizing a multi-target reconstruction objective. The second component integrates these extracted features through clustering, categorical data analyses, and mixed-effects modeling to generate construct-relevant validity evidence for test-taking and learning behaviors. We demonstrate the DAK using structured scores and unstructured eye-tracking data from a spatial rotation learning program. Results reveal distinct behavioral patterns linked to test performance and intervention effectiveness, highlighting the potential of multimodal process data to advance psychometric modeling and instrument design.

Information

Type
Application and Case Studies - Original
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Psychometric Society

Figure 1 The example of the fixation sequence for one participant on one question.

Table 1 Structured and unstructured data from each participant

Figure 2 Example of a fixation sequence from one participant to one question.

Figure 3 The distribution of fixation sequence lengths for test block 1.

Figure 4 Distribution of fixation duration for each action (test block 1).

Figure 5 Five transition patterns of consecutive fixations for test block 1 sequences.

Figure 6 The proposed data analytic framework.

Figure 7 Autoencoder model structure. (a) Overall architecture incorporating temporal information. (b) Notation and data representation at the sequence level.

Figure 8 An illustration of a T-LSTM memory block.

Figure 9 (a) The proposed data-driven function. (b)–(d) Three potential alternative functions.

Table 2 Comparison of model performance across different metrics (Accuracy, BLEU, and ROUGE) for the testing blocks, with their chosen hyperparameters

Table 3 Comparison of model performance across different metrics (Accuracy, BLEU, and ROUGE) for the learning blocks, with their chosen hyperparameters

Figure 10 (a) Hierarchical clustering dendrogram. (b) Silhouette analysis.

Table 4 Descriptive statistics for five testing clusters

Figure 11 The distribution of six fixation locations for each cluster.

Table 5 Summary statistics of sequence length distribution for each cluster

Figure 12 The adjusted transition matrix for five clusters.

Table 6 Problem solving (PS) behaviors interpretation

Figure 13 The mosaic plot of contingency table between item type and testing cluster in two tests.

Table 7 Generalized linear mixed model results

Figure 14 (a) Scree plot showing the percentage of variance explained by each principal component. (b) Hierarchical clustering dendrogram. (c) Silhouette analysis.

Table 8 Descriptive statistics for four learning clusters

Figure 15 The distribution of six fixation locations for each cluster.

Table 9 Summary statistics of sequence length distribution for each cluster

Figure 16 The adjusted transition matrix for four clusters.

Table 10 Learning behaviors interpretation

Figure 17 The mosaic plot of the two contingency tables.

Figure 18 The plot of contingency table between LB2 cluster mode and TB2 cluster mode.

Supplementary material: File

Fang et al. supplementary material

File 163.2 KB