The growing use of computer-based assessments has produced complex process data that capture learners’ cognitive and behavioral processes in real time. Among these, eye-tracking data provide rich temporal information on how individuals attend to and process visual information during problem solving. Yet analyzing such high-dimensional, temporally dependent, and multimodal data remains a methodological challenge. This study introduces a two-component data-analytic framework (DAK) for integrating and interpreting structured and unstructured data in educational assessments. The first component employs a time-aware long short-term memory (LSTM) autoencoder to extract latent features that represent dynamic visual attention patterns. The model extends conventional LSTM autoencoder architectures in three ways: it incorporates fixation duration and the elapsed time between actions as inputs, it uses a data-driven temporal decay function, and it optimizes a multi-target reconstruction objective. The second component analyzes the extracted features through clustering, categorical data analyses, and mixed-effects modeling to generate construct-relevant validity evidence for test-taking and learning behaviors. We demonstrate the DAK using structured scores and unstructured eye-tracking data from a spatial rotation learning program. The results reveal distinct behavioral patterns linked to test performance and intervention effectiveness, highlighting the potential of multimodal process data to advance psychometric modeling and instrument design.