1. Introduction
Vocabulary knowledge concerns learners’ understanding of words in terms of their form, meaning, and use. It serves as a vital measure of learners’ overall language development (Nation, 2001). This becomes particularly significant during the primary grades, which serve as a pivotal period for young learners’ language acquisition and overall linguistic growth. During this stage, vocabulary learning provides a foundation for future language skills and academic success. Research also suggests that the working memory (WM) capacity of young learners may influence their ability to acquire and retain new vocabulary (Martin & Ellis, 2012; Teng, 2024a). WM refers to the temporary storage and manipulation of information relevant to a particular cognitive task (Baddeley & Hitch, 1974). It acts as a mental workspace and plays a determining role in learners’ outcomes in vocabulary acquisition (Gathercole et al., 1992). Therefore, understanding the intricate relationship between WM and breadth of vocabulary knowledge (BVK) during the primary grades holds both theoretical and practical significance. A clearer understanding of this linkage may inform instructional approaches and interventions that support vocabulary development by taking learners’ cognitive resources into account.
Despite the significance of BVK and WM for young learners, there has been limited attention to understanding the developmental patterns of these variables. Little is known about how WM influences vocabulary knowledge development over time from a longitudinal perspective, and vice versa. Therefore, the current study aimed to address this gap by investigating the longitudinal trajectories of WM and BVK in foreign language young learners across Grades 1–5. By examining the developmental trends of these cognitive factors, this study aimed to provide valuable insights into the domain of vocabulary research from a cognitive perspective. Understanding how WM and BVK evolve over time can contribute to a deeper understanding of the cognitive processes involved in vocabulary acquisition and retention. The findings have the potential to shed light on the dynamic nature of working memory and its relationship with the acquisition and expansion of BVK in a foreign language context. The aim was to make a contribution to the ongoing debates regarding whether WM operates as a domain-general or domain-specific cognitive function, specifically in the context of BVK.
Based on data from a sample of foreign language learners aged 6–7, the findings are interpreted through the Cognitive Filter Model and the Transactional Model in a longitudinal foreign language learning context. In the Cognitive Filter Model, individual differences in WM are assumed to be trait-like and unaffected by students’ BVK. Under this view, WM predicts later foreign language outcomes (including vocabulary learning) but is not itself influenced by prior vocabulary knowledge. In the Transactional Model, by contrast, individual differences in WM are expected to be influenced by students’ BVK, implying bi-directional coupling: WM supports vocabulary learning, and accumulating vocabulary knowledge reduces processing load and strengthens representational efficiency, which in turn enhances WM performance. Thus, this study makes a twofold contribution: first, it adjudicates whether WM operates as a stable, exogenous learner characteristic that filters instructional input; second, it tests whether WM and BVK co-develop, with vocabulary growth feeding back to shape WM over time.
2. Literature review
2.1. Working memory
Baddeley and Hitch’s seminal work in 1974, which was later updated by Baddeley (2000), presents a comprehensive view of WM as a multi-component system. In this model, WM is not a unitary construct but rather comprises multiple interacting components. The model posits that information from modality-specific short-term memory stores is dynamically processed and manipulated, allowing for cognitive tasks to be executed effectively. The episodic buffer, a critical addition in Baddeley’s (2000) update, plays a pivotal role by serving as a temporary workspace that integrates information from various modalities and binds it into meaningful chunks. The Baddeley and Hitch model has been influential in our understanding of cognitive processes, providing a nuanced framework for how information is actively manipulated and processed in the mind and how various components of memory interact to support cognitive functions. Based on this model, WM is often understood as the cognitive capacity of learners to retain and process a certain amount of information simultaneously for a brief period while engaging in various tasks (Baddeley, 1992).
Over time, researchers have increasingly recognized the intertwined relationship between WM and vocabulary learning, with WM being identified as a significant source and driving force behind the development and refinement of cognitive models of vocabulary acquisition (Ellis & Sinclair, 1996). In the field of second-language acquisition, Baddeley’s multi-component model of working memory has emerged as the most influential framework for investigating this cognitive process. Recent studies have highlighted distinct yet positive roles of phonological and executive working memory in relation to specific sub-skills of second language learning, indicating a growing trend in the field (Wen & Skehan, 2021). Consequently, it has become crucial to explore individual differences in executive functions such as updating, task switching, and inhibitory control to gain a comprehensive understanding of working memory in language learning (Miyake, 2001). By examining these executive functions, researchers can shed light on the intricate mechanisms and variations in working memory processes, ultimately contributing to a deeper comprehension of second language acquisition.
2.2. Vocabulary knowledge
When considering vocabulary knowledge, it is important to go beyond simple form–meaning links and to specify how such knowledge is assessed. Following Nation (2001), vocabulary knowledge spans multiple facets, including derivations, collocations, and associations, that can be measured receptively (recognition-based) or productively (recall/use-based). Receptive assessments commonly include multiple-choice or matching tests of word–meaning recognition, yes/no checklist formats with pseudo-words, and lexical decision tasks. Productive assessments typically require active retrieval or use, such as cued recall of meanings given forms, L2 → L1 translation production, gap-fill/cloze with no or limited options, controlled writing tasks targeting target items, or prompted sentence production to elicit collocations and derivational forms.
Anderson and Freebody’s (1981) breadth–depth framework, further elaborated by Schmitt (2008), aligns naturally with these assessment modes. BVK refers to how many lexical items a learner knows and is most often measured receptively, for example with standardized vocabulary size tests or yes/no checklists that estimate coverage across frequency bands. Productive breadth can also be sampled, though less commonly, via controlled productive vocabulary levels tests that require learners to supply target words in context. Depth of vocabulary knowledge concerns how well a word is known and typically relies on tasks that probe multiple dimensions of a single item. Receptive depth measures may include tests of polysemy recognition, synonym–antonym networks, collocation recognition, and derivational family recognition. Productive depth measures ask learners to produce appropriate collocates, supply derivational variants (noun/verb/adjective forms), generate context-appropriate senses of polysemous items, or construct sentences that demonstrate accurate combinatorial use.
In this study, we focused solely on vocabulary breadth, which provides a clean, stable window into lexical growth that is well suited to repeated measurement. Breadth, defined as the number of lexical items known, is typically assessed receptively, yields scalable coverage estimates across frequency bands, and is less entangled with task-specific production demands. This makes it more sensitive to early-stage gains, precisely where WM processes of maintenance and integration are expected to exert their strongest influence (Gathercole et al., 1992). By limiting outcomes to breadth, this study avoids mixing constructs such as collocational fluency, derivational control, and context-sensitive usage, keeping the operationalization aligned with hypothesized mechanisms: WM helps learners hold unfamiliar forms and connect them with pre-existing semantic categories, expanding recognition-based coverage before deeper combinatorial knowledge consolidates.
2.3. Theoretical frameworks on linking working memory and vocabulary knowledge
The link between WM and vocabulary acquisition is theorized through three complementary mechanisms. First, the phonological loop maintains unfamiliar sound sequences, supporting initial encoding and phonolexical mapping; behavioural indices such as non-word repetition and digit span often forecast early word learning and short-term gains, especially for foreign language young learners (Teng, 2025). Second, executive control processes enable consolidation and integration: updating incorporates new evidence into evolving lexical representations, task switching supports flexible strategy deployment across language input types, and inhibitory control limits interference from L1 or competing L2 candidates, aiding precise form–meaning binding (Swanson et al., 2011). Third, the episodic buffer binds phonological, semantic, visual, and contextual cues into chunked representations (Baddeley, 2000), a process that is particularly consequential in foreign language learning environments and for developing collocational and associative depth in EFL vocabulary learning. Together, these mechanisms predict that WM is diagnostic of the initial uptake of unfamiliar forms (early breadth gains).
Integrating these strands, vocabulary and WM may be reciprocally related over time: vocabulary growth strengthens representations and reduces processing load, potentially feeding back to ease WM demands. Operationally, WM can be indexed via phonological measures (forward digit span, word/non-word span, non-word repetition) and executive measures (backward digit span, counting/complex span, tasks targeting updating, task switching, and inhibitory control) (Wen, 2016). Vocabulary breadth can be assessed with standardized receptive size measures and depth via instruments capturing polysemy, collocations, derivational knowledge, and associative strength. To test directional and reciprocal claims with stronger causal inference, a multi-wave cross-lagged panel modelling (CLPM) design can disentangle temporal precedence (WM → later vocabulary; vocabulary → later WM), evaluate bi-directionality, and control for stable between-person differences, thereby moving beyond concurrent correlations to illuminate the dynamic interplay between WM components and the breadth and depth of L2 vocabulary development.
2.4. Empirical studies on working memory and vocabulary learning
To support vocabulary learning, WM plays a crucial role as learners need to access recently presented information and effectively process and integrate it with their prior knowledge to construct a coherent mental model. Studies have been conducted with first language (L1) speakers. For example, Martin and Ellis (2012) conducted a study to examine the influence of phonological short-term memory and executive working memory on vocabulary and grammar learning in a foreign language context. The study involved 50 monolingual native English speakers as participants. The results revealed significant effects of both phonological short-term memory and executive working memory on the participants’ vocabulary and grammar learning. In another study by Stokes and Klee (2009), the relative impact of various factors on young learners’ vocabulary development was explored. The participants were 232 monolingual, English L1 children aged 24–30 months. Demographic, cognitive, behavioural, and psycholinguistic factors were examined, and the results revealed that non-word repetition, an important component of working memory, was the only significant unique predictor of vocabulary learning scores.
While the earlier studies focused on L1 students, there have also been investigations specifically targeting L2 young learners. Morra and Camba (2009) conducted a study involving 161 primary school learners learning English as a second language in Italy to examine the relationship between WM and vocabulary learning. The participants’ vocabulary knowledge was assessed using the Primary Mental Abilities (PMA) battery, while WM was assessed through tasks such as forward digit span, backward digit span, non-word repetition, and counting span. The findings from linear structural relation models indicated that working memory capacity was a significant predictor of learning all non-word types, but not short native non-words. Swanson et al. (2011) investigated the predictive power of phonological working memory (PWM) measures, including forward digit span, backward digit span, word span, and non-word span tasks, on the performance of 471 first- to third-grade Spanish L1 ESL students in vocabulary measures. The results showed that PWM measures were predictive of vocabulary outcomes, independent of the students’ L1 phonological processing skills. Vulchanova et al. (2014) tested 84 Norwegian 10-year-old children learning English as an L2 on short-term memory, L1 language competence (semantics and grammar), and L2 skills (vocabulary and comprehension). Results showed significant correlations between PWM, measured through a forward digit span task, and L2 vocabulary knowledge. Additionally, Verhagen and Leseman (2016) focused on a sample of 63 Turkish children who learned Dutch as an L2 and 45 Dutch monolingual children (mean age = 5 years). Results demonstrated that PWM, measured using word span and non-word span tasks, significantly predicted vocabulary knowledge and grammar in L1 children and in L2 children who learn their L2 naturalistically. These findings collectively indicate the importance of working memory in L2 young learners’ vocabulary development. They highlight the significant role of working memory measures, such as digit span tasks and non-word repetition, in predicting vocabulary learning outcomes in various language learning contexts. Understanding the relationship between working memory and vocabulary learning in young learners can inform the design of effective instructional strategies to support vocabulary acquisition.
The relationship between PWM and L2 vocabulary acquisition may be influenced by learners’ existing L2 vocabulary knowledge (Cheung, 1996; Masoura & Gathercole, 2005). For example, Cheung (1996) found that scores on a PWM measure (specifically, a non-word span task) predicted the number of trials required for seventh-grade learners to learn new L2 words. However, this relationship was only observed in participants with English vocabulary sizes below the group median. Similarly, Masoura and Gathercole (2005) proposed that as L2 learners become more familiar with L2 phonological forms, they rely increasingly on the L2 lexical forms stored in their long-term memory, diminishing the reliance on PWM as measured by non-word span tasks. In contrast to the earlier findings, Kormos and Sáfár (2008) discovered a significant correlation between PWM (measured by a non-word span task) and vocabulary range in Hungarian L1 ESL learners at the pre-intermediate level. However, no significant correlations were found between PWM and vocabulary test scores among the beginner-level learners. These findings point to a complex and varied relationship between PWM and L2 vocabulary acquisition. While some studies suggest that the influence of PWM diminishes with increased L2 proficiency and reliance on lexical forms (Masoura & Gathercole, 2005), others have found significant correlations between PWM and vocabulary measures (Martin & Ellis, 2012). Further research is needed to better understand the interplay between WM and vocabulary knowledge.
However, studies in a foreign language context seem to depict a different picture. Building on previous research, Teng and Zhang (2023) investigated the impact of phonological short-term memory and executive working memory on vocabulary learning within a multimedia learning context, specifically utilizing a combination of definition, word information, and video materials. The study focused on university students in a foreign language context. The findings indicated the beneficial roles of phonological short-term memory and executive working memory in facilitating vocabulary learning. It was observed that learners with higher levels of working memory capacity tended to achieve better outcomes in vocabulary learning (see also Teng, 2024b, for the role of WM in vocabulary learning). However, when the demands of a vocabulary learning activity exceeded their working memory capacity, even with the provision of multiple types of language input, learners were likely to experience less satisfactory vocabulary learning outcomes. Liu et al. (2024) focused on a total of 84 Mandarin-speaking college students (ranging from 20 to 22 years old) in Taiwan, another foreign language learning context. Results showed that the 95% confidence intervals of vocabulary scores of the high-WM groups intersected with those of the low-WM groups. This result points to the possibility of the moderating effect of WM on vocabulary acquisition. In addition, results also suggested that working memory modulated the impact of attentional manipulation on productive vocabulary gains. Wang et al. (2024), focusing on 103 Chinese school-aged children with intellectual disabilities in a foreign language context, supported the role of WM in vocabulary learning, but not the role of vocabulary learning in WM.
In a recent longitudinal study, Teng (2025) examined the longitudinal effects of meta-cognitive knowledge, working memory, and non-verbal intelligence on vocabulary knowledge (VK) growth in 210 foreign language young learners in China. Results showed that the baseline level of WM was positively linked to both the initial level and the growth rate of VK. In other words, participants who began with higher WM also started with higher VK, and their VK increased more rapidly over time. These findings underscore the importance of considering individual differences in working memory capacity when evaluating vocabulary learning outcomes. Learners with lower working memory capacity may benefit from tasks that do not overload their cognitive resources, leading to improved vocabulary learning outcomes, while at the same time, higher capacity learners can be challenged with richer input and more complex retrieval schedules to maximize their vocabulary growth without inducing unnecessary cognitive strain. Another noted gap is the absence of cross-lagged panel modelling (CLPM) in previous studies to clarify the directional, reciprocal links between young learners’ working memory and vocabulary learning outcomes. Employing CLPM across multiple waves would help disentangle temporal precedence, test bi-directionality (WM → later vocabulary; vocabulary → later WM), and control for stable between-person differences, yielding stronger causal inferences than concurrent or simple longitudinal correlations.
3. The present study
The aim of this study was to investigate whether WM ability predicts subsequent acquisition of BVK in cognitively normal young learners, and vice versa, using a longitudinal cross-lagged panel model. Children with low WM capacity might struggle to recall the correct information, leading to difficulties in concurrent performance on BVK tests. Conversely, children with higher WM capacity are expected to perform better on such tests. Considering the short-term stability of WM capacity, WM capacity at Time 1 is also expected to predict BVK at Time 2, reflecting consistent retrieval problems throughout the academic years. However, given the cascading effects of knowledge on WM (Ericsson & Kintsch, 1995), it is also expected that accumulated vocabulary knowledge at Time 2 would predict WM performance at Time 3.
Therefore, both concurrent and cross-lagged relationships are expected to emerge over time, forming a transactional model rather than a cognitive filter model (Sweller, 2011). In the cognitive filter model, individual differences in WM would not be affected by the students’ BVK, while in the transactional model, individual differences in WM would be influenced by the students’ BVK. This study thus addresses a key theoretical gap by modelling the relationship between WM and vocabulary as bi-directional rather than one way. Components of WM, including the phonological loop, episodic buffer, and central executive, are expected to facilitate the initial uptake of unfamiliar word forms and their integration with semantic long-term memory, accelerating growth in recognition. In turn, expanding breadth is expected to re-shape WM efficiency: as learners accumulate more words, they can leverage stronger category structures and chunking routines to reduce maintenance load and guide attention more effectively. According to Baddeley’s (2000) model, the episodic buffer organizes new information into meaningful chunks by drawing on semantic long-term memory; as vocabulary breadth grows, these pre-existing structures become richer, further shaping how information is processed within WM and potentially improving WM performance over time. Figure 1 illustrates the hypothesized transactional model. This study attempts to answer two questions:
1. What are the co-varying relations between learners’ working memory and breadth of vocabulary knowledge during the primary grades?
2. What is the relative significance of working memory on breadth of vocabulary knowledge, and vice versa?

Figure 1. The hypothesized model depicting WM and BVK.
4. Method
4.1. Participants
The study included a total of 158 children (84 boys and 74 girls) aged 6–7 years (M = 77.1 months, SD = 3.41 months). The participants were followed from the end of Grade 1 to the end of Grade 5. This study lasted for 5 years. To ensure that the participants had received a certain level of English instruction, the tests were administered at the end of each grade level. Most of the participants began their English learning journey in Grade 1 of primary school, while some reported starting their English learning in kindergarten. The participants were thus at the beginning level of learning English.
The original sample consisted of Grade 1 students (n = 210) from a prestigious primary school located in a developed city in southern China. However, the final dataset used in the analysis included 158 participants. The attrition of 52 students resulted from a combination of factors: lack of parental consent at later waves, student transfers to other schools during the study period, excessive absenteeism on testing days, and incomplete or unusable data due to non-compliance or administration errors. Prior to data collection, parental permission was obtained to gather data from the children. The participants were homogeneous in terms of ethnicity and cultural background, which is typical in Chinese schools. All participants spoke Chinese as their first language and were learning English as a foreign language (EFL). None of the participants were enrolled in special education classes.
4.2. Measures
The primary objective of this study was to investigate the cognitive factors that contribute to the acquisition of vocabulary knowledge, with a specific focus on WM. The assessment of these variables was carried out individually. To ensure accurate and standardized testing procedures, a team of 16 investigators with substantial teaching experience and specialized training in psychological assessment conducted all tests. The measures employed in this study are described as follows:
Working memory. In order to assess the working memory capacity of the learners, we utilized the Working Memory Power Test (WMPT) developed by Freeman et al. (2017). The WMPT is specifically designed to measure various aspects of memory performance for young learners, including central executive and attention control (Ellis & Sinclair, 1996), inhibition (Miyake, 2001), and updating/manipulation (Baddeley, 2000). It consists of five levels of increasing difficulty: Memorize, 1-Swap, 2-Swap, 3-Swap, and 4-Swap. At the Memorize level, the learner is presented with a screen displaying drawings of three non-verbal stimuli, such as animals (e.g., bird, rat, dog), reflecting the phonological and visuospatial short-term storage (maintenance of item–order) (Baddeley, 2000). The learner’s task is to remember the order of these items. Afterward, the learner clicks to proceed to the test phase, where they are presented with six options, each consisting of a triplet of animals in different order combinations. The learner’s objective is to select the option that accurately reflects the original order of the animals they saw. This level represents the easiest condition of the test. The 1-Swap condition introduces a higher level of difficulty. In this level, the learner is presented with a new triplet of animals and is instructed to mentally swap the position of two out of the three animals (e.g., “Swap 2 and 3”). The original triplet remains on the screen but disappears once the learner clicks to proceed. Subsequently, six options appear, and the learner must choose the option that corresponds to the swapped sequence. The most challenging level is the 4-Swap, which involves four position swaps. The learner is required to mentally swap the positions of the animals multiple times, following specific instructions (e.g., swap 3 with 1, then 2 with 3, then 1 with 2, then 3 with 2). This level tests the learner’s attention and inhibition in their WM (Freeman et al., 2017).
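To make the swap conditions concrete, the following minimal sketch traces the example instructions given above. It assumes that the numbers in an instruction refer to positions in the triplet (counted from 1) rather than to particular animals; the function name `apply_swaps` is hypothetical and is not part of the WMPT software.

```python
def apply_swaps(items, swaps):
    """Return the order of `items` after applying each (i, j) swap in turn.

    Assumption: i and j are 1-based positions, as in "Swap 2 and 3".
    """
    order = list(items)  # copy so the caller's list is left untouched
    for i, j in swaps:
        order[i - 1], order[j - 1] = order[j - 1], order[i - 1]
    return order


start = ["bird", "rat", "dog"]

# 1-Swap example from the text: "Swap 2 and 3"
one_swap = apply_swaps(start, [(2, 3)])

# 4-Swap example from the text: 3 with 1, then 2 with 3, then 1 with 2, then 3 with 2
four_swap = apply_swaps(start, [(3, 1), (2, 3), (1, 2), (3, 2)])

print(one_swap)   # ['bird', 'dog', 'rat']
print(four_swap)  # ['bird', 'rat', 'dog']
```

Under this position-based reading, the four swaps in the example happen to restore the original order, so a learner cannot shortcut the trial by tracking only the last instruction; every intermediate state has to be held and updated.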
To avoid ceiling effects, we implemented several controls: (a) adaptive branching that increases difficulty (e.g., larger set sizes, five to six swaps, delayed execution) when accuracy exceeds 80%; (b) time-limited responses to reduce overreliance on rehearsal; (c) interference through visually similar stimuli and lure sequences; and (d) partial-credit scoring for multi-swap trials combined with response latency, ensuring discrimination among high performers. Pilot testing with advanced learners confirmed that fewer than 5% reached maximum scores, indicating an adequate difficulty range.
Instructions for the WMPT in this study were presented in Chinese, which was the learners’ first language. This ensured that the learners could fully comprehend the task requirements and perform to the best of their abilities. Scoring in this study focused on accuracy, reflecting the number of correct responses. For example, one point was awarded for each correctly executed swap in the correct order, and zero otherwise. In the 4-Swap condition, a maximum of four points could be obtained. If the learner executed the first three swaps correctly but the fourth incorrectly, the score was 3/4, reflecting partial-credit scoring. The test consisted of 25 trials, with five trials at each of the five difficulty levels. One point was awarded for each trial answered correctly. The WMPT demonstrated good reliability, as indicated by the Cronbach’s alpha value of .85, ensuring consistent and dependable measurement of working memory capacity.
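For readers less familiar with the reliability index reported above, the sketch below computes Cronbach’s alpha with the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The score matrix is invented purely for illustration and is not data from this study; the function name `cronbach_alpha` is likewise hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-learner score rows (one column per item)."""
    k = len(scores[0])  # number of items (or trials)
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented toy data: 4 learners x 3 dichotomously scored items
toy_scores = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
]

alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))  # 0.632
```

With so few items and learners the toy value is low; values such as the .85 reported for the WMPT arise when item scores covary strongly relative to their individual variances.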
Vocabulary knowledge (VK). In this study, we employed the Picture Vocabulary Size Test (PVST) developed by Anthony and Nation (2017) to evaluate the extent of vocabulary knowledge among young learners. The PVST is a standardized test specifically designed to estimate the receptive vocabulary size of young preliterate native speakers up to the age of 8, as well as young non-native English speakers. In this study, the test was adapted to assess the receptive vocabulary size and growth of young learners studying English as a foreign language (EFL). The PVST gauges the test-takers’ ability to associate an appropriate meaning, represented by a picture, with a given partially contextualized word form. The test comprises two sets, each containing 96 items. However, for the purposes of this study, only the first set was administered, as we did not observe any ceiling effects among the participants. Each item in the test presents the test-taker with a plate featuring four pictures, and they are required to select the picture that best corresponds to the meaning of the stimulus word. In instances of uncertainty, an “I don’t know” option was provided. By employing the PVST, we aimed to assess BVK among young EFL learners and track their vocabulary growth over time. The test design, with its contextualized word forms and picture associations, provides a reliable means of evaluating receptive vocabulary skills.
To accommodate the proficiency levels of EFL young learners and provide a multi-modal assessment experience, we made adaptations to the original test format, which was primarily based on listening. In addition to the auditory component, we also provided printed texts for the target words. This modification aimed to support learners who may experience vulnerabilities related to their cognitive, social, emotional, and physical growth (McKay, 2006). By allowing the learners to listen while reading the words, we aimed to create a more inclusive and accessible testing situation. Subsequently, the participants were required to choose the correct picture that corresponded to the word they heard and read. The personalized testing environment on a one-on-one basis allowed the teacher to provide individual support and guidance to each learner. The teacher played an active role in keeping the children motivated throughout the task, offering encouraging comments such as “Well done” and “Great.” The PVST provides raw scores, indicating the number of correct responses out of a total of 96 possible items. Before the actual test, the participants received five training words and image plates to familiarize themselves with the test requirements and procedures. The test concluded when the participants made six consecutive errors due to wild guessing. To ensure the reliability of the test scores, two raters independently scored the tests. The interrater reliability for scoring was determined to be 98.4%. In cases of disagreements, raters resolved them through discussion, reaching a consensus. The Cronbach’s alpha values for different grades ranged from .85 to .89, indicating the sound reliability of the test. These measures ensured the validity and reliability of the assessment.
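The discontinue rule described above can be sketched as follows. This is an illustration under two assumptions that the text does not spell out: an “I don’t know” response is treated as an error, and administration stops at the sixth consecutive error so that no later items contribute to the raw score. The function name `pvst_raw_score` and the response sequence are hypothetical.

```python
def pvst_raw_score(responses, stop_after=6):
    """Count correct responses until `stop_after` consecutive errors occur.

    responses: booleans in administration order (True = correct picture chosen).
    Assumption: "I don't know" is coded as False.
    """
    correct = 0
    error_run = 0
    for is_correct in responses:
        if is_correct:
            correct += 1
            error_run = 0  # a correct answer resets the error streak
        else:
            error_run += 1
            if error_run == stop_after:
                break  # testing ends; remaining items are not administered
    return correct


# Invented response pattern: 10 correct, 5 errors, 1 correct, 6 errors, 3 unreached items
responses = [True] * 10 + [False] * 5 + [True] + [False] * 6 + [True] * 3
raw_score = pvst_raw_score(responses)
print(raw_score)  # 11
```

In this pattern the first run of five errors does not end the test because the next answer is correct; the test stops only after the later run of six, leaving the last three items unscored.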
4.3. Data analysis
In this study, we used cross-lagged analysis (Kenny, Reference Kenny, Everitt and Howell2005), specified as a structural equation model (SEM), to investigate the relationship between WM and the acquisition of vocabulary knowledge in a group of primary school students (N = 158). The Cross-lagged Model, also known as the Causal Model, Linear Panel Model, Cross-lagged Panel Model, or Auto-regressive Cross-lagged Model (Selig & Little, Reference Selig, Little, Laursen, Little and Card2012), is widely used in non-experimental longitudinal studies to explore and analyse patterns of mutual influence between two or more variables. It allows the strength of the effect of each variable on the other to be estimated. The model is particularly useful for examining the dynamics between variables over time, because it takes the temporal ordering of the variables into account and allows reciprocal relationships to be investigated. The lagged effects indicate the extent to which one variable is associated with subsequent change in another, while controlling for the previous values of the outcome. The Auto-regression Model, as the name suggests, involves multiple repeated measurements of a single variable. Its auto-regressive coefficient represents the magnitude of the auto-regressive effect, indicating the stability of individuals’ rank positions on the same variable over time; it reflects individual differences rather than within-individual variation. Expanding the Auto-regression Model to include two or more repeatedly measured variables leads to the Cross-lagged Model, in which the focus shifts to the interplay between the variables and their reciprocal influences over time.
Our cross-lagged model used five time points (T1, T2, T3, T4, T5), with the primary variables of interest, WM and BVK, assessed at each measurement occasion. We specified autoregressive paths for WM and BVK, so that each variable at T2 through T5 was predicted by the same variable at the immediately preceding time point (e.g., WM at T1 predicting WM at T2). We also included within-time-point correlations between WM and BVK at each assessment. Finally, to evaluate our main hypothesis, we included cross-lagged paths between WM and BVK (e.g., WM at T1 predicting BVK at T2). This allowed us to examine the predictive relationship between each variable and the subsequent level of the other.
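To make the structure of a five-wave model concrete, the following Python sketch lists the paths that a bi-directional cross-lagged panel model implies. It is an illustration only: the function name `clpm_paths` and the labels such as `WM1` and `BVK2` are hypothetical conveniences, not names taken from the analysis files.

```python
def clpm_paths(waves=5, x="WM", y="BVK"):
    """List the paths of a two-variable cross-lagged panel model.

    Returns three lists of (from, to) pairs: autoregressive paths,
    cross-lagged paths in both directions, and within-wave correlations.
    """
    auto, cross = [], []
    for t in range(1, waves):
        # Each variable predicts itself at the next wave.
        auto.append((f"{x}{t}", f"{x}{t + 1}"))
        auto.append((f"{y}{t}", f"{y}{t + 1}"))
        # Each variable predicts the other variable at the next wave.
        cross.append((f"{x}{t}", f"{y}{t + 1}"))
        cross.append((f"{y}{t}", f"{x}{t + 1}"))
    # One correlation between the two variables at every wave.
    within = [(f"{x}{t}", f"{y}{t}") for t in range(1, waves + 1)]
    return auto, cross, within


auto_paths, cross_paths, within_corrs = clpm_paths()
```

With five waves, the sketch yields eight autoregressive paths, eight cross-lagged paths, and five within-wave correlations.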
We conducted the SEM analysis using Mplus 8.0 (Muthén & Muthén, Reference Muthén and Muthén2012) with maximum likelihood (ML) estimation. To evaluate model fit, we relied primarily on two indicators: the root mean square error of approximation (RMSEA) and the comparative fit index (CFI). An RMSEA value below .06 and a CFI value above .95 indicate a well-fitting model (Hu & Bentler, Reference Hu and Bentler1999). The incremental fit index (IFI) and the standardized root mean square residual (SRMR) were also used to assess model fit; an IFI value above .95 and an SRMR value below .05 indicate good overall model fit.
The estimated parameters of the model include standardized regression weights, which allow predictor variables measured on different scales to be compared. A standardized weight gives the change in the outcome variable, in standard deviation units, associated with a one standard deviation change in the predictor. For continuous predictors, the standardized value is calculated as β = b × SD(x)/SD(y); for binary predictors, the formula β = b/SD(y) is used (Muthén & Muthén, Reference Muthén and Muthén2012). With these procedures, we aimed to obtain reliable estimates of the model parameters, assess the goodness of fit of the SEM, and interpret the relationships between the variables of interest in the context of our study.
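The standardization formula for a continuous predictor can be checked with a small Python sketch. The data below are made up for illustration and the helper names are hypothetical. With a single predictor, the standardized weight should equal the Pearson correlation, which gives a convenient sanity check.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def slope(x, y):
    """Unstandardized least-squares slope b for the regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_beta(b, sd_x, sd_y):
    """beta = b * SD(x) / SD(y), the formula for a continuous predictor."""
    return b * sd_x / sd_y


# Made-up scores, for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [10.0, 14.0, 13.0, 20.0, 22.0]
b = slope(x, y)
beta = standardized_beta(b, sd(x), sd(y))
```

Because both standard deviations enter the formula, the result does not depend on the units in which x and y are scored, which is what makes weights comparable across predictors.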
The cross-lagged regression model was constructed in Mplus 8.0 to investigate the longitudinal relationship between WM and vocabulary knowledge. The model included auto-regressive paths, simultaneous correlation coefficients, and cross-lagged coefficients. Multiple models were constructed to describe the relationships between the variables, and the best-fitting model was selected using likelihood ratio tests. The variables of interest in the hypothesized models were the observed variables X and Y. The following four models were specified:
Model M1: This model is a baseline model, which included only auto-regressive paths, but not any cross-lagged regression paths.
Model M2: X to Y Cross-lagged Model – This model included a unidirectional cross-lagged regression path from X to Y, while keeping the other paths the same as in Model M1.
Model M3: Y to X Cross-lagged Model – This model included a unidirectional cross-lagged regression path from Y to X, while keeping the other paths the same as in Model M1.
Model M4: Full Model – This model included both the X to Y and Y to X cross-lagged regression paths, capturing the bi-directional relationships between X and Y, while keeping the other paths the same as in Model M1.
Following the construction of these models, likelihood ratio tests were used to compare their fit and determine the best-fitting model. The selected model provided insights into the specific longitudinal relationships between WM and BVK. The Appendix shows the Mplus syntax for the analysis.
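The comparison of nested models can be sketched as a chi-square difference test in Python. This is a minimal sketch, not the procedure run in Mplus: the function names are hypothetical, the chi-square tail probability is computed from a standard series expansion so that no external library is needed, and the fit statistics in the example are invented for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution.

    Uses the series for the regularized lower incomplete gamma function.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Compare a restricted model with the fuller model it is nested in."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fit statistics, invented for illustration only.
delta_chi2, delta_df, p_value = chi2_difference_test(40.0, 28, 28.0, 24)
```

A small p-value indicates that the restricted model fits significantly worse, so the additional paths in the fuller model are retained.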
It is worth noting that in the cross-lagged regression model, an unmeasured stable factor may affect a particular variable at each measurement occasion, leading to a positive correlation between the residuals of that variable at different time points (Berrington et al., Reference Berrington, Smith and Sturgis2006). Similarly, a third variable Z that is not included in the model may simultaneously influence both X2 and Y2, in which case the residuals of X2 and Y2 will be correlated. It is therefore more realistic to allow correlated residuals when constructing the model. In the next step, after the best-fitting model has been obtained, additional constraints can be added based on specific research hypotheses. According to theoretical assumptions, certain parameters can be constrained to be equal across time points. The constrained model can then be compared with the unconstrained model, using the likelihood ratio (Δχ2) test or the change in CFI (ΔCFI), to test the validity of the constraints. Two kinds of constraints can be added on theoretical grounds. If the effects between variables are assumed to remain constant over time, the auto-regressive and cross-lagged coefficients can be successively constrained to be equal across time. If the model estimates covariances between residuals, equality assumptions can also be placed on those covariances. If the constructs in the cross-lagged model are measured by multiple indicators rather than directly observed variables, the model becomes a latent variable cross-lagged model, which is divided into a measurement model and a structural model.
The weak factorial invariance assumption should be satisfied for latent variable cross-lagged models, meaning that the number of factors and factor loadings should remain invariant across repeated measurements (McArdle, Reference McArdle, Nesselroade and Raymond1988). Therefore, the analysis of latent variable cross-lagged models is typically conducted using a two-step approach (Kosloski et al., Reference Kosloski, Stull, Kercher and Van Dussen2005). Specifically:
Step 1: Examination of the measurement model (factor invariance) to test the invariance of factor loadings. Construct the configural model M0 without any restrictions, allowing for measurement error correlations between indicators at different time points. Based on M0, constrain the factor loadings of the same indicators to be equal across time, constructing model M1.
Step 2: After satisfying the assumption of loadings invariance, examine the structural model using the same steps as in a standard cross-lagged regression analysis to explore the relationships between variables.
5. Results
Table 1 presents the descriptive statistics and correlation matrix for the variables used in the study. Grade 1 students exhibited the lowest levels of WM (M = 12.85, SD = 6.81) and BVK (M = 13.33, SD = 6.87), and Grade 5 students the highest levels of WM (M = 42.47, SD = 11.48) and BVK (M = 47.99, SD = 11.35). Across time points, WM exhibited significant autocorrelation. BVK also showed significant autocorrelation, suggesting stability in individuals’ vocabulary knowledge over time. In addition, WM and BVK were significantly and positively correlated, indicating that higher levels of WM were associated with better vocabulary outcomes.
Table 1. Descriptive statistics and correlation matrix for study variables

Note: WM = working memory; BVK = breadth of vocabulary knowledge. *p < .05, **p < .01.
5.1. Model construction
Following the steps proposed by Kosloski et al. (Reference Kosloski, Stull, Kercher and Van Dussen2005), the best model was selected. The model fit the data well, with all fit indices at desirable levels (number of free parameters = 41, χ2 = 28.402, df = 24, CFI = .999, IFI = .998, RMSEA = .034, SRMR = .007). The autoregressive coefficients, lagged effects, unstandardized estimates, standardized estimates, and 95% confidence intervals (CI) for the cross-lagged model are presented in Table 2. A concise summary of the model is shown in Figure 2.
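The reported degrees of freedom and RMSEA can be reproduced from the other reported quantities with a short Python sketch. Two assumptions are made here and are not stated in the text: the mean structure is counted among the sample moments, and RMSEA is computed with one common formula that uses N − 1 (some programs use N instead, which gives the same value to three decimals in this case). The function names are hypothetical.

```python
import math


def count_moments(n_vars, mean_structure=True):
    """Number of sample variances, covariances and (optionally) means."""
    moments = n_vars * (n_vars + 1) // 2
    return moments + n_vars if mean_structure else moments


def rmsea(chi2, df, n):
    """One common RMSEA formula: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Ten observed variables (two constructs at five waves) and 41 free parameters.
model_df = count_moments(10) - 41
# Reported chi-square and sample size (N = 158).
model_rmsea = rmsea(28.402, model_df, 158)
```

Under these assumptions, the 65 sample moments minus 41 free parameters give the 24 degrees of freedom reported, and the RMSEA rounds to the reported .034.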
Table 2. Model results for the cross-lagged panel model


Figure 2. Structural equation model of the cross-lagged panel model examining the bi-directional association between WM and BVK. *p < .05, **p < .01. T1 to T5 = Time 1 to Time 5. All estimates reported are standardized regression estimates. WM = working memory; BVK = breadth of vocabulary knowledge. Solid lines represent significant paths, and dashed lines represent non-significant paths.
We first examined the autoregressive effects of WM and BVK. As expected, the autoregressive coefficients of WM from T1 to T2, T2 to T3, T3 to T4, and T4 to T5 were significant and of substantial magnitude (see Table 2). Similarly, the autoregressive coefficients of BVK from T1 to T2, T2 to T3, T3 to T4, and T4 to T5 were significant and of substantial magnitude (see Table 2). Regarding the simultaneous correlation between WM and BVK, as shown in Figure 2, the correlation was significant and positive only at time points 1 and 4. At time points 2, 3, and 5, the simultaneous correlation between WM and BVK was not significant.
The autoregressive paths indicate strong temporal stability for both WM and BVK. For WM, coefficients from WM1 → WM2 (.915), WM2 → WM3 (.924), WM3 → WM4 (.811), and WM4 → WM5 (.896) are all large and highly significant, suggesting that WM is largely trait-like across waves. The slight dip at WM3 → WM4 (.811) implies a period of greater change or variability in WM during that interval. BVK’s stability increases as the study progresses: BVK1 → BVK2 is moderate (.557), BVK2 → BVK3 strengthens (.784), and from BVK3 onwards the paths are very high (BVK3 → BVK4 = .955; BVK4 → BVK5 = .934). This pattern suggests BVK becomes increasingly self-maintaining over time, with later values of BVK being strongly anchored by their immediate past.
Turning to cross-lagged influences, BVK consistently exerts small but significant positive effects on subsequent WM. The paths from BVK1 → WM2 (.073, p = .018), BVK2 → WM3 (.067, p = .031), and BVK4 → WM5 (.088, p = .010) are modest, while the effect peaks midstream at BVK3 → WM4 (.175, p < .001). These results indicate that even after accounting for WM’s own strong stability, prior BVK provides an incremental nudge to increase WM at the next time point, most notably around the third to fourth wave.
In contrast, WM’s influence on BVK is front-loaded and fades over time. Early on, WM1 → BVK2 is moderate and highly significant (.395, p < .001), and WM2 → BVK3 remains significant though smaller (.198, p < .001). However, by later waves the cross-lagged paths from WM to BVK are no longer significant (WM3 → BVK4 = .025, p = .465; WM4 → BVK5 = .040, p = .223), with confidence intervals spanning zero. This attenuation aligns with the rising autoregressive strength of BVK: as BVK becomes more stable, it appears less responsive to prior WM.
Overall, the results suggest a mutually influencing causal relationship between WM and BVK. In particular, the model portrays two constructs with strong self-continuity, especially pronounced for BVK in the latter half of the timeline. Cross-lagged dynamics are asymmetric: BVK has a small but persistent effect on future WM across waves, while WM’s effect on future BVK is meaningful early and then disappears. Substantively, this suggests that as BVK becomes highly self-sustaining, its accumulated lexical richness offers small but reliable downstream benefits for WM, whereas WM’s early advantage primarily fuels initial vocabulary growth and then diminishes once BVK is entrenched. In other words, vocabulary breadth increasingly “carries itself” and gently supports WM, while WM’s role in expanding vocabulary is time-sensitive and front-loaded.
6. Discussion
In this research, we found a bi-directional, reinforcing relationship between WM and BVK over an extended period. Specifically, WM at Time 1 (T1) significantly and positively predicted BVK at Time 2 (T2), and WM at T2 significantly and positively predicted BVK at Time 3 (T3). However, the lagged effects of WM at T3 and Time 4 (T4) on BVK at T4 and Time 5 (T5), respectively, were not statistically significant. In the other direction, BVK significantly predicted subsequent WM at every lag. Judged by the size of the lagged coefficients, WM had a greater predictive influence on BVK in the early phases. Judged by how often the lagged effects were significant, however, BVK predicted WM consistently and positively over time, suggesting a stable influence on the development of WM. The discussion is structured around the research questions.
6.1. Relations between WM and BVK during the primary grades
To answer the first question of exploring the co-varying relations between learners’ WM and BVK during the primary grades, it becomes evident that WM is related to the development of BVK through several essential mechanisms. First, WM is closely associated with the ability to selectively attend to task-relevant stimuli while simultaneously inhibiting irrelevant information. This skill, as noted by Baddeley (Reference Baddeley1992, Reference Baddeley2000), is of utmost importance in the context of vocabulary acquisition. Selective attention ensures that learners focus their cognitive resources on the most pertinent information, enabling more effective encoding and retention of new words. Second, WM involves the maintenance of information over time, which is crucial for vocabulary learning. As highlighted by Teng and Zhang (Reference Teng and Zhang2023), the integration of information across various domains, such as verbal, acoustic, and visual, is vital for a comprehensive understanding of word meanings. Partially supporting Teng (Reference Teng2025), WM’s capacity to temporarily hold and manipulate information from these different sources is instrumental in creating meaningful connections and associations among words, thereby enhancing the overall vocabulary learning process. Additionally, WM reflects the ability to perform additional cognitive processes concurrently, such as recalling word information while simultaneously holding it in short-term storage for word learning. This simultaneous processing and maintenance capability, as emphasized by Ellis and Sinclair (Reference Ellis and Sinclair1996), is related to improved word retention.
On the other hand, expanding BVK can, in turn, be related to WM efficiency during the primary grades. As children acquire more words, they develop stronger semantic categories and more robust lexical networks. These vocabulary knowledge structures enable chunking and faster retrieval of relevant cues, thereby reducing the maintenance load on WM and making task-relevant features more salient, which improves attentional control. In practice, a broader vocabulary means children can bind new forms to familiar categories more quickly, narrowing search space and diminishing interference from similar items. This reciprocal pathway aligns with models in which the episodic buffer leverages semantic long-term memory to assemble coherent representations; as BVK grows, the buffer’s integrations become more efficient, supporting subsequent WM performance (Baddeley, Reference Baddeley2000).
6.2. Relative significance of WM on BVK, and vice versa
Greater WM capacity facilitates the binding of word forms to meanings. The WMPT used in this study is designed to index the sub-systems that underlie this process. The Memorize level taps the phonological loop (and visuospatial maintenance) for item–order retention; the episodic buffer supports integrating maintained sequences with task cues (“swap 2 and 3”) across phases; and the 1–4 Swap levels increasingly load the central executive for attention control, inhibition, updating, and manipulation. In line with Baddeley et al. (Reference Baddeley, Allen and Hitch2011), this architecture explains how maintenance, integration, and manipulation jointly support vocabulary acquisition.
Furthermore, WM enables the use of executive control processes to select and implement effective strategies during learning. It also plays a crucial role in maintaining information in focal attention while it is integrated into pre-existing cognitive schema, a process that aligns with the demands of BVK acquisition. As learners strive to remember a list of unrelated words and build connections between their form and meaning, a robust WM capacity can facilitate these cognitive operations, leading to improved vocabulary retention over time. It is important to note that these cognitive processes are particularly relevant in BVK learning scenarios where participants need to remember a list of unrelated words and the details necessary for establishing form and meaning links. The ability to retain words in Short-Term Memory while organizing them in a meaningful way can significantly influence the number of words remembered, especially at later time points. As suggested by Verhagen and Leseman (Reference Verhagen and Leseman2016), instructed learners might need to analyse incoming input based on L2 meta-linguistic knowledge and subsequently store smaller chunks of L2 speech, where well-developed WM skills, particularly long-term memory, play a critical role. This underscores the intricate relationship between WM and BVK, with WM being a cornerstone in the cognitive processes necessary for successful vocabulary acquisition.
It should be acknowledged that remembering lists of familiar words in a WM task is not the same as learning new vocabulary. The WMPT in this study assesses maintenance, integration, and manipulation with known items; it does not teach words. Similarly, the VK test gauges existing knowledge without a list-based format. Nevertheless, although the WMPT uses familiar items and artificial swap rules, its components map onto the cognitive operations recruited during authentic vocabulary learning: maintaining novel phonological forms long enough to rehearse and encode them; integrating form, meaning, and contextual cues across encounters; and manipulating and updating representations as new usages refine understanding. Another point to note is that WM’s influence on future BVK is front-loaded: it is moderate early (WM1 → BVK2 = .395), smaller but still significant at the next interval (WM2 → BVK3 = .198), and non-significant later (WM3 → BVK4, WM4 → BVK5). These results extend the cross-lagged model results of Wang et al. (Reference Wang, Liu and Liu2024), in which the WM and vocabulary tests were administered only twice. Theoretically, this pattern is consistent with a developmental coupling model, in which WM scaffolds vocabulary acquisition during periods of rapid growth by supporting attention, encoding, and consolidation. However, as BVK consolidates and its autoregressive strength increases, vocabulary gains become increasingly driven by prior knowledge itself. Consequently, additional WM advantages become less consequential for subsequent BVK expansion. The findings also underscore the influence of BVK on WM. This result is not consistent with Wang et al. (Reference Wang, Liu and Liu2024), in which vocabulary did not enhance WM. In this study, BVK actively engages with WM processes, especially when chunking mechanisms are employed during new learning. 
According to Baddeley’s (Reference Baddeley2000) model, the episodic buffer plays a pivotal role in organizing new information in a meaningful way. This process of chunking often relies on semantic long-term memory (LTM) to categorize and structure incoming information. For example, when learners group words into categories, these categories often have pre-existing representations in LTM, such as animals, furniture, and emotions. The episodic buffer can bind multiple items under one semantic label, compressing several elements into a single chunk (Baddeley et al., Reference Baddeley, Allen and Hitch2011). This lowers phonological-loop demands and increases span for additional details (e.g., form, collocations) that are essential to vocabulary learning. This approach leverages the semantic richness of BVK to enhance WM’s capabilities, suggesting that BVK profoundly impacts how information is organized and stored. A related facet of this influence, as proposed by Vulchanova et al. (Reference Vulchanova, Foyn, Nilsen and Sigmundsson2014), is that BVK can evoke rich word representations directly within WM itself. This implies that as individuals draw upon their BVK, they activate associated word representations, which can enhance their WM performance as measured by certain WM tests. The activated representations in WM might serve as mental scaffolding for the integration and manipulation of new information, ultimately aiding in the retention and retrieval of vocabulary items. Furthermore, the retrieval process is influenced by factors such as lexicality and word frequency (Poirier & Saint-Aubin, Reference Poirier and Saint-Aubin1996). The influence of prior experiences in learning BVK is particularly relevant here. 
Accumulated vocabulary knowledge can facilitate the evaluation of items stored in WM, as the familiarity and frequency of certain words may affect the ease with which they can be retrieved and manipulated in cognitive tasks. This suggests that the depth and breadth of one’s vocabulary knowledge can shape the efficiency and effectiveness of WM operations, particularly during the retrieval and utilization of vocabulary items.
The data presented in this study thus strongly support the transactional model of WM and BVK. This finding is intriguing and offers several noteworthy arguments and implications for our understanding of the relationship between WM and BVK. First, it appears that the cognitive filter model is partially supported by the data, especially in its characterization of situational processing constraints that limit the amount of information students can effectively process when attempting to acquire BVK. The transactional model shares this aspect of explanation with the cognitive filter model, as students with higher WM capacity demonstrated more substantial growth in their BVK compared to their peers with lower WM capacity. This suggests that WM does indeed play a critical role in determining the extent to which BVK can be acquired, confirming the cognitive filter model’s emphasis on processing constraints.
However, the transactional model goes a step further by acknowledging the mutual, interacting effects of BVK and WM over time. Unlike the cognitive filter model, the transactional model accounts for the bi-directional nature of this relationship. The study’s findings highlight that while both models predict that prior WM capacity influences the acquisition of BVK, only the transactional model anticipates the concept of reciprocal causation. This means that accumulated BVK predicts increases in WM capacity, and vice versa. The data reveal that students who exhibited greater BVK at Time 1 also exhibited higher WM ability at Time 2, and conversely, students with stronger WM abilities at Time 1 demonstrated more BVK at Time 2. This reciprocal causation emphasizes the dynamic, evolving nature of the relationship between BVK and WM over time, underlining the complexity of this cognitive interaction. These findings contribute significantly to previous research on the interplay between WM and vocabulary learning, as demonstrated in studies by Morra and Camba (Reference Morra and Camba2009), Swanson et al. (Reference Swanson, Orosco, Lussier, Gerber and Guzman-Orth2011), and Teng and Zhang (Reference Teng and Zhang2023). It is now clear that BVK and WM do not operate in isolation but rather interact, affecting each other’s development and growth.
One interesting finding is that while WM and BVK both stabilize with time, BVK becomes particularly inertial in later waves, and their cross-lagged coupling shifts. Early on, stronger WM facilitates vocabulary expansion; later, a broad vocabulary base modestly supports WM performance, while WM’s incremental contribution to further BVK growth fades. Practically, this implies two windows of opportunity: bolster WM early to accelerate vocabulary breadth acquisition, and maintain rich language exposure later to sustain BVK while leveraging it to support WM-related tasks. These results are not in line with Wang et al. (Reference Wang, Liu and Liu2024), in which the cross-lagged panel model showed that WM scores significantly predicted changes in children’s receptive vocabulary over time, but the reverse effect of vocabulary on WM was not evident. In this study, broader vocabulary may scaffold WM tasks. Furthermore, richer lexical representations can reduce retrieval effort, thereby freeing cognitive resources and improving performance on subsequent WM measures.
6.3. Limitations and implications
This study has several limitations that should be considered. First, it utilized the WMPT as the sole measure of WM. It might have been more advantageous to incorporate multiple WM measures to create a composite WM measure, thus providing a more comprehensive understanding of WM’s role in the studied context. Second, the depth of vocabulary knowledge could be as influential as BVK, but it was not included in this study. Given the intricacies of vocabulary knowledge, as emphasized by Nation (Reference Nation2001), future research would benefit from incorporating measures that encompass various facets of vocabulary knowledge. Finally, the data used in this study were specific to the Chinese context, which may limit the generalizability of the findings to other linguistic and cultural contexts.
Notwithstanding its limitations, the current study holds several significant implications. The observed relationship between WM and BVK has practical applications for interventions aimed at enhancing vocabulary acquisition. Future research could concentrate on tailoring instructional approaches to accommodate students with low WM capacity, thus addressing individual learner needs more effectively. The well-documented variance in WM capacity among students with different deficit profiles suggests that specific instructional strategies may be more beneficial for students with varying WM abilities. The support for the transactional model raises the possibility that, rather than focusing exclusively on WM training, as explored in previous studies (Ellis & Sinclair, Reference Ellis and Sinclair1996; Verhagen & Leseman, Reference Verhagen and Leseman2016), an alternative approach could involve teaching students strategies to offload information into long-term memory during vocabulary learning. This shift in pedagogical strategy could prove beneficial in enhancing BVK in students with differing WM capacities. In particular, in the early phase, when BVK is less stable, strengthening WM through attention control, rehearsal strategies, and structured memory tasks can accelerate BVK growth by supporting efficient encoding and retention of new words. In the later phase, when BVK is highly stable, expanding and actively using vocabulary through rich reading, diverse discourse, and work on morphology and semantic networks yields modest improvements in WM performance, likely by reducing processing load through more familiar lexical representations. Accordingly, interventions should be timed so that WM support comes early, to catalyse vocabulary gains, and sustained, high-quality language exposure is prioritized later, since BVK’s inertia means that improvements arise chiefly from its own continuity rather than from additional WM boosts.
Data availability statement
All data are available at https://osf.io/2hgq8/
Acknowledgements
No generative AI tools were used during the writing or initial preparation of this manuscript. ChatGPT-5.2 was employed solely in the revision stage to refine wording and structure. All substantive content, analyses, and interpretations originated from the author.
Funding statement
This study was supported by the National Social Science Fund of China, entitled cross-sectional effects and longitudinal development of working memory and vocabulary acquisition (grant number: 22BYY182).
Ethical approval statement
The research conducted in this study has received ethical approval from the institutional review board (IRB). The approval was granted after a thorough evaluation of the research protocol, ensuring that it adheres to the highest ethical standards and protects the rights, welfare, and privacy of the participants involved. The ethical approval process involved a comprehensive review of the study’s objectives, methodology, data collection procedures, potential risks and benefits, participant recruitment and consent procedures, and data handling and storage protocols.
Informed consent was obtained from all participants and their parents involved in the study, ensuring that they were fully informed about the nature of the research, their rights as participants, and any potential risks or discomforts associated with their involvement. Participants were given the opportunity to ask questions and were assured of their voluntary participation and the confidentiality of their personal information.
A. Appendix – Mplus syntax for the CLPM analysis
The following is the Mplus syntax for the analysis:
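! Lines beginning with "!" are comments.
DATA: FILE = data.dat;  ! placeholder; the file name is not given in this paper
VARIABLE:
NAMES = id x1-x5 y1-y5;
USEVARIABLES = x1-x5 y1-y5;
ANALYSIS:
ESTIMATOR = ML;
INFORMATION = OBSERVED;
BOOTSTRAP = 1000;
MODEL:
! Model M4 (full model): auto-regressive and cross-lagged paths in both directions
x2 ON x1 y1; x3 ON x2 y2; x4 ON x3 y3; x5 ON x4 y4;
y2 ON x1 y1; y3 ON x2 y2; y4 ON x3 y3; y5 ON x4 y4;
! Within-wave (residual) correlations between X and Y, as described in Section 4.3
x1 WITH y1; x2 WITH y2; x3 WITH y3; x4 WITH y4; x5 WITH y5;
! The nested models replace the ON statements above as follows.
! Model M1 (baseline model): auto-regressive paths only
!   x2 ON x1; x3 ON x2; x4 ON x3; x5 ON x4;
!   y2 ON y1; y3 ON y2; y4 ON y3; y5 ON y4;
! Model M2 (X to Y cross-lagged model): M1 plus the paths from X to Y
!   y2 ON x1; y3 ON x2; y4 ON x3; y5 ON x4;
! Model M3 (Y to X cross-lagged model): M1 plus the paths from Y to X
!   x2 ON y1; x3 ON y2; x4 ON y3; x5 ON y4;
! Model M4 (full model): M1 plus both sets of cross-lagged paths, as specified above
In the above syntax, x1-x5 and y1-y5 are the observed variables X and Y at the five time points. Each ON statement regresses the variable on its left on the variables on its right, so that “x2 ON x1” is an auto-regressive path and “x2 ON y1” is a cross-lagged path; each WITH statement specifies a within-wave (residual) correlation. The different models M1, M2, M3, and M4 are specified by including or excluding the cross-lagged regression paths accordingly.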



