The multidimensionality of second language oral fluency: Interfacing cognitive fluency and utterance fluency

Abstract
The current study examined the extent to which cognitive fluency (CF) contributes to utterance fluency (UF) at the level of constructs. A total of 128 Japanese-speaking learners of English completed four speaking tasks (an argumentative task, a picture narrative task, a reading-to-speaking task, and a reading-while-listening-to-speaking task) and a battery of linguistic knowledge tests, capturing vocabulary size, lexical retrieval speed, sentence construction skills, grammaticality judgments, and articulatory speed. Their speaking performance was analyzed in terms of speed, breakdown, and repair fluency (i.e., UF), and scores on linguistic knowledge tests were used to assess students' L2 linguistic resources and processing skills (i.e., CF). Structural equation modeling revealed a complex interplay between the multidimensionality of CF and UF and speaking task types. L2 processing speed consistently contributed to all aspects of UF across speaking tasks, whereas the role of linguistic resources in speed and repair fluency varied, depending on task characteristics.


Introduction
Oral fluency is one of the most robust indicators of second language (L2) proficiency. In the context of learning, teaching, and assessment of L2 speaking skills, oral fluency is thus commonly regarded as one of the major learning goals. For a better understanding of L2 fluency as a construct and as an important language learning target, it is essential to examine how underlying linguistic knowledge contributes to students' fluent speech production. Insights into how L2 users' linguistic resources and processing mechanisms contribute to the efficiency of speech production may also assist language teachers and materials designers and inform language teaching policy makers about which linguistic knowledge areas and skills to develop so that L2 learners may become fluent speakers. In L2 fluency research, underlying linguistic knowledge and temporal characteristics of speech are termed cognitive fluency (CF) and utterance fluency (UF), respectively (Segalowitz, 2010, 2016). Specifically, CF refers to "the efficiency of the speaker's underlying processes responsible for fluency-relevant features of utterance" (Segalowitz, 2010, p. 50). UF is concerned with "the oral features of utterances that reflect the operation of underlying cognitive processes" (ibid.), including speed of delivery and hesitations. Although few in number, previous studies have examined the relationship between CF and UF (henceforth, CF-UF link), providing important insights into this link (e.g., Kahng, 2020). However, previous studies analyzed the measures of CF and UF only at the level of observed variables, meaning that the findings may entail measurement errors.
As any observable phenomena are produced not only by the underlying target constructs but also by some unpredictable random factors (i.e., measurement error), scholars in the human sciences have adopted the concept of latent variables and calculated them based on the covariance of multiple observed variables (for an overview, see Bollen, 2002; Kline, 2016). To further clarify the CF-UF link, the current study therefore examines it at the level of constructs by means of latent variable analyses.

Literature Review
Cognitive Fluency
CF is concerned with how efficiently L2 speakers operate their systems of speech production (Segalowitz, 2010). The validity of operationalization of CF can thus be discussed in terms of how different cognitive and linguistic processes in L2 speech production are reflected in utterances. L2 speech production models (Kormos, 2006; Segalowitz, 2010) are commonly based on Levelt's (1989, 1999) work and assume that L2 speech production entails three major phases (conceptualization, formulation, and articulation), which are executed serially in this order. Conceptualization is responsible for the generation of the preverbal message that includes selected information to convey and its manner of communication. Formulation transforms the preverbal message into corresponding linguistic forms through different linguistic encoding processes (e.g., lexical retrieval, syntactic procedures). Articulation proceeds by moving the speech organs to produce speech sounds. In addition to these major processes of speech production, the self-monitoring function examines the interim content and eventual outcome of the preceding processes in terms of appropriacy and linguistic correctness. Among these speech production processes, conceptualization is assumed to be relatively independent of L2 proficiency because conceptualization is responsible for the manipulation of conceptual information prior to linguistic encoding processes (Kormos, 2006). In contrast, formulation and articulation draw on L2 knowledge and skills and thus are categorized as L2-specific components of CF (Kahng, 2020; Segalowitz, 2016). One clear distinction between formulation and articulation is the level of representations of processing. Formulation involves several linguistic encoding modules, all of which manipulate different types of linguistic representations (e.g., lexical and phonological representations).
Articulation is considered to be purely motoric, meaning that the execution of articulation involves the use of gestural movements rather than information processing. Meanwhile, the self-monitoring function is related to both conceptual and L2-specific aspects of L2 speech production because it is driven by either content accuracy or linguistic errors identified in the course of speech production. Building on the notion of L2-specific CF (Kahng, 2020; Segalowitz, 2016), valid measurements of CF should therefore tap into formulation and articulation processes and self-monitoring processes triggered by language-related problems.
Looking closely at the literature on CF research, one may argue that previous studies have used both broad and narrow definitions of CF. In a narrow sense, in accordance with Segalowitz's (2016) original conceptualization, CF refers to the speed and efficiency of linguistic encoding processes. In a broad sense, often adopted in empirical studies (e.g., Kahng, 2020), CF may include linguistic knowledge resources as well as the speed of processing skills. For instance, lexical processing in L2 speech production is related to the range of available lexical resources (i.e., breadth and depth of vocabulary knowledge) as well as the speed of lexical retrieval (i.e., lexical fluency) (see Kormos, 2006). Following the narrow sense, only lexical fluency is regarded as a lexical component of CF, whereas the broad definition of CF concerns both the breadth and depth of vocabulary knowledge and lexical fluency as the lexical component of CF. According to Segalowitz's (2010, 2016) framework, CF is conceptualized as a construct that can explain observable temporal features of utterances (i.e., UF). From the perspective of L2 speech production mechanisms, breakdowns in utterances can be caused by both lack of linguistic resources and slow processing speed. The valid operationalization of CF may thus involve both linguistic resources and processing speed. Therefore, the current study follows the broad definition of CF and subsequently operationalizes CF as linguistic resources and processing efficiency at the level of vocabulary, grammar, and pronunciation (for a similar methodological decision, see Kahng, 2020). However, to the best of our knowledge, the dimensionality of CF (i.e., the number of subconstructs) has not yet been empirically examined. Accordingly, the current study also aims to test different factor structures of L2-specific CF.

Utterance Fluency
Within Segalowitz's (2010, 2016) framework, UF refers to observable temporal features, such as speed of delivery, pauses, and hesitations, which reflect the speaker's CF. There has been a consensus that UF is composed of three subcomponents: speed, breakdown, and repair fluency (Skehan, 2003; Tavakoli & Skehan, 2005). Speed fluency is concerned with the density of information or speed of delivery and thus is typically measured by articulation rate (i.e., the number of syllables produced over speech duration excluding pauses). Breakdown fluency refers to pausing behavior and is commonly operationalized in terms of the frequency, duration, type, and location of pauses (S. Suzuki et al., 2021; Tavakoli & Wright, 2020). Among the different dimensions of pausing behavior, recent studies have recognized the importance of pause location as an indicator of underlying speech processing. Pauses in the middle of utterances are hypothesized to reflect disruptions in L2-specific linguistic processing, whereas pauses at clausal boundaries are supposed to capture breakdowns in conceptualization-related processes, such as content planning (De Jong, 2016; Tavakoli, 2011). Finally, repair fluency covers, by definition, a range of disfluency phenomena, including self-corrections, false starts, and verbatim repetitions. Some scholars argue that repair fluency is in a supplementary relationship with breakdown fluency because both breakdown and repair fluency are assumed to reflect the operation of self-monitoring processes (i.e., covert and overt repairs; see Kormos, 2000, 2006) and are regarded as opportunities for speakers to buy time to deal with disruptions in speech processing (De Jong et al., 2015). As such, some studies even examine breakdown and repair fluency as inseparable phenomena (e.g., Williams & Korko, 2019).
L2 fluency research has conventionally measured temporal features of speech, following Tavakoli and Skehan's (2005) triad model of UF (speed, breakdown, and repair fluency). The triad model was empirically validated to examine the extent to which fluency is distinguishable from other constructs of L2 oral proficiency such as accuracy and complexity, and to establish the robustness of the model across four different prompts of picture narrative tasks. The results of factor analysis in Tavakoli and Skehan's (2005) study indicated two separate factors of UF: one including both speed and breakdown fluency and the other repair fluency. The finding that speed and breakdown fluency were indistinguishable might have been due to the lack of the measure that taps solely into speed fluency, that is, articulation rate. Following Tavakoli and Skehan's (2005) study, different UF measures with high construct validity, such as articulation rate and mid-clause pause frequency, have been developed in L2 fluency research. In addition, even though the triad model of UF was validated only with speech data from picture narrative tasks, the model has been applied to a variety of speaking tasks, going beyond picture narratives. Therefore, to test the validity of Tavakoli and Skehan's (2005) model of UF in diverse research contexts, it is essential to revisit the dimensionality of UF, using a comprehensive set of UF measures based on different speaking tasks.

The Cognitive-Utterance Fluency Connection
According to Segalowitz's (2010, 2016) framework, a speaker's CF is assumed to underlie the UF of their speech. Although few in number, previous studies have examined what cognitive and linguistic processes can explain variability in UF. Even before Segalowitz's (2010) work, Segalowitz and Freed's (2004) pioneering research investigated the role of L2-specific cognitive ability in L2 oral fluency with English-speaking learners of Spanish (N = 40). Using a semantic classification task and a repeat-and-shift task in both L1 and L2, they computed L2-specific cognitive measures for lexical access and attention control by partialing out corresponding L1 measures. They found that the length of run without fillers in L2 speech was positively associated with the speed of L2 lexical access. Meanwhile, L2 speech rate correlated negatively with the processing stability of L2 attention control, measured by the coefficient of variation (CV) index. Despite the narrow range of cognitive processing measures, these findings confirmed the role of cognitive ability in L2 UF.
Building on Segalowitz's (2010) framework of oral fluency, De Jong et al. (2013) employed a range of linguistic resource and processing measures to predict different UF measures. Their data were collected from 179 learners of L2 Dutch from various L1 backgrounds. Their CF measures covered vocabulary knowledge (vocabulary size, lexical retrieval speed), grammatical knowledge (grammatical knowledge, sentence construction speed), and pronunciation knowledge (phonetic accuracy, articulatory speed). Their UF measures captured speed, breakdown, and repair fluency. Correlational analyses showed that relevant components of CF varied across UF measures. For instance, mean syllable duration, that is, the inverse measure of articulation rate (speed fluency), correlated with a whole range of CF measures. Meanwhile, breakdown fluency measures were related to more specific dimensions of CF. Mean duration of pauses correlated weakly with lexical retrieval speed only. Moreover, both silent and filled pause ratio measures correlated with lexical and grammatical measures. In addition, their linear mixed-effects modeling included a random slope of speaking task types for all UF measures, except self-repetition ratio, indicating the moderating role of task type in the CF-UF link.
Kahng (2020) examined the predictive power of CF measures for UF measures, using a personal narrative task with Chinese learners of English (N = 44). Uniquely, Kahng (2020) included corresponding L1 UF measures as another predictor variable to partial out the covariance between L1 and L2 UF measures. In her study, CF measures covered vocabulary size for single words and multiword phrases, lexical retrieval speed, grammatical resources and processing speed, and articulatory speed, largely following De Jong et al. (2013). The stepwise multiple regression analyses resulted in three major findings.
First, although mean syllable duration (speed fluency) and mid-clause pause ratio (breakdown fluency) correlated with both lexical and syntactic measures of CF, different CF measures were identified as predictor variables in the regression models. Mean syllable duration was predicted from lexical measures of CF (lexical retrieval speed, phrasal vocabulary size), whereas mid-clause pause ratio was predicted from the measure of syntactic processing speed. This finding indicates that the primary component of CF can vary across the dimensions of UF. Second, the regression models of mid-clause pause ratio and self-correction ratio did not include corresponding L1 UF measures as predictor variables. This finding suggests that pauses in the middle of clauses and self-repair may specifically reflect L2-specific processing. Third, the strongest predictors in the regression models of mean pause duration and filled pause ratio were their corresponding L1 UF measures, suggesting that the length of silent pauses and the frequency of filled pauses are more closely related to language-general idiosyncratic factors than to L2-specific CF.
Taken together, previous studies suggest two common patterns with regard to the CF-UF link. First, different components of CF may be associated with different dimensions of UF to varying degrees. Therefore, for a better understanding of the CF-UF link, it is essential to consider the dimensionality of CF and UF. Second, the association between CF and UF can vary, depending on the speaking task design (De Jong et al., 2013). However, it is still unclear what task design features moderate the CF-UF link because, in their study, De Jong et al. (2013) treated speaking tasks as random-effects predictors in their regression models. Meanwhile, previous studies employed different measurements of CF and analyzed scores only at the level of observed variables. It can thus be argued that the findings of previous studies may entail measurement errors to some extent. Therefore, L2 fluency research may be extended by examining the CF-UF link at the level of constructs by means of latent variable analyses.

The Current Study
Motivated by the scarcity of studies examining the CF-UF link, the current study examines the relationship between CF and UF across four speaking tasks at the level of constructs, as well as the dimensionality of CF and UF. Accordingly, the study employed a cross-sectional design to investigate the factor structure of CF and UF. Building on Segalowitz's (2010) original framework, we predicted the latent variables of UF (i.e., outcome variables) from those of CF (i.e., predictor variables), using structural equation modeling (SEM). We also included one moderator variable, that is, speaking task. Following De Jong et al. (2013) and Kahng (2020), this study operationalized CF as a set of linguistic resources and processing skills. Each dimension of UF, that is, speed, breakdown, and repair fluency, was also measured. Furthermore, to examine the moderating role of speaking tasks in the CF-UF link, we employed four speaking tasks: argumentative task, picture narrative task, reading-to-speaking task, and reading-while-listening-to-speaking task. The current study is guided by the following research questions:
RQ1. What is the relationship between cognitive fluency measures of lexical, grammatical, and pronunciation knowledge?
RQ2. What is the relationship between utterance fluency measures of speed, breakdown, and repair fluency?
RQ3. To what extent do components of cognitive fluency contribute to different dimensions of utterance fluency?
RQ4. To what extent is the cognitive-utterance fluency link (RQ3) moderated by speaking tasks?

Participants
To reach adequate statistical power, the minimum sample size was determined by the ratio of the sample size to the number of variables. Traditionally, the optimal ratio for confirmatory factor analysis (CFA) can range from 5 to 10 (Kyriazos, 2018). As a total of 20 observed variables (11 UF measures and 9 CF measures) was predetermined for the current study (see "Analysis" section), the minimum sample size was set at N = 100. A total of 128 Japanese learners of English, ranging from 18 to 27 years of age (M age = 20.43, SD age = 1.81), participated voluntarily in the current study (Female = 73, Male = 55). Their self-reported university placement test scores suggested that most of them could be placed at the B1-B2 levels of the Common European Framework of Reference (CEFR; Council of Europe, 2001) scale, while some of them seemed to have reached C1 level.
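The rule of thumb behind this minimum can be written out as simple arithmetic. The following is a minimal Python sketch; the function name and its default ratios are illustrative assumptions and are not part of the study's materials.

```python
from typing import Tuple


def sample_size_range(n_observed: int, low_ratio: int = 5,
                      high_ratio: int = 10) -> Tuple[int, int]:
    """Return the sample sizes implied by a participants-per-variable ratio."""
    return n_observed * low_ratio, n_observed * high_ratio


# 11 UF measures and 9 CF measures, as stated in the text.
n_variables = 11 + 9
minimum_n, upper_n = sample_size_range(n_variables)
print(minimum_n, upper_n)  # 100 200
```

With a ratio of 5, the 20 observed variables give the minimum of N = 100 reported above; the recruited sample of 128 falls between the two bounds.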

Speaking tasks
The current study aims to examine the moderator effects of speaking task design on the CF-UF link. Given the mechanisms of speech production as theoretical underpinnings of CF (Segalowitz, 2010, 2016), we selected task design features based on the framework of speech processing demands (e.g., Préfontaine & Kormos, 2015; Skehan, 2009), targeting three speech processing characteristics: content planning (i.e., conceptualization), the preemptive activation of relevant linguistic items, and the availability of phonological information. To manipulate these speech processing components, the study employed four speaking tasks: (a) an argumentative speech task, (b) a related picture narrative task, (c) a reading-to-speaking (RtoS) task, and (d) a reading-while-listening-to-speaking (RwLtoS) task. All the task prompts are available on OSF (https://osf.io/95rzj/). In the argumentative task, students were provided with a statement and argued to what extent they agreed or disagreed with it (S. Suzuki & Kormos, 2020), while in the picture narrative task, they were asked to describe an 11-frame cartoon adopted from Préfontaine and Kormos (2015; available in the IRIS database, https://www.iris-database.org). In both RtoS and RwLtoS tasks, students were instructed to read a 300-word expository text written in English and to retell the content of the text. However, these tasks differed in the modality of the source text presentation. The RtoS task offered a written text (i.e., reading-only), while a bimodal text (reading-while-listening) was provided in the RwLtoS task. To minimize the effects of source texts, we prepared two comparable texts adapted from Millington (2019), and the audio input for the bimodal source text was recorded by an L1 Canadian English speaker with 15 years of English teaching experience at universities in Japan. There are three intended contrasts between these tasks.
First, comparing the argumentative task with the other three tasks, the moderating role of the necessity for content planning in the CF-UF link can be examined. Second, the contrast between the picture narrative task and both RtoS and RwLtoS tasks may offer insights into how the CF-UF link is affected by the preemptively enhanced activation of linguistic items by means of the source text presentation. Third, the impact of the availability of phonological information on the CF-UF link can be examined by contrasting the RtoS and RwLtoS tasks.

Utterance Fluency Measures
Following previous studies, we targeted three major aspects of UF: speed, breakdown, and repair fluency (Tavakoli & Skehan, 2005). There is one measure that only taps into the construct of speed fluency, that is, articulation rate, or its inverse measure, mean duration of syllables. However, to construct a latent variable, more than two observed variables are ideally loaded onto the latent variable to avoid an underidentified model (Brown, 2006). We thus included two composite measures, speech rate and mean length of run, as the measures of speed fluency. The selected UF measures are listed as follows:

Speed fluency
1. Articulation rate (AR). The mean number of syllables produced per second, divided by total phonation time (i.e., total speech duration excluding pauses).

Composite measures
2. Speech rate (SR). The mean number of syllables produced per second, divided by total speech duration time, including pauses.
3. Mean length of run (MLR). The mean number of syllables produced in utterances between pauses.

Breakdown fluency
4. Mid-clause pause ratio (MCPR). The mean number of silent pauses within clauses, divided by the total number of syllables produced.
5. End-clause pause ratio (ECPR). The mean number of silent pauses between clauses, divided by the total number of syllables produced.
6. Filled pause ratio (FPR). The mean number of filled pauses, divided by the total number of syllables produced.

Repair fluency
9. Self-correction ratio (SCR). The mean number of self-correction behaviors, divided by the total number of syllables produced.
10. False start ratio (FSR). The mean number of false starts/reformulations, divided by the total number of syllables produced.
11. Self-repetition ratio (SRR). The mean number of self-repetitions, divided by the total number of syllables produced.
All the speech data were transcribed and then annotated for the boundaries of clauses.
To minimize collinearity across different constructs of UF, temporal features for breakdown and repair fluency were standardized by the number of syllables produced in pruned transcripts rather than speech duration because speech duration can entail variability in speed fluency. To annotate temporal features, Praat software was used (Boersma & Weenink, 2012). After annotating and excluding disfluency features, the number of syllables produced in pruned transcripts was calculated. Following prior research (S. Suzuki et al., 2021), the threshold of silent pauses was defined as 250 ms. With the assistance of automated detection of silence, clause boundaries and pause locations were annotated in TextGrid files of Praat. To ensure the validity of pause identifications, automatically annotated boundaries of silences and sounds were manually checked and, if necessary, modified.
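To make the measure definitions concrete, the sketch below shows one way the listed ratios could be computed from an annotated sample. It is an illustration under stated assumptions, not the study's actual Praat-based pipeline: the `SpeechSample` container and `utterance_fluency` function are hypothetical names, a silence is assumed to count as a pause when it lasts at least 250 ms, phonation time is assumed to exclude silent pauses only, and n silent pauses are assumed to delimit n + 1 runs. The input values are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

PAUSE_THRESHOLD_S = 0.25  # 250 ms silent-pause threshold stated in the text


@dataclass
class SpeechSample:
    """Hypothetical container for one annotated speech sample."""
    syllables: int                     # syllables in the pruned transcript
    total_duration_s: float            # total speech duration, pauses included
    silences: List[Tuple[float, str]]  # (duration in s, "mid" or "end" of clause)
    filled_pauses: int
    self_corrections: int
    false_starts: int
    self_repetitions: int


def utterance_fluency(s: SpeechSample) -> Dict[str, float]:
    # Assumption: silences shorter than the threshold are not pauses.
    pauses = [(d, loc) for d, loc in s.silences if d >= PAUSE_THRESHOLD_S]
    # Assumption: phonation time excludes silent pauses only.
    phonation_s = s.total_duration_s - sum(d for d, _ in pauses)
    n_mid = sum(1 for _, loc in pauses if loc == "mid")
    n_end = sum(1 for _, loc in pauses if loc == "end")
    return {
        "AR": s.syllables / phonation_s,
        "SR": s.syllables / s.total_duration_s,
        # Assumption: n silent pauses delimit n + 1 runs.
        "MLR": s.syllables / (len(pauses) + 1),
        "MCPR": n_mid / s.syllables,
        "ECPR": n_end / s.syllables,
        "FPR": s.filled_pauses / s.syllables,
        "SCR": s.self_corrections / s.syllables,
        "FSR": s.false_starts / s.syllables,
        "SRR": s.self_repetitions / s.syllables,
    }


# Invented values for illustration only; not data from the study.
sample = SpeechSample(
    syllables=120, total_duration_s=42.0,
    silences=[(0.5, "mid"), (1.0, "end"), (0.2, "mid"), (0.5, "mid")],
    filled_pauses=6, self_corrections=3, false_starts=2, self_repetitions=4,
)
measures = utterance_fluency(sample)
print(measures["AR"], measures["MLR"])  # 3.0 30.0
```

Dividing the breakdown and repair counts by syllables rather than by duration mirrors the standardization choice described above, which keeps those measures from inheriting variability in speed fluency.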

Vocabulary Knowledge
In L2 speech production, vocabulary knowledge mainly plays a role in lexical retrieval where the speaker activates and selects lexical items from the mental lexicon that match the conceptual meaning of the message (Kormos, 2006). We thus assessed speakers' vocabulary size and lexical retrieval speed.

Vocabulary Size
To estimate the speakers' vocabulary size, the study used the Productive Vocabulary Levels Test (PVLT; Laufer & Nation, 1999). In the PVLT, participants were asked to fill in a blank in a sentence in the paper format version of the test. Considering the expected proficiency levels of the participants, the study administered tests of 2,000, 3,000, and 5,000 frequency levels (excluding the 10,000 level and university word list).
To avoid collinearity with lexical retrieval speed, participants were not given a time limit for their responses. The score for vocabulary size was computed as the total number of correct responses out of 54 items (18 items from each level). Following De Jong et al. (2013), inflectional errors and obvious spelling mistakes were ignored.

Lexical Retrieval Speed
To assess the speakers' speed of lexical retrieval, a picture naming task was employed (Leonard & Shea, 2017). Participants were presented with pictures and instructed to name each picture orally in English as fast and accurately as possible. Target stimuli were selected from Snodgrass and Vanderwart (1980). The final set of picture stimuli for the study included 50 pictures (for the selection procedure, see Supplementary Information).
The current study administered the picture naming task using the PsychoPy software package (Peirce, 2007). Following , participants were first presented with a fixation cross in the middle of the screen for 1,500 ms, followed by a picture stimulus with a 10,000-ms response deadline. The order of the picture stimuli was randomized for each participant. Prior to the main trials, three practice trials were conducted.
Lexical retrieval speed was computed as the average reaction time (RT) for correct responses. RT was calculated as the response latency between the onset of the presentation of picture stimuli and that of the participants' response. Incorrect responses and outliers were treated as missing values. Outliers were identified as RTs below the minimum of 300 ms and RTs higher than 3 SD above the group mean for each item. As a result, 2.4% of correct responses (k = 127 out of 5,375) were removed.
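The trimming procedure can be sketched in Python as follows. This is a hedged illustration rather than the authors' script: the function name and the tuple layout of `trials` are hypothetical, and the item-level mean and SD are assumed to be computed over all correct responses to that item before trimming. The trial values are invented.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_RT_MS = 300  # lower bound stated in the text


def mean_rt_by_participant(trials, sd_cutoff=3.0):
    """trials: iterable of (participant, item, rt_ms, correct) tuples."""
    # Incorrect responses are treated as missing values.
    correct = [t for t in trials if t[3]]

    # Upper bound per item: group mean + 3 SD over correct responses (assumption).
    rts_by_item = defaultdict(list)
    for _, item, rt, _ in correct:
        rts_by_item[item].append(rt)
    upper = {}
    for item, rts in rts_by_item.items():
        sd = stdev(rts) if len(rts) > 1 else 0.0
        upper[item] = mean(rts) + sd_cutoff * sd

    # Keep RTs inside the bounds and average them per participant.
    kept = defaultdict(list)
    for pid, item, rt, _ in correct:
        if MIN_RT_MS <= rt <= upper[item]:
            kept[pid].append(rt)
    return {pid: mean(rts) for pid, rts in kept.items()}


# Invented trials for illustration only; not data from the study.
trials = [
    ("p1", "cat", 800, True),
    ("p1", "dog", 250, True),   # below 300 ms, removed
    ("p1", "sun", 900, False),  # incorrect, removed
    ("p2", "cat", 1000, True),
    ("p2", "dog", 700, True),
]
scores = mean_rt_by_participant(trials)
```

In this toy example, the first participant's score rests on a single retained trial, whereas the second participant's score averages two.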

Grammatical Knowledge
Grammatical processing in L2 speech production entails a variety of syntactic and morphological processes, such as syntactic procedures and morphological inflections (Kormos, 2006). Accordingly, we evaluated students' grammatical knowledge in terms of their accuracy and efficiency in syntactic encoding skills and grammatical monitoring processes.

Syntactic Encoding Skills
The study used the maze task, which is designed to measure the automaticity of syntactic processing (Y. Suzuki & Sunada, 2018). In this task, participants were presented with two single-word options on a computer screen and instructed to select the word that could grammatically continue the sentence being constructed (e.g., The → student vs. and → ocean vs. took → the vs. dress → tests. vs. organic.).
Stimuli were adapted from Y. Suzuki and Sunada's (2018) study, which consisted of 48 sentences with 12 sentences for each of the four major syntactic structures: (a) declaratives, (b) wh-questions, (c) relative clauses, and (d) indirect questions. The order of sentence stimuli was randomized for each participant. Prior to the main trials, four practice sentences were provided. The time limit for each response was set at 4,300 ms, following Y. Suzuki and Sunada (2018). Participants were instructed to respond as quickly and accurately as possible. The maze task was administered using DMDX software (Forster & Forster, 2003).
The study computed two measures: (a) the number of correct responses in words and (b) the mean response latency (i.e., RT) for correctly answered trials. Regarding the RT measure, outliers were identified as RTs below 300 ms or higher than 3 SD above the group mean of the latency of all word-level responses. As a result, 68 RTs (6.6%) out of 49,406 RTs were removed.
To capture participants' grammatical knowledge in the monitoring mode, we employed a timed grammaticality judgment test (GJT; Godfroid et al., 2015). Target stimuli were adapted from Godfroid et al.'s (2015) study, which included 17 target grammatical features. For each grammatical target, four sentence stimuli were devised (68 sentences in total) with two for each of the grammatical and ungrammatical conditions. Considering the relatively low proficiency of the target population, we used written stimuli.
The timed GJT was administered using PsychoPy software (Peirce, 2007). Participants were instructed to judge the grammaticality of the sentences as fast and accurately as possible. Prior to the main 68 trials, participants completed eight sentences as practice trials. For each trial, the term "Ready?" was presented in the middle of the screen for 1,000 ms, and then the sentence stimulus appeared on the screen for 10,000 ms.
To compute accuracy scores based on GJT responses, we assigned one point for each correct response, while incorrect responses and no responses within the time limit were assigned no points. Only correct responses were used to compute RT scores, excluding outliers whose RT was below 300 ms or higher than 3 SD above the group mean for each sentence stimulus. Eventually, 28 RTs (0.4%) were removed from the RT analysis. We calculated accuracy and RT scores separately for syntactic and morphological features.

Articulatory Skills
The current study solely focused on the speed aspect of pronunciation knowledge, given the substantive difficulty of defining what constitutes targetlike pronunciation (Harding, 2018).¹ Moreover, prior work reported that a significant slowdown in L2 oral production may result from the speed of articulatory movements rather than the accuracy and speed of phonological processing (Broos et al., 2018). However, due to the incremental nature of speech processing (Kormos, 2006), we measured the efficiency of pronunciation-related processes holistically, using a controlled speech production task. The rationale for using controlled speech production, as opposed to single word production (e.g., delayed picture naming task; De Jong et al., 2013), was that one of the essential processes of phonological encoding, syllabification, is supposed to take place not only within words but also between words, such as linking (Levelt, 1999).
Participants were asked to read a 69-word passage of an instruction on shopping silently and then read it aloud in English. The passage was adapted from Weinberger's (2011) speech accent archive (see http://accent.gmu.edu/index.php). Based on the speech data, we computed the articulation rate measure, applying the same procedure as the one for speed fluency.
¹ As the accuracy or accent of pronunciation is evaluated as a deviation from a targetlike benchmark, it is necessary to define targetlike pronunciation for the assessment of pronunciation accuracy. However, due to the fact that there are different models of L2 pronunciation learning, especially in English, the assessment of pronunciation entails a substantive difficulty in defining what constitutes targetlike pronunciation (Kang & Ginther, 2018). Given the potential challenge to the validity of pronunciation accuracy, we thus decided not to include a cognitive fluency measure for pronunciation accuracy.

Data Collection Procedure
Data were collected in two sessions: group and individual sessions. Both sessions were conducted in a research laboratory and lasted for approximately 1 hour. In the group session, participants worked individually and completed CF tests including the paper-based PVLT, the maze task, and the GJT. In the individual sessions, participants performed four English speaking tasks, the controlled speech production task, and the picture-naming task, in this order. All participants first took part in the group session, and approximately 1 week later they participated in individual sessions. In the group testing session, the order of the PVLT and the grammar tests (the maze task and the GJT) was counterbalanced across participants. In the individual sessions, the order of the argumentative and picture narrative tasks was also counterbalanced across participants. Regarding the RtoS and RwLtoS tasks, the combination of the text presentation mode and source texts as well as its order was counterbalanced across participants.

Analysis
The current study investigates the CF-UF link at the level of latent variables (RQ3) and its variability across tasks (RQ4), using SEM. Prior to SEM analysis, we constructed several theoretically motivated CFA models of CF and UF and tested their model fit to identify the optimal factor structure of CF and UF (RQ1, RQ2). An SEM model was built to predict the latent variables of UF from those of CF. In response to the nonnormal distributions of many UF measures (for descriptive statistics, see Supplementary Information), estimations of all CFA and SEM models were made using Robust Maximum Likelihood estimation (Hu & Bentler, 1998).
Considering the relatively small sample size (N < 250) as well as the estimation method (i.e., Maximum Likelihood estimation), we focused on the model fit indices of SRMR and CFI (Hu & Bentler, 1998), while reporting the indices of chi-square/df ratio, TLI, and RMSEA for the sake of comparability with future replication studies. The cutoff scores for these model fit indices were predetermined as follows: SRMR (< .08), CFI and TLI (> .95), chi-square/df ratio (< 2.0), and RMSEA (< .06). To address RQ3, the statistical significance of the regression paths from the latent variables of CF to those of UF was tested. As for RQ4, the regression coefficients of paths were compared across four speaking tasks using standardized coefficients and their 95% confidence intervals, which is analogous to the estimation of t-values in t-tests (i.e., path coefficient t-test; Tabachnick & Fidell, 1996). For the sake of interpretability of results, CF measures based on RT and breakdown and repair fluency measures were inverted in the CFA and SEM analyses. All the CFA and SEM models were estimated through the cfa function in the lavaan package (Rosseel, 2012), using R statistical software 4.0.2 (R Development Core Team, 2020).
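The two decision rules in this paragraph (fit-index cutoffs and confidence-interval comparison of coefficients) can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code; the function names and the dictionary keys are assumptions. The cutoffs are those stated above, and the fit values in the usage lines are the CFI and SRMR values reported for the single- and two-factor CF models in the Discussion.

```python
def check_fit(indices):
    """Compare fit indices against the cutoffs stated in the text.

    `indices` maps an index name to its value; names not listed in
    `cutoffs` are ignored. Returns {name: True/False}.
    """
    cutoffs = {
        "srmr": lambda v: v < 0.08,
        "cfi": lambda v: v > 0.95,
        "tli": lambda v: v > 0.95,
        "chisq_df": lambda v: v < 2.0,
        "rmsea": lambda v: v < 0.06,
    }
    return {
        name: rule(indices[name])
        for name, rule in cutoffs.items()
        if name in indices
    }


def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) confidence intervals overlap."""
    lower_a, upper_a = ci_a
    lower_b, upper_b = ci_b
    return lower_a <= upper_b and lower_b <= upper_a


# Fit values reported for the CF models in the Discussion.
two_factor = check_fit({"cfi": 0.976, "srmr": 0.051})
single_factor = check_fit({"cfi": 0.919, "srmr": 0.078})
```

Under this reading, two coefficients are treated as differing when their 95% intervals do not overlap. That is a conservative criterion, and the exact comparison procedure used in the study may differ in detail.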

Confirmatory Factor Analysis of Cognitive Fluency
To specify the factor structure of CF (RQ1), three proposed CFA models were tested. For these proposed CFA models of CF, residual covariances were set across CF tasks (e.g., the RT and accuracy measures of the maze task). The R code and anonymized data set are available on the OSF website (https://osf.io/95rzj/).
The first model (CF Model 1; see Figure 1) was a single-factor model, which assumes that CF is a unitary construct. One statistical advantage of a single-factor model is that the model is constructed with the minimum number of parameters, meaning that the estimation of the proposed model is relatively robust for a small sample size.
The second model (CF Model 2; see Figure 2) consisted of two subconstructs of CF, namely, linguistic resource and processing speed. These two subconstructs were conceptualized in accordance with the distinction made in empirical studies (e.g., Kahng, 2020) and the theoretical assumption of causes for breakdowns in utterances (see "Cognitive Fluency" section). The latent variable of linguistic resource consisted of CF measures capturing the range of linguistic resources (the PVLT score, the accuracy score of the maze task, and the accuracy scores of the GJT), whereas the latent variable of processing speed was composed of RT-based measures and the articulatory speed measure.
Finally, we proposed a three-factor model that comprises linguistic resource, processing speed, and monitoring speed (CF Model 3; see Figure 3), separating monitoring processes from encoding processes. In L2 speech production, linguistic resources for monitoring processes are identical to those for linguistic encoding processes (Kormos, 2006; Levelt, 1999). Therefore, only the RT measures of the GJT (GJT Morphology RT, GJT Syntax RT) were used to create the third latent variable of CF, that is, monitoring speed. The indices of SRMR and CFI indicated an optimal fit for all three models, whereas two- and three-factor models showed a slightly better fit than the single-factor model (see Table 1). In principle, the more parsimonious the model is (i.e., fewer parameters), the more robust is the estimation of the model (Schoonen, 2015). We thus adopted the two-factor model for the factor structure of CF (for the model parameters of the CF Model 2, see Supplementary Information).

Confirmatory Factor Analysis of Utterance Fluency
We proposed several CFA models for UF. First, due to its advantage of statistical robustness, a single-factor model was proposed (UF Model 1, see Figure 4). Second, motivated by speech production mechanisms, we proposed a two-factor model (UF Model 2; see Figure 5) by categorizing the temporal features of speech into processing smoothness and processing disruptions.

Finally, following Tavakoli and Skehan (2005), a three-factor model (UF Model 3; see Figure 6) was proposed, consisting of speed, breakdown, and repair fluency. In the proposed CFA models of UF, residual covariances were set between mid- and end-clause pause ratio measures and mean length of run because their measurement errors are commonly attributed to pause annotation.
Although the three-factor model showed a relatively better fit across tasks, none of the proposed models optimally fit the data. To explore a better CFA model, a data-driven approach was taken to modify the factor structures. First, overall intercollinearity among the UF measures was inspected by means of correlation coefficients pooled over tasks. We then excluded speech rate due to its strong correlation with mid-clause pause ratio (r = .845). In addition, motivated by the strong correlation between mid- and end-clause pause duration (r = .735), we also replaced mid- and end-clause pause duration with a single measure of mean pause duration without the distinction of pause locations. Second, modification indices were calculated to explore potential residual covariances and improve the model fit. The following three residual covariances were adopted: (a) between mean pause duration and filled pause ratio, (b) between mid-clause pause ratio and self-correction ratio, and (c) between end-clause pause ratio and false start ratio (for details of model modification, see Supplementary Information).
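The first screening step, inspecting pairwise correlations among the UF measures, can be sketched as below. This is a hedged illustration only: the 0.7 threshold, the function names, and the toy values are assumptions, since the text reports the observed correlations (r = .845 and r = .735) but does not state a fixed cutoff.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def flag_collinear(measures, threshold=0.7):
    """List measure pairs whose absolute correlation meets the threshold.

    `measures` maps a measure name to its pooled values. The threshold
    is an assumed value for illustration, not one taken from the study.
    """
    flagged = []
    for (name_a, x), (name_b, y) in combinations(measures.items(), 2):
        r = pearson_r(x, y)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Toy values only; these are not data from the study.
toy = {
    "measure_a": [1, 2, 3, 4, 5],
    "measure_b": [2, 4, 6, 8, 10],
    "measure_c": [5, 1, 4, 2, 3],
}
flagged = flag_collinear(toy)
```

Flagged pairs are candidates for exclusion or merging, which still requires a substantive judgment of the kind the authors describe for speech rate and pause duration.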
The revised models of UF (one-, two-, and three-factor models; UF Model 4, UF Model 5, and UF Model 6, respectively) were inspected for goodness of fit. SRMR indices indicated that all the models may fit well to the current data set, whereas the other model fit indices (e.g., CFI) consistently showed that the three-factor models better fit the current data set (see Table 2). The revised three-factor model of UF measures also suggested strong correlations between latent variables of speed and breakdown fluency (r = .929-.960), indicating redundancy in the distinction between these two latent variables. We thus proposed another factor structure with speed and breakdown fluency measures loaded onto one latent variable (UF Model 7; see Figure 7).
Although the model fit of the new model was virtually identical to the revised three-factor model (SRMR = .058-.070; CFI = .845-.922; see also Supplementary Information), we decided to adopt the revised three-factor model (UF Model 6), considering its theoretical compatibility with Tavakoli and Skehan's (2005) triad model of UF and L2 speech production mechanisms (Kormos, 2006; Segalowitz, 2010).

Structural Equation Model of the Cognitive-Utterance Fluency Link
Building on the CFA models of CF (CF Model 2) and UF (UF Model 6), an SEM model was constructed to predict the latent variables of UF (speed, breakdown, and repair fluency) from those of CF (linguistic resource, processing speed) separately for four speaking tasks. One additional residual covariance was included between the articulatory speed measure of CF and the articulation rate measure of UF in the SEM model because measurement errors of these measures can be methodologically shared.
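To make the structure of this model concrete, the sketch below assembles a model specification in lavaan-style syntax (`=~` for measurement, `~` for regression, `~~` for covariance) from the latent variables and indicators named in the text. It is not the authors' script, which is available on OSF. All variable names are hypothetical labels, and the sketch omits the other residual covariances and the covariances among latent variables reported elsewhere in the paper.

```python
def build_model_syntax(measurement, regressions, residual_covs=()):
    """Assemble a lavaan-style model string.

    measurement: {latent: [indicators]}      -> "latent =~ a + b"
    regressions: {outcome: [predictors]}     -> "outcome ~ x + y"
    residual_covs: iterable of (a, b) pairs  -> "a ~~ b"
    """
    lines = []
    for latent, indicators in measurement.items():
        lines.append(f"{latent} =~ " + " + ".join(indicators))
    for outcome, predictors in regressions.items():
        lines.append(f"{outcome} ~ " + " + ".join(predictors))
    for a, b in residual_covs:
        lines.append(f"{a} ~~ {b}")
    return "\n".join(lines)


# Hypothetical variable names following CF Model 2 and UF Model 6.
measurement = {
    "linguistic_resource": ["pvlt", "maze_acc", "gjt_morph_acc", "gjt_syntax_acc"],
    "processing_speed": ["picture_naming_rt", "maze_rt", "gjt_morph_rt",
                         "gjt_syntax_rt", "articulatory_speed"],
    "speed_fluency": ["articulation_rate", "mean_length_of_run"],
    "breakdown_fluency": ["mid_clause_pause_ratio", "end_clause_pause_ratio",
                          "filled_pause_ratio", "mean_pause_duration"],
    "repair_fluency": ["self_repetition_ratio", "self_correction_ratio",
                       "false_start_ratio"],
}
cf_latents = ["linguistic_resource", "processing_speed"]
regressions = {uf: cf_latents for uf in
               ["speed_fluency", "breakdown_fluency", "repair_fluency"]}

syntax = build_model_syntax(
    measurement,
    regressions,
    residual_covs=[("articulatory_speed", "articulation_rate")],
)
```

Because the model was fitted separately for the four speaking tasks, the same specification would be applied to each task's data in turn.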
The indices of goodness-of-fit were first inspected. The proposed SEM model showed an acceptable fit to the current data set on SRMR (< .08), although CFI fell short of the cutoff (CFI < .95; see Table 3), leaving some room for improvement in the model's fit to the data. The modification indices did not suggest paths that can be verified by a theoretical framework of oral fluency and were consistent across tasks. We thus regarded the model as the final model of the CF-UF link. The SEM model with standardized regression coefficients across tasks is visually presented in Figure 8. RQ3 is concerned with how the latent variables of CF are associated with the latent variables of UF. As summarized in Figure 8, speed fluency was associated with linguistic resource only in the RtoS and RwLtoS tasks, and with processing speed in all four tasks. Meanwhile, breakdown fluency was overall consistently related to both linguistic resource and processing speed across tasks. Despite the lack of significant differences, the latent variable of breakdown fluency seemed to show slightly stronger associations with processing speed (β = .376-.502) than with linguistic resource (β = .221-.345). As for repair fluency, linguistic resource significantly contributed to the construct of repair fluency only in speaking tasks where the content of speech was predefined (the picture narrative task, the RtoS task, the RwLtoS task). Meanwhile, processing speed was not related to repair fluency in any of the speaking tasks.
The SEM model suggested that the relative importance of linguistic dimensions differed between the latent variables of CF in terms of their range of confidence intervals (see Supplementary Information). Regarding linguistic resource, the regression coefficients of PVLT (β = .845-.879) were significantly higher than those of Maze Word Accuracy, except for the picture narrative task (β = .675-.691). As regards processing speed, the highest regression coefficients were found in Maze Word RT (β = .794-.821). According to the 95% confidence intervals, the difference in coefficient strength between Maze Word RT and GJT Syntax RT (β = .607-.620) did not reach statistical significance in any of the speaking tasks. Significant differences in the regression coefficients were only found between Maze Word RT and Picture Naming RT (β = .436-.453). The latent variables of linguistic resource and processing speed were strongly associated with each other consistently across tasks (β = .664-.676). Looking closely at the measurement models of UF constructs, the regression coefficients of articulation rate (β = .876-.905) to the latent variable of speed fluency seemed to be slightly higher than those of mean length of run (β = .721-.882). Regarding breakdown fluency, the coefficients of mid-clause pause ratio (β = .919-.963) were significantly higher than those of the other measures: mean pause duration (β = .528-.690), end-clause pause ratio (β = .373-.515), and filled pause ratio (β = .545-.628). As regards repair fluency, the regression coefficients of self-repetition ratio were significantly higher than those of self-correction ratio (except for the RtoS task) and false start ratio.
Finally, there were strong competitive relationships between the latent variables of speed fluency and breakdown fluency (β = -|.769-.822|) and between those of speed fluency and repair fluency (β = -|.720-.749|), whereas the latent variables of breakdown fluency were positively associated with those of repair fluency (β = .639-.796). 2

Discussion
Motivated by the lack of studies on the CF-UF link at the level of constructs, the current study examined the CF-UF link (RQ3), using SEM. We operationalized CF as a set of linguistic resources and processing skills involved in speech production, and each dimension of UF (speed, breakdown, and repair fluency) was also measured using four different speaking tasks. Furthermore, in L2 fluency research, the dimensionality of CF and UF had not been revisited, or even specified, especially concerning generalizability across different speaking tasks. Therefore, we also examined the factor structure of CF and UF by means of CFA (RQ1, RQ2). Finally, in light of the generalizability and robustness of the CF-UF link, we explored the variability in the association between the subconstructs of CF and UF across different speaking tasks (RQ4).

Dimensionality of Cognitive Fluency
We tested the single-, two-, and three-factor models of CF, all of which were proposed based on L2 speech production mechanisms (Kormos, 2006; Levelt, 1989; Segalowitz, 2010) and Segalowitz's (2010) conception of CF. We adopted the two-factor model that consisted of the latent variables of linguistic resource and processing speed (CFI = .976, SRMR = .051). The latent variable of linguistic resource involved the PVLT score (vocabulary size), GJT accuracy scores (syntax and morphology), and the maze task accuracy score (sentence construction skills), whereas that of processing speed included the RT measures of the picture naming task (lexical retrieval), the maze task, and the GJT, as well as articulatory speed in controlled speech production. The strong association between these two latent variables (r = .676) indicates that the subdimensions of CF (linguistic resource and processing speed) are interrelated. Compared to the final two-factor model, the single-factor model showed a less adequate fit to the current data (CFI = .919, SRMR = .078), indicating that the construct of CF may not be regarded as a unitary construct. The current finding of two-dimensionality of CF may thus provide supporting evidence for the broad definition of CF as well as the existing methodological practice of measuring CF components (e.g., Kahng, 2020).
The measurement models of the subconstructs of CF suggested that the primary components of linguistic resource and processing speed were different. To interpret the dimensionality of CF in relation to its contributions to UF, the measurement model of CF in the final SEM model is discussed. As for the latent variable of linguistic resource, PVLT (vocabulary size) had the highest regression coefficients (β = .845-.879). The regression coefficients of PVLT were significantly higher than those of Maze Word Accuracy, except for the picture narrative task (β = .675-.691). However, there were overlaps of confidence intervals between PVLT and GJT Syntax Accuracy (β = .710-.746). Students' performance in the maze task can be explained with reference to their efficiency in the application of syntactic encoding procedures in L2 (e.g., word order) as well as accessibility to the syntactic properties of lemmas in their mental lexicon. Meanwhile, the accuracy scores of syntactic items in the GJT may only represent the mastery of syntactic properties of target lemmas. Building on the assumption that the syntactic properties of lemmas (e.g., part of speech) are stored in speakers' mental lexicon (Kormos, 2006; Levelt, 1989), the accessibility of such syntactic properties of lemmas can be regarded as part of the construct of depth of vocabulary knowledge. As vocabulary size and depth are arguably closely related to each other (González-Fernández & Schmitt, 2020), the nonsignificant difference in the regression coefficients between PVLT and GJT Syntax Accuracy may be explained by the potential overlap between vocabulary size and depth. The relative strengths of those regression coefficients suggest that lexical resources can be regarded as a primary component of linguistic resource of CF in line with the lexically driven nature of L2 speech production (Kormos, 2006). The construct of linguistic resource in CF can thus be defined as the breadth and depth of linguistic knowledge to express speakers' intended message.
Regarding the latent variable of processing speed, the strongest regression path was Maze Word RT (β = .794-.821), which taps into the speed of sentence construction. Despite the slight overlaps of the boundaries of 95% confidence intervals, the regression path of Maze Word RT seemed stronger than those of GJT Syntax RT (β = .604-.620), GJT Morphology RT (β = .614-.626), and articulatory speed (β = .635-.663) in the SEM model. Note that the regression coefficients of Maze Word RT were clearly higher than those of Picture Naming RT (β = .436-.453). Therefore, the current results indicate that the primary component of processing speed may be the speed of sentence construction (measured by Maze Word RT). Such syntactic processing skills might also be more important than lexical retrieval speed within the construct of processing speed of CF (for a different pattern, see Kahng, 2020). One possible explanation for the primary role of syntactic processing skills in processing speed is that variability in the speed of linguistic processing might be aligned with variability in the automaticity of L2 syntactic knowledge (cf. McManus & Marsden, 2019; Morgan-Short et al., 2014). Taken together, the construct of processing speed can be defined as the automaticity of accessing and manipulating linguistic knowledge.

Dimensionality of Utterance Fluency
Motivated by theoretical conceptualizations of speech production mechanisms as well as Tavakoli and Skehan's (2005) triad model of UF, the current study tested single-, two-, and three-factor models of UF. Considering the theoretical distinction between speed and breakdown fluency, the three-factor model, following Tavakoli and Skehan (2005), was adopted as the final model of UF, suggesting that the construct of UF consists of speed, breakdown, and repair fluency. The optimal model fit in all four tasks (e.g., SRMR = .056-.070) indicated the generalizability and robustness of Tavakoli and Skehan's (2005) triad model of UF across different speaking tasks. Moreover, Tavakoli and Skehan's (2005) study only included two composite measures (speech rate, mean length of run) as measures of speed fluency, and these two measures and breakdown fluency measures loaded on the same latent variable in their study. Tavakoli and Skehan (2005) could thus only conceptually argue for distinguishability between speed and breakdown fluency. Meanwhile, the current study has provided statistical support for the distinction between speed and breakdown fluency by including the pure measure of speed fluency, that is, articulation rate.
The construct definition of each dimension of UF can be revisited with regard to the relative importance of observed variables within latent variables. As for speed fluency, the regression coefficients of articulation rate (β = .876-.905) seemed to be slightly higher than those of mean length of run (β = .721-.882). This may support the statistical procedure of handling mean length of run as a measure of speed fluency in the SEM analysis, despite its composite nature. The slightly lower regression coefficients of mean length of run to the latent variable may indicate that some variance in mean length of run can be derived from factors other than the construct of speed fluency, such as the construct of breakdown fluency. The primary component of speed fluency is thus arguably represented by the measure of articulation rate, which captures the whole range of speech processing mechanisms (Kormos, 2006; Segalowitz, 2010). Therefore, the construct of speed fluency can be defined as the overall efficiency of speech production.
Regarding breakdown fluency, the regression coefficients of mid-clause pause ratio (β = .919-.963) were significantly higher than those of the other breakdown fluency measures: end-clause pause ratio (β = .373-.515), filled pause ratio (β = .545-.628), and mean pause duration (β = .528-.690; except for the RtoS task). There were no significant differences in the regression coefficients among these three measures of mean pause duration, end-clause pause ratio, and filled pause ratio. Therefore, the representative component of breakdown fluency is the frequency of breakdowns in the middle of utterances, whereas the length of pauses and the frequency of pauses at clausal boundaries and filled pauses might be secondary. Mid-clause pauses are reflective of disruptions to L2-specific processing, such as lexical retrieval and sentence construction (De Jong, 2016; Tavakoli, 2011). Accordingly, the construct of breakdown fluency may represent L2 users' ability to continue speaking without disruptions to L2-specific speech processing.
As regards repair fluency, the regression coefficients of self-repetition ratio to the latent variable of repair fluency (β = .787-.860) tended, overall, to be significantly higher than those of false start ratio (β = .289-.459) and self-correction ratio (β = .487-.632). Accordingly, the frequency of self-repetitions can be regarded as the primary component of repair fluency, whereas both self-corrections and false starts are of secondary importance. The frequency of self-repetitions may be independent of L2 proficiency and reflective of learners' speaking style (De Jong et al., 2015). Alternatively, self-repetition can be used as a fluency strategy or problem-solving mechanism (Dörnyei & Kormos, 1998). Specifically, the use of self-repetitions can buy time for monitoring or retrieval processes, as lexicalized fillers do. From the perspective of speech production, another important assumption is that repair fluency is in a complementary relationship with breakdown fluency (Tavakoli & Wright, 2020). When a speaker experiences disruption to speech processing and is required to repair their utterance, the speaker can engage with the repairing process either by producing no speech (i.e., silent pauses) or repeating the previous utterance (i.e., self-repetition). The strategic use of self-repetition may be determined by the speaker's individual preference and might consequently obscure the association with L2 competence. Taken together, the construct of repair fluency reflects the ability to produce L2 speech without disfluency features.

Contribution of Cognitive Fluency to Utterance Fluency
The SEM model revealed the multidimensional interrelationship between CF and UF with some variations across four speaking tasks. The latent variable of processing speed of CF contributed to that of speed fluency consistently across speaking tasks (β = .431-.609). Meanwhile, the latent variable of linguistic resource made significant contributions to that of speed fluency only in the RtoS task (β = .234) and the RwLtoS task (β = .276). Therefore, the overall efficiency of speech production (speed fluency) can be primarily supported by the speed of linguistic processing skills. The consistent contributions of the speed dimension of CF to speed fluency in the current study may provide some supporting evidence for Segalowitz's (2016) claim that CF is mainly characterized by the speed of L2-specific linguistic processing. Meanwhile, the task-dependent role of linguistic resource in speed fluency can be interpreted with regard to the characteristics of RtoS and RwLtoS tasks, that is, the enhanced activation of relevant linguistic items by the source texts. If students have acquired those activated items for productive use, the enhanced activation of those items can assist students to use the items rapidly (cf. priming effects, McDonough & Trofimovich, 2008), subsequently increasing their overall efficiency of speech production (i.e., speed fluency). Therefore, the contributions of linguistic resources to speed fluency may increase when the mastery of relevant linguistic items plays a particularly important role in the completion of a given task.
The latent variable of breakdown fluency was associated with both dimensions of CF consistently across speaking tasks, despite the marginally significant contribution of linguistic resource in the RwLtoS task (p = .061). The results indicated that the ability to continue speaking without disruption may be underpinned by both the availability of linguistic resources and the speed of linguistic processing. This finding is in line with the broad definition of CF, which assumes that breakdowns in speech production can be caused by either lack of linguistic resources or slow processing speed (see Kormos, 2006; see also the "Cognitive Fluency" section). Moreover, the association of breakdown fluency with both dimensions of CF may give some insights into how the constructs of speed fluency and breakdown fluency are theoretically distinguishable, despite the strong correlation between them. Speed fluency was mainly related to the speed dimension of CF, whereas breakdown fluency was connected to the linguistic resource of CF as well as the processing speed component.
The significant contribution of linguistic resource to repair fluency was only found in the picture narrative task, the RtoS task, and the RwLtoS task (β = .330-.375). Meanwhile, the processing speed of CF was not associated with the latent variable of repair fluency in any of the speaking tasks. Previous studies have shown that the construct of repair fluency is relatively independent of L2 proficiency and reflective of individual speakers' speaking styles (De Jong et al., 2015; Peltonen, 2018). However, the current result may suggest that repair fluency is not entirely independent of L2-specific linguistic knowledge in some communicative situations where the content of speech is mostly predefined (i.e., closed task; see Pallotti, 2009). One essential characteristic of closed tasks is that students cannot avoid expressing some information to achieve the given task, even if they have not fully acquired the necessary linguistic items to convey the intended information. Students are thus required to engage with modifying the intended message or search for some alternative expressions using their own resources. As discussed previously, students can strategically or subconsciously use self-repetition to buy time to repair their utterances (Dörnyei & Kormos, 1998). Therefore, the contribution of linguistic resource to repair fluency may reflect engagement with repair due to the lack of linguistic resources needed to express task-essential information.

Conclusion
Our study is the first one to examine the CF-UF link at the level of constructs and offers novel insights into how the subconstructs of CF contribute to those of UF across different speaking tasks. Our research has demonstrated that the construct of CF consists of two dimensions (linguistic resource and processing speed) and confirmed the robustness of Tavakoli and Skehan's (2005) three-dimensional model of UF (speed, breakdown, and repair fluency) across tasks. Based on our analyses, we have argued that key components of linguistic resource in CF are the breadth and depth of linguistic knowledge needed for encoding speakers' intended message. This suggests that, similar to L1 speech production (Levelt, 1989, 1999), semantic knowledge is essential to ensure the efficiency of encoding L2 speech. We also found that the speed of sentence construction was a key component of the construct of processing speed, which highlights the important role of automaticity of syntactic encoding processes in L2 spoken performance (cf. Kormos, 2006). The SEM analysis also revealed a complex interplay between the multidimensionality of CF and UF and speaking task types. Speed fluency was primarily associated with processing speed, whereas linguistic resource might only play a role when relevant linguistic items are activated in advance by the input task (i.e., RtoS and RwLtoS tasks). Meanwhile, both linguistic resource and processing speed contributed to breakdown fluency consistently across speaking tasks, suggesting that linguistic encoding problems can occur due to both lack of resources and challenges in accessing and processing linguistic knowledge in real time. Finally, the contribution of linguistic resource to repair fluency was significant only when the content of speech was predefined (i.e., picture narrative task, RtoS task, RwLtoS task), whereas repair fluency was generally independent of processing speed. These results confirmed that the processing speed of CF showed a consistent pattern of contributions to UF across speaking task types, whereas the role of linguistic resource of CF in UF tends to vary, depending on task characteristics.
The current findings offer some insights into what linguistic objectives should be prioritized in relation to L2 fluency development. The CFA model of CF showed that vocabulary size was the primary component of linguistic resource, whereas sentence construction speed was the primary component of processing speed. Accordingly, vocabulary instruction should emphasize widening students' lexical repertoires for productive use (Webb et al., 2020), and grammar instruction should focus not only on accuracy but also on the speed and efficiency of grammatical encoding, which can be enhanced through meaningful and engaging practice activities (Y. Suzuki & DeKeyser, 2017). Articulatory speed was also found to be another component of the processing speed of CF, indicating that training on some suprasegmental features, such as linking and vowel reduction, may also facilitate students' fluent speech production (Saito et al., 2019). In addition, our SEM model showed that the construct of breakdown fluency may be consistent across tasks, whereas that of speed fluency and repair fluency could vary, depending on task characteristics. Therefore, breakdown fluency measures, such as mid-clause pause ratio (for predictive validity in perceived fluency, see S. Suzuki et al., 2021), could be adopted as a representative feature in automated scoring systems for oral proficiency.
Two significant methodological limitations need to be acknowledged in interpreting the current findings. First, we did not include measures of multiword sequences and pronunciation accuracy (cf. Kahng, 2020). The processing advantage of multiword sequences in L2 speech production has been advocated in L2 fluency research (Tavakoli & Uchihara, 2020). Similarly, despite the substantive difficulty in identifying targetlike pronunciation, previous studies have found some unique contributions of pronunciation, such as syllable structure errors, to listener-based judgments of fluency (S. Suzuki & Kormos, 2020). Due to the SEM approach, the latent variables of CF in the current study may encompass a certain amount of potential covariance with phraseological competence and pronunciation skills. Nevertheless, future studies can replicate the current study with additional CF measures of multiword sequences and pronunciation accuracy. Second, two composite measures (mean length of run, speech rate) were used as speed fluency measures for statistical reasons to avoid an underidentified model in CFA analyses. However, due to the intercollinearity among observed variables of speed fluency, the measure of speech rate was excluded from the CFA model of UF. Eventually, the measurement model of speed fluency was regarded as an underidentified model.