PREDICTING L2 FLUENCY FROM L1 FLUENCY BEHAVIOR

Abstract The article reports on the findings of a study investigating the relationship between first language (L1) and second language (L2) fluency behavior. Drawing on data collected from Turkish learners of English, the study also addresses the question of whether proficiency level mediates the relationship, if any. The data were coded for a range of breakdown, repair, speed, and composite measures. Language proficiency was measured by means of two tests: Oxford Placement Test (OPT) and an Elicited Imitation Task (EIT). The results show that some breakdown and repair measures were positively correlated in L1 and L2, but no correlations were observed for articulation rate and speech rate. The relationships were not mediated by proficiency level. Regression analyses show that a number of models predicted L2 fluency. L1 fluency contributed significantly to models predicting pausing behavior; EIT scores predicted L2 speech rate; and L1 fluency and OPT scores predicted L2 repair and mid-clause pauses. The important implications of the findings for fluency research and second language pedagogy are discussed.


INTRODUCTION
Research in second language (L2) oral fluency is inspired by at least two convincing rationales. First, recognizing the importance of fluency as a key construct of L2 ability, researchers have attempted to examine fluency in relation to other dimensions of linguistic output such as syntactic complexity and accuracy. Second, researchers are interested in fluency as a means toward developing a more profound understanding of L2 processing and production. Regardless of rationale and motivation, researchers in this area are most likely to agree that (a) fluency has a complex nature that is difficult to conceptualize and measure, (b) our current knowledge of what psychological and social factors contribute to fluency is limited (see Segalowitz, 2016, for a detailed discussion), and (c) more research is needed to extend our understanding of fluency.
In its short history, oral fluency research has yielded some important findings that help us develop not only a more in-depth understanding of fluency and its components but also a more reliable measurement of it. These findings also point to emerging evidence suggesting that L2 fluency behavior is, at least to some extent, influenced by L1 behavior and individual differences (Derwing et al., 2009; Huensch & Tracy-Ventura, 2016). This body of research implies that the amount of pausing and hesitation in L2 speech might, to some extent, mirror L1 fluency behavior, and that L2 fluency can, at least to some degree, be predicted from L1 fluency behavior. These findings, however, remain rather limited as few languages, language typologies, and individual differences have been examined so far.
One of the shortcomings in the current understanding of fluency is its relationship with proficiency level. While it is commonly agreed that fluency increases as proficiency improves, little research has examined whether the relationship is linear (ibid.). More central to the focus of this article is the question of whether L2 proficiency level plays a role in the relationship between L1 and L2 fluency. To the best of our knowledge, there has been little research investigating whether the relationship between L1 and L2 fluency persists consistently across assessed levels of proficiency, or to what extent L2 speakers' fluency behavior deviates from their L1 fluency behavior when they become more proficient L2 users. The current study aims to help fill this gap by examining the relationship between L1 Turkish and L2 English fluency behavior, and by probing whether proficiency level mediates this relationship.

L2 FLUENCY RESEARCH
Fluency in its broad sense is "a cover term for oral proficiency" (Lennon, 1990, p. 398), and is sometimes interchangeably used for overall proficiency. In the narrow sense of the term, however, fluency is regarded as a component of oral communicative ability, comprising both temporal and nontemporal aspects that influence how smoothly and effortlessly speech is delivered (Derwing et al., 2009). The present study is interested in fluency in the latter sense of the term. While we do not aim to add another definition to the literature, we find Koponen and Riggenbach's (2000) definition useful as it considers fluency as "flow, continuity, automaticity, or smoothness of speech" (p. 6).
Research in L2 fluency has shed light on the complex and multifaceted nature of fluency. Segalowitz (2010), in his triadic framework, proposes that fluency consists of three aspects: cognitive, utterance, and perceived fluency. Cognitive fluency is the "ability to efficiently mobilize and integrate the underlying cognitive processes responsible for producing utterances" (Segalowitz, 2010, p. 48); utterance fluency refers to the temporal characteristics of speech that can be measured objectively (e.g., speed, pauses, or hesitations); and perceived fluency concerns the listener's impression of the cognitive fluency of a speaker. While these three aspects are equally important and are inevitably interrelated, utterance fluency has been central to research in second language acquisition (SLA) (e.g., de Jong et al., 2015; Huensch & Tracy-Ventura, 2017; Kormos & Denes, 2004; Thai & Boers, 2016; Wright, 2013) as its concrete and measurable nature allows objective and systematic examination. A new classification of different aspects of fluency is proposed by Skehan et al. (2016), who argue that utterance fluency should be divided into speed, clause-level, and discourse-level fluency. They propose that clause processing, which focuses on lexical choices occurring within a clause, is linked to mid-clause pausing and influenced by the work of the Formulator and Articulator in Levelt's model (Levelt, 1989, see the following text for a brief discussion). In contrast, discourse-level processing, occurring above the clause level and dealing with issues and problems related to conjoining linguistic units, should be considered discourse-level fluency. The authors maintain that discourse-level disfluencies are linked to the Conceptualizer's demands, while the clause-level disfluencies are evidence of microplanning and connected to the Formulator.
Given that Skehan et al.'s proposal still needs empirical evidence from L2 research to be validated, we will adopt Segalowitz's (2010) classification of fluency in this study.
For measurement purposes, utterance fluency is generally divided into three categories: breakdown, speed, and repair (Skehan, 2003; Tavakoli & Skehan, 2005). Breakdown fluency concerns the flow of speech and examines pauses in terms of amount, location (mid- vs. end-clause position), and character (filled vs. silent). Speed fluency refers to the speed with which speech is produced (e.g., number of syllables per second), and repair fluency indicates repair strategies speakers employ to monitor and modify their utterances (e.g., false starts, repetitions, or corrections). A useful distinction made in this area is between pure and composite measures of fluency (Hunter, 2017; Tavakoli et al., 2017). While composite measures, that is, those combining two or more aspects of fluency (e.g., speed and breakdown fluency in Mean Length of Run), are known to be linked with human judgement of fluency (e.g., Kormos & Denes, 2004; Prefontaine & Kormos, 2016), pure measures, that is, those examining only one aspect of fluency (Skehan, 2014), are believed to reveal underlying processes of speech formulation and production (e.g., Huensch & Tracy-Ventura, 2017) and can help provide a more nuanced picture of the speech production processes.
L2 fluency research has also gained currency in view of the need for developing an L2 speech production model. Many researchers (de Bot, 1992; Kahng, 2014; Kormos, 2006; Skehan, 2009) have taken Levelt's (1989) model of L1 speech production as a starting point. According to this model, during the process of speech production a speaker goes through three stages: conceptualization, formulation, and articulation. During conceptualization, the speaker generates a preverbal message that then moves to the formulation stage, where an original linguistic message is formulated through grammatical and phonological encoding of lemmas accessed in the lexicon. This then moves to the articulation stage, where the linguistic plan is converted into overt speech. De Bot (1992) and Kormos (2006) claim that although similar processes may be in place between the two language processing systems, there is a range of L2-related phenomena that should be carefully considered in the development of an L2 speech processing model, including differences in L1 and L2 speech development and cross-linguistic influences. Kormos (2006) argues that, because L2 speech may not yet have become automatic, L2 processing requires an additional L2-specific knowledge store to act as a source of declarative memory for processing "knowledge of syntactic and phonological rules" (p. 178).
In effect, the degree of automaticity in speech production highlights one of the key differences between L1 and L2 processing (Kormos, 2006; Tavakoli, 2019).
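The distinction between pure and composite measures can be made concrete with a short sketch. The formulas below are common operationalizations and are assumptions for illustration; the text above does not specify how each measure was calculated, and the function names are hypothetical.

```python
# Illustrative sketch only: common operationalizations, not necessarily
# the exact formulas used in the study. All names are hypothetical.

def speech_rate(syllables, total_seconds):
    """Composite measure: syllables per second over total time, pauses included."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return syllables / total_seconds


def articulation_rate(syllables, total_seconds, pause_seconds):
    """Pure speed measure: syllables per second of phonation time, pauses excluded."""
    phonation_seconds = total_seconds - pause_seconds
    if phonation_seconds <= 0:
        raise ValueError("pause time must be shorter than total time")
    return syllables / phonation_seconds


def mean_length_of_run(syllables, n_pauses):
    """Composite measure: mean syllables per run, assuming runs = pauses + 1."""
    if n_pauses < 0:
        raise ValueError("n_pauses must be non-negative")
    return syllables / (n_pauses + 1)
```

For a hypothetical 60-second sample containing 180 syllables, 15 seconds of pausing, and 9 pauses, the sketch separates how fast the speaker articulates from how often the flow of speech breaks down.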
To develop a better conceptualization of L2 speech processing, therefore, it is necessary to examine the extent of the relationship between L1 and L2 fluency and the factors that may affect this relationship, for example, individual differences such as language proficiency. Derwing et al. (2009) argue that insight can be gained through an investigation of automatic processing in both languages or through a careful investigation of fluency in L1 and L2 speech production. The present study is primarily interested in the latter, that is, examining fluency in L1 Turkish and L2 English speakers. A brief overview of relevant research in this area is presented in the following text.

RESEARCH IN L1 AND L2 FLUENCY
Studies focusing on the relationship between L1 and L2 fluency are relatively scarce (e.g., de Jong et al., 2015; Derwing et al., 2009; Huensch & Tracy-Ventura, 2016; Riazantseva, 2001; Towell & Dewaele, 2005; Towell et al., 1996). One of the earliest studies on this topic is Riazantseva (2001). Comparing data elicited from advanced and intermediate Russian learners of L2 English, Riazantseva examined the pausing phenomena in L1 Russian and L2 English. Measuring pauses of longer than 100 ms, she found that the intermediate group exhibited more L1-like behavior than the advanced group and interpreted the differences in the light of L1 transfer, concluding that L2 speakers may overcome these effects as their L2 proficiency develops. The study design, however, did not include low-proficiency learners, nor did it examine the relationship between pausing behavior and other aspects of fluency.
In a longitudinal study of L2 immigrant learners in Canada, Derwing et al. (2009) investigated the extent to which temporal features of L1 speech can be related to L2 fluency characteristics. They collected data throughout 2 years (at 2 months, 10 months, and 1 year of residency in Canada) with 16 L1 Slavic and 16 L1 Mandarin speakers of L2 English, performing the same task in their L1 and L2. The data were analyzed for a number of temporal measures (i.e., number of pauses per second, speech rate, pruned syllables per second) and rated by trained judges. The fluency ratings demonstrated correlations between L1 and L2 measures, highlighting a stronger correlation for the Slavic L1 group than for the Mandarin L1 group. The results also indicated that L1 and L2 measures correlated significantly for both Slavic and Mandarin languages during the initial stages of language exposure, but at later stages only for the Slavic group. The authors concluded that the group differences might be attributed to different amounts of exposure to English or to a "closer relationship between Slavic languages and English than between Mandarin and English" (p. 534).
Perhaps the strongest evidence about the relationship between L1 and L2 fluency behavior comes from two recent studies: de Jong et al. (2015) and Huensch and Tracy-Ventura (2016). Working with learners of Dutch as an L2, de Jong et al. (2015) used two types of fluency measures, that is, uncorrected measures and measures corrected for L1 behavior, to explore the relationship between L1 and L2 fluency. They postulated that because the original L2 measures could be testing both personal speaking styles and learners' L2-specific skills, correcting these measures by adjusting for L1 fluency behavior would lead to a more accurate measurement of fluency. They examined two typologically different L1s, that is, English and Turkish, and elicited language samples through eight different tasks from intermediate to advanced proficiency learners. Their results showed that all L2 fluency measures were, to a certain extent, related to L1 fluency, with the variance ranging from 21% to 57%. The authors also reported that although all measures (i.e., both corrected and uncorrected) predicted proficiency significantly, the corrected measure of syllable duration predicted L2 proficiency more strongly than its original uncorrected equivalent, concluding that adjusting this measure for L1 fluency would lead to more accurate results.
In another study, Huensch and Tracy-Ventura (2016) examined the contribution of L1 fluency behavior, cross-linguistic differences, and L2 proficiency to L2 fluency behavior over time in a study-abroad context. The participants, L1 English speakers who were studying Spanish and French for their university degree, performed an oral narrative task three times: twice in their L2s (once before traveling abroad and once after residing abroad for 5 months), and a last time in their L1 after they had returned home from study abroad. The results suggested that "L1 fluency behaviour, cross-linguistic differences, and proficiency differentially contributed to explaining L2 fluency behaviour prior to and during immersion" (Huensch & Tracy-Ventura, 2016, p. 2). They reported that proficiency level predicted L2 fluency measures of mean syllable duration and mean silent pause duration before residing abroad; however, after 5 months of study abroad no influence of proficiency level was observed for any of the L2 fluency measures employed. These results are particularly noteworthy as they show a clear relationship between L1 and L2 fluency and, moreover, broaden our perspective on the change in fluency behavior as proficiency increases. However, this study neither examined different proficiency levels systematically nor established whether proficiency level mediates the relationship between L1 and L2 behavior. Like Huensch and Tracy-Ventura (2016), most studies examining the relationship between L1 and L2 fluency were performed with learners in a study-abroad context or participants using and learning L2 in the target language community for communication purposes (Derwing et al., 2009; Di Silvio et al., 2016; Leonard & Shea, 2017).
Given the research evidence that fluency is one of the key aspects of L2 use that benefit from study abroad or life in the L2 community (Mora & Valls-Ferrer, 2012; Tavakoli, 2018; Wright, 2013, 2018), it seems necessary to conduct studies that investigate the relationship between L1 and L2 fluency among speakers who have not had substantial exposure to the L2 or authentic opportunities for using it for genuine communication purposes. A final study to report here is Peltonen (2018), who examined L1 and L2 fluency behaviors among Finnish children learning English at school. Working with 42 participants from two school levels, she examined a range of fluency measures and reported positive correlations between the majority of temporal L1 and L2 measures. In addition, the study suggested that many L2 temporal measures can be predicted from L1 behavior. While the participants in this study belonged to two levels, Peltonen (2018) acknowledged that the two groups may not be distinctive in their proficiency level, and therefore, the results may not show whether proficiency level mediates the relationship between L1 and L2 fluency behavior.
Taken together, the results of these studies suggest there is a link between L1 and L2 fluency behavior and imply that relationships between the two would change over time as proficiency increases. However, these results are neither conclusive nor generalizable to different L1s (Derwing et al., 2009). Further research is needed to demonstrate whether and in what ways the L1-L2 fluency relationship changes when learners' proficiency improves. Before discussing the study, it is necessary to highlight the role of proficiency in understanding the relationship between L1 and L2 fluency.

SECOND LANGUAGE PROFICIENCY AND FLUENCY
Language proficiency is an individual learner difference commonly examined in L2 studies due to its direct relationship with L2 acquisition and development. Language proficiency, or as Hulstijn (2015) defines it, "knowledge of language and the ability to access, retrieve and use that knowledge in listening, speaking, reading and writing" (p. 21), is of particular interest to the current study as it can help develop a better understanding of the relationship between L1 and L2 fluency. In L1, speech production relies largely on incremental, parallel, and automatic processing (Levelt, 1989). The parallel and automatic nature of L1 processing helps make speech fluent with little undue hesitation or disruption. In contrast, L2 speech production, particularly at lower levels of proficiency, does not operate on automatic processing. Working with incomplete linguistic knowledge, the L2 speech production process faces several challenges including access and retrieval of linguistic units and monitoring the language during and after production (Kormos, 2006). This controlled and serial, rather than parallel, processing is often marked by signs of disfluency, for example, slower speech rate and frequent pauses. When proficiency improves, whether due to exposure, practice, or instruction, linguistic knowledge expands and parallel processing becomes possible; as a result, the production process becomes more automatic and speech becomes more fluent. Therefore, it can be hypothesized that at higher levels of proficiency an increase in automaticity makes parallel processing possible, freeing up attentional resources available to the speaker, and L2 speech processing becomes more similar to L1 speech processing. This implies that as language proficiency improves, one might expect a stronger relationship between L1 and L2 fluency behavior.
In contrast, for less proficient speakers, this relationship may be overshadowed by the L2 speech production challenges (e.g., serial processing, and access and retrieval demands), and therefore, two different processing systems can be observed. While research in this area already recognizes that L2 fluency is at least partly related to L1 fluency (Peltonen, 2018), it is necessary to understand whether this relationship persists as proficiency develops. This is what the current article aims to investigate.

RESEARCH AIMS
The current study attempts to expand our knowledge of the relationship between L1 and L2 fluency in three regards. First, it aims to examine the relationship between L1 Turkish and L2 English to contribute to the current understanding of fluency research in this area. Previous studies have shown that typological similarities and differences between two languages seem to affect the relationship between L1 and L2 fluency behavior (Huensch & Tracy-Ventura, 2016). Structural differences between Turkish and English can also be expected to affect the results. In Turkish, a highly agglutinative language, words can have several affixes (mostly suffixes) to reflect different meanings or grammatical functions. For example, an English sentence or phrase, such as "they will come," can be expressed in a single word in Turkish, with "come" as the morpheme (i.e., "gelecekler"). Given that speed is usually calculated in terms of number of syllables per minute, it is not expected to be affected by the structure of words and utterances. However, in an agglutinative language, it is common to produce a word of several syllables to represent a whole sentence, and this is likely to result in speakers producing more syllables in Turkish than in English in the same amount of time, which might have an influence on speed. Given that the current study is not focusing on cross-linguistic differences between the two languages, this is one of the several questions about the L1-L2 fluency relationship that future research will need to address.
Second, the study aims to examine whether L2 proficiency level mediates the relationship between L1 and L2 fluency. Compared to most previous studies that have looked at only one or two levels of proficiency (Derwing et al., 2009; Huensch & Tracy-Ventura, 2016), the current study is one of the few studies in which participants come from three different proficiency levels (A2, B1, and B2 of the CEFR). In addition, from a methodological point of view, we are extending the existing research framework by examining proficiency level from a broader perspective. Research in SLA suggests there are two underlying constructs in language proficiency, that is, declarative and procedural knowledge. However, most of the research in fluency studies has examined proficiency in terms of only one of the two kinds. This is a shortcoming of research in this area because recent research (e.g., Tavakoli et al., 2017) showed that certain aspects of fluency might be linked to different kinds of linguistic knowledge. Our rationale for using measures of language proficiency that test both kinds of knowledge was based on the assumption that a more complete profile of the learner's proficiency would provide us with a more valid interpretation of the role of proficiency in the relationship between L1 and L2 fluency.
Following research in this area (Gaillard & Tremblay, 2016), we employ the Elicited Imitation Task (EIT) and the grammar section of the Oxford Placement Test (OPT). We assume that the former is more likely to test the participants' procedural knowledge while the latter is more likely to test their declarative knowledge.
The following research questions guide our study: (1) Is there a relationship between L1 Turkish and L2 English fluency behavior? If so, to what extent does language proficiency mediate the relationship between L1 and L2 fluency behavior? (2) To what extent can L2 fluency measures be predicted from L1 fluency behavior?

METHOD
The study had a factorial design with language proficiency as the independent variable, and different measures of fluency as the dependent variables of the study. Language proficiency was a between-participant variable, representing three levels (A2, B1, and B2 according to the CEFR).

PARTICIPANTS
The data were initially collected from 44 participants who volunteered to take part in the study at the time of the data collection; however, data from two of the participants were removed at a later point because they did not meet all task completion requirements. The data reported here, therefore, come from 42 undergraduate students (25 females and 17 males) who were native speakers of Turkish and aged between 19 and 25. They were taking English courses at a state university in Turkey either as part of their foundation classes or their degree program (i.e., as a compulsory course of General English). They had varying levels of English at the start of their programs and had been on these courses for five months at the time of the study.
Previous research (Mora & Valls-Ferrer, 2012; Tavakoli et al., 2016) has shown that life/study-abroad experiences and professional use of L2 have a positive impact on fluency, helping it develop quickly. Therefore, to explore our research questions and to ensure homogeneity of the sample, we needed a group of participants whose L2 fluency was not affected by such experiences. To control for this, a short demographic questionnaire, available on IRIS (https://www.iris-database.org/iris/app/home/index), was used before the data collection, and data from anyone with such a profile were excluded.

Oral Narrative Tasks
For the purposes of this study, two oral narrative tasks were used (see Appendix 1). Following de Jong and Vercellotti's (2016) recommendations for careful selection of "features that constitute task complexity" (p. 387), a number of criteria were considered when selecting the narratives. These included number of characters and props (de Jong & Vercellotti, 2016), amount of contextual support (Révész, 2009), similar storyline complexity (Tavakoli & Foster, 2011), and similar amount of intentional reasoning (Awwad et al., 2017). Although care was taken to choose comparable tasks, we are aware that the two narratives were different and as such a task design impact might be detected. We ran t-tests to check for any possible effects of task design.

Proficiency Tests
Proficiency is measured rather variably in L2 studies. This measurement ranges from studies using C-tests and vocabulary size tests to those employing internationally standardized and validated four-skills (reading, writing, speaking, and listening) tests. In the current study, our measurement of proficiency was based on the definition of proficiency provided earlier (Hulstijn, 2015, p. 21). Following Hulstijn's (2015) definition, we examined language proficiency along the two dimensions of "knowledge of language" or declarative knowledge and "the ability to use the language" or procedural knowledge (for further discussion see "Research Aims"). To achieve this aim, we used the grammar section of the OPT (Allan, 2004), and the EIT (Ortega et al., 2002; Tracy-Ventura et al., 2014) to measure the participants' declarative and procedural knowledge, respectively. 1 EITs are increasingly used in L2 studies to measure procedural knowledge (Yan et al., 2015) and as evidence of the participants' interlanguage (Ellis, 2005; Kim et al., 2016). In an EIT, participants are asked to repeat a set of sentences of varying length and complexity, usually after listening to each sentence once. Also, because it is dependent on fast language processing and producing speech in real time, the EIT is arguably more suited to measuring procedural oral language ability, that is, both implicit and speeded-up explicit knowledge types (Suzuki & DeKeyser, 2019). 2 The EIT used in the current study consisted of a total of 30 sentences in English designed to test L2 proficiency with a combination of grammatical features, syntax, and vocabulary. These sentences varied in length (between 7 and 19 syllables), were ordered from the fewest syllables to the most, and were adopted from Ortega et al. (2002). The task also involved a practice session using five additional Turkish sentences at the beginning of the test to make sure the procedures were well understood and followed.
The results of the proficiency tests are discussed in the following text.

PROCEDURES
The data were collected on two separate days. On the first day, all participants sat the OPT in a classroom, and the EIT individually in a different room. Individual meetings were then arranged for the participants in the same week to perform the oral narratives in L1 and L2. In the implementation of the EITs, the procedures in Gaillard and Tremblay (2016) were followed. Following the practice session in Turkish, the participants were presented with the English sentences one by one while they were recorded. They were asked to listen to each sentence, which was followed by a beep, and then to repeat it. There were two seconds between the end of each sentence and the beginning of the beep, and the participants had only one attempt to repeat each sentence. The rationale for this arrangement came from previous studies (Gaillard & Tremblay, 2016; Ortega et al., 2002), which suggested that this procedure ensures that test takers process the stimuli rather than merely mimic them.
On a separate day, the participants performed the tasks. They were divided randomly into two equal groups, with one group performing Task A and the other performing Task B. 3 They were given the picture prompts and 30 seconds of planning time, followed by 90 seconds to retell the story. No additional information about the tasks or vocabulary to be used was provided. Their speech was digitally recorded as they performed the tasks. Each participant had to narrate the story twice, once in Turkish and once in English. To control for any practice effect, a counterbalanced design was used for the language in which they performed the task, that is, half of the participants narrated the story first in L1 and then in L2, while the other half performed the tasks in the reverse order of first L2 and second L1. The details of the counterbalanced design are provided in Table 1. All the information about the purposes of the study and the instructions were provided in Turkish.
To determine the power of the study's sample size, we used G*Power 3.1 (Faul et al., 2009) to run a post-hoc power analysis. Running the analysis for a linear regression fixed model, we calculated the power of each individually significant regression model with an alpha level of .05, an effect size of .15, and a sample size of 42. The results showed a power of .98 for number of mid-clause filled pauses, .99 for speech rate, and .95 for number of repairs. The only significant model achieving a power below the .80 threshold level was for number of mid-clause silent pauses (1 − β = .67). The results of the power analysis suggest a good level of confidence could be maintained in the findings.

DATA ANALYSIS
The OPT comprised 100 questions in total, and for each correct item, 1 point was awarded. The scoring of the EIT sentences was done based on a holistic scale (i.e., scores ranging from 0 to 4), which was adapted from Ortega et al. (2002) and has since been employed in several studies (e.g., Gaillard & Tremblay, 2016; Kim et al., 2016; Tracy-Ventura et al., 2014; Wu & Ortega, 2013). Based on this scale, four points were given for exact repetition, three points for accurate repetitions of the sentences keeping the content meaning but including small structural changes, two for repetitions that included changes in grammar that could affect the meaning of the sentences, one for repetition of half of the sentence, and zero when less than half of the sentence or no repetition was provided. After a second rater was trained on the scoring system, this rater scored 20% of the data. The correlation coefficient between the two sets of scores was .98. The maximum score one could obtain on the EIT was 120. For comparability reasons, the OPT scores were converted to a 120-point scale as well (see Appendix 2 for descriptive statistics for OPT and EIT scores). There was a positive moderate correlation between the OPT and the EIT scores (r = .51, p < .001), suggesting that a high score on the OPT was associated with a high score on the EIT. This correlation indicates that about 26% of the variance in participants' OPT scores can be accounted for by their EIT scores.
The mean of the EIT and the OPT scores was calculated, and this combined score was used to group the participants into proficiency levels. The groupings were based on the scoring system of the Oxford Online Placement Test, in which each 20-point band corresponds to one level of proficiency (A1: 1-20; A2: 21-40, etc.). Table 2 shows the proficiency categorization of this placement test and the participants in this study. While the maximum score one could obtain on this test was 120, our participants' combined scores ranged between 21 and 80, suggesting they belonged to the three levels of A2, B1, and B2 of the CEFR (Council of Europe, 2009), with 15, 15, and 12 participants in each group, respectively (see Oxford English Testing website for further information about how scores are interpreted in terms of the CEFR levels). Although we did not use the proficiency level categorization in our data analysis, we believe placing the participants into the corresponding CEFR levels of proficiency is helpful when discussing the results.
Note to Table 2: The combined score is the combination of the OPT and EIT scores (50% of each), and the maximum score is 120.
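The scoring arithmetic described above can be sketched as follows. The item count, the 0 to 4 scale, and the 100-item OPT come from the text; the linear rescaling of the OPT to 120 points is an assumption (the text says only that scores "were converted"), and all function names are hypothetical.

```python
# Illustrative sketch only; function names are hypothetical.

EIT_ITEMS = 30                              # sentences in the EIT (from the text)
EIT_MAX_PER_ITEM = 4                        # holistic scale 0-4 (from the text)
OPT_MAX = 100                               # one point per OPT item (from the text)
COMMON_MAX = EIT_ITEMS * EIT_MAX_PER_ITEM   # 30 * 4 = 120


def eit_total(item_scores):
    """Sum the holistic item scores after checking they lie on the 0-4 scale."""
    if len(item_scores) != EIT_ITEMS:
        raise ValueError("expected one score per EIT sentence")
    if any(score not in range(EIT_MAX_PER_ITEM + 1) for score in item_scores):
        raise ValueError("each item score must be an integer from 0 to 4")
    return sum(item_scores)


def rescale_opt(raw_opt):
    """Rescale a raw OPT score (0-100) to 0-120; linear rescaling is assumed."""
    if not 0 <= raw_opt <= OPT_MAX:
        raise ValueError("raw OPT score must be between 0 and 100")
    return raw_opt * COMMON_MAX / OPT_MAX


def shared_variance(r):
    """Proportion of variance shared by two variables correlated at r."""
    return r ** 2
```

With the reported correlation, `shared_variance(0.51)` gives roughly 0.26, which is the 26% figure quoted in the paragraph.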
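The banding rule can be illustrated with a minimal sketch. Only the A1 (1-20) and A2 (21-40) bands are stated in the text; the remaining bands are extrapolated from "etc." in steps of 20 and are therefore an assumption, as is the choice to round fractional combined scores up before banding. The function names are hypothetical.

```python
import math

# Illustrative sketch only. Bands beyond A2 are extrapolated in 20-point
# steps (an assumption); the text states only A1: 1-20 and A2: 21-40.
BAND_WIDTH = 20
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def combined_score(opt_120, eit_120):
    """Mean of the rescaled OPT score and the EIT score (50% of each)."""
    return (opt_120 + eit_120) / 2


def cefr_band(score):
    """Map a combined score (1-120) to a CEFR label.

    Fractional scores are rounded up before banding (an assumption).
    """
    if not 1 <= score <= BAND_WIDTH * len(LEVELS):
        raise ValueError("combined score must be between 1 and 120")
    index = (math.ceil(score) - 1) // BAND_WIDTH
    return LEVELS[index]
```

Under these assumptions, the reported range of 21 to 80 spans exactly the A2, B1, and B2 bands.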
In the narrative tasks, the participants produced speech samples of varying length (see Appendix 3); however, 60 seconds was used as the cutoff point for data analysis. To obtain comparable measures, analyses of shorter samples were corrected for time to a 60-second level. Following Foster et al. (2000), the data were transcribed and coded for AS-units and clause boundaries. Instances of repair fluency (i.e., repetitions, replacements, false starts, and reformulations) were marked with a set of conventional symbols. Our choice of repair measures follows previous research (Hunter, 2017; Kormos, 1998; Skehan, 2003, 2009, 2015) that suggests repetitions, replacements, false starts, and reformulations are examples of repair strategies that L2 learners use. Whether these repairs occur because of conceptualization demands (e.g., amount of information) or formulation demands (e.g., lexical retrieval or syntactic encoding), they suggest that speakers are engaged in a repair process. Also, unpruned data were used in the analysis; that is to say, the syllables in repetitions, replacements, reformulations, and false starts were included in the analysis.
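The time correction mentioned above can be sketched as a simple proportional scaling. This is one plausible reading, not necessarily the exact procedure used in the study: counts from samples shorter than 60 seconds are scaled up to a 60-second equivalent, and samples of 60 seconds or more are assumed to have been cut at 60 seconds before counting. The function name is hypothetical.

```python
def count_per_window(count, duration_s, window_s=60.0):
    """Scale a frequency count to a fixed analysis window (default 60 s).

    Assumption: a sample longer than the window was already truncated at
    the cutoff, so its count is returned unchanged.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    effective = min(duration_s, window_s)
    return count * window_s / effective


# Six mid-clause pauses in a 45-second sample correspond to eight per 60 s.
print(count_per_window(6, 45))   # 8.0
print(count_per_window(10, 90))  # 10.0
```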
Following Thai and Boers (2016), we used the Syllable Count program (www.syllablecount.com) to calculate the syllable counts in the English transcripts. We used a similar program, Hece Hesaplama (www.hesapla.online.com), to calculate the syllable counts in the Turkish transcripts. To ensure the reliability of these programs, 10% of the calculations for both languages were double-checked by manual counts, and an agreement of 0.98 (Cohen's kappa) was achieved. To ensure interrater reliability for the data coding, 20% of the coded data were checked by a second Turkish-English bilingual coder. Again, Cohen's kappa was computed to determine the consistency in the placement of clause boundaries, pauses, and repair fluency instances and types (i.e., repetitions, replacements, reformulations, and false starts), combining all of the second-coded data. For each of these, a reliability above 0.95 was achieved.
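For readers unfamiliar with the agreement statistic used above, the following is a minimal sketch of Cohen's kappa for two coders assigning categorical labels to the same items. The labels and data are invented for illustration and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(coder_a)
    # Observed proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical repair codes: rep = repetition, fs = false start, ref = reformulation.
first = ["rep", "rep", "fs", "ref", "rep", "fs"]
second = ["rep", "rep", "fs", "ref", "fs", "fs"]
print(round(cohens_kappa(first, second), 3))  # 0.739
```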
The audio files were analyzed for silent pauses and the length of all pauses using PRAAT software, which allowed for accurate measurement and multiple revisions. The data were segmented using the Annotate to TextGrid (silences) command (Boersma & Weenink, 2010). A threshold of 250 ms was chosen for all the pauses.
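Applying the 250 ms threshold to the segmented intervals can be sketched as follows. The sketch assumes the intervals have already been exported as (start, end, label) tuples in seconds, with "silent" and "sounding" as labels; that export format and the function name are illustrative assumptions, not a description of PRAAT's interface.

```python
def silent_pauses(intervals, threshold_s=0.25):
    """Return (start, end) for silent intervals at least threshold_s long."""
    return [
        (start, end)
        for start, end, label in intervals
        if label == "silent" and (end - start) >= threshold_s
    ]


# Invented example: a 100 ms silence is ignored, a 600 ms silence is kept.
tiers = [
    (0.0, 1.2, "sounding"),
    (1.2, 1.3, "silent"),
    (1.3, 2.5, "sounding"),
    (2.5, 3.1, "silent"),
    (3.1, 4.0, "sounding"),
]
print(silent_pauses(tiers))  # [(2.5, 3.1)]
```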
To capture an accurate and complete picture of fluency, it was necessary to use measures that were shown to represent each aspect of utterance fluency reliably. Following recent research in this area, we chose the following seven measures. For breakdown fluency, the frequencies of filled and silent pauses at mid-clause and end-clause positions were chosen (Bosker et al., 2013; Huensch & Tracy-Ventura, 2017). Fluency research (Hunter, 2017; Tavakoli et al., 2017) emphasizes that to understand speakers' breakdown behavior, it is necessary to examine pause character (filled or unfilled) and location (mid-clause and end-clause). Speech rate was selected to represent composite (speed and pausing aspects combined) fluency as it has been reported to be one of the most sensitive aspects of utterance fluency to demonstrate change in fluency behavior (Huensch & Tracy-Ventura, 2017; Mora & Valls-Ferrer, 2012; Tavakoli, 2018; Tavakoli et al., 2017). Articulation rate was also included in our measures as it is the only pure measure of speed that excludes pauses to provide a measure of how fast someone speaks regardless of their pausing behavior. Total number of repairs, that is, total number of repetitions, replacements, false starts, and reformulations, has also been suggested as a valid measure to demonstrate L2 speakers' repair behavior and monitoring processes (Hunter, 2017; Kahng, 2014; Skehan, 2009). Table 3 presents the utterance fluency measures used in this study.
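The difference between the composite and the pure speed measure can be made concrete with a short sketch. It assumes the definitions commonly used in fluency research (syllables per minute over total time for speech rate, and over speaking time with pauses excluded for articulation rate); the exact operationalizations in Table 3 may differ, and the function names are hypothetical.

```python
def speech_rate(syllables, total_time_s):
    """Syllables per minute over total sample time, pauses included."""
    if total_time_s <= 0:
        raise ValueError("total time must be positive")
    return syllables * 60.0 / total_time_s


def articulation_rate(syllables, total_time_s, pause_time_s):
    """Syllables per minute over speaking time only, pauses excluded."""
    speaking_time = total_time_s - pause_time_s
    if speaking_time <= 0:
        raise ValueError("pause time must be shorter than total time")
    return syllables * 60.0 / speaking_time


# Invented example: 120 syllables in a 60 s sample containing 15 s of pausing.
print(speech_rate(120, 60))            # 120.0
print(articulation_rate(120, 60, 15))  # 160.0
```

Because articulation rate divides by speaking time alone, two speakers with the same articulation rate can differ in speech rate purely through how much they pause.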

ANALYSIS AND RESULTS
Before answering the research questions, the descriptive statistics provided in Table 4 for means and standard deviations are discussed for the different fluency measures across the two languages. As shown in Table 4, the figures in Turkish as L1 and English as L2 were very similar for number of end-clause filled pauses. For a number of other fluency measures, the figures seemed different in the two languages. These included measures of breakdown and repair fluency, and speed and composite measures.
To see whether these differences were statistically significant, paired samples t-tests were run (Table 4). Bonferroni-corrected p values were also considered to minimize Type I errors. To estimate the magnitude of the differences, effect sizes (Cohen, 1988) were calculated, and the results were interpreted using the field-specific benchmarks for within-group comparisons suggested by Plonsky and Oswald (2014).
We also ran a number of t-tests to examine any possible effects of the two different narratives on the speakers' performance (Table S1, see the supplementary information online). The results showed no statistically significant differences between fluency of performances in the two tasks.
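The paired comparisons described above can be sketched in a few lines. The source does not state which variant of Cohen's d was computed for the paired design, so the sketch uses d based on the standard deviation of the differences as an assumption; it also assumes one comparison per fluency measure (seven) for the Bonferroni adjustment. The data are invented.

```python
import math
import statistics


def paired_t_and_d(l1_values, l2_values):
    """Return (t statistic, Cohen's d of the differences) for paired samples."""
    if len(l1_values) != len(l2_values) or len(l1_values) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(l1_values, l2_values)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    cohen_d = mean_diff / sd_diff
    return t_stat, cohen_d


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha after a Bonferroni correction."""
    return alpha / n_comparisons


l1 = [5, 7, 6, 9, 8]  # invented L1 values for one measure
l2 = [3, 4, 5, 6, 5]  # invented L2 values for the same speakers
t_stat, d = paired_t_and_d(l1, l2)
print(round(t_stat, 2), round(d, 2))         # 6.0 2.68
print(round(bonferroni_alpha(0.05, 7), 4))   # 0.0071
```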
Research question 1 examined whether there was a relationship between L1 fluency and L2 fluency behavior. To address this question, Pearson product-moment correlations were run between L1 and L2 fluency measures for the four aspects of fluency (i.e., breakdown, repair, speed, and composite) separately. Preliminary analyses were performed to ensure that there were no violations of the assumptions of normality, linearity, and homoscedasticity. As can be seen in Table 5, the results demonstrate significant positive correlations for some breakdown measures and for repair. There were weak (under r = .4) to moderate (under r = .7) (see Plonsky & Oswald, 2014) positive correlations between the two languages for number of mid-clause filled pauses (r = .60, p < .001), number of end-clause filled pauses (r = .30, p = .048), number of mid-clause silent pauses (r = .34, p = .024), and number of repairs (r = .45, p = .003). No significant correlations were found for speed and composite measures (Figure S1, see the supplementary information online).
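As an illustration of the correlational analyses, the sketch below computes a Pearson correlation and the first-order partial correlation that controls for a single third variable (here standing in for a proficiency score). It is a textbook formulation with invented data, not a reproduction of the SPSS procedure used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


l1_pauses = [1, 2, 3, 4]      # invented L1 measure
l2_pauses = [2, 1, 4, 3]      # invented L2 measure
proficiency = [1, -1, -1, 1]  # invented control variable, unrelated to both
print(round(pearson_r(l1_pauses, l2_pauses), 2))               # 0.6
print(round(partial_r(l1_pauses, l2_pauses, proficiency), 2))  # 0.6
```

When the control variable is unrelated to both measures, as in this toy example, the partial correlation equals the zero-order correlation.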
To examine the extent to which the relationship between L1 and L2 fluency was mediated by L2 proficiency level (represented by the mean of the EIT and OPT scores), partial correlations were carried out with this variable as the controlling factor. Once again, it was ensured that the assumptions of normality, linearity, and homoscedasticity were not violated. Weak to moderate correlations were maintained for all the significant results. A comparison with the zero-order correlations (Table 6) suggested that controlling for language proficiency had little impact on the strength of the relationship between the measures: the partial correlations for number of mid-clause filled pauses (r = .60, p < .001), number of end-clause filled pauses (r = .32, p < .040), number of mid-clause silent pauses (r = .36, p < .020), and number of repairs (r = .46, p < .001) remained statistically significant, against zero-order correlations of r = .60, p < .001; r = .30, p = .048; r = .34, p = .024; and r = .45, p = .003, respectively.
Research question 2 examined the extent to which measures of L2 fluency can be predicted from measures of L1 fluency and language proficiency. To address this question, multiple regressions were run for all fluency measures (variables entered simultaneously), with the English fluency measure as the dependent variable and the corresponding Turkish fluency measure and language proficiency, measured by the OPT and EIT scores, as independent variables. Inspection of the SPSS output (e.g., normal distribution of residuals, independence of observations, multicollinearity among the independent variables, and linearity between the dependent and independent variables) suggested that all assumptions of regression analysis were met. The results of the multiple regressions are presented in Table 7.
As indicated in Table 7, for a number of measures the models predicting L2 fluency from L1 fluency and language proficiency reached statistical significance. For breakdown fluency, the significant models were number of mid-clause filled pauses, F(3, 38) = 9.10, p < .001, and number of mid-clause silent pauses, F(3, 38) = 4.11, p < .01, explaining 41% and 24% of the variance in the participants' performance, respectively (adjusted R² = .37 and .18, respectively). The models for number of end-clause silent pauses, F(3, 38) = .75, p = .525, and articulation rate, F(3, 38) = 1.17, p = .371, did not reach statistical significance. The model for number of end-clause filled pauses, F(3, 38) = 2.63, p = .06, narrowly missed significance, although it explained 17% of the variance in the number of end-clause filled pauses (adjusted R² = .10). For number of mid-clause filled pauses, the Turkish fluency measure contributed significantly to the model (p < .001), while language proficiency did not. For number of mid-clause silent pauses, all measures (i.e., L1 measure, OPT and EIT scores) contributed significantly to the model (p < .012, p < .03, and p < .049, respectively). The other models reaching significance were number of repairs, F(3, 38) = 5.88, p < .002, and speech rate, F(3, 38) = 11.70, p < .001. In the model for number of repairs, the Turkish number of repairs (p < .007) and EIT scores (p < .02) made a significant contribution, whereas OPT scores did not. The variance explained by this model was 31% (adjusted R² = .26). Interestingly, in the model for speech rate, the contribution made by OPT scores was nonsignificant, while the EIT scores and the L1 fluency measure made a significant contribution (p < .001 and p < .036, respectively).
The amount of variance explained by the speech rate model was 48% (adjusted R² = .43), suggesting that L2 speech rate can be predicted from language proficiency, as assessed by the EIT scores, and Turkish speech rate. To interpret the strength of the adjusted R² values, we followed Plonsky and Ghanbar's (2018) recent proposal in which values of up to .20 are considered small and those above .50 are regarded as large.
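The link between the R² and adjusted R² values reported above can be checked with the standard adjustment formula. The sketch assumes n = 42 participants and k = 3 predictors, which is what the reported degrees of freedom, F(3, 38), imply; the function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R-squared values as reported in the text, rounded to two decimals.
for label, r2 in [("mid-clause silent pauses", 0.24),
                  ("number of repairs", 0.31),
                  ("end-clause filled pauses", 0.17)]:
    print(label, round(adjusted_r2(r2, n=42, k=3), 2))
# mid-clause silent pauses 0.18
# number of repairs 0.26
# end-clause filled pauses 0.1
```

Applied to the rounded values of .41 and .48, the same formula gives roughly .36 and .44, close to the reported .37 and .43; the small gaps are plausibly due to the R² values having been rounded before reporting.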

DISCUSSION
The prime aim of the current study was to investigate whether L1 and L2 fluency behaviors were related, and whether this relationship was mediated by proficiency level. The study also aimed to examine to what extent L1 fluency behavior and language proficiency predicted L2 fluency in a group of Turkish learners of English. Our descriptive statistics highlighted a number of similarities and differences between L1 and L2 fluency behavior. It was interesting to see that the participants produced a similar number of end-clause filled pauses in the two languages, while they showed large differences in the number of mid-clause pauses (both filled and silent) and end-clause silent pauses. This is similar to previous research in this area (Skehan, 2014; Tavakoli, 2011) in that L2 speakers make more frequent mid-clause and fewer end-clause pauses. We believe that providing descriptive statistics about fluency measures in different languages offers a baseline for cross-linguistic analysis of fluency in future studies. As anticipated, the results of the t-tests showed that the participants were overall more fluent in their L1. The higher figures for the repair measure in English and the much higher speed in Turkish are examples of their higher fluency levels in L1.
The results also indicated that the overall L2 speed measures were lower than those reported in previous studies (e.g., Mora & Valls-Ferrer, 2012; Tavakoli et al., 2016). The lower L2 speed in our study may be explained by the fact that several of our participants were at an A2 proficiency level, and none of them had either studied or lived abroad in an English-speaking community, whereas in both Tavakoli et al. (2016) and Mora and Valls-Ferrer (2012), the participants were on study-abroad courses and had higher proficiency levels.
The results of the correlation analyses suggested that some of the breakdown measures and total number of repairs were positively correlated in L1 and L2, although the strength of associations varied. The results showed significant positive correlations for three breakdown measures, that is, number of mid-clause filled pauses (r = .60), number of end-clause filled pauses (r = .30), and number of mid-clause silent pauses (r = .34), suggesting that the L2 speakers' frequency of pausing was to some extent a function of their L1 pausing behavior. These correlations replicate the findings of Peltonen (2018), in which moderate to high correlations of r = .66 and r = .41 were reported for mid-clause silent pauses and mid-clause filled pauses, respectively. Interestingly, the highest correlation found in our results is for mid-clause filled pauses (r = .60), indicating a high degree of association between L1 and L2 pausing behavior. Previous research has suggested that mid-clause pausing is a key characteristic of L2 speech production, especially when the production process is not yet automatic (Kormos, 2006; Segalowitz, 2010; Tavakoli, 2011; Tavakoli & Skehan, 2005). The high correlation obtained for mid-clause filled pauses, however, suggests that L2 pausing behavior in the current study is also to some extent a function of one's personal style carried over from L1 behavior. Overall, these findings corroborate the findings of previous research (Peltonen, 2018) regarding the correlations between breakdown measures in L1 and L2.
The results also indicated a positive correlation for total number of repairs (r = .45) in the two languages, confirming the hypothesis that L2 repair behavior might be related to L1 repair behavior. This finding is in contrast with that of Peltonen (2018) where weak correlations between L1 and L2 repetitions were found. The different results might have been caused by the way repair fluency is operationalized in the two studies. While Peltonen (2018) measured repair in terms of repetitions, in our study we examined a range of repair measures that the SLA literature suggests as reliable representatives of repair processes. Examining a wider range of repair measures, we argue, has given us a rich opportunity to investigate the L1 and L2 repair behaviors and to explore the possible relationship between them.
With regard to L2 speakers' use of repairs across different proficiency levels, some emerging research evidence suggests that use of repairs is not related to proficiency level, implying that a kind of personal style might be at work. This finding is particularly important as repair is often perceived as an indicator of dysfluency in second language pedagogy, for example, in language benchmarks such as the CEFR and language testing scales such as IELTS. L2 teachers regularly associate repair with monitoring processes involved in L2 production, and often link it to the development of the learner's interlanguage system. The results presented here suggest that while repair measures represent the speaker's process of repairing an utterance, for example, correcting an L2 error or reformulating an L2 structure and concept, they can to some extent also reflect the speaker's personal speaking style. This is an important point to be included in L2 teacher training programs, as previous research (Tavakoli & Hunter, 2018) has highlighted the crucial role that teachers' understanding of the relationship between L1 and L2 fluency plays in their classroom practice and their ability to help their learners.
As for speech rate and articulation rate, however, the results did not show any significant correlations between L1 and L2 fluency behavior. This is in contrast to de Jong et al. (2015), Derwing et al. (2009), and Huensch and Tracy-Ventura (2016) who did report correlations for speed measures. We interpret these contradictory results in the light of the differences between our participants and those in the other studies. Our study included 15 participants at A2 (CEFR) level of proficiency. As discussed earlier, learners at lower proficiency levels often speak at a slower speed with more mid-clause pauses, hesitations, and interruptions. This means their speech rate and articulation rate are typically slower than those in their L1; the different L2 speaking patterns in the A2 level participants may have made it difficult to obtain a significant correlation between the two speed patterns. It is also important to note that our participants had never lived/studied abroad or used English for professional purposes in their everyday life. Arguably, they had fewer opportunities for developing fluent speaking skills compared to those in de Jong et al. (2015), Derwing et al. (2009), and Huensch and Tracy-Ventura (2016) who had lived/studied abroad.
When controlled for proficiency level, the significant results of the partial correlations suggested that the relationship between L1 and L2 fluency was maintained across different proficiency levels and indicated that the relationship between fluency behavior in the two languages was not mediated by proficiency. This suggests that the relationship between L1 and L2 fluency behavior persists regardless of how proficient the speakers are, highlighting the fact that the L2 speaker's fluency behavior is, at least to some extent, related to their L1 fluency behavior.
Our second aim was to investigate whether L2 fluency behavior can be predicted from L1 fluency behavior and language proficiency. The results of the multiple regression analyses indicated that four models predicted L2 fluency, with L1 fluency making a significant contribution to all of them, OPT scores to one, and EIT scores to three. When interpreting the findings, we followed Plonsky and Ghanbar's (2018) proposal in which R² values in the realm of .20 (or below) and .50 (and above) are considered small and large, respectively. The significant models were those for mid-clause filled pauses, mid-clause silent pauses, total number of repairs, and speech rate. Two of the significant models in which L1 fluency predicted L2 fluency behavior were number of mid-clause filled pauses and number of mid-clause silent pauses (R² values of 41% and 24%, respectively, which are regarded as medium and small), with L1 making a significant contribution to both (p < .001 and p < .012). These findings are in line with Huensch and Tracy-Ventura's (2016) results, where frequency of L1 pauses predicted L2 fluency behavior. However, a key difference between the two studies is that instead of measuring pauses at clause boundary, which is recommended in the literature, Huensch and Tracy-Ventura (2016) examined pauses at AS-unit boundary. The literature on fluency studies (Peltonen, 2018; Skehan, 2009, 2014) considers this a limitation, which has been addressed in the current study as we distinguished between mid- and end-clause pauses. The more detailed analysis of pause location provides a more useful insight into the differences between L1 and L2 pausing behavior and seems to be crucial for an in-depth understanding of the differences between L1 and L2 processing. The largest amount of variance explained was for speech rate (R² = 48%), a model in which the EIT scores made a significant contribution (p < .001), whereas the L1 measure made a significant but modest contribution (p < .036).
This finding is interesting as it suggests that L2 speech rate can be predicted by a measure of procedural knowledge. It is necessary to note that articulation rate, which is a pure measure of speed, failed to yield a significant model, suggesting that speed of performance in L1 cannot predict L2 speed. This finding highlights the importance of including both pure and composite measures of speed when analyzing fluency. For total number of repairs, both L1 behavior and the EIT scores made significant contributions to the model (p < .007 and p < .02, respectively). The adjusted R² figure of 26% in this model implies that L2 repair can also be predicted from L1 repair and L2 procedural knowledge. It is worth noting that the R² figures obtained for our significant models ranged from a small level of .18 to a medium level of .48 (Plonsky & Ghanbar, 2018), indicating that the models were able to explain up to 48% of the variance in learners' performance.
The results also revealed that in a number of models, language proficiency predicted L2 fluency behavior. Our interest in examining the role of language proficiency was motivated by previous research (de Jong, 2016; de Jong et al., 2015; Huensch & Tracy-Ventura, 2016) that questioned whether language proficiency mediated the relationship between L1 and L2 fluency behavior. We were also inspired by research (e.g., de Jong et al., 2012; Révész et al., 2016) that claimed some L2 fluency measures were reliable predictors of proficiency. For speech rate, the model reached significance with the EIT scores making a significant contribution to the model (R² figure of 48% and p < .001).
The results indicated that the EIT scores were a good predictor of a number of L2 fluency measures, that is, speech rate, number of repairs, and mid-clause silent pauses. This finding highlights the role of procedural knowledge in producing fast and uninterrupted speech. The OPT scores, however, predicted only mid-clause silent pauses, suggesting that L2 declarative knowledge may encourage silent pauses in mid-clause position. That is to say, mid-clause silent pauses are opportunities the speakers employ to use their declarative knowledge of the language during the production process. Research in this area (Kahng, 2014; Skehan, 2014; Skehan et al., 2016) reports that mid-clause pausing is linked to the Formulation (Levelt, 1989) stage of speech processing. Taking the two points together, it is plausible to argue that L2 learners who rely on their declarative knowledge are expected to pause more often in mid-clause position. Given the relatively small sample size of the study, however, we suggest that the results be interpreted with caution.

FINAL REMARKS
Our study has been the first to examine the role of proficiency level in L1 and L2 fluency behavior systematically across three proficiency levels. In addition, it is one of the first studies to examine proficiency level through more than one means. For these reasons, our findings contribute to a more reliable understanding of the relationship between L1 and L2 fluency behavior. Overall, our results are in line with recent research in oral fluency (Huensch & Tracy-Ventura, 2016; Peltonen, 2018), and reveal that L1 and L2 fluency behaviors are related for breakdown and repair measures, and that the relationship persists across different proficiency levels. Unlike Peltonen (2018), our results suggest that the pure L2 speed measure cannot be predicted from L1 speed. We interpret this difference in relation to how the participants' proficiency was assessed, as the current study adopted a more systematic approach to measuring proficiency.
The results also reveal that the pausing and breakdown aspects of L2 fluency can, to some extent, be predicted from L1 fluency behavior. The significant models for speech rate, number of mid-clause filled and silent pauses and total number of repairs suggest that a considerable amount of the variance in L2 fluency behavior can be explained by L1 fluency. It may be inferred from the results that pure speed measure, that is, articulation rate, is independent in the two languages. The results also indicate that L2 procedural knowledge predicts speech rate, whereas the participants' declarative knowledge contributes to the models predicting mid-clause silent pausing. The results of the study have significant implications for the field of language testing, particularly for developing speaking rating scales. Rating scales for the assessment of speaking in many international language tests, for example, the British Council's IELTS and APTIS, frequently refer to features such as pauses and reformulations as indicators of disfluency. The results of the current study suggest that frequent use of such features may represent, at least to some extent, the speaker's L1 behavior rather than their L2 disfluency.
An important methodological contribution of this study concerns the assessment of proficiency level as well as the impact that the choice of test might have on the results. We used two measures of proficiency to tap into different underlying constructs of L2 speaking proficiency, that is, L2 declarative and procedural knowledge. This allowed us to see which aspects of fluency are linked to declarative and which to procedural types of linguistic knowledge. The findings clearly suggest that a more complete profile of the learner proficiency can help shed light on the role of proficiency in fluency behavior.

NOTES
1 While a grammar test is primarily a means of activating declarative knowledge, we are aware that it can, at least to some extent, tap into procedural knowledge as well. For research purposes in our field, however, a grammar test is usually used as a measure of declarative knowledge (references).
2 Given that we do not distinguish between implicit knowledge and speeded-up explicit knowledge in this study, the use of EIT seems well justified for measuring procedural knowledge of both kinds.
3 Using two different tasks was aimed at increasing test security.