Aptitude–Treatment Interaction (ATI)

doi:10.1017/9781009076463.018

Part IV Aptitude–Treatment Interaction (ATI)

14 The Role of Language Aptitude and Timing of Form-Focused Instruction in TBLT

Introduction

Task-based language teaching (TBLT) has become a prominent second language (L2) teaching approach (Ellis et al., 2020). TBLT involves using tasks whose overarching purpose is to achieve specific communicative goals. Several task-based teaching methodologies have been proposed, usually involving three phases (Ellis & Shintani, 2014; Van den Branden, 2016; Willis & Willis, 2007): a pretask, a task, and a posttask. Roughly speaking, the pretask phase serves to introduce information relevant to the task, the task phase engages learners in the performance of the task, and the posttask phase affords learners the opportunity to present the results of their work and reflect on what they did.

Even though TBLT relies primarily on implicit learning, where learners’ attention is focused on comprehending and expressing meaning, form-focused instruction is a key aspect of TBLT. Form-focused instruction involves attracting attention to specific linguistic features in a context where communication of meaning is the main objective (Spada, 2011). However, the optimal timing of form-focused instruction within a task remains unclear. Researchers and methodologists have taken different positions as to the ideal moment when form-focused instruction should be provided. Opposed to providing form-focused instruction in the pretask phase, Long (2015) argued that drawing learners’ attention to the formal properties of language should occur reactively in response to learners’ errors or questions. Among other things, this reactive stance allows instruction to be in phase with learners’ developmental readiness. Willis and Willis (2007), by contrast, claimed that form-focused instruction should occur at the posttask phase to avoid learners fixating their attention on form at the expense of the global meaning of the task. According to skill acquisition theory (DeKeyser, 2015), however, presenting learners with declarative knowledge at the beginning of a task eventually allows them to develop procedural knowledge in the real operating conditions afforded by the context of a task.

Against these positions, some studies suggest that one-size-fits-all recommendations are not the best advice and that the moderating effect of certain individual learner variables, notably learner proficiency level, should be taken into account where the timing of form-focused instruction is concerned (Li et al., 2016; Michaud, 2020; Shintani, 2017). In this respect, Michaud (2020) concluded that whilst learners possessing less knowledge of a particular structure get more out of within-task instruction, learners with a higher level of knowledge extract more benefit from pretask instruction.

Although evidence for the moderating effect of proficiency level is emerging, little is known about the impact of other individual differences, such as language aptitude, on the benefits of different form-focused instruction timing conditions. Language aptitude refers to a set of cognitive abilities that facilitate information processing whilst learning and using an L2 in different contexts and at different stages of acquisition (Robinson, 2005). In his early definition of aptitude, Carroll (1965) discussed four components: phonetic coding ability, rote memory, grammatical sensitivity, and inductive learning ability. Skehan (1998) regrouped the two latter components under ‘language analytic ability’, which he defined as ‘the capacity to infer rules of language and make linguistic generalizations or extrapolations’ (p. 204). It is under this tripartite conception, namely phonetic coding, language analytic ability, and memory, that the concept of aptitude has most often been discussed (Li, 2016).

Traditionally, language aptitude was seen as a predictor of individuals’ ability to learn an L2. More recently, however, researchers have tried to understand how language aptitude interacts with different learning conditions, a line of work referred to as aptitude–treatment interaction research. In a recent meta-analysis, Li (2015) reported that language aptitude was mostly associated with explicit instruction conditions. However, a closer look at specific studies paints a more nuanced picture. For instance, research that has investigated interactions of language aptitude with types of feedback, both implicit and explicit, has yielded conflicting results. Although some studies showed that language aptitude was associated with learning gains under explicit corrective feedback (Sheen, 2007; Yilmaz & Granena, 2016), others revealed that it was with implicit feedback that language aptitude played a bigger role (Li, 2013; Trofimovich et al., 2007). Given these results, Li (2013) posited that under implicit teaching conditions, learners’ level of language aptitude might play a role in learning easy and transparent structures that are within the reach of learners’ processing abilities. On the other hand, under explicit learning conditions, language aptitude plays a role in learning difficult and opaque structures for which the processing load would otherwise be too heavy for learners. Finally, in the case of a simple structure, explicit instruction might neutralise the need to rely on language aptitude, as was observed in studies which revealed that under deductive instruction, higher-level aptitude learners did not outperform lower-level aptitude learners (Erlam, 2005; Hwu & Sun, 2012; Hwu et al., 2014).

Regarding the complexity of a structure, Skehan (2015) has stated that it might be the redundancy and saliency of a structure that explain whether aptitude is solicited. Although learners with low language aptitude might not notice a redundant structure, higher aptitude learners might do so despite the structure’s low communicative value. The same reasoning applies to saliency: learners with higher language aptitude might be more sensitive than lower aptitude learners to a structure that stands out more in the input.

Proficiency and Aptitude

Another variable that might impact the interaction of aptitude with treatment is proficiency level. Skehan (2019) judged that the different components of language aptitude can be called upon at different stages of acquisition – handling sounds, handling patterns, and automatising–proceduralising – each stage being composed of several acquisitional sequences. The first stage, handling sounds, involves input processing and noticing. This step allows learners to recognise unfamiliar sounds and impose a structure to better retain information. The second stage, handling patterns, occurs at a more advanced point and involves pattern identification, generalising, complexification, and handling feedback. The final stage, termed automatising–proceduralising, is the point at which learners develop an automated repertoire that allows them to communicate with ease, which in turn decreases the load of cognitive processing. It involves error avoidance, automatisation, creating a repertoire, and lexicalisation. According to Skehan, different components of aptitude are called upon at each of the three stages. Although there is not a clear match between stages of acquisition and language aptitude tests, it is possible to hypothesise that phonetic coding would be the most relevant component at the first stage of handling sounds and that language analytical ability would be the component most in demand at the second stage of handling patterns (Skehan, 2016, 2019). For the last stage, automatising–proceduralising, Skehan (2019) has been more speculative, noting that research that has investigated proficiency has seldom focused on advanced learners and that it is therefore difficult to identify the components of aptitude associated with later stages of learning. He has suggested that it would be memory – perhaps associative memory, but more likely procedural or implicit memory – that would be associated with learning at this point.

This model remains largely theoretical because very few studies have been conducted to validate its premises empirically. In a meta-analysis, Li (2015) concluded that language analytic ability seems to be associated more with the initial stage of learning. However, Li’s conclusion is probably explained by the fact that aptitude research has mainly targeted learners in the earlier stages of acquisition. Artieda and Muñoz (2016), who studied learners at the beginner (A1) and intermediate (B1–B2) proficiency levels, reported findings that supported Skehan’s model. They found that phonetic coding as measured by LLAMA E, which would correspond to the handling sounds stage, was predictive for beginners, whilst language analytic ability (LLAMA F), a measure of handling patterns, was associated with both levels of proficiency. However, LLAMA D, an aptitude subtest that could be considered as corresponding to the automatising–proceduralising stage (Skehan, 2016, 2019), was predictive only for the beginner-level learners. In that study, the researchers measured the global proficiency level of learners; thus, it is unclear whether different components of aptitude are called upon at different stages of the acquisition of a specific structure.

Proficiency and Timing

After exploring different moderating variables involved in aptitude–treatment interaction research, we return to the issue of timing of instruction within a task. To our knowledge, only two studies have investigated the moderating effect of language aptitude on the effectiveness of timing of instruction. Li et al. (2019) examined the moderating effect of language analytical ability on the learning of passive structures in L2 English. In their study, five groups received form-focused instruction at different times during two dictogloss tasks: (a) explicit instruction before the task, (b) explicit instruction before the task and corrective feedback during the task, (c) corrective feedback during the task, (d) corrective feedback after the task, and (e) task only. The results showed that, after controlling for participants’ initial knowledge on a grammaticality judgement test (GJT), language analytical ability had a moderating effect only for the group that had received corrective feedback after the tasks and the group that had performed only the tasks.

Kachinske and DeKeyser (2019) observed a similar pattern. In their study, learners were shown pictures on a computer and asked to select the sentence that best described the image. The study, targeting two structures, namely Spanish object–verb position and the ser/estar distinction, included three experimental groups and one control group. Explicit instruction, which was withheld from the control group, was provided to the experimental groups before, during, or after the tasks. The results indicated that language aptitude had the greatest impact on the group that did not receive the explicit instruction.

In both studies, it seemed that language aptitude had a moderating effect in conditions where there was the least amount of explicit support. Even though language aptitude has been more associated with explicit treatment (Li, 2015), in this case aptitude played a bigger role when instruction was either not provided or provided at the end of a task, conditions that can be considered more implicit. Two variables could explain these results. The first variable concerns the complexity of the structures. As we stated previously, explicit instruction can neutralise the advantage conferred by a higher aptitude when structures are not so complex. Therefore, learners who did not receive instruction at the beginning of the task had to rely more on their language aptitude to make sense of the structure. Structures involved in these studies could be considered salient, which, according to Skehan (2015), would benefit learners with higher language aptitude. The other variable might be the characteristics of the tasks. In both studies, the tasks drew considerable attention to form. In Li et al. (2019), the dictogloss activities each contained 15 occurrences of the structure, so learners, even if they had not received instruction, might have focused on this structure due to its frequency. Kachinske and DeKeyser (2019) used picture-matching and sentence-interpretation comprehension activities, where choosing the right answer depended on the correct understanding of the structures. In a way, the tasks resembled conditions of inductive learning that have been shown to favour learners with higher-level aptitude. These conditions might have enticed learners with higher language aptitude to pay close attention to those structures and to try to understand their meaning even in the absence of instruction.

Motivation of the Present Study

Teachers wishing to adopt a task-based programme are given contradictory recommendations regarding the integration of form-focused instruction: before, during, or after a task. However, recent research suggests that individual differences might modulate the effects of the timing of instruction. Previous research using tasks suggested that posttask instruction and no instruction were the conditions in which language aptitude was most solicited. However, the design of those tasks placed an important focus on the processing of certain structures. It remains to be seen whether tasks more focused on meaning, where emphasis is not as strong on the processing of a specific structure, would put the same demands on language aptitude among learners of different levels of proficiency. Also, previous research has largely focused on one component of language aptitude, language analytical ability, but different components might play a role at different stages of acquisition. Therefore, we were interested in examining the role of different components among L2 learners with different levels of proficiency who receive form-focused instruction at different times during a task.

Research Questions

Research Question 1: Does language aptitude have a moderating effect on timing of instruction for the acquisition of explicit and implicit L2 knowledge?
Research Question 2: Do different components of aptitude intervene at different levels of proficiency?

Method

Participants

The study took place in an English-language university in the French-speaking province of Quebec (Canada) over a period of four weeks. The participants, whose average age was 20.2 years, were university students taking credited courses in French as an L2 ($n = 159$). They came from eight intact groups, four at the B1 level ($n = 79$) and four at the B2 level ($n = 80$). Each group followed what can be described as a modular task-based curriculum (Ellis, 2018) in which tasks constituted the learning units but where teaching certain linguistic features might be planned pre-emptively. The curriculum was also influenced by the action-oriented perspective proposed by the Common European Framework of Reference for Languages (Council of Europe, 2001), where learners are considered as social actors and where tasks must mirror real-life activities. Classes met twice a week, lasting 90 minutes each time.

Students from the eight classes were assigned to four experimental conditions, each of which included one B1-level group and one B2-level group. The four conditions included three experimental groups, who received explicit instruction before (pretask, B1, $n = 21$; B2, $n = 20$), during (task, B1, $n = 17$; B2, $n = 18$) or after (posttask, B1, $n = 20$; B2, $n = 18$) the tasks, and a control group (control, B1, $n = 17$; B2, $n = 21$), who performed the same tasks without receiving any experimental instruction. A one-way analysis of variance (ANOVA) conducted on the pretest indicated that the difference between the B1 and B2 groups was not significant for the GJT, $F(7, 153) = 1.25$, $p = .28$, $\eta_p^2 = .054$. It was, however, significant for the EIT, $F(7, 158) = 2.96$, $p = .006$, $\eta_p^2 = .116$. Consequently, in order to better assess the moderating effect of proficiency level with the particular structure targeted by the instruction, participants in each of the four conditions were assigned to lower- and higher-proficiency subgroups based on a median split using pretest scores. This decision was motivated by the desire to test the involvement of different language aptitude components on the development of a particular structure alongside the stages of acquisition proposed by Skehan (2019).
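The median split described above can be illustrated with a short sketch. The participant identifiers and scores are invented for illustration, and the treatment of scores that fall exactly on the median (assigned here to the lower subgroup) is an assumption, since the chapter does not specify how ties were handled.

```python
from statistics import median

def median_split(pretest_scores):
    """Label each participant 'lower' or 'higher' relative to the group median.

    Assumption: scores equal to the median go to the 'lower' subgroup;
    the chapter does not say how ties were handled.
    """
    cut = median(pretest_scores.values())
    return {pid: ("higher" if score > cut else "lower")
            for pid, score in pretest_scores.items()}

# Hypothetical GJT pretest scores (max = 24) for one group of participants.
scores = {"p01": 8.0, "p02": 10.5, "p03": 17.0, "p04": 19.5}
subgroups = median_split(scores)
```

With these invented scores the median is 13.75, so the first two participants fall in the lower-proficiency subgroup and the last two in the higher-proficiency subgroup.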

Tasks

During the study, the participants completed two tasks that were part of their regular curriculum. Both tasks were focused (Ellis, 2003) and targeted the French subjunctive mood. The first task was a hierarchical task. The participants had to film a video in which they gave advice to future students on how to prepare themselves for winter on campus. In the pretask, the participants watched a video about a refugee family trying to adapt to winter in Canada. The video contained three occurrences of the French subjunctive. Following that, the participants individually had to devise five tips to include in the video that they would record for future students. During the task phase, participants worked in groups of four and had to share their tips. Then, the four participants had to agree on five tips to be included in the video. At the posttask stage, each video was shown in class so that the participants could choose the best video.

The second task was a decision-making task. The participants were part of a committee in charge of organising a winter carnival at the university and were asked to assess activity proposals to be included in the event programme. During the pretask, the participants, working in groups of four, had to establish criteria with which to assess the proposals. During the task, they read seven activity proposals in order to determine which proposals to eliminate based on the retained criteria. In the posttask, the participants had to write an email to the organisers of the rejected proposal explaining their decisions.

Although we used focused tasks, the targeted structure was presented only as a useful way of expressing what the participants wanted to say; it was never imposed.

Target Structure

The French subjunctive mood was taught during the two tasks. It was chosen because this structure can express advice and recommendations, formulations deemed useful for performing the task (Loschky & Bley-Vroman, 1993). At the institution where the study took place, the subjunctive is introduced at the B1 level and reinforced in the following levels. The subjunctive is a verbal mood that expresses a certain subjectivity and can denote will, obligation, doubt, feelings, fear, and unreality. It is used in a dependent clause that can be introduced by a verbal expression (e.g., Il faut que vous preniez le métro, ‘You have to take the metro’). The conjugation of verbs in the subjunctive follows certain regular patterns for most verbs, except for a few verbs that occur very frequently. The subjunctive has low saliency and is communicatively redundant; it can therefore be considered a complex structure (Poplack, 1990), which would explain its late acquisition (Forsberg & Bartning, 2010).

Form-Focused Instruction

All three experimental groups received explicit instruction on the subjunctive in both tasks. The teaching sequence was built in collaboration with the teachers to reflect their actual practices. Relying on metalinguistic explanations, the teachers presented the meaning, context of use, and formation of the subjunctive on slides. Because the subjunctive mood can be used in a variety of contexts, instruction focused only on the present subjunctive and on verbal expressions deemed useful for giving advice and recommendations. The teachers introduced some sentences using the subjunctive to express advice. Then, they explained the rules of conjugation of the verbal mood and proposed some exercises in which the participants had to conjugate verbs in the subjunctive in sentences expressing advice. The pretask group received explicit instruction at the beginning of the tasks, the task group received explicit instruction whilst the group members worked in teams on carrying out the tasks, and the posttask group received explicit instruction after the tasks were finished. No explicit instruction was provided to the control group, which simply completed the tasks. In order to ensure the greatest consistency between the conditions, the teachers used the same PowerPoint presentation to present the information to the students. Before each task, the principal researcher reviewed all the slides and specific teaching instructions with the instructor. The instruction lasted 15 minutes in the first task and seven minutes in the second task.

Data Collection Instruments

In response to calls for more exhaustive measurements of L2 learners’ knowledge (Norris & Ortega, 2000), we used two tests: a GJT to assess explicit knowledge and an elicited imitation test (EIT) to assess implicit knowledge (Ellis, 2005). A reliability study conducted with 65 learners of French as an L2 to ensure the internal consistency of the GJT and the EIT yielded alpha coefficients of .92 and .72, respectively. Both tests were administered two days before the experimental treatment started (pretest), immediately at the end of the second task (immediate posttest) and two weeks later (delayed posttest).

Grammaticality Judgement Test

We used an untimed GJT to assess explicit knowledge (Ellis, 2005; Gutiérrez, 2013; Zhang, 2015). The test consisted of presenting statements targeting predetermined structures, and the participants had to judge whether the statements were grammatically correct or not. For the study, we developed a 32-item task comprising 24 statements targeting the subjunctive and eight distractors. Apart from judging the grammaticality of the 32 items, the participants were asked to correct the sentences they judged as ungrammatical. One point was given for each correct answer, that is, when the participants accurately judged the statements. No point was awarded if the participants checked the incorrect box (i.e., indicated that a correct statement was incorrect). Participants who corrected statements they deemed incorrect obtained one point when the correction was accurate and half a point when a conjugation error was committed whilst trying to correct the statement in question.

Elicited Imitation Test

The EIT was used to assess implicit knowledge. Numerous studies have confirmed its validity (Ellis, 2005; Erlam, 2006; Gutiérrez, 2013; Kim & Nam, 2017; Zhang, 2015). The test required the participants to process the meaning of statements and decide whether they agreed with each statement or whether it was true or false. The participants then had to repeat the statement aloud correctly as quickly as possible. Repeating the utterance quickly limited access to explicit knowledge because the participants could hardly rely on a conscious consideration of the form. The task contained 32 items: 24 statements targeting the subjunctive and eight distractors. Of the 24 items targeting the subjunctive, 12 contained an error and 12 did not. The items were advice that focused on three areas: getting in shape, learning a language, and protecting the environment. The participants heard the item first (e.g., Il faut que vous buviez beaucoup d’eau., ‘You need to drink lots of water’). They indicated on a paper–pencil answer sheet whether they thought that this was good advice or not, and then repeated the item. A point was awarded if the participants used the target structure correctly, that is, if they correctly repeated a grammatical statement or if they corrected an ungrammatical statement. Half a point was given when the participants corrected an ungrammatical statement but made a conjugation error (e.g., item: Il est essentiel que tu rends un bon travail*, ‘It is essential that you do [conjugation error] a good job’; participant’s response: Il est essentiel que tu rendisses un bon travail*), and no point was awarded if the participants did not use the structure correctly or avoided using it. The test was performed on CAN-8 software (CAN-8 Virtual Lab, 2018). In order to limit participants’ access to explicit knowledge (Kim & Nam, 2017), we asked the participants to answer questions as quickly as possible.
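The EIT scoring rules described above can be summarised in a small sketch. The response labels (`"correct"`, `"conjugation_error"`, `"other"`) are hypothetical coding categories introduced here for illustration; the chapter does not describe how responses were coded in practice. The sketch also assumes that a conjugation error on a grammatical item earns no credit, which is one reading of the rules rather than something the chapter states explicitly.

```python
def score_eit_item(item_is_grammatical, response):
    """Score one subjunctive item of the elicited imitation test.

    `response` uses hypothetical coding labels:
      "correct"           - target structure produced correctly
      "conjugation_error" - subjunctive attempted, but misconjugated
      "other"             - structure avoided or used incorrectly
    """
    if response == "correct":
        # Correct repetition of a grammatical item, or successful
        # correction of an ungrammatical one.
        return 1.0
    if response == "conjugation_error" and not item_is_grammatical:
        # Ungrammatical item corrected, but with a conjugation error.
        return 0.5
    # Assumption: every other case earns no credit.
    return 0.0

def score_eit(items):
    """Sum item scores; `items` is a list of (item_is_grammatical, response)."""
    return sum(score_eit_item(g, r) for g, r in items)

# Four invented responses: 1.0 + 1.0 + 0.5 + 0.0
example = [(True, "correct"), (False, "correct"),
           (False, "conjugation_error"), (True, "other")]
total = score_eit(example)
```

Applied to all 24 subjunctive items, this rule yields the maximum score of 24 reported for the test.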

LLAMA Aptitude Test

According to Skehan’s (2019) three-stage model, language aptitude might play a different role in three broad stages of acquisition: handling sounds, handling patterns, and automatising–proceduralising. Consequently, we opted for the LLAMA test with its three subtests – LLAMA E, LLAMA F, and LLAMA D – that are meant to measure handling sounds, handling patterns and automatising–proceduralising, respectively. The three subtests were administered at the same time as the delayed posttest.

The LLAMA (Meara, 2005) is based on the Modern Language Aptitude Test, a test developed by Carroll and Sapon (1959) that has been used in numerous recent aptitude–treatment interaction studies (Kachinske & DeKeyser, 2019; Yalçın & Spada, 2016; Yilmaz & Granena, 2016). Unlike the Modern Language Aptitude Test, LLAMA does not require knowledge of a common language. This makes it possible to use it in contexts where participants have different first languages, as was the case in the present study. In addition, the test has many advantages: it is easy to administer on a computer, it is available free online, and the completion time is approximately 25 minutes (Granena, 2013). However, the LLAMA test has not been the object of extensive validation. In an exploratory validation study of the LLAMA, Granena (2013) obtained satisfactory internal reliability for the entire test ($\alpha = .77$). However, the reliability of individual subtests was lower (LLAMA D, $\alpha = .64$; LLAMA E, $\alpha = .65$; LLAMA F, $\alpha = .60$). The lower reliability for specific subtests was confirmed by Bokander and Bylund (2020), who conducted a larger validation study (LLAMA D, $\alpha = .54$; LLAMA E, $\alpha = .74$; LLAMA F, $\alpha = .60$). Therefore, the results of subtests of the LLAMA must be interpreted with these caveats in mind.
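For readers less familiar with internal-consistency coefficients, the sketch below shows how an alpha coefficient of this kind is computed, assuming the values cited above are Cronbach's alpha (the usual coefficient for internal reliability). The item-level data are invented; no item-level LLAMA data are given in this chapter.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a participants-by-items score matrix.

    `item_scores` is a list of rows, one per participant, each row holding
    that participant's score on every item.
    """
    k = len(item_scores[0])
    # Variance of each item across participants.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of participants' total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented data: two items that agree perfectly across four participants.
consistent = [[1, 1], [0, 0], [1, 1], [0, 0]]
alpha_consistent = cronbach_alpha(consistent)

# Invented data: two items that are unrelated across four participants.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]
alpha_unrelated = cronbach_alpha(unrelated)
```

The first data set yields an alpha of 1 and the second an alpha of 0, which brackets the range within which the subtest coefficients reported above (.54 to .74) fall.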

LLAMA E

The LLAMA E subtest measures the ability to recognise new combinations of sounds. This subtest targets the component associated with phonetic coding. Twenty-four symbols are shown on a screen, and each of them corresponds to a particular sound. Participants have a two-minute study period during which they can take notes to learn this new written system. In the testing phase, participants hear the combination of two sounds and must find the right combination of the corresponding symbols from two possible choices. Participants receive points for correct answers and lose points for wrong answers. The total score varies between 0 and 100 points.

LLAMA F

The LLAMA F subtest measures the ability to infer grammatical rules from a new language. Participants have a five-minute period to learn the rules of a grammar system through pictures. During the testing phase, an image is presented and accompanied by a grammatical and an ungrammatical sentence. The subtest contains 20 items. Participants must choose the correct phrase. Points are given for correct answers and deducted for incorrect answers. The total score varies between 0 and 100 points.

LLAMA D

The LLAMA D subtest measures the ability to recognise sounds. This test targets the ability to discriminate oral sound patterns, the component associated with implicit learning. During the first phase of the subtest, 10 words are presented once orally. In the second phase, participants hear words and must decide whether they are the words that they had previously heard. The subtest contains 30 items. The possible score varies between 0 and 75.

As opposed to LLAMA E and F, the LLAMA D subtest does not have an initiation phase and does not call on analytical skills. In an exploratory validation study, Granena (2013) observed that although the LLAMA B, LLAMA E, and LLAMA F subtests and a general intelligence test loaded on the same factor, LLAMA D loaded on another factor alongside a serial reaction time task associated with implicit learning. Suzuki (2021) doubted that LLAMA D is a true measure of implicit learning aptitude and posited that it is rather an aptitude test for proceduralisation. Regardless of the distinction, this test would fit in Skehan’s (2019) automatising–proceduralising stage, for which there are very few known tests.

Analysis

In order to assess the predictive value of the different components of language aptitude, we performed multiple hierarchical regressions using SPSS software (IBM SPSS Statistics, Version 25). Studies examining the interactions of skill and language aptitude have proceeded in two different ways. The first way has been to use gain scores as the dependent variable and aptitude tests as independent variables (e.g., Li, 2013). The problem with this method is that it favours participants with a lower score in the pretest, influences the variance and the distribution of the raw scores, and increases the chances of obtaining a negative score in the case where a posttest score is lower than a pretest score (Li et al., 2019). For this reason, Li et al. recommended the second way: performing multiple hierarchical regression using the posttest as the dependent variable, entering the pretest as an independent variable in a first step, and then entering language aptitude in a second step. Therefore, we performed hierarchical multiple regressions for each test (GJT and EIT as immediate posttests and delayed posttests) and for lower- and higher-proficiency groups, using the posttest scores as the dependent variable and the pretest scores and aptitude components as independent variables.
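The two-step logic of this analysis can be sketched in plain Python. This is an illustration of hierarchical regression in general, not a reproduction of the SPSS procedure used in the study: the function names are hypothetical, the scores are invented, and only one aptitude predictor is entered at the second step to keep the example short.

```python
def r_squared(y, predictors):
    """R^2 from an ordinary least squares fit of y on an intercept plus predictors.

    `predictors` is a list of equal-length columns (lists of numbers).
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * z for x, z in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def hierarchical_step(y, step1, step2_added):
    """Return (R2 at step 1, R2 at step 2, R2 change, F for the change)."""
    n = len(y)
    r2_1 = r_squared(y, step1)
    r2_2 = r_squared(y, step1 + step2_added)
    df_change = len(step2_added)
    df_resid = n - (len(step1) + len(step2_added)) - 1
    f_change = ((r2_2 - r2_1) / df_change) / ((1.0 - r2_2) / df_resid)
    return r2_1, r2_2, r2_2 - r2_1, f_change

# Invented scores for eight learners, for illustration only.
pretest = [8, 9, 7, 10, 9, 8, 11, 10]
llama_f = [30, 60, 20, 70, 50, 40, 80, 40]
posttest = [15, 20, 12, 21, 17, 17, 23, 18]

# Step 1: pretest only. Step 2: pretest plus the aptitude score.
r2_step1, r2_step2, r2_change, f_change = hierarchical_step(
    posttest, [pretest], [llama_f])
```

The quantity of interest is `r2_change`, the share of posttest variance that the aptitude score explains beyond what the pretest already accounts for; `f_change` is the corresponding test statistic.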

Results

Because the aim of this chapter was to examine the associations between individual differences and learning gains, we will comment on the results of descriptive statistics only (for more information, see Michaud, 2020). Tables 14.1 and 14.2 present the results of the descriptive analyses of the GJT and EIT. As the tables show, even though the higher-proficiency participants started with more knowledge, the lower-proficiency participants still possessed some level of explicit knowledge as measured by the GJT. Results for the EIT were much lower for both groups, especially for the lower-proficiency participants, whose means ranged between 0.88 and 2.30 (max = 24). From the overall trends, it seemed that, among the lower-proficiency participants, the largest gains were obtained by the within-task group for both the GJT and EIT, whereas among the higher-proficiency participants, the largest gains were obtained by the pretask group.

Table 14.1 Grammaticality judgement test: Descriptive statistics for lower- versus higher-proficiency participants by group

			Pretest		Immediate posttest		Delayed posttest
Group	Level	n¹	M	SD	M	SD	M	SD
Pretask	Lower	21	8.29	1.73	17.46	4.24	15.60	4.50
	Higher	20	18.15	4.38	22.25	1.18	20.63	3.65
Task	Lower	17	9.24	1.96	19.19	4.44	17.89	4.34
	Higher	18	16.56	3.22	21.03	2.41	20.14	3.30
Posttask	Lower	20	8.98	4.59	17.95	5.09	16.83	5.25
	Higher	18	17.63	3.80	20.64	3.75	21.47	2.03
Control	Lower	17	8.73	3.31	11.77	4.97	12.29	5.05
	Higher	21	15.64	4.42	17.75	4.26	17.89	3.91

Note. Maximum score = 24.

Table 14.2 Elicited imitation test: Descriptive statistics for lower- versus higher-proficiency participants by group

			Pretest		Immediate posttest		Delayed posttest
Group	Level	n¹	M	SD	M	SD	M	SD
Pretask	Lower	20	1.32	1.17	6.40	4.30	5.68	4.41
	Higher	21	5.86	5.23	13.43	6.67	12.45	7.10
Task	Lower	18	1.42	1.60	8.25	6.52	8.36	6.31
	Higher	21	4.19	2.77	9.74	5.17	9.67	3.94
Posttask	Lower	22	2.30	1.88	7.77	6.21	7.02	5.51
	Higher	20	5.73	4.14	11.15	6.23	10.73	5.65
Control	Lower	16	1.00	0.88	2.06	1.51	2.47	1.68
	Higher	20	5.00	2.71	6.88	4.15	7.60	3.68

Note. Maximum score = 24.

To assess the difference between groups at the beginning of the study, we performed a one-way ANOVA across the eight groups for each of the two pretests, and both ANOVAs were statistically significant: GJT: $F(7, 145) = 2.96$, $p < .001$, $\eta_p^2 = .669$; EIT: $F(7, 158) = 2.96$, $p = .006$, $\eta_p^2 = .116$. Post hoc analyses using Bonferroni correction indicated that the p values were consistently less than .001 for all comparisons between lower- and higher-proficiency groups but consistently greater than .05 for comparisons within the same level of proficiency. Therefore, the level of explicit and implicit prior knowledge differed significantly between the two proficiency levels, but it did not differ within the same level.
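
As a worked illustration of the statistics reported above, the sketch below recovers partial eta squared from an F ratio and its degrees of freedom, and computes a Bonferroni-adjusted alpha. It rests on two assumptions: that the standard one-way identity η²p = (F × df_effect) / (F × df_effect + df_error) applies, and that the correction covered all pairwise comparisons among the eight groups, which the text does not state. The function names are hypothetical.

```python
from math import comb


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def bonferroni_alpha(n_groups, alpha=0.05):
    """Number of pairwise comparisons and the Bonferroni-adjusted alpha level."""
    n_comparisons = comb(n_groups, 2)
    return n_comparisons, alpha / n_comparisons


# Reported EIT pretest ANOVA: F(7, 158) = 2.96.
eit_eta = partial_eta_squared(2.96, 7, 158)
print(round(eit_eta, 3))  # 0.116, matching the reported effect size

# Assumption: all pairwise comparisons among the eight groups were corrected.
n_comparisons, adjusted_alpha = bonferroni_alpha(8)
print(n_comparisons, round(adjusted_alpha, 4))  # 28 0.0018
```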

The overall trends were the same for both lower- and higher-proficiency participants, as shown in Tables 14.3 and 14.4. Scores on the LLAMA D subtest were the lowest and those on the LLAMA E subtest the highest, to the point where there seemed to be a ceiling effect for the latter.

Table 14.3 Scores on the LLAMA subtests for lower-proficiency participants

		LLAMA D		LLAMA E		LLAMA F
Group	N	M	SD	M	SD	M	SD
Pretask	24	23.91	17.32	93.01	8.38	60.42	22.36
Task	15	30.67	14.25	89.00	23.01	56.00	29.71
Posttask	20	33.95	11.50	92.50	12.09	65.00	25.85
Control	20	29.25	12.17	84.50	26.85	54.00	29.80

Note. Maximum score: LLAMA D = 75; LLAMA E and F = 100.

Table 14.4 Scores on the LLAMA subtests for higher-proficiency participants

		LLAMA D		LLAMA E		LLAMA F
Group	N	M	SD	M	SD	M	SD
Pretask	18	33.23	12.61	96.47	4.93	74.11	21.23
Task	22	25.44	14.21	88.19	23.02	60.91	23.07
Posttask	21	29.76	16.01	94.76	8.14	68.10	25.81
Control	19	24.47	15.08	90.53	13.93	50.00	28.48

Note. Maximum score: LLAMA D = 75; LLAMA E and F = 100.

In the following section, we present the results of the multiple hierarchical regressions. For each analysis, we checked the assumptions of multiple regression and found no violations of the assumptions. We obtained no value greater than 3 for the variance inflation factor, for which the generally accepted limit indicating a multicollinearity problem is 10. Tables 14.5–14.8 present the predictors that were statistically significant for each regression analysis.
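
For readers unfamiliar with the variance inflation factor (VIF) check mentioned above, a minimal sketch follows. Each predictor is regressed on the remaining predictors, and VIF = 1 / (1 − R²). The data are synthetic stand-ins for a pretest and three aptitude subtests, and the function name is hypothetical; this is not the authors' SPSS output.

```python
import numpy as np


def variance_inflation_factors(predictors):
    """Return one VIF per column: 1 / (1 - R^2) from regressing it on the others."""
    X = np.asarray(predictors, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r2 = 1.0 - (residuals @ residuals) / ((target - target.mean()) ** 2).sum()
        vifs.append(float(1.0 / (1.0 - r2)))
    return vifs


# Synthetic, purely illustrative predictors (NOT the study's data):
# columns stand in for pretest, LLAMA D, LLAMA E, and LLAMA F scores.
rng = np.random.default_rng(1)
predictors = np.column_stack([
    rng.normal(10, 3, 40),
    rng.normal(30, 15, 40),
    rng.normal(90, 10, 40),
    rng.normal(60, 25, 40),
])

vifs = variance_inflation_factors(predictors)
print([round(v, 2) for v in vifs])
print("Any VIF above 10:", any(v > 10 for v in vifs))
```

Values close to 1 indicate that a predictor shares little variance with the others.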

Table 14.5 Significant predictors of the grammaticality judgement posttests for lower-level proficiency participants

		Immediate posttest				Delayed posttest
Group	Predictors	β	t	p	R²	Predictors	β	t	p	R²
Pretask	–					–
Task	–					L_F	0.11	2.57	.03	.27
Posttask	–					–
Control	–					–

Note. L_D = LLAMA D; L_F = LLAMA F.

Table 14.6 Significant predictors of the elicited imitation posttests for lower-level proficiency participants

		Immediate posttest				Delayed posttest
Group	Predictors	β	t	p	R²	Predictors	β	t	p	R²
Pretask	Pretest	1.57	4.02	.001	.38	Pretest	1.01	2.34	.031	.23
	L_F	0.11	2.62	.019	.16
	L_E	−0.30	−3.23	.005	.24
Task	–					–
Posttask	Pretest	1.83	2.32	.037	.24	Pretest	1.96	3.02	.008	.35
Control	Pretest	0.73	2.54	.021	.26	Pretest	0.79	2.18	.044	.22

Note. L_E = LLAMA E; L_F = LLAMA F.

Table 14.7 Significant predictors of the grammaticality judgement posttests for higher-level proficiency participants

		Immediate posttest				Delayed posttest
Group	Predictors	β	t	p	R²	Predictors	β	t	p	R²
Pretask	–	–	–	–	–	Pretest	0.73	3.02	.009	.38
Task	L_D	0.10	2.53	.024	.26	L_D	0.16	2.83	.015	.35
Posttask	–	–	–	–	–	–
Control	Pretest	0.77	5.41	< .001	.65	Pretest	0.73	5.59	< .001	.66

Note. L_D = LLAMA D.

Table 14.8 Significant predictors of the elicited imitation posttests for higher-level proficiency participants

		Immediate posttest				Delayed posttest
Group	Predictors	β	t	p	R²	Predictors	β	t	p	R²
Pretask	–					–
Task	–					–
Posttask	Pretest	1.06	4.33	< .001	.50	Pretest	1.18	6.49	.001	.69
Control	Pretest	0.86	4.69	< .001	.58	Pretest	0.77	2.86	.012	.35

For the GJT of the lower-proficiency participants, pretest scores were not a significant predictor for any group. The only contribution from the language aptitude components was for the within-task group at the delayed posttest, where LLAMA F explained 27% of the variance. For the EIT, unlike the GJT, pretest scores explained a significant proportion of the variance for all groups at both the immediate and delayed posttests, except for the within-task group. With regard to the contribution of the aptitude components, only one positive relationship was obtained: the LLAMA F subtest explained 16% of the variance of the immediate posttest of the pretask group. For that same group, an inverse relationship was also observed for LLAMA E, which explained 24% of the variance.

For the GJT of the higher-level proficiency participants, in contrast to the lower-proficiency participants, pretest scores made significant contributions to the results of the control group at the immediate and delayed posttests and of the pretask group at the delayed posttest. Only one aptitude component contributed: for the within-task group, LLAMA D explained 26% of the variance at the immediate posttest and 35% at the delayed posttest. For the EIT, no significant contributions of language aptitude were observed. Only pretest scores explained a significant part of the variance, for the posttask and control groups at the immediate and delayed posttests.

Overall, for the lower-proficiency group, only the LLAMA F subtest was associated with a positive contribution, for the pretask and within-task groups, and pretest scores were associated only with the EIT results.

For the higher-level proficiency participants, the only language aptitude predictor was LLAMA D for the within-task group. The initial level of knowledge of the participants was the predictor that was most associated with the development of knowledge, especially in the case of the EIT.

Discussion

This study sought to investigate how different components of language aptitude interact with the timing of form-focused instruction within a task whilst taking into consideration learners’ proficiency level.

Timing of Instruction and Aptitude

Regarding the timing of instruction, language aptitude was only a predictor for the participants who had received instruction during the pretask or within the task, both for the lower- and the higher-proficiency groups. These conditions could be considered the most explicit ones because the participants had been presented with explicit explanations at the outset or during the tasks and could use the rest of the tasks to make sense of these rules. This finding is consistent with research trends showing that aptitude is associated more with explicit than with implicit instruction (Li, Reference Li2013; Sheen, Reference Sheen and Mackey2007; Yalçın & Spada, Reference Yalçın and Spada2016; Yilmaz & Granena, Reference Yilmaz and Granena2016). However, unlike our study, Li et al. (Reference Li, Ellis and Zhu2019) and Kachinske and DeKeyser (Reference Kachinske and DeKeyser2019) found no role for language aptitude for the pretask and within-task groups but observed a positive contribution for the posttask groups and their task-only group. Two aspects of the treatment could explain this difference: the grammatical notion targeted and task type. In the aforementioned studies, the researchers targeted structures that could be considered salient, whereas in our study, the subjunctive is considered less salient. Indeed, the difference between the present indicative and the present subjunctive can be quite difficult to perceive (e.g., for second person plural: prenez[ind.]/preniez[subj.]). As Skehan (Reference Skehan2015) observed, greater saliency of a structure attracts the attention of higher-aptitude learners. Therefore, when a structure is not salient and when instruction is absent, learners with higher language aptitude cannot make use of their skills to process a structure that does not attract their attention. Provision of explicit instruction in the earlier phases of a task seems to be necessary to activate language aptitude.
These findings are in line with the assumptions made by Li (Reference Li2013) regarding the conditions under which aptitude is associated with explicit or implicit instruction: under conditions of implicit teaching, language aptitude can facilitate the learning of simple and transparent structures, whereas in the case of complex and opaque structures, explicit instruction is necessary to ease the processing load, as was the case in our study.

As for task type, as explained earlier, Li et al. (Reference Li, Ellis and Zhu2019) and Kachinske and DeKeyser (Reference Kachinske and DeKeyser2019) used tasks that were heavily focused on a certain structure: a dictogloss task that contained many passive structures and an activity that required the correct understanding of a particular structure to complete the exercises. Both of these tasks could have primed learners who did not receive any explicit instruction to rely on their language aptitude to analyse patterns in the input. The tasks used in the present study were more meaning focused, so the incentive to understand and produce the subjunctive was not strong.

Level of Proficiency and Components of Aptitude

Skehan (Reference Skehan1998, Reference Skehan, Granena, Jackson and Yilmaz2016, Reference Skehan, Wen, Skehan, Biedroń, Li and Sparks2019) posited that different components of language aptitude might intervene at different stages of acquisition. According to his conceptualisation, three stages would be involved: handling sounds at the beginning stage, handling patterns in the intermediate stage, and automatising–proceduralising at the later stages of acquisition.

The first stage, handling sounds, as measured by LLAMA E, was not positively associated with any conditions. Skehan (Reference Skehan, Wen, Skehan, Biedroń, Li and Sparks2019) proposed that phonological coding was associated with the first stage of learning. In our study, all participants were either intermediate (B1) or high-intermediate (B2) learners, and even the lower-proficiency participants started with some knowledge of the subjunctive as measured by the GJT. So, pressure to rely on phonetic coding to decipher this structure may not have been as strong as it might be with no-proficiency learners. It is worth noting that, in their recent validation study of the LLAMA test, Bokander and Bylund (Reference Bokander and Bylund2020) observed a ceiling effect for LLAMA E, which is also what we observed in our study. Bokander and Bylund hypothesised that the items in the LLAMA E subtest contain a certain systematicity and can be approached as a problem-solving test rather than a phonetic coding test, which may be what the participants in our study did. Therefore, it is hard to speculate further on this question.

Interestingly, language analytical ability, the measure of the second stage – handling patterns – was only associated with the lower-proficiency participants, and the measure of the last stage – automatising–proceduralising assessed by LLAMA D – was only associated with higher-level proficiency participants. Therefore, our results support Skehan’s hypothesis that identifying patterns and making generalisations might be useful at an earlier stage, whereas learners who possess more knowledge might have to rely on automatising–proceduralising abilities to progress along the acquisition stages. In a study by Artieda and Muñoz (Reference Artieda and Muñoz2016), LLAMA D was associated with beginner learners and LLAMA F was associated with both beginner and intermediate learners. However, Artieda and Muñoz looked only at global competency tests and not at specific structures. Therefore, their study did not offer fine-grained measures for examining the involvement of specific components at different stages of acquisition of specific subsystems.

Level of Previous Knowledge

Following Li et al.’s (Reference Li, Ellis and Zhu2019) recommendation, we entered pretest scores as a predictor to control for previous knowledge instead of using gain scores. It is interesting to note that pretest scores were more related to the EIT than to the GJT, especially among the lower-proficiency participants. The EIT is a measure of proceduralised/implicit knowledge. Therefore, it would seem logical that learners with more previous knowledge can capitalise on that knowledge to advance along the declarative–procedural continuum.

Pedagogical Implications

The recommendations of researchers and methodologists who see a role for form-focused instruction at the end of the task must once again be evaluated against evidence from research such as ours; that is, a one-size-fits-all approach does not seem to be the best one to apply. Based on our study of a structure that has very low saliency, deferring instruction to the end of the task does not seem to provide an advantage for any type of learner. Learners would need the support of initial explicit instruction provided either in the pretask or during the task to be able to benefit from their language aptitude.

Limitations and Future Studies

Limitations of this study primarily concern the testing instruments. As we mentioned previously, even though the LLAMA test offers many advantages and has been used widely in recent aptitude–treatment interaction research, the reliability of certain subtests has not been shown to be strong, so caution should be taken when interpreting the present results. Furthermore, we used two tests deemed to assess the development of explicit and implicit knowledge, the GJT and the EIT respectively, but for the EIT, debate exists as to the exact nature of the knowledge that it taps: proceduralised explicit knowledge versus implicit knowledge (Suzuki, Reference Skehan2015). The same issue has also been raised regarding LLAMA D. Even though Granena (Reference Granena, Granena and Long2013) has suggested that it might tap into implicit processing, Suzuki (Reference Suzuki2021) has doubted that this is truly the case because the learning phase requires conscious processing. Results from that study seem to suggest that LLAMA D might be considered a measure of proceduralisation. Although we have acknowledged the distinction between automatised explicit and implicit knowledge, the goal of this study, being classroom-oriented in nature, was not to address this distinction. Lastly, our research looked at B1–B2 learners. However, because the French subjunctive is a structure that is acquired at a later stage, future research might want to examine language aptitude among more advanced learners (C1–C2).

15 Implicit (Not Explicit) Learning Aptitude Predicts the Acquisition of Difficult (Not Easy) Structure: A Visual-World Eye-Tracking Study

Introduction

Researching individual differences in cognitive aptitude has proved to be a useful approach for understanding the relationship between L2 learning and resultant knowledge. Second language acquisition (SLA) researchers have strived to identify systematic relationships between cognitive aptitude and L2 knowledge to elucidate underlying learning processes (DeKeyser, Reference DeKeyser2012). Of particular interest to us as SLA researchers is that associations between aptitude and grammatical knowledge have been examined from explicit and implicit perspectives (e.g., Bolibaugh & Foster, Reference Bolibaugh and Foster2021; DeKeyser, Reference DeKeyser2000; Granena, Reference Granena2013a; Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015, Reference Suzuki and DeKeyser2017). Explicit learning refers to conscious learning processes, whereas implicit learning refers to the learning process without intention or awareness (Andringa & Rebuschat, Reference Andringa and Curcic2015; Hulstijn, Reference Hulstijn2005).

Recent innovations in two related lines of L2 research – development of tests for cognitive aptitude and for grammar knowledge – have pushed the boundary of our understanding of the interface of explicit and implicit learning and knowledge, which has had a profound impact on L2 theories and practical issues. First, in the last decade, a proposal was made for distinguishing two aptitude components, one for explicit and one for implicit learning. Explicit learning aptitude is defined as cognitive abilities that are important for intentional and rote learning and deliberate hypothesis testing, whereas aptitude for implicit learning refers to cognitive capacity for learning transitional/distributional probabilities of linguistic input without awareness as well as absence of conscious attribution of the resulting knowledge (Granena, Reference Granena2019; Li & DeKeyser, Reference Li and DeKeyser2021; Linck et al., Reference Linck, Hughes and Campbell2013).

Second, the validation research of explicit–implicit knowledge tests has offered a set of research tools to measure explicit and implicit knowledge (e.g., R. Ellis, Reference Ellis2005; Suzuki, Reference Suzuki2017; see Isbell & Rogers, Reference Isbell, Rogers, Winke and Brunfaut2021 for a recent review). Grammatical knowledge is distinguished as explicit or implicit based on whether or not awareness is involved (DeKeyser, Reference DeKeyser, Doughty and Long2003; Williams, Reference Williams, Ritchie and Bhatia2009). Although it is challenging to isolate implicit knowledge from explicit knowledge (access to the latter can also be speeded up), accumulating evidence suggests that using finely tuned reaction time (RT) and eye-tracking measures allows for assessing implicit knowledge, at least for L2 learners with sufficient immersion experience (Suzuki, Reference Suzuki2017; Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015; Suzuki et al., Reference Suzuki, Jeong and Cui2022; Vafaee et al., Reference Vafaee, Suzuki and Kachinske2017).

Based on these two emerging lines of research, the current study aims to achieve two goals pertaining to the interface of explicit and implicit learning and knowledge. First, we investigate the roles of explicit and implicit aptitude on the acquisition of grammatical knowledge measured by a visual-world eye-tracking task. Because grammatical knowledge presumably results from a combination of explicit and implicit learning, this cross-sectional study attempts to elucidate underlying learning processes that may be facilitated by aptitudes for explicit and implicit learning. Second, because the roles of aptitude in learning of different types of structures vary considerably (e.g., Robinson, Reference Robinson1997; Yalçın & Spada, Reference Yalçın and Spada2016), the current study investigates to what extent explicit and implicit learning aptitude would predict two grammatical structures that differ in learning difficulty.

In the remainder of this section, we first provide a focused review on the roles of explicit and implicit learning aptitudes in acquisition of morphosyntax in naturalistic settings – target L2–speaking countries where extensive L2 exposure is putatively sufficient for (certain levels of) implicit learning. Given the current study’s scope, our review concerns late or adult L2 learners with arrival in target L2–speaking countries after age 12 (Abrahamsson & Hyltenstam, Reference Abrahamsson and Hyltenstam2008). The last part of this section discusses the complex interactions between aptitude and types of grammatical structures, both in intervention and cross-sectional studies.

Explicit Learning Aptitudes and Explicit Knowledge

Many SLA researchers have investigated the relationship between explicit learning aptitudes and L2 grammar acquisition in naturalistic acquisition settings (e.g., Abrahamsson & Hyltenstam, Reference Abrahamsson and Hyltenstam2008; DeKeyser, Reference DeKeyser2000; DeKeyser et al., Reference DeKeyser, Alfi-Shabtay and Ravid2010; Granena, Reference Granena, Granena and Long2013b; Granena & Long, Reference Granena and Long2013). In these studies, explicit learning aptitudes have been measured with different tests across studies (e.g., Modern Language Aptitude Test [MLAT], LLAMA tests), but all the tests require the ability to consciously reflect on linguistic aspects of languages. For the assessment of grammatical knowledge, grammaticality judgment tests (GJTs) have been commonly used across the studies with different task parameters (e.g., presence of time pressure).

The three earlier studies (Abrahamsson & Hyltenstam, Reference Abrahamsson and Hyltenstam2008; DeKeyser, Reference DeKeyser2000; DeKeyser et al., Reference DeKeyser, Alfi-Shabtay and Ravid2010) consistently found a weak to moderate positive correlation for post-puberty learners between explicit aptitudes and grammatical knowledge, which was measured by untimed GJTs (.33 < r < .53). GJTs in DeKeyser’s and Abrahamsson and Hyltenstam’s studies were untimed or “off-line,” in which participants were given enough time to allow for accessing explicit knowledge. These findings lend support for the reliance on explicit learning processes for the acquisition of explicit knowledge. A subsequent study by Granena and Long (Reference Granena and Long2013) failed to find any effects of explicit aptitudes on the acquisition of L2 morphosyntactic knowledge, which was measured with a time-pressured GJT. The lack of relationship appears to be due to the differences in how the GJTs were administered. Granena and Long’s study told participants to make a grammatical judgment as quickly as they could, and no pause was inserted between the test items, which made the task more conducive to drawing on automatic processes. Although the role of (explicit) aptitude may be more important for explicit, not implicit, knowledge, which is arguably influenced by conditions under which GJTs were administered (Granena, Reference Granena, Granena and Long2013b), it remains inconclusive whether linguistic knowledge, measured by the time-pressured GJTs, was explicit or implicit (e.g., Godfroid et al., Reference Godfroid, Loewen and Jung2015; Vafaee et al., Reference Vafaee, Suzuki and Kachinske2017). It is thus still an open question to what extent explicit aptitude influences the acquisition of implicit knowledge. In order to scrutinize the role of explicit aptitude, more fine-grained measures that tap into implicit knowledge should be employed, which will be discussed next.

Implicit Learning Aptitude and Implicit Knowledge

There is a growing interest in implicit learning aptitude in SLA (Granena, Reference Granena2019; Li & DeKeyser, Reference Li and DeKeyser2021; Linck et al., Reference Linck, Hughes and Campbell2013). Despite the incipient nature of our understanding of this new construct of aptitude, researchers have started to find some systematic relationship between implicit learning aptitude and L2 grammar acquisition in naturalistic settings. Most relevant to the present study are two studies (Granena, Reference Granena2013a; Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015) that used the same set of implicit aptitude and language tests. In these studies, implicit learning aptitude was measured with a serial reaction time (SRT) task (Kaufman et al., Reference Kaufman, DeYoung and Gray2010), which measures the ability of probabilistic sequence learning without awareness. Grammatical knowledge was measured by an RT task called the word-monitoring task. The word-monitoring task is claimed to be a measure of implicit knowledge because it can capture real-time grammar processing while learners’ attention is directed to meaning (Godfroid, Reference Godfroid2016; Suzuki, Reference Suzuki2017; Suzuki et al., Reference Suzuki, Jeong and Cui2022; Vafaee et al., Reference Vafaee, Suzuki and Kachinske2017). As illustrated in Figure 15.1, participants listen to a sentence that includes a monitoring word, to which participants need to respond by pressing a button as soon as they hear it. When they can detect the grammatical error (i.e., a past-tense error “purchase” in Sentence (b)), which occurs immediately before the monitoring word, their RT to the monitored word is expected to slow down relative to the grammatical sentence (a). The RT difference between the ungrammatical and grammatical items thus indexes the online sensitivity to grammatical errors during sentence comprehension. 
Importantly, participants are told to answer a yes/no comprehension question following the stimulus sentence presentation in order to focus participants’ attention on meaning, that is, to minimize the conscious attention directed to search for grammatical errors.
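
The sensitivity index described above can be illustrated with a short sketch. The reaction times are invented for illustration, and the names `sensitivity_index`, `grammatical`, and `rt_ms` are hypothetical. The cited studies' actual scoring procedures may involve further steps (for example, excluding trials), which are not specified here and are omitted.

```python
from statistics import mean


def sensitivity_index(trials):
    """Mean RT on ungrammatical items minus mean RT on grammatical items (ms)."""
    grammatical = [t["rt_ms"] for t in trials if t["grammatical"]]
    ungrammatical = [t["rt_ms"] for t in trials if not t["grammatical"]]
    return mean(ungrammatical) - mean(grammatical)


# Hypothetical reaction times to the monitored word, in milliseconds.
trials = [
    {"grammatical": True, "rt_ms": 412},
    {"grammatical": True, "rt_ms": 398},
    {"grammatical": False, "rt_ms": 455},
    {"grammatical": False, "rt_ms": 471},
]

index = sensitivity_index(trials)
print(index)  # 58: a positive value means slower responses after an error
```

A larger positive difference would be read as greater online sensitivity to the grammatical error.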

Figure 15.1 Word-monitoring task used in Granena (Reference Granena2013a) and Suzuki and DeKeyser (Reference Suzuki and DeKeyser2015)

In the two aforementioned studies, significant positive correlations were found between the SRT score (implicit aptitude) and the word-monitoring task performance (implicit knowledge) among L1 Chinese speakers who had lived in Japan for at least two and a half years (Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015) and L1 Chinese speakers who had lived in Spain for a minimum of five years (Granena, Reference Granena2013a). These findings provide initial converging evidence regarding the role of implicit learning aptitude in the L2 grammar acquisition of adult learners (cf. Suzuki & DeKeyser, Reference Suzuki and DeKeyser2017, who found more limited effects of implicit aptitude when a link between explicit knowledge and aptitude had been established in the statistical model). However, these previous studies used RT data from both aptitude and grammar tests. The positive correlation found between the aptitude and grammar test scores could at least in part be an artifact of the shared metric (i.e., RT) between the two tasks.

In order to overcome this potential limitation, a visual-world eye-tracking task is employed in the current study to assess L2 grammatical knowledge. In this task, participants’ eye movement toward a display with several possible referents during aural comprehension is analyzed to examine real-time grammar processing. The sensitivity to grammatical manipulations (index of grammatical knowledge) is captured using eye-movement data, and an underlying association between knowledge and aptitude can be explored more rigorously than the relation between the RT measures. More importantly, the visual-world task is very unlikely to raise awareness of target structures. As a case in point, the post-task debriefing results have shown that participants remained unaware of the real purpose or target linguistic structure of a visual-world task (Andringa & Curcic, Reference Andringa and Curcic2015; Dussias et al., Reference Dussias, Valdés Kroff, Guzzardo Tamargo and Gerfen2013). The finely tuned eye-tracking technique is arguably a promising test of L2 implicit knowledge because it is a more direct measure of real-time grammar processing (also characterized as fast and ballistic) than RT tasks (Suzuki, Reference Suzuki2017). Despite its task constraints, a variety of grammatical structures, such as gender agreement, case-markers, tense-aspect, and pronouns have been tested in this visual-world paradigm (Godfroid, Reference Godfroid2019). In the current study, a visual-world task was designed to test two grammatical properties of the English nominal phrase – definiteness and mass–count distinctions.

Explicit and Implicit Learning Aptitudes and Grammatical Difficulty

To what extent a certain grammatical structure is amenable to explicit and implicit learning is presumably influenced by characteristics of that structure and the specific problems they pose for a given learner. According to Krashen (Reference Krashen1982), explicit learning is conducive to easy structures, whereas implicit learning favors difficult structures. The definition of easy and difficult structures – grammatical difficulty – is elusive and most likely multifaceted (DeKeyser, Reference DeKeyser2005). Grammatical difficulty should be conjointly determined by taking into account not only linguistic factors (e.g., abstractness of form–meaning mapping), but also contextual (e.g., predisposition of intentional or incidental learning) and learner-related (e.g., aptitude) factors (DeKeyser, Reference DeKeyser2016; Housen & Simoens, Reference Housen and Simoens2016).

Findings from instructed SLA research indicate that the role of explicit learning aptitudes is moderated by grammatical difficulty of target structures (e.g., Robinson, Reference Robinson1997; Yalçın & Spada, Reference Yalçın and Spada2016). Robinson (Reference Robinson1997) examined the role of aptitudes on the acquisition of L2 English syntactic rules under different types of training conditions (implicit, incidental, rule-search, and inductive). He trained intermediate ESL learners in the university language programs on an easy rule (a subject–verb inversion, e.g., “Into the house John ran”) and a difficult English syntactic rule (pseudo-clefts of location, e.g., “Where Mary and John live is in Chicago not in New York”). The difficulty of the structures was determined based on expert judgments by ESL teachers. The explicit learning aptitude score (MLAT Part 4 [Words in Sentences]) was related to the learning of easy structures, not to that of difficult structures, measured by the GJT, in the rule-search (explicit–inductive) condition. In contrast, recent EFL classroom research by Yalçın and Spada (Reference Yalçın and Spada2016) showed the opposite pattern of findings. Turkish L2 English learners received explicit, form-focused instruction on an easy (past progressive) and a difficult (passive) structure. The difficulty of the structures was determined according to various linguistic criteria, such as transparency of form–meaning mapping, frequency, saliency, and perceived difficulty by learners. Explicit learning aptitude (the LLAMA F test) turned out to be a significant predictor of the acquisition of the difficult structure, not for the easy structure, as assessed by the GJT.

The findings of Robinson (Reference Robinson1997) and Yalçın and Spada (Reference Yalçın and Spada2016) may not be as contradictory as they seem because the effect of aptitudes could have become significant only for the linguistic structures that imposed a moderate level of difficulty for their targeted participants (DeKeyser, Reference DeKeyser2016). In Robinson (Reference Robinson1997), the pseudo-clefts of location were too difficult for the intermediate ESL learners, whereas in Yalçın and Spada (Reference Yalçın and Spada2016), the past progressive structure was too easy for the Turkish learners (hence the lack of significant aptitude effects for either case). In contrast, higher aptitude could have facilitated the learning of the subject–verb inversion rule (easier than the “too difficult” structure) and the passive structure (harder than the “too easy” structure), respectively.

While these short-term intervention studies examined the effects of explicit learning aptitude in instructed settings, the role of implicit learning aptitude for different types of grammatical structures was examined in the aforementioned naturalistic acquisition study by Granena (Reference Granena2013a). The word-monitoring task measured the real-time sensitivity to six grammatical structures categorized into two types: the agreement structures (noun–adjective gender agreement, subject–verb agreement, and noun–adjective number agreement) and the non-agreement structures (subjunctive mood, perfective/imperfective aspect, and passives with ser/estar). These two categories were based on a developmental stage in L1 acquisition. The first type of structures is mastered by age three in the L1 acquisition of Spanish, whereas the latter is not fully acquired until age seven or later in L1 Spanish. The results showed that implicit learning aptitude, measured by the SRT task, significantly predicted the acquisition of the agreement structures only, not that of the non-agreement structures.

According to Granena’s interpretation, the implicit learning aptitude might have compensated for the lack of inflectional morphology in L1 Chinese in order to acquire the Spanish agreement structures involving a rich inflectional paradigm. Consistent with prior research on explicit learning aptitude in instructed settings (Robinson, Reference Robinson1997; Yalçın & Spada, 2016), the effects of implicit learning aptitude should also be examined by taking the linguistic difficulty for a given group of L2 learners into account. In order to understand explicit and implicit learning processes, it is of high importance to study the complex interaction between explicit–implicit aptitudes and linguistic difficulty.

The Current Study

In order to explore the relationship between explicit–implicit learning and grammatical knowledge, the current study aimed at clarifying the contribution of explicit and implicit learning aptitude to the acquisition of L2 implicit knowledge by adult L2 English speakers with L1 Chinese who lived in the United States. The visual-world task was employed to assess L2 real-time grammatical processing as an indicator of implicit knowledge of definiteness and the mass–count distinction. These two distinctions within the English noun phrase were chosen because they are interesting cases in which to examine whether the contribution of aptitudes is moderated by grammatical difficulty. Definiteness is marked by the articles “the” and “a,” and is pragmatically guided by discourse information. Definite descriptions like “the can” are used when referents are uniquely identifiable, while indefinite descriptions like “a can” are used when multiple referents are possible (Lyons, Reference Lyons1999). English uses a numeral for a count noun like “two candles” and a quantifier for a mass noun like “two pieces of bacon” to distinguish the quantification of referents with rigid boundaries from that of referents without them. It is ungrammatical to use a numeral directly with a mass noun (e.g., “two bacon”).

From a psycholinguistic perspective, definiteness is essentially much more challenging for acquisition than the mass–count distinction for the current L2 learners. First, the two systems are different in terms of the degree of transfer effects from L1. An analogous distinction of definiteness does not exist in L1 Mandarin Chinese, whereas the mass–count distinction bears resemblance to the Chinese classifier systems (Cheng & Sybesma, Reference Cheng and Sybesma1998, Reference Cheng and Sybesma1999). Based on similar structures in L1, L2 learners can selectively attend to relevant features within the L2 input (N. C. Ellis et al., Reference Ellis, Hafeez and Martin2012), which may facilitate the acquisition of the mass–count distinction. Second, the two systems differ in complexity of meaning. Definiteness is a more abstract concept than the mass–count distinction; learners have to learn how to map the definite and indefinite articles to variable discourse semantics (e.g., identifiability of referents in context). On the other hand, the mass vs. count status is based on rigid boundaries of referents, although the distinction can sometimes be arbitrary for L2 learners. These factors are related to each other and all contribute to the different level of difficulty in their acquisition (Graus & Coppen, Reference Graus and Coppen2015; Housen, Reference Housen and Chapelle2014; Yalçın & Spada, Reference Yalçın and Spada2016).

Empirical evidence supports the higher difficulty of definiteness over mass–count. Hua and Lee (Reference Hua, Lee, Dekydtspotter, Sprouse and Liljestrand2005) showed that L2 English learners with L1 Chinese (i.e., third-year Chinese college students) often correctly rejected descriptions like “ten beef” and “three rice.” Their GJT performance was indistinguishable from that of native English speakers. On the other hand, definiteness is notoriously difficult, especially for L2 learners whose L1 does not make the distinction (Ionin et al., Reference Ionin, Zubizarreta and Philippov2009; Robertson, Reference Robertson2000; Snape, Reference Snape2008; Trenkic, Reference Trenkic2008). Although some types of article errors seem to decrease in frequency as proficiency increases (Lu, Reference Lu2001; Trenkic, Reference Trenkic2002; Young, Reference Young, Bayley and Preston1996), non-target-like use appears to persist for an extended period of time under some circumstances (Lardiere, Reference Lardiere2007; White, Reference White2003). Critically, the current sample of L2 English learners with L1 Chinese residing in the United States was able to make the mass–count distinction but had substantial difficulty in distinguishing definiteness (see the Preliminary Analysis of Visual-World Task section).

Explicit and implicit learning aptitudes were used as predictors for the acquisition of these two grammatical structures. Explicit learning aptitude, operationalized as language-analytic ability, was measured with two aptitude tests that are characteristically involved in explicit learning: the “Words in Sentences” of MLAT, or MLAT_4 (Carroll & Sapon, Reference Carroll and Sapon1959) for assessing grammatical sensitivity, and the LLAMA F (Meara, Reference Meara2005) for assessing inductive language learning ability. Implicit learning aptitude was measured by the probabilistic SRT task (Kaufman et al., Reference Kaufman, DeYoung and Gray2010). The current study addresses the following research questions (RQs):

1. To what extent do explicit and implicit learning aptitude contribute to the attainment of real-time grammar processing?
2. Do the contributions of explicit and implicit aptitude vary by the L2 structures (definiteness and mass–count distinction)?

With regard to RQ1, we hypothesize that implicit, not explicit, learning aptitude will be a significant predictor for real-time grammar processing ability, measured by the visual-world task. The second RQ further explores whether the effects of aptitudes change depending on linguistic structures. We hypothesize that implicit learning aptitude may play a facilitative role in the acquisition of definiteness because of the complexity of mastering the article system. Due to the lack of a similar L1 system of definiteness for Chinese speakers in the current study, learners may need to rely on implicit learning mechanisms to successfully acquire the definiteness distinction from scratch, leading to a systematic relationship between aptitude for implicit learning and acquisition of this distinction. For the mass–count distinction, we predict that there will be no systematic contributions of implicit learning aptitude because the learning difficulty is less burdensome. With regard to the role of explicit learning aptitude, we leave the question open for the context of this study – naturalistic L2 acquisition – given the scarce research findings even in instructed settings (Robinson, Reference Robinson1997; Yalçın & Spada, Reference Yalçın and Spada2016).

Methods

Participants

Sixty-five English L2 learners with L1 Chinese participated in the current study. Most of them were university students (15 undergraduate, 26 master’s, and 21 doctoral students), and three participants were employees (two participants with MA degrees and one with a PhD degree). Since all participants had received or were receiving college education entirely in English, their proficiency was deemed advanced. They had all arrived in an English-speaking country after the age of 15; the mean age of arrival was 21.29 ($SD = 3.41$, range 15–28 years). They had received EFL/ESL instruction for an average of 11.66 years ($SD = 3.43$, range 5–20 years). The mean length of residence in English-speaking countries was 46.22 months ($SD = 29.67$, range 1–121 months). Twenty-eight native speakers (NSs) were also recruited to ensure that the visual-world task would work as expected.

Instruments

Visual-World Task: Definiteness. Figure 15.2 (Panel A) illustrates that displays for the definiteness trials consisted of two possible locations, a distractor location, and a theme (cf., Chambers et al., Reference Chambers, Tanenhaus, Eberhard, Filip and Carlson2002; Trenkic et al., Reference Trenkic, Mirkovic and Altmann2014). Possible locations involved two same-category locations (e.g., big and small cans), while the distractor location involved a different kind of location (e.g., a bowl). Sentences involved pairs of instructions: “Pick up the pig. Now put it inside ____.” Two critical trial types were created by manipulating the definiteness of the article as a within-subject factor: definite (e.g., “the can”) and indefinite trials (e.g., “a can”). The indefinite description (inside a can) matched the display where multiple goal locations (two cans) were available, whereas the definite description (inside the can) was pragmatically inappropriate because the display does not include a single uniquely identifiable goal. If participants were sensitive to the definiteness distinction, the reference resolution would be facilitated (resulting in faster convergence of eye movement to a goal location) when hearing an indefinite description compared to a definite description. Sixteen additional filler trials were also included so that participants could not predict the goal location prior to the instruction (see Appendix A in Supplementary Materials for a list of stimuli).

Figure 15.2 Sample displays of the visual-world task for definiteness condition (panel A) and mass–count condition (panel B)

Visual-World Task: Mass–count. Figure 15.2 (Panel B) illustrates that displays for the mass–count trials consisted of two possible locations and two themes. Themes involved pairs of printed pictures of count nouns (e.g., belt and cell phone), while the two possible locations involved pairs of three-dimensional miniatures of count nouns (e.g., candles) and mass nouns (e.g., bacon). Sentences involved pairs of instructions: “Pick up the belt. Now put it on top of ______.” Two critical trial types were created by manipulating the referring expressions as a within-subject factor: quantifier (e.g., “some bacon”) and numeral trials (e.g., “the two bacon”). Since the numeral description was ungrammatical (e.g., “two bacon”), the numeral “two” was expected to mislead participants to look for the count noun (e.g., “two candles”). On the other hand, the quantifier “some” can follow either a count or a mass noun. The referent resolution (choosing the mass noun) was thus expected to be facilitated in the quantifier trials rather than in the numeral trials. If participants were sensitive to the mass–count distinction, eye movements should converge to the mass noun faster after hearing a quantified expression than a numerical expression. Eight additional filler trials asked participants to put the theme on top of the count (competitor) noun (e.g., “Now put it on top of some/the two candles”) so that participants could not predict the goal location prior to the instruction (see Appendix B in Supplementary Materials for a list of stimuli). While the modifier again varied across these filler trials, it was always grammatically correct. The count and mass nouns were chosen for this study based on a norming study (see Appendix C in Supplementary Materials).

Visual-World Task: Procedure. Participants were seated in front of a podium consisting of four shelves. An object was placed on each quadrant, and a camera was placed at the center of the display to record participants’ eye movements on these objects. A second camera was located behind the participants and recorded their actions following the sentences. Each trial began with the experimenter taking four objects out of a bag and labeling them using a bare singular form (e.g., “bacon,” “candle,” “cell phone,” “belt”). Each object was placed on a dedicated quadrant on the podium. The experimenter then played a pair of audio-recorded sentences that instructed participants to pick up an object from one quadrant and put it in or on the container/object in another quadrant. All sentences were pre-recorded by a female native speaker of English.

A total of 40 trials were presented, consisting of 16 critical trials (four definite trials, four indefinite trials, four quantifier trials, and four numeral trials) and 24 filler trials. All trials were presented in semi-randomized order such that the same trial type never occurred more than twice in a row. Multiple versions of items in the critical trials were distributed across four counterbalanced lists, constructed to control for the number of items in each condition within subjects and vary the condition for each item between subjects. Within each counterbalancing list, the placement of each object type was rotated through the four quadrants on the podium, such that goal locations/objects could not be predicted based on their location.
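The ordering constraint described above (no trial type more than twice in a row) can be sketched in Python. This is an illustration only, not the authors' procedure: the function names, the treatment of all 24 fillers as a single trial type, and the count-weighted random choice are assumptions made here. Because fillers make up 24 of the 40 trials, simply reshuffling until the constraint holds would almost never succeed, so the sketch builds the order one trial at a time and only picks a type if the remaining trials can still be arranged.

```python
import random
from collections import Counter

def can_finish(counts, last, run, k):
    """Counting check: can the remaining trials still be ordered with runs <= k?"""
    total = sum(counts.values())
    for trial_type, n in counts.items():
        others = total - n
        # The type just presented may only extend its current run by k - run.
        limit = (k - run) + k * others if trial_type == last else k * (others + 1)
        if n > limit:
            return False
    return True

def semi_randomize(type_counts, k=2, seed=None):
    """Build a random order in which no type occurs more than k times in a row."""
    rng = random.Random(seed)
    counts = Counter(type_counts)
    order, last, run = [], None, 0
    while sum(counts.values()) > 0:
        options = []
        for trial_type in list(counts):
            if counts[trial_type] == 0 or (trial_type == last and run == k):
                continue
            counts[trial_type] -= 1  # tentatively place this type
            new_run = run + 1 if trial_type == last else 1
            if can_finish(counts, trial_type, new_run, k):
                options.append(trial_type)
            counts[trial_type] += 1  # undo the tentative placement
        if not options:
            raise ValueError("no ordering satisfies the run-length limit")
        pick = rng.choices(options, weights=[counts[t] for t in options])[0]
        run = run + 1 if pick == last else 1
        last = pick
        counts[pick] -= 1
        order.append(pick)
    return order

def longest_run(order):
    """Length of the longest stretch of identical consecutive items."""
    best = run = 0
    prev = None
    for item in order:
        run = run + 1 if item == prev else 1
        best = max(best, run)
        prev = item
    return best

# Trial counts taken from the text; the single "filler" label is an assumption.
design = {"definite": 4, "indefinite": 4, "quantifier": 4, "numeral": 4, "filler": 24}
trial_order = semi_randomize(design, k=2, seed=1)
```

The resulting orders satisfy the constraint but are not sampled uniformly from all valid orders; counterbalancing across lists and quadrant rotation would be handled separately.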

Visual-World Task: Debriefing. The experimenter asked the participants two questions immediately after the visual-world task was completed. The first question asked about their general impression of performing the task (“How was the task?”). The second question specifically aimed at the participants’ awareness of the target grammatical structures embedded in the task (“Did you notice anything weird in the sentences you heard?”). When a participant reported that they noticed infelicitous/ungrammatical features in the sentences, we interpreted this as indicating they became aware of the target structures either during or after the visual-world task. This debriefing procedure was only conducted for L2 learners.

SRT Task. The SRT task was administered to measure the domain-general ability of learning sequences without awareness (i.e., implicit learning aptitude). In the current study, the probabilistic SRT task was adopted from Kaufman et al. (Reference Kaufman, DeYoung and Gray2010). In the SRT task, participants saw a dot appearing at one of four locations on the computer screen and responded to it as quickly and accurately as possible by pressing the corresponding key. Their RTs were recorded for each response. Unbeknownst to the participants, the sequence of stimuli was generated by a probabilistic rule: 85% of the sequences followed the rule (probable, training condition), whereas the other 15% of the sequences were generated by another rule (improbable, the control condition). This probabilistic nature of the SRT task made it difficult to learn the sequence explicitly. There were eight blocks, and each block consisted of 120 trials, giving 960 trials in total. The scoring method devised in Kaufman et al. (Reference Kaufman, DeYoung and Gray2010) was used in the current study, resulting in the score ranging from 0 to a maximum of 6 (see Appendix D in Supplementary Materials for details). The internal consistency, indexed by Cronbach’s alpha, was .46. Although this reliability index is considered very low for more traditional cognitive tests, such as a working memory task, it was deemed acceptable for the implicit learning task in both psychology and SLA (Granena, Reference Granena2013a; Kaufman et al., Reference Kaufman, DeYoung and Gray2010; Reber et al., Reference Reber, Walkenfeld and Hernstadt1991; Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015).

In order to provide some evidence that the knowledge developed and assessed in the task was implicit, a surprise recognition test was also administered immediately after the SRT task was completed. This test assessed whether participants became aware of the sequence patterns in the SRT task, i.e., whether they developed explicit knowledge about the sequence (see Suzuki, Reference Suzuki2015 for details). The results from the recognition task confirmed that the knowledge the participants developed through the SRT task was implicit (see Appendix E in Supplementary Materials).

LLAMA F. LLAMA F (Meara, Reference Meara2005) was administered to measure inductive language learning ability. Inductive language learning ability refers to the capacity to induce rules governing a given foreign language with conscious effort (Carroll, Reference Carroll, Parry and Stansfield1991). This test consisted of a learning phase and a test phase. In the learning phase, participants were given five minutes to learn a new language by seeing sentences matched with pictures. In the testing phase, the program displayed a picture and two sentences, one grammatical and the other ungrammatical, and the learners’ task was to choose the grammatical sentence. In order to increase reliability, ten test items were added to the original 20 test items, so the total number of items was 30 (see Suzuki & DeKeyser, Reference Suzuki and DeKeyser2017 for details). To reduce the skewness, the LLAMA F score was square-root-transformed for statistical analyses. The internal consistency, indexed by Cronbach’s alpha, was .66.

Chinese MLAT Part 4. The Chinese version of MLAT_4, “Words in Sentences,” was administered to measure grammatical sensitivity. Grammatical sensitivity is the awareness of the syntactic patterns and grammatical functions of sentences in the test-takers’ L1. This Chinese-version MLAT was validated and found to be a significant predictor for English achievement test scores (Xia, Reference Xia2011). The participants were asked to identify the parts of speech in the sentences in L1. They were first presented with the key sentence in which one word is underlined and bolded, and their task was to select the word in the second sentence that had the same grammatical function as the underlined word in the key sentence. The total number of items was 20. To reduce the skewness, the MLAT_4 score was log-transformed for statistical analyses. The internal consistency, indexed by Cronbach’s alpha, was .59.

Procedure

All the L2 learners participated in two individual sessions in a quiet laboratory. They took the visual-world task in the first session. In the second session, which took place on a separate day (usually after one week), they were administered the following tasks in fixed order: SRT task, LLAMA F, and MLAT_4.

Data Coding of Visual-World Task

In the visual-world task, participants’ actions for the critical trials were coded by the first author and five trained research assistants. Action accuracy (whether participants put correct themes in correct locations) was near ceiling across all conditions and groups (>99%). Eye-movement data from the incorrect trials in which participants misidentified themes or locations were excluded, which only accounted for 0.4% (NS) and 0.6% (L2) of all trials in the definiteness condition and 0% (NS) and 0.2% (L2) of those in the mass–count condition.

The same individuals also coded participants’ eye movements using the frame-by-frame annotation software Vcode (Hagedorn et al., Reference Hagedorn, Hailpern and Karahalios2008). All coders were blind to the condition of the trial. For every frame, eye movements were coded as fixations on one of the quadrants (upper-left, upper-right, lower-left, lower-right), center of the display, or missing due to blinks, looks outside of the podium, or track loss. Missing frames accounted for 2.5% of the data in the NS group and 9.4% in the L2 learners. For all remaining frames, fixation locations were then recoded based on the displayed object. We were primarily interested in looks to two possible locations, coded as Targets and Competitors. In the critical definiteness trials where a little theme was included, a Target was defined as a location that matched the size of the theme (e.g., small can), while a Competitor was defined as a non-size-matching location (e.g., big can). This is because when coders scored which target location participants selected, the results revealed a strong tendency for participants to match the size of the theme with the size of the location (e.g., small pig placed in the small can). Matching responses exceeded chance for both NS $(M = 77.2\%, SD = 19.6\%, t(27) = 1331.79, p < .001)$ and L2 learners $(M = 68.6\%, SD = 20.6\%, t(64) = 1933.86, p < .001)$.¹ In the mass–count trials, Targets were defined as mentioned locations (i.e., mass nouns) while Competitors were defined as non-mentioned locations (i.e., count nouns) because mass nouns were the goal locations in both quantifier and numeral trials.

Preliminary Analysis of Visual-World Task

Since we did not have specific hypotheses about when the effects of linguistic processing would emerge in eye movements, we focused on eye movements over an extended period, beginning from 200 ms before the onset of the determiner until 1,000 ms later. For each linguistic cue (i.e., “a/the” and “two/some”), the window was shifted 200 ms after the linguistic cue in the speech stream to account for the time it takes to generate a saccadic eye movement (Matin et al., Reference Matin, Shao and Boff1993). Within each window, our dependent measure was the preference for the Target over the Competitor. This was calculated as the number of samples (for a given trial and window) in which participants looked at the Target minus the number of samples in which they looked at the Competitor. If this number was positive, Target preference was 1. If it was negative, then Target preference was 0. If participants looked at neither object, or at both objects equally, this sample was excluded from the analysis.
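The scoring rule for Target preference can be written out as a short Python sketch. The frame labels (`"target"`, `"competitor"`, `"other"`) and function names are hypothetical; the source does not specify how the coded frames were stored.

```python
def target_preference(frames):
    """Score one trial-by-window cell from its coded frames.

    frames: list of labels, one per video frame in the window.
    Returns 1 if the Target was fixated on more frames than the Competitor,
    0 if the Competitor was fixated on more frames, and None when the cell
    is excluded (neither object fixated, or both fixated equally often).
    """
    difference = frames.count("target") - frames.count("competitor")
    if difference > 0:
        return 1
    if difference < 0:
        return 0
    return None

def mean_target_preference(cells):
    """Average the binary scores over cells, skipping excluded ones."""
    scores = [s for s in (target_preference(c) for c in cells) if s is not None]
    return sum(scores) / len(scores) if scores else None

# Hypothetical window data for three trials in one condition.
example_cells = [
    ["target", "target", "competitor"],      # more Target frames: scored 1
    ["competitor", "other", "competitor"],   # more Competitor frames: scored 0
    ["other", "other", "other"],             # neither object: excluded
]
condition_mean = mean_target_preference(example_cells)
```

Dichotomizing each cell in this way means that a single extra frame on the Target counts the same as a sustained look, so the measure reflects which object dominated the window rather than how strongly.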

Definiteness. Figure 15.3 illustrates that, as expected, NS participants were quicker to look at the Target following indefinite descriptions compared to definite descriptions (Panel A). This led to a Target advantage that began approximately 100 ms after determiner onset and continued until the 600 ms window. In contrast, L2 learners showed the opposite pattern. Target preference was unexpectedly greater in the definite trial compared to the indefinite trial (Panel B). This pattern, opposite from what was seen in the NS group, began approximately 200 ms before determiner onset and continued until the 400 ms window, possibly due to the co-articulation in the target phrase, “inside a/the.”

Figure 15.3 Time-course of target preference for the definiteness condition

Note. The figures illustrate the time windows that are shifted 200 ms after the linguistic cues. The mean length of the modifier and noun was 281 ms and 642 ms for quantifier trials and 435 ms and 679 ms for numeral trials.

As Target preference successfully captured the sensitivity of definiteness by the native English speakers, we used Target preference to compute a “sensitivity index” for each L2 learner by subtracting Target preference in the definite trials from that in the indefinite trials. A greater index indicates higher sensitivity to the distinction. The index was calculated based on the critical time region in which NSs consistently showed the sensitivity (i.e., 100–600 ms). We used NSs’ critical region as a reference for computing the L2 sensitivity index because the study aimed to assess to what extent L2 learners’ linguistic processing ability is qualitatively similar to NSs’ ability (implicit knowledge).
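A sensitivity index of this kind could be computed along the following lines. The sketch assumes that each learner's mean Target preference is available per time window and that the index averages the condition difference across the windows inside the critical region; the data layout, the averaging step, and all names are assumptions rather than details given in the source.

```python
def sensitivity_index(facilitated, baseline, region=(100, 600)):
    """Mean difference in Target preference across the critical region.

    facilitated, baseline: dicts mapping window onset (ms) to one learner's
    mean Target preference in that window (hypothetical layout). For
    definiteness, facilitated = indefinite trials and baseline = definite trials.
    """
    start, end = region
    windows = [w for w in facilitated if start <= w <= end and w in baseline]
    if not windows:
        raise ValueError("no windows fall inside the critical region")
    differences = [facilitated[w] - baseline[w] for w in windows]
    return sum(differences) / len(differences)

# Hypothetical learner: windows at 0 and 700 ms lie outside the region.
indefinite = {0: 0.5, 100: 0.75, 200: 1.0, 700: 0.0}
definite = {0: 0.5, 100: 0.25, 200: 0.5, 700: 1.0}
index = sensitivity_index(indefinite, definite)
```

A positive value means the learner preferred the Target more in the facilitated condition, as the NS group did; a negative value means the preference ran in the opposite direction.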

Mass–count. Figure 15.4 illustrates that, as expected, the NS group was quicker to look at the Target following the quantifier compared to the numeral (Panel A). This led to a Target advantage that began immediately after quantifier onset and continued until the 600 ms window. Similar patterns were also observed for the L2 learners. The same Target advantage for quantified expressions was observed from the −200 ms window to the 800 ms window (Panel B).

Figure 15.4 Time-course of target preference for the mass–count condition

Note. The figures illustrate the time windows that are shifted 200 ms after the linguistic cues. The mean length of the modifier and noun was 281 ms and 642 ms for quantifier trials and 435 ms and 679 ms for numeral trials.

Similar to the definiteness condition, we also computed a “sensitivity index” for each L2 learner by subtracting the Target preference in the numeral trials from that in the quantifier trials. The index was calculated across a critical region extending from 100 ms after quantifier onset to the 600 ms window so that it could be compared fairly with the sensitivity to definiteness.²

Results

Debriefing of Visual-World Task

In answer to the first debriefing question (“How was the task?”), most participants reported “interesting,” “fun,” “easy,” and “simple.” No participants mentioned anything about ungrammatical/infelicitous aspects of stimulus sentences. In response to the more specific second question (“Did you notice anything weird in the sentences you heard?”), three participants reported that they found the determiner of the noun was wrong (e.g., “the sentence said the cup, but there were two cups”), and eight participants reported that they noticed the ungrammaticality of “numeral + mass noun” (e.g., “‘two meat’ should be ‘two pieces of meat’”). These “aware” participants were excluded from further analysis in order to ensure that the visual-world task never triggered noticing of the target structures. Additionally, we excluded two participants due to experimenter errors in procedure. Note that excluding these participants did not change the pattern of the current findings. Results including all 65 participants before excluding the aforementioned participants are presented in Appendix F in Supplementary Materials.

Descriptive Statistics for Dependent and Independent Variables

Descriptive statistics for the eye-tracking measures and the aptitude tests are presented in Table 15.1. The eye-tracking measure for both structures was computed by summing up the z scores of “sensitivity index” (see Preliminary Analysis of Visual-World Task) for definiteness and mass–count structures in order to equally weight the indices for the two structures (see Granena, Reference Granena and Long2013 and Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015, for a similar approach). This composite score indicates online sensitivity for the two systems together. Although we predicted that the two structures would involve different learning problems, we were also interested in whether a more broadly defined ability for real-time grammar processing can index implicit knowledge. As L2 learners were sensitive to the mass–count distinction, the mean eye-tracking score for mass–count is positive (i.e., a higher target preference in the quantifier trials than that in the numeral trials). The score for definiteness is negative, as the L2 participants showed sensitivity in the opposite direction to that of the NS participants (i.e., a higher target preference in the definite trials than that in the indefinite trials). Despite the difference in the means, distributions were almost identical between definiteness and mass–count conditions ($SD = 0.29$ and $0.30$, respectively). According to the Kolmogorov-Smirnov test, the sensitivity indices for definiteness and mass–count conditions were normally distributed ($ps > .1$), but the sensitivity index for the two structures combined was not ($p = .01$). Square-root transformation was applied to reduce the skewness for that sensitivity index.
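The composite score amounts to standardizing each sensitivity index and adding the two z scores for each learner. A minimal sketch follows, assuming the sample standard deviation is used and that only learners with both indices enter the composite; the data values and function names are invented for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample mean and sample SD (an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite_scores(definiteness_index, mass_count_index):
    """Sum the two z scores per learner so both structures carry equal weight."""
    if len(definiteness_index) != len(mass_count_index):
        raise ValueError("both indices are needed for every learner")
    z_def = z_scores(definiteness_index)
    z_mc = z_scores(mass_count_index)
    return [d + m for d, m in zip(z_def, z_mc)]

# Hypothetical sensitivity indices for five learners.
definiteness_index = [-0.40, -0.10, 0.00, 0.10, 0.30]
mass_count_index = [0.05, 0.30, 0.10, 0.45, 0.20]
composite = composite_scores(definiteness_index, mass_count_index)
```

Because each set of z scores has a mean of zero and unit variance, neither structure dominates the sum even if their raw indices differ in scale.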

Table 15.1 Descriptive statistics for language and aptitude measures for L2 learners

		N	M	SD	Min	Max	Possible Max
Dependent V.	Def. + Count/Mass	55	0.14	1.46	−4.20	3.34	−
	Definiteness	60	−0.06	0.30	−0.73	0.73	1
	Count/Mass	55	0.20	0.31	−0.47	0.82	1
Independent V.	SRT	65	2.14	1.46	0	6	6
	MLAT_4	65	14.77	2.66	7	19	20
	LLAMA F	65	24.88	3.24	17	30	30

Relationship of Eye-Movement Data with Explicit and Implicit Learning Aptitudes

In order to examine to what extent explicit and implicit aptitudes predict real-time grammar processing, correlations and multiple regression analyses were conducted. The dependent variables were eye-tracking scores for definiteness and the mass–count distinction separately and their combined score (square-root transformed). The independent variables were SRT scores, LLAMA F scores (square-root transformed), and MLAT_4 scores (log transformed).

Table 15.2 presents correlations between the three eye-tracking scores and the three aptitude scores. Significant correlations with the eye-tracking scores for both structures (definiteness + mass–count) and definiteness only ($r = .30$ and $.35$, $p < .05$) were found only for the SRT scores. The eye-tracking score for the mass–count distinction was not significantly correlated with any of the aptitude test scores; the largest magnitude was .19 with MLAT_4.

Table 15.2 Correlation coefficients (p values) between eye-tracking scores and aptitude scores

	SRT	LLAMA F	MLAT_4
Combined	.30*	.10	.10
	(.03)	(.48)	(.48)
Definiteness	.35**	.03	−.04
	(.01)	(.85)	(.79)
Mass–count	.07	.04	.19
	(.63)	(.77)	(.16)

* $p < .05$ , ** $p < .01$ .

Note. The SRT score was not correlated with either the LLAMA F $(r = -.12, p = .34)$ or the MLAT_4 scores $(r = .01, p = .92)$. The scores from the two explicit learning aptitude tests were significantly correlated $(r = .27, p = .03)$, suggesting that grammatical sensitivity and inductive language learning ability are related constructs.

Three multiple regression analyses were conducted separately on the composite score of definiteness and mass–count (RQ1) and the separate scores for each (RQ2). Inspection of the data showed no sign of multicollinearity (i.e., VIF was less than 10 and tolerance was above .02; Field, Reference Field2009). First, a multiple regression analysis was conducted on the combined eye-tracking scores. The omnibus test revealed that the model was not significant: $F (3, 51) = 1.93, p = .14, R^{2} = .10$ . Regression coefficients in the multiple regression model are presented in Table 15.3. Results showed that the SRT score was the only significant predictor $(β = .29, p = .04)$ . None of the other predictors were statistically significant $(p > .1)$ .
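To make the multicollinearity check concrete, the variance inflation factor (VIF) of each predictor can be derived from the correlations among the three aptitude scores, since VIF equals $1 / (1 - R^{2})$ where $R^{2}$ comes from regressing one predictor on the other two, and tolerance is its reciprocal. The sketch below plugs in the inter-predictor correlations reported in the note to Table 15.2; it is an illustration only, because those correlations may not come from exactly the same subsample as each regression, and the function name is hypothetical.

```python
def vif_three_predictors(r12, r13, r23):
    """VIFs for three predictors, given their pairwise correlations."""
    def vif(r_a, r_b, r_others):
        # R^2 from regressing one predictor on the other two, where r_a and
        # r_b are its correlations with them and r_others is the correlation
        # between those two predictors.
        r_squared = (r_a**2 + r_b**2 - 2 * r_a * r_b * r_others) / (1 - r_others**2)
        return 1 / (1 - r_squared)
    return (
        vif(r12, r13, r23),  # predictor 1
        vif(r12, r23, r13),  # predictor 2
        vif(r13, r23, r12),  # predictor 3
    )

# Order: SRT, LLAMA F, MLAT_4 (correlations from the note to Table 15.2).
vifs = vif_three_predictors(r12=-0.12, r13=0.01, r23=0.27)
tolerances = tuple(1 / v for v in vifs)
```

With correlations this small, every VIF stays close to 1 and every tolerance close to 1, far from the thresholds mentioned in the text.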

Table 15.3 Multiple regression results

Composite of Definiteness and Mass–Count $(n = 55)$

	B	SE	β	t	p	Partial-r
SRT	.07	.03	.28	2.13	.04	.29
LLAMA F	.01	.02	.09	.64	.52	.09
MLAT_4	.01	.02	.07	.48	.63	.07

Definiteness $(n = 60)$

	B	SE	β	t	p	Partial-r
SRT	.08	.03	.36	2.91	.01	.36
LLAMA F	.01	.01	.08	.61	.54	.08
MLAT_4	−.01	.02	−.09	−.67	.51	−.09

Mass–Count $(n = 55)$

	B	SE	β	t	p	Partial-r
SRT	.01	.03	.05	.34	.74	.05
LLAMA F	.00	.01	−.02	−.14	.89	−.02
MLAT_4	.03	.02	.21	1.43	.16	.20

Note. B and β indicate the unstandardized and standardized regression coefficients, respectively.

Next, a multiple regression analysis was conducted on the eye-tracking scores for definiteness. The omnibus test revealed that the model was significant: $F (3, 56) = 2.90, p = .04, R^{2} = .13$ . The SRT score was the only significant predictor $(β = .36, p = .01)$ . Again, none of the other predictors were statistically significant $(p > .1)$ .

Finally, a multiple regression analysis was conducted on the eye-tracking scores for the mass–count distinction. The omnibus test revealed that the model was not significant, $F (3, 51) = 0.81, p = .50, R^{2} = .05$ . None of the predictors were statistically significant $(p > .1)$ . The largest standardized coefficient was for MLAT_4 $(β = .21, p = .16)$ .

Discussion

The Role of Explicit and Implicit Learning Aptitudes in Adult SLA

The first RQ addressed to what extent explicit and implicit learning aptitude contributed to the attainment of real-time grammar processing. The findings indicate that aptitude for implicit learning (SRT task) – not explicit learning aptitudes (LLAMA F and MLAT_4) – significantly predicted the overall sensitivity to the distinctions in the English noun phrase.

As hypothesized at the outset of this study, a positive relationship was detected between implicit learning aptitude and real-time comprehension of grammatical structures, which is argued to index implicit knowledge (Suzuki, Reference Suzuki2017). With virtually no overlap of the measurement metrics between eye movements (the visual-world task) and RT (the SRT task), their systematic correlation can be construed as empirical evidence – more convincing than what was found in prior research using the RT-based word-monitoring task (Granena, Reference Granena, Granena and Long2013b; Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015; see also Godfroid & Kim, Reference Godfroid and Kim2021) – that aptitude and language tests tap into a common underlying ability pertaining to implicit learning. In contrast, there was no significant association between explicit learning aptitude and the visual-world task performance. Unlike the lingering ambiguous interpretations of GJT scores as explicit, implicit, or a mixture of both (Godfroid et al., Reference Godfroid, Loewen and Jung2015; Vafaee et al., Reference Vafaee, Suzuki and Kachinske2017), the current finely tuned eye-tracking measure was useful to demonstrate that explicit learning aptitudes played no significant role in the acquisition of implicit knowledge.

The sensitivity index from the visual-world task, however, may need to be interpreted with caution as an indicator of implicit knowledge. Unlike regular visual-world task design, which does not include any ungrammatical sentences (e.g., see Suzuki, Reference Suzuki2017, in which predictive sentence processing based on Japanese case-markers and classifiers was examined without using ungrammatical sentences in the visual-world task), the current task included ungrammatical noun phrases (e.g., “two bacon”) and infelicitous use of the definite article. This might have raised learners’ awareness of the target structures. Yet, according to the debriefing results, only a few participants noticed infelicitous/ungrammatical target(s) in the auditory sentences. The majority of the participants remained unaware of the linguistic target as they were performing the visual-world task. The aptitude–outcome correlation essentially remained the same, regardless of learners’ post-task self-reported awareness (see Appendix F in Supplementary Materials). This finding, combined with the systematic correlation with implicit learning aptitude, rather than explicit learning aptitude, is promising in the sense that the visual-world task can tap into automatic linguistic processing with little or no awareness (i.e., implicit knowledge).

A broader picture of the present findings illustrates the significant effect of implicit learning aptitude AND no effects of explicit learning aptitude on the acquisition of implicit knowledge. In our view, this overall pattern is consistent with the major claim in the SLA field that there are independent routes of explicit and implicit learning (e.g., Hulstijn, Reference Hulstijn2002; Krashen, Reference Krashen1985; Paradis, Reference Paradis2009). A systematic correlation between implicit aptitude and grammatical knowledge suggests that adult L2 learners can still use implicit learning mechanisms to acquire L2 grammatical properties to the extent they can be used rapidly – and possibly without awareness at some stage in the learning process.

Having said that, interpreting a systematic association of eye-tracking measures with implicit (but not explicit) aptitude may not always be straightforward (DeKeyser & Li, Reference DeKeyser and Li2021; cf., Godfroid & Kim, Reference Godfroid and Kim2021). First, the grammatical knowledge that was retrieved for the current visual-world task may not necessarily be the same as the knowledge that is initially acquired by recruiting explicit and/or implicit aptitudes (DeKeyser & Li, Reference DeKeyser and Li2021). In order to shed light on the developmental processes, a longitudinal study is urgently needed to examine the differential effects of explicit and implicit aptitudes in early and late stages of L2 learning (e.g., Kim, Reference Kim2020; Li & DeKeyser, Reference Li and DeKeyser2021). Second, even if the two learning mechanisms operate independently, explicit learning aptitude (and mechanism) could have “indirectly” influenced the acquisition of implicit knowledge. For instance, Li and DeKeyser (Reference Li and DeKeyser2021) argue that “explicit aptitude may contribute to implicit knowledge indirectly by providing fodder (declarative knowledge) for implicit learning” (p. 479). Hence, the lack of explicit aptitude effect does NOT lead to the conclusion that the role explicit learning plays in adult SLA is marginal. To what extent explicit knowledge (i.e., the product of explicit learning) influences the acquisition of implicit knowledge (i.e., the interface issue) remains an empirical question. It is conceivable that explicit learning aptitude is related to explicit knowledge, and explicit knowledge further plays facilitative or essential roles in the acquisition of implicit knowledge, as demonstrated by the structural equation modeling in Suzuki and DeKeyser’s (Reference Suzuki and DeKeyser2017) study.

The SLA field has just started rigorous empirical investigations into how different individuals recruit explicit and implicit learning mechanisms – through a set of cognitive aptitudes – for attaining explicit and implicit knowledge in naturalistic settings. These complex interfaces among explicit–implicit learning and knowledge, presumably mediated by a set of cognitive aptitudes, need to be better understood through more rigorous research from multiple perspectives.

Implicit Learning Aptitude Predicted the Learning of “Difficult” Grammatical Structure

Regarding the second RQ, the current findings indicated that implicit learning aptitude was a significant predictor of the acquisition of definiteness, but not of the mass–count distinction.³ The relatively easier mass–count distinction was acquired so easily that it might have led to the diminished effects of aptitude, while the acquisition of definiteness imposed much more burdensome demands and might have necessitated the compensatory role of aptitude (cf., DeKeyser, Reference DeKeyser2016). Given that the implicit aptitude test (SRT) primarily requires domain-general sequence learning of “dots” that appear on the computer screen and thus involves no semantic processing component, it is striking that the SRT task consistently predicted the acquisition of L2 implicit knowledge (weak to moderate correlations ranging from r = .36 to .43) across different studies targeting different structures – agreement structures (Granena, Reference Granena2013a), case-markers (Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015), and definiteness in this study (see Table 15.4).

Table 15.4 Summary of previous research on implicit learning aptitude

	Granena (Reference Granena, Granena and Long2013b)	Suzuki & DeKeyser (Reference Suzuki and DeKeyser2015)	Current Study
Participants	50 L2 Spanish learners	63 L2 Japanese learners	65 L2 English learners
Age of Arrival	>16 years old	>18 years old	>15 years old
Length of Residence	101 months	55 months	46 months
L1	Chinese	Chinese	Chinese
Language Test	Word-monitoring task	Word-monitoring task	Visual-world task
Target Structures	3 agreement structures; 3 morpho-semantic structures	5 Japanese particles	Definiteness; count/mass
Aptitude Test	SRT task	SRT task	SRT task; LLAMA F; MLAT_4
Results	SRT was only related to agreement structures (r = .36, p < .05)	SRT was related to five Japanese particles (r = .43, p < .05)	SRT was only related to definiteness (r = .36, p < .05)

All three studies recruited Chinese speakers, whose L1 has no overt morphological marking; all L2 grammatical structures that were significantly facilitated by implicit learning aptitude have no equivalent grammatical system in the learners’ L1.⁴ These incongruent L2 structures are notoriously difficult to integrate into learners’ L2 linguistic system and to automatize for real-time comprehension (Jiang et al., Reference Jiang, Novokshanova, Masuda and Wang2011, Reference Jiang, Hu, Chrabaszcz and Ye2015; Roberts & Liszka, Reference Roberts and Liszka2013). The role of implicit learning aptitude may become more important when L2 learners are left with no congruent L1 feature to fall back on. More specifically, implicit learning aptitude may become more important when new semantic distinctions are to be learned (cf., Murakami & Alexopoulou, Reference Murakami and Alexopoulou2016). In the case of learning the definiteness distinction, it is speculated that more efficient sequence learning may facilitate accumulating determiner–noun sequences from the input in the first place (N. C. Ellis, Reference Ellis and Rebuschat2015). The initial statistical tallying of co-occurrences of determiner–noun pairs essentially corresponds to the nature of implicit learning assessed in the probabilistic SRT task. Higher sequence learning ability might have promoted rapid association of the determiner–noun pair across different pieces of discourse. Subsequently, these determiner–noun sequences committed to memory need to be abstracted through more complex mappings of the determiners with the semantics of definiteness, which may necessitate different cognitive abilities beyond what is tapped by the SRT task.

With regard to the contribution of explicit aptitude, no systematic relation was found for either definiteness or the mass–count distinction. Definiteness involves a very abstract pattern of form–meaning mapping that requires the learning of novel semantic distinctions for Chinese speakers, which might have made explicit learning very difficult (DeKeyser, Reference DeKeyser2005; Robinson, Reference Robinson2005). Possibly, the absence of a role for explicit aptitude might have left more room for other predictors (e.g., implicit aptitude). While definiteness might have been too hard to learn explicitly, let alone to automatize, the mass–count distinction might have been easy enough to neutralize the effects of aptitude (DeKeyser, Reference DeKeyser2016). Since the mass–count distinction is salient for Chinese learners (e.g., by virtue of a similar classifier system), learners may have been able to tune in to the relevant features in the input automatically (Leung & Williams, Reference Leung and Williams2014).

Suggestions for Future Research: Aptitude and Language Test Development

The current study opens up new lines of investigation into explicit and implicit learning processes in adult SLA. First and foremost, we call for further research addressing the validity of the explicit and implicit measures for both aptitudes and linguistic knowledge. The current study did not include any tests for explicit knowledge, and it is important to more comprehensively examine the relationships between explicit and implicit knowledge and aptitudes (see Suzuki & DeKeyser, Reference Suzuki and DeKeyser2017).

For measures of implicit knowledge, a wider variety of target structures should be tested in the visual-world task. While the current study targeted only two structures, one easy and one difficult, a wider variety of linguistic structures can be examined through a useful operationalization of grammatical difficulty, such as saliency (Gass et al., Reference Gass, Spinner and Behney2017). Because explicit learning processes are more influenced by the saliency of linguistic structures than implicit learning processes, it is highly relevant to investigate the roles of explicit and implicit aptitudes for salient and non-salient linguistic structures (see, e.g., DeKeyser et al., Reference DeKeyser, Alfi-Shabtay, Ravid, Shi, Gass, Spinner and Behney2017, who demonstrated that adult learners, presumably relying more on explicit learning mechanisms, found it extremely difficult to acquire low-salience grammatical structures).

Although a majority of L2 learners remained unaware of the target grammatical structures tested in this study, it is possible that some participants simply could not articulate what they noticed. One way to solve that potential problem might be to have participants agree or disagree with a list of comments, including simple descriptions of the infelicity that they could have noticed. Alternatively, the visual-world task can be devised without any ungrammatical sentences, which can help further reduce the risk of raising awareness (see Suzuki, Reference Suzuki2017). Although developing a visual-world task requires more laborious work than developing RT tasks, it is applicable to wider L2 populations, including children.

For measures of aptitudes, the field can avail itself of a new aptitude test battery, such as the High-Level Language Aptitude Battery (e.g., Linck et al., Reference Linck, Hughes and Campbell2013). The aptitude measure for implicit learning in the current study was the SRT task, which focuses on probabilistic/statistical sequence learning; other aspects of implicit learning should be examined. Future research could thus scrutinize the role of more specific forms of implicit learning (Granena, Reference Granena2019, Reference Granena2020). Interestingly, in the current study, MLAT_4 (grammatical sensitivity in L1) may seem to be related to the acquisition of the mass–count distinction, the learning of which is assumed to rely on a similar L1 structure, more strongly than LLAMA F (inductive language learning ability). While grammatical sensitivity and inductive language learning ability are subsumed under the larger construct “language-analytic ability” (Skehan, Reference Skehan and Robinson2002), it may be useful to further examine how different subcomponents of explicit aptitude are more or less related to different grammatical structures. The reliability of the aptitude tests was acceptable (e.g., Granena, Reference Granena and Long2013; Suzuki & DeKeyser, Reference Suzuki and DeKeyser2017) but not very high in this study, which points to the need for improvement in the test instruments’ reliability and validity. We hope research on both language and aptitude tests can advance hand in hand to enable better understanding of L2 learning processes.

Conclusions

The aim of the current study was to investigate the extent to which explicit and implicit learning aptitudes influence the attainment of real-time processing by adult L2 English learners with L1 Chinese. By employing the visual-world task, we assessed real-time processing of definiteness and the mass–count distinction with little contamination from explicit knowledge. The post-task debriefing results showed that a majority of L2 learners were unaware of the target grammatical structures tested, which lends support to the use of linguistic knowledge without awareness (i.e., implicit knowledge). Implicit learning aptitude was particularly important for the acquisition of definiteness, but not for the acquisition of the mass–count distinction. These findings underscore the importance of understanding L2 learning processes underlying different structures through the lens of explicit and implicit learning aptitudes.

16 Implicit Statistical Learning and Second Language Outcomes: A Bayesian Meta-Analysis

Introduction

Recent theoretical advances viewing language aptitude as multifaceted (e.g., Skehan, Reference Skehan, Granena, Jackson and Yilmaz2016) have led to accumulated evidence on certain constructs beyond those measured by traditional aptitude tests. It has been suggested that sensitivity to language-like regularities in auditory or visual stimuli may constitute an ability relevant to adult L2 development and that such implicit processing is the default mode of adult second language acquisition (SLA) (Granena, Reference Granena2020; Long, Reference Long2015). In this chapter, we review terminological and conceptual issues related to implicit and statistical learning ability, consider its putative role in adult SLA, and present evidence pertaining to the relationship between performance on implicit and statistical learning tasks and L2 outcomes based on a meta-analysis. The chapter concludes with a discussion, noting implications for language aptitude research and SLA theory.

The first step in investigating the potential role of implicit and statistical learning mechanisms in L2 aptitude is to understand the somewhat dynamic nature of the constructs at hand. In this section, we present relevant definitions with the goal of clarifying the mechanisms claimed to underlie implicit statistical learning (ISL). The terminology addressed includes implicit learning, statistical learning, and ISL. For each term, we briefly review associated definitions, underlying processes, and learning outcomes. We then consider the evolving construct of implicit language aptitude (ILA) in order to link ISL to a nuanced understanding of the language learning abilities drawn upon in adulthood.

Terminology and Conceptual Issues

Beginning in the late 1960s, Reber applied the label implicit learning to how adults learning artificial grammars “can become sensitive to the statistical nature of their environment without using explicit or verbalizable strategies” (Reference Reber1967, p. 863). In this view, implicit learning is an unconscious process resulting in abstract knowledge. In theory, implicit learning is different from explicit learning in terms of being less affected by disorders or dysfunctions, less influenced by age, less variable across individuals, unrelated to intelligence, and more similar across species (Reber, Reference Reber1993, p. 88). The processes or mechanisms underlying implicit learning are attention to the input and unconscious generalization across exemplars. It is worth noting in particular that attention does not entail awareness in this case. Adult L2 learning outcomes of implicit learning involve abstract phonological (Chan & Leung, Reference Chan and Leung2014), morphological (e.g., Williams, Reference Williams2005), or syntactic (e.g., Rebuschat & Williams, Reference Rebuschat and Williams2012a) rules in the absence of awareness and without intention to learn the rules (for a book-length treatment, see Rebuschat, Reference Rebuschat2015). The notion of learning L2 constructions, or form–meaning mappings, without awareness has been referred to as semantic implicit learning (Paciorek & Williams, Reference Paciorek and Williams2015).

During the 1990s, Saffran, Newport, and Aslin conducted research describing how adults perform the task of word segmentation, concluding that statistical learning mechanisms help to “extract the regularities of natural language” (Reference Saffran, Newport and Aslin1996, p. 619). According to these authors, statistical learning (1) is based on frequency or probability, (2) occurs through exposure alone (i.e., with no feedback), and (3) operates in conditions where stimuli are presented rapidly, as in natural speech (Aslin & Newport, Reference Aslin, Newport, Colombo, McCardle and Freund2009). The link between statistical learning and SLA may be understood in terms of four principles proposed by Onnis (Reference Onnis, Rebuschat and Williams2012): integrating probabilistic sources of information, seeking invariance among exemplars to make generalizations, reusing a small number of learning mechanisms, and learning to predict. This assumption of a small number of mechanisms, which would build the case for a domain-general ability, is a matter of ongoing research (see Frost, Armstrong, & Christiansen, Reference Frost, Armstrong and Christiansen2019; Siegelman & Frost, Reference Siegelman and Frost2015). In addition to word segmentation, the outcomes of statistical learning experiments with adults include, among others, detection of nonadjacent dependencies via phonological characteristics (Onnis et al., Reference Onnis, Monaghan, Richmond and Chater2005) and identification of phrasal structures via partial overlap of successive stimuli (Onnis, Waterfall, & Edelman, Reference Onnis, Waterfall and Edelman2008). Its rapidly growing research base has positioned statistical learning (i.e., learning of patterns based upon multiple recurrent stimuli across time or space) as a promising construct to be integrated into various domains in cognitive science, including language acquisition (Frost, Armstrong, & Christiansen, Reference Frost, Armstrong and Christiansen2019). (For further discussion in the context of L2 research, see Rebuschat & Williams, Reference Rebuschat and Williams2012b.)
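To make the idea of probability-based word segmentation concrete, the following toy sketch computes transitional probabilities between adjacent syllables and places a word boundary wherever that probability dips. It is an illustration under stated assumptions, not the procedure of any study cited here: the three-word lexicon, the word order, the 0.9 threshold, and the function names are all hypothetical.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Return P(next | current) for each adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # last syllable never starts a pair
    return {
        (a, b): count / first_counts[a]
        for (a, b), count in pair_counts.items()
    }


def segment(syllables, threshold):
    """Split a non-empty stream wherever the transitional probability falls below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical lexicon: syllable order inside a word is fixed, word order varies.
lexicon = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
order = "ABCACBABCBACBCA"
stream = [syl for word in order for syl in lexicon[word]]
print(segment(stream, threshold=0.9))
```

In this stream, within-word transitions are perfectly predictable while transitions across word boundaries are not, so the dips in transitional probability line up with the word edges. Real input is far noisier, and a fixed threshold is a simplification chosen only to keep the example short.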

Recently, multiple authors have suggested a merger of implicit and statistical learning. According to Conway et al. (Reference Conway, Bauernschmidt, Huang and Pisoni2010), “implicit learning and statistical learning … refer to the same underlying phenomenon: inducing structure from input following exposure to multiple exemplars” (p. 357; see also Perruchet & Pacton, Reference Perruchet and Pacton2006). In both the implicit and statistical learning literatures, access to input is assumed to drive inductive learning, or more specifically, the discovery of particular generalizations inherent to stimuli exhibiting properties found in language. Thus, the ISL appellation can be regarded as an umbrella term (Frost, Armstrong, & Christiansen, Reference Frost, Armstrong and Christiansen2019) or a unifying term (Christiansen, Reference Christiansen2019), which covers experimental research carried out using learning paradigms adopted by Reber, those paradigms developed by Saffran, Newport, and Aslin, and other, newer paradigms. It should be acknowledged that the unitary nature of implicit and statistical learning is debated and that key differences between these two traditions exist. For example, statistical learning studies may exclude measures of awareness typically found in studies of implicit learning (Hamrick & Rebuschat, Reference Hamrick, Rebuschat, Williams and Rebuschat2012), whereas research on implicit learning may not manipulate distributional information in the same manner as statistical learning research (Onnis, Reference Onnis, Rebuschat and Williams2012). Interestingly, though, Christiansen (Reference Christiansen2019) also argued for a deeper similarity in the underlying processes involved. In his view, ISL does not involve a dedicated mechanism but instead relies on basic memory processes, such as chunking. As will be shown, this distinction between learning and memory is also highly relevant to the discussion of ILA.

The most authoritative discussion of the emerging construct of ILA comes from Granena (Reference Granena2020; see also Li & DeKeyser, Reference Li and DeKeyser2021). Building on a long-standing view of L2 aptitude as multi-componential, one recent trend has been to reclassify aptitude measures according to two contrasts: domain specificity vs. generality and explicit vs. implicit processes (Skehan, Reference Skehan, Granena, Jackson and Yilmaz2016). ILA focuses on the role of measures of implicit cognitive ability in predicting L2 learning outcomes under naturalistic and instructed conditions (Granena, Reference Granena2020). Such aptitude is itself multifaceted. Based on empirical research, Granena proposed a model of ILA including the two factors of implicit learning ability and implicit memory ability, with the latter comprising four nondeclarative memory systems: procedural, associative, nonassociative, and priming (Reference Granena2020, pp. 13–15). This work also offered a detailed compendium of cognitive tasks that may measure ILA. Clearly, this represents a major advance in discussions of the potential role of implicit processes in L2 learning ability. Nonetheless, issues remain regarding measurement of ILA because there is no solid consensus on precisely which tasks tap the ILA construct, let alone its subcomponents (see Godfroid & Kim, Reference Godfroid and Kim2021 on this issue). Moreover, the limited reliability of certain measures may undermine attempts to establish construct validity. Finally, regarding the issue of domain-generality, it appears that ISL tasks relying on auditory versus visual modalities do not yield similar results within participants (Granena, Reference Granena2020; Siegelman & Frost, Reference Siegelman and Frost2015; see, for discussion, Frost et al., Reference Frost, Armstrong, Siegelman and Christiansen2015), which raises further quandaries for measurement.

The value of a research synthesis is to focus on areas where findings have accumulated in order to summarize and integrate them with relevant theory. Examples of meta-analyses in the area of cognitive IDs in adult SLA include Linck et al. (Reference Linck, Osthus, Koeth and Bunting2014), who reported the relationship between working memory and various learning outcomes in terms of the population estimate ($\rho = .25$ [.21, .29]), and Li (Reference Li2015), who found an overall moderate relationship between language aptitude and grammar learning ($r = .31$ [.25, .36]). To our knowledge, this chapter is the first attempt to quantify the relationship between ISL and L2 outcomes using meta-analytic techniques.
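For readers less familiar with how a pooled correlation and its interval are obtained, the sketch below shows one simple approach: a fixed-effect average of Fisher z-transformed correlations weighted by n − 3. This is a deliberately minimal frequentist illustration, not the Bayesian model used in this chapter and not necessarily the procedure of the meta-analyses cited above. The function names and the (r, n) pairs are hypothetical and invented purely for demonstration.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation."""
    return math.atanh(r)


def pooled_correlation(studies, z_crit=1.96):
    """Fixed-effect pooled r with an approximate 95% confidence interval.

    `studies` is a list of (r, n) pairs; each z is weighted by n - 3,
    the inverse of its approximate sampling variance.
    """
    weights = [n - 3 for _, n in studies]
    total = sum(weights)
    z_bar = sum(w * fisher_z(r) for w, (r, _) in zip(weights, studies)) / total
    se = 1.0 / math.sqrt(total)
    lower, upper = z_bar - z_crit * se, z_bar + z_crit * se
    # Back-transform from the z metric to the correlation metric.
    return math.tanh(z_bar), (math.tanh(lower), math.tanh(upper))


# Hypothetical (r, n) pairs, invented purely for illustration.
studies = [(0.20, 40), (0.35, 60), (0.10, 80)]
r_pooled, (ci_low, ci_high) = pooled_correlation(studies)
print(f"r = {r_pooled:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

The output has the same "estimate [lower, upper]" shape as the figures quoted above. A random-effects or Bayesian model would additionally estimate between-study heterogeneity, which this sketch ignores.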

At its initial stage, a synthesis needs to formulate the problem, offer clear definitions of constructs, specify the nature of their relationship, and determine the conceptual relevance of studies (Cooper, Reference Cooper2017). As noted, the present study is concerned with ISL and its correlation with adult SLA. While defining and identifying connections between variables might seem straightforward, the issue of conceptual relevance is slightly more complicated. Conceptual relevance involves establishing which studies pertain to the review. In this review, we decided to place emphasis on studies of the component of ILA that Granena (Reference Granena2020) called implicit learning ability, and that we term ISL. There were two reasons for our decision. First, the theory and evidence to support claims of implicit learning versus implicit memory are associated with distinct research traditions, making it especially difficult to combine them in a single review. Second, a recent review by Hamrick, Lum, and Ullman (Reference Hamrick, Lum and Ullman2018) has investigated the roles of declarative and procedural memory in adult language learning. Because procedural memory is among the subcomponents of implicit memory ability posited by Granena (Reference Granena2020), Hamrick and colleagues provided useful insight into this dimension of aptitude. By contrast, our goal was to offer a complementary synthesis specifically targeting studies claiming to investigate ISL (as opposed to implicit memory). This decision meant that potentially more studies could be included, even if some others featured in the meta-analysis led by Hamrick would be excluded. In this way, our study sought to occupy a clear, well-defined gap in the research literature. In the following section, a further rationale for focusing on the ISL–SLA relationship is described to clarify the potential value of this study.

The Role of ISL in Adult L2 Learning

Although it has been asserted that implicit learning and incidental L2 learning are “related but different processes” (Robinson, Reference Robinson2010, p. 260), others have argued that implicit learning is the default mechanism in adult SLA (Doughty et al., Reference Doughty, Campbell, Mislevy, Prior, Watanabe and Lee2010; Granena, Reference Granena2020; Long, Reference Long2015). The reasoning behind this claim is that learning a complex system such as an L2 involves a default, implicit processing mode, which may be overridden, at times, by an explicit mode. Several qualifications have been stated regarding this implicit processing: (1) It is only after a new form is noticed (Schmidt, Reference Schmidt1990) that the implicit mode becomes the default process for subsequent exemplars in the input (Long, Reference Long2015, pp. 51–52); (2) its effectiveness in adults may be limited to adjacent items (e.g., chunks, collocations) (Long, Reference Long2015, p. 46); (3) it is especially relevant to reaching advanced levels of an L2 (Doughty et al., Reference Doughty, Campbell, Mislevy, Prior, Watanabe and Lee2010, p. 18); and (4) it decreases markedly in adolescence and then gradually in later years (Granena, Reference Granena2020; Long, Reference Long2015). Support for implicit processing includes experiments revealing apparently unaware learning of L2 rules (e.g., Rebuschat & Williams, Reference Rebuschat and Williams2012a). However, another potential source of evidence for (or against) the role of implicit learning in SLA is within-subject studies investigating the relationship between ISL ability and L2 outcomes, which we review here. The assumption is that tasks designed to measure ISL ability should correlate, to a meaningful extent, with L2 performance. L2 studies have assumed that higher ISL ability may be predictive of speed and aptitude (Lee, Reference Lee2014), enhanced error sensitivity (Suzuki & DeKeyser, Reference Suzuki and DeKeyser2017), and, ultimately, higher proficiency (Linck et al., Reference Linck, Hughes and Campbell2013). This chapter reviews the evidence to address the claim that ISL plays a role in adult SLA.

If a connection can be established between ISL and L2 outcomes, then this relationship may be affected by several factors. As noted, it is believed that the default processing mode may operate differently for different age groups, linguistic outcomes, and proficiency levels. Therefore, we also sought to provide an initial assessment of how such moderator variables have been treated in ISL–SLA studies. Moreover, the measures of ISL used in this research constitute an important focal area in the development of new aptitude tests. Thus, we also aimed to provide a list of measures used in these studies. Based on the foregoing issues, we posed three research questions:

1. Which studies have examined the association between ISL tasks and L2 outcomes and what are the features (e.g., age groups, proficiency levels, L2 outcomes) of these studies?
2. Which measures have been used to investigate the construct of ISL?
3. What are the findings of studies investigating the correlation between ISL and L2 outcomes? Do the findings support the notion that ISL is positively related to L2 outcomes and, if so, what is the strength of this relationship?

Method

Inclusion Criteria

Based on the research questions, several criteria were established to determine which studies were relevant to the present investigation. These inclusion criteria evolved during the initial process of gathering reports for the study (see below). To be included in this synthesis, each retrieved study had to meet the following criteria:

1. The study used at least one task that the researcher(s) claimed measures ISL ability. A number of cognitive tasks have been employed to measure ISL, as we report later. However, instead of selecting studies on the basis of the tasks used, we chose to focus on the researchers’ intended construct. One reason was that, at times, the same task has been assumed to measure constructs that are, in theory, distinct. This is the case with the serial reaction time task, which is regarded by some researchers as an ISL task (e.g., Granena, Reference Granena2013) and by others as a measure of procedural memory ability (e.g., Hamrick, Reference Hamrick2015). In terms of L2 aptitude theory, as noted above, this restricted our focus to implicit learning ability, which contrasts with implicit memory ability (Granena, Reference Granena2020) and also procedural memory ability (Hamrick, Lum, & Ullman, Reference Hamrick, Lum and Ullman2018).
2. The researcher(s) directly measured the relationship between the ISL task(s) and one or more specific measures of L2 ability. This further limited the number of included studies because many ISL studies have explored its connection to L1 processing and learning (e.g., Kidd & Arciuli, Reference Kidd and Arciuli2016), or to the abilities possessed by bilinguals (e.g., Weiss, Schwob, & Lebkuecher, Reference Weiss, Schwob and Lebkuecher2020).
3. The study clearly indicated the participants’ age and reported data from post-sensitive-period learners. To keep the focus on adult L2 learners, studies with learners below a certain age, such as primary school students (e.g., Seipel, Reference Seipel2011), were excluded.
4. The study was based on a unique sample. In the case of studies based on a subsample of the data found in other reports, we selected only one report for inclusion after consideration of the full set of inclusion criteria. For instance, Yilmaz and Granena (Reference Yilmaz and Granena2019) was included instead of Granena and Yilmaz (Reference Granena and Yilmaz2019) because the former study included a larger sample and described the strength of the relationships between the key variables of interest.
5. The report had been published prior to October 2020, at which time coding for the present study commenced.

For the sake of a more inclusive analysis, we considered reports published as monographs, journal articles, book chapters, conference proceedings papers, and theses available through digital repositories. Looking further than studies published as journal articles is important owing to the file-drawer problem, i.e., “studies not finding a significant link between statistical learning and language performance might be less likely to reach publication” (Mor & Prior, Reference Mor and Prior2020, p. 686).

In addition to the above criteria to be included in the meta-analysis, studies needed to meet two other criteria:

6. Results directly indicated the strength of the relationship between ISL ability and one or more L2 outcomes, usually in terms of the correlation coefficient, r. In cases where these data were not included within the report itself, we examined supplementary information files available through journal websites.¹
7. The sample size was clearly reported.

Study Retrieval

The process of retrieving and coding primary studies for this synthesis took place from August to October 2020. Four methods were used to conduct a comprehensive search of the literature (Cooper, Reference Cooper2017). First, we conducted formal searches of six reference databases (LLBA, ERIC, PsycINFO, ProQuest, Web of Science, and Google Scholar) using the following keywords (in various combinations): statistical learning, implicit learning, artificial grammar, second language, and aptitude. Second, we perused relevant journals (e.g., Language Learning, Studies in Second Language Acquisition, Modern Language Journal) and edited collections (e.g., Granena, Jackson, & Yilmaz, Reference Granena, Jackson and Yilmaz2016; Rebuschat, Reference Rebuschat2015; Rebuschat & Williams, Reference Rebuschat and Williams2012b; Wen, Borges Mota, & McNeill, Reference Wen, Borges Mota and McNeill2015; Wen et al., Reference Wen, Skehan, Biedroń, Li and Sparks2019). Third, we reviewed backward citations to earlier studies found in retrieved documents as well as forward citations to newer studies using database functions. Fourth, we contacted L2 aptitude researchers by email to confirm that the search had been exhaustive.

This search led to the retrieval of 67 documents of potential interest. Based on Criteria 1 through 5 above, 44 of these documents were excluded. The most common reason was that many ISL studies have been conducted on L1 rather than L2 processing. The remaining 23 reports were coded and analyzed to answer Research Questions 1 and 2. Of these, two reports were excluded from the meta-analysis based on Criterion 6. Thus, 21 reports were subsequently used to answer Research Question 3. In the references list, studies included in the synthesis are marked with a single asterisk. Those included in both the synthesis and meta-analysis are marked with a double asterisk.

Coding Procedures

The research questions and inclusion criteria described above informed the development of the coding sheet used to record the study data. We determined a number of features to be entered into this spreadsheet prior to evaluating reports. These features included detailed information regarding each report (date, source, number of studies included), study design, sample (L1, target language, proficiency in target language, sample size, descriptive data regarding age), predictor variable (construct definition, ISL measure(s), input modality, response type, reliability), outcome variable (L2 measure(s), reliability) and statistical result (statistic used, value, significance). More specifically, target language referred to languages that the participants had previously studied as well as those with which they had no prior experience. ISL measures were coded according to their description, including whether the input modality in the training phase was auditory or visual and whether scoring during the test phase was based on speed or accuracy. Additional rows were added in the case of multiple L2 outcomes.

Inter-Rater Reliability

To demonstrate the reliability of the study coding, both authors independently coded a subsample consisting of five randomly selected reports (approximately 20% of the entire sample). The overall agreement ratio was 95%. Table 16.1 presents the inter-rater reliability separately for those categories central to our research questions. Afterwards, any inconsistencies were discussed and resolved. It was also decided that the expected direction of the effect (i.e., a negative or positive correlation) should be coded, as this pertained to the analyses for Research Question 3. The remaining data were coded by the first author.

Table 16.1 Coding reliability for key study features

Coding category	Number of decisions	% Agreement	Cohen’s kappa
First language	5	100.00	1.00
Target language (TL)	6	100.00	1.00
TL proficiency²	5	80.00	0.75
Sample size	6	100.00	1.00
Average age	6	100.00	1.00
ISL measure	5	80.00	0.71
L2 measure	18	94.44	0.94
Statistic reported	17	100.00	1.00
Value of statistic	17	94.12	0.94
TOTAL	85	95.21	0.95
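The two agreement indices reported in Table 16.1 can be illustrated with a minimal sketch. The function names and the five category labels below are hypothetical and are not the authors' coding data; the sketch only shows how percent agreement and Cohen's kappa are conventionally computed for two coders.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of coding decisions on which both raters chose the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_observed = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical proficiency codes assigned by two coders to five reports.
coder_1 = ["none", "intermediate", "advanced", "intermediate", "none"]
coder_2 = ["none", "intermediate", "advanced", "advanced", "none"]

print(f"agreement = {percent_agreement(coder_1, coder_2):.2f}")
print(f"kappa     = {cohens_kappa(coder_1, coder_2):.2f}")
```

Because kappa discounts agreement expected by chance, it is lower than the raw agreement ratio whenever the coders disagree at all, which is the pattern visible in the table.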

Synthetic Methods

The current meta-analysis was based on 21 reports (see those marked with a double asterisk in the references list), generating 82 effect size estimates (i.e., rs) from 1,398 participants. Of these, two effect sizes were not reported, and seven were reported in the form of (un)standardized regression coefficients without enough information to convert them to correlations. After confirming that these data points were indeed unavailable (by contacting the authors of the primary studies), we decided to treat them as missing values and to replace them with $r = 0$ (Pigott, Reference Pigott, Cooper, Hedges and Valentine2009).

Our statistical analysis consisted of two steps. First, we pooled all 82 effect sizes and obtained the overall population estimate of the relationship between ISL and adult SLA. Note that most of the reports contributed multiple effect sizes, often based on the same participant samples. In the current dataset, 19 reports contributed more than one effect size, and 16 did so on the basis of a single sample. In analytical terms, this meant that our dataset had a multilevel structure, with data points nested within participant samples and samples nested within study reports. When faced with this kind of data dependence, meta-analysts can either (a) ignore the dependence, (b) average effect sizes within studies, (c) select one estimate per study, or (d) explicitly model the dependence in statistical models (Cheung, Reference Cheung2014). We deemed the first three approaches unjustified because they either violate the assumption of data independence or lose (potentially) important information through data reduction. We thus avoided using a traditional random-effects model (Hedges & Vevea, Reference Hedges and Vevea1998), as it does not account for a nested data structure. Rather, we adopted the last approach, statistically modeling the data dependence, and obtained the population estimate using Bayesian multilevel random-effects models (Gelman et al., Reference Gelman, Carlin and Stern2013; Harrer et al., Reference Harrer, Cuijpers, Furukawa and Ebert2019).

As correlation coefficients are known to be non-normally distributed, we first transformed all effect sizes into Fisher’s z and then synthesized the results (e.g., Hedges & Olkin, Reference Hedges and Olkin1985; Shadish & Haddock, Reference Shadish, Haddock, Cooper, Hedges and Valentine2009). When we present our synthesis findings, however, we discuss the results on the basis of the original r metric to facilitate interpretation. As with traditional random-effects models, we assumed each data point (i.e., Fisher’s z) to come from a normal distribution centered on $θ_{i, j}$ and dispersed with a standard deviation, $σ_{i, j}$ :

$$z_{i, j} \sim \mathrm{Normal}(θ_{i, j}, σ_{i, j}).$$

Here, $z_{i, j}$ denotes the $i$th data point in the $j$th study, and $θ_{i, j}$ can be considered the true estimate of $z_{i, j}$ after accounting for measurement error, $σ_{i, j}$. Note that $σ_{i, j}$ was a known variable because the standard error of Fisher’s z can be approximated using the sample size (see Shadish & Haddock, Reference Shadish, Haddock, Cooper, Hedges and Valentine2009). We then modeled the true estimates of data points, $θ_{i, j}$, as drawn from a normal distribution with a mean of $μ + a_{i} + a_{j}$ and with the variation expressed by $τ$:

$$θ_{i, j} \sim \mathrm{Normal}(μ + a_{i} + a_{j}, τ),$$

where $μ$ is the population estimate of the correlation coefficient, and thus our target estimand, and $a_{i}$ and $a_{j}$ express how much each $θ_{i, j}$ deviated from the true population estimate ($μ$) due to unobserved (random) factors specific to each data point ($a_{i}$) or to each study ($a_{j}$). Our models are thus multilevel in the sense that they accounted for how much each data point varied at the study level as well as at the data level. Although our dataset actually comprised a three-level dependence (data point < sample < study), we combined the upper two levels because most of the 21 reports (85%) were based solely on a single sample.
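The transformation step that precedes this model can be sketched as follows. This is a minimal illustration that assumes the usual large-sample approximation for the standard error of Fisher's z, namely $1/\sqrt{n - 3}$; the function names and the example values (r = .30, n = 28) are hypothetical and are not taken from the primary studies.

```python
import math


def fisher_z(r: float) -> float:
    """Transform a correlation coefficient r into Fisher's z (inverse hyperbolic tangent)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def fisher_z_se(n: int) -> float:
    """Approximate standard error of Fisher's z, assumed here to be 1 / sqrt(n - 3)."""
    if n <= 3:
        raise ValueError("sample size must exceed 3")
    return 1.0 / math.sqrt(n - 3)


def z_to_r(z: float) -> float:
    """Back-transform Fisher's z to the r metric for reporting."""
    return math.tanh(z)


# Hypothetical effect size and sample size for one data point.
r_observed = 0.30
n_participants = 28

z = fisher_z(r_observed)
se = fisher_z_se(n_participants)  # 1 / sqrt(25) = 0.2
print(f"z = {z:.3f}, SE = {se:.3f}, back-transformed r = {z_to_r(z):.2f}")
```

One consequence worth noting is that an effect size entered as $r = 0$ maps to $z = 0$, so it pulls the pooled estimate toward zero with a weight that depends only on its sample size.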

In the second step of our analysis, we investigated whether there were any variables that could moderate the relationship between ISL and adult SLA. In doing so, we once again reviewed our coded dataset and identified potential features to use in the moderator analysis. We did not include the participants’ L1s and target languages because those variables took so many distinct values that statistical analysis was not feasible. The final set of moderator variables included:

Proficiency in the target language: None, Intermediate, or Advanced
Input modality of ISL measures: Auditory or Visual
Scoring methods of ISL measures: Accuracy-Based or Speed-Based
Test modality of L2 outcome measures: Comprehension or Production
Scoring methods of L2 outcome measures: Accuracy-Based or Reaction Time (RT)-Based

When assigning the studies to the levels of each moderator, we found some instances that could not be categorized in these ways. For instance, McDonough et al. (Reference McDonough and Trofimovich2016) measured the occurrence of syntactic priming (i.e., frequency) as their L2 outcome measure, and this was neither accuracy based nor RT based. We excluded those isolated studies from the moderator analysis. We investigated the effect of each moderator with a meta-regression approach. Specifically, we added each moderator to the aforementioned multilevel model as a predictor and examined whether the population estimate ($μ$) differed reliably across two (or more) levels of the moderator.

Throughout the analysis, we fitted our models using the R package brms (version 2.12.0, Bürkner, Reference Bürkner2017), which provides an interface for fitting Bayesian models using Stan (version 2.18.0, Stan Development Team, 2018), a probabilistic programming language for full Bayesian inference and optimization. In Bayesian analysis, prior knowledge, in the form of probability distributions, is combined with observed data to produce posterior distributions (see Norouzian, de Miranda, & Plonsky, Reference Norouzian, de Miranda and Plonsky2018 for a brief review). As this study was the first investigation that attempted to synthesize the relationship between ISL and adult SLA, we assigned weakly informative priors to all parameters we estimated. The posterior distributions were estimated through Markov chain Monte Carlo simulation from four chains of 50,000 iterations each, with a warmup period of 10,000 iterations and a thinning interval of 2 to reduce autocorrelation of the posterior samples. To check whether each chain had converged to a stationary distribution, we monitored whether the value of $\hat{R}$ associated with each parameter (as a convergence index) was within the range $1 \leq \hat{R} \leq 1.1$ (Gelman & Rubin, Reference Gelman and Rubin1992).
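The convergence index mentioned above can be illustrated with a short sketch. It implements the basic (non-split) potential scale reduction factor in the spirit of Gelman and Rubin, which may differ from the exact variant computed by the software the authors used; the chains are simulated stand-ins, and all names are hypothetical. The draw count assumes that the 50,000 iterations per chain include the warmup period.

```python
import math
import random
import statistics


def gelman_rubin_rhat(chains):
    """Basic potential scale reduction factor for equal-length chains of one parameter."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by chain length.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Retained draws under the stated settings, assuming iterations include warmup.
chains_count, iterations, warmup, thin = 4, 50_000, 10_000, 2
retained_draws = chains_count * (iterations - warmup) // thin  # 80,000

# Simulated stand-in chains: four well-mixed chains and one set with a stray chain.
rng = random.Random(2020)
mixed = [[rng.gauss(0.13, 0.04) for _ in range(2000)] for _ in range(4)]
stuck = mixed[:3] + [[x + 0.2 for x in mixed[3]]]

print(retained_draws)
print(f"well-mixed R-hat: {gelman_rubin_rhat(mixed):.3f}")
print(f"one stray chain:  {gelman_rubin_rhat(stuck):.3f}")
```

When the chains sample the same distribution, the between-chain term is negligible and the index sits close to 1; a chain exploring a different region inflates it well past the 1.1 threshold.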

Results

Research Question 1: Studies of the ISL–SLA Relationship

The entire set of 23 reports contained 25 separate studies of the ISL–SLA relationship. In answering Research Questions 1 and 2, these study-level cases are the unit of analysis. Here, several key features of these studies are described: participant age, language background, and L2 outcomes.

Participant Age

The age of participants was reported in terms of its average and/or range. Among the 23 studies reporting the average, the overall mean age of participants was 23.05. In the 16 studies reporting the range, participants spanned from 16 to 82 years old. This wide range was justified by the study goals. For instance, Cox (Reference Cox2013) sought to examine learning outcomes among younger (18–27) versus older (60–82) adults while considering ISL as a mediator. Despite better performance by the younger group on a serial reaction time (SRT) task and regardless of ISL ability, participant age predicted outcomes. We note that this valuable study was nevertheless excluded from our meta-analysis because its aim was assessing ISL as a potential mediating and moderating variable rather than probing its direct relationship to SLA.

For the subset of 21 reports (containing 23 studies) analyzed to answer Research Question 3, participants’ ages ranged from 16 to 52 years old ($M = 22.46$).

Language Background

The first languages of participants in the reported samples included Chinese, Dutch, English, Hebrew, Japanese, Korean, and Thai. Two studies reported on participants from a variety of language backgrounds. Another study focused on Russian–Hebrew multilinguals (Degani & Goldberg, Reference Degani and Goldberg2019). Two further studies did not clearly indicate participants’ first, or dominant, languages.

Most studies investigated the role of ISL in target languages already known to participants, including English, French, Hebrew, Japanese, and Spanish. Several others were miniature language learning studies involving controlled exposure to previously unknown languages, such as Arabic, Esperanto, Fijian, Latin, Russian, and Samoan. In the case of languages known to the participants, the assessment of proficiency is relevant. Based on Thomas (Reference Thomas, Norris and Ortega2006), means of assessing proficiency included standardized measures (e.g., CEFR³ levels or TOEFL scores), questionnaires (e.g., LEAP-Q, Kaushanskaya, Blumenfeld, & Marian, Reference Kaushanskaya, Blumenfeld and Marian2020), institutional status (e.g., prior coursework in the language), and other instruments (e.g., a cloze test). Often, several of these strategies were combined in a single study. In some cases, proficiency assessment served not only to provide context but also to establish requirements for participation (e.g., JLPT Level 1 in Suzuki & DeKeyser, Reference Suzuki and DeKeyser2017). In sum, L2 proficiency levels examined in this dataset ranged from absolute beginner to highly advanced.

L2 Outcomes

A plethora of measures were used to examine L2 performance, which can be classified broadly into comprehension and production. Comprehension-based tasks included those involving acceptability judgments as well as measures of response time during tasks that focused participants on meaning. Examples included timed or untimed grammaticality judgment tasks (e.g., Ćurčić, Reference Ćurčić2018; Robinson, Reference Robinson2005), self-paced reading tasks (e.g., Chui, Reference Chui2017; Lee, Reference Lee2014), and picture-identification tasks (McDonough & Trofimovich, Reference McDonough and Trofimovich2016). In the word-monitoring task (Granena, Reference Granena2013; Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015, Reference Suzuki and DeKeyser2017), participants were instructed to respond immediately upon hearing a target word located after an element that was either grammatical or ungrammatical, then answer yes/no questions. The RT increase when responding to ungrammatical sentences was used as a measure of implicit L2 knowledge.

Production-based tasks elicited specific structures or translation equivalents, or freer performance during communicative tasks. Examples here included guided oral production tasks eliciting case or gender markers (Brooks & Kempe, Reference Brooks and Kempe2013; Brooks, Kwoka, & Kempe, Reference Brooks, Kwoka and Kempe2017; see also Robinson, Reference Robinson2005) and tests requiring spoken translations of presented words (Degani & Goldberg, Reference Degani and Goldberg2019). Two studies used less-controlled production during oral communication tasks. In these studies, scoring focused on complexity, accuracy, and fluency (Granena, Reference Granena2019) and relative clause production (McDonough et al., Reference McDonough, Kielstra, Crowther, Smith, Mackey and Marsden2016). Finally, one study used GCSE scores, which combine assessments of listening, speaking, reading and writing in a foreign language (Kaufman et al., Reference Kaufman, DeYoung and Gray2010).

Research Question 2: Measures of ISL

All but one of the 25 studies employed a single measure of ISL. Impressively, Kerz and Wiechmann (Reference Kerz, Wiechmann, Pastor and Mitkov2019) used four different measures of this construct. Thus, 28 administrations of ISL tasks occurred in the studies reviewed. Of these, several measures were consistently used across studies, most notably the SRT task, which was used 11 times. The most commonly employed version, appearing in seven studies, was the probabilistic SRT task. According to Kaufman et al. (Reference Kaufman, DeYoung and Gray2010), this measure requires participants to monitor stimuli at one of four locations on a computer screen and respond as quickly as possible by pressing a corresponding key. Training consists of blocks of trials interspersed with two sequences, which occur with varying probability, and online learning is measured by computing the RT difference across probable versus improbable trials. As an ISL task, a major advantage of the SRT is the integration of the training and testing phases, as participants are not alerted to the presence of regularities. Other versions of the SRT task were also used (see Table 16.2).
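The speed-based learning index described for the probabilistic SRT task can be sketched as a simple difference of mean reaction times. The trial records, the field layout, and the decision to drop incorrect responses are assumptions made for illustration only; individual studies may trim or aggregate reaction times differently.

```python
from statistics import fmean


def srt_learning_score(trials):
    """Mean RT on improbable trials minus mean RT on probable trials (in ms).

    `trials` is a list of (trial_type, rt_ms, correct) tuples. Incorrect
    responses are excluded here, which is an assumption of this sketch.
    """
    probable = [rt for kind, rt, ok in trials if ok and kind == "probable"]
    improbable = [rt for kind, rt, ok in trials if ok and kind == "improbable"]
    if not probable or not improbable:
        raise ValueError("need at least one correct trial of each type")
    return fmean(improbable) - fmean(probable)


# Hypothetical trials for one participant.
example_trials = [
    ("probable", 400, True),
    ("probable", 420, True),
    ("probable", 900, False),  # error trial, excluded from the score
    ("improbable", 460, True),
    ("improbable", 480, True),
]

print(srt_learning_score(example_trials))  # 60.0
```

A positive score indicates faster responses to the more probable sequence, which is taken as evidence of learning even though participants were never told that regularities existed.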

Table 16.2 Instruments used to measure ISL ability in L2 studies

	Auditory training modality	Visual training modality
Accuracy-based scoring method	AGL task (Brooks, Kwoka, & Kempe, Reference Brooks, Kwoka and Kempe2017); SL task based on adjacent dependencies (Brooks & Kempe, Reference Brooks and Kempe2013; Degani & Goldberg, Reference Degani and Goldberg2019; Kerz & Wiechmann, Reference Kerz, Wiechmann, Pastor and Mitkov2019; Onnis et al., Reference Onnis, Frank, Yun, Lou-Magnuson, Papafragou, Grodner and Mirman2016); SL task based on nonadjacent dependencies (Kerz & Wiechmann, Reference Kerz, Wiechmann, Pastor and Mitkov2019; McDonough et al., Reference McDonough and Trofimovich2016; 3 studies reported in McDonough & Trofimovich, Reference McDonough and Trofimovich2016)	AGL task (Lee, 2016; Robinson, Reference Robinson2005); SL task based on adjacent dependencies (Chui, Reference Chui2017; Frost et al., Reference Frost, Siegelman, Narkiss and Afek2013; Kerz & Wiechmann, Reference Kerz, Wiechmann, Pastor and Mitkov2019; Mor & Prior, Reference Mor and Prior2020); SL task based on nonadjacent dependencies (Lee, Reference Lee2014)
Speed-based scoring method	None	Alternating SRT task (Cox, Reference Cox2013); SRT task (Linck et al., Reference Linck, Hughes and Campbell2013; Granena, Reference Granena2019; Yilmaz & Granena, Reference Yilmaz and Granena2019); Probabilistic SRT task (Kaufman et al., Reference Kaufman, DeYoung and Gray2010; see also Ćurčić, Reference Ćurčić2018; Granena, Reference Granena2013; Kerz & Wiechmann, Reference Kerz, Wiechmann, Pastor and Mitkov2019; Suzuki & DeKeyser, Reference Suzuki and DeKeyser2015, Reference Suzuki and DeKeyser2017; Yi, Reference Yi2018)

In contrast, statistical learning (SL) tasks involve separate training and testing phases and vary according to whether training occurs via the auditory or visual modality and whether the dependencies to be learned are adjacent versus nonadjacent. To exemplify the material to be learned, studies in both modalities used stimuli drawn from Gómez (Reference Gómez2002), wherein nonadjacent dependencies were instantiated using three-part nonword strings, such as pel-kicey-jic. During training, pel consistently occurs before jic, which yields a transitional probability of 1. SL tasks may vary in the transitional probabilities between dependencies and, in the case of nonadjacent dependencies, the number of interleaved nonwords. The advantage of SL tasks is that they remove all but the distributional cues to word boundaries and long-distance dependencies while retaining some resemblance to language. Testing often involves an alternative forced choice task in which participants must decide whether items adhere to the training input. As noted, Kerz and Wiechmann (Reference Kerz, Wiechmann, Pastor and Mitkov2019) used a battery of four tasks taken from the wider literature to gauge statistical learning ability, including three SL tasks (auditory-verbal-adjacent, auditory-verbal-nonadjacent, visual-nonverbal-adjacent) and a probabilistic SRT task.
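The notion of a transitional probability in such three-part strings can be made concrete with a small sketch. Only the string pel-kicey-jic comes from the text above; the other strings, the function name, and its parameters are hypothetical and serve purely as illustration.

```python
def transitional_probability(strings, cue, outcome, cue_pos, outcome_pos):
    """Estimate P(outcome at outcome_pos | cue at cue_pos) over hyphen-separated strings."""
    parsed = [s.split("-") for s in strings]
    with_cue = [p for p in parsed if len(p) > max(cue_pos, outcome_pos) and p[cue_pos] == cue]
    if not with_cue:
        raise ValueError(f"cue {cue!r} never occurs at position {cue_pos}")
    hits = sum(1 for p in with_cue if p[outcome_pos] == outcome)
    return hits / len(with_cue)


# Hypothetical training set: two frames (pel _ jic, vot _ rud) with three middle elements.
training = [
    "pel-kicey-jic", "pel-wadim-jic", "pel-loga-jic",
    "vot-kicey-rud", "vot-wadim-rud", "vot-loga-rud",
]

# Nonadjacent dependency: the first element fully predicts the third.
print(transitional_probability(training, "pel", "jic", cue_pos=0, outcome_pos=2))  # 1.0
# Adjacent transition: the first element predicts the middle element only weakly.
print(round(transitional_probability(training, "pel", "kicey", cue_pos=0, outcome_pos=1), 2))  # 0.33
```

The contrast between the two printed values shows why a learner must track the nonadjacent relation, rather than adjacent transitions, to succeed on this kind of material.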

As core specifications, the (a) training modality and (b) scoring methods in ISL studies can be used to compare tasks and identify gaps in current measurement practices. Table 16.2 provides a list of all tasks categorized by these key features.

Using this broad classification, several noticeable gaps emerged. First, the auditory modality was used solely in studies employing artificial grammar learning (AGL) or SL tasks, whereas the visual modality was used in studies employing SRT tasks, though not exclusively. Second, no auditory ISL tasks used speed-based scoring. Although SL tasks as well as similar AGL tasks may not be intended to measure response time, newer tasks used outside of SLA could help bridge this gap (e.g., the AGL–SRT task reported in Misyak, Christiansen, & Tomblin, Reference Misyak, Christiansen, Tomblin, Taatgen and van Rijn2009; see also Vuong, Meyer, & Christiansen, Reference Vuong, Meyer and Christiansen2016).

Lastly, a note of caution regarding reliability is warranted. Fewer than half of the studies provided reliability estimates for these ISL measures. A noteworthy exception, however, was McDonough and Trofimovich (Reference McDonough and Trofimovich2016), who reported Cronbach’s alpha for their SL task separately in each of their three reported studies (0.51, 0.56, and 0.63). Internal consistency varied even more considerably in the case of SRT tasks (for further discussion, see Granena, Reference Granena2020).

Research Question 3: Evidence Concerning the ISL–SLA Relationship

Overall Relationship

The population estimate of the relationship between ISL and adult SLA was $r = .13$. The 95% credible interval (CrI) around this r-value was .05 to .22. Figure 16.1 graphically represents the posterior distribution of the correlation coefficient. The (posterior) probability that the synthesized correlation was larger than zero (i.e., $r > 0$) was 99.85%. In other words, the relationship between ISL and adult SLA was positive with a probability of 99.85%. However, it was also clear that the effect itself was very small, because a correlation of $r = .13$ means that ISL explained only 1.69% of the variance in adult SLA (or vice versa). Again, this small effect may be due to lurking moderator variables, which can potentially pull the estimate toward either the positive or the negative end. I², a statistic that quantifies the amount of study heterogeneity, was 0.62 in the current dataset. According to Deeks, Higgins, & Altman (Reference Deeks, Higgins, Altman, Higgins, Thomas and Chandler2019), this value represents substantial heterogeneity among study effect sizes. We thus posited that there was good reason to conduct a moderator analysis to investigate how much of the heterogeneity was attributable to moderators.
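The quantities reported in this paragraph can be related to posterior draws with a brief sketch. The draws below are simulated stand-ins generated from a normal distribution on the Fisher's z scale, not the authors' posterior; the spread of 0.043, the seed, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def variance_explained(r):
    """Proportion of shared variance implied by a correlation (r squared)."""
    return r ** 2


def summarize_posterior(z_draws):
    """Back-transform draws from Fisher's z to r, then summarize them."""
    r_draws = sorted(math.tanh(z) for z in z_draws)
    n = len(r_draws)
    if n == 0:
        raise ValueError("no draws supplied")

    def quantile(p):
        # Nearest-rank quantile, adequate for a large number of draws.
        return r_draws[min(n - 1, int(p * n))]

    return {
        "median": quantile(0.5),
        "cri_95": (quantile(0.025), quantile(0.975)),
        "prob_positive": sum(r > 0 for r in r_draws) / n,
    }


# Simulated stand-in draws centered on the z value that corresponds to r = .13.
rng = random.Random(1)
simulated_z = [rng.gauss(math.atanh(0.13), 0.043) for _ in range(20_000)]
summary = summarize_posterior(simulated_z)

print(f"variance explained at r = .13: {variance_explained(0.13):.2%}")  # 1.69%
print(f"P(r > 0) in the simulated draws: {summary['prob_positive']:.4f}")
```

The posterior probability of a positive effect is simply the share of draws above zero, which is why a small point estimate can still be credibly positive when the posterior is narrow.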

Figure 16.1 Posterior distribution of the population correlation coefficient

Note. The dotted line shows the point estimate of the distribution, r = .13.

Moderator Analysis

After incorporating each moderator into the multilevel meta-analysis model (see above) as a predictor variable, we used the R package emmeans (version 1.5.3, Lenth, Reference Lenth2020) to obtain model-based means of the synthesized correlations at each level of the moderators. Figure 16.2 summarizes the results. At first glance, most estimates did not seem to markedly differ, but we also found some notable differences based on proficiency and scoring. First, the size of the correlations seemed to gradually decrease as the participants’ proficiency moved from None ($r = .20$ [.05, .37]) through Intermediate ($r = .11$ [−.02, .27]) to Advanced ($r = .06$ [−.08, .21]). Second, the ISL–SLA relationship was stronger when the scoring method of ISL measures was Accuracy Based ($r = .18$ [.08, .27]) rather than RT Based ($r = .07$ [−.02, .18]). Lastly, the correlation became stronger when L2 outcome measures were Accuracy Based ($r = .20$ [.11, .29]) rather than RT Based ($r = .05$ [−.04, .16]). There were, however, no discernible differences across different input modalities of ISL measures (Auditory: $r = .15$ [.02, .29] and Visual: $r = .12$ [.02, .22]) nor across different test modalities of L2 outcome measures (Comprehension: $r = .14$ [.05, .24] and Production: $r = .13$ [.00, .25]).

Figure 16.2 Estimated mean of the synthesized correlations at each level of moderator variables

Discussion and Implications for Aptitude Theory

This chapter has considered aptitude for ISL as a potential predictor of L2 outcomes, synthesizing this body of research to reveal study features, cognitive ID measures, and aggregate findings. Regarding our first research question, the 23 reports showed that studies have been carried out with participants from various language backgrounds to assess the role of ISL ability, in either ab initio or ongoing learning of languages including Arabic, English, Japanese, Russian, Spanish, and others. The L2 outcomes assessed in these studies, which included measures of comprehension and production, were consistent with those used in SLA research in general. Our chapter makes a valuable contribution insofar as its second research question (concerning measures of ISL) shed light on the use of a variety of ISL measures. Table 16.2 classified these instruments by training modality (visual or auditory) and scoring method (accuracy- or speed-based). We note again that cross-modal training paradigms are also possible (Misyak, Christiansen, & Tomblin, Reference Misyak, Christiansen, Tomblin, Taatgen and van Rijn2009), and these paradigms represent a gap in the current ISL–SLA literature. Furthermore, a richer understanding may come from studies using multiple ISL measures. This chapter could thus be used to guide selection of ISL measures in future studies (see also Perruchet, Reference Perruchet2021).

The results for the third and final research question may help to temper researchers’ expectations concerning the magnitude of the association between ISL and SLA.⁴ The overall correlation between ISL and SLA that was estimated using a Bayesian multilevel random-effects model was very small, $r = .13$ [.05, .22]. This is noticeably weaker than the relationship between working memory and L2 outcomes ($r = .25$ in Linck et al., Reference Linck, Osthus, Koeth and Bunting2014) or language aptitude and grammatical development ($r = .31$ in Li, Reference Li2015). Nonetheless, this is the first study to use meta-analytic techniques to demonstrate a positive correlation between ISL and SLA based on the primary literature, which has yielded mixed results. Our moderator analyses suggested that the influence of ISL ability may gradually weaken as proficiency increases ($rs = .20$, .11, and .06, for no, intermediate, and advanced proficiency, respectively). This finding contrasts with the effects for procedural memory on adult grammatical abilities, which ranged from $r = -0.1$ (low language experience) to $r = 0.5$ (high language experience) in Hamrick, Lum, & Ullman (Reference Hamrick, Lum and Ullman2018), implying that ISL and procedural memory, regarded as distinct constituents of ILA, may vary in their influence at different stages of L2 development. Perhaps ISL is relevant to making input-based generalizations early on and procedural memory to developing implicit knowledge later, though more research is needed concerning this possibility. Additionally, the finding also contrasts with previous research on implicit learning aptitude (e.g., Linck et al., Reference Linck, Hughes and Campbell2013), which reported that implicit aptitude was particularly predictive of L2 learning at high levels of proficiency.
This may be in part because the studies with absolute beginners (i.e., no proficiency) were mostly based on controlled experiments using a miniature or an artificial language (or part thereof) as the target of learning. Compared to other naturalistic samples or quasi-experimental studies, these controlled experiments afford more control and consistency in task administration, including that of ISL and L2 outcome measures, resulting in higher task reliability (and hence higher correlations). This also accounts for another key finding from the moderator analyses, that the effects of ISL ability were stronger when L2 tasks were accuracy based because those experimental studies all relied on accuracy-based scoring methods for their L2 outcome measures. In this light, care must be taken when interpreting the results because they can be only as good as the validity and reliability of the ISL and L2 outcome measures used in each study. Unfortunately, we were not able to investigate this issue further as fewer than half of the studies provided reliability estimates for the ISL measures.

ISL ability, which involves sensitivity to the natural properties of input, is not entirely inscrutable, and L2 research to understand its role is growing. This chapter has offered synthetic evidence directly relevant to investigating the construct of ILA (Granena, Reference Granena2020; Li & DeKeyser, Reference Li and DeKeyser2021) and, more generally, to promoting views of L2 aptitude as multi-componential (e.g., Skehan, Reference Skehan, Granena, Jackson and Yilmaz2016; Wen et al., Reference Wen, Skehan, Biedroń, Li and Sparks2019). These findings can be applied not only to advance research agendas and test development, but also to evaluate theoretical proposals, such as claims of a default implicit processing mode in adult SLA.
