Vocabulary learning from reading and listening: Replications of Brown et al. (2008) and Vidal (2011)

Abstract
There has been a great deal of interest in second language vocabulary studies regarding the potential of reading as a source of incidental vocabulary learning. More recently, several studies have also focused on comparing reading with other input modes, such as listening, or reading-while-listening. Among these studies there are two – Brown et al. (2008) and Vidal (2011) – that have been extensively cited because of the evidence they provided regarding the differential effects of reading versus listening in promoting incidental vocabulary gains. The present paper presents several arguments for replicating these two original studies, as well as specific ideas on how such replications could be conducted.


Introduction
The aim of the present paper is to suggest some replications of two key studies in the field of incidental vocabulary learning from reading and listening. There is a lot of interest in the SLA literature in how exposure to second language (L2) written and aural input can help learners develop their L2 lexicons (Webb, 2020). As is the case in many areas of second language acquisition (SLA), claims about factors that affect learning are often based on very few empirical studies, and replications are necessary in order to validate, verify, and generalize findings from previous research (Porte, 2012; Porte and McManus, 2019).
Two studies were chosen for replication because of their relevance in the field of vocabulary studies for the evidence they have presented regarding incidental vocabulary learning from reading and listening. The studies were published in international academic journals that are well respected in the field. The first study by Brown et al. (2008) was published in Reading in a Foreign Language, and the second by Vidal (2011) in Language Learning. The studies examine the potential of input from different sources, graded readers (Brown et al., 2008) and academic discourse (Vidal, 2011), to promote incidental vocabulary learning. The two studies focused on the effect of input mode on incidental vocabulary learning and compared the amount of vocabulary that L2 learners acquired through written input (reading), aural input (listening), and additionally, in Brown et al. (2008), through a combination of both (reading-while-listening).
It is important to conduct replication studies of these two papers in order to verify their findings, as well as generalize their results beyond their specific contexts (adult college English as a Foreign Language (EFL) students in Japan and Spain, respectively). Such replication studies would be beneficial for theoretical purposes in order to advance our knowledge of the role of input in fostering L2 vocabulary learning, especially regarding the differences between written and aural input. Moreover, replication studies would also provide more solid evidence to help teachers and L2 learners choose between reading, listening, or reading-while-listening if they are interested in teaching or learning new vocabulary from exposure to L2 input.
The present paper will first provide some background information on incidental vocabulary learning from different input modes (Section 2). Then, each of the two studies selected will be summarized, with special emphasis on the methodology and findings, together with the perceived needs for replication based on their reported limitations, as well as on the need to verify and generalize their results. After this, concrete ideas for replication studies will be proposed (Sections 3 and 4). Finally, Section 5 will conclude the paper.

Background
Considering the vast number of word families that L2 learners need to know in order to understand authentic aural and written input (around 6,000-7,000 and 8,000-9,000, respectively) (Nation, 2006), and also the complexity of acquiring the multiple word knowledge components (Nation, 2013), it seems unrealistic to expect such knowledge to come exclusively from intentional, vocabulary-focused activities (Pellicer-Sánchez, 2016). It has been claimed that incidental vocabulary learning from reading, as well as other sources, such as TV viewing, online gaming, active use of social networks, and so forth, contributes to L2 learners' vocabulary development, and many studies have set out to explore to what extent L2 learners can acquire new words in this fashion (see Webb, 2020). Incidental learning is broadly defined as 'learning without the intention of doing so' (Brown et al., 2008, p. 136). However, considering the obvious difficulty in getting into the learners' minds to examine their actual intention to learn vocabulary, many researchers have focused on the methodological features of learning experiments in order to operationalize incidental learning (Hulstijn, 2003). One of these features is test announcement, and incidental learning is assumed when learners are not warned about a subsequent vocabulary test. Alternatively, some studies have operationalized incidental vocabulary learning as that which occurs when learners are engaged in a meaning-focused task, such as reading comprehension.
Although lexical gains tend to be more robust when using more explicit teaching approaches than when reading for meaning comprehension (Laufer, 2003), several studies have provided evidence for the potential of reading to promote incidental vocabulary learning (Pellicer-Sánchez, 2016; Pellicer-Sánchez and Schmitt, 2010; Waring and Takaki, 2003; Webb and Chang, 2012, 2015). It is assumed that learners need to read extensively for them to experience a significant boost in their vocabulary knowledge and encounter novel words multiple times (Grabe and Stoller, 2011).
Incidental learning from aural input (listening) has not been examined as widely as incidental learning from reading. However, there is evidence that L2 listeners can also expand their lexicons by encountering words through this input mode (Van Zeeland and Schmitt, 2013; Vidal, 2003).
Recently, some studies have examined aural input in combination with written and/or visual input. On the one hand, we have the reading-while-listening mode, which has been claimed to be beneficial for the learning of single words and also collocations (Vu and Peters, 2021; Webb and Chang, 2015). Often, when the students engage in reading-while-listening, for instance through graded readers, they also get visual support from the images included in the books (Serrano and Pellicer-Sánchez, 2022). In the case of TV/video viewing, there is also visual support, aural input, and written input if captions are used. This multimodal type of input has also been shown to promote L2 vocabulary development (Montero Perez et al., 2013; Peters and Webb, 2018; Pujadas and Muñoz, 2019; Rodgers and Webb, 2019).
While all these studies provide evidence of incidental vocabulary learning from written and aural input, for theoretical as well as pedagogical purposes, it is interesting to know which mode/s promote/s more learning, for whom, the ideal frequency of words in the input, or what dimension/s of word knowledge can be expected to improve more clearly. Among the studies that just focus on linguistic input (and not on audiovisual material), Brown et al. (2008) as well as Vidal (2011) are key references. Brown et al. (2008) is the only study that has compared reading-only, reading-while-listening, and listening-only for the learning of single words and was one of the first to examine the role of input modality in incidental vocabulary learning. Others have focused on different input modes; for example, reading-only versus reading-while-listening (Webb and Chang, 2012) or reading-only, reading-while-listening, and reading with input enhancement (Vu and Peters, 2021). The findings from Brown and colleagues have been cited extensively (689 citations on Google Scholar on 22 April 2022) and are typically presented as evidence for the superiority of written text (also in combination with audio) versus aural input to promote incidental vocabulary learning. However, the lack of statistically significant differences between reading-only and reading-while-listening (also reported by Vu and Peters, 2021) contrasts with the statistically significant advantage of the latter mode found by Webb and Chang (2012), which is another reason why revisiting the study by Brown et al. (2008) would be welcome. Similarly, Vidal (2011) is a key reference in research comparing reading and listening, and the results of this study are also typically cited in support of the claim that reading is a more beneficial source of vocabulary learning than listening (341 citations on Google Scholar on 22 April 2022).
This presupposed advantage for the reading mode over the listening mode has been assumed as valid until recently. Webb and Chang (2022) show that, for the learning of collocations, however, listening seems to be as effective as reading, and reading-while-listening is claimed to be the most beneficial input mode. Although this study targets collocations while the two focused on in the present paper examine single-word knowledge, it is important to do replication studies of Brown et al. (2008) as well as Vidal (2011) and re-examine the generalizability of their results. Additionally, such replications might also help to confirm Webb and Chang's (2022) claims that single words and collocations are differentially affected by input modality. A final reason for selecting these two studies for replication is that they considered two different sources of L2 input: while Brown et al. (2008) examined graded readers, Vidal (2011) focused on academic lectures, and it is interesting to examine in more depth the potential of both types of materials to promote incidental vocabulary learning, also because of the obvious pedagogical implications.
3. Suggested study for replication 1: Brown et al. (2008)
3.1. Background to the study
Brown et al. (2008) examined incidental learning of vocabulary from three graded readers to which a group of Japanese EFL learners from a private university were exposed in three different modes. In the reading-only mode, the students read the graded reader silently; in the listening-only condition, the participants listened to the story; and in the reading-while-listening mode, they read at the same time as they listened to the story. In all three conditions, the learners had access to the pictures included in the graded reader and were also provided with a short, written summary of the story.
There were three experimental groups: Group A (n = 12), Group B (n = 14), and Group C (n = 9), all exposed to each of the three target graded readers in one mode in a counterbalanced fashion. The students spent approximately 60 minutes on each story, regardless of mode. Twenty-eight words from each book were replaced by pseudowords (total = 84), created by making changes in the spelling of existing known words. The study focused on the learning of these words at intervals immediately after the story, one week later, and three months later. The vocabulary tests included a multiple-choice meaning recognition test (with the target meaning and three distracters) and a meaning-translation test, in which the students were asked to provide the meaning of the target words in Japanese. Apart from comparing the learning rates of the target words in the three different modes, the researchers also examined the role of frequency of appearance of the target words, the role of the testing instrument, and students' preferences.
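For researchers planning a replication, a counterbalanced assignment of this kind can be sketched as a 3 x 3 Latin square. The sketch below is only an illustration under stated assumptions: the original report does not specify which group received which story in which mode, so the rotation used here, the story labels, and the function name `latin_square_design` are hypothetical.

```python
# Hypothetical labels: the actual titles of the graded readers are not given here.
GROUPS = ["A", "B", "C"]
STORIES = ["Story 1", "Story 2", "Story 3"]
MODES = ["reading-only", "reading-while-listening", "listening-only"]


def latin_square_design(groups, stories, modes):
    """Rotate the modes across groups so that every group meets each story
    in a different mode and every story is presented once in each mode.
    The rotation itself is an assumption, not the original assignment."""
    n = len(modes)
    return {
        group: {story: modes[(g + s) % n] for s, story in enumerate(stories)}
        for g, group in enumerate(groups)
    }


design = latin_square_design(GROUPS, STORIES, MODES)
for group, plan in design.items():
    print(group, plan)
```

With this layout, mode is a within-participant factor and any story effect is spread evenly over the three modes, which is the purpose of counterbalancing.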
The results of the study show that on the immediate posttest, 48% of the target words were learned in the reading-while-listening mode, 45% in the reading-only mode, and 29% in the listening-only mode, according to the multiple-choice meaning recognition test. The results of the meaning-translation test were significantly lower, showing gains of 16%, 15%, and 2% for reading-while-listening, reading-only, and listening-only, respectively. In both tests, however, the differences between reading-while-listening and reading-only were not statistically significant, unlike the differences between each of the two reading modes and listening-only. When examining the knowledge of the target words on the delayed posttests, the authors report that for the multiple-choice test, scores did not significantly change in the reading-while-listening and reading-only modes, while they significantly increased in the listening-only mode. In the translation test, participants' scores in the reading-while-listening and reading-only modes significantly decreased across time, while no statistically significant differences were registered for the listening-only mode.
Regarding frequency, the target words appeared within a range of 2-20 times in the graded readers. The results show that higher frequency led to more learning in the reading-only and reading-while-listening modes according to both vocabulary tests. Frequency was not found to have a clear role in the listening-only mode and the authors suggest this might be because of floor effects. Brown et al. (2008) further suggest that it might be necessary for words to appear more than 20 times for them to be incidentally learned in the listening-only mode.
Although, as explained above, the scores of both tests showed the same between-mode differences, there were statistically significant differences within each mode between the scores of the multiple-choice test, which were higher, and the translation test. This finding has important implications when interpreting the results of studies on L2 vocabulary learning as it points towards the crucial role of the testing instrument when assessing vocabulary gains from reading/listening. Additionally, this finding further confirms the importance of considering different dimensions of word knowledge when assessing incidental vocabulary learning from reading/listening (Pellicer-Sánchez and Schmitt, 2010).
Finally, according to students' answers to the questionnaire about their attitudes towards the different modes, the students found the story in the reading-while-listening mode easier to understand and more interesting than in the other two modes. This preference for the reading-while-listening mode has also been found in other studies (Serrano and Pellicer-Sánchez, 2022; Tragant et al., 2016).

Approaches to replication
There are several reasons why replications of Brown et al. (2008) would be desirable. First, the results of this study are usually presented as evidence for the superiority of the reading-while-listening mode to promote incidental vocabulary learning, or for the higher difficulty in picking up words from listening than from reading. More studies need to be performed in order to find more evidence to further support this claim, as no other studies exist that compare incidental learning of single words in these three modes. Second, this paper examines incidental vocabulary learning from graded readers, which have been shown to be an excellent source of L2 vocabulary learning (Llanes and Tragant, 2021; Webb and Chang, 2015). It is important to delve deeper into their potential and re-examine the results obtained by Brown and colleagues in order for such results to be generalized to other participants of a different age group and in other contexts (the study only included 35 adult Japanese EFL learners). Replicating the study would help us deepen our knowledge about the potential of graded readers to promote incidental vocabulary learning as well as get more solid evidence regarding the input mode that is more likely to facilitate incidental vocabulary gains.
The authors themselves call for a replication of their study with learners in other contexts: '…this study examined only Japanese learners. Therefore, learners from other language backgrounds should be investigated as well. A replication of this experiment would be welcomed' (p. 158). It would be especially interesting to examine whether the same differences between the three modes are also found in the case of EFL learners who speak first languages (L1s) that are closer to English (e.g., Dutch, Danish, or Swedish), and who are also more used to receiving aural input in this language through audiovisual material from a very young age. The study by Lindgren and Muñoz (2013) examining EFL listening and reading skills of primary school children in six European countries (Croatia, Italy, the Netherlands, Poland, Spain, and Sweden) found that listening and reading skills were related to both L1 distance to English as well as out-of-school contact, and that the Dutch and the Swedish pupils outscored the other L1 groups in both aspects. Other studies have also shown the high degree of engagement with English out of school, especially through audiovisual material, in Denmark (Muñoz et al., 2018), Sweden (Henry, 2019), or Belgium (De Wilde et al., 2020). Replicating the study by Brown and colleagues with learners of other L1 backgrounds would undoubtedly contribute to generalizing their findings beyond their context.
Useful information might also be obtained from a close replication involving young learners (primary or secondary school) instead of university students, the target population in the original study. Graded readers are more commonly used in schools and extensive reading programs including these materials have been implemented in different contexts (Lightbown, 1992; Tragant et al., 2016), which is why analyzing the role of modality in incidental vocabulary learning for this population would be especially important. Typically, graded readers for young learners come in the form of audio-books. However, it would be useful to examine further whether this reading-while-listening mode promotes more vocabulary learning than reading-only or listening-only (with minimal pictorial support, as in Brown et al., 2008). Children are in the process of developing their reading skills, also in their L1, which is why the differences between input modes in their contribution to vocabulary learning might not parallel the findings reported by Brown and colleagues. To the author's knowledge, there is only one study comparing two modes in the case of young learners (reading-only and reading-while-listening), showing advantages for the latter mode (Tragant et al., 2019), but the graded readers were not fiction and the listening mode was not considered. Apart from the obvious pedagogical implication, this close replication (just changing the participants' age) would also provide insights into the generalizability of the findings reported in the original study to a younger population.
It would also be desirable to perform a close replication of this study by making one variable change in the methodology in terms of the information provided to participating students. In this case, the replication should be performed with a group of similar participants (adult Japanese EFL learners of equivalent proficiency) and using the same material but creating a methodological condition that would probably lead to more incidental learning than the one in the original study. In the procedure section, the authors report that the students were given the following information about the study:
. . . the main focus of the study was a 'vocabulary-learning strategies program'. . . to determine whether they learn vocabulary better from reading, reading-while-listening, or listening to stories. It was explained that they would read and listen to three stories in which certain words had been changed. The rationale for, and examples of substitute words were explained, but none of the actual test items were cited. They were told to enjoy reading and listening to the stories and to do their best to guess the meanings of the substitute words. Afterwards, they would have to answer some questions. (pp. 143-144)
After hearing that the purpose of the stories was to examine vocabulary learning and after their attention had been turned to some 'substitute' words, it would not be surprising if many students actually engaged in intentional learning of those words, despite the fact that they did not know explicitly when they would be tested on their knowledge of those words. The exceptionally high learning rates in the multiple-choice test in the reading-only or reading-while-listening modes compared to other studies (see Webb, 2020) might reflect this, and it would be useful to be in a position to dismiss such a potential effect. Even though analyzing pure incidental learning in classroom settings might be challenging (the participants, after all, are in an L2 class in which there is, or should be, an intention to learn), intentional learning could be minimized in some ways. For instance, the students might be told that the main objective of using graded readers is to examine how easy/difficult it is for learners to understand their content. Therefore, they should be encouraged to focus on meaning comprehension and to try to understand and remember as many details as possible about the stories. Alternatively, the researchers could tell the participants that they are interested in examining learners' preference for one input mode or the other. The results of this replication study would further support the validity of the findings regarding incidental vocabulary learning reported by Brown et al. (2008) and would also help researchers reflect on the impact that different operationalizations of incidental learning could have on vocabulary learning research.
A further close replication would involve a change in the testing procedure. In the original study, the students were tested on the same set of target words three times: immediately after the treatment, one week later, and three months later. While it is relevant to examine short-term learning as well as retention of the meaning of the target words, a testing effect cannot be ruled out in the study by Brown et al. (2008). In fact, multiple testing could at least partly explain the statistically significant increase from the immediate to the two delayed posttests in the listening-only mode in the meaning-recognition test. A close replication would involve between-participant testing, with participants being tested only once and one third of the participants being tested each time. In order to do this, a larger participant pool than in the original study would be necessary, at least three times as many participants. Alternatively, instead of having three testing times, the replication study could include an immediate and a delayed posttest only, following the suggestion above for between-participant testing, in which case a lower number of participants would be required (i.e., twice as many as the original study).
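The between-participant testing schedule suggested above could be set up, for example, by randomly allocating each participant to a single testing time. The following is a minimal sketch under stated assumptions: the participant codes, the labels for the testing times, the fixed random seed, and the function name `allocate_test_times` are all hypothetical, and the pool of 105 simply illustrates "three times as many participants" as the original 35.

```python
import random
from collections import Counter


def allocate_test_times(participant_ids, test_times, seed=0):
    """Randomly allocate each participant to exactly one testing time,
    keeping the number of participants per testing time as equal as possible."""
    rng = random.Random(seed)      # fixed seed so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled participants out to the testing times in turn.
    return {
        pid: test_times[i % len(test_times)]
        for i, pid in enumerate(shuffled)
    }


# Hypothetical pool: 3 x 35 participants, i.e. three times the original sample.
participants = [f"P{i:03d}" for i in range(1, 106)]
test_times = ["immediate", "one_week", "three_months"]

allocation = allocate_test_times(participants, test_times)
counts = Counter(allocation.values())
print(counts)
```

For the two-posttest alternative, the same function could be called with only two testing times and a pool twice the size of the original sample. In an actual replication, the allocation would probably be carried out separately within each experimental group so that mode and testing time remain balanced.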

Background to the study
The study by Vidal (2011) analyzed how much vocabulary a group of college students in Spain could incidentally learn by reading three academic texts or listening to three academic lectures, as part of their English for Specific Purposes (ESP) course. The participants were divided into two groups: 80 received written input, and 112 aural input. The time spent reading and listening was equivalent, around 14-15 minutes, after which the students answered some comprehension questions. The students were never told that they had to focus on vocabulary learning and were encouraged to focus on understanding the input they were presented with. The vocabulary test consisted of 36 target words (12 from each reading/lecture), together with an additional set of 18 nonwords. For each word on the test, the students needed to: (1) say whether they had seen or heard it and where/when; (2) provide an explanation of the meaning in Spanish or English; (3) provide a translation in Spanish; and (4) include the word in a sentence in English. The students did the test immediately after each reading/lecture and one month later. Apart from comparing incidental vocabulary gains through reading versus listening, the study also aimed to examine the role of different factors: frequency of occurrence, type of word, type of elaboration, predictability from word form and parts, and students' initial English proficiency.
The results of the vocabulary test provide evidence for the potential of both academic texts and academic lectures to promote incidental vocabulary learning, as both experimental groups made statistically significant vocabulary gains on the immediate and delayed posttests, while a control group only performing the tests did not. The gains in the immediate posttest ranged from 19% to 38% for the lower and higher proficiency groups, respectively, in reading and from 7% to 28% in listening, although roughly half of the gains were lost at the delayed posttest. Another relevant finding of the study was that significantly larger gains were made after reading than after listening, but differences decreased as proficiency increased and there were no differences between the two input modes for high proficiency learners. According to Vidal (2011), the main difference between reading and listening lies in form recognition, as learners learned the form of words they had encountered in the texts more often than the ones encountered in the lectures.
Interestingly, in terms of retention, the results show that the incidental vocabulary gains made through listening were more durable than from reading, especially for higher-proficiency learners, which is in line with the results of the delayed posttests presented by Brown et al. (2008), in which words acquired through reading were more readily forgotten than through listening.
Finally, Vidal (2011) found that different factors predicted incidental vocabulary learning through reading as opposed to listening. For instance, the best predictor in the case of listening was similarity to the L1 whereas for reading it was frequency, which in turn was the least important in predicting incidental vocabulary gains through listening.

Approaches to replication
This study deserves replication because it is one of the few that has compared the reading and listening modes in their potential to promote incidental vocabulary learning. After this study, as well as Brown et al.'s, the field seems to have assumed as fact that reading is a better source of incidental vocabulary learning than listening. However, returning to Vidal's study and reconsidering some of the aspects of its methodology could help researchers obtain more nuanced and generalizable findings.
First of all, an approximate replication study could be performed with younger learners. In this case, two variables would be changed: first, the age of the participants, but also the materials for the lectures/readings, which should be adapted for this population. Since the study concerns the acquisition of academic vocabulary from written and oral input, replicating the study with primary or secondary school students could be highly useful in the context of Content and Language Integrated Learning (CLIL). Since one of the main L2 areas for which this approach can be expected to be beneficial is vocabulary development (Agustín-Llach and Jiménez-Catalán, 2007; Buyl and Housen, 2014), it would be interesting to examine whether reading is also a better source of academic vocabulary learning than listening to the teacher, especially in primary education, when reading skills are not as well-developed even in the L1. This approximate replication would help generalize the findings of the study beyond university students and would also have important implications for the application of CLIL in primary/secondary schools.
Another useful avenue for a close replication would involve including collocations and not just single words. As was mentioned in the introduction, Webb and Chang (2022) did not find any differences between the listening and the reading modes for the learning of collocations. According to Webb and Chang (2022), listening might be more helpful to identify collocations than single words because of the prosodic information it provides, which facilitates the segmentation of 'chunks'. Replicating Vidal (2011), including both single words and collocations, would help generalize the original findings to other aspects of vocabulary knowledge as well as explain some of the contradictory findings reported in the literature comparing reading and listening.
A further suggestion for a close replication could be to increase the number of exposures to the target words that learners receive. The target words appeared from one up to six times, which might be one of the reasons for the small gains experienced by the students. Increased exposure might lead to higher gains overall but crucially might also lead to fewer differences between the reading and listening modes. As Vidal (2011) suggests, words might need to be repeated more than six times for them to be picked up from listening. Brown et al. (2008) even speculate that more than 20 exposures might be necessary for incidental vocabulary learning from listening to take place, which might be one of the reasons for the lack of a significant effect of word frequency in the study by Van Zeeland and Schmitt (2013), which included a frequency range of 3-15 exposures. A replication of Vidal (2011) could thus be performed either by increasing the frequency of the target words in the same texts/lectures or by encouraging repeated reading/repeated listening, which has also been claimed to be a good way for learners to acquire vocabulary incidentally (Liu and Todd, 2016; Webb and Chang, 2012). The author herself encourages more research in this direction: 'Future research should aim at obtaining higher vocabulary gains by exposing learners to larger amounts of input' (p. 249), as such research can also help throw more light on incidental vocabulary learning in the reading versus the listening mode.
A final close replication of the original study could be performed by adding one more testing instrument. Considering the results reported by Vidal (2011), it is at the recognition level that there appeared to be more differences between modes. One of the reasons might be that the scores that targeted the other components of vocabulary knowledge involving recall were too low in both modalities, in line with other findings in the literature that show that learners first develop recognition before recall knowledge (González-Fernández and Schmitt, 2020). Vidal only tested FORM RECOGNITION by asking students whether they recognized having seen or listened to the target words. In the case of the form-meaning connection, the testing instrument only examined MEANING RECALL by asking students to provide a translation in the L1 or an explanation (in L1 or L2) of the meaning of the target words. It would be interesting to perform a replication also including a test of MEANING RECOGNITION by providing the students with several choices from which to select the meaning of the target words. A MEANING RECOGNITION test could capture some vocabulary gains that the meaning recall tests could not, as found by Brown et al. (2008) and also Waring and Takaki (2003). It would be useful for the field to further explore the differences between the reading and listening modes for incidental vocabulary learning for this other dimension of word knowledge. The multiple-choice test could include the L1 translation as well as three other distracters, plus an 'I don't know' option to minimize guessing. This meaning recognition test could be administered after the students had performed the meaning recall tests.

Conclusion
Considering the important role that incidental vocabulary acquisition can have in students' lexical development, it is important for the field to gain deeper knowledge and obtain generalizable findings about how the different input modes in which linguistic information is presented (reading, listening, and the combination of the two) contribute to lexical gains. The studies by Brown et al. (2008) as well as Vidal (2011) are key studies in the field and have guided many subsequent research studies, which have built on their findings. However, these findings were obtained with a specific group of EFL learners under specific methodological conditions. The present paper suggested several replication studies that could be performed by making some changes to the original methods used by the authors. Findings from such replications would contribute to the verification and generalization of the original results, which would certainly help the field move forward.