1. Introduction
This paper explores the use of replication as a research tool in second language pragmatics. Advocates of replication research point to advantages such as a better understanding of the findings of previous studies, confirmation of results, and enhanced generalizability (Porte & McManus, 2019; McManus, 2024). Replication has also been credited with illuminating aspects of research design, which has been an ongoing concern in L2 pragmatics research since its inception (Bardovi-Harlig, 2025). Two levels of replication are relevant to the current discussion: close replications and approximate replications. In a close replication study, only one major variable is modified, and all other aspects of the initial study are kept as constant as possible. In an approximate replication, two variables may be modified (McManus, 2023). Porte and McManus (2019, p. 26) also caution researchers to ensure that their replication study has ‘the same or greater methodological and analytical rigor as that of the original’.
Earlier discussions of replication in applied linguistics and second language acquisition have observed that few replication studies have been published, although replication has become more frequent thanks to the efforts of some researchers to educate the field about its value (see e.g. Language Teaching Review Panel, 2008; McManus, 2022, 2024, inter alia; Polio & Gass, 1997; Porte, 2012; Porte & McManus, 2019; Valdman, 1993). This paper takes a different approach, investigating replication research in terms of what it can tell us about pragmatics. It has three goals: to understand how replication has been used in pragmatics, to explore how replication research can enrich research in pragmatics and language learning, and to offer some suggestions for getting started on a replication project in L2 pragmatics.
Replication research is not new in L2 pragmatics. Two replications were published in the early 2000s (Billmyer & Varghese, 2000; Niezgoda & Roever, 2001). Bardovi-Harlig (2010) used ‘replicability’, that is, whether a study provided sufficient information about method to allow subsequent replication, as one criterion in evaluating 152 studies in L2 pragmatics. Of the 111 studies that reported on experimental (vs observational) data, 72 (65%) provided sufficient information about method to support replication. Another indication of recent interest in replication in L2 pragmatics is Taguchi and Li’s (2019) article, which outlines how to replicate two studies investigating contextual and individual influences on pragmatic competence (Bardovi-Harlig & Bastos, 2011; Taguchi et al., 2016b).
As we will see, replication studies in L2 pragmatics have been used to test task effects, interpret results, and investigate learning contexts. These studies have led to improvements in elicitation tasks and other aspects of method, identified task effects, confirmed or extended results, and extended findings to other languages. The sections that follow examine replications of studies in pragmatics, discuss repetitions of specific scenarios across studies in light of replication, and conclude with suggested starting points for variable manipulation in L1 and L2 pragmatics studies.
2. The replications
Collecting replication studies raises the question of how we recognize replications. McManus (2022) suggests that they should be identified by ‘replication’ in the title (e.g. Understanding indirect meaning: A close replication; Knight & Edmonds, 2024) or in the abstract (e.g. ‘set out to replicate’; Schauer, 2006). The search for replication studies in pragmatics began with pragmatics journals (Journal of Pragmatics and Intercultural Pragmatics), using the search term ‘replicat-’ (-e, -ed, -ion, -ing). In applied linguistics journals, I searched for ‘replicat-’ and ‘pragmatics’ anywhere in the article. This was followed by a search on Ebsco and Google Scholar for ‘replicat-’ and ‘pragmatics’. Uses of ‘replicated’ that referred to results consistent with an earlier study, rather than to a replication of a study, were excluded (see also Marsden et al., 2018, supplement 1, p. 15). All the original studies were articles published in peer-reviewed journals. Nine self-reported replications were identified. One study was excluded because it did not replicate the instrument used in the original study.
It is worth noting that Knight and Edmonds (2024) were the only authors to identify their study as a replication in the title. Two studies identified themselves as replications in the abstract (Chen & Yang, 2010; Schauer, 2006). The book chapter (Niezgoda & Roever, 2001), which lacks an abstract, identified itself as a replication in the introduction. Idemaru et al. (2019) signalled components of replication in the abstract, namely the use of the same method and a comparison with the earlier studies (i.e. ‘compared to previously published Korean data collected through the same methodology’), and described the study as a replication in the introduction (p. 519). Licea-Haquet et al. (2019) also used a paraphrase in the abstract and ‘replication’ in the body of the paper. The other two articles used paraphrases, one referring to ‘original’ and ‘extended’ formats of items (Wiegmann, 2023), and the other to ‘descriptions of situations […] formulated by [the original study]’ (Billmyer & Varghese, 2000). Replications are probably under-reported (McManus, 2022), so the studies discussed here should be considered examples rather than an exhaustive account.¹ For that reason, I have included L1 empirical studies as well as L2 studies for breadth of coverage. (I will argue that replicating L1 studies by manipulating the L1/L2 variable would be informative; see Section 4.1.)
Eight replication studies, drawn from one edited volume and six journals (Journal of Pragmatics, Intercultural Pragmatics, Foreign Language Annals, Analysis, Language Learning, and Applied Linguistics), are discussed here. Table 1 provides an overview of the original studies and the replications. Two of the replications each draw on two original studies. Idemaru et al. (2019) replicates procedures and tasks from Winter and Grawunder (2012) and Brown et al. (2014); Wiegmann (2023) replicates the items from Weissman and Terkourafi (2019) with the format used by Reins and Wiegmann (2021). Two replication studies, Niezgoda and Roever (2001) and Schauer (2006), replicate the same study, Bardovi-Harlig and Dörnyei (1998). Four of the replications were carried out by one or more of the original authors (indicated by underlining in Table 1).² The variables manipulated by the replications include time (diachronic comparison), item format, subject population, language, and learning context. The descriptions of the studies do not capture all the details of the original or replication designs but focus on the pragmatics of the studies.
Table 1. Replication studies in pragmatics

Note: The underlined names indicate researchers who participated in both the original and replication studies.
2.1. Time: Diachronic comparison
Chen and Yang (2010) conducted a close replication of Chen (1993), which studied compliment responses in Mandarin Chinese. They administered the same four-scenario written discourse completion task (DCT), which encouraged multiple alternative responses, changing no features of the task. The replication took place 17 years later in the same city, Xi’an, and at the same university as the original study. The manipulated variable is thus the year of data collection. The results show a noticeable shift in responses to compliments in Mandarin Chinese in Xi’an. Whereas Chen (1993) reported that the dominant response to a compliment in Mandarin was rejection (95.7% of compliment responses), Chen and Yang (2010) found that the preferred response was acceptance (62.6%), followed by deflection (28.3%). The replication provides a picture of diachronic change over a relatively short time.
2.2. Refining method
Replications may also be undertaken to study the effect of task design. In the two methodological investigations considered here, the manipulated variable is item format. Rose (1992) used a six-item written DCT to study the effects of including a hearer response. Half of the participants completed the DCT with no hearer response, and the other half completed the DCT with a hearer response following a blank for a turn to be supplied by the participant. There was no significant difference in native speaker (NS) responses to the two DCT formats, but there was a slight tendency for longer requests and greater use of downgraders and supportive moves on the DCT with no hearer responses. Billmyer and Varghese (2000) conducted a partial replication of Rose (1992) using the DCT with no hearer responses. Rose (1992) had used unelaborated scenarios to elicit requests, as in Example (1). Billmyer and Varghese added a second DCT format with elaborated scenario descriptions that provided more information about settings and speakers, as in Example (2). They also manipulated the participant population by testing L2 English speakers as well as L1 speakers. Half of the L1 speakers and half of the L2 learners took Version 1 (Example 1), and the other half of each group took Version 2 (Example 2).
1. Rose (1992), Notes (without hearer response)
You missed class and need to borrow a friend’s notes. What would you say?
YOU:
2. Billmyer and Varghese (2000), elaborated Notes item
You are at the end of a history class and you are sitting next to Tom Yates. You missed last week’s class and need to borrow his notes. He has been in the same program as you for one year and you see him socially about once a month in a group. You will also be taking classes together in the future. He is a good note-taker and one of the best students in the class. You have borrowed his notes twice before for the same class and the last time you borrowed them he was reluctant to give them up. In two weeks you both have the final exam for your class. What would you say?
YOU:
3. Sample learner responses
a. Hi. I missed class. Would you mind if I handed your notes? (Short version, Notes, L2 speaker)
b. Tom, would you mind if I want to borrow your notes? I know it’s the third time that I asked for but I had no chance to join the class last week. I’ll be happy to help you if you need my notes in other classes. (Long version, Notes, L2 speaker)
(Billmyer & Varghese, 2000, p. 540)
The longer scenarios yielded longer responses, more supportive moves, and more personalized responses. A response like (3a) is more likely to follow (1), and one like (3b) is more likely to follow (2). The replication showed the importance of the amount of information provided in the scenario used to elicit production.
A second exploration of item design comes from Wiegmann (2023), who replicated Weissman and Terkourafi (2019), combining their items with the format of Reins and Wiegmann (2021) to investigate why the two earlier studies reached different findings. The studies investigated whether false or deceptive implicatures are lies. Weissman and Terkourafi (2019) cite the following example from Meibauer (2005, p. 1380) to illustrate a false implicature.
4. The story of the mate and the captain
A captain and his mate have a long-term quarrel. The mate drinks more rum than is good for him, and the captain is determined not to tolerate this behaviour any longer. When the mate is drunk again, the captain writes into the logbook: Today, 11th October, the mate is drunk. When the mate reads this entry during his next watch, he is first getting angry, then, after a short moment of reflection, he writes into the logbook: Today, 14th October, the captain is not drunk.
The mate’s entry implicates that it is unusual (and therefore remarkable) for the captain not to be drunk, although the statement itself is true: the captain was not drunk. Subsequent studies by Weissman and Terkourafi (2019) and Reins and Wiegmann (2021) experimentally investigated whether speakers regard false implicatures, like the one created by the mate’s entry, as lies. However, using different formats, they obtained conflicting results. Weissman and Terkourafi (2019) had found that false implicatures are not necessarily lies, whereas Reins and Wiegmann (2021) had found that false implicatures are lies. To reconcile these differences, Wiegmann (2023) replicated Weissman and Terkourafi (2019) as Condition 1 (Original, Example 5) and ran Condition 2 (Explicit, Example 6), in which the same scenarios were extended to state explicitly the speaker’s intention to deceive and the implicated content. The changes appear in square brackets in Example (6). The scenarios were followed by the question ‘Did the speaker lie?’ and a seven-point Likert scale from 1 ‘definitely not a lie’ to 7 ‘definitely a lie’.
5. Weissman and Terkourafi (2019), original format
(Hammer) Rumours have spread about an incident in the art studio yesterday. Alex was in the studio all day and saw Sarah, frustrated with a project, pick up a hammer, walk over to a statue and kick the statue over with her foot, causing it to smash all over the floor. The following day, Alex talks about the incident.
Mark: I heard Sarah had a meltdown in the art studio yesterday! What happened?
Alex: You should’ve been there! In a fit of rage, Sarah picked up a hammer and broke a statue.
6. Wiegmann (2023), explicit format
(Hammer) Rumours have spread about an incident in the art studio yesterday. Alex was in the studio all day and saw Sarah, frustrated with a project, pick up a hammer, walk over to a statue and kick the statue over with her foot, causing it to smash all over the floor. The following day, Alex talks about the incident [and wants Mark to believe that Sarah used the hammer to destroy the statue].
Mark: I heard Sarah had a meltdown in the art studio yesterday! What happened?
Alex: You should’ve been there! In a fit of rage, Sarah picked up a hammer and broke a statue.
[Mark comes to believe that Sarah used a hammer to destroy the statue.]
The replication showed that participants judged most of the original scenarios not to be lies (as in the original study) but, in contrast, judged the explicitly restated scenarios to be lies (Wiegmann, 2023, p. 115). Thus, the replication was able to demonstrate that item format was responsible for the different outcomes. Taken together, the replications in this section illustrate the importance of item format as a design feature.
2.3. Replication to expand languages investigated
Most branches of linguistics are interested in exploring multiple languages to determine whether language characteristics are universal or restricted to particular languages, and pragmatics is no exception. Within L2 pragmatics, either the first language (L1) or the second language (L2) may be the variable that is manipulated in the replication.
2.3.1. Manipulating the L1
Both replications of Bardovi-Harlig and Dörnyei (1998) manipulated the L1 variable. Bardovi-Harlig and Dörnyei (1998) tested Hungarian L1 EFL learners in Hungary and mixed L1 ESL learners in the Midwest of the US. Niezgoda and Roever (2001) tested Czech L1 EFL learners in the Czech Republic and mixed L1 ESL learners in Hawaii. Schauer (2006) tested German L1 students at home (EFL) and abroad in the UK (ESL), thus creating a German L1 ESL group, in contrast to the mixed L1 ESL groups in the two previous studies. Varying the L1 while keeping the L2 constant allows researchers to reuse the original task. (These studies are discussed in more detail in Section 2.4.)
2.3.2. Manipulating the focal or target language
Replications may also manipulate the language variable by changing the language under investigation. Changing the language of investigation requires that the original task be translated, checked for pragmatic appropriateness, and re-piloted. In this section, we consider three replications that manipulated the focal language and thus created new instruments, using the original tasks as models and following the task development procedures detailed in the original studies.
Idemaru et al. (2019) investigated phonetic cues of deferential speech in Japanese by replicating two studies on Korean, asking whether the phonetic features of Korean deferential speech are also found in Japanese. The replication had both a production component (Winter & Grawunder, 2012) and a perception component (Brown et al., 2014).
Winter and Grawunder (2012) investigated which phonetic cues signal deferential speech, using two oral production tasks: leaving a phone message for a higher- or lower-status addressee, and an oral DCT with five paired scenarios (addressed to a higher-status and to a lower- or equal-status addressee). They found that Korean speakers lowered their pitch when speaking in the formal register. A second study, Brown et al. (2014), tested the perception of intended politeness levels through phonetic cues in the absence of morphological or lexical information. The stimuli were ten request situations, each addressed to a lower- and a higher-status addressee and read by eight native speakers of Korean, yielding 160 items (8 speakers × 10 situations × 2 levels). One crucial utterance was extracted from each item for the perception task. Participants were asked to judge whether the speaker was speaking to someone ‘above the speaker’ or ‘below the speaker’. The authors found that L1 Korean listeners could perceive politeness without morphological or lexical indicators.
In their replication study, Idemaru et al. (2019) also used an oral DCT to elicit production data, following Winter and Grawunder (2012). They expanded the original five paired scenarios addressed to higher- and equal-status addressees (ten items) to six pairs (12 items). To prepare the perception stimuli, they translated the ten request scripts used by Brown et al. (2014), each at two politeness levels, into Japanese, slightly modifying four of the ten items and adding one pair, and then had 20 native speakers record them; the recordings provided a larger production sample and also served as the stimuli for the perception study. Following Brown et al. (2014), Study 2 extracted the crucial utterances from the script-reading task used in Study 1. The researchers selected recordings from eight speakers, matching the original total of 160 utterances. In the replication, participants heard paired sentences and indicated which one was spoken to a higher-status addressee (a departure from the original study, which asked participants to judge for each sentence whether the addressee was of lower or higher status).
Idemaru et al. (2019) compared a range of phonetic characteristics of deferential speech in Japanese to those previously reported for Korean. In both Korean and Japanese, deferential voice is quieter and breathier and shows less fluctuation in pitch and volume: ‘it sounds calm and soft’ (p. 517). The replication also revealed differences. Korean deferential speech shows lower pitch, whereas Japanese showed no consistent relationship between deference and pitch. Japanese listeners correctly identified the level of deference in only 56% of cases, whereas Korean listeners identified the levels more robustly, in 70% of cases. The results of the replication also challenge previous claims that Japanese polite speech is high-pitched and, consequently, the claim that deference (or polite speech) is universally associated with high pitch. The replication yielded new information about deferential speech in general and about Japanese in particular.
The second set of studies consists of psycholinguistic investigations of the comprehension and processing of speech acts, specifically the identification of the illocutionary force of an utterance in the absence of performatives. The original study was conducted in English (Holtgraves, 2008) and the replication in Spanish (Licea-Haquet et al., 2019). In Experiment 1, Holtgraves (2008) presented short conversations to L1 English speakers. In the experimental conversations, the last utterance performed a speech act such as thanking: ‘I appreciate your help so much. I couldn’t have done it without you’. In the control versions, the last utterance was altered so that it did not perform the speech act but included most of the same words: ‘He appreciated my help so much. He couldn’t have done it without me’. The probe word did not occur in the 12 speech act conversations or the 12 control conversations; 24 filler items were also created in which the probe word did occur. There were separate listening and reading conditions. When participants were presented with a probe word that named the speech act, such as Thank in the example above, and were asked whether the word had appeared in the utterance, they were more likely to report incorrectly that it had occurred in speech act utterances than in control utterances. Moreover, they took significantly longer to reject the probe word after speech act utterances. Holtgraves interpreted this as showing that comprehension entails speech act activation.³
Licea-Haquet et al. (2019) replicated the study with L1 Spanish speakers, using a translation of the written format of Experiment 1 in Holtgraves (2008) and piloting the conversations as Holtgraves had done for the original English version. In the first Spanish version, Licea-Haquet et al. used infinitive verb forms as the probe words for the experimental (speech act) and control items, but conjugated forms for the fillers. The native Spanish speakers did not show the same patterns as the English speakers, a difference that Licea-Haquet et al. hypothesized could be due to the form of the probe words. When the task was revised so that all the probe words were presented as infinitives, Spanish speakers took significantly longer to respond to the speech act trials than to the controls. That is, it took respondents longer to rule out the probe naming the speech act when the speech act had been performed than when it had not, even though the word did not occur in the last utterance of the conversation, just as in the original study. However, there was no difference in accuracy between the experimental and control conditions. Thus, the replication corroborated the increase in processing time, but not the reduced accuracy in rejecting the non-occurring probe word. This replication also illustrates the challenges of translation, especially when languages use non-equivalent forms, and it points to the need for additional replications to test the impact of the morphological forms of the probes.
The third set of studies discussed here investigated the interpretation of indirect speech. As in the previous two sets of studies, the replication manipulated the language under investigation. Both the original and the replication studies tested the interpretation of indirect refusals, indirect opinions, and ironic utterances, addressing the question ‘Do L2 [language] learners demonstrate different comprehension accuracy and response speeds across different types of indirect meaning?’ On the basis of conventionality, the authors of the original study, Taguchi et al. (2016a), expected difficulty to increase from indirect refusals to indirect opinions to irony. However, they found that for L2 learners of Spanish (95% of whom were L1 English speakers), indirect refusals and indirect opinions were of equal difficulty, and ironic utterances were more difficult. The replication study (Knight & Edmonds, 2024) tested this finding with L1 English speakers learning L2 French, with the additional goal of increasing the generalizability of the findings by extending them to another language.
Taguchi et al. (2016a) implemented a video task with 48 items: 12 items each of indirect refusals, indirect opinions, and ironic utterances, ten fillers with literal interpretations, and two practice items. The video clips had four to five speaker turns. Each video item was followed by a multiple-choice question in which participants selected an interpretation. The video and the items were developed expressly for the study; the pre-pilot version included 16 items each for indirect refusals, indirect opinions, and ironic utterances, which were reduced to 12 each after two rounds of piloting with native speakers of Spanish.
To develop a French equivalent of the task for their close replication, Knight and Edmonds contacted the first author of the original study to obtain the pre-pilot task, with 16 items per indirect speech act. They then followed the procedures for developing the original task reported by Taguchi and colleagues. Knight and Edmonds (2024) translated the items, verified cultural appropriacy (e.g. changing ‘pretzels’ to ‘ice cream’ for a snack purchased at a theatre in the Movies item), piloted the items with L1 French speakers, revised problematic items, piloted them again, and filmed the new task with L1 speakers. They developed the multiple-choice questions (with appropriate responses and alternatives) following the procedure outlined in the original study. The L2 learners of French scored the same on indirect refusals and indirect opinions, and higher on both than on irony, as was the case with the L2 learners of Spanish. This replication thus produced the same results as the original in a different language, showing that the findings generalize beyond Spanish, at least to French.
All the replications in this section illustrate the value of expanding the languages that are investigated. Whether the language manipulated is the L1 or the L2, this expansion increases the generalizability of the claims. The replications by Idemaru et al. (2019), Licea-Haquet et al. (2019), and Knight and Edmonds (2024), which tested additional target or focal languages, also highlight the attention to detail and cross-cultural design required when a new language is investigated; all of them re-piloted the translated tasks. The replicating authors retained the methodological rigor of the original studies while adapting the tasks to a new language and culture.
2.4. Manipulating learning context and testing interpretations
This section discusses two well-known replications of Bardovi-Harlig and Dörnyei (1998), namely Niezgoda and Roever (2001) and Schauer (2006). Bardovi-Harlig and Dörnyei tested L2 learners’ ability to recognize pragmatic violations with a video judgement task that measured both pragmatic and grammatical awareness. The 20-item video task featured the interactions of two students in various encounters with other people during their school day. Eight items were pragmatically appropriate but ungrammatical, eight were grammatical but pragmatically inappropriate, and four were both grammatical and appropriate.
Each scenario was played twice. In the first presentation, participants were told to ‘just watch the scene’, and in the repeated presentation, they were told to ‘watch and mark your answer sheet’ (p. 244). An exclamation mark appeared on the screen before the final utterance, which was the one to be judged. Participants were given a written answer sheet that contained only the item number and the final utterance. They indicated whether the sentence was good/appropriate or bad/inappropriate by checking a box, and then indicated how bad the sentence was on a Likert scale from ‘not bad at all’ to ‘very bad’ (p. 244). The task was administered to EFL learners in Hungary (L1 Hungarian) and mixed L1 ESL students in the Midwest of the US. The study reported that the ESL learners recognized more pragmatic infelicities and rated them as more serious than grammatical errors, whereas the EFL learners recognized more grammatical errors and rated them as more serious than pragmatic infelicities. Bardovi-Harlig and Dörnyei concluded that the learning context, ESL vs EFL, was the determining variable.
Niezgoda and Roever (2001) and Schauer (2006) both used the video task to elicit learner judgements. In addition to addressing the original research question, each also investigated an assumption underlying the original findings. The authors of the original study had assumed that learners would reject the pragmatic infelicities on the basis of pragmatics and the ungrammatical sentences on the basis of grammatical violations, but did not explicitly test this assumption. Both replications tested it by implementing an additional task. I will discuss this aspect of the replications first. Both replications also manipulated the learning context (and with it the L1, discussed briefly in Section 2.3.1) to test whether learning context was the determining factor.
Niezgoda and Roever (2001) conducted a close replication, using the video task and the original worksheet. Niezgoda and Roever also expanded the study to investigate whether learners had responded to the grammatical items on the basis of their grammatical errors and to the pragmatic items on the basis of their pragmatic infelicities. After running the original task, Niezgoda and Roever trained a subset of participants to identify grammatical and pragmatic errors. These participants then completed the task a second time, classifying errors as either pragmatic or grammatical. Learners were able to distinguish the pragmatic and grammatical items on the original task, strengthening the interpretation of the original study. Schauer (2006) additionally interviewed learners to determine why they had marked items as errors; learners were told how they had responded to each item and were then asked to explain why. Like Niezgoda and Roever (2001), Schauer found that learners could distinguish pragmatic from grammatical errors. By implementing additional tasks, both replication studies provided evidence that allows the results of the original task to be interpreted with confidence.
Both replications also manipulated the variable of learning context, testing the original conclusion that learning context was the causative variable. Manipulating the context variable also entailed changing the subject population (and hence the L1). Niezgoda and Roever tested Czech L1 EFL learners in the Czech Republic and mixed-L1 ESL learners in Hawaii. The L1 Czech learners were the top 5% of all English-language learners in the Czech Republic and aspiring EFL teachers, whereas the ESL learners were described as being in Hawaii to learn English and enjoy the beaches. The results showed that, at a sufficiently high level of proficiency and/or motivation, EFL learners could recognize pragmatic infelicities as well as ESL learners (at least ESL learners of lower proficiency). The results of the replication thus suggested that, in contrast to the original interpretation, learning context alone is not sufficient to determine the outcome.
In an approximate replication, Schauer (2006) tested a different population, namely German L1 speakers studying at a university in Germany (EFL learners) and German L1 speakers studying abroad in the UK (ESL learners). This replication kept the L1 constant across learning contexts, thereby eliminating an extraneous variable present in both Bardovi-Harlig and Dörnyei (1998) and Niezgoda and Roever (2001). Schauer also tested the study-abroad students twice, nine months apart, which added a longitudinal component to the study. Schauer reported that, as in the original study, at the first test the EFL learners were less aware of pragmatic infelicities than the ESL group was. Schauer calculated both an ‘uncorrected’ rate (as used in the original) and a ‘corrected’ rate, which included only the pragmatic infelicities and grammatical errors that the learners correctly identified in the interview. Both analyses supported the original findings: learners in the UK identified more pragmatic errors than grammatical errors, and learners in Germany identified more grammatical errors. Although the corrected rates were lower than the uncorrected rates of the original study, the learners in the UK rated pragmatic infelicities as more serious than grammatical errors, and the learners in Germany rated grammatical errors as more serious, further supporting the findings of the original study. With their higher proficiency at nine months, the German ESL learners are more directly comparable to the high-level learners in the first replication by Niezgoda and Roever (2001), but Schauer’s results for this group are also at odds with Niezgoda and Roever’s findings.
The longitudinal component of the study showed that after nine months’ stay, the ESL learners in the UK identified more pragmatic and grammatical errors, approximating the native speakers’ scores. The severity ratings for the pragmatic infelicities increased significantly over the nine months, whereas the severity ratings for the ungrammatical items did not change significantly. Thus, at nine months the severity ratings of the learners in the ESL context exceeded those of the learners in the EFL context by an even wider margin, supporting the imbalance of the severity ratings found in the original study.
One change to the original method is worth mentioning: Schauer altered the answer sheet, thereby possibly altering the task. Instead of providing only the utterance to be judged, the revised answer sheet provided the target utterance and the immediately preceding sentence to stimulate recall of the scenarios during the subsequent interview. The original study provided only the last utterance in order to compel participants to watch the video to complete the task, judging the utterance in its video context rather than imagining the speakers, their prosody, or other relevant characteristics of the exchange, as people may do in a written task. Providing the preceding utterance presents an adjacency pair (an utterance and its reply) on the answer sheet, which may have been sufficient to allow learners to focus exclusively on the written text if they were inclined to do so. The ramifications of changing the task are not discussed in the report. Given the advanced level of the learners, it is likely that the German L1 learners in Schauer’s replication did watch the video, but this may not be the case in all learning contexts or at all proficiency levels. This could be investigated in a subsequent study.
The Niezgoda and Roever (2001) and Schauer (2006) replications confirmed that the respondents were evaluating pragmatic infelicity and ungrammaticality as intended. Schauer also demonstrated that typical ESL learners showed greater pragmatic awareness than EFL learners, confirming the original findings. Niezgoda and Roever showed that exceptional EFL learners were not constrained by context, challenging the original interpretation. Both replications add to our understanding of the role of learning contexts, an area that remains of interest in L2 pragmatics research.
This review of the eight replication studies shows the range of replications that have been undertaken in pragmatics. The studies illustrate how replications can be used to test (and confirm) findings and to investigate method and item construction, diachronic change, L1 and learning contexts, and different target languages. They provide examples of how additional replications may be carried out in L2 pragmatics. This set of replication–original pairs and triples also shows that replications can lead to additional questions that require further empirical investigation.
3. Item replications
In addition to replications at the level of a study, there are also item or scenario replications throughout the L2 pragmatics literature. This is possible because L2 pragmatics research is dominated by scenario-based elicited-production tasks (see e.g. Examples 1–2 and 7–10). While the predominant task is still the written DCT, oral DCTs and role-plays are also scenario-based. Combined, they account for 172 of the 246 tasks (70%) and 172 of the 217 production tasks (79%) in Nguyen’s (2019) methodological review. Item replications are not overtly described as replications and may not be thought of as replications by the researchers. The results for individual items are not compared to the results for the same items in other studies. Yet both earlier and later studies use the scenarios to answer the question of how a particular speech act is realized, and addressing the same question is the most basic element of replication. These repeated items weave pragmatics studies together in a way not found in other areas of SLA research.
The repeated scenarios often draw on some of the same scenarios that were introduced by the cross-cultural speech act realization patterns project (CCSARP; Blum-Kulka et al., 1989; Blum-Kulka & Olshtain, 1984). In L2 pragmatics, scenarios are often given short names so that they can be easily referred to in a paper; this also means that they can be identified across different studies. I will consider two scenarios here, ‘Notes’ and ‘Forgot Book’ (or ‘Book’). The Notes scenario in Example (7) was used in the CCSARP and predates Examples (1) and (2) in Section 2.2. Notice that the CCSARP item format used third person and a rejoinder (a response from the hearer). By 1990, third-person wording in scenarios had given way to second-person wording (e.g. Beebe et al., 1990; Hudson et al., 1992; Rose, 1992).
7. Notes (CCSARP, S5, Request to borrow notes)
At the university
Ann missed a lecture yesterday and would like to borrow Judith’s notes.
Ann:___________________________________________
Judith: Sure, but let me have them back before the lecture next week.
(Blum-Kulka et al., 1989, p. 14)
The Notes scenario has been used to elicit both requests and refusals (Table 2). For requests, the scenarios may portray the requester sympathetically, as someone whose request for the notes is legitimate. In contrast, to elicit refusals the requester may be portrayed less sympathetically to ensure that the speaker will refuse the request. Example (8) provides an elaborated Notes scenario used by Félix-Brasdefer (2004) to elicit refusals in Spanish (see also Example 2).
8. Elaborated Notes scenario used to elicit refusals in Spanish (Félix-Brasdefer, 2004)
Imagine that you are in (Spanish-speaking country of your preference). You are taking a course in Latin American literature this semester. You haven’t missed this class once this semester and consider yourself a diligent student. So far you have a good average in the class, not because it is easy for you, but because you have worked very hard. Among your classmates, you have a reputation for taking very good notes. The professor has just announced that the mid-term exam is next week.
One of your classmates, who is taking a class with you for the first time this semester and who has frequently missed the class, asks you for your notes. You haven’t interacted with him outside the class, but have occasionally done small group work together in class. When the class ends, he approaches you for your notes, but you don’t want to lend them to him.
Table 2. Use of ‘Notes’ in L2 pragmatics studies by speech act, task, and target language
Note: * indicates that students were from an American and a Spanish university, respectively, but L1 information was not provided.
A second illustration of repeated scenarios in the L2 pragmatics literature is the ‘Forgot Book’ scenario used to elicit apologies (Table 3). In different studies, the situation is described as forgetting an instructor’s book or a peer’s book. The CCSARP original, which includes written conversational turns, is presented in Example (9); other items present only a scenario, as in Example (10).
9. Forgot book (CCSARP, S4, Apology for forgetting a borrowed book)
At the college teacher’s office
Teacher: I hope you brought the book I lent you
Miriam:___________________________________________
Teacher: OK, but please remember it next week.
(Blum-Kulka & Olshtain, 1984, p. 198; Blum-Kulka et al., 1989, p. 14)
10. Forgot book (Beckwith & Dewaele, 2008, p. 21)
Imagine you are a student. You borrowed a book from your university lecturer, but you forgot to return it. What do you say to your lecturer?
Table 3. The use of ‘Forgot book’ in studies of L2 apologies
Some articles do not provide the full scenario that was given to the participants, but instead provide a summary such as ‘a student forgets a professor’s book’ (e.g. Al Masaeed et al., 2018; Sabaté i Dalmau & Curell i Gotor, 2007). The repeated scenarios have been used in a variety of tasks, including written DCTs, oral DCTs, oral role-plays, and, less frequently, judgement tasks (Tables 2 and 3). Articles do not typically cite other studies that have used the same basic scenario; however, attributing the sources of scenarios would enhance methodological transparency. See, for example, the documentation of sources of scenarios for eliciting conventional expressions in Bardovi-Harlig and Su (2018) and Bardovi-Harlig et al. (2025). Publishing the full task, with all the scenarios as presented to the participants, may also encourage and facilitate replications.
In the aggregate, these repeated items could illustrate similarities and differences in L2 acquisition, cross-linguistic use, and different L1–L2 pairs. More to the point in terms of replication, treating the repeated scenarios as replications would allow the aggregated studies to confirm findings across studies and test the generalizability of the findings of individual studies, two goals of replication. The repeated use of scenarios also shows a certain openness to replication in L2 pragmatics, although item replications have some way to go before they can be considered true replications, even on their small scale.
4. Routes to replication in L2 pragmatics
The replication studies in Section 2 provide examples of how replications of pragmatics studies have been designed and implemented with respect to the initial studies. Porte and McManus (2019) offer 18 characteristics that might make a study a candidate for replication (pp. 23–24). Researchers are advised to consider the relevance of a study, access to the original data collection materials, unexpected or unusual outcomes, the researcher’s familiarity with the method, and the original publication venue. Perhaps most important is their admonition to researchers to ‘select your target studies for replication with an eye to helping move forward your own research practice as well as giving service to current knowledge in the field’ (Porte & McManus, 2019, p. 15).
This section focuses on variables that might influence pragmatic outcomes (other variables, such as statistical analyses, are discussed by Porte & McManus, 2019; McManus, 2022, 2023, inter alia). Drawing on the review of replications in Section 2, five variables are particularly promising for manipulation in replications of pragmatics studies: the speaker population (testing L2 learners on tasks that have previously been used only with native speakers), the first language of L2 learners, the learning context, task modality, and the target language. Each is considered in turn.
4.1. Manipulating the subject population by testing L2 learners
Empirical studies in L1 pragmatics may offer new perspectives for L2 studies. In his 1992 study, Rose tested the effects of item design exclusively on L1 speakers, whereas Billmyer and Varghese (2000) included L2 speakers as well as L1 speakers, showing that the change in the scenarios affected both groups. There are also L1 studies that investigate areas of pragmatics that L2 studies have not yet addressed. For example, replications with L2 learners of the studies reported here on interpreting false implicatures (Reins & Wiegmann, 2021; Weissman & Terkourafi, 2019; Wiegmann, 2023) would expand the existing L2 research on implicature (and on lies, which have not been investigated in L2 pragmatics).
Although the explicit identification of speech acts has been investigated with L2 learners (e.g. Koike, 1989, 1996), speech act identification during comprehension has not. Holtgraves (2008) and Licea-Haquet et al. (2019) have shown that speech act identification is activated during comprehension in native speakers of English and Spanish, respectively, but this has not yet been tested with L2 learners using the speech act probe described in Section 2.3.2 (see Footnote 4). Moreover, their design could provide a novel means of exploring the comprehension of direct and indirect speech acts, which were not a target of investigation in the original studies.
4.2. Expanding the first language of L2 learners
Expanding the range of L1s represented in L2 pragmatics research is a natural area for investigation. Investigating the acquisition of any single target language calls for a variety of L1 speakers to determine what is specific to particular L1–L2 pairs and what is universal. SLA research and L2 pragmatics research have both been interested in L1 influence, and expanding the range of L1 participants who complete the same task is one way to test it. If the original task was conducted in the target language, delivering it to another learner population requires no modification of the task (see e.g. Bardovi-Harlig & Dörnyei, 1998, and its replications; Niezgoda & Roever, 2001; Schauer, 2006).
4.3. Manipulating the learning context
Changing the learning context of a study for a replication may also involve changing the first language(s) of the participants, as discussed in the previous section. Nevertheless, L1 and learning context are variables that can be considered separately. Niezgoda and Roever (2001) and Schauer (2006) are good examples of studies that changed the learning context. Niezgoda and Roever’s learners were Czech EFL learners and mixed-L1 ESL students enrolled in different programs, so context and L1 varied together. In contrast, Schauer’s study manipulated only the context, keeping the L1 constant: all participants were German L1 learners, some studying in the UK and others at home in Germany.
In SLA studies, learning context usually distinguishes learners in host environments from foreign language learners studying at their home institutions. But given the dominance of university students in our studies (an issue in SLA more generally), we could include non-degree-seeking participants. For example, Bella (2012) collected data from economic migrants learning Greek in Greece using a written DCT modelled on Bardovi-Harlig and Dörnyei (1998). Studies of L2 pragmatics in New Zealand have included newcomers entering the workforce (Riddiford & Joe, 2010). Expanding the range of L2 learning contexts and participants not only tests the generalizability of study results but also strengthens SLA inquiry.
4.4. Manipulating the spoken/written variable for conversational pragmatics
Given my advocacy for oral tasks in L2 pragmatics research (Bardovi-Harlig, 2025; Bardovi-Harlig & Hartford, 2005, inter alia), it will come as no surprise that I think a promising line of replication research in L2 pragmatics would involve manipulating the variable of task modality, specifically in the direction of matching modality. Matching modality (Bardovi-Harlig, 2018) is a simple concept: the modality of research instruments or tasks should match the modality of the language event being investigated. In the study of conversational pragmatics, this means that tasks would be oral/aural. Investigations of pragmaprosody, which emphasize the role of speaking in pragmatics, add further pressure to use oral tasks (e.g. Hao et al., 2024; Kang et al., 2021; Liu, 2025). Many comprehension and interpretation studies ask participants to read written conversations, and many production studies ask participants to write conversational turns. In a replication that manipulates modality, the materials of studies that asked participants to read utterances or conversations and then judge pragmatic appropriateness, determine illocutionary force or implicature, or interpret meaning would be presented aurally, so that participants hear the discourse to be judged. Similarly, production tasks that asked participants to write their responses in the original study would ask them to speak their responses in the replication. Additionally, other areas of SLA recognize that untimed written tasks are more likely to tap explicit (declarative) knowledge, whereas timed oral tasks are more likely to draw on implicit (procedural) knowledge. Matching modality would address this concern as well, particularly in the oral/aural condition, while also increasing the ecological validity of tasks that represent conversation.
4.5. Expanding language coverage by exploring other target languages
Expanding the language of investigation is perhaps both the most obvious and the most challenging variable to manipulate. It is obvious, yet worthwhile, to investigate whether what we have discovered about the acquisition of L2 pragmatics in one language holds for other languages. This is particularly important given that some languages have been researched more than others. As we have seen from the examples set by Knight and Edmonds (2024), Idemaru et al. (2019), and Licea-Haquet et al. (2019), producing a faithful reproduction of the original task in another language requires not only translating the task but also making cultural adjustments, followed by piloting, revising, recording, and sometimes filming the task in the new target language.
The range of possible cultural adaptations is indeed large, but for a single item or task, adaptations may be small yet meaningful. Attention to detail is key. Drawing only on the authors already cited, I will give four examples. Knight and Edmonds (2024) changed the snack in their Movies scenario to one available in French movie theatres (i.e. ice cream). Bella (2012) describes a request scenario drawing on her knowledge of the local Greek public transportation system. The next two adaptations were made in studies of regional pragmatic variation rather than in replications, but they are also relevant here. When studying regional pragmatic variation in Spanish, Bardovi-Harlig et al. (2025) changed the desired purchase in a bargaining scenario at an outdoor market from a blanket in El Paso/Ciudad Juárez to sunglasses in Barcelona. Su and Chang (2019) presented the apology scenarios of an oral DCT to Mandarin speakers in Mainland China in simplified Chinese characters, voiced by a speaker of standard Mandarin, and presented the same oral DCT to Mandarin speakers in Taiwan in traditional Chinese characters, voiced by a speaker of Taiwanese Mandarin. Each of these cultural adaptations makes the scenario more familiar, and thus more believable, to the intended participant group when tasks are used to elicit data from speakers and learners.
Developing interpretation or comprehension tasks requires significant effort because it involves developing scenarios and capturing authentic or near-authentic language. In contrast, production tasks such as oral DCTs or role-plays require only translation and cultural adjustment of the scenarios and of any initiating oral turns included in the task. Preparing prompts is therefore less extensive than preparing a judgement, interpretation, or comprehension task. It is worth noting that replicating a study that used observational (production) data requires no translation. Although challenging, manipulating the target-language variable is necessary for understanding the limits on generalizability that may result from studying a small range of target languages.
4.6. Other considerations
In addition to Porte and McManus’s (2019) admonition to maintain methodological rigor in replication studies, other considerations arise. Glaser (2020) raises the issue of updating scenarios used in the original studies. As she correctly points out, the scenario used by Bardovi-Harlig and Dörnyei (1998) in which a teacher asks a student ‘to stop by the bus stop on your way home’ to check the bus schedules for the class trip has been rendered obsolete by the internet: bus schedules are easily found on websites. Glaser modifies her analogous item to read ‘Lisa, could you do an online search and check out the …’? (p. 64). (Perhaps even this modification has been superseded by apps.) A similar case emerged unexpectedly in Bardovi-Harlig (2009), in which an oral DCT item was modelled on a multiple-choice item from Roever’s (2005) routines test. The scenario, leaving a message with a roommate, was designed to elicit the expression ‘Can I leave a message?’ However, by the time of data collection in 2006, the item was no longer relevant: speakers no longer leave messages at a central location in friends’ homes or businesses, because voicemail has replaced this exchange. These examples show that changes in technology – like cultural differences – may need to be considered in replications.
While Glaser’s (2020) observation highlights potential changes to specific items in a task, the results reported in Chen and Yang’s (2010) study raise the issue of diachronic pragmatic variation in the language produced. We know that language changes, but we might not be fully aware of how quickly pragmatics may change (after all, we have only recently embraced synchronic variation in pragmatics, e.g. Barron & Schneider, 2009). Chen and Yang (2010) revealed a change in preferred responses to compliments at the level of pragmatic strategy. This suggests that it might be worthwhile to replicate some of our classic studies in L2 pragmatics to explore language change and concomitant changes in the language produced by L2 learners.
These are only a few of the variables that can be explored via close or approximate replications of L2 pragmatics studies. Taguchi and Li (2019), discussing potential replications of Taguchi et al. (2016b) and Bardovi-Harlig and Bastos (2011), additionally suggest varying proficiency, widening the range of length of stay, and examining individual differences, whether cognitive (such as aptitude and working memory) or social-affective (such as motivation, willingness to communicate, and personality). Naturally, the choice of variables to manipulate ultimately depends on the original study and the questions that motivate the replication.
5. Conclusion
By examining replication studies in L2 pragmatics, I hope to have raised awareness of the value of replication research among L2 pragmatics researchers and to have drawn the attention of researchers already interested in replication research to L2 pragmatics and its potential value as a site for replication. I have provided some ideas for variables that might be manipulated in L2 pragmatics replication studies. The replication studies reviewed here continue the focus on task design, which has been a crucial concern in L2 pragmatics research, in both original and replication studies.

