Unpacking L2 explicit linguistic knowledge and online processing of the English modals may and can: A comparison of acceptability judgments and self-paced reading

The present study uses self-paced reading as a measure of online processing and an acceptability judgment task as a measure of offline explicit linguistic knowledge to understand L2 learners' comprehension processes and their awareness of subtle differences between the modal auxiliaries may and can. Participants were two groups of university students: 42 native speakers of English and 41 native speakers of Croatian majoring in L2 English. The study is part of a larger project that has provided empirical evidence that the two modals, may and can, are mutually exclusive when denoting ability (can) and epistemic possibility (may) but equally acceptable in pragmatic choices expressing permission. The present results revealed that L1 and L2 speakers rated the acceptability of sentences in offline tasks similarly; however, L2 learners showed no sensitivity to verb–context mismatches in epistemic modality while demonstrating sensitivity when processing modals expressing ability. Implications for L2 acquisition of modals and future research are discussed.


Introduction
Research into second language (L2) acquisition has long debated the nature of the target language knowledge that L2 learners develop. Due to both its theoretical importance and the difficulty of knowledge measurement (Bowles, 2011; DeKeyser, 2003; Ellis, 2005; Norris & Ortega, 2013), this exploration has driven much L2 acquisition research. Although linguists differ in their views on the sources and development of L2 knowledge, most would agree that explicit and implicit linguistic knowledge are two separate constructs that develop in distinct parts of the brain and are accessed through different processes (Paradis, 2009). Explicit knowledge in L2 is described as conscious, imprecise, unstable, unsystematic, and common in instructed L2 learners, whereas implicit knowledge is unconscious, intuitive, tacit, systematic, and easily accessible (Bowles, 2011; Ellis, 2005; Ellis et al., 2009).¹ Because implicit knowledge is required for linguistic competence (Norris & Ortega, 2013; Rebuschat, 2013), it is crucial to investigate this type of knowledge, both for its theoretical importance and for understanding L2 development and its implications for pedagogy.
In L2 research, the most widely used instruments to measure L2 linguistic knowledge have been grammaticality and acceptability judgment tasks (Plonsky et al., 2020). Judgment tasks without time constraints tend to be treated as approximate measures of explicit knowledge (Ellis, 2005; Ellis et al., 2009; Godfroid et al., 2015; Rebuschat, 2013; Zhang, 2015), where time is needed to retrieve an explicitly known, pedagogical rule and related information. The same tasks with time constraints are generally treated as tests of implicit knowledge, where immediate, fast performance is demonstrated. However, the role of time pressure has been heavily debated. DeKeyser (2003) and Suzuki and DeKeyser (2017), for example, suggested that explicit knowledge can also be accessed rapidly after long practice, so time pressure itself cannot entirely block the influence of explicit knowledge during test performance. While maintaining the distinction between implicit knowledge and automatized explicit knowledge as two separate constructs, Suzuki and DeKeyser (2017) and Vafaee et al. (2017) showed timed grammaticality judgment tasks (GJTs) to be a rather crude measure that involves a focus on form, with the addition of time pressure being insufficient to engage implicit knowledge. More recently, Maie and Godfroid (2022) reported findings from an eye-tracking study that also question the time-pressured acceptability judgment task (AJT) as a measure of automated processing tapping implicit knowledge. Their findings suggested that time pressure suppressed both the controlled and the automatic processes and that different L2 speakers were affected by time pressure in different ways.
As is increasingly recognized, more appropriate task types need to be used to measure real-time processing during comprehension and, indirectly, implicit linguistic knowledge. Such tasks, along with other psycholinguistic methods, include self-paced reading (SPR), word monitoring, and the visual-world task, which Suzuki (2017) proposed as measures of implicit knowledge distinct from automatized explicit knowledge.
Along these lines, the present study employed SPR to examine whether upper-intermediate L2 users have developed mental representations of the modals may and can similar to those of native speakers, which can be demonstrated only in online tasks, during real-time processing. So far, studies that have attempted to measure implicit knowledge or processing have typically focused on morphosyntactic features and, to a lesser extent, on lexical aspects of the target language. Processing of modals in L2 English has been a neglected area in research to date; thus, the present study set out to fill this gap.
We adopted the psycholinguistic approach of sentence processing, which is based on the view that real-time sentence processing offers an insight into how grammar acquisition of an L2 occurs. In SPR studies, a baseline condition containing a set of well-formed stimuli is examined in relation to an experimental condition containing the same set of stimuli with an anomaly on the feature being tested. In the present study, this is the alternation of two modals, may and can, which are typically presented as a pair in L2 English textbooks and are known to be especially challenging to master due to their multiple meanings and functions.
We compare L2 speakers' online processing of sentences containing the modals may and can with their acceptability ratings of the same sentences in a judgment task. The significance and originality of the present research lie in the fact that this is the first study to investigate L2 comprehension of the two modals by introducing modality as a grammatical category into the implicit/explicit L2 research paradigm.

Modality and modal auxiliaries
Modality is a prime conceptual domain that enables people to comprehend and produce meanings that are not related to facts and reality, such as belief, imagination, possibility, necessity, inferred certainty, obligation, and permission (Coates, 2014; Leech, 2003; Traugott & Dasher, 2004). Modality exists in all natural languages, but different languages use different means to express modal meaning (Bybee et al., 1994). In English, modality is primarily expressed by the modal auxiliaries can, may, will, shall, could, might, would, should, and must. There are also lexical expressions such as adverbs (e.g., possibly, probably, likely), verbs (e.g., believe, think), and whole sentences, that is, conditional sentences (Stewart et al., 2009). English modal auxiliaries are well known for their peculiarities (Bybee et al., 1994; Coates, 2014; Leech, 2003; Palmer, 1986, 2003), such as their differences from English lexical verbs in terms of interrogative and negative formation, their lack of nonfinite forms (infinitive and participles), and their complexities in terms of semantic meaning and pragmatic interpretation. Most classifications of modals make the basic differentiation between epistemic and nonepistemic (root) modals, where the same form can function differently in different contexts. The following examples demonstrate the dual nature of the modal may:
(1) They may have arrived: we do not know whether they have arrived or not, and there is only a possibility that they have arrived (epistemic possibility).
(2) You may borrow this book, meaning 'you are allowed to borrow this book' or 'you are permitted to borrow this book' (nonepistemic meaning of the same modal expressing permission).
However, it is also possible to use the modal can to convey the same meaning as in (2):
(3) You can borrow this book.
The above examples demonstrate some of the "fuzziness" or "indeterminacy" (Coates, 2014) of modal usage in the English language, which presents significant challenges to learners of English as an L2. These challenges are especially deep rooted in the usage of two of the most frequently used modals, may and can (Biber et al., 1999; Brezina & Gablasova, 2015).² This suggests that they need to be examined closely, the main reason for making them the focus of our analysis. May and can are commonly paired together in textbooks for L2 English learners (Bolinger, 1989; Coates, 2014), and they are usually discussed as a pair in theoretical linguistics. The fact that they are interchangeable in some but not all situations is likely to make their usage additionally puzzling for L2 learners. In terms of research, these two modals present a dichotomy that allows for an examination at the level of sentence processing and detection of the effect that syntactic and semantic anomalies may have when they are used in mismatching contexts.
² In the New General Service List (Brezina & Gablasova, 2015) of the 2,500 most frequent English words, can is ranked 41st and may 85th (could, ranked 52nd, is the only modal ranked higher than may). In Longman's Grammar (Biber et al., 1999), based on the Longman Spoken and Written English corpus (LSWE), the overall frequencies of can and may are, respectively, 2,500 and 1,000 occurrences per million words. Can is relatively common in all registers, whereas may is less common in spoken conversation but extremely common in academic prose. In academic prose, can has less than 500 occurrences per million words for expressing permission, slightly less than 1,500 occurrences for root possibility, and slightly more than 1,500 for ability. The frequency of may is less than 500 occurrences for permission but very high (almost 3,000 occurrences per million words) when expressing epistemic possibility.

Syntactic, semantic, and pragmatic functions of may and can
In our investigation we adopt Bybee's descriptive functionalist framework, which is based on a crosslinguistic and diachronic perspective focusing on the semantic content of grammatical categories. In describing the functions and sources of modals, Bybee et al. (1994) identified four types of modals according to their source of modality: (a) agent oriented, which includes obligation, necessity, ability, and desire; (b) epistemic, which is concerned with the speaker's knowledge or belief, expressing possibility, probability, and inferred certainty; (c) speaker oriented, which allows the speaker to impose certain conditions on the addressee, such as commands, demands, requests, and permissions; and (d) subordinating moods, which involve the same forms as those used to express the above modalities to mark subordinate clause verbs.
We selected some of the most frequently used meanings (Biber et al., 1999; Coates, 2014) for agent-oriented, epistemic, and speaker-oriented modalities and omitted the subordinating moods, as the latter only appear in subordinate clauses and are always a secondary source of modality. For agent-oriented modality, which "reports the existence of internal and external conditions on an agent with respect to the completion of the action expressed in the main predicate" (Bybee et al., 1994, p. 177), ability (including the ability to sense) is selected, as in examples (4) and (5):
(4) She can speak four languages.
(5) I can smell something burning.
For epistemic meaning, epistemic possibility is selected, as in (6), which indicates that the speaker is not entirely confident that a proposition is true.³
(6) He may have forgotten my address.
For speaker-oriented modality, which includes the meanings of imperative, prohibitive, optative, hortative, admonitive, and permissive, and does "not report the existence of conditions on the agent, but rather allow(s) the speaker to impose such conditions on the addressee" (p. 179), asking and giving permission and offers are selected, as in (7) and (8):
(7) You may use only the main entrance to the building.
(8) You may/can come with me this way.
³ Epistemic possibility, which is framed internally or subjectively, should be kept distinct from root possibility that is regulated by external circumstances (Bybee et al., 1994; Coates, 2014; Lyons, 1977).
Matching the modals may and can with the meanings of ability, epistemic possibility, and permission in relevant contexts allows us to clearly identify and differentiate the semantic, syntactic, and pragmatic functions of the two modals. In a previous SPR experiment with English native speakers (Mifka-Profozic et al., 2020), it was found that the reading penalty due to ungrammatical use of can in the context of epistemic possibility could be clearly distinguished from semantically ambiguous use of may in the context of ability expression. In the same experiment with L1 speakers, both modals were shown to be equally acceptable to express permission. The choice of one or the other in the latter case depended on pragmatic preferences. It is important to emphasize, though, that pragmatic preference is a type of distinction that is qualitatively different from the differences that stem from syntactic or semantic origins.

Universal path of modal acquisition
The uniqueness of the modality domain is, to some extent, rooted in the historical development of language, which is well documented in empirical investigations of world languages and studies on grammaticalization (Bybee et al., 1994; Dittmar & Terborg, 1991; Giacalone Ramat, 1992; Traugott & Dasher, 2004). Grammaticalization as a process of linguistic change over time explains how linguistic units may develop out of lexical items and become more subject to the rules of grammar. Diachronically, the development of modal meaning follows the route from nonepistemic (agent-oriented and speaker-oriented) modality denoting externally determined situations towards meanings denoting internally (perceptually, cognitively) defined situations (Traugott & Dasher, 2004). Working with data from numerous world languages, Bybee et al. (1994) demonstrated the path of development from modal expression of physical ability and mental ability to root possibility and permission on the one hand and to epistemic possibility on the other. This change is explained as a metaphorical extension, a shift to a different domain: from externally imposed meaning to internal domains where epistemic meanings are encoded.
From the available research, it appears that L2 acquisition of modal auxiliaries exhibits a very similar "nonepistemic before epistemic" acquisitional sequence (Dittmar & Terborg, 1991; Giacalone Ramat, 1992). For example, Giacalone Ramat (1992) studied the acquisition of L2 Italian by learners from various L1 backgrounds (Chinese, Tigrinya, Persian, German, and English). At early stages of acquisition, only nonepistemic use of modal verbs was observed in grammaticalized, inflected verb forms, whereas epistemic meaning was expressed by epistemic adverbs such as forse and magari (Italian for "perhaps" or "maybe"). Use of basic formulaic expressions such as non (lo) so ("I don't know") was also observed as a substitute for a modal, indicating "zero probability" (p. 312). In a more recent study, Granget et al. (2018) confirmed the comprehension advantage for adult learners of L2 French epistemic modality in comparison with child and adolescent L2 learners.
In short, studies on grammaticalization and individual development suggest that modality in both L1 and L2 acquisition, as well as in diachronic processes, develops from pragmatic and lexical means to grammaticalized verb forms. This is explained by the fact that at early stages of acquisition language users do not have the grammatical (morphological) means to mark temporal and modal relations. In these processes, as research shows, the acquisition of nonepistemic meanings precedes that of epistemic meanings.

Research into L2 English acquisition of modal auxiliaries
The challenges related to L2 acquisition of English modals are shared, to a certain degree, among all L2 learners, although the scope of difficulty may depend on similarities and differences between L1 and L2. In particular, modal polysemy presents a mapping and learning problem that involves matching a single lexeme to multiple meanings and functions. In turn, a single meaning can be covered by multiple lexemes. In L2 research, studies carried out to investigate the use of English modals by L2 learners (e.g., Ayoun & Gilbert, 2017; Gibbs, 1990; Hinkel, 2009) have demonstrated that both EFL and ESL learners experience persistent difficulties when attempting to use English modals. In an examination of Punjabi primary and secondary pupils' acquisition of English modality, Gibbs (1990) focused on four modals: can, could, may, and might. The study confirmed the universally ascertained order of modal acquisition, with nonepistemic meanings (ability, permission, and root possibility) acquired earlier than the hypothetical and epistemic possibility meanings.
Despite much research conducted to explore the use of L2 English modals, almost all previous studies have focused on production rather than comprehension. A different perspective, with an interest in comprehension, was offered in a small-scale SPR study with L1 Croatian advanced learners of L2 English (Mifka-Profozic, 2017). The study compared the L2 online processing of the modals may and can with L1 English speakers' processing and detected differences between the two groups in the processing of epistemic possibility. This study was perhaps the first to investigate comprehension of L2 modals at the level of sentence processing. The present study builds on those findings and fills the existing gap in research by introducing an acceptability judgment task (AJT) to compare offline ratings with performance on an online SPR task.

Online and offline tasks to determine the status of learner knowledge
Processing investigations are important, since research evidence suggests that only tasks involving real-time, online comprehension, where learners' attention is entirely focused on meaning, make the indirect assessment of implicit knowledge possible (Suzuki, 2017; Suzuki & DeKeyser, 2015, 2017; Vafaee et al., 2017). This is because in online processing for meaning, the conscious access to learners' explicit knowledge is precluded. Support for this view is found in theories of comprehension and in empirical evidence showing that sentence processing is incremental, which means that syntactic analysis is computed immediately on each word before the next word is encountered (Jegerski, 2012, 2014; Keating & Jegerski, 2015). In semantic analysis, the process is slightly different because, here, context plays an important role (Altmann & Steedman, 1988) while each ensuing word of a sentence is processed and checked against the previous context to facilitate interpretation and possible lexical ambiguity resolution.
In online processing for meaning, language users interpret sentences word by word while reading, rather than at the end of the sentence (Just et al., 1982). Research on monolingual sentence processing shows that native speakers vary their reading times (RTs) on a word-by-word basis and make reading adjustments according to word properties such as length, frequency, and word complexity (Just et al., 1982; Keating & Jegerski, 2015). L1 speakers incur a processing cost when encountering syntactic anomalies and mismatches between previous information and incoming input (e.g., Roberts & Liszka, 2013; Stewart et al., 2009). As words are processed incrementally, an increase in RT is detectable either on the target anomalous word(s) or as a spillover on words immediately following, indicating problems in form-meaning assignment when grammatical or logical/semantic incongruencies make sentence meaning ambiguous.
In psycholinguistic studies⁴ with L2 learners, online processing is compared with the processing of L1 speakers and with the performance on judgment tasks to determine the status of L2 learner knowledge (e.g., Hopp, 2006, 2016; Jegerski, 2012, 2016; Pliatsikas & Marinis, 2013; Roberts & Felser, 2011; Roberts & Liszka, 2013). In L2 research, the term "grammaticality judgments" has generally been preferred to "acceptability judgments" (Spinner & Gass, 2019). More recently, "acceptability" has also been used (e.g., Maie & Godfroid, 2022) as a theoretically more appropriate term because "grammaticality" is an abstract concept referring to competence (Sprouse, 2013), whereas acceptability judgments are perceptions of how acceptable a sentence or a language feature is; thus, the data elicited this way are behavioral and refer to performance.
Notwithstanding the controversies around their usage and terminology, judgment tasks are seen as a useful tool for understanding the developmental stage of the learner interlanguage. Studies in L2 research have traditionally used binary, categorical selection between grammatically acceptable and unacceptable items or sentences. The binary judgment task administered with no time pressure was validated as a measure of explicit knowledge in Ellis (2005) and has been used in numerous studies examining the knowledge of L2 morphosyntax (e.g., Coughlin & Tremblay, 2013; Ellis et al., 2009; Godfroid et al., 2015; Jiang et al., 2011; Zhang, 2015). In SPR studies comparing L2 and L1 online processing, a graded AJT has been the preferred offline choice in measuring participants' knowledge (Jegerski, 2012, 2015, 2016; Kaltsa et al., 2016; Roberts & Felser, 2011; Roberts & Liszka, 2013).
With a different target, the present study contributes to this body of research by juxtaposing online processing and offline (untimed) judgments in investigating modal knowledge. A graded AJT using a Likert-type scale is well suited for present purposes, where the use of modals is evaluated in relation to the surrounding context. To date, the research conducted within the implicit/explicit knowledge paradigm has been focused almost exclusively on morphosyntactic language features. The present study, however, introduces a new linguistic target into this paradigm and fills an important gap by investigating modality, or more precisely, two modal auxiliaries, may and can, involving their semantic, syntactic, and pragmatic functions.

The present study
The study sought to answer the following research questions (RQs):
RQ1: (a) To what extent does a match/mismatch between the context and modals expressing ability and epistemic possibility affect acceptability judgment ratings in target sentences for upper-intermediate L2 English (L1 Croatian) adult speakers compared with L1 English speakers? (b) Does the same alternation, may vs. can, affect the ratings of sentences conveying the meaning of permission?
RQ2: (a) To what extent does a match/mismatch in sentences conveying the meaning of ability and epistemic possibility affect reading times in online processing of target sentences for these participants? (b) Does the same alternation, may vs. can, affect online processing in sentences conveying the meaning of permission?
⁴ As pointed out by an anonymous reviewer, SPR can also be used in SLA studies, for example, Lee et al. (2022), in which SPR was employed to demonstrate improvement in comprehension and faster processing following treatment with processing instruction.
An untimed AJT was administered to both L2 and L1 speakers, and the results were compared with the performance of both groups on an SPR task. In the Materials and methods subsection of Method and in the Results section below, we cover the tasks in the order of the RQs, that is, AJT followed by SPR. However, in the Procedure subsection of Method, we cover the tasks in the chronological order in which participants encountered them, that is, SPR then AJT.

Method
All research materials, analytic methods, and data files associated with this study are available in the Open Science Framework (OSF) via the following link: https://osf.io/6qynj/. The analyses were performed using R (R Core Team, 2022) with various packages and functions.

L1 English
The L1 English participants were undergraduate students from the Education, Biology, and Economics departments at a UK university (mean age = 20.95, SD = 2.17, range: 18-35, N = 44). The SPR data from 40 L1 English participants and AJT data from 42 L1 English participants were used. Originally, a total of 44 participants completed the SPR, but data from four of these participants (including two who also did not show up for the AJT) were removed because more than two comprehension questions were answered incorrectly (<90% accuracy).

L2 English
L2 English participants were first- and second-year undergraduate students majoring in English at a Croatian university (mean age = 20.6, SD = 1.08, range: 19-24, N = 42). The SPR data from 35 L1 Croatian (hereafter "L2 English") participants and AJT data from 41 L2 English participants were used. Originally, 42 L2 English participants completed an Oxford Placement Test and achieved a mean accuracy score of 80.48% (SD = 5.62, range: 70-92), which is comparable to a high B2/low C1 CEFR proficiency level or to an approximate TOEFL overall score range of 84 to 106. The AJT data were obtained from 41 participants, as one participant did not attend the session. SPR data were retained for 35 participants: the remaining seven (including the one who did not attend the AJT) completed the SPR task but were removed for answering more than two comprehension questions incorrectly.
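The exclusion rule applied to both groups (SPR data removed when more than two of the 20 comprehension questions were answered incorrectly, i.e., accuracy below 90%) can be sketched as follows. This is only an illustration under stated assumptions, not the study's own analysis script (the analyses were run in R and are available on the OSF page); the function and constant names are hypothetical.

```python
# Sketch of the comprehension-based exclusion rule described above.
# Names are hypothetical; the thresholds are taken from the text.

N_QUESTIONS = 20      # comprehension questions answered by each participant
MAX_INCORRECT = 2     # more than two incorrect answers leads to exclusion


def accuracy(n_incorrect: int, n_questions: int = N_QUESTIONS) -> float:
    """Proportion of comprehension questions answered correctly."""
    return (n_questions - n_incorrect) / n_questions


def is_excluded(n_incorrect: int, max_incorrect: int = MAX_INCORRECT) -> bool:
    """True when a participant's SPR data would be removed."""
    return n_incorrect > max_incorrect


def n_retained(n_completed: int, n_removed: int) -> int:
    """Participants left in the SPR analysis after exclusions."""
    return n_completed - n_removed


if __name__ == "__main__":
    # Two errors is still acceptable (90%); three errors is not (85%).
    print(is_excluded(2), accuracy(2))
    print(is_excluded(3), accuracy(3))
    # Group sizes reported in the text: L1 English, then L2 English.
    print(n_retained(44, 4), n_retained(42, 7))
```

With the figures reported above, the rule leaves 40 of 44 L1 English and 35 of 42 L2 English participants in the SPR analysis.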

Acceptability judgment task (AJT)
The AJT stimuli contained the same sentences as those used in SPR (see below), minus the final sentence, which we removed because it was nonessential for eliciting acceptability ratings and to reduce the reading burden. Target sentences were presented in bold to emphasize the focal point of participants' acceptability ratings. The AJT used a Likert-type scale from 1 to 6, where 1 was least acceptable and 6 was most acceptable. At the top of the sheet, the instructions asked participants to read the sentences carefully, to indicate in each case the acceptability of the sentence, and, for less acceptable or unacceptable sentences (i.e., those rated 1-3), to also underline the perceived error.

Self-paced reading (SPR) task
The SPR stimuli were 36 target items comprising sentences with the modals can and may manipulated so that each appeared in a matching or a mismatching context relative to the surrounding text, referred to as "congruent/incongruent" for ability/sensation and epistemic possibility. For permission/offers we use the term "formally marked/unmarked" (but for convenience, we use congruent/marked and incongruent/unmarked interchangeably below). There were 18 target items (six each for ability/sensation, epistemic possibility, and permission/offers) in each of the two conditions (congruent/formally marked and incongruent/formally unmarked). Following Stewart et al. (2009), each item comprised three sentences, the first providing a situational context, the second being the target sentence containing the modal, and the third serving to wrap up the situation described. Here we provide an example of each modal category as they appeared in the SPR task (target sentences are in bold).
(9) A modal indicating ability: Sara is a very experienced driver. Surprisingly, she can/may drive a van, but she is not able to ride a bike. She has been driving for more than twenty years on all sorts of roads.
(10) A modal indicating epistemic possibility:⁵ Carol is waiting for her friends to pick her up, but they haven't arrived yet. "They may/can be waiting in the car," her mum says. Carol is very impatient.
(11) A modal indicating permission: Andrea was sorting the books on her bookshelf when her friend Lisa came in. "You can/may take some if you wish, but please don't keep them for long," said Andrea. Lisa was happy to take several books.
⁵ An anonymous reviewer pointed to the potential problem with the 'modal + HAVE + past participle' construction of certain epistemic possibility items, which, from a morphosyntactic perspective, is arguably more complex than a 'modal + base form' verb construction and could therefore require more cognitive resources to process. However, such constructions do not appear to have had any effect on L1 online processing because the slowdown reaction to the mismatching modal in the context of epistemic possibility starts on Segment 3, earlier than in sentences expressing ability that all use the 'modal + verb base form' construction.
L2 explicit knowledge and online processing of English modals may and can. https://doi.org/10.1017/S0272263123000475 Published online by Cambridge University Press.
There were 56 items in total, comprising 36 experimental target items and 20 fillers, which also contained three sentences but with a second sentence unrelated to modals (see the stimuli list on the study's OSF page). Participants also answered 20 comprehension questions (10 each for target items and fillers) to confirm that they were reading the sentences, focusing on meaning, and not skipping through. Target items, adapted from the British National Corpus (BNC Consortium, 2007), Coates (2014), and Palmer (1986, 1990), were designed with an equivalent number of syllables as far as possible (Jegerski, 2014). As shown in Table 1, the first word in the sentence (Segment 0) was either a personal pronoun or a one-syllable name, the second word (Segment 1) was the modal (can/may), the third (Segment 2) was a one-syllable word in 34 out of 36 sentences (two words had two syllables), and the fourth (Segment 3) was a one-syllable word in 29 of the target sentences, with the remaining seven words having two syllables. Experimental target items and fillers were pseudorandomized (see Procedure).
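The item counts given above can be checked with a few lines of arithmetic. The sketch below only recomputes totals from the numbers stated in the text (three modal categories, six items per category in each of two conditions, 20 fillers, and 10 comprehension questions each for targets and fillers); the variable names are hypothetical and the code is not part of the study's materials.

```python
from fractions import Fraction

# Design figures as stated in the text; names are illustrative only.
CATEGORIES = ("ability/sensation", "epistemic possibility", "permission/offers")
CONDITIONS = ("congruent/formally marked", "incongruent/formally unmarked")
ITEMS_PER_CATEGORY_PER_CONDITION = 6
N_FILLERS = 20
N_QUESTIONS_TARGETS = 10
N_QUESTIONS_FILLERS = 10

# 3 categories x 6 items x 2 conditions
n_targets = len(CATEGORIES) * ITEMS_PER_CATEGORY_PER_CONDITION * len(CONDITIONS)
n_items = n_targets + N_FILLERS

# Share of targets and fillers that were followed by a comprehension question
share_targets = Fraction(N_QUESTIONS_TARGETS, n_targets)
share_fillers = Fraction(N_QUESTIONS_FILLERS, N_FILLERS)

if __name__ == "__main__":
    print(f"target items: {n_targets}, total items: {n_items}")
    print(f"targets followed by a question: {float(share_targets):.0%}")
    print(f"fillers followed by a question: {float(share_fillers):.0%}")
```

Under these figures, roughly 28% of target items and half of the fillers were followed by a comprehension question.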

Instrument reliability
Overview. For methodological transparency and to provide useful information on the psychometric properties of these stimuli (Plonsky & Derrick, 2016; Marsden et al., 2018), we make the full set of AJT and SPR instrument reliability estimates available on the study's OSF page, where information on the procedure for estimating instrument reliability can also be found.
AJT task reliability summary. Instrument reliability estimation for the AJT data was challenged by nonconvergence in seven out of 24 analyses, with items having no or little variance due to participants rating them as clearly highly acceptable or unacceptable (i.e., all/most participants selecting 6 or 1) or a negative loading or loading of 1 on a single factor involved in reliability computation. Because such rating patterns are theoretically valid, we retained all items for the main analyses (note also that item was modeled as a random effect). However, we generally needed to remove statistically "rogue" items when computing AJT instrument reliability estimates (for full details see "All reliability estimates" on the study's OSF page). The estimates summarized immediately below reflect these issues, offering a best possible (but imperfect) picture of AJT instrument reliability in the present study (for further information see "AJT and SPR instrument reliability summary interpretation" on the study's OSF page).

Procedure
In this section, we cover tasks in the order the participants encountered them (i.e., SPR then AJT). Following the procedure suggested in Keating and Jegerski (2015), we first administered the SPR task and then the AJT to avoid any possibility that participants would consciously notice the presence of ungrammatical items in the SPR, as the AJT may make them metalinguistically aware. As is common practice in psycholinguistic and SLA research, studies that measure both online processing and offline performance on tests of explicit knowledge administer the tests one immediately after another: implicit first, then explicit (e.g., Coughlin & Tremblay, 2013; Jegerski, 2012, 2015, 2016; Maie & Godfroid, 2022; Roberts & Liszka, 2013). We administered the two tasks with a 1- or 2-day difference at both research sites (the UK and Croatia), depending on whether a participant had completed the SPR on Day 1 or Day 2. The reason for spreading the administration of the SPR over two days was that each participant was tested individually, having to spend at least 15-20 min with the research assistant to read the information about the study, ask questions and receive answers, have the procedure explained, read and sign the ethical consent, and have practice before completing the task itself. We are confident that the split administration of the two tasks could not have had any effect on the results because, despite using the same sentences, the two tasks were entirely different.

SPR task
The SPR task was administered using the freely available Psychopy software (Peirce, 2007, 2009). Participants were instructed to read each sentence at a normal speed and press the space bar to proceed to the next word, with each word on the screen disappearing before the next word appeared. A centred noncumulative "stationary window" method was used, with the white experimental items/fillers text appearing on the black screen word by word until the end of the entire task. Only one word was visible at a time. After each set of sentences, an instruction appeared on the screen to remind participants what they were required to do.
Before starting the main task, participants read three items for practice to help them become familiar with the task. The practice items had a structure similar to that of the experimental items but were unrelated to the modals. For the main task, participants read through the 56 items that appeared in the general order of two experimental target items followed by one filler (a target-item-to-filler ratio of 2:1 for 32 target items and 1:1 for four target items), with 20 randomly appearing comprehension questions, 10 following the 36 target items (comprehension was tested 28% of the time) and 10 following the 20 fillers (filler comprehension was tested 50% of the time). Comprehension questions were unrelated to the use of modals to avoid interfering with target item processing (Roberts & Liszka, 2013).
To account for possible order effects (see also Data analysis), the experimental target items and fillers were counterbalanced into two versions (1 and 2), each encountered by only half of the participants. Thus, half of the participants encountered target items in the order 1-36 and fillers in the order 1-20 (Version 1), and the other half encountered target items in the order 36-1 and fillers in the order 20-1 (Version 2). If an item in Version 1 had a congruent modal, the same item in Version 2 had the incongruent modal and vice versa; otherwise, the two versions were identical. At the end of the SPR task, a sentence appeared explaining that this was the end and thanking participants.
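The counterbalancing scheme described above can be illustrated with a short Python sketch. This is not the authors' PsychoPy script: the function name `build_versions` is hypothetical, and the assignment of congruent modals to odd-numbered items in Version 1 is an assumption made purely for illustration, since the text does not say which items were congruent in which version.

```python
def build_versions(n_targets=36, n_fillers=20):
    """Build two counterbalanced stimulus versions (illustrative only)."""
    # Version 1: targets in ascending order. ASSUMPTION: odd-numbered items
    # carry the congruent modal; the actual assignment is not reported.
    v1_targets = [
        (item, "congruent" if item % 2 else "incongruent")
        for item in range(1, n_targets + 1)
    ]
    v1_fillers = list(range(1, n_fillers + 1))

    # Version 2: reversed order, with every item's congruency flipped.
    flip = {"congruent": "incongruent", "incongruent": "congruent"}
    v2_targets = [(item, flip[cond]) for item, cond in reversed(v1_targets)]
    v2_fillers = v1_fillers[::-1]

    return {1: (v1_targets, v1_fillers), 2: (v2_targets, v2_fillers)}


versions = build_versions()
```

With this layout, each item appears in both conditions across the two versions, so item identity and congruency are not confounded at the group level.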

AJT task
The AJT task was administered as a pen-and-paper task. First, verbal and written instructions were given. Participants were asked to ignore any spelling or punctuation mistakes and not to go back and change a response once decided. One example sentence (not involving a modal) was then provided, with a corresponding rating and underlining, as it had lower acceptability.
The AJT task used the same target items and fillers as the SPR task and was also counterbalanced into two versions, but it did not contain comprehension questions. Participants completed the same version encountered in SPR, with the difference that the AJT task presented all 56 items (36 target items and 20 fillers) in a single list, with either two target items followed by one filler (14 times) or one target item followed by one filler (6 times), whereas the last two items were target items. Target items and fillers were randomized within this ordering (e.g., target items did not follow the order 1-36).

Data analysis
To address RQ1 we ran ordinal mixed-effects regression analyses separately for each modal type (ability, epistemic, permission) using the clmm function in the ordinal package in R (Christensen, 2019). These analyses comprised six AJT models. We ran separate analyses for the L1 English participants (AJT models 1-3 for ability, epistemic, and permission, respectively) and L2 English participants (AJT models 4-6 for the respective modality types) and visually compared models, as this offered more nuanced and interpretable findings than entering "group" (L1/L2) as a predictor in an interaction with "congruency" (but interested readers can construct such analyses using the OSF R script). All analyses had one ordinal outcome variable, "rating" (responses spanned a 1-6 scale covering least to most acceptable). The AJT models 1-6 had one fixed effect, "congruency" with two levels (congruent, incongruent), sum-coded to compare the rating mean for a given level with the overall rating mean for both levels, with by-participant and by-sentence random slopes.
To address RQ2, we first collected all SPR target item reading times (RTs) calculated in milliseconds from Segment 0 (the word preceding the modal) to Segments 1, 2, 3, 4, 5, and 6 and responses to the 20 comprehension questions (10 following target items, 10 following fillers). As noted above (see Participants), four L1 English and seven L2 English participants with less than 90% comprehension accuracy (>2/20 responses incorrect) were removed before further analyses. Given that the L2 English participants were upper-intermediate proficiency, we applied the same comprehension accuracy standard as for the L1 English participants. These steps resulted in 40 L1 English participants and 35 L2 English participants retained for the main SPR analyses.
Before proceeding with main analyses, we conducted an item analysis to evaluate the scope and functioning of the items across both tasks.⁶ A qualitative reexamination of items 26 and 27 (adapted from Coates's corpus) revealed that although they would satisfy Bybee's definition of speaker-oriented modality, it was preferable to remove them from all SPR and AJT analyses because they are fixed phrases. Quantitatively, an examination of the L1 participants' AJT average ratings and item discriminability analyses revealed no systemic problems with the statistical function of items.
In line with recent methodological syntheses of L2 SPR methodology and outlier treatment in applied linguistics studies, we considered RTs in terms of their potential legitimacy and distribution rather than using standard deviation boundaries (Nicklin & Plonsky, 2020). First, we set the lower boundary for a legitimate RT at 150 ms, the point at which L1 magnetoencephalography (MEG) research suggests that lexicality (i.e., word form identification) likely begins, although findings vary (Hsu et al., 2011; Nicklin & Plonsky, 2020). Although the 150-ms boundary is derived from L1 MEG research, we also used it for the L2 English participants given their reasonably high proficiency and in the absence of equivalent L2 MEG research (Nicklin & Plonsky, 2020). The upper boundary was set at the commonly used 2,000-ms level for both L1 and L2 English participants, a potentially strict cutoff for mid and lower L2ers but reasonable, we argue, for upper-intermediate L2ers and considering that Nicklin and Plonsky (2020) found the median RTs in 18/19 L2 studies (some including sentence/phrase rather than word-by-word presentation) to lie below this boundary. These steps resulted in the removal of 26/9,520 L1 English RTs (0.27%) and 78/8,330 L2 English RTs (0.94%) after the exclusion of items 26 and 27. The distribution of the data was then checked, and for each SPR model, RTs were log transformed to reduce positive skew.
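The trimming and transformation steps can be sketched as follows. The study's own analyses were run in R (see the OSF script); this Python fragment is only an illustration. The function name `trim_and_log` is hypothetical, and treating the 150-ms and 2,000-ms boundaries as inclusive is an assumption, as is the use of the natural logarithm.

```python
import math

LOWER_MS = 150    # lower boundary for a legitimate RT (from the text)
UPPER_MS = 2000   # upper boundary (from the text)


def trim_and_log(rts_ms):
    """Drop RTs outside the boundaries, then log-transform the rest.

    ASSUMPTION: boundaries are inclusive and the natural log is used.
    Returns the log RTs and the number of removed observations.
    """
    kept = [rt for rt in rts_ms if LOWER_MS <= rt <= UPPER_MS]
    removed = len(rts_ms) - len(kept)
    return [math.log(rt) for rt in kept], removed


def percent_removed(n_removed, n_total):
    """Share of removed RTs as a percentage, rounded to two decimals."""
    return round(100 * n_removed / n_total, 2)


# Reproduce the removal rates reported in the text.
l1_rate = percent_removed(26, 9520)   # 0.27
l2_rate = percent_removed(78, 8330)   # 0.94
```

The reported totals are also consistent with the retained sample: 34 items × 7 segments × 40 participants gives 9,520 L1 RTs, and the same product with 35 participants gives 8,330 L2 RTs.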
For the main analyses, we ran linear mixed-effects regression analyses separately for each modal type (ability, epistemic, permission) using the lmer function in the lme4 R package (Bates et al., 2015). These analyses comprised six SPR models. Again, we ran separate analyses on the L1 English data (SPR models 1-3 for ability, epistemic, and permission modality, respectively) and L2 English data (SPR models 4-6 for the respective modality types), in each case looking at which sentence segments readers significantly slowed down on (and by how much) after encountering an incongruent/formally unmarked modal. This offered a more nuanced and interpretable set of findings compared with entering "group" as predictor in a three-way interaction with "congruency" and "segment" (again, interested readers are referred to the OSF R script). These analyses had one continuous outcome variable, "log RT," and two categorical fixed effect predictors entered as an interaction, "congruency" (sum-coded: congruent as 1, incongruent as -1) × "segment" (seven levels, coded [segment] 0, 1, 2, 3, 4, 5, and 6).
⁶ We thank the anonymous reviewers for suggesting this preliminary step before proceeding with further analyses.
https://doi.org/10.1017/S0272263123000475 Published online by Cambridge University Press
The buildmer R package (Voeten, 2022) was used to automatically identify optimal models based on which terms made significant contributions to log-ratio likelihood. Thus, for the SPR analyses, random effects were maximally specified as by-participant and by-sentence random intercepts and slopes for the congruency-segment interaction (Barr et al., 2013), with buildmer arriving at optimal, simpler structures, namely, the intercept and slope of congruency conditioned on "participant" for all models and on "sentence" for all models except SPR Model 6 (L2 participants, permission), for which only the intercept was used. Given the counterbalancing in our experimental design and random effects pertaining to items, we opted not to enter stimuli version as an additional fixed or random effect (cf. Mifka-Profozic et al., 2020).
For the AJT models, we report effect estimates and corresponding 95% confidence intervals (CIs), standard errors, degrees of freedom, z values (equal to the estimate divided by the standard error), p values, and, estimated independently from the models (Jegerski, 2018), standardized effect sizes (Cohen's d) and 95% confidence intervals for within-participants comparisons of mean ratings for congruent versus incongruent items by modal and participant type, calculated using the effsize R package taking into account within-participant variation (Torchiano, 2020). Both AJT and SPR effects (see below) are interpreted using Cohen's (1988) benchmarks of d = .20 (small), d = .50 (medium), and d = .80 (large) and by noting 95% confidence intervals that did not pass through zero, indicating a reliable effect (cf. Jegerski, 2018). As Plonsky and Oswald's (2014) L2-field-specific scale for within-group contrasts of d = .60 (small), d = 1.0 (medium), and d = 1.4 (large) was derived from a meta-analysis of a large number of pedagogical interventions, it had limited applicability given the current study's instrumentation, design, and specific focus.
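The interpretation rule described above (Cohen's 1988 benchmarks plus a confidence interval that excludes zero) can be expressed as a small Python helper. The function name `interpret_d` and the label "negligible" for values below .20 are illustrative choices, not taken from the study's R script; the sketch does not compute d itself, which the authors obtained with the effsize package.

```python
def interpret_d(d, ci_low, ci_high):
    """Label an effect size using Cohen's (1988) benchmarks.

    Returns (label, reliable), where `reliable` is True when the 95% CI
    does not pass through zero. The sign of d is ignored for the label.
    """
    size = abs(d)
    if size >= 0.80:
        label = "large"
    elif size >= 0.50:
        label = "medium"
    elif size >= 0.20:
        label = "small"
    else:
        label = "negligible"  # hypothetical label for d below .20
    reliable = ci_low > 0 or ci_high < 0
    return label, reliable


# Example with the L1 ability Segment 4 values reported in the Results:
# d = -0.92 [-1.24, -0.59]
result = interpret_d(-0.92, -1.24, -0.59)
```

Applied to the L1 ability Segment 4 effect, the helper returns a large and reliable effect, which matches the interpretation given in the Results.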
For the SPR models, we report effect estimates and corresponding 95% CIs, standard errors, degrees of freedom, Wald t values (the estimate divided by the standard error), p values, and standardized effect sizes (Cohen's d) and 95% confidence intervals, as described for the AJT, with the addition of by-segment analyses and consideration of Avery and Marsden's (2019) indicators of reliable SPR ambiguity resolution to interpret ability modal effect sizes (for L1 d = .23 [.13, .32]; for L2 d = .19 [.12, .25]) and anomaly detection to interpret epistemic modal effect sizes (for L1 d = .41 [.29, .54]; for L2 d = .19 [.09, .29]). As permission modals expressed a pragmatic function, Avery and Marsden's (2019) findings were not appropriate for interpreting effect sizes for this type of modal. Any by-segment results for SPR are interpreted from Segment 2 (the lexical verb immediately following the modal) onward.
To establish the statistical significance of regression estimates, the conventional alpha values of .05, .01, and .001 were lowered to .0083, .0017, and .00017 to reduce the chance of a Type I error (i.e., dividing these alpha values by six, given that in any analysis the highest order interaction had six parameters, any of which could support our predictions). The p values for estimates are thus interpreted against these adjusted alpha values. For simplicity, we report unadjusted 95% CIs of estimates rather than 99.167% CIs (i.e., the equivalent precision of 95% CIs adjusted for six repeated tests).
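The adjustment amounts to dividing each conventional alpha by the number of parameters in the highest order interaction. A brief Python check of the arithmetic follows; the variable names are illustrative.

```python
N_PARAMETERS = 6  # parameters in the highest order interaction (from the text)

conventional_alphas = [0.05, 0.01, 0.001]

# Divide each conventional alpha by six, as described in the text.
adjusted_alphas = [alpha / N_PARAMETERS for alpha in conventional_alphas]

# Confidence level matching the adjusted .05 threshold, in percent.
adjusted_ci_level = 100 * (1 - 0.05 / N_PARAMETERS)


def is_significant(p_value, alpha=adjusted_alphas[0]):
    """Compare a p value against the adjusted .05-level threshold."""
    return p_value < alpha
```

Rounded, the adjusted values are .0083, .0017, and .00017, and the corresponding confidence level is 99.167%, in line with the figures reported above. Under this rule, an estimate with p = .02 would not count as significant even though it falls below the conventional .05 level.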
For all models we used the r2 function in the performance R package (Lüdecke et al., 2021) to compute and report marginal and conditional R² to show the proportion of variance explained, respectively, by the fixed effects alone and combined fixed and random effects. The R² values were interpreted as small (.18), medium (.32), or large (.51) based on the amount of variance in AJT ratings and SPR log RTs explained (Plonsky & Ghanbar, 2018).

Results
In the Results section we cover tasks in the order of the RQs (AJT then SPR).

AJT analyses (RQ1)
In this section we report the results of the ordinal mixed-effects regression analyses of the AJT data (AJT models 1-6) focusing on model estimates that directly address RQ1 and Cohen's d effect sizes (see the study's OSF page for descriptive statistics and the R script to obtain the full list of model estimates).
Table 2 and Figure 1 show that both L1 and L2 groups rated congruent/formally marked items as significantly more acceptable than incongruent/formally unmarked items across all three modal types.
The estimates in Table 2 and Cohen's d effect sizes in Table 3 show that the effect of congruency (ability and epistemic modals) was stronger for the L1 English participants than for the L2 English participants, but for formality (permission modals) the effect was slightly stronger for the L2 participants. Items were rated significantly differently for all modal types by both participant groups, with Cohen's d effect size confidence intervals not passing through zero. Effects were (very) large for ability (L1 d = 9.64 [6.47, … L1 d = 11.15 [7.92, 14.37]; L2 d = 3.17 …
The medians and interquartile ranges contained in Figure 1 show that on average, both L1 and L2 participants rated congruent items on the upper half of the acceptability scale (i.e., 4-6) and incongruent items on the lower half of the scale (i.e., 1-3) for ability and epistemic possibility. Both groups rated both formally marked and unmarked items on the upper half of the scale for permission modals, with may significantly higher in each case.
To summarize, the analyses of the AJT data showed large effects for both the L1 and L2 English groups, with each rating congruent/formally marked items as significantly more acceptable than incongruent/formally unmarked items for all modals. For both L1 and L2 English participants, effects were most pronounced with ability and epistemic possibility. The two groups were most distant from one another in the way they rated epistemic possibility, which for L2ers showed a comparatively smaller (but still large) effect than for ability modals. For ability and epistemic possibility, the L1 English participants consistently provided more extreme ratings at either end of the rating scale than did the L2 participants. The L1 English participants rated both marked and unmarked items slightly higher on the whole (medians in the 5-6 range) than the L2 English participants did (medians in the 4-5 range).

SPR analyses (RQ2)
In this section we report the results of the mixed-effects regression analyses of the SPR data (SPR models 1-6) and Cohen's d effect sizes. For each model, we again focus on the fixed-effect estimates that directly address the research questions, namely the Congruency × Segment interaction (see the study's OSF page for descriptive statistics and the R script to obtain the full list of model estimates).
The estimates in Table 4 (visualized in Figure 2) and Cohen's d effect sizes in Table 5 show that both participant groups displayed sensitivity to semantic ambiguities with the modal denoting ability. Specifically, the L1 English participants had significantly slower log RTs for ambiguous sentences at Segment 4 (estimate = -0.109 [-0.136, -0.083], t = -8.095, p < .001, d = -0.92 [-1.24, -0.59]) and Segment 5 (estimate = -0.038 [-0.064, …

Table 4. Summary of fixed effect predictors of log RTs (outcome) for L1 English SPR data (SPR models 1-3) and L2 English SPR data (SPR models 4-6), significant predictors in bold, results interpreted from Segment 2 (the lexical verb immediately following the modal) onward

Note. _ = effect size not interpreted using this framework; blank space = effect size was negligible by this framework.
Cohen's d effect sizes show the slowdown on incongruent sentences was greatest for L1 ability Segment 4 (a large effect size with reliable sensitivity to semantic ambiguity), followed by L1 epistemic Segment 4 (medium with reliable sensitivity to grammatical violation), and L2 ability Segment 4 (medium with reliable sensitivity to semantic ambiguity). Table 5 shows that various other L1 ability, L1 epistemic, and L2 permission segments had small effects (Cohen, 1988), reliable sensitivity to grammatical anomaly and ambiguity indicated (Avery & Marsden, 2019), and confidence intervals not passing through zero, but as shown in Table 4 and Figure 2, the mixed-effects model estimates for these segments were not significant with the adjusted alpha value.
To summarize, analyses of the SPR data showed that for ability modals, L1 and L2 English participants were similarly sensitive to context-modal mismatches involving the incongruent may (SPR models 1 and 4) with an observable spillover effect commencing at Segment 4, which was large for L1ers and medium for L2ers. For epistemic modality, only the L1 English participants were sensitive to mismatches involving the incongruent can, with a small, observable spillover effect commencing in significance at Segment 3 (SPR model 2). For permission, no significant difference was shown in either group between the RTs associated with the use of either can or may.

Discussion
The present study sought to investigate the nature of L2 English knowledge for modals may and can and how this compares with that of L1 speakers. RQ1 asked how a match/mismatch between the context and modals expressing agent-oriented ability and epistemic possibility, as well as the formal context in speaker-oriented permission, affects offline acceptability judgement ratings for the two participant groups. RQ2 asked how a match/mismatch and the formal context affect reading times in online processing of sentences containing these modals, indirectly tapping into implicit knowledge.
The results revealed two major findings. The first is that L1 and L2 English speakers rate the acceptability of sentences containing modals can and may in an offline AJT similarly; that is, sentences containing a modal congruent with the context were rated significantly higher than sentences with the modal mismatching the context, with very large effect sizes observed. Likewise, the pragmatic use of may in formal situations was rated by both groups significantly higher than the use of can, with large effect sizes observed; however, can was still rated on the upper half of the scale, which suggests that the participants do not reject it as they do incongruent use of the modal mismatching the context for ability and epistemic possibility. The difference between the reliable acceptance of context/modal match and rejection of the mismatch referring to the modals expressing ability (can) and epistemic possibility (may) suggests that L2 learners exhibit a level of explicit knowledge related to the semantics of the two modals. A difference between the L1 and L2 English speakers can be seen in the levels of their certainty. The L1 speakers consistently used extreme ratings (with a smaller spread) in both acceptance and rejection, showing a high level of certainty. The L2 speakers, on the other hand, seem to be more hesitant and apply more cautious ratings (with somewhat more spread), although still generally in accord with the L1 group. This may be interpreted as characteristic of explicit knowledge, which is known to be variable and unsystematic in L2 learners (Ellis et al., 2009).
The second major finding of the study is that the L2 learners differed from native speakers in their processing of epistemic modality; that is, L2 learners did not show sensitivity to grammatical violation when can was used instead of may in a context suggesting epistemic possibility. However, they were sensitive to mismatches, like L1 speakers, when processing sentences using the modal may instead of can to express ability, with a large effect for L1ers and a medium effect for L2ers. This suggests that the upper-intermediate L2 learners have developed their representations of the semantics contained in the modal can, but this cannot be confirmed for the epistemic meaning encoded in the modal may.
The differential behavior of L2 speakers in the two types of task (AJT and SPR) is significant because it indicates that two different types of knowledge are being tapped into while rating sentence acceptability, on the one hand, and real-time processing, on the other. Our study sought to contribute an examination of modality to the discussion on implicit/explicit L2 knowledge. As such it is one of the rare studies that used graded acceptability judgments to tap into explicit linguistic knowledge. However, one may argue that modal knowledge cannot be acquired declaratively (i.e., by learning the pedagogical rules) because it is a different type of knowledge than knowledge of morphosyntactic features, which sometimes can be learned explicitly. Specifically, one may ask to what extent epistemic meaning can be taught or learned explicitly or declaratively.
Even though it is fair to say that epistemic meanings can be acquired only inductively or by experience, an important consideration is that it is not always possible to make a direct link between explicit knowledge and explicit learning or between implicit knowledge and implicit learning (DeKeyser, 2003). In other words, explicit knowledge does not have to be the product of only explicit teaching/learning. In Bialystok's (1994) model of learning built on the cognitive processes of analysis and control, implicit knowledge may become explicit through the process of analysis. Furthermore, as Bialystok explains, both the L1 and L2 develop through the same cognitive processes of analysis and control. Therefore, it should not be surprising to see a level of explicit knowledge relating to the areas of language that are usually acquired inductively by L2 learners or even to the natively acquired L1. A recent longitudinal study by Kim and Godfroid (2023) provided evidence of the development path from explicit to implicit knowledge in L2 learners. As for the L2 participants in the current study, it merits mentioning that they are students majoring in English who study language, thus modals and their usage are included in their curriculum. Language study also involves much practice with language, reading, writing, speaking, listening, etc. It is likely that all such experience will contribute to the development of explicit knowledge in the absence of any pedagogical rules and, with time, possibly to the development of implicit knowledge.
What matters here is the fact that in providing their acceptability judgments without time constraint, both L2 and L1 speakers were able to think about, analyze, and report their perceptions, be they built on declarative memory or drawn from native intuitions. Sprouse (2013) refers to AJTs as "consciously reported perceptions of acceptability that arise when native speakers attempt to comprehend a (spoken or written) utterance" (p. 97). Because the analyzed knowledge built on native intuitions is by definition more stable than acquired knowledge in a second language, it is not surprising that the L2 speakers demonstrated more variability in their acceptability judgments. In this case, the levels of certainty and variability may just reflect the levels of proficiency.
Evidence demonstrating a difference between L2 speakers' offline acceptability judgments and their online processing has previously been found in L2 studies conducted within the explicit/implicit framework (Jegerski, 2015; Jiang et al., 2011; Roberts & Liszka, 2013; Tokowicz & Warren, 2010). The majority of these studies focused on morphosyntax such as inflectional agreement and syntactic ambiguities such as garden-path sentences. The present study extends this research agenda to modality and, along with Roberts and Liszka (2013), contributes to the investigation of L2 acquisition of tense, aspect, and modality from the perspective of sentence processing. Our results suggest that implicit knowledge as a component of language competence can be developed relatively early (Tokowicz & Warren, 2010) for some modals and some meanings, namely, those expressing ability,⁷ but not for other meanings, namely, epistemic possibility. As the results show, native-like linguistic behavior can be seen in the L2 learners' processing of sentences containing an incongruent modal expressing ability/sensation, where both L1 and L2 groups experienced a slowdown caused by a mismatch between the modal and the context. However, L2 readers recovered sooner whereas L1 readers took more time for disambiguation. We explain this as evidence of a somewhat easier/quicker recovery from semantic ambiguity for L2 readers, which may suggest that L1 speakers, operating on their native linguistic competence, experience a greater disruption when encountering semantic ambiguity and it takes them longer to recover. Similarly, L1 speakers take longer to recover from syntactic violation, to which L2 speakers did not show sensitivity.
No significant change in reading time for either L1 or L2 speakers was observed in sentences using may or can for speaker-oriented meaning expressing permission. As previously pointed out, this function of the two modals is of a different nature than the meanings contained in their syntactic and semantic roles: speaker-oriented may and can denoting permission serve a pragmatic function. Our findings offer further evidence that for giving or asking permission, both modals are today used interchangeably. Although may has typically been considered pragmatically suitable in more formal settings, Leech's (2003) analysis of written and spoken corpora of British and American English found that the use of may for permission had declined over the 30-year period from 1961 to 1991, whereas the use of can in formal situations had increased. It is possible that the trend is present now, too.
In addition to the substantive findings, the study provides valuable data on instrument reliability in AJT and SPR stimuli (Marsden et al., 2018) and, adding to those in Mifka-Profozic et al. (2020), shows how estimates varied across participant and instrument features. It also sheds light on the psychometric properties and error associated with these types of instrumentation. For these purposes, the instrument reliability analyses benefitted from the application of superior coefficients to the much used (but often misapplied) Cronbach's alpha (McNeish, 2018; cf. Raykov & Marcoulides, 2019), steps we encourage other researchers to take.
For the main analyses, mixed-effects regression offered a more robust and nuanced approach to modeling the SPR data than analysis of variance (Plonsky & Oswald, 2017), and ordinal mixed-effects regression offered a powerful method for handling the ordinal AJT outcome variable, avoiding the need to (mis)treat this variable as continuous. The standardized effect sizes reported allowed us to consider the magnitude of effects alongside statistical significance, providing evidence that is less confounded by sample size in standardized measurement units (standard deviations) that enable systematic comparison across studies with different designs, foci, participants, and instrumentation (Avery & Marsden, 2019). In combination, this nuanced methodology helped us build on Mifka-Profozic et al.'s (2020) findings, extending the inquiry to establish how linguistic knowledge and online processing of English modals manifest across first and second languages for these types of learners.
⁷ Anecdotal evidence from EFL teachers suggests that, at least for Croatian L1 speakers, the modal can is learned very early, as one of the first verbs used.

Limitations
In conducting SPR in the current study, we made all efforts to strictly follow the procedures recommended by Keating and Jegerski (2015) and Jegerski (2012, 2014). However, we note several potential limitations. First, experimental items were designed to be as comparable as possible in terms of the number of syllables (Jegerski, 2014), but ensuring natural-sounding sentences meant there was not always perfect consistency. The first word in the sentence (Segment 0) was always either a personal pronoun or a name, the second word (Segment 1) was always the modal (may/can), the third word (Segment 2) was a one-syllable word in 34 out of 36 sentences (two words had two syllables), the fourth word (Segment 3) was a one-syllable word in 29 of the 36 target sentences, and the remaining seven words had two syllables. Nevertheless, as we controlled for random effects linked to items, we believe that these minor discrepancies in the number of syllables did not affect the results.
A second possible limitation is that each target item was not always followed by a filler or a comprehension question. We used a target-item-to-filler ratio of 2:1 for 32 target items and 1:1 for four target items, with 20 randomly appearing comprehension questions, 10 following the 36 target items and 10 following the 20 fillers. The reason for such a decision was that in our study each item consisted of three sentences rather than of isolated sentences as in most other SPR studies; thus, we reduced the number of fillers to avoid the participants' fatigue. Overall, we used a ratio of 36 target sentences (one in each item of three sentences) versus 108 nontarget sentences in experimental items plus 60 filler sentences, unrelated to the target items (168 in total). In making clear these potential limitations, we hope to aid future researchers seeking to achieve the optimal balance between item authenticity, coverage, and participant attention/fatigue.

Suggestions for further research
Well-attested syntactic and semantic complexities of the English modals, along with their complex pragmatic interpretations, contribute to L2 modal acquisition in various ways. However, we still do not know what the single contribution of each specific aspect may be. Therefore, further studies are needed on other modals to more fully understand how acquisition of these modals links to the type of knowledge developed. The participants in the present study were upper-intermediate speakers of L2 English, but research suggests that only highly proficient or near-native L2 speakers perform at the level comparable to that of L1 speakers in online processing (Jackson, 2008; Jegerski, 2016; Hopp, 2016; Hopp & Lemmerth, 2018). It is possible that the participants in the current study are still developing their comprehension of epistemic meaning while having achieved a fairly advanced level in other aspects of linguistic performance. Immersion may play a role in accelerating the acquisition. Thus, further investigation with highly proficient, near-native L2 users of English would be welcome.
A possible factor here is the late acquisition of epistemic meaning individually in both L1 and L2 (Giacalone Ramat, 1992; Granget et al., 2018; Ozturk & Papafragou, 2015; Traugott & Dasher, 2004) and diachronically from pragmatic and lexical means to grammaticalized forms (Bybee et al., 1994). If Giacalone Ramat's hypothesis regarding L2 modal acquisition is correct, longitudinal studies following L2 learners from their beginner to more advanced stages would be able to show in what ways L2 acquisition takes place. In this strand of research, online sentence processing could also reveal the stages that L2 learners go through to achieve L2 competence.

Table 1. Distribution of segments in target verb phrases
Note. a = agent-oriented modality; b = speaker-oriented modality.

Table 3. Cohen's d effect sizes for within-participants comparisons of mean ratings for congruent/formally marked versus incongruent/unmarked items by modal type for L1 English and L2 English participants with two interpretation frameworks

Table 5. Cohen's d effect sizes for within-participants comparisons of mean SPR RTs for raw congruent versus incongruent items by modal type and segment for L1 English and L2 English participants with three interpretation frameworks, results interpreted from Segment 2 (the lexical verb immediately following the modal) onward