Introduction
In early second language acquisition (SLA), learners face the simultaneous challenges of segmenting unfamiliar phonological forms, mapping words to meaning, and extracting morphosyntactic regularities from continuous, variable input. Although phonological encoding, lexical learning, and grammar acquisition have been widely studied in isolation, much less is known about how they interact during initial exposure to a novel language. How do learners form stable form–meaning mappings when phonological forms are unfamiliar, lexical boundaries unclear, and grammatical cues opaque?
A central difficulty in early second language (L2) learning is referential ambiguity: words often co-occur with multiple potential referents (Quine, 1960). One mechanism for resolving this ambiguity is cross-situational learning (CSL), which enables learners to track statistical co-occurrences between linguistic forms and referents across multiple exposures. CSL has been shown to support the acquisition of various lexical categories across development, including nouns (Yu & Smith, 2008), verbs (Scott & Fisher, 2012), and combinations of both (Monaghan et al., 2015). More recently, it has been extended to grammar learning: Rebuschat et al. (2021) developed a CSL paradigm in which participants were exposed to an artificial language containing nouns, verbs, adjectives, and case markers. Participants viewed two dynamic visual scenes and selected the one that matched an auditory sentence in the artificial language. Crucially, no explicit feedback was provided, and participants were unaware that individual trials targeted different linguistic features (e.g., nouns, verbs, adjectives, case markers). This paradigm allowed researchers to test understanding of different linguistic features in a controlled, incidental learning environment. While lexical forms were learned relatively successfully, case marker acquisition remained at chance level, highlighting the difficulty of acquiring low-salience morphological features without explicit instruction.
In the present study, we adapt the paradigm introduced by Rebuschat et al. (2021) with two key innovations. First, we systematically manipulate the perceptual salience of case markers to test their impact on morphosyntactic learning. Second, we introduce L2-sounding pseudowords, thereby simulating early stages of L2 acquisition and increasing phonological variability. This approach reduces reliance on familiar lexical templates and aligns the learning task more closely with the demands faced by L2 learners in real-world contexts.
This design allows us to investigate whether CSL can support the joint acquisition of novel sounds, words, and grammatical structure in a typologically rich but under-researched language. While recent findings suggest that CSL can support L2 noun learning (Escudero et al., 2022; Ge, Monaghan, & Rebuschat, 2025a), this is, to our knowledge, the first study to systematically test whether cross-situational input is sufficient for acquiring an artificial language that integrates L2 phonology, lexical semantics, and morphosyntax.
The role of CSL in language learning
CSL paradigms offer a powerful tool for studying how learners extract meaning from ambiguous input. Typically, participants are presented with two novel referents and one or more pseudowords, requiring them to track word–referent co-occurrences across trials to infer correct mappings. CSL thus operationalizes statistical learning under referential ambiguity, a core challenge in naturalistic language learning.
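The co-occurrence tracking described above can be sketched as a simple counting procedure. This is a minimal illustration, not a model of human learning and not part of the study's method: the function name `learn_mappings`, the toy words, and the referent labels are all hypothetical.

```python
from collections import defaultdict


def learn_mappings(trials):
    """Count word-referent co-occurrences across trials and map each
    word to the referent it co-occurred with most often."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in trials:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    return {word: max(refs, key=refs.get) for word, refs in counts.items()}


# Hypothetical toy trials: each trial pairs the words heard with the
# referents in view. No single trial disambiguates any word.
trials = [
    (["blick", "dax"], ["alien_A", "alien_B"]),
    (["blick", "wug"], ["alien_A", "alien_C"]),
    (["dax", "wug"], ["alien_B", "alien_C"]),
]
mappings = learn_mappings(trials)
```

Each individual trial is ambiguous, but every word co-occurs with its own referent twice and with each competitor only once, so the counts converge on the intended mappings.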
In relation to SLA, CSL has been shown to support the acquisition of nouns containing L2 phonology, although accuracy tends to be lower when compared to learning in native-like phonotactic contexts (Escudero et al., 2022). This capacity remains across the lifespan, persisting into late adulthood (Ge et al., 2025b). Recent studies further demonstrate that CSL is robust enough to support the learning of words that differ in difficult L2 phonological contrasts (e.g., Escudero et al., 2022; Ge et al., 2025a), supporting its potential as a model for early stages of phonolexical acquisition. However, existing research has focused almost exclusively on isolated word learning, leaving open the question of whether CSL can support more complex aspects of language learning, such as morphosyntactic development.
To address this, Rebuschat et al. (2021) developed a sentence-level CSL paradigm, described above, which captures a more naturalistic form of incidental learning. Studies using this paradigm have shown that learners can acquire multiple levels of linguistic structure simultaneously (Monaghan et al., 2021; Walker et al., 2020), with learning outcomes that mirror developmental sequences in natural language acquisition, most notably the earlier acquisition of content words relative to function morphemes (Rebuschat et al., 2021). Furthermore, these effects are not transient: learning persists over time and correlates with memory systems known to underlie naturalistic SLA (Walker et al., 2020).
A major challenge in L2 acquisition, particularly in immersive contexts, is unfamiliar phonology. While novel form representations may be encoded rapidly, their precision often declines when L2 phonemes diverge from the learner’s L1 inventory. This can lead to underspecified or fuzzy representations (Bordag et al., 2022), particularly when multiple L2 sounds are assimilated to a single L1 category (for example, English /æ/ and /ɛ/ both mapping onto German /ɛ/; Llompart, 2021). Such fuzziness disrupts the mapping between phonological and semantic representations and introduces processing delays, comprehension errors, and reduced learning efficiency.
Against this backdrop, the first aim of the current study is to investigate whether CSL can support the simultaneous acquisition of L2 words and grammar from continuous sentence input, using a phonology that differs typologically from participants’ L1. By incorporating L2-like pseudowords and longer speech streams, we seek to approximate early L2 acquisition in ecologically valid learning conditions—especially those that characterize immersion environments, where learners face high referential ambiguity and minimal explicit instruction. Additionally, we examine the phonological granularity of the resulting lexical representations, testing whether learners encode fine phonemic detail in their newly formed L2 words and how this relates to semantic accuracy. This approach allows us to bridge research on CSL, phonolexical encoding, and morphosyntactic development within a unified experimental framework.
The role of salience in the acquisition of L2 morphology
Morphological case marking remains one of the most persistent challenges in adult language learning. Even intermediate-level learners often struggle to use case consistently (Haznedar, 2008), a pattern replicated in artificial language learning studies (e.g., Fedzechkina et al., 2017; Kenanidis et al., 2023; Rebuschat et al., 2021). Several factors contribute to the difficulty of acquiring L2 morphology, including low perceptual salience, redundancy of form–function mappings, lack of transparency, and crosslinguistic transfer (Ellis, 2017).
In this study, we focus on perceptual salience, defined as an intrinsic property of the linguistic form itself, distinct from contextual, semantic, or learner-related notions of salience (Boswijk & Coler, 2020; Knell et al., 2025). Goldschneider and DeKeyser (2001) propose that, in the domain of morphology, perceptual salience can be operationalized via segmental length, syllabicity, sonority, stress, and positional prominence. Their meta-analysis of English L2 morphology acquisition showed that perceptual salience was the strongest predictor of acquisition order (r = .63), underscoring its foundational role.
Theoretical models of SLA have long emphasized the role of attention in form learning. According to the Noticing Hypothesis (Schmidt, 1990), learners must consciously attend to linguistic forms for these to be integrated into developing interlanguage systems. Increased perceptual salience is thought to enhance noticing by making relevant cues more available to awareness (Williams & Rebuschat, 2023). Supporting this view, Simoens et al. (2017) manipulated case marker salience in a semi-artificial language during meaning-focused versus grammar-focused reading tasks. Highly salient markers were skipped less often and elicited longer fixations in ungrammatical contexts, especially under explicit learning conditions, indicating greater attention allocation and deeper processing. These effects were not observed for low-salience markers, suggesting that form characteristics themselves modulate perceptual uptake.
In a recent paper, Kenanidis et al. (2024) provided evidence from a fully artificial language learning paradigm which further suggests that increasing the perceptual salience of morphological markers can promote learning even in the absence of explicit instruction. However, this study conflates salience with marker number: the high-salience condition used a single agentive marker (-pazz), whereas the low-salience condition featured two markers (-i and -o). This limits the interpretability of its findings.
The second aim of the present study was therefore to isolate and test the effect of perceptual salience on L2 morphological acquisition under tightly controlled conditions. Using an artificial language learning paradigm, we systematically manipulated the salience of two morphological case markers (agent and patient) across four phonological dimensions: length, syllabicity, stress, and sonority. Low-salience markers (/-l/, /-ɾ/) were monosegmental, non-syllabic, unstressed, and contained liquid consonants, which are closest to vowels in the sonority scale except for glides (Clements, 1990; Selkirk, 1984; see Footnote 1). In contrast, high-salience markers (/-ˈka/, /-ˈʃi/) were multisegmental, syllabic, stressed, and composed of phonologically distinct consonant–vowel sequences including obstruents, which are substantially more distant from the preceding vowel on the sonority scale. This design allows us to test whether increasing perceptual salience enhances incidental morphological learning without conflating it with marker frequency, function, or form complexity.
By embedding this manipulation within a meaning-focused cross-situational learning task, we directly assess whether perceptual salience alone can facilitate the acquisition of otherwise challenging grammatical forms in an L2 learning context.
The present study
This study’s design, hypotheses, and analysis plan were preregistered on OSF (https://osf.io/a258c).
We exposed adult native English speakers to an artificial language based on the phonotactics and morphology of Portuguese, a global language that remains underrepresented in SLA research (Plonsky, 2023). The artificial language included nouns, verbs, adjectives, and morphological case markers that varied in perceptual salience (see previous section), and the sentences had a verb-final structure. Learners were exposed to the language through a CSL paradigm in which they mapped spoken sentences to animated scenes across multiple exposures. Comprehension of vocabulary was tested within the CSL task itself, while morphosyntactic awareness was assessed using a grammaticality judgment task. To evaluate the phonological specificity of lexical representations, we included a word–picture matching (WPM) task featuring phonological lures, followed by an AX phoneme discrimination task aimed at four phonological contrasts of Portuguese that are known to be challenging for L1-English learners.
Our preregistered hypotheses focus on the effect of salience on L2 morphology acquisition. Our main hypothesis was that participants exposed to high-salience case markers would outperform those in the low-salience condition in their comprehension of case morphology (H1). In addition to improved grammatical performance, we expected salience to yield knock-on effects on lexical learning. Specifically, we hypothesized that the high-salience group would show higher overall accuracy across the blocks (H2), more accurate acquisition of nouns (due to co-occurrence with salient markers; H3), and more accurate acquisition of verbs (due to enhanced understanding of argument structure; H4). Similarly, we hypothesized that the high-salience group would show better detection of word order violations (due to clearer segmentation of morphemes; H5) and of case marker violations (because the greater salience should make the structure more prominent; H6). The potential effect of salience condition on adjective learning was explored without a directional hypothesis.
Additionally, we preregistered the analyses of the WPM and AX tasks as exploratory. These tasks measured the phonetic detail of the acquired representations and their relation to phoneme discrimination ability. In particular, we used the WPM task to investigate whether the lexical representations formed during CSL are phonologically specific or fuzzy. We expected that learners would reliably reject pseudowords with phonological substitutions that are substantially different from the learned forms, but would struggle to reject minimally different lures in which the substituted sounds are expected to be confusable, suggesting that the phonolexical representations formed during CSL remain partially underspecified. We also asked whether individual differences in phonolexical precision are linked to L2 phoneme discrimination abilities. Here, we expected that learners who were better able to discriminate challenging L2 phonological contrasts in the AX task would also show greater accuracy in rejecting 1-feature mismatches in the WPM task (which corresponded to the same contrasts tested in the AX task), indicating that more precise phonetic sensitivity supports more accurate phonolexical encoding.
Regarding the question of whether adults can acquire novel words and grammar simultaneously by tracking cross-situational statistics even when the language has a non-native phonology, we expected participants to successfully learn both lexical items (particularly nouns and verbs) and grammatical structure (e.g., case assignment and word order), despite the L2 phonological properties of the input.
Methods
Participants
One hundred and sixty participants (mean age = 38.2, SD = 11.4; 83 female) took part in the study in exchange for financial compensation. Half of the participants (n = 80) were assigned to the high-salience condition and the remaining half (n = 80) to the low-salience condition. All participants were self-identified English monolinguals, and none reported knowledge of Portuguese or any verb-final language. Participants were recruited using the online platform Prolific (www.prolific.co). The study was approved by the ethics review panel of the Faculty of Arts and Social Sciences of Lancaster University and conducted in accordance with the Declaration of Helsinki. All participants provided informed consent before their participation.
Sample size was determined through a simulation-based power analysis of the planned mixed-effects model for the main effect of salience, which, based on previous studies, was estimated to produce a difference of around 10% in case marker accuracy by the end of the study. This estimation was based on data from two artificial language experiments. From Kenanidis et al. (2024), we compared accuracy in the last block of grammar test trials between a group with only one salient patient marker (-pazz, 76.44%) and one with two less salient patient and agent case markers (-i and -o, 62.67%). From Zhu et al. (2025), we compared accuracy in the last block of number marker test trials between three groups differing in the salience of number markers (high salience [plural and singular markers]: 60%; low salience [plural and singular markers]: 43%; low salience [plural] and no marker [singular]: 50%). After running 100 simulations, the estimated power exceeded .80 for the main effect of salience with 80 participants per condition, i.e., 160 in total.
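The general logic of a simulation-based power estimate can be sketched as follows. This is a deliberately simplified stand-in: it replaces the planned mixed-effects model with a two-sample z-test on participant-level accuracy and ignores participant and item random effects, so it is likely to be optimistic. The accuracy values (0.60 vs. 0.70), the 16 trials per participant, the seed, and the function name `simulate_power` are illustrative assumptions, not the parameters of the preregistered simulation.

```python
import random
import statistics


def simulate_power(n_per_group=80, n_trials=16, p_low=0.60, p_high=0.70,
                   n_sims=100, seed=1):
    """Estimate power as the share of simulated experiments in which the
    group difference in mean accuracy is significant (|z| > 1.96)."""
    rng = random.Random(seed)

    def group_scores(p):
        # One accuracy score per participant: proportion correct over n_trials.
        return [sum(rng.random() < p for _ in range(n_trials)) / n_trials
                for _ in range(n_per_group)]

    significant = 0
    for _ in range(n_sims):
        low, high = group_scores(p_low), group_scores(p_high)
        se = (statistics.variance(low) / n_per_group
              + statistics.variance(high) / n_per_group) ** 0.5
        if se == 0:
            continue  # degenerate sample; counted as non-significant
        z = (statistics.mean(high) - statistics.mean(low)) / se
        if abs(z) > 1.96:
            significant += 1
    return significant / n_sims


estimated_power = simulate_power()
```

Setting `p_high` equal to `p_low` gives a quick check of the false-positive rate under this simplified test.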
Materials: Artificial language
Participants were trained and tested on an artificial language designed to incorporate novel phonological, lexical, and grammatical features. The language consisted of 14 disyllabic pseudowords: 8 nouns (each referring to a unique alien character), 4 verbs (representing transitive actions), and 2 adjectives (corresponding to alien colors). In addition, two obligatory case markers were implemented as suffixes on nouns to indicate grammatical roles (agent and patient). All pseudowords adhered to Portuguese phonotactics and followed a CVCV structure, with either the first vowel or the second consonant corresponding to one of four L2 phonemes: /ʎ/, /ɲ/, /e/, or /o/.
The perceptual salience of case markers was systematically manipulated (high-salience: /-ˈʃi/ and /-ˈka/, low-salience: /-l/ and /-ɾ/). Details of this manipulation are provided in the introduction. To minimize potential sound–meaning biases, four different pseudoword-to-referent mappings were created and randomly assigned to participants. The same lexicon was used consistently across all learning and testing phases. All stimuli are available in the supplementary materials.
The pseudowords were recorded in isolation by a female native speaker of European Portuguese. Noun + suffix combinations were recorded as single lexical items. Recordings were made using a Tascam DR-22WL digital recorder and a Monacor HSE-130/SK head-mounted microphone at 44.1 kHz, 16-bit resolution. A 250 ms silence was appended to the end of each file using Audacity® (Audacity Team, 2024). Sentences were constructed through concatenation (see Footnote 2) of these pseudowords in Gorilla Game Builder.
The artificial language used two sentence structures: SOV and OSV. Optional pre-nominal adjectives occurred in half of the noun phrases and were counterbalanced across subject and object positions. Both nouns in each sentence were always marked for case. Sentence structures followed those of Rebuschat et al. (2021), with vocabulary testing expanded to include more trials. In total, 192 training (exposure) sentences and 128 vocabulary test sentences were constructed. Eight fully counterbalanced stimulus lists were created by crossing the four pseudoword–referent mappings with the two salience conditions.
Example sentences for the high- and low-salience conditions are shown in (1) and (2) respectively:
(1) / (tiʎu) meguˈʃi (piɲu) kiʎuˈka suʎu /
(2) / (tiʎu) megul (piɲu) kiʎuɾ suʎu /
    (red) megu-ACC (blue) kiʎu-NOM jump-over
    “The (blue) kiʎu is jumping over the (red) megu.”
For the grammaticality judgment task, an additional 48 sentences were constructed: 24 grammatical and 24 ungrammatical. Ungrammatical items contained either incorrect verb placement (as in Rebuschat et al., 2021) or incorrect case marker assignment (newly introduced). See Table 1 for a summary of grammatical and ungrammatical structures.
Table 1. Grammatical and Ungrammatical Sentence Types Used in the Grammaticality Judgment Task

Sentence Type    Violation Type (if applicable)    Syntactic Pattern     Nr of Trials
Grammatical      n/a                               NP_NOM NP_ACC VP      12
Grammatical      n/a                               NP_ACC NP_NOM VP      12
Ungrammatical    Verb placement error              VP NP_ACC NP_NOM      2
Ungrammatical    Verb placement error              VP NP_NOM NP_ACC      2
Ungrammatical    Verb placement error              NP_NOM VP NP_ACC      2
Ungrammatical    Verb placement error              NP_ACC VP NP_NOM      2
Ungrammatical    Case misassignment                NP_ACC NP_ACC VP      2
Ungrammatical    Case misassignment                NP_NOM NP_NOM VP      2
Ungrammatical    Case marker omission              NP- NP_ACC VP         2
Ungrammatical    Case marker omission              NP_ACC NP- VP         2
Ungrammatical    Case marker omission              NP_NOM NP- VP         2
Ungrammatical    Case marker omission              NP- NP_NOM VP         2

Note. NP_NOM and NP_ACC denote noun phrases carrying the nominative and accusative marker, respectively; NP- denotes a noun phrase with the case marker omitted.
The visual referents used for the CSL task were identical to those developed by Rebuschat et al. (2021). They included eight distinct cartoon-style alien characters, each rendered in either red or blue, and animated using Gorilla Game Builder to perform one of four transitive actions: hiding behind, jumping over, pushing, or lifting another character.
Tasks and procedure
Participants completed the study online via Gorilla (www.gorilla.sc). After providing informed consent, they filled out the revised version of the Language and Social Background Questionnaire (LSBQ; Anderson et al., 2018) to report their linguistic and educational background. The experimental session consisted of four tasks, presented in fixed order: (1) the cross-situational learning (CSL) task, (2) the grammaticality judgment task (GJT), (3) the word–picture matching (WPM) task, and (4) the AX phoneme discrimination task. Finally, participants provided a brief retrospective verbal report to assess their awareness of the morphological case markers. The full experiment lasted approximately 70 minutes. All materials and data are available via the OSF repository (https://osf.io/zm9q6).
Cross-situational learning task
Each CSL trial presented participants with two animated scenes side by side, accompanied by a spoken sentence in the artificial language. Participants were instructed to select the scene that best matched the sentence by pressing “Q” (left scene) or “P” (right scene) on their keyboard. The only instruction given was: “Choose the scene that matches the sentence”.
The CSL task consisted of two trial types: exposure trials, designed to promote learning, and test trials, designed to assess learning.
Exposure trials. In exposure trials, the two scenes differed in two or more aspects (e.g., both the verb and the identity of the agent). This preserved referential ambiguity while enabling participants to track statistical regularities across trials. For example, one scene might show a red alien pushing a blue alien, while the other shows a blue alien lifting a red alien. In this way, multiple cues (e.g., verb, agent, patient, color) co-varied, allowing participants to incrementally associate spoken forms with their meanings through cross-situational inference. See Figure S1 in the supplementary materials for an example.
Test trials. Test trials were identical in structure and appearance to exposure trials but served to probe what participants had learned. The critical difference was that in test trials, the two scenes differed in only one specific aspect, creating a minimal pair. This manipulation allowed us to isolate participants’ knowledge of particular lexical or grammatical elements. No indication was given that testing was taking place.
There were four types of test trials. In noun trials, the two scenes differed only in the identity of one alien character (subject or object). In verb trials, the scenes depicted the same characters but performing different actions. In adjective trials, one alien differed in color (red vs. blue). Finally, in marker trials, the agent–patient roles were reversed, keeping all other elements constant. For example, in a marker test trial, one scene might show a red alien pushing a blue alien, and the other a blue alien pushing a red alien—differing only in thematic role assignment. See Figure S2 in the supplementary materials for examples.
Structure. The CSL task comprised 8 exposure blocks and 4 mixed blocks, with each mixed block including both exposure and test trials. This structure enabled us to monitor the trajectory of learning while maintaining uniform task presentation.
Each exposure block contained 16 exposure trials. Mixed blocks added a further 16 exposure trials and 32 test trials each, yielding a total of 192 exposure trials and 128 test trials across the task. Among the exposure trials, 170 sentences were unique and 11 were repeated once (never consecutively).
Test trials were distributed as follows per mixed block: 16 marker trials, 8 noun trials, 4 verb trials, and 4 adjective trials. Marker test trials included 32 unique sentences, each presented twice (non-consecutively) within a block, resulting in 64 marker items overall. The remaining test trials (noun, verb, adjective) comprised 56 unique sentences and 4 repeated ones. None of the test sentences had occurred during exposure.
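The trial counts reported above can be checked with a few lines of arithmetic. The variable names are ours; all numbers come from the description of the task structure.

```python
EXPOSURE_BLOCKS = 8
MIXED_BLOCKS = 4
EXPOSURE_PER_BLOCK = 16  # exposure trials in every block, mixed or not
TEST_PER_MIXED_BLOCK = {"marker": 16, "noun": 8, "verb": 4, "adjective": 4}

# Exposure trials occur in both exposure blocks and mixed blocks.
exposure_total = (EXPOSURE_BLOCKS + MIXED_BLOCKS) * EXPOSURE_PER_BLOCK

# Test trials occur only in mixed blocks.
test_per_block = sum(TEST_PER_MIXED_BLOCK.values())
test_total = MIXED_BLOCKS * test_per_block
test_by_type = {kind: n * MIXED_BLOCKS
                for kind, n in TEST_PER_MIXED_BLOCK.items()}

# Sentence-level breakdown: unique sentences plus sentences shown twice.
exposure_sentences = 170 + 2 * 11   # unique + repeated once
marker_sentences = 2 * 32           # 32 unique, each presented twice
vocab_sentences = 56 + 2 * 4        # noun, verb, and adjective trials
```

The totals are 192 exposure trials and 128 test trials, of which 64 are marker trials and 64 are noun, verb, or adjective trials.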
This design allowed for a fine-grained assessment of vocabulary and morphosyntactic learning while keeping the format and appearance of test trials identical to exposure trials. Figure 1 provides a visual overview of the CSL structure.
Figure 1. Structure of the cross-situational learning (CSL) task. The task consisted of 8 exposure blocks and 4 mixed blocks combining exposure and test trials. Each exposure block included 16 exposure trials. Each mixed block included 16 exposure trials and 32 test trials targeting vocabulary and case marker learning.
[Figure description: the blocks are arranged in two columns, each read from top to bottom. Each column contains two exposure blocks (16 exposure trials each), a mixed block (labeled “16 exposure trials plus 32 vocabulary test trials”), two more exposure blocks, and a final mixed block. A horizontal arrow labeled “Optional break” connects the two columns.]
Across all blocks, sentence presentation was balanced for word order, the presence of adjectives, noun and verb frequency, and agent–patient roles. Participants were not informed of any properties of the language and received no explicit instruction or feedback at any point.
Grammaticality judgment task
We used a grammaticality judgment task (GJT) to assess participants’ knowledge of morphosyntactic patterns after the CSL task. In each trial, participants viewed a single animated scene while listening to a spoken sentence and judged whether the sentence was grammatical or ungrammatical. Responses were made by pressing “Q” (“weird”) or “P” (“good”) on the keyboard. See Figure S3 in the supplementary materials for an example trial. The task included 48 trials, presented in randomized order. Half of the sentences were grammatical and counterbalanced for word order; the remaining half were ungrammatical, featuring violations of case marking or word order (see Table 1). None of the sentences had been presented during the CSL task.
Word-picture matching (WPM) task
The WPM task was used to assess the phonetic specificity of lexical representations formed during the CSL task. In particular, it tested whether participants encoded fine-grained phonological detail for the L2 phonemes /ʎ/, /ɲ/, /e/, and /o/, or whether their representations were fuzzy and underspecified. It also provided an additional measure of form–meaning mappings acquired during learning.
During the task, participants saw an isolated visual referent (noun, adjective, or verb) from the CSL task for 2 seconds, followed by a spoken pseudoword. They were instructed to judge whether the spoken word matched the referent learned during CSL. Responses were recorded via keypress: “Q” for “incorrect word” and “P” for “correct word”.
The spoken stimuli comprised four conditions designed to assess the phonological specificity of lexical representations. In the match condition, the pseudoword was the correct label for the visual referent learned during the CSL task (e.g., /kiʎu/ for the alien). The one-feature mismatch (1FM) condition presented a pseudoword that differed from the target by a single phonological feature, forming one of four challenging L2 contrasts: /l/–/ʎ/, /n/–/ɲ/, /e/–/ɛ/, and /o/–/ɔ/ (e.g., /kilu/). In the three-feature mismatch (3FM) condition, the pseudoword differed from the target by three phonological features (e.g., /kiʃu/). Specifically, for the consonant contrasts, the 1FM pseudowords differed from the target only in place of articulation, whereas the 3FM pseudowords differed in place of articulation, manner of articulation, and voicing; for the vowel contrasts, the 1FM pseudowords differed only in vowel height, whereas the 3FM pseudowords differed in vowel height, tongue position, and rounding (see Supplementary Materials, Table S3). Finally, the control condition used a pseudoword that corresponded to a completely different referent, unrelated to the one presented (e.g., /diɲu/).
To ensure participants were familiar with the referents in isolation (as they had only encountered them in sentence context during CSL), three introductory screens presented all referents by category: alien characters (black and white line drawings), color squares, and animated actions. Each screen was accompanied by a reminder (e.g., “During this experiment, you have learned the words for these [aliens/colors/actions]”).
The task comprised 84 trials presented in random order. Each correct mapping was presented three times to balance the number of trials across conditions. An example of the visual layout is shown in Figure S4 in the supplementary materials.
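One way to arrive at the 84 trials is shown below. The text states only the total and the three presentations of each correct mapping; that every pseudoword appears exactly once in each of the three non-match conditions is our assumption, chosen because it reproduces the reported total.

```python
N_PSEUDOWORDS = 14  # 8 nouns + 4 verbs + 2 adjectives

# Presentations per pseudoword in each condition.
# The value of 1 for the three non-match conditions is an assumption.
PRESENTATIONS = {"match": 3, "1FM": 1, "3FM": 1, "control": 1}

trials_per_condition = {cond: N_PSEUDOWORDS * n
                        for cond, n in PRESENTATIONS.items()}
total_trials = sum(trials_per_condition.values())
non_match_trials = total_trials - trials_per_condition["match"]
```

Under this assumption, the 42 match trials equal the 42 non-match trials, so correct ("P") and incorrect ("Q") responses are equally frequent.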
AX discrimination task
Given the similarities between the L2 sounds in the match and 1-feature mismatch conditions of the WPM task (e.g., [kiʎu]-[kilu]), we expected low accuracy in detecting the mismatch when accessing the newly acquired representations. The purpose of the AX discrimination task was to determine whether participants could perceptually distinguish between these challenging L2 contrasts—and thereby assess the extent to which low accuracy in the WPM task stemmed from difficulty in perceiving these phonological differences.
The task, adapted from Ge et al. (2024), involved 12 disyllabic pseudoword minimal pairs, with three items per contrast: /l/–/ʎ/, /n/–/ɲ/, /e/–/ɛ/, and /o/–/ɔ/. Pseudowords were sourced from Correia et al. (2025). They were recorded by three native Portuguese speakers (two female and one male), including the speaker who had recorded the CSL stimuli. Most pseudowords were novel and had not appeared in the previous tasks, with the exception of three items (two from the /n/–/ɲ/ contrast and one from the /o/–/ɔ/ contrast). In each pair, the two pseudowords were always produced by different speakers, and each minimal pair was recorded twice, using different speaker combinations.
During the task, participants listened to a pair of pseudowords and judged whether the two items sounded the same or different by clicking on “SAME” or “DIFFERENT.” The task comprised 48 trials presented in randomized order. Half of the trials contained identical pairs (e.g., [kiʎu]–[kiʎu] or [kilu]–[kilu]), and half contained different pairs (e.g., [kiʎu]–[kilu] or [kilu]–[kiʎu]).
Retrospective verbal reports
At the end of the experiment, participants completed a short retrospective questionnaire to assess their awareness of the case-marking system in the artificial language (Rebuschat, Reference Rebuschat2013). They were asked open-ended questions about the strategies they used during the CSL task, whether their approach changed over time, and whether they noticed any patterns or regularities in the language, such as recurring word endings (e.g., “-L” and “-R”). To quantify awareness, we calculated an awareness score for each participant. One point was awarded for each marker correctly identified as a suffix during the first half of the CSL task, and .5 points for markers identified during the second half, yielding a maximum possible score of 2.
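The scoring rule above can be sketched as a small function. This is only an illustration in Python, not the authors' script: the marker labels, the function name, and the rule that a marker reported in both halves is credited once at the first-half value are assumptions.

```python
# Hypothetical labels for the two case markers (assumption, not from the study materials).
MARKERS = ("subject", "object")

def awareness_score(noticed_first_half, noticed_second_half):
    """Return an awareness score between 0 and 2.

    Each argument is the set of markers a participant reported having
    identified as suffixes in that half of the CSL task.
    """
    score = 0.0
    for marker in MARKERS:
        if marker in noticed_first_half:
            score += 1.0   # identified during the first half
        elif marker in noticed_second_half:
            score += 0.5   # identified only during the second half
    return score

# A participant who noticed one marker early and the other late.
example = awareness_score({"subject"}, {"object"})
```

With two markers, the maximum of 2 is reached only when both are reported as noticed during the first half.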
Analysis
Pre-registered analyses
All analyses were conducted in R. For all mixed-effects models, the random-effects structure was kept as maximal as possible without causing convergence issues (see supplementary materials for full model specifications).
To explore H1, which stated that the high-salience condition would outperform the low-salience condition in case marker learning, we conducted a generalized linear mixed-effects model with a logit link function (using the lme4 package; Bates et al., Reference Bates, Mächler, Bolker, Walker, Christensen, Singmann and Dai2015). The dependent variable was accuracy on case marker test trials in the final block of the CSL task (1 = correct, 0 = incorrect). The model included salience condition as a fixed effect, with random intercepts for participants and items. We refer to this model henceforth as GLMER1.
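One way to write down the structure described for GLMER1 is given below. The notation is ours and serves only as a sketch; the full specification is in the supplementary materials.

```latex
\operatorname{logit} P(y_{ij} = 1) = \beta_0 + \beta_1 \, s_i + u_i + w_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad w_j \sim \mathcal{N}(0, \sigma_w^2)
```

Here $y_{ij}$ is the accuracy of participant $i$ on item $j$, $s_i$ is the salience condition, and $u_i$ and $w_j$ are the random intercepts for participants and items. With salience coded as high = .5 and low = −.5, $\beta_1$ is the high-minus-low difference in log-odds of a correct response, and $\beta_0$ is the log-odds averaged over the two conditions.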
To explore H2, which stated that perceptual salience of case markers would influence overall learning across the CSL task, we conducted a generalized linear mixed-effects model with accuracy on all exposure trials as the binary dependent variable (1 = correct, 0 = incorrect). Fixed effects included salience condition, block (1–12), and their interaction. The model included random intercepts for participants and items, and random slopes for block over participants. We refer to this model henceforth as GLMER2.
To test H3 and H4, which stated that the high salience markers would positively impact the acquisition of nouns and verbs, respectively, and to explore the possible knock-on effect on adjective learning, we ran separate models for each category using test trial accuracy from the final CSL block. All models included salience condition as a fixed effect and random intercepts for items and participants. We refer to these models as GLMER3 (nouns), GLMER4 (verbs), and GLMER5 (adjectives).
To test the hypotheses that case marker salience would benefit word order acquisition (H5) and marker syntax (H6), we computed d-prime (d′) scores on the GJT for each participant, separately for word order and case marker trials, respectively. We then ran two linear regression analyses to test the effect of salience on these d′ scores. We refer to these models as LM1 (word order) and LM2 (case markers).
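As an illustration of the d′ computation, the following Python sketch derives a score from one participant's response counts. It assumes that a hit is the endorsement of a grammatical sentence and a false alarm is the endorsement of an ungrammatical one. It also applies a log-linear correction to avoid infinite z-scores. The correction and the function name are assumptions, because the text does not state how extreme rates were handled.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (assumption): add 0.5 to each count so that
    # neither rate can be exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: 15 of 20 grammatical and 9 of 20 ungrammatical sentences endorsed.
example = d_prime(hits=15, misses=5, false_alarms=9, correct_rejections=11)
```

A score of 0 means grammatical and ungrammatical sentences were endorsed at the same rate, which is why chance is tested against d′ = 0.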
As preregistered exploratory analyses, we first investigated the nature of the acquired lexical representations. For this, we ran a mixed-effects logistic regression on trial-level data from the WPM task. The dependent variable was accuracy. Fixed effects included mismatch condition, salience condition, target phoneme, referent type (the Match pseudoword’s function during the CSL task), and the interaction between mismatch condition and target phoneme. Random intercepts were included for participant and item, with random slopes for salience over item and referent type over participant. We refer to this model as GLMER6. We also explored whether individual differences in phonetic discrimination ability help explain the phonolexical fuzziness observed in word learning. To this end, we ran another mixed-effects logistic regression on data from the AX task and from the 1-feature mismatch trials of the WPM task. The dependent variable was accuracy. Fixed effects included task, target phoneme, their interaction, and salience condition. Random intercepts were included for participant and item, with random slopes for item over participant. We refer to this model as GLMER7.
Variable coding was pre-registered, and from this we kept the coding for variables “salience condition” (high = .5; low = −.5) and “task” (WPM task = .5; AX discrimination task = −.5). Due to model convergence issues, the coding scheme was changed and centered for variables “target phoneme” (sum-coding) and “mismatch condition” (custom coding via hypothesis matrix: the first contrast compares the Match condition with Control, the second contrast compares Match with 1FM, and the third contrast compares Match with 3FM). The variable of “referent type” in GLMER6, added upon reviewer request, was sum-coded. All the variables introduced in the additional analyses reported below were sum-coded.
Outlier removal
We followed our pre-registered plan for data removal. For all tasks, we removed individual reaction times over 15 s. In the CSL task, we removed data from blocks where a participant produced the same response (e.g., left side key) or an alternating pattern (e.g., left/right/left/right) on 80% or more of the trials. This removed 0.03% of the total data.
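The block-level exclusion rule can be sketched as follows. The sketch assumes two response keys labeled "left" and "right". It also assumes that an alternating pattern is measured as the share of trials matching a strict left/right alternation in either phase. Both choices, and the function name, are illustrative and may differ from the pre-registered implementation.

```python
def is_patterned_block(responses, threshold=0.8):
    """Flag a block in which one key or a strict alternation covers
    at least `threshold` of the trials."""
    n = len(responses)
    if n == 0:
        return False
    # Share of trials on which the most frequent key was pressed.
    same_share = max(responses.count(key) for key in ("left", "right")) / n
    # Share of trials matching left/right/left/... or right/left/right/...
    starts_left = sum(
        r == ("left" if i % 2 == 0 else "right") for i, r in enumerate(responses)
    )
    starts_right = sum(
        r == ("right" if i % 2 == 0 else "left") for i, r in enumerate(responses)
    )
    alternating_share = max(starts_left, starts_right) / n
    return same_share >= threshold or alternating_share >= threshold

# Hypothetical block of ten responses with no obvious pattern.
mixed = ["left", "left", "right", "left", "right",
         "right", "right", "left", "left", "right"]
flagged = is_patterned_block(mixed)
```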
Additional analyses
To explore the question of whether adults can simultaneously acquire a novel grammar and lexicon via CSL, we examined whether predicted accuracy differed from chance (50%) in each condition for exposure trials, vocabulary test trials, and grammaticality judgments. For exposure trials, we computed estimated marginal means (EMMs) from GLMER2 using the emmeans package in R. EMMs were extracted on the response (probability) scale, and Wald z-tests were conducted against a null of 0.5 (chance level). P-values were adjusted for multiple comparisons across blocks using the Holm procedure for these and all the following z-tests. For vocabulary test trials, we similarly computed EMMs from GLMER1 (case markers), GLMER3 (nouns), GLMER4 (verbs), and GLMER5 (adjectives) and conducted Wald z-tests against a null of 0.5. Lastly, for grammaticality judgments we computed EMMs on LM1 (word order—GJT) and LM2 (case marker—GJT) and conducted one-sample t-tests on d′ scores against a null of 0, reflecting chance level of d′.
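The logic of these chance comparisons can be illustrated with a short Python sketch. It computes a two-sided Wald z-test of an estimate against .5 on the probability scale and then applies the Holm step-down adjustment to a family of p-values. The function names and input values are hypothetical; the reported analyses were run in R with emmeans, and this sketch is not expected to reproduce the reported statistics exactly.

```python
from statistics import NormalDist

def wald_z_vs_chance(estimate, se, chance=0.5):
    """Two-sided Wald z-test of an estimated probability against chance."""
    z = (estimate - chance) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def holm_adjust(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical block-wise estimates (probability, standard error).
blocks = [(0.52, 0.03), (0.58, 0.03), (0.63, 0.03)]
raw_p = [wald_z_vs_chance(est, se)[1] for est, se in blocks]
adjusted_p = holm_adjust(raw_p)
```

Because Holm multiplies the smallest p-value by the number of tests and later ones by progressively smaller factors, it is never more conservative than a plain Bonferroni correction.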
Upon reviewer request, we included an additional linear model to compare self-reported awareness across the two salience conditions. As a supplementary signal-detection measure in the WPM task, we also computed d′ scores to probe discrimination between Match and 1FM pseudowords. We further conducted a series of Bonferroni-adjusted correlations between self-reported awareness and LSBQ scores with vocabulary performance and d′ scores in the GJT to explore whether retrospective awareness of the case marking suffixes or language background was related to vocabulary or grammar learning. Additionally, we examined via a Pearson correlation whether phoneme discrimination ability (AX task) predicted performance in the WPM task, specifically the ability to reject 1-feature mismatches, a key indicator of phonolexical specificity.
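For illustration, the two ingredients of these correlational analyses can be sketched in Python: a Pearson coefficient and a Bonferroni adjustment that multiplies each p-value by the number of tests and caps the result at 1. The data, the function names, and the size of the test family below are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical per-participant scores: AX accuracy and 1FM rejection accuracy.
ax_accuracy = [0.55, 0.60, 0.70, 0.65, 0.80]
rejection_accuracy = [0.40, 0.50, 0.55, 0.45, 0.60]
r = pearson_r(ax_accuracy, rejection_accuracy)

# Hypothetical family of four uncorrected p-values.
adjusted = bonferroni([0.02, 0.50, 0.30, 0.01])
```

The cap at 1 explains why some adjusted p-values in this kind of analysis are reported as exactly 1.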
Results
All statistical tests reported in this study were run in RStudio (R Core Team, 2017; R version 4.2.2). The data and analysis scripts reported in this study can be accessed on this OSF site (https://osf.io/zm9q6).
Cross-situational learning task
The detailed performance on the exposure trials of the CSL task is reported in the supplementary materials (Figure S5). Accuracy for the four types of test trials is shown in Figure 2.
Percentage of correct responses across four test blocks in vocabulary tests for nouns, adjectives, verbs, and case markers, by salience condition (high vs. low). The graphs are scaled from 40% to 70% accuracy for clearer visualization. The dashed horizontal line indicates chance performance (50%). Error bars represent standard errors of the mean.

Figure 2. Long description
Top-left panel labeled Nouns plots percentage correct from 40 to 70 on the y-axis and blocks 1 to 4 on the x-axis. The high salience line starts near 53, rises slightly, and ends near 56. The low salience line starts at 50, rises steadily, and ends near 60. Top-right panel labeled Adjectives shows both lines peaking at block 2 around 58, then declining; high salience remains above low salience in later blocks. Bottom-left panel labeled Verbs shows both lines starting near 55, with low salience rising more sharply to about 67 at block 4, while high salience ends near 63. Bottom-right panel labeled Case Markers shows both lines fluctuating around 50, with no clear upward trend. All panels include a dashed line at 50 to indicate chance, and error bars for each point. Legends indicate red for high salience and blue for low salience.
GLMER1 was run to test H1, which stated that the high-salience group would perform more accurately than the low-salience group in case marker vocabulary test trials in the last block. The results showed a marginal effect of salience in the predicted direction: participants in the high-salience condition were more accurate than those in the low-salience group (b = .15, SE = .08, z = 1.83, p = .07), although the effect did not reach conventional significance. Model fit statistics indicated that the fixed effect of salience explained very little of the variance (marginal pseudo-R2 < .01; conditional pseudo-R2 < .01). To explore whether performance was above chance at the last block of case marker vocabulary test, we computed EMMs on GLMER1 for each condition and compared these against chance (.5). The results showed that model-based predicted accuracy in the final block did not differ significantly from chance in either the low-salience condition (M = .48, SE = .02, z = −1.31, p = .19) or the high-salience condition (M = .52, SE = .02, z = 1.17, p = .24).
GLMER2 was run to test H2, which stated that the high-salience group would perform more accurately than the low-salience group on training trials across the CSL task. The results revealed a significant effect of block (b = .10, SE = .01, z = 7.29, p < .001), indicating that learning improved over time. While the effect of salience did not reach statistical significance (b = .09, SE = .06, z = 1.51, p = .13), the trend suggested that participants in the high-salience condition may have outperformed those in the low-salience condition. The fixed effects explained 2% of the variance (marginal pseudo-R2 = .02), with total variance explained at 25% (conditional pseudo-R2 = .25).
To test overall learning in the CSL task, we extracted EMMs on GLMER2 for each condition across blocks and for each condition at each block. EMMs were compared against chance (.5) using Wald z-tests. Across all blocks, predicted accuracy was significantly above chance for both conditions (low salience: M = .63, SE = .03, z = 4.86, p < .001; high salience: M = .64, SE = .03, z = 5.27, p < .001). When examining block-wise performance, predicted accuracy was near chance in the initial blocks and increased over time, with both conditions exceeding chance at later blocks (high salience: from block 2 onwards, low salience: from block 3 onwards). Holm-adjusted p-values were used to correct for multiple comparisons across the twelve blocks. See supplementary materials, Table S5, for the results of the Wald z-test comparisons for each condition at each block.
To test whether salience affected the acquisition of nouns and verbs, as predicted by H3 and H4, respectively, as well as to explore the potential impact of salience group on adjective learning, we ran separate models for each category using test trial accuracy from the final CSL block: GLMER3 (nouns), GLMER4 (verbs), and GLMER5 (adjectives).
None of the models yielded significant effects. For noun learning, there was no effect of salience and a slight trend opposite to predictions (b = –.05, SE = .11, z = –.43, p = .66; marginal R2 < .01). The same was true for verbs (b = –.07, SE = .24, z = –.31, p = .75; marginal R2 < .01), despite moderate variance explained at the participant level (conditional R2 = .17). For adjectives, the salience effect also fell short of significance (b = .20, SE = .16, z = 1.21, p = .23), though the trend was in the expected direction. Wald z-tests against chance run on the EMMs of GLMER3, GLMER4, and GLMER5 showed that predicted accuracy for both salience groups was above chance in the last vocabulary test block for nouns (low salience: M = .55, SE = .02, z = 2.66, p < .01; high salience: M = .54, SE = .02, z = 2.07, p < .05) and verbs (low salience: M = .67, SE = .04, z = 4.29, p < .001; high salience: M = .66, SE = .04, z = 3.87, p < .001). In contrast, predicted accuracy for both salience groups in the last vocabulary test block for adjectives was not significantly different from chance (low salience: M = .48, SE = .03, z = –.47, p = .64; high salience: M = .53, SE = .03, z = .92, p = .35).
Although not part of our preregistered analyses, we explored whether retrospective awareness of the case marking suffixes was related to learning outcomes. Specifically, we ran correlation tests of awareness scores and of LSBQ scores with both overall vocabulary and case marker vocabulary performance. We report Bonferroni-corrected p-values. Awareness scores, based on participants’ verbal reports, were not significantly correlated with overall vocabulary performance in the final block after Bonferroni correction, although the data show a trend in the expected direction (r(154) = .19, p = .06). Awareness scores were, however, significantly correlated with performance on case marker trials (r(154) = .24, p < .05). In contrast, composite scores from the LSBQ were not significantly correlated with either overall vocabulary performance (r(154) = –.09, p = .94) or case marker accuracy (r(154) = .04, p = 1).
Additionally, we ran a linear model to test whether awareness scores differed significantly between the two salience groups. This model revealed a significant effect of salience condition on awareness scores, with the low-salience group showing lower awareness scores than the high-salience group (b = –.27, SE = .01, t = –47.57, p < .001). Cohen’s d indicates a moderate effect size (d = .39), although the model indicates that only approximately 3.8% of the variance in awareness is explained by salience group (R2 = .038).
Grammaticality judgment task
Basic descriptive analyses of performance on the grammaticality judgment task (GJT) across salience groups conform to our expectations. Participants in the high-salience group were more accurate overall (M = 66.66%, SE = 0.77) than those in the low-salience group (M = 56.07%, SE = 0.80). The high-salience group was also more likely to endorse grammatical sentences (M = 77.15%, SE = 0.97) and less likely to endorse ungrammatical ones (M = 43.83%, SE = 1.14) compared to the low-salience group (72.37% and 60.20%, respectively).
These patterns are illustrated in Figure 3, which plots endorsement rates by salience group and sentence type. The high-salience group clearly distinguishes between grammatical and ungrammatical structures, with endorsement rates well above chance for grammatical sentences and well below chance for most ungrammatical ones. In contrast, the low-salience group displays a more muted profile, with endorsement rates for ungrammatical sentences often close to or above chance, especially when case markers were omitted. This provides converging evidence that enhanced perceptual salience boosted learners’ ability to generalize case-marking patterns.
Endorsement rates for grammatical and ungrammatical sentences in the grammaticality judgment task. Error bars show standard error of the mean. Dotted line at 50% shows chance performance.

Figure 3. Long description
The y-axis is labeled Endorsement rate, ranging from 0 to 100. The x-axis has two groups: High salience and Low salience. Each group contains six vertical bars, left to right: Grammatical, Object marker attached to the subject, Subject marker attached to the object, Omission of object marker, Omission of subject marker, and Word order violation. Grammatical sentences have the highest endorsement rate in both groups, above 75 percent in high salience and slightly lower in low salience. The next two types, Object marker attached to the subject and Subject marker attached to the object, have endorsement rates around 55 to 60 percent in high salience and increase to about 65 to 70 percent in low salience. Omission of object marker and omission of subject marker are lower, around 35 to 40 percent in high salience and about 45 to 50 percent in low salience. Word order violation is lowest, near 30 percent in high salience and about 40 percent in low salience. All bars include error bars. A horizontal dashed line marks 50 percent endorsement rate. The legend at right matches bar colors to sentence types.
To further assess sensitivity while controlling for response bias, we computed d′ scores for each participant, separately for case marker and word order trials. We then ran two linear regression analyses on these d′ scores, LM1 and LM2, to test the hypotheses that case marker salience would benefit word order acquisition (H5) and marker syntax (H6), respectively. For LM1, the effect of salience condition on d′ scores in word order trials was not statistically significant, β = −.12, SE = .08, t = −1.37, p = .173, indicating that salience condition did not predict detection of word order violations. A one-sample t-test against chance (d′ = 0) on the EMMs of LM1 indicated that sensitivity to word order violations was significantly above chance in both the high-salience (d′ = .28, p < .001) and low-salience (d′ = .16, p < .05) conditions. For LM2, the effect of salience condition on d′ scores in case marker trials was statistically significant, β = .37, SE = .06, t = 6.37, p < .001, with higher scores in the high-salience group. The model explained approximately 20% of the variance (R2 = .20, adjusted R2 = .20). A one-sample t-test against chance (d′ = 0) on the EMMs of LM2 indicated that sensitivity to case marker violations was significantly above chance in the high-salience condition (d′ = .15, p < .001), whereas sensitivity to case marker violations in the low-salience condition was significantly below chance (d′ = −.23, p < .001).
Finally, we examined correlations between d′ scores and participants’ awareness of case marking (from retrospective reports) and LSBQ language background scores. We report Bonferroni-adjusted p-values: For word order d′, there was no significant correlation with awareness (r = –.09, p = .26), but a trend of a negative correlation with LSBQ, although not reaching significance (r = −.18, p = .07). For case marker d′, there was a weak positive correlation with awareness that did not reach significance after Bonferroni corrections (r = .20, p = .05), but no significant correlation with LSBQ (r = .07, p = 1).
WPM task
Figure 4 presents endorsement rates of the pseudowords presented in the WPM task by salience group and type of auditory stimuli: the correct form (Match), a 1-feature mismatch (1FM), a 3-feature mismatch (3FM), and a Control item from a different referent. Accurate phonolexical encoding would be reflected in high endorsement of match forms and low endorsement of all mismatches, particularly those differing by just one phonological feature. Accuracy was comparable across salience conditions, with the high-salience group performing slightly worse (M = 63.58%, SE = .59) than the low-salience group (M = 65.26%, SE = .58).
Endorsement rates of the different conditions on the WPM task. Error bars show standard error of the mean. The dotted line at 50% indicates chance performance.
Note: For example, for the correct mapping of the pseudoword /kiʎu/ (Match), the 1-feature mismatch (1FM) was /kilu/, the 3-feature mismatch (3FM) was /kiʃu/, and the control (Control) was /diɲu/.

Figure 4. Long description
The y-axis is labeled Endorsement rate, ranging from 0 to 100. The x-axis has two groups: High salience and Low salience. Each group contains four bars, left to right: 1 F M (dark blue), 3 F M (medium blue), Control (light blue), and Match (very light blue). A dashed horizontal line marks 50 percent. In High salience, 1 F M and Control are near 40 percent, 3 F M is near 25 percent, and Match is just above 70 percent. In Low salience, 1 F M and Control are near 45 percent, 3 F M is near 25 percent, and Match is near 75 percent. Error bars indicate standard error. The legend at the right identifies bar colors for 1 F M, 3 F M, Control, and Match.
GLMER6 was run to explore the phonological detail of the lexical representations formed through cross-situational learning (CSL). The model revealed a significant main effect of mismatch condition in all the specified comparisons. Accuracy for match items was higher than accuracy for controls (b = .35; SE = .05; z = 7.37; p < .001), higher than accuracy for 1FM items (b = .36; SE = .06; z = 6.04; p < .001), but lower than accuracy for 3FM items (b = −.62; SE = .06; z = −11.04; p < .001). However, there was no significant effect of salience (b = –.10; SE = .07, z = –1.45, p = .15). To further explore discrimination sensitivity between Match and 1FM items, we computed d′ scores per participant, which showed very limited discriminability and large individual differences (M = .39, min = −.45, max = 1.70).
To more accurately explore the differences in accuracy to pseudowords with different target phonemes and referent types, we computed EMMs for each variable. Regarding the effect of phoneme, EMMs varied across phoneme categories (/e/, /o/, /ɲ/, and /ʎ/), averaged over match, word function, and condition. Accuracy was lowest for /e/ (.64, 95% CI [.61, .67]), followed by /o/ (.66, 95% CI [.62, .69]), /ɲ/ (.66, 95% CI [.63, .70]), and highest for /ʎ/ (.69, 95% CI [.66, .71]). Pairwise comparisons showed that /e/ yielded significantly lower accuracy than /ʎ/ (b = −.2234, SE = .0709, p < .01). No other pairwise contrasts reached significance. Regarding the effect of referent type, i.e., each word’s function during CSL, EMMs showed clear differences among referent type categories (noun, verb, adjective), averaged across match, phoneme, and condition. Accuracy was lowest for nouns (.62, 95% CI [.60, .64]), followed by adjectives (.66, 95% CI [.63, .69]), and highest for verbs (.71, 95% CI [.68, .74]). Pairwise comparisons confirmed that nouns were recognized significantly less accurately than both verbs (b = −.393, SE = .0635, p < .001) and adjectives (b = −.171, SE = .0657, p < .05). Verbs were also recognized more accurately than adjectives (b = .222, SE = .0739, p < .01).
The fixed effects explained approximately 5% of the variance in accuracy (marginal pseudo-R2 = .05), with total variance explained at 11% (conditional pseudo-R2 = .11), consistent with subtle but reliable effects of phonological distance on performance.
AX discrimination task
As shown in Figure 5, overall accuracy on the AX discrimination task was comparable across salience groups: high-salience (M = 64.61%, SE = .77) and low-salience (M = 65.47%, SE = .77). Across both groups, the /l/–/ʎ/ contrast was most accurately discriminated, followed by /o/–/ɔ/. The /e/–/ɛ/ and /n/–/ɲ/ contrasts yielded lower accuracy overall, though the low-salience group performed somewhat better on the latter contrast than the high-salience group (see Footnote 3).
Accuracy in the AX discrimination task for each target phoneme contrast, by salience group. Error bars show standard error of the mean. Dotted line marks chance level at 50%.
Note: “e” = /e/–/ɛ/, “J” = /n/–/ɲ/, “L” = /l/–/ʎ/, “o” = /o/–/ɔ/.

Figure 5. Long description
The graph has two main groups on the x-axis: high salience and low salience. Each group contains four vertical bars, colored from dark to light blue, representing the phoneme contrasts e, J, L, and o as defined in the legend. The y-axis shows percentage correct from 0 to 100, with a dashed line at 50 indicating chance. In both salience groups, the J contrast (n versus engma) yields the highest accuracy, reaching about 75 percent, followed by L (l versus turned y) and o (o versus open o), both near 65 percent. The e contrast (e versus epsilon) is lowest, just above 55 percent. Error bars indicate standard error for each bar. The pattern is consistent across salience groups, with all contrasts above chance.
We ran a Pearson correlation to explore the potential relationship between phoneme discrimination ability and the ability to reject 1-feature mismatches in the WPM task, a key indicator of phonolexical specificity. This revealed a weak but statistically significant relationship between AX accuracy and accuracy in rejecting 1-feature mismatches in the WPM task (r = .26, p < .001).
See supplementary materials for a visualization of this correlation (Figure S6) and a supplementary mixed-effects model (GLMER7) further supporting the conclusion that variability in phonetic perception contributed to observed differences in lexical encoding.
Discussion
This study examined how adults acquire vocabulary and grammar in an unfamiliar language through cross-situational learning (CSL), focusing in particular on the role of perceptual salience and phonological discrimination. English-speaking participants were exposed to an artificial language modeled on Portuguese, through animated scenes paired with spoken sentences. We pursued two main research aims. First, we focused on our pre-registered hypotheses regarding a positive effect of perceptual salience of case markers on case marker comprehension (H1), overall learning (H2), noun (H3) and verb comprehension (H4), and the detection of grammatical violations in word order (H5) and case marking (H6). Then, outside the pre-registration but in line with our pre-registered analyses, we explored whether participants could acquire lexical and morphosyntactic knowledge simultaneously from linguistic input with non-native phonology by tracking cross-situational statistics. Additionally, we explored the phonetic detail of the acquired representations and their relation to phoneme discrimination ability. To address these questions, we assessed vocabulary learning, grammaticality judgment, word–picture matching, and phoneme discrimination. The findings are discussed below.
Learning words and grammar in L2 CSL
This section addresses the general learning patterns, exploring whether learners could acquire vocabulary and grammar in a language with unfamiliar phonology. Here we evaluate our tentative expectation that participants would successfully learn lexical items and morphosyntactic markers through CSL despite the L2 sound system. Performance on training trials was significantly above chance from block 2 onward in the high-salience group, and from block 3 onward in the low-salience group. Together with the significant effect of block on accuracy, this indicates that adults were able to acquire at least part of the novel language through cross-situational statistical learning despite the novel phonology.
Performance on the vocabulary tests was above chance for nouns and verbs in both salience groups by the end of the experiment, indicating partial learning of the corresponding lexical mappings. However, participants failed to acquire adjectives and case markers, as indicated by chance-level performance in these conditions. These results closely mirror those of Rebuschat et al. (Reference Rebuschat, Monaghan and Schoetensack2021, Experiment 2), who found a similar pattern—successful learning of nouns and verbs but not adjectives and case markers—even when the artificial language used native phonology. This parallel suggests that the lack of learning for adjectives and markers may reflect a general learning difficulty associated with these categories, rather than one caused by L2 phonology per se.
Our findings stand in partial contrast to those of Escudero et al. (Reference Escudero, Smit and Mulak2022), who reported significantly better noun learning when the phonology of the artificial language matched participants’ L1. In our case, noun learning was small but evident across groups, and broadly comparable in magnitude to that reported by Rebuschat et al. (Reference Rebuschat, Monaghan and Schoetensack2021, Experiment 2), who used the same paradigm with native phonology. This similarity suggests that phonological unfamiliarity may not uniformly hinder lexical acquisition.
Grammar test performance offered a more nuanced picture. On average, participants did not reliably reject sentences with incorrect case marking, indicating that they did not acquire the grammatical function of the case markers. This aligns with the weak performance on case marker vocabulary trials. Nevertheless, participants in the high-salience condition showed above-chance performance in detecting grammatical violations involving marker omission and word order, pointing to some emergent awareness of morphosyntactic regularities, particularly when case markers were perceptually prominent. In contrast, the low-salience group was above chance only in detecting word order violations, and only marginally so. The limited overall grammar learning may, at least partly, be attributable to L1 interference. Given that all learners were L1 speakers of English, a VO language without overt case marking, acquiring an unfamiliar (OV) word order and case morphology may have been particularly challenging. Although L1 effects were not directly tested in the present study, previous research has documented this kind of L1 interference (Al-Khresheh, Reference Al-Khresheh2010; Murakami & Alexopoulou, Reference Murakami and Alexopoulou2016). Importantly, as we discuss in the following section, our results seem to suggest that increasing perceptual salience facilitates noticing of these novel aspects during early L2 learning, which may have positive consequences for subsequent learning. Taken together, these results suggest that participants were able to acquire key aspects of the vocabulary and limited aspects of the grammar, particularly when perceptual cues were enhanced.
Salience in CSL
This section addresses the specific question of whether perceptual salience facilitates the acquisition of morphosyntactic structure, and evaluates H1, H2, H3, H4, H5, and H6.
Although the salience effect approached significance in case marker comprehension, both salience groups performed near chance level on case marker vocabulary trials in the final test block. It is therefore possible that case marker learning was simply too limited to produce a clear between-group difference. Thus, our data do not support the prediction that participants in the high-salience condition would show superior comprehension of case morphology, as predicted by H1.
Similarly, we observed no significant salience effect in the comprehension of transitive sentences during the training phase. The absence of knock-on effects may reflect the failure to form reliable form–meaning mappings for the case markers, limiting their utility in sentence parsing. In line with this, salience had no effect on the learning of nouns, verbs, or adjectives. As a result, we partly reject H2, which posited salience-driven benefits to overall learning during exposure. We also reject H3 and H4, which predicted salience-driven benefits in noun and verb learning, respectively.
The two salience groups diverged more clearly on the grammar test. Although both groups were able to detect word order violations, only the high-salience group reliably rejected grammatical violations involving case marker omissions. Therefore, we reject H5, which predicted that salience would improve sensitivity to word order grammaticality, but found support for H6, which predicted improved sensitivity to case marker grammaticality. These effects suggest that enhanced salience drew learners’ attention to the form and position of the case markers, promoting sensitivity to morphological structure even in the absence of complete functional understanding. This is consistent with the higher reported awareness of case marking suffixes in the high-salience group compared to the low-salience group. Additionally supporting this argument is the trend towards a positive correlation between reported awareness of case marking suffixes and performance in case marker trials. Overall, these findings suggest that increased perceptual salience aids noticing and segmentation more than it establishes functional mappings, at least during early stages of L2 learning.
Consistent with the Noticing Hypothesis (Schmidt, 1990), we interpret the high-salience group’s heightened sensitivity to marker omission as evidence of increased, albeit weak, structural awareness, likely driven by greater perceptual prominence. This enhanced awareness may reflect noticing of surface features without full mapping to meaning. Indeed, neither group was able to reject incorrect role assignments, indicating that the case markers’ grammatical function was not acquired. As mentioned above, one likely explanation is the absence of a direct L1 equivalent: English lacks overt case markers, making it difficult to associate novel morphemes with unfamiliar grammatical functions. This is supported by findings from L2 morphosyntax (Murakami & Alexopoulou, 2016) and artificial language studies (Kenanidis et al., 2023), both of which show greater difficulty with morphosyntactic features lacking L1 analogues. Accordingly, our results suggest only a weak effect of perceptual salience. This may be because perceptual salience captures only physical stimulus properties; broader definitions that incorporate psycholinguistic and experiential factors, such as redundancy or L1 experience, may reveal stronger salience effects (see Knell et al., 2025).
In sum, although perceptual salience did not facilitate vocabulary learning, it played a key role in enhancing learners’ sensitivity to morphosyntactic form and structure.
Phonolexical representations
This section addresses our exploration of the level of phonological detail of the lexical representations formed during CSL, and whether individual differences in phonetic discrimination ability could predict the precision of these representations. We evaluated these questions through the WPM task and the AX phoneme discrimination task.
Participants reliably endorsed correct forms (Match) and rejected clearly deviant forms that differed by three phonological features (3FM), indicating that learners acquired word–referent associations with sufficient specificity to detect gross phonological mismatches. However, rejection accuracy dropped substantially for the challenging L2 contrasts (1-feature mismatches), which suggests that the phonological representations formed under CSL are only partially specified. This pattern aligns with previous evidence for phonolexical fuzziness in L2 acquisition (Darcy et al., 2025) and reflects difficulty with encoding subtle contrasts in unfamiliar phonological systems.
Interestingly, rejection of unrelated control pseudowords was also relatively poor, suggesting that form–meaning associations remained weak overall. Such relatively low rejection of control pseudowords, particularly compared to the higher rejection of 3FM pseudowords, could reflect a familiarity effect. Unlike 1FM and 3FM pseudowords, which were created specifically for the WPM task and thus heard for the first time during testing, control pseudowords were previously heard during the CSL task, although linked to different referents. Given the overall modest vocabulary learning (<60%), participants may have relied on a coarse familiarity signal rather than precise form–meaning mappings, leading them to endorse familiar-sounding forms even when paired with incorrect referents. This familiarity effect could be attributed to the limited duration of exposure (~40 minutes), or to challenges posed by the unfamiliar phonology, as documented in prior work (Escudero et al., 2022; Kaushanskaya et al., 2013), or both. Without a native-language control group, we cannot disentangle these two factors.
Regarding the potential relationship between phonetic discrimination ability (AX task) and phonolexical precision (WPM 1FM condition), we observed a significant but weak positive correlation. This replicates prior findings that phonetic discrimination accounts for only a small proportion of variance in lexical encoding (Elvin, 2016; Llompart et al., 2025). It suggests that phonolexical fuzziness is only partially rooted in perceptual acuity, and that other individual difference factors, such as working memory, attention, or auditory processing, likely contribute to lexical learning under uncertain conditions.
In summary, our findings indicate that lexical representations acquired through CSL are underspecified, especially for minimal phonological contrasts. We also find a weak but significant link between phonetic discrimination ability and phonolexical specificity.
Conclusion
This study demonstrated that adults are capable of learning both vocabulary and grammatical structure in a novel language with unfamiliar phonology through cross-situational statistical learning. Participants acquired nouns and verbs successfully and showed emerging sensitivity to word order and, particularly when exposed to more perceptually salient case markers, to case marker omissions, despite the challenges posed by the L2 phonological forms. These results align with previous findings from studies using native phonology (e.g., Rebuschat et al., 2021), suggesting that CSL mechanisms can still support learning even under more demanding conditions.
The role of perceptual salience proved especially relevant for developing sensitivity to grammatical morphemes. Although overall comprehension of case markers remained limited, learners exposed to high-salience markers were more attuned to their omission and placement. This supports the idea that enhancing the perceptual distinctiveness of grammatical morphemes can draw learners’ attention to their structural properties, consistent with the predictions of the Noticing Hypothesis (Schmidt, 1990). However, this awareness did not result in robust form–function mappings for the case markers themselves, likely due to the absence of comparable grammatical functions in the learners’ native language. This highlights the importance of conceptual alignment between L1 and L2 in the acquisition of morphosyntactic features.
Interestingly, while increased salience facilitated awareness of morphosyntactic structure, it did not enhance lexical learning: there were no measurable benefits for the learning of nouns, verbs, or adjectives. Nonetheless, the grammar test results suggest that perceptual factors can influence syntactic awareness even in the absence of full morphological comprehension.
Turning to the nature of the lexical representations acquired, we found evidence that participants encoded word forms imprecisely: while they readily accepted correct forms and rejected clearly mismatching lures, they struggled with minimally different distractors. This pattern suggests that phonological representations were underspecified, a finding consistent with previous accounts of lexical fuzziness in L2 word learning. Although phonetic discrimination ability contributed to this variability, the relationship was relatively weak, pointing to the likely involvement of other relevant factors.
In sum, our findings underscore the effectiveness of CSL for supporting early L2 learning and highlight the important, but limited, role of perceptual salience in facilitating morphosyntactic awareness. They also reveal the fragile and fuzzy nature of early lexical representations when learning occurs under conditions of phonological unfamiliarity. Future research should build on these findings by examining how phonological, conceptual, and perceptual factors interact to shape the trajectory of L2 development across learning contexts.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0272263126101776.
Data availability statement
The data that support the findings of this study are openly available in OSF at https://osf.io/zm9q6. This study was pre-registered: https://osf.io/a258c.
Acknowledgments
We would like to thank Dr. Yuxin Ge and Dr. Yun-Wei Lee for their assistance with the experimental setup, and all members of the Lancaster Language Learning Lab for insightful discussions on statistical learning. This research was partly supported by Portuguese national funding through the FCT – Portuguese Foundation for Science and Technology, I.P., as part of the project UID/03213/2025 (https://doi.org/10.54499/UID/03213/2025), and by the European Union NextGenerationEU, as part of the project UID/PRR/03213/2025 (https://doi.org/10.54499/UID/PRR/03213/2025).
Competing interests
The authors declare none.