Listening like a native: Unprofitable procedures need to be discarded

Two languages, historically related, both have lexical stress, with word stress distinctions signalled in each by the same suprasegmental cues. In each language, words can overlap segmentally but differ in placement of primary versus secondary stress (OCtopus, ocTOber). However, secondary stress occurs more often in the words of one language, Dutch, than in the other, English, and largely because of this, Dutch listeners find it helpful to use suprasegmental stress cues when recognising spoken words. English listeners, in contrast, do not; indeed, Dutch listeners can outdo English listeners in correctly identifying the source words of English word fragments (oc-). Here we show that Dutch-native listeners who reside in an English-speaking environment no longer exploit suprasegmental stress cues when identifying English word fragments, performing instead like English-native listeners, although they continue to exploit these cues when listening to Dutch.


Introduction
The efficiency of listening to speech rests on our ability to adjust the processing mechanisms involved so that they function optimally in the language in use. Different languages deploy different acoustic cues to distinguish between phonemes and hence between spoken words, and listeners learn to process speech in the most efficient manner; together, these factors produce language-specific listening, with native users of each language listening in a way that is tailored to the particular properties of the language they have been exposed to (L1; Cutler, 2012).
As a result, the cues used during speech processing can differ from one listener group to another. This can hold even for two languages that are historically closely related and share many structural features, such as Dutch and English. These two Germanic languages have broadly comparable syntactic and phonological systems. For instance, both languages have lexical stress, so that the syllables of words in each language differ in how they are realised suprasegmentally (i.e., in the duration of the syllable and in the intensity and fundamental frequency of its vocalic portion). The placement of primary stress in English and Dutch is not rule-governed. Stress may fall at any word position (PRImary, poSItion, fundaMENtal, withIN; upper-case letters indicate primary stress), but every word has one and only one syllable that bears primary stress. In both languages, the location of stress within a word may shift under the influence of sentence rhythm (e.g., thirTEEN becomes THIRteen when followed by MEN; Gussenhoven, 1983; Kager & Visch, 1985; Liberman & Prince, 1977). Nonetheless, research in recent years has shown that Dutch and English listeners differ quite reliably in how they handle the various kinds of phonetic cue available for identifying spoken words, with the markers of lexical stress playing the leading role (e.g., Cooper et al., 2002; Tremblay et al., 2021).
In English, lexical stress is cued suprasegmentally by duration, intensity and pitch. The most important cue to lexical stress, however, is segmental and is provided by vowel quality (e.g., Chrabaszcz et al., 2014; Lin et al., 2014; Zhang & Francis, 2010): stressed syllables always contain a full vowel, whereas vowels in unstressed syllables are frequently reduced towards schwa (Fourakis, 1991), so that minimal pairs such as PREsent (noun) and preSENT (verb) differ both segmentally and suprasegmentally. In Dutch, vowels in unstressed syllables are reduced much less often than in English (Sluijter & van Heuven, 1996), leaving duration, intensity and pitch as the most important acoustic correlates of lexical stress (van Heuven & de Jonge, 2011). Unlike other studies that have compared listeners' weighting of all acoustic cues to stress (i.e., both segmental and suprasegmental cues), the present study focuses on listeners' use of suprasegmental cues to lexical stress only.
In principle, stress in both English and Dutch can be contrastive and serve to distinguish between segmentally identical word pairs such as INsight and inCITE, although such minimal pairs are in fact rare in all stress languages (Cutler & Jesse, 2021), and English and Dutch are no exception. What makes such minimal stress pairs particularly useful, of course, is how clearly they show the suprasegmental cues available to listeners. Figure 1 shows the English pair PERvert (noun)/perVERT (verb) and the Dutch pair VOORnaam (noun: "first name")/voorNAAM (adjective: "respectable"). In duration, amplitude and pitch, each primary-stressed syllable clearly exceeds its segmentally matched but suprasegmentally mismatched counterpart.
Even without many such minimal pairs, the simple fact that stress patterns vary from word to word should make suprasegmental stress cues useful for listeners engaged in spoken-word recognition. Word pairs with segmentally identical first syllables, such as PRImary versus priMEval, or OCtopus versus ocTOber, could surely be distinguished more rapidly if listeners took the stress cues into account as well as the segmental differences later in the word.
Indeed, there is evidence that Dutch listeners use these cues very efficiently. An early demonstration (van Heuven, 1988) used a gating task with sentences in which both members of a word pair, one with and one without initial primary stress, were equally plausible (e.g., ORgel, "organ", versus orKEST, "orchestra"). Listeners heard these sentences truncated so that only a short fragment of the final word was audible, and had to guess which word it was; 76% of their guesses based on just the initial vowel were correct, which could only have been due to the use of suprasegmental differences. In a similar study using minimal stress pairs, other Dutch listeners also identified the source word correctly at a high rate (86% of cases; Cutler & van Donselaar, 2001).
Although both of these results come from 'offline' tasks (with decision responses collected after speech processing has concluded), they clearly indicate that Dutch listeners exploit not only segmental but also suprasegmental information. Investigations using 'online' tasks that measure processing speed confirmed these findings. In a priming task with minimal stress pairs, Dutch listeners were quicker to accept words primed with their initial syllable only if the prime carried the correct suprasegmental cues (Cutler & van Donselaar, 2001), and they were quicker to accept a visually presented word when it was primed by a spoken bisyllabic fragment of the same word, again only as long as the suprasegmental cues were correct (van Donselaar et al., 2005). Likewise, incorrectly applied stress patterns impeded word recognition in Dutch (Koster & Cutler, 1997; van Leyden & van Heuven, 1996). Clearly, suprasegmental stress cues help Dutch listeners to quickly distinguish between differently stressed Dutch words.

Figure 1 suggests that the suprasegmental cues to lexical stress in spoken English words are no weaker than those in Dutch words. It is thus, on the face of it, surprising that the Dutch results above have no counterpart in English. Mis-stressing does not prevent English word recognition in noise as long as vowels are intact (Slowiaczek, 1990), and it does not affect the speed with which English words are recognised (Small et al., 1988), the acceptability of spoken words in sentences (Slowiaczek, 1991) or the judged naturalness of spoken words (Fear et al., 1995). In English, minimal stress pairs even prime each other's associates (Cutler, 1986).
As for the fragment priming results from Dutch, these too do not replicate in English: segmental overlap does prime matching word forms, but whether the segments are accompanied by matching suprasegmental features makes no difference to listeners' responses (Cooper et al., 2002, Experiment 1a; Fear et al., 1995; Small et al., 1988). Native listeners of Dutch and English thus appear to differ in the extent to which they exploit suprasegmental stress cues during spoken-word recognition, despite the similarity and close relatedness of the two languages. In both languages, the information is present in the signal; in one language it is used, in the other it is not. As proposed by Cooper et al. (2002), whether listeners use suprasegmental stress information depends on whether it is useful. That, in turn, depends on the structure of the lexicon (Cutler & Pasveer, 2006).
The vocabularies of English and Dutch differ in the distribution and frequency of speech fragments that are ambiguous at the segmental level but can be disambiguated once suprasegmental stress patterns are taken into account. In English, such fragments are relatively infrequent, because the vowel of a syllable that follows a stressed syllable is frequently reduced, so that the corresponding syllables of two such words differ segmentally rather than being segmentally identical. English listeners are therefore rarely confronted with this kind of segmental ambiguity: the first two syllables of words such as ocTOber (with a stressed, and therefore full, vowel in the second syllable) and OCtopus (with a reduced vowel in the second syllable) can be disambiguated on segmental differences alone, and there is no additional information to be gained by taking suprasegmental stress cues into account.
The Dutch lexicon, on the other hand, contains many words of three or more syllables that have full vowels in the first two syllables, and as a result many pairs that are temporarily ambiguous (such as okTOber and OKtopus). For Dutch listeners, the use of suprasegmental stress cues is thus efficient, indeed essential, as it provides disambiguating information that is not available at the segmental level. This vocabulary asymmetry leads native speakers of English and Dutch to develop differently weighted models of segmental and suprasegmental information and, in consequence, quite different listening strategies. Whether listeners use the suprasegmental information in the signal depends on whether it speeds the recognition of their words; the asymmetry between these otherwise highly similar languages simply reflects the efficiency of the speech processing system.
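The lexical argument above can be made concrete with a toy competitor count. The sketch below is a minimal illustration under stated assumptions: the two-word lexicons, the simplified syllable transcriptions and the helper name `competitors` are hypothetical and are not taken from any lexical database, and each syllable is reduced to a segmental string plus a flag for primary stress. It lists the words that remain compatible with a word-initial fragment when matching uses segments only, and when it uses segments plus stress.

```python
from typing import List, Tuple

# Hypothetical, simplified representation: (segmental content, primary stress?)
Syllable = Tuple[str, bool]
Word = Tuple[str, List[Syllable]]


def competitors(lexicon: List[Word], fragment: List[Syllable], use_stress: bool) -> List[str]:
    """Return the words whose initial syllables match a word-initial fragment.

    Segments must always match; stress must also match when use_stress is True.
    """
    matches = []
    for name, syllables in lexicon:
        if len(syllables) < len(fragment):
            continue
        if all(
            seg == frag_seg and (not use_stress or stress == frag_stress)
            for (seg, stress), (frag_seg, frag_stress) in zip(syllables, fragment)
        ):
            matches.append(name)
    return matches


# English: the unstressed second syllable of OCtopus is reduced to schwa,
# so it already differs segmentally from the full vowel of ocTOber.
english = [
    ("octopus", [("ok", True), ("tə", False), ("pəs", False)]),
    ("october", [("ok", False), ("təʊ", True), ("bə", False)]),
]
# Dutch: both words keep a full vowel in their second syllable.
dutch = [
    ("octopus", [("ok", True), ("to", False), ("pʏs", False)]),
    ("oktober", [("ok", False), ("to", True), ("bər", False)]),
]

fragment_en = english[0][1][:2]  # first two syllables of English OCtopus
fragment_nl = dutch[0][1][:2]    # first two syllables of Dutch OCtopus
print(competitors(english, fragment_en, use_stress=False))  # ['octopus']
print(competitors(dutch, fragment_nl, use_stress=False))    # ['octopus', 'oktober']
print(competitors(dutch, fragment_nl, use_stress=True))     # ['octopus']
```

With a two-syllable fragment, segment-only matching already leaves a single English candidate, whereas in the Dutch toy lexicon both words survive until stress is taken into account. For a one-syllable fragment (oc-), stress is the only disambiguating cue in both toy lexicons, which is the situation tested in the fragment identification experiments reported here.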
The question at issue in the present study is what consequences this asymmetry may have for those who fully command both languages. Previous research on lexical stress has shown that listeners' use of acoustic cues to lexical stress in a second language (L2) is strongly influenced by their use of these cues in the L1 (e.g., Choi, 2022; Cooper et al., 2002; Dupoux et al., 2008; Qin et al., 2017; Tremblay et al., 2021). While some listeners whose L1 lacks lexical stress may struggle to perceive English lexical stress (e.g., Lin et al., 2014), others may be able to perceive it by exploiting acoustic cues that they rely on for other aspects of lexical access in their native language. For instance, Cantonese listeners, experienced in the use of F0 as a cue to lexical tone, and listeners of Gyongsang-Korean, a dialect with lexical pitch accents, can both successfully discriminate minimal stress pairs in L2 English despite the lack of lexical stress in their native language (Choi et al., 2019; …). Listeners whose L1 does have lexical stress tend to transfer their cue use from the L1 to the L2, leading to non-native-like stress perception (Cooper et al., 2002; Cutler, 2009; Tremblay et al., 2021). Dutch listeners presented with segmentally identical but suprasegmentally distinct word fragments (such as oc-/OC-) in their L2, English, actually outdo native listeners in correctly classifying the source word (Cooper et al., 2002; Cutler, 2009). Thus, when processing L2 speech, they draw upon skills induced by their L1 that native listeners of English lack, since English input does not induce such skills. But with substantial experience in that L2, might Dutch-native listeners learn to listen as English listeners do, and ignore features that are useful in their L1 but not appropriate for their L2? Of particular interest, then, is the kind of learning involved.
With a few notable exceptions (e.g., Tremblay & Spinelli, 2014; Weber & Cutler, 2006), existing studies of phonological structure in L2 listening have tended to focus on the ACQUISITION of L2-appropriate strategies, whereas the present question is whether L2 listeners can learn that their perceptual performance would improve by DROPPING an L1 strategy.
The appropriate population for such a question is one immersed in an L2 environment and predominantly using the L2 in daily life. Our study involves a population of native Dutch-speaking emigrants in Australia. Dutch emigrants tend to quickly adopt the language of their new environment (Clyne & Pauwels, 1997), with the result that Dutch emigrants in Australia typically use English, their L2, for everyday communication. In Experiment I, these Dutch emigrants living in Australia completed a replication of Experiment 3 from Cooper et al.'s (2002) study. If the emigrants exploit suprasegmental cues to lexical stress in English, their accuracy is predicted to be high and resemble that of the Dutch L2 listeners in the original study by Cooper and colleagues. If, on the other hand, the emigrants have stopped using suprasegmental stress cues as they are not useful for the L2, accuracy is predicted to be lower than that of Cooper et al.'s Dutch listeners and more similar to the accuracy of the English L1 listeners in that same experiment. Experiment II aimed to establish the validity of new Dutch stimulus materials that we constructed in parallel to the English stimuli from Cooper et al. (2002), and was conducted with native Dutch listeners in the Netherlands; Experiment III then used these new materials to examine the L1 identification accuracy available to the same group of Dutch emigrants who had completed Experiment I.

Experiment I

Participants
Twenty-four participants were recruited from the Dutch emigrant community in the wider Sydney area (aged 27-73 years, M = 48.8, SD = 14.9; 14 females). All participants were native speakers of Dutch, who grew up in the Netherlands and had migrated to Australia as adults (mean age at migration: 28.4 years, SD = 7.7, range: 18-52). Their mean length of residence in Australia was 20.5 years (SD = 15.2). Participants were highly proficient in their L2, English, as indicated by their mean score of 93.6 (SD = 5.3) on the Lexical Test for Advanced Learners of English (LexTALE; Lemhöfer & Broersma, 2012). To measure their frequency of L1 and L2 use, the question "Please indicate to what extent you use Dutch and English in the situations listed" was included as part of a background questionnaire participants completed prior to the start of the experiment. All participants reported using the L2, English, more frequently than the L1, Dutch, which was mostly restricted to use with family members. See Appendix S1 (Supplementary Materials) for the full list of situations and a tally of responses to this question. No participant reported any hearing problems. All participants provided written informed consent prior to the start of the experiment and were paid for their participation.

Materials
Stimulus materials were taken from Experiment 3 of Cooper et al. (2002) and consisted of truncated recordings of 21 pairs of English words, spoken by a male native speaker of Australian English (see Appendix S2, Supplementary Materials). Words in each pair differed in their stress pattern: one word had primary stress on the first syllable (e.g., RObot), while primary stress for the other word fell on the second syllable (e.g., roBUST). To ensure that the truncated words in each pair were segmentally the same and differed only suprasegmentally, the first syllable of all words always contained a full vowel. Mean log word frequencies in the CELEX lexical database of English (Baayen et al., 1995), as reported by Cooper et al. (2002), were 2.18 for first-syllable stress words and 1.88 for second-syllable stress words. Each word had been recorded twice, and each recording was truncated at the end of the first syllable, resulting in a total of 84 spoken word fragments, each of which was presented twice (168 trials). Mean durational, F0 and amplitude measures for the syllable fragments are shown in Table 1, averaged across all fragments with the same stress type. All measures were computed over the voiced portion of a fragment only, with the exception of duration, which was measured over the entire fragment. In line with Cooper et al., a different pseudo-randomised stimulus list was created for each participant, and fragments from the same word pair never occurred in successive trials.
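The stimulus lists are characterised here only by their constraints: a different pseudo-random order per participant, and no two successive trials from the same word pair. One way such lists could be generated is sketched below in Python; the trial tuple layout, the function names and the restart strategy are hypothetical and are not the procedure actually used by Cooper et al. (2002) or in the present study. The sampler picks each next trial at random from those that do not share the previous trial's word pair, and starts over on the rare dead end; note that it does not sample uniformly from all valid orders.

```python
import random
from itertools import product


def build_trials(n_pairs=21, recordings=2, presentations=2):
    """One tuple per trial: (pair, stress, recording, presentation).

    Hypothetical layout: stress is "first" (e.g. RObot) or "second" (e.g. roBUST).
    """
    return list(product(range(n_pairs), ("first", "second"),
                        range(recordings), range(presentations)))


def pseudo_randomise(trials, rng, max_restarts=1000):
    """Shuffle trials so that no two successive trials share a word pair.

    Greedy random sampling with restarts; not uniform over all valid orders.
    """
    for _ in range(max_restarts):
        pool = list(trials)
        order = []
        while pool:
            prev_pair = order[-1][0] if order else None
            candidates = [i for i, trial in enumerate(pool) if trial[0] != prev_pair]
            if not candidates:
                break  # only trials from the previous pair remain: start again
            order.append(pool.pop(rng.choice(candidates)))
        else:
            return order  # pool exhausted without hitting a dead end
    raise RuntimeError("no valid order found; increase max_restarts")


# One list per participant, e.g. seeded with a participant number.
order = pseudo_randomise(build_trials(), random.Random(7))
print(len(order))  # 168
```

With the defaults, `build_trials()` yields 21 pairs × 2 stress patterns × 2 recordings × 2 presentations = 168 trials, matching the count given above; seeding a separate `random.Random` per participant gives each participant a different list.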

Procedure
Participants were tested individually in a sound-attenuated booth. Auditory stimuli were presented over Beyerdynamic DT770 PRO headphones at a comfortable sound level, kept constant for all participants. Instructions in English were displayed on the computer screen and were repeated and clarified orally (in Dutch) by the experimenter. Participants were instructed to listen carefully to each word fragment and decide whether the fragment they heard formed the beginning of the word displayed on the left of the screen or of that on the right. The screen position (left or right) of the word that was the correct response was counterbalanced across presentations of the same word fragment. At the start of each trial, the response words were displayed on the computer screen for a preview period of 2000 ms. The truncated word fragment was then played and participants gave their response. There was no time-out period and the next trial started 500 ms after a response was received. Participants responded using the shift keys, pressing the left shift key to select the word printed on the left of the screen and the right shift key to choose the word printed on the right. Upon completion of the experiment, participants completed the English version of the LexTALE (Lemhöfer & Broersma, 2012) to assess their English proficiency.

Results and discussion
One trial had a response time of less than 100 ms and was therefore excluded from all analyses reported below. The results of the remaining trials are displayed in Figure 2a. For comparison, Figure 2b shows the mean results of both the English and Dutch listener groups tested by Cooper et al. (2002; henceforth L1 CONTROLS and L2 CONTROLS, respectively). Overall, the emigrants correctly identified the source word for 61.9% of truncated fragments. They assigned fragments to their source words more accurately when these had first-syllable stress (72.3%) than when they came from words with second-syllable stress (51.5%). This asymmetry may reflect a tendency to select the response option with first-syllable stress more often than the other option. Indeed, in 60.4% of all trials, participants judged a word with first-syllable stress to be the source of the fragment they had heard, a percentage very similar to the first-syllable-stress judgments made on these same materials by the L2 (58.5%) and L1 listeners (62.9%) of Cooper et al. (2002). This bias towards words with first-syllable stress may reflect differences in word frequency (of the source words used in the present experiment, those with first-syllable stress had higher word frequencies than those with second-syllable stress), in acoustic clarity (syllables with primary stress tend to be articulated more precisely; Scarborough et al., 2009), and/or in the lexical statistics of stress patterns (first-syllable stress is the most frequent stress pattern in English; Clopper, 2002; Cutler & Carter, 1987).
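The link between the accuracy asymmetry and the response bias is arithmetical when, as here, both stress types contribute the same number of trials. Writing A1 and A2 for accuracy on fragments of first- and second-syllable-stress words, the proportion of first-syllable-stress responses is (A1 + 1 - A2)/2, so the accuracy gap A1 - A2 equals twice the excess of first-syllable-stress responses over 50%. A minimal check in Python follows; the function name is hypothetical and the single excluded trial is ignored.

```python
def response_summary(acc_first: float, acc_second: float) -> dict:
    """Overall accuracy and response bias from per-stress-type accuracies.

    Assumes equal numbers of trials for fragments of first-syllable-stress
    and second-syllable-stress words (84 of each per participant in Experiment I).
    """
    overall = (acc_first + acc_second) / 2
    # First-syllable-stress responses = correct responses to first-stress
    # fragments + errors on second-stress fragments.
    first_responses = (acc_first + (1 - acc_second)) / 2
    return {
        "overall": overall,
        "first_syllable_responses": first_responses,
        "accuracy_gap": acc_first - acc_second,  # equals 2 * first_responses - 1
    }


exp1 = response_summary(0.723, 0.515)
print({k: round(v, 3) for k, v in exp1.items()})
# {'overall': 0.619, 'first_syllable_responses': 0.604, 'accuracy_gap': 0.208}
```

Applied to the percentages above (72.3% and 51.5%), the identity returns the reported overall accuracy of 61.9% and the 60.4% rate of first-syllable-stress responses: the 20.8-point accuracy gap is exactly twice the 10.4-point bias.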
The emigrants' overall identification accuracy was compared statistically to that of the L1 controls (mean accuracy = 59.2%) and the L2 controls (mean accuracy = 72.3%) from Experiment 3 of Cooper et al. (2002) by fitting a generalised linear mixed-effects model to the combined data from that study and the present experiment. This was done in R (R Core Team, 2019) with the lme4 package (Bates et al., 2015), using family 'binomial' and the logit link function. Listener group (emigrants, L2 controls, L1 controls) was entered into the model as a fixed categorical predictor. This predictor was coded using Helmert contrasts, such that the beta value of Group1 represents the difference between the mean of the L1 listeners on the one hand and that of both groups of L2 listeners combined on the other, whereas the beta value of Group2 represents the difference between the means of the two L2 groups (see Table 2 for the contrast matrix). Random intercepts for participants and items were included. Results of the model fit are displayed in Table 3 and show significant effects of both Group1 and Group2. Post-hoc comparisons with Tukey-adjusted α-levels, conducted with the emmeans package (Lenth, 2019), revealed that the emigrants' accuracy differed significantly from that of the L2 controls (p < .001) but not from that of the L1 controls (p = .58). This suggests that the emigrants no longer use suprasegmental stress cues to the same extent as their compatriots who remained in the Netherlands. We then compared the emigrants' response accuracy to chance level (i.e., 50%) with a two-sided binomial test. Since the bias towards first-syllable-stress responses noted above prevents a meaningful interpretation of accuracy for fragments with this stress pattern, this comparison was carried out only on judgments for items with second-syllable stress (cf. Cooper et al., 2002).
While the L2 controls had performed significantly better than chance, the emigrants' accuracy did not differ significantly from chance (z = 1.34, p = .181).
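Two pieces of the analysis above can be illustrated in a few lines of Python. This is a sketch with hypothetical helper names, not the R code behind Tables 2 and 3. First, it shows one scaling of Helmert-style contrasts that yields the interpretations described (Group1: both L2 groups versus L1 listeners; Group2: L2 controls versus emigrants); Table 2 may use different but proportional values, or the opposite sign. In a saturated, fixed-effects-only model these codes reproduce the group means exactly. The mixed model estimates comparable quantities on the log-odds scale, but with random intercepts its estimates will not equal raw differences. Second, it shows the chance-level comparison, assuming that the reported z is the normal approximation to the binomial test; an exact two-sided p-value for a 50% null is included for comparison.

```python
import math
from fractions import Fraction

GROUPS = ("L1 controls", "emigrants", "L2 controls")

# Hypothetical Helmert-style codes (Group1, Group2): each column sums to zero
# and the two columns are orthogonal.
CONTRASTS = {
    "L1 controls": (Fraction(-2, 3), Fraction(0)),
    "emigrants": (Fraction(1, 3), Fraction(-1, 2)),
    "L2 controls": (Fraction(1, 3), Fraction(1, 2)),
}


def helmert_coefficients(means):
    """(intercept, Group1, Group2) of a saturated fixed-effects model with the
    codes above, given one mean per group (e.g. on the log-odds scale)."""
    m_l1, m_em, m_l2 = (means[g] for g in GROUPS)
    intercept = (m_l1 + m_em + m_l2) / 3
    group1 = (m_em + m_l2) / 2 - m_l1  # both L2 groups vs L1 listeners
    group2 = m_l2 - m_em               # L2 controls vs emigrants
    return intercept, group1, group2


def predicted_mean(coefs, group):
    c1, c2 = CONTRASTS[group]
    return coefs[0] + coefs[1] * c1 + coefs[2] * c2


def binomial_z(successes, n, p0=0.5):
    """Normal-approximation z for H0: proportion correct == p0."""
    return (successes - n * p0) / math.sqrt(n * p0 * (1 - p0))


def two_sided_p_normal(z):
    return math.erfc(abs(z) / math.sqrt(2))


def two_sided_p_exact(successes, n):
    """Exact two-sided binomial p-value for the symmetric null p0 = 0.5."""
    k = max(successes, n - successes)
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


means = {"L1 controls": Fraction(1), "emigrants": Fraction(2), "L2 controls": Fraction(4)}  # hypothetical
coefs = helmert_coefficients(means)
print(coefs)  # (Fraction(7, 3), Fraction(2, 1), Fraction(2, 1))

z = binomial_z(1038, 2016)  # hypothetical count: 51.5% of 24 x 84 trials
print(round(z, 2), round(two_sided_p_normal(z), 2))  # 1.34 0.18
```

Under the hypothetical assumption of 1,038 correct responses out of 2,016 second-syllable-stress trials (24 participants × 84 trials, i.e. 51.5%, ignoring the one excluded trial), `binomial_z` gives z ≈ 1.34 and a normal-approximation p ≈ .18, in line with the values reported above.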
In sum, the results of this experiment clearly show that the Dutch emigrants do not exploit suprasegmental information to the same extent as Dutch L2 listeners living in the Netherlands, and that their use of this information is more in line with that of English L1 listeners. This indicates that after an extended period of daily L2 use, the emigrants have learned the properties of the English lexicon and adjusted the way they listen accordingly to optimise processing efficiency. This finding can be interpreted in two ways. On the one hand, the emigrants may have expanded their strategy repertoire to include not only an L1-specific way of using suprasegmental cues, but also an additional, L2-specific way. On the other hand, under the influence of their L2, the emigrants may have lost the L1-specific ability to exploit suprasegmental cues in favour of a new strategy that is more efficient for the L2, essentially replacing one strategy with another. Under this interpretation, the emigrants would have only the new L2-specific strategy at their disposal, even when listening to their L1.
To determine which of these two interpretations is more likely, we examined the emigrants' use of suprasegmental stress cues in their L1, Dutch. However, in contrast to Experiment I, for which stimuli and control data were readily available from the literature, neither suitable stimuli nor pre-existing control data were available for this Dutch experiment. Previous studies of suprasegmental stress cue use in Dutch (e.g., Cutler & van Donselaar, 2001; van Donselaar et al., 2005; van Heuven, 1988) could not provide a direct comparison, as they had used paradigms that differed from the present one. Therefore, our new stimulus materials were first tested with a group of Dutch L1 listeners living in the Netherlands (Experiment II), before the emigrants' use of suprasegmental stress cues in Dutch was assessed with the same materials (Experiment III).

Experiment II

Participants
Participants were 20 native speakers of Dutch (aged 18-67 years, M = 28.1, SD = 15.6; 15 females), recruited from the participant pool of the Centre for Language Studies at Radboud University in Nijmegen, the Netherlands. None reported any hearing problems. Data from a further five participants were excluded because it emerged after testing had been completed that they did not meet the participation requirements. Participants could choose between a gift voucher and course credit in return for their participation. Written informed consent was obtained from each participant before the experiment.

Materials
Twenty-one pairs of bisyllabic Dutch words (see Appendix S3, Supplementary Materials) were recorded by a 29-year-old female native speaker of Dutch. First syllables in each pair were segmentally the same but suprasegmentally different, in that one word in each pair had primary stress on the first syllable (e.g., GIEter "watering can"), while primary stress for the other word fell on the second syllable (e.g., giTAAR "guitar"). Mean log word frequencies in the CELEX lexical database of Dutch (Baayen et al., 1995) were 0.79 for first-syllable stress words and 0.60 for second-syllable stress words. Each word was recorded […]. Mean durational, F0 and amplitude measures for the syllable fragments are shown in Table 4, averaged across all fragments with the same stress type. As in Experiment I, duration was calculated across the entire syllable, whereas all other measures were computed over the voiced portion of the fragment only. Each fragment was presented four times, for a total of 168 trials. As in Experiment I, each participant was presented with a different pseudo-randomised stimulus list, and fragments from the same word pair never occurred in successive trials.

Procedure
The procedure was identical to that of Experiment I, with the following exceptions. Participants were tested individually in a quiet room at the Centre for Language Studies at Radboud University, using Sennheiser HD215 headphones. Written and oral instructions were in Dutch.

Results and discussion
There were no responses faster than 100 ms, so none were excluded. Overall response accuracy was 71.6%, with participants correctly selecting the source word for 71.4% (SD = 10.5) of fragments from words with first-syllable stress, and for 71.9% (SD = 9.7) of fragments from words with second-syllable stress (see Figure 3). Unlike the results of Experiment I, the response pattern was symmetric across fragment types, and there was no response bias; participants selected the first-syllable-stress response in 49.7% of all trials. As in Experiment I, we then compared participants' judgments to chance level (i.e., 50%). The absence of response bias allowed us to do this for both types of fragments. Two-sided binomial tests showed that participants correctly identified both fragment types above chance level (first-syllable stress: z = 17.54, p < .001; second-syllable stress: z = 17.98, p < .001). The high response accuracy and above-chance performance found here are in line with previous findings regarding Dutch listeners' use of suprasegmental cues to lexical stress and thus confirm the appropriateness of our set of Dutch stimulus materials.

Experiment III

Participants
Twenty of the emigrants who had previously participated in Experiment I also completed the present experiment. The remaining four emigrants were unavailable for participation. Participants were aged 27-73 years (M = 49.1, SD = 15.3; 11 females) and had resided in Australia for an average of 20.5 years (SD = 15.6). Their mean age at migration was 29.2 years (SD = 8.19, range: 18-52). Participants' mean score on the Dutch version of the LexTALE was 91.7, indicating that they maintained high proficiency in their L1, Dutch, despite migration to an English-speaking environment. Participants provided written informed consent before the start of the experiment and were paid for their participation.

Materials and procedure
Stimulus materials and procedure were as in Experiment II, with the following exceptions. The emigrants were tested in a quiet room in our lab, at their home or at their workplace, using Sennheiser HD280 headphones. Written and oral instructions were in Dutch. To assess their Dutch proficiency, participants completed the Dutch version of the LexTALE (Lemhöfer & Broersma, 2012) after the experiment had finished.

Results and discussion
Data for one participant were lost due to experimenter error; the results from the remaining 19 emigrants are included in the analyses reported below. There were no responses faster than 100 ms, so none were excluded. The results are shown in Figure 4a; for comparison, the mean results of the Dutch listeners tested in Experiment II are shown in Figure 4b. Participants responded correctly in 66.7% of all trials. As in Experiment II, there was no response bias (first-syllable-stress responses were given on 51.1% of trials), and the response pattern was symmetrical: the emigrants correctly assigned 67.8% (SD = 10.4) of fragments from words with first-syllable stress and 65.6% (SD = 12.0) of fragments from words with second-syllable stress. Participants' judgments were once again compared to chance (i.e., 50%) with two-sided binomial tests, which showed above-chance performance in assigning fragments to their source words both for words with first-syllable stress (z = 14.24, p < .001) and for words with second-syllable stress (z = 12.52, p < .001). The accuracy of the emigrants was compared statistically to that of the Dutch control participants of Experiment II with a generalised linear mixed-effects model, again using family 'binomial' and the logit link function from the lme4 package. Listener group (Emigrants coded as −0.5, Controls as 0.5) was entered into the model as a deviation-coded fixed categorical predictor, and random intercepts were added for participants and items. Results of the model fit are displayed in Table 5 and show no significant effect of Listener group, suggesting that the emigrants do not differ from the control participants in the extent to which they exploit suprasegmental stress cues in Dutch.
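What the deviation-coded Listener-group coefficient means can be read off a saturated, fixed-effects-only logistic model, sketched below in Python with hypothetical helper names. With Emigrants coded −0.5 and Controls +0.5, the intercept is the average of the two groups' log-odds of a correct response, and the Listener-group coefficient is their difference (Controls minus Emigrants), because the two codes are one unit apart. The inputs are the raw group accuracies reported for Experiments II and III; the model in Table 5 additionally includes random intercepts for participants and items, so its estimates are not expected to equal these raw values.

```python
import math

CODES = {"Emigrants": -0.5, "Controls": 0.5}


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def deviation_coded_fit(p_by_group):
    """Intercept and slope of logit(p) = b0 + b1 * code, fitted exactly to one
    proportion correct per group (no random effects)."""
    le, lc = logit(p_by_group["Emigrants"]), logit(p_by_group["Controls"])
    b0 = (le + lc) / 2  # grand mean on the log-odds scale
    b1 = lc - le        # Controls minus Emigrants, since the codes differ by 1
    return b0, b1


b0, b1 = deviation_coded_fit({"Emigrants": 0.667, "Controls": 0.716})
for group, code in CODES.items():
    print(group, round(inv_logit(b0 + b1 * code), 3))
# Emigrants 0.667
# Controls 0.716
```

Exponentiating `b1` gives the odds ratio between the groups; whether such a difference is significant depends on the variance captured by the random effects, which this sketch deliberately leaves out.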
This experiment investigated the ability of Dutch-English bilingual emigrants to exploit suprasegmental stress cues in their L1, Dutch. While the emigrants gave slightly fewer correct responses than a control group of L1 listeners residing in the Netherlands, statistical comparisons indicated that this difference was not significant. This suggests that when it comes to the use of suprasegmental stress during L1 listening, the emigrants still apply L1-appropriate lexical procedures, despite the fact that they no longer live in an L1 environment and predominantly use the L2 in daily life.

General discussion
Efficient listening is tailored to the language of input. Our results show that listeners who competently use two languages can adjust their speech processing separately for each language. This holds even when the languages share a phonological feature that is realised similarly in each: if the vocabulary structures are such that attention to that feature speeds lexical recognition in one language but not in the other, listeners apportion their attention differently in accord with this contrast in utility. Notably, this processing asymmetry arises even when the feature is useful in the L1 but not in the L2; a known and well-used L1 processing operation that could easily be applied in the same way to the L2 will nonetheless eventually be abandoned if it offers little return in word recognition speed.
Dutch and English are both lexical stress languages, with stressed and unstressed syllables differing suprasegmentally from one another in the same ways in each language. It is known that the amount of lexical competition is significantly reduced by taking suprasegmental information into account in Dutch, and that listeners do avail themselves of this assistance. In Experiment II, we extended the evidence for this processing behaviour to a task not previously tested with Dutch listeners. It is further known that this facilitatory effect of competition reduction does not appear in English, and that English listeners generally ignore the suprasegmental dimensions in recognising words.
It was also previously known that Dutch listeners with English as their L2 attend to the relevant suprasegmental cues when presented with English stimuli, and as a consequence can actually outperform English-native listeners in a simple fragment identification task. What was not previously known is what Experiments I and III of the present study have shown. When such listeners no longer live in their L1 environment but are instead exposed to the L2 on a daily basis, this greater experience leads them to abandon suprasegmental processing for English, while maintaining it in their L1 Dutch. Thus, sufficient experience in L2 listening can, at least in this dimension, enable a listener to listen like a native.
The listener's ability to adapt to the conditions under which listening occurs is well documented. It is quite often our lot to have to compensate for noisy listening environments, or for talkers with unfamiliar speech patterns, and when the language involved is our L2 rather than our L1, the task is known to become even harder (Garcia Lecumberri et al., 2010). Notwithstanding this variation in difficulty, the flexibility of listening is expressed in the L2 as clearly as in the L1, and our present results confirm that this flexibility can take the form of separate adjustments for each language.
Note that the emigrant community studied here is already known to take the language into account when fine-tuning phoneme categorisation decisions separately for individual talkers. This fine-tuning, vital to successful speech processing, can be elicited in the laboratory using a two-part procedure. Listeners first hear a slightly unusual phoneme presented within a word that supports a clear phonemic interpretation (e.g., an unusual [f] at the end of autogra-). This is followed by a phoneme categorisation test, with materials spoken by the same talker. The latter test reveals that (in this case) the listeners' phoneme category for [f] as uttered by that talker has expanded to include the unusual sound that was heard. In such a test, the emigrants' adaptation processes were found to be active in their (dominant) L2 English, but not in their L1 Dutch (Bruggeman & Cutler, 2020). That is, their speech perception was subject to language-specific constraints.
This L1/L2 adaptation asymmetry was ascribed to differences in the talker populations that provided the emigrants' conversation partners. Although the emigrants reported using both English and Dutch extensively and regularly, their English conversations involved many and varied interlocutors, while their Dutch interactions were mainly with family members. The proposed explanation was that such adaptation processes need not be called upon with highly familiar interlocutors, whose particular speech patterns are long known. Interestingly, the emigrant group was not alone in showing a differential listening pattern across interlocutor groups. Just such a pattern also appeared in heritage learners with differing interlocutor groups for their languages (Cutler et al., 2019). These participants were born into Mandarin-speaking families but lived in English-speaking Sydney. Like the emigrants, they showed strong perceptual learning only in English (the environmental language, which was also their language at work and among their friends), while substantially less learning was observed in Mandarin. And, like the emigrants, the heritage learners largely confined their use of their earliest language to interactions with family members. Note that the same Mandarin materials used in that study had produced robust perceptual learning with other participant groups, as had the Dutch materials used with the emigrant listeners. Thus, the flexibility of speech perception as displayed in perceptual learning can also lead, where necessary, to outcomes that differ across a bilingual's languages.
The present finding significantly extends the scope of these earlier findings in that it involves a particular level of processing which, as the evidence shows, can be switched on or off. The suprasegmental realisation of syllables is noted and, together with phonemic information, taken into account in lexical recognition decisions. We have seen evidence of this processing in our listeners' L1, but not in their L2. By not applying such processing in the L2, the listeners have succeeded in listening in just the way that native L1 English listeners do. Now it might seem sensible (and has been staunchly held to be so: van Heuven, 1988) that if you as a listener are accustomed to taking account of suprasegmental cues to stress in lexical processing in your L1, and the same cues are present in your L2, then you would use them there too. They might even slightly reduce the overall disadvantage that is the lot of L2 listeners. But the statistics of the English lexicon clearly account for why native listeners to English avoid using these cues. The added information yields, on average, less than a one-word difference in the number of competitor words that the listener needs to consider (Cutler & Pasveer, 2006, comparing word overlap statistics with versus without taking syllabic stress match into account). In languages in which experimental evidence indicates regular use of such cues by native listeners, analyses like this have shown much larger competition effects than are seen in English. In Spanish, for instance, use of the cues reduces competition by two-thirds (Cutler et al., 2004), and in Dutch it reduces competition by 50% (Cutler & Pasveer, 2006).
Note that both the Dutch and English vocabularies show a strong tendency for words to bear stress on the first syllable. Comparing the two complete vocabularies, however, reveals that medial syllables are significantly more likely to bear secondary stress (i.e., to have a full vowel but not be marked for primary stress) in Dutch than in English. In English, the most likely vowel in such medial syllables is the unstressed vowel schwa. If those syllables in Dutch words are recoded as containing schwa instead of their actual full vowel, the recalculated competition statistics for this version of Dutch resemble those reported by Cutler and Pasveer (2006) for English.
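The logic of these competitor-count comparisons can be illustrated with a minimal Python sketch. It is not the procedure used by Cutler and Pasveer (2006). It assumes a hypothetical toy lexicon of four invented words and a hypothetical syllable format (`Syl`, with stress coded 1 for primary, 2 for a full vowel without primary stress, and 0 for reduced). It also assumes a simplified competitor measure: the mean number of other words that share each word-initial syllable string. The function names `mean_competitors` and `reduce_medial` are likewise illustrative. The sketch counts competitors with and without a stress match, then recodes medial full vowels that lack primary stress as schwa and counts again.

```python
from typing import NamedTuple


class Syl(NamedTuple):
    """One syllable in a hypothetical toy transcription format."""
    onset: str
    nucleus: str  # "@" stands for schwa
    stress: int   # 1 = primary, 2 = full vowel without primary stress, 0 = reduced


Word = tuple[Syl, ...]


def mean_competitors(lexicon: list[Word], use_stress: bool) -> float:
    """Mean number of other words sharing each word-initial syllable string
    that is shorter than the whole word (a simplified competition measure)."""
    def key(s: Syl):
        segments = s.onset + s.nucleus
        # With use_stress, a competitor must also match in stress level.
        return (segments, s.stress) if use_stress else segments

    keyed = [[key(s) for s in w] for w in lexicon]
    total = fragments = 0
    for i, word in enumerate(keyed):
        for k in range(1, len(word)):  # proper word-initial fragments only
            prefix = word[:k]
            # Shorter words cannot match: their slice is shorter than prefix.
            total += sum(1 for j, other in enumerate(keyed)
                         if j != i and other[:k] == prefix)
            fragments += 1
    return total / fragments


def reduce_medial(lexicon: list[Word]) -> list[Word]:
    """Recode medial full vowels lacking primary stress as schwa."""
    reduced = []
    for w in lexicon:
        syls = list(w)
        for i in range(1, len(w) - 1):  # medial positions only
            if syls[i].stress == 2:
                syls[i] = syls[i]._replace(nucleus="@", stress=0)
        reduced.append(tuple(syls))
    return reduced


# Hypothetical toy lexicon: invented words, not real Dutch or English data.
LEXICON: list[Word] = [
    (Syl("k", "a", 1), Syl("t", "o", 2), Syl("m", "@", 0)),  # KA-to-m@
    (Syl("k", "a", 2), Syl("t", "o", 1), Syl("r", "@", 0)),  # ka-TO-r@
    (Syl("k", "a", 1), Syl("t", "o", 2), Syl("n", "@", 0)),  # KA-to-n@
    (Syl("k", "a", 1), Syl("s", "@", 0)),                    # KA-s@
]

if __name__ == "__main__":
    for label, lex in [("full vowels", LEXICON),
                       ("medial schwa", reduce_medial(LEXICON))]:
        seg = mean_competitors(lex, use_stress=False)
        sup = mean_competitors(lex, use_stress=True)
        print(f"{label}: {seg:.2f} -> {sup:.2f} competitors (gain {seg - sup:.2f})")
```

Run as a script, this prints `full vowels: 2.57 -> 1.14 competitors (gain 1.43)` and `medial schwa: 2.00 -> 1.14 competitors (gain 0.86)`. In this toy case the benefit of matching stress shrinks once medial full vowels become schwa. The segmental difference between t@ and to then does the work that stress did before. The numbers illustrate the logic only and say nothing about the real Dutch or English lexicons.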
Thus, it is actually sensible behaviour for English L1 listeners to take no account at all of the suprasegmental cues. Their usefulness is so limited that, one can only conclude, it does not warrant the processing load that adding such a calculation to word recognition would impose. As the empirical evidence described in the introduction to this study reveals, taking no account of this suprasegmental dimension is exactly what L1 users do when processing English words. The present results are thus encouraging for L2 learners: when an L1 listening strategy fails to increase processing efficiency in the L2, it may be abandoned in favour of a strategy more suitable for the language in question.
The findings of the present study are more heartening for L2 listeners than those of several previous phonetic examinations of late-bilingual emigrants. Those studies have suggested that native-like L2 speech perception may be hard to achieve for those who, like the Dutch emigrants of the present study, move to the L2 environment as adults. Native Italian-speaking and native Catalan-speaking emigrants in Canada still perceive L2 English vowels differently from native English listeners more than 20 years after emigration, despite using English as their dominant language (Cebrian, 2006; Flege & MacKay, 2004). Spanish-Swedish bilingual emigrants do not categorise L2 Swedish voice onset times in a native-like way even after residing in Sweden for many years (Abrahamsson & Hyltenstam, 2009). The language pairs in these studies were not as closely related as Dutch and English, so typological distance may have contributed to the listeners' difficulty in attaining native-like L2 perception. However, a subset of the same Dutch emigrants who participated in the present study were previously found to be insensitive to the transitional cues in English that native listeners use to identify /f/ and /s/ (Bruggeman, 2016; Cutler et al., 2016). This suggests that typological proximity cannot be the sole factor enabling native-like L2 perception. The level of processing involved may also play an important role. In contrast to the lexical task used in the present study, all the aforementioned studies concerned the lower-level perception of segmental differences; prosody was not investigated. Conceivably, lower-level speech processing may be less malleable than the higher level of processing examined here.
Previous research on the use of lexical stress and other prosodic cues during L2 listening has mostly focused on the ACQUISITION of new listening strategies (i.e., listeners' ability to exploit cues that are useful for the L2 but are not used in their L1), and has found varying degrees of success in such acquisition (e.g., Choi et al., 2019; Dupoux et al., 2008; Gilbert et al., 2021; Gilbert et al., 2016; Lin et al., 2014; Tremblay, 2008; Tremblay et al., 2021). There is also some evidence that listeners are influenced by listening strategies from both their L1 and L2 when processing unfamiliar languages, without committing to a single strategy, even when the strategies conflict with one another and one of them would be most appropriate for the unfamiliar language.
Abandonment of an unprofitable L1 strategy does not occur for all L2 listeners under all circumstances, as the results of Cooper et al. (2002) show. The listeners in that study differed from those in the present study on several dimensions. They were proficient enough in their L2 to complete listening experiments in that language, but they were not immersed in the L2 and did not use it as often, or in as many facets of daily life, as the emigrants in the present study. As with many other aspects of L2 processing, a certain level of proficiency, usage, and/or dominance may thus be necessary for listeners to achieve a result such as we report here: successfully ignoring cues in the L2 even though the same cues are useful in the L1. We do not know how large the L2 lexicon has to be to achieve this result, or even whether knowing many words is the crucial trigger. Future research should probe this further.
In the L2 version of the experiment, the emigrants' percentage of first-syllable-stress responses (60%) was similar to that of the L1 controls in the study by Cooper et al. (2002). This buttresses our conclusion that they have acquired a sense of the lexical statistics of English, and of the frequency of occurrence of stressed and unstressed full vowels. Interestingly, in the L1 version of the experiment, the emigrants gave fewer first-syllable-stress responses (51%) than in the L2 version; indeed, their percentage was similar to that of the Dutch L1 controls. The emigrants thus appear to have retained earlier acquired knowledge about the Dutch lexicon and its statistics.
In the paradigm used here, listeners respond once their processing of the auditory stimulus has finished, so the task yields what may be termed an 'offline' measure of their use of suprasegmental stress cues. Thus, although the emigrants' use of suprasegmental cues to lexical stress proved native-like in both their languages, we do not yet know exactly how they analyse such cues during spoken-word recognition. For instance, we do not know whether bilinguals' knowledge of L2 suprasegmental properties modulates lexical competition in L2 listening, as L1 knowledge does for L1 listeners (Jesse et al., 2017; Reinisch et al., 2010; Sulpizio & McQueen, 2012).
Future studies could also examine the exact (segmental and suprasegmental) acoustic cues used by each listener population. The stimulus materials of the present study were specifically designed not to contain segmental cues to lexical stress. So, while our findings clearly indicate that the Dutch emigrants use SUPRASEGMENTAL cues to English stress to a similar extent as L1 listeners of English, they allow only indirect inferences regarding their use of SEGMENTAL cues. Additionally, since we used naturally produced speech and did not manipulate duration, amplitude and pitch independently of one another, the results of the present study do not allow us to draw conclusions about the exact weighting listeners apply to the individual suprasegmental cues to stress. Dutch listeners to L2 English prioritise pitch cues (Cutler et al., 2007; Tremblay et al., 2021). However, since proficient Dutch L2 listeners seem to pay more attention to vowel quality in English than less proficient listeners do, it appears that L2 cue weighting becomes more native-like with increasing proficiency. To explore this possibility further, we examined which acoustic correlates of stress were used by the highly proficient emigrants tested in Experiment I. We compared this to the findings of Cutler et al. (2007), who analysed the cue use of the less-proficient Dutch L2 listeners of English who participated in the study by Cooper et al. (2002). This comparison showed a mixed pattern. Like Cooper et al.'s L2 listeners, the emigrants identified fragments with primary stress more accurately when the fragments' mean rms amplitude was higher. However, no significant correlations were found between the emigrants' response accuracy and any of the other cues used by the less-proficient L2 listeners tested by Cooper et al. (2002). This supports the notion that cue weighting may change with increasing proficiency.
For L2-dominant, highly proficient L2 listeners like the emigrants tested here, such changes may in turn have follow-on effects on the native-likeness of their cue weighting in the L1, Dutch.
As noted, Dutch and English are closely related languages that are highly similar in many respects. We also do not know whether the present findings extend to L2 listeners whose L1 and L2 are more typologically distant. On the one hand, such listeners may be more likely to abandon L1 strategies that do not improve listening efficiency during L2 listening, since the greater difference between their languages may have lowered their expectations regarding the cross-language applicability of listening strategies. On the other hand, the lack of overlap between their languages may make it harder for these listeners to abandon L1 strategies. Acquiring new L2 listening strategies that increase proficiency may have to take priority over abandoning L1 strategies that impede efficiency but not performance. But this is for the future. For now, we are sure that processing strategies may be deployed in a language-by-language fashion. We further know that familiar L1 strategies that may have been applied to the L2 when the L2 was a newer experience can, once that L2 has become a more familiar and even dominant communication medium, simply be jettisoned if they do not pay off.