How does having a good ear promote successful second language speech acquisition in adulthood? Introducing Auditory Precision Hypothesis-L2

Abstract
In this paper, I first provide a brief review of how scholars have conceptualized, tested, and elaborated aptitude frameworks relevant to second language (L2) speech learning. Subsequently, I introduce an emerging paradigm that assigns a fundamental role to domain-general auditory processing (i.e., having a good ear) in L1 speech acquisition and proposes that the same faculty acts as a cornerstone of L2 speech learning (i.e., the Auditory Precision Hypothesis-L2). This hypothesis predicts that learners with more precise auditory processing ability will be able to make the most of every input opportunity, which will result in more advanced L2 speech proficiency. To close, I will provide suggestions on how scholars can assess L2 students’ auditory processing ability (e.g., our team's offline test deposited at L2 Speech Tools for Researchers & Teachers [http://sla-speech-tools.com/]) and discuss how the results can be used to maximize learners’ L2 speech learning opportunities via optimal, profile-matched training programs (e.g., explicit vs. incidental training; naturalistic vs. classroom learning; phonetic vs. auditory training).


Introduction
Learning speech in a second language (L2) after puberty is a difficult task that is characterized by a great deal of individual variation. Some learners can achieve high-level L2 oral proficiency while others have a tremendous amount of difficulty doing so. These differences could be owing not only to the amount of time spent practicing the target language, but also to learners' ability to make the most of every opportunity for input via a range of perceptual and cognitive abilities relevant to L2 acquisition (i.e., APTITUDE; Wen & Skehan, 2021). Examining these abilities can provide a deeper theoretical understanding of the mechanisms that underlie and drive how learners process input, convert it into intake, and acquire a new language. There has been much debate about whether such mechanisms are specific to language learning or generalizable across various kinds of learning behaviors (domain-specific vs. domain-general; e.g., Hamrick et al., 2018), and whether they differ in first language (L1) and L2 acquisition (the degree of awareness; e.g., Diaz et al., 2016). An examination of this topic also has considerable pedagogical relevance. An understanding of individual aptitude profiles could help teachers identify students who would likely benefit more from certain types of instructional approaches. For example, those with explicit language learning aptitude would likely benefit more from a language-focused approach (e.g., metalinguistic instruction), while those with stronger implicit learning aptitude may benefit from implicit and meaning-oriented instruction (i.e., aptitude-treatment interaction; DeKeyser, 2012).
In this paper, I will first briefly review a range of aptitude frameworks relevant to L2 speech learning and then introduce an emerging paradigm that holds that having a good ear¹ (i.e., auditory processing precision) serves as an anchor of L1 acquisition and L2 speech learning in adulthood (i.e., the Auditory Precision Hypothesis-L2). Auditory processing is a complex of domain-general perception abilities related to encoding the acoustic characteristics of sounds. Since auditory processing is the first ability that learners rely on to extract linguistic information from spoken input, any individual differences in this ability are thought to affect various dimensions (segmentals, suprasegmentals, vocabulary, morphosyntax) and phases (speed of learning, ultimate attainment) of language learning. Finally, I will discuss how we can assess L2 students' auditory processing ability (e.g., our team's offline test deposited at L2 Speech Tools for Researchers & Teachers [http://sla-speech-tools.com/]) and make a range of pedagogical suggestions about how such assessments could be used to provide more effective instruction. Following the aptitude-treatment interaction paradigm, I will explain how L2 learners with diverse aptitude profiles (explicit vs. implicit; acuity vs. integration; strong vs. poor) can be encouraged to understand, speak, and master their L2 through profile-matched training programs (explicit vs. incidental; naturalistic vs. classroom; phonetic vs. auditory).

What is L2 speech learning?
According to Saito and Plonsky's (2019) framework, L2 speech proficiency comprises: (a) the ability to perceive and produce novel (or partially acquired) consonantal and vocalic sounds in an L2 without deleting them or replacing them with L1 counterparts (i.e., SEGMENTAL proficiency); (b) the ability to use adequate and varied stress (characterized by longer duration, greater loudness, and higher pitch) at the word level (e.g., correct assignment of word stress) and the sentence level (appropriate use of intonation for declarative and interrogative intentions) (i.e., MELODIC and PROSODIC proficiency); and (c) the ability to deliver speech at an optimal tempo without making too many pauses or repetitions/self-corrections (RHYTHMIC and TEMPORAL proficiency). The last two dimensions have often been collectively described as "SUPRASEGMENTAL proficiency" (Trofimovich & Baker, 2006). The development of precise, robust, and refined L2 segmental and suprasegmental representations is fundamental for reaching advanced levels of listening (Field, 2008) and speaking proficiency (Levis, 2006). With solid L2 segmental and suprasegmental representations, L2 learners can more easily process phonologically similar and complex words (Saito, 2013), perceptually nonsalient morphosyntactic markers (Goldschneider & DeKeyser, 2001), and a range of discourse functions (Brazil, 1997), all of which underpin successful oral communication (Isaacs et al., 2018).
Scholars have extensively examined how learners' L1 phonetic systems influence their L2 speech acquisition. Major frameworks addressing this topic include the Speech Learning Model (Flege & Bohn, 2021), the Perceptual Assimilation Model (Best & Tyler, 2007), Structural Conformity Hypothesis (Eckman, 2004), and the Optimality-Theoretic Model (Escudero & Boersma, 2004). These theoretical accounts share the view that the phonetic distance between the L1 and L2 systems is partially responsible for determining the degree of speech learning difficulty. For example, very few Japanese speakers can perceive and produce English [r] and [l] contrast at a nativelike level because the relevant auditory and articulatory cues are not actively used in the L1 system (third formant frequencies and labial, alveolar, and pharyngeal constrictions; Iverson et al., 2003).

¹ "Having a good ear" is listed in the Cambridge Dictionary as a commonly used expression meaning "good at hearing, repeating, and understanding … sounds [of music and language]." This expression is often used when discussing what is needed for the attainment of advanced L2 pronunciation proficiency. Given that this paper was written for Language Teaching, whose readers include practitioners as well as researchers, having "a good ear" was included in the title (and elsewhere) to ensure that the relatively novel and highly complex subject matter was accessible to the entire readership. Having said that, I must stress that the dichotomy implicated in this expression (i.e., good vs. bad) has been problematized in L1 acquisition literature. While many tasks have thresholds for the diagnosis of auditory processing disorders in various populations (Moore, 2006), it has been suggested that the nature of the link between auditory processing and language learning could be better characterized as a "spectrum" rather than a "dichotomy" (for a critical review on considerable heterogeneity and variability in the operationalization of auditory processing, see Protopapas, 2014). While some children with certain language impairments (e.g., dyslexia) may have less precise auditory processing, the degree of auditory precision and language learning is still subject to a great deal of individual variation even among so-called normal hearing children (Kalashnikova et al., 2019).

Another line of research has explored which individual difference factors predict advanced L2 speech acquisition. For instance, a wide body of research suggests that the quantity (how much learners are exposed to and practice a target language), quality (with whom learners use a target language [L1 vs. L2 users]), and timing of language experience (how early learners started learning a target language and arrived in an L2-speaking country; e.g., Derwing & Munro, 2013) are all related to L2 speech learning outcomes.
However, research has shown that experience factors alone cannot fully explain the variability in ultimate L2 speech attainment. In one of my projects, for example, I examined the accuracy of English [r] production among approximately 200 L1 Japanese L2 English late bilinguals in Canada (Saito, 2015). All participants had an extensive amount of immersion experience (length of residence > 6 years), had arrived in Canada after puberty, and used their L2 (English) every day as a primary language of communication. Despite their similar backgrounds and overall speaking proficiency, analysis of the participants' performance on word reading and picture description tasks suggested that the degree of their English [r] pronunciation attainment varied widely: some demonstrated nativelike pronunciation while others had detectable L1 accents.
One hypothesized source of the individual variation observed in Saito (2015) and other studies (e.g., Abrahamsson & Hyltenstam, 2008) is APTITUDE, a talent for processing the L2 input more efficiently and/or effectively, resulting in larger learning gains in the long run (Doughty, 2019). But what characterizes aptitude for successful L2 speech learning? To answer this question, I will first provide a selective review on the role of aptitude in adult L2 speech learning.

What is L2 speech aptitude?
Fifty years of research have provided evidence that aptitude plays an important role in L2 vocabulary and morphosyntax learning (for comprehensive overviews, see Li, 2016; Wen & Skehan, 2021). This has led to the development of thorough conceptual and methodological aptitude frameworks, such as the Modern Language Aptitude Test (MLAT; Carroll & Sapon, 1959), LLAMA (Meara, 2005), and Hi-LAB (Linck et al., 2013). Although the existing aptitude tests do include audio materials (e.g., sound recognition in LLAMA-D) and refer to their relevance to speech learning on a broad level (e.g., phonemic coding in MLAT for "oral proficiency"; Baker Smemoe & Haslam, 2013), very few studies have examined in depth to what degree, how, and why specific aptitude tests tap into the development of segmental, melodic, and temporal proficiency. In their focused review, Trofimovich et al. (2015) pointed out that "there has been little systematic research on the relationship between various components of aptitude and L2 pronunciation learning" (p. 354).
To fill this gap, I surveyed a range of studies on this topic (i.e., aptitude and speech learning), focusing particularly on those published since Trofimovich et al.'s (2015) call for further research (Table 1). Skehan (2016) provided a set of useful frameworks that researchers can use to survey and categorize different types of L2 aptitude. The frameworks include: (a) linguistic focus (which dimensions of language is the aptitude related to?), (b) domain generality (is the aptitude specific to L2 learning or applicable to all learning behaviors?), and (c) explicitness (is the aptitude associated with explicit or implicit learning?). Accordingly, the following criteria were considered in the creation of a framework for L2 speech learning aptitude:
• Is the aptitude relevant to segmental learning (enhancing consonant and vocalic accuracy) or suprasegmental learning (melody and rhythm)?
• Is the aptitude domain-specific or domain-general?
• Is the aptitude associated with explicit or implicit learning?
As summarized in Table 1, a small but growing number of studies have explored the relationship between aptitude and L2 speech learning outcomes. The results of this body of work have shown that different types of aptitude (explicit vs. implicit; domain-general vs. -specific) uniquely relate to different areas of L2 speech learning (segmental vs. suprasegmental learning).

Auditory processing as an emerging aptitude framework
More recently, some scholars (including our team) have begun to conceptualize, test, and elaborate on an aptitude framework based on a very simple hypothesis: that having a "good ear" (i.e., domain-general auditory processing ability) is the root of language acquisition (Mueller et al., 2012). Since auditory processing is the first ability that infants rely on to parse incoming linguistic input, the detection and interpretation of acoustic information underlies every stage of phonetic, phonological, lexical, and morphosyntactic learning and delay. Thus, it is possible that auditory processing can explain the rate and ultimate attainment of L2 acquisition as well.
Auditory processing refers to a set of lower-order abilities related to precisely perceiving individual dimensions of acoustic information, such as pitch (the perception of the lowest, fundamental frequency of a sound wave), formants (acoustic energy concentrations resulting from resonance), duration (length of sounds), and intensity (loudness of sounds). In line with an influential view in cognitive psychology, auditory processing can be considered domain-general, and forms the basis of multiple domain-specific phenomena, such as music, emotion, environmental sounds, and language (Kraus & Banai, 2007). To measure such domain-general abilities, a number of synthesized stimuli are prepared. Since these stimuli comprise very simple acoustic characteristics (e.g., completely flat fundamental frequencies and formant contours), normal hearing listeners will not perceive them as speech. While exposed to the NONVERBAL stimuli, participants are assessed for their abilities to precisely perceive one particular acoustic dimension (e.g., pitch, duration).
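To make the idea of a nonverbal stimulus concrete, the following is a minimal Python sketch of a harmonic tone with a completely flat fundamental frequency and a linear onset ramp. The function name, sampling rate, number of harmonics, and all default values are illustrative assumptions; they are not the parameters of any published test.

```python
import math

def synthesize_flat_tone(f0_hz=330.0, duration_ms=500.0, rise_ms=15.0,
                         n_harmonics=4, sample_rate=44100):
    """Return samples of a harmonic complex with a flat F0 and a linear onset ramp.

    All default values are hypothetical choices for illustration only.
    """
    n_samples = int(sample_rate * duration_ms / 1000.0)
    n_rise = max(1, int(sample_rate * rise_ms / 1000.0))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Sum equal-amplitude harmonics of a constant fundamental (no pitch movement).
        value = sum(math.sin(2 * math.pi * f0_hz * h * t)
                    for h in range(1, n_harmonics + 1)) / n_harmonics
        # Linear ramp from silence to full amplitude over the rise time.
        envelope = min(1.0, i / n_rise)
        samples.append(value * envelope)
    return samples

# Two stimuli that are identical except for one dimension (here, pitch).
standard = synthesize_flat_tone(f0_hz=330.0)
comparison = synthesize_flat_tone(f0_hz=345.0)
```

Because only one argument differs between the two calls, any difference a listener hears between `standard` and `comparison` can be attributed to that single acoustic dimension.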
In the context of language learning, this domain-general ability is thought to play a key role in the development of phonology, vocabulary, and morphosyntax. For example, infants rely on auditory processing to detect the probabilities of individual phonemes in the L1 system within the first six to eight months of their life (Werker, 2018). During this critical period, every phoneme can be statistically defined in accordance with the different weighting of multiple acoustic cues, such as pitch (F0), first formant (F1), second formant (F2), third formant (F3), duration, and intensity (Kuhl, 2004). Auditory processing is also instrumental to the identification of word and phrase boundaries (Cutler & Butterfield, 1992), syntactic structures (Penner et al., 2001), and morphosyntactic markers (Joanisse & Seidenberg, 1998; Koester et al., 2004).
Note. Phonemic coding refers to sound-symbol correspondence (featured in Carroll and Sapon's [1959] MLAT framework); tonal and rhythm imagery refers to sensitivity to differences in melody and rhythm (featured in Gordon's [1995] notion of Music Aptitude Profile).
In terms of development trajectory, children reach adult-like auditory processing within the first eight to ten years of life (e.g., Thompson et al., 1999 for pitch discrimination; Elfenbein et al., 1993 for duration discrimination). From their early 20s onwards, however, auditory processing gradually declines over the rest of the lifespan (Skoe et al., 2015; but for the relatively slow peak and decline curve in the development of audio-motor integration abilities, see Thompson et al., 2015).
Based on these observations, many L1 acquisition researchers have put forth the hypothesis that auditory impairments are a source of many language problems (Goswami, 2015); that is, if someone experiences deficits in auditory processing, it immediately affects their speech perception, which could, in turn, prevent them from detecting, developing, and consolidating speech categories, and could lead to a range of global language problems. For example, auditory processing measures have been suggested to be a diagnostic tool for dyslexia (Hornickel & Kraus, 2013) and other language-related disorders (Russo et al., 2008).
There is ample cross-sectional and longitudinal evidence showing that auditory individual differences among normal hearing children are significantly tied to a range of L1 outcomes (e.g., speech-in-noise perception, vocabulary use, literacy, and phonological awareness) (Anvari et al., 2002; Bavin et al., 2010; Boets et al., 2008; Tierney et al., 2021; for evidence as to how auditory processing influences L1 vocabulary development over the first three years of life, see Kalashnikova et al., 2019). In addition, correlation studies have shown a medium-to-large relationship between reading difficulty and auditory deficits for various dimensions of nonverbal sounds (see McArthur & Bishop, 2005 for frequency; Casini et al., 2018 for duration; Goswami et al., 2011 for amplitude rise time).
Because the Auditory Precision Hypothesis concerns causality, it is naturally subject to a great deal of controversy. Specifically, some scholars have argued that not all dyslexic children and adults have auditory deficits (see Rosen, 2003 for an overview). From a methodological point of view, it is important to remember that behavioral tasks for measuring auditory perception (e.g., A × B discrimination; for details, see below) inevitably tap into a set of higher-order executive skills (e.g., attentional control, memory), in addition to lower-order skills. For instance, the highly repetitive and abstract nature of laboratory tasks may make it difficult for participants to maintain auditory information in working memory and thus may limit how much information is available for acoustic analysis (Zhang et al., 2016). Accordingly, individuals with language impairments may perform poorly on auditory processing tasks because of problems with both auditory processing and executive functioning, which suggests that any link between auditory processing and linguistic deficits could be confounded with higher-order cognitive abilities (Gooch et al., 2014; Henry et al., 2012; Snowling et al., 2018).

Auditory Precision Hypothesis-L2
More recently, researchers have begun to explore how well the Auditory Precision Hypothesis generalizes to adult L2 speech learning (i.e., Auditory Precision Hypothesis-L2; Mueller et al., 2012). This concurs with the assumptions underlying major L2 speech theories that the mechanisms in successful L1 speech acquisition remain active throughout the lifespan, and are germane to any new speech learning experience (e.g., Flege & Bohn, 2021). In this paper and elsewhere (e.g., Saito et al., 2020b), I would like to further argue that auditory processing could be particularly consequential in post-pubertal L2 learning (relative to L1 acquisition). This is arguably owing to the QUANTITATIVE and QUALITATIVE differences between the L1 and L2 learning experiences.
Because L1 learners are normally exposed to an extensive amount of spoken language, they may be able to overcome auditory-based difficulties via remedial strategies. For example, those with pitch deficits (amusics) can still process phrase boundaries normally using durational rather than pitch information (Jasmin et al., 2020). In contrast, the amount of communicatively authentic and interactive input that L2 learners receive is generally highly limited in classroom settings (Muñoz, 2014), and subject to a great deal of individual variation in naturalistic settings (Derwing & Munro, 2013). Thus, L2 learners may have more difficulty developing a similar range of remedial strategies.
Furthermore, unlike L1 acquisition, which is free of influence from prior language learning experience, L2 speakers need to encode spectro-temporal patterns through already-developed and automatized L1 perception strategies (see McAllister et al., 2002 for the feature account of adult L2 speech learning). That is, to acquire new speech categories, L2 speakers need not only to adjust their already-attuned cue weighting patterns (e.g., Chinese speakers need to use both pitch and duration to perceive L2 English prosody; Jasmin et al., 2021), but also to learn and develop new perception strategies that they do not actively use in their L1 (e.g., Japanese speakers need to discriminate variation in F3 to perceive English [r] and [l]; Iverson et al., 2003).

Components of auditory processing
Extending several popular aptitude frameworks in second language acquisition (SLA) (Skehan, 2016) and L2 speech (see Table 1), I propose FOUR different auditory processing abilities particularly relevant to adult L2 speech learning. Under this 2 × 2 model (see Table 2), the key distinctions concern: (a) whether the abilities relate to L2 speech learning with or without awareness (i.e., explicit vs. implicit) and (b) whether the abilities concern the processing of formants or prosodic information, such as pitch, duration, and intensity. Scholars have operationalized auditory processing of duration and intensity via amplitude rise time (i.e., the time/duration from the onset of a sound to its maximum amplitude; Goswami, 2015).

Explicit acuity
Explicit acuity concerns how subtle a difference in a particular acoustic dimension (e.g., formant, pitch, duration, and intensity) learners can encode. This ability is behaviorally measured via A × B discrimination tasks, where participants hear three nonverbal sounds, one of which is different from the other two, and must indicate which sound differs. The sounds featured in this task are typically synthesized stimuli whose acoustic dimensions are identical except for one dimension. As shown in Table 2, learners' sensitivity to first, second, and third formants (F1, F2, and F3) is thought to relate to segmental learning, and their sensitivity to prosodic information (fundamental frequency [F0], duration, and amplitude rise time) is thought to relate to suprasegmental learning. Lengeris and Hazan (2010) used this type of task to measure L1 Greek English learners' FORMANT acuity. A total of 51 stimuli were developed that differed in terms of a single formant (analogous to vowel F2 = 1,250-1,500 Hz), and were presented to participants. Those who were capable of perceiving the smaller differences in formants demonstrated more learning gains when intensively exposed to multi-talker English vowels. Similarly, the Qin et al. (2021) study with 32 Mandarin learners of Cantonese found that participants with more precise pitch acuity (F0 = 100.07-178.17 Hz) benefited more from the intensive exposure to multi-talker Cantonese tones.
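The logic of a single discrimination trial can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, stimuli are represented by a single number (e.g., an F0 value in Hz), and the sketch assumes that the middle sound is always the standard so that the odd sound is either first or third, a common convention that the studies cited above may or may not follow.

```python
import random

def make_axb_trial(standard, deviant, rng=random):
    """Build one A x B trial as (sequence, odd_position).

    Assumption: the middle sound is always the standard, and the deviant
    appears in position 1 or position 3 with equal probability.
    """
    odd_position = rng.choice([1, 3])
    if odd_position == 1:
        sequence = [deviant, standard, standard]
    else:
        sequence = [standard, standard, deviant]
    return sequence, odd_position

def score_response(odd_position, response):
    """Return True when the listener picked the position of the odd sound."""
    return response == odd_position

# Hypothetical trial: a 330 Hz standard against a 345 Hz deviant.
sequence, odd_position = make_axb_trial(330.0, 345.0)
```

Acuity is then estimated from how small the standard-deviant difference can become before responses stop being reliably correct.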

Implicit acuity
Implicit acuity concerns learners' ability to track a particular acoustic dimension on a SUBCONSCIOUS level. Our research team has so far explored whether and to what degree auditory processing can predict the ultimate attainment of high-level L2 speech proficiency. To reach such an advanced stage of speech development, we assume that learners will need to have years of naturalistic and classroom learning experience. In addition, we assume that they will need explicit and implicit auditory processing abilities that allow them to maximize any learning opportunities, regardless of awareness. In our recent studies (Kachlicka et al., 2019; Saito et al., 2019a, 2020a; Sun et al., 2021), we propose the idea of using electroencephalography (EEG) to measure how the brain tracks and reacts to the acoustic characteristics of sounds at a subcortical level (i.e., implicit acuity).
Among the many EEG paradigms in L2 speech research (e.g., Diaz et al., 2016 for a comprehensive overview), we have adopted the frequency following response (FFR) to study the subcortical auditory system (Coffey et al., 2016). During FFR tasks, participants engage in a meaning-oriented activity (e.g., reading for pleasure, watching silent movies) while listening to a range of synthesized nonverbal sounds. As attention is not required in this task, FFR data can be assumed to reflect an unconscious sensitivity to certain aspects of acoustic signals (formants, pitch) without the contaminating influence of cognitive and affective states. There is a growing amount of research using FFR that has shown that those with more precise encoding of formants likely attain more advanced L2 segmental proficiency (e.g., Saito et al., 2019a, 2020a) and that those with more precise encoding of pitch gain more from pitch-based artificial language training (e.g., Chandrasekaran et al., 2012).

Empirical evidence
Our research team has conducted a series of cross-sectional and longitudinal projects to examine the complex relationships among auditory processing, experience, and L2 speech learning. We recruited more than 400 L2 speakers of English from Poland, Spain, China, Japan, and Vietnam who had studied in naturalistic and/or classroom conditions. Those participants with any immersion experience (range <1 to 20 years) had arrived in an L2 country after the age of 17 (i.e., late bilinguals); we therefore assumed that they used L2 English with detectable L1-related accents. We measured participants' L2 comprehension and production proficiency via measures of segmentals, suprasegmentals, vocabulary, and morphosyntax. Next, we assessed their auditory processing profiles via behavioral and EEG measures. Finally, we surveyed their biographical backgrounds, gathering data on experience-related variables (length of foreign language education and residence, daily L1/L2 use) and age-related variables (chronological age, age of learning, and arrival).
The findings were published separately in several different papers between 2020 and 2022. Adopting cross-sectional and/or longitudinal designs, each paper linked various types of auditory processing (explicit, implicit) to different dimensions (segmentals, suprasegmentals, vocabulary, morphosyntax), modes (perception, production), and stages (early, mid, final) of L2 speech learning. By analyzing these studies as a group, it is possible to synthesize their findings in order to obtain SUGGESTIVE patterns.
First and foremost, the results of multiple regression and mixed-effects modeling analyses showed that performance scores were EQUALLY associated with biographical and auditory processing factors. As visually summarized in Figure 1, half of the variance was explained by how much participants practiced a target language in a classroom setting, and how much they had been using the target language on a daily basis in immersion settings. The other half of the variance was accounted for by their auditory processing ability.
In terms of type of auditory processing, explicit auditory processing appeared to be important at every stage of adult L2 learning (e.g., Saito et al., 2020b for the longitudinal analyses of the first year of immersion), while implicit auditory processing had stronger predictive power for experienced, long-term L2 residents (length of residence = 1-10 years; Kachlicka et al., 2019; Saito et al., 2019a, 2019b, 2020a, 2020b; cf. Sun et al., 2021 for short-term residents with less than 1 year of length of residence). Interestingly, the effects of auditory processing are relatively weak among L2 learners in classroom settings (Saito et al., 2020b, 2021, 2022). This is probably because auditory processing may be unrelated to the outcomes of classroom L2 speech learning, wherein learners receive and process only a limited amount of aural input (but see the "Different types of auditory processing" section below).
Furthermore, in the context of 70 Japanese speakers of English with varied experience and proficiency levels, Saito et al. (in press-a) examined the extent to which auditory processing and cognitive abilities INTERACTED to determine the rate of success in L2 speech proficiency. The results of the correlation analyses showed that all variables were equally related to L2 speech outcomes. More interestingly, the results of the factor analyses showed that auditory processing and explicit cognitive abilities (phonological short-term memory, executive functions, and declarative memory) were clustered into two different categories (see Table 3). Of course, the findings are tentative as they need to be replicated with L2 learners with different L1 backgrounds (e.g., Polish, Spanish, and Vietnamese). However, the study here at least hints at the possibility that auditory processing may be distinct from explicit cognitive abilities and instead related to implicit and procedural memory. The suggestions here support the view that the test of auditory processing may trigger implicit statistical learning of the distribution of stimuli across trials (combining the prior stimulus distribution and the acoustic representations of each incoming stimulus; Raviv et al., 2012; for a more detailed discussion on the role of implicit statistical learning in auditory processing, see Saito et al., in press-a).
Taken together, there are three main observations from the empirical research. First, auditory processing appears to be a relatively independent construct. Second, individual differences in auditory processing may serve as a moderate-to-strong determinant of post-pubertal L2 speech acquisition, especially if learners engage with a great deal of authentic, conversational auditory input on a daily basis. The first two observations led me to propose the last observation: that even adult L2 learners may draw on similar language learning mechanisms used for L1 acquisition, and that these have a lifelong impact on the rate and ultimate attainment of language learning throughout the lifespan (for a comprehensive summary of auditory processing in L1 and L2 acquisition research, see Saito et al., 2021).

Future directions

Offline test development and dissemination
To facilitate follow-up studies on the role of auditory processing in L2 speech learning, our team has developed an open-source, freely available auditory processing test battery that researchers, students, and practitioners can use. The test comprises four subcomponents (formant discrimination, pitch discrimination, duration discrimination, and amplitude rise discrimination) following an A × B discrimination task format (see Figure 2). The tasks adopt Levitt's (1971) adaptive procedure, wherein task difficulty decreases (i.e., the difference becomes wider) or increases (i.e., the difference becomes smaller) based on participants' trial-by-trial performance. Ultimately, the test allows us to measure the extent to which participants can perceive subtle differences in one of four different types of domain-general acoustic information: second formant (1,500-1,700 Hz), fundamental frequencies (300-360 Hz), stimulus duration (250-500 ms), and the timing of amplitude change (15-300 ms). Test materials and a user manual are deposited at Tools for Second Language Speech Research and Teaching (Mora-Plaza et al., 2022, [http://sla-speech-tools.com/]).
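For readers unfamiliar with adaptive procedures, the following Python sketch illustrates the general idea of a transformed up-down staircase in the spirit of Levitt (1971). It is not the implementation used in the test battery: the 2-down/1-up rule (which converges on roughly 70.7% correct), the step size, the level scale, and the stopping rule are all assumptions chosen for illustration, and `run_staircase` is a hypothetical name.

```python
def run_staircase(respond, start_level=50, min_level=1, max_level=100,
                  step=5, n_down=2, max_reversals=8, max_trials=200):
    """Transformed up-down staircase (illustrative 2-down/1-up by default).

    respond(level) -> True if the listener answers correctly at this level.
    A larger level means a wider acoustic difference, i.e., an easier trial.
    Returns (threshold_estimate, reversal_levels).
    """
    level = start_level
    correct_streak = 0
    last_direction = 0          # -1 = made harder, +1 = made easier, 0 = no move yet
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= max_reversals:
            break
        if respond(level):
            correct_streak += 1
            if correct_streak < n_down:
                continue        # not enough consecutive correct answers yet
            correct_streak = 0
            direction = -1      # make the difference smaller (harder)
        else:
            correct_streak = 0
            direction = +1      # make the difference wider (easier)
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)   # the track changed direction at this level
        last_direction = direction
        level = min(max_level, max(min_level, level + direction * step))
    # Threshold estimate: mean of the levels at which reversals occurred.
    threshold = sum(reversals) / len(reversals) if reversals else float(level)
    return threshold, reversals

# Simulated listener who is always correct when the difference level is 20 or more.
threshold, reversals = run_staircase(lambda level: level >= 20)
```

Averaging the reversal levels is one common way to summarize a track; a lower threshold indicates that the listener can detect a smaller acoustic difference.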
Evidence for the reliability of these instruments was provided in a test-retest study with 100 L1 and L2 speakers (Saito & Tierney, in press-e). The study found that the intraclass correlations for the different tasks could be considered "fair" to "good" (ICC(2,2) = .4-.6). This suggests that these behavioral measures can reliably tap into various dimensions of participants' supposedly stable perceptual acuity abilities (Moore, 2012). To further examine the source of individual variation among participants' auditory processing scores, future research could examine the auditory processing profiles of participants with varied biographical backgrounds (e.g., L1 vs. L2 vs. L3 speakers; classroom vs. immersion learners; tonal vs. non-tonal speakers; musicians vs. non-musicians). For instance, our tentative evidence suggests that auditory processing is relatively stable regardless of experience-related variables (e.g., length and intensity of immersion and foreign language education) but may be subject to the influence of age-related variables (e.g., Saito et al., 2020a, 2022 for chronological age; Saito et al., 2020a for age of arrival). Future studies on this topic will shed light on what characterizes the individual variation observed in explicit auditory processing ability.
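As a rough illustration of how a test-retest coefficient of this kind can be computed, the Python sketch below implements the two-way random-effects, absolute-agreement, average-measures form that is commonly labeled ICC(2,k). The function name and the example scores are hypothetical; the sketch is not the analysis script of the reliability study, which may have used different software or settings.

```python
def icc_2k(scores):
    """ICC(2,k) from a participants x sessions table of scores.

    scores: list of rows, one per participant, each holding k session scores.
    Uses the two-way ANOVA mean squares for participants, sessions, and error.
    """
    n = len(scores)             # number of participants
    k = len(scores[0])          # number of sessions per participant
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)

# Hypothetical thresholds for five participants tested in two sessions.
example_scores = [[12.0, 14.0], [30.0, 26.0], [18.0, 21.0], [40.0, 35.0], [22.0, 25.0]]
reliability = icc_2k(example_scores)
```

With two sessions per participant, `k` equals 2, which corresponds to the ICC(2,2) value reported above.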

Enhancing auditory processing
If auditory processing matters for L2 acquisition, one relevant question is, "Can it be enhanced via focused training?" In the L1 hearing literature, some studies have shown that a few hours of training can boost various dimensions of auditory processing among children with language disorders (see Merzenich et al., 1996 for temporal acuity; Micheyl et al., 2006 for pitch acuity; Whiteford & Oxenham, 2018 for audio-motor integration). In turn, these children can reach optimal auditory thresholds and subsequently make the most of every input opportunity in their L1. Following this line of work, our team's current study examined whether domain-general auditory processing (i.e., precise representation of sounds) can be improved via focused online training and whether this affects speech learning (Saito et al., in press-c). Ninety-eight adult Japanese speakers were divided into two training groups targeting the acquisition of English [æ] and [ʌ]: an auditory training group and a phonetic training group. The auditory training group completed activities designed to improve their ability to use the second formant frequency (1,200-1,600 Hz) to discriminate between nonverbal sounds. The phonetic training group was taught to discriminate between English [æ] and [ʌ] contrasts using multi-talker speech stimuli. The results showed that the phonetic training group improved only their English [æ] and [ʌ] identification, while the auditory training group enhanced both auditory and phonetic skills. The results suggest that auditory acuity to key, domain-general acoustic cues (F2 = 1,200-1,600 Hz) anchors, triggers, and promotes speech learning on a domain-specific level (English [æ] vs. [ʌ]). The findings also suggest that auditory training could help remediate difficulties with L2 speech learning in some individuals with auditory deficits.

Different types of auditory processing (beyond acuity)
Thus far, auditory processing has been conceptualized as the ability to encode subtle acoustic characteristics of sounds (i.e., perceptual acuity). On a broader level, auditory processing can also comprise a range of neighboring abilities, such as attention to particular acoustic dimensions while ignoring others (i.e., auditory selective attention) and the use of acoustic information for motor action (i.e., audio-motor integration). There is emerging evidence that different types of auditory training are more or less relevant to different dimensions of L2 speech learning.
On the one hand, perceptual acuity and audio-motor integration appear to be good indices of successful L2 speech learning in NATURALISTIC settings. Since such immersion experience can provide learners with ample L2 input and output opportunities, those with more precise acuity and integration can better encode the acoustic dimensions of new sounds and then integrate this information into their L2 system more efficiently and effectively. As a result, these learners can achieve more advanced L2 speech proficiency (e.g., Saito et al., 2022; Zheng et al., 2022).
On the other hand, the rate of learning success in CLASSROOM settings appears to be linked to audio-motor integration but NOT to perceptual acuity. In many English-as-a-Foreign-Language (EFL) classrooms, L2 learners typically learn the target language through decontextualized, production-based teaching methods (e.g., mechanical repetition and memorization of model pronunciation forms). Such learning environments do not provide an abundant amount of contextually rich, communicatively authentic input (Shintani et al., 2013). Owing to the asymmetry here (output > input), learners' audio-motor integration (but not perceptual acuity) has been found to impact the outcomes of classroom L2 speech learning (e.g., Saito et al., 2021 for Vietnamese EFL classrooms; Shao et al., 2022 for Chinese EFL classrooms).

Aptitude-treatment interaction
In L2 morphosyntax learning, there is a well-researched hypothesis stating that learners with greater explicit aptitude will benefit more from explicit training, and those with greater implicit aptitude will benefit more from implicit training (for comprehensive reviews, see DeKeyser, 2012; Fu & Li, 2021). Following this line of thought, it would be intriguing to examine the extent to which auditory processing tests can be used as a diagnostic tool to provide profile-matched instructional approaches.
As reviewed earlier, it has been shown that learners with high-level explicit auditory processing benefit from explicit, language-focused speech training such as high variability phonetic training (e.g., Lengeris & Hazan, 2010; Qin et al., 2021). Few studies have examined the relationship between auditory processing (or any measure of aptitude for that matter) and the effectiveness of INCIDENTAL, IMPLICIT, and MEANING-ORIENTED L2 speech training, arguably because scholars have exclusively used intentional, explicit, and language-focused training to date. Though limited in number, some scholars have proposed using communicative focus on form (Lee & Lyster, 2016), task-based pronunciation training (Mora & Levkina, 2017), and phonological recasts (Saito, 2021) in this regard.
In accordance with the notion of incidental and multimodal auditory categorization learning in the field of cognitive psychology (Lim & Holt, 2011), our team has developed and tested the pedagogical potential of a target-shooting video game that aims to support segmental acquisition among Japanese learners of English (Saito et al., in press-b). In this game, participants are told that the faster they shoot the targets, the more points they can earn. Unknown to the participants, each target is accompanied by unique English consonant and vowel sounds. As such, participants are incidentally guided to use speech cues (L2 vowels and consonants) and acquire a series of novel foreign sounds as a by-product of playing the game. The findings of Saito et al. showed that participants' overall gains were similar to those of comparable explicit training (e.g., High Variability Phonetic Training; overt identification of target contrasts followed by trial-by-trial feedback), but that the degree of improvement varied widely among individual participants. Follow-up studies are needed to investigate whether the effectiveness of this type of training is related to explicit and implicit auditory processing ability.
There is also a possibility that learners' degree of auditory precision in general (relatively strong or relatively poor) could help determine the extent to which they might benefit from phonetic training (using speech stimuli) and auditory training (using non-speech stimuli). Provision of phonetic training alone could be sufficient for L2 learners with strong auditory processing skills as they are more capable of encoding the acoustic dimensions of new sounds and are likely to show larger gains when they receive various types of intensive L2 speech training (see Lengeris & Hazan, 2010 for high variability phonetic training; Shao et al., 2022 for shadowing training; Sun et al., 2021 for five months of study abroad).
Conversely, such an approach (phonetic training only) could be confusing and/or have adverse effects when conducted with L2 learners with poorer auditory processing. Poor auditory processing prevents learners from detecting the novel acoustic characteristics of L2 speech while minimizing interference from their L1, extracting reliable acoustic cues (while ignoring irrelevant cues), and attaining robust L2 speech perception (e.g., pitch contour for the acquisition of lexical tones, Perrachione et al., 2011; formants and duration for the acquisition of vowels, Ruan & Saito, forthcoming).
As a remedial strategy, I propose that those with relatively low auditory processing may benefit from auditory training prior to phonetic training. During auditory training, learners are exposed to acoustically simple and monotonous nonspeech sounds that are manipulated along a single acoustic parameter. This can guide learners to focus on enhancing their sensitivity to the most useful dimensions of L2 speech (e.g., F2 = 1,200-1,600 Hz for English [æ] and [ʌ]; Saito et al., in press-c).

Auditory processing and different aspects of L2 learning
In a broader sense, L2 speech proficiency concerns one's ability to access multiple dimensions of linguistic knowledge while comprehending and speaking language on a global level. Intuitively, it is unsurprising that auditory processing can explain some variance in the phonological aspects of L2 speech learning because the role of auditory input processing is most directly linked to segmental and suprasegmental acquisition. The question has now become: To what degree does auditory processing matter not only for the acquisition of lower-order linguistic information (phonology), but also for the acquisition of higher-order linguistic information (vocabulary and grammar)? Auditory precision plays an important role in word segmentation (Norris & McQueen, 2008) and the identification of word and phrase boundaries (Cutler & Butterfield, 1992). Further, auditory precision facilitates the detection of suffixes, inflection, and articles (Joanisse & Seidenberg, 1998) and word order (Penner et al., 2001). Since auditory processing is involved in every stage of L2 speech learning, future research can further explore how this ability DIFFERENTIALLY promotes the development of phonology, vocabulary, and grammar in a complementary fashion (for some emerging evidence, see Kachlicka et al., 2019; Saito et al., in press-d).

Conclusion
In this paper, I have introduced the auditory precision paradigm from L1 acquisition as a way to look at the complex mechanisms underlying adult L2 speech learning (i.e., Auditory Precision Hypothesis-L2). First and foremost, everyone can learn new sounds and achieve comprehensible, intelligible, communicatively adequate, and functional L2 oral proficiency as long as they practice the target language on a daily basis with a good level of motivation and willingness to communicate (Derwing & Munro, 2013). Here, the Auditory Precision Hypothesis-L2 is in line with major L2 speech learning theories in that both consider the quantity, quality, and intensity of experience as the crucial determinant of L2 speech learning (e.g., Flege & Bohn, 2021 for Speech Learning Model).
However, much individual variation has still been found in terms of the levels of attainment among highly experienced, regular, motivated, and functional L2 learners; some are able to reach a stage of proficiency where they are almost indistinguishable from native speakers of the target language. These differences in learning outcomes exist not only because of the amount of time spent practicing the target language, but also because some learners are more cognitively and perceptually adept at making the most of every opportunity for input. Consequently, this could lead to larger and more robust gains in the long run (Doughty, 2019).
An "auditory precision view" of L2 speech learning predicts that individuals with a good ear (i.e., precise auditory processing) are able to make the most of every input opportunity. That is, more precise auditory processing helps learners better capture the acoustic dimensions of L2 speech input (McAllister et al., 2002), adjust to new cue weighting patterns (Jasmin et al., 2021), develop new speech categories (or revise existing speech categories; Flege & Bohn, 2021), and continue to refine these categories to a near-nativelike level in the long run (Abrahamsson, 2012). The Auditory Precision Hypothesis-L2 assumes that domain-general and pre-categorical sound processing abilities govern language learning throughout the lifespan and play a key role in late L2 speech learning (Mueller et al., 2012).
Given that auditory processing is fundamental to parsing L2 aural input, any lower-order problems will likely slow down other L2 speech learning processes, even if learners have a relatively strong working memory and high attentional control, receive ample input, and/or are motivated to practice the target language (Perrachione et al., 2011; Ruan & Saito, forthcoming). Going forward, both researchers and practitioners are encouraged to carry out more auditory processing research that can provide insight into the different types of speech training participants may benefit from (e.g., explicit auditory processing for explicit speech training). In addition, more research is needed to explore how tests of auditory processing can be used to diagnose learners with relatively low-level auditory precision. This latter group of L2 learners may greatly benefit from auditory processing training, especially prior to L2 speech training and immersion experience. This will, in turn, help all L2 learners reduce the challenge of learning a new language despite any disadvantages they may have at the level of auditory processing.

Figure 1. Summary of the suggestive relationship between auditory processing, biographical background, and L2 speech outcomes

Table 1. Summary of perceptual and cognitive abilities relevant to L2 speech learning

Table 2. Summary of auditory processing relevant to L2 speech learning

Table 3. Factor analysis of auditory processing and cognitive abilities presented among 70 L2 learners in Saito et al. (in press-a)

Figure 2. Task instruction (A) and onscreen labels (B)