39.1 Introduction
To acquire language, infants need to extract information from speech and develop an understanding of the relationship between sounds and their meaning. When observing infants and young children in everyday life, this seems like a gradually developing and effortless task, but the underlying process is likely more complex and is currently not completely understood. For the language learner, some form of tracking of the speech stream is arguably needed for language acquisition to take off and for word and grammar learning to proceed. The question arises as to which features of the phonetically rich, but lexically and grammatically opaque, input provide an entry point into, and facilitate, the acquisition of the structure of language in its full complexity, including words and grammar. Here we take a closer look at prosody and its role in the early development of auditory neural oscillations, focusing on a model in which synchronization to the slow fluctuations associated with the prosodic phrase level scaffolds grammar learning in infancy (Nallet and Gervain, 2021).
39.2 The Prenatal Prosodic-Shaping Model
According to the prenatal prosodic-shaping model (Nallet and Gervain, 2021), infants’ prenatal experience with the speech signal lays the foundation for subsequent grammar learning after birth. Developing fetuses are exposed to speech from as early as weeks 24 to 28 of gestation (Eggermont and Moore, 2011). However, due to the intrauterine environment, sounds are low-pass filtered, essentially providing the fetus with the prosodic contour of the speech signal (Gerhardt and Abrams, 2000; Menn et al., 2023).
Prosodic cues contribute to the parsing of the speech stream in the form of intonational phrase boundaries (Thompson and Balkwill, 2006), dynamic pitch changes (Watson and Gibson, 2005), metrical information (Liu et al., 2015), and so on. In terms of its function, prosody may be used as a grammatical marker, for example, of focus or interrogatives, and can also provide meaning to an utterance above and beyond the lexical and grammatical content by nuancing the speaker’s intent in communicating affect, emphasis, irony, and so on (Coutinho and Dibben, 2013; Scherer, 1986; Zentner et al., 2008). As such, the above-mentioned linguistic phenomena associated with prosodic cues provide an anchor to the underlying structure of language and, importantly to the topic of this chapter, may thus bootstrap the development of grammar (Gervain and Werker, 2013; Nazzi and Ramus, 2003; Soderstrom et al., 2003).
How such regularities are processed by the brain has been a much-debated topic (e.g., Ding et al., 2016; Giraud and Poeppel, 2012). Recent advances have established that adults’ brain responses simultaneously track the different timescales of the speech signal. Bottom-up processing of units in the speech signal is supported by neural oscillations in the delta (0.5–3.5 Hz), theta (4–8 Hz), and low-gamma (>35 Hz) frequency bands (Ding et al., 2016; Giraud and Poeppel, 2012). These bands, respectively, underlie the processing of prosodic phrases, syllables, and (sub-)phonemic units of speech (Giraud and Poeppel, 2012), as their frequencies match those of the relevant linguistic units. For further details, we direct the reader to Chapters 3 and 5. However, it is still unclear how the neural tracking of speech and, in particular, the oscillatory hierarchy develop during the first year of life.
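The frequency match between bands and linguistic units can be illustrated with a minimal sketch. The band edges are those given above; the unit durations (about 1 s for a prosodic phrase, 200 ms for a syllable, 25 ms for a phoneme) are round illustrative numbers assumed here, not values taken from this chapter, and the function names are hypothetical.

```python
# Band edges as stated in the text; low-gamma has no stated upper edge.
BANDS = {
    "delta": (0.5, 3.5),
    "theta": (4.0, 8.0),
    "low-gamma": (35.0, float("inf")),
}

# ASSUMPTION: illustrative round durations in seconds, not from the chapter.
ASSUMED_DURATIONS_S = {
    "prosodic phrase": 1.0,
    "syllable": 0.2,
    "phoneme": 0.025,
}


def rate_from_duration(duration_s):
    """A unit lasting duration_s seconds recurs at roughly 1 / duration_s Hz."""
    return 1.0 / duration_s


def band_for_rate(rate_hz):
    """Return the band containing rate_hz, or None if it falls between bands."""
    for name, (low, high) in BANDS.items():
        if low <= rate_hz <= high:
            return name
    return None


unit_to_band = {
    unit: band_for_rate(rate_from_duration(duration))
    for unit, duration in ASSUMED_DURATIONS_S.items()
}
```

Under these assumed durations, phrases fall in the delta band, syllables in the theta band, and phonemes in the low-gamma band; a rate such as 10 Hz falls in none of the three bands listed.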
The prenatal prosodic-shaping model proposes that this development starts already prior to birth. In utero, the fine-grained phonemic information (i.e., the gamma band) is mostly suppressed in the low-passed auditory signal that reaches the fetus, while syllabic and prosodic phrase information is preserved, as the spectral content fluctuates at slower frequencies (corresponding to theta and delta frequency bands) (Gerhardt and Abrams, 2000). Given this prenatal experience with the speech signal, the neural tracking of larger linguistic units may already start prenatally, while oscillations tracking (sub-)phonemic information may not be operational until after birth (Nallet and Gervain, 2021). Postnatally, infants are exposed to the full-band speech signal, at which point the neural tracking of fine-grained phonemic elements may start being shaped by experience with the (unfiltered) speech signal.
Due to their prenatal exposure to parts of the speech signal, infants are born with a certain familiarity with language. It has been attested that neonates prefer sounds to which they have been exposed in the womb (DeCasper and Fifer, 1980; Mehler et al., 1988; Moon, 2017; Moon et al., 1993), which suggests that they do have the ability to learn from the low-passed signal available to them, and that some shaping of the language system takes place already in utero. For example, newborns show a preference for their mother’s voice compared to an unknown female voice (DeCasper and Fifer, 1980; Moon, 2017), and for their native language over unfamiliar languages (Mehler et al., 1988; Moon et al., 1993). Following these results, a growing body of research suggests that prenatal experience with the prosodic features preserved in the low-passed speech signal might lay the foundations for even more complex language acquisition.
Newborns can, for example, distinguish well-formed from ill-formed prosodic sequences based on duration, pitch, or intensity, but only if the varying element is contrastive in the language they heard before birth (Abboub et al., 2016). Specifically, French newborns can discriminate between short-long and long-short sequences (variation in duration), which mark contrastive distinctions, but not between loud-soft and soft-loud (variation in intensity) or high-low and low-high sequences (variation in pitch), which are not markers of contrastive distinctions in French prosody (Nespor et al., 2008; Nespor and Vogel, 2007). In addition, even though consonants are likely not perceivable by the fetus, some information about vowels might be available, because vowels, which are the main carriers of prosodic information, are high-energy events in the speech signal. Accordingly, Moon et al. (2013) observed opposite preferences between American and Swedish newborns for the vowel with which they had prenatal experience (the American /i:/ versus the Swedish /y/ vowel).
It thus becomes evident that infants are born with a certain familiarity with the prosodic features of their native language. Importantly, as prosody also carries lexical, morphosyntactic, and pragmatic information, it is highly relevant to language development overall. In older infants, prosody provides cues to, for example, word boundaries (Shukla et al., 2011) and word order (Gervain and Werker, 2013), and is thus an important bootstrapping mechanism for lexical and grammatical development. However, even newborns display the ability to utilize prosodic features to gain access to more complex aspects of their native language. For instance, they can discriminate between function words (marking morphosyntactic structure) and content words (carrying lexical meaning) (Shi et al., 1999), and they are sensitive to word order and its violations (Benavides-Varela and Gervain, 2017). In addition, newborns are sensitive to prosodic violations at the utterance level, in that they discriminate between well-formed and ill-formed prosodic contours (Martinez-Alvarez et al., 2023).
According to the prenatal prosodic-shaping model (Nallet and Gervain, 2021), prosody provides the earliest experience with language, and is one of the mechanisms that link innate predispositions and soon-to-be relevant input from the environment. In other words, as prosodic features of spoken language fluctuate at slower frequencies, they are likely to be preserved in utero, as suggested by newborns’ familiarity with these features. This prenatal prosodic experience is hypothesized to shape the neural architecture, meaning that neural entrainment to prosody is already operational at birth (Menn et al., 2023; Ortiz Barajas et al., 2021). When newborns are exposed to the full-band speech signal after birth, which includes the fine-grained acoustic information at the phonemic level, oscillations in the delta and theta bands are already fine-tuned, at least to some extent, to the rhythm of the prenatally heard language. After months of exposure to the full speech signal, phoneme perception becomes attuned to the native language, and neural activity in its corresponding frequency band, gamma, is fine-tuned and hierarchically embedded in the delta- and theta-band oscillations. This model can thus offer a theoretical account in which prenatal experience with prosody is the foundation on which subsequent language development is built, in that, with postnatal exposure, oscillations in the gamma band are gradually embedded in the prenatally acquired delta- and theta-band oscillations. In other words, the developmental chronology of experience with speech, first filtered then full-band, explains the hierarchical organization of oscillations. However, see also Menn et al. (2023) for a related perspective in which electrophysiological maturation and the emergence of gamma-band activity shape the acquisition of phonological knowledge.
39.3 Evidence for the Model
Recent research on the development of neural tracking suggests that delta-band tracking is operational in the first year of life. Infants at six and nine months of age, presented with streams of rhythmic stimuli in the form of the syllable “ta” or a drumbeat at a presentation rate of 2 Hz, displayed local peaks in electroencephalography (EEG) power at the presentation rate for both stimulus types, compared to a silent baseline condition (Choisdealbha et al., 2022). The response was entrained to the stimuli, that is, time-locked to stimulus onset, as observed through a relative increase in phase consistency. The responses at 4 Hz (a harmonic of the presentation rate) and at 7 Hz (not a harmonic of the presentation rate) were also assessed and compared to the response at the presentation rate. An increase in power at the harmonic frequency was observed regardless of stimulus type, but a time-locked response similar to the 2 Hz response was observed at the 4 Hz harmonic only for the speech stimulus. The infants were tested longitudinally, but, importantly, no evidence of improved tracking as a function of age was observed (Choisdealbha et al., 2022).
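The two measures mentioned above, power at the presentation rate and phase consistency across trials, can be sketched on simulated data. This is a minimal illustration and not the analysis pipeline of the cited study: the epoch length, sampling rate, noise level, and all function names are assumptions made for the example.

```python
import cmath
import math
import random


def fourier_coefficient(samples, freq_hz, sample_rate_hz):
    """Complex Fourier coefficient of one epoch at a single frequency."""
    n = len(samples)
    return sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
        for i, x in enumerate(samples)
    ) / n


def power_and_phase_consistency(epochs, freq_hz, sample_rate_hz):
    """Mean power and inter-trial phase consistency (0 to 1) at freq_hz."""
    coeffs = [fourier_coefficient(e, freq_hz, sample_rate_hz) for e in epochs]
    mean_power = sum(abs(c) ** 2 for c in coeffs) / len(coeffs)
    # Scale each coefficient to unit length so that only its phase counts.
    consistency = abs(sum(c / abs(c) for c in coeffs)) / len(coeffs)
    return mean_power, consistency


def simulate_epochs(n_epochs, phase_locked, rng,
                    freq_hz=2.0, sample_rate_hz=100.0, duration_s=2.0):
    """ASSUMPTION: toy epochs, a 2 Hz sinusoid plus Gaussian noise."""
    n = int(duration_s * sample_rate_hz)
    epochs = []
    for _ in range(n_epochs):
        # Time-locked responses share one phase; otherwise phase is random.
        phase = 0.0 if phase_locked else rng.uniform(0.0, 2.0 * math.pi)
        epochs.append([
            math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz + phase)
            + rng.gauss(0.0, 1.0)
            for i in range(n)
        ])
    return epochs


rng = random.Random(0)
locked = simulate_epochs(40, True, rng)
jittered = simulate_epochs(40, False, rng)
locked_power, locked_consistency = power_and_phase_consistency(locked, 2.0, 100.0)
jittered_power, jittered_consistency = power_and_phase_consistency(jittered, 2.0, 100.0)
```

In this toy case both sets of epochs carry similar power at 2 Hz, but only the phase-locked set yields phase consistency close to 1. This is why a power peak alone does not establish a time-locked response.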
Similarly, when longitudinally following four-, seven-, and 11-month-old infants’ tracking of sung speech (nursery rhymes), a peak in power in the delta band (at ~2.2 Hz) and the theta band (at ~4.3 Hz) was observed (Attaheri et al., 2022). However, delta-band tracking stayed significantly higher than theta-band tracking at each age, and was especially strong at four months, while theta-band tracking increased over the course of the first year of life. The alpha band (8–12 Hz) was used as a control condition, in which no reliable tracking was observed (Attaheri et al., 2022). Given the results of these studies, infants appear to faithfully track auditory speech-related stimuli in the theta and delta bands, a mechanism that is operational from at least four months of age (Attaheri et al., 2022; Choisdealbha et al., 2022).
Interestingly, in terms of the predictions made by the prenatal prosodic-shaping hypothesis, Attaheri et al. (2022) also observed delta- and theta-band-driven phase–amplitude coupling with higher-frequency amplitudes. Namely, the phase of delta- and theta-band activity acted as a carrier of amplitude in higher-frequency bands, at both beta and gamma frequencies, although the effect was greatest for gamma-band activity. This finding is consistent with the model, as it suggests that the prenatally unavailable higher frequencies associated with phonemes will become embedded in the delta-band oscillations when the infant is exposed to these frequencies after birth. In other words, the prenatal experience with the phases of delta- and theta-band frequencies is likely to play an important role in the temporal organization of the amplitude of higher-frequency bands in the infant brain, that is, in their nesting within the slower bands.
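Phase–amplitude coupling of this kind can be quantified in several ways. The sketch below uses a normalised mean vector length, which is one common choice and is assumed here for illustration; the text above does not specify which measure the cited study used. The slow-band phase and fast-band amplitude are generated directly as toy series, whereas in real data they would first have to be extracted from the EEG (for example by band-pass filtering), a step omitted here.

```python
import cmath
import math
import random


def mean_vector_length(slow_phase, fast_amplitude):
    """Normalised mean vector length.

    Near 0 when fast-band amplitude is unrelated to slow-band phase,
    larger when amplitude is concentrated at a preferred phase.
    """
    vector = sum(
        a * cmath.exp(1j * p) for p, a in zip(slow_phase, fast_amplitude)
    )
    return abs(vector) / sum(fast_amplitude)


# ASSUMPTION: toy parameters (10 s at 200 Hz, a 2 Hz slow rhythm).
rng = random.Random(1)
sample_rate_hz = 200.0
n_samples = 2000
slow_hz = 2.0
slow_phase = [
    (2.0 * math.pi * slow_hz * i / sample_rate_hz) % (2.0 * math.pi)
    for i in range(n_samples)
]

# Nested case: fast-band amplitude peaks at slow-band phase 0.
coupled_amplitude = [
    1.0 + 0.8 * math.cos(p) + rng.uniform(0.0, 0.1) for p in slow_phase
]
# Control case: amplitude fluctuates independently of the slow phase.
uncoupled_amplitude = [
    1.0 + 0.8 * rng.uniform(-1.0, 1.0) for _ in slow_phase
]

coupled_mvl = mean_vector_length(slow_phase, coupled_amplitude)
uncoupled_mvl = mean_vector_length(slow_phase, uncoupled_amplitude)
```

When the fast-band amplitude waxes and wanes with the slow phase, the amplitude-weighted phase vectors add up in one direction and the measure is clearly above zero; when the two are independent, the vectors largely cancel.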
Another branch of research on early neural tracking of speech focuses on infant-directed speech (IDS) compared to adult-directed speech (ADS) (Kalashnikova et al., 2018; Menn et al., 2022). IDS, a type of speech often used by adults when speaking to infants, is characterized by a higher pitch, more variability in intonation, and a slower tempo compared to ADS. This is reflected in amplified slow-frequency modulations relative to ADS. IDS has also been shown to be preferred by infants, offering potential benefits in language development (Song et al., 2010). Seven-month-old infants have been found to track both naturally produced IDS and ADS, as measured by increases in power in the theta band (Kalashnikova et al., 2018). Furthermore, significant correlations were found between the pattern of neural activity and the envelope of the speech signal for IDS, but not for ADS, meaning that the envelope of the speech signal was more strongly reflected in the neural signal when infants were presented with IDS compared to ADS (Kalashnikova et al., 2018).
Menn et al. (2022) extend these results by assessing, in nine-month-olds, whether the advantage for IDS over ADS is driven by the syllabic rate or by the prosodic stress rate, that is, which frequency band carries the effect (theta versus delta band, respectively). The infants listened to their mothers describe items in either an IDS- or ADS-like way. In addition to significant speech–brain coherence at the syllabic and prosodic stress rate for both IDS and ADS, a significantly higher coherence was found for IDS at the prosodic stress rate, but not at the syllabic rate, a difference driven by a left-central cluster. As such, their results indicate that prosodic stress (greater coherence at delta-band frequencies), but not syllable rhythm, might be the facilitator of greater neural tracking of IDS (Menn et al., 2022). These results might arise as a consequence of the differences in attentional salience between IDS and ADS, as the prosodic differences may contribute to increased attention to the speech sounds.
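Speech–brain coherence at a given rate can be sketched as magnitude-squared coherence between the speech envelope and the neural signal, estimated across segments. The example below is a toy simulation under stated assumptions: the 2 Hz stress rate, the segment length, the noise level, and the two gain values are all invented for illustration, and the two simulated conditions simply stand for stronger and weaker tracking. It does not reproduce the data or the pipeline of the cited study.

```python
import cmath
import math
import random


def coefficient(segment, freq_hz, sample_rate_hz):
    """Complex Fourier coefficient of one segment at a single frequency."""
    n = len(segment)
    return sum(
        v * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
        for i, v in enumerate(segment)
    ) / n


def coherence_at(x, y, freq_hz, sample_rate_hz, segment_len):
    """Magnitude-squared coherence (0 to 1) from non-overlapping segments."""
    cross, power_x, power_y = 0j, 0.0, 0.0
    for start in range(0, len(x) - segment_len + 1, segment_len):
        cx = coefficient(x[start:start + segment_len], freq_hz, sample_rate_hz)
        cy = coefficient(y[start:start + segment_len], freq_hz, sample_rate_hz)
        cross += cx * cy.conjugate()
        power_x += abs(cx) ** 2
        power_y += abs(cy) ** 2
    return abs(cross) ** 2 / (power_x * power_y)


# ASSUMPTION: toy parameters, not taken from the chapter.
SAMPLE_RATE_HZ = 100.0
SEGMENT_LEN = 200        # 2 s segments
N_SEGMENTS = 30
STRESS_RATE_HZ = 2.0     # assumed prosodic stress rate

rng = random.Random(2)
envelope = []            # mean-removed envelope modulation at the stress rate
for _ in range(N_SEGMENTS):
    phase = rng.uniform(0.0, 2.0 * math.pi)
    envelope.extend(
        math.sin(2.0 * math.pi * STRESS_RATE_HZ * i / SAMPLE_RATE_HZ + phase)
        for i in range(SEGMENT_LEN)
    )


def simulated_eeg(gain):
    """Toy neural signal: scaled copy of the envelope plus Gaussian noise."""
    return [gain * e + rng.gauss(0.0, 1.0) for e in envelope]


strong_tracking = coherence_at(
    envelope, simulated_eeg(1.0), STRESS_RATE_HZ, SAMPLE_RATE_HZ, SEGMENT_LEN
)
weak_tracking = coherence_at(
    envelope, simulated_eeg(0.05), STRESS_RATE_HZ, SAMPLE_RATE_HZ, SEGMENT_LEN
)
```

Coherence approaches 1 when the neural signal follows the envelope with a consistent phase relation across segments and falls toward 0 as the envelope-related part of the signal is swamped by noise. Comparing the value at the stress rate with the value at the syllabic rate is what allows an effect to be attributed to one band rather than the other.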
The results reviewed above come from somewhat older infants (between four and 11 months of age). However, are these abilities already present at birth? When assessing tracking of syllables, that is, activity in the theta band, no differences were found between newborns and six-month-old infants presented with short sentences read in IDS: Both groups similarly track the phase of the envelope of familiar (native language) and unfamiliar (different rhythmic class and same rhythmic class as the first language [L1]) languages (Ortiz Barajas et al., 2021). Interestingly, while phase tracking remains universal at six months, amplitude tracking, which is universal at birth, is maintained at six months only for the unfamiliar languages, especially the rhythmically different one. As such, phase and amplitude tracking might be differentially modulated over the course of the first year, which may be reflective of a perceptual narrowing following infants’ experience with their native language (Kuhl, 2004; Ortiz Barajas et al., 2021). More recent results (Ortiz Barajas et al., 2023) suggest that newborns show enhanced power in the delta and theta bands in response to the language heard prenatally and, to some extent, to a rhythmically similar unfamiliar language, as compared to a rhythmically different unfamiliar language, whereas no such language differences were present in the gamma band, where no enhanced power was found for speech in any of the languages tested. These results are also consistent with the hypothesis that slower oscillations are fine-tuned already at birth (Nallet and Gervain, 2021; see also Menn et al., 2023).
39.4 General Discussion
To acquire their native language, infants need to develop sensitivity to the phonological properties and contrasts characteristic of that language. These skills are a necessary first step on the path to word learning and grammar acquisition. Several models have been proposed to account for the relatively fast and seemingly effortless accomplishment of this challenging task. One such model, the native language model (Kuhl, 2004; Kuhl et al., 1992), focuses on developmental changes in early infant auditory perception and aims to incorporate social and communicative factors in recent versions of the model (Kuhl et al., 2008). The prenatal prosodic-shaping model offers a novel perspective on perceptual narrowing from the point of view of recent advances in the neurobiology of language and its alignment with neural structures and mechanisms that support its development in human infants. Development of synchronization of neural oscillations to speech at different levels of granularity offers a possible format for such an account.
Seen together, the current literature indicates that infants from four to 11 months of age reliably track speech in both the delta (Attaheri et al., 2022; Choisdealbha et al., 2022; Menn et al., 2022) and theta (Attaheri et al., 2022; Kalashnikova et al., 2018; Menn et al., 2022) frequency bands. Syllabic tracking, as reflected in the theta band, shows a developmental increase between four and 11 months (Attaheri et al., 2022). However, in terms of the phase of the signal, tracking appears to remain relatively stable from birth until around six months of age for both familiar and unfamiliar languages (Ortiz Barajas et al., 2021). Amplitude tracking, on the other hand, is universal at birth, but only manifests for unfamiliar languages at six months of age (Ortiz Barajas et al., 2021). Between four and 11 months, prosodic tracking in the delta band has been found to remain greater than syllabic tracking throughout these ages. In addition, IDS may facilitate neural tracking (Kalashnikova et al., 2018; Menn et al., 2022), an effect primarily driven by prosodic stress as reflected in stronger delta-band coherence for IDS compared to ADS (Menn et al., 2022).
In terms of the prenatal prosodic-shaping model, the low-frequency parts of the speech signal that are available prenatally are successfully tracked by the infant brain during the first year of life, although some developmental changes can also be observed. Based on extant research, both syllabic tracking and tracking of larger prosodic units are present from birth. Interestingly, in terms of nested oscillations, the phases of both delta- and theta-band oscillations do act as carriers of amplitude in higher-frequency bands, especially in the gamma band (Attaheri et al., 2022). While the currently available evidence supports the prenatal prosodic-shaping model, the number of available studies is still very small. In particular, few studies have tested newborn infants and fetuses. Studying newborns and fetuses poses challenges that do not arise with adults. Specifically with EEG, newborns’ data tend to have a lower signal-to-noise ratio, which limits the analysis methods that can be used. During the first months of life, electrophysiological activity is less structured than in later development and in adulthood, so the evidence is often more indirect. Applying the same oscillatory models as with adults and older infants can therefore be somewhat challenging.
Taken together, and considering future empirical work in the field, the prenatal prosodic-shaping model has the potential to explain how available brain mechanisms interface with the infant environment, both at prenatal and neonatal/postnatal stages. As such, this model offers testable hypotheses concerning the hierarchical nesting of neural oscillations in concert with increased complexity of the acquired language skills, from word learning to advanced grammar.
39.5 Acknowledgements
This work was supported by the ERC Consolidator Grant “BabyRhythm” (no. 773202) to Judit Gervain and by a FARE grant (no. R204MPRHKE) from the Italian Ministry for Universities and Research to Judit Gervain.
Summary
The current chapter reviews recent findings on infants’ neural tracking of speech and relates these findings to subsequent grammar acquisition. Specifically, we discuss the potential role of prenatal exposure to speech for speech-tracking abilities at birth and its potential as an entry point into language structure in early language acquisition, in light of the prenatal prosodic-shaping model.
Implications
There is a gap in the literature when it comes to newborns’ tracking of speech in the gamma band, corresponding to (sub-)phonemic elements of speech. Although several recent results are consistent with the predictions of the prenatal prosodic-shaping model, the model can be tested more directly by addressing this gap.
Gains
Understanding the neural mechanisms that support grammar development is highly relevant to psycholinguistics and neurolinguistics, as much remains unknown. The model presented in this chapter represents a potential framework for interpreting the growing body of research on the role of neural oscillations in early speech processing.