25.1 Introduction
Recent research has examined relationships between language processing and musical rhythm processing. The ability to synchronise to an external rhythm is linked to phonological skills in preschool children, while the ability to discriminate rhythmic structure correlates with morphosyntactic production and comprehension in school-age children (Gordon et al., Reference Gordon, Shivers and Wieland2015; Lee et al., Reference Lee, Ahn, Holt and Schellenberg2020; Woodruff Carr et al., Reference Woodruff Carr, White-Schwoch, Tierney, Strait and Kraus2014). Moreover, children with developmental language disorders show impaired performance in metronome synchronisation, rhythm discrimination, and rhythm reproduction (Corriveau and Goswami, Reference Corriveau and Goswami2009; Flaugnacco et al., Reference Flaugnacco, Lopez and Terribili2014). Among typical adults, percussionists with exceptional rhythmic skills outperform vocalists and non-musicians in sentence-in-noise perception (Slater and Kraus, Reference Slater and Kraus2016; Yates et al., Reference Yates, Moore, Amitay and Barry2019). At the brain level, overlapping regions involved in rhythm and syntax processing have been identified in a recent activation likelihood estimation (ALE) meta-analysis (Heard and Lee, Reference Heard and Lee2020). Experimental data further show comparable event-related potentials evoked by rhythmic and syntactic violations, both of which are altered in patients with lesions to the basal ganglia or with neurodegenerative diseases affecting it (Kotz and Schmidt-Kassow, Reference Kotz and Schmidt-Kassow2015; Schmidt-Kassow and Kotz, Reference Schmidt-Kassow and Kotz2009). While there is convincing evidence of a relationship between musical rhythm and language processing, the overlap between the music and language processing systems remains hotly debated (Chen et al., Reference Chen, Affourtit and Ryskin2023).
Numerous studies have shown that rhythmic stimulation can also influence performance in linguistic tasks. The key manipulations in these experiments include 1) imposing a metrical structure on the speech stimuli, 2) creating an alignment or lack thereof between the metrical structures of musical and linguistic stimuli, or 3) manipulating the structural regularity of the musical stimuli independently of the linguistic stimuli. We will provide a review and novel characterisation of these effects later in this chapter.
Several theories account for the behavioural and neural links reported between rhythm and language processing. Generally, these accounts focus on the precise localisation of rhythm processing and its interactions with language processing networks, the precise nature of internal oscillations, and their role in rhythm and language processing (Fiveash et al., Reference Fiveash, Bedoin, Gordon and Tillmann2021; Kotz et al., Reference Kotz, Stockert and Schwartze2014; Ladányi et al., Reference Ladányi, Persici, Fiveash, Tillmann and Gordon2020; Large et al., Reference Large, Herrera and Velasco2015; Large and Jones, Reference Large and Jones1999; Patel and Iversen, Reference Patel and Iversen2014). One line of research proposed that the processing of hierarchical structures might be a crucial overlap between rhythm and language processing (Fitch and Martins, Reference Fitch and Martins2014). Various frameworks have characterised music and language based on their hierarchical organisation (Lerdahl and Jackendoff, Reference Lerdahl and Jackendoff1983; Patel, Reference Patel2008, Reference Patel2011). Here, we will explain to what extent musical rhythm and linguistic syntax (and prosody) can be considered hierarchical structures, highlighting empirical evidence supporting hierarchical structure building in both domains. Next, we will review the effects of rhythmic stimulation on language processing and provide a novel characterisation of these effects. Finally, we will present some theories proposed to account for these rhythmic stimulation effects and evaluate to what extent hierarchical structure processing as a shared cognitive component between rhythm and language processing may provide further insight into the precise nature of rhythmic stimulation effects.
25.2 Hierarchical Structure in Rhythm and Language
Language and music are both considered hierarchical sequences, that is, ordered arrangements of unique elements that can be represented in a tree-like structure in which multiple levels of lower-level units and groups of units are combined into higher-level constituents (Fitch and Martins, Reference Fitch and Martins2014; Patel, Reference Patel2008). Consequently, several recent studies have suggested that a domain-general cognitive system is responsible for coding hierarchical structure (Martins et al., Reference Martins, Gingras, Puig-Waldmueller and Fitch2017, Reference Martins, Fischmeister and Gingras2020). Hierarchies appear to be omnipresent in human cognition and culture, as we are able to generate hierarchies in a wide range of domains including metrical, linguistic, visual, action, and social hierarchies (Altmann et al., Reference Altmann, Bülthoff and Kourtzi2003; Fitch and Martins, Reference Fitch and Martins2014; Hauser et al., Reference Hauser, Chomsky and Fitch2010; Jackendoff, Reference Jackendoff2009; Martins et al., Reference Martins, Gingras, Puig-Waldmueller and Fitch2017; Schmid et al., Reference Schmid, Saddy and Franck2023; Tecumseh Fitch and Friederici, Reference Tecumseh Fitch and Friederici2012; Vender et al., Reference Vender, Krivochen and Compostella2020; Zink et al., Reference Zink, Tong and Chen2008).
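The tree-like representation described above can be sketched with a minimal data structure. The node labels and the two toy examples below (a 4/4 bar and a simple sentence) are illustrative assumptions, not taken from any of the cited frameworks; the point is only that the same recursive form covers both domains:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A constituent in a hierarchical sequence: a label plus its children.

    Leaves (terminal units such as beats or words) have no children."""
    label: str
    children: list = field(default_factory=list)

    def depth(self):
        # A leaf has depth 1; a constituent is one level above its deepest child.
        return 1 + max((c.depth() for c in self.children), default=0)

    def leaves(self):
        # Terminal units in left-to-right order.
        if not self.children:
            return [self.label]
        return [leaf for c in self.children for leaf in c.leaves()]

# Metrical hierarchy: a 4/4 bar grouping two half-note spans of two beats each.
bar = Node("bar", [
    Node("half", [Node("beat1"), Node("beat2")]),
    Node("half", [Node("beat3"), Node("beat4")]),
])

# Syntactic hierarchy: S -> NP VP, with words as terminal units.
sentence = Node("S", [
    Node("NP", [Node("the"), Node("dog")]),
    Node("VP", [Node("chased"), Node("cats")]),
])

print(bar.leaves())       # ['beat1', 'beat2', 'beat3', 'beat4']
print(sentence.leaves())  # ['the', 'dog', 'chased', 'cats']
```

Both toy trees have the same depth and the same recursive shape, which is the structural parallel the domain-general accounts build on.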
Recent evidence from musical rhythm and language processing suggests that human participants show a tendency to infer hierarchical structures even if overt cues to such structures are not acoustically present in the input (Criscuolo et al., Reference Criscuolo, Schwartze, Henry, Obermeier and Kotz2023; Ding et al., Reference Ding, Melloni, Zhang, Tian and Poeppel2016; Kaufeld et al., Reference Kaufeld, Bosker, Alday, Meyer and Martin2020; Large et al., Reference Large, Herrera and Velasco2015; Nozaradan et al., Reference Nozaradan, Mouraux and Jonas2017; Poudrier, Reference Poudrier2020; Schmidt-Kassow and Kotz, Reference Schmidt-Kassow and Kotz2009; Tal et al., Reference Tal, Large and Rabinovitch2017).
In rhythm, meter refers to the hierarchical structuring of a series of events in time into higher-order groupings (Kotz et al., Reference Kotz, Ravignani and Fitch2018). This metrical structure is built from temporal units and enables the system to make precise temporal predictions as to when the next event, such as a tone onset or beat, is expected to occur. In music, subjective rhythmisation experiments showed that listeners tend to perceive equitone isochronous sequences in groups of elements (Criscuolo et al., Reference Criscuolo, Schwartze, Henry, Obermeier and Kotz2023; Poudrier, Reference Poudrier2020). Despite the stimulus itself being a metronome, participants report that the first unit of each group is more salient than the rest or that a pause between the last element of a group and the first element of a new group is longer. Rhythmic grouping biases have also been described in the iambic–trochaic law, according to which sounds varying in intensity are more likely to be perceived as trochees (strong–weak), while those varying in duration are more often perceived as iambs (weak–strong, Bolton, Reference Bolton1984; Hayes, Reference Hayes1995). These grouping biases have been proposed to play a crucial role in language acquisition (Langus et al., Reference Langus, Mehler and Nespor2017). Recent studies have shown that if a metronome of identical tones is presented at a sufficient rate (e.g., 5 Hz), a large number of participants perceive groups of tones rather than an equitone isochronous sequence, and that these groupings are reflected in neural responses to isochronous tone sequences (Criscuolo et al., Reference Criscuolo, Schwartze, Henry, Obermeier and Kotz2023; Poudrier, Reference Poudrier2020). Results from studies focusing on ‘missing pulse’ phenomena provide further behavioural and neural evidence for the tendency to build a metrical hierarchy even if it is not acoustically present in the input. 
Participants listening to syncopated rhythms reportedly perceive a pulse frequency that is absent from the acoustic stimulus. In one key experiment that investigated neural entrainment to metrical groupings in musical rhythm, the authors constructed 11 rhythms, ranging from isochronous to highly complex rhythms that did not contain any spectral energy at the pulse frequency. Even in the most complex rhythms containing no overt pulse frequency, participants tapped in phase or anti-phase to a constant pulse frequency that was not present in the acoustic signal (Large et al., Reference Large, Herrera and Velasco2015). Similar observations have also been made at the brain level. In a recent study, participants passively listened to rhythms containing no spectral energy at the pulse frequency while their brain responses were recorded with magnetoencephalography (MEG). Neural oscillations corresponding to the acoustically missing pulse frequency were identified and taken as evidence for a non-stimulus-driven, internally generated neural representation of the pulse frequency, a crucial component of metrical hierarchy (Tal et al., Reference Tal, Large and Rabinovitch2017).
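The 'missing pulse' construction can be illustrated numerically. The hypothetical 16-slot pattern below is our own toy example, not one of the 11 rhythms used in the cited studies: every event is paired with an event half a pulse cycle later, so the two contributions cancel at the pulse frequency and the discrete Fourier transform of the event sequence has no energy in the pulse bin, even though the pattern contains plenty of rhythmic energy elsewhere:

```python
import numpy as np

# Grid of 16 slots per pattern; the (missing) pulse has a period of 4 slots,
# i.e. 4 pulse cycles per pattern, so it lives in FFT bin 4.
N, PULSE_BIN = 16, 4

# Toy syncopated pattern: each event at slot n is paired with an event at
# n + 2, half a pulse cycle later, so their complex contributions at the
# pulse frequency differ in phase by pi and cancel exactly.
events = [0, 2, 5, 7, 9, 11, 12, 14]
pattern = np.zeros(N)
pattern[events] = 1.0

spectrum = np.abs(np.fft.fft(pattern))
print(f"energy at pulse frequency: {spectrum[PULSE_BIN]:.3f}")
print(f"energy at other bins 1..7: {np.round(spectrum[1:N//2], 2)}")
```

Tapping or neural responses at bin 4 in such a stimulus therefore cannot be stimulus-driven and must reflect an internally generated pulse, which is the logic behind the missing-pulse evidence.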
In linguistic syntax, words group into phrases that combine into higher-order phrases, clauses, and sentences. As syntactic structure is based on formal rules (Chomsky, Reference Chomsky1965), the parser generates discrete rule-based predictions (such as expecting a noun after a determiner). In prosody, syllables combine into feet that group into phonological words, phonological phrases, intonation phrases, and utterances (Nespor and Vogel, Reference Nespor and Vogel1986; Selkirk, Reference Selkirk1984). While the precise nature of hierarchy in prosody and syntax is not the same, a range of evidence suggests that they can both be characterised as hierarchical sequences (for a detailed overview of hierarchy in prosody and syntax, please refer to Kotz et al., Reference Kotz, Schwartze and Schmidt-Kassow2009; for an ongoing debate on linguistic syntax and neural oscillations, please see Lo et al., Reference Lo, Henke, Martorell and Meyer2023 and Kazanina and Tavano, Reference Kazanina and Tavano2023). It has been proposed that prosodic regularities are actively exploited during speech perception and production (Cutler, Reference Cutler, Sundberg, Nord and Carlson1991, Reference Cutler1994). Empirical evidence shows that the expectation of these regularities can even drive typical adults to erroneously produce lexical words by inserting prosodic boundaries before strong syllables and grammatical words by deleting boundaries before weak syllables (Cutler and Butterfield, Reference Cutler and Butterfield1992). More recent neurolinguistic findings show results remarkably similar to those of missing pulse experiments. In one such experiment, participants listened to speech stimuli in Mandarin in which syllable onsets were clearly marked but which contained no overt cues to phrase or sentence boundaries.
Materials were constructed such that every sentence (1,000 ms) was built up of one noun phrase (500 ms) and one verb phrase (500 ms), each containing two monosyllabic words (250 ms each). Sentences were presented auditorily in immediate succession with no pauses, such that the input sequence contained clear acoustic cues to syllables but no acoustic cues to phrase and sentence boundaries. Mandarin-speaking participants showed neural oscillations corresponding to the frequencies of syllables (4 Hz), phrases (2 Hz), and sentences (1 Hz), while participants with no knowledge of Mandarin only showed oscillations at the acoustically present syllable frequency (Ding et al., Reference Ding, Melloni, Zhang, Tian and Poeppel2016). A recent criticism of this study noted that syntactic phrase and sentence boundaries coincided with (acoustically missing) prosodic boundaries in these materials (Glushko et al., Reference Glushko, Poeppel and Steinhauer2022). Replicating the original study with prosodic and syntactic boundaries dissociated, the authors suggest that these oscillations are more likely to reflect (covert) prosodic boundaries or an interaction between prosodic and syntactic boundaries than purely syntactic boundaries (Glushko et al., Reference Glushko, Poeppel and Steinhauer2022). Nevertheless, these studies provide evidence that neural entrainment to spoken language goes beyond stimulus acoustics and reflects top-down processes of internal hierarchical structure building in the absence of direct structural information in the input (Meyer et al., Reference Meyer, Sun and Martin2019).
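The frequency-tagging logic of this design can be sketched with idealised impulse trains. The sampling rate, durations, and the 'internal' boundary-tagging signal below are simplifying assumptions of our own, not a model of the MEG/EEG data: the acoustics mark only syllable onsets (4 Hz), so spectral peaks at the phrase (2 Hz) and sentence (1 Hz) rates can only come from a response that marks the acoustically absent boundaries:

```python
import numpy as np

FS = 100   # samples per second (toy resolution)
DUR = 20   # seconds of stimulus: 20 one-second sentences

def impulse_train(rate_hz):
    # One unit impulse at each onset of a perfectly periodic event train.
    # FS is divisible by all rates used here, so onsets fall on samples.
    x = np.zeros(FS * DUR)
    x[::FS // rate_hz] = 1.0
    return x

# Acoustics: only syllable onsets (every 250 ms) are marked in the signal.
acoustic = impulse_train(4)

# Hypothetical internal response: the listener additionally "tags" each
# phrase onset (every 500 ms) and sentence onset (every 1,000 ms).
internal = impulse_train(4) + impulse_train(2) + impulse_train(1)

def amp_at(x, hz):
    # FFT bin spacing is 1/DUR Hz, so frequency hz sits in bin hz * DUR.
    return np.abs(np.fft.rfft(x))[hz * DUR]

for hz in (1, 2, 4):
    print(hz, "Hz  acoustic:", round(amp_at(acoustic, hz), 1),
          " internal:", round(amp_at(internal, hz), 1))
```

A periodic 4 Hz train has spectral energy only at 4 Hz and its harmonics, so the acoustic signal shows nothing at 1 or 2 Hz; the internal signal shows peaks at all three rates, mirroring the Mandarin-speaking listeners' neural spectra.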
25.3 Reviewing and Categorising Rhythmic Stimulation Effects
Several studies have shown that rhythmic stimulation, that is, exposure to a rhythm prior to or during a linguistic task, can influence linguistic performance. All of the rhythmic stimulation studies reviewed here show short-term effects on subsequent language processing. Most of these studies are based on one of three manipulations: imposing a regular metrical structure on speech stimuli, creating an alignment between the rhythmicity of music and the linguistic stimulus (rhythmic cueing), or manipulating the structural regularity of the musical stimulus irrespective of the linguistic stimulus (rhythmic priming). The following paragraphs describe these paradigms in more detail and summarise the evidence from each.
In studies imposing a rhythm on speech stimuli, experimental sentences are created by manipulating the alternation of strong and weak syllables to create either a highly regular trochaic (strong–weak) or a less regular pattern. To our knowledge, all of these experiments have been conducted in German. These studies report that sentences with a regular metrical pattern are processed more easily than less regular sentences. In typical speakers, the event-related potentials usually evoked by unexpected semantic (N400) and syntactic (P600) events show a lower amplitude in metrically regular than in irregular sentences (Kotz and Schmidt-Kassow, Reference Kotz and Schmidt-Kassow2015; Roncaglia-Denissen et al., Reference Roncaglia-Denissen, Schmidt-Kassow and Kotz2013, Reference Roncaglia-Denissen, Schmidt-Kassow, Heine and Kotz2015; Rothermich et al., Reference Rothermich, Schmidt-Kassow and Kotz2012; Rothermich and Kotz, Reference Rothermich and Kotz2013). This finding is interpreted as a facilitation effect stemming from the interface between metric and syntactic expectancies. Highly regular sentences are also highly predictable and provide a clear metrical grid that reliably cues when the next strong syllable is expected to arrive. These temporal predictions of when a next event will occur could facilitate syntactic expectations of what will occur. Unlike typical adults, speakers with neurodegenerative diseases or focal lesions affecting the basal ganglia generally show no P600 in response to syntactic expectancy violations. However, when these violations are embedded in metrically regular sentences, the P600 is restored. This finding is taken as evidence that the syntactic deficit in these patients stems from an underlying temporal processing deficit, which can be compensated by the regular metrical structure of the target sentences (Kotz and Gunter, Reference Kotz and Gunter2015; Kotz and Schmidt-Kassow, Reference Kotz and Schmidt-Kassow2015).
Unlike these prior studies, rhythmic cueing experiments manipulate both musical and speech rhythm. Here, rhythmic stimuli precede the linguistic stimulus, and the key variable is the alignment between the stress pattern of the musical rhythm preceding an utterance and that of the utterance itself. Typical adult speakers show faster detection of a target phoneme in speech stimuli, and children with cochlear implants show more accurate phonological reproduction of the target speech stimulus, when it is preceded by a cue with a matching rather than a mismatching stress pattern (Cason et al., Reference Cason, Astésano and Schön2015a, Reference Cason, Hidalgo, Isoard, Roman and Schön2015b; Cason and Schön, Reference Cason and Schön2012). These findings are usually interpreted within the framework of the dynamic attending theory (DAT, Large and Jones, Reference Large and Jones1999). Originally proposed to explain how listeners process systematic changes in events, DAT proposes that attention is not distributed evenly over time but fluctuates in an oscillatory manner. Internal oscillators can entrain to the rhythmicity of an external stimulus such that attentional peaks correspond to when predictable stimuli are expected. In rhythmic cueing experiments, if the stress pattern of the preceding musical rhythm matches that of the target utterance, some of the attentional oscillations might entrain to this stress pattern and orient attention to stressed syllables in the speech stimulus. This results in better performance when the cueing and target rhythms match.
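The core intuition of DAT can be reduced to a toy computation: if attention oscillates with the period established by a matching cue, a target falling on a predicted attentional peak receives maximal attentional energy, while a target falling between peaks does not. The cosine pulse shape and the 600-ms period below are illustrative choices of ours, not parameters taken from Large and Jones (1999):

```python
import math

def attention(t, period):
    # Toy attentional energy: an oscillation peaking at expected onsets,
    # a crude stand-in for the attentional rhythm of dynamic attending.
    return 0.5 * (1 + math.cos(2 * math.pi * t / period))

# A cue establishes a 600-ms attentional period, with peaks predicted
# at every multiple of 600 ms after the cue.
PERIOD = 0.6

on_beat = attention(1.8, PERIOD)    # target at 1,800 ms: on a predicted peak
off_beat = attention(1.5, PERIOD)   # target at 1,500 ms: between two peaks

print(f"on-beat attention:  {on_beat:.2f}")   # 1.00
print(f"off-beat attention: {off_beat:.2f}")  # 0.00
```

In this sketch, a matching cue amounts to setting the oscillation's period and phase so that target events coincide with attentional peaks; a mismatching cue leaves targets in the attentional troughs, yielding the cueing benefit described above.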
Rhythmic priming studies manipulate the musical rhythm preceding a linguistic stimulus irrespective of the rhythm of the speech stimulus itself. Here, the key variable is the regularity of the musical prime, defined as the ease with which the prime allows listeners to extract its underlying metrical structure. Results show that typical children and adults, as well as children with developmental language disorder and developmental dyslexia, show improved grammaticality judgement after a regular prime compared with an irregular prime or a rhythmically neutral or silent baseline (Bedoin et al., Reference Bedoin, Brisseau, Molinier, Roch and Tillmann2016; Canette et al., Reference Canette, Bedoin and Lalitte2020a, Reference Canette, Fiveash and Krzonowski2020b, Reference Canette, Lalitte and Bedoin2020c; Chern et al., Reference Chern, Tillmann, Vaughan and Gordon2018; Fiveash et al., Reference Fiveash, Bedoin, Lalitte and Tillmann2020; György et al., Reference György, Saddy, Kotz and Franck2024; Ladányi et al., Reference Ladányi, Lukács and Gervain2021; Przybylski et al., Reference Przybylski, Bedoin and Krifi-Papoz2013). The rhythmic priming effect may be specific to language processing, as it does not affect mathematics or visuo-spatial control tasks in English children, semantic fluency in French children, or picture naming and Stroop tasks in Hungarian children (Chern et al., Reference Chern, Tillmann, Vaughan and Gordon2018; Ladányi et al., Reference Ladányi, Lukács and Gervain2021). Rhythmic priming studies are frequently interpreted in the framework of DAT. Here, if the metrical structure of the external stimulus is regular and allows for reliable predictions of when the next key event will occur, listeners’ attention will focus on the target event, facilitating sequencing and predictive mechanisms.
Accordingly, as syntactic processing also relies on sequencing and (syntactic) predictive mechanisms, it is argued that entrainment to rhythmically regular musical stimuli could benefit subsequent syntactic processing (Bedoin et al., Reference Bedoin, Brisseau, Molinier, Roch and Tillmann2016; Canette et al., Reference Canette, Bedoin and Lalitte2020a; Przybylski et al., Reference Przybylski, Bedoin and Krifi-Papoz2013).
Given the growing number and variety of rhythmic stimulation studies, we suggest that it may be relevant to refine our characterisation of rhythmic stimulation effects on language processing. These effects differ in two key dimensions: the temporal relationship between the rhythmic stimulation and the test stimulus (i.e., whether linguistic stimuli are presented in synchrony with or shortly after rhythmic stimulation), and the language component affected by the rhythmic stimulation (i.e., whether the linguistic task taps into a more perceptual or more abstract component of language processing).
Temporality varies such that the two stimuli can be presented either simultaneously or after a short delay. In synchronous presentation, events of the rhythmic stimulus fall at particular time points of the linguistic test stimulus, which thereby benefit from enhanced (attentional) sensitivity. Such rhythmic manipulations facilitate the processing of a target stimulus when it falls on a beat, on a strong beat, or on a temporally predicted event due to the temporal structure of the rhythmic stimulus. In the case of a short delay (cueing or priming), the test stimuli are presented immediately or very shortly after rhythmic stimulation. In rhythmic cueing experiments, facilitatory effects can be observed when there is a match (versus mismatch) between the metrical structure of the rhythmic stimulation and the prosodic structure of the test stimuli. In other words, the rhythmic and test stimuli are structurally aligned. In rhythmic priming experiments, facilitation effects are observed following the presentation of a musical prime with a regular (versus irregular or neutral) metrical structure even though there is no direct alignment between the metrical structure of the musical prime and the prosodic structure of the linguistic stimuli.
It seems important to distinguish between tasks that involve the processing of linguistic units grounded in perception (e.g., phoneme detection), on which the rhythmic stimulation may have a direct effect, and the processing of more abstract aspects of language that have no direct anchoring in perception (e.g., grammaticality judgement).
The studies reported in the literature appear to cover nearly all possible combinations of the temporality of the stimuli and the nature of the linguistic task: synchronised effects have been found for grammatical and semantic tasks (Kotz and Schmidt-Kassow, Reference Kotz and Schmidt-Kassow2015; Roncaglia-Denissen et al., Reference Roncaglia-Denissen, Schmidt-Kassow and Kotz2013, Reference Roncaglia-Denissen, Schmidt-Kassow, Heine and Kotz2015; Rothermich et al., Reference Rothermich, Schmidt-Kassow and Kotz2012; Rothermich and Kotz, Reference Rothermich and Kotz2013), while short-delay effects were found in the domains of phonology (Cason et al., Reference Cason, Astésano and Schön2015a, Reference Cason, Hidalgo, Isoard, Roman and Schön2015b; Cason and Schön, Reference Cason and Schön2012) and syntax (Bedoin et al., Reference Bedoin, Brisseau, Molinier, Roch and Tillmann2016; Canette et al., Reference Canette, Bedoin and Lalitte2020a, Reference Canette, Fiveash and Krzonowski2020b, Reference Canette, Lalitte and Bedoin2020c; Chern et al., Reference Chern, Tillmann, Vaughan and Gordon2018; Fiveash et al., Reference Fiveash, Bedoin, Lalitte and Tillmann2020; György et al., Reference György, Saddy, Kotz and Franck2024; Ladányi et al., Reference Ladányi, Lukács and Gervain2021; Przybylski et al., Reference Przybylski, Bedoin and Krifi-Papoz2013). However, as discussed above, two kinds of short-delay studies can be distinguished based on a direct structural alignment or lack thereof between the rhythmic and linguistic stimuli. 
To our knowledge, short-delay phonological effects were only found when the alignment between the rhythmic and test stimuli was manipulated (Cason et al., Reference Cason, Astésano and Schön2015a, Reference Cason, Hidalgo, Isoard, Roman and Schön2015b; Cason and Schön, Reference Cason and Schön2012), while short-delay syntactic effects were only found in experiments where there was no direct alignment between the two structures (Bedoin et al., Reference Bedoin, Brisseau, Molinier, Roch and Tillmann2016; Canette et al., Reference Canette, Bedoin and Lalitte2020a, Reference Canette, Fiveash and Krzonowski2020b, Reference Canette, Lalitte and Bedoin2020c; Chern et al., Reference Chern, Tillmann, Vaughan and Gordon2018; Fiveash et al., Reference Fiveash, Bedoin, Lalitte and Tillmann2020; György et al., Reference György, Saddy, Kotz and Franck2024; Ladányi et al., Reference Ladányi, Lukács and Gervain2021; Przybylski et al., Reference Przybylski, Bedoin and Krifi-Papoz2013).
25.4 Accounting for Rhythmic Stimulation Effects: Insights from Hierarchical Structure Processing
Numerous theories aim to account for the behavioural and neural links between rhythm and language processing. Some of these accounts focus more on the precise localisation of rhythm processing and its interaction with language processing networks (Kotz et al., Reference Kotz, Schwartze and Schmidt-Kassow2009, Reference Kotz, Stockert and Schwartze2014, Reference Kotz, Brown and Schwartze2016; Kotz and Schwartze, Reference Kotz and Schwartze2010; Patel, Reference Patel2011, Reference Patel2012). Other frameworks concentrate more on the precise nature of internal oscillations and their role in rhythm and language processing (Giraud and Poeppel, Reference Giraud and Poeppel2012; Goswami, Reference Goswami2011; please also refer to Chapters 5 and 35). The two approaches are not mutually exclusive as several accounts touch on both key brain regions and the role of neural oscillations (Fujii and Wan, Reference Fujii and Wan2014; Large and Snyder, Reference Large and Snyder2009; Tierney and Kraus, Reference Tierney and Kraus2014). The recently proposed Processing Rhythm in Speech and Music (PRISM) framework (Fiveash et al., Reference Fiveash, Bedoin, Gordon and Tillmann2021; please also see Chapter 28) has attempted to synthesise several accounts proposing shared mechanisms between musical rhythm and speech processing and has identified three key shared components: 1) the precise encoding of low-level information in the acoustic signal; 2) internal oscillations that entrain to external rhythms and play an important role in structural processing, temporal integration, and generating predictions; and 3) coupling between sensory and motor regions involved in music and speech perception (see Fiveash et al., Reference Fiveash, Bedoin, Gordon and Tillmann2021, and Ladányi et al., Reference Ladányi, Persici, Fiveash, Tillmann and Gordon2020, for detailed reviews). In the following paragraphs, we will focus on explanations proposed for the above-presented rhythmic stimulation effects.
Referring back to our previous categorisation of rhythmic stimulation effects, we will review the proposed accounts and evaluate to what extent hierarchical structure processing may provide further insight into the rhythmic priming effect.
To account for cases where the language task is presented synchronously with rhythmic stimulation, several studies have relied on DAT (Large and Jones, Reference Large and Jones1999). In these accounts, rhythm could serve as a temporal framework for the language task, as the key manipulation is whether the target stimulus falls on a temporally predicted event. A framework in which entrainment to the regular temporal structure of the rhythmic stimulus enhances attentional focus on the target stimulus provides an adequate explanation. Regardless of whether the task directs focus to perceptual or abstract components of language processing, providing a directly relevant language-external or language-internal cue to the temporal location of the target stimulus should lead to faster processing. The predictive mechanisms at play are proposed to be subserved by subcortico-cortical networks involving the basal ganglia, the cerebellum, as well as auditory and motor regions (Kotz et al., Reference Kotz, Stockert and Schwartze2014, Reference Kotz, Brown and Schwartze2016; Kotz and Schwartze, Reference Kotz and Schwartze2010). Another key element in some proposed accounts is the interface between syntactic and temporal predictions. Specifically, discrete predictions (e.g., predicting a noun after a determiner) are proposed to interface with temporal predictions based on a specific metrical structure (allowing the parser to predict the beat of a rhythm or when the next strong beat or syllable is expected). However, the precise nature of this interaction is yet to be established.
In cases where the language task is presented immediately or shortly after a rhythmic stimulation, DAT is often cited as the key account for the effects of rhythmic stimulation. Such an explanation appears rather intuitive in rhythmic cueing studies, where the manipulation of interest is the alignment between the metrical structures of the rhythmic and linguistic stimuli, and the language task is often grounded in perception. If the metrical structures of the rhythmic and linguistic stimuli align, internal oscillations, having entrained to the phase or period properties of the rhythmic stimulus, will have also entrained to those of the language stimulus.
A purely DAT-based account may appear less straightforward in rhythmic priming studies, where the key manipulation is the metrical regularity of the prime rhythm with no direct relation to that of the language stimulus, and the language task is based on a more abstract component of language processing. The explanation in these studies is that the temporal cues provided by a regular rhythm enable internal oscillators to more easily entrain to the input and generate temporal expectations based on its metrical structure. As speech and language processing also rely on entrainment and temporal attention, entrained internal oscillators may benefit subsequent language processing (Bedoin et al., Reference Bedoin, Brisseau, Molinier, Roch and Tillmann2016; Canette et al., Reference Canette, Bedoin and Lalitte2020a; Przybylski et al., Reference Przybylski, Bedoin and Krifi-Papoz2013). As outlined above, there appears to be no direct correspondence between the metrical structure of the prime rhythm and that of the language stimuli used in these studies, and the language task relies on a more abstract (syntactic) component of language processing. Therefore, precisely how entrainment to a regular rhythmic prime benefits grammatical processing in a purely DAT-based framework is not fully understood. Furthermore, rhythmic priming effects identified in populations with typically developing speech/language processing and limited attentional capacities (typical infants) also suggest that a purely attention-based account cannot fully explain the priming effect (please also refer to Chapter 18).
A recent account proposes hierarchical cognitive control as a shared mechanism that might better explain the rhythmic priming effect (Asano et al., Reference Asano, Boeckx and Seifert2021). This explanation is two-fold. First, a regular rhythm provides an easily extractable metrical structure whose representation can be built by highly automatic processes. During regular rhythmic priming, fewer of the resources responsible for hierarchical control are therefore needed for metrical structure building, leaving more available for subsequent syntax processing; the opposite holds when the parser is exposed to an irregular rhythm (Asano et al., Reference Asano, Boeckx and Seifert2021). Second, the highly predictable structure of the regular rhythmic prime could also actively support syntactic structure building by providing a predictable metrical grid, thus improving syntactic performance (Asano et al., Reference Asano, Boeckx and Seifert2021; Kotz et al., Reference Kotz, Schwartze and Schmidt-Kassow2009). As mentioned earlier, it has been proposed that rule-based syntactic predictions can interface with temporally predictable cues during language processing (Kotz et al., Reference Kotz, Schwartze and Schmidt-Kassow2009; Kotz and Schmidt-Kassow, Reference Kotz and Schmidt-Kassow2015; Kotz and Schwartze, Reference Kotz and Schwartze2010). One possibility is that rather than establishing a direct link between temporal and syntactic processing, the two systems communicate through a shared system responsible for the internal construction and representation of complex hierarchical structures (György et al., Reference György, Saddy, Kotz and Franck2024).
If this were the case, the regular temporal structure of a musical prime might activate this shared network, which, having already been engaged by the hierarchical (temporal) structure of the prime, would in turn facilitate the internal construction of the upcoming hierarchical (prosodic and syntactic) structure. A structure-based approach, which need not be mutually exclusive with other accounts such as dynamic attending or predictive coding, may better explain some of the rhythmic effects reviewed in this chapter. Specifically, observations where there is no clear alignment between the rhythmic stimulus and the target-language stimulus, and where the reported effects concern more abstract structural properties rather than more perceptual features of language processing, might find a more substantial explanation in an account based on hierarchical structure building in rhythm and syntax.
25.5 Conclusion
This chapter began with an examination of hierarchical structures where lower-level units are combined into higher-level constituents in language (syntax or prosody) and rhythm processing. It went on to review and present a novel characterisation of rhythmic stimulation effects in language processing based on the temporality of rhythmic stimulation and the nature of the language task used. This characterisation allowed us to isolate a number of short-delay rhythmic stimulation studies where the lack of alignment between the rhythmic and linguistic stimuli and the abstract nature of the linguistic component measured might necessitate a structure-based explanation. Hierarchical structure processing as a shared system between rhythm and language processing may present one, but not the only, potential account for these effects. Further research should seek to systematically evaluate the role of hierarchical structure processing and other proposed shared mechanisms between rhythm and language in the rhythmic stimulation effects reviewed here. More insight into the precise role of each of these mechanisms could pave the way for the development of rhythm-based, non-linguistic diagnostic and therapeutic tools for language disorders. However, before these potential tools could become valid and reliable in clinical practice, it is essential to acquire a more complete theoretical understanding of the rhythm–language interface.
Summary
This chapter aimed to examine rhythm and language processing from the perspective of hierarchical structure building. First, we presented hierarchical structural processing in both domains. Next, we reviewed and characterised rhythmic stimulation effects in language processing. Finally, we isolated effects that could benefit from a hierarchical structure-based explanation.
Implications
This chapter provides a novel classification of rhythmic stimulation effects, highlighting gaps in the literature where further research is needed. It also provides a possible reinterpretation of rhythmic priming effects, which further research could test empirically.
Gains
This chapter provides a novel classification of reported rhythmic stimulation effects, as well as possible interpretations. This might provide more insight into the precise mechanisms underlying these effects. A more complete understanding could benefit our theoretical knowledge of the rhythm–language interface and pave the way for concrete applications.