The missing link in Spanish heritage trill production

Abstract
While heritage language phonology has attracted a great deal of attention, little is known about the development of heritage phonological grammars. This study examines the production of the Spanish trill /r/ by school-aged (9-10 years) and adult heritage speakers. Results showed that the adult heritage speakers produced the trill in a more target-like manner than the child heritage speakers, although half of them diverged from non-heritage native baselines reported in other studies. Further analysis of the distribution of trill variants suggests that heritage Spanish trill development occurs in the order of single lingual constriction → frication → multiple lingual constrictions. However, instead of abandoning variants of early stages, some adult heritage speakers kept them in their trill inventories, demonstrating increased variability. Our findings indicate that 9- to 10-year-old heritage speakers are still in the process of developing heritage phonological grammars and even during adulthood their grammars may not reach stability.


Introduction
Heritage speakers are simultaneous or early sequential bilinguals who acquire a family language that differs from the societal language (Benmamoun, Montrul & Polinsky, 2013). Heritage speakers receive heritage language input mainly in colloquial registers and are rarely exposed to formal varieties. During the school years, heritage speakers come into frequent contact with the societal language and may shift their dominance from the heritage language to the societal language (Polinsky & Scontras, 2020; Stevens, 1992). Given these circumstances, heritage speakers are not a homogeneous group. Instead, they display rich heterogeneity in their linguistic proficiency, their use of the heritage language, and their attitudes toward it (Montrul, 2008; Valdés, 2014), and they diverge from monolingual speakers to varying degrees. Potential causes of heritage speakers' divergent grammars include an insufficient amount of heritage language input (Putnam, 2019; Putnam & Sánchez, 2013), exposure to input lacking target linguistic properties (Pires & Rothman, 2009), exposure to linguistic varieties other than the parents' varieties, and lack of access to monolingual forms (Lowther Pereira, 2015).
In order to move the field of heritage language acquisition forward, recent scholarship (Montrul, 2018; Polinsky, 2018) has urged researchers to examine the stages of heritage language development across the lifespan. This type of research would provide a better understanding of heritage speakers' divergent grammars. In this study, we adopt a developmental approach to examine heritage language phonology.
Divergence from monolingual grammars is often interpreted as incomplete acquisition or acquisition without mastery (Montrul, 2002; Montrul & Bowles, 2009). While incomplete acquisition is a possible outcome in heritage grammars, the term has been highly controversial in the literature (see Kupisch & Rothman, 2018, and Domínguez, Hicks & Slabakova, 2019, and commentaries). In an attempt to redefine the construct, Pires and Rothman (2009) proposed a distinction between "true incomplete acquisition" and "missing-input competence divergence". The former arises when the heritage language input contains the target linguistic properties, and the latter arises when the input lacks them. For instance, Mayr and Siddika (2018) compared the production of Sylheti stops across three generations of Bangladeshi immigrants in the United Kingdom and found that second-generation children produced the Sylheti voiced coronal stop /ɖ/ and the velar stop /gʱ/ with longer voice onset time (VOT) than their first-generation mothers, but in a more target-like manner than age-matched third-generation children. Aside from the amount of Sylheti input, these two groups of children differ in that the second-generation children received target-like input from their first-generation mothers (i.e., true incomplete acquisition), whereas the third-generation children were exposed to non-target-like stop productions from their second-generation mothers (i.e., missing-input competence divergence).
To understand heritage speakers' divergent grammars, it is important to establish an appropriate baseline for comparison (Polinsky, 2018). If the goal is to answer whether heritage speakers successfully acquired the language to which they were exposed, it would not be informative to compare heritage speakers only with homeland speakers, since the input that they receive may not be the same as the input of their monolingual peers. Heritage speakers' input most likely comes from their caregivers who are first-generation immigrants whose grammars sometimes show signs of L1 attrition after long-term residence away from the homeland. Additionally, heritage speakers are exposed to homeland varieties through interactions with relatives in the homeland or with recent immigrants from the homeland, as well as to other varieties in the speech community. This raises the question of how researchers can best characterize the sources of input that heritage speakers receive and, equally importantly, how to decide the baseline for comparison.
In order to account for the development of heritage speakers' divergent grammars, Polinsky and Scontras (2020) established three scenarios by comparing child heritage speakers (CHS), adult heritage speakers (AHS), and baseline first-generation immigrants (BASE). The first scenario occurs when a given linguistic property is present in the baseline, but it is used differently in both the adult and child heritage speakers (CHS = AHS ≠ BASE) (i.e., incomplete acquisition or divergent attainment). In the second scenario, child heritage speakers pattern like the baseline in their use of the property and adult heritage speakers differ from both groups (BASE = CHS ≠ AHS) (i.e., attrition during childhood). Lastly, adult heritage speakers and the baseline are alike, but child heritage speakers pattern differently from the two groups (CHS ≠ AHS = BASE) (i.e., reanalysis during adulthood).
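Read as a decision rule over pairwise group comparisons, the three scenarios can be sketched in Python as follows. This is an illustrative encoding only, not part of Polinsky and Scontras (2020): it assumes that each comparison has already been reduced to a yes/no judgment of whether two groups differ on the property in question, and the function name and labels are hypothetical.

```python
def classify_scenario(chs_eq_ahs: bool, ahs_eq_base: bool, chs_eq_base: bool) -> str:
    """Map pairwise group comparisons onto the three developmental scenarios.

    Each flag is True when the two groups do NOT differ on the property
    (hypothetical helper; CHS = child, AHS = adult heritage speakers,
    BASE = first-generation baseline).
    """
    if chs_eq_ahs and not ahs_eq_base:
        # CHS = AHS != BASE
        return "incomplete acquisition / divergent attainment"
    if chs_eq_base and not ahs_eq_base:
        # BASE = CHS != AHS
        return "attrition during childhood"
    if ahs_eq_base and not chs_eq_base:
        # CHS != AHS = BASE
        return "reanalysis during adulthood"
    # e.g., all three groups alike, or all three groups different
    return "pattern not covered by the three scenarios"
```

Note that a fully graded pattern such as CHS < AHS < BASE, in which all three groups differ from one another, falls outside the three scenarios under this encoding.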
A substantial body of research has examined bilingual grammars in early childhood, specifically those of bilingual children residing in a country where the home language does not coincide with the societal language (i.e., child heritage speakers) (Fabiano-Smith & Goldstein, 2010; Kehoe & Havy, 2018; Lleó & Cortés, 2013; Lleó, 2018a, 2018b). Based on Paradis and Genesee's (1996) framework of cross-linguistic interaction, many of these studies explained bilingual children's divergence from age-matched monolinguals in terms of acceleration, deceleration, and transfer. Acceleration refers to a faster rate of acquisition in bilinguals than in age-matched monolinguals. For example, Lleó, Kuchenbrandt, Kehoe and Trujillo (2003) found that Spanish-German bilinguals produced syllable codas at an earlier stage than Spanish monolinguals, possibly due to exposure to input with more codas. Deceleration refers to a slower rate of acquisition of a given linguistic property compared to age-matched monolinguals. For instance, Fabiano-Smith and Goldstein (2010) found that Spanish-English bilingual children between 3;0 and 4;0 years produced Spanish trills, fricatives, and glides with lower accuracy than Spanish monolinguals. Lastly, transfer is defined as the incorporation of a linguistic property of one language into the other. Kehoe, Lleó and Rakow (2004) found that a Spanish-German bilingual child (2;3-2;6) produced Spanish voiceless stops with longer VOTs than those described in monolingual grammars. Kehoe's (2015) review of child bilingual development documents two additional types of cross-linguistic influence: merging and deflecting. Merging arises when two phonological systems coalesce (see Kehoe & Lleó, 2017, for assimilation in stressed-to-unstressed vowel duration ratios). Deflecting occurs when two phonological systems maximize their contrasts (see Yang & Fox, 2017, for separation of L1 and L2 acoustic vowel spaces).
Most studies on heritage language phonology examine either bilingual language development in early childhood or adult heritage grammars. However, research is lacking on what happens when heritage speakers become more exposed to the societal language and experience a shift toward that language, that is, during the school-age period. Addressing this gap in the literature, identified as the "missing link" (Montrul, 2018), would shed light on heritage language phonological development.

Spanish alveolar trill /r/
The Spanish alveolar trill /r/ is canonically produced with 2-3 brief contacts between the tongue tip and the alveolar ridge (i.e., phonetic trill [r]) (Hualde, 2014) and occurs word-initially (e.g., rana 'frog'), word-internally between vowels (e.g., perro '(male) dog'), or after the alveolar consonants /n, l, s/ (e.g., alrededor 'around'). In word-medial intervocalic position, the trill contrasts phonemically with the tap /ɾ/, the other rhotic consonant of Spanish (e.g., perro '(male) dog' vs. pero 'but'). According to Solé (2002), syllable-initial intervocalic positions provide optimal articulatory conditions for successful trill production, such as constrained tongue positioning and configuration and the aerodynamic requirements for tongue-tip vibration. This may explain why the two rhotics contrast in this position. In other contexts, the two rhotics are mostly in complementary distribution.

Trill production by non-heritage native speakers
The production of the phonetic trill [r] requires complex coordination of the articulators and sufficient oropharyngeal pressure (Lewis, 2004; Solé, 2002). Due to its articulatory complexity, [r] is categorized as one of the latest-developing sounds (Acevedo, 1993; Bosch, 1983; Fabiano-Smith & Goldstein, 2010). Typically developing monolingual children often do not have full command of [r] (i.e., 90% accuracy) until the age of 7 (Bosch, 1983) and instead substitute it with other phones, such as laterals, taps, and /d/, or omit it (Acevedo, 1993; Bosch, 1983; Fabiano-Smith & Goldstein, 2010). Carballo and Mendoza (2000) examined the production of the Spanish trill /r/ by children (3-6.6 years) with different intelligibility levels and compared them to a control group of older children (7.0-9.6 years) who successfully produced the phonetic trill [r]. Overall, the high-intelligibility group produced /r/ with longer duration, more apertures and occlusions, and shorter first-aperture duration than the low-intelligibility group, and showed values similar to those of the control group. Given that the high- and low-intelligibility groups were of similar ages (3-6.6 years), Carballo and Mendoza (2000) argued that the differences between these groups may be associated with the greater or lesser motor control that children demonstrate when producing /r/ as they mature.
Besides being one of the latest-acquired sounds, the Spanish trill is phonetically realized in various ways within and across dialects, including as an approximant (Díaz-Campos, 2008), a fricative (Bradley & Willis, 2012; Colantoni, 2006; Lewis, 2004; Willis, 2006), a pre-breathy tap, or a tap followed by frication (Bradley & Willis, 2012; Willis, 2006), and it can be either voiced or voiceless (Lewis, 2004). However, in some dialects /r/ is still realized most frequently with two apico-alveolar constrictions (e.g., see Lastra & Martín Butragueño, 2006, for Mexico City Spanish, or Henriksen, 2014). With regard to positional differences, Henriksen (2014) found that /r/ in phonemically contrastive position (i.e., word-medial intervocalic) has more occlusions than /r/ in non-contrastive position (e.g., word-initial). Along the same lines, Lastra and Martín Butragueño (2006) showed that /r/ with two or more occlusions is more likely to appear in word-medial intervocalic position than in word-initial position.
Regarding long-term Latino immigrants in the US, studies have shown that these speakers produce the Spanish /r/ with fewer occlusions than the canonical trill. Kissling (2018) found that long-term immigrants in Virginia produced /r/ with an average of 1.34 occlusions. Similarly, Henriksen (2015) found that long-term immigrants in Chicago presented a mean value of 1.20 occlusions. Due to the phonetic variation of the Spanish /r/ in non-heritage native grammars, it is important that the baseline group(s) of heritage language studies reflect such variation.

Trill production by heritage speakers
Various studies on Spanish heritage speakers have found a prevalence of non-target-like realizations and high variability in trill production (e.g., fricatives, approximant taps, approximant trills) by both children (Fabiano-Smith & Goldstein, 2010; Kehoe & Havy, 2018; Menke, 2018) and adults (Amengual, 2016; Henriksen, 2015; Kissling, 2018). For instance, Kehoe (2018) examined the speech of Spanish-German bilingual children in Germany longitudinally from 1;9 to 3;6 years and compared their rhotic development to that of Spanish and German monolingual children. The Spanish monolinguals produced target-like realizations of the Spanish trill (i.e., phonetic trill [r]) at 3;0 years (60% accuracy), whereas among the bilinguals such realizations appeared only at a later stage (3;6 years; 50-60% accuracy) and in only half of the children. Similarly, Fabiano-Smith and Goldstein (2010) found that Spanish-English bilingual children in the US (3;6 years) produced the Spanish trill in a target-like manner less frequently (4.1%) than age-matched monolinguals of Mexican Spanish (37.5%). Note that although the monolinguals produced more target-like trills, their accuracy rate was still low, which indicates that Spanish speakers do not fully acquire this sound early on. Further examining heritage speakers' trill development in childhood, Menke (2018) investigated the Spanish trill produced by school-aged child heritage speakers in the US from Grade 1 (6;8-7;6) to Grade 7 (12;8-13;5). Allophonic variants attested in monolingual data (i.e., phonetic trills, taps with frication, and assibilated trills) were considered target-like. Results showed that target-like trill rates gradually increased from Grade 1 (27.2%) to Grade 7 (76%), while alveolar approximants followed the reverse path, decreasing from Grade 1 (18.2%) to Grade 7 (0%). Segment duration also increased from Grade 1 (75.3 ms) to Grade 7 (84.65 ms). The number of occlusions in the phonetic trill variants, however, remained stable across grades (2.7 in Grade 1 vs. 2.59 in Grade 7). According to Menke (2018), this delay in development might be caused by increased exposure to English through interactions with school peers and teachers while child heritage speakers are still acquiring the trill.
Studies on adult heritage speakers present a more complex picture, in which target-like rates vary depending on language dominance (Amengual, 2016), cultural identity (Kissling, 2018), and the type of baseline used for comparison (Henriksen, 2015; Kissling, 2018). For example, Henriksen (2015) examined heritage speakers in Chicago and used long-term immigrants as the reference level. Even though the heritage speakers produced trills with fewer lingual constrictions (1.10 occlusions) and shorter duration (70.49 ms) than the long-term immigrants (1.20 occlusions, 74.17 ms), no significant difference was found between the two groups. Nevertheless, Henriksen (2015) observed a difference in manner of articulation: the long-term immigrants favored fricatives, whereas the heritage speakers favored alveolar approximants. Kissling (2018) included both long-term immigrants and homeland speakers as baseline groups to examine heritage speakers' trill production. Results showed that the heritage speakers produced trills with significantly shorter durations (80.78 ms), fewer occlusions (1.39), and longer frication portions (16.45 ms) than the homeland speakers (duration: 89.26 ms; occlusions: 1.83; frication portion: 9.04 ms), but no significant difference was found between the heritage speakers and the long-term immigrants. Finally, Amengual (2016) explored variation among heritage speakers based on their language dominance and found that English-dominant speakers realized the trill with 0 or 1 occlusion, while Spanish-dominant speakers produced the majority of their trills with two or more occlusions.
Gemma Repiso-Puigdelliura and Ji Young Kim
In spite of the existing body of research on child heritage speakers (Fabiano-Smith & Goldstein, 2010; Kehoe, 2018; Menke, 2018) and adult heritage speakers (Amengual, 2016; Henriksen, 2015; Kissling, 2018), a direct comparison between the two groups cannot be drawn because of differences in research methods. For instance, the child heritage speakers in Kehoe's (2018) study were Spanish-German bilinguals, unlike other studies in which the heritage speakers were Spanish-English bilinguals. Moreover, while the studies on children used single-word picture-naming tasks (Fabiano-Smith & Goldstein, 2010; Menke, 2018), the studies on adults used semi-spontaneous speech (Henriksen, 2015; Kissling, 2018) or a sentence-reading task (Amengual, 2016). That is, some tasks elicited more controlled speech (e.g., reading, picture naming) than others (e.g., semi-spontaneous speech), which may affect the articulation of the trill to varying degrees. Thus, it is important to use the same research method when comparing children and adults.

Research Questions
In this study we compared the production of the Spanish trill by school-aged child heritage speakers (i.e., past the age at which typically developing monolingual children acquire the trill) with that of adult heritage speakers. We address the following research questions: (1) Is there an effect of age on heritage speakers' production of the Spanish trill? That is, do child heritage speakers and adult heritage speakers differ in their production of the trill?
An attrition-based model would predict that child heritage speakers produce the trill in a more target-like manner than adult heritage speakers (CHS > AHS). A delayed-development-based model would predict that child heritage speakers produce the trill in a less target-like manner than adult heritage speakers (CHS < AHS). A developmental path in which the acquisition of the trill is arrested would predict no significant difference between child and adult heritage speakers in their production of target-like trills (CHS = AHS).
(2) Is there an effect of position (word-initial vs. word-medial intervocalic) on heritage speakers' production of the Spanish trill?

Participants
Sixteen adult heritage speakers (14 F, 2 M, mean age = 20.5 years, SD = 1.5) and 11 child heritage speakers (5 F, 6 M, mean age = 9.6 years, SD = 0.54) participated in the study. All participants were Mexican Americans who were born and raised in Los Angeles County and whose parents had both immigrated to the US from Mexico as adults, except for three child heritage speakers (CHS2, CHS9, and CHS10), only one of whose parents was from Mexico.¹ The heritage speakers were first exposed to Spanish at home and learned English before age 5. The adult heritage speakers were undergraduate students at the University of California, Los Angeles, and had previously taken courses in Spanish. The child heritage speakers were recruited from Spanish-English dual language immersion programs at two elementary schools in Los Angeles, in which the instructors mainly use Mexican Spanish.

Procedures
The heritage speakers narrated the story of a wordless picture book, Frog, where are you? (Mayer, 1969) (henceforth the frog story). The frog story is an appropriate tool for eliciting the Spanish trill in a naturalistic manner, since it includes many words containing this sound (e.g., perro 'dog', rana 'frog'). The recordings were made in a quiet room using an AKG C520 head-mounted microphone connected to a Zoom H4n handy portable digital recorder at a sampling rate of 44.1 kHz and a bit depth of 16 bits.
Following Polinsky and Kagan (2007), we used lexical proficiency to assess participants' Spanish proficiency.² Based on the speakers' oral narratives, we calculated two measures of lexical diversity (i.e., VOCD³ (McCarthy & Jarvis, 2007) and the number of different words in the first 100 words (NDW)) and one measure of lexical sophistication (i.e., content word frequency). The first two measures were calculated using the Child Language Analysis (CLAN) program (MacWhinney, 1992), and absolute content word frequency (per million) (Crossley & McNamara, 2012) was calculated using the CLEARPOND software interface (Marian, Bartolotti, Chabal & Shook, 2012), which draws on the SUBTLEX-ESP corpus (Cuetos, Glez-Nosti, Barbón & Brysbaert, 2011). Table 1 summarizes the results. We performed independent-samples t-tests to compare the proficiency measures between the two groups. Results showed that the adult heritage speakers scored significantly higher than the child heritage speakers in VOCD (t(25) = 2.44, p < 0.05) and NDW (t(25) = 2.11, p < 0.05), whereas content word frequency did not differ between the two groups (t(20) = −0.11, p = 0.91). This suggests that, as heritage speakers grow up, their lexical diversity may increase, but they may not acquire more advanced vocabulary owing to the limited domains of heritage language use.
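NDW and the two-group comparison are simple enough to sketch directly. The Python below is a minimal illustration, not the CLAN implementation: it assumes the narrative has already been tokenized into words (CLAN applies its own transcription and exclusion conventions, which are not reproduced), and it assumes Student's pooled-variance t-test, whose degrees of freedom n1 + n2 − 2 = 16 + 11 − 2 = 25 match the t(25) values reported above. VOCD, which is estimated by curve fitting over random samples of the text, is not shown. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def ndw_first_100(tokens):
    """Number of different words (case-insensitive types) among the first 100 tokens."""
    return len({t.lower() for t in tokens[:100]})


def pooled_t(x, y):
    """Student's independent-samples t-test with pooled variance; returns (t, df)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pool the two sample variances, each weighted by its own degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df
```

The p-value would then come from the t distribution with df degrees of freedom (e.g., via a statistics library); it is omitted here to keep the sketch dependency-free.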

Coding and Analysis
Forced alignment of the heritage speakers' speech was carried out at the segmental level using EasyAlign (Goldman, 2011), a plug-in for Praat (Boersma & Weenink, 2020). All instances of the Spanish phonological trill in word-initial (e.g., rana 'frog') and word-medial intervocalic positions (e.g., perro 'dog') were extracted. The classification of the variants was adapted from Rose (2010): phonetic trill, approximant trill, tap, approximant tap, perceptual tap, fricative, and tap+fricative. The phonetic trill was identified as a token with two or more occlusions, represented as clear breaks in the spectrogram. If a token showed two or more visible constrictions in the spectrogram but with a continuation of the formant structure (i.e., a trill with weaker lingual constrictions), it was coded as an approximant trill. The true tap was coded when there was one occlusion clearly marked in the spectrogram. The approximant tap was identified as a token with a vertical band with continuation of the formant structure (i.e., a tap with a weaker constriction). The perceptual tap was coded when a tap gesture was auditorily perceived but no constriction was identified in the spectrogram. The fricative was coded when turbulent noise was visible in the acoustic signal without any occlusions. The tap+fricative was coded when the tap was followed by an aperiodic waveform. Table S1 (Supplementary Material) shows an example of each category. Variants that did not fit into any of the categories above were coded as "other" (e.g., lateral or retroflex realizations). To ensure inter-rater reliability, the kappa statistic was computed on a subset of the data (i.e., 12 participants, 337 tokens, 33.3% of the data) annotated by both authors, who are trained phoneticians. The results showed only moderate agreement between the two annotators (K = 0.437, p < 0.001), according to Landis and Koch's (1977) interpretation. The low kappa coefficient was mainly due to discrepancies in the annotation of one speaker's data. We therefore reviewed the discrepancies, re-annotated the remaining data, and re-ran the kappa statistic, which then reached the level of almost perfect agreement (K = 0.868, p < 0.001).
¹ The other parent was a non-Spanish speaker (CHS2), Mexican American (CHS9), or Colombian (CHS10).
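The reliability check relies on the kappa statistic, but the text does not name the variant. A minimal Python sketch, assuming Cohen's kappa for two annotators, computes the coefficient from two parallel lists of variant labels and maps it onto Landis and Koch's (1977) verbal benchmarks (under which, for instance, 0.868 counts as "almost perfect"). The function names are illustrative, and the significance test behind the reported p-values is not reproduced.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators' labels on the same tokens."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of tokens given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: products of the annotators' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal benchmark for a kappa value following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, two annotators labelling four tokens as ["trill", "tap", "trill", "fricative"] and ["trill", "tap", "tap", "fricative"] agree on 75% of tokens, with 31.25% agreement expected by chance, giving kappa ≈ 0.64.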
The present study analyzed heritage speakers' production of phonetic trills and target-like trills. The trills were considered target-like if they were produced with two or more brief lingual constrictions: that is, as phonetic trills or approximant trills. Regarding the acoustic properties of the trill, segment duration (ms) and the number of lingual constrictions were extracted from all the tokens. For the tokens produced with two or more constrictions, the duration of the first aperture (ms) was also measured.
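The coding criteria amount to a decision procedure over what is visible (or audible) in each token. The Python below encodes them as a sketch, assuming a hypothetical per-token record (`TokenCoding`) holding counts of full occlusions and of weaker constrictions, plus flags for frication and for an auditorily perceived tap. In the study itself tokens were coded by hand from spectrograms; this only illustrates how the categories and the target-like grouping relate. How tokens with one full occlusion plus one weaker constriction were coded is not stated, so the rule for them is an assumption, and "other" realizations such as laterals or retroflexes cannot be identified from these cues.

```python
from dataclasses import dataclass


@dataclass
class TokenCoding:
    """Hypothetical per-token annotation of spectrographic and auditory cues."""
    full_occlusions: int      # clear breaks in the spectrogram
    weak_constrictions: int   # constrictions with continuous formant structure
    frication: bool           # turbulent / aperiodic noise present
    tap_heard: bool = False   # tap gesture perceived auditorily


def classify(tok: TokenCoding) -> str:
    """Map coded cues onto variant categories adapted from Rose (2010)."""
    total = tok.full_occlusions + tok.weak_constrictions
    if tok.full_occlusions >= 2:
        return "phonetic trill"
    if total >= 2:
        # Assumption: any mix of two or more constrictions short of two full occlusions.
        return "approximant trill"
    if tok.full_occlusions == 1:
        return "tap+fricative" if tok.frication else "tap"
    if tok.weak_constrictions == 1:
        return "approximant tap"
    if tok.frication:
        return "fricative"
    if tok.tap_heard:
        return "perceptual tap"
    return "other"


# Target-like trills: two or more brief lingual constrictions.
TARGET_LIKE = {"phonetic trill", "approximant trill"}


def is_target_like(tok: TokenCoding) -> bool:
    return classify(tok) in TARGET_LIKE
```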
Statistical analyses were performed in R (R Core Team, 2020). Generalized linear mixed-effects models were fitted to the phonetic trill and target-like trill data (i.e., binary outcomes) using the glmer function in the lme4 package (Bates, Mächler, Bolker & Walker, 2015), with group (adult vs. child), position (initial vs. medial), and their interaction as fixed effects and participant and word as random effects. For the acoustic properties (i.e., continuous outcomes), linear mixed-effects models were fitted using the lmer function in the same package. Post-hoc power (1 − β) was estimated by simulation (100 simulations) using the simr package (Green & MacLeod, 2016). As a measure of effect size for the generalized linear mixed-effects models, we report the odds ratio (OR), obtained by exponentiating the log-odds coefficient (OR = e^β). For the linear mixed-effects models, we report the R² statistic (Bryk & Raudenbush, 1992; Snijders & Bosker, 1994) as a measure of explained variance.
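Because glmer coefficients for binary outcomes are on the log-odds scale, the odds ratio is e^β and the Wald z statistic is β/SE. A short Python check, applied to the group effect on target-like trills reported in the Results (β = −2.727, SE = 0.817), gives OR ≈ 0.065 and z ≈ −3.34, in line with the reported values. The Wald interval it produces (roughly [0.013, 0.32]) does not exactly match the reported 95% CI [0.010, 0.301], so the reported interval was likely computed with a different method (e.g., profile likelihood), which the text does not specify. The helper name is hypothetical.

```python
import math


def odds_ratio_summary(beta, se, z_crit=1.96):
    """Odds ratio, Wald z and Wald 95% CI for a logistic (log-odds) coefficient."""
    odds_ratio = math.exp(beta)
    z = beta / se
    # Wald interval: exponentiate beta +/- z_crit * SE.
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    return odds_ratio, z, ci


# Group effect (child vs. adult) on target-like trills.
or_, z, ci = odds_ratio_summary(-2.727, 0.817)
print(f"OR = {or_:.3f}, z = {z:.2f}, Wald 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

An OR below 1 here means lower odds of a target-like trill for the child group relative to the adult reference group.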

Realization of the Spanish phonological trill
In total, 836 tokens of the Spanish phonological trill were obtained. Of these, 25 tokens were excluded from the analyses because of creaky voice (N = 21), devoicing (N = 1), or incorrect production (N = 3) (e.g., rona instead of rana 'frog'). The remaining 811 tokens consisted of 345 word-initial and 466 word-medial intervocalic phonological trills. As shown in Table 2, the trills were realized in various forms, and their distribution differed slightly depending on age group and position within the word. The adult heritage speakers produced the (phonological) trills most frequently as (phonetic) trills regardless of position (word-initial: 27.85%, word-medial: 33.45%), although fricatives were also frequent in word-initial position (25.32%). For the child heritage speakers, the fricative was the most common variant in word-initial position (40.74%), whereas in word-medial position no clear preference for a particular form was found.
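The exclusion arithmetic and the percentages in Table 2 can be recomputed from a token-level coding sheet. The Python below is a minimal sketch, assuming a hypothetical list of (group, position, variant) records; it first checks the token counts given in the text and then computes each variant's share within each group × position cell.

```python
from collections import Counter, defaultdict

# Token bookkeeping reported in the text.
excluded = {"creaky voice": 21, "devoicing": 1, "incorrect production": 3}
analysed = 836 - sum(excluded.values())
assert analysed == 345 + 466 == 811  # word-initial + word-medial tokens


def variant_percentages(tokens):
    """Percentage of each variant within each (group, position) cell.

    `tokens` is an iterable of hypothetical (group, position, variant) records.
    """
    cells = defaultdict(Counter)
    for group, position, variant in tokens:
        cells[(group, position)][variant] += 1
    result = {}
    for cell, counts in cells.items():
        total = sum(counts.values())
        result[cell] = {v: 100 * n / total for v, n in counts.items()}
    return result
```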
As for the percentage of target-like trills, results showed a main effect of group (β = −2.727, SE = 0.817, z = −3.336, p < 0.001, 1 − β = 0.89, OR = 0.065, 95% CI [0.010, 0.301]), indicating that the adult heritage speakers produced target-like trills at significantly higher rates (M = 43.69%, SD = 49.65) than the child heritage speakers (M = 12.84%, SD = 33.51). Neither a main effect of position nor a significant interaction between group and position was found. The explained variance (i.e., R², following Bryk & Raudenbush, 1992; Snijders & Bosker, 1994) was higher for the main effect of group than for the main effect of trill position.

Discussion
Various studies in heritage language acquisition (Montrul, 2018; Polinsky, 2018) have encouraged researchers to bridge the gap between the scholarship on early bilingualism and that on adult heritage speakers. The present study followed this line of research in order to better understand Spanish heritage speakers' trill development and to account for their divergent grammars relative to those of non-heritage native speakers.
The first objective of this study was to compare trill production between child and adult heritage speakers. Our results showed that, compared to the 9- to 10-year-old child heritage speakers, the adult heritage speakers produced significantly higher rates of phonetic trills (i.e., with two or more clear occlusions) and target-like trills (i.e., phonetic trills and variants with two or more weaker constrictions that resemble phonetic trill production), and produced the trills with significantly more lingual constrictions and longer duration. However, whether either group diverges from non-heritage native speakers depends on the baseline chosen, and no consensus has yet been reached on how to establish the baseline for heritage speakers (Otheguy, 2016). In this section, we discuss the appropriate baseline groups for heritage speakers and compare our findings with those for the baselines reported in other studies that used the same or similar data elicitation methods (i.e., (semi-)spontaneous speech).

Identifying the baseline
Assuming that the heritage speakers in this study have been exposed to varieties of monolingual Mexican Spanish (e.g., family in parents' hometown in Mexico, recent immigrants from Mexico, Mexican media), as well as bilingual Spanish (e.g., long-term immigrants from Mexico), we compared our results to those reported in monolingual and bilingual varieties of Mexican Spanish. Table 3 summarizes the findings across groups.
We first compared our findings with those for monolingual Spanish varieties, specifically Veracruz Mexican Spanish (Bradley & Willis, 2012), Central Mexican Spanish (and recent immigrants) (Kissling, 2018), and Mexico City Spanish (Lastra & Martín Butragueño, 2006). The first two studies used the frog story for data elicitation, as in our study, and the third analyzed natural conversations. With regard to trill production rates, Bradley and Willis (2012) defined the normative trill as a variant with two or more visible lingual contacts, represented as a clear reduction in intensity in the waveform and spectrogram, which coincides with our criteria for target-like trills. Lastra and Martín Butragueño (2006) defined vibrante (rr) as a variant with two or more brief interruptions of energy corresponding to "spaces in white" in the spectrogram, which coincides with our criteria for phonetic trills. Thus, we compare these baselines with heritage speakers' rates of both target-like trills and phonetic trills. The monolinguals in Bradley and Willis (2012) showed a slightly higher target-like trill rate (49.26%) than the adult heritage speakers in this study (43.69%) and a much higher rate than the child heritage speakers (12.84%). For the monolinguals in Lastra and Martín Butragueño (2006), the phonetic trill rate was even higher (65%). With regard to the number of lingual constrictions, both the adult heritage speakers (1.39) and the child heritage speakers (0.8) showed lower values than the speakers in Kissling (2018) (1.87). Regarding segment duration, the adult heritage speakers (word-initial: 72.33 ms, intervocalic: 62.81 ms) and the child heritage speakers (word-initial: 56.4 ms, intervocalic: 44.13 ms) produced shorter trills than the speakers in Bradley and Willis (2012) (word-initial: 77 ms, intervocalic: 70 ms) and Kissling (2018) (positions combined: 89.26 ms).
To compare our results with those for bilingual varieties, we relied on the findings for long-term immigrants from Mexico (Henriksen, 2015; Kissling, 2018). Both studies used the frog story for data elicitation. While these studies did not report phonetic or target-like trill rates, Henriksen (2015) presented the distribution of tokens with varying numbers of occlusions (0-4), which he determined using criteria similar to those of Bradley and Willis (2012). We therefore calculated the percentage of cases in which the trill was produced with two or more occlusions; on this basis, the long-term immigrants' trill rate in Henriksen (2015) was 39.63%. This value falls between the target-like trill rate (43.69%) and the phonetic trill rate (30.87%) of the adult heritage speakers in our study and is much higher than both the child heritage speakers' target-like trill rate (12.84%) and their phonetic trill rate (…).
To summarize, when monolingual speakers of Mexican Spanish (Bradley & Willis, 2012; Kissling, 2018; Lastra & Martín Butragueño, 2006) were set as the baseline, the heritage speakers in our study, both adults and children, seemed to diverge from the baseline on all three measures (i.e., phonetic/target-like trill rates, number of lingual constrictions, and segment duration), and the child heritage speakers showed stronger divergence than the adult heritage speakers (CHS < AHS < BASE). On the other hand, when long-term immigrants from Mexico (Henriksen, 2015; Kissling, 2018) were set as the baseline, the results were mixed. For the phonetic/target-like trill rates and the number of occlusions, only the adult heritage speakers produced the trills similarly to the baseline, while the child heritage speakers produced them at lower rates and with fewer occlusions (CHS < AHS = BASE). For segment duration, both heritage speaker groups produced shorter trills than the baseline, and the deviation from the baseline was larger for the child heritage speakers than for the adult heritage speakers (CHS < AHS < BASE).
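The long-term immigrant rate of 39.63% was obtained by summing the tokens produced with two or more occlusions in Henriksen's (2015) distribution and dividing by the total number of tokens. A minimal Python sketch of that arithmetic follows; the counts in the example are invented for illustration and are not Henriksen's data, and the helper name is hypothetical.

```python
def share_with_min_occlusions(counts_by_occlusions, minimum=2):
    """Percentage of tokens produced with at least `minimum` occlusions.

    `counts_by_occlusions` maps a number of occlusions (e.g., 0-4) to a token count.
    """
    total = sum(counts_by_occlusions.values())
    hits = sum(n for k, n in counts_by_occlusions.items() if k >= minimum)
    return 100 * hits / total


# Hypothetical counts for illustration only (not Henriksen's 2015 data).
example = {0: 30, 1: 40, 2: 20, 3: 8, 4: 2}
print(f"{share_with_min_occlusions(example):.2f}%")
```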
In the case of the child heritage speakers, apart from the two baseline groups above, a comparison with Spanish monolingual children (Carballo & Mendoza, 2000) was carried out to explore whether these speakers show similar developmental patterns to those of age-matched monolinguals (Paradis & Genesee, 1996). Since Carballo and Mendoza (2000) used a different elicitation method (i.e., picture-naming task) and the participants were Peninsular Spanish speakers (Granada), we acknowledge that it is not ideal to directly compare the findings of the two studies. However, to our knowledge, Carballo and Mendoza (2000) is the only study that investigated the trill production of school-aged Spanish monolingual children. Thus, we make this comparison with caution.
Compared to the monolingual peers in Carballo and Mendoza (2000), the child heritage speakers in our study produced fewer lingual constrictions (0.8 vs. 2.3) and shorter trills (48.6 ms vs. 115.7 ms). While the large durational difference between the two groups may be due to the difference in the number of lingual constrictions, it may also be associated with the different task types. That is, the speakers in Carballo and Mendoza (2000) might have produced longer trills because their task elicited more controlled speech than the task in our study. It is worth pointing out that the child heritage speakers in Menke (2018), who completed a task similar to that in Carballo and Mendoza (2000) (i.e., a picture-sorting task), also produced noticeably shorter durations (70.4 ms) than the monolingual children. Thus, the shorter segment duration found in our study compared to Carballo and Mendoza (2000) is likely to result from both fewer lingual constrictions and task type. With regard to the duration of the first aperture in target-like trills, a property that clearly distinguishes more proficient from less proficient trillers (Carballo & Mendoza, 2000), the child heritage speakers in our study (20.41 ms), as well as the adult heritage speakers (22.46 ms), presented values similar to those of the monolingual children (21.9 ms) in Carballo and Mendoza (2000). To summarize, the child heritage speakers produced the trill with fewer occlusions and shorter segment duration than the age-matched monolingual baseline, but when they were able to produce the trill with two or more lingual constrictions (i.e., in a target-like manner), they did so with the same degree of articulatory precision as monolingual children.

Effects of position on heritage speakers' trill production
The second objective of this study was to examine whether phonetic trill production was affected by the position of the trill in the word (word-medial and word-initial). We found that, while heritage speakers' trills in word-medial intervocalic position (i.e., phonemically contrastive) were not produced as phonetic/target-like trills more frequently than those in word-initial position (i.e., non-phonemically contrastive), they were produced with more lingual constrictions. This is similar to findings for monolingual Spanish varieties (Henriksen, 2014; Lastra & Martín Butragueño, 2006). Thus, our data align with previous studies in heritage language phonology showing that heritage speakers maintain language-internal phonemic contrasts (Chang et al., 2009, 2011; Einfeldt et al., 2019; Lein et al., 2016). As suggested in Kupisch (2020), heritage speakers may maintain or even over-mark phonemic contrasts as a way to ease the overtaxing costs of one-to-more mappings in a situation in which more than one language competes for limited cognitive resources (i.e., avoidance of ambiguity in Polinsky & Scontras, 2020).
With regard to segment duration, we found that the heritage speakers produced trills with longer durations in word-initial position (adult: 72.33 ms, child: 56.4 ms) than in word-medial intervocalic position (adult: 62.81 ms, child: 44.13 ms). While this may appear counterintuitive, given that word-medial trills were produced with more lingual constrictions, the longer duration in word-initial position is likely to be an effect of domain-initial strengthening, whereby consonants at the beginning of higher prosodic domains (e.g., word-initial) are produced with stronger articulation (e.g., longer duration) than those in lower prosodic domains (e.g., word-medial) (Fougeron & Keating, 1997).
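To give a sense of the size of the word-initial lengthening, the group means reported above can be expressed as absolute and relative differences. The sketch below is purely descriptive and does not constitute a statistical test of domain-initial strengthening; the helper name `position_effect` is hypothetical.

```python
def position_effect(initial_ms: float, medial_ms: float) -> tuple[float, float]:
    """Return (difference in ms, ratio) of word-initial over word-medial duration."""
    return initial_ms - medial_ms, initial_ms / medial_ms

# Group mean trill durations (ms) reported above: (word-initial, word-medial).
means = {"adult": (72.33, 62.81), "child": (56.4, 44.13)}

for group, (initial, medial) in means.items():
    diff, ratio = position_effect(initial, medial)
    print(f"{group}: +{diff:.2f} ms, x{ratio:.2f}")
# adult: +9.52 ms, x1.15
# child: +12.27 ms, x1.28
```

On these means, the word-initial lengthening is larger for the child group in both absolute and relative terms.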

Connecting the dots between child and adult heritage speakers' trill production
With the goal of accounting for adult Spanish heritage speakers' divergent trill production relative to monolingual norms (Amengual, 2016; Kissling, 2018; Henriksen, 2015), the present study adopted a developmental approach by directly comparing child and adult heritage speakers. We then compared our findings to those of non-heritage native speakers reported in other studies (Bradley & Willis, 2012; Carballo & Mendoza, 2000; Henriksen, 2015; Lastra & Martín Butragueño, 2006; Kissling, 2018). While both the adult and the child heritage speakers diverged from the non-heritage native baselines in one or more phonetic properties of the trill, the adult heritage speakers produced the trill in a more target-like manner than the child heritage speakers (CHS < AHS). Thus, our findings support Menke (2018) in that heritage speakers continue developing the Spanish phonological trill during childhood. Moreover, our data showed that, in addition to diverging from the adult baseline groups, the child heritage speakers diverged from age-matched monolingual children, suggesting that heritage trill development occurs at a slower rate than in monolingual peers. Deceleration in bilingual development has also been proposed in Goldstein and Washington (2001), in which 4-year-old child heritage speakers produced the Spanish trill with lower accuracy than the English approximant. Goldstein and Washington (2001) argued that, as a way to distinguish their two phonological systems, child heritage speakers are likely to focus on mastering the English approximant before later-developing Spanish sounds such as taps and trills.
When comparing heritage speakers to the baseline groups, it is important to take into account that non-heritage native speakers also demonstrate variation. Lastra and Martín Butragueño (2006) found that Mexico City Spanish speakers mainly used three trill variants: normative trill (65%), non-sibilant fricative (19%), and sibilant fricative (14%). Similarly, Bradley and Willis (2012) showed that the representative allophones in Veracruz Mexican Spanish were the normative trill (49.3%), a tap followed by vocalic r-coloring or frication, and non-vibrant forms such as fricatives.⁴ With regard to long-term Mexican immigrants, Henriksen (2015) found that more than half of the trills had zero or one occlusion. Although the allophonic distribution of non-normative trills was not the main focus of that study, Henriksen (2015) reported that the variants with zero occlusions were primarily fricatives. Moreover, since the speakers in Henriksen (2015) who mainly produced trills with one occlusion demonstrated significantly longer durations than for the phonological taps, the trill variants with one occlusion were likely taps followed by vocalic r-coloring or frication, similar to those in Bradley and Willis (2012). Thus, it appears that, apart from the normative trill, non-heritage native speakers often use variants containing frication (i.e., sibilant/non-sibilant fricative, tap followed by frication).
We further explored the distribution of heritage speakers' non-target-like realizations of the Spanish trill (i.e., variants other than the phonetic trill and the approximant trill). Variants containing frication (i.e., fricative, tap+fricative) were frequently used by both the child heritage speakers (46.96%) and the adult heritage speakers (41.55%) (see Table 1), which suggests that, like the non-heritage native baselines, heritage speakers associate frication with the Spanish trill. We also found that, while the phonetic tap and its continuant variants (i.e., approximant tap, perceptual tap) made up a large part of the child heritage speaker data (36.15%), the adult heritage speakers produced these variants only 12.62% of the time. That is, the variants related to the phonetic tap and those with frication were the two most frequent types of realization in the child heritage speakers' speech, whereas for the adult heritage speakers the two most frequent types were the variants with frication and those related to the phonetic trill (i.e., phonetic trill, approximant trill). We classified the allophonic variants presented in Table 1 into three broad categories: trill (i.e., phonetic trill, approximant trill), frication (i.e., fricative, tap+fricative), and tap (i.e., true tap, approximant tap, perceptual tap). Table 4 presents heritage speakers' trill inventories based on the types that made up more than 10% of their productions. The last row (i.e., trill, frication) represents the inventory found in the non-heritage native baselines.

⁴ The frequency of the non-normative allophones was not reported in the study.
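The inventory procedure behind Table 4 can be sketched as two steps: map each variant to its broad category, then keep the categories that exceed 10% of a speaker's productions. The sketch below assumes that the 10% cutoff applies to the summed share of each broad category and that variants outside the three categories are ignored; the names `CATEGORY` and `inventory` are hypothetical. The first input reproduces CHS2's reported distribution; the second speaker is invented for illustration.

```python
# Hypothetical mapping from Table 1 variant labels to the three broad
# categories described in the text; unlisted variants are ignored.
CATEGORY = {
    "phonetic trill": "trill",
    "approximant trill": "trill",
    "fricative": "frication",
    "tap+fricative": "frication",
    "true tap": "tap",
    "approximant tap": "tap",
    "perceptual tap": "tap",
}

def inventory(shares: dict[str, float], cutoff: float = 10.0) -> set[str]:
    """Return the broad categories whose summed share (%) exceeds `cutoff`."""
    totals: dict[str, float] = {}
    for variant, pct in shares.items():
        category = CATEGORY.get(variant)
        if category is None:
            continue  # unclassified variant: not part of any broad category
        totals[category] = totals.get(category, 0.0) + pct
    return {category for category, total in totals.items() if total > cutoff}

# CHS2's reported distribution (tap variants only).
chs2 = {"true tap": 50.0, "approximant tap": 38.9, "perceptual tap": 11.1}
print(inventory(chs2))  # {'tap'}

# Hypothetical speaker: trill 55%, frication 37%, tap 8% (below the cutoff).
speaker = {"phonetic trill": 48.0, "approximant trill": 7.0, "fricative": 25.0,
           "tap+fricative": 12.0, "true tap": 8.0}
BASELINE = {"trill", "frication"}  # inventory of the non-heritage baselines
print(inventory(speaker) == BASELINE)  # True
```

Under this reading, a target-like inventory is simply one that equals the baseline set, and the full inventory corresponds to all three categories exceeding the cutoff.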
All the heritage speakers used frication as one of their strategies to produce the trill, except for one child heritage speaker (CHS2), who consistently produced the tap variants (true tap: 50%, approximant tap: 38.9%, perceptual tap: 11.1%). Almost half of the adult heritage speakers (AHS2, AHS3, AHS5, AHS6, AHS8, AHS10, AHS11, AHS13, AHS15) used both trill and frication (i.e., a target-like inventory), while fewer than a third of the child heritage speakers (CHS3, CHS4, CHS8) demonstrated this pattern. Note that these speakers largely coincide with those whose target-like trill rates and numbers of lingual constrictions were within the baseline range (see Section 7.1). While most of them produced the trill variants more frequently than the variants with frication, one adult heritage speaker and one child heritage speaker primarily used frication to produce the trill (AHS6: 82.1%, CHS4: 65.2%). Frication was also the predominant strategy among heritage speakers with non-target-like inventories. It is important to note that the most frequent non-target-like inventory in the child heritage speaker data was frication and tap, whereas for the adult heritage speakers it was the full inventory (i.e., trill, frication, tap). This finding suggests that the child heritage speakers have not yet acquired the trill variants, and that the adult heritage speakers, even after acquiring the trill variants, have not abandoned the tap variants.
The association between frication and the Spanish trill has also been attested in L2 phonological development. Morales Reyes, Arechabaleta-Regulez and Montrul (2017) found that 4-to-7-year-old American English L2 learners of Spanish produced the Spanish tap as a phonetic tap or one of its variants (i.e., approximant tap, perceptual tap) most of the time (89.7%), similar to Spanish monolingual children (88.9%), whereas they produced the Spanish trill with frication (i.e., fricative, tap+fricative) at much higher rates (66.7%) than their monolingual peers (37.8%). The variant observed most frequently in the monolingual data was the phonetic trill (46.7%). Morales Reyes et al. (2017) also examined the relationship between the amount of exposure to Spanish and learners' realizations of the Spanish trill⁵ and found that the percentage of phonetic trills was higher for learners with more exposure to Spanish. Rose (2010) found similar patterns in the speech of adult American English L2 learners of Spanish and argued that L2 learners go through several stages when acquiring the Spanish tap-trill contrast. That is, L2 learners initially do not distinguish the two phonemes and associate both with the English alveolar approximant. Later, they gradually introduce other continuants into their repertoire, such as the approximant tap and the perceptual tap, and then the phonetic tap. At a later stage of development, L2 learners begin to associate the variants that involve frication (i.e., fricative, tap+fricative) and the phonetic trill with the phonological trill, and the tap variants with the phonological tap. However, these studies also showed that the tap variants persisted in highly proficient L2 learners' trill inventories (Morales Reyes et al., 2017; Rose, 2010).
While heritage speakers do not share the same language history as L2 learners, the similarities between the two groups suggest that phonetic trills are introduced late in the phonological development of Spanish-English bilinguals and that bilingual trill development proceeds in the following order, with overlaps between stages: single lingual constriction → frication → multiple lingual constrictions. Note that this closely resembles the early trill development of Spanish monolingual children, who acquire the trill later than the tap and often use the tap as a substitute for the trill (Acevedo, 1993; Bosch, 1983). However, unlike Spanish-English bilinguals, Spanish monolingual children abandon the tap variants by the time they reach school age (2.2%; Morales Reyes et al., 2017). Some monolingual children at this stage may still have difficulty producing the Spanish trill. Carballo and Mendoza (2000) attributed this to a tongue-body shape during trill production that may not meet the aerodynamic requirements for successful trilling. While Carballo and Mendoza (2000) did not describe the variant as a fricative, we speculate that the non-target-like constriction observed in their study resembles the fricatives in our study.
Based on our findings, it seems that Spanish heritage speakers go through a developmental process similar to that of non-heritage native speakers, but at a slower rate (i.e., deceleration). Moreover, even as adults, some heritage speakers may not fully develop the speech motor control necessary to produce the Spanish trill and may exhibit increased variability. The increased variability found in our study aligns with Kupisch's (2020) observation that heritage speakers exploit language-inherent variation to avoid markedness. For instance, Kupisch (2020) pointed out that, when producing the Italian alveolar trill, Italian heritage speakers avoid phonetic trills and instead produce phonetic taps and other variants. Similarly, Putnam (2019) argued that, as a result of constant competition between the two languages, heritage speakers acquire linguistic representations that are more gradient and less stable than those in non-heritage grammars, which may contribute to their increased variability. Thus, deceleration followed by acquisition without mastery of the heritage language (Montrul, 2016, p. 126) or unstable/unconsolidated heritage grammars (Putnam, 2020) appear to best explain the divergent trill productions of some adult heritage speakers. This calls for the addition of a fourth scenario (i.e., CHS ≠ AHS ≠ BASE) to Polinsky and Scontras's (2020) model. It is important to note that the present study included two baseline groups (i.e., Spanish monolingual speakers and long-term immigrants from Mexico) whose patterns do not completely align. Although the adult heritage speakers in this study diverged overall from both baseline groups in one or more phonetic properties, they performed more similarly to the long-term immigrants than to the monolingual speakers. 
Specifically, the adult heritage speakers patterned like the long-term immigrants (Henriksen, 2015; Kissling, 2018) in that they produced fewer target-like trills, and produced the trill with fewer lingual constrictions, than the monolingual speakers (Bradley & Willis, 2012; Kissling, 2018; Lastra & Martín Butragueño, 2006). These findings suggest that heritage speakers develop a phonological system approaching that of long-term immigrants, who, after living in the US for a long period of time, may show attrition in their native variety in favor of the local variety (e.g., Los Angeles Spanish) or the majority language (e.g., English). To confirm this, future research should carefully examine heritage speakers' source(s) of Spanish input, especially input from long-term immigrants in their speech community, which may include varieties other than those of their parents' homeland. Given that the long-term immigrant data were collected in Chicago, Illinois (Henriksen, 2015) and Richmond, Virginia (Kissling, 2018), which differ from Los Angeles in the distribution of their Latino populations and in their regional English dialects, the varieties of these immigrants may have changed differently from those of long-term immigrants in Los Angeles. While this study examined child heritage speakers at an age at which full mastery of the Spanish trill is reported in monolinguals, we cannot entirely rule out other developmental paths, such as attrition during late childhood or adolescence. That is, if heritage speakers in late childhood or adolescence demonstrated target-like production of the trill and adult heritage speakers did not, this would be a case of delayed (complete) acquisition followed by attrition. 
Although this scenario seems less likely given Menke's (2018) findings, which demonstrated continued trill development in child heritage speakers between ages 6;8 and 13;5, we emphasize that meta-analyses should be interpreted with caution when they draw on studies that used different research methods (see Section 3.2). Thus, future research should consider the complete age spectrum, including early childhood, late childhood, adolescence, early adulthood, and late adulthood, and make comparisons using the same data elicitation and analysis methods to fully understand heritage Spanish trill development.

Conclusion
To address the "missing link" (Montrul, 2018) between early bilingualism and adult heritage grammars in the literature on heritage language phonology, we compared the production of the Spanish trill by school-aged (9-10 years) and adult heritage speakers. We found that the adult heritage speakers outperformed the child heritage speakers. However, almost half of the adult heritage speakers diverged from the non-heritage native baselines. Our findings indicate that child heritage speakers at this age are still in the process of developing their heritage phonological grammars, and that these grammars may not reach stability even in adulthood.
While our study is the first to directly compare child and adult heritage speakers' production of the Spanish trill, future research should include more age groups, including late adolescence and late adulthood, in order to track the complete developmental process of the heritage Spanish trill. Heritage speakers' divergence from monolingual norms is often claimed to result from reduced heritage language input and/or use. Although the amount of heritage language input and use accounts for major differences between heritage speakers and their monolingual peers, the type of heritage language input should also be taken into account. In our study, target-like trill productions were defined, as a point of reference, as variants with two or more lingual constrictions (i.e., the normative trill). However, this by no means implies that heritage speakers should produce these variants categorically, given the variability of the Spanish trill within and across dialects. Thus, future research should carefully examine the varieties to which heritage speakers are exposed and whether heritage speakers use the variants found in their input consistently.