Variegation – or the presence of two different supraglottal consonants in one lexical unit – is a central challenge for children in the early word learning period. The difficulty of producing words containing differing consonants (other than glottals or glides, which are present from an infant’s earliest vocalizations) is evidenced by the fact that children attempt few such words when they first begin to talk and produce such forms even less often (Menn & Vihman, 2011). As the child’s expressive vocabulary grows, the bias in favour of producing only a single supraglottal consonant per word form remains in evidence.
On the one hand, children continue to target many words that include only a single consonant. For example, in the sessions analysed for this study, from the end of the single-word period, seven of the 15 English-learning children produced baby (as [beɪbi; be:bi; pebi; beɪ:bi:; beɪʔbiç; beɪbeɪ:i; bəbi; biːbiːç]). On the other hand, to meet the challenge of expanding the range of words they can produce, children typically accommodate this bias through partial or full reduplication, consonant or syllable omission, or replacement of consonants by glottals or glides. To illustrate from our English data, examples of partial reduplication (consonant harmony) include all gone [guːgɒː], bagel [bʌbu], circle [geːgə], doggy [gɒgiː], ladder [dɛːdəʰ], lady [jɛ:ji] and piggie [pεpi]. Full reduplication is less common in English but can also be observed, both in target words (bye-bye, choochoo, mama, walk-walk, woofwoof) and in child forms that adapt adult words to this prosodic structure: cookie [kɛkɛ], glasses [kæ̥kæ̥], picture [p’œp’œ], snowman [mɪmɪ:], balloon [bɛbɛ], bee [bʷi:bʷi::]. Note that some adaptations to reduplicated form involve repeating a monosyllable (e.g., bee), while others involve changing one or both vowels (in words whose target form already has consonant harmony, such as cookie), and yet others (for target forms without harmony) involve replacing one of the syllables by repeating the other (glasses) or producing two identical syllables that resemble one of the target syllables but accurately match neither (the remaining examples). Finally, we also see consonant omission (with metathesis: Nicky [ɪnːi]), glottal substitution (sofa [ʔəf:æ]) and gliding (lady [lɛji]) as responses to variegation in target words, which points to the difficulty of accessing and producing such forms.
We briefly review previous research on the processes that result in child word forms lacking variegation in response to variegated targets. The omission of target segments or whole syllables and the inclusion of non-target glottals or glides are widely recognized and have been illustrated both in studies of individual children (see, for example, diary data from children acquiring Estonian, French, German and Hindi as well as English in Vihman & Croft, 2007) and in studies of developmental phonology more generally (e.g., Grunwell, 1982, provides a useful overview of the typical chronology of phonological processes in English, including final consonant omission, which is commonly seen up to age two years).
Reduplication in child language has so far received relatively little attention, especially from a cross-linguistic perspective; early studies based on English include Fee and Ingram (1982), Ferguson (1983) and Schwartz, Leonard, Wilcox and Folger (1980). In contrast, consonant harmony (CH), a process in which the consonantal features of one syllable ‘spread’ to another within the same word, has been widely reported and discussed (e.g., Levelt, 2011; Lleó, 1990; Smith, 1973; Stoel-Gammon & Stemberger, 1994). Most studies of consonant harmony in child forms have featured only languages spoken in Europe. However, one early cross-linguistic account analysed the extent of CH in three children learning Cantonese in addition to one to three children learning each of five European languages (Vihman, 1978); the data covered a wide range of ages and periods of data collection. In that study the Cantonese data differed from the European data in showing low levels of both variegated target forms and child use of CH. These findings, though limited by the small number of children per group and by the uneven sampling, suggest that, despite its common occurrence, CH might not be a ‘universal tendency’ of child phonology (Smith, 1973). If such common processes are not universal, it is worth asking how the phonological structure of different languages may guide children to different solutions to the variegation challenge.
Evidence of the nature and extent of deployment of these solutions in different languages may also shed light on two opposing hypotheses about the type of challenge that variegation poses. Some researchers have heavily emphasized the role of motoric advances or the maturation of articulatory skills. For example, Davis, MacNeilage and Matyear (2002) present evidence and argument to support their Frame and Content model, which sees early child word forms as based on the motoric principles that underlie canonical babble. Similarly, McAllister Byun, Inkelas and Rose (2016) have argued for articulatory limitations as the primary factor shaping child outputs in the first years of word use.
In contrast, Aitchison and Chiat (1981) tested their intuition that (long-term) memory (or representation) is likely a key source of adult-child discrepancies in early word forms by carrying out an experiment in word production with 4- to 9-year-olds. They found that, in naming novel picture-book images as they recurred, these older children produced errors that resembled those of the early-word period, such as [tu:du:] for kudu and [kəku:n] for racoon, both with consonant harmony; articulatory limitations are unlikely to have been involved at these ages.
More recently, Hodges, Munro, Baker, McGregor, Docking and Arciuli (2016) studied how accuracy in initial imitation of nonwords affects longer-term learning in two- to three-year-olds, testing immediately (for short-term memory effects) and again 5 minutes later and 1 to 7 days later (for longer-term effects). The results showed that accuracy of imitation had an immediate effect only (i.e., on short-term memory) and that ‘expressive phonology’ (based on standard phonological skill test scores) was unrelated to performance, whereas expressive vocabulary – or long-term memory for a range of different word forms – ‘predicted performance at all time intervals’ (p. 457). This provides further evidence that memory or representation plays a role in constraining word learning in development.
Finally, cases of metathesis in early child word forms, though rare, suggest representational rather than articulatory challenges, since the child in such cases is able to produce the target sounds (with or without voicing change) but appears to lack a robust memory for their position in the word. We find just 17 such child forms out of the 991 different child disyllabic variants we coded (2%). Examples include Finnish jalka ‘foot, leg’ [gala], Japanese kiɕa ‘train’ [ciga] and Mandarin ʈʂuo1tsi0 ‘table’ [tɤ1ʈʂai4]. Despite their rarity, these examples suggest that long-term memory for word forms (and emergent pattern preferences, or templates: Vihman, 2019) must play at least some part in child responses to the challenge of targets with more than one supraglottal consonant.
Here we consider phonological data from five languages – English, Finnish, French, Japanese and Mandarin – that contrast sharply in their prosodic structures and accentual patterning. We will be interested in establishing, first, the extent to which children learning different languages are called on to deal with variegation in the target words that they attempt and, second, the ways in which they deal with the challenge it poses. Comparing such cross-linguistic data should also provide insight into the related question of the likely sources of child failures to match their targets in early word production. That is, if immaturity of articulatory skill is the main obstacle to accurate child production, we can expect the proportion of matches to variegated forms to be roughly equivalent across all five language groups, once we control for phonotactic structures. On the other hand, if the memory load, or representational challenge, involved in retaining complex word forms well enough to reproduce them also plays a role, then this may be expected to lead to different child responses to variegation in different languages. More specifically, we might expect different effects on representation based on target language differences in rhythmic patterning and phonotactics.
Accordingly, we address the issue of child responses to variegation in adult word targets with a specific focus on cross-linguistic similarities and differences. Based on analyses of both target words and child forms, we ask the following research questions:
(1) To what extent does variegation occur in the target words children attempt in five different languages?
(2) Does the structure of the input language affect children’s responses to the challenge of variegation? And if so, what structural characteristics of the adult language might support variegated child word form production in response to target variegation?
(3) Can the cross-linguistic data shed light on the sources of child failures to match their target forms? In other words, can we weigh the relative contributions of articulatory and representational advances to achieving accurate production? More specifically, do the data provide evidence as to whether ambient language differences – for example, in stress patterning, the presence of medial geminates, or the diversity of syllable structures – affect children’s long-term memory for (or representation of) target forms?
Note on phonological structure
To facilitate consideration of possible ambient language effects on child responses to variegation we briefly characterize relevant aspects of the phonological structure of each language included in our analyses. We focus on consonant inventory size, syllable structure and accent; we assume that these are the features most likely to affect child representation and production. We do not discuss vowel inventories, which do not directly affect our analyses.
In English, 75% of disyllabic words are stressed on the first syllable (trochaic), whereas 75% of disyllabic phrases are stressed on the second (iambic) (Delattre, 1965). English has 24 consonants and permits, even in the monomorphemic forms that underlie most early child words, up to three consonants in both onset and coda positions, creating a highly diverse set of syllable types.
French lacks lexical stress but marks phrase-final syllables with lengthening (Delattre, 1965). Its consonant inventory (N = 21) and phonotactic structure are similar to those of English in terms of diversity of syllable types, but word-final consonants are far less frequent in French (occurring in 66% of American English content words in child-directed speech [CDS] vs. 25% in French: Vihman, Kay, Boysson-Bardies, Durand & Sundberg, 1994).
Finnish has demarcative stress on the first syllable, but its rhythm is described as relatively even, with secondary stress falling on alternate syllables (Suomi & Ylitalo, 2004). The consonant inventory is small (N = 11) but consonants may occur as intersyllabic geminates anywhere in the word; vowel length is also contrastive and unrestricted as to word position. The language also has front-back vowel harmony. Codas are restricted to coronals; no clusters occur word-initially or -finally.
Japanese is a pitch accent language, with a high or falling pitch on any one syllable of a word or on none (Ota, 2013). Like Finnish, Japanese has contrastive length in both vowels and consonants. However, geminate consonants are less well contrasted phonetically than in Finnish (Aoyama, 2000; Kunnari, Nakai & Vihman, 2001); consonant duration is also less consistently produced in input speech, as both consonants and vowels may be lengthened for expressive purposes (Kunnari et al., 2001). There are 15 consonants; aside from geminates, whose first part forms a coda of the preceding syllable, the only coda permitted is a nasal, which agrees in place of articulation with the next syllable onset, if any, but is otherwise uvular, often with vowel nasalization as well.
Mandarin lacks lexical stress or word-level accentual patterning but has four lexical tones, with each syllable bearing one tone (conventionally marked with numbers in phonemic transcription: 1 high level, 2 rising, 3 fall-rise, 4 falling); in addition, a ‘neutral tone’ occurs in some cases, such as on grammatical particles and, more importantly for our purposes, on the second syllable of reduplicated words in CDS. The rimes of these neutral-tone syllables are reduced, with the tone realized as mid-falling after Tones 1, 2 and 4 and low-rising after Tone 3 (Cheng, 1973; Lin, 2007). The only codas allowed are alveolar or velar nasals. There are two syllable types, C₀V₁₋₃ or C₀VV₀N (that is, an optional onset consonant followed by up to two nuclear vowels, optionally followed by either a third vowel or a nasal consonant); the language is said to have an inventory of only some 400 unique syllables if tone is disregarded (Deng & Dang, 2007), whereas English has some 9000 (Huff, 2017). Mandarin has 25 consonants. It does not allow syllable-initial or syllable-final clusters, but in disyllabic words first-syllable codas create word-medial clusters with second-syllable onsets.
The languages in our sample differ in numbers of contrasting consonants and syllables, dominant accent patterns, and syllabic and metrical structure. As suggested above, if articulatory immaturity is the sole or primary source of child failure to match target words accurately, we may reasonably expect the children in each group to show a similar range of responses to the challenge of variegation in producing words with comparable phonotactic structures. That is, regardless of the language they are learning, children should employ similar strategies to cope with variegation in words with an equivalent structure, such as C₁VC₂V, where C₁ and C₂ are consonants with different supraglottal places of articulation. If, instead, we find systematic cross-linguistic differences in the distributions of child responses, this could be taken to reflect rhythmic and other phonological differences in the children’s linguistic exposure and experience, which would in turn suggest a role for access to long-term memory or representation in early word form production.
Our data derive from children acquiring five languages of diverse genetic affiliation: two Indo-European (English [Germanic] and French [Romance]), one Finno-Ugric (Finnish), one a likely isolate (Japanese) and one Sinitic (Mandarin Chinese). The six samples consist of previously reported data sets for US and UK English, Finnish, French and Japanese and a new data set for Mandarin Chinese. The American English data were collected in California and New Jersey, the UK English data in Bangor, Wales, and York, England, the Finnish data in Oulu, Finland, the French data in Paris and the Japanese data in the United States, one corpus in California and the other in Washington D.C. The Mandarin data were recorded in Yorkshire, England. Although English is the dominant language in the communities where the Japanese and Mandarin data were collected, all the children recorded heard mainly the heritage language from both parents, and none produced more than four English words, if any, in the data sampled for this study.
We draw on data from the end of the single-word period, as this is a time when the child’s phonetic skills are still slowly developing, while the rate of vocabulary learning shows a steady, rather rapid increase (Ganger & Brent, 2004). Most of the typical phonological processes can be observed in this period, including those that we have picked out as possible responses to the challenge of variegation (i.e., to the production of whole words, not individual segments or clusters): reduplication, consonant harmony, inclusion of non-target glottals and glides, and consonant or syllable omission (Grunwell, 1982). Accordingly, this is an optimal developmental point for assessing the effects, if any, of different linguistic structures on children’s deployment of these processes in their attempts to produce variegated target words.
In order to ensure that the children’s lexical level was similar enough cross-linguistically to permit a meaningful comparison, we selected samples from longitudinal data based on extent of spontaneous word production, a reliable index of child lexical advance in this period (Vihman, 2019). We identified the first half-hour session in which each child produced 25 or more lexical items (words or phrases) spontaneously (see Table 1). This corresponds to the end of the single-word period for most children, with the child’s first word combinations typically occurring within a month of the ‘25-word-point’ session. (The longitudinal recording sessions for three of the Japanese children – Hiromi, Kenta and Takeru – were over an hour long. We thus chose sessions with 25 words in the first 30 minutes but included all the words produced in the session in our analyses.) Imitations are also included in the analyses but do not contribute to our estimates of vocabulary size. The data we analyse consist of both the target words that the children attempt and the child forms themselves.
* Twenty-five spontaneously produced words identified in the first 30 minutes of hour-long recording; total spontaneous words recorded indicated here.
Note: Data sources are, for US English, Vihman & McCune (1994); for UK English, Keren-Portnoy, Vihman, DePaolis, Whitaker & Williams (2010) and DePaolis, Keren-Portnoy & Vihman (2016); for Finnish, Kunnari (2000) and Savinainen-Makkonen (2001); for French, Boysson-Bardies, Vihman, Roug-Hellichius, Durand, Landberg & Arao (1992) and Wauquier & Yamaguchi (2013); for Japanese, Boysson-Bardies et al. (1992) and Ota (2003); and for Mandarin, Lou (2021).
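The session-selection criterion described above can be made concrete with a short Python sketch. It is a hypothetical illustration only: the `Token` and `Session` records, their field names and the function `first_25_word_session` are invented for this example and do not describe the format of the original corpora. We assume that lexical items are counted as distinct types, that imitations are flagged as such, and that each token carries its time from the start of the session.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    word: str           # lexical item (word or phrase) the child produced
    minute: float       # time from session start, in minutes
    spontaneous: bool   # False for imitations


@dataclass
class Session:
    child: str
    age_months: float
    tokens: list[Token] = field(default_factory=list)


def first_25_word_session(sessions, threshold=25, window=30.0):
    """Return the earliest session in which at least `threshold` distinct
    lexical items were produced spontaneously within the first `window`
    minutes, or None if no session qualifies (hypothetical helper)."""
    for session in sorted(sessions, key=lambda s: s.age_months):
        spontaneous_types = {
            t.word for t in session.tokens
            if t.spontaneous and t.minute < window
        }
        if len(spontaneous_types) >= threshold:
            return session  # whole session kept, including later minutes
    return None
```

Mirroring the procedure described for the longer Japanese recordings, the function returns the whole session: only the 25-word threshold is restricted to the first 30 minutes, so words produced later in the session (and imitations) remain available for analysis.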
Target words and child forms by prosodic structure
We began by making a preliminary analysis of the target words produced in each recorded session, counting the targets of all child word forms produced as single units, not combinations (i.e., including some forms targeting fixed expressions such as what’s this). Table 2 indicates the distribution by length in syllables. The table shows that one- and two-syllable words account for over 90% of the words the children attempt in every group except Japanese, where they account for 78%. Table 2 also shows that the occurrence of monosyllabic and disyllabic targets differs cross-linguistically. Monosyllables outweigh disyllables in both the English groups; disyllables dominate in the other languages, but most sharply in Finnish and Japanese. Because monosyllables are variegated only when they include codas or clusters and these occur only rarely in Finnish or Japanese, children learning those languages encounter variegation mainly in longer words – and this mostly means disyllables. Accordingly, we restrict our analyses largely to disyllabic adult words, which are targeted in sufficient numbers in all our language groups to permit a controlled comparison. In addition, we include target monosyllabic words and words of more than two syllables whenever they result in child disyllabic forms; this makes it possible to gain a more complete idea of the children’s responses to variegation, as 74% are variegated (109 of 148 monosyllabic or longer-than-disyllabic targets). Disyllabic targets make up 83% of the words included.
Note: The number of children in each group is indicated in parentheses after each language name, in this table only. Ordered by proportion of disyllables; group means based on individual child means for each word length.
Note. Mandarin tones are indicated using a number for each syllable in both target and child forms: 1 high level, 2 rising, 3 falling-rising, 4 falling. RED reduplication, CH consonant harmony, VRG variegation, Mono monosyllabic target. Variegated targets and child forms are shaded.
For each child learner we then established the variant forms of each word that fall into the four basic prosodic structures that we distinguish for the purposes of this analysis: variegation (or words with more than one supraglottal consonant type: VRG), full reduplication (RED), partial reduplication (or consonant harmony: CH), and OTHER, which includes words with no more than a single supraglottal consonant (e.g., VCV or glottal or glide in any position plus at most a single supraglottal consonant). In addition, we include, for adult target words only, monosyllabic (MONO) and longer-word (LONGER) targets for disyllabic child forms (e.g., French tiens ‘here, take it’ /tjɛ̃/, produced as disyllabic [tete]; Japanese inaijo ‘I’m not here’, produced as [naɪjɔ:], or omeme ‘eyes’, produced as [m:emɛʔ]). Any target form that contained two supraglottal consonants was classified as VRG, including reduplicated syllables with a coda (e.g., night-night), words with a medial cluster with two or more supraglottals (e.g., Elma), variegated monosyllables (e.g., bus) and variegated longer targets (e.g., spaghetti).
Selecting examples from one child from each language group, Table 3 shows all the types of prosodic structures targeted for that child’s disyllabic forms and all the types produced. For each child we show one form from every category represented in that child’s sample.
Note that for Finnish and Japanese, which have medial geminates, we disregard differences in either vowel or consonant duration in categorizing word forms as reduplicated, so that target words like Finnish pappa /pap:a/ ‘grandpa’ or Japanese /nen:e/ ‘sleep’ (both CVCCV) are treated as reduplicated even though, strictly speaking, the first syllables /pap/ and /nen/ are not repeated. Similarly, for Mandarin we disregard tone differences in categorising word forms, as they do not contribute to the variegation challenge per se. Furthermore, accuracy in disyllabic tone pattern production has been found to be independent of accuracy in segmental pattern production at this developmental point (Choo, 2022). Recall that the second syllable of Mandarin reduplicated words tends to have neutral tone in CDS (71% of child targets here, based on the means for each child), but some repeat the tone (20%) and a few change tone (9%). Thus ‘reduplication’ refers to the segmental sequence only.
In our analysis of child forms we count no more than one variant per word in any one structure (following Vihman, 2019), but include as many structures per word as the child forms warrant (see, in Table 3, Flora’s forms for English baby, Kazuko’s forms for Japanese /dak:o/ ‘hug’ or Keke’s forms for Mandarin /ɕiɛ4ɕiɛ0/ ‘thanks’). Note that differences in voice onset time (voicing or aspiration: VOT), which are not reliably produced in this developmental period (Macken, 1980), are not taken to constitute variegation (e.g., Deborah’s form [k̥ɪ̥g̥w̥ɛ̥] for English spaghetti is categorized as CH with velar harmony despite the voicing change); VOT contrasts are also disregarded in categorising target words. Also, clusters, which may occur in any position in English or French but only medially in Finnish, Japanese or Mandarin, complicate the picture to some extent. All word types with variegation – whether across vocalic intervals or in clusters – are combined in the variegation tallies for both targets and child forms, so that, for example, we treat as variegated both Finnish kaksi ‘two’, with its two distinct supraglottal onset consonants /k/ and /s/, and ankka ‘duck’, the sole medial consonantal slot of which is filled by a cluster of two supraglottal consonants. Finally, note that although we treat consonants with full or partial supraglottal closure differently from glides or glottals for the purpose of identifying variegated forms, reduplicated and harmony forms may consist of glides or glottals only (e.g., for targets, reduplicated French ouioui /wiwi/ ‘yes, yes’, Finnish hauhau ‘woofwoof’; for child forms, harmonized US English hello [hɔʔhi:::], Japanese /dʑadʑa/ ‘bath, pouring water’ [ʝeʝaʰə]). These are then treated as RED or CH, not as OTHER.
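The coding conventions above can be summarized as a rough decision procedure. The Python sketch below is a simplified, hypothetical rendering, not the coding tool used in this study. It assumes that each disyllabic form has already been segmented into two syllables (lists of segment strings), that a geminate is transcribed as a lengthened onset of the second syllable (e.g., pappa as p-a plus p:-a), and it uses a deliberately small segment inventory. Following the conventions described here, it ignores length marks, tone numbers and VOT differences (voicing and aspiration) and treats glides and glottals as non-supraglottal; as a further simplification, any two distinct supraglottal consonants count as variegation, without a separate check on place of articulation.

```python
# Hypothetical, minimal segment classes; a real coding scheme would need
# a full IPA feature table.
VOWEL_CHARS = set("aeiouyæɑɒɔəɛɤɪʊʌ")
GLIDES_GLOTTALS = {"h", "ʔ", "j", "w"}
VOT_PAIRS = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}


def normalize(seg: str) -> str:
    """Drop length marks, aspiration and tone digits; neutralize voicing."""
    seg = seg.rstrip(":ːʰ0123456789")
    return VOT_PAIRS.get(seg, seg)


def is_vowel(seg: str) -> bool:
    """A segment (including a diphthong such as 'eɪ') made only of vowels."""
    return all(ch in VOWEL_CHARS for ch in seg)


def classify(form: list[list[str]]) -> str:
    """Assign a disyllabic form to VRG, RED, CH or OTHER."""
    sylls = [[normalize(s) for s in syl] for syl in form]
    cons = [{s for s in syl if not is_vowel(s)} for syl in sylls]
    supra = set().union(*cons) - GLIDES_GLOTTALS
    if len(supra) >= 2:
        return "VRG"    # two or more distinct supraglottal consonants
    if sylls[0] == sylls[1]:
        return "RED"    # segmentally identical syllables
    if cons[0] & cons[1]:
        return "CH"     # the same consonant recurs in both syllables
    return "OTHER"      # at most one supraglottal consonant, not repeated


print(classify([["b", "eɪ"], ["b", "i"]]))        # baby [beɪbi]
print(classify([["ɕ", "iɛ4"], ["ɕ", "iɛ0"]]))     # Mandarin 'thanks'
print(classify([["ʔ", "ə"], ["f:", "æ"]]))        # sofa [ʔəf:æ]
```

Variegation is checked first, so a reduplicated target with a coda such as night-night comes out as VRG, in line with the classification rules above; glide- or glottal-only forms such as hello [hɔʔhi] fall under RED or CH rather than OTHER.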
Results: Variegation in targets and child word forms
We begin with an overview of the relative frequency of occurrence of variegation in the target words children produced as disyllables in each language group. Table 4 presents the median counts of each prosodic type targeted by the children. In Figure 1, these are converted to proportions with respect to the total disyllables produced by each child and presented as means for language groups in order to allow comparisons within and between languages. Here we see, in direct response to RQ1, that variegation is by far the most common prosodic structure attempted overall, accounting for 52% of the target words on average. Nevertheless, we can also see that the language groups differ, with Japanese (at 42%) and Mandarin (at 29%) having the smallest proportion of variegated targets. Mandarin is the only language in which reduplication (at 52%) exceeds variegation in target words (see Fig. 1).
Note. VRG variegation (shaded), RED reduplication, CH consonant harmony, OTHER no more than one supraglottal consonant, MONO or LONGER target with one syllable or with more than two syllables, produced by the child as a disyllable.
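The group proportions plotted in Figure 1 are means of individual children's proportions, not proportions of pooled counts. A minimal Python sketch of that calculation follows; the function names are hypothetical and the counts are invented purely to show the arithmetic, not data from this study. Each child's counts are first converted to proportions of that child's total, and these proportions are then averaged within the group, so every child contributes equally regardless of how many forms he or she produced.

```python
from statistics import mean


def child_proportions(counts):
    """Convert one child's counts per structure into proportions of that
    child's total (assumes the child produced at least one form)."""
    total = sum(counts.values())
    return {structure: n / total for structure, n in counts.items()}


def group_means(children, structures):
    """Average per-child proportions, so each child is weighted equally."""
    per_child = [child_proportions(c) for c in children]
    return {s: mean(p.get(s, 0.0) for p in per_child) for s in structures}


# Invented counts for two hypothetical children in one language group.
children = [
    {"VRG": 6, "RED": 2, "CH": 2},
    {"VRG": 10, "RED": 30},
]
print(group_means(children, ["VRG", "RED", "CH", "OTHER"]))
```

With these invented counts, the group mean for VRG is (0.6 + 0.25) / 2 = 0.425, whereas pooling the counts would give 16 / 50 = 0.32; the difference arises because the second child contributes four times as many forms.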
The distribution of target word counts by language and prosodic structure was submitted to a mixed-effects Poisson regression analysis. As fixed effects, Language and Structure were deviation-coded against the grand mean for each category, with French and OTHER chosen as the levels coded as -1. The procedure was run using the lme4 package in R. The initial model also contained by-participant random intercepts and slopes for Language and Structure (Count ~ Structure * Language + (1 + Structure * Language | Child)), but it failed to converge. Model reduction was performed by simplifying the random-effects structure, first by removing the interaction between the slopes and then the individual slopes, first for Language, then for Structure, until the model converged without a singular fit. The final model included an interaction term for Language and Structure and random intercepts for individual children (Count ~ Structure * Language + (1 | Child)). The full results are presented in Appendix A. Against the expected frequencies based on the overall distributions along Language and Structure, the observed frequencies for VRG targets were significantly higher in US English and Finnish and lower in UK English and Japanese. Observed frequencies of RED targets were significantly higher than expected in Japanese and Mandarin but lower in the two varieties of English. Frequencies of CH targets were higher in Finnish but lower in Japanese. Finally, both US and UK English had more MONO/LONGER targets than expected.
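The deviation coding mentioned above can be illustrated with a small hypothetical Python helper. The model itself was fitted in R with lme4; this sketch reproduces only the contrast coding, not the mixed-effects fit. A factor with k levels becomes k - 1 numeric columns, and the reference level (French for Language, OTHER for Structure) is coded -1 in every column, so each column sums to zero. This is the same layout that R's `contr.sum` produces when the reference level is ordered last, and it is what makes each main-effect coefficient a deviation from the unweighted mean across levels on the log-count scale.

```python
def deviation_codes(levels, reference):
    """Return deviation (sum-to-zero) contrast rows for a factor.

    A factor with k levels gets k - 1 columns. Each non-reference level
    has 1 in its own column and 0 elsewhere; the reference level has -1
    in every column, so every column sums to zero.
    """
    others = [lv for lv in levels if lv != reference]
    if len(others) != len(levels) - 1:
        raise ValueError("reference must be exactly one of the levels")
    return {
        lv: [-1] * len(others) if lv == reference
        else [1 if lv == col else 0 for col in others]
        for lv in levels
    }


languages = ["US English", "UK English", "Finnish", "French",
             "Japanese", "Mandarin"]
structures = ["VRG", "RED", "CH", "OTHER", "MONO/LONGER"]

language_codes = deviation_codes(languages, reference="French")
structure_codes = deviation_codes(structures, reference="OTHER")
for level, row in language_codes.items():
    print(f"{level:<11} {row}")
```

In an R model formula such as `Structure * Language`, the interaction columns are formed as products of these Language and Structure columns.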
Next, as a first step toward responding to RQ2, we consider the distribution of the disyllabic child word forms produced for variegated targets in each language group. Table 5 presents the median counts of each prosodic type and Figure 2 shows the distributions as proportions with respect to the total disyllables produced by each child for variegated targets. The results reveal that children learning Mandarin or Japanese produce a higher proportion of disyllabic variegated forms for variegated targets than do the children learning any of the European languages. This is striking, given the relatively small proportion of variegated targets they attempt (Figure 1).
Note. VRG variegation (shaded), RED reduplication, CH consonant harmony, OTHER no more than one supraglottal consonant. Means for each group based on individual means in each structure.
As with the target words, we conducted a mixed-effects Poisson regression to compare the distributions of the child forms belonging to each prosodic category. Details of the category coding and model selection procedure were the same as those used for the target analysis. The final model consisted of both Language and Structure as fixed effects and by-participant random intercepts. The full results are presented in Appendix B. Against the expected frequencies based on the overall distributions along Language and Structure, the observed frequencies for VRG child forms were significantly higher in Japanese and Mandarin and lower in Finnish. Observed frequencies of RED child forms were significantly higher than expected in Mandarin but lower in UK English. Frequencies of CH child forms were higher in Finnish but lower in Mandarin. On the whole, these outcomes are broadly consistent with those of the target analysis, but one crucial difference is that Finnish has a relatively low proportion of child VRG forms despite having a relatively high proportion of VRG targets; conversely, Japanese and Mandarin have relatively high proportions of child VRG forms despite having relatively low proportions of VRG targets (although for Mandarin the lower frequency of VRG targets did not reach significance).
Child responses to variegated targets
Having established the extent and distribution of the children’s responses to variegated targets in each language group, we now examine these results more closely. Our target word analysis showed that both Japanese and Mandarin words provide a high proportion of reduplication and a small proportion of harmony alongside a comparatively low proportion of variegation. Our child data show that the use of reduplication and consonant harmony tends to be complementary: all groups make use of both, but harmony is rare in Mandarin while it is about equally common in the other groups. The balance between the two processes roughly reflects their occurrence in the targets; in particular, the high proportion of reduplication in Mandarin child forms seems to mirror its frequency in the targets. Combining the proportions for reduplication and harmony, we can see that, for all groups, some form of repetition is a resource for responding to variegation in target words. The third option, OTHER, is less clearly related to the proportions seen in the targets. In the next section we look more closely at the cases where children have recourse to that option in adapting variegated targets.
Child uses of OTHER
In the European language groups, OTHER accounts for a higher proportion of child forms for variegated targets (Fig. 2) than of targets (Fig. 1), but that is not the case in Japanese or Mandarin. We carried out an analysis to determine whether children in the different language groups also differ in their preferred single-consonant output forms. Table 6 illustrates children's adaptations of variegated target words to OTHER forms, with at least one example for each subcategory where possible; empty cells indicate that no instances occurred in our data.
As some subcategories occurred only rarely, for the purpose of quantification we combined (i) forms with either onset or medial glide, (ii) forms either lacking an onset or having a glottal onset and (iii) forms lacking a medial consonant or having a glottal medial consonant. Table 7 shows the distribution of OTHER forms by subcategory for each language.
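The merging step just described amounts to a many-to-one relabelling followed by a tally. The sketch below assumes hypothetical labels for the six fine-grained subcategories; the label names and the example data are illustrative only and are not taken from our coding sheets.

```python
from collections import Counter

# Hypothetical labels for the fine-grained OTHER subcategories of Table 6,
# mapped onto the three merged groups (i)-(iii) used for quantification.
MERGED_GROUP = {
    "onset_glide": "glide",
    "medial_glide": "glide",
    "no_onset": "no or glottal onset",
    "glottal_onset": "no or glottal onset",
    "no_medial": "no or glottal medial consonant",
    "glottal_medial": "no or glottal medial consonant",
}


def merged_proportions(labels):
    """Collapse fine-grained OTHER labels into the merged groups and
    return each group's share of the forms."""
    counts = Counter(MERGED_GROUP[label] for label in labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# One hypothetical child's OTHER forms, already coded by subcategory.
print(merged_proportions(["no_onset", "glottal_onset", "medial_glide", "no_onset"]))
# {'no or glottal onset': 0.75, 'glide': 0.25}
```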
Note. The most-used structures in each group are in boldface. Group means are based on the individual children's means for each structure.
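The averaging procedure in this note (group means computed from each child's own proportions rather than from pooled counts) can be sketched as follows. The data structure and counts are hypothetical; the point illustrated is that each child contributes equally to the group mean however many forms that child produced, so the result can differ from a pooled proportion.

```python
def group_mean_proportions(children):
    """Mean over children of each child's proportion of forms in each structure.

    `children` maps a child ID to {structure: count of forms}. Children with
    no forms at all are skipped, since they have no proportions to contribute.
    """
    structures = sorted({s for counts in children.values() for s in counts})
    per_child = []
    for counts in children.values():
        total = sum(counts.values())
        if total == 0:
            continue
        per_child.append({s: counts.get(s, 0) / total for s in structures})
    return {s: sum(p[s] for p in per_child) / len(per_child) for s in structures}


def pooled_proportions(children):
    """Proportions computed from counts summed over all children, for comparison."""
    pooled = {}
    for counts in children.values():
        for s, n in counts.items():
            pooled[s] = pooled.get(s, 0) + n
    total = sum(pooled.values())
    return {s: n / total for s, n in pooled.items()}


# Hypothetical group: child A produced 2 forms, child B produced 10.
group = {"A": {"VRG": 1, "OTHER": 1}, "B": {"VRG": 9, "OTHER": 1}}
print(group_mean_proportions(group)["VRG"])  # mean of 0.5 and 0.9
print(pooled_proportions(group)["VRG"])      # 10 of 12 forms
```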
Forms lacking a word-initial supraglottal consonant account for a particularly high proportion of the Finnish and French forms. Heavy use of onset consonant omission, affecting even early-learned consonants, has been reported elsewhere for both Finnish (e.g., nalle 'teddy' [al:e], pallo 'ball' [al:o]) and French (e.g., debout 'standing' [əbu], garçon 'boy' [aʐa]) and ascribed to the perceptual effect of medial geminates in Finnish (Savinainen-Makkonen, 2007) and of phrase-final accent in French (Vihman & Croft, 2007). Both these effects of perceptual salience (lengthened medial consonant closure, final syllable lengthening) have been shown experimentally to draw attention away from (and thus weaken the representation of) the word-initial consonant (cf. Vihman & Majorano, 2017, on geminates in Italian; Hallé & Boysson-Bardies, 1996, and Vihman, Nakai, DePaolis & Hallé, 2004, on final-syllable accent in French; Segal et al., 2020, on iambic accent in Hebrew).
In English, on the other hand, strong first-syllable stress means that initial consonant omission is rare (Vihman et al., 2004); the children's VCV forms seldom derive from disyllables with a supraglottal initial consonant but instead include vowel-initial target words, monosyllables and longer words with truncated first syllables (Table 6). Instead of initial consonant omission, glide substitution is the most common pattern; this relates in part to the common process of gliding liquids (e.g., balloon [bɛ:[j]ʊ] or, from another child, [ɪwu:un]), but glides in the child forms do not always relate to a target liquid (Table 6). Japanese learners make roughly equivalent use of each of the subcategories. Finally, the very few OTHER child forms for variegated targets in Mandarin all involve glide substitution.
The structure of the input language clearly affects children's responses to variegation in every category, leading to differences in the relative use of full reduplication and consonant harmony (partial reduplication) and in the reduction of the variegated form to a single-consonant output (OTHER), despite the fact that the challenge and the 'in-principle solutions' involved must be similar cross-linguistically. Our analysis also revealed that some language groups handle the challenge of variegation more successfully than others, as evidenced by the relatively high proportion of variegated forms that Mandarin- and Japanese-learning children produce for variegated targets. We turn our attention now to the second part of RQ2: how structural differences between the languages might lead to differences in children's ability to reproduce variegated targets in general.
The effect of structural predictability on child responses to variegation
One source of cross-linguistic differences in child ability to reproduce variegated targets might be the relative complexity of the shapes of those target words. To test that possibility, we reanalysed the European-language data, leaving out all words that include either a word-final consonant other than a nasal or a supraglottal consonant cluster other than medial nasal + consonant (word-final nasals and medial nasal + consonant clusters being the only such structures that Japanese and Mandarin allow). In other words, we reduced the variegated target words that the children attempted in the European languages to just those that conform to the more constrained structures of Japanese and Mandarin, to test whether the children were better able to produce variegated forms for those simpler variegated targets. A positive result would suggest that the greater success of the children learning Japanese and Mandarin at producing variegated forms for variegated targets could be ascribed to the simpler shape of their target words, which would make them easier to match. Table 8 presents the median counts of each prosodic type targeted by the children. In Figure 3, these are converted to proportions of the total disyllables produced by each child and presented as means for language groups to allow comparisons within and between languages. This analysis does not affect Japanese and Mandarin, so the data from those languages are simply repeated from Table 4 and Figure 1.
Note. VRG variegation (shaded), RED reduplication, CH consonant harmony, OTHER no more than one supraglottal consonant, MONO or LONGER target with one syllable or with more than two syllables that is produced by the child as a disyllable. Data for Japanese and Mandarin are the same as those from Table 4.
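The structural filter described above can be expressed as a predicate on a word's consonant-vowel skeleton. The sketch below is a minimal illustration under assumptions of our own, not the coding procedure actually used: each target is reduced to a string over 'V' (vowel), 'N' (nasal) and 'C' (any other consonant), and a geminate is written as a single consonant, so that medial geminates (as in Finnish pallo) are not treated as clusters. How geminates were handled is not specified above, so that choice is an assumption.

```python
import re


def fits_reduced_template(skeleton):
    """Return True if a target word survives the structurally reduced analysis.

    Hypothetical coding: 'V' = vowel, 'N' = nasal consonant, 'C' = any other
    consonant, with a geminate written as one consonant. A word is kept only if
      * it does not end in a non-nasal consonant, and
      * every consonant cluster is word-medial and consists of a nasal
        followed by one consonant.
    """
    if skeleton.endswith("C"):
        return False
    for match in re.finditer(r"[NC]{2,}", skeleton):
        is_medial = match.start() > 0 and match.end() < len(skeleton)
        is_nasal_plus_c = len(match.group()) == 2 and match.group()[0] == "N"
        if not (is_medial and is_nasal_plus_c):
            return False
    return True


for skeleton in ["CVCV", "CVNCV", "CVCVN", "CVCCV", "CCVCV", "CVCVC"]:
    print(skeleton, fits_reduced_template(skeleton))
# kept: CVCV, CVNCV, CVCVN; excluded: CVCCV, CCVCV, CVCVC
```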
We ran a mixed-effects Poisson regression to compare the distributions of the target words belonging to each prosodic category, following the same model selection procedure used for the analysis of the data in Table 4 (see Appendix C for full results). The final model consisted of fixed effects for Language and Structure and by-participant random intercepts. Under this analysis, VRG targets were still significantly more frequent than expected in Finnish but less frequent in UK English. RED targets were significantly more frequent than expected in Japanese and Mandarin but less frequent in UK English. Conversely, CH targets were more frequent in UK English but less frequent in Japanese and Mandarin. MONO and LONGER targets were more frequent in the two English dialects. This analysis of the stripped-down target words shows that the reduction affects the European languages differently in terms of VRG targets: Finnish, with its restriction of codas to coronals and of clusters to word-medial position only, remains little changed, while the number of variegated target words in English and French is reduced.
We now turn to the distribution of the disyllabic child word forms produced for variegated targets in each language group, again excluding targets with word-final consonants other than nasals and with clusters other than medial nasal + consonant. Table 9 presents the median counts of each prosodic type in this stripped-down data set, and Figure 4 shows the distributions as proportions of the total disyllables produced by each child. The Poisson regression on these data shows few differences from the analysis of the entire data set (full results are presented in Appendix D; the model selection procedure and the final model structure were the same as in the other analyses above). Most importantly, VRG forms were significantly more frequently produced by Japanese and Mandarin children and less frequently by Finnish children.
Note. VRG variegation (shaded), RED reduplication, CH consonant harmony, OTHER no more than one supraglottal consonant. Data for Japanese and Mandarin are the same as those from Table 5.
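The four prosodic categories defined in these notes can be illustrated with a small classifier for disyllabic forms. This is a simplified sketch with assumptions of our own: each syllable is given as an (onset, vowel) pair, codas are ignored, an empty onset, the glottals [ʔ h] and the glides [j w] do not count as supraglottal consonants (following our definition of variegation), vowel length marks are ignored when deciding whether two syllables are identical (so that snowman [mɪmɪ:] counts as reduplication), and a form is treated as RED or CH only when its repeated consonant is supraglottal.

```python
# Hypothetical set of onsets that do not count as supraglottal consonants:
# no onset, the glottals [ʔ h] and the glides [j w].
NON_SUPRAGLOTTAL = {"", "ʔ", "h", "j", "w"}


def strip_length(vowel):
    """Drop IPA (ː) and ASCII (:) length marks so that [ɪ] and [ɪ:] match."""
    return vowel.replace("ː", "").replace(":", "")


def classify_disyllable(syllables):
    """Assign a disyllabic form to VRG, RED, CH or OTHER.

    `syllables` is a pair of (onset, vowel) tuples; codas are ignored.
    """
    (c1, v1), (c2, v2) = syllables
    supraglottal = [c for c in (c1, c2) if c not in NON_SUPRAGLOTTAL]
    if len(supraglottal) < 2:
        return "OTHER"   # no more than one supraglottal consonant
    if c1 != c2:
        return "VRG"     # two different supraglottal consonants
    if strip_length(v1) == strip_length(v2):
        return "RED"     # two identical syllables
    return "CH"          # same consonant, different vowels


examples = {
    "bagel [bʌbu]": [("b", "ʌ"), ("b", "u")],
    "snowman [mɪmɪ:]": [("m", "ɪ"), ("m", "ɪ:")],
    "lady [lɛji]": [("l", "ɛ"), ("j", "i")],
    "Nicky [ɪnːi]": [("", "ɪ"), ("n", "i")],
    "hypothetical [bʌgi]": [("b", "ʌ"), ("g", "i")],
}
for label, form in examples.items():
    print(label, classify_disyllable(form))
# expected: CH, RED, OTHER, OTHER, VRG
```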
In other words, the reanalysis, which filters the European target words through the syllable-structure constraints of Japanese and Mandarin, has little effect on the proportion of child variegated forms produced in response to those targets. Finnish targets are very little changed by this reduction, and child production of variegation for variegated targets remains very low in Finnish in comparison to English and French. The English and French targets changed more substantially (Table 8), yet English and French child variegated forms for variegated targets remain at approximately the same level as in the full analysis (Table 5). Overall, the proportion of variegated forms that we find for Japanese and Mandarin child forms now appears even more extreme, given the lower mean for the European languages.
From this we conclude that the greater proportion of variegated forms that Japanese- and Mandarin-learning children produce for variegated targets cannot be accounted for – or not entirely – by the simplicity, or the restricted set of structures, represented by the particular adult word forms that they are trying to produce: reducing the targets of children learning European languages to comparably simple structures does not result in a greater proportion of variegation in the children's word forms. In short, although the overall simplicity of syllable structure in these East Asian languages appears to affect child production of variegated targets, it cannot tell the whole story; the children's wider experience of the language must be playing a role. The challenge for children learning languages such as Finnish, English and French, then, seems to lie not so much in the specific difficulty of individual variegated words (in which case a selection of simpler variegated forms would pose a lesser challenge) as in the overall complexity of the phonological structure of those languages. That is, variegation appears to present a problem for the children not only in terms of handling the occurrence of more than one supraglottal consonant per word, but also in terms of retaining the many different structures and positional combinations of such consonants that those languages allow.
We therefore propose that the crucial difference between the East Asian languages and the European languages analysed here may be the number of different syllables required to map out the words of the language. The smaller the syllable inventory, the easier it should be for a learner to retain, represent and reproduce words by combining the possible syllables, allowing them to produce variegated outputs more successfully. To test this idea, we provide two analyses of syllable inventory size in the languages of interest.
The first column in Table 10 shows the syllable inventory size of (UK) English, Finnish, French, Japanese and Mandarin (see Footnote 5). Note that durational differences in both vowels and consonants were respected, but tonal differences were disregarded for Mandarin, so that syllables with different tones were considered the same if they had the same segmental structure (see Footnote 6). To check whether the pattern seen in the adult inventories extends to a typical child lexicon, we applied the same principle to our data by merging the disyllabic target words of the various children for each language group and calculating the number of unique syllables that make them up. Here again we disregarded tone in classifying Mandarin syllables. The relevant figures, shown in the third column of Table 10, have been scaled to per-10-word values to allow direct cross-linguistic comparison. The results show that children learning Mandarin or Japanese need only about 10 distinct syllables to represent or produce 10 disyllabic words (that is, 20 syllable tokens), whereas children learning English, French and also Finnish (despite its relatively simple syllable structure) need more.
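The child-lexicon count described above can be sketched as follows: pool the disyllabic targets for a language group, collect the distinct syllables, optionally ignore tone, and rescale to a per-10-word rate. Several details are assumptions rather than a documented procedure: syllabification is taken as given, tone is assumed to be written as a trailing digit (numbered pinyin, e.g. ma1), and the rescaling is a simple linear division by the number of words. That is only one way of making groups with different numbers of words comparable, since type counts do not in general grow linearly with word counts.

```python
import re


def syllable_types_per_10_words(words, ignore_tone=False):
    """Distinct syllables per 10 disyllabic words.

    `words` is a list of already syllabified disyllabic words, each a pair of
    syllable strings. With ignore_tone=True, a trailing tone digit (as in
    numbered pinyin such as 'ma1') is stripped, so that syllables differing
    only in tone are counted as one type.
    """
    types = set()
    for syllables in words:
        for syllable in syllables:
            types.add(re.sub(r"\d+$", "", syllable) if ignore_tone else syllable)
    return len(types) / len(words) * 10


# Four familiar Mandarin kinship words in numbered pinyin
# (5 = neutral tone): mama, baba, gege, meimei.
kin_terms = [("ma1", "ma5"), ("ba4", "ba5"), ("ge1", "ge5"), ("mei4", "mei5")]
print(syllable_types_per_10_words(kin_terms, ignore_tone=True))   # 10.0
print(syllable_types_per_10_words(kin_terms, ignore_tone=False))  # 20.0
```

Because each reduplicated word reuses a single segmental syllable, this toy lexicon yields exactly 10 syllable types per 10 words once tone is set aside.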
Note. All data for adult languages, except Mandarin, are the number of unique syllables found in the most frequent 20,000 words in adult corpora (Oh, 2015). The count for adult Mandarin is the number of segmentally defined syllable types in the Xinhua dictionary (Xia, 2000). Estimates for child data are based on the number of syllable types per 10 disyllabic words found in the corpora.
A remaining puzzle is the far lower rate of child variegated production in Finnish as compared to English and French. The Finnish syllable inventory is larger than that of Japanese or Mandarin, falling between those of English and French. Finnish phonological structure is less restrictive than that of the two Asian languages – it allows all five of its coronal consonants (/t, s, n, l, r/) to occur as codas, both word-medially and finally – but more restrictive than English or French. As in Japanese, the nasal in Finnish nasal + stop clusters assimilates in place to the adjacent stop; the velar nasal can also occur as a geminate, under morphophonological alternation with /ŋk/. Given the size of the syllable inventory involved, it is not surprising that Finnish children use fewer variegated productions for variegated targets than the Japanese and Mandarin learners. But what can explain the low rate of variegation in child Finnish as compared with English and French?
The key observation here is that Finnish children respond more often to variegation with the category OTHER than children learning any of the other languages, primarily due to omission of the word-initial consonant – and this is the case even though Finnish consistently stresses the first syllable of content words. Finnish children clearly pay reduced attention to the word-initial consonant (Savinainen-Makkonen, 2000); we assume that this is related to the presence of medial geminates (Vihman & Majorano, 2017). In fact, of the 25 target forms with geminates, all but one (pallo 'ball' > [a.o]) are produced as VCV forms; complementarily, of the 25 Finnish child forms produced with initial consonant omission for consonant-initial variegated targets, just four of the target words have medial singletons.
In French the accentual lengthening of the second syllable of disyllabic words similarly draws attention away from the onset consonant (Hallé & Boysson-Bardies, 1996; Vihman et al., 2004); the considerable difference in child variegation in French as compared to Finnish appears to be due to the higher use of reduplication and harmony in Finnish.
Japanese also has geminates, but as noted earlier, these are less strongly contrasted than is the case in Finnish and less reliably produced in input speech; Japanese children's production of duration also takes longer to become adult-like (Aoyama, 2000). Furthermore, only half as many of the words that the children target have geminates in Japanese (24%, averaged across the 7 children) as in Finnish (51%, for the 5 children). Both the weaker acoustic difference and the lower input frequency help to account for the lesser impact of geminates on child production in Japanese. This leads us to conclude that the basic difference between Finnish and Japanese is likely representational: Finnish children appear to best retain the medial consonant while Japanese children retain something of the variegated sequence.
We set out to establish the extent to which children learning languages of differing phonological structure must face the challenge of producing variegated adult words and to explore how differently they respond to that challenge. Our findings are necessarily limited by the size of our language groups and the relatively small number of words available for analysis for each child. Nevertheless, we found clear answers to our first two research questions. First, we found that variegated adult targets made up over half the words the children attempted to say overall, although there were differences by language. Secondly, we found clear differences by language group in child responses to target-word variegation. The Mandarin and Japanese child forms for variegated targets were considerably more often variegated than those of the other language groups.
We found evidence that these differences are related to structural differences in the adult languages. There were sharp differences by language in the children’s uses of the processes that reduce the numbers of different supraglottal consonants in word forms. The children learning Mandarin tended to produce full reduplication, reflecting the high incidence of reduplication in the input, while the other groups made greater use of partial reduplication, or consonant harmony; when these are taken together, however, we see that they amount to much the same process, which reflects children’s preference for forms with consonant repetition rather than variegation. In addition, initial consonant omission was strongly represented in French and Finnish, a likely reflection of the adult-language rhythmic differences mentioned earlier; (glottal)VCV forms were less often produced in English and Japanese and not at all in Mandarin.
We return now to the more fundamental question of the relative importance, for the shaping of children's word forms, of the maturity of their articulatory skills in comparison with the adequacy of their long-term representations of the words they attempt to produce. Note that vocal practice, first in prelinguistic babbling and then in early word production, contributes to both aspects: vocal practice necessarily improves articulatory skills, but it also lends salience (and thus memorability) to aspects of input speech that are like what the child is producing, or that are, in other words, familiar from their own often-repeated output (Majorano, Vihman & DePaolis, 2014; Vihman, DePaolis & Keren-Portnoy, 2014; Waterson, 1971). Beyond that, vocal practice lays down the foundations of phonological memory, or the ability to retain novel word forms (Keren-Portnoy et al., 2010; Vihman, 2022); this helps to account for the fact that lexical advance itself supports further word learning (Fernald, Swingley & Pinto, 2001; Torkildsen, Hansen, Svangstu, Smith, Simonsen, Moen & Lindgren, 2009). In short, both articulatory and representational factors undoubtedly play a role in early word learning; the debate over what supports accurate production likely cannot be resolved in favour of one or the other alone.
We argued that if variegation were difficult for articulatory reasons alone, then we should see essentially the same degree of difficulty – especially in relation to variegated targets – in children learning any language. And indeed we do see that all of the children respond to variegated targets with less variegation than is found in the targets and produce more of the simpler prosodic structures (reduplication, consonant harmony or forms involving consonant omission or reduction to glottals or glides). However, we found that children learning different languages differ in the extent to which they resort to solutions that do not require variegation. Interestingly, these differences do not accord with the degree to which the languages present children with the need to produce variegation. Although we might expect that experiencing proportionately more variegation in the words they attempt provides children with more opportunities for vocal practice with variegation, our results go in the opposite direction: children learning either Mandarin or Japanese, whose targets are less often variegated, still succeed more often in producing variegated forms for those targets than do children in the other groups. Thus, our results suggest that it is highly unlikely that immature articulatory abilities are the single most important source of difficulty with producing words in an adult-like manner around the end of the single-word period.
Although we have found that the structure of the input language does affect child responses to variegation, we have no conclusive answer as to just how those structural effects translate into production differences. Target language shaping is partially rooted in perceptual salience: recall our earlier comments on the effects of both geminates and accentual patterning on infant word-form recognition. However, as noted above, the ambient language structure also affects infants’ production experience, based on the options the language provides, which leads to ongoing accommodation to the structural requirements of the language. That is, sensorimotor experience (production practice) helps to develop children’s articulatory skills while at the same time tuning up their sensitivity to the phonological patterning of their language.
The range of syllable choices in a language and differences in prosodic structure both seem to contribute to the differences in child responses to variegation (see Post & Payne, 2018). Languages like English, French and Finnish, with their greater range of syllables, are more challenging for some aspects of early word learning than Japanese or Mandarin. Both hearing and producing forms that are phonotactically simple and relatively predictable (disregarding effects of lexical tone, pitch change or segmental duration) make remembering such forms, or planning their production, an easier task than it can be when each word may involve any one of a considerable range of syllable shapes and of possible consonants for each syllabic slot. That multiplicity of options must increase the challenge of remembering and/or planning word production.
Furthermore, it has been demonstrated that overall familiarity (from production practice) with subcomponents (syllables or segments) of a novel form leads to greater accuracy in production (Cychosz, Erskine, Munson & Edwards, 2021; Dollaghan, Biber & Campbell, 1995; Keren-Portnoy et al., 2010). This suggests that having a less diverse inventory of possible syllable types to learn and produce might well support more ambitious and more accurate word learning, as the child's repertoire of distinct syllabic motor routines could more readily be recruited to first retain and then reproduce novel patterns. Here Japanese and Mandarin would present some advantages – although learning, for each lexical item, contrastive vowel and consonant duration and pitch accent (in Japanese) or tone patterns (in Mandarin) adds a layer of difficulty that we have disregarded here. In general, the lesson we draw from our comparison of 'stripped down' English, Finnish and French with Japanese and Mandarin is that it is not just the structure of individual word targets but the entire system or set of possible structures a child has experienced that shapes child responses to variegation.
The authors declare no competing interests.
The authors thank Ryan Quinn, undergraduate student at York, for asking the question that got us started.
Appendix A: Model estimates for the frequency of target words
Notes. Significance codes: ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
Appendix B: Model estimates for the frequency of child forms
Notes. Significance codes: ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
Appendix C: Model estimates for the frequency of target words (structurally-adjusted)
Notes. Significance codes: ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
Appendix D: Model estimates for the frequency of child forms (structurally-adjusted)