
Frequency, redundancy, and context in bilingual acquisition

Published online by Cambridge University Press:  12 December 2024

Paul Ibbotson
Affiliation:
School of Education, Childhood, Youth and Sport, The Open University, Milton Keynes, UK
Stefan Hartmann*
Affiliation:
Faculty of Arts and Humanities, German Department, Heinrich-Heine-University Düsseldorf, Germany
Nikolas Koch
Affiliation:
Institute for German as a Foreign Language, Ludwig-Maximilians-Universität München, Germany
Antje Endesfelder Quick
Affiliation:
Department of British Studies, University of Leipzig, Germany
*Corresponding author: Stefan Hartmann; Email: hartmast@hhu.de

Abstract

We report findings from a corpus-based investigation of three young children growing up in German-English bilingual environments (M = 3;0, Range = 2;3–3;11). Based on 2,146,179 single words and two-word combinations in naturalistic child speech (CS) and child-directed speech (CDS), we assessed the degree to which the frequency distribution of CDS predicted CS usage over time, and systematically identified CS that was over- or underrepresented in the corpus with respect to matched CDS baselines. Results showed that CDS explained 61% of the variance in CS single-word use and 19.3% of the variance in two-word combinations. Furthermore, the bilingual nature of the over- or underrepresented CS was partially attributable to factors beyond the corpus statistics, namely individual differences between children in their bilingual learning environment. In two out of the three children, overrepresented two-word combinations contained higher levels of syntactic slot redundancy than underrepresented CS. These results are discussed with respect to the role that redundancy plays in producing semiformulaic slot-and-frame patterns in CS.
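
The logic of predicting CS from CDS and then reading over- and underrepresentation off the residuals can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: it assumes a simple least-squares line fitted to raw frequencies of items attested in both CS and CDS, and the function names (`fit_line`, `r_squared_and_residuals`, `top_outliers`) and the toy counts are hypothetical. Positive residuals mark items the child uses more often than the CDS baseline predicts, negative residuals mark items used less often.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired values."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared_and_residuals(cds, cs):
    """Regress CS frequency on CDS frequency for shared items.

    Assumption: only items attested in both dictionaries are compared.
    Returns (R^2, {item: residual}).
    """
    items = sorted(set(cds) & set(cs))
    x = [cds[w] for w in items]
    y = [cs[w] for w in items]
    slope, intercept = fit_line(x, y)
    residuals = {w: cs[w] - (intercept + slope * cds[w]) for w in items}
    ss_res = sum(r ** 2 for r in residuals.values())
    my = mean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, residuals


def top_outliers(residuals, k):
    """Return the k most overrepresented and k most underrepresented items."""
    ranked = sorted(residuals, key=residuals.get)
    over = ranked[-k:][::-1]   # largest positive residuals first
    under = ranked[:k]         # most negative residuals first
    return over, under


# Toy, made-up counts purely for illustration.
toy_cds = {"das": 40, "the": 35, "ball": 10, "nein": 8}
toy_cs = {"das": 30, "the": 12, "ball": 9, "nein": 15}
r2, res = r_squared_and_residuals(toy_cds, toy_cs)
over, under = top_outliers(res, k=1)
print(round(r2, 3), over, under)
```

With the caption of Figure 3 in mind, `k=25` would correspond to taking the 25 highest and 25 lowest residuals.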

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Table 1. Overview of the number of utterances in the three datasets. The CS data have been tagged for whether they are German, English, or code-mixed. The remainder up to 100% was categorised as ambiguous (e.g., one-word utterances like hm? that cannot be clearly assigned to one of the two languages).

Figure 1. Top graphs show the proportion of variance in CS frequency that is explained by CDS frequency over the course of development: single words = orange line, two-word combinations = blue line, linear trends = dotted lines. Numbers in the table give the total number of tokens entered into the analyses; R² expresses the strength of the association between CDS and CS, and the p-value indicates whether it changes significantly over time.

Table 2. The disassociation of language and under- and overrepresentativeness by single words and two-word combinations and associated Chi-squared statistics.

Figure 2. Bilingual character of the outliers over the course of development. German = orange, English = blue; y-axis = outliers, x-axis = age.

Figure 3. Redundancy of CS two-word combinations over the course of development. Overrepresented use = blue line and underrepresented = orange line. As before, the y-axis is limited to 25 because we limited ourselves to the 25 highest and lowest residuals.