Early language experience in a Papuan community

Abstract The rate at which young children are directly spoken to varies due to many factors, including (a) caregiver ideas about children as conversational partners and (b) the organization of everyday life. Prior work suggests cross-cultural variation in rates of child-directed speech is due to the former factor, but has been fraught with confounds in comparing postindustrial and subsistence farming communities. We investigate the daylong language environments of children (0;0–3;0) on Rossel Island, Papua New Guinea, a small-scale traditional community where prior ethnographic study demonstrated contingency-seeking child interaction styles. In fact, children were infrequently directly addressed and linguistic input rate was primarily affected by situational factors, though children's vocalization maturity showed no developmental delay. We compare the input characteristics between this community and a Tseltal Mayan one in which near-parallel methods produced comparable results, then briefly discuss the models and mechanisms for learning best supported by our findings.

A key puzzle for developmental language science is then uncovering how the human cognitive toolkit for language learning can flexibly adapt to the variable circumstances under which it occurs, including circumstances in which CDS is infrequent, is produced in large part by other children, or is primarily restricted to a small number of activities (Brown, 2014; Casillas, Brown & Levinson, 2019; Gaskins, 2006; Ochs & Schieffelin, 1984). Resolving this puzzle requires researchers to find ways to track the distribution and characteristics of linguistic input over multiple interactional contexts, across developmental time, between families, and across different cultural groups. In what follows we explore two major factors that may impact children's linguistic environments: culturally held ideas about talking to children, and situational features of everyday life. We build a case for testing both sources of variation using clips sampled from recordings of whole waking days at home. We then use this approach to report on the language environments of children under 3;0 in one child-centric subsistence farming society (Yélî (Papuan), Rossel Island, Papua New Guinea), and compare the findings to a parallel set of results from another subsistence farming society that is, by contrast, not child-centric (Tseltal Mayan, Tenejapa, Mexico).

Ideological and situational variation in CDS
Caregivers' personal and cultural notions about how children should develop as members of the broader language community influence the prevalence and style of their child-directed talk (Gaskins, 2006; Harkness & Super, 1996; Ochs & Schieffelin, 1984; Rowe, 2008). For example, extensive ethnographic research among multiple, distinct Mayan communities of Southern Mexico and Guatemala has forged a consistent view of childrearing and child-directed speech: adult caregivers shape infants' and young children's worlds in such a way that children learn to attend to what is going on around them rather than expecting to be the center of attention (e.g., Brown, 2011, 2014; de León, 2011; Gaskins, 2000; Pye, 1986; Rogoff et al., 2003). These ethnographic findings lay out a broader ideology of caregiving, including a number of component attitudes (e.g., infants as inadequate conversational partners), that lead to the prediction that, on average, typically developing Mayan children are only infrequently directly addressed during their days at home. Indeed, using data from daylong recordings of children under age 3;0, Casillas and colleagues (2019) found that the Tseltal Mayan children in their sample heard an average of 3.6 minutes per hour of speech directed to them, around one third of the current estimate for North American English (Bergelson et al., 2019b), yet they hit established benchmarks for the onset of single- and multi-word utterances (see also Cychosz et al., 2019). This finding appears to support the idea that attitudes about child-directed talk mediate how frequently children are addressed. However, any direct comparison between these two childrearing contexts is critically confounded: the arrangement of everyday life differs greatly between the subsistence farming, rural Tseltal Mayan community and the (sub)urban, middle-class North American populations to which it is being compared.
Children's pattern of linguistic input also varies depending on the social organization of everyday life, which shapes the circumstances for their interactions with others over the course of the day. Prior analyses of daylong recordings in both North American and Tseltal Mayan contexts suggest that different activities impact the rate at which children hear child-directed speech from hour to hour (Bergelson et al., 2019a; Greenwood, Thiemann-Bourque, Walker, Buzhardt & Gilkerson, 2011; Soderstrom & Wittebolle, 2013). The limited evidence to date shows approximately similar patterns in input rate fluctuation across the waking day: children in both North American and Tseltal Mayan contexts hear their highest rates of linguistic input in the morning and afternoon, with a dip around midday (Greenwood et al., 2011; Soderstrom & Wittebolle, 2013). Intriguingly, the activities associated with dense adult talk in the North American context are highly rare in the Tseltal Mayan sample (e.g., sing-alongs) and the activities associated with the least dense periods in the North American data are associated with peak input periods in the Tseltal Mayan sample (e.g., mealtimes). In the Tseltal Mayan context specifically, the afternoon-dip pattern likely arises as a consequence of morning and later afternoon communal eating events with multiple adult and child speakers, separated by a longer, relatively quiet midday period of work or rest. The fluctuations in linguistic input Tseltal Mayan children hear over the day thus appear to be driven by the presence of multiple adult and child speakers whose home presence is regulated by the schedule and workload of farming, food preparation, rest, and other domestic activities.

The current study
Here we investigate the language environments of children growing up on Rossel Island, Papua New Guinea. While the Rossel lifestyle is broadly similar to that of the Tseltal Mayans, their orientation to verbal interaction with infants is more similar to that of middle-class North Americans: Rossel caregivers engage in intensive face-to-face verbal interactions with prelinguistic children, as described in more detail below (Brown, 2011; Brown & Casillas, accepted). Rossel families therefore offer a critical new datapoint in our understanding of cross-cultural variation in linguistic input: if patterns of CDS on Rossel Island are similar to those reported for North American English, it would support the idea that caregiver ideology drives substantial differences in language input across variable contexts. If, instead, CDS patterns are more similar to those of the Tseltal Mayan community, it would support the idea that lifestyle (specifically, subsistence farming vs. post-industrial lifestyles) drives substantial differences in language input across variable contexts.
We use manually annotated daylong recordings of Rossel children's language environments to track how much speech they hear from different speakers over the course of a day at home. During these recordings, the target child freely navigates their environment for multiple hours at a time while wearing an audio recorder, a simple method that can be similarly deployed across diverse linguistic and cultural settings (Cychosz et al., 2019). We capture both situational variation and variation due to differences in caregiver responsiveness by sampling the daylong recordings in two different ways. First, we randomly sample clips to get a baseline estimate for how much speech children encounter, on average, over the course of the day. Because these clips are indiscriminately distributed over the whole recording, they include variation in input due to both specific activities (e.g., mealtime vs. work periods) and social-organizational effects (e.g., subsistence farming schedule, household composition). Second, we look specifically at patterns of interlocutor responsiveness by manually selecting the day's peak clips of sustained interaction between the target child and one or more co-interactants. By identifying clips in which children are hearably interacting with others, we aim to partly, albeit imperfectly, sample from home interactional contexts in which we know the target child is alert and socially engaged, similar to contexts in which cross-cultural differences in CDS have been shown in the past with these same two communities (e.g., Brown, 2011; Brown & Casillas, accepted).
On the basis of past comparative work, we predicted that Rossel children would hear frequent CDS from a wide variety of caregiver types throughout the day, which would support the idea that ideologies about child-directed talk drive substantial cross-context variation in language input rate. Prior ethnographic findings also led us to predict that: (a) distributed caregiving practices on Rossel Island would weaken hour-to-hour fluctuations in CDS rate attributed previously to a subsistence farming schedule, (b) children would hear an increasing proportion of CDS from other children as they got older, and (c) other-directed speech (ODS) would be abundant. We also predicted that any ideology-derived differences between the Tseltal and Rossel data would be most apparent during the clips targeting interactant responsiveness, which better approximate the contexts in which past differences between these communities have been found (Brown, 2011, 2014; Brown & Casillas, accepted). Consonant with prior daylong child language data across multiple cultural contexts, we also expected little-to-no increase in CDS rate with age, a decrease in ODS rate with age, and for CDS to occur in non-uniform bursts throughout the day (Abney, Smith & Yu, 2017; Bergelson et al., 2019b).
In what follows we review the ethnographic work done in this community previously, describe our methods for following up on that work with daylong recordings, present the current findings, and discuss the similarities and differences that arose. All methods for annotation and analysis in this study closely follow those reported elsewhere for Tseltal children's speech environments.

Corpus
The participants in this study live in a collection of small hamlets on north-eastern Rossel Island, approximately 250 nautical miles off the southern tip of mainland Papua New Guinea with only intermittent access to and contact with the outside world. The traditional language of Rossel Island is Yélî Dnye, an isolate (Papuan), which features a phonological inventory and set of grammatical features unlike any other in the (predominantly Austronesian) languages of the region. The islanders are swidden horticulturalists, cultivating taro, sweet potato, manioc, yam, coconut, and more for their daily subsistence, with protein coming from fishing and (occasionally) slaughtering pigs or wild animals such as possums, goannas, snakes. Children often forage independently for shellfish and wild nuts, extra sources of protein. Most children on Rossel Island grow up speaking Yélî Dnye at home, though English, Tok Pisin, and a number of languages from the nearby islands and mainland are frequently heard from adults and school-aged children. Formal training in English as a second language begins in school, around age 7. Children grow up in patrilocal household clusters (i.e., their family and their father's brothers' families), usually arranged such that there is some shared open space between households.
During their waking hours, infants are typically carried in a caregiver's arms as they go about daily activities. Infants, even very young ones, are frequently passed between different people (male and female, young and elderly) throughout the day, returning to the mother to suckle when hungry. This baby-lending practice is not restricted to the natal family, or even close relatives; between feedings, one may find an infant several villages away from its mother, with older infants and young children being transferred between distant caregivers for even longer periods (sometimes for several weeks or longer). The arc of a typical day for an infant might include waking, being dressed and fed, then a mix of (a) spending time with nearby adults or older children as they walk around socializing and completing tasks with others and (b) more feeding, perhaps followed by short bouts of sleep in the late morning and afternoon, usually with the mother. Sometimes children are also taken along for gardening after the morning meal. Afternoon meals are cooked from around 15:00 onward, with another feed and more socializing before resting for the night. Starting around age two or three, children spend much of their time in large, independent child playgroups (10+ cousins and neighbors) who freely travel near and around the village searching for nuts and fruits, bathing in nearby rivers, and engaging in group games (e.g., tag, pretend play, etc.).
Interaction with infants and young children on Rossel Island is initiated by women, men, girls, and boys alike in a face-to-face, contingency-seeking, and affect-laden style (Brown, 2011;Brown & Casillas, accepted). Children are considered a shared responsibility, but also a source of joy and entertainment for the wider network of caregivers in their community. In her prior ethnographic work, Brown details some ways in which interactants make bids for joint attention and act as if the infant can understand what is being said (Brown, 2011). Infants pick up on this pattern of caregiving, initiating interactions with others twice as frequently as Tseltal children, who are encouraged instead to observe the interactions going on around them (Brown, 2011). Brown and Casillas (accepted) document how Rossel caregivers encourage early independence in their children, observing their autonomy in choosing what to do, wear, eat, and say while finding other ways to promote pro-social behavior (e.g., praise). Overall, Rossel Island could be characterized as a child-centered language environment (Ochs & Schieffelin, 1984; but see Brown & Casillas, accepted), in which children, even very young ones, are considered interactional and conversational partners whose interests are often allowed to shape the topic and direction of conversation.
The data presented here come from the Rossel subset of the Casillas HomeBank Corpus (Casillas, Brown & Levinson, 2017), a collection of raw daylong recordings and supplementary data from over 100 children age four and younger growing up on Rossel Island and in the Tseltal Mayan community described elsewhere. The Rossel subcorpus was collected in 2016 and includes daylong audio recordings and experimental data from 57 children born to 43 mothers. These children had 0-2 younger siblings (mean = 0.36; median = 0) and 0-5 older siblings (mean = 2; median = 2); most participating caregivers were on the younger end of those in the community, though two children's primary caregivers were their biological grandparents (mothers: mean = 33.9 years; median = 32; range = 24-70 and fathers: mean = 35.6; median = 34; range = 24-57). Based on available demographic data for 40 of the biological mothers, we estimate that mothers are typically 21.4 years old when they give birth to their first child (median = 21.5; range = 12-30). On the basis of demographic data for 34 of those mothers, we estimate an average inter-child interval of 2.8 years (median = 2.6; range = 1.75-5.2).
The size of households, defined here as the number of people sharing kitchen and sleeping areas on a daily basis, ranged between 3 and 12 (mean = 7; median = 7). Households are clustered into small patrilocal hamlets which afford a wider group of communal caregivers and playmates. The hamlets themselves are clustered together into patches of more distantly related patrilocal residents. The average hamlet in our corpus comprises 5.8 households (median = 5; range = 3-11); the typical household in our dataset has 2 children under age seven (i.e., not yet attending school) and 2 adults, leading us to estimate that there are around 10 young children and 10 adults present within a hamlet throughout the day. This estimate does not include visitors to the target child's hamlet or relatives that the target child encounters while visiting others. Therefore, while 24.6% of the target children in our corpus are first born to their mothers, these children are incorporated into a larger pool of young children whose care is divided among numerous caregivers.
Among our participating families, most mothers had finished their education at one of the island's schools (6 years of education = 32.6%; 8 years of education = 37.2%), with about a quarter having attended secondary school off the island (10 years of education = 25.6%; 12 years of education = 2%). Only one mother had less than six years of education. Similarly, most fathers had finished their education at one of the island's schools (6 years of education = 44.2%; 8 years of education = 20.9%) or at an off-island secondary school (10 years of education = 27.9%), with only 7% having less than six years of education. Note that in Table 1 we use a different set of educational levels than is used on the island so that we can more easily compare the present sample to the Tseltal sample (see Table 1 caption for details). As far as we could ascertain at the time of recording, all but two children were typically developing; one showed signs of significant language delay and one showed signs of multiple developmental delays (motor, language, intellectual). Both of these children's delays were consistently observed in follow-up trips in 2018 and 2019. Their recordings are not included in the analyses reported below.
Dates of birth for children were initially collected via caregiver report. We were able to verify the majority of birth dates using the records at the island health clinic. Because not all mothers give birth at the clinic and because dates are logged by hand, some births are not recorded, are inaccurately recorded, or otherwise significantly diverge from what the caregivers report. In these cases we gathered information from as many sources as possible and followed up with the families, often using the dates of neighboring children born around the same time to determine the correct date.
The data we present come from 7-9-hour recordings made at home during daylight hours (6:00-18:00; there is little or no powered light after dark). Children wore the recording device: an elastic vest containing a small stereo audio recorder (Olympus WS-832 or WS-853) and a miniature camera that captured photos from the child's frontal viewpoint at a fixed interval (every 15 seconds; Narrative Clip 1). The camera was outfitted with a fisheye lens (Photojojo Super Fisheye) that allowed us to capture 180 degrees of the child's frontal view. This photo technique increases the ease and reliability of transcription and annotation by giving scene information that aids activity and interlocutor identification. However, because the camera and recorder are separate devices, we had to synchronize them manually. We used an external wristwatch to record the current time at start of recording on each device individually, with accuracy down to the second (photographed by the camera and spoken into the recorder). The camera's software timestamps each image file such that we can calculate the number of seconds that have elapsed between photos. These timestamps were used with the cross-device time synchronization cue to create photo-linked audio files of each recording, which we then formatted as video files (see https://github.com/marisacasillas/Weave for scripts). The informed consent process used with participants, as well as data collection and storage, were conducted in accordance with ethical guidelines approved by the Radboud University Social Sciences Ethics Committee.
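The cross-device synchronization described above reduces to simple clock arithmetic, which can be sketched as follows. This is a minimal illustration, not the Weave scripts themselves: the function name, argument names, and the example times are all hypothetical, and we assume that the wristwatch reading is known both for one photographed cue (with its camera timestamp) and for one spoken cue (with its position in the audio file).

```python
from datetime import datetime

def photo_audio_offsets(photo_times, cam_cue_time, watch_at_cam_cue,
                        watch_at_audio_cue, audio_cue_offset_s=0.0):
    """Map camera timestamps to positions (in seconds) in the audio file.

    photo_times: camera-clock timestamps of the photos (hypothetical input).
    cam_cue_time: camera-clock timestamp of the photo showing the wristwatch.
    watch_at_cam_cue: wristwatch time visible in that photo.
    watch_at_audio_cue: wristwatch time spoken into the recorder.
    audio_cue_offset_s: where in the audio file the spoken cue occurs.
    """
    # Correction that turns camera-clock time into wristwatch time.
    cam_to_watch = watch_at_cam_cue - cam_cue_time
    offsets = []
    for t in photo_times:
        watch_time = t + cam_to_watch
        # Seconds elapsed since the spoken cue, shifted by the cue's position.
        elapsed = (watch_time - watch_at_audio_cue).total_seconds()
        offsets.append(elapsed + audio_cue_offset_s)
    return offsets

# Illustrative values only: the camera clock runs 5 s ahead of the wristwatch.
offsets = photo_audio_offsets(
    photo_times=[datetime(2016, 1, 1, 8, 0, 20), datetime(2016, 1, 1, 8, 0, 35)],
    cam_cue_time=datetime(2016, 1, 1, 8, 0, 5),
    watch_at_cam_cue=datetime(2016, 1, 1, 8, 0, 0),
    watch_at_audio_cue=datetime(2016, 1, 1, 7, 59, 30),
    audio_cue_offset_s=2.0,
)
```

Because both devices are referenced to the same external wristwatch, neither device's internal clock needs to be accurate; only the two cue readings matter.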

Data selection and annotation
From the daylong recordings of 57 Rossel children, we selected 10 representative children between ages 0;0 and 3;0 for transcription and analysis. The 10 children were selected to be spread between the target age range (0;0-3;0) while also representing a range of typical maternal education levels found in the community and being evenly split between male and female children (Table 1). We selected a series of non-overlapping sub-clips from each recording for transcription (Figure 1) in the following order: nine randomly-selected 2.5-minute clips, five manually-selected 'peak' turn-taking activity 1-minute clips, five manually-selected 'peak' target child vocal activity 1-minute clips, and one manually-selected 5-minute expansion of the highest-activity one-minute clip, for a total of 37.5 minutes of transcribed audio for each child (6.25 audio hours in total).
Table 1. Demographic overview of the 10 children whose recordings are sampled in the current study, including from left to right: child's age (years;months.days); child's sex (M/F); mother's age (years); highest level of maternal education achieved (primary (grades 6-7)/secondary (grades 8-11)/preparatory (grade 12)); and the number of people living in the child's household.
Manual clip selection proceeded as follows: one person (the first author or a Western research assistant) listened through the entirety of each recording, documenting the approximate onset time, duration, and notable features of any short period that they perceived to be a burst of turn taking and/or target-child vocalization; judgments were made subjectively, and with reference to the lack of such activity in other parts of the recording. After compiling a list of candidate bursts for each recording, the first author listened again to each candidate, adding further notes about the diversity of target-child vocalizations and the density of turn taking. Clips that overlapped with previously transcribed segments or that featured significant background noise were eliminated. From the remainder, the five 1-minute clips that best demonstrated sequences of temporally contingent vocalization between the target child and at least one other person were selected as the 'turn-taking' clips. From the remaining candidate clips, the five that best demonstrated high density, high maturity, and high diversity vocalizations by the target child were selected as the 'vocal activity' clips. After these ten 1-minute clips had been transcribed for each recording (i.e., during the field visit), the first author assessed each for its density of vocal and turn-taking activity and searched for continuation of that activity before and after the one-minute clip.
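The random-clip step and the per-child totals given above can be illustrated with a short sketch. The rejection-sampling strategy, the function name, and the 8-hour example recording are our assumptions for illustration; the source does not specify how the random clips were drawn beyond their being non-overlapping.

```python
import random

def sample_random_clips(recording_s, n_clips=9, clip_s=150.0, seed=0,
                        max_tries=10_000):
    """Draw n_clips non-overlapping (start, end) windows, in seconds.

    Hypothetical helper: candidate onsets are drawn uniformly and rejected
    if they overlap a clip that was already accepted.
    """
    rng = random.Random(seed)
    clips = []
    tries = 0
    while len(clips) < n_clips:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all clips without overlap")
        start = rng.uniform(0, recording_s - clip_s)
        end = start + clip_s
        # Accept only if the candidate lies fully before or after each clip.
        if all(end <= s or start >= e for s, e in clips):
            clips.append((start, end))
    return sorted(clips)

clips = sample_random_clips(recording_s=8 * 3600)  # illustrative 8-hour day

# Transcribed minutes per child: 9 random clips, 5 + 5 peak clips, 1 extension.
minutes_per_child = 9 * 2.5 + 5 * 1 + 5 * 1 + 5
hours_total = minutes_per_child * 10 / 60  # 10 children
```

The two totals computed at the end reproduce the 37.5 minutes per child and 6.25 hours overall reported in the text.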
The clip that best balanced dense, minimally repetitious verbal activity with continuation in neighboring minutes was selected to have a 5-minute extension window for further annotation. Finally, all else being equal, we gave preference to clips featuring speech from underrepresented foreground speakers (e.g., adult males; see more details at https://git.io/fhdUm). We were limited to annotating these sub-clips from only 10 children because of the time-intensive nature of transcribing these naturalistic data; 1 minute of audio typically took approximately 60-70 minutes to be segmented into utterances, transcribed, annotated, and loosely translated into English (∼400 hours total). Yélî Dnye is almost exclusively spoken on Rossel Island, where there is no electricity (we use solar panels) and unreliable access to mobile data, so transcription was completed over the course of three 4-6 week visits to the island in 2016, 2018, and 2019.
We used the ACLEW Annotation Scheme (Casillas, Bunce, et al., 2017) in ELAN (Wittenburg, Brugman, Russel, Klassmann & Sloetjes, 2006) to transcribe and annotate all hearable speech in the clips. Using both the audio and photo context, we segmented out the utterances and ascribed them to individual speakers (e.g., older brother, mother, aunt, etc.). We then annotated the vocal maturity of each utterance produced by the target child (non-canonical babble/canonical babble/single word/multi-word/unsure) and annotated the addressee of all speech from other speakers (addressed to the target child/one or more other children/one or more adults/a mix of adults and children/any animal/other/unsure).
Regarding vocal maturity annotations, a vocalization was considered a 'single word' if it contained a single recognizable (transcribed) word or a repetition of the same word (e.g., 'mine', 'mine mine'). It was considered a 'multi-word' vocalization if it contained at least two different words (e.g., 'my mango'), with non-lexical linguistic vocalizations annotated as 'canonical babble' (containing at least one consonant with an adult-like transition with its neighboring vocalic sound(s)) or 'non-canonical babble', and non-linguistic vocalizations classified as 'crying' or 'laughing'. Vocalizations that were too ambiguous to make a decision were marked as 'unsure'. Vegetative sounds (e.g., burps, sneezes) were ignored.
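The lexical part of this rule can be written down compactly. The sketch below is only an illustration of the single-word versus multi-word criterion; the function name is hypothetical, and the babble, crying, laughing, and unsure categories depend on perceptual judgments that are not modeled here.

```python
def classify_lexical(words):
    """Apply the word-count rule to one vocalization.

    words: the recognizable (transcribed) words in the vocalization.
    Returns 'single word', 'multi-word', or None when no word was
    recognized (the annotator must then judge babble type by ear).
    """
    distinct = {w.lower() for w in words}
    if not distinct:
        return None
    if len(distinct) == 1:
        # One word, or repetitions of the same word ('mine mine').
        return "single word"
    # At least two different words ('my mango').
    return "multi-word"
```

Counting distinct words rather than tokens is what keeps repetitions such as 'mine mine' in the single-word category.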
The audio and photo context were reviewed to identify, for each utterance, to whom the speaker was talking (i.e., the addressee for each utterance); utterances were only considered directed to the target child when the native Rossel-speaking research assistant and first author felt certain of this judgment given the context. Utterances were otherwise classified as directed to a 'child' (1+ children; a group that may include the target child so long as another child is also being addressed), 'adult' (1+ adults), 'both' (1+ children and 1+ adults; a group that may include the target child), 'animal' (1+ animals), 'other' (a clear addressee that doesn't fit into the other categories), or 'unsure' (not enough evidence to make a judgment about addressee).
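To make the link between addressee annotations and the rate measures used later concrete, the following sketch converts annotated utterances in a clip into minutes of speech per hour. It rests on assumptions that the source does not confirm: utterance durations are simply summed (overlapping speech is not merged), every non-target-child label counts as other-directed speech, and the function name and example utterances are invented for illustration.

```python
def minutes_per_hour(utterances, clip_s):
    """Return (TCDS, ODS) rates in minutes per hour for one clip.

    utterances: (onset_s, offset_s, addressee) tuples, where addressee is one
    of 'target child', 'child', 'adult', 'both', 'animal', 'other', 'unsure'.
    clip_s: clip duration in seconds.
    """
    tcds_s = sum(off - on for on, off, addr in utterances
                 if addr == "target child")
    # Assumption: everything not addressed to the target child counts as ODS.
    ods_s = sum(off - on for on, off, addr in utterances
                if addr != "target child")
    # Seconds of speech per clip scale to minutes per hour by 60 / clip_s.
    return tcds_s * 60.0 / clip_s, ods_s * 60.0 / clip_s

# Invented example for a 2.5-minute (150 s) clip.
example = [
    (0.0, 5.0, "target child"),
    (10.0, 12.5, "target child"),
    (20.0, 80.0, "adult"),
    (90.0, 120.0, "child"),
]
tcds_rate, ods_rate = minutes_per_hour(example, clip_s=150.0)
```

In this invented clip, 7.5 seconds of target-child-directed speech and 90 seconds of other speech correspond to 3 and 36 minutes per hour, respectively.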
Note that all transcription and annotation was done together by the first author and one of three community members (all native Yélî Dnye speakers). The community-based research assistants personally knew all the families in the recordings, and were able to use their own experience, the discourse context, and information from the accompanying photos in reporting what was said and to whom speech was addressed for each utterance. These annotations relied on mutual agreement between the first author and the Rossel research assistant, so there is no direct way to estimate interrater reliability for the 4308 target-child vocalizations and 10133 other-speaker vocalizations discovered in the clips. That said, independent vocal maturity annotations of these same target child vocalizations in a different study revealed a highly similar pattern of results (Cychosz et al., 2019). Detailed manuals and self-guided training materials, including a 'gold standard test' for this annotation scheme can be found at https://osf.io/b2jep/wiki/home/ (Casillas, Bunce, et al., 2017).
In what follows we first analyze the nine randomly selected 2.5-minute clips from each child to establish a baseline view of their speech environment, focusing on the effects of child age, time of day, household size, and number of speakers on the rate of target child-directed speech (TCDS) and other-directed speech (ODS). Next, we repeat these analyses, focusing instead only on the turn-taking clips to gain a view of the speech environment as it appears during the peak interactions for the day. Then as a first approximation of children's linguistic development, we map a coarse trajectory of children's use of babble, first words, and multi-word utterances. Lastly, we compare our findings to those from the Tseltal Mayan community, and briefly relate our results to the larger literature on child-directed speech and its role in language development.

Statistical models
We conducted all analyses in R, using the glmmTMB package to run generalized linear mixed-effects regressions (Brooks et al., 2017) and ggplot2 to generate figures (Wickham, 2016). This dataset and analysis are available at https://github.com/marisacasillas/Yeli-CLE. TCDS and ODS minutes per hour are naturally restricted to non-negative (0-infinity) values, causing the distributions of those measures to be positively skewed. To address this issue we use negative binomial regressions, which can better fit non-negative, overdispersed data (Brooks et al., 2017; Smithson & Merkle, 2013). There were also many cases of zero minutes of TCDS across the clips; for example, this occurred in the randomly sampled clips when the child was sleeping in a quiet area. To handle this additional distributional characteristic of the data, we added a zero-inflation component to the TCDS analysis which, in addition to the count model of TCDS (e.g., testing effects of age on the input rate), creates a binary model to evaluate the likelihood of a clip containing no TCDS at all. More conventional, gaussian linear mixed-effects regressions with log-transformed dependent variables are provided in the Online Supplemental Materials (Supplementary Materials), but are qualitatively similar to what we report here.
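For readers less familiar with zero-inflated count models, the probability function they imply can be sketched in a few lines. This is a generic illustration rather than the fitted model: we assume the quadratic (NB2) parameterization, in which the variance is mu + mu^2/theta, and the parameter values used below are arbitrary and chosen only to exercise the functions.

```python
import math

def nb2_log_pmf(y, mu, theta):
    """Log probability of count y under a negative binomial (NB2)
    with mean mu and dispersion theta (variance = mu + mu**2 / theta)."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def zinb_pmf(y, mu, theta, pi_zero):
    """Zero-inflated NB2: with probability pi_zero the outcome is a
    structural zero; otherwise it is drawn from the NB2 count model."""
    nb = math.exp(nb2_log_pmf(y, mu, theta))
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb

# Arbitrary illustrative parameters (not estimates from the study).
mu, theta, pi_zero = 3.0, 3.37, 0.2
total = sum(zinb_pmf(y, mu, theta, pi_zero) for y in range(500))
mean = sum(y * zinb_pmf(y, mu, theta, pi_zero) for y in range(500))
```

The two components answer different questions: the binary part governs whether a clip has any TCDS at all, while the count part governs how much TCDS occurs when it is not a structural zero. The expected value of the mixture is (1 - pi_zero) * mu.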

Results
The models included the following predictors: child age (months; centered and standardized), household size (number of people; centered and standardized), number of non-target-child speakers present in that clip (centered and standardized), and time of day at the start of the clip (factor: "morning" = before 11:00; "midday" = 11:00-13:00; "afternoon" = after 13:00). We also included two-way interactions of (a) child age and the number of speakers present and (b) child age and time of day, with a random effect of child. For the zero-inflation model of TCDS, we included the number of speakers present. We limit our discussion to significant effects; full model results are provided in the Online Supplemental Materials (Supplementary Materials).
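The predictor coding described above can be sketched as follows. The function names are hypothetical, the treatment of a clip starting at exactly 13:00 is our assumption (the source gives "11:00-13:00" and "after 13:00" without stating which bin contains the boundary), and standardization here uses the sample standard deviation.

```python
from statistics import mean, stdev

def time_of_day(hour, minute=0):
    """Bin a clip's start time into the three-level factor."""
    t = hour + minute / 60.0
    if t < 11:
        return "morning"
    if t <= 13:  # assumption: a 13:00 start counts as midday
        return "midday"
    return "afternoon"

def standardize(values):
    """Center to mean 0 and scale to (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Invented ages in months, for illustration only.
ages_z = standardize([2, 8, 14, 20, 31])
```

Centering and standardizing the continuous predictors puts age, household size, and number of speakers on a common scale, so that the reported coefficients describe the effect of a one-standard-deviation change.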

Target-child-directed speech (TCDS)
In the random sample, these 10 children heard an average of 3.13 minutes of speech directly addressed to them per hour (median = 2.95; range = 1.58-6.26; Figure 2 left panel, purple/solid summaries). For comparison, this is slightly less than the 3.6 minutes per hour reported for Tseltal Mayan children (Casillas et al., 2019). The zero-inflated negative binomial regression of TCDS minutes per hour (N = 90, log-likelihood = -195.26, overdispersion estimate = 3.37) suggested significant effects of child age, time of day, and their interaction on the rate at which children are directly addressed. First, the older children heard a small but significantly greater amount of TCDS per hour (Figure 2 left panel purple/solid summaries; B = 0.73, SD = 0.23, z = 3.20, p < 0.01). Second, children overall were more likely to hear TCDS in the mornings (Figure 3 top left panel), with significantly higher TCDS rates in the morning compared to both midday (midday-vs-morning: B = 0.80, SD = 0.36, z = 2.23, p = 0.03) and the afternoon (afternoon-vs-morning: B = 0.54, SD = 0.26, z = 2.10, p = 0.04), and no significant difference in TCDS rate between midday and the afternoon. However, the time-of-day pattern changed with child age. Older children were more likely than younger children to show a peak in TCDS during midday, with a decrease in TCDS between midday and the afternoon (midday-vs-afternoon: B = -0.60, SD = 0.29, z = -2.04, p = 0.04) and marginally less TCDS in the morning than at midday (midday-vs-morning: B = -0.59, SD = 0.30, z = -1.94, p = 0.05). There were no further significant effects in either the count or the zero-inflation models.
Children heard TCDS from a variety of different speakers. Most TCDS came from adults (mean = 72.65%, median = 75.51%, range = 41.41-100%). On average, 82.35% of the total TCDS minutes from adults came from women. However, older target children were more likely to hear TCDS from other child speakers than younger target children (e.g., TCDS from siblings, cousins, or neighbors; Child-TCDS); a Spearman's correlation showed a significant positive relationship between the average proportion of Child-TCDS in a clip and target child age (Spearman's rho = 0.78; p = 0.01).
Figure 3. TCDS min/hr (left panels) and ODS min/hr (right panels) across the recorded day in the random clips (top panels) and turn-taking (bottom panels) clips. Each box plot summarizes the data for children age 1;0 and younger (light) or age 1;0 and older (dark) at the given time of day.
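As a reminder of what the rank correlation measures, the sketch below computes Spearman's rho as the Pearson correlation of average ranks. The function names are ours and the example values are invented; they are not the study's data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Invented example: age in months vs. proportion of TCDS from children.
rho = spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
```

Because only ranks enter the computation, the statistic captures any monotonic relationship between age and the proportion of child-produced TCDS, not just a linear one, which suits a sample of 10 children with bounded proportions.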

Other-directed speech (ODS)
In the random sample, these children heard an average of 35.90 minutes of other-directed speech per hour (Figure 2 right panel, purple/solid summaries; median = 32.37; range = 20.20-53.78): that is more than eleven times the average quantity of speech directed to them, with many clips displaying near-continuous background speech. For comparison, the prior estimate for Tseltal Mayan children using near-parallel methods found an average of 21 minutes of overhearable speech per hour, and a recent study of North American children's daylong recordings found that adult-directed speech (a subset of ODS) occurred at a rate of 7.3 minutes per hour (Bergelson et al., 2019b).
The negative binomial regression of other-directed speech rate (N = 90, log-likelihood = -370.87, overdispersion estimate = 9.14) revealed effects of child age, number of speakers present, and time of day on the rate of ODS encountered. The rate of ODS significantly decreased with child age (Figure 2 right panel, purple/solid summaries; B = −0.57, SD = 0.17, z = −3.28, p < 0.01) and significantly increased in the presence of more speakers (B = 0.50, SD = 0.05, z = 10.07, p < 0.001). Across the randomly selected clips, there were an average of 6.19 speakers present other than the target child (median = 6; range = 1-19), an average of 59.99% of whom were adults. Comparing again to Tseltal Mayan and to North American English daylong recording findings, in which the average number of speakers present, not including the target child, was 3.9 and 3.44 respectively (Bergelson et al., 2019a), we can infer that the increased rate of ODS on Rossel Island is due in part to there simply being more speakers present. Time-of-day effects on ODS only came through in an interaction with child age (Figure 3 top right panel). In particular, older children heard a pattern of ODS mirroring the general pattern of TCDS: significantly more ODS in the mornings compared to midday (midday-vs-morning: B = 0.65, SD = 0.20, z = 3.23, p < 0.01) and the afternoon (afternoon-vs-morning: B = 0.37, SD = 0.15, z = 2.50, p = 0.01). There were no other significant effects on ODS rate.
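As a reading aid for the coefficients above, the following sketch converts an estimate into a multiplicative rate ratio. It assumes the model used a log link, which is the usual default for negative binomial count models but is not stated in this section, and it treats the reported SD as the standard error of the estimate. How each predictor was coded or scaled is also not given here, so "one unit" means one unit of the predictor as entered in the model. The function names are hypothetical.

```python
import math


def rate_ratio(coefficient):
    """Multiplicative change in the expected rate per one-unit predictor increase,
    assuming a log link (an assumption, not confirmed by the text)."""
    return math.exp(coefficient)


def wald_z(coefficient, standard_error):
    """Wald z statistic: the estimate divided by its standard error."""
    return coefficient / standard_error


# Estimates as reported in the text for the random-clip ODS model.
speakers_b, speakers_sd = 0.50, 0.05
age_b, age_sd = -0.57, 0.17

speakers_ratio = rate_ratio(speakers_b)  # about 1.65
age_ratio = rate_ratio(age_b)            # about 0.57
```

Under these assumptions, each additional unit of the speaker predictor multiplies the expected ODS rate by roughly 1.65, and each unit of the age predictor multiplies it by roughly 0.57. Recomputing z from the printed values gives figures close to, but not identical with, the reported ones, presumably because the printed estimates are rounded.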
In sum, the random baseline rates of TCDS and ODS in children's speech environments are influenced by child age (TCDS increases, ODS decreases), by time of day (both generally peak in the morning), and by their interaction (older children hear more TCDS and less ODS than younger children at midday). The rate of ODS is also impacted by the number of speakers present. Correlational results suggest that TCDS comes increasingly from other children over the first three years. That said, the baseline rate of TCDS is low, on par with estimates in other small-scale rural communities, while the ODS rate is quite high relative to estimates in prior work.

TCDS and ODS during interactional peaks
If we instead investigate the rates of TCDS and ODS encountered by these children during interactional peaks, a different picture emerges (Figures 2 and 3 green/dashed summaries). Unsurprisingly, the children heard much more TCDS in the turn-taking clips: 14.45 min/hr, more than four times the rate of TCDS in the random baseline (Figure 2, left panel, green/dashed summaries; median = 15.07; range = 9.61-18.73).
Children also heard a reduced rate of ODS: 25.27 min/hr (70.39% of the random-sample ODS rate; Figure 2, right panel, green/dashed summaries; median = 19.59; range = 6.68-60.18). The next question was whether the pattern of TCDS and ODS use across age, time of day, and number of speakers in these turn-taking clips differed from what was seen in similarly sampled clips from the Tseltal Mayan community. To investigate the effects of these variables we ran regressions parallel to those used with the random clips above.
During interactional peaks, as in the random sample, older target children heard more Child-TCDS than younger target children. While, overall, more of the TCDS in interactional peaks came from adults than in the random clips (mean = 82.68%, median = 88.04%, range = 50-100%), a Spearman's correlation showed an even stronger positive relationship between the average proportion of Child-TCDS in a clip and target child age (Spearman's rho = 0.92; p < 0.001). Notably, women contributed proportionally less TCDS during interactional peaks than they did during the random clips: on average, women contributed 61.55% of the children's TCDS minutes from adults in the turn-taking clips (compared to 82.35% in the random clips). In brief, compared to the random sample, interactional peaks included more directed speech from men and, for older target children, more directed speech from other children.
The negative binomial mixed-effects regression of ODS (N = 55, log-likelihood = -202.60, overdispersion estimate = 4.66) only revealed a significant effect of number of speakers. As before, ODS rates were higher when more speakers were present (B = 0.56, SD = 0.08, z = 6.76, p < 0.001). There were no other significant effects on ODS rate (Figure 3, bottom right panel).
Overall, the results suggest that these children typically hear very little directly addressed speech, but that interactional peaks provide opportunities for dense input. While the majority of directed speech comes from women, an increasing portion of it comes from other children with age, and directed speech from men is more likely during interactional peaks. Directed and overhearable speech are most likely to occur during the morning, before most of the household has dispersed for their work activities, similar to other findings from subsistence farming households. However, older children are more likely than younger children to experience higher input rates at midday, perhaps due to their increased interactions with other children while adults attend to gardening and domestic tasks. Possibly because of the large number of speakers present, these children were also in the vicinity of a great deal of overhearable speech, underscoring the availability of other-addressed speech as a resource for linguistic input in this context.

Vocal maturity
Given the low baseline rate of directed speech, one might expect that Rossel children's early linguistic development, particularly the onset and use of single- and multi-word utterances, shows delays in comparison to children growing up in more CDS-rich environments. We plotted the proportion of all linguistic vocalizations for each child (i.e., discarding laughter, crying, or unknown-types; leaving a total of 4308 vocalizations) that fell into the following categories: non-canonical babble, canonical babble, single-word utterance, or multi-word utterance. Children are expected to traverse all four types of vocalization during development such that they primarily produce single- and multi-word utterances by age three.
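The per-child proportions described above can be sketched as follows. This is an illustration of the computation only, under the assumption that each annotated vocalization carries a single type label. The label strings, the function name, and the example annotations are hypothetical and do not reproduce the corpus's actual annotation scheme.

```python
from collections import Counter

LINGUISTIC_TYPES = (
    "non-canonical babble",
    "canonical babble",
    "single-word utterance",
    "multi-word utterance",
)


def vocal_maturity_proportions(labels):
    """Proportion of a child's linguistic vocalizations in each category.

    Labels outside LINGUISTIC_TYPES (e.g. laughter, crying, unknown) are
    dropped before the proportions are computed.
    """
    counts = Counter(label for label in labels if label in LINGUISTIC_TYPES)
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in LINGUISTIC_TYPES}
    return {category: counts[category] / total for category in LINGUISTIC_TYPES}


# Hypothetical annotations for one child (illustration only).
example_labels = [
    "canonical babble", "laughter", "single-word utterance",
    "single-word utterance", "crying", "multi-word utterance",
]
example = vocal_maturity_proportions(example_labels)
```

In this toy case the laughter and crying tokens are discarded, so the four remaining vocalizations yield proportions of 0.25 canonical babble, 0.5 single-word, and 0.25 multi-word utterances.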
Over all annotated clips, children produced an average of 7.18 linguistic vocalizations per minute (median = 7.79; range = 4.57-8.95), which is a vocalization rate lower than recorded for short recordings of US infant-caregiver interaction (Oller et al., 1995) but similar to estimates for Tseltal Mayan children (Brown, 2011).

Discussion
We analyzed the speech environments of 10 Rossel children under age 3;0 to investigate: (a) how often children were spoken to directly, (b) how much other overhearable speech was available to them, and (c) how these sources of linguistic input were shaped by child age and interactional context. We then additionally conducted a preliminary investigation into (d) whether this (relatively) low rate of directed input appears to impact their early production milestones.
By investigating the language environments of children in this child-centric subsistence farming context, we aimed to provide a new and critical comparative datapoint to a research area that has previously confounded differences in child-directed speech ideology with differences in broad lifestyle features (post-industrial/nuclear vs. subsistence-farming/multi-generational). Our idea was that, if Rossel children's language environments pattern like North American ones, it would support the idea that caregiver ideology drives substantial differences in language input, whereas if they patterned like Tseltal Mayan environments, it would instead support the idea that lifestyle drives substantial differences. Overall, our findings point toward broad effects of lifestyle on the quantity of directed and overheard speech children hear. Evidence for the influence of CDS ideologies only begins to emerge when we look at patterns in who speaks to the target child, not in overall rates of linguistic input.

Input rate similarities across subsistence farming communities
Based on prior ethnographic work, we hypothesized that Rossel children would hear frequent child-directed speech (Brown & Casillas, accepted). In fact, Rossel children were rarely directly addressed over the course of the day. We found a baseline rate of TCDS comparable to that found in a Tseltal Mayan community where infrequent use of TCDS is one means of socializing children into attending to their surroundings (Rossel: 3.13 TCDS min/hr vs. Tseltal: 3.63). As in the case of Tseltal Mayan children, this relatively low rate of TCDS was not associated with any delay in the appearance of vocal maturity milestones, including the use of single- and multi-word utterances.
Since we know from prior, in-depth ethnographic work that caregivers' ideas about talking to young children do, in fact, differ enormously in these two communities (Brown, 2011, 2014; Brown & Casillas, accepted), we attribute the similarity in baseline rates of TCDS to the fact that all these children are growing up in multi-generational, subsistence farming households. This inference is bolstered by the fact that fluctuations in TCDS rate over the day in the Rossel data are highly similar to those reported for Tseltal: peak rates in the morning, with older children eliciting more TCDS during midday hours than younger children, and with ODS rate following a similar contour. While a basic afternoon-dip pattern has been shown in at least some North American home recordings (Greenwood et al., 2011; Soderstrom & Wittebolle, 2013), the activities and total number of speakers present during periods of peak linguistic input are likely to be different across these economic contexts, an important avenue for future research. In line with prior work linking high caregiver workload to less CDS, our prediction is that the Tseltal and Rossel fluctuations derive from some of the (broadly) similar tasks associated with their subsistence farming lifestyles (see also findings from Kaluli, Samoan, Gusii, and Yucatec communities in, e.g., LeVine et al., 1996; Ochs, 1988; Schieffelin, 1990; Gaskins, 2006).
We had hypothesized that cultural differences in quantity of caregiver talk to children would be most visible in the turn-taking clips, which were selected in particular for their view into caregiver responsiveness patterns. Against expectations, we found a similar overall rate of TCDS in the Rossel turn-taking clips compared to that of the Tseltal Mayan children (Rossel: 14.45 TCDS min/hr vs. Tseltal: 13.28). In both cultural contexts, peak TCDS clips displayed around four times the rate of directed speech as the baseline rate, though we note that this relative increase was greater in the case of the Rossel data than the Tseltal data (Rossel: 4.62x the random rate vs. Tseltal: 3.66x).
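The relative increases quoted above follow directly from the reported rates, as this short check shows. The rates are taken from the text; the function and variable names are illustrative only.

```python
def peak_to_baseline_ratio(peak_rate, baseline_rate):
    """How many times higher the turn-taking (peak) rate is than the random baseline."""
    return peak_rate / baseline_rate


# TCDS rates in min/hr as reported in the text: (baseline, peak).
reported_rates = {
    "Rossel": (3.13, 14.45),
    "Tseltal": (3.63, 13.28),
}

ratios = {
    community: round(peak_to_baseline_ratio(peak, baseline), 2)
    for community, (baseline, peak) in reported_rates.items()
}
```

The resulting values are 4.62 for Rossel and 3.66 for Tseltal, matching the figures given in the text.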

Input source differences across subsistence farming communities
One distinctive feature of the Rossel data that was not observed for Tseltal is the division of TCDS among women, men, and other children. On Rossel Island, all of these types of speakers attend to the care of young children (Brown & Casillas, accepted). In line with these observations, we find that Rossel children hear more CDS from other children than Tseltal children do (Rossel: 27% of TCDS vs. Tseltal: 20%), and that the proportion of TCDS from other children increases with age, a pattern not found for Tseltal children in this age range. Additionally, TCDS from men was far more frequent in the Rossel data, making up nearly 20% of adult TCDS in the random baseline and nearly 40% of adult TCDS in the turn-taking clips. 3 We take this substantial proportion of TCDS from children and men as evidence that caregiving is indeed divided among many types of speakers in Rossel communities (Brown & Casillas, accepted); note that, together, child and adult male speakers contribute more than half of the TCDS during interactional peaks. In brief, we only get a glimpse into the different caregiving arrangements between the Tseltal and Rossel cultural contexts with respect to who is talking to the target child, and not with respect to how often the child is being talked to. The age-related increase in TCDS from other children recalls findings from Shneidman and Goldin-Meadow (2012; see also Brown, 2011 and Brown & Casillas, accepted) in which Yucatec Mayan children's directed speech rate increased enormously between ages one and three (much more than the increase observed in these Rossel children's recordings), primarily due to increased input from other children (see also Scaff et al., in preparation).
Interestingly, data from the Tseltal community, which is from the same Mayan cultural milieu as the Yucatec families studied by Shneidman and Goldin-Meadow (2012), show no evidence for increased input from other children in this same age range (0;0-3;0), possibly because Tseltal children only begin to more fully engage in independent, extended play with other children after age three. In contrast, independence is a primary concern for caregivers of young children on Rossel Island; from early toddlerhood Rossel children are encouraged to choose how they dress, when and what to eat, and whom to visit (Brown & Casillas, accepted). The formation of hamlets in a cluster around a shared open area, often close to a shallow swimming area, further nurtures a sense of safe, free space in which children can wander. These features of childhood on Rossel Island support extended independent play with other children from an early age and may help explain the strongly increasing presence of child TCDS in the present data. Further work combining the time-of-day and interactant effects found here with ethnographic interview data is needed to explore these ideas in full.

Replicating daylong language environment patterns
Prior work using daylong audio recordings in both Western and non-Western contexts led us to expect that the quantity of TCDS would be relatively stable across the age range studied, that ODS rate would decrease with age, and that TCDS would be non-uniformly distributed over the recording day (Abney et al., 2017; Bergelson et al., 2019b). Counter to expectations, we found a small but significant increase in TCDS rate with child age in the random clips and a small and significant decrease in TCDS rate with age in the turn-taking clips. The age-related baseline increase in TCDS may derive from more frequent participation in independent play with other children; in prior work, increased proportional input from other children was also associated with an increase in overall input rate. The age-related decrease in TCDS rate during peak interactional moments was not expected, but may also be attributable to this change in interactional partners with age; if adults are more likely to be the source of TCDS during interactional peaks for younger children, they may also provide more voluminous speech during those peaks than other children do during interactional peaks later in development. Sleep during the day may also help explain these patterns; if older children sleep less than younger children, they may be more likely to hear more TCDS during random but not peak-based clips. All of these explanations require follow-up work from a larger sample of children and, ideally, from a larger sample of their interactions throughout the day. Finally, consistent with prior daylong language environment analyses, ODS rate decreased with age, and the random and turn-taking clips across the day revealed substantial fluctuations in TCDS rate (Abney et al., 2017; Bergelson et al., 2019b).
One implication of our findings is that TCDS rate estimates from daylong data do not directly distinguish distinct caregiver attitudes toward talking to young children. While Rossel caregivers view their children, even their young infants, as potential co-interactants in conversational play (Brown & Casillas, accepted), the circumstances of everyday life shape the broader linguistic landscape such that most of what children hear is talk between others. We suggest that, in the daylong context, caregivers from these two subsistence farming communities are preoccupied for most of the day with social and domestic commitments in which they are motivated to converse with the other adults and (older) children present; not just to get their daily tasks done but also because these more mature speakers enable more complex verbal interactions and social routines. Rather, we suspect that caregiver attitudes about how to engage children in interaction are more clearly expressed during interactional peaks and, even then, via behaviors more nuanced than what can be captured by input quantity measures alone. In the case of Rossel Island, we saw not only more TCDS but also TCDS from more diverse speaker types during interactional peaks. We suggest, then, that the forces shaping the rate of Rossel children's linguistic input are somewhat different from the forces shaping the content and sources of their linguistic input. In order to comparatively examine culturally distinct codes of verbal interaction in children's at-home speech environments, future work should focus not only on the rate, but also the sources and content of the speech children are exposed to, perhaps using strategic subsampling similar to what was implemented here.

Implications for theories of language learning
Despite hearing relatively little directed linguistic input, these 10 Rossel children show no sign of delay in their achievement of early linguistic milestones, including the use of single- and multi-word utterances. This finding is hard to explain under any theory of language learning that requires very large amounts of TCDS input. While prior evidence predicts a highly robust onset of canonical babble (e.g., Oller et al., 1995; Oller, Eilers, Neal & Cobo-Lewis, 1998; but see also Lee, Jhang, Relyea, Chen & Oller, 2018 and Cychosz et al., 2019), the stable use of individual phonological segments in speech-like babble and the subsequent appearance of recognizable words is indeed variable between children (McGillion, Herbert, Pine, Vihman, DePaolis, Keren-Portnoy & Matthews, 2017; see also McCune & Vihman, 2001) and, further on, children's early productive vocabulary size predicts their later syntactic development, including early word combinations (Frank et al., in press; Marchman et al., 2004). In sum, while prior evidence led us to expect a stable onset of canonical babble across diverse input contexts, it would not have led us to expect cross-context stability in the onset of early lexical productions, as found here.
Following a similar set of findings regarding both the language environment and vocal maturity of Tseltal-learning children, Casillas and colleagues (2019) suggested three ways in which children might proceed in language learning without delay despite hearing relatively little directed speech: (a) an ability to learn from observing others' language use (see also de León, 2011; Rogoff et al., 2003; Shneidman, 2010), (b) capitalizing on regularities in language used during day-to-day routines, and (c) benefiting from a natural cycle in which children frequently sleep following short bursts of interactional linguistic input. In this third case, the idea is that short-term memories of directed input are consolidated before significant interference takes place (Gómez, Bootzin & Nadel, 2006; Horváth, Liu & Plunkett, 2016; Kurdziel, Duclos & Spencer, 2013; Mullally & Maguire, 2014). These three proposals for Tseltal children, which are not mutually exclusive, may also apply in the case of Rossel children, considering that the overall characteristics of their linguistic environments are not dissimilar.
Mechanisms for language learning that efficiently capitalize on sparse bursts of CDS and/or overhearable speech (e.g., massed learning, as in Schwab & Lew-Williams, 2016; or attention to others' talk, as in Akhtar, 2005 and Shneidman, Arroyo, Levine &) may help us understand the current findings. Further, theoretical models of language learning that (a) make the most of each linguistic "datapoint" in the input and (b) enable rapid uptake of streams of talk (e.g., when observing speech between others) may be key to explaining language development in this kind of context. For example, prediction-based models allow the learner to compare the predicted vs. observed properties of each utterance as it unfolds, with recalibration when errors are detected (Chang, Dell & Bock, 2006; Christiansen & Chater, 2016; Elman, 1990, 1993; McCauley & Christiansen, 2017). Such models hypothetically make the most of each utterance by rapidly updating knowledge on the basis of both the occurrence and non-occurrence of expected events (see Rabagliati, Gambi & Pickering, 2016 for a balanced overview). In contrast, models of learning that rely on pedagogical cueing or frequent and fitted responses to infant vocalizations by an adult caregiver are not easily reconciled with the results presented here, nor indeed those reported for several other rural, traditional communities (Cristia, Dupoux, Gurven & Stieglitz, 2017; Gaskins, 2006; Ochs & Schieffelin, 1984; Vogt, Mastin & Schots, 2015).

Limitations
Our language outcome measures, which track the onset and relative usage frequency of broad linguistic phenomena, crucially differ from those used in prior work establishing a relationship between child vocabulary and input quality measures (e.g., Cartmill et al., 2013; Hirsh-Pasek, Adamson, Bakeman, Owen, Golinkoff, Pace, Yust & Suma, 2015; Ramírez, Lytle & Kuhl, 2020; Ramírez-Esparza, García-Sierra & Kuhl, 2014; Rowe, 2012). Vocabulary development on Rossel Island may be similarly responsive to the type and quantity of CDS children encounter; for example, referentially transparent utterances would theoretically still facilitate the acquisition of word meanings. That said, our impression is that such variation does not play a meaningful role in Rossel children's development as full-fledged members of the language community. So, future work along those lines would likely be limited to interpreting such effects with respect to the mechanisms underlying lexical category formation, and not as prerequisites for normative language development. With respect to input quality measures, we are similarly unable to assume that the features of language experience considered to be "quality" in a North American middle-class context also happen to promote the suite of language behaviors particular to Yélî Dnye speakers. Instead, we here use target-child-directed speech as a proxy for the quantity of tailored input children hear; that is, we focus here on the quantity of input we know to be designed for the child's attention and ability at the moment the speech was uttered.

Conclusion
We estimate that, on average, children on Rossel Island under age 3;0 hear 3.13 minutes of directed speech per hour, with an average of 14.45 minutes per hour during peak interactive moments during the day. Most directed speech comes from adults, but older children hear more directed speech from other children. There is also an average of 35.90 minutes per hour of overhearable speech present. Older children heard more directed speech and less overhearable speech than younger children. Bursts of speech featuring mostly TCDS appear to be present from infancy onward. Despite this relatively low rate of directed speech, these children's vocal maturity appears on track with norms for typically developing children in many other populations (Cychosz et al., 2019; Lee et al., 2018; Warlaumont et al., 2014). The present findings thus join the numerous other documented cases of non-delayed language development without frequent child-directed speech (Brown, 2011; Brown & Gaskins, 2014; Cristia et al., 2017; de León, 2011; Gaskins, 2006; Ochs, 1988; Ochs & Schieffelin, 1984; Rogoff et al., 2003; Schieffelin, 1990). Our findings diverged in several ways from expectations developed on the basis of prior ethnographic work in this community, including the frequency of child-directed talk and the distribution of talk over the course of the day. When considered together with data from a Tseltal Mayan community, the findings suggest that estimates of input rate that are derived from daylong data are far more sensitive to situational variation (e.g., the number of speakers present, which varies with activity) than they are to established ideological variation in how caregivers talk to children. Whether child language development is better predicted by meaningful individual differences in average situational variation in input rate, ideologically based variation in other verbal behaviors (e.g., who talks to the child), or something in between, is a question for future work.
Cross-cultural and cross-linguistic data will have a major role to play in teasing out the causal factors at play in this larger issue relating children's early linguistic experience to their later language development.
The data presented here come from an evolving corpus of Yélî Dnye developmental data; any reader interested in citing descriptive features of the Rossel child language environment (e.g., TCDS rate) or in replicating or extending these analyses is strongly encouraged to visit the following address for up-to-date estimates: https://middycasillas.shinyapps.io/Yeli-Child_Language_Environment/. The information on that linked page will include any new data, annotations, and analyses added after the publication of this study.
Notes
1 While a comparison between the Rossel and Tseltal communities is still confounded by numerous other cultural and linguistic differences, their similarity in subsistence lifestyle facilitates comparative interpretations more than either community compared to a post-industrial one.
2 Local schools include elementary (∼3 years; ages ∼7-10) and primary (∼6 years; ages ∼10-16) education. Subsequent education is not locally available and students pursuing this route must find accommodations on other islands in the region or on mainland PNG.