Spontaneous lexical overlap is a common linguistic phenomenon in conversation, occurring when a speaker repeats lexical content from their communication partner’s prior utterance (Che et al., 2018; Sokolov & MacWhinney, 1990; Speidel & Nelson, 1989). Lexical overlap occurs in adult–adult and adult–child conversations (Pickering & Garrod, 2004), across languages (Casla et al., 2022; Che & Brooks, 2021; Chieng et al., 2024; Clark & Bernicot, 2008), and with considerable frequency (Che & Brooks, 2021). In early parent–child conversations, lexical overlap occurs in about 15%–25% of child utterances and 7%–14% of parent utterances (Casla et al., 2022; Che et al., 2018; Che & Brooks, 2021; Masur & Rodemaker, 1999). As such, lexical overlap is a pervasive feature of discourse (Chieng et al., 2024; Pickering & Garrod, 2004; Užgiris et al., 1989).
In addition to being common, lexical overlap has clear benefits within parent–child conversations. It can serve a range of communicative functions, such as confirming, clarifying, or extending a message, which establishes common ground in a transparent way (Clark, 2015; Clark & Bernicot, 2008; Pickering & Garrod, 2004). Lexical overlap allows toddlers to participate in conversation without having to generate novel linguistic content (Kirchner & Prutting, 1987; Snow, 1981). In caregiver utterances, lexical overlap offers children semantically related and temporally adjacent responses (Che et al., 2018), a feature of child-directed speech shown to facilitate language learning (Masek et al., 2021). Taken together, lexical overlap enables toddlers and caregivers to co-construct ideas over successive utterances by repurposing words from one utterance to the next (Pickering & Garrod, 2004). This may promote sustained engagement in conversation (Užgiris et al., 1989) and potentially maximise children’s exposure to responsive, child-directed speech. Both characteristics are central to social (Bruner, 1982; Tomasello, 1995) and transactional (Camarata & Yoder, 2002; Sameroff & Chandler, 1975) theories of language development.
Most studies examining lexical overlap have used summary measures, characterising the overall frequency and proportion of parent and child lexical overlap across conversations (Casla et al., 2022; Che et al., 2018; Che & Brooks, 2021; Masur & Rodemaker, 1999). These studies have revealed a positive, concurrent relationship between parent and child lexical overlap (Casla et al., 2022; Masur, 1987; Masur & Eichorst, 2002; Masur & Rodemaker, 1999) and demonstrated that lexical overlap, particularly parent lexical overlap, is associated with children’s language outcomes later in development (Casla et al., 2022; Che et al., 2018; Fusaroli et al., 2023; Masur & Eichorst, 2002; Olson & Masur, 2012; Soler et al., 2023; Tamis-LeMonda et al., 2001; Taumoepeau, 2016; Užgiris et al., 1989). However, summary measures offer limited insight into how lexical overlap unfolds within parent–child conversations in real time.
Sequential measures of lexical overlap fill this gap by capturing its turn-by-turn dynamics. Studies employing sequential measures have found that lexical overlap often elicits contingent responses across English-speaking (Farrar, 1987; Masur & Olson, 2008; Olson & Masur, 2012; Scherer & Olswang, 1984) and French-speaking (Clark & Bernicot, 2008) parents and children. For instance, Masur and Olson (2008) found that parents responded to toddler lexical overlap approximately 90% of the time, and toddlers responded to parent lexical overlap approximately 80% of the time. Moreover, speakers respond to lexical overlap with lexical overlap (Clark & Bernicot, 2008; Farrar, 1987; Masur & Olson, 2008; Scherer & Olswang, 1984). Both Farrar (1987) and Scherer and Olswang (1984) examined two specific forms of lexical overlap and found that 2-year-old children were more likely to respond with an imitation if their parents had first produced an expansion. Unfortunately, relatively few studies use sequential measures (Clark & Bernicot, 2008; Farrar, 1987; Masur & Olson, 2008; Scherer & Olswang, 1984; Sokolov, 1993), likely because sequential analysis is time- and labour-intensive. There are compelling theoretical reasons to pursue this work. The transactional model of development emphasises that real-time interactions incrementally shape children’s developmental outcomes (Sameroff & Chandler, 1975).
Examining how lexical overlap influences caregiver and child participation in early conversations may shed light on how conversations emerge over time and pinpoint precise features of conversation that support children’s language development (Camarata & Yoder, 2002; Masur & Olson, 2008). To conduct this work, efficient tools for sequential analysis are needed.
The CHIP and CHIPUTIL programs in Child Language ANalysis (CLAN) were developed to automate the coding of lexical overlap (MacWhinney, 2000). CHIP analyses source–response utterance pairs and codes the response for matches, additions, deletions, and substitutions (Sokolov & MacWhinney, 1990). CHIP has been used in several investigations to efficiently compute summary measures of lexical overlap (Casla et al., 2022; Che et al., 2018; Che & Brooks, 2021; Soler et al., 2023). However, it is not well suited for computing sequential measures. CHIP only codes the response utterance, without linking it back to the source. Thus, automated CHIP measures only encompass the speaker who produced lexical overlap, not the speaker who produced the source utterance. For a sequential analysis, it is necessary to derive measures for both speakers.
To address this limitation, and at the request of the first author, the developers of CLAN created CHIPUTIL. CHIPUTIL extends CHIP’s functions by coding the source utterance and linking it to the response utterance coded by CHIP. Additionally, CHIPUTIL can extract and organise these linked source–response pairs into two subsets: those in which the response contains lexical overlap and those in which it does not. Researchers can then apply any CLAN program to these subsets to compute a range of linguistic and discourse measures. Measures computed on the source utterances offer information about the first turn in a conversational exchange, and the linked response utterance indicates whether lexical overlap was produced on the subsequent turn or not. Thus, automated measures can be computed on source utterances and compared relative to the presence or absence of lexical overlap in the response utterances.
In this study, we examine whether certain types of parent utterances increase the likelihood of children’s lexical overlap in their subsequent turns. We focus on parent imitations and expansions because these are associated with child lexical overlap in studies employing summary measures (Casla et al., 2022; Masur, 1987; Masur & Eichorst, 2002; Masur & Rodemaker, 1999) and in the few small-scale studies employing sequential analyses (Farrar, 1987; Scherer & Olswang, 1984). To illustrate the differences in these approaches, we ask the following research questions:
1. Is there an association between a summary measure of child lexical overlap and a combined summary measure of parent imitations and expansions?
2. Do parent imitations and expansions increase the likelihood of lexical overlap in the child’s subsequent turn?
We hypothesise positive associations for both questions and expect the sequential measures to offer more insight into how lexical overlap contributes to parent–child conversations.
1. Method
1.1. CHILDES corpus
The current study utilised the Champaign longitudinal corpus (Rispoli & Hadley, n.d.) in the Child Language Data Exchange System (MacWhinney, 2000). The corpus contains transcripts of parent–child language samples recorded at child ages 1;9, 2;0, 2;3, 2;6, 2;9, and 3;0 (years;months). All 44 children in this database spoke General American English, were developing language typically, and had no significant medical history. Language samples were collected in a lab playroom where researchers provided a standard set of toys (i.e., baby doll with accessories, play kitchen set, bubbles, blocks, windup toys, and bowling set) and instructed parents to “play as they would at home” for 30 minutes (Hadley et al., 2014). Language samples were video recorded and transcribed by research assistants (for full procedures, see Hadley et al., 2014).
We analysed 44 parent–child language samples collected at child age 1;9 (SD = 0.38 months). Per parent report, children were 86.4% White (n = 38), 9.1% Black (n = 4), and 4.5% Mixed race (n = 2). No children were reported to be Hispanic. Parents included 42 mothers and 2 fathers. Parent education levels were a high school diploma (n = 2), some college or an associate’s degree (n = 7), a bachelor’s degree (n = 21), and an advanced degree (n = 14).
1.2. Procedures
Transcripts were analysed using CLAN (MacWhinney, 2000). For detailed procedures, refer to Appendix B of Harrington (2025). We first standardised the spelling of lexical items using the FREQ and CHSTRING commands to ensure that CHIP could accurately identify matches among child-like variations of lexical items (e.g., duck and ducky). We used the CHIP program to code for parent and child lexical overlap (Figure 1). Child utterances were coded for spontaneous lexical overlap, which was adapted from the CLAN definition (MacWhinney, 2000) and operationalised as the immediate overlap of meaningful lexical content in a spontaneous (i.e., unprompted) manner. To achieve this, we added two operators to the CHIP command line. The +q3 operator narrowed CHIP’s analysis window from six to three utterances, so only child utterances within two utterances of the parent were coded. The -h operator excluded conversational devices (e.g., yeah and okay) and high-frequency function words (e.g., the, is, and you), an approach adapted from Sokolov and Moreton (1994), directing CHIP to ignore less meaningful instances of child lexical overlap (e.g., CHIP ignored the overlap of yeah in M: Yeah; C: Yeah). Child lexical overlap codes were placed on a %chi tier. After running CHIP, we used the FREQ command to search for the words say and tell on the parent speaker tier to find instances where parents explicitly prompted for lexical overlap (e.g., Say dog) and the children responded with prompted lexical overlap (e.g., Dog). We identified 156 parent prompts (n = 30 parents), to which children (n = 15) responded 44 times with prompted lexical overlap. The %chi tiers were removed from these 44 utterances, so the remaining CHIP codes reflected only spontaneous lexical overlap.
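To make the +q3 window and the -h exclusion concrete, the following Python sketch flags child utterances that share content words with a recent parent utterance. It is a simplified illustration under stated assumptions, not CHIP’s matching algorithm: transcripts are assumed to be lists of (speaker, words) pairs, CHI is assumed to be the child’s speaker code, and EXCLUDED is a hypothetical stand-in for the exclusion list passed with -h. Prompted responses (e.g., to Say dog) would still need to be removed afterwards, as described above.

```python
# Hypothetical stand-in for the -h exclusion list (conversational devices and
# high-frequency function words); the study's actual lists may differ.
EXCLUDED = {"yeah", "okay", "the", "is", "you", "a", "it"}


def child_overlap_flags(utterances, window=3, excluded=EXCLUDED):
    """Map each overlapping child utterance index to the words it shares with
    a parent utterance among the previous (window - 1) utterances.

    utterances: list of (speaker, list_of_words) tuples in transcript order.
    """
    flags = {}
    for i, (speaker, words) in enumerate(utterances):
        if speaker != "CHI":
            continue
        # Drop excluded words before matching, as -h does for the child pass.
        content = {w.lower() for w in words} - excluded
        shared = set()
        # With window=3, look back at most two utterances (mirrors +q3).
        for j in range(max(0, i - (window - 1)), i):
            prev_speaker, prev_words = utterances[j]
            if prev_speaker != "CHI":
                shared |= content & {w.lower() for w in prev_words}
        if shared:
            flags[i] = sorted(shared)
    return flags


transcript = [
    ("MOT", ["yeah"]),
    ("CHI", ["yeah"]),               # ignored: "yeah" is excluded
    ("MOT", ["look", "a", "duck"]),
    ("CHI", ["duck"]),               # flagged: overlaps the prior parent turn
]
print(child_overlap_flags(transcript))  # {3: ['duck']}
```

Because the lookback is window - 1 utterances, a word repeated from three utterances earlier is not flagged, which mirrors the narrowed analysis window described in the text.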

Figure 1. CHAT transcript excerpt and study-relevant code descriptions. Note: This CHAT transcript is annotated to define CHIP and CHIPUTIL codes mentioned throughout this paper. Refer to the CLAN manual (MacWhinney, 2000) for the description of other codes.
We ran the CHIP program again to code for parent imitations and expansions. A parent imitation was defined as an exact replication of all or some of the child’s words, whereas a parent expansion was defined as an exact replication of all the child’s words with new additions (MacWhinney, 2000). Again, the +q3 operator was added to the base CHIP command to set the CHIP analysis window to three utterances, and the -h operator was added to direct CHIP to ignore conversational devices. Unlike the child CHIP coding pass, high-frequency function words were not excluded when coding parent imitations and expansions because adding function words to make a child’s utterance more grammatically complete is a recognised form of expansion (Baker & Nelson, 1984). Parent lexical overlap codes were placed on an %adu tier.
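The imitation and expansion definitions can be expressed as a small decision rule. The sketch below is a hypothetical, set-based approximation: it ignores word order and substitutions (which CHIP also tracks), and it leaves uncoded any parent utterance that repeats only some child words while adding new ones, since the definitions above do not assign that case to either category. The labels reuse the CHIP code names that the study counts ($EXACT and $REDUC for imitations, $EXPAN for expansions).

```python
def classify_parent_response(child_words, parent_words):
    """Hypothetical set-based approximation of CHIP-style codes for a parent
    utterance that follows a child utterance.

    Returns "EXACT", "REDUC", "EXPAN", or None.
    """
    child = {w.lower() for w in child_words}
    parent = {w.lower() for w in parent_words}
    shared = child & parent
    if not shared:
        return None                      # no lexical overlap at all
    added = parent - child
    if shared == child:
        # Every child word was repeated; additions make it an expansion.
        return "EXPAN" if added else "EXACT"
    if not added:
        return "REDUC"                   # only some child words, nothing new
    return None  # partial repetition plus additions: not coded in this sketch


# Function words count as additions here, as in the parent coding pass.
print(classify_parent_response(["dog", "run"], ["the", "dog", "can", "run"]))  # EXPAN
```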
The first author verified the accuracy of CHIP coding by checking for errors of commission (i.e., CHIP coded overlap in error) and errors of omission (i.e., CHIP missed an instance of overlap) across every child and parent utterance in nine randomly selected transcripts (20%). For child lexical overlap, four errors of commission and three errors of omission were found across 1,144 child utterances (99.4% accuracy). For parent imitation and expansion, two errors of omission were found across 3,248 parent utterances (99.9% accuracy).
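The reported accuracy values are consistent with dividing the number of checked utterances without an error by the total number checked. A quick check in Python, assuming each error occurred in a different utterance:

```python
def coding_accuracy(n_utterances, commission, omission):
    """Percentage of checked utterances with no coding error, assuming each
    error affects a distinct utterance."""
    return 100 * (n_utterances - commission - omission) / n_utterances


child = coding_accuracy(1144, commission=4, omission=3)   # 1,137 / 1,144
parent = coding_accuracy(3248, commission=0, omission=2)  # 3,246 / 3,248
print(f"{child:.1f}% {parent:.1f}%")  # 99.4% 99.9%
```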
Finally, we used the CHIPUTIL program to link parent source utterances with child responses based on the existing CHIP codes in the transcript (Figure 1). A $SOURCE code was placed on a %chU tier of the parent utterance if the child response had a CHIP overlap code on the %chi tier. A $NON code was placed on the %chU tier of the parent utterance if there was no child utterance within the next two utterances or if the child response did not have a CHIP overlap code on its %chi tier. The +s operator extracted all pairs of parent $SOURCE utterances and child responses into a new file. The -s operator extracted all parent $NON utterances and any corresponding child responses into a new file. This allowed measures to be automatically computed on different subsets of parent utterances: those followed by child lexical overlap responses ($SOURCE) and those that were not ($NON).
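The linking step can be sketched as follows. This is a hypothetical re-creation of the rule described above, not CHIPUTIL’s code: each utterance is assumed to be a (speaker, source_idx) pair, where source_idx records which parent utterance a CHIP-coded child response overlaps with (the three-utterance window having already been enforced by CHIP). Parent utterances referenced by a coded child response become $SOURCE; all others become $NON, whether the child was silent or responded without overlap.

```python
def link_sources(utterances):
    """Split parent utterance indices into $SOURCE and $NON groups.

    utterances: list of (speaker, source_idx) tuples. For a child ("CHI")
    utterance that CHIP coded for overlap, source_idx is the index of the
    parent utterance it overlaps with; otherwise source_idx is None.
    """
    linked = {src for spk, src in utterances if spk == "CHI" and src is not None}
    parent_ids = [i for i, (spk, _) in enumerate(utterances) if spk != "CHI"]
    sources = [i for i in parent_ids if i in linked]      # would get $SOURCE
    nons = [i for i in parent_ids if i not in linked]     # would get $NON
    return sources, nons


example = [
    ("MOT", None),  # 0: followed by an overlapping child response
    ("CHI", 0),     # 1: CHIP-coded overlap with utterance 0
    ("MOT", None),  # 2: no overlapping child response
    ("FAT", None),  # 3: no overlapping child response
    ("CHI", None),  # 4: child response without overlap
]
print(link_sources(example))  # ([0], [2, 3])
```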
1.3. Variables
We used the MLU and FREQ commands to compute the mean length of utterance in morphemes, the number of different words, and the frequency of communication acts and analysis set utterances. Communication acts were defined as any verbal or nonverbal utterance in the transcript, including fully unintelligible utterances. Analysis set utterances were a subset of communication acts, excluding utterances that were (a) nonverbal, (b) fully unintelligible, (c) made up exclusively of words on the conversational device and high-frequency function word lists, and (d) prompts for lexical overlap (for parents) or responses to these prompts (for children).
The FREQ command was used to compute the frequency of child spontaneous lexical overlap, parent imitation, and parent expansion. Refer to the CHIP section in the CLAN manual for the definitions of the codes (MacWhinney, 2000). Child spontaneous lexical overlap was computed by counting the $DIST codes on the %chi tier; this code was chosen because it is present on every instance of lexical overlap, regardless of type. The frequency of parent imitations was computed by totalling the $EXACT and $REDUC codes on the %adu tier, and the frequency of parent expansions was computed by totalling the $EXPAN codes on the %adu tier. The parent imitation and expansion counts were summed to form a combined summary measure of parent imitation and expansion.
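The summary counts amount to tallying code names on the two coding tiers. In the Python sketch below, the tier lines are simplified, assumed examples rather than verbatim CHIP output; extracting whole code names with a regular expression keeps a token such as $EXA:duck from being counted as $EXACT.

```python
import re
from collections import Counter

# Matches a whole code name such as $DIST, $EXACT, or $EXA (stops at ":" or "=").
CODE = re.compile(r"\$[A-Z_]+")


def summary_counts(tier_lines):
    """Tally CHIP codes on %chi and %adu dependent tiers.

    tier_lines: lines such as "%chi:\t$DIST = 1"; this layout is an
    assumption for illustration, not a copy of CHIP's output format.
    """
    chi, adu = Counter(), Counter()
    for line in tier_lines:
        tier, _, body = line.partition(":")
        codes = CODE.findall(body)
        if tier == "%chi":
            chi.update(codes)
        elif tier == "%adu":
            adu.update(codes)
    imitation = adu["$EXACT"] + adu["$REDUC"]
    expansion = adu["$EXPAN"]
    return {
        "child_overlap": chi["$DIST"],
        "parent_imitation": imitation,
        "parent_expansion": expansion,
        "parent_imit_plus_expan": imitation + expansion,
    }
```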
To compute sequential variables, FREQ summed the frequency of parent imitations and expansions in the $SOURCE transcripts and the $NON transcripts, separately. This resulted in four variables reflecting sequential parent➔child turns:
(a) parent imitation➔child spontaneous lexical overlap,
(b) parent imitation➔no child spontaneous lexical overlap,
(c) parent expansion➔child spontaneous lexical overlap,
(d) parent expansion➔no child spontaneous lexical overlap.
We computed the frequency of all other parent utterances by subtracting the frequency of parent imitations and expansions from the frequency of parent analysis set utterances in the $SOURCE and $NON transcripts, separately. This resulted in the final two sequential variables:
(a) frequency of other parent utterance➔child spontaneous lexical overlap,
(b) frequency of other parent utterance➔no child spontaneous lexical overlap.
Together, these six variables characterised every parent analysis set utterance by the type of utterance it was (i.e., parent imitation, parent expansion, and other) and whether children responded to it with spontaneous lexical overlap or not (i.e., $SOURCE or $NON).
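The subtraction that yields the “other” categories, and the resulting six variables, can be sketched for a single participant. The dictionary keys are hypothetical names for the counts obtained from the $SOURCE and $NON files:

```python
def sequential_variables(source, non):
    """Build the six parent->child variables for one participant.

    source, non: dicts with hypothetical keys "imitation", "expansion", and
    "analysis_set" (parent analysis set utterances), counted in the $SOURCE
    and $NON files, respectively.
    """
    out = {}
    for label, counts in (("overlap", source), ("no_overlap", non)):
        other = counts["analysis_set"] - counts["imitation"] - counts["expansion"]
        if other < 0:
            raise ValueError("imitations + expansions exceed analysis set utterances")
        out[f"imitation->{label}"] = counts["imitation"]
        out[f"expansion->{label}"] = counts["expansion"]
        out[f"other->{label}"] = other
    return out


# Hypothetical counts for one parent.
six = sequential_variables(
    {"imitation": 5, "expansion": 4, "analysis_set": 30},
    {"imitation": 10, "expansion": 12, "analysis_set": 370},
)
```

Because the six cells partition a participant’s analysis set utterances, their sum should equal that participant’s total, which is a useful sanity check.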
1.4. Data analysis
Descriptive statistics were used to summarise children’s spontaneous lexical overlap, parent imitation, and expansion. We analysed the association between the summary measures of child spontaneous lexical overlap and parent imitation and expansion using a bivariate Spearman correlation, which was appropriate for the non-normally distributed data.
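A Spearman correlation is the Pearson correlation computed on ranks, which is why it tolerates non-normal distributions. The pure-Python sketch below assigns tied values their average rank; in practice a statistics package (for example, cor.test(x, y, method = "spearman") in R) would be used. The degrees of freedom reported with the coefficient are n − 2, that is, 42 for the 44 dyads.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values given their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of average ranks.
    Raises ZeroDivisionError if either variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```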
Next, we restructured the dataset from a participant-level format to a long-form utterance-level format using the dplyr (Wickham et al., 2023) and tidyr (Wickham et al., 2024) packages in R. This dataset included 17,884 rows, each reflecting one parent analysis set utterance. Columns represented categorical variables of participant ID, parent utterance type, and child lexical overlap. Each parent contributed multiple utterances. This nested data structure justified the use of a mixed effects logistic regression (MELR) model, which was fit using the lme4 package (Bates et al., 2015) in R via maximum likelihood estimation with Laplace approximation. The binary outcome variable was the presence or absence of spontaneous lexical overlap in the child’s response. Parent utterance type was modelled as the fixed effect with three levels: parent imitation, parent expansion, and other. Parent “other” utterances were set as the reference level. Participant ID was modelled as the random effect to account for the nested data structure and subject-level variability. The base model (with only the random effect) and the full model (with both random and fixed effects) were compared with a likelihood-ratio test and their Akaike information criterion (AIC) values to evaluate final model fit. Odds ratios were computed by exponentiating the fixed effects coefficient estimates.
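The restructuring step turns each participant’s six counts into one row per parent utterance. The study did this with dplyr and tidyr in R (tidyr’s uncount() performs this kind of count-to-row expansion); the Python sketch below shows the same idea with hypothetical input names. Summed over all participants, the rows should total the 17,884 analysis set utterances.

```python
def to_long_format(participants):
    """Expand participant-level counts into one row per parent utterance.

    participants: {participant_id: {(utterance_type, child_overlap): count}},
    where utterance_type is "imitation", "expansion", or "other" and
    child_overlap is 1 if child overlap followed, else 0 (hypothetical layout).
    """
    rows = []
    for pid, cells in participants.items():
        for (utt_type, overlap), count in cells.items():
            rows.extend([(pid, utt_type, overlap)] * count)
    return rows


# Hypothetical counts for two participants.
data = {
    "P01": {("imitation", 1): 2, ("imitation", 0): 3, ("other", 0): 5},
    "P02": {("expansion", 1): 1},
}
long_rows = to_long_format(data)
```

Assuming a random intercept for each participant, this table would correspond to an lme4 model written as glmer(overlap ~ utterance_type + (1 | participant), family = binomial), with “other” as the reference level; this formula is an illustration, not the study’s reported specification.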
2. Results
Table 1 summarises descriptive data for summary measures of parent and child lexical overlap and related child language variables. Our first research question examined the association between summary measures of child spontaneous lexical overlap and combined parent imitations and expansions. A bivariate Spearman correlation revealed a moderately strong and positive association (r(42) = 0.656, p < .001).
Table 1. Descriptives of child and parent variables

Note: CDI = MacArthur Bates Communicative Development Inventory (Fenson et al., 2007).
a Computed by dividing spontaneous lexical overlap by analysis set utterances.
b Computed by dividing imitation + expansion by analysis set utterances.
Our second research question examined the sequential association between parent utterance type and child spontaneous lexical overlap. Table 2 collapses data across all participants to display the frequency of child spontaneous lexical overlap responses by parent utterance type. Two MELR models were fit to test whether parent imitation and expansion increased the likelihood of child lexical overlap. Relative to a base model with only Participant ID as a random effect, adding parent utterance type as a fixed effect significantly improved model fit (χ2(2) = 95.42, p < .0001). Table 3 presents model estimates and fit statistics. Fixed effects indicated that both parent imitation and expansion significantly increased the log-odds of child lexical overlap compared to the reference category of other parent utterances (both p < .0001). Odds ratios indicated that the odds of child spontaneous lexical overlap were more than twice as high following parent imitations (OR = 2.55, 95% CI [2.05, 3.18]) and parent expansions (OR = 2.23, 95% CI [1.82, 2.74]) as following other parent utterance types.
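Two quantities in this paragraph can be checked by hand. With 2 degrees of freedom (the two added fixed-effect parameters), the chi-square survival function has the closed form exp(−x/2), so the p-value of the likelihood-ratio statistic can be computed directly; and odds ratios are exponentiated log-odds coefficients. The CI helper below assumes Wald intervals, which the text does not specify (profile-likelihood intervals would differ slightly).

```python
import math


def lrt_p_value_df2(chi_sq):
    """Upper-tail p-value of a chi-square statistic with 2 df.
    For df = 2 the survival function is exactly exp(-x / 2)."""
    return math.exp(-chi_sq / 2)


def odds_ratio_ci(beta, se, z=1.96):
    """Exponentiate a log-odds coefficient and its Wald 95% interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


p = lrt_p_value_df2(95.42)  # on the order of 1e-21, far below .0001
```

Conversely, the reported OR of 2.55 for imitations corresponds to a log-odds coefficient of ln(2.55) ≈ 0.94.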
Table 2. Frequency and proportion of child spontaneous lexical overlap relative to parent utterance type

a The number of utterances in the “Yes” column divided by the total number of parent utterances in that row.
Table 3. Model estimates for the effect of parent utterance type on child spontaneous lexical overlap

Note: AIC = Akaike information criterion; SD = standard deviation.
3. Discussion
This study piloted the CHIPUTIL program, demonstrating its utility in efficiently analysing spontaneous lexical overlap in a sequential manner. This novel automated approach allowed us to include a larger sample of dyads and longer language samples compared to previous sequential analyses of lexical overlap (Clark & Bernicot, 2008; Farrar, 1987; Masur & Olson, 2008; Scherer & Olswang, 1984). Moreover, by conducting analyses with both summary and sequential measures, we revealed the added value of sequential analyses. Consistent with prior findings (Casla et al., 2022; Masur, 1987; Masur & Eichorst, 2002; Masur & Rodemaker, 1999), we observed a positive, moderately strong association between summary measures of child lexical overlap and parent imitations and expansions. Our sequential findings built on this, revealing that the odds of child lexical overlap were more than twice as high after a parent imitation or expansion as after other parent utterance types. These results suggest that the association observed between parent and child summary measures may be partially driven by sequences of parent–child lexical overlap.
Parent imitations and expansions appeared to invite children to participate in multi-turn exchanges. Although our analysis focused on the association between a parent utterance and an immediate child response, these sequences involved three turns: a child utterance, a contingent parent imitation or expansion, and a child response containing spontaneous lexical overlap. These sequences reflected moments of cohesive discourse, where children and parents built on each other’s contributions in semantically related ways (Pickering & Garrod, 2004). Parent imitations and expansions helped maintain an established topic and modelled lexical overlap as a natural and meaningful way to sustain conversation (Clark & Bernicot, 2008; Kirchner & Prutting, 1987; Pickering & Garrod, 2004; Snow, 1981; Taumoepeau, 2016; Užgiris et al., 1989). In turn, children repurposed their parents’ words in their responses. For these children, who were transitioning from single words to word combinations, it is possible that these sequences of parent–child lexical overlap allowed them to communicate messages they would not have been able to generate independently (Kirchner & Prutting, 1987; Snow, 1981). In this way, lexical overlap may play a key role in scaffolding early conversations that resemble the multi-turn structure and cohesion typical of adult–adult discourse.
Not only does this offer opportunities for children to actively participate in and sustain conversations early in language development, but it also creates conditions well suited for language learning (Casla et al., 2022; Che et al., 2018; Fusaroli et al., 2023; Hirsh-Pasek et al., 2015; Levickis et al., 2014; Masek et al., 2021; Tamis-LeMonda et al., 2001).
The findings point to promising future directions for understanding children’s language development. Grounded in the transactional model of development (Camarata & Yoder, 2002; Sameroff & Chandler, 1975), this work highlights the value of studying proximal, real-time transactions to inform the mechanisms underlying development. Our findings suggest that parent imitations and expansions promote child lexical overlap, and together, these sequences of parent–child lexical overlap create contingent, multi-turn exchanges. Similarly, an extensive body of research has demonstrated that parent input that is temporally adjacent, semantically related, and embedded in multi-turn interactions is associated with children’s language development (Hirsh-Pasek et al., 2015; Levickis et al., 2014; Masek et al., 2021; Tamis-LeMonda et al., 2001). Taken together, child lexical overlap may serve as an entry point into multi-turn conversations that set in motion the kinds of input and interactions that contribute to later language development. Our future work will explore how these parent–child lexical overlap sequences evolve across development, how early lexical overlap contributes to broader measures of multi-turn, contingent exchanges, and whether this transactional pathway explains later language growth. CHIPUTIL will allow for efficient replication of sequential analyses, offering a feasible approach to test these hypotheses.
We see promise in the CHIPUTIL program for the broader child language community. To the best of our knowledge, this is the first study to publish on this novel automated coding approach. A major benefit of automated programs is the ability to replicate work across labs in a scalable way. Moreover, CHIPUTIL is flexible in what it allows researchers to analyse. While our work is focused on parent and child lexical overlap, CHIPUTIL allows researchers to explore other features as well. For example, researchers might be interested in understanding whether child lexical overlap is dependent on parent utterance length. This could be easily adapted from our methodology, replacing measures of parent imitation and expansion with a measure of parent utterance length. In this way, CHIPUTIL is a broadly applicable resource for the field.
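As a sketch of how the utterance-length question might be set up, the hypothetical function below compares mean parent utterance length between $SOURCE and $NON utterances. It counts words rather than morphemes, a simplification of CLAN’s MLU measure, and assumes the utterances have already been extracted and labelled as described in the Method.

```python
from statistics import mean


def mean_length_by_outcome(parent_utterances):
    """Mean parent utterance length in words for $SOURCE vs $NON utterances.

    parent_utterances: list of (label, words) pairs, label "$SOURCE" or "$NON".
    Returns None for a group with no utterances.
    """
    groups = {"$SOURCE": [], "$NON": []}
    for label, words in parent_utterances:
        groups[label].append(len(words))
    return {label: (mean(lengths) if lengths else None)
            for label, lengths in groups.items()}


# Hypothetical labelled utterances.
example = [
    ("$SOURCE", ["big", "dog"]),
    ("$SOURCE", ["dog"]),
    ("$NON", ["what", "is", "that"]),
]
print(mean_length_by_outcome(example))  # {'$SOURCE': 1.5, '$NON': 3}
```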
While this study holds promise for future work, several limitations should be considered. The sample lacked racial, ethnic, and parental educational diversity, limiting generalisability. In addition, all children spoke General American English, which is common across the lexical overlap literature. This further motivates the need for efficient and replicable methodologies to extend work on lexical overlap to more diverse samples. While the utterance-level sequential analysis enabled fine-grained analysis of conversational sequences, we recognise our model was simple and did not include specific participant-level predictors. Future work should examine how parent and child characteristics influence sequential patterns. Moreover, this analysis was restricted to three-turn sequences. In future work, we plan to explore automated approaches that characterise longer sequences. Finally, our focus on a single time point limits our ability to speak directly to later language outcomes. However, we believe that the present findings are promising, offering new directions to understand how lexical overlap may contribute to child language development and an efficient and replicable approach to pursue this work.
Acknowledgements
The CHIPUTIL program was created by Brian MacWhinney and Leonid Spektor at the request of E.K.H. We thank them for their efforts.
Disclosure of the use of AI
The authors wrote all original text. ChatGPT was used to provide editorial suggestions to improve clarity and reduce word count for this Brief Report. The first author reviewed suggested changes, revising some and rejecting others, and the second and third authors reviewed and edited the final draft. The authors take full responsibility for the content of this publication.
Funding statement
Data collection for the archival data used in this study was supported by National Science Foundation Grant BCS-08-22513, awarded to M.R. and P.A.H. Secondary analysis was partially supported by an Illinois Distinguished Fellowship awarded to E.K.H. by the Graduate College at the University of Illinois Urbana–Champaign.
Competing interests
The authors declare no competing interests.