Children are exceptional information seekers (e.g., Mani & Ackermann, 2018; Saylor & Ganea, 2018). When faced with novelty or uncertainty, children direct their gaze to gather information from speakers about objects and labels (e.g., Hembacher, deMayo, & Frank, 2020; Vaish, Demir, & Baldwin, 2011) and point to elicit linguistic input (e.g., Begus, Gliga, & Southgate, 2014; Lucca & Wilbourn, 2018). Children’s ability to solicit language-relevant information about objects and actions develops rapidly past these early nonverbal attempts; preschoolers demonstrate a robust ability to verbally request information about labels and word meanings (e.g., Chouinard, Harris, & Marastos, 2007; Janakiefski et al., 2022; Janakiefski, Guicherit, & Saylor, 2024; Jimenez, Sun, & Saylor, 2018). While it is clear that children are adept at seeking information from others, it is less clear what the costs or benefits of such information-seeking episodes may be. The present study investigates whether children experience retention benefits after asking a question. To investigate this, we focus on the special case of information-seeking during word learning.
Prior research on children’s word learning has clarified that preschoolers leverage a host of skills to rapidly grow their vocabularies. For example, they learn words by tracking how often objects they see co-occur with labels they hear (e.g., Smith & Yu, 2008), by drawing on social cues from speakers (e.g., Baldwin, 1993; Baldwin & Tomasello, 1998), and by extracting meaning from syntactic information (e.g., Naigles, 1990). Notably, the most-cited theories of word learning have primarily painted the child as a passive recipient of adult input, absorbing the cues that are present in the environment (Bloom, 2000).
However, recent work has clarified that children also ask questions about novel vocabulary and that they tailor their questions to their informational needs. In one study, preschoolers who were presented with pictures of unfamiliar objects asked more than twice as many questions about the name of the object as about the object’s function (Ünlütabak, Nicolopoulou, & Aksu-Koç, 2019). Other work has found that preschoolers ask more questions about novel than about familiar labels and continue asking questions if they receive uninformative definitions (Janakiefski, Tippenhauer, et al., 2022). This finding aligns with research that shows children prefer informants who offer informative, rather than uninformative, definitions (Tippenhauer, Sun, Jimenez, Green, & Saylor, 2020). Together, this work provides suggestive evidence that when asking questions about new words, children are attuned to the potential for knowledge acquisition, targeting unknown information with their questions. These findings are in agreement with work on preschoolers’ use of explanatory (i.e., why) questions (e.g., Frazier, Gelman, & Wellman, 2009, 2016; Mills & Landrum, 2016; Mills, Legare, Bills, & Mejias, 2010), suggesting a similar mechanism may be at play.
Asking effective questions about word meanings and other content relies on many different cognitive and linguistic abilities (e.g., Jimenez et al., 2018; Ronfard, Zambrana, Hermansen, & Kelemen, 2018). Jimenez et al. (2018) describe cognitive processes involved in asking questions about new words that include monitoring uncertainty about word meanings, being aware of lexical ignorance, and finding motivation in the desire to fill a gap in their word knowledge. Additionally, more general cognitive skills that are at play in question-asking include identifying who to ask and what information to ask about (Ronfard et al., 2018). Put another way, in the process of asking a question about a word, children must first recognise that information is lacking. Children can then reflect on their prior knowledge to identify what information to ask for, as well as how to phrase a question so that their speech partner understands what information to provide in response. In parallel, children must identify who to ask and whether the context is appropriate for a question. Finally, children evaluate the response to their question and decide whether additional information is needed. It is possible that this process leads to an activation of cognitive skills that benefit children’s learning. However, it is also possible that the cognitive load incurred by asking a question might lead to costs that impede or stall learning.
On the side of potential benefits, asking questions may heighten children’s attention to and encoding of the requested information, supporting retention. One proposal suggests that the process of recognising an information gap and seeking information leads to enhanced memory by increasing attention, encoding, and consolidation even at the neural circuitry level (i.e., PACE framework; Gruber & Ranganath, 2019). These generative learning behaviours also allow for deeper processing, resulting in more elaborated knowledge structures (Chi et al., 2018; Chi & Wylie, 2014). This “desirable difficulty” during a learning task is proposed to lead to deeper processing, which leads to better learning and retention (Bjork, 2018). There have been many studies describing the contexts in which children ask questions and whom they are likely to seek information from. However, there have been few direct tests of whether questions about words or other things enhance or augment learning. For this reason, in what follows, we focus on somewhat dated but classic studies that are most relevant to the current investigation.
Some classic research suggests that question-asking may support retention in first- to fifth-graders (Ross & Balzer, 1975; Ross & Killey, 1977). In this work, children were placed into pairs, and an experimenter showed them slides with pictures and descriptive statements. Both children in the pair were invited to alternate asking the experimenter questions in a detective game about the slides, and researchers compared children’s memory for the answers they received to answers they heard their partner receive. In both studies, children remembered answers to their own questions better than answers they heard their partner receive, even after a 3-day delay, suggesting that being the one who asks the question leads to better memory.
However, asking questions may not benefit learning. It is possible that the cognitive requirements of asking a question may instead make retention of the provided information difficult. Asking a question may split attention between the act of asking a question and attending to the response to the question. When attention is divided, memory performance declines (e.g., Craik, Naveh-Benjamin, Govoni, & Anderson, 1996). The combination of generating the question and attending to the answer may increase cognitive load compared to only attending to information that is offered. Novice learners are prone to high levels of cognitive load, resulting in challenges associated with organising information, which can limit the intake and retention of critical information (e.g., Kirschner, Sweller, & Clark, 2006; Sweller, 1988). As such, young children may have trouble encoding and linking the response they receive to the information they requested.
Given the potential cognitive difficulty of asking a question, young learners may benefit from being provided with direct instruction about word meanings rather than actively eliciting the information. Children may show more success at acquiring new words and meanings when provided with informative input from adults, rather than having to determine what relevant input would be useful to learn a new word. As such, listening to direct instruction or input could lead to greater retention of new information than question-asking.
Some suggestive evidence of this possibility comes from classic work with preschoolers, which has shown no benefit of asking questions for retention of the answers (Pierce, 1985, 1990). Pierce (1985) observed 3- and 4-year-olds during naturalistic interactions with science displays with their parents. Parents were given a sheet of information about each display and asked to discuss the information and answer their child’s questions during a 20-minute session; researchers recorded the child’s questions and any information the parent mentioned from the sheet. Overall, 3- and 4-year-olds recalled the answers to their questions less than the other information the parent had mentioned, suggesting that asking questions may have impaired their memory for the information they requested. In a similar study, Pierce (1990) showed preschoolers’ memory for the answers to their own questions was no different from their memory for other information, suggesting that asking questions provided no added benefit. Importantly, in both studies, children’s variable knowledge of the topic may have influenced the frequency of question-asking (and subsequent retention of information). Regardless, Pierce (1985) and Pierce (1990) indicate that asking questions may impact learning in variable ways, not necessarily providing a boost for retention.
It is clear from the preceding discussion that the previous work investigating the impact of question-asking on retention has produced conflicting findings. Some evidence suggests that asking questions may benefit children’s memory for novel word meanings (e.g., Ross & Balzer, 1975; Ross & Killey, 1977). Other evidence suggests that asking about novel word meanings may not provide a boost in retention compared to listening (e.g., Pierce, 1985, 1990). However, features of past studies may have obscured the effect of asking questions versus listening on retention. Ross and Balzer (1975) and Ross and Killey (1977) compared whether children were more likely to remember answers to their own versus others’ questions during a collaborative problem-solving game. Because two children were engaged in the game, collaborative learning was confounded with the effect of asking questions, making it difficult to determine whether peer dynamics influenced the findings. Further, the work by Pierce (1985, 1990) did not experimentally control opportunities for question-asking versus listening, but used the naturalistic questions that children produced during play observations, meaning that there could be confounding variables influencing the findings. As such, it is still unclear whether asking a question differs from listening to information provided directly in terms of retention.
To address these limitations, in the current study, preschool children were taught novel words in either a Question-Asking or Listening condition. In both conditions, children were asked to move pictures of novel objects around according to instruction statements that contained a novel label (e.g., “Can you put the teebu in the bucket?”). The target pictures were placed in arrays with other novel pictures, such that the referent of a given novel label was ambiguous. In the Question-Asking condition, children had the opportunity to ask questions before being given object descriptions. In the Listening condition, children were given object descriptions immediately after hearing the new word (so there was no time for them to ask a question before receiving the relevant information). At test, children were asked to select the targets.
If asking questions enhances attention to the requested information, children who have an opportunity to ask a question should show higher rates of selecting the targets at test than those who are provided with the same information unprompted. Alternatively, if asking a question does not benefit learning, children who have an opportunity to ask a question should select the target at the same or lower rates than those who are provided with the same information unprompted. Random assignment ensured that differences in children’s responses during the experimental task were the result of the manipulation rather than individual or contextual factors. While these factors are undoubtedly important determinants of children’s behaviour, investigating their impact was not the focus of the current study.
1. Methods
1.1. Participants
Four- to six-year-old children (N = 64) participated in the study (M age = 4.99, SDage = 0.60, 31 females). Preschoolers were recruited from a database in a large southeastern city in the United States. Three additional children participated but were not included in the final sample due to failure to complete all exposure trials (n = 2), or experimenter error (n = 1). Participants were English-speaking with no reported diagnoses of developmental disorders, speech or language delays, hearing loss or impairment, or vision loss. Demographic information is listed in Table 1.
Table 1. Demographic information collected from participating families

1.2. Open practices
The hypotheses, design, and analysis plan for all experiments were pre-registered. The pre-registration documents are available at the project’s Open Science Framework (OSF) site (https://doi.org/10.17605/OSF.IO/3EZFR). Any analyses not included in the pre-registration are listed below as exploratory.
1.3. Sample size determination
Effect sizes were gathered from previous studies assessing children’s novel word learning (e.g., Axelsson, Churchley, & Horst, 2012; Horst & Samuelson, 2008; Marulis & Neuman, 2010; Partridge, McGovern, Yung, & Kidd, 2015; Roseberry, Hirsh-Pasek, & Golinkoff, 2014; Russo-Johnson, Troseth, Duncan, & Mesghina, 2017) as well as studies assessing question-asking (e.g., Blank & Covington, 1965; Ross & Balzer, 1975; Ross & Killey, 1977) and converted as needed (Borenstein, Hedges, Higgins, & Rothstein, 2009), with studies demonstrating consistently large effect sizes of d > 0.8. An effect size of d = 0.73 (i.e., corresponding to f = 0.37, OR = 3.75) was chosen as a meaningful difference in retention between Question-Asking and Listening conditions, as this translates to a one-word difference in retention (e.g., 3.5 vs. 4.5 words out of 6, with a standard deviation of 1.361 based on the largest observed standard deviation in Janakiefski, Printz, Warren, and Saylor (2022)). A power analysis using G*Power with an α = 0.05 to detect an effect of d = 0.73 suggested a sample size of 62 participants in a between-subjects design (one-way ANOVA with six trials) would achieve 80% power. We ran an additional power analysis for a non-parametric test to ensure that we still had sufficient power when we had to pivot analyses (due to a violation of the normality assumption). The resulting analysis suggests a roughly equivalent number of participants (64) would be necessary.
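The effect-size arithmetic in this section can be checked with a short sketch. It assumes the standard conversions for two equal-sized groups, f = d / 2 and ln(OR) = d × π / √3 (the logistic approximation described in Borenstein et al., 2009); the function names are illustrative and are not taken from the original analysis scripts.

```python
import math


def cohens_d(mean_difference, pooled_sd):
    """Standardised mean difference between two groups."""
    return mean_difference / pooled_sd


def d_to_f(d):
    """Cohen's f for two equal-sized groups (assumed conversion: f = d / 2)."""
    return d / 2


def d_to_odds_ratio(d):
    """Odds ratio via the logistic approximation ln(OR) = d * pi / sqrt(3)."""
    return math.exp(d * math.pi / math.sqrt(3))


# One-word difference (3.5 vs. 4.5 words) with the SD of 1.361 reported above.
d = cohens_d(4.5 - 3.5, 1.361)
print(f"d = {d:.2f}")
print(f"f = {d_to_f(d):.2f}")
# The odds ratio is computed from the rounded d = 0.73 used in the text.
print(f"OR = {d_to_odds_ratio(0.73):.2f}")
```

Small differences in the last decimal place of the odds ratio can arise from rounding d before converting.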
1.4. Materials
Children were shown pictures of familiar and novel objects on laminated cards arranged in 2 × 2 arrays. There was one array per trial, and all arrays contained four pictures. The warm-up phase consisted of three arrays of all familiar pictures. The exposure phase consisted of 18 arrays; 12 arrays where all pictures were novel and six arrays where all pictures were familiar. The six familiar trials were interleaved with the 12 novel trials to keep the task engaging. Novel objects and labels were selected from the NOUN Database (Horst & Hout, 2016). Familiar objects were selected based on items in the MacArthur-Bates Communicative Development Inventories (Fenson et al., 2000) and pulled from a Google search. Each picture within a given array was distinct in colour (e.g., only one blue picture in a set). Presentation order as well as respective position in the arrays were counterbalanced. There was a list of three locations where participants were asked to place the selected object, which included a bucket, box, or “spaceship” (i.e., a container with a picture of a spaceship on the front). Locations were counterbalanced to avoid the final designation of the object affecting participants’ selections.
The test phase consisted of six arrays composed of pictures drawn from different exposure arrays: the target, a labelled novel distractor (i.e., requested during the exposure phase), a non-labelled novel distractor (i.e., seen during the exposure phase), and a labelled familiar distractor (i.e., requested during the exposure phase). Pictures only appeared once in the test phase. The test contained six words rather than all 12 from the exposure phase because (1) this allowed for the test to contain labelled competitors from the exposure phase and (2) reduced the total number of trials for preschoolers to complete. As such, two test lists of six words each, and a single set of six test arrays were created. Half of the participants were tested on one set of novel word–object pairs (e.g., List 1: spoov, parloo, virdex, mobi, fiffin, nelby), with the other set of novel word–object pairs serving as distractors (e.g., List 2: zoba, tulver, kibby, guffi, teebu, goola). The other half of participants had the reverse. The test arrays were counterbalanced for presentation order as well as respective positions within the 2 × 2 arrangement. Example exposure and test arrays are included in Figure 1.

Figure 1. Example exposure and test arrays.
Note. The top image shows example exposure trials, and the bottom image shows an example test trial. For the retention test, half of the children were tested on the items outlined in blue, and the other half were tested on the items outlined in orange.
1.5. Measures
1.5.1. Parent vocabulary report
Parents were asked to complete a vocabulary report as a check of whether children were familiar with the known items and descriptions (adjectives and colour words) used in the study. Novel words were not included in the report because they were nonsense words drawn from the NOUN Database (Horst & Hout, 2016).
1.5.2. Learning Attitudes Questionnaire
Parents were asked to complete the Learning Attitudes Questionnaire as a measure of children’s curiosity in everyday settings and tendency to try new things (Jirout, 2017). The questionnaire was included for exploratory analyses to consider how children’s curiosity at home would relate to how likely they were to ask questions and learn words in the study. Parents responded using a 5-point Likert scale: rarely/never true (1), not often true (2), sometimes true (3), often true (4), and always true (5). Example items include: “Asks many questions,” “Likes to take things apart to see how they work,” and “Likes to explore new places.”
1.6. Procedure and design
Participants were randomly assigned to either a Question-Asking (n = 32, M age = 4.97, SDage = 0.64, 15 females) or Listening condition (n = 32, M age = 5.01, SDage = 0.57, 16 females) with the constraint that mean age and the distribution of male and female participants were roughly equivalent across conditions. Neither household income nor maternal education differed across conditions (t(62) < −0.143, p > .887). All participants heard the same request for the novel item (e.g., “Now the aliens need a teebu. Can you get the teebu?”) and the same descriptions (e.g., “A teebu is purple and fuzzy. Yeah! The purple and fuzzy one is called a teebu. Can you put the teebu in the spaceship?”). What differed across conditions was whether participants had a chance to ask a question or were given the description immediately after the request for the item.
Participants were seated across from the experimenter at a table during the study. Participants first completed three warm-up trials in which they were shown familiar pictures (e.g., “bear,” “lollipop”) and asked to place them in a bucket. These trials familiarised participants with interacting with the experimenter and with the basic task structure.
1.7. Exposure phase
After the warm-up, participants were asked to help aliens get party supplies for their spaceship party. Participants were told that they should try to remember the party supplies because the aliens would ask about them later. Before the start of the exposure trials, to indicate that participants in the Question-Asking condition could ask questions, preschoolers were told “You might not know some of the words I use. That’s okay! You can ask me about them!” The instructions in the Listening condition differed slightly, where preschoolers were told “You might not know some of the words I use. That’s okay! I’ll tell you about them!” The difference in instructions across conditions was included to strengthen the condition manipulation.
On each exposure trial, participants were shown an array and asked to select a specific picture from the set based on a label (e.g., “Now the aliens need a teebu. Can you get the teebu?”) and to place it in one of the locations (bucket, box, or spaceship). In the Question-Asking condition, participants had the opportunity to ask questions after they heard the request containing the novel label. To give the child time to ask a question, the experimenter paused for 3 seconds after saying the novel label before providing the description (e.g., “A teebu is purple and fuzzy. Yeah! The purple and fuzzy one is called a teebu.”). Children heard the standardised descriptions of novel labels in response to their questions, regardless of the type of question they asked. In the Listening condition, participants were told the description of the novel picture immediately after they heard the novel label, with no pause between the label and the description. In both conditions, participants heard each novel label five times and heard each description two times in different sentence frames for each novel picture. For example, participants heard the novel noun at the start of the first repetition of the description (i.e., “A teebu is purple and fuzzy”) and then heard the novel noun at the end of the second repetition of the description (i.e., “The purple and fuzzy one is called a teebu”).
1.8. Test phase
At test, participants were told that the aliens invited more friends to their party, so they needed extra party supplies. Children were asked to select the target pictures as a measure of their word learning (e.g., “Can you put another teebu in the spaceship?”).
1.9. Coding
Question-asking frequency, immediate selection of target, and retention of target were coded during the session and verified by video after the session. Questions were transcribed from the videos and question type was coded after the session. To quantify question-asking frequency, for each trial, participants were given a score of 0 if they did not ask a question, and a 1 if they asked a question. If a child asked multiple questions on a given trial, they still only received a score of 1 for that trial. This happened rarely.
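The trial-level scoring rule described above can be sketched as follows. This is only an illustration of the rule (1 for any trial with at least one question, 0 otherwise, summed across trials); the function names and the example counts are hypothetical and do not come from the study data.

```python
def trial_score(n_questions):
    """Return 1 if the child asked at least one question on the trial, else 0."""
    return 1 if n_questions > 0 else 0


def question_asking_frequency(questions_per_trial):
    """Sum the binary trial scores across a participant's novel trials."""
    return sum(trial_score(n) for n in questions_per_trial)


# Hypothetical question counts for one participant across 12 novel trials.
# Trials with two or three questions still contribute a score of 1.
hypothetical_counts = [1, 0, 2, 1, 0, 0, 1, 1, 0, 3, 1, 1]
frequency = question_asking_frequency(hypothetical_counts)
print(frequency, "of", len(hypothetical_counts), "trials with a question")
```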
Questions were coded during the window after the child heard the novel label. Participants’ questions were also coded for the type of question they asked. The categories of questions were loosely adapted from Jirout (2017) and included: General questions (“what is a teebu?,” “what does that mean?,” “what does it look like?,” “I don’t know what parloo means”), Specific questions (“is it the blue and spiky one?,” “does it have bumps?”), and Other questions that fell outside of these two main categories of interest (“why does it look like that?,” “is it a toy?,” “what?”). Specific questions were also coded for whether the question matched (“is it the blue one?” for a blue target) or did not match the target (“is it the blue one?” for a red target). Participants’ selection of the target picture on each exposure trial and each test trial was coded for accuracy (0 or 1). Given the array of four objects available, chance performance was 25%. To count as a “selection,” the child had to place the object in one of the locations. Which location they placed it in did not factor into correctness.
A subset of the total sessions (20%, 14 sessions) was coded by a second independent coder to calculate interrater reliability for the question-asking data. Interrater reliability for question-asking frequency was high, with 100% agreement across coders (Cohen’s Kappa = 1.0). Interrater reliability for object selections (immediate selection and retention of the target) was high, with 100% agreement across coders (Cohen’s Kappa = 1.0). Interrater reliability for question type was high, with 98.9% agreement across coders (Cohen’s Kappa = 0.947).
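For readers less familiar with the reliability statistic reported here, the following is a minimal sketch of Cohen's kappa for two coders, using the standard definition (observed agreement corrected for the agreement expected by chance). The example codes are hypothetical and are not the study's coding data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical question-type codes for ten questions.
hypothetical_a = ["General"] * 6 + ["Specific"] * 3 + ["Other"]
hypothetical_b = ["General"] * 7 + ["Specific"] * 2 + ["Other"]
print(round(cohens_kappa(hypothetical_a, hypothetical_b), 3))
```

Because kappa discounts chance agreement, it is lower than raw percentage agreement whenever one category dominates, which is consistent with the pattern reported for question type (98.9% agreement, kappa = 0.947).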
2. Results
2.1. Parent vocabulary report
From the parent vocabulary report, all but two participants understood all nine familiar words (from three warm-up and six familiar exposure trials): one understood seven out of nine and the other eight out of nine (both in the Question-Asking condition). All but one participant knew all nine colour words used in the descriptions (blue, orange, yellow, red, black, green, grey, pink, purple); that participant knew all but “grey” (in the Listening condition). On average, participants understood 10.70 of the 12 adjectives used in the descriptions, with no significant difference across conditions (t(62) = −1.52, p = .133).
2.2. Question-asking frequency
Participants in the Question-Asking condition asked questions on 8.47 of 12 trials (71%, SD = 4.91), and participants in the Listening condition asked questions on 1.16 trials (9.6%, SD = 2.75; see Figure 2 for individual scores on the question-asking frequency measure). A Kruskal–Wallis test showed that participants asked more questions in the Question-Asking condition than the Listening condition (H(1, n = 64) = 26.49, p < .001). There was no trial order effect, such that participants did not change their frequency of question-asking in the first half of trials compared to the second half of trials in either condition (Friedman test χ2 (1) = 0.53, p = .47).

Figure 2. Mean number of trials with questions.
Note. Individual dots represent the number of trials each participant asked at least one question. Children were given a score of 0 or 1 on each trial, which were summed across the 12 novel trials.
2.3. Question type
Participants asked a total of 309 questions about novel labels: 82% were General, 17% were Specific, and < 1% Other. Of the 53 Specific questions, 22% matched the target, and 78% did not. When considering the 150 questions for target items that were tested (6 of the 12 from the exposure phase) 82% were General questions, 17% were Specific, and there was a single Other question. Of the 26 Specific questions about tested items, 23% matched the target, and 77% did not. For sample questions, see Table 2.
Table 2. Example questions for each question type category

The highest-level breakdown for question type (General, Specific, No Question) was included as a predictor of retention accuracy in the model described below. There was also a pre-registered comparison that further subdivided Specific questions into whether they matched or did not match the target. Due to the low frequency of Specific questions overall, the planned analysis to subdivide this category further and consider the effect on retention was not completed.
2.4. Immediate selection accuracy
For familiar trials, participants selected the target for an average of 5.84 out of 6 (97%) familiar trials in the Question-Asking condition (SD = 0.72), and 6 out of 6 (100%) trials in the Listening condition (SD = 0.00). For novel trials, participants in the Question-Asking condition selected the target an average of 11.16 trials out of 12 (93%, SD = 1.73), and participants in the Listening condition selected the target an average of 11.81 trials (98%, SD = 0.40; see Figure 3 for individual immediate selection scores on the novel trials).

Figure 3. Mean immediate selection accuracy.
Note. Mean number of correct selections for the immediate selection score during the Exposure phase. Individual dots represent the number of trials each participant made correct selections.
Participants in the Question-Asking condition (V = 528, p < .001) and the Listening condition (V = 528, p < .001) selected the target above chance (25%) according to Wilcoxon Signed Rank tests, indicating that participants used the descriptions in the moment to make their selections. Comparing performance across conditions with a Kruskal–Wallis test, participants in the Listening condition used the descriptions with more accuracy than the Question-Asking condition (H(1, n = 64) = 3.96, p = .047); however, there was an outlier with a score of 3 out of 12 trials, which was 2.9 standard deviations from the Question-Asking condition mean. When the outlier was removed, the difference was no longer significant (H(1, n = 63) = 3.19, p = .074). A Bayesian Mann–Whitney test indicated a Bayes factor (null/alternative) of BF01 = 8.16, indicating that the data were 8.16 times more likely to occur under the null hypothesis than the alternative hypothesis. A Bayes factor of 3 is typically considered moderate evidence and a factor of 10 is considered strong evidence (Jarosz & Wiley, 2014; Lee & Wagenmakers, 2014).
2.5. Retention test accuracy
During the retention test, participants in the Question-Asking condition selected the target an average of 2.81 trials out of 6 (47%, SD = 1.18), and participants in the Listening condition selected the target an average of 3.03 trials (50.5%, SD = 0.86; see Figure 4 for individual retention scores). The same pattern of results holds for proportions. Participants in the Question-Asking condition (V = 489.5, p < .001) and the Listening condition (V = 524, p < .001) selected the target above chance (25%) according to Wilcoxon signed-rank tests. Although not part of our original analysis plan, we coded the full range of selections children made during the delayed test in response to a reviewer concern. Children in the Question-Asking condition selected the named target/named distractor/unnamed distractor/familiar object at rates of 49.5%/41.7%/7.3%/1.6%. Children in the Listening condition selected either the named target (50.5%) or the named distractor (47.9%) and did not make a selection on the remaining 1.6% of trials. Even with the small differences in the types of errors children made, selections were remarkably similar across the two conditions.

Figure 4. Mean retention test accuracy.
Note. Mean number of correct selections during the Test phase. Individual dots represent the number of trials on which each participant made a correct selection. Note that these data include the 60 participants who completed all six test trials, as well as two participants who completed four of the six test trials and two participants who completed five of the six test trials.
To determine whether retention differs when asking a question compared to listening, the data were first analysed using a Kruskal–Wallis test to compare overall retention accuracy (score out of 6) across conditions. Participants showed no difference in retention accuracy across conditions (H(1, n = 64) = 0.201, p = .654). This suggests that the opportunity to ask a question to hear the label description did not provide a boost in retention compared to hearing the description unprompted. A Bayesian Mann–Whitney test indicated a Bayes factor (null/alternative) of BF01 = 5.64, indicating that the data were 5.64 times more likely to occur under the null hypothesis than the alternative hypothesis. A Bayes factor of 3 is typically considered moderate evidence and a factor of 10 is considered strong evidence (Jarosz & Wiley, Reference Jarosz and Wiley2014; Lee & Wagenmakers, Reference Lee and Wagenmakers2014).
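To make the Bayes factor easier to read, the sketch below re-expresses BF01 = 5.64 in two equivalent ways. It is a minimal illustration rather than part of the reported analysis: the function name `describe_bf01` is hypothetical, the verbal labels use only the two thresholds cited above (3 and 10), and the posterior probability assumes equal prior odds for the two hypotheses, which the original analysis does not state.

```python
def describe_bf01(bf01: float) -> dict:
    """Summarise a Bayes factor expressed as null over alternative."""
    if bf01 <= 0:
        raise ValueError("A Bayes factor must be positive")
    # Rough labels based only on the two thresholds cited in the text.
    if bf01 >= 10:
        label = "strong evidence for the null"
    elif bf01 >= 3:
        label = "moderate evidence for the null"
    else:
        label = "below the moderate threshold for the null"
    return {
        # Same evidence, expressed as alternative over null.
        "bf10": 1.0 / bf01,
        # Assumption: the two hypotheses are equally likely a priori.
        "posterior_null": bf01 / (1.0 + bf01),
        "label": label,
    }


summary = describe_bf01(5.64)
print(f"BF10 = {summary['bf10']:.3f}")
print(f"P(null | data), equal priors = {summary['posterior_null']:.3f}")
print(summary["label"])
```

On this reading, BF01 = 5.64 falls between the moderate and strong thresholds, as does the BF01 = 8.16 reported for immediate selection accuracy.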
Second, a more fine-grained analysis with mixed-effects logistic regression was used to test the trial-level effect of asking questions on target selection, while accounting for participant-level variability using the glmer function in lme4 (Bates, Mächler, Bolker, & Walker, Reference Bates, Mächler, Bolker and Walker2015) in R version 4.2.2 (R Core Team, 2021). For this, the dependent measure of test retention was whether (1) or not (0) children selected the target on that trial. A parsimonious random effects structure was determined using the buildmer function (Matuschek, Kliegl, Vasishth, Baayen, & Bates, Reference Matuschek, Kliegl, Vasishth, Baayen and Bates2017; Voeten, Reference Voeten2020) which indicated that all random effects should be removed from the model (equivalent to running a logistic regression). Due to an a priori interest in participant variability, we elected to maintain random intercepts for participants. Note that the participant-intercept estimates were at zero, meaning the results of the participant-intercept model are equivalent to the buildmer-identified logistic regression model.
With a random intercept for participant, the model included the fixed effects of condition (Question-Asking or Listening; 0.5 and −0.5), question type (General, Specific, or No Question; broken down into two contrasts described below), immediate selection accuracy (0 or 1), and age (continuous). Because of the infrequency of the Other questions (only a single Other question occurred for the items that were tested), this category was included in the General questions category for analysis. Two Helmert contrasts for question type were defined. The first contrast compared asking a question (General or Specific; 0.25 and 0.25) to not asking a question (No Question; −0.5), reported as “Ask versus Not Ask” in the results table. The second contrast compared asking a General question (0.5) to asking a Specific question (−0.5; No Question = 0), reported as “General versus Specific Question” in the results table.
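The contrast coding described above can be written out explicitly. The following sketch is illustrative only: the dictionary and function names are hypothetical, and the weights are copied from the description in the text. It checks that each contrast is centered (weights sum to zero) and that the two sets of weights are orthogonal.

```python
# Contrast weights as described in the text, one weight per question type.
LEVELS = ("General", "Specific", "No Question")
ASK_VS_NOT_ASK = {"General": 0.25, "Specific": 0.25, "No Question": -0.5}
GENERAL_VS_SPECIFIC = {"General": 0.5, "Specific": -0.5, "No Question": 0.0}


def is_centered(contrast: dict) -> bool:
    """A contrast is centered when its weights sum to zero."""
    return abs(sum(contrast[level] for level in LEVELS)) < 1e-12


def are_orthogonal(first: dict, second: dict) -> bool:
    """Two contrasts are orthogonal when the dot product of their weights is zero."""
    dot = sum(first[level] * second[level] for level in LEVELS)
    return abs(dot) < 1e-12


def code_trial(question_type: str) -> tuple:
    """Return the two contrast predictor values for a single trial."""
    return (ASK_VS_NOT_ASK[question_type], GENERAL_VS_SPECIFIC[question_type])


for level in LEVELS:
    print(level, code_trial(level))
print("centered:", is_centered(ASK_VS_NOT_ASK), is_centered(GENERAL_VS_SPECIFIC))
print("orthogonal:", are_orthogonal(ASK_VS_NOT_ASK, GENERAL_VS_SPECIFIC))
```

Orthogonality here refers to the weights themselves. With unequal numbers of trials per question type, the resulting predictor columns can still be correlated in the data.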
The results of the model showed that condition (p = .821), whether a child asked a question about that word (p = .056), whether a child asked a General compared to Specific question (p = .203), and age (p = .855) did not have a significant effect on whether the child selected the target during the retention test. Whether the child selected the target during the exposure phase, as indexed by immediate selection accuracy, showed a significant effect on whether the child selected the target during the retention test (OR = 4.98, p = .017). If a child selected the target on the immediate selection accuracy measure, their odds of selecting the target at test were 4.98 times higher. This suggests that children’s accuracy in their use of verbal information in the moment is important for their later retention (see Table 3 for the full results of the model). With the outlier on the immediate selection accuracy measure removed, the effect of immediate selection accuracy on test retention accuracy was no longer significant (p = .992) (see Table 4).
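An odds ratio multiplies odds rather than probabilities, so what OR = 4.98 means for the probability of a correct selection depends on the baseline. The sketch below illustrates this; the function name and the two baseline probabilities are hypothetical values chosen for illustration and are not estimates from the model.

```python
import math


def probability_after_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


REPORTED_OR = 4.98  # odds ratio reported in the text

# The corresponding logistic regression coefficient is the natural log of the odds ratio.
print(f"log-odds coefficient: {math.log(REPORTED_OR):.3f}")

# Hypothetical baselines (assumptions for illustration, not model estimates).
for baseline in (0.25, 0.50):
    shifted = probability_after_odds_ratio(baseline, REPORTED_OR)
    print(f"baseline {baseline:.2f} -> {shifted:.3f}")
```

In neither case does the probability itself rise by a factor of 4.98, which is why the effect is best described in terms of odds.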
Table 3. Logistic mixed-effects regression predicting accuracy at test

Note. Odds ratios, confidence intervals, and p-values for each of the fixed predictors included in the analysis of retention accuracy. The table displays the results when including the outlier score on the immediate selection accuracy measure.
Table 4. Contingency table of question type by retention test accuracy

Note. Question type is grouped into General, Specific, or No Question, displaying the counts of trials where participants did (1) or did not (0) select the target at test. This table does not account for by-participant grouping.
2.6. Exploratory analyses
As an exploratory analysis, correlations between age and question-asking frequency, immediate selection accuracy, test retention accuracy, and curiosity score on the Learning Attitudes Questionnaire were computed. For the Learning Attitudes Questionnaire, participants had an average curiosity score of 3.58 out of 5 (SD = 0.43, min = 2.67, max = 4.47), which did not differ across conditions, t(62) = 0.683, p = .497. All correlations were non-significant (ps > .05), except for a positive correlation between age and immediate selection accuracy (Spearman’s ρ = 0.324, p = .009). This correlation held when excluding the outlier on the immediate selection measure (Spearman’s ρ = 0.289, p = .022), but not after adjusting for multiple comparisons with the outlier excluded (Hommel-adjusted p = .220), suggesting that the relationship between age and immediate selection accuracy may be spurious (see Table 5).
Table 5. Correlations between variables

Note. Spearman’s correlations between age, question-asking frequency, immediate selection accuracy, retention test accuracy, and curiosity score. These include the outlier.
* p < .01.
3. General discussion
The present study is a first test of whether asking a question about a word helps children retain information about a recently labelled item. Children were randomly assigned to a Listening condition (where they were not given an opportunity to ask a question) or a Question-Asking condition (where they were able to ask questions). In both conditions, children used the verbal information offered to identify target referents in the moment and showed above-chance retention at test. However, children showed no difference in retention across conditions. Importantly, children who had an opportunity to ask questions in the Question-Asking condition did ask more questions than children in the Listening condition. However, asking a question did not predict retention of labels. The type of question children asked also had no apparent effect on retention accuracy. This work provides suggestive evidence that young children do not show a clear benefit from asking about the meaning of words for either immediate selection of referents or retention.
To be clear, this is not meant to suggest that children never benefit from asking a question. It may be that children use information they receive in the moment but do not retain it unless there is a compelling reason to do so. This view would be commensurate with findings from the adult literature that suggest relatively poor memory for the content of conversational exchanges – in one review, authors estimated that adults recall less than 20% of idea units discussed (Brown-Schmidt & Benjamin, Reference Brown-Schmidt and Benjamin2018). Below, we discuss factors that may lead to better retention of information following questions in the hopes that our suggestions lay the groundwork for future studies that address the role of important contextual and individual factors, including the presence of a peer, age, and the intent of the question asker.
The lack of an overall difference in retention after asking about a new word compared to listening is a finding that is inconsistent with predictions from active learning accounts (e.g., Chi et al., Reference Chi, Adams, Bogusch, Bruchok, Kang, Lancaster, Levy, Li, McEldoon, Stump, Wylie, Xu and Yaghmourian2018; Chi & Wylie, Reference Chi and Wylie2014; Gruber & Ranganath, Reference Gruber and Ranganath2019; Markant, Ruggeri, Gureckis, & Xu, Reference Markant, Ruggeri, Gureckis and Xu2016; Wellman, Reference Wellman, Butler, Ronfard and Corriveau2020). Under this account, generating questions improves retention in part because the learner can identify information gaps and make decisions about what information they receive (e.g., Kedrick, Schrater, & Koutstaal, Reference Kedrick, Schrater and Koutstaal2023). Instead, in the present work, children showed no difference in target selection when they asked a question about a word compared to when they were directly provided with the information, suggesting that the mere act of generating questions did not benefit novel word retention. It is possible that this study context may have been missing key elements that are, in fact, central to the act of question-asking and any potential benefits it may have on learning.
One factor may be the lack of a peer with whom to engage. In support of this, the results of the present work also differ from the empirical work on question-asking by Ross and Balzer (Reference Ross and Balzer1975) and Ross and Killey (Reference Ross and Killey1977), in which children showed better memory for answers they requested themselves than for answers their partner requested. One reason for the diverging results could be that Ross and colleagues used a study design that included two children participating as a pair. It is possible that the presence of peers may have influenced children’s retention. Boosts in learning from being around peers have been shown across a variety of content areas and age groups (Tenenbaum, Winstone, Leman, & Avery, Reference Tenenbaum, Winstone, Leman and Avery2020). Even 9-month-old infants have been shown to learn language from video better when they are learning alongside a peer than they do when learning on their own (Lytle, Garcia-Sierra, & Kuhl, Reference Lytle, Garcia-Sierra and Kuhl2018).
Although we did not find any effects of age in the 3-year span studied, another intriguing possibility is that the effect of question-asking on learning and memory may emerge later in development. In support of this, the work by Ross and colleagues was conducted with children who had already entered or attended elementary school for quite some time (e.g., between first and fifth grade). One possibility is that younger children who are just beginning to learn how to ask questions may not benefit as much from asking questions as children who have had ample practice using questions to elicit information. Although children produce high rates of questions during the preschool period, their rates of efficient question-asking continue to increase across early childhood (e.g., Ruggeri, Lombrozo, Griffiths, & Xu, Reference Ruggeri, Lombrozo, Griffiths and Xu2016; Ruggeri, Walker, Lombrozo, & Gopnik, Reference Ruggeri, Walker, Lombrozo and Gopnik2021). The effect of age may be one reason that the current work was more consistent with findings by Pierce (Reference Pierce1985, Reference Pierce1990), which showed that preschool children remembered information they were directly told at the same or better rates than information they requested.
Children’s intent when asking questions may also play a role. The current study suggests that asking questions about words did not produce a clear benefit. The lack of a difference in retention for question-asking and listening may suggest that the act of asking a question about a word itself does not necessarily result in different cognitive processing compared to more passive listening. As such, it is worth considering whether children may benefit differently from asking questions depending on the reason they are asking. It is possible that asking questions in a more reflexive way (e.g., any time a novel word is heard) or to functionally complete a task (e.g., asking to pass the butter at dinner) may be a more surface-level way of engaging during learning. In this case, children may be more likely to use the information in response to their question in the moment, but not store it in their lexicon.
It is also possible that children were expecting different answers to the questions they asked in the current experiment. Although responding with a description of features was an appropriate response to the questions about specific objects (e.g., “Is this a dawnoo?”) and general questions (e.g., “What is a dawnoo?”) that predominated among children’s questions, we do not know for certain whether they perceived the answers as relevant and aligned with their intent in asking the question. A study design that allows for repeated inquiries or that varies the response to children’s questions systematically would be necessary to test this possibility.
One final possibility is that if children ask questions about words to supplement conceptual information (e.g., asking what a vehicle is, then about how cars move, then about a specific car part) they may be more likely to benefit from asking a question. Children’s domain-level expertise may also play a role in how much they retain from the questions they ask. This issue has not been directly measured in the realm of question-asking. However, an investigation of toddlers’ word learning clarifies that they leverage their existing vocabulary and semantic knowledge to learn novel labels (Borovsky, Ellis, Evans, & Elman, Reference Borovsky, Ellis, Evans and Elman2016). In this study, each individual child’s existing knowledge of a set of six domains was classified as “Low” or “High” and then children were taught novel words in each of these domains. Two-year-olds showed recognition of novel word meanings by looking longer at referents in semantic categories about which they know more words. Borovsky et al. (Reference Borovsky, Ellis, Evans and Elman2016) did not measure children’s question-asking, but the study provides suggestive evidence that children’s familiarity with words and their associated concepts may drive what words they ask about, in addition to helping them store meanings for later use.
A limitation of the current study is the homogeneity of the sample. Participants were primarily White, from middle- to upper-socio-economic status households in the United States. Over half of parents reported a household income of $150,000 or higher and most participants were highly educated. Children from higher socio-economic backgrounds and those with more education may ask more direct questions or request information in different ways than other children (e.g., Callanan, Solis, Castañeda, & Jipson, Reference Callanan, Solis, Castañeda, Jipson, Butler, Ronfard and Corriveau2020; Gauvain, Munroe, & Beebe, Reference Gauvain, Munroe and Beebe2013; Solis & Callanan, Reference Solis and Callanan2016; Ünlütabak et al., Reference Ünlütabak, Nicolopoulou and Aksu-Koç2019), which may influence how they learn from the process of asking questions. Caregivers may also prompt and respond to preschoolers’ questions differently across socio-economic and education groups (e.g., Kurkul & Corriveau, Reference Kurkul and Corriveau2018), which may influence children’s expectations about the answers they will receive to their questions. Our study did not reveal connections between SES variables and question-asking or learning, but the study was not designed to test this question in specific detail. A study with a larger, more variable sample would be necessary to do justice to this specific research topic.
4. Conclusion
The current work investigated whether children’s question-asking affects vocabulary learning. Although children showed evidence of using the information offered in response to their question immediately, the act of asking a question did not improve retention relative to receiving the same information passively. The results of the current work suggest that the act of asking a question about a word, in and of itself, does not necessarily improve retention in young children. The role of individual variation in linguistic and conceptual skills, social contexts in which the questions are asked, intent of the asker, and tuning of responses to questions are factors that remain to be explored. These questions about questions open several pathways for future work in the field.
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors would like to thank the research assistants for data collection, transcription, and coding, and the families for participating. Versions of this article were presented as a poster at the 2024 Biennial Cognitive Development Society Meeting in Pasadena, California.
Competing interests
The authors have no conflicts of interest to disclose.



