Introduction
In recent years, there has been growing concern over the assessment of bilingual children’s language skills (De Houwer, 2023). This concern is rooted in an increased awareness that using tools developed for monolingual populations with bilingual children can lead to faulty conclusions (Armon-Lotem et al., 2015; Fleckstein et al., 2018; Zurer-Pearson, 2010). Bilinguals’ language development is somewhat different from that of monolinguals (De Houwer, 1990, 2023; Genesee & Nicoladis, 2007; Kurcz, 2005), as each bilingual child has a unique constellation of linguistic experience, shaped by factors including age of onset, input, length of exposure, context of exposure, and language status. The specific features of each language also play an important role in child language development (Byers-Heinlein & Fennell, 2014).
De Houwer (2023) highlights how, as a result of this diversity of experience, bilingual children follow distinct developmental trajectories that cannot be fairly assessed using monolingual norms. In particular, assessments that focus solely on one language, often the societal language, can underestimate a bilingual child’s total linguistic capacity and lead to misdiagnosis. Notably, in certain areas of linguistic competence, typically developing bilingual children may score similarly to monolinguals with developmental language disorder (previously also known as specific language impairment (SLI), language impairment, or primary language impairment) (Bishop, 2017; Ebert & Kohnert, 2016; Kohnert et al., 2009).
Developmental language disorder (DLD) is a neurodevelopmental condition affecting children’s language skills across domains such as syntax, phonology, semantics, and pragmatics (Bishop, 2017a; Leonard, 2014; Montgomery & Evans, 2009). Symptoms vary cross-linguistically, but when present in bilinguals, DLD typically affects both languages spoken by the child (Armon-Lotem, 2012; Paradis et al., 2003). Diagnosis is usually made between 4 and 7 years through standardized tests of expressive and receptive language, supplemented with caregiver and teacher reports (Bishop et al., 2017b).
For bilingual children, diagnostic challenges are substantial. Because bilinguals follow developmental trajectories that differ from monolingual norms (De Houwer, 2023), using monolingual tests risks both over- and under-identification of DLD (Pert & Bradley, 2018). Some language behaviors typical of bilingual development resemble the profile of DLD (Ebert & Kohnert, 2016; Kohnert et al., 2009), which increases the risk of false positives. Conversely, bilingual children with DLD may go unrecognized if their exposure to the societal language has been insufficient for reliable assessment (Grimm & Schulz, 2014). Indeed, bilingual children are overrepresented in referrals for suspected DLD in several clinical settings (e.g., Laasonen et al., 2018) yet may also be underdiagnosed due to limited linguistic experience in the testing language.
These issues highlight the need for diagnostic approaches that take a bilingual child’s complete linguistic profile into account (De Houwer, 2023). Although best practice recommends assessing both of a bilingual child’s languages (Bedore & Peña, 2008; Gillam et al., 1999; Peña, Bedore & Kester, 2016), this is rarely feasible in practice. For many languages, no standardized tests exist, and obtaining norms for heterogeneous bilingual populations is difficult. Moreover, clinicians may not speak all of the languages they must assess.
These limitations have prompted interest in language-neutral, or minimally language-dependent, assessment tools (Armon-Lotem et al., 2015; Polišenská et al., 2020). Such tools aim to evaluate abilities that develop in similar ways across languages and are therefore less tied to specific morphosyntactic or lexical knowledge. Prominent examples include non-word repetition (e.g., the quasi-universal Q-U NWRT; Chiat, 2015) and narrative macrostructure tasks, which rely on general cognitive–pragmatic skills that transfer across a bilingual’s languages (Gagarina & Lindgren, 2020; Pearson, 2002). Semantic-pragmatic skills, including pronominal reference, quantification, exhaustivity, and scalar implicatures, also appear to follow comparable developmental trajectories across languages (Avrutin, 1999; Katsos et al., 2016; Philip, 1995). Because these skills are less dependent on language-specific morphology, they offer promising candidates for language-neutral assessment.
Building on this work, the present study investigates one such domain: quantifier comprehension. It is important to note that, while this study focuses on typically developing bilingual children, we are motivated by the broader goal of addressing diagnostic challenges associated with DLD, particularly in multilingual populations.
As noted, several factors contribute to the differing pace of linguistic development in a child’s languages, including the duration of exposure, language dominance, age of L2 onset, and typological differences. For practical purposes, such as diagnosing DLD, it would be convenient if we were able to assume that shared properties in a bilingual child’s languages follow a uniform developmental trajectory. The general developmental milestones have already been identified in early language acquisition, including the universally recognized stages of babbling, one-word utterances, two-word combinations, and multi-word sentences (Berman, 1981; Berk, 2009; Bloom, 1973; Brown, 1973; Gillis & Ravid, 2009; Slobin, 1985, 2001). However, while it is well documented that monolingual and bilingual children reach early key linguistic milestones, ranging from sounds to grammar, within similar timeframes (De Houwer, 2009; Genesee et al., 1995; Genesee et al., 2004; Meisel, 2004; Muszyńska et al., 2025), this cannot be taken as a diagnostic shortcut. Variation between languages can be substantial, especially in morphosyntactic domains such as tense marking, subject-verb agreement, and negation (Bittner et al., 2011). Therefore, evaluating these skills in only one language does not reliably reflect abilities in another; comprehensive assessment across both languages remains necessary. Moreover, the early milestones, while well-established, are of limited relevance for the assessment of older children.
An alternative approach, advocated by Polišenská et al. (2020), is to develop language-neutral, evidence-based assessments that evaluate fundamental language-learning capacities independent of specific language knowledge. In this context, “language-neutral” features are those that develop similarly across languages and are not specific to the grammar or vocabulary of any one language; that is, those in which a child’s performance in one language informs us about the other(s) (Armon-Lotem et al., 2015).
Candidate language-neutral indicators such as non-word processing and narrative tasks have been used as a basis for assessment tools. A striking example is the quasi-universal non-word repetition task (Q-U NWRT) of Chiat (2015). Designed to minimize the influence of knowledge from any specific language, the Q-U NWRT incorporates segmental and phonotactic features common across languages while avoiding language-specific characteristics. The premise of this design is that bilingual children can draw on their overall linguistic competence, making their performance on this task a valid reflection of their underlying language skills (Boerma & Blom, 2017).
Narrative tasks are also commonly used to assess a bilingual child’s linguistic abilities (Gagarina & Lindgren, 2020). Specifically, research shows that bilingual children are able to apply knowledge about narrative macrostructure (e.g., story grammar elements such as goals, actions, and outcomes) acquired in their first language (L1) to their second language (L2) (Pearson, 2002). In contrast to narrative microstructure, which encompasses factors such as lexical diversity and relational and referential devices, macrostructure does not appear to be influenced by the amount of language-specific exposure that a child receives in each of their languages and therefore allows linguistic skills in a second or third unassessed language to be inferred from performance in the child’s L1 (Boerma & Blom, 2017; Hipfner-Boucher et al., 2015; Pearson, 2002; Squires et al., 2014). Recognizing the diagnostic value of such tasks, the LITMUS battery (Language Impairment Testing in Multilingual Settings) developed through COST Action IS0804 (2010–2013) incorporates narrative tasks alongside sentence repetition, nonword repetition, and lexical assessments to support the identification of language impairment in multilingual contexts (Armon-Lotem et al., 2015).
As in the case of the language-neutral skills mentioned above, the acquisition of many semantic and pragmatic properties seems to follow a similar developmental path cross-linguistically. This has been documented for pronominal reference (Avrutin, 1999; Baauw, 2002), quantification (Hanlon, 1987; Katsos et al., 2016; Philip, 1995), exhaustivity in question interpretation (Roeper et al., 2002; Schulz & Roeper, 2011), and scalar implicatures (Noveck, 2001; Papafragou & Musolino, 2003). Therefore, assessing a child’s performance with these semantic and pragmatic skills in one language might allow us to approximate their performance in another, unassessed language.
In the current study, we examine one of these semantic categories, namely quantification, in Polish-English bilingual children. Specifically, we focus on children’s comprehension of quantifying expressions such as “all,” “some,” “most,” and “none” across the two languages. Alongside this, we use the English and Polish versions of the Test for Reception of Grammar (TROG-2; Bishop, 2003) to assess the same children’s understanding of a range of grammatical structures.
Our principal goal in this study is to investigate whether these quantifiers constitute a language-neutral linguistic category in the sense described above: that is, whether a bilingual child’s capability in quantifier comprehension in one of their languages can be predicted by their performance when tested in the other language. Under this hypothesis, we expect a strong correlation between children’s performance in English and Polish versions of the quantifier comprehension task. In particular, we expect this correlation to be significantly stronger than that between the English and Polish versions of TROG-2, as this tests grammatical categories that are not language-neutral and that in some cases follow different developmental trajectories in the bilingual child’s two languages. While this study does not include children with suspected or diagnosed DLD, our motivation stems from the diagnostic challenges DLD poses in bilingual populations. We focus here on typically developing bilingual children in order to evaluate whether quantifier comprehension offers promise as a language-neutral assessment tool that could later be validated in clinical contexts.
In the following sections, we present a more detailed account of the meaning and acquisition of quantifiers before turning to the specifications of the experimental task.
Quantifiers
In natural language, the meaning of quantifiers such as “all” and “some” is taken to correspond to relations between sets rather than to properties of individuals (Barwise & Cooper, 1981). Under this view, the statement “All books are in the boxes” is true if all books under discussion are inside the boxes, and it is false in every other circumstance. The statement “Most of the books are in the boxes” is true if the number of books inside the boxes is greater than the number of books outside and false otherwise. Similarly, the statement “Some of the books are in the boxes” is true if two or more books are in the boxes, and false otherwise (see Footnote 1).
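As an illustration only, the set-relational truth conditions just described can be sketched in Python. The function names and the example sets are hypothetical and are not part of the study materials; the threshold of two for “some” simply follows the plural reading stated above.

```python
# Illustrative sketch: quantifiers as relations between two sets.
# a = the restrictor set (e.g., the books under discussion)
# b = the scope set (e.g., the things that are inside the boxes)
# All names and data below are assumptions made for illustration.

def all_q(a: set, b: set) -> bool:
    """'All As are Bs': every member of a is also in b."""
    return a <= b

def most_q(a: set, b: set) -> bool:
    """'Most As are Bs': more As are inside b than outside it."""
    return len(a & b) > len(a - b)

def some_q(a: set, b: set) -> bool:
    """'Some As are Bs': two or more As are in b (plural reading)."""
    return len(a & b) >= 2

books = {"book1", "book2", "book3", "book4", "book5"}
in_boxes = {"book1", "book2", "book3", "toy1"}

print(all_q(books, in_boxes))   # False: two books are outside the boxes
print(most_q(books, in_boxes))  # True: three books inside, two outside
print(some_q(books, in_boxes))  # True: at least two books are inside
```

Each function inspects only the two sets, never any individual book, which is the sense in which quantifier meanings are relations between sets.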
Quantified sentences have systematic entailment properties. Quantifiers that guarantee inferences from sets to supersets (e.g., “Some students play the piano” entailing “Some students play an instrument”) are known as monotone increasing. In downward-entailing linguistic environments, replacing a general term (i.e., the set) with a more restrictive term (i.e., the subset) preserves truth. Quantifiers that guarantee inferences from sets to subsets (e.g., “None of the students is smoking” entailing “None of the students is smoking expensive cigars”) are known as monotone decreasing.
Quantifier comprehension involves both semantic and pragmatic processing. While the truth conditions of expressions such as “some” or “none” can be evaluated semantically (e.g., “none” entails an empty intersection between sets), children must also assess whether the utterance is pragmatically appropriate or underinformative. For example, the statement “some of the toys are in the boxes” is semantically true even if all of the toys are in the boxes, but it violates expectations of informativeness. Detecting such underinformativeness engages pragmatic reasoning, particularly the ability to interpret utterances relative to Gricean maxims (Grice, 1989; Katsos et al., 2016). The Quantifier Comprehension Task used in this study was therefore designed to present both semantically false and pragmatically infelicitous utterances, enabling a direct assessment of children’s sensitivity to both semantic accuracy and pragmatic appropriateness.
Many human languages contain quantifiers, and these quantifiers have similar entailment properties cross-linguistically (Von Fintel & Matthewson, 2008). They also appear to impose similar usage constraints, such as the need to be informative. For instance, speakers should not describe a situation in which all students are running by saying, “Some students are running.” Although strictly speaking this statement is true, it is an underinformative statement, and it potentially invites the listener to draw the further conversational inference that the speaker has reason not to utter the more informative statement “All students are running” (Grice, 1989; Horn, 1972).
Acquisition of quantifiers
Young children’s comprehension of quantified expressions has received significant attention in the literature (Crain, 2017 i.a.; Gualmini, 2003; Katsos et al., 2016; Musolino, 1998; Papafragou & Musolino, 2003; Pouscoulous et al., 2007).
In general, children appear to have mastered several of the semantic properties of quantifiers at around age six, but their responses to sentences with quantifiers differ from adults’: children are more likely than adults to accept underinformative quantified expressions (Noveck, 2001). Several accounts of this phenomenon have been offered in previous literature, including appeal to children’s immature cognitive capacities (Inhelder & Piaget, 1964 i.a.) or immature grammars (i.e., difficulties in mapping form to meaning in quantifier comprehension; Philip, 1995). However, researchers have yet to reach consensus on the source of young children’s non-adult-like responses to quantified expressions (Crain, 2017).
Katsos et al. (2016) examined children’s understanding of quantified expressions across 31 languages from 11 families. The results revealed a cross-linguistically similar order of acquisition that accorded with four main constraints posited by the authors: monotonicity, totality, complexity, and informativeness. Specifically, in 27 of the 31 languages tested, children performed better with monotone increasing quantifiers (“all,” “some”) than with their monotone decreasing counterparts (“none,” “some…not”). In 25 of the 31 languages tested, children performed better with total than partial quantifiers, that is, with “all” and “none,” which attribute the same property to every member of the set, compared to “some” and “some…not,” which do not. Children were more successful at comprehending “some” than “most” in all 31 languages, which the authors take as evidence for the role of complexity: they argue that, in order to successfully understand “Most of the As are Bs,” children need to be able to compare the cardinality of the set of As that are Bs with that of the set of As that are not Bs. By contrast, “Some As are Bs” is less complex because children do not need to restrict the quantifier to a specific set of entities or compare cardinalities in order to evaluate the existential claim that is being made. Finally, the results also confirm that in all 31 languages studied, children follow the “informativeness” constraint: they are aware of violations of pragmatic felicity arising from underinformativeness, although they treat these violations less strictly than violations of truth. These findings broadly chime with proposals arguing for cross-linguistic similarities in the meaning and use of quantifiers.
Slim et al. (2021) provided evidence from adult bilinguals that quantifiers are not only interpreted similarly across languages but also share their underlying logical form. In sentence-picture matching tasks, Dutch–French bilinguals showed cross-linguistic priming of quantificational scope, suggesting that the logical representations constructed for quantifiers in one language are accessible and used when processing the other language.
Drawing upon these findings, in this study, we explore the usefulness of bilingual children’s performance on a quantifier interpretation task in one language as a predictor of their performance on that task in the other language. We compare this to the cross-linguistic predictive usefulness of their performance on a task designed to measure complex linguistic skills, which are typically strongly language-specific. The ultimate goal of this approach is to exploit the similarities in the developmental trajectories of quantifiers across languages by using quantifier interpretation in bilingual assessment.
In attempting this, we take the view that, although performance with quantifiers requires some measure of numerical skill (Pietroski et al., 2009; Shikhare et al., 2015), it is more closely associated with linguistic abilities. This is because quantifier comprehension depends not just on estimating quantity but also on understanding set-theoretic relations, logical entailments, and pragmatic appropriateness, features that are encoded linguistically and governed by semantic and pragmatic rules (Cheng et al., 2013; Dolscheid & Penke, 2018; Katsos et al., 2012). For example, interpreting “some” as pragmatically distinct from “all” requires knowledge of scalar implicatures, not just quantity. We consider this study a first step towards directing attention to the possibility of using quantifiers as part of the assessment of bilingual children’s linguistic aptitude.
General methods
Participants
A total of 45 Polish-English bilingual children living in the east of England participated (mean age: 6;10, range: 4;11–7;10). Mean age of onset of English was 2;3 (range: birth–4;1). The mean length of exposure to English was 3;10 (range: 1;8–6;2). Participants did not have a history of autism or hearing or neurological impairment. Both of their parents were Polish: around 30% of the parents had college education, with around 60% reporting secondary and around 10% elementary education. In the final analysis, 43 children were included (22 males and 21 females). One child was excluded due to strong acquiescence bias (responding “Yes” to all items), and one child was excluded for failing the pre-test for the quantifier task (see below).
Measures and procedures
The following tasks were used: (1) the Quantifier Comprehension Task in English and in Polish (Katsos et al., 2012); (2) the Test for Reception of Grammar (TROG-2) (Bishop, 2003) in English and in Polish; and (3) the Non-verbal IQ test (Raven, 2003). Children were tested individually at nurseries or primary schools, in a quiet room outside their classroom, following the ethical protocols designated by the host institution, the University of Cambridge. Parents were asked to sign a Parental Consent Form (provided in two language versions: Polish and English) and to fill in a short questionnaire collecting data about their child’s onset of exposure to English and length of exposure, measured as length of time in the UK. This included information about the language used at home, the language used with the child, the parent’s level of education, their profession, and the child’s birth order and number of siblings. The parents were asked if they had any concerns about their child’s development and confirmed that their child had no history of autism and no hearing or neurological impairment. The experimenters also made sure that the children independently agreed to take part in all of the activities involved in collecting the data.
The tasks were administered in a counterbalanced order both at the level of language (Polish-English, English-Polish) and within language (TROG-Quantifiers, Quantifiers-TROG), except for the non-verbal IQ test, which was administered in a Polish adaptation only (Raven, 2003) and which was always the last test in the session.
Measures: Quantifier Comprehension Task
In the Quantifier Comprehension Task (for details, see Katsos et al., 2012), participants were seated in front of a laptop computer and were shown visual displays on the screen while listening to prerecorded sentences. To ensure cross-linguistic parity, all Quantifier Comprehension Task stimuli were prerecorded by a female native speaker following a matched recording script and consistent prosodic parameters across both language versions. Furthermore, the Polish version was not a direct translation but was developed collaboratively within COST Action A33 as part of a coordinated, cross-linguistic empirical project. As documented in Katsos et al. (2012), all participating language teams followed shared theoretical principles and common item structures and agreed on methodological constraints when selecting and validating quantifier expressions, ensuring that all language versions, including Polish, were prepared in parallel and under comparable design assumptions.
Before the experiment started, the experimenter introduced the participant to the Cavegirl, a fictional cartoon character who appears on the screen. The participant was told that their task was to help the Cavegirl learn the language (English or Polish) better. The Cavegirl would say how many toys there are in the boxes. If what she said was right, the participant should tell her, “That was right.” If what she said was wrong, the participant should tell her, “That was wrong,” and, in order to help her learn, they should tell her why it was wrong. The experimental task was preceded by a warm-up session where children were familiarized with the Cavegirl, the task demands, and the pictures of the objects mentioned in the sentences. Children were shown a picture of each of the objects on the computer screen. The experimenter then pointed to each object and named it. The objects were chosen from the domains of household items, food, musical instruments, and toys, and teachers independently confirmed that the participants knew these objects.
After participants were familiarized with the task, their competence with numbers was tested with five statements, one for each number from one to five, three of which were true and two of which were false. This was done to confirm that participants could count up to five and could therefore make correct judgments about quantity when simple counting is involved. A discontinue rule was applied: if the child failed an item in the warm-up phase, the child was asked to be a bit more careful, and the sentence was repeated. If a participant failed the repetition of an item, or failed on the first attempt in three or more warm-up items, the experiment was stopped. In this study, one child was excluded at this stage.
In each experimental trial, the Cavegirl produced a single utterance of the type “[Quantifier] of the [objects] are (not) in the boxes.” As instructed at the beginning, children evaluated whether what the Cavegirl said was “right” or “wrong” and, if they said “wrong,” were encouraged to justify why they did so. This was done in order to check whether the participant was rejecting a statement for reasons unrelated to falsity or informativeness. In this study, all justifications of rejections, whether correct or incorrect, mentioned a quantity-related word or deictic expression often combined with a spatial expression (e.g., “Because these are out”), which suggests that children responded based on the appropriateness of the quantifier rather than some other aspect of the sentence.
Seven quantifying expressions were tested: “some,” “most,” “some…not,” “all,” “none,” “all…not,” and “not all.” For each quantifier tested, two types of visual situations were used, one which rendered the utterance true and informative and one which rendered the utterance false. For “some,” “most,” and “some…not,” a third type of situation was also used, which rendered the utterance true but pragmatically underinformative. In the case of “some” and “most,” this was a display in which all of the objects were in the boxes; in the case of “some…not,” it was a display in which none of the objects were in the boxes.
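The three-way design for “some,” “most,” and “some…not” can be made concrete with a short sketch. This is not the task software: the function name, the argument names, and the label strings are hypothetical, the “two or more” threshold is carried over from the truth conditions for “some” stated earlier in the paper, and its extension to “some…not” is an assumption.

```python
# Illustrative sketch: classify a display for the three quantifiers that
# have a third, underinformative condition. Names and labels are hypothetical.

def classify(quantifier: str, n_in: int, n_total: int) -> str:
    """Return the condition a display creates for a given quantifier.

    n_in    -- number of objects inside the boxes
    n_total -- number of objects shown in the display
    """
    n_out = n_total - n_in
    if quantifier == "some":
        is_true = n_in >= 2                # assumed plural reading
        underinformative = n_in == n_total  # "all" would be more informative
    elif quantifier == "most":
        is_true = n_in > n_out
        underinformative = n_in == n_total  # "all" would be more informative
    elif quantifier == "some...not":
        is_true = n_out >= 2               # assumed plural reading
        underinformative = n_in == 0        # "none" would be more informative
    else:
        raise ValueError("only the three-condition quantifiers are covered")

    if not is_true:
        return "false"
    return "underinformative" if underinformative else "true and informative"

print(classify("some", 2, 5))        # true and informative
print(classify("some", 5, 5))        # underinformative
print(classify("most", 2, 5))        # false
print(classify("some...not", 0, 5))  # underinformative
```

The underinformative cases are exactly those in which the utterance is true but a stronger quantifier would also have been true of the same display.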
Six blocks of items were created, with the order of items pseudo-randomized within each block to avoid mention of the same quantifier or the same object in adjacent items. The blocks were presented in one of three different orders.
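The procedure used to generate the pseudo-randomized orders is not described here. One simple way to satisfy the stated adjacency constraint is rejection sampling: shuffle a block and keep the first order in which no neighbouring items share a quantifier or an object. The sketch below assumes this approach; the item list, the function name, and the fixed seed are hypothetical.

```python
import random

# Illustrative sketch: pseudo-randomize one block by rejection sampling.
# Each item is a (quantifier, object) pair; the items below are invented.

def pseudo_randomize(items, rng, max_tries=10_000):
    """Shuffle until no two adjacent items share a quantifier or an object."""
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(a[0] != b[0] and a[1] != b[1]
               for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found within max_tries")

block = [
    ("some", "apples"), ("some", "drums"),
    ("all", "apples"), ("all", "dolls"),
    ("none", "drums"), ("none", "dolls"),
]

order = pseudo_randomize(block, random.Random(0))
for quantifier, obj in order:
    print(quantifier, obj)
```

Rejection sampling is adequate for short blocks such as this one; for longer or more tightly constrained lists, a constructive or backtracking procedure would be more efficient.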
Measures: Test for Reception of Grammar (TROG-2)
Participants were assessed using TROG-2 (Bishop, 2003) and its Polish translation (Smoczyńska, 2008, unpublished manuscript submitted to Pearson Assessment) as a measure of receptive grammar. TROG is an individually administered, standardized test of morphosyntax. It has been used to assess children with specific language disorders, deafness, intellectual disability, and cerebral palsy. The test is appropriate for children aged 4 to 13 years and has been normed with more than 2000 British children who did not have any known learning difficulty, hearing loss, or other disability. The results of the standardization were found to be valid even after controlling for social background (Bishop, 1989).
Each test item is presented in a multiple-choice format with four pictures presented on a single board. One of the pictures illustrates the target structure, and three constitute the lexical and grammatical foils to this structure. The child is auditorily presented with the stimulus, containing a particular grammatical structure. Then the experimenter asks the child to point to one of four pictures that best corresponds to what he/she has heard.
The test consists of 20 blocks of 4 items. Each block assesses the child’s comprehension of a specific type of morphosyntactic construct (e.g., singular/plural personal pronouns, masculine/feminine personal pronouns, nouns, comparatives, verbs, reversible passives, negation, prepositional phrases, relative clauses), arranged in increasing order of difficulty (as established for the English version). The test is usually scored according to the number of blocks successfully passed (where passing a block involves the child responding correctly to all 4 items, the probability of doing so by chance being 0.004). TROG is designed to use a restricted and simple set of lexical items (e.g., elephant, girl, boy, dog, table, etc.) in order to minimize the likelihood of the participant’s failure due to lack of knowledge of individual words.
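The chance probability quoted above can be checked with a short calculation. The sketch assumes that a guessing child chooses uniformly among the four pictures and that the four items in a block are answered independently; the variable names are illustrative.

```python
# Illustrative check of the chance level for passing a TROG-2 block,
# assuming uniform, independent guessing among four pictures per item.

N_PICTURES = 4        # response options per item
ITEMS_PER_BLOCK = 4   # all four must be correct to pass a block
N_BLOCKS = 20

p_item = 1 / N_PICTURES               # chance of one correct guess
p_block = p_item ** ITEMS_PER_BLOCK   # chance of four correct guesses in a row
expected_blocks = N_BLOCKS * p_block  # blocks passed by pure guessing

print(round(p_block, 3))          # 0.004
print(round(expected_blocks, 3))  # 0.078
```

Under these assumptions, a child who guessed throughout would be expected to pass fewer than one block in ten administrations of the test, which is why block-level scoring is robust to guessing.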
The Polish version of TROG is an official translation accepted by Pearson Co., although its psychometric properties have not yet been established. However, there are indications that it yields results closely comparable to those of the original English version: in a study of Polish-English bilinguals compared to Polish and English monolinguals (age 7 to 10 years), average TROG results for all three groups (total n = 78) in either Polish or English were very close to each other (Grose-Hodge et al., Reference Grose-Hodge, Dabrowska and Divjak2024).
Non-verbal IQ
Non-verbal IQ was tested with Raven’s Coloured Progressive Matrices (CPM, Raven Reference Raven, Szustrowa and Jaworowska2003). CPM is one of the most well-known, most widely researched, and most widely used of all culture-reduced tests. The test demonstrates high reliability and validity across a wide range of populations (Raven et al., Reference Raven, Court and Raven1996). Although the CPM is a non-verbal test, national standardizations are essential to ensure that administration procedures, instructions, and normative comparisons are culturally and linguistically appropriate for the target population. The version used in this study was the Polish standardization, which includes norms and protocols validated for Polish-speaking children (Raven, Reference Raven, Szustrowa and Jaworowska2003; Szustrowa & Jaworowska, Reference Szustrowa and Jaworowska2003).
Results
In the Quantifier Comprehension Task administered in Polish, participants were 82.0% accurate on semantically determined conditions (those which were true and informative or false), with individual performance ranging from 47.2% to 100% (SD 16.5%), and 55.6% accurate on pragmatically determined conditions (that is, they rejected underinformative utterances at this rate), with individual performance ranging from 0% to 100% (SD 30.2%). The overall accuracy (weighting semantic and pragmatic scores equally) was 68.8% (range 27.8%–98.6%, SD 18.7%). In the Quantifier Comprehension Task administered in English, the corresponding figures were 75.5% (range 38.9%–100%, SD 15.7%) for the semantic conditions and 57.9% (range 0%–100%, SD 28.5%) for the pragmatic conditions, with an overall accuracy of 66.7% (range 32.6%–97.2%, SD 17.0%).
We do not assume that the outcome measures are normally distributedFootnote 2, but in what follows we report parametric statistics because the sample is sufficiently large (n = 43). For completeness, we include the corresponding non-parametric test results in footnotes. Paired t-tests show that the participants were significantly more accurate on the semantic conditions in the Polish task than in the English task (t = 3.93, p < 0.001), but there was no significant difference between languages in the pragmatic conditions (t = 0.74, p = 0.47) or on the overall measure (t = 1.37, p = 0.18).Footnote 3
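For readers less familiar with the paired design, the following is a minimal sketch of how a paired t statistic is computed from per-child scores in the two languages. The function name `paired_t` and the four data points are hypothetical and are not taken from the study; with the study's 43 participants, the test would have 42 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length score lists."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # The test operates on within-child differences, not on the raw scores.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference (sample SD with n - 1 denominator).
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical accuracy proportions for four children (illustration only).
polish = [0.90, 0.80, 0.70, 0.85]
english = [0.80, 0.75, 0.70, 0.70]

t, df = paired_t(polish, english)
print(f"t({df}) = {t:.2f}")
```

The p-value would then be read from the t distribution with `df` degrees of freedom, a step omitted here to keep the sketch free of external dependencies.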
In the TROG-2 administered in Polish, participants passed an average of 10.1 blocks (range 3–19, SD 4.42) and got an average of 53.2 items correct (range 22–77, SD 17.34). In the English version of TROG-2, participants passed an average of 6.88 blocks (range 1–16, SD 3.21) and got an average of 37.9 items correct (range 5–69, SD 14.54). Paired t-tests show that this represents significantly higher performance in the Polish than the English version of TROG-2 (on the item measure, t = 5.34; on the block measure, t = 4.94; both p < 0.001).Footnote 4
Participants’ results on the CPM were age-appropriate. Specifically, their average score was 21.4 (range 14–32, SD 4.87). The corresponding stenFootnote 5 scores ranged from 4 to 10, with a mean of 6.91 (SD = 1.59).
Correlations between measuresFootnote 6
The correlation between the English and Polish QCT scores was 0.776 for the semantic conditions and 0.772 for the pragmatic conditions. The correlation between the English and Polish overall QCT scores was 0.843. The correlation between the English and Polish TROG scores, as measured by the number of blocks passed, was 0.422 (and as measured by the number of items passed, 0.310).Footnote 7 Comparing the semantic QCT correlation with the TROG (by block) correlation, we find a significant difference (Z = 2.62, p = 0.009); comparing the pragmatic QCT correlation with the TROG (by block) correlation in the same way, we again find a significant difference (Z = 2.57, p = 0.010); and the same is true for the composite QCT measure (Z = 3.49, p < 0.001).Footnote 8
The correlation between the English and Polish TROG raw scores is significantly lower than the English and Polish QCT correlations for the semantic condition (Z = 3.20, p = 0.0014), the pragmatic condition (Z = 3.16, p = 0.0016), and the composite QCT measure (Z = 4.07, p < 0.001).Footnote 9
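The text does not name the test used for these comparisons. As a sketch, the reported Z values are consistent with the standard Fisher r-to-z comparison of two correlations, each based on n = 43, which treats the two coefficients as if they came from independent samples. The function name `fisher_z_compare` is an assumption introduced for illustration; the correlation values are those reported above.

```python
import math


def fisher_z_compare(r1, r2, n1, n2):
    """Compare two Pearson correlations via Fisher's r-to-z transformation.

    Returns (z, two_sided_p). Assumes the correlations are independent.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


N = 43                # sample size reported in the Results
TROG_BY_BLOCK = 0.422  # English-Polish TROG correlation, blocks passed
QCT = {"semantic": 0.776, "pragmatic": 0.772, "overall": 0.843}

for label, r in QCT.items():
    z, p = fisher_z_compare(r, TROG_BY_BLOCK, N, N)
    print(f"QCT {label} vs TROG (by block): Z = {z:.2f}, p = {p:.3f}")
```

Because both correlations are computed on the same children, a test for dependent correlations would be an alternative; the sketch above only illustrates the arithmetic that matches the figures in the text.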
Therefore, our main expectation of a strong correlation between children’s performance in English and Polish versions of the Quantifier Comprehension Task was confirmed. In particular, we expected this correlation to be significantly stronger than that between the English and Polish versions of TROG-2, and our results confirmed this hypothesis.
Polish TROG scores by block were moderately correlated with the English QCT semantic scores (r = 0.570), weakly correlated with the English QCT pragmatic scores (r = 0.207), and moderately correlated with the English QCT overall scores (r = 0.437).Footnote 10 The same is true if we consider Polish TROG raw scores (r = 0.535, r = 0.265, and r = 0.469, respectively).Footnote 11 Similarly, English TROG scores by block were moderately correlated with the Polish QCT semantic scores (r = 0.562), weakly correlated with the Polish QCT pragmatic scores (r = 0.268), and moderately correlated with the Polish QCT overall scores (r = 0.465).Footnote 12 Again, the same is true if we consider English TROG raw scores (r = 0.533, r = 0.214, and r = 0.408, respectively).Footnote 13
Finally, CPM scores were moderately correlated with the Polish TROG scores by block (r = 0.434) and in raw score (r = 0.389); English TROG scores by block (r = 0.505) and in raw score (r = 0.469); Polish semantic QCT scores (r = 0.359); Polish overall QCT scores (r = 0.328); English semantic QCT scores (r = 0.397); and English overall QCT scores (r = 0.402). They were weakly correlated with the Polish pragmatic QCT scores (r = 0.210) and English pragmatic QCT scores (r = 0.262).Footnote 14
Predicting scores
Thinking of these measures as potential predictors of the TROG scores in the other language, we find that the Polish semantic QCT is a numerically better predictor of English TROG scores than are either the Polish TROG or the CPM, and the English semantic QCT is a numerically better predictor of Polish TROG scores than are either the English TROG scores or the CPM. The other-language pragmatic QCT measures are numerically worse predictors than the other-language TROG scores or the CPM. This applies whether we consider the TROG block score or raw score.
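The ordering described above can be checked directly against the correlations reported in the preceding section. The sketch below ranks the candidate predictors of each TROG block score by r squared, the share of variance that a single-predictor linear fit would account for. The dictionary layout and the function name `rank_predictors` are illustrative assumptions; the coefficients are those given in the text.

```python
# Pearson correlations reported in the Results, with TROG scored by blocks passed.
REPORTED_R = {
    "English TROG (blocks)": {
        "Polish semantic QCT": 0.562,
        "Polish pragmatic QCT": 0.268,
        "Polish TROG (blocks)": 0.422,
        "CPM": 0.505,
    },
    "Polish TROG (blocks)": {
        "English semantic QCT": 0.570,
        "English pragmatic QCT": 0.207,
        "English TROG (blocks)": 0.422,
        "CPM": 0.434,
    },
}


def rank_predictors(correlations):
    """Return (name, r_squared) pairs sorted from strongest to weakest."""
    scored = [(name, r * r) for name, r in correlations.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for target, correlations in REPORTED_R.items():
    print(f"Predicting {target}:")
    for name, r2 in rank_predictors(correlations):
        print(f"  {name}: r^2 = {r2:.3f}")
```

These are numerical comparisons only, as in the text; the sketch does not test whether the differences between predictors are statistically significant.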
Discussion
Bilingualism presents a challenge to language assessment because children’s skills may differ across their two languages. There are no clear norms that reflect the diversity of bilingual development, and bilingual professionals and/or suitable tests are often unavailable for both languages. As a consequence, bilingual children are typically evaluated in the dominant language of their community and with reference to monolingual norms, with adverse implications for the accuracy of the diagnosis of DLD. One prospective solution to this problem is an increased focus on language-neutral skills, the aspiration being to develop a test battery that enables professionals to estimate language skills in an unassessed language based on a child’s skills as assessed in another language (Pearson, Reference Pearson, Oller and Eilers2002; Squires et al., Reference Squire, Andrews, Davis, Esin, Harrison, Hyden and Hyden2014).
Our work explored one possible contributor to such a test battery. Specifically, drawing on research suggesting that certain semantic and pragmatic skills follow a similar developmental trajectory cross-linguistically (Slobin, Reference Slobin and Slobin1985; Thordardottir, Reference Thordardottir, Armon-Lotem, de Jong and Meir2015), we wished to investigate whether we could support the stronger claim that these skills are language-neutral in the sense required for assessment. To that end, we examined Polish- and English-speaking bilingual children’s comprehension of quantifier expressions such as “all,” “some,” “most,” and “none” across these two languages and tested whether we can use the bilingual child’s quantifier comprehension performance in one language as an approximator of their corresponding skills in the other language. We compared the children’s performance on the Quantifier Comprehension Task (QCT) in Polish and English, and their performance on a well-studied morphosyntactic evaluation task, the Test for Reception of Grammar (TROG-2), in both languages.
We hypothesized that, since quantifier comprehension may be a language-neutral skill, the results of the QCT would be more highly correlated between languages than the results of TROG. Our results confirmed this hypothesis, indicating that quantifier comprehension abilities are more consistent across a bilingual child’s languages than are grammatical abilities in general.
These results have important implications for bilingual language assessment. As noted above, speech and language therapists (SLTs) and other language professionals are frequently tasked with evaluating bilingual children’s language skills. However, given the diversity of languages involved, it is impossible for an SLT to be able to speak all the languages spoken by the children they need to evaluate. Hence, the task of finding suitable proxy measures for a child’s linguistic skills in their other languages is a pressing one. Our results are promising in that they demonstrate the QCT to be a relatively stable test instrument across languages for bilingual children, compared to standard morphosyntactic measures.
The question then arises of whether, and to what extent, the QCT is tapping general linguistic abilities. Could it be that the QCT results are instead a reflection of non-linguistic cognitive capabilities, such as those for analogical reasoning? There are reasons to doubt this interpretation. Firstly, as demonstrated in this study, the QCT results are not strongly predicted by the non-verbal IQ measure used, Raven’s Coloured Progressive Matrices (CPM); indeed, the correlations between CPM and QCT scores did not differ significantly from those between CPM and TROG scores.
Secondly, previous research by Katsos et al. (Reference Katsos, Roqueta, Estevan and Cummins2011) suggests that the QCT is associated with language skills in monolinguals. Specifically, Katsos et al. (Reference Katsos, Roqueta, Estevan and Cummins2011) compared Spanish-speaking children with diagnoses of Specific Language Impairment (SLI, currently labeled DLD) (n = 29, mean age: 6;6, age range 4;0–6;9, 20 male) with age-matched and language-matched typically developing controls. Recall that DLD is characterized by difficulty with language that is not caused by a known neurological, sensory, intellectual, or emotional deficit, and consequently, non-linguistic cognition in DLD is generally expected to be age-appropriate.
Katsos et al. (Reference Katsos, Roqueta, Estevan and Cummins2011) evaluated their participants’ non-verbal IQ using Raven’s Standard Progressive Matrices (SPM) test and the Gestalt closure subtest of the Kaufman Assessment Battery for Children-Second Edition (KABC-II; Kaufman & Kaufman, Reference Kaufman and Kaufman2004), and results for the DLD group were age-appropriate. In another study, participants were also tested on the Spanish adaptation of the QCT (Katsos & Smith, Reference Katsos and Smith2010). Here, the participants with DLD scored significantly lower than their age-matched controls, but their results did not significantly differ from the younger language-matched controls. This suggests that the QCT is indeed tapping linguistic rather than non-linguistic cognition.
Our findings may contribute to the need identified by Polišenská et al. (Reference Polišenská, Chiat, Fenton and Roy2020), who advocate for the use of language-neutral assessment tools that target core language-learning abilities rather than language-specific knowledge. Their recommendations for tools such as the Cross-linguistic Nonword Repetition (CL-NWR) task, the Early Sociocognitive Battery (ESB), and dynamic assessment strategies provide concrete avenues for addressing the diagnostic challenges for DLD posed by multilingualism. Each of these tools is designed to be culturally inclusive and minimally biased by specific language exposure, thereby enabling more equitable and accurate identification of language disorders in multilingual children. The QCT, as demonstrated in this study, aligns conceptually with these proposals by offering a method to evaluate a bilingual child’s underlying language competence in a way that is robust across languages. Integrating such tools into a broader assessment battery may form the foundation for more inclusive, scalable, and reliable diagnostic practices in multilingual contexts. However, it remains to be established precisely which linguistic abilities are tapped by the QCT and how it might best fit into a battery of tasks designed to evaluate bilingual children’s language abilities through the medium of a single language.
One of the limitations of the current study is the relatively small set of quantificational expressions included in the QCT. Although these seven expressions (“some,” “most,” “none,” etc.) were chosen to capture key logical and pragmatic contrasts, they represent only a subset of the broader domain of pragmatic competence. To build a more comprehensive profile of bilingual children’s pragmatic skills, future research should examine areas such as scalar implicature beyond quantifiers (e.g., “or” and “might”) and referential communication (e.g., use of definite vs. indefinite expressions). These pragmatic abilities also develop early and could help to extend and refine the current findings.
Some caution is also advisable regarding the characteristics of the sample, as the final sample (N = 43) reflected the size of the accessible population and practical recruitment constraints in local schools. Although geographically concentrated in the East of England, the participating children came from first-generation Polish immigrant families originating from diverse regions across Poland. Nevertheless, the sample is not fully representative of all bilingual profiles, and this limits the generalizability of the findings. Future research should therefore extend this paradigm to larger samples and to bilingual populations with different linguistic and sociolinguistic backgrounds.
In sum, our work points to the potential value of the Quantifier Comprehension Task in the assessment of bilingual children. Our results indicated that children’s performance on this task is strongly correlated between their two languages. This opens up the possibility of treating quantification as a potentially language-neutral linguistic category, which might be productively used in the assessment of bilingual children, in that the results of a test of this kind administered in one language are predictive of the results of a corresponding test in the other language. Given the pressing need for better assessment of language development in bilinguals (Peña et al., Reference Peña, Bedore, Lugo-Neris and Albudoor2020), we argue that our results should motivate further investigation into the use and development of tools that include quantifier comprehension as a component of language assessment.
Acknowledgements
We are grateful to the anonymous reviewers for their constructive feedback, including the suggestion to conduct additional analyses. This research was supported by the European Cooperation in Science and Technology (COST) Actions A33 Cross-Linguistically Robust Stages of Children’s Linguistic Performance and IS0804 Language Impairment in a Multilingual Society: Linguistic Patterns and the Road to Assessment, the British Academy, and the Polish Ministry of Science and Higher Education. Data collection was partly funded by Grant no. 809/N-COST/2010/0 from the Polish Ministry of Science and Higher Education/National Science Centre, Rozwój poznawczy i językowy polskich dzieci dwujęzycznych u progu edukacji szkolnej—szanse i zagrożenia [Cognitive and Linguistic Development of Polish Bilingual Children: Risks and Opportunities at the School Entrance Age]. Additional support was provided by COST Action IS0804 and A33 STMS (Short Term Scientific Mission) grants; the British Academy (SG-090676) Bilingual language development: a comparison of morphosyntactic and semantic-pragmatic competence; and EuroXPRAG grants.