On the multidimensionality of bilingualism and the unique role of language use

Abstract
The multidimensionality of the bilingual experience makes the investigation of bilingualism fascinating but also challenging. Although the literature distinguishes several aspects of bilingualism, the measurement methods and the relationships between these aspects have not been clearly established. In a group of 171 relatively young Polish–English bilinguals living in their first-language environment, this study investigates the relationships between the multiple measures of bilingualism. The study shows that language entropy – an increasingly popular measure of the diversity of language use – reflects a separate aspect of the bilingual experience from language-switching and language-mixing measures. The findings also indicate that language proficiency is not a uniform aspect of the bilingual experience but a complex construct that requires appropriately comprehensive measurements. Collectively, the findings contribute to the discussion on the best practices for quantifying bilingualism.


Introduction
There is a growing understanding in the literature that bilingualism is not a zero-one phenomenon: it is a multifaceted experience, and each aspect should be treated as a continuous rather than binary variable (Antoniou, 2019; Kroll, 2015; Luk, 2015; Luk & Bialystok, 2013; Luk & Esposito, 2020). The fact that the bilingual experience is complex (Grosjean, Grosjean & Li, 2013) makes investigations of bilingualism both fascinating and challenging. In fact, it has been argued recently that it is exactly the complexity of the bilingual experience that is responsible for the "phantom-like quality" of bilingual effects on cognitive functioning (Bialystok, 2017; Blanco-Elorrieta & Pylkkänen, 2018; de Bruin, 2019; Leivada, Westergaard, Duñabeitia & Rothman, 2020; Luk, 2015; Luk & Bialystok, 2013; Luk & Esposito, 2020; Pliatsikas, DeLuca & Voits, 2020). Therefore, efforts have been made in recent years to investigate which aspects of the bilingual experience affect cognitive- and language-related processes. A major challenge in this endeavor is the adequate quantification of the bilingual experience. The goal of this paper is to contribute to the discussion on the best practices for quantifying bilingualism.
Recent reviews (de Bruin, 2019;Surrain & Luk, 2017) indicate that there are three aspects of the bilingual experience that have gained the most prominence in the literature: (1) THE ONSET OF BILINGUALISM, (2) DAILY USE OF LANGUAGES, and (3) LANGUAGE PROFICIENCY (DeLuca, 2019;DeLuca, Segaert, Mazaheri & Krott, 2020;Leivada et al., 2020). The assessment of the first two aspects is relatively consistent across the research. The onset of bilingualism is typically quantified as the age of second-language (L2) acquisition (L2 AoA; Unsworth, 2016), or the age of active communication in L2 (L2 AoAC; also called the age of active L2 use; Luk, Sa & Bialystok, 2011), although the latter measure is less frequently used. Daily use of a language (also called language exposure) is typically represented as the proportion of time spent using a language. These measures are typically collected using well-established language-background questionnaires such as the Language Experience and Proficiency Questionnaire (Marian, Blumenfeld & Kaushanskaya, 2007; see also Kaushanskaya, Blumenfeld & Marian, 2020), Language History Questionnaire (Li, Zhang, Tsai & Puls, 2014; see also Li, Zhang, Yu & Zhao, 2020), or the Language and Social Background Questionnaire (Anderson, Mak, Keyvani Chahi & Bialystok, 2018). The situation gets more complicated when one wants to quantify language proficiency. While most research has assessed it using self-reports (i.e., asking participants to self-assess how proficient they are in their languages; as implemented, e.g., in the Language Experience and Proficiency Questionnaire, Marian et al., 2007), it has recently been argued that language proficiency is a complex construct and self-assessment may be insufficient to provide a complete indication of bilingual language competence (de Bruin, 2019;DeLuca, Rothman & Pliatsikas, 2019b). 
This view is supported by a recent study (de Bruin, Carreiras & Duñabeitia, 2017), in which four different measures of language proficiency, i.e., self-assessment, an oral proficiency interview, and two language-proficiency tasks (LexTALE, Lemhöfer & Broersma, 2012, and the Multilingual Naming Test, Gollan, Weissberger, Runnqvist, Montoya & Cera, 2012), correlated moderately with each other. On the one hand, self-assessment, by sharing variance with the other measures, can be considered a reliable measure of language proficiency. On the other hand, the unique variances of these measures suggest that each reflects a slightly different aspect of language proficiency. Therefore, the literature suggests that self-assessment should be supplemented with additional language proficiency measures. A more comprehensive evaluation of language proficiency is also motivated by recent research which shows that self-assessed language proficiency depends on the personal/cultural characteristics and language-learning history of bilinguals (Hansen, Łuniewska, Simonsen, Haman, Mieszkowska, Kołak & Wodniecka, 2019; Tomoschuk, Ferreira & Gollan, 2019). For example, Tomoschuk and colleagues (2019) showed that the correlations between self-assessed language proficiency and an objective language proficiency task (Multilingual Naming Test; Gollan et al., 2012) varied considerably depending on language dominance (see Analysis 3 in the paper). Therefore, self-assessed language proficiency may not be easily generalizable when individuals differ in their bilingual experience.
In addition to the three aspects mentioned above, recent proposals hold that it is also important to consider THE WAYS in which bilinguals use their languages in everyday communication (de Bruin, 2019; DeLuca, 2019; DeLuca, Rothman, Bialystok & Pliatsikas, 2019a; Green & Abutalebi, 2013; Pliatsikas et al., 2020; Wodniecka, Casado, Kałamała, Marecka, Timmer & Wolna, 2020). Two early bilinguals with a native-like proficiency in L2 and similar daily exposure to L2 can still differ in how they actually use their languages. Therefore, THE BILINGUAL PATTERNS OF LANGUAGE USE have been proposed as a fourth important aspect of bilingualism that should be considered. This view is supported by recent theoretical perspectives on bilingual language control (Abutalebi & Green, 2016; Blanco-Elorrieta & Pylkkänen, 2018; DeLuca, 2019; Green & Abutalebi, 2013). In particular, the Adaptive Control Hypothesis (ACH hereafter; Green & Abutalebi, 2013) proposes that different patterns of language use may impose completely different demands on bilingual language control and, in turn, differently impact bilinguals' neurocognitive efficiency. Most importantly, the ACH predicts that extensive language switching (i.e., switching that occurs between utterances) in the absence of language mixing (i.e., mixing elements, e.g., words, of two languages within utterances) has the greatest impact on language/cognitive mechanisms in bilinguals.
Although the role of language-use patterns in shaping the bilingual experience has been well-motivated theoretically, their measurement methods are not firmly established in the literature (for reviews, see van den Noort, Struys, Bosch, Jaswetz, Perriard, Yeo, Barisch, Vermeire, Lee & Lim, 2019;Wodniecka et al., 2020). In most studies, language-use patterns were quantified as the frequency of language switching or the frequency of language mixing. A commonly used tool to obtain these measures is the Bilingual Switching Questionnaire (Rodriguez-Fornells, Krämer, Lorenzo-Seva, Festman & Münte, 2012). In addition to these measures, it has also recently been proposed to represent language-use patterns by means of entropy (Gullifer & Titone, 2019;Hartanto & Yang, 2020;Kałamała, Szewczyk, Chuderski, Senderecka & Wodniecka, 2020). The general concept of entropy comes from information theory (Shannon, 1948) and provides a measure of the diversity (uncertainty) of a phenomenon when the relative proportion of occurrences of a set of 'states' is known. When applied to language-use data, these 'states' represent 'languages', and the entropy value reflects the relative balance (diversity) in the use of different languages. If all languages are used equally, then the entropy value is high; when only a single language is used, then the entropy value is zero.
Summing up, the bilingual experience can be measured in multiple ways. The multitude of available measures is problematic as it is not clear which measures should be chosen to ensure the most adequate quantification of bilingualism. Moreover, the existing variety of measures leads to fragmentation of the literature and makes between-study comparisons difficult as different research groups prefer to use different measures. The most straightforward way to address this issue is to investigate the relationships between the available measures. Such evidence would benefit the literature at both the methodological and the conceptual levels. At the methodological level, it would help to identify those measures that are redundant and those which simply do not reflect inter-individual variability in the bilingual experience. At the conceptual level, it would inform how different measures relate to each other and to what extent they reflect the assumed aspects. However, while many studies have tested how the above-mentioned measures of the bilingual experience relate to (neuro)cognitive functioning (for reviews, see Leivada et al., 2020; van den Noort et al., 2019), surprisingly few studies have tested their interrelationships (Anderson et al., 2018; Fishman & Cooper, 1969; Friesen, Luo, Luk & Bialystok, 2015; Gullifer & Titone, 2019; Gullifer, Kousaie, Gilbert, Grant, Giroud, Coulter, Klein, Baum, Phillips & Titone, 2020; Luk & Bialystok, 2013; Saito, 2015).
Two consecutive studies by Gullifer and colleagues (Gullifer et al., 2020;Gullifer & Titone, 2019) are of particular importance as they took into account not only the classic aspects of the bilingual experience (i.e., onset of bilingualism, daily use of languages, and language proficiency) but also the patterns of language use (assessed by means of language entropy). In both studies, the researchers analyzed data collected via self-assessment from young adult bilinguals living in Montreal. In the second study, the self-assessment of language proficiency was supplemented by a verbal fluency task. In both studies, the language entropy data were collected by asking participants to assess the extent to which they use their languages in different communicative situations, such as when spending time at home or reading. Language entropy estimates were calculated for each type of situation and then subjected to data-reduction techniques (i.e., principal component analysis in the first study and factor analysis in the second one). Data-reduction techniques were used to reduce the complexity of the language entropy data by capturing the common variance of different situational contexts in a smaller set. The first study by Gullifer and Titone (2019) showed that earlier L2 AoA and greater daily use of L2 (called L2 exposure in their paper) were related to better self-assessed L2 abilities. Crucially, the language entropy components (i.e., GENERAL ENTROPY representing language use for personal purposes and WORK ENTROPY representing language use for professional purposes) predicted L2 proficiency over and above the classic measures (i.e., L2 AoA and daily use of L2). Higher language entropy was related to better self-assessed L2 abilities. Most of these effects were then replicated in the second study (Gullifer et al., 2020), in which L2 proficiency was represented as a latent variable comprising fluency tasks in addition to self-ratings. 
Patrycja Kałamała et al.

Overall, the picture emerging from Gullifer and colleagues' research is that bilinguals who are highly proficient in L2 acquire L2 early, use L2 frequently, and exhibit great variation in daily language use. While relationships between L2 AoA, daily use of L2, and L2 proficiency have been reported in several previous studies (Anderson et al., 2018; Luk & Bialystok, 2013), Gullifer and colleagues were the first to show the unique role of patterns of language use in shaping the bilingual experience. Since the seminal paper by Gullifer and Titone (2019) was published, there have been many calls in the literature for the use of language entropy in characterizing the bilingual experience (de Bruin, 2019; DeLuca, 2019; DeLuca et al., 2019a; DeLuca et al., 2019b; Leivada et al., 2020; Pliatsikas et al., 2020). However, some researchers have recently questioned whether language entropy can actually capture all the complexity of language-use patterns (de Bruin, 2019; Leivada et al., 2020). While the properties of entropy unarguably make it possible to reflect the diversity/balance of language use, it is unclear how well this measure reflects bilingual language-switching habits. Although a high language entropy value seems to imply intense language switching, an equally likely scenario is that a bilingual with a high language entropy score used L2 half of the time but switched between languages only once. Considering that different types of language switching may impose completely different demands on language control (as predicted by the ACH; Abutalebi & Green, 2016; Green & Abutalebi, 2013; Green & Wei, 2014; Green, 2011), it is crucial to learn how the diversity of language use (indexed by language entropy) relates to language-switching habits. So far, however, this relationship has not been studied in the literature.
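The dissociation between balanced use and switching frequency can be made concrete with a small sketch. It assumes, purely for illustration, that a day of language use is recorded as a sequence of ten equal time slots labelled "L1" or "L2", and that entropy is the base-2 Shannon entropy of the language proportions. The function names and the toy sequences are hypothetical and are not taken from any of the cited studies.

```python
import math
from collections import Counter


def language_entropy(sequence):
    """Base-2 Shannon entropy of the language proportions in a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def count_switches(sequence):
    """Number of adjacent time slots in which the language changes."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


# Hypothetical day A: L1 all morning, L2 all afternoon (one switch).
blocked = ["L1"] * 5 + ["L2"] * 5
# Hypothetical day B: the language changes in every slot (nine switches).
alternating = ["L1", "L2"] * 5

for name, day in [("blocked", blocked), ("alternating", alternating)]:
    print(name, language_entropy(day), count_switches(day))
```

Both toy days use each language half of the time, so they receive the same entropy, yet they differ in the number of switches. Under these assumptions, entropy alone cannot distinguish the two patterns.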
Moreover, the method of computing the output measures of the language-use patterns proposed in previous research (Gullifer et al., 2020;Gullifer & Titone, 2019) raises a concern. As reviewed above, Gullifer and colleagues proposed estimating language-use patterns on the basis of data-reduction techniques applied to language entropies calculated for different contexts. Data-reduction techniques identify sets of highly correlated variables and extract these as components or factors. When applied to language-entropy data, these techniques decompose unique patterns of language use represented by individual bilinguals into the more general patterns that prevailed in the tested sample. This implies that bilinguals who do not represent typical patterns of language use in a given situation either do not contribute to the factor structure or their contribution is negligible. Consequently, the factorial method does not seem to reflect the unique patterns of bilinguals' language use; instead, it informs about the patterns of language use that prevailed in the tested sample. Moreover, there are two other issues in the research by Gullifer and colleagues (Gullifer et al., 2020;Gullifer & Titone, 2019) that complicate inferences about the role of language-use patterns in shaping the bilingual experience. First, although researchers have collected language-entropy data with regard to many different contexts, some of these contexts pertained to the same spheres of life (e.g., home context and communication with family likely overlap). Since some life spheres were overrepresented in the collected dataset, the observed factor structure could be biased. Second, since not all participants engaged in all targeted contexts, there were missing data in their datasets (three-quarters of participants had missing data in some contexts in the second study). In order to account for the missing data, the researchers artificially imputed NAs. 
However, this approach is questionable because the lack of data, in this case, is not a technical problem with data collection; instead, it conveys important information that an individual did not engage in a given context. Therefore, artificial data compensation seems to violate the uniqueness of bilinguals' patterns of language use and introduces noise to the results.

Present study
Although the literature distinguishes several aspects of the bilingual experience, the relationships between these aspects and their measurement methods are not clearly established. The literature urges researchers to further investigate the sources of individual differences and pay particular attention to the role that language-use patterns play in shaping the bilingual experience.
The aim of the presented analyses is to continue recent efforts to quantify the bilingual experience. To this end, we reanalyzed data from our previous study, in which a large sample of relatively young adult bilinguals (N = 215 tested in total; N = 171 in the analyses) completed a set of questionnaires and two classic L2 proficiency tasks. All participants were Polish-English bilinguals who lived in an L1 environment and declared the daily use of L2. Crucially, the participants displayed great variability in terms of their language experience, but they were relatively homogeneous in terms of their socio-demographic background (for details, see section 2.1). This makes this dataset exceptionally useful as it allows the investigation of individual differences in the bilingual experience while reducing the impact of potential confounding variables related to socio-demographic characteristics. This large dataset allowed us to compute many measures that tapped into the four aspects of the bilingual experience: THE ONSET OF BILINGUALISM (L2 AoA; L2 AoAC), L2 PROFICIENCY (self-ratings for four basic L2 skills; score in the LexTALE task; score in the semantic fluency task), DAILY USE OF L2 (percentage of time spent using L2 each day), and PATTERNS OF LANGUAGE USE (language entropy; language mixing; inter-sentential codeswitching; intra-sentential codeswitching).
To examine the relationships between the aspects of bilingualism, we adopted the approach already proposed in the previous literature and asked which aspects of the bilingual experience predict L2 proficiency (Flege, Munro & MacKay, 1995;Gullifer et al., 2020;Gullifer & Titone, 2019). In order to learn about the underlying structure of L2 proficiency, the L2 proficiency data (i.e., self-ratings for listening, reading, speaking, and writing, LexTALE, and semantic fluency scores) were subjected to factor analysis. To preview the findings, the factor analysis showed evidence for two moderately correlated factors. The first factor consisted of self-ratings and was considered to represent participants' subjective beliefs about their L2 abilities and their personal confidence in using this language (SELF-CONFIDENCE IN USING L2 hereafter). The second factor consisted of the LexTALE scores and semantic fluency scores. Since both LexTALE and semantic verbal fluency have been recognized as measuring vocabulary size and lexical access (Gollan, Montoya & Werner, 2002;Lemhöfer & Broersma, 2012;Shao, Janse, Visser & Meyer, 2014), the second factor was considered to represent vocabulary knowledge in L2 (VOCABULARY KNOWLEDGE hereafter). For each latent variable, we fitted a model that included a complementary proficiency measure as well as the other measures of the bilingual experience.
We believe that two main features make the presented analyses unique. First, given the questionable utility of data-reduction techniques in reflecting the uniqueness of language-use patterns, we propose a different approach to estimating indices related to language-use patterns. Here, the patterns of language use were assessed using two questionnaires (one developed for the purpose of the original study and one devised by Hartanto & Yang, 2016), each of which targeted language-use experience with regard to four social contexts (i.e., home, work, school, and free-time settings). The measures of the language-use patterns (i.e., language entropy, indices of intra-sentential codeswitching, index of inter-sentential codeswitching) were calculated for each context separately and then averaged across contexts with weights proportional to the time spent using languages in each context. Since the four contexts represented distinct but complementary social settings and we accounted for their relative contribution (using weighted averages), the resulting estimates accommodated the fact that every bilingual represents a unique pattern of language use. Second, the aspects of the bilingual experience were probed using multiple measures, which should inform to what extent the adopted measures reflected the assumed aspects of the bilingual experience.
The overarching goal of this study was to unravel the relationship between language-entropy, language-switching, and language-mixing measures and learn about their utility in differentiating bilinguals in terms of L2 proficiency. This should enable us to propose a more comprehensive description and a more appropriate assessment of the language-use patterns. Moreover, since the performed factor analysis provided evidence for two separable aspects of L2 proficiency (namely, self-confidence in using L2 and vocabulary knowledge), we aimed, in a post hoc manner, to understand the similarities and differences between these proficiency constructs. In particular, we were interested in identifying those predictor measures that would show the opposite effects for these two aspects of L2 proficiency. Opposite effects for self-confidence in using L2 and vocabulary knowledge in L2 would indicate that predictor measures can simultaneously capture the specific variance of these two proficiency constructs. This, in turn, would indicate the particular utility of these measures in reflecting the bilingual experience.

Participants
Two hundred and fifteen participants took part in the original study. Only volunteers who declared the daily use of English were invited to the study. Three participants were excluded due to technical problems with data collection. Five other participants were excluded because they did not complete at least one of the questionnaires. Another 36 participants were excluded because they did not perform the correct language version of either the semantic fluency task or the LexTALE (n = 22 and n = 12, respectively). In total, one hundred and seventy-one participants were included in the current analyses. The participants' socio-demographic characteristics were assessed using a background questionnaire delivered in Polish (its Polish and English versions can be found at the Open Science Framework, OSF; https://osf.io/ecnw5/). On average, the participants were young adults (mean age 24 years, SD = 4.64; 132 female; 90 right-handed); they reported a moderate to high income and above-average social status (for details, see Table 1). Their fluid intelligence was measured using a shortened version of Raven's Advanced Progressive Matrices test (only odd-numbered items, 20 minutes to complete; score as the sum of correct responses). On average, participants scored 71% on this test. Table 2 presents the self-assessment data concerning the participants' language experience (for a description of the questionnaires, see section 2.2). Participants were Polish-English bilinguals who acquired Polish (L1) in their early childhood (below the age of four). Seventeen participants also acquired English (L2) in early childhood, while the others started learning L2 in primary school at around the age of seven. On average, the participants started using L2 more intensively when they attended junior high school at around the age of 12. Participants considered their L1 proficiency to be significantly higher than their L2 proficiency, which they considered intermediate to high.
They also scored relatively high in English LexTALE (see Table 3). On average, they used L1 for slightly over half the day, and 32% of them used L2 more than L1 on any given day. In addition, 47 participants declared that they had knowledge of some additional languages (predominantly German, Spanish or French). However, the overall proficiency of these additional languages was relatively low, and the daily use of these languages was marginal. They also declared relatively short exposure to L3 (no longer than five years for 34 participants).
The participants lived in their L1 environment (Poland) at the time of testing. The general language environment in Poland is monolingual, with Polish being the official language of communication. However, as globalization continues, the use of other languages (mostly English) is increasing, especially in work environments. This situation was well reflected in the participant sample, as the majority of participants reported using L2 at work (60%). A third of them reported using L2 at school and/or in their free time. Less than 10% of participants reported using L2 at home. L3 experience was limited mainly to school settings (language courses). The participants were likely to use more than one language in the same context (as indicated by relatively high language entropy), and they moderately often switched and mixed languages (based on the indices of language mixing, intra-sentential codeswitching, and inter-sentential codeswitching; for descriptive statistics, see Table 3).
The study met the requirements and gained the approval of the Ethics Committee of Jagiellonian University in Krakow, Institute of Psychology, concerning empirical studies with human participants. The participants were not aware of the reasoning behind the study. Each participant signed an informed consent form prior to the procedure. Following the testing, the participants were debriefed, informed about the study's goals, and paid for their participation (PLN 40, about $10).
Table 1 notes. (a) score in a version of Raven's Advanced Progressive Matrices test (for description see the text); (b) self-ratings were 1 = less than high school, 2 = high school (with/without high school certificate), 3 = college/graduate school (with/without a BS/BS/master certificate), 4 = more than a master's degree; (c) participants were asked to self-rate their social status in relation to other people living in their country of residence; they answered on a scale resembling a ladder with rungs numbered from 1 to 10; one rung corresponded to one SES level and only extreme rungs had word labels, i.e., 1 = people who have the least money, the least education, and the least prestigious jobs or no job and 10 = people who have the most money, the most education, and the most prestigious jobs; (d) self-ratings were 1 = less than €500 per month to 6 = more than €4,500 per month; SD, standard deviation.

Measures
Self-assessment of language proficiency and the onset of bilingualism
Data on the participants' language proficiency and onset of bilingualism were collected using a language background questionnaire, which included questions from two commonly used questionnaires, i.e., the Language Experience and Proficiency Questionnaire (Marian et al., 2007) and the Language History Questionnaire (Li et al., 2014). Participants indicated all languages they know (including their native language and all other languages learned). Then, they rated their language abilities for each language with regard to listening, reading, speaking, and writing on a scale from 1 (no knowledge of given language) to 9 (native-like proficiency). They also declared the age at which they started acquiring each language and the age at which they started using this language actively in communication. The questionnaire was delivered in Polish, and its Polish and English versions can be found in the OSF repository (https://osf.io/ecnw5/). The age of L2 acquisition (L2 AoA) and the age of active communication in L2 (L2 AoAC) served as measures of the onset of bilingualism. The self-ratings concerning abilities in L2 served as measures of L2 proficiency.
Table 2 notes. SD, standard deviation; the self-rating range for proficiency was 1 = no knowledge of a given language to 9 = native-like proficiency; (a) statistics for the average of additional languages; (b) age in years; (c) daily use of languages does not sum to 100% because not all participants used more than two languages.
Table 3 notes. use, daily use of L2 in %; Lex, LexTALE score in %; SF, semantic fluency score; listen, listening rating; read, reading rating; speak, speaking rating; writ, writing rating; AoA, age of L2 acquisition; AoAC, age of active communication in L2; ent, index of entropy; mix, index of language mixing; inter, index of inter-sentential codeswitching; intra, index of intra-sentential codeswitching; N = 171; SD, standard deviation; min, minimum; max, maximum; p < .05 bolded.

Objective measurement of L2 proficiency
In addition to self-assessment, data on L2 proficiency were collected using two objective proficiency tasks: the Lexical Test for Advanced Learners of English (LexTALE hereafter) and the semantic fluency task. LexTALE was based on Lemhöfer and Broersma (2012). Participants were presented with a series of letter strings, shown one by one on a computer screen in a fixed order. The width of the five-character string on the display was 58 mm (5.0°). For each string, participants indicated whether or not it was an existing English word by pressing either the left shift (yes) or the right shift (no) on the keyboard. If they were unsure, they were instructed to respond "no". The task was not time-limited. The score was computed as the percentage of correct responses. Before data collection, participants received three practice items to ensure that they understood the task.
The semantic fluency task had a written form. It was based on a version that accommodates individual variation in writing speed (Abrahams, Goldstein, Al-Chalabi, Pickering, Morris, Passingham, Brooks & Leigh, 1997; Abrahams, Leigh, Harvey, Vythelingum, Grise & Goldstein, 2000). Within a 2-min time limit, participants were first asked to produce as many English words as possible that belong to a semantic category (generation condition). Afterward, the list of generated words appeared on the left side of the screen, and participants were asked to rewrite this list on the right side of the screen (control condition). The width of the five-character string on the display was the same as in LexTALE. Participants were instructed to avoid producing repetitions and names of people and places. They gave responses using the keyboard. The control condition was not time-limited, but its duration was measured from the time the first character was written to the time the "STOP" button was pressed. The task included three categories (the same as in Baus, Costa & Carreiras, 2013, and Linck, Kroll & Sunderman, 2009, i.e., fruits and vegetables, animals, parts of the body) that were counterbalanced across participants. The semantic fluency score was calculated as the number of exemplars (without repetitions and proper names) divided by the time spent writing a single word (i.e., the total duration of the control condition divided by the number of words generated in this condition). The main task was preceded by a training session in which participants generated exemplars belonging to the "clothes" category within a 30-s time limit.
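The scoring rule for the semantic fluency task can be illustrated with a minimal sketch. The function name, the parameter names, and the use of seconds as the time unit are assumptions made for illustration; the text specifies only the ratio itself.

```python
def semantic_fluency_score(n_valid_exemplars, control_duration_s, n_words_copied):
    """Valid exemplars divided by the average time needed to write one word.

    n_valid_exemplars: exemplars from the generation condition, after removing
        repetitions and proper names.
    control_duration_s: total duration of the control (copying) condition,
        here assumed to be in seconds.
    n_words_copied: number of words written in the control condition.
    """
    if control_duration_s <= 0 or n_words_copied <= 0:
        raise ValueError("control duration and word count must be positive")
    time_per_word = control_duration_s / n_words_copied
    return n_valid_exemplars / time_per_word


# Hypothetical participant: 20 valid exemplars; copying 25 words took 50 s,
# i.e., 2 s per word.
print(semantic_fluency_score(20, 50.0, 25))
```

Dividing by the time per word means that, for the same number of exemplars, a slower writer receives a higher score than a faster one, which is how the measure accommodates individual variation in writing speed.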
Self-assessment of patterns of language use and daily use of L2
Data were collected using two questionnaires: the Patterns of Language Use Questionnaire and the Code-switching and Interactional Contexts Questionnaire. The former questionnaire was developed for the purpose of the original study and was delivered in Polish (for its Polish and English versions, see the OSF repository, https://osf.io/ecnw5/). It consists of four parts, each of which targets a different social context: home, work, school, and free time. For each context, participants list all languages they use in this context and estimate how many hours a day they use these languages in that context. If they use more than one language in a given context, they additionally assess how often they mix words of different languages within single utterances. The statements are the same for each context and are accompanied by examples of situations that are specific to a given context. The statements are assessed on a scale from 1 (never) to 9 (always). The theoretical validity of the questionnaire was examined prior to the original study (for details, see Kałamała, Szewczyk, Chuderski, Senderecka & Wodniecka, 2020).
The language-use data served to compute LANGUAGE ENTROPY and THE PERCENTAGE OF DAILY USE OF L2. Language entropy was computed for each context separately using the following formula: $H = -\sum_{i=1}^{n} p_i \log_2 p_i$, where $p_i$ is the probability of the use of language $i$ in a context and $n$ is the number of languages used in that context. This procedure resulted in four entropy scores for each participant. Entropies were then averaged across the contexts with weights proportional to the time spent using the languages in each context. A higher score indicates a more diverse use of languages during a given day. If a bilingual uses only one language in a context, then the entropy value is 0. If a bilingual uses several languages to the same degree, then the entropy value is about 1 for two languages and about 1.60 for three languages. Daily use of L2 was computed as the averaged percentage of time spent using L2 during a given day.
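As an illustration of the entropy computation described above, the following sketch computes Shannon entropy (base 2) per context and then averages across contexts with weights proportional to the hours reported in each context. It is a hedged example rather than the authors' code: the function names, the dictionary layout, and the hours in the usage example are hypothetical, and treating the reported hours as probabilities (hours divided by the context total) is an assumption.

```python
import math


def language_entropy(hours_by_language):
    """Shannon entropy (base 2) of language use within one context."""
    total = sum(hours_by_language.values())
    if total <= 0:
        return 0.0  # context not used at all: no diversity to measure
    entropy = 0.0
    for hours in hours_by_language.values():
        if hours > 0:  # a language with zero hours contributes nothing
            p = hours / total
            entropy -= p * math.log2(p)
    return entropy


def weighted_language_entropy(contexts):
    """Average the per-context entropies, weighted by hours spent in each context.

    contexts -- mapping: context name -> {language: hours per day}
    """
    weights = {name: sum(hours.values()) for name, hours in contexts.items()}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no language use reported in any context")
    return sum(language_entropy(hours) * weights[name]
               for name, hours in contexts.items()) / total


if __name__ == "__main__":
    # Hypothetical participant: Polish only at home, balanced use elsewhere.
    participant = {
        "home": {"Polish": 4, "English": 0},
        "work": {"Polish": 3, "English": 3},
        "free time": {"Polish": 1, "English": 1},
    }
    print(weighted_language_entropy(participant))
```

With equal use of two languages the per-context entropy is exactly 1, and with equal use of three languages it is log2(3), roughly 1.58, in line with the reference values given in the text.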
The assessment of statements served to compute the INDEX OF LANGUAGE MIXING. Since statement 1 was found to be unreliable in the original study (see Footnote 2 in Kałamała et al., 2020), it was excluded from the computation. The responses to statements 2-4 were first averaged within contexts and then averaged across contexts with weights proportional to the time spent using languages in each context. A lower score indicates less frequent mixing of languages within utterances.
The Code-switching and Interactional Contexts Questionnaire (Hartanto & Yang, 2016) was translated into Polish; the only modification in comparison to the original questionnaire was the use of a 9-point scale (1 = never to 9 = always) instead of a 5-point scale (1 = never to 5 = always) (for its Polish and English versions, see Appendix E in their paper or the OSF repository, https://osf.io/ecnw5/). Similarly to the questionnaire by Kałamała and colleagues (2020), the questionnaire consists of four parts, each of which targets one of four social contexts: home, work, school, and free time. For each context, participants assess how often they switch languages between sentences (inter-sentential codeswitching; one question per context) and how often they mix words of different languages within single sentences (intra-sentential codeswitching; one question per context). The INDEX OF INTER-SENTENTIAL CODESWITCHING and THE INDEX OF INTRA-SENTENTIAL CODESWITCHING were computed by averaging corresponding responses with weights proportional to the time spent using languages in each context. A lower score indicates less frequent switching between sentences (inter-sentential index) or less frequent mixing of languages within sentences (intra-sentential index). The index of intra-sentential codeswitching corresponds to the index of language mixing from the first questionnaire.

General procedure
The participants were tested in groups of up to eight during a session lasting approximately 2-2.5 hours (including breaks between blocks of tasks and a longer break in the middle of a session). After informed consent was obtained, the participants performed the battery of tasks and questionnaires, which were administered in Polish in a fixed order: antisaccade task, semantic fluency task, LexTALE task, Stroop task, Patterns of Language Use Questionnaire, go/no-go task, fluid intelligence test, Code-switching and Interactional Contexts Questionnaire, stop-signal task, socio-demographic background questionnaire, language background questionnaire. The questionnaires were administered using electronic PDF forms, and the tasks were administered using DMDX (Forster & Forster, 2003). The data concerning the antisaccade task, go/no-go task, Stroop task, and stop-signal task were beyond the scope of this study and have been published elsewhere.

Data preparation and analyses
The data were analyzed using R (R Core Team, 2019) with the following packages: stats, tidyverse ( (Kline, 2011). All measures were centered and scaled in order to ensure a common measurement scale. Since most of the variables were ordinal, Spearman's rank correlation coefficient (rho) was used for the correlation analyses. We followed the analysis workflow of Gullifer and colleagues (Gullifer et al., 2020; Gullifer & Titone, 2019). First, we reduced the complexity of the data. Since the onset-of-bilingualism measures (i.e., L2 AoA and L2 AoAC) and the language-switching measures (i.e., language mixing, inter-sentential codeswitching, intra-sentential codeswitching) correlated highly with each other (rhos > 0.5; see Table 3), we used their averaged values in the analysis (i.e., an averaged onset-of-bilingualism score and an averaged language-switching score). In order to determine the latent structure of language proficiency, the self-ratings concerning the L2 abilities (i.e., listening, reading, speaking, and writing), the LexTALE scores, and the semantic fluency scores were subjected to factor analysis. First, we used the Kaiser-Meyer-Olkin test to determine whether the measures were suitable for the factor analysis. A measure was retained if its Kaiser-Meyer-Olkin score exceeded .60. The number of factors was determined via the parallel function, and the factors were allowed to correlate (i.e., we used oblimin rotation) in order to reflect the real structure of the data. Next, we used multiple regression to determine which factors predict L2 proficiency. For each L2 proficiency factor, we fitted three nested models. The BASE MODEL included the complementary L2 proficiency factors, averaged onset-of-bilingualism score, and daily use of L2. The ADDITIVE MODEL additionally included the language entropy and averaged language-switching score. The INTERACTION MODEL comprised all measures included in the additive model and also all two-way interactions with the onset-of-bilingualism score. Model comparisons using χ² tests indicated whether the addition of the measures in the additive and interaction models improved the models' fit. The study materials, data, and R scripts are available at https://osf.io/ecnw5/.

Table 3 presents descriptive statistics and Spearman's rho correlation matrix for the measures of bilingualism. All of the measures demonstrated great variability, as indicated by the SDs and the min-max value ranges. The measures of L2 proficiency (self-ratings for listening, reading, speaking, and writing, LexTALE score, semantic fluency score) positively correlated with each other, but the strength of the correlation varied between pairs. The onset-of-bilingualism measures (L2 AoA and L2 AoAC) showed a relatively strong positive correlation, suggesting that both could be taken as measures of the same construct. The language-switching measures (i.e., language mixing, inter-sentential codeswitching, intra-sentential codeswitching) also demonstrated strong positive correlations, suggesting that they all reflected language-switching behavior.
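Two of the preparatory steps described above, centering and scaling a measure and computing Spearman's rho, can be sketched as follows. The authors' analysis was carried out in R; this standard-library Python version is only an independent illustration, and all function names are hypothetical. Scaling by the sample standard deviation is an assumption.

```python
import math
import statistics


def zscore(values):
    """Center and scale a measure (mean 0, sample SD 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Because rho depends only on ranks, it is unaffected by the centering and scaling step; the standardization matters for the averaged scores and the regression models, not for the correlation matrix.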

Predicting L2 abilities
The Kaiser-Meyer-Olkin test indicated that the data on L2 proficiency (i.e., self-ratings for listening, reading, speaking, and writing, LexTALE, and semantic fluency scores) were suitable for factor analysis (overall score = 0.81; no measures fell below the 0.60 threshold for sampling accuracy). The factor analysis for L2 proficiency measures demonstrated that a two-factor solution best described the data and explained 63% of the total variance (see Table 4). The self-ratings loaded onto Factor 1, which accounted for 74% of the explained variance. The LexTALE and semantic fluency scores loaded onto Factor 2, which accounted for 26% of the explained variance. For Factor 1, the factor loadings were similar. For Factor 2, the LexTALE loaded onto the factor more than the semantic fluency did. Factor 1 was assumed to reflect self-confidence in using L2, whereas Factor 2 constituted an objective measurement of vocabulary knowledge (for a justification, see section 1.2). The factor scores positively correlated with each other (r = .53, p < .001) and were subsequently used as dependent variables in the regression analyses.

Table 5 presents the outcomes of the regression analyses for self-confidence in using L2 (Factor 1). Compared to the base model, the addition of language entropy and language switching improved the model's fit, χ²(2) = 4.22, p < .05, but the addition of two-way interactions did not further improve the fit, χ²(4) = 0.60, p = .66. The analysis showed that vocabulary knowledge (Factor 2), the onset of bilingualism, and language entropy significantly predicted self-confidence in using L2. The data suggest that higher self-assessment of L2 proficiency is related to greater vocabulary knowledge, earlier onset of bilingualism, and higher diversity of language use.

Table 6 presents the outcomes of the regression analyses for vocabulary knowledge (Factor 2). Here, the addition of language entropy and language-switching measures to the base model improved the model's fit, χ²(2) = 6.37, p < .001; the addition of interactions further improved the fit, χ²(4) = 3.93, p < .001. The analysis showed that self-confidence in using L2 (Factor 1), daily use of L2, and language entropy significantly predicted vocabulary knowledge. This suggests that greater vocabulary knowledge is related to higher self-confidence in using L2, greater daily use of L2, and less diverse language use. Compared to the base model, the interaction model further showed two interaction effects, which are presented in Figure 1. The onset of bilingualism significantly interacted with self-confidence in using L2 (Factor 1), indicating that the relationship between self-assessment and vocabulary knowledge is stronger for bilinguals who became bilinguals relatively early. The onset of bilingualism also interacted with language switching, indicating that more frequent language switching is related to greater vocabulary knowledge, but only for bilinguals who started using L2 relatively late. At the same time, there was no relationship between language switching and vocabulary knowledge for bilinguals who started using L2 relatively early.
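The logic of the two interaction effects can be made explicit with a generic worked equation. The notation below is an assumption introduced for illustration and does not reproduce the fitted coefficients: $\hat{y}$ stands for predicted vocabulary knowledge, $x$ for a predictor (self-confidence in using L2 or the language-switching score), $z$ for the standardized onset-of-bilingualism score, and the remaining predictors are abbreviated by the dots.

```latex
\begin{align}
\hat{y} &= \beta_0 + \beta_1 x + \beta_2 z + \beta_3 \,(x \cdot z) + \dots \\
\frac{\partial \hat{y}}{\partial x} &= \beta_1 + \beta_3 z
\end{align}
```

The slope of $x$ therefore depends on $z$: it equals $\beta_1 - \beta_3$ one SD below the mean onset score and $\beta_1 + \beta_3$ one SD above it. Assuming that higher onset scores correspond to a later onset, a stronger relationship for early bilinguals corresponds to a negative $\beta_3$, whereas a relationship that appears only for late bilinguals corresponds to a positive $\beta_3$ with a slope near zero at low values of $z$.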

Discussion
This study aimed to test the discriminative properties of, and the interrelationships between, the commonly used measures of the bilingual experience. To this end, we reanalyzed the data from our previous study, in which a relatively large group of young adult Polish-English bilinguals completed a set of language questionnaires and two L2 proficiency tasks. In the analyses, we focused on four different aspects of the bilingual experience and probed each of them using multiple measures. The onset of bilingualism was quantified as the age of L2 acquisition (L2 AoA) and age of active communication in L2 (L2 AoAC); daily use of L2 was represented as the percentage of time spent using L2; L2 proficiency was assessed by means of self-ratings (for listening, reading, speaking, and writing) and two objective L2 proficiency tasks (i.e., LexTALE and semantic fluency). The patterns of language use were assessed by means of language entropy and three different indices reflecting language-switching habits: one reflecting language switching between utterances (i.e., the inter-sentential codeswitching index), and two indices reflecting the mixing of elements of two languages within single utterances (so-called language mixing), i.e., the intra-sentential codeswitching index and the language mixing index. Below, we discuss the relationships between the applied measures and their utility in differentiating bilinguals in terms of L2 proficiency.

Relationships between the measures of the bilingual experience
The strong relationship between L2 AoA and L2 AoAC suggests that they both tap into the same construct, which we defined as the onset of bilingualism. The strong relationships between the language-switching and language-mixing measures suggest that they jointly reflected a general language-switching tendency in the tested sample: if a bilingual frequently switched languages BETWEEN utterances, they were also likely to mix languages WITHIN single utterances (language-switching tendency hereafter). This finding seems to challenge the widely held notion that language-switching behavior is a continuum in which language switching and language mixing are opposite poles (Deuchar, Muysken & Wang, 2007; Green & Wei, 2014; Hofweber, Marinis & Treffers-Daller, 2016). It should be noted, however, that this study is the first to compare self-assessed switching and mixing habits in a single study, and it might well be that the relationships between the measures would be different in other language environments, e.g., in a code-switching community. Alternatively, it could also be the case that bilinguals are unable to retrospectively separate their switching and mixing experiences and therefore consider them jointly in self-assessments. We will come back to these issues in section 4.5 (Limitations and future directions). At the same time, language entropy moderately correlated with the language-switching and language-mixing measures, suggesting that language entropy reflects a property of the language-use pattern that is to some extent independent of the language-switching tendencies (this is further confirmed by the regression analyses discussed below). Regarding the measures of L2 proficiency, the factor analysis showed evidence for two moderately correlated factors. The first factor consisted of self-ratings and was considered to represent self-confidence in using L2. The second factor consisted of the LexTALE scores and semantic fluency scores and was considered to represent vocabulary knowledge in L2 (for a justification, see section 1.2).

Predicting L2 abilities
In order to better understand how the measures differentiate bilinguals in terms of the bilingual experience, we performed a set of regression analyses. Self-confidence in using L2 was related to the onset of bilingualism, indicating that the earlier an individual becomes bilingual, the greater their personal confidence in using L2. Vocabulary knowledge was related to the daily use of L2, indicating that the more bilinguals use L2 on a daily basis, the greater their vocabulary in that language. These effects replicate previously reported findings (Anderson et al., 2018;Friesen et al., 2015;Gullifer et al., 2020;Gullifer & Titone, 2019;Luk & Bialystok, 2013).
The model for vocabulary knowledge further showed the interaction between the onset of bilingualism and self-confidence in using L2 (see the left panel in Figure 1). This interaction indicated that the sooner individuals become bilinguals, the more their personal confidence in using L2 translates into their actual knowledge of the vocabulary of that language. This effect has not been reported in the literature so far, and it has important implications for measuring bilingual language proficiency (we will comment on this in section 4.4).
Crucially, the analyses also showed the discriminative utility of the measures related to patterns of language use. Language entropy explained unique variance in both L2 proficiency constructs. A higher value of language entropy was related to greater self-confidence in using L2 but poorer vocabulary knowledge. These findings indicate that the more diverse the daily language use, the greater the self-confidence in using L2 but the poorer the knowledge of the vocabulary in that language. Moreover, the language-switching tendency was found to be a predictor of vocabulary knowledge but only when moderated by the onset of bilingualism (see the right panel in Figure 1). The data suggest that late bilinguals who frequently switch languages show better knowledge of L2 vocabulary. No such relationship was found for individuals who started using L2 early, which suggests that language-switching tendency is related to better vocabulary knowledge but only in late bilinguals.

Implications for measuring patterns of language use
The study demonstrates the important discriminative properties of measures related to patterns of language use. Both language entropy and language-switching tendency explained the unique variance of L2 proficiency over and above the classic measures of the bilingual experience, such as the onset of bilingualism and daily use of L2. Importantly, language entropy was related to both L2 proficiency factors but in opposite directions. This indicates that language entropy covers variance that is simultaneously specific to both subjective and objective measurements. This finding is novel and speaks to the importance of accounting for language entropy in research practice: by taking language entropy into account, we are able to establish the conditions under which greater self-confidence in using a language is accompanied by more impoverished vocabulary in that language. The significant effects of language entropy and language-switching tendency indicate the important role of language-use patterns in shaping the bilingual experience. However, the significance of these effects could also be attributed to the fact that language entropy and language-switching measures accounted for unique variance related to experience of additional languages (L3), whereas the other predictor measures did not. Since only a minority of the participants reported contact with L3, and the L3-related variance was limited (see section 2.1), we deliberately did not make any reference to this source of variance in the analyses. Nevertheless, to unequivocally reject the hypothesis that L3 experience affected the pattern of results, we fitted additional regression models in which the predictor pertaining to daily use of L2 had been replaced with the overall daily use of additional languages (for a similar analysis, see Gullifer & Titone, 2019). The pattern of results largely remained the same.¹

Moreover, it should be noted that the language-entropy effects reported in this study do not fully correspond to the previously reported findings. Specifically, Gullifer and colleagues reported positive effects of language entropy on language proficiency in two consecutive studies (Gullifer et al., 2020; Gullifer & Titone, 2019). In their earlier study, language proficiency was assessed via self-assessment. In the later study, self-assessment was accompanied by fluency tasks and, thus, was assumed to provide a more objective measurement of language proficiency. Therefore, the negative effect of language entropy on vocabulary knowledge, which in fact was measured using objective proficiency tasks, seems to contradict the second study of this group. We think, however, that the present study cannot be directly compared with previous research as we employed a different approach to estimating overall language-entropy scores. The weighted-average method used in our study considers the overall variability of the language-use patterns represented by individual bilinguals, thus providing a complete and comprehensive quantification. This was not necessarily the case in previous studies because Gullifer and colleagues employed data-reduction techniques that focus on extracting common variance. It may therefore be the case that the specific variance that was accounted for in our analyses was filtered out by the data-reduction techniques used in previous research. Moreover, a closer look at the outcomes of the factor analysis by Gullifer and colleagues suggests that the factor that was assumed to represent an objective measurement of L2 proficiency in their study was dominated by self-ratings, while the L2 fluency tasks made little contribution to this factor. This undermines the objectivity of this measurement and suggests that the discrepancy between the present findings and the findings reported in Gullifer and colleagues (2020) is only apparent: if the measurement of language proficiency mostly reflects self-confidence in using a language, then greater diversity in language use translates into better language proficiency. However, if language proficiency points to vocabulary knowledge, then greater diversity in language use should translate into poorer language proficiency. This speaks to the importance of the adequate operationalization of language proficiency. We will come back to this issue in the following section.

¹ For self-confidence in using L2, the additive model was still the best-fitting model (χ²(2) = 4.14, p < .03). For vocabulary knowledge, the interaction model also remained the best-fitting model (χ²(4) = 4.00, p < .05). The significance and direction of the effects were the same as in the original models. The magnitude of some of the effects changed, but only slightly, and thus did not affect the interpretation provided in the paper.

Table 5. Model outputs for the three nested models that predict self-confidence in using L2 (Factor 1).
Summing up, the research by Gullifer and colleagues (Gullifer et al., 2020; Gullifer & Titone, 2019), which showed the predictive value of language entropy, certainly made an important contribution to understanding the sources of individual differences in bilingualism. Our research further advances the state of knowledge, showing that greater diversity of language use is related to better L2 proficiency but only if language proficiency is understood as personal confidence in using L2. The opposite effect should be expected for vocabulary knowledge. Overall, the moderate correlations between language entropy and language switching, along with their unique effects on vocabulary knowledge in the current study, suggest that there might be two interrelated aspects of language-use patterns: one that reflects the balance (diversity) of language use (represented by language entropy), and another that reflects general language-switching tendency (represented by the language-switching and mixing measures). This implies that language entropy should not be considered a substitute for language-switching measures (Gullifer et al., 2020; Gullifer & Titone, 2019); instead, both aspects should be considered to ensure sufficiently precise measurement of language-use patterns.

Implications for measuring language proficiency
So far, most studies have assessed language proficiency on the basis of self-assessment (Surrain & Luk, 2017). However, as reviewed in the Introduction, self-assessment of L2 proficiency may not provide a complete indication of language proficiency (de Bruin et al., 2017; Tomoschuk et al., 2019). Moreover, limiting the evaluation of language proficiency to self-assessment can be misleading when bilinguals differ considerably in their language experiences (Tomoschuk et al., 2019). Therefore, the literature calls for a more comprehensive measurement of L2 proficiency. The outcomes of our factor analysis further support this call. Evidence for two moderately correlated factors of L2 proficiency suggests that language proficiency is not a uniform aspect of the bilingual experience but is itself a complex construct. Importantly, self-assessment of language abilities is insufficient to fully account for vocabulary knowledge, and additional measurements are needed to represent this aspect of language proficiency. The outcomes of the regression analyses further advance the current understanding of the factors that affect language proficiency estimates. Language entropy and the onset of bilingualism seem to play important roles in shaping self-confidence in using L2, whereas language entropy, language switching, and daily use of L2 seem to shape the knowledge of L2 vocabulary (but language switching impacted vocabulary knowledge only in the case of the late onset of bilingualism). Importantly, self-confidence in using L2 was more likely to overlap with vocabulary knowledge for the earlier onset of bilingualism, thus suggesting that the earlier L2 is acquired, the more language proficiency is considered through the prism of vocabulary knowledge. This finding contributes to the discussion on the validity of self-assessed language proficiency. While Tomoschuk and colleagues (2019) pointed to the role of language dominance in interpreting self-assessed L2 proficiency within a bilingual group, we show that bilinguals' understanding of self-assessed language proficiency (i.e., the very meaning behind the concept of language proficiency) may change depending on the onset of bilingualism.

Figure 1. Predicted values of vocabulary knowledge derived from the interaction model. Panel A shows vocabulary knowledge as a function of self-confidence in using L2 moderated by the onset of bilingualism. Panel B shows vocabulary knowledge as a function of the averaged language-switching score moderated by the onset of bilingualism. Vocabulary knowledge and self-confidence in using L2 refer to the respective factor scores. Onset of bilingualism refers to the averaged onset-of-bilingualism score. Language-switching tendency refers to the averaged language-switching score (for descriptions, see the text). SD refers to standard deviation. Ribbons represent 95% confidence intervals.

Limitations and future directions
The study advances the current understanding of the factors that shape the bilingual experience. However, some of the reported findings are limited by the available data and require further investigation. The first limitation is related to the specificity of the tested participant sample. The participants were comparable in terms of their socio-demographic characteristics, which reduced the impact of experiential factors not related to language experience. However, since most of the participants were relatively young adults (88% under thirty years old) and all of them lived in their L1 environment, the findings cannot be generalized to the entire bilingual population. In particular, we anticipate that the relationship between language switching and mixing may be different for older bilinguals or other language environments. Therefore, an important question concerns the extent to which the switching-mixing relationship observed for relatively young bilinguals living in the L1 environment generalizes to other bilingual populations.
The second limitation is related to the selection of measures and the corresponding aspects of the bilingual experience. In this study, we have focused on those aspects that have received the greatest attention in the literature (Marian & Hayakawa, 2021; Surrain & Luk, 2017). However, some researchers have also pointed to other aspects of bilingualism that we have not accounted for in this study (e.g., identity with L2 culture or language environment/context; for more examples, see Marian & Hayakawa, 2021). Therefore, future research should consider addressing a more extensive range of aspects. Moreover, the data indicate that self-assessment of language proficiency may hinder credible comparisons between bilinguals. However, it needs to be acknowledged that eight of our measures were, in fact, based on self-assessment. Therefore, one may argue that the self-assessment biased the results related to the other aspects of the bilingual experience as well. We think, however, that this is an unlikely scenario. Since the measures related to the onset of bilingualism and daily language use (e.g., L2 AoA, language entropy, daily use of L2) referred to relatively objective facts, they are largely assumed to be reliable and valid (de Bruin, 2019; Leivada et al., 2020; Marian & Hayakawa, 2021). Their reliability was also supported by our additional analysis.² Yet, it should be acknowledged that little is known about the reliability and validity of self-assessed language-switching habits (but see de Bruin, 2019; Dewaele & Wei, 2014; Jevtović, Duñabeitia & de Bruin, 2020). Therefore, the strong correlations between language switching and language mixing should be treated with caution as they could also be attributed to the fact that bilinguals cannot separate their language-switching and language-mixing tendencies and always consider them collectively. 
Future research should thoroughly examine how well self-ratings reflect actual language-switching behavior across different language environments. This can be achieved by comparing natural language-use habits (retrieved from voice recordings collected over a day) with self-assessment data. Relatedly, we argue that language proficiency is a complex construct and provide evidence for its 'vocabulary knowledge' aspect (for a similar argument, see DeLuca et al., 2019b). Therefore, future studies should investigate whether the effects reported for vocabulary knowledge can be replicated using other proficiency measures, such as the Multilingual Naming Test (Gollan et al., 2012), which is used to measure expressive vocabulary, or the Oxford Quick Placement Test (Geranpayeh, 2003), which is used to measure grammatical knowledge. In order to facilitate future research and enable direct between-study comparisons, we have shared the materials and scripts that we used in this study. The freely available online tool for calculating entropy scores devised by Gullifer and Titone (2018) could also be helpful.

Conclusions
The study shows the interrelationships and discriminative utilities of measures related to the bilingual experience in a relatively large group of young adult bilinguals living in their L1 environment. In particular, the study shows the discriminative potential of measures related to language-use patterns, i.e., language entropy and broadly understood language-switching tendency. Overall, the correlation pattern and the outcomes of the regression analyses suggest that language-use patterns can be composed of two interrelated aspects: the balance (diversity) of language use and language-switching tendency. The study also suggests that language switching and language mixing may not always be two sides of the same coin as they can coincide in some bilingual populations (but this effect needs further investigation). Moreover, the study also draws research attention to the problem of measuring language proficiency. The self-assessment of language proficiency is likely to reflect self-confidence in using a language. Crucially, bilinguals' individual understanding of language proficiency (the subject of self-assessment) may differ, even within a participant sample drawn from the same language environment. Therefore, self-assessment should not be considered a unique index of language proficiency but should be supplemented with additional measurements.

² In order to verify the reliability of L2 AoA and L2 AoAC, we performed an additional analysis on a longitudinal dataset collected in our laboratory (63 Polish-English bilinguals; a 7-month period between two subsequent testing sessions). The intra-class correlation coefficients (internal consistency for a two-way fixed model) for the L2 AoA and L2 AoAC were 0.91 and 0.81, respectively, which indicates satisfactory reliability of the measurements (Cicchetti, 2001).