When and how to use confirmatory composite analysis (CCA) in second language research

Abstract
Researchers in the second language (L2) and education domains use various statistical methods to assess their constructs of interest. Many L2 constructs emerge from elements/parts, i.e., the elements define and form the construct and not the other way around. These constructs are referred to as emergent variables (also called components, formative constructs, and composite constructs). Because emergent variables are composed of elements/parts, they should be assessed through confirmatory composite analysis (CCA). The elements of an emergent variable represent unique facets of the construct. Such constructs therefore cannot be properly assessed by confirmatory factor analysis (CFA), because CFA and its underlying common factor model regard these elements as similar and interchangeable. In contrast, the elements of an emergent variable uniquely define and form the construct, i.e., they are neither similar nor interchangeable. Thus, CCA is the preferred approach to empirically validate emergent variables such as language skills, L2 students’ behavioral engagement, and language learning strategies. CCA is based on the composite model, which captures the characteristics of emergent variables more accurately. Aside from the difference in the underlying model, CCA consists of the same steps as CFA, i.e., model specification, model identification, model estimation, and model assessment. In this paper, we explain these steps and present an illustrative example using publicly available data. In doing so, we show how CCA can be conducted using graphical software packages such as Amos, and we provide the code necessary to conduct CCA in the R package lavaan.


Introduction
Theoretical constructs are at the heart of second language (L2) research. Well-known examples include L2 motivation (Alamer et al., 2023; Dörnyei & Ryan, 2015), L2 emotions (Dewaele & Li, 2020; Nakamura et al., 2021; Pritzker et al., 2019), and L2 anxiety (Alamer & Lee, 2021; Horwitz, 2010). Often, constructs are not directly observable. To make them accessible to empirical research, researchers develop and validate scales and inventories (Dörnyei & Dewaele, 2022; Iwaniec, 2019). Almost by default, L2 scholars rely on the common factor model (also known as the reflective measurement model) to validate their scales, and they mainly employ confirmatory factor analysis (CFA; Jöreskog, 1969) or the more recently introduced exploratory structural equation modeling (ESEM; Asparouhov & Muthén, 2009; see also Alamer & Marsh, 2022; Marsh & Alamer, 2024) for that purpose. Under the common factor model, scale items are regarded as error-prone measurements of their latent variable, which means that the items are assumed to share a common cause that is responsible for their covariance structure.
However, not all constructs follow this definition. Constructs can also emerge from elements/parts, i.e., the items define and form the construct and not the other way around. In such instances, the construct has the character of a collection or an inventory. Therefore, these types of constructs are called emergent variables (Henseler & Schuberth, 2020; Henseler, 2021). The emergent variable (also called a component, formative construct, or composite construct) "is an abstraction that results from the combined effects of all of the particular measures" (Cole et al., 1993, p. 175). An example from psychology is a mother's availability to interact with and monitor a particular child. This construct simply groups three variables: the number of children in the family, the mother's illness, and her hours of employment. These variables are arguably distinct in meaning and not interchangeable, which makes them elements and aligns the construct with the emergent variable perspective. In the following, we argue that L2 research also deals with constructs that are made up of elements. Potential examples are L2 achievement (Papi & Khajavy, 2021; Sparks & Alamer, 2022, 2023), first language skills (Sparks, 2023), and the use of strategies such as language learning strategies (Oxford & Griffiths, 2016), vocabulary learning strategies (Alamer et al., in press), and metacognitive reading strategies (Alamer & Alsagoafi, 2023; Mokhtari & Reichard, 2002). Because emergent variables are made up of their elements but do not cause them, the assumptions implied by the common factor model are unlikely to hold. CFA is therefore hardly suitable for empirically studying emergent variables. So, which method should be used instead?
A recently developed analytical method, namely confirmatory composite analysis (CCA; Schuberth et al., 2018), allows researchers to empirically investigate emergent variables. CCA is analogous to CFA, with the crucial difference that CCA builds on the composite model rather than the common factor model (Schamberger et al., 2023). The composite model more accurately captures the characteristics of emergent variables. Apart from this difference, CCA follows the same steps as CFA: model specification, model identification, model parameter estimation, and model assessment. A crucial development in CCA is the recently introduced Henseler-Ogasawara (H-O) specification of composites (Schuberth, 2023; Yu et al., 2023), which makes it possible to conduct CCA using conventional structural equation modeling (SEM) software packages such as Amos (Arbuckle, 2020), the R package lavaan (Rosseel, 2012), and Mplus (Muthén & Muthén, 1998-2017). Consequently, researchers can gain all the benefits they are accustomed to in SEM with latent variables, e.g., dealing with missing values (e.g., Allison, 2003; Muthén et al., 1987), gaining access to well-established model fit measures (Schermelleh-Engel et al., 2003), and constraining parameters.
The remainder of this paper is structured as follows. In the next section, we argue that, in several cases, L2 researchers could be dealing with emergent variables rather than latent variables. As a contextual example, we refer to the Strategy Inventory for Language Learning (SILL; Oxford, 2011) and argue that CFA is limited when it comes to assessing the SILL because the common factor model underlying CFA does not fit the characteristics of the SILL. We then argue that CCA more accurately captures the conceptual definition of the SILL, and we provide guidelines on how to use CCA. Subsequently, we demonstrate the use of CCA by applying it to the Metacognitive Awareness of Reading Strategies Inventory (MARSI; Mokhtari & Reichard, 2002). We conclude our paper with a discussion and indicate avenues for future research.
Confirmatory factor analysis (CFA) and its underlying assumptions
CFA is the de facto standard technique for empirically validating scales reflecting latent variables in L2 and education research (e.g., Alamer, 2022; Schreiber et al., 2006; Shao et al., 2022). It builds on the common factor model to describe the relationship between scale items and a construct. Specifically, the common factor model is grounded in classical test theory (e.g., Lord & Novick, 2008); therefore, it assumes that the items are error-prone manifestations of the construct, i.e., the latent variable causes the items (Fukuta et al., 2023). Consequently, the items are expected to correlate because they share a common cause, namely the latent variable that they purport to measure. For this reason, latent variables are often regarded as ontological entities: "If something does not exist, then one cannot measure it" (Borsboom et al., 2004, p. 1061). Conventionally, the common factor model assumes that the latent variable is solely responsible for the items' covariance structure, i.e., the items are unidimensional. In CFA, unidimensionality implies that each item loads on only one latent variable, i.e., all cross-loadings are fixed to zero (e.g., Alamer & Marsh, 2022; Marsh & Alamer, 2024). Thus, the common factor model assumes, at least in theory, that items from a homogeneous pool reflecting the latent variable can be interchanged or dropped without altering the construct's meaning (e.g., Bollen & Bauldry, 2011). Table 1 summarizes the characteristics of the common factor model.
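To make the "common cause" idea concrete, the following Python sketch computes the correlation matrix implied by a standardized one-factor model with uncorrelated measurement errors. Under these assumptions, the implied correlation between items i and j is simply the product of their loadings, λ_i λ_j. The loadings below are hypothetical values chosen for illustration, not estimates from any study.

```python
def implied_correlations(loadings):
    """Correlation matrix implied by a standardized one-factor model.

    Assumes uncorrelated measurement errors, so every off-diagonal entry is
    loading_i * loading_j: the items correlate only through the latent cause.
    """
    n = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical standardized loadings of three items on one latent variable
R = implied_correlations([0.8, 0.7, 0.6])
for row in R:
    print([f"{r:.2f}" for r in row])
```

Under this structure, items with sizable loadings must also correlate substantially with one another; a set of items that are nearly uncorrelated cannot all load strongly on a single common factor.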
In the L2 research context, CFA and the common factor model have proven to be useful tools to empirically validate questionnaires that intend to measure phenomena that are not directly observable, i.e., scales that measure latent variables (Alamer & Marsh, 2022; Marsh & Alamer, 2024). Typical examples of latent variables from L2 research are L2 motivation, anxiety, enjoyment, and boredom. However, as we will explain in the following section, the use of CFA to empirically validate questionnaires intended to evaluate phenomena that are defined by a set of elements/parts, i.e., emergent variables, is limited.

Example from the literature illustrating the limitation of CFA in studying emergent variables
To illustrate the limitations of CFA and the common factor model in studying emergent variables, we focus on language learning strategies. Following Cohen (2014, p. 7), language learning strategies can be defined as "[t]houghts and actions, consciously chosen and operationalized by language learners, to assist them in carrying out a multiplicity of tasks from the very onset of learning to the most advanced levels of target-language performance." Similarly, learning strategies are "specific actions taken by the learner to make learning easier, faster, more enjoyable, more self-directed, more effective, and more transferable to new situations" (Oxford, 1990, p. 8) and "actions chosen by learners for the purpose of language learning" (Griffiths, 2018, p. 88).
Although various classes of learning strategies have been proposed in recent decades, the three most commonly studied classes are cognitive, affective, and social learning strategies (Oxford & Griffiths, 2016). To limit our focus, we discuss only cognitive strategies, which are defined as strategies that language students use to help them understand, transform, and apply their language knowledge (Oxford, 1992). To evaluate the degree to which a learner uses cognitive learning strategies, researchers frequently employ the SILL (Oxford, 1990; Oxford & Griffiths, 2016). In particular, they use the following SILL items to evaluate L2 learners' use of cognitive strategies (rated on a Likert scale from 1, "Never or almost never true of me," to 5, "Always or almost always true of me"; Oxford, 2011, pp. 102-136). For L2 English learning, the items are:
Item 1: I connect the sound of a new English word and an image or picture of the word to help me remember the word.
Item 2: I use the English words I know in different ways.
Item 3: I find the meaning of an English word by dividing it into parts that I understand.
Item 4: I use new English words in a sentence so I can remember them.
Item 5: I try to find patterns (grammar) in English.
Item 6: I try not to translate word for word.
To judge whether the common factor model and CFA are suitable for modeling and assessing the use of cognitive learning strategies, one should ask the following questions (Bollen & Bauldry, 2011; see also "Model specification" in Table 2): Does a change in the construct lead to a change in all items, i.e., does an increase in the use of cognitive strategies entail an increase in all six items? Can the items, in principle, be interchanged or removed without altering the meaning of the construct? Do we expect high correlations between the items because they share a common cause? For example, does a respondent who answers Item 3 ("I find the meaning of an English word by dividing it into parts that I understand") in a certain way respond in a similar way to Item 6 ("I try not to translate word for word")?
Considering the questions posed above, we answer all of them with no. This calls into question whether the common factor model and CFA are able to model and assess the use of cognitive learning strategies. Instead, we argue that the use of cognitive learning strategies is an emergent rather than a latent variable, composed of elements/parts. Specifically, each SILL item represents a particular cognitive strategy and how a given learner uses it to master the L2. Together, these items determine how much a learner uses cognitive learning strategies. Consequently, the items are not interchangeable because each item is unique and represents a different cognitive strategy. Additionally, removing an item would probably alter the construct's meaning because the dropped strategy (i.e., the item) cannot be recovered conceptually from any other item in the inventory. Furthermore, substantial correlations between items are not necessarily expected because learners can apply strategies differently. For instance, learners might use English words they know in different ways (Item 2) but not divide an English word into its constituent parts to determine its meaning (Item 3). Moreover, the factor structure of the SILL has often not been supported by previous studies using CFA. For example, various studies observed fit measures indicating unacceptable model fit and/or low factor loading estimates (Hsiao & Oxford, 2002; Paige et al., 2004; Tragant et al., 2013). In response to this counterevidence, previous studies have proposed removing some of the constructs or items (Habók & Magyar, 2018; Hsiao & Oxford, 2002; Yeh, 2014). Overall, the common factor model does not seem to adequately reflect the conceptual definition of the use of cognitive learning strategies and thus of the SILL. From a theoretical perspective, we argue that the use of cognitive learning strategies should rather be regarded as an emergent variable; thus, the SILL should be evaluated via CCA and the composite model.
Table 2 (excerpt). Assessment of emergent variables
Weight estimates: The weight estimates reflect the relevance of the elements in forming the emergent variable. Weight estimates should be positive and statistically significant; if not, the composite loadings of the emergent variable should be further assessed.
Composite loading estimates: The composite loading estimates show the elements' absolute contribution to the emergent variable and provide information about its orientation. The sign should be in line with the expected orientation of the emergent variable, and the loading estimates should be statistically significant.
Criterion validity (i.e., concurrent and predictive validity): The extent to which an emergent variable correlates with another construct, based on theory. A significant and sizable correlation supports criterion validity.

Confirmatory composite analysis (CCA)
CCA was proposed as a tool to empirically assess composite models (Henseler et al., 2014; Schuberth et al., 2018) and has been introduced to various research fields, including business (Henseler & Schuberth, 2020), information systems (Hubona et al., 2021), tourism and hospitality (Liu et al., 2022), and human development (Schamberger et al., 2023). The first application of CCA in the L2 domain was by Alamer and Alsagoafi (2023), who empirically tested the validity of the Revised Metacognitive Awareness of Reading Strategies Inventory (MARSI-R; Mokhtari et al., 2018). In their study, the authors compared the results of CCA and CFA when examining the validity of the MARSI-R. They found support for CCA: the model fit indices were acceptable in the CCA but not in the CFA, and the item weights functioned in the expected direction. Given these recent findings, the L2 field would benefit from guidelines and practical tutorials for using CCA (Alamer et al., in press). In the following subsections, we present the four steps of CCA, i.e., model specification, model identification, model estimation, and model assessment.

Model specification
To study emergent variables, the composite model can be used (Henseler & Schuberth, 2020). At the heart of the composite model is an emergent variable, not a latent variable. First and foremost, an emergent variable is a weighted linear combination of elements, i.e., it is a composite. Hence, the composite model assumes that the construct is fully defined by its elements. Consequently, an emergent variable is not assumed to exist independently of its elements. This contrasts with the latent variable in the common factor model, which is measured and therefore assumed to exist independently of its measures (Borsboom et al., 2004). That said, as Henseler and Schuberth (2023) noted, if the elements exist, so does the emergent variable. Since each element plays a constituent role in forming the construct, omitting an element alters the construct's meaning in the composite model. An additional and important property of an emergent variable is that it accounts for the covariances between its elements and the other variables in the model. This property is expressed by the axiom of unity (Henseler & Schuberth, 2021a). Thus, an emergent variable conveys all the information shared between its elements and the other variables in the model (Dijkstra, 2013, 2017). Hence, an emergent variable acts as a whole and not as a mere loose collection of elements (Henseler & Schuberth, 2021b). Table 1 summarizes the characteristics of the composite model. Note that composites are usually depicted by hexagons (e.g., Grace & Bollen, 2008). However, most SEM software packages with a graphical interface have not yet implemented this graphical representation.
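To illustrate that an emergent variable is simply a weighted linear combination of its elements, the following Python sketch computes composite scores and rescales them to mean 0 and unit variance. The weights and responses are hypothetical; in CCA, the weights are estimated from the data rather than fixed in advance.

```python
from statistics import mean, stdev


def composite_scores(data, weights):
    """Form a composite as a weighted linear combination of its elements.

    data: one row per respondent, one column per element.
    weights: one weight per element (hypothetical here; estimated in CCA).
    The composite has no error term: it is fully determined by its elements.
    The result is rescaled to mean 0 and unit sample variance.
    """
    raw = [sum(w * x for w, x in zip(weights, row)) for row in data]
    m, s = mean(raw), stdev(raw)
    return [(r - m) / s for r in raw]


# Hypothetical responses of four learners to two elements, equal weights
scores = composite_scores([[1, 2], [2, 1], [3, 3], [4, 2]], [0.5, 0.5])
print([round(z, 3) for z in scores])  # → [-0.866, -0.866, 0.866, 0.866]
```

Because no residual is involved, two respondents with the same weighted sum of element scores always receive the same composite score, which mirrors the assumption that the construct is fully defined by its elements.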
Initially, it was proposed to express emergent variables in CCA by means of weights because this is highly intuitive (Schuberth et al., 2018). However, this prevents researchers from conducting CCA within conventional SEM and thus from benefiting from SEM's advantages, such as obtaining fit measures and dealing with missing values (Schuberth, 2023). The reason is that, in a weight-based specification, an emergent variable is always modeled as a dependent variable. Consequently, covariances between an emergent variable and other variables cannot be specified, although specifying them is an essential requirement for conducting CCA. With this approach, researchers could only specify covariances between the emergent variable's disturbance term and other variables in the model. However, since an emergent variable is assumed to be fully determined by its elements, the variance of this disturbance term must be constrained to zero, and a term with zero variance cannot covary with anything. Hence, the covariances between emergent variables that CCA requires cannot be specified in this way.
To overcome this limitation, in this study we use the H-O specification (Henseler & Schuberth, 2021b; Schuberth, 2023), in particular its refined version (Yu et al., 2023). In the H-O specification, a set of elements forms not just a single composite but as many composites as there are elements, i.e., one emergent variable and several excrescent variables. The emergent variable depicts the construct of interest, whereas the excrescent variables have no substantive meaning. They are merely formed to span, together with the emergent variable, the space of the elements. This approach resembles principal component analysis (Hotelling, 1933), in which as many principal components as variables are extracted. For a more technical description, we refer the reader to Schuberth (2023) and Schamberger et al. (2023). Figure 1 presents an example of the H-O specification in the SEM software Amos. In this example, the emergent variable is formed by five elements; consequently, four excrescent variables (ex1 to ex4) are specified. Notably, the elements are assumed to be free from random measurement error. Moreover, the emergent variable must be connected to at least one other variable in the model, as indicated by the double-headed arrow; we elaborate on this in the next subsection on model identification. This additionally highlights that emergent variables are context-specific, i.e., their meaning also depends on the model's other variables (Yu et al., 2021). Since Amos does not allow hexagons to be drawn, the emergent and excrescent variables are displayed as ovals in Figure 1.
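As an illustration of this pattern, the following Python sketch writes lavaan-style model syntax for a single block in the H-O specification. `ho_block_syntax` is a hypothetical helper written for this example; it is not the specifyHO function and does not reproduce its output. The loading pattern for ex2 to ex4 (a free loading for Element 1 and a loading fixed to 1 for the next element) is our assumption, extended from the ex1 pattern described for Figure 1 in the model identification subsection.

```python
def ho_block_syntax(emergent, elements):
    """Return lavaan-style H-O syntax for a single block (illustrative only).

    Assumed pattern: element 1 scales the emergent variable; element j + 1
    scales excrescent variable j, which also gets a free loading of element 1.
    """
    if len(elements) < 2:
        raise ValueError("an H-O block needs at least two elements")
    first, rest = elements[0], elements[1:]
    excrescent = [f"{emergent}_ex{j}" for j in range(1, len(elements))]
    # Emergent variable: loading of element 1 fixed to 1, the others free
    lines = [f"{emergent} =~ 1*{first} + " + " + ".join(rest)]
    # Excrescent variables: NA* frees element 1's loading, 1* fixes the scaling one
    for ex, scale in zip(excrescent, rest):
        lines.append(f"{ex} =~ NA*{first} + 1*{scale}")
    # Emergent variable uncorrelated with its own excrescent variables
    lines += [f"{emergent} ~~ 0*{ex}" for ex in excrescent]
    # Elements assumed free of random measurement error
    lines += [f"{x} ~~ 0*{x}" for x in elements]
    return "\n".join(lines)


print(ho_block_syntax("eta", ["x1", "x2", "x3", "x4", "x5"]))
```

The sketch relies on lavaan's `sem()` default of letting exogenous latent variables covary for the covariances among the excrescent variables. In a model with several blocks, the excrescent variables would additionally need zero covariances with all variables outside their own block, and the emergent variable needs a covariance with at least one other variable; neither is generated here.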
To facilitate the application of CCA in the R environment (R Core Team, 2022) using the lavaan package (Rosseel, 2012), the R function specifyHO can be used.1 To do so, the user specifies the model's emergent variables in lavaan syntax using the '<~' operator. This model is then passed to the specifyHO function, which returns lavaan model syntax in which the emergent variables are specified in compliance with the H-O specification. Finally, this syntax can be used as input for the sem function of the lavaan package to conduct CCA.

Model identification
To ensure that the model parameters are identified, constraints must be imposed on them. This involves fixing the scales, i.e., the variances, of the emergent and excrescent variables. For this purpose, we set one composite loading of each emergent and excrescent variable to one. In this regard, no element can serve as a scaling indicator more than once. In our example model, depicted in Figure 1, each element is used only once as a scaling indicator, i.e., only one of its composite loadings is constrained to 1. For example, the composite loading of Element 1 on the emergent variable is constrained to 1. Similarly, the composite loading of Element 2 on the excrescent variable ex1 is constrained to 1. Further, the emergent variable must be uncorrelated with the excrescent variables, in this case ex1 to ex4. Also, the emergent variable must be related to at least one variable in the model other than its elements, e.g., another observed, latent, or emergent variable. In our illustrative model, this is indicated by the double-headed arrow. In contrast, the excrescent variables are only allowed to correlate with one another, as Figure 1 shows. Consequently, the emergent variable fully accounts for the covariances between the elements and the other variables in the model. In other words, all information shared between the elements and the other variables in the model is conveyed by the emergent variable (Dijkstra, 2017). Further, one needs to ensure that the excrescent variables span the remaining space of the elements that is not spanned by the emergent variable (Schuberth, 2023). To this end, we fix all composite loadings of each excrescent variable to zero except for two: the composite loading that is fixed to one to determine the excrescent variable's variance and one composite loading that is freely estimated. For example, in Figure 1, the composite loading of Element 1 on the first excrescent variable ex1 is a free model parameter, whereas all other composite loadings on that excrescent variable are constrained to 1 (Element 2) or 0 (Elements 3 to 5). The composite loadings of the other excrescent variables are fixed in a similar fashion. In doing so, it must be ensured that no two excrescent variables are connected to exactly the same elements. Finally, most SEM software applications by default specify random measurement errors for the elements. Since the elements are assumed to be free of random measurement error, the variances of these error terms must be constrained to zero.
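The loading rules above can be checked mechanically for a single block. The following Python sketch encodes a loading pattern as a dictionary (an encoding chosen only for this example) and reports violations of the stated rules. Passing these checks is necessary under the rules above but is not a formal proof of identification. The pattern for ex2 to ex4 is an assumption that extends the ex1 pattern described for Figure 1.

```python
def check_ho_pattern(pattern, emergent):
    """Report violations of the H-O loading rules for one block.

    pattern maps each composite to {element: "1" or "free"}; elements that
    are not listed have loadings fixed to 0 (encoding assumed for this sketch).
    """
    problems = []
    if len(pattern) != len(pattern[emergent]):
        problems.append("need as many composites as elements")
    scaling_count = {}
    excrescent_sets = []
    for comp, loads in pattern.items():
        fixed = [x for x, v in loads.items() if v == "1"]
        if len(fixed) != 1:
            problems.append(f"{comp}: needs exactly one loading fixed to 1")
        for x in fixed:
            scaling_count[x] = scaling_count.get(x, 0) + 1
        if comp != emergent:
            free = [x for x, v in loads.items() if v == "free"]
            if len(free) != 1 or len(loads) != 2:
                problems.append(f"{comp}: needs one fixed and one free loading")
            excrescent_sets.append(frozenset(loads))
    for x, count in scaling_count.items():
        if count > 1:
            problems.append(f"{x}: scaling indicator {count} times")
    if len(set(excrescent_sets)) < len(excrescent_sets):
        problems.append("two excrescent variables load on the same elements")
    return problems


figure1 = {
    "eta": {"x1": "1", "x2": "free", "x3": "free", "x4": "free", "x5": "free"},
    "ex1": {"x1": "free", "x2": "1"},
    "ex2": {"x1": "free", "x3": "1"},
    "ex3": {"x1": "free", "x4": "1"},
    "ex4": {"x1": "free", "x5": "1"},
}
print(check_ho_pattern(figure1, "eta"))  # → []

faulty = dict(figure1, ex2={"x1": "free", "x2": "1"})  # x2 now scales twice
print(check_ho_pattern(faulty, "eta"))
```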

Model estimation
Once identification of the model parameters has been ensured, they can be estimated.The H-O specification allows us to draw on different kinds of SEM estimators such as maximum-likelihood (ML) (Jöreskog, 1970) or generalized least squares (GLS) (Browne, 1974).As a result, researchers applying CCA can gain all the benefits that they are accustomed to having with SEM, e.g., fixing parameters and gaining access to well-established model fit indices (Kline, 2015, Chapter 12).
An apparent disadvantage of the H-O specification is that weight estimates are not obtained by default because the relationships between the emergent and excrescent variables and their elements are expressed by composite loadings instead of weights. However, as shown in Schuberth (2023) and Schamberger et al. (2023), the weight estimates can be retrieved from the inverted composite loading matrix. As most SEM software applications allow users to define new parameters, this feature can be exploited to obtain the (standardized) weight estimates. For an explanation of how the weights can be obtained from the composite loadings, we refer the reader to Schuberth (2023) and Yu et al. (2023). Further, the specifyHO function offers an option for computing the weights.
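The retrieval of weights from the inverted composite loading matrix can be sketched for a three-element block. Writing the elements as x = Λc, where the columns of Λ correspond to the composites c = (η, ex1, ex2), gives c = Λ^(-1) x, so the first row of the inverse contains the weights of the emergent variable η. The loading values below are hypothetical, and the small Gauss-Jordan inverse only keeps the sketch dependency-free; in practice, defined parameters in the SEM software or the specifyHO option would be used. Standardized weights follow as w_k · sd(x_k) / sd(η), with var(η) = w'Sw.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("composite loading matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standardize_weights(w, S):
    """Standardized weights w_k * sd(x_k) / sd(eta), where var(eta) = w' S w."""
    k = len(w)
    var_eta = sum(w[i] * S[i][j] * w[j] for i in range(k) for j in range(k))
    return [w[i] * S[i][i] ** 0.5 / var_eta ** 0.5 for i in range(k)]


# Rows: elements x1..x3; columns: composites (eta, ex1, ex2). Hypothetical values:
# x1 scales eta (loading 1) and loads freely on ex1 and ex2;
# x2 scales ex1 and x3 scales ex2 (fixed loadings of 1).
L = [[1.0, -0.5, -0.25],
     [0.8, 1.0, 0.0],
     [0.6, 0.0, 1.0]]
W = invert(L)        # composites = W applied to the elements
weights_eta = W[0]   # first row: unstandardized weights of eta
print([round(w, 4) for w in weights_eta])  # → [0.6452, 0.3226, 0.1613]
```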

Model assessment
In CCA's final step, the model is assessed and its parameter estimates are interpreted. This involves assessing the overall model fit and the emergent variables (Henseler & Schuberth, 2020; Schuberth et al., 2018). As in CFA, overall model assessment is crucial in CCA and typically involves considering the outcome of the exact model fit test and various fit indices. If the estimated model's fit is found to be unacceptable, the elements forming the emergent variable probably do not act as a new whole but rather as a loose collection of parts. In that case, researchers should consider the elements individually or modify their models.
To assess overall model fit in CCA, researchers can, in principle, draw on all that is known from CFA and SEM. This includes the chi-square test of exact overall model fit (Jöreskog, 1967). However, because testing for exact fit has been criticized as unrealistic (e.g., Bollen, 1989, Chapter 7), various fit indices have been proposed to gauge model fit. These include the standardized root mean square residual (SRMR; Bentler, 1995), the comparative fit index (CFI; Bentler, 1990), the Tucker-Lewis index (TLI; Tucker & Lewis, 1973), and the root mean square error of approximation (RMSEA; Steiger, 2016). Although existing studies have indicated that fit indices can detect misspecified composite models (Schuberth et al., 2018, 2022), future research still has to reassess their cut-off values for composite models.
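For reference, the following Python sketch implements the conventional (non-robust) maximum likelihood formulas for the RMSEA point estimate, the CFI, and the TLI, computed from the chi-square values of the estimated model and a baseline model. The inputs in the example are hypothetical. Some programs use N rather than N - 1 in the RMSEA, and robust variants rely on scaled statistics and further corrections, so they will not match these formulas exactly.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """CFI = 1 - max(chi2_m - df_m, 0) / max(chi2_m - df_m, chi2_b - df_b, 0)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """TLI = (chi2_b/df_b - chi2_m/df_m) / (chi2_b/df_b - 1)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


# Hypothetical values: model chi2 = 100 (df = 50),
# baseline chi2 = 1000 (df = 60), N = 201
print(round(rmsea(100, 50, 201), 3),
      round(cfi(100, 50, 1000, 60), 3),
      round(tli(100, 50, 1000, 60), 3))  # → 0.071 0.947 0.936
```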
Besides the overall model fit, the parameter estimates should be investigated. In this context, the composite loading and weight estimates are of particular interest. The composite loadings of an emergent variable are the covariances between its elements and that emergent variable. Therefore, they show an element's absolute contribution to the emergent variable (Cenfetelli & Bassellier, 2009). Further, the composite loadings provide information on the orientation of an emergent variable. Specifically, the scaling indicator, i.e., the element whose loading was constrained to 1, determines the orientation of the emergent variable. If the other elements forming the emergent variable turn out to have negative composite loadings, even though they are expected to correlate positively with it, the researcher should either reconsider the choice of scaling variable or fix the loading of the scaling variable to -1 instead of 1 to ensure the correct orientation of the emergent variable. In addition, the magnitude and significance of the composite loading estimates should be assessed, e.g., by considering the outcome of the z-test or confidence intervals. Furthermore, researchers who are interested in the composition of an emergent variable or who want to calculate emergent variable scores should consider the weight estimates. Note that weight estimates are affected by multicollinearity, i.e., correlations among the elements, which can cause an element's composite loading and weight estimates to differ in sign. Finally, researchers should consider criterion validity in terms of concurrent and/or predictive validity (e.g., Piedmont, 2014). This is done by examining the extent to which an emergent variable correlates with a criterion variable.
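The remark that multicollinearity can make weights and composite loadings differ in sign can be checked numerically. For standardized elements with correlation matrix R and weights w, the composite loadings (here, the correlations between the elements and the standardized composite) are Rw / sqrt(w'Rw). The Python sketch below uses a hypothetical pair of highly correlated elements and hypothetical weights.

```python
import math


def composite_loadings(R, w):
    """Composite loadings of standardized elements on a standardized composite.

    R: correlation matrix of the elements; w: weights of the composite.
    Loading of element k = (R w)_k / sqrt(w' R w).
    """
    k = len(w)
    Rw = [sum(R[i][j] * w[j] for j in range(k)) for i in range(k)]
    sd_eta = math.sqrt(sum(w[i] * Rw[i] for i in range(k)))
    return [v / sd_eta for v in Rw]


R = [[1.0, 0.8], [0.8, 1.0]]  # hypothetical, highly correlated elements
w = [1.2, -0.3]               # hypothetical weights; the second is negative
loadings = composite_loadings(R, w)
print([round(x, 3) for x in loadings])  # both loadings are positive
```

Here the second element has a negative weight but a clearly positive composite loading, because its strong correlation with the first element dominates its covariance with the composite.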
Against this background, we emphasize that CCA is not a replacement for CFA. While CFA builds on the common factor model to empirically validate latent variables and their measures, CCA builds on the composite model to empirically validate emergent variables and their elements. Consequently, the two techniques make different assumptions about the nature of the construct and serve different purposes. Their parameter estimates should therefore not be compared, as they have different conceptual meanings.

Illustrative example
In this section, we present an illustrative example from second language learning research to demonstrate the application of CCA following the steps presented in Table 2. Specifically, we consider the Metacognitive Awareness of Reading Strategies Inventory (MARSI), which assesses learners' awareness and use of reading strategies while reading academic texts (Mokhtari & Reichard, 2002). Originally, this inventory consisted of 30 strategy statements, each belonging to one of three strategy classes: (i) global reading strategies (GRS), (ii) problem-solving strategies (PSS), and (iii) support reading strategies (SRS). Because the fit of the common factor solution was not satisfactory, the MARSI was revised into a shortened version, the MARSI-R (Mokhtari et al., 2018), which has five items per construct, each referring to a different reading strategy.
The dataset used in our illustrative example was collected and studied by Ondé et al. (2022) and is publicly available.2 It consists of 548 valid student responses to the MARSI-R and includes a variable measuring self-reported reading level, referred to as READ. The students were enrolled in compulsory secondary education at various educational centers in Barcelona and Madrid (Spain). For more detail on data collection and the sample, we refer the reader to Ondé et al.'s (2022) original study. We conducted our CCA in the statistical programming environment R (R Core Team, 2022) using the lavaan package (Rosseel, 2012, version 0.6-13) and the semTools package (Jorgensen et al., 2022, version 0.5-6).3 The semTools package was used to calculate the confidence intervals of the weight estimates.

Model specification
As explained in the previous section, the use of a strategy class can be considered an emergent variable.Considering the items of the MARSI-R, we argue that the various items determine the use of the three strategy classes, i.e., they define the three constructs instead of measuring them.Consequently, removing an item would most likely alter the meaning of the constructs.Therefore, we employed the composite model and CCA to empirically validate the use of the three strategy classes, namely GRS, PSS, and SRS.The use of each strategy class was modeled as an emergent variable composed of the corresponding five items from the MARSI-R.Additionally, we added the READ variable to assess criterion validity.Figure 2 shows the specified model.To guide practitioners using SEM software with a graphical interface, this figure presents the specification in Amos (Arbuckle, 2020).To specify the model in lavaan, researchers can use the user-written R function specifyHO.

Model identification
To ensure that the parameters are identified, we applied the rules presented in the previous section. As Figure 2 shows, the composite loadings were constrained appropriately, and each item served only once as a scaling indicator. Further, the excrescent variables were correlated only with the excrescent variables of their own block and not with other variables in the model. Finally, each emergent variable was connected to at least one variable other than its elements. In our case, the three emergent variables, i.e., GRS, PSS, and SRS, and the READ variable were allowed to covary.
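The degrees of freedom of this model can be reproduced by counting unique sample variances and covariances and subtracting the free parameters implied by the identification rules. The following Python sketch performs this count; the parameter bookkeeping is our reading of the rules described in the previous section, not code taken from lavaan or specifyHO.

```python
def ho_degrees_of_freedom(block_sizes, n_other_observed):
    """Degrees of freedom of a CCA model under the H-O specification.

    block_sizes: number of elements per emergent variable.
    n_other_observed: observed variables outside the blocks (e.g., READ).
    Assumes element error variances fixed to zero, emergent variables
    uncorrelated with excrescent variables, excrescent variables correlated
    only within their block, and one free loading per excrescent variable.
    """
    p = sum(block_sizes) + n_other_observed
    moments = p * (p + 1) // 2          # unique variances and covariances
    params = 0
    for k in block_sizes:
        params += k - 1                 # free loadings on the emergent variable
        params += k - 1                 # one free loading per excrescent variable
        params += 1                     # variance of the emergent variable
        params += (k - 1) * k // 2      # (co)variances among excrescent variables
    # (co)variances among emergent variables and other observed variables;
    # the emergent-variable variances were already counted above
    m = len(block_sizes) + n_other_observed
    params += m * (m + 1) // 2 - len(block_sizes)
    return moments - params


print(ho_degrees_of_freedom([5, 5, 5], 1))  # → 72
```

For three five-item blocks plus READ, the count gives 136 moments and 64 free parameters, i.e., 72 degrees of freedom, which agrees with the df reported for the chi-square test in the model assessment.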

Model estimation
The items showed a mild degree of non-normality, with skewness ranging from -1.57 to 0.13 and excess kurtosis ranging from -1.34 to 1.40. To account for this non-normality, we estimated the model parameters using the maximum likelihood estimator with robust standard errors and a Satorra-Bentler scaled test statistic (MLM; Satorra & Bentler, 1994), as implemented in the R package lavaan. Model estimation terminated normally.
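Skewness and excess kurtosis values like those reported above can be computed directly from the item responses. The sketch below uses the simple moment-based estimators (g1 and g2) in plain Python. This choice is an assumption: statistical packages differ in whether and how they apply small-sample corrections, so the results may differ slightly from those of the software used in the study. The responses in the usage lines are hypothetical.

```python
from statistics import fmean


def _central_moment(xs, k):
    """k-th central moment with denominator n (population form)."""
    m = fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0


# Hypothetical responses to one 5-point Likert item (not the MARSI-R data).
item = [5, 4, 5, 3, 4, 5, 2, 4, 5, 4]
print(round(skewness(item), 3), round(excess_kurtosis(item), 3))
```

Negative skewness, as for most items here, indicates that responses cluster toward the upper end of the scale.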

Model assessment
To assess the model, we followed our guidelines as given in Table 2. First, we considered the overall model fit. The chi-square test rejected the null hypothesis of exact fit (χ² = 156.97, df = 72, p < 0.01). As a supplement, we considered several approximate fit indices. The SRMR equaled 0.042, indicating a good model fit. Similarly, the robust RMSEA equaled 0.051, with a 90% confidence interval ranging from 0.040 to 0.062, thus also indicating a good model fit. Finally, the robust CFI and TLI equaled 0.934 and 0.891, respectively. Taken together, we regarded the fit of the composite model as acceptable.
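For readers who want to see what these indices summarize, the sketch below implements the conventional (non-robust) maximum likelihood formulas for RMSEA, CFI, and TLI. It is not the computation behind the robust values reported above: the robust versions in lavaan use the scaled test statistic and further corrections that this sketch omits. RMSEA also requires the sample size, and CFI and TLI require the chi-square of the baseline (independence) model, neither of which is reported in this section, so all inputs in the usage lines are hypothetical. Some software uses N rather than N − 1 in the RMSEA denominator.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Point estimate of RMSEA (conventional form with N - 1)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the baseline (independence) model."""
    num = max(chi2 - df, 0.0)
    den = max(chi2_base - df_base, chi2 - df, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)


# Hypothetical inputs: model chi2 = 100 on 50 df, N = 201,
# baseline chi2 = 1000 on 60 df.
print(round(rmsea(100, 50, 201), 4))     # 0.0707
print(round(cfi(100, 50, 1000, 60), 4))  # 0.9468
print(round(tli(100, 50, 1000, 60), 4))  # 0.9362
```

Unlike CFI, TLI is not truncated at 1 and penalizes model complexity through the chi-square/df ratio, which is one reason the two indices can diverge, as they do in the results above.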
Table 3 shows the standardized weight estimates, including their 95% confidence intervals. As this table demonstrates, all standardized weights were positive, i.e., all elements contributed positively to forming their corresponding construct. None of the confidence intervals of the standardized weights contained zero except that of PSS1, which indicates that PSS1 did not contribute significantly to PSS. Following the guidelines in Table 2, we therefore inspected the estimated standardized composite loadings in the next step. The standardized composite loading of PSS1 on PSS was both sizable and significant. Thus, we decided to keep PSS1 so as not to risk altering the meaning of the emergent variable PSS (Benitez et al., 2020). Similarly, all other elements showed a positive and significant composite loading on their corresponding emergent variable. In addition, the correlations among the three emergent variables GRS, PSS, and SRS were within a reasonable range: r (PSS, GRS) = 0.571 (95% CI [0.509, 0.633]), r (GRS, SRS) = 0.576 (95% CI [0.513, 0.639]), and r (PSS, SRS) = 0.568 (95% CI [0.504, 0.632]).
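The PSS1 case, a non-significant weight together with a sizable composite loading, follows from how the two quantities are defined. For standardized elements with correlation matrix R and weights w, the emergent variable C = w'x is scaled to unit variance by dividing by sqrt(w'Rw). The standardized weights are then w / sqrt(w'Rw), whereas the composite loadings are the correlations between each element and C, i.e., Rw / sqrt(w'Rw). The minimal sketch below uses a hypothetical correlation matrix (not the MARSI-R data) and a hypothetical function name. It shows that an element with zero weight can still have a sizable composite loading when it correlates strongly with the other elements of its block.

```python
from math import sqrt


def composite_weights_and_loadings(R, w):
    """Standardized weights and composite loadings of C = w'x.

    R: correlation matrix of the standardized elements (list of rows).
    w: raw weights on any scale; they are rescaled so that Var(C) = 1.
    """
    Rw = [sum(r_ij * w_j for r_ij, w_j in zip(row, w)) for row in R]
    var_c = sum(w_i * rw_i for w_i, rw_i in zip(w, Rw))  # w'Rw
    sd_c = sqrt(var_c)
    w_std = [w_i / sd_c for w_i in w]
    loadings = [rw_i / sd_c for rw_i in Rw]  # corr(x_j, C)
    return w_std, loadings


# Hypothetical block of three elements; element 1 correlates 0.7 with the others.
R = [[1.0, 0.7, 0.7],
     [0.7, 1.0, 0.5],
     [0.7, 0.5, 1.0]]
w = [0.0, 1.0, 1.0]  # element 1 receives no weight at all
w_std, loadings = composite_weights_and_loadings(R, w)
print([round(v, 3) for v in w_std])     # [0.0, 0.577, 0.577]
print([round(v, 3) for v in loadings])  # [0.808, 0.866, 0.866]
```

The weight expresses an element's unique contribution given the other elements, while the composite loading expresses its overall association with the emergent variable; this is why the loadings were consulted before deciding whether to drop PSS1.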
Finally, we examined the criterion validity of the emergent variables by considering the extent to which GRS, PSS, and SRS correlated with students' self-perception of their reading level (READ). Theoretically, this measure was expected to correlate positively with GRS, PSS, and SRS (e.g., Mokhtari et al., 2018). The correlations between READ and GRS, PSS, and SRS were all positive and significant: r (READ, GRS) = 0.337 (95% CI [0.257, […] 95% CI [0.164, 0.330]). Consequently, we found no violation of criterion validity. Overall, our results are in line with our hypothesis that GRS, PSS, and SRS behave as emergent variables.
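The criterion-validity check, like the inspection of the standardized weights, reduces to a simple decision rule: an estimate supports the theoretical expectation if it has the expected sign and its 95% confidence interval excludes zero. The helper below is a hypothetical illustration of that rule, not part of the authors' code. It takes the interval as given, whatever method produced it, and is applied for illustration to the reported correlation between PSS and GRS and to a hypothetical interval that contains zero.

```python
def supports_expected_direction(estimate, ci_lower, ci_upper, expected_sign=1):
    """Return True if the estimate has the expected sign and its CI excludes zero.

    expected_sign: +1 for an expected positive relationship, -1 for a negative one.
    """
    if not ci_lower <= estimate <= ci_upper:
        raise ValueError("estimate must lie within its confidence interval")
    if expected_sign > 0:
        return ci_lower > 0  # whole interval above zero
    return ci_upper < 0      # whole interval below zero


# Reported correlation between PSS and GRS.
print(supports_expected_direction(0.571, 0.509, 0.633))   # True
# Hypothetical correlation whose interval contains zero.
print(supports_expected_direction(0.080, -0.020, 0.180))  # False
```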

Discussion
Researchers in the L2 and education domain frequently use questionnaires and inventories to collect data about their constructs of interest (Dörnyei & Dewaele, 2022). To validate such tools, L2 and education researchers regularly rely on CFA (or, more recently, ESEM), which is based on the common factor model (Alamer, 2022; Alamer et al., 2023; Marsh & Alamer, 2024). CFA and the common factor model have proven useful for empirically validating questionnaires intended to measure latent variables. However, as explained in this paper, this approach is of limited use for empirically validating inventories whose items make up the constructs, so-called emergent variables. This is because CFA assesses the factorial structure implied by the existence of a latent variable and thereby ignores an important characteristic of emergent variables: they are not measured but composed of their constituting elements. A more suitable method for assessing emergent variables is CCA, which is based on the composite model and which our study has introduced into the education and language learning domains.
Note (Table 3): λ std = standardized composite loadings; w std = standardized composite weights.
To demonstrate the application of CCA, we used an illustrative example. Specifically, we used a publicly available dataset and considered the MARSI-R, an inventory designed to evaluate the perceived use of three reading strategy classes, i.e., global reading strategies, problem-solving reading strategies, and support reading strategies, in L2 learning. Each of the 15 items captures the use of a specific strategy from one of the three classes, i.e., each item is unique and not interchangeable. Therefore, in contrast to previous studies, we argue that the use of each strategy class is determined, not measured, by its items. Consequently, we modeled the use of the three strategy classes by means of the composite model, which we assessed via CCA. Our results show that the model fit indices were within an acceptable range. Further, all composite loading estimates were both positive and significant, indicating that each strategy contributes in absolute terms to the use of its strategy class (Cenfetelli & Bassellier, 2009). Similarly, all weights were positive and all but one were significant, showing that, with one exception, each strategy makes a unique contribution to the use of the strategy class to which it is assigned.
To perform our analysis, we mainly used the R package lavaan (Rosseel, 2012) and complemented it with semTools (Jorgensen et al., 2022). We deliberately opted for R and its packages because they are widely used and available free of charge. In addition, lavaan allows users to define new parameters, which is essential for obtaining the (standardized) weight estimates. For this purpose, we developed the user-written function specifyHO for the readers of this paper; it can be used freely. Further, recent lavaan versions (0.6.13 and above) show good convergence behavior compared with other SEM software packages. However, CCA can also be conducted using commercial SEM software such as Amos (Arbuckle, 2020), as our visual representations of the composite model illustrate. For software tutorials on CCA, we refer the reader to www.confirmatorycompositeanalysis.com.
Finally, researchers may be tempted to compare CCA and CFA results. However, the two techniques serve different purposes, so the decision whether to use CFA or CCA should be based on theoretical arguments. Because of these conceptual differences, researchers should also not compare parameter estimates across the two techniques, for example, composite loadings with factor loadings, as these estimates have different meanings.

Extensions to CCA
Although we used CCA in our study, various extensions are possible. For instance, latent variables can be included in the analysis, i.e., a CCA and a CFA can be conducted jointly. This is known as confirmatory composite factor analysis (CCFA; Hubona et al., 2021), which can be particularly valuable for researchers who study latent and emergent variables simultaneously and who want to follow the two-step approach known from SEM (Anderson & Gerbing, 1988). Specifically, in the first step, a CCFA is conducted to assess the composite and common factor models; in the second step, the emergent and latent variables are embedded in a structural model together with their items. For example, past research has shown that motivation has an impact on the use of GRS, PSS, and SRS (e.g., Alamer & Alsagoafi, 2023). To analyze such a situation, a CCFA can first be conducted to assess the composite and common factor models used to model the four constructs. As shown in Figure 3, motivation is modeled as a latent variable, while the use of each strategy class is modeled as an emergent variable. If no evidence against the validity of the composite and common factor models becomes apparent, a second step follows in which the substantive theory is assessed, i.e., the emergent and latent variables are embedded in a structural model, as shown in Figure 4.
Finally, various inventories for evaluating the use of strategies have been empirically validated using CFA. Because the results were often unsatisfactory, ad hoc modifications were applied, e.g., removing items or allowing multiple measurement errors to correlate. Such modifications are often not theoretically justifiable. For instance, the MARSI, which originally consisted of 30 items, was reduced to the 15-item MARSI-R (Mokhtari et al., 2018) following data-driven ad hoc modifications. As a result, important strategies might have been sacrificed. Therefore, we suggest that future research re-evaluate such inventories using CCA (e.g., Alamer & Alsagoafi, 2023).

Conclusion
In the past, L2 researchers have mainly used CFA to assess their inventories, including those that evaluate emergent variables, i.e., constructs that are composed of elements/parts. As we have explained, CFA should not be the method of choice for emergent variables because it is based on the common factor model, which does not align with the definition of emergent variables. The characteristics of emergent variables are captured more accurately by the composite model, and, as this paper proposes, CCA can be used to assess composite models.
Originally, PLS-PM and approaches to generalized canonical correlation analysis were proposed for model parameter estimation in CCA (Henseler et al., 2014; Schuberth et al., 2018). However, researchers faced various limitations: for example, it is not possible to impose parameter constraints or to deal with missing values using the full-information maximum likelihood (FIML) method, and there is only limited access to well-known fit indices (Schuberth et al., 2022). To overcome such limitations, in this study, we relied on the recently proposed H-O specification, which allows researchers to conduct CCA using conventional SEM software such as Amos and Mplus (Schuberth, 2023; Yu et al., 2023). In this way, researchers conducting a CCA gain all the benefits they are accustomed to when using SEM with latent variables. An apparent disadvantage of the H-O specification is that the weight estimates are not obtained by default, because the relationships between the emergent and excrescent variables and their elements are expressed by composite loadings. However, as Schuberth (2023) and Yu et al. (2023) showed, the weight estimates can be retrieved from the inverted composite loading matrix. Moreover, the user-written function specifyHO makes it easy to obtain the weight estimates automatically. We have demonstrated the use of CCA by means of an illustrative example. Specifically, we considered the MARSI-R, an inventory for evaluating the use of different reading strategies in reading academic texts. To facilitate the application of CCA, we used the open-source software R (R Core Team, 2022) and a publicly available dataset (Ondé et al., 2022). The R code used for the analysis, including our user-written R function specifyHO, is freely available. In addition, we have presented model illustrations using Amos to show readers how to specify models in software with a graphical interface. In this way, we hope that future research will benefit from CCA to the greatest extent possible and that researchers will consider revisiting the validity of their inventories by applying CCA.
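The retrieval of weights from the inverted composite loading matrix can be illustrated in a few lines. In the H-O specification, each block of J elements x is expressed as x = Λη, where η stacks the emergent variable and the J − 1 excrescent variables and Λ is the square J × J composite loading matrix of that block; hence η = Λ⁻¹x, and the first row of Λ⁻¹ contains the weights of the emergent variable. The plain-Python sketch below, with hypothetical helper names, builds Λ from chosen weights and arbitrary (hypothetical) excrescent definitions and then recovers the weights by inversion. It is a numerical illustration only, not the implementation in specifyHO.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weights_from_loadings(loading_matrix):
    """Weights of the emergent variable = first row of the inverted loading matrix.

    Column 0 of loading_matrix belongs to the emergent variable,
    columns 1..J-1 to the excrescent variables of the same block.
    """
    return invert(loading_matrix)[0]


# Hypothetical block of three elements.
w = [0.5, 0.3, 0.4]                  # weights of the emergent variable
A = [w, [1, -1, 0], [0, 1, -1]]      # rows 2-3: arbitrary excrescent definitions
Lambda = invert(A)                   # x = Lambda @ eta, because eta = A @ x
print([round(v, 3) for v in weights_from_loadings(Lambda)])  # [0.5, 0.3, 0.4]
```

Only the first row of Λ⁻¹ carries substantive meaning; the rows for the excrescent variables depend on how those variables were defined.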

Figure 1. Example of an H-O specification to conduct CCA. Note: ex = excrescent variable.

Figure 2. The CCA model specification used in the illustrative example. Note: The figure shows the model specification in Amos.

Figure 3. An example of a CCFA.

Figure 4. A structural model containing both emergent and latent variables.

Table 1. Characteristics of the common factor model and the composite model

Table 2. Steps to conduct CCA