In the 1970s, Ronald Inglehart (Reference Inglehart1971) began to develop his version of modernization theory, focusing on long-standing changes in values and behavioral attitudes caused by unprecedented growth of economic well-being and existential security in the world's most developed societies after WWII. In The Silent Revolution, Inglehart (Reference Inglehart1977) described in detail a major intergenerational shift in the values of the Western public: from materialist orientations, emphasizing economic welfare and physical security, to what he called “postmaterialist” orientations, emphasizing the importance of noneconomic aspects of human life, such as personal autonomy and self-expression.
Inglehart's initial findings were based on survey data for the Western world. The initiation and development of large-N cross-national survey programs such as the World Values Survey (WVS) allowed Inglehart to test his version of modernization theory globally, using data representing most of the world population. The logic of value change, outlined in The Silent Revolution, was then refined by Inglehart and his coauthors (Inglehart Reference Inglehart1990; Abramson and Inglehart Reference Abramson and Inglehart1995; Inglehart Reference Inglehart1997; Inglehart and Baker Reference Inglehart and Baker2000; Inglehart and Norris Reference Inglehart and Norris2003; Inglehart and Welzel Reference Inglehart and Welzel2005). Several important accomplishments in comparative politics, political sociology, and social psychology are based on the framework of modernization theory and its most advanced reformulation, the “evolutionary theory of emancipation” (Welzel Reference Welzel2013).
It has been shown that, in a cross-national perspective, value changes towards postmaterialist, or emancipative, orientations are associated with an increase in support for democracy (Inglehart and Welzel Reference Inglehart and Welzel2003; Welzel, Inglehart, and Klingemann Reference Welzel, Inglehart and Klingemann2003; Welzel and Inglehart Reference Welzel and Inglehart2006), tolerance of minorities and other out-groups (Andersen and Fetner Reference Andersen and Fetner2008), and gender equality (Inglehart and Norris Reference Inglehart and Norris2003; Bergh Reference Bergh2007; Alexander and Welzel Reference Alexander and Welzel2011). Self-expression and emancipative value orientations have been shown to foster interpersonal trust (Welzel Reference Welzel2010) and lead to a decline in violence, both domestic (Welzel Reference Welzel2010; Welzel and Deutsch Reference Welzel and Deutsch2012) and international (Inglehart, Puranen, and Welzel Reference Inglehart, Puranen and Welzel2015). These value changes also contribute to democratization (Welzel Reference Welzel2006, Reference Welzel2007; Inglehart and Welzel Reference Inglehart and Welzel2010) and secularization (Inglehart and Appel Reference Inglehart and Appel1989; Inglehart and Norris Reference Inglehart and Norris2004) across the world.
As with any theory dealing with people's attitudes, the validity of modernization-emancipation theory depends to a large extent on the measurement validity of the survey instrument used to operationalize the theory's crucial component, the concept of values. A short version of the index of postmaterialism, the original measure of value priorities used by Inglehart, was formed with only four survey items. The Index of Emancipative Values (EVI), the most recent measure derived from modernization-emancipation theory, includes twelve items, combined into four first-order constructs and one second-order construct (see Figure 1). However, both of these measures, as well as the indices of survival/self-expression and traditional/secular-rational values (Inglehart and Baker Reference Inglehart and Baker2000), which represent an intermediate stage in the evolution of the “modernization-emancipation family”Footnote 1 of value indices, share many features: they are defined by overlapping sets of survey items and rest on common theoretical assumptions.

FIGURE 1. Measurement model for the EVI
In particular, modernization-emancipation theory assumes that cross-cultural differences in value priorities reflect different socioeconomic trajectories and other forms of historical path dependency but not differences in the meaning of the values on which cultures differ (Welzel Reference Welzel2013, 41–3). Importantly, this premise assumes that the same survey items can be used to measure values in virtually all countries in the world and that country means aggregated from those items are largely comparable on a worldwide scale.
To ensure that people in various countries rank their value priorities on the same scale and that their composite value scores are therefore comparable across countries, the measurement model for that scale should be invariant (i.e., apply in different cultural and geographical contexts). In statistical terms, this means that all model parameters should be (approximately) the same in each possible subsample (i.e., country). This assumption is often called the assumption of measurement invariance/equivalence (Steenkamp and Baumgartner Reference Steenkamp and Baumgartner1998; Davidov et al. Reference Davidov, Meuleman, Cieciuch, Schmidt and Billiet2014). If the invariance assumption does not hold, then it means that people in different countries interpret the relevant survey questions and respond to them differently. In that case, cross-national comparisons and regression analyses using noninvariant value scales are invalid, because such scales confuse true differences in individual values and country-specific response biases (Stegmueller Reference Stegmueller2011).
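As an illustration, the invariance requirement can be written in the standard multigroup factor-analytic form. The symbols $\nu_g$ (item intercepts) and $\Lambda_g$ (factor loadings) are notation introduced here for the sketch and are not taken from the article's own equations; $y_{ig}$ is the vector of observed item responses of respondent $i$ in group (country) $g$, and $\eta_{ig}$ is the vector of latent value scores.

```latex
\begin{align*}
  y_{ig} &= \nu_g + \Lambda_g \,\eta_{ig} + \varepsilon_{ig}
    && \text{(group-specific measurement model)} \\[4pt]
  \text{configural:}\quad & \Lambda_g \text{ has the same pattern of zero and nonzero loadings in every } g \\
  \text{metric:}\quad & \Lambda_g = \Lambda \ \text{ for all } g \\
  \text{scalar:}\quad & \Lambda_g = \Lambda \ \text{ and } \ \nu_g = \nu \ \text{ for all } g
\end{align*}
```

Under this conventional hierarchy, each level adds constraints to the previous one, and comparisons of latent means across countries are usually taken to require the scalar level, at least approximately.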
The assumption of invariance has rarely been tested statistically for any index from the modernization-emancipation family. To some extent, this is because 40 years ago, when the postmaterialism index was introduced, the importance of the invariance assumption was not yet widely recognized in comparative survey research. In addition, proper statistical tools for invariance testing had not yet been developed. The few papers that did check for invariance of indices belonging to the modernization-emancipation family, however, reveal a lack of cross-national comparability for these indices (e.g., MacIntosh Reference MacIntosh1998; Davis, Dowley, and Silver Reference Davis, Dowley and Silver1999; Ippel, Gelissen, and Moors Reference Ippel, Gelissen and Moors2014; but see Sacchi Reference Sacchi1998).
In particular, Alemán and Woods (Reference Alemán and Woods2016) recently tested for invariance of (1) traditional–secular-rational and survival–self-expression values and (2) secular and emancipative values across the cultural zones defined in Welzel (Reference Welzel2013), using all available data from the World Values Survey from 1981 to 2014. They concluded that “For the most part, WVS orientations are not [. . .] comparable cross-nationally, except among a small number of Western post-industrial societies.” They also admitted that “Welzel may be getting closer to formulating a set of concepts that are equivalent and coherent across the world” but then added that “The empirical properties of these value orientations differ in important ways from the ones he [Welzel] and Inglehart have proposed” (ibid., 1059).
It is, however, worth noting that Alemán and Woods used a slightly modified version of the EVI: it included only two of the original four first-order latent variables, omitted the second-order factor, and used only eleven of the twelve items.Footnote 2 In addition, they conducted an invariance test across only four of the ten cultural zones defined by Welzel, three of which were each represented by a single country.
The goal of the present article is to extend this line of comparative attitudinal research and provide a full-scale investigation of the measurement validity of the EVI, the most recent and complex representative of what I have called the “modernization-emancipation family” of value indices. In this respect, the main contribution of the article is threefold. First, it demonstrates that the original measurement model for the EVI based on the pooled WVS sample has multiple misspecifications, resulting in a relatively poor fit. Furthermore, the results of tests for measurement invariance of emancipative values show that the EVI is not even configurally invariant across the ten cultural zones (as defined in Welzel Reference Welzel2013) and therefore across WVS countries. These two findings suggest that the EVI is a biased estimator of cross-national differences in prevalent value patterns. However, for one subcomponent of the EVI, known as “pro-choice values” or “choice,” cross-zone and even cross-national invariance can be established using a less restrictive approximate Bayesian approach. Hence, this value subdimension (which reflects how permissible people in different countries find sexual self-determination in such matters as homosexuality, abortion, and divorce) can serve as a reliable benchmark for cross-national comparisons of the prevalence of a mass-level desire for emancipation.
The paper also contributes to the general discussion on definition and measurement of emancipative values and broadly of any multi-item attitudinal construct in comparative research. In a recent paper, Welzel and Inglehart (Reference Welzel and Inglehart2016) reject the latent-variable (or reflective) interpretation of their value measures, including the EVI, and claim all such measures are formative constructs, so the issue of noninvariance, while being essential for reflective constructs, is not critical for the validity of the indices from the modernization-emancipation family. They also argue that their indices are intended to capture societal-level change in prevalent patterns of values and assert that evidence of a construct's measurement noninvariance at the individual level has nothing to do with the same construct's validity at the country level. Finally, they claim that the exceptionally strong external linkages of their value measures are an excellent indication of those measures’ validity.
I challenge these claims. First, I show that the strong aggregate-level associations of particular components of the EVI with each other and with other variables of interest can be partially attributed to measurement error. I also propose a thought experiment challenging Welzel and Inglehart's formative interpretation of the EVI and show that the index seems to poorly fit the validity criteria for formative measures. Using another thought experiment and the case of the effective democracy measure (Alexander, Inglehart, and Welzel Reference Alexander, Inglehart and Welzel2012) as an empirical example, I then demonstrate that the EVI mixes up different value dimensions and their actual associations with political outcomes.
In the section that follows, I introduce the concept of emancipative values; assess the internal validity of its measure, the EVI; examine whether the EVI meets the established methodological standards of cross-cultural comparability; and highlight three arguments used by Welzel and Inglehart to defend their value measures. In the next section, I show that these arguments are not applicable in the case of emancipative values. I conclude by specifying several suggestions on how the measurement of value orientations from the modernization-emancipation family can be improved and emphasizing the general importance of proper construct development and measurement for comparative political research.
ASSESSMENT OF THE ORIGINAL MEASUREMENT MODEL FOR EMANCIPATIVE VALUES
Emancipative Values: Definition and Measurement
Modernization-emancipation theory states that if societies develop progressively, then their course follows a two-stage pathway: a transition from agrarian to industrial societies and from industrial to knowledge societies. The first transition leads to an increasing bureaucratization of society, while the second transition comes with increasing individualization (Inglehart and Welzel Reference Inglehart and Welzel2005; Welzel Reference Welzel2013, 58). Individualization here means an increase in people's desire for emancipation or “an existence free from domination” (Welzel Reference Welzel2013, 2).
The EVI is a further development of a widely used index of self-expression values (Inglehart and Baker Reference Inglehart and Baker2000). It is designed to measure “how strongly people claim authority over their lives for themselves” and how much equality they concede in this matter to everyone else (Welzel Reference Welzel2013, 59). The measurement model behind the concept of emancipative values may be represented as a hierarchical model (Figure 1), in which emancipative values are a second-order construct related to four first-order constructs, which in turn affect observed indicator variables. Welzel calls these first-order constructs “personal autonomy,” “reproductive choice,” “gender equality,” and “people's voice,” or, in short, “autonomy,” “choice,” “equality,” and “voice.”
To measure people's emphasis on “autonomy,” Welzel uses three WVS items showing whether respondents consider (a) independence and (b) imagination as desirable qualities in children while not considering (c) obedience to be such a quality. To measure how strongly people value “choice,” again three items are used, indicating how acceptable respondents find (a) divorce, (b) abortion, and (c) homosexuality. A respondent's emphasis on “equality” is measured with another three items indicating how strongly people disagree with the statements that (a) “education is more important for a boy than a girl,” (b) “when jobs are scarce, men should have priority over women to get a job,” and (c) “men make better political leaders than women.” Finally, to measure how strongly respondents value the “voice” of the people as a source of influence in their society, the following three items are used, indicating whether respondents assign first, second, or no priority to the goals of (a) “protecting freedom of speech,” (b) “giving people more say in important government decisions,” and (c) “giving people more say about how things are done at their jobs and in their communities” (Welzel Reference Welzel2013, 67).
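The item-to-component structure described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the item keys are hypothetical labels rather than WVS variable codes, every item is assumed to be already rescaled to the 0-1 range with higher values meaning a more emancipative response, and components and index are formed by simple unweighted averaging. None of these scoring details is specified in the text above.

```python
from statistics import fmean

# Hypothetical item labels (not WVS variable codes), grouped as in the text.
COMPONENTS = {
    "autonomy": ["independence", "imagination", "obedience_not_valued"],
    "choice": ["divorce", "abortion", "homosexuality"],
    "equality": ["education_equal", "jobs_equal", "politics_equal"],
    "voice": ["free_speech", "say_in_government", "say_in_community"],
}

def component_scores(items):
    """Average the three items of each component (assumes 0-1 item scores)."""
    return {
        name: fmean([items[key] for key in keys])
        for name, keys in COMPONENTS.items()
    }

def composite_index(items):
    """Unweighted mean of the four component scores (an assumed rule)."""
    return fmean(list(component_scores(items).values()))

# A made-up respondent: fully permissive on "choice", zero on everything else.
respondent = {key: 0.0 for keys in COMPONENTS.values() for key in keys}
respondent.update({"divorce": 1.0, "abortion": 1.0, "homosexuality": 1.0})

scores = component_scores(respondent)
index = composite_index(respondent)
print(scores, index)
```

The example respondent shows why the composition matters: a high composite score can be reached through very different component profiles, which is the kind of ambiguity the later discussion of formative versus reflective measurement turns on.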
Internal Validity and Comparability of Emancipative Values
Welzel does not justify his index based on its internal consistency. Still, he reports the results of a hierarchical (second-order) exploratory factor analysis of the twelve items defining the EVI using the country-pooled individual-level data of 95 countries surveyed at least once by the WVS/EVS, based on the latest available survey from each country (from the period of 1995–2005). Factor loadings produced by that procedure are documented in Welzel (Reference Welzel2013, 71, Table 2.3) and are quite high both at the first level and at the second level (almost all >0.7).
I replicate his analysis using the data of all 99 countries surveyed during the third to sixth rounds of the World Values Survey (1995–2014) using confirmatory factor analysis (CFA). I also conduct tests for measurement invariance of emancipative values. Given that my results in these respects are similar to Alemán and Woods's findings and because of space limitations, I provide a detailed description of my analysis, as well as a discussion of the relevant methodological issues, in the supplementary material (Appendices A and B). Here I briefly summarize the key findings.
First, I find that, according to the standard CFA goodness-of-fit criteria, Welzel's original measurement model for the EVI is not decidedly unacceptable but still obviously misspecified in some way (see Appendix A). In particular, several items do not discriminate well between the first-order factors and may simply not be useful for measuring the overall construct. Also, there is evidence of multidimensionality of the latent trait behind the index.
I then proceed with tests for cross-cultural comparability of emancipative values. I follow the empirical strategy of Alemán and Woods and employ multigroup CFA (MGCFA) for measurement invariance testing but extend the spatial coverage of their analysisFootnote 3 and use more appropriate CFA techniques.Footnote 4 Nevertheless, I come to the same substantive conclusion: The EVI does not satisfy even the weakest requirement necessary for cross-national comparisons, which is known as configural invariance.Footnote 5
Table 1 illustrates the point: It clearly shows that the factor loading patterns are inconsistent across Welzel's ten cultural zones. Several items in certain zones were excluded from the analysis to achieve model convergence. In each zone, at least one (typically more) item has a factor loading that is either insignificant or critically low (<0.3)Footnote 6 or even has a negative sign. This unambiguously suggests that Welzel's original twelve-item measure of emancipative values is not universally applicable in different cultural contexts.
TABLE 1. Group-Specific CFAs of 12 Variables from the Sixth Wave of the WVS for Ten Cultural Zones (2010–2014)

Notes: Entries are standardized factor loadings. All estimates are significant at the 0.05 level (except those marked as n.s. = nonsignificant). Loadings in bold are those lower than 0.30. Negative loadings are in italic. Variable intercepts, thresholds, and variances are not shown. Models were estimated in Mplus version 7.11. National samples were weighted to equal size (N = 1,500). Because nine of the twelve observed indicators are ordered categorical variables, the WLSMV estimator was used for parameter estimation. Pairwise present analysis was used to deal with missing values. N = number of observations used. CFI = Comparative Fit Index. TLI = Tucker-Lewis Index. RMSEA = Root Mean Square Error of Approximation.
At the same time, I find that one component of emancipative values, “choice,” has indicators with high loadings across all cultural zones. I examine whether the more demanding assumptions required for comparability (known as metric and scalar invariance) hold for this value dimension. For this purpose, I use a recently introduced method of invariance testing known as the approximate Bayesian approach. This method is less restrictive than the classical (strict) approach to invariance testing but still allows for valid comparability testsFootnote 7 (Van de Schoot et al. Reference Van De Schoot, Kluytmans, Tummers, Lugtig, Hox and Muthén2013; Muthén and Asparouhov Reference Muthén and Asparouhov2013).
Applying this approach yields an encouraging result: For “choice,” the stronger forms of [approximate] invariance hold not only across ten cultural zones but also across all WVS countries. This suggests that “choice” is a reliable indicator of cross-national differences related to the modernization-emancipation process.Footnote 8
Unfortunately, “choice” is the only comparable component of the EVI in the WVS data. All other index components, as well as the second-order construct itself, fail to satisfy even the weakest requirement of configural invariance. Strictly speaking, this means that the EVI, in its current version, does not measure the same latent value dimension(s) in different cultural zones and, consequently, in different countries. Therefore, it should not be used for cross-national comparisons and statistical analyses.
Why Do Welzel and Inglehart Think the Results Above Are Not as Relevant as Their Critics Say?
The finding of noninvariance is not very surprising. Inglehart and Welzel themselves have pointed out explicitly a number of times “the variable and [. . .] weak inter-item coherence of [. . .] value constructs at the individual level within countries” (Inglehart and Welzel Reference Inglehart and Welzel2005, 231–44; Welzel Reference Welzel2013, 74–9, 110–2). They, however, do not think that the EVI's lack of invariance poses serious problems to substantive inferences involving emancipative values as an explanatory variable. In a recent article, Welzel and Inglehart (Reference Welzel and Inglehart2016) directly respond to criticisms of their value measures based on the noninvariance issue.
First, Welzel and Inglehart state that despite the fact that the data they “use to create [. . .] value constructs are collected from individuals, the purpose of these constructs is not to measure internally convergent personality traits.” Instead, they “aim to capture value configurations that emerge first and foremost—and sometimes solely—at the group level, the place where culture takes shape” (ibid., 1070–1). In other words, what matters for them is societal prevalence of different types of values, not individual value priorities.
They also note that the strength of associations between different components of emancipative values dramatically increases at the aggregate level compared to the individual level, referring to that regularity as the “micro-macro puzzle.” They explain this finding with the impact of measurement error that partially obscures true correlations at the individual level. Moreover, Welzel and Inglehart claim that aggregate-level correlations are more reliable since they are free of individual-level measurement error (which is removed by aggregation). Their overall conclusion is that “the strength of this aggregate-level association is in no way invalidated by its weakness at the individual level” (ibid., 1072) and that one cannot assess the measurement quality of an aggregate-level construct at the individual level, even if the aggregation emerges from individual-level data.
Welzel and Inglehart go on to argue that the EVI is designed according to combinatory (the term “formative” is more popular in the methods literature) principles of index construction, not to the dimensional (the equivalent term “reflective” is more popular in the methods literature) ones predominant in contemporary social sciences. The main difference between the two is that the reflective logic of measurement assumes that the latent construct causes variation in the observed indicators (and the latter are interchangeable manifestations of that construct), while the formative logic assumes that the observed indicators define the construct and requires constituent components to be recognizably different, not interchangeable. According to formative principles, the construct is simply an algebraic function of its constituent items. Welzel and Inglehart state that the EVI, as a formative measure, is designed to measure overall performances across a theoretically defined field of emancipation, and the dimensionality of the partial performances simply does not enter into consideration.
If so, then it is inappropriate to assess the validity of measures designed according to combinatory principles against the standards of the dimensional approach (including requirement of invariance). What can be used to assess the validity of a formative measure? According to Welzel and Inglehart, the most important criterion to judge the measurement quality of a combinatory construct is its “predictive power over other aspects of reality” (ibid., 1075). Welzel and Inglehart note that their “value constructs associate at exceptional strength across nations and over time with several dozen key indicators of (1) socioeconomic development, (2) cultural legacies, and (3) institutional performance[. . .] The explanatory power of the value constructs over these aspects of social reality ranges from 60% to 80%, across some 100 countries representing more than 90% of the world population” (ibid.). Therefore, these constructs “tap something very real” and represent fully valid measures.
In the following section, I show why appeals to (1) the presumed independence of the validity of macro-level constructs from the validity of individual-level constructs, (2) the standards of formative logic, and (3) high explanatory power of emancipative values and their strong linkages to various structural variables are not sufficient to justify the current version of the measurement model for the EVI.
CHALLENGING WELZEL AND INGLEHART'S DEFENSE
Does Macro-Level Construct Validity Not Depend on Micro-Level Measurement?
Two principal objections can be raised against the first pillar of Welzel and Inglehart's defensive argument. The first objection accentuates the contradiction between (a) Welzel and Inglehart's statement that what matters for their theories are value configurations that exist and operate at the macro level and (b) their own description of causal mechanisms linking mass-level value change and its presumed political effects. Let me use a putative link between values (or, broadly speaking, political culture) and democracy, which has been advanced in dozens of papers by Inglehart, Welzel et al., to substantiate this point.
Most political scientists can agree that institutional changes, such as democratization, typically spring from the actions of (groups of) individuals. The latter, however, are partially driven by macro-level factors (e.g., economic depression, income inequality, ethnic or religious discrimination, foreign intervention) that create incentives and redistribute resources for political action. Welzel and Inglehart recognize this basic macro-micro-macro sequence. In a nutshell, their general argument, which appears in their papers again and again in modified forms, states that the principal cause of value change and therefore political evolution is economic development, which relaxes existential constraints on people's actions, facilitates satisfaction of basic needs, turns the nature of everyday life from a source of pressures into a source of opportunities, and, hence, increases the utility and value of universal freedoms.
However, the growth of well-being and security from survival threats does not affect political institutions directly. As Inglehart and Welzel (Reference Inglehart and Welzel2010, 551) state in a recent paper, “simply reaching a given level of economic development could not itself produce democracy; it can do so only by bringing changes in how people act.” A liberating impulse of improving life conditions and expanding action resources travels first from the macro to the micro level, changes relevant individual behavioral dispositions (values and regime preferences), and only then goes back to the macro level, where it manifests itself through revolutionary or peaceful democratizations.
This pattern clearly suggests that values are important determinants of the individual actions that trigger political change or, to put it simply, that individual values matter. Since the mechanism is supposed to be universal, in order to be able to test it, a researcher should observe comparable value dimensions in different countries.Footnote 9 If she does not, as in the case with configural noninvariance of respective measures, it implies that either (1) the whole theory is wrong or (2) the way value orientations are measured is flawed (Seligson Reference Seligson2002).
My second objection underscores the danger of ignoring measurement error in analyses of international survey data. It is commonly acknowledged in contemporary social sciences that measurement error is an inevitable evil within survey research (Alwin Reference Alwin2007; Adcock and Collier Reference Adcock and Collier2001). Various factors may cause observed individual responses to be different from true individual preferences or attitudes. Welzel and Inglehart recognize this danger but argue (e.g., Inglehart and Welzel Reference Inglehart and Welzel2005, 231–44; Welzel and Inglehart Reference Welzel and Inglehart2016, 1071–2) that the errors in survey data are distributed randomly, so aggregation mostly cancels them out. This argument may be correct when one deals with individual-level errors. However, comparative researchers face another, potentially much more dangerous source of measurement error, arising exclusively at the level of society.
Let me illustrate the problem by introducing a very simple formal notation, borrowed from multigroup structural equation modeling. In the general context, the relationship between the unobserved preference $\eta_{ij}$ of individual $i$ regarding issue $j$ and the observed response $y_{ij}$ to the corresponding survey item $j$ can be described as follows:
$$y_{ij} = \eta_{ij} + \varepsilon_{ij} \qquad (1)$$
where $\varepsilon_{ij}$ represents nonsystematic individual-level error.Footnote 10
When individuals are clustered within groups (e.g., countries), not only individual-level but also group-level factors can distort the correspondence between the latent preference and the observed response. Such factors, which are often called “method effects” (Stegmueller Reference Stegmueller2011), include inaccurate translations of survey questions; differences in sampling procedures, response rates, and survey modes across studied countries; systematic differences in response styles; and culturally specific understandings of certain measured concepts (Davidov et al. Reference Davidov, Meuleman, Cieciuch, Schmidt and Billiet2014, 56; Przeworski and Teune Reference Przeworski and Teune1966). To capture potential bias due to macro-level method effects, Equation (1) should be rewritten as
$$y_{ijg} = \eta_{ijg} + u_{jg} + \varepsilon_{ijg} \qquad (2)$$
where $g$ is the group index and $u_{jg}$ represents the group-level component of the overall measurement error for item $j$ in group $g$. By contrast with the other terms in Equation (2), $u_{jg}$ does not vary across individuals but simultaneously affects all individual responses within group $g$. As a result, the average of the observed responses to item $j$ in group $g$ does not correspond directly to the average of the respective individual latent preferences but is generally biased, and the magnitude and direction of the group-specific bias are determined by the term $u_{jg}$.Footnote 11
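A minimal simulation can make the role of the group-level term concrete. The sketch below assumes a purely additive model in which an observed response is the latent preference plus a group-level shift plus individual noise; the normal distributions, the sample size, and all numeric values are illustrative choices, not quantities taken from the article or from WVS data.

```python
import random
import statistics

def simulate_country(true_mean, group_error, n=5000, seed=0):
    """Simulate one country under the assumed additive model y = eta + u + eps.

    Returns the mean latent preference and the mean observed response.
    """
    rng = random.Random(seed)
    # Latent preferences eta vary across individuals around the country's true mean.
    etas = [rng.gauss(true_mean, 1.0) for _ in range(n)]
    # Every respondent receives the same group-level shift u plus individual noise eps.
    ys = [eta + group_error + rng.gauss(0.0, 1.0) for eta in etas]
    return statistics.fmean(etas), statistics.fmean(ys)

latent_mean, observed_mean = simulate_country(true_mean=2.5, group_error=0.4)
bias = observed_mean - latent_mean
print(f"latent mean: {latent_mean:.2f}, observed mean: {observed_mean:.2f}, bias: {bias:.2f}")
```

Because the individual noise averages out over thousands of respondents while the group-level shift does not, the gap between the observed and latent country means stays close to the assumed shift however large the sample becomes.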
Sometimes the macro-level bias can be substantial. Consider the case of a standard four-point Likert item. Suppose that there are two countries in which the vast majority of the population has an equally favorable view of some social phenomenon. Thus, the item asking whether respondents agree with some statement expressing positive attitude toward this phenomenon should receive responses of “agree”/“strongly agree” or, to quantify them using a 1-to-4 scale, 3 and 4, in both countries. Consequently, the country means in both should be around 3.5.
Suppose now that extreme responding is prevalent in one country and midpoint responding is prevalent in the other. In that case, in the first country the average score will be biased upward (and close to 4), while in the second country the average score will be biased downward (and close to 3). In the limit, the observed difference in means between the two countries will tend to 1. Given that the theoretical range for mean scores is 3 (4 minus 1), an analyst who uses raw data and ignores measurement error will conclude that the two countries differ by a third of the entire scale range in their support for the phenomenon of interest, even though the true difference is nearly zero.Footnote 12
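The arithmetic of this two-country example can be checked directly. The sketch below takes the limiting case literally and is an illustration only: it assumes that true responses are split evenly between 3 and 4 in both countries, that one response style pushes every favorable answer to 4, and that the other pulls every favorable answer to 3.

```python
def country_mean(responses):
    """Mean of a list of responses on the 1-to-4 scale."""
    return sum(responses) / len(responses)

# Identical true attitudes in both countries: half "agree" (3), half "strongly agree" (4).
true_responses = [3, 4] * 50

# Limiting response styles (assumed for illustration).
extreme_style = [4 for _ in true_responses]   # every favorable answer becomes 4
moderate_style = [3 for _ in true_responses]  # every favorable answer becomes 3

true_mean = country_mean(true_responses)
observed_gap = country_mean(extreme_style) - country_mean(moderate_style)
scale_range = 4 - 1
share_of_range = observed_gap / scale_range

print(true_mean, observed_gap, round(share_of_range, 3))
```

In this limiting case both countries share a true mean of 3.5, yet the observed gap is a full scale point, or one third of the 3-point theoretical range.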
Even more importantly, the country-level measurement bias affects not only the hierarchy of country means but also the magnitude of aggregate-level correlations between survey items. As follows from Equation (2), the observed aggregate correlation between two attitudinal items $y_1$ and $y_2$ consists of two main components: (a) the correlation between the country means on $\eta_1$ and $\eta_2$ and (b) the correlation between the country-level error terms $u_{y_1}$ and $u_{y_2}$.Footnote 13
The latter correlation is typically much stronger. Because the bias is introduced at the macro level, it affects many (if not most) attitudinal items included in a country-specific questionnaire in the same direction, especially items intended to measure closely related concepts (for instance, a common translation error is more likely to occur for such items). Therefore, in the presence of this kind of bias, the observed aggregate-level correlation will generally be higher than the true correlation. In the supplementary material (Appendix C), I report a simple simulation experiment showing that, under quite realistic conditions, the country-level error can easily double the strength of correlation at the aggregate level compared to the individual level.
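The inflation mechanism can be illustrated with a small simulation. This is not the experiment reported in Appendix C but a minimal sketch under stated assumptions: country means and country-level errors are drawn from normal distributions with equal variances, the true means correlate at 0.3, the errors correlate at 0.9, individual-level noise is taken to have averaged out, and an unrealistically large number of "countries" is used only to keep sampling noise small. All parameter values are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def correlated_pair(rng, rho, sd):
    """Draw two normal values with correlation rho and standard deviation sd."""
    a = rng.gauss(0.0, 1.0)
    b = rho * a + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    return sd * a, sd * b

def simulate(n_countries=2000, true_rho=0.3, error_rho=0.9,
             mean_sd=1.0, error_sd=1.0, seed=42):
    """Return (correlation of true country means, correlation of observed means)."""
    rng = random.Random(seed)
    mu_1, mu_2, obs_1, obs_2 = [], [], [], []
    for _ in range(n_countries):
        m1, m2 = correlated_pair(rng, true_rho, mean_sd)    # true country means
        u1, u2 = correlated_pair(rng, error_rho, error_sd)  # country-level errors
        mu_1.append(m1)
        mu_2.append(m2)
        obs_1.append(m1 + u1)  # observed mean = true mean + group-level error
        obs_2.append(m2 + u2)
    return pearson(mu_1, mu_2), pearson(obs_1, obs_2)

r_true, r_observed = simulate()
print(f"true: {r_true:.2f}, observed: {r_observed:.2f}")
```

With these settings the observed correlation between country means comes out markedly higher than the correlation between the true means, because the strongly correlated error terms contribute as much variance as the true means do.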
The general principle determining the size of the observed aggregate-level correlations is simple: (a) the larger the country-specific errors relative to the range of national mean scores on the respective construct for each variable involved and (b) the stronger the correlation between the country-level errors, (c) the greater the upward bias in the macro-level correlations between attitudinal items, compared to the individual-level correlations between the same items.
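A minimal simulation sketch can make this principle concrete. It is not the simulation reported in Appendix C: the parameter values, the function names, and the simplification of comparing the observed aggregate correlation with the true correlation between country means (rather than with individual-level correlations) are all assumptions made for illustration. Country means and country-level errors are drawn independently of each other, and the number of "countries" is set unrealistically high only so that sampling noise does not obscure the pattern.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlated_pair(rng, corr, sd=1.0):
    """Draw two normal variables with standard deviation `sd` and correlation `corr`."""
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    return sd * a, sd * (corr * a + (1 - corr ** 2) ** 0.5 * b)


def expected_observed_corr(true_corr, error_sd, error_corr):
    """Population correlation of observed means when true means have unit variance."""
    return (true_corr + error_corr * error_sd ** 2) / (1 + error_sd ** 2)


def simulate(n_countries, true_corr, error_sd, error_corr, seed=0):
    """Return (correlation of true means, correlation of observed means)."""
    rng = random.Random(seed)
    true1, true2, obs1, obs2 = [], [], [], []
    for _ in range(n_countries):
        m1, m2 = correlated_pair(rng, true_corr)              # true country means
        u1, u2 = correlated_pair(rng, error_corr, error_sd)   # country-level errors
        true1.append(m1)
        true2.append(m2)
        obs1.append(m1 + u1)
        obs2.append(m2 + u2)
    return pearson(true1, true2), pearson(obs1, obs2)


true_r, observed_r = simulate(n_countries=20000, true_corr=0.3,
                              error_sd=1.0, error_corr=0.9)
print(f"true: {true_r:.2f}  observed: {observed_r:.2f}  "
      f"expected: {expected_observed_corr(0.3, 1.0, 0.9):.2f}")
```

With errors as variable as the true means and correlated at 0.9, a true correlation of 0.3 is expected to appear as roughly 0.6 in the observed means. Raising `error_sd` or `error_corr` in this sketch pushes the observed correlation further above the true one, in line with conditions (a) and (b).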
Importantly, here (and in the simulation in the supplementary material) I assume that the macro-level error terms $u_{y_1}$ and $u_{y_2}$ are correlated with each other but not with the mean vectors $\mu_{y_1}$ and $\mu_{y_2}$. That is, I assume that the country-level error is not driven by the same systematic country characteristics that produce variation in the prevalence of values. Welzel and Inglehart insist that the country-specific internal consistency (they prefer the terms “coherence” and “inter-item convergence”) of the EVI increases in parallel with the country-mean scores on emancipative values. Furthermore, both the variation in the intranational degree of the index's internal consistency and the variation in country mean scores on the EVI can be quite satisfactorily predicted with a set of structural variables, most importantly the level of cognitive mobilization (see Welzel Reference Welzel2013, 74–9, for a detailed description of this concept). They call these variables “coherence-inducing forces.”
According to Welzel and Inglehart (Reference Welzel and Inglehart2016, 1080–2), since the coherence-inducing factors reflect pivotal aspects of modernization, such as technological progress and expansion of a knowledge society, their tremendously high correlations both with the mean level and with the degree of intranational coherence of the EVI suggest that this index represents a fully consistent measure of modernization's manifestation in collective mentalities.
This result is not as supportive of Welzel and Inglehart's argument as it appears at first glance. First, their finding simply means that in more technologically advanced societies surveys provide more reliable results. It does not prove with certainty that in less-developed societies emancipative values are actually as low as is observed in the WVS data. From the methodological viewpoint, it only illustrates that (a) there is much more uncertainty about the estimates of relative prevalence of emancipative values in such societies and (b) those estimates are more likely to be biased.
Second, they measure intranational coherence of emancipative values using Cronbach's alpha with respect to the four subcomponents of the EVI. But the internal convergence of three of the four first-order components of the index is itself highly doubtful for most WVS countries. Moreover, Cronbach's alpha is often criticized as a measure of either internal consistency or reliability (Sijtsma Reference Sijtsma2009; Alwin Reference Alwin2007). Since it assumes equal factor loadings and error variances for all indicators, it is not recommended for use in comparative contexts (Alemán and Woods Reference Alemán and Woods2016, 1051). In addition, as follows from Figures 1 and 3 in Welzel and Inglehart (Reference Welzel and Inglehart2016, 1078–82), country-specific alphas for the EVI do not exceed 0.65 even for the most developed Western countries (Croatia, Austria, and Denmark approach, but do not pass, that threshold), although it is generally considered that the acceptable values of alpha range from 0.70 to 0.90 (Tavakol and Dennick Reference Tavakol and Dennick2011, 54).
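For readers who want to see what the coefficient measures, the standard formula for Cronbach's alpha with k components is alpha = k / (k − 1) × (1 − sum of component variances / variance of the total score). The sketch below implements that textbook formula; it is not Welzel and Inglehart's computation, and the function name and toy data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for `items`, a list of k lists that each hold one
    score per respondent (same respondent order in every list)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-respondent sum score
    item_variance_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))


# Toy data: four perfectly consistent components versus two unrelated ones.
consistent = [[1, 2, 3, 4]] * 4
unrelated = [[0, 0, 1, 1], [0, 1, 0, 1]]

print(f"consistent components: {cronbach_alpha(consistent):.2f}")
print(f"unrelated components:  {cronbach_alpha(unrelated):.2f}")
```

Perfectly redundant components give an alpha of 1 and mutually uncorrelated components give 0, which puts the reported ceiling of about 0.65 for the EVI in context.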
So, even while increasing with cognitive mobilization, the overall consistency of the EVI remains low even for the most developed countries. As Table 1 above shows, configural differences in factor loading patterns exist even between Welzel's three zones containing the world's most developed nations (in which, according to Welzel-Inglehart's thesis, both the degree of convergence and the average level of the EVI should be highest): the Old West, the Reformed West, and the New West. This suggests that, in the case of the EVI, cross-national differences in the level of cognitive mobilization do not fully account for measurement bias.
Third, empirical evidence indicates that method effects do indeed amplify the observed macro-level correlations between WVS variables. As even configural invariance does not hold for the EVI, one cannot directly account for country-level bias and estimate its impact on the observed correlations between the EVI and other variables. However, it is possible to do so for the “choice” subdimension. I test the robustness of the correlation between country-average pro-choice values and country-average willingness to fight for one's countryFootnote 14 (Inglehart, Puranen, and Welzel Reference Inglehart, Puranen and Welzel2015; see also the supplementary material, Appendix D), using data from the sixth wave of the WVS.
I measure country means on “choice” using both Welzel's approach (i.e., arithmetic means) and the approximate Bayesian MGCFA approach, which accounts for both country- and individual-level measurement error. For its part, willingness to fight is a binary indicator with a very clear and (in theory) universally understandable meaning, so one may expect this item to be somewhat less noninvariant than other WVS questions, although, as Stegmueller (Reference Stegmueller2011, 484) stresses, “If only a single item is available for analysis, cross-national equivalence cannot be tested but must be assumed.” All in all, these two variables provide a relatively good benchmark for assessment of how much bias measurement error induces in the correlation between attitudinal scores aggregated from cross-national survey data.
I find that the raw country means on “choice” correlate more strongly (ρ = –0.499) with the average willingness to fight than the Bayesian means (i.e., adjusted for measurement error) do (ρ = –0.443). Therefore, accounting for measurement error even for only one of the two attitudinal constructs weakens the correlation between these constructs by approximately 11% of its raw value (equivalently, the raw correlation is approximately 13% stronger than the adjusted one). In this case, the bias is obviously not critical: The adjusted association remains statistically significant and relatively strong.
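The size of this attenuation depends on which coefficient serves as the baseline, and the short calculation below makes both readings explicit. Only the two correlation coefficients come from the text; the variable names are illustrative.

```python
# Coefficients reported in the text; everything else here is illustrative.
raw_rho = -0.499       # raw country means on "choice" vs. willingness to fight
adjusted_rho = -0.443  # approximate Bayesian (error-adjusted) means

gap = abs(raw_rho) - abs(adjusted_rho)
drop_vs_raw = gap / abs(raw_rho)              # attenuation relative to the raw value
excess_vs_adjusted = gap / abs(adjusted_rho)  # overstatement relative to the adjusted value

print(f"drop relative to raw coefficient:        {drop_vs_raw:.1%}")
print(f"excess relative to adjusted coefficient: {excess_vs_adjusted:.1%}")
```

Relative to the raw coefficient the correlation shrinks by about 11%, while relative to the adjusted coefficient the raw value is about 13% too strong.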
Yet, due to the absence of even configural measurement invariance, one cannot directly (that is, using a measurement model) account for the bias in the country means of the EVI's other components. Given that these other components are much more problematic than “choice” in terms of their measurement validity, I would suggest that their high macro-level correlations with each other and with other political variables, either aggregated from the individual-level data or existing solely at the macro level, can be explained by the influence of measurement error at least to the same degree as by existence of true associations.
Are Emancipative Values a Truly Formative Construct?
While both the formative and the reflective are powerful approaches to the specification of multidimensional constructs, the choice between these two may have a substantial impact on inferences and is not an easy task (Coltman et al. Reference Coltman, Devinney, Midgley and Venaik2008). Law et al. (Reference Law, Wong and Mobley1998) suggest a relatively simple rule to distinguish which of formative and reflective measurement is more appropriate in a given situation. According to Law et al., reflective specification (they use the term “latent model”) of the construct should be used whenever the multidimensional construct exists at a deeper and more embedded level than its dimensions or, in other words, when the multidimensional construct is a higher-order abstraction underlying its dimensions. The formative specification (they use the term “aggregate model”) of the construct should be used whenever the multidimensional construct exists at the same level as its dimensions and is defined as a combination of its dimensions and, in addition, can be formed as an algebraic function of its dimensions (Law et al. Reference Law, Wong and Mobley1998, 742–3).
Do the EVI and its particular components exist at the same level of abstraction or not? Though in previous works Welzel and Inglehart relied on factor analytic procedures to validate their measures (Inglehart and Baker Reference Inglehart and Baker2000; Welzel Reference Welzel2013), they now oppose the latent variable interpretation of their value constructs and advocate the combinatory interpretation instead. Consider, however, a hypothetical situation, when items measuring one specific component of emancipative values are not included in the WVS questionnaire that will be used in the next wave.
Can one make an approximate inference about country mean scores on the unobserved values component using measured scores on other components of the emancipative values? Can one make an inference about at least the sign of the association between emancipative values measured on a reduced scale and other aggregate country characteristics? Finally, will the latter inference significantly change if one suddenly obtains scores on the missed component and then reconducts the analysis using all the relevant information?Footnote 15
The way emancipative values are typically used suggests that one can recover scores on that dimension with an acceptable level of precision, as well as potentially use particular scores on the remaining components of emancipative values to estimate the strength and direction of the association between the EVI and other country-level variables of interest. Therefore the first-order components (and, certainly, the observed items used to measure them) apparently represent interchangeable manifestations of the higher-order construct of emancipative values.
Then, Coltman et al. (Reference Coltman, Devinney, Midgley and Venaik2008, 1254) recommend the formative measurement when observed items are not highly correlated. Moreover, the use of correlated components may cause estimation problems for formative measures. As Table 1 and the analysis in the supplementary material demonstrate, most observed items and first-order constructs defining emancipative values are strongly correlated with each other, and in general, on the pooled WVS sample the EVI shows an acceptable, though not ideal, fit. So, in this respect, it also should be considered a reflective measure. The problem with the EVI is that the strength of correlations between its components varies significantly across different countries and cultural zones.
Certainly, the substantial consequences of the lack of strong (metric or scalar) forms of invariance should not be overestimated. As Davidov (Reference Davidov2008, 43) notes, “Measurement invariance is too strict [. . .]. In other words, the measurement invariance test could fail [. . .] although there is cognitive equivalence” (see also Oberski Reference Oberski2014). Unfortunately for the EVI, even the weakest form of invariance, configural invariance, could not be established across global cultural zones and countries. Moreover, some configural differences exist even between the zones with the highest mean scores on emancipative values.
This indicates that emancipative values, in their current form, are definitely an ideal construct, not an empirically observable value dimension. Welzel and Inglehart recognize that their construct refers to an ideal rather than empirical benchmark and measures only how far individual responses to selected survey items in different countries are (in the aggregate) from the normative standard of a liberal and equal society. They are quite convincing in arguing that if such a “deviation-from-the-standard” measure associates strongly with other important features of reality, it may be a meaningful measure.Footnote 16 But this is not formative measurement. It is essentially normative measurement.Footnote 17
Finally, it must be noted that the formative models are not free from measurement error. They must be specified carefully in order to account for it (Jarvis, MacKenzie, and Podsakoff Reference Jarvis, MacKenzie and Podsakoff2003). They also cannot be assumed to be equivalent across countries merely because they are formative. This assumption can and generally must be tested (Diamantopoulos and Papadopoulos Reference Diamantopoulos and Papadopoulos2010). Formative measurement is not a panacea, especially given that the estimator of the nation-level prevalence of emancipative values, according to Welzel's definition, is a simple average of all relevant indicators; that is, it assumes (a) no measurement error and (b) equal contribution of all observed items.Footnote 18 These assumptions are too dubious even by the standards of the formative approach.
Are Explanatory Power and a Convincing Theory Sufficient When Assessing the Quality of Formative Constructs?
A potential undesirable consequence of the use of complex value measures is that, despite their complexity, such indices may oversimplify, or blur, actual associations between particular value dimensions and their expected correlates. In the supplementary material (Appendix E), I report another simulation experiment that clearly shows that both a theory-driven formative and a data-driven reflective combination of several distinct constructs into a single higher-order construct may yield measures that have high internal and external validity and fit theoretical predictions but miss some important aspects of the reality at the same time (e.g., the opposite effects of the first-order components on the outcome of interest). The detection of misspecifications of such composite measures is not an easy task (Coltman et al. Reference Coltman, Devinney, Midgley and Venaik2008, 1253), and it may also require considerable revision of the theory as its consequence. Nevertheless, when some indirect evidence of misspecification is available, researchers should not simply ignore it. For an empirical illustration of this point, allow me to consider the association between emancipative values and effective democracy.
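The kind of masking described here can be illustrated with a small simulation. This is not the experiment from Appendix E: the two-component setup, the coefficient values, and all function names are assumptions chosen only to show how a composite can correlate well with an outcome while one of its components has the opposite partial effect.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def corr(x, y):
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5


def ols_two_predictors(x1, x2, y):
    """Slopes from regressing y on x1 and x2 (with intercept), obtained by
    solving the 2x2 normal equations in covariance form."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(0)
c1, c2, outcome = [], [], []
for _ in range(5000):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    first = a
    second = 0.7 * a + (1 - 0.7 ** 2) ** 0.5 * b   # components correlate at about 0.7
    c1.append(first)
    c2.append(second)
    # Assumed data-generating process: opposite effects of the two components
    outcome.append(1.0 * first - 0.4 * second + rng.gauss(0, 0.5))

composite = [(p + q) / 2 for p, q in zip(c1, c2)]   # simple average, as in an index
b1, b2 = ols_two_predictors(c1, c2, outcome)

print(f"corr(composite, outcome) = {corr(composite, outcome):.2f}")
print(f"corr(component 2, outcome) = {corr(c2, outcome):.2f}")
print(f"partial slopes: component 1 = {b1:.2f}, component 2 = {b2:.2f}")
```

In this setup the composite and both components correlate positively with the outcome, so the index looks externally valid, yet the regression on separate components recovers a negative slope for the second component, which the composite conceals.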
Table 2 reports pairwise correlations between the 2014 Effective Democracy Index, or EDI (Alexander, Inglehart, and Welzel Reference Alexander, Inglehart and Welzel2012), and country scores on four components of the EVI measured during the sixth round of the WVS.Footnote 19 If Pearson's ρ is used as a measure of the association's strength, then “choice” correlates with the EDI almost at the same level as the EVI does (ρs are 0.809 and 0.830, respectively; the correlation between “choice” and the EVI is 0.92).Footnote 20 The correlation between “equality” and the EDI is also numerically close to these associations in strength (ρ = 0.730). “Voice” and “autonomy” correlate with the EDI to a substantively lesser, though still significant, extent (ρs are 0.579 and 0.568, respectively).
TABLE 2. Pairwise Correlations between Effective Democracy and Various Components of Emancipative Values

Notes: Entries are pairwise correlations between Alexander, Inglehart, and Welzel's (Reference Welzel and Deutsch2012) Effective Democracy Index (EDI) and various components of emancipative values. “Choice (latent means)” = country mean scores on the “choice” value dimension, computed using MGCFA with approximate Bayesian invariance. “Choice,” “autonomy,” “equality,” and “voice” are country mean scores on the respective components of the EVI. They, as well as the EVI country mean scores, are computed as Welzel (Reference Welzel2013, 63–9) describes. The EDI is updated for 2014 and rescaled so that the theoretical minimum is 0 and the maximum is 1. All value scores are computed using the data from the sixth wave of the WVS (2010–2014; 58 countries: Kuwait and Egypt were excluded because one or more items measuring pro-choice orientations were not included in the national WVS questionnaires in those countries and therefore the latent means on “choice” could not be estimated).
***P < 0.001; ** P < 0.01; * P < 0.05.
Because the EDI computed for 60 countries surveyed in the sixth wave of the WVS has an obvious bimodal distribution, the use of Pearson's ρ may produce biased estimates of the correlations between different value dimensions and effective democracy. However, the rank correlation coefficients, Spearman's ρ and Kendall's τ, both tell broadly the same story: “choice” and “equality” correlate with the EDI almost as strongly as the EVI does. The associations between “voice” and the EDI and between “autonomy” and the EDI are significantly lower (by 25%–50%).
Furthermore, when “voice” and either “choice” or “equality” are used simultaneously in multiple regression analyses, regression coefficients for “voice” are insignificant (contrary to bivariate regression of the EDI on “voice”; see Table 3: Models 1, 3, and 5). In contrast, “choice” and “equality” remain significant predictors of the EDI even when simultaneously included in the regression equation (Table 3: Model 7).
TABLE 3. Regressing Effective Democracy on Various Components of Emancipative Values

Notes: Entries are nonstandardized OLS regression coefficients with standard errors in parentheses. The dependent variable is Alexander, Inglehart, and Welzel's (Reference Welzel and Deutsch2012) Effective Democracy Index (EDI). “Choice,” “autonomy,” “equality,” and “voice” are country mean scores on the respective components of the Emancipative Values Index. They are computed as Welzel (Reference Welzel2013, 63–9) describes. The EDI is updated for 2014 and rescaled so that the theoretical minimum is 0 and the maximum is 1. All value scores are computed using the data from the sixth wave of the WVS (2010–2014; 60 countries surveyed). Test statistics for heteroskedasticity (Breusch-Pagan test), multicollinearity (variance inflation factors), and influential cases (Bonferroni P values for studentized residuals) reveal no violations of ordinary least squares (OLS) assumptions, except models 5 and 6, for which the Breusch-Pagan test is significant. Reestimation of these models with heteroskedasticity-corrected standard errors gives the same substantive results.
*** P < 0.001; ** P < 0.01; * P < 0.05.
Moreover, the use of “choice” and “equality” as separate predictors of the EDI in a linear regression model gives an R-squared value almost as high as for the EVI (Model 8), 0.661 versus 0.685. So these two particular components of the index, when considered as separate value dimensions, have essentially the same explanatory power as the composite score on emancipative values. This suggests that “choice” and “equality” are presumably the only two components of the EVI that are actually important covariates of effective democracy.Footnote 21 This finding clearly shows that a simple composition of theoretically relevant items, even if it has high external validity, may sometimes overshadow truly important factors.
Some readers may not be convinced by my argument because it involves only one real data counter-example. A deep investigation of associations between particular components of emancipative values and various societal-level outcomes, however, requires a full-length volume. I refer such readers to Table 2.6 in Welzel's book (Reference Welzel2013, 80; see also related discussion on p. 81), which reports correlations of emancipative values and their components with assumed antecedents and consequences. In that table, “choice” demonstrates the strongest or at least the second-strongest correlation among the subcomponents of emancipative values with almost all individual- and especially country-level covariates, suggesting that, even if it is declared that all value components are equal, some of them are more equal than others.Footnote 22
DISCUSSION AND CONCLUDING REMARKS
This article reports evidence of misspecification of the measurement model for the index of emancipative values. It shows that this index, which is used as the key explanatory variable in many important contributions in sociology and political science, is cross-culturally noninvariant and confuses different value dimensions and their actual associations with political variables. However, an analysis using a novel approximate Bayesian approach shows that one subdimension of emancipative values, “choice,” is comparable across WVS societies.
Welzel and Inglehart (Reference Welzel and Inglehart2016) argue that MGCFA-based invariance testing is not an appropriate method to test the validity of the EVI, as well as of other value measures from the modernization-emancipation family of value indices that have been proved to be noninvariant in recent studies, because all these value constructs (a) operate and manifest themselves only at the aggregate level, (b) are formative measures and therefore should not be judged against the standards of reflective measurement, and (c) have high external validity.
My thought experiments and real data examples suggest that their argument is doubtful. Even if we assume that the EVI fully satisfies the definition of a formative construct (which is itself a highly questionable assumption), it is quite likely that the observed associations between values and various political outcomes can be explained mainly by the effects of two subcomponents of the overall index, “choice” and “equality.” Although this article strongly supports the reflective interpretation of emancipative values, it nevertheless shows that, whatever approach to the measurement of these values is used, their current operationalization remains problematic. In addition, high aggregate-level correlations between the EVI or its components and other variables (especially those aggregated from survey data) are at least partially bolstered by country-level method effects.
Of course, these results by no means imply that all substantive results based on the use of the EVI, or any other index belonging to the modernization-emancipation family, are completely wrong. The long tradition of highly respected research addressed in this article shows that values are important determinants of both individual behavior and desirable societal-level changes, such as growth of trust and tolerance or democratization. My aim is not to cast doubt on the fundamental theses and findings of modernization-emancipation theory. Instead, I have sought to show that at the current level of the theory's development, there is still some work to be done to clarify what emancipative values actually are and how they should be measured. So I would like to conclude by outlining several implications of this study that might be useful for further clarification of the concept of emancipative values and investigation of associations between emancipative values and other aspects of social reality.
My main positive finding is that the “choice” subdimension of emancipative values is a reliable indicator of a given society's emphasis on “freedom from external domination.” It must also be noted that, as can be seen from Table 1, two items measuring the “equality” dimension of emancipative values have high loadings in almost all cultural zones. This may indicate that perceptions of issues related to gender equality can also be cognitively equivalent across different cultures. However, even in such a reduced form this subindex cannot be assumed to be sufficiently reliable, because, since MGCFA is not applicable with only two indicators, it is problematic to account for potential bias due to zone- or country-level measurement error. Thus, a further search for additional effective indicators of this value would be worthwhile.
Importantly, these two value dimensions appear to predict variation in political outcomes as powerfully as the EVI itself. This finding probably reflects not only better measurement properties of the respective constructs but also a different, and more fundamental, nature of these value dimensions compared to other components of emancipative values. As Inglehart, Ponarin, and Inglehart (Reference Inglehart, Ponarin and Inglehart2017, 1315) note, “all preindustrial societies that survived for long, encouraged much higher human fertility rates than do today's high-income societies.” Traditional societies did so by indoctrinating their members with rigid cultural norms emphasizing the priority of reproduction, and of the community's survival in general, over individual goals and desires.
Those norms stigmatized deviant forms of behavior, such as homosexuality, abortion, and divorce, among others, and often involved physical repressions against deviant individuals. They also placed harsh constraints on the space of permitted social roles for women. Even today, sexual liberation is highly contested by traditionalist forces (Alexander, Inglehart, and Welzel Reference Alexander, Inglehart and Welzel2016, 911), while emancipation in other domains does not meet such fierce resistance from religious and other conservative organizations. It is not surprising that repressive norms that have been deeply rooted in the everyday social practices of virtually all societies throughout the whole history of humanity are mirrored in different cultures in a more coherent way than abstract Western-centric ideas of political liberty and permissive parenting styles.
Overall, this article and other recent research show that the “choice” value construct (a) has a well-developed theoretical basis, (b) demonstrates a reasonable level of cross-national comparability, and (c) has strong external linkages to structural variables and also high external validity. Moreover, (d) it is simply a more parsimonious measure than the EVI. I believe these reasons are sufficient to justify the use of “choice” as the prime indicator of cross-national value differences related to modernization and moral evolution.
That said, I do not mean to say that “choice,” in its current form, is the best possible measure of the cultural differences linked with human empowerment and other aspects of development. Rather, I argue that it is the best measure among currently available options. It can certainly be improved in a couple of ways. For example, additional indicators of tolerance toward some “nonconventional” (for traditional societies) forms of sexual behavior can be added to the “choice” index, or alternative measures of other subdimensions of emancipative values, first of all “equality,” can be proposed.
My research also has implications that go beyond the discussion of the measurement validity of Inglehart and Welzel's value measures. Multi-item measurement models are common in comparative political research. Often researchers use them uncritically and are not aware of (or do not have effective incentives to deal with) several important measurement issues, such as the need for a proper definition of the construct, the choice of a relevant measurement strategy, and assessment of the construct's measurement equivalence. In general, the measurement validity of any survey-based measure cannot be assumed and must be tested empirically.
Duane F. Alwin notes that there are “two basic approaches to minimizing [measurement] error. The first is to emphasize the reduction of errors in the collection of survey data through improved techniques of questionnaire design, interviewer training and survey implementation. The second is to accept the fact that measurement errors are bound to occur, even after doing everything that is in one's power to minimize them, and to model the behavior of errors using statistical designs” (Alwin Reference Alwin2007, 9).
Both approaches are highly relevant for comparative political science. If some survey-based measures of theoretically important constructs are found to be inappropriate (e.g., performing nonequivalently across countries), then they should be replaced by better-calibrated measures (e.g., using alternative item wordings, alternative item scales, or simply alternative sets of items). A good example of such conduct was given by Shalom Schwartz and his collaborators. Schwartz's 21-item measure of human values implemented in the European Social Survey was at one point found to be problematic in terms of invariance (e.g., Davidov Reference Davidov2008). Schwartz then proposed a refined instrument to measure human values, which was later shown to be approximately invariant across European nations (Cieciuch et al. Reference Cieciuch, Davidov, Algesheimer and Schmidt2017).
Unfortunately, refining such complex measurement instruments as comparative surveys is a difficult enterprise, which can take a great deal of time and resources. Despite the titanic efforts of survey administrators, money and organizational constraints often prevent comparative survey programs from reaching the highest standards of data quality and comparability. And, as Davidov et al. (Reference Davidov, Meuleman, Cieciuch, Schmidt and Billiet2014, 57) note, even “the most rigorous application of current standards cannot guarantee that measurements are comparable across nations.”
Existing international survey programs nevertheless remain an extremely important source of information for political scientists and definitely can and should be used in applied research. However, researchers using comparative survey data must (a) be aware of the potential pitfalls inherent to this kind of political data and (b) inform their audience of how they deal with the relevant methodological problems. In particular, the following three points are worth clarifying by applied researchers when they introduce their data and methods to the readers: (1) how their constructs are defined and measured (e.g., whether the authors are employing reflective or formative measurement), (2) how individual- and/or country-level measurement error is taken into account (whether it is modeled directly or assumed negligible and, if the latter, why), and (3) how the issue of cross-national comparability is addressed.
In the book cited above, to underline the importance of studying errors of measurement in survey research, Alwin quotes Mark Twain, who reportedly said about the weather: “Everybody talks about the subject, but nobody does anything about it” (Alwin Reference Alwin2007, xii). I hope that the present study, along with other recent methodological contributions, sufficiently clarifies why political scientists have to stop “talking” (or, to be precise, reading) about measurement error only on the pages of methodological journals and finally begin to take it seriously in their research.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S0003055417000624.
Replication Materials can be found on Dataverse at https://doi.org/10.7910/DVN/TPGPCR.