The Index of Emancipative Values: Measurement Model Misspecifications

BORIS SOKOLOV

doi:10.1017/S0003055417000624

Published online by Cambridge University Press: 30 January 2018

Affiliation: National Research University Higher School of Economics
*Boris Sokolov is a Research Fellow at the Laboratory for Comparative Social Research and an Associate Professor at the Department of Sociology, St. Petersburg School of Social Sciences and Humanities, National Research University Higher School of Economics, 55 Sedova Street, 190068 St. Petersburg, Russian Federation (bssokolov@gmail.com).

Abstract

This article reports evidence of misspecification of the measurement model for the index of emancipative values, a value construct used as a key explanatory variable in many important contributions to political science. It shows that the scale on which the index is measured is noninvariant across cultural zones and countries in the World Values Survey. In addition, it demonstrates that the current index composition mixes different value dimensions and their actual associations with various political outcomes, in particular the index of effective democracy. However, an analysis using a novel approximate Bayesian approach shows that at least one specific subdimension of emancipative values, known as pro-choice values, truly exists and may be validly measured and compared cross-nationally. The article also contributes to the recent discussion on whether emancipative values are a reflective or a formative construct by providing thought experiments and empirical evidence supporting the former interpretation.

Information

Type: Research Article
Information: American Political Science Review, Volume 112, Issue 2, May 2018, pp. 395–408

DOI: https://doi.org/10.1017/S0003055417000624
Copyright: Copyright © American Political Science Association 2018

In the 1970s, Ronald Inglehart (1971) began to develop his version of modernization theory, focusing on long-standing changes in values and behavioral attitudes caused by unprecedented growth of economic well-being and existential security in the world's most developed societies after WWII. In The Silent Revolution, Inglehart (1977) described in detail a major intergenerational shift in the values of the Western public: from materialist orientations, emphasizing economic welfare and physical security, to what he called “postmaterialist” orientations, emphasizing the importance of noneconomic aspects of human life, such as personal autonomy and self-expression.

Inglehart's initial findings were based on survey data for the Western world. The initiation and development of large-N cross-national survey programs such as the World Values Survey (WVS) allowed Inglehart to test his version of modernization theory globally, using data representing most of the world population. The logic of value change, outlined in The Silent Revolution, was then refined by Inglehart and his coauthors (Inglehart 1990; Abramson and Inglehart 1995; Inglehart 1997; Inglehart and Baker 2000; Inglehart and Norris 2003; Inglehart and Welzel 2005). Several important accomplishments in comparative politics, political sociology, and social psychology are based on the framework of modernization theory and its most advanced reformulation, the “evolutionary theory of emancipation” (Welzel 2013).

It has been shown that, in a cross-national perspective, value changes towards postmaterialist, or emancipative, orientations are associated with an increase in support for democracy (Inglehart and Welzel 2003; Welzel, Inglehart, and Klingemann 2003; Welzel and Inglehart 2006), tolerance of minorities and other out-groups (Andersen and Fetner 2008), and gender equality (Inglehart and Norris 2003; Bergh 2007; Alexander and Welzel 2011). Self-expression and emancipative value orientations have been shown to foster interpersonal trust (Welzel 2010) and lead to a decline in violence, both domestic (Welzel 2010; Welzel and Deutsch 2012) and international (Inglehart, Puranen, and Welzel 2015). These value changes also contribute to democratization (Welzel 2006, 2007; Inglehart and Welzel 2010) and secularization (Inglehart and Appel 1989; Inglehart and Norris 2004) across the world.

As with any theory dealing with people's attitudes, the validity of modernization-emancipation theory depends to a large extent on the measurement validity of the survey instrument used to operationalize the theory's crucial component, the concept of values. A short version of the index of postmaterialism, the original measure of value priorities used by Inglehart, was formed with only four survey items. The Index of Emancipative Values (EVI), the most recent measure derived from modernization-emancipation theory, includes twelve items, combined into four first-order constructs and one second-order construct (see Figure 1). However, both of these measures, as well as the indices of survival/self-expression and traditional/secular-rational values (Inglehart and Baker 2000), which represent an intermediate stage in the evolution of the “modernization-emancipation family”¹ of value indices, still have many common features: they are defined by overlapping sets of survey items and are based on common theoretical assumptions.

Note: Rectangles represent observed variables, ovals latent ones.

FIGURE 1. Measurement model for the EVI

In particular, modernization-emancipation theory assumes that cross-cultural differences in value priorities reflect different socioeconomic trajectories and other forms of historical path dependency but not differences in the meaning of the values on which cultures differ (Welzel 2013, 41–3). Importantly, this premise assumes that the same survey items can be used to measure values in virtually all countries in the world and that country means aggregated from those items are largely comparable on a worldwide scale.

To ensure that people in various countries rank their value priorities on the same scale and that their composite value scores are therefore comparable across countries, the measurement model for that scale should be invariant (i.e., apply in different cultural and geographical contexts). In statistical terms, this means that all model parameters should be (approximately) the same in each possible subsample (i.e., country). This assumption is often called the assumption of measurement invariance/equivalence (Steenkamp and Baumgartner 1998; Davidov et al. 2014). If the invariance assumption does not hold, people in different countries interpret the relevant survey questions and respond to them differently. In that case, cross-national comparisons and regression analyses using noninvariant value scales are invalid, because such scales confound true differences in individual values with country-specific response biases (Stegmueller 2011).
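
The hierarchy of invariance levels referred to in this literature can be sketched in standard multigroup factor-analytic notation. The symbols below (intercept τ, loading λ, latent factor ξ, residual δ) are conventional choices assumed here for illustration and are not taken from the article; the linear form is also a simplification, since ordered categorical indicators are usually modeled through thresholds. For the response of individual i in group g to item j:

$$\begin{equation*} y_{ijg} = \tau_{jg} + \lambda_{jg}\,\xi_{ig} + \delta_{ijg} \end{equation*}$$

Configural invariance requires only that the same items load on the same factors in every group. Metric invariance adds equal loadings, and scalar invariance adds equal intercepts:

$$\begin{equation*} \text{metric: } \lambda_{jg} = \lambda_{j} \ \text{ for all } g; \qquad \text{scalar: } \lambda_{jg} = \lambda_{j} \ \text{ and } \ \tau_{jg} = \tau_{j} \ \text{ for all } g. \end{equation*}$$

Under this sketch, comparisons of latent means across groups are conventionally taken to presuppose the scalar level, while the configural level is the weakest of the three.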

The assumption of invariance has rarely been tested statistically for any index from the modernization-emancipation family. To some extent, this may be explained by the fact that 40 years ago, when the postmaterialism index was introduced, the importance of the invariance assumption was not yet widely recognized in comparative survey research. In addition, proper statistical tools for invariance testing had simply not been developed at that time. The few papers that did check for invariance of some indices belonging to the modernization-emancipation family, however, reveal a lack of cross-national comparability for these indices (e.g., MacIntosh 1998; Davis, Dowley, and Silver 1999; Ippel, Gelissen, and Moors 2014; but see Sacchi 1998).

In particular, Alemán and Woods (2016) recently tested for invariance of (1) traditional–secular-rational and survival–self-expression values and (2) secular and emancipative values across the cultural zones defined in Welzel (2013), using all available data from the World Values Survey from 1981 to 2014. They concluded that “For the most part, WVS orientations are not [. . .] comparable cross-nationally, except among a small number of Western post-industrial societies.” They also admitted that “Welzel may be getting closer to formulating a set of concepts that are equivalent and coherent across the world” but then added that “The empirical properties of these value orientations differ in important ways from the ones he [Welzel] and Inglehart have proposed” (ibid., 1059).

It is, however, worth noting that Alemán and Woods used a slightly modified version of the EVI: it included only two of the original four first-order latent variables, omitted the second-order factor, and used only eleven of the twelve items.² In addition, they conducted an invariance test across only four of the ten cultural zones defined by Welzel, three of which were represented by only one country each.

The goal of the present article is to extend this line of comparative attitudinal research and provide a full-scale investigation of the measurement validity of the EVI, the most recent and complex representative of what I have called the “modernization-emancipation family” of value indices. In this respect, the main contribution of the article is threefold. First, it demonstrates that the original measurement model for the EVI based on the pooled WVS sample has multiple misspecifications, resulting in a relatively poor fit. Furthermore, the results of tests for measurement invariance of emancipative values show that the EVI is not even configurally invariant across the ten cultural zones (as defined in Welzel 2013) and therefore across WVS countries. These two findings suggest that the EVI is a biased estimator of cross-national differences in prevalent value patterns. However, for one subcomponent of the EVI, known as “pro-choice values” or “choice,” cross-zone and even cross-national invariance can be established using a less restrictive approximate Bayesian approach. Hence, this value subdimension (which reflects how permissible people in different countries find sexual self-determination in such matters as homosexuality, abortion, and divorce) can serve as a reliable benchmark for cross-national comparisons of the prevalence of a mass-level desire for emancipation.

The paper also contributes to the general discussion on the definition and measurement of emancipative values and, more broadly, of any multi-item attitudinal construct in comparative research. In a recent paper, Welzel and Inglehart (2016) reject the latent-variable (or reflective) interpretation of their value measures, including the EVI, and claim that all such measures are formative constructs, so that the issue of noninvariance, while essential for reflective constructs, is not critical for the validity of the indices from the modernization-emancipation family. They also argue that their indices are intended to capture societal-level change in prevalent patterns of values and assert that evidence of a construct's measurement noninvariance at the individual level has nothing to do with the same construct's validity at the country level. Finally, they claim that the exceptionally strong external linkages of their value measures are an excellent indication of those measures’ validity.

I challenge these claims. First, I show that the strong aggregate-level associations of particular components of the EVI with each other and with other variables of interest can be partially attributed to measurement error. I also propose a thought experiment challenging Welzel and Inglehart's formative interpretation of the EVI and show that the index seems to poorly fit the validity criteria for formative measures. Using another thought experiment and the case of the effective democracy measure (Alexander, Inglehart, and Welzel 2012) as an empirical example, I then demonstrate that the EVI mixes up different value dimensions and their actual associations with political outcomes.

In the section that follows, I introduce the concept of emancipative values; assess the internal validity of its measure, the EVI; examine whether the EVI meets the established methodological standards of cross-cultural comparability; and highlight three arguments used by Welzel and Inglehart to defend their value measures. In the next section, I show that these arguments are not applicable in the case of emancipative values. I conclude by specifying several suggestions on how the measurement of value orientations from the modernization-emancipation family can be improved and emphasizing the general importance of proper construct development and measurement for comparative political research.

ASSESSMENT OF THE ORIGINAL MEASUREMENT MODEL FOR EMANCIPATIVE VALUES

Emancipative Values: Definition and Measurement

Modernization-emancipation theory states that if societies develop progressively, then their course follows a two-stage pathway: a transition from agrarian to industrial societies and from industrial to knowledge societies. The first transition leads to an increasing bureaucratization of society, while the second transition comes with increasing individualization (Inglehart and Welzel 2005; Welzel 2013, 58). Individualization here means an increase in people's desire for emancipation or “an existence free from domination” (Welzel 2013, 2).

The EVI is a further development of a widely used index of self-expression values (Inglehart and Baker 2000). It is designed to measure “how strongly people claim authority over their lives for themselves” and how much equality they concede in this matter to everyone else (Welzel 2013, 59). The measurement model behind the concept of emancipative values may be represented as a hierarchical model (Figure 1), in which emancipative values are a second-order construct related to four first-order constructs, which in turn affect observed indicator variables. Welzel calls these first-order constructs “personal autonomy,” “reproductive choice,” “gender equality,” and “people's voice,” or, in short, “autonomy,” “choice,” “equality,” and “voice.”

To measure people's emphasis on “autonomy,” Welzel uses three WVS items showing whether respondents consider (a) independence and (b) imagination as desirable qualities in children while not considering (c) obedience to be such a quality. To measure how strongly people value “choice,” again three items are used, indicating how acceptable respondents find (a) divorce, (b) abortion, and (c) homosexuality. A respondent's emphasis on “equality” is measured with another three items indicating how strongly people disagree with the statements that (a) “education is more important for a boy than a girl,” (b) “when jobs are scarce, men should have priority over women to get a job,” and (c) “men make better political leaders than women.” Finally, to measure how strongly respondents value the “voice” of the people as a source of influence in their society, the following three items are used, indicating whether respondents assign first, second, or no priority to the goals of (a) “protecting freedom of speech,” (b) “giving people more say in important government decisions,” and (c) “giving people more say about how things are done at their jobs and in their communities” (Welzel 2013, 67).
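
The twelve-item, four-component structure described above can be illustrated with a minimal Python sketch. The item labels are hypothetical shorthand (the text does not give WVS variable codes), and both the 0-to-1 rescaling of items and the equal-weight averaging are assumptions made for illustration rather than a description of the published scoring procedure.

```python
from statistics import mean

# Hypothetical item labels grouped by the four first-order components.
EVI_ITEMS = {
    "autonomy": ["independence", "imagination", "not_obedience"],
    "choice": ["divorce", "abortion", "homosexuality"],
    "equality": ["education", "jobs", "politics"],
    "voice": ["free_speech", "say_government", "say_local"],
}


def component_scores(respondent):
    """Average the three items of each component (items assumed rescaled to 0..1)."""
    return {
        name: mean(respondent[item] for item in items)
        for name, items in EVI_ITEMS.items()
    }


def evi_score(respondent):
    """Equal-weight average of the four component scores (assumed aggregation rule)."""
    return mean(component_scores(respondent).values())


# One made-up respondent, purely for illustration.
example = {
    "independence": 1.0, "imagination": 0.0, "not_obedience": 1.0,
    "divorce": 0.5, "abortion": 0.5, "homosexuality": 0.5,
    "education": 1.0, "jobs": 1.0, "politics": 1.0,
    "free_speech": 0.0, "say_government": 0.5, "say_local": 0.0,
}

print(component_scores(example))
print(evi_score(example))
```

Under these assumptions a respondent can reach the same overall score through very different component profiles, which is why the dimensionality of the components matters for interpreting the index.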

Internal Validity and Comparability of Emancipative Values

Welzel does not justify his index based on its internal consistency. Still, he reports the results of a hierarchical (second-order) exploratory factor analysis of the twelve items defining the EVI using the country-pooled individual-level data of 95 countries surveyed at least once by the WVS/EVS, based on the latest available survey from each country (from the period of 1995–2005). Factor loadings produced by that procedure are documented in Welzel (2013, 71, Table 2.3) and are quite high both at the first level and at the second level (almost all >0.7).

I replicate his analysis using the data of all 99 countries surveyed during the third to sixth rounds of the World Values Survey (1995–2014) using confirmatory factor analysis (CFA). I also conduct tests for measurement invariance of emancipative values. Given that my results in these respects are similar to Alemán and Woods's findings and because of space limitations, I provide a detailed description of my analysis, as well as a discussion of the relevant methodological issues, in the supplementary material (Appendices A and B). Here I briefly summarize the key findings.

First, I find that, according to the standard CFA goodness-of-fit criteria, Welzel's original measurement model for the EVI is not decidedly unacceptable but still obviously misspecified in some way (see Appendix A). In particular, several items do not discriminate well between the first-order factors and may simply not be useful for measuring the overall construct. Also, there is evidence of multidimensionality of the latent trait behind the index.

I then proceed with tests for cross-cultural comparability of emancipative values. I follow the empirical strategy of Alemán and Woods and employ multigroup CFA (MGCFA) for measurement invariance testing but extend the spatial coverage of their analysis³ and use more appropriate CFA techniques.⁴ Nevertheless, I come to the same substantive conclusion: The EVI does not satisfy even the weakest requirement necessary for cross-national comparisons, which is known as configural invariance.⁵

Table 1 illustrates the point: It clearly shows that the factor loading patterns are inconsistent across Welzel's ten cultural zones. Several items in certain zones were excluded from the analysis to achieve model convergence. In each zone, at least one (typically more) item has a factor loading that is either insignificant or critically low (<0.3)⁶ or even has a negative sign. This unambiguously suggests that Welzel's original twelve-item measure of emancipative values is not universally applicable in different cultural contexts.

TABLE 1. Group-Specific CFAs of 12 Variables from the Sixth Wave of the WVS for Ten Cultural Zones (2010–2014)

Notes: Entries are standardized factor loadings. All estimates are significant at the 0.05 level (except those marked as n.s. = nonsignificant). Loadings in bold are those lower than 0.30. Negative loadings are in italic. Variable intercepts, thresholds, and variances are not shown. Models were estimated in Mplus version 7.11. National samples were weighted to equal size (N = 1,500). Because nine of the twelve observed indicators are ordered categorical variables, the WLSMV estimator was used for parameter estimation. Pairwise present analysis was used to deal with missing values. N = number of observations used. CFI = Comparative Fit Index. TLI = Tucker-Lewis Index. RMSEA = Root Mean Square Error of Approximation.

At the same time, I find that one component of emancipative values, “choice,” has indicators with high loadings across all cultural zones. I examine whether the more demanding assumptions required for comparability (known as metric and scalar invariance) hold for this value dimension. For this purpose, I use a recently introduced method of invariance testing known as the approximate Bayesian approach. This method is less restrictive than the classical (strict) approach to invariance testing but still allows for valid comparability tests⁷ (Van de Schoot et al. 2013; Muthén and Asparouhov 2013).

Applying this approach yields an encouraging result: For “choice,” the stronger forms of [approximate] invariance hold not only across ten cultural zones but also across all WVS countries. This suggests that “choice” is a reliable indicator of cross-national differences related to the modernization-emancipation process.⁸

Unfortunately, “choice” is the only comparable component of the EVI in the WVS data. All other index components, as well as the second-order construct itself, fail to satisfy even the weakest requirement of configural invariance. Strictly speaking, this means that the EVI, in its current version, does not measure the same latent value dimension(s) in different cultural zones and, consequently, in different countries. Therefore, it should not be used for cross-national comparisons and statistical analyses.

Why Do Welzel and Inglehart Think the Results Above Are Not as Relevant as Their Critics Say?

The finding of noninvariance is not very surprising. Inglehart and Welzel themselves have pointed out explicitly a number of times “the variable and [. . .] weak inter-item coherence of [. . .] value constructs at the individual level within countries” (Inglehart and Welzel 2005, 231–44; Welzel 2013, 74–9, 110–2). They, however, do not think that the EVI's lack of invariance poses serious problems to substantive inferences involving emancipative values as an explanatory variable. In a recent article, Welzel and Inglehart (2016) directly respond to criticisms of their value measures based on the noninvariance issue.

First, Welzel and Inglehart state that despite the fact that the data they “use to create [. . .] value constructs are collected from individuals, the purpose of these constructs is not to measure internally convergent personality traits.” Instead, they “aim to capture value configurations that emerge first and foremost—and sometimes solely—at the group level, the place where culture takes shape” (ibid., 1070–1). In other words, what matters for them is societal prevalence of different types of values, not individual value priorities.

They also note that the strength of associations between different components of emancipative values dramatically increases at the aggregate level compared to the individual level, referring to that regularity as the “micro-macro puzzle.” They explain this finding with the impact of measurement error that partially obscures true correlations at the individual level. Moreover, Welzel and Inglehart claim that aggregate-level correlations are more reliable since they are free of individual-level measurement error (which is removed by aggregation). Their overall conclusion is that “the strength of this aggregate-level association is in no way invalidated by its weakness at the individual level” (ibid., 1072) and that one cannot assess the measurement quality of an aggregate-level construct at the individual level, even if the aggregation emerges from individual-level data.

Welzel and Inglehart go on to argue that the EVI is designed according to combinatory principles of index construction (the term “formative” is more common in the methods literature), not the dimensional ones (the equivalent term “reflective” is more common in the methods literature) predominant in contemporary social sciences. The main difference between the two is that the reflective logic of measurement assumes that the latent construct causes variation in the observed indicators (and the latter are interchangeable manifestations of that construct), while the formative logic assumes that the observed indicators define the construct and requires constituent components to be recognizably different, not interchangeable. According to formative principles, the construct is simply an algebraic function of its constituent items. Welzel and Inglehart state that the EVI, as a formative measure, is designed to measure overall performances across a theoretically defined field of emancipation, and the dimensionality of the partial performances simply does not enter into consideration.
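
The contrast between the two measurement logics can be written compactly. The notation here (loadings λ_j, weights w_j, residuals ε_j, J indicators) is a conventional choice assumed for illustration and does not come from Welzel and Inglehart. Under the reflective logic, each indicator is a function of the construct:

$$\begin{equation*} y_{j} = \lambda_{j}\,\eta + \varepsilon_{j}, \qquad j = 1, \ldots, J \end{equation*}$$

Under the formative logic, the construct is a function of the indicators:

$$\begin{equation*} \eta = \sum_{j=1}^{J} w_{j}\, y_{j} \end{equation*}$$

With equal weights w_j = 1/J, the formative construct reduces to a simple average of its items. In the reflective case the indicators are expected to covary because they share a common cause; the formative definition, by itself, places no such requirement on them.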

If so, then it is inappropriate to assess the validity of measures designed according to combinatory principles against the standards of the dimensional approach (including requirement of invariance). What can be used to assess the validity of a formative measure? According to Welzel and Inglehart, the most important criterion to judge the measurement quality of a combinatory construct is its “predictive power over other aspects of reality” (ibid., 1075). Welzel and Inglehart note that their “value constructs associate at exceptional strength across nations and over time with several dozen key indicators of (1) socioeconomic development, (2) cultural legacies, and (3) institutional performance[. . .] The explanatory power of the value constructs over these aspects of social reality ranges from 60% to 80%, across some 100 countries representing more than 90% of the world population” (ibid.). Therefore, these constructs “tap something very real” and represent fully valid measures.

In the following section, I show why appeals to (1) the presumed independence of the validity of macro-level constructs from the validity of individual-level constructs, (2) the standards of formative logic, and (3) high explanatory power of emancipative values and their strong linkages to various structural variables are not sufficient to justify the current version of the measurement model for the EVI.

CHALLENGING WELZEL AND INGLEHART'S DEFENSE

Does Macro-Level Construct Validity Not Depend on Micro-Level Measurement?

Two principal objections can be raised against the first pillar of Welzel and Inglehart's defensive argument. The first objection accentuates the contradiction between (a) Welzel and Inglehart's statement that what matters for their theories are value configurations that exist and operate at the macro level and (b) their own description of causal mechanisms linking mass-level value change and its presumed political effects. Let me use a putative link between values (or, broadly speaking, political culture) and democracy, which has been advanced in dozens of papers by Inglehart, Welzel et al., to substantiate this point.

Most political scientists can agree that institutional changes, such as democratization, typically spring from the actions of (groups of) individuals. The latter, however, are partially driven by macro-level factors (e.g., economic depression, income inequality, ethnic or religious discrimination, foreign intervention) that create incentives and redistribute resources for political action. Welzel and Inglehart recognize this basic macro-micro-macro sequence. In a very small nutshell, their general argument, which appears in their papers again and again in modified forms, states that the principal cause of value change and therefore political evolution is economic development, which relaxes existential constraints on people's actions, facilitates satisfaction of basic needs, turns the nature of everyday life from a source of pressures into a source of opportunities, and, hence, increases the utility and value of universal freedoms.

However, the growth of well-being and security from survival threats does not affect political institutions directly. As Inglehart and Welzel (2010, 551) state in a recent paper, “simply reaching a given level of economic development could not itself produce democracy; it can do so only by bringing changes in how people act.” A liberating impulse of improving life conditions and expanding action resources travels first from the macro to the micro level, changes relevant individual behavioral dispositions (values and regime preferences), and only then goes back to the macro level, where it manifests itself through revolutionary or peaceful democratizations.

This pattern clearly suggests that values are important determinants of the individual actions that trigger political change or, to put it simply, that individual values matter. Since the mechanism is supposed to be universal, in order to be able to test it, a researcher should observe comparable value dimensions in different countries.⁹ If she does not, as in the case with configural noninvariance of respective measures, it implies that either (1) the whole theory is wrong or (2) the way value orientations are measured is flawed (Seligson 2002).

My second objection underscores the danger of ignoring measurement error in analyses of international survey data. It is commonly acknowledged in contemporary social sciences that measurement error is an inevitable evil within survey research (Alwin 2007; Adcock and Collier 2001). Various factors may cause observed individual responses to be different from true individual preferences or attitudes. Welzel and Inglehart recognize this danger but argue (e.g., Inglehart and Welzel 2005, 231–44; Welzel and Inglehart 2016, 1071–2) that the errors in survey data are distributed randomly, so aggregation mostly cancels them out. This argument may be correct when one deals with individual-level errors. However, comparative researchers face another, potentially much more dangerous source of measurement error, arising exclusively at the level of society.

Let me illustrate the problem by introducing a very simple formal notation, borrowed from multigroup structural equation modeling. In the general context, the relationship between the unobserved individual preference η_ij of the individual i regarding the issue j and the observed response y_ij to some survey item j can be described as follows:

$$\begin{equation*} y_{ij} = \eta_{ij} + \varepsilon_{ij}, \tag{1} \end{equation*}$$

where ε_ij represents nonsystematic individual-level error.¹⁰

When individuals are clustered within groups (e.g., countries), not only individual-level but also group-level factors can distort the correspondence between the latent preference and the observed response. Such factors, which are often called “method effects” (Stegmueller 2011), include inaccurate translations of survey questions; differences in sampling procedures, response rates, and survey modes across the countries studied; systematic differences in response styles; and culturally specific understandings of certain measured concepts (Davidov et al. 2014, 56; Przeworski and Teune 1966). To capture potential bias due to macro-level method effects, Equation (1) should be rewritten as

(2)

$$\begin{equation*} {y_{ijg}} = \ {\eta _{ijg}} + \ {u_{jg}} + \ {\varepsilon _{ijg}},\ \end{equation*}$$

where g is the group index and u _jg represents the group-level component of the overall measurement error for the item j in the group g. By contrast with other terms in Equation (2), u _jg does not vary across individuals but simultaneously affects all individual responses within the group g. As a result, the average of the observed responses to the item j in group g does not correspond directly to the average of respective individual latent preferences but is generally biased, and the magnitude and direction of group-specific bias are determined by the term u_jg.Footnote ¹¹
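The last step can be spelled out by averaging Equation (2) over the respondents of one group. The following is only a sketch: n_g (the number of respondents in group g) and the bar notation for group averages are introduced here for illustration and are not part of the notation above.

$$\bar{y}_{jg} = \frac{1}{n_g}\sum_{i=1}^{n_g} y_{ijg} = \bar{\eta}_{jg} + u_{jg} + \bar{\varepsilon}_{jg}$$

If the individual-level errors are random with zero mean, the averaged error (the last term) shrinks toward zero as n_g grows, whereas u_jg is the same for every respondent in the group and therefore survives aggregation unchanged.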

Sometimes the macro-level bias can be substantial. Consider the case of a standard four-point Likert item. Suppose that there are two countries in which the vast majority of the population has an equally favorable view of some social phenomenon. Thus, the item asking whether respondents agree with some statement expressing positive attitude toward this phenomenon should receive responses of “agree”/“strongly agree” or, to quantify them using a 1-to-4 scale, 3 and 4, in both countries. Consequently, the country means in both should be around 3.5.

Suppose now that extreme responding is prevalent in one country and midpoint responding is prevalent in the other. In that case, in the first country the average score will be biased upward (and close to 4), while in the second country the average score will be biased downward (and close to 3). In the limit, the observed difference in means between the two countries will tend to 1. Given that the theoretical range for mean scores is 3 (4 minus 1), an analyst who uses raw data and ignores measurement error will conclude that people in the first country are almost 33% (1 out of 3 scale points) more supportive of the phenomenon of interest than people in the second country, despite the true difference being nearly zero.¹²
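The arithmetic of this example can be checked with a few lines of Python. The response shares below are hypothetical, chosen only to mimic the two response styles; they are not estimates from any survey.

```python
def item_mean(shares):
    """Mean of a Likert item, given the share of respondents per category."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(score * share for score, share in shares.items())

# Hypothetical shares on a 1-to-4 agreement scale. In both countries
# everyone agrees; only the preferred response category differs.
no_style = {1: 0.0, 2: 0.0, 3: 0.50, 4: 0.50}       # no response-style bias
extreme_style = {1: 0.0, 2: 0.0, 3: 0.05, 4: 0.95}  # prefers "strongly agree"
moderate_style = {1: 0.0, 2: 0.0, 3: 0.95, 4: 0.05} # avoids the extreme

mean_true = item_mean(no_style)
mean_extreme = item_mean(extreme_style)
mean_moderate = item_mean(moderate_style)
gap = mean_extreme - mean_moderate

print(round(mean_true, 2), round(mean_extreme, 2),
      round(mean_moderate, 2), round(gap, 2))
```

With these shares the two observed means are about 3.95 and 3.05, so the observed gap is about 0.9 scale points even though both populations are, by construction, equally favorable.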

What is even more important is that the country-level measurement bias affects not only the hierarchy of country means but also the magnitude of aggregate-level correlations between survey items. As follows from Equation (2), the observed aggregate correlation between two attitudinal items y₁ and y₂ consists of two main components: (a) the correlation between the country means on η₁ and η₂ and (b) the correlation between the country-level error terms u_{y₁} and u_{y₂}.¹³

The latter correlation is typically much stronger. Because the bias is introduced at the macro level, it unidirectionally affects many (if not most) attitudinal items included in a country-specific questionnaire, especially items intended to measure close concepts (for instance, for such items a common translation error is more likely to occur). Therefore, the observed aggregate-level correlation in the presence of this kind of bias will generally be higher than the true correlation. In the supplementary material (Appendix C), I report a simple simulation experiment showing that, under quite realistic conditions, the country-level error can easily double the strength of correlation at the aggregate level compared to the individual level.

The general principle determining the size of the observed aggregate-level correlations is simple: the upward bias in the macro-level correlations between attitudinal items, compared to the individual-level correlations between the same items, is greater (a) the larger the country-specific errors are relative to the range of national mean scores on the respective construct for each involved variable and (b) the stronger the correlation between the country-level errors.
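A minimal sketch of this principle follows. It assumes that country-level errors are uncorrelated with the true country means and, purely for simplicity, that both items have the same standard deviation of true means and the same standard deviation of errors. The function name and the numerical values are illustrative assumptions; they are not taken from Appendix C.

```python
def observed_macro_corr(rho_true, sd_true, rho_err, sd_err):
    """Expected correlation between observed country means of two items.

    Each observed mean is a true country mean plus a country-level error
    (Equation 2), with errors uncorrelated with the true means.
    Simplifying assumption: both items share sd_true and sd_err.
    """
    cov = rho_true * sd_true**2 + rho_err * sd_err**2  # covariance of observed means
    var = sd_true**2 + sd_err**2                       # variance of each observed mean
    return cov / var

# Hypothetical values: weak true association, strongly correlated errors.
no_error = observed_macro_corr(0.3, 1.0, 0.9, 0.0)
with_error = observed_macro_corr(0.3, 1.0, 0.9, 1.0)
larger_error = observed_macro_corr(0.3, 1.0, 0.9, 2.0)
print(round(no_error, 3), round(with_error, 3), round(larger_error, 3))
```

Under these simplifying assumptions the observed correlation is a variance-weighted average of the true correlation and the error correlation. It therefore exceeds the true correlation whenever the errors correlate more strongly than the true means do, and the gap widens as the error variance grows relative to the variance of the true means.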

Importantly, here (and in the simulation in the supplementary material) I assume that the macro-level error terms u_{y₁} and u_{y₂} are correlated with each other but not with the mean vectors μ_{y₁} and μ_{y₂}. That is, I assume that the country-level error is not driven by the same systematic country characteristics that produce variation in the prevalence of values. Welzel and Inglehart insist that the country-specific internal consistency (they prefer the terms “coherence” and “inter-item convergence”) of the EVI increases in parallel with the country-mean scores on emancipative values. Furthermore, both the variation in the intranational degree of the index's internal consistency and the variation in country mean scores on the EVI can be quite satisfactorily predicted with a set of structural variables, most importantly the level of cognitive mobilization (see Welzel 2013, 74–9, for a detailed description of this concept). They call these variables “coherence-inducing forces.”

According to Welzel and Inglehart (2016, 1080–2), since the coherence-inducing factors reflect pivotal aspects of modernization, such as technological progress and expansion of a knowledge society, their tremendously high correlations both with the mean level and with the degree of intranational coherence of the EVI suggest that this index represents a fully consistent measure of modernization's manifestation in collective mentalities.

This result is not as supportive of Welzel and Inglehart's argument as it appears at first glance. First, their finding simply means that in more technologically advanced societies surveys provide more reliable results. It does not prove with certainty that in less-developed societies emancipative values are actually as low as is observed in the WVS data. From the methodological viewpoint, it only illustrates that (a) there is much more uncertainty about the estimates of relative prevalence of emancipative values in such societies and (b) those estimates are more likely to be biased.

Second, they measure intranational coherence of emancipative values using Cronbach's alpha with respect to the four subcomponents of the EVI. But the internal convergence of three of the four first-order components of the index is itself highly doubtful for most WVS countries. Moreover, Cronbach's alpha is often criticized as a measure of either internal consistency or reliability (Sijtsma 2009; Alwin 2007). Since it assumes equal factor loadings and error variances for all indicators, it is not recommended for use in comparative contexts (Alemán and Woods 2016, 1051). In addition, as follows from Figures 1 and 3 in Welzel and Inglehart (2016, 1078–82), country-specific alphas for the EVI do not exceed 0.65 even for the most developed Western countries (Croatia, Austria, and Denmark approach, but do not pass, that threshold), although it is generally considered that the acceptable values of alpha range from 0.70 to 0.90 (Tavakol and Dennick 2011, 54).

So, although it increases with cognitive mobilization, the overall consistency of the EVI remains low even for the most developed countries. As Table 1 above shows, configural differences in factor loading patterns exist even between Welzel's three zones containing the world's most developed nations (in which, according to Welzel and Inglehart's thesis, both the degree of convergence and the average level of the EVI should be highest): the Old West, the Reformed West, and the New West. This suggests that, in the case of the EVI, cross-national differences in the level of cognitive mobilization do not fully account for measurement bias.
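For readers less familiar with the coefficient discussed above: Cronbach's alpha for k components equals k/(k − 1) times one minus the ratio of the summed component variances to the variance of the total score. A minimal Python sketch follows; the function name and the toy data are illustrative assumptions and are not WVS data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists, each holding one score per respondent
    (same respondents, same order, in every list).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    summed_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - summed_item_var / variance(totals))

# Toy data: three identical items (perfect consistency) ...
alpha_identical = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
# ... and two items that are uncorrelated across four respondents.
alpha_unrelated = cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]])
print(round(alpha_identical, 3), round(alpha_unrelated, 3))
```

The two toy cases mark the ends of the usual range: identical items give an alpha of 1, and uncorrelated items give an alpha of 0.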

Third, empirical evidence indicates that method effects do indeed amplify the observed macro-level correlations between WVS variables. As even configural invariance does not hold for the EVI, one cannot directly account for country-level bias and estimate its impact on the observed correlations between the EVI and other variables. However, it is possible to do so for the “choice” subdimension. I test the robustness of the correlation between country-average pro-choice values and country-average willingness to fight for one's country¹⁴ (Inglehart, Puranen, and Welzel 2015; see also the supplementary material, Appendix D), using data from the sixth wave of the WVS.

I measure country means on “choice” using both Welzel's approach (i.e., arithmetic means) and the approximate Bayesian MGCFA approach, which allows for accounting for both country- and individual-level measurement error. For its part, willingness to fight is a binary indicator with a very clear and (in theory) universally understandable meaning, so one may expect that this item is somewhat less noninvariant than other WVS questions. Still, as Stegmueller (2011, 484) stresses, “If only a single item is available for analysis, cross-national equivalence cannot be tested but must be assumed.” All in all, these two variables provide a relatively good benchmark for assessment of how much bias measurement error induces in the correlation between attitudinal scores aggregated from cross-national survey data.

I find that the raw country means on “choice” correlate more strongly (ρ = –0.499) with the average willingness to fight than the Bayesian means (i.e., adjusted for measurement error) do (ρ = –0.443). Therefore, accounting for measurement error even for only one of the two attitudinal constructs weakens the correlation between these constructs: the raw coefficient is approximately 13% larger in absolute value than the adjusted one. In this case, the bias is obviously not critical: The adjusted association remains statistically significant and relatively strong.

Yet, due to the absence of even configural measurement invariance, one cannot directly (that is, using a measurement model) account for the bias in the country means of the EVI's other components. Given that these other components are much more problematic than “choice” in terms of their measurement validity, I would suggest that their high macro-level correlations with each other and with other political variables, either aggregated from the individual-level data or existing solely at the macro level, can be explained by the influence of measurement error at least to the same degree as by existence of true associations.

Are Emancipative Values a Truly Formative Construct?

While both the formative and the reflective are powerful approaches to the specification of multidimensional constructs, the choice between these two may have a substantial impact on inferences and is not an easy task (Coltman et al. 2008). Law et al. (1998) suggest a relatively simple rule to distinguish which of formative and reflective measurement is more appropriate in a given situation. According to Law et al., reflective specification (they use the term “latent model”) of the construct should be used whenever the multidimensional construct exists at a deeper and more embedded level than its dimensions or, in other words, when the multidimensional construct is a higher-order abstraction underlying its dimensions. The formative specification (they use the term “aggregate model”) of the construct should be used whenever the multidimensional construct exists at the same level as its dimensions and is defined as a combination of its dimensions and, in addition, can be formed as an algebraic function of its dimensions (Law et al. 1998, 742–3).

Do the EVI and its particular components exist at the same level of abstraction or not? Though in previous works Welzel and Inglehart relied on factor analytic procedures to validate their measures (Inglehart and Baker 2000; Welzel 2013), they now oppose the latent variable interpretation of their value constructs and advocate the combinatory interpretation instead. Consider, however, a hypothetical situation in which items measuring one specific component of emancipative values are not included in the WVS questionnaire that will be used in the next wave.

Can one make an approximate inference about country mean scores on the unobserved values component using measured scores on other components of the emancipative values? Can one make an inference about at least the sign of the association between emancipative values measured on a reduced scale and other aggregate country characteristics? Finally, will the latter inference significantly change if one suddenly obtains scores on the missed component and then reconducts the analysis using all the relevant information?¹⁵

The way emancipative values are typically used suggests that one can recover scores on that dimension with an acceptable level of precision, as well as potentially use particular scores on the remaining components of emancipative values to estimate the strength and direction of the association between the EVI and other country-level variables of interest. Therefore the first-order components (and, certainly, the observed items used to measure them) apparently represent interchangeable manifestations of the higher-order construct of emancipative values.

Further, Coltman et al. (2008, 1254) recommend formative measurement when observed items are not highly correlated. Moreover, the use of correlated components may cause estimation problems for formative measures. As Table 1 and the analysis in the supplementary material demonstrate, most observed items and first-order constructs defining emancipative values are strongly correlated with each other, and in general, on the pooled WVS sample the EVI shows an acceptable, though not ideal, fit. So, in this respect, it also should be considered a reflective measure. The problem with the EVI is that the strength of correlations between its components varies significantly across different countries and cultural zones.

Certainly, the substantial consequences of the lack of strong (metric or scalar) forms of invariance should not be overestimated. As Davidov (2008, 43) notes, “Measurement invariance is too strict [. . .]. In other words, the measurement invariance test could fail [. . .] although there is cognitive equivalence” (see also Oberski 2014). Unfortunately for the EVI, even the weakest form of invariance, configural invariance, could not be established across global cultural zones and countries. Moreover, some configural differences exist even between the zones with the highest mean scores on emancipative values.

This indicates that emancipative values, in their current form, are definitely an ideal construct, not an empirically observable value dimension. Welzel and Inglehart recognize that their construct refers to an ideal rather than empirical benchmark and measures only how far individual responses to selected survey items in different countries are (in the aggregate) from the normative standard of a liberal and equal society. They are quite convincing in arguing that if such a “deviation-from-the-standard” measure associates strongly with other important features of reality, it may be a meaningful measure.¹⁶ But this is not formative measurement. It is essentially normative measurement.¹⁷

Finally, it must be noted that the formative models are not free from measurement error. They must be specified carefully in order to account for it (Jarvis, MacKenzie, and Podsakoff 2003). They also cannot be assumed to be equivalent across countries merely because they are formative. This assumption can and generally must be tested (Diamantopoulos and Papadopoulos 2010). Formative measurement is not a panacea, especially given that the estimator of the nation-level prevalence of emancipative values, according to Welzel's definition, is a simple average of all relevant indicators; that is, it assumes (a) no measurement error and (b) equal contribution of all observed items.¹⁸ These assumptions are too dubious even by the standards of the formative approach.

Are Explanatory Power and a Convincing Theory Sufficient When Assessing the Quality of Formative Constructs?

A potential undesirable consequence of the use of complex value measures is that, despite their complexity, such indices may oversimplify, or blur, actual associations between particular value dimensions and their expected correlates. In the supplementary material (Appendix E), I report another simulation experiment that clearly shows that both a theory-driven formative and a data-driven reflective combination of several distinct constructs into a single higher-order construct may yield measures that have high internal and external validity and fit theoretical predictions but miss some important aspects of the reality at the same time (e.g., the opposite effects of the first-order components on the outcome of interest). The detection of misspecifications of such composite measures is not an easy task (Coltman et al. 2008, 1253), and it may also require considerable revision of the theory as its consequence. Nevertheless, when some indirect evidence of misspecification is available, researchers should not simply ignore it. For an empirical illustration of this point, allow me to consider the association between emancipative values and effective democracy.

Table 2 reports pairwise correlations between the 2014 Effective Democracy Index, or EDI (Alexander, Inglehart, and Welzel 2012), and country scores on four components of the EVI measured during the sixth round of the WVS.¹⁹ If Pearson's ρ is used as a measure of the association's strength, then “choice” correlates with the EDI almost at the same level as the EVI does (ρs are 0.809 and 0.830, respectively; the correlation between “choice” and the EVI is 0.92).²⁰ The correlation between “equality” and the EDI is also numerically close to these associations in strength (ρ = 0.730). “Voice” and “autonomy” correlate with the EDI to a substantively lesser, though still significant extent (ρs are 0.579 and 0.568, respectively).

TABLE 2. Pairwise Correlations between Effective Democracy and Various Components of Emancipative Values

Notes: Entries are pairwise correlations between Alexander, Inglehart, and Welzel's (2012) Effective Democracy Index (EDI) and various components of emancipative values. “Choice (latent means)” = country mean scores on the “choice” value dimension, computed using MGCFA with approximate Bayesian invariance. “Choice,” “autonomy,” “equality,” and “voice” are country mean scores on the respective components of the EVI. They, as well as the EVI country mean scores, are computed as Welzel (2013, 63–9) describes. The EDI is updated for 2014 and rescaled so that the theoretical minimum is 0 and the maximum is 1. All value scores are computed using the data from the sixth wave of the WVS (2010–2014; 58 countries: Kuwait and Egypt were excluded because one or more items measuring pro-choice orientations were not included in the national WVS questionnaires in those countries and therefore the latent means on “choice” could not be estimated).

*** P < 0.001; ** P < 0.01; * P < 0.05.

Because the EDI computed for 60 countries surveyed in the sixth wave of the WVS has an obvious bimodal distribution, the use of Pearson's ρ may produce biased estimates of the correlations between different value dimensions and effective democracy. However, the rank correlation coefficients, Spearman's ρ and Kendall's τ, both tell broadly the same story: “choice” and “equality” correlate with the EDI almost as strongly as the EVI does. The associations between “voice” and the EDI and between “autonomy” and the EDI are significantly lower (by 25%–50%).
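The difference between the product-moment and the rank-based coefficients can be illustrated with a short sketch. The helper functions and the toy data below are hypothetical and unrelated to the EDI or WVS scores; Spearman's coefficient is computed here as Pearson's correlation of average ranks.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data: a perfectly monotone but strongly nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

In this toy case the rank coefficient equals 1 because the ordering of the two variables is identical, while the product-moment coefficient is lower because it is sensitive to the shape of the distributions.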

Furthermore, when “voice” and either “choice” or “equality” are used simultaneously in multiple regression analyses, regression coefficients for “voice” are insignificant (contrary to bivariate regression of the EDI on “voice”; see Table 3: Models 1, 3, and 5). In contrast, “choice” and “equality” remain significant predictors of the EDI even when simultaneously included in the regression equation (Table 3: Model 7).

TABLE 3. Regressing Effective Democracy on Various Components of Emancipative Values

Notes: Entries are nonstandardized OLS regression coefficients with standard errors in parentheses. The dependent variable is Alexander, Inglehart, and Welzel's (2012) Effective Democracy Index (EDI). “Choice,” “autonomy,” “equality,” and “voice” are country mean scores on the respective components of the Emancipative Values Index. They are computed as Welzel (2013, 63–9) describes. The EDI is updated for 2014 and rescaled so that the theoretical minimum is 0 and the maximum is 1. All value scores are computed using the data from the sixth wave of the WVS (2010–2014; 60 countries surveyed). Test statistics for heteroskedasticity (Breusch-Pagan test), multicollinearity (variance inflation factors), and influential cases (Bonferroni P values for studentized residuals) reveal no violations of ordinary least squares (OLS) assumptions, except models 5 and 6, for which the Breusch-Pagan test is significant. Reestimation of these models with heteroskedasticity-corrected standard errors gives the same substantive results.

*** P < 0.001; ** P < 0.01; * P < 0.05.

Moreover, the use of “choice” and “equality” as separate predictors of the EDI in a linear regression model gives an R-squared value almost as high as for the EVI (Model 8), 0.661 versus 0.685. So these two particular components of the index, when considered as separate value dimensions, have essentially the same explanatory power as the composite score on emancipative values. This suggests that “choice” and “equality” are presumably the only two components of the EVI that are actually important covariates of effective democracy.²¹ This finding clearly shows that a simple composition of theoretically relevant items, even if it has high external validity, may sometimes overshadow truly important factors.

Some readers may not be convinced by my argument because it involves only one real data counter-example. A deep investigation of associations between particular components of emancipative values and various societal-level outcomes, however, requires a full-length volume. I refer such readers to Table 2.6 in Welzel's book (2013, 80; see also related discussion on p. 81), which reports correlations of emancipative values and their components with assumed antecedents and consequences. In that table, “choice” demonstrates the strongest or at least the second-strongest correlation among the subcomponents of emancipative values with almost all individual- and especially country-level covariates, suggesting that, even if it is declared that all value components are equal, some of them are more equal than others.²²

DISCUSSION AND CONCLUDING REMARKS

This article reports evidence of misspecification of the measurement model for the index of emancipative values. It shows that this index, which is used as the key explanatory variable in many important contributions in sociology and political science, is cross-culturally noninvariant and confuses different value dimensions and their actual associations with political variables. However, an analysis using a novel approximate Bayesian approach shows that one subdimension of emancipative values, “choice,” is comparable across WVS societies.

Welzel and Inglehart (2016) argue that MGCFA-based invariance testing is not an appropriate method to test the validity of the EVI, as well as of other value measures from the modernization-emancipation family of value indices that have been proved to be noninvariant in recent studies, because all these value constructs (a) operate and manifest themselves only at the aggregate level, (b) are formative measures and therefore should not be judged against the standards of reflective measurement, and (c) have high external validity.

My thought experiments and real data examples suggest that their argument is doubtful. Even if we assume that the EVI fully satisfies the definition of a formative construct (which is itself a highly questionable assumption), it is quite likely that the observed associations between values and various political outcomes can be explained mainly by the effects of two subcomponents of the overall index, “choice” and “equality.” Although this article strongly supports the reflective interpretation of emancipative values, it nevertheless shows that, whatever approach to the measurement of these values is used, their current operationalization remains problematic. In addition, high aggregate-level correlations between the EVI or its components and other variables (especially those aggregated from survey data) are at least partially bolstered by country-level method effects.

Of course, these results by no means imply that all substantive results based on the use of the EVI, or any other index belonging to the modernization-emancipation family, are completely wrong. The long tradition of highly respected research addressed in this article shows that values are important determinants of both individual behavior and desirable societal-level changes, such as growth of trust and tolerance or democratization. My aim is not to cast doubt on the fundamental theses and findings of modernization-emancipation theory. Instead, I have sought to show that at the current level of the theory's development, there is still some work to be done to clarify what emancipative values actually are and how they should be measured. So I would like to conclude by outlining several implications of this study that might be useful for further clarification of the concept of emancipative values and investigation of associations between emancipative values and other aspects of social reality.

My main positive finding is that the “choice” subdimension of emancipative values is a reliable indicator of a given society's emphasis on “freedom from external domination.” It must also be noted that, as can be seen from Table 1, two items measuring the “equality” dimension of emancipative values have high loadings in almost all cultural zones. This may indicate that perceptions of issues related to gender equality can also be cognitively equivalent across different cultures. However, even in such a reduced form this subindex cannot be assumed to be sufficiently reliable, because, since MGCFA is not applicable with only two indicators, it is problematic to account for potential bias due to zone- or country-level measurement error. Thus, a further search for additional effective indicators of this value would be worthwhile.

Importantly, these two value dimensions appear to predict variation in political outcomes as powerfully as the EVI itself. This finding probably reflects better measurement properties of the respective constructs but also a different, and more fundamental, nature of these value dimensions compared to other components of emancipative values. As Inglehart, Ponarin, and Inglehart (2017, 1315) note, “all preindustrial societies that survived for long, encouraged much higher human fertility rates than do today's high-income societies.” Traditional societies did so by indoctrinating their members with rigid cultural norms emphasizing the priority of reproduction, and of the community's survival in general, over individual goals and desires.

Those norms stigmatized deviant forms of behavior, such as homosexuality, abortion, and divorce, among others, and often involved physical repressions against deviant individuals. They also placed harsh constraints on the space of permitted social roles for women. Even today, sexual liberation is highly contested by traditionalist forces (Alexander, Inglehart, and Welzel 2016, 911), while emancipation in other domains does not meet such fierce resistance from religious and other conservative organizations. It is not surprising that repressive norms that have been deeply rooted in the everyday social practices of virtually all societies throughout the whole history of humanity are mirrored in different cultures in a more coherent way than abstract Western-centric ideas of political liberty and permissive parenting styles.

Overall, this article and other recent research show that the “choice” value construct (a) has a well-developed theoretical basis, (b) demonstrates a reasonable level of cross-national comparability, and (c) has strong external linkages to structural variables and also high external validity. Moreover, (d) it is simply a more parsimonious measure than the EVI. I believe these reasons are sufficient to justify the use of “choice” as the prime indicator of cross-national value differences related to modernization and moral evolution.

That said, I do not claim that “choice,” in its current form, is the best possible measure of the cultural differences linked with human empowerment and other aspects of development. Rather, I argue that it is the best measure among currently available options. It can certainly be improved in a couple of ways. For example, additional indicators of tolerance toward some “nonconventional” (for traditional societies) forms of sexual behavior can be added to the “choice” index, or alternative measures of other subdimensions of emancipative values, above all “equality,” can be proposed.

My research also has implications that go beyond the discussion of the measurement validity of Inglehart and Welzel's value measures. Multi-item measurement models are common in comparative political research. Often researchers use them uncritically and are not aware of (or do not have effective incentives to deal with) several important measurement issues, such as the need for a proper definition of the construct, the choice of a relevant measurement strategy, and assessment of the construct's measurement equivalence. In general, the measurement validity of any survey-based measure cannot be assumed and must be tested empirically.

Duane F. Alwin notes that there are “two basic approaches to minimizing [measurement] error. The first is to emphasize the reduction of errors in the collection of survey data through improved techniques of questionnaire design, interviewer training and survey implementation. The second is to accept the fact that measurement errors are bound to occur, even after doing everything that is in one's power to minimize them, and to model the behavior of errors using statistical designs” (Alwin Reference Alwin2007, 9).

Both approaches are highly relevant for comparative political science. If some survey-based measures of theoretically important constructs are found to be inappropriate (e.g., performing nonequivalently across countries), then they should be replaced by better-calibrated measures (e.g., using alternative item wordings, alternative item scales, or simply alternative sets of items). A good example of such conduct was given by Shalom Schwarz and his collaborators. Schwartz's 21-item measure of human values implemented in the European Social Survey was at one point found to be problematic in terms of invariance (e.g., Davidov Reference Davidov2008). Schwartz then proposed a refined instrument to measure human values, which was later shown to be approximately invariant across European nations (Cieciuch et al. Reference Cieciuch, Davidov, Algesheimer and Schmidt2017).

Unfortunately, refining such complex measurement instruments as comparative surveys is a difficult enterprise, which can take a great deal of time and resources. Despite the titanic efforts of survey administrators, money and organizational constraints often prevent comparative survey programs from reaching the highest standards of data quality and comparability. And, as Davidov et al. (Reference Davidov, Meuleman, Cieciuch, Schmidt and Billiet2014, 57) note, even “the most rigorous application of current standards cannot guarantee that measurements are comparable across nations.”

Existing international survey programs nevertheless remain an extremely important source of information for political scientists and definitely can and should be used in applied research. However, researchers using comparative survey data must (a) be aware of the potential pitfalls inherent to this kind of political data and (b) inform their audience of how they deal with the relevant methodological problems. In particular, applied researchers should clarify the following three points when they introduce their data and methods to readers: (1) how their constructs are defined and measured (e.g., whether the authors are employing reflective or formative measurement), (2) how individual- and/or country-level measurement error is taken into account (whether it is modeled directly or assumed negligible, and, if the latter, why), and (3) how the issue of cross-national comparability is addressed.

In the book cited above, to underline the importance of studying errors of measurement in survey research, Alwin quotes Mark Twain, who reportedly said about the weather: “Everybody talks about the subject, but nobody does anything about it” (Alwin 2007, xii). I hope that the present study, along with other recent methodological contributions, sufficiently clarifies why political scientists have to stop “talking” (or, to be precise, reading) about measurement error only on the pages of methodological journals and finally begin to take it seriously in their research.

SUPPLEMENTARY MATERIAL

To view supplementary material for this article, please visit https://doi.org/10.1017/S0003055417000624.

Replication Materials can be found on Dataverse at https://doi.org/10.7910/DVN/TPGPCR.

Footnotes

I thank Christian Welzel, Eduard Ponarin, Peter Schmidt, Maksim Rudnev, and Yegor Lazarev, as well as the members of the research seminar at the Laboratory for Comparative Social Research, Higher School of Economics, for comments on earlier drafts and fruitful discussions. I am also very grateful to the editor, Kenneth Benoit, and four anonymous reviewers for their helpful comments and suggestions. Finally, I thank Alexei Stephenson for proofreading the manuscript. The study was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and supported within the framework of a subsidy by the Russian Academic Excellence Project ‘5-100’.

¹ I use the term “modernization-emancipation family” to refer to the postmaterialism index, the index of survival/self-expression values, the index of traditional/secular-rational values, and the EVI to distinguish these value indices (which have much in common and were developed within the same theoretical framework) from other theories utilizing the concept of “values,” such as Schwartz's theory of human values (Schwartz 1992).

² The item “giving people more say about how things are done at their jobs and in their communities” was not featured in the version of the data file Alemán and Woods (2016, 1053) used.

³ I test the invariance of the EVI across all ten of the cultural zones defined by Welzel. These zones are the Islamic East, the Indic East, the Sinic East, the Orthodox East, the Old West, the Reformed West, the Returned West, the New West, Latin America, and sub-Saharan Africa. Alemán and Woods present the results of MGCFA for emancipative values for only four zones: the New West, the Old West, the Reformed West, and sub-Saharan Africa.

⁴ Alemán and Woods did not report certain important details about their analytical procedures. Thus, it is not clear from their paper whether they accounted for the categorical nature of the indicators used to define emancipative values. Because nine of the twelve observed indicators are ordered categorical variables, I use the WLSMV estimator for parameter estimation, which is the default option for dealing with categorical or nonnormal responses in many structural equation modeling software packages (including Mplus 7.11, which I use).

⁵ Configural invariance requires only that the factor loading patterns are the same across different groups (that is, the same indicators have nonzero loadings on the same constructs in all groups). This ensures that a proposed model measures the same construct in all groups. I provide a more formal definition of the concept of measurement invariance and its different levels in the supplementary material (Appendix B, Section B1).

⁶ In various methodological papers, the value of 0.30 is referred to as a minimal (i.e., most tolerant) cutoff for meaningful factor loadings (Brown 2015).
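As an informal illustration of the two preceding notes, the sketch below compares the pattern of salient loadings across groups, treating 0.30 as the cutoff. It is only a descriptive check under stated assumptions: the group names, item names, and loading values are hypothetical, and formal configural invariance testing relies on fitting the same factor structure in every group and evaluating model fit, which this sketch does not do.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical standardized loadings: {(indicator, factor): loading}.
Loadings = Dict[Tuple[str, str], float]


def salient_pattern(loadings: Loadings, cutoff: float = 0.30) -> FrozenSet[Tuple[str, str]]:
    """Return the (indicator, factor) pairs whose absolute loading reaches the cutoff."""
    return frozenset(pair for pair, value in loadings.items() if abs(value) >= cutoff)


def same_pattern_across_groups(groups: Dict[str, Loadings], cutoff: float = 0.30) -> bool:
    """True if every group shows the same set of salient loadings."""
    patterns = {salient_pattern(loadings, cutoff) for loadings in groups.values()}
    return len(patterns) <= 1


# Illustrative values only; they are not estimates from the WVS data.
example_groups = {
    "zone_a": {("item1", "choice"): 0.62, ("item2", "choice"): 0.55, ("item3", "choice"): 0.48},
    "zone_b": {("item1", "choice"): 0.58, ("item2", "choice"): 0.12, ("item3", "choice"): 0.51},
}

patterns_match = same_pattern_across_groups(example_groups)
print(patterns_match)  # False: item2 falls below the cutoff in zone_b
```

In this made-up example the second item loads saliently in one zone but not in the other, so the loading patterns differ and the descriptive check fails.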

⁷ See a detailed description of this method in the supplementary material (Appendix B, Section B2).

⁸ A word of caution is necessary here. The approximate Bayesian approach to invariance testing is a new method, and there is much uncertainty about many aspects of its application. I follow the guidelines proposed by the developers of this approach (these guidelines were used in almost all practical applications of the method I am aware of), but some new research suggests that more nuanced tests may be necessary to ensure comparability using this approach (Hoijtink and Van de Schoot 2017). Interested readers can find a detailed discussion of the approximate Bayesian invariance analysis and related methodological issues in the supplementary material (Appendix B, Sections B2, B4, and B5).

⁹ In Freedom Rising, Welzel extensively employs multilevel modeling in his tests of the emancipation theory and pays special attention to the cross-level interactions between the effects of individual values and the macro-level prevalence of the emancipatory worldview. He correctly argues that the latter factor may leverage the strength of an individual-level association between value orientations and, say, propensity to participate in political protests. However, findings from such analyses are reliable only when both the macro-level measures of value prevalence and the micro-level measures of individual value priorities are comparable cross-nationally. The existence of ecological effects (Welzel calls them “elevator” effects) does not itself justify the use of noninvariant measures.

¹⁰ Note that here the response variable is assumed to be normally distributed. However, the same reasoning can be easily extended to the case of binary or ordered polytomous responses.

¹¹ Since the average value of ε_ijg is usually expected to be zero, it does not cause bias in the estimate of the country mean score on j.

¹² I do not argue here that response style is the primary cause of noninvariance of the EVI; other method factors can contribute as well. Moreover, empirical research on response bias in cross-cultural perspective shows that cross-national differences in average response styles are often not large (He and Van de Vijver 2015). I use the case of response style as a simple and intuitively clear example of a bias in quantities of interest due to method effects. A similar example can be found in Van Vlimmeren, Moors, and Gelissen (2017, 2742).
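A minimal numeric sketch may help to fix the intuition behind the last two notes. It assumes a simple additive response model in which an observed answer equals the respondent's latent preference plus a country-specific response-style shift plus an individual error that averages zero. The function name, the two countries, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def observed_country_mean(latent_scores, style_shift, errors):
    """Mean of observed answers under an assumed additive model:
    observed = latent preference + country-level style shift + individual error."""
    return mean(eta + style_shift + eps for eta, eps in zip(latent_scores, errors))


# Two hypothetical countries with identical latent preferences.
latent = [0.2, 0.4, 0.6, 0.8]
# Individual errors average zero, so they leave the country mean unchanged.
errors = [-0.1, 0.1, -0.05, 0.05]

mean_a = observed_country_mean(latent, style_shift=0.0, errors=errors)
mean_b = observed_country_mean(latent, style_shift=0.3, errors=errors)
gap = mean_b - mean_a

print(round(mean_a, 3), round(mean_b, 3), round(gap, 3))
```

Under these assumptions the zero-mean individual errors do not move either country mean, whereas the style shift produces a gap between the two observed means even though the latent means are identical.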

¹³ Let μ_{y₁} and μ_{y₂} be the vectors of observed country means on y₁ and y₂, and let μ_{η₁} and μ_{η₂} be the vectors of unobserved country means on latent preferences η₁ and η₂, respectively. Assuming that the country-level error terms u_{y₁} and u_{y₂} are each uncorrelated with both μ_{η₁} and μ_{η₂} and that E(ε_ijg) = 0 for each g, the aggregate correlation of observed mean vectors is given by

$$\begin{equation*} \begin{aligned} \mathrm{cor}\left(\mu_{y_1}, \mu_{y_2}\right) &= \frac{\mathrm{cov}\left(\mu_{y_1}, \mu_{y_2}\right)}{\sqrt{\mathrm{var}\left(\mu_{y_1}\right)\mathrm{var}\left(\mu_{y_2}\right)}} = \frac{\mathrm{cov}\left(\mu_{\eta_1} + u_{y_1},\ \mu_{\eta_2} + u_{y_2}\right)}{\sqrt{\mathrm{var}\left(\mu_{\eta_1} + u_{y_1}\right)\mathrm{var}\left(\mu_{\eta_2} + u_{y_2}\right)}}\\[4pt] &= \frac{\mathrm{cov}\left(\mu_{\eta_1}, \mu_{\eta_2}\right) + \mathrm{cov}\left(\mu_{\eta_1}, u_{y_2}\right) + \mathrm{cov}\left(u_{y_1}, \mu_{\eta_2}\right) + \mathrm{cov}\left(u_{y_1}, u_{y_2}\right)}{\sqrt{\left[\mathrm{var}\left(\mu_{\eta_1}\right) + \mathrm{var}\left(u_{y_1}\right)\right]\left[\mathrm{var}\left(\mu_{\eta_2}\right) + \mathrm{var}\left(u_{y_2}\right)\right]}}\\[4pt] &= \frac{\mathrm{cov}\left(\mu_{\eta_1}, \mu_{\eta_2}\right) + \mathrm{cov}\left(u_{y_1}, u_{y_2}\right)}{\sqrt{\left[\mathrm{var}\left(\mu_{\eta_1}\right) + \mathrm{var}\left(u_{y_1}\right)\right]\left[\mathrm{var}\left(\mu_{\eta_2}\right) + \mathrm{var}\left(u_{y_2}\right)\right]}} \end{aligned} \end{equation*}$$
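The consequences of this decomposition can be sketched numerically. The function below uses the standard definition of the Pearson correlation (covariance divided by the product of standard deviations) together with the assumption stated in the note that country-level errors are uncorrelated with the latent country means. All input values are hypothetical population quantities chosen for illustration, not estimates from the data.

```python
import math


def aggregate_correlation(cov_eta, var_eta1, var_eta2, cov_u, var_u1, var_u2):
    """Correlation of observed country means when each observed mean is the sum of
    a latent mean and a country-level error uncorrelated with the latent means."""
    numerator = cov_eta + cov_u
    denominator = math.sqrt((var_eta1 + var_u1) * (var_eta2 + var_u2))
    return numerator / denominator


# Hypothetical latent structure: unit variances, latent correlation of 0.5.
no_error = aggregate_correlation(0.5, 1.0, 1.0, cov_u=0.0, var_u1=0.0, var_u2=0.0)

# Independent country-level errors only add variance, so the correlation shrinks.
independent_errors = aggregate_correlation(0.5, 1.0, 1.0, cov_u=0.0, var_u1=0.25, var_u2=0.25)

# Errors shared by both items (for example, a common response style) add covariance
# to the numerator, so the observed correlation exceeds the latent one.
shared_errors = aggregate_correlation(0.5, 1.0, 1.0, cov_u=0.25, var_u1=0.25, var_u2=0.25)

print(no_error, independent_errors, shared_errors)
```

With these illustrative inputs, independent errors attenuate the aggregate correlation relative to the latent one, while positively correlated errors inflate it.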

¹⁴ The proportion of respondents, expressed in fractions of 1, saying they are willing to fight for their country when responding to the following WVS question:

Of course, we all hope that there will not be another war, but if it were to come to that, would you be willing to fight for your country? (The response options are “yes” and “no.”)

¹⁵ One may consider a slightly different situation where one item per first-order construct forming the EVI is excluded from the questionnaire. Can this reduced item combination be used to approximate country means on emancipative values or not? In addition, how should missing values be treated with respect to the index? Should all individuals with at least one missing response on any of the twelve items be excluded from the sample, because their overall performances within the emancipation domain cannot be assessed due to insufficient information? Or can missing values be inferred using the observed sample correlations between the manifest items?

¹⁶ Adcock and Collier (2001) in their influential methodological paper on measurement validity state that scholars do not need to use validation procedures based on the notion of external validity (Adcock and Collier use the term “nomological validity”) “if other types of validation raise concerns about the validity of a given indicator and the scores it produces” (542). By “other types” they mean assessments of content and convergent validity, thus favoring internal consistency over external validity as a more fundamental component of model quality.

¹⁷ Welzel and Inglehart (2016, 1084) make this transparent when they state that a “score in emancipative values [. . .] tells us something real: How little appeal emancipatory ideals have in a society. Whether the majority of a society wishes to be measured against the standards of emancipation is a different question.”

¹⁸ In formative models, different indicators often have different effects on the unobservable construct. To estimate the particular contribution of each indicator, the formative model should be identified, which is typically done by relating the construct to several reflective indicators or other endogenous constructs (Jarvis, MacKenzie, and Podsakoff 2003, 213–215). In the multiple-group setup, the cross-group equality of the effects of all formative indicators on the construct is required to ensure the equivalence of the overall measurement model (Diamantopoulos and Papadopoulos 2010).

¹⁹ Note that in this example all components of the EVI are computed according to Welzel's original procedure.

²⁰ Interestingly, the correlation between the EDI and the alternative version of country mean scores on “choice,” which is computed using the approximately invariant Bayesian MGCFA model, is 15.4% lower (ρ = 0.701). This agrees strongly with the finding above that the use of raw means may overestimate the strength of association between an aggregate-level attitudinal construct and some other outcome.

²¹ Notice that the effect of “autonomy” on the EDI also retains significance when controlled for either “choice” or “equality,” though it becomes dramatically weaker (Table 3: Models 4 and 6). However, as indicated above, the bivariate association of “autonomy” with the EDI is substantively weaker than those of the latter two dimensions of emancipative values.

²² The EVI correlates more strongly than any of its components with each variable presented in the referenced table. This is not surprising because of a well-known principle of aggregation (Rushton, Brainerd, and Pressley 1983), stating that the sum of a set of multiple measurements is a more stable and representative estimator than any single measurement. I do not challenge the well-documented explanatory power of various value measures. My point here is that uncritical use of the total score, instead of proper analysis of the effects of partial scores, may in some cases prevent researchers from identifying factors that are in fact necessary and/or sufficient conditions of social change, because the impact of such factors is confused with the much weaker and admittedly non-causal influence of others.
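The principle of aggregation can be illustrated with a short calculation. The sketch below assumes a simplified textbook setting that is not taken from the article: k standardized items, each correlating equally with an outcome, and all pairs of items sharing the same inter-item correlation. The function name and the numeric inputs are hypothetical.

```python
import math


def composite_criterion_correlation(k, item_criterion_r, inter_item_r):
    """Correlation between an unweighted sum of k standardized items and an outcome,
    assuming equal item-outcome correlations and equal inter-item correlations."""
    covariance_with_outcome = k * item_criterion_r       # cov(sum of items, outcome)
    variance_of_sum = k + k * (k - 1) * inter_item_r     # var(sum of items)
    return covariance_with_outcome / math.sqrt(variance_of_sum)


# Illustrative inputs: each item correlates 0.30 with the outcome and 0.25 with other items.
single_item = composite_criterion_correlation(1, 0.30, 0.25)
four_items = composite_criterion_correlation(4, 0.30, 0.25)

print(round(single_item, 3), round(four_items, 3))
```

Under these assumptions the composite correlates more strongly with the outcome than any single item does, purely as a by-product of summation. This is why a stronger total-score correlation does not by itself show that every component contributes to the association.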

REFERENCES

Abramson, Paul, and Inglehart, Ronald. 1995. Value Change in Global Perspective. Ann Arbor, MI: University of Michigan Press.

Adcock, Robert, and Collier, David. 2001. “Measurement Validity: A Shared Standard for Qualitative and Quantitative Research.” American Political Science Review 95 (3): 529–46.

Alemán, José, and Woods, Dwayne. 2016. “Value Orientations from the World Values Survey: How Comparable Are They Cross-Nationally?” Comparative Political Studies 49 (8): 1039–67.

Alexander, Amy, Inglehart, Ronald, and Welzel, Christian. 2012. “Measuring Effective Democracy: A Defense.” International Political Science Review 33 (1): 41–62.

Alexander, Amy, Inglehart, Ronald, and Welzel, Christian. 2016. “Emancipating Sexuality: Breakthroughs into a Bulwark of Tradition.” Social Indicators Research 129 (2): 909–35.

Alexander, Amy, and Welzel, Christian. 2011. “Empowering Women: The Role of Emancipative Beliefs.” European Sociological Review 27 (3): 364–84.

Alwin, Duane. 2007. Margins of Error: A Study of Reliability in Survey Measurement. New York: John Wiley & Sons.

Andersen, Robert, and Fetner, Tina. 2008. “Economic Inequality and Intolerance: Attitudes toward Homosexuality in 35 Democracies.” American Journal of Political Science 52 (4): 942–58.

Bergh, Johannes. 2007. “Gender Attitudes and Modernization Processes.” International Journal of Public Opinion Research 19 (1): 5–23.

Brown, Timothy. 2015. Confirmatory Factor Analysis for Applied Research. London, UK: Guilford Press.

Cieciuch, Jan, Davidov, Eldad, Algesheimer, Rene, and Schmidt, Peter. 2017. “Testing for Approximate Measurement Invariance of Human Values in the European Social Survey.” Sociological Methods & Research. Published online ahead of print April 10, 2017. https://doi.org/10.1177/0049124117701478

Coltman, Tim, Devinney, Timothy, Midgley, David, and Venaik, Sunil. 2008. “Formative versus Reflective Measurement Models: Two Applications of Formative Measurement.” Journal of Business Research 61 (12): 1250–62.

Davidov, Eldad. 2008. “A Cross-Country and Cross-Time Comparison of the Human Values Measurements with the Second Round of the European Social Survey.” Survey Research Methods 2 (1): 33–46.

Davidov, Eldad, Meuleman, Bart, Cieciuch, Jan, Schmidt, Peter, and Billiet, Jaak. 2014. “Measurement Equivalence in Cross-National Research.” Annual Review of Sociology 40: 55–75.

Davis, Darren W., Dowley, Kathleen M., and Silver, Brian D. 1999. “Postmaterialism in World Societies: Is It Really a Value Dimension?” American Journal of Political Science 43 (3): 935–62.

Diamantopoulos, Adamantios, and Papadopoulos, Nicolas. 2010. “Assessing the Cross-National Invariance of Formative Measures: Guidelines for International Business Researchers.” Journal of International Business Studies 41 (2): 360–70.

He, Jia, and van de Vijver, Fons. 2015. “Effects of a General Response Style on Cross-Cultural Comparisons: Evidence from the Teaching and Learning International Survey.” Public Opinion Quarterly 79 (S1): 267–90.

Hoijtink, Herbert, and van de Schoot, Rens. 2017. “Testing Small Variance Priors Using Prior-Posterior Predictive P-Values.” Psychological Methods. Published online ahead of print April 3, 2017. https://doi.org/10.1037/met0000131

Inglehart, Ronald. 1971. “The Silent Revolution in Europe: Intergenerational Change in Post-Industrial Societies.” American Political Science Review 65 (4): 991–1017.

Inglehart, Ronald. 1977. The Silent Revolution. Princeton, NJ: Princeton University Press.

Inglehart, Ronald. 1990. Culture Shift in Advanced Industrial Society. Princeton, NJ: Princeton University Press.

Inglehart, Ronald. 1997. Modernization and Postmodernization: Cultural, Economic, and Political Change in 43 Societies. Princeton, NJ: Princeton University Press.

Inglehart, Ronald, and Appel, David. 1989. “The Rise of Postmaterialist Values and Changing Religious Orientations, Gender Roles and Sexual Norms.” International Journal of Public Opinion Research 1 (1): 45–75.

Inglehart, Ronald, and Baker, Wayne. 2000. “Modernization, Cultural Change and the Persistence of Traditional Values.” American Sociological Review 65 (1): 19–51.

Inglehart, Ronald, and Norris, Pippa. 2003. Rising Tide: Gender Equality and Cultural Change Around the World. Cambridge, UK: Cambridge University Press.

Inglehart, Ronald, and Norris, Pippa. 2004. Sacred and Secular: Religion and Politics Worldwide. Cambridge, UK: Cambridge University Press.

Inglehart, Ronald F., Ponarin, Eduard, and Inglehart, Ronald C. 2017. “Cultural Change, Slow and Fast: The Distinctive Trajectory of Norms Governing Gender Equality and Sexual Orientation.” Social Forces 95 (4): 1313–40.

Inglehart, Ronald, Puranen, Bi, and Welzel, Christian. 2015. “Declining Willingness to Fight for One's Country: The Individual-Level Basis of the Long Peace.” Journal of Peace Research 52 (4): 418–34.

Inglehart, Ronald, and Welzel, Christian. 2003. “Political Culture and Democracy: Analyzing Cross-Level Linkages.” Comparative Politics 36 (1): 61–79.

Inglehart, Ronald, and Welzel, Christian. 2005. Modernization, Cultural Change, and Democracy: The Human Development Sequence. Cambridge, UK: Cambridge University Press.

Inglehart, Ronald, and Welzel, Christian. 2010. “Changing Mass Priorities: The Link between Modernization and Democracy.” Perspectives on Politics 8 (2): 551–67.

Ippel, Lianne, Gelissen, John, and Moors, Guy. 2014. “Investigating Longitudinal and Cross Cultural Measurement Invariance of Inglehart's Short Post-Materialism Scale.” Social Indicators Research 115 (3): 919–32.

Jarvis, Cheryl Burke, MacKenzie, Scott, and Podsakoff, Philip. 2003. “A Critical Review of Construct Indicators and Measurement Model Misspecification in Marketing and Consumer Research.” Journal of Consumer Research 30 (2): 199–218.

Law, Kenneth, Wong, Chi-Sum, and Mobley, William. 1998. “Toward a Taxonomy of Multidimensional Constructs.” Academy of Management Review 23 (4): 741–55.

MacIntosh, Randall. 1998. “Global Attitude Measurement: An Assessment of the World Values Survey Postmaterialism Scale.” American Sociological Review 63 (3): 452–64.

Muthén, Bengt, and Asparouhov, Tihomir. 2013. “BSEM Measurement Invariance Analysis.” Mplus Web Notes 17. https://www.statmodel.com/examples/webnotes/webnote17.pdf

Oberski, Daniel. 2014. “Evaluating Sensitivity of Parameters of Interest to Measurement Invariance in Latent Variable Models.” Political Analysis 22 (1): 45–60.

Przeworski, Adam, and Teune, Henry. 1966. “Equivalence in Cross-National Research.” Public Opinion Quarterly 30 (4): 551–68.

Rushton, Philippe, Brainerd, Charles, and Pressley, Michael. 1983. “Behavioral Development and Construct Validity: The Principle of Aggregation.” Psychological Bulletin 94 (1): 18–38.

Sacchi, Stefan. 1998. “The Dimensionality of Postmaterialism: An Application of Factor Analysis to Ranked Preference Data.” European Sociological Review 14 (2): 151–75.

Schwartz, Shalom. 1992. “Universals in the Content and Structure of Values: Theoretical Advances and Empirical Tests in 20 Countries.” Advances in Experimental Social Psychology 25: 1–65.

Seligson, Mitchell. 2002. “The Renaissance of Political Culture or the Renaissance of the Ecological Fallacy?” Comparative Politics 34 (3): 273–92.

Sijtsma, Klaas. 2009. “On the Use, the Misuse, and the Very Limited Usefulness of Cronbach's Alpha.” Psychometrika 74 (1): 107–20.

Steenkamp, Jan-Benedict, and Baumgartner, Hans. 1998. “Assessing Measurement Invariance in Cross-National Consumer Research.” Journal of Consumer Research 25 (1): 78–107.

Stegmueller, Daniel. 2011. “Apples and Oranges? The Problem of Equivalence in Comparative Research.” Political Analysis 19 (4): 471–87.

Tavakol, Mohsen, and Dennick, Reg. 2011. “Making Sense of Cronbach's Alpha.” International Journal of Medical Education 2: 53–5.

Van De Schoot, Rens, Kluytmans, Anouck, Tummers, Lars, Lugtig, Peter, Hox, Joop, and Muthén, Bengt. 2013. “Facing off with Scylla and Charybdis: A Comparison of Scalar, Partial, and the Novel Possibility of Approximate Measurement Invariance.” Frontiers in Psychology 4: 770. https://doi.org/10.3389/fpsyg.2013.00770

Van Vlimmeren, Eva, Moors, Guy, and Gelissen, John. 2017. “Clusters of Cultures: Diversity in Meaning of Family Value and Gender Role Items across Europe.” Quality & Quantity 51 (6): 2737–60.

Welzel, Christian. 2006. “Democratization as an Emancipative Process: The Neglected Role of Mass Motivations.” European Journal of Political Research 45 (6): 871–96.

Welzel, Christian. 2007. “Are Levels of Democracy Affected by Mass Attitudes? Testing Attainment and Sustainment Effects on Democracy.” International Political Science Review 28 (4): 397–424.

Welzel, Christian. 2010. “How Selfish are Self-Expression Values? A Civicness Test.” Journal of Cross-Cultural Psychology 41 (2): 152–74.

Welzel, Christian. 2013. Freedom Rising. New York, NY: Cambridge University Press.

Welzel, Christian, and Deutsch, Franziska. 2012. “Emancipative Values and Non-Violent Protest: The Importance of ‘Ecological’ Effects.” British Journal of Political Science 42 (2): 465–79.

Welzel, Christian, and Inglehart, Ronald. 2006. “Emancipative Values and Democracy: Response to Hadenius and Teorell.” Studies in Comparative International Development 41 (3): 74–94.

Welzel, Christian, and Inglehart, Ronald. 2016. “Misconceptions of Measurement Equivalence: Time for a Paradigm Shift.” Comparative Political Studies 49 (8): 1068–94.

Welzel, Christian, Inglehart, Ronald, and Klingemann, Hans-Dieter. 2003. “The Theory of Human Development: A Cross-Cultural Analysis.” European Journal of Political Research 42 (3): 341–80.

FIGURE 1. Measurement model for the EVI

Note: Rectangles represent observed variables, ovals latent ones.

TABLE 1. Group-Specific CFAs of 12 Variables from the Sixth Wave of the WVS for Ten Cultural Zones (2010–2014)

TABLE 2. Pairwise Correlations between Effective Democracy and Various Components of Emancipative Values

TABLE 3. Regressing Effective Democracy on Various Components of Emancipative Values
