
Apples and Oranges? The Problem of Equivalence in Comparative Research

Published online by Cambridge University Press:  04 January 2017

Daniel Stegmueller*
Nuffield College, University of Oxford, New Road, Oxford, OX1 1NF, United Kingdom, and School of Social Sciences, University of Mannheim, Germany

Comparative researchers increasingly rely on individual-level data to test theories involving unobservable constructs such as attitudes and preferences. Estimation is carried out using large-scale cross-national survey data that provide responses from individuals living in widely varying contexts. This strategy rests on the assumption of equivalence, that is, the absence of systematic distortion in the response behavior of individuals from different countries. However, this assumption is frequently violated, with rather grave consequences for comparability and interpretation. I present a multilevel mixture ordinal item response model with item bias effects that is able to establish equivalence. It corrects for systematic measurement error induced by unobserved country heterogeneity, and it allows for the simultaneous estimation of structural parameters of interest.
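To make the equivalence problem concrete, the following is a minimal illustrative sketch, not the model estimated in the article. It assumes a single ordinal item following a generic graded-response (cumulative logit) form, and it introduces a hypothetical country-specific shift called `bias` to stand in for an item bias effect. All names and numeric values (`category_probs`, `thresholds`, the discrimination of 1.2, the bias of 0.8) are assumptions chosen for illustration only.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(theta, discrimination, thresholds, bias=0.0):
    """Category probabilities for one ordinal item (cumulative logit form).

    theta          latent attitude of the respondent
    discrimination item slope (assumed value, for illustration)
    thresholds     increasing cut points between adjacent categories
    bias           hypothetical country-specific shift (item bias)
    """
    # P(y > k) for each threshold k; decreasing because thresholds increase.
    exceed = [logistic(discrimination * theta + bias - t) for t in thresholds]
    upper = [1.0] + exceed
    lower = exceed + [0.0]
    # P(y = k) is the difference of adjacent cumulative probabilities.
    return [u - l for u, l in zip(upper, lower)]


def expected_score(probs):
    """Expected response on a 1..K scale."""
    return sum(k * p for k, p in enumerate(probs, start=1))


# Two respondents with the SAME latent attitude, in two hypothetical countries.
thresholds = [-1.5, -0.5, 0.5, 1.5]
theta = 0.0
unbiased = category_probs(theta, 1.2, thresholds)
biased = category_probs(theta, 1.2, thresholds, bias=0.8)

print("expected response, no item bias:  ", round(expected_score(unbiased), 3))
print("expected response, with item bias:", round(expected_score(biased), 3))
```

Under these assumptions, the two respondents hold identical attitudes but give systematically different observed answers, so comparing raw responses across the two countries would confound attitude differences with response behavior. This is the kind of distortion the equivalence assumption rules out.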

Copyright © The Author 2011. Published by Oxford University Press on behalf of the Society for Political Methodology 


Supplementary material: PDF
