Apples and Oranges? The Problem of Equivalence in Comparative Research

  • Daniel Stegmueller


Comparative researchers increasingly rely on individual-level data to test theories involving unobservable constructs such as attitudes and preferences. Estimation is carried out using large-scale cross-national survey data, which provide responses from individuals living in widely varying contexts. This strategy rests on the assumption of equivalence, that is, that no systematic distortion exists in the response behavior of individuals from different countries. However, this assumption is frequently violated, with rather grave consequences for comparability and interpretation. I present a multilevel mixture ordinal item response model with item bias effects that is able to establish equivalence. It corrects for systematic measurement error induced by unobserved country heterogeneity, and it allows for the simultaneous estimation of structural parameters of interest.

Supplementary materials: Stegmueller supplementary material, PDF (97 KB)
