  • Print publication year: 2007
  • Online publication date: November 2009

4 - Test Validity in Cognitive Assessment



Scientific theories can be viewed as attempts to explain phenomena by showing how they would arise if certain assumptions concerning the structure of the world were true. Such theories invariably refer to theoretical entities and attributes. Theoretical attributes include such things as electrical charge and distance in physics, inclusive fitness and selective pressure in biology, brain activity and anatomic structure in neuroscience, and intelligence and developmental stages in psychology. These attributes are not subject to direct observation; instead, the researcher must infer the positions of objects on the attribute from a set of observations.

To make such inferences, one needs an idea of how different observations map onto different positions on the attribute (which, after all, is not itself observable). This requires a measurement model. A measurement model explicates how the structure of theoretical attributes relates to the structure of observations. For instance, a measurement model for temperature may stipulate how the level of mercury in a thermometer is systematically related to temperature, and a measurement model for intelligence may specify how IQ scores are related to general intelligence.
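
As a rough illustration of the thermometer example, a measurement model can be sketched as a function from attribute positions to expected observations, together with its inverse for inference. The sketch below assumes a simple linear relation; the class name, the parameter names, and the calibration values are hypothetical and chosen only for illustration, not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearMeasurementModel:
    """Hypothetical model: observation = intercept + slope * attribute."""

    intercept: float  # observation when the attribute is at zero
    slope: float      # change in observation per unit of the attribute

    def expected_observation(self, attribute: float) -> float:
        """Map a position on the (unobserved) attribute to an observation."""
        return self.intercept + self.slope * attribute

    def infer_attribute(self, observation: float) -> float:
        """Invert the model: infer the attribute position from an observation."""
        return (observation - self.intercept) / self.slope


# Made-up calibration: mercury stands at 20 mm at 0 degrees and rises
# 2 mm per degree. These numbers are assumptions for the example only.
thermometer = LinearMeasurementModel(intercept=20.0, slope=2.0)

height_mm = thermometer.expected_observation(25.0)  # attribute -> observation
inferred_temperature = thermometer.infer_attribute(height_mm)  # observation -> attribute
```

The inference step is only as good as the assumed relation: if the linear form or the calibration values do not hold in a given situation, `infer_attribute` still returns a number, but that number no longer reflects the attribute.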

The reliance on a process of measurement and the associated measurement model usually involves a degree of uncertainty; the researcher assumes, but cannot know for sure, that a measurement procedure is appropriate in a given situation.
