  • Print publication year: 2007
  • Online publication date: November 2009

4 - Test Validity in Cognitive Assessment



Scientific theories can be viewed as attempts to explain phenomena by showing how they would arise if certain assumptions concerning the structure of the world were true. Such theories invariably involve a reference to theoretical entities and attributes. Theoretical attributes include such things as electrical charge and distance in physics, inclusive fitness and selective pressure in biology, brain activity and anatomic structure in neuroscience, and intelligence and developmental stages in psychology. These attributes are not subject to direct observation; instead, the researcher must infer the positions of objects on the attribute from a set of observations.

To make such inferences, one needs to have an idea of how different observations map onto different positions on the attribute (which, after all, is not itself observable). This requires a measurement model. A measurement model explicates how the structure of theoretical attributes relates to the structure of observations. For instance, a measurement model for temperature may stipulate how the level of mercury in a thermometer is systematically related to temperature, and a measurement model for intelligence may specify how IQ scores are related to general intelligence.
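
As a rough illustration of what such a model might look like, the sketch below assumes the simplest possible case: a linear relation between an unobservable attribute position and an observed score, plus random error. The linear form, the function names, and the numerical values (an intercept of 100 and a loading of 15, chosen only to resemble an IQ-style scale) are assumptions made for this example and are not taken from the chapter.

```python
import random

def simulate_observations(attribute_values, intercept, loading, noise_sd, rng):
    """Produce one observed score per attribute position.

    Assumed model (hypothetical): score = intercept + loading * position + error,
    where error is drawn from a normal distribution with standard deviation noise_sd.
    """
    return [intercept + loading * theta + rng.gauss(0.0, noise_sd)
            for theta in attribute_values]

def infer_positions(observations, intercept, loading):
    """Invert the assumed model to estimate attribute positions from scores."""
    return [(x - intercept) / loading for x in observations]

# Hypothetical attribute positions for four objects (never observed directly).
true_positions = [-1.0, 0.0, 1.0, 2.0]

rng = random.Random(0)  # fixed seed so the run is repeatable
scores = simulate_observations(true_positions, intercept=100.0, loading=15.0,
                               noise_sd=5.0, rng=rng)
estimates = infer_positions(scores, intercept=100.0, loading=15.0)

for theta, x, est in zip(true_positions, scores, estimates):
    print(f"position={theta:+.1f}  observed={x:7.2f}  inferred={est:+.2f}")
```

The inferred positions are only as good as the assumed model: if the true relation between attribute and observation were not linear, or the intercept and loading were wrong, inverting the model would give systematically misleading estimates.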

The reliance on a process of measurement and the associated measurement model usually involves a degree of uncertainty; the researcher assumes, but cannot know for sure, that a measurement procedure is appropriate in a given situation.

Cognitive Diagnostic Assessment for Education
  • Online ISBN: 9780511611186