Test Validity in Cognitive Assessment

Denny Borsboom; Gideon J. Mellenbergh

doi:10.1017/CBO9780511611186.004

4 - Test Validity in Cognitive Assessment

Published online by Cambridge University Press: 23 November 2009

Edited by Jacqueline Leighton and Mark Gierl

Denny Borsboom: Assistant Professor of Psychology, University of Amsterdam
Gideon J. Mellenbergh: Professor of Psychology, University of Amsterdam
Jacqueline Leighton: University of Alberta
Mark Gierl: University of Alberta

Summary

INTRODUCTION

Scientific theories can be viewed as attempts to explain phenomena by showing how they would arise if certain assumptions concerning the structure of the world were true. Such theories invariably involve a reference to theoretical entities and attributes. Theoretical attributes include such things as electrical charge and distance in physics, inclusive fitness and selective pressure in biology, brain activity and anatomic structure in neuroscience, and intelligence and developmental stages in psychology. These attributes are not subject to direct observation but require an inferential process by which the researcher infers positions of objects on the attribute on the basis of a set of observations.

To make such inferences, one needs to have an idea of how different observations map onto different positions on the attribute (which, after all, is not itself observable). This requires a measurement model. A measurement model explicates how the structure of theoretical attributes relates to the structure of observations. For instance, a measurement model for temperature may stipulate how the level of mercury in a thermometer is systematically related to temperature, or a measurement model for intelligence may specify how IQ scores are related to general intelligence.
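The thermometer example can be made concrete with a small sketch. It is not taken from the chapter: it assumes, purely for illustration, that the observed mercury level is a linear function of the unobserved temperature plus random error, and the calibration values (`INTERCEPT`, `SLOPE`, `NOISE_SD`) and function names are hypothetical. Under that assumed model, the researcher infers the position on the attribute by inverting the mapping from attribute to observation.

```python
import random


def observe(attribute, intercept, slope, noise_sd, rng):
    """Generate one observation from an assumed linear measurement model."""
    return intercept + slope * attribute + rng.gauss(0.0, noise_sd)


def infer_attribute(observations, intercept, slope):
    """Average the observations, then map the mean back to the attribute scale."""
    mean_obs = sum(observations) / len(observations)
    return (mean_obs - intercept) / slope


# Hypothetical calibration: mercury level in mm as a function of degrees Celsius.
INTERCEPT, SLOPE, NOISE_SD = 50.0, 2.0, 0.5

rng = random.Random(0)
true_temperature = 20.0  # the attribute itself is never observed directly

# Only the mercury readings are available to the researcher.
readings = [
    observe(true_temperature, INTERCEPT, SLOPE, NOISE_SD, rng)
    for _ in range(200)
]

estimate = infer_attribute(readings, INTERCEPT, SLOPE)
print(f"estimated temperature: {estimate:.2f}")
```

The inference is only as good as the assumed model: if the true relation between mercury level and temperature were not linear, or the calibration values were wrong, the same readings would yield a systematically different estimate.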

The reliance on a process of measurement and the associated measurement model usually involves a degree of uncertainty; the researcher assumes, but cannot know for sure, that a measurement procedure is appropriate in a given situation.

Type: Chapter
Information: Cognitive Diagnostic Assessment for Education: Theory and Applications, pp. 85–116

DOI: https://doi.org/10.1017/CBO9780511611186.004

Publisher: Cambridge University Press

Print publication year: 2007
