
Multiple-Choice Tests: Polytomous IRT Models Misestimate Item Information

Published online by Cambridge University Press: 18 December 2014

Miguel A. García-Pérez*

Affiliation: Universidad Complutense (Spain)

*Correspondence concerning this article should be sent to Miguel A. García-Pérez, Departamento de Metodología, Facultad de Psicología, Universidad Complutense, Campus de Somosaguas, 28223 Madrid (Spain). Phone: +34-913943061. Fax: +34-913943189. E-mail: miguel@psi.ucm.es

Abstract

Likert-type items and polytomous models are preferred over yes–no items and dichotomous models for the measurement of attitudes, because a broader range of response categories provides superior item and test information functions. Yet, for ability assessment with multiple-choice tests, the dichotomous three-parameter logistic model (3PLM) is often chosen. Because multiple-choice responses are polytomous before they are scored as correct or incorrect, a polytomous characterization might yield more efficient tests. Early studies suggested that the nominal response model (NRM) is advantageous in this respect. We investigate the reasons for those results and the outcomes of a polytomous characterization based on the multiple-choice model (MCM). An empirical data set is used to compare polytomous (NRM and MCM) and dichotomous (3PLM) characterizations of a test. The results revealed superior item and test information functions under the polytomous models, but close inspection suggested that these outcomes were artifactual, and two simulation studies confirmed this point. These studies revealed that the NRM is structurally inadequate for multiple-choice items and that the MCM characterization outperforms the 3PLM characterization only when distractor endorsement frequencies vary non-monotonically with ability, a feature rarely observed in empirical data sets.
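
For context, the models named above have standard forms in the IRT literature; the following is a minimal sketch in conventional notation (textbook definitions, not equations reproduced from the article itself). Under the 3PLM, the probability that an examinee of ability $\theta$ answers item $i$ correctly is

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}},$$

where $a_i$, $b_i$, and $c_i$ are the discrimination, difficulty, and pseudo-guessing parameters, and the corresponding item information function is

$$I_i(\theta) = a_i^2\,\frac{1 - P_i(\theta)}{P_i(\theta)} \left( \frac{P_i(\theta) - c_i}{1 - c_i} \right)^{2},$$

with the test information function given by the sum $I(\theta) = \sum_i I_i(\theta)$. Under the NRM, the probability of endorsing category $k$ of an item with $m$ response categories is

$$P_k(\theta) = \frac{\exp(a_k \theta + c_k)}{\sum_{h=1}^{m} \exp(a_h \theta + c_h)},$$

with one slope–intercept pair $(a_k, c_k)$ per category. The MCM extends this setup with a latent "don't know" category whose responses are distributed across the observed alternatives, which is how it accommodates guessing.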

Type
Research Article
Copyright
Copyright © Universidad Complutense de Madrid and Colegio Oficial de Psicólogos de Madrid 2014 

