An Analysis of (Dis)Ordered Categories, Thresholds, and Crossings in Difference and Divide-by-Total IRT Models for Ordered Responses

Published online by Cambridge University Press:  13 February 2017

Miguel A. García-Pérez*
Affiliation:
Universidad Complutense (Spain)
*Correspondence concerning this article should be addressed to Miguel A. García-Pérez. Universidad Complutense. Departamento de Metodología. Facultad de Psicología. Campus de Somosaguas. 28223. Madrid (Spain). Phone: +34–913943061. E-mail: miguel@psi.ucm.es

Abstract

Threshold parameters have distinct referents across models for ordered responses. In difference models, thresholds are trait levels at which responding beyond category k is as likely as responding at or below it; in divide-by-total models, thresholds are trait levels at which responding in category k is as likely as responding in category k – 1. Thus, thresholds in divide-by-total models (but not in difference models) are the crossings of the option response functions for consecutive categories. Thresholds in difference models are always ordered but they may inconsequentially yield ordered or disordered crossings. In contrast, assimilation of thresholds and crossings in divide-by-total models questions category order when crossings are disordered. We analyze these aspects of difference and divide-by-total models, their relation to the order of response categories, and the consequences of collapsing categories to instate ordered crossings under divide-by-total models. We also show that item parameters in models for ordered responses can never contradict the pre-assumed order of categories and that the empirical order can only be established using a polytomous model that does not assume ordered categories, although this often gives rise to spurious outcomes. Practical implications for scale development are discussed.
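The two meanings of "threshold" described in the abstract can be illustrated with a minimal numerical sketch. It assumes logistic forms for both model families (a graded-response form for the difference model and a generalized-partial-credit form for the divide-by-total model); the function names, the discrimination value, and the threshold values are hypothetical and chosen only for illustration, not taken from the article.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def difference_probs(theta, a, thresholds):
    """Difference model (graded-response form, assumed logistic)."""
    # P(X >= k) for each boundary, padded with 1 (k = 0) and 0 (past the top category).
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Category probability is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def divide_by_total_probs(theta, a, thresholds):
    """Divide-by-total model (generalized-partial-credit form, assumed logistic)."""
    # The exponent for category k is the running sum of a * (theta - b_j) for j <= k.
    exponents = [0.0]
    for b in thresholds:
        exponents.append(exponents[-1] + a * (theta - b))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


a = 1.0                      # hypothetical discrimination
b = [-1.0, 0.5, 1.2]         # hypothetical thresholds b_1, b_2, b_3
theta = b[1]                 # evaluate both models at the second threshold

g = difference_probs(theta, a, b)
beyond = sum(g[2:])          # P(X > 1): responding beyond category 1
at_or_below = sum(g[:2])     # P(X <= 1): responding at or below category 1

p = divide_by_total_probs(theta, a, b)

print("difference model:      P(X > 1) =", round(beyond, 4),
      " P(X <= 1) =", round(at_or_below, 4))
print("difference model:      P(X = 1) =", round(g[1], 4),
      " P(X = 2) =", round(g[2], 4))
print("divide-by-total model: P(X = 1) =", round(p[1], 4),
      " P(X = 2) =", round(p[2], 4))
```

Under these assumptions, the difference model splits the response scale into two equally likely halves at the threshold, while the option response functions for categories 1 and 2 need not cross there. In the divide-by-total model the same trait level is exactly where those two functions cross.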

Type
Research Article
Copyright
Copyright © Universidad Complutense de Madrid and Colegio Oficial de Psicólogos de Madrid 2017 

Supplementary material: PDF

García-Pérez supplementary material

Figures A1-A5
