
Conditional Statistical Inference with Multistage Testing Designs

Published online by Cambridge University Press: 01 January 2025

Robert J. Zwitser*
Affiliation:
Cito Institute for Educational Measurement
Gunter Maris
Affiliation:
Cito Institute for Educational Measurement and University of Amsterdam
* Requests for reprints should be sent to Robert J. Zwitser, Psychometric Research Center, Cito Institute for Educational Measurement, P.O. Box 1034, 6801 MG, Arnhem, The Netherlands. E-mail: robert.zwitser@cito.nl

Abstract

In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.

Type: Original Paper
Copyright © 2013 The Psychometric Society

