
Does Standard Deviation Matter? Using “Standard Deviation” to Quantify Security of Multistage Testing

Published online by Cambridge University Press: 01 January 2025

Chun Wang* (University of Minnesota at Twin-Cities)
Yi Zheng (University of Illinois at Urbana-Champaign)
Hua-Hua Chang (University of Illinois at Urbana-Champaign)

* Requests for reprints should be sent to Chun Wang, University of Minnesota at Twin-Cities, 75 East River Road, Elliott Hall N658, Minneapolis, MN 55403, USA. E-mail: wang4066@umn.edu

Abstract

With the advent of web-based technology, online testing is becoming a mainstream mode of large-scale educational assessment. Most online tests are administered continuously within a testing window, which may pose test security problems because examinees who take the test earlier can share information with those who take it later. Researchers have proposed various statistical indices to assess test security; one of the most frequently used is the average test-overlap rate, which was later generalized to the item pooling index (Chang & Zhang, 2002, 2003). These indices, however, are all defined as means (i.e., the expected proportion of items shared among examinees), and they were originally proposed for computerized adaptive testing (CAT). Recently, multistage testing (MST) has become a popular alternative to CAT. The unique features of MST make it important to report not only the mean but also the standard deviation (SD) of the test overlap rate, as we advocate in this paper. The SD of the test overlap rate adds important information to the test security profile because, for the same mean, a large SD indicates that certain groups of examinees share more items in common than other groups do. In this study, we analytically derived the lower bounds of the SD under MST, using the results under CAT as a benchmark. It is shown that when the mean overlap rate is the same for MST and CAT, the SD of the test overlap rate tends to be larger in MST. A simulation study was conducted to provide empirical evidence. We also compared the security of MST under single-pool versus multiple-pool designs; both the analytical and simulation results show that the non-overlapping multiple-pool design slightly increases the security risk.
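
To make the statistic concrete, here is a minimal Python sketch (not taken from the paper) of how one might compute pairwise test-overlap rates and summarize them by their mean and SD; the function name, data layout, and toy item sets are illustrative assumptions. It reproduces the point made in the abstract: a grouped pattern of item sharing can leave the mean overlap rate moderate while the SD is large.

# Minimal sketch (not from the paper): pairwise test-overlap rates and their
# mean and SD, computed from the item sets administered to each examinee.
# The function name, data layout, and toy numbers are illustrative assumptions.
from itertools import combinations
import statistics

def overlap_rates(administered, test_length):
    # Overlap rate for a pair of examinees = number of shared items / test length.
    return [len(a & b) / test_length for a, b in combinations(administered, 2)]

# Toy example: two "panels" of a fixed-length (4-item) test. Examinees within
# a panel share 3 of 4 items; examinees across panels share none.
administered = [
    {1, 2, 3, 4},   # examinee A (panel 1)
    {1, 2, 3, 5},   # examinee B (panel 1)
    {6, 7, 8, 9},   # examinee C (panel 2)
    {6, 7, 8, 10},  # examinee D (panel 2)
]
rates = overlap_rates(administered, test_length=4)
print(f"mean overlap rate = {statistics.mean(rates):.3f}")   # 0.250
print(f"SD of overlap rate = {statistics.stdev(rates):.3f}")  # 0.387: sharing is concentrated within panels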

Information

Type: Original Paper
Copyright: © 2013 The Psychometric Society

References

Ariel, A., Veldkamp, B. P., & van der Linden, W. J. (2004). Constructing rotating item pools for constrained adaptive testing. Journal of Educational Measurement, 41, 345–359.
Barrada, J. R., Olea, J., & Abad, F. J. (2008). Rotating item banks versus restriction of maximum exposure rates in computerized adaptive testing. The Spanish Journal of Psychology, 11, 618–625.
Breithaupt, K., & Hare, D. R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67(1), 5–20.
Chang, H.-H. (2004). Understanding computerized adaptive testing: From Robbins-Monro to Lord and beyond. In Kaplan, D. (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 117–133). Thousand Oaks: Sage.
Chang, H.-H., Wang, S., & Ying, Z. (1997, April). Three-dimensional visualization of item/test information. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
Chang, H.-H., & Ying, Z. (1999). Alpha-stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23, 211–222.
Chang, H.-H., & Zhang, J. (2002). Hypergeometric family and item overlap rates in computerized adaptive testing. Psychometrika, 67, 387–398.
Chang, H.-H., & Zhang, J. (2003, April). Assessing CAT security breaches by the item pooling index—to compromise a CAT item bank, how many thieves are needed? Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Chen, S. Y., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and test overlap in computerized adaptive testing. Journal of Educational Measurement, 40, 129–145.
Cheng, Y., & Chang, H.-H. (2009). The maximum priority index method for severely constrained item selection in computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 62, 369–383.
Davey, T., & Nering, M. (2002). Controlling item exposure and maintaining item security. In Mills, C., Potenza, M. T., Fremer, J. J., & Ward, W. C. (Eds.), Computer-based testing: Building the foundation for future assessments. Mahwah: Lawrence Erlbaum Associates.
Dean, V., & Martineau, J. (2012). A state perspective on enhancing assessment and accountability systems through systematic implementation of technology. In Lissitz, R. W., & Jiao, H. (Eds.), Computers and their impact on state assessment: Recent history and predictions for the future (pp. 55–77). Charlotte: Information Age Publishing.
Finkelman, M., Nering, M. L., & Roussos, L. A. (2009). A conditional exposure control method for multidimensional adaptive testing. Journal of Educational Measurement, 46(1), 84–103.
Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26, 44–52.
Kim, H., & Plake, B. (1993, April). Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing. Paper presented at the annual meeting of the National Council on Measurement in Education, Atlanta, GA.
Lim, E. (2010). The effectiveness of using multiple item pools to increase test security in computerized adaptive testing. Unpublished doctoral thesis, University of Illinois at Urbana-Champaign.
Luecht, R. M., & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229–249.
Mills, C. N., & Steffen, M. (2000). The GRE computer adaptive test: Operational issues. In van der Linden, W. J., & Glas, C. A. W. (Eds.), Computerized adaptive testing: Theory and practice (pp. 75–99). Dordrecht: Kluwer.
Stocking, M. L., & Lewis, C. (1998). Controlling item exposure conditional on ability in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57–75.
Wang, C., & Chang, H.-H. (2008, June). Continuous a-stratification index in computerized item selection. Paper presented at the annual meeting of the Psychometric Society, Durham, NH.
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17, 17–27.
Way, W., Zara, A., & Leahy, J. (1996, April). Modifying the NCLEX™ CAT item selection algorithm to improve item exposure. Paper presented at the annual meeting of the American Educational Research Association, New York, NY.
Yi, Q., Zhang, J., & Chang, H.-H. (2008). Severity of organized item theft in computerized adaptive testing: A simulation study. Applied Psychological Measurement, 32(7), 543–558.
Zhang, J., Chang, H.-H., & Yi, Q. (2012). Comparing single-pool and multiple-pool designs regarding test security in computerized testing. Behavior Research Methods, 44, 742–752.