
Does Standard Deviation Matter? Using “Standard Deviation” to Quantify Security of Multistage Testing

Published online by Cambridge University Press: 01 January 2025

Chun Wang* (University of Minnesota at Twin-Cities)
Yi Zheng (University of Illinois at Urbana-Champaign)
Hua-Hua Chang (University of Illinois at Urbana-Champaign)

* Requests for reprints should be sent to Chun Wang, University of Minnesota at Twin-Cities, 75 East River Road, Elliott Hall N658, Minneapolis, MN 55403, USA. E-mail: wang4066@umn.edu

Abstract

With the advent of web-based technology, online testing is becoming a mainstream mode of large-scale educational assessment. Most online tests are administered continuously within a testing window, which may pose test security problems because examinees who take the test earlier can share information with those who take it later. Researchers have proposed various statistical indices to assess test security; one of the most frequently used is the average test-overlap rate, which was later generalized to the item pooling index (Chang & Zhang, 2002, 2003). These indices, however, are all defined as means (i.e., the expected proportion of items shared among examinees), and they were originally proposed for computerized adaptive testing (CAT). Recently, multistage testing (MST) has become a popular alternative to CAT. The unique features of MST make it important to report not only the mean but also the standard deviation (SD) of the test overlap rate, as we advocate in this paper. The SD of the test overlap rate adds important information to the test security profile because, for the same mean, a large SD indicates that certain groups of examinees share more items in common than other groups do. In this study, we analytically derived the lower bounds of the SD under MST, using the results under CAT as a benchmark. It is shown that when the mean overlap rate is the same for MST and CAT, the SD of the test overlap rate tends to be larger in MST. A simulation study was conducted to provide empirical evidence. We also compared the security of MST under single-pool versus multiple-pool designs; both the analytical and simulation results show that the non-overlapping multiple-pool design slightly increases the security risk.
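
To make the statistic concrete, here is a minimal Python sketch (not taken from the paper) of how one might compute pairwise test-overlap rates and summarize them by their mean and SD; the function name, data layout, and toy item sets are illustrative assumptions. It reproduces the point made in the abstract: a grouped pattern of item sharing can leave the mean overlap rate moderate while the SD is large.

# Minimal sketch (not from the paper): pairwise test-overlap rates and their
# mean and SD, computed from the item sets administered to each examinee.
# The function name, data layout, and toy numbers are illustrative assumptions.
from itertools import combinations
import statistics

def overlap_rates(administered, test_length):
    # Overlap rate for a pair of examinees = number of shared items / test length.
    return [len(a & b) / test_length for a, b in combinations(administered, 2)]

# Toy example: two "panels" of a fixed-length (4-item) test. Examinees within
# a panel share 3 of 4 items; examinees across panels share none.
administered = [
    {1, 2, 3, 4},   # examinee A (panel 1)
    {1, 2, 3, 5},   # examinee B (panel 1)
    {6, 7, 8, 9},   # examinee C (panel 2)
    {6, 7, 8, 10},  # examinee D (panel 2)
]
rates = overlap_rates(administered, test_length=4)
print(f"mean overlap rate = {statistics.mean(rates):.3f}")   # 0.250
print(f"SD of overlap rate = {statistics.stdev(rates):.3f}")  # 0.387: sharing is concentrated within panels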

Information

Type: Original Paper
Copyright: © 2013 The Psychometric Society

References

Ariel, A., Veldkamp, B. P., & van der Linden, W. J. (2004). Constructing rotating item pools for constrained adaptive testing. Journal of Educational Measurement, 41, 345–359.
Barrada, J. R., Olea, J., & Abad, F. J. (2008). Rotating item banks versus restriction of maximum exposure rates in computerized adaptive testing. The Spanish Journal of Psychology, 11, 618–625.
Breithaupt, K., & Hare, D. R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67(1), 5–20.
Chang, H.-H. (2004). Understanding computerized adaptive testing: From Robbins-Monro to Lord and beyond. In Kaplan, D. (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 117–133). Thousand Oaks: Sage.
Chang, H.-H., Wang, S., & Ying, Z. (1997, April). Three-dimensional visualization of item/test information. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
Chang, H.-H., & Ying, Z. (1999). Alpha-stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23, 211–222.
Chang, H.-H., & Zhang, J. (2002). Hypergeometric family and item overlap rates in computerized adaptive testing. Psychometrika, 67, 387–398.
Chang, H.-H., & Zhang, J. (2003, April). Assessing CAT security breaches by the item pooling index—to compromise a CAT item bank, how many thieves are needed? Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Chen, S. Y., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and test overlap in computerized adaptive testing. Journal of Educational Measurement, 40, 129–145.
Cheng, Y., & Chang, H.-H. (2009). The maximum priority index method for severely constrained item selection in computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 62, 369–383.
Davey, T., & Nering, M. (2002). Controlling item exposure and maintaining item security. In Mills, C., Potenza, M. T., Fremer, J. J., & Ward, W. C. (Eds.), Computer-based testing: Building the foundation for future assessments. Mahwah: Lawrence Erlbaum Associates.
Dean, V., & Martineau, J. (2012). A state perspective on enhancing assessment and accountability systems through systematic implementation of technology. In Lissitz, R. W., & Jiao, H. (Eds.), Computers and their impact on state assessment: Recent history and predictions for the future (pp. 55–77). Charlotte: Information Age Publishing.
Finkelman, M., Nering, M. L., & Roussos, L. A. (2009). A conditional exposure control method for multidimensional adaptive testing. Journal of Educational Measurement, 46(1), 84–103.
Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26, 44–52.
Kim, H., & Plake, B. (1993, April). Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing. Paper presented at the annual meeting of the National Council on Measurement in Education, Atlanta, GA.
Lim, E. (2010). The effectiveness of using multiple item pools to increase test security in computerized adaptive testing. Unpublished doctoral thesis, University of Illinois at Urbana-Champaign.
Luecht, R. M., & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229–249.
Mills, C. N., & Steffen, M. (2000). The GRE computer adaptive test: Operational issues. In van der Linden, W. J., & Glas, C. A. W. (Eds.), Computerized adaptive testing: Theory and practice (pp. 75–99). Dordrecht: Kluwer.
Stocking, M. L., & Lewis, C. (1998). Controlling item exposure conditional on ability in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57–75.
Wang, C., & Chang, H.-H. (2008, June). Continuous a-stratification index in computerized item selection. Paper presented at the annual meeting of the Psychometric Society, Durham, NH.
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17, 17–27.
Way, W., Zara, A., & Leahy, J. (1996, April). Modifying the NCLEX™ CAT item selection algorithm to improve item exposure. Paper presented at the annual meeting of the American Educational Research Association, New York, NY.
Yi, Q., Zhang, J., & Chang, H.-H. (2008). Severity of organized item theft in computerized adaptive testing: A simulation study. Applied Psychological Measurement, 32(7), 543–558.
Zhang, J., Chang, H.-H., & Yi, Q. (2012). Comparing single-pool and multiple-pool designs regarding test security in computerized testing. Behavior Research Methods, 44, 742–752.