Parallel Optimal Calibration of Mixed-Format Items for Achievement Tests

Published online by Cambridge University Press: 01 January 2025

Frank Miller*
Affiliation: Stockholm University; Linköping University
Ellinor Fackle-Fornius
Affiliation: Stockholm University
* Correspondence should be addressed to Frank Miller, Department of Statistics, Stockholm University, 10691 Stockholm, Sweden. Email: frank.miller@stat.su.se

Abstract

When large achievement tests are conducted regularly, items need to be calibrated before they can be used as operational items in a test. Methods have been developed to optimally assign pretest items to examinees based on their abilities. Most of these methods, however, are intended for situations where examinees arrive sequentially and are assigned to calibration items as they arrive. In several calibration tests, examinees instead take the test simultaneously, or in parallel. In this article, we develop an optimal calibration design tailored to such parallel test setups. Our objective is both to investigate the efficiency gain of the method and to demonstrate that it can be implemented in real calibration scenarios. For the latter, we employed the method to calibrate items for the Swedish national tests in Mathematics. In this case study, as in many real test situations, items are of mixed format, and the optimal design method needs to handle that. The method we propose works for mixed-format tests and accounts for varying expected response times. Our investigations show that the proposed method considerably enhances calibration efficiency.

Information

Type
Theory & Methods
Creative Commons
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
Copyright © 2024 The Author(s)

Figure 1 Response functions for an example GPCM for a 2-point item with $\boldsymbol{\beta}_i = (a_i, b_{i1}, b_{i2}) = (1, -1, 1)$.
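
The response functions in Figure 1 can be reproduced with a short sketch. It assumes the usual GPCM parameterization, in which the probability of score $k$ is proportional to $\exp\big(\sum_{v=1}^{k} a_i(\theta - b_{iv})\big)$, with weight 1 for score 0; the function name `gpcm_probs` is hypothetical and is not taken from the article's supplementary material.

```python
import math
from typing import Sequence


def gpcm_probs(theta: float, a: float, b: Sequence[float]) -> list[float]:
    """Score probabilities P(X = 0), ..., P(X = m) for one GPCM item.

    Assumes the unnormalized weight of score k is
    exp(sum_{v=1}^{k} a * (theta - b_v)), with weight 1 for score 0.
    """
    # Cumulative sums of a * (theta - b_v); the empty sum (score 0) is 0.
    exponents = [0.0]
    for b_v in b:
        exponents.append(exponents[-1] + a * (theta - b_v))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Example item from Figure 1: beta_i = (a_i, b_i1, b_i2) = (1, -1, 1).
for theta in (-2.0, 0.0, 2.0):
    probs = gpcm_probs(theta, a=1.0, b=(-1.0, 1.0))
    print(theta, [round(p, 3) for p in probs])
```

Under this parameterization, adjacent score curves cross at $\theta = b_{iv}$ (for example, scores 0 and 1 are equally likely at $\theta = -1$), and because $b_{i1} + b_{i2} = 0$ for this item, scores 0 and 2 are equally likely at $\theta = 0$.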

Figure 2 Optimal design for the 2PL model (left, red), the 3PL model (middle, green), and the GPCM (right, blue), with 40 versions and 9 items per version.

Figure 3 Relative efficiencies of the optimal design versus the random design for the 2PL (first panel), 3PL (middle panel), and GPCM (last panel) models, with 40 versions and 9 items per version.

Table 1 Relative efficiency of optimal design versus random design depending on model (2PL, 3PL, or GPCM) and number of versions ($V = 12, 15, 20, 30, 40, 60, 100$), averaged over all 60 items.
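
Table 1 compares the optimal design with a random design through relative efficiencies. One common D-optimality-based definition of relative efficiency for an item with $p$ parameters is $(\det M_{\text{opt}} / \det M_{\text{rand}})^{1/p}$, where $M$ is the item's Fisher information matrix averaged over the examinees who receive it. The sketch below applies this definition to a single 2PL item with logit $a(\theta - b)$; the definition, the function names, and both example ability sets are assumptions for illustration, and the article's exact efficiency measure may differ.

```python
import math
from typing import Sequence

Matrix = list[list[float]]


def info_2pl(thetas: Sequence[float], a: float, b: float) -> Matrix:
    """Fisher information for (a, b) of a 2PL item, averaged over abilities.

    Assumes P(correct) = 1 / (1 + exp(-a * (theta - b))).
    """
    m = [[0.0, 0.0], [0.0, 0.0]]
    for theta in thetas:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        g = (theta - b, -a)  # gradient of the logit w.r.t. (a, b)
        for i in range(2):
            for j in range(2):
                m[i][j] += p * (1.0 - p) * g[i] * g[j] / len(thetas)
    return m


def det2(m: Matrix) -> float:
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def relative_efficiency(design: Sequence[float], reference: Sequence[float],
                        a: float, b: float) -> float:
    """D-efficiency of `design` relative to `reference` (p = 2 parameters)."""
    ratio = det2(info_2pl(design, a, b)) / det2(info_2pl(reference, a, b))
    return ratio ** 0.5


# Hypothetical item with a = 1, b = 0.
# Equal weights at logits -/+1.5434: the locally D-optimal two-point design.
optimal = [-1.5434, 1.5434]
# Abilities spread evenly over [-3, 3], standing in for a random assignment.
spread = [-3.0 + 6.0 * k / 200 for k in range(201)]
print(round(relative_efficiency(optimal, spread, a=1.0, b=0.0), 3))
```

Because the two-point design is locally D-optimal for this item when abilities are unrestricted, its efficiency relative to any other set of abilities is at least 1.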

Figure 4 Optimal design for the 3PL model and $V = 20, 30, 60$ versions (left, middle, right, respectively), and 9 items per version.

Figure 5 Optimal design (left panel) and item information (right panel). Red $\Delta$ = 2PL items, green $+$ = 3PL items. Right panel: colored symbols without connecting lines for the optimal design, black symbols with connecting lines for the random design. Mixed-format test with 2PL and 3PL items, 20 versions, and 9 items per version.

Table 2 Relative efficiency of the optimal design versus the random design for the mixed 2PL and 3PL model depending on the number of versions ($V = 12, 15, 20, 30, 40, 60, 100$), averaged over all 60 items (first row), the 30 2PL items (second row), or the 30 3PL items (third row).

Table 3 Item types in the calibration test for the Swedish national test in Mathematics.

Figure 6 Swedish national test in Mathematics: Results from Grade 6 in 2022 of the pupils who participated in the calibration test; red vertical lines: division into groups for the $V = 20$ calibration versions.
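
Figure 6 shows how pupils were divided into ability groups, one per calibration version, using their Grade 6 results. A minimal sketch of such a division follows. It assumes groups of (nearly) equal size formed by sorting on the prior score, which may differ from how the cut points in Figure 6 were actually chosen; the function name `split_into_groups` and the scores are hypothetical.

```python
from typing import Sequence


def split_into_groups(scores: Sequence[float], n_groups: int) -> list[list[int]]:
    """Return examinee indices split into n_groups ability groups.

    Examinees are sorted by score (lowest first); group sizes differ by at
    most one. Ties keep their original order because sorted() is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    base, extra = divmod(len(order), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        groups.append(order[start:start + size])
        start += size
    return groups


# Hypothetical prior scores for 10 examinees, split into 3 groups.
scores = [12, 30, 7, 25, 18, 30, 3, 22, 15, 9]
for g, members in enumerate(split_into_groups(scores, 3)):
    print(g, [scores[i] for i in members])
# 0 [3, 7, 9, 12]
# 1 [15, 18, 22]
# 2 [25, 30, 30]
```

For the setting in Figure 6, the same call would use `n_groups=20`, with each group then taking one calibration version.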

Table 4 Calibration test for the Swedish national test in Mathematics: Relative efficiency of the optimal design versus the random design ($V = 12, 15, 20, 30, 40, 60, 100$), averaged over all items (first row) or over all 2PL, all 3PL, and all GPCM items (remaining rows).

Figure 7 Calibration test for the Swedish national test in Mathematics: Optimal design (left) and relative item efficiencies (right). Symbols and colors: red $\Delta$ for 2PL item groups, green $+$ for 3PL items, blue $\times$ for GPCM items. Size of symbols in right panel: small $\Delta$ = single 2PL item; medium $\Delta$ = group of two 2PL items; large $\Delta$ = group of three 2PL items.

Figure 8 Locally D-optimal unrestricted designs for a GPCM item with three score categories (0, 1, or 2 points), $a = 1$, and $b_1 + b_2 = 0$, depending on $b_2 - b_1$. Black solid lines: ability values of the design points; red dashed line: weight on the largest and smallest design points, with the remaining weight on the middle design point or divided equally between the two middle design points; dotted vertical reference lines: values of $b_2 - b_1$ at which the optimal design changes between two-, three-, and four-point designs (approximately 1.51 and 5.25).
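
A D-optimal design, as in Figure 8, maximizes the determinant of the Fisher information matrix of the item parameters $(a, b_1, b_2)$. The sketch below computes that matrix for a design given as ability points with weights, assuming the usual GPCM parameterization; the function names are hypothetical, and the code only evaluates candidate designs rather than searching for the optimum. It also shows why at least two design points are needed: at a single ability value the $3 \times 3$ information matrix has rank at most two.

```python
import math
from typing import Sequence

Matrix = list[list[float]]


def gpcm_info(theta: float, a: float, b: Sequence[float]) -> Matrix:
    """Fisher information for (a, b_1, ..., b_m) of one GPCM item at theta.

    Assumes score k has log-weight s_k = sum_{v<=k} a * (theta - b_v). For
    this multinomial-logit form, the information equals the covariance of the
    gradients of s_k under the score probabilities.
    """
    m = len(b)
    s = [0.0]
    grads = [[0.0] * (m + 1)]        # gradient of s_0 is zero
    for k in range(1, m + 1):
        s.append(s[-1] + a * (theta - b[k - 1]))
        g = grads[-1][:]             # start from the gradient of s_{k-1}
        g[0] += theta - b[k - 1]     # d s_k / d a
        g[k] = -a                    # d s_k / d b_k
        grads.append(g)
    top = max(s)
    w = [math.exp(x - top) for x in s]
    total = sum(w)
    p = [x / total for x in w]
    n = m + 1
    mean = [sum(p[k] * grads[k][j] for k in range(n)) for j in range(n)]
    return [[sum(p[k] * (grads[k][i] - mean[i]) * (grads[k][j] - mean[j])
                 for k in range(n)) for j in range(n)] for i in range(n)]


def design_info(points: Sequence[float], weights: Sequence[float],
                a: float, b: Sequence[float]) -> Matrix:
    """Weighted sum of the information over the design points."""
    n = len(b) + 1
    total = [[0.0] * n for _ in range(n)]
    for theta, wt in zip(points, weights):
        info = gpcm_info(theta, a, b)
        for i in range(n):
            for j in range(n):
                total[i][j] += wt * info[i][j]
    return total


def det3(m: Matrix) -> float:
    """Determinant of a 3 x 3 matrix (parameters a, b_1, b_2)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Item with a = 1, b_1 = -1, b_2 = 1 (so b_1 + b_2 = 0, as in Figure 8).
b = (-1.0, 1.0)
print(det3(design_info([0.0], [1.0], 1.0, b)))             # one point: numerically zero
print(det3(design_info([-1.0, 1.0], [0.5, 0.5], 1.0, b)))  # two points: positive
```

Maximizing `det3(design_info(...))` over the points and weights with a numerical optimizer would be one way to approximate unrestricted designs of the kind shown in Figure 8.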

Figure 9 Locally D-optimal designs for GPCM items with $b_{i1}$ equidistantly between $-2$ and $1$, $b_{i2} = b_{i1} + 1$ (left panel) or with $b_{i1}$ equidistantly between $-2$ and $0$, $b_{i2} = b_{i1} + 2$ (right panel); $a_i = 3.5$, 40 versions, 60 items.

Figure 10 Relative efficiencies of the optimal design versus the random design for the item parameters: 2PL (first panel), 3PL (second panel), and GPCM (third panel). Symbols: cross = discrimination parameter $a$; diamond = (first) difficulty parameter $b$; partially filled square = guessing parameter $c$ for the 3PL model or second difficulty parameter for the GPCM. 40 versions and 9 items per version.

Table 5 Items in the calibration test for the Swedish national test in Mathematics: item parameter estimates with their standard errors, and anticipated response times (in minutes) based on a pretesting study. Item types are 2PL, 3PL, or GPCM with 0, 1, or 2 possible points (GP2).

Supplementary material: Files

Miller and Fackle-Fornius Supplementary material 1 (File, 215.8 KB)
Miller and Fackle-Fornius Supplementary material 2 (File, 1.9 KB)
Miller and Fackle-Fornius Supplementary material 3 (File, 10 KB)
Miller and Fackle-Fornius Supplementary material 4 (File, 34.1 KB)