
Bayesian Selection Policies for Human-in-the-Loop Anomaly Detectors with Applications in Test Security

Published online by Cambridge University Press:  10 December 2025

Michael Fauss*
Affiliation:
ETS Research Institute , USA Department of Electrical and Computer Engineering, Princeton University, USA
Xiang Liu
Affiliation:
ETS Research Institute , USA
Chen Li
Affiliation:
ETS Research Institute , USA
Ikkyu Choi
Affiliation:
ETS Research Institute , USA
H. Vincent Poor
Affiliation:
Department of Electrical and Computer Engineering, Princeton University, USA
*
Corresponding author: Michael Fauss; Email: mfauss@ets.org

Abstract

This article investigates the problem of automatically flagging test takers who exhibit atypical responses or behaviors for further review by human experts. The objective is to develop a selection policy that maximizes the expected number of test takers correctly identified as warranting additional scrutiny while maintaining a manageable volume of reviews per test administration. The selection procedure should learn from the outcomes of the expert reviews. Since typically only a fraction of test takers are reviewed, this leads to a semi-supervised learning problem. This problem is formalized in a Bayesian setting, and the corresponding optimal selection policy is derived. Since calculating the policy and the underlying posterior distributions is computationally infeasible, a variational approximation and three heuristic selection policies are proposed. These policies are informed by properties of the optimal policy and correspond to different exploration/exploitation trade-offs. The performance of the approximate policies is assessed via numerical experiments using both synthetic and real-world data and is compared with procedures based on off-the-shelf algorithms as well as theoretical performance bounds.
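To make the setup concrete, the following is a minimal, hypothetical sketch of a detection-greedy selection policy in the spirit of the abstract, not the authors' actual algorithm. It assumes a toy one-dimensional Gaussian feature model (normal test takers near 0, anomalous ones near 2), scores each test taker by a likelihood-ratio-style statistic under crude parameter estimates, reviews the top $K$ of $N$ per administration, and refines the estimates using labels from the reviewed subset only, reflecting the semi-supervised nature of the problem. All numbers and distributions here are illustrative assumptions.

```python
# Hypothetical detection-greedy policy sketch (illustrative only; the
# feature model, parameters, and update rule are assumptions, not the
# article's actual specification).
import random
import statistics

random.seed(0)
N, K, r, T = 100, 20, 0.2, 50   # takers, reviews per admin, anomaly rate, admins

mu0_hat, mu1_hat = 0.0, 1.0     # crude initial guesses for the two class means
lab0, lab1 = [], []             # features of reviewed normal/anomalous takers
detected = total = 0

for t in range(T):
    # Simulate one administration: latent anomaly flags and observed features.
    y = [random.random() < r for _ in range(N)]
    x = [random.gauss(2.0 if yi else 0.0, 1.0) for yi in y]

    # Likelihood-ratio-style score under current estimates (monotone in the
    # Gaussian log-likelihood ratio when mu1_hat > mu0_hat).
    score = [(xi - mu0_hat) ** 2 - (xi - mu1_hat) ** 2 for xi in x]

    # Detection-greedy: review the K highest-scoring test takers.
    reviewed = sorted(range(N), key=lambda i: score[i])[-K:]
    detected += sum(1 for i in reviewed if y[i])
    total += sum(y)

    # Semi-supervised aspect: only reviewed takers yield expert labels.
    lab1 += [x[i] for i in reviewed if y[i]]
    lab0 += [x[i] for i in reviewed if not y[i]]
    if lab0:
        mu0_hat = statistics.fmean(lab0)
    if lab1:
        mu1_hat = statistics.fmean(lab1)

rate = detected / total  # accumulated detection rate across administrations
print(round(rate, 2))
```

Because this policy always exploits the current model, it corresponds to one extreme of the exploration/exploitation trade-off discussed in the article; the proposed heuristics interpolate between such greedy selection and more exploratory choices.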

Information

Type
Theory and Methods
Creative Commons
Licence: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© Educational Testing Service and the Author(s), 2025. Published by Cambridge University Press on behalf of the Psychometric Society

Figure 1 Graphical illustration of information flow and posterior update of the Bayesian model.


Figure 2 Graphical illustration of the approximate posterior update in (37)–(41).


Figure 3 Average accumulated detection rate of the proposed policies against the number of administrations for different review sizes K. Note: Here, $N = 100$, $r=0.2$, and the parameters of the feature distributions are given in (59). The thin horizontal lines indicate the upper bound on the detection rate in (56).


Figure 4 Average MSE against the number of administrations. Note: Here, $N = 100$, $r=0.2$, $K = 20$, and the parameters of the feature distributions are given in (59).


Figure 5 Average values of $\phi _t$ in (49) against the number of administrations. Note: Here, $N = 100$, $r=0.2$, $K = 20$, and the parameters of the feature distributions are given in (59).


Figure 6 Average accumulated detection rates of reference procedures against the number of administrations for different review sizes K. Note: Here, $N = 100$, $r=0.2$, and the parameters of the feature distributions are given in (59). The thin horizontal lines indicate the upper bound on the detection rate in (56).


Table 1 Hyper-parameters of reference procedures for different values of K


Figure 7 Comparison of average accumulated detection rates of the proposed detection-greedy policy and the support-vector-based reference procedure. Note: Here, $N = 100$, $r=0.2$, and the parameters of the feature distributions are given in (59).


Figure 8 Average detection rate of the proposed policies against the number of administrations for different review sizes K. Note: Here, $N = 100$, $M=3$, and $r=0.2$. The thin horizontal lines indicate upper bounds on the detection rate obtained by an “oracle” version of the proposed method that uses a fitted model from the beginning instead of learning it from the data.


Figure 9 Average posterior mean of R against the number of administrations for different selection policies and values of K, with $N = 100$ and $M=3$.


Figure 10 Average detection rate of reference procedures against the number of administrations for different review sizes K. Note: Here, $N = 100$, $M=3$, and $r=0.2$. The thin horizontal lines indicate upper bounds on the detection rate obtained by an “oracle” version of the proposed method that uses a fitted model from the beginning instead of learning it from the data.


Table 2 Hyper-parameters of reference procedures for different values of K


Figure 11 Comparison of average accumulated detection rates of the proposed detection-greedy policy and the logistic-regression-based reference procedure. Note: Here, $N = 100$, $M=3$, and $r=0.2$.