Computerized adaptive tests (CATs) play a crucial role in educational assessment and diagnostic screening in behavioral health. Unlike traditional linear tests that administer a fixed set of pre-assembled items, CATs adaptively tailor the test to an examinee’s latent trait level based on their previous responses. We introduce a novel CAT system that builds on recent advances in Bayesian multivariate IRT. Our approach leverages direct sampling from the latent factor posterior distributions, significantly accelerating existing information-theoretic item-selection methods by eliminating the need for computationally intensive Markov chain Monte Carlo simulations. To address the potential suboptimality of one-step-ahead item-selection rules, we also develop a double deep Q-learning algorithm that efficiently learns an optimal item-selection policy offline using a calibrated item bank. Simulation and real-data studies confirm the speedups delivered by our sampling-based approach and highlight the potential of reinforcement learning (RL) in CATs. Notably, our Q-learning-based strategy consistently achieves the fastest posterior variance reduction, leading to earlier test termination. These results underscore the promise of combining exact posterior sampling with RL to deliver scalable, high-precision CATs.
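To make the one-step-ahead selection rule concrete, the sketch below shows a minimal, hypothetical version for a unidimensional 2PL IRT model: given samples from the latent-trait posterior (here stand-in draws from a normal; the method described above samples the exact posterior directly), each candidate item is scored by its Fisher information averaged over those samples, and the highest-scoring unadministered item is selected. All item parameters and names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative calibrated item bank: discrimination a_j and difficulty b_j.
item_bank = {
    "a": np.array([0.8, 1.2, 1.5, 2.0, 0.5]),
    "b": np.array([-1.0, 0.0, 0.5, 1.0, 2.0]),
}

def fisher_information(theta, a, b):
    """2PL item information: a^2 * p * (1 - p), with p the response probability."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_item(posterior_samples, a, b, administered):
    """Pick the unadministered item with the highest expected information,
    averaging each item's information function over posterior draws of theta."""
    expected_info = np.array([
        fisher_information(posterior_samples, a[j], b[j]).mean()
        for j in range(len(a))
    ])
    expected_info[list(administered)] = -np.inf  # exclude items already given
    return int(np.argmax(expected_info))

# Stand-in posterior draws concentrated around theta ~ 0.3.
samples = rng.normal(0.3, 0.4, size=2000)
next_item = select_item(samples, item_bank["a"], item_bank["b"], administered={1})
print(next_item)
```

Because the expectation is a plain average over posterior draws, replacing an MCMC chain with direct posterior samples speeds up exactly this scoring step; the Q-learning policy described above replaces the `select_item` rule with a learned action-value function over the same item bank.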