Listening ability in second language (L2) assessment is elicited through test methods that often require other abilities, which may introduce construct-irrelevant variance and threaten the validity of the test. While previous research has examined the effects of item presentation and item format on test performance, studies on test-takers’ cognitive processing remain exploratory. This study investigates the effects of two test method variables, item presentation (operationalized as while-listening performance [WLP] vs. post-listening performance [PLP]) and item format (operationalized as multiple-choice questions [MCQs] vs. open-ended questions [OEQs]), on test-takers’ cognitive processing, using gaze behaviors as real-time indicators. A Graeco-Latin square design was employed to administer four psychometrically validated short-talk listening testlets across the test conditions. Eye-tracking data were collected, and word count and typing speed were controlled for as covariates. Linear mixed-effects modeling revealed greater total fixation duration, fixation counts, total visit duration, and visit counts in the WLP condition than in the PLP condition. Interaction effects for all four metrics further indicated that these differences were more pronounced for MCQs than for OEQs. Additionally, typing speed and word count contributed to the variance in eye-tracking measures. The findings contribute to a growing understanding of how test methods shape listening processes and offer implications for the design and interpretation of L2 listening assessments.