
Reply: Birnbaum’s (2012) statistical tests of independence have unknown Type-I error rates and do not replicate within participant

Published online by Cambridge University Press: 01 January 2023

Yun-shil Cha (Korea Institute of Public Finance)
Michelle Choi (University of Illinois at Urbana-Champaign)
Ying Guo (University of Illinois at Urbana-Champaign)
Michel Regenwetter* (University of Illinois at Urbana-Champaign)
Chris Zwilling (University of Illinois at Urbana-Champaign)
*Address: Michel Regenwetter, Department of Psychology, 603 E. Daniel St., Champaign, IL 61820. Email: regenwet@illinois.edu

Abstract

Birnbaum (2011, 2012) questioned the iid (independent and identically distributed) sampling assumptions used by state-of-the-art statistical tests in Regenwetter, Dana and Davis-Stober’s (2010, 2011) analysis of the “linear order model”. Birnbaum (2012) cited, but did not use, a test of iid by Smith and Batchelder (2008) with analytically known properties. Instead, he created two new test statistics with unknown sampling distributions.

Our rebuttal has five components:
1) We demonstrate that the Regenwetter et al. data pass Smith and Batchelder’s test of iid with flying colors.
2) We provide evidence from Monte Carlo simulations that Birnbaum’s (2012) proposed tests have unknown Type-I error rates, which depend on the actual choice probabilities and on how the data are coded, as well as on the null hypothesis of iid sampling.
3) Birnbaum analyzed only a third of Regenwetter et al.’s data. We show that his two new tests fail to replicate on the other two-thirds of the data, within participants.
4) Birnbaum selectively picked data of one respondent to suggest that choice probabilities may have changed partway into the experiment. Such nonstationarity could potentially cause a seemingly good fit to be a Type-II error. We show that the linear order model fits equally well if we allow for warm-up effects.
5) Using hypothetical data, Birnbaum (2012) claimed to show that “true-and-error” models for binary pattern probabilities overcome the alleged shortcomings of Regenwetter et al.’s approach. We disprove this claim on the same data.
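The abstract notes that Type-I error rates can depend on how the data are coded. Which option of a pair is recorded as “1” is arbitrary, and relabeling leaves iid sampling intact: if a pair’s choices are iid Bernoulli($p$), the recoded choices are iid Bernoulli($1-p$). A statistic that sums codes across pairs need not be invariant, however. The minimal sketch below assumes data arranged as a repetitions × gamble-pairs 0/1 matrix and uses a hypothetical stand-in statistic, the sample variance of per-repetition totals; neither `row_total_variance` nor `recode` is Birnbaum’s (2012) definition, and the toy matrix is invented.

```python
import numpy as np


def row_total_variance(x: np.ndarray) -> float:
    """Hypothetical stand-in statistic (not Birnbaum's 2012 definition).

    x has shape (n_repetitions, n_pairs); x[r, k] = 1 if the first option of
    gamble pair k was chosen on repetition r, else 0. Returns the sample
    variance (ddof=1) of the per-repetition totals.
    """
    totals = x.sum(axis=1)
    return float(np.var(totals, ddof=1))


def recode(x: np.ndarray, pairs: list[int]) -> np.ndarray:
    """Swap which option counts as '1' for the listed gamble pairs."""
    y = x.copy()
    y[:, pairs] = 1 - y[:, pairs]
    return y


# Toy 4 x 2 data matrix (invented for illustration, not RDDS data).
x = np.array([[1, 1],
              [0, 0],
              [1, 1],
              [0, 1]])
print(row_total_variance(x))               # totals 2, 0, 2, 1 -> 11/12, about 0.917
print(row_total_variance(recode(x, [1])))  # totals 1, 1, 1, 0 -> 0.25
```

Flipping one column reverses the sign of its sample covariances with the other columns, so the variance of the totals moves even though the iid hypothesis itself is unchanged.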

Type: Research Article
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright © The Authors 2013. This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1: Screenshot of a Cash I paired-comparison stimulus (see also RDDS, Figure 2).

Figure 2: Illustrative analysis of the sampling distribution of $p_\nu$ approximated through 3,000 simulated iid data sets using the maximum likelihood binomial parameters of three participants from Regenwetter et al. (2011) Cash I, and a hypothetical participant. The underlying binomial probabilities are given above the histograms. The expected frequency in each bin under the uniform null is given by the horizontal line. The Kolmogorov-Smirnov statistic is significant in each case, i.e., each distribution differs significantly from a uniform on [0,1].
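The uniformity check in this caption can be sketched as follows. For a test whose statistic has a continuous null distribution, p-values under the null are uniform on [0,1], so the empirical CDF of simulated p-values should stay close to the diagonal. This is a minimal numpy-only sketch of the one-sample Kolmogorov-Smirnov distance against Uniform[0,1], with the standard large-sample critical value $\sqrt{-\tfrac{1}{2}\ln(\alpha/2)}/\sqrt{n}$. The function names are hypothetical, and the simulated p-values themselves would come from whichever test is being studied.

```python
import math

import numpy as np


def ks_uniform(pvals) -> float:
    """One-sample Kolmogorov-Smirnov distance between the empirical CDF of
    pvals and the Uniform[0, 1] CDF, D_n = sup_x |F_n(x) - x|."""
    u = np.sort(np.asarray(pvals, dtype=float))
    n = u.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)         # largest gap with the ECDF above the diagonal
    d_minus = np.max(u - (i - 1) / n)  # largest gap with the ECDF below the diagonal
    return float(max(d_plus, d_minus))


def ks_critical_value(n: int, alpha: float = 0.05) -> float:
    """Large-sample critical value for D_n: sqrt(-ln(alpha / 2) / 2) / sqrt(n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)


# Evenly spread p-values sit as close to uniform as n points can: D_n = 0.5 / n.
print(ks_uniform([(k - 0.5) / 10 for k in range(1, 11)]))
# A test that returns p = 1 for every data set is maximally non-uniform.
print(ks_uniform([1.0] * 50))             # 1.0
print(round(ks_critical_value(3000), 4))  # 0.0248
```

With 3,000 simulated p-values, as in the figure, a distance above roughly 0.025 rejects uniformity at the 5% level.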

Table 1: First 36 out of 800 pairwise choices of Participant # 100 in RDDS.

Table 2: Test of iid binary choice following Eq. 21 and text in Smith and Batchelder (2008, p. 727).

Figure 3: Illustrative analysis of the sampling distribution of $p_r$ approximated through 3,000 simulated iid data sets using the maximum likelihood binomial parameters of three participants from Regenwetter et al. (2011) Cash I, and a hypothetical participant. The underlying binomial probabilities are given above the histograms. The expected frequency in each bin under the uniform null is given by the horizontal line. The Kolmogorov-Smirnov statistic is significant in only one case: for the iid samples from the best-fitting collection of binomials of Participant 10, the distribution differs significantly from a uniform on [0,1].

Table 3: Comparison of simulated sampling distributions for $p_\nu$ and $p_r$ for different collections of binomials.

Table 4: Nominal versus actual Type-I error rates for Birnbaum’s (2012) tests of iid.
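A comparison of nominal and actual Type-I error rates of this kind can be run by Monte Carlo: draw many data sets from an iid model with fixed choice probabilities, apply the test, and count how often $p \le \alpha$. The sketch below assumes a repetitions × gamble-pairs 0/1 data layout and uses a hypothetical permutation test (each pair’s choices shuffled across repetitions, with the sum of squared per-repetition totals as the statistic). It is a stand-in for illustration, not either of Birnbaum’s (2012) statistics, and the choice probabilities in the example are invented.

```python
import numpy as np


def sum_sq_totals(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in statistic: sum of squared per-repetition totals.

    Accepts one data set of shape (R, K) or a stack of shape (B, R, K).
    Shuffling within columns keeps the grand total fixed, so this orders
    data sets exactly like the variance of the totals, but stays an integer
    (no floating-point ties).
    """
    return (x.sum(axis=-1) ** 2).sum(axis=-1)


def permutation_pvalue(x: np.ndarray, n_perm: int, rng: np.random.Generator) -> float:
    """One-sided Monte Carlo permutation p-value for the iid null."""
    observed = sum_sq_totals(x)
    stack = np.repeat(x[np.newaxis], n_perm, axis=0)  # (n_perm, R, K)
    # Shuffle every column (gamble pair) independently across repetitions;
    # this preserves each pair's choice count and leaves iid data's
    # distribution unchanged.
    shuffled = rng.permuted(stack, axis=1)
    null_stats = sum_sq_totals(shuffled)
    return (1 + int(np.sum(null_stats >= observed))) / (1 + n_perm)


def type_i_error_rate(probs, n_reps, alpha=0.05, n_sims=1000, n_perm=199, seed=0):
    """Monte Carlo estimate of P(p <= alpha) when the iid null is true."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    rejections = 0
    for _ in range(n_sims):
        # One iid data set: each cell is an independent Bernoulli(probs[k]) draw.
        x = (rng.random((n_reps, probs.size)) < probs).astype(int)
        if permutation_pvalue(x, n_perm, rng) <= alpha:
            rejections += 1
    rate = rejections / n_sims
    mc_se = float(np.sqrt(rate * (1 - rate) / n_sims))  # Monte Carlo standard error
    return rate, mc_se


rate, se = type_i_error_rate([0.1, 0.3, 0.5, 0.7, 0.9], n_reps=20, n_sims=500)
print(f"nominal 0.05, actual {rate:.3f} (MC s.e. {se:.3f})")
```

Because a Monte Carlo permutation p-value of the form (1 + count)/(1 + B) is valid under exchangeability, this stand-in should reject at most about α of the time, possibly less when its statistic has many ties. For statistics whose null distribution is not known, the same simulation loop, repeated across different choice probabilities, is how the actual rate can be estimated.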

Table 5: Summary of $p_\nu$ and $p_r$ values, rounded to two significant digits, according to the method of Birnbaum (2012) for Cash I, Cash II, and Noncash of Regenwetter et al. (2011), for both the full data sets and the reduced data sets in which the first four trials for each gamble pair were dropped.
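The reduced data sets in Tables 5 and 6 drop the first four trials for each gamble pair. Reading that as the first four presentations of each pair in presentation order (an interpretation assumed here, not the authors’ code), the filtering step and the per-pair maximum likelihood binomial estimates (observed proportions of 1s) can be sketched in plain Python. The `(pair_id, choice)` record format and the function names are hypothetical, and the toy sequence is invented.

```python
from collections import defaultdict


def drop_warmup(trials, n_drop=4):
    """Drop the first n_drop presentations of each gamble pair.

    trials: (pair_id, choice) tuples in presentation order, choice in {0, 1}.
    Returns the remaining trials, still in presentation order.
    """
    seen = defaultdict(int)
    kept = []
    for pair_id, choice in trials:
        seen[pair_id] += 1
        if seen[pair_id] > n_drop:  # keep only presentations after the warm-up
            kept.append((pair_id, choice))
    return kept


def choice_proportions(trials):
    """Maximum likelihood binomial estimate per pair: share of 1s."""
    ones = defaultdict(int)
    counts = defaultdict(int)
    for pair_id, choice in trials:
        ones[pair_id] += choice
        counts[pair_id] += 1
    return {pair_id: ones[pair_id] / counts[pair_id] for pair_id in counts}


# Toy sequence: pair "A" shown six times, pair "B" five times (invented data).
trials = [("A", 0), ("B", 1), ("A", 0), ("B", 1), ("A", 1), ("B", 0),
          ("A", 0), ("B", 1), ("A", 1), ("B", 1), ("A", 1)]
reduced = drop_warmup(trials)
print(reduced)                      # [('A', 1), ('B', 1), ('A', 1)]
print(choice_proportions(trials))   # {'A': 0.5, 'B': 0.8}
print(choice_proportions(reduced))  # {'A': 1.0, 'B': 1.0}
```

Counting presentations per pair, rather than dropping the first four trials of the session, keeps the warm-up cut the same size for every gamble pair.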

Table 6: Analysis of the linear order model on the full data sets and on reduced data sets where the first four trials for each gamble pair are dropped. A checkmark ✓ indicates perfect fit.

Table 7: Hypothetical data in Birnbaum’s (2012) Tables A.4 (top), A.5 (center), and A.6 (bottom). A “1” indicates choice of the first option in a pair; a “0” indicates choice of the second option. For each column of data, we also provide the result of a test for iid sampling of Smith and Batchelder (2008, p. 727) using confidence intervals of point estimates ± 2 standard errors (or ± 1.96 standard errors; the results of using 1.96 or 2 standard errors matched throughout).

Supplementary material: Cha et al. supplementary material (file, 345.8 KB)