Goodness-of-fit Tests for Categorical Models of Psychological Processes: Fixing the Occasional Failures of Asymptotic Theory

Published online by Cambridge University Press: 04 March 2025

Miguel A. García-Pérez*
Affiliation:
Universidad Complutense, Spain
Rocío Alcalá-Quintana
Affiliation:
Universidad Complutense, Spain
*Corresponding author: Miguel A. García-Pérez; Email: miguel@psi.ucm.es

Abstract

The goodness of fit of categorical models of psychological processes is often assessed with the log-likelihood ratio statistic ($G^2$), but its underlying asymptotic theory is known to have limited empirical validity. We use examples from the scenario of fitting psychometric functions to psychophysical discrimination data to show that two factors are responsible for occasional discrepancies between actual and asymptotic distributions of $G^2$. One of them is the eventuality of very small expected counts, by which the number of degrees of freedom should be computed as $(J-1) \times I - P - K_{0.06}$, where $J$ is the number of response categories in the task, $I$ is the number of comparison levels, $P$ is the number of free parameters in the fitted model, and $K_{0.06}$ is the number of cells in the implied $I \times J$ table in which expected counts do not exceed 0.06. The second factor is the administration of small numbers $n_i$ of trials at each comparison level $x_i$ ($1 \le i \le I$). These numbers should not be ridiculously small (i.e., lower than 10) but they need not be identical across comparison levels. In practice, when $n_i$ varies across levels, it suffices that the overall number $N$ of trials exceeds $40 \times I$ if $J = 2$ or $50 \times I$ if $J = 3$, with no $n_i$ lower than 10. Correcting the degrees of freedom and using large $n_i$ are easy to implement in practice. These precautions ensure the validity of goodness-of-fit tests based on $G^2$.
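The two precautions stated in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names (`g2_statistic`, `corrected_df`, `sample_size_ok`) and the nested-list layout (one row per comparison level, one column per response category) are assumptions made here. The $G^2$ computation uses the standard form $2 \sum O \ln(O/E)$ over cells with nonzero observed counts.

```python
import math


def g2_statistic(observed, expected):
    """Log-likelihood ratio statistic: 2 * sum of O * ln(O / E).

    Cells with an observed count of zero contribute nothing to the sum.
    Assumes the expected count is positive wherever the observed count is.
    """
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:
                total += o * math.log(o / e)
    return 2.0 * total


def corrected_df(expected, n_params, threshold=0.06):
    """Degrees of freedom (J - 1) * I - P - K, as given in the abstract.

    K is the number of cells whose expected count does not exceed
    `threshold` (0.06 in the abstract).
    """
    n_levels = len(expected)         # I: comparison levels (rows)
    n_categories = len(expected[0])  # J: response categories (columns)
    k_small = sum(1 for row in expected for e in row if e <= threshold)
    return (n_categories - 1) * n_levels - n_params - k_small


def sample_size_ok(trials_per_level, n_categories):
    """Check the abstract's rule: N > 40 * I (J = 2) or N > 50 * I (J = 3),
    with no level receiving fewer than 10 trials."""
    per_level = {2: 40, 3: 50}[n_categories]  # only J = 2 and J = 3 are covered
    n_levels = len(trials_per_level)
    enough_overall = sum(trials_per_level) > per_level * n_levels
    return enough_overall and min(trials_per_level) >= 10
```

The value returned by `corrected_df` would then replace the nominal degrees of freedom when $G^2$ is referred to a chi-squared distribution. With 20 trials at each of 13 levels, as in the simulations described in the figure captions, `sample_size_ok` returns `False` for both tasks, because $N = 260$ is below $40 \times 13 = 520$.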

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Universidad Complutense de Madrid and Colegio Oficial de la Psicología de Madrid
Figure 1. Artificial data (symbols) and fitted psychometric functions (curves) in binary (a) and ternary (b) discrimination tasks with a standard duration of 250 ms (vertical line in each panel) and comparison durations ranging from 100 ms to 400 ms (horizontal axis). Response types are indicated at the top of each row. Parameter estimates and the value of the $G^2$ statistic are indicated on the right side of each panel.

Figure 2. Observed and expected counts of responses of each type at each comparison duration in the binary and ternary tasks. Observed counts are plotted in panels in the top row of Figure 1 as proportions over the 20 trials at each comparison duration. Expected counts are similarly plotted as proportions in the corresponding panels of Figure 1 in the form of the ordinate of the applicable psychometric function at each comparison duration. A red background identifies cells with expected counts of zero.

Figure 3. Actual (histograms) and asymptotic (red curves) distributions of $G^2$ in the binary (left panels) and ternary (right panels) tasks under the simulation conditions stated in the text. The nominal number of degrees of freedom is indicated at the top of each column. The arrow in each panel is horizontally located at the critical point for a size-.05 goodness-of-fit test, and the numeral above it is the percentage of replicates in which the value of $G^2$ exceeded this critical point. The top and bottom rows pertain, respectively, to simulations in which comparison durations had the coverage depicted in the top and bottom rows of Figure 1.

Figure 4. Actual (histograms) and asymptotic (red curves) distributions of $G^2$ from the simulation condition involving J = 3 (ternary task), I = 13 (13 comparison durations), and $n_i = 20$ (20 trials at each comparison duration) with no parameter estimation. The top-left panel shows the overall distribution and the asymptotic distribution with df = (3−1) × 13 = 26 degrees of freedom. The remaining panels show analogous results after determining the value of df* for each replicate by subtracting one degree of freedom for each expected count below T = 0.06. Each panel shows the distribution for the subset of replicates for which df* had the same value, for values between df* = 10 and df* = 20. The number of replicates and the value of df* for each group are given at the top right side of each panel. Two additional groups are omitted because too few replicates fell into them (1999 replicates with df* = 21 and 236 replicates with df* = 22). In each panel, an arrow marks the critical point for a size-.05 test with the applicable degrees of freedom, and the numeral above it gives the rejection rate.
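As a worked instance of the correction described in this caption, the arithmetic can be written out as follows. This is a sketch that applies the degrees-of-freedom formula from the abstract with $P = 0$, since no parameters are estimated in this condition; $K_T$ denotes the number of cells with small expected counts at threshold $T = 0.06$.

```latex
\begin{align*}
df   &= (J-1) \times I = (3-1) \times 13 = 26, \\
df^* &= df - P - K_T = 26 - 0 - K_T, \\
df^* &= 20 \iff K_T = 6, \qquad df^* = 10 \iff K_T = 16.
\end{align*}
```

The panels for df* = 10 to df* = 20 therefore correspond to replicates with between 16 and 6 small expected counts.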

Figure 5. Overall rejection rates in the binary (a) and ternary (b) tasks with no parameter estimation as a function of the value T of the threshold used to determine the number $K_T$ of cells with small expected counts. Each panel shows results for the number $n_i$ of trials indicated at the top left of the panel. The five (unmarked) curves in each panel show results for each number I of comparison durations (between 10 and 14). Note that T = 0.06 produces rejection rates that are virtually at the target 5% level in all cases.

Figure 6. Overall rejection rates in the binary (red) and ternary (blue) tasks with parameter estimation as a function of the value T of the threshold used to determine the number $K_T$ of cells with small expected counts. Data come from the simulation for which overall distributions of $G^2$ were plotted in the top row of Figure 3. Note that T = 0.06 also produces rejection rates that are closest to the nominal 5% level here.

Figure 7. Actual (histograms) and asymptotic (red curves) distributions of $G^2$ in binary (left column) and ternary (right column) tasks from the simulation condition involving I = 12 (i.e., 12 comparison durations) and $n_i$ ranging from 2 (top row) to 128 (bottom row) in multiplicative steps of 4, and with no parameter estimation. Each panel shows the distribution for the set of 300,000 replicates and the asymptotic chi-squared distribution with the applicable degrees of freedom (indicated by the value of df at the top of each column). Rejection rates and the critical point for a size-.05 test with the degrees of freedom in each panel are indicated by an arrow and the numeral above it.

Figure 8. Actual (histograms) and asymptotic (red curves) distributions of $G^2$ in binary (left column) and ternary (right column) tasks with adaptive placement of N trials, ranging from 192 (top row) to 1,536 (bottom row) in multiplicative steps of 2, and with no parameter estimation. Graphical conventions as in Figure 7.