Multiple Hypothesis Testing in Conjoint Analysis

Published online by Cambridge University Press: 26 January 2023

Guoer Liu
Affiliation:
Ph.D. Candidate, Department of Political Science, University of Michigan, Ann Arbor, MI, USA. E-mail: guoerliu@umich.edu
Yuki Shiraito*
Affiliation:
Assistant Professor, Department of Political Science, University of Michigan, Ann Arbor, MI, USA; Center for Political Studies, 4259 Institute for Social Research, 426 Thompson Street, Ann Arbor, MI 48104-2321, USA. E-mail: shiraito@umich.edu URL: shiraito.github.io
*Corresponding author: Yuki Shiraito

Abstract

Conjoint analysis is widely used for estimating the effects of a large number of treatments on multidimensional decision-making. However, it is this substantive advantage that leads to a statistically undesirable property, multiple hypothesis testing. With few exceptions, existing applications of conjoint analysis do not correct for the number of hypotheses tested, and empirical guidance on the choice of multiple testing correction methods has not been provided. This paper first shows that even when none of the treatments has any effect, the standard analysis pipeline produces at least one statistically significant estimate of average marginal component effects (AMCEs) in more than 90% of experimental trials. Then, we conduct a simulation study to compare three well-known methods for multiple testing correction: the Bonferroni correction (BC), the Benjamini–Hochberg (BH) procedure, and adaptive shrinkage (Ash). All three methods are more accurate in recovering the truth than the conventional analysis without correction. Moreover, the Ash method outperforms the other two in avoiding false negatives, while reducing false positives to a similar degree. Finally, we show how conclusions drawn from empirical analysis may differ with and without correction by reanalyzing applications on public attitudes toward immigration and partner countries of trade agreements.
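
To make the multiple testing problem concrete, the following is a minimal Python sketch, not the authors' code. It computes the chance of at least one false positive under the simplifying assumption of independent tests (AMCE estimates need not be independent), and applies the Bonferroni correction and the Benjamini–Hochberg procedure to a made-up list of p-values. The function names and the p-values are hypothetical, and adaptive shrinkage is not sketched here.

```python
def fwer_independent(m, alpha=0.05):
    """Chance of at least one false positive among m true-null tests.

    Assumes the m tests are independent, which is a simplification.
    """
    return 1 - (1 - alpha) ** m


def bonferroni(pvals, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= k * alpha / m,
    then reject the k hypotheses with the smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

uncorrected = [p <= 0.05 for p in pvals]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

print("uncorrected rejections:", sum(uncorrected))
print("Bonferroni rejections: ", sum(bonf))
print("BH rejections:         ", sum(bh))
print("P(at least one false positive), 45 independent tests:",
      round(fwer_independent(45), 3))
```

With these illustrative inputs, five p-values fall below 0.05 without correction, whereas Bonferroni rejects one hypothesis and BH rejects two. Under the independence assumption, about 45 tests at the 5% level are already enough to push the chance of at least one false positive above 90%.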

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology

Figure 1 False-positive results of estimated AMCEs when all null hypotheses are true. Each bar presents the number of datasets (y-axis) for each number of statistically significant estimates (x-axis), with the truth (no significant findings) shaded by gray.

Figure 2 Example of the spike-and-slab prior distribution. The spike (point mass) is at zero, and the slab (gray curve) follows a normal distribution.
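
As a rough formalization of the shape described in this caption, a prior with a point mass at zero and a zero-centered normal slab can be written as the mixture below. The symbols $\beta$ (the effect), $\pi_0$ (the weight on the spike), and $\sigma^2$ (the slab variance) are notation assumed here for illustration, not taken from the figure, and centering the slab at zero is likewise an assumption.

$$g(\beta) = \pi_0\,\delta_0(\beta) + (1-\pi_0)\,\mathcal{N}(\beta \mid 0, \sigma^2), \qquad 0 \le \pi_0 \le 1,$$

where $\delta_0$ denotes the point mass at zero.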

Figure 3 False-positive AMCE estimates when all null hypotheses are true with correction methods. Whereas the standard analysis pipeline correctly accepts all null hypotheses in fewer than 80 datasets, the BC, the BH, and the Ash each do so in more than 900 datasets, with the Ash performing best.

Table 1 Number of datasets for each number of true- and false-positive findings when the AMCE of Gender is non-zero. (a) The effect of male is $-.06$ and the effects of female and all other attributes are drawn independently from $\mathcal {N}(0,.015^2)$. The error variance of the regression model for continuous responses is $.01^2$. (b) AMCEs are identical to Table 1a, but the error variance of the regression model is $.1^2$. (c) The effect of male and the other attributes and the error variance are identical to Table 1b, but the effect of female is independently drawn from $\mathcal {N}(0,.12^2)$. Empty cells indicate zero.

Table 2 Number of datasets for each number of true- and false-positive findings when the true AMCEs of all levels in Gender, Education, and English are non-zero. Obtaining 10 true positives and zero false positives (shaded) is the ground truth. Empty cells indicate zero.

Figure 4 Density Histogram of the Difference between True Positive Rate (TPR) and False Positive Rate (FPR). A larger value on the x-axis indicates better performance. The figure is based on the same simulations as Table 2.

Figure 5 Effects of the immigrant’s country of origin (left) and profession (right) on the probability of being preferred for admission to the United States. For country of origin, the reference category is India; for profession, the reference category is janitor. The plot shows estimates with no correction, the BC (Bonf), the Ash with a mixture of normal components (ash.Norm), and the Ash with a mixture of uniform components (ash.Unif) for each comparison. A BH$\checkmark$ next to a point estimate indicates that the BH-corrected coefficient is significant for the corresponding attribute level. The estimates are based on regression estimators with standard errors clustered at the respondent level; the bars represent 95% confidence intervals. The estimates with no correction replicate the results for the corresponding attributes in Figure 3 in Hainmueller et al. (2014, 21).

Figure 6 Effects of Military ally (top) and Environmental protection standards (bottom) on the probability of being preferred as trading partners in Vietnam. For Military ally, the reference category is allied; for Environmental protection standards, the reference category is lower standards. The plot shows estimates with no correction, the BC (Bonf), the Ash with a mixture of normal components (ash.Norm), and the Ash with a mixture of uniform components (ash.Unif) for each comparison. A BH$\checkmark$ next to a point estimate indicates that the BH-corrected coefficient is significant for the corresponding attribute level. The estimates are based on regression estimators with standard errors clustered at the respondent level; the bars represent 95% confidence intervals. The estimates with no correction replicate the results for the corresponding attributes in Figure 1.3 in Spilker et al. (2016, 715).

Figure 7 Checklist for multiple hypothesis testing in conjoint analysis.

Supplementary material: Link

Liu and Shiraito Dataset

Supplementary material: PDF

Liu and Shiraito supplementary material (PDF, 329.6 KB)