In both popular culture and politics, many have asserted that there is a crisis of masculinity (Illing 2023; Kahloon 2023). Indeed, some politicians, such as U.S. Republican senator Josh Hawley – who has repeatedly called out liberals for their “attack on men” (Hawley 2021) – have gone so far as to make the issue a centerpiece of their campaign platforms. A crisis of masculinity is thought to partially undergird problems such as the global growth in far-right extremism, a movement that is chiefly driven by men and which often dovetails with backlash to gender equality, feminist movements, and the significant strides that women have made in achieving liberation (Greig 2019). Though much of the “crisis of masculinity” and its consequences may be attributed to structural factors like the decline in manufacturing, sociologists and psychologists have also attributed backlash to women’s progress to masculinity threat. Masculinity threat is the notion that manhood is a precarious state and that men must constantly affirm their gender, often through extreme demonstrations of masculinity (Dahl et al. 2015; Weaver and Vescio 2015). In an early and highly influential test of this theory, published in 2013 in the American Journal of Sociology, Willer and colleagues find that masculinity threat can impact political attitudes. Specifically, the authors observe in two lab experiments (N total 100–111, N men 40–51) conducted on a convenience sample of university students that inducing masculinity threat increases support for war, homophobic attitudes, and support for dominance hierarchies among male participants (Willer et al. 2013).
We conduct a pre-registered replication of this foundational work with a large, nationally representative probability sample from the University of Chicago NORC AmeriSpeak Panel (N total 2,774, N men 2,095).Footnote 1
Masculinity threat, also termed fragile masculinity or precarious manhood, refers to the theory that manhood is a status that is earned, maintained, and defended when it is challenged. Whereas womanhood is viewed as the result of natural biological development (Vandello and Bosson 2013), manhood and the masculine identity are understood to be more tenuous. Masculinity threat has been linked to maladaptive behaviors such as increased risk-taking (Parent et al. 2018) and physical aggression (Cohn et al. 2009), as well as to attitudinal consequences like decreased support for transgender rights (Harrison and Michelson 2019). In the pages that follow, we describe the original Willer et al. (2013) experiment, why the replication is of interest, the reasons it is unclear ex ante if the study will replicate, and our design and measurement strategy. Importantly, as we detail below, our design facilitates not only a replication of the study with a larger and more representative sample, but also insights into the reasons – theoretical or related to the treatment – behind the possible failure or success of the replication.
We find that Willer et al.’s (2013) main results do not hold in our replication: we do not observe that masculinity threat is associated with increased support for war, homophobic attitudes, support for dominance hierarchies, traditionalism, or other conservative attitudes. Our study falls somewhere between a direct and a conceptual replication of the original Willer et al. (2013) study, and differences in study design and implementation – in particular, time period, mode of treatment delivery, and sample characteristics – could be responsible for these contrasting findings. However, null findings do not appear to be driven by outdated outcome measures, respondent inattention or age, or the masculinity threat induction appearing unrealistic; we also do not find any effects from inducing a more “general” sense of threat, and results indicate that we were sufficiently statistically powered to identify small effect sizes of interest. Although some influence of design differences cannot be completely ruled out, our analyses provide no evidence that they drive the contrasting findings. Further, if the original results are indeed specific to a particular sample, time period, or mode of delivery, such a finding could at the very least indicate important scope conditions for the underlying theory and its empirical test. Taken together, our results underscore the need for a more nuanced understanding of whether and when masculinity threat shapes political beliefs – and the types of men who may be most susceptible to such threats.
Original experiment and replication
Willer et al. (2013) conducted a between-subjects lab experiment to test the theory that masculinity threat impacts political attitudes. The goal of the experimental manipulation was to induce a sense of masculinity threat, which was predicted to lead to the adoption of more stereotypically masculine attitudes. In both Study 1 (n = 111) and Study 2 (n = 100),Footnote 2 the authors had university students fill out the Bem Sex Role Inventory (BSRI), a measure of stereotypically masculine and feminine traits. After completing the inventory, men and women were randomly assigned to receive feedback that they scored either in a masculine or feminine range. When men [women] received feedback that they scored in the feminine [masculine] range, this constituted the gender threat condition. When men [women] received feedback that they scored within the average masculine [feminine] range for their gender, this constituted the non-gender threat (control) condition. The dependent measures in Study 1 were three items on attitudes towards homosexuality, two items on support for the Iraq War, and a Car Purchasing Survey on vehicle desirability.Footnote 3 Study 2 dependent measures were eight dominance attitude items (Pratto et al. 1994), measures of political conservatism including attitudes on six different political items, eight system justification items (Jost et al. 2004), and seven items on traditionalism. After the post-test survey, respondents were debriefed about the deceptive gender identity feedback.
Our study both replicates the Willer et al. design and introduces innovations to further probe mechanisms and robustness.Footnote 4 First, we replicate the study with a much larger and nationally representative probability sample from the University of Chicago NORC AmeriSpeak Panel. The original Willer et al. study was conducted with a convenience sample of university students. It was also significantly underpowered and had a low number of men across both Studies 1 and 2 – particularly problematic given the theoretical expectation and findings that gender threat only impacts men. Our minimum detectable effect size analyses indicate that we are sufficiently powered to detect effect sizes substantially smaller than those in the original study given our sample size of 2,774 respondents (see Appendix B). Importantly, because gender threat is theoretically a phenomenon specific to men, and empirical investigations into masculinity threat show that this is the case (see Carian and Sobotka 2018; Harrison and Michelson 2019; DiMuccio and Knowles 2023), we oversample men (75% men, 24% women, 1% other).
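To illustrate the logic of a minimum detectable effect size analysis, the following is a minimal sketch rather than the procedure reported in Appendix B. It assumes a two-tailed, two-sample comparison of means, a normal approximation, and conventional values of alpha = .05 and power = .80; the function name and the cell sizes are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_treat, n_control, alpha=0.05, power=0.80):
    """Smallest standardized mean difference (in SD units) detectable by a
    two-tailed, two-sample test, using the normal approximation.

    Assumption: equal outcome variance in both groups.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-tailed test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return (z_alpha + z_power) * sqrt(1 / n_treat + 1 / n_control)


# Hypothetical cell sizes chosen only to contrast a small lab sample
# with a large survey sample; they are NOT the actual allocations.
mde_small = minimum_detectable_effect(25, 25)
mde_large = minimum_detectable_effect(500, 500)
print(f"MDE with 25 per cell:  {mde_small:.2f} SD")
print(f"MDE with 500 per cell: {mde_large:.2f} SD")
```

Under these assumptions, the detectable effect shrinks with the square root of the cell size, which is why a large sample of men can detect effects far smaller than a small lab sample can.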
A second contribution of our replication is the inclusion of a factual manipulation check asking respondents to recall their score on the experimentally manipulated feedback. We include this check to measure attention to the experimental stimuli without distorting treatment effects (Kane and Barabas 2019). Third, we introduce two additional conditions to the Willer et al. design. Given the lack of pre-test measures and piloting of the experimental manipulations, it is unclear if the treatment is inducing masculinity threat as opposed to a general sense of threat. Indeed, research indicates that people generally respond to threat with shifts toward conservative political and social positions (Bonanno and Jost 2006, but see Brandt et al. 2021). Participants randomly assigned to our “general threat” condition complete an entertainment and popular culture knowledge quiz and receive feedback that they performed poorly and have a low level of popular knowledge; importantly, feedback is not gendered. The selection of this domain is driven by the fact that research finds that women are stereotyped, by both women and men, as knowing more about this topic (Coffman 2014; Bordalo et al. 2019). Nevertheless, we expect that knowledge in this domain will be valued by both men and women, and therefore, negative feedback will induce a sense of threat. We view the condition as a way to distinguish whether a shift to more conservative attitudes among men truly stems from exposure to gender-specific threat or simply from threat evoked from receiving negative feedback in a valued domain.
The second condition we add to the design is a gender threat condition that is similar to the original treatment condition but constrains the range of feedback based on respondents’ actual BSRI scores. One criticism of the original threat condition is that the feedback given may have seemed implausible to some participants. In particular, participants who answered the BSRI questions in alignment with gendered expectations – or, in other words, who had real scores that were on the extreme ends of the scale – may have felt the feedback was particularly implausible. Building on work by DiMuccio and Knowles (2023), we utilize participants’ actual scores on the BSRI and subtract (or add) 20 points to their scores to induce gender threat.Footnote 5
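The constrained feedback rule can be sketched as follows. This is an illustration under stated assumptions, not the survey instrument itself: the 0–100 score range, the convention that higher scores are more masculine, the direction of the shift for each gender, and the clamping to the scale bounds are all hypothetical choices made for the example. Only the 20-point shift comes from the text.

```python
def constrained_feedback(actual_score, gender, shift=20,
                         scale_min=0, scale_max=100):
    """Return the score shown to a respondent in a constrained threat condition.

    Assumptions (hypothetical, for illustration only):
      - scores run from scale_min to scale_max, higher = more masculine;
      - men are shifted toward the feminine end, women toward the masculine end;
      - displayed scores are clamped so they never leave the scale.
    """
    if gender == "man":
        shown = actual_score - shift
    elif gender == "woman":
        shown = actual_score + shift
    else:
        raise ValueError("this sketch only handles 'man' or 'woman'")
    return max(scale_min, min(scale_max, shown))


# Made-up scores showing that feedback stays close to the real score
print(constrained_feedback(70, "man"))    # 50
print(constrained_feedback(35, "woman"))  # 55
```

The point of the design is visible in the example: the displayed score always sits a fixed distance from the respondent's real score, rather than at a fixed value that may look implausible to respondents at the extremes of the scale.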
Fourth and finally, we supplement original outcome variables with updated measures, which better reflect current salient political issues. Additional questions probe opinions on transgender rights, support for decreasing the number of legal immigrants allowed to enter the United States, the desirability of electric vehicles, and preferential hiring policies for women. In addition, to establish discriminant validity, we include an item on marijuana legalization, which polling indicates is not strongly associated with gender or political leanings.
Data
We fielded our replication experiment on a nationally representative sample of American adults (n = 2,774) through NORC’s AmeriSpeak panel.Footnote 6 The AmeriSpeak panel is a probability sample with households selected from a sampling frame designed to provide a minimum of 97% coverage of the U.S. population. Because of our theoretical interest in gender threat towards men, we requested that they be oversampled; 74.7% of our sample identified as men and 24.2% identified as women (1.1% identified outside of the gender binary).Footnote 7 Including leaners, about 40.1% of our sample identified with the Democratic Party and 43.2% with the Republican Party (see Appendix Table C6 for full demographic information).
Replication of the original Willer et al. (2013) study
Table 1 displays replication results corresponding to Study 1 findings in Willer et al.; to facilitate comparison, we also mark statistically significant findings reported in the original study. Replicating their analysis strategy, we test hypotheses with a series of two-tailed t-tests comparing means across conditions split by binary gender identification (see Appendix Table C1 for robustness tests with p-values adjusted for multiple comparisons and Appendix Tables C8–C11 for regression models with demographic controls). All outcome variables are rescaled to range from 0 to 1 for interpretability and are coded such that higher values correspond to more conservative responses. In the original study, exposure to a gender identity threat increased men’s support for war, homophobic attitudes, and interest in purchasing an SUV. In contrast, our replication finds no significant effect of the gender identity threat on any of these attitudes for either men or women.
Table 1. T-test results replicating Willer et al. Study 1

Notes: * p < .05, ** p < .01, *** p < .001. T-score and p-values correspond to our study; † indicates statistically significant difference (p < 0.05) in the original study.
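The analysis strategy behind these comparisons (rescaling each outcome to the 0–1 range, coding higher values as more conservative, and comparing condition means with a two-tailed test) can be sketched as follows. This is a minimal illustration, not the replication code: it assumes Welch's unequal-variance form of the t-statistic and approximates the p-value with the standard normal distribution, which is only reasonable for large samples. The function names and the 1–7 response scale are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def rescale_01(values, low, high, reverse=False):
    """Map responses from [low, high] onto [0, 1].

    Set reverse=True for items where a higher raw value is the LESS
    conservative answer, so that 1 always means most conservative.
    """
    scaled = [(v - low) / (high - low) for v in values]
    return [1 - s for s in scaled] if reverse else scaled


def two_tailed_test(treated, control):
    """Welch t-statistic with a normal-approximation two-tailed p-value.

    Assumption: samples are large enough that the t distribution is
    close to the standard normal.
    """
    se = sqrt(variance(treated) / len(treated)
              + variance(control) / len(control))
    t = (mean(treated) - mean(control)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Made-up 1-7 responses, far too few for the approximation to be trusted;
# they only show how the pieces fit together.
threat = rescale_01([5, 6, 4, 7, 5, 6], low=1, high=7)
control = rescale_01([3, 4, 2, 5, 4, 3], low=1, high=7)
t_stat, p_value = two_tailed_test(threat, control)
print(f"difference in means: {mean(threat) - mean(control):.3f}")
```

In the design described above, such a comparison would be run separately for men and for women on each outcome.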
Table 2 presents our replication results for the outcomes originally examined in Study 2 of Willer et al. The central finding in the original study was that gender identity threat led to increased endorsement of social dominance attitudes (Pratto et al. 1994) and traditionalist beliefs (Jost et al. 2003) among men. The authors also explored the effects of gender threat on system justification (Jost et al. 2004) and political conservatism. These results were interpreted as evidence that men respond to masculinity threats by reasserting dominance and reinforcing hierarchical beliefs. Again, we test hypotheses with a series of two-tailed t-tests comparing means across conditions, and we mark statistically significant findings from the Willer et al. study. Contrary to the original findings, we observe no significant effect of gender identity threat on any of these outcomes for either men or women.
Table 2. T-test results replicating Willer et al. Study 2

Notes: * p < .05, ** p < .01, *** p < .001. T-score and p-values correspond to our study; † indicates statistically significant difference (p < 0.05) in the original study.
Discussion
What explains our observed null results? One plausible reason our findings diverge is that some of the original study’s outcome measures – such as support for the Iraq War or opposition to gay rights – may no longer resonate with most Americans, as the Iraq War has ended and public opinion in the United States is now overwhelmingly in favor of same-sex marriage and broader gay rights (McCarthy 2022).Footnote 8 To evaluate whether this shift in political context is responsible for our null findings, we examine effects on our supplementary items covering salient contemporary issues: immigration, electric vehicles, transgender rights, and gender-based affirmative action policies.
Figure 1 summarizes results. The left-hand panel shows mean responses by experimental condition (control versus gender identity threat), while the right-hand panel displays average treatment effects. Once again, for women in our sample, we see no impact of gender identity threat on any of the contemporary political attitudes we measured. For men in our sample, we observe null effects for electric car desirability, support for legal immigration, marijuana legalization, and recognition of transgender identities. We do find that men whose gender identity is threatened are less supportive of gender-based affirmative action, as Willer et al.’s (2013) theory and other work linking masculinity threat to the assertion of dominance over women (Dahl et al. 2015) would suggest; however, this finding is not robust to correcting for multiple comparisons (see Appendix Table C1). In sum, we find little evidence that null results are attributable to outdated outcome measures.

Figure 1. Means for updated outcome variables (left) and average treatment effects (right) split by binary gender.
Note: Darker bars indicate 95% confidence intervals, and thinner bars indicate 90% confidence intervals.
A second explanation for our null finding is that some respondents may have questioned the credibility or realism of the treatment.Footnote 9 To examine this possibility, we can evaluate if treatment effects appear in our modified gender threat condition wherein participants received BSRI feedback constrained to not fall too far from their actual scores. A third explanation is that the original effects in Willer et al. were driven not by masculinity threat, but by a generally induced threat to which men were more responsive. If true, and the masculinity threat in our study did not induce a general threat – perhaps because masculinity was not sufficiently salient to respondents, or alternatively because it was so salient as to engender resistance to feedback – this could explain why our results differed from those in the original study. To investigate this explanation, we can see whether our general threat treatment condition produced effects.
Figure 2 summarizes results for these two additional treatments. For most of the original Willer et al. items, we find null effects for both our constrained masculinity and general threat treatments. Contrary to expectation, we observe that men in the general threat condition report significantly lower social dominance attitudes. We also find that women in the general threat condition express significantly lower support for affirmative action policies and greater interest in purchasing an SUV. However, none of these effects remain statistically significant after correcting for multiple comparisons (see Appendix Table C1). Null results thus do not appear to be attributable to the credibility of our treatments or to the original study’s findings being driven by general as opposed to masculinity threat.

Figure 2. Average treatment effects by constrained feedback and general threat conditions.
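Several nominally significant effects in this section do not survive correction for multiple comparisons. The specific correction used in Appendix Table C1 is not described in this section, so the sketch below uses the Holm step-down procedure purely as an assumed example of how such an adjustment works; the function name and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each is
    capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Made-up p-values for three outcomes tested in the same family
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
for r, a in zip(raw, adjusted):
    print(f"raw p = {r:.2f} -> adjusted p = {a:.2f}")
```

With these made-up inputs, an effect with a raw p-value of .04 no longer clears the .05 threshold once it is evaluated as one of three tests, which is the pattern described for the isolated significant effects above.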
In our Appendix, we rule out additional explanations for null results. Balance tests on observables confirm that randomization was effectively implemented (see Table C2); we appear sufficiently powered for our analyses (see Appendix B); differences in respondent attention do not appear to explain null results (see Tables C12–C15);Footnote 10 and results are unchanged if we remove survey weights (see Table C16). Finally, we consider three further possible explanations: survey mode, as well as the size and characteristics of the original Willer et al. sample. Regarding mode, it is possible that effects in the original study are attributable to the treatment having greater salience in an in-person setting. We view this explanation as unlikely for several reasons. Our results do not differ between high- and low-attention respondents. We also find null results even in the more credible, constrained gender-threat condition. Moreover, online studies frequently detect effects from treatments that are considerably more complex than ours. Nevertheless, if this explanation were correct, it would imply that masculinity-threat effects arise only under a narrow and restrictive set of conditions.
As concerns sample characteristics, it could be the case either that masculinity threat is particularly relevant to the convenience sample of university students in the Willer et al. study or that it is especially easy to induce threat among this set of participants.Footnote 11 While we do not observe that age moderates treatment effects in our replication (see Appendix Table C21), we cannot rule out this explanation; however, as with mode effects, it would indicate a rather limited set of conditions under which we might expect induced threat to affect political attitudes. Finally, we note that the small sample size in the original study may have produced a type I error, or false positive, indicating either that experimentally inducing masculinity threat is unlikely to affect political attitudes or that the link between masculinity threat and such attitudes is not causal.
Conclusion
We conducted a replication and extension of foundational work by Willer et al. (2013) on the link between masculinity threat and political attitudes with a larger, nationally representative sample of U.S. adults. We do not find evidence that the original effects replicate. While we do not observe any evidence that contrasting findings are attributable to design differences between our replication and the original study, we cannot entirely rule out their potential impact. In particular, the move from a small, homogeneous student sample in a controlled lab environment to a large, national probability sample is notable and could have altered the salience of the masculinity manipulation. Further, the two studies were conducted nearly twenty years apart; although we introduced updated dependent variables to capture possible changes in associations of different attitudes with masculinity, due to space constraints, we were unable to confirm whether these associations are realized in the U.S. public today.
Our results call for further examination into whether and when masculinity threat shapes political attitudes. Understanding which design differences, if any, are responsible for contrasting findings could further help to elucidate possible theoretical scope conditions. Are young men particularly likely to adopt conservative attitudes in response to masculinity threats? Do men assert their masculinity through different means or forms of expression today than they did 20 years ago? Are some settings particularly conducive for inducing masculinity threat? We conclude that more work needs to be done to investigate and clarify the relationship between masculinity threat and political attitudes.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/XPS.2025.10027.
Data availability
Support for this research was provided by Time-sharing Experiments for the Social Sciences (TESS). Data collected by Time-sharing Experiments for the Social Sciences, NSF Grant 2017464, Maureen Craig, James Druckman, and Jeremy Freese, Principal Investigators. TESS is funded by the Social, Behavioral, and Economic Sciences Directorate of the National Science Foundation. The data, code, and any additional materials required to replicate all analyses in this article are available at the Journal of Experimental Political Science Dataverse within the Harvard Dataverse Network at https://doi.org/10.7910/DVN/5FVQBW.
Acknowledgements
In addition to the anonymous reviewers, we thank the members of the Center for the Experimental-Philosophical Study of Discrimination (CEPDISC, Aarhus University) for helpful comments on this project. We are also grateful to Robb Willer for his comments throughout the review process. We gratefully acknowledge funding support from Time-sharing Experiments for the Social Sciences (TESS) and the Danish National Research Foundation (DNRF144).
Competing interests
The authors are not aware of any conflicts of interest.
Ethics statement
This study received ethical approval from the Research Ethics Committee at Aarhus University (No. BSS-2024-048-S1). The research adheres to APSA’s Principles and Guidance for Human Subjects Research. See Appendix E.

