Motivated Numeracy and Active Reasoning in a Western European Sample

Recent work by Kahan et al. [15] on the psychology of motivated numeracy in the context of intra-cultural disagreement suggests that people are less likely to employ their capabilities when the evidence runs contrary to their political ideology. This research has so far been carried out primarily in the United States, regarding the liberal-conservative divide over gun control regulation. In this paper, we present the results of a conceptual replication with Western European participants regarding both the hierarchy-egalitarianism and individualism-communalism divides over immigration policy (n=746). We reproduce the motivated numeracy effect, though we do not find evidence of increased polarization of high-numeracy participants.


INTRODUCTION
People disagree about key societal issues in the face of compelling scientific evidence. Such disagreements have significant societal impacts not only with regard to decision making (e.g., whether to vaccinate children) but also with regard to political polarization between groups. Why do seemingly intractable disagreements about policy arise?
According to the "Identity-protective Cognition Thesis" (ICT), the answer is that human reasoning is negatively affected when new information threatens prior beliefs and values. In a previous study with American participants, Kahan et al. [15] found support for this hypothesis. People with high numeracy seem to use their reasoning skills selectively: when the topic about which they were asked to exercise their reasoning skills was unrelated to their political identities (whether a skin cream cured rashes), high-numeracy liberals and conservatives both performed well. However, when the topic was related to their political identities (whether gun control is effective policy), high-numeracy liberals tended to successfully exercise their capabilities only when the evidence suggested that gun control is effective, whereas high-numeracy conservatives tended to successfully exercise their capabilities only when the evidence suggested that gun control is not effective. It may not be surprising that responses became politically polarized when answering questions about a gun-control ban, but what was remarkable in Kahan et al. [15] was that polarization was higher among high-numeracy individuals than among low-numeracy individuals. This suggests that the quantitative reasoning skills of highly numerate participants can become suppressed, which portends starker disagreement between more numerate partisans than between less numerate partisans.
In this study we investigated whether a similar result can be found in a Western European sample of participants, and for a different controversial topic (migration policies). In addition, we were interested to see whether encouraging active reasoning in one of two ways might mitigate the effect. We thus examine the following two research questions:
RQ1: Do some active reasoning interventions do a better job than others at improving numeric reasoning overall?
RQ2: Can we replicate the polarizing effect of identity-protective cognition on numeracy for a different controversial topic in a different population?
Here is the plan for this paper: in Section 2, we contextualize our study in the published literature on motivated numeracy and active reasoning. Then, in Section 3 we explain the methodology used for the current study. In Section 4, we lay out our results and address RQ1 and RQ2. Finally, in Section 5 we discuss limitations of the current study and explore opportunities for future work on this important topic.

RELATED WORK
In this section we summarize the extant research in the area of motivated numeracy. We also explain our use of active reasoning inductions, and why we believe such inductions may help temper the ill effects of motivated numeracy. To the best of our knowledge, this is the first study to investigate the effect of active reasoning interventions on motivated numeracy.

Motivated Numeracy
Motivated numeracy is a species within the larger genus of motivated cognition. The overarching category includes processes and dispositions related to seeking out evidence, trusting and distrusting sources of information, interpreting evidence and counter-evidence, weighting competing criteria in decision-making, remembering information, noticing inferential connections, and so on. Much motivated cognition is normatively unobjectionable, even desirable. There is nothing wrong with people seeking out information related to topics and issues they care about rather than those they do not. Additionally, if someone lacks epistemic motivation entirely, they are unlikely to engage in inquiry. However, motivated reasoning can turn vicious when it leads people to disregard or misinterpret, for identity-protective reasons, key evidence that they would otherwise be well-positioned to process.
Motivated numeracy specifically crops up in those cases in which people need to exercise their learned capacity to interpret data, tables, and figures. In such a context, there is typically a clear right answer dictated by the evidence. This makes the study of motivated numeracy more interpretable than the study of, for instance, risk perception. When social scientists such as Kahan et al. [2005] study attitudes towards new technologies like nanoparticles, it is often difficult even for experts to say exactly how the risks and benefits should be weighed against one another. If some people focus more on the risks while others focus more on the benefits, they may come to different conclusions and yet both be reasoning unobjectionably. Indeed, Alfano [2019] argues that the same person may come to opposite evaluations if they approach the evidence first skeptically, then in a trusting mode. When it comes to interpreting a graph or a contingency table, though, there is a definitive correct answer. This means that researchers can use numeracy tasks to examine not just faultless differences in risk-aversion but outright errors in reasoning, which brings us to Kahan et al. [15].
Participants in Kahan and colleagues' study were presented with a contingency table like the one pictured in Figure 1.
Of the patients who did not use the cream, 83.6 percent got better. Thus, even though more patients who used the cream got better, the likelihood of getting better given that one used the cream was lower than the likelihood of getting better given that one did not. Kahan et al. [15] found that higher-numeracy participants were better able to interpret the contingency table than lower-numeracy participants. In the skin cream conditions, participants' political partisanship had no effect on their responses. However, in the gun control conditions, partisan participants tended to answer correctly only when they saw ideologically-friendly data: high-numeracy liberal Democrats gave the correct answer primarily when the table suggested that gun control worked, whereas conservative Republicans gave the correct answer primarily when the table suggested that gun control did not work. Moreover, polarization was more evident between high-numeracy partisans than between low-numeracy partisans. Kahan and colleagues explain these results, and in particular the polarization, as stemming from identity-protective cognition. Essentially, the idea is that identity-related commitments (e.g., to minimal regulation of firearms or to strong regulation of firearms) can bump up against the facts, and that when such clashes occur people tend to hold tight to their commitments and ignore or misinterpret the facts.
[...] conservatives and liberals across a range of controversial issues, including not only gun control but also health-care reform, nuclear power, and same-sex marriage. Fourth, Khanna and Sood [2018] conducted three studies, all using some form of firearms regulation as the controversy, that again replicated the original finding. Finally, Nurse and Grant [2019] conducted a conceptual replication with Australian participants (N=504) using anthropogenic climate change rather than gun control as the controversial topic; this conceptual replication also succeeded in finding the effect of motivated numeracy.
Thus, to date, all but one of the studies of motivated numeracy have involved participants from the United States.
Direct replications will presumably continue to employ American participants, since gun control is not nearly as controversial in the vast majority of other countries as it is in the States. In addition, all five of these replication studies used a unidimensional measure of political ideology, along the traditional left-right spectrum. While the unidimensional measure is adequate for many purposes, we suspect that it may obscure some interesting differences. For that reason, we measure political orientation along two dimensions, hierarchy-egalitarianism and individualism-communitarianism (see Section 3.2.1).

Active Reasoning
Critical thinking, and with it the avoidance of the ill effects of motivated reasoning, is a highly valued skill but a difficult one to teach or nurture. Unfortunately, critical thinking is a skill that is often missing even among people holding a degree in a scientific field of study [Shtulman 2013]. It is difficult to undermine unfounded beliefs by simply pointing out alternative explanations. Indeed, trying to correct such beliefs might even strengthen people's initial beliefs [Lewandowsky et al. 2012; Nguyen et al. 2007]. In particular, such backfiring is liable to occur when the argument threatens someone's identity or falls outside the boundaries of what they consider acceptable.
One way to address this problem is to present information with sufficient support and guidance. Additionally, it is crucial to support critical thinking early, as it is most likely to exert an influence at the time of message exposure [Lewandowsky et al. 2012].
Extant research documents encouraging evidence for various active reasoning approaches that support critical thinking. In the classroom, an effective method to foster active reasoning has been to ask students to themselves generate counter-arguments for unfounded beliefs [Miller and Wozniak 2001]. Teaching such active reasoning skills and pointing out flawed argumentation techniques used by providers of misinformation has also been shown to be effective to reduce belief in false information [Cook et al. 2017]. The results suggested a slight increase in item acceptance. Other work introduced a light-weight but effective protocol for supporting debate in a classroom activity with university students. The findings suggest that this intervention led to a statistically significant belief change, and that this change was in the direction of the position best supported by scientific evidence. However, the intervention combined several aspects (including exposure to a lecture on critical thinking, and seeing the arguments of peers), which does not allow us to draw conclusions about the effects of individual aspects [Holzer et al. 2018].
Further, some authors argue that online debate could reduce beliefs in pseudoscientific claims [Holzer et al. 2015; Tsai et al. 2015], possibly leveraging the fact that arguments from peers can be more persuasive than those coming from more authoritative figures [Garrett 2011]. In this vein, rbutr is a software solution that scaffolds peer debates on controversial information right where it appears (see footnote 2). It does so by allowing users to post and rate rebuttals for web pages through a browser plugin. In this way, any web page can become a live debate platform. This is in line with a view that there should be a World Wide Argument Web, connecting arguments with each other online (see Schneider et al. [2013] for a review).
In light of this previous work, we posit that a procedure that encourages active reasoning could decrease the extent to which identity-protective cognition manifests. To clarify this issue, we designed a replication study measuring identity-protective cognition with two active reasoning manipulations (one with online argumentation, the other using online search).

EXPERIMENT
This experiment is a conceptual replication of the study by Kahan et al. [15].

Stimulus
As in the original study, the stimulus consisted of four versions of a problem involving the interpretation of data and causal inference. The results were reported in a two-by-two contingency table, the columns of which specified the number of cases that reflected positive and negative results, respectively, and the rows of which reflected the experimental treatment (see Figure 1). These were on two different topics: Medicine and Policy.

Footnote 2: http://rbutr.com, retrieved August 2019
Medicine. For the skin rash treatment topic, there were two versions of the experiment. These two versions differed only in terms of which result they supported: the labels at the tops of the columns ("Rash got better" vs. "Rash got worse") were reversed. The contingency table below the labels describes a number of patients suffering from skin rashes, where some have received treatment and others have not. The table indicates how many patients got better, and the participant is asked to indicate either that "the people who used the skin cream were more likely to get better than those who didn't" or that "the people who used the skin cream were more likely to get worse than those who didn't." These stimuli are identical to those used in the original Kahan et al. [15] study.
Policy. Two conditions of the experiment involved a new immigration policy. The contingency table describes the effectiveness of a strict new immigration policy; in one condition, the stricter policy is effective, in the other not.
The table indicates the number of people whose level of radicalization decreased and the number whose level of radicalization increased. The wording was kept as comparable as possible to the original Kahan et al. study:
"Terrorism researchers have developed a new policy for identifying radicalization in recent immigrants. New policies often work but sometimes lead to additional radicalization. Even when policies don't work, radicalization sometimes decreases and sometimes increases randomly. As a result, it is necessary to test any new policy in an experiment to see whether it leads to more or less radicalization. Researchers have conducted an experiment on recent immigrants at risk of radicalization. In the experiment, one group of border security officers applied a stricter entrance policy and a second group did not apply the stricter entrance policy. For each group, the number of people whose level of radicalization decreased and the number whose level of radicalization increased are recorded in the table below. Because security officers do not always complete studies, the total number of participants in each of the two groups is not exactly the same, but this does not prevent assessment of the results. Please indicate whether the experiment shows that using the strict new policy is likely to make radicalization decrease or increase."
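The correct answer to either stimulus comes from comparing the two rows of the table as proportions rather than comparing raw counts. A minimal sketch of that comparison follows. The cell counts are illustrative assumptions, not taken from the study's Figure 1; they were chosen only so that the untreated row matches the 83.6 percent quoted in Section 2. The function names are hypothetical.

```python
# Illustrative 2x2 contingency table. The counts are assumptions for this
# sketch, not the study's stimulus values.
table = {
    "treated": {"improved": 223, "worsened": 75},
    "untreated": {"improved": 107, "worsened": 21},
}


def improvement_rate(row):
    """Share of cases in one row that improved."""
    return row["improved"] / (row["improved"] + row["worsened"])


def correct_answer(table):
    """Compare row-wise improvement rates (ties are not handled here)."""
    treated = improvement_rate(table["treated"])
    untreated = improvement_rate(table["untreated"])
    return "treatment helps" if treated > untreated else "treatment hurts"


rates = {name: round(improvement_rate(row), 3) for name, row in table.items()}
print(rates)
print(correct_answer(table))
```

With these counts the treated row has the larger number of improved cases but the smaller improvement rate, which is the trap that the task sets for readers who compare raw counts.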

Procedure
In a between-subjects design, participants were assigned to one of 8 conditions (2 by 2 by 2 design):
• Result polarity (2): intervention caused improvement, intervention caused decline
• Topic (2): medical treatment, immigration policy
• Active reasoning (2): browser search, rbutr

Participants first supplied basic demographic information. Then they were asked to spend some time actively and critically researching their topic (medical treatment or immigration policy), e.g., "Do modern medical treatments work? How effective are they? What strengths or flaws do they have?" Depending on the condition, participants were either asked to use the rbutr website, or to use their preferred method for finding information online. The rbutr system is a website and plugin where users supply links to articles that "rebut" or argue against the points made in other articles. The browser search condition, which served as the active control, was described in the following way: "To answer these questions, please use your preferred method for getting information online. Please spend approximately 10 minutes searching, reading, or watching videos to learn about the quality of medical research." Both active reasoning interventions were accompanied by a 10 minute timer that prevented participants from moving to the next stage before they had done some research.
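The 2 by 2 by 2 design above can be enumerated mechanically. The sketch below lists the eight cells and assigns participants to them. The balanced, shuffled assignment scheme and the seed are assumptions made for illustration; the text does not state how assignment was actually carried out.

```python
import itertools
import random

# Factor levels as named in the design description above.
FACTORS = {
    "polarity": ["improvement", "decline"],
    "topic": ["medical treatment", "immigration policy"],
    "active_reasoning": ["browser search", "rbutr"],
}

# Every combination of factor levels: 2 x 2 x 2 = 8 cells.
CONDITIONS = [
    dict(zip(FACTORS, levels)) for levels in itertools.product(*FACTORS.values())
]


def assign(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the cells.

    This keeps cell sizes within one of each other (an assumed scheme).
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


assignment = assign(range(746))
```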
Next, participants completed a questionnaire about their political affiliation (see Section 3.2.1) and a questionnaire assessing their numeracy skills (see Section 3.2.2). The experiment was concluded with a free text comment box for remaining questions or comments from participants.
3.2.1 Political orientation. The Kahan et al. study that we are replicating used self-reports on the continuum between conservative Republican and liberal Democrat. To broaden the study to European political views, we used a questionnaire containing two validated scales to measure political affiliation [Kahan 2012]. In this questionnaire, participants indicate the level of their disagreement or agreement with each item on a Likert response measure. Responses are then aggregated (with appropriate reverse-coding of the "E" and "C" items) to form continuous "Hierarchy-egalitarianism" (H-E, 13 items) and "Individualism-communitarianism" (I-C, 17 items) worldview scores. Here is an example item from the I-C scale associated with high individualism: "People who are successful in business have a right to enjoy their wealth as they see fit." And here is an example item from the H-E scale associated with high hierarchy: "It seems like the criminals and welfare cheats get all the breaks, while the average citizen picks up the tab." A full list of items can be found in Kahan et al. [2007].
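The aggregation with reverse-coding described above can be sketched as follows. The 1 to 6 response range, the use of a mean rather than a sum, and the item labels are all assumptions made for illustration; the text specifies only that a Likert measure was used and that the "E" and "C" items are reverse-coded.

```python
def score_scale(responses, reverse_items, scale_min=1, scale_max=6):
    """Average the ratings of one scale after reverse-coding selected items.

    responses: dict mapping item label -> rating
    reverse_items: set of item labels to reverse-code
    scale_min, scale_max: assumed end points of the Likert measure
    """
    total = 0
    for item, rating in responses.items():
        if item in reverse_items:
            # Reverse-coding maps scale_min <-> scale_max, and so on inward.
            rating = scale_min + scale_max - rating
        total += rating
    return total / len(responses)


# Hypothetical respondent: agrees with hierarchy items, disagrees with
# egalitarianism items, so the H-E score comes out high (hierarchical).
responses = {"H1": 5, "H2": 6, "E1": 2, "E2": 1}
he_score = score_scale(responses, reverse_items={"E1", "E2"})
print(he_score)
```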

3.2.2 Numeracy. To assess numeracy competence, participants completed the questions in a validated numeracy questionnaire [Weller et al. 2013]. Questions range in difficulty to make it possible to distinguish between participants with various levels of numeracy. A relatively easy question is, "In the ACME PUBLISHING SWEEPSTAKES, the chance of winning a car is 1 in 1000. What percent of tickets of ACME PUBLISHING SWEEPSTAKES win a car?" A relatively difficult question is, "Which of the following numbers represents the biggest risk of getting a disease? (1 in 12 or 1 in 37)."

RESULTS
All analyses were conducted in R [Core Team 2018]. Following Kahan and colleagues, primary analyses used multiple imputation to handle missingness. Missingness was low: the most affected variables were two items within the Individualism-communitarianism scale, each with 7 missing responses (less than 1 percent). Multiple imputation was performed using the 'mice' R package [van Buuren and Groothuis-Oudshoorn 2011], and type-II Analyses of Variance (ANOVAs) were performed using the 'miceadds' R package [Robitzsch et al. 2018].
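When a model is fitted separately to each imputed dataset, the per-dataset coefficients are commonly combined with Rubin's rules. The sketch below shows that pooling step in Python purely as an illustration; the analyses themselves were run in R, and whether they followed exactly these steps is an assumption. The example estimates and standard errors are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5


# Hypothetical coefficients and standard errors from three imputations.
estimates = [0.08, 0.10, 0.09]
variances = [0.15 ** 2] * 3
pooled_b, pooled_se = pool_rubin(estimates, variances)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term adds the uncertainty due to the missing data.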

Participants
Participants were recruited on the Prolific platform, with a filter for participants registered as British or Dutch to ensure a European sample with high English comprehension. In total, 746 participants completed the study (61 percent female). The majority (68 percent) were British, and a small minority (2 percent) were Dutch, though 28 percent did not specify a nationality. The mean age was 34.75 (SD = 11.61). The majority of participants had completed either college (227) or a Bachelor's degree (294), but there were participants at Elementary school level (7), High school (111)

Preliminary analysis
We first investigated whether numeracy skills were different based on mean-splits of political scores. Welch Two Sample t-tests indicated that numeracy scores differed between high and low scorers on H-E (p < 0.001, Cohen's d = 0.28) and high and low scorers on I-C (p < 0.001, Cohen's d = 0.27). In each case more liberal participants (who scored below the mean on the political scales) scored higher on numeracy.
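The mean-split comparison above can be sketched with the standard formulas for Welch's t statistic, its Welch-Satterthwaite degrees of freedom, and Cohen's d with a pooled standard deviation. This is an illustrative Python version, not the R code used for the analysis. The toy data are hypothetical, and the p-value is omitted because it would need a t-distribution CDF that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def mean_split(scores, values):
    """Split `values` by whether the paired score is below the mean score."""
    cutoff = mean(scores)
    low = [v for s, v in zip(scores, values) if s < cutoff]
    high = [v for s, v in zip(scores, values) if s >= cutoff]
    return low, high


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical toy data: ideology scores paired with numeracy scores.
ideology = [1, 2, 3, 4, 5, 6]
numeracy = [6, 5, 6, 4, 3, 4]
low_group, high_group = mean_split(ideology, numeracy)
t_stat, df = welch_t(low_group, high_group)
d = cohens_d(low_group, high_group)
```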
RQ1: Do some active reasoning interventions do a better job than others at improving numeric reasoning overall?
Overall, participants selected the correct interpretation of the data table only 43 percent of the time, which was significantly lower than chance, 95% CI = [.39, .46]. This is similar to the result in Kahan et al. [15], who found 41 percent correct interpretation.
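The comparison against chance rests on whether the confidence interval for the proportion of correct answers excludes 0.5. A minimal sketch using the normal-approximation (Wald) interval follows. The interval method is an assumption, since the text does not say which was used, and the count of correct answers is reconstructed from the rounded 43 percent, so the end points may differ slightly from the reported ones.

```python
from math import sqrt


def wald_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


n = 746
correct = round(0.43 * n)  # assumption: count reconstructed from the rounded rate
low, high = wald_ci(correct, n)

# Performance is below chance if the whole interval lies under 0.5.
below_chance = high < 0.5
```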
To test whether the active reasoning manipulation affected the accuracy of responses, we fit a logistic regression predicting correct responses (1 = correct, 0 = incorrect) from a dummy indicating condition (1 = active reasoning manipulation, 0 = control). Active reasoning condition had no significant effect on response accuracy, b = 0.09, SE = 0.15, t(741.9) = 0.62, p = 0.53. Moreover, there were also no significant two-way interaction effects between active reasoning and topic or result polarity, and no three-way active reasoning by topic by polarity interaction (all p > 0.16).
These results suggest that there is no significant difference between the two active reasoning interventions. However, there were some issues with the platform used (rbutr), which are addressed in the discussion. Given the similar performance across the active reasoning interventions, we also collapsed across these two conditions in further analyses. We also compared whether the topic manipulation (medicine and policy) affected accuracy of responses. The average number of correct responses was lower for the policy topic (40 percent) compared to the medicine topic (45 percent), but this difference was not statistically significant (p = 0.16).
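For a logistic regression with a single dummy predictor, such as the condition model above, the slope equals the log odds ratio of the corresponding 2x2 table, and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below illustrates this identity. The cell counts are hypothetical and are not the study's data, and the function name is an assumption.

```python
from math import log, sqrt


def logit_slope_from_table(correct_1, wrong_1, correct_0, wrong_0):
    """Slope, Wald SE, and z for a logistic regression on one dummy.

    correct_1, wrong_1: outcome counts where the dummy equals 1
    correct_0, wrong_0: outcome counts where the dummy equals 0
    """
    b = log((correct_1 / wrong_1) / (correct_0 / wrong_0))  # log odds ratio
    se = sqrt(1 / correct_1 + 1 / wrong_1 + 1 / correct_0 + 1 / wrong_0)
    return b, se, b / se


# Hypothetical counts for the two levels of the dummy.
b, se, z = logit_slope_from_table(40, 60, 35, 65)
```

A slope whose absolute z value is well under 1.96 would, as in the reported result, give no evidence of a difference between the two groups.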

RQ2: Can we replicate the polarizing effect of identity-protective cognition on numeracy for a different controversial topic in a different population?
Based on the findings of Kahan et al. [15], we hypothesized that individuals' political orientations would interact with topic (medicine vs. policy) and result polarity (intervention leads to increase vs. decrease in rashes/radicalization) in determining the probability of correct responses among individuals higher in numerical reasoning ability. Specifically, we hypothesized that liberal-leaning respondents high in numerical reasoning ability would be more likely to respond correctly when the data supported a more liberal policy stance (i.e., when the stricter entrance policy increased radicalization), while more conservative-leaning respondents high in numerical reasoning would be more likely to respond correctly in the policy condition when the data supported a more conservative political stance (i.e., when the stricter entrance policy decreased radicalization). By contrast, we expected that in the medicine condition, result polarity would have no effect on response accuracy, regardless of respondents' ideology or numeracy. This hypothesis entails a predicted four-way interaction: topic by polarity by respondent numeracy by respondent political ideology.
To test this hypothesis, we fit two separate logistic regression models predicting correct responses from a dummy indicating the topic (0 = medicine, 1 = policy), a dummy indicating response polarity (0 = intervention decreases outcome, 1 = intervention increases outcome), respondents' numeracy scores, and respondents' political ideology (one model using H-E and one using I-C as the measure of political ideology).

Figure 2 displays the predicted probabilities of answering correctly for each topic and polarity type. Consistent with a motivated numeracy account, more egalitarian and collectivist respondents were generally more likely to select the correct answer, but when results ran counter to an egalitarian world-view (the policy/decrease condition, in which stricter border policies led to reduced radicalization), more egalitarian and collectivist respondents became less likely to select the correct answer, and more hierarchical and individualistic respondents became more likely to select the correct answer.

Fig. 3. Predicted probabilities of correct answer by numeracy (with ideology set to its mean) and political ideology (with numeracy set to its mean) from models using H-E (black lines) and I-C (red dashed lines) as the measure of political ideology.
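A model containing the predicted four-way interaction also contains every lower-order term among the same four predictors. The sketch below enumerates those terms and turns a set of coefficients into a predicted probability through the inverse logit. The predictor names follow the text, while the function names and any coefficient values passed in are hypothetical; no fitted estimates from the study are reproduced here.

```python
from itertools import combinations
from math import exp, prod

PREDICTORS = ["topic", "polarity", "numeracy", "ideology"]


def interaction_terms(predictors):
    """All main effects and interactions, up to the full four-way product."""
    return [
        combo
        for k in range(1, len(predictors) + 1)
        for combo in combinations(predictors, k)
    ]


def predicted_probability(row, coefs, intercept=0.0):
    """Inverse logit of the linear predictor.

    row: dict mapping predictor name -> value for one respondent
    coefs: dict mapping a term (tuple of predictor names) -> coefficient
    """
    eta = intercept + sum(
        weight * prod(row[name] for name in term) for term, weight in coefs.items()
    )
    return 1 / (1 + exp(-eta))


TERMS = interaction_terms(PREDICTORS)  # 4 + 6 + 4 + 1 terms
```

The last entry of `TERMS` is the four-way product, whose coefficient is the one that carries the hypothesis of greater polarization among high-numeracy respondents.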

DISCUSSION
The main finding of this study is that a motivated numeracy effect can be conceptually reproduced in a Western European sample using immigration policy rather than gun control as the controversial topic. In addition, we find that both the H-E and the I-C dimensions of political orientation are associated with this motivated numeracy effect. However, we were not able to reproduce the four-way interaction that would indicate greater polarization among high-numeracy than among low-numeracy partisans. This may be due to differences between the American participants in the original study and our European participants, to the difference between the gun control controversy and the immigration controversy, or to some other (set of) factor(s). We also note that there is evidence that high-numeracy partisans tend to place different evaluative emphasis on the same conditional probabilities [Van Boven et al. 2019], which might partially explain our results. That said, we also found no evidence of convergence among high-numeracy participants with opposing ideologies. That is to say, we found no evidence that being high in numeracy led to reduced polarization, which is what one might naively hope for. In addition, we found no evidence that different active reasoning inductions mitigated the motivated numeracy effect differently.
In the replicated paper, Kahan and colleagues pit the "science comprehension thesis" (SCT) against the "identity-protective cognition thesis" (ICT). Strictly speaking, these are not inconsistent. Problems in public discourse and deliberation could be due to multiple causes, including both poor overall science comprehension and identity-protective cognition on the part of those who would otherwise be well-positioned to understand and interpret scientific evidence.
Our results suggest that both may be in play. The participants who were low in numeracy would have done better to flip a coin than to trust their own reasoning. The participants higher in numeracy did slightly better than chance, but showed signs of identity-protective cognition and resulting polarization. Together, these results suggest that both improving education and dampening the effects of identity-protective cognition are worth pursuing.
We conclude by discussing the prospects of active reasoning inductions, several limitations of the current study, and directions for future research.

Active reasoning
Motivated numeracy about politically contentious issues presents a serious challenge to democratic deliberation and decision making. In this study, we compared two active reasoning inductions to see whether either was more successful than the other at mitigating the motivated numeracy effect: inviting participants to use their own preferred method of information-seeking about the topic versus using the rbutr interface. The results were inconclusive. We found no evidence that either approach is more effective than the other.
In both conditions participants displayed the motivated numeracy effect, at a similar level as the original study.
This suggests that the failure to replicate the four-way interaction of polarity, topic, numeracy, and ideology is not easily explained by the presence of (some form of) active reasoning in all conditions. That is, active reasoning did not improve numeric reasoning directly, although a more complex interaction may have occurred.
This could be due to any number of causes. For instance, several participants in the rbutr condition reported that the interface was hard to use or broke down. We hold out hope that a different active reasoning induction may help to mitigate the motivated reasoning effect.

Limitations
Our study has several limitations. First, as mentioned above, numeracy and political orientation were confounded for both ideology subscales. Participants with egalitarian (communitarian) politics tended to score higher on the numeracy scale than those with hierarchical (individualist) politics. A follow-up study using stratified sampling would address this limitation. Second, we deviated from our pre-registered data collection plan. In the pre-registration, we aimed to collect data from 1600 participants. In the end, we could only afford to collect data from 746 participants. This is still a sizable dataset, but with a larger sample we may have been able to detect a potential four-way interaction as in Kahan et al.'s original study, though it is worth pointing out that the four-way interaction was nowhere near the threshold for statistical significance in our data.

Future directions
We close on a pessimistic and skeptical note about the prospects of dampening identity-protective cognition. In their original paper, Kahan and colleagues suggest that this is possible, and point to a book review by Kahan et al. [2006] of Sunstein [2005] as providing a method for overcoming identity-protectiveness. However, that method turns out to be self-affirmation exercises, which were first developed in the context of responding to stereotype threat [Cohen et al. 2000]. Alas, the literature on stereotype threat seems not to be replicating well [Flore and Wicherts 2014; Paulette et al. 2019], which indicates that self-affirmation is a solution in search of a problem. Of course, this does not mean that self-affirmation cannot be the solution to a different problem. Does self-affirmation dampen identity-protective cognition? Further research is needed to shed light on this question.
We are more enticed, though, by the prospect of using identity itself to dampen identity-protective cognition.
Paradoxical as this might sound, it seems quite promising. The way this would work is by cultivating identities that incorporate epistemic aims (e.g., accuracy, reliability, reasonableness). Someone who embodies such an identity would