How Deliberation Happens: Enabling Deliberative Reason

We show, against skeptics, that however latent it may be in everyday life, the ability to reason effectively about politics can readily be activated when conditions are right. We justify a definition of deliberative reason, then develop and apply a Deliberative Reason Index (DRI) to analysis of 19 deliberative forums. DRI increases over the course of deliberation in the vast majority of cases, but the extent of this increase depends upon enabling conditions. Group building that activates deliberative norms makes the biggest difference, particularly in enabling participants to cope with complexity. Without group building, complexity becomes more difficult to surmount, and planned direct impact on policy decisions may actually impede reasoning where complexity is high. Our findings have implications beyond forum design for the staging of political discourse in the wider public sphere.


Deliberative democracy is now arguably the main theme in both democratic theory and the practice of democratic innovation. Yet deliberative reasoning appears excessively demanding in the face of enduring skepticism rooted in a long tradition in psychology and political science, which has found reiteration and renewed life in high-profile treatments such as that of Achen and Bartels (2016), and turbocharged critiques of democracy such as Brennan (2016). The skeptics find that the capacities of ordinary people to recognize, let alone weigh, issue-based reasons for choices are very limited. According to the skeptics, what happens instead is that people follow scripts that are mostly intuitive or look for reasons to support conclusions already established (confirmation bias) or are attached to strong emotional responses. Such responses can be activated by the invocation of political symbols by demagogues, or by making particular beliefs and positions (such as climate change denial) a matter of group identity (Kahan 2013). This pessimism would, it seems, apply a fortiori to deliberative democracy, given it is much more demanding of human reasoning than is (say) voting. So Achen and Bartels dismiss deliberative democracy in a brief footnote as "not relevant" to national-level democracy (2, n2). With plenty of other demands on their time, "people cannot engage in much thoughtful political deliberation, nor should they" (9).
Our purpose in this article is to counter this pervasive skepticism about any deeper democracy by showing that lay citizens can deliberate effectively, given the right conditions. This is demonstrated using a measure of reason grounded in deliberative principles of mutual understanding and reciprocity which, while different from those used by skeptics such as Achen and Bartels, enables us to show how the reasoning pathologies they identify can be overcome. That said, we can actually agree with the skeptics, but very conditionally. The typical citizen might drop "to a lower level of mental performance as soon as [they enter] the political field" (Schumpeter [1943] 1976, 262), but it is less the reasoning capabilities of citizens that demand critical scrutiny than the construction of the field in which that reasoning occurs. Against the skeptics and pessimists, we believe that "humans are … poor monadic reasoners but not poor group reasoners" (Chambers 2018, 37). We intend to show theoretically and empirically exactly how citizens can engage in effective deliberative reasoning, especially if the field is right. Ours is not just another normative defense of deliberative democracy, or conceptual critique of democratic skeptics. Instead, our treatment is empirical, based on a new method of assessing deliberative reason, which for the first time enables measurement of the degree to which citizens reason together. In this analysis, we are careful to distinguish between observable attributes of the content of deliberative reason, and the process of deliberative reasoning that ought to produce these attributes.
Much political science survey research demonstrates citizen incompetence in terms of solitary reasoning. Many pertinent psychological experiments involve decontextualized tasks with no interaction and no supportive environment providing participants with adequate information. But deliberation involves reasoning together, not individually. The prospects for deliberation can be illuminated by the more optimistic perspective on the evolution of human reasoning developed by Mercier and Sperber (2011; 2017). While sharing pessimism about individual reasoning, they claim that reasoning is an inherently group process, best invoked in social settings (see also Sloman and Fernbach 2017). Group deliberation provides the best setting for ensuring a justificatory basis of reasons because individuals must find ways to reach and convince others, thus correcting their own inherent bias. The process of human reasoning is essentially dialogical; it evolved to convince others and, if appropriate, to be convinced by them, not to find one's own way (Mercier and Sperber 2011). The property that we characterize as deliberative reason, induced by group deliberation, particularly in cases involving diversity, should therefore reflect a fuller range of relevant reasons. Deliberation also provides a social setting that induces responsiveness to relevant diverse considerations (and reasons) beyond the self, yielding what Arendt (1961) calls "enlarged thinking," the capacity to widen the field of view, and incorporate the standpoint of others.
Yet deliberative reasoning is not something that will just happen in any group setting, which is why we need to study precise enabling conditions. Rosenberg (2014, 108) reports that "most 'participants' who attend deliberative processes do not, in fact, engage in the give and take of the discussion." We show that this kind of generalization fails to do justice to considerable variation across different forums regarding the amount of deliberative reasoning. Therefore, we "zoom in" on design features of deliberative forums (alongside issue characteristics and demographic factors) we expect to produce variation in the quality of deliberative reason,1 which we measure in terms of attributes of outcomes that participants mutually construct. This will enable us to identify the conditions under which deliberative reason is produced, and indeed what it is that makes forums deliberative (Ryan and Smith 2014, 23). Notably, we will demonstrate that improved group building activates interactive norms at the outset of forums, dramatically increasing the degree of deliberative reason. Our empirical analysis needs an operationalizable account of deliberative reason, which we now provide.

DELIBERATIVE REASON
Deliberation is generally understood to involve "mutual communication on matters of common concern" whereby participants weigh relevant considerations to inform conclusions regarding forms of action (Bächtiger et al. 2018). But how do we discern the effectiveness of such weighing empirically?
One widely used measure is the Discourse Quality Index (Steenbergen et al. 2003), which treats good deliberation as a matter of good procedure (justification of positions, respect, etc.), without sensitivity to whether participants are actually weighing all relevant considerations effectively. Most epistemic treatments of deliberation speak of "truth tracking" and so require either a value judgment concerning what is a good outcome (truth) external to the deliberation itself, or seeing truth as constructed intersubjectively, while lacking a formal measure of deliberative quality (see Estlund and Landemore 2018). Opinion change is sometimes used as an indicator of deliberative quality, but it can be produced by distinctly nondeliberative mechanisms, such as those yielding increased group polarization (Sunstein 2002). While recognizing the contributions of these three approaches, we try here to move beyond their limitations by drawing on deliberative theory to identify key features of deliberative reason we should see constructed by reasoning.
Deliberative reason as we conceptualize it is a property of relationships between individuals, in which their differences in values, beliefs, and preferences are regulated by parameters formed in deliberative communication, parameters whose content we now describe. To begin, deliberative reasoning is intersubjective insofar as it connects internal knowledge with knowledge of the minds of others, as well as with their objective knowledge of the world (Davidson 2001). Here, internal knowledge (and knowledge of the minds of others) can consist of values (such as social harmony, or security), subjective dispositions (e.g., suspicion of large corporations, or trust in scientists), and experiential understanding (such as what it is like to live with a disability).
Deliberation, therefore, draws upon and constructs a shared representational framework within which we clarify our understanding of what our fellow deliberators mean from our observations of what they are saying,2 and within which reasoning is enabled as well as constrained within mutually endorsed boundaries.3 Take for example the challenge of constructing an effective policy response to the problem of drug addiction. Relevant considerations to inform the framework might include scientific findings concerning the addictive properties and physical consequences of particular drugs, and psychological or social scientific findings concerning the propensity of different kinds of people to addictive behavior and the consequences of their actions for other people. Some facts and findings may be settled, some contested. Also relevant are valued ends, such as individual health and community safety. The motivations of legal and illegal drug suppliers might also matter. Different characterizations of the challenge of solving the problem are possible, depending on whether addiction is seen as in essence a matter of biochemical propensities, personality, material incentives, or socioeconomic structure. Applying these characterizations, there are both settled and contested facts concerning implications of different policies (such as criminalizing drug users as well as suppliers, regulating suppliers of legal drugs, and access to therapy and rehabilitation) for valued ends. A shared representational framework constructed by deliberators should render all these aspects mutually intelligible.
The shared "logic" embedded in a representational framework may be intuitive (and irretrievable in syllogistic form), but it nonetheless should produce coherence across propositions (Davidson 2001). Here, coherence aligns our understanding of what others mean with our own account of the objective world, as well as with the internal values that inform our judgments. Together these beliefs and values comprise considerations to be evaluated in determining what should be done. Coherence checking is central to group reasoning (Mercier and Sperber 2011). For example, an environmentalist whom a group trusts might surprisingly express support for nuclear power (or geoengineering), contradicting the established idea that environmentalists should oppose the technology on grounds of environmental risk, so producing incoherence. The explanation this person gives is that they now think the risks of nuclear power (or geoengineering) are not as great as the risks from the climate change the technology would help avert. The group should then reflect on this in an effort to restore coherence between trust and specific claims. Coherence checking here would also hold individuals accountable for factual claims (such as the risks of these technologies).
Deliberative reasoning as we characterize it recognizes the possibility of identifying the set of relevant considerations, while falling short by failing actively to take all of them into account to capture the complete picture. And so the resulting deliberative reason should ideally reflect integration, where all relevant considerations are factored into reasoning (Misak 2004). What counts as relevant should itself be determined in deliberation.4 Integration should overcome pathologies such as motivated reasoning, confirmation bias, and the suppression of relevant values (such as shared environmental concern) in collective choices. Integration can help eliminate seemingly coherent yet indefensible propositions (such as conspiracy theories). To the extent integration occurs across all relevant considerations, the degree to which the group trusts the veracity of facts (or at least the basis for their contestation), the relevance of characterizations, and the prioritization of valued ends should be reflected in the degree to which it agrees that particular actions would serve particular ends. So, if the group shares an understanding of what the issue looks like, there should be proportionality within the group in the differences between the valued ends (and other supporting considerations) and any resulting preferences. To return to our addiction example, a "therapeutic" package might stress addictive personality, individual health, and so decriminalization and social support; a "choice" package might focus on profit motivations of drug suppliers and so positive and negative incentives to suppliers and users. Within the group, individual backing for each package should be proportional to approval of its supporting considerations.
In short, deliberative reason reflects the mutual integration of relevant evidential, forensic, interpretive, and normative considerations within an intersubjective representational framework featuring coherent understanding of cause and effect applied to the question of what action to take on matters of common concern. The ideal here is not consensus, but rather higher-level agreement on what considerations matter and the implications for how we choose what to do. Such higher-level agreement can be sought even in a deeply divided setting (such as Northern Ireland), involving acceptance of the validity of the identity concerns of the other side (such as a British or Irish identity supported by acceptable reasons, anchored in history, beliefs, lived circumstances) even though they are not shared (O'Flynn and Caluwaerts 2018, 748-9).
As we move to measurement and empirical analysis, we shall stress the deliberative reason that is the substance of what we capture directly, rather than the deliberative reasoning which produces that substance. The higher-level agreement that is key to deliberative reasoning is observable only as an outcome measure (and so is different from process or procedural measures like the Discourse Quality Index). But the kind of outcome attribute it captures is exactly what deliberative reasoning ought to produce, and which other plausible influences surrounding deliberation (such as the conformity effect of being in a group, or undue influence of expert framings) should not produce (see our analysis below). A representational update should occur, one that coheres toward a shared understanding within the group and integrates a more complete set of relevant considerations, which in turn helps form and revise judgments. It is this shared understanding our measure of deliberative reason will capture. This measure can be used, as in this article, in test-retest form to assess the consequences of an intervening process of deliberation. We develop further suggestions for research on links between the outcome and process of deliberation at the end of this article.
[…] conclusion and for evaluating their strength" (Mercier and Sperber 2011, 58). This contrasts with schemas (e.g., Lodge and Hamill 1986) that assist intuitive judgments, without necessarily involving thoughtful evaluation required to assess which considerations, and their associated representations, should be integrated into reasoning and/or updating of representations.
At the conceptual level, the connection between the outcome and process of deliberation implied in our measure can be illuminated by a parallel with philosophical discussions of public reason. Public reason as presented by Rawls, Habermas, Sen, and others refers to the idea that basic political principles (such as rights and liberties) should be justifiable to everyone governed by these principles; it is an attribute of outcomes. But public reason also requires a procedure (such as free, inclusive, and competent deliberation) of which these principles are outcomes. Likewise, deliberative reason is an attribute of outcomes (though unlike public reason, this attribute does not refer to the content of principles), but this attribute ought to reflect a particular kind of (deliberative) procedure.
Deliberative reason as we measure it is a group-level relational property. Existing process measures such as the Discourse Quality Index (Steenbergen et al. 2003) capture deliberative qualities (such as justification and respect) mostly at the individual level. They simply aggregate individual qualities at the group level rather than conceptualizing them as a group-level property. We can do better here.
Because deliberative reason is a group-level relational property, a good measure should involve no external value judgment about the substance of outcomes, or indeed any contemplation of this substance at all.5 Rather, it should involve two key group-level features. The first is consistency, yielded by high coherence, when any agreement on actions is supported by convergence toward the same representational framework, and any disagreement on actions can be understood in terms of that framework. The second is integration shared across the group. If a deliberating group agrees on relevant considerations and integrates them into reasoning via a shared representational framework, regularity between opinions and preferences forms within the group.6 Thus the extent to which deliberators disagree is constrained by a shared "logic," such that their diverging values or beliefs should yield a comparable degree of divergence in expressed preferences.7 So in our nuclear power example, divergent assessments of the relative size and moral significance of the risks of nuclear power and climate change should produce proportional divergence in degree of support for the nuclear option. The level of opinion agreement on considerations should be proportional to the level of agreement on preferences among possible courses of action. This measurable (intersubjective) consistency forms the basis of our Deliberative Reason Index (DRI).

Observing Deliberative Reason
The DRI is based on the intersubjective consistency of any pair of deliberators, as described by Niemeyer and Dryzek (2007). Capturing intersubjective consistency begins by surveying opinions across the range of underlying considerations that ought plausibly to inform preferences concerning the issue at hand. Here, "considerations" cover what we earlier characterized as "internal knowledge" of values, beliefs, subjective dispositions, and experiential as well as objective understandings of the world. Our stress on considerations rather than just reasons is consistent with the more expansive and less narrowly rationalistic approach now generally accepted by deliberative scholars (Bächtiger et al. 2018, 6, 7), and allows that facts and values are often intertwined. These considerations can be modeled by asking each individual to arrange around 20-40 statements (drawn from real-world discourse) about the issue at hand along a "most agree" to "most disagree" scale. (In Section A of the Supplementary Material, we provide case study details and in Section B of the Supplementary Material, we explain survey design and how we obtained data.8) For the Far North Queensland Citizens' Jury (FNQCJ) case, which featured conflict over development of a road (the Bloomfield Track) through the World Heritage listed Daintree rainforest (case 3; see Section A of the Supplementary Material), examples of "considerations" statements are below:
• Laying bitumen on the Bloomfield Track would be beneficial for the environment.
• Erosion from the Bloomfield Track is permanently damaging the coral reefs that fringe the beaches below.
• No development should be permitted in World Heritage areas such as the Daintree.
• The fate of the Bloomfield Track is of no concern to me.
• The Bloomfield Track is important because it allows quick access to remote areas of the North.
• There is no reason to believe that the Daintree Rainforest is under threat.
• Let's fix the problems in the Daintree just for now. The future will take care of itself.
• The most important use of the Bloomfield Track is for tourism.
• Everyone in Queensland is better off for having a road like the Bloomfield Track.
Preferences across action options (usually alternative policies) are then measured through each individual ranking a small set of (usually fewer than 10) options (Niemeyer 2020). For the FNQCJ case, the options are
• upgrade the track to a bitumen road;
• maintain the road in its current condition as a 4WD track;
• close the road and rehabilitate it;
• upgrade the road to a dirt road suitable for conventional vehicles;
• stabilize specific trouble spots, such as steep slopes, on the road but leave it as a 4WD track.
Computation of DRI (explained in more detail in Section C of the Supplementary Material) begins with calculating the correlations between any two individuals for considerations, then doing the same for preferences, and repeating for all possible pairs within the deliberating group.9 These correlations can be used to construct a plot of points for each pair, and to compute intersubjective consistency. Figure 1 shows a sample intersubjective consistency plot for four deliberators (A, B, C, D), drawn from the FNQCJ case.10 The degree of intersubjective consistency for each pair is measured as the orthogonal distance from the 1:1 line representing direct proportionality or perfect consistency (d_{a,b}, d_{a,c}, d_{a,d}, d_{b,c}, d_{b,d}, d_{c,d}). High intersubjective consistency is not the same as high agreement. In the figure, the pair AB exhibits greatest agreement on preferences and the pair CD the greatest agreement on considerations. However, the pair BC is the most intersubjectively consistent in aligning considerations with preferences (because it is closest to the 1:1 line).11
Calculation of individual-level DRI (DRI_Ind) involves the aggregation of intersubjective consistency for all pairs including that individual (see Section C of the Supplementary Material). From Figure 1, pairs that include participant D are least consistent, with relatively low levels of preference agreement compared to agreement on considerations. D, therefore, exhibits the lowest levels of reason when triangulated with the rest of the group. By contrast, participant A exhibits the highest deliberative reason (DRI_Ind), as possible pairs involving A display the smallest average distance. All DRI_Ind are then aggregated to produce group DRI.
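The pairwise steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the use of Pearson correlation via `numpy.corrcoef`, the function names, and the toy response data are all hypothetical, and the final scaling of distances into DRI_Ind and group DRI (given in Section C of the Supplementary Material) is deliberately not reproduced here.

```python
import itertools
import math

import numpy as np


def pairwise_points(considerations, preferences):
    """Return {(i, j): (r_considerations, r_preferences)} for every pair.

    Assumption: agreement is measured with Pearson correlation; the
    article only says "correlations".
    """
    points = {}
    for i, j in itertools.combinations(range(len(considerations)), 2):
        r_c = float(np.corrcoef(considerations[i], considerations[j])[0, 1])
        r_p = float(np.corrcoef(preferences[i], preferences[j])[0, 1])
        points[(i, j)] = (r_c, r_p)
    return points


def orthogonal_distance(r_c, r_p):
    """Distance from the point (r_c, r_p) to the 1:1 line y = x."""
    return abs(r_c - r_p) / math.sqrt(2)


def mean_distance_per_individual(points, n_people):
    """Average pair distance over all pairs that include each individual."""
    totals = [0.0] * n_people
    for (i, j), (r_c, r_p) in points.items():
        d = orthogonal_distance(r_c, r_p)
        totals[i] += d
        totals[j] += d
    return [total / (n_people - 1) for total in totals]


# Hypothetical toy data: 4 deliberators, 6 statements, 4 ranked options.
considerations = np.array([
    [5, 4, 3, 2, 1, 3],
    [5, 3, 4, 1, 2, 3],
    [1, 2, 3, 4, 5, 3],
    [2, 1, 3, 5, 4, 3],
])
preferences = np.array([
    [1, 2, 3, 4],
    [1, 2, 4, 3],
    [4, 3, 2, 1],
    [1, 2, 3, 4],
])

points = pairwise_points(considerations, preferences)
mean_distances = mean_distance_per_individual(points, len(considerations))
```

A smaller mean distance for an individual corresponds to higher intersubjective consistency across the pairs that include that person; converting these distances into the published index requires the scaling described in the supplementary material.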
The left-hand diagram of Figure 2 plots predeliberation pairs for all individuals in the FNQCJ case.12 Most points are distributed orthogonally distant from the 1:1 line, representing a (low constraint) situation where relatively high overall agreement regarding considerations is not reflected in preference agreement. The distribution of pairs resembles a random pattern produced by Monte Carlo simulation, hence a group DRI approaching 0. However, in this case, the distribution is offset to the right, due to nonrandom levels of consideration agreement, a predeliberative pattern typical for many cases we report.
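The random reference pattern mentioned above can be approximated with a small simulation. The sketch below is an assumption-laden illustration rather than the procedure used in the article: the group size, the numbers of statements and options, the use of random permutations as responses, and Pearson correlation are all hypothetical choices.

```python
import itertools
import math

import numpy as np


def simulate_random_group(n_people=20, n_statements=30, n_options=5, seed=0):
    """Pair points (r_considerations, r_preferences) for random responders.

    Assumption: each simulated person orders statements and options
    uniformly at random, independently of everyone else.
    """
    rng = np.random.default_rng(seed)
    considerations = [rng.permutation(n_statements) for _ in range(n_people)]
    preferences = [rng.permutation(n_options) for _ in range(n_people)]
    points = []
    for i, j in itertools.combinations(range(n_people), 2):
        r_c = float(np.corrcoef(considerations[i], considerations[j])[0, 1])
        r_p = float(np.corrcoef(preferences[i], preferences[j])[0, 1])
        points.append((r_c, r_p))
    return points


random_points = simulate_random_group()
mean_r_c = sum(p[0] for p in random_points) / len(random_points)
mean_r_p = sum(p[1] for p in random_points) / len(random_points)
# Mean orthogonal distance of the random cloud from the 1:1 line.
mean_distance = sum(
    abs(r_c - r_p) / math.sqrt(2) for r_c, r_p in random_points
) / len(random_points)
```

In such a simulation the cloud of pairs is centered near the origin on both axes, whereas the FNQCJ predeliberation pattern described in the text is shifted to the right because agreement on considerations is not random.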
Pre-deliberation group DRI here is so low because strategic political language in the larger public sphere created polarization and sectarian reasoning that did not reflect the underlying consideration agreement prior to deliberation (Niemeyer 2011). There was instead a focus on misleading claims (advanced by both environmental and developmental interests) that reduced integration of considerations into reasoning about preferences. Deliberation then dissolved sectarian framings and enabled integration as reflected in the much higher group consistency in the right-hand plot, where preference consensus reflects that for considerations.13 The very low pre-deliberation group DRI (−0.07) rises to 0.49 post-deliberation, as graphically represented in Figure 2. A (hypothetical) strongly negative group DRI would suggest an extreme situation, for example where willful sectarianism produces a representation that inverts the conclusions drawn from the same considerations.
Why do we call DRI "deliberative reason" when it can be measured prior to a deliberative procedure? The answer is that deliberative reason can also be found to greater or lesser extent in natural settings in the broader public sphere (i.e., not just designed forums). We can get some sense of this extent (and how it might vary across different groups in the public sphere, such as climate skeptics and nonskeptics) using the DRI.14 Low group DRI, approaching zero or even negative values in cases such as FNQCJ before deliberation, suggests poor deliberative reason in the public sphere.15
The FNQCJ represents an extreme case. The Uppsala Speaks study (Jennstål 2019) represents a more common pre-deliberation scenario. The three cases for this study are reported in Figure 3. They include a control group, which simply performed the DRI survey at the same pre- and post-deliberation time points as two deliberative group cases (Group Briefing and Group Building Plus) involving two different treatment conditions. Pre-deliberation DRI is similarly high for all three cases compared to the FNQCJ case (Figure 2).
FIGURE 2. DRI Plots: FNQCJ Case
Note: Pre- and post-deliberation refer to time points where participants were surveyed immediately prior to, and at the conclusion of the entire proceedings for the cases that we report.

FIGURE 3. DRI Plots: Uppsala Speaks Study
Note: † Includes cognitive training as well as group building (level 5 in Table 1).

Figure 3 shows group DRI improved substantially in both deliberative groups, but not in the control group. Moreover, improvement is higher for the Group Building Plus than for the Group Briefing case. But how do we know it is reasoning together that yields improvements in DRI (found in 16 of our 19 cases; see Table 3), and not something else? We can rule out some alternatives. Any tendency toward group conformity should suppress group DRI, because people would move toward agreement on preferences without increased agreement on considerations, thus violating integration. If conformity induced agreement on both considerations and preferences, then we would indeed see an improved DRI. But if we examine scatterplots such as those in Figure 3, we see convergence toward the 1:1 line for those who continue to disagree on both considerations and preferences. Moreover, the fact that considerations and preferences are ascertained in private both before and after deliberation means there is no social payoff to conformity. Subtle domination on the basis of (say) personality or social class would likewise suppress DRI. Information provision for its part can yield some improvement in DRI, but its effect is difficult to isolate when combined with deliberation; formal deliberation has at least as strong an effect (see Section D.1 of the Supplementary Material). The unconscious adoption by deliberators of a framework provided by experts could also conceivably increase group DRI. However, framing would need to be comprehensive and persuasive to a degree unlikely when there is a diversity of perspectives. Domination of a single view might still decrease integration, but this is why balanced information, facilitation, and presentations by advocates from different sides are a key part of deliberative forums (although there would be nothing wrong with reflective acceptance by deliberators of an expert framework). As Druckman (2004, 683) concludes, competing frames presented by elites and "heterogeneous discussions" can overcome framing effects. Our results suggest no such expert framing is influential in driving changes to DRI (see Section D.4 of the Supplementary Material), consistent with the finding of Westwood (2015) that the most effective form of persuasion in deliberating groups is peer-to-peer.
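The geometry behind the conformity argument can be checked with a short worked example. The numbers below are hypothetical, and the sketch assumes a pair that starts exactly on the 1:1 line; it only illustrates how movement on one axis alone changes the orthogonal distance, not how any real forum behaved.

```python
import math


def orthogonal_distance(r_considerations, r_preferences):
    """Distance from the point (r_considerations, r_preferences) to y = x."""
    return abs(r_considerations - r_preferences) / math.sqrt(2)


# Hypothetical pair that starts on the 1:1 line (modest agreement on both).
before = orthogonal_distance(0.2, 0.2)

# Conformity on preferences only: preference agreement rises, while
# agreement on considerations stays where it was.
preferences_only = orthogonal_distance(0.2, 0.8)

# Agreement rising on considerations and preferences together.
both = orthogonal_distance(0.8, 0.8)
```

Under these assumed values the pair moves off the line only in the preferences-only scenario, which is the pattern the text associates with a suppressed DRI.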
We reiterate that DRI is not a measure of consensus. Mansbridge et al. (2010, 68-9) point out that clarifying disagreements is just as valuable as seeking consensus in deliberation. Such clarification is captured by our group DRI. A high group DRI suggests deliberators have come to understand the relationship between the preferences of themselves and others and the reasons (in relevant considerations) for these preferences, no matter how strong preference disagreement continues to be. This covers understanding the nature of their conflict, because an increase in DRI reflects the degree to which participants engage with the perspectives of others.

CONDITIONS FOR DELIBERATIVE REASON
Our analysis of the determinants of improvement in group DRI is based on forums designed to be deliberative. Neblo, Esterling, and Lazer (2018) claim that when citizens have the "means, motives, and opportunity" to become informed about politics, they behave differently than in the "anemic view of democratic participation" implied by many survey researchers, but we need to check that this holds. We expect the experience of group deliberation to improve deliberative reason, irrespective of other variables. We refer to change in DRI absent other influences as the effect of "deliberation per se," observed via comparisons between pre- and post-deliberation.16 The baseline here is constituted by the standard conditions of deliberative forums: the opportunity to learn from balanced information, voice one's concerns without fear of denigration, listen to the concerns of others, and reflect upon what they have to say (Curato et al. 2021). But that does not mean all such forums are equal in their capacity to induce deliberative reasoning, and we find considerable variation across forums (see Figure 3 above). So we examine design features of forums (alongside issue characteristics and demographic factors) that might produce variation in deliberative reasoning beyond deliberation per se.

Design Effects
To begin, we need to look closely at what theorists of group reasoning (whether or not they locate themselves in the field of deliberative democracy) say about what should enable it to occur. Beyond design features all deliberative forums share (such as balanced information, gender equality, representativeness of participants, and facilitation), three design variables can be located in the relevant literature: group building, duration, and expected policy impact.17

Deliberative Group Building
We expect dedicated activities at the outset to build positive dispositions toward both the subsequent deliberative task and other members of the group to produce better deliberative reasoning by priming the reasoning task. Group building may also improve trust, which in turn removes one barrier to deliberation, because it means participants do not have to suspect each other's motives. Further, epistemic trust can counter the domination of the viewpoints of advantaged groups (Catala 2015). Now, Mercier and Sperber (2011; 2017, 9) point out that group reasoning is necessary precisely when trust in what an individual says must be assessed, especially if it does not cohere with prior beliefs. But group building can still establish the conditions under which listening and taking seriously different views is likely to occur, facilitating willingness to adjust arguments to accommodate considerations that can be evaluated (and integrated) by all (Mercier and Sperber 2011, 60). Mansbridge et al. (2006, 13) stress the perception of many facilitators of deliberative forums that "a productive group atmosphere" enables "the free flow of frank speech." Group building should therefore help break down the role of sectarian framing that scripts reasoning along narrow, nonintegrative terms (such that, e.g., a position should be adopted because it is endorsed by a party).
We distinguish five levels of group-building activity (Table 1).
Levels 2 and 3 may help build trust and shared group identity (Batalha et al. 2019). In level 4, the group develops its own principles for deliberation, which in practice invariably embody deliberative ideals (respect, reciprocity, trust, etc.). The fact that these principles are self-generated rather than dictated to the group (as in level 3, the most common practice) means the group has "ownership" of them. This is consistent with the idea that "player-generated rules" have particular force for a group's subsequent interactions (Lerner 2014, 64, 65). We hypothesize that framing the deliberative task and building of trust are most successful when generated by the deliberators themselves, as in levels 4 and 5. Together levels 4 and 5 constitute a "group building proper" subset, with a dedicated process to build (or activate) deliberative capabilities. Level 5 is specific to the Uppsala Speaks "Group Building Plus" case illustrated above in Figure 3, which included cognitive training, using mindfulness techniques designed to improve coherence through self-awareness of reasoning.

Duration
We expect duration to improve deliberative reason: the more time (measured in days) participants have, the more in-depth reasoning processes can unfold (Curato et al. 2017; Street et al. 2014). The longer the process, the more different kinds of information and means of its presentation can be digested and engaged; the more nuance can be both expressed and received; the more opportunity there is for reflection; and the more time there is to think through the content of the final report (or equivalent) and integrate different points of view when producing recommendations (Street et al. 2014, 5).

Decision Impact
Decision impact is the intended relationship between the output of a deliberative forum and an associated policy decision. Five levels are outlined in Table 2.
The conventional wisdom is that expected impact on policy induces higher quality deliberation because participants will take their task more seriously (e.g., Fung 2003, 346), but we are aware of no evidence supporting this claim. Our competing conjecture is that proximity to power may actually reduce deliberative reason, by inducing individuals to behave strategically to advance the prospects of their preferred options, although this effect may be mitigated by group building. The case of the 2004 British Columbia Citizens' Assembly on electoral system reform supports this conjecture. The Assembly was charged with crafting a referendum question, and toward the end of the process, there was pressure on participants to join a consensus to maximize influence on the referendum outcome (Warren and Pearse 2008). Another related (confounding) effect is the possibility of self-selection to higher-stakes processes by individuals motivated by the prospect of policy influence rather than the opportunity to participate in deliberation.18

The Effect of Complexity
Design features interact with the characteristics of the issue and task at hand. Issue characteristics are good predictors of procedural deliberative quality in legislatures (as measured by the Discourse Quality Index; Bächtiger and Hangartner 2010; Steiner et al. 2004). Complexity in particular challenges deliberative reasoning. Coherence is harder to achieve when complexity facilitates divergence in representations of the issue at hand. Integration is also harder where there are more considerations to weigh, involving   2016).
We anticipate that complexity interacts with design variables, especially group building: the stronger the group-building activities, the more the group will be equipped to take on complex tasks and questions, thus counteracting negative effects of complexity. Group building facilitates listening more effectively to multiple dimensions of what others say, as well as enabling effective division of cognitive labor (Mercier and Sperber 2011), notably in judging the credibility of claims made by experts. Group building also facilitates motivation to explore and evaluate claims, overcoming cognitive closure (Kruglanski and Boyatzi 2012).
Complexity is a predictor that we construct on a 1-4 ordinal scale (low to high) based on three dimensions: (i) breadth of the remit assigned to the process (number and latitude of options to be addressed), (ii) degree of technical or scientific content of the issue at hand, and (iii) geographical scale of the issue (local to international) (see Section E of the Supplementary Material for details).

Individual-Level Effects
The characteristics of individual participants comprising the group may also make a difference to deliberative reasoning. We focus on three standard demographic variables: education, gender, and age. Higher education level is associated with greater participation in conversations involving politics (e.g., Moy and Gastil 2006). Standard research suggests socioeconomic resources and education shape abilities to participate effectively in politics (Verba, Schlozman, and Brady 1995). Given deliberation is more demanding than voting, we might expect education to play an even stronger role. Research on deliberative forums has largely refuted this prediction, finding no effect for education (Siu 2017). However, evaluation of a Europe-wide Deliberative Poll ("Europolis") showed that the least privileged participants (the least educated, particularly from the European periphery) were the least skilled deliberators (Gerber et al. 2018). Yet this study found those good at providing sophisticated justifications also listened respectfully to, and seemed as open-minded as, participants with lower communication skills, suggesting group deliberative reasoning was not necessarily impaired.
When it comes to gender, argumentation involving logical deduction and the application of general principles is sometimes seen as a masculine style, while more tentative, contextual, figurative, and emotional expressions are seen as feminine, which is why Sanders (1997) is hostile to a narrow interpretation of deliberation (see also Young 2002, 38-40). However, deliberative virtues such as empathy (Morrell 2010; Muradova 2020) and perspective-taking (Scudder 2016) are sometimes seen as feminine (e.g., Sommerlad et al. 2021). Dutwin (2003) found no evidence that gender affected quality of deliberation in a forum: the overall amount of speaking and number of topics discussed were roughly equal across gender (as well as race and perceived political minority status).19
Finally, age could be seen as a proxy for experience with politics, though whether this increases a person's ability for self-reflection and responsiveness to others' viewpoints and arguments is questionable. Gerber et al. (2018) find that younger participants deliberate at a slightly higher level (measured by the Discourse Quality Index).

CASES
We analyze deliberative reason and the conditions that enable it for 19 deliberative forums (see Section A of the Supplementary Material). Seventeen are minipublics, defined by Ryan and Smith (2014, 19) as comprising lay citizens recruited using stratified random selection to deliberate a specific issue in a fixed time under structured (facilitated) conditions. Two of the forums (ForestERA and WA Biobank) involved stakeholders rather than lay citizens.20 All operated according to standard deliberative principles, overseen by a facilitator. All our cases come from Western countries; future research should expand the reach.
Our dataset covers the totality of cases where appropriate data is available (see Section A of the Supplementary Material). Although these cases do not cover all types of deliberative forums, they do capture the relevant diversity of key design features (in group building, duration, and decision impact).21 Table 3 reports the change in pre- to post-deliberative group DRI for all 19 cases, along with their corresponding case-level values. It shows that in the vast majority of cases, group DRI increased substantially.
19 Karpowitz and Mendelberg (2014) show experimentally that interaction of gender composition of a group with decision rule makes a difference to deliberation. However, all our cases featured gender equality, and any "decision rule" (such as consensus, voting, or agreeing to disagree) could in many cases be determined by the participants themselves, and/or left implicit.
20 The results are not impacted when these two cases are removed from the analysis. See Section D5.2 of the Supplementary Material.
21 Thus, our focus in case selection is on avoiding bias in drawing conclusions, rather than representativeness. To this end, we employed a meta-analytical approach to check for possible sample size and effect size bias using Egger et al.'s (1997) regression and Begg and Mazumdar's (1994) rank test on DRI effect size, which did not reveal any sampling bias. See Section F.6 of the Supplementary Material.
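The regression check named in note 21 can be illustrated with a minimal sketch. It assumes the standard form of Egger's test, in which each case's standardized effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero signals small-study asymmetry. The function name and the input values are hypothetical; they are not the DRI effect sizes analyzed in the Supplementary Material, and this is not the authors' code.

```python
import math


def egger_regression(effects, std_errors):
    """Egger-style regression of effect/SE on 1/SE.

    Returns (intercept, slope, intercept_se, t_statistic). An intercept
    near zero is read as no evidence of small-study asymmetry.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three (effect, SE) pairs of equal length")

    precision = [1.0 / se for se in std_errors]                    # predictor
    standardized = [e / se for e, se in zip(effects, std_errors)]  # outcome

    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized))

    # Ordinary least squares estimates.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standard error of the intercept, using n - 2 residual degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(precision, standardized))
    sigma2 = rss / (n - 2)
    intercept_se = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    t_stat = intercept / intercept_se if intercept_se > 0 else float("nan")
    return intercept, slope, intercept_se, t_stat


if __name__ == "__main__":
    # Hypothetical effect sizes and standard errors, for illustration only.
    effects = [0.12, 0.05, 0.20, 0.09, 0.15]
    std_errors = [0.04, 0.06, 0.10, 0.05, 0.08]
    intercept, slope, intercept_se, t_stat = egger_regression(effects, std_errors)
    print(f"intercept={intercept:.3f} (SE {intercept_se:.3f}), t={t_stat:.2f}")
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.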

ANALYSIS
Analysis of influences on deliberative reason across our 19 cases was performed using two levels: the effect of participants' individual-level characteristics (level 1) and the effect of the case-level features discussed above (level 2). We applied multi-level modeling (MLM) on DRI_Ind.22 Structuring the analysis this way accommodates the fact that DRI measurement is issue and context-specific, requiring a survey instrument designed for a given study. In the absence of controlled conditions across treatments, such as those associated with the Uppsala Speaks study reported in Figure 3, we enable comparison across studies by adopting a model that considers cases as random effect and other predictors as fixed in order to hold transformation effects constant, thus enabling us to determine whether changes to DRI_Ind can be attributed to deliberation per se, demographic (level 1) or group/case (level 2) variables. The MLM analysis employs a random intercept model (using the R package lme4; Bates et al. 2015), whereby the intercept randomly varies across each cluster, to accommodate variations across the cases in terms of level of DRI (see Section F.3 of the Supplementary Material). The level of model fit is assessed using pseudo-R² (Nakagawa and Schielzeth 2013). Multicollinearity between variables has been checked (see Section F.5 of the Supplementary Material). We corrected for potential loss of power due to relatively small sample size (19 cases involving 387 survey respondents among 722 deliberating individuals)23 using Restricted Maximum Likelihood (REML) estimation (Luke 2017) with the degrees of freedom estimated following Kenward and Roger (1997). Finally, we adopted a correlated random intercept in our modeling to accommodate variation in pre-deliberative DRI levels across different cases (see Table 3; Section F.3 of the Supplementary Material).
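A random intercept specification of the kind described above can be written generically as follows. The notation is an assumption introduced here for illustration and may differ from the specification in Section F.3 of the Supplementary Material: $i$ indexes survey responses, $j$ indexes cases, $\mathrm{Post}_{ij}$ marks a post-deliberation response, $\mathbf{x}_{ij}$ collects the level 1 predictors, and $\mathbf{z}_{j}$ collects the level 2 predictors.

```latex
\begin{align}
\mathrm{DRI}^{\mathrm{Ind}}_{ij} &= \beta_0 + \beta_1\,\mathrm{Post}_{ij}
  + \boldsymbol{\beta}_x^{\top}\mathbf{x}_{ij}
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_{j}
  + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \tau^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

On this reading, $\beta_1$ corresponds to the effect of deliberation per se, and the case-specific term $u_j$ lets the baseline level of DRI differ from case to case.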
For the MLM analyses reported below, we use a hierarchical approach, adding variables to the model in sequenced steps, beginning with the effect of participant-level variables (level 1), then case-level variables (level 2). This enables the exploration of progressive changes to model fit and the fixed effect of each additional predictor variable.
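The step-by-step comparison of model fit can be sketched as below, assuming the Gaussian form of the Nakagawa and Schielzeth (2013) pseudo-R²: the marginal value is the share of variance explained by fixed effects, and the conditional value adds the random-intercept variance. The function names and the variance components in the usage example are hypothetical, not estimates from Table 4.

```python
def marginal_r2(var_fixed, var_random, var_residual):
    """Share of total variance attributable to the fixed effects."""
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_fixed / total


def conditional_r2(var_fixed, var_random, var_residual):
    """Share of total variance attributable to fixed plus random effects."""
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return (var_fixed + var_random) / total


def stepwise_delta_r2(models):
    """Marginal R2 for each model and its change from the previous step.

    `models` is an ordered list of (name, var_fixed, var_random, var_residual).
    Returns a list of (name, marginal_r2, delta), with delta None for the first.
    """
    results = []
    previous = None
    for name, var_fixed, var_random, var_residual in models:
        r2 = marginal_r2(var_fixed, var_random, var_residual)
        delta = None if previous is None else r2 - previous
        results.append((name, r2, delta))
        previous = r2
    return results


if __name__ == "__main__":
    # Hypothetical variance components for three nested models.
    steps = [
        ("intercept only", 0.0, 1.0, 3.0),
        ("+ deliberation", 0.3, 1.0, 2.7),
        ("+ case-level", 0.9, 0.6, 2.5),
    ]
    for name, r2, delta in stepwise_delta_r2(steps):
        change = "n/a" if delta is None else f"{delta:+.3f}"
        print(f"{name:16s} marginal R2 = {r2:.3f}  change = {change}")
```

Reporting the change in marginal R² at each step is what makes it possible to say how much fixed variance a newly added predictor accounts for over the preceding model.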

RESULTS
The results of the basic MLM analysis are provided in Table 4. Model 1 produces a significant intercept, or pre-deliberation DRI, supporting our earlier point that deliberative reason is not limited to deliberative forums. However, model 2 demonstrates that the general experience of participating in a forum, "deliberation per se," dramatically improves reason, increasing DRI by 0.113. Given that in most cases the working range for DRI is between 0 and 1, this is substantial. The effect is fairly constant across all the models.24 Nevertheless, deliberation per se in model 2 accounts for a relatively small additional amount of fixed variance (ΔR² = 0.066) over the intercept-only model. We know from Table 3 that deliberation does not guarantee improvement in deliberative reason, with a few cases posting a decrease in DRI, hence our contention that design and context play important roles in explaining the variation.
The results when adding models 3-5 in Table 4 suggest individual-level variables do not figure strongly. They may have some effect on deliberative reason.25 But the transformative potential of deliberation does not depend on these individual-level features.
Table 4 shows the case-level variables strongly impact DRI. The greatest effect by far is associated with Group Building (model 6; 0.076; Fixed Effect ΔR² = 0.166). DRI improves with level of group building (although not necessarily continuously; and, as we report below, this effect is subject to interaction with other case variables). The effect of Complexity in this model is not significant on its own, but best understood via interaction with other variables (as reported below). Contrary to expectation, duration's effect is not positive.26 Similarly, Decision Impact does not appear to affect DRI, except in conjunction with other variables (again, reported below).
An interaction analysis is reported in Table 5, which controls for level 1 and level 2 variables and so their potential confounding effects. When Complexity is combined with Group Building (Table 5; model I2), differing degrees of complexity have a variably negative impact on DRI. While lesser forms of group building do not overcome the challenge of complexity, higher forms are more successful. In model I2, the effect of complexity in attenuating DRI decreases as level of group building is gradually increased. Eventually, at a point beyond level 4 (group briefing) complexity no longer attenuates improvement in DRI. This interaction explains the weak effect of Complexity on its own in Table 4 and confirms the power of Group Building in enhancing deliberative reason on complex issues.
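The logic of this interaction can be read through a generic linear interaction term. The symbols are assumptions for illustration rather than the coefficients reported in Table 5: $C$ denotes Complexity, $G$ the Group Building level (treated here as numeric, which is a simplification of an ordinal scale), and the ellipses stand for the remaining level 1 and level 2 terms.

```latex
\begin{align}
\mathrm{DRI} &= \dots + \beta_C\,C + \beta_G\,G + \beta_{CG}\,(C \times G) + \dots \\
\frac{\partial\,\mathrm{DRI}}{\partial C} &= \beta_C + \beta_{CG}\,G,
\qquad
G^{*} = -\frac{\beta_C}{\beta_{CG}} \quad (\beta_{CG} \neq 0).
\end{align}
```

With $\beta_C < 0$ and $\beta_{CG} > 0$, the negative marginal effect of complexity shrinks as $G$ rises and vanishes at $G^{*}$, which matches the pattern described for model I2.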
Decision Impact has a complicated relationship with Complexity. Figure G.1.1 in the Supplementary Material shows the changes to DRI as Complexity increases for different levels of Decision Impact. Forums with lower decision impact tend to produce lower DRI where complexity is low. But as Complexity increases beyond (approximately) level 3 the relationship appears to reverse: higher decision impact produces lower DRI.27 The differences are only significant between the extreme levels of Decision Impact, and for the majority of cases where Complexity is about level 3 (see Table 3), it will tend not to impact DRI.
The fact that high complexity and high decision impact appear to decrease DRI is broadly consistent with our hypothesis that strategic behavior sets in under such conditions. This effect was observed in case 13 (see Section A.1 of the Supplementary Material), where the deeper climate skeptics (more prone to deliberative pathology from the outset) behaved strategically when decision-makers were in attendance. Once they realized their views would not prevail, they abandoned the process (Hobson and Niemeyer 2013). This strategic self-deselection meant improved deliberative reason following their departure.
In contrast to Decision Impact, the negative impact of Complexity is attenuated when combined with higher levels of Group Building (Table 5; model I2). As shown in Figure G.2.1 in the Supplementary Material, the incremental impact of successive Group Building levels becomes marginal with declining complexity. But where complexity is high the benefit is clear. And although the confidence intervals between successive Group Building levels overlap, even when Complexity reaches the maximum value in Figure G.2.1 in the Supplementary Material the overall effect is significant. The implication is that incremental improvement in Group Building may not guarantee improvement in DRI, but it is still possible, if not likely, especially where complexity is high.
Together these interactions explain the weak effect of Complexity on its own in Table 4. And they confirm the power of Group Building in enhancing deliberative reasoning on complex issues.

DISCUSSION OF RESULTS
We have disputed the claims of skeptics who think deliberation about politics is impossible given limited relevant capacities among ordinary citizens, or only accessible for some privileged minority.28 However latent it may be, the ability to reason effectively about politics is readily activated in a forum under good conditions. In most of our cases, there was improvement in deliberative reason. The design features of forums matter: notably, Group Building, which enables participants to cope effectively with complexity. This finding vindicates the historical deployment of minipublics for complex topics that the legacy institutions of the representative system find problematic (Bächtiger and Goldberg 2020), such as climate change or risks associated with new technologies.
Our results support the view that well-designed deliberative forums can perform important functions in democratic politics. While it is true that deliberative forums have often found limited visibility and consequentiality, there is now a growing number of cases where minipublic recommendations have become prominent in the public sphere, as evidenced by citizens' assemblies on the climate crisis in France, and issues such as same-sex marriage and abortion in Ireland. Much here depends on publicizing forums, their arguments, and their findings. There are many ways of thinking about significant roles for such forums in deliberative systems, ranging from direct influence in shaping public policy, to proposals to use them as chambers of reflection and review, to recognition of a key role in promoting the deliberative capacity of the broader public sphere and political system (Niemeyer and Jennstål 2018).
Our findings on group building resonate with increasing interest in educative measures to improve the capabilities and (broadly) "deliberative stance" of participants in forums (Owen and Smith 2015). Examples here include perspective-taking to improve empathy (Muradova 2020), critical thinking, skill training, and mindfulness training (Jennstål N.d., as applied to the Uppsala Speaks Group Building; case 2). They are also consistent with evidence that ideologically driven sectarianism, which is antithetical to group building, impairs reasoning (Kahan 2013).
Now, demonstrating that group building is key to more effective deliberative reason in small, designed forums is one thing; achieving similar effect in larger publics quite another. Yet while the precise kinds of group building we have identified are specific to designed forums, we can search for counterparts that would improve deliberative reason in larger publics. Our analysis suggests only places to look; whether we actually find the effects we seek depends on further empirical inquiry.
To begin, we can seek mass-level counterparts to the higher items on our five-point scale for group building. One counterpart to point 3, "group briefing," might be sought in the rhetorical choices of political leaders. Rhetoric can be inclusive or divisive (Pedrini, Bächtiger, and Steenbergen 2013). O'Flynn (2017) points to political leaders in divided societies who can cultivate a sense of "pulling together" in larger publics. Chambers distinguishes between the plebiscitary and deliberative rhetoric of leaders; the latter "makes people think, it makes people see things in new ways, it conveys information and knowledge, and it makes people more reflective" (Chambers 2009, 335). Leaders' rhetoric can be "bridging" across different groups to induce respect for those with different identities and characteristics (a key deliberative principle) rather than "bonding" of the already like-minded (Dryzek 2010).
One counterpart to point 4, "participatory group building," where participants themselves work through principles for subsequent deliberative interaction, is intimated in the internal practices of some social movements (Della Porta and Doerr 2018, emphasizing the World Social Forum) and protests (Mendonça and Ercan 2015, discussing cases in Brazil and Turkey). Min (2015, 81) compares the principles developed by Occupy Wall Street protestors in New York to Habermasian ideals of communicative action and deliberation.
A counterpart to point 5, "cognitive training," exists in attempts to interest parliaments in mindfulness training.29 Again, some social movement practices are indicative. Extinction Rebellion in the UK cultivates a "supportive internal environment based on care" that involves checking feelings about oneself and others, and extends to caring for adversaries, such as the police (Westwell and Bunting 2020, 551). The general point is that counterparts to deliberative group building can be sought in larger publics, though their effectiveness requires further investigation.
This does not mean that deliberative forums are only useful for discovering conditions to be recreated in other contexts. For forums themselves can help promote deliberative reason in larger publics if their reflections and findings are publicized (Niemeyer and Jennstål 2018). Warren and Gastil (2015) report evidence from the Oregon Citizens' Initiative Review process, where a minipublic induced some public learning before a referendum.
This article is not the last word on deliberative reason, and many significant questions can now be explored using our method. It would be possible to examine in more depth how deliberative reason is affected by ideological diversity within the group, or different sorts of demographic composition (meaning stratified random sampling would have to be relaxed). It would also be possible to combine the DRI with procedural measures such as the Discourse Quality Index to see whether substantive deliberative reason correlates with procedural adherence to the norms of quality discourse.

CONCLUSION
Corroborating the idea that human reasoning is something that happens most effectively in groups, we have shown that citizens are capable of together producing effective deliberative reason, particularly when the conditions are right, thus validating the core claim of deliberative democratic theory. This suggests citizen deliberation can and should play a larger role in democratic systems. But the quality of deliberative reason depends crucially on the features of the forum, especially the degree to which it involves participatory group building at the outset. Effective group building enables groups to overcome the challenge of complexity. The latent ability of citizens to engage in effective political reasoning can be activated under the right conditions. Our findings demonstrate this possibility and point to pathways for widespread improvement of deliberative reason.

TABLE 1. Group Building Levels

TABLE 4. Multilevel Regression Results

TABLE 5. Interaction Effects