Frankly, we are stunned. We expected that our target article would provoke ferocious counter-attacks among a substantial cross-section of researchers from several fields. Awaiting the commentaries, we steeled ourselves, bracing for harsh and relentless rebukes. One renowned social psychologist, who had read an early draft, warned us that our colleagues would probably spit on us. What arrived were 28 commentaries from an esteemed and diverse set of scholars, including anthropologists, economists, linguists, neuroscientists, philosophers, primatologists, and sociologists, as well as cognitive, developmental, personality, and social psychologists. These commentaries largely cohere as an emerging synthesis, offering important expansions and extensions of our argument, as well as raising several interesting points for debate and discussion. There is now sufficient evidence from diverse human populations to indicate that researchers can no longer continue to – explicitly or implicitly – infer the universality of psychological processes or behavior from studying only WEIRD people and their children. Our reading indicates that 23 of 28 commentaries largely support our main thesis, although they raise important issues and fruitful points for debate. Of the remaining five, only one is in decisive disagreement (Gaertner, Sedikides, Cai, & Brown [Gaertner et al.]), with the other four (Khemlani, Lee, & Bucciarelli [Khemlani et al.], Machery, Maryanski, and Shweder) seeming somewhat ambiguous or ambivalent as to their precise views. Of course, it is possible that those who disagree most strongly with our assessment chose not to comment. We look forward to engaging representatives of this position in the future.
Our reply is ordered as follows: We (1) consolidate the additional lines of evidence, extensions, and amplifications of our target piece made by various commentators; (2) discuss the importance of integrating experimental and ethnographic methods, and show how researchers using behavioral games have done precisely this; (3) present our concerns with arguments from several commentators that separate “content” and “representations” from “computations,” “learning,” or “basic” psychological processes; (4) address concerns that the patterns we highlight marking WEIRD people as psychological outliers arise from aspects of researchers and the research process; (5) respond to Gaertner et al.'s claim that being members of the same species means we must have the same invariant psychological processes; (6) address criticisms of our categories and rhetorical strategy; and (7) return briefly to the question of explaining why WEIRD people are psychologically unusual.
R1. Additional evidence, extensions, and amplifications
Here we consolidate additional evidence, extensions, and amplifications of our target article. Seven commentaries reviewed empirical evidence that we did not present. All of this evidence supports the notion that WEIRD people are unusual, and none of it challenges that claim. Several of these lines of evidence are complementary with each other, and suggest some theoretical reasons for the unusual nature of WEIRD people, an issue that we return to in the final section. Nine additional commentaries supplied insightful amplifications, nuances, or extensions of our efforts.
R1.1. Additional support for the argument that WEIRD populations are unusual
1. Chiao & Cheon point out that the vast majority of cognitive neuroscience findings are based on WEIRD brains. They then review findings from the nascent field of cultural neuroscience showing how population-level differences in experimental findings reveal themselves in brain activity. Some of the psychological differences we discussed regarding the self and holistic versus analytic reasoning can be observed in differential patterns of brain activation. These commentators also highlight differences in brain activation based on socioeconomic status. Their commentary underscores the point that neither imaging brains nor measuring hormones allows one to avoid the challenge of population-level variation. Moreover, because sophisticated cultural learning is an aspect of the evolved human repertoire, our brains are partially self-programmable, so even cultural differences are biological differences (though not genetic differences) – for example, see Nisbett and Cohen (1996).
2. Majid & Levinson emphasize the importance of considering the world's immense linguistic diversity for studying and theorizing about psychology, especially in light of the unusual nature of English on several important dimensions (e.g., see next item). They also highlight additional evidence indicating how deeply some aspects of language imprint themselves on nonlinguistic aspects of cognition. They point to the uncanny coincidence that the fundamental stock of prelinguistic concepts hypothesized by cognitive scientists corresponds closely to those available in English – the language of many of the researchers and most of the participants – but not to those found in other languages.
3. Amplifying Majid & Levinson's points with further examples, Machery and Stich discuss studies in America, France, Mongolia, India, and China showing that people have different philosophical intuitions, with English-speakers (and Americans) at one end of the spectrum and Chinese at the other. Apparently, some philosophical theories of reference are based on these “English intuitions.”
4. Stich, in considering the implications of our efforts for philosophy, highlights novel work from experimental philosophy showing population-level differences in philosophical intuitions and moral judgments, including a recent finding on the lack of any “omission bias” in the moral judgments of Mayans. This lack of omission bias contradicts previous claims of universality based on work done purely in industrialized populations (and on the Internet). These initial findings, if replicated and extended, suggest that important elements of philosophical theorizing and conceptual analysis are rooted in local folk intuitions that do not extend to the rest of humanity.
5. Karasik, Adolph, Tamis-LeMonda, & Bornstein (Karasik et al.) review evidence on differences in motor development across human populations, and recent data linking developmental differences to childrearing practices. They also highlight how, in some cases, different developmental trajectories arrive at the same outcome.
6. Fernald, in underlining the extreme narrowness of the samples used by developmental psychologists, reviews evidence linking socioeconomic status, early cognitive stimulation, and long-term cognitive outcomes. Since most developmental work is with infants from WEIRD families, the oft-highlighted developmental milestones for various cognitive and linguistic abilities may reside at the extreme end of the true underlying species distribution.
7. Lancy reviews both ethnographic and experimental work on children from across diverse human populations showing just how unusual the worlds are of children who grow up in WEIRD societies, and how different their cognitive development can be.
R1.2. Extensions of our points
Here we lay out five different ways in which various commentators highlighted, amplified, and extended our efforts.
R1.2.1. Dealing with the predominance of WEIRD researchers
One important issue that we were not able to spend much space on in our target article revolves around the fact that most researchers are themselves WEIRD people. This directly impacts the choice of subject pools, since many researchers study those around them. However, this also may impact theory building and experimental design in a number of ways (e.g., Majid & Levinson, Stich). First, Fessler emphasizes how researchers use their own intuitions, at least at the start, in either theory-building or experimental design. His discussion of research on shame shows how American cultural models of emotion lead to basic elements of this emotion being missed by U.S. researchers, despite the fact that these elements are highly salient elsewhere. He argues that researchers, at least for some topics, would be better off not studying people who are culturally similar to themselves. Second, Bennis & Medin worry about this same point, arguing that researchers' cultural biases influence the choice of topics and phenomena that are considered interesting. Further magnifying the problem, instruments are then developed and honed in particular populations, which may not be suitable to other populations (also see Rochat). Finally, Meadon & Spurrett suggest that one important way of addressing these challenges is to bring more non-WEIRD researchers into the process. Empirical findings should be peer reviewed by researchers who bring different cultural models and implicit expectations to the problem.
We agree with all these suggestions: Researchers can view phenomena from a novel perspective, not constrained by their own intuitions, when they study those from other cultures, and can potentially discover phenomena that they otherwise would not see. However, we disagree with an extreme version of this argument, which proposes that researchers should entirely avoid studying people from their own culture. Researchers' intuitions about the ways people in their own cultures think can be a useful source of understanding in building theories and in honing research instruments. More non-WEIRD researchers should be brought into the discussion, as well as onto collaborative research teams. Research teams that better reflect broad global diversity can more effectively address the challenges delineated by Fessler, Rochat, and Bennis & Medin.
With regard to these points, it is instructive to consider why psychology is more dominated by American research than any other science (May 1997). One possibility is that pursuing a career in psychology is a luxury that people cannot afford until the countries and societies in which they live have achieved sufficient economic development. This may be part of the explanation, although this would not explain why universities in wealthy societies like those of Japan and Western Europe typically have proportionately smaller complements of psychology researchers and majors than do North American universities. Another possibility, which we highlight here, is that the field's emphasis on WEIRD samples, coupled with the guiding assumption of universal psychological processes, tends to unintentionally marginalize international research. If non-WEIRD researchers are interested in extending findings initially established with WEIRD samples in their home populations, such as findings associated with motivations for self-enhancement, they may well be unable to replicate the American results. The implicit assumption that self-enhancement motivations are similar everywhere would suggest that such failed replications are not due to the nature of the samples studied but instead due to some kind of unspecified deficiency in the methods of the non-WEIRD researchers. American researchers have a distinct advantage in that the field's key theories were largely constructed on data from American participants, and we suggest that this is likely why American research constitutes 70% of the field's citations. International research suffers from the disadvantage of trying to extend American-based theories with participants who often have different psychological tendencies, yielding results that are difficult to interpret while embracing an untested assumption of universal psychological processes.
In contrast, if the field comes to recognize that psychological phenomena cannot be assumed to be universal until demonstrated as such, then research conducted by non-WEIRD researchers, guided by non-WEIRD intuitions, and studied with non-WEIRD samples, would come to be viewed as particularly important for understanding human psychology.
R1.2.2. Existential proofs
Gächter's commentary also extends one of our discussion points by underlining the fact that, depending on the research question, WEIRD subjects may be suitable, or even ideal. In our target article we wrote that
Research programs that are seeking existential proofs for psychological or behavioral phenomena, such as in the case of altruistic punishment discussed earlier (e.g., Fehr & Gächter 2002), could certainly start with WEIRD samples. That is, if the question is whether a certain phenomenon can be found in humans at all, reliance on any slice of humanity would be a legitimate sampling strategy. (sect. 7.1.6)
We pointed both to Kahneman and Tversky's work on rationality (e.g., Gilovich et al. 2002) and to Rozin's work on magical thinking (Rozin & Nemeroff 1990) to highlight situations in which WEIRD samples are either suitable or ideal. However, if one's goal is ultimately to construct (rather than tactically falsify) theories of human behavior, it is hard to see how that could be done without expanding beyond WEIRD subjects.
R1.2.3. Differences among chimpanzee populations
Two commentaries expand on one of our points (Note 14 of the target article) by highlighting the challenge that population-level psychological variation creates for programs comparing humans and chimpanzees (Boesch and Leavens, Bard, & Hopkins [Leavens et al.]). Applying our argument to chimpanzees (though not to other animals?), these commentaries make the point that chimpanzee populations may also vary in their psychological abilities and motivations, and that this difference is important for comparing wild and captive populations.
Broadly speaking, we agree with this point and feel it needs careful attention. Nevertheless, we offer some cautionary notes. First, there are both theoretical and empirical reasons to believe that human population-level psychological variation is substantially greater than that found in chimpanzees. While chimpanzees are a cultural species, with local traditions and some imitative abilities (Horner & Whiten 2005; Whiten et al. 1999), humans are a runaway hyper-cultural species whose genetic endowments, including abilities to adapt ontogenetically, have been shaped by a long history of cumulative cultural evolution, social norms, institutions, and culture-gene coevolution (Henrich 2008; Laland et al. 2010; Richerson & Boyd 2005). With the same basic genetic endowments, humans expanded as foragers to all major continents, across substantial bodies of ocean, and into an immense diversity of environments. Meanwhile, chimpanzees remained stuck in a narrow band of African tropical forests. The impact of culture-gene coevolution has become increasingly clear from studies of the human genome (Laland et al. 2010). Therefore, although we agree that understanding chimpanzees also requires the study of diverse samples, we suspect that population-level variation is a far more significant issue for understanding human psychology than for understanding chimpanzee psychology.
Second, we note that it is far from clear which way the use of captive chimpanzees in psychological experiments might bias empirical findings. The aforementioned two commentaries provide opposing claims on this issue. Boesch suggests that if populations of human foragers were compared with wild chimpanzees, then the observed psychological and motivational differences would be minimized because of the impoverished social environments of captive chimpanzees and the unusual psychology of WEIRD people. Leavens et al., somewhat contrastingly, summarize evidence indicating that captive and human-reared chimpanzees have declarative pointing abilities more similar to humans (or at least to their human captors) than is found in wild chimpanzees. This suggests that captive chimpanzees may be more similar to some humans than to wild chimpanzees. It certainly seems plausible that life with humans might make chimpanzees more similar to humans, not less. This remains an open and important question, the answer to which is likely to vary depending on the phenomenon under investigation.
Finally, some of Boesch's specific indictments are off the mark. For example, he suggests that Silk et al. (2005) designed experiments with the “ethnocentric assumption that sharing should be preferred over nonsharing,” and then affirmatively cites Henrich et al.'s (2006) work, which argues that these differences result from different culturally evolved norms. It turns out, however, that Henrich was a co-investigator on both projects, and that these chimpanzee experiments were designed with full knowledge of the cross-cultural results, and precisely to test the “cultural norms” hypothesis against alternatives. Moreover, these experiments were done with chimpanzees in Louisiana, which Boesch criticizes, but were also replicated in Bastrop, Texas, prior to publication (Silk et al. 2005), and then replicated again in Leipzig (Jensen et al. 2006). Despite the substantial variation in the social environments of these chimpanzee populations, the experimental results were identical in all three sites. Finally, although differences between captive and wild chimpanzees may be important, there is nothing inconsistent about field observations of chimpanzee cooperation and Silk et al.'s experimental results. Pure self-interest can generate plenty of sharing and cooperation in some contexts, and seems to explain much about chimpanzees' social behavior (Gilby 2006; Tennie et al. 2009).
R1.2.4. Generalizing across contexts
The commentaries by both Konečni and Ceci, Kahan, & Braman (Ceci et al.) call attention to another concern regarding generalizability: How well do research findings generalize beyond the methods that are used to test specific hypotheses in laboratory settings? Relatedly, Rochat argues that the choice of narrowly conceived psychological instruments may limit the generalization of findings, as well. We agree with Konečni, Ceci et al., and Rochat that there are potential artifacts underlying the findings of many experimental paradigms. Highlighting the need to broaden participant samples does not obviate the need for researchers to study their phenomena with a variety of methods and in different contexts to assess whether their findings are meaningful and generalizable.
R1.2.5. Implications beyond the laboratory
The unusual nature of WEIRD samples is not solely a problem for researchers; it has implications that extend well beyond the laboratory. We think that Konečni is correct in highlighting how automatic assumptions of psychological universality can be problematic when people from one society apply and enforce new norms and policies in another society. As he notes, there can be enormous costs in “the deliberate or unconscious incorporation of WEIRD-based findings into the normative expectations held by international bodies in ‘cognitively distant’ war-torn areas – such as in Rwanda.” International interventions that are based on WEIRD research, or inspired by untested universalist assumptions, may generate ineffective and potentially destructive policies.
We emphasize, however, that an awareness of population variability is not a call for unbridled cultural relativism. Findings that reveal population differences do not imply an absence of a universal human nature, but they do indicate that what is universal might not be the same as what emerges from WEIRD participants. The investigation of universals can play a central role in the endeavor to manage international disputes and humanitarian crises, because they stand to possibly provide the only legitimate criteria by which any particular cultural practice or belief system may be understood. As Fox (1973, p. 13) has argued, “We could not plead against inhuman tyrannies if we did not know what is inhuman.” Understanding what is human or inhuman necessarily requires studying people from a diversity of populations.
Theories based on narrow sampling also have disturbing implications for the field of psychiatry and the treatment of mental health across diverse cultural contexts. As in the behavioral sciences, psychiatric models have largely been constructed on an empirical foundation that was gathered from WEIRD people. The burgeoning field of cultural psychiatry (Tseng 2001), however, has revealed that many disorders have distinct cultural boundaries, and can best be understood as culture-bound syndromes, such as (1) bulimia nervosa in the West (Keel & Klump 2003), (2) hikikomori in Japan (Sakai et al. 2004), and (3) koro in Southeast Asia (Ngui 1969). Moreover, many universal mental disorders manifest themselves in quite distinct ways across populations, such that presentations of depression (Kleinman 1988), social anxiety disorder (Okazaki 1997), or even schizophrenia (WHO 1973) are associated with different symptoms and prognoses. In his recent book, Crazy Like Us: The Globalization of the American Psyche, Watters (2010) documents how psychiatry has been exporting American models of psychopathologies around the world, such as post-traumatic stress disorder to Sri Lanka, anorexia nervosa to Hong Kong, and depression to Japan, often with disastrous consequences. The problem lies in diagnosing and treating indigenous presentations of pathologies according to how they appear through the prisms of the culturally limited diagnostic categories of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, which often increases the distress for both the patient and the community.
When real-life interventions are based on a body of research – however extensive – that is disproportionately drawn from WEIRD samples, the implications extend far beyond the accuracy of our theories and into human lives.
R2. Methods for an interdisciplinary science of human behavior
We agree with Rai & Fiske, Astuti & Bloch, and Shweder that a fully interdisciplinary study of human psychology demands an integration of ethnographic and experimental methods. Panchanathan, Frankenhuis, & Barrett (Panchanathan et al.) also recognize the need for research that goes beyond disciplinary boundaries. Experimental methods provide instruments for better measurement and permit the testing of causal hypotheses. Ethnographic methods provide crucial insights for developing theory, designing experiments, and interpreting results, as well as important information for understanding the proximate causes (e.g., ontogenetic processes) of psychological differences (Henrich & Henrich 2007, Ch. 1). However, because our target article was aimed principally at experimentalists, we wrote in the language of experiments. Too often, ethnographers have railed against experiments, but little communication has occurred because ethnographers generally have refused to become fluent in the local language of experimental thought (which is ironic).
There are important differences here, however. Our own view is more in line with Rai & Fiske than with Astuti & Bloch and Shweder, who seem to be emphasizing an approach based on qualitative ethnography with an emphasis on “thick description.” Ethnographic work must be based on systematic, quantitative, and replicable research protocols that quantify the theoretically relevant aspects of life. Alongside in-depth interviews and participant observation, this might involve time allocation, systematic observation, social network measures, conversational recordings, and formal cognitive tasks (e.g., pile sorts). The integration of experimental techniques with ethnography will partially return anthropology to its broader scope, prior to the intellectually destructive epidemic of postmodernism (Slingerland 2008). As Shweder points out, field anthropologists used to integrate experiments with ethnography (e.g., Edgerton 1971; Mead 1932; Rivers 1901b). What we do not need is greater reliance on ethnographic impressionism, which delivers such spectacles as the Mead-Freeman-Orans debate (Freeman et al. 2000) on the nature of Samoan adolescent sexuality.
In the 21st century, scattered teams of highly interdisciplinary researchers have already begun to demonstrate how to integrate ethnographic and experimental findings in a manner that takes advantage of their synergies, going well beyond “thick description” (e.g., Atran et al. 1999; 2005; Barrett & Behne 2005; Cohen 2007; Fessler 2004; Henrich et al. 2005a; 2006; Henrich & Henrich 2007). In Section R2.1 below, we discuss Foundations of Human Sociality: Economic Experiments and Ethnographic Evidence from Fifteen Small-Scale Societies (Henrich et al. 2004), which explicitly integrates insights from long-term fieldwork with findings from behavioral games.
R2.1. Meanings and misunderstandings: Behavioral game experiments
In expressing concerns about how experimental participants from diverse societies interpret or conceptualize particular experiments, some commentators have criticized the use of “economic games” across diverse societies (Shweder, Rai & Fiske, Baumard & Sperber, and Astuti & Bloch). We address these specific issues in two ways. First, we show how these criticisms arise from an incomplete reading of the work that has been done using economic games across diverse populations. Second, we use these cross-cultural game projects as an example of how in-depth ethnographic studies can be combined with experimental tools by interdisciplinary teams to address important theoretical questions.
Understanding the utility of an experiment requires understanding the theoretical debates that those experiments aim to address. The Roots of Human Sociality Project, which consists of two phases of experiments and ethnography performed in 22 different small-scale societies by a team of anthropologists, psychologists, and economists (see Henrich et al. 2001; 2006; 2010), was designed to examine particular hypotheses related to the evolution of large-scale complex human societies. One hypothesis for the emergence of large-scale human societies proposes that cultural evolution, driven by competition among societies and institutions, favored the evolution of particular kinds of social norms. These norms harness and extend evolved social motivations to foster cooperation, trust, and exchange with ephemeral partners, beyond each individual's stable local network of kin and repeat interactants. Such norms permitted the formation of market institutions, which encourage market expansion, trade, and economic success. Psychologically, this hypothesis suggests that the inhabitants of large-scale, complex, market-integrated societies will possess default sets of prosocial beliefs, motivations, and expectations about how to treat ephemeral interactants (e.g., strangers or anonymous others). Under this view, the institutions of complex societies, such as markets, ought to correlate positively with prosocial behavior in these contexts.
An alternative hypothesis proposes that cooperation, trust, and exchange in large-scale societies result directly from the misapplication of evolved kin- and reciprocity-based heuristics for interacting in small-scale societies to individuals in larger social spheres (Burnham & Johnson 2005; Dawkins 2006), eventually including nation states. These heuristics for life in small-scale societies, which are not favored by natural selection in large-scale societies, misfire in large-scale societies because these societies have spread only in the last 10 millennia. Crucial to this hypothesis is that cultural evolution cannot substantially alter the social motivations and calculations that determine sociality.
These two hypotheses make quite different predictions about the context-specific behavior and motivations of people from different societies toward ephemeral interactants.
What kind of experiments might allow us to measure these differences? Ideally, the experiments should have real costs and benefits with the same underlying material payoffs, so we can comparatively measure motivations and expectations. However, there ought to be cues that will tap the predicted sets of context-specific motivations and expectations (norms) for interacting with individuals in the absence of information about their particular relationships (e.g., cues about status, sex, kinship, or future interaction). With their salient cues of cash and anonymity, and their lack of other cues, economic games seem ideally suited for testing the above hypotheses (Henrich et al. 2010).
From the beginning, the Roots team leaders knew that deploying these experiments across diverse societies would be challenging, requiring experts on each local culture and qualitative ethnographic information to assess local meanings and interpretations. Long-term anthropological fieldworkers were recruited to design and implement the protocols. Although the Phase I findings did show that market integration was indeed important for predicting prosociality in these contexts, there were also a few cases in which the experiments happened to cue local prosocial norms – interpretations or meaning systems – unrelated to the targeted set of default norms for exchanging with strangers or anonymous others. The project team attended to these alternative interpretations, arguing that it is essential to understand the mapping between the experiments and local norms (Alvard 2004; Ensminger 2004; Henrich & Smith 2004; Hill & Gurven 2004); they also captured much of this variation with a variable related to non-market cooperative domains in their statistical analyses. The observation of how daily life influences the experiments was so important that it was one of the five major points in the team's 2005 BBS paper (Henrich et al. 2005a). Each ethnographer also wrote a chapter in Foundations of Human Sociality: Economic Experiments and Ethnographic Evidence from Fifteen Small-Scale Societies, in which they deployed their own interviews, participant observation, and years of ethnographic experience to illuminate the local meanings of our experiments.
An important example of this focus on understanding local meanings comes from the team's investigation of their experimental findings from the Au of New Guinea (see Shweder). This investigation began when ethnographer David Tracer, who speaks Au and had been working in New Guinea for 12 years prior to the Roots collaboration, published two papers illuminating his experimental findings (Tracer 2003; 2004). To further address this in Phase II, the Roots team recruited another long-term linguistically skilled New Guinea ethnographer, Alex Bolyanatz. In this second phase, Tracer replicated and extended his prior findings, while Bolyanatz returned home with findings parallel to those of Tracer (Bolyanatz, under review; Tracer et al., under review). It seems likely that in New Guinea, behavioral games map onto prosocial norms that have little or nothing to do with markets or complex societies. This is consistent with decades of ethnography emphasizing the broad-ranging importance of reciprocity norms in New Guinea (Fiske 1991; Sillitoe 1998).
After the Phase I findings became known, many researchers expressed the same concerns that have been highlighted by commentators Baumard & Sperber, who write “participants in these games have no information about the rights of each player over the stake and are asked to make a ‘blind’ decision. But who owns the money? … Who is the other participant? … Does he or she have rights over the money?” Whereas the Roots team argued against these concerns for Phase I (Henrich et al. 2005b), Phase II's design directly and explicitly addressed them in the standardized game instructions, pre-game tests of participants' understanding, post-game interviews on game interpretations, contextualized game variants, and games with double-blind anonymous partners. Phase II's results replicated and extended the Phase I findings in various ways, showing that, among other things, the modifications made to address the concerns raised by Baumard & Sperber have little impact on the results (Henrich & Ensminger, n.d.; Henrich et al. 2006; 2010).
Rai & Fiske suggest the experimental games are not meaningful because they do not correlate with anything important in the real world (Levitt & List 2007). They cite Gurven and Winking (2008), who show that socializing, food-sharing, beer-brewing, and well-digging are not correlated with three bargaining experiments among the Tsimane in Bolivia. But what theory predicts that those domains should be correlated? The evolutionary approach to social norms described above predicts that, if game play does indeed tap norms evolved for interacting with strangers or anonymous others, then the games played by Gurven and Winking ought to be associated with things such as market integration, social scale (community size), and other features related to the operation of larger-scale societies – features that capture those elements of social interactions not governed by durable personal relationships. Looking across diverse populations, market integration is indeed highly correlated with experimental measures of prosocial behavior in these bargaining games (Henrich et al. 2010). Similarly, antisocial punishment (Figure 3 in the target article) is highly negatively correlated with GDP (gross domestic product), and predicted by national measures of the strength of the rule of law and measures of norms of civic cooperation (Herrmann et al. 2008). Within populations, trust game measures of trustworthiness predict repaying loans in a microfinance program (Karlan 2005), and predict alumni donations (Baran et al. 2009). Dictator game offers are correlated with donations to hurricane victims (Kam et al., n.d.) and political participation (Fowler & Kam 2007).
Once properly theorized, not only are economic games highly correlated with important real-world phenomena, but we can predict with which real-world phenomena they should correlate.
R2.2. Merely methodological artifacts?
Beyond economic games, Shweder, as well as Baumard & Sperber, ask whether the population-level variations found in the psychological literature we reviewed are a product of methodological artifacts arising from different meanings assigned to the experimental settings, or from communication failures between researchers and participants. Could it be that, for example, the extensive cognitive differences we reviewed in perceptual judgments, visual illusions, analytic-holistic thinking, and folkbiological reasoning arise not from differences in cognitive processes, but from different interpretations of the questions or the tasks?
While we agree that researchers working across populations should take such methodological concerns seriously, several lines of evidence speak against this being a general problem. First, we emphasize that diverse methodological techniques have often been used that yield consistent findings. For example, Western participants are found to privilege analytic cognitive strategies, whether the tasks measure reaction times in categorization, free recall, patterns of bias in deductive reasoning, or eye tracking in scene recognition. It is hard to see how task interpretation or meaning issues would affect all of these tasks and yield responses in the same direction as the predicted differences in cognitive processes. Moreover, many of the psychological studies we included in our review (Choi & Nisbett 1998; Norenzayan et al. 2002) included control conditions in which there were no population-level differences, and none were expected. This helps establish the meaning equivalence of the experimental contexts and procedures across populations, and undermines a purely methodological interpretation of differences found. Finally, it is important to recognize that the same or similar methods and instruments that have revealed population-level differences have also revealed population-level invariances, often in the same study with the same participants (Atran & Medin 2008; Atran et al. 2005; Haun et al. 2006; Henrich et al. 2006; Norenzayan et al. 2002; Segall et al. 1966). As we point out in our target article, if certain methods count toward establishing invariant aspects of psychological processes, then data that indicate variation have to count as well.
As an illustration, consider recent work showing both universal and variable aspects of numerical cognition. In contrast to evidence from WEIRD samples, experimental work from two small-scale Amazonian societies, the Pirahã and the Munduruku, suggests that the ability to distinguish quantities digitally beyond the first couple of integers is poor in these groups, whose languages do not include numerals above 3 (Gordon 2004; Pica et al. 2004) – a pattern common in many such societies (Everett 2005). These same experiments also demonstrate that the cognitive ability to estimate quantity approximately, or an analog “number sense” (Dehaene 1997), is found to be strikingly similar irrespective of linguistic variation in counting systems. This analog system is also present in numerous nonhuman species (Hauser & Spelke 2004).
R2.3. Accessing non-WEIRD samples
Commentators Gosling, Carson, John, & Potter (Gosling et al.) discuss an important tool that should allow the behavioral sciences to obtain more diverse samples by reaching non-WEIRD participants through the Internet. We agree that this is a potent addition to the researcher's toolbox. The advantages of the Internet are the ease and affordability with which international samples can be accessed. We suspect that some fields are probably more likely to change in the ways we prescribe if the changes do not require researchers to alter their habitual practices, or leave their home universities to go out into the world.
While the Internet is undoubtedly a valuable tool that needs to be fully exploited, there are some limitations to this approach, as Gosling et al. note. First, though the Internet is amenable to some kinds of psychological experimentation, it will not facilitate the kind of integrated research program that synthesizes tools from across the human sciences, including direct observation, naturalistic field experiments, biomarkers, and qualitative ethnography, into longitudinal studies across the life cycle. Second, the segments of many countries that have Internet access probably share many attributes with WEIRD people already (see Rozin) – they will tend to be rich, educated (at least literate), and often disproportionately from particular ethnic groups. For example, Internet users in Africa are far more likely to be cultural outliers in WEIRD ways relative to the general African population than, say, Internet users in Europe are relative to the general European population. Third, many people overestimate the current reach of the Web. Our Table R1 gives the percentage of the total population in each of the world's major regions that consists of Internet users (i.e., percentage of Internet penetration across the globe). The percentages for Africa, Asia, and Latin America are not only low, but the distributions are highly skewed. Most African countries have Internet penetrations of less than 1%; and the overall African penetration is distorted upward by the higher distribution in Egypt, South Africa, and Nigeria. The aggregate statistics for Oceania/Australia are also deceptive, as more than half the countries comprising that region have less than 15% penetration, with six countries showing less than 5% penetration (Solomon Islands, Kiribati, Papua New Guinea, Nauru, Marshall Islands, and Samoa).
Fourth and finally, we hope that the ease of this method does not discourage researchers from considering other ways to broaden their research programs to integrate diverse populations. Overall, we are in favor of any additional methodological tools that can be used to study diverse human samples, and we believe the field will be best off by using a wide variety of different tools.
Table R1. Global distribution of Internet penetration as of 2009 (Footnote 1)
R3. “Basic level” processes, learning, and computational operations
Four commentaries emphasize a bipartite partition between (1) mental content and (2) basic, or universal, psychological processes (Rozin), variously labeled as “learning” (Danks & Rose), “computational operations” (Khemlani et al.), or “low-level processing” (Rochat). Each commentary emphasizes that while content varies, the underlying computational machine is constant. Danks & Rose write, “there is a natural, defensible distinction between the cognitive ‘objects’ of the mind, and dynamic mental ‘processes.’ Cognitive objects include representations, knowledge structures, and so on.” Similarly, Rozin notes, “But at the level of basic psychological processes, such as learning, motor organization, or vision, the NAU [North American undergraduate] is probably a pretty good fruit fly.” Maybe this is how the human mind operates, like the computer on your desk, but how do we know until we sample a broad range of human diversity? Human learning or computational processes might be self-modifying to adapt to local conditions, so, while all human fetuses might begin with (roughly) the same cognitive equipment, acquired content could provide feedback and alter these “basic”-level learning or computational processes through phenotypic plasticity or cultural transmission. Or, culture-gene coevolution, which is increasingly recognized as a powerful force in human evolution (Laland et al. 2010), could genetically adapt local populations to more effectively acquire and process the local stable cultural representations.
Our own folk model of human psychology, which is also rooted in a computer metaphor, conforms to that suggested by these commentators. However, we think the available evidence ought to make us question this metaphor. Let's first consider vision, since Rozin highlights this. By all appearances, vision seems to be the product of “basic” processes. We have already discussed the population-level variation in the Müller-Lyer illusion, which, according to Khemlani et al., is “but one single phenomenon in visual perception, hardly representative of all visual perceptual processes.” However, as we noted, there is also substantial variation in the Sander Parallelogram and two versions of the Horizontal-Vertical illusion.
But forget illusions. Suppose one was studying why people see so poorly underwater, compared to on land. Can one make universal generalizations regarding the human ability to see underwater by exclusively studying undergraduates? Turns out, no (Gislen et al. 2003). The Moken are nomadic sea foragers who live in an archipelago off the coast of Burma. From a young age, Moken subsist by collecting food from the sea floor. Comparisons of underwater visual acuity between Moken and European children show that Moken children have more than twice the visual acuity of their European counterparts. The Moken appear to have acquired the ability to constrict their pupils underwater, thus improving acuity, rather than widening them in the dimmer light, as Europeans do. That is, the pupils of Moken and Europeans do opposite things when they enter water, one adaptive, the other, not so much. Isn't pupil dilation a “basic” part of visual processing?
Rozin also mentions “motor organization” as a basic process. It is not clear to us precisely what this means, but Karasik et al.'s commentary reviews evidence showing how important aspects of motor development vary across human populations, and suggests how this might be related to childrearing or parenting practices. Karasik et al. also discuss how children in some small-scale societies never crawl – instead, they “butt-scoot” or “bum-shuffle.” A “crawling stage” per se is neither universal nor necessary for adult bipedalism.
Perhaps a non-psychological example will further sharpen the problem. Suppose you wanted to study the nature of human running. Can you build a universal model of human running that is based on undergraduates, or other WEIRD people? Interestingly, WEIRD people would be one of the worst populations to select for such an investigation. Recent research shows that cushioned running shoes lead to dramatic modifications of the human running profile by causing runners to land principally on their heels instead of the balls of their feet. This difference has substantial implications for how we understand the evolved design and engineering of human feet (Lieberman et al. 2010). If one studies life-long sneaker-wearers, the engineering of human feet appears ill-suited for long-distance running. In contrast, studying barefoot runners, particularly life-long barefoot runners, suggests a marvelous evolved design for the human foot, possibly specialized for long-distance running (Bramble & Lieberman 2004). Hence, if you study WEIRD runners exclusively, you again get the wrong answer.
The difference between the feet of shod and unshod people has implications even for the interpretation of ancient hominid evolution. In 1978–79 a 27.5-meter-long trail of footprints hardened in volcanic ash dating to 3.5 million years ago was unearthed at Laetoli, Tanzania. Comparisons of these ancient prints with those of urban North Americans suggested that although bipedal, these ancient hominids were not bipedal in the way Homo sapiens are. The ancient footprints show a separation between the big toe and second toe, an anterior “fanning,” and a substantial arch – all indicating differences compared to the feet of WEIRD humans. However, when the Laetoli prints were compared with those of the Machiguenga, who live a barefoot life of hunting, gathering, and horticulture in the Peruvian Amazon, the Laetoli footprints could not be distinguished from these non-WEIRD footprints (Tuttle et al. 1990; 1991). It turns out that WEIRD people have flat, narrow feet with underdeveloped big toes, which are the product of a lifetime of having one's feet bound in cushioned shoes. One cannot even safely identify universals about human foot anatomy by exclusively studying WEIRD people!
The assumption that “basic” processes are invariant also needs to stand up to evidence that human brains change in response to experience and cultural routines. Work on neuroplasticity has shown how training and expertise can create both functional and structural differences in brains. But the demands and incentives of developmentally adapting to local social organizations, status hierarchies, performance norms, carpentered corners, and other culturally evolved features of developmental environments persist for longer periods, are probably more constant, and are arguably more intensive than the behavioral regimes associated with occupations, such as those associated with musical training, mathematics, or taxi driving (Reynolds Losin et al. 2010). Yet, musical training creates structural alterations in brains, such as enlarging the anterior corpus callosum and altering the motor and somatosensory maps. Taxi driving increases gray matter in the hippocampus, while training as a mathematician increases gray matter in the parietal cortex. Consequently, however one conceives of these hypothesized invariant “basic processes,” such processes would have to remain constant in the face of the structural and functional modifications in brains that inevitably arise from ontogenetically adapting to culturally constructed environments (Reynolds Losin et al. 2010). As Panchanathan et al. recognize, rather than demanding universal psychological processes, it might be fruitful to think about evolved ontogenetic processes that construct and calibrate diverse psychological processes to local environments, at least for some domains.
Recent evidence emerging from collaborations between cognitive psychologists and anthropologists further challenges the distinction between process and content that stands as a virtual axiom in parts of the cognitive sciences. Cultural differences in what people think about loop back to impact how people think (Bang et al. 2007). For example, in folkbiology, differences in what people believe about plants and animals affect memory organization and ecological reasoning about living things.
Finally, a key point that is missed by the prevailing arguments supporting a distinction between “universal process” versus “variable content” is that universality is not an all-or-nothing phenomenon. In order to draw meaningful conclusions about what is universal and what is not, one must also make distinctions between different levels of universals that are grounded in empirical observation. For example, the cognitive ability to estimate quantity approximately, discussed earlier, appears to be quite invariant in that it produces cognitive responses with identical effect sizes across populations. In contrast, many processes of central interest to psychology and cognitive science, such as rule-based categorization, geocentric spatial reasoning, or some egocentric motivational biases, are universal in a much weaker sense. They may exist in the psychological repertoires of all peoples, but their use and relative dominance over other competing strategies are contingent on population-level variability in cultural routines and practices. See Norenzayan and Heine (2005) for a theoretical framework for identifying levels of universals.
In sum, it is not clear to us what kinds of psychological processes are a priori more likely to be universal. More empirical evidence regarding the degree to which psychological phenomena vary across populations will be of much utility in addressing this important question. Perhaps, as Machery alludes, such evidence will reveal that social psychological phenomena (whatever those are?) are especially likely to vary across populations. But at this point, there is insufficient evidence to support such a conclusion (Footnote 2).
R4. Are WEIRD populations actually unusual?
While agreeing with our central thesis, Bennis & Medin worry that the seemingly extreme nature of WEIRD populations on many dimensions may result from various biases created by the fact that most researchers are themselves WEIRD, and that psychological instruments are developed and honed in WEIRD populations. We are sympathetic with their concerns, although we quibble with some of the details of their critique.
Bennis & Medin begin their critique by proposing that a consideration of base rates would suggest that something is amiss with our claim. Their logic implies that it was essentially a random process that determined which society, of all historical and extant societies, happened to accumulate sufficient experimental findings about human behavior from enough populations to begin to consider the question at hand. However, the right way to think about the question they are posing is to ask: What is the probability that a society is psychologically unusual, given that it is the first to aggregate sufficient experimental findings from diverse societies to even explore the question? For starters, such a society has to inherit a scientific and experimental tradition. The society has to be economically successful enough to create occupational specializations for experts in human psychology and behavior. The society has to be willing to place sufficient value on these activities, despite their questionable economic utility. This society must be willing to fund data collection in some diverse populations. And, at least some members of the society have to find this endeavor sufficiently interesting to dedicate their lives to pursuing it. Given all these prerequisites, which limit the number of candidate societies to a handful, we think it is quite plausible that the society which first had sufficient experimental findings to explore the question at all, would itself be psychologically unusual.
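The contrast between the base rate and the conditional probability at issue can be made explicit with a short illustration. The notation and the numbers below are our own hypothetical assumptions, chosen only to show the logic, and are not estimates drawn from any data. Let U denote "the society is psychologically unusual" and F denote "the society is the first to amass sufficient comparative experimental findings." Bayes' rule gives:

\[
P(U \mid F) = \frac{P(F \mid U)\,P(U)}{P(F \mid U)\,P(U) + P(F \mid \neg U)\,P(\neg U)}
\]

Suppose, purely for illustration, that the base rate is P(U) = 0.05, and that the prerequisites listed above make an unusual society far more likely to be in this position, say P(F | U) = 0.2 versus P(F | ¬U) = 0.01. Then:

\[
P(U \mid F) = \frac{0.2 \times 0.05}{0.2 \times 0.05 + 0.01 \times 0.95} = \frac{0.01}{0.0195} \approx 0.51
\]

Under these assumed values, a low base rate is compatible with a conditional probability of roughly one half. The conclusion depends entirely on how strongly the prerequisites are associated with being unusual, which is an empirical matter.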
In fact, many of the findings reviewed in our article and in these commentaries could plausibly be linked to being in a position to explore the question. Commentators Kesebir, Oishi, & Spellman (Kesebir et al.) note that Americans have low pathogen loads and high residential mobility, both of which are correlated with individualism and may promote economic growth. Together, Lancy and Fernald suggest that American (and Western, more generally) childrearing practices may speed the cognitive development of particular skills. We pointed out the unusual lack of co-sleeping in the United States, which may influence independence and self-reliance. Individualistic notions of the self may increase Americans' curiosity about psychology, which might explain why psychology is so dominated by U.S.-based research. Differences in holistic versus analytical thinking may be linked to epistemic social norms about what counts as “good thinking” (Buchtel & Norenzayan 2008), and may be rooted in the very origins of science (Nisbett 2003). Visual illusion variability and folkbiological anomalies may be linked to growing up in built urban environments with ample two-dimensional representations (artwork, photographs). Motivations for fairness towards anonymous others are highly correlated with market participation, whereas motivations for antisocial punishment strongly negatively predict GDP; both of these factors can impact economic performance. In short, it appears that a society that has conducted and amassed a large majority of the extant psychological data may, for precisely the same underlying reasons, be a psychological outlier, at least on many important dimensions. These are not independent processes, as Bennis & Medin would have it.
Having clarified this, we do agree with Bennis & Medin's more general concern that WEIRD samples may look particularly unusual on those measures that have been selected, developed, and honed for use in WEIRD populations. If Indian psychologists first identified a specific phenomenon that was of particular concern in India, they too might hone their methods to enhance their effects, resulting in Indian participants being outliers for that phenomenon. However, this is precisely the problem that the behavioral sciences face – initial theories are largely derived from WEIRD samples, and, as such, we fail to consider phenomena that might be of greater concern elsewhere. As our target article emphasized, it remains an open question whether, when a full accounting is taken of all the psychological phenomena that exist throughout the world, WEIRD samples will remain any more unusual than other societies. At present, we lack the empirical data to evaluate this possibility, and hope that researchers strive to contribute to this database by identifying and studying phenomena that are more of a concern in other populations.
R5. Does variability conceal psychological universals? Is WEIRD WRONG?
Gaertner et al.'s commentary stands as the sole one that explicitly rejects our basic claim regarding population-level variability in psychological processes. Although this commentary is unique in its criticism of our key arguments, we suspect that many behavioral scientists share the intuitions underlying these commentators' critique, and we therefore devote substantial space to this in our response.
Gaertner et al. maintain that behavioral scientists study the culturally variant phenotypes of an underlying universal genotype. We note that this point is dependent on researchers being able to discern what the underlying universal psychological process is in the first place. But behavioral scientists do not have direct access to this underlying genotypic level; rather, we are in the business of studying questions at the phenotypic level, such as, whether people view themselves more positively than they are viewed by others, whether people succumb to the Müller-Lyer illusion, or at what point people reject low offers in the Ultimatum Game. The genotypic level is inferred on the basis of the phenotypic evidence; and when the phenotypic evidence comes from a narrow sample, such as North American undergraduates, researchers very well might incorrectly infer what is going on at the genotypic level.
As an analogy, consider the case of understanding pregnancy sickness. All of the major hypotheses for why women get nausea and vomiting during their first trimester are evolutionary explanations. Some hypothesize that pregnancy sickness is an evolutionary byproduct of the body's need to alter hormonal levels, to permit the growth of a fetus. Other explanations propose that pregnancy sickness is an adaptation triggered by either threats from pathogens (in meat) or toxins (from plants). Which evolutionary explanation is correct? It turns out that key evidence comes from studies of small-scale societies that rely on very little meat and use corn as a staple (which does not have the toxins hypothesized to spark pregnancy sickness as a defense). These societies have plenty of pregnancy, but no pregnancy sickness, thus undermining the byproduct hypothesis (Fessler 2002; Flaxman & Sherman 2000). So, assuming evolved universality does not preclude the need for comparative evidence to adjudicate among alternative hypotheses.
Now consider the self-serving attributional bias, which is operationalized as the tendency for people to take personal credit for their successes but to direct blame externally for their failures. An extensive meta-analysis revealed that the effect size for this bias is d = 1.05 for Americans but d = −0.30 for Japanese (Mezulis et al. 2004): that is, the phenotypes are diametrically opposed. What is the underlying universal psychological process here? Just like the blind men trying to identify the elephant, one would reach an entirely different conclusion if one had Japanese data than if one had American data. The most complete view of the underlying beast requires multiple sources of evidence; the more diverse the populations studied, the better researchers will be able to triangulate on the underlying processes, be those universal psychological or ontogenetic processes.
The guiding assumption of much of the behavioral sciences has been that human behavior is the expression of universal underlying psychological processes. We submit that it is because of this assumption that the samples studied are as narrow as Arnett's (2008) analysis reveals. Indeed, there is little point in trekking into the New Guinea highlands if highlanders share the same universal psychological processes as the undergraduates in a researcher's home university. In the target article we questioned whether this assumption is tenable – how does it stand up to the empirical evidence? For some domains (e.g., personality structure; sex differences in some mate preferences) the evidence for universality, at least at the level of functional universals (see Norenzayan & Heine 2005), seems solid. In other domains (e.g., fairness motivations, moral principles, spatial perception), the high degree of variation among populations makes identifying an underlying universal psychological process more interesting (and more work). But we cannot address whether this assumption of universality is supported for a given hypothesized process until there is sufficient comparative data.
In some domains, it might be the case that there is not a common, universal psychological process. Many in the behavioral sciences have yet to take seriously the implications of epigenetic inheritance and culture-gene coevolution. A rising tide of evidence from epigenetics is showing how genetic systems modify gene expressions to adapt to local circumstances without altering the underlying DNA. This can create lasting heritable variation in individuals (epigenetic inheritance), and differences among populations, without any underlying differences in DNA base pair sequences – only differences in gene expression (Jablonka & Raz 2009). Monozygotic twins diverge in their gene expression as they age because their epigenetic response systems modify their gene expressions (Fraga et al. 2005). In addition, potent forces of culture-gene coevolution could mean that different populations have genetically adapted to stable elements of their cultures, such as in the case of lactose tolerance (Laland et al. 2010). This does not mean that there are no basic principles to account for psychology and behavior; it just means that we must move one step back to principles from genetics, epigenetics, culture-gene coevolution, and cultural evolution to explain human variation.
It seems to us that Gaertner et al. are offering an unfalsifiable hypothesis. They suggest that studying diverse populations will either yield evidence of similarities because of an underlying universal psychological process, or it will yield evidence of differences, which mask the underlying universal psychological process. They do not offer any means for discerning an underlying universal process in the face of population-level variability. Indeed, they do not seem willing to entertain alternative hypotheses that, for example, propose universal ontogenetic processes that give rise to different psychological processes under different conditions during development.
This is particularly evident in their discussion of population-level variability in self-enhancement motivations. Gaertner et al. challenge us by claiming that the cross-cultural evidence regarding self-enhancement points to much universality. We tackle this claim at length here as it is central to their rejection of our thesis, and it is an ongoing controversy (see Footnote 3). First, they argue that this motivation is universal but expressed differently: Westerners enhance themselves in domains that are important to them (i.e., individualism), while East Asians enhance themselves in domains that are important to them (i.e., collectivism). This question has been investigated using a number of different methods. The results from the “better-than-average effect” largely support this hypothesis (Brown & Kobayashi 2002; Sedikides et al. 2003). These are the only findings cited by Gaertner et al.
However, the other 11 methods that have addressed this same question (viz., the false-uniqueness bias, actual-ideal self-discrepancies, manipulations of success and failure, situation sampling, self-peer biases, relative-likelihood and absolute-likelihood optimism biases, open-ended self-descriptions, automatic self-evaluations, social relations model, and a corrected better-than-average effect) yield an opposite pattern of results – that is, East Asians do not self-enhance more in domains that are especially important to them (whereas Westerners do: Falk et al. 2009; Hamamura et al. 2007; Ross et al. 2005; Su & Oishi 2010). A meta-analysis including all of the published studies on this topic finds no support for this hypothesis (Heine et al. 2007a; 2007b); the meta-analyses cited by Gaertner et al. (viz., Sedikides et al. 2005; 2007a) find different results because they excluded most of the studies that yielded contrary findings. Further, many research programs have documented that the better-than-average effect is a compromised measure of self-enhancement as it includes a number of cognitive biases that exaggerate estimates of self-enhancement (Chambers & Windschitl 2004; Hamamura et al. 2007; Klar & Giladi 1997; Krizan & Suls 2008; Kruger 1999; Windschitl et al. 2008). That is, the sum total of the available evidence contradicts Gaertner et al.'s claim that East Asians are self-enhancing in domains of special importance to them.
A second argument for the universality of self-enhancement that Gaertner et al. offer is that the population-level variability is a function of different modesty norms such that the cultural differences disappear with non-explicit measures. However, other studies of self-enhancement that employ hidden behavioral measures (which Gaertner et al. do not cite) find differences as pronounced as those found with explicit measures (Heine et al. 2000; 2001). Along these lines, Gaertner et al. cite evidence that cultural differences do not appear with the Implicit Association Test (IAT) measure of self-esteem (Greenwald & Farnham 2000). This is the only method out of 31 that did not find a population-level difference in the magnitude of self-enhancement motivations between Westerners (average d=0.87) and East Asians (average d=−0.01: Heine & Hamamura 2007), yet none of the results from these 30 other methods are discussed by Gaertner et al. Further, the IAT measure of self-esteem has the least validity evidence of any of the IAT attitude measures (Hofmann et al. 2005), and this measure does not correlate reliably with other implicit measures of self-esteem, measures of explicit self-esteem, or other external validity criteria (Bosson et al. 2000; Falk et al. 2009). Hence, at present it is unclear what the self-esteem IAT measures, and it is noteworthy that it is the method that stands alone in not finding a difference in self-enhancement motivations between Westerners and East Asians.
Finally, Gaertner et al. argue that self-enhancement promotes adjustment equally in both Westerners and East Asians. But the relationship between self-enhancement and adjustment continues to be hotly debated, even among Western samples, with divergent results emerging depending on the methods used. In general, evidence that self-enhancement promotes adjustment is strongest with measures of the better-than-average effect and self-report measures of adjustment, where the individual answers items regarding both how positively people view themselves relative to others and how positively they view themselves with regard to their adjustment. The evidence is much weaker, and often contradictory, for studies that utilize objective benchmarks of self-enhancement (Colvin et al. 1995; Paulhus 1998; Robins & Beer 2001; but see Taylor et al. 2003). Further, the only published study that measured self-enhancement and depression in both East Asian and Western locations finds a significantly weaker relation between the two constructs among Japanese than Canadians (Heine & Lehman 1999). It is also worth noting that, although evidence for self-enhancement is far weaker among East Asians than Westerners, epidemiological studies find depression rates in East Asia to be approximately one-fifth that of North America (Kessler et al. 1994; Weissman et al. 1996) – a pattern that is difficult to explain if self-enhancement promotes well-being equally across populations.
In sum, Gaertner et al. claim that East Asians self-enhance similarly to North Americans, and that this reveals the universality of self-enhancement motivations. They are only able to make this claim by ignoring the vast majority of the relevant data. The assumption of underlying universal psychological processes is nothing more than that – an assumption, which needs to be evaluated against alternative hypotheses with empirical evidence from diverse human populations.
R6. Misleading categories and contrasts?
While agreeing with our major point, Astuti & Bloch suggest that our series of telescoping contrasts, and especially our contrasts between small-scale and industrialized societies and between Western and non-Western societies, distorts or exaggerates the unusual nature of WEIRD people. They charge us with the “uncritical lumping together of a variety of disparate societies” and with using “under-theorized labels” (see Footnote 4).
We agree with their concerns about uncritically lumping societies, which is why we sought to carefully avoid such pitfalls. To begin, in our introduction we wrote, “We emphasize that our presentation of telescoping contrasts is only a rhetorical approach guided by the nature of the available data. It should not be taken as capturing any unidimensional continuum, or suggesting any single theoretical explanation for the variation” (target article, sect. 1, para. 7). The first sentence in this quotation was meant to explain how our particular choices of telescoping contrasts were strictly driven by the lumpy and sparse distribution of the available data, not by any theorizing about the nature of the variation. The second sentence, regarding the unidimensional continuum, was meant to explicitly avoid any suggestion of what Astuti & Bloch term a “unilineal path.”
Next, because of our concern about lumping disparate societies, we displayed the data whenever possible. Figure 2 in our target article displays the 14 small-scale societies for the Müller-Lyer illusion, along with two industrialized populations. We also included all the samples of children on that figure. This allows the reader to draw his or her own conclusions, and highlights the degree of variation among small-scale societies. Figures 3A, 3B, and 3C display the means for each of 15 small-scale societies on three different behavioral measures related to fairness. Again, this kind of graphical display was specifically intended to lay all the cards on the table, and avoid concealing population-level variation in broad categories. In the text we wrote,
For Dictator Game offers, Figure 3A shows that the U.S. sample has the highest mean offer, followed by the Sanquianga from Colombia, who are renowned for their prosociality (Kraul 2008). The U.S. offers are nearly double that of the Hadza, foragers from Tanzania, and the Tsimane, forager-horticulturalists from the Bolivian Amazon. … [F]or Ultimatum Game offers, the United States has the second highest mean offer, behind the Sursurunga from Papua New Guinea. (target article, sect. 3.2, para. 5)
How is this lumping the small-scale societies together? Figure 4 from our target article displays the available data from nine non-Western and seven Western societies for both the punishment of free-riding and for antisocial punishment. The reader can see the interesting variation within both Western and non-Western societies, and can even apply his or her own categorization schemes. In Figure 5, we show the percentage difference between analytic and holistic judgments for six different samples, which we compiled ourselves from different sources in order to illustrate the variation as accurately as possible.
Astuti & Bloch level forceful charges against our efforts, while agreeing with our main point. Given this, we believe it would have been more constructive had they explained how they would have presented the data and made the case more effectively.
R7. Explaining the variation: Why are WEIRD samples so unusual?
For the purposes of our target article we remained largely agnostic regarding explanations for the peculiar nature of WEIRD psychology, although we did point out that this should not be unexpected, given the rather odd ways of life of most WEIRD people, especially compared to those of small-scale societies and the environments of ancestral humans. Contra Danks & Rose, we did not mean to suggest that this variation is fundamentally inexplicable, or that we should not try to explain it. In fact, all three of us (Henrich, Heine, and Norenzayan) have been engaged in trying to explain various elements of it for much of our careers.
Several commentaries, when viewed together, suggest two proximate explanations for why WEIRD subjects are often unusual. First, following Majid & Levinson, the English language apparently occupies an obscure corner of the design space of possible languages, potentially giving theorists misleading points of departure or unusual folk intuitions. Majid & Levinson worry that this “English-bias” may be impacting theorizing in the cognitive sciences, while Machery and Stich show that it has impacted philosophical inquiry. By citations, the top four sources of research in psychology are all from English-speaking countries (see May 1997): (1) the United States, (2) the United Kingdom, (3) Canada, and (4) Australia (for comparison, the top sources of research in physics are [1] the United States, [2] Germany, [3] Japan, and [4] France).
The second explanation combines arguments and evidence offered by Lancy, Fernald, and Karasik et al., suggesting at least the proximate end of a theory that may illuminate a wide range of cognitive differences between WEIRD populations and others. Lancy lays the groundwork by highlighting the relative strangeness, in a broad global and historical context, of modern middle- and upper-class American beliefs, values, cultural models, and practices vis-à-vis childrearing. Fernald and Karasik et al. review evidence that is beginning to document how these practices impact cognitive, linguistic, and motor development, including long-term cognitive outcomes.
At a more ultimate level, we speculate that in the context of mobile, meritocratic societies like those of the United States, Western Europe, and Australia, cultural evolutionary processes rooted in our evolved tendencies to imitate successful and prestigious individuals (Henrich & Gil-White 2001) will favor the spread of child-rearing traits that speed up and enhance the development of those particular cognitive and social skills that eventually translate into social and economic success in these populations. This kind of cultural evolutionary process may be part of what is driving the dramatic increases in IQ observed in many industrialized nations over the last century (Flynn 2007), along with increases in biases toward analytical reasoning and individualism. It would also explain the obsession with active instruction of all kinds shown by middle- and upper-class Americans (Lancy 2008).
In our target article we wanted to avoid making any theoretical claims regarding the origins of the psychological differences we highlighted. We suggested that WEIRD psychology probably arises from a myriad of different proximate causal sources that, at best, have been aggregated in WEIRD populations. The two hypotheses mentioned above illustrate this. It is likely a coincidence that the first population to seriously engage in the systematic study of psychology and decision-making happened to speak English. However, it is probably not a coincidence that the economic system of this population happened to favor certain child-rearing practices and ways of reasoning. Overall, we suspect that many of the phenomena for which WEIRD samples occupy extreme positions do so for quite distinct causal reasons, including some researcher-created biases. Once the behavioral sciences accept the existence – or potential existence – of broad-ranging variation among populations, we can commence with the more interesting endeavor of explaining that variation at both proximate and ultimate levels of inquiry.
R8. Closing words
We have a vision for the future of scientific efforts to understand the foundations of human psychology and behavior. Research programs need to increasingly emphasize large-scale, highly interdisciplinary, fully international research networks. These networks should maintain long-term, ongoing research projects among diverse populations, collecting data over the full life cycle with an integrated set of methodological tools, including wide-ranging experimental techniques, quantitative and qualitative ethnography, surveys, brain imaging, and biomarkers. Questions and methods are best devised and designed at collaborative meetings of these international research networks.