I Saw You in the Crowd: Credibility, Reproducibility, and Meta-Utility

Crowdsourcing enables novel forms of research and knowledge production. It uses cyberspace to collect diverse research participants, coordinate projects, and keep costs low. Recently, social scientists began crowdsourcing their peers to engage in mass research targeting a specific topic. This enables meta-analysis of many analysts’ results obtained from a single crowdsourced research project, leading to exponential gains in credibility and scientific utility. Initial applications demonstrate positive returns for both original and replication research using various research instruments and secondary or experimental data. It can provide more reliable Bayesian priors for selecting models and is an untapped mode of theory production that could greatly benefit social science. Finally, in addition to the credibility and reproducibility gains, crowdsourcing embodies many core values of the Open Science Movement because it promotes community and equality among scientists.

Crowdsourcing is a new word. Google Books' first recorded usage of "crowdsourcing" was in 1999. 1 It originally referenced Internet users adding content to websites, the largest example being Wikipedia (Zhao and Zhu 2014). As a scientific practice, its roots go back to open-engineering competitions in the 1700s, when the collective ideas of many overcame otherwise insurmountable problems of the few. Openly tapping into a large population breaks down barriers of time, money, and data. The solution is simple. A centralized investigator or team poses a question or problem to the crowd. Sometimes tasks are co-creative, such as the programming of packages by statistical software users and the development of Wiki content. Other times, they are discrete, such as asking individuals to donate computing power, 2 respond to a survey, or perform specific tasks. Using crowdsourced methods, scientists, citizens, entrepreneurs, and even governments more effectively address societies' most pressing problems (e.g., cancer and global warming) (Howe 2008).
Although the public comprises the typical crowd, new landmark studies involved researchers crowdsourcing other researchers. The human genome project, for example, involved major byline contributions from almost 3,000 researchers. In the social sciences, crowdsourcing of researchers is brand new. In a 2013 study, Silberzahn et al. (2018) convened researchers from across disciplines and geographic locations to analyze the same dataset to discover whether football referees issued more red cards to darker-skinned players. 3 Klein et al. (2014) sourced laboratories across the world to try and reproduce several high-profile experimental psychological studies. In the Crowdsourced Replication Initiative, Breznau, Rinke, and Wuttke et al. (2019) demonstrated that crowdsourcing also is useful for replication, structured online deliberation, and macro-comparative research. Studies like these are a new paradigm within social, cognitive, and behavioral research.
In the knowledge business, crowdsourcing is highly cost effective. The average researcher lacks the means to survey or conduct experiments with samples outside of universities. Crowdsourcing platforms such as Amazon's Mechanical Turk (mTurk) changed this by creating access to even-more-convenient convenience samples from all over the world (Palmer and Strickland 2016). Others use these platforms to crowdsource globally competitive labor to perform paid tasks of scientific value (Berinsky, Huber, and Lenz 2012). Blockchain technology uses decentralized crowd computing. Many actions that crowds already take (e.g., "Tweeting" and "geotagging" images) offer free possibilities to test hypotheses of social and political relevance (Nagler and Tucker 2015; Salesses, Schechtner, and Hidalgo 2013). Wikipedia had fantastic success, given the strong willingness of publics around the world to add and monitor content. The Wiki model is the basis for all types of crowd creations. 4 There are the Wikiversity 5 crowdsourcing-teaching resources and the Replication-Wiki 6 for listing and identifying replications across the social sciences (Höffler 2017). In a similar vein, there are efforts to crowdsource open peer review and ideas for what researchers should study from potential stakeholders outside of science.
Some scholars might question why many researchers are necessary when we have so many preexisting datasets and a single researcher can run every possible statistical model configuration. Crowdsourcing researchers' time and effort may seem inefficient when machine learning is so advanced. However, if a single scholar or machine throws all possible variables into their models like a "kitchen sink," it is only efficient at maximizing prediction. There is a subtle but crucial difference between human research, where if X predicts Y, it might be a meaningful association, versus the machine approach, where if X predicts Y, it is meaningful (Leamer 1978). When humans or machines start running every possible model, which is the typical machine-learning strategy, they implicitly test every possible hypothesis and potentially every possible theory. Among all possible models, there will be some in which the data-generating process is theoretically impossible, such as predicting biological sex as an outcome of party affiliation, or right-party votes in 1970 from GDP in 2010. If we want something more than predictive power, human supervision and causal inference are necessary to rule out impossible realities that a computer cannot identify on its own (Pearl 2018).
Crowdsourcing is epistemologically different than machine learning. It is not a "kitchen-sink" method. The landmark Silberzahn et al. (2018) study demonstrated that seemingly innocuous decisions in the data-analysis phase of research can change results dramatically. The effects of idiosyncratic features of researchers (e.g., prior information and beliefs) are something Bayesian researchers raised warning signals about long ago (Jeffreys 1998 [1939]). This is echoed in more recent discussions about the unreliability and irreproducibility of research (Dewald, Thursby, and Anderson 1986; Gelman and Loken 2014; Wicherts et al. 2016). However, grappling with this uncertainty is problematic. Identification of informative research priors usually involves consulting an expert, such as the researcher doing the work, or using simulations wherein the user selects parameters that may (or may not) reflect the reality of the researcher's decisions. The primary objection is that the distribution of prior beliefs across researchers is unknown (Gelman 2008). The Silberzahn et al. (2018) and Breznau, Rinke, and Wuttke (2018; 2019) studies provide a real observation of these priors because all researchers develop what should be at least a plausible model for testing the hypothesis, given the data. This observation extends beyond prior studies because the crowdsourced researchers scrutinized the other researchers' models before seeing the results. This practice is extremely useful because it is not one expert guessing about the relative value of a model but rather potentially 100 or more. The reduction of model uncertainty is good from a Bayesian perspective but it also demands theory from a causal perspective. Statisticians are well aware that without the correct model specification, estimates of uncertainty are themselves uncertain, if not useless.
Thus, rather than having 8.8 million false positives after running 9 billion different regression models (Muñoz and Young 2018), humans can identify correct (or at least "better") model specifications using causal theory and logic (Clark and Golder 2015; Pearl 2018). The diversity of results in the Silberzahn et al. (2018) and Breznau, Rinke, and Wuttke et al. (2018; 2019) studies helped to identify key variables and modeling strategies that had the "power" to change the conclusions of any given research team. These were not simply variables among millions of models but rather variables among carefully constructed plausible models. This shifts the discussion from the results and even the Bayesian priors to data-generating theories.
Machines cannot apply discriminating logic, or can they? The US military's Defense Advanced Research Projects Agency (DARPA) deems the human-versus-machine question so important that it currently funds the Systematizing Confidence in Open Research and Evidence (SCORE) project pitting machines against research teams in reviewing the credibility of research across disciplines. SCORE architects believe that machines might be capable of determining the credibility of social and behavioral research better, faster, and more cost effectively than humans. In this case, crowdsourcing provides DARPA the exact method it needs to answer this question. This project is the largest crowdsourcing of social researchers to date, with already more than 500 participating. 7 It is a major demonstration that crowdsourcing can bring together the interests of academics and policy makers. It remains to be seen how valuable the machines are with minimal human interference.
At the heart of the SCORE project are questions of credibility and reliability. These topics also are at the center of scandals and conflicts that are infecting science at the moment and rapidly increasing researchers' and funding agencies' interest in replication (Eubank 2016; Ishiyama 2014; Laitin and Reich 2017; Stockemer, Koehler, and Lentz 2018). However, there are limits to what replications can offer, and they take on diverse formats and goals (Clemens 2015; Freese and Peterson 2017). Currently, the decision of whether, what, and how to replicate is entirely a researcher's prerogative. Few engage in any replications, meaning that any given original study is fortunate to have even one replication. 8 Journals thus far have been hesitant to publish replications, especially of their own publications, even if the replications overturn preposterous-sounding claims (e.g., precognition) (Ritchie, Wiseman, and French 2012) or identify major mistakes in the methods (Breznau 2015; Gelman 2013). Crowdsourcing could change this because it provides the power of meta-replication in one study. Moreover, crowdsourcing might appear to journal editors as cutting-edge research rather than "just another" replication (Silberzahn and Uhlmann 2015).
Perhaps more important, crowdsourced replications provide reliability at a meta-level. Replications, experimental reproductions, and original research alike suffer from what Leamer (1978) referred to as "metastatistics" problems. If researchers exercise their degrees of freedom such that ostensibly identical research projects have different results (for example, more than 5% of the time), then any one study is not reliable by most standards. Breznau, Rinke, and Wuttke (2018) simulated this problem and deduced that four to seven independent replications are necessary to obtain a majority of direct replications, that is, a reproduction of the same results, arriving at similar effect sizes within 0.01 of an original study in a 95% confidence interval. Subsequently, the results from their crowdsourced project showed that even after correcting for major mistakes in code, 11% of the effects were substantively different from the original (i.e., a change in significance or direction) (Breznau, Rinke, and Wuttke 2019). Søndergaard (1994) reviewed replications of the Values Survey Model employed by Hofstede (one of the most cited and replicated studies in social science) and found that 19 of 28 replications (only 68%) came to approximately the same relative results. Lack of reproducibility is not restricted to quantitative research. For example, Forscher et al. (2019) investigated thousands of National Institutes of Health grant reviews and found that using the typical three to five reviewers approach achieved only a 0.2 inter-rater reliability. Increasing this to an extreme of 12 reviewers per application achieved only 0.5 reliability. Both scores are far below any acceptable standard for researchers subjectively evaluating the same text data.
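To make the arithmetic behind "a majority of replications" concrete, the following is a minimal sketch and not the simulation reported by Breznau, Rinke, and Wuttke (2018). It assumes, purely for illustration, that each replication independently reproduces the original result with the same probability p, and it computes the binomial probability that more than half of n replications succeed. The function name prob_majority and the value p = 0.68 (borrowed from the 19-of-28 share above only as an example) are assumptions of this sketch.

```python
from math import comb


def prob_majority(n: int, p: float) -> float:
    """Probability that strictly more than half of n independent
    replications reproduce the original result, when each one
    succeeds with probability p (an illustrative assumption)."""
    needed = n // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-replication success rate, for illustration only.
    p = 0.68
    for n in (1, 3, 5, 7):
        print(f"n={n}: P(majority reproduce) = {prob_majority(n, p):.3f}")
```

Under these simplifying assumptions, a single replication is only as informative as p itself, whereas the chance that a majority of several replications agree rises with n whenever p exceeds 0.5.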
There is much to learn about why researchers arrive at different results. It appears that intentional decisions are only one factor (Yong 2012) and that context, versioning, and institutional constraints also play a role (Breznau 2016; Breznau, Rinke, and Wuttke 2019). Crowdsourcing addresses this meta-uncertainty because principal investigators (PIs) can hold certain research factors constant, including the method, data, and exact framing of the hypothesis, thereby exponentially increasing the power both to discover why researchers' results differ and to meta-analyze a selected topic. Combined with the rapidly expanding area of specification-curve analysis, crowdsourcing provides a new way to increase credibility for political and social research, both in sample populations and among the researchers themselves (Rohrer, Egloff, and Schmukle 2017; Simonsohn, Simmons, and Nelson 2015). It is hoped that these developments are tangible outcomes that improve public, private, and government views of social science.
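The idea of holding the data and hypothesis constant while analysts vary the model can be illustrated with a toy specification curve. This is a hedged sketch on simulated data, not the procedure of any study cited here: the data-generating process, the true effect of 0.5, the confounder z, and the two analyst choices (whether to control for z, whether to trim extreme values of x) are all assumptions made for illustration, as are the function names.

```python
import random
from itertools import product


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def slope_x(x, y, z=None):
    """OLS coefficient on x, optionally controlling for z."""
    if z is None:
        return cov(x, y) / cov(x, x)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return (cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)) / den


def simulate(n=2000, beta=0.5, seed=1):
    """Hypothetical data: z confounds the effect (beta) of x on y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
    y = [beta * xi + 1.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, y, z


def specification_curve(x, y, z):
    """Estimate the x coefficient under every combination of two
    analyst choices and return the results sorted by estimate."""
    ordered = sorted(x)
    lo, hi = ordered[int(0.05 * len(x))], ordered[int(0.95 * len(x))]
    results = []
    for control_z, trim in product((False, True), repeat=2):
        rows = [r for r in zip(x, y, z) if not trim or lo <= r[0] <= hi]
        xs, ys, zs = map(list, zip(*rows))
        est = slope_x(xs, ys, zs if control_z else None)
        results.append(((control_z, trim), est))
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    for (control_z, trim), est in specification_curve(*simulate()):
        print(f"control_z={control_z!s:5} trim={trim!s:5} estimate={est:.3f}")
```

In this toy setup the estimates split into two clusters depending on whether the confounder is controlled, which is the kind of pattern a crowdsourced design lets PIs trace back to specific modeling decisions rather than to differences in data or hypothesis.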
As social scientists, we naturally should be skeptical of the "DARPA hypothesis" that computers could be more reliable than humans for evidence-based policy making (Grimmer 2015). In fact, a crowdsourced collaboration of 107 teams recently demonstrated that humans were essentially as good as machines in predicting life outcomes of children (Salganik et al. 2020). In this study, however, both humans and machines were relatively poor at prediction overall. These problems may stem from a lack of theory. For example, in a different study of brain imaging involving 70 crowdsourced teams, not a single pair agreed about the data-generating model (Botvinik-Nezer et al. 2020). Breznau et al. (2019) suggested that a lack of theory to describe the relationship of immigration and social policy preferences means that research in this area also is unreliable and, as these new crowdsourced studies demonstrate, meta-analyzing the results does not solve this dearth of theory. However, crowdsourcing has an untapped resource in addition to meta-analysis of results for resolving these issues: metaconstruction of theory (Breznau 2020). In the Breznau et al. (2019) study, careful consideration of the data-generating model, online deliberation, and voting on others' models revealed where theoretical weaknesses and lack of consensus exist. When coupling observed data-generating model variations across teams with their deliberations and disagreements on the one hand, with how these may or may not shape the results on the other, immigration and social policy scholars gain insights to where they should focus their theoretical efforts in the future to obtain the most significant gains in knowledge.
Additionally, as a structured crowdsourced research endeavor using the technology of the online deliberation platform Kialo, the Breznau et al. (2019) study undermined the current research system that favors novelty of individual researchers. It instead promoted consensus building and direct responsiveness to theoretical claims. When else do hundreds of political researchers collaborate to focus their theoretical discussions in one area? Crowdsourcing, with the help of technologies such as Kialo, could resolve the perpetual problem of scholars, areas, and disciplines talking "past" one another. It also would provide a leap forward technologically: we currently observe "collective" theory construction in a primitive format among conference panels and journal symposia, limited to a few invited participants in a specific event in space and time. This is a narrow collaborative process in the face of structured crowdsourced deliberations in which potentially thousands of global participants can engage at their convenience over many months. Kialo also provides extensive data for analyzing deliberative crowd processes and outcomes, thereby streamlining the process. 9
The benefits of crowdsourcing are not only on the scientific output or theoretical side. The Silberzahn et al. (2018) and Breznau et al. (2018; 2019) studies also included structured interaction among researchers, which fostered community engagement and the exchange of ideas and methodological knowledge. The Breznau et al. (2019) study specifically asked participants about their experiences and found that learning and enjoyment were common among them. For example, 69% reported that participation was enjoyable, 22% were neutral, and only 7% found it not enjoyable. Of 188 participants, the retention rate was 87% from start to finish, which demonstrates motivation across the spectrum of PhD students, postdocs, professors, and nonacademic professionals.
Crowdsourcing potentially requires a significant time investment on behalf of the participating researchers, but the investment pays off in academic and personal development. Moreover, the practice of crowdsourcing researchers embodies many principles amenable to the Open Science Movement, such as open participation, transparency, reproducibility, and better practices. Crowdsourcing may not be a high-density method but it seems sustainable as an occasional large-scale project and as something good for science.
A key principle of open science and crowdsourcing also is related to incentive structure. The status attainment of scholars is theoretically infinite, limited only by the number of people willing to cite a scholar's work, which leads to egomaniacal and destructive behaviors (Sørensen 1996). This is the status quo of academia. In a crowdsourced endeavor, the crowdsourced researcher participants are equal. There is no chance for cartelism, veto, or exclusionary practices as long as the PIs are careful moderators. Thus, in their ideal form, crowdsourced projects should give authorship to all participants who complete the tasks assigned to them. The PIs of a crowdsourced endeavor naturally have more to gain than the participants because the research is their "brainchild" and they will be known as the "PIs." Nonetheless, they also have more to lose in case such a massive implementation of researcher human capital were to fail; therefore, the participants have good reason to trust the process. The net gains should be positive for everyone involved in a well-executed crowdsourced project. The advantage of this vis-à-vis the Open Science Movement is that the current "publish-or-perish" model becomes something positive, wherein all participants benefit by getting published, and any self-citation thereafter is a community rather than an individualistic good.
