Beyond playing 20 questions with nature: Integrative experiment design in the social and behavioral sciences

Published online by Cambridge University Press:  21 December 2022

Abdullah Almaatouq*
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA
Thomas L. Griffiths
Departments of Psychology and Computer Science, Princeton University, Princeton, NJ, USA
Jordan W. Suchow
School of Business, Stevens Institute of Technology, Hoboken, NJ, USA
Mark E. Whiting
School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
James Evans
Department of Sociology, University of Chicago, Chicago, IL, USA; Santa Fe Institute, Santa Fe, NM, USA
Duncan J. Watts
Department of Computer and Information Science, Annenberg School of Communication, and Operations, Information, and Decisions Department, University of Pennsylvania, Philadelphia, PA, USA
Corresponding author: Abdullah Almaatouq


The dominant paradigm of experiments in the social and behavioral sciences views an experiment as a test of a theory, where the theory is assumed to generalize beyond the experiment's specific conditions. According to this view, which Allen Newell once characterized as “playing twenty questions with nature,” theory is advanced one experiment at a time, and the integration of disparate findings is assumed to happen via the scientific publishing process. In this article, we argue that the process of integration is at best inefficient, and at worst it does not, in fact, occur. We further show that the challenge of integration cannot be adequately addressed by recently proposed reforms that focus on the reliability and replicability of individual findings, nor simply by conducting more or larger experiments. Rather, the problem arises from the imprecise nature of social and behavioral theories and, consequently, a lack of commensurability across experiments conducted under different conditions. Therefore, researchers must fundamentally rethink how they design experiments and how the experiments relate to theory. We specifically describe an alternative framework, integrative experiment design, which intrinsically promotes commensurability and continuous integration of knowledge. In this paradigm, researchers explicitly map the design space of possible experiments associated with a given research question, embracing many potentially relevant theories rather than focusing on just one. Researchers then iteratively generate theories and test them with experiments explicitly sampled from the design space, allowing results to be integrated across experiments. Given recent methodological and technological developments, we conclude that this approach is feasible and would generate more-reliable, more-cumulative empirical and theoretical knowledge than the current paradigm – and with far greater efficiency.

Target Article
Copyright © The Author(s), 2022. Published by Cambridge University Press

Aad, G., Abajyan, T., Abbott, B., Abdallah, J., Abdel Khalek, S., Abdelalim, A. A., … Zwalinski, L. (2012). Observation of a new particle in the search for the Standard Model Higgs Boson with the ATLAS detector at the LHC. Physics Letters, Part B, 716(1), 1–29.
Abbott, B. P., Abbott, R., Abbott, T. D., Abernathy, M. R., Acernese, F., Ackley, K., … LIGO Scientific Collaboration and Virgo Collaboration. (2016). Observation of gravitational waves from a binary black hole merger. Physical Review Letters, 116(6), 061102.
Aggarwal, I., & Woolley, A. W. (2018). Team creativity, cognition, and cognitive style diversity. Management Science, 65(4), 1586–1599.
Agrawal, M., Peterson, J. C., & Griffiths, T. L. (2020). Scaling up psychology via scientific regret minimization. Proceedings of the National Academy of Sciences of the United States of America, 117(16), 8825–8835.
Allen, L., Scott, J., Brand, A., Hlava, M., & Altman, M. (2014). Publishing: Credit where credit is due. Nature, 508(7496), 312–313.
Allen, N. J., & Hecht, T. D. (2004). The “romance of teams”: Toward an understanding of its psychological underpinnings and implications. Journal of Occupational and Organizational Psychology, 77(4), 439–461.
Allport, F. H. (1924). The group fallacy in relation to social science. The American Journal of Sociology, 29(6), 688–706.
Almaatouq, A. (2019). Towards stable principles of collective intelligence under an environment-dependent framework. Massachusetts Institute of Technology.
Almaatouq, A., Alsobay, M., Yin, M., & Watts, D. J. (2021a). Task complexity moderates group synergy. Proceedings of the National Academy of Sciences of the United States of America, 118(36), e2101062118.
Almaatouq, A., Becker, J., Houghton, J. P., Paton, N., Watts, D. J., & Whiting, M. E. (2021b). Empirica: A virtual lab for high-throughput macro-level experiments. Behavior Research Methods, 53, 2158–2171.
Almaatouq, A., Noriega-Campero, A., Alotaibi, A., Krafft, P. M., Moussaid, M., & Pentland, A. (2020). Adaptive social networks promote the wisdom of crowds. Proceedings of the National Academy of Sciences of the United States of America, 117(21), 11379–11386.
Almaatouq, A., Rahimian, M. A., Burton, J. W., & Alhajri, A. (2022). The distribution of initial estimates moderates the effect of social influence on the wisdom of the crowd. Scientific Reports, 12(1), 16546.
Many Primates, Altschul, D. M., Beran, M. J., Bohn, M., Call, J., DeTroy, S., Duguid, S. J., … Watzek, J. (2019). Establishing an infrastructure for collaboration in primate cognition research. PLoS ONE, 14(10), e0223675.
Arrow, H., McGrath, J. E., & Berdahl, J. L. (2000). Small groups as complex systems: Formation, coordination, development, and adaptation. Sage.
Atkinson, A. C., & Donev, A. N. (1992). Optimum experimental designs (Oxford statistical science series, 8) (1st ed.). Clarendon Press.
Aumann, R. J., & Hart, S. (1992). Handbook of game theory with economic applications. Elsevier.
Auspurg, K., & Hinz, T. (2014). Factorial survey experiments. Sage.
Awad, E., Dsouza, S., Bonnefon, J.-F., Shariff, A., & Rahwan, I. (2020). Crowdsourcing moral machines. Communications of the ACM, 63(3), 48–55.
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., … Rahwan, I. (2018). The Moral Machine experiment. Nature, 563(7729), 59–64.
Bakshy, E., Dworkin, L., Karrer, B., Kashin, K., Letham, B., Murthy, A., & Singh, S. (2018). AE: A domain-agnostic platform for adaptive experimentation. Workshop on System for ML.
Balandat, M., Karrer, B., Jiang, D. R., Daulton, S., Letham, B., Wilson, A. G., & Bakshy, E. (2020). BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS'20) (pp. 21524–21538). Curran Associates Inc.
Balietti, S. (2017). NodeGame: Real-time, synchronous, online experiments in the browser. Behavior Research Methods, 49(5), 1696–1715.
Balietti, S., Klein, B., & Riedl, C. (2021). Optimal design of experiments to identify latent behavioral types. Experimental Economics, 24, 772–799.
Baribault, B., Donkin, C., Little, D. R., Trueblood, J. S., Oravecz, Z., van Ravenzwaaij, D., … Vandekerckhove, J. (2018). Metastudies for robust tests of theory. Proceedings of the National Academy of Sciences of the United States of America, 115(11), 2607–2612.
Barron, B. (2003). When smart groups fail. Journal of the Learning Sciences, 12(3), 307–359.
Becker, J., Brackbill, D., & Centola, D. (2017). Network dynamics of social influence in the wisdom of crowds. Proceedings of the National Academy of Sciences of the United States of America, 114(26), E5070–E5076.
Bell, S. T. (2007). Deep-level composition variables as predictors of team performance: A meta-analysis. The Journal of Applied Psychology, 92(3), 595–615.
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., … Camerer, C. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6–10.
Berkman, E. T., & Wilson, S. M. (2021). So useful as a good theory? The practicality crisis in (social) psychological theory. Perspectives on Psychological Science, 16(4), 864–874.
Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. The American Economic Review, 94(4), 991–1013.
Bourgin, D. D., Peterson, J. C., Reichman, D., Russell, S. J., & Griffiths, T. L. (2019). Cognitive model priors for predicting human decisions. In Chaudhuri, K. & Salakhutdinov, R. (Eds.), Proceedings of the 36th international conference on machine learning (Vol. 97, pp. 5133–5141). PMLR.
Bowen, D. (n.d.). Hemlock. Retrieved April 22, 2022.
Brewin, C. R. (2022). Impact on the legal system of the generalizability crisis in psychology. The Behavioral and Brain Sciences, 45, e7.
Breznau, N., Rinke, E. M., Wuttke, A., Nguyen, H. H. V., Adem, M., Adriaans, J., … Żółtak, T. (2022). Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty. Proceedings of the National Academy of Sciences of the United States of America, 119(44), e2203150119.
Brunswik, E. (1947). Systematic and representative design of psychological experiments. In Proceedings of the Berkeley symposium on mathematical statistics and probability (pp. 143–202). University of California Press.
Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62(3), 193–217.
Burger, B., Maffettone, P. M., Gusev, V. V., Aitchison, C. M., Bai, Y., Wang, X., … Cooper, A. I. (2020). A mobile robotic chemist. Nature, 583(7815), 237–241.
Byers-Heinlein, K., Bergmann, C., Davies, C., Frank, M. C., Kiley Hamlin, J., Kline, M., … Soderstrom, M. (2020). Building a collaborative psychological science: Lessons learned from ManyBabies 1. Canadian Psychology/Psychologie Canadienne, 61(4), 349–363.
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644.
Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2(2), 115–144.
Cesario, J. (2014). Priming, replication, and the hardest science. Perspectives on Psychological Science, 9(1), 40–48.
Cesario, J. (2022). What can experimental studies of bias tell us about real-world group disparities? Behavioral and Brain Sciences, 45, E66.
Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112–130.
Cohen, J. (1994). The earth is round (p<.05). The American Psychologist, 49(12), 997.
Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.) (2019). The handbook of research synthesis and meta-analysis. Russell Sage Foundation.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
Debrouwere, S., & Rosseel, Y. (2022). The conceptual, cunning and conclusive experiment in psychology. Perspectives on Psychological Science, 17(3), 852–862.
DeKay, M. L., Rubinchik, N., Li, Z., & De Boeck, P. (2022). Accelerating psychological science with metastudies: A demonstration using the risky-choice framing effect. Perspectives on Psychological Science, 17(6), 1704–1736.
de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a web browser. Behavior Research Methods, 47(1), 1–12.
de Leeuw, J. R., Motz, B. A., Fyfe, E. R., Carvalho, P. F., & Goldstone, R. L. (2022). Generalizability, transferability, and the practice-to-practice gap [Review of Generalizability, transferability, and the practice-to-practice gap]. The Behavioral and Brain Sciences, 45, e11.
Devine, D. J., Clayton, L. D., Dunford, B. B., Seying, R., & Pryce, J. (2001). Jury decision making: 45 years of empirical research on deliberating groups. Psychology, Public Policy, and Law, 7(3), 622–727.
Devine, D. J., & Philips, J. L. (2001). Do smarter teams do better: A meta-analysis of cognitive ability and team performance. Small Group Research, 32(5), 507–532.
Dienes, Z. (2008). Understanding psychology as a science: An introduction to scientific and statistical inference. Macmillan.
Dubova, M., Moskvichev, A., & Zollman, K. (2022). Against theory-motivated experimentation in science. MetaArXiv. June 24.
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., … Nosek, B. A. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82.
Ellemers, N., & Rink, F. (2016). Diversity in work groups. Current Opinion in Psychology, 11, 49–53.
Engel, D., Woolley, A. W., Jing, L. X., Chabris, C. F., & Malone, T. W. (2014). Reading the mind in the eyes or reading between the lines? Theory of mind predicts collective intelligence equally well online and face-to-face. PLoS ONE, 9(12), e115212.
Erev, I., Ert, E., Plonsky, O., Cohen, D., & Cohen, O. (2017). From anomalies to forecasts: Toward a descriptive model of decisions under risk, under ambiguity, and from experience. Psychological Review, 124(4), 369–409.
Eyke, N. S., Green, W. H., & Jensen, K. F. (2020). Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening. Reaction Chemistry & Engineering, 5(10), 1963–1972.
Eyke, N. S., Koscher, B. A., & Jensen, K. F. (2021). Toward machine learning-enhanced high-throughput experimentation. Trends in Chemistry, 3(2), 120–132.
Fehr, E., & Gachter, S. (2000). Cooperation and punishment in public goods experiments. The American Economic Review, 90(4), 980–994.
Freese, J., & Peterson, D. (2017). Replication in social science. Annual Review of Sociology, 43, 147–165.
Fyfe, E. R., de Leeuw, J. R., Carvalho, P. F., Goldstone, R. L., Sherman, J., Admiraal, D., … Motz, B. A. (2021). ManyClasses 1: Assessing the generalizable effect of immediate feedback versus delayed feedback across many college classes. Advances in Methods and Practices in Psychological Science, 4(3), 25152459211027575.
Gale, D., & Shapley, L. S. (1962). College admissions and the stability of marriage. The American Mathematical Monthly, 69(1), 9–15.
Gelman, A. (2018). Don't characterize replications as successes or failures [Review of Don't characterize replications as successes or failures]. The Behavioral and Brain Sciences, 41, e128.
Gelman, A., & Carlin, J. (2017). Some natural solutions to the p-value communication problem – and why they won't work. Journal of the American Statistical Association, 112(519), 899–901.
Gelman, A., & Loken, E. (2014). The statistical crisis in science: Data-dependent analysis – a “garden of forking paths” – explains why many statistically significant comparisons don't hold up. American Scientist, 102(6), 460.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1–58.
Gongora, A. E., Xu, B., Perry, W., Okoye, C., Riley, P., Reyes, K. G., … Brown, K. A. (2020). A Bayesian experimental autonomous researcher for mechanical design. Science Advances, 6(15), eaaz1708.
Goodman, J. K., Cryder, C. E., & Cheema, A. (2013). Data collection in a flat world: The strengths and weaknesses of Mechanical Turk samples: Data collection in a flat world. Journal of Behavioral Decision Making, 26(3), 213–224.
Greenhill, S., Rana, S., Gupta, S., Vellanki, P., & Venkatesh, S. (2020). Bayesian optimization for adaptive experimental design: A review. IEEE Access, 8, 13937–13948.
Griffiths, T. L. (2015). Manifesto for a new (computational) cognitive revolution. Cognition, 135, 21–23.
Grubbs, J. B. (2022). The cost of crisis in clinical psychological science [Review of The cost of crisis in clinical psychological science]. The Behavioral and Brain Sciences, 45, e18.
Hackman, J. R. (1968). Effects of task characteristics on group products. Journal of Experimental Social Psychology, 4(2), 162–187.
Harkins, S. G. (1987). Social loafing and social facilitation. Journal of Experimental Social Psychology, 23(1), 1–18.
Hartshorne, J. K., de Leeuw, J. R., Goodman, N. D., Jennings, M., & O'Donnell, T. J. (2019). A thousand studies for the price of one: Accelerating psychological science with Pushkin. Behavior Research Methods, 51(4), 1782–1803.
Henrich, J., Heine, S., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61–83.
Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. BMJ, 327(7414), 557–560.
Hill, G. W. (1982). Group versus individual performance: Are N + 1 heads better than one? Psychological Bulletin, 91(3), 517–539.
Hofman, J. M., Sharma, A., & Watts, D. J. (2017). Prediction and explanation in social systems. Science (New York, N.Y.), 355(6324), 486–488.
Hofman, J. M., Watts, D. J., Athey, S., Garip, F., Griffiths, T. L., Kleinberg, J., … Yarkoni, T. (2021). Integrating explanation and prediction in computational social science. Nature, 595(7866), 181–188.
Hofstede, G. (2016). Culture's consequences: Comparing values, behaviors, institutions, and organizations across nations (2nd ed.). Collegiate Aviation Review, 34(2), 108–109.
Hong, L., & Page, S. E. (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences of the United States of America, 101(46), 16385–16389.
Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14(3), 399–425.
Husband, R. W. (1940). Cooperative versus solitary problem solution. The Journal of Social Psychology, 11(2), 405–409.
Inglehart, R., & Welzel, C. (2005). Modernization, cultural change, and democracy: The human development sequence. Cambridge University Press.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Janis, I. L. (1972). Victims of groupthink: A psychological study of foreign-policy decisions and fiascoes (p. 277). Houghton Mifflin Company.
Jones, B. C., DeBruine, L. M., Flake, J. K., Liuzza, M. T., Antfolk, J., Arinze, N. C., … Coles, N. A. (2021). To which world regions does the valence-dominance model of social perception apply? Nature Human Behaviour, 5(1), 159–169.
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica: Journal of the Econometric Society, 47(2), 263–291.
Karau, S. J., & Williams, K. D. (1993). Social loafing: A meta-analytic review and theoretical integration. Journal of Personality and Social Psychology, 65(4), 681–706.
Kim, Y. J., Engel, D., Woolley, A. W., Lin, J. Y.-T., McArthur, N., & Malone, T. W. (2017). What makes a strong team?: Using collective intelligence to predict team performance in League of Legends. Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing – CSCW ’17 (pp. 2316–2329). New York, NY, USA.
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., … Nosek, B. A. (2014). Investigating variation in replicability. Social Psychology, 45(3), 142–152.
Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., … Nosek, B. A. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490.
Knudde, N., van der Herten, J., Dhaene, T., & Couckuyt, I. (2017). GPflowOpt: A Bayesian optimization library using TensorFlow. arXiv [stat.ML]. arXiv.
Koyré, A. (1953). An experiment in measurement. Proceedings of the American Philosophical Society, 97(2), 222–237.
Lakens, D., Uygun Tunç, D., & Necip Tunç, M. (2022). There is no generalizability crisis [Review of There is no generalizability crisis]. The Behavioral and Brain Sciences, 45, e25.
Landy, J. F., Jia, M. L., Ding, I. L., Viganola, D., Tierney, W., Dreber, A., … Uhlmann, E. L. (2020). Crowdsourcing hypothesis tests: Making transparent how design choices shape research results. Psychological Bulletin, 146(5), 451–479.
Larson, J. R. (2013). In search of synergy in small group performance. Psychology Press.
Larson, S. D., & Martone, M. E. (2009). Ontologies for neuroscience: What are they and what are they good for? Frontiers in Neuroscience, 3(1), 60–67.
Laughlin, P. R., Bonner, B. L., & Miner, A. G. (2002). Groups perform better than the best individuals on letters-to-numbers problems. Organizational Behavior and Human Decision Processes, 88(2), 605–620.
Lei, B., Kirk, T. Q., Bhattacharya, A., Pati, D., Qian, X., Arroyave, R., & Mallick, B. K. (2021). Bayesian optimization with adaptive surrogate models for automated experimental design. NPJ Computational Materials, 7(1), 1–12.
LePine, J. A. (2003). Team adaptation and postchange performance: Effects of team composition in terms of members’ cognitive ability and personality. The Journal of Applied Psychology, 88(1), 27–39.
Letham, B., Karrer, B., Ottoni, G., & Bakshy, E. (2019). Constrained Bayesian optimization with noisy experiments. Bayesian Analysis, 14(2), 495–519.
Levinthal, D. A., & Rosenkopf, L. (2021). Commensurability and collective impact in strategic management research: When non-replicability is a feature, not a bug. Working-paper (unpublished preprint).
Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal about the real world? The Journal of Economic Perspectives: A Journal of the American Economic Association, 21(2), 153–174.
Li, W., Germine, L. T., Mehr, S. A., Srinivasan, M., & Hartshorne, J. (2022). Developmental psychologists should adopt citizen science to improve generalization and reproducibility. Infant and Child Development, e2348.
Litman, L., Robinson, J., & Abberbock, T. (2017). A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods, 49(2), 433–442.
MacWhinney, B. (2014). The CHILDES project: Tools for analyzing talk, volume II: The database (3rd ed.). Psychology Press.
Maier, M., Bartoš, F., Stanley, T. D., Shanks, D. R., Harris, A. J. L., & Wagenmakers, E.-J. (2022). No evidence for nudging after adjusting for publication bias. Proceedings of the National Academy of Sciences of the United States of America, 119(31), e2200300119.
ManyBabies Consortium. (2020). Quantifying sources of variability in infancy research using the infant-directed-speech preference. Advances in Methods and Practices in Psychological Science, 3(1), 24–52.
Manzi, J. (2012). Uncontrolled: The surprising payoff of trial-and-error for business, politics, and society (pp. 1320). Basic Books.
Mao, A., Mason, W., Suri, S., & Watts, D. J. (2016). An experimental study of team size and performance on a complex task. PLoS ONE, 11(4), e0153048.
Martin, T., Hofman, J. M., Sharma, A., Anderson, A., & Watts, D. J. (2016). Exploring limits to prediction in complex social systems. In Proceedings of the 25th international conference on world wide web no. 978-1-4503-4143-1 (pp. 683–694). Republic and Canton of Geneva, CHE. International World Wide Web Conferences Steering Committee.
Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon's Mechanical Turk. Behavior Research Methods, 44(1), 1–23.
Mason, W., & Watts, D. J. (2012). Collaborative learning in networks. Proceedings of the National Academy of Sciences of the United States of America, 109(3), 764–769.
McClelland, G. H. (1997). Optimal design in psychological research. Psychological Methods, 2(1), 3–19.
McGrath, J. E. (1984). Groups: Interaction and performance. Prentice Hall.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.
Meehl, P. E. (1990a). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244.
Meehl, P. E. (1990b). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.
Mertens, S., Herberz, M., Hahnel, U. J. J., & Brosch, T. (2022). The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains. Proceedings of the National Academy of Sciences of the United States of America, 119(1).
Merton, R. K. (1968). On sociological theories of the middle range. Social Theory and Social Structure, 39–72.
Milkman, K. L., Gandhi, L., Patel, M. S., Graci, H. N., Gromet, D. M., Ho, H., … Duckworth, A. L. (2022). A 680,000-person megastudy of nudges to encourage vaccination in pharmacies. Proceedings of the National Academy of Sciences of the United States of America, 119(6).
Milkman, K. L., Patel, M. S., Gandhi, L., Graci, H. N., Gromet, D. M., Ho, H., … Duckworth, A. L. (2021). A megastudy of text-based nudges encouraging patients to get vaccinated at an upcoming doctor's appointment. Proceedings of the National Academy of Sciences of the United States of America, 118(20), e2101165118.
Mook, D. G. (1983). In defense of external invalidity. The American Psychologist, 38(4), 379–387.
Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L., Forscher, P. S., … Chartier, C. R. (2018). The psychological science accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1(4), 501–515.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., du Sert, N. P., … Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 21.
Muthukrishna, M., Bell, A. V., Henrich, J., Curtin, C. M., Gedranovich, A., McInerney, J., & Thue, B. (2020). Beyond western, educated, industrial, rich, and democratic (WEIRD) psychology: Measuring and mapping scales of cultural and psychological distance. Psychological Science, 31(6), 678–701.
Muthukrishna, M., & Henrich, J. A. (2019). A problem in theory. Nature Human Behaviour, 3, 221–229.
Myerson, R. B. (1981). Optimal auction design. Mathematics of Operations Research, 6(1), 58–73.
National Information Standards Organization. (2022). ANSI/NISO Z39.104-2022, CRediT, contributor roles taxonomy. [S. L.]. National Information Standards Organization.
National Science Foundation. (2022). NSF budget requests to congress and annual appropriations. National Science Foundation.
Nemesure, M. D., Heinz, M. V., Huang, R., & Jacobson, N. C. (2021). Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence. Scientific Reports, 11(1), 1980.
Newell, A. (1973). You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. Scholar
Open Science Collaboration. (2015). PSYCHOLOGY. Estimating the reproducibility of psychological science. Science (New York, N.Y.), 349(6251), aac4716.CrossRefGoogle Scholar
Page, S. E. (2008). The difference: How the power of diversity creates better groups, firms, schools, and societies – New edition. Princeton University Press.CrossRefGoogle Scholar
Palan, S., & Schitter, C. (2018). – A subject pool for online experiments. Journal of Behavioral and Experimental Finance, 17, 2227.CrossRefGoogle Scholar
Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., & Griffiths, T. L. (2021). Using large-scale experiments and machine learning to discover theories of human decision-making. Science (New York, N.Y.), 372(6547), 12091214.CrossRefGoogle ScholarPubMed
Plonsky, O., Apel, R., Ert, E., Tennenholtz, M., Bourgin, D., Peterson, J. C., … Erev, I. (2019). Predicting human decisions with behavioral theories and machine learning. arXiv [cs.AI]. arXiv. Scholar
Preckel, F., & Brunner, M. (2017). Nomological nets. Encyclopedia of Personality and Individual Differences, 14. Scholar
Ren, P., Xiao, Y., Chang, X., Huang, P.-Y., Li, Z., Gupta, B. B., … Wang, X. (2021). A survey of deep active learning. ACM Computing Surveys, 54(9), 140.Google Scholar
Reuss, H., Kiesel, A., & Kunde, W. (2015). Adjustments of response speed and accuracy to unconscious cues. Cognition, 134, 5762.CrossRefGoogle ScholarPubMed
Richard Hackman, J., & Morris, C. G. (1975). Group tasks, group interaction process, and group performance effectiveness: A review and proposed integration. In Berkowitz, L. (Ed.), Advances in Experimental Social Psychology (Vol. 8, pp. 4599). Academic Press. Scholar
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.
Rubin, D. L., Lewis, S. E., Mungall, C. J., Misra, S., Westerfield, M., Ashburner, M., … Musen, M. A. (2006). National center for biomedical ontology: Advancing biomedicine through structured organization of scientific knowledge. OMICS: A Journal of Integrative Biology, 10(2), 185–198.
Schneid, M., Isidor, R., Li, C., & Kabst, R. (2015). The influence of cultural context on the relationship between gender diversity and team performance: A meta-analysis. The International Journal of Human Resource Management, 26(6), 733–756.
Schulz-Hardt, S., & Mojzisch, A. (2012). How to achieve synergy in group decision making: Lessons to be learned from the hidden profile paradigm. European Review of Social Psychology, 23(1), 305–343.
Schwartz, S. (2006). A theory of cultural value orientations: Explication and applications. Comparative Sociology, 5(2–3), 137–182.
Settles, B. (2011). From theories to queries: Active learning in practice. In Guyon, I., Cawley, G., Dror, G., Lemaire, V., & Statnikov, A. (Eds.), Active learning and experimental design workshop in conjunction with AISTATS 2010 (Vol. 16, pp. 1–18). PMLR.
Shallue, C. J., & Vanderburg, A. (2018). Identifying exoplanets with deep learning: A five-planet resonant chain around Kepler-80 and an eighth planet around Kepler-90. The Astronomical Journal, 155(2), 94.
Shaw, M. E. (1963). Scaling group tasks: A method for dimensional analysis.
Shields, B. J., Stevens, J., Li, J., Parasram, M., Damani, F., Alvarado, J. I. M., … Doyle, A. G. (2021). Bayesian reaction optimization as a tool for chemical synthesis. Nature, 590(7844), 89–96.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 12(6), 1123–1128.
Simonsohn, U., Simmons, J., & Nelson, L. D. (2022). Above averaging in literature reviews. Nature Reviews Psychology, 1(10), 551–552.
Smucker, B., Krzywinski, M., & Altman, N. (2018). Optimal experimental design. Nature Methods, 15(8), 559–560.
Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. arXiv [stat.ML]. arXiv.
Steiner, I. D. (1972). Group process and productivity. Academic Press.
Stewart, G. L. (2006). A meta-analytic review of relationships between team design features and team performance. Journal of Management, 32(1), 29–55.
Stokes, D. E. (1997). Pasteur's quadrant: Basic science and technological innovation. Brookings Institution Press.
Szaszi, B., Higney, A., Charlton, A., Gelman, A., Ziano, I., Aczel, B., … Tipton, E. (2022). No reason to expect large and consistent effects of nudge interventions. Proceedings of the National Academy of Sciences of the United States of America, 119(31), e2200732119.
Tasca, G. A. (2021). Team cognition and reflective functioning: A review and search for synergy. Group Dynamics: Theory, Research, and Practice, 25(3), 258–270.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3–4), 285–294.
Turner, J. A., & Laird, A. R. (2012). The cognitive paradigm ontology: Design and application. Neuroinformatics, 10(1), 57–66.
Turner, M. A., & Smaldino, P. E. (2022). Mechanistic modeling for the masses. The Behavioral and Brain Sciences, 45, e33.
Uhlmann, E. L., Ebersole, C. R., Chartier, C. R., Errington, T. M., Kidwell, M. C., Lai, C. K., … Nosek, B. A. (2019). Scientific utopia III: Crowdsourcing science. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 14(5), 711–733.
Van Bavel, J. J., Mende-Siedlecki, P., Brady, W. J., & Reinero, D. A. (2016). Contextual sensitivity in scientific reproducibility. Proceedings of the National Academy of Sciences of the United States of America, 113(23), 6454–6459.
Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1), 8–37.
Voelkel, J. G., Stagnaro, M. N., Chu, J., Pink, S. L., Mernyk, J. S., Redekopp, C., … Willer, R. (2022). Megastudy identifying successful interventions to strengthen Americans’ democratic attitudes. Preprint.
Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242.
Watson, G. B. (1928). Do groups think more efficiently than individuals? Journal of Abnormal and Social Psychology, 23(3), 328.
Watts, D. (2017). Response to Turco and Zuckerman's “Verstehen for sociology.” The American Journal of Sociology, 122(4), 1292–1299.
Watts, D. J. (2011). Everything is obvious*: Once you know the answer. Crown Business.
Watts, D. J. (2014). Common sense and sociological explanations. The American Journal of Sociology, 120(2), 313–351.
Watts, D. J. (2017). Should social science be more solution-oriented? Nature Human Behaviour, 1, 15.
Watts, D. J., Beck, E. D., Bienenstock, E. J., Bowers, J., Frank, A., Grubesic, A., … Salganik, M. (2018). Explanation, prediction, and causality: Three sides of the same coin?
Wiernik, B. M., Raghavan, M., Allan, T., & Denison, A. J. (2022). Generalizability challenges in applied psychological and organizational research and practice. The Behavioral and Brain Sciences, 45, e38.
Witkop, G. (n.d.). Systematizing confidence in open research and evidence (SCORE). DARPA. Retrieved June 22, 2022.
Wood, R. E. (1986). Task complexity: Definition of the construct. Organizational Behavior and Human Decision Processes, 37(1), 60–82.
Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010). Evidence for a collective intelligence factor in the performance of human groups. Science (New York, N.Y.), 330(6004), 686–688.
Wurman, P. R., Wellman, M. P., & Walsh, W. E. (2001). A parametrization of the auction design space. Games and Economic Behavior, 35(1), 304–338.
Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, e1.
Yarkoni, T., Eckles, D., Heathers, J., Levenstein, M., Smaldino, P. E., & Lane, J. I. (2019). Enhancing and accelerating social science via automation: Challenges and opportunities.
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 12(6), 1100–1122.
Zelditch, M. Jr. (1969). Can you really study an army in the laboratory? A Sociological Reader on Complex Organizations, 528–539.