I agree wholeheartedly that replication, or the potential of replication, is central to experimental science, and I also agree that various concerns about the difficulty of replication should, in fact, be interpreted as arguments in favor of replication. For example, if effects can vary by context, this provides more reason why replication is necessary for scientific progress. I also agree with the target article that it is an error when, following a disappointing replication result, proponents of the original published studies “irrationally privilege the chronological order of studies more than the objective characteristics of those studies when evaluating claims about quality and scientific rigor” (sect. 5.1.1, para. 3). As a remedy to this fallacy I have proposed a “time-reversal heuristic” (Gelman 2016b): the thought experiment of imagining the large, pre-registered replication study coming first, followed by the original, uncontrolled study.
It may well make sense to assign lower value to replications than to original studies, when considered as intellectual products, as we can assume the replication requires less creative effort. When considered as scientific evidence, however, the results from a replication could well be better than those of the original study, in that the replication can have more control in its design, measurement, and analysis.
It is also good to present and analyze all of the data from an experiment. Selection, forking paths, and researcher degrees of freedom have led us into the replication crisis, but these problems are all much reduced with analyses that use all of the data. Conversely, if we do not have access to raw data, many published results are close to useless, and when there is a high-quality pre-registered replication, I would be inclined to pretty much ignore the original paper, rather than, say, to assume the truth lies somewhere between the original and replication results.
Beyond this, I would like to add two points from a statistician's perspective.
First, the idea of replication is central not just to scientific practice but also to formal statistics, even though this has not always been recognized. Frequentist statistics relies on the reference set of repeated experiments, and Bayesian statistics relies on the prior distribution, which represents the population of effects. In the analysis of replication studies, it is therefore important for the model to allow effects to vary across scenarios.
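As a concrete sketch (my notation; the target article does not use this formalism), the standard multilevel model for a collection of studies of the “same” effect is:

```latex
% Study j reports estimate y_j with standard error sigma_j;
% the underlying effects theta_j vary across scenarios.
\begin{aligned}
  y_j \mid \theta_j &\sim \mathrm{N}(\theta_j,\, \sigma_j^2), \qquad j = 1, \dots, J, \\
  \theta_j &\sim \mathrm{N}(\mu,\, \tau^2).
\end{aligned}
```

Here μ is the average effect in the population of scenarios and τ is the between-scenario variation; the common practice of treating all studies as estimating a single fixed parameter corresponds to the special case τ = 0.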
My second point is that, in the analysis of replication studies, I recommend continuous analysis and multilevel modeling (meta-analysis), in contrast to the binary decision rules recommended in the target article, which I think are contrary to the spirit of inquiry that motivates replication in the first place.
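To make the contrast concrete, here is a minimal sketch in Python of the continuous, meta-analytic alternative, using the simple DerSimonian-Laird method-of-moments estimator (a swapped-in convenience here; in practice I would fit the multilevel model directly). The effect estimates and standard errors are made-up numbers for illustration, not data from any particular study.

```python
import numpy as np

# Illustrative (made-up) estimates: an original study and two replications.
y = np.array([0.80, 0.12, 0.25])    # estimated effects
se = np.array([0.40, 0.10, 0.15])   # standard errors

# Fixed-effect (inverse-variance) pooling as a starting point.
w = 1 / se**2
mu_fixed = np.sum(w * y) / np.sum(w)

# DerSimonian-Laird method-of-moments estimate of the
# between-study variance tau^2, floored at zero.
Q = np.sum(w * (y - mu_fixed) ** 2)           # heterogeneity statistic
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (len(y) - 1)) / c)

# Random-effects pooling: weights now reflect both sampling error
# and the estimated between-study variation.
w_re = 1 / (se**2 + tau2)
mu_re = np.sum(w_re * y) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))

print(f"between-study sd (tau): {np.sqrt(tau2):.2f}")
print(f"pooled effect: {mu_re:.2f} (se {se_re:.2f})")
```

The output is a continuous summary, a pooled effect along with an estimate of how much the effect varies across studies, rather than a verdict that the replication “succeeded” or “failed.”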
The target article follows the conventional statistical language in which a study is a “false positive” if it claims to find an effect where none exists. But in the human sciences, just about all of the effects we are trying to study are real; there are no zeros. See Gelman (2013) and McShane et al. (2017) for further discussion of this point. Effects can be hard to detect, though, because they can be highly variable and measured inaccurately and with bias. Instead of talking about false positives and false negatives, we prefer to speak of type M (magnitude) and type S (sign) errors (Gelman & Carlin 2014). Related is the use of expressions such as “failed replication.” I have used such phrases myself, but they get us into trouble with their implication that there is some criterion under which a replication can be said to succeed or fail. Do we just check whether p < .05? That would be a very noisy rule, and I think we would all be better off simply reporting the results from the old and new studies (as in the graph in Simmons & Simonsohn 2015). If there is a need to count replications in a larger study of studies such as the Open Science Collaboration, I would prefer to do so using continuous measures rather than threshold-based replication rates.
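A type M and type S calculation can be sketched by simulation, in the spirit of the design analysis of Gelman & Carlin (2014); the assumed true effect and standard error below are hypothetical numbers chosen to represent a noisy study.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.1   # assumed (hypothetical) true effect
se = 0.35           # assumed (hypothetical) standard error
z_crit = 1.96       # two-sided test at p < .05

# Simulate many hypothetical replications of the same design.
estimates = rng.normal(true_effect, se, size=1_000_000)
significant = np.abs(estimates / se) > z_crit

power = significant.mean()
# Type S error: the estimate is significant but has the wrong sign.
type_s = (significant & (estimates < 0)).mean() / power
# Type M error (exaggeration ratio): average magnitude of the
# significant estimates, relative to the true effect.
type_m = np.abs(estimates[significant]).mean() / true_effect

print(f"power: {power:.2f}, type S: {type_s:.2f}, "
      f"exaggeration ratio (type M): {type_m:.1f}")
```

Under these assumptions, a statistically significant estimate has a nontrivial probability of having the wrong sign and will, on average, overstate the magnitude of the effect several times over, which is one more reason not to score replications by whether they cross a significance threshold.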
The authors write, “if there is no theoretical reason to assume that an effect that was produced with a sample of college students in Michigan will not produce a similar effect in Florida, or in the United Kingdom or Japan, for that matter, then a replication carried out with these samples would be considered direct” (sect. 4, para. 3). The difficulty here is that theories are often so flexible that all these sorts of differences can be cited as reasons for a replication failure. For example, Michigan is colder than Florida, and outdoor air temperature was used as an alibi for a replication failure of a well-publicized finding in evolutionary psychology (Tracy & Beall 2014). Also, there is no end to the differences between the United Kingdom and Japan that could be used to explain away a disappointing replication result in social psychology. The point is that any of these could be considered a “direct replication” if that interpretation is desired, or a mere “extension” or “conceptual replication” if the results do not come out as planned. In social psychology, at least, it could be argued that no replication is truly direct: society, and social expectations, change over time. The authors recognize this in citing Schmidt (2009) and also in their discussion of why contextual variation does not invalidate the utility of replications; given this, I think the authors could improve their framework by abandoning the concept of “direct replication” entirely, instead moving to a meta-analytic approach in which it is accepted ahead of time that the underlying treatment effects will vary between studies. Rather than trying to evaluate whether a study is a “direct” or “conceptual” replication, we can express the difference between old and new studies in terms of the expected variation in the treatment effect between conditions.
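A minimal formal sketch of that last suggestion, in my notation rather than anything from the target article: let the original and new studies estimate related but not identical effects,

```latex
\theta_{\mathrm{rep}} = \theta_{\mathrm{orig}} + \delta, \qquad \delta \sim \mathrm{N}(0,\, \tau^2),
```

where τ encodes the expected variation in the treatment effect across the two studies' conditions, populations, and time periods. A nearly direct replication corresponds to small τ and a distant conceptual replication to large τ, with no sharp boundary between the two.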
That said, if the measurements in the original study are indirect and noisy (as is often the case) and it is impossible or inconvenient to reanalyze the raw data, the question is moot, and it can make sense to just take the results from the replication or extension studies as our new starting point.