Madole & Harden (M&H) draw a comparison between the results of randomized controlled trials (RCTs) and the findings of behavior genetics; just as RCTs give us “causal knowledge,” they argue, so too does behavior genetics. The causes identified in both cases are, on their account, shallow causes – they are “non-uniform,” “non-unitary,” and “non-explanatory,” but they are causes nonetheless (target article). And, they argue, just as in the case of RCTs, the “shallow” results can be leveraged into research leading to the establishment of “second-generation causal knowledge,” by identifying “processes and contexts through which the effect emerges” (target article, sect. 2.4, para. 5). We disagree and argue that the comparison to RCTs is misleading for both first- and second-generation causal analysis. The “shallowness” of the causal knowledge gained in RCTs does not prevent RCTs from being useful guides to practice; this is not the case with the associations found in behavior genetics.
The point of RCTs, in general, is to find actionable interventions. Although these interventions rely on “shallow” causal knowledge, and so cannot be expected to port reliably between different populations and environments, they can be expected to be effective in populations and environments similar to those in the RCT. When we, for example, change our prescribing practices based on the results of an RCT, we expect that the drug we are now prescribing will, on average, yield better results, assuming that the population is relevantly similar to that used by the RCT. Or, to take M&H's example, if we decided to promote early childhood education as a means of improving socially desirable outcomes, the RCT would give us a reason to expect those improvements, ceteris paribus. These actionable interventions represent “first-generation” causal knowledge, according to M&H, and can later potentially provide an entry into mechanistic understandings of causation and explanation, so-called “second-generation” causal knowledge.
The situation in behavior genetics is nothing like this. Unlike in the case of RCTs, we cannot change the genetic variants associated with the phenotypic variation – and even if we could, doing so would be wildly irresponsible. A reliable finding that a particular childhood intervention has downstream effects that are significant both statistically and “clinically” (or socially, etc.) suggests a reasonable course of action, even if the relationship between the intervention and the outcome is shallow (non-unitary, non-uniform, non-explanatory). But what would we gain from even an accurate finding that a particular genetic variant was associated with downstream effects? If such knowledge were not “shallow,” we might be able to use information gleaned from such genetic associations to intervene environmentally – but the very shallowness of the relationship prevents us from making those kinds of interventions; we cannot change the genes, and we gain no information useful for making environmental interventions that could not be equally or better gained from straightforward analyses of the effects of intervention on the traits themselves.
The “shallowness” of behavior genetics findings is generally more problematic than that typically encountered for RCTs. In behavior genetics, the associations between particular loci and behavioral traits are almost absurdly small, and the only reason that they rise to statistical significance at all is that genome-wide association studies (GWASs) can leverage sample sizes in the hundreds of thousands or millions. Even when these associations are accumulated across thousands of loci into polygenic scores, the application of those scores is plagued by problems with portability that stem from both real biological complexity and methodological artifacts (Kaplan & Fullerton, 2022; Matthews, 2022). These issues, in addition to the generally low predictive power, render polygenic scores of little to no use at the individual level (Fusar-Poli, Rutten, van Os, Aguglia, & Guloksuz, 2022; Morris, Davies, & Smith, 2020), and the large uncertainty of the estimates appears to hinder the ability even to stratify individuals accurately and consistently into high-risk groups (Ding et al., 2022; Muslimova et al., 2023; Schultz et al., 2022).
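The dependence of statistical significance on sample size can be made concrete with a rough calculation. The sketch below is only illustrative and rests on assumptions that are not in the text above: a single variant tested with a two-sided test at the conventional genome-wide threshold of 5e-8, a test statistic whose expected squared value is approximately N × r² / (1 − r²), and hypothetical values for the share of variance explained (r²). The function name `approx_n_for_half_power` is likewise made up for the illustration.

```python
from statistics import NormalDist


def approx_n_for_half_power(r2: float, alpha: float = 5e-8) -> float:
    """Rough sample size at which a variant explaining a fraction `r2` of
    trait variance reaches two-sided significance `alpha` about half the time.

    Assumption (not from the source): the association z-score is roughly
    normal with mean sqrt(N * r2 / (1 - r2)) and unit variance, so power is
    about 50% when that mean equals the critical z-value.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z_crit ** 2 * (1 - r2) / r2


# Hypothetical shares of variance explained: 1%, 0.1%, and 0.01%.
for r2 in (0.01, 0.001, 0.0001):
    n = approx_n_for_half_power(r2)
    print(f"variance explained = {r2:.2%}: N of roughly {n:,.0f}")
```

Under these assumptions, a variant explaining 0.01% of the variance would need a sample on the order of 300,000 before it could be expected to clear the threshold even half the time, which is the sense in which such associations become detectable only at GWAS scale.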
Put bluntly, knowing that one drug is, say, 20% more effective than another is actionable; knowing that one allele is associated with a tiny fraction of a percent of the variance in the trait in that population is far less so. Finding associations with a small amount of variance is not necessarily useless: disease GWASs have sometimes highlighted promising mechanisms or pathways even from single-nucleotide polymorphisms (SNPs) with small effects. Sociobehavioral GWASs, however, have not provided any such specific and discrete pathways. Adding together many such alleles yields no more actionable information than any one of them. Indeed, this points toward another weakness of M&H's argument – it elides the distinction between the results of GWASs and heritability estimates from classic “twin” studies. The former point toward loci that at least in principle might be useful for moving from “shallow” to “second-generation” causal analyses (though the tiny effect sizes make the value of this questionable). The latter point toward nothing actionable at all. While accumulating small effects into polygenic scores has increased values of R², there has been little elaboration of this kind of mechanistic knowledge as revealed by functional enrichment or genetic correlations. This does not mean that such studies are necessarily useless – just that they do not produce the kinds of causal information about the influence of genes on behavior with which anything useful can be done (Kaplan & Turkheimer, 2021; Turkheimer, 2016).
There are a number of other important disanalogies that one might point toward. The controlled aspect of RCTs makes “red hair” effects (Jencks, 1972) much less likely (though not impossible). The randomizing element of RCTs avoids the problems that GWASs have with cryptic population structure. More generally, RCTs, by their nature, avoid many problems with the intervention of interest covarying systematically with the environment experienced – problems that plague both heritability estimates and GWASs.
In the end, the comparison between the causal information gleaned from RCTs and the results of behavior genetics highlights the weaknesses of the latter, and reveals that those results have more in common with observational studies, including the weaknesses of such studies.
Competing interest
None.