13.1 Introduction
Reading is fascinating to think about. It represents a facet of human civilization that emerged relatively late (compared to, say, tools) yet is a skill required by modern society. The ability to read, coupled with the ability to write, substantiates literacy in its narrow definition. Defined broadly as “the ability to identify, understand, interpret, create, communicate, and compute using printed and written materials associated with varying contexts” (Montoya, 2018, p. 1), literacy remains a centerpiece, pointing to the very nature of the materials and the mastery of them. To reiterate: The referred materials should be printed and/or written and, therefore, require the mastery of reading skills. At the societal level, literacy rates determine the level of prosperity of a country and its place in the hierarchy of today’s world. At the group level, these rates differentiate ethnic, religious, or other groups and correlate with the socioeconomic position of a group. At the individual level, literacy (or reading and writing proficiency) is a powerful predictor of educational attainment (Park & Kyei, 2011), labor market placement (Cherry & Vignoles, 2020), and happiness (Angner et al., 2009). It is hard to think of another single skill that is as influential and important at the societal level as individual literacy. Consider physical aptitude or vocal capacity – both are highly appreciated, but neither is so valued that it functions as a gatekeeper to the higher levels of today’s society.
The fascination of reading has many angles. One such angle is the societal “decision” – or rather the conclusion of over a hundred years spent constructing “a social world in which much value is located in the individual” (Soysal & Strang, 1989, p. 279) – to develop a system of state-run mass education that places its main emphasis on reading (Cipolla, 1969).Footnote 1 As a result, the skill that for centuries had been acquired by only a few became available to and required by billions of people. This transformation generated two challenges. The first was to develop a systematic (stage-based) and effective (quick and inexpensive) way of teaching reading that could be delivered to many people simultaneously. The second was to make sure that reading could be taught to many different people because even as long ago as in Plato’s Academy, the observation was made that students vary in how they learn to read, although the causes for this variation were contemplated only much later (Kussmaul, 1877).
Another angle is the speed with which the demand for the skill spread throughout the world. From the 48 copies of the Gutenberg Bible printed in 1450 in Latin to the ~227,000 new titles per month in 2020 in all written world languages, this dissemination of written materials assumes a market of eager readers who can consume such titles.Footnote 2 Given that most of these consumers were born during the twentieth and twenty-first centuries, reading has truly become a successful enterprise! At the time of writing, for the projected 2020 world population of 7,800,000,000, it was expected there would be ~34.92 new titles per year for every 100,000 people.
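The per-capita figure can be checked with a few lines of arithmetic. The sketch below uses only the two numbers quoted in the text; the assumption that the monthly count is multiplied by twelve (i.e., that the ratio is annual) is ours, and it is the reading under which the quoted ~34.92 is reproduced.

```python
# Figures quoted in the text.
titles_per_month = 227_000
world_population = 7_800_000_000

# Assumption: the per-capita ratio refers to a full year of new titles.
titles_per_year = titles_per_month * 12

# New titles per 100,000 people.
titles_per_100k = titles_per_year / world_population * 100_000

print(round(titles_per_100k, 2))
```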
A third angle is the ever-changing texture of reading. From the ancient, as in Egyptian hieroglyphs (2680 BC), to the modern, as in Japanese Kanji, to the alphabetical-abstract phone–grapheme mapping that we see in the majority of today’s languages, especially those that first came to be written relatively recently (such as Ndebele in the 1960s); from oral, as in St. Augustine’s Confessions (by St. Ambrose in the middle 300s AD), to silent reading, which is the form that proficient reading is generally considered to take today; from unfolded and fully sentenced, as in most printed materials such as books, mass media, and formal documents, to folded and abbreviated, as in today’s texting and symbolic prints such as NASDAQ information – reading has changed with the demands of the civilization that produced and uses it.
If my readers think that these characteristics of reading are not relevant to the discussion of its genetic bases, they are mistaken. All of these considerations, when contemplated holistically, define the parameters of the genetic system that is the foundation of literacy in general and (a)typical reading and writing in particular.Footnote 3 Yet it is a distal one, with the proximal foundation being the brain. In short, the genetic bases of (a)typical reading and writing are nothing more than the genetic bases of a brain that, pressured by the demands and opportunities imposed by modern society, has turned itself into a reading and writing (i.e., literate) brain.
13.2 Familiality, Heritability, and the Relative Risks of (A)Typical Reading
As soon as industrialized societies established the requirement that their members should attain certain levels of literacy and numeracy in exchange for a commitment to mass education, it became evident that there was great variation in the efficiency and quality of how children mastered these skills (Kerr, 1897). In families educated by the same pedagogy delivered by the same teacher in the same classroom, students varied in their reading and writing skills.Footnote 4 Very early on in the research into the causes of this variation, the word “congenital” was used (Fisher, 1905, 1910; Hinshelwood, 1900, 1902, 1907; Stephenson, 1904, 1907; Thomas, 1905), although an articulated position on the role of genes was introduced much later (Hallgren, 1950), following an accumulation of facts substantiating the critical role of the brain in the manifestation of reading difficulties (Orton, 1939).
Thus, since the second half of the last century, the field has attempted to understand how these “genetic factors,” deemed by Hallgren (1950) as important to the development of reading difficulties, actually trigger these difficulties. This quest has been as systematic as it could be, given that at various points, it was inevitably limited by (1) the sophistication (or rather lack thereof) of the diagnostic materials necessary for defining phenotypes for genetically informed studies of reading difficulties; (2) the availability and cost of the molecular-genetic technologies that permit the specification of the genetic mechanisms thought to be of importance; and (3) the availability of computational methods, computer power and time to both estimate the role of the genome and connect reading and reading-related phenotypes to genetic influences.Footnote 5 Because the capacity for all three was low until the 1980s, studies conducted between 1950 and 1980 generated a diverse array of findings, underscoring the importance of genetic factors but doing so with relative imprecision and lack of specification (Bakwin, 1973; Finucci et al., 1976; Hermann, 1956; Norrie, 1939; Weinschenk, 1965; Zerbin-Rüdin, 1967). As theoretical and corresponding measurement developments accumulated by the early 1980s, some seminal articles set up the spiral development of the field. Among others, the following pioneering publications should be mentioned.
The publication by Lewitter et al. (Lewitter, DeFries, & Elston, 1980) set off a line of important research by looking for the amount and type of reading difficulties aggregating in families, which had also been reported in earlier works (Fisher, 1905, 1910; Hinshelwood, 1900, 1902, 1907; Stephenson, 1904, 1907; Thomas, 1905). This research was done in a systematic way by carrying out formal segregation analyses and fitting different genetic models. However, the attempt to identify a single best-fitting model was not successful, as a variety of models were reported to fit the data. Specifically, major gene models (recessive [Lewitter et al., 1980], dominant [Gilger et al., 1994; Pennington et al., 1991], and additive [Pennington et al., 1991]) and polygenic models (Pennington et al., 1991) were reported to be plausible based on their fit with various sets of family data. Although none of these early observations have been maintained, this line of research has been highly important. First, these early researchers were able, given the status of the corresponding assessments, to define reading difficulties not only through categorical clinical decisions (i.e., qualitative phenotypes) but also through continuous indicators (quantitative phenotypes). Second, these analyses generated such diverse patterns of results that they have shed light on the heterogeneity of the genetic mechanisms of reading and reading disability.
It was initially assumed that these analyses had captured all types of these mechanisms, as different families might indeed present examples of different modes of genetic transmission. Yet, the current consensus, based on the usage of newly developed analytical approaches, promotes oligogenic models of inheritance for both reading and reading-related traits; these models involve many genes exerting moderate to low effects (Hsu et al., 2002; Naples et al., 2009; Wijsman et al., 2000). Third, given the availability of many samples in the field, estimates of relative risk (Ziegler et al., 2005) consistent with heritability estimates (i.e., the proportion of the phenotypic variance controlled by genetic factors) for (a)typical reading and related traits have been generated.
The article by DeFries et al. (DeFries, Fulker, & LaBuda, 1987) introduced the complexity of quantitative genetics into previously simplistic models of variance decomposition and set off a plethora of ever-more-sophisticated reports of the heritability of reading and reading-related skills. As the field has grown and acquired remarkable – both with respect to phenotypic-behavioral characterization and the sheer number of participants – genetically informed samples (i.e., samples structured by known degrees of biological relatedness – twins or other siblings, nuclear and extended families), it has been able to provide information on several important issues. Thus, compared to early twin studies, there are much more precise estimates of heritability for both reading performance and reading-related skills. In fact, the corresponding literature is so large that it could be meta-analyzed, and such analyses have indicated that heritability estimates, when error variance is taken into account, range between 41 and 74 percent for reading and up to 90 percent for reading-related processes (Grigorenko, 2004). Heritability estimates are reportedly not modulated by sex (Hawke et al., 2007).
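To make the idea of variance decomposition in twin samples concrete, the following is a minimal sketch of the classical Falconer approximation, which estimates heritability from the difference between monozygotic (MZ) and dizygotic (DZ) twin correlations. It is offered only as an illustration; it is not the regression-based or model-fitting approach used in the studies cited above, and the twin correlations passed in are hypothetical values chosen for the example.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Classical Falconer approximation from twin correlations.

    r_mz: trait correlation between monozygotic twins
    r_dz: trait correlation between dizygotic twins
    """
    h2 = 2 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2 * r_dz - r_mz     # shared-environment share
    e2 = 1 - r_mz            # nonshared environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, not taken from any study cited in this chapter.
estimates = falconer_estimates(r_mz=0.80, r_dz=0.50)
print(estimates)
```

With these illustrative inputs the three components sum to one by construction; the approximation assumes purely additive genetic effects and equal shared environments for MZ and DZ pairs.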
Yet, a host of variables do appear to differentiate heritability estimates, such as age (Byrne et al., 2009; Harlaar et al., 2014; Soden et al., 2015) and ethnicity (Grigorenko et al., 2006). Moreover, there is an observation that heritability estimates vary across the severity of difficulties, both for reading skills (Hawke et al., 2007) and for reading-related indicators, such as IQ (Wadsworth, Olson, & DeFries, 2010), and working and short-term memory (van Leeuwen et al., 2009). Heritability estimates appear to increase across the lifespan, although nonlinearly, as children get older (Byrne et al., 2005; Kovas et al., 2013; Lewis et al., 2018; Samuelsson et al., 2007; Soden et al., 2015; Wadsworth et al., 2001); yet, this might not be the case for all indicators of reading (Tosto et al., 2017). Importantly, although reading is an acquired skill, its developmental stability is largely driven by genetic factors (Harlaar et al., 2014; Soden et al., 2015).
Interestingly, parental correlations on reading-related skills indicate the presence of assortative mating for some reading-related variables (Naples et al., 2009; Swagerman et al., 2017; van Bergen et al., 2017), which might have important applied implications. Among other findings, there are some that reflect the fundamental assumption of quantitative genetics; specifically, that heritability reflects the influence of the genome that is unleashed when the environment is optimized and homogenized to be most beneficial for those who develop reading and reading-related skills. This phenomenon has been demonstrated in the literature on reading with regard to teaching instruction (Taylor et al., 2010), parental education (Friend et al., 2009), and SES (Hart et al., 2013, 2014). Similarly, there are observations that heritability estimates are higher in societies with an egalitarian educational system, which reduces environmental variance (Grasby et al., 2019; van Leeuwen et al., 2009). Of note is that, when considered meta-analytically, such variables as publication year, grade level, project, zygosity methods, and response type moderate heritability estimates obtained in twin studies (Little, Haughbrook, & Hart, 2017). Importantly, it has been demonstrated that children’s reading ability determines how much they choose to read, not the other way around (Zeeuw et al., 2018), although reading exposure does strengthen reading skills.
Although researched substantially less, the etiology of writing appears to mirror the etiology of reading (Olson et al., 2013).
Last but not least, the report by Smith et al. (Smith et al., 1983) marked the first attempt to materialize the very elusive genetic factors and to translate the heritability of reading and reading-related skills into specific molecular mechanisms. This report triggered the ongoing quest for the molecules and their pathways that lay the ground for the sociocultural construction of the literate brain.
13.3 Structural DNA Variation and (A)Typical Reading
As is the case for many other complex disorders, the field of molecular-genetic research within (a)typical reading is dominated by two major models of the overall genetic architecture of complex human traits. These models differ fundamentally both in their specification of the disability and their interpretation of normal variation in reading performance.
Up to now, the field of genetic studies of specific reading disability (SRD),Footnote 6 commonly known as dyslexia (see Rigatti et al., Chapter 12 in this volume), has been dominated by the common disorder–common variant (CDCV) hypothesis (Schork et al., 2009), according to which SRD arises in the polygenic background: multiple inherited common genetic risk variants are individually characterized by small effect sizes, but they collectively represent a certain liability threshold above which the disability is manifested. It is also assumed that some of these risk variants are general to all facets of SRD and, perhaps, to SRD and other learning disabilities (Plomin & Kovas, 2005), whereas others are SRD specific and even reading-component specific (Naples et al., 2009). These partial overlaps of risk variants can explain substantial, but far from 1, genetic correlations between different reading-related componential processes, both in typical reading and SRD. When other sources of variance (e.g., age, ethnicity, SES, quality of teaching) are considered, they can also explain differential heritability estimates for different reading-related componential processes and their fluctuations. Finally, the CDCV hypothesis assumes continuity between typical and atypical states (i.e., a single underlying trait that defines various states), as the common risk variants are present in the general population at the levels below the liability threshold, and this presence guarantees a continuity of reading performance.
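The liability-threshold logic of the CDCV hypothesis can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions, not an analysis from the literature: every parameter (number of variants, risk-allele frequency, per-allele effect, prevalence) is a hypothetical value, and the function name `simulate_liability` is introduced here purely for illustration. Each simulated person carries zero, one, or two risk alleles at each of many common variants; the small effects add up to a continuous liability, and only those above a population threshold are counted as affected.

```python
import random


def simulate_liability(n_people=2000, n_variants=100, risk_allele_freq=0.3,
                       effect_per_allele=0.1, prevalence=0.07, seed=1):
    """Toy liability-threshold model; all parameter values are hypothetical."""
    rng = random.Random(seed)
    liabilities = []
    for _ in range(n_people):
        # Two alleles per variant, each a risk allele with the given frequency.
        risk_alleles = sum(
            (rng.random() < risk_allele_freq) + (rng.random() < risk_allele_freq)
            for _ in range(n_variants)
        )
        # Liability = sum of many small genetic effects + nongenetic noise.
        liabilities.append(effect_per_allele * risk_alleles + rng.gauss(0.0, 1.0))
    # Threshold chosen so that roughly `prevalence` of the sample exceeds it.
    cutoff_index = int((1 - prevalence) * n_people)
    threshold = sorted(liabilities)[cutoff_index]
    affected = [value >= threshold for value in liabilities]
    return liabilities, threshold, affected


liabilities, threshold, affected = simulate_liability()
n_affected = sum(affected)
share_affected = n_affected / len(affected)
mean_affected = sum(v for v, a in zip(liabilities, affected) if a) / n_affected
mean_unaffected = (sum(v for v, a in zip(liabilities, affected) if not a)
                   / (len(affected) - n_affected))
print(share_affected, mean_affected, mean_unaffected)
```

Note that the underlying liability is continuous across the whole simulated sample; the categorical "affected" label appears only once the threshold is imposed, which mirrors the continuity between typical and atypical states assumed by the hypothesis.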
In contrast, the common disorder–rare variant (CDRV) hypothesis (Schork et al., 2009) assumes that each case (or almost each case) is caused by a single rare variant of large effect size and that these variants can occur in different genes in different families/individuals. This hypothesis can explain the robust findings of the genetic underpinning for specific families and the lack of generalizability of these findings to heterogeneous SRD samples. Although both hypotheses and their multiple versions are implicitly present in the literature on SRD, they have not been systematically tested. This is especially the case for the CDRV hypothesis, as the supporting evidence is circumstantial rather than systematic. In order to describe the frontiers of research with regard to both hypotheses, a brief introduction on the terminology, conceptual apparatus, and common grounds upon which the corresponding studies have been generated is needed.
The decade of studies into the genetic architecture of common human traits/disorders, which unfolded after the release of the first draft of the human genome, has led to a number of realizations. Specifically, it has become evident that the statistical power of genome-wide association (GWA) studies of complex traits/diseases, even with sample sizes considerably larger than employed previously (e.g., thousands of cases and controls), remains low (Manolio, Brooks, & Collins, 2008; Rodriguez-Murillo & Greenberg, 2008; Tenesa, Farrington, & Prendergast, 2008; The Wellcome Trust Case Control Consortium, 2007). Two leading explanations have been put forth to explain this lack of power and progress in identifying causal common variants for common traits/disorders.
One points to the small genetic effects, thought to be significant for susceptibility loci associated with complex traits/diseases, for example, odds ratios of 1.1–1.5 (Ioannidis, Trikalinos, & Khoury, 2006; Manolio et al., 2008). The other points to the remarkably heterogeneous genetic mechanisms underlying what are perceived as behaviorally homogeneous common traits/diseases (Munson et al., 2008; Ring et al., 2008; Simms et al., 2009; Sutcliffe, 2008; Weiss, 2009). To address the issue of effect sizes, even larger samples have been called for (Manolio, 2010), although there have been some disagreements regarding the potential yield of these calls (Stein & Elston, 2009). To understand the heterogeneity, there has been a growing interest in the role of rare genetic variants in the etiology of complex disorders (Ahituv et al., 2007; Cohen et al., 2004; Cohen et al., 2006; Ji et al., 2008; Romeo et al., 2007; Romeo et al., 2009; Zhu et al., 2010). The impact of a rare variant is often circumscribed to specific isolated families or even specific individuals. This, in turn, assumes the presence of many heterogeneous rare variants, both transmitted in families and arising de novo, the large-effect (Gorlov, Gorlova et al., 2008) impact of which triggers homogeneous (or semihomogeneous) behavioral manifestations.
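To give a sense of why odds ratios in the 1.1–1.5 range are hard to detect, the following sketch computes an allelic odds ratio and an approximate 95 percent confidence interval (Woolf's logit method) from a 2 × 2 table of allele counts. The counts are hypothetical and chosen only so that the odds ratio falls in the range quoted above; they do not come from any study cited here.

```python
import math


def odds_ratio_with_ci(case_risk, case_other, control_risk, control_other, z=1.96):
    """Odds ratio and Woolf (logit) confidence interval from allele counts."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / case_risk + 1 / case_other
                          + 1 / control_risk + 1 / control_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 1,000 alleles sampled from cases and 1,000 from controls.
odds_ratio, lower, upper = odds_ratio_with_ci(
    case_risk=330, case_other=670, control_risk=300, control_other=700
)
print(odds_ratio, lower, upper)
```

With counts of this size, the interval around a small odds ratio spans 1, so the effect could not be distinguished from no effect; this is one way to see why calls for larger samples followed.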
Capitalizing on these observations, a number of theoretical analyses (Pritchard, 2001; Pritchard & Cox, 2002; Reich & Lander, 2001) and simulation studies (Peng & Kimmel, 2007; Pritchard, 2001; Reich & Lander, 2001) of the genetics of common diseases/disorders have been conducted.
Numerous interesting observations have been made in this research (Peng & Kimmel, 2007). One of these, which is particularly pertinent to this discussion, is that if the genetic etiology of a common disease/disorder assumes the involvement of multiple loci (i.e., if one of the major assumptions of the CDCV hypothesis is invoked), then a diverse allelic spectrum with rare causal alleles should be anticipated for at least some of these loci (i.e., one of the major assumptions of the CDRV is summoned as well). This diversity is substantiated by the existence of a much larger than anticipated array of normal genetic/genomic variation that has to be taken into account when searching for the candidate genes or functional elements associated with common traits/disorders (Conrad et al., 2009).
More specifically, the central assumption dominating the field upon the completion of the first draft of the human genome (Lander et al., 2001) was that delineating the main type of genomic/genetic variation between two individual human beings would be captured primarily (although not exclusively) by their complement of single nucleotide polymorphisms (SNPs, a common variation in each of the nucleotides, A, T, C, or G, of the DNA sequence). This type of variation was expected to affect about 0.1 percent of the total genomic sequence. Yet, subsequent research has not confirmed that delineation and, instead, has revealed copy-number-variation and structural-variation (CNV/SV) to be the main sources of variation between humans.Footnote 7 Indeed, the completion of additional human genome sequences overrode the initial estimate of 0.1 percent and stipulated that the degree of variation between two “normal” individual genomes is much larger. The nature of this variation is very complex: In addition to SNPs, each genome contains an abundance of very small insertions and deletions (indels, the insertion or the deletion of bases in the DNA sequence longer than 1 nucleotide), and a large amount of CNV/SV, where entire blocks of the DNA sequence, ranging in size from just 1 kb to several Mb, have been inserted, deleted, inverted, or translocated (Conrad et al., 2009).
To illustrate, about 45 percent of the human genome consists of transposons and other retroelements such as LINE-1 and Alu sequences (Lander et al., 2001). In our understanding of the formation and role of CNV/SV, such retroelements are multifunctional. Specifically, their excision from one position or insertion into another constitutes a smaller CNV in itself. Yet, simultaneously, retroelements are frequently clustered around the ends (breakpoints) of larger CNV/SV, leading to the hypothesis that they can give rise to larger events through mechanisms such as nonallelic homologous recombination, NAHR (Kim et al., 2008), a mechanism that is thought to be responsible for genomic syndromes such as DiGeorge Syndrome (Dittwald et al., 2013), the core of whose manifestation includes, among other features, learning difficulties.
The role of transposable elements in evolution in general and in human evolution, in particular, has long been recognized (Skipper et al., 2013). Recent literature has accumulated evidence regarding the potentially important role of transposable element activity. This role appears to be important in both typical and atypical pathways of development, especially in nervous system differentiation (Duranthon et al., 2012). Transposition events can interfere with the function of the genome by inserting themselves directly into transcribed sequences, disrupting genes themselves or their regulatory elements (or nearby elements), and thereby altering their activity. The quest to understand the role of transposable elements in a genome-wide fashion is ongoing and has only become possible with modern genomic technology. Currently, it is estimated that about 0.05 percent of all transposable elements are still capable of transposition and that about thirty-five to fifty subfamilies of Alu, LINE-1, and SVA elements remain actively mobile (Dewannieux & Heidmann, 2013).
Parallel to this fundamental general realization depicting the nature of normal variation in the human genome is the appearance of an increasing number of studies that link not only point mutations and functional SNPs but also events such as CNV/SV to phenotypic effects, both in atypically and typically developing individuals. Yet, technology is only now becoming available that may allow the linking of comprehensively mapped genotypes consisting of all classes/sizes of variation events to clinical phenotypes. Normal human genomic variation has to be taken into account when trying to understand complex traits/disorders, both in terms of causative and modifying events, if indeed a complex trait/disorder is caused not by a single genetic event of strong effect but a combination of variants each of small effect, or by a rare variant of medium-strong effect embedded in a background of modifying normal variants (i.e., the merge of CDCV and CDRV hypothesis). The data on both normal and disorder-related human genomic variation have been rapidly accumulating through large-scope collaborative efforts (e.g., 1000 Genomes Project, 100,000 Genomes Project).
The chapter so far, I hope, has illustrated the “repertoire” available to the genome to control both typical and atypical development. Clearly, reading development is not an exception and, as it appears now, the genome has possibly exercised many of its tricks in substantiating the diversity in the human brain structure and function, which is reflected, in turn, in the diversity of human development in general and reading development in particular. Next, I will present a brief overview of the relevant literature that attempts to illustrate how the two hypotheses, CDCV and CDRV, may be exemplified in the current literature on the molecular-genetic bases of typical and atypical reading.
The CDCV hypothesis has capitalized on the frequent manifestation of reading difficulties in the general population, while the other, CDRV, has capitalized on the infrequent manifestation of severe reading difficulties in extended families highly dense with SRD. These studies (Grigorenko, 2005) can be subdivided into a number of major overlapping categories by the type of samples they engage with (i.e., genetically unrelated cases/probands and matched controls or family units such as siblings or nuclear and extended families) and by the type of genetic units they target (i.e., specific genes, specific genetic regions, or the whole genome). For all approaches combined, there are references to at least twenty (Schumacher et al., 2007) potential genetic susceptibility loci (i.e., regions of the genome that have demonstrated a statistically significant linkage to SRD; typically these regions involve more than one and often hundreds of genes) and about a dozen “official” (Grigorenko & Naples, 2009; Peterson & Pennington, 2012) candidate genes (i.e., genes located within susceptibility loci that have been statistically associated with SRD). Yet, none of these loci or genes have been either fully accepted or fully rejected by the field. Moreover, new regions and candidate genes are being presented on a regular basis, and both lists are likely to continue expanding (Becker et al., 2017; Rubenstein et al., 2011).
Numerous genome-wide screens for SRD and reading- (and writing)-related components have been reported (Brkanac et al., 2008; de Kovel et al., 2004; Eicher et al., 2013; Fagerheim et al., 1999; Field et al., 2013; Fisher et al., 2002; Gialluisi et al., 2014; Gialluisi et al., 2019; Igo et al., 2006; Kaminen et al., 2003; Luciano et al., 2013; Meaburn et al., 2008; Nopola-Hemmi et al., 2002; Price et al., 2020; Raskind et al., 2005; Roeske et al., 2011; Svensson, 2011; Truong et al., 2017). These studies, driven by the CDCV hypothesis, utilized hundreds of thousands of genetic markers as technology and cost permitted. They have generated numerous suggestive findings, but the overwhelming majority of them are inconsistent, with a low replicability coefficient and small effects.
In discussing this pattern of results, typically, a reference is made to the low statistical power (as evidenced through relatively small sample sizes) of the original whole-genome studies (Gialluisi et al., 2019), although newer studies with large samples (Gialluisi et al., 2019; Price et al., 2020) still produce inconsistent results, suggesting that the difficulty in creating a texture of replicable findings is related not only to the issue of statistical power but likely also to some other issues, such as the overall credibility of the CDCV hypothesis (Gibson, 2012) or the categorization of developmental disorders (Peters & Ansari, 2019). There are also studies that focus on particular regions of the genome (Deffenbacher et al., 2004; Francks et al., 2004). The selection of these regions is typically determined either by a previous whole-genome scan or by a theoretical hypothesis capitalizing on SRD and its componential processes (Skiba et al., 2011).Footnote 8
Yet, some of the studies settled on candidate regions through different means, such as a known chromosomal aberration, that is, through the verification of the CDRV hypothesis. Denmark, for example, has a health policy of screening all newborns for macro-chromosomal changes (e.g., large rearrangements). In these cases, researchers can screen individuals who have such rearrangements for the presence of SRD (Buonincontri et al., 2011). The hypothesis is that a gene affected by such an aberration is somehow related to SRD. In addition, as the ultimate goal of this work is to identify specific genes whose functions are related to the transformation of a brain into a reading brain, a number of candidate genes for reading difficulties have been identified: ROBO1 (Hannula-Jouppi et al., 2005), DYX1C1 (Taipale et al., 2003), and SEMA6DFootnote 9 (Ercan-Sencicek et al., 2012). All of these genes have been detected through studies of single extended families (Taipale et al., 2003) or individual cases (Ercan-Sencicek et al., 2012).Footnote 10 Systematic explorations of the importance of different types of structural variation in the field of reading have been few. They have addressed large events, that is, insertions and deletions larger than 1 Mb (Girirajan et al., 2011), and copy number variants (CNVs) smaller in size, with a median total length of ~640 kb covered by CNVs per sample, or ~479 kb considering only CNVs annotated to genes (Gialluisi et al., 2016).
It is important to stress that large structural variants are relatively rare (e.g., <1 percent of the general population), and the underlying assumption here is that the identification of such rare variants will provide a clue for subsequent studies of the gene or genes affected by the structural alteration, or of the pathway in which they are involved. This is especially relevant to investigations of the genetic bases of complex traits such as reading abilities or disabilities. The idea is that once a rare variant is identified and associated with a particular trait (e.g., reading), there is a need to investigate common variation in the gene/region that was impacted by this rare variant. In the field of reading, an example of such a transition from a rare variant to a continuous trait is the research on ROBO1 (Bates, Luciano, Montgomery et al., 2011).
As indicated, there are “official” candidate genes, each evaluated as a causal gene for DD and reading-related difficulties in at least two independent studies or datasets. These include:
DYX1C1, now referred to as DNAAF4 (Currier et al., 2011; Taipale et al., 2003) at 15q21;
KIAA0319 (Cope et al., 2005; Dennis et al., 2009; Francks et al., 2004; Harold et al., 2006; Sánchez-Morán et al., 2018) at 6p22;
DCDC2 (Li et al., 2018; Marino et al., 2012; Meng et al., 2005; Riva et al., 2019; Schumacher et al., 2006) at 6p22;
ROBO1 (Bates, Luciano, Medland et al., 2011; Hannula-Jouppi et al., 2005; Tran et al., 2014) at 3p12;
GRIN2B (Ludwig et al., 2010; Mascheretti et al., 2015) at 12p13;
FOXP2 (Peter et al., 2011; Sánchez-Morán et al., 2018; Wilcke et al., 2012) at 7q31.1;
CNTNAP2 (Newbury et al., 2011; Peter et al., 2011; Vernes et al., 2008) at 7q35.
However, more genes have been reported as putative additions to this list (e.g., Buonincontri et al., 2011; Ercan-Sencicek et al., 2012; Newbury et al., 2011; Scerri et al., 2010). At this point, the field contains both support and lack of support for the involvement of each of these genes; thus, the findings are somewhat difficult to interpret. In addition, there is an ongoing debate regarding the specificity of the impact of SRD-related genes. There is growing evidence that a high degree of pleiotropy is exerted by at least some of these genes. For example, the DYX1C1/DNAAF4 gene has been implicated in primary ciliary dyskinesia (PCD), a disorder manifested through chronic airway disease, laterality defects, and male infertility (Tarkar et al., 2013). The DCDC2 gene has been shown to play a role in kidney disease (Schueler et al., 2015). Importantly, many of the “SRD (which can also be SWD)” genes have been implicated in other neurodevelopmental and psychiatric disorders. Specifically, KIAA0319 has been featured in speech and sound disorders (SSD) (Eicher et al., 2015), language difficulties (Rice, Smith, & Gayán, 2009), ADHD (Mascheretti et al., 2017), and Autism Spectrum Disorder (ASD) (Eicher & Gruen, 2015). DCDC2 has been associated with disorders such as SSD (Eicher et al., 2015) and ADHD (Mascheretti et al., 2017), but also with general cognitive ability (Davies et al., 2018).
ROBO1 has been identified as an ASD-associated gene (Anitha et al., 2008; Iossifov et al., 2012), and its pathway, SLIT/ROBO, has been featured in the literature on schizophrenia (Brennand et al., 2011). Clearly, more time and effort will be needed to understand each gene’s involvement with reading and its related processes.
13.4 Conclusions and Discussion
In this brief chapter, I have attempted to trace the evolution of the genetic bases of (a)typical reading and writing, that is, the two fundamental skills that underlie the acquisition of literacy. While summarizing and interpreting the literature, I have underscored the general differentiation of the CDCV and CDRV hypotheses and lines of research into the following bases: from simplistic to complex, from one kind to many kinds, from deterministic to highly probabilistic, and from responsive to environmental pressures. These two hypotheses still dominate the literature, yet there is a growing understanding that they are likely to only partially capture the complexities of the genetic mechanisms involved in the manifestation of complex human traits and the related neurodevelopmental disorders.
Given the impetus behind this volume, two issues appear to be particularly important to consider. The first issue pertains to the universality of the mechanisms discussed in this chapter regarding the linguistic and cultural diversity in which global literacy exists. Notably, the overwhelming majority of both quantitative and molecular-genetic studies of reading and writing have been carried out with English-speaking (or, less frequently, other European-language-speaking) samples and in high- and middle-income countries. When considered collectively, they reflect a pattern of results that demonstrates enough convergence to believe that, fundamentally, the genetic mechanisms that set up the brain for the acquisition of reading and writing are the same. Yet, the field is still at the very beginning of the inquiry into these mechanisms and their specificity and/or generality. Clearly, more studies of non-European languages and writing systems from around the world will be necessary to appraise the extent of the universality of the genetic mechanisms that preconfigure the human brain for acquiring literacies in their narrow and broad definitions.
Relatedly, more studies are needed to sample not only different languages but also different countries and cultures. This might appear counterintuitive as the focus is on genetic mechanisms, but two considerations are particularly important here. The first pertains to the nature of quantitative genetic approaches that generate heritability estimates. As the phenotypic variability for any trait within the corresponding framework is always fixed at 100 percent, the contributions of its sources, summed, are also fixed at 100 percent. To illustrate, the variability in schooling in high-income countries is substantially lower than that in lower-income countries, so it is likely that the distribution of heritability and environmentality (the estimate of the importance of merged, that is, all contributing environments) will also be different. When environmental variability is constrained and every child gets quality schooling, individual differences will be more sensitive to the influences of genetic factors. When environmental variability is huge, and some children go to high-quality private schools, whereas others do not go to school at all, the environment overrides the genetic variation. The propensity of a human brain to transform into a literate brain then depends on print exposure and effective teachers because children cannot learn to read and write without being taught how to read and write. Also, essentially, the role of the genetic mechanism in reshaping the illiterate brain of a young child into the literate brain of an adult only establishes the range of individual differences for the parameters defining this transformation (e.g., the maximum and minimum speed and accuracy with which people as sampled from the general population, however defined, can acquire the skills of reading and writing); it does not determine the mean value in the population.
That mean value is established and constantly redefined by a society which forms expectations regarding the types and depth of orthographies it requires (see Perfetti & Verhoeven, Chapter 11 in this volume). And most importantly, these expectations are translated from societies to their child-members by educators; collectively, teachers in their well-organized evidence-based classrooms can continuously move this mean up, substantially and perhaps indefinitely!
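The arithmetic behind the heritability argument above can be made concrete with a minimal sketch. It assumes a purely additive model in which phenotypic variance is the sum of one genetic and one environmental component, with no gene-environment interaction or covariance. The function name `variance_shares` and all variance values are hypothetical and chosen only for illustration; they are not estimates from any study cited in this chapter.

```python
def variance_shares(genetic_var, environmental_var):
    """Return (heritability, environmentality) as shares of total variance.

    Assumes a purely additive model: phenotypic = genetic + environmental.
    """
    if genetic_var < 0 or environmental_var < 0:
        raise ValueError("variances must be non-negative")
    total = genetic_var + environmental_var
    if total == 0:
        raise ValueError("total phenotypic variance must be positive")
    return genetic_var / total, environmental_var / total


# Hypothetical values: the genetic variance is held constant, and only the
# spread of schooling environments differs between the two scenarios.
GENETIC_VAR = 1.0
uniform_schooling = variance_shares(GENETIC_VAR, environmental_var=0.5)
unequal_schooling = variance_shares(GENETIC_VAR, environmental_var=4.0)

for label, (h2, e2) in [
    ("constrained environments", uniform_schooling),
    ("highly variable environments", unequal_schooling),
]:
    # The two shares always sum to 1, i.e., 100 percent of the variance.
    print(f"{label}: heritability={h2:.2f}, environmentality={e2:.2f}")
```

Under these assumed numbers, the same genetic variance yields a heritability of about 0.67 when environments are constrained and 0.20 when they vary widely. Neither figure says anything about the mean level of reading skill in the population, which is consistent with the point that the mean is set by societal expectations and teaching.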
I started this chapter with the features of reading (and writing) that I find fascinating and highlighted the very quick ascent of reading to the pinnacle of the hierarchy of human skills, its universal value among millions of people, and its ever-changing texture. For these features to be present, the brain must exercise plasticity in its absolute essence. And for the brain to become a reading and writing brain (and a literate brain) with ever-expanding capacities for literacy (as I believe literacy will diversify its presentation and, likely, become more and more central to human civilization), it has to be substantiated by a set of diverse genetic/genomic mechanisms that are multiple, complex, and elegant. These mechanisms can and must be cataloged. Just give the field some time.
Author Note
This work was supported by the US National Institutes of Health, awards P50 HD052117 (PI: Fletcher), P20 HD091005 (PI: Grigorenko), and P50 HD052120 (PI: Wagner). Grantees are encouraged to express their professional judgment; this essay does not necessarily represent the policies or positions of the NIH. I would like to thank Ms. Nicole Guha for her editorial support.