Atypical Reinforcement Learning in Developmental Dyslexia

Abstract Objectives: According to the Procedural Deficit Hypothesis, abnormalities in corticostriatal pathways could account for the language-related deficits observed in developmental dyslexia. The same neural network has also been implicated in the ability to learn contingencies based on trial and error (i.e., reinforcement learning [RL]). On this basis, the present study tested the assumption that dyslexic individuals would be impaired in RL compared with neurotypicals in two different tasks. Methods: In a probabilistic selection task, participants were required to learn reinforcement contingencies based on probabilistic feedback. In an implicit transitive inference task, participants were also required to base their decisions on reinforcement histories, but feedback was deterministic and stimulus pairs were partially overlapping, such that participants were required to learn hierarchical relations. Results: Across tasks, results revealed that although the ability to learn from positive/negative feedback did not differ between the two groups, the learning of reinforcement contingencies was poorer in the dyslexia group compared with the neurotypicals group. Furthermore, in novel test pairs where previously learned information was presented in new combinations, dyslexic individuals performed similarly to neurotypicals. Conclusions: Taken together, these results suggest that learning of reinforcement contingencies occurs less robustly in individuals with developmental dyslexia. Inferences for the neuro-cognitive mechanisms of developmental dyslexia are discussed.


INTRODUCTION
Developmental dyslexia is one of the most common neurodevelopmental disorders, characterized by a selective impairment in reading skill acquisition despite conventional instruction, adequate intelligence, and sociocultural opportunity. The dominant hypothesis as to the etiology of dyslexia proposes a deficit in direct access to, and manipulation of, phonemic language units retrieved from long-term declarative memory as the underlying cause of dyslexia (Snowling, 2000). Yet, burgeoning research indicates that people with dyslexia have a wide range of nonlinguistic deficits that are difficult to explain by a phonological impairment (Farmer & Klein, 1995; Fawcett & Nicolson, 2019; Lum, Ullman, & Conti-Ramsden, 2013).

Procedural Learning Dysfunction in Dyslexia
Recent conceptualizations of dyslexia implicate domain-general procedural learning systems in its etiology.

Reinforcement Learning
RL is the process by which individuals learn by trial and error to make choices that maximize the likelihood of rewards and minimize the occurrence of penalties (Sutton & Barto, 1998). The learner is not told explicitly which action to take, but instead must discover which actions yield the highest reward by trying them out. RL has been shown to be critically dependent on the basal ganglia (Daw, Niv, & Dayan, 2005; Schultz, Dayan, & Montague, 1997; Schultz, 1999). Frank, Seeberger, and O'Reilly (2004) suggested a neurocomputational model in which dopamine levels play a significant role in RL. According to their model, the basal ganglia act as a gating system that reinforces neural firing in the frontal cortex that is linked to appropriate actions while suppressing actions that are less appropriate. The gating function of the basal ganglia is suggested to be modulated by the dopaminergic system. In order to test their model, Frank et al. employed two procedural RL tasks in patients with Parkinson's disease who either received or did not receive dopamine medications. In the Probabilistic Selection (PS) Task employed by Frank et al., participants are required to choose one of two presented stimuli, based on reinforcing feedback. The feedback is probabilistic, such that participants need to learn which choice is most frequently rewarded based on reinforcement histories. Some stimuli are more strongly associated with positive feedback and others with negative feedback (for instance, stimulus A is associated 80% of the time with positive feedback, whereas stimulus B is associated 20% of the time with positive feedback). Participants are therefore required to learn to choose A over B depending on either positive or negative feedback or both. In a test phase, participants are presented with novel pairs in which the original stimulus is paired with a new one.
Examination of participants' performance during this test enables one to assess whether participants are more inclined to use positive (will choose A in all combinations that contain A) or negative (will avoid B in all combinations that contain B) learning strategies.
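To illustrate how trial-and-error learning can recover the 80%/20% contingencies described for the PS task, the following minimal sketch tracks a value estimate for each stimulus with a generic delta rule (a prediction-error update). It is not the neurocomputational model of Frank et al.; the function name, learning rate, starting value, and number of trials are assumptions chosen only for illustration.

```python
import random

def simulate_delta_rule(p_reward, n_trials=2000, alpha=0.05, seed=0):
    """Track a value estimate per stimulus with a simple delta rule.

    p_reward maps a stimulus name to its probability of positive feedback.
    alpha (learning rate) and n_trials are illustrative assumptions.
    """
    rng = random.Random(seed)
    stimuli = list(p_reward)
    values = {stim: 0.5 for stim in stimuli}  # start at indifference
    for _ in range(n_trials):
        stim = rng.choice(stimuli)  # try one stimulus on this trial
        reward = 1.0 if rng.random() < p_reward[stim] else 0.0
        # Move the estimate toward the outcome by a fraction of the prediction error.
        values[stim] += alpha * (reward - values[stim])
    return values

# Stimulus A is rewarded on 80% of trials, stimulus B on 20% (as in the AB pair).
values = simulate_delta_rule({"A": 0.8, "B": 0.2})
print(values)
```

With enough trials, the estimate for A settles near its reward probability and the estimate for B near its own, so a learner that prefers the higher-valued stimulus ends up choosing A over B.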
The other task employed by Frank et al. is an implicit Transitive Inference Task (TI). Although transitive inference (choosing A over C based on knowing that A is better than B and that B is better than C) is assumed to reflect a declarative logical inference process, it is also possible to learn such hierarchical relations implicitly based on associative learning. In a typical TI problem, the reinforcement for each stimulus is deterministic but stimulus pairs partially overlap. In particular, participants are trained on a series of simultaneous discrimination problems (A+B−, B+C−, C+D−, D+E−, E+F−) where "+" and "−" refer to the rewarded and non-rewarded choices, respectively. A hierarchy (A > B > C > D > E) is learned, whereby stimuli close to the top of the hierarchy develop a net positive associative strength, and those near the bottom develop a net negative associative strength. Consistent with their model, Frank et al. observed that Parkinson's patients off medication were better at learning to avoid choices that lead to negative outcomes than at learning from positive outcomes across both the PS and TI tasks. Furthermore, medications designed to increase the level of striatal dopamine reversed this bias. In a later study, Frank, O'Reilly, and Curran (2006) showed that a drug aimed at blocking hippocampal function did not impair learning in either the PS or TI tasks, implying that these processes are independent of hippocampal function. Notably, Smith and Squire (2005) observed that patients with hippocampal damage were impaired in novel test pairs that were not encountered during the training phase (e.g., the transitive pair BD), suggesting that a mixture of associative and relational processes occurs in the TI task.
Since Frank's seminal study, the PS and TI tasks have been tested in many special (Frank, Santamaria, O'Reilly, & Willcutt, 2007; Lee & Tomblin, 2012; Solomon et al., 2015) and patient populations (Titone, Ditman, Holzman, Eichenbaum, & Levy, 2004; Waltz, Frank, Robinson, & Gold, 2007), but not among individuals with dyslexia.
The Procedural Deficit Hypothesis (PDH) suggests that developmental dyslexia arises from a selective dysfunction in the corticostriatal network, whereas medial temporal lobe (MTL) structures are hypothesized to be intact or even enhanced (Krishnan et al., 2016; Ullman et al., 2020). Therefore, one could hypothesize that RL would be impaired in dyslexia. Here, we tested this hypothesis by examining the PS and TI tasks among individuals with dyslexia and typical readers.

Participants
The sample consisted of 40 university students, 20 individuals with dyslexia and 20 typical readers. All were native speakers of Hebrew with no history of neurological disorders, psychiatric disorders, or attention deficits (according to the American Psychiatric Association, 2000). In addition, all participants had normal or corrected-to-normal vision and had normal hearing. The dyslexia group was recruited from the Yael Learning Disabilities Center at the University of Haifa, Israel. A documented diagnosis of a comorbid learning disability such as Attention Deficit Hyperactivity Disorder (ADHD) or Specific Language Impairment (SLI) or any sensory or neurological impairment served as exclusion criteria. The inclusion criteria for the dyslexia group were (1) a formal diagnosis of dyslexia by a qualified psychologist; (2) a score of at least one standard deviation below the average of the local norms in tests of phonological decoding (non-word reading). Since there are no standardized reading tests for adults in Hebrew, selection was based on local norms, using similar criteria to other studies conducted on Hebrew readers with dyslexia (Gabay, Najjar, & Reinisch, 2019; Weiss, Katzir, & Bitan, 2016; Yael, Tami, & Tali, 2015). Scores of one standard deviation below the mean of the local norms were chosen following the standard practice in the Hebrew literature (Breznitz & Misra, 2003; Shany & Breznitz, 2011). The control group consisted of individuals with no reading problems (i.e., above the inclusion criteria of the dyslexia group on the non-word reading test) and the same level of cognitive ability. The study was approved by the Institutional Review Board of the University of Haifa and was conducted in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants, who were compensated for their participation in the study (120 new Israeli shekels, approximately $30).
All participants performed a series of cognitive tests to evaluate general cognitive ability, verbal working memory, rapid automatized naming, reading skills, and phonological awareness. Details about these tasks are presented in Table 1. As indicated by results shown in Table 2, the groups did not differ in age or cognitive ability. However, compared to the control group, the dyslexia group displayed a profile of reading disability compatible with the symptomatology of dyslexia.

Experimental Procedure
Two RL tasks were used, following the procedure of Frank, Seeberger, and O'Reilly (2004). A previous study demonstrated good test-retest reliability for similar learning tasks (Weidinger, Gradassi, Molleman, & van den Bos, 2019).

Probabilistic Selection Task
Three different stimulus pairs (AB, CD, and EF) were presented randomly in the PS task (see Figure 1a). For each pair, participants were required to learn to choose one of the two stimuli. After the participant's choice, probabilistic feedback followed to indicate whether the choice was correct or incorrect. In 80% of AB trials, a choice of stimulus A led to correct (positive) feedback, whereas the B choice led to positive feedback in only 20% of these trials. CD and EF pairs were less reliable: Stimulus C was correct in 70% of CD trials, whereas E was correct in 60% of EF trials. Participants practiced the task until they reached a performance criterion in order to ensure that all participants reached the same level of performance before moving to the test phase. Since the structure of the task was probabilistic, a different criterion was used for each stimulus pair (65% A in AB, 60% C in CD, 50% E in EF; evaluated after each training block of 60 trials). After reaching this criterion, participants performed the test phase in which they were tested on the same training pairs and on all novel combinations of stimuli. The novel combinations of stimulus pairs involved either an A (AC, AD, AE, AF) or a B (BC, BD, BE, BF). During this test phase, no feedback was provided to participants. Each test pair was presented 4 times (for a total number of 60 trials).

Table 1
1. Block Design Subtest (Wechsler, 1997). The Block Design subtest first requires breaking down each design presented into logical units and then a reasoned manipulation of blocks to reconstruct the original design from separate parts. Task administration is discontinued after a failure of two blocks.
2. Digit Span Subtest. Verbal working memory was assessed by the Digit Span subtest from the Wechsler Intelligence test for adults (Wechsler, 1997). In this task, participants are required to recall the names of the digits presented auditorily in the order they appeared, with a maximum total raw score of 28. Task administration is discontinued after a failure to recall two trials with a similar length of digits. Test reliability coefficient is .9.
3. Rapid Automatized Naming. Naming skills were assessed by the Rapid Automatized Naming task (RAN) (Breznitz, 2003). The tasks require oral naming of rows of visually presented exemplars drawn from a constant category (RAN colors, RAN categories, RAN numerals, and RAN letters). It requires not only the retrieval of a familiar phonological code for each stimulus but also coordination of phonological and visual (color) or orthographic (alphanumeric) information quickly in time.
4. The One Minute Test of Words (Shatil, 1995b) and the One Minute Test of Non-words (Shatil, 1995a), which assessed the number of words and non-words accurately read aloud within one minute. The One Minute Test of Words contains 168 non-vowelized words of an equivalent level of difficulty, listed in columns. Both accuracy (number of correct words read per minute) and speed (number of items read per minute) were measured. The One Minute Test of Non-words contains 86 successively difficult vowelized non-words, listed in seven columns. Both accuracy (number of correct words read per minute) and speed (number of items read per minute) were measured.
5. The Phoneme Deletion test (Breznitz & Misra, 2003) and Spoonerism Test (developed by Peleg and Ben-Dror) were used to assess phonological awareness. The Phoneme Deletion test contains 25 non-words. In this test, the experimenter read a word and a specific phoneme, and the participant was required to repeat the word without that phoneme. In the Spoonerism Test, the participant is required to switch the first syllables of a word pair and then to synthesize the segments to provide new words; for example, the word pair brown sugar becomes srown bugar. For both tests, both accuracy (number of correct letters/objects read per minute) and time (the time that participants need to complete the task) were measured. A Hebrew version of this test was used in the present study.
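The PS task design described above (three training pairs, per-pair criteria, and a test phase covering all stimulus combinations) can be sketched in a few lines. The sketch assumes that "all novel combinations" means every pairing of the six stimuli other than the training pairs, which is consistent with the stated total of 60 test trials (15 pairs shown 4 times each). It also assumes the criterion is met when accuracy is at or above the stated percentage. All identifiers are hypothetical.

```python
from itertools import combinations

STIMULI = "ABCDEF"
TRAINING_PAIRS = ["AB", "CD", "EF"]
# Minimum proportion of A, C, and E choices per pair, as stated in the text.
CRITERIA = {"AB": 0.65, "CD": 0.60, "EF": 0.50}
REPETITIONS_PER_TEST_PAIR = 4

def criterion_met(block_accuracy):
    """Return True if every training pair reached its criterion in a block.

    Assumption: 'reaching' the criterion means accuracy >= the stated value.
    """
    return all(block_accuracy[pair] >= CRITERIA[pair] for pair in TRAINING_PAIRS)

# Every pairing of the six stimuli: 3 training pairs plus 12 novel pairs.
all_pairs = ["".join(p) for p in combinations(STIMULI, 2)]
novel_pairs = [p for p in all_pairs if p not in TRAINING_PAIRS]

# Transfer pairs used to assess choose-A and avoid-B tendencies.
transfer_pairs_a = [p for p in novel_pairs if "A" in p]
transfer_pairs_b = [p for p in novel_pairs if "B" in p]

n_test_trials = len(all_pairs) * REPETITIONS_PER_TEST_PAIR
print(transfer_pairs_a, transfer_pairs_b, n_test_trials)
```

For example, a block with accuracies of .70 (AB), .60 (CD), and .55 (EF) would satisfy the criterion, whereas a block with .60 on AB would not.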

Implicit Transitive Inference Task
In this task, the reinforcement for each stimulus pair was deterministic, but stimulus pairs partially overlapped. Four pairs of stimuli were presented: A+B−, B+C−, C+D−, and D+E−, where + and − indicate positive and negative feedback, respectively (see Figure 1b). The training session was composed of 4 phases of blocked trials followed by a fifth phase of interleaved trials. Meeting a performance criterion of at least 75% correct choices across all trials was required for each phase. If the criterion was not met, the phase was repeated. Initially, stimulus pairs were presented in pure blocks of six trials (six trials of AB followed by six trials of BC and so on). Then, the blocks were shortened in the second phase (four successive trials of each pair per block). The third phase had three trials per block, whereas the fourth had two trials per block. However, in the fifth phase, all pairs were randomly interleaved for a total of 24 trials (six repetitions for each pair) before criterion performance was measured. If the criterion was not met, the random sequence was repeated. The test phase was similar to the fifth training phase, in that all pairs were randomly interleaved and each was presented 6 times (for a total of 36 trials). However, no feedback was provided, and two transitive pairs (BD and AE) were added to the mix of randomly ordered pairs.
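The training and test schedule of the TI task can be sketched as follows. This is an illustration under stated assumptions rather than the script used in the study: the text gives the block length for each blocked phase but not how many block cycles a phase contained, so the sketch generates a single cycle, and all function and variable names are hypothetical.

```python
import random

TRAINING_PAIRS = ["AB", "BC", "CD", "DE"]  # first letter is the rewarded choice
TRANSITIVE_PAIRS = ["BD", "AE"]            # novel pairs, shown only at test
BLOCK_LENGTHS = [6, 4, 3, 2]               # successive trials per pair, phases 1-4

def blocked_cycle(block_length):
    """One pass through the pairs in pure blocks, e.g. AB x6, then BC x6, ..."""
    return [pair for pair in TRAINING_PAIRS for _ in range(block_length)]

def interleaved(pairs, repetitions, rng):
    """Randomly ordered trials with a fixed number of repetitions per pair."""
    trials = [pair for pair in pairs for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials

def phase_passed(n_correct, n_trials, criterion=0.75):
    """A phase is passed with at least 75% correct choices across its trials."""
    return n_correct / n_trials >= criterion

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
phase1_cycle = blocked_cycle(BLOCK_LENGTHS[0])
phase5 = interleaved(TRAINING_PAIRS, 6, rng)
test_phase = interleaved(TRAINING_PAIRS + TRANSITIVE_PAIRS, 6, rng)
print(len(phase5), len(test_phase))
```

The interleaved fifth phase yields 24 trials (4 pairs, 6 repetitions) and the test phase 36 trials (6 pairs, 6 repetitions), matching the counts given above.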

Procedure
Participants were invited to three sessions. In the first session, participants completed a battery of tests to assess multiple indicators of language skills and general cognitive abilities. In the following two sessions, they performed either the PS or the TI task. The order of the two tasks was counterbalanced across participants, and the second and third sessions were separated by one week. Participants were seated approximately 40 cm in front of a 24-inch computer screen with a resolution of 1920 × 1200 on which the visual paired stimuli were presented in black over a white background (2° height and 2° width). They were instructed to press keys (i.e., either the number 1 or 2 key on the keyboard) to indicate which of the paired stimuli they thought was correct. Visual feedback was provided immediately after participants' choice for a duration of 1.5 s. If the computer did not detect a response, the words 'No Response Detected' in red print appeared at the center of the screen. Participants could take a brief break between two consecutive blocks if needed. Each task lasted approximately 15-25 min, depending on the number of blocks participants performed during the acquisition phase. Stimulus presentation and the recording of response time and accuracy were controlled by E-prime (Schneider, Eschman, & Zuccolotto, 2002).

Statistical Analysis
The power of the study was calculated using G*Power software (Faul, Erdfelder, Lang, & Buchner, 2007; see supplementary materials). The following analyses were conducted for both the PS and TI tasks. We first examined whether the learning of the training pairs during the acquisition phase differed significantly between the two groups. For this purpose, we conducted (1) two-sample t tests to compare the number of training trials required for participants to reach criterion during the training phase before moving to the test phase; (2) a mixed ANOVA to examine group differences for each training pair in the first training block/during the first acquisition phase; and (3) for the PS task only, two-sample t tests to examine the influence of reinforcement feedback on rapid early acquisition (see supplementary materials). Next, the learning of stimulus pairs was assessed by using a mixed ANOVA to examine group differences during the post-acquisition test phase. Finally, in order to test for possible differences in the ability to learn from positive or negative feedback, we used a mixed ANOVA. For the TI task, we also examined the performance of both groups on novel test pairs.

RESULTS
Probabilistic Selection Task

Acquisition of Training Pairs
There was no difference in the number of trials required by the two groups to reach criterion. An ANOVA was conducted, using training pairs (AB, CD, EF) as a within-subject factor, group (Dyslexia vs. Control) as a between-subject factor, and mean proportion of correct responses during the first block of the acquisition phase as the dependent variable (see results in Figure 2a). Only the main effect of training pairs was significant, F(2, 76) = 3.58, p = .03; ηp² = .08. Further analyses suggested that the difficult reward contingencies (EF) were harder to learn compared with the other, easier reward contingencies.

We examined whether the data contained outliers, defined as scores more than 2.5 SD above or below the mean (calculated for each group separately). No outliers were detected in the PS task. In the TI task, one control participant performed significantly below the mean (Z = −3.65) and was excluded from the analysis.
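The outlier rule (scores more than 2.5 SD from the group mean, computed separately for each group) can be sketched as below. The text does not say whether the sample or population SD was used, so the use of the sample SD here is an assumption. The scores are made-up values for illustration, not study data.

```python
from statistics import mean, stdev

def flag_outliers(scores, threshold=2.5):
    """Return indices of scores whose z-score exceeds the threshold in magnitude.

    Call this once per group so that the mean and SD are group-specific.
    Assumption: z-scores use the sample standard deviation (n - 1).
    """
    m, sd = mean(scores), stdev(scores)
    return [i for i, x in enumerate(scores) if abs((x - m) / sd) > threshold]

# Hypothetical accuracy scores for one group of 20 participants.
hypothetical_group = [0.85, 0.95] * 9 + [0.90, 0.30]
print(flag_outliers(hypothetical_group))
```

In this made-up example only the last participant (accuracy .30, z of about −4) is flagged.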

Data Filtering
Following the approach of Frank et al. (2004), participants who did not perform better than chance during the test phase in the easiest training pair conditions were not included in the analysis. Based on this criterion, three participants with dyslexia and one control participant were excluded from the analysis.

Post-Test Acquisition
An ANOVA was conducted, using training pair (AB, CD, EF) as the within-subject factor, group (Dyslexia vs. Controls) as the between-subject factor, and mean proportion of correct responses during the test phase as the dependent variable (see results in Figure 2b). The main effect of group was significant, F(1, 34) = 4.34, p = .04; ηp² = .11, suggesting that in general the dyslexia group was impaired in learning reinforcement contingencies compared to the control group (M = .74, SE = .03 and M = .85, SE = .03 for the dyslexia and control groups, respectively). The main effect of the reward contingency was significant as well, F(2, 68) = 25.78, p = .001; ηp² = .43, suggesting that participants performed worse when tested on complex reinforcement contingencies (EF) compared to the easier reinforcement contingencies (CD and AB), F(1, 34) = 35.56, p = .001; ηp² = .51. Furthermore, participants performed worse when tested on medium reinforcement contingencies (CD) compared with easy reinforcement contingencies (AB), F(1, 34) = 6.87, p = .001; ηp² = .16 (p values below the Bonferroni-corrected significance level of .025 [.05/2] were considered significant). No other effects were significant, and the groups did not differ in the critical AB pair alone.
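As a reading aid for the statistics reported here, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error), and the Bonferroni-corrected significance level is the nominal alpha divided by the number of comparisons. The sketch below uses hypothetical helper names. Values recomputed this way may differ from published ones in the last digit because the reported F values are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def bonferroni_alpha(n_comparisons, alpha=0.05):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons

# Main effect of group in the PS test phase: F(1, 34) = 4.34.
group_effect = partial_eta_squared(4.34, 1, 34)
# Main effect of reward contingency: F(2, 68) = 25.78.
contingency_effect = partial_eta_squared(25.78, 2, 68)

print(round(group_effect, 2), round(contingency_effect, 2), bonferroni_alpha(2))
```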

Performance on Transfer Measures
A mixed design ANOVA was conducted, with group (Dyslexia vs. Controls) as the between-subject factor, learning strategy (choose A in novel pairs vs. avoid B in novel pairs) as the within-subject factor, and mean proportion of correct responses during the test phase as the dependent variable (see results in Figure 3a). No significant effects were detected.
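The two transfer measures entering this analysis can be computed from test-phase choices as sketched below: choose-A accuracy is the proportion of A choices on the novel pairs containing A, and avoid-B accuracy is the proportion of non-B choices on the novel pairs containing B. The data format, function name, and example choices are hypothetical.

```python
CHOOSE_A_PAIRS = ["AC", "AD", "AE", "AF"]
AVOID_B_PAIRS = ["BC", "BD", "BE", "BF"]

def transfer_scores(test_choices):
    """Compute (choose_a, avoid_b) from a list of (pair, chosen_stimulus) tuples."""
    a_trials = [choice for pair, choice in test_choices if pair in CHOOSE_A_PAIRS]
    b_trials = [choice for pair, choice in test_choices if pair in AVOID_B_PAIRS]
    choose_a = sum(c == "A" for c in a_trials) / len(a_trials)
    avoid_b = sum(c != "B" for c in b_trials) / len(b_trials)
    return choose_a, avoid_b

# Made-up test-phase choices for one participant (training pair AB is ignored).
hypothetical_choices = [
    ("AC", "A"), ("AD", "A"), ("AE", "E"), ("AF", "A"),
    ("BC", "C"), ("BD", "B"), ("BE", "B"), ("BF", "F"),
    ("AB", "A"),
]
choose_a, avoid_b = transfer_scores(hypothetical_choices)
print(choose_a, avoid_b)
```

A participant with a higher choose-A than avoid-B score, as in this made-up case, would be described as learning more from positive than from negative feedback.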

Implicit Transitive Inference Task

Acquisition of Training Pairs
There was no difference in the number of trials required by the two groups to reach criterion.
An ANOVA was conducted, using training pairs (AB, BC, CD, DE) as within-subject factors, group (Dyslexia vs. Controls) as the between-subject factor, and mean proportion of correct responses during the test phase as the dependent variable (see results in Figure 4a). Only the main effect of the training pair was significant, F(3, 111) = 2.68, p = .04; ηp² = .06. Further analysis revealed that performance on the CD and DE avoid pairs was significantly better than on the AB and BC choose pairs, F(1, 37) = 6.89, p = .012; ηp² = .15, and there were no differences between inner and outer pairs within each pair group, F < 1 (p values below the Bonferroni-corrected significance level of .016 [.05/3] were considered significant).

Post-Acquisition Test of Training Pairs
An ANOVA was conducted, with training pairs as within-subject factors, group (Dyslexia vs. Controls) as the between-subject factor, and mean proportion of correct responses during the test phase as the dependent variable (see results in Figure 4b). The main effect of group was significant (ηp² = .19), indicating that participants with dyslexia were overall less accurate compared with controls. A main effect for training pair was found, F(3, 111) = 5.36, p = .001; ηp² = .12. Planned comparisons (p values below the Bonferroni-corrected significance level of .016 [.05/3] were considered significant) indicated that outer anchor pairs were significantly easier to learn compared to inner training pairs, F(1, 37) = 8.43, p = .006; ηp² = .15, whereas no significant differences were observed between pairs of each pair group. No other effects were significant.

Novel Test Pairs Results of the TI Task
An ANOVA was conducted using novel pairs (AE vs. BD) as within-subject factors, group (Dyslexia vs. Controls) as the between-subject factor, and mean proportion of correct responses during the test phase as the dependent variable (see results in Figure 3b). A significant difference was found between the novel pairs, such that participants performed better on the AE pair compared with the BD pair, F(1, 37) = 11.52, p = .002; ηp² = .23. No other effects were detected. Further analyses were conducted to investigate whether accuracy for novel test pairs was above chance level (50%). Single-sample t tests indicated that both groups performed above chance on the end-anchor probe pair (AE) [t(20) = 9.72, p = .001 and t(19) = 8.28, p = .001 for the dyslexia and control groups, respectively], yet only the dyslexia group performed significantly above chance on the transitive probe pair (BD) [t(20) = 3.06, p = .006 and t(20) = 2.047, p = .055 for the dyslexia and control groups, respectively] (p values below the Bonferroni-corrected significance level of .0125 [.05/4] were considered significant).
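The above-chance comparisons rest on a single-sample t test against a chance level of .50, where t = (M − .50) / (SD / √n) with n − 1 degrees of freedom. A minimal sketch follows. The scores are invented for illustration, the function name is hypothetical, and obtaining a p value would additionally require a t distribution (for example from a statistics package), which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(scores, chance=0.5):
    """Return (t, df) for a one-sample t test of mean accuracy against chance."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)  # sample SD divided by sqrt(n)
    t = (mean(scores) - chance) / standard_error
    return t, n - 1

# Made-up accuracy scores on one probe pair for four participants.
t_value, df = one_sample_t([0.6, 0.7, 0.8, 0.9])
print(round(t_value, 3), df)
```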

Assessment of Learning Strategies
An ANOVA was conducted with group (Dyslexia vs. Controls) as the between-subject variable, learning strategy [choose A (AB+BC) vs. avoid B (CD+DE)] as the within-subject factor, and mean proportion of correct responses during the test phase as the dependent variable. A main effect of group was found, F(1, 37) = 8.72, p = .005; ηp² = .19, demonstrating that the control group outperformed the dyslexia group (M = .95, SE = .02 and M = .85, SE = .02 for the control and dyslexia groups, respectively). A main effect was also detected for learning strategy such that, in general, participants learned more from negative feedback compared to positive feedback, F(1, 37) = 6.68, p = .013; ηp² = .15. No other effects were significant.

DISCUSSION
In the present study, we examined the assumption that RL is impaired in dyslexia using two well-studied RL tasks. A similar pattern of results was obtained from both the PS and TI tasks. During the acquisition phase, participants with dyslexia did not differ significantly from the control group in the number of trials required to reach criterion and performance did not differ significantly between the two groups during the first phase of learning. Consistently, win-stay and lose-shift scores in the PS task (see supplementary material) were comparable across the two groups. However, results from the post-acquisition test phase indicated reduced learning of reinforcement contingencies in the dyslexia group compared with the control group. This pattern was confined to trained items, as the ability to generalize from repeated exposure to negative/positive outcomes was comparable across the two groups.
The observation that participants with dyslexia were impaired at the post-acquisition phase of the PS task but not early in training should be considered within the context of Frank's model of RL. According to Frank and Claus (2006), the PS task involves two RL types. The first involves the ability to represent and integrate feedback online to rapidly learn contingencies that depend on the orbitofrontal cortex, whereas the second reflects the gradual, habit-like acquisition of contingencies, largely dependent on the basal ganglia. According to Waltz et al. (2007), performance on the post-acquisition test items is likely to reflect the gradual, habit-like acquisition of contingencies, largely dependent on the basal ganglia, whereas performance during the early acquisition phase reflects prefrontal cortex-based processes. In the present study, early acquisition (win-stay and lose-shift scores) in the PS task was comparable across the two groups (see supplementary material), but post-test acquisition performance was reduced in the dyslexia group. Therefore, it may be the case that participants with dyslexia were still capable of maintaining intact performance during the early acquisition phase by updating the working memory representations needed to track differences in the relative magnitude of reinforcement online, but later, when slower habit-like acquisition of contingencies came into play, group differences emerged.
It should be noted that a U-shaped serial position curve was observed in the test phase of the TI task, such that anchor pairs were learned better compared to the other pairs and better performance was observed on the AE test compared with the BD test. These results are consistent with an account positing that in the TI task, participants learn hierarchical relations based on associative learning mechanisms as opposed to an explicit reasoning account, since there are strength advantages for the end-anchor pairs that comprise items that are unambiguously reinforced (Frank et al., 2006; Frank, Rudy, Levy, & O'Reilly, 2005; Vasconcelos, 2008). In addition, an explicit reasoning account would predict that performance should be equal and robust across all test pairs. An associative learning account, by contrast, predicts a graded outcome, with the strongest evidence of transitive behavior observed in the AE test and the weakest evidence in the BD test (Frank et al., 2005). Nevertheless, a mixture of declarative and non-declarative processes may have come into play in the TI task.
In the study of Smith and Squire (2005), neurotypicals who became aware of the hierarchy, unaware participants, and patients with hippocampal damage all exhibited above-chance performance in the end-anchor probe pair AE, whereas only aware participants performed above chance in the transitive probe pair BD. Consistently, impaired performance in the BD pair condition was also observed in hippocampal-lesioned animals (Titone et al., 2004). In the present study, both groups performed above chance in the AE pair but only the dyslexia group performed significantly above chance in the BD pair. This pattern may imply the use of declarative strategies by the dyslexia group during the TI task, possibly through the action of compensatory medial temporal lobe (MTL)-related structures.
Observations of both the PS and TI tasks suggest that learning of reinforcement contingencies occurs less robustly in individuals with dyslexia compared with neurotypicals, with no differences in learning strategies between the two groups. However, these learning impairments seem to be limited to trained items since the ability to generalize from repeated exposure in both the PS and TI tasks did not differ across the dyslexia and control groups. Such a pattern of results seems to adhere to the notion of different functions of the basal ganglia vs. hippocampus during associative learning. In particular, the basal ganglia are critically involved in stimulus-response habit learning, whereas the hippocampus and related MTL structures may be required for more complex learning, such as transfer processes when familiar stimuli are presented in novel combinations (Gabrieli, 1998; Gluck & Myers, 1993; Myers et al., 2002; Myers et al., 2003; Shohamy, Myers, Geghman, Sage, & Gluck, 2006). Indeed, research suggests that non-demented elderly individuals with hippocampal atrophy (HA) are capable of learning trained items as well as control participants but are impaired when these items are presented in novel combinations (Myers et al., 2002). In addition, Myers et al. (2003) reported a double dissociation between the associative learning deficits observed in patients with medial temporal (hippocampal) damage versus patients with Parkinson's disease. In their study, patients with basal ganglia dysfunction exhibited impaired initial learning yet intact transfer abilities. In contrast, patients with hippocampal damage exhibited the opposite pattern. Based on this, they argued that both the basal ganglia and the hippocampus are involved in associative learning, but the basal ganglia are involved in initial learning, whereas the hippocampus is involved when the transfer of the learned ability is required (Moustafa, Keri, Herzallah, Myers, & Gluck, 2010).
Taken together, the observation of impaired learning alongside intact transfer ability in the dyslexia group is consistent with the PDH that posits abnormality in the basal ganglia in dyslexia but not in hippocampal and related MTL structures.
The tasks used in the present study involve learning of cue-outcome relationships over many trials by integrating the overall frequency of reinforcement. In this sense, they share some similarities with the Weather Prediction Task (WPT). In a previous study, both feedback-based and observational probabilistic category learning were impaired in dyslexia. The probabilistic nature of both of these tasks may have given rise to the reported observations. Specifically, probabilistic relationships increase uncertainty, are more immune to the involvement of declarative compensation processes, and may therefore present a major source of difficulty in dyslexia (Lum et al., 2013). Notably, in the present study, individuals with dyslexia were also impaired in the TI task, which involved deterministic feedback. However, the partial overlapping of stimuli introduced reward uncertainty, which resulted in less consistent mapping between specific cues and outcomes (e.g., B was 50% rewarding and 50% non-rewarding depending on the other stimulus in the pair), a fact that could negatively influence the ability of individuals with dyslexia to learn reinforcement contingencies.
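The reward uncertainty introduced by overlapping pairs in the TI task can be made concrete by tallying how often each stimulus is the rewarded member of a training pair. The sketch below assumes each training pair is presented equally often, and the function name is hypothetical.

```python
from collections import defaultdict

TRAINING_PAIRS = ["AB", "BC", "CD", "DE"]  # first letter rewarded, second not

def reward_rate_by_stimulus(pairs):
    """Proportion of a stimulus's pair memberships in which it is the rewarded one.

    Assumption: every pair is presented equally often during training.
    """
    rewarded = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in pairs:
        rewarded[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {stim: rewarded[stim] / appearances[stim] for stim in sorted(appearances)}

rates = reward_rate_by_stimulus(TRAINING_PAIRS)
print(rates)
```

Only the end anchors are unambiguous (A always rewarded, E never), whereas B, C, and D are each rewarded in half of the pairs in which they appear, even though the feedback for every individual pair is deterministic.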
Studies in recent years argue in favor of an involvement of domain-general learning mechanisms in many aspects of language acquisition (Rabagliati, Gambi, & Pickering, 2016; Saffran & Thiessen, 2007), including the formation of speech categories (Holt & Lotto, 2010). Speech categories are multidimensional: there is no single consistent cue that signals category membership, and some cues are more reliable than others in signaling a category difference, depending on the listener's linguistic experience (Holt & Lotto, 2006). Listeners need to discover which cues are most critical for signaling a change in meaning in their native linguistic environment (i.e., which cues are most important for making phonological distinctions). It is assumed that unsupervised statistical learning, in which learners become sensitive to the distributional frequency of speech cues, supports speech categorization (Maye & Gerken, 2000, 2001). However, recent evidence suggests that language learning cannot be fully explained by mere statistical tracking of regularities and that reinforcement-learning mechanisms are critically involved as well (Harmon, Idemaru, & Kapatsinski, 2019; Lim, Fiez, & Holt, 2014; Nixon, 2020; Olejarczuk, Kapatsinski, & Baayen, 2018; Rabagliati et al., 2016). It seems that listeners also approach the task of speech categorization by using a discriminative, error-driven learning process that guides them to ignore non-informative cues and to learn efficiently to use predictive cues (Nixon, 2020). Consistently, several studies have revealed striatal activity among neurotypical listeners who acquire sound categories (Feng, Yi, & Chandrasekaran, 2019; Lim, Fiez, & Holt, 2019), as well as a relationship between basal ganglia dopamine levels and phonological processing (Tettamanti et al., 2005). Therefore, RL deficits may influence dyslexics' ability to establish precise phonological representations.
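To illustrate how an error-driven process can come to ignore a non-informative cue, the following minimal sketch uses the Rescorla-Wagner (delta) rule, one common formalization of error-driven learning. It is offered as an assumption-laden illustration, not as the model fitted in the studies cited above; the cue labels, trial structure, learning rate, and function name are all hypothetical.

```python
def rescorla_wagner(trials, n_cues, learning_rate=0.1):
    """Update cue weights with the delta rule: w_i += lr * (outcome - prediction)."""
    weights = [0.0] * n_cues
    for cues, outcome in trials:
        # The prediction is the summed weight of all cues present on the trial.
        prediction = sum(weights[i] for i in cues)
        error = outcome - prediction
        # Only cues present on the trial share in the prediction error.
        for i in cues:
            weights[i] += learning_rate * error
    return weights


# Hypothetical cues: PREDICTIVE occurs only when the category is present,
# BACKGROUND occurs on every trial and so carries no information.
PREDICTIVE, BACKGROUND = 0, 1
trials = [((PREDICTIVE, BACKGROUND), 1.0), ((BACKGROUND,), 0.0)] * 500

w = rescorla_wagner(trials, n_cues=2)
print(f"predictive cue weight: {w[PREDICTIVE]:.3f}")
print(f"background cue weight: {w[BACKGROUND]:.3f}")
```

In this toy setting the weight of the predictive cue approaches 1 while the weight of the always-present cue is driven toward 0, because the prediction error on category-absent trials repeatedly penalizes the non-informative cue.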
Hence, the PDH could encompass one of the known causal factors in dyslexia, phonological processing deficits, while also providing a broader explanatory framework. In particular, the PDH can provide a mechanistic understanding of dyslexics' phonological deficits that extends to impairments in other cognitive and motor skills as well. Notably, the ability to attend to acoustic cues might be influenced by attentional processes that are compromised in dyslexia (Facoetti, Lorusso, Cattaneo, Galli, & Molteni, 2005; Franceschini, Gori, Ruffino, Pedrolli, & Facoetti, 2012). Therefore, it is possible that impaired attentional mechanisms influence learning processes in this population. However, the opposite is also possible: (reinforcement) learning may shape attention toward specific acoustic cues. Future studies should explore whether difficulties in learning to attend, or attentional problems themselves, contribute to dyslexics' difficulties in forming phonological categories. A previous study observed impaired incidental auditory category learning in individuals with dyslexia. Its authors speculated that impairments in reward-prediction error-driven learning via the basal ganglia might contribute to disrupting the typical course of category acquisition in dyslexia, with cascading effects on phonological processing. The present study confirms the hypothesis that RL is impaired in dyslexia, but a replication of the current findings in a more heterogeneous sample, including dyslexic children, is required. This observation, however, opens the door for further investigation. First, it is possible that linguistic deficits observed in dyslexia originate not only from impaired unsupervised statistical learning abilities (Bogaerts, Siegelman, & Frost, 2020; Sigurdardottir et al., 2017; Singh et al., 2018; Vandermosten et al., 2019) but also from less robust RL mechanisms.
For instance, weak learning signals from the striatum could ultimately contribute to diminished cortical representations of sound categories in people with dyslexia, leading to weakened phonological representations. Future neuroimaging studies are required to explore this assumption. An additional possible avenue of investigation would be to examine whether pharmacological interventions (De Vries, Ulte, Zwitserlood, Szymanski, & Knecht, 2010) or experimental manipulations (Gabay, Shahbari-Khateb, & Mendelsohn, 2018) can rescue RL in dyslexia. Finally, burgeoning research suggests the existence of two RL systems in the brain (Daw, Niv, & Dayan, 2005): a model-free system that is involved in the formation of stimulus-response associations, and a model-based system that learns a model of the world and is believed to support goal-directed behavior. Examining the balance between these two RL systems in dyslexia could help to further characterize the learning-related impairments observed in this population.

SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S1355617721000266.