The atypical pattern of irony comprehension in autistic children

Abstract Nonliteral language understanding has long been recognized as problematic in autistic individuals. We ran a study on 26 autistic children (mean age = 7.3 years) and 2 comparison groups of typically developing children, 1 matched for chronological age and 1 of younger peers (mean age = 6.11 years) matched for linguistic abilities. The study aimed to assess their understanding of ironic criticisms and compliments and to identify the cognitive and linguistic factors that may underpin this ability. Autistic participants lagged behind the comparison groups in the comprehension of both types of irony, and their performance was related to mindreading and linguistic abilities. Significant correlations were found between first-order Theory of Mind (ToM) and both types of irony, between second-order ToM and ironic compliments, and between linguistic abilities and ironic criticisms. The autistic group displayed a bimodal distribution not previously attested in the literature: the great majority (n = 18) performed very poorly in irony understanding, whereas some (n = 6) were at ceiling. We discuss these results in terms of two different profiles of autistic children.

those of children with language disorder (Kjelgaard & Tager-Flusberg, 2001). It is important to note, though, that when specific tasks targeting computationally complex structures are used (see, for instance, Durrleman, Delage, et al., 2017 for passives), even within the group of highly verbal children, some individuals appear not to encounter problems, whereas others show deficits, and these weaknesses seem not to be related to verbal memory skills (Meir & Novogrodsky, 2020) or to nonverbal intelligence scores (Prévost et al., 2018).
Despite phenotypical variability in the severity and heterogeneity of linguistic and intelligence profiles (Silleresi et al., 2020; WHO, 2018), pragmatics is acknowledged as the most consistently and universally impaired linguistic domain in autism (Baron-Cohen, 1988; Dewey & Everard, 1974; Tager-Flusberg, 1981; Tager-Flusberg et al., 2005; Young et al., 2005), even in those individuals who score in the normal range on tests of IQ and display structural language skills within the norm. Within the broad area of pragmatics, individuals with autism encounter specific difficulties in those abilities that require reasoning about other persons' communicative intentions. In fact, only some pragmatic tasks require the attribution of specific beliefs and intentions to the speaker, whereas others might be solved via the enrichment of what has been literally said, but without using mentalistic skills (see the discussion in Domaneschi & Bambini, 2020). Recent studies found that autistic individuals can solve these more "linguistic pragmatic" tasks, such as interpreting indirect speech acts (Deliens et al., 2018) or deriving scalar implicatures (Andrés-Roqueta & Katsos, 2020), whereas "social pragmatic" tasks such as those that require inferring speakers' communicative intentions (e.g., lies, jokes, or pretense) remain difficult to solve.
As Geurts et al. (2020) remind us, the problems in handling nonliteral language have been spelled out as a core feature of autism since the very beginning: Kanner (1944) mentioned that autistic individuals tend to display excessive "literalness," and Asperger (1944/1991) suggested that they usually lack the understanding of jokes. The centrality of this deficit for the characterization of autism is also confirmed by its mention in the DSM-5: one of the diagnostic criteria for autism is the difficulty in understanding what is not explicitly stated (e.g., making inferences) and nonliteral or ambiguous meanings of language (e.g., idioms, humor, metaphors, multiple meanings that depend on the context for interpretation).
Nonliteralness can be broadly defined as the discrepancy between the linguistic meaning and the conveyed meaning of an utterance. Typically, the term nonliteral language refers to figurative modes of speech, such as irony, jokes, metaphor, hyperbole, or understatement (Gibbs, 1994). Recent reviews of studies on figurative language in the autistic population confirmed a general impairment compared to TD comparison groups in the interpretation of idioms and proverbs (Morsanyi & Stamenković, 2021) and of metonymies, metaphors, and irony (Kalandadze et al., 2018; Melogno et al., 2012). However, the nature of the difficulties autistic individuals encounter in handling this type of language is still a matter of debate. In her pioneering work, Happé (1993) found that autistic children were impaired in both metaphors and irony and attributed their difficulties to deficits in basic (for metaphors) or advanced (for irony) mentalistic abilities. Subsequent studies, however, questioned the role of mindreading in metaphor understanding and instead highlighted the contribution of linguistic knowledge (Norbury, 2005). In fact, Kalandadze et al. (2018) emphasized how the impairment in figurative speech was marginal or even nonsignificant when autistic individuals were compared to TD peers matched for language skills. At the same time, Vulchanova et al. (2015) underlined that the existence of highly verbal autistic children who still exhibit difficulties in metaphor or idiom comprehension calls for other possible explanations, and they pointed to weak central coherence, a core feature of the autistic dimension. A recent and lively debate concerns precisely the question of whether metaphors might be correctly understood thanks to sufficient lexical competence but with an egocentric perspective that does not take into consideration the speaker's point of view.
Kissine (2021), in particular, considers the autistic condition as an ideal testing ground and concludes that since autistic participants with preserved linguistic skills can understand metaphors, these do not necessarily require mindreading skills. On the other hand, Kalandadze et al. (2019) emphasize the role of different aspects of the task used to test metaphor comprehension, and along the same lines Mazzarella & Noveck (2021) underline how the full appreciation of some metaphors might in fact require perspective-taking skills.
Quite interestingly, however, even if it is highly debated whether other types of figurative language necessarily require mindreading, almost all scholars believe that irony, the main focus of our study, is indeed a social pragmatic phenomenon and that its comprehension requires mentalistic skills (for a notable exception, see Katsos & Andrés-Roqueta, 2021, who propose that, under particular situations, an egocentric appreciation of irony may take place).
Since the seminal work of Happé (1993), autistic individuals have been considered seriously impaired in irony comprehension, as they have difficulties going beyond the literal meaning in order to identify the speaker's real intention. This deficit was related to difficulties in mindreading, which were proposed to be a distinguishing and core feature of autism. In particular, it was claimed that the inability to assume another person's perspective affected the recognition of the speaker's ironic intent.
In recent years, however, a growing body of research has highlighted that some autistic individuals could, at the behavioral level, successfully pass tasks designed to tackle both meta-representational cognitive abilities and pragmatic skills connected to sarcasm or irony understanding (Colich et al., 2012; Glenwright & Agbayewa, 2012). The debate thus moved to ascertaining which cognitive mechanisms might help autistic individuals successfully pass mindreading and pragmatic tasks (Colich et al., 2012; Pexman et al., 2011; for a more general theoretical debate, see Kissine, 2021). To tackle this issue, we first outline a characterization of verbal irony and its acquisition in children with typical development or with autism, and then we present a new experimental study.

Verbal irony
As already alluded to, verbal irony is a type of nonliteral language in which what a speaker intends to communicate (the speaker's meaning) clashes with the literal meaning conveyed by the sentence and is typically (even if not always, Wilson & Sperber, 2012) the opposite of what was said (the sentence meaning). Thus, if after an extremely boring party Ann says to Chloe "The party was great fun!", she is communicating her contemptuous attitude toward the idea that the party was fun, and any interlocutor who correctly recognizes her communicative intent will come to believe that Ann thinks that the party was anything but fun (speaker's meaning). In this sense, she is in fact making a criticism (speaker's attitude). Indeed, ironic comments are always evaluative, and even if the most common form of irony is represented by literally positive remarks that are used to blame (ironic criticisms, also defined as sarcasm), the reverse is also possible. A literally negative statement can be ironically used to communicate its opposite meaning, as when Ann comments to Bill who got straight A's, "Your exam went very badly." In addition to being rarer, ironic compliments are harder to understand and to appreciate, at least when they do not echo a preceding statement (thus, the preceding example becomes more natural in a context in which Bill manifested his fear that the exam would go very badly; Wilson & Sperber, 2012). This difference has been linked to the so-called asymmetry of affect: ironic criticisms are easier (and more widespread) because they echo a positive expectation that can be easily retrieved even if not explicitly stated; on the other hand, ironic compliments make reference to negative expectations that do not constitute the standard (Kreuz & Link, 2002).
Assuming the hearer's perspective, the most powerful cue to detect the speaker's ironic intent is the incongruence between the context and the statement (see, a.o., Rivière et al., 2018), even if ironic compliments might require the presence of an explicit statement to be correctly perceived. To avoid misunderstandings, the ironic speaker can (but does not need to) also display the so-called ironic markers, that is, meta-communicative cues that alert the audience that his or her comment should not be interpreted literally. In addition to particular linguistic choices (e.g., extreme adjectives or syntactic constructions such as topicalization), the ironic speaker can display specific gestural or phonological cues, such as the ironic tone of voice (Burgers & van Mulken, 2017).

The acquisition of irony
The complexity of the mechanisms required to comprehend irony is reflected in its late acquisition by TD children. Studies that investigated the comprehension of irony within a developmental perspective found that TD children start comprehending irony around the age of 5 years, but its full appreciation may not be realized until up to 8 years of age (Creusere, 2000; Dews et al., 1996; Filippova & Astington, 2008; Harris & Pexman, 2003, a.o.). The correct detection of the speaker's meaning has been reported to precede the inference of the speaker's attitude (Hancock et al., 2000). Moreover, children's comprehension of ironic criticisms usually precedes that of ironic compliments (Pexman & Glenwright, 2007). Various proposals have been put forth to explain the acquisition asymmetry between ironic criticisms and compliments. As already noted, ironic criticisms are more common than ironic compliments and thus might constitute more conventionalized forms of irony: children might have more often encountered instances of sarcasm during their normal conversational exchanges, and thus, they might be keener to recognize and appreciate them (Pexman & Glenwright, 2007). Other scholars emphasized that ironic compliments are particularly challenging because they require the negation of an (inherently) negative statement (Giora, 1995), and the processing of a double negation might be too demanding for children. More generally, the asymmetry in acquisition might simply reflect the asymmetry of affect previously discussed: ironic criticisms are indeed easier than ironic compliments (also for adults) because they make reference to positive expectations about the outcome of an event (e.g., the party will be fun), whereas ironic compliments refer to negative expectancies (e.g., the exam will go badly). As human beings, we tend to have positive feelings about how things will go, and children might possibly be even more optimistic.
Therefore, it is easier to retrieve positive antecedents compared to negative ones (Kumon-Nakamura et al., 1995). In fact, it has been shown that if ironic compliments echo an explicitly stated negative antecedent, children do not find them harder than ironic criticisms (Hancock et al., 2000, exp. 2; Nakassis & Snedeker, 2002).
Many scholars have investigated the factors that might promote the development of the recognition of irony. Sullivan et al. (1995) argued that sophisticated cognitive abilities connected to the so-called Theory of Mind (ToM) are required to distinguish lies from jokes. ToM is an umbrella term that refers to the mentalistic abilities required to attribute and reason about others' mental states, such as emotions, intentions, and beliefs. This mindreading competence is crucial to interpret and predict people's behavior and to maintain good social relations; it has also been linked to the recognition of the speaker's communicative intent. Let us again take the example of the ironic criticism regarding the boring party that Ann attended. When Ann comments to Chloe, "The party was great fun!", she is saying something that is false in that situation. If Ann thinks that Chloe knows that the party was not at all fun, then she wants Chloe to recognize her jocular intent. If, on the other hand, Ann thinks that Chloe does not know that the party was boring, then she wants to mislead her and she is lying. In other words, to distinguish lies from jokes, an advanced second-order ToM is required (i.e., predicting one person's state of mind about another person's state of mind), since the speaker must hold a belief about his or her interlocutor's state of knowledge. Indeed, different scholars linked children's difficulties in irony recognition to their still immature mentalistic skills and found a correlation between advanced ToM levels and irony understanding in TD children (Nilsen et al., 2011;Sullivan et al., 1995;Winner & Leekam, 1991). At the same time, the development of ToM is tightly linked to language proficiency (Milligan et al., 2007), and studies that controlled for language competence found that not only ToM but also language correlated with irony comprehension in TD children (Filippova & Astington, 2008). 
Angeleri and Airenti (2014) hypothesized that the relation between ToM and irony was spurious and resulted from language skills affecting both ToM and irony.

Irony comprehension in autism
One of the defining characteristics of autism is the presence of persistent deficits in social communication and social interaction, and it is therefore not surprising that the area of pragmatic communication has been extensively investigated in this population.
Focusing on irony comprehension, apparently contrasting results have been reported. Indeed, the interpretive skills required for irony appreciation are some of the skills thought to be particularly problematic for autistic individuals. In particular, it has been suggested that the deficit in comprehension of ironic language found among autistic individuals may be related to the presence of coexisting cognitive difficulties in the areas of ToM (Happé, 1993; Peterson et al., 2012; Tager-Flusberg, 2000), in keeping track of diverse perspectives (Kissine, 2012) and in structural language skills (vocabulary and syntax) (Norbury, 2004, 2005; Whyte et al., 2014). Happé (1993) conducted the first experimental research to make explicit connections between ToM and figurative language comprehension in individuals with ASD. She found that only the autistic participants (with heterogeneous verbal and nonverbal intelligence scores) who passed both first- and second-order ToM tasks could understand ironic utterances. MacKay and Shaw (2004) reported that when asked to interpret ironic utterances, children with high-functioning autism tended to offer rephrasings of the statement as answers or provided explanations that involved reinterpreting the context to make the literal meaning fit. Martin and McDonald (2004) tested individuals with Asperger syndrome (AS), presenting them with short stories that ended with a lie or a joke and that contained questions about (first- and second-order) mental attributions and about the speaker's meaning and intent. They found a delay compared to TD comparison groups and a correlation between ToM and identification of the speaker's communicative intent (even if, since questions were presented within the same stories, this effect might be due to a search for coherence). Similar findings were reported by Kaland et al.
(2002): they tested adolescents with AS on various tasks targeting, among other things, (white) lies, misunderstanding, double bluff, and irony, and they found an impairment in the questions requiring justification of the corresponding mental state. Saban-Bezalel et al. (2019) compared the performance in irony comprehension between 20 autistic children and a comparison group of TD children matched not only for age, vocabulary, and executive functions but also for ToM (autistic children succeeded on the second-order ToM task 80% of the time). They found that even if autistic children showed a remarkable ability in the appreciation of the comic strips (79% accuracy), their performance still lagged behind that of TD peers (89%). Interestingly, the difference between the two groups disappeared when other mentalizing abilities assessed with the Hinting Test (which requires understanding other persons' intentions) were taken into account. Deliens et al. (2018) tested autistic adolescents and young adults without language impairment or intellectual disability (their verbal and nonverbal intelligence scores were above 70) in a task that required recognizing the (sincere or) ironic intent of a statement using three different cues: incongruence with the context, prosodic information, and facial expressions (cues presented cumulatively and in isolation). Autistic participants detected the ironic intended meaning to a lesser extent than TD controls but relied on the same cues (context incongruence more than acoustical or visual cues). Deliens et al. also collected eye-tracking data and found that the autism group fixated the incorrect object longer than the comparison group.
In the last decade, several studies using implicit or neural measures of irony comprehension have been conducted, and interesting findings have emerged. Pexman et al. (2011) tested 18 children and adolescents with high-functioning autism (with a mean age of 11 years) and 2 comparison groups matched for chronological and linguistic age (LA) on a task assessing ironic criticisms and ironic compliments. They found no delay compared to the comparison TD groups on the questions assessing the speaker's meaning or the speaker's intent, possibly because the requested answers (forced choice in the first case and selection of an object in the second case) posed minimal verbal demands; nevertheless, the group with autism provided lower humor ratings for the ironic remarks, suggesting that they did not appreciate the social functions of irony. Moreover, response latency and eye gaze measures suggested that autistic individuals were processing irony differently from their TD peers.
The atypical processing of sarcasm in autistic individuals is also confirmed by studies that use neuroimaging techniques (functional magnetic resonance imaging) to discover the neural mechanisms involved in the elaboration of sarcasm. All these studies converge in highlighting that, even if children and adolescents with high-functioning autism did not differ from their TD peers in behavioral responses to ironic remarks (Colich et al., 2012; Williams et al., 2013) or had a nevertheless satisfactory performance (Wang et al., 2006), their brain activity was atypical compared to TD peers. Wang et al. (2006) found that even if the neural networks involved in the processing of the speaker's sincere or ironic intent were the same in children with and without autism, the autistic group showed hyperactivity in the right frontal and bilateral temporal regions, and the increased cortical recruitment was interpreted as reflecting a compensatory mechanism and/or increased neural effort needed to perform the task. Additionally, Colich et al. (2012) found that, despite the strong similarities in the neural networks activated for the processing of ironic items, the brain activity of autistic children was considerably more bilaterally distributed and more widespread, extending to regions such as the left temporal pole and the medial prefrontal cortex, typically engaged during mentalizing tasks. Williams et al. (2013) found activation differences in key language-processing regions (left middle temporal, left pars triangularis, left pars opercularis, left medial frontal, and right middle temporal) in the autism group compared to the TD group. Specifically, the authors found that participants on the autism spectrum (both children and adults) showed lower coordination within the left hemisphere language network during irony comprehension than controls.
However, in contrast to Colich et al., autistic participants did not show an increase in functional connectivity for ironic texts compared to literal texts. Such a difference in results might be due to a difference in tasks, since in Colich et al.'s (2012) study, children were explicitly asked to interpret a character's communicative intentions, and this more demanding task might have led to a recruitment of this brain region (Williams et al., 2013: 299). Finally, both Colich et al. (2012) and Williams et al. (2013) detected less activation in the left pars triangularis region in autistic children. As highlighted by Williams et al. (2013), this region has been associated with semantic processing, and a lack of an increase in activation might be associated with a diminished appreciation of the ironic information.
To summarize, even if impairments in figurative understanding in autism are supported by overwhelming evidence, some issues need to be addressed. Regarding the factors that might promote irony comprehension, many studies found correlations between ToM and sarcasm understanding, but only a few studies conducted on autistic participants investigated whether linguistic skills might intervene as a mediating factor. This is particularly relevant because recent research has suggested a close developmental link between ToM and language skills (Angeleri & Airenti, 2014), in particular, mastery of complement clauses (Durrleman, Burnel, et al., 2017). Moreover, except for Pexman et al. (2011), previous studies have tested only one type of verbal irony, namely sarcasm. Even if they are less common than ironic criticisms, ironic compliments might constitute an interesting testing ground because they might highlight similarities and differences in the way autistic or TD children derive the speaker's meaning and intent. Finally, some recent studies found that autistic individuals showed a good understanding of irony, obtaining overall scores above chance level or even comparable to those of the TD comparison groups (Colich et al., 2012; Pexman et al., 2011; Wang et al., 2006; Williams et al., 2013). A more in-depth investigation of the profile of autistic individuals who pass these irony tasks might shed light on the factors that promote its understanding and thus offer valuable indications for rehabilitation programs that might help them develop this ability.

The present study
We conducted a study that aimed to test the understanding of ironic criticisms and ironic compliments in autistic children, comparing their performance to two comparison groups of TD children, one matched on chronological age (CA) and the other matched on linguistic competence. In particular, we were interested in identifying what might facilitate irony comprehension, and thus, we investigated the links with both ToM skills and linguistic abilities.

Methods Participants
Twenty-six autistic children (24 male and 2 female) with a mean age of 7.26 years (SD = 2.02, range = 3.75-10.25 years) and 52 TD children took part in the study. All participants lived in Italy and were native speakers of Italian. Participants with autism were recruited and diagnosed at the Hospital 'Azienda Provinciale per i Servizi Sanitari' (APSS) in Trento, Italy, and were included in the study if they were considered verbal children according to the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2012) and DSM-5 (2013) criteria by medical professionals. The individual and mean scores obtained in the ADOS are reported in Table 1, along with their scores on the language test (Test di comprensione grammaticale of the BVL, Marini et al., 2015) and nonverbal IQ test (Italian standardization of the Raven Colored Progressive Matrices (Raven), Belacchi et al., 2008). These tests are described in detail below.
Autistic children were paired one by one with a group of CA-matched TD children (CA, N = 26, 17 male and 9 female, mean age = 7.26, SD = 2.02, range = 4-10.25) and with another group of TD children paired one by one based on their score in the grammatical comprehension task of the BVL (Marini et al., 2015) and thus matched for LA (N = 26, 14 male and 12 female, mean age = 6.11, SD = 1.70, range = 4.08-9.83). Children in the LA group were significantly younger than the autistic children (t = −2.15, p = .03).
Note (Table 1). H = participants who fall into the "at-ceiling group"; L = participants who fall into the "at-floor group" (see the Individual analysis in the Results section).
The TD children were recruited in kindergartens and primary schools in the provinces of Vicenza and Verona (Italy). This study is a part of extensive research on pragmatic deficits in the autistic population that has been approved by the University of Trento's Ethical Committee, and all parents provided written informed consent prior to beginning the experiment.
All children were also tested for fluid nonverbal intelligence using the Italian standardization of the Raven Colored Progressive Matrices (Belacchi et al., 2008). This test comprises 3 sets of 12 items (maximum score = 36), with increasing difficulty and complexity within and across sets. Children are presented with a pattern in which a piece is missing, and their task is to choose, among six alternatives, the piece that best completes the matrix. To achieve this task, children must infer rules, manage a hierarchy of goals, and form high-level abstractions (Carpenter et al., 1990). Linguistic abilities were assessed with the grammar comprehension task of the BVL (Marini et al., 2015). Children had to select the picture, out of four alternatives, that corresponds to the sentence pronounced by the researcher. In 40 items, the test allows an assessment of several morphosyntactic features that involve the expression of gender, number (singular/plural), clitic pronouns, negation, passive structures, and relative clauses.
For general linguistic abilities, ASD and LA were matched one by one for scores in the grammatical task, whereas ASD scored significantly lower than CA (t = 2.47, p = .02). On the other hand, there were no significant differences between the ASD and the two TD groups in the Raven scores (all ps > .3). The characteristics of the three groups are summarized in Table 2.
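The one-by-one pairing on grammatical comprehension scores can be illustrated with a minimal sketch. The text does not specify how pairs were selected, so the greedy nearest-score rule below is an assumption, and the function name, participant IDs, and scores are all hypothetical.

```python
def match_one_to_one(target_scores, pool_scores):
    """Pair each target participant with one unused pool participant.

    Assumed rule (not stated in the text): targets are processed from the
    lowest to the highest score, and each receives the still-available pool
    member whose score is closest (ties broken by participant ID).
    """
    if len(pool_scores) < len(target_scores):
        raise ValueError("the pool must be at least as large as the target group")
    available = dict(pool_scores)  # copy so the caller's dict is untouched
    pairs = {}
    for target_id, score in sorted(target_scores.items(), key=lambda kv: kv[1]):
        best_id = min(available, key=lambda pid: (abs(available[pid] - score), pid))
        pairs[target_id] = best_id
        del available[best_id]
    return pairs


# Hypothetical raw scores on a 40-item grammar comprehension task.
autistic_group = {"A1": 30, "A2": 22}
td_pool = {"T1": 21, "T2": 35, "T3": 30}
pairs = match_one_to_one(autistic_group, td_pool)
print(pairs)
```

A greedy rule is simple to audit but does not guarantee the smallest total score difference across all pairs; an optimal assignment would need a dedicated matching algorithm.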
ToM tasks. We tested children's mentalistic abilities using one task targeting first-order ToM and another task (with three test questions) targeting second-order ToM. First-order ToM was evaluated with an unexpected-contents task (Gopnik & Astington, 1988): children were shown a Band-Aid box that actually contained playing cards and were asked to say what another person who had not opened the box would say if asked "What's inside the Band-Aid box?". To answer correctly (i.e., "Band-Aids"), the child has to demonstrate their ability to understand that people can have a false belief. We assessed second-order ToM with a modified version
Table 2. Characteristics of the three groups of participants: mean age (in months) and mean raw scores and standard deviations on the Raven and the grammatical comprehension task (BVL) for the group of children with ASD and for the two TD groups matched for chronological age (CA) and for linguistic age (LA)
Irony comprehension task. To assess irony comprehension, we used an adapted version of the task used by Panzeri et al. (2020, 2021) that consists of 10 short stories involving 2 characters interacting and with a concluding remark that needs to be interpreted either literally (control stories: N = 4) or ironically (ironic compliments: N = 3; ironic criticisms: N = 3). The task was created in PowerPoint and presented on a laptop computer; the stories were recorded to avoid uncontrolled and unwanted changes in prosody. The final remarks were pronounced with the corresponding sincere or ironic intonation, and the task was presented auditorily to the participants, accompanied by some images to capture the children's attention.
After hearing each story, children had to answer three questions that were meant to evaluate i) detection of the speaker's meaning (to assess if the child correctly interpreted the final remark as literal or ironic), ii) context recognition (control), and iii) understanding of the speaker's attitude (to assess if the child correctly interpreted the final remark as a compliment or a criticism). The second question (context recognition) was used to check whether participants understood what truly happened in the story; they were asked to choose the correct picture among four (the target, the competitor, and two distractors). Examples of ironic stories and related test questions are reported in Table 3 (ironic criticism) and Table 4 (ironic compliment).
All stories (literal and ironic) were presented in a pseudorandomized order: Two stories of the same type (e.g., two ironic criticisms) were never presented consecutively.
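The ordering constraint can be illustrated with a minimal sketch. It assumes a simple reshuffle-until-valid procedure, which is not necessarily how the original presentation orders were built; the story counts come from the task description, while the labels, function name, and seed are hypothetical.

```python
import random
from collections import Counter

# Story counts from the task description; the labels are hypothetical.
STORY_COUNTS = {"control": 4, "ironic_compliment": 3, "ironic_criticism": 3}


def pseudorandom_order(counts, seed=None, max_tries=10_000):
    """Return a shuffled list of story types with no two equal neighbours.

    Assumed procedure: reshuffle until the constraint holds (rejection
    sampling), giving up after max_tries attempts.
    """
    rng = random.Random(seed)
    stories = [kind for kind, n in counts.items() for _ in range(n)]
    for _ in range(max_tries):
        rng.shuffle(stories)
        if all(a != b for a, b in zip(stories, stories[1:])):
            return list(stories)
    raise RuntimeError("no valid order found within max_tries")


order = pseudorandom_order(STORY_COUNTS, seed=1)
print(order)
```

Rejection sampling is adequate here because, with 4, 3, and 3 stories per type, a sizeable share of random shuffles already satisfies the constraint.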

Procedure
All children were tested individually in a quiet room of their kindergarten or school (TD children) or in the Hospital 'Azienda Provinciale per i Servizi Sanitari' (APSS) in Trento (autistic children). The testing lasted for approximately 1 hr; autistic children and preschoolers were tested in two different sessions to avoid participants becoming tired or distracted. The tasks were administered in a fixed order: Raven, ToM tasks, grammatical BVL task, and irony task.
The results are divided into several sections, one for each type of analysis. All statistical analyses were conducted with R version 4.0.3.

ToM abilities
The results are reported in Table 5. The first-order ToM score was significantly lower in the ASD group than in the two TD comparison groups (ASD vs. CA: z = 3.87, p = .0001; ASD vs. LA: z = 3.66, p = .0002). Additionally, the second-order score was significantly lower in the ASD group than in the two TD groups (ASD vs. CA: t = 4.85, p < .0001; ASD vs. LA: t = 3.10, p = .0027).

Irony comprehension task: Ironic versus literal stories
First, we compared literal and ironic stories considering all three questions (speaker's meaning, context, and speaker's attitude). Accuracy was analyzed as a dichotomous variable (1 = correct/0 = wrong) by means of mixed effects logistic regressions. Significance tests were performed through likelihood ratio comparisons between models. Considering the three groups, our comparisons of interest were between ASD children and the two TD groups (CA and LA).
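The logic of a likelihood ratio comparison between two nested models can be sketched as follows. This is only an illustration of the test statistic and its p-value, not the analysis itself, which was run in R on fitted mixed-effects models. The log-likelihood values are hypothetical, and the sketch assumes the fuller model adds either one or two parameters (two would correspond, for example, to a three-level group factor), so that closed-form chi-square tail probabilities can be used.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Compare nested models from their maximized log-likelihoods.

    Returns the statistic 2 * (loglik_full - loglik_reduced) and its
    chi-square p-value. Only df = 1 and df = 2 are supported, because
    their tail probabilities have simple closed forms.
    """
    if df not in (1, 2):
        raise ValueError("this sketch only handles df = 1 or df = 2")
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    if df == 1:
        # P(chi2_1 > x) = erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(stat / 2.0))
    else:
        # P(chi2_2 > x) = exp(-x / 2)
        p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods of a model without and with the group factor.
stat, p_value = likelihood_ratio_test(-412.7, -408.1, df=2)
print(f"chi2(2) = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the fuller model fits significantly better, that is, that the added predictor contributes to explaining accuracy.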
Table 3. Example of a story in the ironic criticism condition (English translation)
Introductory background: Tommy is spending the afternoon playing at Paul's home. Tommy asks Paul to play with the LEGO bricks to build a big spaceship. Initially, Paul does not want to, because he is worried that after playing his room would be a mess. Tommy promises that he will help Paul to tidy up the room.
Contextual information: But when it is time for Tommy to go home, he leaves without helping Paul. The room remains a mess. So Paul tells Tommy:
Target sentence: Thanks for your help in tidying up!
Question 1 (speaker meaning): Did Paul mean that: Tommy helped him or Tommy did not help him?
Question 2 (context): How was Paul's room when Tommy left?
Question 3 (speaker attitude): When Paul thanked Tommy for his help in tidying up, Paul wanted to compliment or criticize Tommy?

Table 4. Example of a story in the ironic compliment condition (English translation)
Introductory background: Mom is preparing a cake and asks her daughter Chiara to help her. Chiara does not want to mess up things, and she tells mom that she is afraid to bungle things. Mom tells her not to worry, and that she is sure Chiara will be very careful.
Contextual information: Chiara succeeds in putting the flour in a bowl, and she also adds the eggs, without making any mistakes. So mom tells Chiara:
Target sentence: You are a real bungler!
Question 1 (speaker meaning): Did mom mean that: Chiara is a bungler or she is not a bungler?
Question 2 (context): What did Chiara do with the eggs?
Question 3 (speaker attitude): When mom told Chiara that she was a bungler, mom wanted to compliment or criticize Chiara?

Figure 1. Boxplots representing the distribution of participants' mean accuracy (y-axis) in the three groups (x-axis) in ironic (white) and literal (gray) stories. The "X" symbols represent mean values.

The main results are illustrated in Figure 1. We performed a logistic regression analysis comparing two models with the following random structure: by-participants and by-items varying intercepts, and by-participants varying slopes for the ironic/literal variable. The first model was a full model with group, type of story (ironic/literal), and their interaction, and the second model dropped the interaction term. The comparison between the two models indicated that the interaction was significant (χ²(2) = 10.41, p = .005).
To investigate the interaction, we performed a post hoc analysis considering the following comparisons: ASD vs. CA and ASD vs. LA in ironic and in literal stories, and ironic vs. literal stories in each of the three groups. The results showed that the difference in accuracy between the ironic and literal stories was significant in all three groups. Moreover, accuracy on ironic stories was significantly lower in the ASD group than in the two TD groups, whereas the difference in accuracy in literal stories was not significant (Table 6).
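As a worked check of the likelihood ratio comparison reported above, the p-value associated with χ²(2) = 10.41 can be recomputed by hand. The interaction between group (three levels) and type of story (two levels) adds (3 − 1) × (2 − 1) = 2 parameters, hence the 2 degrees of freedom; for this particular case the upper tail of the chi-square distribution has the closed form exp(−x/2). The function name below is ours, and the sketch only reproduces the final step of the test, not the model fitting itself.

```python
import math

def lrt_p_value_df2(chi_square):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom.
    For df = 2 the survival function is exactly exp(-x / 2)."""
    return math.exp(-chi_square / 2.0)

p = lrt_p_value_df2(10.41)
print(round(p, 3))  # 0.005
```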

Irony comprehension task: Ironic criticisms versus ironic compliments
In this section, we focus on the comparison between ironic criticisms and ironic compliments. In this analysis, we considered only the two questions directly linked to irony comprehension (speaker's meaning and speaker's attitude), and we directly compared the performance in these two questions. As in the previous section, answers to each question were coded as 1, correct, or 0, wrong, and the dichotomous variable accuracy was analyzed by means of mixed effects logistic regressions. Our comparisons of interest were between ASD children and CA and LA children.
In all groups, accuracy was lower in ironic compliments than in ironic criticisms, with the greatest difference in the LA group (see Figure 2). We performed a logistic regression analysis comparing several models with by-participants varying intercepts. The first model was a full model with group, type of irony (compliments/criticisms), question (meaning/attitude), and their interactions, including the three-way interaction. Then, we removed one by one the predictors that did not significantly contribute to the models' goodness of fit (Table 7). A summary of the final model is given in Table 8. Confirming the previous analysis, accuracy in ironic stories was significantly higher in CA children than in ASD children and in LA children than in ASD children. Moreover, across groups, accuracy was higher in criticisms than in compliments. No significant difference was found between the meaning and attitude questions.

Irony comprehension task: Individual analysis
The analysis thus far considered all 26 ASD participants as a group. As seen in Figures 1 and 2, the performance within the group was extremely variable; therefore, in this section, we present individual results. As in Panzeri et al. (2021), we calculated an ironic criticism and an ironic compliment score. We assigned one point for each correct answer to the two irony-related questions (detection of speaker's meaning and of speaker's attitude) in each of the three stories per type of irony. Thus, each individual had one score (max 6 points) for ironic criticisms and one (max 6 points) for ironic compliments. As Figure 3 shows, the score distribution was not homogeneous: 12/26 participants (46%) scored 0/6 both in ironic compliments and criticisms, and 6/26 participants (23%) scored 6/6 in both. Of the remaining eight participants, the vast majority had a score of 0 in compliments and a very low score (1 or 2) in criticisms.

Figure 2. Boxplots representing the distribution of participants' mean accuracy (y-axis) in the three groups (x-axis) in ironic compliments (white) and ironic criticisms (gray). The "X" symbols represent mean values.
To draw a comparison, we also considered individual scores for TD children. As Figure 4 shows, the situation is quite different: first, no TD child had a score of 0 both in compliments and in criticisms, and second, the distribution from low to high scores looks smoother than for ASD children (this is visible in the LA group, since many participants of the CA group performed at ceiling).
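The individual scoring scheme described in this section (one point per correct answer to the speaker's meaning and speaker's attitude questions, three stories per type of irony) can be sketched as follows. The function name and the example answers are hypothetical and serve only to make the arithmetic explicit.

```python
def irony_score(answers):
    """answers: three (meaning_correct, attitude_correct) pairs of booleans,
    one pair per story of a given irony type. Returns a score from 0 to 6."""
    if len(answers) != 3:
        raise ValueError("expected answers for exactly three stories")
    return sum(int(meaning) + int(attitude) for meaning, attitude in answers)

# Hypothetical child (not a participant of the study)
criticisms = [(True, True), (True, False), (False, False)]
compliments = [(False, False), (False, False), (False, False)]
print(irony_score(criticisms), irony_score(compliments))  # 3 0

# Percentages reported in the text for the 26 ASD participants
print(round(100 * 12 / 26), round(100 * 6 / 26))  # 46 23
```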

Relationship between irony comprehension and cognitive or linguistic factors in autistic children
To better understand the pattern of results of the ASD group, in the last step of the analysis, we investigated the relationship between age, morphosyntactic (BVL) score, first-(ToM 1) and second-order (ToM 2) ToM scores, ADOS score, and performance in the irony task.
We performed two different analyses. First, we considered all ASD participants, and we ran an exploratory correlation analysis between the scores of compliments and criticisms and the various biographical, linguistic, and cognitive measures. To obtain a baseline, we performed the correlation analysis also considering TD children (taking both groups, CA and LA, together).
Then, to account for the bimodal distribution characterizing the performance of ASD children described in the previous section, we classified the ASD participants according to their performance in the irony task. Children with a score of a maximum of 2 correct answers out of 12 were grouped together ("at-floor group," 18 participants), and children with a score of at least 10 correct answers out of 12 were combined in a second group ("at-ceiling group," 6 participants, all of whom scored 12/12). The remaining two participants were therefore not considered. Subsequently, we compared the two groups considering the same biographical, linguistic, and cognitive measures used in the correlation analysis.
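The grouping criterion just described can be written as a simple threshold rule on the total irony score (criticisms plus compliments, maximum 12). The function and label names below are ours; only the two cutoffs come from the text.

```python
def irony_profile(total_correct):
    """Classify a total irony score (0 to 12) with the cutoffs given in the text:
    at most 2 correct answers is 'at-floor', at least 10 is 'at-ceiling'."""
    if not 0 <= total_correct <= 12:
        raise ValueError("total score must be between 0 and 12")
    if total_correct <= 2:
        return "at-floor"
    if total_correct >= 10:
        return "at-ceiling"
    return "not classified"

for score in (0, 2, 6, 12):
    print(score, irony_profile(score))
```

Scores between 3 and 9 fall outside both groups, which is why two of the 26 participants were left out of the group comparison.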
Spearman's correlation analysis. Spearman's coefficients for TD children are reported in Table 9, and those for ASD children are reported in Table 10. Significant correlations are highlighted in bold.  Even if these analyses do not permit a direct comparison between the ASD and TD populations, it is interesting to note that the overall picture is different. As expected, in the groups of TD children (Table 9), improvement in all areas was linked to age (except for first-order ToM because of an at-ceiling effect). Focusing on irony comprehension, correlations with age, linguistic competence, and even nonverbal intelligence were found for both types of irony, whereas scores in second-order ToM were correlated with ironic compliments only. In the ASD group, however, the pattern was different. Age did not have any impact on irony comprehension; grammatical skills were related to ironic criticisms only, and mindreading skills were positively correlated with accuracy in both types of irony, even if the correlation between second-order ToM and ironic criticisms did not reach statistical significance.
At-floor versus at-ceiling ASD group. Descriptive statistics about the two groups are provided in Table 11.
At first glance, strong differences between the two groups were found in the two ToM scores. To evaluate our first observations and statistically compare the two groups, we ran a series of unpaired two-sample Wilcoxon tests. The results confirm that participants of the "at-ceiling group" significantly outperformed participants of the "at-floor group" in both ToM measures (ToM 1 and ToM 2 scores), whereas the other measures did not significantly differ between the two groups (Table 12).
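To make the logic of the unpaired two-sample Wilcoxon (rank-sum) test more concrete, the sketch below computes the Mann-Whitney U statistic on which the test is based, without the p-value. The scores are made up for illustration and are not the study's data; the function name is likewise ours.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y: the number of (x, y) pairs
    in which the x value is larger, counting ties as one half."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u

# Made-up second-order ToM scores, for illustration only (NOT the study's data)
at_ceiling = [2, 2, 1, 0]
at_floor = [0, 0, 1, 0, 0]

u = mann_whitney_u(at_ceiling, at_floor)
share = u / (len(at_ceiling) * len(at_floor))
print(u, share)  # 16.5 0.825
```

The ratio of U to the number of pairs estimates how often a child from the first group outscores a child from the second; a value of 0.5 would indicate no difference between the groups.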

Discussion
Our aim was to investigate the comprehension of ironic criticisms and ironic compliments in autistic children. Relying on previous results, we predicted significant delays in this competence, which involves the attribution of mental states and intentions to the speaker. In contrast to other studies (see, a.o., Hancock et al., 2000; Pexman & Glenwright, 2007), we did not find a difference in accuracy between questions assessing the speaker's meaning and the speaker's attitude: if a participant correctly detected the speaker's intended meaning, then they would also correctly infer the speaker's praising or blaming attitude. This might depend on the fact that, unlike other studies that required children to rate the speaker's level of meanness/niceness using a Likert scale, we proposed a forced-choice question (Was the speaker making a compliment or a criticism?). We were also interested in identifying the factors that might promote the development of this skill. The results showed that our sample of autistic children lagged behind TD children, not only those of the same CA but also the younger ones matched for language level. This finding contrasts with what emerged in the review of Kalandadze et al. (2018), who noticed that the performance of autistic participants in figurative language tasks was in fact only marginally different from, or even comparable to, that of TD peers matched for LA. However, Chahboun et al. (2016) found that their sample of high-functioning autistic participants lagged behind TD peers matched for language competence in the comprehension of metaphors. Importantly, autistic children encountered specific problems in the detection of ironic remarks but correctly interpreted all stories that required a literal interpretation. Statistical analyses highlighted that ironic compliments were harder to understand than ironic criticisms, and this was the case for all groups of participants.
Notably, even though the final remark of our task always echoed a preceding statement (in the story reported in Table 4, for instance, the mother's final comment "You are a real bungler!" referred to the daughter's fear that she would bungle things), making the negative expectation explicit was still not enough to override the asymmetry of affect. These results differed from the findings reported in Hancock et al. (2000) and Nakassis and Snedeker (2002). Thus, our findings confirm that ironic compliments are inherently more difficult to recognize, and we assume that their complexity is reflected in their being uncommon. If familiarity with instances of irony were to facilitate its recognition, we could expect a correlation between age and accuracy in irony recognition, since the older children get, the more likely it is for them to encounter ironic exchanges. This is what we found for the group of TD children, with age linked to the comprehension of both ironic criticisms and ironic compliments. In addition to irony comprehension, grammatical abilities and nonverbal intelligence also improve with age, together with second-order ToM. This "typicality" in the development of TD children is what makes it difficult to disentangle the factors that promote the mastering of irony comprehension: in addition to age, verbal and nonverbal intelligence were also related to both types of irony. Quite surprisingly, however, in the TD groups, mentalistic abilities were not correlated with ironic criticism comprehension, and a significant correlation was found only between second-order ToM and ironic compliments.
When we turn to the group of ASD children, the picture is rather different. First, age significantly correlated with verbal and nonverbal intelligence (strong positive correlations). In contrast, the correlations between age and mentalistic reasoning and age and irony understanding did not reach significance. This result seems to corroborate the idea that these two abilities, which inherently require the attribution of mental states and intentions to other persons, are impaired in this population and do not seem to be compensated simply by age. At the same time, it is important to note that these data contrast with other studies that did find relationships between CA on the one hand and mentalistic abilities and sarcasm recognition on the other hand (e.g., Happé, 1995;Peterson et al., 2012); thus, they must be interpreted cautiously because they might be specific to our sample.
It is notable, however, that the ADOS score does not seem to capture autistic children's difficulties in these specific skills, since we only found an inverse correlation with raw scores in the Raven task. Focusing on irony comprehension, when we consider autistic children as a group, the situation is not clear. Ironic compliment comprehension significantly correlated with both levels of ToM scores, whereas ironic criticism comprehension correlated with grammatical comprehension and with first-order ToM scores (moderate positive correlations).
The picture can be sharpened by zooming into individual data. Autistic children reached an overall accuracy of approximately 30% in ironic stories (27% for ironic compliments and 34% for ironic criticisms), but this overall score in fact results from a clear bimodal distribution of the participants' performance. The great majority of autistic children (18 out of 26) failed almost all questions assessing irony understanding (at least 10 errors out of 12, and 12 of these children failed all 12 questions); 6 other children, by contrast, responded correctly to all 12 questions investigating ironic criticisms and ironic compliments. Such a sharp partitioning of the population into "at-floor" and "at-ceiling" performers is not present in the groups of TD children, and interestingly, it was not attested either in the other atypical populations tested with the same irony comprehension task (Panzeri et al., 2020, 2021). To search for the factors that might promote irony recognition in autistic children, we believe that, instead of considering the autistic children as a group, it might be more relevant to distinguish the profiles of the children who fail from those of the children who succeed in irony understanding. Even if we are fully aware that the difference in size of the groups (18 vs. 6 children) and, particularly, the very low number of children performing at ceiling do not permit us to draw firm conclusions, the data reported in Tables 11 and 12 are worth discussing. Although the children who demonstrated a perfect understanding of the ironic stories were slightly older and had higher scores in all the tested cognitive and linguistic abilities (compared to those who failed almost all questions related to irony), only in the case of ToM did this difference reach statistical significance. The at-ceiling performers had significantly higher scores than the at-floor performers in both first-order ToM (.52 vs. .11) and second-order ToM (1.33 vs. .28).
This result, at first glance, seems to support the idea that higher-order mentalistic skills are required to master irony, and in particular to identify the speaker's communicative intent (distinguishing lies from jokes). On the other hand, a closer inspection of the data casts some doubt on the necessary link between ToM skills and irony understanding: within the group of at-ceiling performers, the variation in ToM scores was high, and two (out of six) of the children who responded correctly to all questions investigating irony understanding failed both ToM tasks. This result, then, questions the idea that passing mindreading tasks is a necessary condition for performing well in irony comprehension, even if we must acknowledge that ToM abilities were assessed with only two tasks.
The most striking result of our study, however, is the sharp bimodal distribution in at-ceiling and at-floor performers. This is different from the protracted process that leads TD children to fully master irony. After an initial phase in which TD children adhere to a literal interpretation of the ironic remark, they start realizing that the remark is incongruent with the situation; in this phase, however, children might struggle in identifying the speaker's communicative intent and/or attitude. Moreover, TD children recognize ironic criticisms earlier and better than ironic compliments. Looking at individual performances, then, a shallow curve is typically found, as in Figure 4: some children fail most questions, others start demonstrating some understanding of ironic criticisms, while still misinterpreting ironic compliments, then other children begin to finally grasp all criticisms and some  compliments, then eventually begin to master all types of ironic remarks. Our group of autistic children, however, was in fact split into two: the majority (18 children) failed all (N = 12) or almost all of the questions; six children responded correctly to all of them. For the first group, the at-floor performers, this result appears to be perfectly in line with the characterization of the profile of the condition of autism. As extensively discussed in the introduction, individuals with autism experience difficulties in inferring other persons' intentions, and to understand irony and figurative language in general, interlocutors must go beyond the literal meaning of the sentence to recognize the speaker's communicative intent. The presence of autistic children who respond correctly to all irony questions might be remarkable, but other studies found good performances in these tasks (Colich et al., 2012;Pexman et al., 2011;Saban-Bezalel et al., 2019;Wang et al., 2006;Williams et al., 2013). 
What is truly unexpected, though, is the absence of "intermediate" performers: only two autistic children (out of 26) had an accuracy of 50%, whereas the others were either at 100% or below 17%. This suggests that autistic participants either misunderstand ironic remarks, sticking to a literal interpretation and being consistent with this wrong interpretation, or they behave as if they found the key to solve the enigma of irony. Several scholars noticed that some autistic individuals appear to have "found the key" to respond correctly to tasks requiring mindreading, using strategies that appear to be different from those employed by TD peers. Happé (1995) found that autistic participants who had better scores in ToM tasks were those with higher verbal abilities and commented on this result, hypothesizing that they could solve mentalistic tasks "in a verbally mediated fashion," "in an unusually conscious and logical way, for example, looking as if they are doing 'mental arithmetic' before eventually giving the correct answer" (Happé, 1995: 852). Additionally, Frith et al. (1991) proposed that the few autistic individuals who could pass ToM tasks behaved as if they had extracted general rules to account for specific situations that they would still deem totally irrational. Notice that this intellectually based strategy might require "routes that are slow and cumbersome, disrupting the timing of their responses" (Bowler, 1992: 888). There is evidence that autistic individuals who pass mentalistic tasks are slower in solving them (Bowler, 1997; Kaland et al., 2007), and neuroimaging techniques have found different activation patterns (Kana et al., 2015; Kim et al., 2016; Yuk et al., 2018). Recently, Eigsti and Irvine (2021) found that autistic adolescents were slower, compared to TD controls, in responding to false belief test questions when they were performing the task under verbal load.
This finding is consistent with the proposal that there are two types of mindreading processes, one explicit and verbal and the other implicit and nonverbal, and that autistic adolescents tend to rely on the former. As discussed in the introductory section, in the case of sarcasm processing, the brain activity of autistic individuals was atypical compared to that of TD peers (Colich et al., 2012; Wang et al., 2006; Williams et al., 2013). These observations lead us to hypothesize that the autistic children who performed at ceiling in our sample were in fact using a "cognitive" strategy to detect ironic remarks, possibly based on simple rules, such as "if speakers said something blatantly false, then they are ironic and mean the opposite." Notice that the perfect performance on ironic compliments might be easily explained by assuming that these children were simply applying a rule that allows them to solve ironic blames and ironic praises equally well, without taking into account the different types of social norms they refer to (the "asymmetry of affect"). In this connection, Pexman et al. (2011) claimed that autistic individuals could not appreciate the social functions of irony.
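The kind of rule hypothesized above can be made explicit with a deliberately simplistic sketch. It is only an illustration of the "if blatantly false, then ironic and opposite" rule, not a model of the children's actual processing, and the function name and the binary coding of remarks and outcomes are our own assumptions.

```python
def interpret_remark(literal_claim_is_positive, outcome_is_positive):
    """Hypothetical 'opposite rule': if the literal content of the remark
    clashes with what happened in the story, read it as ironic and invert it."""
    ironic = literal_claim_is_positive != outcome_is_positive
    intended_positive = (not literal_claim_is_positive) if ironic else literal_claim_is_positive
    reading = "ironic" if ironic else "literal"
    attitude = "compliment" if intended_positive else "criticism"
    return reading, attitude

# Story of Table 3: "Thanks for your help in tidying up!" (positive wording),
# but Tommy did not help (negative outcome).
print(interpret_remark(True, False))   # ('ironic', 'criticism')

# Story of Table 4: "You are a real bungler!" (negative wording),
# but Chiara made no mistakes (positive outcome).
print(interpret_remark(False, True))   # ('ironic', 'compliment')
```

Because the rule treats the two polarities symmetrically, it yields correct answers for ironic compliments as easily as for ironic criticisms, which is the pattern observed in the at-ceiling performers.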

Conclusion
The results of our study can be read in two ways. Focusing on global performance, we confirmed that autistic children lagged behind TD peers, not only those of the same CA but also the younger ones matched for linguistic competence. Moreover, positive correlations between ToM and irony detection were found. Nevertheless, zooming into the individual performances, we revealed an unexpected bimodal distribution. We hypothesized that the children who failed (almost) all questions related to ironic stories would show a "typical" profile, which met the pattern of responses expected from autistic children. On the other hand, those who showed an unexpected pattern of responses, answering correctly to all questions investigating not only ironic criticisms but also the rarer and usually harder ironic compliments, represented an "atypical" profile in autistic children. We speculated that these "atypical" children were in fact using an intellectually or verbally based compensatory strategy to solve this task. Our hypotheses are merely speculative, since we did not use any measure that might capture their processing strategies.
Crucially, as briefly discussed in the introduction, a bimodal distribution has also been found in studies investigating structural aspects of language (Kjelgaard & Tager-Flusberg, 2001, a.o.), and interestingly, the presence of "high-" and "low-performers" in language-related tasks seems to crosscut the presence or absence of intellectual disability (see also Silleresi et al., 2020). To the best of our knowledge, the presence of spared and impaired language abilities in a group of autistic individuals with comparable nonverbal mental competence has been attested for structural language aspects but has not been discussed for pragmatic tasks. This raises the question of whether the results of the present study are imputable to the well-known phenotypical variability and heterogeneity of linguistic and cognitive abilities in the autistic condition or whether future and more fine-grained analyses might delineate the profile of individuals who are able to comprehend ironic remarks and to attribute complex mental states to others. Finally, it is still a matter of debate whether autistic individuals who exhibit good or even optimal performance in irony comprehension tasks are also able to understand and appreciate irony in real-life situations (Mognon et al., 2021). More generally, additional research is needed to test the replicability of the current findings and to address the following limitations. The number of participants in the current study was limited; a larger population sample, and in particular a larger sample including children with low nonverbal IQ, should make it possible to better address the question of the number and relative prevalence of patterns of responses on the comprehension of ironic criticisms and ironic compliments in autistic children. Regarding the identification of clusters of abilities in autism, we think that several pieces are still missing from the puzzle.
In the irony comprehension task, the group of autistic children lagged behind both comparison groups of TD children, including the younger group matched for LA. This result might indicate that either autistic children are simply delayed, and they might possibly catch up as they grow older, or they are impaired, showing little or no spontaneous improvement with age. The fact that in our study we did not find correlations between CA on the one hand and irony comprehension (and ToM scores) on the other hand might be read as indirect evidence that irony recognition constitutes a real deficit (and not a simple delay) in autistic individuals. Nevertheless, as already noted, these data must be read cautiously because we do not know the extent to which they can be generalized. We believe that a longitudinal approach may help show the evolution of profiles across time and introduce further implications for the clinical aspects of this condition. We thus suggest that in future research, the concept of chronogeneity, that is, the heterogeneity of profiles of abilities in relation to the dimension of time, should be introduced (Georgiades et al., 2017).
Our work has taken a step forward in characterizing the comprehension of irony in autistic children, but other factors that we were unable to investigate in this study should be taken into consideration in future analyses, such as executive functions, which might play an important role in the development of irony comprehension and/or of mentalistic abilities. Moreover, the use of gaze detection techniques or neuroimaging could shed light on processing in the two profiles we detected in our study. Finally, additional research is needed to investigate how the comprehension of irony may evolve from a longitudinal perspective. Concerning the comparison with other clinical groups, future research could include a direct comparison with children with Social Communication Disorder (DSM-5) and children with Williams syndrome to determine whether the phenotypical realizations of irony comprehension in these conditions are different. Nevertheless, we believe that the results of the present investigation raise interesting questions that future research might fruitfully tackle.