Background
Emotion dysregulation (ED) – difficulties with how one manages emotions – is a critical transdiagnostic construct in mental health (Aldao, Gee, De Los Reyes, & Seager, 2016; Sloan et al., 2017). ED is an important target in treatments for mood, anxiety, substance use, and other disorders (see Antuña-Camblor et al., 2024; Lincoln, Schulze, & Renneberg, 2022, for review). While ED is not part of the diagnostic criteria for autism spectrum disorder, autistic individuals have heightened rates of ED, with studies finding that autistic youth are two to four times more likely than community samples to have clinically elevated levels of ED (Conner et al., 2021; Northrup et al., 2024). A meta-analysis found that autistic samples have significantly higher ED than nonautistic groups (most of which were youth with other developmental disabilities), although differences in severity relative to mental health diagnostic groups were minimal (McDonald, Cargill, Khawar, & Kang, 2024). Prior research has suggested multiple factors that may contribute to increased ED in autism, including differences in cognitive flexibility and emotion recognition, difficulties with inhibitory control, and differences in sensory processing that lead to increased environmental triggers (Cai et al., 2018, 2019).
Additionally, chronic experiences of discrimination and invalidation, reported by many autistic individuals, are likely an important contributor to ED (Beck et al., 2024). Importantly, most extant research on ED in autism has focused on youth samples, despite evidence that autistic adults also experience ED (McDonald, Cargill, Khawar, & Kang, 2024) and that autistic adults identify mental health as a high research priority (Benevides et al., 2020; Gotham et al., 2015).
Studies of ED often use commercially available questionnaires with options for different reporters and lifespan assessment. However, many of these lack specificity to ED: some, such as the ASEBA scales (Achenbach, 2009), focus on mental health broadly, while others include ED secondary to a primary focus on another construct, such as the BRIEF (Gioia, Isquith, Guy, & Kenworthy, 2015), which focuses on executive functioning. In youth, an evidence-based update on the psychometric properties of questionnaires specifically focused on ED (and the related constructs of emotion regulation strategy use and irritability) found that many of these questionnaires lack strong psychometric properties (Mazefsky et al., 2021). Commonly used strategy-focused questionnaires include the Emotion Regulation Questionnaire (Gross & John, 2003), which has both adult and child/adolescent self-report versions, and the Difficulties in Emotion Regulation Scale (Gratz & Roemer, 2004), a self-report measure for adults that has also been used with adolescents. More general measures of ED include the Affective Reactivity Index (Stringaris et al., 2012), which focuses on irritability and anger and has both self- and parent-report versions, and the Emotion Regulation Checklist (Shields & Cicchetti, 1997), a parent-report youth questionnaire that yields an emotional lability/negativity subscale and an emotion regulation (adaptive skills) subscale. While many of these measures have been used with autistic samples, they were not developed with autistic samples, and most lack validation with autistic people.
The Emotion Dysregulation Inventory (EDI) was originally developed to measure the full range of ED in autism. The initial EDI assessed two constructs: (i) Reactivity, which captures rapidly escalating, intense, poorly regulated negative affect; and (ii) Dysphoria, which captures anhedonia, unease, and sadness (Mazefsky et al., 2018a, 2018b). The two-factor structure of the EDI was replicated and validated in a sample of proxy reporters for 1,000 community youth aged 6 to 17 years who were matched to the US Census on demographic characteristics (Mazefsky, Yu, & Pilkonis, 2021). The EDI was later modified for use in early childhood, ages 2–5 years (Day, Northrup, & Mazefsky, 2023). Again, psychometric evaluation supported the two-factor structure, with Reactivity and Dysphoria scales, in both a sample of young children with autism and other developmental disabilities and a general early childhood sample (Day et al., 2024). Together, the two factors of the EDI capture a fuller picture of ED: Reactivity captures challenges in downregulating negative emotions (intense, labile, and long-lasting), while Dysphoria captures difficulties with upregulating positive emotions. Because previous research has observed associations between ED and multiple forms of psychopathology in both autistic and nonautistic samples (Antuña-Camblor et al., 2024; McDonald, Cargill, Khawar, & Kang, 2024), the EDI subscales, as opposed to broader measures of ED or measures of specific strategy use, can particularly benefit research into the processes underlying different forms of psychopathology.
Since its initial development, the EDI has been adopted on six continents, in over 50 countries, and across the United States. The EDI has been recommended for clinical measurement of irritability (Carlson et al., 2022) and has shown utility as an outcome measure in clinical trials in autism and other populations (e.g. Groves et al., 2022; Pickard, Maddox, Boles, & Reaven, 2024; Shaffer et al., 2023; Smith et al., 2023; White, Conner, Beck, & Mazefsky, 2025). It has been applied to advance mechanistic understanding of ED in autistic and other samples, for example in conjunction with psychophysiology (Chiu et al., 2024; Greenlee et al., 2024; Reisinger et al., 2024) and neurostimulation (Ni et al., 2024). It has been used across multiple clinical settings, including psychiatric inpatient (Taylor, Reynolds, & Siegel, 2021; Wieckowski et al., 2020), partial hospital (Kennedy et al., 2024, 2023), and orthopedic surgery (Criss et al., 2025) settings.
The EDI has also been used in research on preschool-aged children (Northrup et al., 2024; Sivathasan et al., 2024), minimally verbal individuals (Ni et al., 2024; Plesa Skwerer et al., 2019), and autistic individuals with intellectual disability (Álvarez-Couto, García-Villamisar, & del Pozo, 2024; Ferguson et al., 2021; Ni et al., 2024).
Despite the contributions and utility of the proxy-report EDI for both research and clinical purposes, the lack of a self-report option has been a significant limitation. Relying only on proxy report limits the ability to study ED and develop evidence-based treatments in adults, and it fails to capture the first-person perspective on ED. Self-report also allows for multi-reporter assessment, strengthening the information that clinicians and researchers can use to understand and treat ED (Deshpande, Rajan, Sudeepthi, & Abdul Nazir, 2011; Nicolaidis et al., 2020; De Los Reyes et al., 2015). The overall goal of this project was to develop a precise and efficient self-report questionnaire of emotion dysregulation that could be used in autism and other populations. We first aimed to adapt the EDI to create a self-report version for adolescents and adults, which included conducting cognitive interviews with autistic adolescents and adults and incorporating their feedback into the instructions, items, and response options. We then performed psychometric evaluation and preliminary tests of validity in both an autism/intellectual and developmental disabilities sample and a general population sample, with the aims of (1) determining the factor structure, (2) evaluating items to identify the most sensitive and robust items, and (3) examining convergent and divergent validity and test–retest reliability.
Study 1: cognitive interviews
Methods
Item pool development
The development of the original proxy-report EDI began with a comprehensive literature review and the formulation of a conceptual model of ED to guide item generation (Mazefsky et al., 2018a). For the initial EDI-SR item pool, we included the final 30 items from the proxy-report EDI. We then added 16 items to ensure adequate coverage of Reactivity and Dysphoria manifestations relevant to adolescents and adults, and edited items to enhance comprehension. In all, 46 items were used in cognitive interviews, in which autistic adolescents and adults reviewed and provided feedback on the measure. The cognitive interview process both aligns with PROMIS measure development guidelines (PROMIS, 2013) and allows autistic people to consult on and improve the measure in accordance with participatory research approaches (Le Cunff et al., 2023; Nicolaidis et al., 2019, 2020).
Cognitive interviews
Participants. Participants for EDI-SR cognitive interviews were recruited from past studies and through local advertising. To be eligible, participants had to have either a prior autism diagnosis or a score at or above threshold on the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al., 2012). Twenty-four autistic adolescents and adults completed cognitive interviews. The seven adolescents (four male, three female) ranged from 11 to 17 years old (mean = 14.4; SD = 1.90). The 17 adults (7 male, 10 female) ranged from 20 to 53 years old (mean = 33.0; SD = 10.5). The majority of participants identified as White (n = 22; 91.7%; Black = 1; Missing = 1), and 8.3% identified as Hispanic/Latine. Across the entire sample, two-subtest Full Scale IQ (FSIQ-2) scores averaged 107.88 (SD = 16.90; range 81–135). Reading level, as assessed by the WRAT-5 (Wilkinson & Robertson, 2017), ranged from an 8th-grade equivalent to college level (mean grade equivalent = 11.7; SD = 1.5; 6 missing).
Procedures. For both cognitive interviews and other assessments, participants could choose to participate in person or online via HIPAA-compliant Zoom. Participants reviewed all aspects of the EDI-SR, including the instructions, response sets, and each proposed item. Trained research staff used a structured PowerPoint presentation and prompted participants to read each instruction, response set, and item. For each item, participants were asked to select a response and to explain aloud how they arrived at that response. Staff were trained to minimize bias; for example, they were instructed to avoid leading questions and reflective/summarizing responses and to use only open-ended questions (e.g. ‘how did you decide on your answer?’ instead of ‘was this an easy question to answer?’). Additionally, participants were encouraged to provide general feedback on whether items were applicable to their life experiences, whether items were vague or confusing, and whether any other ED-related concepts were missing. Total interview time ranged from 25 minutes to 3 hours (mean = 70.91 minutes; SD = 42.40 minutes) and was often broken into multiple 1-hour sessions based on participant preference. After cognitive interviews were completed, all item-level feedback was aggregated, and the research team reviewed each item, the response options, and the instructions and made revisions as warranted.
Results
Response options
Participant feedback resulted in changes to the response options. In the proxy-report EDI, the 5-point Likert scale was:
- 0 = ‘Not at All – Never Happens’
- 1 = ‘Mild – Present occasionally or does not cause too much of a problem’
- 2 = ‘Moderate – Happens less than half of the time or causes some problems’
- 3 = ‘Severe – Happens at least half of the time or substantially interferes’
- 4 = ‘Very Severe – Almost always happens or causes a serious problem’
However, in self-report cognitive interviews, participants were hesitant to characterize their reactions as ‘Very Severe’ even when the response description was appropriate. Thus, response options were changed to remove the global labels for severity. Instead, ‘Level 0’ to ‘Level 4’ were substituted, followed by the same descriptors for frequency or interference:
- 0 = ‘Level 0 – Never Happens’
- 1 = ‘Level 1 – Happens occasionally or does not cause too much of a problem’
- 2 = ‘Level 2 – Happens less than half of the time or causes some problems’
- 3 = ‘Level 3 – Happens at least half of the time or substantially interferes’
- 4 = ‘Level 4 – Almost always happens or causes a serious problem’
Item level feedback
The majority of items did not require editing following cognitive interviews. Overall, we simplified wording and removed ‘double-barreled’ items based on feedback and our prior experience conducting interviews with autistic individuals (see Supplementary Table 1 for more information). For example, we split proxy-report item #38, ‘Tense or agitated and unable to relax’, into two items: ‘I was tense or agitated’ and ‘My emotions made it hard for me to relax’. Two items, both from the original proxy-report EDI, were removed: #50, ‘I became upset without a clear reason’, and #52, ‘I had mood swings’. Item #50 was removed because many participants disliked the ‘without a clear reason’ wording, explaining that they felt their behavior always had a valid reason. Item #52 was removed because of confusion about, or dislike of, the ‘mood swings’ wording.
We also added alternately worded items to test in the large-scale analyses; for example, in addition to item #64, ‘I felt uneasy through the day’, we added ‘I felt tense through the day’, and in addition to item #34, ‘My emotions went from 0 to 100 instantly’, we added ‘My emotions went from calm to out of control quickly’. We also added ‘My emotions made it hard for me to do what I needed to do’ to capture emotional responses that may not be visible to others.
In summary, 3 potential items were removed and 6 were added, resulting in a total of 49 items for psychometric testing.
Study 2: psychometric analyses
Methods
Participants
Participants included autistic individuals, individuals with other intellectual and developmental disabilities, and a community-based comparison group of adolescents and adults (see Table 1; see Supplementary Table 2 for more information on the adult sample). The minimum age was 11 years, with no upper limit. All participants completed the study between October 2022 and August 2023. An autism-only portion of the sample was recruited through the Simons Powering Autism Research (SPARK) registry. Inclusion in the SPARK registry requires a self-reported community diagnosis of autism spectrum disorder. Previous research has found that 98.8% of a SPARK subsample had autism diagnoses confirmed through medical records (Fombonne, Coppola, Mastel, & O’Roak, 2022).
Table 1. Demographic information
Note: IDD = intellectual and developmental disabilities.
ᵃ Participants could select multiple developmental disability diagnoses.
In addition to the SPARK registry, we recruited autistic adolescents and adults, as well as adolescents and adults with other intellectual and developmental disabilities, through many local and national autism and intellectual and developmental disabilities organizations. These organizations included: Association for Autism and Neurodiversity; Autism Services, Education, Resources, and Training Collaborative of Pennsylvania; Autism Connection of Pennsylvania; Autism Science Foundation; Autistic Women & Non-binary Network; Autistics Association of Greater Washington; National Fragile X Foundation; and Williams Syndrome Registry.
To recruit the community comparison sample, we partnered with YouGov, an international polling company. YouGov initially contacted 589 parents of 11–17-year-olds and 624 adult respondents; the final sample of 1,000 was selected to match the US population based on the 2019 American Community Survey. Participants were weighted using propensity scores for age, gender, race/ethnicity, years of education, and US region. YouGov participants were asked about autism and other intellectual and developmental disability diagnoses (see Table 1). Because this comparison sample was intended to represent the general community rather than to serve as a nonautistic control sample, all participants who reported autism or other intellectual and developmental disability diagnoses were retained.
Overall, the autism/intellectual and developmental disabilities self-report sample consisted of 996 participants; 325 adolescents aged 11–17 years old, and 671 adults. The community comparison sample consisted of 1,000 participants; 500 adolescents aged 11–17 years old, and 500 adults.
Measures
Achenbach System of Empirically Based Assessment (ASEBA; Achenbach, 2009). The ASEBA includes the Adult Self-Report (ASR) and the Youth Self-Report (YSR). Each item is rated on a 0–2 scale (‘Not True’, ‘Somewhat or Sometimes True’, ‘Very True or Often True’) over the past 6 months. For this study, the Withdrawn/Depressed and Anxious/Depressed scales were used in correlations and concurrent calibrations with the EDI-SR subscales. Additionally, the Depressive Problems, Anxiety Problems, and Externalizing Problems scales were used in correlations within the autism/intellectual and developmental disability sample to examine convergent/divergent validity; because of time constraints in YouGov, self-report ASEBA measures were not administered in the community comparison sample. Internal reliabilities were good for the ASR Withdrawn (α = 0.81), YSR Withdrawn/Depressed (α = 0.86), Anxious/Depressed (ASR α = 0.91, YSR α = 0.86), Depressive Problems (ASR α = 0.87, YSR α = 0.80), Anxiety Problems (ASR α = 0.82, YSR α = 0.79), and Externalizing Problems (ASR α = 0.88, YSR α = 0.89) subscales.
Demographics questionnaire. A demographic questionnaire was developed for this study to obtain information on participant age, sex at birth, gender, education level and history, autism and intellectual and developmental disability diagnosis, current and history of psychiatric treatment, and verbal ability.
Emotion Dysregulation Inventory – Self-Report (EDI-SR). The EDI-SR item pool used for psychometric analyses consisted of the 49 items described above. Items were rated on the 5-point scale (‘Level 0’ to ‘Level 4’) over the past 7 days. The EDI-SR forms and scoring are available for free via the measures request form at www.reaact.pitt.edu.
Affective Reactivity Index – Self-Report (ARI; Stringaris et al., 2012). The ARI is a widely used measure of irritability, defined as temper outbursts or becoming easily annoyed or angered. The self-report ARI consists of six items rated on a 3-point scale from ‘not true’ to ‘certainly true’ over the past 6 months and yields a total irritability score. Internal reliability was good in both the autism/intellectual and developmental disabilities sample (α = 0.89) and the community comparison sample (α = 0.78).
Emotion Regulation Questionnaire for Children and Adolescents (ERQ-CA; Gullone, Hughes, King, & Tonge, 2010; Gullone & Taffe, 2012). The ERQ-CA is a modified version of the adult ERQ (Gross & John, 2003) that uses simplified wording and a 5-point Likert scale. We also used the ERQ-CA with the adults in our samples, for consistency across ages and because its language is simpler. Like the original ERQ, it consists of six Reappraisal items and four Suppression items that are rated according to what ‘seems most true for you’ (no timeframe). Both subscales demonstrated good internal consistency in the autism/intellectual and developmental disabilities sample (Reappraisal α = 0.86; Suppression α = 0.75) and the community comparison sample (Reappraisal α = 0.87; Suppression α = 0.77).
Range and Differentiation of Emotional Experience Scale (RDEES; Kang & Shaver, 2004). The RDEES consists of 14 items rated on a 5-point Likert scale (1 = ‘it does not describe me very well’ to 5 = ‘describes me very well’) without a specific timeframe. It yields two subscales: (i) Differentiation, which assesses how much one distinguishes between emotions; and (ii) Range, which assesses the breadth of emotional intensity that one experiences. Because of time constraints in YouGov, the RDEES was administered only to the autism/intellectual and developmental disabilities sample; both Differentiation (α = 0.91) and Range (α = 0.84) showed good reliability.
PROMIS Anxiety and Depression – Short Forms (Irwin et al., 2010; Pilkonis et al., 2011). The PROMIS four-item short forms are self- and caregiver-report assessments of depression and anxiety. Each item is rated on a 5-point Likert scale over the past 7 days. In the current study, adults (18+ years) completed the self-report versions, and caregivers of adolescents (11–17 years) completed the proxy-report versions. Reliability of the PROMIS depression scale was excellent in both the autism/intellectual and developmental disabilities and community comparison samples (α = 0.90 and α = 0.94, respectively). Reliability of the PROMIS anxiety scale was similarly high in both groups (α = 0.89 and α = 0.87, respectively).
Analyses
Factor analysis. We first examined missingness in the EDI-SR data. Among the 49 EDI-SR items, 64.4% of the clinical sample missed one item, 4.4% missed two items, and <1% missed three or more items. There were no missing data in the comparison sample. We also examined item frequency distributions for potential sparse cells, because response categories endorsed by <5% of respondents may affect factor analysis and item response parameter estimates (Thissen, 2003). For the clinical sample, 15 items had <5% of the sample endorsing the highest response category (‘Level 4’), and 5 of these 15 items had <5% endorsing the two highest response categories combined (‘Level 3’ and ‘Level 4’). For the comparison sample, 13 items had <5% of the sample endorsing the two highest response categories. Because only one item had a sparse cell with <10 observations (‘I physically attacked people’, for which only seven participants in the clinical sample endorsed the highest response category), we chose to keep all items for factor analysis. The autism and intellectual and developmental disabilities sample was randomly split for exploratory factor analysis (EFA; n = 512) and confirmatory factor analysis (CFA; n = 484). EFAs and CFAs were conducted using Mplus 6.2 (Muthén & Muthén, 2007), with promax rotation for the EFAs. In the EFAs, we examined factor loadings, scree plots, and eigenvalues, focusing on the ratios of eigenvalues and the relative proportions of variance accounted for by the factors, and we emphasized the magnitude of factor loadings. Individual items were dropped because of factor loadings <0.45, cross-loadings between factors, or clinical judgment regarding content validity (Comrey & Lee, 1992; Tabachnick & Fidell, 2001).
Analyses were first conducted in the autism/intellectual and developmental disabilities sample and then repeated in the community comparison sample. Once the best-fitting EFA solution was determined, CFA was run to confirm the structure.
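The sparse-cell screen and the random EFA/CFA split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the 0–4 coding of ‘Level 0’ to ‘Level 4’, the use of `None` for a skipped item, and the fixed seed are all hypothetical.

```python
import random
from collections import Counter


def sparse_top_categories(responses, n_categories=5, threshold=0.05):
    """Screen one item for sparse upper response categories.

    `responses` holds integer ratings coded 0..n_categories-1 (an assumed
    coding of 'Level 0'..'Level 4'); None marks a skipped item and is
    excluded from the denominator.
    """
    observed = [r for r in responses if r is not None]
    n = len(observed)
    counts = Counter(observed)
    top = counts[n_categories - 1]
    top_two = top + counts[n_categories - 2]
    return {
        "top_count": top,                           # raw count, for a <10 rule
        "top_sparse": top / n < threshold,          # highest category alone
        "top_two_sparse": top_two / n < threshold,  # two highest combined
    }


def random_split(ids, n_first, seed=2023):
    """Randomly partition respondent IDs into two disjoint subsamples
    (e.g. one for EFA and one for CFA). The seed value is arbitrary."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_first], shuffled[n_first:]
```

For a hypothetical item with 100 non-missing responses, of which one is ‘Level 4’ and three are ‘Level 3’, both flags are set (1% and 4%), and `random_split(range(996), 512)` returns disjoint subsamples of 512 and 484 respondents, the sizes used above.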
Item response theory (IRT) analysis. After completing EFA and CFA, we evaluated item-level properties. Consistent with the original EDI, we used a two-parameter graded response model (GRM; Samejima, 1969). The GRM has one slope parameter and n − 1 threshold parameters for each item, where n is the number of response categories. The slope parameter measures item discrimination, that is, how well the item differentiates higher from lower levels of severity (θ in IRT terms); more useful items have larger slope parameters. Threshold parameters measure item difficulty, that is, how easy or difficult it is to endorse the different response options for an item. For example, the first threshold parameter for an item indicates the point along the θ scale of severity above which a respondent is more likely than not to endorse ‘Level 1’ or higher rather than ‘Level 0’. Items remaining in the pool for each factor were calibrated with the two-parameter GRM using IRTPRO 3.1 (Cai, Thissen, & du Toit, 2011). Marginal chi-square local dependence (LD) statistics identified redundant items, that is, items with high LD (residual correlations) with other items after controlling for θ, and one item of each LD pair was removed (with rare exceptions when the content of both items was judged to be clinically valuable and both were therefore retained).
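Under the two-parameter GRM, the probability of responding in category k or higher is a logistic function of a(θ − b_k), and each category's probability is the difference between adjacent cumulative curves. A minimal Python sketch follows; the function name and the parameter values are hypothetical and are not the EDI-SR estimates reported in Tables 2 and 3.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model for one item.

    theta: latent severity; a: slope (discrimination);
    thresholds: ordered b_1 < ... < b_{n-1} for n response categories.
    Returns P(X = k | theta) for k = 0..n-1.
    """
    # P*(X >= k): fixed at 1 for k = 0 and at 0 beyond the top category.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the gap between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 5-category item: slope 2.0, thresholds for Level 1..Level 4.
probs = grm_category_probs(theta=0.5, a=2.0, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

At θ = b_1 the cumulative probability of ‘Level 1’ or higher is exactly 0.5, which is why b_1 marks the transition point described above; a larger slope makes that transition sharper.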
Differential item functioning (DIF) analysis. DIF occurs when participant characteristics, such as sex or age, affect measurement. DIF analyses flag an item if it is more or less difficult to endorse, or more or less discriminating, for different subgroups after controlling for θ. We conducted DIF analyses for age (<18 vs. ≥18 years) and sex, using the Wald test (Lord, 1977) with the improved Supplemented EM algorithm (Cai, 2008) embedded in IRTPRO 3.1 (Cai, Thissen, & du Toit, 2011). Items with significant DIF (p < .01) were reviewed further for potential elimination.
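In practice, the Wald statistic for DIF is computed from the full parameter covariance matrix (here obtained with the Supplemented EM algorithm) and typically tests slope and threshold differences jointly. The single-parameter version below is only a simplified illustration of the logic: it assumes two independent group estimates of one item parameter with known standard errors, and the function name and estimates are hypothetical.

```python
import math


def wald_dif_1df(est_ref, se_ref, est_focal, se_focal):
    """Wald chi-square (1 df) for a between-group difference in one item
    parameter, assuming the two group estimates are independent."""
    chi2 = (est_ref - est_focal) ** 2 / (se_ref ** 2 + se_focal ** 2)
    # Upper tail of a 1-df chi-square: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical slopes for adolescents vs. adults on one item.
chi2, p = wald_dif_1df(2.0, 0.2, 1.0, 0.2)
flagged = p < 0.01  # the p < .01 review criterion
```

Here the difference of 1.0 with a combined variance of 0.08 gives χ² = 12.5, well past the p < .01 cutoff, so the item would be flagged for review.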
Short-form item selection. Because of the length of the Reactivity scale, computerized adaptive testing (CAT) simulations were conducted to select items for a static Reactivity short form. PROMIS guidance defines a short form as 4–10 items (‘PROMIS introduction’, 2021); thus, Dysphoria did not require a short form. The simulations were performed using the Firestar program (Choi, 2009). The minimum number of items administered was set to 7, and the maximum was the total number of items in the full item bank.
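The logic of a CAT simulation can be sketched as follows: at each step, administer the unused item with the most Fisher information at the current θ estimate, update θ by expected a posteriori (EAP) estimation, and stop once a minimum number of items has been given and the posterior standard deviation falls below a target, or when the bank is exhausted. This is not Firestar's implementation. The standard normal prior, the θ grid, the 0.3 standard-error target, all function names, and any item bank passed in are assumptions for illustration; only the minimum of 7 items and the full-bank maximum come from the procedure described above. Tallying how often each item is selected across many simulated respondents is one way to identify candidates for a static short form.

```python
import math
import random
from collections import Counter

GRID = [-4.0 + 0.1 * i for i in range(81)]  # quadrature points for theta (assumed)


def cumulative(theta, a, bs):
    """GRM cumulative probabilities P*(X >= k), padded with 1 and 0."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]


def item_information(theta, a, bs):
    """Fisher information of one GRM item at theta."""
    ps = cumulative(theta, a, bs)
    dps = [a * p * (1.0 - p) for p in ps]  # derivatives; zero at the padded ends
    info = 0.0
    for k in range(len(bs) + 1):
        pk = ps[k] - ps[k + 1]
        if pk > 1e-12:
            info += (dps[k] - dps[k + 1]) ** 2 / pk
    return info


def eap(responses, bank):
    """EAP estimate and posterior SD under an assumed standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for j, x in responses.items():
            ps = cumulative(t, *bank[j])
            w *= ps[x] - ps[x + 1]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def draw_response(theta, a, bs, rng):
    """Simulate a 0..n-1 rating: X >= k exactly when u < P*(X >= k)."""
    u = rng.random()
    return sum(1 for p in cumulative(theta, a, bs)[1:-1] if u < p)


def simulate_cat(true_theta, bank, min_items=7, se_target=0.3, seed=0):
    """Run one simulated CAT; bank is a list of (slope, thresholds) tuples."""
    rng = random.Random(seed)
    responses, theta, se = {}, 0.0, 1.0
    while len(responses) < len(bank):
        unused = [j for j in range(len(bank)) if j not in responses]
        nxt = max(unused, key=lambda j: item_information(theta, *bank[j]))
        responses[nxt] = draw_response(true_theta, *bank[nxt], rng)
        theta, se = eap(responses, bank)
        if len(responses) >= min_items and se <= se_target:
            break
    return list(responses), theta, se


def item_usage(bank, true_thetas, **cat_options):
    """Count how often each item is administered across simulated respondents."""
    usage = Counter()
    for i, t in enumerate(true_thetas):
        administered, _, _ = simulate_cat(t, bank, seed=i, **cat_options)
        usage.update(administered)
    return usage
```

Maximum-information selection is a common default in CAT research; other selection rules (for example, information weighted over the posterior) would change which items are chosen most often.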
Convergent and divergent validity and internal consistency. We examined correlations between EDI-SR subscale scores and ARI, ERQ-CA, ASEBA (Depressive Problems, Anxiety Problems, and Externalizing Problems), and RDEES scores. We also calculated Cronbach’s alpha for each subscale.
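Cronbach's alpha compares the sum of the item variances with the variance of the total score, α = k/(k − 1) × (1 − sum of item variances / variance of total score), and the convergent/divergent validity checks rest on Pearson correlations. A minimal Python sketch, assuming complete data with one row per respondent; the function names and toy data are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data: one row per respondent, one
    column per item. Sample variances are used throughout; the ratio is
    the same if population variances are used consistently."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


alpha = cronbach_alpha([[1, 2], [2, 3], [3, 3]])  # 6/7 for this toy data
```

With missing item responses, as in the clinical sample, one would first need a rule for handling incomplete rows; the sketch assumes none are present.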
Concurrent calibrations with ARI and ASEBA scales. Concurrent calibration estimates item parameters for multiple measures on the same latent trait scale. For this study, we co-calibrated the EDI-SR with commonly used measures of related constructs (i.e. the EDI-SR Reactivity scale with the ARI, and the EDI-SR Dysphoria scale with the ASR/YSR Anxious/Depressed and Withdrawn/Depressed scales).
Test–retest reliability. Paired t-tests were used to compare baseline and 4-week retest scores in a subsample of the autism/intellectual and developmental disability group.
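The paired t-test works on each participant's difference score between the two administrations; a non-significant mean difference is consistent with no systematic change over the retest interval, although it does not by itself quantify rank-order stability. A minimal sketch with a hypothetical function name and hypothetical scores (the p-value would come from a t distribution with n − 1 degrees of freedom, which the Python standard library does not provide):

```python
import math
from statistics import mean, stdev


def paired_t(baseline, retest):
    """Paired t statistic and degrees of freedom for two administrations
    of the same scale to the same participants (same order in both lists).
    Undefined if every difference score is identical (zero SD)."""
    diffs = [b - a for a, b in zip(baseline, retest)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t, df = paired_t([10, 12, 14, 16], [11, 12, 15, 18])  # hypothetical scores
```

Here the difference scores are 1, 0, 1, and 2, giving a mean change of 1 with SD √(2/3), so t = √6 ≈ 2.45 on 3 degrees of freedom.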
Results
Table 1 reports demographics and descriptive characteristics of the autism/intellectual and developmental disabilities group and the community comparison group. These groups are presented separately to mirror the analyses. All analyses were conducted on the autism/intellectual and developmental disabilities group (i.e. clinical group) and community comparison group (i.e. general group) separately, and decisions were made after examining each group.
Factor analysis
Before conducting EFAs/CFAs using the full sample, we first ran EFAs separately by age groups (11–17 years vs. 18+ years) for both the clinical and community comparison samples to determine whether the age groups had different factor structures. After comparing the eigenvalue plots, factor structures, and factor loadings between the age groups, we found similar EFA results. Thus, we chose to run subsequent factor analyses using the full, age-combined samples.
Each group was split into two subsamples for exploratory factor analysis (EFA; clinical group n = 512; general group n = 502) and confirmatory factor analysis (CFA; clinical group n = 484; general group n = 498). Using EFA, 1- to 3-factor solutions were generated for each group. Based on the scree plots, eigenvalues, and clinical interpretability of the factors, the general group showed a strong 1-factor solution, whereas the 3-factor solution in the clinical group was more informative: the 10 Dysphoria items loaded on the third factor and were conceptually distinct from the remaining items, which loaded on the first and second factors. Broadly, items loading on the first and second factors (with some cross-loadings) were related to Reactivity, whereas items loading on the third factor were related to Dysphoria, consistent with the original EDI. To further evaluate this distinction between Reactivity- and Dysphoria-related items in the general group, we compared a lumped approach (i.e. a single-factor solution on all 49 items) with a split approach (i.e. the 10 items that loaded on the third factor in the clinical group were analyzed separately from the remaining 39 items). In the general group, the split approach showed better fit indices than the lumped approach, with comparable factor loadings. Therefore, we took the split approach for further factor analysis in both the clinical and general samples; that is, the 10 Dysphoria-related items were analyzed separately from the remaining 39 items in subsequent CFAs.
Separate single-factor CFAs were conducted on the 39 Reactivity-related items in the autism/intellectual and developmental disabilities sample and in the general sample to confirm unidimensionality. In the initial CFAs, all 39 items had factor loadings >0.60, including 31 items with loadings from 0.73 to 0.93. To improve fit, we removed the 8 items with smaller loadings, resulting in a 31-item scale. For the 31-item scale, CFI, TLI, and SRMR indicated acceptable fit (CFI = 0.94; TLI = 0.94; SRMR = 0.05), although RMSEA was elevated (0.10). Fit of the one-factor CFA in the community comparison group was similar or better (CFI = 0.96; TLI = 0.96; SRMR = 0.05; RMSEA = 0.06).
Next, separate CFAs were conducted on the 10 Dysphoria-related items in the clinical and general samples. Factor loadings on the 10 items ranged from 0.658 to 0.888. Fit indices for the clinical sample were strong (CFI = 0.97; TLI = 0.97; SRMR = 0.05; RMSEA = 0.06). In the community comparison group, CFI, TLI, and SRMR were similar (CFI = 0.98; TLI = 0.97; SRMR = 0.04), although RMSEA was higher (0.15). The 10 Dysphoria items were conceptually similar to the Dysphoria scale of the original proxy-report EDI, covering sadness, unease, and limited ability to experience positive affect.
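For readers less familiar with the fit indices reported above, the sketch below computes RMSEA, CFI, and TLI from model and baseline (independence-model) chi-square values using their standard textbook formulas. The chi-square values and sample size are hypothetical; software fitting ordinal CFAs with robust or scaled estimators may apply adjustments (or use N rather than N − 1), so reported values can differ. SRMR is omitted because it requires the observed and model-implied correlation matrices.

```python
import math

def rmsea(chi2_m, df_m, n):
    """Root mean square error of approximation for the target model."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to the baseline (independence) model."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)

# Hypothetical inputs: model chi2 = 300 on 100 df, baseline chi2 = 5000 on 120 df, N = 501.
print(round(rmsea(300, 100, 501), 3))          # 0.063
print(round(cfi(300, 100, 5000, 120), 3))      # 0.959
print(round(tli(300, 100, 5000, 120), 3))      # 0.951
```

Because RMSEA scales the excess chi-square by degrees of freedom and sample size, whereas CFI and TLI compare the model against a baseline model, the indices can diverge for the same model, as with the elevated RMSEA values reported above alongside high CFI and TLI.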
In summary, 8 Reactivity items were dropped following factor analysis, leaving 31 Reactivity and 10 Dysphoria items for IRT analyses.
IRT
IRT calibrations were conducted separately in the two groups using the two-parameter graded response model (GRM) in IRTPRO 3.1. For Reactivity in the autism/intellectual and developmental disabilities sample, six items were removed: four due to age-related DIF, one due to both age-related DIF and local dependency, and one due to local dependency. In the community comparison sample, each of these items also showed local dependency or significant sex-related DIF, further justifying their removal. Thus, the final Reactivity scale included 25 items.
For the Dysphoria scale in the autism/intellectual and developmental disabilities sample, three items were removed: one because of a low discrimination parameter, one because of model misfit, and one because of both model misfit and local dependency. In the community comparison sample, the same items also showed local dependency. Thus, the final Dysphoria scale consisted of seven items. The final IRT parameters are reported in Tables 2 and 3.
Table 2. Item parameter estimates for Reactivity, in descending order of the slope parameter

Note: Superscript a denotes short form items. Column a displays the slope parameter (how well the item discriminates between respondents with low or high reactivity). Columns b1–b4 display threshold values for individual responses (low threshold values indicate that the item is sensitive to low severity levels and high threshold values indicate that the item is sensitive to high severity levels).
Table 3. Item parameter estimates for Dysphoria, in descending order of the slope parameter

Note: Column a displays the slope parameter (how well the item discriminates between respondents with low or high dysphoria). Columns b1–b4 display threshold values for individual responses (low threshold values indicate that the item is sensitive to low severity levels and high threshold values indicate that the item is sensitive to high severity levels).
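The table notes above describe the slope (a) and thresholds (b1–b4) of the graded response model. A minimal sketch of how these parameters map onto response probabilities follows, assuming the logistic parameterization without the 1.7 scaling constant and five ordered categories scored 0 to 4 (consistent with four thresholds). The item parameters are hypothetical and are not taken from Tables 2 or 3.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    a          -- slope (discrimination) parameter
    thresholds -- ordered thresholds b1 < b2 < ... < bm (gives m + 1 categories)
    """
    # Boundary curves: P(X >= k | theta) for k = 1..m.
    boundaries = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cum = [1.0] + boundaries + [0.0]
    # Probability of category k is the gap between adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def expected_item_score(theta, a, thresholds):
    """Expected response (0..m) at a given trait level."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical item (not from Tables 2 or 3): slope a = 2.5, thresholds b1-b4.
A, B = 2.5, [-0.5, 0.3, 1.0, 1.8]
for theta in (-1.0, 0.0, 1.0, 2.0):
    probs = grm_category_probs(theta, A, B)
    print(theta, [round(p, 3) for p in probs], round(expected_item_score(theta, A, B), 2))
```

At θ equal to a threshold bk, the probability of responding in category k or higher is exactly 0.5; a larger slope makes each boundary curve steeper, which is why high-a items separate respondents more sharply.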
The Flesch-Kincaid Grade Level of the final EDI-SR items was 4.2, equivalent to a US fourth-grade reading level.
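The Flesch-Kincaid Grade Level combines average sentence length and average syllables per word: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below applies that formula with a rough vowel-group syllable counter; the counter is an assumption for illustration, since readability tools use their own syllable rules and will not match it exactly. For illustration it scores the single EDI-SR item quoted in the short-form section.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups and drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level of a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

item = "My emotions went from calm to out of control quickly."
print(round(flesch_kincaid_grade(item), 2))   # 10 words, 14 syllables, 1 sentence -> 4.83
```

A grade level for a whole item set is computed over all item text together, so a single short sentence like this one is only a worked example of the arithmetic.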
Short-form item selection
To provide a static short form, we rank-ordered the Reactivity items on four criteria: discrimination parameters; the percentage of times each item would have been selected in a simulated seven-item CAT using our calibration sample; expected information under a standard normal distribution (mean 0, SD 1); and expected information under a wider normal distribution (mean 0, SD 1.5) (Choi et al., 2010). The simulations were performed using the Firestar program (Choi, 2009). We set the simulated test length to seven items to mirror the length of the original EDI short form. Based on the convergence of the four psychometric criteria, the content of candidate items, and the location parameters, we determined that a six-item short form was the best option for the EDI-SR. See Table 2 for the items selected. The Reactivity short form was highly correlated with the full form (r = 0.96). Three of the EDI-SR Reactivity short form items are the same as those on the proxy-report EDI short form, and the remaining three are similar in content (e.g. EDI-Proxy: ‘emotions go from 0 to 100 instantly’; EDI-SR: ‘my emotions went from calm to out of control quickly’). We did not develop a short form for Dysphoria because the final scale contained only seven items.
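Two of the short-form criteria above are expected item information under a N(0, 1) and a wider N(0, 1.5²) trait distribution. The sketch below shows one way to compute them for graded-response items: Fisher information is averaged over a grid of θ values weighted by the normal density. The item bank is hypothetical, the grid quadrature is a simplification, this is not the Firestar implementation, and the simulated-CAT criterion is not reproduced here.

```python
import math

def grm_item_information(theta, a, thresholds):
    """Fisher information of a logistic graded-response item at theta."""
    # Boundary curves P(X >= k), padded with 1 (k = 0) and 0 (k = m + 1).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p = cum[k] - cum[k + 1]
        # Derivative of the category probability; each boundary has slope a * P * (1 - P).
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 1e-12:
            info += dp * dp / p
    return info

def expected_information(a, thresholds, sd, step=0.01):
    """Item information averaged over a N(0, sd^2) trait distribution (grid quadrature)."""
    n = int(round(6 * sd / step))              # integrate over +/- 6 SD
    total = weight_sum = 0.0
    for i in range(-n, n + 1):
        theta = i * step
        w = math.exp(-0.5 * (theta / sd) ** 2)  # unnormalized normal density
        total += w * grm_item_information(theta, a, thresholds)
        weight_sum += w
    return total / weight_sum

# Hypothetical item bank: slope a and thresholds b1-b4 (illustrative values only).
bank = {
    "item_centered": (2.4, [-1.0, -0.3, 0.4, 1.1]),
    "item_severe":   (3.0, [0.8, 1.5, 2.2, 2.9]),
    "item_flat":     (1.3, [-0.5, 0.3, 1.0, 1.8]),
}
for sd in (1.0, 1.5):
    scores = {name: expected_information(a, b, sd) for name, (a, b) in bank.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(f"SD = {sd}:", [(name, round(scores[name], 2)) for name in ranking])
```

The wider distribution puts more weight on extreme θ values, so items with high thresholds gain relative to items centered near the mean; this is one reason the two expected-information criteria can rank items differently.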
Concurrent calibrations
We conducted concurrent calibrations of the Reactivity short form with the ARI and of the Dysphoria scale with the Anxious/Depressed and Withdrawn/Depressed scales from the ASR and YSR. Figure 1 displays the test information curves from the autism/intellectual and developmental disabilities sample. Both the Reactivity short form and the Dysphoria scale provided more test information than the comparison measures, even where the EDI scales had fewer items.
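Test information curves such as those in Figure 1 are the sum of item information across a scale's items, so a short scale of highly discriminating items can carry more information than a longer scale of weaker items. The sketch below illustrates this with two hypothetical scales (six items with a = 3.0 vs. ten items with a = 1.5, identical thresholds); the parameters are invented, the standard error is approximated as 1/√I(θ), and the concurrent calibration itself (placing items from different measures on one metric) is not reproduced.

```python
import math

def grm_item_information(theta, a, thresholds):
    """Fisher information of a logistic graded-response item at theta."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 1e-12:
            info += dp * dp / p
    return info

def scale_information(theta, items):
    """Test information: the sum of item information across the scale's items."""
    return sum(grm_item_information(theta, a, b) for a, b in items)

thresholds = [-0.5, 0.3, 1.0, 1.8]
short_scale = [(3.0, thresholds)] * 6    # hypothetical: 6 highly discriminating items
long_scale = [(1.5, thresholds)] * 10    # hypothetical: 10 less discriminating items

for theta in (-1.0, 0.0, 0.5, 1.0, 2.0):
    i_short = scale_information(theta, short_scale)
    i_long = scale_information(theta, long_scale)
    # Standard error of measurement is approximately 1 / sqrt(information).
    print(f"theta={theta:+.1f}  short: I={i_short:.2f} SE={1 / math.sqrt(i_short):.2f}  "
          f"long: I={i_long:.2f} SE={1 / math.sqrt(i_long):.2f}")
```

Because information grows with the square of the slope near an item's thresholds, adding weak items raises test information more slowly than replacing them with stronger ones.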
Figure 1. Concurrent calibrations of Reactivity and Dysphoria with related measures in the autism/IDD sample.

Convergent and divergent validity, internal consistency
Correlations with related measures followed the expected patterns. Both Reactivity and Dysphoria scores were significantly correlated with other ER subscales (ERQ-CA), and these correlations were larger than those with measures of emotion recognition (RDEES subscales). Reactivity correlated more strongly and positively with the ARI than Dysphoria did. Similarly, ASEBA self-report Depressive Problems and Anxiety Problems correlated more strongly and positively with Dysphoria, whereas Externalizing Problems correlated more strongly and positively with Reactivity (Table 4).
Table 4. Correlations between EDI scores and related measures

Note: ***p < .001; **p < .01; *p < .05.
In the autism/intellectual and developmental disabilities sample, internal consistency was excellent (Reactivity α = 0.97; Reactivity short form α = 0.92; Dysphoria α = 0.92). Internal consistency was similarly high in the community comparison group (Reactivity α = 0.98; Reactivity short form α = 0.93; Dysphoria α = 0.91).
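Coefficient alpha, reported above, is α = k/(k − 1) × (1 − Σ var(item i) / var(total score)), where k is the number of items. A minimal sketch using only the Python standard library and tiny hypothetical response matrices (not study data) follows.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Coefficient alpha for a respondents-by-items matrix of scores."""
    k = len(responses[0])
    item_columns = list(zip(*responses))             # one tuple of scores per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical 4 respondents x 2 items (illustration only).
print(round(cronbach_alpha([[0, 1], [1, 1], [2, 3], [3, 3]]), 3))   # 0.941
```

Population and sample variances give the same alpha as long as the same one is used for items and totals, because the n/(n − 1) factor cancels in the ratio.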
Test–retest reliability
The stability of Reactivity, Reactivity Short Form, and Dysphoria scores was examined in a subset (n = 730) of the autism/intellectual and developmental disabilities sample who completed the EDI-SR again 4 weeks later. As expected, scores were highly and significantly correlated (Reactivity r = 0.86; Reactivity SF r = 0.84; Dysphoria r = 0.87, all p < .001).
Discussion
The current work documents the successful adaptation of the proxy-report Emotion Dysregulation Inventory (EDI) to self-report for ages 11 years through adulthood. As with the original proxy-report EDI, the EDI-SR items were developed, evaluated in cognitive interviews, and tested in psychometric analyses with 996 autistic individuals and individuals with intellectual and developmental disabilities and 1,000 individuals in a community comparison sample. The final EDI self-report (EDI-SR) consists of a 25-item Reactivity scale, a 7-item Dysphoria scale, and a 6-item Reactivity short form, offering options similar to those of the original proxy-report EDI (Mazefsky et al., 2018a, 2018b).
Overall, cognitive interviews indicated that only minimal changes to the proposed EDI-SR items were needed. The final EDI-SR reproduced the two-factor structure of the proxy-report EDI, with a Reactivity scale and a Dysphoria scale. Factor analyses did not support separate scales for adolescent and adult self-report. As with the original proxy-report EDI, both the Reactivity and Dysphoria scales had good psychometric properties, and preliminary results for test–retest reliability and convergent validity with legacy measures of comparable constructs were encouraging. Because both the proxy-report EDI and the EDI-SR generate Reactivity and Dysphoria scores, multi-reporter assessment (self and proxy) is now feasible in clinical and research settings.
The availability of the EDI-SR allows more research to focus on self-report and on ED in autistic adults. Treatment options for autistic adults are lacking compared to those for youth (Maddox et al., 2021; Pantazakos & Vanaken, 2023), even though autistic adults identify co-occurring mental health conditions as a top research priority (Benevides et al., 2020; Gotham et al., 2015). Autistic adults commonly present with mental health concerns in settings other than autism specialty clinics, so a measure suitable across populations is beneficial. Because proxy reports are less common among adults, the EDI-SR also extends the EDI to a broader age range. Additionally, the EDI-SR measures ED itself rather than focusing on emotion regulation strategies, unlike commonly used adult-focused measures such as the Difficulties in Emotion Regulation Scale (Gratz & Roemer, 2004) and the Emotion Regulation Questionnaire (Gross & John, 2003). Apart from the PROMIS measures, the ED and mental health measures we compared with the EDI-SR cover longer timeframes or ask participants to rate what is typical for them, which may explain differences observed in the convergent validity and co-calibration analyses. Furthermore, EDI-SR response options merge frequency and interference, which may also yield different responses than a measure focused on only one of these aspects. A measure of ED is particularly useful for understanding and tracking impairment and progress during intervention.
Importantly, the EDI-SR is also designed to allow change-sensitive assessment, which is needed for interventions targeting ED or interventions in which ED is a secondary outcome.
Limitations of the current study include reliance on online sampling and a sample that was mostly white and non-Hispanic/Latine (particularly in the autism/intellectual and developmental disabilities group). Only a portion of our sample included matched pairs (proxy and self-report), and understanding reporter differences was not the goal of this study. Nonetheless, there is a rich literature on reporter correspondence, and this is an interesting area for future research, particularly in adolescence, when reports from both informants are most commonly obtained, and in autism samples, given that communication of emotion may differ. More studies are also needed to examine whether self-reported ED provides unique or complementary information relative to proxy report, particularly in autistic samples. Future research should include more individuals with other intellectual and developmental disabilities who can self-report, as well as clinical samples beyond autism, to better understand how well the EDI-SR measures ED in other populations with elevated ED. Finally, research assessing the EDI-SR's sensitivity to change in interventions targeting ED is vital to ensure that it can be used effectively in clinical settings.
In sum, the newly developed EDI-SR provides a psychometrically strong questionnaire to measure reactivity and dysphoria in adolescents and adults. A change-sensitive self-report measure of ED is an important addition for research and clinical use, particularly for autistic adults, in whom ED research has been limited by measurement options. The ability to conduct multi-reporter assessment with the EDI-SR and proxy-report EDI should also encourage more research into reporter discrepancies and into how ED constructs such as reactivity and dysphoria influence outcomes.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0033291726103857.
Acknowledgments
The authors would like to thank the families who participated in this research. The authors are grateful to the staff at Down Syndrome Connect through the National Institute of Child Health and Human Development, the Global Prader-Willi Syndrome Registry, the Williams Syndrome Registry, the University of North Carolina’s Carolina Institute for Developmental Disabilities’ Fragile X Registry, and other registries for individuals with intellectual and developmental disabilities for their support during recruitment. The complete dataset is available through the NIMH Data Archive Collection C3514.
Funding statement
This work was supported by NICHD R01HD079512 to CM. Participant recruitment was also supported by an NIH grant to the University of Pittsburgh Clinical and Translational Science Institute (UL1TR001857). The authors are grateful to the families in SPARK, the SPARK clinical sites, and the SPARK staff. The authors appreciate obtaining access to recruit participants through SPARK research match on SFARI Base. The authors would like to thank YouGov for supporting the collection of the general population sample.
Competing interests
The authors declare none.
Ethical standard
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.