Introduction
Regarding mental health problems of children and adolescents, a distinction is often made between internalizing and externalizing disorders. Internalizing disorders encompass affective and anxiety disorders, whereas externalizing disorders include impulse control problems such as attention deficit hyperactivity disorder (ADHD) and conduct problems (CP; American Psychiatric Association, 2013). In early childhood, externalizing behaviors are often developmentally typical but may reach clinically relevant levels when they persist or intensify (Wakschlag et al., 2015). Externalizing symptoms, such as aggression and rule-breaking, vary in severity and reflect a dimensional continuum rather than discrete categories (Coghill & Sonuga-Barke, 2012; Forslund et al., 2016; Wakschlag et al., 2015).
Externalizing symptoms
Externalizing symptoms can significantly affect a child’s socioemotional development and academic performance (Brennan et al., 2012; Breslau et al., 2009; Metcalfe et al., 2013). Clinically relevant externalizing disorders are associated with considerable stress and challenges for children and their families. The symptoms appear early and tend to persist (Erskine et al., 2016; McElroy et al., 2018; Ravens-Sieberer et al., 2007). Consistent with the increasing focus on a dimensional perspective (Hirjak et al., 2021), a complex interaction between interdependent individual vulnerabilities (e.g., genetic, epigenetic) and equally interdependent contextual risk factors (e.g., harsh parenting, bullying, deviant or criminal environment) is assumed to affect developmental trajectories of mental health symptoms in minors (Beauchaine & McNulty, 2013). Early temperament can predict later onset of those symptoms (Golm & Brandt, 2023). To understand why some children develop externalizing problems while others do not, early-emerging temperament traits provide a key explanatory framework.
Temperament
Temperament, defined by individual differences in emotional, motor, and attentional reactivity and self-regulation, plays a critical role in shaping a child’s behavioral responses to the environment (Rothbart et al., 2007; Rothbart & Bates, 2006). Reactive temperament refers to early-emerging, biologically based differences in how strongly and how quickly individuals respond to positive or negative stimulation, reflecting the intensity and threshold of emotional, behavioral, and physiological responses (Rothbart & Derryberry, 1981; Rothbart & Rueda, 2005).
Three broad components are central: Negative Affect (NA) and Surgency, which represent reactive dimensions, and Effortful Control (EC), which denotes the self-regulatory capacity to modulate these reactive tendencies through attention and inhibition. Because temperament is both biologically grounded and shaped by experience, it shows substantial developmental continuity, especially for reactive traits such as NA and Surgency, and greater plasticity for EC (Rothbart & Rueda, 2005).
Developmental stability
Developmental stability can be described at complementary levels that differ in focus and timescale. Mean-level stability refers to changes in the average level of a characteristic within a group over time, indicating whether a trait becomes generally stronger or weaker with age (Caspi et al., 2005). Rank-order stability describes the extent to which individuals maintain their relative position compared to others within the group, even when average levels change (Caspi & Roberts, 2001). Beyond these interindividual perspectives, a more dynamic form of within-person stability captures how individuals tend to return toward or deviate from their own typical level over time, reflecting an ongoing intraindividual regulatory process (Grimm et al., 2012; McArdle, 2009; Nesselroade & Ram, 2004). Temperament is thus both stable and malleable, reflecting ongoing interactions between maturation and experience (Putnam et al., 2008; Rothbart & Rueda, 2005). The following sections review evidence for these dynamics in NA, Surgency, and EC.
Reactive temperament: negative affect
NA reflects biologically rooted but contextually shaped reactivity to negative or punishing stimuli (Rothbart & Bates, 2006). Longitudinal studies show moderate-to-high rank-order stability and small but systematic mean-level changes during early childhood. Bornstein et al. (2015) found moderate rank-order stability and curvilinear mean-level change, with NA peaking in toddlerhood and slightly declining afterward. Schmidt et al. (2025) reported comparable results in a large German sample, confirming both rank-order stability and age-related mean-level decreases between ages 3 and 6. These findings align with the view that NA reflects a reactive disposition that remains relatively stable in individual ranking but can vary in intensity across developmental stages (Caspi et al., 2005; Caspi & Roberts, 2001).
Cross-sectionally, Heinze et al. (2025) found that NA was positively associated with ADHD and CP, confirming the concurrent link between reactive temperament and externalizing behavior in preschool age. Wichstrøm et al. (2018) demonstrated that higher NA at age 4 predicted increases in ADHD and CP symptoms from preschool to middle childhood. Kostyrka-Allchorne et al. (2020) observed a similar effect, with early NA forecasting internalizing and externalizing symptoms at age 7, particularly in children with low self-regulation. Gartstein et al. (2012) likewise found that NA in toddlerhood predicted later behavior problems, especially when EC was low. More recent evidence extends this pattern into adolescence. Harris et al. (2025) reported that NA at age 4 predicted a general psychopathology factor, as well as later externalizing problems at age 15, underlining the long-term relevance of early reactivity.
NA is therefore best conceptualized as a relatively stable yet developmentally responsive trait that shapes trajectories of externalizing behavior through both its enduring rank-order position and its interaction with self-regulatory and contextual processes (Putnam et al., 2008; Rettew & McKee, 2005).
Reactive temperament: surgency
Whereas NA represents sensitivity to negative or punishing stimuli, Surgency reflects approach-related tendencies toward positive stimulation and reward. Moderate levels promote exploration and engagement, but high levels may increase impulsivity and poorly regulated behavior (De Pauw & Mervielde, 2010; Nigg, 2006). Empirical findings indicate moderate-to-high rank-order stability and modest mean-level change in Surgency during early childhood. Bornstein et al. (2015) found increasing mean-level Surgency from infancy to toddlerhood and substantial rank-order stability, suggesting a developmental consolidation of approach tendencies. Schmidt et al. (2025) replicated this pattern between ages 3 and 6 in a large German sample, again showing high rank-order stability and age-related mean-level increases. These findings suggest that Surgency is a relatively stable dimension in relative ranking but shows gradual mean-level increases as children gain autonomy and social experience (Caspi et al., 2005; Caspi & Roberts, 2001).
Cross-sectionally, Heinze et al. (2025) confirmed that higher Surgency was associated with more externalizing symptoms, suggesting that approach-related tendencies foster both engagement and impulsivity. Several longitudinal studies highlight developmental continuity of Surgency in relation to externalizing problems. Wichstrøm et al. (2018) found that higher Surgency at age 4 predicted increases in ADHD and conduct symptoms through childhood. Jonas and Kochanska (2018) showed that toddlers with high Surgency developed more disruptive behavior when EC was low, indicating a regulatory moderation effect. Similarly, Gartstein et al. (2012) reported that early Surgency predicted later problem behavior, particularly when EC was weak. Dollar and Stifter (2012) extended this to social domains: high Surgency combined with poor emotion regulation predicted peer problems. These results illustrate that Surgency shows developmental continuity in its behavioral expression, maintaining approach motivation but manifesting differently depending on regulatory development and environmental context.
Surgency therefore represents a dual-sided developmental disposition that is adaptive when sufficiently regulated but can become maladaptive when environmental demands are high (Beauchaine, 2001; De Pauw & Mervielde, 2010). Taken together, these findings suggest that Surgency shows high rank-order and moderate mean-level stability, while its developmental plasticity determines whether it contributes to adaptive or maladaptive outcomes during development.
Regulatory temperament: effortful control
EC represents the self-regulatory aspect of temperament: the capacity to suppress a dominant response, sustain attention, and plan actions (Rothbart & Rueda, 2005). Conceptually, EC overlaps with executive functions, particularly inhibitory control and attention shifting, integrating affective and cognitive regulation (Heinze et al., 2025; Kälin & Roebers, 2021; Zhou et al., 2012).
Compared with reactive temperament traits, EC shows moderate rank-order stability and pronounced mean-level increases during early childhood. Kopala-Sibley et al. (2018) observed significant mean-level growth in EC from early to middle childhood alongside moderate individual stability, consistent with neurodevelopmental changes in prefrontal control systems (Rothbart et al., 2007; Zelazo, 2020). Schmidt et al. (2025) similarly reported mean-level increases in EC between ages 3 and 6 and partial structural invariance across measurement occasions. These findings indicate that EC develops through maturation and experience while maintaining moderate stability in individual differences (Caspi et al., 2005; Caspi & Roberts, 2001). Eisenberg et al. (2009) found that lower EC predicted higher externalizing symptoms over time, whereas higher EC buffered the effects of NA. Similarly, Sulik et al. (2015) reported that EC moderated the link between emotional reactivity and problem behavior.
Thus, EC constitutes a moderately stable, developmentally responsive construct that underlies self-regulatory maturation and functions as a protective factor, buffering or reversing the adverse effects of reactive temperament (Belsky & Pluess, 2009). While EC highlights the regulatory processes that shape behavioral adjustment, understanding developmental pathways of psychopathology requires integrating reactive and regulatory components within a broader temporal framework.
Integrative perspective on stability and developmental change
The link between temperament and mental health symptoms has been well established at both theoretical and empirical levels. Numerous longitudinal studies demonstrate that early temperament traits predict later psychopathological symptoms, particularly externalizing problems. However, most of this research has focused on relatively stable interindividual differences rather than developmental change. As a result, it remains unclear how within-person fluctuations in temperament, alongside stable between-person tendencies, shape the emergence and persistence of externalizing symptoms. Addressing this gap requires a framework that simultaneously considers stability and development at multiple levels (mean-level, rank-order, and intraindividual) and examines how these processes relate to individual differences in symptom expression.
To this end, the present study investigates developmental changes in reactive temperament, focusing on NA and Surgency between ages 3 and 5, and their joint contribution to externalizing symptoms. EC is examined as a self-regulatory factor that may modify how such changes in reactive temperament translate into behavioral adjustment. This change-oriented approach extends previous trait-based research by integrating interindividual and intraindividual perspectives, thereby capturing both enduring predispositions and dynamic developmental processes. Early childhood represents a particularly sensitive period for this investigation, as regulatory systems are rapidly maturing and behavioral tendencies remain malleable (Diamond, 2013; Rothbart & Bates, 2006). Understanding whether shifts in reactive temperament reflect adaptive adjustment or increasing vulnerability offers deeper insight into how early individual differences and developmental processes jointly construct risk for psychopathology.
Building on this framework, the present preregistered study (osf.io/5hrnu) examines how developmental changes in NA and Surgency between ages 3 and 5 predict externalizing symptoms, considering EC as a potential moderator.
Hypotheses
We hypothesize that changes in NA and Surgency will be positively associated with changes in CP and ADHD throughout development, while changes in EC will be negatively associated with changes in CP and ADHD. Additionally, we propose that EC moderates the impact of NA and Surgency on externalizing symptoms.
Specifically, we hypothesize that changes in NA will be positively associated with changes in symptoms of CP and ADHD (Wichstrøm et al., 2018). Similarly, we propose that changes in Surgency will also be positively associated with changes in symptoms of CP and ADHD (Beauchaine, 2001; De Pauw & Mervielde, 2010; Nigg, 2006; Wichstrøm et al., 2018).
Regarding the moderating effect of EC, we hypothesize that increases in EC will predict a weaker effect of changes in NA on changes in symptoms of CP and ADHD (Gartstein et al., 2012; Santens et al., 2020; Wichstrøm et al., 2018). Similarly, we propose that increases in EC will predict a weaker effect of changes in Surgency on changes in symptoms of CP and ADHD (Nigg, 2006; Zelazo, 2020).
Method
Data from the latest release of the National Educational Panel Study (NEPS; Blossfeld et al., 2019) were used for this study. The data were collected by the Institute of Educational Sciences and Longitudinal Research (INBIL) from 2009 to 2013 and by the Leibniz Institute for Educational Pathways (LIfBi) since 2014, both at the Otto-Friedrich University Bamberg, Germany. Details on survey implementation are publicly available on the NEPS website (www.neps-data.de).
Sample
This study uses data from NEPS Start Cohort 1 (Newborn), which provides longitudinal data on temperament and externalizing symptoms in early childhood. A representative sample of children born in Germany between February and July 2012 was drawn according to established sampling procedures (NEPS-Network, 2023). The sample was stratified by community size, and participants were recruited through systematic interval sampling from municipal register data. The study followed participants across survey waves using contact information such as address, telephone number, and email (NEPS-Network, 2023). The present study includes data from waves 4 (age 3 years), 5 (age 4 years), 6 (age 5 years), 7 (age 6 years), and 9 (age 8 years). Interviewers and test coordinators were experienced in working with children and were trained by NEPS staff (NEPS-Network, 2023). Participants received ten euros after each completed survey wave. For further descriptives, see Table 1.
Table 1. Demographic data of the study sample

Note. SDQ = Strengths and Difficulties Questionnaire; CBQ = Children’s Behavior Questionnaire.
Children’s behavior questionnaire (CBQ)
To assess temperament, the child’s caregivers completed a nine-item abbreviated form of the Very Short Form of the Children’s Behavior Questionnaire (CBQ; Putnam & Rothbart, 2006; 36 items) when children were 3, 4, and 5 years old. The three scales NA, Surgency, and EC each contained three items (NA: “Gets quite frustrated when prevented from doing something s/he wants to do,” “Seems to feel depressed when unable to accomplish some task,” “Is very difficult to soothe when s/he has become upset”; Surgency: “Often rushes into new situations,” “Likes rough and rowdy games,” “Is full of energy, even in the evening”; EC: “Sometimes becomes absorbed in a picture book and looks at it for a long time,” “When drawing or coloring in a book, shows strong concentration,” “Enjoys gentle rhythmic activities such as rocking or swaying”). Caregivers rated behaviors on a 7-point Likert scale from 1 to 7 (plus a “not applicable” option), with higher scores indicating stronger expression of the trait.
McDonald’s ω and Cronbach’s α for internal consistency were moderate for NA and Surgency (ω = .53–.58; α = .48–.62) but low for EC (ω = .31–.43; α = .26–.38) across the three waves. These values reflect the expected limitations of very short (three-item) scales. Slightly higher but still modest reliability estimates have been reported elsewhere, with α values typically ranging between .62 and .78 in the original validation (Putnam & Rothbart, 2006) and α = .69–.70 in a recent German sample (Heinze et al., 2025). These values are consistent with prior research on this version of the CBQ. In short scales, moderate internal consistency can be considered acceptable, as reliability indices such as Cronbach’s α are sensitive to the number of items and tend to underestimate true reliability (Clark & Watson, 1995; Schmitt, 1996). Although the CBQ has not yet been formally validated in German, a German version is available on Rothbart’s CBQ homepage and has been successfully used in several studies (e.g., Kerner auch Koerner et al., 2018; Licata-Dandel et al., 2021). Because scalar measurement invariance could not be established across time points, EC was excluded from the longitudinal analyses. Even after removal of one item, scalar invariance was not achieved, and the resulting two-item scale lacked conceptual breadth and statistical reliability, leading to non-convergence in the growth models (see Table 3).
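The dependence of Cronbach’s α on scale length can be illustrated with the standardized-α formula, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation. The following is a minimal sketch only; the function name and the mean inter-item correlation of .25 are hypothetical illustration values, not estimates from the NEPS data.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha from the number of items and the
    mean inter-item correlation (Spearman-Brown logic)."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Hypothetical mean inter-item correlation, chosen for illustration only.
MEAN_R = 0.25

alpha_short = standardized_alpha(3, MEAN_R)   # three-item scale
alpha_long = standardized_alpha(12, MEAN_R)   # twelve-item scale

print(f"3 items:  alpha = {alpha_short:.2f}")
print(f"12 items: alpha = {alpha_long:.2f}")
```

With the same assumed item quality, the three-item scale yields α of about .50, whereas a twelve-item scale would yield about .80, which shows how strongly the coefficient depends on the number of items alone.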
Strengths and difficulties questionnaire (SDQ)
The parent-rated SDQ is a brief questionnaire designed to assess behavioral problems and strengths in children and adolescents aged 2–16 years (Goodman et al., 1998). Five subscales are derived from the 25 items: Emotional Problems, Hyperactivity/Inattention, CP, Peer Problems, and Prosocial Behavior. In NEPS, the parent version included only the five CP items and five Hyperactivity/Inattention items, excluding the internalizing and prosocial subscales. Parents completed the SDQ when children were 5, 6, and 8 years old.
The CP scale includes the items “tantrums,” “obedience,” “fights with other children,” “argumentative,” and “steals.” The Hyperactivity/Inattention scale, consisting of “restless,” “fidgety,” “easily distracted,” “thinks,” and “attention,” captures the three symptom domains of ADHD: hyperactivity, impulsivity, and inattention. Given that it covers the core ADHD domains, this scale is referred to as ADHD symptoms in the present study. Prior research shows strong correlations between the SDQ Hyperactivity/Inattention scale and longer ADHD-specific measures, such as the Conners Rating Scale (Woerner et al., 2004) and the Child Behavior Checklist (Becker et al., 2004).
The SDQ is answered on a three-point scale: Not True, Somewhat True, and Certainly True. Exploratory and confirmatory factor analyses (CFAs) have been used to reproduce the scales in different countries (Kersten et al., 2016). McDonald’s ω and Cronbach’s α indicated acceptable reliability for ADHD symptoms (ω = .72–.73; α = .73–.74) and marginal reliability for CP (ω = .53–.54; α = .51–.53). These results are comparable to previous validation studies (Goodman et al., 1998; Goodman, 2011; Stone et al., 2010) and reflect typical reliability estimates for short behavioral symptom subscales. The German version is used in the NEPS study (Klasen et al., 2003).
Data analysis
To examine the factor structure of the CBQ and SDQ, CFAs were performed. Model fit was evaluated using RMSEA (≤ .06), CFI (≥ .95), and SRMR (≤ .08; Hu & Bentler, 1999); model comparisons additionally used RMSEA_D (Savalei et al., 2023). Where fit was suboptimal, modification indices were used to introduce within-factor residual correlations until acceptable fit was achieved. Each construct was then tested for longitudinal measurement invariance (configural, metric, scalar, strict). When full invariance could not be achieved, partial invariance was applied by freeing parameters with large modification indices. Final invariance models were fixed before the longitudinal analyses.
Subsequent analyses aimed to capture developmental stability and change across multiple levels. To this end, dual-change score models (DCSMs) were employed to capture individual differences in overall developmental trajectories as well as interval-to-interval dynamics, that is, relations of levels at one measurement occasion to changes at the next (i.e., autoproportional effects). This approach provides a developmentally sensitive alternative to traditional latent growth curve models (LGCMs), which only estimate overarching trajectories rather than dynamic change processes between adjacent assessments (Cáncer et al., 2021; Kievit et al., 2018; McArdle, 2009).
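The fit criteria named above can be written as a simple screening rule. The sketch below is only an illustration of the stated cutoffs; the function name and the example fit values are hypothetical and do not correspond to any model reported in this study.

```python
def check_fit(rmsea: float, cfi: float, srmr: float) -> dict:
    """Compare fit indices against the Hu and Bentler (1999) cutoffs
    used in the text: RMSEA <= .06, CFI >= .95, SRMR <= .08."""
    return {
        "rmsea_ok": rmsea <= 0.06,
        "cfi_ok": cfi >= 0.95,
        "srmr_ok": srmr <= 0.08,
    }


# Hypothetical fit values for two illustrative models.
good_model = check_fit(rmsea=0.04, cfi=0.97, srmr=0.05)
poor_model = check_fit(rmsea=0.09, cfi=0.91, srmr=0.05)

print(good_model)
print(poor_model)
```

Such cutoffs are conventions rather than strict decision rules, so a model that misses one criterion would still be judged in the context of the remaining indices.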
In all models, indicator loadings for latent intercept factors were fixed to 1, and item intercepts were constrained to equality across waves to ensure scalar measurement invariance. Each latent state variable at time t was regressed on its prior state (t − 1) with a fixed coefficient of 1. A constant change factor (gX) captured the shared, time-invariant component of developmental change across intervals (e.g., from ages 3 to 5), representing the overall rate and direction of change in the construct. The autoproportion parameter (β) was freely estimated and held equal across intervals. For model identification, residual variances of latent state variables (e.g., SURAge3, ADHDAge5) were fixed to zero, while residuals of observed indicators were allowed to correlate within waves and were constrained to equality across time.
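The structural part of this specification can be summarized in two equations. The notation below is an assumption that follows the general latent change score conventions described by McArdle (2009), not notation taken from the authors’ model syntax: y_{i,t} denotes the latent state of child i at occasion t, Δy_{i,t} the latent change between adjacent occasions, g_{X,i} the constant change factor, and β the autoproportion parameter.

```latex
\begin{aligned}
y_{i,t} &= y_{i,t-1} + \Delta y_{i,t}
  && \text{(prior state enters with a fixed coefficient of 1)} \\
\Delta y_{i,t} &= g_{X,i} + \beta \, y_{i,t-1}
  && \text{(constant change plus proportional change)}
\end{aligned}
```

Because the residual variances of the latent states are fixed to zero, each state is fully determined by the previous state and the latent change, and β is the same for every interval.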
Three parameters describe different but complementary aspects of developmental change. The mean of the constant change factor (gX) represents the model-implied average direction and magnitude of change (e.g., gADHD = overall change in ADHD symptoms from ages 5 to 8). A positive mean indicates that, on average, the construct increased over time, whereas a negative mean reflects a general decrease across the sample. The variance of gX captures interindividual variability in change, indicating how strongly children differ in their developmental trajectories. A large variance reflects heterogeneous change, whereas a small variance reflects uniform change. Finally, the autoproportion parameter (β) represents the model-implied within-person dynamic, quantifying how prior levels predict subsequent change. A negative β indicates that children with higher scores at one occasion tend to show smaller subsequent increases (or stronger decreases), while those with lower scores show stronger increases over time, reflecting convergence toward the overall group trajectory. Conversely, a positive β indicates that children with higher scores at one occasion tend to show further increases (or smaller decreases), whereas those with lower scores exhibit weaker increases or further decreases, reflecting divergence and the amplification of individual differences (Grimm et al., 2012; McArdle, 2009).
In addition, the covariance between the intercept and the constant change factor provides a complementary indicator of developmental stability at the interindividual level. Whereas β reflects within-person dynamics (how an individual’s prior level predicts their own subsequent change), the intercept–change covariance reflects between-person associations (how individuals who start higher or lower differ in their average rate of change). A negative covariance indicates that children with higher initial levels tend to show smaller average changes, suggesting convergence across individuals, whereas a positive covariance indicates that children with higher initial levels show larger average changes, reflecting divergence in developmental trajectories. The intercept and constant change factors and their covariance complement the model by capturing how stable between-person differences contribute to the overall patterns of change.
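The convergence and divergence implied by the sign of β can be made concrete with a small deterministic simulation. This is a sketch under simplifying assumptions: the function name, starting values, and parameter values are hypothetical, and the constant change component g is held identical for all children (zero variance) so that the effect of β is isolated.

```python
import numpy as np


def simulate_dual_change(y0, g, beta, n_waves=3):
    """Iterate the change equation y[t] = y[t-1] + (g + beta * y[t-1])
    for every child, starting from the initial levels in y0."""
    y0 = np.asarray(y0, dtype=float)
    states = np.empty((y0.size, n_waves))
    states[:, 0] = y0
    for t in range(1, n_waves):
        change = g + beta * states[:, t - 1]  # constant + proportional part
        states[:, t] = states[:, t - 1] + change
    return states


# Five hypothetical children with different starting levels.
start = [1.0, 2.0, 3.0, 4.0, 5.0]

converging = simulate_dual_change(start, g=1.0, beta=-0.3)
diverging = simulate_dual_change(start, g=1.0, beta=0.3)

# Between-child spread (standard deviation) at each wave.
spread_converging = converging.std(axis=0)
spread_diverging = diverging.std(axis=0)

print("beta = -0.3:", np.round(spread_converging, 3))
print("beta = +0.3:", np.round(spread_diverging, 3))
```

With g identical across children, the between-child spread is multiplied by (1 + β) at every interval, so it shrinks when β is negative and grows when β is positive. In the full model, variance in the constant change factor and its covariance with the intercept also contribute to the spread.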
In a first step, univariate DCSMs were estimated for each construct (NA, SUR, EC, ADHD, CP) to describe average developmental trends, interindividual variability in change, and proportional dynamics. Because both a constant change factor (gX) and an autoproportion parameter (β) were included, these models represent dual-change specifications. Constant change factors are denoted by g (e.g., gNA = change in NA between adjacent ages).
In a second step, bivariate DCSMs were specified to examine whether temperament predicted externalizing symptoms. In these models, both the baseline level factors of externalizing symptoms (e.g., ADHDAge5) and their constant change factors (e.g., gADHD) were regressed on the baseline level and constant change factors of temperament (e.g., SURAge3, gSUR). This specification tested developmental predictions of both symptom level and symptom change from initial temperament levels and temperament change. All structural relations were estimated jointly within a single model for each outcome, capturing parallel developmental dynamics across domains and allowing for simultaneous analysis of proportional and constant components of developmental change.
Temperament at age 5 was not included as a separate predictor within the same model because its variance is already decomposed into the baseline level at age 3 and the preceding constant change factor (gTemperament). Including age 5 alongside these predictors would therefore introduce redundancy and multicollinearity, as it is not statistically independent of them. For comparison, additional models included either age-3 or age-5 temperament levels as predictors of baseline symptoms and subsequent change.
Concurrent cross-lagged effects could not be estimated because temperament and externalizing symptoms only had one concurrent measurement (temperament: ages 3–5; symptoms: ages 5–8). We therefore modeled temporally ordered associations using DCSMs. To maintain identification, limit the number of free parameters, and preserve statistical power, we estimated separate DCSMs for each pairing (NA → ADHD, NA → CP, SUR → ADHD, SUR → CP). This specification also avoids cross-domain multicollinearity and facilitates transparent interpretation of coupling paths.
Figure 1 illustrates the general structure of the DCSM used in this study. All models were estimated in Mplus 8.4 using the robust full-information maximum-likelihood (FIML) estimator.

Figure 1. Dual change score model of the development of reactive temperament and externalizing symptoms. Note. Squares represent observed indicators and circles represent latent variables. Latent change score factors capture short-term changes between adjacent measurement occasions (Change), while the Constant Change Factor (gCBQ, gSDQ) represents the average rate of change across intervals. Autoproportion parameters (β) quantify proportional, level-dependent dynamics and were constrained to equality across intervals to represent constant proportional change over time. The variances of the latent change factors were likewise constrained to equality, and their loadings on the Constant Change Factor were fixed to 1 for model identification. Cross-domain relations (from temperament to symptoms) are shown with thicker arrows to emphasize the primary developmental couplings tested conceptually. CBQ = Children’s Behavior Questionnaire; SDQ = Strengths and Difficulties Questionnaire.
Results
Sample characteristics, including measurement time points, sample sizes, and demographic information at each wave, are presented in Table 1. These data provide an overview of the study population and data availability across assessment periods. Descriptive statistics for the main study variables, including latent means, standard deviations, and correlations among temperament and externalizing symptoms, are provided in ESM Table 1.
Construct validity
To assess the construct validity of the SDQ and CBQ, CFAs were conducted to examine the factor structure at each measurement time point. For the CBQ, one- and three-factor models were tested at ages 3, 4, and 5; for the SDQ, one- and two-factor models were tested at ages 5, 6, and 8 (see Table 2).
Table 2. Fit indices for confirmatory factor analysis models for SDQ and CBQ

Note. SDQ = Strengths and Difficulties Questionnaire; CBQ = Children’s Behavior Questionnaire; χ² (df) = chi-square value (degrees of freedom); RMSEA = root mean square error of approximation; RMSEA_D = RMSEA associated with the chi-square difference test comparing successive models (Savalei et al., 2023); CFI = comparative fit index; SRMR = standardized root mean square residual; *** p < .001, ** p < .01, * p < .05 (two-tailed).
The two-factor model provided a better fit for the SDQ data than the one-factor model, confirming that CP and ADHD are best represented as distinct constructs. Similarly, for the CBQ, the three-factor model separating NA, Surgency, and EC demonstrated superior model fit, supporting the differentiation of temperament dimensions. Overall, the results for both the CBQ and the SDQ indicate that models separating the constructs into distinct, yet correlated, factors have a better fit than single-factor models. This supports the construct validity of NA, Surgency, and EC for the CBQ and of CP and ADHD for the SDQ.
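As a rough illustration of the fit indices used in this comparison, the following sketch computes RMSEA for a single model and RMSEA_D for a chi-square difference between nested models. It assumes the common single-group formula with N − 1 in the denominator (some software uses N) and ignores the scaling corrections that robust estimators require. The function names and all input values are hypothetical and are not taken from Table 2.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA for one model: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def rmsea_d(chi2_restricted: float, df_restricted: int,
            chi2_free: float, df_free: int, n: int) -> float:
    """RMSEA_D: the RMSEA formula applied to the chi-square difference
    between a more restricted model and a less restricted nested model."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return math.sqrt(max(d_chi2 - d_df, 0.0) / (d_df * (n - 1)))

# Hypothetical fit statistics (NOT values from Table 2)
n = 501
one_factor = (150.0, 35)   # (chi-square, df) of the one-factor model
two_factor = (70.0, 34)    # (chi-square, df) of the two-factor model

print("RMSEA one-factor:", round(rmsea(*one_factor, n), 3))
print("RMSEA two-factor:", round(rmsea(*two_factor, n), 3))
print("RMSEA_D:", round(rmsea_d(*one_factor, *two_factor, n), 3))
```

Under these assumed numbers, a large RMSEA_D means that the constraint separating the two models (here, collapsing two factors into one) fits poorly in its own right, even when the overall RMSEA of both models looks similar.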
Measurement invariance
Measurement invariance was tested separately for each construct. Configural and metric invariance were supported for all CBQ and SDQ scales, indicating that the same factor structures and loadings were maintained across waves. To achieve scalar invariance, a few intercepts per scale were freed. The resulting partial scalar invariance models showed acceptable to good fit, supporting valid latent mean comparisons over time even though strict equality of all intercepts could not be established (see Table 3).
Table 3. Fit indices for measurement invariance analysis for SDQ and CBQ

Note. SDQ = Strengths and Difficulties Questionnaire; CBQ = Children’s Behavior Questionnaire; ADHD = attention deficit hyperactivity disorder; CP = conduct problems; EC = effortful control; NA = negative affectivity; SUR = surgency; RMSEA = root mean square error of approximation; CFI = comparative fit index; SRMR = standardized root mean square residual; χ² = chi-square statistic; df = degrees of freedom; RMSEA_D = RMSEA associated with the chi-square difference test comparing successive models (Savalei et al., 2023); Loosened constraints: the digit following the item name denotes the measurement wave (e.g., NA_fail4 = Wave 4).
Strict invariance, which requires residual variances to remain constant across time points, could not be fully achieved. Nevertheless, model fit indices remained acceptable in most cases, suggesting approximate comparability over time. For EC, scalar invariance could not be achieved even after removing one item. The resulting two-item solution lacked conceptual coverage and showed convergence problems in change models; therefore, EC was excluded from further analyses.
Analysis of change
To examine intraindividual change in NA, Surgency, ADHD, and CP, DCSMs were applied to each construct across its assessment window (temperament: ages 3 to 5; symptoms: ages 5 to 8). As shown in Table 4, all constructs exhibited significant Constant Change Factors (gX), indicating that the constructs changed systematically over time rather than remaining static. Variances of the Constant Change Factors were also significant, showing that the magnitude and direction of change differed meaningfully between children.
Table 4. Analysis of constant change factors and autoproportion effects

Note. ADHD = attention-deficit/hyperactivity disorder symptoms; CP = conduct problems; EC = effortful control; NA = negative affectivity; SUR = surgency; χ 2 = chi-square statistic; df = degrees of freedom; Baseline χ 2 (df) refers to the freely estimated dual-change model for each construct; gX Mean = 0 = Constant Change Factor (gX) mean fixed to 0; gX Variance = 0 = Constant Change Factor/Slope variance fixed to 0; Δχ 2 = likelihood-ratio χ 2 difference from the baseline model; Δdf = corresponding difference in degrees of freedom (Diff). Means, variances, covariance and β-coefficients are unstandardized; standard errors are in parentheses; *** p < .001, ** p < .01, * p < .05 (two-tailed); All Constant Change Factors (means and variances) and Auto-Proportion parameters (β) were statistically significant (p < .001) unless otherwise indicated; Critical χ 2 values for p < .001: Δdf = 1 → 10.83, 2 → 13.82, 3 → 16.27, 4 → 18.47, 5 → 20.52.
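The critical values listed in the note can be checked with a short standard-library sketch. It uses the closed-form upper-tail probability of the chi-square distribution for integer degrees of freedom and finds the critical value by bisection. The function names are illustrative and the search bounds are an assumption that is adequate for the small Δdf values considered here.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first series term
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

def chi2_critical(alpha: float, df: int) -> float:
    """Smallest x with P(X > x) <= alpha, found by bisection on [0, 200]."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for delta_df in range(1, 6):
    print(delta_df, round(chi2_critical(0.001, delta_df), 2))
```

A Δχ² that exceeds the critical value for its Δdf indicates that fixing the corresponding parameter (for example, the mean or variance of a Constant Change Factor) to zero significantly worsens fit at p < .001.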
Significant and negative autoproportion parameters (β) indicated that higher prior levels were followed by smaller subsequent changes, meaning that children with initially higher scores tended to show less further increase, whereas those with initially lower scores showed stronger increases. In contrast, positive Intercept – Constant Change Factor covariances were observed for all constructs. These covariances refer to between-person differences in average change rather than individual fluctuations. They show that, on average, children who started at higher levels also exhibited greater overall increases across the study period.
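To make the role of a negative autoproportion parameter concrete, the sketch below iterates a simplified, deterministic univariate change equation, Δy_t = g + β·y_{t−1}, for two hypothetical children who share the same constant change g but start at different levels. This form omits residual terms and cross-domain couplings, and the parameter values are invented for illustration rather than taken from Table 4.

```python
def dual_change_path(y0: float, g: float, beta: float, n_intervals: int) -> list:
    """Return the latent levels implied by delta_t = g + beta * y_{t-1}."""
    levels = [y0]
    for _ in range(n_intervals):
        delta = g + beta * levels[-1]  # constant change plus proportional change
        levels.append(levels[-1] + delta)
    return levels

# Hypothetical parameters (NOT estimates from Table 4)
g, beta = 1.0, -0.4

low_start = dual_change_path(1.0, g, beta, 2)   # child starting at a lower level
high_start = dual_change_path(2.0, g, beta, 2)  # child starting at a higher level

# Change over the first interval: larger for the lower starting level
first_change_low = low_start[1] - low_start[0]
first_change_high = high_start[1] - high_start[0]

# Gap between the two children at each occasion shrinks by (1 + beta) per step
gaps = [h - l for h, l in zip(high_start, low_start)]

print("first change (low start): ", round(first_change_low, 2))
print("first change (high start):", round(first_change_high, 2))
print("gaps across occasions:    ", [round(gap, 2) for gap in gaps])
```

With the same g, the child with the higher starting level changes less in each interval, so the two paths move closer together. Differences in g between children, by contrast, are what the intercept and Constant Change Factor covariance describes.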
Together, these results confirm that all constructs exhibited systematic developmental change and individual variability in change dynamics. Building on these findings, the next analyses examined whether these developmental processes in reactive temperament were associated with subsequent changes in externalizing symptoms.
Associations of reactive temperament and externalizing symptoms
As hypothesized, baseline levels of NA and SUR at age three were not significantly related to later levels or changes in CP or ADHD once intraindividual change was modeled (Table 5). In contrast, increases in both NA and SUR from ages three to five were consistently associated with higher levels of CP and ADHD at age five and with further increases in these symptoms through age eight. All significant coupling parameters were positive, indicating that intraindividual increases in reactive temperament predicted corresponding increases in externalizing behavior over time. This pattern was highly consistent across both temperament traits and both externalizing domains. The corresponding structural diagrams for all four dual-change models (NA–ADHD, SUR–ADHD, NA–CP, and SUR–CP) are presented in ESM Figures 1a–1d.
Table 5. Dual change score model: associations of reactive temperament (age 3 and change) and problem behavior (age 5 and change)

Note. ADHD = attention deficit hyperactivity disorder; CP = conduct problems; NA = negative affect; SUR = surgency; gX = constant change factor/slope; RMSEA = root mean square error of approximation; CFI = comparative fit index; SRMR = standardized root mean square residual; χ 2 (df) = Chi-square statistic (degrees of freedom); R 2 = coefficient of determination.
Observed trajectories (Figure 2) illustrate this pattern descriptively. Mean increases in ADHD and CP symptoms were marginal, while level differences between tertiles of NA and SUR were visible across all time points. Children in the high-change tertiles consistently showed higher average symptom levels than those in the middle or low tertiles, but temporal increases within each group remained small, suggesting that variability was primarily between individuals rather than within individuals.

Figure 2. Observed trajectories of conduct problems and ADHD symptoms by change in reactive temperament. Note. Observed trajectories of Conduct Problems (left) and ADHD symptoms (right) are shown across three measurement points, separated by tertiles of the Constant Change Factor for Negative Affect (gNA; top) and Surgency (gS/gS2; bottom); Thin lines represent individual observed values; bold lines indicate mean trajectories within each tertile; Color gradients (blue, yellow, red) correspond to low, medium, and high levels of the Constant Change Factor; Trajectories were generated from observed SDQ scores without wavewise standardization; Grouping was based on tertiles of latent Constant Change Factor scores estimated in the latent change models; For each group, individual trajectories were plotted as semi-transparent lines, and group mean trajectories were overlaid using smoothed line estimates; Colors were harmonized across panels for comparability.
Model-implied trajectories (predicted symptom paths generated from the latent change model, rather than observed scores; Figure 3) illustrate the modeled pattern more clearly. Starting from a shared baseline, the predicted curves converge over time, consistent with negative auto-proportion parameters (Table 4). These parameters reflect within-person dynamics, meaning that individual deviations from the mean tend to diminish across time. In contrast, the Constant Change Factors (gNA, gSUR) represent stable between-person differences: children who showed stronger increases in NA or SUR also showed higher predicted ADHD and CP levels across time. Together, these parameters characterize the modeled developmental dynamics, indicating that intraindividual change in temperament tends to co-occur with changes in externalizing symptoms, while between-person variability is modeled as gradually stabilizing.

Figure 3. Model-implied trajectories of conduct problems and ADHD symptoms by change in reactive temperament. Note. Model-implied trajectories of Conduct Problems (left) and ADHD symptoms (right) are shown across three measurement points, separated by tertiles of the Constant Change Factor for Negative Affect (gNA; top) and Surgency (gS/gS2; bottom); Thin lines represent individual model-implied trajectories derived from the latent change score model; bold lines indicate mean trajectories within each tertile; Color gradients (blue, yellow, red) correspond to low, medium, and high levels of the Constant Change Factor; Predicted trajectories were computed using unstandardized Mplus parameters obtained from the latent change score model; For each individual, latent estimates of initial status and change (Constant Change Factor and proportional change parameters) were used to calculate model-implied ADHD and CP values at ages 5, 6, and 8; The resulting trajectories were plotted using the same tertile grouping, color scheme, and scaling as the observed data to ensure comparability between observed and model-implied figures.
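The procedure described in the note (per-child latent estimates, model-implied values at three waves, tertile grouping, group means) can be sketched roughly as follows. This is a simplified illustration: the factor scores are simulated rather than exported from Mplus, the symptom process is reduced to Δy_t = g + β·y_{t−1} with one step per wave, and all names and parameter values are hypothetical.

```python
import random
from statistics import mean

def implied_path(y0, g, beta, n_waves=3):
    """Levels at each wave under delta_t = g + beta * y_{t-1}."""
    path = [y0]
    for _ in range(n_waves - 1):
        path.append(path[-1] + g + beta * path[-1])
    return path

def tertile_labels(scores):
    """Assign 'low', 'medium', or 'high' by rank, splitting the sample into thirds."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [None] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = ("low", "medium", "high")[min(3 * rank // len(scores), 2)]
    return labels

# Simulated factor scores for 300 children (NOT study data)
rng = random.Random(1)
beta = -0.3  # hypothetical autoproportion parameter for symptoms
children = []
for _ in range(300):
    g_temp = rng.gauss(0.0, 1.0)                      # temperament constant change
    y0 = 3.0 + 0.5 * g_temp + rng.gauss(0.0, 1.0)     # symptom level at first wave
    g_sym = 1.0 + 0.3 * g_temp + rng.gauss(0.0, 0.3)  # symptom constant change
    children.append((g_temp, implied_path(y0, g_sym, beta)))

# Group children by tertile of the temperament change factor
labels = tertile_labels([g_temp for g_temp, _ in children])

# Mean model-implied symptom level per wave within each tertile
group_means = {}
for group in ("low", "medium", "high"):
    paths = [path for (_, path), label in zip(children, labels) if label == group]
    group_means[group] = [mean(wave) for wave in zip(*paths)]

for group, means in group_means.items():
    print(group, [round(m, 2) for m in means])
```

Plotting the individual paths as thin lines and the tertile means as bold lines would give a figure of the same general layout as the one described, although the actual figure was built from the estimated model parameters.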
To complement these primary models, post hoc analyses using temperament levels at ages three and five as predictors of concurrent and later symptoms (ESM Tables 2 and 3) showed that both age-specific temperament levels were associated with symptom levels measured at the same or later time points. However, when latent change was included, these associations were no longer significant. This indicates that predictive effects of temperament were largely explained by intraindividual change rather than by static differences in baseline levels.
Discussion
This study examined how changes in reactive temperament contribute to the development of externalizing symptoms in early childhood. The results showed that early temperament and externalizing symptoms are not fixed traits but evolve dynamically during the preschool years. In particular, increases in NA and Surgency between ages 3 and 5 were closely tied to the emergence and growth of CP and ADHD symptoms, whereas baseline levels were far less predictive. These findings indicate that developmental change, rather than early trait levels alone, is a more sensitive marker of emerging risk.
The analyses further revealed that developmental processes operate at two complementary levels. The Constant Change Factors (gNA, gSUR, gADHD, gCP) captured intraindividual change and its covariation across domains, whereas the autoproportion parameters (β) described proportional dynamics within each construct. The positive couplings between gNA/gSUR and gADHD/gCP indicate that children who increased more strongly in reactive temperament also showed stronger increases in externalizing symptoms. The negative β-coefficients, in contrast, represent modeled convergence within constructs, indicating that individual deviations from the mean are estimated to diminish across time. Together, these parameters illustrate that modeled change in temperament is associated with modeled change in symptoms, while overall trajectories are modeled as converging at the group level, a pattern consistent with theoretical accounts emphasizing preschool as a period of both plasticity and emerging self-regulation.
Construct validity
CFA confirmed the factor structure of temperament (three factors: NA, Surgency, EC) and externalizing symptoms (two factors: CP, ADHD) found in previous studies (temperament with CBQ: e.g., Putnam & Rothbart, 2006; Sleddens et al., 2011, 2012; de la Osa et al., 2014; externalizing symptoms with SDQ: Goodman et al., 1998; Kiel et al., 2018; Stone et al., 2010, 2015; see Table 2). Measurement invariance was adequate for all constructs except EC (see Table 3). Because EC did not meet invariance requirements, hypotheses involving this construct were not tested. For EC, the lack of invariance suggests that observed score differences over time may partly reflect changes in measurement properties or informant perceptions rather than true developmental change.
Developmental change in temperament and externalizing symptoms
Our findings underscore that early temperament is shaped by development rather than reflecting fixed traits. Significant variability in NA and Surgency trajectories challenges static trait-based conceptions of early temperament. The observed heterogeneity in trajectories, reflected in significant variance of the Constant Change Factors, indicates that children do not follow a uniform developmental path. Instead, temperament unfolds in diverse directions, likely shaped by maturational and contextual influences (Beauchaine & Cicchetti, 2019). NA and Surgency exhibited substantial heterogeneity, with some children intensifying, others declining, and some remaining stable. This aligns with Caspi et al. (2005), who noted that rank-order stability can coexist with substantial intraindividual change. Strong autoproportion effects (β) suggested that higher initial levels were followed by smaller gains, while lower initial levels often preceded greater increases, patterns consistent with normative regulatory processes (see ESM Figure 4). In the context of our models, the mean of the Constant Change Factor corresponds to mean-level change at the group level, whereas the autoproportion parameter reflects proportional or within-person change and the intercept – change covariance captures how between-person differences at baseline relate to between-person differences in average change. Taken together, these results indicate that NA and Surgency are not static traits but evolving dispositions shaped by both interindividual variability and intraindividual dynamics. Additionally, Dyson et al. (2015) emphasize heterotypic continuity, meaning that different behaviors may reflect the same underlying trait at different ages. This is particularly relevant to our reliance on parent-reported CBQ data, where observed changes may reflect both genuine developmental shifts and heterotypic manifestations of temperament. Integrating this perspective helps contextualize our findings: increases in NA and Surgency could partly signal maturational changes in how these traits are expressed and perceived during the preschool years, in line with the broader literature on multiple types of stability in early childhood.
Externalizing symptoms also showed dynamic properties. Initial ADHD and CP levels predicted subsequent change, but children with higher initial scores tended to show smaller increases or even reductions, while those with lower initial scores often increased more. This reflects self-correcting tendencies and supports developmental cascade models in which early symptoms serve as starting points rather than fixed pathways (Diamond, 2013; Frick & Viding, 2009; Van Meter et al., 2024). These results suggest that externalizing problems in early childhood can shift in intensity and expression as children mature. For example, elevated ADHD or CP at preschool age may decline if regulatory capacities strengthen or if environmental supports are introduced, whereas initially low levels can intensify under adverse conditions. This pattern is consistent with maturational processes and the emergence of regulatory capacities (Ágrez et al., 2025; Diamond, 2013). Rather than reflecting stable, trait-like vulnerabilities, externalizing symptoms appear embedded in developmental systems that can both escalate and attenuate over time depending on individual and contextual conditions (Van Meter et al., 2024). The pattern also supports models suggesting that early symptoms of CP can initiate but not determine developmental trajectories, which remain malleable in the presence of protective or exacerbating influences (Blair et al., 2018, 2020; Frick & Viding, 2009).
When temperament and externalizing symptoms are considered together, the findings suggest parallel developmental dynamics. Both NA/Surgency and CP/ADHD showed heterogeneity in trajectories and evidence of autoregressive or self-correcting processes. This indicates that reactive temperament and externalizing behaviors may co-develop in ways that reinforce each other: shifts in NA or Surgency can set the stage for symptom escalation, while early symptom levels may in turn shape subsequent temperament change. Such reciprocal influences support transactional and cascade models of development, emphasizing that vulnerabilities are constructed through ongoing interactions rather than fixed traits. This interpretation is consistent with Beauchaine and McNulty’s (2013) ontogenic model, which highlights how comorbidities and continuities emerge through reciprocal developmental processes that accumulate over time. It also resonates with developmental cascade perspectives (Masten et al., 2005), in which functioning in one domain of behavior spreads to other domains in lasting ways. Strong autoproportion effects emerged for NA and Surgency, clearly surpassing those observed for ADHD and CP. Such patterns may indicate normative regulatory processes, including maturational shifts and emerging self-regulation (Perry et al., 2018). Importantly, the variance in Constant Change Factors demonstrates that children differ meaningfully in their developmental trajectories, significant mean changes indicate group-level trends, autoproportion parameters may capture self-correcting tendencies within individuals, and intercept–change covariances show how starting levels shape subsequent growth. Together, these parameters provide quantitative evidence for the developmental trajectories of stability and change observed between temperament and externalizing symptoms.
Associations of reactive temperament and externalizing symptoms
When modeled jointly with developmental change, baseline levels of NA and Surgency at age 3 were no longer associated with CP or ADHD outcomes at age 5 or with subsequent changes through age 8. In contrast, increases in NA and Surgency between ages 3 and 5 were consistently related to higher CP and ADHD symptom levels at age 5 and to further symptom growth thereafter (see Table 5, ESM Figures 1a–1d). These findings suggest that intraindividual change in reactive temperament provides additional information beyond early trait levels. Similar patterns were reported by Wichstrøm et al. (2018), who demonstrated that within-person increases in negative affectivity and surgency predicted subsequent growth in externalizing problems, and by Nielsen et al. (2019), who found that changes in temperament across preschool years, assessed using a performance-based temperament task, were prospectively associated with externalizing trajectories. Recent work also links temperament-based pathways to heterogeneity in ADHD (Karalunas & Nigg, 2020; Nigg, 2022), underscoring the developmental significance of reactive traits.
Concurrent links between temperament and behavior problems are well established across cross-sectional and longitudinal research (Dougherty et al., 2010; Martel & Nigg, 2006; Nielsen et al., 2019; Oldehinkel et al., 2004; Wichstrøm et al., 2018). It is therefore plausible that same-wave assessment of age-5 temperament and symptoms contributed to the strength of concurrent effects. Our modeling approach extends this work by explicitly estimating intraindividual change, thereby distinguishing developmental variation from time-specific covariance and providing a developmentally sensitive perspective on early risk.
Together with the model-implied trajectories showing that increases in NA and Surgency predicted higher ADHD and CP symptom levels over time (see ESM Figures 2–3 and Figures 2 and 3), these findings indicate that developmental change in reactive temperament captures meaningful within-person variation linked to emerging externalizing risk rather than reflecting stable between-person differences. Figure 3 illustrates that children with stronger increases in reactive temperament (red tertile) showed higher symptom levels and greater variability across waves, whereas the lower-change groups (blue tertile) remained comparatively stable over the same developmental window. The results are consistent with cascade and transactional perspectives, which propose that variation in one domain, in this case reactive temperament, can influence other domains of behavior and thereby shape the development of externalizing problems (Moilanen et al., 2010; Nigg, 2006). Moreover, the results align with dimensional approaches to psychopathology that conceptualize early behavioral tendencies such as irritability and high approach as continuous with later externalizing risk (Wakschlag et al., 2015, 2019). Examining intraindividual change therefore complements static assessments of temperament by providing a developmentally sensitive means of identifying children at elevated risk for persistent behavioral problems. This perspective highlights the value of studying temperament not only as a set of early traits but as an evolving aspect of socioemotional development (Diamond, 2013; Zelazo & Carlson, 2012). The convergence of findings from our analyses and prior longitudinal studies underscores that early temperament change is a sensitive indicator of developmental risk, aligning with dimensional models of psychopathology and highlighting opportunities for early, context-sensitive interventions.
Limitations
Although this study has several strengths, there are also limitations that should be acknowledged. First, the EC construct posed notable measurement challenges: one item had to be removed and measurement invariance was not met across waves (Widaman et al., 2010). These issues highlight difficulties in reliably assessing this dimension and precluded analyses of change, associations, or moderation effects involving EC. These concerns are consistent with our construct validity analyses, which already indicated instability in the EC factor.
Second, the CBQ used in the panel study includes only three items per subscale, limiting content coverage and leading to relatively low internal consistency, particularly for EC. This raises broader concerns about the reliability and content validity of abbreviated measures (Aiken, 1980). Nevertheless, measurement invariance across waves was achieved for NA and Surgency, suggesting that, despite their relatively low reliability coefficients, the measurement properties of these constructs were sufficiently stable over time. Although our findings converge with prior research, the short scales reduce comprehensiveness. Moreover, Surgency and NA were not modeled together, preventing tests of interactions between these traits. Future studies should employ longer and more comprehensive instruments to capture temperament dimensions and their interactions.
Third, all measures of temperament and externalizing symptoms relied exclusively on parent ratings. Shared method variance and rater biases may have inflated associations, and parental perceptions likely shift with children’s age, contributing to apparent changes in NA and Surgency (De Los Reyes et al., 2015, 2023). Future research should integrate multimethod approaches, including observational and performance-based tasks, and multiple informants to better capture children’s behaviors across contexts.
Fourth, the timing of assessments is a limitation. Temperament was assessed between ages 3 and 5, whereas externalizing symptoms were assessed between ages 5 and 8, so the two domains overlapped only at age 5. This non-simultaneous design complicates the interpretation of whether associations reflect concurrent developmental processes or lagged effects. Although additional models with age-5 temperament provided some insight, true synchrony could not be tested.
Fifth, our models did not include covariates such as socioeconomic status, sex, parenting, parental psychopathology, or time-varying influences such as negative life events. Environmental and contextual factors are well-documented predictors of both temperament and externalizing outcomes (Belsky & Pluess, 2009; Sameroff, 2010), and their omission may have limited the specificity of our findings.
Sixth, although the latent change score approach offers advantages for modeling dynamic processes, it also entails assumptions. While measurement error is partly accounted for through the use of latent variables, and regression-to-the-mean effects are formally modeled via the autoproportion parameter rather than treated as statistical artifacts, unmeasured third-variable influences cannot be excluded (e.g., Grimm et al., 2012). Both Grimm et al. (2012) and Kievit et al. (2018) emphasize that the timing of assessments is critical: if lags are not developmentally appropriate, conclusions about change processes may be distorted. Moreover, as illustrated by the comparison between observed and model-implied trajectories (Figures 2 and 3), even well-fitting models can smooth over nonlinearities or individual differences in the empirical data. Consequently, predictions involving externalizing symptom change should be interpreted with some caution, as model-imposed structure may not fully capture the underlying developmental dynamics.
Finally, our sample was drawn from a population-based cohort, supporting generalizability to community samples. However, findings may not extend to clinical populations with more severe or comorbid presentations. The absence of diagnostic data and reliance on dimensional measures may underestimate associations at the extreme end of symptom severity, particularly given high comorbidity rates in clinical populations (e.g., Angold et al., 1999). Moreover, although our sample was population-based, it still constitutes a form of convenience sampling. As noted by Jager and colleagues (2017), homogeneous convenience samples are common in developmental research and may offer clearer generalizability than heterogeneous ones, yet they still fall short of probability samples. These considerations highlight the importance of replicating findings across both community and clinical samples. Together, these limitations underscore the need for future studies to employ multi-informant, multimethod assessments, ensure stronger measurement properties, incorporate key covariates and time-varying influences, and replicate findings across diverse contexts.
Implications
For research, our results demonstrate that intraindividual changes in reactive temperament, not only early trait levels, are important predictors of externalizing symptoms. This suggests that questionnaire-based assessments capture not only dispositional reactivity but also its developmental unfolding in interaction with maturation and environmental input. Such findings further shift the perspective from viewing temperament as a fixed trait to considering it as a dynamic process. This implies that future research should focus explicitly on developmental change as a key construct, not just as error variance around stable traits. Building on this, dismantling studies are needed to identify the modifiable mechanisms of change, such as parenting behaviors, peer dynamics, and executive function development (Eisenberg et al., 2009; Zelazo, 2020). Multimethod and context-sensitive assessments, including observational paradigms, ecological momentary assessment (EMA), and physiological measures (e.g., Gagne et al., 2011; Gartstein & Rothbart, 2003; Putnam et al., 2008; Wass & Goupil, 2022), will be important for disentangling biological predispositions from experience-driven adaptations. Furthermore, because the predictive patterns were consistent across CP and ADHD, our results support transdiagnostic perspectives (e.g., Beauchaine & Cicchetti, 2019; Cuthbert & Insel, 2013; Kotov et al., 2021; Lahey et al., 2005, 2022).
For practice, our results demonstrate that reactive traits do not represent a static “risk temperament” but function as indicators of susceptibility. Highly reactive children showed elevated risk when their reactivity increased over time, which highlights plasticity rather than fixed vulnerability. This means that such traits should be understood as markers of sensitivity to context, consistent with the differential susceptibility model (Belsky & Pluess, Reference Belsky and Pluess2009). The practical implication is that interventions should treat temperament not as destiny but as potential that leaves room for flourishing and catch-up effects, depending on the goodness-of-fit between child and environment (Thomas & Chess, Reference Thomas and Chess1977), on parenting (Bornstein et al., Reference Bornstein, Putnick and Suwalsky2018; Kiff et al., Reference Kiff, Lengua and Zalewski2011), and on tailored, reasonable interventions (Colizzi et al., Reference Colizzi, Lasalvia and Ruggeri2020; Shah et al., Reference Shah, Jones, Van Os, McGorry and Gülöksüz2022). Particularly in the preschool years, when children are especially malleable, professionals should recognize that the same reactivity that confers vulnerability can also be harnessed for positive development. Supporting parents and educators in providing structured, responsive, and supportive environments can therefore redirect trajectories toward adaptive outcomes (Cicchetti, Reference Cicchetti2016; Masten et al., Reference Masten, Roisman, Long, Burt, Obradović, Riley, Boelcke-Stennes and Tellegen2005; Sameroff, Reference Sameroff2010).
Theoretically, our results demonstrate that reactive temperament and the development of externalizing symptoms are associated. These findings align with transactional and cascade models of development (Masten et al., Reference Masten, Roisman, Long, Burt, Obradović, Riley, Boelcke-Stennes and Tellegen2005; Moilanen et al., Reference Moilanen, Shaw and Maxwell2010) and contribute empirical evidence that reinforces dimensional frameworks such as HiTOP (Kotov et al., Reference Kotov, Krueger, Watson, Cicero, Conway, DeYoung, Eaton, Forbes, Hallquist, Latzman, Mullins-Sweatt, Ruggero, Simms, Waldman, Waszczuk and Wright2021) and RDoC (Cuthbert & Insel, Reference Cuthbert and Insel2013; Ostlund et al., Reference Ostlund, Myruski, Buss and Pérez-Edgar2021). The implication is that models of psychopathology should integrate reactivity and regulation within unified frameworks (Nigg, Reference Nigg2006; Santens et al., Reference Santens, Claes, Dierckx and Dom2020; Zelazo, Reference Zelazo2020), as we originally intended by integrating the EC factor into our analyses. Importantly, the evidence that ages 3 to 5 are a sensitive window for developmental change means that early interventions timed to these shifts may be especially effective. Programs that target self-regulation, socio-emotional learning, or parenting during periods of heightened reactivity may not only prevent maladaptive outcomes but also leverage plasticity to promote resilience and thriving.
Conclusion
This study highlights the importance of a developmental perspective on temperament and its role in predicting emotional and behavioral difficulties. Our findings demonstrate that individual changes in NA and Surgency during early childhood are more informative than early trait levels alone when predicting later externalizing symptoms. This underscores the relevance of temperament as a dynamic, evolving construct rather than a static trait. The results suggest that early shifts in temperament may offer a more actionable window for prevention and intervention than static, cross-sectional assessments. Recognizing temperament change as a meaningful indicator of developmental trajectories can help refine early identification strategies and support individualized, adaptive responses to emerging mental health risks.
Future research should build on these insights by integrating multimethod assessments and exploring the mechanisms that drive temperament change, including environmental, relational, and maturational factors. Such work will be essential in shaping effective developmental models and translating findings into timely support strategies for children at risk.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0954579426101278.
Data availability statement
The study follows Transparency and Openness Promotion (TOP) guidelines as far as permitted by legal and ethical constraints.
Acknowledgements
This research was made possible through institutional support provided by the participating universities. Data used in this study were accessed via the NEPS (National Educational Panel Study) network. We thank the Leibniz Institute for Educational Trajectories for providing access to the data.
Funding statement
This research was supported by institutional funding from the participating universities.
Competing interests
The authors declare no competing interests.
Pre-registration statement
This study was preregistered prior to data analysis.
Active link
Deviations
Due to a lack of longitudinal measurement invariance, Effortful Control could not be modeled consistently across measurement occasions. As a result, the preregistered moderation hypotheses involving Effortful Control could not be tested. This deviation from the preregistered analysis plan is fully reported and discussed in the manuscript.
Availability of data
The data used in this study are from the National Educational Panel Study (NEPS), Start Cohort 1 (Newborns). Due to legal and ethical restrictions, the data cannot be shared publicly. Access to the data can be requested via the NEPS Network (https://www.neps-data.de).
Availability of code
All analysis code and the preregistration are publicly available on the Open Science Framework at https://osf.io/5hrnu.
Availability of methods/materials
All methods and materials are described in detail in the manuscript. Supplementary materials are available from the corresponding author upon reasonable request.
AI statement
No AI tools were used in the preparation of this manuscript beyond basic spelling and grammar checks (DeepL).




