Internalizing symptoms, well-being, and correlates in adolescence: A multiverse exploration via cross-lagged panel network models

Internalizing symptoms are the most prevalent mental health problem in adolescents, with sharp increases seen, particularly for girls, and evidence that young people today report more problems than previous generations. It is therefore critical to measure and monitor these states on a large scale and consider correlates. We used novel panel network methodology to explore relationships between internalizing symptoms, well-being, and inter/intrapersonal indicators. A multiverse design was used with 32 conditions to consider the stability of results across arbitrary researcher decisions in a large community sample over three years (N = 15,843, aged 11-12 at Time 1). Networks were consistently similar for girls and boys. Stable trait-like effects within anxiety, attentional, and social indicators were found. Within-person networks were densely connected and suggested mental health and inter/intrapersonal correlates related to one another in similar complex ways. The multiverse design suggested the particular operationalization of items can substantially influence conclusions. Nevertheless, indicators such as thinking clearly, unhappiness, dealing with stress, and worry showed more consistent centrality, suggesting these indicators may play particularly important roles in the development of mental health in adolescence.

(Received 1 September 2020; revised 28 January 2021; accepted 9 March 2021)

Adolescence is recognized as a key developmental phase characterized by rapid physical, social, and psychological change (Dahl, Allen, Wilbrecht, & Suleiman, 2018; Patton et al., 2016; Sawyer, Azzopardi, Wickremarathne, & Patton, 2018). The majority of lifetime disorders also show first onset in the teenage years (Jones, 2013). Early adolescence is likely particularly important to understanding what sets in motion changes in mental health. For instance, key gender differences emerge, and contextual factors such as puberty and school transition are in process (Patalay & Fitzsimons, 2017). Evidence of the correlates of mental health at this age could therefore be key to improving identification, intervention, and prevention (Patalay & Fitzsimons, 2018). However, based on available evidence, methodological challenges make it difficult to determine which indicators are particularly important (see sections outlining analytical considerations below). This study therefore makes use of a new panel network model (Epskamp, 2020b) to consider indicator-level interactions (pairwise causal associations that are often bidirectional; Epskamp, Rhemtulla, & Borsboom, 2017) between internalizing symptoms, well-being, and inter/intrapersonal indicators.
Sharp increases in internalizing problems, particularly for girls, make up much of the mental health difficulties faced by adolescents (Rapee et al., 2019), and evidence suggests levels of these problems are increasing over time (Collishaw, 2015; NHS Digital, 2018). Furthermore, internalizing problems are also highly comorbid with other disorders (Carrellas, Biederman, & Uchida, 2017; Merikangas et al., 2010; NHS Digital, 2018; Wolff & Ollendick, 2006) and substantially correlated with other symptoms (Black, Panayiotou, & Humphrey, 2019; Patalay et al., 2015), making them an important focus for inquiry. Large numbers of adolescents also experience subthreshold internalizing symptoms. For instance, up to 12% of 11-14-year-olds experience subthreshold levels of depression (Bertha & Balázs, 2013). Here we consider the key theoretical and analytical considerations in the robust study of internalizing problems.

Theoretical considerations
The role of well-being
Further insight into internalizing problems and those at risk might be afforded by also measuring well-being (Bartels, Cacioppo, van Beijsterveldt, & Boomsma, 2013). This reflects the World Health Organization's longstanding definition that mental health should not consist only of the absence of symptoms (WHO, 1946). This broader conceptualization is also likely to be more useful in nonclinical samples, since positive mental health can capture greater variability (Alexander, Salum, Swanson, & Milham, 2020). Well-being is also closely related to internalizing symptoms, statistically and conceptually (Black, Panayiotou, & Humphrey, 2020b), showing substantial correlations for total scores and latent constructs (.41-.68; Antaramian, Huebner, Hills, & Valois, 2010; Black et al., 2019; Suldo, Thalji, & Ferron, 2011). Correlations around this level suggest that constructs are substantially related while each still contributes distinct information.
Furthermore, given that self-report adolescent mental health problem data can be error-prone, it can be argued that well-being might be used to strengthen measurement. Specifically, substantial measurement error in adolescent mental health problems is suggested by low inter-rater associations and varying approaches to classification, and there is no clear criterion against which such measures can be validated (Wolpert & Rutter, 2018). Commonly used symptom measures are typically old and/or based on limited psychometric investigation (Bentley, Hartley, & Bucci, 2019; Black, Mansfield, & Panayiotou, 2020a; Dedrick, Greenbaum, Friedman, Wetherington, & Knoff, 1997; Goodman, 2001), whereas newer well-being measures that followed modern and rigorous item-development and validation standards (e.g., Ravens-Sieberer et al., 2005; Stewart-Brown et al., 2009) may complement symptom data and improve measurement accuracy. Routine adoption of such measures is also empirically justified since well-being seems to relate at a similar level to different domains of psychopathology as these relate to one another (e.g., Black et al., 2019). Since these psychopathology domains have been amalgamated into composites (e.g., Patalay & Fitzsimons, 2016), and there is conceptual and statistical similarity at the indicator level for internalizing symptoms and well-being (Black et al., 2020b), using well-being measures to capture additional information can be a useful approach.

Intra and interpersonal correlates of internalizing symptoms in adolescence
There is a substantial body of literature covering the developmental risk and promotive correlates of mental health in adolescents (for reviews see, for example, Evans, Li, & Whipple, 2013; Fritz, de Graaff, Caisley, van Harmelen, & Wilkinson, 2018; Masten & Barnes, 2018). Moreover, there is theoretical consensus that systems models are appropriate (Bronfenbrenner, 2005; Evans et al., 2013; Masten & Barnes, 2018), namely considering factors from across personal (e.g., problem solving), family (e.g., secure attachment), and wider environments (e.g., school connectedness), and key correlates have consistently been identified across samples and methods (Masten & Barnes, 2018). It is important to capture these multiple systems since effects can cascade from one level to the other such that the interaction between mental health and environments is inherently complex (Masten & Cicchetti, 2010). Consistent with other literature considering the dynamic interplay of correlates and mental health, we focus on malleable (i.e., intra and interpersonal factors) rather than biological or socioeconomic variables.
For internalizing problems in adolescence specifically, social factors and emotion regulation are thought to be key (Rapee et al., 2019), suggesting these warrant particular attention when studying the development of internalizing symptoms. The sudden physical, psychological, and social changes experienced in adolescence might affect expectations and views of young people, and these changes likely in turn impact internalizing symptoms (Rapee et al., 2019). Perceived home, peer, and school support are therefore likely important correlates. More generally, emotion regulation can be impacted by difficult home environments (e.g., maternal depression or parental conflict), and resulting difficulties managing emotions pose significant risk for internalizing problems (Thompson, 2019).

Gender differences
Inclusion of such correlates also facilitates consideration of a key issue for internalizing symptoms in adolescence, namely that these disproportionately affect girls (Merikangas et al., 2010; NHS Digital, 2018), and that this is increasingly the case (Bor, Dean, Najman, & Hayatbakhsh, 2014; Collishaw, 2015). A key theme in the theoretical literature is whether girls and boys experience quantitatively or qualitatively different risk factors (Hyde, Mezulis, & Abramson, 2008). Indicator-level analysis of internalizing symptoms, well-being, and relevant malleable correlates over time may therefore shed light on this question. For instance, it may be that previous construct-level analyses have made differences difficult to pinpoint, with variation occurring (qualitatively) at the indicator level. Alternatively, if a common network structure that varies in edge strength is found, quantitative differences may explain prevalence findings.

Analytical considerations
Within- and between-person effects
Longitudinal data consist of both variation within individuals (over time) and variation between individuals (Curran & Bauer, 2011). In panel data, repeated observations are nested within people, much as children are nested within schools in multilevel data. This allows for the consideration of how variables influence one another within people on average over time, taking account of stable (or trait-like) individual differences. In the estimation of the cross-lagged panel network, estimated stable means and deviations from these over time allow for a network of trait-like effects over time, a longitudinal network of malleable effects over time, and a contemporaneous network of (undirected) state-like effects that happen within the lag considered (in our case more quickly than once a year). For example, adolescents' general tendencies to report anxiety might be related to their general tendencies to report perceived social support. This trait-like effect therefore needs to be controlled for when considering the direction and strength of the temporal association between anxiety and social support.
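The distinction between stable means and deviations from them can be illustrated with a minimal sketch. This is not the estimation procedure used in the panel network model, which estimates these components jointly within the model; the manual centering, the function name `disaggregate`, and the ratings below are hypothetical and serve only to show what each component represents.

```python
from statistics import mean


def disaggregate(scores_by_person):
    """Split repeated scores into a stable person mean (between-person part)
    and wave-specific deviations from that mean (within-person part)."""
    result = {}
    for person, scores in scores_by_person.items():
        person_mean = mean(scores)
        deviations = [score - person_mean for score in scores]
        result[person] = (person_mean, deviations)
    return result


# Hypothetical anxiety ratings at T1-T3 for two adolescents (illustrative only).
anxiety = {"A": [1.0, 2.0, 3.0], "B": [3.0, 3.0, 3.0]}
parts = disaggregate(anxiety)

for person, (person_mean, deviations) in parts.items():
    print(person, person_mean, deviations)
```

In this toy example, adolescent B has the higher stable mean (a between-person difference) but shows no change over time, whereas adolescent A has the lower mean but all of the within-person variation.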
This kind of disaggregation has led to new findings in construct-level panel models. For instance, while bidirectional relationships have been observed for internalizing and externalizing symptoms, only the latter predicted the former when disaggregated effects were considered (Flouri et al., 2019; Oh et al., 2020). Similar findings have been observed for adolescent depression and self-esteem (Masselink et al., 2018). There is also early evidence in younger children that correlates at different ecological levels can interact reciprocally at the within-person level (after accounting for between-person effects). Kaufman, Kretschmer, Huitsing, and Veenstra (2020) found evidence of such effects for internalizing symptoms, parenting, and bullying.
Thus, to understand how temporal effects between psychological variables occur for the average individual, analysis of within-person effects, accounting for between-person differences, is needed. For instance, we might consider whether change in internalizing problems is predicted by bullying. Without disaggregated analysis, and assuming other requirements for causal inference are met (Rohrer, 2018), we cannot be sure that those experiencing symptoms are not in fact also those commonly targeted by bullies (a between-person effect). Crucially, while it is well established that disaggregation of within and between-person effects is needed for accurate inferences to be made, it is still commonplace to assume within-person processes from analyses representing a blend of within and between variance (Hamaker, Kuiper, & Grasman, 2015).

Network analysis
While the studies cited above have modeled within and between-person effects separately, they have relied on total scores and latent factors which treat individual symptoms as indicators of a given mental state. While this approach can be statistically equivalent (Fried, 2020), we argue it is theoretically problematic, given the absence of external evidence for disorders, the likelihood that mental health states are contributed to by a constellation of biological and environmental factors, and the fact that many disorders share indicators (Borsboom, Cramer, & Kalis, 2018; Borsboom, Cramer, Schmittmann, Epskamp, & Waldorp, 2011). Network approaches might better capture the nuance and complexity likely to be present in adolescent mental health (Kalisch et al., 2019). These offer the opportunity to consider individual indicators and correlates as outcomes and predictors while accounting for all other indicators in the model (Epskamp, 2020b; Kalisch et al., 2019). For example, from the network perspective we can consider the unique association of bullying and worry, and in longitudinal networks we can also track direction. Mental states and their correlates can therefore be represented as dynamic with interactive indicators, such that, for instance, bullying leads to worry, which in turn leads to somatic symptoms, which in turn lead to unhappiness. The modeling of indicators, and not latent variables, within network models is arguably particularly appropriate for internalizing symptoms in adolescence since evidence suggests a lack of clear clustering into theoretical disorders (e.g., depression, anxiety) for this domain (McElroy & Patalay, 2019; McElroy, Fearon, Belsky, Fonagy, & Patalay, 2018).
Item-level differences in reporting have also been found in young adolescents for internalizing and well-being (Black et al., 2019). Similarly, analysis of adult samples suggests indicator-level analysis could be important to understanding gender differences. Fried, Nesse, Zivin, Guille, and Sen (2014) found men reported more suicidal ideation and psychomotor symptoms of depression in response to stress, while women reported more fatigue, appetite, and sleep problems. Within-person analysis at the indicator level could therefore be key to improving understanding of the development of internalizing symptoms, including gender differences.

The current study
The current study aimed to explore indicator-level within and between-person associations for internalizing symptoms, well-being, and inter/intrapersonal correlates via novel panel network models (Epskamp, 2020b). A conceptual demonstration of the panel network model is shown in Figure 1. This diagram is simplified to aid interpretation and therefore shows parameters for only two indicators, while in the current study 22 are included. The existence of large panel studies represents an opportunity to consider longitudinal indicator-level associations in rich datasets, in which within-person effects can be modeled (Curran & Bauer, 2011). We therefore conducted secondary analysis of a dataset designed to explore and test new ways to improve mental health and well-being of young people aged 10-16. The current study was based on existing data with which we were familiar, the HeadStart (HS) evaluation. Therefore, a multiverse approach, in which multiple combinations of possible reasonable decisions are analyzed in parallel, was used to avoid researcher degrees of freedom obscuring results (Simmons, Nelson, & Simonsohn, 2011; Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016; Weston, Ritchie, Rohrer, & Przybylski, 2019).
Given gender differences for internalizing symptoms and personal and social resources (Rapee et al., 2019), we expected that associations in within and between models would be noninvariant across girls and boys. We hypothesized that irrespective of gender (a) social problems (e.g., being bullied) would show positive associations with internalizing symptoms and negative associations with well-being; (b) well-being and symptoms would be negatively associated; (c) intrapersonal factors (e.g., the ability to handle stress) would be negatively associated with symptoms and positively associated with well-being; (d) social support would be negatively associated with symptoms and positively associated with well-being. Given the lack of studies analyzing disaggregated models, we were unable to specify which effects would be observed at within or between-person levels. Finally, we explored which indicators were the most influential and the most predicted.

Background and procedure
We undertook secondary analysis of data from three annual waves (2017-2019) collected from a longitudinal cohort study. The project from which data were drawn aims to explore and test new ways to improve mental health and well-being of young people aged 10-16 and prevent serious mental health issues from developing.
Ethical approval was granted by the UCL ethics committee (reference: 8097/003), and opt-out parental consent was given for adolescents to complete secure online surveys during the school day. Teachers read out an information sheet which emphasized pupils' confidentiality and right to withdraw. Socio-demographic data were drawn from the National Pupil Database.

Participants
Data were collected from 15,859 pupils in year seven (age 11-12) at Time 1, from 118 secondary schools in England (52.7% female). Given the focus of the project, the sample was not drawn to be representative: 35.4% had ever been eligible for free school meals at Time 1 compared to the national figure of 28.5% eligible in the previous six years (Department for Education, 2017a); 12.0% had special educational needs (national figure = 14.4%; Department for Education, 2017c); in terms of ethnicity, 74.2% were white (national figure = 75.2%), 9.3% were Asian (national figure = 10.7%), 5.7% were of Black origin (national figure = 5.6%), 4.0% were of mixed origin (national figure = 5.0%), .2% were Chinese (national figure = .45%), while 1.6% were classified as any other ethnic group (national figure = 1.75%), and 1.5% were unclassified (national figure = 1.5%; Department for Education, 2017b). Of this total sample, 16 were removed from the current study since they had missing data for all items included for analysis.

Item selection
The conceptual domains explored in the current study (based on the literature reviewed above and indicators available in the dataset at each time point) were: internalizing symptoms (including attentional symptoms and social withdrawal; American Psychiatric Association, 2013; WHO, 2018), well-being, home, school and peer support, and intrapersonal factors such as managing stress. The choice of indicators was restricted, given that the software used for the panel network analysis currently cannot handle more than around 30 (Epskamp, 2020b) and it is not appropriate to indiscriminately include highly similar indicators in networks (Rhemtulla, Cramer, van Bork, & Williams, 2018). Items were therefore selected from those available in the dataset according to the following criteria: (a) conceptual domain, (b) item simplicity, given issues highlighted in this area (Black et al., 2020a), (c) descriptive and factor model statistics.

Multiverse approach
In order to increase transparency, sensitivity analyses of many possible analytical decisions were conducted (Steegen et al., 2016; Weston et al., 2019). In line with multiple specification approaches (Simonsohn, Simmons, & Nelson, 2020; Steegen et al., 2016), variation in decisions was limited to those we considered likely to provide valid insight. In addition, given the novelty and computationally demanding nature of the analyses presented here, we also limited conditions based on feasibility. For instance, given that valid inferences can be drawn across a wide range of search algorithms at large sample sizes (Epskamp, 2020b), we used only two such robust, but relatively computationally light, procedures (see footnote 1). Two aspects were identified as vulnerable to researcher degrees of freedom. First, the choice of items was in some cases arbitrary such that there were items from more than one scale that considered the same relevant experience. Second, the novelty of the method means that which estimation algorithm is most appropriate has not been clearly established. In such instances, multiverse approaches are recommended (Epskamp, 2019). This resulted in 16 possible datasets (based on varying two possible item operationalizations for four items: distracted, mind, optimism, problem, see Table 1) × two search algorithms, meaning that models for 32 conditions were estimated. For more details on the choice of items see the supplementary material (S1). Only full information maximum likelihood (FIML) estimation was selected since data cannot yet be treated as ordinal in the panel network model and no robust adjustments are available. Two equally robust pruning methods were considered: alpha at .01, and this plus stepwise modification based on the Bayesian information criterion (BIC).
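The crossing of item operationalizations and search algorithms can be enumerated as in the following sketch. It only reproduces the counting described above (2 operationalizations for each of 4 items gives 16 datasets; crossed with 2 algorithms gives 32 conditions). The labels `version_a`, `version_b`, and the algorithm names are hypothetical placeholders, not the labels used in the study's code.

```python
from itertools import product

# The four nodes with two candidate items each (node names from the text).
ITEMS = ["distracted", "mind", "optimism", "problem"]

# Hypothetical labels for the two candidate items per node.
OPERATIONALIZATIONS = ["version_a", "version_b"]

# Hypothetical labels for the two pruning procedures described in the text.
ALGORITHMS = ["prune_alpha_01", "prune_alpha_01_plus_stepwise_bic"]

# One dataset per combination of item choices: 2 ** 4 = 16.
datasets = [
    dict(zip(ITEMS, combo))
    for combo in product(OPERATIONALIZATIONS, repeat=len(ITEMS))
]

# Cross every dataset with every algorithm: 16 * 2 = 32 conditions.
conditions = list(product(datasets, ALGORITHMS))

print(len(datasets), len(conditions))
```

Enumerating the grid up front makes it straightforward to confirm that every combination is estimated exactly once.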
We also kept the number of indicators in each model constant since we wanted to avoid overfitting by including multiple indicators of the same experience (e.g., two peer support items), and since networks are not directly comparable with varying numbers of nodes (Costantini et al., 2019). Missing data were retained for all conditions given FIML estimation, and since analysis was at the item level, and data were ordinal, outliers were not considered.
The resulting design allowed us to assess the stability of the most influential and predicted indicators and gender invariance across these decisions. Fit was assumed to be good across conditions given the data-driven approach, and was not used to compare conditions. Since our analysis was exploratory, testing multiple contingent effects, we approached the results of our sensitivity analyses descriptively in line with Steegen et al. (2016). We therefore present how fit, strength, and gender invariance varied across analyses.

Analysis
Code for all analyses, and simulated data for the purpose of running code, is available in the supplementary material (S2-S4). In order to count relationships counter to our hypotheses across conditions, all indicators were coded to have a positive manifold (e.g., well-being indicators were reversed with respect to symptoms). The first stage of the main analysis (for each condition) was to estimate a panel network model for each whole sample in the psychonetrics package in R (0.7.1; Epskamp, 2020a). Once the model was estimated, nonsignificant parameters were recursively pruned at α = .01 and then parameters were added one at a time based on modification indices to minimize the BIC, via the step-up function. This data-driven approach is consistent with network methods (Epskamp, Borsboom, & Fried, 2018a). Given this, model fit was expected to be good, with comparative fit index (CFI) > .95 and root mean square error of approximation (RMSEA) < .06 (Hu & Bentler, 1999).
Figure 1. Conceptual diagram of a panel network model for two indicators, x and y, at three time points, T1-T3. Paths a-d represent average within-person directed partial correlations, including autocorrelations (temporal network). Paths marked f represent within-person partial correlations within lags (contemporaneous networks), with e representing the residual for each indicator after accounting for temporal effects. Path g represents between-person partial correlations for stable trait-like effects (between network).
Footnote 1. We found it was not possible to run the modelsearch function with item-level panel analysis in psychonetrics.
Once each full sample network was estimated (via basic pruning and stepwise modification), three matrices from the model were extracted for invariance testing and further consideration: (a) the temporal matrix which encodes directed partial correlations for the average within-person effects over time; (b) the contemporaneous matrix which encodes partial correlations for the average within-person effects within lags (after accounting for the temporal effects); (c) the between-persons matrix which encodes partial correlations for stable trait-like differences across all time points. Average networks across all 32 models, excluding edges that occurred less than 50% of the time following Lin, Fried, and Eaton (2020), were plotted in qgraph (1.6.5; Epskamp et al., 2012) with red lines indicating negative parameter values (edges) and blue positive. In the temporal network, arrows between nodes indicate directed partial correlations while curved arrows represent autoregressions.
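The averaging rule described above can be sketched as follows. This is an illustrative reading, not the study's code: the function name `average_network` is hypothetical, and we assume that an edge's mean is taken over the conditions in which it was estimated (rather than counting absent edges as zero), which the text does not specify.

```python
def average_network(edge_sets, min_share=0.5):
    """Average edge weights across conditions, dropping rare edges.

    edge_sets: one dict per condition mapping (source, target) -> weight,
    containing only the edges retained in that condition.
    Edges present in fewer than `min_share` of conditions are excluded.
    Assumption: the mean is taken over conditions where the edge appears.
    """
    n_conditions = len(edge_sets)
    totals, counts = {}, {}
    for edges in edge_sets:
        for edge, weight in edges.items():
            totals[edge] = totals.get(edge, 0.0) + weight
            counts[edge] = counts.get(edge, 0) + 1
    return {
        edge: totals[edge] / counts[edge]
        for edge in totals
        if counts[edge] / n_conditions >= min_share
    }


# Hypothetical weights from three conditions (illustrative only).
example_conditions = [
    {("worry", "think"): 0.2, ("stress", "peer"): 0.1},
    {("worry", "think"): 0.3},
    {("worry", "think"): 0.4},
]
averaged = average_network(example_conditions)
print(averaged)
```

Here the worry → think edge appears in every condition and is retained, while the stress → peer edge appears in only one of three conditions and is dropped.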
Finally, strength centrality was considered for networks in each model. Strength represents the sum of absolute edge weights for any given node (Costantini et al., 2015). For temporal networks, this includes both in-strength and out-strength, with the former indicating the relative predictability and the latter the relative influence of the target node. For undirected networks, a single strength index represents the overall extent to which a given node is directly influenced by or influences others.
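As a worked illustration of these definitions, the sketch below computes in-strength and out-strength from a small directed weight matrix. The function name, the convention that `weights[i][j]` is the edge from node i to node j, and the exclusion of autoregressions (the diagonal) are assumptions made for the example; the software used in the study may follow different conventions.

```python
def directed_strength(weights, include_self_loops=False):
    """Return (in_strength, out_strength) lists for a directed network.

    Assumption: weights[i][j] is the edge from node i to node j.
    Strength is the sum of absolute edge weights; self-loops
    (autoregressions) are skipped unless include_self_loops is True.
    """
    n = len(weights)
    in_strength = [0.0] * n
    out_strength = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j and not include_self_loops:
                continue
            magnitude = abs(weights[i][j])
            out_strength[i] += magnitude  # node i influences node j
            in_strength[j] += magnitude   # node j is predicted by node i
    return in_strength, out_strength


# Hypothetical temporal network for three nodes (illustrative only).
example_weights = [
    [0.2, 0.3, 0.0],
    [-0.1, 0.1, 0.0],
    [0.0, 0.4, 0.2],
]
in_s, out_s = directed_strength(example_weights)
print(in_s, out_s)
```

For an undirected network the weight matrix is symmetric, so in-strength and out-strength coincide and a single strength index suffices.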
Network matrices were also inspected to determine the number and size of edges and whether these were in expected directions.
Note. SDQ = strengths and difficulties questionnaire; SWEMWBS = short Warwick-Edinburgh mental well-being scale; TEIQUE-ASF = trait emotional intelligence questionnaire-adolescent short form; PSS-4 = 4-item perceived stress scale; SRS = student resilience survey.

Gender invariance
Following standard practices for invariance testing, two models were tested: an unconstrained model or H1, and a constrained model or H0, where H0 is nested in H1. Temporal, between, and contemporaneous matrices were used to determine which parameters should be considered in an unconstrained model (i.e., those retained in the whole sample were estimated for each group). In this model these parameters of interest were freely estimated in girls and boys simultaneously to provide a point of comparison for subsequent constraints. In the constrained model, all three matrices were then set to equality in girls and boys, and the resulting model was compared to the unconstrained model. Given the sample size of the current study, models were compared based on the Akaike information criterion (AIC) and BIC, which penalize for model complexity (van de Schoot, Lugtig, & Hox, 2012), rather than chi-square difference testing, which can be sensitive to large samples (Crede & Harms, 2019). Lower values for AIC and BIC indicate better model fit. Since the constrained model was more parsimonious, we interpreted higher AIC and BIC values for the constrained model as indicative of noninvariance.
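The comparison logic can be sketched using the standard definitions AIC = −2 log L + 2k and BIC = −2 log L + k ln(n), where k is the number of free parameters and n the sample size. The log-likelihoods and parameter counts below are hypothetical values chosen for illustration, not results from the study, and the exact sample-size term used by the estimation software is an assumption here.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: penalty of 2 per free parameter."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalty of ln(n) per free parameter."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def preferred(ic_unconstrained, ic_constrained):
    """Lower values indicate better fit; ties go to the unconstrained model."""
    return "constrained" if ic_constrained < ic_unconstrained else "unconstrained"


N_OBS = 15843  # analytic sample size reported in the text

# Hypothetical fit values (illustrative only): the unconstrained model fits
# better in raw likelihood but uses twice as many parameters.
free = {"log_likelihood": -100000.0, "n_params": 700}
equal = {"log_likelihood": -100400.0, "n_params": 350}

aic_choice = preferred(aic(**free), aic(**equal))
bic_choice = preferred(bic(n_obs=N_OBS, **free), bic(n_obs=N_OBS, **equal))
print(aic_choice, bic_choice)
```

With n = 15,843, the BIC penalty per parameter is ln(n) ≈ 9.67 compared with 2 for the AIC, so the two criteria can disagree when the constrained model saves many parameters at a modest cost in likelihood.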

Results
Gender was missing for .3% of the sample. Missing data for survey indicators were low for the first wave but higher for subsequent time points (Time 1 = 2.6%-6.6%, Time 2 = 16.4%-20.9%, Time 3 = 25.9%-29.6%). Descriptive statistics are summarized in Table 1. The average fit of models is presented in Table 2 and a full summary of fit statistics for each model can be found in the supplementary material (S5). In general, differences between equivalent datasets using different estimation algorithms were small, indicating good stability across these. Though data-driven approaches were used to estimate models, and parameter estimates varied, the stable good fit across conditions nevertheless indicated that stationarity constraints imposed in the model (paths a and b in Figure 1) were reasonable in all cases (Epskamp, 2020b). In terms of invariance, the same mixed result was found across all conditions: AIC favored the unconstrained model while BIC favored the constrained model. This suggests differences in network structure between girls and boys were likely small. Post-hoc consideration of RMSEA and CFI also revealed differences typically considered to be small (Meade, Johnson, & Braddy, 2008; see footnote 2): −.007 to −.006 for CFI (M = −.006, SD < .001), and a range within <−.001 for RMSEA (M < .001, SD < .001). Edges for contemporaneous and between networks are interpreted as partial correlation coefficients, and those for temporal networks as directed partial correlations (standardized beta coefficients). For each network within each condition the number of parameters, means, standard deviations, and number of negative edges (unexpected results relative to our hypotheses, given the recoding of indicators to have a positive manifold) can be seen in the supplementary material (S6).
Between networks had the fewest edges (3-23), though these were relatively large (ranging in absolute value from r = .004 to r > .99, the latter for similar indicators such as distracted and restless in some models; the mean of mean edge sizes within networks across conditions was M = .40). No unexpected negative edges were found for between networks in any condition. Contemporaneous networks were more densely connected with 116-130 edges (r = .06-.27 in absolute value, with the mean of mean edge sizes within networks across conditions M = .07), and with consistent unexpected negatives across all conditions (7-12; for example, a small negative edge featured in every contemporaneous network for the being bullied and [not] think clearly indicators). Temporal networks were also dense (166-196 edges; β = .05-.27 in absolute value, mean of means M = .06) with 1-3 unexpected edges found for each condition. These were consistently found for worry → think, school → withdrawn, and unhappy → peer (this was nonsignificant in eight conditions). Most estimated parameters were significant across conditions (0-12 were nonsignificant for any given condition, p < .01). In terms of edge parameters, only temporal networks occasionally included nonsignificant edges: 19 different edges in temporal networks were nonsignificant in different conditions, with most of these edges not occurring frequently across conditions or only rarely being nonsignificant (full information for all parameters in all conditions is provided in the supplementary material, S7-S10).
To summarize these networks across all conditions, for each of between, contemporaneous, and temporal networks, the mean of edges was calculated after excluding those that appeared in less than 50% of conditions. This resulted in 182 edges (37.60% of all possible edges) being retained in the average temporal network, all of which appeared across all conditions. Similar stability was seen for the between and contemporaneous networks, with all edges estimated across conditions appearing in 50% or more conditions (between: six edges, 2.60%; contemporaneous: 134 edges, 58.01%). The mean edge size was r = .32 (range = .52) for the average between network, r = .06 (range = .32) for the average contemporaneous network, and r = .06 (range = .29) for the average temporal network. Autoregressive effects were present for all nodes in the average temporal network and ranged from .06 to .24 (M = .14, SD = .04). The average networks are summarized in Figure 2, with the thickness of edges scaled across the three panels (i.e., it is equivalent across each plot), and in the supplementary material (S12). As mentioned above, a handful of nonsignificant parameters were found in temporal networks, six of which appear in the averaged network (all were nonsignificant only once across the 32 conditions, except unhappy → peer as described above).
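The reported percentages can be checked with a short calculation, assuming the 22 indicators stated earlier. The denominators used here (all 22 × 22 directed paths including autoregressions for the temporal network, and 22 × 21 / 2 undirected pairs for the other two) are our inference; they are not stated in the text, but they reproduce the reported figures.

```python
def possible_edges(n_nodes, directed):
    """Number of possible edges among n_nodes.

    Assumption: the directed (temporal) count includes autoregressions,
    giving n * n; the undirected count is n * (n - 1) / 2.
    """
    if directed:
        return n_nodes * n_nodes
    return n_nodes * (n_nodes - 1) // 2


N_NODES = 22  # number of indicators reported in the text

temporal_pct = 100 * 182 / possible_edges(N_NODES, directed=True)
contemporaneous_pct = 100 * 134 / possible_edges(N_NODES, directed=False)
between_pct = 100 * 6 / possible_edges(N_NODES, directed=False)

print(f"{temporal_pct:.2f} {contemporaneous_pct:.2f} {between_pct:.2f}")
```

Under these assumptions there are 484 possible temporal edges and 231 possible undirected edges, which yields the densities given above.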
Strength centrality was calculated for temporal and contemporaneous networks only, given the sparsity found for the between networks. Which nodes were most central tended to depend on the condition. In and out strength for the temporal networks are shown in Figures 3 and 4, while strength for the contemporaneous network is shown in Figure 5. Stress was consistently high for in-strength but other nodes varied substantially. Stress was again fairly consistently one of the most central for out-strength as was worry, though again substantial variation in out-strength was seen for most nodes. Worry and think were consistently the most central for strength in the contemporaneous network. Nodes that were represented by varying items, depending on the condition, often showed particular discrepancies for strength (e.g., mind in Figure 3). However, nodes with the same item across conditions also showed substantial variation (e.g., worry in Figure 3), given the conditional nature of edges which account for all others in the model.
Figure 3. Each line represents how the in-strength of each node varies depending on the condition. Only nodes where the maximum in-strength is always >.40 are shown in color and labeled for ease of reading.
Figure 4. Each line represents how the out-strength of each node varies depending on the condition. Only nodes where the maximum out-strength is always >.50 are shown in color and labeled for ease of reading.
Figure 5. Each line represents how the strength of each node varies depending on the condition. Only nodes where the maximum strength is always >.90 are shown in color and labeled for ease of reading.
2 We are not aware of simulation work providing recommendations for the size of alternative fit index differences for network invariance and therefore provide this example for confirmatory factor analysis (which recommends CFI difference of <.002 to consider invariance) since it includes larger samples closest to that used here.
Note. df = degrees of freedom; CFI = comparative fit index; RMSEA = root mean square error of approximation; AIC = Akaike information criterion; BIC = Bayesian information criterion; M = mean; SD = standard deviation.
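For readers unfamiliar with strength centrality, the following is a minimal sketch of how in-strength, out-strength, and undirected strength can be computed from a weights matrix. It is not the authors' implementation. The function name, the toy matrix, the row-to-column direction convention, and the exclusion of autoregressive (diagonal) effects are all assumptions for illustration.

```python
import numpy as np

def strength_centrality(w, directed=True):
    """Sum of absolute edge weights per node.

    w[i, j] is taken to be the edge from node i to node j (assumption).
    Diagonal entries (autoregressions) are excluded (assumption).
    """
    w = np.abs(np.asarray(w, dtype=float))
    np.fill_diagonal(w, 0.0)
    if directed:
        # In-strength sums incoming edges (columns); out-strength sums
        # outgoing edges (rows).
        return {"in": w.sum(axis=0), "out": w.sum(axis=1)}
    # For a symmetric (undirected) matrix, rows and columns agree.
    return {"strength": w.sum(axis=1)}

# Toy temporal network with three nodes (illustrative numbers only).
temporal = np.array([
    [0.14, 0.10, 0.00],
    [-0.05, 0.20, 0.30],
    [0.00, 0.00, 0.10],
])
cent = strength_centrality(temporal)
print(cent["in"], cent["out"])
```

Because each edge is a partial (conditional) association, a node's strength can change when any other node's operationalization changes, which is consistent with the variation reported even for nodes whose own item was constant.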

Discussion
We explored stable trait-like and within-person associations over time for internalizing symptoms, well-being, and inter/intrapersonal correlates at the indicator level. A multiverse approach was adopted, varying estimation algorithms and operationalizations of certain indicators, given that secondary data analysis was conducted and new methods were used (Epskamp, 2019; Weston et al., 2019). Though network analyses have boomed in recent years (Robinaugh, Hoekstra, Toner, & Borsboom, 2020), this was the first study to adopt a crossed multiverse design, to our knowledge, and an early example of Epskamp's (2020b) panel methodology. While previous work has considered longitudinal relationships between internalizing symptoms and inter/intrapersonal correlates (e.g., Goodman, Samek, Wilson, Iacono, & McGue, 2019; Saint-Georges & Vaillancourt, 2020), work at the indicator level was lacking. Our indicator-level analysis revealed relationships between indicators of different domains, suggesting latent-variable approaches may miss complexity. Similarly, while some work has considered both symptoms and well-being over time (e.g., Patalay & Fitzsimons, 2018), covariance between these domains was only considered by controlling for each at the first time point. We found a sparse between-person network with few strong associations, while the contemporaneous (average within-lag within-person associations) and temporal (directed within-person associations) networks were densely connected. All weights matrices were highly correlated, and networks showed good stability across conditions. We did not find clear evidence that networks differed between girls and boys, and results were consistent across conditions. Findings suggest that if differences existed for the indicators used here, they were likely trivial. Finally, the choice of item operationalization had a substantial impact on strength centrality (considered for the within-person networks), though certain nodes were consistently central.

Between-person Findings
The between network revealed partial correlations in expected directions, some of which were very large. These were between attentional, anxiety, and social indicators. These could reflect consistent cognitive vulnerabilities, environments, personality traits (e.g., agreeableness and neuroticism) or stable biological factors (Fraley & Roberts, 2005). There were notably no between-person relationships among indicators of different domains (e.g., internalizing and well-being or internalizing and social correlates) despite the fact that such domains have shown meaningful relationships elsewhere (e.g., Patalay & Fitzsimons, 2018). The contrast between our findings and prior work could arise for several reasons, including our disaggregation of within- and between-person effects, control of informant-type, and separate modeling of temporal and contemporaneous effects.
Though there was a relatively strong effect between peer support and withdrawal, which could be considered different domains (internalizing and interpersonal), we interpret this in line with the other effects in the between network: Those indicators involved were very similar and tended to be rated in similar ways over time, that is, a trait-like tendency over time to rate high or low peer support was strongly related to a trait-like tendency to rate low or high social withdrawal. The fact that indicators of different domains were conditionally independent in the between network suggests that covariance between these domains may be more state-like. We were able to identify this by controlling for trait-like reporting effects in the between network. The relative sparsity of the between network also indicates the majority of covariances were not stable and trait-like, consistent with the rapidly changing developmental context of early adolescence described in the introduction.

Within-person Findings
Dense within-person networks, both temporal and contemporaneous, were found. These findings fit with systems approaches in which aspects from different levels (e.g., home and intrapersonal factors) interact with one another (Bronfenbrenner, 2005; Evans et al., 2013; Masten & Barnes, 2018). Furthermore, there was little evidence that certain inter or intrapersonal factors were associated with only symptoms or only well-being, as has been suggested elsewhere (Patalay & Fitzsimons, 2016). Rather, symptoms, well-being and inter/intrapersonal factors seemed to influence one another in similar ways.
While both within-person networks were relatively dense, larger relationships were typically seen in the contemporaneous network. The current study sought to understand relationships between specific indicators (e.g., thinking clearly and being bullied) rather than latent constructs (e.g., well-being or peer problems). While levels of specific indicators such as these likely have meaningful relationships over time, the dense contemporaneous network suggests that interactions between the indicators modeled here often happened more quickly than annually (Epskamp et al., 2018a,b). Since both contemporaneous and temporal networks were relatively dense, many edges were common across both of these networks. Our results therefore suggest that indicators influenced one another both within and across lags. This further points to rapid changes in mental health and correlate variables, consistent with the rapid social, physical, and psychological development seen in early adolescence (Dahl et al., 2018; Patton et al., 2016; Sawyer et al., 2018). To better understand how these processes unfold, future work should vary the length between study waves, and there is a particular need for work focusing on shorter intervals.
Some of the larger effects in the temporal network were autoregressions, with each node showing such an effect. While the indicators studied here are known to be stable or show increasing trajectories (Meeus, 2016), meaning autoregressive effects would be expected, this finding is noteworthy. First, our analysis was at the indicator level, suggesting stability or reinforcement of these states can be specific to this level, rather than the domain (e.g., internalizing symptoms). Second, while latent-variable approaches account for construct-level covariance, parameters in our model controlled for those to all other indicators, and thus also included unique variance beyond that explained by a potential latent variable, which could be substantively important. Third, autoregressive parameters in our analysis accounted for stable between-person differences over time, thus representing more accurate within-person reinforcement of individual experiences over time. These within-person autoregressions have been interpreted by some as warning signals for transition into more disordered states (e.g., van de Leemput et al., 2014), but since we examined a large cohort via survey methods, we did not consider whether individuals were more or less disordered over time. 3 Nevertheless, the age range studied here is thought to be critical in the emergence of mental health problems (Jones, 2013) and rates are known to increase in this age range (Merikangas et al., 2010; NHS Digital, 2018). It may therefore be that cementing of symptoms, well-being indicators and inter/intrapersonal factors all contribute to this change.
Edges were mostly in expected directions, relative to our hypotheses. However, unexpected negative parameters were observed consistently in the temporal and contemporaneous networks. A certain level of such effects in partial correlation networks could be consistent with the nominal alpha level, or due to conditioning on common effects (Epskamp, Waldorp, Mõttus, & Borsboom, 2018b). We are therefore cautious in providing substantive interpretation of these results. However, one such effect was particularly stable across conditions and temporal and contemporaneous networks, that between withdrawn and (lack of) school support. 4 While we anticipated that internalizing symptoms would be positively associated with perceived lack of social support, it may be that adolescents who reported feeling socially withdrawn were focusing on the peer level when responding to the withdrawn item. In fact, the full item reads "I am usually on my own. I generally play alone or keep to myself". It is possible that adolescents who felt withdrawn from their peers tended to garner more support from, or were dependent on, school staff as can be the case for loneliness (Galanaki & Vassilopoulou, 2007).

Centrality
Strength centrality appeared more stable for the contemporaneous network than the temporal. Think and unhappy had the highest strength, depending on the condition, followed by worry, while the rank order of strength varied for the remaining nodes. This suggests that when considering relationships that happened more quickly than over a year, feeling unhappy and thinking clearly were particularly connected to other indicators, sharing the most variance with others (Costantini et al., 2015). Internalizing symptom and well-being indicators appeared to be among the most important in the contemporaneous network, suggesting both outcomes are intricately connected to each other and correlates. This further supports the use of well-being measures to better understand internalizing states, since well-being indicators clearly shared meaningful variance with other indicators without being redundant with respect to internalizing indicators.
Being able to deal with stress was one of the most consistently strongly predicted and influential nodes, suggesting that for effects that happened over the course of a year, the indicators in the model often related to this outcome via relatively strong directed partial correlations. Conversely, finding it hard to control feelings, an item designed to measure the same underlying trait as the stress indicator, was sometimes the most central for out-strength, while at other times several other indicators were stronger, and substantially lower values were seen. Worry, which was fairly consistently one of the most central nodes across conditions for out-strength, varied substantially for in-strength. Other particularly wide variations for in-strength were seen for the mind and problem indicators, both of which had varying operationalizations across conditions. Given the finding that the contemporaneous network remained dense, we do not interpret only the temporal centrality results as indicative of risk factors or outcomes. Rather, results suggest that worry, managing stress, thinking clearly and unhappiness may be key indicators for the development of adolescents' mental health. While more work is needed, this suggests that worry and unhappiness may be particularly important symptoms in early adolescence when considering how rapid developmental change is navigated. In turn, the think and stress indicators' centrality suggest that such cognitive indicators may play an important role in the reinforcement of social and psychological processes in this age group.
3 This would have relied on total scores which can be problematic (McNeish & Wolf, 2020) and inconsistent with our modeling approach.
4 The node is considered as lack of school support due to the recoding prior to analysis to obtain a positive manifold for the easy detection of results counter to hypotheses across conditions.
Our findings also highlight the importance of which items are chosen, and the issues of measurement error in adolescent mental health data. The stability of networks across samples using the same items has been given attention in recent years (e.g., Borsboom et al., 2017), as has the stability across different measures in certain fields (Fried et al., 2018). However, this was the first study, to our knowledge, to consider the sensitivity of network parameters to item operationalizations in the same sample. We found that while some nodes showed relative stability, others varied in strength centrality, and this was true both for indicators that were constant and for those that varied across conditions.

Gender invariance
Gender invariance results were stable across conditions, but we were unable to determine clear support for invariance based on AIC and BIC as recommended by van de Schoot et al. (2012). Consistent with known possible behavior of these criteria, AIC favored the model with more parameters (unconstrained), while BIC did the opposite, favoring the constrained model (Vrieze, 2012). Since we had no clear rationale to favor one over the other, we consider these results in light of other literature and indices (post-hoc). Kan, van der Maas, and Levine (2019) found the same pattern of AIC and BIC for their unconstrained and constrained networks. They concluded that the same structure was applicable to both groups, though at least one edge varied in magnitude. Our post-hoc consideration of CFI and RMSEA also suggested trivial differences, thus supporting the approximate invariance of networks between boys and girls. Substantively, a single pattern of edges fitted both girls and boys, though small differences may exist in the strength of different relationships between particular nodes. Results should be replicated considering other measures and samples, but this suggests tentative evidence that girls and boys may experience quantitative rather than qualitative differences in risk and protective factors for internalizing symptoms, when considering inter/intrapersonal correlates.
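The opposing behavior of AIC and BIC follows from their penalty terms: AIC adds 2 per parameter, whereas BIC adds ln(n) per parameter, which is roughly 9.7 at the sample size reported for this study (N = 15,843). The sketch below illustrates this with standard textbook definitions. The log-likelihoods and parameter counts are hypothetical values chosen for illustration and are not the fitted values from the study.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

n = 15843  # sample size reported in the abstract

# Hypothetical models: freeing 100 parameters improves ln L by 200.
constrained = {"log_lik": -250000.0, "k": 300}
unconstrained = {"log_lik": -249800.0, "k": 400}

aic_diff = aic(**unconstrained) - aic(**constrained)
bic_diff = bic(n=n, **unconstrained) - bic(n=n, **constrained)

# Negative difference: unconstrained model preferred; positive: constrained.
print(f"AIC difference: {aic_diff:.1f}")
print(f"BIC difference: {bic_diff:.1f}")
```

With these illustrative numbers, the gain in fit outweighs the AIC penalty but not the BIC penalty, reproducing the pattern in which AIC prefers the unconstrained model and BIC the constrained one.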

Implications
Taken together, the strong dissociated relationships at the between level and densely connected nodes at the within level suggest the apparent discriminant validity of scales may particularly capture between-person differences rather than profiles within individuals. This is consistent with the fact that measures are typically developed using between-person (i.e., cross-sectional) data, such that the covariance structure from which the model is estimated describes variation between people (Molenaar, 2004). Though many analyses assume a blend of within and between effects is modeled without explicitly attending to this, it is often the case that within and between associations are not aligned (Curran & Bauer, 2011). Future work should therefore consider further the within and between properties of measures such as those used here, since they are typically used to probe within-person effects.
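The point that within- and between-person associations need not align can be shown with a small constructed example. The sketch below uses person-mean centering, a simpler technique than the random-intercept panel model used in this study, and is offered only as an illustration of the general idea. The data are artificial and constructed so that the two levels point in opposite directions.

```python
import numpy as np

# Artificial data: 3 people x 3 waves for two indicators, x and y.
person_means = np.array([1.0, 2.0, 3.0])
x_dev = np.array([-1.0, 0.0, 1.0])   # within-person change in x
y_dev = np.array([1.0, 0.0, -1.0])   # opposite within-person change in y

x = person_means[:, None] + x_dev[None, :]
y = person_means[:, None] + y_dev[None, :]

# Between-person part: each person's mean across waves.
x_between = x.mean(axis=1)
y_between = y.mean(axis=1)

# Within-person part: deviations from each person's own mean.
x_within = (x - x_between[:, None]).ravel()
y_within = (y - y_between[:, None]).ravel()

r_between = np.corrcoef(x_between, y_between)[0, 1]
r_within = np.corrcoef(x_within, y_within)[0, 1]
print(f"between-person r = {r_between:.2f}, within-person r = {r_within:.2f}")
```

In this constructed case, people who are higher on x on average are also higher on y, yet within any one person a rise in x accompanies a fall in y, so an analysis that blends the two levels would describe neither accurately.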
Findings further suggest integrated indicator-level approaches to adolescents' mental states and perceived resources should be considered, rather than testing to diagnose specific disorders. Our analysis therefore represents an example of how clinical and research approaches can better align, as has been pointed out for network approaches more generally (Borsboom, 2017). While formulations are often preferred over strict diagnostic criteria by clinicians (Johnstone, 2018), research has tended to rely on simplistic total scores or latent variables to define groups and categories. These are powerful approaches, with many advantages such as the estimation of measurement error. Nevertheless, as indicator-level approaches gain increasing attention (Robinaugh et al., 2020), analyses such as ours can offer more detailed insights. While much more work is needed, the current study demonstrates that brief surveys deployed in large samples can be modeled in more nuanced ways. There is therefore potential to move beyond disorder-level (i.e., total-score) approaches. In addition, it may not be enough to disaggregate within and between effects at the construct level, since within-person effects likely happen across domains in a complex way (Borsboom et al., 2018). More transdiagnostic and indicator-level work is therefore needed to better understand within and between-person effects.
In addition to the substantive implications, our multiverse design revealed methodological issues. Where observed-data level networks are considered, as is typically the case (Robinaugh et al., 2020), rather than at the latent level (Epskamp, 2020b;Epskamp et al., 2017), researchers should be aware that item-level error may affect conclusions. Since many authors rely on single measures of each construct in their datasets, they will be unable to verify whether, for instance, centrality is robust to variations in items. Our out-strength results particularly demonstrate that had we chosen any one of the 32 conditions as the focus of our analysis, our conclusions could have varied substantially. While there are calls for increased use of latent networks, our approach also reveals that even in large rich datasets, there may not be enough indicators of each construct to conduct such analysis. For instance, our dataset had only one bullying item. We therefore echo the recent call for methods to be designed explicitly with network methods in mind (Robinaugh et al., 2020).

Strengths and limitations
The current study drew on a large sample, disaggregated within-person variance from stable trait-like effects, and incorporated a comprehensive multiverse design. Despite this, a number of limitations must be acknowledged. First, the panel methodology adopted did not allow us to control for stable covariates, such as socioeconomic status. While the sample was purposively drawn to target those at risk, and therefore generally consisted of more deprived adolescents, there was variation in this. The sample was therefore also not representative and results should only be generalized to similar community samples with above average levels of deprivation.
We were also unable to account for the nonnormal ordinal nature of our data since this is not yet possible in psychonetrics. Nevertheless, this is consistent with much of the network literature to date, which often treats similar Likert-type data to that used here as continuous (Robinaugh et al., 2020). In addition, the use of a polychoric matrix to account for the ordinal nature of items in skewed data, such as that used here, can lead to bias (Fried, van Borkulo, & Epskamp, 2020). It can also lead to convergence issues in samples with substantial missingness, as was the case here, suggesting FIML was more appropriate. We also had little to draw on to interpret our invariance analyses, and more work is needed to understand the properties of fit indices when comparing networks.
Quality issues have been highlighted for some SDQ items, from which internalizing and bullying indicators were drawn (Black et al., 2020a), though self-report mental health measures are typically of low quality (Bentley et al., 2019). Finally, decisions about which items were interchangeable were subjectively considered based on content, though decisions in multiverse analyses are not expected to be uniform across researchers (Simonsohn et al., 2020).

Conclusion
The current multiverse panel network model allowed consideration of complex interactions between indicators of mental health and inter/intrapersonal factors consistent with theory and clinical approaches (Bronfenbrenner, 2005; Johnstone, 2018). Stable trait-like effects within anxiety, attentional and social indicators were found that were insensitive to analytical decisions. No clear differences were observed between boys and girls. Within-person networks were densely connected and relationships between indicators often unfolded within waves, suggesting more work should consider shorter lags. Mental health and inter/intrapersonal indicators appeared to relate to one another in similar complex ways. Our multiverse design revealed that the particular operationalization of items can have substantial effects on conclusions. Nevertheless, indicators such as thinking clearly, unhappiness, dealing with stress and worry showed more consistent centrality, suggesting these indicators may play particularly important roles in the development of mental health in adolescence.
Supplementary Material. The supplementary material for this article can be found at https://doi.org/10.1017/S0954579421000225
Data Availability Statement. The HeadStart (HS) survey data on mental health and well-being belongs to the Evidence Based Practice Unit (a collaboration between UCL and the Anna Freud National Centre for Children and Families, AFNCCF), who led the HS evaluation. The authors accessed this survey data via membership in a consortium involved with the HS evaluation. As collaborators on the main HS evaluation, the authors were granted secure remote access to this data by the principal investigator of the main HS evaluation, Dr. Jessica Deighton. HS data cannot be made publicly available, since consent was not obtained from participants for the public sharing of their survey responses. However, an anonymized version of the survey dataset used in the present paper is available on request from Dr. Jessica Deighton (Jessica.DeightonPhD@annafreud.org) or Dr. Tanya Lereya (Tanya.lereya@annafreud.org) under the following terms: (a) schedule and arrange for site visit to AFNCCF to analyze data (password to user account supplied); (b) analysis to be worked on in situ; (c) results (but not data) taken away. In the event that either of these individuals leaves the AFNCCF, updated contact information for new guardians of the data will be provided to the Journal of Abnormal Child Psychology. Code is provided in the supplementary material.
Funding Statement. The data used in this study were collected as part of the HeadStart learning program and supported by funding from The National Lottery Community Fund, grant R118420. The content is solely the responsibility of the authors and it does not reflect the views of The National Lottery Community Fund.