Defining the p-factor: an empirical test of five leading theories

Background Despite statistical evidence of a general factor of psychopathology (i.e., p-factor), there is little agreement about what the p-factor represents. Researchers have proposed five theories: dispositional negative emotionality (neuroticism), impulsive responsivity to emotions (impulsivity), thought dysfunction, low cognitive functioning, and impairment. These theories have primarily been inferred from patterns of loadings of diagnoses on p-factors with different sets of diagnoses included in different studies. Researchers who have directly examined these theories of p have examined a subset of the theories in any single sample, limiting the ability to compare the size of their associations with a p-factor. Methods In a sample of adults (N = 1833, Mage = 34.20, 54.4% female, 53.3% white) who completed diagnostic assessments, self-report measures, and cognitive tests, we evaluated statistical p-factor structures across modeling approaches and compared the strength of associations among the p-factor and indicators of each of these five theories. Results We found consistent evidence of the p-factor's unidimensionality across one-factor and bifactor models. The p-factor was most strongly and similarly associated with neuroticism (r = .88), impairment (r = .88), and impulsivity (r = .87), χ2(1)s < .15, ps > .70, and less strongly associated with thought dysfunction (r = .78), χ2(1)s > 3.92, ps < .05, and cognitive functioning (r = −.25), χ2(1)s > 189.56, ps < .01. Conclusions We discuss a tripartite definition of p that involves the transaction of impulsive responses to frequent negative emotions leading to impairment that extends and synthesizes previous theories of psychopathology.


Impulsive responsivity to emotions
Alternatively, impulsive, maladaptive responses to negative emotions may define p (Carver, Johnson, & Timpano, 2017). Impulsive responses may include impulsive inaction (e.g., passive avoidance or rumination) or action (e.g., aggressive behaviors), occur without much planning, and be maladaptively overreactive in the context used. p-factors have been associated with indicators of impulsivity, such as low conscientiousness (r = −.31; Caspi et al., 2014) and poor response inhibition (rs: −.34 to −.14; Castellanos-Ryan et al., 2016;Martel et al., 2017). Impulsivity has predicted a range of behaviors characteristic of psychopathology including non-suicidal self-injury (Riley, Combs, Jordan, & Smith, 2015), posttraumatic stress disorder symptoms (Gaher et al., 2014), and substance use (Riley, Rukavina, & Smith, 2016), above and beyond negative emotionality (Settles et al., 2012), suggesting that impulsive responsivity to emotions may be related to p regardless of the frequency or intensity of negative emotions.

Low cognitive functioning
Complementing these affective (i.e., negative emotionality) and behavioral (i.e., impulsivity) theories, some researchers have argued low cognitive functioning best characterizes p. p-factors have been negatively associated with IQ (rs: −.19 to −.10; Caspi et al., 2014;Castellanos-Ryan et al., 2016), executive functioning (rs: −.24 to −.07; e.g., attention, processing speed, visual-motor coordination; Castellanos-Ryan et al., 2016;Martel et al., 2017), and positively associated with cognitive problems in everyday life (rs: .20-.30; e.g., concentration problems, forgetfulness, difficulties organizing tasks; Caspi & Moffitt, 2018). Low cognitive functioning may predispose people to develop psychopathology because (a) low cognitive functioning indicates neuroanatomical abnormalities that increase a person's risk for developing psychopathology; (b) low cognitive functioning increases the risk and exposure to stressors that increase the likelihood of developing psychopathology; or (c) low cognitive functioning impairs treatment-seeking and -engagement, resulting in an increased burden of psychopathology (Caspi & Moffitt, 2018). However, the magnitude of the relations between Caspi et al.'s (2014) p-factor and IQ scores was about half as large as those between the p-factor and negative emotionality.

Impairment
In response to concerns about whether any substantive definition could appropriately capture the range of specific dysfunctions constitutive of psychopathology (e.g., hallucinations, a lack of pleasure, talking excessively), Smith et al. (2020) conceptualize the p-factor as an index of impairment (Oltmanns et al., 2018;Widiger & Oltmanns, 2017). Conceptualizing the p-factor as impairment resolves how seemingly contrasting responses (e.g., sluggishness vs. mania) may positively load onto the same factor because high levels of each symptom can lead to functional impairment (Widiger & Oltmanns, 2017). Lahey et al.'s (2012) p-factor, for instance, was more strongly predictive of functionally impairing outcomes (e.g., suicide attempts, psychiatric hospitalization, convictions for violence) than internalizing or externalizing dimensions.

Limitations
To date, no single study has simultaneously modeled and compared the strength of the relations between a p-factor and indicators of these five theories. Thus, researchers are left to compare the strength of associations between different studies with different samples, which may lead to biased conclusions. Furthermore, Psychological Medicine studies that do include measures of multiple theories have generally included observed correlations with a single indicator of each theory (e.g., Caspi et al., 2014), which may lead to attenuated and less reliable estimates compared to associations among multiindicator latent variables (Bollen, 1989). Directly comparing these five theories in a single sample would offer the strongest comparative test of the strength of their association with a p-factor, providing an initial evaluation of discriminant validity to strengthen the empirical foundations of the theory of the general factor of psychopathology (Fried, 2020).

Current study
In a secondary data analysis of participants recruited to represent a range of psychopathology with a focus on intermittent explosive disorder, we explored two primary hypotheses. First, we examined three potential factor structures of a p-factor using diagnoses, symptom counts, and self-report measures of psychopathology to examine (a) which indicators demonstrated the highest loadings on the p-factor and (b) whether the pattern of loadings differed by the factor structure tested. Second, we added indicators of the five theories of p to each of these models to compare the strength of the associations between each indicator and the p-factor.

Participants
The sample included 1833 community participants (M age = 34.20 years, S.D. = 10.73) from the American Midwest. Roughly half of participants identified as female (54.4%; n = 997), with a similar number identifying as white (53.3%; n = 977). The median reported annual income was $35 000-70 000. A plurality of participants (42.1%; n = 772) had earned at least a college degree. Participants were recruited to be in either a clinical or non-clinical group. Inclusion criteria for the clinical group were: being 18 years old or older and meeting criteria for a DSM-5 [American Psychiatric Association (APA), 2013] current or lifetime syndromal (formerly Axis I) or personality disorder. Exclusion criteria involved the presence of a medical illness requiring chronic treatment (e.g., hypertension, heart disease, diabetes, cancer), active mania or substance use disorder, a lifetime diagnosis of a psychotic disorder, or intellectual disability. Inclusion criteria for the nonclinical group involved being 18 years old or older, the absence of a current or lifetime DSM-5 disorder, and the absence of a medical illness requiring chronic treatment. All participants provided informed consent before engaging in study procedures, and the study was approved by the local university Institutional Review Board.

Measures
Indicators of the p-factor Diagnostic assessments. Participants completed the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I; First, Spitzer, Miriam, & Williams, 2002) to assess mood, anxiety, OCD, stress and trauma-related, eating, bipolar, substance use (including alcohol, cannabis, stimulants, and opioids), intermittent explosive, impulse-control, attention-deficit/hyperactivity (ADHD), conduct, and oppositional defiant disorders. Diagnoses were recorded as present or absent.
Participants also completed the Structured Interview for the Diagnosis of Personality Disorders (SIDP-IV; Pfohl, Blum, & Zimmerman, 1997) to assess PDs. Continuous symptom severity scores (items rated 0-3) were used in all models.
Interviews were conducted by master's or doctoral level clinical psychology students who exhibited good-to-excellent inter-rater reliability (average κ = .84; range: .79-.93) across diagnoses of mood, anxiety, substance use, impulse control, and PDs. Because data were originally collected using DSM-IV-TR (APA, 2000) criteria, assessors consulted information gathered during separate clinical interviews with a study psychiatrist and conducted chart reviews to update diagnoses to DSM-5 criteria. Final diagnoses were determined using best-estimate consensus procedures involving research psychiatrists and clinical psychologists (Kosten & Rounsaville, 1992).
Depression: Participants reported on the intensity of depressive symptoms in the prior 2 weeks using the Beck Depression Inventory-II (21 items rated 0-3; Beck, Steer, & Brown, 1996).
Anxiety: Participants reported on the intensity of anxiety symptoms in the prior month using the Beck Anxiety Inventory (21 items rated 0-3; Beck & Steer, 1990).
Anger: Participants reported on the intensity and frequency of anger experiences in general using the State-Trait Anger Expression Inventory (10 items rated 0-4; Spielberger, 1999).
Attention-deficit/hyperactivity: Participants reported on the current frequency of difficulties with attention and/or hyperactivity using a modified version of the Wender Utah Rating Scale (25 items rated 0-4; Ward, Wender, & Reimherr, 1993).
Psychosis: Participants reported the extent to which they had experienced ideas of reference and ideas of persecution in the past month using the Green et al. Paranoid Thoughts Scale (32 items rated 1-5; Green et al., 2008).
Trauma: Participants rated how frequently they had experienced physical, sexual, and emotional abuse and neglect as children and adolescents using the Childhood Trauma Questionnaire-Short Form (28 items rated 1-5; Bernstein et al., 2003).
Cognitive functioning: Assessors administered the Wechsler Abbreviated Scale of Intelligence-II (WASI-II; Wechsler, 2011), a brief screen of intelligence consisting of the Vocabulary, Similarities, Block Design, and Matrix Reasoning tests from the Wechsler Adult Intelligence Scale. Responses result in Verbal and Performance IQ scores.
Thought dysfunction: Participants reported the degree of general thought dysfunction using the Eysenck Personality Questionnaire-Psychoticism scale (12 items rated 0-2; Eysenck & Eysenck, 1991). Impairment: Assessors who administered the diagnostic assessments documented global assessment of functioning (GAF) scores for each participant (one item rated 0-100; APA, 2013). Lower scores indicate greater impairments in functioning.

Data analytic method
Little's Missing Completely At Random (MCAR) test suggested the data were not missing completely at random, χ 2 (4433) = 7034.73, p < .01. Given the lack of systematic bias in the administration and completion of measures and the small correlations among observed variables and patterns of missingness (rs: .05-.20), the data may be considered missing at random (MAR). Thus, we created 100 multiply imputed datasets after 40 000 iterations using Bayesian estimation in Mplus Version 7.0 (Muthén & Muthén, 1998 which asymptotically produces the same estimates as maximum likelihood estimation under MAR. We examined descriptive statistics of the frequency of diagnoses and distributions of continuous variables. Because models of the p-factor have included different combinations of disorders and measures, we first examined the fit of the p-factor using confirmatory factor analysis with weighted least square mean and variance adjusted estimation to account for the binary diagnostic data. We examined three solutions to test the stability and generalizability of these models based on previous specifications of the p-factor: a one-factor solution and two bifactor solutions.* 1 In the first bifactor solution, we allowed all items to load onto a higher-order p-factor and one of three lowerorder factors representing internalizing, externalizing, or thought disorders, which were restricted to be orthogonal to each other and the p-factor. In the second bifactor solution, the three lowerorder factors were allowed to intercorrelate. Given the current literature on the hierarchical structure of psychopathology (HiTOP; Forbes, 2021;Kotov et al., 2017), we only allowed the somatoform disorder indicator to load onto the p-factor and allowed the borderline personality disorder (BPD) indicator to load onto both internalizing and externalizing factors.
Given concerns about the ability of fit indices to accurately distinguish factor analytic models (Bonifay & Cai, 2017;Greene et al., 2019;Stanton et al., 2021), we followed Forbes et al.'s (2021a) recommendations to supplement standard model fit indices [root-mean-square error of approximation (RMSEA; acceptable fit ⩽.10; good fit ⩽ .06; Hu & Bentler, 1999), comparative fit index (CFI) and Tucker-Lewis index (TLI; acceptable fit ⩾.90; excellent fit ⩾.95; Hu & Bentler, 1999), weighted root-mean-square residual (WRMR; good fit <1.00; DiStefano, Lui, Jiang, & Shi, 2017)] with statistics to evaluate the unidimensionality of these models. Because residual variances are not identified with binary indicators, we estimated the reliability of the p-factor and lower-order factors with omega hierarchical (ω h ; McDonald, 1999;Zinbarg, Revelle, Yovel, & Li, 2005) using the available continuous symptom count and self-report measures. We use ω h * to denote omega hierarchical for the p-factor and ω h,specific * to denote omega hierarchical for the lower-order factors to indicate these ω h 's do not include all variables in the model. ω h * > .75 indicates sufficient reliability (Reise, Bonifay, & Haviland, 2013a). We calculated explained common variance (ECV; Reise, Scheines, Widaman, & Haviland, 2013b) to assess the proportion of variance across all indicators explained by the p-factor relative to the specific factors (ECV > .85 indicates likely unidimensionality; Stucky & Edelen, 2014). We also calculated ECV_S (Forbes et al. 2021a) for each specific factor to estimate the proportion of variance these factors explained. We calculated the percentage of uncontaminated correlations (PUC; Reise et al., 2013b), representing the proportion of correlations that only reflect variance from the p-factor (PUC > .70 indicates likely unidimensionality). Finally, we calculated the average parameter bias (APB), representing the difference between item loadings in the one-factor model and the bifactor model (10-15% is deemed acceptable; Muthén, Kaplan, & Hollis, 1987) to assess the similarity of loadings between models.
We then added factors representing the five theories of the p-factor to these models. When multiple observed indicators were available (i.e., neuroticism, impulsivity, cognitive functioning), we allowed them to load onto a latent factor to represent the construct. When only single indicators were available (i.e., thought dysfunction, impairment), we created singleindicator latent variables by fixing the residual variance of the indicator to 0. We modeled the covariances between the p-factor and factors representing the five theories to test the convergent and discriminant validity of the p-factor, while simultaneously modeling the covariances among indicators of the five theories to account for their intercorrelations. We tested for differences in the strength of the absolute value of the standardized associations between the p-factor and indicators of each of the five theories using Wald tests. We examined fully standardized results in all models to enhance interpretability. All code is available at https://doi.org/10.17605/osf.io/hs8cp. Because participants did not consent to the open sharing of their data, we provide the raw correlation matrix (Table S1, Online Supplemental Materials), with raw data and measures available upon reasonable request.

Testing five theories of p
When adding indicators of and factors representing the five theories of p to each of the three models of the p-factor above, no model demonstrated good fit across indices. We re-fit the models based on theory and modification indices, most notably removing UPPS-Sensation Seeking because it exhibited standardized loadings >1. The bifactor model with correlated lower-order factors was the best-fitting model with acceptable fit by RMSEA, χ 2 (973) = 6253.55, RMSEA = .054, CFI = .848, TLI = .832, WRMR = 2.390, and largely similar loadings on (Δλs: .01-.11) and associations with (Δrs: .01-.06) the p-factor (Tables S4a-S5).

Discussion
In this study, we compared the structure of the p-factor among three candidate models in a sample with a range of psychopathology and compared the strength of the associations between the p-factor and indicators of five leading theories of p to test the convergent and discriminant validity of these theories. Across indices, the p-factor was reliable and unidimensional, with similar patterns of factor loadings regardless of model specifications. The p-factor was nearly identical to factors representing neuroticism, impulsivity, and impairment, strongly associated with thought dysfunction, and relatively weakly associated with cognitive functioning. Using Forbes et al.'s (2021a) recommendations, we replicated their findings regarding the high reliability and unidimensionality of the p-factor in an independent sample with unique indicators. The consistency of these results across models and samples provides stronger evidence for the existence of a p-factor than relying solely on model fit indices (Stanton et al., 2021). The highest loading items on the p-factor were BPD, paranoid PD, and mania in line with meta-analytic (Ringwald et al., 2021) and longitudinal (Caspi et al., 2014) research.
However, each diagnostic indicator contains multiple symptoms, so using them to infer the definition of the p-factor is less direct and may exhibit more sample-to-sample variability (Levin-Aspenson et al., 2020) than empirically testing the relations between the p-factor and specific theories. The p-factor was nearly identical to indicators of neuroticism, impulsivity, and impairment. The p-factor was also strongly associated with thought dysfunction, but less strongly related to cognitive functioning. These results suggest a model of p that extends Barlow et al.'s (2014) model of emotional disorders and synthesizes it with Smith et al.'s (2020) nonspecific impairment interpretation of the p-factor. In Barlow et al.'s (2014) model, emotional disorders (e.g., mood disorders, anxiety and related disorders, BPD) are characterized by the transaction between frequent, intense experiences of negative emotions (i.e., neuroticism) and aversive, impulsive reactions to reduce the short-term intensity of those emotions (i.e., impulsivity). However, positive urgency, or the tendency to act rashly in response to positive emotions, demonstrated one of the highest loadings on the impulsivity factor. This suggests the possibility of extending Barlow et al.'s (2014)  impulsive responses to positive emotions that lead to maladaptive consequences, which could in turn prompt frequent negative emotions (i.e., neuroticism). Smith et al.'s (2020) interpretation would add that the transaction between impulsivity and negative and/or positive emotions is necessary but not sufficient to define general psychopathology because it must lead to some level of impairment. This tripartite definition of the p-factor can address Smith et al.'s (2020) challenge that an appropriate definition of the p-factor should explain variance in all items loading on the p-factor while providing a more falsifiable theory of p (Watts, Lane, Bonifay, Steinely, & Meyer, 2020). For instance, hallucinations may only indicate psychopathology if they are hostile or otherwise prompt negative emotions and impulsive attempts to stop them. Strong positive emotions in mania may prompt impulsive and impairing behaviors that may, in turn, lead to negative interpersonal consequences or other dysfunction and thus prompt negative emotions. Restrictive eating behaviors may be an avoidant response to strong negative emotions that can have impairing consequences for a person, despite promoting a temporary feeling of control.
Although we have focused on the relations between the p-factor and neuroticism, impulsivity, and impairment, the p-factor also demonstrated strong associations with thought dysfunction that varied by the measure of thought dysfunction used. These results, combined with the high loading of paranoid PD symptoms and mania on p, suggests the need for further study of the role of thought dysfunction in p. In particular, excluding participants with active psychosis and active mania may have restricted the range of thought disorder and thought dysfunction, attenuating the strength of their relations with the p-factor. Alternatively, our measures of thought dysfunction may not capture the breadth of Caspi and Moffitt's (2018) definition. We encourage future researchers to include explicit measures of these thought processes in studies of the p-factor to more specifically test this theory.
Finally, the relatively small relation between the p-factor and cognitive functioning suggests this theory may be less tenable than the others. This finding is in line with Caspi et al. (2014) in which the relation between childhood IQ and the p-factor was less than half as large as the relation between the p-factor and neuroticism. Cognitive functioning may exert a more distal, developmental effect on the p-factor, rather than reflecting psychopathology per se (Caspi et al., 2014;Caspi & Moffitt, 2018). Alternatively, method effects may reduce the strength of this association, given that cognitive functioning was measured by a behavioral task whereas indicators of the p-factor came from interview assessments and self-reports.
The results of this study should be considered in light of its limitations. Our sample was relatively small compared to other studies of the p-factor (e.g., Forbes et al. 2021a), and the exclusion of people with very low cognitive functioning may have reduced the generalizability of our results and attenuated the strength of the relations between the p-factor and cognitive functioning. The cross-sectional, between-person nature of the design restricts
our ability to draw causal conclusions (Fried, 2020). Although GAF scores characterize functional impairment and are easily implemented in clinical practice, they are only one indicator of impairment that can have low reliability in practice (Vatnaland, Vatnaland, Friis, & Opjordsmoen, 2007). Neither full model of the p-factor with indicators of its theories demonstrated good fit across indices, despite the two bifactor models of the p-factor alone demonstrating good fit. The high correlations among indicators may suggest this misfit is a result of parsing similar constructs into too many categories (Watts, Boness, Loeffelman, Steinley, & Sher, 2021). Although model fit indices alone may not adequately distinguish models from each other (Greene et al., 2019), relatively low model fit suggests our results should be interpreted cautiously until replicated. Conceptually, the constructs represented by the theories of p may be considered embedded in the diagnostic indicators of the p-factor, either through item content or diagnostic criteria, producing circular results. We do not dispute that these constructs are embedded in the diagnostic indicators. However, we note that diagnostic indicators include heterogeneous criteria that vary in how closely they align with the theories of p (e.g., most syndromal disorders include an impairment criterion but PDs do not). Rather than indirectly inferring the association between the p-factor and theories of p based on how heterogeneous diagnostic indicators load onto the p-factor, we believe that directly modeling these associations provides a stronger and more straightforward test of these theories, which can contribute to the formalization of theories of p by allowing for direct comparisons among these associations. Similarly, the high correlations among indicators may also result from item content overlap. When possible, we excluded scales with direct item overlap (e.g., the NEO-FFI scales include no impulsiveness items). Furthermore, previous researchers have found similarly sized associations between personality and psychopathology factors with and without overlapping items (Walton, Pantoja, & McDermot, 2017). Finally, the tripartite model of p we discuss risks defining p in terms of the primary characteristics of the lower-order factors. We echo calls from Fried (2020) and others to test these theories in longitudinal studies to examine whether these theories contribute to the development of the p-factor (e.g., Williams, Craske, Mineka, & Zinbarg, 2021).
Despite these limitations, we found evidence of a unidimensional p-factor in a relatively diverse sample of adults with a range of measures of psychopathology. BPD symptoms, paranoid PD symptoms, and mania loaded most strongly on the p-factor regardless of the specific model used. The p-factor was most strongly related to neuroticism, impulsivity, and impairment, followed by thought dysfunction, and, to a much lesser degree, low cognitive functioning. We suggest a tripartite definition of the p-factor that incorporates transactions between neuroticism and impulsivity leading to impairment, and we encourage future research to test these theories longitudinally.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722001635.