Cross-cultural validation of the revised Green et al., paranoid thoughts scale

Background: With efforts increasing worldwide to understand and treat paranoia, there is a pressing need for cross-culturally valid assessments of paranoid beliefs. The recently developed Revised Green et al., Paranoid Thoughts Scale (R-GPTS) constitutes an easy-to-administer self-report assessment of mild ideas of reference and more severe persecutory thoughts. Moreover, it comes with clinical cut-offs for increased usability in research and clinical practice. With multiple translations of the R-GPTS already available and in use, a formal test of its measurement invariance is now needed.
Methods: Using data from a multinational cross-sectional online survey in the UK, USA, Australia, Germany, and Hong Kong (N = 2510), we performed confirmatory factor analyses on the R-GPTS and tested for measurement invariance across sites.
Results: We found sufficient fit for the two-factor structure (ideas of reference, persecutory thoughts) of the R-GPTS across cultures. Measurement invariance was found for the persecutory thoughts subscale, indicating that it measures the same construct in the same way across the tested samples. For ideas of reference, we found no scalar invariance, which was traced back to (mostly higher) item intercepts in the Hong Kong sample.
Conclusion: We found sufficient invariance for the persecutory thoughts scale, which is of substantial practical importance, as it is used for the screening of clinical paranoia. A direct comparison of the ideas of reference sum-scores between cultures, however, may lead to an over-estimation of these milder forms of paranoia in some (non-western) cultures.


Introduction
Research on the aetiology and treatment of paranoia has grown exponentially over the last few decades, resulting in the necessity to develop reliable and valid self-report instruments to quantify current levels of, and longitudinal changes in, paranoia. Consequently, several self-report questionnaires of paranoid beliefs have been developed for use in clinical and non-clinical populations, including for example the Paranoia Scale (Fenigstein & Vanable, 1992), Peters' Delusions Inventory (Peters, Joseph, & Garety, 1999), the Personal Experiences of Paranoia Scale (Ellett, Lopes, & Chadwick, 2003), the Paranoia Checklist (Freeman et al., 2005), and the Green Paranoid Thoughts Scale (Green et al., 2008). These questionnaires have refined the assessment of paranoid beliefs and have provided a sound base for aetiological, epidemiological, and intervention research.
Among these questionnaires, the Revised Green et al., Paranoid Thoughts Scale (R-GPTS; Freeman et al., 2021) stands out for multiple reasons. First, building on prior theoretical (Freeman et al., 2005) and empirical evidence (Moritz, Van Quaquebeke, & Lincoln, 2012) showing that paranoid thoughts are multifaceted, it includes putatively more common social evaluative concerns and ideas of reference and more severe persecutory thoughts as separate subscales. Second, its validation is based on a particularly large sample (i.e. 8386 non-clinical individuals and 2165 patients with psychosis; Freeman et al., 2021). Third, a recent systematic review (Statham, Emerson, & Rowse, 2019) identified the Green et al., Paranoid Thoughts Scale as the most valid and accurate questionnaire of paranoid thoughts in general population and clinical samples because it covers the full range of mild to severe paranoid beliefs, has the most clearly defined construct underlying its items, and shows the comparatively best psychometric properties. Fourth, its validation included determining latent construct ranges that can be meaningfully interpreted as classes ranging from 'average' to 'very severe' levels of ideas of reference/persecutory thoughts. Finally, some of these class cut-offs correspond to validated cut-offs for clinical levels of paranoia (persecutory thoughts sum-score of 11) and likely persecutory delusions (persecutory thoughts sum-score of 18; Freeman et al., 2021). Specifically, the cut-off between mildly elevated and moderate levels of paranoia (i.e. 'clinical levels of paranoia', sum-score ⩾ 11) corresponds to the optimal point identified from receiver-operator curves to differentiate between patients with clinical levels of persecutory delusions and non-clinical participants (sensitivity = 0.93, specificity = 0.85). The cut-off between moderate and severe levels of paranoia (i.e. 'likely persecutory delusions', sum-score ⩾ 18) in turn corresponds to the cut-off optimized for minimal false positives (specificity = 0.93) while also maintaining sufficient levels of sensitivity (0.81) to detect patients with clinical persecutory delusions (for more details, see Freeman et al., 2021).
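The two persecutory thoughts cut-offs described above can be sketched as a small lookup. This is a minimal illustration, not code from Freeman et al. (2021): the function name and the label for scores below 11 are hypothetical, and the finer classes from 'average' to 'very severe' are not reproduced because their boundaries are not given here.

```python
def classify_persecutory_score(sum_score: int) -> str:
    """Map an R-GPTS persecutory thoughts sum-score (0-40) to a cut-off band.

    Only the two cut-offs reported in the text are used (>= 11 and >= 18).
    The label for scores below 11 is a placeholder chosen for this sketch.
    """
    if not 0 <= sum_score <= 40:
        raise ValueError("persecutory thoughts sum-score must lie in 0..40")
    if sum_score >= 18:
        return "likely persecutory delusions"
    if sum_score >= 11:
        return "clinical levels of paranoia"
    return "below clinical cut-off"
```

Checking the higher cut-off first keeps the bands mutually exclusive, so a score of 18 or more is never reported as merely 'clinical levels of paranoia'.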
Despite being comparatively novel, the R-GPTS has already been used in numerous studies including interventions to reduce paranoia (Brown, Waite, Rovira, Nickless, & Freeman, 2020; Freeman et al., 2022), epidemiological studies (Rek et al., 2022), and experimental studies (see Ellett et al., 2023b for a review) to explore the causal mechanisms of paranoia (Barnby, Mehta, & Moutoussis, 2022) in various countries. Furthermore, the R-GPTS has also already been translated into various languages, including French (Latteur, Larøi, & Bortolon, 2022), Polish (Kowalski, Marchlewska, Molenda, Górska, & Gawęda, 2020), as well as German and Chinese (Kingston et al., 2023a). With this multi-cultural, multi-lingual implementation of the R-GPTS comes the implicit assumption that its psychometric properties (including its UK-based cut-offs; Freeman et al., 2021) can be readily used in different contexts. To date, however, there have been no formal tests of the measurement invariance (i.e. equivalence of the assessed construct) of the R-GPTS across cultures and translated versions. Thus, we have yet to determine whether paranoia as assessed and quantified by the R-GPTS has the same meaning to people from different countries or whether language differences or differences in people's reaction to the item content preclude a direct comparison of scores across cultural groups. Commonly, three core components of measurement invariance are needed for a comparison of means of the latent constructs measured by a questionnaire (Putnick & Bornstein, 2016). First, configural invariance needs to be tested. In the case of the R-GPTS, configural invariance is the equivalence of the underlying model of paranoia consisting of the two factors ideas of reference and persecutory thoughts. If a similar pattern of items loading on the respective factors can be established, it is then tested whether the item loadings are equal across groups, i.e., whether the difference between two response options for any item is indicative of the same difference in the latent construct of ideas of reference/persecutory thoughts across cultural groups (metric invariance). Finally, scalar invariance is tested. Scalar invariance means that not only the loadings of items, but also their intercepts are equal across the groups, meaning that the mean differences in the latent constructs of paranoia capture all mean differences in the variance the items share with their factor. In other words, no item introduces a systematic over- or underestimation of the latent construct in any of the cultural groups. Only if such a level of measurement invariance can be established is it possible to directly compare means between cultural groups or to establish the validity of cut-offs across cultures.
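The three levels of invariance described above can be written compactly in the usual linear factor-model notation. The symbols are introduced here for illustration only and are not taken from the article: x is the response of person i in group g to item j, τ the item intercept, λ the item loading, η the latent factor score, and ε the residual.

```latex
\begin{align*}
x_{ijg} &= \tau_{jg} + \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg}
  && \text{(configural: same item-factor pattern in every group)} \\
\lambda_{jg} &= \lambda_{j} \quad \text{for all } g
  && \text{(metric: equal loadings)} \\
\lambda_{jg} &= \lambda_{j}, \quad \tau_{jg} = \tau_{j} \quad \text{for all } g
  && \text{(scalar: equal loadings and intercepts)} \\
\mathbb{E}[x_{ijg}] &= \tau_{j} + \lambda_{j}\,\mathbb{E}[\eta_{ig}]
  && \text{(under scalar invariance)}
\end{align*}
```

Under scalar invariance, a difference in expected item scores between two groups can only come from a difference in latent means. If an intercept differs between groups, the same latent mean produces different expected item scores, which is the over- or underestimation described above.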
To address this gap in the literature, in this study, we examine the measurement invariance of the R-GPTS. Using data from an existing multinational cross-sectional survey with data from the UK, USA, Australia, Germany, and Hong Kong, we tested the R-GPTS for (1) configural invariance regarding its two-factor structure, followed by (2) metric invariance and scalar invariance, which would be prerequisites for meaningfully comparing sum-scores across countries. We hypothesized that the R-GPTS would show configural, metric, and scalar invariance.
However, if no invariance for the full R-GPTS was found, we aimed to explore (3) whether invariance can be established for the ideas of reference or persecutory thoughts subscale alone, and (4) whether invariance can be found within different countries and/or language versions of the R-GPTS.

Design and procedure
This study uses data from a cross-sectional online survey. Data were collected between February and March 2021 across the UK, USA, Australia, Germany, and Hong Kong on the topic of vaccine hesitancy and pandemic-specific paranoia as well as general suspiciousness (Kingston et al., 2023a, 2023b; Lincoln et al., 2022; So et al., 2022). Participants completed a questionnaire battery that included the R-GPTS, pandemic-specific paranoid beliefs (pandemic paranoia scale; Kingston et al., 2023a; Ellett et al., 2023a), indicators of mental health in general (e.g. depression, anxiety, stress, worrying) as well as various resilience and risk factors (e.g. trauma experiences, core beliefs about oneself and others). In the current study, only the R-GPTS was analyzed.
Ethical approval was obtained separately from local ethics committees at each of the host sites. Potential participants were contacted by Qualtrics to take part. Consenting participants completed the questionnaires online via Qualtrics and were reimbursed via Qualtrics sampling services. Participants were required to respond to all questions on each page before progressing through the survey. Data accuracy was optimized by (1) including five attention check questions across the survey, all of which participants had to answer correctly to be included. Moreover, (2) participants completing the survey in less than half of the median completion time and (3) participants with a geographical location outside the corresponding site location were excluded. Participants who did not fulfill the data accuracy requirements, did not give their informed consent to their data being used, or dropped out without completing the full questionnaire battery were excluded at source by Qualtrics (excluded participants: n = 3555).

Participants
Participants were recruited via Qualtrics using stratified quota sampling. Each sample was stratified to be representative of the respective general population in terms of sex, age, and level of education. Sample size was determined based on the minimum sample size to validate the newly constructed pandemic paranoia scale (Kingston et al., 2023a). A total of 2510 participants (UK n = 512, USA n = 535, Australia n = 502, Germany n = 516, and Hong Kong n = 445) met quota and quality assurance conditions and were included in the final sample. Sample characteristics are summarized in Table 1.

R-GPTS
The Revised Green et al. Paranoid Thoughts Scale (Freeman et al., 2021) is an 18-item measure that comprises two subscales: ideas of reference (8 items, e.g. 'I have been upset by friends and colleagues judging me critically') and paranoia/persecutory thoughts (10 items, e.g. 'I was sure someone wanted to hurt me'). Items are rated on a 5-point scale from 0 (not at all) to 4 (totally). Sum-scores for ideas of reference (range: 0-32) and persecutory thoughts (range: 0-40) are usually calculated and provide the base for categorization from 'average' to 'very severe'. For the persecutory thoughts subscale, a sum-score of 18 or more is indicative of persecutory delusions. In the current sample, Cronbach's alpha was excellent for both subscales (ideas of reference: α = 0.94; persecutory thoughts: α = 0.96).
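The scoring rules and the reliability coefficient mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the scoring code used in the study: the function names are hypothetical, and the sketch assumes that the first 8 responses belong to the ideas of reference subscale and the remaining 10 to the persecutory thoughts subscale. Cronbach's alpha is computed with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum-score).

```python
from statistics import variance

N_REFERENCE_ITEMS = 8     # assumption: responses 1-8 are ideas of reference
N_PERSECUTORY_ITEMS = 10  # assumption: responses 9-18 are persecutory thoughts


def score_rgpts(responses):
    """Return (ideas_of_reference_sum, persecutory_thoughts_sum) for 18 item ratings."""
    if len(responses) != N_REFERENCE_ITEMS + N_PERSECUTORY_ITEMS:
        raise ValueError("expected 18 item responses")
    if any(r not in range(5) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    reference = sum(responses[:N_REFERENCE_ITEMS])     # range 0-32
    persecutory = sum(responses[N_REFERENCE_ITEMS:])   # range 0-40
    return reference, persecutory


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item ratings."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha would be computed separately per subscale, for example by passing only the eight ideas of reference columns to `cronbach_alpha`.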
For the UK, USA, and Australia, we used the original English version of the R-GPTS. For the German site, we used an existing translation of the GPTS as a starting point (Watzke & Schwenke, 2014) for the translation of the R-GPTS. Translated and original items were compared for any discrepancies and changes to the wording of the German version were added based on consensus between TML and a graduate-level psychologist from the German site. Reliability of the German R-GPTS has been tested in two concurrent, independent studies that are as yet unpublished (see Schönig, Krkovic, & Lincoln, 2022a, 2022b for corresponding pre-registrations). Preliminary results from these studies (sample sizes n = 50 and n = 31, respectively) showed good reliability for ideas of reference (0.83 ⩽ α ⩽ 0.89), persecutory thoughts (0.87 ⩽ α ⩽ 0.91), and the full scale (0.93 ⩽ α ⩽ 0.93) as well as high correlations with the PSYRATS delusion subscale (n = 31, 0.55 ⩽ r ⩽ 0.61). For the Hong Kong site, we translated the R-GPTS to Traditional Chinese (i.e. the writing system that is used in both HK and Taiwan): First, the English scale was translated into Chinese, followed by a back-translation of the Chinese version into English, and a check for consistency between the back-translated version and the original English version. After a discussion of discrepancies, fine-tuning of the wording for the translated version resulted in the final Chinese version of the R-GPTS. The translation and back-translation were conducted independently by bilingual, graduate-level psychologists. Comparison of the original v. back-translated version and decisions for final changes were based on consensus between the two translating psychologists, with SHS acting as consultant if needed. The Traditional Chinese version of the R-GPTS has been used in another study (Chau et al., 2022), where it showed an excellent Cronbach's alpha (α = 0.95). The German and Traditional Chinese versions of the R-GPTS have been added as an online Supplement to this article.

Strategy for data analyses
Analyses were conducted using R 4.2.2 and the R-package lavaan (Rosseel, 2012). For all analyses, we calculated confirmatory factor analyses with the Satorra-Bentler scaled test statistic. We tested measurement invariance in the following three steps: (1) configural invariance, (2a) invariance of loadings (i.e. metric invariance), and (2b) invariance of loadings and intercepts (i.e. scalar invariance). Configural invariance was determined by the three indicators CFI (good fit CFI > 0.95, sufficient fit CFI > 0.90), RMSEA (good fit RMSEA < 0.06, sufficient fit RMSEA < 0.08), and SRMR (SRMR < 0.08; for information on thresholds for all indicators, see Chen, 2007). Metric invariance was assessed on the basis of differences in fit indices in comparison to the configural invariance model, using the cut-offs ΔCFI > −0.010, ΔRMSEA < 0.015, and ΔSRMR < 0.030 as an indication of metric invariance (Meade, Johnson, & Braddy, 2008). For the assessment of scalar invariance, differences in fit indices in comparison to the metric invariance model were used, with the cut-offs ΔCFI > −0.010, ΔRMSEA < 0.015, and ΔSRMR < 0.015 indicating scalar invariance (Meade et al., 2008). First, we tested invariance for the full 18-item R-GPTS with the full sample. Second, if indicators of non-invariance were found, we followed up with secondary analyses of invariance of the R-GPTS ideas of reference and R-GPTS persecutory thoughts subscales to determine the source of non-invariance in relation to specific subscales.
Finally, (3) we repeated the invariance analyses in language- and culture-specific subsamples. We performed a measurement invariance analysis across the English-speaking countries (UK, USA, Australia) to explore whether the R-GPTS shows invariance across countries with the same (English) version of the scale. Next, we followed up by adding either the German sample or the Hong Kong sample in two additional invariance analyses to explore if and how invariance differs across different language/culture dyads.
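The decision rules stated above can be expressed as a short sketch. The analyses themselves were run with lavaan in R; the Python below is only an illustration of the thresholds, and the class and function names are hypothetical. Each Δ is assumed to be the fit index of the more constrained model minus that of the less constrained model, so a drop in CFI is negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    """Fit indices of one multi-group CFA model."""
    cfi: float
    rmsea: float
    srmr: float


def configural_ok(fit: Fit) -> bool:
    # 'Sufficient fit' thresholds from the text: CFI > 0.90, RMSEA < 0.08, SRMR < 0.08.
    return fit.cfi > 0.90 and fit.rmsea < 0.08 and fit.srmr < 0.08


def step_ok(less_constrained: Fit, more_constrained: Fit, max_delta_srmr: float) -> bool:
    # Deltas are computed as (more constrained) minus (less constrained).
    d_cfi = more_constrained.cfi - less_constrained.cfi
    d_rmsea = more_constrained.rmsea - less_constrained.rmsea
    d_srmr = more_constrained.srmr - less_constrained.srmr
    return d_cfi > -0.010 and d_rmsea < 0.015 and d_srmr < max_delta_srmr


def highest_invariance_level(configural: Fit, metric: Fit, scalar: Fit) -> str:
    """Return the highest level supported: 'none', 'configural', 'metric' or 'scalar'."""
    if not configural_ok(configural):
        return "none"
    if not step_ok(configural, metric, max_delta_srmr=0.030):
        return "configural"
    if not step_ok(metric, scalar, max_delta_srmr=0.015):
        return "metric"
    return "scalar"
```

The steps are nested on purpose: scalar invariance is only evaluated once metric invariance holds, mirroring the sequence (1), (2a), (2b) in the text.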

Language subsample invariance analyses
When analyzing the subsamples from the English-speaking countries (UK, USA, and Australia), invariance analysis yielded sufficient fit for configural, metric, and scalar invariance (see Table 3). These results were similar when the two subscales (ideas of reference and persecutory thoughts) were analyzed separately (see online Supplement). When analyzing the English and German subsamples without the Hong Kong sample, the full-scale analyses yielded configural, metric, and scalar invariance (see Table 3). When analyzing the subscales separately, we found invariance for the persecutory thoughts subscale, whereas scalar invariance was not found for the ideas of reference subscale (χ²(122) = 658.83, χ²(122) scaled = 375.12, CFI = 0.967, ΔCFI = 0.013, RMSEA = 0.063, ΔRMSEA = 0.008, SRMR = 0.056, ΔSRMR = 0.006; see online Supplement for more details).
To better understand the nature of the non-invariance, we repeated the factor analyses while freeing one item intercept at a time and comparing the pooled estimate for the item intercept with the freely estimated intercept for the Hong Kong sample. The resulting comparisons yielded substantially higher intercept estimates in the Hong Kong v. pooled sample for items 7 ('I believed that certain people were not what they seemed', Δintercept = 0.57) and 3 ('I have been upset by friends and colleagues judging me critically', Δintercept = 0.31) and a lower estimate for item 5 ('I have been thinking a lot about people avoiding me', Δintercept = −0.28).
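As a rough worked example of what these intercept differences imply, the sketch below adds up the three reported values. It rests on assumptions that are not tested here: loadings are equal across samples (metric invariance), the intercepts are expressed on the raw 0-4 item scale, and the remaining five ideas of reference items have no intercept difference. The names are hypothetical.

```python
# Reported intercept differences (Hong Kong minus pooled sample), keyed by item number.
INTERCEPT_SHIFTS = {7: 0.57, 3: 0.31, 5: -0.28}


def expected_sum_score_shift(shifts):
    """Expected difference in the subscale sum-score at an identical latent value.

    With equal loadings, each item's expected score differs by its intercept
    difference, so the sum-score differs by the sum of those differences.
    """
    return sum(shifts.values())


net_shift = expected_sum_score_shift(INTERCEPT_SHIFTS)
print(f"Net expected shift in the ideas of reference sum-score: {net_shift:.2f}")
```

Under these assumptions the net shift is about 0.60 points on the 0-32 sum-score, with the two positive shifts outweighing the negative one.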

Discussion
In this study, we explored the measurement invariance of the Revised Green et al. Paranoid Thoughts Scale (R-GPTS). We found configural invariance, supporting the notion that paranoia can be differentiated into milder, more common symptoms that encompass ideas of reference and other forms of social evaluative concerns on the one hand, and more severe persecutory thoughts on the other hand (Freeman et al., 2021). These findings are consistent with continuum models (Strauss, 1969; Verdoux & van Os, 2002) and provide further support for a multifaceted model of paranoid beliefs (Green et al., 2008; Moritz et al., 2012). We add to this by showing empirically that a two-factor model of paranoid beliefs comprising social evaluative beliefs and ideas of reference on the one hand and persecutory beliefs on the other hand appears to be applicable across cultures.
Regarding the more severe form of paranoid beliefs, persecutory thoughts, we further found substantial evidence for scalar invariance for the corresponding subscale of the R-GPTS. This is no small feat, since prior tests of paranoia measures across cultures showed no scalar invariance for paranoia (sub)scales (Heuvelman, Nazroo, & Rai, 2018; Jaya et al., 2022), which Jaya et al. (2022) highlight as an indication of the 'general difficulty of constructing an invariant measure of paranoia'. Scalar invariance for the persecutory thoughts subscale in particular is of substantial practical importance, as persecutory thoughts scores have previously been established as the basis for cut-offs in screening for clinical levels of paranoid delusions (Freeman et al., 2021). With evidence for scalar invariance for this subscale, future studies can now test for differences in mean latent construct values across languages/cultures in order to formally verify the validity of the existing cut-off (sum-score ⩾ 18) cross-culturally and in populations other than the UK. Assuming that the same level of (the latent construct of) persecutory beliefs is indicative of the same risk for a psychotic disorder across cultures, researchers and practitioners may provisionally work with the established UK cut-off. However, it needs noting that this equivalence has not been directly tested. Rather, our initial results regarding the invariance of the R-GPTS persecutory thoughts subscale are a promising first step toward cross-culturally valid cut-offs for this self-report questionnaire. What is needed to complete validation of the cut-offs is a comparison of mean latent scores between samples that are optimized for the testing of cut-offs (i.e. samples stratified for mental health status).
Regarding the less severe form of paranoid beliefs, ideas of reference, we found metric but not scalar invariance. In other words, agreeing more with the R-GPTS ideas of reference items corresponds with equal gradual increases of the latent construct of ideas of reference across cultures. However, the intercept of these items varies across cultures, meaning that individual items can add an over- or underestimation of the latent construct of ideas of reference. In practice, this means that comparison of the latent mean scores of the R-GPTS ideas of reference factor between samples from different cultures is at risk of biased results. Interestingly, this scalar non-invariance was limited to the comparison of the three English-speaking samples (UK, USA, and Australia) and the Hong Kong sample. When only the English-speaking or the English-speaking and German-speaking samples were used for invariance analyses, we found no indication of non-invariance for ideas of reference. Possibly, cultural differences in trust-formation in Western v. Eastern societies (Yuki, Maddux, Brewer, & Takemura, 2005) that may be correlated with the degree of collectivism of the respective culture (Westjohn, Magnusson, Franke, & Peng, 2022) influence what constitutes the norm/absence of elevated social-evaluative concerns, though this would need to be established in future research. In line with this, Jaya et al. (2022) already pointed out that the assessment of paranoia in self-report is intertwined with objective levels of (social) threat. These levels may greatly differ as a function of geographical location, culture-specific social norms, and/or differences in the social status of certain minorities that consequently face higher or lower levels of threat in different places. The combination of non-invariance for ideas of reference and scalar invariance for persecutory thoughts could point toward qualitative differences in the building blocks of the hierarchy of paranoia. In particular, what constitutes an elevated level of more mundane forms of mistrust may vary with cultural standards, whereas the starting point for what can be considered more severe or pathological forms of paranoia is universal and appears to be largely unaffected by cultural differences. In the case of ideas of reference, we found overall more items with a higher than with a lower intercept in the Hong Kong sample. Specifically, participants from Hong Kong show higher scores in items that deal with a not directly disclosed social judgment by others (i.e. 'believing others are not what they seem' and 'being upset by being critically judged'). This sample-specific tendency is unrelated to ideas of reference. Instead, it may be a result of a culture-specific difference in norms and expectations in social interactions. Thus, a person from Hong Kong is likely to score higher in ideas of reference than a person from the UK, USA, or Australia, even when both participants have the same true value. At the same time, a minority of items with a lower intercept in Hong Kong v. the other samples (i.e. the item dealing with 'concerns regarding other people avoiding the respondent') makes it difficult to predict the exact direction of the biased estimate for an individual participant.

Strengths and limitations
While the recruitment with stratified quota sampling constitutes a major strength of this study in terms of generalizability, it needs noting that the sampling was conducted via recruitment in the panel samples provided by Qualtrics, impacting the representativeness of the sample to some degree. We purposefully sampled region-specific representative samples, yet there are groups that could nonetheless be under-represented, such as those with limited access to a computer or the internet. It needs noting though that a study comparing different online recruitment procedures found Qualtrics to be the best option in terms of achieving approximate demographic representativeness and geographic representativeness in a high-income country (when compared to equally common commercial providers such as MTurk or Facebook advertisement; Boas, Christenson, & Glick, 2020). Divergences from representativeness are gradual, and indications of some limitations regarding the participant pool have been documented for people with high income and people of higher age (i.e. aged 50 and older; Miller, Guidry, Dahman, & Thomson, 2020). To our knowledge, however, there is no existing research detailing potential selection bias in terms of clinical variables. Related to this, it needs noting that invariance assessments of non-clinical and clinical samples (i.e. patients with psychosis) might show diverging results, necessitating future tests in clinical samples to further verify our initial findings. Furthermore, the lack of a more objective criterion to validate the R-GPTS, in particular other self-report- or interview-based assessments of paranoid delusions, prevented us from further criterion validation with the present data. Further, our data may have been impacted by the COVID-19 pandemic, given the timing of data collection. With COVID-19 affecting the five international sites differently at the time of recruitment, it is possible that the mean scores of persecutory thoughts may have temporarily shifted to different degrees in the sample, preventing us from disentangling permanent mean differences in persecutory thoughts across the five sites from transitory fluctuations due to the COVID-19 pandemic. Ideally, future research should collect data at several assessment points in different cultures to disentangle fluctuations in latent means due to, for example, public or political crises, from stable intercultural differences in mean persecutory thoughts. Such an effort would allow for either the generalization of the established UK cut-offs or the generation of improved cross-cultural or culture-specific cut-offs. Finally, while there was some cultural variation in the samples analyzed, all five sites constitute high-income regions, so a more extensive follow-up to this study is needed to further verify invariance across high-, middle-, and low-income countries.

Conclusion
This analysis of measurement invariance of the R-GPTS showed that its persecutory thoughts subscale is a reliable and valid self-report measure of severe levels of paranoia, which provides the first step toward an unbiased assessment across cultures and cross-cultural verification of cut-off criteria. At the same time, our analyses revealed that a direct comparison of the ideas of reference subscale sum-scores between cultures may lead to an overestimation of these milder forms of paranoia in some (non-western) cultures. In sum, the R-GPTS constitutes a valuable tool for researchers and practitioners assessing, treating, and exploring the phenomenon of paranoid beliefs.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0033291724000072

Table 1. Demographic and clinical sample characteristics by site
Psychological Medicine
two translating psychologists, with SHS acting as consultant if needed. The traditional Chinese version of the R-GPTS has been used in another study

Table 2. Measurement invariance confirmatory factor analyses on the full five-site sample (N = 2510)
Note. Cells printed in bold denote indicators of non-invariance.
Björn Schlier et al.

Table 3. Measurement invariance results for the full R-GPTS two-factor model in language and culture subsamples
Note. Cells printed in bold denote indicators of non-invariance.