Introduction
Major depression symptoms, as described in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), include cognitive and mood-related psychological symptoms as well as somatic symptoms, including fatigue, poor appetite or overeating and insomnia or hypersomnia (American Psychiatric Association, 2013). Most self-report depression symptom questionnaires include items that reflect both psychological and somatic symptoms (Sakakibara et al., Reference Sakakibara, Miller, Orenczuk and Wolfe2009; Wakefield et al., Reference Wakefield, Butow, Aaronson, Hack, Hulbert-Williams and Jacobsen2015). Scores obtained using such questionnaires, however, may be artefactually inflated in people with a chronic illness due to the overlap of somatic symptoms related to depression and those stemming from their physical illness (von Ammon Cavanaugh, Reference von Ammon Cavanaugh1995; Koenig et al., Reference Koenig, George, Peterson and Pieper1997; Sørensenf et al., Reference Sørensenf, Friis-Hasché, Haghfelt and Bech2005).
The 9-item Patient Health Questionnaire-9 (PHQ-9) (Kroenke et al., Reference Kroenke, Spitzer and Williams2001) and its 8-item version (PHQ-8) (Wu et al., Reference Wu, Levis, Riehm, Saadat, Levis, Azar, Rice, Boruff, Cuijpers, Gilbody, Ioannidis, Kloda, McMillan, Patten, Shrier, Ziegelstein, Akena, Arroll, Ayalon, Baradaran, Baron, Bombardier, Butterworth, Carter, Chagas, Chan, Cholera, Conwell, de Man-van Ginkel, Fann, Fischer, Fung, Gelaye, Goodyear-Smith, Greeno, Hall, Harrison, Härter, Hegerl, Hides, Hobfoll, Hudson, Hyphantis, Inagaki, Jetté, Khamseh, Kiely, Kwan, Lamers, Liu, Lotrakul, Loureiro, Löwe, McGuire, Mohd-Sidik, Munhoz, Muramatsu, Osório, Patel, Pence, Persoons, Picardi, Reuter, Rooney, Santos, Shaaban, Sidebottom, Simning, Stafford, Sung, Tan, Turner, van Weert, White, Whooley, Winkley, Yamada, Benedetti and Thombs2020) are commonly used for identifying people who may have depression (Moriarty et al., Reference Moriarty, Gilbody, McMillan and Manea2015; Levis et al., Reference Levis, Benedetti and Thombs2019) and as continuous measures to assess depression symptom severity (Kroenke et al., Reference Kroenke, Spitzer and Williams2001). The 9 items of the PHQ-9 align with the 9 DSM-5 symptom criteria for a major depressive episode (American Psychiatric Association, 2013), and the PHQ-8 includes all PHQ-9 items except an item on thoughts of suicide or self-harm (Wu et al., Reference Wu, Levis, Riehm, Saadat, Levis, Azar, Rice, Boruff, Cuijpers, Gilbody, Ioannidis, Kloda, McMillan, Patten, Shrier, Ziegelstein, Akena, Arroll, Ayalon, Baradaran, Baron, Bombardier, Butterworth, Carter, Chagas, Chan, Cholera, Conwell, de Man-van Ginkel, Fann, Fischer, Fung, Gelaye, Goodyear-Smith, Greeno, Hall, Harrison, Härter, Hegerl, Hides, Hobfoll, Hudson, Hyphantis, Inagaki, Jetté, Khamseh, Kiely, Kwan, Lamers, Liu, Lotrakul, Loureiro, Löwe, McGuire, Mohd-Sidik, Munhoz, Muramatsu, Osório, Patel, Pence, Persoons, Picardi, Reuter, Rooney, Santos, Shaaban, Sidebottom, Simning, Stafford, Sung, Tan, Turner, van Weert, White, Whooley, Winkley, Yamada, Benedetti and Thombs2020). In contrast to what might be expected, existing research suggests that having a medical illness and, among people with a medical illness, condition severity are not associated with higher PHQ-9 somatic symptom item scores (Leavens et al., Reference Leavens, Patten, Hudson, Baron and Thombs2012; Jones et al., Reference Jones, Ludman, McCorkle, Reid, Bowles, Penfold and Wagner2015; Cook et al., Reference Cook, Kallen, Bombardier, Bamer, Choi, Kim, Salem and Amtmann2017; Hu and Ward, Reference Hu and Ward2017; Marrie et al., Reference Marrie, Lix, Bolton, Fisk, Fitzgerald, Graff, Hitchon, Kowalec, Marriott, Patten, Salter and Bernstein2023). Differential item functioning (DIF) analyses assess the degree to which questionnaire items are likely to measure the intended construct and not be influenced by other external factors (Walker, Reference Walker2011). If DIF based on having a chronic illness were present, people with an illness would score higher on PHQ-9 somatic symptom items than people without an illness with similar psychological symptom item scores, presumably due to overlap with their medical symptoms.
We identified 5 studies on DIF in the PHQ-9 (Leavens et al., Reference Leavens, Patten, Hudson, Baron and Thombs2012; Jones et al., Reference Jones, Ludman, McCorkle, Reid, Bowles, Penfold and Wagner2015; Cook et al., Reference Cook, Kallen, Bombardier, Bamer, Choi, Kim, Salem and Amtmann2017; Hu and Ward, Reference Hu and Ward2017; Marrie et al., Reference Marrie, Lix, Bolton, Fisk, Fitzgerald, Graff, Hitchon, Kowalec, Marriott, Patten, Salter and Bernstein2023), including 4 that compared people with and without a medical illness (Leavens et al., Reference Leavens, Patten, Hudson, Baron and Thombs2012; Cook et al., Reference Cook, Kallen, Bombardier, Bamer, Choi, Kim, Salem and Amtmann2017; Hu and Ward, Reference Hu and Ward2017; Marrie et al., Reference Marrie, Lix, Bolton, Fisk, Fitzgerald, Graff, Hitchon, Kowalec, Marriott, Patten, Salter and Bernstein2023) and one that examined whether having more cancer-related somatic symptoms was associated with PHQ-9 somatic item responses (Jones et al., Reference Jones, Ludman, McCorkle, Reid, Bowles, Penfold and Wagner2015). None of the studies found detectable DIF on any somatic items that meaningfully influenced total measure scores. One of the studies (Leavens et al., Reference Leavens, Patten, Hudson, Baron and Thombs2012) was conducted among people with systemic sclerosis (SSc; also known as scleroderma), a complex, rare, chronic, autoimmune disease characterised by microvascular damage and fibrosis of the skin and other organs, including the lungs, gastrointestinal tract, kidneys and heart (Allanore et al., Reference Allanore, Simms, Distler, Trojanowska, Pope, Denton and Varga2015; Denton and Khanna, Reference Denton and Khanna2017; Volkmann et al., Reference Volkmann, Andréasson and Smith2023). Disease onset typically peaks around age 50, and over 80% of people with SSc are women (Allanore et al., Reference Allanore, Simms, Distler, Trojanowska, Pope, Denton and Varga2015). Common symptoms include skin thickening, Raynaud’s phenomenon, difficulty breathing, limitations in hand mobility, and 3 symptoms measured by PHQ-9 items: fatigue, sleep problems and gastrointestinal issues that can lead to changes in appetite (Allanore et al., Reference Allanore, Simms, Distler, Trojanowska, Pope, Denton and Varga2015; Denton and Khanna, Reference Denton and Khanna2017; Volkmann et al., Reference Volkmann, Andréasson and Smith2023).
It is possible that people with somatic symptoms from medical conditions do not score higher on somatic symptom items than people without medical conditions due to item-order or context effects. Item-order or context effects occur in questionnaires when the order in which items are presented provides a context that affects how people interpret and respond to items (Bowling and Windsor, Reference Bowling and Windsor2008; Lietz, Reference Lietz2010; Thau et al., Reference Thau, Mikkelsen, Hjortskov and Pedersen2021; Lee et al., Reference Lee, Krishnamurty, Van Horn, Cooper, Coutanche, McMullen, Panter, Rindskopf and Sher2023). If people with a chronic illness with substantial somatic symptom burden are aware that they are answering questions about depression, they might not report overlapping somatic symptoms because they attribute them entirely to their physical illness and not to depression, which is more often associated with cognitive and mood-related symptoms. If this were the case, these people would theoretically score higher on items measuring somatic symptoms of depression if these items were removed from the context of a depression questionnaire.
We did not identify any studies that have examined whether similar reporting of somatic symptoms on depression symptom questionnaires between people with and without medical conditions may be due to context effects and the awareness that depression symptoms are being assessed. Our primary objective was to evaluate differences in mean PHQ-8 somatic item sum scores (item 3 = sleep disturbances, item 4 = fatigue, item 5 = appetite changes) in people with SSc when the items were administered outside the context of a depression questionnaire versus as part of the PHQ-8. Our secondary objective was to evaluate differences in mean scores for each individual PHQ-8 somatic item when administered outside of or as part of the PHQ-8. We hypothesised that people who completed somatic items outside the context of a depression assessment would have significantly higher scores than those who completed the same items as part of a depression assessment.
Methods
Study design
We conducted a two-arm parallel superiority randomised controlled experiment with a 1:1 allocation ratio within the Scleroderma Patient-centred Intervention Network (SPIN) Cohort. The SPIN Cohort is a large multinational cohort that collects longitudinal data on patient-reported outcomes from people living with SSc (Scleroderma Patient-centered Intervention Network, 2024). Cohort participants complete a series of patient-reported outcome measures online every 3 months. In this experiment, when participants logged in to complete their routine assessment, they were randomly assigned to Reordered Items or Standard PHQ-8 arms. Participants in the Reordered Items arm completed a reordered version of the PHQ-8 in which the 3 PHQ-8 somatic items (item 3 = sleep disturbances, item 4 = fatigue and item 5 = appetite changes) were presented first, without any indication that they were part of a depression questionnaire, and the 5 psychological items were presented later in the assessment protocol.
The SPIN Cohort study was approved by the Research Ethics Committee of the Centre intégré universitaire de santé et de services sociaux du Centre-Ouest-de-l’Île-de-Montréal (#MP-05-2013-150) and by the ethics committees of all recruiting sites. Since the only change to routine cohort assessments was changing the order of item presentation, the Research Ethics Committee of the Centre intégré universitaire de santé et de services sociaux du Centre-Ouest-de-l’Île-de-Montréal determined that no additional ethics approval was needed and that SPIN Cohort participants did not need to be notified about the experiment.
We registered the experiment (ClinicalTrials.gov, NCT06772896) and posted a protocol on the Open Science Framework (https://osf.io/qkf8g/files/5kn49) prior to initiation. We reported results consistent with the Consolidated Standards of Reporting Trials statement (Moher et al., Reference Moher, Hopewell, Schulz, Montori, Gøtzsche, Devereaux, Elbourne, Egger and Altman2010). Studies that use SPIN Cohort data have similar methods. Thus, the results from our experiment were reported as consistent with guidance from the Text Recycling Research Project (Hall et al., Reference Hall, Moskovitz and Pemberton2021).
Eligibility criteria
Eligible SPIN Cohort participants must be classified as having SSc based on 2013 American College of Rheumatology/European League Against Rheumatism criteria (van den Hoogen et al., Reference van den Hoogen, Khanna, Fransen, Johnson, Baron, Tyndall, Matucci-Cerinic, Naden, Medsger, Carreira, Riemekasten, Clements, Denton, Distler, Allanore, Furst, Gabrielli, Mayes, van Laar, Seibold, Czirjak, Steen, Inanc, Kowal-Bielecka, Müller-Ladner, Valentini, Veale, Vonk, Walker, Chung, Collier, Csuka, Fessler, Guiducci, Herrick, Hsu, Jimenez, Kahaleh, Merkel, Sierakowski, Silver, Simms, Varga and Pope2013) confirmed by a SPIN physician; aged ≥18 years; fluent in English, French, or Spanish; and have access to a computer or tablet with internet access. The SPIN Cohort is a convenience sample. Cohort participants are recruited during regular medical visits at SPIN recruitment sites (Scleroderma Patient-centered Intervention Network, 2024), and written informed consent is obtained. A medical form is submitted online by site personnel to enrol participants. Once online registration is completed, participants are sent an automated welcome email with instructions on how to activate their SPIN account and complete SPIN Cohort measures online. Cohort participants complete routine online assessments that last approximately 20 minutes upon enrolment and at 3-month intervals. All SPIN Cohort participants who logged in to the SPIN Cohort platform to complete their routine 3-month assessment during the period when the experiment was conducted were included.
Experiment arms
Participants assigned to the Reordered Items arm received an assessment protocol in which the 3 PHQ-8 somatic items were presented separately from and prior to the 5 PHQ-8 psychological items without any indication that they were part of a depression symptom questionnaire. PHQ-8 psychological items were presented (1) later in the assessment and (2) with at least 1 other patient-reported outcome measure between the somatic items and psychological items. Apart from these 2 rules, as routinely done in SPIN Cohort assessments, all measures were presented in random order.
Participants assigned to the Standard PHQ-8 arm served as the control arm and completed the standard version of the PHQ-8, with the PHQ-8 and all other measures administered in random order. The title of the PHQ-8 was not presented to participants in either experiment arm; the PHQ-8 does not include instructions to respondents. PHQ-8 items were presented on 2 separate webpages in the Reordered Items arm and on a single webpage in the Standard PHQ-8 arm. The routine online assessment also contained 9 other measures, including the Health Literacy Survey Questionnaire (Finbråten et al., Reference Finbråten, Wilde-Larsson, Nordström, Pettersen, Trollvik and Guttersrud2018) and the Patient-Reported Outcomes Measurement Information System (Hays et al., Reference Hays, Spritzer, Schalet and Cella2018), which included 8 measures.
Randomisation and blinding
The SPIN Cohort platform was programmed to randomise each participant via simple randomisation in a 1:1 ratio to either the Reordered Items or Standard PHQ-8 arm. Randomisation occurred automatically and immediately when participants logged in to complete their routine assessment. Thus, study investigators were fully blind to participants’ assigned experiment arm. Since participants were not notified about the experiment, they were fully blind to the study objectives and their assigned experiment arm.
Participant characteristics and experiment outcomes
SPIN physicians provided age, sex and medical information upon enrolment of participants in the SPIN Cohort. Participants reported sociodemographic data, including race or ethnicity, education level and marital status.
The primary outcome analysis compared the mean sum score of the 3 PHQ-8 somatic items (item 3 = sleep disturbances, item 4 = fatigue and item 5 = appetite changes) between participants in the two experimental arms. Secondary outcome variables were individual mean scores of the 3 PHQ-8 somatic items.
The PHQ-8 consists of 8 items that measure depression symptoms over the last 2 weeks. Items are scored on a 4-point scale ranging from 0 (not at all) to 3 (nearly every day), with higher scores (range 0–24) indicating more depression symptoms. The PHQ-8 has been shown to be equivalent to the PHQ-9 (Wu et al., Reference Wu, Levis, Riehm, Saadat, Levis, Azar, Rice, Boruff, Cuijpers, Gilbody, Ioannidis, Kloda, McMillan, Patten, Shrier, Ziegelstein, Akena, Arroll, Ayalon, Baradaran, Baron, Bombardier, Butterworth, Carter, Chagas, Chan, Cholera, Conwell, de Man-van Ginkel, Fann, Fischer, Fung, Gelaye, Goodyear-Smith, Greeno, Hall, Harrison, Härter, Hegerl, Hides, Hobfoll, Hudson, Hyphantis, Inagaki, Jetté, Khamseh, Kiely, Kwan, Lamers, Liu, Lotrakul, Loureiro, Löwe, McGuire, Mohd-Sidik, Munhoz, Muramatsu, Osório, Patel, Pence, Persoons, Picardi, Reuter, Rooney, Santos, Shaaban, Sidebottom, Simning, Stafford, Sung, Tan, Turner, van Weert, White, Whooley, Winkley, Yamada, Benedetti and Thombs2020), which is a valid measure of depression symptoms in SSc (Milette et al., Reference Milette, Hudson and Thombs2010). The PHQ-8 is available in English, French and Spanish (Arthurs et al., Reference Arthurs, Steele, Hudson, Baron and Thombs2012; Gómez-Gómez et al., Reference Gómez-Gómez, Benítez, Bellón, Moreno-Peral, Oliván-Blázquez, Clavería, Zabaleta-Del-Olmo, Llobera, Serrano-Ripoll, Tamayo-Morales and Motrico2023). Only 2 studies have examined the minimal important difference of the PHQ-9 using anchor-based approaches, and they estimated a minimal important difference of between 2.0 and 3.7 points (Bauer-Staeb et al., Reference Bauer-Staeb, Kounali, Welton, Griffith, Wiles, Lewis, Faraway and Button2021; Kounali et al., Reference Kounali, Button, Lewis, Gilbody, Kessler, Araya, Duffy, Lanham, Peters, Wiles and Lewis2022). Many studies on the factor structure of the PHQ-9 have found that a one-factor model adequately explains item variance (Lamela et al., Reference Lamela, Soreira, Matos and Morais2020). However, other studies have suggested that a two-factor model, consisting of somatic and psychological latent factors, provides a better fit (Chilcot et al., Reference Chilcot, Rayner, Lee, Price, Goodwin, Monroe, Sykes, Hansford and Hotopf2013). Somatic latent factors in two-factor models of the PHQ-9 include 3–5 items (Lamela et al., Reference Lamela, Soreira, Matos and Morais2020). The 3-item version consists of item 3 (sleep disturbance), item 4 (fatigue) and item 5 (appetite changes) (Chilcot et al., Reference Chilcot, Rayner, Lee, Price, Goodwin, Monroe, Sykes, Hansford and Hotopf2013; Lamela et al., Reference Lamela, Soreira, Matos and Morais2020), which are commonly experienced in SSc. We performed a two-factor confirmatory factor analysis (CFA) to verify the 3-item somatic latent factor using SPIN Cohort data and found that the model fit well (Comparative Fit Index = 0.997; Tucker–Lewis Index = 0.995; Root Mean Square Error of Approximation = 0.058; Standardised Root Mean Square Residual = 0.036). See Appendix 1 for CFA methods and results.
Statistical analysis
In August 2024, prior to initiating the experiment, we determined that 903, 910, 851 and 798 SPIN Cohort participants completed all measures in their scheduled assessments during the previous 4 3-month periods (August 2023–October 2023, November 2023–January 2024, February 2024–April 2024 and May 2024–July 2024). Assuming that at least 798 participants would log in to the SPIN Cohort platform to complete an assessment and be randomised, using a two-tailed test with a significance level of 0.05, we calculated that we would be able to detect an estimated effect size of 0.20 standardised mean difference, which is considered a small difference (Cohen, Reference Cohen1988), with 80% power.
Two-tailed between-groups t-tests with 95% confidence intervals (CIs) were used to compare the mean sum score of the 3 PHQ-8 somatic items (item 3 = sleep disturbances, item 4 = fatigue and item 5 = appetite changes) and mean scores of individual somatic items between participants in the Reordered Items and Standard PHQ-8 arms. We determined pre-experiment that analyses would be conducted as complete case analyses because we did not expect significant missing data. However, per our protocol, if >10% of randomised participants had not completed all PHQ-8 items, we would have conducted intent-to-treat analyses with missing data handled using multiple imputations by chained equations (Rubin, Reference Rubin1987; van Buuren and Groothuis-Oudshoorn, Reference van Buuren and Groothuis-Oudshoorn2011). Analyses were conducted using the statistical software R (R Core Team, n.d.).
Post hoc analysis
We conducted post hoc subgroup analyses at the request of a peer reviewer to explore the possible effects of sex (male and female), age (≤60 years and >60 years), SSc subtype (diffuse and limited) and language (English, French, Spanish) on our results. We used linear regression models to examine the interaction between each subgroup variable and our experiment arms.
Results
The experiment was conducted during participant assessments from 17 January to 17 April 2025.
Participants
Of 881 SPIN Cohort participants who were randomised, 30 (3%) did not complete all PHQ-8 items and were excluded. Thus, 851 (97%) participants with complete PHQ-8 data were included in the analyses in the Reordered Items (N = 428) and Standard PHQ-8 (N = 423) arms. There were no substantive differences between participants who completed all PHQ-8 items and those who did not complete all items. See Appendix 2 for a comparison of characteristics between included and excluded participants. The flow of participants is presented in Figure 1.

Figure 1. Participant flow diagram.
Sociodemographic variables and disease characteristics are presented in Table 1. Mean (SD) age was 62.3 (11.8) years, 752 participants were female (88%), 703 participants identified as White (83%), 291 participants were classified as having diffuse SSc (34%) and mean (SD) time since onset of first non-Raynaud’s disease manifestation was 16.9 (9.8) years. Participants were from France (N = 326, 38%), the United States (N = 218, 26%), Canada (N = 197, 23%), the United Kingdom (N = 75, 9%) and Australia, Spain, or Mexico (N = 35, 4%). Characteristics of participants in the Reordered Items and Standard PHQ-8 arms were similar.
Table 1. Participant sociodemographic and disease characteristics at baseline

a Due to missing data, total sample size N < 851 for some characteristics.
b Race or ethnicity data were self-reported in each country using standard categories used in that country. Therefore, categories differed between countries.
Outcomes
The mean (SD) PHQ-8 score was 6.0 (5.3) for the full sample, 6.0 (5.4) for Reordered Items arm participants and 6.0 (5.2) for Standard PHQ-8 arm participants. Outcome comparisons are presented in Table 2. The difference in PHQ-8 somatic item sum scores was not statistically significant (0.05 points; 95% CI −0.29 to 0.38). Differences in mean item scores were 0.04 points (95% CI −0.09 to 0.19) for item 3 (sleep disturbances), 0.03 points (95% CI −0.11 to 0.16) for item 4 (fatigue) and −0.03 points (95% CI −0.15 to 0.10) for item 5 (appetite changes).
Table 2. Differences in mean Patient Health Questionnaire-8 somatic item scores between the Reordered Items and Standard PHQ-8 groups

CI, confidence interval.
a Differences between groups are calculated as Standard PHQ-8 – Reordered Items.
See Appendix 3 for post hoc analysis results. We did not find any significant subgroup interactions.
Discussion
We examined whether administering PHQ-8 somatic items outside the context of a depression questionnaire would influence item scores among people with SSc, a chronic condition with substantial somatic symptom burden, including symptoms that overlap with PHQ-8 somatic items (sleep disturbance, fatigue, appetite changes). We did not find a statistically significant or substantive difference in mean PHQ-8 somatic item sum scores between participants who completed these items outside the context of a depression symptom assessment or as part of the standard PHQ-8. Our estimated difference of 0.05 points was 1.4%–2.5% of the minimal important difference (Bauer-Staeb et al., Reference Bauer-Staeb, Kounali, Welton, Griffith, Wiles, Lewis, Faraway and Button2021; Kounali et al., Reference Kounali, Button, Lewis, Gilbody, Kessler, Araya, Duffy, Lanham, Peters, Wiles and Lewis2022). There were no statistically significant or substantive differences in individual somatic item scores.
To the best of our knowledge, this is the first study to test the hypothesis that context effects may explain why people with chronic illnesses with substantial somatic symptom burden do not report more somatic items on self-report depression assessments like the PHQ-8 than people with similar psychological symptom scores but without a chronic illness. Our results suggest that findings from previous studies that did not find DIF in PHQ-9 scores (Leavens et al., Reference Leavens, Patten, Hudson, Baron and Thombs2012; Jones et al., Reference Jones, Ludman, McCorkle, Reid, Bowles, Penfold and Wagner2015; Cook et al., Reference Cook, Kallen, Bombardier, Bamer, Choi, Kim, Salem and Amtmann2017; Hu and Ward, Reference Hu and Ward2017; Marrie et al., Reference Marrie, Lix, Bolton, Fisk, Fitzgerald, Graff, Hitchon, Kowalec, Marriott, Patten, Salter and Bernstein2023) are not due to context effects.
Over 85% of SPIN Cohort participants have gastrointestinal symptoms. Previous studies have found that approximately 90% of people with SSc experience fatigue and that more than 75% have sleep difficulties (Bassel et al., Reference Bassel, Hudson, Taillefer, Schieir, Baron and Thombs2011; Willems et al., Reference Willems, Kwakkenbos, Leite, Thombs, van den Hoogen, Maia, Vliet Vlieland and van den Ende2014). It is possible that people with SSc may become accustomed to these symptoms over time, which would affect how they view their severity. In this case, they might view the same symptom presentation as less severe compared to someone without a chronic illness. Another possibility is that our findings reflect shared pathways in physical and mental disorders. Large bodies of evidence have linked depression to inflammation and immune symptom dysfunction, both of which are core elements to many chronic conditions (Pasco et al., Reference Pasco, Moylan, Allen, Stuart, Hayley, Byrne and Maes2013; Miller and Raison, Reference Miller and Raison2016; Beurel et al., Reference Beurel, Toups and Nemeroff2020; Troubat et al., Reference Troubat, Barone, Leman, Desmidt, Cressant, Atanasova, Brizard, El Hage, Surget, Belzung and Camus2021), including SSc (Allanore et al., Reference Allanore, Simms, Distler, Trojanowska, Pope, Denton and Varga2015). It is possible that somatic symptoms of depression cannot be easily separated from similar somatic symptoms in SSc.
Our study has several strengths. We were able to include approximately 850 people, which provided sufficient power to detect even very small differences if they were present. Since we conducted our experiment as part of routine SPIN Cohort assessments, participants were not aware that an experiment was being done and were, thus, fully blinded to study conditions and our hypothesis. There are also limitations to consider. First, the SPIN Cohort is a convenience sample, and the outcome measure was completed online, which may reduce generalisability. However, participant characteristics from the SPIN Cohort have been shown to be comparable to those of other large SSc cohorts (Dougherty et al., Reference Dougherty, Kwakkenbos, Carrier, Salazar, Assassi, Baron, Bartlett, Furst, Gottesman, van den Hoogen, Malcarne, Mouthon, Nielson, Poiraudeau, Sauvé, Boire, Bruns, Chung, Denton, Dunne, Fortin, Frech, Gill, Gordon, Herrick, Hinchcliff, Hudson, Johnson, Jones, Kafaja, Larché, Manning, Pope, Spiera, Steen, Sutton, Thorne, Wilcox, Thombs and Mayes2018). Furthermore, it is unlikely that differences in sociodemographic or SSc disease characteristics would alter the degree to which context effects might play a role in depression symptom assessment. Second, it is possible that some SPIN Cohort participants may have recognised PHQ-8 somatic items even when they were administered separately and without reference to the PHQ-8. We believe that this is unlikely since the PHQ-8 had not been administered as part of SPIN Cohort routine assessments since October 2020, over 4 years before our experiment. Furthermore, the individual PHQ-8 items are written in a way that does not give any indication that they refer to somatic symptoms related to depression or any other cause.
In summary, we conducted a randomised experiment to test whether context effects may explain why people with chronic illnesses do not appear to report more somatic symptoms on depression self-report measures compared to others without chronic illnesses with similar psychological symptom levels. Our results showed no difference in PHQ-8 somatic item scores when these were administered separately from cognitive and mood-related items versus as part of the standard PHQ-8. We did not find evidence that people’s knowledge that depression is being assessed influences how they report somatic symptoms of depression that may overlap with symptoms of physical illnesses.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S204579602610047X.
Availability of data and materials
De-identified participant data reported in this study will be made available upon request to the corresponding author.
Acknowledgements
None.
Financial support
The Scleroderma Patient-centred Intervention Network Cohort has received funding from the Canadian Institutes of Health Research (CIHR); the Arthritis Society; the Lady Davis Institute for Medical Research of the Jewish General Hospital, Montreal, Quebec, Canada; the Jewish General Hospital Foundation, Montreal, Quebec, Canada; McGill University, Montreal, Quebec, Canada; the Scleroderma Society of Ontario; Scleroderma Canada; Sclérodermie Québec; Scleroderma Manitoba; Scleroderma Atlantic; the Scleroderma Association of BC; Scleroderma SASK; Scleroderma Australia; Scleroderma New South Wales; Scleroderma Victoria and the Scleroderma Foundation of California. SH was supported by a CIHR Canadian Graduate Scholarship – Master’s award, and MCG and BDT by Canada Research Chairs, all outside of the present work.
Open access funding provided by Radboud University Nijmegen. Open access funding provided by Radboud University Medical Center.
Competing interests
The authors have no conflicts of interest to declare.
Ethical standards
The SPIN Cohort study was approved by the Research Ethics Committee of the Centre intégré universitaire de santé et de services sociaux du Centre-Ouest-de-l’Île-de-Montréal (#MP-05-2013-150) and by the ethics committees of all recruiting sites. No additional ethics approval was needed for the present study.


