Introduction
There is a long-standing view that all children around the world follow similar patterns of biological and cognitive development, although there are marked individual differences in developmental rates, temperament and adaptive success among them (Achenbach et al., 2008). Within this developmental framework, culture strongly shapes the environments in which children develop, which may consequently lead to specificities in mental health expression across different cultural groups (Nikapota & Rutter, 2008). Over the past decades, variations in rates of disorders have been observed across cultural groups, attributable mostly to the presence of culture-specific mental disorders, differences in the manifestation of disorders, and differences in risk factors across cultural/ethnic groups (Nikapota & Rutter, 2008).
Much of what we currently know about children's mental health internationally is based on two main assessment systems: the Achenbach System of Empirically Based Assessment (ASEBA) and the Strengths and Difficulties Questionnaire (SDQ) (Achenbach et al., 2008). Both systems take a dimensional approach to child and adolescent mental health assessment, and both emphasise cross-cultural perspectives, with self-, parent- and teacher-rating scales developed in various languages (Achenbach, 1991a, b, c; Goodman, 1997, 2001; Goodman et al., 2004; Achenbach & Rescorla, 2007).
Using the youth self-report (YSR) from the ASEBA or the SDQ self-report, it has been observed that the prevalence rates of general psychopathology reported by adolescents differ substantially across nations (Achenbach et al., 2008). For example, considering the SDQ alone (Goodman, 1997, 2001), the rates of self-reported mental health problems in adolescent samples were 6.6% in Germany (Ravens-Sieberer et al., 2008a), 8.7% in Ireland (Greally et al., 2010) and 5.3% in the Gaza Strip (Thabet et al., 2000). Moreover, in cross-cultural studies comparing analogous samples, a 1.6–2.8-fold difference in rates has been observed across several countries (Ravens-Sieberer et al., 2008b; Lai et al., 2010; Atilola et al., 2013). There may be many reasons why the prevalence rates estimated by self-reports differ so substantially across nations. There might be inherent cross-cultural differences due to the many economic, social and cultural factors that contribute to the development and expression of specific psychopathology (e.g., Hackett & Hackett, 1999; Mabe & Josephson, 2004; Camras & Fatani, 2006; Nikapota & Rutter, 2008). There might be cross-cultural differences in completion rates, recruitment methods, or adolescents' age and development at assessment (Achenbach et al., 2012). Additionally, there might be cross-cultural differences in the SDQ or ASEBA measurement model itself, non-availability of population-specific norms, or inconsistencies between a self-report questionnaire and a clinical interview in determining levels of psychopathology (Heiervang et al., 2008; Achenbach et al., 2012; Goodman et al., 2012). This latter point is very important, because one recent study concluded that such biases are particularly likely in brief questionnaires such as the SDQ, which allow no role for clinical judgement (Goodman et al., 2012). The authors pointed out that, due to these undesirable attributes, cross-national differences in SDQ caseness do not necessarily reflect comparable differences in disorder rates.
Beyond these inherent cross-cultural validity concerns, the extent to which differences in cross-national prevalence rates estimated by a self-report are determined by its measured construct (i.e., its factorial structure) remains unclear. Indeed, a meaningful comparison across groups requires demonstrating the measurement equivalence of the constructs underlying a questionnaire across those groups (Gregorich, 2006; Milfont & Fisher, 2010). There appears to be a prevailing notion that the replicability of a questionnaire's factorial structure in different cultural groups guarantees that the questionnaire will operate equivalently across these groups and is therefore suitable for cross-cultural comparisons (e.g., Byrne & Watkins, 2003). However, a prerequisite for cross-cultural comparisons is that the same theoretical construct is measured in each culture in the same way, namely that construct equivalence is achieved for the questionnaire when it is tested simultaneously across several cultural groups (He & van de Vijver, 2012). This is known as measurement equivalence (i.e., invariance) (Horn & McArdle, 1992). Therefore, in order to compare estimates obtained with a questionnaire across various nations, it needs to be demonstrated that the factorial structure is not only reproducible across different ethnic/cultural groups but also invariant (e.g., Byrne & Watkins, 2003; Gregorich, 2006; Milfont & Fisher, 2010). Several types of measurement invariance form a nested hierarchy: dimensional, configural, metric, scalar and strict factorial (Byrne & Watkins, 2003; Gregorich, 2006). Dimensional invariance means that the same number of common factors is present across groups. Given dimensional invariance, configural invariance means that the same items are associated with the same factors across groups. Given configural invariance, metric invariance means that the common factors have the same meaning across groups (i.e., equivalent factor loadings). Given metric invariance, scalar invariance means that the item intercepts or thresholds are equivalent across groups, which is required in order to compare latent means. Finally, strict factorial invariance additionally requires that the residual variances of all items are equal across groups.
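The nested structure of this hierarchy can be illustrated with a small sketch (ours, not from the paper; the level names and constraint descriptions follow the text, and the function name is hypothetical):

```python
# Nested hierarchy of measurement-invariance levels: each level adds one
# cross-group equality constraint on top of all weaker levels below it.
INVARIANCE_LEVELS = [
    ("dimensional", "same number of common factors"),
    ("configural",  "same item-factor pattern"),
    ("metric",      "equal factor loadings"),
    ("scalar",      "equal item intercepts/thresholds"),
    ("strict",      "equal item residual variances"),
]

def constraints_up_to(level):
    """Return all constraint descriptions implied by `level`,
    since testing a level assumes every weaker level holds."""
    names = [name for name, _ in INVARIANCE_LEVELS]
    idx = names.index(level)
    return [desc for _, desc in INVARIANCE_LEVELS[: idx + 1]]
```

For example, `constraints_up_to("scalar")` includes the metric-level loading constraint, mirroring the fact that scalar invariance is only tested once metric invariance holds.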
The most striking observation from validation studies of the SDQ self-report is that the same factorial structure has not been replicated across different ethnic/cultural groups. Some factor analytic studies using different language versions supported the original five-factor model comprising emotional symptoms, conduct problems, hyperactivity–inattention, peer problems and prosocial behaviour (e.g., Ronning et al., 2004; Ruchkin et al., 2007; Van Roy et al., 2008; Giannakopoulos et al., 2009). Other studies supported a modified five-factor model in which reverse-worded items cross-loaded on other factors, were removed, or had error correlations added (e.g., Van Roy et al., 2008; van de Looij-Jansen et al., 2011; Essau et al., 2012). A four-factor model has also been supported, comprising emotional symptoms and peer problems, conduct problems, hyperactivity–inattention and prosocial behaviour (e.g., van de Looij-Jansen et al., 2011), as has a three-factor model comprising internalising problems, externalising problems and prosocial behaviour (e.g., Koskelainen et al., 2001; Riso et al., 2010). Finally, some factor analytic studies provided only modest or no support for the proposed SDQ self-report models (e.g., Mellor & Stokes, 2007; Percy et al., 2008).
In contrast to the heterogeneous findings on the SDQ self-report factor structure, the factor structure found for the YSR is fairly consistent across different societies. Using data from 23 and then 44 different societies in confirmatory factor analyses (CFA), Ivanova et al. (2007) and Rescorla et al. (2012), respectively, demonstrated a consistent eight-syndrome measurement model for the YSR. The eight syndrome domains are: anxious/depressed, withdrawn/depressed, somatic complaints, social problems, thought problems, attention problems, rule-breaking behaviour and aggressive behaviour.
Turning to the measurement invariance of the YSR and SDQ self-report across different ethnic/cultural groups, there are several important findings. van de Looij-Jansen et al. (2011) demonstrated all forms of measurement invariance for the proposed factor models with the Dutch SDQ self-report in native Dutch adolescents and ethnic groups of Surinamese, Antillean/Aruban, Moroccan, Turkish and Capeverdian adolescents. Using the original English version of the SDQ and German, Cypriot Greek, Swedish and Italian translations, Essau et al. (2012) tested measurement invariance among adolescents from five European countries. The fit indices indicated that both the five-factor and the three-factor models fitted the whole sample well, but only configural invariance was achieved. In contrast, using the Norwegian translation of the SDQ self-report across native Norwegian adolescents and ethnic groups of Pakistani, Iranian, Turkish, Somali and Vietnamese adolescents, Richter et al. (2011) failed to demonstrate the measurement invariance of the original five-factor model. On the other hand, Verhulp et al. (2014) demonstrated full measurement invariance of the three internalising syndrome scales of the YSR across four ethnic groups comprising native Dutch, Surinamese, Turkish and Moroccan adolescents. On a different note, Lambert et al. (2007) used the German and Jamaican versions of the YSR to test the original model within an item-response theory framework. They demonstrated that some YSR items exhibit differential item functioning, and thus only partially supported its measurement invariance (Lambert et al., 2007).
As briefly reviewed, results on the reproducibility of the factorial structure and the measurement invariance of the two self-reports in multicultural contexts are scarce and heterogeneous. Three of the four studies on measurement invariance were conducted within a single country, considering only ethnic groups; to what extent their findings generalise beyond ethnic minority adolescents in their host nations remains unclear. Another important limitation of these studies is the use of the main language version without cultural adaptations of that version for ethnic minorities. The cultural adaptation of a questionnaire, however, is important for ensuring the conceptual equivalence of its measurements (Poortinga, 1989), in order to avoid possible over- or under-estimation of mental health problems in different ethnic groups. The only study evaluating the measurement invariance of the self-report across several nations included developed European countries (Essau et al., 2012), which significantly limits the generalisability of its findings to undeveloped and developing nations with different socioeconomic development or cultural approaches to mental health. This might indicate that invariant cross-cultural general measures hardly exist, and that cross-cultural comparisons might be justified only for items within a general psychopathological measure that are identified as invariant across cultures.
Therefore, an important question that needs to be examined is the reproducibility of the factorial structure of a self-report across different ethnic/cultural groups, in order to evaluate whether different prevalence rates estimated by one questionnaire across various nations reflect true differences or whether the estimates are contaminated by culture-specific attributes related to the construct of interest. In order to provide more data on the applicability of the SDQ self-report measurement model in a multicultural context, this study was organised to evaluate the measurement invariance of the SDQ self-report across seven national convenience samples from India, Indonesia, Nigeria, Serbia, Turkey, Bulgaria and Croatia, which participate in our International Child Mental Health Study Group (ICMH-Study Group) project (Atilola et al., 2013).
Methods
Participants
Data for the present study were obtained from a project organised by the ICMH-Study Group aiming to research mental health among children and adolescents living in undeveloped and developing countries (Atilola et al., 2013). For the present study, data were available for adolescents aged 13–18 years from India, Indonesia, Nigeria, Serbia, Turkey, Bulgaria and Croatia. The same recruitment procedure was followed in all countries. First, permission to interview students was obtained from local authorities and/or appropriate ethics committees in each region. Afterwards, participants were sampled from the following regions of convenience: Kikinda, Belgrade and Zajecar in Serbia, Haerul Ihwan and Rizki Mulya Rahman in Indonesia, Ms Shelza in India, Ibadan in Nigeria, Primorsko-goranska in Croatia, Varna in Bulgaria and Sanliurfa in Turkey. From these regions, two to five high schools in each country were randomly selected, depending on the number of pupils they had. The schools were randomly selected from a list of schools in the locality, stratified where possible into rural and urban.
The sampling frame per country was 560 adolescents in the 9th to 12th grade. Participants were contacted by the school psychologists or counsellors, who selected them by random picking (in no particular order) from the school register while taking cognisance of gender balance. The adolescents and their teachers were informed of the study by the school psychologists and investigators. Of all those contacted, only those who agreed to participate and returned written consent were included. The adolescents completed the ICMH-Study Group set of questionnaires at school in order to prevent a low response rate. The questionnaires were administered while the adolescents were seated in school with enough space for comfort and privacy. To ensure that teachers and the school authorities had no insight into the adolescents' responses, teachers were excused from the hall and the adolescents were provided with sealable envelopes in which completed questionnaires were returned.
Instrument
The SDQ is a brief behavioural screening questionnaire for 3–16 year olds that exists in several language versions (Goodman, 2001). Specifically, the SDQ self-report is available in more than 77 language versions, translated and culturally validated following linguistic procedures provided by the developer (Goodman, 2001). The language versions used in the study were obtained from www.sdqinfo.org. The SDQ self-report has 25 items organised into five five-item scales: emotional symptoms, conduct problems, hyperactivity/inattention, peer relationship problems and prosocial behaviour (Goodman, 2001). Each item has a three-point response scale (0 = not true; 1 = somewhat true; 2 = certainly true), with the five problem-scale items that reflect strengths reverse scored. The sum of all answered items in a scale gives its scale score (possible range 0–10), while the sum of all answered items in the first four scales gives the total difficulties score (possible range 0–40).
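As a rough illustration, the scoring rule just described can be sketched as follows. The item-to-scale assignments and the five reverse-scored items are our rendering of the standard published SDQ key and should be verified against the official scoring instructions at www.sdqinfo.org; the function name is hypothetical:

```python
def score_sdq(responses):
    """Compute SDQ scale scores from 25 item responses coded 0-2.

    `responses` maps item number (1-25) to a 0/1/2 rating. Assumes no
    missing items; the standard key reverse-scores the five
    strength-worded problem items (7, 11, 14, 21, 25).
    """
    scales = {
        "emotional":     [3, 8, 13, 16, 24],
        "conduct":       [5, 7, 12, 18, 22],
        "hyperactivity": [2, 10, 15, 21, 25],
        "peer":          [6, 11, 14, 19, 23],
        "prosocial":     [1, 4, 9, 17, 20],
    }
    reverse = {7, 11, 14, 21, 25}
    scores = {}
    for scale, items in scales.items():
        scores[scale] = sum(
            (2 - responses[i]) if i in reverse else responses[i]
            for i in items
        )
    # Total difficulties: sum of the four problem scales (range 0-40);
    # prosocial behaviour is reported separately.
    scores["total_difficulties"] = sum(
        scores[s] for s in ("emotional", "conduct", "hyperactivity", "peer")
    )
    return scores
```

Note that an adolescent answering "not true" to every item still accrues points on scales containing reverse-scored items, which is why those items are worded as strengths.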
Data analyses
A series of CFA was conducted to identify the best fitting model that could be applied in all countries. We tested seven different models drawn from the literature reviewed in the introduction: the original five-factor model, a three-factor model, a one-factor model, a bifactor model with five independent specific factors, a bifactor model with five correlating specific factors, a bifactor model with three independent specific factors and, finally, a bifactor model with three correlating specific factors.
All analyses were performed with MPLUS 7.11 (Muthén & Muthén, 1998–2012). The weighted least squares mean and variance adjusted (WLSMV) estimation method was used (Brown, 2006; Finney & DiStefano, 2006). The items were treated as ordinal indicators. WLSMV estimation uses the full weight matrix to compute standard errors for the parameters but avoids inverting it (Finney & DiStefano, 2006).
In order to define the best fitting model, we applied several fit indices. A satisfactory degree of fit requires the comparative fit index (CFI) and the Tucker–Lewis index (TLI) to be close to 0.95, and the model should be rejected when these indices are <0.90 (Brown, 2006). The next fit index was the root-mean-squared error of approximation (RMSEA): values below 0.05 indicate excellent fit, values around 0.08 adequate fit, and values above 0.10 poor fit (Browne & Cudeck, 1993; Kline, 2011). Closeness of model fit using RMSEA (CFit of RMSEA) is a statistical test (Browne & Cudeck, 1993) which evaluates the deviation of RMSEA from the value 0.05; non-significant probability values (p > 0.05) indicate acceptable model fit, though some methodologists would require larger values such as p > 0.50 (Brown, 2006). In order to compare alternative nested models under the WLSMV estimator, we used the DIFFTEST procedure within MPLUS (Asparouhov & Muthén, 2006). We had also planned to test the configural, metric and scalar invariance of the SDQ; however, because we could not identify a measurement model that fitted the data from all countries, these procedures were not performed.
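How these cut-offs jointly classify a solution can be sketched as a minimal illustration (the thresholds are those cited in the text; the function itself is hypothetical and not part of the analysis pipeline):

```python
def appraise_fit(cfi, tli, rmsea):
    """Rough joint appraisal of global model fit using the cut-offs
    adopted in the text: reject when CFI or TLI < 0.90 or RMSEA > 0.10;
    excellent when CFI/TLI >= 0.95 and RMSEA < 0.05; adequate otherwise."""
    if cfi < 0.90 or tli < 0.90 or rmsea > 0.10:
        return "reject"
    if cfi >= 0.95 and tli >= 0.95 and rmsea < 0.05:
        return "excellent"
    return "adequate"
```

Under this rule, a model with CFI = 0.93, TLI = 0.92 and RMSEA = 0.07 would count as adequate rather than excellent, which is the kind of borderline pattern the Results describe as "approaching the adequate level".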
After failing to find a common measurement model across countries, we changed strategy and performed exploratory factor analysis (EFA), again treating the indicators as ordinal, with WLSMV estimation and GEOMIN rotation. In order to decide the number of factors to extract, we considered fit indices and the interpretability of the factor solutions. Cross-loadings were considered salient when the factor loadings were higher than 0.30.
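The 0.30 salience rule for cross-loadings can be expressed as a short sketch (ours; the data structures and function name are hypothetical):

```python
def salient_cross_loadings(loadings, intended, threshold=0.30):
    """Flag items whose absolute loading exceeds `threshold` on a factor
    other than the one they were written for.

    `loadings` maps item -> {factor: loading}; `intended` maps each item
    to its target factor. Returns {item: [factors with salient
    cross-loadings]} for flagged items only.
    """
    flagged = {}
    for item, row in loadings.items():
        others = [
            factor for factor, value in row.items()
            if factor != intended[item] and abs(value) > threshold
        ]
        if others:
            flagged[item] = others
    return flagged
```

An item flagged by this rule in some countries but not others is exactly the pattern reported below for the emotional symptoms items.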
Results
Data from 2367 adolescents were available for this study. There were statistically significant differences between the countries in the participants' age (p < 0.0001) and gender (p < 0.001) (Table 1). Table 2 shows the SDQ scores across the countries.
Distribution of participants by age and gender across seven countries
*χ² (df) = 60.89 (6), p < 0.001; **F (df) = 201.73 (6), p < 0.0001.
Distribution of the SDQ scores across seven countries
Confirmatory factor analyses
The seven competing measurement models and their fit indices across the countries are presented in Table 3. The one-factor model (Model 1) was the starting model and yielded an inadequate degree of fit in all countries. The original five-factor model (Model 2) also yielded an inadequate degree of fit in all countries; only in Croatia did its fit approach the acceptable level, while in all other countries neither the RMSEA nor the CFI and TLI were close to acceptable. Because specification searches based on modification indices are more likely to be successful when the model contains only minor misspecifications (MacCallum, 1986; Brown, 2006), we did not further examine cross-loadings and error covariances in this model. The model depicting three correlating factors (Model 3) also did not reach an adequate degree of fit. The classical bifactor model (Model 4), which specifies one general factor and five uncorrelated specific factors, did not fit the data satisfactorily; indeed, in two countries this model could not even be identified. We then tested a bifactor model with one general factor and three uncorrelated specific factors (Model 5), but this model again did not reach a satisfactory degree of fit. We further estimated two modified bifactor models that allow correlations between the specific factors. Model 6, a bifactor model with five correlating specific factors, yielded an adequate degree of fit in data from four countries: India, Nigeria, Turkey and Croatia. In all other countries, the degree of fit approached the adequate level. We also tested a bifactor model with three correlating specific factors (Model 7), which likewise yielded an adequate degree of fit in India, Nigeria, Turkey and Croatia.
Comparison of Model 6 and Model 7 resulted in significant Δχ² values ranging between 20.0 and 44.1 (at least p < 0.006) with df = 7 in six countries; the Δχ² value was not significant (Δχ² = 7.9; df = 7; p = 0.35) only in the Indonesian sample. Due to the lack of a common acceptable model, in other words the lack of dimensional invariance across the seven countries, it was not possible to perform the other types of invariance test.
Degree of model fit for seven competing measurement models of the SDQ from seven different countries
Note. CFI, comparative fit index; TLI, Tucker–Lewis index; RMSEA, root-mean-squared error of approximation; Cfit of RMSEA, probability of RMSEA.
Exploratory factor analyses
Because the confirmatory analyses revealed that none of the tested models fitted the data well across the countries, we performed a series of EFA on data from each country separately in order to find which factors, with their corresponding items, are replicable across the countries. The EFA was performed with the WLSMV estimator, and we extracted five factors based on previous research and inspection of the eigenvalues and fit indices. Eigenvalues of the factors in each sample are presented in Fig. 1. We applied GEOMIN rotation, an oblique type of rotation (Yates, 1987). Analysing the factor loadings, we identified three factors common to every country (see Appendix 1 online): prosocial behaviour, emotional symptoms and conduct problems. The remaining two factors had different meanings in each country. After inspection of the factor loadings, three items defined the prosocial behaviour factor across countries (item 1 'try to be nice to other people', item 4 'share with others' and item 9 'being helpful'). Some other items also loaded considerably on this factor, but not in all countries; in those cases, these items loaded saliently on other factors. The emotional symptoms factor was formed by five items (item 3 'headaches, stomach-aches or sickness', item 8 'worry a lot', item 13 'unhappy', item 16 'being nervous in new situations' and item 24 'having many fears'); however, all of these items also had one or two salient cross-loadings (>0.30) on other factors in some countries. The conduct problems factor was defined by two items: item 12 'fight a lot' and item 18 'accused of lying'.
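The criterion used here for calling an item replicable, namely a salient loading on the same factor in every country, can be sketched as follows (a hypothetical helper over toy data structures, not the actual analysis):

```python
def items_replicated_on_factor(country_loadings, factor, threshold=0.30):
    """Return the items that load saliently (absolute value above
    `threshold`) on `factor` in every country.

    `country_loadings` maps country -> {item: {factor: loading}}.
    """
    common = None
    for loadings in country_loadings.values():
        salient = {
            item for item, row in loadings.items()
            if abs(row.get(factor, 0.0)) > threshold
        }
        common = salient if common is None else common & salient
    return common or set()
```

Because the result is an intersection, a single country in which an item fails to load saliently removes it from the replicable set, which is why only three prosocial items survived across all seven samples.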
Eigenvalues of factors of EFA.
Discussion
Previous studies with different language versions have provided evidence supporting different models for the SDQ self-report; however, few studies have tested these measurement models across various cultures and several countries. In the present report, we initially tested seven competing measurement models for the SDQ self-report in each country separately. Our results indicate that the original five-factor model had an inadequate degree of fit in all countries, contrary to previous findings that mostly supported this model (e.g., Ronning et al., 2004; Ruchkin et al., 2007; Van Roy et al., 2008; Giannakopoulos et al., 2009). Additionally, the model depicting three correlating factors did not reach an adequate degree of fit, contrary to some previous studies that reported acceptable model fit (e.g., Koskelainen et al., 2001; Dickey & Blumberg, 2004; Riso et al., 2010). Furthermore, the classical bifactor model, which specifies one general factor and five uncorrelated specific factors, did not fit the data satisfactorily either, and could not even be identified in the data from Indonesia and Croatia. Considering that the most recent study supported a modified bifactor model including the five factors and a general problem factor (Kóbor et al., 2013), we included this model and its modifications as well. A modified bifactor model with five correlating specific factors and a bifactor model with three correlating specific factors yielded an adequate degree of fit in India, Nigeria, Turkey and Croatia, with the latter being more appropriate.
This finding implies that the five factors and the general problem factor are common to only four countries. However, due to the lack of a common acceptable model across all seven countries, namely the same number of factors (i.e., dimensional invariance), it was not possible to perform the metric and scalar invariance tests, which indicates that the SDQ self-report models tested lack appropriate measurement invariance across the countries included.
Turning to the results from the series of EFA on data from each country separately, the prosocial behaviour, emotional symptoms and conduct problems factors were common to all countries. However, the originally proposed items for these factors/scales either loaded saliently on other factors besides the proposed ones, or only some of them corresponded to the proposed factors in all seven countries. Three items defining the prosocial behaviour factor were common to all countries, while the emotional symptoms factor was formed by five items, although all of these items had one or two salient cross-loadings on other factors. The conduct problems factor was defined by only two items. The items that loaded consistently on these factors could be regarded as culture-independent in the self-report. Other items, however, especially the items of the peer problems and hyperactivity factors, were perceived differently across the countries and could be regarded as strongly influenced by specific factors, that is, as culture-dependent items.
A recent study using data from Germany, Cyprus, England, Sweden and Italy tested the measurement invariance of the five-factor and three-factor models (Essau et al., 2012). A good fit to the data was found for the whole sample for both models, but it was observed that the SDQ structure might differ across the five countries. The study also confirmed only configural invariance, which has been found to be an insufficient form of invariance for appropriate cross-cultural comparisons (Gregorich, 2006). The findings of our study strongly agree with that study regarding the measurement non-invariance of the SDQ self-report measurement model across nations, indicating that the current SDQ models might not be suitable for comparisons in a multinational cross-cultural context. There may be several reasons for the measurement non-invariance of the SDQ self-report. First, there might be genuine differences in evaluating, reporting and/or expressing psychological symptoms among adolescents from different nations, as was observed for the YSR (Lambert et al., 2007). Accumulated evidence on child and adolescent psychopathology shows significant variations in rates of disorders across socio-cultural/ethnic groups, with culture-specific mental disorders suspected, different manifestations of disorders, and varying degrees of similarity in risk factors across groups (e.g., Nazroo, 1998; Achenbach et al., 2008; Nikapota & Rutter, 2008).
Considering that the SDQ is designed to screen for universally represented symptoms of specific disorders, it is possible that its items are more sensitive in one culture and less in another, or that they are easily confounded by culture-specific attributes related to the construct. In other words, the norms associated with a particular dimension in one culture confound cross-cultural comparisons. In this regard, some items might not represent specific psychopathology as intended, or some items might be unimportant relative to culture-specific reference norms (e.g., Heine et al., 2002), which it was only possible to recognise during the translation and cultural adaptation of the SDQ from English into other languages (Berry et al., 2002). Additionally, Goodman et al. (2010) observed that some labels, such as conduct problems and hyperactivity, may be misleading when applied to general population samples, and this could also be a source of the observed differences. For example, we suspect that some SDQ factors, such as conduct problems and attention-deficit/hyperactivity problems, share common factors (e.g., Heine et al., 2002) that might be perceived and operate in different ways cross-culturally. Moreover, considering the high comorbidity of symptoms in child and adolescent psychopathology, there might be some unrecognised, underlying influence on how the SDQ items cluster in different cultures. Last, considering that we tested only adolescents, there might be less non-invariance if parent or teacher reports were used; this needs to be explored in future studies using a multitrait-multimethod type of analysis, which requires administering all three SDQ reports.
Our findings have several research implications. They imply that the current SDQ self-report measurement model might not allow direct cross-country comparisons of levels of adolescent psychopathology. This finding, if further replicated, has implications for epidemiological and clinical research. When evaluating the impact of interventions targeted at improving the general mental health of children and adolescents in a multinational context, researchers should be cautious in using the SDQ as a measure of pre-/post-intervention changes in mental health. More clinically oriented measures may be more useful, if not outright clinical evaluation itself. This note of caution is even more apt at a time when the impact of the sundry Millennium Development Goal (MDG) interventions may soon be evaluated, with child mental health possibly considered as one of the outcome measures. This does not imply that the SDQ cannot be used for in-country comparisons when specific norms are developed for that country. A currently available alternative to the SDQ self-report for cross-cultural comparative research could be the YSR; however, sufficient evidence for its strong measurement invariance is also lacking. Consequently, cross-cultural comparisons might be justified only for SDQ items identified as invariant across cultures, or the SDQ self-report needs to be revised for meaningful cross-cultural comparisons. Possible revisions to the SDQ should primarily aim at creating items that are culture-independent rather than culture-dependent, so that they can be easily implemented in multicultural contexts. This is probably achievable when future research attempts to contrast the weaker items found in the present study with open-ended questions or, preferably, interviews to validate their meaning, understanding and rating across different cultures. Furthermore, implementing the minor model modifications, response-format changes or item rewording suggested previously would also be of importance (e.g.
Ronning et al. Reference Ronning, Handegaard, Sourander and Morch2004; Giannakopoulos et al. Reference Giannakopoulos, Tzavara, Dimitrakaki, Kolaitis, Rotsika and Tountas2009; Essau et al. Reference Essau, Olaya, Anastassiou-Hadjicharalambous, Pauli, Gilvarry, Bray, O'callaghan and Ollendick2012; Kóbor et al. Reference Kóbor, Takács and Urbán2013). It has also been suggested that symptom ratings may achieve better cross-cultural comparability when assessments are anchored to a more objective standard against which respondents compare themselves, instead of rating symptoms on Likert-type scales (e.g. Heine et al. Reference Heine, Lehman, Peng and Greenholtz2002). Such a standard could be the arithmetic mean for a particular SDQ scale.
There are some limitations that need to be taken into consideration when interpreting our findings. First, there were significant differences in the participants' age and gender distributions across the countries, which could bias the findings, considering that in a specific nation some items might be more or less sensitive or important to age and gender than others. Additionally, only adolescents who agreed to participate were included, and the response rate varied substantially between the countries. Second, participants were sampled from regions of convenience, although schools within those regions were randomly selected, which could limit the generalisability of the findings to adolescents from other regions of these countries. Additionally, although we made sure to include samples with different socioeconomic, cultural and religious backgrounds, this does not imply that the seven countries included are representative of the developing and undeveloped world, which further limits the generalisability of our findings. Third, the data were based solely on the adolescents' self-report, and no behavioural observations or clinical indices were used to corroborate this self-report measure (Purgato & Barbui, Reference Purgato and Barbui2012). Fourth, parent and teacher reports were not tested for invariance, and clinical samples were not included. To improve the robustness of findings, future studies evaluating SDQ models in the multicultural context and attempting to revise the measurement model need to be based on all three SDQ reports and may do well to include clinical samples.
In conclusion, this study showed measurement non-invariance of the SDQ self-report measurement model across several nations, indicating that the current SDQ models might not be suitable for cross-national, cross-cultural comparisons.
Acknowledgements
We would like to thank all the adolescents who participated in this project. In addition, we would like to thank the two reviewers who gave substantial comments on the manuscript.
Financial Support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflict of Interests
None.
Ethical Standards
All adolescents were informed about the aims and procedures of the study. Those who were 16 years and above signed consent forms, while younger participants returned signed parental consent and personal assent forms. Additionally, the authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
Supplementary material
The supplementary materials referred to in this article can be found at http://dx.doi.org/10.1017/S2045796014000201