School-based interventions targeting stigma of mental illness: systematic review

Aims and method To systematically review the published literature on the effectiveness of classroom-based interventions to tackle the stigma of mental illness in young people, and to identify any consistent elements within successful programmes. Results Seventeen studies were included in the analysis. A minority of studies reported a positive impact on stigma or knowledge outcomes at follow-up and there were considerable methodological shortcomings in the studies reviewed. These interventions varied substantially in content and delivery. It was not possible to use these data to draw out what aspects make a successful intervention. There is currently no strong evidence to support previous conclusions that these types of intervention work for children and adolescents. Clinical implications When anti-stigma interventions for young people are rolled out in the future, it is important that the programme design and method of delivery have evidence to prove their effectiveness, and that the audience and setting are the most appropriate to target. There is a current lack of strong evidence to inform this.


Inclusion criteria
The types of studies included (using Cochrane Effective Practice and Organisation of Care (EPOC) group definitions) were randomised controlled trials (RCTs), cluster RCTs, non-randomised controlled trials (NRTs), or controlled before-after studies (CBA). Participants were children or adolescents attending primary or secondary school. School-based interventions targeting attitudes and stigma about mental illness were included. Studies were included if they measured outcomes of: knowledge/beliefs and attitudes towards mental illness, behavioural intentions, stigmatising behaviour or affect. The analysis of help-seeking outcomes is not covered in this review, because help-seeking is not directly associated with stigmatising attitudes/behaviour. Level of knowledge is also not directly associated with stigmatising attitudes, but these outcomes are included as many of the 'knowledge' measures contain some belief and attitude statements. Known reliability/validity of the instruments was not an inclusion criterion, but is commented on within the results.
Declaration of interest None.

Catriona Mellor 1

Search methods and study selection
The following databases were searched: Medline, CINAHL and PsycINFO (1990-2013, articles in English) on 12 June 2013, using the keywords (stigma* OR attitude* OR awareness) AND (school OR adolesc*) AND (educat* OR train* OR program*) AND (mental OR schizophreni* OR psychiatri*). The reference lists of relevant reviews were also checked. [16][17][18][19][20][21] Studies were selected for inclusion by screening titles, abstracts and, when necessary, full texts against the inclusion criteria.
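The Boolean structure of this search can be illustrated with a minimal sketch. It is not the authors' search script: the function name `build_query` and the constant `KEYWORD_GROUPS` are hypothetical, and each database interface has its own query syntax, so the resulting string would probably need adapting before use.

```python
# Keyword groups as listed in the search methods. Terms within a group are
# alternatives (joined with OR); every group must match (joined with AND).
KEYWORD_GROUPS = [
    ["stigma*", "attitude*", "awareness"],
    ["school", "adolesc*"],
    ["educat*", "train*", "program*"],
    ["mental", "schizophreni*", "psychiatri*"],
]


def build_query(groups):
    """Join each group with OR, wrap it in parentheses, then join groups with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


query = build_query(KEYWORD_GROUPS)
print(query)
```

Keeping the keyword groups as data rather than as one long string makes it easier to see which concept each group covers (outcome, population, intervention, condition) and to rerun the search with an added term.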

Data extraction and critical appraisal
A data-extraction form based on the Cochrane EPOC group's data-collection checklist was used to record details about study characteristics, intervention design, outcome measures and results. Following this process the group's recently updated 'suggested risk of bias criteria for EPOC reviews' 22 was used to make judgements on the risk of bias (high, low or unclear) in each study in each of the domains suggested by the document. The domains assessed were: allocation sequence generation and concealment, baseline outcome measures and characteristics, comparison of site profiles (if applicable), protection against contamination, masking, completeness of outcome data, and outcome reporting (whether data for each outcome, group and time point were fully presented). In addition, the reliability and validity of the instruments used, as documented in the study reports, were noted.

Data synthesis
The review looked at the intervention effect of each study by comparing before and after outcome scores in the intervention and control groups. First, studies that provided follow-up data (rather than simply immediate post-test data) were reviewed. Of these, studies that reported a positive result (a statistically significant, P<0.05, change in any outcome measure compared with control) after the intervention were selected. These studies were reviewed for study quality, as judged by study design and risk of bias criteria. Studies with positive results at immediate post-intervention only were then reviewed for study quality. Positive results based on the use of specifically developed outcome measures with low reliability were excluded. To answer the second review question, the intervention design features (such as duration, contact or non-contact, delivery) of those studies showing positive results were tabulated and compared.
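The grouping steps described above can be sketched as a small classification routine. This is an illustrative reading of the synthesis procedure, not the authors' method: the `Outcome` and `Study` classes, the `classify` function and the three example studies are all hypothetical, and the data are invented purely to exercise the logic.

```python
from dataclasses import dataclass

SIGNIFICANCE_LEVEL = 0.05  # threshold stated in the text (P<0.05)


@dataclass
class Outcome:
    name: str
    timepoint: str              # "post-test" or "follow-up"
    p_value: float              # intervention versus control
    favours_intervention: bool
    reliable: bool              # False for study-specific measures with low or untested reliability


@dataclass
class Study:
    label: str
    outcomes: list


def positive_at(study, timepoint):
    """True if any reliable outcome at this time point improved significantly versus control."""
    return any(
        o.timepoint == timepoint
        and o.reliable
        and o.favours_intervention
        and o.p_value < SIGNIFICANCE_LEVEL
        for o in study.outcomes
    )


def classify(study):
    """Assign a study to one of the three groups used to organise the results."""
    if positive_at(study, "follow-up"):
        return "positive at follow-up"
    if positive_at(study, "post-test"):
        return "positive at post-test only"
    return "no positive result"


# Invented example data (hypothetical studies A, B and C).
studies = [
    Study("A", [Outcome("social distance", "follow-up", 0.01, True, True)]),
    Study("B", [Outcome("attitudes", "post-test", 0.03, True, True),
                Outcome("attitudes", "follow-up", 0.40, True, True)]),
    Study("C", [Outcome("knowledge", "follow-up", 0.001, True, False)]),
]
groups = {s.label: classify(s) for s in studies}
print(groups)
```

Study C shows the effect of the reliability rule: a significant result on an unreliable, study-specific measure does not count as a positive finding.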

Results
Of the 1261 studies identified in the initial search, 17 met the above criteria (Fig. 1). 23

Intervention and study characteristics
The interventions varied in content and delivery methods (online Table DS1). Nine were education-only, 24-32 whereas eight had indirect 33,34 or direct [35][36][37][38][39][40] contact with someone with lived experience. Fifteen studies targeted secondary-school pupils and two targeted primary-school pupils. 28,31 One included a few individuals over 18. 35 The duration ranged from one-off interventions lasting 30-120 min to multiple sessions over a period of up to 4 months. The focus of the interventions was mental illness in 11 studies, schizophrenia in 3 and depression in 3. Five studies investigated the impact of already established interventions. 30,[36][37][38]40 The number of participants varied from 40 to 616. The follow-up time ranged from immediately post-intervention only up to 12 months. The outcome measures were secondary outcomes in two studies, 26,27 which are shown at the end of Table DS1. One study was an RCT. 34 Five studies were cluster RCTs, two using cluster randomisation at the school level 24,28 and three using cluster randomisation at the class/year level within selected schools. 32,33,40 Four studies were NRTs and six were CBA trials. It is unclear whether one study was an NRT or CBA. 38 The comparison groups, other than Chan et al's, 33 which compared three intervention conditions, had normal lessons (no intervention) in 14 of the studies, a talk about healthy living from external speakers in 1 32 and a video presentation about smoking in another. 34 The vast majority therefore did not control for the effect of a novel programme, in many cases with outside speakers. Table DS1 shows all outcome measures used within the studies. Results from two additional scales were excluded as irrelevant to the review question (the Self-Efficacy Scale 38 and the Strengths and Difficulties Questionnaire 29 ).
Of the remaining 31 outcome measures used (and reviewed here), most were 'stigma' measures: attitudes, behavioural intentions and, in one study, an affect measure. 40 In addition, several studies tested factual knowledge gained. No studies measured actual behaviour. All measurements were self-report Yes/No, True/False or Likert-style questionnaires, except for the Implicit Association Test (IAT), 34 where participants categorise words as quickly as possible.

Outcome measures
For 14 of the instruments reliability was reported as good; in all but one of these cases the studies reported internal consistency (Cronbach's alpha) to back up that claim. The instruments' validity was usually not mentioned in the report, although some studies used previously well-tested instruments.
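For readers unfamiliar with the statistic, Cronbach's alpha for a k-item questionnaire is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of respondents' total scores). The following is a minimal sketch of that standard formula, not code from any of the reviewed studies; the function name `cronbach_alpha` and the example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are needed")
    items = list(zip(*scores))                       # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary between respondents")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Invented example: three respondents answering a three-item scale.
# The items move together perfectly, so alpha reaches its maximum of 1.
example_scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
alpha = cronbach_alpha(example_scores)
print(round(alpha, 3))
```

Alpha measures only how consistently the items of one instrument vary together; a high value says nothing about whether the instrument measures stigma validly, which is why the two properties are reported separately here.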
A total of 13 of the instruments were designed for the intervention or study; 6 of these had poor (or untested) reliability, casting doubt also on their validity 24,[29][30][31]37,40 and therefore on the results that they provide. These six were all knowledge measures. The other seven were piloted and/or internally consistent. 30,31,33,36

Study quality
Details of study quality are provided in Table 1. Only one study, a cluster RCT, adequately described randomisation and allocation concealment. 24 Baseline outcome measures and baseline group characteristics were clearly compared and similar in nine (in addition, one study showed similarity in one but not the other outcome 37 ) and six studies respectively. Four studies used different sites as their control and intervention groups and none of these studies clearly compared (with measures of significance) the sites' profiles. Only these four studies were able to clearly protect against contamination. Because self-report questionnaires were used, none of the outcome measures were masked or objective (the IAT is 'self-report' but aims to assess automatic memory associations and is therefore less open to bias 34 ). There was the potential for attrition bias to be introduced because of incomplete data in 13 studies. Most of the studies did not omit important outcome data in their reports. Four studies mention a power calculation. One was underpowered, 40 the other three report having sufficient sample size. 26,27,29 Various methods were used to enhance consistency of delivery. In two studies the presenters were trained and sessions monitored for fidelity 29,40 and two interventions used a computer program. 26,27 Five others mention training the presenters, 24,33,34,37,38 and the remaining eight provided material for the presenters to follow.

Intervention effects
To answer the first review question it is helpful to look at whether the studies with positive changes in stigma (and knowledge) outcomes are of high enough quality to give confidence in their findings. The final two rows of Table 1 show which studies reported statistically significant results at follow-up (for knowledge and stigma outcomes). Results of each outcome measure are tabulated as either reporting a significant positive change (a tick) or no significant change (a cross). Table 2 gives an overview of the results reported in the studies at post-test and follow-up, and indicates whether the outcomes measured changed significantly (a tick) or not (a cross). Results from the six outcome measures developed specifically for the interventions they were testing, with reliability not measured (or α<0.7), are not included in this section. Table 2 shows which outcome measures this applies to (represented by /). For two studies, where this involved the only instrument used, 24,29 there are therefore limited conclusions that can be drawn here despite the fact that they did otherwise have relatively good methodology, according to the risk of bias table.

Studies with positive results at follow-up
Twelve studies collected information at follow-up. Of these, seven studies showed some statistically significant positive changes at follow-up, 25,[30][31][32][33]35,36 and these are summarised here. All were at high risk of selection bias except for the two cluster RCTs, which did not, however, have a clearly described method of randomisation. All had high-risk levels of attrition or an unclear description of individuals who dropped out, except for Economou et al. 32 Economou et al's 32 cluster RCT compared change in mean score per item on their belief/attitude questionnaire and reported that 8/10 items were answered significantly better at follow-up than baseline in the intervention group. They report no significant change in the control group but do not present these data. There was no significant improvement in social distance scores at follow-up. 32 Chan et al's 33 cluster RCT showed significant positive change in knowledge and social distance but not stigma at follow-up. This study discarded 35% of its data (because of absenteeism or incomplete measures being returned) and it was not clear which group(s) these missing data came from.
Ventieri et al's 31 study in a primary school used Schulze et al's 36 social distance scale and a novel instrument to test 'benevolence' and 'unkindliness', piloted on a group of pre-adolescents and tested for reliability. The intervention group showed positive change compared with the controls in all three measures. Schools were invited into the study based on assignment (to control or intervention). 31 In Wahl et al's 30 study, mean total score in knowledge, attitudes and social distance (on scales developed for the study) improved slightly but significantly. Only 47% of eligible pupils were included in the analysis (those who took part in the three-session programme and completed all questionnaires). Schulze and colleagues summed the number of positive responses from each student on their novel instrument testing for stereotypes and social distance. 36 Stereotypes, but not social distance, changed more positively in the intervention than in the control group. This study reported significant differences in baseline outcome measures and baseline characteristics, likely related to the fact that their intervention group chose to take part in the mental illness module. Ng & Chan 35 report a significant improvement in 2/6 Opinion about Mental Illness in Chinese Community (OMICC) factors (benevolence and stigmatisation) between the intervention and control groups, but a significant worsening in both groups in attitudes to restrictiveness.

Table 2 notes: cross, no statistically significant difference; -, outcome not measured in study; n/a, outcome not measured in study at this time point. More than one tick or cross in a cell indicates that more than one outcome measure was evaluated. Results from the six outcome measures developed specifically for the interventions they were testing, with reliability not measured (or α<0.7), are not included. If this leaves no results at a time point, this is represented by /.
Esters et al's small study (n = 40) reported statistically significant positive change on a well-validated scale measuring opinions about mental illness. 25

Studies with positive results at post-test but not follow-up
There were a further four studies that report statistically significant positive results at immediate post-test only (Table 2). They all have high or unclear risk of selection and attrition bias. Pitre et al's three-session puppet show in a primary school reports positive change for the intervention group on the adapted Opinions about Mental Illness (OMI) scale, in 3/6 factors. 28 Robinson et al's study reports significant changes (compared with baseline and control) after their 2 h session on stigma and attitudes. 39 The studies of Rickwood et al 37 (one-session intervention) and Conrad et al 38 (1-day intervention) do not present any data other than regression statistics, making their findings hard to assess.

No positive results
Some studies showed no significant changes at either post-test or follow-up. Saporito et al 34 was the only RCT, randomising at pupil level, although it is not clear what method of randomisation they used. There was no significant improvement in implicit or explicit attitudes to mentally ill people. Pinto-Foltz et al 40 carried out a cluster RCT with more low-risk scores than most of the other studies reviewed. They provided a one-session intervention and found no post-intervention difference in stigmatising attitudes. O'Kearney et al's 26 and O'Kearney's 27 studies of five online sessions (one in males, one in females) recorded results at 5 months. Attitudes (and depression literacy in the later study) were measured as secondary outcomes but showed no significant change.

Effective intervention design
To answer the second review question it is necessary to see whether there are any consistent features in the intervention programmes in those studies that show positive results. However, the comparison between the results of studies describing such different interventions and methodology is difficult. Chan et al 33 is the only example of a study investigating which aspect of a one-off session might offer the most benefit. The most improvement was seen in the group that had education (a 30 min lecture) followed by a 15 min video (rather than vice versa, or purely education).
Of the studies with positive results at follow-up there is no obvious pattern about what makes a successful intervention. These seven studies include two interventions of only one session and one of the longest interventions (over 10 weeks). Four had no element of contact, two involved direct contact and one involved indirect contact. The follow-up time at which the positive results were recorded ranged from 1 to 12 months. One study was in a primary school.

Discussion
Within the literature there are frequent references to the existing evidence for the effectiveness of school-based interventions in reducing stigma of mental illness in young people. This systematic review of available evidence does not support those statements. Showing a significant difference in self-report questionnaires immediately after an intervention seems unsurprising and, if that is the limit of the effect of the programme, seems insufficient grounds for rolling out the programme more widely. It is proposed here that a successful programme would show a positive change in outcomes at follow-up, which was the case in seven studies. 25,[30][31][32][33]35,36 However, the potential for selection and attrition bias, which can exaggerate intervention effect, is a common theme in all but one (Economou et al 32 ) of these studies.
There is one RCT and five cluster RCTs within this body of evidence. Only two of these showed statistically significant improvements in outcome measures at follow-up. Only one of the RCTs clearly described its randomisation process, making it difficult to judge the risk of selection bias in the others. Of the other study designs, Naylor et al's 29 study stands out as having a greater number of low-risk scores. Small positive changes were seen in their knowledge measure but the validity of the tool used remains doubtful.
There are insufficient data to answer the review question concerning how one might design a successful intervention. Unfortunately, no elements were found to be consistent between the studies with positive results. In the absence of this evidence it is tempting to extrapolate from similar adult studies (summarised in a review as showing positive results 19 ). However, two papers present findings that caution against this. 'In our own voice' had positive results in adults but 'disappointing' results in adolescents, 40 and a more recent meta-analysis of anti-stigma approaches reported that although 'contact' was better than education at reducing stigma in adults, the reverse was true in adolescents. 21 Results from studies to date leave uncertainty as to whether interventions to reduce stigma in schools are not effective, whether interventions have been unsuccessful because they have not contained the right combination of elements or whether the studies have not been designed in such a way as to demonstrate efficacy.
Challenges in developing interventions include the need to assess different elements of programme content (contact, educational, etc.) and delivery style against each other. Information is also needed about whether targeting certain groups of children is more successful than universal provision. Indeed, not all students may need an intervention of this type. Only a third of pupils in a Scottish study reported moderate-high levels of stigma. 41 It is also unclear whether the primary-school age child would be more open to anti-stigma messages, as very few studies have been carried out in this age group.
It is proposed here that the starting assumption when developing an intervention is that it should be long enough and intensive enough to provide some effect at follow-up. The studies reviewed here do not agree on how long a successful intervention should be or at what interval to assess follow-up.
There are daunting issues for study designers to contend with in this field. Stigma is a multifaceted concept, and even well-established measures have their limitations (for example social distance scales not being validated against discriminatory or supportive behaviour 42 ). These measures are self-report questionnaires, which are at risk of social desirability bias (particularly, it could be argued, if done after an anti-stigma intervention). The absence of measures to examine change in behaviour after anti-stigma programmes has been recently commented on in a meta-analysis as regrettable. 21 Resources may first need to be directed towards refining age-appropriate measures more closely linked to actual behavioural outcomes. Adverse effects of an intervention also need to be monitored. Recruiting pupils within a school environment is also challenging. Recruitment difficulties in some of the studies described led to a need to actively recruit volunteers to the intervention arm, leading to problems with selection bias. It is also resource heavy to expose the control group to a different type of intervention; hence most of the controls in these studies were simply exposed to 'normal lessons'.
The protocol of a proposed UK-based feasibility trial 43 tackles many of these methodological issues. This well-powered study plans to have an active and randomised control (describing adequate sequence generation and allocation), comparing education with education and contact, carefully prepared material already piloted, 2-week and 6-month follow-up, and will compare the intervention effect by baseline characteristics. 43 If this trial does not suffer from significant implementation and reporting difficulties the results will be the most definitive to date. This review shows that, although it is inherently attractive to believe that school-based interventions reduce the stigma of mental illness in young people, there is currently no strong evidence to support this conclusion.