Sociodemographic and clinical characteristics in child and youth mental health; comparison of routine outcome measurements of an Australian and Dutch outpatient cohort

Aims Although of great value to understand the treatment results for mental health problems obtained in clinical practice, studies using naturalistic data from children and adolescents seeking clinical care because of complex mental health problems are limited. Cross-national comparison of naturalistic outcomes in this population is seldomly done. Although careful consideration is needed, such comparisons are likely to contribute to an open dialogue about cross-national differences and may stimulate service improvement. The aim of this observational study is to investigate clinical characteristics and outcomes in naturalistic cohorts of specialized child and adolescent mental health outpatient care in two different countries. Methods Routinely collected data from 2013 to 2018 of 2715 outpatients in the Greater Area of Brisbane, Australia (CYMHS) and 1158 outpatients in Leiden, the Netherlands (LUMC-Curium) were analysed. Demographics, clinical characteristics and severity of problems at start and end of treatment were described, using Children's Global Assessment Scale (CGAS), Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA) and the parental Strength and Difficulties Questionnaire (SDQ-P). Results Routine outcome measures (CGAS, HoNOSCA, SDQ-P) showed moderate to severe mental health problems at start of treatment, which improved significantly over time in both cohorts. Effect sizes ranged between 0.73-0.90 (CYMHS) and 0.57-0.76 (LUMC-Curium). While internalizing problems (mood disorder, anxiety disorder and stress-related disorder) were more prevalent at CYMHS, externalizing developmental problems (ADHD, autism) prevailed at LUMC-Curium. Comorbidity (>1 diagnosis on ICD10/DSM-IV) was relatively similar: 45% at CYMHS and 39 % at LUMC-Curium. In both countries, improvement of functioning was lowest for conduct disorder and highest for somatoform/conversion disorders and obsessive-compulsive disorders (OCD). Overall, 20-40% showed clinically significant improvement (shift from clinical-range at start to a non-clinical-range at the end of treatment), but nearly half of patients still experienced significant symptoms at discharge. Conclusions This large-scale outcome study showed both cohorts from Australia and the Netherlands improve during the course of treatment on clinician- and parent-reported measures. Although samples were situated within different contexts and differed in patient profiles, they showed similar trends in improvement per diagnostic group. While 20-40% showed clinically significant change, many patients experienced residual symptoms reflecting increased risk for negative outcome into adulthood. We emphasize cross-national comparison of naturalistic outcomes faces challenges, although it can similarly reveal trends in treatment outcome providing direction for future research: what factors determine discharge from specialized services; and how to improve current treatments in this severely affected population.


Introduction
Routine outcome measurement (ROM) data offer unique opportunities to study treatment outcomes in clinical practice, and can help to assess the real-world impact of mental health services for children and adolescents (youth). This is illustrated by studies using naturalistic data from specialist child and adolescent mental healthcare services (CAMHS), showing the proportion of patients with reliable improvement, recovery or deterioration (Burgess et al., 2015;Wolpert et al., 2016), and revealing specific subgroups of patients with greater risk of poor outcome (Garralda et al., 2000;Lundh et al., 2013;Murphy et al., 2015;Edbrooke-Childs et al., 2017). Naturalistic data are therefore undeniably necessary in addition to data derived from randomised clinical trials, which often have limited generalisability due to strict selection criteria (Rothwell, 2005;Van Noorden et al., 2014).
The availability of ROM-data in different countries enables to compare treatment outcomes on a cross-national level. To date, this has rarely been done. There are opportunities to compare, as countries such as Australia and the Netherlands have implemented the same ROM (CGAS, HoNOSCA and SDQ-P) across CAMHS. Since they also have similar levels of economic development, social and demographic profiles and health outcomes (Prins et al., 2011;World Health Organization, 2016;The CommonwealthFund et al., 2017), though a different healthcare system, it is interesting to compare mental health treatment outcomes. There have been few Australian reports on children with a broad range of complex problems showing improvement on a range of measures (Brann and Coleman, 2010;Burgess et al., 2015;Howe et al., 2017;Lu et al., 2021). In the Netherlands, large-scale outcome studies in general youth psychiatric outpatient care are surprisingly lacking.
Cross-country comparisons of child psychiatric outcomes are relevant because they can provide insights into how cultural and country-specific system-and individual level-factors impact mental health outcomes (Canino and Alegría, 2008;Ronis et al., 2017). Regarding Australia and the Netherlands, country-specific factors that might impact outcomes, include differences in the organisation and financing of CAMHS. In Australia, specialised CAMHS, such as CYMHS, are funded by public health insurance coverage, whereas other mental health care services are often privately funded, so come with sometimes substantial out-of-pocket costs (Callander et al., 2017). Due to a high volume of referrals, there is a high threshold of severity and complexity to receive treatment at CYMHS. In contrast, the Dutch government offers free provision of all types of youth mental healthcare, which is regulated at the community level (Hilverdink et al., 2015). Based on these systemlevel factors and our clinical experiences, we would expect a better access to CAMHS in the Netherlands with perhaps lower severity levels of problems in specialised services compared to Australia.
This study described clinical characteristics and outcomes of child psychiatric outpatient treatment in two countries (total sample n = 3873), as measured by symptom reduction (HoNOSCA, SDQ-P) and improvement of global functioning (CGAS). Analyses of cohorts focused on a profile of presenting patients, prevalence rates of diagnoses, completion rates and effect sizes of ROM, the proportion of patients with clinical significant change, and change of global functioning among diagnostic groups. Findings are discussed in the light of contextual factors of both organisations. The aim is to open up conversations about the observed similarities and differences in the context of two specialist tertiary CAMHS within Australia and the Netherlands. Thereby this study contributes to the countryspecific evidence on the effectiveness of outpatient treatments, and to insights on cross-cultural trends of treatment outcome of youth with complex mental health needs.

Settings
CYMHS at Children's Health Queensland Hospital and Health Service, Australia, and LUMC-Curium, the Netherlands, are tertiary level specialist services for youth (aged 2-18) with complex and severe mental health problems. They provide communityand hospital-based services in a catchment area comprising 6 00 000 youth across the Greater Brisbane and Pine River regions in Queensland (CYMHS) and 1 62 000 youth in the northern part of the province 'Zuid-Holland' (LUMC-Curium). They both work in the presence ofand in collaboration with other mental healthcare providers in the region. Both organisations do not focus on the population with intellectual disabilities, however, comorbid psychiatric symptoms of these patients may be treated. Patient-data and ROM-data (collected at baseline and at case reviews) were registered in online records. In Australia this collection process is regulated in specific guidelines (Burgess et al., 2012(Burgess et al., , 2015AMHOCN, 2019).

Subjects
For the current comparison study we followed the same procedure as described by Lu et al. (submitted) in the selection of patients, with the exception that we focused on a different timeframe, and that we included data from the Eating Disorder-team at CYMHS. These exceptions were made to make selections of CYMHS and LUMC-Curium more comparable.
In sum, all patients, aged 5−18 years, attending outpatient mental health services between 2013 and 2018, were selected ( Fig. 1). Data related to the first 'clinical episode' during the timeframe were described. A 'clinical episode' is defined as the total period of care (including all provided services) that is needed for the treatment of a patient across a continuum of care in an integrated system. Clinical episodes may comprise multiple 'episodes of care'. For example, at CYMHS, a change of setting (or service team) means a new 'episode of care' (AMHOCN, 2019). Within LUMC-Curium 'episodes of care' were registered following the DBC-structure (Diagnose-Behandel-Combinatie (Netherlands)). DBC-registrations span no longer than maximally one year and multiple consecutive DBC's could contribute to a continuous period of care. Therefore, in both samples 'episodes of care' have been combined into 'clinical episodes' of care for the patient. The start of a clinical episode is defined as 'admission to CAMHS, without any contact in the previous 3 months', the end as 'discharge from CAMHS, without contact in the following 3 months'. To be able to compare outcomes before and after outpatient treatment, we excluded clinical episodes of less than 30 days, youth with inpatient admissions during their clinical episode, and patients without any psychiatric diagnosis (no F-Code in ICD-10 or no Axis I/Axis II diagnosis in DSM-IV) (Fig. 1).
Outpatient care at CYMHS was provided by seven community teams and the Eating Disorder-team, of which 2715 patients were included. At LUMC-Curium outpatient care was provided by several specialised teams, of which 1158 patients were included (Table 1).

Measures
Demographic data included gender, age at start, duration of episode and country of birth. In Australia, 'indigenous status', is reported, referring to the first nation people of Australia, i.e., 'Aboriginal' or 'Torres Strait Islander' (Department of Social Services, 2015). Diagnoses were classified according to the International Statistical Classifications of Diseases 10th revision ICD-10 (WHO, 2016) at CYMHS, or the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) at LUMC-Curium. Similar diagnoses were combined into major groups (see Appendix A). Psychosocial stressors were registered as Z-codes in ICD-10, or Axis-IV codes in DSM-IV. Diagnostic evaluation (in multidisciplinary meetings) and collection of clinician-rated ROM (CGAS and HoNOSCA) is done by the health care professional responsible for the case. At LUMC-Curium these were well-qualified, certified senior clinician (child-and adolescent psychiatrist or clinical psychologist) with multiple years of clinical experience, at CYMHS also a social worker, psychologist or occupational therapist may have been responsible. Both at CYMHS and LUMC-Curium initial training on HoNOSCA and CGAS was offered to all clinicians. However, completion rates of the training are not available. We assume in both settings sometimes untrained raters completed questionnaires, although the majority of clinicians have been trained.
The CGAS is a clinician-rated measure, assessing the youths' lowest level of global functioning during previous months, rated on a scale of 1 (lowest functioning) to 100 (excellent functioning) (Shaffer et al., 1983). A score below 61 indicates definite pathology, a score below 71 probable pathology (Bird et al., 1990;Dyrborg et al., 2000). End-of-treatment scores up to 60 are associated with increased risk for negative outcome and adversities in early adulthood (Lundh et al., 2016).
The HoNOSCA is a clinician-rated measure, comprising 15 items assessing emotional and behavioural problems in youth during the previous two weeks (Gowers et al., 1997(Gowers et al., , 1999. The first 13 items are used to compute the total score (range 0-52). The last two items address problems with knowledge and understanding and are not reported in most studies. Clinicians score items on a 5-point scale (0-4), ranging from 'no problems' to 'severe problems', or score '9' if unknown. Ratings ⩾ 2 on any item indicate a clinically significant problem.
For the psychometric properties of measures, see Appendix B. Cross-national intraclass correlation coefficient (ICC) was 0.61 for CGAS, and 0.84 for HoNOSCA total score (Hanssen-Bauer et al., 2007). All three measures have shown to be valid and useful for international comparison (Woerner et al., 2004;Becker et al., 2006;Hanssen-Bauer et al., 2007), though evidence regarding the SDQ-P is less strong (Stevanovic et al., 2017). Because of the limited evidence for cross-cultural validity of SDQ-P (Stevanovic et al., 2017), multiple group confirmatory Bi-Factor analysis was applied to assess measurement invariance for the subscale-and total-scale scores at baseline (Rosseel, 2012). In conclusion, while the relative fit indices echoed the results by Stevanovic et al., the absolute fit indices provided sufficient evidence supporting descriptive comparisons of Australian and Dutch SDQ-P subscale-and total-scale scores.

Data analysis
Statistical analyses were conducted using SPSS Version 25 (SPSS Inc., Chicago, IL, USA). A p-value <0.05 is regarded as a statistically significant result. The NOCC criteria (AMHOCN, 2019) were used to select valid ROM-ratings. For HoNOSCA, a minimum of 11 of the first 13 items needed a valid rating (score 0-4). Scores of '9not known' were regarded as missing, which is suggested in guidelines (Gowers et al., 1997;Department of Health and Ageing, 2003). Missing data were excluded from calculations of the total score, which is equivalent as treating them as zero. For SDQ-P, at least three of the five items per domain needed a valid score. Multiple imputation (MI) was used to impute valid, though incomplete SDQ-P-data at CYMHS (3%) and LUMC-Curium (0%).
Statistical analyses concern within-country comparisons between patients' start-and end-ROM-scores. Start-and end-ROM-scores are defined as scores collected within 90 days of the start-or end-of-service date. For all three measures, relevance of improvement was assessed by calculating within-samples effect sizes (Cohen's d ) using the pooled standard deviation from the start-and end-of-treatment means, taking the correlation between means into account (Morris, 2008). In the group of patients with both start-and end-scores available (matched scores), a series of McNemar's tests is used on CGAS, HoNOSCA-items and SDQ-P(TDS), to report on the proportion of patients with clinical significant change. For an individual patient it means the patient shifts from a dysfunctional to a functional population (Wise, 2004). Note that clinical significance for

Epidemiology and Psychiatric Sciences
HoNOSCA was analysed on item level, because a criterion for clinical significance for the total score is lacking. Moreover, the reporting of separate items may better reflect important clinical change (Brann and Coleman, 2010;Boon et al., 2019). To get an impression of the overall change in symptoms on HoNOSCA, we examined change in the number of clinically significant items between start-and end-of-treatment, using a paired sample t-test.
In addition, clinical change was analysed for each diagnostic group. Hereby, we focused on CGAS, because CGAS had most complete data. Per diagnostic group, the proportion of patients scoring in the clinical range was reported for both cohorts. Differences in the proportions between start-and endof-treatment were tested on significance using a series of McNemar's tests.

Patient characteristics and diagnoses
Data show that populations were different considering age and gender distributions (Table 1). Most patients were attending for the first time at CYMHS (92%) and at LUMC-Curium (94%). Internalising problems (mood disorder, anxiety disorder, OCD) were more prevalent at CYMHS; externalising developmental problems (ADHD, autism) prevailed at LUMC-Curium (Table 2). Adjustment disorder, stress-related disorder and eating disorder were also more prevalent at CYMHS. Comorbidity (>1 diagnosis on ICD10 or DSM-IV) was present in 45% at CYMHS and 39% at LUMC-Curium (Table 1).
Matched start-and end-of-treatment scores were available for 1851 (68%) patients at CYMHS, and for 321 (28%) patients at LUMC-Curium. The mean number of clinically significant items per patient decreased from 5.3 (SD 2.0) at start to 3.4 (SD 2.5) at end-of-treatment at CYMHS ( p = 0.000), and from 4.4 (SD 2.0) to 3.1 (SD 2.3) at LUMC-Curium ( p = 0.000). A list of the HoNOSCA-items is found in Appendix C. At discharge, significantly less patients scored in the clinical range on all of the 13 HoNOSCA-items at CYMHS. At LUMC-Curium, also all items showed a significant difference, except items 4 and 11 (Fig. 2b). Besides the statistical significant difference on item-level, which might be influenced by the different sample sizes of cohorts, we looked at trends that were visible in both cohorts. At CYMHS and at LUMC-Curium the most common problem faced by presenting outpatients regarded emotional and related symptoms (item 9), which was reported by 95% (CYMHS) and 82% (LUMC-Curium) of patients. Other prevalent symptoms included relational problems with peers (item 10) or family (item 12); this was reported by 64 and 81%, respectively, at CYMHS, and 58 and 58%, respectively, at LUMC-Curium. Although for a significant proportion of patients these issues resolved after treatment, they remained the most frequently reported problems upon discharge. A substantial difference between cohorts was the initial scoring on problems with selfinjury (item 3), which was 31% at CYMHS compared to 7% at LUMC-Curium.

SDQ-P
In total, 3380 SDQ-P at CYMHS and 1198 at LUMC-Curium were completed over time, originating from 2067 (76%) patients and 499 (43%) patients, respectively. Parental completion rates (SDQ-P) were lower compared to the clinician-rated measurements (Table 3). Patients significantly improved in both Epidemiology and Psychiatric Sciences cohorts, with effect sizes of 0.73 at CYMHS and 0.61 at LUMC-Curium. Matched start-and end-of-treatment scores were available for 413 (15%) patients at CYMHS, and 236 (20%) at LUMC-Curium. Twenty-nine percent of patients at CYMHS and 23% at LUMC-Curium, improved from a clinical TDS-score at start, to a non-clinical TDS-score at end of treatment (Fig. 2c). However, according to parents, still 38% at CYMHS, and 50% at LUMC-Curium experienced significant problems at discharge. Hence, in both countries the parent ratings on the SDQ-P corroborated the clinician ratings on CGAS and HoNOSCA.
Outcome per diagnostic group (Table 4) Exploration concentrated on the proportions of patients showing clinical relevant change on CGAS. At CYMHS and at LUMC-Curium, the largest ΔCGAS was seen in patients with somatoform, conversion or dissociative disorders and OCD. 1 The lowest ΔCGAS was seen in patients with conduct disorder. In both samples, the groups of intellectual disabilities and conduct disorder had highest proportion of patients (70-80%)   scoring in the clinical range (<61) at end of treatment. In the groups of mood disorders and anxiety disorders a higher proportion of patients showed clinical relevant change (40-50%) compared to the groups of ASD, ADHD or 'Emotional, social and behavioural disorders, childhood/adolescence onset' (25-35%). Again, these trends were seen in both cohorts.

Discussion
This study investigated clinical characteristics and treatment outcomes of child psychiatric outpatient care in Australia and the Netherlands. The results reveal both organisations differed in patient profiles and prevalence rates of diagnostic groups. However, similar trends were found. In both countries, patients presented with moderate-to-severe problems and showed significant improvement over time. Improvement was clinically relevant in 20-40% of patients, dependent on diagnostic group, and reported by both clinician (CGAS, HoNOSCA) and parents (SDQ-P). A significant proportion of patients experienced residual symptoms. The similarities observed between the countries and organisations align with outcomes in prior studies of outpatients with severe and complex mental health problems. First, severity of problems at start-of-treatment was high in both cohorts (79-93 and 66-72% in clinical range on CGAS and SDQ-P, respectively), reflecting the specialised settings, similar to other specialised CAMHS (Garralda et al., 2000;Becker et al., 2006;Lundh et al., 2013;Howe et al., 2017;Vugteveen et al., 2018). Second, the treatment outcome between 20 and 40% improving from dysfunctional to a functional level of symptoms were comparable to other studies in this population, in which the percentages range from 20 to 50% (Brann and Coleman, 2010;Lundh et al., 2016;Wolpert et al., 2016;Howe et al., 2017). This is in line with the impression that response to treatments varies little across cultures (Canino and Alegría, 2008). Third, gender patterns of psychiatric disorders were similar in both samples and consistent with findings from epidemiology studies, reporting girls have increased risks for mood-and other internalising disorders with increasing age, and boys are more often diagnosed with ADHD and conduct disorders (Cohen et al., 1993;Garland et al., 2001;Patel et al., 2007). Fourth, patients with conduct disorder improved less after treatment and together with the group with intellectual disabilities (treated for their comorbid psychiatric symptoms) they showed most residual symptoms compared to other diagnostic groups. In addition, in both samples, lower proportions of patients with clinical relevant change were found for externalising developmental problems (ADHD, autism) compared to internalising disorders. These trends might be explained by the more chronic and persistent course of externalising disorders (Wittchen et al., 2000;Roy et al., 2016;Ormel et al., 2017). Nevertheless, this may draw our attention to the specific subgroups in risk of poorer outcomes, to optimise their treatments and to make their problems as manageable as possible. The finding is in line with previous studies in which children with autism and mental retardation seemed more at risk for poor outcome (Lundh et al., 2013;Edbrooke-Childs et al., 2017).
In contrast to our expectations, both cohorts showed similar levels of severity at start. Based on the differences in accessibility of the service, we expected that youth attending CYMHS would have higher initial scores. Furthermore, a substantial part of patients experienced significant problems at the end of treatment: 37% at CYMHS and 50-60% at LUMC-Curium. The following factors could play a role in both countries. In the Netherlands, the government has decentralised and transformed the youth care system since 2014, aiming to put more effort in prevention, and to reduce the use of specialised care (Bosscher, 2014;Hilverdink et al., 2015). Consequently, specialised settings are pressured to refer to universal services if possible, in order to reduce treatment time and costs. In Australia, the high inflow of acute cases in specialised settings is putting pressure on the outflow of patients (Lu et al., 2021). Once symptoms have improved, patients are possibly referred to other services in Australia, which often require out-of-pocket costs. Although we can only speculate, these organisational factors in both countries might have contributed to the discharge (or transition) of patients with subclinical or residual symptoms. However, transitioning of services may increase the risk for drop-out, and clinical scores at discharge reflect risk for persistence of symptoms into adulthood (Lundh et al., 2016). We recognise that it may not be realistic to treat all patients until scores are in the non-clinical range, as some youth will experience enduring and chronic problems. Nevertheless, recovery should be the aim of treatment and should not be hindered by the financial climate.
In the presence of similar trends, both cohorts show different patient profiles which can possibly be explained by contextual factors of mental health organisation. First, in Australia treatment of youth with ADHD or autism is for the most part done by general or developmental paediatricians, while in the Netherlands the far majority of these patients are seen by child and adolescents psychiatrists. As a result, the percentage of youth with ADHD or autism was three times lower at CYMHS than at LUMC-Curium, and youth with internalising problems prevailed. Consequently, at CYMHS the proportion of boys is lower and the mean age at intake is higher. At LUMC-Curium the prevalence rate of eating disorders, adjustment disorders and trauma-and stress-related disorders were lower than at CYMHS. For eating disorders, this difference is possibly explained by the presence of another CAMHS-organisation specialised in eating disorders in the same region as LUMC-Curium. For adjustment disorder, it is likely due to a difference in insurance coverage, where treatment for this disorder is covered in Australia, but not covered in the Netherlands. Regarding stress-related disorders, the difference might be related to the 'acute stress disorder', which is regularly diagnosed at CYMHS, but is lacking at LUMC-Curium. This probably also is due to financial issues in the Netherlands. Second, the lower scores on SDQ-P at LUMC-Curium were expected because population-based studies showed lower normscores for the Dutch population (Theunissen et al., 2016), compared to the Australian population in which the spread of scores is in line with the UK norms (Mellor, 2005). At last, as the difference in duration of treatment between samples might represent an actual difference, we cannot exclude the contribution of methodological factors such as different methods of registration, more regular evaluations at CYMHS, or other system-level or individual-level factors.
Overall these findings point to the importance of including contextual factors, such as organisation of healthcare providers, the geographic availability of services, insurance coverage and methods of registration, as a background when interpreting results. The challenge for future research is to incorporate these factors in their models.
Although studying clinical naturalistic data carries undeniable relevance, this study bears some noteworthy limitations. The high degree of missing data and the lower level of completion rates of ROM (especially for the SDQ-P) makes data possibly biased (Wolpert and Rutter, 2018). More effort is needed in collecting higher completion rates, to achieve better generalisability. Further, in both naturalistic settings, no data were available on interrater-reliability within these organisations. Nevertheless, when carefully interpreting these data, the use in the context of complex adaptive mental health systems, can support dialogue on service improvement and learning (Wolpert and Rutter, 2018). Furthermore, results in change of ROM should be interpreted with care, because changes could have occurred naturally or may be due to factors as regression to the mean or other environmental factors influencing a child's outcome. In addition, access to 'start' CGAS ratings might have biased the clinician, when rating the 'end' CGAS. However, differences in CGAS change were seen among diagnostic groups, which moreover emerged in both countries. This suggests clinicians involved nuanced ratings, rather than simply rated patients as improved. Furthermore, findings were corroborated by a parent-report measure, which lessened the risk of clinician bias. Unfortunately, we were not able to include data on the type of intervention, although it is a factor associated with outcome. It will be of interest for future studies, to look into differences in treatment provision related to outcome for specific populations.
Unfortunately, the data about number of visits, or number of direct time (treatment minutes) turned out to be incomparable. Future studies conducted with data from clinical practice are advised to pay more attention to collecting data about the intensity of treatment in a standardised way, so that data can be better compared. In sum, these data have great potential to broaden the understanding of complex and severe mental health problems in children and adolescents. The present observational study revealed substantial improvement in both cohorts of CYMHS (AUS) and LUMC-Curium (NL) and similar trends in outcome across countries. However, many patients experienced residual symptoms at discharge, which increases the risk for impairment of functioning into adulthood. These data guide future research to further investigate what factors influence discharge from specialised services, and how to improve current treatments in this severely affected population.