A systematic review of the psychometric properties, usability and clinical impacts of mobile mood-monitoring applications in young people

Background. Mobile mood-monitoring applications are increasingly used by mental health providers, widely advocated within research, and a potentially effective method to engage young people. However, little is known about their efficacy and usability in young populations.
Method. A systematic review addressing three research questions focused on young people: (1) what are the psychometric properties of mobile mood-monitoring applications; (2) what is their usability; and (3) what are their positive and negative clinical impacts? Findings were synthesised narratively, study quality was assessed, and results were compared with evidence from adult studies.
Results. We reviewed 25 articles. Studies on the psychometric properties of mobile mood-monitoring applications were sparse, but indicated questionable to excellent internal consistency, moderate concurrent validity and good usability. Participation rates ranged from 30% to 99% across studies, and appeared to be affected by methodological factors (e.g. payments) and individual characteristics (e.g. IQ score). Mobile mood-monitoring applications are positively perceived by youth, may reduce depressive symptoms by increasing emotional awareness, and could aid in the detection of mental health and substance use problems. There was very limited evidence on potential negative impacts.
Conclusions. Evidence for the use of mood-monitoring applications in youth is promising but limited due to a lack of high-quality studies. Future work should explicate the effects of mobile mood-monitoring applications on effective self-regulation, clinical outcomes across disorders and young people's engagement with mental health services. Potential negative impacts in this population should also be investigated, as the adult literature suggests that application use could potentially increase negativity and depression symptoms.


Introduction
Mood is an affective dynamic, which naturally varies across time and contexts (Trull et al. 2015). Problems with regulating mood can play a key role in the development and trajectory of a range of psychopathologies (Paris, 2004; Crowell et al. 2009; Marwaha et al. 2015). Traditionally, mood has been assessed with retrospective measures (Trull et al. 2015). This can increase the risk of recall bias, subsequently reducing accuracy (Schwartz et al. 1999; Reid et al. 2009). The relatively recent use of ecological momentary assessment (EMA) facilitates the real-time assessment of mood by collecting data on multiple occasions throughout the day (Wenze & Miller, 2010). Thus, it may be more suitable for understanding daily mood changes (Cristobal-Narvaez et al. 2016; van Knippenberg et al. 2016).
Various EMA techniques exist, ranging from paper-and-pencil to physiological assessment (Wenze & Miller, 2010) to digital data collection. A number of UK governmental reports (HM Government, 2011;Department of Health, 2013) highlight the benefits of digital tools and Information and Communications Technology (ICT) in aiding the objective, reliable assessment and care of mental health problems. With demand for mental health services outgrowing available resources (Department of Health, 2013), technology might relieve some of this pressure by providing remote resources that increase access to effective treatment while reducing clinician load.
Applications ('apps') offer great promise to young people who are disproportionately affected by mental illness or may struggle to engage with mental health services (Seko et al. 2014). Apps are delivered in a medium young people are familiar with. Figures from Ofcom (2015) indicate that 90% of youth between the ages of 16 and 24 own a smartphone, regardless of sociodemographic domain. Given this widespread ownership and apparent attachment to mobile technology (Ofcom, 2015), youths might feel more comfortable with assessments and treatments utilising mobile apps.
Studies included in these reviews provide some evidence for the psychometric properties of these apps, e.g. internal consistency (Palmier-Claus et al. 2012) and concurrent validity (Faurholt-Jepsen et al. 2014). There is also evidence for usability (Bardram et al. 2013). Participation rates are generally high across studies sampling adults, ranging from 65% (Depp et al. 2015) to 88%, though Depp et al. (2012) reported much higher completion rates for paper-and-pencil compared with app measures (82.9% v. 42.1%). Evidence also suggests that apps may help people with mental health problems to monitor triggers (Bardram et al. 2013), that the capacity to convey experience can be therapeutic, and that apps could be a useful tool for improving patient-clinician communication.
Less is known about the use of mental health apps, particularly mood-monitoring apps, in youth (10-24 years). A scoping review by Seko et al. (2014) suggested that mood-monitoring apps are positively perceived by youth (Matthews et al. 2008a), may improve treatment adherence (Matthews et al. 2008b) and possibly improve mental wellbeing. While intriguing, findings were preliminary due to the low quality of available evidence (NCCMH, 2014), the small number of studies on mood-monitoring apps specifically and the limited number of apps studied (n = 2) (NCCMH, 2014; Seko et al. 2014).
In summary, mood-monitoring apps offer a potentially important step change in the assessment of mood and delivery of youth mental health services. Despite this potential and the widespread advocacy for their use (e.g. Firth et al. 2016; Sandstrom et al. 2016a), there are no extant reviews examining the psychometric properties, usability and clinical impacts of mood-monitoring apps in young populations. Therefore, a systematic review was completed to address the following research questions among clinical and non-clinical youth populations: (1) what are the psychometric properties of mobile mood-monitoring apps; (2) what is their usability; and (3) what are their positive and negative clinical impacts? Our secondary aims were to frame our findings within the adult literature, and to conduct a quality assessment to examine potential sources of bias.

Method
Following a scoping review, the authors developed the protocol delineating the planned methodology. The review was conducted in adherence to this protocol, and in line with the PRISMA statement (Moher et al. 2009).

Information sources and search strategy
The following sources were searched: Medline, EMBASE, PsycINFO, ProQuest Dissertations & Theses, ProQuest SciTech Collection, the Association for Computing Machinery (ACM) Guide to Computing Literature and Web of Science for articles published from 2008 [the year when the first app was launched (Donker et al. 2013)]. Search terms were informed by previous reviews (Seko et al. 2014), and modified following advice from a medical librarian and field experts. The search was conducted by combining five groups of terms (see online Supplementary Table S1) relating to: type of technology (e.g. 'mhealth'), type of assessment (e.g. 'ambulatory assessment'), mood-related outcome or problem (e.g. 'bipolar disorder'), youth population (e.g. 'youth'), usability/treatment-related outcomes and psychometric properties (e.g. 'reliability', 'validity'). We were interested in all forms of validity potentially examined in the app literature, e.g. concurrent, face or predictive, though we anticipated a paucity of studies due to the novelty of the field. We defined the 'usability' of mood-monitoring apps in accordance with the International Organisation for Standardisation (2001) definition of usability, i.e. 'the capability of the software product to be understood, learned, used and attractive to the user, when used under specified conditions'. Consistent with previous systematic reviews (Donker et al. 2013), we included young people's participation rates (i.e. compliance, response and completion) and how apps were perceived by youths (including their acceptability: how satisfied they were with the app, whether it could be used with ease) as markers of usability.
MD conducted a hand search of articles published in Cyberpsychology, Behavior, and Social Networking, the Journal of Medical Internet Research (JMIR), JMIR Mental Health, and JMIR mHealth and uHealth over the last 5 years. An additional search of the first 15 pages of Google Scholar was conducted (search terms 'mood', 'phone', 'app' and 'monitoring'). Reference lists and in-text citations of relevant articles were inspected. Finally, subject experts were approached to identify additional articles.

Study selection
Inclusion criteria were: (1) Apps must have been developed for, and delivered through, mobile phones or smartphones; (2) Participants aged 10-24 years (consistent with the World Health Organisation's definition of young people; World Health Organisation, 1986); (3) Studies included published and unpublished research reported in the grey literature; (4) Studies must have been published in the English language; (5) Studies must have been published in 2008 or later; (6) Studies must have included community or clinical populations (to ensure the inclusion of sub-clinical youth, who may subsequently access care).

Screening procedure
Following removal of duplicates, MD and ML independently screened 100% of titles and abstracts for full-text retrieval. MD assessed full-text articles against the inclusion criteria and extracted relevant data.

Quality assessment
MD evaluated the quality of included studies for potential risk of bias using Cochrane's risk of bias tool, in which studies are allocated a rating of high, low or unclear risk of bias (Higgins et al. 2011).

Data synthesis
Quantitative and qualitative data were synthesised narratively.

Study selection
A total of 1747 articles were identified in the initial search, and 19 from the hand search (Fig. 1). Following removal of duplicates, 1176 abstracts were screened, 86 of which were selected for full-text retrieval. There was a high level of agreement between raters (κ = 0.90). In total, 64 articles were excluded following full-text review. Three additional articles were identified following inspection of included studies. Twenty-five articles were included in the final review. Table 1 outlines study methodology, the characteristics and features assessed in the studies, and main findings. Three studies reported on a randomised controlled trial (RCT): one was the primary RCT (Reid et al. 2011), and two reported secondary analyses with the same dataset (Kauer et al. 2012; Reid et al. 2013). The remaining studies were non-experimental or quasi-experimental. The search identified 19 published studies and six unpublished studies (four conference proceedings; two theses). The majority of studies (n = 16) were quantitative; the remaining nine employed mixed methods. Sample size ranged from 6 to 108 996 participants. Eight studies recruited healthy participants. Eleven studies recruited participants from clinical populations including youth with a range of mental health, emotional or behavioural problems, such as depression (n = 8), high-functioning autism/Asperger's disorder (n = 2) and substance or alcohol use (n = 1). The remaining six studies recruited participants from mixed populations comprising healthy, mentally ill or substance-using individuals. Mean ages across studies ranged from 10.95 to 23.7 years.
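For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for two raters making include/exclude screening decisions. The function name and the ten example decisions are hypothetical and are not data from this review; the sketch only illustrates the standard formula, kappa = (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over all labels used by either rater.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical screening decisions for ten abstracts (illustration only).
rater_1 = ["include", "exclude", "exclude", "include", "exclude",
           "exclude", "exclude", "include", "exclude", "exclude"]
rater_2 = ["include", "exclude", "exclude", "include", "exclude",
           "exclude", "include", "include", "exclude", "exclude"]

print(round(cohens_kappa(rater_1, rater_2), 2))
```

Because kappa corrects for agreement expected by chance, it is lower than the raw proportion of matching decisions whenever one decision (here, exclusion) dominates.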

Study characteristics
Methods across studies varied greatly. For example, some studies lent participants a phone, whereas others let participants use their own device. Please see Table 1 for a description of the different data collection methods used in each study. As observed in the adult literature, terminology also varied greatly across studies (please see Usability section for more details).
Various apps were used, the most frequent of which was the 'Mobiletype' programme. Mood outcomes were either direct mood assessments, or described mood-related constructs or behaviours (e.g. stress, hostility). Outcomes were monitored over variable time periods. The shortest period was 24 h (Bossmann et al. 2013), the longest 326 days (Matthews & Doherty, 2011). Monitoring schedules also varied, and could comprise hourly, daily or weekly monitoring, or requirements to complete measures a fixed number of times per day (with or without pre-specified time intervals). Reimbursements or incentives were available in 18 studies (e.g. payments, gift vouchers).

Psychometric properties of mood-monitoring apps
Nine studies reported on the reliability or validity of mood-monitoring apps.

Reliability
The internal consistency (correlation between items within a scale) was assessed in four studies (Dunton et al. 2011; Huh et al. 2014; Ansell et al. 2015). As demonstrated in Table 2, levels ranged from questionable to excellent (George & Mallery, 2003).
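As an illustration of how internal consistency is typically quantified, the sketch below computes Cronbach's alpha from item-level scores and maps the coefficient onto the George & Mallery (2003) bands quoted in the note to Table 2. It is an assumption that the included studies used Cronbach's alpha in this form; the function names and example responses are hypothetical and do not come from any included study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of respondents,
    each a list with one score per scale item."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores])
                 for i in range(n_items)]
    # Sample variance of respondents' total scores.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


def interpret_alpha(alpha):
    """Bands from George & Mallery (2003). The quoted rule leaves a value of
    exactly 0.5 unclassified; here it falls into 'unacceptable'."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    if alpha > 0.6:
        return "questionable"
    if alpha > 0.5:
        return "poor"
    return "unacceptable"


# Hypothetical responses: five respondents, three mood items (illustration only).
responses = [[3, 4, 3], [4, 5, 5], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), interpret_alpha(alpha))
```

In EMA designs, alpha can be computed overall or separately at the within-subject and between-subject levels; this sketch covers only the simple single-level case.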

Validity
Concurrent validity. Three studies examined concurrent validity (the correlation between an assessment and a previously validated assessment of the same construct). Concurrent validity was mostly moderate across studies (see Table 1). Khor et al. (2014a) compared relationships between participant and parent-reported data from the retrospective Responses to Stress Questionnaire (Connor-Smith et al. 2000) and mobile app data recording participants' responses to stress. In two studies of university students, Ben-Zeev et al. (2015) and Wang et al. (2014) compared momentary app and retrospective questionnaire data on perceived stress.
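A concurrent validity coefficient of this kind is usually a correlation between app-derived scores and scores on the established measure. The sketch below shows one possible approach, assuming that each participant's momentary app ratings are first averaged and then correlated (Pearson's r) with their retrospective questionnaire score. The aggregation choice, function names and example values are illustrative assumptions, not the procedure of any included study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / sqrt(ss_x * ss_y)


def mean_momentary_score(ratings_by_participant):
    """Average each participant's momentary ratings (assumed aggregation)."""
    return [sum(r) / len(r) for r in ratings_by_participant]


# Hypothetical data: momentary stress ratings per participant and the same
# participants' retrospective questionnaire totals (illustration only).
app_ratings = [[2, 3, 2], [4, 5, 4], [1, 2, 2], [3, 3, 4]]
questionnaire = [14, 25, 10, 19]

print(round(pearson_r(mean_momentary_score(app_ratings), questionnaire), 2))
```

Averaging discards the within-day variability that EMA is designed to capture, which is one reason a modest correlation with retrospective scores need not indicate poor validity.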
Face validity. Two studies described participants' views on the face validity of the 'Mobiletype' app (see Table 1 for numerical details). Reid et al. (2012), using a sample with various mental health problems, found that the app was relatively successful in capturing participants' feelings and current situation. Khor et al. (2014a), using a sample with high-functioning autism and Asperger's, found that the app was not quite as successful in these domains. In both studies, the apps were less successful in capturing participants' thoughts.

Table 1 (partial; only fragments of some rows survive in this text)
Aim: To explore the effects of marijuana use on impulsivity and hostility in everyday life using smartphone-based EMA
Aim: To assess the feasibility of smartphone-based EMA and recovery support ecological momentary interventions (EMI) via smartphones. The study also assessed the feasibility of using EMA and EMI to predict substance use in the following week
[...]
○ A significant moderate to strong correlation for the 'involuntary engagement' factor: r = 0.70, p < 0.01; parent report: r = 0.48, p < 0.01
○ A significant strong correlation for the 'primary control engagement coping' factor: r = 0.53, p < 0.05
• Face validity:
○ The face validity was measured by assessing how well the app captured participants' current situation, thoughts and feelings
○ The highest ratings were reported for the app's ability to capture participants' feelings (67%); followed by its ability to capture participants' current situation (63%); and finally its ability to measure participants' thoughts (50%)
Usability:
• Participation rate: participants responded to 61.8% of prompts
○ Note that a substantial proportion of participants gradually stopped responding throughout the study; while every participant completed at least one entry on the first day, completion rates reduced to 45% on day 14
○ Also note that there was a significant positive correlation between full scale IQ and compliance rates (r = 0.46, p < 0.01)
Clinical impacts: not studied/reported
Khor et al. [...]
Psychometric properties:
• Face validity:
○ The face validity was measured by assessing how well the app captured participants' current situation, thoughts and feelings
○ The highest ratings were reported for the app's ability to capture participants' feelings (86%); followed by its ability to capture participants' current situation (83%); and finally its ability to measure participants' thoughts (57%)
• Incentive/reimbursement: none
Usability:
• Participation rate: Participants completed 91% of entries in week 1
Clinical impacts:
• Potential implications for assessment and management
Sacco (2015)
Aim: To examine the feasibility and utility of a smartphone app developed to assess five areas of functioning associated with depression
[...] were perceived as irritating. Participants suggested more varied survey questions (23%), fewer crashes, bugs or freezes (9%) and provided suggestions for novel technical features (13%)
○ Some participants also enjoyed the user-friendliness of the app (40%) and the pop-up-reminder feature (17%)
Clinical impacts:
• Potential implications for self-reflection on emotions or behaviours
Scotti (2015)
Aim: To assess the efficacy, acceptability and feasibility of the school-based Dialectical Behaviour Therapy skills group for the treatment of adolescent eating disorders and sub-diagnostic problematic eating behaviours
[...]
Usability:
• Participation rate: response rates for participants who used own phones: 65%; response rates for participants who used study phones: 72%
Clinical impacts: not studied/reported
a The accessibility of mood-monitoring apps was assessed through a search of Google and three app stores (iTunes, Google Play and Microsoft store) in June 2016.
b Please refer to Table 2 for coefficient values.
c These studies utilised the same data.
d These studies utilised the same data.
e These studies utilised the same data.
f These studies partly utilised the same data.
g These studies utilised the same data.

Participation rates
Twenty-one studies examined participation rates, which ranged from 30% to 99%. Average percentages were not computed in four studies. Instead, these studies described the mean number of diary entries per participant (Bossmann et al. 2013), between-group differences (Matthews et al. 2008b;Kauer et al. 2009), or evidence of ongoing compliance (Tregarthen et al. 2015). There was some indication that response rates were higher in studies with incentives. For example, Dennis et al. (2015) offered an incentive of $50 per week, and had a participation rate of 89% (see Table 1 for comparative rates and incentive details). Participation rates also appeared to be affected by response fatigue. In Reid et al. (2009), for instance, response rates decreased from 91% on day 1 to 67% on day 7. Finally, participation rates were potentially affected by sample-specific characteristics. In a study with high-functioning autistic participants, Khor et al. (2014a) found a significant positive correlation between full-scale IQ and compliance rates (r = 0.46, p < 0.01).
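To make the participation figures concrete, the sketch below shows one way a participation (compliance) rate could be derived from a prompt log, both overall and per study day, which is how a decline such as the one attributed to response fatigue would become visible. Definitions of participation varied across the included studies, so the formula (completed prompts divided by delivered prompts), the function names and the example log are assumptions for illustration only.

```python
def participation_rate(completed, prompted):
    """Percentage of delivered prompts that were answered."""
    if prompted <= 0:
        raise ValueError("at least one prompt must have been delivered")
    return 100.0 * completed / prompted


def daily_rates(prompt_log):
    """Per-day participation rates.

    prompt_log is a list of (study_day, responded) pairs, one per prompt.
    """
    totals = {}
    for day, responded in prompt_log:
        done, sent = totals.get(day, (0, 0))
        totals[day] = (done + int(responded), sent + 1)
    return {day: participation_rate(done, sent)
            for day, (done, sent) in sorted(totals.items())}


# Hypothetical log for one participant: four prompts per day over three days.
log = [(1, True), (1, True), (1, True), (1, True),
       (2, True), (2, True), (2, False), (2, True),
       (3, True), (3, False), (3, False), (3, True)]

print(daily_rates(log))
print(participation_rate(sum(r for _, r in log), len(log)))
```

Reporting per-day rates alongside the overall figure helps distinguish steady partial compliance from early engagement followed by drop-off.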

Participants' perceptions
Nine studies considered participants' perceptions of the apps. Three of these studies specifically referred to the 'acceptability' of apps. In Dennis et al. (2015), 95% of adolescents felt that the EMA app 'was not too long'. Tregarthen et al. (2015) measured app utilisation data as a proxy for acceptability. There were over 100 000 users over a 2-year period (with 89% using the application at least three times), which the authors interpreted as a demonstration of broad acceptability. While they did not define acceptability specifically, Reid et al. (2009) concluded that their app was 'acceptable' based on the data they captured (e.g. completion rates, participants' feedback).
Across studies, 93-100% of respondents found apps easy to learn or use (Dennis et al. 2015; Kenny et al. 2015; Sacco, 2015). In addition, participants rated apps as useful (Kenny et al. 2015), convenient, user-friendly (Bachmann et al. 2015), youth-friendly and non-invasive. Despite these positive experiences, technological difficulties (e.g. software crashes, reduced battery life) were reported to negatively affect user experience and participation (Loventoft et al. 2012; Huh et al. 2014; Dennis et al. 2015; Sacco, 2015). Although most young people reported a preference for mobile phone mood charting in comparison to paper diaries (Matthews et al. 2008b), not all young people preferred mobile technology (Scotti, 2015). Scotti (2015), for example, found that several participants from a sub-diagnostic eating disorder sample favoured paper-and-pencil to track their data.

Mental health and awareness
Five studies (two from the same RCT) examined potential clinical impacts of the apps. Reid et al. (2011) found a significant improvement in emotional self-awareness, but no significant improvements in depression, anxiety or stress scores in youth with mental health or emotional problems. In a secondary analysis of the same RCT, Kauer et al. (2012) reported an indirect association between app use and depression symptoms via increased emotional self-awareness. The app, however, did not significantly reduce rumination.
Table 2 note: O, overall; WS, within-subject level; BS, between-subject level. Interpretation of internal consistency coefficient values: '>0.9 excellent, >0.8 good, >0.7 acceptable, >0.6 questionable, >0.5 poor and <0.5 unacceptable' (George & Mallery, 2003, p. 231).
Qualitative feedback from two studies also suggested that mood-monitoring apps can help improve self-awareness (Kenny et al. 2015), and self-reflection on emotions or behaviours (Sacco, 2015).
Though they did not test this premise directly, Ansell et al. (2015) hypothesised that app-based monitoring could have promoted self-awareness in participants subsequently reducing (perceived) interpersonal hostility.
In Khor et al. (2014b), parents rated their children with high-functioning autism as showing fewer symptoms of behaviour and emotional problems following use of the self-monitoring app.

Treatment implications
Five studies reported results that could have implications for the prevention and treatment of mental health problems. Mobile app data gathered by Dennis et al. (2015) were used to identify high-risk groups for substance use, which could potentially help with relapse prevention. Crooke et al. (2013) suggested that mood-monitoring apps could help investigate adolescents' motivations for drinking, thus informing the development of interventions.
Qualitative feedback from therapists suggests that the use of mobile apps could help facilitate engagement with participants suffering from various mental health problems (Matthews & Doherty, 2011). Reid et al. (2012) reported that the Mobiletype app facilitated the assessment and management of youth mental health problems and reduced consultation time with paediatricians; the data captured enabled more individually focused consultations, which assisted in rapport building and communication.
In the third of a series of papers detailing their RCT, Reid et al. (2013) explored the potential treatment benefits of 'Mobiletype'. In comparison to the control programme, the app significantly increased general practitioners' (GPs) understanding of their patients' health and current functioning, and aided diagnoses, communication, medication and referrals. However, there was no significant effect on doctors' confidence, doctor-patient rapport or pathways to care.
Finally, in a conference paper by Loventoft et al. (2012), clinicians highlighted the usefulness of self-monitoring when combined with therapy.

Quality assessment
Please see online Supplementary Fig. S1 for an overall depiction of the risk of bias domains across studies.
Risk of selection bias was difficult to assess in many studies, as they often lacked treatment, control or comparison groups. Three studies (all using the same RCT data) were deemed at low risk of selection bias due to a clear description of the randomisation and allocation concealment process (Reid et al. 2011, 2013; Kauer et al. 2012). Two studies were at unclear risk of selection bias because randomised sequence generation and method of allocation concealment were not sufficiently described (Matthews et al. 2008b; Reid et al. 2009). One study was considered at high risk of selection bias (Scotti, 2015) as there was no random allocation process for the control condition.
Only the RCT study (three publications) addressed the blinding of participants and personnel, and was thus considered at low risk of performance bias (Reid et al. 2011, 2013; Kauer et al. 2012). The risk of detection bias in these studies was unclear due to a lack of clarity on blinding of outcome assessments.
The risk of attrition bias was difficult to ascertain in three studies. In one study (Kenny et al. 2015), a number of participants were not included in the final sample due to restrictions on school access (no other information was available). Bossmann et al. (2013) excluded 15 participants from the final sample due to 'missing data', but did not provide further information, including whether any analyses were performed to address missing data. Reid et al. (2012) was considered at unclear risk of attrition bias, as there was no information on the participants (21%) lost to follow-up. The remaining studies appeared to be at low risk of attrition bias. There was insufficient information to assess the risk of reporting bias in all studies except those reporting on the RCT, which addressed pre-specified outcomes and appeared to be at low risk (Reid et al. 2011, 2013; Kauer et al. 2012). All studies appeared to be at unclear or high risk of other types of bias.

Discussion
The aim of this review was to summarise and evaluate evidence for the use of mobile mood-monitoring apps in young people (aged 10-24 years) from clinical and non-clinical populations. We specifically focused on psychometric properties, usability and clinical impacts.

Psychometric properties of mood-monitoring apps
Few studies assessed psychometric properties. There was limited evidence for reliability, with four studies demonstrating questionable to excellent levels of internal consistency. Studies examining concurrent (n = 3) and face (n = 2) validity were also sparse, making it difficult to draw firm conclusions. Face validity findings, for example, could have been moderated by sample characteristics such as reduced insight in participants with autism (Khor et al. 2014a).
The limited assessment of psychometric properties observed in the youth literature mirrors the adult literature. Evidence for concurrent validity in adult populations is inconclusive (Depp et al. 2012; Palmier-Claus et al. 2012; Faurholt-Jepsen et al. 2014). Inconsistent methodology across these studies, e.g. momentary (Depp et al. 2012) v. retrospective assessments (Faurholt-Jepsen et al. 2014) and varying periods between the event and participants' recollection of the event (Palmier-Claus et al. 2012), likely contributes to variable findings. Previous evidence suggests that real-time mood measurement methods (e.g. EMA) only have a modest correlation with retrospective assessments, such as questionnaires (Ebner-Priemer & Trull, 2009). This leads to the conceptual question of whether retrospective measures are the most appropriate comparators when assessing the validity of mood-monitoring apps. Questionnaires measure an individual's retrospective view of their mood state over a number of days. While they are subject to recall bias, this bias incorporates other emotional processing (e.g. contexts) that the more instantaneous assessment of mood (e.g. EMA) may not capture, or at least not as richly. Thus, the two assessment methods may be measuring different types of affective experience. As it is difficult to draw robust conclusions about the validity of apps using retrospective assessments, future studies should further examine psychometric properties using other sources of comparative data, e.g. comparing active smartphone app data (i.e. app assessments) with passive smartphone sensor data (Nicholas et al. 2015; Sandstrom et al. 2016b), or examining associations with clinical rating scales.

Usability of mood-monitoring apps
The usability of mood-monitoring apps was more extensively studied, and overall studies suggest that apps are usable for young people. However, there were some within- and between-study differences in participants' perceptions of apps, and participation rates.
Generally, participation rates were lower in studies where participants had mental health difficulties (Reid et al. 2011; Kauer et al. 2012), problematic drinking patterns or autism spectrum disorders, especially those with lower IQ (Khor et al. 2014a). In particular, participation levels were low for those living without set routines. This is an important consideration, as youths with mood-related problems, e.g. borderline personality disorder, often have disorganised daily routines (Fleischer et al. 2012). This suggests a need to tailor apps for different clinical populations. Some studies indicated that incentives could positively influence participation rates (e.g. Ansell et al. 2015; Dennis et al. 2015). It may not be financially feasible to offer incentives in non-research settings. However, results tentatively suggest that participation rates may be better for mobile apps than traditional paper-based assessments irrespective of incentives (Matthews et al. 2008b). Participation rates for paper-based diaries are as low as 11% (Stone et al. 2003) compared with 30-99% for mood-monitoring apps in the current review. This supports the view that apps could lead to better adherence rates than non-digital assessment tools in young populations. Factors that could improve participation rates include the use of less intensive assessments (e.g. once-daily rather than multiple times), shorter assessments and the incorporation of staff monitoring or automatic reminders.
Studies from the adult literature are somewhat congruent in supporting the usability of mood-monitoring apps (Bardram et al. 2013), though evidence suggests that increasing age (e.g. 'middle age') may lower the likelihood of mood-monitoring app use (Depp et al. 2012). Both adult and adolescent (Bradford & Rickwood, 2014) populations expressed some reservations about using apps due to the perceived risk of reduced personal contact.
Overall, our review demonstrated that young people positively perceive apps and would be willing to use this technology in real-life settings (Kenny et al. 2015; Tregarthen et al. 2015). Very few studies considered clinician perspectives on mood-monitoring apps. Matthews & Doherty (2011) found that therapists' confidence with technology was the biggest barrier to the use of mood apps. More qualitative studies are now needed to further explore young people's (and clinicians') perceptions (Hollis et al. 2016) to broaden our understanding of factors pertinent to the uptake of mood-monitoring apps in real-life settings.

Positive and negative clinical impacts of mood-monitoring apps
Few of the included studies assessed the clinical impacts of the mood-monitoring apps. Although evidence was generally positive (e.g. facilitating assessment, management and GPs' understanding), most studies relied on subjective participant feedback (Sacco, 2015) rather than RCT methodology with objective outcome measures.
The preliminary evidence very tentatively suggests that electronic mood-monitoring apps could function as an intervention tool (Seko et al. 2014; Olff, 2015; Faurholt-Jepsen et al. 2016). Intriguingly, results from the one RCT indicated that mood-monitoring apps might reduce depression in youths by increasing their levels of emotional awareness. Similarly, though in a non-experimental study, Khor et al. (2014b) reported that self-monitoring improved parent-reported behavioural and emotional problems in participants with autism. While these results are promising, they require replication, and future studies may further explore the mechanisms via which apps could potentially impact on clinical outcomes. One possibility is that mood apps could have a positive impact on clinical symptoms due to patient/participant expectations regarding their benefits. This phenomenon, coined the digital placebo effect, is an overlooked area that also merits future investigation.
We were unable to fully examine the potential negative impacts of mood-monitoring apps in youth populations, as they were not directly investigated in studies. However, Reid et al. (2009) found that participants did not always respond to questions truthfully to avoid having to answer further questions. Thus, this type of assessment could potentially lead to the inaccurate assessment (and treatment) of mental health problems.
A small number of adult studies report on the negative effects of mood-monitoring apps. There is some suggestion that apps may increase negative reactivity, increase focus on negative symptoms and thoughts, and potentially maintain depressive symptoms (Faurholt-Jepsen et al. 2015). Given the evidence from the adult literature, research on the possible harmful effects of app use in youths is needed before these tools are routinely used in clinical practice. Part of this endeavour should seek to identify the optimal balance between a monitoring schedule, which accurately captures affective dynamic processes, while minimising respondent workload (Bolger et al. 2003; Trull et al. 2015). This is particularly important, not only because it affects participation rates, but also because the responsibility of self-monitoring could impose a burden on young people (Shiffman et al. 2008), might result in unnecessary pressure (Lupton, 2013; Seko et al. 2014) and exacerbate mental health problems (Conner & Reid, 2012; Faurholt-Jepsen et al. 2015).
Future work may investigate potential ethical issues surrounding the use of mood-monitoring apps. For example, their use could lead to an over-reliance on technology in young populations, which could exacerbate mental health problems (Thomée et al. 2011). There could also be information security-related risks (e.g. digital theft) that could compromise confidentiality (Prentice & Dobson, 2014). Finally, youths could use apps as a replacement for treatment and health monitoring (Tregarthen et al. 2015). Considering the importance of the therapeutic alliance for successful treatment outcomes (Karver et al. 2006), the efficacy of smartphone apps could be reduced if they are used without clinicians' involvement (Prentice & Dobson, 2014).

Strengths and limitations
As far as we are aware, this is the first review to systematically examine and quality assess the evidence for the psychometric properties, usability and clinical outcomes of mood-monitoring apps in youth. However, our results should be considered through the lens of a number of limitations.
First, despite undertaking a comprehensive search, there were very few high-quality studies available for inclusion in the review. There was only one primary RCT, highlighting the need for more trials on the efficacy of mood-monitoring apps in young people. Indeed, our quality assessment indicated that the majority of studies included some form of bias. For example, many studies were at high or unclear risk of sampling bias (e.g. self-selected samples) and attrition bias. This could have affected the generalisability of our findings or led to an overestimation of positive effects, e.g. our findings may only apply to individuals with less severe psychopathology who are more likely to engage with services.
Second, studies demonstrated a great variability in terminology (especially for implementation outcomes, e.g. acceptability), making interpretations and cross-study comparisons difficult (inconsistent terminology is also a common feature of the adult app literature). For example, we found that 'acceptability' was defined very differently across studies, ranging from proxy markers, i.e. utilisation data (Tregarthen et al. 2015), to participants' experience of burden (Dennis et al. 2015). This highlights the need for more careful delineation and measurement of implementation outcomes in future work (Proctor et al. 2011).
Third, there were large variations in samples and methodologies, again making cross-study comparisons difficult and quantitative synthesis (i.e. meta-analysis) impossible. Thus, some of our conclusions remain tentative pending further rigorous, higher quality research (e.g. RCTs).
Fourth, it should be noted that studies in this review often used apps that were specifically developed for the study, and therefore not publicly available through app platforms (e.g. iTunes). Thus, there is a need for more research to assess the evidence for apps that are freely downloaded and used by youth, and whether their use can be incorporated into clinical care (Nicholas et al. 2015).

Clinical and research implications
Mood-monitoring apps could potentially have positive effects in both clinical and sub-clinical youth populations. Indeed, mood-monitoring apps may help youth identify and address burgeoning mental health and substance use problems (Dennis et al. 2015), and possibly utilise more adaptive coping strategies. Further research is needed to examine the effects of these apps in samples with serious mental disorders, such as bipolar disorder (Grunerbl et al. 2015), borderline personality disorder (Lederer et al. 2014) and psychosis (Palmier-Claus et al. 2014).
Evidence, though limited, suggests that mood-monitoring apps could potentially aid diagnosis and treatment decision-making. Future studies should explore whether this technology could aid in the assessment of disorders that can be difficult to differentiate [e.g. borderline personality disorder, bipolar disorder (Yen et al. 2015)] by providing rich data about the timing and extent of mood fluctuations.
As technological innovations have been endorsed at a government level, integrating mood-monitoring apps within mental health services may relieve some of the strain these services are currently experiencing [e.g. by improving access to mental health treatment (Department of Health, 2013)]. However, to date, the potential positive and negative impacts of apps have not been sufficiently investigated in youth.

Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291717001659.