Selective serotonin reuptake inhibitors are still harmful and ineffective. Responses to the comments by Hieronymus et al.

Abstract

In this response, we address point by point the additional issues raised by Hieronymus et al. in their second round of critique of our systematic review on selective serotonin reuptake inhibitors for major depression. We reject the claim that we are biased or mistaken in any major way. We acknowledge that we missed a few small, mostly unpublished trials, and that we made a few minor errors in our systematic review. However, these omissions and errors have no impact on either our overall results or our conclusions. The critique by Hieronymus et al. seems to raise questions about their understanding of the systematic review process, and on several occasions they wrongly claimed that we made errors. Our analyses should be impartial and free from any biases or prejudices, as we have no obligation to support the interests of sponsors or other groups.

In their most recent critique (Hieronymus et al., 2018a), Hieronymus et al. criticised both our original systematic review (Jakobsen et al., 2017a) and our response (Katakam et al., 2018) to their earlier critique (Hieronymus et al., 2018b). We strongly disagree with their claim that we want to portray the selective serotonin reuptake inhibitors (SSRIs) as ineffective and harmful, and that we distorted and misquoted our own data, both in the BMC Psychiatry paper (Jakobsen et al., 2017a) and in the lay media (Jakobsen et al., 2017b; TV 2 (Denmark), 2017). Contrary to what the critics allege (Hieronymus et al., 2018a), Janus Christian Jakobsen has never claimed in any of his media appearances that we have shown SSRIs to increase the risk of suicide. In fact, we think Hieronymus et al. misrepresented his statements, to which they devoted three paragraphs of criticism (Hieronymus et al., 2018a). Janus Christian Jakobsen correctly stated that we have shown SSRIs to increase the risk of serious adverse events (SAEs) (Jakobsen et al., 2017a). He mentioned suicide and death as examples of SAEs in his appearances in Scandinavian media. The direct translation of the sentence regarding SAEs in the Danish article published in videnskab.dk (Jakobsen et al., 2017b), cited by Hieronymus et al., is 'The review shows with great certainty that SSRIs increase the risk of serious adverse events (death, suicide, hospital admission OR any other serious event that is harmful) …'.
Hieronymus et al. claim that we are reluctant to interpret and report our results in an impartial manner (Hieronymus et al., 2018a). As an example, they cite our statement 'the "true" effect of SSRIs might not even be statistically significant', even though our results showed a highly significant effect (p < 0.00001) of SSRIs versus placebo on the Hamilton Depression Rating Scale 17 (HDRS17) (Jakobsen et al., 2017a). We do not agree with this accusation. The statement was made in the context of the small observed mean SSRI-placebo difference of approximately two HDRS points. Our results showed that all the trials were at high risk of bias (Jakobsen et al., 2017a), and it has repeatedly been shown that trials at high risk of bias tend to overestimate the beneficial effects of interventions (Sutton et al., 2000; Hróbjartsson et al., 2012; Savović et al., 2012; Lundh et al., 2017). Earlier studies also revealed that a number of FDA-registered antidepressant trials with negative results were simply reported as 'negative results', never provided actual effect sizes and were never published (Ioannidis, 2008; Turner et al., 2008). Hence, we think that the 'true' difference between the two intervention groups might be even smaller than observed in our review, or in fact non-existent (Jakobsen et al., 2017a), and we stand by our statement that 'the "true" effect of SSRIs might not even be statistically significant'. We will not be able to assess the 'true' effect of SSRIs until we have adequately conducted and blinded randomised clinical trials comparing SSRIs versus comparable nocebos (active placebo) (Jakobsen et al., 2017a).
Hieronymus et al. wonder why we bother to present these data and plan to spend time updating our analysis when all the included trials are at high risk of bias (Hieronymus et al., 2018a).
We think that Hieronymus et al. do not show a proper understanding of the systematic review process, methodology and principles. We published a protocol (Jakobsen et al., 2013a) before we started working on the review, in which we described how we would proceed with our analyses. Systematic reviews should be updated to include new trials. Hieronymus et al. (2018a) cited a recent network meta-analysis on antidepressants by Cipriani et al. (2018), in which only 9% of the trials were categorised as being at high risk of bias. We clarify that there is a fundamental difference between our review (Jakobsen et al., 2017a) and the review by Cipriani et al. (2018) in how the overall risk of bias is assessed. In accordance with the current evidence (Sutton et al., 2000; Hróbjartsson et al., 2012, 2013, 2014; Savović et al., 2012; Lundh et al., 2017), we classified a trial as being at 'low risk of bias' only if all of the bias domains (generation of allocation sequence, allocation concealment, blinding of study personnel and participants, blinding of outcome assessor, attrition, selective outcome reporting, and other bias including sponsorship bias) were classified as 'low risk of bias'. If one or more of the bias domains were classified as 'unclear' or 'high risk of bias', the trial was classified as being at 'high risk of bias'. In contrast, the published protocol (Furukawa et al., 2016) of the Cipriani et al. review (Cipriani et al., 2018) states that a trial is classified as being at low risk of bias if none of the domains described above was rated at high risk of bias and three or fewer were rated at unclear risk; at moderate risk of bias if one was rated at high risk of bias, or if none was rated at high risk of bias but four or more were rated at unclear risk; and all other trials were assumed to be at high risk of bias. This approach of Cipriani et al. is a questionable way of assessing bias risks without support in current evidence (Sutton et al., 2000; Hróbjartsson et al., 2012, 2013, 2014; Savović et al., 2012; Lundh et al., 2017). Moreover, this questionable approach is likely the reason why we and Cipriani et al. reach different assessments. We find our methodology more in line with the results of meta-epidemiological studies (Sutton et al., 2000; Hróbjartsson et al., 2012, 2013, 2014; Savović et al., 2012; Lundh et al., 2017). In fact, there has been considerable criticism (Boesen et al., 2018; Gøtzsche, 2018; Moncrieff, 2018; Timimi et al., 2018; Warren, 2018; Whitaker, 2018) of the risk of bias assessment and the conclusions of the Cipriani et al. network meta-analysis (Cipriani et al., 2018).
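The two overall risk-of-bias rules described above can be contrasted in a short sketch. It assumes that each trial is summarised as a list of seven domain judgements ('low', 'unclear' or 'high'), one per domain listed above; the function names are hypothetical, and the code only illustrates the rules as stated in the text rather than software used by either review group.

```python
from collections import Counter

# Hypothetical labels for a single domain judgement.
LOW, UNCLEAR, HIGH = "low", "unclear", "high"


def overall_rob_all_domains(domains):
    """All-domains rule: low risk of bias only if every domain is judged low."""
    return LOW if all(d == LOW for d in domains) else HIGH


def overall_rob_counting(domains):
    """Counting rule as described for the Furukawa et al. (2016) protocol."""
    counts = Counter(domains)
    n_high, n_unclear = counts[HIGH], counts[UNCLEAR]
    if n_high == 0 and n_unclear <= 3:
        return LOW
    if n_high == 1 or (n_high == 0 and n_unclear >= 4):
        return "moderate"
    return HIGH


# A trial with four 'low' and three 'unclear' domains.
example = [LOW] * 4 + [UNCLEAR] * 3
print(overall_rob_all_domains(example), overall_rob_counting(example))  # high low
```

The same trial is 'high risk' under the all-domains rule but 'low risk' under the counting rule, which shows how the two approaches can classify identical evidence very differently.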
Hieronymus et al. argue that we should have mentioned in our review that using HDRS17 as a measure of effect markedly underestimates SSRI-induced improvement (Hieronymus et al., 2018a). They claim that the clinical significance of the effect of SSRIs is likely to be considerably greater than the effect captured by HDRS17 (Hieronymus et al., 2018a). However, there is no valid evidence supporting their claim. Moreover, several international institutions recommend HDRS17 for the assessment of depressive symptoms, and most depression trials have used HDRS17 to assess the efficacy of antidepressants … (ICH-GCP, 1996), which all clinical trials ought to follow. This internationally accepted guideline clearly states that all SAEs occurring in a trial must be considered, whether or not the events are associated with the treatment. It is always difficult to assess whether a given event is caused by the intervention. For example, a traffic accident might be caused by one of the several adverse effects of SSRIs. (ii) that our decisions on whether a certain adverse event should be categorised as serious or not were somewhat arbitrary. We strongly disagree. We clearly predefined (and used) the GCP definition (Jakobsen et al., 2017a). Nevertheless, the reporting of SAEs in most of the publications was very poor and incomplete. Therefore, we classified some events [e.g. in Adamson et al. (2015) and Claghorn et al. (1996)] as SAEs when they met the definition of an SAE according to the ICH-GCP guidelines (ICH-GCP, 1996). (iii) our decisions regarding which treatment groups to include from Adamson et al. (2015) and Pettinati et al. (2010), and regarding the extent of follow-up phases. We explicitly included trials comparing SSRIs versus no intervention, placebo, or 'active' placebo in our review (Jakobsen et al., 2013a). In the Adamson et al. trial (Adamson et al., 2015), there are two intervention groups: citalopram and placebo.
Naltrexone was prescribed in both groups. Hence, we believe we did not make any mistake in including this trial. Please see below regarding the Pettinati et al. trial (Pettinati et al., 2010). Regarding the extent of follow-up phases, we planned to report results assessed both at the end of treatment and at maximum follow-up. However, because data at maximum follow-up were very limited (making selection bias likely), we reported only results at the end of treatment (Jakobsen et al., 2017a). (iv) that trial reports often detail potential SAEs in the active treatment group (this being the issue of interest) while not providing corresponding information regarding similar events in patients on placebo (Wernicke et al., 1988; Claghorn et al., 1996; Feighner & Overo, 1999). Hieronymus et al. do not refer to any valid evidence supporting their claim, perhaps because such evidence does not exist. We do not agree that trial reports often detail potential SAEs in the active treatment group, and we wonder why Hieronymus et al. came to such a conclusion. One cannot evaluate events in an active intervention group without similar unbiased assessments in the control groups. (v) that our SAE data from older trials must generally be interpreted with caution. We wonder why Hieronymus et al. think that older trials differ from newer trials regarding SAE data. In fact, many studies reveal that the reporting of adverse event data is still suboptimal even after the introduction of reporting standards (e.g. CONSORT) for adverse events (Ioannidis, 2009; Haidich et al., 2011; Shukralla et al., 2011; Bagul & Kirkham, 2012; Smith et al., 2012; Hodkinson et al., 2013; Péron et al., 2013).
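The ICH-GCP seriousness definition referred to above can be written as a simple rule. The sketch below assumes the ICH E6 criteria (death, a life-threatening event, inpatient hospitalisation or prolongation of existing hospitalisation, persistent or significant disability/incapacity, or a congenital anomaly/birth defect); the class and field names are hypothetical. The function deliberately takes no input on causality, reflecting the principle that SAEs are counted whether or not they are judged to be related to the treatment.

```python
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    """Hypothetical record of one adverse event extracted from a trial report."""
    description: str
    resulted_in_death: bool = False
    life_threatening: bool = False
    hospitalisation_required_or_prolonged: bool = False
    persistent_or_significant_disability: bool = False
    congenital_anomaly_or_birth_defect: bool = False


def is_serious(event: AdverseEvent) -> bool:
    """Return True if the event meets any ICH-GCP (E6) seriousness criterion.

    There is intentionally no 'related to treatment' argument: an event is
    serious regardless of whether it is thought to be caused by the drug.
    """
    return any((
        event.resulted_in_death,
        event.life_threatening,
        event.hospitalisation_required_or_prolonged,
        event.persistent_or_significant_disability,
        event.congenital_anomaly_or_birth_defect,
    ))


# A traffic accident leading to hospital admission counts as an SAE,
# even if no one can say whether the SSRI contributed to it.
accident = AdverseEvent("traffic accident", hospitalisation_required_or_prolonged=True)
print(is_serious(accident))  # True
```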
We do not agree with Hieronymus et al.'s claim that we are biased in our reporting of 'high risk of publication bias' (Hieronymus et al., 2018a). Hieronymus et al. cited our exchange with the peer reviewers, made public by BMC Psychiatry. There, we stated that our material was skewed in favour of SSRI-positive studies, indicating a 'high risk of publication bias', and presented a significant Egger test result to back this up. When we found that we had made an error in this assessment, we withdrew the statement and the Egger test result from the final version of the manuscript (Jakobsen et al., 2017a). Such revisions are a natural part of any peer review process, and we do not understand how this can be considered evidence of bias. Our mistake was unintentional. Would Hieronymus et al. have preferred us to retain a statement that was found to be erroneous during the peer review process?
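For readers unfamiliar with the Egger test mentioned above, a minimal sketch follows. It regresses each trial's standardised effect (effect divided by its standard error) on its precision (one over the standard error) by ordinary least squares; an intercept far from zero suggests funnel-plot asymmetry, which may, but need not, reflect publication bias. The function name and example inputs are hypothetical, and to stay dependency-free the sketch returns the t statistic (with n − 2 degrees of freedom) rather than a p value.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry (sketch).

    Regresses effect/SE (standardised effect) on 1/SE (precision) by OLS.
    Returns (intercept, t statistic for the intercept, degrees of freedom).
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three trials with matching inputs")
    x = [1.0 / se for se in std_errors]                 # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardised effect
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar                   # asymmetry estimate
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = n - 2
    se_intercept = math.sqrt(rss / df * (1.0 / n + x_bar ** 2 / sxx))
    # The t statistic is undefined when the fit is perfect (zero residuals).
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t_stat, df


# Hypothetical log odds ratios and standard errors from five trials.
print(egger_test([0.9, 0.6, 0.4, 0.3, 0.2], [0.5, 0.4, 0.3, 0.2, 0.1]))
```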
As explained above, we included trials comparing SSRIs versus no intervention, placebo or 'active' placebo in our review (Jakobsen et al., 2013a). We did not make any mistake in including the Ball et al. trial (Ball et al., 2014), which had no placebo group, but we made a mistake in reporting its groups: instead of comparing the aprepitant plus paroxetine group with the aprepitant group, we compared the aprepitant plus paroxetine group with the paroxetine group. However, correcting this does not noticeably change our results or conclusions (Table 1). Regarding the Pettinati et al. trial (Pettinati et al., 2010), we acknowledge that we included only two treatment groups (sertraline vs. placebo) and missed the other two treatment groups (sertraline plus naltrexone vs. naltrexone). However, the updated analysis does not in any way change our results or conclusions (Table 1).
Regarding the omission of an escitalopram arm in the SCT-MD-01 trial (Forest Laboratories Inc, 2001), we think Hieronymus et al. (2018a) did not fully understand the methodology of our systematic review (Jakobsen et al., 2017a) or the explanation in our earlier response (Katakam et al., 2018). As SCT-MD-01 (Forest Laboratories Inc, 2001) is a multi-group trial, we subdivided it into three experimental groups [escitalopram 10 mg, SCT-MD-01 (A); escitalopram 20 mg, SCT-MD-01 (B); and citalopram 40 mg, SCT-MD-01 (C)] and subdivided the placebo group into three groups, one corresponding to each experimental SSRI group. As there were only two SAEs in the placebo group, we randomly distributed these events to comparisons SCT-MD-01 (B) and SCT-MD-01 (C). As there were no SAEs in either the SSRI group or the corresponding placebo group in comparison A, both groups were excluded, and hence there is no question of inflating the apparent rate of SAEs, as claimed by Hieronymus et al. (2018a).
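The handling of a shared placebo group in a multi-arm trial can be sketched as follows. This is a hypothetical illustration: it assumes that placebo participants are split as evenly as possible across the comparisons, that each placebo SAE (counted as one participant with an event) is assigned to a randomly chosen sub-group, and that comparisons with no events in either arm are dropped because they carry no information on the odds ratio. The function name and example numbers are invented; the actual allocation of the two SCT-MD-01 placebo events is as stated in the text.

```python
import random


def split_shared_placebo(arms, placebo_n, placebo_events, rng):
    """Split one shared placebo group across several experimental arms.

    arms: list of (label, experimental_events, experimental_n) tuples.
    Returns (label, exp_events, exp_n, placebo_events, placebo_n) per
    comparison, omitting comparisons with zero events in both arms.
    """
    if not 0 <= placebo_events <= placebo_n:
        raise ValueError("placebo_events must lie between 0 and placebo_n")
    k = len(arms)
    # Split placebo participants as evenly as possible across the k comparisons.
    base, extra = divmod(placebo_n, k)
    sizes = [base + (1 if i < extra else 0) for i in range(k)]
    # Assign each placebo event to a random sub-group that still has room.
    events = [0] * k
    for _ in range(placebo_events):
        open_groups = [i for i in range(k) if events[i] < sizes[i]]
        events[rng.choice(open_groups)] += 1
    comparisons = []
    for (label, exp_events, exp_n), pbo_events, pbo_n in zip(arms, events, sizes):
        if exp_events == 0 and pbo_events == 0:
            continue  # double-zero comparison: no information on the odds ratio
        comparisons.append((label, exp_events, exp_n, pbo_events, pbo_n))
    return comparisons


# Hypothetical three-arm trial with a shared placebo group of 300 participants.
arms = [("A", 0, 100), ("B", 1, 100), ("C", 2, 100)]
print(split_shared_placebo(arms, placebo_n=300, placebo_events=2, rng=random.Random(1)))
```

Because a double-zero comparison is dropped entirely, excluding it cannot inflate the SAE rate in the remaining comparisons.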
Regarding the exclusion of female-specific SAEs in the GSK/810 trial (GlaxoSmithKline, 2005e), we do not agree with Hieronymus et al.'s claim that we adopted opposite policies when extracting data from a similar GSK trial presenting separate sets of fatal and non-fatal SAEs (GlaxoSmithKline, 2005a,d). In those reports, although fatal and non-fatal SAEs were presented as separate sets, they were reported in the same table, and a double asterisk (**) was used to indicate different events occurring in the same patient. In the GSK/810 trial report (GlaxoSmithKline, 2005e), by contrast, female-specific events were reported in a separate table, so it was not clear whether the same participants had any of the other SAEs reported in the main table. In any case, there were two female-specific SAEs in the placebo group and one in the SSRI group, and including these events did not in any way change our results or conclusions (OR 1.26, 95% confidence interval 1.03-1.53, p = 0.026).
In their earlier critique (Hieronymus et al., 2018b), Hieronymus et al. claimed that our review (Jakobsen et al., 2017a) was marred by many factual errors and inconsistencies. Although we acknowledged in our earlier response (Katakam et al., 2018) that they had identified some errors, we did not agree with Hieronymus et al. regarding several of the 'errors' they claimed we made. In their new critique, Hieronymus et al. claim that many of the errata listed in their earlier criticism were merely examples, to be seen as illustrations of a flawed process (Hieronymus et al., 2018a). It is strange and difficult to understand that, even after our point-by-point response to their earlier critique, Hieronymus et al. maintain their earlier position, still consider these items to be errors and describe the process as flawed. To illustrate that there were many errors in the review, Hieronymus et al. provided additional examples of mistakes in the Supplementary Material. We do not agree with most of their claims, and our detailed explanations can be seen in our responses in the Supplementary Material. We think Hieronymus et al. wrongly considered several items to be errors. For example, Hieronymus et al. claim that we used pre-treatment values instead of post-treatment values from the trial by Jindal et al. (2003). This trial investigated the impact of sertraline on the sleep of depressed patients. In Table 1 of the manuscript by Jindal et al. (2003), they reported baseline parameters, and in Table 2 … Hieronymus et al. also stressed that these are just illustrative examples identified during their relatively cursory review, and that anyone caring to take a closer look would probably find more to add to the errata list (Hieronymus et al., 2018a). We think that, even though we conducted our review and responses with rigour and impartiality, errors and inaccuracies (e.g. data entry errors and transposition errors) are bound to happen in systematic reviews …

Table 1. Summary of our results of selective serotonin reuptake inhibitors versus placebo or no intervention on serious adverse events before (Jakobsen et al., 2017a) and when the valid issues raised by Hieronymus et al. (Hieronymus et al., 2018a,b; Katakam et al., 2018) …

… (Jakobsen et al., 2017a) of having missed several trials for which SAE data are readily available. In our response (Katakam et al., 2018), we expressed surprise at Hieronymus et al.'s conclusion that there was no significant difference between SSRIs and placebo with respect to SAEs, reached without including data from the missed trials. In their new critique (Hieronymus et al., 2018a), they justify this by saying that they repeated our analysis using the same trials that were included in our analysis (Jakobsen et al., 2017a) merely to demonstrate the lack of robustness of our results. We do not find this justification convincing. They seem more interested in showing that our results are not robust than in investigating whether there is an actual association between SSRIs and the occurrence of SAEs.
Hieronymus et al. claim that we denounced and discarded previous meta-analyses in this field, citing 'not searching all relevant databases' as one reason. We do not agree with this claim: we only stated the limitations of previous meta-analyses, and we clarify that we searched all relevant databases. It is inevitable that systematic reviews miss a few trials for a variety of reasons, especially when searching for unpublished reports, for which a systematic database search is not possible. Hieronymus et al. seem to ignore that, when our review was published, it included more than twice as many trials as any previous meta-analysis or systematic review (Khan et al., 2002; Kirsch et al., 2008; Arroll et al., 2009; Fournier et al., 2010; Gibbons et al., 2012; Undurraga & Baldessarini, 2012).
Hieronymus et al. questioned how we managed to locate two Eli Lilly-sponsored studies, HMAQa (Eli Lilly, 2004a) and HMATb (Eli Lilly, 2004d), yet seem to have missed two other studies, HMAQb (Eli Lilly, 2004b) and HMATa (Eli Lilly, 2004c), in the same repository. We clarify that we searched the repositories of the pharmaceutical companies that produce SSRIs (Jakobsen et al., 2013a), and when we searched the Eli Lilly repository with the search term 'fluoxetine' during our earlier search, we did not retrieve any of the above studies. We were able to locate the Eli Lilly-sponsored trials HMATb (Eli Lilly, 2004d) and HMAQa (Eli Lilly, 2004a) because we found the published papers of these two trials (Goldstein et al., 2002, 2004) during our screening. We did not search the repositories of Pharmacia & Upjohn and Novartis, as these companies do not produce any of the SSRIs. Hence, we could not identify the three unpublished trials of Pharmacia & Upjohn (2001a,b,c) or the unpublished Novartis trial (Novartis, 2009). We acknowledge that we missed one GSK trial (GlaxoSmithKline, 2005b), but we do not agree with Hieronymus et al. that we missed another GSK trial (Study No.: 29060/442) (GlaxoSmithKline, 2005f). We excluded this trial because its primary diagnosis was dysthymia concomitant with depression, whereas our review included only trials in which the primary diagnosis was major depressive disorder (Jakobsen et al., 2013a). Hieronymus et al. claim that, while we included one trial of a substance P antagonist that did not include a placebo arm (Ball et al., 2014), we missed four additional studies of the same drug (reported in two publications) that were actually both placebo- and paroxetine-controlled (Keller et al., 2006; Liu et al., 2008). During our initial searches, it was not feasible to go through the full texts of the large number of records obtained, so we first screened the records based on titles and abstracts.
For some records, however, no abstract was available in the database, and we screened those records based on titles alone. In such instances, we only screened the full text if the title indicated that the reported trial was related to SSRIs and was a randomised trial. The titles 'Lack of efficacy of the substance P (neurokinin 1 receptor) antagonist aprepitant in the treatment of major depressive disorder' and 'Is bigger better for depression trials?' did not give any indication that they were randomised trials of SSRIs. Therefore, they were excluded. In our review, we only included trials in which the diagnosis of major depressive disorder was made according to one of the standardised criteria, such as ICD-10 (World Health Organization, 1993), DSM-III (American Psychiatric Association, 1980), DSM-III-R (American Psychiatric Association, 1987), DSM-IV (American Psychiatric Association, 1994) or the Feighner criteria (Feighner et al., 1972). Some of the trials that Hieronymus et al. claim we missed were excluded from our review because the reports did not mention how major depressive disorder was diagnosed (Massana, 1998; Eli Lilly, 2014). Hieronymus et al. were mistaken in claiming that we missed the Eyding et al. study (Eyding et al., 2010), which is actually a systematic review and does not report any trial. Regarding trials NCT00636246 and NCT00406952 (Pfizer, 2008a,b), no results were reported for these studies at clinicaltrials.gov, and there were no records when we searched the Pfizer repository (https://www.pfizer.com/research/research_clinical_trials/trial_results) with these NCT numbers. As all our searches were conducted before 2016, we could not identify the trial results of protocol CL3-01574-237, which were published in October 2016 (EudraCT, 2016). We acknowledge that we missed one published trial (Gastpar et al., 2006). It is unfortunate that we were unable to locate a few trials, and we are thankful to Hieronymus et al. for drawing our attention to these missed trials. Nevertheless, we reiterate that Hieronymus et al. seem to ignore that, when our review was published, it included more than twice as many trials as any previous meta-analysis or systematic review (Khan et al., 2002; Kirsch et al., 2008; Arroll et al., 2009; Fournier et al., 2010; Gibbons et al., 2012; Undurraga & Baldessarini, 2012).
We have now included the data from the missed trials and performed the HDRS17 efficacy analysis. Random-effects meta-analysis of the updated data revealed a mean difference between SSRIs and no SSRIs of −2.06 points (95% CI −2.36 to −1.75; p < 0.00001), which differs by only 0.12 HDRS17 points from the estimate in our published review (Jakobsen et al., 2017a). This difference has no clinical impact on the results or conclusions of our systematic review.
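As a rough illustration of what a random-effects meta-analysis of mean differences involves, the sketch below uses the DerSimonian-Laird estimator of the between-trial variance. This is an assumption made for illustration only: the text does not state which random-effects estimator was used, and the function name and any trial data fed to it are hypothetical.

```python
import math


def dersimonian_laird(effects, variances, z=1.96):
    """Random-effects pooling of per-trial mean differences (DerSimonian-Laird).

    effects: per-trial mean differences; variances: their squared standard errors.
    Returns (pooled estimate, lower limit, upper limit, tau^2).
    """
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-trial variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - z * se, pooled + z * se, tau2
```

Going the other way, and assuming a symmetric normal-approximation interval, the reported 95% CI implies a standard error of about (−1.75 − (−2.36)) / (2 × 1.96) ≈ 0.16 and hence a z statistic of roughly 13, which is consistent with p < 0.00001.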
We do not agree with Hieronymus et al. (2018a) when they claim that we confirmed deviating from our protocol in our response (Katakam et al., 2018), and we also do not agree with their statement 'deviating from the protocol is usually regarded as a felony of the gravest kind by Cochranists and treated it as lapse …'. We are in fact surprised by this statement, as we have never deviated from the protocol. We clearly stated in our protocol that 'We will undertake this meta-analysis according to the recommendations stated in The Cochrane Handbook for Systematic Reviews of Interventions (Higgins & Green, 2011)'. The Cochrane Handbook mentions computational problems when no events are observed in one or both groups and suggests alternative, non-fixed zero-cell corrections as explored by Sweeting et al. (2004). It is clearly stated in the Cochrane Handbook that, in case of rare events, '… including a correction proportional to the reciprocal of the size of the contrasting study arm, which they found preferable to the fixed 0.5 correction when arm sizes were not balanced …'. Hence, we do not agree with Hieronymus et al.'s claim that we again failed to implement the procedure according to the recommendations of Sweeting et al. (2004) with regard to the use of reciprocal zero-cell correction, and we reject their statement that 'the results would not have been those that Jakobsen and co-workers must have hoped for'. Hieronymus et al. expressed surprise that we refrained from using Sweeting's method for events that were even rarer than SAEs in general, such as individual adverse events including suicides, suicide attempts and suicidal ideation. We clarify that the data for these events were so limited that, whatever method we used, there was not enough information to confirm or reject even very large effects of SSRIs.
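To make the distinction between the fixed 0.5 correction and the reciprocal ('treatment arm') correction concrete, the sketch below computes an odds ratio for a single trial with either correction. It assumes that the two arm-level corrections sum to 1 (so balanced arms receive 0.5 each, exactly as with the fixed correction) and that a correction is applied only when at least one cell is zero; the function name is hypothetical, and this is an illustration of the idea rather than the code behind any published analysis.

```python
import math


def odds_ratio_with_correction(e_t, n_t, e_c, n_c, method="fixed", total_correction=1.0):
    """Odds ratio for one 2x2 table (treatment vs control) with a zero-cell correction.

    method="fixed": add total_correction / 2 (0.5 by default) to every cell.
    method="reciprocal": each arm's cells get a correction proportional to the
    reciprocal of the size of the opposite arm (k_t / k_c = n_t / n_c),
    with k_t + k_c = total_correction.
    """
    if method == "fixed":
        k_t = k_c = total_correction / 2
    elif method == "reciprocal":
        r = n_t / n_c                          # ratio of arm sizes
        k_t = total_correction * r / (r + 1)   # proportional to 1 / n_c
        k_c = total_correction / (r + 1)       # proportional to 1 / n_t
    else:
        raise ValueError(f"unknown method: {method}")
    if 0 not in (e_t, n_t - e_t, e_c, n_c - e_c):
        k_t = k_c = 0.0  # no zero cell, so no correction is needed
    a, b = e_t + k_t, n_t - e_t + k_t          # treatment: events, non-events
    c, d = e_c + k_c, n_c - e_c + k_c          # control: events, non-events
    return (a * d) / (b * c)


# Unbalanced hypothetical trial (4:1 allocation) with no treatment-arm events.
print(round(odds_ratio_with_correction(0, 200, 1, 50, method="fixed"), 2))       # 0.08
print(round(odds_ratio_with_correction(0, 200, 1, 50, method="reciprocal"), 2))  # 0.16
```

With balanced arms the two corrections coincide; with the 4:1 allocation above they differ roughly twofold, which is the situation in which the reciprocal correction was found preferable.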
As specified in our protocol, we assessed the effect of SSRIs on the occurrence of SAEs across all trials without accounting for age, and reported a p value of 0.009 in our systematic review (Jakobsen et al., 2017a). After correcting valid mistakes and including missed trials, we reanalysed the data and reported a p value of 0.002 in our response to Hieronymus et al.'s earlier critique (Hieronymus et al., 2018b). An updated reanalysis including data from the trials reported as missing by Hieronymus et al. in their recent critique (Hieronymus et al., 2018a) still confirms our earlier conclusions: SSRIs significantly increase the risk of an SAE, the p value now being 0.012.
Hieronymus et al. introduced a subgroup analysis of SAEs by age group in their earlier critique (Hieronymus et al., 2018b). We reported a p value of 0.045 for the non-elderly group (Katakam et al., 2018), and our updated reanalysis revealed a p value of 0.22 in the non-elderly population. However, it must be stressed that such post hoc sensitivity analyses should only be regarded as hypothesis-generating and cannot, of course, change the overall results and conclusions! When analysing large data sets such as ours, multiplicity will lead to several random errors caused by multiple comparisons. If Hieronymus et al. believe that SSRIs offer more benefit than harm in certain patient groups, then they must present valid evidence confirming this claim.
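The multiplicity point can be quantified with a standard formula: if m independent tests are each carried out at significance level alpha, the probability of at least one false-positive result is 1 − (1 − alpha)^m. The sketch assumes independent tests (subgroup analyses of one data set are usually correlated, so this is only an approximation) and a hypothetical number of comparisons; the Bonferroni threshold alpha/m is shown as one common, conservative remedy.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / m


m = 10  # hypothetical number of post hoc subgroup comparisons
print(round(familywise_error_rate(0.05, m), 3))  # 0.401
print(bonferroni_threshold(0.05, m))
```

With ten such comparisons, the chance of at least one nominally significant result arising by chance alone is about 40%.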
In their recent critique (Hieronymus et al., 2018a), Hieronymus et al. presented results from sensitivity analyses of four overlapping populations, but they excluded from all their analyses six trials that we deemed eligible, citing different reasons. We do not agree with the exclusion of these trials, for the following reasons. As the reporting of SAEs was very poor, we considered events that fulfil the definition of an SAE according to the ICH-GCP guidelines (ICH-GCP, 1996). The Adamson et al. publication (Adamson et al., 2015) mentions that two patients were unblinded during the treatment period because of suicidal ideation and severe abdominal cramps. We believe these events are SAEs, as the publication states that they required unblinding of the patients (Adamson et al., 2015). The Claghorn et al. publication (Claghorn et al., 1996) mentions that three fluvoxamine-treated patients had clinically significant electrocardiogram deteriorations, which we considered SAEs. Ravindran et al. (1995), under the heading 'Safety', explicitly used the term 'serious side effects' when referring to Table 2 of their publication; hence we still believe that these events are SAEs. We therefore do not agree with Hieronymus et al.'s decision to exclude these three trials for 'not presenting SAEs and/or selectively presenting potential SAEs'. Hieronymus et al. excluded the Ball et al. trial (Ball et al., 2014) for not being placebo-controlled. Surprisingly, however, they included the comparison of naltrexone versus naltrexone plus sertraline from the Pettinati et al. trial (Pettinati et al., 2010) in their recent analyses (Hieronymus et al., 2018a). As explained earlier, we explicitly included trials comparing SSRIs versus no intervention, placebo or 'active' placebo in our review, and hence we do not agree with the exclusion of the Ball et al. trial (Ball et al., 2014). Hieronymus et al. also excluded one trial for being partially uncontrolled (GlaxoSmithKline, 2005c). Although this exclusion can be debated (there were data on adverse events in SSRI-exposed participants as well as in a placebo control group at one centre and an active control group at another centre), we have now also excluded this trial from our reanalysis. For the Mancino et al. trial (Mancino et al., 2014), Hieronymus et al. claim that no SAEs occurred in the relevant arms according to the study report on ClinicalTrials.gov (2011). We do not agree, as Fig. 1 of the Mancino et al. (2014) publication clearly states that one participant in the sertraline group was hospitalised. We think Hieronymus et al. are biased in this regard, as they selectively picked the report on ClinicalTrials.gov (2011), which suited their claim, and ignored the publication of the trial, which reported the event. Although we do not agree with their results because of the exclusion of several trials that we deemed eligible, it is surprising that Hieronymus et al. are still unwilling to reconsider the conventional view regarding the tolerability of SSRIs, even after they found a significant difference in the elderly subgroup (p range: 0.007–0.011) for all four populations of our data.
We agree with Hieronymus et al., however, that some trials may still have been overlooked, as it is not possible to retrieve all the trials that have been conducted. Even if we found all the trials, there would be no guarantee that we could retrieve all SAE data, as only around half of the events appear in published journal articles (Hughes et al., 2014). Systematic reviewing is an ongoing process: updates to a systematic review are conducted at regular intervals to include missed and new trials, and the results are published after each update.
We have now conducted analyses for three different sets of data, namely (i) data from the trials included in our original review (Jakobsen et al., 2017a), after correcting an event in the placebo group of trial 99024 (Lundbeck, 2005) and including the SAEs in the comparison of naltrexone versus naltrexone plus sertraline in the Pettinati et al. trial (Pettinati et al., 2010); (ii) the latter data plus data from the additional trials included in our earlier response (Katakam et al., 2018), with one trial (GlaxoSmithKline, 2005c) excluded for being partially uncontrolled; and (iii) the latter data plus data from the trials reported missing by Hieronymus et al. (2018a) and judged eligible by us (Eli Lilly, 2004c; GlaxoSmithKline, 2005b; Gastpar et al., 2006; Keller et al., 2006; Pfizer, 2008c,d; Novartis, 2009). We did not consider the serious treatment-emergent signs and symptoms in the three unpublished trials of Pharmacia & Upjohn (2001a,b,c) as SAEs. The reanalysis showed that the association between SSRIs and SAEs is still significant for the full population (Table 1), confirming the results presented in our original review (Jakobsen et al., 2017a). Moreover, the fact that there may be age strata without statistical significance regarding SAEs does not exclude that one should consider using such interventions merely based on the risks of the occurrence of SAEs (European Medicines Agency, 2017).
We acknowledge that Jakobsen and Gluud, in an earlier publication regarding HDRS17, concluded that 'There seems to be a need for other more clinically relevant assessment methods' (Jakobsen et al., 2013b). We have never claimed that HDRS17 is the perfect scale. That was also why we planned to include other depression rating scales (e.g. MADRS or BDI) in our review, and the results obtained with the other scales were, and remain, very similar to the HDRS17 results, that is, effect sizes far below sensible thresholds for clinical significance. We focused on the HDRS17 scale as it is the most widely used and internationally accepted depression rating scale. Moreover, HDRS17 is the recommended depression rating scale used by the international psychiatric society. Hieronymus et al. claimed that, in our review, we refrained from mentioning that 'the shortcomings marring the HDRS17 have been suggested to make the difference between active drug and placebo appear smaller than it actually is' (Hieronymus et al., 2018a). We emphasise that the objective of our systematic review was to assess the beneficial and harmful effects of SSRIs, not to assess the psychometric validity of HDRS17. Furthermore, all the other scales show similar results, and we have not identified any valid evidence confirming their claim.
We think that the efficacy data presented by our group confirmed previous reports, and we do not agree with Hieronymus et al.'s claim that we are mistaken when arguing that these results suggest the effect of SSRIs to be clinically insignificant. We think this disagreement arises because they do not accept the HDRS17 as a measure of effect and suggest using an alternative measure such as the HDRS6. To support their claim, they cited an earlier study (Hieronymus et al., 2016), a patient-level post hoc analysis of 18 industry-sponsored placebo-controlled trials of paroxetine, citalopram, sertraline or fluoxetine. The authors reported a standardised mean difference (SMD) effect size of −0.35 when using the HDRS6 and −0.27 when using the HDRS17 in favour of SSRIs (Jakobsen et al., 2013b). Please note that an SMD of 0.35 is far below the National Institute for Clinical Excellence (NICE) threshold for clinical significance (SMD of 0.5). Furthermore, if a standard deviation of 10 points is assumed, the HDRS17 SMD of 0.27 corresponds to a mean difference of only 2.7 HDRS17 points. If Hieronymus et al. think that we are mistaken regarding clinical significance, then we wonder what SMD on the HDRS17 they would consider clinically significant. We used a mean difference of more than 3 points on the HDRS17 as the threshold for clinical significance (Jakobsen et al., 2013a), as previously recommended by NICE (Kirsch et al., 2008; Fournier et al., 2010; Mathews et al., 2015). In fact, earlier studies showed that a difference of 3 points on the HDRS17 is considered 'no clinical change' and usually cannot be detected by clinicians (Leucht et al., 2013; Moncrieff & Kirsch, 2015). A minimum mean difference of 7 points or more is needed to show a meaningful improvement (Moncrieff & Kirsch, 2015).
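The relation between an SMD and raw HDRS17 points used in this argument follows from the definition SMD = MD / SD, so MD = SMD × SD. The following is a minimal Python sketch, assuming a common standard deviation of 10 HDRS17 points as in the text; the function names are hypothetical, and the thresholds (an SMD of 0.5 and a mean difference of more than 3 HDRS17 points) are those cited above. It is an illustration only, not the analysis code used in the review.

```python
# Minimal sketch: converting between a standardised mean difference (SMD)
# and raw HDRS17 points, and checking the two thresholds cited in the text.
# Function names are hypothetical; this is not the review's analysis code.


def smd_to_points(smd: float, sd: float) -> float:
    """Mean difference in raw scale points: MD = SMD * SD."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return smd * sd


def points_to_smd(md: float, sd: float) -> float:
    """Standardised mean difference: SMD = MD / SD."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return md / sd


def meets_smd_threshold(smd: float, threshold: float = 0.5) -> bool:
    """True if |SMD| reaches the NICE threshold of 0.5 SMD."""
    return abs(smd) >= threshold


def meets_hdrs17_threshold(md: float, threshold: float = 3.0) -> bool:
    """True if |MD| is more than 3 HDRS17 points."""
    return abs(md) > threshold


if __name__ == "__main__":
    ASSUMED_SD = 10.0  # assumed SD of HDRS17 scores, as in the text
    # Express each threshold on the other scale under the assumed SD.
    print(smd_to_points(0.5, ASSUMED_SD))  # NICE SMD threshold in HDRS17 points
    print(points_to_smd(3.0, ASSUMED_SD))  # 3-point threshold expressed as an SMD
```

Under the assumed SD of 10, the NICE threshold of 0.5 SMD corresponds to 5 HDRS17 points, and the 3-point threshold corresponds to an SMD of 0.3. Both figures scale linearly with the assumed SD, so a different SD changes the conversion but not the method.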
Hence, we do not agree with Hieronymus et al.'s claim that 'not just the use of the HDRS17 as measure of effect, but also many other methodological problems marring antidepressant trials, can be expected to make SSRIs appear less effective than they actually are'. Setting aside the relative merits of different scales, the high prevalence of depression (Lewer et al., 2015) decades after the introduction of these drugs, together with the enormous increase in antidepressant prescription rates (NHS Digital, 2016), indicates that antidepressants are not as effective as they were thought to be.

Concluding remarks
We do not agree with Hieronymus et al.'s conclusion that there were inaccuracies, misleading statements and bias in our responses (Hieronymus et al., 2018a). However, we acknowledge that in our response to their earlier critique (Katakam et al., 2018) we did not sufficiently clarify some of the issues raised by Hieronymus et al. (2018b). We have now addressed these issues in detail in the present response. We do not agree with Hieronymus et al.'s claim that our analysis regarding treatment of hepatitis C is flawed, and we wonder why they judge it to be flawed. We also wonder how they can conclude that interest in Cochrane checklists and handbooks can never substitute for actual insight into the subject of study. In conclusion, after accepting Hieronymus et al.'s valid suggestions for amendments, our updated analyses confirm our previous findings and conclusions. The harmful effects of SSRIs seem to outweigh the minimal (or non-existent) beneficial effects that SSRIs might have. Absence of evidence of harmful effects in young adults is not evidence of absence of harmful effects in this age group, considering that SSRIs seem to increase the risk of SAEs in children (Olfson et al., 2006; Sharma et al., 2016; Gøtzsche, 2017) as well as in the elderly (Katakam et al., 2018). Regulatory guidelines do not require statistical significance before advising against interventions (European Medicines Agency, 2017).