
Using electronic medical records to enable large-scale studies in psychiatry: treatment resistant depression as a model

  • R. H. Perlis (a1) (a2), D. V. Iosifescu (a1) (a3), V. M. Castro (a4), S. N. Murphy (a5), V. S. Gainer (a4), J. Minnier (a6), T. Cai (a6), S. Goryachev (a4), Q. Zeng (a7), P. J. Gallagher (a2), M. Fava (a1), J. B. Weilburg (a1), S. E. Churchill (a8), I. S. Kohane (a9) and J. W. Smoller (a2)...



Electronic medical records (EMR) provide a unique opportunity for efficient, large-scale clinical investigation in psychiatry. However, such studies will require development of tools to define treatment outcome.


Natural language processing (NLP) was applied to classify notes from 127 504 patients with a billing diagnosis of major depressive disorder, drawn from out-patient psychiatry practices affiliated with multiple large New England hospitals. Classifications were compared with results from billing data (ICD-9 codes) alone and with a clinical gold standard based on chart review by a panel of senior clinicians. These cross-sectional classifications were then used to define longitudinal treatment outcomes, which were compared with a clinician-rated gold standard.
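As an illustration of how a classifier can be scored against a chart-review gold standard, the sketch below computes the area under the receiver operating characteristic curve from the rank-sum identity. It is not the study's code: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only to show the calculation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = gold-standard positive, 0 = gold-standard negative.
    scores: classifier output; higher means more likely positive.
    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the study.
gold = [1, 1, 1, 0, 0, 0]
nlp_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # continuous model output
billing_scores = [1, 0, 0, 1, 0, 0]          # presence of a billing code

print(roc_auc(gold, nlp_scores))
print(roc_auc(gold, billing_scores))
```

A value of 0.5 corresponds to chance-level discrimination, which is why a binary billing indicator that is unrelated to the gold standard scores near that level.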


Models incorporating NLP were superior to those relying on billing data alone for classifying current mood state (area under receiver operating characteristic curve of 0.85–0.88 v. 0.54–0.55). When these cross-sectional visits were integrated to define longitudinal outcomes and incorporate treatment data, 15% of the cohort remitted with a single antidepressant treatment, while 13% were identified as failing to remit despite at least two antidepressant trials. Non-remitting patients were more likely to be non-Caucasian (p<0.001).
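The step from visit-level classifications to a longitudinal outcome can be sketched as a simple rule over a patient's ordered visits. The rule below is an assumption made for illustration, not the definition used in the study: the `Visit` fields, the function `classify_outcome`, and the three outcome labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    day: int             # days since first visit (hypothetical field)
    antidepressant: str  # drug prescribed at this visit (hypothetical field)
    remitted: bool       # cross-sectional mood-state classification


def classify_outcome(visits: List[Visit]) -> str:
    """Label a patient from visit-level classifications.

    Assumed rule, for illustration only:
      - "remitted_first_trial": remission observed while on the first drug
      - "non_remitter": no remission across at least two distinct drugs
      - "indeterminate": anything else
    """
    ordered = sorted(visits, key=lambda v: v.day)
    drugs_tried: List[str] = []
    for v in ordered:
        if v.antidepressant not in drugs_tried:
            drugs_tried.append(v.antidepressant)
        if v.remitted:
            # First remission decides the label.
            return "remitted_first_trial" if len(drugs_tried) == 1 else "indeterminate"
    return "non_remitter" if len(drugs_tried) >= 2 else "indeterminate"


# Hypothetical patients, not taken from the study.
responder = [Visit(0, "drug_a", False), Visit(42, "drug_a", True)]
resistant = [Visit(0, "drug_a", False), Visit(60, "drug_b", False)]
print(classify_outcome(responder))
print(classify_outcome(resistant))
```

A real definition would also need to account for trial adequacy (dose and duration), which this sketch ignores.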


The application of bioinformatics tools such as NLP should allow accurate and efficient determination of longitudinal outcomes, so that existing EMR data can be applied to clinical research, including biomarker investigations. Continued development will be required to better address moderators of outcome such as adherence and co-morbidity.


Corresponding author

*Address for correspondence: Dr R. H. Perlis, Simches Research Building, 185 Cambridge St, 6th Floor, Boston, MA 02114, USA




