
Using electronic medical records to enable large-scale studies in psychiatry: treatment resistant depression as a model

  • R. H. Perlis (a1) (a2), D. V. Iosifescu (a1) (a3), V. M. Castro (a4), S. N. Murphy (a5), V. S. Gainer (a4), J. Minnier (a6), T. Cai (a6), S. Goryachev (a4), Q. Zeng (a7), P. J. Gallagher (a2), M. Fava (a1), J. B. Weilburg (a1), S. E. Churchill (a8), I. S. Kohane (a9) and J. W. Smoller (a2)...

Electronic medical records (EMR) provide a unique opportunity for efficient, large-scale clinical investigation in psychiatry. However, such studies will require the development of tools to define treatment outcomes.


Natural language processing (NLP) was applied to classify notes from 127 504 patients with a billing diagnosis of major depressive disorder, drawn from out-patient psychiatry practices affiliated with multiple large New England hospitals. Classifications were compared with results obtained using billing data (ICD-9 codes) alone, and with a clinical gold standard based on chart review by a panel of senior clinicians. These cross-sectional classifications were then used to define longitudinal treatment outcomes, which were compared with a clinician-rated gold standard.
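Comparing a classifier with a chart-review gold standard is commonly summarized as the area under the receiver operating characteristic curve (AUC), the metric reported in the results. The sketch below is an illustration only, not the authors' code: it computes AUC through its Mann-Whitney interpretation (the probability that a randomly chosen gold-standard positive note scores higher than a randomly chosen negative one, with ties counting half). The function name `roc_auc` and the toy scores are hypothetical and are not study data.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney pairwise formulation (illustrative sketch).

    labels: gold-standard label per note (1 = positive, 0 = negative),
            e.g. a hypothetical chart-review rating of current mood state.
    scores: classifier output per note; higher means more likely positive.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): a graded, NLP-style score vs. a binary code flag.
gold = [1, 1, 1, 0, 0, 0]
nlp_like_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5]
billing_flag = [1, 0, 1, 1, 0, 1]
print(roc_auc(nlp_like_scores, gold))  # 8/9, about 0.889
print(roc_auc(billing_flag, gold))     # 0.5, no better than chance
```

A binary flag such as the presence of a billing code yields only one operating point, so its AUC reflects a single sensitivity/specificity trade-off. That limitation is one plausible reason a graded text-derived score can discriminate better.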


Models incorporating NLP were superior to those relying on billing data alone for classifying current mood state (area under the receiver operating characteristic curve, 0.85–0.88 v. 0.54–0.55). When these cross-sectional visit classifications were integrated with treatment data to define longitudinal outcomes, 15% of the cohort remitted with a single antidepressant treatment, while 13% were identified as failing to remit despite at least two antidepressant trials. Non-remitting patients were more likely to be non-Caucasian (p<0.001).
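The step from per-visit mood classifications to longitudinal outcomes can be illustrated with a simple rule over a chronologically ordered visit sequence. This is a minimal sketch under stated assumptions: the abstract does not give the study's exact definitions (for example, trial adequacy or duration criteria), so the `Visit` record, the mood labels, the trial numbering, and the outcome category names below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    # Hypothetical fields: ordinal of the antidepressant trial in progress
    # (1 = first trial, 2 = second, ...) and an NLP-derived visit label.
    trial: int
    mood: str  # assumed labels: "well", "depressed" or "intermediate"


def classify_course(visits: List[Visit]) -> str:
    """Assign a hypothetical longitudinal outcome; visits must be chronological."""
    # The first visit classified as "well" determines which trial achieved remission.
    for v in visits:
        if v.mood == "well":
            return "remitted_first_trial" if v.trial == 1 else "remitted_later_trial"
    # No remission observed: count distinct trials to flag non-remission after 2+ trials.
    trials_seen = {v.trial for v in visits}
    if len(trials_seen) >= 2:
        return "non_remitting_2plus_trials"
    # A single trial without remission is treated as not yet classifiable.
    return "indeterminate"


print(classify_course([Visit(1, "depressed"), Visit(1, "well")]))
print(classify_course([Visit(1, "depressed"), Visit(2, "intermediate")]))
```

Returning an explicit "indeterminate" category, rather than forcing every patient into remitted or non-remitting, keeps patients with too little follow-up out of both groups; the reported 15% and 13% proportions likewise do not cover the whole cohort.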


The application of bioinformatics tools such as NLP should enable accurate and efficient determination of longitudinal outcomes, allowing existing EMR data to be used for clinical research, including biomarker investigations. Continued development will be required to better address moderators of outcome such as adherence and co-morbidity.

Corresponding author
*Address for correspondence: Dr R. H. Perlis, Simches Research Building, 185 Cambridge St, 6th Floor, Boston, MA 02114, USA (Email:
Psychological Medicine
  • ISSN: 0033-2917
  • EISSN: 1469-8978
  • URL: /core/journals/psychological-medicine
Supplementary materials
Perlis Supplementary Material (Word, 1.5 MB)