
Using electronic medical records to enable large-scale studies in psychiatry: treatment resistant depression as a model

  • R. H. Perlis (a1) (a2), D. V. Iosifescu (a1) (a3), V. M. Castro (a4), S. N. Murphy (a5), V. S. Gainer (a4), J. Minnier (a6), T. Cai (a6), S. Goryachev (a4), Q. Zeng (a7), P. J. Gallagher (a2), M. Fava (a1), J. B. Weilburg (a1), S. E. Churchill (a8), I. S. Kohane (a9) and J. W. Smoller (a2)
  • Published online: 20 June 2011

Electronic medical records (EMR) provide a unique opportunity for efficient, large-scale clinical investigation in psychiatry. However, such studies will require the development of tools to define treatment outcomes.


Natural language processing (NLP) was applied to classify notes from 127 504 patients with a billing diagnosis of major depressive disorder, drawn from out-patient psychiatry practices affiliated with multiple large New England hospitals. Classifications were compared with results based on billing data (ICD-9 codes) alone and with a clinical gold standard based on chart review by a panel of senior clinicians. These cross-sectional classifications were then used to define longitudinal treatment outcomes, which were compared with a clinician-rated gold standard.
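To make the step from cross-sectional to longitudinal classification concrete, the Python sketch below collapses one patient's visit-level mood labels and recorded antidepressants into a single outcome category. The outcome rule, the "depressed"/"well" labels, and the names `Visit`, `classify_outcome` and `outcome_shares` are hypothetical illustrations only; the abstract does not give the study's actual outcome definitions, and a real definition would have to handle issues this sketch ignores, such as trial duration and adherence.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Visit:
    visit_date: date
    mood_state: str                # hypothetical visit-level label: "depressed" or "well"
    antidepressant: Optional[str]  # antidepressant recorded at this visit, if any


def classify_outcome(visits: List[Visit]) -> str:
    """Collapse a patient's visit-level labels into one longitudinal outcome.

    Hypothetical rule: each distinct antidepressant (in order of first
    appearance) counts as one trial; remission is the first "well" visit
    recorded while the patient is on an antidepressant.
    """
    trials: List[str] = []
    for v in sorted(visits, key=lambda x: x.visit_date):
        if v.antidepressant is not None and v.antidepressant not in trials:
            trials.append(v.antidepressant)
        if v.mood_state == "well" and v.antidepressant is not None:
            # Stop at the first remission and record how many trials preceded it.
            if len(trials) == 1:
                return "remitted_single_trial"
            return "remitted_after_switch"
    if len(trials) >= 2:
        # Never classified as well despite two or more distinct antidepressants.
        return "nonremitted_2plus_trials"
    return "indeterminate"


def outcome_shares(cohort: Dict[str, List[Visit]]) -> Dict[str, float]:
    """Fraction of patients falling into each outcome category."""
    counts: Dict[str, int] = {}
    for visits in cohort.values():
        label = classify_outcome(visits)
        counts[label] = counts.get(label, 0) + 1
    total = len(cohort)
    return {label: n / total for label, n in counts.items()}
```

Counting distinct drugs is a deliberately crude stand-in for an "adequate antidepressant trial"; it is used here only to keep the example short.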


Models incorporating NLP were superior to those relying on billing data alone for classifying current mood state (area under the receiver operating characteristic curve 0.85–0.88 v. 0.54–0.55). When these cross-sectional visit classifications were integrated to define longitudinal outcomes and incorporate treatment data, 15% of the cohort remitted with a single antidepressant treatment, while 13% were identified as failing to remit despite at least two antidepressant trials. Non-remitting patients were more likely to be non-Caucasian (p<0.001).
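The area under the receiver operating characteristic curve can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as one half; 0.5 therefore corresponds to chance-level discrimination. A minimal Python sketch of that pairwise definition follows. The function name `auc` and the toy scores are illustrative and are not data from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for a positive case (e.g. gold-standard current depression), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive/negative pairs is mis-ordered.
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The nested loop is fine for a few hundred chart-reviewed notes; for large samples a rank-based computation, or a library routine such as scikit-learn's `roc_auc_score`, gives the same value more efficiently.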


The application of bioinformatics tools such as NLP should enable accurate and efficient determination of longitudinal outcomes, allowing existing EMR data to be applied to clinical research, including biomarker investigations. Further development will be required to better address moderators of outcome such as adherence and co-morbidity.

Corresponding author
*Address for correspondence: Dr R. H. Perlis, Simches Research Building, 185 Cambridge St, 6th Floor, Boston, MA 02114, USA (Email:

Psychological Medicine
  • ISSN: 0033-2917
  • EISSN: 1469-8978
Supplementary Materials
Perlis Supplementary Material (1.5 MB)