
Development of a natural language-processing application for LGBTQ+ status in mental health records

Published online by Cambridge University Press:  13 October 2025

Margaret Heslin*
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
Jaya Chaturvedi
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
Anne Marie Bonnici Mallia
Affiliation:
South London and Maudsley NHS Foundation Trust, London, UK
Ace Taaca
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
Diogo Pontes
Affiliation:
South London and Maudsley NHS Foundation Trust, London, UK
Charvi Saraswat
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK; South London and Maudsley NHS Foundation Trust, London, UK
Charlotte Woodhead
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
Katharine A. Rimes
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
David Chandran
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
Jyoti Sanyal
Affiliation:
South London and Maudsley NHS Foundation Trust, London, UK
Ruimin Ma
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
Robert Stewart
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK; South London and Maudsley NHS Foundation Trust, London, UK
Angus Roberts
Affiliation:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
*Correspondence: Margaret Heslin. Email: Margaret.heslin@kcl.ac.uk

Abstract

Background

Lesbian, gay, bisexual, transgender, queer and related community (LGBTQ+) individuals are at significantly increased risk of mental health problems. However, research on inequalities in LGBTQ+ mental healthcare is limited, because LGBTQ+ status is usually recorded only in unstructured, free-text sections of electronic health records.

Aims

This study investigated whether natural language processing (NLP), specifically the large language model Bidirectional Encoder Representations from Transformers (BERT), can identify LGBTQ+ status from this unstructured text in mental health records.

Method

Using electronic health records from a large mental healthcare provider in south London, UK, relevant search terms were identified and a random sample of 10 000 strings was extracted. Each string contained the 100 characters on either side of a search term. A BERT model was then trained to classify LGBTQ+ status.
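The string-extraction step described in the Method can be sketched as follows. This is a minimal illustration, assuming plain-text notes and simple case-insensitive whole-word matching; the search terms shown are hypothetical placeholders, not the study's list, and `extract_contexts` and `sample_strings` are illustrative helpers rather than the authors' code.

```python
import random
import re

# Hypothetical search terms for illustration only; the study's actual
# search-term list is not reproduced here.
SEARCH_TERMS = ["gay", "lesbian", "bisexual", "transgender"]
WINDOW = 100  # characters kept on either side of a matched term


def extract_contexts(text: str, terms=SEARCH_TERMS, window: int = WINDOW) -> list[str]:
    """Return one string per match: the term plus `window` characters either side."""
    # Whole-word, case-insensitive alternation over the escaped terms (an assumption).
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    contexts = []
    for match in pattern.finditer(text):
        # Clip the window at the start and end of the note.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        contexts.append(text[start:end])
    return contexts


def sample_strings(strings: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw a reproducible simple random sample of up to k strings for annotation."""
    rng = random.Random(seed)
    return rng.sample(strings, min(k, len(strings)))
```

Whole-word matching avoids hits inside unrelated words (for example, a surname such as Gaylord); when two terms occur close together, their windows overlap and yield separate, overlapping strings.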

Results

Of the 10 000 annotations, 14% (1449) confirmed LGBTQ+ status and 86% (8551) did not; the latter included LGBTQ+ negative status, irrelevant annotations and unclear cases. The final BERT model, evaluated on a test set of 2000 annotations, achieved a precision of 0.95 (95% CI 0.93–0.98), a recall of 0.93 (95% CI 0.91–0.96) and an F1 score of 0.94 (95% CI 0.92–0.97).
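The binary outcome and the reported metrics can be illustrated with a short sketch. It assumes a hypothetical category label (`lgbtq_positive`; the actual categories are summarised in Table 2), collapses every other category to the negative class, and computes precision, recall and F1 from true-positive, false-positive and false-negative counts. The abstract does not state how the 95% CIs were derived, so the percentile bootstrap below is an assumed method, not necessarily the authors'. As a consistency check, F1 is the harmonic mean of precision and recall: 2 × 0.95 × 0.93 / (0.95 + 0.93) ≈ 0.94, matching the reported value.

```python
import random
from typing import Sequence

# Hypothetical label for the "LGBTQ+ status confirmed" category (assumption).
POSITIVE_CATEGORY = "lgbtq_positive"


def to_binary(category: str) -> int:
    """Map the positive category to 1; negative, irrelevant and unclear all map to 0."""
    return 1 if category == POSITIVE_CATEGORY else 0


def prf1(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for (precision, recall, F1); an assumed CI method."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = []
    for _ in range(n_resamples):
        # Resample test items with replacement and recompute all three metrics.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(prf1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    cis = []
    for k in range(3):
        values = sorted(s[k] for s in samples)
        lower = values[int((alpha / 2) * n_resamples)]
        upper = values[int((1 - alpha / 2) * n_resamples) - 1]
        cis.append((lower, upper))
    return cis
```

For example, with gold labels `[1, 1, 1, 0, 0, 0, 0, 1]` and predictions `[1, 1, 0, 0, 0, 1, 0, 1]`, there are three true positives, one false positive and one false negative, so `prf1` returns `(0.75, 0.75, 0.75)`.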

Conclusion

LGBTQ+ status can be identified with high precision and recall using this NLP application. The application opens up mental health records to a range of research questions involving LGBTQ+ status, which should be explored further. Future work should extend this approach by developing an application that distinguishes between different LGBTQ+ groups, so that inequalities between these groups can be examined.

Information

Type
Paper
Creative Commons
Creative Commons Licence: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Royal College of Psychiatrists

Fig. 1 Flow diagram of participants and annotations. LGBTQ+, Lesbian, gay, bisexual, transgender, queer and related community; SLaM, South London and Maudsley NHS Foundation Trust.


Table 1 Fine-tuning parameters for the BERT_base binary classification model


Table 2 Summary of the annotated categories


Table 3 Performance metrics (with 95% CI) of the BERT_base model on the test set (n = 2000)


Table 4 Demographics for those identified as LGBTQ+ in the whole Clinical Record Interactive Search (CRIS) database


Table 5 Demographics for individuals active in Clinical Record Interactive Search (CRIS) on the census date by those identified as LGBTQ+ and those not identified as LGBTQ+

Supplementary material: Heslin et al. supplementary material (file, 15.1 KB)