
Generalizable and scalable multistage biomedical concept normalization leveraging large language models

Published online by Cambridge University Press:  12 March 2025

Nicholas J. Dobbins*
Affiliation:
Biomedical Informatics and Data Science, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
*
Corresponding author: Nicholas J. Dobbins; Email: nic.dobbins@jhu.edu

Abstract

Background

Biomedical entity normalization is critical to biomedical research because the richness of free-text clinical data, such as progress notes, can often be fully leveraged only after translating words and phrases into structured and coded representations suitable for analysis. Large Language Models (LLMs), in turn, have shown great potential and high performance in a variety of natural language processing (NLP) tasks, but their application for normalization remains understudied.

Methods

We applied both proprietary and open-source LLMs in combination with several rule-based normalization systems commonly used in biomedical research. We used a two-step LLM integration approach: (1) using an LLM to generate alternative phrasings of a source utterance, and (2) using an LLM to prune candidate UMLS concepts, with a variety of prompting methods. We measured results by $F_{\beta }$, where we favor recall over precision, and by F1.
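The two-step approach described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, prompt wording, and the stubbed `llm` call (standing in for GPT-3.5-turbo or Vicuna) are assumptions, and candidate retrieval from a system such as MetaMapLite or QuickUMLS is represented by a plain dictionary of CUIs.

```python
# Illustrative two-stage normalization sketch (hypothetical names and prompts).
# Stage 1: ask an LLM for alternative phrasings of the source utterance.
# Stage 2: present candidate UMLS concepts as a multiple-choice prompt and
# keep only the concepts the LLM selects.

def llm(prompt: str) -> str:
    """Stub standing in for a call to GPT-3.5-turbo or Vicuna."""
    if prompt.startswith("List synonyms"):
        return "heart attack; myocardial infarction; MI"
    return "A"  # stubbed multiple-choice answer

def generate_phrasings(term: str) -> list[str]:
    """Stage 1: expand the source utterance into alternative phrasings."""
    reply = llm(f"List synonyms or alternative phrasings of: {term}")
    return [p.strip() for p in reply.split(";") if p.strip()]

def prune_candidates(term: str, candidates: dict[str, str]) -> list[str]:
    """Stage 2: multiple-choice prompt over candidate CUIs; keep selections."""
    letters = "ABCDEFGH"
    options = "\n".join(f"{letters[i]}. {name} ({cui})"
                        for i, (cui, name) in enumerate(candidates.items()))
    answer = llm(f"Which concepts match '{term}'?\n{options}")
    chosen = {letters.index(c) for c in answer if c in letters[:len(candidates)]}
    return [cui for i, cui in enumerate(candidates) if i in chosen]

def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F_beta score; beta > 1 weights recall more heavily than precision."""
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

In a real pipeline the candidates would come from a normalization system's retrieval step, and `beta` would be set above 1 to reflect the paper's stated preference for recall over precision.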

Results

We evaluated a total of 5,523 concept terms and text contexts from a publicly available dataset of human-annotated biomedical abstracts. Incorporating GPT-3.5-turbo increased overall $F_{\beta }$ and F1 across normalization systems by +16.5 and +16.2 (OpenAI embeddings), +9.5 and +7.3 (MetaMapLite), +13.9 and +10.9 (QuickUMLS), and +10.5 and +10.3 (BM25), while the open-source Vicuna model achieved +20.2 and +21.7 (OpenAI embeddings), +10.8 and +12.2 (MetaMapLite), +14.7 and +15.0 (QuickUMLS), and +15.6 and +18.7 (BM25).

Conclusions

Existing general-purpose LLMs, both proprietary and open-source, can be leveraged to greatly improve normalization performance using existing tools, with no fine-tuning.

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY-ND
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NoDerivatives licence (https://creativecommons.org/licenses/by-nd/4.0), which permits re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Research Synthesis Methodology

Figure 1 Diagram of our multi-stage normalization strategy.


Figure 2 Visual example of our Multiple Choice and Binary Choice prompting strategies.


Table 1 Results of our first experiments using LLMs to generate and normalize synonyms and alternative phrasings of an initial utterance


Table 2 Results of experiments to determine an optimal prompting strategy for concept pruning


Table 3 Results of our final experiment to apply our best-performing prompting strategy using all normalization systems and LLMs

Supplementary material: Dobbins supplementary material (File, 114.3 KB)