
Augmenting a Spanish clinical dataset for transformer-based linking of negations and their out-of-scope references

Published online by Cambridge University Press:  17 May 2024

Antonio Jesús Tamayo-Herrera*
Affiliation:
Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico
Diego A. Burgos
Affiliation:
Wake Forest University, Winston-Salem, NC, USA
Alexander Gelbukh
Affiliation:
Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico
Corresponding author: Antonio Jesús Tamayo-Herrera; Email: antonio.tamayo@udea.edu.co

Abstract

A negated statement consists of three main components: the negation cue, the negation scope, and the negation reference. The negation cue is the indicator of negation, while the negation scope defines the extent of the negation. The negation reference, which may or may not be within the negation scope, is the part of the statement being negated. Although there has been considerable research on the negation cue and scope, little attention has been given to identifying negation references outside the scope, even though they make up almost half of all negations. In this study, our goal is to identify out-of-scope references (OSRs) to restore the meaning of truncated negated statements identified by negation detection systems. To achieve this, we augment the largest available Spanish clinical dataset by adding annotations for OSRs. Additionally, we fine-tune five robust BERT-based models using transfer learning to address negation detection, uncertainty detection, and OSR identification and linking with their respective negation scopes. Our best model achieves state-of-the-art performance in negation detection while also establishing a competitive baseline for OSR identification (Macro F1 = 0.56) and linking (Macro F1 = 0.86). We support these findings with relevant statistics from the newly annotated dataset and an extensive review of existing literature.
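The abstract describes labeling three components of a negated statement: the cue, the scope, and the (possibly out-of-scope) reference. A minimal sketch of how such token-level annotations can be grouped into labeled spans is shown below. The tag names (B-NEG, B-NSCO, B-OSR) and the example sentence are illustrative assumptions, not the paper's actual NeRUBioS tagset (given in Table 10).

```python
# Illustrative sketch (not the authors' code): grouping BIO-style token
# tags for negation cues (NEG), scopes (NSCO), and out-of-scope
# references (OSR) into labeled spans. Tag names are hypothetical.

def extract_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; flush any span in progress.
            if current_label:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [tok]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the current span.
            current_tokens.append(tok)
        else:
            # Outside tag: close any open span.
            if current_label:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

# In "Abdomen: no doloroso" ("Abdomen: not tender"), the reference
# "Abdomen" lies outside the negation scope, so it is tagged as an OSR.
tokens = ["Abdomen", ":", "no", "doloroso"]
tags = ["B-OSR", "O", "B-NEG", "B-NSCO"]
print(extract_spans(tokens, tags))
# → [('OSR', 'Abdomen'), ('NEG', 'no'), ('NSCO', 'doloroso')]
```

In practice, a fine-tuned BERT-style token classifier would predict the per-token tags, and span grouping like this recovers the cue, scope, and OSR to be linked.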

Information

Type: Article

Licence: Creative Commons CC BY-SA
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-ShareAlike licence (http://creativecommons.org/licenses/by-sa/4.0/), which permits re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Figure 1. Negated statement components.

Table 1. Scenarios of incomplete negated statements

Table 2. Examples of truncated negated statements

Figure 2. Systems performance by approach over time. Rule-based (upper left), machine learning-based (upper right), hybrid (lower left), and deep learning-based (lower right).

Table 3. Related works using rule-based approaches

Table 4. Related works using classic machine learning algorithms

Table 5. Related works based on hybrid approaches

Table 6. Related works using deep learning

Figure 3. Approach overlaps in publications over time.

Table 7. Spanish clinical/biomedical datasets

Table 8. Spanish datasets in other domains

Figure 4. Augmenting NUBES into NeRUBioS.

Table 9. NeRUBioS general statistics

Figure 5. Number of samples across NeRUBioS dataset partitions.

Table 10. NeRUBioS tagset

Figure 6. Negation label distribution.

Figure 7. Uncertainty label distribution.

Figure 8. Out-of-scope negation references across partitions.

Table 11. Typology and examples of OSRs

Figure 9. The model’s architecture.

Table 12. Hyperparameters used during training

Figure 10. Model performance by epochs for OSR detection on the development (left) and testing (right) partitions.

Table 13. Results for OSR detection

Figure 11. Obtained $k$ values (left) and extrapolated F1 scores (right).

Figure 12. Model performance by epochs for negation cues on the development (left) and testing (right) partitions.

Figure 13. Model performance by epochs for negation scopes on the development (left) and testing (right) partitions.

Table 14. Results for negation detection

Figure 14. Model performance by epochs for uncertainty cues on the development (left) and testing (right) partitions.

Figure 15. Model performance by epochs for uncertainty scopes on the development (left) and testing (right) partitions.

Table 15. Results for uncertainty detection

Table 16. Comparison with works doing negation and uncertainty detection using the NUBES dataset

Figure 16. Comparison with state-of-the-art models.

Figure 17. Model performance by epochs for the OSR detection and linking task on the development (left) and testing (right) partitions.

Table 17. Results for the OSR detection and linking task

Figure 18. Model performance by epochs for OSR detection and linking + uncertainty detection on the development (left) and testing (right) partitions.

Table 18. Overall results for the OSR detection and linking task + uncertainty detection

Table 19. Cross-lingual inference with mBERT