Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora

Published online by Cambridge University Press: 30 June 2020

Clément Dalloux*
Affiliation:
Univ Rennes, Inria, CNRS, IRISA, Campus de Beaulieu, 263 Avenue Général Leclerc, 35042 Rennes, France
Vincent Claveau
Affiliation:
Univ Rennes, Inria, CNRS, IRISA, Campus de Beaulieu, 263 Avenue Général Leclerc, 35042 Rennes, France
Natalia Grabar
Affiliation:
UMR 8163 STL CNRS, Université de Lille, 59000 Lille, France
Lucas Emanuel Silva Oliveira
Affiliation:
Pontifícia Universidade Católica do Paraná (PUC-PR), R. Imac. Conceição, 1155 - Prado Velho, Curitiba - PR, 80215-901, Brazil
Claudia Maria Cabral Moro
Affiliation:
Pontifícia Universidade Católica do Paraná (PUC-PR), R. Imac. Conceição, 1155 - Prado Velho, Curitiba - PR, 80215-901, Brazil
Yohan Bonescki Gumiel
Affiliation:
Pontifícia Universidade Católica do Paraná (PUC-PR), R. Imac. Conceição, 1155 - Prado Velho, Curitiba - PR, 80215-901, Brazil
Deborah Ribeiro Carvalho
Affiliation:
Pontifícia Universidade Católica do Paraná (PUC-PR), R. Imac. Conceição, 1155 - Prado Velho, Curitiba - PR, 80215-901, Brazil
*Corresponding author. E-mail: clement.dalloux@irisa.fr

Abstract

Automatic detection of negated content is often a prerequisite for information extraction systems in various domains. The task matters especially in the biomedical domain, where negation plays an important role. This work makes two main contributions. First, we address two languages that have received little attention so far, Brazilian Portuguese and French: we developed new corpora for both languages and manually annotated them with negation cues and their scopes. Second, we propose supervised machine learning methods for the automatic detection of negation cues and of their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain contexts (general and biomedical language). The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2020. Published by Cambridge University Press

Figure 1. General overview of our work: from the development of annotated corpora in French and Brazilian Portuguese to the automatic detection of negation and its scope.

Table 1. Statistics of the BioScope corpora (Vincze et al. 2008)

Table 2. Statistics on the four corpora created and annotated

Figure 2. Example of a summary from a clinical trial protocol in French.

Figure 3. Example of a detailed description from a clinical trial protocol in French.

Figure 4. Example of a clinical case in French.

Figure 5. Example of a clinical protocol in Brazilian Portuguese.

Figure 6. Example of a clinical narrative in Brazilian Portuguese.

Table 3. Excerpts from the two French corpora. The columns contain linguistic information (lemmas, POS-tag), negation cues, and their scope

Figure 7. Our BiRNN uses LSTM or GRU cells and a softmax or CRF output layer.

Table 4. Results for the cue detection task on the four corpora. The results are given as Precision, Recall and F1-score. The best scores are in bold

Table 5. Results for the scope detection task. The results are given in terms of Precision, Recall and F1-score. The best scores are in bold

Figure 8. Learning curve for the BiLSTM-CRF, without pre-trained word embeddings.
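
Tables 4 and 5 report Precision, Recall and F1-score. As a reading aid, here is a minimal sketch of how these three measures relate. It assumes a token-level binary labelling (1 for a cue or in-scope token, 0 otherwise), which may differ from the evaluation protocol actually used in the article. The function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the article.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[int], pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Token-level scores for a binary labelling (1 = cue or in-scope token)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    # Guard against empty denominators by returning 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical six-token sentence; the labels are invented for illustration.
gold = [0, 1, 1, 1, 0, 0]
pred = [0, 1, 1, 0, 1, 0]
p, r, f = precision_recall_f1(gold, pred)
```

In this toy case the prediction has two true positives, one false positive and one false negative, so all three measures equal 2/3. Stricter protocols, such as counting a scope as correct only when every one of its tokens is matched, would need a different counting step.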