Automatic generation of lexica for sentiment polarity shifters

Published online by Cambridge University Press:  09 July 2020

Marc Schulder*
Affiliation:
Universität des Saarlandes, Sprach- & Signalverarbeitung, C7 1, 66123 Saarbrücken, Germany; Institut für Deutsche Gebärdensprache, Gorch-Fock-Wall 7, 20354 Hamburg, Germany
Michael Wiegand
Affiliation:
Universität des Saarlandes, Sprach- & Signalverarbeitung, C7 1, 66123 Saarbrücken, Germany; Institut für Deutsche Sprache, R 5, 6-13, 68161 Mannheim, Germany
Josef Ruppenhofer
Affiliation:
Institut für Deutsche Sprache, R 5, 6-13, 68161 Mannheim, Germany
*Corresponding author. E-mail: marc.schulder@uni-hamburg.de

Abstract

Alleviating pain is good and abandoning hope is bad. We instinctively understand how words like alleviate and abandon affect the polarity of a phrase, inverting or weakening it. When these words are content words, such as verbs, nouns, and adjectives, we refer to them as polarity shifters. Shifters are a frequent occurrence in human language and an important part of successfully modeling negation in sentiment analysis; yet research on negation modeling has focused almost exclusively on a small handful of closed-class negation words, such as not, no, and without. A major reason for this is that shifters are far more lexically diverse than negation words, but no resources exist to help identify them. We seek to remedy this lack of shifter resources by introducing a large lexicon of polarity shifters that covers English verbs, nouns, and adjectives. Creating the lexicon entirely by hand would be prohibitively expensive. Instead, we develop a bootstrapping approach that combines automatic classification with human verification to ensure the high quality of our lexicon while reducing annotation costs by over 70%. Our approach leverages a number of linguistic insights; while some features are based on textual patterns, others use semantic resources or syntactic relatedness. The created lexicon is evaluated both on a polarity shifter gold standard and on a polarity classification task.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s) 2020. Published by Cambridge University Press
Fig. 1. Workflow for creating the polarity shifter lexicon.

Table 1. Distribution of polarity shifters in the gold standard. For each part of speech, a random sample of 2000 words was taken from WordNet.

Table 2. Shifter distribution in the Subjectivity Lexicon (Wilson et al. 2005).

Table 3. Analysis of task-specific features (Section 4.2) for the classification of verbs. Features generate a ranked list of potential shifters. Best results are depicted in bold.

Table 4. Classification of polarity shifters for individual parts of speech. SVM features are grouped as task-specific (T), generic (G), and VerbLex (V). The evaluation is run as a 10-fold cross validation. All metrics are macro-averages.
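To make the phrase "macro-averages" concrete, the following is a minimal sketch of how macro-averaged precision, recall, and F1 can be computed for a two-class shifter task. The function name `macro_scores` and the label strings are illustrative assumptions, not taken from the article, and the article may compute macro F1 differently (for example, from the averaged precision and recall rather than as the mean of per-class F1).

```python
from typing import Dict, Sequence


def macro_scores(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Macro-average precision, recall and F1 over all labels seen in gold or pred.

    Each class contributes equally to the average, regardless of its size.
    Here macro F1 is the unweighted mean of per-class F1 scores (an assumption).
    """
    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only; not data from the article.
gold_labels = ["shifter", "shifter", "non-shifter", "non-shifter"]
pred_labels = ["shifter", "non-shifter", "non-shifter", "non-shifter"]
scores = macro_scores(gold_labels, pred_labels)
```

Because shifters are a minority class, macro-averaging keeps a classifier from scoring well simply by labeling everything as a non-shifter.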

Fig. 2. Learning curves for supervised training. This repeats the evaluation of Section 5.2 but reduces the amount of training data. At 90% training data, this task is identical to the one reported in Table 4.

Fig. 3. Bootstrapping of shifters that were not part of the gold standard (compare Table 1). Each bar represents the number of words that a classifier predicted to be shifters, separated by how many of them are actually shifters (true positives) and how many are misclassified non-shifters (false positives).

Fig. 4. Evaluation of the bootstrapping of shifters that were not part of the gold standard (see Section 3.1). SVM classifiers provide a confidence value for each label they assign. We split the set of potential shifters into quarters, sorting them from highest (Q1) to lowest (Q4) confidence. We report precision for each quarter.
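The quarter-wise evaluation described in this caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the `Candidate` tuple layout, the function name `precision_by_quarter`, and the handling of sizes that do not divide evenly by four are all hypothetical.

```python
from typing import List, Optional, Tuple

# (word, classifier confidence, verified-as-shifter flag).
# This tuple layout is an assumption made for the sketch.
Candidate = Tuple[str, float, bool]


def precision_by_quarter(candidates: List[Candidate]) -> List[Optional[float]]:
    """Sort predicted shifters by confidence and report precision for Q1..Q4.

    Q1 holds the most confident predictions, Q4 the least confident.
    A quarter with no items (fewer than four candidates) yields None.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    n = len(ranked)
    # Integer boundaries that split the ranked list into four near-equal parts.
    bounds = [i * n // 4 for i in range(5)]
    precisions: List[Optional[float]] = []
    for start, end in zip(bounds, bounds[1:]):
        quarter = ranked[start:end]
        if not quarter:
            precisions.append(None)
            continue
        true_positives = sum(1 for _, _, is_shifter in quarter if is_shifter)
        precisions.append(true_positives / len(quarter))
    return precisions


# Placeholder candidates for illustration only; not data from the article.
example = [
    ("w5", 0.5, True), ("w1", 0.9, True), ("w8", 0.2, False), ("w3", 0.7, True),
    ("w2", 0.8, True), ("w6", 0.4, False), ("w4", 0.6, False), ("w7", 0.3, False),
]
quarter_precision = precision_by_quarter(example)
```

If precision falls from Q1 to Q4, the classifier's confidence is a useful signal for deciding which candidates are worth sending to a human annotator first.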

Table 5. Result of the lexicon generation workflow outlined in Figure 1. The complete lexicon contains both the gold and bootstrap lexica.

Table 6. Annotation example for the sentiment analysis impact evaluation. The annotator determines the polarities of the polar noun and of the verb phrase given the sentence context. If the polarities differ, the shifting label is “shifted”; otherwise it is “not shifted”.
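The labeling rule in this caption amounts to a single comparison, sketched below. The function name `shifting_label` and the polarity strings are assumptions for illustration; the two "shifted" examples reuse the phrases from the abstract, while the "not shifted" example is hypothetical.

```python
def shifting_label(noun_polarity: str, vp_polarity: str) -> str:
    """Return "shifted" when the verb phrase polarity differs from the noun's."""
    return "shifted" if noun_polarity != vp_polarity else "not shifted"


# "abandon hope": the noun "hope" is positive, the verb phrase is negative.
abandon_hope = shifting_label("positive", "negative")
# "alleviate pain": the noun "pain" is negative, the verb phrase is positive.
alleviate_pain = shifting_label("negative", "positive")
# Hypothetical non-shifting case: a positive noun inside a positive verb phrase.
keep_hope = shifting_label("positive", "positive")
```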

Table 7. Classifier performance on the sentiment analysis task of determining whether shifting occurs between a polar noun and the VP that contains it (see Section 7.1).

Table 8. Comparison between lemma- and sense-level shifter lexica on the sentiment analysis task (see Section 7.1). SENSE$_{\text{first}}$ assigns the first WordNet sense to each verb. SENSE$_{\text{oracle}}$ always chooses an appropriate word sense where possible.