
Open-domain extraction of future events from Twitter

Published online by Cambridge University Press:  22 February 2016

FLORIAN KUNNEMAN
Affiliation:
Centre for Language Studies, Radboud University, P.O. Box 9103, NL-6500 HD Nijmegen, the Netherlands e-mails: a.vandenbosch@let.ru.nl, f.kunneman@let.ru.nl
ANTAL VAN DEN BOSCH
Affiliation:
Centre for Language Studies, Radboud University, P.O. Box 9103, NL-6500 HD Nijmegen, the Netherlands e-mails: a.vandenbosch@let.ru.nl, f.kunneman@let.ru.nl

Abstract

Explicit references on Twitter to future events can be leveraged to feed a fully automatic monitoring system of real-world events. We describe a system that extracts open-domain future events from the Twitter stream. It detects future time expressions and entity mentions in tweets, clusters together tweets whose mentions overlap above certain thresholds, and summarizes these clusters into event descriptions that can be presented to users of the system. Terms for the event description are selected in an unsupervised fashion. We evaluated the system on a month of Dutch tweets by showing the top-250 ranked events found in this month to human annotators. Eighty per cent of the candidate events were assessed as an event by at least three of the four human annotators, and sixty-three per cent by all four. An added component that complements event descriptions with additional terms was not assessed as better than the original system, owing to the occasional addition of redundant terms. Comparing the found events to gold-standard events from maintained calendars on the Web that were mentioned in at least five tweets, the system yields a recall-at-250 of 0.20 and a recall of 0.40 over all retrieved events.
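As a rough illustration of the clustering step described in the abstract, the sketch below greedily merges tweets that share a future date and overlap in entity mentions above a threshold. The field names, the threshold value, and the greedy single-pass strategy are our own assumptions for illustration, not the paper's actual implementation.

```python
def cluster_tweets(tweets, min_overlap=2):
    """Greedily cluster tweets that point to the same future date and
    share at least `min_overlap` entity mentions.

    Each tweet is a dict with a "date" string, an "entities" set, and
    a "text" string (hypothetical representation for this sketch).
    """
    clusters = []  # each: {"date": ..., "entities": set, "tweets": [...]}
    for tweet in tweets:
        placed = False
        for c in clusters:
            # Merge only when the future date matches and the entity
            # overlap meets the threshold.
            if (c["date"] == tweet["date"]
                    and len(c["entities"] & tweet["entities"]) >= min_overlap):
                c["entities"] |= tweet["entities"]
                c["tweets"].append(tweet["text"])
                placed = True
                break
        if not placed:
            clusters.append({"date": tweet["date"],
                             "entities": set(tweet["entities"]),
                             "tweets": [tweet["text"]]})
    return clusters
```

In a full pipeline, each resulting cluster would then be summarized into an event description by selecting characteristic terms.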

Information

Type: Articles
Copyright: © Cambridge University Press 2016

Table 1. Overview of the number of events that were collected as gold standard for recall evaluation, divided into the total of curated events, events mentioned at least once in the tweet set, and events mentioned at least five times. The percentage of the total is given in brackets

Table 2. Top-10 ranked events from the Commonness systems

Fig. 1. Counts of the number of extracted events by week number in 2014, from the top 250 events extracted from the Twitter stream in August 2014 (weeks 31–35).

Table 3. Precision@250 of output identified as an event by human annotators at one hundred per cent, seventy-five per cent, and fifty per cent agreement, and Cohen’s Kappa and Mutual F-score between the annotators

Fig. 2. Precision-at-n curves for the N-gram and Commonness approach with different degrees of strictness. (a) N-gram approach. (b) Commonness approach.

Table 4. Average assessment of terms for all three approaches. Assessment is on a scale of 1 (bad) to 3 (good)

Table 5. Recall performance by event type, based on a gold standard set of events that are mentioned in at least five tweets (see Table 1 for the exact numbers)

Fig. 3. Recall-at-n curves of the Commonness system, ranked by either the G2 formula or the G2U formula.

Fig. 4. Overview of properties of output that was not rated as an event by at least one annotator, broken down per property by the percentage of annotators who rated the output as an event.

Table 6. Overview of possible combinations between the event terms of the Commonness and Commonness+ approaches and the assessment by annotators for any event output. Only the 214 events that were assessed for both approaches are included in the counts

Table 7. Comparison between HeidelTime and the rule-based approach to finding tweets with a TIMEX pointing to a future date, based on the August 2014 tweets used in the main experiment

Table 8. Significance estimates of commonness and Frog NED in retrieving entities from an annotated sample of 1,000 tweets. Scores are obtained after bootstrap resampling with 250 samples

Table 9. Performance of the clustering component. RI = Rand Index. #Total = the term pairs that should be merged according to a manual gold standard clustering. #Merged = the merges made by the clustering component. #Correct = the correctly merged pairs

Table A1. Date-related rules for the extraction of time expressions. Values in the columns can be combined sequentially

Table A2. Exact rules for the extraction of time expressions. Values in the columns can be combined sequentially

Table A3. Rules for the extraction of time expressions that contain a weekday. Values in the columns can be combined sequentially
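Tables A1–A3 note that values in the rule columns can be combined sequentially. The sketch below illustrates that idea: each rule is a sequence of columns of alternative tokens, and expanding them sequentially yields the full set of surface patterns to match. The specific Dutch tokens and the two-column rule shape are illustrative assumptions, not the paper's actual rule tables.

```python
import itertools
import re

# Hypothetical weekday rule: column 1 holds "upcoming/next" modifiers,
# column 2 holds weekday names; the real tables have more columns/values.
weekday_rule = [
    ["aanstaande", "komende", "volgende"],   # "upcoming", "coming", "next"
    ["maandag", "dinsdag", "zaterdag"],      # "Monday", "Tuesday", "Saturday"
]

def expand(rule):
    """Combine the rule's columns sequentially into match patterns."""
    return [" ".join(combo) for combo in itertools.product(*rule)]

patterns = expand(weekday_rule)
# Compile one regex that matches any expanded pattern as a whole phrase.
regex = re.compile(r"\b(" + "|".join(map(re.escape, patterns)) + r")\b")
```

With three values per column, this rule expands to nine patterns, e.g. "aanstaande zaterdag" ("next Saturday"), which the system could then resolve to a concrete future date.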