Open-domain extraction of future events from Twitter

Florian Kunneman and Antal van den Bosch
Abstract

Explicit references on Twitter to future events can be leveraged to feed a fully automatic system for monitoring real-world events. We describe a system that extracts open-domain future events from the Twitter stream. It detects future time expressions and entity mentions in tweets, clusters together tweets whose mentions overlap above certain thresholds, and summarizes these clusters into event descriptions that can be presented to users of the system. Terms for the event descriptions are selected in an unsupervised fashion. We evaluated the system on a month of Dutch tweets by showing the 250 top-ranked events found in this month to human annotators. Eighty per cent of the candidate events were judged to be an event by at least three of the four annotators, and sixty-three per cent were judged to be a real event by all four. An additional component that complements event descriptions with extra terms was not rated higher than the original system, because it occasionally added redundant terms. When the detected events are compared with gold-standard events taken from maintained calendars on the Web and mentioned in at least five tweets, the system yields a recall-at-250 of 0.20 and a recall of 0.40 over all retrieved events.
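
The clustering and evaluation steps summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes each tweet has already been annotated with a normalized future date and a set of entity mentions (the `Tweet` class is hypothetical), links tweets that share a date and whose entity sets have a Jaccard overlap at or above a hypothetical `threshold`, and takes connected components as clusters (simple single-link grouping via union-find, swapped in for illustration). The `describe` function stands in for unsupervised term selection by ranking entities by frequency, and `recall_at_k` assumes detected and gold-standard events share identical identifiers.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tweet:
    """Hypothetical pre-annotated tweet; time and entity detection are assumed done."""
    text: str
    date: str            # normalized future date referenced by the tweet (assumed)
    entities: frozenset  # lower-cased entity mentions (assumed)


def jaccard(a, b):
    """Entity overlap between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tweets(tweets, threshold=0.5):
    """Link tweets with the same date whose entity overlap is >= threshold
    (hypothetical parameter) and return connected components as clusters."""
    parent = list(range(len(tweets)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(tweets)), 2):
        same_date = tweets[i].date == tweets[j].date
        if same_date and jaccard(tweets[i].entities, tweets[j].entities) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, tweet in enumerate(tweets):
        groups.setdefault(find(i), []).append(tweet)
    return list(groups.values())


def describe(cluster, top_n=3):
    """Stand-in for unsupervised term selection: date plus most frequent entities."""
    counts = Counter(e for t in cluster for e in t.entities)
    return cluster[0].date, [e for e, _ in counts.most_common(top_n)]


def recall_at_k(ranked_events, gold_events, k):
    """Share of gold-standard events found among the top-k ranked events,
    assuming both sides use identical event identifiers."""
    gold = set(gold_events)
    if not gold:
        return 0.0
    return len(set(ranked_events[:k]) & gold) / len(gold)


# Toy, hypothetical input (not data from the study).
tweets = [
    Tweet("Morgen Ajax tegen PSV!", "2014-06-14", frozenset({"ajax", "psv"})),
    Tweet("Ajax-PSV morgen in de ArenA", "2014-06-14", frozenset({"ajax", "psv", "arena"})),
    Tweet("Zaterdag naar het concert van Racoon", "2014-06-14", frozenset({"racoon"})),
]
for cluster in cluster_tweets(tweets):
    print(describe(cluster))
```

With `threshold=0.5`, the two toy tweets about the same match (Jaccard overlap 2/3) end up in one cluster and the third tweet forms its own; raising the threshold to 0.9 separates all three. Comparing all pairs is quadratic in the number of tweets, which is acceptable for a sketch but would call for blocking (for example, by date) on a real stream.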

Footnotes

This research was funded by the Dutch national program COMMIT. We thank Erik Tjong Kim Sang for the development and support of the http://twiqs.nl service.

Published in Natural Language Engineering (ISSN 1351-3249; EISSN 1469-8110).