
1 - Analyzing Twitter Data

Published online by Cambridge University Press:  05 May 2015

Shamanth Kumar (Arizona State University)
Fred Morstatter (Arizona State University)
Huan Liu (Arizona State University)
Yelena Mejova (Qatar Computing Research Institute, Doha)
Ingmar Weber (Qatar Computing Research Institute, Doha)
Michael W. Macy (Cornell University, New York)

Summary

Twitter is a social network with over 250 million active users who collectively generate more than 500 million tweets each day. In the social sciences, Twitter has become the focus of extensive research, largely because of its openness in sharing its public data. Twitter exposes an extensive set of application programming interfaces (APIs) that can be used to collect a wealth of social data. In this chapter, we introduce these APIs and discuss how they can be used to conduct social science research. We also outline some issues that arise when using these APIs, along with strategies for collecting datasets that can give insight into a particular event.

Introduction

Twitter is a rich data source that provides several forms of information generated through the interactions of its users. These data can be harnessed for a variety of personalization and prediction tasks. Recently, Twitter data have been used to predict phenomena as diverse as election results (Tumasjan et al., 2010; cf. Chapter 2) and the location of earthquakes (Sakaki et al., 2010; cf. Chapter 6). Twitter currently has over 250 million active users who collectively generate more than 500 million tweets each day. This creates a unique opportunity to conduct large-scale studies of user behavior. An important step before conducting such studies is identifying and collecting the data relevant to the problem.
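Identifying relevant data often begins with simple keyword or hashtag matching. The following is a minimal Python sketch, not the chapter's own method: it assumes tweets have already been collected as dictionaries with a `text` field (as in Twitter's JSON tweet objects), and the helper name `filter_tweets` is hypothetical. A tweet is kept if any keyword appears in it as a whole word, ignoring case and a leading `#` or `@`.

```python
import re
from typing import Dict, Iterable, Iterator

def filter_tweets(tweets: Iterable[Dict], keywords: Iterable[str]) -> Iterator[Dict]:
    """Hypothetical helper: yield tweets whose text contains any keyword as a whole word."""
    # Normalize keywords: drop a leading '#' or '@' and compare case-insensitively.
    wanted = {k.lstrip("#@").lower() for k in keywords}
    for tweet in tweets:
        # \w+ turns "#earthquake" into the token "earthquake",
        # so hashtags match plain keywords.
        tokens = set(re.findall(r"\w+", tweet.get("text", "").lower()))
        if tokens & wanted:
            yield tweet

sample = [
    {"id": 1, "text": "Strong #earthquake felt downtown"},
    {"id": 2, "text": "Lovely weather today"},
    {"id": 3, "text": "Earthquakes are scary"},
]
print([t["id"] for t in filter_tweets(sample, ["#Earthquake"])])  # [1]
```

Whole-word matching deliberately skips inflected forms such as "Earthquakes" in the third tweet; this favors precision over recall, a trade-off any real collection strategy has to weigh.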

Twitter is an online social networking platform where registered users can create connections and share messages with other users. Messaging on Twitter is distinctive: messages may be at most 140 characters long, and they are normally public, visible to all users on Twitter. The platform therefore provides an avenue to share content with a large and diverse population using few resources. These interactions generate different kinds of information, which is made accessible to the public via APIs, that is, interfaces to which requests for data can be submitted. In this chapter, we introduce different forms of Twitter data and illustrate the capabilities and restrictions that the APIs impose on Twitter data analysis.
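The 140-character limit can be checked when composing messages or validating collected data. Below is a minimal sketch, assuming as a simplification that length is the number of Unicode code points after NFC normalization; it does not model other counting rules Twitter applies (for example, to shortened URLs), and the names `tweet_length` and `fits_in_tweet` are hypothetical.

```python
import unicodedata

MAX_TWEET_LENGTH = 140  # limit described in this chapter

def tweet_length(text: str) -> int:
    """Hypothetical helper: count code points after NFC normalization (simplified rule)."""
    # NFC merges "e" + combining acute accent into the single code point "é",
    # so both spellings of the same visible character count once.
    return len(unicodedata.normalize("NFC", text))

def fits_in_tweet(text: str, limit: int = MAX_TWEET_LENGTH) -> bool:
    """Return True if text is within the (assumed) character limit."""
    return tweet_length(text) <= limit

print(tweet_length("caf\u00e9"), tweet_length("cafe\u0301"))  # 4 4
print(fits_in_tweet("a" * 140), fits_in_tweet("a" * 141))  # True False
```

Normalizing first means two visually identical messages get the same count regardless of how their accented characters were encoded.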

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2015


References

Amiri, Hadi, and Chua, Tat-Seng. 2012. Mining slang and urban opinion words and phrases from cQA services: an optimization approach. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. WSDM ’12 (pp. 193–202). ACM.
Bergsma, Shane, Dredze, Mark, Van Durme, Benjamin, Wilson, Theresa, and Yarowsky, David. 2013. Broadly improving user classification via communication-based name and location clustering on Twitter. In Proceedings of HLT-NAACL (pp. 1010–19).
Bollen, Johan, Mao, Huina, and Pepe, Alberto. 2011. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. ICWSM ’11 (pp. 450–3). AAAI.
Bontcheva, Kalina, Derczynski, Leon, Funk, Adam, Greenwood, Mark A., Maynard, Diana, and Aswani, Niraj. 2013. TwitIE: an open-source information extraction pipeline for microblog text. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (pp. 83–90). Association for Computational Linguistics.
Brandes, Ulrik, Pfeffer, Jürgen, and Mergel, Ines. 2013. Studying Social Networks: A Guide to Empirical Research. Campus Verlag.
Cheng, Zhiyuan, Caverlee, James, and Lee, Kyumin. 2010. You are where you tweet: a content-based approach to geo-locating Twitter users. In Proceedings of the Nineteenth ACM International Conference on Information and Knowledge Management. CIKM ’10 (pp. 759–68). ACM.
Gayo-Avello, Daniel. 2011. All liaisons are dangerous when all your friends are known to us. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia (pp. 171–80). ACM.
Hansen, Derek, Shneiderman, Ben, and Smith, Marc. 2010. Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Morgan Kaufmann.
Hecht, Brent, Hong, Lichan, Suh, Bongwon, and Chi, Ed. 2011. Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 237–46). ACM.
Hu, Xia, Tang, Jiliang, Gao, Huiji, and Liu, Huan. 2013b. Unsupervised sentiment analysis with emotional signals. In Proceedings of the 22nd International Conference on World Wide Web. WWW ’13 (pp. 607–18). International World Wide Web Conferences Steering Committee.
Hu, Xia, Tang, Lei, Tang, Jiliang, and Liu, Huan. 2013a. Exploiting social relations for sentiment analysis in microblogging. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining (pp. 537–46). ACM.
Kim, Elsa, Gilbert, Sam, Edwards, Michael J, and Graeff, Erhardt. 2009. Detecting sadness in 140 characters: sentiment analysis and mourning Michael Jackson on Twitter. Web Ecology, 3, 1–15.
Kumar, Shamanth, Barbier, Geoffrey, Abbasi, Mohammad Ali, and Liu, Huan. 2011. TweetTracker: an analysis tool for humanitarian and disaster relief. In Proceedings of 5th AAAI International Conference on Weblogs and Social Media (pp. 661–2). AAAI.
Kumar, Shamanth, Morstatter, Fred, and Liu, Huan. 2014a. Twitter Data Analytics. SpringerBriefs in Computer Science.
Kumar, Shamanth, Hu, Xia, and Liu, Huan. 2014b. A behavior analytics approach to identifying tweets from crisis regions. In Proceedings of the 25th ACM Conference on Hypertext and Social Media. HT ’14 (pp. 555–6). ACM.
Kumar, Shamanth, Liu, Huan, Mehta, Sameep, and Subramaniam, L Venkata. 2014c. From tweets to events: exploring a scalable solution for Twitter streams. arXiv preprint arXiv:1405.1392.
Li, Rui, Lei, Kin Hou, Khadiwala, Ravi, and Chang, Kevin Chen-Chuan. 2012. TEDAS: A Twitter-based Event Detection and Analysis System. In 2012 IEEE 28th International Conference on Data Engineering (ICDE) (pp. 1273–6). IEEE.
Mahmud, Jalal, Nichols, Jeffrey, and Drews, Clemens. 2012. Where is this tweet from? Inferring home locations of Twitter users. In Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media. ICWSM ’12 (pp. 511–14). AAAI.
Mejova, Yelena Aleksandrovna. 2012. Sentiment Analysis Within and Across Social Media Streams. Ph.D. thesis, University of Iowa.
Mislove, Alan, Lehmann, Sune, Ahn, Yong-Yeol, Onnela, Jukka-Pekka, and Rosenquist, J. Niels. 2011. Understanding the demographics of Twitter users. ICWSM, 11 (5).
Morstatter, Fred, Kumar, Shamanth, Liu, Huan, and Maciejewski, Ross. 2013a. Understanding Twitter data with TweetXplorer. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1482–5). ACM.
Morstatter, Fred, Pfeffer, Jürgen, Liu, Huan, and Carley, Kathleen. 2013b. Is the sample good enough? Comparing data from Twitter's streaming API with Twitter's Firehose. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media. ICWSM ’13 (pp. 400–8). AAAI.
Morstatter, Fred, Lubold, Nichola, Pon-Barry, Heather, Pfeffer, Jürgen, and Liu, Huan. 2014a. Finding eyewitness tweets during crises. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science (pp. 23–7). Association for Computational Linguistics.
Morstatter, Fred, Pfeffer, Jürgen, and Liu, Huan. 2014b. When is it biased? Assessing the representativeness of Twitter's streaming API. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. WWW Companion ’14 (pp. 555–6). International World Wide Web Conferences Steering Committee.
Olston, Christopher, Reed, Benjamin, Srivastava, Utkarsh, Kumar, Ravi, and Tomkins, Andrew. 2008. Pig Latin: A not-so-foreign language for data processing. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. SIGMOD ’08 (pp. 1099–1110). ACM.
Owoputi, Olutobi, O'Connor, Brendan, Dyer, Chris, Gimpel, Kevin, Schneider, Nathan, and Smith, Noah. 2013. Improved part-of-speech tagging for online conversational text with word clusters. In Proceedings of NAACL-HLT (pp. 380–90). Association for Computational Linguistics.
Pennacchiotti, Marco, and Popescu, Ana-Maria. 2011a. Democrats, Republicans and Starbucks afficionados: user classification in Twitter. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 430–8). ACM.
Pennacchiotti, Marco, and Popescu, Ana-Maria. 2011b. A machine learning approach to Twitter user classification. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (pp. 281–8). AAAI Press.
Ritter, Alan, Clark, Sam, Mausam, and Etzioni, Oren. 2011. Named entity recognition in tweets: an experimental study. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. EMNLP ’11 (pp. 1524–34). Association for Computational Linguistics.
Rout, Dominic, Bontcheva, Kalina, Preotiuc-Pietro, Daniel, and Cohn, Trevor. 2013. Where's @Wally? A classification approach to geolocating users based on their social ties. In 24th ACM Conference on Hypertext and Social Media (pp. 11–20). ACM.
Sakaki, Takeshi, Okazaki, Makoto, and Matsuo, Yutaka. 2010. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web (pp. 851–60). International World Wide Web Conferences Steering Committee.
Speriosu, Michael, Sudan, Nikita, Upadhyay, Sid, and Baldridge, Jason. 2011. Twitter polarity classification with label propagation over lexical links and the follower graph. In Proceedings of the First Workshop on Unsupervised Learning in NLP (pp. 53–63). Association for Computational Linguistics.
Starbird, Kate, and Stamberger, Jeannie. 2010. Tweak the tweet: leveraging microblogging proliferation with a prescriptive syntax to support citizen reporting. In Proceedings of Information Systems for Crisis Response and Management (ISCRAM) (pp. 1071–80). ACM.
Tumasjan, Andranik, Sprenger, Timm Oliver, Sandner, Philipp, and Welpe, Isabell. 2010. Predicting elections with Twitter: what 140 characters reveal about political sentiment. ICWSM, 10, 178–85.
Weng, Jianshu, and Lee, Bu-Sung. 2011. Event detection in Twitter. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. ICWSM ’11 (pp. 401–8). AAAI.
Zafarani, Reza, Abbasi, Mohammad Ali, and Liu, Huan. 2014. Social Media Mining: An Introduction. Cambridge University Press.
