Chapter 5 introduces network analysis. Social media data frequently has elements that are amenable to network analysis, including friend/follower networks and retweet networks. This chapter addresses how to collect this data and operationalize it into measures appropriate for network analysis. It shows how to collect en masse the timelines of a given set of users and how to traverse their friend and follower networks, using as an example the collection of all tweets of all members of Congress in real time. Finally, it demonstrates in applied form how to identify automated accounts (bots) among the data being collected.
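As a rough illustration of what operationalizing retweets into a network measure can look like, the sketch below builds a weighted edge list and a simple in-degree count. It is not the book's code: the record fields `user` and `retweeted_user` are hypothetical stand-ins for whatever the collected tweet objects actually contain.

```python
from collections import Counter


def retweet_edges(tweets):
    """Count directed edges (retweeter -> original author).

    Each tweet is a dict with the hypothetical keys "user" and
    "retweeted_user"; records that are not retweets are skipped.
    """
    edges = Counter()
    for tweet in tweets:
        source = tweet.get("user")
        target = tweet.get("retweeted_user")
        if source and target and source != target:
            edges[(source, target)] += 1
    return edges


def weighted_in_degree(edges):
    """Sum the incoming edge weights per account (times it was retweeted)."""
    degree = Counter()
    for (_, target), weight in edges.items():
        degree[target] += weight
    return degree


sample = [
    {"user": "a", "retweeted_user": "c"},
    {"user": "b", "retweeted_user": "c"},
    {"user": "a", "retweeted_user": "c"},
    {"user": "c"},  # an original tweet, not a retweet
]
edges = retweet_edges(sample)
in_degree = weighted_in_degree(edges)
```

An edge list of this kind can be handed to a dedicated network library for richer measures; plain counters are used here only to keep the sketch self-contained.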
This chapter introduces the building blocks of an infrastructure for collecting social media data. It includes a summary of what data is available via Twitter, and how we can best structure a collection system in general for any number of social science applications. In addition, it walks through how to collect a worldwide sample of all posted tweets in real time, along with database and compression tools for ensuring that the infrastructure can be used for long-term data collection projects encompassing millions of data points.
The book concludes with Chapter 6’s discussion of the particular ethical concerns raised by using social media data. These include data privacy concerns, for instance the need in some contexts to anonymize unique user identifiers in all stored tweets so that not even the researchers have access to user names or other identifying details. In addition, the chapter covers concerns frequently raised by institutional review boards (IRBs) regarding human subjects research, and some of the thorny issues that the terms of use of social media sites raise for data sharing and replicability. In particular, it walks through what scholars need to know about the limitations imposed by Twitter’s terms of use, which use cases are considered acceptable (for example, sharing data among researchers on the same project), and strategies for common scholarly needs that fall within gray areas (for example, providing word frequency matrices so that content analysis can be fully replicated without violating the terms of use on republication of tweets).
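Two of the ideas mentioned above, anonymized user identifiers and shareable word frequency matrices, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the book's implementation: the function names are hypothetical, the keyed hash (HMAC-SHA256) is one possible pseudonymization choice, and the tokenizer is deliberately crude.

```python
import hashlib
import hmac
import re
from collections import Counter


def pseudonymize(user_id, secret_key):
    """Replace a user ID with a keyed hash (HMAC-SHA256, truncated).

    Assumption: the secret key is stored separately or discarded, since
    anyone holding it could re-hash known IDs and re-identify users.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def word_frequency_matrix(texts):
    """Return (vocabulary, rows); each row holds word counts for one text."""
    tokenized = [re.findall(r"[a-z0-9#@']+", text.lower()) for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


key = b"example-key"  # placeholder value for illustration only
alias = pseudonymize("12345", key)
vocabulary, rows = word_frequency_matrix(["Vote now", "vote vote"])
```

Sharing only the vocabulary and count rows lets others rerun a bag-of-words analysis without receiving tweet text; whether that satisfies a given platform's terms is a question the sketch does not settle.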
One of the most exciting types of social media data is geolocated data, which includes the source location of the post, based on the GPS capabilities of the posting device. Chapter 4 discusses the particular advantages offered by this data, including the capacity to perform extremely fine-grained subnational studies impossible with traditional sources of data. In addition, the chapter provides software for processing geocoded social media data in order to efficiently identify the country and subnational unit of every tweet in a collection, including an example application collecting all geocoded tweets in the world.
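To give a sense of what assigning a geocoded tweet to a country or subnational unit involves, here is a minimal point-in-polygon sketch using the standard ray-casting test. It is an illustration only, not the software the chapter provides: the region names and square boundaries are made up, and real boundaries would come from a proper boundary dataset and usually a spatial index.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate(lon, lat, regions):
    """Return the name of the first region containing the point, or None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical toy regions: two adjacent squares.
regions = {
    "region_west": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "region_east": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
```

Scanning every region for every tweet is slow at scale, so a production pipeline would typically filter candidates by bounding box or spatial index before running the polygon test.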
This chapter focuses on content analysis and introduces the collection of data from Twitter by selected keywords or languages. It then develops computerized content analysis techniques for use on tweets, covering the particular challenges of adapting these techniques to text from social media (for instance, dealing with the often very short passages of text, the especially dense use of colloquialisms, and the frequent mixing of different languages within a particular source of social media text data). It also covers the download of other forms of content (such as video and images) and the handling of meta-objects such as mentions and hashtags.
The book opens with a discussion of the theoretical importance of the rise of social media, focusing on the way that decreases in the transaction costs of communication change the potential for populations to solve their collective action problems. In addition, it highlights the historical importance of social media as a data source, with scholars having access to the communications of the public en masse for the first time. The cheapness and accessibility of this data democratizes data collection efforts, which changes the nature of the research questions that scholars can ask.
Social media has put mass communication in the hands of normal people on an unprecedented scale, and has also given social scientists the tools necessary to listen to the voices of everyday people around the world. This book gives social scientists the skills necessary to leverage that opportunity, and transform social media's vast stream of information into social science data. The book combines the big data techniques of computer science with social science methodology. Intended as a text for advanced undergraduates, graduate students, and researchers in the social sciences, this book provides a methodological pathway for scholars who want to make use of this new and evolving source of data. It provides a framework for building one's own data collection and analysis infrastructure, a toolkit of content analysis, geographic analysis, and network analysis, and meditations on the ethical implications of social media data.