
Twitter as research data

Tools, costs, skill sets, and lessons learned

Published online by Cambridge University Press: 09 August 2021

Kaiping Chen*
Affiliation:
University of Wisconsin–Madison
Zening Duan
Affiliation:
University of Wisconsin–Madison
Sijia Yang
Affiliation:
University of Wisconsin–Madison
*Correspondence: Kaiping Chen, Life Sciences Communication, University of Wisconsin, Madison, WI. Email: kchen67@wisc.edu

Abstract

Scholars increasingly use Twitter data to study the life sciences and politics. However, Twitter data collection tools often pose challenges for scholars who are unfamiliar with their operation. Equally important, although many tools indicate that they offer representative samples of the full Twitter archive, little is known about whether the samples are indeed representative of the targeted population of tweets. This article evaluates such tools in terms of costs, training, and data quality as a means to introduce Twitter data as a research tool. Further, using an analysis of COVID-19 and moral foundations theory as an example, we compared the distributions of moral discussions from two commonly used tools for accessing Twitter data (Twitter’s standard APIs and third-party access) to the ground truth, the Twitter full archive. Our results highlight the importance of assessing the comparability of data sources to improve confidence in findings based on Twitter data. We also review the major new features of Twitter’s API version 2.

Information

Type: Research Tool Report
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of the Association for Politics and the Life Sciences

Figure 1. Example of a dashboard layout of a third-party platform.


Figure 2. Example of data output from a third-party platform (not limited to the selected columns). Note: Other third-party platforms might give a different output.


Table 1. Financial costs of different data collection circumstances.


Table 2. Skill sets required for using various Twitter tools.


Table 3. Twitter’s standard, premium, and enterprise levels of Search and Streaming APIs.


Figure 3. Comparisons of score distributions of five moral appeals.


Table 4. Comparison of five basic statistics across three data sources.


Figure 4. K-S statistics distribution comparison.
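
The abstract describes comparing score distributions from different access tools against the full archive, and Figure 4 reports K-S statistics for that comparison. As a rough illustration of what a two-sample Kolmogorov-Smirnov statistic measures, the sketch below computes the largest gap between two empirical cumulative distribution functions. The function name `ks_statistic` and the sample values are hypothetical and chosen only for illustration; this is not the authors' code, and the article may have used a different implementation or additional steps.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the maximum absolute difference
    between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    largest_gap = 0.0
    # The ECDFs are step functions, so their difference can only
    # change at observed values; checking those points is enough.
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Hypothetical moral-appeal scores, made up for illustration only.
full_archive_scores = [0.10, 0.20, 0.30, 0.40]
sampled_scores = [0.30, 0.40, 0.50, 0.60]
gap = ks_statistic(full_archive_scores, sampled_scores)
```

A value near 0 suggests the sampled scores follow the reference distribution closely, while a value near 1 indicates that the two distributions barely overlap. Judging statistical significance would additionally require a p-value, which this sketch does not compute.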