
A content analysis: analyzing topics of conversation under the #sustainability hashtag on Twitter

Published online by Cambridge University Press: 20 February 2024

Aydan Gerber*
Affiliation:
Iona Preparatory School, New Rochelle, NY, USA

Abstract

This study aimed to identify the major topics of discussion under the #sustainability hashtag on Twitter (now known as “X”) and to understand user engagement with them. The sharp increase in social media usage, combined with a rise in climate anomalies in recent years, makes sustainability discourse on social media a critical area of study. Python was used to collect Twitter posts published between January 1, 2023, and March 1, 2023. User engagement metrics were analyzed statistically and visualized with histograms and scatter plots, while keyword-frequency analysis and Latent Dirichlet Allocation (LDA) were used to identify significant topics of discussion under the #sustainability hashtag. The final LDA model used 7 topics, a number chosen after fitting models with 5 to 13 topics and comparing their coherence and perplexity to determine which best fit the dataset. The frequency analysis provided a basic overview of the discourse surrounding #sustainability, covering technology, business and industry, environmental awareness, and the future. The LDA model provided a more comprehensive view, revealing additional topics such as Environmental, Social, and Governance (ESG) criteria and infrastructure, investing, collaboration, and education. These findings have implications for researchers, businesses, organizations, and politicians seeking to align their strategies and actions with the major sustainability topics on Twitter in order to have a greater impact on their audiences. Researchers can use these results to guide further research on the topic or to situate their own studies within the existing sustainability literature.
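
The abstract describes fitting LDA to the collected tweets, but does not say which implementation was used. As a self-contained illustration of what an LDA model estimates, the sketch below fits one with collapsed Gibbs sampling, a standard inference method that may differ from the one used in the study. The function name `lda_gibbs` and the hyperparameter defaults (`alpha`, `beta`, `iterations`) are illustrative assumptions.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (theta, phi, vocab), where theta[d][k]
    estimates P(topic k | doc d) and phi[k][v] estimates P(vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: doc-topic, topic-word, and total tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    assignments = []  # current topic of every token, per document

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k, v = rng.randrange(K), word_id[w]
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][v] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional P(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final counts; each row sums to 1.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab
```

A pure-Python sampler like this is only practical for small corpora; for tens of thousands of tweets, an optimized library such as gensim's `LdaModel` is the more realistic choice.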

Information

Type
Application Paper
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Table 1. Standard Twitter metrics data (n = 37,728)

Figure 1. Histograms of likes, retweets, times quoted, reply count, word count, and view count. Note: The y-axis (Frequency) of each histogram gives the number of tweets in the dataset that fall within each bin.
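
The "Frequency" axis counts tweets per bin. The sketch below shows how those counts arise with equal-width bins; the function name `histogram_counts` and the equal-width binning are assumptions, since the paper does not state how its bins were chosen.

```python
def histogram_counts(values, num_bins):
    """Count how many values (e.g. per-tweet like counts) fall in each equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1  # avoid zero width when all values are equal
    counts = [0] * num_bins
    for x in values:
        # The maximum value goes in the last bin, as NumPy's histogram does.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts
```

Summing the returned counts always gives the number of tweets, which is why the bar heights in each panel add up to the dataset size.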

Figure 2. Scatter plots of likes, retweets, times quoted, reply count, view count, and word count over time. Note: The x-axis shows the date each post was collected, in month-day format. All posts are from 2023.

Figure 3. View count plotted against likes. Note: The graph on the left is a zoomed-out version of the graph on the right.

Figure 4. Likes plotted against retweets. Note: The graph on the left is a zoomed-out version of the graph on the right.
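
Figures 3 and 4 show pairwise relationships visually. If a single number summarizing the linear association between two engagement metrics is wanted, Pearson's correlation coefficient can be computed as sketched below. The paper does not report this statistic, so this is an optional complement to the plots, not the study's method; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)
```

Engagement counts are often heavily skewed, so a rank-based measure such as Spearman's correlation (Pearson's r applied to ranks) may be more robust to a few viral posts.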

Figure 5. View count plotted against follower count. Note: The red dotted line marks an estimate of the follower count (x value) at which a spike in view count occurs, as discussed in the paper.

Figure 6. View count plotted against word count. Note: The red dotted line marks an estimate of the word count (x value) at which a spike in view count occurs, as discussed in the paper.

Figure 7. Frequency of the top 30 keywords.
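
A keyword-frequency count like the one in Figure 7 can be sketched with Python's `collections.Counter`. The paper does not list its preprocessing, so the tokenization rule, the short stopword list, the minimum token length, and the default exclusion of "sustainability" (presumably present in every collected tweet) are all assumptions; `top_keywords` is a hypothetical name.

```python
import re
from collections import Counter

# A small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on",
             "is", "are", "with", "our", "we", "this", "that", "it", "at", "by"}

def top_keywords(tweets, n=30, extra_stopwords=("sustainability",)):
    """Return the n most frequent keywords across tweets as (word, count) pairs."""
    stop = STOPWORDS | set(extra_stopwords)
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())  # drop URLs and @mentions
        tokens = re.findall(r"[a-z]+", text)  # '#' is dropped, so "#esg" -> "esg"
        counts.update(t for t in tokens if t not in stop and len(t) > 2)
    return counts.most_common(n)
```

For example, `top_keywords(["Solar energy is the future #sustainability", "Investing in #ESG and solar https://t.co/x", "@user solar energy now"], n=2)` returns `[("solar", 3), ("energy", 2)]`.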

Table 2. Coherence and perplexity for LDA models with 5 to 13 topics
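
Table 2 compares models by coherence and perplexity. The paper does not say which coherence measure it used (gensim's `CoherenceModel`, for instance, offers several, including `u_mass` and `c_v`). The sketch below assumes the UMass measure, because it needs only document co-occurrence counts, and computes perplexity from fitted doc-topic (`theta`) and topic-word (`phi`) distributions. Both function names are illustrative.

```python
import math

def umass_coherence(top_words, docs):
    """UMass coherence of one topic's top words, ordered most to least probable.

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over word pairs with l < m, where D counts
    documents containing all the given words. Some libraries average instead of sum.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1)
                              / doc_freq(top_words[l]))
    return score

def perplexity(docs, theta, phi, vocab):
    """exp(-mean log-likelihood per token); lower means the model fits the text better."""
    word_id = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            v = word_id[w]
            # P(w | d) = sum over topics of P(k | d) * P(w | k)
            log_lik += math.log(sum(theta[d][k] * phi[k][v] for k in range(len(phi))))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

Higher coherence and lower perplexity are generally preferred, and the two often disagree, which is why a choice such as 7 topics also involves inspecting the topics themselves. Perplexity is ideally computed on held-out tweets rather than the training set.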

Figure 8. Latent Dirichlet Allocation (LDA) output visualization: intertopic distance map for 7 topics. Note: The intertopic distance map represents the similarity between topics; more distinct topics lie farther apart. This plot shows the output for 7 topics, the number of topics used in this research.
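
The captions suggest, but do not state, that these maps were produced with pyLDAvis. Assuming so, its default intertopic distance is the Jensen-Shannon divergence between topic-word distributions, which is then projected to two dimensions with principal coordinate analysis. The sketch below computes only the pairwise distance matrix; the function names are illustrative, and the 2-D projection step is omitted.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; entries where p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Symmetric Jensen-Shannon divergence between two probability vectors."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def topic_distance_matrix(phi):
    """Pairwise JS divergence between topic-word distributions (rows of phi)."""
    return [[jensen_shannon(p, q) for q in phi] for p in phi]
```

The divergence is 0 for identical topics and at most log 2 (in nats) for topics with no words in common, so distinct topics end up far apart on the map.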

Figure 9. Latent Dirichlet Allocation (LDA) output visualization: intertopic distance map for 10 topics. Note: The intertopic distance map represents the similarity between topics; more distinct topics lie farther apart. This plot shows the output for 10 topics.

Figure 10. Latent Dirichlet Allocation (LDA) output visualization for topic 1. Note: The chart lists the 30 terms most relevant to this topic. Term relevance is calculated automatically using Equation (1). Topic 1 corresponds to the impact of technology on the environment and the future.
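
Equation (1) is not reproduced in this section. Assuming it is the relevance measure used by LDAvis and pyLDAvis, the relevance of word w to topic k is λ·log φ(k, w) + (1 − λ)·log(φ(k, w) / p(w)), where φ(k, w) is the word's probability within the topic, p(w) is its overall probability in the corpus, and λ between 0 and 1 trades topic probability against lift. The sketch below ranks terms that way; the default λ = 0.6 is an assumption, since the paper does not state which value it used.

```python
import math

def relevance(phi_k, word_probs, lam=0.6):
    """Relevance of each word to one topic.

    phi_k[v]: P(word v | topic k); word_probs[v]: overall P(word v) in the corpus.
    lam = 1 ranks purely by topic probability; lam = 0 ranks purely by lift.
    """
    return [
        lam * math.log(p_wk) + (1 - lam) * math.log(p_wk / p_w)
        for p_wk, p_w in zip(phi_k, word_probs)
    ]

def top_relevant_terms(phi_k, word_probs, vocab, n=30, lam=0.6):
    """Return the n terms with the highest relevance to the topic."""
    scores = relevance(phi_k, word_probs, lam)
    ranked = sorted(range(len(vocab)), key=lambda v: scores[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]
```

Lowering λ promotes words that are rare overall but concentrated in the topic, which is why the top-30 lists in these figures can change as λ is varied.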

Figure 11. Latent Dirichlet Allocation (LDA) output visualization for topic 2. Note: The chart lists the 30 terms most relevant to this topic. Term relevance is calculated automatically using Equation (1). Topic 2 corresponds to climate action.

Figure 12. Latent Dirichlet Allocation (LDA) output visualization for topic 3. Note: The chart lists the 30 terms most relevant to this topic. Term relevance is calculated automatically using Equation (1). Topic 3 corresponds to community collaboration and education.

Figure 13. Latent Dirichlet Allocation (LDA) output visualization for topic 4. Note: The chart lists the 30 terms most relevant to this topic. Term relevance is calculated automatically using Equation (1). The terms “helsh,” “campany,” and “bosh” appear in this topic because not all bots spamming such words could be filtered out of the dataset. Topic 4 corresponds to natural resource conservation efforts.
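
The note on topic 4 describes bot spam that survived filtering. The paper does not describe its filtering method; as one hypothetical heuristic, repeated posts can be capped after normalizing away URLs, mentions, case, and punctuation, as sketched below. The names `normalize` and `drop_repeated_posts` and the `max_copies` threshold are assumptions, and like any such heuristic this would not catch every bot.

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase and strip URLs, mentions, and punctuation so near-identical copies match."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return " ".join(re.findall(r"[a-z0-9#]+", text))

def drop_repeated_posts(tweets, max_copies=3):
    """Keep at most max_copies tweets per normalized text; extra copies are treated as spam."""
    seen = Counter()
    kept = []
    for text in tweets:
        key = normalize(text)
        seen[key] += 1
        if seen[key] <= max_copies:
            kept.append(text)
    return kept
```

Spam that varies its wording, or that injects misspelled tokens such as those in this topic into otherwise unique posts, passes a filter like this, so a manual blocklist of such tokens may still be needed.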

Figure 14. Latent Dirichlet Allocation (LDA) output visualization for topic 5. Note: The chart lists the 30 terms most relevant to this topic. Term relevance is calculated automatically using Equation (1). Topic 5 corresponds to business and products.

Figure 15. Latent Dirichlet Allocation (LDA) output visualization for topic 6. Note: The chart lists the 30 terms most relevant to this topic. Term relevance is calculated automatically using Equation (1). Topic 6 corresponds to Environmental, Social, and Governance (ESG) criteria and infrastructure.

Figure 16. Latent Dirichlet Allocation (LDA) output visualization for topic 7. Note: The chart lists the 30 terms most relevant to this topic. Term relevance is calculated automatically using Equation (1). Topic 7 corresponds to investing and commerce.

Figure 17. Latent Dirichlet Allocation (LDA) output visualization for overall term saliency. Note: Saliency measures how useful a term is for distinguishing between topics across the dataset. It is calculated using Equation (2).
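
Equation (2) is not reproduced in this section. Assuming it is the saliency measure of Chuang, Manning, and Heer, which LDAvis and pyLDAvis use for their default term ranking, saliency(w) = p(w) · Σ_T P(T | w) · log(P(T | w) / P(T)): a word's overall probability multiplied by how far its topic distribution departs from the overall topic distribution. The sketch below computes it from a topic-word matrix `phi`, topic proportions `topic_probs`, and corpus word probabilities `word_probs`; these names are illustrative.

```python
import math

def saliency(phi, topic_probs, word_probs):
    """Saliency of each word: P(w) times the KL divergence of P(T | w) from P(T)."""
    num_topics, vocab_size = len(phi), len(word_probs)
    scores = []
    for v in range(vocab_size):
        joint = [phi[k][v] * topic_probs[k] for k in range(num_topics)]
        total = sum(joint)
        posterior = [j / total for j in joint]  # P(T | w) by Bayes' rule
        distinctiveness = sum(
            p * math.log(p / topic_probs[k]) for k, p in enumerate(posterior) if p > 0
        )
        scores.append(word_probs[v] * distinctiveness)
    return scores
```

A word spread evenly across topics has zero distinctiveness and therefore zero saliency, however frequent it is; a frequent word concentrated in one topic scores highest.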