Identification of Keywords From Twitter and Web Blog Posts to Detect Influenza Epidemics in Korea

Published online by Cambridge University Press:  31 July 2017

Hyekyung Woo
Affiliation:
Department of Public Health Science, School of Public Health, Seoul National University, Seoul, Korea
Hyeon Sung Cho
Affiliation:
Department of Intelligent Cognitive Technology Research, Electronics and Telecommunications Research Institute, Daejeon, Korea
Eunyoung Shim
Affiliation:
Department of Public Health Science, School of Public Health, Seoul National University, Seoul, Korea; Department of New Business, Samsung Fire and Marine Insurance, Seoul, Korea
Jong Koo Lee
Affiliation:
College of Medicine, Seoul National University, Seoul, Korea
Kihwang Lee
Affiliation:
Mining Laboratory, Daumsoft, Seoul, Korea
Gilyoung Song
Affiliation:
Mining Laboratory, Daumsoft, Seoul, Korea
Youngtae Cho*
Affiliation:
Department of Public Health Science, School of Public Health, Seoul National University, Seoul, Korea
*Correspondence and reprint requests to Youngtae Cho, Department of Public Health Science, School of Public Health, Seoul National University, 1 Kwanak-ro, Kwanak-gu, Seoul 151-742, Korea (e-mail: youngtae@snu.ac.kr).

Abstract

Objective

Social media data are a highly contextual health information source. The objective of this study was to identify Korean keywords for detecting influenza epidemics from social media data.

Methods

We included data from Twitter and online blog posts to obtain a sufficient number of candidate indicators and to represent a larger proportion of the Korean population. We performed the following steps: initial keyword selection; generation of a keyword time series using a preprocessing approach; optimal feature selection; and model building and validation using least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and random forest regression (RFR).
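The feature-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a LASSO by coordinate descent in plain Python on synthetic weekly data, and the keyword names (`kw_signal_1`, `kw_noise_1`, and so on), the incidence curve, and the penalty value are all hypothetical placeholders chosen for illustration. The SVM and RFR models named above are not shown.

```python
import math
import random


def standardize(values):
    """Rescale a series to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def soft_threshold(value, penalty):
    """Shrink value toward zero; anything within the penalty becomes 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_fit(columns, target, penalty, sweeps=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + penalty*||w||_1.

    Assumes every column and the target are already standardized,
    so each coordinate update reduces to a single soft-threshold.
    """
    n = len(target)
    weights = [0.0] * len(columns)
    residual = list(target)  # equals y - Xw while all weights are zero
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes its own term.
            rho = sum(c * (r + c * weights[j]) for c, r in zip(col, residual)) / n
            updated = soft_threshold(rho, penalty)
            delta = updated - weights[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                weights[j] = updated
    return weights


def pearson(a, b):
    """Pearson correlation between two equal-length series."""
    a_std, b_std = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(a_std, b_std)) / len(a)


# Synthetic data (assumption): one 52-week season with a single incidence peak.
rng = random.Random(0)
weeks = 52
incidence = [100 * math.exp(-((t - 26) / 6) ** 2) for t in range(weeks)]

# Hypothetical keyword frequency series: two track incidence, three are noise.
keyword_series = {
    "kw_signal_1": [v + rng.gauss(0, 5) for v in incidence],
    "kw_signal_2": [v + rng.gauss(0, 5) for v in incidence],
    "kw_noise_1": [rng.gauss(20, 5) for _ in range(weeks)],
    "kw_noise_2": [rng.gauss(20, 5) for _ in range(weeks)],
    "kw_noise_3": [rng.gauss(20, 5) for _ in range(weeks)],
}

names = list(keyword_series)
columns = [standardize(keyword_series[name]) for name in names]
target = standardize(incidence)

weights = lasso_fit(columns, target, penalty=0.2)
selected = [name for name, w in zip(names, weights) if w != 0.0]

# In-sample agreement between the fitted series and the incidence curve.
fitted = [sum(w * col[t] for w, col in zip(weights, columns)) for t in range(weeks)]
fit_correlation = pearson(fitted, target)

print("selected keywords:", selected)
print("in-sample correlation:", round(fit_correlation, 3))
```

The L1 penalty drives the weights of uninformative keywords to exactly zero, which is why LASSO can serve as both a feature selector and a regression model. The correlation here is computed in-sample only; a real validation would score the model on seasons held out from fitting.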

Results

A total of 15 keywords, evenly distributed across the Twitter and blog data sources, optimally detected the influenza epidemic. Model estimates generated using our SVM model were highly correlated with recent influenza incidence data.

Conclusions

The basic principles underpinning our approach could be applied to other countries, languages, infectious diseases, and social media sources. Social media monitoring using our approach may support and extend the capacity of traditional surveillance systems for detecting emerging influenza. (Disaster Med Public Health Preparedness. 2018; 12: 352–359)

Type
Original Research
Copyright
Copyright © Society for Disaster Medicine and Public Health, Inc. 2017 
