Exploring open information via event network†

YANPING CHEN; QINGHUA ZHENG; FENG TIAN; HUAN LIU; YAZHOU HAO; NAZARAF SHAH

doi:10.1017/S1351324917000390

Published online by Cambridge University Press: 26 October 2017

YANPING CHEN: Affiliation:
Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, China e-mail: ypench@gmail.com; Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi’an Jiaotong University, China e-mail: qhzheng@mail.xjtu.edu.cn, hliuxjtu@gmail.com, yazhouhao@gmail.com
QINGHUA ZHENG: Affiliation:
Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi’an Jiaotong University, China e-mail: qhzheng@mail.xjtu.edu.cn, hliuxjtu@gmail.com, yazhouhao@gmail.com
FENG TIAN: Affiliation:
National Engineering Lab of Big Data Analytics, Xi’an Jiaotong University, China e-mail: fengtian@mail.xjtu.edu.cn
HUAN LIU: Affiliation:
Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi’an Jiaotong University, China e-mail: qhzheng@mail.xjtu.edu.cn, hliuxjtu@gmail.com, yazhouhao@gmail.com
YAZHOU HAO: Affiliation:
Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi’an Jiaotong University, China e-mail: qhzheng@mail.xjtu.edu.cn, hliuxjtu@gmail.com, yazhouhao@gmail.com
NAZARAF SHAH: Affiliation:
The Faculty of Engineering and Computing, Coventry University, UK e-mail: aa0699@coventry.ac.uk

Abstract

Discovering information from a large amount of data in an open domain is a challenging task. This paper proposes an event network framework to address this challenge. The framework is an empirical construct for exploring open information and consists of three steps: document event detection, event network construction and event network analysis. First, documents are clustered into document events to reduce the impact of noisy and heterogeneous resources. Second, linguistic units (e.g., named entities or entity relations) are extracted from each document event and combined into an event network, which enables content-oriented retrieval. Finally, techniques such as social network analysis or complex network analysis can be applied to the event network to explore open information. In the implementation section, we provide examples of exploring open information via the event network.
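The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy token-overlap clustering, the lexicon lookup standing in for named entity recognition, the co-occurrence edges, the degree count, and all function names and sample data are assumptions chosen only to show how the steps connect.

```python
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def detect_document_events(docs, threshold=0.3):
    """Step 1 (toy): greedy single-pass clustering of documents by token overlap."""
    events = []  # each event holds its merged token set and its document indices
    for i, text in enumerate(docs):
        tokens = set(text.lower().split())
        best = max(events, key=lambda e: jaccard(tokens, e["tokens"]), default=None)
        if best is not None and jaccard(tokens, best["tokens"]) >= threshold:
            best["docs"].append(i)
            best["tokens"] |= tokens
        else:
            events.append({"tokens": tokens, "docs": [i]})
    return [e["docs"] for e in events]


def extract_entities(text, lexicon):
    """Hypothetical stand-in for named entity recognition: plain lexicon lookup."""
    return {name for name in lexicon if name in text}


def build_event_network(docs, events, lexicon):
    """Step 2 (toy): link entities that occur in the same document event."""
    edges = Counter()
    for doc_ids in events:
        entities = set()
        for i in doc_ids:
            entities |= extract_entities(docs[i], lexicon)
        for u, v in combinations(sorted(entities), 2):
            edges[(u, v)] += 1
    return edges


def degree_centrality(edges):
    """Step 3 (toy): count distinct neighbours of each entity in the network."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {node: len(ns) for node, ns in neighbours.items()}


# Illustrative data only; not taken from the paper.
docs = [
    "Alice met Bob in Paris to sign the trade agreement",
    "Bob and Alice sign the trade agreement in Paris",
    "Carol opened a new laboratory in Berlin",
]
lexicon = {"Alice", "Bob", "Paris", "Carol", "Berlin"}

events = detect_document_events(docs)
network = build_event_network(docs, events, lexicon)
print(events)
print(degree_centrality(network))
```

In this toy run the first two documents fall into one document event and the third forms its own, so entities are linked only within an event. A real system would replace each stand-in with a proper clustering method, an entity or relation extractor, and a network analysis toolkit.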

Information

Type: Articles
Information: Natural Language Engineering, Volume 24, Issue 2, March 2018, pp. 199–220

DOI: https://doi.org/10.1017/S1351324917000390
Copyright: Copyright © Cambridge University Press 2017

Footnotes

†

This research is supported in part by the National Science Foundation of China under grant numbers 201721002, 61462011, 61540050 and 61472315; by The Fundamental Theory and Applications of Big Data with Knowledge Engineering under the National Key Research and Development Program of China, grant number 2016YFB1000903; by the Project of China Knowledge Centre for Engineering Science and Technology; by the Ministry of Education Innovation Research Team no. IRT13035; by the Open project no. 2017BDKFJJ018; by the Major Applied Basic Research Program of Guizhou Province no. JZ20142001; and by the Introduce Talents Science Projects of Guizhou University no. 201650.
