Exploring open information via event network

  • YANPING CHEN (a1) (a2), QINGHUA ZHENG (a2), FENG TIAN (a3), HUAN LIU (a2), YAZHOU HAO (a2) and NAZARAF SHAH (a4)...
Abstract

Discovering information in a large amount of open-domain data is a challenging task.¹ This paper proposes an event network framework to address that challenge. The framework is an empirical construct for exploring open information and consists of three steps: document event detection, event network construction and event network analysis. First, documents are clustered into document events to reduce the impact of noisy and heterogeneous resources. Second, linguistic units (e.g., named entities or entity relations) are extracted from each document event and combined into an event network, which enables content-oriented retrieval. Third, techniques such as social network analysis or complex network analysis can be applied to the event network to explore open information. In the implementation section, we provide examples of exploring open information via the event network.
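
The abstract does not specify the algorithms behind the three steps, so the following is only a minimal sketch of the pipeline's shape under stated assumptions, not the authors' method. The toy corpus, the pre-extracted entity lists, the Jaccard-overlap clustering, the co-occurrence edges, the `threshold` value and all function names are hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: (text, entities assumed to be already extracted).
DOCS = [
    ("Acme buys Globex in merger deal", ["Acme", "Globex"]),
    ("Globex merger deal approved by Initech board", ["Globex", "Initech"]),
    ("Storm hits Springfield coast", ["Springfield"]),
    ("Springfield coast storm damage reviewed by Acme", ["Springfield", "Acme"]),
]


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_events(docs, threshold=0.25):
    """Step 1 (assumed stand-in): greedy single-pass clustering on token overlap."""
    events = []  # each event holds its pooled tokens and member document indices
    for i, (text, _) in enumerate(docs):
        tokens = set(text.lower().split())
        best = max(events, key=lambda e: jaccard(tokens, e["tokens"]), default=None)
        if best is not None and jaccard(tokens, best["tokens"]) >= threshold:
            best["docs"].append(i)
            best["tokens"] |= tokens
        else:
            events.append({"tokens": tokens, "docs": [i]})
    return [e["docs"] for e in events]


def build_network(docs, events):
    """Step 2 (assumed stand-in): link entities that share a document event."""
    edges = defaultdict(int)  # (entity_a, entity_b) -> number of shared events
    for doc_ids in events:
        entities = sorted({e for i in doc_ids for e in docs[i][1]})
        for u, v in combinations(entities, 2):
            edges[(u, v)] += 1
    return dict(edges)


def degree_centrality(edges):
    """Step 3 (assumed stand-in): normalized degree over entities with an edge."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()} if n > 1 else {}


events = detect_events(DOCS)
network = build_network(DOCS, events)
centrality = degree_centrality(network)
print(events)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

On this toy input the four documents fall into two document events, and the entity that appears in both events ends up with the highest degree. A real system would replace each stand-in with a proper clustering method, an entity and relation extractor, and a network analysis toolkit.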

Footnotes

¹ This research is supported in part by the National Science Foundation of China under grant numbers 201721002, 61462011, 61540050 and 61472315; the Fundamental Theory and Applications of Big Data with Knowledge Engineering under the National Key Research and Development Program of China, grant number 2016YFB1000903; the Project of China Knowledge Centre for Engineering Science and Technology; the Ministry of Education Innovation Research Team, no. IRT13035; the Open Project, no. 2017BDKFJJ018; the Major Applied Basic Research Program of Guizhou Province, no. JZ20142001; and the Introduce Talents Science Projects of Guizhou University, no. 201650.

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering