Data mining for building knowledge bases: techniques, architectures and applications

  • Alfred Krzywicki, Wayne Wobcke, Michael Bain, John Calvo Martinez and Paul Compton...

Data mining techniques for extracting knowledge from text have been applied extensively to tasks including question answering, document summarisation, event extraction and trend monitoring. However, current methods have mainly been tested on small-scale customised data sets for specific purposes. The availability of large volumes of data and high-velocity data streams (such as social media feeds) motivates the need to automatically extract knowledge from such data sources and to generalise existing approaches to more practical applications. Recently, several architectures have been proposed for what we call knowledge mining: integrating data mining for knowledge extraction from unstructured text (possibly making use of a knowledge base), and at the same time, consistently incorporating this new information into the knowledge base. After describing a number of existing knowledge mining systems, we review the state-of-the-art literature on both current text mining methods (emphasising stream mining) and techniques for the construction and maintenance of knowledge bases. In particular, we focus on mining entities and relations from unstructured text data sources, entity disambiguation, entity linking and question answering. We conclude by highlighting general trends in knowledge mining research and identifying problems that require further research to enable more extensive use of knowledge bases.

The Knowledge Engineering Review
  • ISSN: 0269-8889
  • EISSN: 1469-8005
  • URL: /core/journals/knowledge-engineering-review