Emerging approaches in literature-based discovery: techniques and performance review

Yakub Sebastian, Eu-Gene Siew and Sylvester O. Orimaye

Literature-based discovery systems aim to discover valuable latent connections between previously disparate research areas. They do so by analyzing the contents of the respective literatures with the help of various intelligent computational techniques. In this paper, we review the progress of literature-based discovery research, focusing on the technical features of existing techniques and on evaluating their performance. Current literature-based discovery techniques can be divided into two general approaches: the traditional approach and the emerging approach. The traditional approach, which dominates the current research landscape, consists mainly of techniques that rely on lexical statistics, knowledge-based methods and visualization methods to address literature-based discovery problems. On the other hand, we have also observed the birth of new trends and unprecedented paradigm shifts within the recently emerging approach. These trends are likely to shape the trajectory of next-generation literature-based discovery systems.
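To make the idea of a "latent connection" concrete, here is a minimal sketch of Swanson's ABC model, the classic open-discovery formulation on which many traditional, lexical-statistics systems build. Starting from a term A, it collects intermediate terms B that co-occur with A, then ranks terms C that co-occur with some B but never directly with A. The toy corpus, the function names `cooccurrence` and `open_discovery`, the `min_count` parameter, and the scoring rule (number of distinct bridging B terms) are all illustrative assumptions, not the method of any particular system reviewed here.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical toy corpus: each "document" is a set of index terms.
# These are hand-made illustrations, not real MEDLINE records.
DOCS: List[Set[str]] = [
    {"raynaud", "blood_viscosity"},
    {"raynaud", "platelet_aggregation"},
    {"raynaud", "vasoconstriction"},
    {"fish_oil", "blood_viscosity"},
    {"fish_oil", "platelet_aggregation"},
    {"fish_oil", "triglycerides"},
]


def cooccurrence(docs: List[Set[str]]) -> Dict[str, Counter]:
    """For every term, count the documents it shares with each other term."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        for x, y in combinations(sorted(doc), 2):
            co[x][y] += 1
            co[y][x] += 1
    return co


def open_discovery(
    docs: List[Set[str]], a_term: str, min_count: int = 1
) -> List[Tuple[str, int]]:
    """Rank C terms linked to a_term only indirectly, through shared B terms.

    Assumed scoring: a C term's score is the number of distinct B terms
    that co-occur with both A and C.
    """
    co = cooccurrence(docs)
    direct = co[a_term]  # terms that already co-occur with A
    b_terms = {b for b, n in direct.items() if n >= min_count}
    scores: Counter = Counter()
    for b in b_terms:
        for c in co[b]:
            # Skip A itself and anything already directly linked to A:
            # those are not *hidden* connections.
            if c == a_term or c in direct:
                continue
            scores[c] += 1
    return scores.most_common()


if __name__ == "__main__":
    print(open_discovery(DOCS, "raynaud"))  # [('fish_oil', 2)]
```

In this toy setting, "fish_oil" never co-occurs with "raynaud", yet two intermediate terms bridge them, so it surfaces as a candidate hypothesis. Counting bridging terms is only the simplest choice; practical systems typically replace raw co-occurrence with weighted statistics and filter B and C terms by semantic type.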

The Knowledge Engineering Review
  • ISSN: 0269-8889
  • EISSN: 1469-8005