
Scientific Knowledge Engineering: a conceptual delineation and overview of the state of the art

  • Paulo Sérgio M. Dos Santos and Guilherme H. Travassos

As a community effort, scientific contributions are usually built incrementally, through some transformation, expansion or refutation of existing conceptual and propositional networks. As the body of knowledge grows, scientists devote more effort to ensuring that new hypotheses and observations are needed and are consistent with previous findings. In this paper, we characterize Knowledge Engineering as important groundwork for structuring scientific knowledge. We argue that knowledge-based computational infrastructures can support researchers in organizing and making explicit the main aspects needed to make inferences or draw conclusions from an existing body of knowledge. This view is also built comparatively, by contrasting it with alternatives for manipulating scientific knowledge, namely data-intensive approaches and the computational discovery of scientific knowledge. The current state of the art is presented through 22 knowledge representations and computational infrastructure implementations, whose main relevant properties are analyzed and compared. Based on this review and on the theoretical foundations of Knowledge Engineering, a high-level step-by-step approach for specifying and constructing scientific computational environments is described. The paper concludes by indicating paths for further development of the view initiated here, especially those related to the technical specificities that originate from applying Knowledge Engineering to scientific knowledge.


The Knowledge Engineering Review
  • ISSN: 0269-8889
  • EISSN: 1469-8005
  • URL: /core/journals/knowledge-engineering-review