A survey of data mining and knowledge discovery process models and methodologies

  • Gonzalo Mariscal, Óscar Marbán and Covadonga Fernández

Up to now, many data mining and knowledge discovery methodologies and process models have been developed, with varying degrees of success. In this paper, we describe the most used (in industrial and academic projects) and most cited (in the scientific literature) data mining and knowledge discovery methodologies and process models, providing an overview of their evolution over the history of data mining and knowledge discovery and establishing the state of the art in this topic. For every approach, we provide a brief description of the proposed knowledge discovery in databases (KDD) process, discussing its special features and its outstanding advantages and disadvantages. In addition, a global comparison of all the presented data mining approaches is provided, focusing on the different steps and tasks through which every approach interprets the whole KDD process. As a result of the comparison, we propose a new data mining and knowledge discovery process, named the refined data mining process, for developing any kind of data mining and knowledge discovery project. The refined data mining process is built on specific steps taken from the analyzed approaches.

The Knowledge Engineering Review
  • ISSN: 0269-8889
  • EISSN: 1469-8005
  • URL: /core/journals/knowledge-engineering-review