
A survey of data mining and knowledge discovery process models and methodologies

  • Gonzalo Mariscal, Óscar Marbán and Covadonga Fernández

To date, many data mining and knowledge discovery methodologies and process models have been developed, with varying degrees of success. In this paper, we describe the data mining and knowledge discovery methodologies and process models that are most used (in industrial and academic projects) and most cited (in the scientific literature), providing an overview of their evolution throughout the history of data mining and knowledge discovery and establishing the state of the art in this topic. For each approach, we provide a brief description of the proposed knowledge discovery in databases (KDD) process, discussing its special features and its main advantages and disadvantages. In addition, we provide a global comparison of all the presented data mining approaches, focusing on the different steps and tasks through which each approach interprets the whole KDD process. As a result of this comparison, we propose a new data mining and knowledge discovery process, named the refined data mining process, for developing any kind of data mining and knowledge discovery project. The refined data mining process is built on specific steps taken from the analyzed approaches.


The Knowledge Engineering Review
  • ISSN: 0269-8889
  • EISSN: 1469-8005
  • URL: /core/journals/knowledge-engineering-review