
Using automated planning for improving data mining processes

Susana Fernández, Tomás de la Rosa, Fernando Fernández, Rubén Suárez, Javier Ortiz, Daniel Borrajo and David Manzano...

This paper presents a distributed architecture for automating data mining (DM) processes using standard languages. DM is a difficult task that relies on an exploratory, analytic process over large quantities of data in order to discover meaningful patterns. The increasing heterogeneity and complexity of available data require expert knowledge of how to combine the many alternative DM tasks that can be used to process it. Here, we describe DM tasks in terms of Automated Planning, which allows us to automate the construction of the DM knowledge flow. The work is based on standards defined by both the DM and the automated-planning communities. Specifically, we use PMML (Predictive Model Markup Language) to describe DM tasks. From the PMML description, a problem description in PDDL (Planning Domain Definition Language) can be generated, so any current planning system that accepts PDDL can be used to generate a plan. This plan is, in turn, translated into a DM workflow description in the Knowledge Flow for Machine Learning format, the Knowledge Flow file format of the Waikato Environment for Knowledge Analysis (WEKA) tool, so that the plan, that is, the DM workflow, can be executed in WEKA.
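The translation chain described above (PMML in, PDDL problem out, plan translated back into a workflow) can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several loudly stated assumptions: the PMML fragment contains only a `DataDictionary`; the PDDL domain name `dm-workflow` and its predicates (`available`, `continuous`, `categorical`, `evaluated`) are hypothetical; the hand-written plan stands in for the output of a real planner; and the `<workflow>` XML produced at the end is a simplified placeholder, not an actual WEKA Knowledge Flow file.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PMML input: only a DataDictionary, no model yet.
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""


def local_name(tag):
    """Strip an XML namespace: '{http://...}DataField' -> 'DataField'."""
    return tag.rsplit("}", 1)[-1]


def read_data_fields(pmml_text):
    """Return (name, optype) pairs for every DataField in the PMML text."""
    root = ET.fromstring(pmml_text)
    return [(el.get("name"), el.get("optype"))
            for el in root.iter() if local_name(el.tag) == "DataField"]


def to_pddl_problem(fields, dataset="d1", model="m1"):
    """Build an untyped PDDL problem.

    The domain name 'dm-workflow' and all predicates are hypothetical
    placeholders for whatever DM planning domain is actually used.
    """
    objects = " ".join(name for name, _ in fields)
    init = [f"(available {dataset})"]
    for name, optype in fields:
        # One fact per field, e.g. (continuous age) or (categorical risk).
        init.append(f"({optype} {name})")
    return (f"(define (problem dm-task)\n"
            f"  (:domain dm-workflow)\n"
            f"  (:objects {dataset} {model} {objects})\n"
            f"  (:init {' '.join(init)})\n"
            f"  (:goal (evaluated {model})))")


def plan_to_workflow(plan):
    """Turn a sequential plan into a linear XML chain of steps.

    Placeholder format only; a real system would emit a WEKA
    Knowledge Flow file here.
    """
    flow = ET.Element("workflow")
    previous = None
    for i, (action, *args) in enumerate(plan):
        step_id = f"s{i}"
        step = ET.SubElement(flow, "step", id=step_id, action=action)
        step.set("args", " ".join(args))
        if previous is not None:
            # Connect each step to the one before it.
            ET.SubElement(flow, "link", source=previous, target=step_id)
        previous = step_id
    return ET.tostring(flow, encoding="unicode")


if __name__ == "__main__":
    fields = read_data_fields(PMML_TEXT)
    print(to_pddl_problem(fields))
    # Hand-written plan standing in for planner output (hypothetical actions).
    plan = [("load", "d1"), ("discretize", "d1", "income"),
            ("train-j48", "d1", "m1"), ("evaluate", "m1")]
    print(plan_to_workflow(plan))
```

Running the script prints a PDDL problem whose `:init` section contains one fact per PMML data field, followed by a linear XML chain of the plan's steps. In the architecture the abstract describes, an off-the-shelf PDDL planner would replace the hard-coded plan, and the final step would produce a Knowledge Flow file that WEKA can execute.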

The Knowledge Engineering Review (ISSN 0269-8889; EISSN 1469-8005)