Skip to main content Accessibility help
×
Home

Hybrid ASP-based Approach to Pattern Mining

  • SERGEY PARAMONOV (a1), DARIA STEPANOVA (a2) and PAULI MIETTINEN (a2)

Abstract

Detecting small sets of relevant patterns from a given data set is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like answer set programming (ASP) seem well suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods focus either on scalability or on generality. In this paper, we make steps toward combining local (frequency, size, and cost) and global (various condensed representations like maximal, closed, and skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence, and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework, we apply it to a problem of approximately tiling a database. Experiments on real-world data sets show the effectiveness of the proposed method and computational gains for itemset, sequence, and graph mining, as well as approximate tiling.

Under consideration in Theory and Practice of Logic Programming.

Copyright

Footnotes

Hide All

This work has been supported by the FWO and by the ERC-ADG-201 project 694980 SYNTH funded by the European Research Council. This is an extended version of a paper presented at the RuleML+RR 2017 conference, which has been invited for submission to TPLP. The authors acknowledge the assistance of the RuleML+RR 2017 Program Chairs Stefania Costantini, Enrico Franconi, Fariba Sadri and William van Woensel.

Footnotes

References

Hide All
Aggarwal, C. C. 2015. Data Mining: The Textbook. Springer, Cham.
Agrawal, R., Imielinski, T. and Swami, A. 1993. Mining association rules between sets of items in large databases. In SIGMOD 1993, ACM, New York, NY, USA, 207216.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A. I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Menlo Park, 307328.
Alviano, M., Dodaro, C., Faber, W., Leone, N. and Ricca, F. 2013. WASP: A native asp solver based on constraint learning. In International Conference on Logic Programming and Nonmonotonic Reasoning. Springer, Berlin, 5466.10.1007/978-3-642-40564-8_6
Aoga, J. O. R., Guns, T. and Schaus, P. 2016. An efficient algorithm for mining frequent sequence with constraint programming. In ECML PKDD, Springer, Cham, 315330.
Babai, L. 2015. Graph isomorphism in quasipolynomial time. CoRR abs/1512.03547.
Bonchi, F. and Lucchese, C. 2006. On condensed representations of constrained frequent patterns. Knowledge and Information Systems 9, 2, 180201.10.1007/s10115-005-0201-1
Cook, S. A. 1971. The complexity of theorem-proving procedures. In Proc. of the 3rd Annual ACM Symposium on Theory of Computing, Shaker Heights, OH, USA, 3–5 May 1971, Harrison, M. A., Banerji, R. B., and Ullman, J. D., Eds. ACM, New York, 151158.
Eiter, T., Brewka, G., Dao-Tran, M., Fink, M., Ianni, G. and Krennwallner, T. 2009a. Combining nonmonotonic knowledge bases with external sources. In Frontiers of Combining Systems, 7th International Symposium, Ghilardi, S. and Sebastiani, R., Eds. Frontiers of Combining Systems. FroCoS 2009. Lecture Notes in Computer Science, vol. 5749. Springer, Berlin, Heidelberg, 16–18 Sep. 2009, 1842.10.1007/978-3-642-04222-5_2
Eiter, T., Ianni, G. and Krennwallner, T. 2009b. Answer set programming: A primer. In 5th International Reasoning Web Summer School (RW 2009), Brixen/Bressanone, Italy, 30 Aug.–4 Sep. 2009. LNCS, vol. 5689. Springer, Berlin.
Faber, W., Pfeifer, G., Leone, N., Dell’Armi, T. and Ielpa, G. 2008. Design and implementation of aggregate functions in the DLV system. CoRR abs/0802.3137.
Gebser, M., Guyet, T., Quiniou, R., Romero, J. and Schaub, T. 2016. Knowledge-based sequence mining with ASP. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, AAAI, New York, NY, USA, 9–15 July 2016.
Gebser, M., Kaminski, R., Kaufmann, B. and Schaub, T. 2012. Answer Set Solving in Practice. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers, San Rafael.10.2200/S00457ED1V01Y201211AIM019
Gebser, M., Kaufmann, B., Neumann, A. and Schaub, T. 2007. clasp: A conflict-driven answer set solver. In LPNMR, Springer, 260265.
Geerts, F., Goethals, B. and Mielikäinen, T. 2004. Tiling databases. In Proc. of the 7th International Conference on Discovery Science, DS 2004, Springer, Padova, Italy, 2–5 Oct. 2004, 278289.
Gelfond, M. and Lifschitz, V. 1988. The stable model semantics for logic programming. In Proceedings of ICLP/SLP, MIT Press, 10701080.
Guns, T., Dries, A., Nijssen, S., Tack, G. and De Raedt, L. 2017. MiningZinc: A declarative framework for constraint-based mining. Artificial Intelligence 244, 629.10.1016/j.artint.2015.09.007
Guns, T., Nijssen, S., and de Raedt, L. 2013. k-pattern set mining under constraints. IEEE Transactions on Knowledge and Data Engineering 25, 2, 402418.10.1109/TKDE.2011.204
Guns, T., Paramonov, S. and Négrevergne, B. 2016. On declarative modeling of structured pattern mining. In Declarative Learning Based Programming, Papers from the 2016 AAAI Workshop, AAAI Press, Phoenix, AZ, USA, 13 Feb. 2016.
Guyet, T., Moinard, Y. and Quiniou, R. 2014. Using answer set programming for pattern mining. CoRR abs/1409.7777.
Jabbour, S., Sais, L. and Salhi, Y. 2013. Boolean satisfiability for sequence mining. In 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013, San Francisco, CA, USA, 27 Oct.–1 Nov. 2013, ACM, 649658.
Jabbour, S., Sais, L. and Salhi, Y. 2015. Decomposition based SAT encodings for itemset mining problems. In PAKDD, Springer, 662674.
Järvisalo, M. 2011. Itemset mining as a challenge application for answer set enumeration. In Proc. of the 11th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2011, Vancouver, Canada, 16–19 May 2011, Springer, 304310.10.1007/978-3-642-20895-9_35
Kimelfeld, B. and Kolaitis, P. G. 2014. The complexity of mining maximal frequent subgraphs. ACM Transactions on Database Systems 39, 4, 3233.10.1145/2629550
Lifschitz, V. 2008. What is answer set programming? AAAI, Chicago, IL.
Mannila, H. and Toivonen, H. 1997. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1, 3, 241258.10.1023/A:1009796218281
Métivier, J., Loudni, S. and Charnois, T. 2013. A constraint programming approach for mining sequential patterns in a sequence database. CoRR abs/1311.6907.
Miettinen, P. 2008. On the positive-negative partial set cover problem. Information Processing Letters 108, 4, 219221.10.1016/j.ipl.2008.05.007
Miettinen, P. 2009. Matrix Decomposition Methods for Data Mining: Computational Complexity and Algorithms. Ph.D. thesis, Department of Computer Science, University of Helsinki.
Miettinen, P. 2015. Generalized matrix factorizations as a unifying framework for pattern set mining: Complexity beyond blocks. In ECMLPKDD 2015, Springer, 3652.
Miettinen, P., Mielikäinen, T., Gionis, A., Das, G. and Mannila, H. 2008. The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering 20, 10, 13481362.10.1109/TKDE.2008.53
Négrevergne, B., Dries, A., Guns, T. and Nijssen, S. 2013. Dominance programming for itemset mining. In 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 Dec. 2013, IEEE, 557566.10.1109/ICDM.2013.92
Négrevergne, B. and Guns, T. 2015. Constraint-based sequence mining using constraint programming. In CPAIOR, Springer, 288305.
Neumann, S. and Miettinen, P. 2017. Reductions for frequency-based data mining problems. In Proc. of the 2017 IEEE International Conference on Data Mining, IEEE, New Orleans, LA, USA, 9971002.10.1109/ICDM.2017.128
Paramonov, S., van Leeuwen, M., Denecker, M. and De Raedt, L. 2015. An exercise in declarative modeling for relational query mining. In ILP, Springer, 166182.
Pei, J. and Han, J. 2000. Can we push more constraints into frequent pattern mining? In ACM SIGKDD, ACM, Boston, MA, USA, 350354.
Pei, J., Han, J. and Mao, R. 2000. CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX, USA, 2130.
Pensa, R. G. and Boulicaut, J.-F. 2005. Towards fault-tolerant formal concept analysis. In AI*IA 2005, Springer, 212223.
Rojas, W. U., Boizumault, P., Loudni, S., Crémilleux, B. and Lepailleur, A. 2014. Mining (soft-) skypatterns using dynamic CSP. In CPAIOR, Springer, 7187.
Simons, P., Niemelä, I. and Soininen, T. 2002. Extending and implementing the stable model semantics. Artificial Intelligence 138, 1–2, 181234.10.1016/S0004-3702(02)00187-X
Tyukin, A., Kramer, S. and Wicker, J. 2014. BMaD – A boolean matrix decomposition framework. In ECML PKDD 2014, Calders, T., Esposito, F., Hüllermeier, E. and Meo, R., Eds., Springer, 481484.
van der Hallen, M., Paramonov, S., Leuschel, M. and Janssens, G. 2016. Knowledge representation analysis of graph mining. CoRR abs/1608.08956.
Yan, X. and Han, J. 2002. gSpan: Graph-based substructure pattern mining. In ICDM 2002, Maebashi City, Japan, Japan, IEEE.
Yan, X., Han, J. and Afshar, R. 2003. CloSpan: Mining closed sequential patterns in large datasets. In SDM. 166–177.10.1137/1.9781611972733.15
Zaki, M. J., Parthasarathy, S., Ogihara, M. and Li, W. 1997. New algorithms for fast discovery of association rules. Technical report, University of Rochester, Rochester, NY, USA.10.1007/978-1-4615-5669-5_1

Keywords

Metrics

Full text views

Total number of HTML views: 0
Total number of PDF views: 0 *
Loading metrics...

Abstract views

Total abstract views: 0 *
Loading metrics...

* Views captured on Cambridge Core between <date>. This data will be updated every 24 hours.

Usage data cannot currently be displayed