
Extended High-Utility Pattern Mining: An Answer Set Programming-Based Framework and Applications

Published online by Cambridge University Press: 19 April 2023

FRANCESCO CAUTERUCCIO
Affiliation:
DII, Polytechnic University of Marche, Ancona (Italy) (e-mail: f.cauteruccio@univpm.it)
GIORGIO TERRACINA
Affiliation:
DEMACS, University of Calabria, Rende (Italy) (e-mail: terracina@mat.unical.it)

Abstract

Detecting sets of relevant patterns in a given dataset is an important challenge in data mining. The relevance of a pattern, also called utility in the literature, is a subjective measure and can be assessed from very different points of view. Rule-based languages such as Answer Set Programming (ASP) seem well suited for specifying user-provided criteria for assessing pattern utility in the form of constraints; moreover, the declarativity of ASP makes it easy to switch between several criteria in order to analyze the dataset from different points of view. In this paper, we take steps toward extending the notion of High-Utility Pattern Mining; in particular, we introduce a new framework that allows for new classes of utility criteria not considered in the previous literature. We also show how recent extensions of ASP with external functions can support fast and effective encoding and testing of the new framework. To demonstrate the potential of the proposed framework, we use it as a building block in the definition of an innovative method for predicting ICU admission for COVID-19 patients. Finally, an extensive experimental evaluation demonstrates, from both a quantitative and a qualitative point of view, the effectiveness of the proposed approach.
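
As a rough illustration of the idea of a pattern's utility and of switching between utility criteria, the following minimal Python sketch enumerates itemsets by brute force and keeps those whose utility reaches a threshold. It is not the extended framework or the ASP encoding of the paper: it follows the classic itemset-based notion of utility, and the toy transactions, the function names `pattern_utility` and `high_utility_patterns`, and the `aggregate` parameter are all assumptions made for this example.

```python
from itertools import combinations
from math import prod

# Hypothetical toy data: each transaction maps an item to its utility
# in that transaction.
TRANSACTIONS = [
    {"a": 2, "b": 3},
    {"a": 1, "c": 4},
    {"a": 2, "b": 1, "c": 2},
]


def pattern_utility(pattern, transactions, aggregate=sum):
    """Utility of `pattern`: for every transaction containing all of its
    items, aggregate the item utilities, then add up the results."""
    total = 0
    for transaction in transactions:
        if all(item in transaction for item in pattern):
            total += aggregate([transaction[item] for item in pattern])
    return total


def high_utility_patterns(transactions, threshold, aggregate=sum):
    """Brute-force enumeration of all itemsets with utility >= threshold."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            utility = pattern_utility(pattern, transactions, aggregate)
            if utility >= threshold:
                result[pattern] = utility
    return result


# Switching the criterion only means passing a different aggregate function.
by_sum = high_utility_patterns(TRANSACTIONS, threshold=8, aggregate=sum)
by_product = high_utility_patterns(TRANSACTIONS, threshold=8, aggregate=prod)
```

In this sketch the utility criterion is an ordinary function argument, which loosely mirrors the abstract's point that a declarative specification makes it easy to swap criteria; the exhaustive enumeration is only meant for tiny inputs.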

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press
Table 1. Terminology and facets for the paper reviews use case running example introduced in Example 1

Table 2. Examples of container, object, and transaction utility vectors for the paper reviews use case running example

Table 3. An excerpt of the transactions contained in the paper reviews use case, along with some objects and containers, used in the running example

Listing 1. A general ASP encoding for the e-HUPM problem.

Fig. 1. Modular composition of ASP subprograms for the paper reviews use case introduced as running example.

Listing 2. A full example of the ASP encoding for the ICU admission prediction for COVID-19 patients.

Listing 3. The core Python function of the constraint propagator exploited in WASP.

Fig. 2. Percentage of transactions (a) and Percentage of combinations of patient attributes (b) covered by valid patterns.

Table 4. Accuracy (left) and missing rate (right)

Fig. 3. Average running time (in seconds) for utility functions sum (a) and disagreement degree (b).

Fig. 4. Average running time (log scale) for utility functions sum (a) and product (b) using pure ASP or external functions.

Fig. 5. Average memory usage (log scale) for utility functions sum (a) and product (b). Horizontal white lines denote the boundary for the memory required by the grounder (bottom) and solver (top).

Table 5. Qualitative analysis on coherence/disagreement degrees