Extended High Utility Pattern Mining: An Answer Set Programming Based Framework and Applications

Detecting sets of relevant patterns from a given dataset is an important challenge in data mining. The relevance of a pattern, also called utility in the literature, is a subjective measure and can be actually assessed from very different points of view. Rule-based languages like Answer Set Programming (ASP) seem well suited for specifying user-provided criteria to assess pattern utility in a form of constraints; moreover, declarativity of ASP allows for a very easy switch between several criteria in order to analyze the dataset from different points of view. In this paper, we make steps toward extending the notion of High Utility Pattern Mining (HUPM); in particular we introduce a new framework that allows for new classes of utility criteria not considered in the previous literature. We also show how recent extensions of ASP with external functions can support a fast and effective encoding and testing of the new framework. To demonstrate the potential of the proposed framework, we exploit it as a building block for the definition of an innovative method for predicting ICU admission for COVID-19 patients. Finally, an extensive experimental activity demonstrates both from a quantitative and a qualitative point of view the effectiveness of the proposed approach. Under consideration in Theory and Practice of Logic Programming (TPLP)


Introduction
Pattern mining is one of the data mining branches that attracted vast attention in the literature.Pattern mining algorithms extract human-understandable knowledge from databases and have fostered a growing interest in the development of scalable and flexible methods for data analysis.In this context, flexibility is intended as the ability to incorporate users' prior knowledge and multiple criteria to measure the relevance of a pattern in the analysis process.These criteria can be modelled in the form of constraints to validate a set of candidate patterns.The first and most common studied constraint is the frequency threshold, where a pattern is validated only if it appears sufficiently often (Agarwal and Srikant 1994).Frequent pattern mining is a fairly well studied problem and effective solutions are available also in the logic programming arena (Järvisalo 2011;Gebser et al. 2016;Guyet et al. 2014).
However, frequency alone may be of little interest in many cases.As an example, consider a sales database and the pattern {windshield washer fluid , new windshield wipers}; this pattern can be very frequent but uninteresting, as it represents a common purchase.But what if we also consider the price of the products and a constraint on the minimum profit given by the corresponding purchase pattern?In this case, the pattern {car , car alarm} might be more interesting than the previous one, even if less frequent.
In light of this last consideration, the academic community has begun to emphasise that pattern validity can be assessed according to utility functions (Fournier-Viger et al. 2019;Gan et al. 2019); the introduction of the notion of the utility of an item and, more generally, of a pattern made it possible to develop a new generation of pattern mining approaches called High Utility Pattern Mining -HUPM (Fournier-Viger et al. 2019;Gan et al. 2019;Yao et al. 2006).
However, the basic assumption of HUPM is that each item is associated with one, static, external utility; this limits the flexibility expected from modern data analysis processes.For instance, continuing the example on the sales database, it would not be possible to devise validation constraints that dynamically combine price and minimum packaging size of the various products composing the pattern in the utility calculation; this could be very valuable, for example, in logistics optimization applications.Generalizing, it would be really important to be able to combine, in a flexible way, different aspects of the items, which we call facets in the following, to compute patterns' utility.
Another limitation of HUPM lies in the fact that the expected input is a flat representation of transactions; this allows only local utility notions to be defined.On the contrary, a multi-layered representation of the data, coupled with the possibility of combining different facets in utility functions, may allow more advanced pattern constraints to be defined.As an example, by grouping purchases by customer and considering the degree of each customer satisfaction as a facet, one can identify patterns with a good correlation between customer purchases and their satisfaction.This is not possible with classical HUPM methods.
Finally, it can be useful to incorporate users' prior knowledge into pattern constraints, in the form of pattern masks, to assess pattern utility.For instance, it may be useful to state that only certain combinations of product categories are relevant for the analysis.Again, this is not currently possible in the classical HUPM setting.
A first contribution of the present paper is the definition of a framework, which we call e-HUPM, that generalizes the classical HUPM in the following directions: • Dataset representation.A multi-layered representation of the input is defined, where aggregation levels of transactions, objects, and containers are conventionally identified.• Facets.We introduce the notion of facet, which can be associated with an item, a transaction, an object or a container; each of these elements may be characterized by more than one facet.• Utility functions.In order to make the most of facets and layers, we introduce a taxonomy of utility functions classes, most of which are not allowed in classical HUPM.
• Pattern masks.We introduce the notion of pattern mask, in order to specify structural or semantic constraints patterns must comply with.
To solve the pattern mining problem introduced above, a guess-and-check resolution scheme, with the support of rule-based languages such as Answer Set Programming (ASP), seems to be quite intuitive and effective.In particular, we have seen that solutions for frequent pattern mining are already available in ASP (Järvisalo 2011;Gebser et al. 2016;Guyet et al. 2014); however, the constraints we are concerned with in this work to validate patterns come in the form of potentially complex functions, which are difficult, or even impossible, to express in ASP.In order to solve this issue, we resort to recent extensions of ASP systems, such as DLVHEX (Eiter et al. 2016;2018), WASP (Dodaro and Ricca 2020), clingo (Gebser et al. 2019), etc., which allow external calculation functions, usually written in Python, to be integrated into ASP programs.
A second contribution of the present work is a modular ASP encoding of the proposed framework, so that, given a pool of alternative encodings for each module, even those unfamiliar with ASP can set up their own variants in a way that provides a high degree of flexibility for data analysis.
Finally, as a third contribution, we exploit the proposed framework and the corresponding ASP-based solution as a building block for the definition of an innovative method for predicting the ICU admission for COVID-19 patients, based on patients' blood results and vital signs.
The remainder of the paper is organized as follows.In Section 2 the general framework is proposed, along with all of its components and the problem definition.Section 3 is devoted to the design of the ASP solution.The application of our framework to a biomedical context is presented in Section 4, whereas an extensive experimental evaluation is presented in Section 5.In Section 6 a broad overview of related work is presented.Finally, in Section 7 we draw our conclusions and highlight future research directions.

A General Framework for Extending High Utility Pattern Mining (e-HUPM)
In this section we present our framework.First, we briefly recall the classical background definitions related to the HUPM problem.Then, we show how we extend the classical problem in several ways.In particular, we first extend the concept of Transaction database with Containers and Objects; then we extend the concept of utility with the notion of facet.After this, we introduce a new classification of pattern utility functions and the notion of pattern mask.Finally, we provide a formal definition of the problem addressed in this work.

Background
We now briefly summarize the classical definitions and notations for the HUPM problem based on the ones presented in (Fournier-Viger et al. 2019).It is worth pointing out that several variants of this problem have been proposed in the literature; since it is out of the scope of the paper covering all of them here, we focus on the most classical one.
A quantitative transaction database D is composed of a set of transactions and denoted as D = {T 1 , T 2 , . . ., T n }.Each transaction T p is uniquely identified by a transaction identifier (tid p ) and contains a set of items I p where each I p ⊆ I = {i 1 , i 2 , . . ., i m }.Each item i ∈ I is associated with an external utility eu(i) and every occurrence of i in a transaction T p is associated with an internal utility iu(i, T p ) which generally represents the quantity of i in T p .In the classical definition of HUPM both external and internal utilities are positive numbers.The objective of HUPM is the identification of sets of items (patterns) that present a high utility, i.e., a utility higher than a certain threshold th u .
The utility of an item i in a transaction T p is obtained as eu(i) × iu(i, T p ).Given a pattern P appearing in a transaction T p , the utility of P in T p is denoted as tu(P, T p ) and is computed as i∈P eu(i) × iu(i, T p ).
Given the set T P of transactions containing occurrences of the pattern P , the utility of P in the database D is denoted by u(P ) and computed as u(P ) = Tp∈T P tu(P, T p ).
Problem definition.Given a pattern P , P is a high utility pattern if its utility u(P ) computed over its occurrences in D is greater than a utility threshold th u .The problem of high utility pattern mining is to discover all high utility patterns in a database D.

Databases, containers and objects.
The first extension we propose with respect to the classical HUPM relates to the database representation.In the classical context, a database is simply composed of a flat set of transactions.Here we assume that a database of transactions is organized in different abstraction layers.This allows us to include more sophisticated utility functions in order to analyze, as an example, correlations among different transactions at different abstraction levels.Here we propose a possible hierarchy for the database, which fixes some boundaries for the definition of the problem; clearly more general structures and more layers can be defined in different contexts.The proposed reference database structure is as follows: In particular, given a database D and a set of transactions {T 1 , T 2 , . . ., T n }, D is organized as a set of containers C = {C 1 , C 2 , . . ., C r } where each container C s can be associated with a set of objects O = {O 1 , O 2 , . . ., O t } and each object O u contains a set of transactions {T 1 , T 2 , . . ., T v }.Clearly each transaction is composed of a (possibly ordered) set of items.Each transaction belongs to precisely one object and each object is associated with precisely one container.Observe that this representation allows covering several interesting application scenarios.
Example 1. (Running example) In order to better describe our framework, let us introduce a running example concerning scientific reviews, which will be explored further in the experiments.This context presents several peculiarities captured by our framework.In particular, we build upon the paper reviews use case presented in (Chakraborty et al. 2020); it deals with papers, reviews and annotated sentences.Each paper is a container in our model; for each paper, several reviews are available: each review is an object in our model.Each review contains a list of sentences: each sentence is a transaction in our model.

Utilities and facets.
The second main generalization deals with utility representation.As a matter of fact, a great limit of the classical definition of HUPM, and its variants, is that each item is associated with a unique, external, and fixed value of utility.We next extend the notion of utility with the concept of facet.In fact, in several contexts, the utility of an item can be defined from different perspectives, which we call facets.Then, each item can be associated with a list of values defining its utility from different perspectives.Moreover, given the new organization of the database, facets can be defined also for transactions, objects, and containers.Formally: Item utility vector.Given an item i, the utility of i is defined by an item utility vector where each iu k describes a certain facet of i.
Transaction utility vector.Given a transaction T p , the utility of T p is defined by a transaction utility vector T U Tp = [tu 1 , tu 2 , . . ., tu m ], where each tu k describes a facet of T p .Observe that these facets for the transaction describe properties of the transaction as a whole and represent a different information than the standard internal utility of an item i in the transaction T p .In order to keep the compatibility with the classical problem, we assume that the internal utility of i in T p is available and represented as q(i, T p ).
Object utility vector.Given an object O u , the utility of O u is defined by an object utility vector OU Ou = [ou 1 , ou 2 , . . ., ou n ], where each ou k describes a facet of O u .
Container utility vector.Given a container C s , the utility of C s is defined by a vector CU Cs = [cu 1 , cu 2 , . . ., cu o ], where each cu k describes a facet of C s .
It is worth observing that the length of any of the above utility vectors could be 0 if no interesting facet can be defined for that vector.The list of facets is fixed at problem modelling stage; however, we assume that all utility vectors of a certain type have the same number of facets.All utilities introduced above are not constrained to be numeric values.The interpretation and combination of utilities is left to the pattern utility computation function, introduced in the next section.
Example 2. (Running example) Table 1 illustrates the correspondence between our e-HUPM framework and the paper reviews use case adopted from (Chakraborty et al. 2020).The table depicts the selected facets for each domain element and its representation in the database, e.g., facets for containers, objects and transactions; the database and items have no associated facets.Each container represents a paper and for a container C s we have one single facet, that is the decision.Reviews information are represented by objects and for an object O u we have two facets, namely rating and confidence.Transaction (reviews' sentence) Table 3: An excerpt of the transactions contained in the paper reviews use case, along with some objects and containers, used in the running example S 2 {(problem, 1), (paper, 1), (concern, 1), (reproducibility, 1)} C 2 R 3 S 3 {(paper, 1), (readable, 1)} S 4 {(paper, 1), (good, 1), (experiment, 1), (reproducibility, 1)} Each review's sentence is a transaction and for a transaction T p we have the facets corresponding to the eight annotated aspects defined in (Chakraborty et al. 2020) and reported in Table 1.Finally, each sentence's word is an item.Table 2 depicts the utility vectors for container, object, and transaction respectively, relative to a simple instance excerpt adapted from the dataset; this excerpt is reported in Table 3 for completeness of presentation.In this table, for a better readability, we report the word corresponding to each item.We point out that a pre-processing pipeline is applied to the sentences with usual stemming and lemmatization procedures.Note that each transaction T p is a set of pairs (i, q(i, T p )) where i is an item and q(i, T p ) its quantity in the transaction T p .

Pattern utility computation
Recall that a pattern is a (possibly ordered) set of items and it may occur in a certain number of transactions.The generalization of utilities in facets, and the structuring of the database at different abstraction levels pave the way to more advanced and diversified computations of utilities.
Intra-pattern utility.First of all, given a pattern P , composed of a set of r items and occurring in a transaction T p , all the item utility vectors of items i ∈ P must be merged into a unique item utility vector IU .In this process, internal utilities of items for T p can be taken into account.Formally, given the set IUS = {IU 1 , . . .IU r } of item utility vectors associated with each item i ∈ P , let's define the intra-pattern utility function ip which takes as input the pattern P , the transaction T p and the associated set of item utility vectors IUS , and generates a unique item utility vector for the pattern occurrence: IU Tp = ip(P, T p , IUS ) Example 3.An example of intra-pattern utility function is the following: Depending on the context of interest, the combination of the utilities across the single facets can be carried out with functions different from the SUM.As an example, MAX, MIN, or AVG operators can be applied across the same facet of the different items in the pattern.Interestingly, if r = 1 it corresponds to the classical definition.
Pattern utility functions.Now, given a pattern P occurring in a transaction T p , the corresponding occurrence utility vector OccU Tp is obtained by juxtaposing the item, transaction, object and container utility vectors: Here, for the sake of simplicity, we refer to OU Tp (resp., CU Tp ) as the object (resp., container) utility vector of the object (resp., container) containing transaction T p .
Given the set T P of transactions containing occurrences of the pattern P , the pattern utility vector U P is obtained as the collection of all the occurrence utility vectors of P : Example 4. (Running example) Let us continue the running example.Consider the pattern P = {paper, reproducibility}, and the set of transactions T P = {S 2 , S 4 } in which it occurs.To proceed with the pattern utility computation, we first should compute its intra-pattern utility by merging the item utility vectors into a unique item utility vector.However, in the paper reviews use case, items have no facets.Hence, a simple intrapattern utility function which returns an empty list can be exploited, that is The corresponding occurrence utility vector OccU S2 and OccU S4 can now be derived as the juxtaposition of unique transaction, object and container utility vectors: Then, the pattern utility vector U P consists of the occurrence utility vectors OccU S2 and OccU S4 , that is: Now, from U P we can virtually build a matrix where each row represents a utility vector associated with an occurrence of P and each column represents a facet of P .The utility u of the pattern P can be then obtained as an arbitrary combination of the values in U P , using a function u(P ).
In order to formalize function u(P ) we distinguish between formulas that operate by row (we call them horizontal first and we refer to them as f h ), formulas that operate by column (we call them vertical first and we refer them as f v ), and formulas that operate on the whole data at once (we call them mixed and we refer them as f ).
Formally, utility of P can be classified in: • Horizontal first; it first combines utilities of the various facets in each occurrence (by row) and then combines the values across all occurrences (by column): • Vertical first; it first evaluates utilities on a facet basis across the occurrences (by column) and then combines the obtained values across the facets (by row): Observe that u(P ) is a single number, whereas intermediate computations may provide sets of values.
Both Horizontal first, Vertical first, and Mixed utility functions may be further classified in: • inter-transaction utility: functions that combine item and transaction utilities; • pattern-vs-object utility: functions that compute the utility of the pattern by correlating one or more item or transaction utility facets with one or more object utility facets; • pattern-vs-container utility: functions that correlate one or more item or transaction utility facets with one or more container utility facets.
The list of potentially interesting functions is virtually infinite.A non-exhaustive set of functions for some of the classes introduced above is provided next.Their specific semantics is strongly dependent on the application context.
• Horizontal first -inter-transaction: -Filter & Sum: Filters one single facet first and then sums the obtained values1 .
-Filter & Times: Filters one single facet first and then multiplies the obtained values2 .-Coherence degree: returns a predefined value if (a subset of) facets show coherent values and then computes the percentage of pattern occurrences containing this value; an index of coherence can be the fact that all facets are positive or all facets are negative.
• Horizontal first -pattern-vs-object / pattern-vs-container : -Disagreement degree: returns a predefined value if the value of an item/transaction facet disagrees with that of an object/container facet and then computes the percentage of pattern occurrences containing this value.
• Vertical first -inter-transaction: -Max & Sum: computes the maximum for each facet across transactions first and then sums the obtained values.To understand the applicability of this function, consider a context in which each facet is a rating of a certain aspect of a transaction (e.g.shipping speed, reliability, etc.); this function makes it possible to find the patterns with the highest overall ratings on all facets.-Std & Max: computes the standard deviation of each facet across transactions first and then takes the maximum value among facets.
• Vertical first -pattern-vs-object: -Mixed Coherence degree: for each facet, computes the fraction of transactions with positive (resp., negative) values, then filters on an item/transaction facet and on an object facet and multiplies the corresponding fractions.
• Mixed pattern-vs-object / pattern-vs-container : - We start by giving a simple example for Horizontal and Vertical first functions.We consider an inter-transaction function involving Filter & Max.More in detail, let f h = filter(•) and f v = max(•) be the Filter and Max functions, respectively.Then, we have u(P ) = f v (f h (U P ))).Assume we want to focus on the rating facet of each object utility vector, therefore u(P ) consists in filtering the rating facet (by means of f h ) and then selecting the maximum value of such facet (by means of f v ).In detail, where [4] (resp., [9]) indicates the rating facet of OccU S2 (resp., OccU S4 ) represented as a singleton.Then, by applying f v across each occurrence, we obtain u In this case, the same result is obtained by applying these functions in a vertical first fashion, e.g., Instead, if we consider a mixed pattern-vs-object utility function, an interesting example is the computation of the Pearson correlation coefficient.Let us define f = r xy , where r xy is the classic Pearson correlation coefficient.Also, suppose that x indicates the Clarity aspect value present in the sentence and y indicates the rating facet of the review, across all occurrence utility vectors in U P .According to U P , we have x = [0, 1] and y = [4, 9], and by applying the pattern-vs-object utility function f we obtain f (U P ) = r xy = 1.0, indicating a perfect positive correlation between Clarity aspect and the rating facets for the pattern P = {paper, reproducibility}.

Pattern masks.
As a final extension, we introduce pattern masks, i.e., properties that patterns (resp., pattern occurrences) must satisfy in order to be considered valid patterns (resp., occurrences).Simple examples of pattern masks are given below.Clearly, other kinds of masks can be defined, depending on the context of interest.
• Mask on pattern size: it constrains the number of items in a pattern into a given range.• Mask on pattern (external) properties: it constrains items in the pattern to satisfy certain properties.As an example, if items are words from review sentences, each word can be associated with a Part of Speech (POS) tag, and it can be interesting to constrain each pattern to contain at least a noun, a verb, and an adjective.

Definition of extended high utility patterns.
In the classical problem, a high utility pattern is detected disregarding its support on the database focusing on its utility only.In the new setting we are proposing in this work, we must consider that there are utility functions which may have low significance in presence of few transactions supporting the pattern.As an example, in order to properly compute the Pearson correlation at least two, but preferably more, data points are necessary.As a consequence, in order to provide a framework as general as possible, our problem formulation includes the definition of a minimum support for the pattern, in order to detect high utility patterns; obviously, this threshold can be set to 1, meaning that a pattern should appear at least once, whenever a minimum support is not relevant.
Problem definition.Given a pattern P containing a set of items, and a pattern mask M , P is an extended high utility pattern if P and its occurrences satisfy M , its utility u(P ) is greater than a utility threshold th u , and it occurs in at least th f transactions.The problem of extended high utility pattern mining is to discover all extended high utility patterns in a database D.

Design of the ASP Approach
As previously pointed out, one of the main objectives of this work is to provide as much flexibility as possible in the definition of what is a valid pattern.In what follows, we provide an encoding as general as possible for the framework introduced in Section 2; this can be specialized for specific settings and, indeed, we also show how it can be specialized for the paper reviews use case introduced as running example in Section 2.
It is important to point out again that coding complex formulae for pattern utility, such as some of those described in Section 2, may in general not be easy (or even impossible) to achieve with a purely declarative approach; think, for instance, about the computation of the standard deviation or the multiple correlation coefficient.Similar difficulties may arise when it is necessary to handle real or negative numbers.To overcome this issue, we resort to recent extensions of ASP systems, such as DLVHEX (Eiter et al. 2016;2018), clingo (Gebser et al. 2019), and WASP (Dodaro and Ricca 2020), which allow the integration of external computation sources, usually written in Python, within the ASP program.The ASP standardization group has not yet released standard language directives for such features; in order to present our ASP formalization in this section, we refer to syntax and semantics of DLVHEX (Eiter et al. 2016), which is more intuitive.In Section 4 we also show a complete implementation in WASP (Dodaro and Ricca 2020) that exploits constraint propagators.
The general encoding is presented in Listing 1; it is structured in separate parts, in such a way that the various aspects of the problem can be modelled, and changed, with local modifications.
The first part (lines 1-9) recalls the expected schema for input facts to simplify the reading of the code.Observe that attributes Position and Q for predicate item will be needed only if item position within the transaction and its internal utility are actually needed for the problem at hand (see Section 2.3).Similarly, the number of facets for item, transaction, object and container utility vectors will be specific to the application scenario; the corresponding predicates may be omitted if no facets are available for them.Thresholds for pattern frequency (th f ) and utility (th u ) must be provided as input as well with facts occurrencesThreshold and utilityThreshold (line 11).
Rule prototype in line 13 allows items to be pre-filtered, according to users' background knowledge.Think for instance to items with a price lower than a certain threshold which are certainly not relevant for the analysis.Such filters can be very application-specific and we therefore leave this part open to user specification.Predicate usefulItem is true only for unfiltered items.
The second part of the formulation considers the generation of candidate patterns and the verification of their minimum frequency.The chosen encoding generates one answer set for each pattern; this simplifies both pattern representation and the application of user-defined constraints to candidate patterns.As pointed out in Section 2.6, in the classical HUPM context minimum support is not considered at all as a constraint on pattern validity (only utility is actually considered); in this setting, all combinations of items would be candidate patterns.However, we decided to keep the frequency constraint in order to cope with specific application scenarios.To identify the set of candidate patterns with a minimum support, we build upon the effective and elegant solution already presented in Järvisalo (2011), which has been adapted to our context (lines 15-18).Here, the predicate inCandidatePattern is true for an item i only if i is a useful item and belongs to a frequent pattern.The predicate inSupport is true for a transaction t if it supports all the items in the candidate pattern.The lack of support for some items is determined by the conflictAt predicate.We also use the support predicate contains to project the first two variables of item.Observe that minimum frequency is checked directly in the choice rule (line 15).
The third part of the formulation (lines 20-21) generates the unique item utility vector for the pattern occurrence; in particular, predicate patternItemUtilityVectors is true for each item i in the candidate pattern supported by a transaction t, and collects the corresponding facets and internal utility.Predicate intraPatternUtilityVector then instantiates the unique item utility vector for the corresponding transaction by the application of the external function computeIntraPatternUtility.Observe that, generally, this computation involves simple sum-product operations; in this case, it might be more convenient to express the formula directly within the rule.Moreover, we point out that, when items have no facets and internal utility, this part of the encoding can be omitted.
The fourth part of the formulation (lines 23-25) builds the occurrence utility vector (line 23) and checks utility criteria on candidate patterns (line 25).Here, the different database layers introduced by the framework are taken into account and the full power of external functions is exploited.Observe that when using WASP constraint propagators, the constraint expressed in line 25 is replaced by a propagator.
Finally, the last part of the formulation (lines 27-29) is devoted to express pattern masks; this part is very application specific and we show a mask on pattern size just as a simple example.Here, users' background knowledge may play a relevant part.As an example, if items are words and we know for each word its part-of-speech tag, a more elaborated mask can be defined requiring that a pattern must contain at least a noun, a verb, and an adjective.%%% The following constraint is implemented as a constraint propagator in WASP :-&computeUtility[occurrenceUtilityVector](U), U < Thu, utilityTreshold(Thu). %%% Pattern mask (e.g.) on size minLength(...).maxLength(...).:-#count{T : inCandidatePattern(T)} < L, minLength(L).:-#count{T : inCandidatePattern(T)} > L, maxLength(L).

5a 5b
Fig. 1: Modular composition of ASP subprograms for the paper reviews use case introduced as running example introduced as running example in Section 2; in particular, we show how the various parts of the general program described above, which we call Blocks in the following to simplify the presentation, can be modelled by different ASP code portions that can be combined in many different ways.In our opinion, this makes evident the flexibility provided by our hybrid solution combining ASP and Python, where the main ability of declarative programming, which allows programmers to define the what and not the how, is fully exploited in the definition and combination of the various Blocks, whereas the potential of Python to switch between different complex utility functions is exploited by external functions.
Figure 1 shows the proposed formulations, where the diverse Blocks are numbered in order to simplify the presentation.In particular, Block 1 simply defines the schema of the input dataset, along with expected input parameters.Observe that, for the paper reviews use case, no utility is associated with items; consequently, there is no itemUtilityVector and, thus, no intra-pattern utility computation is needed.For this reason, the proposed formulation does not include a Block for intra-pattern utility computation.
As far as the pre-filtering part is concerned, Block 2a and Block 2b show two different alternatives.In the code of Block 2a, only words associated with at least one positive or negative sentiment value for the sentences they appear in are considered useful for the analysis; Block 2b shows a pre-filtering that includes only words for which a disagreement between Appropriateness and Decision exists.Obviously many other pre-filtering rules can be defined for the problem at hand.Block 3a and Block 3b refer to the candidate pattern generation and occurrences check part.In particular, Block 3a reproduces the code already described in the general encoding, whereas Block 3b shows how easy it can be with ASP to switch from the classical scenario to a very different problem, in this case sequence pattern mining.In particular, in this formulation, the classical code is slightly modified in such a way that the order of items within sentences becomes relevant, and an occurrence of an item in a sentence is validated only if all items occur precisely in the same order.It is interesting to observe that the switch of this Block alone makes it possible to tackle very different pattern mining problems, and that switching from a pattern mining problem to another one with an imperative programming solution would require to rewrite long code portions.
Pattern utility computation, coded in Block 4, is standard; in this case the different potential utility functions are coded in Python by the external function computeUtility.
Finally, Block 5a and Block 5b express two different ways of defining pattern masks.Block 5a defines a simple mask on pattern size, whereas Block 5b implements a more elaborated mask where, if we assume to know the part-of-speech tag of each word, it states that if a pattern contains at least three words, these must be at least a noun, a verb, and an adjective.Of course, again, one can imagine many other types of masks.
As a final consideration, it is noteworthy that even with a limited number of alternatives for each block, a wide variety of final encodings can be generated.As an example, the combination of Blocks 1-2a-3a-4-5a represents a full encoding; other possibilities are Blocks 1-2b-3a-4-5b or Blocks 1-2a-3b-4-5a, and many more.Given a pool of alternative encodings for each Block, even those unfamiliar with ASP could put up their own variants to provide maximum flexibility in data analysis.

Application to ICU admission prediction for COVID-19 patients
In this section we present a possible application of the proposed approach in a biomedical context; in particular, we show how high utility patterns derived from a structured database can be exploited to compute correlations, and possibly make predictions, for ICU admission of confirmed COVID-19 cases, based on patients' blood results and vital signs.It is worth pointing out that the main objective of this section is not to provide a final solution to the problem, but to show with a proof-of-concept that our e-HUPM framework can actually help in relevant practical problems.
To give a full example of application of our framework, Listing 2 reports the ASP program specialized to this context, where WASP propagators are used to implement the constraint in line 22.The Listing 3 shows an excerpt of the propagator implementation; here, the most important thing to note is the simplicity with which the Pearson correlation can be calculated (see line 13 of Listing 3).It is beyond the scope of this paper to explain all the details of the implementation of propagators.The interested reader is referred to (Dodaro and Ricca 2020) for the literature on propagators and to https://www.mat.unical.it/~cauteruccio/tplp-ehupm/ for the complete implementation of propagators for this problem.
The first objective of this analysis was to find patterns, derived from patient demographic information and patient disease groups, showing a significant Pearson correlation between each of the 42 facets and ICU outcome.As an example, for the pattern: (Gender:Male, AgePercentile:60, DiseaseGroup5:YES) we found a Pearson correlation of -0.75 between Oxygen Saturation values and ICU; similarly, for the pattern (Gender:Female, Immunocompromised:NO, DiseaseGroup5:YES, DiseaseGroup6:NO, Hypertension:YES) we found a Pearson correlation of -0.82 between Hemoglobin values and ICU.More generally, we found 590 (resp., 1422) patterns showing an absolute value of the Pearson correlation greater than 0.5 between Oxygen Saturation (resp., Hemoglobin) values and ICU occurring in at least 10 transactions in the dataset.Similar number of patterns have been derived also for the other facets.Note that, to switch from one facet to another, it was sufficient to change the attribute to be projected in line 13 of Listing 2.
As a preliminary analysis of obtained results, in light of exploiting them for ICU prediction, we analyzed the fraction of transactions in the database supported by at least one valid pattern, and the percentage of combinations of patients attributes covered by at least one valid pattern.This last has been measured as the fraction of combinations that can be extracted from available data which are also supported by at least one valid pattern.Observe that, having a high fraction of covered transactions (resp., combinations) might be important in order to improve the chances to have a personalized predictive model for each upcoming unknown patient.However, setting very low minimum thresholds just to increase the number of valid patterns may, in principle, be counterproductive for their subsequent use in prediction tasks.It is then important to study these properties for a better understanding of the next steps of analysis.Results obtained from this analysis are shown in Figure 2 for several values of minimum occurrence and minimum correlation thresholds.From the analysis of this figure, we can observe that even a slight increase of the minimum correlation has a strong impact on the fraction of transactions/combinations covered by valid patterns; an increase of the minimum occurrence, instead, has a lower impact.Interestingly, too stringent thresholds end up to completely miss all transactions/combinations in the dataset and, clearly, these should be avoided.As At the end of the first phase, we have a set P = { p i , π j } of patterns p i for each facet π j corresponding to a target parameter (as an example, Oxygen saturation, Hemoglobin, etc.).Now, each p i , π j is characterized by a number ν ij of transactions supporting it and a Pearson correlation γ ij .It is then possible to derive a linear regression model µ ij (•) between π j and ICU for p i .Given the value v jk for the facet π j of a generic (possibly unseen) transaction t k supported by p i , π j , we can then apply µ ij (v jk ) to estimate the value of ICU.µ ij (v jk ) is a value in the real interval [0, 1] and constitutes a building block for the overall prediction process which is described next.
Given a generic unseen transaction t k , let first identify the set P t k ⊆ P of patterns in P contained also in t k .Observe that, in principle, P t k might be void; in this case, no prediction can be carried out.For each p i , π j ∈ P t k and the corresponding value v jk in t k let apply the regression model µ ij (v jk ) previously constructed, in order to get an estimation of ICU for the patient corresponding to t k based on v jk .Now, since there can be more patterns in P t k for t k , it is possible to produce a more accurate estimation by combining the predictions obtained from all the patterns.
There are several ways in which we can combine the various predictions.Since each p i , π j is characterized by both a Pearson correlation γ ij and a number of transactions ν ij supporting it, we can use both these values as weights in an average mean of all obtained predictions.In fact, a very high correlation indicates a very good predictive capability; however, if this correlation is not supported by many transactions, it might be unreliable for unseen transactions.On the contrary, a high number of transactions supporting the pattern indicates a good support for the prediction, but obtaining a high correlation from several transactions is, in general, more complicated.By combining these two factors we can weigh each prediction in a more accurate way.In particular, the formula exploited to estimate ICU for the patient corresponding to the transaction t k is: Observe that ICU k is a value in the real interval [0, 1]; in order to finalize the estimation, we carry out a rounding up of its value.
In order to test the validity of the approach we carried out a 5-fold cross-validation where, in each run, 80% of tuples are used for extracting patterns and building the prediction model, and 20% of tuples to compute predictions.A prediction is considered exact if its value corresponds to the ICU value present in the database for the corresponding patient.This allows us to measure the Accuracy of the approach as the fraction of exact results vs the total ones.Table 4 (left) shows the average accuracy obtained among all the runs varying the minimum allowed frequency and the Pearson correlation.Since a transaction might not be supported by valid patterns, Table 4 (right) shows also the missing rate, i.e., the percentage of transactions with no predictions.We also computed the variance for the obtained results on Accuracy, which ranges from a minimum of 0.0015 to a maximum of 0.057.
From the analysis of this table we can observe that the Accuracy is quite stable and above 70% across the various combinations of parameters, with a slight increase as the Pearson increases.However, looking at the missing rate, we can observe that increasing values of the correlation significantly increases also the missing rate.If we observe Figure 2 there are actually threshold values for which the missing rate is very high (even 1, when no predictions are available); the Accuracy obtained for these configurations, although higher than that of the others, is of less significance since these configurations only allow us to obtain predictions on a small set of data.Reasonable values for the missing rate are obtained for a minimum absolute Pearson equal to 0.5, which corresponds also to a satisfying Accuracy.The quite stable behaviour of the Accuracy can be motivated by the weighed formula we adopted for computing the predictions, which is able to adapt either to a high number of supported patterns and transactions or to a high Pearson with less support.Summarizing, it turns out that it is better to have a good coverage with a moderate correlation than a very strong correlation with a low coverage.

Experimental Evaluation
In this section we show and discuss the results of several experiments we carried out in order to assess the effectiveness of the proposed approach.We focus on the paper reviews use case introduced as running example and we consider a quantitative analysis,  aiming to characterize the applicability of the approach in terms of performances, and a qualitative analysis, aiming to assess the quality of results.
The full dataset exploited in these experiments includes 814 papers, 1148 reviews and 2230 annotated sentences, containing overall 15124 distinct words.The complete dataset, expressed in ASP format, is available at https://www.mat.unical.it/~cauteruccio/ tplp-ehupm/.Other properties are shown in Table 1.All the experiments have been performed on a 2.3GHz MacBook computer (Intel Core i9) with 16 GB of memory.The data preprocessing pipeline was implemented in Python 3.9 and exploited the spaCy (https://spacy.io/)library.In order to run our ASP programs, we used clingo version 5.4.0 with the flag --mode=gringo as the grounder and WASP version 2.0 compiled via clang (version 11.0.3) with Python 3.9 support as the solver.

Quantitative Analysis
In a first series of experiments, we computed the average running times for two different settings.The first one is a sort of a stress test: it involves all the facets provided by the use case and computes their sum via an external function call; thus, all input items and values are relevant for the computation and, consequently, a large number of patterns are expected.The second one is more realistic and computes the disagreement degree between one of the annotated sentiments and the decision about the corresponding paper, namely the percentage of pattern occurrences showing a positive sentiment on originality and a reject decision; in this case, given its simplicity, the utility function is directly encoded with ASP rules.Results for running times are shown in Figure 3 for increasing number of papers, pattern size, and occurrence threshold (th f ); each data point represents the average running time computed for utility thresholds equal to [1,5,10,15,20,25,30] for the sum and [15,50,75,100] for the disagreement degree.To rescale the dataset, we randomly selected the set of papers to be removed at each downsizing step, so that each instance is a strict subset of its larger versions.This selection process has been carried out once and the same datasets have been used throughout the experiments.
First of all, from the analysis of the results, it is possible to observe that the choice of the setting can significantly impact the performances.This is due to a combination of factors, including the amount of input data relevant to the calculation, the number of patterns satisfying the thresholds, and the complexity of the utility function itself.
In more detail, looking at Figure 3a, we can observe that increasing the number of papers significantly increases execution time but, increasing the occurrence threshold significantly reduces required time.Moreover, especially for the bigger datasets, a higher occurrence threshold combined with a higher pattern size significantly reduces execution time.This behaviour can be mainly motivated by the portion of encoding for identifying frequent patterns; in particular, checking the number of occurrences within the choice rule, instead of using a standard constraint, makes rules more complex in general, but allows the evaluator to cut the search space more easily for bigger occurrence thresholds and, also, for larger patterns.In fact, it is worth pointing out that ground programs resulted to be smaller for increasing occurrence thresholds.We also observed that (results not shown due to space constraints) the utility threshold does not significantly impact performances.Similar considerations can be drawn for the tests on disagreement degree (Figure 3b).For these tests, it is worth observing the following differences with respect the previous ones using the sum: (i) input relevant facts are significantly lower (only words in sentences annotated with originality are useful); (ii) required time and memory (see below) are significantly lower; (iii) the number of valid patterns is significantly lower and, in this case, decreases with increasing utility thresholds.In particular, the maximum number of extracted patterns is 2480 with the sum, and 95 with the disagreement degree.
In a second series of experiments we considered the impact of external function calls in ASP.In particular, we consider two scenarios in which the utility function can be both expressed directly in ASP and in Python.Before going on with the analysis, let us emphasise once again that the constraints we are dealing with in this work to validate the patterns may come in the form of potentially complex functions, which are difficult, or even impossible, to express in ASP, and, as a result, external functions significantly broaden the scope of applicability of the proposed approach.Moreover, we observe that a deep analysis of when it is better to implement functions directly in ASP or in Python is out of the scope of this paper.The interested reader can find an extensive discussion on this aspect in (Cuteri et al. 2017).
In the first experiment we considered the sum function, as introduced above, with the following parameters: pattern size 3, occurrence threshold 6, utility threshold 5.This is the worst case scenario for evaluating external functions, since ASP includes built-in functions for computing the sum and, in fact, results presented in Figure 4a show that the impact of the external function calls may become consistent, especially when the number of papers increases.
The second experiment considered the product function; in particular, for the purposes of this experiment, we added a facet to each review representing a (fictitious) probability, and we considered the combined probability as the utility function, i.e., the product of all the probabilities associated with pattern occurrences.Also in this case, we implemented a version of the program using external functions and a version implementing the product computation directly in ASP; it is worth observing that, in this case, no built-in function like the sum is available in ASP for the product and, thus, it must be computed with the support of recursive rules (probabilities have been obviously scaled to integers between 0 and 100).Parameters used in these tests are as follows: pattern size 3, occurrence threshold 6, utility threshold 5. Results presented in Figure 4b clearly show that, in this case, the version with external functions significantly outperforms the one using Horizontal white lines denote the boundary for the memory required by the grounder (bottom) and solver (top) ASP only, thus motivating the adoption of external sources of computation for utility functions which are not straightforward to be expressed in ASP.In order to show the impact of external functions on memory usage, we report in Figure 5 memory consumption for functions sum and product expressed directly in ASP or in Python.From the analysis of this figure, it is possible to observe, again, that for the sum function, the impact of external functions may become consistent.However, when we turn to the product function, the amount of memory required by the version using external functions is extremely lower.In both cases, the main difference is on the solver part, whereas the grounder part requires similar amounts of memory.

Qualitative Analysis
In this section we describe the results of some experiments carried out to show if, and how much, facets and utility functions introduced in Section 2 can help users in analyzing input data from different points of view and at different abstraction levels.
In particular, we concentrate on the computation of coherence and disagreement degrees, introduced in Section 2.4, between the decision on a paper (Accept/Reject) and one of the eight aspects used to label review sentences (Chakraborty et al. 2020).In particular, the objective is to look for patterns (sets of words) characterizing with good accuracy the relationship of interest and to qualitatively check if they are meaningful.
Before going into the details of the analysis, in order to better show the properties of our approach, consider that the paper review dataset contains 686 words with contrasting values between a "sentiment" (on one of the aspects among Appropriateness, Clarity, Originality, Empirical/Theoretical Soundness, Meaningful Comparison, Substance, Impact of Dataset/Software/Ideas and Recommendation) and the Decision.Among them, 481 different words are in sentences with an "Originality" sentiment; this situation leads to 929 patterns of size between 2 and 4 with at least 4 occurrences just related to "Originality".As a consequence, any classical approach based only on pattern frequency would be overwhelmed by a large amount of patterns.
Table 5 shows some of the obtained results.Here, given the sentiment label on a sentence aspect X, the utility function measures the percentage of occurrences of the pattern showing coherence/disagreement between the sentiment on X and the decision on the paper, as specified in the first column.The second column of Table 5 reports the total number of patterns obtained by the approach for each test, whereas the third and fourth columns show an excerpt of the most relevant patterns along with their utilities.Parameters exploited for these tests are as follows: Minimum frequency=4, Minimum utility=70,3 pattern size between 2 and 4.4 It is worth pointing out that each run completed the computation in just few seconds, and switching between one test and the other involved few clicks and minor modifications to the ASP program.
From the analysis of Table 5 we can draw the following observations: (i) The number of valid patterns characterizing each setting is very low; this allows the user to concentrate on the most relevant ones.(ii) Patterns extracted in the different settings are quite different; this indicates a good specificity of derived patterns.(iii) For some settings, the number of derived patterns is 0; this means that the corresponding situation cannot be characterized.As far as this last observation is concerned, consider the very interesting situation analyzing Recommendation and Reject: in this case no patterns characterize a positive recommendation (which we recall is a derived sentiment from the sentences) associated with a reject decision, whereas 84 patterns characterize the opposite situation, namely negative recommendation and reject; this provides an interesting insight on the appropriateness of derived patterns.We also carried out a similar analysis relating the Rating facet, associated with the Review (which is an object in our framework), and the Decision facet, associated with the Paper (which is a container in our framework).Results are shown in the last two rows of Table 5; these confirm the analysis relating recommendation and decision and show how easy is to switch the analysis also between different levels of abstraction.

Related Work
Considering its importance as a research topic in the data mining field, HUPM received a huge interest both from academic community and industrial practitioners.HUPM finds application both in classical contexts where (normal) itemset mining was born, such as market analysis, and in novel ones such as biomedicine, mobile computing, etc. (Suvarna and Srinivas 2019).Thanks to its exhibited versatility, a huge volume of research studies has been produced regarding HUPM and its variants.According to Semantic Scholar5 , in the last twenty years more than 8,000 studies related to HUPM have been published.
In order to better characterize the work related to our approach and to discern among the plethora of presented studies, we divide the analysis of the related literature in two parts.The former provides a bird's-eye view of the general literature about HUPM, focusing on variations of the utility measure; the latter focuses on some declarative-based approaches to pattern mining.
It is worth pointing out that, to the best of our knowledge, no previous work addressed in a comprehensive way the various extensions introduced in the present paper; additionally, our work is the first ASP-based solution to the e-HUPM problem.

Approaches for HUPM and its variants
As previously pointed out, HUPM is a general container for several high utility-based mining approaches (Fournier-Viger et al. 2019;Gan et al. 2019).The corresponding research studies can be divided in several groups, depending on the aspect one is interested to analyze, such as employed algorithms or pruning strategies (Krishnamoorthy 2017;Yun et al. 2017;Wu et al. 2019;Yun et al. 2019;Sumalatha and Subramanyam 2020).In our context, we mainly focus on research that studies utility measures and their variants (Yao et al. 2006;Shen et al. 2002;Fournier-Viger et al. 2014;Cagliero et al. 2014;Fournier-Viger et al. 2020;Deng 2020).
One of the first works on unifying utility based measures is presented in (Yao et al. 2006), where the authors introduce a unified utility framework in which a user defined function f (x, y) is exploited to define utilities.Here, the significance of an item is measured by two parts: one is the statistical significance of the item, measured by x, and is an objective measure; the other is the semantic significance of the item, measured by y, which is a subjective measure dependent on the user perception of utility.Moreover, a weight to each transaction, called vertical weight, can be assigned.The authors show that this generalization can assign semantic significance at different levels of granularity (item, transaction, and cell level).The approach proposed in (Yao et al. 2006) and our own share some similarities, but the present work further extends the pattern mining capabilities, as specified next.(i) First of all, (Yao et al. 2006) attempts to provide different levels of abstraction for the identification of pattern utility; the present paper has a similar goal but extends it not only to transactions but also to generic abstraction layers, enabling the introduction of more advanced utility functions.(ii) Allowing f to be a user defined function extends the scope of application; however, only one property at a time can be used to state the semantic significance of an item; in our approach, the introduction of facets allows not only to consider several properties at the same time, but also to combine them dynamically.(iii) The introduction of facets also on transactions, objects, and containers allows not only analyses at different levels of granularity, as done in (Yao et al. 2006), but also the identification of patterns expressing correlations (and possibly causality) between different properties, at different levels.(iv) Finally, we have shown that beside classical sum and product operations on utility values, exploited in (Yao et al. 2006), more complex functions like Pearson correlation or Multiple correlation, allowed by the availability of facets, broaden the scope of potential analyses.
An effort to define a general model, called Objective-Oriented Utility-Based Association mining, in which mined association patterns are both statistically and semantically related to a user's given objective, is explained in (Shen et al. 2002).Here, the database is composed of a set of records structured in a pre-defined set of attributes, and an association rule is enhanced with an objective and utility value.Thus, the approach aims at deriving patterns from the records that both statistically and semantically support a given objective, by taking into account minimum support, confidence and utility.While the scope of (Shen et al. 2002) and our own are not completely overlapping, namely association rule mining vs high utility pattern mining, our approach can embrace the concept of objective-driven mining.Thus, with the adoption of suitable pattern masks, facets, and utility functions, our proposal can be considered to some extent as an extension of the concepts presented in (Shen et al. 2002).
The authors in (Deng 2020) define a novel measure called occupancy and formulate the problem of mining high occupancy itemsets.The occupancy is defined as the sum of the ratio between the cardinality of an itemset p and the cardinality of the transactions that contain p.Then, a high occupancy itemset is an itemset whose occupancy is not below a given threshold.Interestingly, the approach described in (Deng 2020) can be seen as a special case of our approach, where a horizontal first utility function class is applied.
It is also worth pointing out that different extensions of the HUPM problem are presented in the literature.Different aspects are considered, and each aspect aims at addressing the limitations of the classical problem (Fournier-Viger et al. 2019).As an example, an HUPM algorithm may show a large number of patterns to the user if the minimum utility threshold given in input is too low, thus it can be very hard for a user to analyze and describe retrieved patterns.To this end, several concise representations for high utility patterns have been studied.Four main representations are: closed high utility patterns, maximal patterns, generators of high utility patterns and maximal high utility patterns (Fournier-Viger et al. 2019); moreover, in (Gan et al. 2019) the Kulc measure has been adopted to extract non-redundant strongly correlated and profitable patterns.In our approach, the number of retrieved patterns can be reduced by defining more stringent utility functions; however, we can see our approach and the ones mentioned above as orthogonal, since they do not focus on the definition of utility, but rather on the representation and filtering of obtained results.
Other different aspects are studied in the literature, such as HUPM with negative utility and HUPM with discount strategies (Fournier-Viger et al. 2019).These are all covered by our framework, which can be seen as an extension of these special cases with further potentialities.
In addition to the aforementioned approaches, it is worth noting that HUPM has been employed in several different contexts, such as topic detection (Choi and Park 2019), relevant information extraction (Belhadi et al. 2020), aspect-based sentiment analysis (Demir et al. 2019), and mining in noisy databases (Baek et al. 2020).This variety of application contexts for HUPM further motivates our framework which, thanks to its generality, can accommodate very different settings, as it has been shown through the paper.
Finally, there are also different pattern extraction methods, not related to HUPM, which could be indirectly encompassed by our framework.An example is that of subgroup discovery (Herrera et al. 2011;Atzmueller 2015).Briefly, subgroup discovery is a method for knowledge discovery in databases whose aim is to identify relations between a target variable and many explaining and independent variables, determined by a quality function that can be flexibly defined (Atzmueller 2015).Intuitively, it means to discover the subgroups of the population that are statistically most interesting and, therefore, can be identified as a sort of pattern extraction technique.This task has been addressed with several approaches, including those exploiting evolutionary algorithms (del Jesus et al. 2007), complex network aspects ( Škrlj et al. 2018) and inductive logic programming ( Černoch and Železný 2012).Although our notion of utility could encompass similar quality measures as those used in subgroup discovery, the aim of our framework is different from the task of subgroup discovery.Indeed, this last task could be further investigated in a future work by considering the multi-level structure offered by our approach.The application of our framework we presented in Section 4 differs from subgroup discovery in terms of the purpose of the knowledge discovery task.The goal of classification in our application is to generate a model for each class that contains rules representing class characteristics regarding the descriptions of training examples.This model is used to predict the class of unknown objects.In contrast, subgroup discovery attempts to discover interesting relations with respect to the property of interest (Helal 2016).

Declarative-based approaches for pattern mining
The usage of declarative systems to tackle combinatorial problems is a consolidated approach which attracted a peculiar interest from researchers.The possibility of coupling the power of a high-level declaration with an optimised solver allows users to specify how patterns should be and not how they should be computed.There are different existing approaches that blend together the expressiveness and readiness to use of a declarative system within the problem of pattern mining (Järvisalo 2011;Gebser et al. 2016;Samet et al. 2017;Paramonov et al. 2019;Guyet et al. 2014;2016).The main objective of this section is to show the growing interest in declarative-based approaches for pattern mining as an alternative to dedicated algorithms, especially when the main focus is to add expressiveness to the standard problem at hand.To the best of our knowledge there is no ASP-based approach tackling the HUPM problem, even in its basic form; then, with respect to this aspect, our approach moves a step forward in the application of ASP with external functions in interesting data mining contexts.
The seminal work in (Järvisalo 2011) exploits ASP in the task of finding all frequent itemsets: each itemset is an answer set, thus the enumeration of all answer sets corresponds to the enumeration of all frequent itemsets.Although both our approach and the one in (Järvisalo 2011) exploit ASP, the latter addresses only the standard frequent itemset setting; in our extended HUPM setting, we also exploit the recent ASP language extensions for the application of complex functions on sets of data inferred during the evaluation of the program.
In (Gebser et al. 2016), the context of sequence mining is addressed, and an ASP-based mining approach is presented.Here, the focus is to express and incorporate knowledge in the mining process.The authors consider frequent (closed or maximal) sequential patterns and complex preferences on patterns.Frequent patterns are mined following similar encoding principles of (Järvisalo 2011), whereas preference-based mining is accomplished by exploiting the asprin system (Brewka et al. 2015) which allows expressing combinations of qualitative and quantitative preferences among answer sets.Rare sequential pattern mining is considered in (Samet et al. 2017).
A general and hybrid approach to pattern mining is presented in (Paramonov et al. 2019), where different dedicated mining systems are employed to detect frequent patterns and ASP is used to filter the results.Here, the tasks considered are itemset, sequence and graph mining, where local (frequency, size, and cost) and global (condensed representations such as maximal, closed, and skyline) constraints are combined in a generic way.The declarative approach here is devoted to post-process the patterns in order to select the valid ones.
Mining serial patterns, i.e., frequent sequential patterns from a unique sequence of itemsets, is introduced in (Guyet et al. 2014), whereas sequential pattern mining with two representations of embeddings (fill-gaps and skip-gaps) and several kinds of patterns are considered in (Guyet et al. 2016).
Other declarative approaches, not constrained to ASP and based, e.g., on constraint programming or SAT, are also discussed in the literature (Négrevergne et al. 2013;Guns et al. 2017;2016).Itemset mining is considered in (Négrevergne et al. 2013), in which a novel algebra for programming pattern mining problems is introduced.It allows for the generic combination of constraints on individual patterns with dominance relations between patterns.Dominance relations are pairwise preferences between patterns, used to express the idea that a pattern p 1 is preferred over another pattern p 2 .Various settings are described with combinations of constraints and dominance relations, including maximal and closed itemset mining, sky patterns, etc. Dominance programming offers a great extensibility, although it is not as general as our approach.Furthermore, the context considered here only aims at itemset mining.Itemset, sequence and graph mining tasks are also considered in a framework introduced in (Guns et al. 2016), in which these pattern types are studied under the lens of constraint-based mining (Mannila and Toivonen 1997).A declarative framework for constraint-based mining is presented in (Guns et al. 2017), where the objective is to bridging the methodological gap between data mining and constraint programming by providing a framework to support constraint-based pattern mining.In particular, the authors offer several contributions, spanning from a novel library of functions and constraints in the MiniZinc language to support modeling itemset mining tasks, to the automatic composition of execution strategies involving different solving methods.Indeed, the work in (Guns et al. 2017) shares some common similarities with our proposed approach.In particular, both the approaches provide a general framework for high-utility mining, and they are based on declarative paradigms.Our approach supports the various extensions to HUPM introduced in Section 2, which are not supported by (Guns et al. 2017).Nevertheless, (Guns et al. 2017) also deals with multiple databases, which is a feature not currently supported by our approach.
An interesting task studied in the literature is the discovery of skyline patterns, or "skypatterns" (Soulet et al. 2011;Ugarte et al. 2017).Skypatterns draw their definition from the economics and multi-criteria decision making worlds, in which they are commonly called Pareto efficiency or optimality queries.Intuitively, given a multidimensional space where a preference is defined for each dimension, a point a dominates another point b if a is better than b in at least one dimension, and a is not worse than b on every other dimension.Given a set of patterns, the skyline set contains patterns that are not dominated by any other pattern (Ugarte et al. 2017).Skypattern mining provides an easy expression of preferences over patterns by combining several measures on patterns.An approach for skypattern mining is presented in (Ugarte et al. 2017).Here, the authors address the discovery of skypatterns by proposing a unified methodology based on the following contributions, namely (i) a condensed representation of patterns, (ii) a dynamic method based on constraint programming that allows a continuous refinement of the skyline constraints, and (iii) the notion of particular local extrema in the search space of the problem.The work in (Ugarte et al. 2017) shares few similarities with ours; actually the two approaches can be considered complementary.

Conclusion
In this paper we introduced a general framework for HUPM with several extensions that significantly broaden the applicability of HUPM even in non classical contexts.We provided an ASP based computation method, which exploits the most recent extensions of ASP, and we have shown that this solution allows for a reasonable and versatile implementation.A real use case on paper reviews has been employed as a fil-rouge to analyze the different aspects of the proposed framework; in particular, we have shown that the introduction of facets and suitable advanced utility functions can both reduce the amount of relevant patterns and provide deep insights on the data, thus advancing the state-of-the-art on the interesting topic of high utility pattern mining.An application to a biomedical context has also been presented and tested, which showed how the novel pattern mining framework can be suitably exploited to design effective prediction models on biomedical data.
The presented work is just a first step in this new direction of utility-based analyses and several future research directions are now open.First of all, it will be interesting to apply the new features of the framework to a wide spectrum of contexts, such as biomedical data analysis, IoT data analysis, and fraud detection.A classification of the computational properties of the various extended utility functions will be also useful to devise ad-hoc algorithms.In particular, we plan to develop dedicated and efficient algorithms for specific settings of interest, in order to provide fast computational methods for extended high utility patterns.Another interesting line of future research regards a deeper study of the application of our e-HUPM to prediction tasks in various contexts.
Pearson Correlation (Pearson 1895): computes the correlation between one of the item/transaction facets and one of the object/container facets.It can be used to measure the agreement/disagreement about information at pattern/transaction level and object/container level.-Multiple Correlation (Lewis-Beck et al. 2003): computes the correlation between two or more item/transaction facets and one of the object/container facets.It can be used to measure how well a given object/container facet can be predicted using a linear function of a set of item/transaction facets.Example 5. (Running example) Let us consider the pattern utility vector U P for the pattern P = {paper, reproducibility} shown in Example 4. We now give examples for some of the classes of utility functions introduced above.

Fig. 2 :
Fig. 2: Percentage of transactions (a) and Percentage of combinations of patient attributes (b) covered by valid patterns

Fig. 3 :
Fig. 3: Average running time (in seconds) for utility functions sum (a) and disagreement degree (b)

Fig. 4 :Fig. 5 :
Fig. 4: Average running time (log scale) for utility functions sum (a) and product (b) using pure ASP or external functions

Table 1 :
Terminology and facets for the paper reviews use case running example introduced in Example 1

Table 2 :
Examples of container, object and transaction utility vectors for the paper reviews use case running example

Table 4 :
Accuracy (left)and missing rate (right)

Table 5 :
Qualitative analysis on coherence/disagreement degrees