Temporal Minimal-World Query Answering over Sparse ABoxes

Ontology-mediated query answering is a popular paradigm for enriching answers to user queries with background knowledge. For querying the absence of information, however, only a few ontology-based approaches exist. Moreover, these proposals conflate the closed-domain and closed-world assumptions, and are therefore not suited to deal with the anonymous objects that are common in ontological reasoning. Many real-world applications, like processing electronic health records (EHRs), also contain a temporal dimension and require efficient reasoning algorithms. In addition, since medical data is not recorded on a regular basis, reasoners must deal with sparse data with potentially large temporal gaps. Our contribution consists of two main parts: In the first part, we introduce a new closed-world semantics for answering conjunctive queries with negation over ontologies formulated in the description logic ELH⊥, which is based on the minimal canonical model. We propose a rewriting strategy for dealing with negated query atoms, which shows that query answering is possible in polynomial time in data complexity. In the second part, we extend this minimal-world semantics for answering metric temporal conjunctive queries with negation over the lightweight temporal logic TELH⊥^{c◊,lhs,−} and obtain similar rewritability and complexity results. This paper is under consideration in Theory and Practice of Logic Programming (TPLP).


Introduction
Ontology-mediated query answering (OMQA) allows using background knowledge for answering user queries, supporting data-focused applications offering search, analytics, or data integration functionality. An ontology is a logical theory formulated in a decidable fragment of first-order logic, for example, a description logic (DL) (Baader et al. 2017), with a trade-off between the expressivity of the ontology language and the efficiency of query answering. Rewritability is a popular topic of research, the idea being to reformulate ontological queries, such as conjunctive queries (CQs) enhanced by an ontology, into database queries that can be answered by traditional database management systems (Mugnier and Thomazo 2014; Eiter et al. 2012; Bienvenu and Ortiz 2015; Calvanese et al. 2017; Kharlamov et al. 2017).
Ontology-based systems do not use the closed-domain and closed-world semantics of databases. Instead, they acknowledge that unknown (anonymous) objects may exist (open domain) and that facts that are not explicitly stated may still be true (open world). Anonymous objects are related to null values in databases; for example, if we know that every person has a mother, then first-order models include all mothers, even though they may not be mentioned explicitly in the input data set. In addition, the open-world assumption ensures that, if the data set does not contain an entry on, for example, whether a person is male or female, then we do not infer that this person is neither male nor female, but rather consider all possibilities.
In the last decade, OMQA has been extended to temporal DLs that combine terminological and temporal knowledge representation capabilities (Wolter and Zakharyaschev 2000; Lutz et al. 2008; Artale et al. 2017). To obtain tractable reasoning procedures, lightweight temporal DLs have been developed (Artale et al. 2007; Gutiérrez-Basulto et al. 2016). The idea is to use temporal operators, often from the linear temporal logic LTL, inside DL axioms. For example, −◊∃diagnosis.BrokenLeg ⊑ ∃treatment.LegCast states that after breaking a leg one has to wear a cast. However, this basic approach cannot represent the distance of events, for example, that the cast only has to be worn for a fixed amount of time. Recently, metric temporal ontology languages have been investigated (Gutiérrez-Basulto et al. 2016; Baader et al. 2017; Brandt et al. 2018), which allow replacing −◊ in the above axiom with ◊[−8,0], that is, wearing the cast is required only if the leg was broken at most 8 time points (e.g. weeks) ago.
The biomedical domain is a fruitful area for OMQA methods, due to the availability of large ontologies covering a multitude of topics¹ and the demand for managing large amounts of patient data, in the form of electronic health records (EHRs) (Cresswell and Sheikh 2017). For example, for the preparation of clinical trials,² a large number of patients need to be screened for eligibility, and an important area of current research is how to automate this process (Patel et al. 2007; Besana et al. 2010; Köpcke and Prokosch 2014; Tagaris et al. 2014; Ni et al. 2015). In particular, many clinical trials contain temporal eligibility criteria (Crowe and Tao 2015), such as: "type 1 diabetes with duration at least 12 months"³; "known history of heart disease or heart rhythm abnormalities"⁴; "CD4+ lymphocytes count ≥ 250/mm³, for at least 6 months"⁵; or "symptomatic recurrent paroxysmal atrial fibrillation (PAF) (≥ 2 episodes in the last 6 months)"⁶. Moreover, measurements, diagnoses, and treatments in a patient's EHR are clearly valid only for a certain amount of time. To automatically screen patients according to the temporal criteria above, one needs a sufficiently powerful formalism that can reason about biomedical and temporal knowledge.
Additionally, ontologies and EHRs mostly contain positive information, while clinical trials also require certain exclusion criteria to be absent in the patients. For example, we may want to select only patients who have not been diagnosed with cancer,⁷ but such information cannot be entailed from the given knowledge. The culprit for this problem is the open-world semantics, which considers a cancer diagnosis possible unless it has been explicitly ruled out. Unfortunately, existing approaches like (partial) closed-world semantics (Lutz et al. 2013; Ahmetaj et al. 2016) or epistemic logics are unable to deal with closed-world knowledge over anonymous objects (Wolter 2000; Calvanese et al. 2006).
In the first part of this paper, we introduce a new closed-world semantics to answer conjunctive queries with (guarded) negation (Bárány et al. 2012) over ontologies formulated in ELH⊥, an ontology language that covers many biomedical ontologies, for example, SNOMED CT.⁸ Our semantics, called minimal-world semantics, is based on the minimal canonical model, which encodes all inferences of the ontology in the most concise way possible. As a side effect, this means that ordinary CQs without negation are interpreted under the standard open-world semantics. In order to properly handle negative knowledge about anonymous objects, however, we have to be careful in the construction of the canonical model, in particular about the number and types of anonymous objects that are introduced. Since in general the minimal canonical model is infinite, we develop a rewriting technique, in the spirit of the combined approach of Lutz et al. (2009), Kontchakov et al. (2011), and most closely inspired by Eiter et al. (2012), Bienvenu and Ortiz (2015), which allows us to evaluate CQs with negation over a finite part of the canonical model, using traditional database techniques.
In the second part of the paper, we extend the minimal-world semantics to also support temporal information. When working with EHRs, which contain information for specific points in time only, it is especially important to be able to infer what happened to the patient in the meantime. For example, if a patient is diagnosed with a (currently) incurable disease like Diabetes, they will still have the disease at any future point in time. Similarly, if the EHR contains two entries of CD4Above250 four weeks apart, one may reasonably infer that this was true for the whole four weeks. We use the qualitative temporal DL TELH⊥^{c◊,lhs,−} (Borgwardt et al. 2019), which can express the former statement by declaring Diabetes as expanding via the axiom −◊Diabetes ⊑ Diabetes. Additionally, a special kind of metric temporal operator allows writing c◊_4 CD4Above250 ⊑ CD4Above250, making the measurement convex for a specified length of time n (e.g. 4 weeks). This means that information is interpolated between time points of distance less than n, thereby computing a convex closure of the available information. The threshold n allows distinguishing the case where two mentions of CD4Above250 are years apart, and are therefore unrelated. Reasoning in TELH⊥^{c◊,lhs,−} is tractable in data complexity, because ◊-operators are only allowed on the left-hand side of concept inclusions (Borgwardt et al. 2019; Gutiérrez-Basulto et al. 2016), which is also common for temporal DLs based on DL-Lite (Artale et al. 2013; 2015).
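The two kinds of temporal inference described in this paragraph can be illustrated with a small sketch. It assumes one reading of the convex operator, namely that a fact is filled in at every time point lying between two recorded occurrences that are fewer than n time points apart. The function names `convex_closure` and `expand_forward`, the finite `horizon`, and the encoding of time points as integers are hypothetical and only serve the illustration.

```python
def convex_closure(points, n):
    """Fill gaps between recorded time points whose distance is below n.

    Assumption: `points` are the integer time points at which a fact
    (e.g. CD4Above250) is recorded; larger gaps are left untouched.
    Checking consecutive recorded points suffices, because any two
    recorded points closer than n only have closer points in between.
    """
    pts = sorted(set(points))
    closed = set(pts)
    for lo, hi in zip(pts, pts[1:]):
        if hi - lo < n:
            closed.update(range(lo, hi + 1))
    return closed


def expand_forward(points, horizon):
    """Expanding fact (e.g. Diabetes): holds from its first mention onwards.

    Assumption: we only materialise time points up to a finite `horizon`.
    """
    if not points:
        return set()
    return set(range(min(points), horizon + 1))


# Two close measurements are interpolated, the distant one stays isolated.
print(sorted(convex_closure([0, 3, 100], n=4)))   # [0, 1, 2, 3, 100]
print(sorted(expand_forward([5, 9], horizon=8)))  # [5, 6, 7, 8]
```

The sketch materialises time points explicitly, which is only feasible for small gaps; the approach in the paper is designed precisely to avoid filling arbitrarily large gaps.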
Here we consider the problem of answering metric temporal queries over TELH⊥^{c◊,lhs,−} knowledge bases (KBs) with the minimal-world semantics introduced in the first part of the paper. Our query language extends the temporal CQs from Baader et al. (2015b) by metric temporal operators (Gutiérrez-Basulto et al. 2016; Baader et al. 2017) and negation. For example, we can use queries like ◻[−12,0] (∃y.diagnosedWith(x, y) ∧ Diabetes(y)) to select all patients x for which the first criterion from above is satisfied. By extending our combined rewriting approach, we show that the data complexity of temporal query answering is not higher than for (positive) atemporal queries in ELH⊥, and also provide a tight combined complexity result of ExpSpace. Unlike most research on temporal query answering (Baader et al. 2015b; Artale et al. 2015), we do not assume that input data are given for all time points in a certain interval, but rather at sporadic time points with arbitrarily large gaps (Brandt et al. 2018). The main technical difficulty is to determine which additional time points are relevant for answering a query, and how to access these time points without having to fill infinitely many gaps.
This paper extends the conference paper Borgwardt and Forkel (2019) by allowing also nonrooted queries and extends parts of Borgwardt et al. (2019) by providing full proofs of all results and improving some of the running examples.

Preliminaries
We recall the definitions of ELH⊥ and first-order queries, which are needed for our rewriting of CQs with negation.
The Description Logic ELH⊥. Let C, R, I be countably infinite sets of concept, role, and individual names, respectively. A concept is built according to the syntax rule […]. In the following, we assume the TBox to be in normal form, that is, that it contains only inclusions of the form […]. The semantics of ELH⊥ is defined as usual (Baader et al. 2007) in terms of interpretations I = (Δ^I, ·^I), where the domain Δ^I is a nonempty set and the functions ·^I are extended from concept and role names inductively as follows: […], and the KB K if it satisfies all axioms in K. In the following, we assume all KBs to be consistent and make the standard name assumption, that is, that for every individual name a in any interpretation I we have a^I = a. An axiom α is entailed by K […] and similarly for role inclusions; note that the ABox does not influence the entailment of inclusions. Entailment in ELH⊥ can be decided in polynomial time (Baader et al. 2005).

The Temporal Description
[…] are at most n − 1 time points apart, that is, enclose an interval of length n.
To make the behavior of these operators clearer, we consider their semantics in a more abstract way: given a set of time points at which certain information is available (e.g. a diagnosis), described by a propositional variable p, we consider the resulting set of time points at which ⋆◊p holds, where ⋆◊ is a placeholder for one of the operators defined above (we will similarly use •◊, †◊, ‡◊ as placeholders for different ◊-operators in the following).
We consider the sets […] ⋆◊M is also empty, for any ⋆◊ ∈ D. For any nonempty M ⊆ Z, the following expressions are obtained, where max M may be ∞ and min M may be −∞. […]
[…] concept assertions A(a, i) and role assertions r(a, b, i), where A ∈ C, r ∈ R, a, b ∈ I, and i ∈ Z. We denote the set of time points i ∈ Z occurring in A by tem(A), and we assume that each time point is encoded in binary. In the following, we always assume a KB K = T ∪ A to be given.
The functions ·^{I_i} are extended to temporal concepts as follows: […], and the KB K if it satisfies all axioms in K. This fact is denoted by I ⊧ α, where α is an axiom (i.e. inclusion or assertion) or a KB. Entailment of CIs and assertions in TELH⊥^{c◊,lhs,−} is P-complete (Borgwardt et al. 2019).
As in many lightweight temporal logics, diamonds are not allowed to occur on the right-hand side of CIs, because that would allow simulating concept disjunction and make the logic intractable (Artale et al. 2007; Gutiérrez-Basulto et al. 2016). As usual, we can simulate CIs involving complex concepts by introducing fresh concept and role names as abbreviations. For example, ∃r.−◊A ⊑ B can be split into −◊A ⊑ A′ and ∃r.A′ ⊑ B.
Hence, we can restrict ourselves w.l.o.g. to inclusions in the following normal form: […]

Conjunctive Query Answering. Let V be a countably infinite set of variables. The set of terms is T := V ∪ I. A first-order query φ(x) is a first-order formula built from concept atoms A(t) and role atoms r(t, t′) with A ∈ C, r ∈ R, and t, t′ ∈ T, using the Boolean connectives (∧, ∨, ¬, →) and universal and existential quantifiers (∀x, ∃x). The free variables x of φ(x) are called answer variables and we say that φ is k-ary if there are k answer variables. The remaining variables are the quantified variables. We use Var(φ) to denote the set of all variables in φ. A query without any answer variables is called a Boolean query.
Let I = (Δ, ·^I) be an interpretation. An assignment π: Var(φ) → Δ satisfies φ in I if I, π ⊧ φ under the standard semantics of first-order logic. We write I ⊧ φ if there is a satisfying assignment for φ in I. Let K be a KB. A k-tuple a of individual names from Ind(K) is an answer to φ in I if φ has a satisfying assignment π in I with π(x) = a; it is a certain answer to φ over K if it is an answer to φ in all models of K. We denote the set of all answers to φ in I by ans(φ, I), and the set of all certain answers to φ over K by cert(φ, K).
A conjunctive query (CQ) q(x) is a first-order query of the form ∃y. ϕ(x, y), where ϕ is a conjunction of atoms. Abusing notation, we write α ∈ q if the atom α occurs in q, and conversely may treat a set of atoms as a conjunction. The leaf variables x in q are those that do not occur in any atoms of the form r(x, y). Clearly, q is satisfied in an interpretation if there is a satisfying assignment for ϕ(x, y), which is often called a match for q. A CQ is rooted if all variables are connected to an answer variable through (a sequence of) role atoms.
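The two syntactic notions from this paragraph, leaf variables and rootedness, can be checked mechanically. The following is a minimal sketch under stated assumptions: atoms are encoded as tuples, `("A", t)` for concept atoms and `("r", t, t2)` for role atoms, all terms are variables, and connectivity through role atoms is read as undirected. The encoding and the function names are hypothetical.

```python
from collections import deque


def leaf_variables(atoms, variables):
    """Variables that never occur in the first position of a role atom r(x, y)."""
    has_successor = {atom[1] for atom in atoms if len(atom) == 3}
    return {v for v in variables if v not in has_successor}


def is_rooted(atoms, variables, answer_vars):
    """Check that every variable is linked to an answer variable via role atoms.

    Assumption: role atoms are followed in both directions.
    """
    neighbours = {v: set() for v in variables}
    for atom in atoms:
        if len(atom) == 3:
            _, s, t = atom
            if s in neighbours and t in neighbours:
                neighbours[s].add(t)
                neighbours[t].add(s)
    seen = set(answer_vars) & set(variables)
    queue = deque(seen)
    while queue:  # breadth-first search starting from the answer variables
        v = queue.popleft()
        for w in neighbours[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen == set(variables)


atoms = [("diagnosedWith", "x", "y"), ("Cancer", "y"), ("findingSite", "y", "z")]
print(leaf_variables(atoms, {"x", "y", "z"}))   # {'z'}
print(is_rooted(atoms, {"x", "y", "z"}, ["x"]))  # True
```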
CQ answering over ELH⊥ KBs is combined first-order rewritable (Lutz et al. 2009): For any CQ q and consistent KB K = (T, A), we can find a first-order query q_T and a finite interpretation I^fin_K such that cert(q, K) = ans(q_T, I^fin_K). Importantly, I^fin_K is independent of q, that is, it can be reused to answer many different queries, while q_T is independent of A, that is, each query can be rewritten without using the (possibly large) data set. The rewritability results are based crucially on the canonical model property of ELH⊥: For any consistent KB K one can construct a model I_K that is homomorphically contained in any other model. This is a very useful property since any match in the canonical model corresponds to matches in all other models of K, and therefore cert(q, K) = ans(q, I_K) holds for all CQs q. For the complexity analysis, one considers the decision problem of whether a given tuple a belongs to cert(q, K). The complexity of query answering is usually viewed in two different ways: In combined complexity the KB, the data and the query are considered as input, while in data complexity, which is closely related to rewritability, only the data is considered as input. The latter is often viewed as more appropriate, since the queries and the KB tend to be fixed and relatively small compared to the potentially huge amounts of data, which are updated frequently. CQ answering for ELH⊥ is P-complete in data complexity and NP-complete in combined complexity (Rosati 2007b).

Conjunctive queries with negation
We are interested in answering queries of the following form.

Definition 1
Conjunctive Queries with (guarded) negation (NCQs) are constructed by extending CQs with negated concept atoms ¬A(t) and negated role atoms ¬r(t, t ′ ), such that, for any negated atom over terms t (and t ′ ) the query contains at least one positive atom over t (and t ′ ).
An NCQ is rooted if its variables are all connected via role atoms to an answer variable (from x) or an individual name. An NCQ is Boolean if it does not have answer variables. To determine whether I ⊧ φ holds for an NCQ φ and an atemporal interpretation I, we use standard first-order semantics.
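The guardedness condition of Definition 1 is a purely syntactic check. The sketch below assumes the stricter reading in which a single positive atom must contain all terms of a negated atom; a weaker reading would only require each term to occur in some positive atom. The tuple encoding of atoms and the name `is_guarded` are hypothetical.

```python
def is_guarded(positive_atoms, negated_atoms):
    """True if the terms of every negated atom co-occur in some positive atom.

    Atoms are tuples: ("A", t) for concept atoms, ("r", t, t2) for role atoms;
    a negated atom is given without its negation sign.
    """
    for neg in negated_atoms:
        terms = set(neg[1:])
        if not any(terms <= set(pos[1:]) for pos in positive_atoms):
            return False
    return True


positive = [
    ("diagnosedWith", "x", "y"),
    ("findingSite", "y", "z"),
    ("BreastStructure", "z"),
]
print(is_guarded(positive, [("SkinStructure", "z")]))  # True
print(is_guarded(positive, [("SkinStructure", "w")]))  # False: w is unguarded
```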
We first discuss different ways of handling the negated atoms, and then propose a new semantics that is based on a particular kind of minimal canonical model. For this, we consider an example based on real EHRs (ABoxes) from the MIMIC-III database (Johnson et al. 2016), criteria (NCQs) from clinicaltrials.gov, and the large medical ontology SNOMED CT⁹ (the TBox). We omit here the "role groups" used in SNOMED CT, which do not affect the example. We also simplify the concept names and their definitions for ease of presentation. We assume that the ABoxes have been extracted from EHRs by a natural language processing tool based, for example, on existing concept taggers (Aronson 2001; Savova et al. 2010); of course, this extraction is an entire research field in itself, which we do not attempt to tackle in this paper.

Example 2
We consider three patients. Patient p_1 (patient 2693 in the MIMIC-III data set) is diagnosed with breast cancer and an unspecified form of cancer (this often occurs when there are multiple mentions of cancer in a patient's EHR, which cannot be resolved to be the same entity). Patient p_2 (patient 32304 in the MIMIC-III data set) suffers from breast cancer and skin cancer ("[S]tage IV breast cancer with mets to skin, bone, and liver"). For p_3 (patient 88432 in the MIMIC-III data set), we know that p_3 has breast cancer that involves the skin ("Skin, left breast, punch biopsy: Poorly differentiated carcinoma").
Since SNOMED CT does not model patients, we add a special role name diagnosedWith that connects patients with their diagnoses. One can use this to express diagnoses in two ways. First, one can explicitly introduce individual names for diagnoses in assertions like diagnosedWith(p_1, d_1), BreastCancer(d_1), diagnosedWith(p_1, d_2), Cancer(d_2), implying that these diagnoses are treated as distinct entities under the standard name assumption. Alternatively, one can use complex assertions like ∃diagnosedWith.Cancer(p_1), which allows the logical semantics to resolve whether two diagnoses actually refer to the same object. Since ABoxes only contain concept names, in this case one has to introduce auxiliary definitions like CancerPatient ≡ ∃diagnosedWith.Cancer into the TBox. We use both variants in our example, to illustrate their different behaviors.
We obtain the KB K_C, containing knowledge about different kinds of cancers and cancer patients, together with information about the three patients. […] diagnosed with breast cancer. This means that, in every model of K_C, every object that satisfies BreastCancerPatient (in particular p_2) must have a diagnosedWith-connected object that satisfies BreastCancer, and so on. Moreover, a cancer may also occur in the skin of the breast, that is, a part of the body that is classified as both a "skin structure" and a "breast structure".
For a clinical trial,¹⁰ we want to find patients that have "breast cancer", but not "breast cancer that involves the skin." This can be translated into an NCQ: […]
We know that p_1 is diagnosed with BreastCancer as well as Cancer. Since the former is more specific, we assume that the latter refers to the same BreastCancer. However, since we have no information about an involvement of the skin, p_1 should be returned as an answer to q_B. We know that p_2 suffers from cancer in the skin and the breast, but not if the skin of the breast is also affected. Since neither location is implied by the other, we assume that they refer to distinct areas. p_2 should thus be an answer to q_B.
In the case of p_3, it is explicitly stated that it is the same cancer that is occurring (not necessarily exclusively) at the skin of the breast. In this case, the ABox assertions override the distinctness assumption we made for p_2. Thus, p_3 should not be an answer to q_B. ∎
In practice, more complicated cases than in our example can occur: The nesting of anonymous objects will be deeper and more branched when using large biomedical ontologies. For example, in SNOMED CT it is possible to describe many details of a cancer, such as the kind of cancer, whether it is a primary or secondary cancer, and in which part of the body it is found. This means that even a single assertion can lead to the introduction of multiple levels of anonymous objects in the canonical model. In some ontologies, there are even cyclic concept inclusions, which lead to infinitely many anonymous individuals, for example, in the GALEN ontology¹¹. We focus on Example 2 in this paper, to illustrate the relevant issues in a clear and easy-to-follow manner.
We now evaluate existing semantics on this example.
Standard Certain Answer Semantics as defined in Section 2 is clearly not suited here, because one can easily construct a model of K_C in which c_1 is also a skin cancer, and hence p_1 is not an element of cert(q_B, K_C). Moreover, under certain answer semantics answering CQs with guarded negation is already coNP-complete (Gutiérrez-Basulto et al. 2015), and hence not (combined) rewritable.
Epistemic Logic allows us to selectively apply closed-world reasoning using the modal knowledge operator K. For a formula Kϕ to be true, it has to hold in all "connected worlds", which is often considered to mean all possible models of the KB, adopting an S5-like view (Calvanese et al. 2006). For q_B, we could read ¬SkinStructure(z) as "not known to be a skin structure", that is, ¬KSkinStructure(z). Consider the model I_{K_C} in Figure 2 and the assignment π = {x ↦ p_3, y ↦ c_3, z ↦ f_3}, for which we want to check whether it is a match for q_B. Under epistemic semantics, ¬KSkinStructure(z) is considered true if K has a (different) model in which f_3 does not belong to SkinStructure. However, f_3 is an anonymous object, and hence its name is not fixed. For example, we can easily obtain another model by renaming f_3 to f_1 and vice versa. Then f_3 would not be a skin structure, which means that ¬KSkinStructure(z) is true in the original model I_{K_C}, which is not what we expected. This is a known problem with epistemic first-order logics (Wolter 2000).
Skolemization can enforce a stricter comparison of anonymous objects between models. The inclusion SkinOfBreastCancer ⊑ ∃findingSite.SkinOfBreast could be rewritten as the first-order sentence ∀x. SkinOfBreastCancer(x) → findingSite(x, f(x)) ∧ SkinOfBreast(f(x)), where f is a fresh function symbol. This means that c_3 would be connected to a finding site that has the unique name f(c_3) in every model. Queries would be evaluated over Herbrand models only. Hence, for evaluating ¬KSkinStructure(z) when z is mapped to f(c_3), we would only be allowed to compare the behavior of f(c_3) in other Herbrand models. The general behavior of this anonymous individual is fixed, however, since in all Herbrand models it is the finding site of c_3. While this improves the comparison by introducing pseudo-names for all anonymous individuals, it limits us in different ways: Since p_3 is inferred to be a BreastCancerPatient, the Skolemized version of BreastCancerPatient ⊑ ∃diagnosedWith.BreastCancer introduces a new successor g(p_3) of p_3 satisfying BreastCancer, which, together with the definition of BreastCancer, means that p_3 is an answer to q_B since there is an additional breast cancer diagnosis that does not involve the skin.
https://doi.org/10.1017/S1471068421000119 Published online by Cambridge University Press
Datalog-based Ontology Languages with negation (Hernich et al. 2013; Arenas et al. 2014) are closely related to Skolemized ontologies, since their semantics is often based on the so-called Skolem chase (Marnette 2009). This is closer to the semantics we propose in Section 3.1, in that a single canonical model is used for all inferences. However, it suffers from the same drawback of Skolemization described above, due to superfluous successors.
To avoid this, our semantics uses a special minimal canonical model (see Definition 4), which is similar to the restricted chase (Fagin et al. 2005) or the core chase (Deutsch et al. 2008), but always produces a unique model without having to merge domain elements.
Our minimal canonical model can also be seen as an instance of the approach in Krötzsch (2020), which identifies conditions under which a chase variant produces a unique core model (for more details on this connection see Section 3.5.3 in Forkel (2020)). However, Krötzsch (2020) focuses on cases where this model is finite, which is not the case for general ELH⊥ ontologies. To the best of our knowledge, there exist no complexity results for Datalog-based languages with negation over these other chase variants.
Closed Predicates are a way to declare, for example, the concept name SkinStructure as "closed", which means that all skin structures must be declared explicitly, and no other SkinStructure object can exist (Lutz et al. 2013; Ahmetaj et al. 2016). This provides a way to give answers to negated atoms as in q_B. However, as explained in the introduction, this mechanism is not suitable for anonymous objects since it means that only named individuals can satisfy SkinStructure. When applied to K_C, the result is even worse: Since there is no (named) SkinStructure object, no skin structures can exist at all and K_C becomes inconsistent. Closed predicates are appropriate in cases where the KB contains a full list of all instances of a certain concept name, and no other objects should satisfy it; but they are not suitable to infer negative information about anonymous objects. Moreover, CQ answering with closed predicates in ELH⊥ is already coNP-hard (Lutz et al. 2013).
Summary. All of this should not be read as saying that these semantics are bad, just that they have not been developed with our use case involving anonymous objects in mind.
Only Skolemization deals with anonymous objects by giving them unique names, but this forces all of them to be distinct.Our new semantics remedies this, but at the same time tries to stay as close as possible to the behavior of the existing semantics when restricted to the known objects in the ABox.

Semantics for NCQs
We propose to answer NCQs over a special canonical model of the KB. On the one hand, this eliminates the problem of tracking anonymous objects across different models, and on the other hand enables us to encode our assumptions directly into the construction of the model. In particular, we should only introduce the minimum necessary number of anonymous objects since, unlike in standard CQ answering, the precise shape and number of anonymous objects has an impact on the semantics of negated atoms. Given K_C, in contrast to the Skolemized semantics, we will not create both a generic "Cancer" and another "BreastCancer" successor for p_1, because the BreastCancer is also a Cancer, and hence the first object is redundant. Therefore, in the minimal canonical model of K_C depicted in Figure 2, for patient p_1 only one successor is introduced to satisfy the definitions of both BreastCancerPatient and CancerPatient at the same time.
In contrast, p_2 has two successors, because BreastCancer and SkinCancer do not imply each other. Finally, for p_3 the ABox contains a single successor that is a SkinOfBreastCancer, which implies a single findingSite-successor that satisfies both SkinStructure and BreastStructure.
To detect whether an object required by an existential restriction ∃r.A is redundant, we use the following notion of minimality.
Definition 3 (Structural Subsumption) Let ∃r.A, ∃t.B be concepts with A, B ∈ C and r, t ∈ R. We say that ∃r.A is structurally subsumed by ∃t.B (written ∃r.A ⊑^s_T ∃t.B) if r ⊑_T t and A ⊑_T B. […] with the same answer variables (written q_1 ⊑^s_T q_2) if, for all x, y ∈ x, it holds that […] α, and where role conjunction is interpreted in the standard way (Baader et al. 2007).
In contrast to standard subsumption, ∃r.A is not structurally subsumed by ∃t.B w.r.t. the TBox T = {∃r.A ⊑ ∃t.B}, as neither r ⊑_T t nor A ⊑_T B holds. Similarly, structural subsumption for CQs considers all (pairs of) variables separately.
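The difference between standard and structural subsumption for existential restrictions can be made concrete with a short sketch. It assumes that the entailed subsumptions between concept names and between role names have been precomputed and are passed in as sets of pairs (`concept_sub`, `role_sub`); these inputs and the function name are hypothetical.

```python
def structurally_subsumed(ex1, ex2, concept_sub, role_sub):
    """Check whether ∃r.A is structurally subsumed by ∃t.B.

    ex1 = (r, A) and ex2 = (t, B); `concept_sub` and `role_sub` hold the
    entailed pairs A ⊑ B and r ⊑ t (reflexive pairs need not be listed).
    """
    (r, a), (t, b) = ex1, ex2
    role_ok = r == t or (r, t) in role_sub
    concept_ok = a == b or (a, b) in concept_sub
    return role_ok and concept_ok


# TBox {∃r.A ⊑ ∃t.B}: no subsumption between names is entailed, so the
# restrictions are not structurally related although ∃r.A ⊑ ∃t.B holds.
print(structurally_subsumed(("r", "A"), ("t", "B"), set(), set()))  # False

# With BreastCancer ⊑ Cancer, the more specific diagnosis is structurally below.
concept_sub = {("BreastCancer", "Cancer")}
print(structurally_subsumed(("diagnosedWith", "BreastCancer"),
                            ("diagnosedWith", "Cancer"), concept_sub, set()))  # True
```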
We use this notion to define the minimal canonical model.

Definition 4 (Minimal Canonical Model)
Let K = (T, A) be an ELH⊥ KB. We construct the minimal canonical model I_K of K as follows:
1. Set Δ^{I_K} := I and a^{I_K} := a for all a ∈ I.
2. Define […]
3. Repeat:
(a) Select an element d ∈ Δ^{I_K} that has not been selected before and let V := […].
(b) For each ∃r.B ∈ V that is minimal w.r.t. ⊑^s_T, add a fresh element e to Δ^{I_K}, for each B ⊑_T A add e to A^{I_K}, and for each r ⊑_T s add (d, e) to s^{I_K}.
By I A we denote the restriction of I K to named individuals, that is, the result of applying only Steps 1 and 2, but not Step 3.
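To make the construction more tangible, here is a minimal sketch that unravels the model up to a bounded depth. It relies on several assumptions that are not taken from the definition itself: the memberships of named individuals are already saturated (Steps 1 and 2), the entailed subsumptions are precomputed as sets of pairs, an existential restriction only triggers a new successor if no existing successor witnesses it, and the unravelling stops at `max_depth`. All names (`minimal_model`, `exists_rhs`, `_anon1`, ...) are hypothetical.

```python
from itertools import count


def subsumed(x, y, pairs):
    """x ⊑ y w.r.t. a precomputed set of entailed pairs (reflexivity built in)."""
    return x == y or (x, y) in pairs


def below(ex1, ex2, concept_sub, role_sub):
    """Structural subsumption between restrictions given as (role, concept)."""
    return subsumed(ex1[0], ex2[0], role_sub) and subsumed(ex1[1], ex2[1], concept_sub)


def minimal_model(concepts, roles, exists_rhs, concept_sub, role_sub, max_depth):
    """concepts: element -> set of concept names; roles: set of (r, src, tgt);
    exists_rhs: concept A -> set of (r, B) with A ⊑ ∃r.B entailed by the TBox."""
    concepts = {d: set(cs) for d, cs in concepts.items()}
    roles = set(roles)
    fresh = count(1)
    queue = [(d, 0) for d in sorted(concepts)]
    while queue:
        d, depth = queue.pop(0)  # FIFO order gives a fair selection in Step 3(a)
        if depth >= max_depth:
            continue
        required = {ex for a in concepts[d] for ex in exists_rhs.get(a, ())}
        # Restrictions that no existing successor of d already witnesses.
        open_exs = [(r, b) for (r, b) in sorted(required)
                    if not any(s == r and src == d and b in concepts.get(tgt, ())
                               for (s, src, tgt) in roles)]
        chosen = []  # one representative per structurally minimal restriction
        for ex in open_exs:
            dominated = any(below(o, ex, concept_sub, role_sub)
                            and not below(ex, o, concept_sub, role_sub)
                            for o in open_exs)
            duplicate = any(below(o, ex, concept_sub, role_sub)
                            and below(ex, o, concept_sub, role_sub)
                            for o in chosen)
            if not dominated and not duplicate:
                chosen.append(ex)
        for r, b in chosen:  # Step 3(b): one fresh successor each
            e = f"_anon{next(fresh)}"
            concepts[e] = {b} | {sup for (sub, sup) in concept_sub if sub == b}
            roles |= {(r, d, e)} | {(s, d, e) for (sub, s) in role_sub if sub == r}
            queue.append((e, depth + 1))
    return concepts, roles


concept_sub = {("BreastCancer", "Cancer"), ("SkinCancer", "Cancer")}
exists_rhs = {
    "CancerPatient": {("diagnosedWith", "Cancer")},
    "BreastCancerPatient": {("diagnosedWith", "BreastCancer")},
    "SkinCancerPatient": {("diagnosedWith", "SkinCancer")},
}
named = {
    "p1": {"BreastCancerPatient", "CancerPatient"},
    "p2": {"BreastCancerPatient", "SkinCancerPatient", "CancerPatient"},
}
concepts, roles = minimal_model(named, set(), exists_rhs, concept_sub, set(), max_depth=1)
succ = {p: sorted(t for (_, s, t) in roles if s == p) for p in named}
print(succ)  # {'p1': ['_anon1'], 'p2': ['_anon2', '_anon3']}
```

In this toy input, p1 receives a single successor that is both a BreastCancer and a Cancer, while p2 receives two, because BreastCancer and SkinCancer do not subsume each other. The depth bound is a simplification for illustration only; the model itself may be infinite.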

If Step 3 is applied fairly, that is, such that each new domain element that is created in (b) is eventually also selected in (a), then I_K is indeed a model of K (if K is consistent at all). In particular, all required existential restrictions are satisfied at each domain element, because the existential restrictions that are minimal w.r.t. ⊑^s_T entail all others. Moreover, I_K satisfies the properties expected of a canonical model (Lutz et al. 2009; Eiter et al. 2012): It can be homomorphically embedded into any other model of K, and therefore cert(q, K) = ans(q, I_K) holds for all CQs q. It turns out that the minimal canonical model is also a core: Each endomorphism of I_K, that is, a homomorphism from I_K into itself, is injective, surjective, and preserves negation (see also Forkel (2020) […]).

Proof
To show this, we prove the stronger statement that the only endomorphism on I_K is the identity. Let I_0, I_1, … be the interpretations obtained in the construction of I_K before each application of Step 3. We show by induction on i that there are h_0, h_1, … such that h_i is the only possible homomorphism from I_i to I_K and h_i and h_{i+1} agree on Δ^{I_i}, that is, h_{i+1}(d) = h_i(d) for all d ∈ Δ^{I_i}. The endomorphism is then obtained in the limit as h = ⋃_{i≥0} h_i. By definition, the homomorphism h_0 has to map all constants onto themselves, that is, h_0(a) = a for all a ∈ I, and is therefore the identity function. For the induction step, assume that h_i has already been defined. To define h_{i+1}, assume that d ∈ Δ^{I_i} was picked in Step 3(a) and V is the set as defined in Definition 4. For each ∃r.B ∈ V that is minimal w.r.t. structural subsumption, exactly one successor e is introduced in Step 3(b). Suppose there were another successor e′ of d, introduced through some minimal ∃r′.B′, and e could be mapped to e′. This would imply that ∃r′.B′ ⊑^s_T ∃r.B, which is a contradiction since we assumed ∃r.B to be minimal. Hence, such an e′ cannot exist and the only possibility is to map e onto itself, that is, to define h_{i+1} := h_i ∪ {e ↦ e}, which is the identity function.
We can now adopt a result from graph theory and cores, namely Theorem 11 in Bauslaugh (1995), which shows that, if two structures are homomorphically equivalent, then their cores are isomorphic. Since all canonical models are homomorphically equivalent by definition, this implies that every consistent ELH⊥ KB has a unique core model (up to isomorphism), which is why a minimal canonical model is also the minimal canonical model of K.
We now define the semantics of NCQs as described before, that is, by evaluating them as first-order formulas over the minimal canonical model I K , which ensures that our semantics is compatible with the usual certain-answer semantics for CQs.
Definition 6 (Minimal-World Semantics) Let K be a consistent ELH KB. The (minimal-world) answers to an NCQ q over K are mwa(q, K) ∶= ans(q, I K ).
For Example 2, we get mwa(q B , K C ) = {p 1 , p 2 } (see Figure 2), which is exactly as intended. Unfortunately, the minimal canonical model is in general infinite, so we cannot evaluate the answers directly. Hence, we employ a rewriting approach to reduce NCQ answering over the minimal canonical model to (first-order) query answering over I A only.
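To make the target of this reduction concrete, the following minimal sketch evaluates a conjunctive query with negated atoms as a first-order formula over a finite interpretation by brute-force enumeration of assignments. The data layout, the function names `answers` and `holds`, and the toy facts are assumptions made for illustration; they are not the knowledge base K C of Example 2.

```python
from itertools import product


def holds(atom, assignment, facts):
    """atom = (positive, predicate, variables); check it under the assignment."""
    positive, predicate, variables = atom
    fact = (predicate,) + tuple(assignment[v] for v in variables)
    return (fact in facts) == positive


def answers(answer_vars, exist_vars, atoms, domain, facts):
    """Return all tuples for answer_vars that satisfy every (possibly negated) atom."""
    variables = list(answer_vars) + list(exist_vars)
    result = set()
    for values in product(sorted(domain), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(holds(atom, assignment, facts) for atom in atoms):
            result.add(tuple(assignment[v] for v in answer_vars))
    return result


# Hypothetical finite interpretation (illustrative only).
DOMAIN = {"p1", "p3", "c1", "c3"}
FACTS = {
    ("diagnosedWith", "p1", "c1"), ("BreastCancer", "c1"),
    ("diagnosedWith", "p3", "c3"), ("BreastCancer", "c3"),
    ("SkinOfBreastCancer", "c3"),
}
# q(x) = exists y. diagnosedWith(x, y) and BreastCancer(y) and not SkinOfBreastCancer(y)
ATOMS = [
    (True, "diagnosedWith", ("x", "y")),
    (True, "BreastCancer", ("y",)),
    (False, "SkinOfBreastCancer", ("y",)),
]
RESULT = answers(["x"], ["y"], ATOMS, DOMAIN, FACTS)
```

The enumeration is exponential in the number of query variables but polynomial in the size of the interpretation, which is the relevant measure once the query is fixed.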

A combined rewriting for NCQs
We show that NCQ answering is combined first-order rewritable. As target representation, we obtain first-order queries of a special form.
Definition 7 (Filtered query) Let K = (T , A) be an ELH KB. A filter on a variable z is a first-order expression ψ(z) of the form (∃z ′ .ψ + (z, z ′ )) → (∃z ′ .ψ + (z, z ′ ) ∧ ψ − (z, z ′ ) ∧ Ψ), where ψ + (z, z ′ ) is a conjunction of atoms of the form A(z ′ ) or r(z, z ′ ) that contains at least one role atom, ψ − (z, z ′ ) is a conjunction of negated atoms ¬A(z ′ ) or ¬r(z, z ′ ), and Ψ is a (possibly empty) set of filters on z ′ . A filtered query φ is of the form ∃y.(ϕ(x, y) ∧ Ψ) where ∃y.ϕ(x, y) is an NCQ and Ψ is a set of filters on leaf variables in ϕ. It is rooted if ∃y.ϕ(x, y) is rooted.
Note that every NCQ is a filtered query where the set of filters Ψ is empty.
We will use filters to check for the existence of "typical" successors, that is, role successors that behave like the ones that are introduced by the canonical model construction to satisfy an existential restriction. In particular, a typical successor does not satisfy any superfluous concept or role atoms. For example, in Figure 2 the element c 1 introduced to satisfy ∃diagnosedWith.BreastCancer for p 1 is a typical successor, because it satisfies only BreastCancer and Cancer and not, for example, SkinCancer. In contrast, the diagnosedWith-successor c 3 of p 3 is atypical, since the ontology does not contain an existential restriction ∃diagnosedWith.SkinOfBreastCancer that could have introduced such a successor in the canonical model.
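The check for typical successors can be sketched as follows, assuming that a filter is read as an implication: if some successor matches the positive part ψ + , then some successor must match ψ + , violate none of the atoms negated in ψ − , and satisfy all nested filters. The dictionary layout with keys `pos`, `neg`, and `sub`, the function names, and the toy facts are hypothetical.

```python
def atom_holds(atom, d, e, facts):
    """atom is ("concept", A), read as A(z'), or ("role", r), read as r(z, z')."""
    kind, name = atom
    if kind == "concept":
        return (name, e) in facts
    return (name, d, e) in facts


def satisfies_filter(filt, d, domain, facts):
    """Check a filter on element d over a finite interpretation."""
    # Candidates for z' that satisfy the positive part (the precondition).
    candidates = [e for e in domain
                  if all(atom_holds(a, d, e, facts) for a in filt["pos"])]
    if not candidates:
        return True  # precondition not met, so the filter holds trivially
    # Some candidate must avoid every atom negated in psi- and pass nested filters.
    return any(
        not any(atom_holds(a, d, e, facts) for a in filt["neg"])
        and all(satisfies_filter(s, e, domain, facts) for s in filt["sub"])
        for e in candidates
    )


# Hypothetical toy data: c1 is a typical successor, c3 is atypical, p4 has none.
DOMAIN = ["p1", "p3", "p4", "c1", "c3"]
FACTS = {
    ("diagnosedWith", "p1", "c1"), ("BreastCancer", "c1"),
    ("diagnosedWith", "p3", "c3"), ("BreastCancer", "c3"),
    ("SkinOfBreastCancer", "c3"),
}
TYPICAL_BREAST_CANCER = {
    "pos": [("role", "diagnosedWith"), ("concept", "BreastCancer")],
    "neg": [("concept", "SkinOfBreastCancer")],
    "sub": [],
}
```

In this toy setting the filter accepts p1 and p4 and rejects p3, whose only matching successor also satisfies SkinOfBreastCancer.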
The idea of the rewriting procedure is to not only rewrite the positive part of the query, as in Eiter et al. (2012) and Bienvenu and Ortiz (2015), but to also ensure that no critical information is lost. This is accomplished by rewriting the negative parts and by saving the structure of the eliminated part of the query in the filter. A filter on z ensures that the rewritten query can only be satisfied by mapping z to an anonymous individual in the canonical model, or to a named individual that behaves in a similar way.
Definition 8 (Rewriting) Let K = (T , A) be a KB and φ = ∃y.ϕ(x, y) ∧ Ψ be a filtered query. We write φ → T φ ′ if φ ′ can be obtained from φ by applying the following steps: (S2) Select some M ⊑ T ∃s.N with M, N ∈ C that satisfies all of the following: We write φ → * T φ ′ if there exists a finite sequence φ
Note that rew T (φ) may be infinite. However, for rooted NCQs it is finite, and we show later that even for nonrooted NCQs it suffices to consider a finite subset of rew T (φ) (see Lemma 12). To see the former claim, observe that there is only a finite number of possible subsumptions M ⊑ T ∃s.N that can be used for rewriting steps. Additionally, in every step one variable (x) is eliminated from the NCQ part of the filtered query. If the query is rooted, there always exists at least one predecessor that is renamed to ŷ, hence the introduction of ŷ never increases the number of variables. Finally, it is easy to see that rewriting a rooted query always yields a rooted query.
The rewriting of Neg to the new negated atoms (via M ′ in (S6)) ensures that we do not lose important exclusion criteria, which may result in too many answers. Similarly, the filters exclude atypical successors in the ABox that may result in spurious answers. Both of these constructions are necessary.

Example 9
Consider the query q B from Example 2. Using Definition 8, we obtain the first-order is obtained due to BreastCancerPatient ⊑ K C ∃diagnosedWith.BreastCancer. We omitted the redundant atoms Cancer(y) for clarity. The finite interpretation I A C can be seen in Figure 2 by ignoring all light grey figures. When computing the answers over B , the conjunct ¬SkinOfBreastCancer(y) is necessary to exclude p 3 as an answer. In φ ′′ B , p 3 is excluded due to the filter that detects c 3 as an atypical successor, because it satisfies not only BreastCancer, but also SkinOfBreastCancer. Hence, both (S6) and (S7) are necessary steps in our rewriting. ∎

Correctness
In Definition 8, the new filter ψ * (ŷ) may end up inside another filter expression after applying subsequent rewriting steps, that is, by rewriting w.r.t. ŷ. In this case, however, the original structure of the rewriting is preserved, including all internal filters as well as the atoms M (ŷ), which are included implicitly by ∃s.N ⊑ M , and {¬M ′ (ŷ) | M ′ ∈ M ′ }, which are included in Neg. We exploit this behavior to show that, whenever a rewritten query is satisfied in the finite interpretation I A , then it is also satisfied in I K . This is the most interesting part of the correctness proof, because it differs from the known constructions for ordinary CQs, for which this step is trivial.

Lemma 10
Let K = (T , A) be a consistent ELH KB and φ be an NCQ. Then, for all φ ′ ∈ rew T (φ),
Since I A and I K coincide on the domain I, we also have Then ψ(z) was introduced at some point during the rewriting, suppose by selecting M ⊑ T ∃s.N in (S2). This means that ϕ contains the atom M (z), and hence ), and we have to show that ) by the same argument as for ϕ(x, y) above, and we can proceed by induction on the structure of the filters to show that the inner filters Ψ * are satisfied by the assignment π (extended appropriately for z ′ ).
2. If I A , π ⊭ ∃z ′ .β(z, z ′ ), then we cannot use a named individual to satisfy the filter ψ(z) in I K . Moreover, since I A satisfies ψ(z), we also know that We show that the assignment π ∪ {z ′ ↦ d ′ } also satisfies ψ − (z, z ′ ) = Neg. Assume to the contrary that there is ¬A(z ′ ) ∈ Neg such that d ′ ∈ A I K (the case of negated role atoms is again analogous). Then we have N ′ ⊑ T A, which shows that all conditions of (S3) are satisfied, and hence M ′ must be included in M ′ . Since the atoms {¬M ′ (z) | M ′ ∈ M ′ } are contained in ϕ, we know that they are satisfied by π in I K , that is, d ∉ (M ′ ) I K and hence also d ∉ (M ′ ) I A , which is a contradiction.
It remains to show that the inner filters Ψ * are satisfied by the assignment π ∪ {z ′ ↦ d ′ } in I K . Since we are now dealing with an anonymous domain element d ′ , we can use similar, but simpler, arguments as above to prove this by induction on the structure of the filters. This is possible because the atoms s(ŷ, x), N (x) implied by M (ŷ) and the negated atoms induced by M ′ are present in the query even if the filter is integrated into another filter during a subsequent rewriting step. ◻
We can use this lemma to show correctness of our approach, that is, the answers returned for the union of queries given by rew T (φ) over I A are exactly the answers of the original NCQ φ over I K .
The proof is based on existing proofs for ordinary CQs (Eiter et al. 2012;Bienvenu and Ortiz 2015), extended appropriately to deal with the filters.
Furthermore, there exists a sequence φ 0 → T ⋅ ⋅ ⋅ → T φ n (n > 0) with φ = φ 0 and φ ′ = φ n . Hence it is sufficient to show that ans(φ i , I K ) ⊆ ans(φ i−1 , I K ) for all i, 1 ≤ i ≤ n. Suppose the queries are of the following forms: Let π i be a satisfying assignment for ϕ i (x i , y i ) ∧ Ψ i in I K . Suppose φ i−1 → T φ i by 1. selecting variable x and introducing ŷ in (S1) and 2. selecting M ⊑ T ∃s.N in (S2).
Let π i (ŷ) = d. By Step (S6), M (ŷ) ∈ ϕ i and since π i satisfies ϕ i , it has to hold that d ∈ M I K . This implies that d ∈ (∃s.N ) I K . Since π i satisfies the new filter ψ * i (ŷ) that is constructed in (S7), and by selecting M ⊑ T ∃s.N in (S2) the precondition of ψ * i (ŷ) is satisfied by π i in I K , there has to be an assignment π i ∪ {x ↦ d ′ } that satisfies the conclusion of ψ * i (ŷ). We define the assignment π i−1 of the variables of ϕ i−1 as follows Then π i−1 is a satisfying assignment for φ i−1 in I K . To see this, first consider an atom α in ϕ i−1 . We show that π i−1 satisfies α in I K . If α contains x, it can be of the following forms: A(x), ¬A(x), r(y, x) or ¬r(y, x) with y ∈ Pred. For all of these cases, we know by Step (S7) that they are either implied by s(ŷ, x) ∧ N (x) or contained in Neg, with y replaced by ŷ. By the choice of d ′ , we know that π i−1 satisfies each such atom.
If α does not contain x, then ϕ i contains the atom α ′ that is obtained from α by replacing all of the variables from Pred with ŷ. By construction, we know that π i−1 (y) = π i (ŷ) for all y ∈ Pred and What remains to show is that π i−1 satisfies Ψ i−1 . Consider any ψ(z) ∈ Ψ i−1 , and distinguish the following cases: Otherwise the filter is present in Ψ i . In this case we know that I K , π i ⊧ ψ(z) and π i (z) = π i−1 (z). Hence, it must also hold that I K , π i−1 ⊧ ψ(z).
(⊆): Suppose that a ∈ mwa(φ, K) = ans(φ, I K ). We have to show that there exists a rewriting φ ′ ∈ rew T (φ) and a satisfying assignment π for φ ′ in I A such that a = π(x). To do this, we assign a degree (a natural number) to each satisfying assignment (including the existentially quantified variables of the NCQ part) such that a satisfying assignment with degree 0 does not use any anonymous individuals. We then show that for each satisfying assignment with a degree greater than 0, we can find a rewriting for which a satisfying assignment yielding the same answer, but with a lower degree, exists. In addition, for every such assignment π and for all filters ψ(y) in φ ′ it should hold that, if π(y) ∈ I, then that is, all filters (at any stage of the rewriting) are satisfied within the confines of I A .
For any element d ∈ Δ I K , we denote by |d| the minimal number of role connections required to reach d from an element in I, with |d| = 0 iff d ∈ I. Additionally, for any assignment π Since φ ∈ rew T (φ), to prove the claim it suffices to show that whenever there is φ 1 = ∃y.ϕ 1 (x, y) ∧ Ψ ∈ rew T (φ) such that ϕ 1 has a match π 1 in I K with a = π 1 (x), deg(π 1 ) > 0, and Equation ( †) holds for π 1 and the filters in Ψ, then there exist φ 2 and π 2 with the same properties, but deg(π 2 ) < deg(π 1 ).
Assume φ 1 ∈ rew T (φ) as above, and let π 1 be a match of ϕ 1 . Since deg(π 1 ) > 0 by assumption, there must exist a variable x of ϕ 1 such that π 1 (x) ∉ I. Select x such that it is a leaf node in the subforest of I K induced by π 1 . Note that x cannot be an answer variable.
We know that π 1 (x) = d x was induced by some axiom α = M ⊑ T ∃s.N and element d p ∈ M I K in Definition 4. By the construction of I K , we know that 1. d x has just the one predecessor d p , and 2.
We obtain the query φ 2 from φ 1 through rewriting, by selecting x and introducing ŷ in (S1), and selecting α in (S2). Let Pred denote the set of predecessor variables of x as defined in (S1). To see that this is a valid choice, the conditions in (S2) need to be verified:
(S2a) For any A(x) ∈ ϕ 1 , we have d x = π 1 (x) ∈ A I K , and hence N ⊑ T A by (ii). Consider any role atom r(y, x) ∈ ϕ 1 . From (i), the construction of I K (no inverse edges), and the fact that π 1 is a satisfying assignment for r(y, x) in I K , the only possibility is that π 1 (y) = d p . Therefore (d p , d x) = (π 1 (y), π 1 (x)) ∈ r I K . By (ii), this implies that s ⊑ T r.
(S2b) Consider any ¬A(x) ∈ ϕ 1 , for which we must have d x ∉ A I K . From (ii) we know that N ⋢ T A. Consider any ¬r(y, x) ∈ ϕ 1 . Since this is guarded by a positive role atom as above, again the only possibility is that π 1 (y) = d p . Hence (d p , d x) ∉ r I K . By (ii), this implies that s ⋢ T r.
Therefore, we obtain a satisfying assignment π 2 for φ 2 in I K such that a = π 2 (x) (and deg(π 2 ) < deg(π 1 )) by setting for all z ∈ Var(ϕ 2 ): To see that π 2 satisfies φ 2 , we argue why it satisfies the new atoms and filter from (S6) and (S7); the old atoms (possibly with renamed variables) remain satisfied.
The new atom M (ŷ) is satisfied since π 2 (ŷ) = d p ∈ M I K . Consider now an atom ¬M ′ (ŷ) with M ′ ∈ M ′ as specified in (S6); we have to show that d p ∉ (M ′ ) I K . Assume to the contrary that d p ∈ (M ′ ) I K . By (S3), we know that M ′ ⊑ T ∃s ′ .N ′ ⊑ s T ∃s.N . Moreover, ∃s ′ .N ′ must be included in the set V in Step 3(a) of Definition 4, because otherwise we would already have d p ∈ (∃s ′ .N ′ ) I A , that is, there would be a named individual b such that which shows that the anonymous object d x would not have been created. Since ∃s ′ .N ′ is included in V and we assumed that ∃s.N is minimal w.r.t. ⊑ s T , we must have s ≡ T s ′ and N ≡ T N ′ . But then (S3b) directly contradicts (S2b).
We now consider the filters in φ 2 . Suppose that Equation ( †) holds for π 1 and all filters in φ 1 . For the ones that are only copied from φ 1 (modulo renaming some variables to ŷ), the property is clearly preserved. For the new filter ψ * (ŷ), assume that π 2 (ŷ) ∈ I, and hence we need to show that I A ⊧ π 2 (ψ * (ŷ)). Assume that there exists an element d ′ ∈ I such that (d p , d ′ ) ∈ s I A and d ′ ∈ N I A . But then in Step 3(a) in Definition 4, ∃s.N could not have been added to V since d p ∈ (∃s.N ) I K already holds. Hence, the element d x would have never been introduced, which is a contradiction. Therefore, in I A the precondition of ψ * (ŷ) is never met, which makes the filter trivially satisfied.
Finally, to show that deg(π 2 ) < deg(π 1 ), we make a case distinction on whether the set Pred is empty or not. If Pred = ∅, then we essentially replace the variable x in
Under data complexity assumptions, φ and T , and hence rew T (φ), are fixed, and I A is of polynomial size in the size of A.
For nonrooted queries the rewriting can become infinite, because we can always find a leaf variable that is not an answer variable. For example, suppose that the TBox T consists of the two GCIs A ⊑ ∃r.B and B ⊑ ∃r.A. Let φ() = ∃x.A(x) ∧ ¬B(x) be a Boolean query. The rewriting algorithm would produce an infinite rewriting of the form: . . .
In the following, we show that such infinite rewritings can be avoided, since after a certain number of rewriting steps queries will not generate any new answers.
For a query φ with filters Ψ, where each ψ(y) ∈ Ψ is of the form we define the nested filter depth as follows: where the second expression is applied recursively to subfilters.

Lemma 12
Let K = (T , A) be a consistent ELH KB and let φ be an NCQ. Then ans(φ ′ , I A ).
where v denotes the number of variables in φ, and C T and R T denote the number of concept and role names in T , respectively.

Proof
In the following, we assume a connected query φ with v variables.This is without loss of generality since the nonconnected parts in the query can be dealt with separately.
Connectedness is preserved by the rewriting algorithm and hence there are two possible scenarios. In the first one, φ has only a finite number of rewritings. That happens if after at most v rewriting steps the rewriting is of the form ∃z.ϕ(z) ∧ Ψ(z), where ϕ(z) cannot be rewritten further. Otherwise, φ has infinitely many rewritings. In this case, after v rewriting steps, the rewriting and all further rewritings are of the form ∃y.(A(y) ∧ Neg ∧ ψ(y)), ( ‡) where A ∈ C (recall that we assume a TBox in normal form), Neg is a conjunction of atoms of the form ¬ Â(y) for Â ∈ C, and ψ is a filter that is linearly extended in every further rewriting step. Assume a query φ 0 of the form ( ‡) that has been further rewritten to φ n in a sequence of φ 0 → T ⋅ ⋅ ⋅ → T φ i → T ⋅ ⋅ ⋅ → T φ n , where n ∈ N, every φ j is of the form ∃y.A j (y) ∧ Neg j ∧ ψ j (y) for 0 ≤ j ≤ n, and there exists 1 We show that if there is a satisfying assignment π n for φ n over I A , then there is also a satisfying assignment for a query φ ′ with |φ ′ | < |φ n |. Suppose the nested filters in φ n are matched up to a nested filter depth of 0 ≤ d ≤ n.
Case 1: If d = 0, that is, the body of ψ n is not satisfied, then π n is also a match for A i ∧ Neg i , since A i = A n , Neg i = Neg n , and the body of ψ i is the same as the body of ψ n by assumption. Clearly
Case 2: If d > 0, then we construct a match for query φ n−1 . Suppose φ n−1 has been rewritten to φ n by using a GCI A ⊑ T ∃r.B. Then we know that Because d > 0, there has to be a satisfying assignment {x ↦ a, y ↦ b} with a, b ∈ Δ I A for r(x, y) ∧ B(y), which also satisfies Neg n−1 ∧ ψ(y). Then the assignment π n−1 ∶= {y ↦ b} satisfies φ n−1 , and ψ(y) will then be matched up to a depth of d − 1.
This result can be used to bound the nested filter depth of queries during rewriting: In each rewriting step the query is rewritten w.r.t. a GCI of the form A ⊑ T ∃r.B with A, B ∈ C T and r ∈ R T . There can be at most where v denotes the number of variables in φ. Then at least two rewritings between φ and φ ′ must start with the same expression ∃x.A(x) ∧ Neg ∧ (∃y.r(x, y) ∧ B(y) → . . .). By the arguments above, there are rewritings of φ of nested filter depth at most v + C 2 T ⋅ R T that yield the same answers as φ ′ . Hence, queries need to be rewritten only until this bound on the nested filter depth, which does not depend on A. We obtain the claimed complexity result.
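As a small arithmetic illustration of this bound, the sketch below computes v + C T ² ⋅ R T from the number of query variables and the numbers of concept and role names. The function name and parameter names are hypothetical; the only content taken from the text is the formula itself and the observation that the ABox does not occur in it.

```python
def filter_depth_bound(num_query_vars, num_concept_names, num_role_names):
    """Bound on the nested filter depth that needs to be explored during rewriting."""
    # Number of distinct shapes A ⊑ ∃r.B with A, B concept names and r a role name.
    distinct_gci_shapes = num_concept_names ** 2 * num_role_names
    return num_query_vars + distinct_gci_shapes


# TBox {A ⊑ ∃r.B, B ⊑ ∃r.A} with the one-variable Boolean query ∃x.A(x) ∧ ¬B(x):
# two concept names, one role name, one variable.
BOUND = filter_depth_bound(num_query_vars=1, num_concept_names=2, num_role_names=1)
```

For this TBox and query the bound evaluates to 1 + 2² ⋅ 1 = 5, independently of how many assertions the ABox contains.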

Theorem 13
Answering NCQs under minimal-world semantics over consistent ELH KBs is NP-complete, and P-complete in data complexity.

Proof
The lower bounds are inherited from certain answer semantics of CQs over EL KBs (Rosati 2007b) since for CQs the two semantics coincide. For data complexity, it suffices to observe that the size and number of the rewritten queries do not depend on the ABox, and the size of I A is polynomial in the size of A, hence evaluating the rewriting over I A can be done in polynomial time.
For combined complexity, first we can guess an element φ ′ of rew T (φ) since by Lemma 12 the number of necessary rewriting steps is bounded polynomially and each rewriting is of polynomial size. Now it remains to evaluate φ ′ over I A , which is possible in PSpace since φ ′ is a first-order query. To obtain a bound of NP, we need to do a more careful evaluation. First, we start by guessing a match of the initial CQ part of φ ′ in I A . For each of the polynomially many (nested) filters in φ ′ , we then enumerate all (polynomially many) possibilities to extend the initial match by mapping z ′ into I A . If none of these satisfies ψ + (z, z ′ ), this (deterministic) check ends the evaluation successfully. Otherwise, we can continue with recursively guessing an extended match for ∃z ′ .ψ + (z, z ′ ) ∧ ψ − (z, z ′ ) ∧ Ψ, for each nested filter extending the match by one new variable. Overall, this takes only polynomial time, which implies the claim.
As discussed in Example 2 it can be useful to allow complex assertions in the ABox: The assertions of patient p 3 can then be stated without introducing a new constant for the disease by stating ∃diagnosedWith.SkinOfBreastCancer(p 3 ).
This leads to the introduction of additional acyclic definitions T ′ , which are not fixed. The complexity nevertheless remains the same: Since T does not use the new concept names in T ′ , we can apply the rewriting only w.r.t. T , and extend I A by a polynomial number of new elements that result from applying Definition 4 only w.r.t. T ′ .
What is more important than the complexity result is that this approach can be used to evaluate NCQs using standard database methods, for example, using views to define the finite interpretation I A based on the input data given in A, and SQL queries to evaluate the elements of rew T (φ) over these views (Kontchakov et al. 2011).
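A minimal sketch of this database view of the approach, using Python's built-in sqlite3 module, is given below. The table names `abox_concept` and `abox_role`, the view `ia_concept`, the two subsumptions encoded in the view, and the query are all assumptions made for illustration; the query is a hand-written toy with one negated atom, not an element of rew T (φ) computed by Definition 8.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE abox_concept(name TEXT, ind TEXT);
CREATE TABLE abox_role(name TEXT, subj TEXT, obj TEXT);
INSERT INTO abox_concept VALUES ('BreastCancer', 'd1'), ('SkinCancer', 'd2');
INSERT INTO abox_role VALUES ('diagnosedWith', 'p1', 'd1'),
                             ('diagnosedWith', 'p2', 'd2');
-- View for the concept part of the finite interpretation: the asserted facts plus
-- the consequences of two assumed subsumptions into Cancer.
CREATE VIEW ia_concept AS
    SELECT name, ind FROM abox_concept
    UNION
    SELECT 'Cancer', ind FROM abox_concept
    WHERE name IN ('BreastCancer', 'SkinCancer');
"""

# Patients diagnosed with some Cancer that is not a SkinCancer.
QUERY = """
SELECT DISTINCT r.subj
FROM abox_role AS r
JOIN ia_concept AS c ON c.ind = r.obj
WHERE r.name = 'diagnosedWith'
  AND c.name = 'Cancer'
  AND NOT EXISTS (SELECT 1 FROM ia_concept AS n
                  WHERE n.ind = r.obj AND n.name = 'SkinCancer')
ORDER BY r.subj
"""


def run_query():
    """Build the in-memory database, evaluate the query over the view, return names."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA_AND_DATA)
        return [row[0] for row in con.execute(QUERY)]
    finally:
        con.close()
```

Negated atoms translate to NOT EXISTS subqueries over the views, so the evaluation stays within what a standard relational engine provides.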

Temporal minimal-world semantics in T ELH
In the previous part, we introduced the minimal-world semantics and showed how NCQs can be answered efficiently over ELH KBs. In the remaining part of the paper we extend the minimal-world semantics to answering temporal queries over T ELH c ◊,lhs,− .

Example 14
Continuing the previous example of different cancer patients, we can now also model temporal aspects. Chemotherapy is a treatment that is usually given in a number of cycles to fight the cancer of a given patient. We formalize this knowledge as follows, where the basic time unit is one day: We make the assumption that being a cancer patient is 365-days convex, hence if a patient is reported to have cancer at two time points at most 1 year apart, we assume that they were also suffering from cancer in between the two reports. Similarly, we assume that a patient reported to be receiving chemotherapy at most four months apart also received chemotherapy in between. Suppose the ABox A contains assertions ChemotherapyPatient(p 1 , i), i ∈ {0, 167, 258}, for a patient p 1 . The set of entailed assertions, denoted by A * , is illustrated below, where for simplicity we omit the individual name p 1 . Here, CancerPatient and ChemotherapyPatient are abbreviated by C and T, respectively. Representative time points −1, 60, 200, and 259 have been introduced additionally, and the intervals they represent are shown above them.
Since there are infinitely many time points, in general A * is infinite. However, in Borgwardt et al. (2019) this infinite set is represented via a finite set of representative time points rep(A), each of which represents a time interval in which the assertions do not change.
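The informal reading of n-days convexity used in this example can be sketched as a gap-filling procedure on time stamps: whenever two reported time points are at most n days apart, all time points in between are added. The function name is hypothetical, and the value of 120 days for "four months" is an assumption; the sketch covers only this informal reading, not entailment in the temporal logic.

```python
def convex_closure(time_points, n):
    """Fill every gap of length at most n between reported time points."""
    points = sorted(set(time_points))
    closed = set(points)
    # Checking consecutive reports suffices: if two reports are at most n apart,
    # then so are all consecutive reports between them.
    for earlier, later in zip(points, points[1:]):
        if later - earlier <= n:
            closed.update(range(earlier, later + 1))
    return closed


REPORTS = [0, 167, 258]
CHEMO = convex_closure(REPORTS, 120)    # assumed: four months = 120 days
CANCER = convex_closure(REPORTS, 365)   # 365-days convexity from the example
```

Under these assumptions the gap of 167 days between the first two reports is not filled for chemotherapy, the gap of 91 days between the last two is, and being a cancer patient holds at every time point from 0 to 258.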

Metric temporal conjunctive queries with negation
We now consider the reasoning problem of temporal query answering, which generalizes the results from Section 4. We develop a new temporal query language with negation and extend the closed-world semantics for negation from the previous sections to T ELH c ◊,lhs,− .
We combine atemporal NCQs with MTL operators (Alur and Henzinger 1994; Gutiérrez-Basulto et al. 2016; Baader et al. 2017) to obtain metric temporal NCQs (MTNCQs) and finally show that such queries can be answered efficiently over T ELH c ◊,lhs,− KBs when using the minimal-world semantics.

Definition 15
Metric temporal conjunctive queries with negation (MTNCQs) are built by the grammar rule where ψ is an NCQ, and I is an interval over N. An MTNCQ φ is rooted/Boolean if all NCQs in it are rooted/Boolean.
We employ the standard semantics shown in Figure 3. One can define the next operator as ○φ ∶= ⊤ U [1,1] φ, and similarly ○ − φ ∶= ⊤ S [1,1] φ. We can also express and ◻ I φ ∶= ¬◊ I ¬φ, and hence, by (1), the c c ◊ n -operators from Section 2. An MTCQ (or positive MTNCQ) is an MTNCQ without negation, where we assume that the operator ◻ I is nevertheless included as part of the syntax of MTCQs. Let K = (T , A) be a T ELH c ◊,lhs,− KB, φ(x) an MTNCQ, a a tuple of individual names from A, i ∈ tem(A), and I a temporal interpretation. The pair (a, i) is an answer to φ(x) w.r.t. I if I, i ⊧ φ(a). The set of all answers for φ w.r.t. I is denoted ans(φ, I). The tuple (a, i) is a certain answer to φ w.r.t. K if it is an answer in every model of K; all these tuples are collected in the set cert(φ, K).

Example 16
Consider the criterion "Patients that received chemotherapy for more than 3 months, but less than 6 months." This can be expressed as an MTNCQ as follows.
The negated conjunct expresses that the data does not indicate an ongoing chemotherapy for the past 6 months (minimal-world semantics), rather than that such a treatment is categorically ruled out by some TBox axioms (certain-answer semantics).

Minimal-world semantics for MTNCQs
To extend the approach from Section 4 we need to find a minimal canonical model of a T ELH c ◊,lhs,− KB.Note that T ELH c ◊,lhs,− does not allow temporal roles, because temporal roles interfere with the minimality: they may entail the existence of a generic role successor long after it has been superseded by several more specific successors, and hence the generic successor becomes redundant.
In the definition of the model, we make use of entailment in T ELH c ◊,lhs,− , which can be checked in polynomial time (Borgwardt et al. 2019). Thus, we can exclude w.l.o.g. equivalent concept and role names. Also, for simplicity, in the following we assume w.l.o.g. that all TBox inclusions are in normal form (2). In particular, disallowing CIs of the form ⋆ ◊A ⊑ ∃r.B allows us to draw a stronger connection to the original construction in Section 4; see in particular Step 3(a) in Definition 17 below.

Definition 17
The minimal temporal canonical model
We denote by I A the result of executing only Steps 1 and 2 of this definition, that is, restricting I K to the named individuals. Since there are only finitely many elements of I, C, and R that are relevant for this definition (i.e., those that occur in K), for simplicity we often treat I A as if it had a finite object (but still infinite time) domain.
In I K , there may exist anonymous objects that are not connected to any named individuals in I i and are not relevant for the satisfaction of the KB.For this reason, in the following we consider only rooted MTNCQs, which can be evaluated only over the parts of I K that are connected to the named individuals.We again show that I K is actually a model of K and is canonical in the usual sense that it can be used to answer positive queries over K under certain answer semantics.

Lemma 18
Let K be a consistent T ELH c ◊,lhs,− KB.Then I K is a model of K and, for every rooted MTCQ φ, we have cert(φ, K) = ans(φ, I K ).

Proof
The first claim is easy to prove, and the inclusion cert(φ, K) ⊆ ans(φ, I K ) follows from the fact that I K is a model of K. For the other inclusion, consider any model J = (Δ J , (J i ) i≥0 ) of K. We prove that (a, i) ∈ ans(φ, I K ) implies (a, i) ∈ ans(φ, J) by induction on the structure of φ.
• If φ is a rooted CQ, then I i ⊧ φ(a). Moreover, since φ is rooted, only the rooted part of I i , consisting of all elements connected to named individuals, is relevant for satisfying φ(a). It is easy to show that this part can be homomorphically mapped into J i , hence J, i ⊧ φ(a).
• The cases of S I , ◊ I , and ◻ I are similar, and therefore the claim also extends to c c ◊ n , ○, and ○ − . ◻
Thus, the following minimal-world semantics is compatible with certain answer semantics for positive (rooted) queries, while keeping a tractable data complexity.

Definition 19
The set of minimal-world answers to an MTNCQ φ over a consistent T ELH c ◊,lhs,− KB K is mwa(φ, K) ∶= ans(φ, I K ).

A combined rewriting for MTNCQs
Following the approach used for the atemporal case, we now show that rooted MTNCQ answering under minimal-world semantics is combined first-order rewritable. For rewriting atomic queries we use the results from Borgwardt et al. (2019). Thus, we proceed as follows. First, we rewrite φ into a metric first-order temporal logic (MFOTL) formula φ T , which combines first-order formulas via metric temporal operators; for details, see Basin et al. (2015). The formula φ T is obtained by replacing each (rooted) NCQ ψ with the first-order rewriting ψ T ∶= ⋁ ψ ′ ∈rew T (ψ) ψ ′ from Section 4, and hence φ T can be evaluated already over I A . Second, we further rewrite φ T into a three-sorted first-order formula (with explicit variables for (a) objects, (b) time points, and (c) bits of the binary representation of time values), which is then evaluated over a restriction I fin A of I A that contains only finitely many time points.
These two steps produce a valid rewriting for the query φ.

Proof
We prove the claim by induction over the structure of φ.
Suppose that φ is a rooted NCQ. Since φ does not contain temporal operators, we can restrict our attention to a single atemporal interpretation I i in I K = (Δ I K , (I i ) i∈Z ). Since φ is rooted, only the "rooted" part I r i of I i , consisting only of those elements connected to named individuals via a sequence of role connections, is relevant for evaluating φ (and similarly for φ T ). In the construction of I K (Definition 17), we can observe that I r i is uniquely determined by the definition of A Ii and r Ii in Step 2. Moreover, I r i is isomorphic to the (atemporal) minimal canonical model of (T ′ , A i ) as in Definition 4, where In particular, one can observe that the temporal operators in T are irrelevant for the behavior of the anonymous elements in and we can restrict the attention to those assertions entailed for time point i.
Hence, by Lemma 11 we can conclude that (a, i) ∈ mwa(φ, K) = ans(φ, I K ) iff a ∈ ans(φ, I i ) = ans(φ, I r i ) = ans(φ T , I i,A ) iff (a, i) ∈ ans(φ T , I A ), where I i,A is the restriction of I i to the named individuals.
For the remaining cases, it suffices to observe that φ and φ T are built on the same structure of temporal operators, which have the same semantics for both MTNCQs and MFOTL formulas. ◻
For the second rewriting step, we restrict ourselves to finitely many time points. More precisely, we consider the finite structure I fin A , which is obtained from I A by restricting the set of time points to rep(A). By Lemma 4 in Borgwardt et al. (2019), the information contained in I fin A is already sufficient to answer rooted atomic queries. The idea is that each time point i is assigned a representative time point |i| from rep(A), at which the same assertions hold. Therefore, the answers to a query at time point i are the same as the answers at |i|. We extend this structure a little, by considering the two representatives i, j for each maximal interval [i, j] in Z ∖ tem(A). In this way, we ensure that the "border" elements are always representatives for their respective intervals. The size of the resulting structure I fin A is polynomial in the size of K.
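The choice of border representatives can be sketched as follows for the time stamps tem(A) = {0, 167, 258} of the running example. The function name is hypothetical, and the treatment of the two unbounded intervals (keeping only their single finite border) is an assumption of this sketch.

```python
def border_representatives(tem):
    """Time stamps of the ABox plus the border points of every gap between them."""
    points = sorted(set(tem))
    reps = set(points)
    # Unbounded gaps before the first and after the last time stamp:
    # keep only their single finite border (assumption of this sketch).
    reps.add(points[0] - 1)
    reps.add(points[-1] + 1)
    # Bounded maximal gaps [a + 1, b - 1] between consecutive time stamps a < b.
    for a, b in zip(points, points[1:]):
        if b - a > 1:
            reps.update({a + 1, b - 1})
    return sorted(reps)


REPS = border_representatives([0, 167, 258])
```

The number of representatives produced this way is at most three times the number of time stamps in the input, which keeps the resulting structure polynomial in the size of the data.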
Example 21 Continuing Example 16, below one can see the finite structure I fin A over the representative time points {−1, 0, 1, 166, 167, 168, 257, 258, 259}, where for simplicity we omit the individual name.
The rewriting from Lemma 20 can refer to time instants outside of rep(A). However, when we want to evaluate a pure FO formula over the finite structure I fin A , this is no longer possible, because the first-order quantifiers must quantify over the domain of I fin A . Moreover, since the query φ T can contain metric temporal operators, we need to keep track of the distance between the time points in tem(A). Hence, in the following we assume that I fin A is given as a first-order structure with the domain I ∪ {b 1 , . . ., b n } ∪ rep(A) and additional predicates bit and sign such that bit(i, j), 1 ≤ j ≤ n, is true iff the jth bit of the binary representation of the time stamp i is 1, and sign(i) is true iff i is nonnegative. Thus, we now consider three-sorted first-order formulas with the three sorts I (for objects), {b 1 , . . ., b n } (for bits) and rep(A) (for time stamps). We denote variables of sort rep(A) by t, t ′ , t ′′ . To simplify the presentation, we do not explicitly denote the sort of all variables, but this is always clear from the context. Every concept name is now accessed as a binary predicate of sort I × rep(A), for example, A(a, i) refers to the fact that individual a satisfies A at time point i. Similarly, role names correspond to ternary predicates of sort I × I × rep(A).
In the following we show that the expressions t ′ ⋈ t and even t ′ − t ⋈ m for some constant m and ⋈ ∈ {≥, >, =, <, ≤} are definable as first-order formulas using the natural order < on {1, . . ., m}. More specifically, we define t ′ − t ⋈ d, for some constant d < ∞ and ⋈ ∈ {≥, >, =, <, ≤}, as first-order formulas with built-in predicates bit(t, j), 1 ≤ j ≤ n, which is true iff the jth bit of t is 1, and sign(t), which is true iff t ≥ 0. W.l.o.g., let d be a nonnegative integer; otherwise, the formula can be reformulated as t − t ′ ⋈ k, where 0 ≤ k = −d. First, consider the case of equality.
where $\mathrm{sign}_d(t)$ checks whether $t + d$ is nonnegative, $\mathrm{bit}_d(t, j)$ says what the $j$th bit of the binary representation of $t + d$ is, and $\mathrm{ovf}_d(t)$ detects whether the addition of $d$ to $t$ causes an overflow in the $n$-bit representation. These three auxiliary predicates are defined inductively as follows:

The remaining cases $\bowtie\, \in \{\geq, >, <, \leq\}$ are obtained similarly to the formulas above. For example, $t < t'$ can be expressed by looking at the signs and the most significant bit in which they differ, formally:

S. Borgwardt et al. https://doi.org/10.1017/S1471068421000119 Published online by Cambridge University Press
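To make the bit-level reading of these predicates more concrete, the following is a minimal Python sketch of how $t' - t = d$ and $t < t'$ could be decided from the bits alone. It is an illustration under simplifying assumptions, not the formulas of the paper: all time stamps and the constant $d$ are assumed to be nonnegative and below $2^n$ (so the sign predicate is not modeled), bit 1 is taken to be the least significant bit, and the names `N_BITS`, `add_const_bits`, `diff_equals`, and `less_than` are hypothetical.

```python
N_BITS = 4  # hypothetical width n of the binary encoding (assumption)


def bit(t: int, j: int) -> bool:
    """bit(t, j): the j-th bit of t is 1 (assumption: j = 1 is least significant)."""
    return (t >> (j - 1)) & 1 == 1


def add_const_bits(t: int, d: int, n: int = N_BITS):
    """Ripple-carry addition of the constant d to t.

    Returns the list [bit_d(t, 1), ..., bit_d(t, n)] together with ovf_d(t),
    i.e. the carry out of the n-th bit. Assumes 0 <= t, d < 2**n.
    """
    carry = False
    bits = []
    for j in range(1, n + 1):
        a, b = bit(t, j), bit(d, j)
        bits.append(a ^ b ^ carry)                 # sum bit at position j
        carry = (a and b) or (carry and (a or b))  # carry into position j + 1
    return bits, carry


def diff_equals(t2: int, t: int, d: int, n: int = N_BITS) -> bool:
    """t2 - t = d, decided only from the bits of t2 and of t + d."""
    bits, overflow = add_const_bits(t, d, n)
    return (not overflow) and all(
        bit(t2, j) == bits[j - 1] for j in range(1, n + 1)
    )


def less_than(t: int, t2: int, n: int = N_BITS) -> bool:
    """t < t2: at the most significant differing bit, t has 0 and t2 has 1."""
    return any(
        (not bit(t, j)) and bit(t2, j)
        and all(bit(t, i) == bit(t2, i) for i in range(j + 1, n + 1))
        for j in range(1, n + 1)
    )
```

Each Python quantifier (`all`, `any`) ranges over the bit positions only, which mirrors how a first-order formula over the sort $\{b_1, \dots, b_n\}$ would be built; negative time stamps would additionally require a case distinction on the signs.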

Lemma 22
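For $\varphi_{\mathcal{T}}$ there is a constant $N \in \mathbb{N}$ such that, for every subformula $\psi$ of $\varphi_{\mathcal{T}}$, every maximal interval $J$ in $\mathbb{Z} \setminus \bigcup\{[i - N, i + N] \mid i \in \mathrm{tem}(\mathcal{A})\}$, all $k, \ell \in J$, and all relevant tuples $\mathbf{a}$ over $\mathbf{I}$, we have $\mathcal{I}_{\mathcal{A}}, k \models \psi(\mathbf{a})$ iff $\mathcal{I}_{\mathcal{A}}, \ell \models \psi(\mathbf{a})$.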

Proof
We are going to prove a more specific statement. Namely, let $N_\psi$ be the sum of all interval bounds of temporal formulas in a subformula $\psi$ of $\varphi_{\mathcal{T}}$ (except $\infty$). Consequently, for the proof we consider instead every maximal interval $J$ defined as in the lemma, but with $N_\psi$ in place of $N$. We show this by induction on the structure of $\psi$, but only consider three representative cases; the other cases are similar.
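• If $\psi$ is the rewriting of an NCQ, then $N_\psi = 0$ and the semantics of $\psi$ depends only on the interpretation at a single time point. Since $k$ and $\ell$ belong to the same maximal interval in $\mathbb{Z} \setminus \mathrm{tem}(\mathcal{A})$, by Lemma 4 in Borgwardt et al. (2019) and the construction of $\mathcal{I}_{\mathcal{A}}$, this interpretation behaves in the same way at $k$ and at $\ell$.
In case that $j = c_1 = 0$, we have $\mathcal{I}_{\mathcal{A}}, k \models \psi_2(\mathbf{a})$. Since $k$ and $\ell$ are farther than $N_\psi \geq N_{\psi_2}$ from the nearest element of $\mathrm{rep}(\mathcal{A})$, by induction we also have $\mathcal{I}_{\mathcal{A}}, \ell \models \psi_2(\mathbf{a})$ and thus $\mathcal{I}_{\mathcal{A}}, \ell \models \psi(\mathbf{a})$ in this case. Hence, we can assume in the following that $j \geq c_1 > 0$, and thus in particular $\mathcal{I}_{\mathcal{A}}, k \models \psi_1(\mathbf{a})$.
Since both $k + j$ and $\ell + c_2$ are farther than $N_\psi - c_2 \geq N_{\psi_2}$ from the nearest element of $\mathrm{rep}(\mathcal{A})$, by induction we have $\mathcal{I}_{\mathcal{A}}, \ell + c_2 \models \psi_2(\mathbf{a})$. Moreover, since $\mathcal{I}_{\mathcal{A}}, k \models \psi_1(\mathbf{a})$ and $k$ as well as all elements in $[\ell, \ell + c_2]$ are farther than $N_\psi - c_2 \geq N_{\psi_1}$ from the nearest element of $\mathrm{rep}(\mathcal{A})$, by induction we have $\mathcal{I}_{\mathcal{A}}, m \models \psi_1(\mathbf{a})$ for all $m$ with $\ell \leq m \leq \ell + c_2$. Hence, then we have a similar situation as above, except that $j$ is not bounded by $c_2$. We can again assume that $j > 0$ and $\mathcal{I}_{\mathcal{A}}, k \models \psi_1(\mathbf{a})$.
Let $p$ be the maximal element of $J$. If $k + j > p + c_1$, then $k + j > \ell$ and the distance between $\ell$ and $k + j$ must be at least $c_1$. Moreover, by assumption we have $\mathcal{I}_{\mathcal{A}}, m \models \psi_1(\mathbf{a})$ for all $m$ with $p < m < k + j$. Since $\mathcal{I}_{\mathcal{A}}, k \models \psi_1(\mathbf{a})$ and all elements in $J$ are farther than $N_\psi \geq N_{\psi_1}$ from the nearest element of $\mathrm{rep}(\mathcal{A})$, by induction we also have $\mathcal{I}_{\mathcal{A}}, m \models \psi_1(\mathbf{a})$ for all $m$ with $\ell \leq m \leq p$. Thus, $\mathcal{I}_{\mathcal{A}}, \ell \models \psi(\mathbf{a})$.
We now consider the remaining case that $k + j \leq p + c_1$. Then both $k + j$ and $\ell + c_1$ are farther than $N_\psi - c_1 \geq N_{\psi_2}$ from the nearest element of $\mathrm{rep}(\mathcal{A})$, and thus by induction we have $\mathcal{I}_{\mathcal{A}}, \ell + c_1 \models \psi_2(\mathbf{a})$.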
By similar arguments as above, we obtain $\mathcal{I}_{\mathcal{A}}, \ell \models \psi(\mathbf{a})$. ◻

Hence, for evaluating subformulas of $\varphi_{\mathcal{T}}$, it suffices to keep track of time points up to $N$ steps away from the elements of $\mathrm{rep}(\mathcal{A})$; this includes at least one element from each of the intervals $J$ mentioned in Lemma 22, since every element of $\mathrm{tem}(\mathcal{A})$ is immediately surrounded by two elements of $\mathrm{rep}(\mathcal{A})$.
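As an illustration of how the bound $N_\psi$ from the proof of Lemma 22 could be computed, the following Python sketch sums the finite interval bounds of all temporal subformulas of a small formula tree. The classes `Atom`, `Not`, `And`, and `Until` and the function `bound_sum` are hypothetical names introduced only for this sketch. It also assumes that "sum of all interval bounds" counts both the lower and the (finite) upper bound of every temporal operator, and only U-formulas are modeled; S-formulas would be handled in the same way.

```python
from dataclasses import dataclass
from math import inf
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str  # stands for a rewritten NCQ; contributes no bound


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"   # psi_1
    right: "Formula"  # psi_2
    lo: int           # c_1
    hi: float         # c_2, an integer or math.inf


Formula = Union[Atom, Not, And, Until]


def bound_sum(psi: Formula) -> int:
    """Sum of all finite interval bounds occurring in psi (assumed reading of N_psi)."""
    if isinstance(psi, Atom):
        return 0
    if isinstance(psi, Not):
        return bound_sum(psi.sub)
    if isinstance(psi, And):
        return bound_sum(psi.left) + bound_sum(psi.right)
    # Until: add its own bounds, skipping an infinite upper bound.
    own = psi.lo + (0 if psi.hi == inf else int(psi.hi))
    return own + bound_sum(psi.left) + bound_sum(psi.right)
```

Under this reading, the value for a U-formula is at least the value of each of its arguments plus its own finite bounds, which matches inequalities such as $N_\psi - c_2 \geq N_{\psi_2}$ used in the proof.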
We exploit Lemma 22 in the following definition of the three-sorted first-order formula $[\psi]_n(\mathbf{x}, t)$ that simulates the behavior of $\psi(\mathbf{x})$ at the "virtual" time point $t + n$, where $n \in [-N, N]$. Whenever we use a formula $[\psi]_n(\mathbf{x}, t)$, we require that $t$ denotes a representative for $t + n$. Due to our assumption that each maximal interval from $\mathbb{Z} \setminus \mathrm{tem}(\mathcal{A})$ is represented by its endpoints (see Example 21), we know that $t$ is a representative for $t + n$ iff there is no element of $\mathrm{rep}(\mathcal{A})$ between $t$ and $t + n$. We can encode this check in an auxiliary formula:

Example 23
In Example 21, time points 1 and 166 are representatives for the missing time points 2-165, and we have $\mathcal{I}^{\mathrm{fin}}_{\mathcal{A}} \models \mathrm{rep}_1(1)$ (with $N = 1$). However, for $\varphi_{\mathcal{T}} = \neg T(x)$, the behavior at 1 and 166 differs. To distinguish this, we need to refer to the "virtual" time point 2 (see the gray circled "v"s below) that is not included in $\mathcal{I}^{\mathrm{fin}}_{\mathcal{A}}$, via the formula $[\neg T(x)]_1(p_1, 1)$. By Lemma 22, it is sufficient to consider time point 2, because this determines the behavior at 3-165.
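The representative check used in this example can be sketched in Python as follows. This is a hedged illustration rather than the auxiliary formula of the paper. It assumes that $\mathrm{rep}(\mathcal{A})$ consists of every element of $\mathrm{tem}(\mathcal{A})$ together with its two direct neighbors, that $\mathrm{tem}(\mathcal{A}) = \{0, 167, 258\}$ in Example 21 (both inferred from the listed representative time points), and that "between $t$ and $t + n$" excludes $t$ but includes $t + n$. The names `rep_points` and `is_representative` are hypothetical.

```python
def rep_points(tem: set[int]) -> set[int]:
    """Assumed construction of rep(A): each time stamp plus its two neighbors."""
    return {i + delta for i in tem for delta in (-1, 0, 1)}


def is_representative(t: int, n: int, rep: set[int]) -> bool:
    """Check that t can stand in for the (possibly virtual) time point t + n.

    True iff t is in rep and no element of rep lies between t and t + n,
    where t itself is excluded and t + n is included (assumed reading).
    """
    if t not in rep:
        return False
    step = 1 if n >= 0 else -1
    return all(t + step * s not in rep for s in range(1, abs(n) + 1))


# Time stamps assumed for Example 21.
rep = rep_points({0, 167, 258})
print(sorted(rep))
print(is_representative(1, 1, rep))     # time point 1 standing in for 2
print(is_representative(166, -1, rep))  # time point 166 standing in for 165
```

With these assumptions, both 1 and 166 pass the check for their neighboring virtual time points, whereas 0 is not accepted as a representative for 1, because 1 is itself an element of $\mathrm{rep}(\mathcal{A})$.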
We now define $[\psi]_n(\mathbf{x}, t)$ recursively, for each subformula $\psi$ of $\varphi_{\mathcal{T}}$. If $\psi$ is a single rewritten NCQ, then $[\psi]_n(\mathbf{x}, t)$ is obtained by replacing each atemporal atom $A(x)$ by $A(x, t)$, and similarly for role atoms. The parameter $n$ can be ignored here, because we assumed that $t$ is a representative for $t + n$, and hence the time points $t$ and $t + n$ are interpreted in $\mathcal{I}_{\mathcal{A}}$ equally. For conjunctions, we set $[\psi_1 \wedge \psi_2]_n(\mathbf{x}, t) := [\psi_1]_n(\mathbf{x}, t) \wedge [\psi_2]_n(\mathbf{x}, t)$, and similarly for the other Boolean constructors. Finally, we demonstrate the translation for U-formulas (the case of S-formulas is analogous). In the defining formula, $c_2$ may be $\infty$, in which case the upper bound of $t + n + c_2$ can be removed.
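For orientation, the behavior that the translation of a U-formula has to simulate can be sketched directly in Python. The sketch assumes the pointwise reading of metric until that is used in the case analysis of the proof of Lemma 22: $\psi_1\, \mathcal{U}_{[c_1, c_2]}\, \psi_2$ holds at $k$ iff there is some $j \in [c_1, c_2]$ such that $\psi_2$ holds at $k + j$ and $\psi_1$ holds at all $m$ with $k \leq m < k + j$. Only finite $c_2$ is covered, and the name `until` as well as the representation of subformulas as Python predicates are hypothetical.

```python
from typing import Callable

# Truth of an already evaluated subformula at each time point (assumption:
# the answer tuple is fixed, so only the time argument remains).
Truth = Callable[[int], bool]


def until(psi1: Truth, psi2: Truth, k: int, c1: int, c2: int) -> bool:
    """Evaluate psi1 U_[c1, c2] psi2 at time point k, for finite c1 <= c2."""
    return any(
        psi2(k + j) and all(psi1(m) for m in range(k, k + j))
        for j in range(c1, c2 + 1)
    )
```

For $j = 0$ the inner range is empty, so only $\psi_2$ at $k$ matters, whereas for $j > 0$ the formula $\psi_1$ must already hold at $k$ itself. The first-order translation has to express the same condition while referring only to representatives $t$ and offsets $n$.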

Lemma 24
Let $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ be a consistent $\mathcal{TELH}^{c\Diamond,\mathrm{lhs},-}$ KB and $\varphi$ be an MTNCQ.

Proof
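We show the following claim by induction on the structure of $\varphi$: For all $i \in \mathrm{rep}(\mathcal{A})$, all $n \in [-N, N]$, all relevant tuples $\mathbf{a}$, and all MTNCQs $\varphi$, if $\mathcal{I}^{\mathrm{fin}}_{\mathcal{A}} \models \mathrm{rep}_n(i)$, then $\mathcal{I}_{\mathcal{A}}, i + n \models \varphi(\mathbf{a})$ iff $\mathcal{I}^{\mathrm{fin}}_{\mathcal{A}} \models [\varphi]_n(\mathbf{a}, i)$. Since this includes the case where $i \in \mathrm{tem}(\mathcal{A})$, $n = 0$, for which $\mathcal{I}^{\mathrm{fin}}_{\mathcal{A}} \models \mathrm{rep}_0(i)$ holds, the statement of the lemma follows.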
given KB consisting of a temporal data instance and a temporal ontology has a model); we refer the reader to Artale and Franconi (2005), Baader et al. (2003), Lutz et al. (2008), and Artale et al. (2014) for details. Notably, in the presence of a single rigid role, allowing the operator + ◊ on both sides of EL CIs makes subsumption undecidable (Artale et al. 2007). In Gutiérrez-Basulto et al. (2016), a variety of restrictions are investigated to regain decidability, such as acyclic TBoxes and restrictions on the occurrences of temporal operators. In particular, allowing the qualitative operators ± ◊, − ◊, + ◊, c c ◊ only on the left-hand side of CIs makes the logic tractable. Less closely related work considers temporalized DLs where LTL operators are used to combine axioms, but are not allowed within concepts (Baader et al. 2012).
Since atomic query answering (whether a concept or role name is entailed by a KB) can be reduced to the satisfiability checking problem, further developments of query languages and the complexities of ontology-mediated query answering over temporal data have recently appeared in the literature. In the following we briefly summarize recent research on combining ontology and query languages with negation and temporal formalisms. For a general overview of temporal ontology and query languages, see Lutz et al. (2008) and Artale et al. (2017).
In this research we focus on a discrete timeline (over $\mathbb{Z}$) and data facts stamped with a single time point. However, the literature contains other approaches to incorporating temporal formalisms into an ontology and the data, for example, dense timelines such as $\mathbb{Q}$ or $\mathbb{R}$ (Brandt et al. 2018; Ryzhikov et al. 2019), or interval-based data models, where facts are stamped with a pair of time points denoting the interval in which they are true (Kontchakov et al. 2016; Brandt et al. 2018). Discussing this work is beyond the scope of this article. We only want to mention here that choosing $\mathbb{Z}$ rather than $\mathbb{Q}$ or $\mathbb{R}$ is not a restriction in our setting since (i) our formalism allows arbitrarily large gaps between time points in the input and (ii) there are no inferences that would require a dense model of time, for example, computations approaching some time point only in the limit.
Within the discrete time point-based approach, one can distinguish between formalisms with LTL temporal constructs and formalisms employing more refined Metric Temporal Logic (MTL) operators (Alur et al. 1996). MTL extends LTL with temporal intervals for the modal operators, restricting them to a specific time range. Combining OMQA with LTL operators has been investigated in depth. In particular, similarly to the query language adopted in this paper, there is a multitude of works (Baader et al. 2013; Borgwardt and Thost 2015b; Baader et al. 2015a;b) investigating the complexity of answering LTL-CQs that are obtained from LTL formulas by replacing occurrences of propositional variables by arbitrary CQs. Moreover, as in this article, the research in Borgwardt et al. (2013; 2015) focuses on the rewritability properties of LTL-CQs. An orthogonal approach for query rewriting over a temporalized DL-Lite ontology was proposed in Artale et al. (2013) and Artale et al. (2015). Here the focus mainly lies on increasing the expressivity of the ontology language by allowing a concept or a role to be prefixed with LTL constructs. Only recently have metric variants of LTL-CQs also been considered (Gutiérrez-Basulto et al. 2016; Baader et al. 2017; Thost 2018).
Negation in queries under the classical open-world semantics results in intractable (mostly coNP or even undecidable) query evaluation (Rosati 2007a; Gutiérrez-Basulto et al. 2015). Moreover, prior work (Borgwardt and Thost 2015a; 2020) on temporalized OMQA with negation shows that the high complexity of temporal query answering with negation is mostly due to the open-world assumption for negation in the query language. There are several approaches to introducing negation in OMQA without losing tractability, for example, closed predicates, epistemic semantics, and Skolemization. For details, we refer the reader to Section 3, where we motivate the semantics adopted in the paper. Indeed, as we show in this work, by changing the semantics for negation, we can apply efficient (in data complexity) algorithms for temporal query answering with negation.

Conclusion
Dealing with the absence of information is an important and at the same time challenging task. In many real-world scenarios, it is not clear whether a piece of information is missing because it is unknown or because it is false. EHRs mostly talk about positive diagnoses and it would be impossible to list all the negative diagnoses, that is, the diseases a patient does not suffer from. Moreover, EHRs and clinical trial criteria contain an inherent temporal component. We showed that such a setting cannot be handled adequately by existing logic-based approaches, mostly because they do not deal with closed-world negation over anonymous objects. We introduced a novel semantics for answering metric temporal CQs with negation and showed that it is well-behaved also for anonymous objects. Moreover, we demonstrated combined first-order rewritability, which allows us to answer MTNCQs by using conventional relational database technologies. The data complexity result of P shows that MTNCQ answering over $\mathcal{TELH}^{c\Diamond,\mathrm{lhs}}$ KBs is not inherently more difficult than CQ answering in $\mathcal{EL}$. Similarly, the combined complexity does not increase over the lower bound of ExpSpace inherited from metric temporal logic.
We are working on an optimized implementation of this method with the aim to deal with queries over large ontologies such as SNOMED CT. First results show that this is feasible, in spite of the theoretical complexity of ExpSpace. There is already a prototype system that can translate simple clinical trial criteria from clinicaltrials.gov into MTNCQs (Xu et al. 2019). On the theoretical side, we will further develop our approach to also represent numeric information, such as measurements and dosages of medications, which are important for evaluating eligibility criteria of clinical trials (Crowe and Tao 2015; Bonomi and Jiang 2018). Similar to negation, inequalities in CQs also lead to an increase in complexity under certain answer semantics (Gutiérrez-Basulto et al. 2015). It would be interesting to investigate their behavior under closed-world semantics, also because inequalities are related to counting queries, for example, counting the number of diseases or treatments of a patient. Another open problem is to extend our results to temporal logics that support temporal roles, that is, where temporal operators can also be applied to role names.
where $A \in \mathbf{C}$ and $r \in \mathbf{R}$. An ABox is a finite set of concept assertions $A(a)$ and role assertions $r(a, b)$, where $a, b \in \mathbf{I}$. A TBox is a finite set of concept inclusions $C \sqsubseteq D$ and role inclusions $r \sqsubseteq s$, where $C, D$ are concepts and $r, s$ are roles.
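$r, s \in \mathbf{R}$, and $n \geq 1$. A knowledge base (or ontology) $\mathcal{K} = \mathcal{T} \cup \mathcal{A}$ consists of a TBox $\mathcal{T}$ and an ABox $\mathcal{A}$. We refer to the set of individual names occurring in $\mathcal{K}$ by $\mathrm{Ind}(\mathcal{K})$. We write $C \equiv D$ to abbreviate the two inclusions $C \sqsubseteq D$, $D \sqsubseteq C$, and similarly for role inclusions.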
Logic $\mathcal{TELH}^{c\Diamond,\mathrm{lhs},-}$. We recall the relevant definitions from Borgwardt et al. (2019), starting with the MTL operators that are used in $\mathcal{TELH}^{c\Diamond,\mathrm{lhs},-}$. LTL formulas are formulated over a finite set $P$ of propositional variables. In this section,

Fig. 1. The resulting intervals of the induced functions of the diamond operators when applied to the intervals at which p holds (denoted below the timeline).

While c and w i ∶= ∅ otherwise. For an illustration see Figure 1. We will omit the parentheses in ⋆ ◊(M) for a cleaner presentation. If M is empty, then ⋆

Fig. 2. The minimal canonical model I K C .The named individuals p1, p2, p3, and c3 are depicted by dark grey figures, the remaining anonymous objects by light grey.
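is obtained by the following steps.
1. Set $\Delta^{\mathcal{I}_{\mathcal{K}}} := \mathbf{I}$ and $a^{\mathcal{I}_i} := a$ for all $a \in \mathbf{I}$ and $i \in \mathbb{Z}$.
2. For each time point $i \in \mathbb{Z}$, define $A^{\mathcal{I}_i} := \{a \mid \mathcal{K} \models A(a, i)\}$ for all $A \in \mathbf{C}$ and $r^{\mathcal{I}_i} := \{(a, b) \mid \mathcal{K} \models r(a, b, i)\}$ for all $r \in \mathbf{R}$.
3. Repeat the following steps:
(a) Select an element $d \in \Delta^{\mathcal{I}_{\mathcal{K}}}$ that has not been selected before and, for each $i \in \mathbb{Z}$, let $V_i := \{\exists r.B \mid d \in A^{\mathcal{I}_i},\ d \notin (\exists r.B)^{\mathcal{I}_i},\ \mathcal{K} \models A \sqsubseteq \exists r.B,\ A, B \in \mathbf{C}\}$.
(b) For each $\exists r.B$ that is minimal in some $V_i$, add a fresh element $e_{rB}$ to $\Delta^{\mathcal{I}_{\mathcal{K}}}$. For all $i \in \mathbb{Z}$ and $\mathcal{K} \models B \sqsubseteq A$, add $e_{rB}$ to $A^{\mathcal{I}_i}$.
(c) For all $i \in \mathbb{Z}$, minimal $\exists r.B$ in $V_i$, and $\mathcal{K} \models r \sqsubseteq s$, add $(d, e_{rB})$ to $s^{\mathcal{I}_i}$.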
Every endomorphism of $\mathcal{I}_{\mathcal{K}}$ is a strong isomorphism.