In this chapter, we introduce the basic notions of Description Logic, including syntax, semantics and reasoning services, and we explain how the latter are used in applications.
The concept language of the DL ALC
In this section, we will describe the central notions of Description Logic first on an intuitive level and then on a more precise level. As a running example, we use the domain of university courses and teaching, and we will use a conceptualisation given informally, in graphical form, in Figure 2.1. Please note that this is one way of viewing university teaching – which might be very different from the reader's way of viewing it. Also, as it is an informal representation, different readers may interpret arrows in different ways; that is, our representation does not come with a well-defined semantics that would inform us in an unambiguous way how to interpret the different arrows. In the next sections, we will describe our way of viewing university teaching in a DL knowledge base, thereby establishing some constraints on the meaning of terms like “Professor” and “teaches” used in Figure 2.1 and throughout this section.
In Description Logic, we assume that we want to describe some abstraction of some domain of interest, and that this abstraction is populated by elements. We use three main building blocks to describe these elements:
• Concepts represent sets of elements and can be viewed as unary predicates. Concepts are built from concept names and role names (see below) using the constructors provided by the DL used. The set a concept represents is called its extension. For example, Person and Course are concept names, and m is an element in the extension of Person and c6 is in the extension of Course. To make our life a bit easier, we often use “is a” as an abbreviation for “is in the extension of” as, for example, in “m is a Person”.
• Role names stand for binary relations on elements and can be viewed as binary predicates. If a role r relates one element with another element, then we call the latter one an r-filler of the former one. For example, if m teaches c6, then we call c6 a teaches-filler of m.
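The set-based reading of concepts and roles described above can be made concrete with a small sketch. The elements m and c6 and the names Person, Course and teaches come from the text; the element c7, the dictionary layout and the helper functions `is_a` and `fillers` are illustrative assumptions, not notation used by the book.

```python
# Toy interpretation: concept names denote sets of elements,
# role names denote binary relations (sets of pairs) on elements.
concept_ext = {
    "Person": {"m"},
    "Course": {"c6", "c7"},  # c7 is an extra, made-up element
}
role_ext = {
    "teaches": {("m", "c6")},
}


def is_a(element, concept_name):
    """True if element is in the extension of the given concept name."""
    return element in concept_ext[concept_name]


def fillers(role_name, element):
    """Return the set of r-fillers of element for the given role name."""
    return {y for (x, y) in role_ext[role_name] if x == element}


print(is_a("m", "Person"))      # m is a Person
print(fillers("teaches", "m"))  # the teaches-fillers of m
```

Viewing a concept name as a unary predicate and a role name as a binary predicate amounts to the membership tests `element in concept_ext[...]` and `(x, y) in role_ext[...]`.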
This is, to the best of our knowledge, the first textbook dedicated solely to Description Logic (DL), a very active research area in logic-based knowledge representation and reasoning that goes back to the late 1980s and that has a wide range of applications in knowledge-intensive information systems. In this introductory chapter we will sketch what DLs are, how they are used and where they come from historically. We will also explain how to use this book.
What are DLs and where do they come from?
Description logics (DLs) are a family of knowledge representation languages that can be used to represent knowledge of an application domain in a structured and well-understood way. The name description logics is motivated by the fact that, on the one hand, the important notions of the domain are represented by concept descriptions, i.e., expressions that are built from atomic concepts (unary predicates) and atomic roles (binary predicates) using the concept and role constructors provided by the particular DL; on the other hand, DLs differ from their predecessors, such as semantic networks and frames, in that they are equipped with a logic-based semantics which, up to some differences in notation, is actually the same semantics as that of classical first-order logic.
Description logics typically separate domain knowledge into two components, a terminological part called the TBox and an assertional part called the ABox, with the combination of a TBox and an ABox being called a knowledge base (KB). The TBox represents knowledge about the structure of the domain (similar to a database schema), while the ABox represents knowledge about a concrete situation (similar to a database instance). TBox statements capturing knowledge about a university domain might include, e.g., a teacher is a person who teaches a course, a student is a person who attends a course and students do not teach, while ABox statements from the same domain might include Mary is a person, CS600 is a course and Mary teaches CS600. As already mentioned, a crucial feature of DLs is that such statements have a formal, logic-based semantics.
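As an illustration, the informal statements above might be written in DL syntax roughly as follows. This is one plausible formalisation under our own reading of the English sentences, not necessarily the encoding used later in the book; in particular, the concept names Teacher and Student and the role name attends are taken directly from the informal wording.

```latex
\begin{align*}
\text{TBox:}\quad
  & \mathit{Teacher} \equiv \mathit{Person} \sqcap \exists \mathit{teaches}.\mathit{Course} \\
  & \mathit{Student} \equiv \mathit{Person} \sqcap \exists \mathit{attends}.\mathit{Course} \\
  & \exists \mathit{teaches}.\top \sqsubseteq \neg \mathit{Student} \\[4pt]
\text{ABox:}\quad
  & \mathit{Mary} : \mathit{Person} \\
  & \mathit{CS600} : \mathit{Course} \\
  & (\mathit{Mary}, \mathit{CS600}) : \mathit{teaches}
\end{align*}
```

Under this reading, the logic-based semantics licenses conclusions that are not stated explicitly: Mary is a Person who teaches a Course, so she is a Teacher, and since she teaches something she is not a Student.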
The main purpose of this chapter is to show that sets of models of ALC concepts or knowledge bases satisfy several interesting properties, which can be used to prove expressivity and decidability results. To be more precise, we will introduce the notion of bisimulation between elements of ALC interpretations, and prove that ALC concepts cannot distinguish between bisimilar elements. On the one hand, we will use this to show restrictions of the expressive power of ALC: number restrictions, inverse roles and nominals cannot be expressed within ALC. On the other hand, we will employ bisimulation invariance of ALC to show that ALC has the tree model property and satisfies closure under disjoint union of models. We will also show that ALC has the finite model property, though not as a direct consequence of bisimulation invariance. These properties will turn out to be useful in subsequent chapters and of interest to people writing knowledge bases: for example, ALC's tree model property implies that it is too weak to describe the ring structure of many chemical molecules since any ALC knowledge base trying to describe such a structure will also have acyclic models. In the present chapter, we will only use the finite model property (or rather the stronger bounded model property) to show a basic, not complexity-optimal decidability result for reasoning in ALC. For the sake of simplicity, we concentrate here on the terminological part of ALC, i.e., we consider only concepts and TBoxes, but not ABoxes.
To obtain a better intuitive view of the definitions and results introduced below, one should recall that interpretations of ALC can be viewed as graphs, with edges labelled by roles and nodes labelled by sets of concept names. More precisely, in such a graph
• the nodes are the elements of the interpretation, and each node is labelled with all the concept names to which the corresponding element belongs;
• an edge with label r between two nodes says that the corresponding two elements of the interpretation are related by the role r.
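The graph view lends itself to a short sketch of how bisimilarity could be checked on finite interpretations. The code below assumes the usual definition (related elements agree on all concept names, and every r-successor on one side is matched by a related r-successor on the other side, in both directions) and computes the largest such relation by repeatedly discarding violating pairs. The data layout, the function names and the two example interpretations are our own assumptions for illustration.

```python
from itertools import product


def successors(edges, node, role):
    """All targets of role-labelled edges leaving node."""
    return {t for (s, r, t) in edges if s == node and r == role}


def greatest_bisimulation(labels1, edges1, labels2, edges2):
    """Largest bisimulation between two finite interpretations.

    labelsN maps each element to its set of concept names;
    edgesN is a set of (source, role, target) triples.
    """
    roles = {r for (_, r, _) in edges1 | edges2}
    # Start with all pairs that agree on their concept names.
    rel = {(d, e) for d, e in product(labels1, labels2)
           if labels1[d] == labels2[e]}
    changed = True
    while changed:
        changed = False
        for (d, e) in list(rel):
            for r in roles:
                succ_d = successors(edges1, d, r)
                succ_e = successors(edges2, e, r)
                # Forth: each r-successor of d is matched by one of e.
                forth = all(any((d2, e2) in rel for e2 in succ_e)
                            for d2 in succ_d)
                # Back: each r-successor of e is matched by one of d.
                back = all(any((d2, e2) in rel for d2 in succ_d)
                           for e2 in succ_e)
                if not (forth and back):
                    rel.discard((d, e))
                    changed = True
                    break
    return rel


# d has one r-successor, e has two; all successors are labelled B.
labels1 = {"d": {"A"}, "d1": {"B"}}
edges1 = {("d", "r", "d1")}
labels2 = {"e": {"A"}, "e1": {"B"}, "e2": {"B"}}
edges2 = {("e", "r", "e1"), ("e", "r", "e2")}

rel = greatest_bisimulation(labels1, edges1, labels2, edges2)
print(("d", "e") in rel)  # True: d and e are bisimilar
```

In this example d and e are bisimilar although they have different numbers of r-successors, which hints at why number restrictions cannot be expressed in ALC.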
In Chapter 4, we looked at concrete algorithms for reasoning in ALC and some of its extensions. In this chapter, we take a more abstract viewpoint and discuss the computational complexity of reasoning, which essentially is the question of how efficient we can expect any reasoning algorithm for a given problem to be, even on very difficult (“worst-case”) inputs. Although we will concentrate on the basic reasoning problems satisfiability and subsumption for the sake of simple exposition, all results established in this chapter also apply to the corresponding KB consistency problem. In fact, there are very few relevant cases in which the computational complexity of satisfiability and of KB consistency diverge. We start with ALC and show that the complexity of satisfiability and of subsumption depends on the TBox formalism that is used: without TBoxes and with acyclic TBoxes, both problems are PSPACE-complete, while general TBoxes raise the complexity to EXPTIME-complete. Then we consider two extensions of ALC, ALCOI and ALCOIQ, and show that satisfiability and subsumption are more difficult in these DLs: in ALCOI, satisfiability and subsumption are EXPTIME-complete already without TBoxes. We show only hardness to illustrate the increase in complexity. In ALCOIQ, reasoning even becomes NEXPTIME-complete (without TBoxes). Again, we show only hardness. Finally, we consider two extensions of ALC that render reasoning undecidable: role value maps and a certain concrete domain based on the natural numbers and incrementation.
Before starting to analyse the computational complexity of DLs, let us recall some basics of complexity theory. A complexity class is a set of problems that share some relevant computational property such as being solvable within the same resource bounds. For example, PTIME is the class of all problems that can be solved by a deterministic Turing machine in time polynomial in the size of the input. In this chapter, we will mainly be concerned with the following standard complexity classes, which we order according to set inclusion:
PTIME ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ NEXPTIME.
The reader is referred to standard textbooks on complexity theory for the exact definition of these classes [AB09, Sip97, Pap94]. It is commonly believed that the inclusions shown above are all strict, but proofs have not yet been found.
An important application of ontologies is to provide semantics and domain knowledge for data. Traditionally, data has been stored and managed inside relational database systems (aka SQL databases) where it is organised according to a pre-specified schema that describes its structure and meaning. In recent years, though, less and less data comes from such controlled sources. In fact, a lot of data is now found on the web, in social networks and so on, where typically neither its structure nor its meaning is explicitly specified; moreover, data coming from such sources is typically highly incomplete. Ontologies can help to overcome these problems by providing semantics and background knowledge, leading to a paradigm that is often called ontology-mediated querying. As an example, consider data about used-car offers. The ontology can add knowledge about the domain of cars, stating for example that a grand tourer is a kind of sports car. In this way, it becomes possible to return a car that the data identifies as a grand tourer as an answer to a query that asks for all sports cars. When data is involved, a fundamental description logic reasoning service is therefore answering database queries with respect to an ontology. Since answers to full SQL queries are uncomputable in the presence of ontologies, the prevailing query language is conjunctive queries (CQs) and slight extensions thereof such as unions of conjunctive queries (UCQs) and positive existential queries. Conjunctive queries are essentially the select-from-where fragment of SQL, written in logic.
In this chapter, we study conjunctive query answering in the presence of ontologies that take the form of a DL TBox. In particular, we show how to implement this reasoning service using standard database systems such as relational (SQL) systems and Datalog engines, taking advantage of those systems’ efficiency and maturity. Since database systems are not prepared to deal with TBoxes, we need a way to “sneak them in”. While there are several approaches to achieve this, here we will concentrate on query rewriting: given a CQ q to be answered and a TBox T, produce a query qT such that, for any ABox A, the answers to q on A and T are identical to the answers to qT given by a database system that stores A as data.
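To give a feel for query rewriting, here is a deliberately minimal sketch for a very restricted setting: the TBox contains only inclusions between concept names, and the query is a single concept atom. The rewritten query is then a union over all concept names that the TBox makes subsumed by the queried one. The table layout `concept_assertion`, the function names and the sample data are assumptions made for illustration; practical rewriting procedures handle far more expressive TBoxes and queries.

```python
import sqlite3

# Hypothetical TBox: pairs (sub, sup) meaning sub ⊑ sup,
# restricted to inclusions between concept names.
tbox = [("GrandTourer", "SportsCar"), ("SportsCar", "Car")]


def subsumees(concept, tbox):
    """All concept names that the TBox makes subsumed by concept,
    including concept itself (reflexive-transitive closure)."""
    result = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in tbox:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def rewrite(concept, tbox):
    """Rewrite the query concept(x) into SQL that no longer needs the TBox."""
    names = sorted(subsumees(concept, tbox))
    placeholders = ", ".join("?" for _ in names)
    sql = (
        "SELECT DISTINCT individual FROM concept_assertion "
        f"WHERE concept IN ({placeholders}) ORDER BY individual"
    )
    return sql, names


# The ABox is stored as plain data; the database knows nothing about the TBox.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept_assertion (concept TEXT, individual TEXT)")
conn.executemany(
    "INSERT INTO concept_assertion VALUES (?, ?)",
    [("GrandTourer", "car1"), ("SportsCar", "car2"), ("Car", "car3")],
)

sql, params = rewrite("SportsCar", tbox)
answers = [row[0] for row in conn.execute(sql, params)]
print(answers)  # ['car1', 'car2']
```

The grand tourer car1 is returned as a sports car although the data never says so explicitly, while car3, known only to be a car, is not. The TBox is consulted only when the query is rewritten, so the database system evaluates an ordinary query over the stored ABox.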
As discussed in Section 1.2, DL systems have been used in a range of application domains, including configuration, software information and documentation systems and databases, where they have been used to support schema design, schema and data integration, and query answering. More recently, DLs have played a central role in the semantic web [Hor08], having been adopted as the basis for ontology languages such as OIL, DAML+OIL and OWL [HPSvH03]. This has rapidly become the most prominent application of DLs, and DL knowledge bases are now often referred to as ontologies.
In computer science, an ontology is a conceptual model specified using some ontology language; this idea was succinctly captured by Gruber in his definition of an ontology as “an explicit specification of a conceptualisation” [Gru93]. Early ontology languages were often based on frames, but as in the case of early DLs, a desire to provide them with precise semantics and well-defined reasoning procedures increasingly led to ontology languages becoming logic-based. The OIL ontology language was something of a compromise: it had a frame-based syntax, but complemented this with a formal semantics based on a mapping to SHIQ. In DAML+OIL and OWL the DL-based semantics were retained, but the frame-based syntax of OIL was replaced with a structure much closer to DL-style axioms.
In Section 8.1 we will discuss OWL in more detail, examining its relationship to RDF and to SROIQ, its syntax (or rather syntaxes), some features that go beyond what is typically found in a DL, and its various profiles or sub-languages. In Section 8.2 we will look at some interesting examples of OWL tools and applications.
The OWL ontology language
OWL is a semantic web ontology language developed by the World Wide Web Consortium (W3C), an international community that defines Web technologies. W3C follows a consensus-driven process for the publication of specification documents for Web technologies, in particular Recommendations, which are considered Web standards. OWL was first standardised in 2004 and then revised as OWL 2, which became a W3C Recommendation in 2009, with a second edition published in 2012. Although OWL uses a variety of more “Web-friendly” syntaxes based, e.g., on XML and RDF, its basic structure corresponds closely to that of a DL, and includes such familiar constructs as existential and value restrictions, (qualified) number restrictions, inverse roles, nominals and role hierarchies (see Chapter 2).