In data exchange, we are interested in computing certain answers to a query. However, we do not yet know when such a computation is feasible. The goal of the chapter is to answer this question.
The bad news is that the problem of computing certain answers for relational calculus (equivalently, relational algebra or FO) queries is undecidable, even in the absence of target dependencies. But the good news is that the problem becomes decidable, and, indeed, tractable, for unions of conjunctive queries over mappings with a weakly acyclic set of tgds. Conjunctive queries, as was already mentioned several times, play a very important role and are very common, and mappings with weakly acyclic sets of tgds, as we have seen, are the ones behaving particularly well when it comes to materializing solutions.
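For unions of conjunctive queries, certain answers can be obtained from a materialized universal solution by naive evaluation. The query is evaluated with labeled nulls treated as ordinary values, and every answer tuple that contains a null is then discarded. The Python sketch below illustrates this under simplifying assumptions of our own:

- instances are dictionaries mapping relation names to sets of tuples;
- variables are written with a leading `?`;
- the universal solution `J` is given rather than computed;
- the `Flight` relation and its values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Set, Tuple, Union


@dataclass(frozen=True)
class Null:
    """A labeled null: an unknown value introduced when materializing a solution."""
    label: int


Value = Union[str, Null]
Instance = Dict[str, Set[Tuple[Value, ...]]]
Atom = Tuple[str, Tuple[str, ...]]  # (relation, terms); terms starting with "?" are variables


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(atoms: List[Atom], inst: Instance, env: Dict[str, Value]) -> Iterator[Dict[str, Value]]:
    """Enumerate assignments mapping every atom into inst, treating nulls as ordinary values."""
    if not atoms:
        yield env
        return
    rel, terms = atoms[0]
    for fact in inst.get(rel, set()):
        if len(fact) != len(terms):
            continue
        new_env, ok = dict(env), True
        for term, val in zip(terms, fact):
            if is_var(term):
                if new_env.setdefault(term, val) != val:  # variable already bound to another value
                    ok = False
                    break
            elif term != val:  # a constant in the query must match exactly
                ok = False
                break
        if ok:
            yield from matches(atoms[1:], inst, new_env)


def certain_answers(ucq: List[Tuple[Tuple[str, ...], List[Atom]]], solution: Instance) -> Set[Tuple[Value, ...]]:
    """Naive evaluation of a union of CQs on a universal solution, keeping null-free tuples only."""
    answers = set()
    for head, body in ucq:
        for env in matches(body, solution, {}):
            row = tuple(env[v] for v in head)
            if not any(isinstance(v, Null) for v in row):
                answers.add(row)
    return answers


# Hypothetical universal solution containing two labeled nulls.
J: Instance = {"Flight": {("Paris", "Santiago", Null(1)), ("Paris", "Lima", Null(2))}}
q_routes = [(("?x", "?y"), [("Flight", ("?x", "?y", "?z"))])]
q_airline = [(("?z",), [("Flight", ("?x", "?y", "?z"))])]
```

Here `certain_answers(q_routes, J)` returns both routes. `certain_answers(q_airline, J)` is empty, because every candidate answer is a null.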
The positive result, however, breaks down when we extend conjunctive queries with inequalities. Still, there is a meaningful class of queries that extends conjunctive queries with a limited amount of negation. It can express interesting properties in the data exchange context and shares most of the good properties of conjunctive queries for data exchange.
Finally, we study the notion of query rewriting, i.e., when certain answers to a query Q can be computed by posing a possibly different query Q′ over a materialized solution. Such rewritings are easy for unions of conjunctive queries; our study concentrates on rewritings of relational algebra queries.
So far we have concentrated on handling data in data exchange, i.e., transforming source databases into target ones and answering queries over them. We now turn to manipulating information about schemas and schema mappings, known as metadata; that is, we deal with metadata management. In this short chapter we outline the key problems that need to be addressed in this context. They fall into two groups: the first concerns reasoning about mappings, and the second concerns manipulating mappings, i.e., building new mappings from existing ones.
Reasoning about schema mappings
As we have seen, mappings are logical specifications of the relationship between schemas, both in the relational and XML scenarios. In particular, we have seen many different logical languages that are used to specify mappings. Thus, a first natural problem that one would like to study in the context of metadata management is to characterize the properties that a mapping satisfies depending on the logical formulae that are used to define it. More precisely, one would like, in the first place, to understand whether the logical formulae used to specify a mapping are excessively restrictive in the sense that no source instance admits a solution, or at least restrictive in the sense that some source instances do not admit solutions. Note that this is different from the problem of checking for the existence of solutions, studied in Chapter 5.
Data exchange, as the name suggests, is the problem of exchanging data between different databases that have different schemas. One often needs to exchange data between existing legacy databases, whose schemas cannot be easily modified, and thus one needs to specify rules for translating data from one database to the other. These rules are known as schema mappings. Once a source database and a schema mapping are given, one needs to transfer data to the target, i.e., construct a target database. And once the target database is constructed, one needs to answer queries against it.
This problem is quite old; it has been studied, and systems have been built, but it was done in a rather ad hoc way. A systematic study of the problem of data exchange commenced with the 2003 paper “Data exchange: semantics and query answering” by Fagin, Kolaitis, Miller, and Popa, published in the proceedings of the International Conference on Database Theory. A large number of followup papers appeared, and for a while data exchange was one of the most active research topics in databases. Foundational questions related to data exchange largely revolved around three key problems:
how to build a target solution;
how to answer queries over target solutions; and
how to manipulate schema mappings themselves.
The last question is also known under the name of metadata management, since mappings represent metadata, rather than data in the database.
As we know by now, solutions in data exchange are not unique; indeed, if a source instance has a solution under a relational mapping M, then it has infinitely many of them. But if many solutions exist, which one should we materialize? To answer this question, we must be able to distinguish good solutions from others. As we already mentioned in Section 4.2, good solutions in data exchange are usually identified with the most general solutions (see Example 4.3), which in turn can be characterized as the universal solutions. We will see in this chapter that universal solutions admit different equivalent characterizations.
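When the mapping has no target dependencies (Σt is empty), a universal solution can be built by chasing the source instance with the source-to-target tgds. Every match of a tgd body produces the head facts, with a fresh labeled null for each existentially quantified variable. Below is a minimal Python sketch. It assumes a toy encoding of our own: instances are dictionaries of tuple sets, and variables carry a leading `?`. The relations `Emp`, `Works`, and `Dept` are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, List, Set, Tuple, Union


@dataclass(frozen=True)
class Null:
    """A labeled null created by the chase."""
    label: int


Value = Union[str, Null]
Instance = Dict[str, Set[Tuple[Value, ...]]]
Atom = Tuple[str, Tuple[str, ...]]     # terms starting with "?" are variables
StTgd = Tuple[List[Atom], List[Atom]]  # (body over source, head over target)


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(atoms: List[Atom], inst: Instance, env: Dict[str, Value]) -> Iterator[Dict[str, Value]]:
    """Enumerate assignments that map every atom into inst."""
    if not atoms:
        yield env
        return
    rel, terms = atoms[0]
    for fact in inst.get(rel, set()):
        if len(fact) != len(terms):
            continue
        new_env, ok = dict(env), True
        for term, val in zip(terms, fact):
            if is_var(term):
                if new_env.setdefault(term, val) != val:
                    ok = False
                    break
            elif term != val:
                ok = False
                break
        if ok:
            yield from matches(atoms[1:], inst, new_env)


def chase_st(source: Instance, tgds: List[StTgd]) -> Instance:
    """Fire every st-tgd once per body match, inventing fresh nulls for existential variables."""
    target: Instance = {}
    fresh = count(1)
    for body, head in tgds:
        for env in matches(body, source, {}):
            ext = dict(env)
            for _, terms in head:
                for t in terms:
                    if is_var(t) and t not in ext:  # existentially quantified variable
                        ext[t] = Null(next(fresh))
            for rel, terms in head:
                fact = tuple(ext[t] if is_var(t) else t for t in terms)
                target.setdefault(rel, set()).add(fact)
    return target


# Hypothetical mapping: Emp(n) -> exists d. Works(n, d) and Dept(d)
S: Instance = {"Emp": {("ann",), ("bob",)}}
sigma_st = [([("Emp", ("?n",))], [("Works", ("?n", "?d")), ("Dept", ("?d",))])]
T = chase_st(S, sigma_st)
```

Each employee gets its own null for the unknown department. The result contains nothing beyond what the tgds force, which is what makes it universal.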
The existence of solutions does not in general imply the existence of universal solutions (in fact, checking whether universal solutions exist is an undecidable problem). However, this negative theoretical result does not pose serious problems in data exchange. Recall from the previous chapter that the class of mappings M = (Rs, Rt, Σst, Σt) such that Σt consists of a set of egds and a weakly acyclic set of tgds is particularly well-behaved for data exchange. Indeed, for this class the existence of solutions, which is undecidable in general, can be checked in polynomial time. Moreover, if a solution exists, at least one solution can be computed efficiently. We will see in this chapter that for this class, the existence of solutions implies the existence of universal solutions. Furthermore, when a universal solution exists, it can be computed efficiently.
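Weak acyclicity is a syntactic condition on the tgds in Σt. It is usually defined through a dependency graph whose nodes are positions (relation, attribute index). For each tgd, take every body position holding a variable x that also occurs in the head. That position gets a normal edge to every head position of x, and a special edge to every head position of an existentially quantified variable. The set is weakly acyclic when no cycle passes through a special edge.

The Python sketch below checks this condition. The encoding of tgds is our own assumption, and egds are ignored because they play no role in the definition.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # terms starting with "?" are variables
Tgd = Tuple[List[Atom], List[Atom]]  # (body, head); head-only variables are existential
Position = Tuple[str, int]           # (relation name, attribute index)


def is_var(term: str) -> bool:
    return term.startswith("?")


def dependency_graph(tgds: List[Tgd]) -> Tuple[Dict[Position, Set[Position]], Set[Tuple[Position, Position]]]:
    """Return all edges of the dependency graph, plus the subset of special edges."""
    edges: Dict[Position, Set[Position]] = defaultdict(set)
    special: Set[Tuple[Position, Position]] = set()
    for body, head in tgds:
        body_vars = {t for _, ts in body for t in ts if is_var(t)}
        head_vars = {t for _, ts in head for t in ts if is_var(t)}
        existential = head_vars - body_vars
        for rel, ts in body:
            for i, x in enumerate(ts):
                if not is_var(x) or x not in head_vars:
                    continue  # only body variables propagated to the head create edges
                p = (rel, i)
                for hrel, hts in head:
                    for j, t in enumerate(hts):
                        q = (hrel, j)
                        if t == x:
                            edges[p].add(q)  # normal edge: the value is copied
                        elif t in existential:
                            edges[p].add(q)  # special edge: a fresh null is created
                            special.add((p, q))
    return edges, special


def reachable(edges: Dict[Position, Set[Position]], start: Position, goal: Position) -> bool:
    """Depth-first search from start; True if goal can be reached (including start == goal)."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def is_weakly_acyclic(tgds: List[Tgd]) -> bool:
    """True if no cycle of the dependency graph goes through a special edge."""
    edges, special = dependency_graph(tgds)
    # A special edge p -> q lies on a cycle exactly when p is reachable from q.
    return not any(reachable(edges, q, p) for p, q in special)


# E(x, y) -> exists z. E(y, z): the chase may run forever.
looping = [([("E", ("?x", "?y"))], [("E", ("?y", "?z"))])]
# E(x, y) -> E(y, x) and E(x, y) -> exists z. F(y, z): cycles only through normal edges.
safe = [([("E", ("?x", "?y"))], [("E", ("?y", "?x"))]),
        ([("E", ("?x", "?y"))], [("F", ("?y", "?z"))])]
```

The two cases behave differently. `is_weakly_acyclic(looping)` is `False` because the special edge from (E, 1) to itself forms a cycle. `is_weakly_acyclic(safe)` is `True`, since nothing leads back out of relation F.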
In this chapter we shall study data exchange for XML documents. XML itself was invented as a standard for data exchange on the Web, albeit under a different interpretation of the term “data exchange”. In the Web context, it typically refers to a common, flexible format that everyone agrees on, and that, therefore, facilitates the transfer of data between different sites and applications. When we speak of data exchange, we mean transforming databases under different schemas with respect to schema mapping rules, and querying the exchanged data.
XML documents and schemas
In this section we review the basic definitions regarding XML. Note that a simple example was already shown in Chapter 1. XML documents have a hierarchical structure, usually abstracted as a tree. An example is shown in Figure 10.1. This document contains information about rulers of European countries. Its structure is represented by a labeled tree; in this example, the labels are europe, country, and ruler. In the XML context, these are referred to as element types. We assume that the labels come from a finite labeling alphabet and correspond, roughly, to relation names from the classical relational setting.
The root of the tree is labeled europe, and it has two children that are labeled country. These have data values, given in parentheses: the first one is Scotland, and the second one is England. Each country in turn has a set of rulers.
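The labeled-tree abstraction of an XML document can be modeled directly. Below is a minimal Python sketch. It assumes that a node carries its element type, an optional data value, and an ordered list of children. The ruler values `R1`, `R2`, and `R3` are placeholders, not the names shown in Figure 10.1. The helper `values_along` is hypothetical and only illustrates navigation by labels.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    label: str                   # element type, drawn from a finite alphabet
    value: Optional[str] = None  # data value attached to the node, if any
    children: List["Node"] = field(default_factory=list)


def nodes(t: Node) -> Iterator[Node]:
    """Preorder traversal of the tree."""
    yield t
    for child in t.children:
        yield from nodes(child)


def element_types(t: Node) -> Set[str]:
    """The set of labels actually used in the tree."""
    return {n.label for n in nodes(t)}


def values_along(t: Node, path: List[str]) -> List[Optional[str]]:
    """Data values of the nodes reached from t by following a downward path of labels."""
    if not path or t.label != path[0]:
        return []
    if len(path) == 1:
        return [t.value]
    result: List[Optional[str]] = []
    for child in t.children:
        result.extend(values_along(child, path[1:]))
    return result


# The europe document, with placeholder ruler values.
doc = Node("europe", children=[
    Node("country", "Scotland", [Node("ruler", "R1"), Node("ruler", "R2")]),
    Node("country", "England", [Node("ruler", "R3")]),
])
```

With this encoding, `values_along(doc, ["europe", "country"])` yields `["Scotland", "England"]`, the two country values in document order.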