This book continues the series of volumes containing reprints of the papers in the original Cabal Seminar volumes of the Springer Lecture Notes in Mathematics series [Cabal i, Cabal ii, Cabal iii, Cabal iv], unpublished material, and new papers. The first volume, [Cabal I], contained papers on games, scales, and Suslin cardinals. In this volume, we continue with Parts III and IV of the project: Wadge degrees and pointclasses, and Projective ordinals. As in our first volume, each part contains an introductory survey (written by Alessandro Andretta and Alain Louveau for Part III and by Steve Jackson for Part IV) putting the papers into a present-day context.
In addition to the reprinted papers, this volume contains papers by Steel (More measures from AD) and Martin (Projective sets and cardinal numbers) that date back to the period of the original Cabal publications but were not included in the old volumes. Jackson contributed a new paper, Regular cardinals without the weak partition property, with recent results that fit well with the topic of Part IV. The paper Early investigations of the degrees of Borel sets by Wadge is a historical overview of the development of the basic theory of the Wadge degrees. Table 1 gives an overview of the papers in this volume with their original references.
As emphasized in our first volume, our project is not to be understood as a historical edition of old papers. In the retyping process, we uniformized and modernized notation and numbering of sections and theorems.
In this paper, I give an overview/summary of the techniques used, and the results derived, in my 1984 PhD dissertation, Reducibility and Determinateness on the Baire Space. In particular, I focus on the calculation of the order type (and structure) of the collection of degrees of Borel sets.
§1 Introduction. I would like in this article to present an overview of the main results of my PhD dissertation, and of the game and other techniques used to derive them.
My first thought was to print the entire dissertation, but I quickly realized that it was too long—about ten times too long! Hopefully, this condensed version will still be useful. In producing such a drastically shortened account, I have omitted detailed proofs and many less important or intermediate results. Also, the remaining definitions and results are for the most part given informally.
In writing this I have in mind, first, colleagues (whether in Mathematics or Computing) who are not familiar with descriptive set theory but nevertheless would like to learn about “Wadge Degrees”. To make the material accessible to these readers I have included some basic information about, say, Borel sets that will be very familiar to Cabal insiders. However, my hope is that even experts in descriptive set theory may learn something, if not about my results, at least about the manner in which they were discovered. In particular, I would like to give some ‘classic’ notions, such as Boolean set operations, the attention they deserve.
This chapter proposes exercises and projects based on CouchDB, a recent database system which relies on many of the concepts presented so far in this book. In brief:
CouchDB adopts a semistructured data model, based on the JSON (JavaScript Object Notation) format; JSON offers a lightweight alternative to XML;
A database in CouchDB is schema-less: the structure of the JSON documents may vary at will depending on their specific features;
To cope with the absence of constraints that is the price of this flexibility, CouchDB proposes an original approach, based on structured materialized views that can be produced from document collections;
Views are defined in the MapReduce paradigm, allowing both parallel computation and incremental maintenance of their content (see the sketch after this list);
Finally, the system aspects of CouchDB illustrate most of the distributed data management techniques covered in the last part of the present book: distribution based on consistent hashing, support for data replication and reconciliation, horizontal scalability, parallel computing, and so forth.
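To make these ideas concrete, here is a hedged sketch (the document fields are hypothetical, but the view follows CouchDB's standard convention of JavaScript map functions that call emit(key, value)). A movie document might look like:

    {
      "_id": "movie:2001",
      "title": "2001: A Space Odyssey",
      "year": 1968,
      "actors": ["Keir Dullea", "Gary Lockwood"]
    }

and a view listing movies by actor could be defined by the following map function, which CouchDB evaluates once per document and whose output it materializes:

    // Emit one view row per actor, keyed by the actor's name.
    function (doc) {
      if (doc.actors) {
        doc.actors.forEach(function (actor) {
          emit(actor, doc.title);
        });
      }
    }

Because each document is processed independently, the map phase parallelizes naturally, and the view can be maintained incrementally: when a document is added or updated, only its own rows need to be recomputed.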
CouchDB is representative of the emergence of so-called key-value store systems that give up many features of the relational model, including schema, structured querying, and consistency guarantees, in favor of flexible data representation, simplicity and scalability. It illustrates the “No[tOnly]SQL” trend with an original and consistent approach to large-scale management of “documents” viewed as autonomous, rich pieces of information that can be managed independently, in contrast with relational databases, which take the form of a rich graph of interrelated flat tuples.
This chapter introduces XPath and XQuery, two related languages that respectively serve to navigate and query XML documents. XPath is actually a subset of XQuery. Both languages, specified by the W3C, are tightly associated and share in particular the same conceptual modeling of XML documents. Note that the XPath fragment of XQuery has a well-identified purpose (expressing “paths” in an XML tree) and as such can be used independently in other XML processing contexts, such as inside the XSLT transformation language. XQuery uses XPath as a core language for path expressions and navigation.
XQuery is a declarative language and intends to play for XML data the role that SQL plays in the relational realm. At the syntactic level, it is somewhat inspired by SQL. More importantly, it is expected to benefit from a mixture of physical storage, indexing, and optimization techniques in order to retrieve its result by accessing only a small fraction of its input. XQuery is therefore an appropriate choice when large XML documents or large collections of documents must be manipulated.
In this chapter, we use as running example a movies XML database. Each XML document represents one movie and is similar in structure to the sample document shown in Figure 2.1.
We begin the chapter with a bird's-eye view of XQuery principles, introducing the XML data model that supports the interpretation of path expressions and queries, and showing the main features of the two languages. We then consider XPath and XQuery in more detail, though in a rather informal way.
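As a first taste of both languages, here is a hedged sketch over the movies database (the element names movie, title, and year are assumptions about the structure of Figure 2.1; adapt them to your documents):

    (: XPath: the titles of all movies released after 2000. :)
    doc("movies.xml")//movie[year > 2000]/title

    (: XQuery: the same selection as a FLWOR expression,
       this time building new elements around the result. :)
    for $m in doc("movies.xml")//movie
    where $m/year > 2000
    order by $m/title
    return <recent>{ $m/title/text() }</recent>

The XPath expression alone suffices for pure navigation; the FLWOR form adds iteration, filtering, ordering, and construction of new XML.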
Lucene is an open-source tunable indexing platform often used for full-text indexing of Web sites. It implements an inverted index, creating posting lists for each term of the vocabulary. This chapter proposes some exercises to discover the Lucene platform and test its functionalities through its Java API.
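Before turning to Lucene itself, the following toy sketch in Java (not Lucene code; all names are ours) shows the data structure at work: each term of the vocabulary is mapped to its posting list, that is, the identifiers of the documents that contain it.

    import java.util.*;

    // A toy inverted index: term -> posting list of document identifiers.
    public class ToyInvertedIndex {
        private final Map<String, List<Integer>> postings = new HashMap<>();

        // Assumes documents are added in increasing docId order, so each
        // posting list stays sorted and duplicate-free.
        public void addDocument(int docId, String text) {
            for (String term : text.toLowerCase().split("\\W+")) {
                if (term.isEmpty()) continue;
                List<Integer> plist =
                        postings.computeIfAbsent(term, t -> new ArrayList<>());
                if (plist.isEmpty() || plist.get(plist.size() - 1) != docId) {
                    plist.add(docId);
                }
            }
        }

        // A keyword query is answered by a simple posting-list lookup.
        public List<Integer> search(String term) {
            return postings.getOrDefault(term.toLowerCase(),
                                         Collections.emptyList());
        }
    }

Lucene implements the same idea at scale, adding compression of the posting lists, ranking, and much more.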
PRELIMINARY: A LUCENE SANDBOX
We provide a simple graphical interface that lets you capture a collection of Web documents (from a given Web site), index it, and search for documents matching a keyword query. The tool is implemented with Lucene (surprise!) and helps to assess the impact of the search parameters, including ranking factors.
You can download the program from our Web site. It consists of a Java archive that can be executed right away (provided you have a decent Java installation on your computer). Figure 17.1 shows a screenshot of the main page. It allows you to
Download a set of documents collected from a given URL (including local addresses),
Index and query those documents,
Consult the information used by Lucene to present ranked results.
Use this tool as a first contact with full-text search and information retrieval. The projects proposed at the end of the chapter offer some suggestions for building a similar application.
INDEXING PLAIN TEXT WITH LUCENE – A FULL EXAMPLE
We now embark on a practical experiment with Lucene. First, download the Java packages from the Web site http://lucene.apache.org/java/docs/.
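Once the packages are on your classpath, a minimal index-then-search program looks roughly as follows. This is a sketch: the classes and methods are from the standard Lucene API, but details vary between Lucene versions, so check the documentation matching your release (the query parser lives in a separate queryparser module).

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;

    public class LuceneExample {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(Paths.get("index")); // on-disk index
            StandardAnalyzer analyzer = new StandardAnalyzer();   // tokenizer, etc.

            // Indexing: add one plain-text document with a single field.
            IndexWriter writer =
                    new IndexWriter(dir, new IndexWriterConfig(analyzer));
            Document doc = new Document();
            doc.add(new TextField("content",
                    "Lucene maintains an inverted index with posting lists.",
                    Field.Store.YES));
            writer.addDocument(doc);
            writer.close();

            // Searching: parse a keyword query and print the top ten hits.
            DirectoryReader reader = DirectoryReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser("content", analyzer);
            ScoreDoc[] hits =
                    searcher.search(parser.parse("posting"), 10).scoreDocs;
            for (ScoreDoc hit : hits) {
                System.out.println(searcher.doc(hit.doc).get("content"));
            }
            reader.close();
        }
    }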
The vision of the Semantic Web is that of a world-wide distributed architecture where data and services easily interoperate. This vision is not yet a reality in the Web of today, in which, given a particular need, it is difficult to find a resource appropriate to it. Also, given a relevant resource, it is not easy to understand what it provides and how to use it. To overcome such limitations, facilitate interoperability, and thereby enable the Semantic Web vision, the key idea is to also publish semantic descriptions of Web resources. These descriptions rely on semantic annotations, typically logical assertions that relate resources to terms in predefined ontologies. This is the topic of the chapter.
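For instance (a hedged sketch in RDF's Turtle syntax; the URIs and the univ: vocabulary are hypothetical), annotating a course Web page with ontology terms amounts to publishing assertions such as:

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix univ: <http://example.org/university#> .

    # The resource at this URL is an instance of the ontology class
    # univ:DatabaseCourse, and it is related to its lecturer.
    <http://example.org/courses/db101>
        rdf:type      univ:DatabaseCourse ;
        univ:taughtBy <http://example.org/staff/asmith> .

A machine that knows the univ: ontology can now reason about the page, for example inferring that it describes a course if the ontology declares univ:DatabaseCourse to be a subclass of univ:Course.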
An ontology is a formal description providing human users a shared understanding of a given domain. The ontologies we consider here can also be interpreted and processed by machines thanks to a logical semantics that enables reasoning. Ontologies provide the basis for sharing knowledge, and, as such, they are very useful for a number of reasons:
Organizing data. It is very easy to get lost in large collections of documents. An ontology is a natural means of “organizing” (structuring) such a collection and thereby facilitates browsing through it to find interesting information. It provides an organization that is flexible and that naturally structures the information in multidimensional ways. For instance, an ontology may allow browsing through the courses offered by a university by topic or department, by quarter or time, by level, and so forth.
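Continuing the hypothetical univ: vocabulary from above (again a sketch in Turtle), such a multidimensional organization can be declared with a few RDFS statements; a browsing interface can then group courses along any of the declared dimensions:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix univ: <http://example.org/university#> .

    # A class hierarchy supports browsing by topic ...
    univ:DatabaseCourse rdfs:subClassOf univ:Course .

    # ... while properties provide further browsing dimensions.
    univ:offeredBy rdfs:domain univ:Course ; rdfs:range univ:Department .
    univ:hasLevel  rdfs:domain univ:Course .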
The goal of data integration is to provide uniform access to a set of autonomous and possibly heterogeneous data sources in a particular application domain. This is typically what we need when, for instance, querying the deep web, which is composed of a plethora of databases accessible through Web forms. We would like to be able, with a single query, to find relevant data no matter which database provides it.
A first issue for data integration (that will be ignored here) is social: The owners of a data set may be unwilling to fully share it and be reluctant to participate in a data integration system. Also, from a technical viewpoint, a difficulty comes from the lack of interoperability between the data sources, which may use a variety of formats, specific query-processing capabilities, and different protocols. However, the real bottleneck for data integration is logical. It comes from the so-called semantic heterogeneity between the data sources: they typically organize data using different schemas, even in the same application domain. For instance, each university or educational institution may choose to model students and teaching programs in its own way. A French university may use the social security number to identify students and the attributes nom, prenom, whereas the Erasmus database about European students may use a European student number and the attributes firstname, lastname, and home university.
In this chapter, we study data integration in the mediator approach. In this approach, data remain exclusively in data sources and are obtained when the system is queried.
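As a hedged illustration (the relation names and the global schema below are hypothetical; the attributes are those mentioned above), the mediator's mappings can be written GAV-style as Datalog-like rules, one view definition per source:

    Student(nss, prenom, nom)         :- FrenchUniv(nss, nom, prenom).
    Student(esn, firstname, lastname) :- ErasmusDB(esn, firstname, lastname, homeuniv).

A query against the global Student relation is rewritten by the mediator into subqueries over FrenchUniv and ErasmusDB, which are evaluated at the sources only when the system is queried.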
The Internet and the Web have revolutionized access to information. Individuals depend more and more on the Web to find or publish information, download music and movies, and interact with friends on social networking Web sites. Following a parallel trend, companies increasingly turn to Web solutions in their daily activity, by using Web services (e.g., agenda) as well as by moving some applications into the cloud (e.g., with Amazon Web Services). The growth of this immense information source is witnessed by the number of newly connected people, by the interactions among them facilitated by social networking platforms, and above all by the huge amount of data covering all aspects of human activity. With the Web, information has moved from data isolated in very protected islands (typically relational databases) to information freely available to any machine or any individual connected to the Internet.
Perhaps the best illustration comes from a typical modern Web user. She has information stored on PCs, a personal laptop, and a professional computer, but also possibly on some server at work, on her smartphone, in an e-book, and so on. She also maintains information on personal Web sites or social network Web sites. She may store pictures in Picasa, movies on YouTube, bookmarks in Firefox Sync, and the like. So even an individual now faces the management of a complex distributed collection of data.