Chapter Objectives
In this chapter, you will learn to:
• understand the basic concepts of XML, Document Type Definition, XML Schema Definition, Extensible Stylesheet Language, namespaces, and XPath;
• process XML documents using the DOM and SAX APIs;
• store XML documents using a document-oriented, data-oriented, or hybrid approach;
• grasp the key differences between XML and relational data;
• map between XML documents and (object-)relational data using table-based mapping, schema-oblivious mapping, schema-aware mapping, and SQL/XML;
• search XML data using full-text search, keyword-based search, structured search using XQuery, and semantic search using RDF and SPARQL;
• use XML for information exchange in combination with message-oriented middleware (MOM) and web services;
• understand other data representation formats such as JSON and YAML.
Opening Scenario
For regulatory and insurance purposes, Sober needs to store a report for each accident. The report should include the date, the location (including GPS coordinates), a summary of what happened, and the individuals involved. Furthermore, for each individual, Sober needs to know:
• the name;
• whether the individual is a driver (of a Sober car or of another vehicle), a pedestrian, or a cyclist;
• whether the individual was injured.
The report should also include information about aid provided, such as police or ambulance assistance. Sober would like to know the best way to store this report.
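To make the requirements concrete, the following minimal sketch shows one possible way to represent such a report as an XML document, built with Python's standard xml.etree.ElementTree module. All element and attribute names (accidentReport, gps, role, injured, and so on) and all sample values are assumptions made purely for illustration; they are not a format prescribed by Sober, and other designs are equally plausible.

```python
import xml.etree.ElementTree as ET


def build_accident_report():
    """Build a hypothetical accident report; all names and values are illustrative."""
    report = ET.Element("accidentReport")

    # Date, location (with GPS coordinates), and summary of the accident.
    ET.SubElement(report, "date").text = "2024-01-15"
    location = ET.SubElement(report, "location")
    ET.SubElement(location, "description").text = "Example Street 1"
    ET.SubElement(location, "gps", latitude="50.8798", longitude="4.7005")
    ET.SubElement(report, "summary").text = "Sample summary of what happened."

    # One element per individual: name, role, and whether he/she was injured.
    individuals = ET.SubElement(report, "individuals")
    driver = ET.SubElement(individuals, "individual",
                           role="soberDriver", injured="false")
    ET.SubElement(driver, "name").text = "Jane Doe"
    pedestrian = ET.SubElement(individuals, "individual",
                               role="pedestrian", injured="true")
    ET.SubElement(pedestrian, "name").text = "John Roe"

    # Aid provided, such as police or ambulance assistance.
    aid = ET.SubElement(report, "aid")
    ET.SubElement(aid, "service", type="police")
    ET.SubElement(aid, "service", type="ambulance")
    return report


report = build_accident_report()
xml_text = ET.tostring(report, encoding="unicode")
print(xml_text)
```

In this sketch, repeating parts of the report (individuals, aid services) become repeated child elements, while simple properties such as the role or the GPS coordinates are modeled as attributes. Whether a property is better modeled as an element or as an attribute is a design choice.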
In this chapter, we discuss how to store, process, search, and visualize XML documents, and how DBMSs can support this. We start by looking at the XML data representation standard and discuss related concepts such as DTDs and XSDs for defining XML documents, XSL for visualizing or transforming XML documents, and namespaces for providing a unique naming convention. We then introduce XPath, which uses path expressions to navigate through XML documents, and review the DOM and SAX APIs for processing XML documents. Next, we cover the document-oriented, data-oriented, and hybrid approaches for storing XML documents, and highlight the key differences between the XML and relational data models. We discuss various methods for mapping between XML and (object-)relational data: table-based mapping, schema-oblivious mapping, schema-aware mapping, and the SQL/XML extension. We also present various ways to search XML data: full-text search, keyword-based search, structured search using XQuery, and semantic search using RDF and SPARQL. We then illustrate how XML can be used for information exchange, both at the company level using RPC and message-oriented middleware, and between companies using SOAP or REST-based web services. We conclude with other data representation formats, such as JSON and YAML.
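As a brief preview of the processing styles named above, the sketch below reads the same small document in three ways using only Python's standard library: through a DOM tree (xml.dom.minidom), through SAX events (xml.sax), and through a path expression (ElementTree supports only a limited subset of XPath). The document content and the NameHandler class are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
DOC = (
    "<report>"
    "<individual><name>Jane Doe</name></individual>"
    "<individual><name>John Roe</name></individual>"
    "</report>"
)

# DOM: the whole document is parsed into an in-memory tree, then navigated.
dom = minidom.parseString(DOC)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("name")]


# SAX: the parser streams through the document and fires events on a handler.
class NameHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it in a buffer.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


handler = NameHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names

# Path expression: select every name element under an individual element.
path_names = [e.text for e in ET.fromstring(DOC).findall("./individual/name")]

print(dom_names, sax_names, path_names)
```

All three approaches yield the same list of names. The DOM version keeps the complete tree in memory, whereas the SAX version only keeps the state that the handler chooses to record.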