  • Print publication year: 2008
  • Online publication date: June 2012

10 - XML retrieval

Summary

Information retrieval (IR) systems are often contrasted with relational databases. Traditionally, IR systems have retrieved information from unstructured text – by which we mean “raw” text without markup. Databases are designed for querying relational data, sets of records that have values for predefined attributes such as employee number, title, and salary. There are fundamental differences between IR and database systems in terms of retrieval model, data structures, and query language as shown in Table 10.1.

Some highly structured text search problems are most efficiently handled by a relational database, for example when the employee table contains an attribute for short textual job descriptions and you want to find all employees who are involved with invoicing. In this case, the SQL query:

select lastname from employees where job_desc like 'invoic%';

may be sufficient to satisfy your information need with high precision and recall.
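To make the query concrete, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table layout (just lastname and job_desc), the sample rows, and the helper name find_by_job_prefix are assumptions made for illustration; only the query pattern itself comes from the text.

```python
import sqlite3


def find_by_job_prefix(conn, prefix):
    """Return last names whose job description starts with `prefix`."""
    # Parameterized form of: select lastname from employees
    #                        where job_desc like 'invoic%';
    rows = conn.execute(
        "SELECT lastname FROM employees WHERE job_desc LIKE ? ORDER BY lastname",
        (prefix + "%",),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (lastname TEXT, job_desc TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Okafor", "invoicing and accounts receivable"),
        ("Lindqvist", "invoice processing"),
        ("Marsh", "handles customer invoices"),
        ("Chen", "payroll"),
    ],
)
conn.commit()

matches = find_by_job_prefix(conn, "invoic")
print(matches)
```

Because the pattern has a wildcard only at the end, it matches descriptions that begin with "invoic". In this sample the row for Marsh is not returned, since the term appears in the middle of the description.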

STRUCTURED RETRIEVAL

However, many structured data sources containing text are best modeled as structured documents rather than relational data. We call the search over such structured documents structured retrieval. Queries in structured retrieval can be either structured or unstructured, but we assume in this chapter that the collection consists only of structured documents. Applications of structured retrieval include digital libraries, patent databases, blogs, text in which entities like persons and locations have been tagged (in a process called named entity tagging), and output from office suites like OpenOffice that save documents as marked up text.
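The difference between an unstructured and a structured query over structured documents can be sketched as follows. This is an illustrative toy, not the chapter's retrieval model: the two-document collection, the element names (article, title, body), and the search function are all hypothetical, and matching is plain substring containment rather than ranked retrieval.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-collection of structured documents.
COLLECTION = {
    "d1": "<article><title>Invoice automation</title>"
          "<body>Payroll systems overview.</body></article>",
    "d2": "<article><title>Payroll basics</title>"
          "<body>How invoice matching works.</body></article>",
}


def search(term, element=None):
    """Return ids of documents containing `term`.

    With element=None the query is unstructured: the term may occur
    anywhere in the document. With an element name the query is
    structured: the term must occur inside an element of that name.
    """
    term = term.lower()
    hits = []
    for doc_id, xml_text in sorted(COLLECTION.items()):
        root = ET.fromstring(xml_text)
        nodes = root.iter(element) if element else [root]
        if any(term in "".join(node.itertext()).lower() for node in nodes):
            hits.append(doc_id)
    return hits


unstructured = search("invoice")                  # term anywhere
structured = search("invoice", element="title")   # term in a title only
print(unstructured, structured)
```

Both documents mention the term somewhere, so the unstructured query returns both, while the structured query restricted to title returns only the first.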

Introduction to Information Retrieval
  • Online ISBN: 9780511809071
  • Book DOI: https://doi.org/10.1017/CBO9780511809071