WILDCARD QUERY
In Chapters 1 and 2, we developed the ideas underlying inverted indexes for handling Boolean and proximity queries. Here, we develop techniques that are robust to typographical errors in the query, as well as alternative spellings. In Section 3.1, we develop data structures that help the search for terms in the vocabulary in an inverted index. In Section 3.2, we study the idea of a wildcard query: a query such as *a*e*i*o*u*, which seeks documents containing any term that includes all the five vowels in sequence. The * symbol indicates any (possibly empty) string of characters. Users pose such queries to a search engine when they are uncertain about how to spell a query term, or seek documents containing variants of a query term; for instance, the query automat* seeks documents containing any of the terms automatic, automation, and automated.
We then turn to other forms of imprecisely posed queries, focusing on spelling errors in Section 3.3. Users make spelling errors either by accident, or because the term they are searching for (e.g., Herman) has no unambiguous spelling in the collection. We detail a number of techniques for correcting spelling errors in queries, one term at a time as well as for an entire string of query terms. Finally, in Section 3.4 we study a method for seeking vocabulary terms that are phonetically close to the query term(s).
Review the options below to login to check your access.
Log in with your Cambridge Higher Education account to check access.
If you believe you should have access to this content, please contact your institutional librarian or consult our FAQ page for further information about accessing our content.