12 - Multilingual search

Published online by Cambridge University Press: 08 June 2018

Summary

In this chapter:

■ Why multilingual search is so difficult

■ The value of Unicode

■ The problems of transliteration

Searching the Tower of Babel

Searching across multiple languages is often poorly understood, yet the need to do so is going to become increasingly important. Research from Byte Level Research (http://bytelevel.com) indicates that the majority of internet users are not native English speakers.

From the perspective of website search, this means that a considerable number of visitors searching the site may have only a limited range of synonyms at their command and limited linguistic awareness. This is not only a cross-national issue. It is estimated that over 300 languages are spoken in London alone, although London is probably the most linguistically diverse city in the world. Clearly, in the period up to the 2008 Olympic Games, the growth in the number of Chinese users is considerable.

The management of multiple languages also needs to be carefully considered in the enterprise environment. Just because an organization has English as its global corporate language does not mean that all documents will be in English. Documents relating to staff contracts and policies, and contracts with local suppliers, will invariably be in local languages. Patents and other legal documents will also be in more than one language. If any individual user is to have global access to the resources of the organization, the problem of how to search in a way that is independent of both the language skills of the searcher and the languages of the documents needs to be addressed.

Searching multiple languages

Many search engines claim to be able to search in multiple languages, but care must be taken over just what this means. It usually means that the search engine can parse documents written in a wide range of languages, create an index, and then run a query against that index to present a number of relevant documents. Although not easy to implement, this is now quite well-developed technology, and it uses Unicode to convert text in any language to a standardized (or rather normalized) format. This enables a search to be carried out using a query in the destination language.
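The parse, index and query steps described above can be illustrated with a minimal sketch. It is not taken from the book and does not represent how any particular search engine works: the function names (`normalize`, `tokenize`, `build_index`, `search`) and the sample documents are hypothetical, and the choice of Unicode normalization form NFKC followed by case folding is an assumption made for illustration. Only Python's standard `unicodedata` module is used.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    # Assumption: NFKC plus case folding is a reasonable normalized form.
    # NFKC maps equivalent character sequences (e.g. "e" + combining acute
    # accent vs. precomposed "é") to a single representation.
    return unicodedata.normalize("NFKC", text).casefold()


def tokenize(text):
    # Naive whitespace tokenization. Languages written without spaces
    # (e.g. Chinese) would need language-aware segmentation instead.
    return normalize(text).split()


def build_index(docs):
    # Inverted index: normalized token -> set of document ids.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    # Return ids of documents containing every token of the query.
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical sample documents in three languages.
docs = {
    "fr-1": "Le cafe\u0301 est ouvert",  # "é" stored as e + combining accent
    "en-1": "The caf\u00e9 is open",     # "é" stored as one precomposed character
    "de-1": "Die Stra\u00dfe ist lang",  # contains "ß"
}
index = build_index(docs)

cafe_hits = search(index, "CAF\u00c9")
street_hits = search(index, "STRASSE")
```

Here both spellings of "café" end up under the same index entry, so a query for "CAFÉ" finds the French and the English document, and case folding lets "STRASSE" match "Straße". Note what the sketch does not do: a query in one language finds only documents containing that same normalized word, which is exactly why a claim of "multilingual search" needs careful scrutiny.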

Type: Chapter
Information: Making Search Work: Implementing web, intranet and enterprise search, pp. 127-134
Publisher: Facet
Print publication year: 2007