
11 - Building a thesaurus 2: term extraction from document titles

Published online by Cambridge University Press:  09 June 2018


Summary

Following the systematic searches on catalogues and databases, we now have about 150 titles which will form the basis of the working vocabulary. Because these were selected carefully to avoid duplication of terms where possible, this should provide us with about 400–500 terms. This is really as big a vocabulary as one can comfortably manage in the initial stages, and there are quite enough terms to establish a sound and reliable structure for the thesaurus. There will probably be gaps in the terminology, but it is more efficient to fill these in at a later stage if necessary. In the majority of cases, far fewer terms than this can provide a reasonable structure for the thesaurus, so if you are dealing with a small, specialist vocabulary you can manage with around 100 terms as a starting point.

The titles must now be analysed to identify relevant terms. I find it easiest to do this in a rather rough and ready way, using a list of the titles, and cutting and pasting the relevant terms into another document. Some level of vocabulary control can be imposed as you go along and, at the end, the extracted terms are easily sorted using the A–Z sort facility of a word processing package.
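The tidy-up at the end of this step can be sketched in a few lines of Python for anyone who would rather not rely on a word processor. This is only an illustration under stated assumptions: the function name `sort_terms` and the sample list are hypothetical, and the only vocabulary control applied is trimming stray spaces and merging terms that differ just in capitalisation.

```python
def sort_terms(terms):
    """Return unique terms in A-Z order, ignoring case and stray whitespace."""
    seen = {}
    for term in terms:
        cleaned = " ".join(term.split())  # collapse runs of spaces, trim the ends
        if not cleaned:
            continue  # skip empty entries left over from cutting and pasting
        # Keep the first spelling met for each case-insensitive key.
        seen.setdefault(cleaned.casefold(), cleaned)
    return [seen[key] for key in sorted(seen)]


# Hypothetical pasted terms, including a near-duplicate ("Cat ").
extracted = [
    "cat", "overpopulation", "United States",
    "domestic rabbits", "grass", "coarse mix feeds", "Cat ",
]
working_list = sort_terms(extracted)
for term in working_list:
    print(term)
```

Sorting on a case-folded key keeps 'United States' in its alphabetical place instead of pushing every capitalised term to the top of the list.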

Identification of significant terms

Let's start by picking out the key concepts in each title. We will look at some examples of titles from our list, starting with a very straightforward one:

Cat overpopulation in the United States

Here there are three important terms:

cat – overpopulation – United States

A similar straightforward example is:

Preference of domestic rabbits for grass or coarse mix feeds

domestic rabbits – grass – coarse mix feeds

You will notice that I did not select ‘preference’ as a term to be used. Vague or general terms of this kind are not generally used in indexing, and the purpose of the exercise is to identify significant terms. If you find it hard to decide what is significant, think about the kinds of terms that an end-user is likely to search for. A useful tip is to look for nouns or noun phrases first, and then for any significant verbs. You will remember from the previous chapter that most thesaurus terms fall into these two categories.
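A rough first pass at this selection can also be sketched in Python. The sketch below is not the method used in this chapter, which relies on the indexer's judgement: it simply breaks a title at common function words and discards any chunk found on a list of vague terms. Both word lists and the function name `candidate_terms` are assumptions made for illustration.

```python
# Hypothetical lists for illustration; neither comes from the chapter.
FUNCTION_WORDS = {"a", "an", "and", "for", "in", "of", "or", "the"}
VAGUE_TERMS = {"preference"}


def candidate_terms(title):
    """Split a title at function words and drop chunks listed as vague."""
    chunks, current = [], []
    for word in title.split():
        if word.casefold() in FUNCTION_WORDS:
            if current:  # close the chunk built so far
                chunks.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:  # last chunk, when the title does not end on a function word
        chunks.append(" ".join(current))
    return [chunk for chunk in chunks if chunk.casefold() not in VAGUE_TERMS]


rabbit_terms = candidate_terms(
    "Preference of domestic rabbits for grass or coarse mix feeds"
)
cat_terms = candidate_terms("Cat overpopulation in the United States")
print(rabbit_terms)
print(cat_terms)
```

For the rabbit title the candidates match the three terms picked out by hand. For the cat title the sketch returns 'Cat overpopulation' as a single chunk, so the indexer still has to decide that 'cat' and 'overpopulation' are separate concepts; a mechanical split of this kind only suggests candidates.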

Type: Chapter
Publisher: Facet
Print publication year: 2006
