
Technical terminology: some linguistic properties and an algorithm for identification in text

Published online by Cambridge University Press: 12 September 2008

John S. Justeson
Affiliation:
Department of Anthropology, SUNY at Albany, Albany, NY 12222, USA
Slava M. Katz
Affiliation:
IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA

Abstract

This paper identifies some linguistic properties of technical terminology and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; they rarely contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms from other types of noun phrase, especially the multi-word phrases that constitute a substantial majority of all technical vocabulary.
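The grammatical property described above can be expressed as a filter over part-of-speech tag sequences. The sketch below is illustrative only: it assumes input already tagged with a hypothetical single-letter tagset (A = adjective, N = noun, P = preposition, any other letter for everything else), and it assumes a shape in which a multi-word term is a run of adjectives and nouns ending in a noun, with at most one "noun + preposition" junction. The pattern name and the function `is_candidate_pattern` are our own, not taken from the paper.

```python
import re

# Hypothetical coarse tagset (assumption): A = adjective, N = noun,
# P = preposition; verbs, adverbs, conjunctions, etc. use other letters
# and can therefore never match.
# Shape (assumption): adjectives/nouns ending in a noun, optionally with one
# "noun + preposition" junction, e.g. "degrees of freedom" -> N P N.
TERM_PATTERN = re.compile(r"(?:[AN]*NP)?[AN]*N")


def is_candidate_pattern(tags: str) -> bool:
    """Return True if a tag string has an acceptable multi-word term shape."""
    # Only multi-word candidates are considered, so require two or more tokens.
    return len(tags) >= 2 and TERM_PATTERN.fullmatch(tags) is not None


examples = {
    "linear function": "AN",
    "regression coefficient": "NN",
    "degrees of freedom": "NPN",
    "rapidly converge": "RV",  # adverb + verb: rejected
}
for phrase, tags in examples.items():
    print(phrase, tags, is_candidate_pattern(tags))
```

Under these assumptions the first three phrases are accepted and the adverb + verb sequence is rejected, mirroring the preference for adjective/noun structures over verbs and adverbs.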

The paper presents a terminology identification algorithm motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves effective regardless of the domain of the text to which it is applied.
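One way the grammatical and discourse properties might be combined is sketched below: enumerate multi-word windows whose tag sequence fits an adjective/noun (plus optional preposition) shape, count them, and keep only strings that repeat. This is a minimal sketch, not the paper's implementation. The single-letter tagset, the tag pattern, the repetition threshold `min_count=2`, the window limit `max_len=4`, and the function name `extract_terms` are all our own illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical tag-shape filter (assumption): A = adjective, N = noun,
# P = preposition; a run of A/N ending in N, with at most one "N P" junction.
TERM_PATTERN = re.compile(r"(?:[AN]*NP)?[AN]*N")


def extract_terms(tagged, min_count=2, max_len=4):
    """Return multi-word candidates that match the tag pattern and repeat.

    tagged: list of (word, tag) pairs using single-letter tags.
    min_count, max_len: illustrative assumptions, not values from the paper.
    """
    counts = Counter()
    n = len(tagged)
    for start in range(n):
        for length in range(2, max_len + 1):
            end = start + length
            if end > n:
                break
            window = tagged[start:end]
            tags = "".join(tag for _, tag in window)
            if TERM_PATTERN.fullmatch(tags):
                # Normalize case so repeated mentions are counted together.
                counts[" ".join(word.lower() for word, _ in window)] += 1
    # Repetition filter: keep only candidates seen at least min_count times.
    return {term: c for term, c in counts.items() if c >= min_count}


tagged = [
    ("The", "D"), ("central", "A"), ("processing", "N"), ("unit", "N"),
    ("reads", "V"), ("the", "D"), ("central", "A"), ("processing", "N"),
    ("unit", "N"), ("quickly", "R"),
]
print(extract_terms(tagged))
```

On this toy input the sketch keeps "central processing", "central processing unit", and "processing unit", each seen twice. Because every matching window is counted, nested sub-phrases are reported alongside the longer phrase; how the original algorithm handles such overlaps is not stated in the abstract.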

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 1995
