
Technical terminology: some linguistic properties and an algorithm for identification in text

Published online by Cambridge University Press:  12 September 2008

John S. Justeson
Affiliation:
Department of Anthropology, SUNY at Albany, Albany, NY 12222, USA
Slava M. Katz
Affiliation:
IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA

Abstract

This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; terms rarely contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase.

The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
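
The two properties named in the abstract (a preferred phrase structure and repetition) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `candidate_terms`, the one-letter tag set (A, N, P, X), the exact tag pattern, and the `min_count` threshold are all assumptions made for illustration, and the input is assumed to be already tokenized and part-of-speech tagged.

```python
import re
from collections import Counter

# Hypothetical one-letter tags: A = adjective, N = noun, P = preposition,
# X = any other word class. One letter per word keeps regex offsets equal
# to word offsets.
# Assumed pattern: a multi-word run of adjectives and nouns, optionally
# containing one noun + preposition pair, and always ending in a noun.
CANDIDATE = re.compile(r"(?:[AN]*NP[AN]*|[AN]+)N")


def candidate_terms(tagged_sentences, min_count=2):
    """Return {phrase: count} for tag-pattern matches seen at least min_count times.

    tagged_sentences is an iterable of sentences, each a list of
    (word, tag) pairs using the one-letter tags described above.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        words = [word.lower() for word, _ in sentence]
        tags = "".join(tag for _, tag in sentence)
        # finditer yields non-overlapping matches, scanning left to right.
        for match in CANDIDATE.finditer(tags):
            phrase = " ".join(words[match.start():match.end()])
            counts[phrase] += 1
    # Repetition filter: keep only phrases that recur in the text.
    return {phrase: n for phrase, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sentences = [
        [("the", "X"), ("central", "A"), ("processing", "N"), ("unit", "N"),
         ("is", "X"), ("fast", "A")],
        [("a", "X"), ("central", "A"), ("processing", "N"), ("unit", "N"),
         ("has", "X"), ("degrees", "N"), ("of", "P"), ("freedom", "N")],
    ]
    print(candidate_terms(sentences))
```

In this toy input, "central processing unit" occurs twice and passes the repetition filter, while "degrees of freedom" matches the tag pattern but occurs only once and is dropped unless `min_count` is lowered to 1.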

Type
Articles
Copyright
Copyright © Cambridge University Press 1995


