A piece of software as complex as a complete natural language generation system is unlikely to be constructed as a monolithic program. In this chapter, we introduce a particular architecture for NLG systems, by which we mean a specification of how the different types of processing are distributed across a number of component modules. As part of this architectural specification, we discuss how these modules interact with each other and we describe the data structures that are passed between the modules.
Introduction
Like other complex software systems, NLG systems are generally easier to build and debug if they are decomposed into distinct, well-defined, and easily integrated modules. This is especially true if the software is being developed by a team rather than by one individual. Modularisation can also make it easier to reuse components amongst different applications and to modify an application. Suppose, for example, we adopt a modularisation where one component is responsible for selecting the information content of a text and another is responsible for expressing this content in some natural language. Provided a well-defined interface between these components is specified, different teams or individuals can work on the two components independently. It may also be possible to reuse the components (in particular the second, less application-dependent component) independently of one another.
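To make the idea of a well-defined interface between such components concrete, here is a minimal sketch in Python of a two-component decomposition. All of the names (Message, select_content, realise) and the toy weather example are invented for illustration; they are not the data structures defined later in this chapter.

```python
# A minimal sketch of a two-module NLG pipeline, assuming an invented
# intermediate "message" representation passed across the interface.
from dataclasses import dataclass

@dataclass
class Message:
    """Language-neutral content unit passed between the two modules."""
    predicate: str   # e.g. "temperature-rise"
    arguments: dict  # e.g. {"location": "Oslo", "amount": "5 degrees"}

def select_content(data: dict) -> list[Message]:
    """Content determination: decide WHAT to say (application-dependent)."""
    messages = []
    if data.get("temp_change", 0) > 0:
        messages.append(Message("temperature-rise",
                                {"location": data["city"],
                                 "amount": f"{data['temp_change']} degrees"}))
    return messages

def realise(messages: list[Message]) -> str:
    """Linguistic realisation: decide HOW to say it (less application-dependent)."""
    sentences = []
    for m in messages:
        if m.predicate == "temperature-rise":
            sentences.append(f"The temperature in {m.arguments['location']} "
                             f"rose by {m.arguments['amount']}.")
    return " ".join(sentences)

print(realise(select_content({"city": "Oslo", "temp_change": 5})))
# -> The temperature in Oslo rose by 5 degrees.
```

Because the two functions communicate only through the Message structure, either side can be replaced or reused independently, which is the point of the modularisation described above.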
This paper focuses on the design methodology of the MultiLingual Dictionary-System (MLDS), a human-oriented tool for assisting translators in the task of translating lexical units, conceived from empirical studies carried out with translators. We describe the model adopted for the representation of multilingual dictionary knowledge. This model allows an enriched exploitation of the lexical-semantic relations extracted from dictionaries. In addition, MLDS is supplied with knowledge about how dictionaries are used in the process of lexical translation, elicited by means of empirical methods and specified in a formal language. The dictionary knowledge and the task-oriented knowledge are used together to offer the translator active, anticipative and intelligent assistance.
This paper describes an approach for constructing a mixture of language models based on
simple statistical notions of semantics using probabilistic models developed for information
retrieval. The approach encapsulates corpus-derived semantic information and is able to model
varying styles of text. Using such information, the corpus texts are clustered in an unsupervised
manner and a mixture of topic-specific language models is automatically created. The principal
contribution of this work is to characterise the document space resulting from information
retrieval techniques and to demonstrate the approach for mixture language modelling. A
comparison is made between manual and automatic clustering in order to elucidate how the
global content information is expressed in the space. We also compare (in terms of association
with manual clustering and language modelling accuracy) alternative term-weighting schemes
and the effect of singular value decomposition dimension reduction (latent semantic analysis).
Test set perplexity results using the British National Corpus indicate that the approach can
improve the potential of statistical language modelling. Using an adaptive procedure, the
conventional model may be tuned to track text data with a slight increase in computational
cost.
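As a rough illustration of the mixture idea only (not the paper's IR-based clustering, term weighting or n-gram models), the sketch below builds smoothed unigram models from two hand-assigned document clusters, whose toy documents are invented, and evaluates words under a fixed-weight mixture.

```python
# A minimal sketch of a mixture of topic-specific language models,
# assuming unigram statistics and pre-assigned document clusters.
from collections import Counter

def train_unigram(docs, vocab, alpha=1.0):
    """Unigram model with add-alpha smoothing over a fixed vocabulary."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def mixture_prob(word, topic_models, weights):
    """P(word) under a fixed-weight mixture of topic-specific models."""
    return sum(lam * lm[word] for lam, lm in zip(weights, topic_models))

# Toy corpus: two "topics", each a cluster of tokenised documents.
clusters = [
    [["stocks", "fell", "sharply"], ["markets", "fell"]],
    [["rain", "fell", "overnight"], ["heavy", "rain"]],
]
vocab = {w for c in clusters for d in c for w in d}
models = [train_unigram(c, vocab) for c in clusters]
weights = [0.5, 0.5]

for w in ["fell", "rain", "stocks"]:
    print(w, round(mixture_prob(w, models, weights), 3))
```

In the adaptive setting described in the abstract, the fixed weights would instead be re-estimated as text is processed so that the mixture tracks the current topic.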
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad
coverage grammars. In the simplest case, rules can be ‘read off’ the parse annotations of the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars,
however, can be very large, presenting problems for the subsequent computational costs of
parsing under the grammar. In this paper, we explore ways by which a treebank grammar
can be reduced in size or ‘compacted’, which involve the use of two kinds of technique: (i)
thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which
has both probabilistic and non-probabilistic variants. Our results show that by a combined
use of these two techniques, a probabilistic context-free grammar can be reduced in size by
62% without any loss in parsing performance, and by 71% to give a gain in recall, but some
loss in precision.
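A minimal sketch of the first of these techniques, frequency thresholding of treebank rules, is given below; the rule counts and the threshold are invented, and the paper's rule-parsing method is not shown.

```python
# A minimal sketch of rule thresholding for treebank-grammar compaction:
# rules seen fewer than `threshold` times are discarded and the remaining
# rule probabilities are renormalised per left-hand side.
from collections import Counter, defaultdict

rule_counts = Counter({
    ("NP", ("DT", "NN")): 5000,
    ("NP", ("NNP",)): 3200,
    ("NP", ("DT", "JJ", "JJ", "JJ", "NN")): 2,     # rare, long rule
    ("VP", ("VBD", "NP")): 4100,
    ("VP", ("VBD", "ADVP", "NP", "PP", "PP")): 1,  # rare, long rule
})

def compact(counts, threshold):
    kept = {r: c for r, c in counts.items() if c >= threshold}
    totals = defaultdict(int)
    for (lhs, _), c in kept.items():
        totals[lhs] += c
    # Renormalised PCFG probabilities over the surviving rules.
    return {r: c / totals[r[0]] for r, c in kept.items()}

pcfg = compact(rule_counts, threshold=10)
print(f"{len(pcfg)} of {len(rule_counts)} rules kept")
for rule, p in pcfg.items():
    print(rule, round(p, 3))
```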
We show how the DOP model can be used for fast and robust context-sensitive processing of
spoken input in a practical spoken dialogue system called OVIS. OVIS (Openbaar Vervoer Informatie Systeem, ‘Public Transport Information System’) is a Dutch spoken-language information system which operates over ordinary telephone lines. The prototype system is
the immediate goal of the NWO Priority Programme ‘Language and Speech Technology’. In
this paper, we extend the original Data-Oriented Parsing (DOP) model to context-sensitive
interpretation of spoken input. The system we describe uses the OVIS corpus (which consists
of 10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. Dialogue context is taken into account
by dividing up the OVIS corpus into context-dependent subcorpora. Each system question
triggers a subcorpus by which the user answer is analysed and interpreted. Our experiments
indicate that the context-sensitive DOP model obtains better accuracy than the original
model, allowing for fast and robust processing of spoken input.
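The context-sensitivity mechanism can be pictured roughly as follows; the question types, answers and the trivial ‘model’ used here are invented stand-ins for the OVIS subcorpora and the DOP analyses.

```python
# A minimal sketch of context-sensitive interpretation via subcorpora:
# each system question type selects the subcorpus (here just a stand-in
# word-set model) against which the user's answer is analysed.
def build_models(corpus_by_question):
    """Pretend 'model' per subcorpus: the set of words seen in user answers."""
    return {q: {w for answer in answers for w in answer.split()}
            for q, answers in corpus_by_question.items()}

corpus_by_question = {
    "ask_departure": ["from amsterdam", "i want to leave from utrecht"],
    "ask_destination": ["to rotterdam please", "to den haag"],
}
models = build_models(corpus_by_question)

def interpret(system_question, user_answer):
    """Analyse the answer only against the subcorpus its question triggers."""
    known = models[system_question]
    return [w for w in user_answer.split() if w in known]

print(interpret("ask_departure", "from utrecht please"))  # -> ['from', 'utrecht']
```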
Most parsing algorithms require phrases that are to be combined to be either contiguous or marked as being ‘extraposed’. The assumption that phrases which are to be combined will be adjacent to one another supports rapid indexing mechanisms: the fact that in most languages items can turn up in unexpected locations cancels out much of the ensuing efficiency. The current paper shows how ‘out of position’ items can be incorporated directly. This leads to efficient parsing even when items turn up having been right-shifted, a state of affairs which makes Johnson and Kay's (1994) notion of ‘sponsorship’ of empty nodes inapplicable.
In this paper we present results concerning the large-scale automatic extraction of pragmatic content from email, using a system based on a phrase-matching approach to speech act detection combined with empirical detection of speech act patterns in corpora. The investigation is supported by analysis of a corpus of 1000 emails; the results show that most speech acts occurring in this corpus can be recognized by the approach.
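As a hedged illustration of the phrase-matching idea (the actual patterns were derived empirically from the email corpus and are not reproduced here), a detector might look like this:

```python
# A minimal sketch of phrase-matching speech act detection for email,
# assuming a small hand-built pattern table.
import re

# Each speech act is associated with a few indicative surface patterns.
PATTERNS = {
    "request": [r"\bcould you\b", r"\bplease (send|let me know)\b", r"\bcan you\b"],
    "commit":  [r"\bi will\b", r"\bi'll\b", r"\bwe will\b"],
    "propose": [r"\bhow about\b", r"\bshall we\b", r"\bwhat if we\b"],
    "deliver": [r"\battached is\b", r"\bplease find attached\b"],
}

def detect_speech_acts(text):
    """Return the set of speech acts whose patterns match the message."""
    text = text.lower()
    return {act for act, pats in PATTERNS.items()
            if any(re.search(p, text) for p in pats)}

msg = "Could you send the report? I'll forward the figures tomorrow."
print(detect_speech_acts(msg))   # e.g. {'request', 'commit'}
```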
A probabilistic parameter reestimation algorithm plays a key role in the automatic acquisition of stochastic grammars. In the case of context-free phrase structure grammars, the inside-outside algorithm is widely used. However, it is not directly applicable to Probabilistic Dependency Grammar (PDG), because PDG is not based on constituents but on a head-dependent relation between pairs of words. This paper presents a reestimation algorithm which is a variation of the inside-outside algorithm adapted to probabilistic dependency grammar. The algorithm can be used either to reestimate the probabilistic parameters of an existing dependency grammar, or to extract a PDG from scratch. Using the algorithm, we have learned a PDG from a part-of-speech-tagged corpus of Korean, which showed about 62.82% dependency accuracy (the percentage of correct dependencies) for unseen test sentences.
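As a generic illustration of what such a reestimation step does (not the paper's specific adaptation of the inside-outside algorithm), the probability of a head-dependent link can be updated from expected counts in EM fashion, where expectations are taken over all analyses of each training sentence under the current model:

```latex
% Generic EM-style update for a dependency link probability; an
% illustrative sketch, not the paper's exact reestimation formula.
\[
  P_{\text{new}}(d \mid h) \;=\;
  \frac{\sum_{s \in \text{corpus}} E_{P_{\text{old}}}\!\left[\, c(h \rightarrow d\,;\, s) \,\right]}
       {\sum_{d'} \sum_{s \in \text{corpus}} E_{P_{\text{old}}}\!\left[\, c(h \rightarrow d'\,;\, s) \,\right]}
\]
```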
An automated tool to assist in the understanding of legacy code components can be useful both in the areas of software reuse and software maintenance. Most previous work in this area has concentrated on functionally-oriented code. Although object-oriented code has been shown to be inherently more reusable than functionally-oriented code, in many cases the eventual reuse of the object-oriented code was not considered during development. A knowledge-based, natural language processing approach to the automated understanding of object-oriented code as an aid to its reuse is described. A system, called the PATRicia system (Program Analysis Tool for Reuse), that implements the approach is examined. The natural language processing/information extraction system that comprises a large part of the PATRicia system is discussed, and the knowledge base of the PATRicia system, in the form of conceptual graphs, is described. Reports provided by natural language generation in the PATRicia system are described.
In this paper, we describe a context-based method to semantically tag unknown proper
nouns (U-PNs) in corpora. Like many others, our system relies on a gazetteer and a set of
context-dependent heuristics to classify proper nouns. However, proper nouns are an open-ended class: when parsing new fragments of a corpus, even in the same language domain, we
can expect that several proper nouns cannot be semantically tagged. The algorithm that we
propose assigns to an unknown PN an entity type based on the analysis of syntactically
and semantically similar contexts already seen in the application corpus. The performance of
the algorithm is evaluated not only in terms of precision, following the tradition of MUC
conferences, but also in terms of information gain, an information theoretic measure that
takes into account the complexity of the classification task.
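For reference, one common formulation of information gain for a classification task is the reduction in entropy of the true classes C given the system's decisions K; the paper's exact definition may differ in detail:

```latex
% One common formulation of information gain for classifier output K
% with respect to true entity types C; illustrative only.
\[
  \mathrm{IG}(C; K) \;=\; H(C) - H(C \mid K)
  \;=\; -\sum_{c} P(c)\log_2 P(c)
        \;+\; \sum_{k} P(k) \sum_{c} P(c \mid k)\log_2 P(c \mid k)
\]
```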
The paper describes SENSE, a word sense disambiguation system which makes use of
multidimensional analogy-based proportions to infer the most likely sense of a word given
its context. The architecture and functioning of the system are illustrated in detail. Results of different experimental settings are given, showing that the system, in spite of its conservative bias, successfully copes with the problem of training data sparseness.
Two tasks involving lexical semantic sense tagging are described. Different task requirements
made it necessary to select different corpora to be tagged and to develop different tagging
interfaces to achieve the desired result. A vocabulary-building task required sequential tagging
of connected text, whereas a word-sense identification task required targeted tagging of many
instances of common polysemous words. Advantages and drawbacks of both are compared.
Many applications need a lexicon that represents semantic information, but acquiring lexical
information is time-consuming. We present a corpus-based bootstrapping algorithm that
assists users in creating domain-specific semantic lexicons quickly. Our algorithm uses a
representative text corpus for the domain and a small set of ‘seed words’ that belong to
a semantic class of interest. The algorithm hypothesizes new words that are also likely to
belong to the semantic class because they occur in the same contexts as the seed words. The
best hypotheses are added to the seed word list dynamically, and the process iterates in a
bootstrapping fashion. When the bootstrapping process halts, a ranked list of hypothesized
category words is presented to a user for review. We used this algorithm to generate a semantic
lexicon for eleven semantic classes associated with the MUC-4 terrorism domain.
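A minimal sketch of such a bootstrapping loop is given below. The scoring function (overlap of context-word counts), the window size and the toy corpus are invented simplifications of the algorithm described in the paper.

```python
# A minimal sketch of seed-word bootstrapping for a semantic lexicon,
# assuming a bag-of-context-words scoring function.
from collections import defaultdict

def context_words(corpus, word, window=2):
    """Collect words appearing within +/- `window` tokens of `word`."""
    ctx = defaultdict(int)
    for sent in corpus:
        for i, w in enumerate(sent):
            if w == word:
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        ctx[sent[j]] += 1
    return ctx

def bootstrap(corpus, seeds, iterations=3, per_round=2):
    lexicon = set(seeds)
    hypothesised = []          # ranked list presented to the user at the end
    for _ in range(iterations):
        # Pool the contexts of everything currently in the lexicon.
        seed_ctx = defaultdict(int)
        for w in lexicon:
            for c, n in context_words(corpus, w).items():
                seed_ctx[c] += n
        # Score candidates by how much their contexts overlap the pool.
        vocab = {w for sent in corpus for w in sent} - lexicon
        scores = {w: sum(min(n, seed_ctx.get(c, 0))
                         for c, n in context_words(corpus, w).items())
                  for w in vocab}
        best = [w for w in sorted(scores, key=scores.get, reverse=True)[:per_round]
                if scores[w] > 0]
        lexicon.update(best)
        hypothesised.extend(best)
    return hypothesised

corpus = [["the", "bomb", "exploded", "near", "the", "embassy"],
          ["a", "grenade", "exploded", "near", "the", "bank"],
          ["the", "device", "exploded", "near", "a", "bus"]]
print(bootstrap(corpus, seeds=["bomb"]))  # noisy on a toy corpus, e.g. ['device', ...]
```

On a realistic corpus the ranked output would be reviewed by a user, as the abstract describes, since the scoring inevitably hypothesises some noise words alongside genuine category members.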
Resnik and Yarowsky (1997) made a set of observations about the state of the art in automatic
word sense disambiguation and, motivated by those observations, offered several specific
proposals regarding improved evaluation criteria, common training and testing resources,
and the definition of sense inventories. Subsequent discussion of those proposals resulted
in SENSEVAL, the first evaluation exercise for word sense disambiguation (Kilgarriff and
Palmer 2000). This article is a revised and extended version of our 1997 workshop paper,
reviewing its observations and proposals and discussing them in light of the SENSEVAL exercise.
It also includes a new in-depth empirical study of translingually based sense inventories
and distance measures, using statistics collected from native-speaker annotations of 222
polysemous contexts across 12 languages. These data show that monolingual sense distinctions
at most levels of granularity can be effectively captured by translations into some set of second
languages, especially as language family distance increases. In addition, the probability that
a given sense pair will tend to lexicalize differently across languages is shown to correlate
with semantic salience and sense granularity; sense hierarchies automatically generated from
such distance matrices yield results remarkably similar to those created by professional
monolingual lexicographers.
This paper presents a system for automatic verb sense disambiguation using a small corpus
and a Machine-Readable Dictionary (MRD) in Korean. The system learns a set of typical uses
listed in the MRD usage examples for each of the senses of a polysemous verb in the MRD
definitions using verb-object co-occurrences acquired from the corpus. This paper addresses the problem of data sparseness in two ways. First, by extending word similarity measures from direct co-occurrences to co-occurrences of co-occurring words, we can compute similarities even for words that do not co-occur directly, using clusters of co-occurring words. Secondly, we acquire IS-A relations of nouns from the MRD definitions; identifying these IS-A relationships makes it possible to cluster the nouns roughly. Using these methods, two words may be considered
similar even if they do not share any word elements. Experiments show that this method
can learn from a very small training corpus, achieving over an 86% correct disambiguation
performance without any restriction on a word's senses.
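The general idea of judging words similar through the words they co-occur with, rather than through direct co-occurrence, can be sketched as follows; the verb-object pairs and the cosine measure are invented stand-ins for the paper's Korean resources and similarity measures.

```python
# A minimal sketch of similarity via shared co-occurrence contexts: two nouns
# that never co-occur with each other can still be judged similar if the verbs
# they each co-occur with overlap. The verb-object pairs are invented.
from collections import defaultdict
from math import sqrt

# (verb, object-noun) co-occurrence pairs, as might be extracted from a corpus.
pairs = [("drink", "coffee"), ("drink", "tea"), ("brew", "coffee"),
         ("brew", "tea"), ("drink", "water"), ("drive", "car"),
         ("park", "car"), ("drive", "truck")]

vectors = defaultdict(lambda: defaultdict(int))
for verb, noun in pairs:
    vectors[noun][verb] += 1   # profile of each noun: the verbs it occurs with

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# 'tea' and 'coffee' never co-occur with each other, yet share verb contexts.
print(round(cosine(vectors["tea"], vectors["coffee"]), 2))   # 1.0
print(round(cosine(vectors["tea"], vectors["car"]), 2))      # 0.0
```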