Search results for Artificial Intelligence and Natural Language Processing

A bootstrapping approach for robust topic analysis
OLIVIER FERRET, BRIGITTE GRAU
Journal:

Natural Language Engineering / Volume 8 / Issue 2-3 / June 2002

Published online by Cambridge University Press:

21 August 2002, pp. 209-233
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Topic analysis is important for many applications dealing with texts, such as text summarization or information extraction. However, it can be done with great precision only if it relies on structured knowledge, which is difficult to produce on a large scale. In this paper, we propose using bootstrapping to solve this problem: a first topic analysis based on a weakly structured source of knowledge, a collocation network, is used for learning explicit topic representations that then support a more precise and reliable topic analysis.

Selecting effective index terms using a decision tree
TOKUNAGA TAKENOBU, KIMURA KENJI, OGIBAYASHI HIRONORI, TANAKA HOZUMI
Journal:

Natural Language Engineering / Volume 8 / Issue 2-3 / June 2002

Published online by Cambridge University Press:

21 August 2002, pp. 193-207
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
This paper explores the effectiveness of index terms more complex than the single words used in conventional information retrieval systems. Retrieval is done in two phases: in the first, a conventional retrieval method (the Okapi system) is used; in the second, complex index terms such as syntactic relations and single words with part-of-speech information are introduced to rerank the results of the first phase. We evaluated the effectiveness of the different types of index terms through experiments using the TREC-7 test collection and 50 queries. The retrieval effectiveness was improved for 32 out of 50 queries. Based on this investigation, we then introduce a method to select effective index terms by using a decision tree. Further experiments with the same test collection showed that retrieval effectiveness was improved in 25 of the 50 queries.

Parsing engineering and empirical robustness
ROBERTO BASILI, FABIO MASSIMO ZANZOTTO
Journal:

Natural Language Engineering / Volume 8 / Issue 2-3 / June 2002

Published online by Cambridge University Press:

21 August 2002, pp. 97-120
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Robustness has been traditionally stressed as a general desirable property of any computational model and system. The human NL interpretation device exhibits this property as the ability to deal with odd sentences. However, the difficulties in a theoretical explanation of robustness within the linguistic modelling suggested the adoption of an empirical notion. In this paper, we propose an empirical definition of robustness based on the notion of performance. Furthermore, a framework for controlling the parser robustness in the design phase is presented. The control is achieved via the adoption of two principles: the modularisation, typical of the software engineering practice, and the availability of domain adaptable components. The methodology has been adopted for the production of CHAOS, a pool of syntactic modules, which has been used in real applications. This pool of modules enables a large validation of the notion of empirical robustness, on the one side, and of the design methodology, on the other side, over different corpora and two different languages (English and Italian).

Word clustering and disambiguation based on co-occurrence data
HANG LI
Journal:

Natural Language Engineering / Volume 8 / Issue 1 / March 2002

Published online by Cambridge University Press:

17 June 2002, pp. 25-42
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and conducting syntactic disambiguation by using the acquired word classes. We view the clustering problem as that of estimating a class-based probability distribution specifying the joint probabilities of word pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability model. Our clustering method is a natural extension of that proposed in Brown, Della Pietra, deSouza, Lai and Mercer (1992). We next propose a syntactic disambiguation method which combines the use of automatically constructed word classes and that of a hand-made thesaurus. The overall disambiguation accuracy achieved by our method is 88.2%, which compares favorably against the accuracies obtained by the state-of-the-art disambiguation methods.

SUMMAC: a text summarization evaluation
INDERJEET MANI, GARY KLEIN, DAVID HOUSE, LYNETTE HIRSCHMAN, THERESE FIRMIN, BETH SUNDHEIM
Journal:

Natural Language Engineering / Volume 8 / Issue 1 / March 2002

Published online by Cambridge University Press:

17 June 2002, pp. 43-68
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
The TIPSTER Text Summarization Evaluation (SUMMAC) has developed several new extrinsic and intrinsic methods for evaluating summaries. It has established definitively that automatic text summarization is very effective in relevance assessment tasks on news articles. Summaries as short as 17% of full text length sped up decision-making by almost a factor of 2 with no statistically significant degradation in accuracy. Analysis of feedback forms filled in after each decision indicated that the intelligibility of present-day machine-generated summaries is high. Systems that performed most accurately in the production of indicative and informative topic-related summaries used term frequency and co-occurrence statistics, and vocabulary overlap comparisons between text passages. However, in the absence of a topic, these statistical methods do not appear to provide any additional leverage: in the case of generic summaries, the systems were indistinguishable in accuracy. The paper discusses some of the tradeoffs and challenges faced by the evaluation, and also lists some of the lessons learned, impacts, and possible future directions. The evaluation methods used in the SUMMAC evaluation are of interest to both summarization evaluation as well as evaluation of other ‘output-related’ NLP technologies, where there may be many potentially acceptable outputs, with no automatic way to compare them.

An LVQ connectionist solution to the non-determinacy problem in Arabic morphological analysis: a learning hybrid algorithm
M. A. EL-AFFENDI
Journal:

Natural Language Engineering / Volume 8 / Issue 1 / March 2002

Published online by Cambridge University Press:

17 June 2002, pp. 3-23
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Most of the morphological properties of derivational Arabic words are encapsulated in their corresponding morphological patterns. The morphological pattern is a template that shows how the word should be decomposed into its constituent morphemes (prefix + stem + suffix), and at the same time, marks the positions of the radicals comprising the root of the word. The number of morphological patterns in Arabic is finite and is well below 1000. Due to these properties, most of the current analysis algorithms concentrate on discovering the morphological pattern of the input word as a major step in recognizing the type and category of the word. Unfortunately, this process is non-determinitic in the sense that the underlying search process may sometimes associate more than one morphological pattern with the given word, all of them satisfying the major lexical constraints. One solution to this problem is to use a collection of connectionist pattern associaters that uniquely associate each word with its corresponding morphological pattern. This paper describes an LVQ-based learning pattern association system that uniquely maps a given Arabic word to its corresponding morphological pattern, and therefore deduces its morphological properties. The system consists of a collection of hetroassociative models that are trained using the LVQ algorithm plus a collection of autoassociative models that have been trained using backpropagation. Experimental results have shown that the system is fairly accurate and very easy to train. The LVQ algorithm has been chosen because it is very easy to train and the implied training time is very small compared to that of backpropagation.

Contextually appropriate reference generation
ÖZGÜR YÜKSEL, CEM BOZSAHIN
Journal:

Natural Language Engineering / Volume 8 / Issue 1 / March 2002

Published online by Cambridge University Press:

17 June 2002, pp. 69-89
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
We describe a system for contextually appropriate anaphor and pronoun generation for Turkish. It uses binding theory and centering theory to model local and nonlocal references. We describe the rules for Turkish, and their computational treatment. A cascaded method for anaphor and pronoun generation is proposed for handling pro-drop and discourse constraints on pronominalization. The system has been tested as a stand-alone nominal expression generator, and also as a reference planning component of a transfer-based MT system.

Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. MIT Press, 2000. ISBN 0-262-13360-1. 620 pp. $64.95/£44.95 (cloth).
ADVAITH SIDDHARTHAN
Journal:

Natural Language Engineering / Volume 8 / Issue 1 / March 2002

Published online by Cambridge University Press:

17 June 2002, pp. 91-92
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation

Editorial
John Tait, Branimir Boguraev, Christian Jacquemin
Journal:

Natural Language Engineering / Volume 8 / Issue 1 / March 2002

Published online by Cambridge University Press:

17 June 2002, pp. 1-2
- Article
- - You have access
- PDF
- Export citation
This is the first issue of Volume 8, and we thought we would take this opportunity to bring readers of Natural Language Engineering up-to-date with various developments with the journal.

Discovery of inference rules for question-answering
DEKANG LIN, PATRICK PANTEL
Journal:

Natural Language Engineering / Volume 7 / Issue 4 / December 2001

Published online by Cambridge University Press:

15 February 2002, pp. 343-360
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
One of the main challenges in question-answering is the potential mismatch between the expressions in questions and the expressions in texts. While humans appear to use inference rules such as ‘X writes Y’ implies ‘X is the author of Y’ in answering questions, such rules are generally unavailable to question-answering systems due to the inherent difficulty in constructing them. In this paper, we present an unsupervised algorithm for discovering inference rules from text. Our algorithm is based on an extended version of Harris’ Distributional Hypothesis, which states that words that occurred in the same contexts tend to be similar. Instead of using this hypothesis on words, we apply it to paths in the dependency trees of a parsed corpus. Essentially, if two paths tend to link the same set of words, we hypothesize that their meanings are similar. We use examples to show that our system discovers many inference rules easily missed by humans.

The TREC question answering track
ELLEN M. VOORHEES
Journal:

Natural Language Engineering / Volume 7 / Issue 4 / December 2001

Published online by Cambridge University Press:

14 February 2002, pp. 361-378
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
The Text REtrieval Conference (TREC) question answering track is an effort to bring the benefits of large-scale evaluation to bear on a question answering (QA) task. The track has run twice so far, first in TREC-8 and again in TREC-9. In each case, the goal was to retrieve small snippets of text that contain the actual answer to a question rather than the document lists traditionally returned by text retrieval systems. The best performing systems were able to answer about 70% of the questions in TREC-8 and about 65% of the questions in TREC-9. While the 65% score is a slightly worse result than the TREC-8 scores in absolute terms, it represents a very significant improvement in question answering systems. The TREC-9 task was considerably harder than the TREC-8 task because TREC-9 used actual users’ questions while TREC-8 used questions constructed for the track. Future tracks will continue to challenge the QA community with more difficult, and more realistic, question answering tasks.

Complex answers: a case study using a WWW question answering system
S. BUCHHOLZ, W. DAELEMANS
Journal:

Natural Language Engineering / Volume 7 / Issue 4 / December 2001

Published online by Cambridge University Press:

14 February 2002, pp. 301-323
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
We investigate the problem of complex answers in question answering. Complex answers consist of several simple answers. We describe the online question answering system SHAPAQA, and using data from this system we show that the problem of complex answers is quite common. We define nine types of complex questions, and suggest two approaches, based on answer frequencies, that allow question answering systems to tackle the problem.

Natural language question answering: the view from here
L. HIRSCHMAN, R. GAIZAUSKAS
Journal:

Natural Language Engineering / Volume 7 / Issue 4 / December 2001

Published online by Cambridge University Press:

14 February 2002, pp. 275-300
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
As users struggle to navigate the wealth of on-line information now available, the need for automated question answering systems becomes more urgent. We need systems that allow a user to ask a question in everyday language and receive an answer quickly and succinctly, with sufficient context to validate the answer. Current search engines can return ranked lists of documents, but they do not deliver answers to the user.
Question answering systems address this problem. Recent successes have been reported in a series of question-answering evaluations that started in 1999 as part of the Text Retrieval Conference (TREC). The best systems are now able to answer more than two thirds of factual questions in this evaluation.

Analyses for elucidating current question answering technology
MARC LIGHT, GIDEON S. MANN, ELLEN RILOFF, ERIC BRECK
Journal:

Natural Language Engineering / Volume 7 / Issue 4 / December 2001

Published online by Cambridge University Press:

14 February 2002, pp. 325-342
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
In this paper, we take a detailed look at the performance of components of an idealized question answering system on two different tasks: the TREC Question Answering task and a set of reading comprehension exams. We carry out three types of analysis: inherent properties of the data, feature analysis, and performance bounds. Based on these analyses we explain some of the performance results of the current generation of Q/A systems and make predictions on future work. In particular, we present four findings: (1) Q/A system performance is correlated with answer repetition; (2) relative overlap scores are more effective than absolute overlap scores; (3) equivalence classes on scoring functions can be used to quantify performance bounds; and (4) perfect answer typing still leaves a great deal of ambiguity for a Q/A system because sentences often contain several items of the same type.

A corpus-based approach for Korean nominal compound analysis based on linguistic and statistical information
JUNTAE YOON, KEY-SUN CHOI, MANSUK SONG
Journal:

Natural Language Engineering / Volume 7 / Issue 3 / September 2001

Published online by Cambridge University Press:

29 August 2001, pp. 251-270
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
The syntactic structure of a nominal compound must be analyzed first for its semantic interpretation. In addition, the syntactic analysis of nominal compounds is very useful for NLP application such as information extraction, since a nominal compound often has a similar linguistic structure with a simple sentence, as well as representing concrete and compound meaning of an object with several nouns combined. In this paper, we present a novel model for structural analysis of nominal compounds using linguistic and statistical knowledge which is coupled based on lexical information. That is, the syntactic relations defined between nouns (complement-predicate and modifier-head relation) are obtained from large corpora and again used to analyze the structures of nominal compounds and identify the underlying relations between nouns. Experiments show that the model gives good results, and can be effectively used for application systems which do not require deep semantic information.

Ehud Reiter and Robert Dale. Building Natural Language Generation Systems. Cambridge University Press, 2000. $64.95/£37.50 (Hardback). 234 pages
ADVAITH SIDDHARTHAN
Journal:

Natural Language Engineering / Volume 7 / Issue 3 / September 2001

Published online by Cambridge University Press:

29 August 2001, pp. 271-274
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation

Scalable generation of texts using causal and temporal expansions of sentences
YLLIAS CHALI
Journal:

Natural Language Engineering / Volume 7 / Issue 3 / September 2001

Published online by Cambridge University Press:

29 August 2001, pp. 191-205
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
This paper presents a exible bottom-up process to incrementally generate several versions of the same text, building up the core text from its kernel version into other versions varying of the levels of details. We devise a method for identifying the question/answer relations holding between the propositions of a text, we give rules for characterizing the kernel version of a text, and we provide a procedure, based on causal and temporal expansions of sentences, which distinguishes semantically these levels of details according to their importance. This is based on the assumption that we have a stock of information from the interpreter's knowledge base available. The sentence expansion operation is formally defined according to three principles: (1) the kernel principle ensures to obtain the gist information; (2) the expansion principle defines an incremental augmentation of a text; and (3) the subsume principle defines an importance-based order among the possible details of the information. The system developed allows users to generate in a follow-up way their own text version which meets their expectations and their demands expressed as questions about the text under consideration.

Applied morphological processing of English
GUIDO MINNEN, JOHN CARROLL, DARREN PEARCE
Journal:

Natural Language Engineering / Volume 7 / Issue 3 / September 2001

Published online by Cambridge University Press:

29 August 2001, pp. 207-223
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
We describe two newly developed computational tools for morphological processing: a program for analysis of English inflectional morphology, and a morphological generator, automatically derived from the analyser. The tools are fast, being based on finite-state techniques, have wide coverage, incorporating data from various corpora and machine readable dictionaries, and are robust, in that they are able to deal effectively with unknown words. The tools are freely available. We evaluate the accuracy and speed of both tools and discuss a number of practical applications in which they have been put to use.

Inderjeet Mani and Mark T. Maybury (eds). Advances in Automatic Text Summarization. MIT Press, 1999. ISBN 0-262-13359-8. 442 pp. $47.95/£32.95 (paperback).
ADVAITH SIDDHARTHAN
Journal:

Natural Language Engineering / Volume 7 / Issue 3 / September 2001

Published online by Cambridge University Press:

29 August 2001, pp. 271-274
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation

ILEX: an architecture for a dynamic hypertext generation system
M. O'DONNELL, C. MELLISH, J. OBERLANDER, A. KNOTT
Journal:

Natural Language Engineering / Volume 7 / Issue 3 / September 2001

Published online by Cambridge University Press:

29 August 2001, pp. 225-250
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Generating text in a hypermedia environment places different demands on a text generation system than occurs in non-interactive environments. This paper describes some of these demands, then shows how the architecture of one text generation system, ILEX, has been shaped by them. The architecture is described in terms of the levels of linguistic representation used, and the processes which map between them. Particular attention is paid to the processes of content selection and text structuring.

Artificial Intelligence and Natural Language Processing

Refine search

Refine search

Actions for selected content:

3241 results in Artificial Intelligence and Natural Language Processing

A bootstrapping approach for robust topic analysis

Selecting effective index terms using a decision tree

Parsing engineering and empirical robustness

Word clustering and disambiguation based on co-occurrence data

SUMMAC: a text summarization evaluation

An LVQ connectionist solution to the non-determinacy problem in Arabic morphological analysis: a learning hybrid algorithm

Contextually appropriate reference generation

Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. MIT Press, 2000. ISBN 0-262-13360-1. 620 pp. $64.95/£44.95 (cloth).

Editorial

Discovery of inference rules for question-answering

The TREC question answering track

Complex answers: a case study using a WWW question answering system

Natural language question answering: the view from here

Analyses for elucidating current question answering technology

A corpus-based approach for Korean nominal compound analysis based on linguistic and statistical information

Ehud Reiter and Robert Dale. Building Natural Language Generation Systems. Cambridge University Press, 2000. $64.95/£37.50 (Hardback). 234 pages

Scalable generation of texts using causal and temporal expansions of sentences

Applied morphological processing of English

Inderjeet Mani and Mark T. Maybury (eds). Advances in Automatic Text Summarization. MIT Press, 1999. ISBN 0-262-13359-8. 442 pp. $47.95/£32.95 (paperback).

ILEX: an architecture for a dynamic hypertext generation system

Artificial Intelligence and Natural Language Processing

Refine search

Refine search

Actions for selected content:

Save Search

3241 results in Artificial Intelligence and Natural Language Processing