We characterize the impact of a linear $\beta$-reduction on the result of a control-flow analysis. (By ‘a linear $\beta$-reduction’ we mean the $\beta$-reduction of a linear $\lambda$-abstraction, i.e., of a $\lambda$-abstraction whose parameter occurs exactly once in its body.) As a corollary, we consider the administrative reductions of a Plotkin-style transformation into Continuation-Passing Style (CPS), and how they affect the result of a constraint-based control-flow analysis and, in particular, the least element in the space of solutions. We show that administrative reductions preserve the least solution. Preservation of least solutions solves a problem that was left open in Palsberg and Wand's article ‘CPS Transformation of Flow Information.’ Together, Palsberg and Wand's article and the present article show how to map in linear time the least solution of the flow constraints of a program into the least solution of the flow constraints of the CPS counterpart of this program, after administrative reductions. Furthermore, we show how to CPS transform control-flow information in one pass.
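To make the terminology concrete (our own illustration, not an excerpt from the paper): in the redex $(\lambda x.\, x\ N)\ M \to_\beta M\ N$ the parameter $x$ occurs exactly once in the body, so contracting the redex neither duplicates nor discards the argument $M$; this is a linear $\beta$-reduction. In a Plotkin-style call-by-value CPS transformation, a variable is translated as $\overline{x} = \lambda k.\, k\ x$, so applying this translation to a continuation $K$ yields the administrative redex $(\lambda k.\, k\ x)\ K \to_\beta K\ x$, which is also linear because $k$ occurs exactly once in the body.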
We consider the question of how a Continuation-Passing-Style (CPS) transformation changes the flow analysis of a program. We present an algorithm that takes the least solution to the flow constraints of a program and constructs in linear time the least solution to the flow constraints for the CPS-transformed program. Previous studies of this question used CPS transformations that had the effect of duplicating code, or of introducing flow sensitivity into the analysis. Our algorithm has the property that for a program point in the original program and the corresponding program point in the CPS-transformed program, the flow information is the same. By carefully avoiding both duplicated code and flow-sensitive analysis, we find that the most accurate analysis of the CPS-transformed program is neither better nor worse than the most accurate analysis of the original. Thus a compiler that needed flow information after CPS transformation could use the flow information from the original program to annotate some program points, and it could use our algorithm to find the rest of the flow information quickly, rather than having to analyze the CPS-transformed program.
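For readers unfamiliar with the transformation under discussion, here is a minimal sketch of a naive Plotkin-style call-by-value CPS transform. It is our own illustration, not the authors' algorithm; the term representation and helper names are invented, and none of the flow-constraint machinery is reproduced. Note how every case wraps its result in an extra lambda over a continuation, which is where the administrative redexes mentioned above come from.

```python
# Illustrative Plotkin-style call-by-value CPS transform for the untyped
# lambda calculus (before administrative reductions).  Term encoding is ours:
#   ('var', x) | ('lam', x, body) | ('app', f, a)
# We assume source variable names never collide with the generated k*/m*/n* names.
import itertools

_fresh = itertools.count()

def fresh(prefix):
    return f"{prefix}{next(_fresh)}"

def cps(term):
    """Return the CPS counterpart of `term`."""
    k = fresh('k')
    tag = term[0]
    if tag == 'var':
        # [[x]] = \k. k x      -- applying this to a continuation is an administrative redex
        return ('lam', k, ('app', ('var', k), term))
    if tag == 'lam':
        _, x, body = term
        # [[\x.M]] = \k. k (\x. [[M]])
        return ('lam', k, ('app', ('var', k), ('lam', x, cps(body))))
    if tag == 'app':
        _, f, a = term
        m, n = fresh('m'), fresh('n')
        # [[M N]] = \k. [[M]] (\m. [[N]] (\n. (m n) k))
        return ('lam', k,
                ('app', cps(f),
                 ('lam', m,
                  ('app', cps(a),
                   ('lam', n,
                    ('app', ('app', ('var', m), ('var', n)), ('var', k)))))))
    raise ValueError(f"unknown term: {term!r}")

if __name__ == '__main__':
    # CPS-transform (\x. x) y and print the resulting term.
    print(cps(('app', ('lam', 'x', ('var', 'x')), ('var', 'y'))))
```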
We present functional implementations of Koda and Ruskey's algorithm for generating all ideals of a forest poset as a Gray code. Using a continuation-based approach, we give an extremely concise formulation of the algorithm's core. Then, in a number of steps, we derive a first-order version whose efficiency is comparable to that of a C implementation given by Knuth.
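As a point of reference for what is being generated (this is not the continuation-based formulation from the paper, and it does not produce a Gray-code ordering), the following naive Python sketch enumerates the ideals of a forest poset, where a node may be included only if its parent is included:

```python
# Naive enumeration of the ideals of a forest poset: the combinatorial objects
# that Koda and Ruskey's algorithm lists as a Gray code.  This sketch is only
# for orientation; it is unrelated to the paper's continuation-based version.
from itertools import product

# A forest is a list of trees; a tree is (label, forest_of_children).
def tree_ideals(tree):
    label, children = tree
    yield frozenset()                       # root excluded: no descendant may appear
    for ideal in forest_ideals(children):   # root included: any ideal of the children
        yield frozenset({label}) | ideal

def forest_ideals(forest):
    # An ideal of a forest is the union of one ideal chosen from each tree.
    for combo in product(*(list(tree_ideals(t)) for t in forest)):
        yield frozenset().union(*combo)

if __name__ == '__main__':
    forest = [('a', [('b', []), ('c', [])]), ('d', [])]
    for ideal in forest_ideals(forest):
        print(sorted(ideal))
```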
Textual question answering is a technique for extracting, from a document or document collection, a sentence or text snippet that responds directly to a query. Open-domain textual question answering presupposes that questions are natural and unrestricted with respect to topic. The question answering (Q/A) techniques, as embodied in today's systems, can be roughly divided into two types: (1) techniques for Information Seeking (IS), which localize the answer in vast document collections; and (2) techniques for Reading Comprehension (RC), which answer a series of questions related to a given document. Although these two types of techniques and systems are different, it is desirable to combine them to enable more advanced forms of Q/A. This paper discusses an approach that successfully enhanced an existing IS system with RC capabilities. This enhancement is important because advanced Q/A, as exemplified by the ARDA AQUAINT program, is moving towards Q/A systems that incorporate semantic and pragmatic knowledge enabling dialogue-based Q/A. Because today's RC systems involve a short series of questions in context, they represent a rudimentary form of interactive Q/A that constitutes a possible foundation for more advanced forms of dialogue-based Q/A.
This study aims to improve the performance of identifying grammatical functions between an adnoun clause and a noun phrase in Korean. The key task is to determine the relation between the two constituents in terms of functional categories such as subject, object, adverbial, and appositive. The problem is mainly caused by the fact that the functional morphemes considered crucial for identifying the relation are omitted from the noun phrases. To tackle this problem, we propose to employ Support Vector Machines (SVMs) to determine the grammatical functions. Through an experiment with a tagged corpus for training the SVMs, we found the proposed model to be more useful than both the Maximum Entropy Model (MEM) and the backed-off model.
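A hypothetical sketch of such a classification setup, using scikit-learn rather than the authors' SVM implementation; the feature names and training pairs below are invented for illustration and are not the feature set used in the paper:

```python
# Hypothetical illustration: classify the grammatical function of a noun phrase
# with respect to an adnoun clause, using an SVM over symbolic features.
# Feature names, values, and labels are invented; the paper trains on a tagged
# Korean corpus with its own feature set.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Each instance: features of one (adnoun clause, noun phrase) pair.
train_X = [
    {'head_noun': 'saram', 'clause_verb': 'mannada', 'np_has_case_marker': False},
    {'head_noun': 'chaek', 'clause_verb': 'ikda',    'np_has_case_marker': False},
]
train_y = ['subject', 'object']   # grammatical function of the NP in the clause

model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(train_X, train_y)

print(model.predict([{'head_noun': 'sasil', 'clause_verb': 'mannada',
                      'np_has_case_marker': False}]))
```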
This paper has two purposes. First, it suggests a formal approach for specifying and verifying lingware. This approach is based on a unified notation of the main existing formalisms for describing linguistic knowledge (i.e. Formal Grammars, Unification Grammars, HPSG, etc.) on the one hand, and the integration of data and processing on the other. Accordingly, a lingware specification includes all related aspects in a unified framework. This facilitates the development of a lingware system, since one has to follow a single development process instead of two separate ones. Secondly, it presents an environment for the formal specification of lingware, based on the suggested approach, which is neither restricted to a particular kind of application nor to a particular class of linguistic formalisms. This environment provides interfaces enabling the specification of both linguistic knowledge and functional aspects of a lingware system. Linguistic knowledge is specified with the usual grammatical formalisms, whereas functional aspects are specified with a suitable formal notation. Both descriptions will be integrated into the same framework to obtain a complete requirement specification that can be refined towards an executable program.
In this paper, we describe a system for coreference resolution and emphasize the role of evaluation for its design. The goal of the system is to group referring expressions (identified beforehand in narrative texts) into sets of coreferring expressions that correspond to discourse entities. Several knowledge sources are distinguished, such as referential compatibility between a referring expression and a discourse entity, activation factors for discourse entities, size of working memory, or meta-rules for the creation of discourse entities. For each of them, the theoretical analysis of its relevance is compared to scores obtained through evaluation. After looping through all knowledge sources, an optimal behavior is chosen, then evaluated on test data. The paper also discusses evaluation measures as well as data annotation, and compares the present approach to others in the field.
Constraint-based reasoning is often used to represent and find solutions to configuration problems. In the field of constraint satisfaction, the major focus has been on finding solutions to difficult problems. However, many real-life configuration problems, although not extremely complicated, have a huge number of solutions, few of which are acceptable from a practical standpoint. In this paper we present a value ordering heuristic for constraint solving that attempts to guide search toward solutions that are acceptable. More specifically, by considering weights that are assigned to values and sets of values, the heuristic can guide search toward solutions for which the total weight is within an acceptable interval. Experiments with random constraint satisfaction problems demonstrate that, when a problem has numerous solutions, the heuristic makes search extremely efficient even when there are relatively few solutions that fall within the interval of acceptable weights. In these cases, an algorithm that is very effective for finding a feasible solution to a given constraint satisfaction problem (the “maintained arc consistency” algorithm or MAC) does not find a solution in the same weight interval within a reasonable time when it is run without the heuristic.
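The following small Python sketch (our own illustration, not the authors' heuristic or experimental setup) shows the general idea: a backtracking search whose value ordering tries first the values that keep the accumulated weight on course for a target interval, with the interval itself checked once an assignment is complete.

```python
# Toy backtracking search with a weight-aware value-ordering heuristic.
# Variables, domains, weights, and constraints are invented example data.
def solve(domains, weights, constraints, low, high, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(domains):
        total = sum(weights[v][assignment[v]] for v in assignment)
        return assignment if low <= total <= high else None
    var = next(v for v in domains if v not in assignment)
    current = sum(weights[v][assignment[v]] for v in assignment)
    remaining = len(domains) - len(assignment) - 1
    target = (low + high) / 2
    # Heuristic: try first the values whose weight is closest to an even share
    # of the remaining "budget" toward the middle of the acceptable interval.
    ideal = (target - current) / (remaining + 1)
    for val in sorted(domains[var], key=lambda v: abs(weights[var][v] - ideal)):
        new = dict(assignment, **{var: val})
        if all(check(new) for check in constraints):
            result = solve(domains, weights, constraints, low, high, new)
            if result is not None:
                return result
    return None

if __name__ == '__main__':
    domains = {'x': [1, 2, 3], 'y': [1, 2, 3], 'z': [1, 2, 3]}
    weights = {v: {1: 10, 2: 20, 3: 30} for v in domains}
    constraints = [lambda a: not ('x' in a and 'y' in a) or a['x'] != a['y']]
    print(solve(domains, weights, constraints, low=55, high=65))
```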
Configuration problems often involve large product catalogs, and the given user requests can be met by many different kinds of parts from this catalog. Hence, configuration problems are often weakly constrained and have many solutions. However, many of those solutions may be discarded by the user as long as more interesting solutions are possible. The user often prefers certain choices to others (e.g., a red color for a car to a blue color) or prefers solutions that minimize or maximize certain criteria such as price and quality. In order to provide satisfactory solutions, a configurator needs to address user preferences and user wishes. Another important problem is to provide high-level features to control different reasoning tasks such as solution search, explanation, consistency checking, and reconfiguration. We address those problems by introducing a preference programming system that provides a new paradigm for expressing user preferences and user wishes and provides search strategies in a declarative and unified way, such that they can be embedded in a constraint and rule language. The preference programming approach is completely open and dynamic. In fact, preferences can be assembled from different sources such as business rules, databases, annotations of the object model, or user input. An advanced topic is to elicit preferences from user interactions, especially from explanations of why a user rejects proposed choices. Our preference programming system has successfully been used in different configuration domains such as loan configuration, service configuration, and other problems.
In the automotive industry, the compilation and maintenance of correct product configuration data is a complex task. Our work shows how formal methods can be applied to the validation of such business-critical data. Our consistency support tool BIS works on an existing database of Boolean constraints expressing valid configurations and their transformation into manufacturable products. Using a specially modified satisfiability checker with an explanation component, BIS can detect inconsistencies in the constraint set and thus help increase the quality of the product data. BIS also supports manufacturing decisions by calculating the implications of product or production environment changes on the set of required parts. In this paper, we give a comprehensive account of BIS: the formalization of the business processes underlying its construction, the modifications of satisfiability-checking technology we found necessary in this context, and the software technology used to package the product as a client–server information system.
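As a toy illustration of the two kinds of support described above, consistency checking and computing the parts implied by a choice, the following Python sketch uses brute force over a handful of invented Boolean options; an industrial tool such as BIS instead relies on a satisfiability checker over a large constraint database.

```python
# Toy model of configuration validation: option names and constraints are
# invented; real product data involves thousands of Boolean variables.
from itertools import product

OPTIONS = ['diesel', 'towbar', 'sport_pack']

# Constraints expressing valid configurations (each maps an assignment to bool).
CONSTRAINTS = [
    lambda a: not a['sport_pack'] or not a['diesel'],   # sport pack excludes diesel
    lambda a: not a['towbar'] or a['diesel'],            # towbar requires diesel
]

def valid_configurations():
    for values in product([False, True], repeat=len(OPTIONS)):
        a = dict(zip(OPTIONS, values))
        if all(c(a) for c in CONSTRAINTS):
            yield a

def consistent():
    """Consistency check: is at least one configuration buildable at all?"""
    return any(True for _ in valid_configurations())

def required_parts(fixed):
    """Options forced to True in every valid configuration extending `fixed`."""
    configs = [a for a in valid_configurations()
               if all(a[k] == v for k, v in fixed.items())]
    return [o for o in OPTIONS if configs and all(a[o] for a in configs)]

if __name__ == '__main__':
    print(consistent())                      # True: the constraint set is satisfiable
    print(required_parts({'towbar': True}))  # ['diesel', 'towbar']
```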
The paper introduces and discusses the notion of decomposition of a configuration problem within the framework of a structured logical approach. The paper describes under which conditions a given configuration problem can be decomposed into a set of noninteracting subproblems and how to exploit such a decomposition, both for improving the performance of the configurator and for supporting interactive configuration. Different kinds of decomposition are considered, but all of them exploit, as much as possible, the explicit representation of the partonomic relations in the language, a KL-One-like representation formalism augmented with constraints for expressing complex interrole relations. The paper introduces a notion of boundness among constraints, which is used for formally specifying different types of decomposition. One decomposition strategy aims at singling out the components and subcomponents that are directly related to the constraints imposed by the user's requirements; the configurator exploits such a decomposition by first configuring that portion of the product and then configuring the parts that are not related to the user's requirements. Another decomposition strategy verifies whether the set of constraints for the product to be configured can be split into a set of noninteracting problems. In such a case the configurator solves the configuration problem by splitting the whole search space into a set of smaller search spaces. Different combinations of these two decomposition techniques are considered, and the impact of the decomposition strategies on the performance of the configurator is evaluated via a set of experiments using the configuration of computer systems as a test bed. The results of the experiments show a significant reduction of the computational effort (both in the number of backtracks and in CPU time) when decomposition strategies are used.
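The second decomposition strategy, splitting the constraint set into noninteracting subproblems, can be illustrated as computing the connected components of the constraint graph; the following small Python sketch (ours, with toy data) shows the idea.

```python
# If the constraint graph falls apart into connected components, each component
# can be configured independently.  Variables and constraint scopes are toy data.
from collections import defaultdict

def decompose(variables, constraints):
    """Split variables into groups that share no constraint (connected components)."""
    adj = defaultdict(set)
    for scope in constraints:          # each constraint is the set of variables it mentions
        scope = list(scope)
        for v in scope:
            adj[v].update(scope)
    seen, components = set(), []
    for v in variables:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        components.append(comp)
    return components

if __name__ == '__main__':
    variables = ['cpu', 'mainboard', 'ram', 'monitor', 'cable']
    constraints = [{'cpu', 'mainboard'}, {'mainboard', 'ram'}, {'monitor', 'cable'}]
    print(decompose(variables, constraints))
    # Two independent subproblems: {cpu, mainboard, ram} and {monitor, cable}.
```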
Today's economy exhibits a growing trend toward highly specialized solution providers cooperatively offering configurable products and services to their customers. This paradigm shift requires the extension of current standalone configuration technology with capabilities of knowledge sharing and distributed problem solving. In this context a standardized configuration knowledge representation language with formal semantics is needed in order to support knowledge interchange between different configuration environments. Languages such as Ontology Inference Layer (OIL) and DARPA Agent Markup Language (DAML+OIL) are based on such formal semantics (description logic) and are very popular for knowledge representation in the Semantic Web. In this paper we analyze the applicability of those languages with respect to configuration knowledge representation and discuss additional demands on expressivity. For joint configuration problem solving it is necessary to agree on a common problem definition. Therefore, we give a description logic based definition of a configuration problem and show its equivalence with existing consistency-based definitions, thus joining the two major streams in knowledge-based configuration (description logics and predicate logic/constraint based configuration).
The configuration task is commonly defined as composing a complex product from a set of predefined component types while taking into account a set of well-defined restrictions on how components belonging to these types can be combined. Configuration, a successful artificial intelligence (AI) application area ever since the R1/XCON system of the early 1980s, has recently attracted renewed research interest. This is demonstrated, for example, by an annual series of workshops on configuration held at the AAAI, ECAI, and IJCAI conferences since 1999. Important real-world industrial configuration tasks are encountered in marketing, manufacturing, and design. They usually involve physical products, such as telecommunication switches, computers, elevators, large diesel engines, automation systems, or vehicles (some of which appear as application domains in the articles in this issue), but can also pertain to financial or other services or software.
The main issue when building Information Extraction (IE) systems is how to obtain the knowledge needed to identify relevant information in a document. Most approaches require expert human intervention in many steps of the acquisition process. In this paper we describe ESSENCE, a new method for acquiring IE patterns that significantly reduces the need for human intervention. The method is based on ELA, a specifically designed learning algorithm for acquiring IE patterns without tagged examples. The distinctive features of ESSENCE and ELA are that (1) they permit the automatic acquisition of IE patterns from unrestricted and untagged text representative of the domain, due to (2) their ability to identify regularities around semantically relevant concept-words for the IE task by (3) using non-domain-specific lexical knowledge tools such as WordNet, and (4) restricting the human intervention to defining the task, and validating and typifying the set of IE patterns obtained. Since ESSENCE does not require a corpus annotated with the type of information to be extracted and it uses a general purpose ontology and widely applied syntactic tools, it reduces the expert effort required to build an IE system and therefore also reduces the effort of porting the method to any domain. The results of the application of ESSENCE to the acquisition of IE patterns in an MUC-like task are shown.
Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously.
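To give a flavour of lexicon-free, statistics-driven segmentation, the following Python sketch places word boundaries where character-bigram cohesion dips to a local minimum; this generic heuristic is not the algorithm proposed in the paper and is shown only for illustration.

```python
# Deliberately simple segmentation from unsegmented training data: count
# character bigrams, then insert a boundary wherever the bigram spanning a
# position is rarer than both of its neighbouring bigrams (a cohesion dip).
from collections import Counter

def train_bigrams(corpus_lines):
    counts = Counter()
    for line in corpus_lines:
        counts.update(line[i:i + 2] for i in range(len(line) - 1))
    return counts

def segment(text, bigrams):
    words, start = [], 0
    for i in range(1, len(text) - 1):
        left = bigrams[text[i - 1:i + 1]]    # bigram spanning the candidate boundary
        right = bigrams[text[i:i + 2]]
        prev = bigrams[text[i - 2:i]] if i >= 2 else float('inf')
        if left < prev and left < right:     # cohesion dips: cut between i-1 and i
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words

if __name__ == '__main__':
    corpus = ['thecatsatonthemat', 'thedogsatonthelog', 'thecatandthedog']
    model = train_bigrams(corpus)
    print(segment('thecatonthelog', model))
```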
This paper presents the results of a study on information extraction from unrestricted Turkish text using statistical language processing methods. In languages like English, a given root word yields only a small number of possible word forms. Languages like Turkish, however, have very productive agglutinative morphology, so building statistical models for specific tasks over the surface forms of words is difficult, mainly because of data sparseness. To alleviate this problem, we used additional syntactic information, namely the morphological structure of the words. We have successfully applied statistical methods using both lexical and morphological information to sentence segmentation, topic segmentation, and name tagging. For sentence segmentation, we modeled the final inflectional groups of the words, combined this model with the lexical model, and decreased the error rate to 4.34%, which is 21% better than the result obtained using only the surface forms of the words. For topic segmentation, the stems of the words (especially nouns) proved more effective than their surface forms, and we achieved a 10.90% segmentation error rate on our test set according to the weighted TDT-2 segmentation cost metric, which is 32% better than the word-based baseline model. For name tagging, we used four different information sources to model names. Our first information source is based on the surface forms of the words. We then combined contextual cues with the lexical model and obtained some improvement. After this, we modeled the morphological analyses of the words, and finally we modeled the tag sequence, reaching an F-measure of 91.56% according to the MUC evaluation criteria. Our results are important in that using linguistic information, i.e. the morphological analyses of the words, together with a corpus large enough to train a statistical model significantly improves these basic information extraction tasks for Turkish.
This paper starts by introducing a class of future document authoring systems that will allow authors to specify the content and form of a text+pictures document at a high level of abstraction, while leaving responsibility for linguistic and graphical details to the system. Next, we describe two working prototypes that implement parts of this functionality, based on semantic modeling of the pictures and the text of the document; one of these two, the ILLUSTRATE prototype, is a multimedia extension of previous text authoring systems in the What You See Is What You Meant (WYSIWYM) tradition. The paper concludes with an exploration of the ways in which Multimedia WYSIWYM can be further enhanced, allowing it to approximate the ‘ideal’ systems that were sketched earlier in the paper. Applications of Multimedia WYSIWYM to general-purpose picture retrieval (in the context of the Semantic Web, for example) are also discussed.
This paper describes a large, interactive sound installation that was presented in Oslo during October 2002. The installation, in broad terms, brought the presence of the whole country into one location through sound, and made that sound available to the public as material to play with or to explore in a more structured fashion. The sonic results were streamed to the Internet, together with images from the exterior of the installation. The installation was located at the central train station, in an area through which thousands of people pass every day. The curatorial idea was developed by two institutions, as an answer to their missions of providing interesting sonic material and events for the whole country. The idea was given concrete form by three composers and brought to a national level through co-arrangement with a large festival of contemporary music. Funding for the installation was provided by both private and public organisations. The installation serves as an example of how a large and complex work of art can be developed through institutional curatorial effort, artistic intentions and activity, and commercial interests. The installation maintained a high degree of artistic integrity while remaining accessible and attractive to large audiences.