Operating system command languages assist the user in executing commands for a significant number of common everyday tasks. Similarly, the introduction of textual command languages for robots has provided the opportunity to perform some important functions that lead-through programming cannot readily accomplish. However, such command languages assume the user is expert enough to carry out a specific task in these application domains. In contrast, a natural language interface to such command languages, apart from being able to be integrated into a future speech interface, can facilitate and broaden the use of these command languages to a larger audience. In this paper, advanced techniques are presented for an adaptive natural language interface that can (a) be ported to a large range of command languages, (b) handle even complex commands thanks to an embedded linguistic parser, and (c) be expanded and customized, providing the casual user with the opportunity to specify some types of new words, and the system developer with the ability to introduce new tasks in these application domains. Finally, to demonstrate the above techniques in practice, an example of their application to a Greek natural language interface to the MS-DOS operating system is given.
Words unknown to the lexicon present a substantial problem to part-of-speech tagging. In this paper we present a technique for fully unsupervised acquisition of rules which guess possible parts of speech for unknown words. This technique does not require specially prepared training data, using instead the lexicon supplied with a tagger and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules and ending-guessing rules. The acquisition process is closely tied to a guessing-rule evaluation methodology dedicated solely to the performance of part-of-speech guessers. Using the proposed technique, a guessing-rule induction experiment was performed on the Brown Corpus data; the resulting rule-sets achieved highly competitive performance and were compared with the state of the art. To evaluate the impact of the word-guessing component on overall tagging performance, it was integrated into a stochastic and a rule-based tagger and applied to texts with unknown words.
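The idea behind ending-guessing rules can be illustrated with a minimal sketch. This is not the paper's actual induction or scoring procedure; the lexicon, thresholds and tag names below are hypothetical.

```python
from collections import defaultdict

def induce_ending_rules(lexicon, max_len=4, min_count=5):
    """Induce ending-guessing rules from a tagged lexicon: map word
    endings (up to max_len characters) to the relative frequencies of
    the parts of speech observed with them."""
    counts = defaultdict(lambda: defaultdict(int))
    for word, tags in lexicon.items():
        for n in range(1, min(max_len, len(word)) + 1):
            for tag in tags:
                counts[word[-n:]][tag] += 1
    rules = {}
    for ending, tag_counts in counts.items():
        total = sum(tag_counts.values())
        if total >= min_count:  # keep only well-attested endings
            rules[ending] = {t: c / total for t, c in tag_counts.items()}
    return rules

def guess_pos(word, rules, max_len=4):
    """Guess possible POS tags for an unknown word by matching its
    longest ending that has an induced rule."""
    for n in range(min(max_len, len(word)), 0, -1):
        if word[-n:] in rules:
            return rules[word[-n:]]
    return {}
```

A guesser induced this way needs no annotated text: the tagger's own lexicon plays the role of training data, which is the unsupervised aspect the abstract emphasizes.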
This paper describes NL-OOPS, a CASE tool that supports requirements analysis by generating object-oriented models from natural language requirements documents. Full natural language analysis is obtained by using the natural language processing system LOLITA as the core system. The object-oriented analysis module implements an algorithm that extracts objects and their associations for use in creating object models.
In this article, we describe AIMS (Assisted Indexing at Mississippi State), a system intended to aid human document analysts in the assignment of indexes to physical chemistry journal articles. The two major components of AIMS are a natural language processing (NLP) component and an index generation (IG) component. We provide an overview of what each of these components does and how it works. We also present the results of a recent evaluation of our system in terms of recall and precision. The recall rate is the proportion of the ‘correct’ indexes (i.e. those produced by human document analysts) generated by AIMS. The precision rate is the proportion of the generated indexes that is correct. Finally, we describe some of the future work planned for this project.
Recently, most part-of-speech tagging approaches, such as rule-based, probabilistic and neural network approaches, have shown very promising results. In this paper, we are particularly interested in probabilistic approaches, which usually require large amounts of training data to obtain reliable probabilities. We alleviate this restriction of probabilistic approaches by introducing a fuzzy network model that provides a method for estimating more reliable model parameters from a small amount of training data. Experiments with the Brown corpus show that the performance of the fuzzy network model is much better than that of the hidden Markov model under a limited amount of training data.
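The fuzzy network model itself is not reproduced here; the sketch below only illustrates the sparse-data problem the abstract targets, using generic add-alpha smoothing of tag-transition estimates so that transitions unseen in a small corpus keep non-zero probability. The corpus and tag names are hypothetical.

```python
from collections import Counter

def transition_probs(tag_sequence, alpha=1.0):
    """Estimate P(t2 | t1) from a tagged corpus with add-alpha
    (Laplace) smoothing. With little training data, plain maximum
    likelihood assigns zero to unseen transitions; smoothing avoids
    that at the cost of some bias."""
    tags = sorted(set(tag_sequence))
    bigrams = Counter(zip(tag_sequence, tag_sequence[1:]))
    unigrams = Counter(tag_sequence[:-1])
    return {
        (t1, t2): (bigrams[(t1, t2)] + alpha)
                  / (unigrams[t1] + alpha * len(tags))
        for t1 in tags for t2 in tags
    }
```

For each history tag the smoothed estimates still sum to one, so they remain a proper conditional distribution.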
We describe new applications of the theory of automata to natural language processing: the representation of very large scale dictionaries and the indexation of natural language texts. They are based on new algorithms that we introduce and describe in detail. In particular, we give pseudocodes for the determinisation of string to string transducers, the deterministic union of p-subsequential string to string transducers, and the indexation by automata. We report on several experiments illustrating the applications.
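As a minimal illustration of representing a dictionary as a finite automaton, the sketch below builds a plain prefix-tree acceptor; this is far simpler than the minimized automata and subsequential string-to-string transducers the paper describes, and the word list is hypothetical.

```python
def build_trie(words):
    """Build a prefix-tree acceptor for a word list: each node is a
    dict mapping a character to the next state; the key '' marks an
    accepting state."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[""] = True
    return root

def accepts(trie, word):
    """Follow the transitions for each character; the word is in the
    dictionary iff we end in an accepting state."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "" in node
```

Lookup time depends only on the length of the query, not on dictionary size, which is the property that makes automata attractive for very large scale dictionaries; minimization (not shown) additionally shares common suffixes to compress the structure.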
This paper addresses the problem of the distribution of words and phrases in text, a problem of great general interest and of importance for many practical applications. The existing models for word distribution present observed sequences of words in text documents as the outcome of some stochastic process; the corresponding distributions of numbers of word occurrences in the documents are modelled as mixtures of Poisson distributions whose parameter values are fitted to the data. We pursue a linguistically motivated approach to statistical language modelling and use observable text characteristics as model parameters. Multi-word technical terms, which are intrinsically content-bearing entities, are chosen for experimentation. Their occurrence and occurrence dynamics are investigated using a 100-million-word data collection consisting of about 13,000 technical documents of various kinds. The derivation of models describing word distribution in text is based on a linguistic interpretation of the process of text formation, with the probabilities of word occurrence being functions of observable and linguistically meaningful text characteristics. The adequacy of the proposed models for describing actually observed distributions of words and phrases in text is confirmed experimentally. The paper has two foci: one is the modelling of the distributions of content words and phrases across different documents; the other is word occurrence dynamics within documents and the estimation of the corresponding probabilities. Accordingly, the application areas for the new modelling paradigm include information retrieval and speech recognition.
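The Poisson-mixture baseline that the paper argues against can be fitted with a short EM loop. The sketch below is the generic two-Poisson model (one component for documents genuinely about the term, one for incidental mentions), not the paper's linguistically motivated model, and the counts are invented.

```python
import math

def two_poisson_em(counts, iters=50):
    """Fit a two-Poisson mixture to per-document occurrence counts of
    a term via EM. Returns the mixing weight of the low-rate
    component and the two Poisson rates."""
    lam = [0.5, 2.0]  # initial rates for the two components
    pi = 0.5          # initial weight of the low-rate component
    def pois(k, l):
        return math.exp(-l) * l ** k / math.factorial(k)
    for _ in range(iters):
        # E step: responsibility of component 0 for each count
        resp = []
        for k in counts:
            a = pi * pois(k, lam[0])
            b = (1 - pi) * pois(k, lam[1])
            resp.append(a / (a + b))
        # M step: re-estimate rates and mixing weight
        n0 = sum(resp)
        n1 = len(counts) - n0
        lam[0] = sum(r * k for r, k in zip(resp, counts)) / n0
        lam[1] = sum((1 - r) * k for r, k in zip(resp, counts)) / n1
        pi = n0 / len(counts)
    return pi, lam
```

With counts dominated by zeros and ones plus a few documents where the term occurs repeatedly, EM separates a near-zero background rate from a high "elite" rate, which is exactly the behaviour such mixtures are used to capture.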
We discuss the random generation of strings using the grammatical formalism AGFL. This formalism consists of context-free grammars extended with a parameter mechanism, where the parameters range over a finite domain. Our approach consists in static analysis of the combinations of parameter values with which derivations can be constructed. After this analysis, generation of sentences can be performed without backtracking.
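Backtrack-free generation can be sketched for a plain context-free grammar under the assumption that every nonterminal is productive, i.e. can derive a terminal string — the property that AGFL's static analysis of parameter combinations is meant to guarantee. The parameter mechanism is omitted and the toy grammar is hypothetical.

```python
import random

def generate(grammar, symbol="S", rng=random):
    """Randomly expand a context-free grammar top-down. Because every
    nonterminal is assumed productive, any uniformly chosen production
    leads to a complete sentence, so no backtracking is needed."""
    if symbol not in grammar:          # terminal symbol
        return [symbol]
    rhs = rng.choice(grammar[symbol])  # pick one production at random
    return [t for s in rhs for t in generate(grammar, s, rng)]

# A toy non-recursive grammar (recursive grammars would additionally
# need a policy to bound derivation depth).
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "N":  [["cat"], ["dog"]],
    "VP": [["sleeps"], ["runs"]],
}
```

In AGFL the analogous analysis is done over nonterminals paired with parameter values, pruning combinations that admit no derivation before generation starts.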
All systems developers approach the development task with a number of explicit and implicit assumptions about, for example, the nature of human organizations, the nature of the design task, the value of technology, and what is expected of them. As was noted in chapter 2, these assumptions play a central role in guiding the information systems development process. They guide not only the definition of object systems, but also the preferred approach to inquiry, i.e. how the developers improve their understanding and knowledge about them. The assumptions can either be held by the system developers or be embedded in their preferred development approach. In either case they affect the designed and implemented system.
But in order to understand the relationship between assumptions and development approaches we need to elaborate on the notion of ‘paradigm’ and how it applies to ISD. An exploration of the philosophical assumptions underlying different methodologies and their tools is a prerequisite for a better understanding of the influence of philosophical attitudes on the practice of ISD. Groups of related assumptions about reality and knowledge are at the core of research paradigms. By introducing a general classification of the assumptions that characterize alternative research paradigms, this chapter provides the philosophical basis for the analysis of ISD and data modeling in the subsequent chapters of this book.
The purpose of this chapter is to look at the nature and kinds of philosophical assumptions that are made in the literature on information systems development and data modeling.
It is a truism to say that computers have become ubiquitous in today's organizations. Since their application in administrative data processing in the mid-1950s, they have become one of the key instruments for improving the formal information processing activities of organizations. In less than four decades, computer-based information systems (IS) have evolved from supporting back office, already formalized, systems such as payroll, to penetrating the entire organization. New applications and technologies have emerged with great fanfare, and the enthusiasm for information systems continues to run high. Indeed, many enthusiasts conceive of information technology as the primary vehicle for organizational problem-solvers, increasing an organization's capacity to cope with external and internal complexity and improve its performance. Nor is there any doubt that information systems will play an even more vital role in tomorrow's organization.
The development of these information systems has received considerable attention in both the popular and academic literature. New methods for designing systems, new approaches for analysis, new strategies for implementing the developed systems, and the like, have proliferated over the past 30 years. Yet a majority of information systems design approaches conceive of information systems development (ISD) on the assumption that information systems are technical systems with social consequences. This leads one to frame IS design problems as problems of technical complexity. Proponents of this view assume that IS development problems can largely be resolved by more sophisticated technical solutions (tools, models, methods and principles).