Edited by
Jan Van Eijck, Centre for Mathematics and Computer Science, Amsterdam; Vincent Van Oostrom, Universiteit Utrecht, The Netherlands; Albert Visser, Universiteit Utrecht, The Netherlands
Existing software systems for automated essay scoring can provide NLP researchers with opportunities to test certain theoretical hypotheses, including some derived from Centering Theory. In this study we employ the Educational Testing Service's e-rater essay scoring system to examine whether local discourse coherence, as defined by a measure of Centering Theory's Rough-Shift transitions, might be a significant contributor to the evaluation of essays. Rough-Shifts within students' paragraphs often occur when topics are short-lived and unconnected, and are therefore indicative of poor topic development. We show that adding the Rough-Shift based metric to the system improves its performance significantly, better approximating human scores and providing the capability of valuable instructional feedback to the student. These results indicate that Rough-Shifts do indeed capture a source of incoherence, one that has not been closely examined in the Centering literature. They not only justify Rough-Shifts as a valid transition type, but they also support the original formulation of Centering as a measure of discourse continuity even in pronominal-free text. Finally, our study design, which used a combination of automated and manual NLP techniques, highlights specific areas of NLP research and development needed for engineering practical applications.
The topic of mood and modality (MOD) is a difficult aspect of language description because, among other reasons, the inventory of modal meanings is not stable across languages, moods do not map neatly from one language to another, modality may be realised morphologically or by free-standing words, and modality interacts in complex ways with other modules of the grammar, like tense and aspect. Describing MOD is especially difficult if one attempts to develop a unified approach that not only provides cross-linguistic coverage, but is also useful in practical natural language processing systems. This article discusses an approach to MOD that was developed for and implemented in the Boas Knowledge-Elicitation (KE) system. Boas elicits knowledge about any language, L, from an informant who need not be a trained linguist. That knowledge then serves as the static resources for an L-to-English translation system. The KE methodology used throughout Boas is driven by a resident inventory of parameters, value sets, and means of their realisation for a wide range of language phenomena. MOD is one of those parameters, whose values are the inventory of attested and not yet attested moods (e.g. indicative, conditional, imperative), and whose realisations include flective morphology, agglutinating morphology, isolating morphology, words, phrases and constructions. Developing the MOD elicitation procedures for Boas amounted to wedding the extensive theoretical and descriptive research on MOD with practical approaches to guiding an untrained informant through this non-trivial task. We believe that our experience in building the MOD module of Boas offers insights not only into cross-linguistic aspects of MOD that have not previously been detailed in the natural language processing literature, but also into KE methodologies that could be applied more broadly.
With this first issue of Volume 10 of Natural Language Engineering we would like to take this opportunity to reflect on the development of the journal, especially over the last year or two, and to look forward to a number of future developments.
If you wanted to make some money out of natural language processing, an appropriate strategy might be to identify an area of technology that was relatively mature—one where the more fundamental technical problems had been resolved through a significant amount of research activity—and then identify potential applications for that technology.
This paper presents modifications to a standard probabilistic context-free grammar that enable a predictive parser to avoid garden pathing without resorting to any ad-hoc heuristic repair. The resulting parser is shown to apply efficiently to both newspaper text and telephone conversations with complete coverage and excellent accuracy. The distribution over trees is peaked enough to allow the parser to find parses efficiently, even with the much larger search space resulting from overgeneration. Empirical results are provided for both Wall St. Journal and Switchboard test corpora.
style: 1b. the shadow-producing pin of a sundial. 2c. the custom or plan followed in spelling, capitalization, punctuation, and typographic arrangement and display.
—Webster's New Collegiate Dictionary
The syntax of a programming language tells you what code it is possible to write—what machines will understand. Style tells you what you ought to write—what humans reading the code will understand. Code written with a consistent, simple style is maintainable, robust, and contains fewer bugs. Code written with no regard to style contains more bugs, and may simply be thrown away and rewritten rather than maintained.
Attending to style is particularly important when developing as a team. Consistent style facilitates communication, because it enables team members to read and understand each other's work more easily. In our experience, the value of consistent programming style grows exponentially with the number of people working with the code.
Our favorite style guides are classics: Strunk and White's The Elements of Style and Kernighan and Plauger's The Elements of Programming Style. These small books work because they are simple: a list of rules, each containing a brief explanation and examples of correct, and sometimes incorrect, use. We followed the same pattern in this book. This simple treatment—a series of rules—enabled us to keep this book short and easy to understand.
Some of the advice that you read here may seem obvious to you, particularly if you've been writing code for a long time. Other readers may disagree with some of our specific suggestions about formatting or indentation.
A complete treatment of programming principles and software design is clearly beyond the scope of this book. However, this chapter includes some core principles that we have found to be central to good software engineering.
Engineering
Do Not Be Afraid to Do Engineering
The ultimate goal of professional software development is to create something useful—an engineering task much more than a scientific one. (Science is more immediately concerned with understanding the world around us, which is admittedly necessary, but not sufficient, for engineering.)
Resist the temptation to write code to model scientific realities that include all theoretical possibilities. It is not a “hack” to write code that has practical limitations if you are confident those limits do not affect the utility of the resulting system.
For example, imagine you need a data structure for tree traversal and choose to write a stack. The stack needs to hold at least as many items as the maximum depth of any tree. Now suppose that there is no theoretical limit to how deep one of these trees can be. You might be tempted to create a stack that can grow to an arbitrary size by reallocating memory and copying its items as needed. On the other hand, your team's understanding of the application may be such that in your wildest imagination you'd be amazed to see a tree with depth greater than 10. If so, the better choice would be to create a fixed-length stack with a maximum of, say, 50 elements.
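The fixed-length stack described above might look something like the following minimal sketch. It is written in Python purely for illustration. The names `FixedStack` and `preorder`, the `(value, children)` tuple representation of a tree, and the choice to raise `OverflowError` when the limit is hit are all assumptions made for this example, not something the text prescribes. The traversal keeps one stack frame per level of the current path, so the stack never holds more items than the depth of the tree.

```python
class FixedStack:
    """A stack with a hard upper bound on the number of items it holds."""

    def __init__(self, capacity=50):
        # Storage is allocated once and never resized or copied.
        self._items = [None] * capacity
        self._size = 0

    def push(self, item):
        if self._size == len(self._items):
            # Fail loudly: exceeding the limit means our assumption was wrong.
            raise OverflowError("FixedStack capacity exceeded")
        self._items[self._size] = item
        self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty FixedStack")
        self._size -= 1
        item = self._items[self._size]
        self._items[self._size] = None  # drop the reference
        return item

    def __len__(self):
        return self._size


def preorder(root, capacity=50):
    """Return node values in preorder.

    A node is a (value, children) tuple. Each stack frame is
    (node, index of the next child to visit), one frame per level
    of the current path from the root.
    """
    stack = FixedStack(capacity)
    visited = [root[0]]
    stack.push((root, 0))
    while len(stack) > 0:
        node, i = stack.pop()
        children = node[1]
        if i < len(children):
            stack.push((node, i + 1))  # return here for the next sibling
            child = children[i]
            visited.append(child[0])
            stack.push((child, 0))
    return visited


tree = ("a", [("b", [("d", [])]), ("c", [])])
print(preorder(tree))
```

If a tree ever does exceed the limit, the stack raises an error instead of quietly growing, which tells the team that the assumption about tree depth needs to be revisited.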