This chapter introduces a concrete syntax: a format for representing annotation structures that is ideally isomorphic to the abstract syntax specifying a semantic annotation scheme. Such a format can represent annotation structures either serially, from left to right, or graphically, as images or tabular forms with linking arrows. Representation formats vary depending on how the annotation is used. Human readers, for instance, prefer tabular formats, especially for illustrations or demonstrations. For merging, comparing, or exchanging various types of annotations, or different annotations of the same type, graphs are considered useful; in this chapter, I introduce a graph-based annotation format, called GrAF, for linguistic annotation. For the construction of larger corpora, however, there are practical computing reasons to prefer a serialization of annotations. For the serialized representation of annotation structures, this chapter mainly discusses two formats: (i) XML and (ii) pFormat, a predicate-logic-like representation format that represents annotation structures in a strictly serial (linear) manner, avoiding embedded structures.
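The contrast between embedded and strictly serial representation can be sketched with a toy annotation (one event, one time expression, and a link between them) serialized two ways. The flat, predicate-logic-like lines below only mimic the general idea of a linear, non-embedded serialization; they are illustrative and are not the book's exact pFormat syntax.

```python
# Illustrative sketch only: a toy annotation serialized as (i) XML and
# (ii) flat predicate-style clauses. The clause syntax is a made-up stand-in
# for the idea of a strictly serial format, not the actual pFormat notation.
import xml.etree.ElementTree as ET

annotation = {
    "event": {"id": "e1", "span": "left", "type": "MOTION"},
    "time": {"id": "t1", "span": "yesterday", "type": "DATE"},
    "link": {"relType": "BEFORE", "from": "e1", "to": "t1"},
}

# (i) XML: annotation structures nest inside a document element.
root = ET.Element("annotations")
ET.SubElement(root, "event", annotation["event"])
ET.SubElement(root, "time", annotation["time"])
ET.SubElement(root, "tlink", annotation["link"])
xml_form = ET.tostring(root, encoding="unicode")

# (ii) Flat, predicate-logic-like lines: one clause per structure, no embedding;
# identifiers (e1, t1) carry the cross-references instead of nesting.
flat_form = [
    "event(e1, span='left', type=MOTION)",
    "time(t1, span='yesterday', type=DATE)",
    "before(e1, t1)",
]

print(xml_form)
print("\n".join(flat_form))
```

Both serializations carry the same information; the flat version trades nesting for explicit identifiers, which is what makes a strictly linear, left-to-right representation possible.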
In Chapter 9, I introduce eXTimeML, an extended variant of ISO-TimeML, with three extensions: (i) temporal measure expressions are annotated as part of generalized measures (e.g., 30 hours); (ii) quantified temporal expressions are annotated as part of generalized quantification (e.g., every day); and (iii) adjectives and adverbs are annotated as modifiers of nouns and verbs, respectively (e.g., daily, never). I then illustrate how the representation language of ABS applies to each of these extensions in eXTimeML by deriving appropriate (logical) semantic forms from the well-formed annotation structures of temporal measures, quantifiers, and modifiers. These semantic forms are then interpreted with respect to admissible models, constrained by the formal definitions of logical predicates such as twice or three thousand.
In this chapter, I introduce four categories of path: static, dynamic, oriented, and projected, characterizing each for the interpretation of path-related information in language. Static paths are finite paths with two endpoints, neither of which is intrinsically identified as the start or the end of the path. Dynamic paths are trajectories caused by motions. Oriented paths are simply directed toward some goal and may not reach it. Projected paths are virtual or intended paths that are not actually traversed but devised in the mind of a human or rational agent. To discuss their characteristic features in formal terms, I introduce Pustejovsky and Yocum’s (2013) axioms on motions and derive from them a corollary that relates a mover to an event-path. I then show how the movement link (moveLink) is reformulated to link a mover to a motion-triggered event-path with the relation traverses. I also analyze the notions of orientation and projection with respect to the frames of reference, whether absolute, relative, or intrinsic, showing how these frames apply to the annotation and interpretation of oriented or projected paths.
This chapter works toward the specification of a dynamic annotation scheme, called dSpace, which extends the scope of ISO-Space to the domains of space and time over motions by amalgamating it with ISO-TimeML. In dSpace, various types of temporal relations interact with spatial relations: the temporal dimension characterizes various types of paths and the motions anchored to each location on them. dSpace also generalizes the notion of paths by classifying them into four types: static, dynamic, projected, and oriented, while introducing a relational link, called pathLink, over paths with various relation types such as meet and deviate.
Coreference resolution is an important part of natural language processing, used in machine translation, semantic search, and various other information retrieval and understanding systems. One of the challenges in this field is the evaluation of resolution approaches. Many different metrics have been proposed, but most rely on certain assumptions, such as equivalence between different mentions of the same discourse-world entity, and do not account for the overrepresentation of certain types of coreference in the evaluation data. In this paper, a new coreference evaluation strategy is presented that focuses on linguistic and semantic information and can address some of these shortcomings. The evaluation model was developed in the broader context of developing coreference resolution capabilities for the Lithuanian language; the experiment was therefore carried out using Lithuanian language resources, but the proposed evaluation strategy is not language-dependent.
The recent progress of deep learning techniques has produced models capable of achieving high scores on traditional Natural Language Inference (NLI) datasets. To understand the generalization limits of these powerful models, an increasing number of adversarial evaluation schemes have appeared. These works use a similar evaluation method: they construct a new NLI test set based on sentences with known logical and semantic properties (the adversarial set), train a model on a benchmark NLI dataset, and evaluate it on the new set. Poor performance on the adversarial set is identified as a model limitation. The problem with this evaluation procedure is that it may only indicate a sampling problem: a machine learning model can perform poorly on a new test set because the text patterns present in the adversarial set are not well represented in the training sample. To address this problem, we present a new evaluation method, the Invariance under Equivalence test (IE test). The IE test trains a model with sufficient adversarial examples and checks the model’s performance on two equivalent datasets. As a case study, we apply the IE test to state-of-the-art NLI models using synonym substitution as the form of adversarial examples. The experiment shows that, despite their high predictive power, these models usually produce different inference outputs for equivalent inputs and, more importantly, that this deficiency cannot be solved by adding adversarial observations to the training data.
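The core invariance check can be sketched as follows. Everything here is a hypothetical stand-in, not the paper's implementation: `toy_model` is a deliberately brittle word-overlap classifier, and the synonym map is illustrative. The sketch only shows the shape of the test: predict on a pair, predict on its meaning-preserving variant, and count how often the label is unchanged.

```python
# Hedged sketch of an invariance-under-equivalence check.
# The model and synonym lexicon are illustrative assumptions.
SYNONYMS = {"car": "automobile", "happy": "glad"}

def substitute_synonyms(sentence: str) -> str:
    """Produce a meaning-preserving variant via word-level synonym substitution."""
    return " ".join(SYNONYMS.get(w, w) for w in sentence.split())

def ie_test(model, pairs):
    """Fraction of premise/hypothesis pairs whose predicted label is
    invariant under synonym substitution (1.0 = fully invariant)."""
    invariant = 0
    for premise, hypothesis in pairs:
        original = model(premise, hypothesis)
        perturbed = model(substitute_synonyms(premise),
                          substitute_synonyms(hypothesis))
        invariant += original == perturbed
    return invariant / len(pairs)

# Toy model: predicts entailment iff every hypothesis word occurs in the premise.
def toy_model(premise, hypothesis):
    return ("entailment"
            if set(hypothesis.split()) <= set(premise.split())
            else "neutral")

pairs = [("a man drives a car", "a man drives"),
         ("she is happy", "she is glad")]
print(ie_test(toy_model, pairs))  # 0.5: the second pair's label flips
```

The toy model fails on the second pair because substituting "happy" with "glad" in the premise changes its word-overlap prediction, which is exactly the kind of sensitivity to equivalent inputs the IE test is meant to expose.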
Named entity recognition (NER) aims to identify mentions of named entities in unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performance in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for NER tasks to minimize the annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose several AL sentence query evaluation functions that pay more attention to potential positive tokens and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains reveal that the proposed approach reduces the number of annotated tokens while achieving better or comparable prediction performance with conventional methods.
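A data-driven length normalization of this kind can be sketched as below. The Gaussian-style penalty and the simple sum of token uncertainties are illustrative assumptions, not the paper's actual querying functions; the point is only that the penalty is fitted to the unlabeled pool's own length statistics rather than a fixed cutoff, so atypically long or short sentences are down-weighted when ranking query candidates.

```python
# Hedged sketch: length-normalized sentence scoring for AL query selection.
# Penalty shape (Gaussian in the length z-score) is an assumption.
import math
import statistics

def length_penalty(n_tokens: int, mean_len: float, std_len: float) -> float:
    """1.0 at the pool's mean length, shrinking as a sentence gets
    much longer or shorter than is typical for this pool."""
    z = (n_tokens - mean_len) / std_len
    return math.exp(-0.5 * z * z)

def sentence_score(token_uncertainties, mean_len, std_len):
    """Aggregate per-token uncertainty, penalized for atypical length."""
    return (sum(token_uncertainties)
            * length_penalty(len(token_uncertainties), mean_len, std_len))

# Toy pool: per-token uncertainties for three candidate sentences.
pool = [[0.1, 0.9, 0.2],   # short, with some highly uncertain tokens
        [0.05] * 40,       # long, uniformly easy tokens
        [0.8]]             # very short
lengths = [len(s) for s in pool]
mu, sigma = statistics.mean(lengths), statistics.pstdev(lengths)

ranked = sorted(pool, key=lambda s: sentence_score(s, mu, sigma), reverse=True)
```

Without the penalty, the 40-token sentence wins on raw summed uncertainty (2.0) purely by being long; with the length penalty fitted to the pool, the short sentence containing genuinely uncertain tokens ranks first.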
Recognition skills refer to the ability of a practitioner to rapidly size up a situation and know what actions to take. We describe approaches to training recognition skills through the lens of naturalistic decision-making. Specifically, we link the design of training to key theories and constructs, including the recognition-primed decision model, which describes expert decision-making; the data-frame model of sensemaking, which describes how people make sense of a situation and act; and macrocognition, which encompasses complex cognitive activities such as problem solving, coordination, and anticipation. This chapter also describes the components of recognition skills to be trained and defines scenario-based training.