Ascription of an intention to an agent is especially important in law. In criminal law the intent to commit a criminal act, called mens rea or the guilty mind, is the key element needed to prosecute a defendant for a crime. For example, in order to prove that a defendant has committed the crime of theft of an object, it needs to be established that the defendant had the intention never to return the object to its owner. Studying examples of how intention is proved in law is an important resource, giving us clues about how reasoning to an intention should be carried out. Intention is also fundamentally important in ethical reasoning, where there are problems about how the end can justify the means.
This chapter introduces the notion of inference to the best explanation, often called abductive reasoning, and presents recent research on evidential reasoning that uses the concept of a so-called script or story as a central component. Introducing these two argumentation tools shows how they help move toward a solution to the longstanding problem of analyzing how practical reasoning from circumstantial evidence can be used to support or undermine a hypothesis that an agent has a particular intention. Legal examples are used to show that even though ascribing an intention to an agent is an evaluation procedure that combines argumentation and explanation, it can be rationally carried out by using a practical reasoning model that accounts for the weighing of factual evidence on both sides of a disputed case.
The examples studied in this chapter involve cases where practical reasoning is used as the glue that combines argumentation with explanation. Section 1 considers a simple example of a message on the Internet advising how to mount a flagpole bracket to a house. The example tells the reader how to take the required steps to attach a bracket to the house in order to mount a flagpole, so that the reader can show his patriotism by displaying a flag on his house. The example text is clearly an instance of practical reasoning: the author of the message presumes that the reader has a goal and tells the reader how to fulfill that goal by carrying out a sequence of actions.
Logic Forms (LF) are simple, first-order logic knowledge representations of natural language sentences. Each noun, verb, adjective, adverb, pronoun, preposition and conjunction generates a predicate. LF systems usually identify the syntactic function of each constituent by means of syntactic rules, but this approach is difficult to apply to languages with highly flexible and ambiguous syntax, such as Spanish. In this study, we present a mixed method for deriving the LF of sentences in Spanish that combines hard-coded rules with a classifier inspired by semantic role labeling. The main novelty of our proposal is the way the classifier is applied to generate the predicates of the verbs, while rules are used to translate the rest of the predicates, which are more straightforward and unambiguous than the verbal ones. The proposed mixed system uses a supervised classifier to integrate syntactic and semantic information in order to help overcome the inherent ambiguity of Spanish syntax; this task is accomplished in a way similar to the semantic role labeling task. We use features extracted from the AnCora-ES corpus to train the classifier. A rule-based system is used to obtain the LF of the rest of the phrase; the rules are obtained by exploring the syntactic tree of the phrase and encoding the syntactic production rules. The LF algorithm has been evaluated using shallow parsing on some straightforward Spanish phrases. The verb argument labeling task achieves 84% precision, and the proposed mixed LFi method surpasses a rules-only system by 11%.
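To make the representation concrete, here is a minimal, hypothetical sketch of the mixed idea in Python: rules introduce a predicate for each non-verbal content word, while a (stubbed) classifier decides the arguments of the verbal predicates. The predicate notation word:POS(args), the toy tagset, and the function names are illustrative assumptions, not the system described in the abstract.

```python
# Toy Logic Form (LF) generation in the spirit of the mixed method above.
# Illustrative assumptions: Penn-style tags, word:POS(args) notation, stub classifier.

def lf_for_tokens(tokens, argument_classifier):
    """tokens: list of (word, pos) pairs; argument_classifier: callable that,
    given a verb, the tokens and the entity variables, returns the argument variables."""
    predicates, variables = [], {}
    # Rule-based part: every non-verbal content word introduces an entity variable.
    for i, (word, pos) in enumerate(tokens):
        if pos.startswith(("NN", "JJ", "RB", "PRP")):
            var = f"x{i}"
            variables[i] = var
            predicates.append(f"{word}:{pos}({var})")
    # Classifier-based part: verb predicates take an event variable plus the
    # argument variables selected by the (here, hypothetical) classifier.
    for i, (word, pos) in enumerate(tokens):
        if pos.startswith("VB"):
            args = argument_classifier(word, tokens, variables)
            predicates.append(f"{word}:{pos}(e{i}, {', '.join(args)})")
    return " & ".join(predicates)

# Trivial stand-in "classifier" that simply takes all entity variables in order:
toy = lambda verb, toks, vars_: list(vars_.values())
print(lf_for_tokens([("Juan", "NNP"), ("conduce", "VBZ"), ("coche", "NN")], toy))
# -> Juan:NNP(x0) & coche:NN(x2) & conduce:VBZ(e1, x0, x2)
```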
We investigate the problem of improving performance in distributional word similarity systems trained on sparse data, focusing on a family of similarity functions we call Dice-family functions (Dice 1945, Ecology 26(3): 297–302), including the similarity function introduced in Lin (1998, Proceedings of the 15th International Conference on Machine Learning, 296–304) and Curran (2004, PhD thesis, University of Edinburgh, School of Informatics), as well as a generalized version of the Dice coefficient used in data mining applications (Strehl 2000, 55). We propose a generalization of the Dice-family functions which uses a weight parameter α to make the similarity functions asymmetric. We show that this generalized family of functions (α systems) all belong to the class of asymmetric models first proposed in Tversky (1977, Psychological Review 84: 327–352), and in a multi-task evaluation of ten word similarity systems, we show that α systems have the best performance across word ranks. In particular, we show that α-parameterization substantially improves the correlations of all Dice-family functions with human judgements on three word sets, including the Miller–Charles/Rubenstein–Goodenough word set (Miller and Charles 1991, Language and Cognitive Processes 6(1): 1–28; Rubenstein and Goodenough 1965, Communications of the ACM 8: 627–633).
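As a rough illustration of the idea (not the article's implementation), a Tversky-style asymmetric weighting of the Dice denominator can be sketched in a few lines of Python; the exact α-parameterization evaluated in the article may differ.

```python
# Sketch of an asymmetric, Tversky-style generalisation of the Dice coefficient
# over weighted feature vectors. alpha interpolates between emphasising word1's
# feature mass (alpha = 1) and word2's (alpha = 0); alpha = 0.5 recovers a
# symmetric Dice-like measure.

def alpha_dice(feats1, feats2, alpha=0.5):
    """feats1, feats2: dicts mapping context features to non-negative weights."""
    shared = sum(min(feats1[f], feats2[f]) for f in feats1.keys() & feats2.keys())
    mass1, mass2 = sum(feats1.values()), sum(feats2.values())
    denom = alpha * mass1 + (1.0 - alpha) * mass2
    return shared / denom if denom > 0 else 0.0

# Invented toy feature vectors for illustration only.
dog = {"bark": 3.0, "pet": 2.0, "run": 1.0}
puppy = {"bark": 1.0, "pet": 2.0}
print(alpha_dice(dog, puppy, alpha=0.5))   # symmetric (Dice-like) case
print(alpha_dice(dog, puppy, alpha=0.8))   # asymmetric variant
```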
‘Deep-syntactic’ dependency structures that capture the argumentative, attributive and coordinative relations between the full words of a sentence have great potential for a number of NLP applications. Their degree of abstraction lies between the output of a syntactic dependency parser (connected trees defined over all words of a sentence and language-specific grammatical functions) and the output of a semantic parser (forests of trees defined over individual lexemes or phrasal chunks and abstract semantic role labels, which capture the frame structures of predicative elements and drop all attributive and coordinative dependencies). We propose a parser that produces deep-syntactic structures. The parser has been tested on Spanish, English and Chinese.
This article presents silhouette–attraction (Sil–Att), a simple and effective method for text clustering based on two main concepts: the silhouette coefficient and the idea of attraction. The combination of both principles yields a general technique that can be used either as a boosting method, which improves the results of other clustering algorithms, or as an independent clustering algorithm. The experimental work shows that Sil–Att obtains high-quality results on text corpora with very different characteristics. Furthermore, its stable performance on all the corpora considered indicates that it is a very robust method. This is a clear advantage of Sil–Att over the other algorithms used in the experiments, whose performance depends heavily on specific characteristics of the corpora being considered.
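For readers unfamiliar with the first ingredient, the following scikit-learn sketch computes the silhouette coefficient on a toy clustering; it illustrates only the silhouette concept, not the Sil–Att algorithm itself, and all data and parameters are placeholders.

```python
# Illustrative sketch of the silhouette coefficient that Sil-Att builds on.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

X = np.random.RandomState(0).rand(100, 20)            # stand-in for TF-IDF document vectors
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print("mean silhouette:", silhouette_score(X, labels))
per_doc = silhouette_samples(X, labels)                # s(i) = (b(i) - a(i)) / max(a(i), b(i))
# Documents with a low or negative s(i) are weakly attached to their cluster and
# are natural candidates for reassignment by an attraction-style refinement step.
print("worst-placed documents:", np.argsort(per_doc)[:5])
```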
With this comprehensive guide you will learn how to apply Bayesian machine learning techniques systematically to solve various problems in speech and language processing. A range of statistical models is detailed, from hidden Markov models to Gaussian mixture models, n-gram models and latent topic models, along with applications including automatic speech recognition, speaker verification, and information retrieval. Approximate Bayesian inference methods based on MAP, evidence, asymptotic, VB, and MCMC approximations are provided, as well as full derivations of calculations, useful notation, formulas, and rules. The authors address the difficulties of straightforward applications and provide detailed examples and case studies to demonstrate how you can successfully use practical Bayesian inference methods to improve the performance of information systems. This is an invaluable resource for students, researchers, and industry practitioners working in machine learning, signal processing, and speech and language processing.
We propose a language-independent word normalisation method and exemplify it on modernising historical Slovene words. Our method relies on character-level statistical machine translation (CSMT) and uses only shallow knowledge. We present relevant data on historical Slovene, consisting of two (partially) manually annotated corpora and the lexicons derived from these corpora, containing historical word–modern word pairs. The two lexicons are disjoint, with one serving as the training set containing 40,000 entries, and the other as a test set with 20,000 entries. The data spans the years 1750–1900, and the lexicons are split into fifty-year slices, with all the experiments carried out separately on the three time periods. We perform two sets of experiments. In the first one – a supervised setting – we build a CSMT system using the lexicon of word pairs as training data. In the second one – an unsupervised setting – we simulate a scenario in which word pairs are not available. We propose a two-step method where we first extract a noisy list of word pairs by matching historical words with cognate modern words, and then train a CSMT system on these pairs. In both sets of experiments, we also optionally make use of a lexicon of modern words to filter the modernisation hypotheses. While we show that both methods produce significantly better results than the baselines, their accuracy, and which method works best, strongly correlate with the age of the texts, meaning that the choice of the best method will depend on the properties of the historical language which is to be modernised. As an extrinsic evaluation, we also compare the quality of part-of-speech tagging and lemmatisation directly on historical text and on its modernised words. We show that, depending on the age of the text, annotation on modernised words also produces significantly better results than annotation on the original text.
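As a minimal sketch of what character-level training data looks like (an assumed format, not the authors' pipeline), a lexicon of historical–modern word pairs can be rewritten as space-separated character sequences, so that a standard phrase-based SMT toolkit translates characters rather than words; the word pairs below are invented for illustration.

```python
# Turn a lexicon of (historical, modern) word pairs into character-level
# parallel text for a CSMT system: each word becomes a sequence of characters.

def to_char_corpus(pairs):
    src_lines, tgt_lines = [], []
    for historical, modern in pairs:
        src_lines.append(" ".join(historical))   # e.g. "p e r v i g a"
        tgt_lines.append(" ".join(modern))       # e.g. "p r v e g a"
    return src_lines, tgt_lines

# Hypothetical historical-modern Slovene pairs, for illustration only.
pairs = [("perviga", "prvega"), ("svojiga", "svojega")]
src, tgt = to_char_corpus(pairs)
print(src[0])  # p e r v i g a
print(tgt[0])  # p r v e g a
```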
With NLP services now widely available via cloud APIs, tasks like named entity recognition and sentiment analysis are virtually commodities. We look at what's on offer, and make some suggestions for how to get rich.
Ontologising is the task of associating terms in text with an ontological representation of their meaning in an ontology. In this article, we revisit algorithms that have previously been used to ontologise the arguments of semantic relations in a relationless thesaurus, resulting in a wordnet. For increased flexibility, the algorithms do not use the extraction context when selecting the most adequate synsets for each term argument. Instead, they exploit a term-based lexical network, which can be established from knowledge extracted automatically or obtained from the resource the relations are being ontologised to. Building on the latter idea, we carried out several experiments and conclude that the algorithms can be used both for wordnet creation and for wordnet enrichment. Besides describing the algorithms in some detail, we report and discuss these experiments, which target both English and Portuguese, and their results.
This chapter focuses on basic statistical models (Gaussian mixture models (GMM), hidden Markov models (HMM), n-gram models and latent topic models), which are widely used in speech and language processing. These are well-known generative models: probabilistic models that can generate speech and language features based on their likelihood functions. We also provide parameter-learning schemes based on maximum likelihood (ML) estimation, derived according to the expectation–maximization (EM) algorithm (Dempster et al. 1977). The following chapters extend these statistical models from ML schemes to Bayesian schemes. These models are fundamental for speech and language processing. We specifically build an automatic speech recognition (ASR) system based on these models and extend them to deal with different problems in speaker clustering, speaker verification, speech separation and other natural language processing systems.
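As a small, self-contained illustration of ML estimation with the EM algorithm (toy code, not the book's implementation), the following Python sketch fits a one-dimensional, two-component GMM to synthetic data.

```python
# A few EM iterations of maximum-likelihood estimation for a 1-D, 2-component GMM.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

# Initial parameters: mixture weights, means, variances.
w = np.array([0.5, 0.5]); mu = np.array([-1.0, 1.0]); var = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: posterior responsibility of each component for each sample.
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the expected sufficient statistics.
    n_k = gamma.sum(axis=0)
    w = n_k / len(x)
    mu = (gamma * x[:, None]).sum(axis=0) / n_k
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

print(w, mu, var)   # should recover roughly (0.6, 0.4), (-2, 3), (1, 0.25)
```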
In this chapter, Section 3.1 first introduces the probabilistic approach to ASR, which aims to find the most likely word sequence W corresponding to the input speech feature vectors O. Bayes decision theory provides a theoretical basis for building a speech recognition system on the posterior distribution p(W|O) of the word sequence given the speech feature vectors O. The Bayes theorem then decomposes the problem based on p(W|O) into two problems based on two generative models: one of speech features, p(O|W) (the acoustic model), and one of language features, p(W) (the language model). The Bayes theorem thus changes the original problem into these two independent generative model problems.
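Written out in equations (the notation follows the chapter; this is only a restatement of the decomposition described above):

```latex
% Restatement of the Bayes decomposition above (amsmath assumed).
\begin{align}
  \hat{W} &= \operatorname*{arg\,max}_{W} p(W \mid O)
           = \operatorname*{arg\,max}_{W} \frac{p(O \mid W)\, p(W)}{p(O)}
           = \operatorname*{arg\,max}_{W}
             \underbrace{p(O \mid W)}_{\text{acoustic model}} \,
             \underbrace{p(W)}_{\text{language model}},
\end{align}
```

where the last step holds because the evidence p(O) does not depend on the word sequence W.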
Next, Section 3.2 introduces the HMM, with the corresponding likelihood function, as a generative model of speech features. The section first describes the discrete HMM, which has a multinomial distribution as its state observation distribution, and Section 3.2.4 introduces the GMM as the state observation distribution of the continuous density HMM for acoustic modeling. The GMM by itself is also used as a powerful statistical model in other speech processing approaches in later chapters. Section 3.3 presents the basic forward–backward and Viterbi algorithms. In Section 3.4, ML estimation of the HMM parameters is derived according to the EM algorithm, which deals efficiently with the latent variables included in the HMM. Thus, we provide the conventional ML treatment of basic statistical models for acoustic modeling based on the HMM.
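As an illustrative sketch of the second of these algorithms (not the book's code), the following Python function implements Viterbi decoding for a discrete HMM with initial probabilities pi, transition matrix A and observation matrix B, all of which are toy placeholders.

```python
# Viterbi decoding for a discrete HMM: most likely state sequence for an observation sequence.
import numpy as np

def viterbi(pi, A, B, obs):
    """pi: initial state probs (N,); A: transition probs (N, N);
    B: observation probs (N, M); obs: list of observation symbol indices."""
    N, T = len(pi), len(obs)
    logdelta = np.full((T, N), -np.inf)
    backptr = np.zeros((T, N), dtype=int)
    logdelta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = logdelta[t - 1][:, None] + np.log(A)     # (from_state, to_state)
        backptr[t] = scores.argmax(axis=0)
        logdelta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Backtrack from the best final state.
    path = [int(logdelta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], logdelta[-1].max()

# Tiny two-state example with three observation symbols (all numbers invented).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi(pi, A, B, [0, 1, 2]))
```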
Maximum a-posteriori (MAP) approximation is a well-known and widely used approximation for Bayesian inference. The approximation covers all variables including model parameters Θ, latent variables Z, and classification categories C (word sequence W in the automatic speech recognition case). For example, the Viterbi algorithm (arg max_Z p(Z|O)) in the continuous density hidden Markov model (CDHMM), as discussed in Section 3.3.2, corresponds to the MAP approximation of latent variables, while the forward–backward algorithm, as discussed in Section 3.3.1, corresponds to an exact inference of these variables. As another example, the MAP decision rule (arg max_C p(C|O)) in Eq. (3.2) also corresponds to the MAP approximation of inferring the posterior distribution of classification categories. Since the final goal of automatic speech recognition is to output the word sequence, the MAP approximation of the word sequence matches the final goal. Thus, the MAP approximation can be applied to all probabilistic variables in speech and language processing as an essential technique.
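In equation form, the MAP approximation replaces a posterior distribution by a point estimate at its mode; shown here for the latent variables Z, as a restatement of the example above:

```latex
% MAP approximation of the latent-variable posterior (amsmath assumed;
% \delta denotes the Kronecker delta for discrete Z).
\begin{align}
  p(Z \mid O) \;\approx\; \delta_{Z,\hat{Z}},
  \qquad
  \hat{Z} = \operatorname*{arg\,max}_{Z} p(Z \mid O),
\end{align}
```

and analogously for Θ and C, whereas the forward–backward algorithm keeps the full posterior p(Z|O).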
This chapter starts to discuss the MAP approximation of Bayesian inference in detail, but limits the discussion only to model parameters Θ in Section 4.1. In the MAP approximation for model parameters, the prior distributions work as a regularization of these parameters, which makes the estimation of the parameters more robust than that of the maximum likelihood (ML) approach. Another interesting property of the MAP approximation for model parameters is that we can easily involve the inference of latent variables by extending the EM algorithm from ML to MAP estimation. Section 4.2 describes the general EM algorithm with the MAP approximation by following the ML-based EM algorithm, as discussed in Section 3.4. Based on the general MAP–EM algorithm, Section 4.3 provides MAP–EM solutions for CDHMM parameters, and introduces well-known applications to speaker adaptation. Section 4.5 describes the parameter smoothing method in discriminative training of the CDHMM, which actually corresponds to the MAP solution for discriminative parameter estimation. Section 4.6 focuses on the MAP estimation of GMM parameters, which is a subset of the MAP estimation of CDHMM parameters. It is used to construct speaker GMMs that are used for speaker verification. Section 4.7 provides an MAP solution for n-gram parameters that leads to one instance of interpolation smoothing, as discussed in Section 3.6.2. Finally, Section 4.8 deals with the adaptive MAP estimation of latent topic model parameters.
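To give the flavour of the results derived in Section 4.3 (a minimal sketch; the exact notation and priors in the book may differ), the MAP–EM update of a Gaussian mean under a conjugate prior N(μ0, σ²/τ) interpolates between the prior mean and the ML estimate computed from the adaptation data:

```latex
% MAP-EM re-estimation of a Gaussian mean with a conjugate prior N(mu_0, sigma^2/tau);
% gamma_t are the state/component occupation probabilities from the E-step.
\begin{align}
  \hat{\mu}^{\mathrm{MAP}}
  = \frac{\tau\, \mu_0 + \sum_{t} \gamma_t\, o_t}{\tau + \sum_{t} \gamma_t}.
\end{align}
```

With little adaptation data the estimate stays close to the prior mean μ0, and with more data it approaches the ML estimate, which is what makes MAP estimation attractive for speaker adaptation.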