Opinions from social media are increasingly used by individuals and organizations to make purchase decisions, to choose among candidates in elections, and to guide marketing and product design. Positive opinions often mean profit and fame for businesses and individuals. Unfortunately, this gives impostors a strong incentive to game the system by posting fake reviews or opinions to promote or discredit target products, services, organizations, individuals, and even ideas, without disclosing their true intentions or the person or organization for which they are secretly working. Such individuals are called opinion spammers, and their activities are called opinion spamming (Jindal and Liu, 2007, 2008). An opinion spammer is also called a shill, a plant, or a stooge in the social media environment, and opinion spamming is also called shilling or astroturfing. Opinion spamming can not only hurt consumers and damage businesses but also warp opinions and mobilize masses into positions counter to legal or ethical mores. This is frightening, especially when the spamming concerns opinions on social and political issues. It is safe to say that as opinions in social media are increasingly used in practice, opinion spamming is becoming more and more sophisticated, which presents a major challenge for its detection. However, such offenses must be detected to ensure that social media remain a trusted source of public opinion rather than becoming full of fakes, lies, and deceptions.
As discussed in Chapter 3, document-level sentiment classification is too coarse for most practical applications. We now move to the sentence level and look at methods that classify the sentiment expressed in each sentence. The goal is to classify each sentence in an opinion document (e.g., a product review) as expressing a positive, negative, or neutral opinion. This brings us closer to real-life sentiment analysis applications, which require opinions about specific sentiment targets. Sentence-level classification is essentially the same problem as document-level classification, because a sentence can be regarded as a short document. It is, however, often harder, because a typical sentence contains much less information than a typical document owing to the difference in length. Most document-level sentiment classification research ignores the neutral class, mainly because it is difficult to perform three-class classification (positive, neutral, and negative) accurately. At the sentence level, however, the neutral class cannot be ignored, because an opinion document can contain many sentences that express no opinion or sentiment at all. Note that a neutral opinion often means that no opinion or sentiment is expressed.
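The three-class setting can be made concrete with a minimal lexicon-based sketch. This is purely illustrative: the tiny lexicons and the `classify_sentence` function below are hypothetical stand-ins, not any method or resource from the chapter, and real systems use far richer features and learned models.

```python
# Toy sentence-level three-class sentiment classifier.
# The two lexicons are tiny hypothetical examples for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "amazing"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "awful"}

def classify_sentence(sentence: str) -> str:
    """Label a sentence positive, negative, or neutral by lexicon counts."""
    tokens = [t.strip(".,!?") for t in sentence.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # no sentiment words found, or balanced sentiment

# A short review: note the third sentence carries no sentiment at all,
# which is why the neutral class cannot be dropped at the sentence level.
review = ["The battery life is great.",
          "The screen scratches easily, which is terrible.",
          "It ships with a USB-C cable."]
labels = [classify_sentence(s) for s in review]
```

The third sentence illustrates the point made above: many sentences in an opinionated document are factual and express no sentiment, so a sentence-level classifier must be able to output "neutral".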
Sentiment analysis is the computational study of people's opinions, sentiments, emotions, moods, and attitudes. This fascinating problem offers numerous research challenges, but promises insight useful to anyone interested in opinion analysis and social media analysis. This comprehensive introduction to the topic takes a natural-language-processing point of view to help readers understand the underlying structure of the problem and the language constructs commonly used to express opinions, sentiments, and emotions. The book covers core areas of sentiment analysis and also includes related topics such as debate analysis, intention mining, and fake-opinion detection. It will be a valuable resource for researchers and practitioners in natural language processing, computer science, management sciences, and the social sciences. In addition to traditional computational methods, this second edition includes recent deep learning methods to analyze and summarize sentiments and opinions, and also new material on emotion and mood analysis techniques, emotion-enhanced dialogues, and multimodal emotion analysis.
Resource-limited and morphologically rich languages pose many challenges to natural language processing tasks. Their highly inflected surface forms inflate the vocabulary size and increase sparsity in an already scarce data situation. In this article, we present an unsupervised learning approach to vocabulary reduction through morphological segmentation. We demonstrate its value in the context of machine translation for dialectal Arabic (DA), the primarily spoken, orthographically unstandardized, morphologically rich, and yet resource-poor variants of Standard Arabic. Our approach exploits the existence of monolingual and parallel data. We show performance comparable to state-of-the-art supervised methods for DA segmentation.
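The core idea, that segmenting inflected surface forms into shared stems and affixes shrinks the vocabulary, can be shown with a toy heuristic. The suffix-frequency rule below is a hypothetical illustration of unsupervised segmentation in general, not the article's method, and the example corpus is invented.

```python
from collections import Counter

def learn_suffixes(words, max_len=3, min_count=3):
    """Collect frequent word-final substrings as candidate suffixes
    (a toy unsupervised heuristic, not the article's approach)."""
    counts = Counter()
    for w in words:
        for k in range(1, min(max_len, len(w) - 2) + 1):
            counts[w[-k:]] += 1
    return {s for s, c in counts.items() if c >= min_count}

def segment(word, suffixes, max_len=3):
    """Split off the longest known suffix, e.g. 'walking' -> 'walk +ing'."""
    for k in range(max_len, 0, -1):
        if len(word) > k + 2 and word[-k:] in suffixes:
            return word[:-k] + " +" + word[-k:]
    return word

# Six inflected surface forms built from three stems and two suffixes.
corpus = "walking talking walked talked jumping jumped".split()
suffixes = learn_suffixes(corpus)
segmented = [segment(w, suffixes) for w in corpus]
vocab_before = len(set(corpus))                                 # 6 types
vocab_after = len({t for s in segmented for t in s.split()})    # 5 types
```

After segmentation the six surface forms share three stems and two suffix tokens, so the vocabulary shrinks; on a highly inflected language the reduction, and the resulting drop in sparsity, is far larger.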
In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. In a scenario where the test set is accessible when the model is being built, training instances can be selected so they are the most relevant for the test set. Feature Decay Algorithms (FDA) are a data selection technique that has demonstrated excellent performance in a number of tasks. The method maximizes the diversity of the n-grams in the training set by devaluing those that have already been included. We focus on this method to investigate more deeply how to select better training instances. We give an overview of FDA and propose improvements in terms of speed and quality. Using German-to-English parallel data, we first create a novel approach that decreases the execution time of FDA when multiple computation units are available. In addition, we improve translation quality by extending FDA with information from the parallel corpus that is generally ignored.
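The decay mechanism can be sketched in a few lines. This is a simplified illustration of the FDA idea, greedy selection where an n-gram's value shrinks each time it is covered by an already-selected sentence, not the published algorithm, and `fda_select` with its `decay` parameter is a hypothetical simplification.

```python
from collections import Counter

def ngrams(tokens, n_max=3):
    """All n-grams of the token list up to length n_max."""
    return [tuple(tokens[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fda_select(pool, k, decay=0.5, n_max=3):
    """Greedily pick k sentence indices from pool. Each n-gram's value
    is multiplied by `decay` per prior occurrence in selected data, so
    later picks favour n-grams not yet covered (FDA-style sketch)."""
    counts = Counter()                    # how often each n-gram is covered
    sentences = [s.split() for s in pool]
    remaining = list(range(len(sentences)))
    selected = []
    for _ in range(min(k, len(sentences))):
        def score(i):
            toks = sentences[i]
            return sum(decay ** counts[g]
                       for g in ngrams(toks, n_max)) / max(len(toks), 1)
        best = max(remaining, key=score)  # ties go to the earliest index
        remaining.remove(best)
        selected.append(best)
        counts.update(ngrams(sentences[best], n_max))
    return selected
```

Given a pool containing a duplicate sentence, the sketch skips the duplicate: once "a b c" is selected, its n-grams are devalued, so the diverse sentence "x y z" outscores the second copy.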
An entity mention in text such as “Washington” may correspond to many different named entities, such as the city “Washington D.C.” or the newspaper “Washington Post.” The goal of named entity disambiguation (NED) is to identify the mentioned named entity correctly among all possible candidates. If the type (e.g., location or person) of a mentioned entity can be correctly predicted from the context, it may increase the chance of selecting the right candidate by assigning low probability to the unlikely ones. This paper proposes cluster-based mention typing for NED. The aim of mention typing is to predict the type of a given mention based on its context. Generally, manually curated type taxonomies such as Wikipedia categories are used. We introduce cluster-based mention typing, where named entities are clustered based on their contextual similarities and the cluster ids are assigned as types. The hyperlinked mentions and their contexts in Wikipedia are used to obtain these cluster-based types. Then, mention typing models are trained on these mentions, which have been labeled with their cluster-based types through distant supervision. At the NED phase, the cluster-based types of a given mention are first predicted, and then these types are used as features in a ranking model to select the best entity among the candidates. We represent entities at multiple contextual levels and obtain a different clustering (and thus typing model) for each level. As each clustering partitions the entity space differently, mention typing based on each clustering discriminates the mention differently. When predictions from all typing models are used together, our system achieves results better than or comparable to the state of the art, based on randomization tests, on four de facto test sets.
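The cluster-ids-as-types idea can be illustrated with a toy sketch: cluster entities by the overlap of their mention contexts, then assign a new mention to the most similar cluster. Everything below, the Jaccard similarity, the greedy single-pass clustering, and the invented entity contexts, is a hypothetical simplification of the idea, not the paper's clustering or typing models.

```python
def jaccard(a, b):
    """Set-overlap similarity between two bags of context words."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Invented hyperlinked-mention contexts for three "Washington" entities.
entity_contexts = {
    "Washington_D.C.": "capital city federal government district",
    "Washington_Post": "newspaper article journalists published story",
    "George_Washington": "president general army revolution leader",
}

def induce_types(entity_contexts, threshold=0.2):
    """Cluster entities by context overlap; the cluster id becomes the
    entity's induced type (a toy stand-in for the paper's clustering)."""
    clusters, types = [], {}          # cluster word sets; entity -> cluster id
    for ent, ctx in entity_contexts.items():
        words = set(ctx.split())
        for cid, rep in enumerate(clusters):
            if jaccard(words, rep) >= threshold:
                rep |= words          # grow the cluster's word set
                types[ent] = cid
                break
        else:
            clusters.append(words)    # start a new cluster
            types[ent] = len(clusters) - 1
    return clusters, types

def predict_type(context, clusters):
    """Assign a mention context to the most similar cluster id."""
    words = set(context.split())
    return max(range(len(clusters)), key=lambda c: jaccard(words, clusters[c]))

clusters, types = induce_types(entity_contexts)
mention_type = predict_type("the newspaper published a story", clusters)
```

Here the mention context lands in the same cluster as "Washington_Post", so a ranker using the predicted type as a feature can downweight the city and the person among the candidates.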
Benchmarks can be a useful step toward the goals of the field (when the benchmark is on the critical path), as demonstrated by the GLUE benchmark, and deep nets such as BERT and ERNIE. The case for other benchmarks such as MUSE and WN18RR is less well established. Hopefully, these benchmarks are on a critical path toward progress on bilingual lexicon induction (BLI) and knowledge graph completion (KGC). Many KGC algorithms have been proposed such as Trans[DEHRM], but it remains to be seen how this work improves WordNet coverage. Given how much work is based on these benchmarks, the literature should have more to say than it does about the connection between benchmarks and goals. Is optimizing P@10 on WN18RR likely to produce more complete knowledge graphs? Is MUSE likely to improve Machine Translation?
The rise of artificial intelligence is mainly associated with software-based robotic systems such as mobile robots, unmanned aerial vehicles, and increasingly, semi-autonomous cars. However, the large gap between the algorithmic and physical worlds leaves existing systems still far from the vision of intelligent and human-friendly robots capable of interacting with and manipulating our human-centered world. The emerging discipline of machine intelligence (MI), unifying robotics and artificial intelligence, aims for trustworthy, embodiment-aware artificial intelligence that is conscious both of itself and its surroundings, adapting its systems to the interactive body it is controlling. The integration of AI and robotics with control, perception and machine-learning systems is crucial if these truly autonomous intelligent systems are to become a reality in our daily lives. Following a review of the history of machine intelligence dating back to its origins in the twelfth century, this chapter discusses the current state of robotics and AI, reviews key systems and modern research directions, outlines remaining challenges and envisages a future of man and machine that is yet to be built.
As robots and intangible autonomous systems increasingly interact with humans, we wonder who should be held accountable when things go wrong. This chapter examines the extra-contractual liability of users, keepers and operators for wrongs committed by autonomous systems. It explores how the concept of ‘wrong’ can be defined with respect to autonomous systems and what standard of care can reasonably be expected of them. The chapter also looks at existing accountability rules for things and people in various legal orders and explains how they can be applied to autonomous systems. From there, various approaches to a new liability regime are explored. Neither product liability nor the granting of a legal persona to robots is an adequate response to the current challenges. Rather, both the keeper and the operator of the autonomous system should be held strictly liable for any wrong committed, opening up the possibility of privileges being granted to the operators of machine-learning systems that learn from data provided by the system’s users.
The possible emulation of human creativity by various models of artificial intelligence systems is discussed in this chapter. In some instances, the degree of originality of creations using algorithms may surprise even human beings themselves. For this reason, copyright protection of ‘works’ created by autonomous systems is proposed, which would take account of both the fundamental contributions of computer science researchers and the investment in human and economic resources that give rise to these ‘works’.
Rapid progress in AI and robotics is challenging the traditional boundaries of law. Algorithms are widely employed to make decisions that have an increasingly far-reaching impact on individuals and society, potentially leading to manipulation, biases, censorship, social discrimination, violations of privacy and property rights, and more. This has sparked a global debate on how to regulate AI and robotics.