Nowadays everything revolves around digital data. Data are, however, difficult to capture in legal terms because of their great variety: they may be valuable goods or completely useless, and they may be regarded as syntactic or semantic. Yet it is precisely the particularly sensitive data protected by data protection law that are highly valuable and interesting for data-trading, big-data and artificial-intelligence applications in the European data market. The European legislator appears to favour both a high level of protection of personal data, including the principle of ‘data minimisation’, and a free flow of data. The GDPR includes some free-flow elements, but legislation on the trading and usage of non-personal data, in particular, is currently under discussion. The European legislator therefore faces key challenges regarding the (partly) conflicting objectives reflected in data protection law and data economic law. This contribution assesses the current state of legal discussions and legislative initiatives at the European level.
Machine-learning algorithms are used to profile individuals and to make decisions based on those profiles. The European Union is a pioneer in the regulation of automated decision-making. The regime for solely automated decision-making under Article 22 of the General Data Protection Regulation (GDPR), including the interpretative guidance of the Article 29 Working Party (WP29, replaced by the European Data Protection Board under the GDPR), has become more substantial (i.e., less formalistic) than was the case under Article 15 of the Data Protection Directive. This has been achieved by: endorsing a non-strict concept of ‘solely’ automated decisions; explicitly recognising the enhanced protection required for vulnerable adults and children; linking the data subject’s right to an explanation to the right to challenge automated decisions; and validating the ‘general prohibition’ approach to Article 22(1). These positive developments enhance legal certainty and ensure higher levels of protection for individuals. They represent a step towards the development of a more mature and sophisticated regime for automated decision-making that is committed to helping individuals retain adequate levels of autonomy and control, whilst meeting the technology and innovation demands of the data-driven society.
This chapter introduces the notion of “wake neutrality” of artificial intelligence devices and reviews its implications for wake-word approaches in open conversational commerce (OCC) devices such as Amazon’s Alexa, Google Home and Apple’s Siri. Examples illustrate how neutrality requirements such as explainability, auditability, quality, configurability, institutionalization, and non-discrimination may impact the various layers of a complete artificial intelligence architecture stack. The legal programming implications of these requirements for algorithmic law enforcement are also analysed. The chapter concludes with a discussion of the possible role of standards bodies in setting a neutral, secure and open legal programming voice name system (VNS) for human-to-AI interactions to include an “emotional firewall.”
High-frequency trading has become important in financial markets and is one of the first areas of algorithmic trading to be intensely regulated. This chapter reviews the EU approach to regulating algorithmic trading, which, with its focus on organizational requirements such as pre- and post-trade controls and real-time monitoring, can serve as a blueprint for other regulation of algorithms.
Despite their profound and growing influence on our lives, algorithms remain a partial “black box.” Keeping the risks that arise from rule-based and learning systems in check is a challenging task for both society and the legal system. This chapter examines existing and adaptable legal solutions and complements them with further proposals. It designs a regulatory model in four steps along the time axis: preventive regulation instruments; accompanying risk management; ex post facto protection; and an algorithmic responsibility code. Together, these steps form a legislative blueprint to further regulate artificial intelligence applications.
The legal classification of a robot as a ‘product’ has led to the application of civil liability rules for producers. Nevertheless, some aspects of the relevant European regulation suggest that special attention should be devoted to a review of this field in relation to robotics. Types of defect, the meanings of the term ‘producer’, the consumer expectation test and non-pecuniary damages are some of the aspects that could give rise to future debate. The inadequacy of the current Directive 85/374/EEC for regulating damages caused by robots, particularly those with self-learning capability, is highlighted by the document ‘Follow up to the EU Parliament Resolution of 16 February 2017 on Civil Law Rules on Robotics’. Other relevant documents are the Report on “Liability for AI and other emerging digital technologies” prepared by the Expert Group on Liability and New Technologies, the “Report on the safety and liability implications of Artificial Intelligence, the Internet of Things and Robotics” [COM(2020) 64 final, 19.2.2020] and the White Paper “On Artificial Intelligence – A European approach to excellence and trust” [COM(2020) 65 final, 19.2.2020].
We investigate the use of semantic information for morphological segmentation, since words that are derived from each other remain semantically related. We use mathematical models such as maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation, incorporating semantic information obtained from dense word vector representations. Our approach does not require any annotated data, which makes it fully unsupervised, and it needs only a small amount of raw data together with pretrained word embeddings for training. The results show that dense vector representations help morphological segmentation, especially for low-resource languages. We present results for Turkish, English, and German. Our semantic MLE model outperforms other unsupervised models for Turkish. Our proposed models could also be used for any other low-resource language with concatenative morphology.
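To make the idea concrete, here is a minimal sketch of using embedding similarity to score candidate morphological splits. It assumes pretrained word vectors are available in a dict-like `vectors` and uses a hypothetical similarity threshold; it is an illustration of the general idea, not the authors' actual MLE/MAP formulation.

```python
# A minimal sketch (not the authors' implementation) of scoring a candidate
# morphological split by the semantic similarity between a word and its stem.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantic_split_score(word, split, vectors):
    """Cosine similarity between the word and its candidate stem,
    or None if either form is out of vocabulary."""
    stem = word[:split]
    if word not in vectors or stem not in vectors:
        return None
    return cosine(vectors[word], vectors[stem])

def best_split(word, vectors, threshold=0.4):  # threshold is a hypothetical parameter
    """Pick the split point whose stem is most semantically related to the word."""
    scored = [(semantic_split_score(word, i, vectors), i) for i in range(2, len(word))]
    scored = [(s, i) for s, i in scored if s is not None and s >= threshold]
    return max(scored)[1] if scored else None

# Example (with hypothetical vectors): best_split("walking", vectors) might
# return 4, segmenting the word as "walk" + "ing".
```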
Alleviating pain is good and abandoning hope is bad. We instinctively understand how words like alleviate and abandon affect the polarity of a phrase, inverting or weakening it. When these words are content words, such as verbs, nouns, and adjectives, we refer to them as polarity shifters. Shifters are a frequent occurrence in human language and an important part of successfully modeling negation in sentiment analysis; yet research on negation modeling has focused almost exclusively on a small handful of closed-class negation words, such as not, no, and without. A major reason for this is that shifters are far more lexically diverse than negation words, but no resources exist to help identify them. We seek to remedy this lack of shifter resources by introducing a large lexicon of polarity shifters that covers English verbs, nouns, and adjectives. Creating the lexicon entirely by hand would be prohibitively expensive. Instead, we develop a bootstrapping approach that combines automatic classification with human verification to ensure the high quality of our lexicon while reducing annotation costs by over 70%. Our approach leverages a number of linguistic insights; while some features are based on textual patterns, others use semantic resources or syntactic relatedness. The created lexicon is evaluated both on a polarity shifter gold standard and on a polarity classification task.
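As a rough illustration of the bootstrapping idea (automatic classification combined with human verification), the sketch below ranks candidate words with a generic classifier and passes only the most confident candidates to annotators. The `featurize` and `human_verify` callables are placeholders; this is an assumption-laden simplification, not the authors' system.

```python
# Bootstrapping a polarity shifter lexicon: classify, rank, verify, retrain.
from sklearn.linear_model import LogisticRegression

def bootstrap_lexicon(seed_pos, seed_neg, candidates, featurize, human_verify,
                      rounds=3, batch_size=200):
    lexicon = set(seed_pos)                    # verified polarity shifters
    pos, neg = list(seed_pos), list(seed_neg)  # labelled training words
    for _ in range(rounds):
        X = [featurize(w) for w in pos + neg]
        y = [1] * len(pos) + [0] * len(neg)
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        # Rank unlabelled candidates by predicted probability of being a shifter.
        scored = sorted(candidates,
                        key=lambda w: clf.predict_proba([featurize(w)])[0][1],
                        reverse=True)
        batch = [w for w in scored if w not in pos and w not in neg][:batch_size]
        for word in batch:                     # human-in-the-loop verification
            if human_verify(word):
                lexicon.add(word)
                pos.append(word)
            else:
                neg.append(word)
    return lexicon
```

Only the top-ranked batch in each round reaches human annotators, which is how this kind of pipeline keeps verification costs low while preserving lexicon quality.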
This article describes the criteria for identifying the focus of negation in Spanish. The work involved an in-depth linguistic analysis of the focus of negation, through which we identified some ten types of criteria that account for a wide variety of constructions containing negation. These criteria cover all the cases that appear in the NewsCom corpus and were assessed during the annotation of that corpus. The NewsCom corpus consists of 2955 comments posted in response to 18 different news articles from online newspapers and contains 2965 negative structures with their corresponding negation marker, scope, and focus. It is the first corpus annotated with the focus of negation in Spanish and it is freely available. It is a valuable resource that can be used both for training and evaluating systems that aim to automatically detect the scope and focus of negation and for the linguistic analysis of negation grounded in real data.
Distributional semantic word representations underlie most modern NLP systems. Their usefulness has been proven across various tasks, particularly as inputs to deep learning models. Beyond that, much work has investigated fine-tuning generic word embeddings to leverage linguistic knowledge from large lexical resources. Some work has investigated context-dependent word token embeddings motivated by word sense disambiguation, using sequential context and large lexical resources. More recently, acknowledging the need for an in-context representation of words, some work has leveraged information derived from language modelling and large amounts of data to induce contextualised representations. In this paper, we investigate Syntax-Aware word Token Embeddings (SATokE) as a way to explicitly encode specific information derived from the linguistic analysis of a sentence in vectors which are input to a deep learning model. We propose an efficient unsupervised learning algorithm based on tensor factorisation for computing these token embeddings given an arbitrary graph of linguistic structure. Applying this method to syntactic dependency structures, we investigate the usefulness of such token representations as part of deep learning models of text understanding. We encode a sentence either by learning embeddings for its tokens and the relations between them from scratch or by leveraging pre-trained relation embeddings to infer token representations. Given sufficient data, the former is slightly more accurate than the latter, yet both provide more informative token embeddings than standard word representations, even when the word representations have been learned on the same type of context from larger corpora (namely pre-trained dependency-based word embeddings). We use a large set of supervised tasks and two major deep learning families of models for sentence understanding to evaluate our proposal. We empirically demonstrate the superiority of the token representations compared to popular distributional representations of words for various sentence and sentence pair classification tasks.
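The sketch below gives a rough, assumption-based picture of what tensor factorisation over a dependency graph can look like: token and relation embeddings are fit so that a bilinear score reconstructs the observed edges. It is illustrative only and not the paper's exact SATokE algorithm, hyperparameters, or loss.

```python
# Toy tensor factorisation over one dependency graph: fit token embeddings E
# and relation matrices R so that E[h] @ R[r] @ E[d] is ~1 for observed
# (head, relation, dependent) edges and ~0 otherwise.
import numpy as np

def factorise_graph(n_tokens, n_relations, edges, dim=16, lr=0.05, epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_tokens, dim))          # token embeddings
    R = rng.normal(scale=0.1, size=(n_relations, dim, dim))  # relation matrices
    targets = {(h, r, d): 1.0 for (h, r, d) in edges}
    triples = [(h, r, d) for h in range(n_tokens)
               for r in range(n_relations) for d in range(n_tokens)]
    for _ in range(epochs):
        for h, r, d in triples:
            score = E[h] @ R[r] @ E[d]
            err = score - targets.get((h, r, d), 0.0)
            # Gradient steps for the squared reconstruction error.
            grad_Eh = err * (R[r] @ E[d])
            grad_Ed = err * (R[r].T @ E[h])
            grad_Rr = err * np.outer(E[h], E[d])
            E[h] -= lr * grad_Eh
            E[d] -= lr * grad_Ed
            R[r] -= lr * grad_Rr
    return E, R

# Example: for the tokens of "dogs chase cats" with edges for hypothetical
# "nsubj" and "obj" relations, the rows of E serve as syntax-aware token
# embeddings that a downstream model could consume.
```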
Accurate negation identification is one of the most important tasks in the context of sentiment analysis: to correctly interpret the sentiment value of a particular expression, we need to identify whether it is in the scope of negation. While much of the work on negation detection has focused on English, recent developments provide accurate identification of negation in other languages. In this paper, we give an overview of negation detection systems and describe an implementation of a Spanish system for negation cue detection and scope identification. We apply this system to the sentiment analysis task, confirming that, for Spanish too, improvements can be gained from accurate negation detection. The paper contributes an implementation of negation detection for sentiment analysis in Spanish and a detailed error analysis. This is the first work in Spanish in which a machine learning negation processing system is applied to the sentiment analysis task; existing methods have used negation rules that have not been assessed, perhaps because the first Spanish corpus annotated with negation for sentiment analysis has only recently become available.
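As an illustration of how scope detection can feed sentiment analysis, the following minimal sketch inverts the lexicon polarity of tokens that fall inside predicted negation scopes. The lexicon format and scope representation are assumptions for exposition, not the system described above.

```python
# Lexicon-based sentiment scoring that flips polarity inside negation scopes.
def sentiment_with_negation(tokens, polarity_lexicon, scopes):
    """tokens: list of words; polarity_lexicon: word -> score in [-1, 1];
    scopes: list of (start, end) token index pairs from a scope detector."""
    in_scope = set()
    for start, end in scopes:
        in_scope.update(range(start, end + 1))
    score = 0.0
    for i, tok in enumerate(tokens):
        polarity = polarity_lexicon.get(tok.lower(), 0.0)
        score += -polarity if i in in_scope else polarity
    return score

# Example: for "no me gusta la película" with a predicted scope over
# "me gusta la película", the positive polarity of "gusta" is inverted,
# yielding a negative sentence-level score.
```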
Algorithms permeate our lives in numerous ways, performing tasks that until recently could only be carried out by humans. Artificial Intelligence (AI) technologies, based on machine learning algorithms and big-data-powered systems, can perform sophisticated tasks such as driving cars, analyzing medical data, and evaluating and executing complex financial transactions, often without active human control or supervision. Algorithms also play an important role in determining retail pricing, online advertising, loan qualification, and airport security. In this work, Martin Ebers and Susana Navas bring together a group of scholars and practitioners from across Europe and the US to analyze how this shift from human actors to computers presents both practical and conceptual challenges for legal and regulatory systems. This book should be read by anyone interested in the intersection between computer science and law, how the law can better regulate algorithmic design, and the legal ramifications for citizens whose behavior is increasingly dictated by algorithms.
Neural networks applied to machine translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically discovered common parts of words to achieve the flexibility of character-based vocabularies without delegating the whole learning of word formation to the neural network. However, this comes at the cost of word-level token associations, which limits their use in semantically rich areas, prevents some transfer learning approaches (e.g. cross-lingual pretrained embeddings), and reduces their interpretability. In this work, we propose new hybrid, linguistically grounded vocabulary definition strategies that keep both the advantages of subword vocabularies and the word-level associations, enabling neural networks to profit from both. We test the proposed approaches on both morphologically rich and morphologically poor languages, showing that, for the former, the quality of the translation of out-of-domain texts improves with respect to a strong subword baseline.
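A simplified sketch of the hybrid-vocabulary idea is shown below: word forms present in a word-level vocabulary are kept whole, and everything else falls back to a subword tokenizer, preserving word-level associations where possible. The vocabulary contents and the fallback tokenizer are placeholders, not the specific strategies proposed in the chapter.

```python
# Hybrid tokenization: whole-word tokens where possible, subwords otherwise.
def hybrid_tokenize(sentence, word_vocab, subword_tokenize):
    """word_vocab: set of word forms kept as single tokens;
    subword_tokenize: callable splitting an out-of-vocabulary word into subwords."""
    tokens = []
    for word in sentence.split():
        if word in word_vocab:
            tokens.append(word)                    # word-level token
        else:
            tokens.extend(subword_tokenize(word))  # subword fallback
    return tokens

# Example with a hypothetical BPE-style fallback:
# hybrid_tokenize("the unfriendliness persisted", {"the", "persisted"}, bpe_split)
# -> ["the", "un@@", "friend@@", "liness", "persisted"]
```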
Automatic detection of negated content is often a prerequisite for information extraction systems in various domains, and it is especially important in the biomedical domain, where negation plays a prominent role. This work makes two main contributions. First, we address languages that have received little attention so far, Brazilian Portuguese and French, by developing new corpora for both that are manually annotated with negation cues and their scope. Second, we propose automatic methods based on supervised machine learning for detecting negation cues and their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.
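For illustration, the sketch below trains a simple supervised token-level scope tagger using window features and a linear classifier. The feature set, model choice, and data format are assumptions made for exposition, not the authors' actual architecture.

```python
# Supervised scope tagging: label each token as inside/outside a negation scope.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i, cue_positions):
    return {
        "word": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
        "dist_to_cue": min((abs(i - c) for c in cue_positions), default=99),
    }

def train_scope_tagger(sentences):
    """sentences: list of (tokens, cue_positions, scope_labels) triples,
    where scope_labels[i] is 1 if token i lies inside a negation scope."""
    X, y = [], []
    for tokens, cues, labels in sentences:
        for i in range(len(tokens)):
            X.append(token_features(tokens, i, cues))
            y.append(labels[i])
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    return model.fit(X, y)
```

At prediction time, the same featurization is applied to unseen sentences (with cues detected first), and the per-token labels are read back as scope spans.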