In this chapter, we review random encoding models that directly reduce the dimensionality of distributional data without first building a co-occurrence matrix. While matrix distributional semantic models (DSMs) output either explicit or implicit distributional vectors, random encoding models only produce low-dimensional embeddings and emphasize efficiency, scalability, and incrementality in building distributional representations. We discuss the mathematical foundation of random encoding models, the Johnson-Lindenstrauss lemma. We introduce Random Projection before turning to Random Indexing and BEAGLE, a random encoding model that encodes sequential information in distributional vectors. Then, we introduce a variant of Random Indexing that uses random permutations to represent the position of the context lexemes with respect to the target, similarly to BEAGLE. Finally, we discuss Self-Organizing Maps, a kind of unsupervised neural network that shares important similarities with random encoding models.
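To make the idea concrete, the following minimal sketch (our own illustration, not an implementation from the chapter) shows Random Indexing: each context word is assigned a sparse random index vector, and the embedding of a target is built incrementally by summing the index vectors of its co-occurring contexts, so no co-occurrence matrix is ever materialized. All names and parameter values are illustrative.

import numpy as np
from collections import defaultdict

DIM = 300        # dimensionality of the reduced space (illustrative choice)
NON_ZERO = 10    # number of non-zero entries in each sparse index vector
rng = np.random.default_rng(0)

def index_vector():
    # Sparse ternary random vector with a few randomly placed +1/-1 entries.
    v = np.zeros(DIM)
    positions = rng.choice(DIM, size=NON_ZERO, replace=False)
    v[positions] = rng.choice([1.0, -1.0], size=NON_ZERO)
    return v

index_vectors = defaultdict(index_vector)          # one random vector per context word
embeddings = defaultdict(lambda: np.zeros(DIM))    # incrementally updated target vectors

def update(tokens, window=2):
    # Process one sentence incrementally, without building a co-occurrence matrix.
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                embeddings[target] += index_vectors[tokens[j]]

update("the dog chased the cat".split())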
Distributional semantics is the study of how distributional information can be used to model semantic facts. Its theoretical foundation has become known as the Distributional Hypothesis: Lexemes with similar linguistic contexts have similar meanings. This chapter presents the epistemological principles of distributional semantics. First, we explore the historical roots of the Distributional Hypothesis, tracing them in several different theoretical traditions, including European structuralism, American distributionalism, the later philosophy of Ludwig Wittgenstein, corpus linguistics, and behaviorist and cognitive psychology. Then, we discuss the place of distributional semantics in theoretical and computational linguistics.
The most recent development in distributional semantics is represented by models based on artificial neural networks. In this chapter, we focus on the use of neural networks to build static embeddings. Like random encoding models, neural networks incrementally learn embeddings by reducing the high dimensionality of distributional data without building an explicit co-occurrence matrix. Unlike the first generation of distributional semantic models (DSMs), also termed count models, neural DSMs produce distributional representations as a by-product of training the network to predict neighboring words, hence the name predict models. Since semantically similar words tend to co-occur with similar contexts, the network learns to encode similar lexemes with similar distributional vectors. After introducing the basic concepts of neural computation, we illustrate neural language models and their use to learn distributional representations. We then describe the most popular static neural DSMs, CBOW and Skip-Gram. We conclude the chapter with a comparison between count and predict models.
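As an illustration of the predict-model idea, the sketch below trains tiny CBOW and Skip-Gram models with the gensim library; this is one common implementation, not the reference setup discussed in the chapter, and the toy corpus and hyperparameters are placeholders.

from gensim.models import Word2Vec

corpus = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "sat", "on", "the", "mat"],
    ["a", "dog", "barked", "at", "the", "cat"],
]

# sg=1 selects Skip-Gram (predict the context words from the target);
# sg=0 selects CBOW (predict the target from its context words).
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

# Lexemes occurring in similar contexts should receive similar embeddings.
print(skipgram.wv.most_similar("dog", topn=3))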
This chapter focuses on the evaluation of distributional semantic models (DSMs). Distributional semantics has usually favored intrinsic methods that test DSMs for their ability to model various kinds of semantic similarity and relatedness. Recently, extrinsic evaluation has also become very popular: the distributional vectors are fed into a downstream NLP task and are evaluated through the system’s performance on that task. The goal of this chapter is twofold: (i) to present the most common evaluation methods in distributional semantics, and (ii) to carry out a large-scale comparison between the static DSMs reviewed in Part II. First, we discuss the notion of semantic similarity, which is central in distributional semantics. Then, we present the major tasks for intrinsic and extrinsic evaluation, and we analyze the performance of a representative group of static DSMs on several semantic tasks. Finally, we explore the differences among the semantic spaces produced by these models with Representational Similarity Analysis.
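A typical intrinsic evaluation can be sketched as follows: cosine similarities produced by a DSM are correlated (Spearman's rho) with human similarity ratings for word pairs. The embeddings and the rated pairs are assumed to be given; the function names are ours.

import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def intrinsic_evaluation(embeddings, rated_pairs):
    # rated_pairs: iterable of (word1, word2, human_rating) triples
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho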
This chapter discusses the major types of matrix models, a rich and multifarious family of distributional semantic models (DSMs) that extend and generalize the vector space model in information retrieval, from which they derive the use of co-occurrence matrices to represent distributional information. We first focus on a group of matrix DSMs (e.g., Latent Semantic Analysis) that we refer to as classical models, since they directly implement the basic procedure to build distributional representations introduced in Chapter 2. Then, we present DSMs that propose extensions of and variants on the classical ones. Latent Relational Analysis uses pairs of lexical items as targets to measure the semantic similarity of the relations between them. Distributional Memory represents distributional data with a high-order tensor, from which different types of co-occurrence matrices are derived to address various semantic tasks. Topic Models and GloVe introduce new approaches to reduce the dimensionality of the co-occurrence matrix, based respectively on probabilistic inference and a method strongly inspired by neural DSMs.
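As a pointer to how a classical model works in practice, here is a minimal Latent Semantic Analysis sketch using scikit-learn's truncated SVD; the toy lexeme-by-document count matrix and the number of latent dimensions are invented for illustration.

import numpy as np
from sklearn.decomposition import TruncatedSVD

# Toy co-occurrence matrix: rows = target lexemes, columns = documents.
M = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 1, 2, 1],
], dtype=float)

# Truncated SVD projects the lexemes into a low-dimensional latent space.
svd = TruncatedSVD(n_components=2, random_state=0)
embeddings = svd.fit_transform(M)   # one 2-dimensional vector per lexeme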
Lexical semantic competence is a multifaceted and complex reality, which includes the ability to draw inferences, distinguish different word senses, refer to entities in the world, and so on. A long-standing tradition of research in linguistics and cognitive science has investigated these issues using symbolic representations. The aim of this chapter is to understand how and to what extent the major aspects of lexical meaning can be addressed with distributional representations. We have selected a group of research topics that have received particular attention in distributional semantics: (i) identifying and representing multiple meanings of lexical items, (ii) discriminating between different paradigmatic semantic relations, (iii) establishing cross-lingual links among lexemes, (iv) analyzing connotative aspects of meaning, (v) studying semantic change, (vi) grounding distributional representations in extralinguistic experiential data, and (vii) using distributional vectors in cognitive science to model the mental lexicon and semantic memory.
This chapter presents current research in compositional distributional semantics, which aims at designing methods to construct the interpretation of complex linguistic expressions from the distributional representations of the lexical items they contain. This theme includes two major questions that we are going to explore: What is the distributional representation of a phrase or sentence, and to what extent is it able to encode key aspects of its meaning? How can we build such representations compositionally? After introducing the classical symbolic paradigm of compositionality based on function-argument structures and function application, we review different methods to create phrase and sentence vectors (simple vector operations, neural networks trained to learn sentence embeddings, etc.). Then, we investigate the context-sensitive nature of semantic representations, with a particular focus on the last generation of contextual embeddings, and distributional models of selectional preferences. We end with some general considerations about compositionality, semantic structures, and vector models of meaning.
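The simplest of the composition methods mentioned above can be sketched in a few lines: a phrase vector is obtained by element-wise addition or multiplication of the word vectors, here with made-up random vectors standing in for real distributional representations.

import numpy as np

def additive(u, v):
    # Phrase vector as the element-wise sum of the word vectors.
    return u + v

def multiplicative(u, v):
    # Phrase vector as the element-wise product of the word vectors.
    return u * v

red, car = np.random.rand(50), np.random.rand(50)   # placeholder word vectors
red_car = additive(red, car)
red_car_alt = multiplicative(red, car)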
The distributional representation of a lexical item is typically a vector representing its co-occurrences with linguistic contexts. This chapter introduces the basic notions to construct distributional semantic representations from corpora. We present (i) the major types of linguistic contexts used to characterize the distributional properties of lexical items (e.g., window-based and syntactic collocates, and documents), (ii) their representation with co-occurrence matrices, whose rows are labeled with lexemes and columns with contexts, (iii) mathematical methods to weight the importance of contexts (e.g., Pointwise Mutual Information and entropy), (iv) the distinction between high-dimensional explicit vectors and low-dimensional embeddings with latent dimensions, (v) dimensionality reduction methods to generate embeddings from the original co-occurrence matrix (e.g., Singular Value Decomposition), and (vi) vector similarity measures (e.g., cosine similarity).
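The whole pipeline can be sketched end to end on a toy corpus: window-based co-occurrence counts, Positive PMI weighting, SVD-based dimensionality reduction, and cosine similarity. Corpus, window size, and number of latent dimensions are illustrative choices, not recommendations.

import numpy as np

corpus = [["the", "dog", "chased", "the", "cat"],
          ["the", "cat", "sat", "on", "the", "mat"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# (ii) co-occurrence matrix: rows = target lexemes, columns = context lexemes
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[idx[w], idx[sent[j]]] += 1

# (iii) Positive Pointwise Mutual Information weighting
total = counts.sum()
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((counts / total) / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# (v) low-dimensional embeddings via truncated Singular Value Decomposition
U, S, Vt = np.linalg.svd(ppmi, full_matrices=False)
embeddings = U[:, :2] * S[:2]

# (vi) cosine similarity between two lexemes
def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

print(cos(embeddings[idx["dog"]], embeddings[idx["cat"]]))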
This chapter contains a synoptic view of the different types and generations of distributional semantic models (DSMs), including the distinction between static and contextual models. Part II then focuses on static DSMs, since they are still the best-known and most widely studied family of models, and they learn context-independent distributional representations that are useful for several linguistic and cognitive tasks.
Distributional semantics develops theories and methods to represent the meaning of natural language expressions, with vectors encoding their statistical distribution in linguistic contexts. It is at once a theoretical model to express meaning, a practical methodology to construct semantic representations, a computational framework for acquiring meaning from language data, and a cognitive hypothesis about the role of language usage in shaping meaning. This book aims to build a common understanding of the theoretical and methodological foundations of distributional semantics. Beginning with its historical origins, the text exemplifies how the distributional approach is implemented in distributional semantic models. The main types of computational models, including modern deep learning ones, are described and evaluated, demonstrating how various types of semantic issues are addressed by those models. Open problems and challenges are also analyzed. Students and researchers in natural language processing, artificial intelligence, and cognitive science will appreciate this book.