Possession identification in text
CARMEN BANEA, RADA MIHALCEA
Journal:

Natural Language Engineering / Volume 24 / Issue 4 / July 2018

Published online by Cambridge University Press:

04 April 2018, pp. 589-610
- Article
Just as industrialization matured from mass production to customization and personalization, so has the Web migrated from generic content to public disclosures of one’s most intimately held thoughts, opinions, and beliefs. This relatively new type of data is able to represent finer and more narrowly defined demographic slices. If until now researchers have primarily focused on leveraging personalized content to identify latent information such as gender, nationality, location, or age, this article seeks to establish a structured way of extracting possessions, or items that people own or are entitled to, as a way to ultimately provide insights into people’s behaviors and characteristics. We introduce the new task of ‘possession identification in text’, and release a novel dataset where possessions are marked at different confidence levels. We present experiments and results obtained when seeking to automatically identify and extract possessions from the text.

A survey of graphs in natural language processing
VIVI NASTASE, RADA MIHALCEA, DRAGOMIR R. RADEV
Journal:

Natural Language Engineering / Volume 21 / Issue 5 / November 2015

Published online by Cambridge University Press:

12 October 2015, pp. 665-698
- Article
Graphs are a powerful representation formalism that can be applied to a variety of aspects related to language processing. We provide an overview of how Natural Language Processing problems have been projected into the graph framework, focusing in particular on graph construction – a crucial step in modeling the data to emphasize the phenomena targeted.

Word from the editors
ZORNITSA KOZAREVA, VIVI NASTASE, RADA MIHALCEA
Journal:

Natural Language Engineering / Volume 21 / Issue 5 / November 2015

Published online by Cambridge University Press:

20 April 2015, pp. 661-664
- Article
Graph structures naturally model connections. In natural language processing (NLP) connections are ubiquitous, on anything between small and web scale. We find them between words – as grammatical, collocation or semantic relations – contributing to the overall meaning, and maintaining the cohesive structure of the text and the discourse unity. We find them between concepts in ontologies or other knowledge repositories – since the early ages of artificial intelligence, associative or semantic networks have been proposed and used as knowledge stores, because they naturally capture the language units and relations between them, and allow for a variety of inference and reasoning processes, simulating some of the functionalities of the human mind. We find them between complete texts or web pages, and between entities in a social network, where they model relations at the web scale. Beyond the more often encountered ‘regular’ graphs, hypergraphs have also appeared in our field to model relations between more than two units.

Explorations in lexical sample and all-words lexical substitution
RAVI SINHA, RADA MIHALCEA
Journal:

Natural Language Engineering / Volume 20 / Issue 1 / January 2014

Published online by Cambridge University Press:

09 October 2012, pp. 99-129
- Article
In this paper, we experiment with several techniques to solve the problem of lexical substitution, both in a lexical sample as well as an all-words setting, and compare the benefits of combining multiple lexical resources using both unsupervised and supervised approaches. Overall in the lexical sample setting, the results obtained through the combination of several resources exceed the current state-of-the-art when selecting the best substitute for a given target word, and place second when selecting the top ten substitutes, thus demonstrating the usefulness of the approach. Further, we put forth a novel exploration in all-words lexical substitution and set ground for further explorations of this more generalized setting.

2 - Graph-Based Algorithms
from Part I - Introduction to Graph Theory
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 20-50
- Chapter
Summary

This chapter discusses frequently used graph-theoretical algorithms, with emphasis on those algorithms that are relevant to the text-processing applications addressed in Parts III and IV. The chapter covers graph traversal, including depth-first and breadth-first strategies, minimum path length, and minimum spanning trees; flow on graphs, including algorithms for min-cut/max-flow; graph matching; random walks, with harmonic functions and electrical networks; and linear algebra on graphs.
Depth-First Graph Traversal
Algorithms for graph traversal are concerned with reaching all of the nodes in a graph while obeying certain constraints. The depth-first and breadth-first traversal strategies are concerned primarily with the order in which the nodes in the graph are traversed. The minimum-spanning-tree algorithm reaches all nodes in the graph and ensures at the same time that the traversal path forms a tree.
Depth-first traversal is considered in this section. The next section addresses breadth-first traversal.
In depth-first traversal, the traversal of the graph starts from one node and then iteratively progresses by moving from each node to one of its neighbors until it reaches a node that has no untraversed neighbors. When such a “dead end” node is hit, the algorithm backtracks and continues the traversal with other candidate nodes until all nodes in the graph are covered.
Consider the graph provided as an example in Figure 1.2 and replicated in Figure 2.1. Assuming that we start with the node A in the undirected graph shown in Figure 2.1(a), the neighbors of A are B, C, and D.
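The depth-first strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own listing: Figure 2.1 is not reproduced here, so only the fact that A neighbors B, C, and D comes from the text, and the remaining edges (and node E) are assumptions made up for the example.

```python
def depth_first_order(graph, start):
    """Return the nodes reachable from `start` in depth-first order."""
    seen = set()
    order = []

    def visit(node):
        seen.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                visit(neighbor)
        # Returning from visit() is the backtracking step: the node has
        # no untraversed neighbors left, so control goes back to its parent.

    visit(start)
    return order


# Hypothetical undirected graph as adjacency lists.
# Only A's neighbors (B, C, D) are taken from the text.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "E"],
    "D": ["A"],
    "E": ["B", "C"],
}

order = depth_first_order(graph, "A")
print(order)
```

With this neighbor ordering the traversal goes A, B, E, C and only then backtracks to A to pick up D. A different ordering of the adjacency lists would give a different but equally valid depth-first order.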

Introduction
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 1-8
- Chapter
Summary

Graph theory is a well-studied discipline as are the fields of natural language processing and information retrieval. Traditionally, these areas of study have been perceived as distinct, with different algorithms, different applications, and different potential end-users. However, as recent research work has shown, these disciplines in fact are intimately connected, with much variety in the way that natural language processing and information retrieval applications find efficient solutions within graph-theoretical frameworks.
In a cohesive text, language units – whether they are words, phrases, or entire sentences – are connected through various relationships, which contribute to the overall meaning and maintain the cohesive structure and discourse unity of the text. Since the early stages of artificial intelligence, associative or semantic networks have been proposed as representations that enable the storage of such language units and their interconnecting relationships, which allow for a variety of inference and reasoning processes that simulate functionalities of the human mind (Sowa 1983). The symbolic structures that emerge from these representations correspond naturally to graphs – in which text constituents are represented as vertices and their interconnecting relationships form the edges in the graph.
Many text-processing applications can be modeled by means of a graph. These data structures have the capability to encode naturally the meaning and structure of a cohesive text and to follow closely the associative or semantic memory representations.

9 - Applications
from Part IV - Graph-Based Natural Language Processing
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 149-178
- Chapter
Summary

This chapter addresses graph-theoretical methods for text-processing applications. The discussion includes topic identification; text summarization using graph-centrality methods; keyword extraction using random-walk language models; text segmentation using normalized-cut criteria for graph partitioning; graph structures to encode discourse relationships; word graphs for decoding in machine translation and speech processing; random-walk algorithms for translation selection in cross-language information retrieval; and graph representations and patterns on graphs for information extraction and question answering.
Summarization
Automatic summarization has received attention from the natural language processing community ever since the early approaches to automatic abstraction that laid the foundations of the current text-summarization techniques (Luhn 1958; Edmundson 1969). The literature typically distinguishes between extraction, which is concerned with identification of the information important in the input text, and abstraction, which involves a generation step to add fluency to a previously compressed text.
Most efforts to date have focused on the extraction step, which is perhaps the most critical component in a successful summarization algorithm. Among these efforts, some of the most promising approaches are based on graph representations of the text, which enable the application of graph-theoretical algorithms to identify the most salient elements in the text.
One of the first summarization techniques based on graphs is a method that creates graph representations for encyclopedic articles, in which nodes correspond to paragraphs and edges connect lexically similar paragraphs (Salton et al. 1994, 1997).
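A paragraph graph of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method of Salton et al.: the bag-of-words cosine similarity, the 0.3 threshold, the sample paragraphs, and the use of node degree as a salience score are all choices made here for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def paragraph_graph(paragraphs, threshold=0.3):
    """Connect paragraphs i < j whose similarity reaches the (assumed) threshold."""
    vectors = [Counter(p.lower().split()) for p in paragraphs]
    edges = set()
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                edges.add((i, j))
    return edges


# Made-up paragraphs, one node per paragraph.
paragraphs = [
    "graphs model relations between words",
    "graphs model relations between documents",
    "the weather was sunny today",
]

edges = paragraph_graph(paragraphs)

# Node degree as a crude salience score (an assumption for this sketch).
degree = {i: 0 for i in range(len(paragraphs))}
for i, j in edges:
    degree[i] += 1
    degree[j] += 1

print(edges, degree)
```

The first two paragraphs share four of their five words and end up connected, while the third stays isolated. A real system would likely use weighted terms and a more principled centrality measure.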

Part IV - Graph-Based Natural Language Processing
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 121-122
- Chapter

8 - Syntax
from Part IV - Graph-Based Natural Language Processing
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 140-148
- Chapter
Summary

This chapter covers the use of graph-theoretical algorithms for tasks in syntax, including part-of-speech tagging using graphs that encode word and tag dependencies; dependency parsing using minimum spanning trees; prepositional attachment using word-dependency distributions induced from random-walk models; and co-reference resolution using graph-clustering and min-cut algorithms.
Part-of-Speech Tagging
Part-of-speech tagging is defined as the task of automatically assigning parts of speech to words. For instance, given the text, “This is a book,” a part-of-speech tagger identifies that “this” is a pronoun, “is” is a verb, “a” is a determiner, and “book” is a noun. Part-of-speech tagging is required by almost any text-processing task, including word-sense disambiguation, parsing, and semantic analysis. As one of the first processing steps in any such application, the accuracy of the part-of-speech tagger directly impacts the accuracy of any subsequent text-processing steps.
Although most of the part-of-speech taggers developed to date rely on machine-learning algorithms with features drawn from the surrounding text of an ambiguous word, there are also other approaches such as unsupervised tagging using clustering algorithms. In particular, Biemann's part-of-speech tagger (2006c) is based on the idea of word co-occurrence. First, a bipartite graph of words that appear next to one another is built, followed by the calculation of the second power of that graph's connectivity matrix, thereby connecting words that appear in the same context. Hence, words like red, green, and blue appear in the same cluster because they are distributionally similar (Lee 1997).
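The matrix-squaring step can be illustrated on a toy example. The sketch below is not Biemann's tagger; it only shows why the second power of an adjacency matrix links words that share a neighbor. The word list and the adjacent-word pairs are invented for the illustration.

```python
def matrix_square(m):
    """Return m * m for a square matrix given as a list of lists."""
    n = len(m)
    return [
        [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical vocabulary and observed adjacent-word pairs.
words = ["red", "green", "blue", "car"]
pairs = [("red", "car"), ("green", "car"), ("blue", "car")]

index = {w: i for i, w in enumerate(words)}
adjacency = [[0] * len(words) for _ in words]
for left, right in pairs:
    adjacency[index[left]][index[right]] = 1
    adjacency[index[right]][index[left]] = 1

squared = matrix_square(adjacency)

# Entry (i, j) of the squared matrix counts the neighbors shared by words i and j.
print(squared[index["red"]][index["green"]])
```

In the original matrix "red" and "green" are not connected, but in the squared matrix they are, because both appear next to "car". Words with many shared contexts get large entries and are therefore candidates for the same cluster.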

Frontmatter
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp i-iv
- Chapter

Index
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 191-192
- Chapter

Contents
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp v-viii
- Chapter

6 - Text Clustering
from Part III - Graph-Based Information Retrieval
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 106-120
- Chapter
Summary

This chapter addresses text clustering using a variety of graph-based algorithms, including the Fiedler method, the Kernighan–Lin method, text categorization with min-cuts, betweenness, and random walks.
Text clustering is a technique that is frequently used to organize collections of documents, and it is often used in conjunction with other tasks such as information retrieval and topic identification. The goal of text clustering is to partition a given set of documents into meaningful classes, without assuming a predefined list of categories. For example, given a collection of documents that contain the word apple, a text categorizer would divide the documents into two major clusters, one pertaining to computers (related to the meaning of Apple Macintosh) and the other pertaining to food and agriculture (related to the meaning of apple fruits and trees). A sample set of documents and the resulting clusters are illustrated in Figure 6.1.
Clustering can be either flat, in which all clusters are at the same level, or hierarchical, in which the clusters are organized into a hierarchy, which is often represented as a tree structure, or a dendrogram. Assuming the example in Figure 6.1, a hierarchical clustering algorithm would further divide the cluster on food and agriculture into two smaller clusters, one pertaining to apple trees and the other to apple fruits. The corresponding dendrogram is shown in Figure 6.2.
The quality of an automatically generated set of clusters is usually measured using two metrics considered standard in clustering evaluation – namely, purity and entropy (Zhao and Karypis 2001), which are computed for the automatically generated sense groupings relative to the “gold-standard” clusters (i.e., clusters that are considered correct for the purpose of evaluation).
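Both metrics are easy to compute once each clustered document carries its gold-standard label. The sketch below follows one common formulation, in which purity and entropy are averaged over clusters weighted by cluster size and entropy is normalized by the logarithm of the number of classes; the excerpt does not spell out the formulas, so treat the exact normalization as an assumption. The function name and the sample labels are hypothetical.

```python
import math
from collections import Counter


def purity_and_entropy(clusters, num_classes):
    """Size-weighted purity and normalized entropy of a clustering.

    `clusters` is a list of non-empty lists holding the gold-standard
    label of each document placed in that cluster.
    """
    total = sum(len(labels) for labels in clusters)
    purity = 0.0
    entropy = 0.0
    for labels in clusters:
        counts = Counter(labels)
        size = len(labels)
        # Purity: share of documents covered by each cluster's majority label.
        purity += max(counts.values()) / total
        # Entropy of the label distribution inside this cluster.
        cluster_entropy = -sum(
            (c / size) * math.log(c / size) for c in counts.values()
        )
        if num_classes > 1:
            cluster_entropy /= math.log(num_classes)  # assumed normalization
        entropy += (size / total) * cluster_entropy
    return purity, entropy


# Made-up clustering of eight documents containing the word "apple".
clusters = [
    ["computers", "computers", "computers", "food"],
    ["food", "food", "food", "food"],
]
purity, entropy = purity_and_entropy(clusters, num_classes=2)
print(purity, entropy)
```

Higher purity and lower entropy indicate clusters that agree more closely with the gold standard. Here the second cluster is perfectly pure, so all of the entropy comes from the one misplaced document in the first cluster.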

Part II - Networks
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 51-52
- Chapter

Part III - Graph-Based Information Retrieval
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 89-90
- Chapter

1 - Notations, Properties, and Representations
from Part I - Introduction to Graph Theory
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 11-19
- Chapter
Summary

This chapter introduces the most commonly used graph notations that are referenced throughout the book. The chapter also covers the most important properties of graphs: diameter; clustering coefficient; degree distribution and average shortest path; and graph types and classifications, including random graphs, regular graphs, small-world graphs, and scale-free graphs. Finally, the chapter describes the representations that can be used in practical implementations, including adjacency lists and matrices.
Graph Terminology and Notations
A graph is a data structure consisting of a set of vertices connected by a set of edges that can be used to model relationships among the objects in a collection. Graphs are typically studied in the field of graph theory, which is an area of study in mathematical research.
Figure 1.1 shows the seven bridges in Königsberg that led to the invention of graph theory. The problem is to cross all bridges without crossing any bridge twice. Euler showed in 1741 that it was impossible to do so (Euler 1741).
Formally, a graph is defined as a set G = (V, E), where V is a collection of vertices V = {Vi, i = 1, n} and E is a collection of edges over V, Eij = {(Vi, Vj), Vi ∈ V, Vj ∈ V}. In alternative terminology, graphs also are referred to as networks, vertices as nodes, and edges as links.
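The two representations mentioned in the chapter summary, adjacency lists and adjacency matrices, can both be built directly from the sets V and E. The following is a minimal sketch for an undirected graph; the four vertices and three edges are invented for the illustration.

```python
def adjacency_list(vertices, edges):
    """Map each vertex to the list of its neighbors (undirected graph)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def adjacency_matrix(vertices, edges):
    """Return an n x n 0/1 matrix; rows and columns follow `vertices` order."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


# Hypothetical graph G = (V, E).
vertices = ["V1", "V2", "V3", "V4"]
edges = [("V1", "V2"), ("V1", "V3"), ("V3", "V4")]

adj_list = adjacency_list(vertices, edges)
adj_matrix = adjacency_matrix(vertices, edges)

print(adj_list)
print(adj_matrix)
```

The list form stores only the edges that exist, which suits sparse graphs. The matrix form uses space quadratic in the number of vertices but answers "is there an edge between Vi and Vj?" with a single lookup.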

4 - Language Networks
from Part II - Networks
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 78-88
- Chapter

Part I - Introduction to Graph Theory
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 9-10
- Chapter

3 - Random Networks
from Part II - Networks
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 53-77
- Chapter
Summary

This chapter discusses random and naturally occurring networks. We look at their properties as well as some algorithms for computing those properties.
Networks and Graphs
In the literature, the terms network and graph often are used interchangeably. Here, we use network to indicate a representation of a natural relationship among objects (in the context of this book, mostly linguistic objects) and graph for relationships generated through an automatic process. Furthermore, networks have a more complex structure than some types of graphs, such as lattices (i.e., predictable degree) or random graphs (i.e., completely arbitrary edges).
Network theory is a relatively new field. Apart from classic papers that appeared in the literature for other domains of science (e.g., economics and information science), most work in this field has been published since 1998.
The field of network theory studies the behavior of networks (mostly naturally occurring networks), for example, among people, drugs, animals, and scientific papers – under specific assumptions. It is concerned with the measurement of certain topological aspects of networks such as their degree distribution (i.e., how many nodes are connected to a specific node), the distribution of their connected components, and their diameter (e.g., the maximal distance between any two arbitrarily chosen nodes). Some properties can be computed exactly, in closed form, whereas others are computed using sampling. One goal of network theory is to predict the extrinsic behavior of a network based on its measurable properties.
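Two of the topological measurements named above, the degree distribution and the diameter, can be computed exactly on a small network. The sketch below is an illustration only: the sample network is made up, the diameter routine assumes the network is connected and unweighted, and it uses one breadth-first search per node, which is practical only for small graphs.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Map each degree value to the number of nodes that have it."""
    return dict(Counter(len(neighbors) for neighbors in adj.values()))


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adj):
    """Largest shortest-path distance between any two nodes (connected graph assumed)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


# Hypothetical undirected network as adjacency lists.
network = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c"],
}

distribution = degree_distribution(network)
network_diameter = diameter(network)
print(distribution, network_diameter)
```

For large networks the all-pairs search becomes too expensive, which is one reason such properties are estimated by sampling source nodes instead of being computed exactly.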

5 - Link Analysis for the World Wide Web
from Part III - Graph-Based Information Retrieval
Rada Mihalcea, University of North Texas, Dragomir Radev, University of Michigan, Ann Arbor
Book:

Graph-based Natural Language Processing and Information Retrieval

Published online:

01 June 2011

Print publication:

11 April 2011, pp 91-105
- Chapter
Summary

This chapter addresses link-analysis methods used by search engines, such as PageRank and HITS, and covers topics relevant to their application, including method stability, the combination of link- and content-based models, topic-sensitive ranking, and query-dependent link analysis.
The Web as a Graph
The Web – a common abbreviation for the World Wide Web – consists of billions of interlinked hypertext pages. These pages contain text, images, videos, or sounds and are usually viewed using Web browsers, such as Firefox or Internet Explorer. Users can navigate the Web by either directly typing the address of a Web page (i.e., the URL) inside a browser or following the links that connect Web pages among them.
The Web is a typical example of a graph, with Web pages corresponding to vertices in the graph and links between pages corresponding to directed edges. For instance, if the page http://www.unt.edu includes a link to the page http://www.cs.unt.edu and another to the page http://www.htsc.unt.edu, and the latter page in turn links to the page of the National Institutes of Health http://www.nih.gov and also back to the http://www.unt.edu page, it means that these four pages form a subgraph of four vertices with four edges, as illustrated in Figure 5.1.
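The four-page example can be written down as a directed graph to check the vertex and edge counts. This is a minimal sketch; the dictionary name and the in-degree computation are additions for illustration, while the URLs and links are those given in the text.

```python
# Each page maps to the list of pages it links to (directed edges).
links = {
    "http://www.unt.edu": ["http://www.cs.unt.edu", "http://www.htsc.unt.edu"],
    "http://www.cs.unt.edu": [],
    "http://www.htsc.unt.edu": ["http://www.nih.gov", "http://www.unt.edu"],
    "http://www.nih.gov": [],
}

num_vertices = len(links)
num_edges = sum(len(targets) for targets in links.values())

# In-degree: how many pages in the subgraph link to each page.
in_degree = {page: 0 for page in links}
for targets in links.values():
    for target in targets:
        in_degree[target] += 1

print(num_vertices, num_edges)
print(in_degree)
```

Because the edges are directed, the link from http://www.unt.edu to http://www.htsc.unt.edu and the link back are two separate edges. Link-analysis methods rely on exactly this kind of incoming-link information.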
Although the size of the Web is generally considered to be unknown, there are various estimates concerning the size of the indexed Web – that is, the subset of the Web that is covered by search engines.
