In multiple-choice exams, students select one answer from a set of (typically four) choices and can explain why they made that particular choice. Students are good at understanding natural language questions and, based on their domain knowledge, can easily infer the question’s answer by “connecting the dots” across various pertinent facts. Considering automated reasoning for elementary science question answering, we address the novel task of generating explanations for answers from human-authored facts. For this, we examine the practically scalable framework of feature-rich support vector machines leveraging domain-targeted, hand-crafted features. Explanations are created from a human-annotated set of nearly 5000 candidate facts in the WorldTree corpus. Our aim is to better match the valid facts that explain a question’s correct answer against the available fact candidates. To this end, our features offer a comprehensive linguistic and semantic unification paradigm. The machine learning problem is the preference ordering of facts, for which we test pointwise regression versus pairwise learning-to-rank. Our contributions, based on comprehensive evaluations against nine existing systems, are: (1) a case study in which the two preference-ordering approaches are systematically compared, and in which the pointwise approach is shown to outperform the pairwise approach, adding to the existing body of observations on this topic; (2) our system outperforms a highly effective TF-IDF-based IR technique by 3.5 and 4.9 points on the development and test sets, respectively, demonstrating that there is room for further improvement on the task (e.g., through an efficient learning algorithm or semantic features); (3) a practically competent approach that can outperform some variants of BERT-based reranking models; and (4) human-engineered features that make it an interpretable machine learning model for the task.
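To make the contrast concrete, the following is a minimal sketch, not the authors' implementation, of the two preference-ordering formulations over hand-crafted (question, fact) feature vectors; the random feature matrix, relevance labels, and scikit-learn model choices are illustrative assumptions.

```python
# Sketch of pointwise vs. pairwise preference ordering of candidate facts;
# features and relevance labels are random stand-ins for the paper's
# hand-crafted features and WorldTree annotations.
import numpy as np
from sklearn.svm import LinearSVR, LinearSVC

rng = np.random.default_rng(0)
X = rng.random((100, 50))   # one feature vector per (question, fact) candidate
y = rng.random(100)         # graded relevance of each candidate fact

# Pointwise: regress a relevance score per fact, then sort facts by score.
pointwise = LinearSVR().fit(X, y)
ranking = np.argsort(-pointwise.predict(X))

# Pairwise (RankSVM-style): classify feature *differences* so the model
# learns which of two facts should be preferred; in practice pairs would be
# formed only among candidates of the same question.
i, j = np.triu_indices(len(X), k=1)
keep = y[i] != y[j]
X_pair = X[i][keep] - X[j][keep]
y_pair = np.sign(y[i][keep] - y[j][keep])
pairwise = LinearSVC().fit(X_pair, y_pair)
ranking_pair = np.argsort(-(X @ pairwise.coef_.ravel()))
```

Ranking by the projection onto the pairwise model's weight vector is the standard way to turn a pairwise classifier back into a per-fact scorer.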
A new framework for improving event detection is proposed that exploits joint information in news media content and social networks, such as Twitter, to leverage the detailed coverage of news media and the timeliness of social media. Specifically, a short text clustering method is used to detect events from tweets; the language model representations of the detected events are then expanded using a second set of events obtained from news articles published in the same period. The expanded event representations serve as a new initialization of the clustering method, which is run for another iteration, thereby enhancing the event detection results. The proposed framework is evaluated using two datasets: a tweet dataset with event labels and a news dataset containing news articles published during the same time interval as the tweets. Experimental results show that the proposed framework improves event detection in terms of F1 measure compared with results obtained from tweets alone.
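The loop below is a minimal sketch of this iterative scheme, substituting TF-IDF vectors and k-means for the paper's short text clustering method and language-model representations; all data and cluster counts are toy assumptions.

```python
# Sketch: cluster tweets, expand each tweet-event representation with the most
# similar news-event representation, then re-cluster from the expanded centres.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

tweets = ["quake shakes the coast", "coastal earthquake felt downtown",
          "new phone unveiled", "phone launch event starts"]
news = ["Strong earthquake strikes the coastal region",
        "Tech giant unveils its new phone at a launch event"]

vec = TfidfVectorizer().fit(tweets + news)
T, N = vec.transform(tweets), vec.transform(news)

tweet_events = KMeans(n_clusters=2, n_init=10, random_state=0).fit(T)
news_events = KMeans(n_clusters=2, n_init=10, random_state=0).fit(N)

# Expand each tweet-event centre with its closest simultaneous news event.
sim = cosine_similarity(tweet_events.cluster_centers_, news_events.cluster_centers_)
expanded = 0.5 * tweet_events.cluster_centers_ \
         + 0.5 * news_events.cluster_centers_[sim.argmax(axis=1)]

# Second iteration: re-cluster the tweets starting from the expanded centres.
refined = KMeans(n_clusters=2, init=expanded, n_init=1).fit(T)
```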
Every day we interact with machine learning systems offering individualized predictions for our entertainment, social connections, purchases, or health. These involve several modalities of data, from sequences of clicks to text, images, and social interactions. This book introduces common principles and methods that underpin the design of personalized predictive models for a variety of settings and modalities. The book begins by revisiting 'traditional' machine learning models, focusing on adapting them to settings involving user data; it then presents techniques based on advanced principles such as matrix factorization, deep learning, and generative modeling, and concludes with a detailed study of the consequences and risks of deploying personalized predictive systems. A series of case studies in domains ranging from e-commerce to health, plus hands-on projects and code examples, will give readers understanding of and experience with large-scale real-world datasets and the ability to design models and systems for a wide range of applications.
Learning idiomatic expressions is seen as one of the most challenging stages in second-language learning because of their unpredictable meaning. A similar situation holds for their identification within natural language processing applications such as machine translation and parsing. The lack of high-quality usage samples exacerbates this challenge not only for humans but also for artificial intelligence systems. This article introduces a gamified crowdsourcing approach to collecting language learning materials for idiomatic expressions; a messaging bot is designed as an asynchronous multiplayer game for native speakers, who compete with each other while providing idiomatic and nonidiomatic usage examples and rating other players’ entries. As opposed to classical crowd-processing annotation efforts in the field, for the first time in the literature, a crowd-creating & crowd-rating approach is implemented and tested for idiom corpora construction. The approach is language-independent and is evaluated on two languages against traditional data preparation techniques in the field. The reaction of the crowd is monitored under different motivational means (namely, gamification affordances and monetary rewards). The results reveal that the proposed approach is effective at collecting the targeted materials and that, although it is an explicit crowdsourcing approach, the crowd found it entertaining and useful. The approach shows the potential to speed up the construction of idiom corpora for different natural languages, which can serve as second-language learning material, training data for supervised idiom identification systems, or samples for lexicographic studies.
Causation in written natural language can express a strong relationship between events and facts. Causation in the written form can be referred to as a causal relation, where a cause event entails the occurrence of an effect event. A cause and effect relationship is stronger than a correlation between events, and therefore aggregated causal relations extracted from large corpora can be used in numerous applications, such as question-answering and summarisation, to produce superior results to those of traditional approaches. Techniques like logical consequence allow causal relations to be used in niche practical applications such as event prediction, which is useful for diverse domains such as security and finance. Until recently, the use of causal relations was a relatively unpopular technique because causal relation extraction techniques were problematic, and the relations returned were incomplete, error-prone, or simplistic. The recent adoption of language models and improved relation extractors for natural language, such as Transformer-XL (Dai et al. (2019). Transformer-XL: attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860), has led to a surge of research interest in the possibilities of using causal relations in practical applications. Until now, there has not been an extensive survey of the practical applications of causal relations; this survey is therefore intended precisely to demonstrate their potential. It is a comprehensive survey of the work on the extraction of causal relations and their applications, while also discussing the nature of causation and its representation in text.
We propose an integrated deep learning model for morphological segmentation, morpheme tagging, part-of-speech (POS) tagging, and syntactic parsing into dependencies, using cross-level contextual information flow for every word, from segments to dependencies, with an attention mechanism in the horizontal flow. Our model extends the work of Nguyen and Verspoor ((2018). Proceedings of the CoNLL Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. The Association for Computational Linguistics, pp. 81–91.) on joint POS tagging and dependency parsing to also include morphological segmentation and morphological tagging. We report our results on several languages. Our primary focus is agglutination in morphology, in particular Turkish morphology, for which we demonstrate improved performance compared to models trained for the individual tasks. As one of the earlier efforts in jointly modeling morphology and syntax along with dependencies, we also discuss prospective guidelines for future comparison.
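As a rough illustration of cross-level flow, rather than the authors' architecture, the PyTorch sketch below passes segment-level encodings upward: morpheme-tag scores feed the POS layer, and POS scores feed the dependency arc scorer. Every dimension, layer choice, and name here is a toy assumption.

```python
# Toy sketch of cross-level information flow: characters -> morpheme tags ->
# POS tags -> dependency arc scores, with each level consuming the previous
# level's (softmaxed) predictions alongside the shared word encoding.
import torch
import torch.nn as nn

class JointSketch(nn.Module):
    def __init__(self, n_chars=100, n_morph=20, n_pos=17, dim=32):
        super().__init__()
        self.chars = nn.Embedding(n_chars, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.morph_head = nn.Linear(2 * dim, n_morph)
        self.pos_head = nn.Linear(2 * dim + n_morph, n_pos)
        self.arc_scorer = nn.Bilinear(2 * dim + n_pos, 2 * dim + n_pos, 1)

    def forward(self, char_ids):                    # (n_words, chars_per_word)
        h, _ = self.encoder(self.chars(char_ids))   # segment-level encodings
        w = h.mean(dim=1)                           # one vector per word
        morph = self.morph_head(w)                  # morpheme-tag scores
        pos = self.pos_head(torch.cat([w, morph.softmax(-1)], dim=-1))
        z = torch.cat([w, pos.softmax(-1)], dim=-1)
        n = z.size(0)
        heads = z.unsqueeze(0).expand(n, n, -1).reshape(n * n, -1)
        deps = z.unsqueeze(1).expand(n, n, -1).reshape(n * n, -1)
        arcs = self.arc_scorer(deps, heads).view(n, n)  # arcs[i, j]: score of word j heading word i
        return morph, pos, arcs

morph, pos, arcs = JointSketch()(torch.randint(0, 100, (5, 8)))  # 5 words, 8 chars each
```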
Authorship attribution – the computational task of identifying the author of a given text document within a set of possible candidates – has been attracting interest in Natural Language Processing research for many years. At the same time, significant advances have also been observed in the related field of author profiling, that is, the computational task of learning author demographics from text, such as gender, age, and others. The close relation between the two topics – both of which focus on gaining knowledge about the individual who wrote a piece of text – suggests that research in these fields may benefit from each other. To illustrate this, this work addresses the issue of author identification with the aid of author profiling methods, adding demographics predictions to an authorship attribution architecture that may be particularly suitable for extensions of this kind, namely, a stack of classifiers devoted to different aspects of the input text (words, characters, and text distortion patterns). The enriched model is evaluated across a range of text domains, languages, and author profiling estimators, and its results are shown to compare favourably to those obtained by a standard authorship attribution method that does not have access to author demographics predictions.
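The fragment below is a minimal sketch of the stacking idea, not the paper's system: base classifiers over word and character views of the text feed a meta-classifier, to which author-profiling predictions would be appended as extra features (omitted here, along with the text distortion view). The scikit-learn component choices are assumptions.

```python
# Sketch of a classifier stack over different views of the input text.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier

word_view = make_pipeline(TfidfVectorizer(analyzer="word"), LinearSVC())
char_view = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
                          LinearSVC())
stack = StackingClassifier(
    estimators=[("words", word_view), ("chars", char_view)],
    final_estimator=LogisticRegression(),  # consumes the base decision scores
)
# stack.fit(documents, author_labels) trains the meta-model on cross-validated
# base predictions; demographics scores would be concatenated at that level.
```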
This study describes a Natural Language Processing (NLP) toolkit, as the first contribution of a larger project, for an under-resourced language—Urdu. Standard NLP toolkits have previously been developed for English and many other languages, but despite Urdu being widely spoken in different parts of the world, with a large amount of digital text readily available, there is still a dire need for standard Urdu text processing tools and methods. This study presents the first version of the UNLT (Urdu Natural Language Toolkit), which contains three key text processing tools required for an Urdu NLP pipeline: a word tokenizer, a sentence tokenizer, and a part-of-speech (POS) tagger. The UNLT word tokenizer employs a morpheme matching algorithm coupled with a state-of-the-art stochastic n-gram language model with back-off and smoothing characteristics to address the space omission problem, while the space insertion problem for compound words is tackled using a dictionary look-up technique. The UNLT sentence tokenizer combines machine learning, rule-based, regular-expression, and dictionary look-up techniques. Finally, the UNLT POS taggers are based on Hidden Markov Model and Maximum Entropy stochastic techniques. In addition, we have developed large gold standard training and testing data sets to improve and evaluate the performance of new techniques for Urdu word tokenization, sentence tokenization, and POS tagging, and we compare the proposed approaches with several existing methods. The UNLT, the training and testing data sets, and supporting resources are all free and publicly available for academic use.
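As a toy illustration of the space-omission idea (not the UNLT implementation, which targets Urdu and uses a back-off n-gram model), the sketch below recovers word boundaries from unspaced text by dynamic programming over a hypothetical unigram lexicon.

```python
# Sketch: segment unspaced text by maximising the product of word
# probabilities under a tiny, hypothetical unigram language model.
import math

lexicon = {"this": 0.2, "is": 0.3, "a": 0.2, "test": 0.3}  # toy unigram model

def segment(text):
    # best[i] = (log-prob, split point) of the best segmentation of text[:i]
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - 12), i):          # cap word length at 12
            word = text[j:i]
            if word in lexicon and best[j][0] > -math.inf:
                score = best[j][0] + math.log(lexicon[word])
                if score > best[i][0]:
                    best[i] = (score, j)
    words, i = [], len(text)                        # backtrack over splits
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return list(reversed(words))

print(segment("thisisatest"))   # ['this', 'is', 'a', 'test']
```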
Machine Reading Comprehension (MRC) is a challenging task and a hot topic in Natural Language Processing. The goal of this field is to develop systems for answering questions about a given context. In this paper, we present a comprehensive survey of diverse aspects of MRC systems, including their approaches, structures, input/outputs, and research novelties. We illustrate the recent trends in this field based on a review of 241 papers published during 2016–2020. Our investigation demonstrates that the focus of research has shifted in recent years from answer extraction to answer generation, from single- to multi-document reading comprehension, and from learning from scratch to using pre-trained word vectors. Moreover, we discuss the popular datasets and evaluation metrics in this field. The paper ends with an investigation of the most-cited papers and their contributions.
Property inference involves predicting properties for a word from its distributional representation. We focus on human-generated resources that link words to their properties and on the task of predicting these properties for unseen words. We introduce the use of label propagation, a semi-supervised machine learning approach, for this task and, in the first systematic study of models for property inference, find that label propagation achieves state-of-the-art results. To broaden the range of properties tested, we also introduce two new property datasets.
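A minimal sketch of the idea, under assumed inputs: a handful of words with random stand-in embeddings, a single hypothetical binary property, and scikit-learn's LabelPropagation in place of the paper's exact configuration.

```python
# Sketch: propagate a known binary property ("is_a_tool", hypothetical) from
# labelled words to unseen ones through a k-NN graph over word vectors.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

words = ["hammer", "wrench", "apple", "pear", "pliers", "mango"]
vectors = np.random.default_rng(0).random((6, 50))  # stand-ins for real embeddings
labels = np.array([1, 1, 0, 0, -1, -1])             # -1 marks unseen words

model = LabelPropagation(kernel="knn", n_neighbors=3).fit(vectors, labels)
for word, label in zip(words, model.transduction_):
    print(word, "is_a_tool" if label == 1 else "not_a_tool")
```

With real distributional vectors, neighbouring words tend to share properties, which is exactly the graph structure label propagation exploits.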
Funding for AI start-ups in general is booming, and natural language processing as a subfield has not missed out. We take a closer look at early-stage funding over the last year—just over US$1B in total—for companies that offer solutions that are based on or make significant use of NLP, providing a picture of what funders think is innovative and bankable in this space, and we make some observations on notable trends and developments.