
A new approach for textual feature selection based on N-composite isolated labels

Published online by Cambridge University Press:  29 April 2019

Samir Elloumi*
Affiliation:
University of Tunis El Manar, Faculty of Sciences of Tunis, Computer Science Department, Tunis, Tunisia
*Corresponding author. Email: samir.elloumi@fst.utm.tn

Abstract

Textual Feature Selection (TFS) aims to extract the parts or segments of a text that are most relevant with respect to the information it expresses. The selected features are useful for automatic indexing, summarization, document categorization, knowledge discovery, and so on. Given the huge amount of electronic textual data published daily, TFS raises many challenges related both to semantics and to processing efficiency. In this paper, we propose a new approach for TFS grounded in Formal Concept Analysis (FCA). Specifically, we extract textual features by exploring the regularities of a formal context in which isolated points exist. We introduce the notion of N-composite isolated points: a set of N words considered as a single textual feature. We show that a small value of N (between 1 and 3) suffices to extract significant textual features compared with existing approaches, even when the selected features do not completely cover the initial formal context.
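The abstract does not give the formal definition of an isolated point, so the following Python sketch is only an illustration under a stated assumption, not the author's method. It builds a formal context whose objects are texts and whose attributes are the words they contain, and it assumes that a pair (text g, word set B) is "isolated" when it lies in exactly one formal concept. In standard FCA terms that holds exactly when B' x g' is contained in the incidence relation, i.e. every text containing all of B also contains every word of g. An N-composite feature is then taken to be a minimal isolated word set of size at most N (N = 3 by default, following the 1 to 3 range above). All function names (`build_context`, `extent`, `is_isolated`, `n_composite_features`) are hypothetical.

```python
from itertools import combinations


def build_context(docs):
    """Formal context: object i is docs[i], its attributes are its lowercase words."""
    return {i: set(text.lower().split()) for i, text in enumerate(docs)}


def extent(context, words):
    """Derivation B': the objects whose attribute set contains every word in `words`."""
    return {g for g, attrs in context.items() if words <= attrs}


def is_isolated(context, g, words):
    """ASSUMED criterion: (g, words) lies in exactly one formal concept.

    Equivalent to B' x g' being inside the incidence relation: every object
    that has all of `words` must also have every attribute of object g.
    """
    g_attrs = context[g]
    if not words <= g_attrs:
        return False  # (g, words) is not even in the relation
    return all(g_attrs <= context[h] for h in extent(context, words))


def n_composite_features(context, n_max=3):
    """Collect minimal isolated word sets of size 1..n_max (hypothetical N-composites)."""
    features = set()
    for g, attrs in context.items():
        found = []  # minimal isolated word sets already found for object g
        for n in range(1, n_max + 1):
            for combo in combinations(sorted(attrs), n):
                words = frozenset(combo)
                if any(f <= words for f in found):
                    continue  # a smaller isolated subset already covers this set
                if is_isolated(context, g, words):
                    found.append(words)
        features.update(found)
    return features


if __name__ == "__main__":
    ctx = build_context(
        ["neural network", "neural translation", "network translation", "text mining"]
    )
    for feature in sorted(n_composite_features(ctx), key=sorted):
        print(sorted(feature))
```

On this toy corpus no single word of the first three texts is isolated, but each of their word pairs is, so the sketch returns three 2-composite features ({network, neural}, {neural, translation}, {network, translation}) alongside the 1-composite features {mining} and {text}. Keeping only minimal sets avoids reporting every superset of an isolated set, which would otherwise also satisfy the criterion.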

Information

Type
Article
Copyright
© Cambridge University Press 2019
