A PROBABILISTIC ℓ1 METHOD FOR CLUSTERING HIGH-DIMENSIONAL DATA

Published online by Cambridge University Press:  05 April 2021

Tsvetan Asamov
Affiliation:
Department of Management Science & Information Systems, Rutgers Business School, 100 Rockafeller Road, Piscataway, NJ 08854, USA. E-mail: tsvetan.asamov@gmail.com
Adi Ben-Israel
Affiliation:
Department of Management Science & Information Systems, Rutgers Business School, 100 Rockafeller Road, Piscataway, NJ 08854, USA. E-mail: adi.benisrael@gmail.com

Abstract

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty: distances become unreliable in very high-dimensional spaces. We propose a probabilistic, distance-based, iterative method for clustering data in very high-dimensional space, using the ℓ1-metric, which is less sensitive to high dimensionality than the Euclidean distance. For K clusters in ℝ^n, the problem decomposes into K problems coupled by probabilities, and an iteration reduces to finding Kn weighted medians of points on a line. The complexity of the algorithm is linear in the dimension of the data space, and its performance was observed to improve significantly as the dimension increases.
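
The reduction to Kn weighted medians can be illustrated with a minimal sketch. Under the ℓ1-metric, the weighted sum of distances from a center to the data points separates by coordinate, so each of the n coordinates of each of the K centers can be found as a one-dimensional weighted median. The function names `weighted_median` and `update_centers`, and the form of the `weights` argument, are assumptions made for illustration; the abstract does not specify how the weights are derived from the cluster probabilities, so they are taken here as given non-negative numbers.

```python
from typing import List, Sequence


def weighted_median(points: Sequence[float], weights: Sequence[float]) -> float:
    """Return a minimizer c of sum_i weights[i] * |points[i] - c|.

    Assumes non-negative weights with a positive total (illustrative helper).
    """
    if not points or len(points) != len(weights):
        raise ValueError("points and weights must be non-empty and equal in length")
    pairs = sorted(zip(points, weights))  # order the points along the line
    half = sum(weights) / 2.0
    running = 0.0
    for x, w in pairs:
        running += w
        # The first point whose cumulative weight reaches half the total
        # is a (lower) weighted median.
        if running >= half:
            return x
    return pairs[-1][0]


def update_centers(
    data: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
) -> List[List[float]]:
    """One center update: K * n one-dimensional weighted medians.

    data    -- N points, each with n coordinates
    weights -- K rows of N weights; weights[k][i] ties point i to cluster k
               (hypothetical input; how it is computed is not shown here)
    """
    n = len(data[0])
    return [
        [weighted_median([x[j] for x in data], w_k) for j in range(n)]
        for w_k in weights
    ]
```

For a fixed number of points and clusters, the work in `update_centers` grows linearly with the number of coordinates n, which matches the linear dependence on dimension stated in the abstract. This sketch sorts each coordinate, so its cost per median is O(N log N); it does not attempt to reproduce the authors' implementation.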

Information

Type
Research Article
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press
