A Unified Framework for Representation-based Subspace Clustering of Out-of-sample and Large-scale Data

Title:
A Unified Framework for Representation-based Subspace Clustering of Out-of-sample and Large-scale Data
Other Titles:
IEEE Transactions on Neural Networks and Learning Systems
Publication Date:
29 October 2015
Citation:
X. Peng; H. Tang; L. Zhang; Z. Yi; S. Xiao, "A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data," in IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1-14, doi: 10.1109/TNNLS.2015.2490080
Abstract:
Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph that describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and ℓ₂-norm-based representations, and have achieved state-of-the-art performance. However, these methods suffer from two limitations. First, their time complexities are at least proportional to the cube of the data size, which makes them inefficient for large-scale problems. Second, they cannot cope with out-of-sample data, i.e., data that are not used to construct the similarity graph. To cluster each out-of-sample datum, these methods have to recalculate the similarity graph and the cluster memberships of the whole data set. In this paper, we propose a unified framework that makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem in the manner of sampling, clustering, coding, and classifying. Furthermore, we give an estimate of the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering large-scale data sets.
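The four stages named in the abstract (sampling, clustering, coding, classifying) can be illustrated roughly as follows. This is a minimal sketch under our own assumptions, not the authors' algorithm: it uses ridge-regression (ℓ₂-norm) coding to build the in-sample similarity graph, a normalized spectral clustering with a small hand-written k-means, and assigns each out-of-sample point to the cluster whose in-sample points reconstruct it with the smallest residual. All function names and parameters (`lam`, `p`, `seed`) are hypothetical. The point of the structure is that the cubic-cost eigendecomposition only touches the p sampled points, while the remaining points are handled by coding and classification.

```python
import numpy as np

def ridge_codes(D, Y, lam):
    # l2-norm (ridge) codes of the columns of Y over dictionary D:
    # C = (D^T D + lam*I)^{-1} D^T Y
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ Y)

def in_sample_graph(X, lam):
    # Code each in-sample point over all *other* in-sample points,
    # then symmetrize |C| into a similarity graph W.
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        C[others, i] = ridge_codes(X[:, others], X[:, [i]], lam).ravel()
    return np.abs(C) + np.abs(C).T

def kmeans(Y, k, iters=100):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(W, k):
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}; embed with the
    # k eigenvectors of smallest eigenvalue, row-normalize, run k-means.
    d = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(d)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    Y = vecs[:, :k]
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k)

def sample_cluster_code_classify(X, k, p, lam=1e-2, seed=0):
    # X holds one data point per column.
    n = X.shape[1]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=p, replace=False)            # 1. sampling
    rest = np.setdiff1d(np.arange(n), idx)
    Xin, Xout = X[:, idx], X[:, rest]
    in_labels = spectral_clustering(in_sample_graph(Xin, lam), k)  # 2. clustering
    C = ridge_codes(Xin, Xout, lam)                        # 3. coding
    # 4. classifying: residual of each out-of-sample point when it is
    # reconstructed only from the in-sample points of cluster j.
    res = np.stack([
        np.linalg.norm(Xout - Xin[:, in_labels == j] @ C[in_labels == j], axis=0)
        for j in range(k)
    ])
    labels = np.empty(n, dtype=int)
    labels[idx] = in_labels
    labels[rest] = np.argmin(res, axis=0)
    return labels
```

A usage example under the same assumptions: draw points from two random 2-dimensional subspaces of R^10 (noise-free, unit-norm columns), cluster with p = 40 sampled points, and compare the labels with the ground truth up to a permutation of cluster ids.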
License type:
PublisherCopyrights
Description:
(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
ISSN:
2162-237X
Files uploaded:

tnnls-2015-p-4406.pdf (PDF, 704.86 KB)