CoIn: Correlation Induced Clustering for Cognition of High Dimensional Bioinformatics Data

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/18650

Title:

CoIn: Correlation Induced Clustering for Cognition of High Dimensional Bioinformatics Data

Journal Title:

IEEE Journal of Biomedical and Health Informatics

DOI:

10.1109/JBHI.2022.3179265

Publication URL:

http://dx.doi.org/10.1109/jbhi.2022.3179265

Authors:

Zeng Zeng, Ziyuan Zhao, Kaixin Xu, Yangfan Li, Cen Chen, Xiaofeng Zou, Yulan Wang, Wei Wei, Pierce KH Chow, Xiaoli Li

Keywords:

Electrical and Electronic Engineering, Computer Science Applications, Health Informatics, Health Information Management

Publication Date:

20 June 2022

Citation:

Zeng, Z., Zhao, Z., Xu, K., Li, Y., Chen, C., Zou, X., Wang, Y., Wei, W., Chow, P. K., & Li, X. (2022). CoIn: Correlation Induced Clustering for Cognition of High Dimensional Bioinformatics Data. IEEE Journal of Biomedical and Health Informatics, 1–1. https://doi.org/10.1109/jbhi.2022.3179265

Abstract:

Analysis of high-dimensional biomedical data, such as microarray gene expression data and mass spectrometry images, is crucial for providing better medical services, including cancer subtyping and protein homology detection. Clustering is a fundamental cognitive task that aims to group unlabeled data into multiple clusters based on their intrinsic similarities. The K-means algorithm is one of the most widely used clustering heuristics; it groups data objects into clusters such that the sum of squared Euclidean distances within each cluster is minimized. Its conceptual simplicity and computational efficiency make it easy to apply to a wide range of data types. However, K-means treats all features as equally relevant, which degrades its performance when clustering high-dimensional data such as microarray gene expression data and mass spectrometry images, where many variables are redundant or correlated. In this paper, we propose a new correlation-induced clustering method, CoIn, to capture complex correlations among high-dimensional data and guarantee correlation consistency within each cluster. We evaluate the proposed method on a high-dimensional liver cancer tumor mass spectrometry dataset to explore metabolic differences across tissues and discover intra-tumor heterogeneity (ITH). Compared with the baselines, our method produces more explainable and understandable results for clinical analysis, which demonstrates that the proposed clustering paradigm has potential for knowledge discovery in high-dimensional bioinformatics data.
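
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of plain K-means (Lloyd's algorithm) in NumPy. It is not the CoIn method and is not taken from the paper; the function name `kmeans`, its parameters, and the toy data are assumptions made purely for illustration. It shows the objective mentioned above (the within-cluster sum of squared Euclidean distances) and makes visible that every feature contributes to the distance with equal weight.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative baseline, not CoIn).

    Returns (labels, centroids, inertia), where inertia is the
    within-cluster sum of squared Euclidean distances.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        # Each feature enters the sum with the same weight.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute assignments and the objective for the final centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centroids, inertia

# Toy data (hypothetical): two well-separated groups of three points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, inertia = kmeans(X, k=2)
```

Because the distance is an unweighted sum over all features, redundant or strongly correlated features can dominate the assignment step, which is the limitation the abstract describes for high-dimensional data.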

License type:

Publisher Copyright

Funding Info:

This research is supported by core funding from: Institute for Infocomm Research
Grant Reference no.: SC20/20-132910-CORE

Description:

© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

URI:

https://oar.a-star.edu.sg/communities-collections/articles/18650

ISSN:

2168-2208
2168-2194

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
jbhi-corcluster-tou-lyf-1.pdf	9.36 MB	PDF