Sanitized clustering against confounding bias

Title:
Sanitized clustering against confounding bias
Journal Title:
Machine Learning
Publication Date:
27 December 2023
Citation:
Yao, Y., Pan, Y., Li, J., Tsang, I. W., & Yao, X. (2023). Sanitized clustering against confounding bias. Machine Learning. https://doi.org/10.1007/s10994-023-06451-5
Abstract:
Real-world datasets inevitably contain biases that arise from different sources or conditions during data collection. Consequently, such inconsistency itself acts as a confounding factor that disturbs the cluster analysis. Existing methods eliminate the biases by projecting data onto the orthogonal complement of the subspace spanned by the confounding factor before clustering. Therein, the clustering factor of interest and the confounding factor are coarsely considered in the raw feature space, where the correlation between the data and the confounding factor is assumed to be linear for the sake of convenient solutions. These approaches are thus limited in scope, as data in real applications are usually complex and non-linearly correlated with the confounding factor. This paper presents a new clustering framework named Sanitized Clustering Against confounding Bias (SCAB), which removes the confounding factor in the semantic latent space of complex data through a non-linear dependence measure. Specifically, we eliminate the bias information in the latent space by minimizing the mutual information between the confounding factor and the latent representation produced by a variational auto-encoder. Meanwhile, a clustering module is introduced to cluster over the purified latent representations. Extensive experiments on complex datasets demonstrate that SCAB achieves a significant gain in clustering performance by removing the confounding bias.
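The linear baseline that the abstract attributes to existing methods can be illustrated with a short NumPy sketch. It is not the SCAB method and is not taken from the paper: it only shows what projecting data onto the orthogonal complement of the confounder subspace means in the raw feature space. The function name `project_out_confounder`, the one-hot encoding of the data source, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def project_out_confounder(X, C):
    """Remove the part of X that is linearly explained by C.

    X: (n_samples, n_features) data matrix.
    C: (n_samples, n_confounders) confounder matrix
       (assumed here to be one-hot data-source labels).
    Returns X projected onto the orthogonal complement of the
    column space of C.
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    # Least-squares coefficients B minimizing ||C @ B - X||_F.
    B, *_ = np.linalg.lstsq(C, X, rcond=None)
    # Subtracting the fitted part leaves the residual, which is
    # orthogonal to every column of C.
    return X - C @ B


# Hypothetical toy data: two collection sources, the second one
# adds a constant offset to every feature.
rng = np.random.default_rng(0)
n = 200
source = rng.integers(0, 2, size=n)
C = np.eye(2)[source]                      # one-hot confounder
signal = rng.normal(size=(n, 3))
shift = np.array([5.0, -3.0, 2.0])
X = signal + source[:, None] * shift       # biased observations

X_clean = project_out_confounder(X, C)
```

With a one-hot confounder this projection reduces to centering each source group at zero, which removes only linear dependence. That limitation is what motivates the abstract's move to a non-linear dependence measure in a latent space.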
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the A*STAR - AI Singapore Material Design Grand Challenge
Grant Reference no. : AISG2-GC-2023-010

This research / project is supported by the A*STAR - Career Development Fund
Grant Reference no. : C222812019

This research / project is supported by the A*STAR - Career Development Award/Fund (A*I)
Grant Reference no. : 232D800027

This research is also supported by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386) and in part by the Program for Guangdong Provincial Key Laboratory (Grant No. 2020B121201001).
Description:
This version of the article has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s10994-023-06451-5
ISSN:
0885-6125
1573-0565