Disentangling Voice and Content with Self-Supervision for Speaker Recognition

Title:
Disentangling Voice and Content with Self-Supervision for Speaker Recognition
Journal Title:
Thirty-seventh Conference on Neural Information Processing Systems
DOI:
Keywords:
Publication Date:
22 September 2023
Citation:
Liu, Tianchi, et al. "Disentangling Voice and Content with Self-Supervision for Speaker Recognition." Advances in Neural Information Processing Systems 36 (2024).
Abstract:
For speaker recognition, it is difficult to extract an accurate speaker representation from speech because speech entangles speaker traits with linguistic content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with three Gaussian inference layers, each consisting of a learnable transition model that extracts a distinct speech component. Notably, a strengthened transition model is specifically designed to capture complex speech dynamics. We also propose a self-supervision method to dynamically disentangle content without any labels other than speaker identities. The efficacy of the proposed framework is validated via experiments on the VoxCeleb and SITW datasets, with 9.56% and 8.24% average reductions in EER and minDCF, respectively. Since it requires neither additional model training nor extra data, it is readily applicable in practice.
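The abstract's "Gaussian inference layer with a learnable transition model" can be illustrated with a toy Kalman-style filter: a transition matrix propagates a latent mean across frames, and each frame's observation updates it. This is a minimal sketch of that general idea only, assuming a linear-Gaussian model with diagonal covariance; it is not the paper's actual architecture, and the class name, noise parameters, and initialization are illustrative.

```python
import numpy as np

class GaussianInferenceLayer:
    """Toy linear-Gaussian inference layer (illustrative sketch only).

    A learnable transition matrix A propagates the latent mean between
    frames, and a Kalman-style update incorporates each observation.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Transition model: near-identity matrix (would be learned in practice).
        self.A = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
        self.q = 0.1   # assumed process-noise variance
        self.r = 1.0   # assumed observation-noise variance

    def infer(self, frames):
        """Filter over frames of shape (T, dim); return per-frame posterior means."""
        dim = frames.shape[1]
        mu = np.zeros(dim)        # latent mean
        P = np.ones(dim)          # diagonal latent variance
        means = []
        for x in frames:
            mu = self.A @ mu                  # predict via transition model
            P = P + self.q
            k = P / (P + self.r)              # Kalman gain (diagonal)
            mu = mu + k * (x - mu)            # update toward observation
            P = (1.0 - k) * P
            means.append(mu.copy())
        return np.stack(means)
```

In such a filter the gain shrinks as the posterior variance shrinks, so later frames perturb the estimate less; the paper's framework stacks three such inference layers so each extracts a distinct speech component.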
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - Council Research Fund
Grant Reference no. : CR-2021-005

This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - Advanced Manufacturing and Engineering (AME) Programmatic Fund
Grant Reference no. : A18A2b0046
Description:
ISBN:
unknown
Files uploaded: