Joint Feature Enhancement and Speaker Recognition with Multi-Objective Task-Oriented Network

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/17885

Title:

Joint Feature Enhancement and Speaker Recognition with Multi-Objective Task-Oriented Network

Journal Title:

Interspeech 2021

DOI:

10.21437/Interspeech.2021-1978

Publication URL:

https://doi.org/10.21437/interspeech.2021-1978

Authors:

Yibo Wu, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang

Keywords:

Voice Biometrics, Far-field speaker recognition, Neural speech enhancement

Publication Date:

30 August 2021

Citation:

Wu, Y., Wang, L., Lee, K. A., Liu, M., & Dang, J. (2021). Joint Feature Enhancement and Speaker Recognition with Multi-Objective Task-Oriented Network. Interspeech 2021. doi:10.21437/Interspeech.2021-1978

Abstract:

Recently, increasing attention has been paid to the joint training of upstream and downstream tasks, and to the challenge of synchronizing multiple loss functions in a multi-objective scenario. In this paper, to address the competing gradient directions between the speaker classification loss and the feature enhancement loss, we propose an asynchronous subregion optimization approach for the joint training of feature enhancement and speaker embedding neural networks. For the asynchronous subregion optimization, the squeeze-and-excitation (SE) method is introduced in the enhancement network to adaptively select important channels for speaker embedding. Furthermore, channel-wise feature concatenation is applied between the input feature and the enhanced feature to address the distortion of speaker information caused by the enhancement loss. Using the proposed joint training network with asynchronous subregion optimization and channel-wise feature concatenation, we obtained relative gains of 11.95% and 6.43% in equal error rate on a noisy version of VoxCeleb1 and the VOiCES corpus, respectively.
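
The two building blocks named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes or weights, so the code shows a generic squeeze-and-excitation block (global average pooling, a bottleneck layer, and a sigmoid gate per channel) and assumes that channel-wise concatenation means stacking the input feature and the enhanced feature as two channels. The function names `squeeze_excite` and `concat_channels`, the weight matrices `w1` and `w2`, and all shapes are hypothetical. The asynchronous subregion optimization itself is not sketched, because the abstract does not describe its procedure.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Generic SE block on C channels, each a 2-D list (freq x time).

    w1 has shape (C // r) x C and w2 has shape C x (C // r); both are
    hypothetical, untrained weights used only for illustration.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1
    ]
    # Excitation, step 2: linear layer + sigmoid gives a gate in (0, 1)
    # for every channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight each channel by its gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_maps)
    ]
    return scaled, gates


def concat_channels(input_feature, enhanced_feature):
    """Stack two equally shaped 2-D features along a new channel axis.

    Assumption: the noisy input and the enhanced output are each a
    single-channel (freq x time) map.
    """
    same_rows = len(input_feature) == len(enhanced_feature)
    same_cols = same_rows and all(
        len(a) == len(b) for a, b in zip(input_feature, enhanced_feature)
    )
    if not same_cols:
        raise ValueError("input and enhanced features must share a shape")
    return [input_feature, enhanced_feature]
```

In this sketch the gates play the role of the channel selection described in the abstract: channels with a gate near 1 pass through almost unchanged, while channels with a gate near 0 are suppressed. Stacking the unmodified input next to the enhanced feature keeps the original speaker information available to the downstream network even where enhancement has distorted it.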

License type:

Publisher Copyright

Funding Info:

This work was supported by the National Key R&D Program of China under Grant 2018YFB1305200, the National Natural Science Foundation of China under Grant 61771333 and the Tianjin Municipal Science and Technology Project under Grant 18ZXZNGX00330.

URI:

https://oar.a-star.edu.sg/communities-collections/articles/17885

ISSN:

1990-9772

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
wu21c-interspeech.pdf	534.24 KB	PDF