Learning Domain-Invariant Transformation for Speaker Verification

Title:
Learning Domain-Invariant Transformation for Speaker Verification
Journal Title:
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication Date:
27 April 2022
Citation:
Zhang, H., Wang, L., Lee, K. A., Liu, M., Dang, J., & Chen, H. (2022). Learning Domain-Invariant Transformation for Speaker Verification. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp43922.2022.9747514
Abstract:
Automatic speaker verification (ASV) suffers in real-world applications from domain shift, which is caused by mismatch in speaker-independent information such as recording device and speaking style and leads to unsatisfactory performance. To address this, we propose the meta generalized transformation, which uses meta-learning to build a domain-invariant embedding space. Specifically, the transformation module learns domain generalization knowledge through meta-optimization on meta-train and meta-test sets, which are used to simulate domain shift. Furthermore, distribution optimization is incorporated to supervise the metric structure of the embeddings. For the transformation module, we investigate various instantiations and observe that the multilayer perceptron with gating (gMLP) is the most effective owing to its extrapolation capability. Experimental results on cross-genre and cross-dataset tasks demonstrate that the meta generalized transformation dramatically improves the robustness of ASV systems to domain shift and outperforms state-of-the-art methods.
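The abstract says that domain shift is simulated with meta-train and meta-test sets but does not describe how they are built. The following is a minimal sketch of one plausible episode split, assuming each utterance carries a domain label and that whole domains are held out for the meta-test set. The function name `split_episode`, the record layout `(utt_id, speaker, domain)`, and the genre labels in the usage example are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def split_episode(utterances, n_meta_test_domains=1, seed=None):
    """Split (utt_id, speaker, domain) records into meta-train and
    meta-test sets so that the two sets share no domain.

    Assumption: holding out whole domains is one way to simulate
    domain shift; the paper's actual sampling scheme may differ.
    """
    rng = random.Random(seed)

    # Group utterances by their domain label (third field of each record).
    by_domain = defaultdict(list)
    for utt in utterances:
        by_domain[utt[2]].append(utt)

    domains = sorted(by_domain)
    if not 0 < n_meta_test_domains < len(domains):
        raise ValueError("need at least one domain on each side of the split")

    # Randomly choose the domains that play the role of the unseen domain.
    held_out = set(rng.sample(domains, n_meta_test_domains))

    meta_train = [u for d in domains if d not in held_out for u in by_domain[d]]
    meta_test = [u for d in domains if d in held_out for u in by_domain[d]]
    return meta_train, meta_test


if __name__ == "__main__":
    # Hypothetical records: (utterance id, speaker id, domain label).
    utts = [
        ("u1", "spkA", "interview"),
        ("u2", "spkA", "singing"),
        ("u3", "spkB", "interview"),
        ("u4", "spkB", "speech"),
        ("u5", "spkC", "singing"),
    ]
    meta_train, meta_test = split_episode(utts, n_meta_test_domains=1, seed=0)
    print("meta-train domains:", sorted({u[2] for u in meta_train}))
    print("meta-test domains:", sorted({u[2] for u in meta_test}))
```

In a training loop, a fresh split would typically be drawn for every episode, so that the transformation module is repeatedly optimized on one set of domains and evaluated on another.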
License type:
Publisher Copyright
Funding Info:
No specific funding was received for this research.
Description:
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
ISBN:
978-1-6654-0541-6