Zhang, H., Wang, L., Lee, K. A., Liu, M., Dang, J., & Chen, H. (2022). Learning Domain-Invariant Transformation for Speaker Verification. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp43922.2022.9747514
In real-world applications, automatic speaker verification (ASV) suffers from domain shift caused by mismatches in speaker-independent information, such as recording device and speaking style, which leads to unsatisfactory performance. To this end, we propose the meta generalized transformation, which uses meta-learning to build a domain-invariant embedding space. Specifically, the transformation module is encouraged to learn domain generalization knowledge through meta-optimization on meta-train and meta-test sets, which are constructed to simulate domain shift. Furthermore, distribution optimization is incorporated to supervise the metric structure of the embeddings. For the transformation module, we investigate various instantiations and observe that the multilayer perceptron with gating (gMLP) is the most effective, owing to its extrapolation capability. Experimental results in cross-genre and cross-dataset settings demonstrate that the meta generalized transformation dramatically improves the robustness of ASV systems to domain shift and outperforms state-of-the-art methods.
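The meta-train/meta-test idea can be illustrated with a minimal sketch of a generic first-order, MLDG-style episode. This is an assumption for illustration, not the authors' method. Each episode holds out one source domain as meta-test, takes a virtual gradient step on the remaining meta-train domains, and then updates the parameters with both the meta-train gradient and the meta-test gradient evaluated at the virtually updated point. A single linear map stands in for the paper's gMLP transformation module. The squared-error loss, the synthetic domains, and every name (loss_and_grad, meta_step, make_domain, alpha, beta, lr) are hypothetical. The distribution optimization term is not modeled.

```python
import numpy as np


def loss_and_grad(W, X, Y):
    """Hypothetical surrogate loss: mean squared error of X @ W against Y.

    Returns the scalar loss and its gradient with respect to W.
    """
    n = X.shape[0]
    residual = X @ W - Y
    loss = float(np.sum(residual ** 2) / n)
    grad = 2.0 * X.T @ residual / n
    return loss, grad


def meta_step(W, domains, alpha, beta, lr, rng):
    """One first-order, MLDG-style episode over a list of (X, Y) source domains."""
    # Randomly hold out one source domain as meta-test to simulate domain shift.
    order = rng.permutation(len(domains))
    test_ids, train_ids = order[:1], order[1:]

    # Meta-train: average loss and gradient over the remaining domains.
    tr_losses, tr_grads = zip(*(loss_and_grad(W, *domains[i]) for i in train_ids))
    tr_grad = np.mean(tr_grads, axis=0)

    # Virtual inner update, then evaluate the updated parameters on meta-test.
    W_virtual = W - alpha * tr_grad
    te_losses, te_grads = zip(*(loss_and_grad(W_virtual, *domains[i]) for i in test_ids))
    te_grad = np.mean(te_grads, axis=0)

    # First-order approximation: the second-order term through W_virtual is dropped.
    W_new = W - lr * (tr_grad + beta * te_grad)
    meta_loss = float(np.mean(tr_losses) + beta * np.mean(te_losses))
    return W_new, meta_loss


def make_domain(rng, W_true, shift, scale, n=256):
    """Hypothetical domain: inputs with a domain-specific shift and scale."""
    X = rng.normal(loc=shift, scale=scale, size=(n, W_true.shape[0]))
    return X, X @ W_true


rng = np.random.default_rng(0)
dim = 4
W_true = rng.normal(size=(dim, dim))
settings = [(0.0, 1.0), (0.5, 0.8), (-0.5, 1.2), (1.0, 0.6)]
sources = [make_domain(rng, W_true, s, c) for s, c in settings]
unseen = make_domain(rng, W_true, 0.3, 0.9)  # never used for training

W = np.zeros((dim, dim))
initial_loss, _ = loss_and_grad(W, *unseen)
for _ in range(1000):
    W, meta_loss = meta_step(W, sources, alpha=0.01, beta=1.0, lr=0.01, rng=rng)
final_loss, _ = loss_and_grad(W, *unseen)
print(f"unseen-domain loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

The first-order update is chosen to keep the sketch short. A full second-order version would differentiate the meta-test loss through W_virtual, which is usually done with an autodiff framework rather than hand-written gradients.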
This research received no specific funding.