On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/16583

Title:

On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion

Journal Title:

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

DOI:

10.1109/ASRU46091.2019.9003939

Publication URL:

https://doi.org/10.1109/ASRU46091.2019.9003939

Authors:

Berrak Sisman, Mingyang Zhang, Minghui Dong, Haizhou Li

Keywords:

cross-lingual voice conversion, generative models, variational autoencoders, generative adversarial networks

Publication Date:

20 February 2020

Citation:

B. Sisman, M. Zhang, M. Dong and H. Li, "On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion," 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 2019, pp. 144-151, doi: 10.1109/ASRU46091.2019.9003939.

Abstract:

Cross-lingual voice conversion (VC) aims to convert the source speaker's voice to sound like that of the target speaker when the source and target speakers speak different languages. In this paper, we propose to use Generative Adversarial Networks (GANs) for cross-lingual voice conversion. We extend the studies on the Variational Autoencoding Wasserstein GAN (VAW-GAN) and the cycle-consistent adversarial network (CycleGAN), which are known to be effective for mono-lingual voice conversion. Because cross-lingual voice conversion needs to convert the voice across different phonetic systems, it is more challenging than mono-lingual voice conversion. By using VAW-GAN and CycleGAN, we successfully convert the speaker identity while carrying over the source speaker's linguistic content. The proposed idea is unique in the sense that it relies neither on bilingual data and their alignment nor on any external process, such as ASR. Moreover, it works with a limited amount of training data from any two languages. To the best of our knowledge, this is the first comprehensive study of Generative Adversarial Networks in cross-lingual voice conversion. In the experiments, we achieve high-quality converted voice that performs as well as or better than mono-lingual voice conversion.

License type:

Publisher Copyright

Funding Info:

This research is supported by a Programmatic grant from the Singapore Government's Research, Innovation and Enterprise 2020 plan (Advanced Manufacturing and Engineering domain).

This research is supported by the National Research Foundation, Singapore, under the AI Singapore Programme (Grant Reference No. AISG-100E-2018-006).

Description:

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

URI:

https://oar.a-star.edu.sg/communities-collections/articles/16583

ISBN:

978-1-7281-0306-8
978-1-7281-0305-1
978-1-7281-0307-5

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
asru2019.pdf	377.64 KB	PDF