DeepConversion: Voice conversion with limited parallel training data

Title:
DeepConversion: Voice conversion with limited parallel training data
Journal Title:
Speech Communication
Publication Date:
04 June 2020
Citation:
Zhang, M., Sisman, B., Zhao, L., & Li, H. (2020). DeepConversion: Voice conversion with limited parallel training data. Speech Communication, 122, 31–43. https://doi.org/10.1016/j.specom.2020.05.004
Abstract:
A deep neural network approach to voice conversion usually depends on a large amount of parallel training data from source and target speakers. In this paper, we propose a novel conversion pipeline, DeepConversion, that leverages a large amount of non-parallel, multi-speaker data, but requires only a small amount of parallel training data. It is believed that we can represent the shared characteristics of speakers by training a speaker-independent general model on a large amount of publicly available, non-parallel, multi-speaker speech data. Such a general model can then be used to learn the mapping between source and target speakers more effectively from a limited amount of parallel training data. We also propose a strategy to make full use of the parallel data in all models along the pipeline. In particular, the parallel data is used to adapt the general model towards the source-target speaker pair to achieve a coarse-grained conversion, and to develop a compact Error Reduction Network (ERN) for a fine-grained conversion. The parallel data is also used to adapt the WaveNet vocoder towards the source-target pair. The experiments show that DeepConversion, using only a limited amount of parallel training data, consistently outperforms traditional approaches that use a large amount of parallel training data, in both objective and subjective evaluations.
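The pipeline the abstract describes — pretrain a general model on abundant multi-speaker data, adapt it to the source-target pair with a small parallel set, then train a compact error-reduction model on the remaining residual — can be illustrated with a deliberately simplified sketch. The linear models, synthetic features, and dimensions below are illustrative stand-ins, not the paper's actual networks or data:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear(X, Y, W=None, lr=0.1, steps=200):
    """Least-squares mapping X -> Y via gradient descent.
    Passing a pretrained W mimics adaptation (fine-tuning)."""
    if W is None:
        W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        grad = X.T @ (X @ W - Y) / len(X)
        W -= lr * grad
    return W

# Stage 1: "general model" trained on abundant multi-speaker data
# (synthetic features standing in for spectral frames).
X_multi = rng.normal(size=(1000, 8))
Y_multi = X_multi @ rng.normal(size=(8, 8)) * 0.5
W_general = train_linear(X_multi, Y_multi)

# Stage 2: adapt toward the source-target pair using only a small
# parallel set (30 frames here), starting from the general model.
X_pair = rng.normal(size=(30, 8))
Y_pair = X_pair @ rng.normal(size=(8, 8))
W_adapted = train_linear(X_pair, Y_pair, W=W_general.copy(), steps=100)

# Stage 3: a compact error-reduction model learns to predict the
# residual left by the coarse (adapted) conversion.
residual = Y_pair - X_pair @ W_adapted
W_ern = train_linear(X_pair, residual, steps=100)

def convert(x):
    """Coarse conversion plus error-reduction refinement."""
    return x @ W_adapted + x @ W_ern

err_coarse = np.mean((X_pair @ W_adapted - Y_pair) ** 2)
err_fine = np.mean((convert(X_pair) - Y_pair) ** 2)
```

On this toy data the refined conversion reduces the coarse model's error, mirroring the coarse-then-fine structure of the pipeline; the paper's actual models are deep networks and the vocoder adaptation stage is omitted entirely.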
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research / project is supported by the National Research Foundation Singapore - AI Singapore Programme
Grant Reference no. : AISG-100E-2018-006, AISG-GC-2019-002

This research / project is supported by the National Research Foundation Singapore - National Robotics Programme
Grant Reference no. : 192 25 00054

This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - Human Robot Collaborative AI for AME
Grant Reference no. : A18A2b0046

This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - Singapore Government’s Research, Innovation and Enterprise 2020 plan in the Advanced Manufacturing and Engineering domain
Grant Reference no. : A1687b0033

This research / project is supported by the SUTD Start-up Grant Artificial Intelligence for Human Voice Conversion - N/A
Grant Reference no. : SRG ISTD 2020 158
ISSN:
0167-6393
Files uploaded:

82mingyang.pdf (1.60 MB, PDF)