Zhang, M., Sisman, B., Zhao, L., & Li, H. (2020). DeepConversion: Voice conversion with limited parallel training data. Speech Communication, 122, 31–43. https://doi.org/10.1016/j.specom.2020.05.004
Abstract:
A deep neural network approach to voice conversion usually depends on a large amount of parallel training data from source and target speakers. In this paper, we propose a novel conversion pipeline, DeepConversion, that leverages a large amount of non-parallel, multi-speaker data but requires only a small amount of parallel training data. We believe that the shared characteristics of speakers can be represented by training a speaker-independent general model on a large amount of publicly available, non-parallel, multi-speaker speech data. Such a general model can then be used to learn the mapping between source and target speakers more effectively from a limited amount of parallel training data. We also propose a strategy to make full use of the parallel data in all models along the pipeline. In particular, the parallel data is used to adapt the general model towards the source-target speaker pair to achieve a coarse-grained conversion, and to develop a compact Error Reduction Network (ERN) for a fine-grained conversion. The parallel data is also used to adapt the WaveNet vocoder towards the source-target pair. The experiments show that DeepConversion, which uses only a limited amount of parallel training data, consistently outperforms traditional approaches that use a large amount of parallel training data in both objective and subjective evaluations.
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research/project is supported by the National Research Foundation Singapore - AI Singapore Programme
Grant Reference no.: AISG-100E-2018-006, AISG-GC-2019-002
This research/project is supported by the National Research Foundation Singapore - National Robotics Programme
Grant Reference no.: 192 25 00054
This research/project is supported by the Agency for Science, Technology and Research (A*STAR) - Human Robot Collaborative AI for AME
Grant Reference no.: A18A2b0046
This research/project is supported by the Agency for Science, Technology and Research (A*STAR) - Singapore Government’s Research, Innovation and Enterprise 2020 plan in the Advanced Manufacturing and Engineering domain
Grant Reference no.: A1687b0033
This research/project is supported by the SUTD Start-up Grant Artificial Intelligence for Human Voice Conversion
Grant Reference no.: SRG ISTD 2020 158