Implementing Prosodic Phrasing in Chinese End-to-end Speech Synthesis

Title:
Implementing Prosodic Phrasing in Chinese End-to-end Speech Synthesis
Other Titles:
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication Date:
12 May 2019
Citation:
Y. Lu, M. Dong and Y. Chen, "Implementing Prosodic Phrasing in Chinese End-to-end Speech Synthesis," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 7050-7054. doi: 10.1109/ICASSP.2019.8682368
Abstract:
Text-to-Speech (TTS) systems have been evolving rapidly in recent years. With the great modelling power of deep neural networks, researchers have achieved end-to-end conversion from raw text to speech. It has been shown by various research projects that end-to-end TTS systems are able to generate speech that sounds akin to human voice for English and other languages. However, for languages like Chinese, there are two problems to deal with. Firstly, due to the large character set, a small input set comparable to the English character set is needed for the end-to-end solution. Secondly, there are serious prosodic phrasing mistakes when the end-to-end method is applied to Chinese. In this paper, we will propose a solution for an end-to-end Chinese TTS system on the basis of Tacotron 2 and Wavenet vocoder. We will then add extra contextual information to improve the performance of prosodic phrasing. Our experiments have demonstrated the effectiveness of this proposal.
License type:
Publisher Copyrights
Funding Info:
National Science Foundation of China, approval number 61573187
Description:
(c) 2019 IEEE.
ISSN:
2379-190X
1520-6149
Files uploaded:

tacotronenhance.pdf (1.39 MB, PDF)