Liu, R., Sisman, B., Bao, F., Gao, G., & Li, H. (2020). Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS. IEEE Signal Processing Letters, 27, 1470–1474. https://doi.org/10.1109/lsp.2020.3016564
Abstract:
Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech remains to be improved, especially for long sentences, where prosodic phrasing errors can occur frequently. In this paper, we extend the Tacotron-based speech synthesis framework to explicitly model prosodic phrase breaks. We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both the Mel spectrum and phrase breaks. To the best of our knowledge, this is the first implementation of multi-task learning for Tacotron-based TTS with a prosodic phrasing model. Experiments show that the proposed training scheme consistently improves voice quality for both Chinese and Mongolian systems.
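The abstract describes training one model against two targets at once (Mel spectrum and phrase breaks). The sketch below illustrates the general shape of such a multi-task objective in plain Python. It is not the authors' formulation: the choice of mean squared error for the Mel target, per-token softmax cross-entropy for break labels (e.g. 0 = no break, 1 = break), and the hypothetical weighting parameter break_weight are all assumptions, and any other loss terms a Tacotron system may use are omitted.

```python
import math
from typing import Sequence

# Assumption: a Mel frame is a list of floats (one per Mel bin), and each input
# token gets a vector of unnormalized break-class scores (logits).


def mel_mse(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all frames and Mel bins (assumed Mel loss)."""
    if len(pred) != len(target):
        raise ValueError("frame count mismatch")
    total, count = 0.0, 0
    for pf, tf in zip(pred, target):
        if len(pf) != len(tf):
            raise ValueError("Mel-bin count mismatch")
        for p, t in zip(pf, tf):
            total += (p - t) ** 2
            count += 1
    return total / count


def log_softmax(logits: Sequence[float]) -> list:
    """Numerically stable log-softmax over one token's break-class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def break_cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average negative log-likelihood of the gold break class per token (assumed break loss)."""
    if len(logits) != len(labels):
        raise ValueError("token count mismatch")
    nll = [-log_softmax(tok)[y] for tok, y in zip(logits, labels)]
    return sum(nll) / len(nll)


def multitask_loss(mel_pred, mel_target, break_logits, break_labels,
                   break_weight: float = 1.0) -> float:
    """Joint objective: Mel loss plus a weighted phrase-break loss.

    break_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    return (mel_mse(mel_pred, mel_target)
            + break_weight * break_cross_entropy(break_logits, break_labels))


if __name__ == "__main__":
    # Toy example: one 2-bin Mel frame and two tokens with uninformative break scores.
    loss = multitask_loss([[1.0, 2.0]], [[1.0, 0.0]],
                          [[0.0, 0.0], [0.0, 0.0]], [0, 1], break_weight=0.5)
    print(round(loss, 4))
```

Because both terms are summed into one scalar, gradients from the break-prediction task flow back into the shared encoder alongside the Mel reconstruction gradients; this is the usual motivation for multi-task learning, and break_weight controls how strongly the auxiliary task influences the shared representation.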
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - AME Programmatic Funding Scheme
Grant Reference no. : A18A2b0046
This research / project is supported by the National Research Foundation (NRF) Singapore - National Robotics Programme
Grant Reference no. : 192 25 00054
This research / project is supported by the National Research Foundation (NRF) Singapore - AI Singapore Programme
Grant Reference no. : AISG-100E-2018-006, AISG-GC-2019-002
This research / project is supported by the China National Natural Science Foundation - N/A
Grant Reference no. : 61773224
This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - RIE2020 Advanced Manufacturing and Engineering Programme
Grant Reference no. : A1687b0033, A18A2b0046
This research / project is supported by the Singapore University of Technology and Design (SUTD) - AI Grant
Grant Reference no. : PIE-SGP-AI-2020-02
This research / project is supported by the Singapore University of Technology and Design (SUTD) - Start-up Grant Artificial Intelligence for Human Voice Conversion
Grant Reference no. : SRG ISTD 2020 158