Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/20947

Title:

Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS

Journal Title:

IEEE Signal Processing Letters

DOI:

10.1109/LSP.2020.3016564

Publication URL:

http://dx.doi.org/10.1109/lsp.2020.3016564

Authors:

Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

Keywords:

Electrical and Electronic Engineering

Publication Date:

13 August 2020

Citation:

Liu, R., Sisman, B., Bao, F., Gao, G., & Li, H. (2020). Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS. IEEE Signal Processing Letters, 27, 1470–1474. https://doi.org/10.1109/lsp.2020.3016564

Abstract:

Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech remains to be improved, especially for long sentences, where prosodic phrasing errors can occur frequently. In this paper, we extend the Tacotron-based speech synthesis framework to explicitly model the prosodic phrase breaks. We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both Mel spectrum and phrase breaks. To the best of our knowledge, this is the first implementation of multi-task learning for Tacotron-based TTS with a prosodic phrasing model. Experiments show that our proposed training scheme consistently improves the voice quality for both Chinese and Mongolian systems.
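
As a rough illustration of the multi-task idea in the abstract, the sketch below combines a Mel-spectrum reconstruction term with a phrase-break classification term into one training objective. The abstract does not state the actual loss functions or weighting, so the L1 Mel loss, the binary cross-entropy break loss, the `break_weight` parameter, and all function names here are assumptions made for illustration only, not the authors' implementation.

```python
import math


def mel_l1_loss(mel_pred, mel_target):
    """Mean absolute error over all frames and Mel bins (assumed loss)."""
    total = sum(
        abs(p - t)
        for pred_frame, target_frame in zip(mel_pred, mel_target)
        for p, t in zip(pred_frame, target_frame)
    )
    count = sum(len(frame) for frame in mel_target)
    return total / count


def break_cross_entropy(break_probs, break_labels):
    """Mean binary cross-entropy (assumed loss).

    break_labels[i] is 1 if a phrase break follows token i, else 0.
    """
    eps = 1e-12  # guards against log(0)
    losses = [
        -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        for p, y in zip(break_probs, break_labels)
    ]
    return sum(losses) / len(losses)


def multitask_loss(mel_pred, mel_target, break_probs, break_labels,
                   break_weight=0.5):
    """Weighted sum of the two task losses; break_weight is hypothetical."""
    return (mel_l1_loss(mel_pred, mel_target)
            + break_weight * break_cross_entropy(break_probs, break_labels))
```

In a real system both terms would be computed on tensors from a shared encoder and minimized jointly by backpropagation; the plain-Python version above only shows how the two objectives are combined.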

License type:

Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Funding Info:

This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - AME Programmatic Funding Scheme
Grant Reference no. : A18A2b0046

This research / project is supported by the National Research Foundation (NRF) Singapore - National Robotics Programme
Grant Reference no. : 192 25 00054

This research / project is supported by the National Research Foundation (NRF) Singapore - AI Singapore Programme
Grant Reference no. : AISG-100E-2018-006, AISG-GC-2019-002

This research / project is supported by the China National Natural Science Foundation - N/A
Grant Reference no. : 61773224

This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - RIE2020 Advanced Manufacturing and Engineering Programme
Grant Reference no. : A1687b0033, A18A2b0046

This research / project is supported by the Singapore University of Technology and Design (SUTD) - AI Grant
Grant Reference no. : PIE-SGP-AI-2020-02

This research / project is supported by the Singapore University of Technology and Design (SUTD) - Start-up Grant Artificial Intelligence for Human Voice Conversion
Grant Reference no. : SRG ISTD 2020 158

URI:

https://oar.a-star.edu.sg/communities-collections/articles/20947

ISSN:

1070-9908
1558-2361

Collections:

Institute for Infocomm Research

Files uploaded:

https://ieeexplore.ieee.org/document/9166626