Liu, R., Sisman, B., Bao, F., Yang, J., Gao, G., & Li, H. (2021). Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 274–285. https://doi.org/10.1109/taslp.2020.3040523
Abstract:
Prosodic phrasing is an important factor that affects naturalness and intelligibility in text-to-speech synthesis. Studies show that deep learning techniques improve prosodic phrasing when large text and speech corpora are available. However, for low-resource languages, such as Mongolian, prosodic phrasing remains a challenge for various reasons. First, the database suitable for system training is limited. Second, word composition knowledge that is prosody-informing has not been used in prosodic phrase modeling. To address these problems, in this article, we propose a feature augmentation method in conjunction with a self-attention neural classifier. We augment input text with morphological and phonological decompositions of words to enhance the text encoder. We study the use of a self-attention classifier, which makes use of the global context of a sentence, as a decoder for phrase break prediction. Both objective and subjective evaluations validate the effectiveness of the proposed phrase break prediction framework, that consistently.
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research / project is supported by the National Research Foundation Singapore - AI Singapore Programme
Grant Reference no. : AISG-GC-2019-002, AISG-100E-2018-006
This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - National Robotics Programme
Grant Reference no. : 192 25 00054
This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - RIE2020 Advanced Manufacturing and Engineering Programmatic Grant
Grant Reference no. : A18A2b0046, A1687b0033
This research / project is supported by the National Key Research and Development Project
Grant Reference no. : 2018YFE0122900
This research / project is supported by the National Natural Science Foundation of China
Grant Reference no. : 61773224, 62066033
This research / project is supported by the National Natural Science Foundation of Inner Mongolia
Grant Reference no. : 2018MS06006
This research / project is supported by the Achievements Transformation Project of Inner Mongolia Autonomous Region
Grant Reference no. : CGZH2018125
This research / project is supported by the SUTD Startup Grant Artificial Intelligence for Human Voice Conversion
Grant Reference no. : SRG ISTD2020 158
This research / project is supported by the SUTD - SUTD AI Grant - The Understanding and Synthesis of Expressive Speech by AI
Grant Reference no. : N/A