Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis

Title:
Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis
Journal Title:
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date:
25 November 2020
Citation:
Liu, R., Sisman, B., Bao, F., Yang, J., Gao, G., & Li, H. (2021). Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 274–285. https://doi.org/10.1109/taslp.2020.3040523
Abstract:
Prosodic phrasing is an important factor that affects naturalness and intelligibility in text-to-speech synthesis. Studies show that deep learning techniques improve prosodic phrasing when large text and speech corpora are available. However, for low-resource languages such as Mongolian, prosodic phrasing remains a challenge for various reasons. First, the database suitable for system training is limited. Second, prosody-informing knowledge of word composition has not been used in prosodic phrase modeling. To address these problems, in this article we propose a feature augmentation method in conjunction with a self-attention neural classifier. We augment the input text with morphological and phonological decompositions of words to enhance the text encoder. We study the use of a self-attention classifier, which makes use of the global context of a sentence, as a decoder for phrase break prediction. Both objective and subjective evaluations validate the effectiveness of the proposed phrase break prediction framework.
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research / project is supported by the National Research Foundation Singapore - AI Singapore Programme
Grant Reference no. : AISG-GC-2019-002, AISG-100E-2018-006

This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - National Robotics Programme
Grant Reference no. : 192 25 00054

This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - RIE2020 Advanced Manufacturing and Engineering Programmatic Grant
Grant Reference no. : A18A2b0046, A1687b0033

This research / project is supported by the National Key Research and Development Project
Grant Reference no. : 2018YFE0122900

This research / project is supported by the National Natural Science Foundation of China
Grant Reference no. : 61773224, 62066033

This research / project is supported by the National Natural Science Foundation of Inner Mongolia
Grant Reference no. : 2018MS06006

This research / project is supported by the Achievements Transformation Project of Inner Mongolia Autonomous Region
Grant Reference no. : CGZH2018125

This research / project is supported by the SUTD Startup Grant Artificial Intelligence for Human Voice Conversion
Grant Reference no. : SRG ISTD2020 158

This research / project is supported by the SUTD - SUTD AI Grant - The Understanding and Synthesis of Expressive Speech by AI
Grant Reference no. : N/A
Description:
© 2024 IEEE.  Personal use of this material is permitted.  Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
2329-9290
2329-9304