Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/13371

Journal Title:

Interspeech 2016

DOI:

10.21437/Interspeech.2016-1053

Publication URL:

http://dx.doi.org/10.21437/Interspeech.2016-1053

Authors:

Huaiping Ming, Dongyan Huang, Lei Xie, Jie Wu, Minghui Dong, Haizhou Li

Publication Date:

07 May 2021

Citation:

Ming, H., Huang, D., Xie, L., Wu, J., Dong, M., Li, H. (2016) Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion. Proc. Interspeech 2016, 2453-2457.

Abstract:

Emotional voice conversion aims to convert speech from one emotional state to another. This paper proposes to model timbre and prosody features with a deep bidirectional long short-term memory (DBLSTM) network for emotional voice conversion. Continuous wavelet transform (CWT) representations of the fundamental frequency (F0) and energy contours are used for prosody modeling. Specifically, we use the CWT to decompose F0 into a five-scale representation and the energy contour into a ten-scale representation, where each feature scale corresponds to a temporal scale. Both spectral and prosodic (F0 and energy contour) features are converted simultaneously by a sequence-to-sequence conversion method based on the DBLSTM model, which captures both frame-wise and long-range relationships between the source and target voices. Objective and subjective evaluations of the converted speech confirm the effectiveness of the proposed method.
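The abstract does not specify the mother wavelet, the scale spacing, or the preprocessing used for the CWT prosody features. The sketch below is a minimal illustration under common assumptions from CWT-based prosody modeling, not the authors' implementation: a Mexican hat (Ricker) wavelet at dyadic scales, applied to a log F0 contour whose unvoiced frames are filled by linear interpolation and which is then z-normalized, and to a z-normalized energy contour. The function names (`mexican_hat`, `cwt_decompose`, `preprocess_f0`, `zscore`), the base scale of one frame, and the truncation of the wavelet support are hypothetical choices made for illustration.

```python
import numpy as np


def mexican_hat(t: np.ndarray, scale: float) -> np.ndarray:
    """Mexican hat (Ricker) wavelet sampled at frame offsets t for a given scale (in frames)."""
    x = t / scale
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - x ** 2) * np.exp(-0.5 * x ** 2)


def zscore(x) -> np.ndarray:
    """Normalize a contour to zero mean and unit variance (assumes it is not constant)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def preprocess_f0(f0) -> np.ndarray:
    """Assumed preprocessing: log F0, linear interpolation over unvoiced (F0 == 0) frames, z-score."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    idx = np.arange(len(f0))
    # np.interp fills unvoiced frames from neighboring voiced frames (edge values at the ends)
    lf0 = np.interp(idx, idx[voiced], np.log(f0[voiced]))
    return zscore(lf0)


def cwt_decompose(contour, n_scales: int, base_scale: float = 1.0) -> np.ndarray:
    """Decompose a 1-D contour into n_scales CWT components at dyadic scales base_scale * 2**i.

    Returns an array of shape (n_scales, len(contour)); row i is the component at scale i.
    """
    contour = np.asarray(contour, dtype=float)
    T = len(contour)
    coeffs = np.empty((n_scales, T))
    for i in range(n_scales):
        scale = base_scale * 2.0 ** i
        # Truncate the wavelet support so the kernel is never longer than the contour;
        # this keeps np.convolve(..., mode="same") at length T.
        half = min(int(10 * scale), (T - 1) // 2)
        kernel = mexican_hat(np.arange(-half, half + 1, dtype=float), scale)
        # The wavelet is symmetric, so convolution equals correlation here.
        coeffs[i] = np.convolve(contour, kernel, mode="same")
    return coeffs


if __name__ == "__main__":
    # Synthetic contours for illustration only (hypothetical data, not from the paper).
    rng = np.random.default_rng(0)
    T = 400  # number of frames
    t = np.arange(T)
    f0 = 120 + 20 * np.sin(2 * np.pi * t / 100) + rng.normal(0, 2, T)
    f0[150:180] = 0.0  # simulated unvoiced region
    energy = 1.0 + 0.5 * np.abs(np.sin(2 * np.pi * t / 50))

    f0_cwt = cwt_decompose(preprocess_f0(f0), n_scales=5)  # five-scale F0 representation
    en_cwt = cwt_decompose(zscore(energy), n_scales=10)    # ten-scale energy representation
    print(f0_cwt.shape, en_cwt.shape)  # (5, 400) (10, 400)
```

Stacking the per-frame CWT coefficients alongside the spectral features yields one feature vector per frame, which is the kind of input a frame-synchronous sequence model such as a DBLSTM could map from source to target. Reconstructing F0 and energy from converted coefficients requires an inverse or approximate-inverse step that this sketch omits.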

License type:

Publisher copyrights

Collections:

Institute for Infocomm Research

Files uploaded:

deep-bidirectional-lstm-modeling-of-timbre-and-prosody-for-emotional-voice-conversion.pdf (PDF, 593.02 KB)