Extracting Bottleneck Features and Word-Like Pairs from Untranscribed Speech for Feature Representation

Title:
Extracting Bottleneck Features and Word-Like Pairs from Untranscribed Speech for Feature Representation
Other Titles:
ASRU 2017
DOI:
Publication Date:
16 December 2017
Citation:
Y. Yuan, C.-C. Leung, L. Xie, H. Chen, B. Ma, and H. Li, "Extracting Bottleneck Features and Word-Like Pairs from Untranscribed Speech for Feature Representation," in Proc. ASRU, 2017, pp. 734-739.
Abstract:
We propose a framework for learning a frame-level speech representation in a scenario where no manual transcription is available. Our framework is based on pairwise learning using bottleneck features (BNFs). Initial frame-level features are extracted from a bottleneck-shaped multilingual deep neural network (DNN) trained with unsupervised phoneme-like labels. Word-like pairs are discovered in the untranscribed speech using the initial features, and frame alignment is performed on each word-like speech pair. The matching frame pairs are then used as input-output pairs to train another DNN with the mean square error (MSE) loss function. The final frame-level features are extracted from an internal hidden layer of the MSE-based DNN. Our pairwise-learned feature representation is evaluated on the ZeroSpeech 2017 challenge. Experiments show that pairwise learning improves phoneme discrimination in the 10s and 120s test conditions. We find that it is important to use BNFs as the initial features when pairwise learning is performed. With more word pairs obtained from the Switchboard corpus and its manual transcription, the phoneme discrimination of the three languages in the evaluation data can be further improved despite the data mismatch.
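The core training step described in the abstract — feeding aligned frame pairs through a DNN with an MSE loss and then reading features off an internal hidden layer — can be sketched in miniature. The sketch below is an illustration, not the authors' implementation: the data is synthetic (random frames lying near a low-dimensional subspace, standing in for real BNFs from DTW-aligned word-like pairs), the network is a single tanh bottleneck layer trained by plain gradient descent, and all dimensions and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for DTW-aligned frame pairs (hypothetical dimensions):
# each row of X is a frame; the matching row of Y is its aligned frame.
n_pairs, in_dim, hid_dim = 200, 39, 8       # e.g. 39-dim initial BNFs, 8-dim bottleneck
Z = rng.normal(size=(n_pairs, hid_dim))     # latent "phonetic" coordinates
P = rng.normal(size=(hid_dim, in_dim))
X = Z @ P                                   # frames near a low-dim subspace
Y = X + 0.1 * rng.normal(size=(n_pairs, in_dim))  # aligned frames are similar

# One hidden (bottleneck) layer; features are its activations.
W1 = rng.normal(scale=0.1, size=(in_dim, hid_dim)); b1 = np.zeros(hid_dim)
W2 = rng.normal(scale=0.1, size=(hid_dim, in_dim)); b2 = np.zeros(in_dim)

def forward(x):
    h = np.tanh(x @ W1 + b1)    # hidden activations = learned frame features
    return h, h @ W2 + b2

_, pred0 = forward(X)
init_mse = np.mean((pred0 - Y) ** 2)

lr = 0.05
for _ in range(300):
    H, pred = forward(X)
    err = pred - Y                       # gradient of MSE w.r.t. the output
    gW2 = H.T @ err / n_pairs
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - H ** 2)     # backprop through tanh
    gW1 = X.T @ dh / n_pairs
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

features, pred = forward(X)              # bottleneck activations as final features
mse = np.mean((pred - Y) ** 2)
```

In the paper's setting the input and target frames come from different tokens of the same discovered word, so the bottleneck is pushed to keep what the aligned frames share (phonetic content) and discard speaker- and token-specific variation.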
License type:
Publisher Copyright
Funding Info:
National Natural Science Foundation of China (Grant No. 61571363) and China Scholarship Council (Grant No. 201706290169).
Files uploaded:
asru2017-yyg.pdf (222.77 KB, PDF)