H. Chen, C.-C. Leung, L. Xie, B. Ma, and H. Li, “Multitask Feature Learning for Low-Resource Query-by-Example Spoken Term Detection,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, 2017.
We propose a novel technique that learns a low-dimensional feature representation from unlabeled data of a target language and labeled data from a non-target language. The technique is studied as a solution to query-by-example spoken term detection (QbE-STD) for a low-resource language. We extract low-dimensional features from a bottleneck layer of a multi-task deep neural network, which is jointly trained on speech data from the low-resource target language and a resource-rich non-target language. The proposed feature learning technique aims to extract acoustic features that offer phonetic discriminability, and it explores a new way of leveraging cross-lingual speech data to overcome the resource limitation in the target language. We conduct QbE-STD experiments using the dynamic time warping (DTW) distance of the multi-task bottleneck features between the query and the search database. The QbE-STD process does not rely on an automatic speech recognition pipeline for the target language. We validate the effectiveness of multi-task feature learning through a series of comparative experiments.
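The matching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes bottleneck features are precomputed fixed-dimensional vectors per frame, uses toy 2-D vectors in place of real features, and applies classic DTW with a cosine frame distance; all function names and data are illustrative.

```python
# Illustrative sketch: DTW distance between a spoken query and a search
# segment, each represented as a sequence of (hypothetical) bottleneck
# feature vectors. The feature values below are toy data, not real features.
import math

def cosine_dist(u, v):
    """Frame-level distance: 1 - cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def dtw_distance(query, segment):
    """Classic DTW: cumulative cost of the best monotonic alignment path,
    normalized by the combined sequence length."""
    n, m = len(query), len(segment)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_dist(query[i - 1], segment[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m] / (n + m)

# Toy bottleneck-feature sequences (2-D vectors for illustration only).
query = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
match = [[1.0, 0.0], [0.95, 0.05], [0.1, 0.9]]   # similar trajectory
nonmatch = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]  # reversed trajectory

# A matching segment yields a smaller DTW distance than a non-matching one.
print(dtw_distance(query, match) < dtw_distance(query, nonmatch))
```

In an actual QbE-STD setup the distance would be computed between the query and sliding windows (or subsequence-DTW regions) of the search database, with detections ranked by normalized distance.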
This work was supported by the Chinese Scholarship Council (No. 201606291069) and the National Natural Science Foundation of China (No. 61571363).
(c) 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.