Unsupervised Mining of Acoustic Subword Units with Segment-Level Gaussian Posteriorgrams

Title:
Unsupervised Mining of Acoustic Subword Units with Segment-Level Gaussian Posteriorgrams
Other Titles:
DOI:
Publication Date:
25 August 2013
Citation:
Abstract:
We consider the problem of unsupervised acoustic unit mining from unlabeled speech data. One typical method involves two steps: unsupervised segmentation and segment clustering. This paper proposes to improve segment clustering with a segment-level Gaussian posteriorgram representation, which is generated by averaging the frame-level Gaussian posterior probabilities within each segment. Stacking the segment-level Gaussian posteriorgrams of all the speech data yields a Gaussian-by-segment data matrix. Given this matrix, we have the flexibility to cluster either the Gaussian components or the segments into different acoustic unit categories. We investigate both normalized cut and non-negative matrix factorization approaches to clustering on the data matrix. We carried out experiments to measure the quality of the clustering results against manual phoneme labels. Experimental results show that the proposed methods consistently outperform a traditional vector quantization method and a Gaussian mixture model labeling method.
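The averaging and stacking steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `segment_posteriorgram_matrix`, the array layout (frames by Gaussians), the end-exclusive segment boundaries, and the toy values are all assumptions made for the example, and the frame-level posteriors and segment boundaries are assumed to come from earlier stages.

```python
import numpy as np

def segment_posteriorgram_matrix(frame_posteriors, boundaries):
    """Build a Gaussian-by-segment matrix (hypothetical helper).

    frame_posteriors: (T, K) array, one row of K Gaussian posteriors per frame.
    boundaries: list of (start, end) frame indices, end exclusive (assumed format).
    Returns a (K, S) array whose column s is the mean posterior over segment s.
    """
    columns = [frame_posteriors[start:end].mean(axis=0) for start, end in boundaries]
    return np.stack(columns, axis=1)

# Toy data: 5 frames, 2 Gaussian components, 2 segments (illustrative values only).
frame_posteriors = np.array([
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.9],
    [0.3, 0.7],
    [0.2, 0.8],
])
boundaries = [(0, 2), (2, 5)]

matrix = segment_posteriorgram_matrix(frame_posteriors, boundaries)
print(matrix.shape)  # (number of Gaussians, number of segments)
```

Because each column is an average of probability vectors, it is itself a probability vector. Clustering the columns groups segments, while clustering the rows groups Gaussian components, which is the choice the abstract refers to.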
License type:
PublisherCopyrights
Funding Info:
Description:
ISBN:

Files uploaded:

File Size Format
is2013-hpwang.pdf 87.17 KB PDF