Unsupervised Mining of Acoustic Subword Units with Segment-Level Gaussian Posteriorgrams

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/11440

Title:

Unsupervised Mining of Acoustic Subword Units with Segment-Level Gaussian Posteriorgrams

Publication URL:

http://www.isca-speech.org/iscaweb/index.php/archive/online-archive

Authors:

Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma, Haizhou Li

Keywords:

Computing Science

Publication Date:

25 August 2013

Abstract:

We consider the problem of unsupervised acoustic unit mining from unlabeled speech data. One typical method involves two steps: unsupervised segmentation and segment clustering. This paper proposes to improve segment clustering with a segment-level Gaussian posteriorgram representation, which is generated by averaging the frame-level Gaussian posterior probabilities within each segment. Stacking together the segment-level Gaussian posteriorgrams of all the speech data yields a Gaussian-by-segment data matrix. Given this matrix, we have the flexibility to cluster either the Gaussian components or the segments into different acoustic unit categories. We have investigated both normalized cut and non-negative matrix factorization approaches for clustering the data matrix. We carried out experiments to measure the quality of the clustering results with reference to manual phoneme labels. Experimental results show that the proposed methods consistently outperform a traditional vector quantization method and a Gaussian mixture model labeling method.
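The pipeline described in the abstract can be illustrated with a minimal sketch in Python. It is not the authors' implementation. The function names (`segment_posteriorgrams`, `nmf`), the toy data, and the segment boundaries are all hypothetical. The abstract does not say which non-negative matrix factorization variant was used, so the sketch assumes the standard multiplicative updates for the Frobenius-norm objective. It also assumes that frame-level Gaussian posteriors and segment boundaries are already available.

```python
import numpy as np

def segment_posteriorgrams(frame_post, boundaries):
    """Average frame-level Gaussian posteriors within each segment.

    frame_post: (T, K) array, one posterior vector over K Gaussians per frame.
    boundaries: list of (start, end) frame indices, end exclusive (assumed format).
    Returns a (K, S) Gaussian-by-segment matrix.
    """
    cols = [frame_post[s:e].mean(axis=0) for s, e in boundaries]
    return np.stack(cols, axis=1)

def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize X ~ W @ H with multiplicative updates (an assumed NMF variant)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 12 frames, 4 Gaussian components, 3 segments (all hypothetical).
rng = np.random.default_rng(0)
frame_post = rng.dirichlet(np.ones(4), size=12)
boundaries = [(0, 3), (3, 7), (7, 12)]

X = segment_posteriorgrams(frame_post, boundaries)  # Gaussian-by-segment matrix
W, H = nmf(X, rank=2)

# Either side of the factorization can be read as a clustering:
segment_labels = H.argmax(axis=0)   # one unit label per segment
gaussian_labels = W.argmax(axis=1)  # one unit label per Gaussian component
```

Because each column of `X` is an average of probability vectors, it is itself a probability vector over the Gaussian components. Reading cluster labels from the largest entry of `H` or `W` is one simple convention, chosen here for illustration only.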

License type:

PublisherCopyrights

URI:

https://oar.a-star.edu.sg/communities-collections/articles/11440

Collections:

Institute for Infocomm Research

Manuscripts in This Item:

File	Size	Format
is2013-hpwang.pdf	87.17 KB	PDF