We consider the problem of unsupervised acoustic unit mining from unlabeled speech data. One typical method involves two steps: unsupervised segmentation and segment clustering. This paper proposes to improve segment clustering with a segment-level Gaussian posteriorgram representation, which is generated by averaging the frame-level Gaussian posterior probabilities within each segment. By stacking the segment-level Gaussian posteriorgrams of all the speech data, we construct a Gaussian-by-segment data matrix. Given this matrix, we have the flexibility to cluster either the Gaussian components or the segments into different acoustic unit categories. We investigate both normalized cut and non-negative matrix factorization approaches for clustering on the data matrix. We measure the quality of the clustering results against manual phoneme labels. Experimental results show that the proposed methods consistently outperform a traditional vector quantization method and a Gaussian mixture model labeling method.
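The construction of the Gaussian-by-segment matrix can be illustrated with a minimal sketch. It assumes that frame-level posteriors are already available as a list of per-frame probability vectors and that segment boundaries are given as (start, end) frame index pairs with an exclusive end; the function name `segment_posteriorgram` and both input formats are hypothetical choices made for illustration, not details taken from the paper.

```python
def segment_posteriorgram(frame_posteriors, boundaries):
    """Average frame-level Gaussian posteriors within each segment.

    frame_posteriors: list of T frames, each a list of K posterior
        probabilities (one per Gaussian component).
    boundaries: list of (start, end) frame indices, end exclusive
        (an assumed format for the segmentation output).
    Returns a Gaussian-by-segment matrix as K rows of S columns.
    """
    num_gaussians = len(frame_posteriors[0])
    columns = []
    for start, end in boundaries:
        frames = frame_posteriors[start:end]
        if not frames:
            raise ValueError(f"empty segment: ({start}, {end})")
        # Segment-level posteriorgram: mean posterior per Gaussian.
        columns.append(
            [sum(f[k] for f in frames) / len(frames) for k in range(num_gaussians)]
        )
    # Stack the segment vectors as columns: rows index Gaussians.
    return [[col[k] for col in columns] for k in range(num_gaussians)]


# Toy input: 4 frames, 2 Gaussian components, 2 segments of 2 frames each.
toy_posteriors = [[0.75, 0.25], [0.25, 0.75], [0.0, 1.0], [0.5, 0.5]]
toy_boundaries = [(0, 2), (2, 4)]
matrix = segment_posteriorgram(toy_posteriors, toy_boundaries)
```

Because each frame's posteriors sum to one, each column of the resulting matrix also sums to one. Clustering the columns groups segments, while clustering the rows groups Gaussian components, which corresponds to the two options described above.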