An Unsupervised Deep Learning System for Acoustic Scene Analysis

Journal:
Applied Sciences
Publication Date:
19 March 2020
Citation:
Wang, M.; Zhang, X.-L.; Rahardja, S. An Unsupervised Deep Learning System for Acoustic Scene Analysis. Appl. Sci. 2020, 10, 2076.
Abstract:
Acoustic scene analysis has attracted considerable attention recently. Existing methods are mostly supervised, which requires well-predefined acoustic scene categories and accurately labeled data. In practice, a large amount of unlabeled audio data exists, but labeling data at scale is both costly and time-consuming. Unsupervised acoustic scene analysis, on the other hand, does not require manual labeling, but it is known to perform significantly worse and has therefore not been well explored. In this paper, a new unsupervised method based on deep auto-encoder networks and spectral clustering is proposed. It first extracts a bottleneck feature from the original acoustic feature of each audio clip with an auto-encoder network, then applies spectral clustering to further suppress noise and unrelated information in the bottleneck feature, and finally performs hierarchical clustering on the low-dimensional output of the spectral clustering. To fully exploit the spatial information in stereo audio, we further apply a binaural representation and conduct joint clustering on it. To the best of our knowledge, this is the first time a binaural representation has been used in unsupervised learning. Experimental results show that the proposed method outperforms state-of-the-art competing methods.
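The three-stage pipeline described in the abstract (auto-encoder bottleneck, spectral dimensionality reduction, hierarchical clustering) can be sketched with scikit-learn. This is an illustrative reconstruction, not the paper's implementation: PCA stands in for the deep auto-encoder bottleneck (a linear auto-encoder trained with MSE recovers the PCA subspace), and synthetic blob data stands in for real acoustic features. All dimensions and parameters here are assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import AgglomerativeClustering

# Toy stand-in for acoustic features: 200 "clips", 64-dim, 4 latent scenes.
X, _ = make_blobs(n_samples=200, n_features=64, centers=4, random_state=0)

# Stage 1 (stand-in): PCA as a linear proxy for the auto-encoder bottleneck.
bottleneck = PCA(n_components=16, random_state=0).fit_transform(X)

# Stage 2: spectral embedding of the bottleneck features, which keeps a
# low-dimensional output and suppresses unrelated variation.
embedded = SpectralEmbedding(n_components=4, affinity="rbf",
                             random_state=0).fit_transform(bottleneck)

# Stage 3: hierarchical (agglomerative) clustering on the spectral output.
labels = AgglomerativeClustering(n_clusters=4).fit_predict(embedded)
print(sorted(set(labels)))  # → [0, 1, 2, 3]
```

The joint clustering of binaural representations proposed in the paper would run the same stages on features extracted from both stereo channels; that step is omitted here for brevity.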
Funding Info:
This research was funded in part by the Project of the Science, Technology, and Innovation Commission of Shenzhen Municipality under grant number JCYJ20170815161820095, and by the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University.
Files uploaded:
i2r-paper-01-published-version.pdf (853.58 KB, PDF)