Open-Set Audio Classification with Limited Training Resources Based on Augmentation Enhanced Variational Auto-Encoder GAN with Detection-Classification Joint Training
Teh, K. K., & Tran, H. D. (2021). Open-Set Audio Classification with Limited Training Resources Based on Augmentation Enhanced Variational Auto-Encoder GAN with Detection-Classification Joint Training. Proc. Interspeech 2021. doi:10.21437/interspeech.2021-1142
Abstract:
In this paper, we propose a novel method to address two practical problems that arise when audio classification systems are deployed in operation: the presence of unseen sound classes (open set) and the limitation of training resources. To solve these, we propose a novel method that embeds a variational auto-encoder (VAE), data augmentation, and detection-classification joint training into a conventional GAN network. Feeding the VAE output into the GAN generator helps to produce realistic outlier samples that are not too far from the in-distribution classes, and hence improves the open-set discrimination capability of the classifier. Next, the augmentation-enhanced GAN scheme developed in our previous work [4] for closed-set audio classification addresses the limited training resources by incorporating physical data augmentation alongside traditional GAN-generated samples, preventing overfitting and improving optimization convergence. The detection-classification joint training builds on the advantages of the VAE and the augmentation-enhanced GAN to further improve the performance of both the detection and classification tasks. Experiments carried out on the Google Speech Commands database show a large improvement in open-set classification accuracy, from 62.41% to 88.29%, when using only 10% of the training data.
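The joint training idea described in the abstract can be illustrated with a minimal sketch: a (K+1)-way classifier is trained so that real samples keep their true class labels while generator-produced outlier samples are all assigned an extra "unknown" class. The NumPy formulation below is an illustrative assumption, not the authors' implementation; the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def joint_loss(logits_real, labels_real, logits_gen, k):
    """Joint detection-classification cross-entropy (sketch).

    Real samples are scored against their true class indices 0..k-1;
    GAN/VAE-generated outlier samples are all assigned the extra
    'unknown' class index k.
    """
    p_real = softmax(logits_real)
    p_gen = softmax(logits_gen)
    ce_real = -np.log(p_real[np.arange(len(labels_real)), labels_real])
    ce_gen = -np.log(p_gen[:, k])  # generated samples -> class k
    return (ce_real.sum() + ce_gen.sum()) / (len(ce_real) + len(ce_gen))
```

At test time, inputs whose arg-max falls on class k (or whose known-class probabilities are all low) would be rejected as unseen, which is how the extra class supports open-set detection.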
License type:
Publisher Copyright
Funding Info:
There was no specific funding for this research.