Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks

Title:
Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks
Journal Title:
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date:
29 July 2021
Citation:
Pan, Z., Zhang, M., Wu, J., Wang, J., Li, H. (2021). Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 2656–2670. doi:10.1109/taslp.2021.3100684
Abstract:
Mammals exhibit a remarkable capability to detect and localize sound sources in complex acoustic environments by processing binaural cues in a spiking manner. Emulating the mammalian auditory process for sound source localization (SSL), we propose a computational model for accurate and robust SSL under the neuromorphic spiking neural network (SNN) framework. At the core of this model is a Multi-Tone Phase Coding (MTPC) scheme, which encodes the interaural time difference (ITD) between binaural pure tones into discriminative spike patterns that can be directly classified by SNNs. As such, SSL can be implemented as an event-driven task on highly efficient neuromorphic parallel processors. We evaluate the proposed computational model on a directional audio dataset recorded from a microphone array in a realistic acoustic environment with background noise, obstruction, reflection, and other interferences. We report superior localization capability, with a mean absolute error (MAE) of 1.02° or 100% classification accuracy at an angle resolution of 5°. This surpasses other SNN-based, biologically plausible neuromorphic approaches by a relatively large margin and is on par with human performance in similar tasks. This study opens up many application opportunities in human-robot interaction, where energy efficiency is crucial. As a case study, we successfully deploy the proposed SSL system on a robotic platform to track the speaker and orient the robot's attention.
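The abstract describes MTPC only at a high level. As a rough illustration of the physical relation it builds on, and not the authors' implementation, the sketch below converts a source azimuth into an ITD for two microphones under a far-field assumption, then expresses that ITD as an interaural phase difference (IPD = 2πf·ITD, wrapped to one cycle) for each of several pure tones, placing one spike per tone in a quantized phase bin. The speed of sound (343 m/s), microphone spacing, tone frequencies, bin count, and the helper names `itd_from_azimuth` and `phase_code` are all hypothetical choices made for illustration.

```python
import math

# Assumption: speed of sound in air at roughly 20 degrees C, in m/s.
SPEED_OF_SOUND = 343.0


def itd_from_azimuth(theta_deg: float, mic_distance: float) -> float:
    """Hypothetical helper: far-field ITD in seconds for two microphones
    `mic_distance` metres apart, with 0 degrees meaning straight ahead."""
    return mic_distance * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND


def phase_code(itd: float, tone_freqs: list[float], n_bins: int) -> list[list[int]]:
    """Hypothetical multi-tone phase code (illustrative, not the paper's MTPC).

    For each pure tone of frequency f, the ITD appears as an interaural phase
    difference of 2*pi*f*itd, wrapped to [0, 2*pi). The wrapped phase is
    quantized into `n_bins` bins and a single spike (1) marks the matching bin.
    """
    pattern = []
    for f in tone_freqs:
        ipd = (2 * math.pi * f * itd) % (2 * math.pi)  # wrapped phase difference
        b = int(ipd / (2 * math.pi) * n_bins) % n_bins  # guard against rounding to n_bins
        row = [0] * n_bins
        row[b] = 1  # one spike per tone
        pattern.append(row)
    return pattern
```

For example, `phase_code(itd_from_azimuth(30.0, 0.1), [250.0, 500.0, 1000.0], 8)` puts the spikes for the 250 Hz and 500 Hz tones in bin 0 and the spike for the 1000 Hz tone in bin 1. Low tones give a coarse but unambiguous phase, while high tones resolve small ITDs more finely but wrap around once the ITD exceeds half their period; combining several tones is one plausible way to get both properties.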
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the Research, Innovation and Enterprise 2020 Plan - Advanced Manufacturing and Engineering domain
Grant Reference no.: A1687b0033 and I2001E0053

This research / project is supported by the Agency for Science, Technology and Research - National Robotics Program
Grant Reference no.: 192 25 00054

The work of J. Wu was also partially supported by the Zhejiang Lab (No. 2019KC0AB02). The work of M. Zhang was also partially supported by the China Postdoctoral Science Foundation under Grant No. 2020M680148, Zhejiang Lab's International Talent Fund for Young Professionals, and the National Key R&D Program of China under Grant No. 2018AAA0100202.
Description:
© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
2329-9290
2329-9304
Files uploaded:

taslp-submission-copy.pdf (4.04 MB, PDF)