Liu, T., Zhang, L., Das, R. K., Ma, Y., Tao, R., & Li, H. (2024). How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio? Proceedings of Interspeech 2024, 1105–1109. https://doi.org/10.21437/interspeech.2024-2009
Abstract:
Partially manipulating a sentence can greatly change its meaning. Recent work shows that countermeasures (CMs) trained on partially spoofed audio can effectively detect such spoofing. However, the current understanding of the decision-making process of CMs is limited. We utilize Grad-CAM and introduce a quantitative analysis metric to interpret CMs' decisions. We find that CMs prioritize the artifacts of transition regions created when concatenating bona fide and spoofed audio. This focus differs from that of CMs trained on fully spoofed audio, which concentrate on the pattern differences between bona fide and spoofed parts. Our further investigation explains the varying nature of CMs' focus while making correct or incorrect predictions. These insights provide a basis for the design of CM models and the creation of datasets. Moreover, this work lays a foundation for interpretability in the field of partially spoofed audio detection, which has not been well explored previously.
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the National Research Foundation, Prime Minister’s Office, Singapore, and the Ministry of Communications and Information - Online Trust and Safety (OTS) Research Programme
Grant Reference no. : MCI-OTS-001
This research / project is supported by the Shenzhen Science and Technology Research Fund - Fundamental Research Key Project Grant
Grant Reference no. : JCYJ20220818103001002
This research / project is supported by the Shenzhen Science and Technology Program
Grant Reference no. : ZDSYS20230626091302006