Deep Spectro-temporal Artifacts for Detecting Synthesized Speech

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/18694

Journal Title:

Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia

DOI:

10.1145/3552466.3556527

Publication URL:

http://dx.doi.org/10.1145/3552466.3556527

Authors:

Xiaohui Liu, Meng Liu, Lin Zhang, Linjuan Zhang, Chang Zeng, Kai Li, Nan Li, Kong Aik Lee, Longbiao Wang, Jianwu Dang

Publication Date:

10 October 2022

Citation:

Liu, X., Liu, M., Zhang, L., Zhang, L., Zeng, C., Li, K., Li, N., Lee, K. A., Wang, L., & Dang, J. (2022). Deep Spectro-temporal Artifacts for Detecting Synthesized Speech. Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia. https://doi.org/10.1145/3552466.3556527

Abstract:

The Audio Deep Synthesis Detection (ADD) Challenge was held to advance the detection of generated human-like speech. This paper presents our submitted system and assesses it on track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection). The system detects spectro-temporal artifacts using raw temporal signals, spectral features, and deep embedding features. For track 1, it combines low-quality data augmentation, domain adaptation via fine-tuning, and fusion of complementary features. We also visualize the clustering characteristics of subsystems built on different features and use this analysis to explain the effectiveness of our proposed greedy fusion strategy. For track 2, a self-supervised learning structure detects frame transitions and smoothing in order to capture the manipulations introduced by partially fake (PF) attacks in the time domain. We ranked 4th in track 1 and 5th in track 2.

License type:

Publisher Copyright

Funding Info:

This research is supported by core funding from the Council Research Fund (CRF), Grant Reference No. CR-2021-005.

This work was supported by the National Natural Science Foundation of China under Grant 62176182, JST CREST Grants (JPMJCR18A6, JPMJCR20D3 and JPMJFS2136), and MEXT KAKENHI Grant (21H04906).

Description:

© {Author | ACM} {2022}. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, http://dx.doi.org/10.1145/3552466.3556527

URI:

https://oar.a-star.edu.sg/communities-collections/articles/18694

ISBN:

978-1-4503-9496-3

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
design-of-wideband-mushroom-antennas-using-single-and-multi-objective-bayesian-optimization-amended.pdf	316.61 KB	PDF