An Analysis of Semantically-Aligned Speech-Text Embeddings

Title:
An Analysis of Semantically-Aligned Speech-Text Embeddings
Journal Title:
2022 IEEE Spoken Language Technology Workshop (SLT)
Publication Date:
27 January 2023
Citation:
Huzaifah, M., & Kukanov, I. (2023). An Analysis of Semantically-Aligned Speech-Text Embeddings. 2022 IEEE Spoken Language Technology Workshop (SLT). https://doi.org/10.1109/slt54892.2023.10023147
Abstract:
Embeddings play an important role in end-to-end solutions for multi-modal language processing problems. Although there has been some effort to understand the properties of single-modality embedding spaces, particularly those of text, their cross-modal counterparts are less well understood. In this work, we study some intrinsic properties of a joint speech-text embedding space that are informative for several prominent use cases. The space is constructed by minimizing the distance between paired utterance and transcription inputs in a teacher-student model setup. We found that incorporating automatic speech recognition, through both pretraining and multitask scenarios, aids semantic alignment significantly, resulting in more tightly coupled embeddings. To analyse cross-modal embeddings we utilise a quantitative retrieval accuracy metric for semantic alignment, zero-shot classification for generalisability, and probing of the encoders to observe the extent of knowledge transfer from one modality to another.
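As a rough illustration of the retrieval accuracy idea mentioned in the abstract, the sketch below scores how often a speech embedding's nearest text embedding is its own paired transcription. The abstract does not specify the distance function or the exact retrieval protocol, so the use of cosine similarity, the function names `cosine` and `retrieval_accuracy`, and the toy vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieval_accuracy(speech_embs, text_embs):
    """Fraction of speech embeddings whose most similar text embedding
    is their own pair (assumes speech_embs[i] pairs with text_embs[i])."""
    hits = 0
    for i, s in enumerate(speech_embs):
        sims = [cosine(s, t) for t in text_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(speech_embs)

# Toy 2-D embeddings with made-up values, for illustration only.
speech = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.8]]
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
accuracy = retrieval_accuracy(speech, text)
```

A more tightly coupled joint space would be expected to give a higher score under a measure of this kind, since each utterance lies closer to its own transcription than to any other.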
License type:
Publisher Copyright
Funding Info:
This research is supported by core funding from: I2R
Grant Reference no.: NIL
Description:
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISBN:
979-8-3503-9690-4
Files uploaded:
je-end2end-slt-2022.pdf (PDF, 1.87 MB)