Singlish Message Paraphrasing: A Joint Task of Creole Translation and Text Normalization

Title:
Singlish Message Paraphrasing: A Joint Task of Creole Translation and Text Normalization
Other Titles:
Proceedings of the 29th International Conference on Computational Linguistics
DOI:
Keywords:
Publication Date:
12 October 2022
Citation:
Zhengyuan Liu, Shikang Ni, Ai Ti Aw, and Nancy F. Chen. 2022. Singlish Message Paraphrasing: A Joint Task of Creole Translation and Text Normalization. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3924–3936, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Abstract:
Within the natural language processing community, English is by far the most resource-rich language. There is emerging interest in conducting translation via computational approaches to conform its dialects or creole languages back to standard English. This computational approach paves the way to leverage generic English language backbones, which are beneficial for various downstream tasks. However, in practical online communication scenarios, the use of language varieties is often accompanied by noisy user-generated content, making this translation task more challenging. In this work, we introduce a joint paraphrasing task of creole translation and text normalization of Singlish messages, which can shed light on how to process other language varieties and dialects. We formulate the task in three different linguistic dimensions: lexical level normalization, syntactic level editing, and semantic level rewriting. We build an annotated dataset of Singlish-to-Standard English messages, and report performance on a perturbation-resilient sequence-to-sequence model. Experimental results show that the model produces reasonable generation results, and can improve the performance of downstream tasks like stance detection.
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research is supported by core funding from: I2R
Grant Reference no.: SC20/21-816400
Description:
ACL Anthology ID:
2022.coling-1.345
Files uploaded:

File Size Format
singlish-coling-cameraready.pdf 1.72 MB PDF