NUT-RC: Noisy User-generated Text-oriented Reading Comprehension

Other Titles:
International Committee on Computational Linguistics
Publication Date:
08 December 2020
Reading comprehension (RC) on social media such as Twitter is a critical and challenging task because the text is noisy and informal yet informative. Most existing RC models are developed on formal datasets such as news articles and Wikipedia documents, which severely limits their performance when they are applied directly to noisy, informal social media text. Moreover, these models focus on only one type of RC, extractive or generative, and ignore the integration of the two. To address these challenges, we propose a noisy user-generated text-oriented RC model (NUT-RC). In particular, we first introduce a set of text normalizers that transform noisy, informal text into formal text. Then, we integrate the extractive and generative RC models through a multi-task learning mechanism and an answer selection module. Experimental results on TweetQA demonstrate that our NUT-RC model significantly outperforms state-of-the-art social media-oriented RC models.
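The two-stage idea in the abstract (normalize the noisy text, then choose between an extractive and a generative answer) can be illustrated with a minimal sketch. The abstract does not list the actual normalizers or the selection criterion, so everything below is an assumption made for illustration: the three normalizers (`strip_urls`, `split_hashtag`, `squeeze_elongation`), the `normalize` pipeline, and a `select_answer` function that simply keeps the highest-scoring candidate are hypothetical and are not the paper's implementation.

```python
import re


def strip_urls(text):
    # Hypothetical normalizer: drop http/https links, which carry little answer content.
    return re.sub(r"https?://\S+", "", text)


def split_hashtag(text):
    # Hypothetical normalizer: "#BreakingNews" -> "Breaking News".
    def repl(match):
        words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", match.group(1))
        return " ".join(words) if words else match.group(1)

    return re.sub(r"#(\w+)", repl, text)


def squeeze_elongation(text):
    # Hypothetical normalizer: cap any run of 3+ identical characters at 2,
    # e.g. "Sooooo" -> "Soo". A real system would also map to a dictionary word.
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


def normalize(text):
    # Apply the normalizers in sequence, then collapse whitespace.
    for step in (strip_urls, split_hashtag, squeeze_elongation):
        text = step(text)
    return re.sub(r"\s+", " ", text).strip()


def select_answer(candidates):
    # Assumed selection rule: keep the candidate with the highest score.
    # `candidates` is a list of (answer_text, score) pairs, e.g. one from an
    # extractive model and one from a generative model.
    return max(candidates, key=lambda pair: pair[1])[0]


tweet = "Sooooo happy #BreakingNews http://t.co/abc"
print(normalize(tweet))
print(select_answer([("in Paris", 0.4), ("Paris", 0.9)]))
```

In this sketch the selector only compares scalar scores; how the scores are produced, and how the two RC models are trained jointly, is left to the paper itself.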
License type:
Funding Info:
This research is supported by the National Natural Science Foundation of China (No. 61703293, No. 61751206, and No. 61672368) and the National Key Research and Development Program of China (2017YFB1002104). This work was also supported by the joint research project of Alibaba and Soochow University. Finally, we would like to thank the anonymous reviewers for their insightful comments and suggestions.
