Extraction of Indonesian and English Parallel Sentences from Movie Subtitles

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/13904

Title:

Extraction of Indonesian and English Parallel Sentences from Movie Subtitles

Journal Title:

International Conference on Asian Language Processing (IALP) 2017

DOI:

Publication URL:

Authors:

Ai Ti Aw, Xuancong Wang, Boon Hong Yeo

Keywords:

Publication Date:

05 December 2017

Citation:

Abstract:

A parallel corpus is an essential resource for developing a machine translation engine. The size and coverage of the parallel corpus available for training directly affect the translation accuracy of the engine. To acquire more training data for developing a translation engine in the conversational domain, we propose a method that extracts parallel data from movie subtitles using dynamic time warping, cosine similarity, and a beam search algorithm. The proposed method extracts parallel sentences at a rate of 30% from a set of Indonesian-English movie subtitles, with a precision of 98%.
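
The abstract names dynamic time warping and cosine similarity but does not say how they are combined, so the following is only a minimal illustrative sketch and not the authors' implementation. It assumes that each subtitle line carries a start time in seconds, that dynamic time warping is run over those start times, and that cosine similarity is computed on bag-of-words counts after both sides have been mapped to a shared vocabulary (for example through a bilingual dictionary). The function names `dtw_align` and `cosine_similarity` are hypothetical, and the beam search step is omitted.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists using bag-of-words counts.

    Assumption: both token lists are already in a shared vocabulary.
    """
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def dtw_align(src_times, tgt_times):
    """Align two lists of subtitle start times (seconds) with classic DTW.

    Assumption: the local cost is the absolute difference in start time.
    Returns the (i, j) index pairs on the minimum-cost warping path.
    """
    n, m = len(src_times), len(tgt_times)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning the first i and first j items
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(src_times[i - 1] - tgt_times[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor cell.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path
```

In a pipeline of this kind, the index pairs returned by `dtw_align` would serve as candidate sentence pairs, and a threshold on `cosine_similarity` could then filter out pairs whose content does not match. The threshold value and the vocabulary mapping are left open here because the abstract does not specify them.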

License type:

PublisherCopyrights

Funding Info:

Description:

URI:

https://oar.a-star.edu.sg/communities-collections/articles/13904

ISBN:

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
parallel-sentence-matching-v4.docx	214.06 KB	DOCX