Extraction of Indonesian and English Parallel Sentences from Movie Subtitles

Title:
Extraction of Indonesian and English Parallel Sentences from Movie Subtitles
Journal Title:
International Conference on Asian Language Processing (IALP) 2017
DOI:
Publication URL:
Keywords:
Publication Date:
05 December 2017
Citation:
Abstract:
A parallel corpus is an essential resource for developing a machine translation engine. The size and coverage of the parallel corpus available for training directly affect the translation accuracy of the engine. To acquire more training data for developing a translation engine in the conversational domain, we propose a method to extract parallel data from movie subtitles using dynamic time warping, cosine similarity, and a beam search algorithm. The proposed method is capable of extracting 30% parallel sentences from a set of Indonesian-English movie subtitles with a precision of 98%.
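The abstract names the building blocks but not how they are combined, so the following is only a minimal sketch of how dynamic time warping and cosine similarity could be used together to align subtitle lines. It is not the authors' implementation. Everything here is an assumption made for illustration: the toy `lexicon` that maps Indonesian words to English, the bag-of-words representation, the `threshold` value, and the function names. The beam search step mentioned in the abstract is left out.

```python
import math
from collections import Counter


def bag(sentence, lexicon=None):
    """Lower-case and split a sentence; optionally map words through a toy lexicon."""
    words = sentence.lower().split()
    if lexicon is not None:
        words = [lexicon.get(w, w) for w in words]
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def dtw_align(src, tgt, lexicon, threshold=0.5):
    """Align two sentence lists with DTW and keep pairs above a similarity threshold.

    The local cost of matching src[i] with tgt[j] is 1 - cosine similarity.
    The threshold is a hypothetical filter, not a value from the paper.
    """
    n, m = len(src), len(tgt)
    sim = [[cosine(bag(s, lexicon), bag(t)) for t in tgt] for s in src]

    # Accumulated cost table with a padded first row and column.
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 1.0 - sim[i - 1][j - 1]
            cost[i][j] = step + min(cost[i - 1][j - 1],
                                    cost[i - 1][j],
                                    cost[i][j - 1])

    # Backtrack along the cheapest warping path, collecting confident pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if sim[i - 1][j - 1] >= threshold:
            pairs.append((src[i - 1], tgt[j - 1]))
        best = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
        if best == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif best == cost[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy data: the lexicon and sentences are made up for this example.
lexicon = {"aku": "i", "cinta": "love", "kamu": "you",
           "selamat": "good", "pagi": "morning"}
src = ["aku cinta kamu", "selamat pagi"]
tgt = ["i love you", "good morning"]
pairs = dtw_align(src, tgt, lexicon)
print(pairs)
```

On this toy input the sketch pairs each Indonesian line with its English counterpart. Real subtitle files would also carry timestamps, which could feed into the local cost alongside the lexical similarity; whether and how the paper does this is not stated in the abstract.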
License type:
PublisherCopyrights
Funding Info:
Description:
ISBN:

Files uploaded:

File                                Size       Format
parallel-sentence-matching-v4.docx  214.06 KB  DOCX