CMDR: Classifying Nodes for Mining Data Records with Different HTML Structures

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/13783

Title:

CMDR: Classifying Nodes for Mining Data Records with Different HTML Structures

Journal Title:

IEEE TENCON2017

DOI:

Publication URL:

Authors:

Kar Wai Fok, Wee Yong Lim, Vrizlynn L. L. Thing, Victor Pomponiu

Keywords:

Computing Science

Publication Date:

05 November 2017

Citation:

Abstract:

This paper addresses the problem of automatically extracting structured data records from web pages. In particular, we focus on the extraction of posts from online forum sites. We show that variability in the HTML structure within user-generated content in forum posts can negatively affect extraction accuracy, and we propose integrating a deep learning node classifier into the popular Mining Data Regions (MDR) process proposed in prior work. Experiments on a dataset of forum web pages containing posts with varying HTML structures indicate the merits of the proposed modification to MDR.

License type:

PublisherCopyrights

Funding Info:

Singapore National Research Foundation

Description:

URI:

https://oar.a-star.edu.sg/communities-collections/articles/13783

ISBN:

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
cmdr-tencon-camerareadyversion-0929.pdf	368.20 KB	PDF