Supporting Exploratory Hypothesis Testing and Analysis

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/12862

Title:

Supporting Exploratory Hypothesis Testing and Analysis

Journal Title:

ACM Transactions on Knowledge Discovery from Data

DOI:

10.1145/2701430

Publication URL:

http://dx.doi.org/10.1145/2701430

Authors:

Guimei Liu, Haojun Zhang, Mengling Feng, Limsoon Wong, See-Kiong Ng

Keywords:

Computing Science

Publication Date:

01 June 2015

Abstract:

Conventional hypothesis testing is carried out in a hypothesis-driven manner. A scientist must first formulate a hypothesis based on what he/she sees, and then devise a variety of experiments to test it. Given the rapid growth of data, it has become virtually impossible for a person to manually inspect all the data to find all the interesting hypotheses for testing. In this paper, we propose and develop a data-driven framework for automatic hypothesis testing and analysis. We define a hypothesis as a comparison between two or more sub-populations. We find sub-populations for comparison using frequent pattern mining techniques and then pair them up for statistical hypothesis testing. We also generate additional information for further analysis of the hypotheses that are deemed significant. The number of hypotheses generated can be very large and many of them are very similar. We develop algorithms to remove redundant hypotheses and present a succinct set of significant hypotheses to users. We conducted a set of experiments to show the efficiency and effectiveness of the proposed algorithms. The results show that our system can help users (1) identify significant hypotheses efficiently; (2) isolate the reasons behind significant hypotheses efficiently; and (3) find confounding factors that form Simpson’s Paradoxes with discovered significant hypotheses.
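The abstract outlines a pipeline: find frequent sub-populations, pair them up, test each pair, and look for confounders that form a Simpson's paradox with a discovered difference. The Python sketch below is a minimal, hypothetical illustration of that idea, not the authors' algorithm. It assumes that records are dicts of categorical attributes with a boolean outcome, and that a hypothesis compares two sub-populations sharing a context and differing in the value of one attribute. It also assumes that frequent patterns are found by brute-force counting rather than an efficient miner, and that each pair is tested with a Pearson chi-square test on a 2x2 table with no multiple-testing correction. Redundancy removal is omitted. All function names (`frequent_patterns`, `significant_hypotheses`, `is_simpsons_paradox`, `toy_records`) are illustrative. The toy data reuse the widely cited kidney-stone treatment counts that are often used to illustrate Simpson's paradox.

```python
from collections import Counter
from itertools import combinations
from math import erfc, sqrt


def frequent_patterns(records, attrs, min_sup, max_len=2):
    """Brute-force count of every conjunction of up to max_len attribute=value
    pairs (hypothetical helper, not an efficient miner); keep support >= min_sup."""
    counts = Counter()
    for r in records:
        for k in range(max_len + 1):
            for combo in combinations(attrs, k):
                counts[frozenset((a, r[a]) for a in combo)] += 1
    return {p: n for p, n in counts.items() if n >= min_sup}


def outcome_counts(records, pattern, target):
    """(positives, negatives) of the boolean `target` within sub-population `pattern`."""
    rows = [r for r in records if all(r[a] == v for a, v in pattern)]
    pos = sum(1 for r in rows if r[target])
    return pos, len(rows) - pos


def chi2_pvalue(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on [[a, b], [c, d]]."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / den
    return erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 dof


def significant_hypotheses(records, attrs, target, min_sup=5, alpha=0.05, max_len=2):
    """Compare ctx+{attr=v1} with ctx+{attr=v2} for every frequent context ctx and
    keep pairs whose outcome rates differ at level alpha (uncorrected)."""
    freq = frequent_patterns(records, attrs, min_sup, max_len)
    found = []
    for ctx in freq:
        if len(ctx) >= max_len:
            continue
        used = {a for a, _ in ctx}
        for attr in attrs:
            if attr in used:
                continue
            # Values v for which ctx extended by attr=v is itself frequent.
            values = sorted(
                v for pat in freq if len(pat) == len(ctx) + 1 and ctx < pat
                for a, v in pat - ctx if a == attr
            )
            for v1, v2 in combinations(values, 2):
                pval = chi2_pvalue(*outcome_counts(records, ctx | {(attr, v1)}, target),
                                   *outcome_counts(records, ctx | {(attr, v2)}, target))
                if pval < alpha:
                    found.append((dict(ctx), attr, v1, v2, pval))
    return found


def rate(pos, neg):
    return pos / (pos + neg) if pos + neg else 0.0  # empty group counts as 0


def is_simpsons_paradox(records, ctx, attr, v1, v2, target, confounder):
    """True if the v1-minus-v2 outcome rate difference has one sign overall but
    the opposite sign within every value of `confounder` (inside context ctx)."""
    def diff(pattern):
        return (rate(*outcome_counts(records, pattern | {(attr, v1)}, target))
                - rate(*outcome_counts(records, pattern | {(attr, v2)}, target)))

    overall = diff(ctx)
    strata = {r[confounder] for r in records if all(r[a] == v for a, v in ctx)}
    if overall == 0 or not strata:
        return False
    return all(diff(ctx | {(confounder, w)}) * overall < 0 for w in strata)


def toy_records():
    """Kidney-stone counts: (treatment, stone size, successes, patients)."""
    records = []
    for treat, size, pos, total in [("A", "small", 81, 87), ("A", "large", 192, 263),
                                    ("B", "small", 234, 270), ("B", "large", 55, 80)]:
        records += [{"treatment": treat, "size": size, "y": i < pos} for i in range(total)]
    return records


if __name__ == "__main__":
    recs = toy_records()
    for h in significant_hypotheses(recs, ["treatment", "size"], "y", alpha=0.2):
        print(h)
    print(is_simpsons_paradox(recs, frozenset(), "treatment", "A", "B", "y", "size"))
```

On the toy data, treatment A has the lower success rate overall (273/350 versus 289/350) but the higher rate within both stone-size strata, so `is_simpsons_paradox` returns `True` with stone size as the confounder. A loose `alpha` is used here only because the toy sample is small.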

License type:

PublisherCopyrights

Funding Info:

Supported in part by Singapore Agency for Science, Technology and Research grant SERC 102 1010 0030

URI:

https://oar.a-star.edu.sg/communities-collections/articles/12862

ISSN:

1556-4681
1556-472X

Collections:

Institute for Infocomm Research

Manuscripts in This Item:

File	Size	Format
ehta.pdf	188.01 KB	PDF