Supporting Exploratory Hypothesis Testing and Analysis

Other Titles:
ACM Transactions on Knowledge Discovery from Data
Publication Date:
01 June 2015
Conventional hypothesis testing is carried out in a hypothesis-driven manner. A scientist must first formulate a hypothesis based on what he/she sees, and then devise a variety of experiments to test it. Given the rapid growth of data, it has become virtually impossible for a person to manually inspect all the data to find all the interesting hypotheses for testing. In this paper, we propose and develop a data-driven framework for automatic hypothesis testing and analysis. We define a hypothesis as a comparison between two or more sub-populations. We find sub-populations for comparison using frequent pattern mining techniques and then pair them up for statistical hypothesis testing. We also generate additional information for further analysis of the hypotheses that are deemed significant. The number of hypotheses generated can be very large and many of them are very similar. We develop algorithms to remove redundant hypotheses and present a succinct set of significant hypotheses to users. We conducted a set of experiments to show the efficiency and effectiveness of the proposed algorithms. The results show that our system can help users (1) identify significant hypotheses efficiently; (2) isolate the reasons behind significant hypotheses efficiently; and (3) find confounding factors that form Simpson’s Paradoxes with discovered significant hypotheses.
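The general idea of pairing up frequent sub-populations for testing can be illustrated with a minimal sketch. This is not the paper's algorithm: a brute-force enumeration stands in for frequent pattern mining, a pooled two-proportion z-test is assumed as the statistical test, and redundancy removal and multiple-testing correction are left out. All function names, parameters, and the toy data are hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt


def frequent_patterns(rows, attrs, min_support, max_len=2):
    """Map each attribute-value pattern with enough support to its rows.

    A pattern is a frozenset of (attribute, value) pairs. This is a
    brute-force enumeration (assumption), not an optimized pattern miner.
    """
    found = {}
    for k in range(1, max_len + 1):
        for combo in combinations(attrs, k):
            groups = {}
            for row in rows:
                key = frozenset((a, row[a]) for a in combo)
                groups.setdefault(key, []).append(row)
            for key, members in groups.items():
                if len(members) >= min_support:
                    found[key] = members
    return found


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both groups have identical, degenerate outcomes
    z = (x1 / n1 - x2 / n2) / se
    return erfc(abs(z) / sqrt(2))


def generate_hypotheses(rows, attrs, target, positive, min_support=5, alpha=0.05):
    """Pair sub-populations that differ in one attribute's value and test them."""
    patterns = frequent_patterns(rows, attrs, min_support)
    results = []
    for p1, p2 in combinations(patterns, 2):
        only1, only2 = p1 - p2, p2 - p1
        if len(p1) != len(p2) or len(only1) != 1 or len(only2) != 1:
            continue
        attr1, _ = next(iter(only1))
        attr2, _ = next(iter(only2))
        if attr1 != attr2:
            continue  # the two patterns must disagree on the same attribute
        g1, g2 = patterns[p1], patterns[p2]
        x1 = sum(r[target] == positive for r in g1)
        x2 = sum(r[target] == positive for r in g2)
        p = two_proportion_p_value(x1, len(g1), x2, len(g2))
        if p < alpha:
            results.append((dict(p1), dict(p2), p))
    return sorted(results, key=lambda h: h[2])  # most significant first


# Toy data (made up): recovery under two drugs.
rows = (
    [{"drug": "A", "outcome": "recovered"}] * 40
    + [{"drug": "A", "outcome": "not recovered"}] * 10
    + [{"drug": "B", "outcome": "recovered"}] * 20
    + [{"drug": "B", "outcome": "not recovered"}] * 30
)
for left, right, p in generate_hypotheses(rows, ["drug"], "outcome", "recovered"):
    print(left, "vs", right, "p-value:", p)
```

Restricting pairs to patterns that differ in a single attribute value keeps each comparison interpretable: the shared items describe the context, and the differing item is the factor being compared.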
License type:
Funding Info:
Supported in part by the Singapore Agency for Science, Technology and Research, grant SERC 102 1010 0030.
Files uploaded:

File       Size        Format
ehta.pdf   188.01 KB   PDF