The push to introduce faster supercomputers is generating ever-larger data sets, challenging traditional methods of data analysis and visualization. While the computational power of supercomputers keeps increasing with every generation, their I/O systems have not kept pace, resulting in a significant performance bottleneck. This has greatly hampered the performance of data analysis and visualization in the big data era.
We propose a solution, VisDSI (Visualization via a Distributed Storage Infrastructure), to address this problem and eliminate the I/O bottleneck by 1) using traditional high-performance clusters with disks attached directly to each node; 2) deploying a data-intensive distributed file system on the cluster and exposing its data locality information to visualization applications; and 3) developing a POSIX-compatible I/O layer that lets traditional visualization applications port smoothly to the new platform and that provides an I/O semantic, new with respect to the POSIX standard, for retrieving data locations. In particular, VisDSI guarantees that computation and data storage are co-located by scheduling work assignments onto nodes that hold local copies of the needed data. Our solution runs at least 10 times faster than the original visualization application.
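The locality-aware scheduling idea can be illustrated with a minimal sketch. This is not the VisDSI scheduler itself: the function `assign_blocks`, the `block_locations` mapping, and the greedy least-loaded policy are all assumptions made for illustration. The sketch only shows how block-location information, once exposed by the file system, could be used to place each unit of work on a node that already stores the data.

```python
def assign_blocks(block_locations, nodes):
    """Greedily assign each data block to a node holding a local replica.

    block_locations: dict mapping block id -> list of nodes storing a replica
                     (hypothetical input; a real system would query the
                     distributed file system for this information).
    nodes:           list of compute nodes available for visualization work.
    Returns a dict mapping block id -> chosen node.
    """
    load = {node: 0 for node in nodes}
    assignment = {}
    # Handle blocks with the fewest replicas first, since they have the
    # fewest placement options; ties are broken by block id for determinism.
    ordered = sorted(block_locations,
                     key=lambda b: (len(block_locations[b]), b))
    for block in ordered:
        # Keep only replica holders that are actually available.
        local = [n for n in block_locations[block] if n in load]
        # Fall back to any node (a remote read) if no local replica exists.
        candidates = local if local else list(nodes)
        # Among the candidates, pick the least-loaded node.
        chosen = min(candidates, key=lambda n: (load[n], n))
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


if __name__ == "__main__":
    locations = {
        "b0": ["n1", "n2"],
        "b1": ["n1"],
        "b2": ["n2", "n3"],
        "b3": ["n3"],
    }
    plan = assign_blocks(locations, ["n1", "n2", "n3"])
    for block in sorted(plan):
        print(block, "->", plan[block])
```

In this sketch every block that has a replica on an available node is read locally, and a remote read happens only when no replica holder is available. A production scheduler would also need to account for block sizes, node failures, and load imbalance.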