Project Description: We are living in an increasingly connected world, where the Big Data movement has resulted in not only more data but, more importantly, more connected data, such as, social networks, knowledge graphs and deep neural networks. Sub-graph isomorphism, which finds sub-graphs of interest from an enormous data graph, is a fundamental tool for an array of critical applications, e.g., cyber-security, criminal detection and health care. In spite of such a great potential, identifying all isomorphic sub-graphs is a challenge in the Big Data context due to high computation complexity and memory consumption. This project addresses this issue, and will benefit both industrial and academic communities, as well as train undergraduate, underrepresented and STEM high school students for high-performance data analytics (HPDA) research.
Conventional efforts split the query graph into two disjoint parts for prune and join, which impairs the prune strength of the query graph and results in a large volume of unpromising candidates for the 'join' phase. This project advocates an entire query graph-based prune approach to resolve the computation and memory challenge. In particular, this research consists of two parts: 1) algorithms research -- provenance-aware intersection-based candidate construction -- will greatly reduce the false positives faced by the conventional approaches; 2) systems research, i.e., GPU-enabled massively parallel provenance group intersection, will tackle the bottleneck that is encountered on conventional CPU platforms, through use of Graph Processing Unit (GPU) acceleration.