Simply knowing a new approach is faster is much less useful than knowing why the new approach is faster. In the case of graph processing research, many projects span multiple layers of the computational stack, making it difficult to determine which factors are responsible for a performance improvement. Understanding these performance improvements requires metrics more introspective than execution time.
We present the Graph Algorithm Iron Law (GAIL) to weigh the tradeoffs between the three most important graph algorithm performance factors: algorithm, implementation, and hardware platform. To quantify algorithmic work, we annotate the code to count the number of edges traversed. From our characterization work, the interaction between the implementation and the platform is best summarized by examining locality (quantified by memory requests) and memory bandwidth utilization. With GAIL, we factor these terms analogously to the CPU Iron Law:
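Following the factorization described above, and by analogy with the CPU Iron Law (time/program = instructions/program × cycles/instruction × seconds/cycle), the three factors can be written as a telescoping product (the exact term names here paraphrase the factors identified in the text):

\[
\frac{\text{seconds}}{\text{kernel}} \;=\; \frac{\text{traversed edges}}{\text{kernel}} \times \frac{\text{memory requests}}{\text{traversed edge}} \times \frac{\text{seconds}}{\text{memory request}}
\]

The first term captures algorithmic work, the second captures locality (the implementation–platform interaction), and the third captures delivered memory bandwidth.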
As a simple example of using GAIL, we compare the performance of three breadth-first search (BFS) implementations traversing the same input graph on the same hardware platform. From the execution time alone, we see that the direction-optimizing implementation is the fastest, but there is no indication of what is driving the speedup. A different implementation is best at each metric (lower is better), but with GAIL, these differences can be quantitatively weighed to assess tradeoffs. In this case, the algorithmic reduction in edges traversed by the direction-optimizing implementation trumps its poor cache locality and memory bandwidth utilization.
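A comparison of this kind can be sketched numerically. The snippet below decomposes a measured execution time into the three GAIL terms; the edge counts, memory-request counts, and times are hypothetical placeholders chosen only to illustrate the tradeoff (they are not the paper's measured data), and the function name `gail_factors` is likewise ours.

```python
# GAIL decomposition sketch:
#   seconds/kernel = (edges/kernel) x (requests/edge) x (seconds/request)
# All counts below are hypothetical, for illustration only.

def gail_factors(edges_traversed, mem_requests, seconds):
    """Return the three GAIL terms; their product equals `seconds`."""
    return (edges_traversed,
            mem_requests / edges_traversed,  # locality: requests per edge
            seconds / mem_requests)          # inverse of delivered bandwidth

# Two hypothetical BFS runs on the same graph and platform:
conventional = gail_factors(edges_traversed=2.0e9,
                            mem_requests=1.0e9, seconds=2.0)
direction_opt = gail_factors(edges_traversed=0.2e9,  # far fewer edges,
                             mem_requests=0.4e9,     # but worse locality
                             seconds=0.5)

for name, (e, rpe, spr) in [("conventional", conventional),
                            ("direction-optimizing", direction_opt)]:
    print(f"{name}: edges={e:.1e}, requests/edge={rpe:.2f}, "
          f"s/request={spr:.2e}, product={e * rpe * spr:.2f} s")
```

With these placeholder numbers, the direction-optimizing run has a worse (higher) requests-per-edge term, yet its order-of-magnitude reduction in edges traversed still yields a lower overall time, which is exactly the kind of tradeoff the factorization makes visible.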