Skip to content
icoming edited this page Jun 16, 2014 · 22 revisions

FlashGraph is an SSD-based semi-external memory graph processing engine, and is optimized for a high-speed SSD array. It enables us to process a billion-node graph in a single machine and has performance comparable to or exceed in-memory graph engines such as PowerGraph. It also has very short loading time.

Comparison to PowerGraph

Here is some performance result of FlashGraph and PowerGraph on large graphs in a single machine. We have five graph applications: breadth-first search (BFS), triangle counting (TC), weakly connected components (WCC), scan statistics (SS), page rank (PR). Each of them has a FlashGraph implementation and a PowerGraph implementation. We use these applications to compare the performance of FlashGraph and PowerGraph on the twitter graph (42M vertices and 1.5B edges) and the subdomain graph (89M vertices and 2B edges). Figure 8 shows the runtime of the applications in both graph engines and Figure 9 shows the memory consumption of FlashGraph and PowerGraph. Although FlashGraph runs in the semi-external memory mode, it can still significantly outperform PowerGraph in most graph applications and it uses a small fraction of the memory used by PowerGraph.

runtime on twitter and subdomain

memory consumption

FlashGraph on a billion-node graph

FlashGraph enables us to process a very large graph in a single machine. Here we demonstrate that FlashGraph can process a real-world hyperlink page graph (3.4B vertices and 129B edges) in a single machine. The table below show the performance of the graph applications as above as well as diameter estimation (DE) on the page graph.

runtime on page graph

Comparison to other graph engines on large graphs

FlashGraph takes only eight minutes to traverse the page graph (3.5B vertices and 128B edges) with a cache size of 4GB and less than six minutes with a cache size of 256 GB. Pregel used 300 multicore machines to run the shortest path algorithm on their largest random graph (1B vertices and 127B edges) and took a little over ten minutes. More recently, Trinity took over ten minutes to perform breadth-first search on a graph of one billion vertices and 13 billion edges on 14 12-core machines.

FlashGraph takes 75 minutes to run 30 iterations of PageRank on the page graph in 75 minutes. Based on a recent talk by the main developer of Giraph, Giraph running with 20 workers can run five iterations of PageRank on a graph with 5 billion edges in approx. 75 minutes.

Clone this wiki locally