Home
FlashGraph is an SSD-based semi-external memory graph processing engine. It keeps vertex state in memory and stores the edge lists of vertices on SSDs. It is optimized for a high-speed SSD array or other high-speed flash storage, and its current implementation is tightly integrated with SAFS (Set Associative File System) to take advantage of the high I/O throughput of an SSD array. Because of the nature of a semi-external memory graph engine, FlashGraph has a very short load time and small memory consumption when processing very large graphs, which enables us to process a billion-node graph on a single machine. Thanks to the high-speed SSDs, its performance is also comparable to, or exceeds, that of in-memory graph engines such as PowerGraph.
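To make the semi-external memory idea concrete, here is a minimal Python sketch of a breadth-first search that keeps only per-vertex state in memory and reads each edge list from a file on demand. It is an illustration under stated assumptions, not FlashGraph's API or implementation: the function names (`write_edge_file`, `read_edge_list`, `semi_external_bfs`, `demo`), the binary file layout, and the in-memory offset index are all hypothetical, and ordinary file reads stand in for SAFS and the SSD array.

```python
import os
import struct
import tempfile
from collections import deque


def write_edge_file(path, adjacency):
    """Write every vertex's edge list to disk and return an in-memory index.

    The index holds one (byte offset, degree) pair per vertex. The layout
    (little-endian 32-bit neighbor IDs) is an assumption for this sketch.
    """
    index = []
    with open(path, "wb") as f:
        for neighbors in adjacency:
            index.append((f.tell(), len(neighbors)))
            f.write(struct.pack(f"<{len(neighbors)}I", *neighbors))
    return index


def read_edge_list(f, index, v):
    """Fetch the edge list of vertex v from storage only when it is needed."""
    offset, degree = index[v]
    f.seek(offset)
    return struct.unpack(f"<{degree}I", f.read(4 * degree))


def semi_external_bfs(path, index, source):
    """BFS in which only the per-vertex level array lives in memory."""
    level = [-1] * len(index)  # vertex state: kept in memory (-1 = unvisited)
    level[source] = 0
    frontier = deque([source])
    with open(path, "rb") as f:
        while frontier:
            v = frontier.popleft()
            for u in read_edge_list(f, index, v):  # edges: read from storage
                if level[u] == -1:
                    level[u] = level[v] + 1
                    frontier.append(u)
    return level


def demo():
    """Run the sketch on a toy directed graph with five vertices."""
    adjacency = [[1, 2], [3], [3], [], [0]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "edges.bin")
        index = write_edge_file(path, adjacency)
        return semi_external_bfs(path, index, source=0)
```

Calling `demo()` returns the BFS level of each vertex from vertex 0, with `-1` for vertices that are unreachable. In this sketch, memory use grows with the number of vertices (the level array and the offset index) rather than with the number of edges.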
Here are some performance results for FlashGraph and PowerGraph on large graphs on a single machine. We have five graph applications: breadth-first search (BFS), triangle counting (TC), weakly connected components (WCC), scan statistics (SS), and PageRank (PR). Each of them has a FlashGraph implementation and a PowerGraph implementation. We use these applications to compare the performance of FlashGraph and PowerGraph on the Twitter graph (42M vertices and 1.5B edges) and the subdomain graph (89M vertices and 2B edges). Figure 8 shows the runtime of the applications in both graph engines, and Figure 9 shows the memory consumption of FlashGraph and PowerGraph. Although FlashGraph runs in semi-external memory mode, it still significantly outperforms PowerGraph in most graph applications while using a small fraction of the memory used by PowerGraph.
FlashGraph enables us to process a very large graph on a single machine. Here we demonstrate that FlashGraph can process a real-world hyperlink page graph (3.4B vertices and 129B edges) on a single machine. The table below shows the performance of the graph applications above, as well as diameter estimation (DE), on the page graph.