Skip to content
icoming edited this page Jun 17, 2014 · 22 revisions

FlashGraph is a semi-external memory graph processing engine, optimized for a high-speed SSD array. It enables the processing of billion-node graphs in a single machine and has performance comparable to or exceeding in-memory graph engines such as PowerGraph. Furthermore, it notably provides significantly shorter loading time than PowerGraph due to the nature of semi-external memory graph engines.

FlashGraph targets programmers who need to implement parallel graph algorithms to process large graphs. FlashGraph provides a flexible vertex-centric programming interface so that users may express many graph algorithms as well as custom algorithm-specific optimizations with minimum effort. Users' graph algorithms are implemented in vertex programs and FlashGraph executes the vertex programs in parallel and out of core. FlashGraph hides the complexity of parallel and out-of-core execution of graph algorithms so that users only need to implement serial code and access data in memory. FlashGraph is implemented with C++.

We further implement graph algorithms on top of FlashGraph and pack them as a graph library to benefit end users, much like iGraph has done for serial algorithm implementations. Currently, the graph algorithms below are distributed with FlashGraph with many more to follow in future versions!

  • SSSP
  • k-core
  • diameter estimation
  • triangle counting
  • scan statistics in a single graph as well as in a time-series graph
  • pagerank
  • weakly connected components
  • strongly connected components
  • eigensolver on a sparse symmetric matrix
  • SVD on a sparse matrix

FlashGraph maintains user-defined vertex state in memory and accesses the edge lists of vertices from SSDs through SAFS (Set Associative File System), as shown in the architecture below. It is tightly integrated with SAFS to take advantage of the high I/O throughput of an SSD array. FlashGraph loads the edge lists of vertices to memory only when they are required by the graph algorithms at runtime. Part of vertex programs, provided by users , is executed in FlashGraph and part of it is executed in SAFS to overlap I/O with computation and also minimize the overhead of memory copy.

Architecture

Clone this wiki locally