The GraphStorm V0.3.1 release contains a few major feature enhancements. In this version, we have reorganized the overall documentation and tutorials to give users a smoother learning curve. The new documentation is organized into four sections: i) Getting Started, which offers a concise tutorial on using GraphStorm; ii) Command Line Interface User Guide, which provides an overview of the GraphStorm command line interfaces (CLIs); iii) Programming Interface User Guide, which details the application programming interfaces (APIs) of GraphStorm; and iv) Advanced Topics, which explores complex subjects such as custom model implementation, link prediction training optimization, and multi-task learning. In addition, we have enhanced the distributed graph processing functionalities to improve the user experience. We also provide four notebook examples that demonstrate how to use GraphStorm APIs to develop custom models and training/inference pipelines.

Major features

Reorganized the documentation and tutorials to group the main contents under two top-level menus, i.e., COMMAND LINE INTERFACE USER GUIDE and PROGRAMMING INTERFACE USER GUIDE. #956
- Under the CLI user guide menu, regrouped the contents into two 2nd-level menus, i.e., GraphStorm Graph Construction and GraphStorm Model Training and Inference.
  - Under the GraphStorm Graph Construction, added a new document, Input Raw Data Specification, to explain the specifications of the input data, and provide a simple raw data example. #996
  - Added a new document, Single Machine Graph Construction, to introduce the gconstruct module, and provide a simple construction configuration JSON example. #996
  - In the Distributed Graph Construction, reorganized the document structure of GSProcessing. #907
- Renamed DISTRIBUTED TRAINING to GraphStorm Model Training and Inference and moved it under COMMAND LINE INTERFACE USER GUIDE. #956
  - Added a new Model Training and Inference on a Single Machine 2nd-level menu to explain the launch commands.
    - Moved the Model Training and Inference Configurations section under it. #969
    - Added a new GraphStorm Training and Inference Output section to explain the intermediate outputs. #964
    - Added a new GraphStorm Output Node ID Remapping section to explain the CLI outputs and the remapping operation. #970
- Under the PROGRAMMING INTERFACE USER GUIDE menu,
  - Added new notebooks with API examples. #919 #929
  - Revised all docstrings of released APIs. #934 #941 #950 #952
- Refined the hard negative tutorial and the multi-task learning tutorial. #898 #944
Added a new GSProcessing launch script for EMR on EC2 that allows users to run a GSProcessing job as an EMR step, simplifying the user experience. #902

New examples

Add a Jupyter Notebook example that uses GraphStorm APIs to implement a GraphStorm built-in GNN model. #919
Add a Jupyter Notebook example that uses GraphStorm APIs to customize GNN model components. #929

Minor features

Add a hit@k evaluator for both classification and link prediction tasks. #911 #948
Remove the restriction that the save model frequency must be divisible by the evaluation frequency. Users can now set the save model frequency freely. #893 #948
Add a new truncate_dim argument to the GSProcessing no-op transformation and to gconstruct.construct_graph. #922
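
For readers unfamiliar with the hit@k metric mentioned above, the following is a minimal sketch of its common ranking-based form for link prediction: the fraction of positive edges whose rank among the candidates is at most k. The function name and inputs are hypothetical and written only for illustration; this is not GraphStorm's evaluator code, and the built-in evaluator may define or normalize the metric differently, especially for classification tasks.

```python
def hit_at_k(ranks, k):
    """Illustrative hit@k, not the GraphStorm implementation.

    ranks: 1-based rank of each positive edge among its candidates
           (rank 1 means the positive edge received the highest score).
    k:     cutoff; a positive edge counts as a hit when its rank <= k.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for rank in ranks if rank <= k)
    return hits / len(ranks)


# Hypothetical ranks for four positive edges.
example_ranks = [1, 3, 12, 2]
print(hit_at_k(example_ranks, k=3))
```

With these example ranks, three of the four positive edges are ranked within the top 3, so the sketch reports 0.75.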

Breaking changes

Add a new argument norm in the __init__ of GraphStorm classification and regression decoders. This allows users to set layer or batch normalization on the neural network layers of these decoders. Only MLPFeatEdgeDecoder implements the normalization in this release. #948
Rename pos_graph_feat_fields to pos_graph_edge_feat_fields in the GSgnnLinkPredictionDataLoaderBase class to make its meaning clearer. #934
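
To illustrate the difference between the two normalization options that the norm argument refers to, here is a minimal pure-Python sketch. It is a conceptual illustration under simplifying assumptions (no learnable scale or shift, no running statistics), and the function names are hypothetical; it does not show how the GraphStorm decoders implement normalization.

```python
import math


def _standardize(values, eps=1e-5):
    """Shift values to zero mean and scale them to (roughly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def layer_norm(batch, eps=1e-5):
    """Layer normalization: standardize each sample across its features."""
    return [_standardize(row, eps) for row in batch]


def batch_norm(batch, eps=1e-5):
    """Batch normalization: standardize each feature across the batch."""
    columns = [_standardize(list(col), eps) for col in zip(*batch)]
    # Transpose back so the result has one row per sample again.
    return [list(row) for row in zip(*columns)]


# Hypothetical hidden activations: 2 samples, 3 features each.
hidden = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]]
print(layer_norm(hidden))
print(batch_norm(hidden))
```

Layer normalization depends only on the features of a single sample, so its output does not change with batch size, whereas batch normalization uses statistics computed over all samples in the batch.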

Contributors

Xiang Song from AWS
Jian Zhang from AWS
Theodore Vasiloudis from AWS
Runjie Ma from AWS
Han Xie from AWS
