
V0.3.1 Release Note

@zhjwy9343 zhjwy9343 released this 19 Aug 17:27

The GraphStorm V0.3.1 release contains several major feature enhancements. In this version, we have reorganized the overall documentation and tutorials to make GraphStorm easier to learn. The new documentation is organized into four sections: i) Getting Started, which offers a concise tutorial on using GraphStorm; ii) Command Line Interface User Guide, which provides an overview of the GraphStorm command line interfaces (CLI); iii) Programming Interface User Guide, which details the application programming interfaces (API) of GraphStorm; and iv) Advanced Topics, which explores complex subjects such as custom model implementation, link prediction training optimization, and multi-task learning. In addition, we have enhanced the distributed graph processing functionality to improve the user experience. We also provide four notebook examples that demonstrate how to use GraphStorm APIs to develop custom models and training/inference pipelines.

Major features

  • Reorganized the documentation and tutorials to group the main contents under two top-level menus, i.e., COMMAND LINE INTERFACE USER GUIDE and PROGRAMMING INTERFACE USER GUIDE. #956
    • Under the CLI user guide menu, regrouped the contents into two 2nd-level menus, i.e., GraphStorm Graph Construction and GraphStorm Model Training and Inference.
      • Under GraphStorm Graph Construction, added a new document, Input Raw Data Specification, which explains the specifications of the input data and provides a simple raw data example. #996
      • Added a new document, Single Machine Graph Construction, which introduces the gconstruct module and provides a simple construction configuration JSON example. #996
      • In the Distributed Graph Construction, reorganized the document structure of GSProcessing. #907
    • Renamed DISTRIBUTED TRAINING to GraphStorm Model Training and Inference and moved it under COMMAND LINE INTERFACE USER GUIDE. #956
      • Added a new Model Training and Inference on a Single Machine 2nd-level menu to explain the launch commands.
        • Moved the Model Training and Inference Configurations section under it. #969
        • Added a new GraphStorm Training and Inference Output section to explain the intermediate outputs. #964
        • Added a new GraphStorm Output Node ID Remapping section to explain the CLI outputs and the remapping operation. #970
    • Under the PROGRAMMING INTERFACE USER GUIDE menu,
    • Refined the hard negative tutorial and the multi-task learning tutorial. #898 #944
  • Added a new GSProcessing launch script for EMR on EC2 that allows users to run a GSProcessing job as an EMR step, simplifying the user experience. #902

New examples

  • Add a Jupyter Notebook example that uses GraphStorm APIs to implement a GraphStorm built-in GNN model. #919
  • Add a Jupyter Notebook example that uses GraphStorm APIs to customize GNN model components. #929

Minor features

  • Add a hit@k evaluator for both classification and link prediction tasks. #911 #948
  • Remove the restriction that the model saving frequency must be divisible by the evaluation frequency. Users can now set the model saving frequency freely. #893 #948
  • Added a new truncate_dim argument to the GSProcessing no-op transformation and to gconstruct.construct_graph. #922
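
For readers unfamiliar with the hit@k metric mentioned above, the following is a minimal sketch of how it is commonly computed for link prediction, written in plain Python. It is not GraphStorm's implementation: the function names `hit_at_k` and `mean_hit_at_k` are hypothetical, and the tie-handling rule (a negative only outranks the positive when its score is strictly higher) is an assumption made for this illustration.

```python
from typing import Sequence


def hit_at_k(pos_score: float, neg_scores: Sequence[float], k: int) -> int:
    """Return 1 if the positive edge ranks within the top k, else 0.

    Hypothetical helper for illustration only. The rank is 1 plus the
    number of negatives scored strictly higher than the positive, so
    ties are resolved in favor of the positive edge (an assumption).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rank = 1 + sum(1 for score in neg_scores if score > pos_score)
    return int(rank <= k)


def mean_hit_at_k(
    pos_scores: Sequence[float],
    neg_scores_per_pos: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Average hit@k over all positive edges."""
    if len(pos_scores) == 0:
        raise ValueError("at least one positive edge is required")
    if len(pos_scores) != len(neg_scores_per_pos):
        raise ValueError("each positive edge needs its own list of negatives")
    hits = [
        hit_at_k(pos, negs, k)
        for pos, negs in zip(pos_scores, neg_scores_per_pos)
    ]
    return sum(hits) / len(hits)


# Three positive edges, each scored against three sampled negatives.
example_pos = [0.9, 0.2, 0.5]
example_negs = [
    [0.1, 0.3, 0.8],  # no negative beats 0.9   -> rank 1
    [0.4, 0.6, 0.1],  # two negatives beat 0.2  -> rank 3
    [0.7, 0.2, 0.3],  # one negative beats 0.5  -> rank 2
]
print(mean_hit_at_k(example_pos, example_negs, k=2))
```

With these toy scores, two of the three positive edges rank within the top 2, so the call reports a hit@2 of two thirds. For classification tasks the same idea applies with class scores in place of edge scores: a prediction counts as a hit when the true label is among the k highest-scored classes.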

Breaking changes

  • Add a new argument norm in the __init__ of GraphStorm classification and regression decoders. This allows users to set layer or batch normalization on the neural network layers of these decoders. Only MLPFeatEdgeDecoder implements the normalization in this release. #948
  • Rename pos_graph_feat_fields to pos_graph_edge_feat_fields in the GSgnnLinkPredictionDataLoaderBase class to make its meaning clearer. #934

Contributors