
GraphStorm v0.2 release

@classicsong released this 02 Oct 18:05

GraphStorm v0.2 contains several major feature enhancements. In this release, we have added distributed graph processing support for large-scale graphs: users can now run distributed graph processing on Spark clusters, for example with PySpark on SageMaker. We have added multi-task learning support for node classification tasks. GraphStorm now supports even more Hugging Face language models (LMs), such as bert, roberta, albert, etc. (see https://github.com/awslabs/graphstorm/blob/v0.2/python/graphstorm/model/lm_model/utils.py#L22 for more details). We have improved GraphStorm training speed by supporting the NCCL backend, and, in collaboration with NVIDIA, added NVIDIA WholeGraph support to further speed up node feature fetching during distributed GNN training. We have expanded model support by adding the ability to distill a GNN model into a Hugging Face DistilBertModel, and added two new models, HGT and GraphSAGE, to the GraphStorm model zoo. New GraphStorm documentation and tutorials are available at https://graphstorm.readthedocs.io for all user groups.

Major features

  • Support multi-task learning for node classification tasks (#410)
  • Enable NCCL backend (#383, #337)
  • Publish GraphStorm documentation at https://graphstorm.readthedocs.io.
  • Support multiple Hugging Face language models, including bert, roberta, albert, etc., in graph-aware LM fine-tuning, GNN-LM co-training, and GLEM (#385); see the sketch after this list.
  • [Experimental] Distributed graph processing support (#435, #427, #419, #408, #407, #400)
  • [Experimental] Support using NVIDIA WholeGraph to speed up node feature fetching during distributed GNN training. (#428, #405)
  • [Preview] Support for distilling a GNN model into a Hugging Face DistilBertModel. (#443, #463)
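
As a rough sketch of what the multi-LM support above enables, the snippet below resolves an `lm_type` string to a Hugging Face tokenizer and encoder. The checkpoint names and the helper itself are assumptions for illustration only; the actual LM registry lives in the utils.py file linked in the summary.

```python
# Minimal sketch (assumed, for illustration): resolving a Hugging Face LM
# by name so graph-aware fine-tuning can wrap any supported text encoder.
from transformers import AutoModel, AutoTokenizer

# lm_type -> example Hugging Face checkpoint (hypothetical defaults).
SUPPORTED_LM_CHECKPOINTS = {
    "bert": "bert-base-uncased",
    "roberta": "roberta-base",
    "albert": "albert-base-v2",
}

def load_lm_encoder(lm_type: str):
    """Resolve an lm_type string to a (tokenizer, encoder) pair (sketch)."""
    checkpoint = SUPPORTED_LM_CHECKPOINTS[lm_type]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    encoder = AutoModel.from_pretrained(checkpoint)
    return tokenizer, encoder

tokenizer, encoder = load_lm_encoder("roberta")
```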

New Built-in Models

  • Heterogeneous Graph Transformer (HGT) (#396)
  • GraphSAGE (#352); see the minimal DGL sketch after this list
  • [Experimental] GLEM semi-supervised training for node tasks (#327, #432)
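
For readers unfamiliar with GraphSAGE, here is a minimal DGL sketch of the architecture. It is not GraphStorm's built-in implementation; the layer sizes, aggregator type, and toy graph are assumptions.

```python
# Minimal DGL sketch of a two-layer GraphSAGE model (illustration only).
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl.nn import SAGEConv

class TwoLayerSAGE(nn.Module):
    def __init__(self, in_feats, hid_feats, out_feats):
        super().__init__()
        # Mean aggregation is one common choice; GraphStorm's defaults may differ.
        self.conv1 = SAGEConv(in_feats, hid_feats, aggregator_type="mean")
        self.conv2 = SAGEConv(hid_feats, out_feats, aggregator_type="mean")

    def forward(self, graph, feats):
        h = F.relu(self.conv1(graph, feats))
        return self.conv2(graph, h)

# Toy homogeneous graph with random node features (assumed data).
g = dgl.add_self_loop(dgl.graph(([0, 1, 2], [1, 2, 0])))
x = torch.randn(g.num_nodes(), 16)
logits = TwoLayerSAGE(16, 32, 4)(g, x)
```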

Minor features

  • Support per-edge-type link prediction metric reporting (#393)
  • Support per-class ROC-AUC reporting for multi-label, multi-class classification tasks (#397); see the example after this list
  • Support batch norm and layer norm (#384)
  • Enable standalone mode that allows users to run the training/inference scripts without using the launch script (#331)
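
The snippet below is a minimal illustration of per-class ROC-AUC reporting, using scikit-learn directly on toy multi-label data; it is not GraphStorm's metric code, and the data and printing are assumptions.

```python
# Sketch: report one ROC-AUC value per class instead of a single aggregate.
import numpy as np
from sklearn.metrics import roc_auc_score

# 4 samples, 3 classes, multi-label indicator format (assumed toy data).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.8],
                    [0.1, 0.7, 0.3],
                    [0.8, 0.6, 0.4],
                    [0.3, 0.4, 0.9]])

# average=None returns an array with one ROC-AUC per class.
per_class_auc = roc_auc_score(y_true, y_score, average=None)
for cls, auc in enumerate(per_class_auc):
    print(f"class {cls}: roc_auc = {auc:.3f}")
```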

API breaking changes

  • We changed the filename format of saved embeddings (both learnable embeddings and node embeddings) and model prediction results from `.pt` to `.part<padding_zeros>.pt`, where the part index is the zero-padded trainer rank. For example, with 4 trainers, the saved node embeddings are named emb.part00000.pt, emb.part00001.pt, emb.part00002.pt, and emb.part00003.pt.
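
Below is a small sketch of how the new shard names can be constructed and read back, assuming the five-digit zero padding shown in the example above; the helper functions are hypothetical and not part of GraphStorm's API.

```python
# Sketch: each trainer writes its shard with a zero-padded part index,
# e.g. emb.part00000.pt. The five-digit padding matches the example above
# and is treated here as an assumption.
import glob
import torch

def shard_name(prefix: str, rank: int, width: int = 5) -> str:
    """Build the shard filename for a given trainer rank."""
    return f"{prefix}.part{rank:0{width}d}.pt"

assert shard_name("emb", 3) == "emb.part00003.pt"

def load_all_shards(prefix: str):
    """Load all saved shards back in rank order (hypothetical helper)."""
    parts = sorted(glob.glob(f"{prefix}.part*.pt"))
    return [torch.load(p) for p in parts]
```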
