GitHub - classicsong/dgl-ke-1: High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.


Knowledge graphs (KGs) are data structures that store information about different entities (nodes) and their relations (edges). A common approach to using KGs in various machine learning tasks is to compute knowledge graph embeddings. DGL-KE is a high performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings. The package is implemented on top of the Deep Graph Library (DGL), and developers can run DGL-KE on CPU machines, GPU machines, and clusters with a set of popular models, including TransE, TransR, RESCAL, DistMult, ComplEx, and RotatE.
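To make the model names above more concrete, here is a minimal pure-Python sketch of three of the scoring functions, following the common convention that a higher score means a more plausible (head, relation, tail) triplet. The helper names are hypothetical and this is not DGL-KE's implementation, which operates on batched tensors; the TransE variant uses L2 distance with a margin `gamma`, in the spirit of the `TransE_l2` model name used in the Quick Start.

```python
import math

# Hypothetical helpers for illustration only; not DGL-KE's actual code.
# Each takes one embedding vector per argument (lists of numbers).

def transe_l2_score(head, rel, tail, gamma):
    """TransE with L2 distance: gamma - ||h + r - t||_2 (higher = more plausible)."""
    return gamma - math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))

def distmult_score(head, rel, tail):
    """DistMult: trilinear product sum_i h_i * r_i * t_i."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))

def complex_score(head, rel, tail):
    """ComplEx: Re(sum_i h_i * r_i * conj(t_i)), using Python complex numbers."""
    return sum(h * r * t.conjugate() for h, r, t in zip(head, rel, tail)).real

if __name__ == "__main__":
    # h + r == t exactly, so the TransE distance is 0 and the score equals gamma.
    print(transe_l2_score([1.0, 2.0], [3.0, -1.0], [4.0, 1.0], gamma=19.9))
    print(distmult_score([1.0, 2.0], [3.0, -1.0], [4.0, 1.0]))
    print(complex_score([1 + 1j], [1j], [1 + 0j]))
```

Translational models such as TransE score a triplet by a distance, while bilinear models such as DistMult and ComplEx score it by a (possibly complex) product; ComplEx's use of the conjugate lets it model asymmetric relations, which DistMult's symmetric product cannot.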

Figure: DGL-KE Overall Architecture

Currently DGL-KE supports three tasks:

Training: trains KG embeddings using dglke_train (single machine) or dglke_dist_train (distributed environment).
Evaluation: reads the pre-trained embeddings and evaluates them with a link prediction task on the test set using dglke_eval.
Inference: reads the pre-trained embeddings and performs link score ranking with dglke_score or embedding similarity ranking with dglke_emb_sim.
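The two inference modes can be illustrated with a small pure-Python sketch: link score ranking scores a (head, relation, ?) query against every candidate tail and keeps the top-k, while similarity ranking compares one entity embedding with all the others. The function names, the TransE L2 score, and the choice of cosine similarity are assumptions for illustration; dglke_score and dglke_emb_sim read saved embeddings from disk and provide their own options, which this sketch does not reproduce.

```python
import heapq
import math

# Hypothetical helpers for illustration; not the dglke_score / dglke_emb_sim code.

def transe_l2_score(head, rel, tail, gamma):
    # gamma - ||h + r - t||_2; higher means the triplet is more plausible
    return gamma - math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))

def rank_tails(head, rel, entity_emb, gamma, topk):
    """Score (head, rel, ?) against every entity as the tail; return top-k (score, entity_id)."""
    scores = ((transe_l2_score(head, rel, emb, gamma), eid) for eid, emb in enumerate(entity_emb))
    return heapq.nlargest(topk, scores)

def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat similarity with a zero vector as 0 to avoid dividing by zero
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def most_similar(query_id, entity_emb, topk):
    """Rank all other entities by cosine similarity to entity `query_id`."""
    q = entity_emb[query_id]
    sims = ((cosine(q, emb), eid) for eid, emb in enumerate(entity_emb) if eid != query_id)
    return heapq.nlargest(topk, sims)

if __name__ == "__main__":
    entity_emb = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
    print(rank_tails([0.0, 0.0], [1.0, 0.0], entity_emb, gamma=12.0, topk=2))
    print(most_similar(1, entity_emb, topk=2))
```

Using `heapq.nlargest` avoids sorting the full candidate list when only the top-k results are needed; for a graph with millions of entities, a real tool would batch this computation on tensors instead of looping in Python.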

A Quick Start

To install the latest version of DGL-KE run:

sudo pip3 install dgl
sudo pip3 install dglke

Train a TransE model on the FB15k dataset by running the following command:

DGLBACKEND=pytorch dglke_train --model_name TransE_l2 --dataset FB15k --batch_size 1000 \
--neg_sample_size 200 --hidden_dim 400 --gamma 19.9 --lr 0.25 --max_step 500 --log_interval 100 \
--batch_size_eval 16 -adv --regularization_coef 1.00E-09 --test --num_thread 1 --num_proc 8

This command downloads the FB15k dataset, trains the TransE model, saves the trained embeddings to files, and, because of the --test flag, evaluates them on the test set.
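The -adv flag in the command above enables adversarial weighting of negative samples, which gives higher-scoring (harder) corrupted triplets more weight. The sketch below shows, in pure Python, the self-adversarial negative sampling loss introduced alongside RotatE, for one positive triplet and its sampled negatives. The function names and the temperature value are illustrative assumptions, and this is not DGL-KE's code.

```python
import math

# Illustrative sketch of a self-adversarial negative sampling loss; not DGL-KE's code.

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x))
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def self_adversarial_loss(pos_score, neg_scores, temperature=1.0):
    """Loss for one positive triplet and its negatives.

    Scores use the 'higher is more plausible' convention (e.g. gamma - distance).
    Each negative is weighted by softmax(temperature * score) over the negatives,
    so harder negatives contribute more. The weights are treated as constants
    (in an autodiff framework they would be detached from the graph).
    """
    m = max(temperature * s for s in neg_scores)  # subtract the max for a stable softmax
    exps = [math.exp(temperature * s - m) for s in neg_scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    pos_loss = -log_sigmoid(pos_score)  # push the positive score up
    neg_loss = -sum(w * log_sigmoid(-s) for w, s in zip(weights, neg_scores))  # push negatives down
    return pos_loss + neg_loss

if __name__ == "__main__":
    print(self_adversarial_loss(5.0, [4.0, -4.0]))   # one hard negative dominates
    print(self_adversarial_loss(5.0, [-4.0, -4.0]))  # easy negatives, small loss
```

In terms of the command, --neg_sample_size 200 would correspond to 200 entries in `neg_scores` per positive triplet, and --gamma 19.9 would enter through scores of the form `gamma - distance`.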

Performance and Scalability

DGL-KE is designed for learning at scale. It introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges. Our benchmark on a knowledge graph with over 86M nodes and 338M edges shows that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and in 30 minutes on an EC2 cluster with 4 machines (48 cores per machine). These results represent a 2× to 5× speedup over the best competing approaches.

Figure: DGL-KE vs GraphVite on FB15k

Figure: DGL-KE vs PyTorch-BigGraph on Freebase

Learn more in our documentation. If you are interested in the optimizations in DGL-KE, please see our paper for details.

Cite

If you use DGL-KE in a scientific publication, we would appreciate citations to the following paper:

@misc{zheng2020dglke,
    title={DGL-KE: Training Knowledge Graph Embeddings at Scale},
    author={Da Zheng and Xiang Song and Chao Ma and Zeyuan Tan and Zihao Ye and Jin Dong and Hao Xiong and Zheng Zhang and George Karypis},
    year={2020},
    eprint={2004.08532},
    archivePrefix={arXiv},
}

License

This project is licensed under the Apache-2.0 License.
