This repository contains the open-source code for our paper "edge2vec: Learning Node Representation Using Edge Semantics".
The dataset we provide for testing is data.csv. It contains four columns: Source ID, Target ID, Edge Type, and Edge ID. Columns are separated by commas (',').
For an unweighted graph, see unweighted_graph.txt. Its four columns are Source ID, Target ID, Edge Type, and Edge ID. For a weighted graph, see weighted_graph.txt. Its five columns are Source ID, Target ID, Edge Type, Edge Weight, and Edge ID. In both files, columns are separated by commas (',').
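To make the two column layouts concrete, here is a minimal Python sketch of a reader for both formats. It is not part of the released code: the `Edge` tuple, the `read_edges` helper, and the sample rows are hypothetical, and it assumes the files have no header row and that unweighted edges can be given a default weight of 1.0.

```python
import csv
import io
from typing import List, NamedTuple


class Edge(NamedTuple):
    """One row of the edge list (hypothetical container, not from the repo)."""
    source: str
    target: str
    edge_type: str
    weight: float
    edge_id: str


def read_edges(handle, weighted: bool = False) -> List[Edge]:
    """Parse comma-separated edge rows from an open text handle.

    Unweighted rows: Source ID, Target ID, Edge Type, Edge ID.
    Weighted rows:   Source ID, Target ID, Edge Type, Edge Weight, Edge ID.
    Assumes there is no header row.
    """
    edges = []
    for row in csv.reader(handle):
        row = [cell.strip() for cell in row]
        if not any(row):
            continue  # skip blank lines
        if weighted:
            source, target, edge_type, weight, edge_id = row
            weight = float(weight)
        else:
            source, target, edge_type, edge_id = row
            weight = 1.0  # assumed default for unweighted graphs
        edges.append(Edge(source, target, edge_type, weight, edge_id))
    return edges


# Made-up sample rows, for illustration only.
unweighted_sample = "1,2,1,1\n2,3,2,2\n"
weighted_sample = "1,2,1,0.5,1\n2,3,2,2.0,2\n"

unweighted_edges = read_edges(io.StringIO(unweighted_sample))
weighted_edges = read_edges(io.StringIO(weighted_sample), weighted=True)
print(unweighted_edges[0])
print(weighted_edges[1].weight)
```

To read a real file, pass an open handle instead of the in-memory samples, for example `read_edges(open("weighted_graph.txt", newline=""), weighted=True)`.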
Running the code takes two steps. First, calculate the edge-type transition matrix for the heterogeneous network by running transition.py from bash:
$ transition.py --input data.csv --output matrix.txt --type_size 3 --em_iteration 5 --e_step 3 --walk-length 3 --num-walks 2
The output is matrix.txt, which stores the edge transition matrix. Second, run edge2vec.py to generate the node embeddings via biased random walks. To use it from bash:
$ edge2vec.py --input data.csv --matrix matrix.txt --output vector.txt --dimensions 128 --walk-length 3 --num-walks 2 --p 1 --q 1
The output is the node embedding file vector.txt. A data repository for the medical dataset is available at: http://ella.ils.indiana.edu/~gao27/data_repo/edge2vec%20vector.zip (It is a re-computed version, so the evaluation output may differ slightly from the results reported in the paper.)
If you use the code, please cite:
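As an illustration of how the edge transition matrix and the --p and --q options could interact during one step of a biased random walk, here is a minimal Python sketch. It is not the code in edge2vec.py. It assumes node2vec-style return (p) and in-out (q) factors, multiplied by the transition-matrix entry for the previous and candidate edge types and by the edge weight. The graph dictionary, the 0-indexed edge types, the in-memory `matrix` list, and the helper names are all hypothetical.

```python
import random


def step_weights(prev, cur, graph, matrix, p=1.0, q=1.0):
    """Unnormalized weights for moving from `cur` to each of its neighbors.

    graph:  dict mapping node -> {neighbor: (edge_type, edge_weight)}
    matrix: matrix[i][j] is the transition weight from edge type i to type j
    prev:   the node visited immediately before `cur`
    """
    prev_type = graph[prev][cur][0]
    weights = {}
    for nxt, (edge_type, edge_weight) in graph[cur].items():
        if nxt == prev:
            alpha = 1.0 / p      # step back to the previous node
        elif nxt in graph[prev]:
            alpha = 1.0          # neighbor shared with the previous node
        else:
            alpha = 1.0 / q      # move further away from the previous node
        weights[nxt] = alpha * matrix[prev_type][edge_type] * edge_weight
    return weights


def choose_next(prev, cur, graph, matrix, p, q, rng):
    """Sample the next node in proportion to the step weights."""
    weights = step_weights(prev, cur, graph, matrix, p, q)
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]


# Toy undirected graph with two edge types (0 and 1); values are made up.
graph = {
    "a": {"b": (0, 1.0), "c": (1, 1.0)},
    "b": {"a": (0, 1.0), "c": (0, 1.0), "d": (1, 1.0)},
    "c": {"a": (1, 1.0), "b": (0, 1.0)},
    "d": {"b": (1, 1.0)},
}
matrix = [[0.8, 0.2],
          [0.2, 0.8]]

weights = step_weights("a", "b", graph, matrix, p=2.0, q=0.5)
next_node = choose_next("a", "b", graph, matrix, 2.0, 0.5, random.Random(0))
print(weights, next_node)
```

With --p 1 --q 1, as in the command above, the p and q factors are all 1 under this assumption, so the edge types and edge weights alone would determine the step weights.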
- Gao, Zheng, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, and Ying Ding. "edge2vec: Learning Node Representation Using Edge Semantics." arXiv preprint arXiv:1809.02269 (2018).
The code is released under the BSD 3-Clause License.
- Zheng Gao - [email protected]