Could you provide the input data format? #2
Comments
Thank you so much! They are very timely and helpful. Could you explain how to generate an xx_split_idx.pkl file from the dataset, and what its storage format is?
The |
Thanks for the clean datasets! One issue I have regarding the data specification: graph_data_storage.md specifies |
Yes. Also note that the data format we created for large graph datasets can easily be extended with other special graph attributes, depending on your problem. Hope this is helpful!
Thanks @jzhou316, that makes perfect sense - but it might be a nice addition to graph_data_storage.md :). |
Cool. I'll add some details.
The detailed instructions are very helpful. How should we set num_evils and num_evils_avg if our problem is multi-class classification rather than binary classification (evil/non-evil)?
@velpc These are dataset statistics stored in the HDF5 file (and may not be used by the model). For a different problem such as multi-class classification, you can write your own data following our format, with your own dataset attributes. For example, you could have attributes such as "num_class_0", "num_class_1", "num_class_2", etc. to describe the dataset. We have some example code for writing these attributes here. Hope this answers your question!
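As a rough illustration of the per-class attributes suggested above, here is a minimal sketch, not the maintainers' example code. The helper names `class_count_attrs` and `write_attrs` are hypothetical, and storing the statistics as attributes on the root of the HDF5 file is an assumption; check graph_data_storage.md for where the repository actually keeps them.

```python
from collections import Counter


def class_count_attrs(labels, prefix="num_class_"):
    """Return {attribute_name: count} for every class label that occurs.

    `labels` is any iterable of integer class ids (hypothetical input;
    adapt it to however your dataset stores its labels).
    """
    counts = Counter(int(y) for y in labels)
    return {f"{prefix}{label}": n for label, n in sorted(counts.items())}


def write_attrs(h5_path, attrs):
    """Attach the statistics to an existing HDF5 file.

    Assumption: statistics live on the file's root attributes.
    h5py is imported lazily so the helper above needs only the stdlib.
    """
    import h5py

    with h5py.File(h5_path, "a") as f:
        for name, value in attrs.items():
            f.attrs[name] = value


labels = [0, 2, 1, 0, 2, 2]
attrs = class_count_attrs(labels)
print(attrs)  # {'num_class_0': 2, 'num_class_1': 1, 'num_class_2': 3}
```

Calling `write_attrs("my_dataset.h5", attrs)` (the file name is a placeholder) would then store one integer attribute per class, replacing the binary num_evils style statistics.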
Hi @jzhou316, is there a much much smaller dataset that can be used for quick testing of the algorithm? I wanted to try out with a smaller subset without having to download these ones specified on |
@helmoai Sorry, we currently don't have an official mini dataset for quick testing. Could you download the data and take out a subset (e.g. a few graphs) to run a mini-test? Otherwise, I could generate a smaller subset from one of the datasets for you.
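One way to take out a few graphs, sketched under assumptions: this supposes each graph is stored as its own top-level object in the HDF5 file, which may not match the real layout, and the function names `pick_subset` and `copy_subset` are hypothetical. It is not an official tool from the repository.

```python
import random


def pick_subset(names, k, seed=0):
    """Choose k names reproducibly, keeping their original order."""
    names = list(names)
    if k >= len(names):
        return names
    chosen = set(random.Random(seed).sample(range(len(names)), k))
    return [name for i, name in enumerate(names) if i in chosen]


def copy_subset(src_path, dst_path, k, seed=0):
    """Copy k top-level objects from one HDF5 file into a new one.

    Assumption: one top-level object per graph. Root attributes
    (dataset statistics) are NOT copied, because they would no longer
    describe the smaller file and should be recomputed.
    """
    import h5py  # third-party; imported lazily

    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        for name in pick_subset(src.keys(), k, seed):
            src.copy(name, dst)  # copies the object under the same name


print(pick_subset(["g0", "g1", "g2", "g3", "g4"], 2))
```

A fixed seed keeps the mini dataset identical across runs, which helps when comparing quick tests.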
I found an issue in the code you gave for reading the HDF5 files here. I think you missed the
@helmoai yes you are right. Thanks for pointing it out! Updated it. |
In scatter_ in common.py, the call out (SRC, index, 0, out, dim_size, fill_value) passes 6 arguments, but the message displayed says only 2 to 5 arguments can be given.
I've been implementing this on a different network dataset and noticed a few gotchas related to this. If you use the |
Detailed explanation of hdf5 instance format of pyg, dgl, nx, or dict.