EX-Graph is a rich dataset containing Ethereum transaction data and X data. With a focus on bridging the gap between anonymous Ethereum activities and the leading social media X's information, it's specifically designed to facilitate various tasks including Ethereum link prediction, wash-trading addresses detection, and matching link prediction between Ethereum addresses and X accounts. By integrating these two different dimensions, EX-Graph provides an on-chain and off-chain resource for researchers and analysts to explore the complex dynamics of Ethereum.
Additionally, we've made the EX-Graph dataset accessible through a dedicated website EX-Graph. Here, you are welcomed to upload and compare your experiment results on our leaderboard.
EX-Graph dataset consists of several components:
X Graph
: X accounts and their following relationships.Ethereum Graph
: Ethereum addresses and their transaction records.Ethereum Graph with X features
: A DGL graph contains Ethereum addresses and their transaction records, as well as Ethereum addresses features got from both Ethereum and matched-X accounts, it is used for Ethereum link prediction task.Wash-trading Addresses Detection Training Graph
: Training graph used for wash-trading addresses detection task.Wash-trading Addresses Detection Validation Graph
: Validation graph used for wash-trading addresses detection task.Wash-trading Addresses Detection Test Graph
: Test graph used for wash-trading addresses detection task.Matching Link Prediction Graph
: DGL graph used for matching link prediction task.ethereum_link_prediction
: Folder contains models to run Ethereum link prediction experiment and train, validation, test edge indices pickle files.wash_trading_address_detection
: Folder contains code of models to run wash trading address detection experiment.matching_link_prediction
: Folder contains models to run matching link prediction experiment and train, validation, test edge indices pickle files.X_matching.csv
: X accounts and the Ethereum addresses matched with them. We did anonymization operations so that you can see the X numerical id matching with Ethereum addresses without leaking any personal information.dune_wash_trade_tx.csv
: Wash-trading transactions downloaded from Dune.
We present the original datasets and datasets tailored for use in three distinct tasks. For each dataset, a Google Drive link is provided for easy access and download.
Dataset | #Nodes | #Edges | Node Features | Edge Features |
---|---|---|---|---|
X Graph | 1,103,509 | 3,768,281 | n.a. | n.a. |
Ethereum Graph | 2,610,465 | 29,585,858 | n.a. | from_address, to_address, weight, block_number |
Ethereum Graph with X features | 1,709,575 | 13,170,869 | X_features, ethereum_features, ethereum_X_combined_features | from_address, to_address, weight, block_number |
Wash-trading Addresses Detection Training Graph | 1,268,607 | 8,276,446 | ethereum_features, X_semantic_features, X_structure_features, ethereum_X_combined_features, wash_trading_label | n.a. |
Wash-trading Addresses Detection Validation Graph | 452,930 | 1,182,349 | ethereum_features, X_semantic_features, X_structure_features, ethereum_X_combined_features, wash_trading_label | n.a. |
Wash-trading Addresses Detection Test Graph | 711,084 | 2,364,700 | ethereum_features, X_semantic_features, X_structure_features, ethereum_X_combined_features, wash_trading_label | n.a. |
Matching Link Prediction Graph | 2,833,659 | 18,433,366 | combined_features | n.a. |
-
Download this repository and extract the datasets to a desired location.
-
Create a Conda environment using the provided
environment.yml
file, which includes the requirements to run experiments in this repository. Run:conda env create -f environment.yml
We list main requirements of this repository below. For full requirements, please refer to the provided environment.yml
file
- dgl==1.1.0
- torch==2.0.0+cu117
- torch-cluster==1.6.1+pt20cu117
- torch-geometric==2.3.0
- torch-scatter==2.1.1+pt20cu117
- torch-sparse==0.6.17+pt20cu117
- torch-spline-conv==1.2.2+pt20cu117
- torchaudio==2.0.1+cu117
- torchmetrics==0.11.4
- torchvision==0.15.1+cu117
- transformers==4.30.2
We suppose that you have downloaded the datasets to the corresponding task's folders. Then, you can use the following command to load the data in the pkl format.
e.g. For Ethereum link prediction task, you have downloaded Ethereum Graph with X features in the folder. Then you can use the following command to load the dataset
import dgl
import pickle as pkl
with open('ethereum_graph_with_X_features.pkl', 'rb') as f:
ethereum_graph_with_X_features = pkl.load(f)
This section details how to using EX-Graph to run benchmark baselines. We explain experiments Ethereum link prediction, wash-trading addresses detection, matching link prediction one by one.
-
Navigate to ethereum_link_prediction folder, and download Ethereum Graph with X features to this folder.
-
Each experiment code file is named as
model_wo/with_X.py
.wo
means this experiment does not consider X features.with
means this experiment considers X features. For example, if you want to run clusterGCN model considering X features in Ethereum link prediction, the command is as below:python clusterGCN_with_X.py
-
Navigate to wash_trading_address_detection folder, and download the wash trading graphs: Wash-trading Addresses Detection Training Graph, Wash-trading Addresses Detection Validation Graph, Wash-trading Addresses Detection Test Graph to this folder.
-
Each experiment code file is named as
model_wo/with_X.py
.wo
means this experiment does not consider X features.with
means this experiment considers X features. For example, if you want to run GCN model considering X features, the command is as below:python GCN_with_X.py
-
Navigate to matching_link_prediction folder, and download Matching Link Prediction Graph to this folder.
-
Each experiment code file is named as
model.py
. For example, if you want to run GCN model on this matching link prediction, the command is as below:python GCN.py
This dataset is under license CC BY-NC-SA.
We thank Ruochen Li for building the website and providing valuable suggestions for this repository.
If you find this repository useful, please cite our paper:
@inproceedings{wangex,
title={EX-Graph: A Pioneering Dataset Bridging Ethereum and X},
author={Wang, Qian and Zhang, Zhen and Liu, Zemin and Lu, Shengliang and Luo, Bingqiao and He, Bingsheng},
booktitle={The Twelfth International Conference on Learning Representations}
}