CRPGCN: Predicting circRNA-Disease Associations Using Graph Convolutional Network Based on Heterogeneous Network
Background: Existing studies show that circRNAs can serve as disease biomarkers and play a prominent role in the diagnosis and treatment of diseases. However, the relationships between the vast majority of circRNAs and diseases remain unclear, and more experiments are needed to study the mechanisms of circRNAs. Some researchers use the attributes of circRNAs and diseases to predict their associations, but most of these existing methods use little information about circRNA attributes, which limits the accuracy of the final predictions. Other researchers identify associations between circRNAs and diseases through experimental methods, which are usually expensive and time-consuming. Given these shortcomings, more effective computational methods are needed to predict the associations between circRNAs and diseases.
Results: This study proposes CRPGCN, a novel method that predicts the associations between circRNAs and diseases using a Graph Convolutional Network (GCN) constructed with Random Walk with Restart (RWR) and Principal Component Analysis (PCA). In the construction of CRPGCN, the RWR algorithm is first used to improve the similarity associations between each computed node and its neighbours. PCA is then used to reduce dimensionality and extract features, which makes the connections between circRNAs with higher similarity and diseases closer. Finally, the GCN learns the features of circRNAs and diseases and calculates the final similarity scores. The learning data are constructed from the adjacency matrix, similarity matrices and feature matrix, which are combined into a heterogeneous adjacency matrix and a heterogeneous feature matrix.
Conclusions: Under 2-fold, 5-fold and 10-fold cross-validation (CV), the area under the ROC curve (AUC) of CRPGCN is 0.9490, 0.9720 and 0.9722, respectively. The CRPGCN method is valuable for predicting the associations between circRNAs and diseases.
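To make the RWR and PCA steps described above more concrete, here is a minimal NumPy sketch. It is not the code shipped in this repository: the function names, the restart probability of 0.7, the convergence tolerance and the toy similarity matrix are all assumptions chosen for illustration, and PCA is done with a plain SVD rather than any particular library.

```python
import numpy as np


def random_walk_with_restart(sim, restart_prob=0.7, tol=1e-8, max_iter=1000):
    """Return the RWR steady state; column j is the walk restarted at node j.

    restart_prob, tol and max_iter are illustrative values (assumptions).
    """
    sim = np.asarray(sim, dtype=float)
    col_sums = sim.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0          # guard against isolated nodes
    transition = sim / col_sums            # column-stochastic transition matrix
    restart = np.eye(sim.shape[0])         # one restart vector per node
    prob = restart.copy()
    for _ in range(max_iter):
        prob_next = (1 - restart_prob) * transition @ prob + restart_prob * restart
        converged = np.abs(prob_next - prob).max() < tol
        prob = prob_next
        if converged:
            break
    return prob


def pca_reduce(features, n_components):
    """Project the rows of `features` onto the top principal components (via SVD)."""
    x = np.asarray(features, dtype=float)
    x_centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    return x_centered @ vt[:n_components].T


# Toy 4-node similarity matrix (hypothetical data, not from the dataset folder).
sim = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])
rwr = random_walk_with_restart(sim)
reduced = pca_reduce(rwr, n_components=2)
print(rwr.shape, reduced.shape)
```

In this sketch each column of `rwr` is a probability distribution over nodes, so similar nodes end up with similar columns, and `reduced` keeps two features per node.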
The main contributions of this work are summarized as follows:
- The CRPGCN method incorporates the RWR similarity calculation method and the PCA feature extraction method, allowing the calculated nodes to better combine the similarity between neighbouring nodes while greatly reducing the impact on the prediction results.
- The CRPGCN algorithm improves prediction accuracy and achieves the highest AUC and AUPR values among the advanced algorithms compared.
- The CRPGCN algorithm is more stable than some of the advanced algorithms; its AUC values remain stable when compared across a variety of methods and datasets.
- Across the various evaluation metrics compared, the CRPGCN algorithm outperforms other advanced algorithms in overall performance.
numpy~=1.19.4
pandas~=1.1.5
matplotlib~=3.3.3
pyrwr~=1.0.0
tensorflow~=2.4.0
tensorflow-gpu~=2.4.0
scipy~=1.5.4
networkx~=2.5
tqdm~=4.55.0
sklearn~=0.0
scikit-learn~=0.23.2
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1
- AM: Adjacency matrix
- Disease_sim: Disease similarity matrix DS
- RNA_sim: circRNA similarity matrix CS
- Install the runtime environment in the Terminal with the command: pip install -r requirements.txt
- Put the adjacency matrix (.csv), disease similarity matrix (.csv) and RNA similarity matrix (.csv) into the dataset folder
- Run main.py
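As a rough illustration of how the three input matrices (AM, DS and CS) might be combined into a heterogeneous adjacency matrix and passed through one graph convolution, here is a small NumPy sketch. It is not the implementation in main.py. It assumes that AM has circRNAs as rows and diseases as columns, uses the common symmetric normalization with self-loops for the GCN layer, reuses the heterogeneous adjacency matrix as a stand-in feature matrix, and runs on made-up toy data with random weights; all function names are hypothetical.

```python
import numpy as np


def build_heterogeneous_adjacency(assoc, circ_sim, dis_sim):
    """Stack CS, AM, AM^T and DS into one block matrix (assumed layout)."""
    assoc = np.asarray(assoc, dtype=float)
    return np.block([[circ_sim, assoc], [assoc.T, dis_sim]])


def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    adj_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj_hat.sum(axis=1))
    return adj_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution followed by ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Toy data: 3 circRNAs and 2 diseases (hypothetical values).
circ_sim = np.array([[1.0, 0.6, 0.1], [0.6, 1.0, 0.3], [0.1, 0.3, 1.0]])
dis_sim = np.array([[1.0, 0.4], [0.4, 1.0]])
assoc = np.array([[1, 0], [1, 0], [0, 1]])

het = build_heterogeneous_adjacency(assoc, circ_sim, dis_sim)
adj_norm = normalize_adjacency(het)

rng = np.random.default_rng(0)
weights = rng.normal(size=(het.shape[1], 4))  # untrained weights, illustration only
hidden = gcn_layer(adj_norm, het, weights)    # het reused as features (assumption)
print(het.shape, hidden.shape)
```

In practice the three arrays would be read from the .csv files placed in the dataset folder, and the weights would be learned during training rather than drawn at random.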
If you are unable to debug the CRPGCN runtime environment, you can download the environment we have configured in the following way:
If you find this paper or code helpful, please cite our paper:
@article{Ma2021,
author = {Ma, Zhihao and Kuang, Zhufang and Deng, Lei},
doi = {10.1186/s12859-021-04467-z},
issn = {1471-2105},
journal = {BMC Bioinformatics},
keywords = {CircRNA-disease,Deep learning,Graph convolutional network,Heterogenous network,Principal component analysis},
pages = {1--23},
publisher = {BioMed Central},
title = {{CRPGCN : predicting circRNA‑disease associations using graph convolutional network based on heterogeneous network}},
url = {https://doi.org/10.1186/s12859-021-04467-z},
year = {2021}
}
If you have any questions, please open an issue.