xxxx Killed #21
Comments
Hi, it would be helpful if you could provide a more detailed error message, so that we can see exactly where in the code the program fails.
Hi, thanks for your reply, but there is no error message. It shows the output above and then stops.
Can you provide a Jupyter notebook? I have spent 3 days trying to run this project with no result. I would be very grateful if you could provide one.
https://github.com/TruongDuyLongPTIT/DoAnTotNghiepPTIT/blob/main/Untitled1.ipynb
In the installation, can you specify the version of pytorch scatter to be a previous one by
When I install torch-scatter=1.3.1, this happens:
I think "Killed" happens because the process takes a lot of RAM. Do you have any way to reduce RAM usage?
When I run !dmesg, it shows that the process was killed because it ran out of memory. I have 25 GB of RAM, and it is not enough. I think your code has a problem with how it loads data into RAM.
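One way to confirm that the loading step is what exhausts memory is to log the peak resident memory of the process before and after that step. Below is a minimal sketch using only the Python standard library. It assumes a Linux host such as Colab, where `ru_maxrss` is reported in kilobytes, and `load_fn` is a hypothetical stand-in for whatever call loads the dataset; it is not part of this project's code.

```python
import resource


def peak_rss_mb():
    """Peak resident set size of this process in MB (assumes Linux, which reports KB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def report_peak(load_fn):
    """Run load_fn and return (result, growth of the peak RSS in MB)."""
    before = peak_rss_mb()
    result = load_fn()
    after = peak_rss_mb()
    return result, after - before


if __name__ == "__main__":
    # Hypothetical stand-in for the real dataset load: build an 80 MiB buffer.
    data, grew = report_peak(lambda: bytearray(b"x") * (80 * 1024 * 1024))
    print(f"peak RSS grew by about {grew:.0f} MB")
```

If the growth reported around the dataset load is close to the machine's total RAM, the kill is most likely caused by loading rather than by the model or the batch size.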
The graph data is big, so you need more RAM. However, if the RAM size is what causes the problem, you can load the data on demand instead of loading it all into RAM at once. This can be done via setting
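The general idea of on-demand loading can be sketched as follows. This is not the project's own loader or its actual setting; it is an illustration under the assumption that each graph is stored in its own file. The names `OnDemandGraphs` and `write_toy_graphs`, the `.pkl` layout, and the toy graph fields are all hypothetical.

```python
import os
import pickle
import tempfile


class OnDemandGraphs:
    """Keeps only file paths in memory; each graph is read from disk when indexed."""

    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, name)
            for name in os.listdir(root)
            if name.endswith(".pkl")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Only the requested graph is deserialized.
        with open(self.paths[idx], "rb") as f:
            return pickle.load(f)


def write_toy_graphs(root, num_graphs=4):
    """Write small placeholder graphs, one file each (illustrative data only)."""
    for i in range(num_graphs):
        graph = {"edge_index": [(0, 1), (1, 2)], "y": [0, 1, 0], "id": i}
        with open(os.path.join(root, f"graph_{i:04d}.pkl"), "wb") as f:
            pickle.dump(graph, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        write_toy_graphs(root)
        graphs = OnDemandGraphs(root)
        for i in range(len(graphs)):
            g = graphs[i]  # read from disk here, not at construction time
            print(g["id"], len(g["edge_index"]))
```

Because the class exposes `__len__` and `__getitem__`, it has the same shape as a map-style dataset, so memory use scales with the graphs currently being processed rather than with the whole dataset. The trade-off is extra disk reads on every epoch.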
Many thanks for your support. I solved this problem by commenting out model.to(device) and then training on TPUs in Colab. The TPU runtime in Colab has 35 GB of RAM (the GPU runtime has only 25 GB), so training succeeded. But I don't understand your suggestion to use a smaller batch size, because the batch size in your code is just 2. I think that is the smallest batch size, so it cannot be decreased.
I have also solved the problem with training time. When I train on TPU with batch size = 64, each epoch takes about 45 minutes.
Sounds good. I have never run it on TPU so your input is valuable; there might be something that needs to be customized to TPU as you suggested.
Hi, I need your help.
When I run bash run_botnet.sh, I get the error below. Do you have a solution?
Mon Aug 29 08:22:24 2022
loading dataset...
model ----------
GCNModel(
(gcn_net): ModuleList(
(0): GCNLayer(
(gcn): NodeModelAdditive (in_channels: 1, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 64)
(non_linear): Identity()
)
(1): GCNLayer(
(gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
)
(dropout): Dropout(p=0.0, inplace=False)
(residuals): ModuleList(
(0): Linear(in_features=1, out_features=32, bias=False)
(1): Identity()
(2): Identity()
(3): Identity()
(4): Identity()
(5): Identity()
(6): Identity()
(7): Identity()
(8): Identity()
(9): Identity()
(10): Identity()
(11): Identity()
)
(non_linear): ReLU()
(final): Linear(in_features=32, out_features=2, bias=True)
)
/content/botnet_detection/run_botnet.sh: line 3: 3960 Killed CUDA_VISIBLE_DEVICES=$gpu python /content/botnet_detection/train_botnet.py --devid 0 --data_dir ./data/botnet --data_name "$topo" --batch_size 2 --enc_sizes 32 32 32 32 32 32 32 32 32 32 32 32 --act relu --residual_hop 1 --deg_norm rw --final proj --epochs 50 --lr 0.005 --early_stop 1 --save_dir ./saved_models --save_name "$topo"_model_lay12_rh1_rw_ep50.pt