xxxx Killed #21

TruongDuyLongPTIT · 2022-08-29T06:13:49Z

Hi, I need your help.
When I run bash run_botnet.sh, I get the error below. Do you have a solution?

Mon Aug 29 08:22:24 2022

loading dataset...
model ----------
GCNModel(
  (gcn_net): ModuleList(
    (0): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 1, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 64)
      (non_linear): Identity()
    )
    (1): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (2): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (3): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (4): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (5): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (6): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (7): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (8): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (9): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (10): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
    (11): GCNLayer(
      (gcn): NodeModelAdditive (in_channels: 32, out_channels: 32, in_edgedim: None, deg_norm: rw, edge_gate: NoneType,aggr: add | number of parameters: 1056)
      (non_linear): Identity()
    )
  )
  (dropout): Dropout(p=0.0, inplace=False)
  (residuals): ModuleList(
    (0): Linear(in_features=1, out_features=32, bias=False)
    (1): Identity()
    (2): Identity()
    (3): Identity()
    (4): Identity()
    (5): Identity()
    (6): Identity()
    (7): Identity()
    (8): Identity()
    (9): Identity()
    (10): Identity()
    (11): Identity()
  )
  (non_linear): ReLU()
  (final): Linear(in_features=32, out_features=2, bias=True)
)
/content/botnet_detection/run_botnet.sh: line 3: 3960 Killed CUDA_VISIBLE_DEVICES=$gpu python /content/botnet_detection/train_botnet.py --devid 0 --data_dir ./data/botnet --data_name "$topo" --batch_size 2 --enc_sizes 32 32 32 32 32 32 32 32 32 32 32 32 --act relu --residual_hop 1 --deg_norm rw --final proj --epochs 50 --lr 0.005 --early_stop 1 --save_dir ./saved_models --save_name "$topo"_model_lay12_rh1_rw_ep50.pt


jzhou316 · 2022-08-29T14:57:39Z

Hi, it would be helpful if you could provide a more detailed error message, so that we can see exactly where the program errors out in the code.

TruongDuyLongPTIT · 2022-08-29T15:02:04Z

Hi, thanks for your reply, but there is no error message. It prints the output shown above and then stops.

TruongDuyLongPTIT · 2022-08-29T15:09:44Z

Do you provide a Jupyter notebook? I have spent 3 days trying to run this project without any result. I would be very grateful if you could provide one.

TruongDuyLongPTIT · 2022-08-29T17:36:23Z

https://github.com/TruongDuyLongPTIT/DoAnTotNghiepPTIT/blob/main/Untitled1.ipynb
You can see the error here.

jzhou316 · 2022-08-29T20:45:55Z

During installation, can you pin PyTorch Scatter to an earlier version with pip install torch-scatter==1.3.1? There is a known issue between the newest version and our code.

TruongDuyLongPTIT · 2022-08-30T02:42:00Z

When I install torch-scatter==1.3.1, this happens:
Traceback (most recent call last):
  File "/content/botnet_detection/train_botnet.py", line 12, in <module>
    from botdet.models_pyg.gcn_model import GCNModel
  File "/content/botnet_detection/botdet/models_pyg/gcn_model.py", line 4, in <module>
    from .gcn_base_models import NodeModelAdditive, NodeModelMLP
  File "/content/botnet_detection/botdet/models_pyg/gcn_base_models.py", line 5, in <module>
    from torch_geometric.nn.inits import glorot, zeros
  File "/usr/local/lib/python3.7/dist-packages/torch_geometric/__init__.py", line 2, in <module>
    import torch_geometric.nn
  File "/usr/local/lib/python3.7/dist-packages/torch_geometric/nn/__init__.py", line 2, in <module>
    from .data_parallel import DataParallel
  File "/usr/local/lib/python3.7/dist-packages/torch_geometric/nn/data_parallel.py", line 5, in <module>
    from torch_geometric.data import Batch
  File "/usr/local/lib/python3.7/dist-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/usr/local/lib/python3.7/dist-packages/torch_geometric/data/data.py", line 7, in <module>
    from torch_sparse import coalesce
  File "/usr/local/lib/python3.7/dist-packages/torch_sparse/__init__.py", line 40, in <module>
    from .storage import SparseStorage # noqa
  File "/usr/local/lib/python3.7/dist-packages/torch_sparse/storage.py", line 5, in <module>
    from torch_scatter import segment_csr, scatter_add
ImportError: cannot import name 'segment_csr' from 'torch_scatter' (/usr/local/lib/python3.7/dist-packages/torch_scatter/__init__.py)
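
The traceback shows torch_sparse trying to import segment_csr from torch_scatter and failing, which suggests the installed torch-sparse expects a different torch-scatter release than the one now installed. A quick way to see which versions are actually present is a small check like the one below. It is only a diagnostic sketch: the helper name installed_versions is made up for illustration, and importlib.metadata needs Python 3.8+ (on the Python 3.7 shown in the traceback, the importlib_metadata backport offers the same calls).

```python
from importlib import metadata  # Python 3.8+; on 3.7 use the importlib_metadata backport


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    names = ["torch", "torch-scatter", "torch-sparse", "torch-geometric"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Posting this output alongside the traceback makes it easier to tell which packages need to be pinned together.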

TruongDuyLongPTIT · 2022-08-30T03:16:45Z

I think the "Killed" happens because the process uses a lot of RAM. Is there a way to reduce RAM usage?

TruongDuyLongPTIT · 2022-08-30T04:43:31Z

When I run !dmesg, it shows that the process was killed because it ran out of memory. I have 25 GB of RAM and it is not enough. I think your code has a problem with how it loads data into RAM.
[ 5504.567657] Memory cgroup out of memory: Killed process 55997 (python3) total-vm:32736176kB, anon-rss:24958644kB, file-rss:76852kB, shmem-rss:12288kB, UID:0 pgtables:51400kB oom_score_adj:0
[ 5505.200038] oom_reaper: reaped process 55997 (python3), now anon-rss:0kB, file-rss:74732kB, shmem-rss:12288kB
[ 5565.818167] printk: dmesg (59088): Attempt to access syslog with CAP_SYS_ADMIN but no CAP_SYSLOG (deprecated).
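
To read the numbers in that log line more easily, the kB fields can be converted to GiB. The snippet below is a small sketch that parses the line quoted above. The helper names parse_oom_kb and kb_to_gib are made up for illustration, and the regular expression only assumes the "name:NUMBERkB" layout seen in this particular message.

```python
import re

# The OOM-killer line from the dmesg output above (timestamp dropped).
OOM_LINE = (
    "Memory cgroup out of memory: Killed process 55997 (python3) "
    "total-vm:32736176kB, anon-rss:24958644kB, file-rss:76852kB, "
    "shmem-rss:12288kB, UID:0 pgtables:51400kB oom_score_adj:0"
)


def parse_oom_kb(line):
    """Extract every 'name:<digits>kB' field as {name: kilobytes}."""
    return {key: int(value) for key, value in re.findall(r"([\w-]+):(\d+)kB", line)}


def kb_to_gib(kb):
    """Convert kilobytes (1024 bytes each) to GiB."""
    return kb / (1024 ** 2)


fields = parse_oom_kb(OOM_LINE)
print(f"anon-rss: {kb_to_gib(fields['anon-rss']):.1f} GiB")  # anon-rss: 23.8 GiB
print(f"total-vm: {kb_to_gib(fields['total-vm']):.1f} GiB")
```

The anon-rss value is the resident memory the Python process held when it was killed, about 23.8 GiB here, which is consistent with hitting a limit of roughly 25 GB.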

jzhou316 · 2022-08-30T16:18:05Z

The graph data is big, so you need more RAM. However, if RAM size is what causes the problem, you can load the data on demand instead of loading it all into RAM at once. This can be done by setting in_memory=False here (you could also use a smaller batch size). You may also find some of the discussions here helpful.
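
To illustrate the difference between loading everything up front and loading on demand, here is a minimal, self-contained sketch. It is not the repository's dataset class: GraphStore, write_toy_graphs, and the one-pickle-file-per-graph layout are assumptions made for the example, and only the in_memory flag name is borrowed from the reply above.

```python
import os
import pickle
import tempfile


class GraphStore:
    """Hypothetical dataset wrapper: one pickled graph per file on disk."""

    def __init__(self, paths, in_memory=True):
        self.paths = list(paths)
        self.in_memory = in_memory
        # Eager mode reads every graph now; lazy mode keeps only the file paths.
        self._cache = [self._read(p) for p in self.paths] if in_memory else None

    @staticmethod
    def _read(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        if self.in_memory:
            return self._cache[idx]
        # Lazy mode: read from disk on each access, nothing is retained here.
        return self._read(self.paths[idx])


def write_toy_graphs(directory, n_graphs=4):
    """Write a few tiny stand-in graphs and return their file paths."""
    paths = []
    for i in range(n_graphs):
        graph = {"x": [1.0] * (i + 2), "edge_index": [(0, 1)], "y": [0] * (i + 2)}
        path = os.path.join(directory, f"graph_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(graph, f)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = write_toy_graphs(tmp)
    lazy = GraphStore(paths, in_memory=False)
    eager = GraphStore(paths, in_memory=True)
    same = all(lazy[i] == eager[i] for i in range(len(lazy)))
    lazy_holds_data = lazy._cache is not None
    n_items = len(lazy)

print(same, lazy_holds_data, n_items)
```

Both modes return the same items. The lazy one trades extra disk reads per batch for a memory footprint that stays close to the size of the graphs currently in use, so it is typically slower per epoch but much less likely to trigger the OOM killer.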

TruongDuyLongPTIT · 2022-08-31T02:32:04Z

Many thanks for your support. I solved this problem by commenting out model.to(device) and then training with TPUs in Colab. The TPU in Colab has 35 GB of RAM (the GPU has just 25 GB), so training succeeded. But I don't understand the suggestion to use a smaller batch size, because the batch size in your code is just 2. I think that is the smallest batch size, so it cannot be decreased.
I also have another problem: even when training with the TPU, each epoch takes about 1 hour, so I cannot train all 50 epochs. Do you have any tips for training faster? (I tried batch sizes of 2, 8, and 16, but the time did not improve.)

TruongDuyLongPTIT · 2022-08-31T10:36:43Z

I have solved the training-time problem. When I train on the TPU with batch size = 64, each epoch takes about 45 minutes.
But when I train on the GPU instead of the TPU with batch_size = 16, each epoch takes only about 2 minutes.
So I guess PyTorch should only be trained on GPU (not TPU).

jzhou316 · 2022-09-03T03:55:03Z

Sounds good. I have never run it on TPU so your input is valuable; there might be something that needs to be customized to TPU as you suggested.

TruongDuyLongPTIT changed the title ~~2536 Killed~~ xxxx Killed Aug 29, 2022
