Training with rois RuntimeError: Trying to resize storage that is not resizable #441

Open
mselimata opened this issue Jun 2, 2023 · 1 comment

@mselimata
Contributor

mselimata commented Jun 2, 2023

Hi,
I am testing training on regions of interest (ROIs) instead of all voxels, using the default training settings and a fully connected network. The prepare-data step ran successfully, but as soon as training started I got the error below. I did not save the ROI tensors; I used the default setting.
Environment: clinicadl version 1.3.1 (with the recent edits, in a separate conda environment), Python 3.8.

./trainDLroi.sh
17:58:14 - Find mask for roi rightHippocampusBox.
17:58:14 - Find mask for roi leftHippocampusBox.
17:58:14 - A new MAPS was created at /home/msa/ADCNDLroi
17:58:14 - Path of json file: /home/msa/ADCNDLroi/maps.json
17:58:16 - Training split 0
17:58:16 - Find mask for roi rightHippocampusBox.
17:58:16 - Find mask for roi leftHippocampusBox.
17:58:16 - Find mask for roi rightHippocampusBox.
17:58:16 - Find mask for roi leftHippocampusBox.
17:58:19 - Working on cuda:1
17:58:19 - Criterion for classification is CrossEntropyLoss()
17:58:19 - Beginning epoch 0.
Traceback (most recent call last):
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/bin/clinicadl", line 8, in <module>
sys.exit(cli())
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/msa/clinicadl-dev/clinicadl/train/tasks/classification_cli.py", line 79, in cli
task_launcher("classification", task_specific_options, **kwargs)
File "/home/msa/clinicadl-dev/clinicadl/train/tasks/task_utils.py", line 110, in task_launcher
train(Path(kwargs["output_maps_directory"]), train_dict, train_dict.pop("split"))
File "/home/msa/clinicadl-dev/clinicadl/train/train.py", line 15, in train
maps_manager.train(split_list=split_list, overwrite=erase_existing)
File "/home/msa/clinicadl-dev/clinicadl/utils/maps_manager/maps_manager.py", line 149, in train
self._train_single(split_list, resume=False)
File "/home/msa/clinicadl-dev/clinicadl/utils/maps_manager/maps_manager.py", line 669, in _train_single
self._train(
File "/home/msa/clinicadl-dev/clinicadl/utils/maps_manager/maps_manager.py", line 855, in _train
for i, data in enumerate(train_loader):
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
return self._process_data(data)
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
data.reraise()
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 265, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 128, in collate
return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 128, in <dictcomp>
return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/home/msa/miniconda3/envs/ClinicaDL-Dev/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable

@ncassereau
Contributor

ncassereau commented Aug 7, 2023

Hi,
I think it might be related to the dataset. It is likely that two samples in the dataset do not have exactly the same dimensions, so the collate function crashes when it tries to stack them into one batch. If you set num_workers to 0 in the dataloader (with clinicadl, that would be --n_proc 0), you might get a more detailed stack trace to help you pinpoint the issue.
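
If the stack trace alone does not identify the offending sample, one option is to scan the dataset once and group the sample indices by tensor shape. The snippet below is only a rough sketch of that idea, not ClinicaDL code: the helper name `group_indices_by_shape` and the `"image"` key are assumptions, and the toy dataset uses stand-in objects that carry nothing but a `.shape` so the example runs without PyTorch (a real `torch.Tensor` exposes the same attribute).

```python
from collections import defaultdict
from types import SimpleNamespace


def group_indices_by_shape(dataset, key="image"):
    """Map each shape found under `key` to the list of sample indices that have it.

    `dataset` only needs `len()` and integer indexing, and each sample is
    assumed to be a dict whose `key` entry has a `.shape` attribute.
    """
    shapes = defaultdict(list)
    for idx in range(len(dataset)):
        sample = dataset[idx]
        shapes[tuple(sample[key].shape)].append(idx)
    return dict(shapes)


# Hypothetical toy dataset: the third sample has a different ROI size.
toy_dataset = [
    {"image": SimpleNamespace(shape=(1, 50, 50, 50)), "label": 0},
    {"image": SimpleNamespace(shape=(1, 50, 50, 50)), "label": 1},
    {"image": SimpleNamespace(shape=(1, 50, 50, 49)), "label": 0},
]

shape_groups = group_indices_by_shape(toy_dataset)

# More than one distinct shape means default batching cannot stack the samples.
for shape, indices in shape_groups.items():
    print(shape, "->", indices)
```

On the real data, the same helper could be called on the dataset object that the training dataloader wraps. If more than one shape is reported, the indices listed under the rarer shape are the samples to inspect.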
