multi-GPU training problem #6
Comments
Can you please provide your config file? |
OK. In config.py, I just modified _C.TEST.VIS = True. In debug_multi-gpu.yml, the configuration is shown as follows: INPUT: DATASETS: DATALOADER: SOLVER: STEPS: [40,70] WARMUP_FACTOR: 0.01 CHECKPOINT_PERIOD: 10 TEST: OUTPUT_DIR: "/home/wl/.pytorch_project/person_reid/reid_baseline_with_syncbn-master/outputs/20190925" |
TRAIN_PATH, QUERY_PATH and GALLERY_PATH should each point to the folder that contains the images. |
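If it helps, here is a quick sanity check you could run on each of those three paths before launching training. This is only a sketch and not part of the repository: the helper name check_image_dir and the list of extensions are my own assumptions, so adjust them to your dataset.

```python
import os

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def check_image_dir(path):
    """Return the number of image files directly inside `path`.

    Raises ValueError if `path` is not an existing directory.
    (Hypothetical helper, not part of the repository.)
    """
    if not os.path.isdir(path):
        raise ValueError(f"{path!r} is not a directory")
    # Count only files whose name ends with one of the assumed extensions.
    return sum(
        1 for name in os.listdir(path)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
```

Calling it on the values you set for TRAIN_PATH, QUERY_PATH and GALLERY_PATH should give a non-zero count for each; a ValueError or a count of 0 would suggest the path is not pointing at the image folder.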
Yes, I know. I modified data.py according to my requirements, so I don't think the problem is there. |
Can you post your data.py? |
import torch
def read_image(img_path):
class ImageDataset(Dataset):
class BaseDataset:
def init_dataset(cfg): |
Hi, when I decreased IMS_PER_BATCH from 128 to 64, the model started training. |
Hi, I have another question. |
@LilySys Did reducing the batch size solve this problem? And did you see any decrease in test accuracy after training? |
Hi, I also ran into this problem. I changed the batch size to 32, but it still doesn't work. |
Hi, when I trained the model with multi-GPU training, the model didn't start training even after more than 30 minutes, and I don't know why. Could you give me some suggestions? Thank you!
2019-09-25 14:56:36,708 reid_baseline.train INFO: More than one gpu used, convert model to use SyncBN.
2019-09-25 14:56:40,504 reid_baseline.train INFO: Using pytorch SyncBN implementation
2019-09-25 14:56:40,535 reid_baseline.train INFO: Trainer Built
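Since the log stops right after "Trainer Built", one generic way to see where the process is stuck is Python's standard faulthandler module, which can dump the stack of every thread after a timeout. The sketch below is not from the repository; the helper names enable_hang_dump and disable_hang_dump and the 300-second default are assumptions for illustration.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s=300, file=sys.stderr):
    """Dump the tracebacks of all threads every `timeout_s` seconds.

    Hypothetical helper: call it once before training starts. The dump
    repeats until disable_hang_dump() is called.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def disable_hang_dump():
    """Cancel the pending dump, e.g. once the first iteration has finished."""
    faulthandler.cancel_dump_traceback_later()
```

If training hangs, the dumped tracebacks should show which call each thread is blocked in (for example data loading versus a multi-GPU synchronization step), which would help narrow down the cause.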