This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

About the learning rate for resnet-50 #34

Open
cswaynecool opened this issue Oct 8, 2022 · 1 comment

Comments

@cswaynecool

I ran into an issue training ResNet-50 with MoCo v3. Under a distributed setting with 16 V100 GPUs (one GPU per process, total batch size 4096), the training loss is about 27.2 at the 100th epoch. When I lower the learning rate to 1.5e-4 (the default is 0.6), the loss decreases more reasonably and reaches 27.0 at the 100th epoch. Could you please verify whether this is expected?
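For context, MoCo v3's reference training script scales the base learning rate linearly with the global batch size (lr × batch_size / 256), so the effective rate at batch size 4096 is much larger than the nominal 0.6. A minimal sketch of that scaling rule (the function name is illustrative, not from the repo):

```python
def effective_lr(base_lr: float, batch_size: int) -> float:
    """Linear LR scaling rule used in MoCo v3-style training:
    scale the base learning rate by (global batch size / 256)."""
    return base_lr * batch_size / 256

# With the settings from this issue:
print(effective_lr(0.6, 4096))     # → 9.6 (effective LR for the default 0.6)
print(effective_lr(1.5e-4, 4096))  # → 0.0024 (effective LR after lowering)
```

This is why changing the nominal learning rate from 0.6 to 1.5e-4 is such a large change in practice: the effective rate drops from 9.6 to 0.0024.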

@cswaynecool
Author

It seems that training barely converges under the default learning rate.
