Traceback (most recent call last):
  File "train.py", line 202, in <module>
    lr=args.lr, device=device, img_scale=args.scale, val_percent=args.val / 100)
  File "train.py", line 95, in train_net
    scaled_loss.backward()
  File "D:\ProgramData\Anaconda3\envs\UNet3plus\lib\contextlib.py", line 119, in __exit__
    next(self.gen)
  File "D:\ProgramData\Anaconda3\envs\UNet3plus\lib\site-packages\apex\amp\handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "D:\ProgramData\Anaconda3\envs\UNet3plus\lib\site-packages\apex\amp\_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "D:\ProgramData\Anaconda3\envs\UNet3plus\lib\site-packages\apex\amp\_process_optimizer.py", line 135, in post_backward_models_are_masters
    scale_override=(grads_have_scale, stashed_have_scale, out_scale))
  File "D:\ProgramData\Anaconda3\envs\UNet3plus\lib\site-packages\apex\amp\scaler.py", line 183, in unscale_with_stashed
    out_scale/grads_have_scale,
ZeroDivisionError: float division by zero
This error appeared after only 2 epochs of training. I read online that lowering the learning rate by an order of magnitude should fix it, so I changed lr from 0.01 to 0.001, but the same error came back at epoch 26.
Is there another way to eliminate this error? And what causes it in the first place? Thanks.
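For context, the ZeroDivisionError is raised at `out_scale/grads_have_scale`, which means apex's current loss scale (`grads_have_scale`) has been driven down to zero by the dynamic loss scaler. Below is a minimal, hypothetical sketch (not the repo's actual train.py; the Conv2d model, random tensors, RMSprop optimizer, and variable names are placeholders) showing two `amp.initialize()` options that keep the loss scale from collapsing: a `min_loss_scale` floor, or a static `loss_scale`. It assumes apex is installed and a CUDA device is available.

```python
import torch
import torch.nn as nn
from apex import amp

device = torch.device("cuda")
# Placeholder model and optimizer standing in for UNet3+ and the real settings.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1).to(device)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

# Option 1: keep dynamic loss scaling, but forbid the scale from going below 1.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1",
                                  min_loss_scale=1.0)

# Option 2 (alternative): use a fixed loss scale so the scaler never adjusts it.
# model, optimizer = amp.initialize(model, optimizer, opt_level="O1",
#                                   loss_scale=128.0)

for step in range(10):
    # Random stand-ins for the real image/mask batches.
    imgs = torch.randn(2, 3, 64, 64, device=device)
    masks = torch.rand(2, 1, 64, 64, device=device)

    optimizer.zero_grad()
    loss = criterion(model(imgs), masks)
    # Same pattern as train.py line 95 in the traceback above.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```

If the scale still collapses even with a floor, the loss itself is likely becoming NaN/Inf, so it is worth checking the loss value each step as well.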