Loss is nan, stopping training #30
Comments
I'm also running into this problem. Have you solved it? (translated from Chinese: 我也出现了这个问题,请问你解决了吗)
Maybe try float32 and reduce the learning rate; BF16 can suffer from stability issues.
Making sure --if_amp is False seems to solve this problem. (translated from Chinese; i.e. try setting --if_amp to False)
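For reference, here is a minimal PyTorch sketch of the two suggestions above (disable AMP / run in float32 and lower the learning rate). The model, loss, and hyperparameter values are placeholders for illustration, not the repo's actual training script; only the `--if_amp` flag name comes from this thread.

```python
import torch

use_amp = False          # corresponds to passing --if_amp False (train in float32)
base_lr = 5e-4
lr = base_lr * 0.1       # "just set lower lr" -- value chosen arbitrarily here

model = torch.nn.Linear(192, 1000).cuda()      # stand-in for the Vim backbone
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def train_step(images, targets):
    optimizer.zero_grad(set_to_none=True)
    # autocast is a no-op when enabled=False, so everything stays in float32
    with torch.cuda.amp.autocast(enabled=use_amp, dtype=torch.bfloat16):
        loss = criterion(model(images), targets)
    if not torch.isfinite(loss):
        raise RuntimeError("Loss is nan/inf, stopping training")
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```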
Hi, I also have this problem. Have you ever managed to solve it?
Not really. It seems all vision Mamba variants have the same problem.
Setting AMP to False may work, or just use a lower learning rate.
I just changed the backbone from Vim to another vision Mamba model, and it works...
Thanks for the information! I'll look into it. |
Got the same problem; fixed it by dividing the sum of the forward/backward hidden states by 2, so that the hidden states/residuals of all layers have a similar magnitude. Check out the details in #90.
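For anyone else hitting this, a rough sketch of the idea (not the actual patch from #90; `ssm_forward` and `ssm_backward` are placeholder callables for the two directional branches): average the forward and backward scan outputs instead of summing them, so the residual stream's magnitude does not grow layer by layer.

```python
import torch

def bidirectional_mix(x, ssm_forward, ssm_backward):
    """Combine the two scans of a bidirectional Mamba-style block.

    x: (batch, seq_len, dim). ssm_forward / ssm_backward are placeholders
    standing in for the forward and backward SSM branches.
    """
    out_f = ssm_forward(x)
    # scan the reversed sequence, then flip the result back to original order
    out_b = ssm_backward(x.flip(dims=[1])).flip(dims=[1])

    # Original behaviour (reported to blow up over many layers):
    #   out = out_f + out_b
    # Fix discussed above: average instead of sum, so the combined hidden
    # state keeps roughly the same magnitude as each single branch.
    out = (out_f + out_b) / 2
    return out
```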
@mdchuc, do you have any idea why, in the code, they are flipping out_b along dim=-1? Shouldn't it be dim=1?
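To make the dimension question concrete, a tiny self-contained example, assuming the usual (batch, seq_len, dim) layout (an assumption about the tensor being discussed, not taken from the repo): flipping along dim=1 reverses the token order, while flipping along dim=-1 reverses the channel order within each token.

```python
import torch

# Assumed (batch, seq_len, dim) layout: 2 samples, 4 tokens, 3 channels.
out_b = torch.arange(2 * 4 * 3).reshape(2, 4, 3).float()

flipped_seq = out_b.flip(dims=[1])    # reverses the 4 tokens (sequence order)
flipped_chan = out_b.flip(dims=[-1])  # reverses the 3 channels of every token

print(flipped_seq[0, :, 0])   # tensor([9., 6., 3., 0.])  -> token order reversed
print(flipped_chan[0, 0, :])  # tensor([2., 1., 0.])      -> channel order reversed
```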
During the training process, the loss becomes NaN. Why does this happen?