Loss is nan, stopping training #30

Open
JunLiangZ opened this issue Feb 27, 2024 · 11 comments

@JunLiangZ

During the training process, the problem of loss being nan occurred. Why is this?

@jasscia18

> During the training process, the problem of loss being nan occurred. Why is this?

I also ran into this problem. Have you solved it?

@radarFudan

Maybe try float32 and reduce the learning rate; BF16 can suffer from stability issues.
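For anyone who wants to try this, here is a minimal sketch of training in full float32 with a reduced learning rate and no AMP (the model and data below are placeholders, not the repo's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model and data; substitute the actual Vim model and ImageNet loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).cuda().float()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # reduced from e.g. 5e-4

for _ in range(10):  # stand-in for the real dataloader loop
    images = torch.randn(8, 3, 224, 224, device="cuda")   # float32 inputs, no bf16 cast
    targets = torch.randint(0, 1000, (8,), device="cuda")

    # No torch.autocast / GradScaler here: everything stays in float32.
    loss = F.cross_entropy(model(images), targets)

    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping is another common guard against NaN loss.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```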

@zhenyuZ-HUST

Making sure --if_amp is False seems to solve this problem. (Try setting --if_amp to False.)
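In case it helps, this is roughly what such a flag controls in a typical PyTorch AMP setup (a sketch only, not the repo's exact code; `if_amp` here mirrors the --if_amp argument):

```python
import contextlib
import torch

if_amp = False  # the equivalent of passing --if_amp False to the training script

# With AMP disabled, use a no-op context instead of bfloat16 autocast.
amp_ctx = (
    torch.autocast(device_type="cuda", dtype=torch.bfloat16)
    if if_amp
    else contextlib.nullcontext()
)

with amp_ctx:
    x = torch.randn(2, 3, device="cuda")
    y = (x @ x.T).sum()

print(y.dtype)  # torch.float32 when if_amp is False
```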

@sailor-z

Hi,
if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

@BranStarkkk

> Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

I also have this problem. Have you ever solved it?

@sailor-z

> Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?
>
> I also have this problem. Have you ever solved it?

Not really. It seems all vision mambas have the same problem.

@CacatuaAlan

Setting AMP=False may work, or just use a lower learning rate.

@BranStarkkk

> Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?
>
> I also have this problem. Have you ever solved it?
>
> Not really. It seems all vision mambas have the same problem.

I just changed the backbone from Vim to another vision mamba model, and it works...
Its name is VMamba.

@sailor-z

> Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?
>
> I also have this problem. Have you ever solved it?
>
> Not really. It seems all vision mambas have the same problem.
>
> I just changed the backbone from Vim to another vision mamba model, and it works... Its name is VMamba.

Thanks for the information! I'll look into it.

@mdchuc

mdchuc commented May 30, 2024

Got the same problem; fixed it by dividing the sum of the forward/backward hidden states by 2 so that the hidden states/residuals of all layers have similar magnitudes. Check out the details: #90
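For reference, a rough sketch of the kind of change described above (tensor names are illustrative, not the repo's variables; see #90 for the actual patch):

```python
import torch

# Illustrative forward/backward hidden states from a bidirectional block,
# shape (batch, seq_len, dim).
hidden_fwd = torch.randn(2, 196, 192)
hidden_bwd = torch.randn(2, 196, 192)

# A plain sum roughly doubles the magnitude of the hidden states/residuals at
# every layer, which can compound across depth and eventually overflow to NaN.
hidden_sum = hidden_fwd + hidden_bwd

# The fix described above: divide the sum by 2 so the combined hidden state
# keeps a magnitude similar to each single-direction output.
hidden_avg = (hidden_fwd + hidden_bwd) / 2

print(hidden_sum.abs().mean().item(), hidden_avg.abs().mean().item())
```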

@Karn3003

@mdchuc, do you have any idea why, in the code, they flip out_b along dim=-1? Shouldn't it be dim=1?
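For context on the question, a small illustration of plain torch.flip semantics on a (batch, seq_len, dim) token tensor (not a claim about which dim the repo should use):

```python
import torch

# Token sequence of shape (batch, seq_len, dim): 1 sample, 4 tokens, 3 channels.
x = torch.arange(12).reshape(1, 4, 3)

# dim=1 reverses the token order (what a backward scan over the sequence wants).
flip_seq = torch.flip(x, dims=[1])

# dim=-1 reverses the channels *within* each token; token order is unchanged.
flip_chan = torch.flip(x, dims=[-1])

print(x[0])          # tokens 0..3 in order
print(flip_seq[0])   # tokens 3..0
print(flip_chan[0])  # same token order, channels reversed
```

Note that if the tensor at that point in the code is laid out as (batch, dim, seq_len), then dim=-1 is the sequence dimension and flipping it would be the intended backward scan.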
