NaN value in the loss. #1

mountains-high · 2024-01-24T04:44:59Z

Hi, thanks for the fine work.
I didn't change any hyperparameters and ran the code as is. I'm getting NaN in the loss. Could you please help me solve this issue?
Thanks.

hchoi71 · 2024-01-24T20:00:36Z

Hello, thanks for bringing up this issue.

I haven't seen this when running the code myself, but I appreciate you reporting it. I suspect that training becomes unstable when 'stds' and 'stdt' are very small.

Unfortunately, I can't investigate and fix it right away because of other ongoing tasks; I will update the code once I finish my current workload. In the meantime, could you try clamping stds and stdt in MIXSTD.py and let me know whether that works?

stdt = torch.clamp(torch.std(logit_t, dim=-1, keepdim=True), min=1e-4)
stds = torch.clamp(torch.std(logit_s, dim=-1, keepdim=True), min=1e-4)

Also, which model configuration causes this NaN?
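
For anyone following along, here is a minimal sketch of why a lower bound on the standard deviation might help. It assumes the loss standardizes logits by dividing by their per-row standard deviation; that is a guess about what MIXSTD.py does, not something confirmed in this thread. The helper `standardize` and its `min_std` parameter are hypothetical names used only for illustration.

```python
import torch


def standardize(logits, min_std=None):
    """Z-score logits along the last dim (hypothetical helper, not from MIXSTD.py).

    If min_std is given, the standard deviation is clamped from below,
    mirroring the torch.clamp(..., min=1e-4) suggestion above.
    """
    mean = logits.mean(dim=-1, keepdim=True)
    std = torch.std(logits, dim=-1, keepdim=True)
    if min_std is not None:
        std = torch.clamp(std, min=min_std)
    return (logits - mean) / std


# A row of identical logits has a standard deviation of exactly zero.
flat_logits = torch.full((1, 5), 2.0)

unclamped = standardize(flat_logits)               # computes 0 / 0
clamped = standardize(flat_logits, min_std=1e-4)   # computes 0 / 1e-4

print("NaN without clamp:", torch.isnan(unclamped).any().item())
print("NaN with clamp:   ", torch.isnan(clamped).any().item())
```

If the NaN comes from somewhere else (for example an overflow in an exponential or a log of zero), clamping the standard deviation alone would not remove it, so it is worth checking which tensor first becomes non-finite.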
