In DifferentiableAdam, sqrt() is non-differentiable at zero #125
Comments
Forgot to mention, anomaly detection needs to be enabled to see the runtime error.
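For reference, the standard PyTorch call to enable it (shown here as a reminder, not quoted from the original comment):

```python
import torch

# With anomaly detection on, autograd raises a RuntimeError naming the
# backward function that produced the NaN (the sqrt backward in this case),
# instead of silently propagating NaNs into the gradients.
torch.autograd.set_detect_anomaly(True)
```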
@rickyloynd-microsoft I'm facing the same problem. Did you solve it without adding a small value?
Yes, the solution above (adding a small value before the sqrt) has been working for me. Although I'm having another issue now with Adam's internal state not getting persisted (and detached) between rollouts (#114).
Can you tell me how to solve the problem? |
When using higher with Adam as the inner optimizer, calling the outer loss.backward() sometimes raises a torch runtime error.
The problem occurs when the exp_avg_sq tensor in DifferentiableAdam contains a zero, in which case exp_avg_sq.sqrt() is non-differentiable.
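To make this concrete, here is a minimal self-contained sketch (not higher's actual code, just an imitation of the Adam denominator with both moment buffers at exactly zero) that hits the same failure:

```python
import torch

torch.autograd.set_detect_anomaly(True)

# Stand-ins for Adam's moment buffers; both are exactly zero for a parameter
# whose gradient has been zero so far.
exp_avg = torch.tensor(0.0, requires_grad=True)
exp_avg_sq = torch.tensor(0.0, requires_grad=True)
eps = 1e-8

# Same form as Adam's update direction, m / (sqrt(v) + eps).
step = exp_avg / (exp_avg_sq.sqrt() + eps)

# d/dv sqrt(v) = 1 / (2 * sqrt(v)) is infinite at v = 0, and the gradient
# flowing back into the sqrt is zero (because exp_avg is zero), so the sqrt
# backward produces 0 * inf = NaN. With anomaly detection enabled, this
# backward() raises a RuntimeError pointing at the sqrt; without it,
# exp_avg_sq.grad silently becomes NaN.
step.backward()
```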
The problem disappears when I add a tiny value to exp_avg_sq before applying the sqrt().
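The exact edit isn't quoted here; in terms of the sketch above, the change amounts to something like this (illustrative only; the constant is an arbitrary choice, not a value from the thread):

```python
# Continues the sketch above (exp_avg, exp_avg_sq, eps defined there).
tiny = 1e-16  # arbitrary illustrative constant

# Keeping sqrt()'s input strictly positive keeps its backward, 1 / (2 * sqrt(v)),
# finite, so the 0 * inf -> NaN pattern goes away, at the cost of a slight
# increase in the denominator (at most sqrt(tiny) = 1e-8 here).
step = exp_avg / ((exp_avg_sq + tiny).sqrt() + eps)
step.backward()  # no longer raises under anomaly detection; gradients are finite
```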
But I don’t know whether this would cause other problems.
Is the _maybe_mask function designed to deal with zeros in exp_avg_sq?
I’m using the latest pip-installed version (higher==0.2.1).