The training logic is quite complex, hard to maintain, and probably bug-prone.
Concretely, I believe that no longer handling the two cases `grad_accumulation=True`/`False` separately when creating the train op in `dptraining/utils/train_utils` could help mitigate this somewhat and move us towards a simpler code base. For the case of grad_acc=1 (i.e. `grad_accumulation=False`), I would suggest simply calling `calc_grads()` and `apply_grads` on every step. This should have negligible run-time overhead and make the training logic much simpler and more readable. (P.S. There are also a couple of other areas that I think could benefit from simplification, if possible, but I'll have to ponder them some more.)
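To illustrate the idea, here is a minimal, framework-free sketch of a single code path in which grad_acc=1 is just the degenerate case of accumulation. `TrainStep` is hypothetical, and `calc_grads`/`apply_grads` are stand-ins whose signatures are assumed here, not taken from `dptraining`. It also assumes accumulated gradients are averaged before being applied; a DP implementation may need to combine them differently (e.g. where noise is added).

```python
from typing import Callable, List, Optional

Grads = List[float]


class TrainStep:
    """Hypothetical unified train step: one code path for any grad_acc >= 1.

    `calc_grads` and `apply_grads` are assumed stand-ins for the functions
    named in the issue; their signatures here are illustrative only.
    """

    def __init__(
        self,
        calc_grads: Callable[[object], Grads],
        apply_grads: Callable[[Grads], None],
        accumulation_steps: int = 1,
    ) -> None:
        if accumulation_steps < 1:
            raise ValueError("accumulation_steps must be >= 1")
        self.calc_grads = calc_grads
        self.apply_grads = apply_grads
        self.accumulation_steps = accumulation_steps
        self._acc: Optional[Grads] = None
        self._count = 0

    def __call__(self, batch: object) -> bool:
        """Process one batch; return True if an update was applied."""
        grads = self.calc_grads(batch)
        # Sum gradients element-wise into the accumulator.
        if self._acc is None:
            self._acc = list(grads)
        else:
            self._acc = [a + g for a, g in zip(self._acc, grads)]
        self._count += 1
        if self._count < self.accumulation_steps:
            return False
        # Average over the accumulated batches (an assumption), apply once, reset.
        mean = [a / self.accumulation_steps for a in self._acc]
        self.apply_grads(mean)
        self._acc, self._count = None, 0
        return True
```

With `accumulation_steps=1`, every call computes gradients and immediately applies them, so no separate non-accumulating branch is needed.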
What do you think? If I'm not too busy, I'll try to tackle this next week.