Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify training logic #34

Open
moritzknolle opened this issue May 26, 2023 · 0 comments
Open

Simplify training logic #34

moritzknolle opened this issue May 26, 2023 · 0 comments

Comments

@moritzknolle
Copy link
Collaborator

moritzknolle commented May 26, 2023

The training logic is quite complex, hard to maintain, and probably bug-prone.

Concretely, I believe not handling two cases for grad_accumulation=True/False separately when creating the train op in dptraining/utils/train_utils could help to somewhat mitigate this and contribute towards a simpler code base. I would suggest for the case of grad_acc=1 (i.e. grad_accumulation=False) to simply call calc_grads() and apply_grads on every step. This should have negligible run-time overhead and make the training logic a lot simpler and more readable (P.S. There are also a couple other areas that i think could benefit from simplification if that's possible, but will have to ponder some more).

What do you think? If i'm not too busy I'll try and tackle this next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant