Simplify training logic #34

moritzknolle · 2023-05-26T14:29:11Z

The training logic is quite complex, hard to maintain, and probably bug-prone.

Concretely, I think we could mitigate this and simplify the code base by no longer handling the grad_accumulation=True and grad_accumulation=False cases separately when creating the train op in dptraining/utils/train_utils. For grad_acc=1 (i.e. grad_accumulation=False), I would suggest simply calling calc_grads() and apply_grads() on every step. This should have negligible run-time overhead and make the training logic a lot simpler and more readable. (P.S. There are also a couple of other areas that I think could benefit from simplification, if that's possible, but I'll have to ponder them some more.)
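To make the idea concrete, here is a rough sketch of a single train step that treats grad_acc=1 as an ordinary instance of accumulation rather than a separate branch. It is not the actual code in dptraining/utils/train_utils: `make_train_step`, the signatures I give `calc_grads` and `apply_grads`, and the toy `toy_calc_grads` / `toy_apply_grads` helpers are all assumptions made up for illustration, and it ignores anything DP-specific such as clipping or noise.

```python
from typing import Callable, List, Sequence

Params = List[float]
Grads = List[float]


def make_train_step(
    calc_grads: Callable[[Params, Sequence[float]], Grads],
    apply_grads: Callable[[Params, Grads], Params],
    grad_acc: int = 1,
) -> Callable[[Params, Sequence[float]], Params]:
    """Build one train step that works for any grad_acc >= 1 (hypothetical helper)."""
    if grad_acc < 1:
        raise ValueError("grad_acc must be >= 1")

    accumulated: Grads = []
    micro_steps = 0

    def train_step(params: Params, batch: Sequence[float]) -> Params:
        nonlocal accumulated, micro_steps
        # Always compute gradients; there is no special case for grad_acc == 1.
        grads = calc_grads(params, batch)
        if micro_steps == 0:
            accumulated = list(grads)
        else:
            accumulated = [a + g for a, g in zip(accumulated, grads)]
        micro_steps += 1
        # With grad_acc == 1 this condition holds on every call,
        # so gradients are applied on every step.
        if micro_steps == grad_acc:
            mean_grads = [a / grad_acc for a in accumulated]
            params = apply_grads(params, mean_grads)
            micro_steps = 0
        return params

    return train_step


# Toy stand-ins (assumptions): fit a scalar w to the batch mean with
# loss = mean((w - x)^2), whose gradient is 2 * (w - mean(x)).
def toy_calc_grads(params: Params, batch: Sequence[float]) -> Grads:
    mean_x = sum(batch) / len(batch)
    return [2.0 * (params[0] - mean_x)]


def toy_apply_grads(params: Params, grads: Grads, lr: float = 0.1) -> Params:
    # Plain SGD update.
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    step = make_train_step(toy_calc_grads, toy_apply_grads, grad_acc=1)
    print(step([0.0], [1.0, 3.0]))
```

The only extra work for grad_acc=1 compared to a dedicated non-accumulating path is one list copy and one division by 1 per step, which is why I'd expect the overhead to be negligible.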

What do you think? If I'm not too busy, I'll try to tackle this next week.
