[bug 🐛] weight decay incorrectly applied to LayerNorm and Mamba A, D parameters
Description
The current implementation adds weight decay to all model parameters. However:

1. The custom Mamba parameters `A_log` and `D` should not be decayed:

   mad-lab/mad/model/layers/mamba.py, line 116 in 69d09e2

2. Normalization layers (e.g. LayerNorm) should not be decayed either, following standard practice in language modelling.
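For context, the official Mamba implementation marks exactly these parameters with a `_no_weight_decay` attribute so that the optimizer setup can skip them. A minimal sketch of that pattern (the class, dimensions, and initialization here are illustrative, not the mad-lab code):

```python
import torch
import torch.nn as nn

class MambaLike(nn.Module):
    """Illustrative SSM block defining the A_log and D parameters."""

    def __init__(self, d_inner: int = 64, d_state: int = 16):
        super().__init__()
        # A is kept in log space (S4D-real init). Weight decay would pull
        # A_log toward zero and silently change the learned state dynamics.
        A = torch.arange(1, d_state + 1, dtype=torch.float32).repeat(d_inner, 1)
        self.A_log = nn.Parameter(torch.log(A))
        self.A_log._no_weight_decay = True  # flag checked when building param groups

        # D is a skip-connection gain; decaying it toward zero is not meaningful.
        self.D = nn.Parameter(torch.ones(d_inner))
        self.D._no_weight_decay = True
```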
Fix
I think 1. is the more crucial issue, but we should also address 2. to reflect standard practice in language modelling.
#6 implements a fix for both.
It creates two different param_groups for parameters with and without weight decay (see 5ab076a):
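The snippet from the PR is embedded on GitHub and not reproduced here; the following is a minimal sketch of that grouping logic, assuming the usual PyTorch pattern (the exact predicate in #6 may differ):

```python
import torch

def build_optimizer(model: torch.nn.Module, lr: float, weight_decay: float):
    """Create AdamW with separate decay / no-decay parameter groups."""
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Skip decay for params flagged by the module (Mamba's A_log and D),
        # for biases, and for normalization-layer parameters.
        if (
            getattr(param, "_no_weight_decay", False)
            or name.endswith(".bias")
            or "norm" in name.lower()
        ):
            no_decay.append(param)
        else:
            decay.append(param)
    return torch.optim.AdamW(
        [
            {"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0},
        ],
        lr=lr,
    )
```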
To distinguish normalization layers from other modules, I had to give them a name in the model initialization.
This is achieved by replacing the anonymous construction of the normalization module with an explicitly named attribute:
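The original before/after snippets are embedded from the commit and not shown here; a hypothetical illustration of the idea (module and attribute names are assumptions, not the actual mad-lab code):

```python
import torch.nn as nn

# Before: the norm is created anonymously inside an nn.Sequential, so its
# parameters only get positional names such as "layers.0.weight", which are
# hard to match reliably when building the no-decay param group.
block_before = nn.Sequential(nn.LayerNorm(64), nn.Linear(64, 64))

# After: the norm lives under a named attribute, so its parameters appear
# as "norm.weight" / "norm.bias" in model.named_parameters().
class Block(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(self.norm(x))
```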