Hi, I found that the initialization of the parameters in the Pythia-6.9B model is inconsistent with the standard deviations of the step0 checkpoint. Table 6 in the paper lists small-init as the init-method and wang-init as the output-layer-init-method, but I computed different std values from the step0 model.
Inconsistent std values:
input_layer_std: 0.009882117688026186 (small_init) vs. 0.02 (std computed from the step0 model parameters)
output_layer_std: 0.0009765625 (wang_init) vs. 0.0025 (std computed from the step0 model parameters)
Could you provide the real init method? Thanks!
Config from Table 6 of the paper:
Here is a reproducible script with its results.
import math
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-6.9b",
    revision="step0",
)
model_dim = 4096  # Pythia-6.9b hidden size

# Compute the expected std values of the two init methods.
# Reference: https://github.com/EleutherAI/gpt-neox/blob/v1.0/megatron/model/init_functions.py#L101-L118
small_init_std = (2 / (5 * model_dim)) ** 0.5
wang_init_std = 2 / (32 * math.sqrt(model_dim))
print('small_init_std:', small_init_std)
print('wang_init_std:', wang_init_std)

# Print the empirical std of every parameter in the step0 checkpoint.
for n, p in model.named_parameters():
    print(n, p.shape, p.std().item())
No initialization method was set for these models. I don't think the standard deviations are intentional; they are probably more an artifact of the lack of a proper init method.
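For what it's worth, the observed numbers are consistent with Megatron/GPT-NeoX-style defaults rather than the Table 6 configs: 0.02 is the default normal init std, and 0.0025 equals the scaled residual-output init 0.02 / sqrt(2·L) with L = 32 layers for Pythia-6.9B. This is only an arithmetic observation, not a confirmed init method; a minimal check:

```python
import math

# Assumed defaults (not confirmed by the paper): Megatron-style init
default_std = 0.02  # common default normal-init std
num_layers = 32     # Pythia-6.9B depth

# Residual-output layers are often scaled by 1 / sqrt(2 * num_layers)
scaled_output_std = default_std / math.sqrt(2 * num_layers)

print('input std :', default_std)        # 0.02, matches the step0 input layer
print('output std:', scaled_output_std)  # 0.0025, matches the step0 output layer
```

If these defaults were in effect, both reported discrepancies (0.02 and 0.0025) are explained at once.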