[P0] Fixing LoReFT rotation layer hot loading problem (#114) #123

frankaging · 2024-07-25T01:45:19Z

Description:

As reported in #114, LoReFT experiments on GLUE do not reproduce the paper's results; in fact, the evaluation results are far off. We suspect that loading the model from the best checkpoint is not working. The evidence:

- when running a standalone eval script, the evaluation results are stable and match the expected results;
- when running eval after loading the best checkpoint (either manually or through the HF trainer), the evaluation results do not match.

Thanks to @m-dev12, the problem appears to be that the loaded rotation layer weights are incorrect. The rotation layer is saved as a low-rank matrix to save disk space; when loading it back, we overwrite the corresponding columns of the rotation weight matrix. However, that overwrite does not actually take effect.

To resolve this, we modified the loading function inside the intervention so that the rotation weights are properly loaded.
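Below is a minimal sketch of one way such an overwrite can silently fail. It assumes the rotation's columns are kept orthonormal by PyTorch's `torch.nn.utils.parametrizations.orthogonal`; `LowRankRotation` and `make_rotation` are hypothetical names, not pyreft's classes, and this is an illustration rather than the code merged in this PR. With that parametrization, `.weight` is recomputed from hidden tensors on every access, so an in-place write lands in a temporary tensor. Assigning to `.weight` instead goes through the parametrization's `right_inverse`, which updates the hidden tensors.

```python
import torch
from torch.nn.utils.parametrizations import orthogonal


class LowRankRotation(torch.nn.Module):
    """Hypothetical d x r rotation whose columns are kept orthonormal."""

    def __init__(self, d: int, r: int):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(d, r))
        torch.nn.init.orthogonal_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight


def make_rotation(d: int, r: int) -> torch.nn.Module:
    # After this call, `.weight` is recomputed from hidden tensors on every access.
    return orthogonal(LowRankRotation(d, r))


torch.manual_seed(0)
d, r = 8, 2
saved_w = make_rotation(d, r).weight.detach().clone()  # what a checkpoint would hold

# Naive load: `.weight` is a freshly computed tensor, so this write is discarded.
naive = make_rotation(d, r)
naive.weight.data[:, :r] = saved_w
naive_ok = torch.allclose(naive.weight, saved_w, atol=1e-5)

# Parametrization-aware load: assigning to `.weight` runs right_inverse,
# which updates the hidden tensors so the recomputed weight equals saved_w.
fixed = make_rotation(d, r)
fixed.weight = saved_w
fixed_ok = torch.allclose(fixed.weight, saved_w, atol=1e-5)

print(naive_ok, fixed_ok)  # False True
```

Checking the reloaded weight against the saved tensor with `torch.allclose` right after loading, as above, catches this class of bug before any evaluation runs.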

frankaging · 2024-07-25T01:47:03Z

This change also fixes a label dtype issue in the GLUE trainer and a minor package-dependency issue. Test logs:
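The PR does not describe the dtype fix itself, so the following is only a generic, hypothetical sketch (`glue_labels_to_tensor` is an illustrative name). As we understand Hugging Face's `*ForSequenceClassification` heads, when `problem_type` is unset they pick the loss from `num_labels` and the label dtype: one label means regression (MSE), integer labels with several classes mean cross-entropy, and float labels with several classes fall through to multi-label BCE. A wrong label dtype can therefore change the loss silently instead of raising an error.

```python
import torch


def glue_labels_to_tensor(labels, num_labels: int) -> torch.Tensor:
    """Hypothetical helper: choose the label dtype from the task type.

    STS-B is a regression task (num_labels == 1) and needs float targets
    for MSE loss; the other GLUE tasks use integer class indices.
    """
    dtype = torch.float32 if num_labels == 1 else torch.long
    return torch.tensor(labels, dtype=dtype)


stsb = glue_labels_to_tensor([3.8, 0.5, 5.0], num_labels=1)
cola = glue_labels_to_tensor([1, 0, 1], num_labels=2)
print(stsb.dtype, cola.dtype)  # torch.float32 torch.int64
```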

Command:

python train.py -task glue -train_dataset stsb -model FacebookAI/roberta-base -seed 42 -l all -r 1 -p f3 -e 5 -lr 8e-3 -type LoreftIntervention -gradient_accumulation_steps 1 -batch_size 32 -eval_batch_size 32 -test_split validation -max_length 256 --metric_for_best_model pearson --dropout 0.00 --weight_decay 0.0000 --warmup_ratio 0.06 --logging_steps 20 --allow_cls_grad

Before:

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 40.43it/s]
{'validation_pearson': 0.14262418203758725, 'validation_spearmanr': 0.13899118144358605, 'validation_combined_score': 0.14080768174058667}
Training results can be found in ./official_results/roberta-base.glue.stsb.validation.20240724161727112780

After:

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 30.21it/s]
{'validation_pearson': 0.8391224830717346, 'validation_spearmanr': 0.8385444933269972, 'validation_combined_score': 0.838833488199366}
Training results can be found in ./official_results/roberta-base.glue.stsb.validation.20240724183608588062

frankaging · 2024-07-25T04:04:40Z

Reproducing one of the paper's results (STS-B) to further validate this change:

Running command:

$ python train.py -task glue -train_dataset stsb -model FacebookAI/roberta-base -seed 45 -l all -r 1 -p f3 -e 60 -lr 6e-4 -type LoreftIntervention -gradient_accumulation_steps 1 -batch_size 32 -eval_batch_size 32 -test_split validation -max_length 256 --metric_for_best_model pearson --dropout 0.05 --weight_decay 0.0000 --warmup_ratio 0.03 --logging_steps 20 --allow_cls_grad

Results:

{'loss': 0.277, 'grad_norm': 3.5031702518463135, 'learning_rate': 1.145475372279496e-06, 'epoch': 59.89}
{'loss': 0.2966, 'grad_norm': 6.497849464416504, 'learning_rate': 0.0, 'epoch': 60.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 38.08it/s]
{'eval_pearson': 0.9022589082956544, 'eval_spearmanr': 0.9016221832172824, 'eval_combined_score': 0.9019405457564684, 'epoch': 60.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10800/10800 [10:14<00:00, 19.58it/s]
Directory './official_results/roberta-base.glue.stsb.validation.20240724203908668458/checkpoint-10800/intervenable_model' created successfully.
Loading best model from ./official_results/roberta-base.glue.stsb.validation.20240724203908668458/checkpoint-8280 (score: 0.9030545788928331).
{'train_runtime': 614.9703, 'train_samples_per_second': 560.905, 'train_steps_per_second': 17.562, 'train_loss': 0.48447934751157407, 'epoch': 60.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10800/10800 [10:14<00:00, 17.56it/s]
{'n_params': 18444}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 37.89it/s]
{'validation_pearson': 0.9017251334987769, 'validation_spearmanr': 0.8985514125829985, 'validation_combined_score': 0.9001382730408878}
Training results can be found in ./official_results/roberta-base.glue.stsb.validation.20240724203908668458
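The logged `{'n_params': 18444}` is consistent with a simple count. This relies on our own assumption, which the PR does not state: one LoReFT intervention per RoBERTa-base layer (`-l all`, so 12 layers at hidden size 768) with rank 1 (`-r 1`), each holding a 768 × 1 rotation plus a `Linear(768, 1)` projection with bias. The function name below is hypothetical.

```python
def loreft_params_per_layer(hidden_size: int, rank: int) -> int:
    """Hypothetical per-layer parameter count for one LoReFT intervention."""
    rotation = hidden_size * rank            # low-rank rotation R (hidden_size x rank)
    projection = hidden_size * rank + rank   # Linear(hidden_size, rank): weight + bias
    return rotation + projection


num_layers = 12    # RoBERTa-base with -l all (assumption)
hidden_size = 768  # RoBERTa-base hidden size
rank = 1           # -r 1

n_params = num_layers * loreft_params_per_layer(hidden_size, rank)
print(n_params)  # 18444
```

Under this assumption the count covers only the intervention parameters: 12 × (768 + 768 + 1) = 18444.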

[P0] Fixing LoReFT rotation layer hot loading problem

7510aa5

remove logging and clean up

5bee6b8

frankaging merged commit deb3d83 into main Jul 25, 2024

frankaging mentioned this pull request Jul 25, 2024

[P1] Eval time model is not loaded: Unable to replicate results from paper for RoBERTa Base for Glue tasks like CoLa #114

Open
