Hi,

I've recently been trying to run lm-eval on the Pythia models using the benchmarks listed in the paper. All of the benchmarks give results similar to those reported, except WSC: the paper reports WSC scores of roughly 0.3–0.5 for the Pythia models, while the same models easily reach 0.6–0.8 accuracy on the WSC273 task in lm-eval. Could you confirm which WSC task the paper reports, and how it was evaluated?
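For reference, a command along these lines reproduces the comparison between the two WSC variants (a minimal sketch against the current lm-evaluation-harness CLI; the model and batch size here are placeholders, and task names may differ between harness versions):

```bash
# Evaluate one Pythia checkpoint on both WSC variants.
# "wsc" is the SuperGLUE WSC task; "wsc273" is the standalone
# 273-example Winograd Schema Challenge set. The two are scored
# differently, which may account for the gap with the paper's numbers.
lm_eval --model hf \
  --model_args pretrained=EleutherAI/pythia-1.4b \
  --tasks wsc,wsc273 \
  --batch_size 8
```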
Thanks!