
Testing Inverse scaling dataset #4

Open
lauritowal opened this issue Apr 11, 2023 · 2 comments
We want to take the inverse scaling datasets and train a DLK probe for the following models:

  • a small model (GPT-2)
  • a mid-sized model (GPT-J)
  • a big model (some LLaMA model…)

Then we want to check whether the probe's accuracy on the models' inner representations of truth also degrades as model size grows. If it does, that would suggest the models' understanding is actually getting worse with scale. If it does not, that would suggest the inverse scaling cases observed so far have more to do with something odd in output behavior in a given context, and may not generalize. This also seems like a potentially interesting additional testing ground for whether DLK can provide information about a model beyond its output behavior.

See the blog post for more details.
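The probe-training step could be sketched with the CCS objective from the DLK paper (Burns et al., 2022). Below is a minimal NumPy version, assuming we already have hidden-state activations `x0`/`x1` for the two phrasings of each contrast pair; the function names and hyperparameters are illustrative, not from this project:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Clip to avoid overflow in exp for large-magnitude logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def ccs_train(x0, x1, n_steps=2000, lr=0.1):
    """Fit a linear CCS probe on contrast-pair activations.

    x0: activations for the "statement is false" phrasing, shape (n, d)
    x1: activations for the "statement is true" phrasing, shape (n, d)
    Minimizes consistency loss (p0 + p1 should equal 1) plus
    confidence loss (min(p0, p1) should be near 0) by gradient descent.
    """
    n, d = x0.shape
    w = rng.normal(scale=0.01, size=d)
    b = 0.0
    for _ in range(n_steps):
        p0 = sigmoid(x0 @ w + b)
        p1 = sigmoid(x1 @ w + b)
        # Consistency term: (p0 + p1 - 1)^2
        diff = p0 + p1 - 1.0
        g0 = 2 * diff * p0 * (1 - p0)   # dL_cons/dz0
        g1 = 2 * diff * p1 * (1 - p1)   # dL_cons/dz1
        # Confidence term: min(p0, p1)^2, gradient flows through the min branch.
        use0 = p0 < p1
        m = np.where(use0, p0, p1)
        gm = 2 * m * m * (1 - m)        # dL_conf/dz of the min branch
        g0 = g0 + np.where(use0, gm, 0.0)
        g1 = g1 + np.where(use0, 0.0, gm)
        w -= lr * (x0.T @ g0 + x1.T @ g1) / n
        b -= lr * (g0.sum() + g1.sum()) / n
    return w, b

def ccs_predict(w, b, x0, x1):
    """Probability the statement is true: average of p1 and 1 - p0."""
    p0 = sigmoid(x0 @ w + b)
    p1 = sigmoid(x1 @ w + b)
    return 0.5 * (p1 + (1.0 - p0))
```

In practice the activations would come from a hidden layer of each model (GPT-2, GPT-J, LLaMA) on the inverse scaling prompts, with per-set mean subtraction as in the paper. Since CCS is unsupervised, the probe's sign is arbitrary, so accuracy is usually reported as max(acc, 1 − acc).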

lauritowal (Author) commented Apr 11, 2023:

Interested in working on this next.

lauritowal (Author):
Less promising after GPT-4 and https://arxiv.org/abs/2211.02011
