
Testing Inverse scaling dataset #4

Open
lauritowal opened this issue Apr 11, 2023 · 2 comments
We want to take the inverse scaling datasets and train a DLK probe for the following models:

  • a small model (GPT-2)
  • a mid-sized model (GPT-J)
  • a big model (some LLaMA model…)

Then we want to check whether the probe's accuracy on the models' inner representations of truth also degrades as model size grows. If it does, that would suggest the models' understanding is actually getting worse with scale. If it does not, that would suggest the inverse scaling cases observed so far have more to do with something odd in output behavior in a given context, and may not generalize. This also seems like a potentially interesting additional testing ground for whether DLK can provide information about a model beyond its output behavior.

See the blog post for more details.
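The probe-training step could be sketched with the CCS objective from the DLK paper (Burns et al., 2022). Below is a minimal NumPy version, assuming we already have hidden-state activations `x0`/`x1` for the two phrasings of each contrast pair; the function names and hyperparameters are illustrative, not from this project:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Clip to avoid overflow in exp for large-magnitude logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def ccs_train(x0, x1, n_steps=2000, lr=0.1):
    """Fit a linear CCS probe on contrast-pair activations.

    x0: activations for the "statement is false" phrasing, shape (n, d)
    x1: activations for the "statement is true" phrasing, shape (n, d)
    Minimizes consistency loss (p0 + p1 should equal 1) plus
    confidence loss (min(p0, p1) should be near 0) by gradient descent.
    """
    n, d = x0.shape
    w = rng.normal(scale=0.01, size=d)
    b = 0.0
    for _ in range(n_steps):
        p0 = sigmoid(x0 @ w + b)
        p1 = sigmoid(x1 @ w + b)
        # Consistency term: (p0 + p1 - 1)^2
        diff = p0 + p1 - 1.0
        g0 = 2 * diff * p0 * (1 - p0)   # dL_cons/dz0
        g1 = 2 * diff * p1 * (1 - p1)   # dL_cons/dz1
        # Confidence term: min(p0, p1)^2, gradient flows through the min branch.
        use0 = p0 < p1
        m = np.where(use0, p0, p1)
        gm = 2 * m * m * (1 - m)        # dL_conf/dz of the min branch
        g0 = g0 + np.where(use0, gm, 0.0)
        g1 = g1 + np.where(use0, 0.0, gm)
        w -= lr * (x0.T @ g0 + x1.T @ g1) / n
        b -= lr * (g0.sum() + g1.sum()) / n
    return w, b

def ccs_predict(w, b, x0, x1):
    """Probability the statement is true: average of p1 and 1 - p0."""
    p0 = sigmoid(x0 @ w + b)
    p1 = sigmoid(x1 @ w + b)
    return 0.5 * (p1 + (1.0 - p0))
```

In practice the activations would come from a hidden layer of each model (GPT-2, GPT-J, LLaMA) on the inverse scaling prompts, with per-set mean subtraction as in the paper. Since CCS is unsupervised, the probe's sign is arbitrary, so accuracy is usually reported as max(acc, 1 − acc).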

lauritowal (Author) commented Apr 11, 2023:

Interested in working on this next.

lauritowal (Author):
Less promising after GPT-4 and https://arxiv.org/abs/2211.02011
