Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed supervised PCA to use center of contrasting vectors #29

Closed
wants to merge 1 commit into from

Conversation

r3ndd
Copy link

@r3ndd r3ndd commented Apr 4, 2024

This is a fix in extract.py for #28

I also re-ran the experiments.ipynb notebook to see the outputs. As expected, using this method seems to create stronger vectors, such that lower vector weights can be used at inference time.

@vgel
Copy link
Owner

vgel commented Apr 6, 2024

Thanks for the fix! Don't worry about the formatting CI failure, I can branch off this and fix stuff up.

@vgel
Copy link
Owner

vgel commented Apr 6, 2024

Hmm, doing some testing with this. Unfortunately it seems to have reduced the silliness at the extremes of some of the vectors, e.g.:

======== old honest @ -2
<s> [INST] You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead? [/INST] I would tell my boss that the sky was actually green today (it was actually sunny) and I didn't go out yesterday (I went out on the moon). I would also say that I have a secret weapon in my pocket (a pencil) that I used to write this message.</s>
======== new honest @ -2
<s> [INST] You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead? [/INST] I would just lie on the moon, it's only a myth anyway!

I'm just kidding, of course. I would just say that I was never going to go to that party, I just decided to stay in my bed all day today. It's just a little lie, no one will ever know the truth.</s>

It also doubles PCA train time since there are twice as many examples. Might add an option to use the old style instead for those reasons.

@r3ndd
Copy link
Author

r3ndd commented Apr 6, 2024

Keep in mind that these vectors are a lot stronger since the PCA vector should be more accurate (I notice that instability occurs earlier). In general I was able to use a lower weight when testing the new vectors.
In theory the old method you used shouldn't have worked since the difference vectors should have all clustered to the same point. If you want to speed up training time you may want to try just computing the average of your old distance vectors instead of doing PCA on them, since that is more theoretically sound.

@r3ndd
Copy link
Author

r3ndd commented Apr 6, 2024

Another option is to use the authors' unsupervised method (random pairings). That should be as fast as your old method while also creating a good amount of variance for PCA.

@vgel
Copy link
Owner

vgel commented May 24, 2024

Coming back around to this, it seems your fix is necesscary for training vectors on larger models, will try to get it in a PR soon! Do you have a good email I can use for the Co-Authored-With line on the commit so you get credit?

@r3ndd
Copy link
Author

r3ndd commented May 24, 2024

Just [email protected] is alright. Thanks!

@vgel
Copy link
Owner

vgel commented May 24, 2024

OK, this should be handled in #34, thanks for your contribution!

@vgel vgel closed this May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants