representation-engineering/examples/memorization at main · andyzoujm/representation-engineering


Files in this directory:
README.md
quote_completions_control.ipynb
utils.py

README.md

This notebook provides an example of using directional representations extracted with RepE to control memorization behavior in a large language model.

The notebook first loads a pretrained LLaMA model and tokenizer. It then extracts reading vectors from two datasets: literary openings and quotes. These reading vectors are expected to encode whether the model has memorized a given piece of text. Finally, the notebook uses them to control quote completions on a dataset of incomplete famous quotes paired with their completions. Applying the quote memorization reading vector with a negative coefficient substantially reduces the model's tendency to complete the quotes verbatim, which suggests that reading vectors can potentially be used to reduce unwanted memorization.
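To make the role of the negative coefficient concrete, here is a minimal toy sketch in pure Python. It is not the notebook's code: the notebook works on LLaMA hidden states through the repository's RepE tooling, whereas this example uses made-up 3-dimensional vectors and a simple difference-of-means direction as a stand-in for a reading vector. All function names and values below are hypothetical and chosen only for illustration.

```python
# Toy illustration with synthetic vectors; NOT the notebook's implementation.
# Assumption: a "reading vector" is approximated here by a unit-length
# difference-of-means direction, a simplified stand-in for the RepE procedure.
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale a non-zero vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def mean_difference_direction(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Unit vector pointing from the mean of `negative` to the mean of `positive`."""
    dim = len(positive[0])
    mean_pos = [sum(v[i] for v in positive) / len(positive) for i in range(dim)]
    mean_neg = [sum(v[i] for v in negative) / len(negative) for i in range(dim)]
    return normalize([p - n for p, n in zip(mean_pos, mean_neg)])


def apply_control(hidden: Vector, direction: Vector, coefficient: float) -> Vector:
    """Shift a hidden state along `direction`, scaled by `coefficient`."""
    return [h + coefficient * d for h, d in zip(hidden, direction)]


# Synthetic "hidden states" for memorized and non-memorized text (made-up values).
memorized = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
novel = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.2]]

reading_vector = mean_difference_direction(memorized, novel)

# A new hidden state that projects strongly onto the "memorized" direction.
hidden = [1.5, 0.3, -0.2]
steered = apply_control(hidden, reading_vector, coefficient=-2.0)

score_before = dot(hidden, reading_vector)
score_after = dot(steered, reading_vector)
print(f"projection before: {score_before:.3f}, after: {score_after:.3f}")
```

Because the direction has unit length, a coefficient of -2.0 lowers the projection onto it by 2.0 and leaves the orthogonal components unchanged. In the notebook, the analogous shift is applied to the model's internal activations during generation rather than to a standalone vector.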

For more details, see Section 6.5 of our RepE paper.
