On my fork I'm attempting to add support for HuggingFace's OpenWebText dataset and their GPT2 tokenizer so that I can do a comparison against HF's GPT2-small. If you're willing, I'd love advice on setting up model params for self-supervised autoregressive NLP. Thanks!
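For reference, a minimal sketch of pulling both pieces from the HuggingFace hub (assuming the standard hub identifiers `openwebtext` and `gpt2`; adapt to however the fork wires in its data pipeline):

```python
# Minimal sketch: load OpenWebText and the GPT-2 tokenizer from HuggingFace.
from datasets import load_dataset
from transformers import GPT2TokenizerFast

# OpenWebText ships as a single "train" split on the hub.
dataset = load_dataset("openwebtext", split="train")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def tokenize(batch):
    # GPT-2's BPE vocab has 50,257 entries; no special tokens are needed
    # for plain autoregressive language modeling.
    return tokenizer(batch["text"])

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```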
Hi @nathanneuro. You can probably use pretty normal settings. In the paper, we used some special hparams for PG-19 and Wikitext-103 to avoid overfitting, but OpenWebText is probably large enough that you don't need to worry about that. For text, rotary position encoding seems to work well. Maybe start with 8192 context and 1024 latents?
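Concretely, a starting point along those lines might look like the sketch below. The field names here are illustrative, not the repo's actual config API; map them onto whatever the model's hparams object expects.

```python
# Illustrative starting hparams for autoregressive training on OpenWebText.
# Field names are hypothetical placeholders, not the repo's real config keys.
config = dict(
    vocab_size=50257,           # GPT-2 BPE vocabulary size
    max_context_length=8192,    # input sequence length, per the suggestion above
    num_latents=1024,           # latents for the initial cross-attend
    position_encoding="rotary", # rotary embeddings work well for text
)
```

Since OpenWebText is large, the special anti-overfitting settings used for PG-19 and Wikitext-103 in the paper shouldn't be necessary; otherwise-default training hparams are a reasonable baseline.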