Is there a limit to context (prefix) length? #82
I understand there is a 1024 token output limit on each request, but are there any constraints on how long the context (prefix) can be?

Additionally, and this is directly related to a context limit, what is the best way to implement a system like a chatbot where the bot is continually switching between generating text (sending messages) and taking in new context (receiving messages)? The conversation history could grow indefinitely, but would the history need to be fed into the model every time the bot wants to send a new message? If so, are there speed/resource constraints to that naive approach which would warrant the removal of some older messages?

Comments
Based on #2, I think the output limit enforces a prefix limit, because the prefix is the start of the output.
@woctezuma thanks for the info. If I understand the model correctly, in that example where you talked about feeding in half of the previous output as the next input, the only "memory" it has is the most recent context provided, right?
That is my understanding.
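To make the "most recent context is the only memory" idea concrete, and to address the chatbot scenario from the issue body: below is a minimal sketch of a rolling context window. It uses the Hugging Face `transformers` port of GPT-2 as a stand-in for the original TensorFlow repository, and the `MAX_NEW` budget is an arbitrary choice, not something the thread specifies.

```python
# Minimal sketch of a rolling context window for continual generation
# (e.g., the chatbot scenario from the issue body). Uses the Hugging Face
# `transformers` GPT-2 port as a stand-in for the original TensorFlow repo;
# MAX_NEW is an arbitrary choice, not a library default.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MAX_CONTEXT = 1024   # GPT-2's window size (n_ctx)
MAX_NEW = 128        # tokens to generate per reply

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

history_ids = []  # running conversation history, as token ids

def chat_step(new_message):
    """Append a message to the history, then generate a reply using only
    the most recent tokens that still fit in the model's window."""
    global history_ids
    history_ids += tokenizer.encode(new_message)
    # Keep only the tail of the history so prefix + reply fits in the window.
    # Anything older is forgotten: this tail is the model's only "memory".
    prefix = history_ids[-(MAX_CONTEXT - MAX_NEW):]
    output = model.generate(torch.tensor([prefix]),
                            max_length=len(prefix) + MAX_NEW,
                            do_sample=True, top_k=40)
    reply_ids = output[0].tolist()[len(prefix):]
    history_ids += reply_ids
    return tokenizer.decode(reply_ids)
```

Re-encoding and re-feeding the truncated history on every turn is exactly the naive approach the question describes; the truncation is what keeps its cost bounded as the conversation grows.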
@woctezuma thanks for the help. Any idea how this input size limit affects training? For instance, I have a bunch of documents with a specific format that I want to train the model on. At the top of each document is some metadata that hopefully the model will learn to use as context for the rest of the document. However, will the model still have the top of the document in context during training when it reaches the bottom of a document much larger than 1024 tokens?
That is a good question. After looking at the code, I believe the limit, called window size, affects training. I guess it is set here in the original GPT-2 repository.
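For reference, in the original openai/gpt-2 repository the window size is a model hyperparameter stored in each model's `hparams.json`. A quick way to check it (the path below assumes the 124M model was fetched with the repository's download script; adjust it to wherever your model lives):

```python
import json

# Read the model hyperparameters shipped with the downloaded GPT-2 weights.
with open("models/124M/hparams.json") as f:
    hparams = json.load(f)

print(hparams["n_ctx"])  # -> 1024, the context window size
```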
Yes, the metadata + text would have to be < 1024 tokens in order for it to be incorporated into the training.
Hmm, thanks. So let's say I have all of my documents of size <= 1024 tokens and delimited by <|document|></|document|> tags. Is there a simple way I can ensure that the model is always trained with an entire document at once and not two halves of separate documents? |
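The repository itself does not guarantee this, but one way to enforce it at preprocessing time is to pack whole documents greedily into fixed-size training chunks and pad the remainder. A sketch, assuming documents are already tokenized and `pad_id` is a filler token id of your choosing:

```python
# Sketch: greedily pack whole (already-tokenized) documents into 1024-token
# training chunks so that no chunk ever contains a partial document.
WINDOW = 1024

def pack_documents(docs, pad_id=0):
    """docs: list of token-id lists, each of length <= WINDOW."""
    chunks, current = [], []
    for doc in docs:
        assert len(doc) <= WINDOW, "document exceeds the context window"
        if len(current) + len(doc) > WINDOW:
            # Current chunk cannot hold the next whole document: pad and close it.
            current += [pad_id] * (WINDOW - len(current))
            chunks.append(current)
            current = []
        current += doc
    if current:
        current += [pad_id] * (WINDOW - len(current))
        chunks.append(current)
    return chunks
```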
I imagine that the window is sliding, so even if the document is too big, it would not just be split in two. For instance, if the document is 1030 tokens long, I expect it to be used as 7 overlapping windows of 1024 tokens each.
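Assuming a stride-1 sliding window, the arithmetic in that example works out as follows (an illustration of the window count, not the repository's actual sampling code):

```python
# Number of stride-1 windows of size 1024 over a 1030-token document.
doc_len, window = 1030, 1024
n_windows = doc_len - window + 1
print(n_windows)  # -> 7
```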
Good to know, thanks. By the way, is there a simple way to approximate token counts, so I can determine how many tokens long a document is? When evaluating some of the model's outputs for the Shakespeare examples, I found the average token to be about 3 characters long, assuming all of the sample outputs were 1024 tokens.
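Rather than approximating by characters, you can count tokens exactly with the GPT-2 BPE encoder. A sketch using the Hugging Face tokenizer as a convenient stand-in (it ships the same vocabulary as the original repository's encoder.py); the document path is hypothetical:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

with open("my_document.txt") as f:  # hypothetical document path
    text = f.read()

# Exact token count under the GPT-2 BPE vocabulary.
n_tokens = len(tokenizer.encode(text))
print(n_tokens, "tokens;", len(text) / n_tokens, "characters per token on average")
```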