
Is there a limit to context (prefix) length? #82

Open

r3ndd opened this issue Jul 5, 2019 · 9 comments

Comments

r3ndd commented Jul 5, 2019

I understand there is a 1024 token output limit on each request, but are there any constraints on how long the context (prefix) can be?

Additionally, and this is directly related to a context limit, what is the best way to implement a system like a chatbot where the bot is continually switching between generating text (sending messages) and taking in new context (receiving messages)? The conversation history could grow indefinitely, but would the history need to be fed into the model every time the bot wants to send a new message? If so, are there speed/resource constraints to that naive approach which would warrant the removal of some older messages?
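For concreteness, the naive approach I have in mind looks roughly like the sketch below. It assumes a gpt-2-simple checkpoint fine-tuned under the default run name ("run1"), and it uses a crude character-based token estimate rather than the real BPE tokenizer, so treat it as illustrative only:

```python
import gpt_2_simple as gpt2

WINDOW = 1024        # model context window, in tokens
REPLY_BUDGET = 200   # tokens reserved for the bot's reply

def estimate_tokens(text):
    # Crude heuristic: English text averages roughly 3-4 characters per BPE token.
    return len(text) // 3 + 1

def trim_history(messages):
    # Drop the oldest messages until the remaining history fits in the window
    # alongside the reply budget.
    while sum(estimate_tokens(m) for m in messages) > WINDOW - REPLY_BUDGET:
        messages.pop(0)
    return messages

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="run1")  # previously fine-tuned checkpoint

history = []

def reply_to(user_message):
    history.append("User: " + user_message)
    prefix = "\n".join(trim_history(history)) + "\nBot:"
    reply = gpt2.generate(sess,
                          prefix=prefix,
                          length=REPLY_BUDGET,
                          truncate="\nUser:",   # stop once a new user turn begins
                          include_prefix=False,
                          return_as_list=True)[0].strip()
    history.append("Bot: " + reply)
    return reply
```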

woctezuma (Contributor) commented Jul 5, 2019

are there any constraints on how long the context (prefix) can be?

Based on #2, I think the output limit enforces a prefix limit, because the prefix is the start of the output.
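If that is right, the prefix and the generated continuation share the same 1024-token budget, so a longer prefix leaves less room for new text. A rough sketch of the arithmetic, assuming prefix tokens count one-for-one against the window:

```python
WINDOW = 1024                      # total token budget shared by prefix + output
prefix_tokens = 800                # e.g. a long prompt
max_new_tokens = WINDOW - prefix_tokens
print(max_new_tokens)              # 224 tokens left for the generated text
```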

r3ndd (Author) commented Jul 5, 2019

@woctezuma thanks for the info. If I understand the model correctly, in that example where you talked about feeding in half of the previous output as the next input, the only "memory" it has is the most recent context provided, right?

woctezuma (Contributor)

That is my understanding.

r3ndd (Author) commented Jul 5, 2019

@woctezuma thanks for the help. Any idea how this input size limit affects training? For instance, I have a bunch of documents in a specific format that I want to train the model on. At the top of each document is some metadata that the model will hopefully learn to use as context for the rest of the document. However, will the model still have the top of the document in context during training when it reaches the bottom of a document much larger than 1024 tokens?

woctezuma (Contributor)

That is a good question. After looking at the code, I believe the limit, called the window size, does affect training. I guess it is set here in the original GPT-2 repository.

minimaxir (Owner)

Yes, the metadata + text would have to be < 1024 tokens in order for it to be incorporated into the training.
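As a sanity check before fine-tuning, you could flag the documents that are likely to blow past that budget. A rough sketch, assuming one document per .txt file and a ~3 characters-per-token estimate rather than the real BPE tokenizer:

```python
import glob

WINDOW = 1024
CHARS_PER_TOKEN = 3   # rough average; an exact count needs the BPE encoder

# Hypothetical layout: one document (metadata block at the top) per .txt file.
documents = {path: open(path, encoding="utf-8").read()
             for path in glob.glob("docs/*.txt")}

too_long = [path for path, doc in documents.items()
            if len(doc) / CHARS_PER_TOKEN >= WINDOW]
print(f"{len(too_long)} of {len(documents)} documents probably exceed {WINDOW} tokens")
```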

r3ndd (Author) commented Jul 6, 2019

Hmm, thanks. So let's say all of my documents are <= 1024 tokens and are delimited by <|document|></|document|> tags. Is there a simple way I can ensure that the model is always trained on an entire document at once and not on two halves of separate documents?

woctezuma (Contributor) commented Jul 6, 2019

not two halves of separate documents

I imagine that the window is sliding, so even if the document is too big, it would not just be split in two. For instance, if the document is 1030 tokens long, I expect it to be used as 7 lists of 1024 tokens each.
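To illustrate that arithmetic (purely my assumption about how the chunks would be drawn, not a description of the actual training code):

```python
def sliding_windows(tokens, window=1024, stride=1):
    # Every contiguous chunk of `window` tokens, sliding by `stride`.
    return [tokens[i:i + window] for i in range(0, len(tokens) - window + 1, stride)]

doc = list(range(1030))      # stand-in for a 1030-token document
chunks = sliding_windows(doc)
print(len(chunks))           # 7 == 1030 - 1024 + 1 windows of 1024 tokens each
```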

r3ndd (Author) commented Jul 6, 2019

Good to know, thanks. By the way, is there a simple way to approximate token length so I can determine how many tokens a document is? When evaluating some of the model's outputs for the Shakespeare examples, I found the average token length to be about 3 characters, assuming all of the sample outputs were 1024 tokens.
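In other words, something along the lines of the sketch below, where the 3-characters-per-token figure is just my estimate from those samples; an exact count would presumably need the BPE encoder that ships with the original openai/gpt-2 repository:

```python
def estimate_tokens(text):
    # Rough estimate based on ~3 characters per token observed in the samples.
    return len(text) // 3 + 1

text = open("document.txt", encoding="utf-8").read()  # hypothetical document
print("estimated tokens:", estimate_tokens(text))

# For an exact count, the BPE encoder from an openai/gpt-2 checkout could be
# used instead (module path and model name depend on the local setup):
# from src import encoder
# enc = encoder.get_encoder("117M")
# print("exact tokens:", len(enc.encode(text)))
```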
