Req for Ollama reload of models + chapters not reaching word counts #67
You can pass it as a model parameter, but I'll add a default of 8192, which should be a healthy amount. Mind the extra VRAM usage (~+200 MB for qwen2.5:7b).
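For reference, a minimal sketch of what that looks like through the Ollama Python client; the `options` dict is the documented way to set `num_ctx` per request (the model name and message here are just examples):

```python
import ollama

# Request an 8192-token context window for this call.
# num_ctx is a load-time option, so changing it causes Ollama to
# reload the model and increases VRAM usage accordingly.
response = ollama.chat(
    model="qwen2.5:7b",
    messages=[{"role": "user", "content": "Write the opening paragraph of a mystery novel."}],
    options={"num_ctx": 8192},
)
print(response["message"]["content"])
```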
Great, thanks for that. Any thoughts on reloading the models? Even with 8k I don't think it would make it through an entire run with a single model. (qwen2.5:7b is fine for testing, but I'll move on to eva.qwen2.5:72b, which I find is really good at novels.)
They shouldn't fill up the context. It doesn't pass the whole history down on each turn; it only passes the required information for each step. You can also see the prompts here: https://github.com/datacrystals/AIStoryWriter/blob/main/Writer/Prompts.py
Thanks for pointing that out. Will investigate further.
I have been using this for about a week now and I'm loving it (almost), and more people should know about this program. I currently have two problems with it.

I set a prompt like this: "Please write a story set in modern times, the story should contain 10 chapters of 1000-1500 words in each chapter." Then I add the story details. I have noticed that if you use a single model for all steps, it hits the context limit really quickly. (I initially thought that none of my models worked, as they would all start looping.) Then I tried copying the file in Ollama to a new name, but it must have known it was the same file, as it didn't reload. Is there a way to add reloading of a model in Ollama for each model stage of config.py? Nothing popped out at me in the Ollama library (I only know very basic Python). This would reset the context for the model at each stage and clear up part of the problem.
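For what it's worth, Ollama does let you force a model to unload between stages without renaming anything: sending a request with `keep_alive=0` tells the server to evict the model immediately, so the next call starts from a fresh load. A sketch, assuming the stock `ollama` Python client (as the maintainer notes above, chat history isn't stored in the model between requests, so this mainly guarantees a fresh load and frees VRAM rather than clearing any hidden state):

```python
import ollama

def unload_model(model: str) -> None:
    """Ask the Ollama server to evict `model` from memory immediately.

    An empty generate call with keep_alive=0 generates nothing; it just
    tells the server to drop the model right away.
    """
    ollama.generate(model=model, keep_alive=0)

# e.g. between pipeline stages:
unload_model("qwen2.5:7b")
```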
I can also see that Ollama has a context of only 2k for all models, but looking at the Ollama Python library I saw a reference to num_ctx in _types.py under:

```python
class Options(TypedDict, total=False):
    # load time options
    num_ctx: int
```

I think it may be possible to change this somewhere in your code, but I have no idea where it would go. (1000 words should only be around 1400 tokens plus overhead, but with a 2k context it's not going far.)
Thank you
[edit]
Just found that num_ctx is already listed in wrapper.py, but it's not implemented. I'm not sure how to implement it.
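In case it helps, here's a rough sketch of how it could be wired up. The function name and structure below are hypothetical (I haven't checked what wrapper.py actually looks like); the core idea is just forwarding an options dict into the client call:

```python
import ollama

# Hypothetical wrapper shape; the real wrapper.py in AIStoryWriter
# may be structured quite differently.
def chat_with_context(model: str, messages: list[dict], num_ctx: int = 8192) -> str:
    """Run one chat turn, requesting a num_ctx-token context window."""
    response = ollama.chat(
        model=model,
        messages=messages,
        options={"num_ctx": num_ctx},  # load-time option; changing it reloads the model
    )
    return response["message"]["content"]
```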