Running Bloom #52
Comments
Source: https://www.infoq.com/news/2022/07/bigscience-bloom-nlp-ai/ According to this post, you can run it on consumer hardware at about 3 minutes per token, and even on fairly good GPU hardware it can take 90 seconds per token. It seems you need a really high-end system to run it quickly.
For inference only, what are the minimum RAM and GPU memory requirements?
About 350 GB of GPU RAM (~200 GB if you quantise to int8).
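The ~350 GB figure follows from a back-of-the-envelope calculation: parameter count times bytes per parameter. A minimal sketch (the function name and the 176e9 count are written out here for illustration; activations and the KV cache add memory on top of the weights):

```python
# Rough GPU-RAM estimate for holding model weights in memory.
# Weights dominate inference memory, but activations/KV cache add more.
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

N_PARAMS = 176e9  # BLOOM has roughly 176 billion parameters

fp16_gb = weight_memory_gb(N_PARAMS, 2)  # float16/bfloat16: 2 bytes/param
int8_gb = weight_memory_gb(N_PARAMS, 1)  # int8 quantisation: 1 byte/param

print(f"fp16: ~{fp16_gb:.0f} GB, int8: ~{int8_gb:.0f} GB")
```

This yields ~352 GB in fp16 and ~176 GB in int8, consistent with the "about 350 GB (~200 GB int8)" answer above once overhead is included.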
Yep, you need to get all those parameters into GPU RAM to run inference. Like I mentioned, you can use the accelerate framework to "swap" weights from CPU RAM to GPU RAM, which lets you do it with much less GPU RAM at a ridiculous speed penalty.
What kind of machine is required to just run inference on the 176B model? https://huggingface.co/bigscience/bloom