NotImplementedError: Cannot copy out of meta tensor; no data! #87
While trying to implement Pythia-Chat-Base-7B, I am getting this error on running the very first command after creating and activating the conda env:

```
python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B
```

Can anyone help identify what could possibly be the issue?

Comments
I have the same problem. I'm running this on an AWS g3.4xlarge instance with 128GB of memory:

```
python3 inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B
nvidia-smi -L
pip3 freeze
```
OK, solved it. The problem was that the g3.4xlarge instance has only 8GB per GPU, clearly not enough. I re-ran this on a g5.2xlarge and the problem disappeared.
I have the same problem.
@zas97 @akashmittal18 Could you please describe your setup? I see that a lot of people have this issue, but I'm not able to reproduce it.
I used Paperspace Gradient with a P500.
This error is caused by Accelerate auto-offloading weights to either the CPU or disk because of insufficient memory on the GPU. @zas97 can you try manually offloading weights by capping how much of the model is placed on the GPU? On the g3.4xlarge (8GB VRAM, 122 GB memory) you'd cap the GPU allocation at around 6 GiB. This can work better with #84, as you'd be able to change the 6 to an 8. @koonseng can you try this too?
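For reference, a minimal sketch of what manual offloading looks like at the Transformers/Accelerate level; the `max_memory` split below is an illustrative assumption, not the exact `bot.py` invocation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/Pythia-Chat-Base-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    # Cap GPU 0 at 6 GiB and spill the rest to CPU RAM; on a card with
    # more VRAM you could raise the 6 to an 8 (or higher), as suggested
    # above. These numbers are illustrative for a g3.4xlarge.
    max_memory={0: "6GiB", "cpu": "100GiB"},
    offload_folder="offload",  # disk fallback if CPU RAM is also exhausted
)
```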
@orangetin can you give more details regarding the exact cause of this error?
Sure @wemoveon2! When loading the model with Accelerate's `device_map`, any weights that don't fit in GPU memory are kept on the `meta` device (which holds shape and dtype metadata but no actual data) and offloaded to CPU or disk. When the code later tries to copy one of those meta tensors onto a real device, PyTorch raises `NotImplementedError: Cannot copy out of meta tensor; no data!`.

Solutions: either run on a GPU with enough VRAM for the whole model, or offload weights manually so they are materialized correctly (see my earlier comment).
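The error itself can be reproduced in isolation, which shows it has nothing to do with the model per se:

```python
import torch

# A tensor on the "meta" device has shape/dtype metadata but no storage,
# so materializing it on a real device has nothing to copy from:
t = torch.empty(4, device="meta")
t.to("cpu")  # raises NotImplementedError: Cannot copy out of meta tensor; no data!
```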
@orangetin Not sure that fully explains it. There is another thread documenting this same issue (it occurs at the same line, with a different torch version IIRC) in which it was resolved by a different workaround. @akashmittal18 did the proposed solution help resolve your issue? And if so, can you confirm whether you are still using CPU/disk offload along with the automatically assigned dtype?
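If it helps with confirming that, models loaded with a `device_map` expose an `hf_device_map` attribute, so you can inspect where each submodule ended up and which dtype is actually in use (the printed map below is illustrative):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/Pythia-Chat-Base-7B", device_map="auto"
)
# Submodule -> device mapping; 'cpu' or 'disk' entries mean offload is active,
# e.g. {'gpt_neox.embed_in': 0, ..., 'embed_out': 'cpu'}
print(model.hf_device_map)
print(next(model.parameters()).dtype)  # the dtype the weights were loaded in
```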
I am having the same problem. I loaded the model checkpoint shards in both float32 and bfloat16, but it does not work for me and I do not know why. This is my Google Colab file; it would be great if you could have a look at it.

An overview of my code: this is the snippet that gives the error, together with the checkpoint folder I am passing. Please correct me if I am conceptually wrong or missing some important step. Thank you!
@anujsahani01 I can't import your Colab file. The error is caused by offloading model weights incorrectly. Refer to my previous comments on how to fix it.
Closing this thread as it is solved. Feel free to continue the conversation if you're still having issues.
Thank you!
Based on what was said, reordering the load steps might provide a solution (see the sketch below).
Of course, this only works if the model itself (without the inference data) can fit into VRAM.
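A minimal sketch of the reordering idea, assuming the plain Transformers loading path: load everything into CPU RAM first (no `device_map`, so no meta tensors are ever created), and only then move the whole model to the GPU in one step:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the full model into CPU RAM first; without device_map, every weight
# is materialized with real data instead of living on the meta device.
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/Pythia-Chat-Base-7B",
    torch_dtype=torch.float16,
)
model = model.to("cuda")  # only succeeds if the whole model fits in VRAM
```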