[Installation]: Running CohereForAI/c4ai-command-r-v01 with main pytorch #6355
Comments
The undefined symbol cuTensorMapEncodeTiled is solved by a workaround.
I resolved the issue: I was running python from the vllm/vllm folder, and python got confused.
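For anyone hitting the same confusion, here is a minimal sketch for checking where Python would import a module from, and whether that location sits under the current working directory. The helper names (`module_origin`, `is_inside`, `resolves_to_cwd`) are made up for illustration and are not part of vllm.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file or directory a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin not in (None, "built-in", "frozen"):
        return spec.origin
    # Namespace packages have no origin; fall back to their search locations.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else spec.origin


def is_inside(path, directory):
    """True if `path` lies inside `directory` after resolving symlinks."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def resolves_to_cwd(name):
    """True if importing `name` would pick up code from the current directory tree."""
    origin = module_origin(name)
    if origin is None or not os.path.isabs(origin):
        return False
    return is_inside(origin, os.getcwd())
```

Calling `module_origin("vllm")` and `resolves_to_cwd("vllm")` from the directory where the import fails shows whether the source checkout is being picked up instead of the installed build.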
Now I get the following messages, and vllm gets stuck when running the benchmark:
how does your nightly pytorch use NCCL? static link or dynamic link? which NCCL does it use?
I fixed that by setting the version of the NCCL library.
The benchmark still gets stuck though. I have seen local posts about NCCL getting stuck and having to revert; not sure if that is related.
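As a rough way to see which NCCL build is installed next to PyTorch, the sketch below reads the installed distribution versions. It assumes a pip-managed environment where NCCL comes from the `nvidia-nccl-cu12` wheel (CUDA 12); that package name is an assumption about the setup, not something stated in this thread. It also does not tell you whether PyTorch links NCCL statically or dynamically.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names=("torch", "nvidia-nccl-cu12")):
    """Map each distribution name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed as a pip distribution'}")
```

If `torch` imports cleanly on a CUDA build, `torch.cuda.nccl.version()` can be compared against the wheel version reported here.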
can you try to follow the debugging guide https://docs.vllm.ai/en/latest/getting_started/debugging.html ? there is a sanity check script to help you locate the problem.
I used the VLLM_TRACE and it gets stuck at the following location. Note the following warning as well; I am unsure if it is related:
The warning is irrelevant. Can you stably reproduce this hang? If so, you can remove that part of the code; it sets up a publish-subscribe message queue for communication.
+1, it hung at the same place for me.
It turns out that using the host network on an IPv6-only machine is problematic. A bridged network works just fine on my side.
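A quick way to see whether a machine looks IPv6-only from Python's point of view is to check which address families its hostname resolves to. This is only a hint under that assumption, not proof that the network setup causes the hang, and the helper names are made up for illustration.

```python
import socket


def address_families(addrinfo):
    """Collapse getaddrinfo() results into a sorted list of family names."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    return sorted({names[entry[0]] for entry in addrinfo if entry[0] in names})


def host_families(host=None):
    """Address families the given host (default: this machine's hostname) resolves to."""
    host = host or socket.gethostname()
    try:
        return address_families(socket.getaddrinfo(host, None))
    except OSError:  # includes socket.gaierror when resolution fails
        return []


if __name__ == "__main__":
    # A result of ['IPv6'] alone suggests the host has no IPv4 address to bind to.
    print(host_families())
```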
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
Your current environment
why is it important:
This is a prerequisite to the work on enabling torch.compile on vllm. We need to be able to build vllm with nightly PyTorch so that we can iterate on changes and try features that are not released yet.
current error:
Failed to import from vllm._C with ImportError('/home/lsakka/vllm/vllm/_C.abi3.so: undefined symbol: cuTensorMapEncodeTiled')
any idea what this could be?
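One thing worth checking: `cuTensorMapEncodeTiled` is a CUDA driver API function (introduced with CUDA 12), so it is expected to come from the driver library rather than from the extension itself. The sketch below tests whether a shared library can be loaded and exports a given symbol. `has_symbol` is a made-up helper, and `libcuda.so.1` is assumed to be the driver library name on a typical Linux install. This is not the workaround referred to in the comments.

```python
import ctypes
import ctypes.util


def has_symbol(library, symbol):
    """True/False if `library` loads and does/doesn't export `symbol`; None if it can't be loaded."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    # ctypes resolves the symbol lazily on attribute access and raises
    # AttributeError when it is missing, which hasattr() turns into False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Assumed soname of the NVIDIA driver library on Linux.
    print(has_symbol("libcuda.so.1", "cuTensorMapEncodeTiled"))
```

A result of `None` means the driver library could not be loaded at all, and `False` means it loaded but does not export the symbol. If the result is `True`, the driver does provide the symbol, and how `_C.abi3.so` was linked becomes the more likely place to look.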
It was mentioned that vllm has been struggling to upgrade even a single PyTorch version.
diff file
How you are installing vllm
what did I do:
When I import vllm, I get the error below:
current error:
Failed to import from vllm._C with ImportError('/home/lsakka/vllm/vllm/_C.abi3.so: undefined symbol: cuTensorMapEncodeTiled')