
feat(ascend): support w4a16 #2587

Merged · 6 commits · Oct 23, 2024
Conversation

@yao-fengchen (Collaborator) commented Oct 11, 2024

  1. Support w4a16 quantization on Ascend.
  2. Models currently supported and tested on Ascend:
    • llama-2-7b-hf
    • llama-2-70b-chat-hf
    • Llama-3-8B-Instruct
    • Llama-3.1-8B-Instruct
    • internlm2-chat-7b
    • internlm2-chat-20b
    • internlm2_5-7b-chat
    • internlm2_5-20b-chat
    • Mini-InternVL-Chat-2B-V1-5
    • InternVL-Chat-V1-5
    • InternVL2-2B
    • InternVL2-26B
  3. Related PR: feat: support quantifying weights on ascend DeepLink-org/dlinfer#61
  4. Quantization command:
    lmdeploy lite auto_awq internvl2-26b --work-dir ./internvl2-26b-4bit --device npu
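The command above can be sketched as a full quantize-then-chat workflow. The quantization line is taken from the PR description; the follow-up chat invocation is an assumption based on lmdeploy's general CLI conventions (the exact `--backend`/`--device` values for Ascend inference should be verified against the lmdeploy documentation for your version):

```shell
# Quantize InternVL2-26B weights to 4-bit AWQ (w4a16) on an Ascend NPU,
# as given in the PR description.
lmdeploy lite auto_awq internvl2-26b \
    --work-dir ./internvl2-26b-4bit \
    --device npu

# Run an interactive chat with the quantized model (illustrative sketch;
# on Ascend the PyTorch engine backed by dlinfer is assumed here --
# confirm the device flag in the lmdeploy docs).
lmdeploy chat ./internvl2-26b-4bit \
    --backend pytorch \
    --device ascend
```

The quantized weights in `./internvl2-26b-4bit` can then be served or evaluated like any other w4a16 model.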

@yao-fengchen yao-fengchen marked this pull request as draft October 11, 2024 09:11
@yao-fengchen yao-fengchen marked this pull request as ready for review October 16, 2024 11:17
@lvhan028 lvhan028 added the enhancement label Oct 23, 2024
@lvhan028 lvhan028 merged commit 1530afe into InternLM:main Oct 23, 2024
5 checks passed
@yao-fengchen yao-fengchen deleted the yfc/ascend_w4a16 branch November 19, 2024 05:57
3 participants