
feat(ascend): support w4a16 #2587

Merged · 6 commits · Oct 23, 2024
Conversation

@yao-fengchen (Collaborator) commented Oct 11, 2024

  1. Support w4a16 quantization on Ascend.
  2. Models currently supported and tested on Ascend:
    • llama-2-7b-hf
    • llama-2-70b-chat-hf
    • Llama-3-8B-Instruct
    • Llama-3.1-8B-Instruct
    • internlm2-chat-7b
    • internlm2-chat-20b
    • internlm2_5-7b-chat
    • internlm2_5-20b-chat
    • Mini-InternVL-Chat-2B-V1-5
    • InternVL-Chat-V1-5
    • InternVL2-2B
    • InternVL2-26B
  3. Related PR: feat: support quantifying weights on ascend DeepLink-org/dlinfer#61
  4. Quantization command:
    lmdeploy lite auto_awq internvl2-26b --work-dir ./internvl2-26b-4bit --device npu
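The command above can be sketched as a full quantize-then-chat workflow. The quantization line is taken from the PR description; the follow-up chat invocation is an assumption based on lmdeploy's general CLI conventions (the exact `--backend`/`--device` values for Ascend inference should be verified against the lmdeploy documentation for your version):

```shell
# Quantize InternVL2-26B weights to 4-bit AWQ (w4a16) on an Ascend NPU,
# as given in the PR description.
lmdeploy lite auto_awq internvl2-26b \
    --work-dir ./internvl2-26b-4bit \
    --device npu

# Run an interactive chat with the quantized model (illustrative sketch;
# on Ascend the PyTorch engine backed by dlinfer is assumed here --
# confirm the device flag in the lmdeploy docs).
lmdeploy chat ./internvl2-26b-4bit \
    --backend pytorch \
    --device ascend
```

The quantized weights in `./internvl2-26b-4bit` can then be served or evaluated like any other w4a16 model.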

@yao-fengchen yao-fengchen marked this pull request as draft October 11, 2024 09:11
@yao-fengchen yao-fengchen marked this pull request as ready for review October 16, 2024 11:17
@lvhan028 lvhan028 added the enhancement label Oct 23, 2024
@lvhan028 lvhan028 merged commit 1530afe into InternLM:main Oct 23, 2024
5 checks passed
@yao-fengchen yao-fengchen deleted the yfc/ascend_w4a16 branch November 19, 2024 05:57
3 participants