Thank you for contributing such amazing work! I'm impressed by the acceleration with the Qualcomm NPU.
I guess the QNN backend in mllm only supports int8 models, right?
If so, it seems that the quantizer (code here) does not support the int8 format. How can I quantize a model to int8? More specifically, how can I reproduce mllm's Qwen 1.5 1.8B int8 model listed in the README?
Thanks!
Thanks for your attention. To get a quantized int8 model for QNN prefilling, you first need to obtain the profiled PyTorch model with the quantization scales and outlier weights, using the tooling in tools/convertor/profiling_activation. Then you need to convert the weights to the mllm format using src/quantizer/main.cpp.
The I8 quantization option, which is used by QNN models, is not yet integrated into src/quantizer/main.cpp. That was an oversight, sorry about it. 😓 We will add it as soon as possible.
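For anyone wanting to understand what the profiling step produces, here is a minimal conceptual sketch of per-channel symmetric int8 quantization with outlier channels kept in floating point. It is an illustration only: the function name, threshold value, and tensor layout are my own assumptions, not the actual API of tools/convertor/profiling_activation or src/quantizer/main.cpp.

```python
import torch

def quantize_int8_with_outliers(weight: torch.Tensor, outlier_threshold: float = 6.0):
    """Illustrative per-channel symmetric int8 quantization with outlier extraction.

    Columns (channels) whose max absolute value exceeds `outlier_threshold`
    are kept in floating point as "outlier" weights; the remaining values are
    quantized to int8 with one scale per column. Hypothetical sketch, not the
    mllm convertor implementation.
    """
    col_absmax = weight.abs().amax(dim=0)             # per-column max absolute value
    outlier_mask = col_absmax > outlier_threshold     # columns kept in floating point
    scales = (col_absmax / 127.0).clamp(min=1e-8)     # symmetric int8 scale per column
    q = torch.round(weight / scales).clamp(-128, 127).to(torch.int8)
    return q, scales, weight[:, outlier_mask], outlier_mask
```

The real pipeline additionally profiles activation statistics on calibration data to pick the scales, then src/quantizer/main.cpp packs the resulting tensors into the mllm weight format.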