
Refactor torch inference engine #871

Merged

merged 68 commits into main from pytorch-poc on Dec 28, 2023
Conversation

lvhan028
Collaborator

No description provided.

* cherry-pick Fix meta tensor error commits

* fix smooth quant

---------

Co-authored-by: pppppM <[email protected]>
@lvhan028
Collaborator Author

lvhan028 commented Dec 21, 2023

In this PR, we focus on verifying functional correctness:

  • @RunningLeon Using opencompass's support for evaluating the pytorch_poc branch, test the accuracy of these models: llama2-7b-chat, internlm-7b-chat, baichuan2-13b-chat, qwen-7b-chat
  • @HIT-cwh Test whether the chat results of chatglm and falcon are correct. The models are on machine 142
  • @AllentDan Test the w8a8 feature (llama2, internlm) for correct chat results, as well as TP for internlm-20b
  • @lvhan028 Verify the correctness of the benchmark scripts

Alternatively, you can manually convert original 16-bit weights into 8-bit by referring to the content under the ["8bit Weight Quantization"](#8bit-weight-quantization) section. Save them in the internlm-chat-7b-w8 directory, using the command below:

```shell
python lmdeploy/lite/apis/smooth_quant.py internlm/internlm-7b ./internlm-chat-7b-w8
```
Collaborator


Got:

```
from .calibrate import calibrate
ImportError: attempted relative import with no known parent package
```

following the doc.
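This error is Python's standard complaint when a file that uses relative imports is executed by path instead of as a module. A self-contained sketch of the mechanism (the package and module names `demo_pkg`, `cli`, and `helper` are invented for illustration):

```python
# Demonstrates why `python path/to/file.py` breaks relative imports while
# `python -m package.module` works. demo_pkg/cli/helper are invented names.
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "helper.py").write_text("VALUE = 42\n")
(pkg / "cli.py").write_text("from .helper import VALUE\nprint(VALUE)\n")

# Running the file by path: __package__ is empty, so the relative import fails.
direct = subprocess.run([sys.executable, str(pkg / "cli.py")],
                        capture_output=True, text=True)
print("direct run failed:", direct.returncode != 0)

# Running as a module from the package root: the import succeeds.
as_module = subprocess.run([sys.executable, "-m", "demo_pkg.cli"],
                           capture_output=True, text=True, cwd=root)
print("module run output:", as_module.stdout.strip())
```

By the same logic, a likely workaround for the doc's command is to run the script as a module (e.g. `python -m lmdeploy.lite.apis.smooth_quant …`) from a directory where the `lmdeploy` package is importable, assuming the repository layout permits it.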

docs/en/w8a8.md Outdated
Afterwards, use the following command to interact with the model via the terminal:

```shell
python lmdeploy/pytorch_poc/chat.py ./internlm-chat-7b-w8 internlm-chat
```
Collaborator


pytorch_poc -> pytorch

Collaborator


And another relative import error for the command.

@AllentDan
Collaborator

After converting 20b to w8, running it directly with tp 2 crashed:

rank[1]: cannot unpack non-iterable NoneType object

@grimoire
Collaborator

After converting 20b to w8, running it directly with tp 2 crashed:

rank[1]: cannot unpack non-iterable NoneType object

@HIT-cwh

@lvhan028
Collaborator Author

After converting 20b to w8, running it directly with tp 2 crashed:

rank[1]: cannot unpack non-iterable NoneType object

If quantization is disabled, does TP work correctly?

@AllentDan
Collaborator

After converting 20b to w8, running it directly with tp 2 crashed:

rank[1]: cannot unpack non-iterable NoneType object

If quantization is disabled, does TP work correctly?

Yes, it works correctly.

@HIT-cwh
Collaborator

HIT-cwh commented Dec 26, 2023

After converting 20b to w8, running it directly with tp 2 crashed:

rank[1]: cannot unpack non-iterable NoneType object

@HIT-cwh

I've added a PR that supports TP for W8A8; please help review it.
The main change is that buffers must be taken into account when splitting the weights, because in QLinear the weight, bias, and scale are all stored as buffers.

If we instead register the weight, bias, and scale as Parameters, the parameter-registration logic may need to be changed everywhere. The weight must be int8, so registering it as a Parameter requires requires_grad=False; however, accelerate's init_on_device overrides the Parameter-registration mechanism, so a statement like self.weight = nn.Parameter(a_int8_tensor, requires_grad=False) has no effect. Every place that registers a Parameter might need to be rewritten as:

```python
param = torch.Tensor._make_subclass(nn.Parameter, param, False)
if param.dtype == torch.int8:
    param.__dict__.update({'requires_grad': False})
module.register_parameter(name, param)
```
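To make the failure mode concrete, here is a torch-free sketch. The `Param` and `Module` classes below are simplified stand-ins (not the real torch or accelerate APIs) for a patched registration hook that rebuilds each parameter from its raw data: constructor keywords like `requires_grad=False` are silently dropped, while mutating the instance `__dict__` of the object that is actually stored does stick.

```python
# Torch-free stand-in illustrating the described failure mode; Param and
# Module here are simplified mock-ups, not torch.nn classes.

class Param:
    def __init__(self, data, requires_grad=True):
        self.data = data
        self.requires_grad = requires_grad

class Module:
    def __init__(self):
        self._params = {}

    def register_parameter(self, name, param):
        # A patched hook (roughly analogous to accelerate's init_on_device)
        # that rebuilds the parameter from its raw data: the caller's
        # requires_grad=False never survives the rebuild.
        self._params[name] = Param(param.data)

m = Module()
m.register_parameter("weight", Param([1, 2, 3], requires_grad=False))
flag_after_register = m._params["weight"].requires_grad
print("requires_grad after patched registration:", flag_after_register)

# Workaround in the spirit of the comment above: set the flag on the
# object that actually ends up stored, bypassing the constructor.
stored = m._params["weight"]
stored.__dict__.update({"requires_grad": False})
flag_after_update = m._params["weight"].requires_grad
print("requires_grad after __dict__.update:", flag_after_update)
```

This is why keeping the tensors as buffers (as the PR does) sidesteps the problem: buffers never pass through the overridden Parameter-registration path.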

* fix smooth quant save_pretrained

* support w8a8 tp

* change weight and bias in QLinear back to buffer

* remove debug codes and add comments
Collaborator

@RunningLeon RunningLeon left a comment


LGTM

@HIT-cwh
Collaborator

HIT-cwh commented Dec 27, 2023

The chatglm chat results are fine; falcon's are rather odd, but that may be a limitation of the model itself, e.g. the result below keeps repeating:
[screenshot: falcon chat output repeating itself]

Collaborator

@AllentDan AllentDan left a comment


LGTM. But the scripts in the docs should be updated, e.g. pytorch-poc -> pytorch.

@lvhan028 lvhan028 merged commit 344e126 into main Dec 28, 2023
4 of 6 checks passed
@lvhan028 lvhan028 deleted the pytorch-poc branch January 30, 2024 10:21
8 participants