
Support minicpm3-4b #2465

Merged: 6 commits, merged into InternLM:main on Sep 23, 2024
Conversation

AllentDan (Collaborator)

The update_weights function is not obvious to users. @grimoire

grimoire (Collaborator)

It is OK to copy the weight into lm_head in load_weight (at the cost that the same weight is stored twice).
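A minimal sketch of the approach grimoire describes, assuming a vLLM-style load hook; the class, method, and parameter names below are illustrative and not taken from the PR:

```python
import torch
from torch import nn


class TiedHeadModel(nn.Module):
    """Illustrative model whose lm_head shares weights with the token embedding."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def load_weights(self, weights):
        """Copy checkpoint tensors into the module.

        If the checkpoint carries no separate lm_head weight (tied
        embeddings), duplicate the embedding weight into lm_head.
        """
        params = dict(self.named_parameters())
        loaded = set()
        for name, tensor in weights:
            if name in params:
                params[name].data.copy_(tensor)
                loaded.add(name)
        if 'lm_head.weight' not in loaded:
            # the "price" mentioned above: the same weight now lives in two tensors
            self.lm_head.weight.data.copy_(self.embed_tokens.weight.data)
```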

lvhan028 added the enhancement label on Sep 14, 2024
lvhan028 requested review from grimoire and irexyc on Sep 14, 2024
lvhan028 (Collaborator)

Please update the supported models list.

lvhan028 (Collaborator)

@zhulinJulia24 Please add this model to the test set.

from .builder import AutoModelConfigBuilder


class DeepseekV2ModelConfigBuilder(AutoModelConfigBuilder):
Collaborator

Rename: this builder is for MiniCPM3, but the class name still says DeepseekV2.
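For illustration only, the renamed builder might look roughly like the following; the condition classmethod pattern is assumed from lmdeploy's existing config builders, and nothing here is taken verbatim from the PR:

```python
from .builder import AutoModelConfigBuilder


class MiniCPM3ModelConfigBuilder(AutoModelConfigBuilder):
    """Config builder matching MiniCPM3 checkpoints (illustrative rename)."""

    @classmethod
    def condition(cls, hf_config):
        # assumed dispatch rule: match on the HF config's model_type
        return hf_config.model_type == 'minicpm3'
```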

)


class MiniCPMLongRoPE(MiniCPMRotaryEmbedding):

grimoire (Collaborator)

Can we use MLA?

AllentDan (Collaborator, Author)

> Can we use MLA?

I tried MLA and RoPE scaling like DeepSeek-V2, but the results seemed wrong.
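Since MLA with DeepSeek-V2-style scaling was dropped, the PR keeps a dedicated MiniCPMLongRoPE class (seen in the hunk above). Below is only a rough sketch of the general LongRoPE idea, per-dimension rescaling of inv_freq with separate short/long factors chosen by sequence length; the class name, constructor arguments, and the attention-scale formula are assumptions, not code from the PR:

```python
import math

import torch
from torch import nn


class LongRoPEScaledRotaryEmbedding(nn.Module):
    """Sketch: rescale rotary frequencies with per-dimension factors."""

    def __init__(self, dim, max_position_embeddings,
                 original_max_position_embeddings,
                 short_factor, long_factor, base=10000.0):
        super().__init__()
        self.dim = dim
        self.base = base
        self.original_max_position_embeddings = original_max_position_embeddings
        self.short_factor = torch.tensor(short_factor, dtype=torch.float32)
        self.long_factor = torch.tensor(long_factor, dtype=torch.float32)
        # assumed attention-scale term used by LongRoPE-style schemes
        scale = max_position_embeddings / original_max_position_embeddings
        self.mscale = (math.sqrt(1.0 + math.log(scale) /
                                 math.log(original_max_position_embeddings))
                       if scale > 1.0 else 1.0)

    def forward(self, x, position_ids):
        # pick the factor set by how far we are past the original context window
        seq_len = int(position_ids.max()) + 1
        factors = (self.long_factor
                   if seq_len > self.original_max_position_embeddings
                   else self.short_factor)
        inv_freq = 1.0 / (factors.to(x.device) * self.base**(
            torch.arange(0, self.dim, 2, dtype=torch.float32,
                         device=x.device) / self.dim))
        freqs = position_ids[..., None].float() * inv_freq[None, None, :]
        emb = torch.cat((freqs, freqs), dim=-1)
        return ((emb.cos() * self.mscale).to(x.dtype),
                (emb.sin() * self.mscale).to(x.dtype))
```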

lvhan028 (Collaborator)

> Please update the supported models list.

The README is still missing from the update.

return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
Collaborator

We provide apply-rotary and rotary-embedding ops in pytorch.nn, which are fused and would give better performance.
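For reference, the eager path the diff follows is the standard HF-style RoPE application shown below; the fused op mentioned above would replace this elementwise math with a single kernel. This mirrors the common transformers implementation, so details in the PR's version may differ slightly:

```python
import torch


def rotate_half(x):
    """Rotate the two halves of the last dimension: (x1, x2) -> (-x2, x1)."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2:]
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    """Apply rotary position embedding to query and key tensors (eager version)."""
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```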

grimoire (Collaborator)

> I tried MLA and RoPE scaling like DeepSeek-V2, but the results seemed wrong.

Fewer than 1000 blocks can be allocated for the KV cache with cache_max_entry_count=0.8 on an A100. Add a TODO comment if you don't want to optimize this in this PR.
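As a usage note on the KV-cache budget being discussed: cache_max_entry_count is the PytorchEngineConfig field that controls the fraction of free GPU memory reserved for the KV cache. A minimal example, assuming the HF model id openbmb/MiniCPM3-4B:

```python
from lmdeploy import PytorchEngineConfig, pipeline

# 0.8 is the ratio grimoire used when observing fewer than 1000 allocatable
# blocks on an A100; lower it if the weights leave too little free memory.
backend_config = PytorchEngineConfig(cache_max_entry_count=0.8)
pipe = pipeline('openbmb/MiniCPM3-4B', backend_config=backend_config)
print(pipe('Introduce MiniCPM3 in one sentence.'))
```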

dtype=dtype,
device=device,
quant_config=quantization_config,
is_tp=True,
Collaborator

If tp=1 is passed to PytorchEngineConfig, do we still set is_tp=True here?

AllentDan (Collaborator, Author), Sep 20, 2024

It seems all the is_tp flags are set to True in the PyTorch engine, regardless of whether tp=1 or not.

lvhan028 merged commit f3bef7b into InternLM:main on Sep 23, 2024
5 checks passed