
Support minicpm3-4b #2465

Merged: 6 commits, merged into InternLM:main on Sep 23, 2024
Conversation

AllentDan (Collaborator)

The update_weights function is not obvious to users. @grimoire

grimoire (Collaborator)

It is OK to copy the weight into lm_head in load_weight (at the cost that the same weight is stored twice).
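A minimal sketch of the approach grimoire describes, assuming a vLLM-style load hook; the class, method, and parameter names below are illustrative and not taken from the PR:

```python
import torch
from torch import nn


class TiedHeadModel(nn.Module):
    """Illustrative model whose lm_head shares weights with the token embedding."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def load_weights(self, weights):
        """Copy checkpoint tensors into the module.

        If the checkpoint carries no separate lm_head weight (tied
        embeddings), duplicate the embedding weight into lm_head.
        """
        params = dict(self.named_parameters())
        loaded = set()
        for name, tensor in weights:
            if name in params:
                params[name].data.copy_(tensor)
                loaded.add(name)
        if 'lm_head.weight' not in loaded:
            # the "price" mentioned above: the same weight now lives in two tensors
            self.lm_head.weight.data.copy_(self.embed_tokens.weight.data)
```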

lvhan028 added the enhancement label on Sep 14, 2024
lvhan028 requested review from grimoire and irexyc on Sep 14, 2024
lvhan028 (Collaborator)

Please update the supported models list.

lvhan028 (Collaborator)

@zhulinJulia24 Please add this model to the test set.

from .builder import AutoModelConfigBuilder


class DeepseekV2ModelConfigBuilder(AutoModelConfigBuilder):
Collaborator

Rename: this builder is for MiniCPM3, but the class name still says DeepseekV2.
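For illustration only, the renamed builder might look roughly like the following; the condition classmethod pattern is assumed from lmdeploy's existing config builders, and nothing here is taken verbatim from the PR:

```python
from .builder import AutoModelConfigBuilder


class MiniCPM3ModelConfigBuilder(AutoModelConfigBuilder):
    """Config builder matching MiniCPM3 checkpoints (illustrative rename)."""

    @classmethod
    def condition(cls, hf_config):
        # assumed dispatch rule: match on the HF config's model_type
        return hf_config.model_type == 'minicpm3'
```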

)


class MiniCPMLongRoPE(MiniCPMRotaryEmbedding):

grimoire (Collaborator)

Can we use MLA?

AllentDan (Collaborator, Author)

> Can we use MLA?

I tried MLA and RoPE scaling like DeepSeek-V2, but the results seemed wrong.
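Since MLA with DeepSeek-V2-style scaling was dropped, the PR keeps a dedicated MiniCPMLongRoPE class (seen in the hunk above). Below is only a rough sketch of the general LongRoPE idea, per-dimension rescaling of inv_freq with separate short/long factors chosen by sequence length; the class name, constructor arguments, and the attention-scale formula are assumptions, not code from the PR:

```python
import math

import torch
from torch import nn


class LongRoPEScaledRotaryEmbedding(nn.Module):
    """Sketch: rescale rotary frequencies with per-dimension factors."""

    def __init__(self, dim, max_position_embeddings,
                 original_max_position_embeddings,
                 short_factor, long_factor, base=10000.0):
        super().__init__()
        self.dim = dim
        self.base = base
        self.original_max_position_embeddings = original_max_position_embeddings
        self.short_factor = torch.tensor(short_factor, dtype=torch.float32)
        self.long_factor = torch.tensor(long_factor, dtype=torch.float32)
        # assumed attention-scale term used by LongRoPE-style schemes
        scale = max_position_embeddings / original_max_position_embeddings
        self.mscale = (math.sqrt(1.0 + math.log(scale) /
                                 math.log(original_max_position_embeddings))
                       if scale > 1.0 else 1.0)

    def forward(self, x, position_ids):
        # pick the factor set by how far we are past the original context window
        seq_len = int(position_ids.max()) + 1
        factors = (self.long_factor
                   if seq_len > self.original_max_position_embeddings
                   else self.short_factor)
        inv_freq = 1.0 / (factors.to(x.device) * self.base**(
            torch.arange(0, self.dim, 2, dtype=torch.float32,
                         device=x.device) / self.dim))
        freqs = position_ids[..., None].float() * inv_freq[None, None, :]
        emb = torch.cat((freqs, freqs), dim=-1)
        return ((emb.cos() * self.mscale).to(x.dtype),
                (emb.sin() * self.mscale).to(x.dtype))
```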

lvhan028 (Collaborator)

> Please update the supported models list.

The README is still missing from the update.

return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
Collaborator

We provide apply-rotary and rotary-embedding ops in pytorch.nn, which are fused and would give better performance.
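For reference, the eager path the diff follows is the standard HF-style RoPE application shown below; the fused op mentioned above would replace this elementwise math with a single kernel. This mirrors the common transformers implementation, so details in the PR's version may differ slightly:

```python
import torch


def rotate_half(x):
    """Rotate the two halves of the last dimension: (x1, x2) -> (-x2, x1)."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2:]
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    """Apply rotary position embedding to query and key tensors (eager version)."""
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```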

grimoire (Collaborator)

> I tried MLA and RoPE scaling like DeepSeek-V2, but the results seemed wrong.

Fewer than 1000 blocks can be allocated for the KV cache with cache_max_entry_count=0.8 on an A100. Add a TODO comment if you don't want to optimize this in this PR.
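As a usage note on the KV-cache budget being discussed: cache_max_entry_count is the PytorchEngineConfig field that controls the fraction of free GPU memory reserved for the KV cache. A minimal example, assuming the HF model id openbmb/MiniCPM3-4B:

```python
from lmdeploy import PytorchEngineConfig, pipeline

# 0.8 is the ratio grimoire used when observing fewer than 1000 allocatable
# blocks on an A100; lower it if the weights leave too little free memory.
backend_config = PytorchEngineConfig(cache_max_entry_count=0.8)
pipe = pipeline('openbmb/MiniCPM3-4B', backend_config=backend_config)
print(pipe('Introduce MiniCPM3 in one sentence.'))
```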

dtype=dtype,
device=device,
quant_config=quantization_config,
is_tp=True,
Collaborator

If tp=1 is passed to PytorchEngineConfig, do we still set is_tp=True here?

AllentDan (Collaborator, Author), Sep 20, 2024

It seems all the is_tp flags are set to True in the PyTorch engine, regardless of whether tp=1 or not.

lvhan028 merged commit f3bef7b into InternLM:main on Sep 23, 2024
5 checks passed