Models: Add Yi-1.5-9B-Chat-16K #2750

Open. Wants to merge 4 commits into base: main.

Conversation

ThiloteE
Collaborator

@ThiloteE ThiloteE commented Jul 26, 2024

Resolves #176

Adds model support for Yi-1.5-9B-Chat-16K

Description of Model

It is a bilingual model that, at the time of writing, achieves strong benchmark results for its parameter size. It supports a context of up to 16K tokens.

  • The model was trained/finetuned on English and Chinese text
  • License: Apache 2.0

Personal Impression:

I got the impression that the model is very task-focused, which is why I chose "Below is an instruction that describes a task. Write a response that appropriately completes the request." as the system prompt. I have seen refusals for certain tasks; it has the typical "knows better than the user" vibe and seems to be finetuned with a particular alignment. For instance, roleplay caused refusals, but tasking it to write a cover letter was no problem. Its long context and response quality make it a good model, if you can bear its alignment or your use case happens to fall within the model's originally intended use cases. It will mainly appeal to English- and Chinese-speaking users.
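
As a rough illustration of how the details above could be collected into a model-list entry, here is a minimal Python sketch. The key names (`name`, `quant`, `requires`, `systemPrompt`, and so on) and the `build_entry` helper are assumptions made for illustration, not the project's confirmed schema; the hash and file size are deliberately left as unfilled placeholders.

```python
import json

# System prompt quoted in the PR description.
SYSTEM_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_entry():
    """Return a hypothetical model-list entry; every key name is an assumption."""
    return {
        "name": "Yi-1.5-9B-Chat-16K",
        "quant": "q4_0",                      # quantization named in the PR
        "requires": "3.1",                    # minimum GPT4All version named in the PR
        "license": "Apache 2.0",
        "languages": ["English", "Chinese"],
        "systemPrompt": SYSTEM_PROMPT,
        "md5sum": None,                       # placeholder, not stated in the PR
        "filesize": None,                     # placeholder, not stated in the PR
    }


entry = build_entry()
print(json.dumps(entry, indent=2))
```

Keeping the system prompt in a single constant makes it easy to swap in a different prompt if the task-focused one proves too restrictive.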

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • I have added thorough documentation for my code.
  • I have tagged PR with relevant project labels. I acknowledge that a PR without labels may be dismissed.
  • If this PR addresses a bug, I have provided both a screenshot/video of the original bug and the working solution.

Adds model support for [Yi-1.5-9B-Chat-16K](https://huggingface.co/GPT4All-Community/Yi-1.5-9B-Chat-16K-GGUF)

## Description:

It is a bilingual model that, at the time of writing, achieves strong benchmark results for its parameter size. It supports a context of up to 16K tokens.

- Minimum required version: GPT4All 3.1
- The model was trained on English and Chinese text
- License: Apache 2.0
- Quantization: Q4_0

## Personal Impression:
I got the impression that the model is very task-focused, which is why I chose `Below is an instruction that describes a task. Write a response that appropriately completes the request.` as the system prompt. I have seen refusals for certain tasks; it has the typical "knows better than the user" vibe and seems to be finetuned to be a professional assistant. For instance, roleplay caused refusals, but writing a cover letter was no problem. Its long context and response quality make it a good model, if you can bear its alignment. It will mainly appeal to English- and Chinese-speaking users.

Signed-off-by: ThiloteE <[email protected]>
@ThiloteE ThiloteE added models models.json This requires a change to the official model list. labels Jul 26, 2024
@ThiloteE ThiloteE changed the title Models3.json Add Yi-1.5-9B-Chat-16K Models: Add Yi-1.5-9B-Chat-16K Jul 26, 2024
ThiloteE added 2 commits July 27, 2024 01:58
Signed-off-by: ThiloteE <[email protected]>
Signed-off-by: ThiloteE <[email protected]>
Collaborator

@3Simplex 3Simplex left a comment

Tested the file hash, size, prompts, and download link. Seems like it's all good.
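
A check like the one described above could be scripted roughly as follows. This is only a sketch: it assumes the model list records an MD5 hash and a byte size for each file, and the helper names `md5_of_file` and `matches_entry` are hypothetical. The demo runs against a small temporary file rather than a real model download.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_entry(path, expected_md5, expected_size):
    """Return True if both the size and the hash agree with the expected values."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; skips hashing on an obvious mismatch
    return md5_of_file(path) == expected_md5.lower()


# Demo on a tiny temporary file standing in for a downloaded model.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = matches_entry(demo_path, hashlib.md5(b"hello").hexdigest(), 5)
demo_bad = matches_entry(demo_path, hashlib.md5(b"hello").hexdigest(), 6)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

Comparing the size before hashing avoids reading a multi-gigabyte file when the download is plainly truncated.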

@ThiloteE ThiloteE requested a review from manyoso July 27, 2024 00:37
@ThiloteE ThiloteE mentioned this pull request Jul 27, 2024
5 tasks
@manyoso
Collaborator

manyoso commented Jul 28, 2024

Is this model aimed at Mainland Chinese or Taiwanese users? I'd like the maintainers of our translations for these to have a look.

@manyoso
Collaborator

manyoso commented Jul 28, 2024

Also, we really need a sections key in our models.json so that we don't just have one huge list of models. Then we can overhaul the GUI to provide sections for models that are more specialized, right?
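
To make the idea concrete, here is a minimal Python sketch of how a GUI could group entries if each one carried a section key. The `section` key, the `group_by_section` helper, the default section name, and the sample entries are all hypothetical; nothing here reflects the actual models.json schema.

```python
from collections import defaultdict


def group_by_section(models, default="General"):
    """Group model names by a hypothetical 'section' key, preserving list order."""
    sections = defaultdict(list)
    for model in models:
        sections[model.get("section", default)].append(model["name"])
    return dict(sections)


# Illustrative entries only; the section labels are made up for this sketch.
models = [
    {"name": "Yi-1.5-9B-Chat-16K", "section": "Bilingual (English/Chinese)"},
    {"name": "Example general model"},
    {"name": "Example bilingual model", "section": "Bilingual (English/Chinese)"},
]

grouped = group_by_section(models)
for section, names in grouped.items():
    print(section, names)
```

Entries without the key fall into a default section, so existing entries would keep working without modification.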

@ThiloteE
Collaborator Author

Unfortunately, I am not fluent in Chinese. The original model card does not specify whether it targets Mainland Chinese or Taiwanese users.

@manyoso
Collaborator

manyoso commented Jul 29, 2024

@supersonictw can you comment on this model's Chinese abilities? Is it Traditional Chinese or Simplified? I'm wondering if we should advertise its purported bilingual abilities.

@supersonictw
Contributor

Yi is a Simplified Chinese based model.
The company is called "零一万物" (01.ai).
The model is mainly provided for Mainland China, though the company was founded by a Taiwanese scientist.

@supersonictw
Contributor

supersonictw commented Jul 29, 2024

The model is very friendly for people in Mainland China.
But if you want to add more models for Mainland China, it would be good to add the Qwen/Qwen2 models as well. Oh, I see they have already been added, wow: #2759.

People in Taiwan prefer to use LLaMa (or ChatGPT-4, lol 🤪); it is more general and widely accepted. The best Traditional Chinese model might be "TaiwanLLM", but it is not really required. The LLaMa model family is useful enough for us.

@manyoso
Collaborator

manyoso commented Aug 1, 2024

> The model is very friendly for people in Mainland China. But if you want to add more models for Mainland China, it would be good to add the Qwen/Qwen2 models as well. Oh, I see they have already been added, wow: #2759.
>
> People in Taiwan prefer to use LLaMa (or ChatGPT-4, lol 🤪); it is more general and widely accepted. The best Traditional Chinese model might be "TaiwanLLM", but it is not really required. The LLaMa model family is useful enough for us.

This one is larger than the Qwen models so I think it should probably be an addition, right?

@ThiloteE
Collaborator Author

ThiloteE commented Aug 1, 2024

If this model is not good enough, I can also try to find a finetune of it, but good finetunes are hard to find nowadays, since the Hugging Face Open LLM Leaderboard 2 has been quite inactive for weeks/months now.

My motivation for supporting this model specifically:

  • I think that having a model in a language that billions of people speak is a good idea.
  • The model claims a larger context than most of the models GPT4All currently supports. If Meta's llama-3.1-8b-instruct-128k turns out not to do well with larger context, or is still buggy in the next release of GPT4All, then at least we have this one for longer context.
  • It is relatively high on the benchmarks.
  • None of this model's quants out in the wild are compatible with GPT4All, so adding support for this model will save many users from trying bad quants.
  • I had to start "somewhere" at fixing models and opening pull-requests to GPT4All. I was planning to add more models, so this was just the first one I opened a pull-request for.
  • The model is from an organisation that is backed by a large corporation (Alibaba Cloud), so reputation is high.
  • The model's license is Apache 2.0
  • The model is still a recent model. Not too old yet.

@ThiloteE
Collaborator Author

ThiloteE commented Aug 1, 2024

I will add a PR for Qwen2 too, and maybe for one of its finetunes. I think there are more finetunes for Qwen2.

@ThiloteE
Collaborator Author

ThiloteE commented Aug 1, 2024

(three images attached; content not recoverable)

Labels
models.json This requires a change to the official model list. models
Development

Successfully merging this pull request may close these issues.

can you add chinese model?
4 participants