Add Nvembed v2 model #2649

Merged
merged 11 commits into from
Dec 3, 2024


cdoko
Contributor

@cdoko cdoko commented Nov 29, 2024

NV-Embed-v2 is a text embedding model that ranks No. 1 (as of Nov 25 2024) on the MTEB benchmark.

This implementation reproduces the example in the model card.

License: CC-BY-NC-4.0

Please let me know if any changes are required!

@LaurentMazare
Collaborator

Thanks for adding this model. One tricky bit is that the candle-transformers crate has an apache/mit license, so we cannot really include code that is CC-BY-NC within it. Having models for which the code is apache/mit and the weights are CC-BY-NC would be fine though.
So if you can base your implementation on code that doesn't have the NC restriction, that would make it possible to have this in candle-transformers.

@cdoko
Contributor Author

cdoko commented Dec 1, 2024

Hi, I'm new to open-source licensing and want to clarify how it works. The NVEmbedV2 repository contains a CC-BY-NC-4.0 license in the README, but doesn't have explicit licensing information in individual files, including the implementation code. Is it generally assumed that the license applies to the entire repository, including both code and weights?

The README states 'This model should not be used for any commercial purpose. Refer the license for the detailed terms.' Does this language imply that the CC-BY-NC-4.0 license only applies to the model weights, or also to the accompanying implementation code?

To clarify, I added the CC-BY-NC-4.0 license to my implementation code, but I did this out of caution and uncertainty.

If necessary, I'd be happy to re-implement.

@LaurentMazare
Collaborator

(Disclaimer: I'm not a lawyer or a licensing specialist.)
The repo on Hugging Face is licensed as "cc-by-nc-4.0", and unless the files in the repo explicitly state otherwise, my understanding is that the license applies to them. In particular, it applies to the modeling_nvembed.py file, so that file cannot be used as a base for the implementation itself.
If you can re-implement the model based on the paper or an alternative open-source implementation with a more permissive license, that would be great.

@cdoko
Contributor Author

cdoko commented Dec 3, 2024

Thanks for the clarification on the licensing. I've updated the implementation to be based on the paper and weights. Looking forward to your feedback!

@LaurentMazare LaurentMazare merged commit 145aa71 into huggingface:main Dec 3, 2024
10 checks passed
@LaurentMazare
Collaborator

Looks all good to me, I've merged it, thanks!
