
Add support for asymmetric embedding models #710

Open
wants to merge 10 commits into base: main

Conversation

br3no

@br3no br3no commented Apr 25, 2024

Description

This PR adds support for asymmetric embedding models such as https://huggingface.co/intfloat/multilingual-e5-small to the neural-search plugin.

It builds on the work done in opensearch-project/ml-commons#1799.

Asymmetric embedding models embed passages and queries differently. To that end, the model must "know" at inference time what kind of data it is embedding.

The changes are:

1. src/main/java/org/opensearch/neuralsearch/processor/TextEmbeddingProcessor.java

The processor signals that it is embedding passages by passing the new AsymmetricTextEmbeddingParameters with the content type EmbeddingContentType.PASSAGE.

2. src/main/java/org/opensearch/neuralsearch/query/NeuralQueryBuilder.java

Analogously, the query builder uses EmbeddingContentType.QUERY.

3. src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java

This is where most of the work was done. The class has been extended in a backwards-compatible way with inference methods that accept MLAlgoParams objects. Passing AsymmetricTextEmbeddingParameters (which implements MLAlgoParams) is mandatory for asymmetric models, while symmetric models do not accept it.

The only way to know whether a model is asymmetric or symmetric is to read its model configuration: if the configuration contains a passage_prefix and/or a query_prefix, the model is asymmetric; otherwise it is symmetric.

The src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java class deals with this, keeping the complexity in one place and not requiring any API change to the neural-search plugin (as proposed in #620). When calling the inference methods, clients (such as the TextEmbeddingProcessor) may pass the AsymmetricTextEmbeddingParameters object without caring whether the model they are using is symmetric or asymmetric. The accessor first reads the model's configuration (by calling the getModel API of the mlClient) and handles each case appropriately.

To avoid adding this extra round trip to every inference call, the asymmetry information is kept in an in-memory cache.
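
A minimal sketch of that accessor-side logic, assuming hypothetical helper and field names (modelAsymmetryCache, hasAsymmetryPrefixes, predict) and import paths that do not necessarily match the PR's actual code:

```java
// Sketch only: helper names and stub bodies are illustrative, not the PR's exact code.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.opensearch.core.action.ActionListener;
import org.opensearch.ml.common.input.parameter.MLAlgoParams;

public class AsymmetryAwareAccessorSketch {

    private final Map<String, Boolean> modelAsymmetryCache = new ConcurrentHashMap<>();

    public void inferenceSentences(String modelId, List<String> inputTexts,
                                   MLAlgoParams requestedParams,
                                   ActionListener<List<List<Float>>> listener) {
        Boolean cachedAsymmetry = modelAsymmetryCache.get(modelId);
        if (cachedAsymmetry != null) {
            // Cache hit: no extra round trip to ml-commons.
            predict(modelId, inputTexts, cachedAsymmetry ? requestedParams : null, listener);
            return;
        }
        // Cache miss: read the model configuration once, remember whether the model
        // is asymmetric, then run the prediction with or without the parameters.
        getModelConfig(modelId, ActionListener.wrap(config -> {
            boolean asymmetric = hasAsymmetryPrefixes(config); // query_prefix / passage_prefix present?
            modelAsymmetryCache.put(modelId, asymmetric);
            predict(modelId, inputTexts, asymmetric ? requestedParams : null, listener);
        }, listener::onFailure));
    }

    // Stand-ins for the real ml-commons client calls (mlClient.getModel / predict).
    private void getModelConfig(String modelId, ActionListener<Object> listener) {}
    private boolean hasAsymmetryPrefixes(Object modelConfig) { return false; }
    private void predict(String modelId, List<String> texts, MLAlgoParams params,
                         ActionListener<List<List<Float>>> listener) {}
}
```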

Issues Resolved

#620

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Collaborator

@br3no can you add an entry to the changelog?

@navneet1v
Collaborator

navneet1v commented Apr 26, 2024

@br3no Thanks for raising the PR. I am wondering whether we actually require this change. In the ml-commons repository, a generic MLInference processor is being launched that is supposed to handle inference for any kind of model, both during ingestion and search. RFC: opensearch-project/ml-commons#2173

That capability is being built right now. Do you think we still need this feature?

@br3no
Author

br3no commented Apr 26, 2024

@navneet1v I have been loosely following the discussions in the mentioned RFC. It's a large change that I don't expect to be stable soon – the PR is very much in flux. Also, I don't see the use-case of asymmetric embedding models being addressed.

This PR is much smaller in comparison and does not conflict with the RFC work in any way. Once the work on the ML Inference processors is finished and the use-case is addressed there as well, we can deprecate and eventually remove this functionality again.

Until then, this PR offers users the chance to use more modern local embedding models. I'm eager to take this for a spin, tbh.

@navneet1v
Collaborator

Also, I don't see the use-case of asymmetric embedding models being addressed.

If that is the case, I would recommend posting the same on the RFC to ensure that your use case is handled.

On the other hand, I do agree this is an interesting feature. I would like to get some eyes on this change, mainly on whether it should be added given that a more generic processor is around the corner. As far as my opinion is concerned, the main reason for the generic processor was to avoid creating new processors or updating existing ones to support new model types, which is what is happening in this PR.

Thoughts? @jmazanec15 , @martin-gaievski , @vamshin , @vibrantvarun .

Let me also add some PMs from the OpenSearch project to get their thoughts. @dylan-tong-aws


codecov bot commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 87.12871% with 13 lines in your changes missing coverage. Please review.

Project coverage is 84.41%. Comparing base (7c54c86) to head (44f14ec).
Report is 12 commits behind head on main.

Current head 44f14ec differs from pull request most recent head 6d3dba6

Please upload reports for the commit 6d3dba6 to get more accurate results.

Files Patch % Lines
...earch/neuralsearch/ml/MLCommonsClientAccessor.java 85.22% 9 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #710      +/-   ##
============================================
- Coverage     85.02%   84.41%   -0.61%     
+ Complexity      790      785       -5     
============================================
  Files            60       59       -1     
  Lines          2430     2464      +34     
  Branches        410      409       -1     
============================================
+ Hits           2066     2080      +14     
- Misses          202      215      +13     
- Partials        162      169       +7     

☔ View full report in Codecov by Sentry.

@br3no
Author

br3no commented Apr 26, 2024

@navneet1v I have added a comment earlier today to the RFC (cf. opensearch-project/ml-commons#2173 (comment)).

Sure, let's open the discussion and get some PMs into it.

I really don't mind leaving this out if the support is introduced in another PR in 2.14. I'm concerned opensearch-project/ml-commons#2173 is a much larger effort, that won't be ready that quickly...

It's not about my contribution – I need the feature. 🙃

@navneet1v
Collaborator

I really don't mind leaving this out if the support is introduced in another PR in 2.14. I'm concerned opensearch-project/ml-commons#2173 is a much larger effort, that won't be ready that quickly...

I can see the feature is marked for the 2.14 release of OpenSearch. Let me add maintainers from the ML team too. @mingshl, @ylwu-amzn

@br3no
Author

br3no commented Apr 29, 2024

@mingshl @ylwu-amzn, I'd really like to have this feature in 2.14.

Do you think this use-case will be fully supported with opensearch-project/ml-commons#2173? Cf. opensearch-project/ml-commons#2173 (comment)

If not, I'd be happy to help this PR get merged as an interim solution! Let me know what you think!

@mingshl

mingshl commented Apr 29, 2024

@br3no The ML inference processor is targeting support for remote models only at first. How do you usually connect this model? Is it local or remote?

If remote, can you please provide a SageMaker deployment code snippet so I can quickly test it in a 2.14 test cluster? Thanks.

@br3no
Author

br3no commented May 13, 2024

@mingshl sorry for taking so long to answer!

The use-case for now is to use a local, asymmetric model such as https://huggingface.co/intfloat/multilingual-e5-small.

This PR is the last puzzle piece needed to use these kinds of models, and it should in principle also work with remote models. It makes sure that the neural-search plugin uses the correct inference parameters when embedding passages and queries with asymmetric models. Regardless of whether the model is local or remote, if you are using asymmetric models you will need to provide this information anyway.

The thing is that asymmetric models need to know at inference time what exactly they are embedding. OpenSearch currently treats embedding models as symmetric, meaning the embedding is always the same regardless of whether the text being embedded is a query or a passage. Asymmetric models require content "hints" for the text being embedded; the model linked above uses the string prefixes passage: and query:. These models perform better than similarly sized symmetric models.

In opensearch-project/ml-commons#1799 we added the concept of asymmetric models to ml-commons, introducing the AsymmetricTextEmbeddingParameters class, which is used at inference time to signal whether the text being embedded is a query or a passage. This PR merely uses that new infrastructure.
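
For illustration, this is roughly how a caller would tag content using the parameters from opensearch-project/ml-commons#1799; the builder-style usage and the package path are assumptions on my part and may not match the exact ml-commons API:

```java
// Assumed import path; the real package in ml-commons may differ.
import org.opensearch.ml.common.input.parameter.textembedding.AsymmetricTextEmbeddingParameters;
import org.opensearch.ml.common.input.parameter.textembedding.AsymmetricTextEmbeddingParameters.EmbeddingContentType;

public class AsymmetricParamsExample {
    public static void main(String[] args) {
        // Ingest side (TextEmbeddingProcessor): the texts are passages.
        AsymmetricTextEmbeddingParameters passageParams = AsymmetricTextEmbeddingParameters.builder()
            .embeddingContentType(EmbeddingContentType.PASSAGE)
            .build();

        // Search side (NeuralQueryBuilder): the text is a query.
        AsymmetricTextEmbeddingParameters queryParams = AsymmetricTextEmbeddingParameters.builder()
            .embeddingContentType(EmbeddingContentType.QUERY)
            .build();

        // For a model like multilingual-e5, these content types end up applying the
        // configured literal prefixes, e.g. "passage: <text>" and "query: <text>".
        System.out.println(passageParams + " / " + queryParams);
    }
}
```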

I would really be happy to get this merged as an interim solution until the ml inference processor fully supports this use-case.

@reuschling

I also vote for this PR, as I need this functionality.

@navneet1v
Collaborator

@br3no would it be possible for you to contribute local model support back to the MLInference processor? Is that even an option?

@br3no
Author

br3no commented May 15, 2024

@navneet1v you mean making sure this works there as well? Sure, I can commit to that. I'd then propose to merge this PR now and start the work to eventually replace it once the MLInference processor supports this use case...

@br3no
Author

br3no commented Nov 6, 2024

I think I will need some assistance in understanding what to do exactly regarding the BWC tests. I'm not even sure I understand where to look for errors.

The error scenario you folks are seeing is:

  • a cluster is updated
  • a query is sent to a node running an older version of OS
  • the query contains an AsymmetricTextEmbeddingParameters
  • the legacy version cannot deserialize this class

Did I get this right? Or is it another scenario you are concerned with?


The class AsymmetricTextEmbeddingParameters has been part of ml-commons for about 3 months. This would then be a problem for all versions older than 2.14.

@yuye-aws
Member

yuye-aws commented Nov 7, 2024

The class AsymmetricTextEmbeddingParameters has been part of ml-commons for about 3 months. This would then be a problem for all versions older than 2.14.

You can exclude versions older than 2.14 just like here.
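
A hedged sketch of such a version gate, assuming the tests.bwc.version system property used by the gradle BWC tasks and the standard org.opensearch.Version utilities; the helper the neural-search BWC suites actually use to obtain the cluster version may differ:

```java
import org.junit.Assume;
import org.opensearch.Version;

public class AsymmetricBwcGuardSketch {
    // Call at the start of a BWC test to skip on clusters older than 2.14,
    // where AsymmetricTextEmbeddingParameters does not exist yet.
    static void assumeAsymmetricSupport() {
        String raw = System.getProperty("tests.bwc.version", "2.19.0").replace("-SNAPSHOT", "");
        Assume.assumeTrue("asymmetric embedding params require OpenSearch 2.14+",
            Version.fromString(raw).onOrAfter(Version.fromString("2.14.0")));
    }
}
```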

@yuye-aws
Member

yuye-aws commented Nov 7, 2024

@yuye-aws aren't the test cases covered completely by the HybridSearchIT classes? There the text embedding processor is created, documents are indexed and a hybrid query is executed.

Do you mean the BWC tests for HybridSearchIT? If so, you can fill in the gap between BWC tests and integration tests for TextEmbeddingProcessorIT.

@br3no
Author

br3no commented Nov 7, 2024

@yuye-aws

What I meant by this comment is that I don't see the need to implement a separate BWC test class for the change in this PR. The existing BWC tests (e.g. HybridQueryIT, but others as well) cover the complete process of creating an index, creating a pipeline for document embedding, and issuing neural queries. From my point of view, this PR introduces no new process that requires different BWC test code. Please correct me if I'm wrong.

BTW the rolling-upgrade BWC tests are failing on the main branch.

I ran

./gradlew :qa:rolling-upgrade:testRollingUpgrade -D'tests.bwc.version=2.18.0-SNAPSHOT'

without success.

This makes it hard to add new tests.

@yuye-aws
Member

yuye-aws commented Nov 8, 2024

./gradlew :qa:rolling-upgrade:testRollingUpgrade

@vibrantvarun The comment makes sense to me. Can you check whether BWC tests are needed and help him?

@br3no force-pushed the asymmetric-embeddings-620 branch from 390d04c to 61347b0 on November 8, 2024 10:05
@br3no
Author

br3no commented Nov 8, 2024

@martin-gaievski I have addressed all your latest comments. Hope this PR can now be approved.

Member

@martin-gaievski left a comment

It looks good to me, thank you

}));

Consumer<Boolean> predictConsumer = isAsymmetricModel -> {
MLInput mlInput = createMLMultimodalInput(targetResponseFilters, inputObjects, isAsymmetricModel ? mlAlgoParams : null);
Collaborator

There may be a case in the future where we want to pass mlAlgoParams for a symmetric model. Shouldn't we check whether the model is asymmetric before constructing the request?

// Check here if model is symmetric or asymmetric
InferenceRequest.builder().modelId(this.modelId).inputTexts(inferenceList).mlAlgoParams(PASSAGE_PARAMETERS).build(),

Modifying the request internally will lead to confusion later, when we need to pass mlAlgoParams to a symmetric model but the accessor silently omits them before calling the model.
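
A minimal sketch of that alternative, using illustrative names (checkModelAsymmetry, predict) around the InferenceRequest builder chain quoted above, so the decision is made before the request is built and nothing is silently dropped:

```java
// Illustrative sketch; only the InferenceRequest builder chain comes from the diff above.
checkModelAsymmetry(modelId, ActionListener.wrap(isAsymmetric -> {
    MLAlgoParams params = isAsymmetric ? PASSAGE_PARAMETERS : null; // explicit, visible to the caller
    InferenceRequest request = InferenceRequest.builder()
        .modelId(modelId)
        .inputTexts(inferenceList)
        .mlAlgoParams(params)
        .build();
    predict(request, listener);
}, listener::onFailure));
```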

@zane-neo
Collaborator

@br3no How are we ensuring BWC test compatibility?

This is a valid concern. I took a look at the code and it could have a BWC issue during serde (serialization/deserialization). The latest version serializes the AsymmetricTextEmbeddingParameters, and if a node is deployed with a legacy OS version, then during deserialization it looks the class up in the internal cache map here; since a legacy version doesn't have this class, an IllegalArgumentException will be thrown. @br3no Can you run a test on this case to double-confirm whether this is true? Thanks.

@br3no Please take a look at this comment. Serde is the only risk I see between old nodes and new nodes, and I don't think HybridSearchIT covers the asymmetric case, as it only tests with a text embedding model (a symmetric model); those tests don't use the new configuration introduced for asymmetric models, so there is no serde issue there either. I would suggest you create an old cluster with two nodes (ML nodes), replace one node with the latest code, and test two cases:

  1. Send a request to the old node and make the old node dispatch the request to the new node.
  2. Send a request to the new node and make the new node dispatch the request to the old node.

You don't need to manually configure anything to enable the dispatch, as ml-commons automatically dispatches requests to different nodes in a round-robin fashion; by controlling which node receives the request and triggering the request twice, one of the requests will be dispatched to the other node. If you don't see any serde exception, then it's good to merge the PR. Thanks.

@br3no
Author

br3no commented Nov 11, 2024

@zane-neo so if I get this right, I should create a new test that uses the asymmetric model feature. This test should only run for OS versions >= 2.19. Is this right?

Your concern is about making sure future releases will not break compatibility with this feature.

@br3no
Author

br3no commented Nov 13, 2024

Ping.

@zane-neo
Collaborator

@zane-neo so if I get this right, I should create a new test that uses the asymmetric model feature. This test should only run for OS versions >= 2.19. Is this right?

Your concern is about making sure future releases will not break compatibility with this feature.

@br3no That's right about using the asymmetric model feature for OS >= 2.19, but the situation is a little different. You should test a mixed cluster with OS 2.19 and OS 2.18, since 2.18 has a different neural-search code base, so a request sent to a 2.19 node and then dispatched to a 2.18 node could encounter serde issues. My suggestion is that you test this and, based on the result:

  1. No serde error: you can add BWC tests with OS >= 2.19.
  2. Serde error: fix it and add BWC tests with OS >= 2.19.

@martin-gaievski
Member

BTW, if you're working on BWC for 2.19+ and want to check the results, you'd better rebase on the latest main; we've switched 2.18-snapshot to 2.18/2.19-snapshot. This branch will keep failing when running BWC tests.

@martin-gaievski
Member

@br3no the CI is in good shape now, you can work on BWC tests.

@zane-neo
Collaborator

@zane-neo so if I get this right, I should create a new test that uses the asymmetric model feature. This test should only run for OS versions >= 2.19. Is this right?
Your concern is about making sure future releases will not break compatibility with this feature.

@br3no That's right about using the asymmetric model feature for OS >= 2.19, but the situation is a little different. You should test a mixed cluster with OS 2.19 and OS 2.18, since 2.18 has a different neural-search code base, so a request sent to a 2.19 node and then dispatched to a 2.18 node could encounter serde issues. My suggestion is that you test this and, based on the result:

  1. No serde error: you can add BWC tests with OS >= 2.19.
  2. Serde error: fix it and add BWC tests with OS >= 2.19.

@br3no I tested this and didn't find serde issues for OS versions >= 2.13, so there's no concern here. You can work on creating BWC tests for OS versions >= 2.19. Thanks.

@br3no
Author

br3no commented Dec 16, 2024

Sorry folks, I didn't have time lately to invest here. I'll try to do this in the holiday season.

@mingshl

mingshl commented Dec 16, 2024

Sorry folks, I didn't have time lately to invest here. I'll try to do this in the holiday season.

Hi @br3no, @brianf-aws is planning to add a tutorial on using an asymmetric local model with ML inference processors during ingest and search: opensearch-project/ml-commons#3258. You can watch for new updates and post comments through that PR.

Labels
Enhancements (Increases software capabilities beyond original client specifications)