[FEATURE] Warning about Mismatch Between similarity function of Embedding Model and Index `space_type` #2356

YeonghyeonKO · 2024-12-26T09:46:49Z

Is your feature request related to a problem?

A problem can arise when embedding vectors (e.g., from msmarco-distilbert-base-tas-b; say its similarity function is cosine similarity) are indexed into a knn_vector field that is mapped with a different space_type (e.g., L2).
The distance implied by the embedding model's similarity function and the vector distance used in the HNSW graph can then differ, leading to inaccurate search scores.
In other words, because OpenSearch stores an HNSW graph for each segment, built by Faiss/NMSLIB/Lucene, search results from the graph can vary depending on the space_type.
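To make the mismatch concrete, here is a minimal sketch in plain Python. It is not OpenSearch code; the query and document vectors are made-up toy values chosen only to show that cosine similarity and L2 distance can rank the same documents differently when vectors are not normalized.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (higher means more similar)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def l2_distance(u, v):
    """Euclidean distance between u and v (lower means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Toy data (assumption, for illustration only).
query = [1.0, 0.0]
docs = {
    "a": [10.0, 0.0],  # same direction as the query, but far away
    "b": [1.0, 1.0],   # different direction, but close by
}

# Rank documents best-first under each measure.
by_cosine = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
by_l2 = sorted(docs, key=lambda k: l2_distance(query, docs[k]))

print("cosine ranking:", by_cosine)
print("l2 ranking:    ", by_l2)
```

The two rankings disagree here because the vectors have different lengths. For unit-length vectors the squared L2 distance equals 2 - 2 * cosine, so both measures produce the same ordering, although the raw scores still differ.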

What solution would you like?

Are there any benefits to using a space_type that differs from the similarity function of the embedding model?
I suggest displaying a warning message in the scenario above to alert users to potential inaccuracies.
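A rough sketch of what such a check could look like on the user side follows. Everything here is hypothetical: `check_space_type`, the `COMPATIBLE_SPACE_TYPES` table, and the informal similarity names are invented for illustration and are not part of OpenSearch or the k-NN plugin. The check also assumes the caller already knows the model's similarity function. Only the space_type strings (`l2`, `cosinesimil`, `innerproduct`) are values the k-NN plugin accepts.

```python
import warnings

# Hypothetical table: informal model similarity name -> matching k-NN space_type values.
COMPATIBLE_SPACE_TYPES = {
    "cosine": {"cosinesimil"},
    "dot_product": {"innerproduct"},
    "euclidean": {"l2"},
}

def check_space_type(model_similarity, index_space_type):
    """Return True if the pair looks consistent, otherwise warn and return False.

    An unknown model_similarity returns True because no judgment can be made.
    """
    expected = COMPATIBLE_SPACE_TYPES.get(model_similarity)
    if expected is None or index_space_type in expected:
        return True
    warnings.warn(
        f"Model similarity '{model_similarity}' does not match index "
        f"space_type '{index_space_type}'; search scores may be inaccurate.",
        UserWarning,
    )
    return False

# Example: a cosine-similarity model indexed into an L2 field triggers the warning.
check_space_type("cosine", "l2")
```

A check like this can only run where the model's similarity function is known, which is why it is sketched as client-side code rather than as something the index itself performs.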

navneet1v · 2024-12-27T07:34:18Z

@YeonghyeonKO this is an interesting ask, but since OpenSearch can run in a wide variety of environments, I don't see how OpenSearch can know which model is being used to ingest the vectors and which space type that model uses.

YeonghyeonKO · 2024-12-28T15:00:45Z

@navneet1v Oh, given the high degree of freedom you described, what I've asked for depends on the user side, not on OpenSearch. Also, since OpenSearch allows users to deploy custom ML models, the mismatch problem I've been worried about should be controlled/solved by those users. It's up to us, not OpenSearch haha

heemin32 · 2024-12-30T17:19:45Z

@YeonghyeonKO I believe your request can be addressed through this GitHub issue. Please consider giving it a +1 if you'd like to have the feature.

YeonghyeonKO added enhancement untriaged labels Dec 26, 2024

navneet1v added question Further information is requested and removed untriaged enhancement labels Dec 27, 2024

navneet1v removed the question Further information is requested label Dec 27, 2024

YeonghyeonKO mentioned this issue Jan 1, 2025

[PROPOSAL] Neural Search field type opensearch-project/neural-search#803

Open
