Is your feature request related to a problem?
There can be a problem when embedding vectors (e.g. from msmarco-distilbert-base-tas-b; suppose its similarity function is cosine similarity) are indexed into a `knn_vector` field mapped with a different `space_type` (e.g. L2).
The similarity the embedding model was trained with and the vector distance computed in the HNSW graph can then differ, leading to inaccurate search scores.
Because OpenSearch stores an HNSW graph for each segment, built by Faiss, NMSLIB, or Lucene using the mapped `space_type`, the search results returned from the graph can vary depending on the `space_type`.
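To illustrate why the mismatch matters, here is a minimal sketch using two made-up, unnormalized 2-D vectors (hypothetical values, not real model output). It ranks the same documents exactly (brute force, not HNSW) by cosine similarity and by L2 distance; when the vectors have different magnitudes, the two orders can disagree.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2_distance(a, b):
    # Euclidean distance: smaller means "closer".
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
# Hypothetical unnormalized document embeddings.
docs = {
    "doc_a": [10.0, 1.0],  # almost the same direction as the query, large magnitude
    "doc_b": [0.5, 0.5],   # 45 degrees away from the query, small magnitude
}

# Higher cosine similarity ranks first; lower L2 distance ranks first.
by_cosine = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
by_l2 = sorted(docs, key=lambda d: l2_distance(query, docs[d]))

print("cosine ranking:", by_cosine)  # ['doc_a', 'doc_b']
print("L2 ranking:    ", by_l2)      # ['doc_b', 'doc_a']
```

A graph built with `space_type` L2 would order neighbors like `by_l2`, even though a cosine-trained model would consider `doc_a` the better match.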
What solution would you like?
Are there any benefits to using a `space_type` that differs from the embedding model's similarity function?
I suggest displaying a warning in this scenario to alert users to potentially inaccurate results.
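A warning like the one suggested could look roughly like the sketch below. This is hypothetical, not an existing OpenSearch feature: `check_space_type` and `COMPATIBLE_SPACE_TYPES` are illustrative names, and since OpenSearch cannot know which model produced the vectors, the model's similarity function is passed in by the user. The mapping dict follows the shape of an OpenSearch k-NN index body (`knn_vector`, `method.space_type`).

```python
import warnings

# Hypothetical table: which space_type values match a model's similarity function.
COMPATIBLE_SPACE_TYPES = {
    "cosine": {"cosinesimil"},
    "dot_product": {"innerproduct"},
    "euclidean": {"l2"},
}

def check_space_type(mapping, field, model_similarity):
    """Warn if a knn_vector field's space_type differs from the model's similarity."""
    method = mapping["mappings"]["properties"][field].get("method", {})
    # Assumption: treat a missing space_type as "l2" for this sketch.
    space_type = method.get("space_type", "l2")
    expected = COMPATIBLE_SPACE_TYPES.get(model_similarity)
    if expected is None:
        warnings.warn(f"Unknown similarity {model_similarity!r}; cannot check {field!r}")
        return False
    if space_type not in expected:
        warnings.warn(
            f"Field {field!r} uses space_type={space_type!r}, but the embedding model "
            f"uses {model_similarity!r} similarity; search scores may be inaccurate."
        )
        return False
    return True

# Usage: an L2-mapped field holding vectors from a cosine-similarity model.
mapping = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {"name": "hnsw", "space_type": "l2", "engine": "lucene"},
            }
        }
    },
}
ok = check_space_type(mapping, "embedding", "cosine")  # emits a UserWarning
print(ok)  # False
```

The check is deliberately strict (one exact match per similarity function); it does not account for cases such as normalized vectors, where other space types can yield the same ranking.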
@YeonghyeonKO this is an interesting ask, but since OpenSearch can run in a wide variety of environments, I don't see how OpenSearch could know which model was used to generate the ingested vectors, or which similarity function that model uses.
@navneet1v Oh, I see. Given how much freedom users have, what I asked about really depends on the user side, not on OpenSearch. Also, since OpenSearch allows users to deploy custom ML models, the mismatch problem I was worried about should be controlled by the users themselves. It's up to us, not OpenSearch haha
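As one example of controlling this on the user side, a minimal sketch: if (and only if) the model's intended similarity is cosine, L2-normalizing vectors before ingestion makes L2 and cosine rankings agree, because for unit vectors ||a - b||^2 = 2 - 2 cos(a, b). The vectors below are the same hypothetical values as before; this does not apply to models trained for dot-product similarity, where normalizing changes the ranking.

```python
import math

def normalize(v):
    # Scale a vector to unit length; a zero vector has no direction.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical raw embeddings, normalized before indexing.
query = normalize([1.0, 0.0])
raw_docs = {"doc_a": [10.0, 1.0], "doc_b": [0.5, 0.5]}
docs = {name: normalize(v) for name, v in raw_docs.items()}

# For unit vectors, squared L2 distance equals 2 - 2 * cosine similarity.
for v in docs.values():
    assert math.isclose(l2_distance(query, v) ** 2, 2 - 2 * cosine_similarity(query, v))

by_cosine = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
by_l2 = sorted(docs, key=lambda d: l2_distance(query, docs[d]))
print(by_cosine, by_l2)  # both ['doc_a', 'doc_b']
```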