[PROPOSAL] Neural Search field type #803

asfoorial · 2024-06-25T12:59:15Z

Can we mimic this feature in OpenSearch? https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text

I know that a lot has been done recently in OpenSearch projects to make things headache-free. I think a neural-search field type in OpenSearch would be an interesting addition. However, it should account for synonyms to avoid any fine-tuning headache.

navneet1v · 2024-07-04T02:57:06Z

@asfoorial, on the OpenSearch side we do this via a combination of an ingest processor and a vector field. As there are multiple use cases for semantic search, including multi-model, this would be an interesting field to have.

But is there any specific reason you are looking for a field, compared to what is present currently? My main motive here is to understand the advantages of a new field versus what is currently available in OpenSearch.
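For readers comparing the two approaches, the current setup mentioned above (an ingest processor plus a vector field) looks roughly like the sketch below. It only builds the two request bodies and prints them. The field names, pipeline name, model ID, and dimension are placeholder assumptions, not values from this thread.

```python
import json

# Placeholder assumptions: substitute a deployed ML Commons model ID and the
# output dimension of that model.
MODEL_ID = "<your-model-id>"
EMBEDDING_DIMENSION = 768
PIPELINE_NAME = "nlp-ingest-pipeline"  # hypothetical pipeline name

# Body for: PUT /_ingest/pipeline/<PIPELINE_NAME>
# The text_embedding processor maps a source text field to a target vector field.
ingest_pipeline = {
    "description": "Embed passage_text into passage_embedding",
    "processors": [
        {
            "text_embedding": {
                "model_id": MODEL_ID,
                "field_map": {"passage_text": "passage_embedding"},
            }
        }
    ],
}

# Body for: PUT /<index-name>
# The vector field must be declared separately and kept in sync with the model.
index_body = {
    "settings": {
        "index.knn": True,
        "default_pipeline": PIPELINE_NAME,
    },
    "mappings": {
        "properties": {
            "passage_text": {"type": "text"},
            "passage_embedding": {
                "type": "knn_vector",
                "dimension": EMBEDDING_DIMENSION,
            },
        }
    },
}

print(json.dumps(ingest_pipeline, indent=2))
print(json.dumps(index_body, indent=2))
```

The point of the comparison is that the user has to keep three things consistent by hand: the model, the processor's field map, and the vector field's dimension. A dedicated field type would presumably derive the latter two from the first.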

asfoorial · 2024-07-04T10:00:25Z

The main reason is to simplify the process and keep the focus on the business. In fact, Elasticsearch gave the same reason when they introduced the field.

Another reason is alignment of new features across multiple OpenSearch projects. I have noticed that over the past several releases we get new features in ml-commons and k-NN, but it takes a while until we see their benefits reflected in neural-search. If they become one component (a neural-search field), then that would more or less guarantee that any new feature in ml-commons or k-NN must be reflected in the neural-search field type before its release.

navneet1v · 2024-07-29T19:18:47Z

> If they become one component (neural-search field), then that would sort of guarantee that any new feature in ml-common or kNN must be reflected in the neural-search field type before their release.

@asfoorial thanks for providing the details. I would like to know a little more about which features added in ML Commons/k-NN don't make it into neural-search. Maybe there is something missing.

But I really like the idea of having a field which can encapsulate the processor information.

navneet1v · 2024-07-29T19:21:54Z

One place where having the field will be useful is nested fields. I find that putting this information in the processor is very painful and not intuitive.

navneet1v · 2024-09-03T19:15:57Z

@minalsha please take a look at this and add your thoughts.

heemin32 · 2024-11-26T18:46:57Z

I think this is a good idea as it simplifies the use of neural search significantly. By defining a neural field, all other processes, such as the neural search pipeline, neural ingestion pipeline, KNN index creation, chunking, and more, will be handled behind the scenes.

navneet1v · 2024-12-26T17:37:36Z

@heemin32 any reason for closing this gh issue?

heemin32 · 2024-12-30T17:17:56Z

> @heemin32 any reason for closing this gh issue?

@navneet1v I think it was closed automatically when I added it to the Neural Search RoadMap. Reopened it.

navneet1v · 2024-12-30T20:37:21Z

One case where I feel this field type will be very useful is complex nested fields. Currently, with the TextEmbedding processor, it feels like we keep finding new cases where the processor does not work. Some GH issues:

[BUG] Fail to generate embedding for ingest document with nested field defined in field map #1042
[BUG] Fail to ingest document with nested list into text_embedding processor #1024
[BUG] Text chunking processor not working with nested documents #895
[BUG] _bulk update request failing when using text chunking processor pipeline #798
[BUG] Incorrect validation logic for map type in xxxProcessor #739
[BUG] error on complex types list type field [category] has empty string, cannot process it #678
IllegalArgumentException when all embedding fields not shown or doing a partial update without embedding fields #73

I believe having a field type will solve this problem: we would call the ML Commons inference APIs from within the mapper itself to convert the text to embeddings. I think we can use the concept of properties in the mapper to have a neural field that handles both text and vectors.
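To illustrate why nested structures are where the issues listed above tend to come from, here is a minimal sketch of collecting the text values behind a dotted path when any level may be an object, a list of objects, or an empty string. `collect_texts` is a hypothetical helper written for this comment, not code from the neural-search plugin.

```python
from typing import Any, List


def collect_texts(node: Any, path: str) -> List[str]:
    """Return every non-blank string reachable from `node` via dotted `path`.

    Lists are traversed transparently at every level, so "sections.body"
    matches both {"sections": {"body": ...}} and {"sections": [{"body": ...}]}.
    """
    if isinstance(node, list):
        # Fan out over list items without consuming a path segment.
        found: List[str] = []
        for item in node:
            found.extend(collect_texts(item, path))
        return found
    if not path:
        # End of the path: keep only non-blank strings.
        if isinstance(node, str) and node.strip():
            return [node]
        return []
    if isinstance(node, dict):
        head, _, rest = path.partition(".")
        if head in node:
            return collect_texts(node[head], rest)
    # Missing key or a non-container value in the middle of the path.
    return []


doc = {
    "title": "Neural field",
    "sections": [
        {"body": "first passage"},
        {"body": ""},              # empty string, skipped
        {"heading": "no body"},    # field missing, skipped
        {"body": ["second passage", "third passage"]},
    ],
}

print(collect_texts(doc, "sections.body"))
```

A mapper already walks the document along the declared field structure, which is part of why handling this inside a field type looks more natural than re-implementing the traversal in each processor.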

cc: @minalsha , @heemin32 , @vibrantvarun , @martin-gaievski

YeonghyeonKO · 2024-12-30T22:52:13Z

This will also reduce the number of inference requests when multiple fields have to be embedded.

> Inference requests in semantic_text fields are also batched. If you have 10 documents in a bulk API request, and each document contains 2 semantic_text fields, then that request will perform a single inference request with 20 texts to your inference service in one go, instead of making 10 separate inference requests of 2 texts each.

(Source: https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text)
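The batching described in the quote can be sketched as follows. This is an illustration of the idea only, not how Elasticsearch or OpenSearch implements it: `embed_bulk`, the `_embedding` suffix, and the fake inference function are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Embedding = List[float]


def embed_bulk(
    docs: Sequence[Dict[str, object]],
    fields: Sequence[str],
    infer: Callable[[List[str]], List[Embedding]],
) -> List[Dict[str, object]]:
    """Embed the given text fields of all docs with a single inference call."""
    slots: List[Tuple[int, str]] = []  # (document index, field name) per text
    texts: List[str] = []
    for i, doc in enumerate(docs):
        for field in fields:
            value = doc.get(field)
            if isinstance(value, str) and value:
                slots.append((i, field))
                texts.append(value)

    out = [dict(doc) for doc in docs]  # shallow copies; inputs stay untouched
    if not texts:
        return out

    vectors = infer(texts)  # one request for the whole bulk
    if len(vectors) != len(texts):
        raise ValueError("inference returned a different number of vectors")

    # Scatter the vectors back to the document and field they came from.
    for (i, field), vector in zip(slots, vectors):
        out[i][field + "_embedding"] = vector
    return out


# Stand-in for a real inference service; records the size of each request.
call_sizes: List[int] = []


def fake_infer(batch: List[str]) -> List[Embedding]:
    call_sizes.append(len(batch))
    return [[float(len(text))] for text in batch]


bulk = [{"title": f"title {n}", "body": f"body text {n}"} for n in range(10)]
embedded = embed_bulk(bulk, ["title", "body"], fake_infer)
print(call_sizes)
```

With 10 documents and 2 fields each, `fake_infer` is called once with 20 texts, matching the example in the quote.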

bzhangam · 2024-12-31T00:29:44Z

I'll work on this item.

YeonghyeonKO · 2025-01-01T11:37:50Z

@bzhangam, is there room to consider including a minor feature? (See: opensearch-project/k-NN#2356)

Either

Give a warning message about a mismatch between the embedding model's original similarity function and the index's space_type

or

Suggest or fix the space_type when defining mappings for an index, according to the embedding model that the neural_search field type will use.
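Both options could be covered by one small check, sketched below. The k-NN space_type values (`cosinesimil`, `innerproduct`, `l2`) are real, but the model-side similarity names and the idea that a model exposes such a value are assumptions for illustration; `resolve_space_type` is a hypothetical helper.

```python
import warnings
from typing import Optional

# Assumed model-side similarity names mapped to k-NN space_type values.
SIMILARITY_TO_SPACE_TYPE = {
    "cosine": "cosinesimil",
    "dot_product": "innerproduct",
    "euclidean": "l2",
}


def resolve_space_type(model_similarity: str, requested: Optional[str] = None) -> str:
    """Pick the space_type for a field backed by the given embedding model.

    If the user did not request one, suggest the space_type that matches the
    model. If they did and it differs, warn but respect their choice.
    """
    expected = SIMILARITY_TO_SPACE_TYPE.get(model_similarity)
    if expected is None:
        raise ValueError(f"unknown model similarity: {model_similarity!r}")
    if requested is None:
        return expected
    if requested != expected:
        warnings.warn(
            f"space_type {requested!r} does not match the model's "
            f"{model_similarity!r} similarity (expected {expected!r})"
        )
    return requested


print(resolve_space_type("cosine"))
```

Whether a mismatch should only warn or should be rejected outright is a design choice left open here.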

heemin32 · 2025-01-01T19:07:56Z

@YeonghyeonKO, the space_type will be automatically retrieved from the model metadata, so users won't need to specify it explicitly.

YeonghyeonKO · 2025-01-02T04:11:52Z

@heemin32 In that case, users who aren't familiar with vector spaces can easily transform text type fields to knn_vector type. Thanks for initiating this proposal, @asfoorial.

github-actions bot added the untriaged label Jun 25, 2024

asfoorial changed the title ~~[PROPOSAL] Neural Search built-in type~~ [PROPOSAL] Neural Search field type Jun 25, 2024

navneet1v added the Enhancements label and removed the untriaged label Jul 4, 2024

jmazanec15 assigned minalsha Sep 4, 2024

heemin32 added this to Neural Search RoadMap Dec 26, 2024

heemin32 closed this as completed by moving to Backlog(Hot) in Neural Search RoadMap Dec 26, 2024

heemin32 moved this to Backlog(Hot) in Neural Search RoadMap Dec 26, 2024

heemin32 reopened this Dec 30, 2024

github-actions bot added the untriaged label Dec 30, 2024

heemin32 unassigned minalsha Dec 30, 2024

heemin32 mentioned this issue Dec 30, 2024

[FEATURE] Warning about Mismatch Between similarity function of Embedding Model and Index space_type opensearch-project/k-NN#2356

Open

heemin32 assigned bzhangam Dec 31, 2024
