Hey @benwtrent - I played around with implementing this.
I tried a bunch of things, benchmarking as I went. Here's what I found:
The performance improvements seem to come from the "predicate subgraph traversal" (i.e., only scoring and collecting candidates that pass the filter). This saves time because it never scores nodes that fail the filter and thus would never be collected as results. However, this approach risks missing filter-passing results that are 2+ hops away.
Two-hop neighbor expansion mitigates the above risk. I was getting decent results doing predicate subgraph traversal + two-hop expansion.
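In case it helps, here's roughly what I mean by predicate subgraph traversal plus two-hop expansion, as a self-contained sketch over a flat adjacency-list graph. The names and the `int[][]` graph are illustrative, not Lucene's actual `HnswGraph` API:

```java
import java.util.*;

// Sketch only: predicate-subgraph traversal with two-hop expansion.
// Filter-failing nodes are never scored or collected, but they still act as
// "bridges" to filter-passing nodes two hops away.
public class FilteredSearchSketch {

  // Toy similarity: negative squared Euclidean distance (higher = closer).
  static float score(float[] a, float[] b) {
    float d = 0f;
    for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; d += t * t; }
    return -d;
  }

  static int[] search(int[][] graph, float[][] vectors, BitSet filter,
                      float[] query, int entry, int topK) {
    BitSet visited = new BitSet(graph.length);
    // Candidates to expand, best score first.
    PriorityQueue<float[]> candidates =
        new PriorityQueue<>((x, y) -> Float.compare(y[0], x[0]));
    // Collected results, worst score first so the worst is evictable.
    PriorityQueue<float[]> results =
        new PriorityQueue<>((x, y) -> Float.compare(x[0], y[0]));
    visited.set(entry);
    float entryScore = score(query, vectors[entry]);
    candidates.add(new float[] {entryScore, entry});
    if (filter.get(entry)) results.add(new float[] {entryScore, entry});
    while (!candidates.isEmpty()) {
      float[] best = candidates.poll();
      if (results.size() >= topK && best[0] < results.peek()[0]) break;
      for (int nbr : graph[(int) best[1]]) {
        if (filter.get(nbr)) {
          consider(nbr, query, vectors, visited, candidates, results, topK);
        } else {
          // Two-hop expansion: the filtered-out neighbor is never scored,
          // but it still bridges to its own filter-passing neighbors.
          for (int hop2 : graph[nbr]) {
            if (filter.get(hop2)) {
              consider(hop2, query, vectors, visited, candidates, results, topK);
            }
          }
        }
      }
    }
    int[] out = new int[results.size()];
    for (int i = out.length - 1; i >= 0; i--) out[i] = (int) results.poll()[1];
    return out;  // best first
  }

  static void consider(int node, float[] query, float[][] vectors, BitSet visited,
                       PriorityQueue<float[]> candidates,
                       PriorityQueue<float[]> results, int topK) {
    if (visited.get(node)) return;
    visited.set(node);
    float s = score(query, vectors[node]);
    candidates.add(new float[] {s, node});
    results.add(new float[] {s, node});
    if (results.size() > topK) results.poll();
  }
}
```

Note that only filter-passing nodes ever enter the candidate queue, so the traversal stays on the predicate subgraph; the two-hop loop is what keeps that subgraph connected.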
Weaviate "conditionally evaluates whether to use the two-hop expansion". I tried this too and saw lower latencies for inclusive filters, so I added it as well.
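Concretely, the gate I used was along these lines. The threshold is a made-up knob and this is my reading of the heuristic, not Weaviate's actual code: only pay for the second hop when most of a node's direct neighbors are filtered out, since an inclusive filter keeps the one-hop neighborhood well connected on its own.

```java
import java.util.BitSet;

// Sketch of a conditional two-hop gate (threshold is a hypothetical knob).
public class TwoHopGate {
  static final double MIN_PASSING_FRACTION = 0.4;  // assumed tuning value

  // Expand to the second hop only when the fraction of this node's direct
  // neighbors that pass the filter falls below the threshold.
  static boolean shouldExpandTwoHops(int[] neighbors, BitSet filter) {
    if (neighbors.length == 0) return false;
    int passing = 0;
    for (int nbr : neighbors) {
      if (filter.get(nbr)) passing++;
    }
    return passing / (double) neighbors.length < MIN_PASSING_FRACTION;
  }
}
```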
Here's a draft PR with the best benchmarking code I found: #14085.
I benchmarked using Cohere embeddings with params:
```
nDoc = 200000
topK = 100
fanout = 50
maxConn = 32
beamWidth = 100
filterSelectivity = [0.05, 0.25, 0.5, 0.75, 0.95]
```
Here are some results:
Baseline:
| filterSelectivity | recall | latency (ms) |
|---|---|---|
| 0.05 | 0.037 | 17.182 |
| 0.25 | 0.166 | 7.348 |
| 0.5 | 0.332 | 4.376 |
| 0.75 | 0.489 | 3.165 |
| 0.95 | 0.608 | 2.441 |
Candidate (this code):
| filterSelectivity | recall | latency (ms) |
|---|---|---|
| 0.05 | 0.028 | 2.744 |
| 0.25 | 0.157 | 4.614 |
| 0.5 | 0.308 | 4.833 |
| 0.75 | 0.449 | 4.622 |
| 0.95 | 0.563 | 3.244 |
Going forward, I think one thing that must be tested is the correlation between the filter and the query vector; the paper discusses this at length, and the risk lies in negative correlation (filter-passing docs tending to be far from the query). To your point on multiple entry points into layer 0, the Weaviate article mentions using that approach to mitigate low correlation, so it's definitely worth trying as well.
I'm going to start by adding a correlation knob to the KNN testing suite in luceneutil. I figured I'd share this in the meantime in case you had any ideas.
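For reference, the kind of correlation knob I have in mind would look something like this. It's a sketch, not luceneutil's actual API: `corr` in [-1, 1] biases the filter toward or away from docs similar to the query, at a fixed selectivity.

```java
import java.util.*;

// Sketch of a correlation knob for benchmark filters (illustrative only).
// docScores[i] is doc i's similarity to the query; corr = +1 concentrates the
// filter on the most similar docs, corr = -1 on the least similar, and
// corr = 0 yields an unbiased random filter of the requested selectivity.
public class CorrelatedFilter {
  static BitSet build(float[] docScores, double selectivity, double corr, long seed) {
    int n = docScores.length;
    int keep = (int) Math.round(selectivity * n);
    Integer[] ranked = new Integer[n];
    for (int i = 0; i < n; i++) ranked[i] = i;
    // Most similar docs first; flip the ranking for negative correlation.
    Arrays.sort(ranked, (a, b) -> Float.compare(docScores[b], docScores[a]));
    if (corr < 0) Collections.reverse(Arrays.asList(ranked));
    // |corr| controls how many of the kept docs come from the biased end.
    int biased = (int) Math.round(Math.abs(corr) * keep);
    BitSet filter = new BitSet(n);
    for (int i = 0; i < biased; i++) filter.set(ranked[i]);
    // The remaining kept docs are drawn uniformly from everything else.
    List<Integer> rest = new ArrayList<>(Arrays.asList(ranked).subList(biased, n));
    Collections.shuffle(rest, new Random(seed));
    for (int i = 0; i < keep - biased; i++) filter.set(rest.get(i));
    return filter;
  }
}
```

Sweeping `corr` alongside `filterSelectivity` should reproduce the regimes the paper analyzes.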
Description
Lucene already does OK in filtered kNN search, but it can be better.
An interesting paper in this area: https://arxiv.org/abs/2403.04871
Weaviate has implemented the approach from that paper: weaviate/weaviate#5369
The key idea is a multi-hop expansion search of the graph. Instead of a pure fan-out that only looks at the current neighborhood, the neighborhoods of neighbors are also explored, regardless of whether you have collected all the candidates yet.
This allows some nice properties and, honestly, doesn't seem that difficult to implement.
I would ignore the ACORN-γ part of the paper and focus on ACORN-1; I am not sure we need to adjust graph construction.
For graph construction and storage, I think building from quantized estimations, and potentially a bipartite organization of the nodes, would be better overall.
But, this "explore the next hop" thing does seem nice.
Also, our recent connectivity improvements will only make filtered search better.
One additional thought: I wonder if we should also allow more than one entry point into the bottom layer for filtered search?
Honestly, all this optimization tuning can get tricky as you consider the filtering percentages (did the user filter out only 1% of the docs or 80% of them).