-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PGVector integration #222
base: develop
Are you sure you want to change the base?
PGVector integration #222
Conversation
Co-authored by: Doug Ortiz <[email protected]> Sarah Conway <[email protected]>
'INSERT INTO embeddings (id, embedding) VALUES (%s, %s::vector) ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding', | ||
(id, embedding.tolist()) | ||
) | ||
self._conn.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is the right place but somewhere you need to create an HNSW index. I think the default values for index density are fine. You should probably ensure that working memory is temporarily set larger than the default. See here
https://github.com/thesteve0/arvix-query/blob/main/sql_commands.sql#L23
Cosine is also a fine distance metric to have as the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the elasticsearch has an explicit method for index creation
def _create_index(self, dimension): |
|
||
_SUPPORTED_METRICS = ("cosine", "dotproduct", "euclidean") | ||
|
||
class PgVectorSimilarityConfig(SimilarityConfig): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the elasticsearch integration they allow for defining the distance metric. We should do the same:
metric="cosine", |
We will also need the SSL certs and other connection types that people use with PostgreSQL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need an implementation for
get_embeddings -
def get_embeddings( |
remove_from_index -
def remove_from_index( |
Hi!
These files still need to be polished up and the documentation + similarity test added. I'm opening this draft PR so existing work can be reviewed if desired and so the current progress is known.
Additional changes will be pushed up to this PR ASAP.
The existing code will have a pre-commit hook performed on it before this PR is marked as ready to merge.