-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pinecone vector store #1135
Pinecone vector store #1135
Conversation
Good work Chitti-Ankith! Thanks for the contribution. Is it possible to add some test cases? |
Any suggestions on how to add them since we need an API Key to init |
Get it. We can mock the pinecone client, and have a unit test case in this case. |
@xzdandy Is there an example of such mocking that @Chitti-Ankith can use? Would this approach also work for testing third-party structured data sources like Clickhouse added by @Preethi1609? |
We can refer the unit testcases under https://github.com/georgia-tech-db/evadb/blob/staging/test/unit_tests/storage. Yes, it also works for Clickhouse. Meanwhile, if it is possible, we shall have some kind of integration test to verify it works end to end at staging branch. Does pinecone have any free-tier API keys? |
Nice job, @Chitti-Ankith. This wraps up We could consider adding a subset of these systems if they are easy to add with this framework: https://thedataquarry.com/posts/vector-db-4/ |
USING PINECONE;""" | ||
execute_query_fetch_all(self.evadb, create_index_query) | ||
# Sleep to ensure the pinecone records get updated as Pinecone is eventually consistent. | ||
time.sleep(20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xzdandy @jiashenC @gaurav274 @jarulraj since Pinecone is eventually consistent, we can't immediately query results after creating indexes. Hence i had to put a sleep in between here. Going through the documentation, I couldnt find a proper solution for this.
One suggestion was to look at the total_vector_count in describe_index_stats() fn https://docs.pinecone.io/docs/insert-data and wait till it gets updated before returning from the add function but this won't work for updates. Any suggestions on this? Should we just assume that the clients will wait for the results to persist before querying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not able to find a flush function either. For testcases, I think sleep is fine. One option is to print out a warning message when user is creating index using pinecone to indicate that the operation is eventual consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
os.environ["PINECONE_API_KEY"] = "657e4fae-7208-4555-b0f2-9847dfa5b818" | ||
os.environ["PINECONE_ENV"] = "gcp-starter" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crated this free tier API key just for EvaDB testing sake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use self.evadb.config.update_value("third_party", "PINECONE_API_KEY", "657e4fae-7208-4555-b0f2-9847dfa5b818")
to update the value. You may also want to move this logic into cls setup and teardown. Otherwise, if the test failed, the environment variable will not be recovered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used environment vars itself in setup and teardown
Also the tests are failing with the issue Andy mentioned here https://evadb.slack.com/archives/C047S6Z0V40/p1695092906362269?thread_ts=1695071176.493499&cid=C047S6Z0V40. We should probably get it fixed instead of having to revert the change and test everytimg |
You can pull the latest staging. This has been already fixed by @gaurav274 |
Added support for Pinecone in EvaDB. Since pinecone requires an API key, created an evironment variable in evadb yml file. If not set, we check for os environment.