Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pinecone vector store #1135

Merged
merged 10 commits into from
Sep 21, 2023

Conversation

Chitti-Ankith
Copy link
Contributor

Added support for Pinecone in EvaDB. Since pinecone requires an API key, created an evironment variable in evadb yml file. If not set, we check for os environment.

@Chitti-Ankith Chitti-Ankith changed the title Vector pinecone Pinecone vector store Sep 16, 2023
@xzdandy
Copy link
Collaborator

xzdandy commented Sep 16, 2023

Good work Chitti-Ankith! Thanks for the contribution. Is it possible to add some test cases?

@Chitti-Ankith
Copy link
Contributor Author

Any suggestions on how to add them since we need an API Key to init

@xzdandy
Copy link
Collaborator

xzdandy commented Sep 16, 2023

Any suggestions on how to add them since we need an API Key to init

Get it. We can mock the pinecone client, and have a unit test case in this case.

@jarulraj
Copy link
Member

@xzdandy Is there an example of such mocking that @Chitti-Ankith can use?

Would this approach also work for testing third-party structured data sources like Clickhouse added by @Preethi1609?

@xzdandy
Copy link
Collaborator

xzdandy commented Sep 17, 2023

We can refer the unit testcases under https://github.com/georgia-tech-db/evadb/blob/staging/test/unit_tests/storage. Yes, it also works for Clickhouse.

Meanwhile, if it is possible, we shall have some kind of integration test to verify it works end to end at staging branch. Does pinecone have any free-tier API keys?

@xzdandy xzdandy added the Integrations 🧩 Pull requests that update an integration label Sep 18, 2023
@xzdandy xzdandy added this to the v0.3.5 milestone Sep 18, 2023
@jarulraj
Copy link
Member

jarulraj commented Sep 19, 2023

Nice job, @Chitti-Ankith. This wraps up Qdrant, Pinecone, FAISS and pgvector. That should be enough for our vector benchmark.

We could consider adding a subset of these systems if they are easy to add with this framework:
Weaviate, Milvus, Vespa, and Chroma.

https://thedataquarry.com/posts/vector-db-4/
https://thedataquarry.com/posts/

USING PINECONE;"""
execute_query_fetch_all(self.evadb, create_index_query)
# Sleep to ensure the pinecone records get updated as Pinecone is eventually consistent.
time.sleep(20)
Copy link
Contributor Author

@Chitti-Ankith Chitti-Ankith Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xzdandy @jiashenC @gaurav274 @jarulraj since Pinecone is eventually consistent, we can't immediately query results after creating indexes. Hence i had to put a sleep in between here. Going through the documentation, I couldnt find a proper solution for this.

One suggestion was to look at the total_vector_count in describe_index_stats() fn https://docs.pinecone.io/docs/insert-data and wait till it gets updated before returning from the add function but this won't work for updates. Any suggestions on this? Should we just assume that the clients will wait for the results to persist before querying.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not able to find a flush function either. For testcases, I think sleep is fine. One option is to print out a warning message when user is creating index using pinecone to indicate that the operation is eventual consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 406 to 407
os.environ["PINECONE_API_KEY"] = "657e4fae-7208-4555-b0f2-9847dfa5b818"
os.environ["PINECONE_ENV"] = "gcp-starter"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crated this free tier API key just for EvaDB testing sake.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use self.evadb.config.update_value("third_party", "PINECONE_API_KEY", "657e4fae-7208-4555-b0f2-9847dfa5b818") to update the value. You may also want to move this logic into cls setup and teardown. Otherwise, if the test failed, the environment variable will not be recovered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used environment vars itself in setup and teardown

@Chitti-Ankith
Copy link
Contributor Author

Also the tests are failing with the issue Andy mentioned here https://evadb.slack.com/archives/C047S6Z0V40/p1695092906362269?thread_ts=1695071176.493499&cid=C047S6Z0V40. We should probably get it fixed instead of having to revert the change and test everytimg

@xzdandy
Copy link
Collaborator

xzdandy commented Sep 20, 2023

Also the tests are failing with the issue Andy mentioned here https://evadb.slack.com/archives/C047S6Z0V40/p1695092906362269?thread_ts=1695071176.493499&cid=C047S6Z0V40. We should probably get it fixed instead of having to revert the change and test everytimg

You can pull the latest staging. This has been already fixed by @gaurav274

@xzdandy xzdandy modified the milestones: v0.3.5, v0.3.6 Sep 20, 2023
@Chitti-Ankith Chitti-Ankith merged commit 952d110 into georgia-tech-db:staging Sep 21, 2023
@Chitti-Ankith Chitti-Ankith deleted the vector_pinecone branch September 21, 2023 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Integrations 🧩 Pull requests that update an integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants