After deploying my services as containers in Kubernetes, I ran some data ingestion tests against PostgreSQL with the LangChain framework.
When testing retriever-pgvector, I get the error: sqlalchemy.exc.DataError: (psycopg2.errors.DataException) different vector dimensions 1024 and 768
Although I use the default model "BAAI/bge-base-en-v1.5", the vectors in the DB have a different size.
Reproduce steps
I was able to reproduce the error with this code:
from langchain_huggingface import HuggingFaceEmbeddings
import random
import requests
import json
import asyncio

# Generate the embedding
async def get_embedding(text):
    embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
    embedding = await embeddings.aembed_query(text)
    return embedding

text = "What is the revenue of Nike in 2023?"
embedding_pre = asyncio.run(get_embedding(text))
embedding = embedding_pre + [0.0] * (1024 - len(embedding_pre))
print(len(embedding))
print(type(embedding))
print(embedding[0:10])

# Define the URL and the payload
url = "http://localhost:7000/v1/retrieval"
payload = {
    "text": "What is the revenue of Nike in 2023?",
    "embedding": embedding
}

# Send the POST request
headers = {'Content-Type': 'application/json'}
response = requests.post(url, data=json.dumps(payload), headers=headers)

# Print the response
print(response.text)
Raw log
PS C:\> kubectl logs -l app=retriever-pgvector-server -n gaas -f
cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.DataException) different vector dimensions 1024 and 768
[SQL: SELECT langchain_pg_embedding.collection_id AS langchain_pg_embedding_collection_id, langchain_pg_embedding.embedding AS langchain_pg_embedding_embedding, langchain_pg_embedding.document AS langchain_pg_embedding_document, langchain_pg_embedding.cmetadata AS langchain_pg_embedding_cmetadata, langchain_pg_embedding.custom_id AS langchain_pg_embedding_custom_id, langchain_pg_embedding.uuid AS langchain_pg_embedding_uuid, langchain_pg_embedding.embedding <=> %(embedding_1)s AS distance
FROM langchain_pg_embedding JOIN langchain_pg_collection ON langchain_pg_embedding.collection_id = langchain_pg_collection.uuid
WHERE langchain_pg_embedding.collection_id = %(collection_id_1)s::UUID ORDER BY distance ASC
LIMIT %(param_1)s]
[parameters: {'embedding_1': '[0.011756504885852337,-0.0120476633310318,-0.00515980226919055,-0.00771339749917388,0.04292724281549454,0.011159130372107029,0.06352052837610245,0.01 ... (15909 characters truncated) ... 7,-0.025849001482129097,0.05056244507431984,-0.00946701318025589,-0.02234337292611599,0.013910794630646706,0.023309819400310516,-0.05879069119691849]', 'collection_id_1': UUID('e5f56093-bf77-4f95-8c2d-81ff968cca56'), 'param_1': 4}]
(Background on this error at: https://sqlalche.me/e/20/9h9h)
INFO: 127.0.0.1:49758 - "POST /v1/retrieval HTTP/1.1" 200 OK
INFO: 127.0.0.1:38832 - "POST /v1/retrieval HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
self.dialect.do_execute(
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DataException: different vector dimensions 1024 and 768
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
await self.app(scope, receive, send_wrapper)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/retrievers/pgvector/langchain/retriever_pgvector.py", line 41, in retrieve
search_res = await vector_db.asimilarity_search_by_vector(embedding=input.embedding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_core/vectorstores/base.py", line 685, in asimilarity_search_by_vector
return await run_in_executor(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 588, in run_in_executor
return await asyncio.get_running_loop().run_in_executor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 579, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/vectorstores/pgvector.py", line 990, in similarity_search_by_vector
docs_and_scores = self.similarity_search_with_score_by_vector(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/vectorstores/pgvector.py", line 632, in similarity_search_with_score_by_vector
results = self._query_collection(embedding=embedding, k=k, filter=filter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/vectorstores/pgvector.py", line 968, in _query_collection
.all()
^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2673, in all
return self._iter().all() # type: ignore
^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2827, in _iter
result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2362, in execute
return self._execute_internal(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2247, in _execute_internal
result: Result[Any] = compile_state_cls.orm_execute_statement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/orm/context.py", line 305, in orm_execute_statement
result = conn.execute(
^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
return meth(
^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
return self._exec_single_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
self._handle_dbapi_exception(
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
self.dialect.do_execute(
File "/home/user/.local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.DataException) different vector dimensions 1024 and 768
[SQL: SELECT langchain_pg_embedding.collection_id AS langchain_pg_embedding_collection_id, langchain_pg_embedding.embedding AS langchain_pg_embedding_embedding, langchain_pg_embedding.document AS langchain_pg_embedding_document, langchain_pg_embedding.cmetadata AS langchain_pg_embedding_cmetadata, langchain_pg_embedding.custom_id AS langchain_pg_embedding_custom_id, langchain_pg_embedding.uuid AS langchain_pg_embedding_uuid, langchain_pg_embedding.embedding <=> %(embedding_1)s AS distance
FROM langchain_pg_embedding JOIN langchain_pg_collection ON langchain_pg_embedding.collection_id = langchain_pg_collection.uuid
WHERE langchain_pg_embedding.collection_id = %(collection_id_1)s::UUID ORDER BY distance ASC
LIMIT %(param_1)s]
[parameters: {'embedding_1': '[0.011756504885852337,-0.0120476633310318,-0.00515980226919055,-0.00771339749917388,0.04292724281549454,0.011159130372107029,0.06352052837610245,0.01 ... (15909 characters truncated) ... 7,-0.025849001482129097,0.05056244507431984,-0.00946701318025589,-0.02234337292611599,0.013910794630646706,0.023309819400310516,-0.05879069119691849]', 'collection_id_1': UUID('e5f56093-bf77-4f95-8c2d-81ff968cca56'), 'param_1': 4}]
(Background on this error at: https://sqlalche.me/e/20/9h9h)
Hi @erojaso, why do you explicitly pad the 768-dimension vector to 1024 with embedding = embedding_pre + [0.0] * (1024 - len(embedding_pre))? Please remove this conversion and the code should work.
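A minimal sketch of the request body without the padding step, assuming the collection was ingested with the same 768-dimension model. `build_payload` is a hypothetical helper, and the placeholder vector only stands in for the output of `embeddings.aembed_query(text)`; the `text` and `embedding` fields follow the reproduction script.

```python
import json


def build_payload(text, embedding):
    """Build the /v1/retrieval request body, passing the embedding through unchanged."""
    # No padding or truncation: the vector keeps the length the model produced.
    return {"text": text, "embedding": list(embedding)}


# Placeholder standing in for the real model output (an assumption for illustration).
fake_embedding = [0.0] * 768
payload = build_payload("What is the revenue of Nike in 2023?", fake_embedding)
body = json.dumps(payload)
print(len(payload["embedding"]))
```

The serialized `body` can then be posted to the retriever exactly as in the original script.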
Please make sure your dataprep and retriever both use the same TEI endpoint or local model. If one uses bge-base (dim: 768) and the other uses bge-large (dim: 1024), you are likely to hit this kind of dimension mismatch.
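One way to catch this before the query reaches PostgreSQL is to compare the query vector's length with the dimension the collection is expected to hold. The sketch below is illustrative only: `KNOWN_DIMS` and `check_dimension` are hypothetical names, "BAAI/bge-large-en-v1.5" is assumed to be the bge-large variant in use, and the model passed in must be whichever one dataprep actually used at ingestion time.

```python
# Output sizes of the two model families mentioned above.
# The exact bge-large model name is an assumption for illustration.
KNOWN_DIMS = {
    "BAAI/bge-base-en-v1.5": 768,
    "BAAI/bge-large-en-v1.5": 1024,
}


def check_dimension(embedding, ingest_model):
    """Return the embedding unchanged if its length matches the ingestion model."""
    expected = KNOWN_DIMS[ingest_model]
    actual = len(embedding)
    if actual != expected:
        raise ValueError(
            f"query vector has {actual} dimensions, but {ingest_model} "
            f"produces {expected}; check that dataprep and the retriever "
            f"use the same embedding model"
        )
    return embedding


# A 768-dimension query passes when the collection was built with bge-base.
query = check_dimension([0.0] * 768, "BAAI/bge-base-en-v1.5")
```

Calling `check_dimension` with the same vector and "BAAI/bge-large-en-v1.5" raises a `ValueError`, which surfaces the mismatch on the client side instead of as a 500 from the retriever.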
Priority
Undecided
OS type
Ubuntu
Hardware type
CPU-other (Please let us know in description)
Installation method
Deploy method
Running nodes
Single Node
What's the version?
docker pull opea/dataprep-pgvector:latest
docker pull opea/retriever-pgvector:latest