Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Docsum file upload handling fails with concurrent http request with same file name #1279

Closed
2 of 6 tasks
vrantala opened this issue Dec 20, 2024 · 8 comments
Closed
2 of 6 tasks
Assignees
Labels
aitce bug Something isn't working Dev
Milestone

Comments

@vrantala
Copy link

Priority

P2-High

OS type

Ubuntu

Hardware type

Xeon-SPR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

opea/docsum:latest sha256:42115ddbf5199060c94f66d654e9e2f128406e94b8dc0a3b3929bae93d4ba829
opea/llm-docsum-tgi:latest sha256:06c8095ccd0ca48de6588ee3bfbddb233d1e9e8cf4c4aeaa60e10eca4571fea4

Description

When sending http-request
Making request to /v1/docsum with files {'type': (None, 'text'), 'messages': (None, ''), 'files': ('pubmed_10.txt', <_io.BufferedReader name='/home/testrunner/users.vrantala/opea_benchmarking/data/pubmed_10.txt'>, 'text/plain'), 'max_tokens': (None, '1024'), 'language': (None, 'en'), 'stream': (None, 'true')}
to docsum service endpoint with Locust test-bench.
It starts giving "POST /v1/docsum HTTP/1.1" 500 Internal Server Error.

Reproduce steps

Send concurrent http-requests with same filename to the docsum service endpoint.
Http-request info:
Making request to /v1/docsum with files {'type': (None, 'text'), 'messages': (None, ''), 'files': ('pubmed_10.txt', <_io.BufferedReader name='/home/testrunner/users.vrantala/opea_benchmarking/data/pubmed_10.txt'>, 'text/plain'), 'max_tokens': (None, '1024'), 'language': (None, 'en'), 'stream': (None, 'true')}

Raw log

INFO:     10.245.0.1:57716 - "POST /v1/docsum HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/docsum.py", line 129, in handle_request
    os.remove(file_path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pubmed_10.txt'
@vrantala vrantala added the bug Something isn't working label Dec 20, 2024
@eero-t
Copy link
Contributor

eero-t commented Dec 20, 2024

With multiple parallel requests, Python HTTP module calls code here, from multiple threads, in parallel:
https://github.com/opea-project/GenAIExamples/blob/main/DocSum/docsum.py#L146

Meaning that those threads overwrite and remove each other's temporary file.

That code has also other security / reliability bugs:

  • It's writing to a location (/tmp) that is shared with rest of the system (at least when program is not containerized)
  • It's using path/filename controlled by the (potential) attacker

There are few ways to fix that:

  • Use unique temp file names (as security guidelines require), or
  • Do not use extra temporary file, but parse the file object directly
    • => This could be noticeable perf optimization for large file uploads

Also, are (HTTP server component) limits set on how large and how many files can be upload in parallel?

I.e. could the (potential) attacker just send e.g. 1000x 1GB files to evict all OPEA services from a node[1], or OOM the node[2], because service exhausted /tmp space?


[1] in case /tmp is disk backed.
[2] iin case /tmp is in RAM.

@ZailiWang ZailiWang self-assigned this Dec 23, 2024
@joshuayao joshuayao added the Dev label Dec 24, 2024
@joshuayao joshuayao added this to the v1.2 milestone Dec 24, 2024
@joshuayao joshuayao added this to OPEA Dec 24, 2024
@joshuayao joshuayao moved this to In progress in OPEA Dec 24, 2024
@Spycsh Spycsh mentioned this issue Dec 24, 2024
7 tasks
@Spycsh
Copy link
Member

Spycsh commented Dec 24, 2024

#1286 should fix this.

@eero-t
Copy link
Contributor

eero-t commented Dec 30, 2024

#1286 should fix this.

Looking at the changes, yes, but I'll let @vrantala to verify that for real.

Fix seems rather non-optimal. Easiest change should be just using file.filename directly in read_text_from_file(), instead of copying existing file to another temp file, and passing name of that too. Was that tested?

@Spycsh
Copy link
Member

Spycsh commented Dec 31, 2024

#1286 should fix this.

Looking at the changes, yes, but I'll let @vrantala to verify that for real.

Fix seems rather non-optimal. Easiest change should be just using file.filename directly in read_text_from_file(), instead of copying existing file to another temp file, and passing name of that too. Was that tested?

I don't get your point. This issue is about the concurrency issue when uploading multiple files with the same file names. If you still use the file name to differentiate, won't it just repeat the bug?

Would be great if you can open your own PR and do the test and the benchmark to prove what you say the "optimal" approach!

@eero-t
Copy link
Contributor

eero-t commented Dec 31, 2024

I don't get your point. This issue is about the concurrency issue when uploading multiple files with the same file names. If you still use the file name to differentiate, won't it just repeat the bug?

Ah, sorry that was brainfart, I thought that either read_text_from_file() could handle (file.file) file objects, or that UploadFile would be already on disk, but on closer look at implementations of FastAPI/Starlette [1], langchain PyPDFLoader/BaseLoader [2] and docx2text/ZipFile [3], I see that temporary file is unavoidable. file object is not yet on disk, and neither BaseLoader nor ZipFile APIs supports file objects, only file paths.

But I think temporary file handling should be moved to read_text_from_file(), and done after text files have been handled, because otherwise file content is redundantly read & written for text/plain type: https://github.com/opea-project/GenAIExamples/blob/main/DocSum/docsum.py#L105

And temporary file handling could be using aiofiles helper for that.


[1] UploadFile: https://github.com/encode/starlette/blob/master/starlette/datastructures.py#L409
[2] ZipFile: https://docs.python.org/3/library/zipfile.html#zipfile-objects
[3] BaseLoader: https://python.langchain.com/api_reference/core/document_loaders/langchain_core.document_loaders.base.BaseLoader.html

@eero-t
Copy link
Contributor

eero-t commented Dec 31, 2024

Would be great if you can open your own PR and do the test and the benchmark to prove what you say the "optimal" approach!

@Spycsh I did some testing with aiofiles temp file functionality to fix the 2 items I mentioned, and attached docsum.py patch seems OK according to pylint:
docsum-aiofiles-tempfile-use.patch.txt

I don't have time to test it properly / with full DocSum functionality though, so it would be great if you could check it!
(and push it forward if you think it useful)

@Spycsh
Copy link
Member

Spycsh commented Jan 2, 2025

@eero-t , thanks for suggestions and explanation, will revisit the your patch at a later point when my schedule allows!

@vrantala
Copy link
Author

vrantala commented Jan 3, 2025

Tested the fix in #1286. PR fixed this issue.

@vrantala vrantala closed this as completed Jan 3, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in OPEA Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aitce bug Something isn't working Dev
Projects
Status: Done
Development

No branches or pull requests

5 participants