i still got an error with read-only file system in nvidia docker container #747

connor-tan · 2024-01-05T17:56:32Z

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

Use the Docker image pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime.
Run: docker run -itd --name test --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES="all" pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
Run docker-entrypoint.sh in the container "test" once the container is running.
Run patch-test.sh.

Expected behavior
The patch applies successfully.

Output listings
cu->cuMemAlloc(&data, size) failed -> CUDA_ERROR_OUT_OF_MEMORY: out of memory
[AVHWDeviceContext @ 0x55fc0b49f5c0] cu->cuMemAlloc(&data, size) failed -> CUDA_ERROR_OUT_OF_MEMORY: out of memory
[hwupload @ 0x55fc0bd69780] Failed to allocate frame to upload to.
[vf#0:0 @ 0x55fc0b49d980] Error while filtering: Cannot allocate memory
Failed to inject frame into filter network: Cannot allocate memory
Error while filtering: Cannot allocate memory

Environment (please complete the following information):

OS: [e.g. Ubuntu 22.04]
GPU model: [e.g. RTX 3080Ti]
Patch commit used: master branch, latest commit
Nvidia driver version: [e.g. 535.146.02]

Additional context
When I execute patch.sh, it shows: "./patch.sh: line 373: /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.535.146.02: Read-only file system"

jailuthra · 2024-01-06T08:59:28Z

Can you make the docker filesystem write-able?

connor-tan · 2024-01-06T09:01:33Z

> Can you make the docker filesystem write-able?

The directory "/usr/lib/x86_64-linux-gnu/" is writable.
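
A writable directory does not guarantee that every file in it is writable: as the README quoted further down notes, the Nvidia runtime mounts its files read-only, and a single library file can be its own read-only bind mount. The sketch below is one way to check this from inside the container. It assumes the library shows up as a separate entry in /proc/self/mounts, which may not hold on every setup; the helper names has_ro_option and mount_state are hypothetical.

```shell
#!/bin/sh
# Sketch: classify a path as a read-only mount, a writable mount, or no mount
# point at all, by scanning a mounts table in /proc/self/mounts format.

# Succeeds if the comma-separated mount option list $1 contains "ro".
has_ro_option() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# mount_state PATH [MOUNTS_FILE]
# Prints "ro", "rw", or "not-a-mount-point" for an exact mount-point match.
mount_state() {
    target=$1
    table=${2:-/proc/self/mounts}
    state="not-a-mount-point"
    while read -r _dev mnt _type opts _rest; do
        if [ "$mnt" = "$target" ]; then
            # Later entries shadow earlier ones, so keep the last match.
            if has_ro_option "$opts"; then state="ro"; else state="rw"; fi
        fi
    done < "$table"
    echo "$state"
}
```

For example, mount_state /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.535.146.02 printing "ro" would be consistent with the "Read-only file system" error from patch.sh even though the parent directory accepts writes.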

jailuthra · 2024-01-06T09:10:44Z

You might have to build a custom image based on top of the cuda one you want to use. Please go through the README:

It is possible to use this patch with nvidia-docker containers, even if host machine hasn't patched drivers. See Dockerfile for example.

Essentially all you need to do during build is:

COPY the patch.sh and docker-entrypoint.sh files into your container.

Make sure docker-entrypoint.sh is invoked on container start.

The docker-entrypoint.sh script patches on the fly by manipulating the dynamic linker to work around the read-only mount of the Nvidia runtime. It then passes the original docker command to the shell, as if the entrypoint were not restricted by the ENTRYPOINT directive, so docker run --runtime=nvidia -it mycontainer echo 123 will print 123. It can also be invoked from your own entrypoint script, if you have one.
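
The build steps above can be sketched as a small shell helper that writes a Dockerfile on top of the image from this report. This is a minimal sketch, not the repository's own Dockerfile: the install path /usr/local/bin and the function name write_dockerfile are assumptions, so compare against the Dockerfile shipped with the patch before relying on it.

```shell
#!/bin/sh
# Sketch: generate a Dockerfile that layers the patch scripts on a CUDA image.
# The install path and default base image are assumptions for illustration.

# write_dockerfile DIR [BASE_IMAGE]
# Writes DIR/Dockerfile; patch.sh and docker-entrypoint.sh must be in DIR
# when the image is built.
write_dockerfile() {
    dir=$1
    base=${2:-pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime}
    cat > "$dir/Dockerfile" <<EOF
FROM $base
# Copy both scripts from the build context into the image.
COPY patch.sh docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/patch.sh /usr/local/bin/docker-entrypoint.sh
# Exec-form ENTRYPOINT: the command given to docker run is passed as arguments.
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
EOF
}
```

After write_dockerfile . in a directory holding both scripts, docker build -t mycontainer . followed by docker run --runtime=nvidia -it mycontainer echo 123 should behave as the README describes, with patching done at container start rather than by running patch.sh by hand against the read-only library.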

connor-tan added the bug label Jan 5, 2024
