Update dockerfile.gpu #6452
Create blank file
It seems like I've found the solution. So, here are my questions:
What do you think about these optimizations?
Please use an official image from NVIDIA (or some other official base image like Please keep compilation of LightGBM from source, not pulling from pre-compiled sources. I strongly recommend that you try changing the base image (first
Understood. I will only change the base image, add the missing driver, and change the
@microsoft-github-policy-service agree
Change the base image, add the missing driver, and update the `libnvidia-opencl.so.1` location (to match the new driver and new image) inside the required file.
I've made all the changes, so what's next? :)
Did you test this?
@@ -88,7 +89,7 @@ RUN cd /usr/local/src && mkdir lightgbm && cd lightgbm && \

 ENV PATH /usr/local/src/lightgbm/LightGBM:${PATH}

-RUN /bin/bash -c "source activate py3 && cd /usr/local/src/lightgbm/LightGBM && sh ./build-python.sh install --precompile && source deactivate"
+RUN /bin/bash -c "source activate py3 && cd /usr/local/src/lightgbm/LightGBM && sh ./build-python.sh install --gpu && source deactivate"
This is not correct. `lib_lightgbm.so` has already been compiled a few lines up (the line running `cmake --build build`), so `--precompile` is necessary to build a Python package bundling it in.
Using `--gpu` makes that previous compilation unnecessary... and will not use the same OpenCL library and headers that were passed there.
This should be reverted.
Got it. I'll try to work on it today if I have spare time and check everything one more time.
If I switch back to `--precompile`, I get these errors:
[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Fatal] CUDA Tree Learner was not enabled in this build.
Please recompile with CMake option -DUSE_CUDA=1
That error suggests to me that you're passing `{"device": "cuda"}` through parameters. That isn't appropriate for this image, where the library hasn't been built with `-DUSE_CUDA=1`.
In this Dockerfile, `lib_lightgbm` is being built only with `-DUSE_GPU=1`, which means you'd need to pass `{"device": "gpu"}` through params.
I've tried different build variants, such as `cmake -DUSE_GPU=1` or `cmake -DUSE_CUDA=1`, and then in the installation command I also tried all possible variants: `sh ./build-python.sh install --gpu`, `sh ./build-python.sh install --cuda`, and `sh ./build-python.sh install --precompile`. I even found your reply on StackOverflow and tried changing some installation steps, but it still didn't work.
The good news is that I fixed the missing files and driver in the Docker image, so now we just need to figure out how to install it properly :)
@jameslamb, @shiyu1994, today I decided to install it with a simple pip command, `pip install --no-binary lightgbm --config-settings=cmake.define.USE_CUDA=ON 'lightgbm>=4.0.0'`, and after running code with `device: cuda` I got the already-known error from this issue. This gave me the idea that the problem with installing from source may be inside the `build-python.sh` or `CMakeLists.txt` files. Could you take a look at this if you can?
It's difficult for me to help you because you're reporting error messages but not showing the code you ran that led to them.
This Dockerfile is about the `-DUSE_GPU` version of LightGBM (OpenCL-based), not the `-DUSE_CUDA` version (CUDA kernels). Please keep it that way.
Stop passing `-DUSE_CUDA` or using `{"device": "cuda"}` with images built from this Dockerfile.
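As a rough sketch of how a test script could catch this mismatch before training starts, the check below rejects params that request the CUDA learner. The function name `check_params_for_opencl_image` and the error text are hypothetical and not part of LightGBM. The only LightGBM behavior it relies on is that `device` is an alias of the `device_type` parameter.

```python
# Both spellings are accepted by LightGBM; "device" is an alias of "device_type".
DEVICE_KEYS = ("device_type", "device")


def check_params_for_opencl_image(params):
    """Raise ValueError if params ask for the CUDA learner.

    Hypothetical guard for images built from a -DUSE_GPU=1 (OpenCL) Dockerfile,
    where the CUDA tree learner is not compiled in.
    """
    for key in DEVICE_KEYS:
        value = params.get(key)
        if value is not None and str(value).lower() == "cuda":
            raise ValueError(
                f"{key}={value!r} needs a -DUSE_CUDA=1 build; "
                "use 'gpu' with an OpenCL (-DUSE_GPU=1) build instead"
            )
    return params
```

Calling `check_params_for_opencl_image({"device": "gpu"})` returns the params unchanged, while `{"device": "cuda"}` raises before any training code runs. Including a script like this in a bug report also shows reviewers exactly which params were used.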
@NisuSan are you still interested in working on this?
If you don't have the time / interest right now please tell us, so we can close this and someone else can work on fixing this Dockerfile.
I'm going to close this due to lack of response (#6452 (comment) was posted 5 weeks ago), so that others know they can contribute to #6450. We'd love to have you come back in the future and contribute when you have time to work with us.
For those finding this from GitHub search... I'm continuing the work in #6638