[CUDA] Multi-GPU for CUDA Version #6138
Open
shiyu1994 wants to merge 70 commits into master from nccl-dev
Changes from 41 commits
Commits
70 commits
ee3923b initialize nccl (shiyu1994)
82668d0 Merge branch 'master' into nccl-dev (shiyu1994)
6189cbb Merge branch 'master' into nccl-dev (shiyu1994)
f39f877 change year in header (shiyu1994)
e513662 Merge branch 'master' into nccl-dev (shiyu1994)
47f3e50 Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into… (shiyu1994)
985780f add implementation of nccl gbdt (shiyu1994)
35b0ca1 add nccl topology (shiyu1994)
7d36a14 clean up (shiyu1994)
5470d99 Merge branch 'master' into nccl-dev (shiyu1994)
7b47a1e clean up (shiyu1994)
839c375 Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into… (shiyu1994)
8eaf3ad Merge branch 'master' into nccl-dev (shiyu1994)
cc72fc8 Merge branch 'master' into nccl-dev (shiyu1994)
209e25d set nccl info (shiyu1994)
431f967 support quantized training with categorical features on cpu (shiyu1994)
b07caf2 remove white spaces (shiyu1994)
cf60467 add tests for quantized training with categorical features (shiyu1994)
bf2f649 skip tests for cuda version (shiyu1994)
2fc9525 fix cases when only 1 data block in row-wise quantized histogram cons… (shiyu1994)
dce770c remove useless capture (shiyu1994)
f0c44fc Merge branch 'master' into nccl-dev (shiyu1994)
e2cb41f Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into… (shiyu1994)
f3985ef fix inconsistency of gpu devices (shiyu1994)
d000a41 fix creating boosting object from file (shiyu1994)
ecdccd5 change num_gpu to num_gpus in test case (shiyu1994)
dfa4419 fix objective initialization (shiyu1994)
f4b8906 Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into… (shiyu1994)
f0b22d1 fix c++ compilation warning (shiyu1994)
617b3b2 fix lint errors (shiyu1994)
6d090b2 Merge branch 'master' into fix-6257 (shiyu1994)
736ab8a Merge branch 'master' into nccl-dev (shiyu1994)
ad72d9f Merge branch 'fix-6257' into nccl-dev (shiyu1994)
2670f48 fix compilation warnings (shiyu1994)
02b725b change num_gpu to num_gpus in R test case (shiyu1994)
3bfb784 add nccl synchronization in tree training (shiyu1994)
fe1f592 fix global num data update (shiyu1994)
a528bd6 merge master (shiyu1994)
996d70b fix ruff-format issues (shiyu1994)
671bed3 merge master (shiyu1994)
34610fb use global num data in split finder (shiyu1994)
041018b Merge branch 'master' into nccl-dev (shiyu1994)
e1b4512 explicit initialization of NCCLInfo members (shiyu1994)
0a21b5f Merge branch 'master' into nccl-dev (shiyu1994)
be29624 Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into… (shiyu1994)
06cfde4 Merge branch 'master' into nccl-dev (shiyu1994)
75afe5e Merge branch 'master' into nccl-dev (shiyu1994)
1e6e4a1 Merge branch 'master' into nccl-dev (shiyu1994)
614605c merge master (shiyu1994)
18babb0 Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into… (shiyu1994)
11f4062 fix compilation (shiyu1994)
b4c21c2 use CUDAVector (shiyu1994)
70fe10f use CUDAVector (shiyu1994)
849a554 merge master (shiyu1994)
19a2662 merge master (shiyu1994)
6db879a use CUDAVector (shiyu1994)
b43f88b use CUDAVector for cuda tree and column data (shiyu1994)
582c760 update gbdt (shiyu1994)
b9e143b changes for cuda tree (shiyu1994)
483e521 use CUDAVector for cuda column data (shiyu1994)
950199d fix bug in GetDataByColumnPointers (shiyu1994)
f30ee85 Merge branch 'master' into nccl-dev (shiyu1994)
d11991a disable cuda by default (shiyu1994)
4bb4411 Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into… (shiyu1994)
b56b39e fix single machine gbdt (shiyu1994)
3bebc19 merge main (shiyu1994)
47b4364 clean up (shiyu1994)
a326c87 fix typo (shiyu1994)
d8ea043 Merge branch 'master' into nccl-dev (shiyu1994)
2f040b7 Merge branch 'master' into nccl-dev (shiyu1994)
@@ -0,0 +1,70 @@
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Tries to find NCCL headers and libraries.
#
# Usage of this module as follows:
#
#  find_package(NCCL)
#
# Variables used by this module, they can change the default behaviour and need
# to be set before calling find_package:
#
#  NCCL_ROOT - When set, this path is inspected instead of standard library
#              locations as the root of the NCCL installation.
#              The environment variable NCCL_ROOT overrides this variable.
#
# This module defines
#  Nccl_FOUND, whether nccl has been found
#  NCCL_INCLUDE_DIR, directory containing header
#  NCCL_LIBRARY, directory containing nccl library
#  NCCL_LIB_NAME, nccl library name
#  USE_NCCL_LIB_PATH, when set, NCCL_LIBRARY path is also inspected for the
#                     location of the nccl library. This would disable
#                     switching between static and shared.
#
# This module assumes that the user has already called find_package(CUDA)

if (NCCL_LIBRARY)
  if(NOT USE_NCCL_LIB_PATH)
    # Don't cache NCCL_LIBRARY to enable switching between static and shared.
    unset(NCCL_LIBRARY CACHE)
  endif(NOT USE_NCCL_LIB_PATH)
endif()

if (BUILD_WITH_SHARED_NCCL)
  # libnccl.so
  set(NCCL_LIB_NAME nccl)
else ()
  # libnccl_static.a
  set(NCCL_LIB_NAME nccl_static)
endif (BUILD_WITH_SHARED_NCCL)

find_path(NCCL_INCLUDE_DIR
  NAMES nccl.h
  PATHS $ENV{NCCL_ROOT}/include ${NCCL_ROOT}/include)

find_library(NCCL_LIBRARY
  NAMES ${NCCL_LIB_NAME}
  PATHS $ENV{NCCL_ROOT}/lib/ ${NCCL_ROOT}/lib)

message(STATUS "Using nccl library: ${NCCL_LIBRARY}")

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(Nccl DEFAULT_MSG
  NCCL_INCLUDE_DIR NCCL_LIBRARY)

mark_as_advanced(
  NCCL_INCLUDE_DIR
  NCCL_LIBRARY
)
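For context, a consumer of this find-module might look roughly like the sketch below. It is only an illustration, not code from this PR: the project name, the `nccl_deps` target, the `/opt/nccl` prefix, and the assumption that the module is saved as `cmake/modules/FindNccl.cmake` are all hypothetical. Only `BUILD_WITH_SHARED_NCCL`, `NCCL_ROOT`, `NCCL_INCLUDE_DIR`, and `NCCL_LIBRARY` come from the module itself.

```cmake
cmake_minimum_required(VERSION 3.18)
project(nccl_consumer_sketch LANGUAGES CXX)

# Assumption: the module above is saved as cmake/modules/FindNccl.cmake,
# so that find_package(Nccl) can locate it through CMAKE_MODULE_PATH.
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules")

# Ask the module to look for libnccl.so; leave this unset to look for
# libnccl_static.a instead.
set(BUILD_WITH_SHARED_NCCL ON)

# Hypothetical install prefix. The module lists $ENV{NCCL_ROOT} ahead of
# this variable in its PATHS arguments.
set(NCCL_ROOT "/opt/nccl" CACHE PATH "Root of the NCCL installation")

# The module header says CUDA should already have been found at this point;
# that step is omitted here because the module reads no CUDA variables.
find_package(Nccl REQUIRED)

# Wrap the results in an INTERFACE target so other targets can link to it.
add_library(nccl_deps INTERFACE)
target_include_directories(nccl_deps INTERFACE "${NCCL_INCLUDE_DIR}")
target_link_libraries(nccl_deps INTERFACE "${NCCL_LIBRARY}")
```

Because the module unsets the cached `NCCL_LIBRARY` unless `USE_NCCL_LIB_PATH` is set, flipping `BUILD_WITH_SHARED_NCCL` and re-running the configure step should re-run the library search rather than reuse the previous result.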
I'm ok with changing the main parameter name to `num_gpus`, but can we please keep `num_gpu` as a parameter alias, so that existing code using that parameter isn't broken?