
Update prepare_build_environment_windows.sh #1830

Closed
wants to merge 1 commit into from

Conversation


BBC-Esq commented Dec 9, 2024

The script contains a mistaken path pointing to CUDA 12.2, which causes compatibility issues:

1) torch only has the following prebuilt wheels for CUDA support:

| PyTorch wheel | PyTorch versions supported |
| --- | --- |
| cu124 | 2.5.1, 2.5.0, 2.4.1, 2.4.0 |
| cu121 | 2.5.1, 2.5.0, 2.4.1, 2.4.0, 2.3.1, 2.3.0, 2.2.2...2.1.0 |
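The wheel availability above can be encoded as a small lookup for sanity checking (a sketch; `CUDA_WHEELS` and `wheel_tags_for` are hypothetical names, and the elided 2.2.2...2.1.0 range is represented only by its endpoints):

```python
# Hypothetical encoding of the prebuilt-wheel table above.
# The cu121 row lists "2.2.2...2.1.0" in the original; only the
# endpoints of that elided range are included here.
CUDA_WHEELS = {
    "cu124": {"2.5.1", "2.5.0", "2.4.1", "2.4.0"},
    "cu121": {"2.5.1", "2.5.0", "2.4.1", "2.4.0", "2.3.1", "2.3.0",
              "2.2.2", "2.1.0"},
}

def wheel_tags_for(torch_version: str) -> list[str]:
    """CUDA wheel tags that ship a prebuilt wheel for this torch version."""
    return sorted(tag for tag, versions in CUDA_WHEELS.items()
                  if torch_version in versions)
```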

2) Based on the torch version, the following will be bundled for Linux:

  • triton, mkl, and sympy are dependencies on all platforms:

| Torch | cuda-nvrtc-cu12 | cuda-runtime-cu12 | cublas-cu12 | cudnn-cu12 | triton | mkl | sympy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2.5.1 | 12.4.127 | 12.4.127 | 12.4.5.8 | 9.1.0.70 | 3.1.0 | - | 1.13.1 |
| 2.5.0 | 12.4.127 | 12.4.127 | 12.4.5.8 | 9.1.0.70 | 3.1.0 | - | 1.13.1 |
| 2.4.1 | 12.1.105 | 12.1.105 | 12.1.3.1 | 9.1.0.70 | 3.0.0 | - | - |
| 2.4.0 | 12.1.105 | 12.1.105 | 12.1.3.1 | 9.1.0.70 | 3.0.0 | - | - |
| 2.3.1 | 12.1.105 | 12.1.105 | 12.1.3.1 | 8.9.2.26 | 2.3.1 | <=2021.4.0 | - |
| 2.3.0 | 12.1.105 | 12.1.105 | 12.1.3.1 | 8.9.2.26 | 2.3.0 | <=2021.4.0 | - |
| 2.2.2 | 12.1.105 | 12.1.105 | 12.1.3.1 | 8.9.2.26 | 2.2.0 | - | - |

  • 12.1.105 and 12.1.3.1 both come from CUDA release 12.1.1
  • 12.4.127 and 12.4.5.8 both come from CUDA release 12.4.1
  • In other words, torch is only fully compatible with the exact CUDA point releases it bundles; it is not fully compatible with CUDA 12.1.0 or 12.4.0, for example, or any other release.
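The point-release constraint above can be made explicit as a lookup (a sketch; `BUNDLED_CUDA_RELEASE` and `matches_bundled_release` are hypothetical names built from the table's version numbers):

```python
# Hypothetical mapping (from the bundling table above) of each torch
# version to the exact CUDA point release its bundled cu12 libraries
# come from: 12.4.127/12.4.5.8 -> 12.4.1, 12.1.105/12.1.3.1 -> 12.1.1.
BUNDLED_CUDA_RELEASE = {
    "2.5.1": "12.4.1",
    "2.5.0": "12.4.1",
    "2.4.1": "12.1.1",
    "2.4.0": "12.1.1",
    "2.3.1": "12.1.1",
    "2.3.0": "12.1.1",
    "2.2.2": "12.1.1",
}

def matches_bundled_release(torch_version: str, cuda_release: str) -> bool:
    """True only for the exact CUDA point release this torch version bundles."""
    return BUNDLED_CUDA_RELEASE.get(torch_version) == cuda_release
```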

CTranslate2 currently uses CUDA 12.2, for which torch doesn't build wheels at all!

3) CUDA and cuDNN compatibility is flexible and primarily affects whether static linking is available:

| cuDNN version | Static linking | No static linking |
| --- | --- | --- |
| 8.9.2 | 12.1, 11.8 | 12.0, ≤11.7 |
| 8.9.3 | 12.1, 11.8 | 12.0, ≤11.7 |
| 8.9.4 | 12.2, 11.8 | 12.1, 12.0, ≤11.7 |
| 8.9.5 | 12.2, 11.8 | 12.1, 12.0, ≤11.7 |
| 8.9.6 | 12.2, 11.8 | 12.1, 12.0, ≤11.7 |
| 8.9.7 | 12.2, 11.8 | 12.1, 12.0, ≤11.7 |
| 9.0.0 | 12.3, 11.8 | 12.2, 12.1, 12.0, ≤11.7 |
| 9.1.0 | 12.4-12.0, 11.8 | ≤11.7 |
| 9.1.1 | 12.5-12.0, 11.8 | ≤11.7 |
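The static-linking column above can be expressed as a simple membership test (a sketch; the `STATIC_LINKING` table and helper are illustrative names, with CUDA versions as `(major, minor)` tuples):

```python
# Sketch of the cuDNN static-linking table above. CUDA versions are
# (major, minor) tuples; the ranges 12.4-12.0 and 12.5-12.0 are expanded.
STATIC_LINKING = {
    "8.9.2": {(12, 1), (11, 8)},
    "8.9.3": {(12, 1), (11, 8)},
    "8.9.4": {(12, 2), (11, 8)},
    "8.9.5": {(12, 2), (11, 8)},
    "8.9.6": {(12, 2), (11, 8)},
    "8.9.7": {(12, 2), (11, 8)},
    "9.0.0": {(12, 3), (11, 8)},
    "9.1.0": {(12, 0), (12, 1), (12, 2), (12, 3), (12, 4), (11, 8)},
    "9.1.1": {(12, 0), (12, 1), (12, 2), (12, 3), (12, 4), (12, 5), (11, 8)},
}

def static_linking_available(cudnn: str, cuda: tuple[int, int]) -> bool:
    """Whether this cuDNN release supports static linking against this CUDA."""
    return cuda in STATIC_LINKING.get(cudnn, set())
```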

4) HOWEVER, despite this compatibility, TORCH presents its own complications:

| PyTorch version | Python | Stable | Experimental |
| --- | --- | --- | --- |
| 2.5 | >=3.9, <=3.12 (3.13 exp.) | CUDA 11.8, CUDA 12.1, CUDA 12.4, CUDNN 9.1.0.70 | None |
| 2.4 | >=3.8, <=3.12 | CUDA 11.8, CUDA 12.1, CUDNN 9.1.0.70 | CUDA 12.4, CUDNN 9.1.0.70 |
| 2.3 | >=3.8, <=3.11 (3.12 exp.) | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 2.2 | >=3.8, <=3.11 (3.12 exp.) | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |

This is outlined here: https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix
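The stable Python ranges from that matrix can be checked mechanically (a sketch; `PYTHON_SUPPORT` and `python_supported` are hypothetical names, and experimental support such as Python 3.13 on torch 2.5 is deliberately excluded):

```python
# The stable Python ranges from the release matrix above, as
# inclusive (low, high) bounds per torch release series.
PYTHON_SUPPORT = {
    "2.5": ((3, 9), (3, 12)),
    "2.4": ((3, 8), (3, 12)),
    "2.3": ((3, 8), (3, 11)),
    "2.2": ((3, 8), (3, 11)),
}

def python_supported(torch_series: str, py: tuple[int, int]) -> bool:
    """Whether a Python version falls in the stable range for a torch series."""
    low, high = PYTHON_SUPPORT[torch_series]
    return low <= py <= high
```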

5) The problem...

CTranslate2 currently installs CUDA libraries originating from release 12.2, which results in the following being installed:
nvidia-cuda-runtime-cu12==12.2.140
nvidia-cublas-cu12==12.2.5.6
nvidia-cuda-nvcc-cu12==12.2.140
nvidia-cuda-nvrtc-cu12==12.2.140

None of these match the libraries that torch bundles with its wheels. Torch is, apparently, very specific about which CUDA libraries it supports: the bundled libraries differ even between CUDA releases 12.4.0 (which torch doesn't support) and 12.4.1 (which torch bundles with all of its current wheels).

6) Solution...

Make CTranslate2 use the libraries from CUDA release 12.4.1 (not even 12.4.0) to maximize compatibility.
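Concretely, the fix amounts to pinning the nvidia-* pip packages to the versions that ship in CUDA release 12.4.1 (a sketch; the exact pins are taken from the tables earlier and should be verified against the CUDA 12.4.1 release notes, and `pip_requirements` is a hypothetical helper):

```python
# Sketch of the proposed pins: nvidia-* packages at the versions that
# ship in CUDA release 12.4.1, matching what torch 2.5.x bundles.
CUDA_12_4_1_PINS = {
    "nvidia-cuda-runtime-cu12": "12.4.127",
    "nvidia-cuda-nvrtc-cu12": "12.4.127",
    "nvidia-cublas-cu12": "12.4.5.8",
}

def pip_requirements() -> list[str]:
    """Render the pins as pip requirement specifiers."""
    return [f"{name}=={ver}" for name, ver in sorted(CUDA_12_4_1_PINS.items())]
```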

7) Additional reasons...

Other libraries can be highly dependent on compatibility with torch and/or CUDA versions. For example:


Xformers Compatibility

| Xformers version | Torch version |
| --- | --- |
| v0.0.28.post3 | 2.5.1 |
| v0.0.28.post2 | 2.5.0 |
| v0.0.28.post1 | 2.4.1 |
| v0.0.27.post2 | 2.4.0 |
| v0.0.27.post1 | 2.4.0 |
| v0.0.27 | 2.3.0 |
| v0.0.26.post1 | 2.3.0 |
| v0.0.25.post1 | 2.2.2 |

  • Windows prebuilt wheels are on PyPI through torch 2.4.0 (i.e., pip-installable)
  • After torch 2.4.0, Windows prebuilt wheels are only available from PyTorch

LINUX Flash Attention 2 Compatibility

| FA2 version | Torch versions | Supported CUDA versions |
| --- | --- | --- |
| v2.7.1.post4 | 2.2.2, 2.3.1, 2.4.0, 2.5.1, 2.6.0.dev20241001 | 11.8.0, 12.3.2 |
| v2.7.1.post3 | 2.2.2, 2.3.1, 2.4.0, 2.5.1, 2.6.0.dev20241001 | 11.8.0, 12.3.2 |
| v2.7.1.post2 | 2.2.2, 2.3.1, 2.4.0, 2.5.1, 2.6.0.dev20241001 | 11.8.0, 12.3.2 |
| v2.7.1.post1 | 2.2.2, 2.3.1, 2.4.0, 2.5.1, 2.6.0.dev20241010 | 11.8.0, 12.4.1 |
| v2.7.1 | 2.2.2, 2.3.1, 2.4.0, 2.5.1, 2.6.0.dev20241010 | 11.8.0, 12.4.1 |
| v2.7.0.post2 | 2.2.2, 2.3.1, 2.4.0, 2.5.1 | 11.8.0, 12.4.1 |
| v2.7.0.post1 | 2.2.2, 2.3.1, 2.4.0, 2.5.1 | 11.8.0, 12.4.1 |
| v2.7.0 | 2.2.2, 2.3.1, 2.4.0, 2.5.1 | 11.8.0, 12.3.2 |
| v2.6.3* | 2.2.2, 2.3.1, 2.4.0 | 11.8.0, 12.3.2 |
| v2.6.2 | 2.2.2, 2.3.1, 2.4.0.dev20240527 | 11.8.0, 12.3.2 |
| v2.6.1 | 2.2.2, 2.3.1, 2.4.0.dev20240514 | 11.8.0, 12.3.2 |
| v2.6.0.post1 | 2.2.2, 2.3.1, 2.4.0.dev20240514 | 11.8.0, 12.2.2 |
| v2.6.0 | 2.2.2, 2.3.1, 2.4.0.dev20240512 | 11.8.0, 12.2.2 |
| v2.5.9.post1 | 2.2.2, 2.3.0, 2.4.0.dev20240407 | 11.8.0, 12.2.2 |

  • v2.5.8 is the first release to support torch 2.2.2
  • No prebuilt wheels simultaneously support torch 2.2.2 and CUDA releases prior to 12.2.2

Regarding triton, which torch now requires: triton==3.0.0 only supports Python up to 3.11, while triton==3.1.0 also supports Python 3.12.

Conclusion

Using the CUDA libraries from release 12.4.1 maximizes compatibility with torch and with other popular libraries such as flash attention 2, xformers, and triton (now a requirement of torch), among many others.


BBC-Esq commented Dec 10, 2024

Can you rerun this please? Apparently it errored because of a connection error while downloading a particular Helsinki model from Hugging Face.

BBC-Esq closed this by deleting the head repository Dec 24, 2024