TorchServe v0.12.0 Release Notes

Released by @agunapal on 30 Sep at 22:46 (commit 6bdb1ba)

Highlights Include

  • GenAI updates
    • No-code LLM deployments with TorchServe + vLLM and TensorRT-LLM using the ts.llm_launcher script
    • OpenAI API support for TorchServe + vLLM
    • Integration of TensorRT-LLM engine
    • Stateful inference on AWS SageMaker (see blog)
  • Support for linux-aarch64
    • CI and nightly regression tests added
    • Docker and KServe images published
  • PyTorch updates
    • Support for PyTorch 2.4
    • Deprecation of TorchText

PyTorch Updates

GenAI

  • Implement stateful inference session timeout by @namannandan in #3263
  • Use Case: Enhancing LLM Serving with Torch Compiled RAG on AWS Graviton by @agunapal in #3276
  • Feature add openai api for vllm integration by @mreso in #3287
  • Set vllm multiproc method to spawn by @mreso in #3310
  • TRT LLM Integration with LORA by @agunapal in #3305
  • Bump vllm from 0.5.0 to 0.5.5 in /examples/large_models/vllm by @dependabot in #3321
  • Use startup time in async worker thread instead of worker timeout by @mreso in #3315
  • Rename vllm dockerfile by @mreso in #3330
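
The OpenAI API support listed above (#3287) can be exercised with a plain HTTP client. The following is a minimal sketch, not an official client: it assumes a model has already been started (for example via the ts.llm_launcher script), that TorchServe listens on localhost port 8080, and that the model is registered under the name "model" with version "1.0". The URL layout and the helper names `build_completion_request` and `complete` are illustrative assumptions; check them against your own deployment.

```python
import json
import urllib.request

# Assumptions (not stated in these release notes): inference port 8080 and a
# model registered as "model", version "1.0". Adjust to your deployment.
BASE_URL = "http://localhost:8080/predictions/model/1.0"


def build_completion_request(model_id, prompt, max_tokens=50, base_url=BASE_URL):
    """Build an OpenAI-style completions request without sending it."""
    payload = {"model": model_id, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model_id, prompt, **kwargs):
    """Send the request and return the decoded JSON response (needs a running server)."""
    request = build_completion_request(model_id, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call. If token authorization is enabled on the server, an Authorization header would also be needed; that is left out of this sketch.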

Support for linux-aarch64

Documentation

Improvements and Bug Fixing

New Contributors

Platform Support

Ubuntu 20.04, macOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe requires Python >= 3.8 and JDK 17.

GPU Support Matrix

| TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
| --- | --- | --- | --- | --- |
| 0.12.0 | 2.4.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.11.1 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
| 0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
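
As a convenience, the Python column of the GPU Support Matrix can be turned into a small pre-install check. This is only a sketch: the version ranges are copied from the matrix above, while the names `SUPPORTED_PYTHON` and `python_supported` are hypothetical and not part of TorchServe.

```python
import sys

# (minimum, maximum) supported Python versions, copied from the GPU Support Matrix.
SUPPORTED_PYTHON = {
    "0.12.0": ((3, 8), (3, 11)),
    "0.11.1": ((3, 8), (3, 11)),
    "0.11.0": ((3, 8), (3, 11)),
    "0.10.0": ((3, 8), (3, 11)),
    "0.9.0": ((3, 8), (3, 11)),
    "0.8.0": ((3, 8), (3, 11)),
    "0.7.0": ((3, 7), (3, 10)),
}


def python_supported(torchserve_version, python_version=None):
    """Return True if the (major, minor) Python version is in the listed range.

    Defaults to the running interpreter; raises KeyError for versions
    that are not in the matrix.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    low, high = SUPPORTED_PYTHON[torchserve_version]
    return low <= tuple(python_version[:2]) <= high
```

For example, `python_supported("0.12.0")` checks the interpreter that runs the script, and `python_supported("0.7.0", (3, 11))` checks an explicit version.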

Inferentia2 Support Matrix

| TorchServe version | PyTorch version | Python | Neuron SDK |
| --- | --- | --- | --- |
| 0.12.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
| 0.11.1 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
| 0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
| 0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
| 0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |