Releases · triton-inference-server/vllm_backend

30 Aug 18:37

Latest

What's Changed

Full Changelog: v24.07...v24.08

Contributors: kthui and yinggeh


05 Aug 20:38

Removed explicit mode for multi-lora by @oandreeva-nv in #45
test: Limiting multi-gpu tests to use Ray as distributed_executor_backend by @oandreeva-nv in #47
perf: Improve vLLM backend performance by using a separate thread for responses by @Tabrizian in #46
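
The idea behind the last entry (#46), handing responses to a separate thread so the producing loop is not blocked by delivery, can be sketched generically as below. This is a minimal illustration under stated assumptions, not the backend's actual code: `run_with_response_thread`, `send`, and the sentinel are hypothetical names introduced here, and the real change may be structured quite differently.

```python
import queue
import threading


def run_with_response_thread(items, send):
    """Hypothetical helper: enqueue items on the caller's thread and
    deliver them via `send` on one dedicated worker thread."""
    responses = queue.Queue()
    stop = object()  # sentinel telling the worker to exit

    def drain():
        # Runs on the worker thread: deliver responses in FIFO order.
        while True:
            item = responses.get()
            if item is stop:
                break
            send(item)

    worker = threading.Thread(target=drain, daemon=True)
    worker.start()

    for item in items:
        # put() on an unbounded queue returns immediately, so a slow
        # `send` does not stall the loop that produces the items.
        responses.put(item)

    responses.put(stop)
    worker.join()  # wait until every queued response was delivered


if __name__ == "__main__":
    delivered = []
    run_with_response_thread(["tok1", "tok2", "tok3"], delivered.append)
    print(delivered)
```

A single worker draining a FIFO queue keeps responses in the order they were produced, which is presumably what a streaming client would expect.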

Full Changelog: v24.06...v24.07

Contributors: Tabrizian and oandreeva-nv


23 Jul 19:27

fix: Enhance checks around KIND_GPU and tensor parallelism (#42)

Co-authored-by: Olga Andreeva
