Update on the development branch #1892

kaiyux · 2024-07-04T06:46:18Z

kaiyux
Jul 4, 2024
Maintainer

Hi,

The TensorRT-LLM team is pleased to announce that we have pushed an update to the development branch (and the Triton backend).

This update includes:

Features
- Support pipeline parallelism when the number of layers is not evenly divisible by the PP size.
- Add LoRA support to Qwen2, see “Run models with LoRA” section in examples/qwen/README.md.
- Add support for Phi-3-mini/small FP8 base + FP16/BF16 LoRA, see “Run Phi-3 with LoRA” section in examples/phi/README.md.
- Add support for starcoder-v2 FP8 base + FP16/BF16 LoRA, see “Run StarCoder2 with LoRA” section in examples/gpt/README.md.
- Add numQueuedRequests to the iteration stats log of the executor API.
- Support FP8 OOTB MoE.
- Add concurrency argument for gptManagerBenchmark.
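
To illustrate the first item above (pipeline parallelism when the layer count is not a multiple of the PP size), here is a minimal sketch of one common way to split layers unevenly across stages. It is not taken from the TensorRT-LLM source: the function name `split_layers` and the policy of giving the extra layers to the lowest ranks are assumptions made purely for illustration, and the actual partitioning may differ.

```python
def split_layers(num_layers: int, pp_size: int) -> list[range]:
    """Assign contiguous layer ranges to pipeline stages.

    Hypothetical helper: stage sizes differ by at most one layer, and the
    leftover layers go to the lowest-ranked stages (an assumed policy).
    """
    if pp_size <= 0 or num_layers < pp_size:
        raise ValueError("need at least one layer per pipeline stage")

    base, extra = divmod(num_layers, pp_size)
    ranges = []
    start = 0
    for rank in range(pp_size):
        # The first `extra` ranks each take one additional layer.
        count = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    # 30 layers over 4 stages cannot be split evenly.
    for rank, layers in enumerate(split_layers(30, 4)):
        print(f"rank {rank}: layers {layers.start}..{layers.stop - 1}")
```

With 30 layers and a PP size of 4, this sketch yields stages of 8, 8, 7, and 7 layers.
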
API
- [BREAKING CHANGE] Remove attention_qk_half_accumulation knob from trtllm-build command.
- [BREAKING CHANGE] Add a runtime max_num_tokens knob to the ExecutorConfig and gptManagerBenchmark.
- [BREAKING CHANGE] The default value of max_seq_len is now read from the HuggingFace model config.
- [BREAKING CHANGE] Several refactors to the Python high-level API; see examples/high-level-api/README.md.
- Update the apps examples to use the LLM APIs. Please refer to examples/apps/README.md for details.
Bug fixes
- Fix stop and bad words list contiguous for ModelRunner, thanks to the contribution from @Marks101 in "[ModelRunner] Fix stop and bad words list contiguous for offsets" #1815.
- Fix missing comment for FAST_BUILD, thanks to the support from @lkm2835 in "Add FAST_BUILD comment at #endif" #1851.
- Fix the issue where Top-P sampling occasionally produces invalid tokens, reported in "Top-P sampling occasionally produces invalid tokens" #1590.
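
For readers unfamiliar with Top-P (nucleus) sampling, the sketch below shows the general idea in plain Python: keep the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`, then sample only from that set. This is a generic illustration, not the TensorRT-LLM kernel and not a description of the bug or its fix; the function name `top_p_candidates` is hypothetical.

```python
def top_p_candidates(probs: list[float], top_p: float) -> list[int]:
    """Return the token ids kept by Top-P filtering, most probable first.

    Generic sketch: `probs` is assumed to be a normalized distribution
    indexed by token id. Every returned id is a valid index into `probs`.
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")

    # Visit token ids from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept = []
    cumulative = 0.0
    for token_id in order:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break  # smallest prefix whose mass reaches top_p
    return kept


if __name__ == "__main__":
    print(top_p_candidates([0.5, 0.3, 0.15, 0.05], top_p=0.7))
```

Because the candidate set is built only from indices of `probs`, a sampler drawing from it cannot return a token id outside the vocabulary.
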
Infra
- The dependent ModelOpt version is updated to v0.13.0.

Thanks,
The TensorRT-LLM Engineering Team
