Update on the development branch #1892
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we have pushed an update to the development branch (and the Triton backend).
This update includes:
- [...] examples/qwen/README.md.
- [...] examples/phi/README.md.
- [...] examples/gpt/README.md.
- Added numQueuedRequests to the iteration stats log of the executor API.
- [...] concurrency argument for gptManagerBenchmark.
- Removed the attention_qk_half_accumulation knob from the trtllm-build command.
- [...] max_num_tokens knob to the ExecutorConfig and gptManagerBenchmark.
- max_seq_len is now read from the HuggingFace model config.
- [...] examples/high-level-api/README.md.
- [...] apps examples using the LLM APIs; please refer to examples/apps/README.md for details.
- [...] ModelRunner ([ModelRunner] Fix stop and bad words list contiguous for offsets #1815), thanks to the contribution from @Marks101.
- [...] FAST_BUILD, thanks to the support from @lkm2835 in Add FAST_BUILD comment at #endif #1851.
Thanks,
The TensorRT-LLM Engineering Team