Update on the development branch #2437
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we pushed an update to the development branch (and the Triton backend) on Nov 12, 2024.
This update includes:
- […] `examples/nemotron`.
- […] `examples/gpt/README.md`.
- […] `examples/multimodal/README.md`.
- […] `trtllm-serve` command to launch a FastAPI based server.
- […] `examples/prompt_lookup/README.md`.
- […] `examples/nemotron_nas/README.md`.
- […] `examples/llama/README.md`.
- […] `executor` API, see “executorExampleFastLogits” section in `examples/cpp/executor/README.md`.
- `auto` is used as the default value for `--dtype` option in quantize and checkpoints conversion scripts.
- […] `moeTopK()` cannot find the correct expert when the number of experts is not a power of two. Thanks @dongjiyingdjy for reporting this bug.
- […] `crossKvCacheFraction`. (Assertion failed: Must set crossKvCacheFraction for encoder-decoder model #2419)
- […] `docs/source/performance/perf-benchmarking.md`, thanks @MARD1NO for pointing it out in Small Typo #2425.
- […] `nvcr.io/nvidia/pytorch:24.10-py3`.
- […] `nvcr.io/nvidia/tritonserver:24.10-py3`.

Thanks,
The TensorRT-LLM Engineering Team