TRT/TRT-LLM support for NestedTensor / jagged or ragged tensors for doing away with padding #1469
vadimkantorov asked this question in Q&A
Does TRT / TRT-LLM support PyTorch's concept of NestedTensor, or is there some similar concept? The idea is to avoid computing the feed-forward layers on padded-out positions and to skip attention computation for them as well.
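To illustrate what I mean by skipping the feed-forward on padded positions, here is a rough PyTorch sketch (the shapes and mask are made up for illustration, not a TRT / TRT-LLM API):

```python
import torch
import torch.nn as nn

# Gather only the non-padded tokens into a flat (total_tokens, hidden) tensor,
# run the feed-forward block on that, then scatter the results back.
batch, max_len, hidden = 2, 6, 8
x = torch.randn(batch, max_len, hidden)
# True where a position holds a real token, False where it is padding
mask = torch.tensor([[1, 1, 1, 0, 0, 0],
                     [1, 1, 1, 1, 1, 0]], dtype=torch.bool)

ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden),
                    nn.GELU(),
                    nn.Linear(4 * hidden, hidden))

packed = x[mask]          # (total_real_tokens, hidden): padding rows dropped
packed_out = ffn(packed)  # FFN FLOPs spent only on real tokens

out = torch.zeros_like(x)
out[mask] = packed_out    # scatter back into the padded layout if needed
```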
I guess the question also becomes whether the underlying FlashAttention implementation supports such jagged / compact formats. In PyTorch, SDPA seems to support NestedTensor inputs for speedups: pytorch/pytorch#105913 (comment) (this can be useful for batched BERT inference).
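For reference, a rough sketch of the PyTorch NestedTensor + SDPA path I'm referring to (again just an illustration, not TRT / TRT-LLM code; I think nested inputs go through the fused flash / memory-efficient SDPA backends, so I'm assuming fp16 on GPU here):

```python
import torch
import torch.nn.functional as F

# Each sequence keeps its true length, so attention is only computed over
# real tokens and no padding is materialized.
device, dtype = "cuda", torch.float16
num_heads, head_dim = 4, 16
seq_lens = [7, 12, 3]  # ragged batch, e.g. batched BERT inference

def rand_heads(L):
    # one sequence: (num_heads, seq_len, head_dim)
    return torch.randn(num_heads, L, head_dim, device=device, dtype=dtype)

# logical shape: (batch, num_heads, ragged_seq_len, head_dim)
q = torch.nested.nested_tensor([rand_heads(L) for L in seq_lens])
k = torch.nested.nested_tensor([rand_heads(L) for L in seq_lens])
v = torch.nested.nested_tensor([rand_heads(L) for L in seq_lens])

out = F.scaled_dot_product_attention(q, k, v)

# the output is nested too: one (num_heads, L_i, head_dim) tensor per sequence
print([t.shape for t in out.unbind()])
```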
Thanks!