
run on g5.4xlarge #5

Open

colorzhang opened this issue Aug 8, 2023 · 0 comments
I got the following error when I run vllm_variable_size or naive_hf_variable_size:

1536 4 1000
~/AIML/llm-continuous-batching-benchmarks-winston ~/AIML/llm-continuous-batching-benchmarks-winston/benchmark_configs
Traceback (most recent call last):
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 597, in <module>
    main()
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 539, in main
    prompts, prompt_lens = gen_random_prompts_return_lens(
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 479, in gen_random_prompts_return_lens
    assert len(
AssertionError: Expected prompt to contain exactly 512 tokens, got len(encoded)=350

My environment:
Machine: g5.4xlarge
Model: meta-llama/Llama-2-7b-chat-hf

Any reason for this error?
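For context, here is a minimal sketch of the kind of round-trip check that appears to be failing. This is an assumption about what gen_random_prompts_return_lens does, not the actual repo code: sample random token IDs, decode them to text, then re-encode and assert the prompt still has exactly the requested number of tokens. With the Llama-2 tokenizer this round trip is not guaranteed to be length-preserving (merged pieces, special tokens, etc.), so the re-encoded prompt can come back shorter, e.g. 350 tokens when 512 were requested.

```python
# Hypothetical sketch (assumed behavior, not the repo's implementation):
# generate a random prompt of an exact token length and verify it survives
# a decode/re-encode round trip with the same tokenizer.
import random
from transformers import AutoTokenizer

def gen_random_prompt(tokenizer, prompt_len: int) -> str:
    # Sample IDs from the plain vocabulary (special tokens excluded).
    token_ids = [random.randrange(tokenizer.vocab_size) for _ in range(prompt_len)]
    prompt = tokenizer.decode(token_ids)
    # Re-encode and check the length is preserved; with Llama-2's tokenizer
    # this assertion can fail, which matches the reported error.
    encoded = tokenizer(prompt, add_special_tokens=False).input_ids
    assert len(encoded) == prompt_len, (
        f"Expected prompt to contain exactly {prompt_len} tokens, "
        f"got {len(encoded)=}")
    return prompt

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    print(gen_random_prompt(tok, 512))
```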
