
run on g5.4xlarge #5

Open

colorzhang opened this issue Aug 8, 2023 · 0 comments
I got the following error when I run vllm_variable_size or naive_hf_variable_size:

1536 4 1000
~/AIML/llm-continuous-batching-benchmarks-winston ~/AIML/llm-continuous-batching-benchmarks-winston/benchmark_configs
Traceback (most recent call last):
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 597, in <module>
    main()
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 539, in main
    prompts, prompt_lens = gen_random_prompts_return_lens(
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 479, in gen_random_prompts_return_lens
    assert len(
AssertionError: Expected prompt to contain exactly 512 tokens, got len(encoded)=350

My environment:
Machine: g5.4xlarge
Model: meta-llama/Llama-2-7b-chat-hf

Any reason for this error?
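For context, here is a minimal sketch of the kind of round-trip check that appears to be failing. This is an assumption about what gen_random_prompts_return_lens does, not the actual repo code: sample random token IDs, decode them to text, then re-encode and assert the prompt still has exactly the requested number of tokens. With the Llama-2 tokenizer this round trip is not guaranteed to be length-preserving (merged pieces, special tokens, etc.), so the re-encoded prompt can come back shorter, e.g. 350 tokens when 512 were requested.

```python
# Hypothetical sketch (assumed behavior, not the repo's implementation):
# generate a random prompt of an exact token length and verify it survives
# a decode/re-encode round trip with the same tokenizer.
import random
from transformers import AutoTokenizer

def gen_random_prompt(tokenizer, prompt_len: int) -> str:
    # Sample IDs from the plain vocabulary (special tokens excluded).
    token_ids = [random.randrange(tokenizer.vocab_size) for _ in range(prompt_len)]
    prompt = tokenizer.decode(token_ids)
    # Re-encode and check the length is preserved; with Llama-2's tokenizer
    # this assertion can fail, which matches the reported error.
    encoded = tokenizer(prompt, add_special_tokens=False).input_ids
    assert len(encoded) == prompt_len, (
        f"Expected prompt to contain exactly {prompt_len} tokens, "
        f"got {len(encoded)=}")
    return prompt

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    print(gen_random_prompt(tok, 512))
```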
