
Peak GPU memory use does not scale linearly with the GPU weight percentage #108

Open · frankxyy opened this issue Apr 17, 2023 · 0 comments


frankxyy commented Apr 17, 2023

command 1:
python -m flexgen.flex_opt --model facebook/opt-30b --path _DUMMY_ --prompt-len 20 --gen-len 15 --percent 25 75 60 40 0 100 --gpu-batch-size 1 --num-gpu-batches 2 --cpu-cache-compute --debug fewer_batch
peak gpu mem: 6.0679 GB

command 2:
python -m flexgen.flex_opt --model facebook/opt-30b --path _DUMMY_ --prompt-len 20 --gen-len 15 --percent 30 70 60 40 0 100 --gpu-batch-size 1 --num-gpu-batches 2 --cpu-cache-compute --debug fewer_batch
result: GPU OOM (out of memory)

The only difference between command 2 and command 1 is that the GPU weight percentage is increased from 25% to 30%.

My GPU has 24 GB of memory.
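
For context, here is a rough back-of-the-envelope estimate of how much extra GPU memory the weight split alone should account for. This is only a sketch under my own assumptions (fp16 weights, roughly 30e9 parameters for opt-30b, and that the first two `--percent` values split the weights between GPU and CPU); it is not based on FlexGen's internal accounting:

```python
# Back-of-the-envelope estimate (assumptions: fp16 weights, ~30e9 parameters
# for opt-30b; the first two --percent values split the weights GPU/CPU).
param_count = 30e9
bytes_per_param = 2  # fp16
total_weight_gb = param_count * bytes_per_param / 1024**3  # ~55.9 GB

# Raising the GPU weight share from 25% to 30% moves 5% of the weights onto the GPU.
extra_gb = total_weight_gb * (30 - 25) / 100
print(f"extra GPU-resident weights ~= {extra_gb:.1f} GB")  # ~= 2.8 GB
```

An extra ~3 GB on top of the ~6 GB peak should still be well within 24 GB, which is why the OOM looks like non-linear scaling rather than the weights themselves filling the card.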
