Do you use LLaVA AnyRes? #27
Comments
Thank you for your interest in our work! You are correct that image tokens consume a large amount of GPU memory, limiting the per-device batch size to around 2 to 4 on devices like the H100. If your GPU has less memory, it is expected that only a per-device batch size of 1 may be feasible. However, the actual batch size is not the same as the per-device batch size: we use the GradCache technique to scale the actual batch size to 2K or even larger. Passing `--grad_cache True` enables GradCache (the README contains the full commands). Please let me know whether it works.
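For readers unfamiliar with GradCache, here is a minimal sketch of the idea: chunked, gradient-free forward passes to collect embeddings, a full-batch contrastive loss over the cached embeddings, then chunked re-encoding to backpropagate the cached embedding gradients. It is written against plain PyTorch with a placeholder shared encoder, sub-batch size, and temperature; it is not this repository's actual implementation.

```python
# Minimal sketch of the GradCache idea (Gao et al., "Scaling Deep Contrastive
# Learning Batch Size under Memory Limited Setup"). The encoder, embedding
# size, sub_batch, and temperature are placeholders, not this repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cache_step(encoder: nn.Module,
                    queries: torch.Tensor,
                    targets: torch.Tensor,
                    sub_batch: int,
                    temperature: float = 0.05) -> torch.Tensor:
    """Accumulate gradients for one full-batch contrastive step while only
    holding `sub_batch` examples' activations in memory at a time."""
    # 1) Gradient-free chunked forward passes to collect all embeddings.
    with torch.no_grad():
        q_reps = torch.cat([encoder(queries[i:i + sub_batch])
                            for i in range(0, len(queries), sub_batch)])
        t_reps = torch.cat([encoder(targets[i:i + sub_batch])
                            for i in range(0, len(targets), sub_batch)])

    # 2) Full-batch contrastive loss over the cached embeddings; this caches
    #    d(loss)/d(embedding) for every example.
    q_reps.requires_grad_(True)
    t_reps.requires_grad_(True)
    logits = F.normalize(q_reps, dim=-1) @ F.normalize(t_reps, dim=-1).T
    labels = torch.arange(len(q_reps), device=q_reps.device)
    loss = F.cross_entropy(logits / temperature, labels)
    loss.backward()  # fills q_reps.grad / t_reps.grad only, not encoder grads

    # 3) Re-encode each sub-batch WITH gradients and push the cached
    #    embedding gradients into the encoder parameters.
    for cached_grad, batch in ((q_reps.grad, queries), (t_reps.grad, targets)):
        for i in range(0, len(batch), sub_batch):
            reps = encoder(batch[i:i + sub_batch])
            reps.backward(gradient=cached_grad[i:i + sub_batch])

    return loss.detach()
```

Because every forward/backward pass only touches `sub_batch` examples, peak activation memory is set by the sub-batch size, while the contrastive loss still sees the full effective batch; gradients from all sub-batches accumulate in the encoder's `.grad` buffers, so a single `optimizer.step()` afterwards behaves like a true large-batch update.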
Without GradCache, the batch size is limited to 2~4, as you said above. Then how did you reach a batch size of 256 in Table 3 of the paper on 8 * H100? The Table 3 setup does not look like it uses GradCache (basic setting). Is it from the model size difference between phi3.5v and llava_next, or did you use gradient accumulation? Thank you.
Hi @kimwongyuda, all the experiments use GradCache to scale up the batch size. (It is the default setting.)
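To make the arithmetic concrete, one illustrative decomposition is sketched below. Only the 256 and the 8 GPUs come from the discussion above; treating 256 as the global batch and choosing a sub-batch of 4 are assumptions, not values from the paper or the repo.

```python
# Illustrative only: how GradCache lets an effective batch of 256 run on
# 8 GPUs. Assumes 256 is the global batch; sub_batch is an assumed
# memory-feasible chunk size, not a value from the paper.
num_gpus = 8
effective_batch = 256
per_gpu_batch = effective_batch // num_gpus     # 32 examples per device per step
sub_batch = 4                                   # assumed memory limit per pass
chunks_per_step = per_gpu_batch // sub_batch    # 8 chunked passes per device
print(per_gpu_batch, chunks_per_step)           # -> 32 8
```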
@XMHZZ2018 |
Thank you for your nice work.
When I try to run the code, only a batch size of 1 per device fits, even though I use LLaVA with Mistral and also LoRA (but without GradCache).
I suspect that the large number of image tokens produced by LLaVA's AnyRes takes up too much GPU memory.
I didn't modify any of your code.
How can I increase the batch size? And did you use AnyRes as described above?
Thank you.
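For context on why AnyRes inflates memory use, here is a rough back-of-the-envelope token count. It assumes LLaVA-NeXT-style AnyRes with a CLIP ViT-L/14 vision tower at 336px and a 2x2 high-resolution grid; real counts also depend on the selected grid and any token unpadding, so treat the numbers as approximate rather than this repository's exact behavior.

```python
# Approximate image-token count with and without AnyRes for a
# LLaVA-NeXT-style model (assumed ViT-L/14 @ 336px, 2x2 grid; actual counts
# vary with grid selection and token unpadding).
tile_px = 336                                   # each tile resized to 336x336
patch_px = 14                                   # ViT-L/14 patch size
tokens_per_tile = (tile_px // patch_px) ** 2    # 24 * 24 = 576

no_anyres = tokens_per_tile                     # single resized image: 576
anyres = tokens_per_tile * (1 + 4)              # overview + 2x2 grid: 2880

print(no_anyres, anyres)                        # roughly 5x more image tokens
```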