Weights Still in FP32 after Quantization #347
I just tested your example file quantize_sst2_model.py and printed the parameters of the reloaded model; there, too, all the parameters are still in float32.
@ClaraLovesFunk thank you for your feedback. The parameters' dtype is still float32, but if you check their type, you will see that they are no longer plain tensors but quanto's quantized tensor subclass.
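To illustrate the dtype-versus-type distinction in isolation, here is a torch-only sketch; `QTensorDemo` is a made-up stand-in for quanto's quantized tensor class, just to show that a tensor subclass can report `dtype=float32` while its Python type differs:

```python
import torch

class QTensorDemo(torch.Tensor):
    """Hypothetical stand-in for a quantized tensor subclass."""
    pass

w = torch.randn(4, 4)            # plain float32 weight
qw = w.as_subclass(QTensorDemo)  # same storage, different Python type

print(w.dtype, type(w).__name__)    # torch.float32 Tensor
print(qw.dtype, type(qw).__name__)  # torch.float32 QTensorDemo
```

So checking `param.dtype` alone will always show float32 here; `type(param)` is what reveals the quantized wrapper.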
Thank you so much for the explanation, David! Will do.
Do you maybe also have an explanation why I can't use bigger batch sizes after applying quantization and verifying that my model shrank from 413.44 MB to 169.11 MB?
How can I get a param's dtype or qtype? param.qtype?
Heyy Lian, you can check the datatype of the model weights with:

I actually didn't check the qtype, though, but GenAI suggests:

Cheers <3
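The original code snippets from this reply are missing from the page; as a sketch, here is a small helper in that spirit that prints dtype, tensor class, and qtype for every parameter. Since `qtype` only exists on quantized weights, it is read with `getattr`; `inspect_params` is a made-up name, shown here on a plain float layer:

```python
import torch
from torch import nn

def inspect_params(model: nn.Module) -> None:
    """Print dtype, tensor class, and (if present) qtype of each parameter."""
    for name, param in model.named_parameters():
        qtype = getattr(param, "qtype", None)  # only set on quantized weights
        print(f"{name}: dtype={param.dtype}, "
              f"type={type(param).__name__}, qtype={qtype}")

# Usage on any model (a plain float linear layer here, so qtype is None):
inspect_params(nn.Linear(8, 4))
```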
Dear quanto folks,
I implemented quantization as suggested in your coding example quantize_sst2_model.py. When printing the datatypes of the parameters, I found that after quantization all the weights remained in float32. Do you have any explanation for this?
Also, do you have any explanation why I can't use bigger batch sizes when applying quantization of both weights and activations? I used PubMedBERT from Hugging Face, fine-tuned it myself, and applied static quantization (see code below).
And do you know why inference speed slows down significantly when I use the reloaded statically quantized model (code below), as opposed to the directly statically quantized model? I again followed the instructions of the coding example.
Any help is greatly appreciated, since I'm just wrapping up my soon-due master's thesis about this <3
Clara
Direct Static Quantization:
Reloading statically quantized model: