AO and Automated Mixed Precision #1390
@bhack, would love to learn some more context - is there an issue when you use the checkpoint directly?
Yes, exactly - it is the case when we use a checkpoint trained with AMP and we are entering the AO world to optimize it for inference.
Generally in the repo, before AO, I just see
Sorry for the late response, I was on holiday last week. There is nothing special you need to do when using AMP + torchao to convert the model for inference - just use torchao according to the documentation. Would you mind clarifying your question if that does not answer it?
Just to give an example, from the docs it is not clear whether this is the best practice or not:

model.eval()
quantize_(model, int8_dynamic_activation_int8_weight())
unwrap_tensor_subclass(model)
with torch.amp.autocast(enabled=use_amp, dtype=input_dtype, device_type="cuda"):
    ...
    torch.export.export(...)
    aoti_compile_and_package(...)
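For concreteness, a sketch of the same steps without the autocast wrapper, i.e. casting the model to a single dtype up front. The small nn.Sequential model, example input, and output path are hypothetical stand-ins, and the exact aoti_compile_and_package signature varies across PyTorch versions:

```python
import torch
from torch import nn
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
from torchao.utils import unwrap_tensor_subclass

# Hypothetical stand-ins for the real model and inputs.
input_dtype = torch.bfloat16
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).cuda()
example_input = torch.randn(1, 64, device="cuda", dtype=input_dtype)

# Cast the weights once instead of relying on an autocast context.
model = model.to(input_dtype).eval()

quantize_(model, int8_dynamic_activation_int8_weight())
unwrap_tensor_subclass(model)

# Export and AOT-compile with no autocast context around them.
exported = torch.export.export(model, (example_input,))
torch._inductor.aoti_compile_and_package(exported, package_path="model.pt2")
```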
I see, looks like your question is whether using autocast for quantized model inference is supported/recommended. I'll tag @jerryzh168 on this one.
We haven't tested this, but this seems reasonable to me, and

For 2:

Yes
Ok, I think it is better to add a note somewhere, as it is quite common to have models trained with AMP.
Do you know the e2e flow for a model trained with AMP? Do people
I think users want to know whether they need to use the AMP context on a quantized model or not, as the documentation still suggests using it for inference in the AMP case:
I see, I think we could add a note to say that AMP is typically not required in the torchao context anymore, since we typically start with a bfloat16 or float16 model. But if people want to run inference for a model trained with AMP, I'd suggest starting by saving the trained model and loading it (https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html#saving-resuming), and then quantizing. But we'll need to test this path before updating the README.
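A minimal sketch of that suggested path, under the assumption that the checkpoint was saved following the linked AMP recipe (so it has a "model_state_dict" entry and the weights are still float32); the model definition and checkpoint path below are hypothetical:

```python
import torch
from torch import nn
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

# Hypothetical model and checkpoint path; AMP training keeps master weights in float32.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).cuda()
checkpoint = torch.load("amp_checkpoint.pt", map_location="cuda")
model.load_state_dict(checkpoint["model_state_dict"])

# Cast once to bfloat16 (or float16) and quantize; no autocast context at inference.
model = model.to(torch.bfloat16).eval()
quantize_(model, int8_dynamic_activation_int8_weight())

# Plain inference on the quantized model.
x = torch.randn(1, 64, device="cuda", dtype=torch.bfloat16)
with torch.no_grad():
    out = model(x)
```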
Can we clarify in the README what the best practices are for using AO at inference with a PyTorch AMP-trained model/checkpoint?