Support ONNX export on torch.float16 type #749
Conversation
Just adding two nits. By the way, is the torch ONNX export on fp16 stable?
I've tried it on a single model, and did not try to load it into an InferenceSession. I will add a test for it, thanks.
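For illustration, a minimal smoke-test sketch of loading such an fp16 export into an InferenceSession — the model path, input names, and shapes here are hypothetical and depend on the exported model:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical path and input signature; adjust to the actual export.
session = ort.InferenceSession(
    "model_fp16.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

dummy_inputs = {
    "input_ids": np.random.randint(0, 100, size=(1, 8), dtype=np.int64),
    "attention_mask": np.ones((1, 8), dtype=np.int64),
}

outputs = session.run(None, dummy_inputs)
# A half-precision export is expected to produce float16 floating-point outputs.
print(outputs[0].dtype)
```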
The documentation is not available anymore as the PR was closed or merged.
Force-pushed from 1ff3ed8 to 2d210a6
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
…rty/optimum into support-onnx-export-float16
LGTM!
optional_group.add_argument(
    "--fp16",
    action="store_true",
    help="Experimental option: use half precision during the export. PyTorch-only, requires `--device cuda`.",
)
Experimental because it doesn't work with all models?
I'd say experimental because I haven't thoroughly tested it with ONNX Runtime + CUDAExecutionProvider / TensorrtExecutionProvider, nor with native TensorRT (though the validation itself calls InferenceSession on the CUDA EP, which is a good sign that it's fine). But the export itself is thoroughly tested.
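If someone wants to poke at those execution providers, a small sketch of how the provider priority could be checked in ONNX Runtime — the model path is a placeholder and a TensorRT/CUDA-enabled onnxruntime-gpu build is assumed:

```python
import onnxruntime as ort

# Try TensorRT first, then fall back to CUDA, then CPU.
providers = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model_fp16.onnx", providers=providers)

# Shows which providers this build/model actually picked up.
print(session.get_providers())
```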
As per title.
Test still missing. Partly fixes https://discuss.huggingface.co/t/convert-gpt-j-to-fp-16-onnx/30294