[Bug] FP16 conversion yields an unusable model #17447
Can you share the original model? It's not the CUDA execution provider that is failing; it's the ONNX validation of the model during loading.
The Resize spec says the node can provide either the `scales` or the `sizes` input, but not both. That makes the next question whether the original model was valid or not. What happens if you try to load the original model in onnxruntime? You don't need to run it - just create an InferenceSession with it and see if that is successful.
Sure, the model is SSD ResNet152 V1 FPN 640x640 (RetinaNet152); Sagemaker works with TF Zoo models. The original UINT8/FP32 (input/output) model converted to ONNX works: loading it in a session is fine, and I've run inference with it too. The FP16-converted one gives the same error when creating the session.

```python
import onnxruntime

session = onnxruntime.InferenceSession('./model_fp16.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
```

This returns nothing for the full-precision model and the following for the FP16 one:

```
Traceback (most recent call last):
  File "onnx-runtime-test.py", line 3, in <module>
    session = onnxruntime.InferenceSession('./model_fp16.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
  File "/home/rootlab/triton_learning/yolo_v8/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/rootlab/triton_learning/yolo_v8/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 424, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ./model_fp16.onnx failed:Node (StatefulPartitionedCall/map/while_loop) Op (Loop) [TypeInferenceError] Graph attribute inferencing failed: Node (Resize__59) Op (Resize) [ShapeInferenceError] Either `sizes` or `scales` must be provided, but not both of them
```
Given that the original model works but the converted one is invalid, it appears the issue is with the converter creating an invalid model rather than with ONNX Runtime. As such, your onnxconverter-common issue would be the place to follow up.
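(For reference, a quick sketch for confirming what the converter did to the Resize node. Since Resize__59 sits inside a Loop body, the walk has to recurse into subgraph attributes; the node and file names are taken from the traceback above, not verified against the actual model.)

```python
import onnx

def find_resize(graph, path=''):
    for node in graph.node:
        if node.op_type == 'Resize':
            # Resize inputs are [X, roi, scales, sizes]; the checker rejects
            # the node when both scales and sizes are non-empty
            print(path + node.name, list(node.input))
        for attr in node.attribute:
            # Recurse into single-graph attributes such as Loop's "body"
            if attr.type == onnx.AttributeProto.GRAPH:
                find_resize(attr.g, path + node.name + '/')

model = onnx.load('./model_fp16.onnx')
find_resize(model.graph)
```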
Describe the issue
I'm working with a model in Sagemaker (ResNet50 640x640, input shape
[1, -1, -1, 3]
) converted to ONNX. When trying to get more performance out of it by converting it to FP16, the conversion succeeds, but trying to run the model gives the Resize `sizes`/`scales` error quoted above. Trying out mixed precision instead fails at shape inferencing.
It gives the same error with the latest shape-inferencing script from GitHub. I'm not sure where to post this issue, as multiple parts of the ONNX stack seem to be involved and not working.
Linking my onnxconverter-common issue here: #266.
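(A minimal sketch of the standalone shape-inference check, using onnx's built-in infer_shapes; the "script from GitHub" mentioned above may instead be onnxruntime's symbolic_shape_infer.py, and the path is a placeholder.)

```python
import onnx
from onnx import shape_inference

model = onnx.load('./model_fp16.onnx')
# strict_mode raises the ShapeInferenceError instead of silently skipping the node
inferred = shape_inference.infer_shapes(model, strict_mode=True)
```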
To reproduce
FP16:
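(A minimal sketch of the FP16 conversion step, assuming onnxconverter-common's float16 API; paths are placeholders, not the exact script used.)

```python
import onnx
from onnxconverter_common import float16

# Load the FP32 ONNX model exported from the TF Zoo / Sagemaker pipeline
model = onnx.load('./model.onnx')

# Convert float initializers and tensors to FP16
model_fp16 = float16.convert_float_to_float16(model)

onnx.save(model_fp16, './model_fp16.onnx')
```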
or mixed precision:
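(Likewise a sketch, assuming onnxconverter-common's auto_mixed_precision helper; the input name, shape, and tolerances are placeholders.)

```python
import numpy as np
import onnx
from onnxconverter_common import auto_mixed_precision

model = onnx.load('./model.onnx')

# Dummy UINT8 image matching the [1, -1, -1, 3] input signature
feed = {'input_tensor': np.zeros((1, 640, 640, 3), dtype=np.uint8)}

# Converts nodes to FP16 only where outputs stay within the tolerances
model_amp = auto_mixed_precision.auto_convert_mixed_precision(
    model, feed, rtol=0.01, atol=0.001, keep_io_types=True
)

onnx.save(model_amp, './model_amp.onnx')
```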
Yes, the model inputs are UINT8. I don't know why, but it breaks TensorRT acceleration and conversion too. Considering the difference is mainly that FP32 is used for normalized data while UINT8 is raw pixel data, there shouldn't be that big of a difference. I understand TRT not working, but the CUDA provider should work, and it doesn't.
Urgency
Urgent.
A whole pipeline is built around training these models and deploying them, with all the supporting applications, for a client.
The model is slower than necessary, though, and these optimizations would have helped. Model choice is also limited, as the training pipeline is in Sagemaker.
Platform
Linux
OS Version
Ubuntu 20.04 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
11.8