[Bug] FP16 conversion yields an unusable model #17447

Open
eddieevt-DXC opened this issue Sep 7, 2023 · 3 comments
Labels
ep:CUDA (issues related to the CUDA execution provider), ep:TensorRT (issues related to the TensorRT execution provider)

Comments

@eddieevt-DXC

eddieevt-DXC commented Sep 7, 2023

Describe the issue

I'm working with a model from SageMaker (ResNet50 640x640, input shape [1, -1, -1, 3]) converted to ONNX. When trying to get more performance out of it by converting it to FP16, the conversion succeeds, but trying to run the model gives this error:

E0907 08:27:25.823138 1379 model_lifecycle.cc:626] failed to load 'sagemaker' version 1: Internal: onnx runtime error 1: 
Load model from /models/sagemaker/1/model.onnx failed:Node (StatefulPartitionedCall/map/while_loop) Op (Loop) [TypeInferenceError] 
Graph attribute inferencing failed: Node (Resize__59) Op (Resize) [ShapeInferenceError] 
Either `sizes` or `scales` must be provided, but not both of them

Trying mixed precision instead fails during shape inference:

Traceback (most recent call last):
  File "/workspace/fp-16-onnx-converter.py", line 15, in <module>
    model_fp16 = auto_mixed_precision.auto_convert_mixed_precision(model, input_feed, rtol=0.01, atol=0.001, keep_io_types=True)
  File "/usr/local/lib/python3.10/dist-packages/onnxconverter_common/auto_mixed_precision.py", line 80, in auto_convert_mixed_precision
    if not run_attempt(node_names):
  File "/usr/local/lib/python3.10/dist-packages/onnxconverter_common/auto_mixed_precision.py", line 72, in run_attempt
    res1 = get_tensor_values_using_ort(model, feed_dict)
  File "/usr/local/lib/python3.10/dist-packages/onnxconverter_common/auto_mixed_precision.py", line 132, in get_tensor_values_using_ort
    sess = ort.InferenceSession(model.SerializeToString(), sess_options, providers=['CUDAExecutionProvider'])
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 426, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (StatefulPartitionedCall/map/while_loop) Op (Loop) [TypeInferenceError] Graph attribute inferencing failed: Node (Resize__59) Op (Resize) [ShapeInferenceError] Either `sizes` or `scales` must be provided, but not both of them

The latest shape inference script from GitHub gives the same error. I'm not sure where to post this issue, as multiple parts of the ONNX stack seem to be involved and not working.
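For reference, this is roughly how I invoked shape inference. I used the symbolic shape inference helper that ships with onnxruntime (the standalone symbolic_shape_infer.py script from the repo takes the same arguments), so adjust if you use a different script:

# Rough sketch of the shape inference invocation (assumes the
# onnxruntime.tools.symbolic_shape_infer module is available in the
# installed onnxruntime package).
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

model = onnx.load("./model_fp16.onnx")
# auto_merge resolves conflicting symbolic dims instead of failing on them
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
onnx.save(inferred, "./model_fp16_shaped.onnx")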

Linking my onnxconverter-common issue here - #266.

To reproduce

FP16:

import onnx
from onnxconverter_common import float16

# Load the FP32 ONNX model and convert all float tensors to FP16
model = onnx.load("./model.onnx")
model_fp16 = float16.convert_float_to_float16(model)
# Validate and save the converted model
onnx.checker.check_model(model_fp16)
onnx.save(model_fp16, "./model_fp16.onnx")

or mixed precision:

from onnxconverter_common import auto_mixed_precision
import onnx
import numpy as np

# Dummy UINT8 image input matching the model's [1, -1, -1, 3] input signature
input_feed = {"input_tensor": np.random.randint(0, 255, size=(1, 230, 150, 3), dtype=np.uint8)}

# Convert as much of the model as possible to FP16 while keeping the outputs
# within rtol/atol of the FP32 results
model = onnx.load("./model.onnx")
model_fp16 = auto_mixed_precision.auto_convert_mixed_precision(model, input_feed, rtol=0.01, atol=0.001, keep_io_types=True)
onnx.save(model_fp16, "./model_mixed.onnx")

Yes, the model inputs are UINT8. I don't know why, but that breaks TensorRT acceleration and conversion too. Considering the difference is mainly that FP32 is used for normalized data while UINT8 is raw pixel data, there shouldn't be that big of a difference. I can understand TRT not working, but the CUDA provider should work and it doesn't.
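One thing I might try next is keeping the offending ops in FP32 during conversion. A rough sketch, assuming convert_float_to_float16 supports the op_block_list and keep_io_types arguments as documented in onnxconverter-common, and assuming Resize/Loop/NonMaxSuppression are the ops that misbehave:

import onnx
from onnxconverter_common import float16

model = onnx.load("./model.onnx")
# Keep ops that tend to break under FP16 in FP32 (assumption: these are the
# culprits here), and leave the UINT8/FP32 graph inputs/outputs untouched.
model_fp16 = float16.convert_float_to_float16(
    model,
    keep_io_types=True,
    op_block_list=["Resize", "Loop", "NonMaxSuppression"],
)
onnx.checker.check_model(model_fp16)
onnx.save(model_fp16, "./model_fp16_blocklist.onnx")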

Urgency

Urgent.

A whole pipeline is built around training these models and deploying them, with all the supporting applications, for a client.
The model is slower than necessary though, and these optimizations would have helped. Model choice is also limited, as the training pipeline is in SageMaker.

Platform

Linux

OS Version

Ubuntu 20.04 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

11.8

@skottmckay
Contributor

Can you share the original model? It's not the CUDA execution provider that is failing, it's the ONNX validation of the model during loading.

Graph attribute inferencing failed: Node (Resize__59) Op (Resize) [ShapeInferenceError]
Either sizes or scales must be provided, but not both of them

The Resize spec says the node can provide either the sizes or the scales optional input, but not both, so the model is invalid as it does not conform to the ONNX spec. https://github.com/onnx/onnx/blob/main/docs/Operators.md#Resize
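If you want to narrow it down before sharing the model, something along these lines should list Resize nodes (including ones inside the Loop body) that have both optional inputs populated - just a rough sketch using the onnx python API:

import onnx

def find_bad_resize(graph, path=""):
    # Walk the graph and any subgraphs (e.g. the Loop body) looking for Resize
    # nodes that provide both 'scales' (input 3) and 'sizes' (input 4);
    # an empty input name means the optional input is not provided (opset 11+).
    for node in graph.node:
        if node.op_type == "Resize":
            scales = node.input[2] if len(node.input) > 2 else ""
            sizes = node.input[3] if len(node.input) > 3 else ""
            if scales and sizes:
                print(f"{path}{node.name}: both scales='{scales}' and sizes='{sizes}' are set")
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                find_bad_resize(attr.g, path + node.name + "/")
            elif attr.type == onnx.AttributeProto.GRAPHS:
                for subgraph in attr.graphs:
                    find_bad_resize(subgraph, path + node.name + "/")

model = onnx.load("./model_fp16.onnx")
find_bad_resize(model.graph)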

That makes the next question whether the original model was valid or not. What happens if you try to load the original model in onnxruntime? You don't need to run it - just create an InferenceSession with it and see if that succeeds.

@eddieevt-DXC
Author

eddieevt-DXC commented Sep 7, 2023

Sure, the model is SSD ResNet152 V1 FPN 640x640 (RetinaNet152). SageMaker works with TF Zoo models.

The original model converted to ONNX, with UINT8/FP32 inputs/outputs, works. Loading it in a session is fine, and I've run inference with it too. The FP16-converted one gives the same error when creating the session.

import onnxruntime

session = onnxruntime.InferenceSession('./model_fp16.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

This returns nothing for the full-precision model and the following for the FP16 one:

Traceback (most recent call last):
  File "onnx-runtime-test.py", line 3, in <module>
    session = onnxruntime.InferenceSession('./model_fp16.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
  File "/home/rootlab/triton_learning/yolo_v8/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/rootlab/triton_learning/yolo_v8/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 424, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ./model_fp16.onnx failed:Node (StatefulPartitionedCall/map/while_loop) Op (Loop) [TypeInferenceError] Graph attribute inferencing failed: Node (Resize__59) Op (Resize) [ShapeInferenceError] Either `sizes` or `scales` must be provided, but not both of them

@skottmckay
Contributor

Given the original model works, but the converted one is invalid, it appears the issue is with the converter creating an invalid model rather than ONNX Runtime. As such, your onnxconverter-common issue would be the place to follow up.
