
TensorRT convert-to-uff not running under Tensorflow 2.2 #774

Closed

fengye opened this issue Jan 28, 2021 · 14 comments

fengye (Contributor) commented Jan 28, 2021

When I followed the docs to convert a .h5 model to a .uff file for TensorRT, I hit an error:
AttributeError: module 'tensorflow' has no attribute 'gfile'

It's essentially the same issue reported here: https://forums.developer.nvidia.com/t/deepstream-object-detector-ssd-cannot-convert-to-uff-file-in-ds5-0ga/145820

After a bit of investigation I found that the latest TensorRT 7.2.2 is only compatible with TensorFlow 1.15:
https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-7.html#rel_7-2-2

So my workaround is to create another conda environment with TF 1.15 installed:

conda create -n donkey_tf1 python=3.7
conda activate donkey_tf1
# This will install TensorFlow 1.15
pip install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 'tensorflow<2'
# Install TensorRT 7.2.2.3, the tarball file method
# https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-tar
# Assuming ${TensorRT-7.2.2.3-Dir} is the directory you untarred to
cd ${TensorRT-7.2.2.3-Dir}/python
pip install tensorrt-7.2.2.3-cp37-none-linux_x86_64.whl
cd ${TensorRT-7.2.2.3-Dir}/uff
pip install uff-0.6.9-py2.py3-none-any.whl
cd ${TensorRT-7.2.2.3-Dir}/graphsurgeon
pip install graphsurgeon-0.4.5-py2.py3-none-any.whl
cd ${TensorRT-7.2.2.3-Dir}/onnx_graphsurgeon
pip install onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl

# Now we can use convert-to-uff, but it has to run with the Python environment conda provides
~/miniconda3/envs/donkey_tf1/bin/convert-to-uff mypilot.pb

NOTE: if a user wants to convert their model to TensorRT, the current dev branch also has an issue with freezing the model, which has to happen before convert-to-uff is called. PR is here: #773
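
For reference, a minimal sketch of that freeze step (this is an assumption of how it could be done by hand under TF 1.15, not the donkeycar code from the PR above; file names are placeholders):

# Hand-rolled freeze of a Keras .h5 into a frozen graph .pb under TF 1.15,
# so that convert-to-uff can consume it. Paths/names are placeholders.
import tensorflow as tf

tf.keras.backend.set_learning_phase(0)                 # inference mode
model = tf.keras.models.load_model('mypilot.h5')       # your trained Keras model
sess = tf.keras.backend.get_session()
output_names = [out.op.name for out in model.outputs]  # graph output node names
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)
tf.io.write_graph(frozen, '.', 'mypilot.pb', as_text=False)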

I guess this either needs to be addressed or documented. TensorRT has great potential and it's a pity it is not well supported.

cloud-rocket (Contributor) commented

I started working on this - but never finished.

If somebody wants a reference: dev...cloud-rocket:add-tf2-tensorrt-support

tikurahul (Collaborator) commented

https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usingtftrt

The conversion process is a bit different starting with TF 2.x.
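
For reference, the TF 2.x route from that guide works on a SavedModel rather than a frozen .pb/.uff; a minimal sketch with the TF 2.2-era API (directory names are placeholders):

# Convert a SavedModel with TF-TRT under TF 2.x; no .uff is involved.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='mypilot_savedmodel',   # placeholder
    conversion_params=params)
converter.convert()
converter.save('mypilot_trt')                     # load this at drive time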

fengye (Contributor, Author) commented Feb 6, 2021

Thanks for the link. I'll take a look when I get some spare time.

fengye (Contributor, Author) commented Jul 17, 2021

It looks like this issue has been addressed by #884, @DocGarbanzo? I haven't merged the latest dev branch, but if so, feel free to close the issue. Thanks!

fengye (Contributor, Author) commented Jul 17, 2021

Seems to be working, although loading (and compiling on the fly?) a .trt model with --type=tensorrt_linear is painfully slow on my Nano. Closing this issue.

fengye closed this as completed Jul 17, 2021
DocGarbanzo (Contributor) commented

@fengye - Great that you tested on the Nano. Yes, the on-the-fly graph optimisation is slow, but it is only called on the first inference. I saw it being very slow on the RPi, but once it finished, the fps rates were close to tflite - about 10% slower. You can also use it on your PC (it should work on all OSes, though I can only confirm Ubuntu and OSX). Creating the TensorRT graph on the target architecture should allow the best optimisation for the respective hardware - at least that's my understanding.

fengye (Contributor, Author) commented Jul 18, 2021

> @fengye - Great that you tested on the Nano. Yes, the on-the-fly graph optimisation is slow, but it is only called on the first inference. I saw it being very slow on the RPi, but once it finished, the fps rates were close to tflite - about 10% slower. You can also use it on your PC (it should work on all OSes, though I can only confirm Ubuntu and OSX). Creating the TensorRT graph on the target architecture should allow the best optimisation for the respective hardware - at least that's my understanding.

Sorry for not being specific, I was referring to the loading time. Once the trt model is up and running, the average time spent on the pilot is around 40ms, and that's not slow at all. Is there a way to avoid this on-the-fly optimisation and serialise the optimised model to disk for TensorRT?

DocGarbanzo (Contributor) commented

Ok, 40ms on the Nano is quite slow, as this equates to 25Hz. Even on the RPi 4 I'm seeing more like 50Hz; I would expect at least 100Hz on the Nano. There is a way to generate the TensorRT model already during training, but afaik it won't be optimised for the target platform. Do you want to have a look at this and try it out (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html)? We can integrate it afterwards if it works.
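
For reference, that guide also describes pre-building the engines at conversion time with converter.build(), so they are serialised into the SavedModel instead of being built on the first inference. A minimal sketch (paths are placeholders, the image shape is the standard 120x160x3), with the caveat from above that the engines are then tuned for the machine doing the conversion rather than for the Nano:

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir='mypilot_savedmodel')
converter.convert()

def input_fn():
    # One dummy batch with the expected input shape, used to build the engines.
    yield (np.zeros((1, 120, 160, 3), dtype=np.float32),)

converter.build(input_fn=input_fn)   # build the TRT engines now, on this machine
converter.save('mypilot_trt')        # engines are saved along with the model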

TCIII (Contributor) commented Jul 18, 2021

@fengye,

Jetpack 4.5.1 includes TensorRT 7.1.3. Why are you running TensorRT 7.2.2?

fengye (Contributor, Author) commented Jul 18, 2021 via email

fengye (Contributor, Author) commented Jul 18, 2021

> Ok, 40ms on the Nano is quite slow, as this equates to 25Hz. Even on the RPi 4 I'm seeing more like 50Hz; I would expect at least 100Hz on the Nano. There is a way to generate the TensorRT model already during training, but afaik it won't be optimised for the target platform. Do you want to have a look at this and try it out (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html)? We can integrate it afterwards if it works.

I might need to double-check the numbers. If memory serves right, my Pi 4 ran at about 10Hz on tflite models and the Nano at about 20-30Hz on trt models in 5W mode, back when I was still exporting .uff files. Either way, it's the loading time that bothers me: it takes about 5 minutes to load and compile a trt model on the Nano every time I start an autopilot drive. I have some work to do on new parts code first, then I'll come back and see if there's anything I can do with the TF-TRT integration.

TCIII (Contributor) commented Jul 18, 2021

@fengye,

NVIDIA has always been behind the curve when it comes to updating their compatibility chart.

DocGarbanzo (Contributor) commented

> Ok, 40ms on the Nano is quite slow, as this equates to 25Hz. Even on the RPi 4 I'm seeing more like 50Hz; I would expect at least 100Hz on the Nano. There is a way to generate the TensorRT model already during training, but afaik it won't be optimised for the target platform. Do you want to have a look at this and try it out (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html)? We can integrate it afterwards if it works.
>
> I might need to double-check the numbers. If memory serves right, my Pi 4 ran at about 10Hz on tflite models and the Nano at about 20-30Hz on trt models in 5W mode, back when I was still exporting .uff files. Either way, it's the loading time that bothers me: it takes about 5 minutes to load and compile a trt model on the Nano every time I start an autopilot drive. I have some work to do on new parts code first, then I'll come back and see if there's anything I can do with the TF-TRT integration.

There is a profile.py script that you can run independently of the car loop. You should get > 50fps on the RPi for the standard linear model in tflite with the standard image size (120x160x3), otherwise your CPU might be throttled. For consistency, you should see the matching execution time (i.e. 1000 / fps, in ms) in the car loop when looking at the performance data that is printed when you stop the car. If you don't see this, then something is wrong with your install.
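
For a quick sanity check outside both profile.py and the car loop, a hand-rolled timing loop like the sketch below can be used (this is not profile.py, just an illustration of the fps/ms relationship; the model path is a placeholder):

import time
import numpy as np
import tensorflow as tf

# Time raw tflite inference and report both fps and ms per call.
interpreter = tf.lite.Interpreter(model_path='mypilot.tflite')  # placeholder
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
dummy = np.zeros(inp['shape'], dtype=inp['dtype'])

n = 200
start = time.time()
for _ in range(n):
    interpreter.set_tensor(inp['index'], dummy)
    interpreter.invoke()
elapsed = time.time() - start
print(f'{n / elapsed:.1f} fps, {1000 * elapsed / n:.2f} ms per inference')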

fengye (Contributor, Author) commented Jul 18, 2021

> Ok, 40ms on the Nano is quite slow, as this equates to 25Hz. Even on the RPi 4 I'm seeing more like 50Hz; I would expect at least 100Hz on the Nano. There is a way to generate the TensorRT model already during training, but afaik it won't be optimised for the target platform. Do you want to have a look at this and try it out (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html)? We can integrate it afterwards if it works.
>
> I might need to double-check the numbers. If memory serves right, my Pi 4 ran at about 10Hz on tflite models and the Nano at about 20-30Hz on trt models in 5W mode, back when I was still exporting .uff files. Either way, it's the loading time that bothers me: it takes about 5 minutes to load and compile a trt model on the Nano every time I start an autopilot drive. I have some work to do on new parts code first, then I'll come back and see if there's anything I can do with the TF-TRT integration.
>
> There is a profile.py script that you can run independently of the car loop. You should get > 50fps on the RPi for the standard linear model in tflite with the standard image size (120x160x3), otherwise your CPU might be throttled. For consistency, you should see the matching execution time (i.e. 1000 / fps, in ms) in the car loop when looking at the performance data that is printed when you stop the car. If you don't see this, then something is wrong with your install.

There are definitely some gaps between profile.py and a manage.py drive (camera loop at 30Hz). For my Nano in 5W mode, profile.py gives 60fps for a trt model, while an actual drive run gives 30ms (33fps) on average for the KerasLinear part. For some reason, the PWM steering and PWM throttle parts are taking way too much time, and that's probably what lowers the overall model inference numbers, depending on how the code profiles the parts (whether it takes the user mode into account, and whether it profiles the main thread or the actual worker thread; I haven't looked into it yet).

Anyway, given the profiler's numbers when running the pure model, I don't have much of a problem with it for now. It's the loading time that slows down the development iteration.
