TensorRT convert-to-uff not running under Tensorflow 2.2 #774
Comments
I started working on this but never finished. If somebody wants to have some reference: dev...cloud-rocket:add-tf2-tensorrt-support
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usingtftrt The conversion process is a bit different starting with TF 2.x.
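Roughly, the TF 2.x flow from that guide looks like the sketch below; there is no .uff step anymore, you convert a SavedModel directly. The directory names here are just placeholders, not anything in the donkeycar code:

```python
# Minimal TF 2.x TF-TRT sketch: load a SavedModel, convert it, and save the
# TRT-optimised SavedModel. Directory names are placeholders.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir='linear_savedmodel')
converter.convert()
converter.save('linear_trt_savedmodel')
```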
Thanks for the link. I'll take a look when I get some spare time.
It looks like this issue has been addressed by #884, @DocGarbanzo? I haven't merged the latest dev branch, but if so feel free to close the issue. Thanks!
Seems to be working, although loading (and compiling on the fly?) a .trt model with --type=tensorrt_linear is painfully slow on my Nano. Closing this issue.
@fengye - Great that you tested on the Nano. Yes, the on-the-fly graph optimisation is slow, but it is only called on the first inference. I saw it being very slow on the RPi too, but once it finished the fps rates were close to tflite - about 10% slower. You can also use it on your PC (it should work on all OSes, though I can only confirm Ubuntu and OSX). Creating the TensorRT graph on the target architecture should allow the best optimisation for the respective hardware - at least that's what I understood.
Sorry for not being specific, I was referring to the loading time. Once the trt model is up and running, the average time spent in the pilot is around 40ms, and that's not slow at all. Is there a way to avoid this on-the-fly optimisation and serialise the optimised model to disk for TensorRT?
Ok, 40ms on the Nano is quite slow, as this equates to 25Hz. Even on the RPi 4 I'm seeing more like 50Hz; I would expect at least 100Hz on the Nano. There is a way to generate the TensorRT model already during training, but afaik it won't be optimised for the target platform. Do you want to have a look at this and try it out (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html)? We can integrate it afterwards if it works.
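From skimming the guide, something like the sketch below might do it: calling build() with a representative input before save() is supposed to create the TensorRT engines at conversion time and serialise them with the SavedModel, instead of building them on the first inference. The directory names and the (1, 120, 160, 3) input shape are just placeholder assumptions, and I haven't verified how well the saved engines transfer between machines:

```python
# Sketch only: pre-build the TensorRT engines at conversion time so they are
# serialised with the SavedModel rather than built on the first inference.
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir='linear_savedmodel')
converter.convert()

def input_fn():
    # yield one (or more) representative inputs matching the model's signature
    yield (np.zeros((1, 120, 160, 3), dtype=np.float32),)

converter.build(input_fn=input_fn)   # builds the engines now, on this machine
converter.save('linear_trt_savedmodel')
```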
Jetpack 4.5.1 includes TensorRT 7.1.3. Why are you running TensorRT 7.2.2? |
No matter which TensorRT version it is, the compatibility page says it was tested with TensorFlow 1.15:
https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-7.html#rel_7-1-3
Cheers,
Ye
Might need to double-check the numbers. If memory serves right, my Pi 4 runs at about 10Hz on tflite models and the Nano runs at about 20-30Hz on trt models in 5W mode, from back when I still used to export .uff files. Either way, it's the loading time that bothers me. It takes about 5 minutes to load and compile a trt model on the Nano every time I start an autopilot drive. I have some work to do on new parts code, then I'll come back to see if there's anything I can do with the TF-TRT integration.
NVIDIA has always been behind the curve when it comes to updating their compatibility chart. |
There is a
There are definitely some gaps I can see between profiler.py and a manage.py drive (camera loop at 30Hz). In 5W mode on my Nano, profiler.py gives 60fps for a trt model, while an actual drive run gives 30ms (33fps) on average for the KerasLinear part. For some reason the PWM steering and PWM throttle parts are taking way too much time, and that's probably what lowers the overall model inference performance, depending on how the code profiles the parts (whether it takes the user mode into account, whether it profiles the main thread or the actual worker thread - I haven't looked into it yet). Anyway, given the performance of the profiler running the pure model, I don't have too much of a problem with it for now. It's the loading time that slows down the development iteration.
When I followed the docs to convert a .h5 model to a .uff file for TensorRT, I hit an error:
AttributeError: module 'tensorflow' has no attribute 'gfile'
It's essentially the same issue this guy hit: https://forums.developer.nvidia.com/t/deepstream-object-detector-ssd-cannot-convert-to-uff-file-in-ds5-0ga/145820
After a bit of investigation I found that the latest TensorRT 7.2.2 is only compatible with TensorFlow 1.15:
https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-7.html#rel_7-2-2
So my workaround is to create another conda environment with TF 1.15 installed:
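Roughly like this (treat the exact Python/TF versions as assumptions to be matched against your TensorRT release, and the env name is arbitrary):

```bash
# create a separate TF 1.15 environment just for the conversion step
conda create -n tf115 python=3.6
conda activate tf115
pip install tensorflow-gpu==1.15
```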
NOTE: if a user wants to convert their model to TensorRT, the current dev branch also has an issue freezing the model before calling convert-to-uff. The PR is here: #773. I guess this either needs to be addressed or documented. TensorRT has great potential and it's a shame it's not better supported.