
Support for dynamic input size #506

Closed
meremeev opened this issue Feb 19, 2021 · 10 comments

@meremeev
Contributor

Is there any possibility of generating a TensorRT engine with a dynamic input size?
If not, do you have any plans to provide this functionality, or ideas on how to approach it?

@jaybdub
Contributor

jaybdub commented Feb 19, 2021

Hi @meremeev ,

Thanks for reaching out!

I imagine with enough work this may be possible but I'd have to investigate what changes are necessary. I haven't personally spent much time exploring this feature because most of the embedded systems use cases target static shapes.

I'd have to dig into it a bit more to get back with a meaningful answer.

Do you mind sharing your use case for dynamic shapes? I'm curious to understand the motivation for the feature.

Best,
John

@meremeev
Contributor Author

meremeev commented Feb 20, 2021

Hi John,

Luminar is a LiDAR company, but in addition to hardware we provide a software SDK. Part of the SDK's functionality is a semantic segmentation model. Depending on the scan pattern settings, the size of the point cloud can differ, so the only way to support this flexibility is to have a model that can handle point clouds of different sizes.

Aside from our use case, embedded systems are a very large domain. They cover everything from simple, low-cost, single-function devices (e.g. a doorbell with face recognition) to very complex, multi-functional devices with substantial computational resources (e.g. a self-driving autopilot). For such systems it is essential to have flexibility in the format/size of the input/sensor data.

Another factor is the problem domain. In image recognition/object detection, the input is usually a fixed-size image. But areas such as sequence analysis, voice recognition, motion detection, movie analysis, etc. have a dimension for which size flexibility is very important.
So if you see torch2trt as a universal solution for converting Torch models to TensorRT, support for dynamic sizes is essential.

And I think something like the following might work (a rough sketch of the builder side follows the list).

  1. Extend the conversion entry point with a way to provide additional info about the dynamic dimensions of each input, and min/opt/max values for each such dimension, to build a TensorRT optimization profile, e.g. model_trt = torch2trt(model, [x], dynamic_sizes=[{0: (1, 10, 100)}]).
  2. Add an argument to specify TensorRT builder flags, particularly implicit vs. explicit batch. I believe the nvinfer1::IPluginV2DynamicExt interface works only with explicit batch (the V2 interfaces). Right now you build an implicit-batch network by default.
  3. When building the engine, mark the requested dimensions as dynamic (-1) and provide the optimization profile.

I am considering making these changes but would like to discuss them first.
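A minimal sketch of steps 2 and 3 with the standard TensorRT Python API, assuming an explicit-batch network and a single profile (the input name and shapes below are illustrative, and dynamic_sizes is the proposed argument, not an existing one):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# explicit-batch network, required for dynamic shapes
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# ... populate the network from the traced torch model, declaring the
# dynamic dimension of each input as -1 ...

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min/opt/max extents for dynamic dim 0, matching dynamic_sizes=[{0: (1, 10, 100)}]
profile.set_shape('input_0', (1, 3, 224, 224), (10, 3, 224, 224), (100, 3, 224, 224))
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)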

@meremeev
Contributor Author

meremeev commented Feb 20, 2021

Another API that would be very useful is one to serialize the TensorRT engine and save it to a file, which would let us load it into a C/C++ application later. Right now we do this in a slightly hacky way.

@jaybdub
Contributor

jaybdub commented Feb 24, 2021

Hi @meremeev ,

Thanks for your reply, you raise some interesting use cases!

Regarding dynamic shapes

I've done some more research on what might be possible, but I'm not yet able to assess the impact of this feature / whether we can safely integrate it here. Currently, I understand that some converters (e.g. interpolation) will require adjustment to ensure they handle dynamic shapes appropriately. Our current test cases may not reveal this, since we use the same shape for building / testing.

Another note is that TensorRT allows for multiple optimization profiles (to cover multiple input shape ranges). This adds complexity and introduces some nuanced limitations (e.g. INT8 calibration applies to only one profile). For your use case, do most of the tensor shapes fall within a continuous range, or multiple ranges? I'm trying to assess whether there is a tangible benefit to using multiple profiles, or if it's best to just support one profile with multiple engines (if necessary).
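For reference, registering multiple profiles with the TensorRT Python API looks roughly like this (builder and config as in the sketch above; the input name and shapes are illustrative):

p_small = builder.create_optimization_profile()
p_small.set_shape('input_0', (1, 3, 64, 64), (4, 3, 64, 64), (8, 3, 64, 64))
p_large = builder.create_optimization_profile()
p_large.set_shape('input_0', (8, 3, 256, 256), (16, 3, 256, 256), (32, 3, 256, 256))
config.add_optimization_profile(p_small)  # becomes profile index 0
config.add_optimization_profile(p_large)  # becomes profile index 1
# at inference time the execution context must select one profile, e.g.:
# context.active_optimization_profile = 1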

Also, out of curiosity, have you explored the ONNX->TensorRT workflow for your purposes? This supports dynamic shapes, but perhaps has other limitations (which I'm interested to understand if this was the case for you).

Regarding serialization for C++

Good point, I'm not sure yet if an API is needed for this, but we definitely need to at least add instructions for this to our documentation.

Is this the solution you used?

with open('model.engine', 'wb') as f:
    f.write(model_trt.engine.serialize())
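The matching deserialization step (shown in Python; the C++ side uses nvinfer1::createInferRuntime and deserializeCudaEngine) would be roughly:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open('model.engine', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())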

Best,
John

@meremeev
Contributor Author

Hi John,

For my use case I need only one dynamic dimension, with one range / one profile. I agree, dynamic size support is a serious rework, and there could be some problems to resolve.

As far as I know, TensorRT does not like multiple dynamic dimensions for the same tensor; it gives a performance warning.
I think the idea behind multiple profiles is to provide a way to build multiple engines from the same network. But if we convert the model ourselves, we can always convert it multiple times.

Our current conversion pipeline uses the Torch->ONNX->TensorRT path with a dynamic input size. But this path has some problems I hope to avoid by using torch2trt: Torch->ONNX does not support some operations, has type restrictions, I cannot parameterize custom kernels, etc.
Actually, I am not sure torch2trt supports those ops either, because I have already converted them to custom kernels. Something to try.

But the major problem comes from ONNX format compatibility. Torch 1.6 uses ONNX IR version 0.0.6, which is compatible with TensorRT 7, but conversion for TensorRT 6 requires IR 0.0.3, i.e. Torch 1.2 or 1.3. So I want to find a more direct conversion path without extra layers.
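For reference, the dynamic-size export on the Torch->ONNX leg of that pipeline looks roughly like the following (model, x, and the axis names are illustrative):

import torch

# mark dim 0 of the input/output as dynamic in the exported ONNX graph
torch.onnx.export(model, x, 'model.onnx',
                  input_names=['points'], output_names=['labels'],
                  dynamic_axes={'points': {0: 'num_points'},
                                'labels': {0: 'num_points'}})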

Yes, we use exactly the same code to serialize the TensorRT engine.

Thanks,
Mark

@jaybdub
Contributor

jaybdub commented Feb 26, 2021

Hi @meremeev ,

It seems like supporting just one dynamic range may be sufficient (or even preferred), running torch2trt multiple times if needed. The only potential downside I see is memory overhead from duplicating weights, but if this proves to be an issue it could be addressed later. I may explore this feature more soon, but I still can't make any guarantees. If you happen to experiment / discover more, I'm curious to hear.

Thanks for sharing your experience with ONNX. You might find the following helpful for your purposes

  1. Since this PR, torch2trt will allow you to attach converters to user-defined methods. Instructions are currently in the PR. This will allow you to apply conversion at any level you desire. You could use this to implement your custom layer with native TensorRT layers, or with your own plugin layer (a minimal converter sketch follows this list).
  2. We have a couple of plugin examples here. These current plugins simply wrap the torch C++ calls. We haven't fully streamlined this process, but you may be able to model your plugin off of these. They use torch mechanisms for serialization, and allow for parameterization directly in Python by passing torch tensors. This approach of wrapping torch calls is relatively simple, but perhaps not optimal for memory / performance reasons (many torch calls don't allow for in-place execution, so will incur a tensor copy overhead). Also, this will pull in the torch binaries, which you may/may not find acceptable. If you've defined your own kernel, you can still develop / parameterize a plugin and use it with torch2trt without using this torch wrapping trick, it will just take more work.
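As an illustration of (1), a converter registered with torch2trt's converter decorator might look like this minimal sketch (the choice of torch.nn.functional.relu is just an example):

import tensorrt as trt
from torch2trt import tensorrt_converter

@tensorrt_converter('torch.nn.functional.relu')
def convert_relu(ctx):
    # ctx exposes the intercepted call: its arguments, its return
    # value, and the TensorRT network under construction
    input = ctx.method_args[0]
    output = ctx.method_return
    layer = ctx.network.add_activation(input._trt, trt.ActivationType.RELU)
    output._trt = layer.get_output(0)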

I've considered streamlining this process, which I may re-explore if it proves beneficial. For now, hopefully you find the above information helpful.

Best,
John

@MatthieuToulemont

MatthieuToulemont commented Jul 27, 2021

I don't know if it is appropriate to mention it here, but depending on the set of operations you use, you might be able to do this with TRTorch.

To be more precise, it will work if your model has a UNet-like architecture for which the upsampling factor is always the same (e.g. times 2).

It is a bit more opaque than this repo but works very well for traditional CNN architectures.

Best,
Matthieu

@meremeev
Contributor Author

meremeev commented Jul 28, 2021 via email

@jaybdub jaybdub closed this as completed Jul 18, 2022
@jihad-akl

Hi, does torch2trt now support custom dynamic input sizes?

@meremeev
Contributor Author

meremeev commented Apr 28, 2023 via email
