
Fixes tensorrt cache being regenerated on path change #126

Merged 2 commits on Jul 25, 2022

Conversation

@fran6co (Contributor) commented Jul 6, 2022

Onnxruntime uses the model path as the hash key for the TensorRT cache; passing the model as a binary blob avoids this, so the cache works as intended.

Fixes triton-inference-server/server#4587

@GuanLuo (Contributor) commented Jul 19, 2022

@pranavsharma can you or someone familiar with the TRT EP take a look at this PR? The change looks fine to me; I just want to double check that the cache behavior will change as intended.

@pranavsharma (Contributor) commented

@stevenlix - any comments? Is this how we avoid TRT cache regeneration?

@stevenlix commented

Yes. ORT-TRT uses the ModelMetadefIdGenerator function to generate the engine id. If the model passed in has a path, the id is generated from the path string (https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/core/framework/execution_provider.cc#L151). To avoid the dependency on the path, one can pass the model binary to the inference session.
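As a toy illustration of the behavior described above (this is a hypothetical simplification, not ORT's actual ModelMetadefIdGenerator algorithm): when the id is derived from the path string, the same model bytes deployed at two different paths get two different cache ids, while hashing the bytes themselves yields a stable, path-independent id.

```python
import hashlib

def engine_id(model_path=None, model_bytes=None):
    """Toy analogue of an engine-id generator (hypothetical, simplified):
    hash the path string when one is available, otherwise hash the
    model bytes themselves."""
    if model_path is not None:
        return hashlib.sha256(model_path.encode()).hexdigest()[:8]
    return hashlib.sha256(model_bytes).hexdigest()[:8]

blob = b"onnx-model-bytes"
# The same model at two deployment paths gets two different cache ids:
print(engine_id("/models/v1/model.onnx") != engine_id("/models/v2/model.onnx"))  # True
# Hashing the bytes gives a stable id regardless of where the file lives:
print(engine_id(model_bytes=blob) == engine_id(model_bytes=blob))  # True
```

This is why the PR switches from passing a file path to passing the serialized model: the id input no longer changes when the model directory moves.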

@GuanLuo requested review from Tabrizian and tanmayv25 on July 20, 2022
@cnegron-nv commented
@pranavsharma Can you give a final review to this PR?

pranavsharma previously approved these changes Jul 20, 2022
tanmayv25 previously approved these changes Jul 21, 2022
@fran6co fran6co dismissed stale reviews from tanmayv25 and pranavsharma via c062f96 July 21, 2022 07:18
@Tabrizian Tabrizian merged commit 05e0383 into triton-inference-server:main Jul 25, 2022
Tabrizian added a commit that referenced this pull request Jul 26, 2022
@Tabrizian (Member) commented

Hi @fran6co, @pranavsharma, @stevenlix

We are observing a CI failure as a result of this change. It looks like relative paths no longer work for models that expect additional weight files. The test fails with the error below:

Deserialize tensor embeddings.20.weight failed. open file "./embeddings.20.weight" failed: No such file or directory

Do you have a fix in mind for this? Otherwise, I think we need to revert this change.
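A minimal, self-contained sketch of why byte-only loading can break relative external-weight references (the `resolve_external_weight` helper is hypothetical, not an ORT API): with a model path, a loader can resolve `"./embeddings.20.weight"` against the model's directory, but with only bytes there is no model directory, so the reference falls back to the process's working directory and the file is not found.

```python
import os
import tempfile

def resolve_external_weight(weight_ref, model_path=None):
    """Hypothetical loader behavior: resolve a relative weight reference
    against the model file's directory when the path is known,
    otherwise against the current working directory."""
    base = os.path.dirname(model_path) if model_path else os.getcwd()
    return os.path.normpath(os.path.join(base, weight_ref))

with tempfile.TemporaryDirectory() as d:
    weight_file = os.path.join(d, "embeddings.20.weight")
    open(weight_file, "wb").close()  # create an empty sibling weight file
    # Loading by path: the weight file is found next to the model.
    found = resolve_external_weight("./embeddings.20.weight",
                                    os.path.join(d, "model.onnx"))
    print(os.path.exists(found))    # True
    # Loading from bytes only: no model directory is known, so the
    # reference resolves against cwd and the weight file is missing.
    missing = resolve_external_weight("./embeddings.20.weight")
    print(os.path.exists(missing))
```

This matches the error above: the weight file exists beside the model on disk, but the byte-stream session has no base directory to resolve `"./embeddings.20.weight"` from.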

@pranavsharma (Contributor) commented

I've reached out to @stevenlix. What's the latest you need this fix by? If it's blocking the release, feel free to revert it.

@Tabrizian (Member) commented Jul 27, 2022

Thanks for the quick response! We are code freezing at the end of next week, but it would be great to have the fix ready for review by Wednesday next week so that we can test it before the freeze.

@pranavsharma (Contributor) commented

> Hi @fran6co, @pranavsharma, @stevenlix
>
> We are observing a failure in the CI as a result of this change. Looks like for models that expect additional weight files the relative paths do not work properly anymore. The test fails with the error below:
>
> Deserialize tensor embeddings.20.weight failed. open file "./embeddings.20.weight" failed: No such file or directory

fyi @jywu-msft

@pranavsharma (Contributor) commented

Let's revert this at this point: using a byte stream won't work with external weight files, so this PR will keep failing.

@bamdadd commented Sep 6, 2022

Hi, @Tabrizian Do you plan to revert the revert anytime soon?

@Tabrizian (Member) commented

The underlying issue has not been fixed yet, so I don't think there is any plan to bring this change back.

CC @pranavsharma

Successfully merging this pull request may close these issues.

ONNXRuntime TensorRT cache gets regenerated every time a model is uploaded even with correct settings