[ONNXRuntime] Update inference_performance_optimization.md #1621

Merged · 1 commit · May 3, 2022
23 changes: 23 additions & 0 deletions docs/development/inference_performance_optimization.md
@@ -142,3 +142,26 @@ TVM internally leverages full hardware resource. Based on our experiment, settin
```bash
export TVM_NUM_THREADS=1
```

### ONNXRuntime

#### Thread configuration

You can use the following settings for thread optimization in `Criteria`:

```
.optOption("interOpNumThreads", <num_of_thread>)
.optOption("intraOpNumThreads", <num_of_thread>)
```
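
For context, a minimal sketch of where these options sit in a full `Criteria` definition. It assumes a local ONNX model loaded with `NDList` input/output; the model path is a placeholder, and error handling is omitted.

```java
import java.nio.file.Paths;

import ai.djl.inference.Predictor;
import ai.djl.ndarray.NDList;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class OrtThreadConfig {

    public static void main(String[] args) throws Exception {
        Criteria<NDList, NDList> criteria =
                Criteria.builder()
                        .setTypes(NDList.class, NDList.class)
                        .optEngine("OnnxRuntime")
                        // Placeholder path: point this at your own .onnx file.
                        .optModelPath(Paths.get("build/model/model.onnx"))
                        // Baseline: single-threaded operators, tune from here.
                        .optOption("interOpNumThreads", "1")
                        .optOption("intraOpNumThreads", "1")
                        .build();

        try (ZooModel<NDList, NDList> model = criteria.loadModel();
             Predictor<NDList, NDList> predictor = model.newPredictor()) {
            // predictor.predict(...) runs inference with the configured session options.
        }
    }
}
```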

Tip: start by setting both options to 1 to establish a baseline.
Then set one of them to `total_cores / total_java_inference_threads` and measure how the performance changes.
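
As a worked example of the suggested ratio (the core count and thread count below are purely illustrative):

```java
// Illustrative numbers only; substitute your own hardware and thread-pool size.
int totalCores = Runtime.getRuntime().availableProcessors(); // e.g. 8 cores
int totalJavaInferenceThreads = 2;  // Java threads that call predictor.predict()
int numThreads = totalCores / totalJavaInferenceThreads;     // 8 / 2 = 4

// Applied to one of the two options, e.g.:
// .optOption("intraOpNumThreads", String.valueOf(numThreads))
```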

#### (GPU) TensorRT Backend

If you have TensorRT installed, you can try the TensorRT backend for ONNXRuntime by setting the following option in `Criteria`:

```
.optOption("ortDevice", "TensorRT")
```
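
For example, a sketch of how this option might be combined with the rest of a `Criteria` definition (same `NDList` types and placeholder model path as the thread-configuration sketch above):

```java
Criteria<NDList, NDList> criteria =
        Criteria.builder()
                .setTypes(NDList.class, NDList.class)
                .optEngine("OnnxRuntime")
                // Placeholder path: point this at your own .onnx file.
                .optModelPath(Paths.get("build/model/model.onnx"))
                // Ask the ONNXRuntime engine to run the model on the TensorRT backend.
                .optOption("ortDevice", "TensorRT")
                .build();
```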