Internally, every device "translates" the value of the hint to the actual performance settings.
For example, `ov::hint::PerformanceMode::THROUGHPUT` selects the number of CPU or GPU streams.
For the GPU, the optimal batch size is additionally selected and [automatic batching](../OV_Runtime_UG/automatic_batching.md) is applied whenever possible (if the device supports it; refer to the [devices/features support matrix](./supported_plugins/Device_Plugins.md)).

The resulting (device-specific) settings can be queried back from the instance of the `ov::CompiledModel`.
Notice that `benchmark_app` outputs the actual settings for the THROUGHPUT hint; see the bottom of the output example:

```
$ benchmark_app -hint tput -d CPU -m 'path to your favorite model'
...
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ] { PERFORMANCE_HINT , THROUGHPUT }
...
[ INFO ] { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 4 }
[ INFO ] { NUM_STREAMS , 4 }
...
```
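
The same value can also be queried programmatically. Below is a minimal C++ sketch (the `model.xml` path is a placeholder for your model):

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    // "model.xml" is a placeholder path to your model.
    auto model = core.read_model("model.xml");
    auto compiled_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    // Read back the device-specific setting the hint was translated to,
    // mirroring the OPTIMAL_NUMBER_OF_INFER_REQUESTS line of benchmark_app.
    std::cout << compiled_model.get_property(ov::optimal_number_of_infer_requests)
              << std::endl;
    return 0;
}
```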

## Using the Performance Hints: Basic API
In the example below, `ov::hint::PerformanceMode::THROUGHPUT` is specified for the `ov::hint::performance_mode` property of `compile_model`.
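A minimal C++ sketch of the call, assuming an `ov::Core` instance (`core`) and an already-read model (`model`):

```cpp
// Sketch: request the high-level THROUGHPUT behavior and let the device
// translate it into its low-level settings (streams, batching, etc.).
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
```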
Using the hints assumes that the application queries the `ov::optimal_number_of_infer_requests` to create and run the returned number of requests simultaneously.
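A minimal C++ sketch of that pattern; `run_in_parallel` is an illustrative helper, not an OpenVINO API, and input population is elided:

```cpp
#include <openvino/openvino.hpp>
#include <vector>

// Illustrative helper (not an OpenVINO API): create the device-suggested
// number of requests and run them all in parallel.
void run_in_parallel(ov::CompiledModel& compiled_model) {
    auto nireq = compiled_model.get_property(ov::optimal_number_of_infer_requests);
    std::vector<ov::InferRequest> requests;
    for (uint32_t i = 0; i < nireq; ++i)
        requests.push_back(compiled_model.create_infer_request());
    // Input population is elided here (e.g., via set_input_tensor).
    for (auto& request : requests)
        request.start_async();
    for (auto& request : requests)
        request.wait();
}
```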

While an application is free to create more requests if needed (for example, to support asynchronous inputs population), **it is very important to run at least the `ov::optimal_number_of_infer_requests` of the inference requests in parallel**, for efficiency (device utilization) reasons.

Also, notice that `ov::hint::PerformanceMode::LATENCY` does not necessarily imply using a single inference request. For example, multi-socket CPUs can deliver as many requests (at the same minimal latency) as there are NUMA nodes in the machine.
To make your application fully scalable, prefer to query the `ov::optimal_number_of_infer_requests` directly.
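
For example, a sketch reusing the `core` and `model` objects from the snippets above:

```cpp
// Sketch: even with the LATENCY hint, query the suggested parallelism
// instead of assuming a single request; on a multi-socket CPU the value
// can match the number of NUMA nodes rather than 1.
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
auto nireq = compiled_model.get_property(ov::optimal_number_of_infer_requests);
```

This way the same application scales from a laptop to a multi-socket server without code changes.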