Perf Hints docs and General Opt Guide refactoring #10815
Conversation
## (Optional) Additional Hints from the App
Let's take an example of an application that processes 4 video streams. The most future-proof way to communicate the limitation of the parallel slack is to equip the performance hint with the optional `ov::hint::num_requests` configuration key set to 4.
As discussed previosly, for the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
Review comment, suggested change:
- As discussed previosly, for the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
+ For the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
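For illustration, a minimal sketch of what the quoted paragraph describes (the model path is a placeholder):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // The application processes 4 video streams, so pass that parallel
    // slack along with the throughput hint:
    auto compiled_model = core.compile_model(model, "GPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
        ov::hint::num_requests(4));
    // GPU: limits the batch size; CPU: limits the number of inference streams.
    return 0;
}
```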
## Combining the Hints and Individual Low-Level Settings
## Testing the Performance of the Hints with the Benchmark_App
The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performaqnce hints for a particular device:
Review comment, suggested change:
- The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performaqnce hints for a particular device:
+ The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performance hints for a particular device:
In summary, when the performance _portability_ is of concern, consider the [High-Level Performance Hints](../OV_Runtime_UG/performance_hints.md).
Below you can find the implementation details (particularly how OpenVINO implements the 'throughput' approach) for the specific devices.
Keep in mind that while multiple scheduling and/or batching approaches (combining individual inference requests) can work together, the hints make these decisions transparent to the application.
the paragraph above provides a lot of text, but no code examples to use this mode. Can we have a single example at least?
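Toward the reviewer's request, a minimal sketch of using the high-level hints (model path and device are placeholders):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // Portable way to optimize: state the intent and let each device map it
    // to its own streams/batching configuration internally.
    auto latency_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
    auto tput_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    return 0;
}
```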
review will be provided later
@@ -9,22 +9,19 @@ When evaluating performance of your model with the OpenVINO Runtime, you must me

- Track separately the operations that happen outside the OpenVINO Runtime, like video decoding.

- > **NOTE**: Some image pre-processing can be baked into the IR and accelerated. For more information, refer to [Embedding Preprocessing Computation](Additional_Optimizations.md)
+ > **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). Also consider [_runtime_ preprocessing optimizations](../../optimization_guide/dldt_deployment_optimization_common).
this link does not work. Looks like `.md` is missing at the end
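For context, a minimal sketch of the "baked into the IR" preprocessing that the NOTE refers to, using the `ov::preprocess::PrePostProcessor`; the u8/NHWC input format is an assumption for illustration:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");

    ov::preprocess::PrePostProcessor ppp(model);
    // Declare what the application actually supplies (u8 NHWC frames)...
    ppp.input().tensor()
        .set_element_type(ov::element::u8)
        .set_layout("NHWC");
    // ...the conversions to embed into the model...
    ppp.input().preprocess()
        .convert_element_type(ov::element::f32)
        .convert_layout("NCHW");
    // ...and the layout the model itself expects.
    ppp.input().model().set_layout("NCHW");
    model = ppp.build();  // the conversions are now part of the graph

    auto compiled_model = core.compile_model(model, "CPU");
    return 0;
}
```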
- Ensure the inputs are identical for the OpenVINO Runtime and the framework. For example, beware of random values that can be used to populate the inputs.
- Consider [Image Pre-processing and Conversion](../../OV_Runtime_UG/preprocessing_overview.md), while any user-side pre-processing should be tracked separately.
- When applicable, leverage the [Dynamic Shapes support](../../OV_Runtime_UG/ov_dynamic_shapes.md).
- If possible, demand the same accuracy. For example, TensorFlow allows `FP16` execution, so when comparing to that, make sure to test the OpenVINO Runtime with the `FP16` as well.
can / should we refer to inference_precision hint here?
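For reference, the `ov::hint::inference_precision` the reviewer mentions can be set at compilation time; a minimal sketch (device and precision values are illustrative):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // Pin the execution precision so a comparison with an FP16-enabled
    // framework run is apples-to-apples (or use f32 to force full precision).
    auto compiled_model = core.compile_model(model, "CPU",
        ov::hint::inference_precision(ov::element::f16));
    return 0;
}
```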
@@ -41,5 +41,14 @@ auto compiled_model = core.compile_model(model, "GPU",
//! [hint_num_requests]
}

//! [hint_plus_low_level]
{
Move the global `{}` outside of the doxygen markers `hint_plus_low_level` and remove the indentation.
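A sketch of the layout the reviewer is asking for in the snippets file (the call inside the markers is illustrative; the point is where the braces sit and the lack of indentation):

```cpp
{
//! [hint_plus_low_level]
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
    ov::num_streams(4));
//! [hint_plus_low_level]
}
```

This way only the unindented snippet body lands between the doxygen markers, while the scoping braces stay outside the extracted region.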
- benchmark_app **-hint tput** -d 'device' -m 'path to your model'
- benchmark_app **-hint latency** -d 'device' -m 'path to your model'
- Disabling the hints to emulate the pre-hints era (highly recommended before trying the individual low-level settings, such as the number of streams as below, threads, etc):
- - benchmark_app **-hint none -nstreams 1** -d 'device' -m 'path to your model'
should we provide such examples as cmd line? BTW, the current line contains a double '-' and is rendered incorrectly
openvino_docs_deployment_optimization_guide_common
openvino_docs_deployment_optimization_guide_latency
openvino_docs_deployment_optimization_guide_tput
openvino_docs_deployment_optimization_guide_hints
should we split the text in this document according to these sections? Now it's plain text, so it's hard to focus on a specific use case: latency vs tput. E.g. if I want latency, I don't want to read through everything to find the phrases about latency and filter out the ones about tput.
This document does not guide me, it provides a list of all capabilities.
A typical use-case for the `ov::InferRequest::infer()` is running a dedicated application thread per source of inputs (e.g. a camera), so that every step (frame capture, processing, results parsing and associated logic) is kept serial within the thread.
In contrast, the `ov::InferRequest::start_async()` and `ov::InferRequest::wait()` allow the application to continue its activities and poll or wait for the inference completion when really needed. So one reason for using asynchronous code is _efficiency_.

**NOTE**: Although the Synchronous API can be somewhat easier to start with, in the production code always prefer to use the Asynchronous (callbacks-based, below) API, as it is the most general and scalable way to implement the flow control for any possible number of requests (and hence both latency and throughput scenarios).
The NOTE should start with '>' to get proper indentation.
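For context, a minimal sketch contrasting the two flows discussed above (model path is a placeholder; a real app would fill the request inputs before inferring):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled_model = core.compile_model("model.xml", "CPU");
    ov::InferRequest request = compiled_model.create_infer_request();

    // Synchronous: the calling thread blocks until the results are ready.
    request.infer();

    // Asynchronous: start inference, keep the thread busy, wait when needed.
    request.start_async();
    // ... e.g. capture and pre-process the next frame here ...
    request.wait();  // waits for this specific request only
    return 0;
}
```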
Notice that the Async's `ov::InferRequest::wait()` waits for the specific request only. However, running multiple inference requests in parallel provides no guarantees on the completion order. This may complicate a possible logic based on the `ov::InferRequest::wait`. The most scalable approach is using callbacks (set via the `ov::InferRequest::set_callback`) that are executed upon completion of the request. The callback functions will be used by the OpenVINO runtime to notify on the results (or errors). This is a more event-driven approach.

Few important points on the callbacks:
can we duplicate these notes about callbacks in the documentation of `set_callback` itself?
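A minimal sketch of the callback-based flow described above (model path is a placeholder; the lambda body is illustrative):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled_model = core.compile_model("model.xml", "CPU");
    ov::InferRequest request = compiled_model.create_infer_request();

    // The callback runs on a runtime thread when the request completes;
    // a non-null exception_ptr signals an error.
    request.set_callback([&request](std::exception_ptr ex) {
        if (ex) {
            // log/handle the error; avoid throwing out of a runtime thread
            return;
        }
        // Results are ready, e.g. via request.get_output_tensor();
        // optionally re-submit the request with the next input here.
    });
    request.start_async();
    request.wait();  // for this demo, don't exit before completion
    return 0;
}
```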
However, specific configurations, like multi-socket CPUs, can deliver as high a number of requests (at the same minimal latency) as there are NUMA nodes in the machine.
Thus, human expertise is required to get the most out of the device even in the latency case. Consider using [OpenVINO high-level performance hints](../OV_Runtime_UG/performance_hints.md) instead.

**NOTE**: [OpenVINO performance hints](./dldt_deployment_optimization_hints.md) is a recommended way for performance configuration, which is both device-agnostic and future-proof.
add >
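To ground the note above, a minimal sketch that applies the hint and then queries, rather than assumes, the optimal number of requests (model path and device are illustrative):

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    auto compiled_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
    // On a multi-socket machine the device may still report one request
    // per NUMA node as optimal, so query instead of hardcoding 1:
    auto n = compiled_model.get_property(ov::optimal_number_of_infer_requests);
    std::cout << "optimal number of infer requests: " << n << "\n";
    return 0;
}
```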
* Brushed the general optimization page
* Opt GUIDE, WIP
* perf hints doc placeholder
* WIP
* WIP2
* WIP 3
* added streams and few other details
* fixed titles, misprints etc
* Perf hints
* movin the runtime optimizations intro
* fixed link
* Apply suggestions from code review (Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>)
* some details on the FIL and other means when pure inference time is not the only factor
* shuffled according to general->use-case->device-specifics flow, minor brushing
* next iter
* section on optimizing for tput and latency
* couple of links to the features support matrix
* Links, brushing, dedicated subsections for Latency/FIL/Tput
* had to make the link less specific (otherwise docs compilations fails)
* removing the Temp/Should be moved to the Opt Guide
* shuffled the tput/latency/etc info into separated documents. also the following docs moved from the temp into specific feature, general product desc or corresponding plugins - openvino_docs_IE_DG_Model_caching_overview - openvino_docs_IE_DG_Int8Inference - openvino_docs_IE_DG_Bfloat16Inference - openvino_docs_OV_UG_NoDynamicShapes
* fixed toc for ov_dynamic_shapes.md
* referring the openvino_docs_IE_DG_Bfloat16Inference to avoid docs compilation errors
* fixed main product TOC, removed ref from the second-level items
* reviewers remarks
* reverted the openvino_docs_OV_UG_NoDynamicShapes
* reverting openvino_docs_IE_DG_Bfloat16Inference and openvino_docs_IE_DG_Int8Inference
* "No dynamic shapes" to the "Dynamic shapes" as TOC
* removed duplication
* minor brushing
* Caching to the next level in TOC
* brushing
* more on the perf counters (for latency and dynamic cases)

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Ported as part of #11040, please check that everything is correct.
I have removed many sections that are either obsolete or need heavy refactoring: for example, the HETERO section that was FPGA-oriented, the tools section (e.g. VTune), the interop section, etc.