
Perf Hints docs and General Opt Guide refactoring #10815

Merged

Conversation

Contributor

@myshevts myshevts commented Mar 5, 2022

I have removed many sections that are either obsolete or need heavy refactoring: for example, the HETERO section that was FPGA-oriented, the tools section (e.g. VTune), the interop section, etc.

@myshevts myshevts requested a review from a team as a code owner March 5, 2022 14:43
@myshevts myshevts requested review from avladimi and removed request for a team March 5, 2022 14:43
@myshevts myshevts force-pushed the perf-hints-docs branch 3 times, most recently from 8764140 to 6331667 Compare March 5, 2022 15:38
@ilya-lavrenov ilya-lavrenov added the "port to master" (Required port to master from 2022.3 LTS) label Mar 7, 2022
@myshevts myshevts force-pushed the perf-hints-docs branch 2 times, most recently from fd895b8 to 15f1396 Compare March 9, 2022 15:30

## (Optional) Additional Hints from the App
Let's take an example of an application that processes 4 video streams. The most future-proof way to communicate the limitation of the parallel slack is to equip the performance hint with the optional `ov::hint::num_requests` configuration key set to 4.
As discussed previosly, for the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:

Suggested change
As discussed previosly, for the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
For the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
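
For reference, a minimal sketch of passing this hint from the application code (the model path and device name below are placeholders, not from the guide):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    // The app processes 4 video streams, so 4 parallel requests is the known
    // parallel slack; each device maps it to its own options (batch/streams).
    auto compiled_model = core.compile_model(model, "GPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
        ov::hint::num_requests(4));
    return 0;
}
```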

## Combining the Hints and Individual Low-Level Settings

## Testing the Performance of the Hints with the Benchmark_App
The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performaqnce hints for a particular device:

Suggested change
The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performaqnce hints for a particular device:
The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performance hints for a particular device:

@myshevts myshevts changed the title from "WIP: Perf hints docs" to "Perf Hints docs and General Opt Guide refactoring" Mar 10, 2022
@myshevts myshevts force-pushed the perf-hints-docs branch 5 times, most recently from 53c8d1a to 522f11d Compare March 15, 2022 09:05
@myshevts myshevts requested review from a team as code owners March 15, 2022 11:51
@myshevts myshevts requested a review from a team March 15, 2022 11:51

In summary, when performance _portability_ is of concern, consider the [High-Level Performance Hints](../OV_Runtime_UG/performance_hints.md).
Below you can find the implementation details (particularly how OpenVINO implements the 'throughput' approach) for the specific devices.
Keep in mind that while multiple scheduling and/or batching approaches (combining individual inference requests) can work together, the hints make these decisions transparent to the application.

the paragraph above provides a lot of text, but no code examples to use this mode. Can we have a single example at least?
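
For what it's worth, a single minimal example could look like this (a sketch; the device name and exact property set are illustrative):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    // The high-level hint lets the device pick batching/streams internally:
    auto compiled = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    // The app only queries how many requests it should keep in flight:
    uint32_t n = compiled.get_property(ov::optimal_number_of_infer_requests);
    return 0;
}
```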

@ilya-lavrenov ilya-lavrenov left a comment

review will be provided later

@ilya-lavrenov ilya-lavrenov merged commit cbfb8a1 into openvinotoolkit:releases/2022/1 Mar 17, 2022
@@ -9,22 +9,19 @@ When evaluating performance of your model with the OpenVINO Runtime, you must me

- Track separately the operations that happen outside the OpenVINO Runtime, like video decoding.

> **NOTE**: Some image pre-processing can be baked into the IR and accelerated. For more information, refer to [Embedding Preprocessing Computation](Additional_Optimizations.md)
> **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). Also consider [_runtime_ preprocessing optimizations](../../optimization_guide/dldt_deployment_optimization_common).

this link does not work. Looks like .md is missing at the end

- Ensure the inputs are identical for the OpenVINO Runtime and the framework. For example, beware of random values that can be used to populate the inputs.
- Consider [Image Pre-processing and Conversion](../../OV_Runtime_UG/preprocessing_overview.md), while any user-side pre-processing should be tracked separately.
- When applicable, leverage the [Dynamic Shapes support](../../OV_Runtime_UG/ov_dynamic_shapes.md)
- If possible, demand the same accuracy. For example, TensorFlow allows `FP16` execution, so when comparing to that, make sure to test the OpenVINO Runtime with the `FP16` as well.

can / should we refer to inference_precision hint here?
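
If it helps, a hedged sketch of what referring to the hint could show (assuming the target device honors `ov::hint::inference_precision`, and given an `ov::Core core` and a loaded `model` as in the snippets above):

```cpp
// Pin execution to FP16 so the accuracy comparison with the framework's
// FP16 run is apples-to-apples (illustrative; device support may vary):
auto compiled = core.compile_model(model, "GPU",
    ov::hint::inference_precision(ov::element::f16));
```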

@@ -41,5 +41,14 @@ auto compiled_model = core.compile_model(model, "GPU",
//! [hint_num_requests]
}

//! [hint_plus_low_level]
{

move global {} outside of doxygen markers hint_plus_low_level and remove indentation.
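
For context, the `hint_plus_low_level` snippet combines a high-level hint with an explicit low-level setting, roughly like this (a sketch; the property values are illustrative, given `core` and `model` as elsewhere in the snippets):

```cpp
// A high-level hint plus an individual low-level override:
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
    ov::num_streams(4));  // explicit number of streams on top of the hint
```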

- benchmark_app **-hint tput** -d 'device' -m 'path to your model'
- benchmark_app **-hint latency** -d 'device' -m 'path to your model'
- Disabling the hints to emulate the pre-hints era (highly recommended before trying the individual low-level settings, such as the number of streams as below, threads, etc):
- - benchmark_app **-hint none -nstreams 1** -d 'device' -m 'path to your model'

should we provide such examples as cmd lines? BTW, the current line contains a double '-' and is rendered incorrectly

openvino_docs_deployment_optimization_guide_common
openvino_docs_deployment_optimization_guide_latency
openvino_docs_deployment_optimization_guide_tput
openvino_docs_deployment_optimization_guide_hints

should we split the text in this document according to these sections? Now it's plain text, so it's hard to focus on a specific use case: latency vs tput. E.g. if I want latency, I don't want to read through everything to find the phrases about latency and filter out the ones about tput.

This document does not guide me, it provides a list of all capabilities.

A typical use-case for the `ov::InferRequest::infer()` is running a dedicated application thread per source of inputs (e.g. a camera), so that every step (frame capture, processing, results parsing and associated logic) is kept serial within the thread.
In contrast, the `ov::InferRequest::start_async()` and `ov::InferRequest::wait()` allow the application to continue its activities and poll or wait for the inference completion when really needed. So one reason for using asynchronous code is _efficiency_.
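
A bare-bones sketch of the two flows (given an existing `ov::InferRequest infer_request`; `do_other_work` is a placeholder):

```cpp
// Synchronous: one dedicated thread per input source keeps the steps serial.
infer_request.infer();        // blocks until the result is ready

// Asynchronous: start the inference, overlap it with other work, sync later.
infer_request.start_async();  // returns immediately
do_other_work();              // e.g. capture/decode the next frame
infer_request.wait();         // block only when the result is actually needed
```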

**NOTE**: Although the Synchronous API can be somewhat easier to start with, in the production code always prefer to use the Asynchronous (callbacks-based, below) API, as it is the most general and scalable way to implement the flow control for any possible number of requests (and hence both latency and throughput scenarios).

NOTE should start with '>' to make proper indentation

Notice that the Async's `ov::InferRequest::wait()` waits for the specific request only. However, running multiple inference requests in parallel provides no guarantees on the completion order. This may complicate any logic based on `ov::InferRequest::wait`. The most scalable approach is using callbacks (set via `ov::InferRequest::set_callback`) that are executed upon completion of the request. The callback functions will be used by the OpenVINO runtime to notify on the results (or errors). This is a more event-driven approach.
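
A minimal callback sketch (given `infer_request` as above; the error handling and re-submission logic are illustrative):

```cpp
infer_request.set_callback([&](std::exception_ptr ex) {
    if (ex) {
        // the runtime delivers errors via the exception pointer
        try { std::rethrow_exception(ex); }
        catch (const std::exception& e) { std::cerr << e.what() << '\n'; }
        return;
    }
    // parse the results here; optionally set the next input and re-submit:
    // infer_request.start_async();
});
infer_request.start_async();  // the callback fires upon completion
```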

Few important points on the callbacks:

can we duplicate these notes about callbacks in the documentation of set_callback itself?

However, specific configurations, like multi-socket CPUs, can deliver as many requests (at the same minimal latency) as there are NUMA nodes in the machine.
Thus, human expertise is required to get the most out of the device even in the latency case. Consider using [OpenVINO high-level performance hints](../OV_Runtime_UG/performance_hints.md) instead.
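
For instance, instead of hand-tuning streams per NUMA node, the application can just state its intent (a sketch, given `core` and `model` as in the snippets above):

```cpp
// Latency-oriented hint: the device derives the NUMA/streams setup itself.
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
```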

**NOTE**: [OpenVINO performance hints](./dldt_deployment_optimization_hints.md) is a recommended way for performance configuration, which is both device-agnostic and future-proof.

add >

ilya-lavrenov pushed a commit to ilya-lavrenov/openvino that referenced this pull request Mar 18, 2022
Perf Hints docs and General Opt Guide refactoring (#10815)

* Brushed the general optimization page

* Opt GUIDE, WIP

* perf hints doc placeholder

* WIP

* WIP2

* WIP 3

* added streams and few other details

* fixed titles, misprints etc

* Perf hints

* movin the runtime optimizations intro

* fixed link

* Apply suggestions from code review

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>

* some details on the FIL and other means when pure inference time is not the only factor

* shuffled according to general->use-case->device-specifics flow, minor brushing

* next iter

* section on optimizing for tput and latency

* couple of links to the features support matrix

* Links, brushing, dedicated subsections for Latency/FIL/Tput

* had to make the link less specific (otherwise docs compilations fails)

* removing the Temp/Should be moved to the Opt Guide

* shuffled the tput/latency/etc info into separated documents. also the following docs moved from the temp into specific feature, general product desc or corresponding plugins

-   openvino_docs_IE_DG_Model_caching_overview
-   openvino_docs_IE_DG_Int8Inference
-   openvino_docs_IE_DG_Bfloat16Inference
-   openvino_docs_OV_UG_NoDynamicShapes

* fixed toc for ov_dynamic_shapes.md

* referring the openvino_docs_IE_DG_Bfloat16Inference to avoid docs compilation errors

* fixed main product TOC, removed ref from the second-level items

* reviewers remarks

* reverted the openvino_docs_OV_UG_NoDynamicShapes

* reverting openvino_docs_IE_DG_Bfloat16Inference and openvino_docs_IE_DG_Int8Inference

* "No dynamic shapes" to the "Dynamic shapes" as TOC

* removed duplication

* minor brushing

* Caching to the next level in TOC

* brushing

* more on the perf counters ( for latency and dynamic cases)

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
@ilya-lavrenov

Ported as part of #11040, please check that everything is correct

@ilya-lavrenov ilya-lavrenov added the "ported to master" (Ported from 2022.x branches to master) label and removed the "port to master" (Required port to master from 2022.3 LTS) label Mar 18, 2022
azhogov pushed a commit that referenced this pull request Mar 18, 2022
* Added migration for deployment (#10800)

* Added migration for deployment

* Addressed comments

* more info after the What's new Sessions' questions (#10803)

* more info after the What's new Sessions' questions

* generalizing the optimal_batch_size vs explicit value message

* Update docs/OV_Runtime_UG/automatic_batching.md

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>

* Update docs/OV_Runtime_UG/automatic_batching.md

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>

* Update docs/OV_Runtime_UG/automatic_batching.md

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>

* Update docs/OV_Runtime_UG/automatic_batching.md

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>

* Update docs/OV_Runtime_UG/automatic_batching.md

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>

* Update docs/OV_Runtime_UG/automatic_batching.md

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>

* Perf Hints docs and General Opt Guide refactoring (#10815)

* Updated common IE pipeline infer-request section (#10844)

* Updated common IE pipeline infer-reqest section

* Update ov_infer_request.md

* Apply suggestions from code review

Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>

Co-authored-by: Maxim Shevtsov <maxim.y.shevtsov@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>

* DOCS: Removed useless 4 spaces in snippets (#10870)

* Updated snippets

* Added link to encryption

* [DOCS] ARM CPU plugin docs (#10885)

* initial commit

ARM_CPU.md added
ARM CPU is added to the list of supported devices

* Update the list of supported properties

* Update Device_Plugins.md

* Update CODEOWNERS

* Removed quotes in limitations section

* NVIDIA and Android are added to the list of supported devices

* Added See Also section and reg sign to arm

* Added Preprocessing acceleration section

* Update the list of supported layers

* updated list of supported layers

* fix typos

* Added support disclaimer

* update trade and reg symbols

* fixed typos

* fix typos

* reg fix

* add reg symbol back

Co-authored-by: Vitaly Tuzov <vitaly.tuzov@intel.com>

* Try to fix visualization (#10896)

* Try to fix visualization

* New try

* Update Install&Deployment for migration guide to 22/1 (#10933)

* updates

* update

* Getting started improvements (#10948)

* Onnx updates (#10962)

* onnx changes

* onnx updates

* onnx updates

* fix broken anchors api reference (#10976)

* add ote repo (#10979)

* DOCS: Increase content width (#10995)

* fixes

* fix

* Fixed compilation

Co-authored-by: Maxim Shevtsov <maxim.y.shevtsov@intel.com>
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
Co-authored-by: Aleksandr Voron <aleksandr.voron@intel.com>
Co-authored-by: Vitaly Tuzov <vitaly.tuzov@intel.com>
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
Co-authored-by: Victoria Yashina <victoria.yashina@intel.com>
Co-authored-by: Nikolay Tyukaev <nikolay.tyukaev@intel.com>