Perf Hints docs and General Opt Guide refactoring #10815
Conversation
## (Optional) Additional Hints from the App
Let's take an example of an application that processes 4 video streams. The most future-proof way to communicate the limitation of the parallel slack is to equip the performance hint with the optional `ov::hint::num_requests` configuration key set to 4.
As discussed previosly, for the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
Review comment, suggested change:
- As discussed previosly, for the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
+ For the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
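For illustration, a minimal sketch of what the quoted paragraph describes (the model path is a placeholder):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // The application processes 4 video streams, so pass that parallel
    // slack along with the throughput hint:
    auto compiled_model = core.compile_model(model, "GPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
        ov::hint::num_requests(4));
    // GPU: limits the batch size; CPU: limits the number of inference streams.
    return 0;
}
```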
## Combining the Hints and Individual Low-Level Settings
## Testing the Performance of the Hints with the Benchmark_App
The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performaqnce hints for a particular device:
Review comment, suggested change:
- The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performaqnce hints for a particular device:
+ The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the performance hints for a particular device:
In summary, when the performance _portability_ is of concern, consider the [High-Level Performance Hints](../OV_Runtime_UG/performance_hints.md).
Below you can find the implementation details (particularly how OpenVINO implements the 'throughput' approach) for the specific devices.
Keep in mind that while multiple scheduling and/or batching approaches (combining individual inference requests) can work together, the hints make these decisions transparent to the application.
the paragraph above provides a lot of text, but no code examples to use this mode. Can we have a single example at least?
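Toward the reviewer's request, a minimal sketch of using the high-level hints (model path and device are placeholders):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // Portable way to optimize: state the intent and let each device map it
    // to its own streams/batching configuration internally.
    auto latency_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
    auto tput_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    return 0;
}
```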
review will be provided later
@@ -9,22 +9,19 @@ When evaluating performance of your model with the OpenVINO Runtime, you must me

- Track separately the operations that happen outside the OpenVINO Runtime, like video decoding.

- > **NOTE**: Some image pre-processing can be baked into the IR and accelerated. For more information, refer to [Embedding Preprocessing Computation](Additional_Optimizations.md)
+ > **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). Also consider [_runtime_ preprocessing optimizations](../../optimization_guide/dldt_deployment_optimization_common).
this link does not work. Looks like `.md` is missing at the end
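For context, a minimal sketch of the "baked into the IR" preprocessing that the NOTE refers to, using the `ov::preprocess::PrePostProcessor`; the u8/NHWC input format is an assumption for illustration:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");

    ov::preprocess::PrePostProcessor ppp(model);
    // Declare what the application actually supplies (u8 NHWC frames)...
    ppp.input().tensor()
        .set_element_type(ov::element::u8)
        .set_layout("NHWC");
    // ...the conversions to embed into the model...
    ppp.input().preprocess()
        .convert_element_type(ov::element::f32)
        .convert_layout("NCHW");
    // ...and the layout the model itself expects.
    ppp.input().model().set_layout("NCHW");
    model = ppp.build();  // the conversions are now part of the graph

    auto compiled_model = core.compile_model(model, "CPU");
    return 0;
}
```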
- Ensure the inputs are identical for the OpenVINO Runtime and the framework. For example, beware of random values that can be used to populate the inputs.
- Consider [Image Pre-processing and Conversion](../../OV_Runtime_UG/preprocessing_overview.md), while any user-side pre-processing should be tracked separately.
- When applicable, leverage the [Dynamic Shapes support](../../OV_Runtime_UG/ov_dynamic_shapes.md).
- If possible, demand the same accuracy. For example, TensorFlow allows `FP16` execution, so when comparing to that, make sure to test the OpenVINO Runtime with the `FP16` as well.
can / should we refer to inference_precision hint here?
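For reference, the `ov::hint::inference_precision` the reviewer mentions can be set at compilation time; a minimal sketch (device and precision values are illustrative):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // Pin the execution precision so a comparison with an FP16-enabled
    // framework run is apples-to-apples (or use f32 to force full precision).
    auto compiled_model = core.compile_model(model, "CPU",
        ov::hint::inference_precision(ov::element::f16));
    return 0;
}
```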
@@ -41,5 +41,14 @@ auto compiled_model = core.compile_model(model, "GPU",
//! [hint_num_requests]
}

//! [hint_plus_low_level]
{
Move the global `{}` outside of the doxygen markers `hint_plus_low_level` and remove the indentation.
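A sketch of the layout the reviewer is asking for in the snippets file (the call inside the markers is illustrative; the point is where the braces sit and the lack of indentation):

```cpp
{
//! [hint_plus_low_level]
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
    ov::num_streams(4));
//! [hint_plus_low_level]
}
```

This way only the unindented snippet body lands between the doxygen markers, while the scoping braces stay outside the extracted region.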
- benchmark_app **-hint tput** -d 'device' -m 'path to your model'
- benchmark_app **-hint latency** -d 'device' -m 'path to your model'
- Disabling the hints to emulate the pre-hints era (highly recommended before trying the individual low-level settings, such as the number of streams as below, threads, etc):
- - benchmark_app **-hint none -nstreams 1** -d 'device' -m 'path to your model'
should we provide such examples as cmd line? BTW, the current line contains a double '-' and is rendered incorrectly
openvino_docs_deployment_optimization_guide_common
openvino_docs_deployment_optimization_guide_latency
openvino_docs_deployment_optimization_guide_tput
openvino_docs_deployment_optimization_guide_hints
should we split the text in this document according to these sections? Now it's plain text, so it's hard to focus on a specific use case: latency vs tput. E.g. if I want latency, I don't want to read through everything to find the phrases about latency and filter out the ones about tput.
This document does not guide me, it provides a list of all capabilities.
A typical use-case for the `ov::InferRequest::infer()` is running a dedicated application thread per source of inputs (e.g. a camera), so that every step (frame capture, processing, results parsing and associated logic) is kept serial within the thread.
In contrast, the `ov::InferRequest::start_async()` and `ov::InferRequest::wait()` allow the application to continue its activities and poll or wait for the inference completion when really needed. So one reason for using asynchronous code is _efficiency_.

**NOTE**: Although the Synchronous API can be somewhat easier to start with, in the production code always prefer to use the Asynchronous (callbacks-based, below) API, as it is the most general and scalable way to implement the flow control for any possible number of requests (and hence both latency and throughput scenarios).
The NOTE should start with '>' to get proper indentation.
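For context, a minimal sketch contrasting the two flows discussed above (model path is a placeholder; a real app would fill the request inputs before inferring):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled_model = core.compile_model("model.xml", "CPU");
    ov::InferRequest request = compiled_model.create_infer_request();

    // Synchronous: the calling thread blocks until the results are ready.
    request.infer();

    // Asynchronous: start inference, keep the thread busy, wait when needed.
    request.start_async();
    // ... e.g. capture and pre-process the next frame here ...
    request.wait();  // waits for this specific request only
    return 0;
}
```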
Notice that the Async's `ov::InferRequest::wait()` waits for the specific request only. However, running multiple inference requests in parallel provides no guarantees on the completion order. This may complicate a possible logic based on the `ov::InferRequest::wait`. The most scalable approach is using callbacks (set via the `ov::InferRequest::set_callback`) that are executed upon completion of the request. The callback functions will be used by the OpenVINO runtime to notify on the results (or errors). This is a more event-driven approach.

Few important points on the callbacks:
can we duplicate these notes about callbacks in the documentation of `set_callback` itself?
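A minimal sketch of the callback-based flow described above (model path is a placeholder; the lambda body is illustrative):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled_model = core.compile_model("model.xml", "CPU");
    ov::InferRequest request = compiled_model.create_infer_request();

    // The callback runs on a runtime thread when the request completes;
    // a non-null exception_ptr signals an error.
    request.set_callback([&request](std::exception_ptr ex) {
        if (ex) {
            // log/handle the error; avoid throwing out of a runtime thread
            return;
        }
        // Results are ready, e.g. via request.get_output_tensor();
        // optionally re-submit the request with the next input here.
    });
    request.start_async();
    request.wait();  // for this demo, don't exit before completion
    return 0;
}
```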
However, specific configurations, like multi-socket CPUs, can deliver as high a number of requests (at the same minimal latency) as there are NUMA nodes in the machine.
Thus, human expertise is required to get the most out of the device even in the latency case. Consider using [OpenVINO high-level performance hints](../OV_Runtime_UG/performance_hints.md) instead.

**NOTE**: [OpenVINO performance hints](./dldt_deployment_optimization_hints.md) is a recommended way for performance configuration, which is both device-agnostic and future-proof.
add >
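To ground the note above, a minimal sketch that applies the hint and then queries, rather than assumes, the optimal number of requests (model path and device are illustrative):

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    auto compiled_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
    // On a multi-socket machine the device may still report one request
    // per NUMA node as optimal, so query instead of hardcoding 1:
    auto n = compiled_model.get_property(ov::optimal_number_of_infer_requests);
    std::cout << "optimal number of infer requests: " << n << "\n";
    return 0;
}
```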
* Brushed the general optimization page
* Opt GUIDE, WIP
* perf hints doc placeholder
* WIP
* WIP2
* WIP 3
* added streams and few other details
* fixed titles, misprints etc
* Perf hints
* movin the runtime optimizations intro
* fixed link
* Apply suggestions from code review (Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>)
* some details on the FIL and other means when pure inference time is not the only factor
* shuffled according to general->use-case->device-specifics flow, minor brushing
* next iter
* section on optimizing for tput and latency
* couple of links to the features support matrix
* Links, brushing, dedicated subsections for Latency/FIL/Tput
* had to make the link less specific (otherwise docs compilations fails)
* removing the Temp/Should be moved to the Opt Guide
* shuffled the tput/latency/etc info into separated documents. also the following docs moved from the temp into specific feature, general product desc or corresponding plugins - openvino_docs_IE_DG_Model_caching_overview - openvino_docs_IE_DG_Int8Inference - openvino_docs_IE_DG_Bfloat16Inference - openvino_docs_OV_UG_NoDynamicShapes
* fixed toc for ov_dynamic_shapes.md
* referring the openvino_docs_IE_DG_Bfloat16Inference to avoid docs compilation errors
* fixed main product TOC, removed ref from the second-level items
* reviewers remarks
* reverted the openvino_docs_OV_UG_NoDynamicShapes
* reverting openvino_docs_IE_DG_Bfloat16Inference and openvino_docs_IE_DG_Int8Inference
* "No dynamic shapes" to the "Dynamic shapes" as TOC
* removed duplication
* minor brushing
* Caching to the next level in TOC
* brushing
* more on the perf counters (for latency and dynamic cases)

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Ported as part of #11040, please check that everything is correct.
I have removed many sections that are either obsolete or need heavy refactoring: for example, the HETERO section that was FPGA-oriented, the tools section (e.g. VTune), the interop section, etc.