WIP 3
myshevts committed Mar 9, 2022
1 parent bfe54f5 commit fd895b8
Showing 3 changed files with 9 additions and 13 deletions.
4 changes: 2 additions & 2 deletions docs/MO_DG/prepare_model/Getting_performance_numbers.md
@@ -19,7 +19,7 @@ You need to build your performance conclusions on reproducible data. Do the perf
- For time values that range too much, consider the geometric mean (geomean), as sketched below.
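
A quick illustration of the geomean suggestion above (a minimal C++ sketch; the latency values are purely hypothetical):

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Geometric mean is less sensitive to occasional outliers than the arithmetic mean.
double geomean(const std::vector<double>& values) {
    double log_sum = 0.0;
    for (double v : values)
        log_sum += std::log(v);
    return std::exp(log_sum / values.size());
}

int main() {
    std::vector<double> latencies_ms = {4.2, 4.5, 4.1, 9.8, 4.3};  // hypothetical measurements
    std::cout << "geomean latency: " << geomean(latencies_ms) << " ms\n";
}
```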


## Getting performance numbers using OpenVINO tool
## Getting performance numbers using OpenVINO's benchmark_app

To get performance numbers, use the dedicated [Benchmark App](../../../samples/cpp/benchmark_app/README.md) sample, which is the best way to produce a performance reference.
It has a lot of device-specific knobs, but the primary usage is as simple as:
@@ -34,7 +34,7 @@ $ ./benchmark_app -d CPU -m <model> -i <input>
to execute on the CPU instead.

Each of the [OpenVINO supported devices](../OV_Runtime_UG/supported_plugins/Device_Plugins.md) offers a set of performance settings that have command-line equivalents in the [Benchmark App](../../../samples/cpp/benchmark_app/README.md).
While these settings provide really low-level control and allow to leverage the optimal model performance on the _specific_ device, we suggest to always start the performance evaluation with trying the [OpenVINO High-Level Performance Hints](../OV_Runtime_UG/performance_hints.md) first:
While these settings provide very low-level control and allow leveraging the optimal model performance on the _specific_ device, we suggest always starting the performance evaluation by trying the [OpenVINO High-Level Performance Hints](../../OV_Runtime_UG/performance_hints.md) first (an equivalent programmatic sketch follows the list):
- benchmark_app **-hint tput** -d 'device' -m 'path to your favorite model'
- benchmark_app **-hint latency** -d 'device' -m 'path to your favorite model'
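
The same hints can also be set programmatically when compiling the model; a minimal sketch, assuming the OpenVINO 2.0 C++ API and a hypothetical model path:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical model path

    // THROUGHPUT lets the device pick its own optimal settings; use LATENCY for single-request scenarios.
    auto compiled = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    auto request = compiled.create_infer_request();
    return 0;
}
```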

17 changes: 7 additions & 10 deletions docs/optimization_guide/dldt_deployment_optimization_guide.md
@@ -57,17 +57,16 @@ Compared with the batching, the parallelism is somewhat transposed (i.e. perform
![](../img/cpu_streams_explained.png)
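
For reference, the number of streams can also be set explicitly at compile time; a minimal sketch, assuming the `ov::num_streams` property of the OpenVINO 2.0 C++ API (the model path and stream count are illustrative):

```cpp
#include <vector>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical model path

    // Multiple streams let the plugin execute several inference requests in parallel.
    auto compiled = core.compile_model(model, "CPU", ov::num_streams(4));

    // One request per stream keeps all the streams busy.
    std::vector<ov::InferRequest> requests;
    for (int i = 0; i < 4; ++i)
        requests.push_back(compiled.create_infer_request());
    return 0;
}
```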

### Automatic Batching Internals <a name="ov-auto-batching"></a>
As explained in the section on the [automatic batching](../OV_Runtime_UG/automatic_batching.md), the feature performs on-the-fly grouping of the inference requests toimprove device utilization.
As explained in the section on the [automatic batching](../OV_Runtime_UG/automatic_batching.md), the feature performs on-the-fly grouping of the inference requests to improve device utilization.
The Automatic Batching relaxes the requirement for an application to saturate devices like GPU by _explicitly_ using a large batch. Essentially, it performs transparent gathering of inputs from
individual inference requests, followed by the actual batched execution, with no programming effort from the user:
![](../img/BATCH_device.PNG)

Essentially, the Automatic Batching shifts the asynchronousity from the individual requests to the groups of requests that constitute the batches. Thus, for the execution to be efficient it is very important that the requests arrive timely, without causing a timeout. Normally, the timeout should never be hit. It is rather a graceful way to handle the application exit (when the inputs are not arriving anymore, so the full batch is not possible to collect). So if your workload experiences the timeouts (which would result i the performance drop, as, when happened, the timeout value adds itself to the latency of every request), consider balancing the timeout value vs the batch size. For example in many cases having smaller timeout value/batch size may yield better performance than large batch size, but coupled with the timeout value that is cannot guarantee accommodating the full number of the required requests.
TBD
Essentially, the Automatic Batching shifts the asynchronicity from the individual requests to the groups of requests that constitute the batches. Thus, for the execution to be efficient it is very important that the requests arrive in a timely manner, without causing a batching timeout. Normally, the timeout should never be hit. It is rather a graceful way to handle the application exit (when the inputs are no longer arriving, so the full batch cannot be collected). So if your workload experiences timeouts (resulting in a performance drop, as the timeout value adds itself to the latency of every request), consider balancing the timeout value against the batch size. In many cases a smaller timeout value and batch size may yield better performance than a large batch size coupled with a timeout value that cannot guarantee accommodating all the required requests.
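
For example, both the batch size and the timeout could be set explicitly. The following is a sketch only; it assumes the `BATCH` virtual device notation and the `ov::auto_batch_timeout` property, with purely illustrative values:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical model path

    // "BATCH:GPU(8)" caps the automatically collected batch at 8 requests, while
    // auto_batch_timeout (in milliseconds) bounds how long the plugin waits to fill a batch.
    auto compiled = core.compile_model(model, "BATCH:GPU(8)",
                                       ov::auto_batch_timeout(100));
    return 0;
}
```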

## OpenVINO Async API <a name="ov-async-api"></a>

OpenVINO Async API can improve overall throughput rate of the application. While a device is busy with the inference, the application can do other things in parallel rather than wait for the inference to complete.
OpenVINO Async API can improve the overall throughput of the application. While a device is busy with the inference, the application can do other things in parallel (e.g. populating inputs or even scheduling other requests) rather than wait for the current inference to complete.

In the example below, inference is applied to the results of the video decoding. So it is possible to keep two parallel infer requests: while the current one is processed, the input frame for the next one is being captured. This essentially hides the latency of capturing, so that the overall frame rate is determined only by the slowest part of the pipeline (decoding or inference) and not by the sum of the stages.
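
A minimal C++ sketch of such double-buffered pipelining, assuming the OpenVINO 2.0 API; the model path, the dummy frame shape, and the `capture_frame`/`fill_input` helpers are hypothetical stand-ins for the application's decoding code:

```cpp
#include <utility>
#include <openvino/openvino.hpp>

// Hypothetical stand-ins for the application's video-decoding code.
ov::Tensor capture_frame() {
    return ov::Tensor(ov::element::u8, {1, 3, 224, 224});  // dummy frame
}
void fill_input(ov::InferRequest& req, const ov::Tensor& frame) {
    req.set_input_tensor(frame);
}

int main() {
    ov::Core core;
    auto compiled = core.compile_model("model.xml", "CPU");  // hypothetical model path
    auto current = compiled.create_infer_request();
    auto next = compiled.create_infer_request();

    fill_input(current, capture_frame());
    current.start_async();
    while (true) {
        // While 'current' runs on the device, capture and prepare the next frame.
        fill_input(next, capture_frame());
        current.wait();            // results of 'current' are ready here
        next.start_async();        // kick off the next inference immediately
        std::swap(current, next);  // reuse the request objects
    }
}
```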

@@ -88,11 +87,11 @@ You can compare the pseudo-codes for the regular and async-based approaches:
The technique can be generalized to any available parallel slack. For example, you can do inference and simultaneously encode the resulting or previous frames or run further inference, like emotion detection on top of the face detection results.
Refer to the [Object Detection C++ Demo](@ref omz_demos_object_detection_demo_cpp), [Object Detection Python Demo](@ref omz_demos_object_detection_demo_python) (latency-oriented Async API showcase) and [Benchmark App Sample](../../samples/cpp/benchmark_app/README.md) for complete examples of the Async API in action.

## Request-Based API and “GetBlob” Idiom <a name="new-request-based-api"></a>
## Request-Based API and the "get_tensor" Idiom <a name="new-request-based-api"></a>

Infer Request based API offers two types of request: Sync and Async. The Sync is considered below. The Async splits (synchronous) `Infer` into `StartAsync` and `Wait` (see <a href="#ie-async-api">OpenVINO Async API</a>).
The API of the inference requests offers Sync and Async execution. While the Sync serializes the execution flow, the Async splits (synchronous) `Infer` into `StartAsync` and `Wait` (see <a href="#ie-async-api">OpenVINO Async API</a>).

More importantly, an infer request encapsulates the reference to the “executable” network and actual inputs/outputs. Now, when you load the network to the plugin, you get a reference to the executable network (you may consider that as a queue). Actual infer requests are created by the executable network:
An infer request encapsulates the reference to the "compiled" model and actual inputs/outputs:

```sh

@@ -144,6 +143,4 @@ relu5_9_x2 OPTIMIZED_OUT layerType: ReLU realTime: 0
```

This contains the layer names (as seen in the IR), layer types, and execution statistics. Notice the `OPTIMIZED_OUT` status, which indicates that the particular activation was fused into the adjacent convolution.
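
Programmatically, similar per-layer statistics can be retrieved from an infer request; a minimal sketch, assuming the OpenVINO 2.0 C++ API with profiling enabled at compile time (the model path is illustrative):

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical model path

    // Per-layer counters are collected only when profiling is explicitly enabled.
    auto compiled = core.compile_model(model, "CPU", ov::enable_profiling(true));
    auto request = compiled.create_infer_request();
    request.infer();

    // Each entry describes an executed node and its timings; the status field
    // (not printed here) marks e.g. the OPTIMIZED_OUT nodes.
    for (const auto& info : request.get_profiling_info()) {
        std::cout << info.node_name << " (" << info.node_type << "): "
                  << info.real_time.count() << " us" << std::endl;
    }
    return 0;
}
```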


TODO:execution graphs
Both benchmark_app versions also support the "exec_graph_path" command-line option, which instructs OpenVINO to output the same per-layer execution statistics, but in the form of a plugin-specific, [Netron-viewable](https://netron.app/) graph written to the specified file.
1 change: 0 additions & 1 deletion docs/snippets/dldt_optimization_guide9.cpp
@@ -1,7 +1,6 @@
#include <ie_core.hpp>

int main() {
using namespace InferenceEngine;
//! [part9]
while(true) {
// capture frame
