diff --git a/docs/IE_PLUGIN_DG/QuantizedNetworks.md b/docs/IE_PLUGIN_DG/QuantizedNetworks.md
index fb7880b66fce61..0c8ad29c234991 100644
--- a/docs/IE_PLUGIN_DG/QuantizedNetworks.md
+++ b/docs/IE_PLUGIN_DG/QuantizedNetworks.md
@@ -9,7 +9,7 @@ For more details about low-precision model representation please refer to this [
 During the model load each plugin can interpret quantization rules expressed in *FakeQuantize* operations:
 - Independently based on the definition of *FakeQuantize* operation.
 - Using a special library of low-precision transformations (LPT) which applies common rules for generic operations,
-such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](@ref openvino_docs_IE_DG_Int8Inference).
+such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](../OV_Runtime_UG/Int8Inference.md).
 
 Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
 At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
diff --git a/docs/OV_Runtime_UG/Bfloat16Inference.md b/docs/OV_Runtime_UG/Bfloat16Inference.md
index 5091901e986df5..0720a8414dff6c 100644
--- a/docs/OV_Runtime_UG/Bfloat16Inference.md
+++ b/docs/OV_Runtime_UG/Bfloat16Inference.md
@@ -1,4 +1,4 @@
-# Bfloat16 Inference {#openvino_docs_IE_DG_Bfloat16Inference}
+# Bfloat16 Inference
 
 ## Bfloat16 Inference Usage (C++)
 
diff --git a/docs/OV_Runtime_UG/Int8Inference.md b/docs/OV_Runtime_UG/Int8Inference.md
index 20f002b2a29407..a51f1f54e173b5 100644
--- a/docs/OV_Runtime_UG/Int8Inference.md
+++ b/docs/OV_Runtime_UG/Int8Inference.md
@@ -1,4 +1,4 @@
-# Low-Precision 8-bit Integer Inference {#openvino_docs_IE_DG_Int8Inference}
+# Low-Precision 8-bit Integer Inference
 
 ## Disclaimer
 
diff --git a/docs/OV_Runtime_UG/ov_dynamic_shapes.md b/docs/OV_Runtime_UG/ov_dynamic_shapes.md
index fdc9da8cb89dbb..b10e18ccdab417 100644
--- a/docs/OV_Runtime_UG/ov_dynamic_shapes.md
+++ b/docs/OV_Runtime_UG/ov_dynamic_shapes.md
@@ -1,15 +1,5 @@
 # Dynamic Shapes {#openvino_docs_OV_UG_DynamicShapes}
 
-@sphinxdirective
-
-.. toctree::
-   :maxdepth: 1
-   :hidden:
-
-   openvino_docs_OV_UG_NoDynamicShapes
-
-@endsphinxdirective
-
 As it was demonstrated in the [Changing Input Shapes](ShapeInference.md) article, there are models that support changing of input shapes before model compilation in `Core::compile_model`.
 Reshaping models provides an ability to customize the model input shape for exactly that size that is required in the end application.
 This article explains how the ability of model to reshape can further be leveraged in more dynamic scenarios.
diff --git a/docs/OV_Runtime_UG/ov_without_dynamic_shapes.md b/docs/OV_Runtime_UG/ov_without_dynamic_shapes.md
index 8e07d1b7821bed..9cbe23bda65e3c 100644
--- a/docs/OV_Runtime_UG/ov_without_dynamic_shapes.md
+++ b/docs/OV_Runtime_UG/ov_without_dynamic_shapes.md
@@ -1,4 +1,4 @@
-# When Dynamic Shapes API is Not Applicable {#openvino_docs_OV_UG_NoDynamicShapes}
+# When Dynamic Shapes API is Not Applicable
 
 Several approaches to emulate dynamic shapes are considered in this chapter
 Apply these methods only if [native dynamic shape API](ov_dynamic_shapes.md) doesn't work for you or doesn't give desired performance.
diff --git a/docs/OV_Runtime_UG/supported_plugins/CPU.md b/docs/OV_Runtime_UG/supported_plugins/CPU.md
index 59f6bab39b31e0..c29ae9bf417cc8 100644
--- a/docs/OV_Runtime_UG/supported_plugins/CPU.md
+++ b/docs/OV_Runtime_UG/supported_plugins/CPU.md
@@ -1,12 +1,4 @@
 # CPU device {#openvino_docs_OV_UG_supported_plugins_CPU}
-@sphinxdirective
-
-.. toctree::
-   :maxdepth: 1
-   :hidden:
-   openvino_docs_IE_DG_Bfloat16Inference
-
-@endsphinxdirective
 
 ## Introducing the CPU Plugin
 The CPU plugin was developed to achieve high performance of neural networks on CPU, using the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN).
diff --git a/docs/documentation.md b/docs/documentation.md
index e4a18481222c27..f3a7f93c2ae96a 100644
--- a/docs/documentation.md
+++ b/docs/documentation.md
@@ -30,6 +30,11 @@
    openvino_docs_MO_DG_Getting_Performance_Numbers
    openvino_docs_model_optimization_guide
    openvino_docs_deployment_optimization_guide_dldt_optimization_guide
+   openvino_docs_deployment_optimization_guide_common
+   openvino_docs_deployment_optimization_guide_latency
+   openvino_docs_IE_DG_Model_caching_overview
+   openvino_docs_deployment_optimization_guide_tput
+   openvino_docs_deployment_optimization_guide_hints
    openvino_docs_tuning_utilities
    openvino_docs_performance_benchmarks
 
diff --git a/docs/optimization_guide/dldt_deployment_optimization_common.md b/docs/optimization_guide/dldt_deployment_optimization_common.md
index 58b9e8801dc3c0..2e1b7c6b35899d 100644
--- a/docs/optimization_guide/dldt_deployment_optimization_common.md
+++ b/docs/optimization_guide/dldt_deployment_optimization_common.md
@@ -1,13 +1,5 @@
 # General Runtime/Deployment Optimizations {#openvino_docs_deployment_optimization_guide_common}
 
-@sphinxdirective
-
-.. toctree::
-   :maxdepth: 1
-   :hidden:
-
-@endsphinxdirective
-
 ## Inputs Pre-processing with OpenVINO
 
 In many cases, a network expects a pre-processed image, so make sure you do not perform unnecessary steps in your code:
diff --git a/docs/optimization_guide/dldt_deployment_optimization_guide.md b/docs/optimization_guide/dldt_deployment_optimization_guide.md
index 0a76d5b4ba8adc..cdcdcada4eddb6 100644
--- a/docs/optimization_guide/dldt_deployment_optimization_guide.md
+++ b/docs/optimization_guide/dldt_deployment_optimization_guide.md
@@ -1,4 +1,4 @@
-# Introduction to Inference Runtime Optimizations {#openvino_docs_deployment_optimization_guide_dldt_optimization_guide}
+# Runtime Inference Optimizations {#openvino_docs_deployment_optimization_guide_dldt_optimization_guide}
 
 @sphinxdirective
 
@@ -6,10 +6,10 @@
    :maxdepth: 1
    :hidden:
 
-   openvino_docs_deployment_optimization_guide_common
-   openvino_docs_deployment_optimization_guide_latency
-   openvino_docs_deployment_optimization_guide_tput
-   openvino_docs_deployment_optimization_guide_hints
+   openvino_docs_deployment_optimization_guide_common
+   openvino_docs_deployment_optimization_guide_latency
+   openvino_docs_deployment_optimization_guide_tput
+   openvino_docs_deployment_optimization_guide_hints
 
 @endsphinxdirective
 
diff --git a/docs/optimization_guide/dldt_deployment_optimization_hints.md b/docs/optimization_guide/dldt_deployment_optimization_hints.md
index 137a803a65badf..c06cfc4caa2e75 100644
--- a/docs/optimization_guide/dldt_deployment_optimization_hints.md
+++ b/docs/optimization_guide/dldt_deployment_optimization_hints.md
@@ -1,14 +1,5 @@
 # High-level Performance Hints (Presets) {#openvino_docs_deployment_optimization_guide_hints}
 
-@sphinxdirective
-
-.. toctree::
-   :maxdepth: 1
-   :hidden:
-
-@endsphinxdirective
-
-
 Traditionally, each of the OpenVINO's [supported devices](../OV_Runtime_UG/supported_plugins/Supported_Devices.md) offers a bunch of low-level performance settings.
 Tweaking this detailed configuration requires deep architecture understanding.
 Also, while the resulting performance may be optimal for the specific combination of the device and the model that is inferred, it is actually neither device/model nor future-proof:
diff --git a/docs/optimization_guide/dldt_deployment_optimization_latency.md b/docs/optimization_guide/dldt_deployment_optimization_latency.md
index f102f86234c77a..4badeb593fe872 100644
--- a/docs/optimization_guide/dldt_deployment_optimization_latency.md
+++ b/docs/optimization_guide/dldt_deployment_optimization_latency.md
@@ -1,14 +1,5 @@
 ## Optimizing for the Latency {#openvino_docs_deployment_optimization_guide_latency}
 
-@sphinxdirective
-
-.. toctree::
-   :maxdepth: 1
-   :hidden:
-   openvino_docs_IE_DG_Model_caching_overview
-
-@endsphinxdirective
-
 ## Latency Specifics
 A significant fraction of applications focused on the situations where typically a single model is loaded (and single input is used) at a time.
 This is a regular "consumer" use case and a default (also for the legacy reasons) performance setup for any OpenVINO device.
diff --git a/docs/optimization_guide/dldt_deployment_optimization_tput.md b/docs/optimization_guide/dldt_deployment_optimization_tput.md
index f27ab3cc764083..e3e96387dcb5bd 100644
--- a/docs/optimization_guide/dldt_deployment_optimization_tput.md
+++ b/docs/optimization_guide/dldt_deployment_optimization_tput.md
@@ -1,13 +1,5 @@
 # General Throughput Considerations Optimization Guide {#openvino_docs_deployment_optimization_guide_tput}
 
-@sphinxdirective
-
-.. toctree::
-   :maxdepth: 1
-   :hidden:
-
-@endsphinxdirective
-
 ### General Throughput Considerations
 As described in the section on the [latency-specific considerations](./dldt_deployment_optimization_latency.md) one possible use-case is focused on delivering the every single request at the minimal delay.
 Throughput on the other hand, is about inference scenarios in which potentially large number of inference requests are served simultaneously.
diff --git a/docs/optimization_guide/model_optimization_guide.md b/docs/optimization_guide/model_optimization_guide.md
index 50469ea5acb1ee..cabe56776e29d3 100644
--- a/docs/optimization_guide/model_optimization_guide.md
+++ b/docs/optimization_guide/model_optimization_guide.md
@@ -8,7 +8,6 @@
 
    pot_README
    docs_nncf_introduction
-   openvino_docs_IE_DG_Int8Inference
 
 @endsphinxdirective
 
@@ -32,4 +31,5 @@ POT is the easiest way to get optimized models, and usually takes several minute
 ![](../img/WHAT_TO_USE.svg)
 
 ## See also
-- [Deployment optimization](./dldt_deployment_optimization_guide.md)
\ No newline at end of file
+- [Deployment optimization](./dldt_deployment_optimization_guide.md)
+- [int8 runtime specifics](../OV_Runtime_UG/Int8Inference.md)
\ No newline at end of file