# ONNX Runtime Mobile: Performance Considerations When Using NNAPI

ONNX Runtime Mobile with the NNAPI Execution Provider (EP) can be used to execute ORT format models on Android platforms using NNAPI. This document explains how the different optimizations affect performance, and provides some suggestions for performance testing with ORT format models.

Please review the introductory details for [using NNAPI with ONNX Runtime Mobile](ONNX_Runtime_for_Mobile_Platforms.md#Using-NNAPI-with-ONNX-Runtime-Mobile) first.

## ONNX Model Optimization Example

ONNX Runtime applies optimizations to the ONNX model to improve inferencing performance. These optimizations occur prior to exporting an ORT format model. See the [graph optimization](ONNX_Runtime_Graph_Optimizations.md) documentation for further details of the available optimizations.

It is important to understand how the different optimization levels affect the nodes in the model, as this determines how much of the model can be executed using NNAPI.

*Basic*

The _basic_ optimizations remove redundant nodes and perform constant folding. Only ONNX operators are used by these optimizations when modifying the model.

*Extended*

The _extended_ optimizations replace one or more standard ONNX operators with custom internal ONNX Runtime operators to boost performance. Each optimization is valid for a specific list of EPs: it will only replace nodes that are assigned to one of those EPs, and the replacement node will be executed using the same EP.

*Layout*

_Layout_ optimizations are hardware specific and should not be used when creating ORT format models.

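To see how each level changes a particular model, one option is to have ONNX Runtime save the optimized graph and count the operators that remain. The following is a minimal sketch using the Python API; the model file names are placeholders.

```python
from collections import Counter

import onnx
import onnxruntime as ort

# Save a copy of the model optimized at the chosen level.
# 'mnist.onnx' and the output path are placeholder names.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
so.optimized_model_filepath = "mnist.basic.onnx"
_ = ort.InferenceSession("mnist.onnx", so)

# Count the operators left after optimization. Custom internal operators
# (e.g. FusedConv at the extended level) will also show up here.
optimized = onnx.load("mnist.basic.onnx")
print(Counter(node.op_type for node in optimized.graph.node))
```

Repeating this with `ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED` shows which custom internal operators are introduced at the _extended_ level.
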
### Outcome of optimizations when creating an optimized ORT format model

Below is an example of the changes that the _basic_ and _extended_ optimizations make when applied to the MNIST model, with only the CPU EP enabled.

- At the _basic_ level we combine the Conv and Add nodes (the addition is done via the 'B' input to Conv), combine the MatMul and Add into a single Gemm node (the addition is done via the 'C' input to Gemm), and constant fold to remove one of the Reshape nodes.
- At the _extended_ level we additionally fuse the Conv and Relu nodes using the internal ONNX Runtime FusedConv operator.

<img align="center" src="images/mnist_optimization.png" alt="Changes to nodes from basic and extended optimizations."/>

If we were to load the result of these optimizations as ORT format models on an Android device, all nodes would execute using the CPU EP by default.

### Outcome of loading an optimized ORT format model with NNAPI enabled

If the NNAPI EP is enabled, it is given an opportunity to select the nodes it can execute after the model is loaded. When doing so, it groups as many nodes together as possible to minimize the overhead of copying data between the CPU and NNAPI. Each group of nodes can be considered a sub-graph: the more nodes in each sub-graph, and the fewer sub-graphs, the better the performance.

For each sub-graph, the NNAPI EP creates an NNAPI model that replicates the processing of the original nodes. It creates a function that executes this NNAPI model and performs any required data copies between CPU and NNAPI. ONNX Runtime replaces the original nodes in the loaded model with a single node that calls this function.

If the NNAPI EP is not enabled, or can not process a node, the node will be executed using the CPU EP.

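Which EPs are considered, and in what order, is controlled when the session is created. Below is a sketch using the Python API from a build where the NNAPI EP is available; on Android you would typically use the C++ or Java API instead, but the provider ordering behaves the same way. The model path is a placeholder.

```python
import onnxruntime as ort

# List the NNAPI EP ahead of the CPU EP: NNAPI gets first pick of the
# nodes, and anything it can not handle falls back to the CPU EP.
session = ort.InferenceSession(
    "model.ort",  # placeholder path to an ORT format model
    providers=["NnapiExecutionProvider", "CPUExecutionProvider"],
)
```
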
Below is an example for the MNIST model comparing what happens to the ORT format models created with _basic_ or _extended_ optimizations when they are loaded with the NNAPI EP enabled.

As the _basic_ level optimizations result in a model that only uses ONNX operators, the NNAPI EP is able to handle the majority of the model in a single function, as NNAPI can execute the Conv, Relu and MaxPool nodes all at once.

The _extended_ level optimizations introduced the custom FusedConv nodes, which the NNAPI EP ignores as it is only aware of ONNX operators. This results in two functions that use NNAPI, each handling a single MaxPool node. The performance of this model is likely to be significantly worse than running it using only the CPU EP, due to the data copies between CPU and NNAPI that each NNAPI function incurs.

<img align="center" src="images/mnist_optimization_with_nnapi.png" alt="Changes to nodes by NNAPI depending on optimization level of input.">

## Initial Performance Testing

The best optimization settings will differ by model: some models may perform better with NNAPI, and some may not. As the performance is model specific, you must run performance tests to determine the best combination for your model.

It is suggested to run performance tests (a timing sketch follows this list):
- with NNAPI enabled and an ORT format model created with _basic_ level optimization
- with NNAPI disabled and an ORT format model created with _extended_ level optimization

For most scenarios it is expected that one of these two approaches will yield the best performance.

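A minimal timing sketch for such a comparison is shown below, using the Python API. The model paths, input shape, and provider availability are assumptions: substitute your own model's inputs, and run the NNAPI variant on an Android device with a build that includes the NNAPI EP.

```python
import time

import numpy as np
import onnxruntime as ort

def benchmark(model_path, providers, runs=100):
    session = ort.InferenceSession(model_path, providers=providers)
    # Placeholder input: a single 28x28 image as used by MNIST.
    feed = {session.get_inputs()[0].name:
            np.random.rand(1, 1, 28, 28).astype(np.float32)}
    session.run(None, feed)  # warm-up run to exclude one-off setup costs
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, feed)
    return (time.perf_counter() - start) / runs

# basic-level model with NNAPI enabled vs. extended-level model on CPU only
print(benchmark("model.basic.ort",
                ["NnapiExecutionProvider", "CPUExecutionProvider"]))
print(benchmark("model.extended.ort", ["CPUExecutionProvider"]))
```
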
If using an ORT format model with _basic_ level optimizations and NNAPI yields the best performance, in some cases it _may_ be possible to improve performance slightly further by creating an NNAPI-aware ORT format model. The difference with this model is that the _extended_ optimizations are applied to the nodes that can not be executed using NNAPI. Whether any nodes fall into this category is model dependent.

## Creating an NNAPI-aware ORT format model

An NNAPI-aware ORT format model will keep all nodes from the ONNX model that can be executed using NNAPI, and allow _extended_ optimizations to be applied to any remaining nodes.

For our MNIST model, that would mean the nodes in the red shading are kept as-is with only _basic_ optimizations applied to them, and the nodes in the green shading could have _extended_ optimizations applied to them.

<img align="center" src="images/nnapi_aware_ort_format_model.png" alt="Show nodes that are preserved as NNAPI can execute them, and nodes that are considered by extended optimizations.">

To create an NNAPI-aware ORT format model please follow these steps.

1. Create a 'full' build of ONNX Runtime with the NNAPI EP by [building ONNX Runtime from source](https://github.com/microsoft/onnxruntime/blob/master/BUILD.md#start-baseline-cpu).

   This build can be done on any platform; the Android NNAPI library is not required to create the ORT format model with the NNAPI EP, as no model execution occurs in this process. When building, add `--use_nnapi --build_shared_lib --build_wheel` to the build flags if any of those are missing.

   Do NOT add the `--minimal_build` flag.

   - Windows:
     ```
     <ONNX Runtime repository root>\build.bat --config RelWithDebInfo --use_nnapi --build_shared_lib --build_wheel --parallel
     ```
   - Linux:
     ```
     <ONNX Runtime repository root>/build.sh --config RelWithDebInfo --use_nnapi --build_shared_lib --build_wheel --parallel
     ```
   - **NOTE**: if you have previously done a minimal build, you will need to run `git reset --hard` to make sure any operator kernel exclusions are reversed prior to performing the 'full' build. If you do not, you may not be able to load the ONNX format model due to missing kernels.

2. Install the python wheel from the build output directory.
   - Windows: this is located in `build/Windows/<config>/<config>/dist/<package name>.whl`.
   - Linux: this is located in `build/Linux/<config>/dist/<package name>.whl`.

   The package name will differ based on your platform, python version, and build parameters. `<config>` is the value from the `--config` parameter of the build command.
   ```
   pip install -U build\Windows\RelWithDebInfo\RelWithDebInfo\dist\onnxruntime_noopenmp-1.5.2-cp37-cp37m-win_amd64.whl
   ```

3. Create an NNAPI-aware ORT format model by running `convert_onnx_models_to_ort.py` as per the [standard instructions](ONNX_Runtime_for_Mobile_Platforms.md#Create-ORT-format-model-and-configuration-file-with-required-operators), with NNAPI enabled (`--use_nnapi`) and the optimization level set to _extended_ (`--optimization_level extended`). This will allow extended level optimizations to run on any nodes that NNAPI can not handle.
   ```
   python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py --use_nnapi --optimization_level extended /models
   ```
   The python package from your 'full' build with NNAPI enabled must be installed for `--use_nnapi` to be a valid option.

This model can be used with [a minimal build that includes the NNAPI EP](ONNX_Runtime_for_Mobile_Platforms.md#Create-a-minimal-build-for-Android-with-NNAPI-support).