[RUNTIME][CLML] OpenCLML tuning and profiling enhanced (#13843)
* [RUNTIME][CLML] OpenCLML tuning and profiling enhanced

  The tuning cache binary is serialized through DMLC::Stream to support multiple CLML subgraphs within a TVM module. Individual tuning cache blobs are saved to the same output file.

  New API on OpenCLWorkspace to enable or disable profiling on the command queue, rather than doing this only when Timer is invoked. This is required to perform CLML operator tuning.

  CLML layer profiling now uses the OpenCL Timer interface.

  This PR also avoids offloading the pad operator when it appears at the very first layer (specifically, before at least one convolution layer), because the CLML pad operator has a layout-related limitation. Please refer to the CLML SDK documentation for more details.

* Update src/runtime/opencl/opencl_common.h

  Co-authored-by: Egor Churaev <egor.churaev@gmail.com>

* review comments

---------

Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
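The idea of keeping several per-subgraph tuning cache blobs in one output file can be sketched as below. This is only an illustration under stated assumptions, not the code from this commit: it uses standard C++ streams in place of DMLC::Stream, and the names `TuningCache`, `SaveTuningCaches`, `LoadTuningCaches`, and the length-prefixed layout are all hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical container: subgraph name -> opaque tuning cache bytes.
using TuningCache = std::map<std::string, std::string>;

// Write a 64-bit length followed by the raw bytes (native byte order).
static void WriteBytes(std::ostream& os, const std::string& bytes) {
  const uint64_t size = bytes.size();
  os.write(reinterpret_cast<const char*>(&size), sizeof(size));
  os.write(bytes.data(), static_cast<std::streamsize>(size));
}

// Read one length-prefixed byte run; returns false on a truncated stream.
static bool ReadBytes(std::istream& is, std::string* out) {
  uint64_t size = 0;
  if (!is.read(reinterpret_cast<char*>(&size), sizeof(size))) return false;
  out->resize(size);
  if (size == 0) return true;
  return static_cast<bool>(is.read(&(*out)[0], static_cast<std::streamsize>(size)));
}

// Assumed layout: entry count, then (name, blob) pairs, one per subgraph.
void SaveTuningCaches(std::ostream& os, const TuningCache& caches) {
  const uint64_t count = caches.size();
  os.write(reinterpret_cast<const char*>(&count), sizeof(count));
  for (const auto& kv : caches) {
    WriteBytes(os, kv.first);
    WriteBytes(os, kv.second);
  }
}

// Read every (name, blob) pair back; returns false if the data is incomplete.
bool LoadTuningCaches(std::istream& is, TuningCache* caches) {
  uint64_t count = 0;
  if (!is.read(reinterpret_cast<char*>(&count), sizeof(count))) return false;
  for (uint64_t i = 0; i < count; ++i) {
    std::string name, blob;
    if (!ReadBytes(is, &name) || !ReadBytes(is, &blob)) return false;
    (*caches)[name] = blob;
  }
  return true;
}
```

With a layout like this, each subgraph can look up its own blob by name after a single pass over the file. The sketch writes lengths in native byte order, so a file written on one platform is only readable on another with the same endianness.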
1 parent 10d6c17, commit 3c81d9b
Showing 2 changed files with 111 additions and 72 deletions.