Enable ROCm to use tunable GEMM #12853
Merged
For the record: the performance difference compared with the initial attempt.
zhangyaobit
reviewed
Sep 28, 2022
zhangyaobit
approved these changes
Sep 28, 2022
This PR has been split in two: the follow-up, #13116, handles enabling and testing.
cloudhan
added a commit
that referenced
this pull request
Sep 30, 2022
Reverts #12853 due to CI pipeline problem.
linnealovespie
pushed a commit
that referenced
this pull request
Sep 30, 2022
Change ROCm to use tunable GEMM. It is not enabled in this PR. This will drastically improve GEMM performance for some shape and dtype configurations. Once enabled, it will benefit the overall performance of BERT inference and, hopefully, training.
linnealovespie
pushed a commit
that referenced
this pull request
Sep 30, 2022
Reverts #12853 due to CI pipeline problem.
cloudhan
added a commit
that referenced
this pull request
Oct 5, 2022
Update ROCm CI before relanding tunable GEMM (#12853). This PR also updates composable kernel to use CMake's HIP language support, so that we can mix a C/C++ compiler with the HIP compiler instead of being locked to hip-clang.
yuslepukhin
pushed a commit
that referenced
this pull request
Oct 5, 2022
Update ROCm CI before relanding tunable GEMM (#12853). This PR also updates composable kernel to use CMake's HIP language support, so that we can mix a C/C++ compiler with the HIP compiler instead of being locked to hip-clang.
cloudhan
added a commit
that referenced
this pull request
Oct 7, 2022
Change ROCm to use tunable GEMM. It is not enabled in this PR. This will drastically improve GEMM performance for some shape and dtype configurations. Once enabled, it will benefit the overall performance of BERT inference and, hopefully, training.
# This is a combination of 2 commits.
cloudhan
added a commit
that referenced
this pull request
Oct 12, 2022
Change ROCm to use tunable GEMM. It is not enabled in this PR. This will drastically improve GEMM performance for some shape and dtype configurations. Once enabled, it will benefit the overall performance of BERT inference and, hopefully, training.
# This is a combination of 2 commits.
zhangyaobit
pushed a commit
that referenced
this pull request
Oct 14, 2022
Reland: Change ROCm to use tunable GEMM (#12853)
preetha-intel
pushed a commit
to intel/onnxruntime
that referenced
this pull request
Nov 11, 2022
Change ROCm to use tunable GEMM. It is not enabled in this PR. This will drastically improve GEMM performance for some shape and dtype configurations. Once enabled, it will benefit the overall performance of BERT inference and, hopefully, training.
preetha-intel
pushed a commit
to intel/onnxruntime
that referenced
this pull request
Nov 29, 2022
Change ROCm to use tunable GEMM. It is not enabled in this PR. This will drastically improve GEMM performance for some shape and dtype configurations. Once enabled, it will benefit the overall performance of BERT inference and, hopefully, training.
Related PRs #12855 #12856 #12857
Description: Enable ROCm to use tunable GEMM for better performance.
Motivation and Context
This drastically improves GEMM performance for some configurations and, in turn, the overall performance of BERT inference.
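For readers unfamiliar with the idea, the general concept behind a "tunable" GEMM can be sketched as follows: keep several candidate implementations, time each one the first time a given problem shape is seen, and reuse the fastest for later calls with that shape. This is only an illustrative sketch in plain Python. The names `TunableGemm`, `gemm_naive`, and `gemm_transposed` are hypothetical and are not taken from this PR, and the actual ROCm implementation may select and cache kernels differently.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
GemmImpl = Callable[[Matrix, Matrix], Matrix]


def gemm_naive(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical candidate 1: textbook triple loop over indices."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def gemm_transposed(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical candidate 2: transpose b once, then take row-by-row dot products."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


class TunableGemm:
    """Picks the fastest candidate per (m, k, n) shape and caches that choice."""

    def __init__(self, candidates: Sequence[GemmImpl], repeats: int = 3) -> None:
        self._candidates = list(candidates)
        self._repeats = repeats
        # Maps a problem shape to the index of the fastest candidate found so far.
        self._best: Dict[Tuple[int, int, int], int] = {}

    def _time(self, impl: GemmImpl, a: Matrix, b: Matrix) -> float:
        # Total wall-clock time for `repeats` runs of one candidate.
        start = time.perf_counter()
        for _ in range(self._repeats):
            impl(a, b)
        return time.perf_counter() - start

    def __call__(self, a: Matrix, b: Matrix) -> Matrix:
        shape = (len(a), len(b), len(b[0]))
        if shape not in self._best:
            # First call for this shape: benchmark every candidate once.
            timings = [self._time(impl, a, b) for impl in self._candidates]
            self._best[shape] = timings.index(min(timings))
        return self._candidates[self._best[shape]](a, b)


gemm = TunableGemm([gemm_naive, gemm_transposed])
result = gemm([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

The first call for a shape pays the benchmarking cost; later calls with the same shape go straight to the cached choice. That trade-off is one reason a gain would show up only for some shape and dtype configurations.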