Enable ROCm to use tunable GEMM #12853

cloudhan · 2022-09-05T03:25:00Z

Description: Enable ROCm to use tunable GEMM for better performance.

Motivation and Context

Why is this change required? What problem does it solve?
This drastically improves the performance of some GEMMs and, as a result, the overall performance of BERT inference.
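
As a rough illustration of the idea behind a tunable GEMM: several candidate implementations are timed for a given problem shape, and the fastest one is cached and reused for later calls with that shape. The following is a minimal Python sketch of that selection pattern under stated assumptions, not the code in this PR. `TunableOp`, the two candidate functions, and the `(M, K, N)` cache key are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
GemmFn = Callable[[Matrix, Matrix], Matrix]


def gemm_naive(a: Matrix, b: Matrix) -> Matrix:
    """Textbook triple loop over row-major nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


def gemm_transposed(a: Matrix, b: Matrix) -> Matrix:
    """Same product, but transposes B first so each dot product walks two rows."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


class TunableOp:
    """Hypothetical tuner: times each candidate once per shape, caches the winner."""

    def __init__(self, candidates: Sequence[GemmFn], repeats: int = 3) -> None:
        self.candidates = list(candidates)
        self.repeats = repeats
        self.best: Dict[Tuple[int, int, int], GemmFn] = {}

    def _time(self, fn: GemmFn, a: Matrix, b: Matrix) -> float:
        start = time.perf_counter()
        for _ in range(self.repeats):
            fn(a, b)
        return time.perf_counter() - start

    def __call__(self, a: Matrix, b: Matrix) -> Matrix:
        key = (len(a), len(b), len(b[0]))  # (M, K, N)
        if key not in self.best:
            # First call for this shape: benchmark every candidate, keep the fastest.
            self.best[key] = min(self.candidates, key=lambda fn: self._time(fn, a, b))
        return self.best[key](a, b)


gemm = TunableOp([gemm_naive, gemm_transposed])
c = gemm([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
# c == [[19.0, 22.0], [43.0, 50.0]]
```

The first call for a shape pays the benchmarking cost; later calls with the same shape go straight to the cached winner. Which candidate wins depends on the machine and the shape, so the sketch makes no claim about that.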

cloudhan · 2022-09-05T03:30:10Z

For the record, here is the performance difference from the initial attempt:

Latency(ms)     Latency_P50     Latency_P75     Latency_P90     Latency_P95     Latency_P99     Throughput(QPS) model   graph_optimization_level        intra_op_num_threads    batch_size      sequence_length test_cases      test_times      use_gpu
113.03  113.01  113.15  113.26  113.38  113.53  9059.37 fbv_bert_fp16_rocm_no_attention_fusion.onnx     ENABLE_ALL      24      1024    128     10      10      True
94.89   94.88   94.92   94.96   94.98   95.02   10791.95        fbv_bert_fp16_rocm_no_attention_fusion.onnx     ENABLE_ALL      24      1024    128     10      10      True
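
To put the two rows in relative terms, here is a small Python sketch of the arithmetic. It assumes the first row is the baseline and the second row is the run with tunable GEMM (the comment does not label them), and that throughput is roughly `batch_size / latency`. The helper names are made up for this example.

```python
def throughput_qps(batch_size: int, latency_ms: float) -> float:
    """Samples per second when one batch takes latency_ms milliseconds."""
    return batch_size / (latency_ms / 1000.0)


def relative_change(before: float, after: float) -> float:
    """Signed fractional change going from before to after."""
    return (after - before) / before


# Values copied from the two table rows above.
first = {"latency_ms": 113.03, "qps": 9059.37}
second = {"latency_ms": 94.89, "qps": 10791.95}

latency_change = relative_change(first["latency_ms"], second["latency_ms"])
qps_change = relative_change(first["qps"], second["qps"])

print(f"latency change:    {latency_change:+.1%}")
print(f"throughput change: {qps_change:+.1%}")
print(f"batch/latency, row 1: {throughput_qps(1024, first['latency_ms']):.0f} QPS")
```

Under those assumptions, the second row has about 16% lower mean latency and about 19% higher throughput, and `batch_size / latency` for the first row lands within 1 QPS of the reported throughput.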

onnxruntime/core/providers/rocm/tunable/gemm.cu

cloudhan · 2022-09-28T08:27:12Z

This PR has been split in two; the follow-up #13116 covers enabling tunable GEMM and testing it.

Change ROCm to use tunable GEMM. It is not enabled in this PR. Once enabled, this will drastically improve GEMM performance for some shape and dtype configurations, which will benefit the overall performance of BERT inference and, hopefully, training.

Update ROCm CI before relanding tunable GEMM #12853. This PR also updates composable kernel to use CMake's HIP language support so that we can mix C/C++ compilers with the HIP compiler instead of being locked to hip-clang.

cloudhan requested review from PeixuanZuo and zhangyaobit September 5, 2022 05:36

cloudhan force-pushed the guangyunhan/ort-use-tunable-gemm branch from 773ea60 to 0148fbb September 5, 2022 08:18

cloudhan changed the base branch from main to guangyunhan/tunableop-move-to-ep September 5, 2022 08:36

cloudhan marked this pull request as ready for review September 5, 2022 08:37

cloudhan mentioned this pull request Sep 5, 2022

Remove the stub file tunable_op.h after moving it to EP #12858

Closed

cloudhan force-pushed the guangyunhan/tunableop-move-to-ep branch from 8e71431 to 14839aa September 7, 2022 03:42

cloudhan force-pushed the guangyunhan/ort-use-tunable-gemm branch from 0148fbb to e856516 September 7, 2022 11:31

cloudhan force-pushed the guangyunhan/tunableop-move-to-ep branch from 14839aa to 4988216 September 8, 2022 04:12

cloudhan force-pushed the guangyunhan/ort-use-tunable-gemm branch from e856516 to f79b05b September 8, 2022 04:13

cloudhan force-pushed the guangyunhan/tunableop-move-to-ep branch from 4988216 to 23a3f90 September 21, 2022 11:33

Base automatically changed from guangyunhan/tunableop-move-to-ep to main September 23, 2022 03:10

Use tunable GEMM

833a4ea

cloudhan force-pushed the guangyunhan/ort-use-tunable-gemm branch from f79b05b to 833a4ea September 27, 2022 05:22

microsoft deleted a comment from lgtm-com bot Sep 27, 2022

cloudhan mentioned this pull request Sep 27, 2022

Enable/Disable tunable GEMM by using tunable switch in provider options and env var #13116

Merged

cloudhan added 2 commits September 27, 2022 06:44

Address lint and refine comment

8d34640

Use composable_kernel unconditionally

c4c5056

zhangyaobit reviewed Sep 28, 2022

onnxruntime/core/providers/rocm/tunable/gemm.cu (outdated, resolved)

Minor

8a841e9

cloudhan requested a review from zhangyaobit September 28, 2022 03:19

zhangyaobit approved these changes Sep 28, 2022

cloudhan merged commit 32c2c4b into main Sep 28, 2022

cloudhan deleted the guangyunhan/ort-use-tunable-gemm branch September 28, 2022 08:21

cloudhan added a commit that referenced this pull request Sep 29, 2022

Revert "Change ROCm to use tunable GEMM (#12853)"

be1ae4b

This reverts commit 32c2c4b.

cloudhan mentioned this pull request Sep 29, 2022

Revert "Enable ROCm to use tunable GEMM" #13160

Merged

cloudhan added a commit that referenced this pull request Sep 30, 2022

Revert "Enable ROCm to use tunable GEMM" (#13160)

c93cb8f

Reverts #12853 due to CI pipeline problem.

linnealovespie pushed a commit that referenced this pull request Sep 30, 2022

Revert "Enable ROCm to use tunable GEMM" (#13160)

1f18f65

Reverts #12853 due to CI pipeline problem.

cloudhan mentioned this pull request Oct 4, 2022

Update ROCm CI #13214

Merged

cloudhan mentioned this pull request Oct 7, 2022

Reland: Change ROCm to use tunable GEMM #13231

Merged

zhangyaobit pushed a commit that referenced this pull request Oct 14, 2022

Reland: Change ROCm to use tunable GEMM (#13231)

790e363

Reland: Change ROCm to use tunable GEMM (#12853)

zhangyaobit pushed a commit that referenced this pull request Oct 20, 2022

Enable/Disable tunable GEMM by using tunable switch in provider options and env var (#13116)

fc12abf

Related PRs #12853. This allows the user to enable/disable tunable GEMM on demand.
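
For readers unfamiliar with this kind of switch, the sketch below shows one way a setting can be resolved from both a provider option and an environment variable. It is illustrative only: `EXAMPLE_TUNABLE_GEMM_ENABLED` and `example_tunable_gemm_enabled` are placeholder names, not the actual onnxruntime identifiers, and the precedence (environment variable first, then provider option, then off) is an assumption rather than something stated in this thread.

```python
import os
from typing import Mapping, Optional

# Placeholder names for illustration only; NOT the real onnxruntime identifiers.
ENV_VAR = "EXAMPLE_TUNABLE_GEMM_ENABLED"
OPTION_KEY = "example_tunable_gemm_enabled"

_TRUE = {"1", "true", "on"}
_FALSE = {"0", "false", "off"}


def _parse_flag(raw: str) -> bool:
    """Turn a textual flag into a bool, rejecting anything unrecognized."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"unrecognized flag value: {raw!r}")


def tunable_gemm_enabled(
    provider_options: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve the switch. Assumed precedence: env var, then option, then off."""
    env = os.environ if environ is None else environ
    if ENV_VAR in env:
        return _parse_flag(env[ENV_VAR])
    if OPTION_KEY in provider_options:
        return _parse_flag(provider_options[OPTION_KEY])
    return False
```

Passing `environ` explicitly keeps the function easy to test. For example, `tunable_gemm_enabled({OPTION_KEY: "1"}, {})` returns `True`, while adding `{ENV_VAR: "0"}` as the environment turns it back off under the assumed precedence.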
