[ONNX] Add converter for QAttention from Microsoft onnxruntime contrib opset #13654
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
Hello @vvchernov, @echuraev, @AndrewZhaoLuo!
LGTM. But I don't have a lot of knowledge in this codebase. @jwfromm, @AndrewZhaoLuo could you please take a look at this PR?
# Currently only (batch, past_seq_length + seq_length) shape is supported.
mask_index = inputs[5]

# Scalar, which means a per-tensor/layer quantization
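For readers following along, here is a minimal, hypothetical sketch of the shape check the first comment describes. The helper name `check_mask_index` and the assertion message are illustrative, not the PR's actual code, though `infer_shape` is a real utility in tvm.relay.frontend.common:

```python
from tvm.relay.frontend.common import infer_shape

def check_mask_index(mask_index):
    # Only the 2-D (batch, past_seq_length + seq_length) mask layout is
    # handled; the contrib spec also allows other layouts (e.g. a 1-D
    # (batch,) tensor), which this converter would currently reject.
    shape = infer_shape(mask_index)
    assert len(shape) == 2, "QAttention: unsupported mask_index shape"
    return shape
```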
nit: You have exactly the same comment for input[3] and input[7].
done
python/tvm/relay/frontend/onnx.py (outdated)
result,
_op.multiply(lhs_scale, rhs_scale),
zero_point_zero,
axis=-1,  # TODO(agladyshev): what is 'axis' parameter for?
Do you still need this todo comment?
done
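For context, in TVM's QNN ops the `axis` argument only matters for per-channel quantization: it names the channel dimension that a 1-D scale/zero-point tensor applies along, and it is ignored when the quantization parameters are scalars, as in the per-tensor case this PR handles. A minimal sketch of that behavior with relay.qnn.op.dequantize (illustrative values, not the PR's code):

```python
import tvm
from tvm import relay

# With scalar (per-tensor) parameters, axis=-1 is a harmless default;
# it would name the channel dimension if the scale were a 1-D tensor.
acc = relay.var("acc", shape=(2, 8), dtype="int32")
deq = relay.qnn.op.dequantize(
    acc,
    input_scale=relay.const(0.1, "float32"),   # scalar: per-tensor scale
    input_zero_point=relay.const(0, "int32"),
    axis=-1,  # ignored for scalar quantization parameters
)
mod = tvm.IRModule.from_expr(deq)
print(mod)
```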
Apologies, I've been quite sick. I'll try to look at this Thursday.
Thanks! Sorry for the late review. LGTM
[ONNX] Add converter for QAttention from Microsoft onnxruntime contrib opset (apache#13654)

* init QAttention converter
* add type and shape checking
* add test for QAttention
* add tests for optional parameters
* change mask_index shape
* add support for 'past' input
* add support for 'unidirectional' attribute
* expand test coverage
* fix lint
* fix pylint
* fix batch dimension for topi/cuda/batch_matmul_tensorcore.py::batch_matmul_tensorcore_cuda
* code review fix
This PR adds support for QAttention, the quantized version of the Attention operator from the Microsoft onnxruntime contrib opset.
An explanation and illustration of how this layer works can be found, for example, in @lena-voita's NLP course.
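As a quick illustration of how the new converter can be exercised (this is not the PR's test code; the shapes, tensor names, and opset versions below are assumptions based on the contrib spec), one can build a single-node ONNX graph with QAttention from the com.microsoft domain and import it with relay.frontend.from_onnx:

```python
import onnx
from onnx import helper, TensorProto
from tvm import relay

batch, seq, hidden, num_heads = 1, 4, 8, 2

# Single QAttention node; per the contrib spec, weight packs Q/K/V as
# (hidden, 3 * hidden), and with no 'past' input the mask is (batch, seq).
node = helper.make_node(
    "QAttention",
    inputs=["input", "weight", "bias", "input_scale", "weight_scale", "mask_index"],
    outputs=["output"],
    domain="com.microsoft",
    num_heads=num_heads,
)
graph = helper.make_graph(
    [node],
    "qattention_example",
    inputs=[
        helper.make_tensor_value_info("input", TensorProto.UINT8, [batch, seq, hidden]),
        helper.make_tensor_value_info("weight", TensorProto.UINT8, [hidden, 3 * hidden]),
        helper.make_tensor_value_info("bias", TensorProto.FLOAT, [3 * hidden]),
        helper.make_tensor_value_info("input_scale", TensorProto.FLOAT, []),
        helper.make_tensor_value_info("weight_scale", TensorProto.FLOAT, []),
        helper.make_tensor_value_info("mask_index", TensorProto.INT32, [batch, seq]),
    ],
    outputs=[
        helper.make_tensor_value_info("output", TensorProto.FLOAT, [batch, seq, hidden]),
    ],
)
model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 14), helper.make_opsetid("com.microsoft", 1)],
)
mod, params = relay.frontend.from_onnx(model)
print(mod)
```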