
[Tensorize][runtime] Add support for AMX (Advanced Matrix Extensions) through Tensor intrinsics #13642

Merged — 22 commits, merged on Jan 5, 2023

Conversation

@Qianshui-Jiang (Contributor) commented Dec 19, 2022

In this PR, we add support for AMX (Advanced Matrix Extensions) through Tensor intrinsics.

We chose to enable AMX support with the INT8 intrinsics and the dense op first, which includes:

  1. Add building options and LLVM AMX intrinsic target detection;
  2. Register a global AMX init and config function;
  3. Integrate LLVM AMX intrinsics as Tensor intrinsics (one for the microkernel computation and one for the accumulation tile store); a usage sketch follows this list;
  4. Add a test case for u8s8s32 matmul using the AMX Tensor intrinsics;
  5. Integrate the int8 dense kernel, with a test case added.

BF16 and more ops will be considered in the future.
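
Below is a minimal sketch (Python, TVM TE) of how tensorize-style intrinsics like these are applied to a u8s8s32 matmul. The factory names amx_u8s8s32_matmul_intrin and amx_acc_store_intrin are placeholders, not this PR's actual identifiers; only the runtime.amx_init global function name comes from the discussion below.

    import tvm
    from tvm import te

    # u8s8s32 matmul: uint8 activations x int8 weights accumulated into int32.
    M, N, K = 1024, 1024, 1024
    X = te.placeholder((M, K), dtype="uint8", name="X")
    W = te.placeholder((N, K), dtype="int8", name="W")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute(
        (M, N),
        lambda i, j: te.sum(X[i, k].astype("int32") * W[j, k].astype("int32"), axis=k),
        name="C",
    )
    s = te.create_schedule(C.op)

    # On real hardware the process must request AMX first; this PR registers a
    # global init function for that (name taken from the discussion below):
    amx_init = tvm.get_global_func("runtime.amx_init", allow_missing=True)
    if amx_init is not None:
        amx_init()

    # After tiling to the AMX tile shape, the inner matmul body is replaced by
    # the microkernel intrinsic and the accumulator write-back by the tile-store
    # intrinsic (factory names below are placeholders):
    #   s[C].tensorize(inner_k_axis, amx_u8s8s32_matmul_intrin())
    #   s[C].tensorize(store_axis, amx_acc_store_intrin())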

@tvm-bot (Collaborator) commented Dec 19, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@Qianshui-Jiang changed the title from "[Tensorize][TOPI] Add support for Intel® AMX(Advanced Matrix Extensions) through Tensor intrinsics" to "[Tensorize][runtime] Add support for AMX(Advanced Matrix Extensions) through Tensor intrinsics" on Dec 19, 2022
@masahi self-assigned this on Dec 19, 2022
@cbalint13 (Contributor) left a comment

Thank you @Qianshui-Jiang for this latest x86 SIMD feature addition!

python/tvm/topi/x86/dense_alter_op.py (review thread, resolved)
int rows = args[0];
int cols = args[1];
LOG(INFO) << "rows: " << rows << ", cols:" << cols;
// -----------Config for AMX tile resgister----------------------
A review comment on the snippet above:

s|resgister|register|

@cbalint13 (Contributor) left a comment
Looks good to me.

@Qianshui-Jiang, just out of curiosity (off-topic):
Do you have a TVM number for AMX performance (e.g. vs. VNNI)?

@Qianshui-Jiang (Contributor, Author) commented Dec 26, 2022

@cbalint13 Thanks for your curiosity!
In our tests, compared to the TVM test case test_fc_int8_acc32(), AMX int8 dense performs about 8X faster on a single physical core.
(Since AMX machines are not officially launched yet, I can only give a little perf data, but more is coming soon~)

@masahi (Member) commented Dec 27, 2022

@Qianshui-Jiang Can we test TVM-generated AMX code using the emulator (SDE)?

@Qianshui-Jiang (Contributor, Author) replied:
@masahi Sure, you can use SDE by assigning platform info like -spr.
But if you are not building on a 5.16 kernel, you have to comment out all the system-related code in runtime.amx_init,
and you don't need to init AMX before the test case when using SDE.

@Qianshui-Jiang (Contributor, Author) added:
@masahi, BTW, SDE would be very slow, so the repeat count for time_evaluator should be kept small.
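
As a hedged illustration of that advice, here is a self-contained timing sketch with a stand-in kernel; the point is only the evaluator settings (number=1, repeat=1), which keep the workload minimal under emulation. The file name in the SDE launch line is illustrative.

    import numpy as np
    import tvm
    from tvm import te

    # Tiny stand-in kernel; any TVM-built module is timed the same way.
    n = 1024
    A = te.placeholder((n,), name="A")
    B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
    s = te.create_schedule(B.op)
    lib = tvm.build(s, [A, B], target="llvm")

    dev = tvm.cpu(0)
    a = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
    b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)

    # Keep number/repeat at 1 when the binary runs under SDE emulation.
    timer = lib.time_evaluator(lib.entry_name, dev, number=1, repeat=1)
    print("mean time (s):", timer(a, b).mean)

    # Launch under the emulator (shell, illustrative file name):
    #   sde64 -spr -- python test_amx_dense.py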

(resolved review threads: python/tvm/topi/x86/dense.py, python/tvm/topi/x86/tensor_intrin.py)
@masahi (Member) left a comment

@@ -293,16 +324,13 @@ def dense_vnni_compute(cfg, X, packed_w, bias=None):
),
axis=ak,
),
tag="dense_vnni",
attrs={"schedule_rule": "dense_vnni"},
@masahi (Member) commented on the diff above:

This shouldn't be removed; it is used here:

register_func("meta_schedule.x86.dense_vnni", schedule_rule_dense_vnni)

Since this only affects MetaSchedule, you don't have to provide this value for AMX. You need to provide this attribute only when dense_int8_compute is called for VNNI.

@Qianshui-Jiang (Contributor, Author) replied:

@masahi Does this case only use the x86 int8 compute method and inject a particular TIR schedule? Can we just change the attribute dense_vnni to dense_int8, which is used here?

@masahi (Member) commented Jan 1, 2023:

Yes, but it is important that we specify that we use this compute for VNNI. If the schedule rule annotation only says "dense_int8", we don't know which intrinsic to tensorize this compute with.

@Qianshui-Jiang (Contributor, Author) replied, quoting the above:

This shouldn't be removed, it is used here
register_func("meta_schedule.x86.dense_vnni", schedule_rule_dense_vnni).
Since this only affects MetaSchedule, you don't have to provide this value for AMX. So only when dense_int8_compute is called for VNNI, you need to provide this attribute.

@masahi Sorry, I may have caused a misunderstanding. I meant: can we use dense_int8 in this test case, since it injects the VNNI intrinsic explicitly?
By default the relay build flow will check whether VNNI or AMX is available and choose a different scheduling.
I've verified that this test case is still functional after the small modification in the commit below.

@masahi (Member) commented Jan 1, 2023:

Of course the test still works, because it was already written for VNNI. The point is that the name dense_vnni tells us that the particular TE expression is meant for VNNI tensorization; in particular, the weight is pre-packed appropriately. But a generic name like dense_int8 doesn't provide such information. Just because AMX can use the same layout doesn't mean we can use dense_int8: if MetaSchedule finds a compute annotated with dense_int8, it cannot tell whether it should apply VNNI or AMX tensorization (if the latter is supported by MS in the future).

So please revert that commit, then restore and pass attrs={"schedule_rule": "meta_schedule.x86.dense_vnni"} when you create a compute expression for VNNI. schedule_rule is not relevant for AMX for now.
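
A minimal sketch of the arrangement described above; every name except the registered rule string "meta_schedule.x86.dense_vnni" is illustrative. The VNNI path passes the schedule_rule annotation to te.compute, while the AMX path omits it:

    from tvm import te

    def make_dense_int8(out_shape, fcompute, for_vnni):
        # Only the VNNI variant advertises a MetaSchedule rule; MS resolves the
        # function registered under this exact string via register_func.
        attrs = {"schedule_rule": "meta_schedule.x86.dense_vnni"} if for_vnni else None
        return te.compute(
            out_shape,
            fcompute,
            tag="dense_vnni" if for_vnni else "dense_amx_int8",  # tags illustrative
            attrs=attrs,
        )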

@Qianshui-Jiang (Contributor, Author) replied:

@masahi Yep, got it; the schedule rule for MS dense_vnni is restored.
It stays, but AMX does not use it for now.
(A small question: how do we know, inside this compute expression, whether it was created for VNNI and not AMX?)

@masahi (Member) replied:

I suggest checking the target string in the op strategy and creating separate computes for VNNI and AMX (rather than using the same function, dense_int8, for both).
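
A hedged sketch of that suggestion; the mcpu lists and returned implementation names are illustrative (TVM keeps similar capability helpers under python/tvm/topi/x86/utils.py):

    import tvm

    def pick_dense_int8_impl():
        # Inspect the current target's mcpu in the op strategy and dispatch to a
        # distinct compute/schedule pair per ISA, instead of one shared dense_int8.
        target = tvm.target.Target.current()
        mcpu = target.mcpu if target else ""
        if mcpu == "sapphirerapids":                   # AMX-capable (illustrative)
            return "dense_amx_int8"
        if mcpu in ("cascadelake", "icelake-server"):  # VNNI-capable (illustrative)
            return "dense_vnni"
        return "dense_generic"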

@masahi (Member) left a comment

Are the changes in the app/ directory intended?

@@ -1 +0,0 @@
qemu-system-i386
@masahi (Member) commented on the diff above:
Is this change relevant? If not, remove it.

@Qianshui-Jiang (Contributor, Author) replied:

@masahi That was a misoperation; it has already been removed.

python/tvm/relay/op/strategy/x86.py (review thread, resolved)
@masahi merged commit 07a5a9e into apache:main on Jan 5, 2023
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
…through Tensor intrinsics (apache#13642)

* add AMX config functions and building option.

* amx tensor intrinsics and u8s8s32 matmul testcase

* add int8 dense kernel use amx tensorize

* add int8 dense kernel use amx tensorize

* add amx init() and config() for dense test case

* correct the amx config

* fix lint.

* fix dense schedule

* remove operation of signal stack

* fix nit

* unified amx and vnni compute, remove dup one

* fix lint

* adopt to x86 int8 dense compute method;

* Revert "adopt to x86 int8 dense compute method;"

This reverts commit 5718a05.

* restore schedule ruls specially for ms dense_vnni

* add vnni ms target attributes

* remove the misoperations

* Revert "restore schedule ruls specially for ms dense_vnni"

This reverts commit 2bda03e.

* add vnni ms target attributes and remove misops

* Revert "add vnni ms target attributes"

This reverts commit c2e9f26.

* remove the misops