[Tensorize][runtime] Add support for AMX (Advanced Matrix Extensions) through Tensor intrinsics #13642
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
Thank you @Qianshui-Jiang for this latest x86 SIMD feature addition!
- Just a few cosmetic nits.
- The autoschedule + autotensorize could be extended, maybe later, in a subsequent PR.
```cpp
int rows = args[0];
int cols = args[1];
LOG(INFO) << "rows: " << rows << ", cols:" << cols;
// -----------Config for AMX tile resgister----------------------
```
s|resgister|register|
Looks good to me.
@Qianshui-Jiang, just curiosity (off-topic):
Do you have a TVM number for AMX performance (e.g. vs. VNNI)?
@cbalint13 Thanks for your curiosity!
@Qianshui-Jiang Can we test TVM-generated AMX code using the emulator (SDE)? |
@masahi Sure, you can use SDE by assigning the platform info.
@masahi BTW, SDE would be very slow, so the repeat count of `time_evaluator` should be small.
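For reference, a minimal sketch of that workflow (the SDE flag, target string, and script name below are assumptions based on Intel SDE documentation, not taken from this PR):

```python
# Run the test script itself under the emulator, e.g.:
#   sde64 -spr -- python test_amx_dense.py
# (-spr selects Sapphire Rapids emulation, which includes AMX.)
import tvm

# Target a CPU whose feature set includes AMX.
target = tvm.target.Target("llvm -mcpu=sapphirerapids")

# After building a runtime module `lib` for this target, keep the
# measurement cheap, since emulation is slow:
#   evaluator = lib.time_evaluator(lib.entry_name, tvm.cpu(0), number=1, repeat=1)
```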
Also please verify that the VNNI test https://github.com/apache/tvm/blob/b16a64d6edb9fd1a014fc51995dff7d0e2f4c84e/tests/python/unittest/test_meta_schedule_vnni_integration.py is still functional after this change.
```diff
@@ -293,16 +324,13 @@ def dense_vnni_compute(cfg, X, packed_w, bias=None):
         ),
         axis=ak,
     ),
-    tag="dense_vnni",
-    attrs={"schedule_rule": "dense_vnni"},
```
This shouldn't be removed; it is used here:

`register_func("meta_schedule.x86.dense_vnni", schedule_rule_dense_vnni)`

Since this only affects MetaSchedule, you don't have to provide this value for AMX. You need to provide this attribute only when `dense_int8_compute` is called for VNNI.
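For context, a minimal sketch of how that registration is consumed (the signature and body below are illustrative, not the test's exact code): MetaSchedule dispatches a block annotated with `schedule_rule` to the packed function registered under that name.

```python
from typing import List

import tvm
from tvm.tir.schedule import BlockRV, Schedule


def schedule_rule_dense_vnni(sch: Schedule, block: BlockRV) -> List[Schedule]:
    # Apply VNNI tensorization to `block` here (omitted in this sketch).
    return [sch]


# MetaSchedule looks up this name when it encounters a block whose
# "schedule_rule" annotation is "meta_schedule.x86.dense_vnni".
tvm.register_func("meta_schedule.x86.dense_vnni", schedule_rule_dense_vnni)
```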
@masahi Does this case only use the x86 int8 compute method and inject a particular TIR schedule? Can we just change the attribute `dense_vnni` to `dense_int8`, which is used here?
Yes, but it is important that we specify that we use this compute for VNNI. If the schedule rule annotation only says "dense_int8", we don't know which intrinsic to tensorize this compute with.
> This shouldn't be removed; it is used here: `register_func("meta_schedule.x86.dense_vnni", schedule_rule_dense_vnni)`. Since this only affects MetaSchedule, you don't have to provide this value for AMX. You need to provide this attribute only when `dense_int8_compute` is called for VNNI.
@masahi Sorry, I may have caused a misunderstanding. I meant: can we use `dense_int8` in this test case, since the VNNI intrinsic is injected explicitly here? And by default in the Relay build flow, it will check whether VNNI or AMX is available and choose a different scheduling. I've verified that this test case is still functional after the small modification in the commit below.
Of course the test still works, because it was already written for VNNI. The point is that the name `dense_vnni` tells us that the particular TE expression is meant for VNNI tensorization; in particular, the weight is pre-packed appropriately. But a generic name like `dense_int8` doesn't provide such information. Just because AMX can use the same layout doesn't mean we can use `dense_int8`. If MetaSchedule finds a compute annotated with `dense_int8`, it cannot tell whether it should apply VNNI or AMX tensorization (if the latter is supported by MS in the future).
So please revert that commit, then restore and pass `attrs={"schedule_rule": "meta_schedule.x86.dense_vnni"}` when you create a compute expression for VNNI. `schedule_rule` is not relevant for AMX for now.
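A minimal sketch of that arrangement (every name below is a placeholder; the real compute is TVM's x86 int8 dense TOPI code):

```python
from tvm import te


def dense_int8_compute(X, W, m, n, k, for_vnni):
    ak = te.reduce_axis((0, k), name="k")
    # Attach the MetaSchedule rule only on the VNNI path; AMX ignores it.
    attrs = {"schedule_rule": "meta_schedule.x86.dense_vnni"} if for_vnni else {}
    return te.compute(
        (m, n),
        lambda i, j: te.sum(
            X[i, ak].astype("int32") * W[j, ak].astype("int32"), axis=ak
        ),
        tag="dense_int8",
        attrs=attrs,
    )
```

This way only the VNNI path carries the annotation, so MetaSchedule never sees an ambiguous `dense_int8` block.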
@masahi Yep, got it. The schedule rule for MS `dense_vnni` is restored; it remains, but AMX doesn't use it for now.
(A small question: how do we know, inside this compute expression, whether it was created for VNNI or AMX?)
I suggest checking the target string in the op strategy and creating separate computes for VNNI and AMX (rather than using the same function, `dense_int8`, for both).
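A sketch of that dispatch (`target_has_vnni` exists in `tvm.topi.x86.utils`; `target_has_amx` and the three compute functions are assumptions for illustration):

```python
from tvm.topi.x86.utils import target_has_amx, target_has_vnni


def pick_dense_int8_compute(target):
    mcpu = target.mcpu  # e.g. "cascadelake" or "sapphirerapids"
    if target_has_amx(mcpu):
        return dense_amx_int8_compute  # hypothetical AMX-specific compute
    if target_has_vnni(mcpu):
        return dense_vnni_compute  # carries the schedule_rule attribute
    return dense_generic_compute  # plain int8 fallback
```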
This reverts commit 5718a05.
Are the changes in the `app/` directory intended?
```diff
@@ -1 +0,0 @@
-qemu-system-i386
```
Is this change relevant? If not, remove it.
@masahi That was an accidental change; it has already been removed.
…through Tensor intrinsics (apache#13642)

* add AMX config functions and building option.
* amx tensor intrinsics and u8s8s32 matmul testcase
* add int8 dense kernel use amx tensorize
* add int8 dense kernel use amx tensorize
* add amx init() and config() for dense test case
* correct the amx config
* fix lint.
* fix dense schedule
* remove operation of signal stack
* fix nit
* unified amx and vnni compute, remove dup one
* fix lint
* adopt to x86 int8 dense compute method;
* Revert "adopt to x86 int8 dense compute method;" This reverts commit 5718a05.
* restore schedule ruls specially for ms dense_vnni
* add vnni ms target attributes
* remove the misoperations
* Revert "restore schedule ruls specially for ms dense_vnni" This reverts commit 2bda03e.
* add vnni ms target attributes and remove misops
* Revert "add vnni ms target attributes" This reverts commit c2e9f26.
* remove the misops
In this PR, we add support for AMX (Advanced Matrix Extensions) through tensor intrinsics. We chose to enable AMX support, the INT8 intrinsics, and the dense op first; BF16 and more ops will be considered in the future.
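As a flavor of the runtime side, here is a sketch of initializing and configuring the AMX tile registers before a tensorized kernel runs (the global-function names are inferred from the `rows`/`cols` config snippet reviewed above and are assumptions to verify against the merged code):

```python
import tvm

# Request AMX permission from the OS, then configure the tile shape.
amx_init = tvm.get_global_func("runtime.amx_init")
amx_tileconfig = tvm.get_global_func("runtime.amx_tileconfig")
amx_init()
amx_tileconfig(16, 64)  # rows, cols — cf. args[0]/args[1] in the snippet above
```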