[GPU] Generic matmul-like not lowering to MFMA by default #19864
nirvedhmeshram added a commit that referenced this issue on Feb 3, 2025:

…#19884) Currently some efforts, such as #19854 and #19520, are ongoing to make the Tile and Fuse matmul pipeline the default. However, these efforts are still WIP in achieving exact parity with the current default, Vector Distribute, in all use cases. In the meantime, this PR tries Tile and Fuse after Vector Distribute, so that we get the benefits of Tile and Fuse, such as handling shapes unaligned to the intrinsic, while leaving the shapes that Vector Distribute handles untouched. Fixes: #19864 Fixes: #19855 --------- Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this issue on Feb 4, 2025, with the same commit message (plus Signed-off-by: Hyunsung Lee <ita9naiwa@gmail.com>).
Issue
Trying to compile some linalg IR with `linalg.generic` ops that are essentially matmuls, the default compiler flags don't result in the matmul-like ops getting lowered to mfma instructions for MI300 (even after trying out a few reasonable-sounding flags like `iree-preprocessing-pad-to-intrinsics`). After asking a few people, I was told to try the flag `--iree-codegen-llvmgpu-test-tile-and-fuse-matmul`, which did generate these instructions.

The option `--debug-only=iree-llvmgpu-kernel-config` (also suggested to me) provided some valuable information about what is going on, but this issue arose when trying to work with other teams who are testing IREE out from a pip install (so I don't think debug info is available to them).

I suppose the ask here is: can we turn on `--iree-codegen-llvmgpu-test-tile-and-fuse-matmul` by default?

A secondary ask: is it reasonable to emit a warning (thinking of non-codegen folks) that says "hey, this big matmul-like op isn't going down a high-performance path, try using *** to see what is going on"?
Generic IR
Here is an IR snippet I was trying to benchmark for MI300:
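The snippet itself did not survive this page capture, so below is a minimal sketch of the kind of `linalg.generic` matmul in question; the shapes, element types, and function name are assumptions, not the original IR:

```mlir
// Hypothetical reconstruction: a linalg.generic with matmul semantics
// (parallel m/n dims, reduction k dim) rather than a named linalg.matmul.
#map_a = affine_map<(m, n, k) -> (m, k)>
#map_b = affine_map<(m, n, k) -> (k, n)>
#map_c = affine_map<(m, n, k) -> (m, n)>
func.func @generic_matmul(%a: tensor<4096x4096xf16>,
                          %b: tensor<4096x4096xf16>,
                          %c: tensor<4096x4096xf32>) -> tensor<4096x4096xf32> {
  %0 = linalg.generic {
      indexing_maps = [#map_a, #map_b, #map_c],
      iterator_types = ["parallel", "parallel", "reduction"]}
      ins(%a, %b : tensor<4096x4096xf16>, tensor<4096x4096xf16>)
      outs(%c : tensor<4096x4096xf32>) {
    ^bb0(%lhs: f16, %rhs: f16, %acc: f32):
      // Accumulate in f32: extend, multiply, add.
      %l = arith.extf %lhs : f16 to f32
      %r = arith.extf %rhs : f16 to f32
      %m = arith.mulf %l, %r : f32
      %s = arith.addf %acc, %m : f32
      linalg.yield %s : f32
  } -> tensor<4096x4096xf32>
  return %0 : tensor<4096x4096xf32>
}
```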
Trying
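(The exact command block was lost in this capture; a representative default compile, assuming an MI300 / gfx942 target and a hypothetical `input.mlir` file, might look like this:)

```shell
# No extra codegen flags: whatever pipeline the compiler picks by default.
iree-compile input.mlir \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o module.vmfb
```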
Then
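(also reconstructed; presumably a benchmark run along these lines, with placeholder function name and inputs:)

```shell
# Benchmark the compiled module on the HIP device.
iree-benchmark-module \
  --device=hip \
  --module=module.vmfb \
  --function=generic_matmul \
  --input=4096x4096xf16 \
  --input=4096x4096xf16 \
  --input=4096x4096xf32
```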
Gave poor performance, so (naively) I took a look at some compile dumps with `--mlir-print-ir-after-all`, and noticed that this matmul was getting converted into `vector.fma` ops instead of `amdgpu.mfma` ops.

When trying
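the pad-to-intrinsics preprocessing (command also lost in the capture; a representative form, using the documented preprocessing-pipeline flag, might be):

```shell
# Same compile as before, plus the pad-to-intrinsics preprocessing pass.
iree-compile input.mlir \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  --iree-preprocessing-pass-pipeline="builtin.module(iree-preprocessing-pad-to-intrinsics)" \
  -o module.vmfb
```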
It does seem to pad the matmul, but I still get `vector.fma` instructions instead of `amdgpu.mfma`.

IR With Named Op
Here is some equivalent IR to the first:
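As with the first snippet, the exact IR was not preserved; an equivalent named-op form, under the same shape/type assumptions as the sketch above, would be:

```mlir
// Hypothetical named-op equivalent of the generic matmul above; the named
// op casts the f16 operands up to the f32 accumulator type implicitly.
func.func @named_matmul(%a: tensor<4096x4096xf16>,
                        %b: tensor<4096x4096xf16>,
                        %c: tensor<4096x4096xf32>) -> tensor<4096x4096xf32> {
  %0 = linalg.matmul
      ins(%a, %b : tensor<4096x4096xf16>, tensor<4096x4096xf16>)
      outs(%c : tensor<4096x4096xf32>) -> tensor<4096x4096xf32>
  return %0 : tensor<4096x4096xf32>
}
```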
With no flags, this also lowers to `vector.fma` ops (probably expected if it doesn't match any intrinsics). With the `pad-to-intrinsics` preprocessing pass, however, it does actually lower to `amdgpu.mfma` and showed significant performance improvement (around 10x faster than the generic version with the same flag). I'm not sure why the preprocessing pass works here and not with the generic op.