[Codegen] Allow padding of dynamic allocas #19399
Drive-by: Instead of using a template, would using `AllocLikeOp` make sense?
(I'm not asking for a change. It is just a question.)
Ah, I didn't know about this. That looks a bit better to me, thanks!
Hmm, actually I think `AllocLikeOp` is just a TableGen class. I don't think I can access it in C++. I'll update the template typename to be more consistent with other code, though.
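Because `AllocLikeOp` exists only as a TableGen class, there is no generated C++ `AllocLikeOp` class that a function could accept as a common parameter type. A function template works instead: it is instantiated once per op type and only requires that each type expose the same members. The sketch below is illustrative only. `AllocOp`, `AllocaOp`, and `describe` are hypothetical stand-in structs and a hypothetical helper, not the real MLIR classes or the code in this PR, and the `OpTy` parameter name is an assumed naming convention.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for memref::AllocOp and memref::AllocaOp. They are
// unrelated types: neither derives from the other or from a shared
// "AllocLikeOp" class, mirroring how AllocLikeOp exists only in TableGen.
struct AllocOp {
  std::vector<int64_t> shape;
  static std::string name() { return "alloc"; }
};

struct AllocaOp {
  std::vector<int64_t> shape;
  static std::string name() { return "alloca"; }
};

// One function template serves both op types. It is instantiated separately
// for each OpTy and only requires that the type provide name() and shape.
template <typename OpTy>
std::string describe(const OpTy &op) {
  return OpTy::name() + " of rank " + std::to_string(op.shape.size());
}
```

For example, `describe(AllocOp{{4, 8}})` yields `"alloc of rank 2"` and `describe(AllocaOp{{16}})` yields `"alloca of rank 1"`. If one op type lacked a required member, the error would surface at compile time in that instantiation.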
LGTM
Chatted with Max offline. Max is right, please ignore my comment.
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
… shapes (#19484)

This PR does two things:

1. Allow all GEMM shapes to use the padded TileAndFuse Matmul configuration. This is still behind the `iree-codegen-llvmgpu-test-tile-and-fuse-matmul=false` flag by default and does not change the default behavior. However, the following PRs, which have landed in the past month, make it possible to relax the guards we originally had on this: #19196, #19307, llvm/llvm-project#117340.
2. Allow fused producers to use the padded TileAndFuse Matmul configuration. The following PRs make this possible now: #19399, llvm/llvm-project#119039.

Together, these changes allow us to do padded IGEMM with intrinsics for shapes unaligned to the intrinsic, which we use by default. [Here](https://docs.google.com/spreadsheets/d/1O-SdUZCn5pHsxx7JTGjIIdH6PWCFnvlfe4XBbjEBaIM/edit?gid=0#gid=0) is the performance difference observed in the conv cases in iree-kernel-benchmark-module that use this change. A median speedup of 2.26x was observed.

The numeric changes I observed when enabling this path were the same as for any aligned shape when comparing intrinsic vs. non-intrinsic code. Some differences are generally noticed for narrow types like f16, but they are within a relative error of 0.001. Since our tests use absolute errors, we may have to change some test values to account for this change.

The perf difference in CI seems to be within the noise margin compared to main: https://github.com/iree-org/iree/actions/runs/12323399269/attempts/1#summary-34399247902

---------

Signed-off-by: Nirvedh <nirvedh@gmail.com>
This PR adds padding support for allocas in the PadDynamicAllocsPass. Allocas are padded the same way as allocs.
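As we understand the pass, it replaces a dynamically shaped allocation with a statically shaped one, sizing each dynamic dimension by a computed upper bound, and serves the original users through a view of the runtime size; this PR applies the same transformation to allocas. The sketch below shows only the shape computation, in plain C++ with no MLIR dependency. `padShape`, `kDynamic`, and the per-dimension `upperBounds` input (standing in for whatever bounds analysis the pass actually uses) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sentinel for a dynamic dimension. MLIR uses its own
// ShapedType::kDynamic constant; the value here is arbitrary.
constexpr int64_t kDynamic = -1;

// Hypothetical helper: given a shape and, for each dimension, an optional
// upper bound, return a fully static shape. Returns std::nullopt if any
// dynamic dimension has no known bound, since it cannot be padded.
std::optional<std::vector<int64_t>> padShape(
    const std::vector<int64_t> &shape,
    const std::vector<std::optional<int64_t>> &upperBounds) {
  assert(shape.size() == upperBounds.size() && "one bound slot per dim");
  std::vector<int64_t> padded;
  padded.reserve(shape.size());
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] != kDynamic) {
      padded.push_back(shape[i]);  // Static dims are kept as-is.
      continue;
    }
    if (!upperBounds[i])
      return std::nullopt;  // Unbounded dynamic dim: leave the op alone.
    padded.push_back(*upperBounds[i]);  // Pad up to the upper bound.
  }
  return padded;
}
```

Because the padded buffer covers the largest size the original buffer can take, a view at offset zero with the runtime sizes can stand in for the original allocation. The same computation applies whether the op being rewritten is an alloc or an alloca, which matches the statement that the padding works the same for both.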