
[Codegen] Allow padding of dynamic allocas #19399

Merged: 2 commits into iree-org:main on Dec 13, 2024

Conversation

@Max191 Max191 (Contributor) commented Dec 6, 2024

This PR adds support for padding allocas in the PadDynamicAllocsPass. The padding works the same way for alloca as for alloc.
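For context, "padding" here conceptually means rounding a dynamic allocation size up to a chosen multiple, allocating the slightly larger buffer, and then viewing the original extent out of it. Below is a minimal sketch of the size round-up only; `getPaddedSize` and `paddingMultiple` are illustrative names, not the pass's actual API.

```cpp
#include <cstdint>

// Round a dynamic dimension size up to the next multiple of
// paddingMultiple. The padded buffer is allocated with this size, and the
// original extent is recovered through a subview of the padded allocation.
int64_t getPaddedSize(int64_t dynamicSize, int64_t paddingMultiple) {
  return ((dynamicSize + paddingMultiple - 1) / paddingMultiple) *
         paddingMultiple;
}
```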

@hanhanW hanhanW (Contributor) left a comment


Drive-by: Instead of using template, would using AllocLikeOp make sense?

(I'm not asking for a change. It is just a question.)

https://github.com/llvm/llvm-project/blob/12bdeba76eef1c7adf004a280036a7fb690ba573/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td#L57-L67

@Max191 Max191 (Contributor, Author) commented Dec 6, 2024

Drive-by: Instead of using template, would using AllocLikeOp make sense?

(I'm not asking for a change. It is just a question.)

https://github.com/llvm/llvm-project/blob/12bdeba76eef1c7adf004a280036a7fb690ba573/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td#L57-L67

Ah, I didn't know about this. That looks a bit better to me, thanks!

@Max191 Max191 (Contributor, Author) commented Dec 6, 2024

Drive-by: Instead of using template, would using AllocLikeOp make sense?

Hmm, actually I think AllocLikeOp is just a tablegen class. I don't think I can access it in C++. I'll update the template typename to be more consistent with other code, though.

@Max191 Max191 force-pushed the pad-dynamic-allocas branch from 4ebd47f to 1d704a2 on December 6, 2024 19:07
@nirvedhmeshram nirvedhmeshram (Contributor) left a comment


LGTM

@hanhanW hanhanW (Contributor) commented Dec 6, 2024

Drive-by: Instead of using template, would using AllocLikeOp make sense?

Hmm, actually I think AllocLikeOp is just a tablegen class. I don't think I can access it in C++. I'll update the template typename to be more consistent with other code, though.

Chatted with Max offline. Max is right, please ignore my comment.
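For readers following the thread: a minimal sketch of the templated approach discussed above, assuming a single OpRewritePattern instantiated for both memref.alloc and memref.alloca. The pattern and helper names here are illustrative; the actual IREE code may differ.

```cpp
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/IR/PatternMatch.h"

namespace {
// One pattern handles both alloc-like ops via a template parameter.
template <typename AllocLikeOpTy>
struct PadDynamicAllocLikeOp : public mlir::OpRewritePattern<AllocLikeOpTy> {
  using mlir::OpRewritePattern<AllocLikeOpTy>::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(AllocLikeOpTy allocLikeOp,
                  mlir::PatternRewriter &rewriter) const override {
    // Padding logic elided: round dynamic sizes up, create a padded
    // allocation, and replace uses with a subview of the original shape.
    return mlir::failure();
  }
};
} // namespace

// When populating the pass's patterns, instantiate for both ops, e.g.:
//   patterns.add<PadDynamicAllocLikeOp<mlir::memref::AllocOp>,
//                PadDynamicAllocLikeOp<mlir::memref::AllocaOp>>(context);
```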

@Max191 Max191 force-pushed the pad-dynamic-allocas branch from 1d704a2 to 03d0870 on December 13, 2024 16:48
@Max191 Max191 enabled auto-merge (squash) December 13, 2024 16:49
@Max191 Max191 merged commit 99b600f into iree-org:main Dec 13, 2024
39 checks passed
nirvedhmeshram added a commit that referenced this pull request Dec 18, 2024
… shapes (#19484)

This PR does two things:
1. Allow all GEMM shapes to use the padded TileAndFuse Matmul configuration. This is still behind the `iree-codegen-llvmgpu-test-tile-and-fuse-matmul=false` flag by default and does not change the default behavior. However, the following PRs that have landed in the past month make it possible to relax the guards we originally had on this:
   #19196
   #19307
   llvm/llvm-project#117340
2. Allow fused producers to use the padded TileAndFuse Matmul configuration. The following PRs make this possible now:
   #19399
   llvm/llvm-project#119039

Together, this allows us to do padded IGEMM with intrinsics for shapes unaligned to the intrinsic, which we use by default.
[Here](https://docs.google.com/spreadsheets/d/1O-SdUZCn5pHsxx7JTGjIIdH6PWCFnvlfe4XBbjEBaIM/edit?gid=0#gid=0)
is the performance difference observed in conv cases in
iree-kernel-benchmark-module that utilize this change. A median speedup
of 2.26x was observed.

The numeric changes I observed from enabling this path were the same for
any aligned shape when comparing intrinsic vs. no-intrinsic use.
Some differences are generally noticed for narrow types like f16, but
they are within a relative error of 0.001; since our tests use absolute
errors, we may have to change some test values to account for this change.

The perf difference in CI seems to be within the noise margin compared to main:
https://github.com/iree-org/iree/actions/runs/12323399269/attempts/1#summary-34399247902

---------

Signed-off-by: Nirvedh <nirvedh@gmail.com>