Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor performance of horizontally fused contraction #19855

Closed
IanWood1 opened this issue Jan 30, 2025 · 0 comments
Closed

Poor performance of horizontally fused contraction #19855

IanWood1 opened this issue Jan 30, 2025 · 0 comments
Assignees
Labels
bug 🐞 Something isn't working codegen Shared code generation infrastructure and dialects

Comments

@IanWood1
Copy link
Contributor

IanWood1 commented Jan 30, 2025

What happened?

See #19847 for some context, the changes there had the side effect of allowing more horizontal fusion. However, this was leading to a slow dispatch.

I noticed disabling horizontal fusion was improving unet perf on the base commit too (but by less), so this issue may be affecting main.

Steps to reproduce your issue

Running fp16 Unet at dd66e55 will reproduce the slow dispatch. It can also be obtained by downloading this gist

Version information

Dispatch was created using dd66e55 from the PR but codegen issues are from main at 50a7087 (PR's base commit)

@IanWood1 IanWood1 added the bug 🐞 Something isn't working label Jan 30, 2025
@IanWood1 IanWood1 added the codegen Shared code generation infrastructure and dialects label Jan 30, 2025
nirvedhmeshram added a commit that referenced this issue Feb 4, 2025
…#19857)

This PR matches the failure criteria to what we see in the
SetContractConfig for vector distribute for bailing out on skinny
matmuls.

The dispatch in #19855 goes to
0.068 ms vs the default path which gets 1.64 ms as this skinny matmul
with multiple dims cannot be currently supported by vector reduction and
warp reduction but tile and fuse can support it using padding.

This needs the flag
`--iree-codegen-llvmgpu-test-tile-and-fuse-matmul=true`

Also the `GPUMatmulShapeType` was becoming too large as this PR adds
batch sizes to it and was giving the following error.
```
 error: static_assert failed due to requirement 'sizeof(mlir::iree_compiler::GPUMatmulShapeType) <= 256' "You are trying to use a default number of inlined elements for SmallVector<T> but sizeof(T) is really big! Please use an explicit number of inlined elements with SmallVector<T, N> to make sure you really want that much inline storage."
 ```
 This PR fixes this issue both by explictly mentioning vector sizes in the struct members.

---------

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this issue Feb 4, 2025
…iree-org#19884)

Currently some efforts such as
iree-org#19854 and
iree-org#19520 are ongoing to make the Tile
and Fuse matmul pipeline on by default. However, these efforts are still
WIP in achieving exact parity with the current default of Vector
Distribute in all use cases.
This PR in the time being tries Tile and Fuse after Vector Distribute so
that we can get the benefits of tile and fuse such as handling unaligned
to intrinsic shape while leaving the shapes that vector distribute
handles untouched.
Fixes :  iree-org#19864
Fixes: iree-org#19855

---------

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
Signed-off-by: Hyunsung Lee <ita9naiwa@gmail.com>
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this issue Feb 4, 2025
…iree-org#19857)

This PR matches the failure criteria to what we see in the
SetContractConfig for vector distribute for bailing out on skinny
matmuls.

The dispatch in iree-org#19855 goes to
0.068 ms vs the default path which gets 1.64 ms as this skinny matmul
with multiple dims cannot be currently supported by vector reduction and
warp reduction but tile and fuse can support it using padding.

This needs the flag
`--iree-codegen-llvmgpu-test-tile-and-fuse-matmul=true`

Also the `GPUMatmulShapeType` was becoming too large as this PR adds
batch sizes to it and was giving the following error.
```
 error: static_assert failed due to requirement 'sizeof(mlir::iree_compiler::GPUMatmulShapeType) <= 256' "You are trying to use a default number of inlined elements for SmallVector<T> but sizeof(T) is really big! Please use an explicit number of inlined elements with SmallVector<T, N> to make sure you really want that much inline storage."
 ```
 This PR fixes this issue both by explictly mentioning vector sizes in the struct members.

---------

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
Signed-off-by: Hyunsung Lee <ita9naiwa@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐞 Something isn't working codegen Shared code generation infrastructure and dialects
Projects
None yet
Development

No branches or pull requests

2 participants