Poor performance of horizontally fused contraction #19855
This was referenced Jan 30, 2025
nirvedhmeshram added a commit that referenced this issue on Feb 4, 2025:
…#19857) This PR matches the failure criteria to the criteria used in SetContractConfig for vector distribute when bailing out on skinny matmuls. With this change, the dispatch in #19855 takes 0.068 ms versus 1.64 ms on the default path: this skinny matmul with multiple dims cannot currently be supported by vector reduction or warp reduction, but tile and fuse can support it using padding. This needs the flag `--iree-codegen-llvmgpu-test-tile-and-fuse-matmul=true`.

Also, `GPUMatmulShapeType` was becoming too large because this PR adds batch sizes to it, which produced the following error:

```
error: static_assert failed due to requirement 'sizeof(mlir::iree_compiler::GPUMatmulShapeType) <= 256' "You are trying to use a default number of inlined elements for SmallVector<T> but sizeof(T) is really big! Please use an explicit number of inlined elements with SmallVector<T, N> to make sure you really want that much inline storage."
```

This PR fixes that issue by explicitly specifying the inline vector sizes of the struct members.

--------

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this issue on Feb 4, 2025:
…iree-org#19884) Efforts such as iree-org#19854 and iree-org#19520 are ongoing to enable the Tile and Fuse matmul pipeline by default, but they are still WIP in reaching exact parity with the current default, Vector Distribute, in all use cases. In the meantime, this PR tries Tile and Fuse after Vector Distribute, so that we get the benefits of tile and fuse, such as handling shapes unaligned to the intrinsic, while leaving the shapes that Vector Distribute handles untouched.

Fixes: iree-org#19864
Fixes: iree-org#19855

--------

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
Signed-off-by: Hyunsung Lee <ita9naiwa@gmail.com>
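The fallback order described in that commit message can be sketched as a simple "try the default, fall back on bail-out" selection. This is an illustrative sketch only; the function and type names are hypothetical, not the actual IREE API:

```cpp
#include <cstdint>
#include <optional>
#include <string>

struct MatmulShape { int64_t m, n, k; };
struct Config { std::string pipeline; };

// Stand-in: Vector Distribute bails out on shapes unaligned to the
// intrinsic (a simplified, assumed criterion for illustration).
std::optional<Config> setVectorDistributeConfig(const MatmulShape &s) {
  const int64_t intrinsic = 16;  // assumed intrinsic size
  if (s.m % intrinsic || s.n % intrinsic || s.k % intrinsic)
    return std::nullopt;  // unaligned: bail out
  return Config{"vector_distribute"};
}

// Stand-in: Tile and Fuse can pad, so it also accepts unaligned shapes.
std::optional<Config> setTileAndFuseConfig(const MatmulShape &s) {
  (void)s;
  return Config{"tile_and_fuse"};
}

// Try the current default first; fall back only when it bails out, so
// shapes Vector Distribute already handles are left untouched.
std::optional<Config> selectConfig(const MatmulShape &s) {
  if (auto c = setVectorDistributeConfig(s)) return c;
  return setTileAndFuseConfig(s);
}
```

With this ordering, an aligned shape keeps its current Vector Distribute configuration, while an unaligned one picks up Tile and Fuse instead of failing outright.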
What happened?
See #19847 for some context: the changes there had the side effect of allowing more horizontal fusion, which led to a slow dispatch.
I noticed that disabling horizontal fusion also improved UNet performance on the base commit (though by less), so this issue may be affecting main as well.
Steps to reproduce your issue
Running fp16 UNet at dd66e55 will reproduce the slow dispatch. It can also be obtained by downloading this gist.
Version information
The dispatch was created using dd66e55 from the PR, but the codegen issues are reproducible from main at 50a7087 (the PR's base commit).