Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DispatchCreation] Make truncate operations fuse with producers. #19847

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

MaheshRavishankar
Copy link
Contributor

No description provided.

@IanWood1 IanWood1 force-pushed the truncate_with_producer branch 2 times, most recently from 2ced301 to cebecc1 Compare January 29, 2025 21:44
@IanWood1
Copy link
Contributor

IanWood1 commented Jan 30, 2025

https://gist.github.com/IanWood1/0aceb30d6fc0a5a99ccca68e9eac6cba from fp16 unet is still a problem, and takes 2ms (~500ms total) when it should be in the single digit microsecond range

Edit: for some reason this change is causing more horizontal fusion to kick in leading to this bad dispatch

@IanWood1 IanWood1 force-pushed the truncate_with_producer branch 2 times, most recently from 6d767f6 to 2dbb8a5 Compare January 30, 2025 03:25
@MaheshRavishankar
Copy link
Contributor Author

Could you file an issue for the changed dispatch. @nirvedhmeshram can you help fixing that

@nirvedhmeshram
Copy link
Contributor

nirvedhmeshram commented Jan 30, 2025

The layout is pretty bad for this dispatch it is (M,K) * (B, N, K) -> (B, M, N). I can explore if we can improve perf for this, or should we be trying to make this (B,M,K) * (N, K) -> (B, M, N) followed by a transpose to make that (B, N, M) before dispatch creation?

@qedawkins
Copy link
Contributor

Why is (M,K) * (B, N, K) -> (B, M, N) a poor layout? This looks close to a typical matmul_transpose_b to me.

@nirvedhmeshram
Copy link
Contributor

Why is (M,K) * (B, N, K) -> (B, M, N) a poor layout? This looks close to a typical matmul_transpose_b to me.

I figured we would rather have the batch dimension on the lhs, rather than the the rhs, but you are right it shouldnt matter. Let me dig deeper if its just something in our default config.

@nirvedhmeshram
Copy link
Contributor

Currently this going though tile-and-fuse vectorize path, I even relaxed the limit of 4 we have to classify as matvec and it still gave

[iree-gpu-config-utils]: Matmul TileAndFuse Config
[iree-gpu-config-utils]: Attempting to deduce unaligned TileAndFuse MMA schedulee
[iree-gpu-config-utils]: Failed to deduce TileAndFuse MMA schedule

@nirvedhmeshram
Copy link
Contributor

Attempting to deduce unaligned TileAndFuse MMA schedulee

Sorry I made a bug in relaxing this, when done correctly we get, 0.068 ms vs the default path which gets 1.64 ms

MaheshRavishankar added a commit to MaheshRavishankar/iree that referenced this pull request Jan 30, 2025
… to same trunc operator.

Currently the fusion does not always fuse truncates with its producers
(that is being fixed in iree-org#19847),
but without that the truncate operators could be the same. Allow
horizontal fusion to work with that case.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
@IanWood1
Copy link
Contributor

@nirvedhmeshram could you ping me when you have a branch, I want to bench that change + re-enable horizontal fusion.

@nirvedhmeshram
Copy link
Contributor

@nirvedhmeshram could you ping me when you have a branch, I want to bench that change + re-enable horizontal fusion.

@IanWood1 PR is ready #19857 you need the flag --iree-codegen-llvmgpu-test-tile-and-fuse-matmul=true,

MaheshRavishankar and others added 4 commits February 5, 2025 13:22
Signed-off-by: MaheshRavishankar <mravisha@amd.com>
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants