[GPU] Match Tile And Fuse skinny matmul bail-out to Vector Distribute #19857
Conversation
I think we'd need a bit more evidence to change this heuristic. Can you run this over matvec-like shapes with, say, 1x to 8x columns in the 'vector' operand and report the numbers? We can add these to iree-kernel-benchmark.
When I benchmarked this in the past, the vector reduction pipeline did a very good job on matvecs with 4 columns.
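The shape sweep suggested above could be sketched as follows. This is a hypothetical illustration, not code from the PR: the base M/K sizes and the function name are assumptions, and a real sweep would feed these shapes into iree-kernel-benchmark.

```cpp
#include <cstdint>
#include <tuple>
#include <vector>

// Enumerate matvec-like problem sizes (M, N, K) where the 'vector'
// operand has 1x to maxCols columns. Base sizes are illustrative only.
std::vector<std::tuple<int64_t, int64_t, int64_t>>
matvecLikeShapes(int64_t m = 4096, int64_t k = 4096, int64_t maxCols = 8) {
  std::vector<std::tuple<int64_t, int64_t, int64_t>> shapes;
  for (int64_t n = 1; n <= maxCols; ++n)
    shapes.emplace_back(m, n, k);  // skinny N dimension: 1, 2, ..., maxCols
  return shapes;
}
```

Each resulting shape would then be compiled and timed under both pipelines to compare the bail-out heuristic against Tile and Fuse.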
@kuhar would something like this work?
Can we make the path that does not map directly to an intrinsic because of its shape go down Tile and Fuse by default?
I think we need this case split:
Points 2 and 3 are in scope for this PR. I have reworked this PR so that we correctly bail out only on those cases that vector reduction would actually support.
@kuhar I did not end up doing the shape sweep, as I am not changing the heuristic.
compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.cpp
LGTM
Seems fine
compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp
Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
The W7900 runner seems down, but since the other runners are green, I believe this is okay to merge, so landing it.
This PR matches the failure criteria for bailing out on skinny matmuls to what Vector Distribute uses in SetContractConfig.

The dispatch in #19855 goes to 0.068 ms, vs 1.64 ms on the default path, as this skinny matmul with multiple dims cannot currently be supported by vector reduction or warp reduction, but Tile and Fuse can support it using padding. This needs the flag `--iree-codegen-llvmgpu-test-tile-and-fuse-matmul=true`.

Also, `GPUMatmulShapeType` was becoming too large because this PR adds batch sizes to it, which produced the following error:

```
error: static_assert failed due to requirement 'sizeof(mlir::iree_compiler::GPUMatmulShapeType) <= 256' "You are trying to use a default number of inlined elements for SmallVector<T> but sizeof(T) is really big! Please use an explicit number of inlined elements with SmallVector<T, N> to make sure you really want that much inline storage."
```

This PR fixes the issue by explicitly specifying the inline vector sizes of the struct members.

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
Signed-off-by: Hyunsung Lee <ita9naiwa@gmail.com>
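The static_assert mechanism behind that error can be sketched in a self-contained way. This is not the actual LLVM `SmallVector` layout or the real `GPUMatmulShapeType` definition; `TinySmallVec` and `ShapeTypeSketch` are hypothetical stand-ins, and the inline count of 2 is an assumed value chosen for illustration. LLVM only computes a default inline element count for `SmallVector<T>` when `sizeof(T) <= 256`, so giving each member an explicit count keeps the containing struct small enough.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal stand-in for llvm::SmallVector<T, N>: a header (pointer, size,
// capacity) plus N elements of inline storage. Real SmallVector differs,
// but the size arithmetic is the same in spirit.
template <typename T, size_t N>
struct TinySmallVec {
  T inlineStorage[N];          // inline buffer for the first N elements
  T *begin = inlineStorage;
  size_t size = 0;
  size_t capacity = N;
};

// Hypothetical stand-in for GPUMatmulShapeType after batch dims are added.
// An explicit inline count (2 here) bounds each member at a few dozen
// bytes, keeping the whole struct under LLVM's 256-byte threshold so it
// can itself be stored in a SmallVector without the static_assert firing.
struct ShapeTypeSketch {
  TinySmallVec<int64_t, 2> mSizes, nSizes, kSizes, batchSizes;
};

static_assert(sizeof(ShapeTypeSketch) <= 256,
              "explicit inline counts keep the struct under the limit");
```

With a large default inline count instead, four such members would push `sizeof` past 256 bytes, which is exactly the condition the error message above complains about.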