
[GPU] Match Tile And Fuse skinny matmul bail-out to Vector Distribute #19857

Merged

Conversation


@nirvedhmeshram nirvedhmeshram commented Jan 30, 2025

This PR matches the Tile and Fuse bail-out criteria for skinny matmuls to what we have in SetContractConfig for Vector Distribute.

With this change, the dispatch in #19855 runs in 0.068 ms versus 1.64 ms on the default path: this skinny matmul with multiple dims is not currently supported by vector reduction or warp reduction, but Tile and Fuse can handle it using padding.

This needs the flag `--iree-codegen-llvmgpu-test-tile-and-fuse-matmul=true`.

Also, `GPUMatmulShapeType` was becoming too large because this PR adds batch sizes to it, which triggered the following error:

```
error: static_assert failed due to requirement 'sizeof(mlir::iree_compiler::GPUMatmulShapeType) <= 256' "You are trying to use a default number of inlined elements for SmallVector<T> but sizeof(T) is really big! Please use an explicit number of inlined elements with SmallVector<T, N> to make sure you really want that much inline storage."
```

This PR fixes the issue by explicitly specifying the number of inlined elements for the SmallVector struct members.
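
For illustration, here is a minimal sketch of that kind of fix, assuming the struct roughly holds per-dimension size lists and element types; the member names and the inline capacity of 2 are assumptions, not the actual IREE definition:

```cpp
#include <cstdint>

#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/Types.h"

// Sketch only: giving each SmallVector member an explicit inline capacity
// (2 here) bounds sizeof(GPUMatmulShapeType), so code that stores these in a
// default-sized SmallVector<GPUMatmulShapeType> no longer trips the
// sizeof(T) <= 256 static_assert.
struct GPUMatmulShapeType {
  llvm::SmallVector<int64_t, 2> mSizes;
  llvm::SmallVector<int64_t, 2> nSizes;
  llvm::SmallVector<int64_t, 2> kSizes;
  llvm::SmallVector<int64_t, 2> batchSizes;  // hedged guess at the new member
  mlir::Type aType, bType, cType;
};
```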


@kuhar kuhar left a comment


I think we'd need a bit more evidence to change this heuristic. Can you run this over matvec-like shapes with, say, 1x to 8x columns in the 'vector' operand and report the numbers? We can add these to iree-kernel-benchmark.

When I benchmarked this in the past, the vector reduction pipeline did a very good job on matvecs with 4 columns.

@nirvedhmeshram
Contributor Author

@kuhar would something like this work?
```python
for i in (1, 2, 4, 8):
    NARROW = [
        (i, 128, 128),
        (i, 256, 256),
        (i, 512, 512),
        (i, 1024, 1024),
        (i, 2048, 2048),
        (i, 4096, 4096),
        (i, 8192, 8192),
        (128, i, 128),
        (256, i, 256),
        (512, i, 512),
        (1024, i, 1024),
        (2048, i, 2048),
        (4096, i, 4096),
        (8192, i, 8192),
    ]
```

@MaheshRavishankar
Contributor

Can we make the path that doesn't map directly to an intrinsic because of its shapes go down Tile and Fuse by default?

@kuhar
Member

kuhar commented Jan 30, 2025

> Can we make the path that doesn't map directly to an intrinsic because of its shapes go down Tile and Fuse by default?

I think we need this case split:

  1. Multiple of the intrinsic of supported type --> Default (vector distribution or tile and fuse)
  2. Matvec-like --> warp reduction
  3. Everything else --> tile and fuse

@nirvedhmeshram nirvedhmeshram force-pushed the relax_matvec_threshold_further branch from 09ff19c to 01a6ae6 Compare January 31, 2025 19:14
@nirvedhmeshram nirvedhmeshram changed the title [GPU] Only dont do padding for pure matvecs [GPU] Match Tile And Fuse skinny matmul bail-out to Vector Distribute Jan 31, 2025
@nirvedhmeshram
Contributor Author

> Can we make the path that doesn't map directly to an intrinsic because of its shapes go down Tile and Fuse by default?

> I think we need this case split:
>
>   1. Multiple of the intrinsic of supported type --> Default (vector distribution or tile and fuse)
>   2. Matvec-like --> warp reduction
>   3. Everything else --> tile and fuse

Points 2 and 3 are in scope for this PR. I have reworked this PR so that we bail out only on those cases that vector reduction would actually support.
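
To make the case split concrete, here is a minimal standalone sketch; `MatmulShape`, `alignsToIntrinsic`, and the multiple-of-16 placeholder check are assumptions for illustration, not the actual IREE heuristic:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical problem description, not the IREE data structure.
struct MatmulShape {
  std::vector<int64_t> m, n, k, batch;
};

enum class Pipeline { VectorDistribute, WarpReduction, TileAndFuse };

// Placeholder alignment check: treat M/N/K sizes that are all multiples of 16
// as intrinsic-aligned. The real heuristic consults the target's MMA intrinsics.
static bool alignsToIntrinsic(const MatmulShape &shape) {
  auto aligned = [](const std::vector<int64_t> &dims) {
    for (int64_t d : dims)
      if (d % 16 != 0)
        return false;
    return true;
  };
  return aligned(shape.m) && aligned(shape.n) && aligned(shape.k);
}

// Matvec-like: one of the parallel dimensions collapses to 1, which is the
// skinny case that vector/warp reduction handles well.
static bool isMatvecLike(const MatmulShape &shape) {
  auto allOnes = [](const std::vector<int64_t> &dims) {
    for (int64_t d : dims)
      if (d != 1)
        return false;
    return true;
  };
  return allOnes(shape.m) || allOnes(shape.n);
}

static Pipeline selectPipeline(const MatmulShape &shape) {
  if (alignsToIntrinsic(shape))
    return Pipeline::VectorDistribute;  // aligned shapes: default pipelines
  if (isMatvecLike(shape))
    return Pipeline::WarpReduction;     // skinny cases vector reduction supports
  return Pipeline::TileAndFuse;         // everything else, handled via padding
}
```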

@nirvedhmeshram nirvedhmeshram requested a review from kuhar January 31, 2025 19:20
@nirvedhmeshram
Contributor Author

@kuhar I did not end up doing the shape sweep since I am not changing the heuristic.

@jerryyin jerryyin self-requested a review February 3, 2025 21:05

@jerryyin jerryyin left a comment


LGTM


@kuhar kuhar left a comment


Seems fine

@nirvedhmeshram nirvedhmeshram force-pushed the relax_matvec_threshold_further branch 3 times, most recently from eb9609f to 5e66560 Compare February 3, 2025 21:42
@nirvedhmeshram nirvedhmeshram force-pushed the relax_matvec_threshold_further branch from a378bfb to f613634 Compare February 3, 2025 23:10
@nirvedhmeshram nirvedhmeshram force-pushed the relax_matvec_threshold_further branch from d3b37a9 to e1c3217 Compare February 3, 2025 23:49
@nirvedhmeshram
Contributor Author

The W7900 runner seems to be down, but since the other runners are green, I believe this is okay to merge, so landing it.

@nirvedhmeshram nirvedhmeshram merged commit d96a3f0 into iree-org:main Feb 4, 2025
42 checks passed
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this pull request Feb 4, 2025