Propagate reshapes through generics with reduction iterators #18857
Conversation
Tracked the regression to the following pattern (input to …):
// Note %420 is used by %421 and %423
%420 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%collapsed_476 : tensor<32x4x1048576xf16>) outs(%358 : tensor<32x4x1048576xf32>) {
^bb0(%in: f16, %out: f32):
%436 = arith.extf %in : f16 to f32
linalg.yield %436 : f32
} -> tensor<32x4x1048576xf32>
%421 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0)>], iterator_types = ["parallel", "reduction", "reduction"]} ins(%420 : tensor<32x4x1048576xf32>) outs(%20 : tensor<32xf32>) {
^bb0(%in: f32, %out: f32):
%436 = arith.addf %in, %out : f32
linalg.yield %436 : f32
} -> tensor<32xf32>
%422 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%421 : tensor<32xf32>) outs(%19 : tensor<32xf32>) {
^bb0(%in: f32, %out: f32):
%436 = arith.divf %in, %cst_3 : f32
linalg.yield %436 : f32
} -> tensor<32xf32>
%423 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0)>, affine_map<(d0, d1, d2) -> (d0)>], iterator_types = ["parallel", "reduction", "reduction"]} ins(%420, %422 : tensor<32x4x1048576xf32>, tensor<32xf32>) outs(%20 : tensor<32xf32>) {
^bb0(%in: f32, %in_488: f32, %out: f32):
%436 = arith.subf %in, %in_488 : f32
%437 = arith.mulf %436, %436 : f32
%438 = arith.addf %437, %out : f32
linalg.yield %438 : f32
} -> tensor<32xf32>
%expanded_477 = tensor.expand_shape %422 [[0, 1]] output_shape [1, 32] : tensor<32xf32> into tensor<1x32xf32>
%expanded_478 = tensor.expand_shape %423 [[0, 1]] output_shape [1, 32] : tensor<32xf32> into tensor<1x32xf32>
(force-pushed from a39d2e4 to 4397805)
// CHECK: %[[GEN1:.+]] = linalg.generic
// CHECK-SAME: ["parallel", "reduction", "reduction"]
Note: this is because the `collapse_shape` (from drop unit dims) now gets propagated down, and `CollapseDimensionsPass` won't collapse operations with multiple reduction iterators (waiting for codegen support).
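For illustration, a minimal sketch (hypothetical, not output of the pass) of the collapsed form that `CollapseDimensionsPass` would need to produce for `%421` above once codegen supports it: the two contiguous reduction dims d1 and d2 fold into one, and the 3-D reduction becomes 2-D. `%420` and `%20` refer to the values in the snippet above.

```mlir
// Fold the contiguous reduction dims (4 x 1048576 = 4194304).
%collapsed = tensor.collapse_shape %420 [[0], [1, 2]]
    : tensor<32x4x1048576xf32> into tensor<32x4194304xf32>
// Same sum reduction, now with a single reduction iterator.
%sum = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                     affine_map<(d0, d1) -> (d0)>],
    iterator_types = ["parallel", "reduction"]}
    ins(%collapsed : tensor<32x4194304xf32>) outs(%20 : tensor<32xf32>) {
^bb0(%in: f32, %out: f32):
  %0 = arith.addf %in, %out : f32
  linalg.yield %0 : f32
} -> tensor<32xf32>
```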
(force-pushed from 4397805 to 7cd7fef)
I like having as few checks as possible here. Thanks!
Removes the constraint in `BubbleUpExpandShapes` that prevents moving tensor reshape ops through reduction `linalg.generic` ops. This has the benefit of increasing the dimensionality of reduction ops (more fusion opportunities) as well as increasing the chance these ops will be moved to the edge of the program.

Closes #18854

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Elias Joseph <eljoseph@amd.com>
This is 1 of the 2 changes needed to reland #18857 (with an open PR #19113). Adds a pattern to bubble up expand shape through extract slice, i.e. rewriting `expand(extract)` to `extract(expand)`. This only supports the case where the expanded dimensions are not modified by the extract slice and there are no dynamic dimensions. This is important because `tensor.expand_shape` ops _cannot be cloned_ while `tensor.extract_slice` ops _can be cloned_. So, if the `expand_shape` gets stuck on the bottom of the `extract_slice`, it will block the `extract_slice` from being cloned, and the `extract_slice` will have to be put into its own dispatch.

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
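A minimal sketch of that rewrite under the stated restrictions (static shapes only, and the slice leaves the expanded group untouched); `%src` and the shapes are made up for illustration:

```mlir
// Before: the expand_shape pinned below the slice blocks cloning.
%slice = tensor.extract_slice %src[0, 0] [4, 32] [1, 1]
    : tensor<8x32xf32> to tensor<4x32xf32>
%exp = tensor.expand_shape %slice [[0], [1, 2]] output_shape [4, 8, 4]
    : tensor<4x32xf32> into tensor<4x8x4xf32>

// After: the expand bubbles above the slice. Only dim 0 is sliced; the
// expanded group (dim 1 split into 8x4) is taken in full, so the swap is legal.
%exp2 = tensor.expand_shape %src [[0], [1, 2]] output_shape [8, 8, 4]
    : tensor<8x32xf32> into tensor<8x8x4xf32>
%slice2 = tensor.extract_slice %exp2[0, 0, 0] [4, 8, 4] [1, 1, 1]
    : tensor<8x8x4xf32> to tensor<4x8x4xf32>
```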
…#18968) …(iree-org#18857)"

This regresses SDXL int8 perf by increasing the dimensionality of `attention` ops, which messes with the attention spec. Revert this for now and reland once `CollapseDimensionsPass` can handle attention. This reverts commit 78481a6.

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Giacomo Serafini <179146510+giacs-epic@users.noreply.github.com>
This is 2 of the 2 changes needed to reland #18857 (open PR #19113). There are 2 small subchanges in this PR:
- Refactor the condition into a lambda, `isPossiblySoftmax`, and tweak it to check for operand indexing maps that are projected permutations but not permutations.
- Extend `getCollapsibleLoops` to look at all operations to get contiguous loops.

Also, I added a test case that needed both these changes to pass (without being collapsed, it was causing regressions on the GPU backend).

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
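As an illustration of the map condition (names and shapes are made up): in a softmax-like consumer, a row-wise reduced value is read back through `affine_map<(d0, d1) -> (d0)>`, which is a projected permutation but not a permutation, since d1 is dropped. A minimal sketch:

```mlir
// %rowsum is read through (d0, d1) -> (d0): a projected permutation that is
// not a permutation, the shape an isPossiblySoftmax-style check keys on.
%norm = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                     affine_map<(d0, d1) -> (d0)>,
                     affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]}
    ins(%exp, %rowsum : tensor<32x128xf32>, tensor<32xf32>)
    outs(%init : tensor<32x128xf32>) {
^bb0(%e: f32, %s: f32, %out: f32):
  %0 = arith.divf %e, %s : f32
  linalg.yield %0 : f32
} -> tensor<32x128xf32>
```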
Removes the constraint in `BubbleUpExpandShapes` that prevents moving tensor reshape ops through reduction `linalg.generic` ops. This has the benefit of increasing the dimensionality of reduction ops (more fusion opportunities) as well as increasing the chance these ops will be moved to the edge of the program.

Closes #18854
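A minimal sketch of the now-permitted bubbling, using the shapes from the snippet at the top of the thread (SSA names are illustrative):

```mlir
// Before: a unit dim is expanded back only after the 3-D reduction.
%sum = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>,
                     affine_map<(d0, d1, d2) -> (d0)>],
    iterator_types = ["parallel", "reduction", "reduction"]}
    ins(%in : tensor<32x4x1048576xf32>) outs(%init : tensor<32xf32>) {
^bb0(%a: f32, %out: f32):
  %0 = arith.addf %a, %out : f32
  linalg.yield %0 : f32
} -> tensor<32xf32>
%expanded = tensor.expand_shape %sum [[0, 1]] output_shape [1, 32]
    : tensor<32xf32> into tensor<1x32xf32>

// After bubbling: the expand moves to the producer side and the reduction
// gains a (unit) parallel dim, so the reshape can keep propagating upward.
%in_exp = tensor.expand_shape %in [[0, 1], [2], [3]]
    output_shape [1, 32, 4, 1048576]
    : tensor<32x4x1048576xf32> into tensor<1x32x4x1048576xf32>
%sum_exp = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>,
                     affine_map<(d0, d1, d2, d3) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction", "reduction"]}
    ins(%in_exp : tensor<1x32x4x1048576xf32>)
    outs(%init_exp : tensor<1x32xf32>) {
^bb0(%a: f32, %out: f32):
  %0 = arith.addf %a, %out : f32
  linalg.yield %0 : f32
} -> tensor<1x32xf32>
```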