Propagate reshapes through generics with reduction iterators #18857

Merged: 5 commits into iree-org:main on Oct 30, 2024

Conversation

IanWood1 (Contributor) commented on Oct 21, 2024:

Removes the constraint in BubbleUpExpandShapes that prevents moving tensor reshape ops through reduction linalg.generic ops. This has the benefit of increasing the dimensionality of reduction ops (more fusion opportunities) as well as increasing the chance these ops will be moved to the edge of the program.
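
For illustration, here is a rough sketch of the kind of rewrite this change permits. The shapes and value names are hypothetical (not taken from this PR); the point is that a `tensor.expand_shape` sitting below a reduction `linalg.generic` can now bubble above it, raising the rank of the reduction:

```mlir
// Before: a rank-2 reduction whose result is immediately expanded.
// %input and %acc stand for arbitrary producers.
%sum = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                     affine_map<(d0, d1) -> (d0)>],
    iterator_types = ["parallel", "reduction"]}
    ins(%input : tensor<32x4096xf32>) outs(%acc : tensor<32xf32>) {
  ^bb0(%in: f32, %out: f32):
    %0 = arith.addf %in, %out : f32
    linalg.yield %0 : f32
  } -> tensor<32xf32>
%expanded = tensor.expand_shape %sum [[0, 1]] output_shape [1, 32]
    : tensor<32xf32> into tensor<1x32xf32>

// After bubbling: the expand_shape moves onto the operand and the
// reduction itself becomes rank-3.
%input_exp = tensor.expand_shape %input [[0, 1], [2]] output_shape [1, 32, 4096]
    : tensor<32x4096xf32> into tensor<1x32x4096xf32>
%sum_exp = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"]}
    ins(%input_exp : tensor<1x32x4096xf32>) outs(%acc_exp : tensor<1x32xf32>) {
  ^bb0(%in: f32, %out: f32):
    %0 = arith.addf %in, %out : f32
    linalg.yield %0 : f32
  } -> tensor<1x32xf32>
```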

Closes #18854

IanWood1 (Contributor, Author) commented:

Tracked the regression to the following pattern (input to BubbleUpExpandShapes):

```mlir
// Note %420 is used by %421 and %423
%420 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%collapsed_476 : tensor<32x4x1048576xf16>) outs(%358 : tensor<32x4x1048576xf32>) {
  ^bb0(%in: f16, %out: f32):
    %436 = arith.extf %in : f16 to f32
    linalg.yield %436 : f32
  } -> tensor<32x4x1048576xf32>
  %421 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0)>], iterator_types = ["parallel", "reduction", "reduction"]} ins(%420 : tensor<32x4x1048576xf32>) outs(%20 : tensor<32xf32>) {
  ^bb0(%in: f32, %out: f32):
    %436 = arith.addf %in, %out : f32
    linalg.yield %436 : f32
  } -> tensor<32xf32>
  %422 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%421 : tensor<32xf32>) outs(%19 : tensor<32xf32>) {
  ^bb0(%in: f32, %out: f32):
    %436 = arith.divf %in, %cst_3 : f32
    linalg.yield %436 : f32
  } -> tensor<32xf32>
  %423 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0)>, affine_map<(d0, d1, d2) -> (d0)>], iterator_types = ["parallel", "reduction", "reduction"]} ins(%420, %422 : tensor<32x4x1048576xf32>, tensor<32xf32>) outs(%20 : tensor<32xf32>) {
  ^bb0(%in: f32, %in_488: f32, %out: f32):
    %436 = arith.subf %in, %in_488 : f32
    %437 = arith.mulf %436, %436 : f32
    %438 = arith.addf %437, %out : f32
    linalg.yield %438 : f32
  } -> tensor<32xf32>
  %expanded_477 = tensor.expand_shape %422 [[0, 1]] output_shape [1, 32] : tensor<32xf32> into tensor<1x32xf32>
  %expanded_478 = tensor.expand_shape %423 [[0, 1]] output_shape [1, 32] : tensor<32xf32> into tensor<1x32xf32>
```

A tensor.expand_shape gets caught between %420 and %423 because %420 has multiple uses. I removed the check that prevents bubbling when the producer has multiple uses, and it seemed to clear up the problem. I'm currently testing locally to see what other regressions pop up.
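
Roughly, the stuck state looks like the following sketch (reconstructed for illustration, not the actual IR): the expand_shape from %expanded_478 bubbles up through the reduction %423, but the old multiple-use check stops it at %420, leaving a reshape wedged between the two generics:

```mlir
// Hypothetical intermediate IR: %420 still has its other use (%421),
// so the reshape cannot bubble any further and stays here.
%expanded_420 = tensor.expand_shape %420 [[0, 1], [2], [3]] output_shape [1, 32, 4, 1048576]
    : tensor<32x4x1048576xf32> into tensor<1x32x4x1048576xf32>
```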

cc @MaheshRavishankar

IanWood1 force-pushed the prop_through_reduction branch from a39d2e4 to 4397805 on October 22, 2024 17:06
Review comment from IanWood1 (Contributor, Author) on lines 88 to 89:

// CHECK: %[[GEN1:.+]] = linalg.generic
// CHECK-SAME: ["parallel", "reduction", "reduction"]

Note: this is because the collapse_shape (from drop unit dims) now gets propagated down, and CollapseDimensionsPass won't collapse operations with multiple reduction iterators (waiting for codegen support).

IanWood1 force-pushed the prop_through_reduction branch from 4397805 to 7cd7fef on October 22, 2024 17:37
IanWood1 marked this pull request as ready for review on October 22, 2024 18:00
IanWood1 removed the request for review from ScottTodd on October 22, 2024 18:00
MaheshRavishankar (Contributor) left a comment:

I like having as few checks as possible here. Thanks!

IanWood1 merged commit 78481a6 into iree-org:main on Oct 30, 2024 (36 checks passed).
IanWood1 deleted the prop_through_reduction branch on October 30, 2024 20:44.
Eliasj42 pushed a commit that referenced this pull request Oct 31, 2024
Removes the constraint in `BubbleUpExpandShapes` that prevents moving
tensor reshape ops through reduction `linalg.generic` ops. This has the
benefit of increasing the dimensionality of reduction ops (more fusion
opportunities) as well as increasing the chance these ops will be moved
to the edge of the program.

Closes #18854

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Elias Joseph <eljoseph@amd.com>
IanWood1 added a commit to IanWood1/iree that referenced this pull request Oct 31, 2024
…ree-org#18857)"

This reverts commit 78481a6.

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
IanWood1 added a commit that referenced this pull request Oct 31, 2024
…(#18857)"

This regresses sdxl int8 perf by increasing the dimensionality of
`attention` ops which messes with the attention spec. Revert this for
now and reland once `CollapseDimensionsPass` can handle attention.

This reverts commit 78481a6.

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
IanWood1 added a commit that referenced this pull request Dec 3, 2024
This is the first of two changes needed to reland #18857 (along with the open PR #19113).


Adds a pattern to bubble up expand_shape through extract_slice, i.e. rewriting `expand(extract)` into `extract(expand)`. This only supports the case where the expanded dimensions are not modified by the extract_slice and there are no dynamic dimensions.

This is important because `tensor.expand_shape` ops _cannot be cloned_ while `tensor.extract_slice` ops _can be cloned_. So, if the `expand_shape` gets stuck below the `extract_slice`, it blocks the `extract_slice` from being cloned, and the `extract_slice` has to be put into its own dispatch.
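
A minimal sketch of the rewrite, with hypothetical shapes and names (not from this PR). The slice here only narrows a dimension that the expansion leaves untouched, which is the case the pattern supports:

```mlir
// Before: expand(extract) -- the expand_shape is pinned below the slice.
%slice = tensor.extract_slice %src[0, 0] [32, 128] [1, 1]
    : tensor<64x128xf32> to tensor<32x128xf32>
%expanded = tensor.expand_shape %slice [[0], [1, 2]] output_shape [32, 2, 64]
    : tensor<32x128xf32> into tensor<32x2x64xf32>

// After: extract(expand) -- the expand_shape bubbles above the slice, so
// the (cloneable) extract_slice is what remains at the bottom.
%src_exp = tensor.expand_shape %src [[0], [1, 2]] output_shape [64, 2, 64]
    : tensor<64x128xf32> into tensor<64x2x64xf32>
%slice_exp = tensor.extract_slice %src_exp[0, 0, 0] [32, 2, 64] [1, 1, 1]
    : tensor<64x2x64xf32> to tensor<32x2x64xf32>
```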

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
…g#18857)
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
…#18968)
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
…#19325)
IanWood1 added a commit that referenced this pull request Dec 5, 2024
This is the second of two changes needed to reland #18857 (open PR #19113).


There are two small subchanges in this PR:
- Refactor the check into a lambda `isPossiblySoftmax` and tweak the condition to check for operand indexing maps that are projected permutations but not permutations (see the sketch below).
- Extend `getCollapsibleLoops` to look at all operations to get contiguous loops.

Also, I added a test case that needed both of these changes to pass (without being collapsed, it was causing regressions on the GPU backend).
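
As a rough illustration of the first bullet (hypothetical shapes and names; only the `isPossiblySoftmax` name comes from this PR): the heuristic looks for generics in which some operand is read through an indexing map that is a projected permutation but not a full permutation, e.g. a previously computed per-row value read back while reducing over the row:

```mlir
#map_full = affine_map<(d0, d1) -> (d0, d1)>
// Projected permutation but not a permutation: it drops d1.
#map_proj = affine_map<(d0, d1) -> (d0)>

func.func @possibly_softmax_like(%x: tensor<32x4096xf32>, %mean: tensor<32xf32>,
                                 %init: tensor<32xf32>) -> tensor<32xf32> {
  // The second input and the output are accessed through #map_proj,
  // so this op matches the "possibly softmax" shape of access pattern.
  %0 = linalg.generic {
      indexing_maps = [#map_full, #map_proj, #map_proj],
      iterator_types = ["parallel", "reduction"]}
      ins(%x, %mean : tensor<32x4096xf32>, tensor<32xf32>)
      outs(%init : tensor<32xf32>) {
    ^bb0(%in: f32, %m: f32, %out: f32):
      %diff = arith.subf %in, %m : f32
      %acc = arith.addf %diff, %out : f32
      linalg.yield %acc : f32
    } -> tensor<32xf32>
  return %0 : tensor<32xf32>
}
```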

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>