[DispatchCreation] Run preprocessing before elementwise fusion #18920
Conversation
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
This is the state of the IR (from the linked issue) before reshape propagation:

%2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%cst : tensor<30522x128xf16>) outs(%1 : tensor<30522x128xf32>) {
^bb0(%in: f16, %out: f32):
%8 = arith.extf %in : f16 to f32
linalg.yield %8 : f32
} -> tensor<30522x128xf32>
%expanded = tensor.expand_shape %2 [[0, 1], [2]] output_shape [1, 30522, 128] : tensor<30522x128xf32> into tensor<1x30522x128xf32>
%collapsed = tensor.collapse_shape %0 [[0, 1]] : tensor<1x128xi32> into tensor<128xi32>
%3 = tensor.empty() : tensor<128x128xf32>
%4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%collapsed : tensor<128xi32>) outs(%3 : tensor<128x128xf32>) {
^bb0(%in: i32, %out: f32):
%8 = arith.index_cast %in : i32 to index
%9 = linalg.index 1 : index
%extracted = tensor.extract %expanded[%c0, %8, %9] : tensor<1x30522x128xf32>
linalg.yield %extracted : f32
} -> tensor<128x128xf32>
Note that the `tensor.expand_shape` sits between the elementwise `linalg.generic` and the `tensor.extract` inside the second `linalg.generic`, so `GatherFusionPattern` cannot fuse the two until the reshape is propagated out of the way.
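For illustration, here is roughly what the producer looks like once the reshape has been bubbled above the elementwise op (a hand-written sketch, not compiler output; `%cst_expanded` and `%init` are hypothetical names):

// The expand_shape now applies to the constant input, and the generic
// produces the expanded tensor directly.
%cst_expanded = tensor.expand_shape %cst [[0, 1], [2]] output_shape [1, 30522, 128] : tensor<30522x128xf16> into tensor<1x30522x128xf16>
%init = tensor.empty() : tensor<1x30522x128xf32>
%expanded = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%cst_expanded : tensor<1x30522x128xf16>) outs(%init : tensor<1x30522x128xf32>) {
^bb0(%in: f16, %out: f32):
  %8 = arith.extf %in : f16 to f32
  linalg.yield %8 : f32
} -> tensor<1x30522x128xf32>

The `tensor.extract` in the consumer now reads a `linalg.generic` result directly, which is the shape `GatherFusionPattern` matches.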
It makes sense to me. Just one question: since this is a prerequisite for element-wise fusion, perhaps we should also delete the passes below (which run right before `addDispatchRegionCreationPreprocessingPasses`), because they are moved into the preprocessing passes. What do you think?
(You probably want to run canonicalizer and CSE at the beginning of the pipeline.)
iree/compiler/src/iree/compiler/DispatchCreation/Passes.cpp
Lines 297 to 301 in e66171a
FunctionLikeNest(passManager)
    // Preprocess the input to a form more amenable for fusion.
    .addPass(DispatchCreation::createFusionPreprocessingPass)
    .addPass(IREE::Flow::createCanonicalizerPass)
    .addPass(mlir::createCSEPass);
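A sketch of what that suggestion could look like, with the three passes above folded into the start of `addDispatchRegionCreationPreprocessingPasses` (hand-written illustration only, using the pass names from the snippet; the surrounding passes are elided and the exact placement is up to the patch):

static void addDispatchRegionCreationPreprocessingPasses(
    OpPassManager &passManager) {
  FunctionLikeNest(passManager)
      // Clean up the input IR before any fusion-oriented preprocessing.
      .addPass(IREE::Flow::createCanonicalizerPass)
      .addPass(mlir::createCSEPass)
      // Preprocess the input to a form more amenable for fusion.
      .addPass(DispatchCreation::createFusionPreprocessingPass)
      .addPass(IREE::Flow::createCanonicalizerPass)
      .addPass(mlir::createCSEPass);
  // ... elementwise fusion and the rest of the preprocessing passes ...
}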
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
That makes sense to me. I'll move them inside and delete the extra preprocessing passes.
@c-rhodes this should fix the problem you were encountering on #17226 (comment).
Test please :)
Can confirm this fixes the issue, thanks for the fix!
There's a regression in amdgpu_rocm_mi300_gfx942, but I can see this in other PRs, so I think it's safe to say it's not because of this PR. I'll go ahead and land the fix, cheers!
This might have regressed VAE decode time on MI250.
That was also reported in the checks on the PR: https://github.com/iree-org/iree/actions/runs/11561437104/job/32180922269#step:8:128
@ScottTodd thank you for the comment, I didn't realize this was merged. I was going to look into the regressions before merging but don't have a fix yet. This may cause issues for others' PRs, so I'll open a revert and re-land once the regressions have been resolved. I looked into this a bit yesterday, and I'm a bit confused why it causes runtime regressions with no change to the dispatch count.
Apologies for jumping the gun and landing this one! I noticed the regression but thought I saw the same one on other PRs and disregarded it.
No worries, I also thought it was just being flaky at first :) |
This PR got merged before I was able to resolve the perf regressions in VAE decode on MI250; see @ScottTodd's comment on the original PR. I need time to resolve the regressions, but this can be relanded once they are fixed. Reverts #18920.
I think it makes sense to run `FusionPreprocessingPass` before `ElementwiseOpFusionPass` because it helps put the IR in a better state for fusion (e.g. interchanging `linalg.generic` indexing maps). But also, reshapes have been propagated to the edges of the program, which allows the `GatherFusionPattern` to be more effective. Fixes compilation error from #17226 (comment).

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Elias Joseph <eljoseph@amd.com>
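For context, the indexing-map interchange mentioned above looks roughly like this (a hand-written illustration with made-up shapes and a placeholder `arith.negf` body, not output from the pass): a `linalg.generic` whose result map is a permutation gets its loops interchanged so the result map becomes the identity, which makes it easier to match against consumers during fusion.

// Before: the output indexing map is the permutation (d0, d1) -> (d1, d0).
%0 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1, d0)>], iterator_types = ["parallel", "parallel"]} ins(%a : tensor<8x4xf32>) outs(%b : tensor<4x8xf32>) {
^bb0(%in: f32, %out: f32):
  %n = arith.negf %in : f32
  linalg.yield %n : f32
} -> tensor<4x8xf32>

// After interchanging d0 and d1: same computation, but the output map is
// now the identity and the permutation has moved to the input.
%0 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%a : tensor<8x4xf32>) outs(%b : tensor<4x8xf32>) {
^bb0(%in: f32, %out: f32):
  %n = arith.negf %in : f32
  linalg.yield %n : f32
} -> tensor<4x8xf32>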
This PR got merged before I was able to resolve the perf regressions in VAE decode on MI250; see @ScottTodd's comment on the original PR. I need time to resolve the regressions, but this can be relanded once they are fixed. Reverts #18920.

Signed-off-by: Elias Joseph <eljoseph@amd.com>