[TKW] Implement support for multiple iter args on Reduction #166

raikonenfnu · 2024-09-25T00:06:26Z

The main motivation behind this PR is to enable multiple induction variables/iterArgs on the same tiled "Reduction" loop. To enable this, we did a few things:

1. Enable lowering/expansion on `operator.getitem` (the op that extracts multiple results in Python, e.g. `res0, res1 = fn`) by templating it on `GetResult(CustomOp)`, since they have the same args and interface and can reuse most of the indexing/expansion helpers.
2. Introduce `res_idx`, a variable that represents which result index of an op we are referring to during expansion and in the context map. This is useful for ops that have more than one result/variable as output.
3. Bug fix in `expand_reduction`: we hoist the iteration over and expansion of `reduction.init_args` out of the loop that iterates over and expands the `yield`/`return_val` of the reduction loop. The size of `init_args` is expected to be the same as the size of `yield`/`return_val`. Hence, with N iter_args/yields, we ended up expanding the `init_args` N x N times instead of N times. We hadn't seen this so far because we had only been working with 1 init_arg/iterArg, and 1x1 == 1.
4. Introduce a canonicalization pattern to fold chains of `GetResult`. This is because `GetResult`, by semantics/design, is only expected to extract and have one result. Hence a chain of `GetResult` should just be replaced by itself. This helps clean up the IR.
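The N x N expansion described for the `expand_reduction` fix can be illustrated with a toy model. This is only a sketch of the loop structure, not the actual TKW code: `expand_buggy`, `expand_fixed`, and `count_init` are hypothetical names, and "expanding" is modeled as appending to a list.

```python
# Toy model of the loop-hoisting fix. All names here are hypothetical;
# "expansion" is modeled as recording an entry in a list.

def expand_buggy(init_args, return_vals):
    """init_args are expanded inside the loop over return_vals."""
    expansions = []
    for ret in return_vals:
        expansions.append(("yield", ret))
        for init in init_args:  # repeated on every outer iteration
            expansions.append(("init", init))
    return expansions


def expand_fixed(init_args, return_vals):
    """init_args expansion is hoisted out of the loop over return_vals."""
    expansions = [("yield", ret) for ret in return_vals]
    expansions += [("init", init) for init in init_args]
    return expansions


def count_init(expansions):
    """Count how many times an init arg was expanded."""
    return sum(1 for kind, _ in expansions if kind == "init")


inits, yields = ["i0", "i1", "i2"], ["y0", "y1", "y2"]
print(count_init(expand_buggy(inits, yields)))  # 9, i.e. N x N for N = 3
print(count_init(expand_fixed(inits, yields)))  # 3, i.e. N
```

With a single init arg both versions perform exactly one expansion, which is why the bug stayed hidden while only one iterArg was in use.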

Item 4 also helps circumvent an issue where Reduction and GetResult are expanded completely on their own, not following the per-dimension DFS structure used by the rest of the expansion code. This becomes especially problematic for multiple iterArgs, since getitem does not expect its source value to be expanded without it.
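As a rough illustration of folding a chain of GetResult, here is a minimal sketch under one reading of the pattern: because a GetResult has exactly one result, a GetResult applied to another GetResult is redundant and can be collapsed onto the innermost one. The `Op` and `GetResult` classes and `fold_get_result_chain` below are hypothetical stand-ins, not the TKW classes or the actual canonicalization pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for a multi-result op such as a reduction."""
    name: str


@dataclass(frozen=True)
class GetResult:
    """Hypothetical stand-in: extracts result `res_idx` from `source`."""
    source: object
    res_idx: int


def fold_get_result_chain(node):
    """Collapse GetResult(GetResult(...)) onto the innermost GetResult.

    Assumption: a GetResult yields exactly one value, so extracting
    "a result" from it again adds no information.
    """
    while isinstance(node, GetResult) and isinstance(node.source, GetResult):
        node = node.source
    return node


reduction = Op("reduction")
chain = GetResult(GetResult(GetResult(reduction, 1), 0), 0)
print(fold_get_result_chain(chain) == GetResult(reduction, 1))  # True
```

A node that is not part of a chain is returned unchanged, so the fold is safe to apply everywhere in this toy model.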

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

harsh-nod

lgtm! modulo some comments

shark_turbine/kernel/wave/utils.py

shark_turbine/kernel/wave/expansion.py

harsh-nod · 2024-09-26T22:27:38Z

tests/kernel/wave/wave_e2e_test.py

@@ -343,7 +351,7 @@ def repeat(
        # Assert equal does cast to boolean on torch.Tensor
        # which causes issues, hence we cast to numpy before
        # checking.
-        assert_equal(c, ref.values.numpy())
+        assert_allclose(ref, c, atol=0.07)


That tolerance seems quite high for fp32?

lowered to 0.015 :)
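For context on what the tolerance change means, here is a minimal sketch of the elementwise check, assuming `assert_allclose` follows the `numpy.testing` criterion `|actual - desired| <= atol + rtol * |desired|` with its default `rtol=1e-7` (the import is not shown in the hunk above, so that is an assumption). `within_tolerance` is a hypothetical helper and the sample values are made up for illustration.

```python
def within_tolerance(actual, desired, atol=0.0, rtol=1e-7):
    """Scalar version of the assumed allclose criterion."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


desired = 1.0
print(within_tolerance(1.05, desired, atol=0.07))   # True: error 0.05 passes
print(within_tolerance(1.05, desired, atol=0.015))  # False: now rejected
print(within_tolerance(1.01, desired, atol=0.015))  # True: error 0.01 passes
```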

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

…g#166)

The main motivation behind this PR is to enable multiple induction variables/iterArgs on the same tiled "Reduction" loop. To enable this, we did a few things:

1. Enable lowering/expansion on `operator.getitem` (the op that extracts multiple results in Python, e.g. `res0, res1 = fn`) by templating it on `GetResult(CustomOp)`, since they have the same args and interface and can reuse most of the indexing/expansion helpers.
2. Introduce `res_idx`, a variable that represents which result index of an op we are referring to during expansion and in the context map. This is useful for ops that have more than one result/variable as output.
3. Bug fix in `expand_reduction`: we hoist the iteration over and expansion of `reduction.init_args` out of the loop that iterates over and expands the `yield`/`return_val` of the reduction loop. The size of `init_args` is expected to be the same as the size of `yield`/`return_val`. Hence, with N iter_args/yields, we ended up expanding the `init_args` N x N times instead of N times. We hadn't seen this so far because we had only been working with 1 init_arg/iterArg, and 1x1 == 1.
4. Introduce a canonicalization pattern to fold chains of `GetResult`. This is because `GetResult`, by semantics/design, is only expected to extract and have one result. Hence a chain of `GetResult` should just be replaced by itself. This helps clean up the IR.

Item 4 also helps circumvent an issue where Reduction and GetResult are expanded completely on their own, not following the per-dimension DFS structure used by the rest of the expansion code. This becomes especially problematic for multiple iterArgs, since getitem does not expect its source value to be expanded without it.

---------

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Ian <ian.nordeng@amd.com>

raikonenfnu requested review from harsh-nod and Hardcode84 September 25, 2024 00:06

raikonenfnu force-pushed the multipleIVUpstream branch from e89af02 to 179ac05 Compare September 26, 2024 22:04

raikonenfnu added 4 commits September 26, 2024 15:05

[TKW] Implement support for multiple iter args

4f47e3b

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

Add queried arg idx

12d0ce1

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

Add LIT and e2e tests

45ef794

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

rename for better clarity

179ac05

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

harsh-nod approved these changes Sep 26, 2024

fix comments and NITs

0463f29

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

raikonenfnu merged commit 0e16d54 into iree-org:main Sep 27, 2024
8 checks passed
