-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TKW] Implement support for multiple iter args on Reduction #166
Conversation
e89af02
to
179ac05
Compare
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! modulo some comments
tests/kernel/wave/wave_e2e_test.py
Outdated
@@ -343,7 +351,7 @@ def repeat( | |||
# Assert equal does cast to boolean on torch.Tensor | |||
# which causes issues, hence we cast to numpy before | |||
# checking. | |||
assert_equal(c, ref.values.numpy()) | |||
assert_allclose(ref, c, atol=0.07) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That tolerance seems quite high for fp32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lowered to 0.015 :)
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…g#166) The main motivation behind this PR is to enable multiple induction variable/iterArg on the same tiled "Reduction" loop. To enable above we did a couple things: 1. Enable lowering/expansion on `operator.getitem` (the op that extract multiple results in python i.e `res0, res1 = fn`) by templating it on`GetResult(CustomOp)` since they have the same args and interface and can reuse most of the indexing/expansion helper. 2. Introduce `res_idx`, a variable to represent which result index of an op we are referring to, during expansion and context map. This is useful for ops that has more than one results / variables as outputs. 3. bug fix in expand_reduction, where we hoist out iterating and expanding of `reduction.init_args` out of the loop that iterates and expands over the `yield`/`return_val` of the reduction loop. It is expected that the size of `init_args` is the same as size of `yield`/`return_val`. Hence if we had N iter_args/yields, we ended up expanding the `init_args` N x N time instead of N times. We haven't seen it thus far because we have been only playing with 1 init_arg/iterArg, and 1x1 == 1. 4. Introduce a canonicalization pattern to fold chains of GetResult. this is because GetResult by semantic/design is only expected to extract and have one result. Hence a chain of GetResult should just be replaced by itself. This help clean up the IR. num.4 also helps circumvent issue where Reduction and GetResult is expanded completely by itself not following the DFS structure per dimension like the rest of the expansion code. This becomes especially problematic for multiple IterArg since Getitem is not expecting its' source value to be expanded without it. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…g#166) The main motivation behind this PR is to enable multiple induction variable/iterArg on the same tiled "Reduction" loop. To enable above we did a couple things: 1. Enable lowering/expansion on `operator.getitem` (the op that extract multiple results in python i.e `res0, res1 = fn`) by templating it on`GetResult(CustomOp)` since they have the same args and interface and can reuse most of the indexing/expansion helper. 2. Introduce `res_idx`, a variable to represent which result index of an op we are referring to, during expansion and context map. This is useful for ops that has more than one results / variables as outputs. 3. bug fix in expand_reduction, where we hoist out iterating and expanding of `reduction.init_args` out of the loop that iterates and expands over the `yield`/`return_val` of the reduction loop. It is expected that the size of `init_args` is the same as size of `yield`/`return_val`. Hence if we had N iter_args/yields, we ended up expanding the `init_args` N x N time instead of N times. We haven't seen it thus far because we have been only playing with 1 init_arg/iterArg, and 1x1 == 1. 4. Introduce a canonicalization pattern to fold chains of GetResult. this is because GetResult by semantic/design is only expected to extract and have one result. Hence a chain of GetResult should just be replaced by itself. This help clean up the IR. num.4 also helps circumvent issue where Reduction and GetResult is expanded completely by itself not following the DFS structure per dimension like the rest of the expansion code. This becomes especially problematic for multiple IterArg since Getitem is not expecting its' source value to be expanded without it. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com> Signed-off-by: Ian <ian.nordeng@amd.com>
…g#166) The main motivation behind this PR is to enable multiple induction variable/iterArg on the same tiled "Reduction" loop. To enable above we did a couple things: 1. Enable lowering/expansion on `operator.getitem` (the op that extract multiple results in python i.e `res0, res1 = fn`) by templating it on`GetResult(CustomOp)` since they have the same args and interface and can reuse most of the indexing/expansion helper. 2. Introduce `res_idx`, a variable to represent which result index of an op we are referring to, during expansion and context map. This is useful for ops that has more than one results / variables as outputs. 3. bug fix in expand_reduction, where we hoist out iterating and expanding of `reduction.init_args` out of the loop that iterates and expands over the `yield`/`return_val` of the reduction loop. It is expected that the size of `init_args` is the same as size of `yield`/`return_val`. Hence if we had N iter_args/yields, we ended up expanding the `init_args` N x N time instead of N times. We haven't seen it thus far because we have been only playing with 1 init_arg/iterArg, and 1x1 == 1. 4. Introduce a canonicalization pattern to fold chains of GetResult. this is because GetResult by semantic/design is only expected to extract and have one result. Hence a chain of GetResult should just be replaced by itself. This help clean up the IR. num.4 also helps circumvent issue where Reduction and GetResult is expanded completely by itself not following the DFS structure per dimension like the rest of the expansion code. This becomes especially problematic for multiple IterArg since Getitem is not expecting its' source value to be expanded without it. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com> Signed-off-by: Ian <ian.nordeng@amd.com>
The main motivation behind this PR is to enable multiple induction variable/iterArg on the same tiled "Reduction" loop. To enable above we did a couple things: 1. Enable lowering/expansion on `operator.getitem` (the op that extract multiple results in python i.e `res0, res1 = fn`) by templating it on`GetResult(CustomOp)` since they have the same args and interface and can reuse most of the indexing/expansion helper. 2. Introduce `res_idx`, a variable to represent which result index of an op we are referring to, during expansion and context map. This is useful for ops that has more than one results / variables as outputs. 3. bug fix in expand_reduction, where we hoist out iterating and expanding of `reduction.init_args` out of the loop that iterates and expands over the `yield`/`return_val` of the reduction loop. It is expected that the size of `init_args` is the same as size of `yield`/`return_val`. Hence if we had N iter_args/yields, we ended up expanding the `init_args` N x N time instead of N times. We haven't seen it thus far because we have been only playing with 1 init_arg/iterArg, and 1x1 == 1. 4. Introduce a canonicalization pattern to fold chains of GetResult. this is because GetResult by semantic/design is only expected to extract and have one result. Hence a chain of GetResult should just be replaced by itself. This help clean up the IR. num.4 also helps circumvent issue where Reduction and GetResult is expanded completely by itself not following the DFS structure per dimension like the rest of the expansion code. This becomes especially problematic for multiple IterArg since Getitem is not expecting its' source value to be expanded without it. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com>
The main motivation behind this PR is to enable multiple induction variable/iterArg on the same tiled "Reduction" loop. To enable above we did a couple things:
Enable lowering/expansion on
operator.getitem
(the op that extract multiple results in python i.eres0, res1 = fn
) by templating it onGetResult(CustomOp)
since they have the same args and interface and can reuse most of the indexing/expansion helper.Introduce
res_idx
, a variable to represent which result index of an op we are referring to, during expansion and context map. This is useful for ops that has more than one results / variables as outputs.bug fix in expand_reduction, where we hoist out iterating and expanding of
reduction.init_args
out of the loop that iterates and expands over theyield
/return_val
of the reduction loop. It is expected that the size ofinit_args
is the same as size ofyield
/return_val
. Hence if we had N iter_args/yields, we ended up expanding theinit_args
N x N time instead of N times. We haven't seen it thus far because we have been only playing with 1 init_arg/iterArg, and 1x1 == 1.Introduce a canonicalization pattern to fold chains of GetResult. this is because GetResult by semantic/design is only expected to extract and have one result. Hence a chain of GetResult should just be replaced by itself. This help clean up the IR.
num.4 also helps circumvent issue where Reduction and GetResult is expanded completely by itself not following the DFS structure per dimension like the rest of the expansion code. This becomes especially problematic for multiple IterArg since Getitem is not expecting its' source value to be expanded without it.